Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
67 commits
Select commit Hold shift + click to select a range
cd7bf7f
Parameterize CMS with type K
Oct 9, 2014
90fe5e1
Use require instead of assert to enforce contract
Oct 9, 2014
c770d79
Test monoid laws of CMS for Byte, Short, Int, Long, BigInt
Oct 9, 2014
89acbfc
CMS: use a type alias to test K=Long in test spec
Oct 9, 2014
5932918
Use idiomatic ScalaTest matching style instead of asserts
Oct 9, 2014
bfd5aba
CMS: test K in {Short, Int, Long, BigInt}
Oct 9, 2014
f386593
CMS: remove implicit for Byte
Oct 9, 2014
43f5101
Reformat class parameters of CMSInstance
Oct 10, 2014
d911817
CMS: split responsibility of counting/freq estimation and heavy hitte…
Oct 10, 2014
7288f8e
Improve documentation
Oct 11, 2014
49f275f
Reorder classes and improve their docs
Oct 11, 2014
adb85b0
Improve documentation
Oct 11, 2014
278213d
Require at least d (aka depth) hash functions
Oct 11, 2014
316c110
Move TODO out of scaladoc
Oct 11, 2014
0fd84f9
Move implicits to top of class
Oct 11, 2014
fc7ea4f
Do not shadow type parameter K of test suite
Oct 11, 2014
ec997b2
Add monoid laws test for Top-% CMS
Oct 11, 2014
83b5133
Add default value (0.01) for heavyHittersPct
Oct 11, 2014
791239d
Uppercase Cms to CMS throughout all CMS-related classes
Oct 11, 2014
12f4ff0
Revert 83b5133: Add default value (0.01) for heavyHittersPct
Oct 11, 2014
be6c213
Move scaladoc to correct class
Oct 11, 2014
c02d771
Reorder code
Oct 11, 2014
ae0a963
Improve docs
Oct 11, 2014
13121c9
Add caliper benchmark for Count-Min Sketch
Oct 12, 2014
c22218a
Add README for algebird-caliper
Oct 12, 2014
093991c
Add example output to algebird-caliper README
Oct 12, 2014
1e065e2
Rename conf to params to be consistent across CMS implementations
Oct 12, 2014
8dea743
Assert that computed hashes are not negative
Oct 12, 2014
e54a271
Improve scaladoc of CMS and TopPctCMS
Oct 13, 2014
d1cc35b
Add aggregator test for CMS[K], split test cases by CMS trait
Oct 13, 2014
10af35f
Use TopPctCMS for testing CMSCounting functionality
Oct 13, 2014
e61fad4
Also test CMSCounting-based aggregator functionality of TopPctCMS
Oct 13, 2014
ddd2985
Add reference to Spire's numerical data types as possible choices for K
Oct 13, 2014
dd5eaae
heavyHittersPct should not be part of the trait/contact of CMSHeavyHi…
Oct 14, 2014
9162b20
Clarify the meaning and effect of heavyHittersPct
Oct 14, 2014
a76b645
Simplify purging heavy hitters when building a new TopPctCMSInstance
Oct 14, 2014
47d12ac
Decouple logic from computing/pruning heavy hitters from "Top" CMS
Oct 14, 2014
ba19158
Restructure code
Oct 15, 2014
ec1e4f8
Explicitly cast to Vector[V]
Oct 15, 2014
cf7a150
Document K type
Oct 15, 2014
3cf86f2
Minor code reformatting (no functional changes)
Oct 15, 2014
c3f3678
Add TODO to re-consider returning a sorted list for HeavyHitters#items
Oct 15, 2014
3273d0b
Include heavy hitters logic in CMSHeavyHitters trait
Oct 15, 2014
f2a1e53
CMSBenchmark: Use correct CMS classes
Oct 15, 2014
317eb32
Document reasoning behind our use of implicit conversions
Oct 16, 2014
f4479e4
Remove unneeded comment
Oct 16, 2014
5daf454
Clarify why we keep the top-N CMS tests in the spec
Oct 16, 2014
1042303
Add warning about unsafe merge operation for top-N CMS
Oct 16, 2014
680c1e9
Clarify semantics of updateHeavyHitters
Oct 16, 2014
855f54f
Remove TODOs
Oct 16, 2014
c4a578f
Update scaladoc of TopCMS to match latest code
Oct 16, 2014
d2bcb72
Clarify scaladoc comment of heavyHittersLogic
Oct 16, 2014
840f180
Improve docs
Oct 16, 2014
989f709
Require the same hash functions for sketches when merging them
Oct 16, 2014
15563d3
Use scaladoc wording recommended by Scala style guide
Oct 16, 2014
3a1441d
Update top-N CMS tests to clarify top-N limitations
Oct 16, 2014
47416ea
Clarify test case of top-% CMS
Oct 16, 2014
bd5c279
Clarify merge tests
Oct 16, 2014
90490f3
Improve spec of CMS, e.g. new tests added
Oct 16, 2014
e5d4441
Clarify when warning on top-N CMS merges actually apply
Oct 16, 2014
6a3b015
Merge branch 'develop' into GH-345
Nov 18, 2014
a57bb74
Use iterator to prevent unneeded allocation
Nov 18, 2014
40f673c
Remove unneeded Ordering context bound from HeavyHitter case class
Nov 18, 2014
cd17745
Fix typo in test description
Nov 18, 2014
bec16fe
Add com.twitter.algebird.legacy for old CMS and CountMinSketchMonoid
Nov 19, 2014
9c02f1a
Add tests to verify roundtripping depth/delta and width/eps
Nov 19, 2014
ba498d6
Improve width() by truncating decimal places to eliminate precision e…
Nov 19, 2014
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions algebird-caliper/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
[Caliper](https://code.google.com/p/caliper/)-based Benchmarks for Algebird data structures.

# Usage

Run the following commands from the top-level Algebird directory:

$ ./sbt # <<< enter sbt REPL
> project algebird-caliper

Now you can run the following commands from within the sbt REPL:

# List available benchmarks
> show cappi::benchmarks

# Run a particular benchmark
> cappi::benchmarkOnly com.twitter.algebird.caliper.HLLBenchmark

# Debug a particular benchmark (shows e.g. number of repetitions that will be run)
> cappi::benchmarkOnly --debug com.twitter.algebird.caliper.HLLBenchmark

# Run all benchmarks (apparently this is broken, see https://github.com/softprops/cappi/issues/1)
> cappi::benchmarks

You can find further details in the [cappi](https://github.com/softprops/cappi) documentation, which is the sbt plugin
we use to run the caliper benchmarks.

Example output for [CMSBenchmark](src/test/scala/com/twitter/algebird/caliper/CMSBenchmark.scala):

> cappi::benchmarkOnly com.twitter.algebird.caliper.CMSBenchmark
[info] Running com.google.caliper.Runner com.twitter.algebird.caliper.CMSBenchmark
[info] 0% Scenario{vm=java, trial=0, benchmark=PlusOfFirstHundredIntegersWithLongCms, delta=0.0000001, eps=0.1, heavyHittersPct=0.2, maxBits=2048, operations=100} 292576.31 ns; σ=1271.12 ns @ 3 trials
[info] 17% Scenario{vm=java, trial=0, benchmark=PlusOfFirstHundredIntegersWithBigIntCms, delta=0.0000001, eps=0.1, heavyHittersPct=0.2, maxBits=2048, operations=100} 830195.29 ns; σ=7349.10 ns @ 3 trials
[info] 33% Scenario{vm=java, trial=0, benchmark=PlusOfRandom2048BitNumbersWithBigIntCms, delta=0.0000001, eps=0.1, heavyHittersPct=0.2, maxBits=2048, operations=100} 3362751.81 ns; σ=104683.16 ns @ 10 trials
[info] 50% Scenario{vm=java, trial=0, benchmark=PlusOfFirstHundredIntegersWithLongCms, delta=0.0000001, eps=0.005, heavyHittersPct=0.2, maxBits=2048, operations=100} 384133.61 ns; σ=41211.47 ns @ 10 trials
[info] 67% Scenario{vm=java, trial=0, benchmark=PlusOfFirstHundredIntegersWithBigIntCms, delta=0.0000001, eps=0.005, heavyHittersPct=0.2, maxBits=2048, operations=100} 1018308.55 ns; σ=43285.12 ns @ 10 trials
[info] 83% Scenario{vm=java, trial=0, benchmark=PlusOfRandom2048BitNumbersWithBigIntCms, delta=0.0000001, eps=0.005, heavyHittersPct=0.2, maxBits=2048, operations=100} 3610991.09 ns; σ=195033.95 ns @ 10 trials
[info]
[info] benchmark eps us linear runtime
[info] PlusOfFirstHundredIntegersWithLongCms 0.1 293 ==
[info] PlusOfFirstHundredIntegersWithLongCms 0.005 384 ===
[info] PlusOfFirstHundredIntegersWithBigIntCms 0.1 830 ======
[info] PlusOfFirstHundredIntegersWithBigIntCms 0.005 1018 ========
[info] PlusOfRandom2048BitNumbersWithBigIntCms 0.1 3363 ===========================
[info] PlusOfRandom2048BitNumbersWithBigIntCms 0.005 3611 ==============================
[info]
[info] vm: java
[info] trial: 0
[info] delta: 0.0000001
[info] heavyHittersPct: 0.2
[info] maxBits: 2048
[info] operations: 100
[success] Total time: 74 s, completed Oct 12, 2014 2:36:04 PM
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
package com.twitter.algebird.caliper

import com.google.caliper.{ Param, SimpleBenchmark }
import com.twitter.algebird.{ TopPctCMS, TopCMS, CMSHasherImplicits, TopPctCMSMonoid }

/**
* Benchmarks the Count-Min sketch implementation in Algebird.
*
* We benchmark different `K` types as well as different input data streams.
*/
// Once we can convince cappi (https://github.com/softprops/capp) -- the sbt plugin we use to run
// caliper benchmarks -- to work with the latest caliper 1.0-beta-1, we would:
// - Let `CMSBenchmark` extend `Benchmark` (instead of `SimpleBenchmark`)
// - Annotate `timePlus` with `@MacroBenchmark`.
class CMSBenchmark extends SimpleBenchmark {

@Param(Array("0.1", "0.005"))
val eps: Double = 0.0

@Param(Array("0.0000001" /* 1E-8 */ ))
val delta: Double = 0.0

@Param(Array("0.2"))
val heavyHittersPct: Double = 0.0

@Param(Array("100"))
val operations: Int = 0 // Number of operations per benchmark repetition (cf. `reps`)

@Param(Array("2048"))
val maxBits: Int = 0

var random: scala.util.Random = _
var cmsLongMonoid: TopPctCMSMonoid[Long] = _
var cmsBigIntMonoid: TopPctCMSMonoid[BigInt] = _

override def setUp {
// Required import of implicit values (e.g. for BigInt- or Long-backed CMS instances)
import CMSHasherImplicits._

cmsLongMonoid = {
val seed = 1
TopPctCMS.monoid[Long](eps, delta, seed, heavyHittersPct)
}

cmsBigIntMonoid = {
val seed = 1
TopPctCMS.monoid[BigInt](eps, delta, seed, heavyHittersPct)
}

random = new scala.util.Random
}

// Case A (K=Long): We count the first hundred integers, i.e. [1, 100]
def timePlusOfFirstHundredIntegersWithLongCms(reps: Int): Int = {
var dummy = 0
while (dummy < reps) {
(1 to operations).view.foldLeft(cmsLongMonoid.zero)((l, r) => { l ++ cmsLongMonoid.create(r) })
dummy += 1
}
dummy
}

// Case B.1 (K=BigInt): We count the first hundred integers, i.e. [1, 100]
def timePlusOfFirstHundredIntegersWithBigIntCms(reps: Int): Int = {
var dummy = 0
while (dummy < reps) {
(1 to operations).view.foldLeft(cmsBigIntMonoid.zero)((l, r) => { l ++ cmsBigIntMonoid.create(r) })
dummy += 1
}
dummy
}

// Case B.2 (K=BigInt): We draw numbers randomly from a 2^maxBits address space
def timePlusOfRandom2048BitNumbersWithBigIntCms(reps: Int): Int = {
var dummy = 0
while (dummy < reps) {
(1 to operations).view.foldLeft(cmsBigIntMonoid.zero)((l, r) => {
val n = scala.math.BigInt(maxBits, random)
l ++ cmsBigIntMonoid.create(n)
})
dummy += 1
}
dummy
}

}
Loading