-
Notifications
You must be signed in to change notification settings - Fork 347
Description
This ticket is a follow-up on the discussion we had in the pull request 354, i.e. the CMS[K] pull request.
Algebird currently has the three data structures with overlapping functionality, and the question is whether we could/should consolidate them in some way -- primarily to reduce maintenance cost.
- SketchMap
- CMS with top-N and top-% variants)
- SpaceSaver
Overlapping functionality
- SketchMap[K, V] and CMS[K] are almost identical except that SketchMap also parameterizes the value.
- However, the current implementation of SketchMap is significantly slower than CMS (in this experiment,
SketchMap[BigInt]achieved0.3k writes/scompared to65k writes/sfor CMS[BigInt], i.e. it was slower by two orders of magnitude).
- However, the current implementation of SketchMap is significantly slower than CMS (in this experiment,
- SpaceSaver may be an alternative to a top-N CMS.
- Note that a top-N CMS is not commutative, which limits is applicability in practice. (A top-% CMS is commutative though.)
Past discussions
The snippets below are taken from the conversations in GH-345 and the pull request 354, where we discussed the work on turning the old Long-based CMS into a parameterized CMS[K].
@johnynek wrote (source):
I am concerned that we have now SketchMap and CMS that are almost identical except that SketchMap parameterizes the value. Given that making the key generic didn't hurt performance, I tend to think we must just have some small bottleneck in SketchMap that could be fixed. [...] It would be great to give this a going over with yourkit and see if we can identify a few hotspots [...].@avibryant wrote (source):
I don't think SpaceSaver replaces SketchMap, but I do think it might be a good alternative to TopNLogic. (It would be interesting to do some benchmarking of error rates between these two). But SpaceSaver doesn't even parameterize the value type currently, which is something else we should fix.