-
Notifications
You must be signed in to change notification settings - Fork 347
Algebird support for spark #397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aggregatorOnAll and aggregatorByKey rather than aggregateAll and aggregateByKey? The latter seem a closer fit to spark's reduceByKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already aggregateByKey. I will add tests and see if the implicit enrichment can differentiate based on the arguments. It may, or it may give the user a very confusing message. Let's see.
|
any tests? looks fine modulo Avi's comment about naming would be nice to be closer to sparks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Edit: Doh, should have read the code more closely before asking the first question. Never mind. :-)
-
Any particular reason to give a default of
nulltoordKand/or to use an implicit parameter instead ofK: ClassTag: Ordering?implicit ordK: Ordering[K] = null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like spark follows this style, which I assume that it means it can try to work even if there is no Ordering. Scalding requires the ordering, and uses the approach you suggest.
|
Interested in this effort, especially with the algebra stuff going on. I've talked with a few people briefly about a more open-source effort to share common Spark things that are related to Algebird, e.g. having functions that takes a |
|
@johnynek This looks good to me, if we can fix the merge conflict looks good to merge |
|
@ianoc we need tests too, right? |
|
An example would be nice too. |
|
Tests would be great but as some tiny implicits id happily ship if manually verified and follow up with those later. |
|
Latest version of Spark is now 1.3.0 |
|
+1 for a small tutorial job of how to make spark work with algebird. |
|
@adelbertc can you review? @ianoc this has tests now, and I think reasonably efficient implementations, especially for sum, sumOption, aggregate, aggregateOption. I don't see a way to use sumOption in sumByKey or aggregateByKey with reimplementing or skipping map-side combining. I wish spark had something like scalding's sumByLocalKeys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest Spark version is 1.4.0 now, not sure how you want to handle versioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its provided so the user brings their own. Once the API's ABI compatible in spark this uses it should work fine for both
|
👍 LGTM, only thing I'm concerned about is whether or not commutativity is needed in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToAlgebird?
|
I think this addresses all review concerns. I also added Partitioner support for the cases where you want to control partitioning. |
|
lgtm, merge when green |
|
@johnynek incase you haven't seen, failing with |
|
@ianoc yeah. I fixed it by adding a class that @non , @avibryant and I (I think it was) discussed and @non put in algebra. still need to get algebird to depend on algebra. |
|
Cool, lgtm anyway with this in place |
|
coveralls is keeping this yellow. Need to get this sorted out. |
This needs tests to make sure the implicits are wired up correctly, but it seems like it is time to make Algebird-spark to make it easy to use Algebird with spark.