Conversation

@johnynek (Collaborator)

This needs tests to make sure the implicits are wired up correctly, but it seems like it's time to create an algebird-spark module to make it easy to use Algebird with Spark.
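
A minimal sketch of the kind of enrichment being proposed here, assuming an implicit conversion from RDD into an Algebird-aware wrapper; this is illustrative only, not the merged implementation:

    import scala.language.implicitConversions
    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    import com.twitter.algebird.Semigroup

    // Wrap an RDD so that Algebird's Semigroup drives the summing.
    class AlgebirdRDD[T](val rdd: RDD[T]) extends AnyVal {
      // Sum every element with the implicit Semigroup[T]; None if the RDD is empty.
      def sumOption(implicit sg: Semigroup[T], ct: ClassTag[T]): Option[T] = {
        // Sum each partition locally, then combine the partial sums on the driver.
        val partials = rdd.mapPartitions(it => Semigroup.sumOption(it).iterator).collect()
        Semigroup.sumOption(partials)
      }
    }

    object ToAlgebird {
      implicit def toAlgebird[T](rdd: RDD[T]): AlgebirdRDD[T] = new AlgebirdRDD(rdd)
    }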

@johnynek (Collaborator, Author)

@ianoc and @tsdeng care to look at this?

(Contributor)

Why aggregatorOnAll and aggregatorByKey rather than aggregateAll and aggregateByKey? The latter seem a closer fit to Spark's reduceByKey.

(Collaborator, Author)

There is already an aggregateByKey. I will add tests and see whether the implicit enrichment can differentiate based on the arguments. It may, or it may give the user a very confusing message. Let's see.
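
For context, Spark's PairRDDFunctions already has aggregateByKey taking a zero value plus two functions, while the method here takes an Algebird Aggregator, so the argument lists do differ; a rough sketch of the Aggregator-based shape (illustrative, not the merged code):

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    import com.twitter.algebird.Aggregator

    object AggregateByKeySketch {
      // Spark's built-in is aggregateByKey(zeroValue)(seqOp, combOp).
      // An Aggregator-based version only needs the Aggregator itself:
      def aggregateByKeySketch[K: ClassTag, V: ClassTag, B: ClassTag, C](
          rdd: RDD[(K, V)])(agg: Aggregator[V, B, C]): RDD[(K, C)] =
        rdd
          .mapValues(v => agg.prepare(v))                  // V => B
          .reduceByKey((l, r) => agg.semigroup.plus(l, r)) // combine B values per key
          .mapValues(b => agg.present(b))                  // B => C
    }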

@ianoc (Collaborator)

ianoc commented Jan 13, 2015

Any tests? Looks fine modulo Avi's comment about naming; it would be nice to be closer to Spark's.


  • Edit: Doh, should have read the code more closely before asking the first question. Never mind. :-)

  • Any particular reason to give a default of null to ordK and/or to use an implicit parameter instead of K: ClassTag: Ordering?

    implicit ordK: Ordering[K] = null

(Collaborator, Author)

Seems like Spark follows this style, which I assume means it can still work even if there is no Ordering. Scalding requires the ordering and uses the approach you suggest.
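
The two styles under discussion, as illustrative signatures only (bodies elided, not the merged code):

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    import com.twitter.algebird.Semigroup

    object OrderingStyles {
      // Spark's style: the Ordering is an implicit parameter defaulting to null,
      // so the method still resolves when no Ordering[K] is in scope.
      def sumByKeyNullableOrd[K, V](rdd: RDD[(K, V)])(
          implicit kt: ClassTag[K], vt: ClassTag[V], sg: Semigroup[V],
          ordK: Ordering[K] = null): RDD[(K, V)] = ???

      // Scalding's style: the Ordering is required via a context bound.
      def sumByKeyRequiredOrd[K: ClassTag: Ordering, V: ClassTag: Semigroup](
          rdd: RDD[(K, V)]): RDD[(K, V)] = ???
    }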

@adelbertc

Interested in this effort, especially with the algebra stuff going on. I've talked with a few people briefly about a more open-source effort to share common Spark things that are related to Algebird, e.g. having functions that take a Monoid instead of A => (A => A => A), etc.
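
A tiny illustration of that idea (a hypothetical helper, not an existing API): pass the Monoid once instead of threading a zero value and a combine function through Spark's API.

    import org.apache.spark.rdd.RDD
    import com.twitter.algebird.Monoid

    object MonoidFold {
      // Hypothetical helper: fold an RDD using a Monoid[A]. Note that Spark may
      // combine partition results in any order, so in practice this wants a
      // commutative Monoid for a deterministic answer.
      def foldWithMonoid[A](rdd: RDD[A])(implicit m: Monoid[A]): A =
        rdd.fold(m.zero)(m.plus(_, _))
    }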

@ianoc (Collaborator)

ianoc commented Mar 26, 2015

@johnynek This looks good to me; if we can fix the merge conflict, it's good to merge.

@johnynek (Collaborator, Author)

@ianoc we need tests too, right?

@virtualirfan

An example would be nice too.

@ianoc (Collaborator)

ianoc commented Mar 27, 2015

Tests would be great, but since these are some tiny implicits, I'd happily ship if manually verified and follow up with tests later.

@adelbertc

Latest version of Spark is now 1.3.0

@reconditesea (Contributor)

+1 for a small tutorial job showing how to make Spark work with Algebird.

@johnynek (Collaborator, Author)

johnynek commented Jul 1, 2015

@adelbertc can you review?

@ianoc this has tests now, and I think reasonably efficient implementations, especially for sum, sumOption, aggregate, aggregateOption.

I don't see a way to use sumOption in sumByKey or aggregateByKey without reimplementing or skipping map-side combining. I wish Spark had something like Scalding's sumByLocalKeys.
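
For reference, a rough sketch (not the merged implementation) of a sumByKey that keeps Spark's map-side combining by leaning on combineByKey; because combineByKey is driven entirely by binary merge functions, there is no natural hook for Semigroup.sumOption, which wants a whole iterator of values at once.

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    import com.twitter.algebird.Semigroup

    object SumByKeySketch {
      // Sketch only: combineByKey does the map-side combining for us, but it only
      // ever calls binary merges, so sumOption has no place to plug in without
      // reimplementing the combiner machinery.
      def sumByKey[K: ClassTag, V: ClassTag: Semigroup](rdd: RDD[(K, V)]): RDD[(K, V)] =
        rdd.combineByKey[V](
          (v: V) => v,                              // createCombiner
          (acc: V, v: V) => Semigroup.plus(acc, v), // mergeValue (map side)
          (l: V, r: V) => Semigroup.plus(l, r))     // mergeCombiners (reduce side)
    }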

@johnynek changed the title from "Initial sketch of using Algebird with spark" to "Algebird support for spark" on Jul 1, 2015.


Latest Spark version is 1.4.0 now, not sure how you want to handle versioning

(Collaborator)

It's provided, so the user brings their own. As long as the Spark APIs this uses stay ABI compatible, it should work fine for both versions.

@adelbertc

👍 LGTM; the only thing I'm concerned about is whether commutativity is needed in the sumOption case.

(Contributor)

ToAlgebird?

@johnynek (Collaborator, Author)

johnynek commented Jul 2, 2015

I think this addresses all review concerns. I also added Partitioner support for the cases where you want to control partitioning.

@ianoc (Collaborator)

ianoc commented Jul 2, 2015

lgtm, merge when green

@ianoc (Collaborator)

ianoc commented Jul 2, 2015

@johnynek in case you haven't seen, it's failing with:

    [info] Compiling 1 Scala source to /home/travis/build/twitter/algebird/algebird-bijection/target/scala-2.11/classes...
    [error] /home/travis/build/twitter/algebird/algebird-spark/src/main/scala/com/twitter/algebird/spark/AlgebirdRDD.scala:14: in class AlgebirdRDD, multiple overloaded alternatives of method sumByKey define default arguments.
    [error] class AlgebirdRDD[T](val rdd: RDD[T]) extends AnyVal {
    [error] ^
    [error] one error found
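
That error comes from a Scala language restriction rather than anything Spark-specific; a stripped-down illustration (hypothetical signatures, just to show the rule and one common workaround):

    object OverloadDefaults {
      // Scala rejects two overloads of the same method that both declare default
      // arguments, reporting: "multiple overloaded alternatives of method
      // sumByKey define default arguments."
      //   def sumByKey(numPartitions: Int = 1): Int = ???
      //   def sumByKey(label: String, numPartitions: Int = 1): Int = ???
      // Keeping defaults on only one alternative compiles fine:
      def sumByKey(numPartitions: Int = 1): Int = numPartitions
      def sumByKey(label: String, numPartitions: Int): Int = numPartitions
    }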

@johnynek (Collaborator, Author)

johnynek commented Jul 2, 2015

@ianoc yeah. I fixed it by adding a class that @non, @avibryant, and I (I think it was) discussed, and that @non put in algebra.

Still need to get Algebird to depend on algebra.

@ianoc (Collaborator)

ianoc commented Jul 2, 2015

Cool, lgtm anyway with this in place

@johnynek added a commit that referenced this pull request on Jul 2, 2015
@johnynek merged commit 917e6bf into develop on Jul 2, 2015
@johnynek deleted the algebird-spark branch on July 2, 2015 at 20:38
@johnynek (Collaborator, Author)

johnynek commented Jul 2, 2015

Coveralls is keeping this yellow. Need to get this sorted out.
