Add Batched[A] type for efficient lazy addition #530
Conversation
This is the free semigroup -- it represents combining one or more elements using an associative operation. The values are stored in a tree structure, which can be traversed. If a semigroup for the underlying values becomes available, the structure can be summed to a single value.

Batching represents a space/time trade-off (we use more space to use less time). In some cases these structures may become too big. To address this, there are constructors for Monoid[Batched[A]] and Semigroup[Batched[A]] instances that will periodically compact themselves (summing the tree to a single value) to keep the overall size below a specific parameter.

Batched is appropriate anytime an eager concatenation or combination will produce expensive intermediate values that will be thrown away, and a more efficient aggregate summation is possible. In those cases, building up a batch and then summing it should result in much more efficient operations.

This commit adds the type, with documentation. It also adds some basic law-checks and tests.
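As a rough sketch of the intended usage (hedged: Batched.apply and the Batched.semigroup(batchSize) constructor are assumptions based on the description above; only .sum(semigroup) appears verbatim in the diff below):

import com.twitter.algebird._

// Wrap each incoming value in a Batched leaf; combining leaves only builds a
// small tree of references, it does not touch the underlying values yet.
val pieces: Iterator[Batched[Map[String, Long]]] =
  (1 to 100000).iterator.map(i => Batched(Map(s"k${i % 10}" -> 1L)))

// A compacting Semigroup[Batched[A]]: per the description it periodically
// sums the tree down to a single value so it never grows far past the given
// size (constructor name assumed).
implicit val sg: Semigroup[Batched[Map[String, Long]]] =
  Batched.semigroup(1000)

// Combine lazily, then sum the remaining (small) tree exactly once.
val total: Option[Map[String, Long]] = Semigroup.sumOption(pieces).map(_.sum)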
This commit uses internal batching to avoid creating intermediate lists. We store a tree structure and then just render the list once at the end. There are probably other places that can benefit from the same approach, but this was one that was pointed out to me.
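The pattern in question looks roughly like this (a hedged sketch, not the actual call site that was changed; Batched.apply and Batched.semigroup are the same assumptions as above):

import com.twitter.algebird._

// Eager version: each ++ allocates an intermediate List that the next step
// immediately throws away.
def eager(chunks: Seq[List[Int]]): List[Int] =
  chunks.foldLeft(List.empty[Int])(_ ++ _)

// Batched version: fold the chunks into a tree of Batched leaves (cheap),
// then render the list once at the end with a single sum.
def batched(chunks: Seq[List[Int]]): List[Int] = {
  implicit val sg: Semigroup[Batched[List[Int]]] = Batched.semigroup(128)
  Semigroup.sumOption(chunks.iterator.map(Batched(_))).map(_.sum).getOrElse(Nil)
}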
 * `Batched[T]` values are guaranteed not to be empty -- that is, they
 * will contain at least one `T` value.
 */
sealed abstract class Batched[T] {
can we extend Serializable?
Sure.
There are a number of changes here:
- Added Equiv[Batched[A]] (though it's slow)
- Removed the non-compacting Monoid[Batched[A]]
- Added/improved comments
- Added .compact method
- Ensured that folds are correctly compacted
- Made Batched[A] serializable
 * One thing to note here is that two equivalent batches might
 * produce different lists (for instance, if one of the batches has
 * more zeros in it than another one).
 */
can we make this implicit?
👍

looks like a 2.10 compile failure: https://travis-ci.org/twitter/algebird/jobs/137652585#L1517
 */
implicit def equiv[A](implicit s: Semigroup[A]): Equiv[Batched[A]] =
  new Equiv[Batched[A]] {
    def equiv(x: Batched[A], y: Batched[A]): Boolean = x.sum(s) == y.sum(s)
actually this needs Equiv[A] too. We should not be using universal equals inside here.
Whoops! Heh...
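For reference, the requested change would look roughly like this (a sketch, not necessarily the exact code that was merged):

// In Batched's companion object: require an Equiv[A] alongside the
// Semigroup[A], and compare the summed values with it instead of ==.
implicit def equiv[A](implicit s: Semigroup[A], e: Equiv[A]): Equiv[Batched[A]] =
  new Equiv[Batched[A]] {
    def equiv(x: Batched[A], y: Batched[A]): Boolean =
      e.equiv(x.sum(s), y.sum(s))
  }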
👍 when green

Looks like a stalled build or something. 🚒

Thanks @non! this is great!