
Conversation

@non (Contributor) commented Jun 14, 2016

This is the free semigroup -- it represents combining one or more
elements using an associative operation. The values are stored in a
tree structure, which can be traversed. If a semigroup for the
underlying values becomes available, the structure can be summed to a
single value.
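
Roughly, the idea can be pictured with the following standalone sketch (not the code in this PR, which is stack-safe and more careful about allocation): values sit at the leaves of a non-empty tree, combining just builds structure, and summing traverses the tree once an associative operation is in hand.

// Standalone sketch of the free-semigroup idea; the names here are
// illustrative, not the PR's API.
sealed trait FreeSemigroup[A] {
  // O(1): record the combination instead of evaluating it.
  def combine(that: FreeSemigroup[A]): FreeSemigroup[A] = Node(this, that)

  // Traverse the tree once an associative operation is available.
  // (Recursive for clarity; the real Batched sums iteratively.)
  def sum(plus: (A, A) => A): A = this match {
    case Leaf(a)    => a
    case Node(l, r) => plus(l.sum(plus), r.sum(plus))
  }
}
final case class Leaf[A](a: A) extends FreeSemigroup[A]
final case class Node[A](l: FreeSemigroup[A], r: FreeSemigroup[A]) extends FreeSemigroup[A]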

Batching represents a space/time trade-off (we use more space to use
less time). In some cases these structures may become too big. To
address this, there are constructors for Monoid[Batched[A]] and
Semigroup[Batched[A]] instances that will periodically compact
themselves (summing the tree to a single value) to keep the overall
size below a specific parameter.
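
For instance, obtaining a compacting instance might look roughly like this (a sketch; the constructor name and parameter below are assumptions about the API being added, not taken from the diff shown here):

import com.twitter.algebird._

// Assumed constructor: a Semigroup[Batched[Int]] that, once the tree
// grows past 100 values, sums it back down to a single value so the
// structure's size stays bounded.
val compacting: Semigroup[Batched[Int]] = Batched.semigroup[Int](100)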

Batching is appropriate anytime an eager concatenation or combination
will produce expensive intermediate values that will be thrown away, and a more
efficient aggregate summation is possible. In those cases, building up
a batch and then summing it should result in much more efficient
operations.
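
As a sketch of the intended usage (the Batched(_) constructor and Batched.semigroup below are assumptions; only .sum(semigroup) appears in the diff in this PR), batching list concatenation might look like:

import com.twitter.algebird._

// List concatenation as the underlying semigroup.
implicit val listSg: Semigroup[List[Int]] =
  new Semigroup[List[Int]] { def plus(a: List[Int], b: List[Int]) = a ++ b }

// Assumed: a Semigroup[Batched[List[Int]]] that compacts past 100 items.
val batchSg: Semigroup[Batched[List[Int]]] = Batched.semigroup[List[Int]](100)

val pieces: Seq[List[Int]] = Seq(List(1, 2), List(3), List(4, 5))

// Build up a batch with cheap tree appends, then sum it once at the end.
val batch: Batched[List[Int]] = pieces.map(Batched(_)).reduce(batchSg.plus(_, _))
val result: List[Int] = batch.sum(listSg) // List(1, 2, 3, 4, 5)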

This commit adds the type, with documentation. It also adds some basic
law-checks and tests.

This commit uses internal batching to avoid creating intermediate
lists. We store a tree structure and then just render the list
once at the end.
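
The shape of the change is roughly the following library-agnostic sketch (not the actual diff): instead of growing the result list at every step, keep the pieces and render the final list once.

// Before: each ++ copies the accumulator, so every intermediate list
// is allocated only to be discarded on the next step.
def naive(chunks: Seq[List[Int]]): List[Int] =
  chunks.foldLeft(List.empty[Int])(_ ++ _)

// After: accumulate cheaply and produce the list once at the end
// (the PR does this with a Batched tree rather than a builder).
def batchedStyle(chunks: Seq[List[Int]]): List[Int] =
  chunks.foldLeft(List.newBuilder[Int])(_ ++= _).result()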

There are probably other places that can benefit from the same
approach, but this was one that was pointed out to me.
* `Batched[T]` values are guaranteed not to be empty -- that is, they
* will contain at least one `T` value.
*/
sealed abstract class Batched[T] {
Collaborator

can we extend Serializable?

Contributor Author

Sure.

There are a number of changes here:

 - Added Equiv[Batched[A]] (though it's slow)
 - Removed the non-compacting Monoid[Batched[A]]
 - Added/improved comments
 - Added .compact method
 - Ensured that folds are correctly compacted
 - Made Batched[A] serializable
* One thing to note here is that two equivalent batches might
* produce different lists (for instance, if one of the batches has
* more zeros in it than another one).
*/
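
Concretely (a sketch; Batched(...) and combine are assumed names for the constructor and the tree-building operation), two batches can be equivalent under a sum-based Equiv while traversing to different lists:

// With Int addition as the semigroup, both batches sum to 3, so a
// sum-based Equiv treats them as equivalent, even though the second
// carries extra zeros and would render to a longer list.
val a: Batched[Int] = Batched(3)
val b: Batched[Int] = Batched(0).combine(Batched(3)).combine(Batched(0))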
Collaborator

can we make this implicit?

@johnynek (Collaborator)

👍

@johnynek (Collaborator)

looks like a 2.10 compile failure: https://travis-ci.org/twitter/algebird/jobs/137652585#L1517

*/
implicit def equiv[A](implicit s: Semigroup[A]): Equiv[Batched[A]] =
  new Equiv[Batched[A]] {
    def equiv(x: Batched[A], y: Batched[A]): Boolean = x.sum(s) == y.sum(s)
Collaborator

actually this needs Equiv[A] too. We should not be using universal equals inside here.

Contributor Author

Whoops! Heh...
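
For reference, threading an Equiv[A] through instead of relying on universal equality might look roughly like this (a sketch; the code that actually ended up merged may differ):

implicit def equiv[A](implicit s: Semigroup[A], e: Equiv[A]): Equiv[Batched[A]] =
  new Equiv[Batched[A]] {
    def equiv(x: Batched[A], y: Batched[A]): Boolean =
      e.equiv(x.sum(s), y.sum(s))
  }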

@johnynek (Collaborator)

👍 when green

@non (Contributor Author) commented Jun 15, 2016

Looks like a stalled build or something. 🚒

@johnynek merged commit 6ee6f92 into twitter:develop Jun 15, 2016
@johnynek (Collaborator)

Thanks @non! This is great! Batched along with sumOption pretty much allows fairly good optimization of almost any semigroup (assuming the allocation cost of Batched << the cost of semigroup.plus).
