Skip to content

Conversation

@sritchie
Copy link
Collaborator

@sritchie sritchie commented Dec 3, 2016

Various plugin and SBT upgrades. I managed to get rid of all deprecation warnings in the build.sbt, which is a nice start to this new era.

The three issues we found were

  • a legit bug with transient lazy val serialization on scala 2.12.0 - this is fixed by 2.12.1.
  • a bug with Interval (Either gained a contains method, screwing up our implicit conversion trick)
  • a bug with a generator in one of the tests, fixed by using Gen.nonEmptyList

We also did some work to remove the Ordering instance from the Interval trait, since we're going binary-incompatible on this release.

The only library excluded is algebird-spark, since spark isn't on 2.12 yet. Turns out they're waiting on our chill release!

@sritchie sritchie added this to the 0.13.0 milestone Dec 3, 2016
@sritchie sritchie mentioned this pull request Dec 3, 2016
@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

Whelp, looks like just killing those overrides breaks QTree. Leaving this one to the pros!

@johnynek
Copy link
Collaborator

johnynek commented Dec 3, 2016

@ianoc we are willing to break binary compatibility here so maybe we can clean up.

As I recall, you made this change to make the hashCode lazy, is that right?

Did you look up some alternate syntax that would work here? We could always just manually implement a hashCode for these types (also not using lazy is probably worth it, and just checking if the hashCode is 0 or something and recomputing if it is).

@ianoc
Copy link
Collaborator

ianoc commented Dec 3, 2016

Yeah i made the change, it has a pretty big performance impact. I don't really care how it gets done, I'd probably keep it using lazy unless it makes a visible change to a benchmark, just that its obvious what it does to anyone driving past and doesn't seem necessary for the fix.

Upstream just up and deleted these functions iirc, since the code generation in case classes doesn't call them anymore, i believe they inject the hash code function now directly into the generated class.

the method in 2.11 is just
scala.util.hashing.MurmurHash3.productHash(this)

which still exists in 2.12, so the minimal change is to just use
https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/hashing/MurmurHash3.scala#L211
directly?

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

This excludes algebird-spark, since spark isn't ready for 2.12 primetime yet. Also fixes deprecation warnings from the SBT upgrade (there's still one left that I'm not sure how to fix)

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

Legit failures now:

[info] IntervalLaws:
[info] - [x, x + 1) contains x *** FAILED ***
[info]   GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
[info]    (CheckProperties.scala:12)
[info]     Falsified after 0 successful property evaluations.
[info]     Location: (CheckProperties.scala:12)
[info]     Occurred when passed generated values (
[info]       arg0 = 0 // 32 shrinks
[info]     )
[info] - (x, x + 1] contains x + 1 *** FAILED ***
[info]   GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
[info]    (CheckProperties.scala:12)
[info]     Falsified after 0 successful property evaluations.
[info]     Location: (CheckProperties.scala:12)
[info]     Occurred when passed generated values (
[info]       arg0 = 0 // 32 shrinks
[info]     )

And this serialization error:

[info] - should not serialize BFInstance *** FAILED ***
[info]   "...ebird.BF"Ԉ��Y�X�xp[0sr,com.googlecode.javaewah.EWAHCompressedBitma��f�5I8xpw�0�������xpsr�com.twitter.algebird.BFHash��PP\l����I	numHashesI�sizeI�widthxp��0]" did not equal "...ebird.BF"Ԉ��Y�X�xp[�0sr,com.googlecode.javaewah.EWAHCompressedBitma��f�5I8xpw�0�������xsr�com.twitter.algebird.BFInstance��	������bitmap$0I�numBitsI	numHashesI�widthL�bitst#Lscala/collection/immutable/BitSet;L�hashesq~�L�sizet"Lcom/twitter/algebird/Approximate;xq~����0sr)scala.collection.immutable.BitSet$BitSet1�]��FD����J�elemsxr!scala.collection.immutable.BitSet�\�2�����xp�����sr�com.twitter.algebird.BFHash��PP\l����I	numHashesI�sizeI�widthxp��0sr com.twitter.algebird.Approximate����8d:v��D�probWithinBoundsestimatet�Ljava/lang/Object;L�maxq~�L�minq~�L�numerict�Lscala/math/Numeric;xpsr�java.lang.Long;��̏#���J�valuexr�java.lang.Number���������xsq~�	sq~��sr"scala.math.Numeric$LongIsIntegral$��wԗ����xpq~�]" (BloomFilterTest.scala:266)

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

Here's the bug for interval, it's legit:

scala> Interval.leftClosedRightOpen(0, 1).contains(0)
<console>:15: warning: a type was inferred to be `Any`; this may indicate a programming error.
       Interval.leftClosedRightOpen(0, 1).contains(0)
res8: Boolean = false

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

This bug too for 2.12

[info] ! CountMinSketch.CMS works for small lists: Exception raised on property evaluation.
[info] > Exception: java.lang.IllegalArgumentException: oneOf called on empty collection
[info] org.scalacheck.Gen$.oneOf(Gen.scala:471)
[info] com.twitter.algebird.CmsFrequencyProperty.inputGenerator(CountMinSketchTest.scala:320)
[info] com.twitter.algebird.CmsFrequencyProperty.inputGenerator(CountMinSketchTest.scala:312)
[info] com.twitter.algebird.ApproximateProperty$.$anonfun$successesAndProbabilities$1(ApproximateProperty.scala:39)
[info] com.twitter.algebird.ApproximateProperty$.successesAndProbabilities(ApproximateProperty.scala:37)
[info] com.twitter.algebird.ApproximateProperty$.$anonfun$toProp$1(ApproximateProperty.scala:58)
[info] com.twitter.algebird.ApproximateProperty$$$Lambda$5713/1306151911.apply(null:-1)
[info] org.scalacheck.Prop$.$anonfun$apply$1(Prop.scala:292)
[info] com.twitter.algebird.ApproximateProperty$$$Lambda$5733/1466457919.apply(null:-1)
[info] org.scalacheck.PropFromFun.apply(Prop.scala:22)
[info] org.scalacheck.Prop$.$anonfun$delay$1(Prop.scala:461)
[info] org.scalacheck.Prop$$$Lambda$2719/509484219.apply(null:-1)
[info] org.scalacheck.Prop$.$anonfun$apply$1(Prop.scala:292)
[info] org.scalacheck.Prop$$$Lambda$2720/724561827.apply(null:-1)
[info] org.scalacheck.PropFromFun.apply(Prop.scala:22)
[info] org.scalacheck.Test$.workerFun$1(Test.scala:294)
[info] org.scalacheck.Test$.$anonfun$check$1(Test.scala:323)
[info] org.scalacheck.Test$.$anonfun$check$1$adapted(Test.scala:323)
[info] org.scalacheck.Test$$$Lambda$3523/1495267299.apply(null:-1)
[info] org.scalacheck.Platform$.runWorkers(Platform.scala:40)
[info] org.scalacheck.Test$.check(Test.scala:323)
[info] org.scalacheck.ScalaCheckRunner$$anon$2.$anonfun$execute$5(ScalaCheckFramework.scala:102)
[info] org.scalacheck.ScalaCheckRunner$$anon$2.$anonfun$execute$5$adapted(ScalaCheckFramework.scala:100)
[info] org.scalacheck.ScalaCheckRunner$$anon$2$$Lambda$5703/42367799.apply(null:-1)
[info] scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:742)
[info] scala.collection.TraversableLike$WithFilter$$Lambda$245/1914758367.apply(null:-1)
[info] scala.collection.immutable.List.foreach(List.scala:378)
[info] scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:44)

@johnynek
Copy link
Collaborator

johnynek commented Dec 3, 2016

The Any being inferred looks like an issue to me.

The second issue is probably a change due to how scalacheck changed, I would think.

@johnynek
Copy link
Collaborator

johnynek commented Dec 3, 2016

The serialization test seems to say (didn't look in detail yet) that it should not serialize but does? Maybe that is just a weird test or testing some detail that changes with java 8.

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

yeah, looks like this change for scalacheck typelevel/scalacheck@a2100f6

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 3, 2016

calling it quits for now. The build works, so we can iterate on the three issues:

  • "should not serialize" bug
  • scalacheck getting an empty list for oneOf, which was invalid anyway
  • Any inference in interval

Otherwise all the build BS is cleaned up!

@isnotinvain
Copy link
Contributor

Ah yeah we ran into typelevel/scalacheck@a2100f6 too internally - though it's overall a good thing. It's surfacing mistakes that were quietly being ignored before.

@isnotinvain
Copy link
Contributor

Can you explain some of the Interval changes? Why was Either replaced w/ MaybeEmpty? Why does the implicit Ordering move from being a class member of Interval to a method param on each method? Having the Ordering bound at construction time instead of separately (possibly differently) at each call sight seems like a big change.

@johnynek
Copy link
Collaborator

johnynek commented Dec 7, 2016

Either now has a .contains so the old code to trigger the implicit no longer runs. So, rather than use a type we can't control, we added MaybeEmpty, so we know the implicit will get triggered.

The implicit Ordering at callsite is something we should do, IMO, but we could factor that out to a second PR.

@isnotinvain
Copy link
Contributor

OK -- I wasn't saying it should be one way or another. I don't really understand this enough to have an opinion so that's why I was asking for some clarification for my own understanding. The Either part makes sense.

Can you clarify why an object should have its ordering provided to each of its methods instead of once to its constructor? That means an instance will behave differently depending on where it's called from which surprised me.

@johnynek
Copy link
Collaborator

johnynek commented Dec 7, 2016

@isnotinvain by capturing the typeclass, we have two issues:

  1. serialization is painful, since you need to deal with this captured Ordering.
  2. if you have two different Interval[T], they can have the same type T but different inner Ordering[T] instances, which makes code inconsistent. This explicitly pushes the Ordering[T] up to the users for them to keep unique.

Generally, taking typeclasses where they are used is usually considered the better design these days (Haskell always does this, but it also requires typeclass uniqueness).

@isnotinvain
Copy link
Contributor

OK. That goes counter to some "encapsulation" type intuitions I have, like if I construct an Interval, that's probably the point in time where I know what the right ordering is, and then I hand it over to some other code, I'd sort of expect it to "keep" its ordering. The case you mentioned "if you have two different Interval[T], they can have the same type T but different inner Ordering[T] instances" is more in line with what I'd expect.

But I guess as you said, if you're going the route of Interval is just a data container, and the type class contains all the behavior, then whenever you want to "do" something with an interval you have to bring your own ordering for that. So I think I get it, it's sort of like how a List() doesn't contain an ordering, but List().sort() will ask you for one.

Thanks for the explanation.

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 7, 2016

I like the way you put it, @isnotinvain. I think this is the way to go, and cleans up the code - as I keep pushing on documentation I'll highlight this distinction so that it's clear that we're taking a stance, and folks can respond / we can change if this introduces problems anywhere. Does that work for you?

@isnotinvain
Copy link
Contributor

Yeah SGTM!

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 10, 2016

Looks like the BFInstance is a straight-up scala 2.12 bug:

scala/scala#5570

scala 2.12 wasn't respecting the @transient annotation on lazy val. Fixed in 2.12.1.

val bytesAfterSizeCalled = new String(serialize(bf))
assert(bytesBeforeSizeCalled == bytesAfterSizeCalled)
val bytesAfterSizeCalled = Bytes(serialize(bf))
assert(bytesBeforeSizeCalled.size == bytesAfterSizeCalled.size)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this started as a legit bug with @transient lazy val. I fixed that by upgrading to scala 2.12. I can see now that this does NOT serialize the dense instance... but a single byte is different in the serialized version, so equality won't work for the test. Nothing's corrupt as far as I can tell.

I changed the test to just check for size. Is this legit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the confusing thing is that contains should be read-only. How does calling contains change the binary representation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It instantiates the dense representation in the lazy Val. I wonder if some but gets flipped? If so seems like a regression in 2.12, but one that doesn't affect the behavior. Maybe we should make a minimal test case to see if this is true for all classes with a transient lazyval.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is worrisome in that the transient lazy val trick is VERY commonly used to sidestep scalding serialization issues. We should probably chase it a bit or we will see massive pains on trying to merge this at scale.

class CmsSmallFrequencyProperty[K: CMSHasher: Gen] extends CmsFrequencyProperty[K] {
def exactGenerator: Gen[Vector[K]] = Gen.containerOf[Vector, K](implicitly[Gen[K]])
def exactGenerator: Gen[Vector[K]] =
Gen.nonEmptyListOf[K](implicitly[Gen[K]]).map(_.toVector)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easy fix here for the empty generators.

@codecov-io
Copy link

codecov-io commented Dec 10, 2016

Current coverage is 80.33% (diff: 54.54%)

Merging #600 into develop will increase coverage by 0.04%

@@            develop       #600   diff @@
==========================================
  Files           113        110     -3   
  Lines          4932       4928     -4   
  Methods        4491       4493     +2   
  Messages          0          0          
  Branches        440        435     -5   
==========================================
- Hits           3960       3959     -1   
+ Misses          972        969     -3   
  Partials          0          0          

Powered by Codecov. Last update 06f83f6...f805f3b

@johnynek
Copy link
Collaborator

sorry to be a pain @sritchie, and it is totally my fault too, but can we actually break the interval changes out into a separate PR. I don't want to mess up history by atomically merging those. I think we should just do the minimal in this PR to get 2.12 working (since we have some risk of introducing bugs here and we want to have a clear version to see that happen).

@sritchie
Copy link
Collaborator Author

@johnynek sure, I can do that tomorrow before we pair.

@sritchie
Copy link
Collaborator Author

sritchie commented Dec 11, 2016

okay @johnynek, backed out those changes for Interval.

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can merge and see about dealing with the bloomfilter issue as things come up.

@isnotinvain note this transient lazy val may bite scalding pretty hard at Twitter. I imagine you should look into it.

},

scalacOptions ++= {
if (scalaVersion.value startsWith "2.12")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure we should be turning on optimization, actually. I have heard that this is best for end projects, not libraries, but I guess we can leave it as is for now.

@sritchie sritchie merged commit d14c81a into develop Dec 11, 2016
@sritchie sritchie deleted the sritchie/scala212 branch December 11, 2016 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants