Dealing with probabilistic tests #478
A better name for falsePositiveRate might be falseFailureRate
what is this exact value corresponding to?
Exact is useful here so that we can generate appropriate test inputs.
For example, suppose we're testing CMS[Long]. We need to generate some inputs so that we can query exactList.frequency(x) and cms.frequency(x) and compare them.
We could simply query using random Long values. However, most of the random Long values we generate would probably not be in exactList, so these test cases would not be very useful. If we have exactList, we can choose random elements from exactList as our test cases.
I'm not sure if this is the best way to do this.
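A minimal sketch of that idea, with hypothetical data (exactList is just example data, and the CMS under test is assumed to be built from the same elements): draw query keys from what was actually inserted, so the comparison against the exact frequency is meaningful.

```scala
import org.scalacheck.Gen

// hypothetical data that would also be fed into the CMS under test
val exactList: Vector[Long] = Vector(1L, 2L, 2L, 3L, 3L, 3L)

// generate query keys from the inserted data rather than from arbitrary Longs,
// so most queries hit keys with nonzero true frequency
val keyGen: Gen[Long] = Gen.oneOf(exactList)

// for a generated key, the exact frequency to compare against the CMS estimate
def exactFrequency(key: Long): Long = exactList.count(_ == key)
```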
but why couldn't we do something like:
```scala
// I need an exact value for my implementation:
def needExact(e: Exact): Gen[Input]

// but I could still implement this:
def inputGenerator: Gen[Input] = exactGenerator.flatMap(needExact)
```

Am I missing something? Putting this in the trait as-is seems to be constraining more than is needed. Why can't we just get by with def inputGenerator: Gen[Input]?
let's link to:
https://en.wikipedia.org/wiki/Hoeffding%27s_inequality
and add the result:
When Prob(X = 1) >= p, then
Prob(sum(X) <= (p - eps)*n) <= exp(-2*eps*eps*n)
If we want Prob <= fprate, then it is sufficient that fprate = exp(-2*eps*eps*n), or:
eps = math.sqrt( - math.log(fprate)/(2.0 * n) )
So, I think there is a slight error in your version (n is in the numerator in yours).
diff = n * eps, so it's correct in the numerator.
My only concern here is that the formula is for the Bernoulli version of Hoeffding's inequality. Since we're taking separate probability values from each result, we aren't guaranteed that they'll be uniform (very possibly they won't be). But maybe it's "close enough."
The general case (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality#General_case), for variables bounded in [0, 1], is
P(sum(X) - sum(E[X]) >= n*t) <= exp(-2*n*t^2)
(here I used sum(X) = X_1 + ... + X_n instead of Xbar = (1/n)(X_1 + ... + X_n))
So we have
exp(-2*n*t^2) = fprate
=> t = sqrt(-log(fprate) / (2*n))
So I think Oscar's right.
I don't think this assumes the Bernoulli distribution.
Whoops, I read Joe's comment again, and he's right.
diff = n * t, so we need diff = sqrt(-n * log(fprate) / 2).
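As a quick sanity check of the agreed-upon formula, here is a small worked example (n and fprate are illustrative values, not anything fixed by this PR):

```scala
val n = 100        // number of generated test cases
val fprate = 0.01  // acceptable probability that a correct implementation still fails

// maximum shortfall from the expected success count that we tolerate
val diff = math.sqrt(-n * math.log(fprate) / 2.0) // ~= 15.2

// e.g. if each case should succeed with probability p = 0.9,
// require at least 0.9 * n - diff ~= 74.8 successes out of 100
```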
I agree. I overlooked the n * t on the outside.
I think the bound applies perfectly, with the assumption that our Random class is a real RNG. That is obviously not true, and in fact it is pseudo-random, but I don't see how that is escapable. We could beef up the RNG used to be stronger (but certainly slower), but that might require some changes to scalacheck, not sure.
Create a trait GeneralizedApproximate that contains the minimal behavior necessary for ApproximateProperty. Change the return type of ApproximateProperty#approxResult to GeneralizedApproximate[Result] instead of Approximate[Result]. Create implicit conversions from:
- Approximate[T] to GeneralizedApproximate[T]
- ApproximateBoolean to GeneralizedApproximate[Boolean]
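A minimal sketch of what that refactor might look like, under stated assumptions: the single contains member chosen as the "minimal behavior" and the conversion bodies are guesses, not the final API, and Approximate's min/max/probWithinBounds fields are assumed from algebird.

```scala
import com.twitter.algebird.{Approximate, ApproximateBoolean}

// the minimal question ApproximateProperty needs answered:
// "does this approximate result claim to cover the exact result, and with what probability?"
trait GeneralizedApproximate[T] {
  def contains(t: T): ApproximateBoolean
}

object GeneralizedApproximate {
  // Approximate[T] => GeneralizedApproximate[T], built from its bounds
  implicit def fromApproximate[N](a: Approximate[N])(implicit num: Numeric[N]): GeneralizedApproximate[N] =
    new GeneralizedApproximate[N] {
      def contains(n: N): ApproximateBoolean =
        ApproximateBoolean(num.lteq(a.min, n) && num.lteq(n, a.max), a.probWithinBounds)
    }

  // ApproximateBoolean => GeneralizedApproximate[Boolean]
  implicit def fromApproximateBoolean(ab: ApproximateBoolean): GeneralizedApproximate[Boolean] =
    new GeneralizedApproximate[Boolean] {
      def contains(b: Boolean): ApproximateBoolean =
        ApproximateBoolean(ab.isTrue == b, ab.withProb)
    }
}
```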
version.sbt
Why bump to a snapshot version here?
Whoops, I didn't mean to check that change in. I'll remove it
We probably should actually swap all of them to work like this. We've seen some funky errors and mixed classpaths if you have a published version in the version.sbt. Not sure if it's a new sbt thing or what, but that can be a different PR.
Yeah, I keep getting an "Unresolved dependencies: algebird-test" error when I don't use "-SNAPSHOT". Is this happening to other people too?
seems like what we really want is

```scala
trait ApproximateSet[S, -T] {
  def contains(set: S, t: T): ApproximateBoolean
}
```

and use that as a typeclass rather than implicit conversions.

```scala
object ApproximateSet {
  def contains[S, T](s: S, t: T)(implicit as: ApproximateSet[S, T]): ApproximateBoolean =
    as.contains(s, t)

  implicit def fromApproximate[N: Numeric]: ApproximateSet[Approximate[N], N] = new ApproximateSet ...
}
```
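For illustration, here is what a concrete instance might look like; the exact-Set instance below is hypothetical and only meant to show how the typeclass pattern would be used (it could live in the companion object sketched above).

```scala
import com.twitter.algebird.ApproximateBoolean

object ApproximateSetInstances {
  // an exact structure answers membership with certainty, so withProb = 1.0
  implicit def fromExactSet[T]: ApproximateSet[Set[T], T] =
    new ApproximateSet[Set[T], T] {
      def contains(set: Set[T], t: T): ApproximateBoolean =
        ApproximateBoolean(set.contains(t), 1.0)
    }
}

// usage via the companion's summoner:
// ApproximateSet.contains(Set(1, 2, 3), 2)  // ApproximateBoolean(true, 1.0)
```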
Actually, what you are trying to do here is add a common interface to ApproximateBoolean and Approximate[N] right?
Maybe just do that directly rather than through implicit conversion.
Yeah, I'm trying to make a common interface to ApproximateBoolean and Approximate[N].
I like the idea of having a trait ApproximateSet[T]:

```scala
trait ApproximateSet[T] {
  def contains(t: T): ApproximateBoolean
}
```

where ApproximateBoolean extends ApproximateSet[Boolean] and Approximate[N: Numeric] extends ApproximateSet[N]. I think I'll follow that approach.
@ianoc everything in this PR uses Scalacheck foralls and Properties classes. Do we want to rewrite it in terms of Scalatest foralls and property tests?
Looks like this doesn't merge cleanly to develop. @sid-kap, can you merge develop into your branch and resolve the conflicts?
Nope, let's just get this in and stable; once Oscar is happy I think we should try to get it in. We can worry about that other stuff later/maybe never.
Ok sounds good.
can we call list vec instead? Also, vec.count(_ == key) is going to be faster here since it does not materialize a second vector.
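A tiny illustration of the difference (vec and key here are stand-ins for the names in the test):

```scala
val vec = Vector(1L, 2L, 2L, 3L)
val key = 2L

vec.filter(_ == key).size // builds an intermediate Vector before counting
vec.count(_ == key)       // single pass, no intermediate collection
```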
A couple of comments. Also note that this does not merge cleanly, so you'll need to merge in develop.
Scalacheck just released the new version with the feature that I requested. So I'll try to bring in the new dependency and finish up this PR sometime next week.
Also, create ApproximateProperties class
By the way, I don't think we should use the Chernoff-Hoeffding bound. A few weeks of studying concentration inequalities in Randomized Algorithms has taught me not to use the Hoeffding bound for a binomial with very small or very large p: this bound doesn't depend on p, and therefore doesn't take into account the fact that the binomial concentrates much better for very small/very large p. Using a weak bound would result in overly lax tests. For example, suppose we are running 100 trials with 0.975 probability of success, and we want to fail at most 0.01 of the time. The true threshold for the binomial (

For fun: we could use a Bernstein-type inequality (since the binomial is sub-gamma), which would give us a cutoff of

Or we could be lame and use the inverse binomial function...
Do we want to try to merge this in? I've updated Scalacheck, so now it runs each test the correct number of times. The only things I think need to be addressed are:

Concerning 2: I made a class called ApproximateProperties that extends Properties but has a default of running each test once instead of 100 times. All Properties classes that have approximate properties should extend from this, and should contain only approximate properties. (We need to run approximate properties only once because when the ApproximateProperty check code is called once, it generates many (say, 100) inputs and succeeds if enough of those 100 succeed.) Is there a clean way to enforce, using the type system, that only ApproximateProperty instances can be put in the ApproximateProperties class? I would not want to emulate the Scalacheck Properties class, because that class is implemented using messy mutable structures (every time you call
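One way to get the "run each test once" default, assuming ScalaCheck's Properties#overrideParameters hook; the actual ApproximateProperties in this PR may be implemented differently.

```scala
import org.scalacheck.{Properties, Test}

// Each approximate property already generates many inputs internally and applies
// the Hoeffding-style threshold itself, so ScalaCheck only needs to invoke it once.
class ApproximateProperties(name: String) extends Properties(name) {
  override def overrideParameters(p: Test.Parameters): Test.Parameters =
    p.withMinSuccessfulTests(1)
}
```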
@sid-kap Thanks! I do very much want to merge this, I just haven't looked back at it since the scalacheck upgrade. I'll review. Thanks for not letting it drop! Appreciated.
I've had some success comparing algorithms involving random behaviors by using the Kolmogorov-Smirnov test, where you can test whether two distributions are different (or, in the inverse case, whether they are not different). If you can define (either in closed form or via sampling) a reference of "correct" behavior, you can then run your test some number of times and run the KS test between test and reference. Worth noting that with random behaviors there is generally no way to guarantee that your random test will never violate the property being tested, though it is possible to put things on a theoretical footing so that it fails less frequently. apache/spark#2455
@ianoc what do you think? I'm happy to merge this now. I think it's a big improvement, and we let it sit too long.
I'm in favor of pushing ahead and getting it in; if there are further improvements to be made, we can do them later.
Dealing with probabilistic tests
This is a rough sketch of a way we can deal with flaky/probabilistic tests.
I defined a trait ApproximateProperty that allows us to describe approximate properties, and, as an example, implemented a working ApproximateProperty test for CMS.
In detail:
- an `Exact` type
- an `Approx` type which approximates the behavior of `Exact`
- the behavior that `Exact` and `Approx` implement: `Exact` should be able to take something of type `Input` and produce something of type `Result`; `Approx` should be able to take something of type `Input` and produce something of type `Approximate[Result]`
- an `ApproximateProperty.toProp` function that converts an `ApproximateProperty` to an `org.scalacheck.Prop` that runs the ApproximateProperty test many times and fails if the number of failed tests exceeds a certain threshold (using Hoeffding's inequality)

A rough sketch of the trait shape is included below.

Comments/suggestions?
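For reference, a rough sketch of what the trait described above might look like; the member names are illustrative, not necessarily the exact ones in this PR.

```scala
import com.twitter.algebird.Approximate
import org.scalacheck.Gen

trait ApproximateProperty {
  type Exact
  type Approx
  type Input
  type Result

  // generate the exact structure, then inputs that are meaningful for it
  def exactGenerator: Gen[Exact]
  def inputGenerator(e: Exact): Gen[Input]

  // build the approximate structure from the exact one
  def makeApproximate(e: Exact): Approx

  // the two results to compare for each generated input
  def exactResult(e: Exact, i: Input): Result
  def approximateResult(a: Approx, i: Input): Approximate[Result]
}
```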