Skip to content

Conversation

@sid-kap
Copy link
Contributor

@sid-kap sid-kap commented Aug 4, 2015

This is a rough sketch of a way we can deal with flaky/probabilistic tests.

I defined a trait ApproximateProperty that allows us to describe approximate properties, and, as an example, implemented a working ApproximateProperty test for CMS.

In detail:

  • Defined the ApproximateProperty trait to describe a relationship between
    • an Exact type
    • an Approx type which approximates the behavior of Exact
    • a behavior that both Exact and Approx implement.
      • Exact should be able to take something of type Input and produce something of type Result
      • Approx should be able to take something of type Input and produce something of type Approximate[Result]
    • some generators/glue functions (exactGenerator, makeApproximate, inputGenerator) that determine how to create an instance of Exact, how to convert an instance of Exact to an instance of Approximate, and how to create an Input
  • Defined the ApproximateProperty.toProp function that converts an ApproximateProperty to an org.scalacheck.Prop that runs the ApproximateProperty test many times and fails if the number of failed tests exceeds a certain threshold. (using Hoeffding's inequality)

Comments/suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name for falsePositiveRate might be falseFailureRate

@DanielleSucher DanielleSucher mentioned this pull request Aug 5, 2015
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this exact value corresponding to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact is useful here so that we can generate appropriate test inputs.

For example, suppose we're testing CMS[Long]. We need to generate some inputs so that we can query exactList.frequency(x) and cms.frequency(x) and compare them.
We could simply query using random Long values. However, most of the random Long values we generate would probably not be in exactList, so these test cases would not be very useful. If we have exactList, we can choose random elements from exactList as our test cases.

I'm not sure if this is the best way to do this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why couldn't we do something like:

// I need an exact value for my implementation:
def needExact(e: Exact): Gen[Input]
// but I could still implement this:
def inputGenerator: Gen[Input] = exactGenerator.flatMap(needExact) 

Am I missing something? Putting this in the trait as is seems to be constraining more than is needed. Why can't we just get by with def inputGenerator: Gen[Input]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's link to:
https://en.wikipedia.org/wiki/Hoeffding%27s_inequality

and add the result;

When Prob(X = 1) >= p, then

Prob(sum(X) <= (p - eps)*n) <= exp(-2*eps*eps*n)

If we want that Prob <= fprate, then it is sufficient that, fprate = exp(-2*eps*eps*n) or:
eps = math.sqrt( - math.log(fprate)/(2.0 * n) )

So, I think there is a slight error in your version (n is in the numerator in yours).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff = n * fprate, so it's correct in the numerator.

My only concern here is that the formula is for the Bernoulli version of Hoeffding's. Since we're taking separate probability values from each result, we aren't guaranteed that they'll be uniform (very possibly they won't be). But maybe it's "close enough."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general case (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality#General_case) is

P(sum(X) - sum(E[X]) >= nt) <= exp(-2*n*t^2)

(here I used sum(X) = X_1 + ... + X_n instead of Xbar = 1/n (X_1 + ... + X_N))
So we have

   exp(-2*n*t^2) = fprate
=> t = sqrt(-log(fprate) / (2*n) )

So I think Oscar's right.
I don't think this assumes the Bernoulli distribution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I read Joe's comment again, and he's right.
diff = n * t, so we need diff = sqrt(-n * log(fprate) / 2).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I overlooked the n * t on the outside.

I think the bound applies perfectly, with the assumption that our Random class is a real RNG. That is obviously not true, and in fact it is pseudo-random, but I don't see how that is escapable. We could beef up the RNG used to be stronger (but certainly slower), but that might require some changes to scalacheck, not sure.

sid-kap added 7 commits August 5, 2015 17:50
Create a trait GeneralizedApproximate that contains the minimal
behavior necessary for ApproximateProperty.

Change return type of ApproximateProperty#approxResult to
GeneralizedApproximate[Result] instead of Approximate[Result].

Create implicit conversions from
 - Approximate[T] to GeneralizedApproximate[T]
 - ApproximateBoolean to GeneralizedApproximate[Boolean]
@sid-kap sid-kap changed the title Dealing with probabilistic tests (Don't merge) Dealing with probabilistic tests Aug 11, 2015
version.sbt Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bump to a snapshot version here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I didn't mean to check that change in. I'll remove it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should actually swap all of them to work like this. We've seen some funky errors and mixed classpath's if you have a published version in the version.sbt. Not sure if its a new sbt thing or what. But can be a different PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I keep getting a "Unresolved dependencies: algebird-test" error when I don't use "-SNAPSHOT". Is this happening to other people too?

@sid-kap sid-kap force-pushed the test_failure_bounds branch from f07db15 to 46390c9 Compare August 12, 2015 17:14
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like what we really want is

trait ApproximateSet[S, -T] {
  def contains(set: S, t: T): ApproximateBoolean 
}

and use that as a typeclass rather than implicit conversions.

object ApproximateSet {
  def contains[S, T](s: S, t: T)(implicit as: ApproximateSet[S, T]): ApproximateBoolean = as.contains(s, t)
  implicit def fromApproximate[N: Numeric]: ApproximateSet[Approximate[N], N] = new ApproximateSet ...
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what you are trying to do here is add a common interface to ApproximateBoolean and Approximate[N] right?

Maybe just do that directly rather than through implicit conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm trying to make a common interface to ApproximateBoolean and Approximate[N].

I like the idea of having a trait ApproximateSet[T]:

trait ApproximateSet[T] {
  def contains(t: T): ApproximateBoolean
}

ApproximateBoolean extends ApproximateSet[Boolean]
Approximate[N: Numeric] extends ApproximateSet[N]

I think I'll follow that approach.

@sid-kap
Copy link
Contributor Author

sid-kap commented Aug 19, 2015

@ianoc everything in this PR uses Scalacheck foralls and Properties classes. Do we want to rewrite it in terms of Scalatest foralls and property tests?

@ianoc
Copy link
Collaborator

ianoc commented Aug 19, 2015

Looks like this doesn't merge cleanly to develop @sid-kap can you merge in develop to your branch and resolve the conflicts?

@ianoc
Copy link
Collaborator

ianoc commented Aug 19, 2015

Nope, lets just get this in and stable, once Oscar is happy I think we should try get it in. Can worry about that other stuff later/maybe never.

@sid-kap
Copy link
Contributor Author

sid-kap commented Aug 19, 2015

Ok sounds good.
The maintainer of scalacheck finally merged in the PR that I sent. Now we just have to wait for him to do a new release. Once that happens, I can try to merge this in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call list vec instead? Also, vec.count(_ == key) is going to be faster here since it does not materialize a second vector.

@johnynek
Copy link
Collaborator

A couple of comments. Also note that this does not merge cleanly, so you'll need to merge in develop.

@sid-kap
Copy link
Contributor Author

sid-kap commented Sep 11, 2015

Scalacheck just released the new version with the feature that I requested. So I'll try to bring in the new dependency and finish up this PR sometime next week.

@sid-kap
Copy link
Contributor Author

sid-kap commented Oct 4, 2015

By the way, I don't think we should use the Chernoff-Hoeffing bound. A few weeks of studying concentration inequalities in Randomized Algorithms has taught me not to use the Hoeffding bound for a binomial with very small or very large p -- this bound doesn't depend on p, and therefore doesn't take into account the fact that the binomial concentrates much better for very small/very large p.

Using a weak bound would result in overly lax tests -- for example, suppose we are running 100 trials with 0.975 probability of success, and we want to fail at most 0.01 of the time. The true threshold for the binomial (qbinom(0.01, 100, 0.975)) is 93, but the Chernoff-Hoeffding bound gives us 97.5 - sqrt(-2 * log(0.01) / 100) = 82.3. This gets worse for properties with a 0.99 success probability -- the true value is 96, and the Hoeffding bound gives us 83.8.

For fun: we could use a Bernstein-type inequality (since the binomial is sub-gamma), which would give us a cutoff of 97.5 - sqrt(-4*0.975*(1-0.975)*log(0.01)) = 90.7 for 0.975 success probability, and 99 - sqrt(-4*0.99*(1-0.99)*log(0.01)) = 94.7 for the 0.99 success probability. This bound is essentially tight because it uses the Gaussian tail bound (e^(-x^2)) instead of an exponential bound (e^(-x)). For proof, see the last page of http://www.cs.utexas.edu/~ecprice/courses/randomized/notes/lec6.pdf (Note that these notes assume that p << 1, so it uses the approximation sqrt(p(1-p)) ~ sqrt(p), which is not the case here.)

Or we could be lame and use the inverse binomial function...

@sid-kap
Copy link
Contributor Author

sid-kap commented Oct 16, 2015

Do we want to try to merge this in? I've updated Scalacheck, so now it runs each test the correct number of times.

The only things I think need to be addressed are

  1. Maybe make the bounds tighter, as mentioned above
  2. How to enforce that people use these classes correctly

Concerning 2: I made a class called ApproximateProperties that extends Properties but has a default of running each test once instead of 100 times. All Properties classes that have approximate properties should extend from this, and should contain only approximate properties. (We need to run approximate properties only once because when the ApproximateProperty check code is called once, it generates many (say, 100) inputs and succeeds if enough of those 100 succeed.)

Is there a clean way to enforce, using the type system, that only ApproximatePropertys can be put in the ApproximateProperties class? I would not want to emulate the Scalacheck Properties class because that class is implemented using messy mutable structures (every time you call property("foo") =, it appends the property to a mutable list of properties). Maybe we could make a function that takes a List[ApproximateProperty] and generates the appropriate Properties instance? This would work but it might lead to messier code (we would not be able to take advantage of the property("foo") = DSL).

@johnynek
Copy link
Collaborator

@sid-kap Thanks! I do very much want to merge this, I just haven't looked back at it since the scalacheck upgrade. I'll review. Thanks for not letting it drop! Appreciated.

@erikerlandson
Copy link
Contributor

I've had some success comparing algorithms involving random behaviors by using the Kolmogorov-Smirnov test, where you can test whether two distributions are different (or, in the inverse case, whether they are not different). If you can define (either in closed-form, or via sampling) a reference of "correct" behavior, you can then run your test some number of times and run the KS-test between test and reference.

Worth noting that with random behaviors, there is generally no way to guarantee that your random test will never violate, though it is possible to put things on a theoretical footing that fails less frequently.

apache/spark#2455
http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/

@johnynek
Copy link
Collaborator

@ianoc what do you think? I'm happy to merge this now. I think it's a big improvement, and we let it sit too long.

@ianoc
Copy link
Collaborator

ianoc commented Nov 25, 2015

I'm in favor of pushing ahead and getting it in, if there are futher improvements to be made we can do it later

johnynek added a commit that referenced this pull request Nov 30, 2015
@johnynek johnynek merged commit 3a4c927 into twitter:develop Nov 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants