Honest AB

Honest AB is an A/B testing service that uses interpretable machine learning models to provide both statistically rigorous significance testing and human-readable insights. See it in action: https://honest-ab.herokuapp.com/demo.

How does it work?

The app exposes a web UI and an API for creating experiments (A/B tests) and submitting data from your app. The service combines typical A/B test data (clicks and views for each variant) with any additional data you think might affect a user's decision to click. A Bayesian regression model detects linear relationships between these input features and the click rate. Once the test reaches significance, the model can determine whether there is a statistically significant relationship between the features provided for the users in the experiment and their chance of clicking on either variant. The system then summarizes these trends in plain English, revealing, for example, that one of the input features strongly correlates with the success of one of the variants. Together with a statistically significant overall result for the A/B test, these insights help interpret the test at hand and inform future tests.

The Models

Honest AB is powered by two machine learning models: one estimates the significance of the experiment, and the other performs regression analysis on the additional input features. Both are Bayesian inference models evaluated analytically, so they require neither approximation nor gradient-based learning. This property allows incoming data to be processed as a stream. Each API request first goes through basic data validation and encoding, and the result is appended to a write-ahead log. The log is processed asynchronously to update the models, after which the data is deleted because it is no longer needed. To make this architecture and its efficiency possible, the models are formulated in terms of sufficient statistics of fixed size and bounded magnitude. Training therefore takes linear time and constant space with respect to the number of data points. Thanks to the asynchronous architecture and the user- and data-parallel algorithms behind it, Honest AB scales to many users and large datasets while remaining highly available.
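
The request path described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the event shape, the in-memory `deque` standing in for the write-ahead log, and the names `validate`, `handle_request`, `drain_log`, `VariantCounts`, and `ExperimentState` are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VariantCounts:
    """Fixed-size sufficient statistics for one variant."""
    clicks: int = 0
    views: int = 0


@dataclass
class ExperimentState:
    """Model state: one pair of counters per variant (hypothetical layout)."""
    counts: dict = field(
        default_factory=lambda: {"a": VariantCounts(), "b": VariantCounts()}
    )


# Stand-in for the write-ahead log; a real deployment would use durable storage.
log = deque()


def validate(event):
    """Basic validation and encoding of an incoming event (assumed shape)."""
    if event.get("variant") not in ("a", "b"):
        raise ValueError("variant must be 'a' or 'b'")
    if not isinstance(event.get("clicked"), bool):
        raise ValueError("clicked must be a boolean")
    return {"variant": event["variant"], "clicked": event["clicked"]}


def handle_request(event):
    """Request path: validate, then append to the log. No model work here."""
    log.append(validate(event))


def drain_log(state):
    """Asynchronous path: fold each logged event into the counters, then drop it."""
    processed = 0
    while log:
        event = log.popleft()  # the raw event is discarded once it is counted
        counts = state.counts[event["variant"]]
        counts.views += 1
        counts.clicks += int(event["clicked"])
        processed += 1
    return processed
```

Calling `handle_request` for each incoming event and `drain_log(state)` from a background worker keeps the stored state at four integers per experiment, however many events arrive.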

The Significance Model

Unlike a standard A/B test, Honest AB uses a Bayesian measure of significance for experiment results instead of a t-test. This lets the statistic serve as a stopping condition for the test, whereas the significance value reported by a t-test is only valid for a fixed, predetermined sample size. The model estimates the probability that the true click rate for variant A is greater than that for variant B. Each view is modelled as a Bernoulli trial, and each variant's click rate is given a conjugate Beta prior. The sufficient statistics for this model are the counts of successes and failures for each variant. (Miller)
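
As an illustration, the probability that variant A beats variant B can be computed from the four counts alone using the closed-form sum given in the Miller reference. The sketch below assumes a uniform Beta(1, 1) prior and integer counts; the prior parameters and the function names are assumptions for the example, not necessarily what Honest AB uses.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    """P(p_B > p_A) for p_A ~ Beta(alpha_a, beta_a), p_B ~ Beta(alpha_b, beta_b).

    Closed-form sum over i = 0 .. alpha_b - 1; requires integer alpha_b.
    Terms are evaluated in log space to avoid overflow for large counts.
    """
    total = 0.0
    for i in range(alpha_b):
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


def prob_a_beats_b(clicks_a, views_a, clicks_b, views_b,
                   prior_alpha=1, prior_beta=1):
    """P(click rate of A > click rate of B), given clicks and views per variant.

    The Beta(prior_alpha, prior_beta) prior defaults to uniform (an assumption).
    """
    alpha_a = prior_alpha + clicks_a
    beta_a = prior_beta + (views_a - clicks_a)
    alpha_b = prior_alpha + clicks_b
    beta_b = prior_beta + (views_b - clicks_b)
    # The posteriors are continuous, so ties have probability zero.
    return 1.0 - prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b)
```

For example, `prob_a_beats_b(60, 100, 40, 100)` gives the posterior probability that A's true click rate exceeds B's after 100 views each. A stopping rule could compare this value against a chosen threshold.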

The Feature Insights Model

The feature insights are based on Bayesian linear regression models from the input features to the output click rate for each variant. Each feature is treated independently, since the covariance between features and their regression weights is not useful for human insights. There is one regression model per feature per variant. Each model performs a two-parameter regression to learn a multiplicative weight and a bias, along with the posterior covariance of those estimates. The bias is learned only to normalize the data and to ensure that the multiplicative weight represents a linear trend.

A trend only becomes an insight if it is statistically significant. This is decided by a two-sided t-test with the null hypothesis that the multiplicative weight of the feature is zero. The test uses the variance of the multiplicative weight learned by the Bayesian model, which represents the uncertainty in that estimate. The prior for the weights is zero-mean, and the prior variance of the multiplicative weight is set to the reciprocal of the variance of the corresponding data feature. This prior encodes the assumption that the features are independent, have no relationship to the click rate, and should therefore contribute equally to the variance of the regression output.
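
A per-feature, per-variant model of this kind can be sketched with five running sums as its sufficient statistics. This is an illustration under several assumptions that are not stated in this document: the noise variance is fixed at 0.25 (the largest possible variance of a 0/1 outcome), the bias gets a very wide zero-mean prior, and the significance check uses a normal approximation in place of the t distribution. The class name `FeatureTrend` and its parameters are hypothetical.

```python
from math import erfc, sqrt


class FeatureTrend:
    """Streaming Bayesian linear regression y ~ w * x + b for a single feature."""

    def __init__(self, feature_variance, bias_prior_variance=1e6,
                 noise_variance=0.25):
        # Prior variance of w is 1 / feature_variance, so its precision is
        # feature_variance itself.
        self.prior_precision_w = feature_variance
        self.prior_precision_b = 1.0 / bias_prior_variance  # assumed wide prior
        self.noise_variance = noise_variance                # assumed known
        # Sufficient statistics: fixed size regardless of how much data arrives.
        self.n = 0
        self.sum_x = 0.0
        self.sum_xx = 0.0
        self.sum_y = 0.0
        self.sum_xy = 0.0

    def update(self, x, clicked):
        """Fold one observation into the running sums."""
        y = 1.0 if clicked else 0.0
        self.n += 1
        self.sum_x += x
        self.sum_xx += x * x
        self.sum_y += y
        self.sum_xy += x * y

    def posterior(self):
        """Return (mean of w, mean of b, variance of w) in closed form."""
        s2 = self.noise_variance
        # Posterior precision matrix [[p_ww, p_wb], [p_wb, p_bb]].
        p_ww = self.prior_precision_w + self.sum_xx / s2
        p_wb = self.sum_x / s2
        p_bb = self.prior_precision_b + self.n / s2
        det = p_ww * p_bb - p_wb * p_wb
        # Invert the 2x2 precision matrix to get the posterior covariance.
        cov_ww, cov_wb, cov_bb = p_bb / det, -p_wb / det, p_ww / det
        r_w, r_b = self.sum_xy / s2, self.sum_y / s2
        mean_w = cov_ww * r_w + cov_wb * r_b
        mean_b = cov_wb * r_w + cov_bb * r_b
        return mean_w, mean_b, cov_ww

    def p_value(self):
        """Two-sided p-value for w = 0, using a normal approximation."""
        mean_w, _, var_w = self.posterior()
        z = mean_w / sqrt(var_w)
        return erfc(abs(z) / sqrt(2.0))
```

A trend would be reported as an insight only when `p_value()` falls below a chosen threshold, with the sign of the posterior mean of `w` giving the direction of the trend.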

References

(Miller): http://www.evanmiller.org/bayesian-ab-testing.html
