How to conclude online experiments in Python, by Volodymyr (Vlad) Kazantsev (volodymyrk), Head of Data Science at Product Madness

Page 1: How to conclude online experiments in python

How to conclude online experiments in Python

Volodymyr (Vlad) Kazantsev

Head of Data Science at Product Madness

Page 3: How to conclude online experiments in python

Goal of the tutorial

Uncover the “magic” behind statistics used for A/B testing and other online experiments

Page 4: How to conclude online experiments in python

A quick bio

Now

● Head of Data Science (Social Gaming)

● Product Manager at King

● MBA at London Business School

● Visual effects developer (Avatar, Batman, ...)

● MSc in Probability (Kiev Uni, Ukraine)

2004

Page 5: How to conclude online experiments in python

Different kinds of tests

● Classic A/B tests

● Long running activities with control groups

● Longitudinal tests

Page 6: How to conclude online experiments in python

Why bother?

● To test your hypothesis and learn

● To avoid blindly following HiPPOs

● To audit performance of product and marketing teams

Page 7: How to conclude online experiments in python

Why Stats?

● To separate data from the noise

● To quantify uncertainty

Page 8: How to conclude online experiments in python

Fruit Crush Epic

The story of an almost-real mobile game at an almost-real gaming company... and one Data Scientist

Page 9: How to conclude online experiments in python

Day-1

3 seconds panic-attack

Page 10: How to conclude online experiments in python

Day 1 - loading time panic-attack!

Fruit Crush Epic

Page 11: How to conclude online experiments in python

Taxonomy of Classical stat testing

Which Test?

● 1 Sample
    ● Mean
        ● σ known: z-test one sample
        ● σ unknown: t-test one sample
    ● Proportion: z-test for proportion
    ● Variance: Chi-squared test
● 2 Samples
    ● Mean
        ● independent
            ● σ1,σ2 known: z-test for (μ1-μ2)
            ● σ1,σ2 unknown: t-test for (μ1-μ2)
        ● dependent samples: z-test or t-test for dependent samples
    ● Proportion: z-test, 2 proportions
    ● Variance: F-test
● >2 Samples: ANOVA

Page 13: How to conclude online experiments in python

One sample t-test

Null Hypothesis:
- avg. loading time <= 3 seconds, based on the last hour's observations

Alternative Hypothesis:
- avg. loading time (population mean) > 3 seconds

Test:
- single sample, one-sided t-test.

Page 14: How to conclude online experiments in python

One sample t-test

t_value = t-test(samples, expected mean)

p-value = t-distribution lookup(t_value, sample_size)

p-value: 0.086, the probability of obtaining a result at least as extreme as the one observed, assuming the Null Hypothesis is true

Page 15: How to conclude online experiments in python

If you want to code it yourself
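
The code from this slide did not survive extraction. As a minimal sketch of what coding the one-sample, one-sided t-test by hand could look like: the loading times below are hypothetical values made up for illustration (they are not the talk's data), and only the t-distribution lookup is delegated to `scipy.stats`.

```python
import math
from scipy import stats

# Hypothetical loading times in seconds (illustrative only, not the talk's data).
loading_times = [3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 2.8]
target_mean = 3.0  # H0: population mean <= 3 seconds

n = len(loading_times)
sample_mean = sum(loading_times) / n
# Unbiased sample variance (divide by n - 1).
sample_var = sum((x - sample_mean) ** 2 for x in loading_times) / (n - 1)
standard_error = math.sqrt(sample_var / n)

# t statistic: distance of the sample mean from the target, in standard errors.
t_value = (sample_mean - target_mean) / standard_error
# One-sided p-value: upper tail of the t-distribution with n - 1 degrees of freedom.
p_value = stats.t.sf(t_value, df=n - 1)

print(f"t = {t_value:.3f}, one-sided p = {p_value:.3f}")
```

The upper tail is used because the alternative hypothesis is "mean > 3 seconds"; for a "less than" alternative the lower tail (`stats.t.cdf`) would be used instead.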

Page 16: How to conclude online experiments in python

Stats in Python

numpy

scipy.stats

statsmodels.stats

theano

pymc3

Classical Bayesian

* High-level view. Lots of stuff is missing here. pymc3 uses statsmodels for GLM

Page 17: How to conclude online experiments in python

One sample t-test and z-test
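
The slide's code is not in the transcript. A sketch of both tests with `scipy.stats`, under stated assumptions: the sample is hypothetical, and the "known" σ of 0.4 seconds for the z-test is an assumed value chosen only to make the example run.

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (illustrative only).
loading_times = np.array([3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 2.8])
target_mean = 3.0

# t-test: sigma is unknown and estimated from the sample.
t_value, p_two_sided = stats.ttest_1samp(loading_times, target_mean)
# Convert the two-sided p-value to the one-sided "mean > target" alternative.
p_t = p_two_sided / 2 if t_value > 0 else 1 - p_two_sided / 2

# z-test: sigma is assumed known (0.4 s is an assumption for this sketch).
known_sigma = 0.4
z_value = (loading_times.mean() - target_mean) / (known_sigma / np.sqrt(loading_times.size))
p_z = stats.norm.sf(z_value)  # upper tail of the standard normal

print(f"t-test: t = {t_value:.3f}, p = {p_t:.3f}")
print(f"z-test: z = {z_value:.3f}, p = {p_z:.3f}")
```

The only difference between the two is where the spread comes from: the t-test pays for estimating σ with heavier tails, which matters most for small samples.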

Page 18: How to conclude online experiments in python

Confidence Interval

Page 19: How to conclude online experiments in python

Confidence Interval for the Mean
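
The formula shown on this slide is missing from the transcript. A common construction, sketched here as an assumption rather than as the slide's exact content, is mean ± t_crit × standard error, with t_crit taken from the t-distribution with n - 1 degrees of freedom. The function name and the sample are hypothetical.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(samples, confidence=0.95):
    """Two-sided t-based confidence interval for the population mean."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mean = samples.mean()
    # Standard error of the mean, using the unbiased standard deviation.
    sem = samples.std(ddof=1) / np.sqrt(n)
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem

# Hypothetical loading times in seconds (illustrative only).
low, high = mean_confidence_interval([3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 2.8])
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```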

Page 20: How to conclude online experiments in python

Standard Error of the Mean in Python
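
The slide's code is not in the transcript. As a small sketch on hypothetical data, the standard error of the mean can be computed by hand or with `scipy.stats.sem`, which uses `ddof=1` by default:

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (illustrative only).
loading_times = np.array([3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 2.8])

# By hand: unbiased standard deviation divided by sqrt(n).
sem_manual = loading_times.std(ddof=1) / np.sqrt(loading_times.size)
# Library call: scipy.stats.sem applies the same ddof=1 convention by default.
sem_scipy = stats.sem(loading_times)

print(sem_manual, sem_scipy)
```

Note that `numpy`'s `std` defaults to `ddof=0`, so the explicit `ddof=1` is what keeps the two results in agreement.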

Page 21: How to conclude online experiments in python

Next Day

Page 22: How to conclude online experiments in python

Day-2

OMG, my Retention is low!

Page 23: How to conclude online experiments in python

Is my day-1 retention low?

Day-1 results:

installs 448

returned next day 123

Day-1 retention 27.46%

Retention target 30%

Fruit Crush Epic

Page 24: How to conclude online experiments in python

Taxonomy of Classical stat testing: Which Test? (same diagram as on Page 11)

Page 25: How to conclude online experiments in python

One sample z-test for proportion

Null Hypothesis:
- avg. retention >= 30%

Alternative Hypothesis:
- avg. retention < 30%

Test:
- single sample, one-sided z-test for proportion

Page 26: How to conclude online experiments in python

In Python...
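
The original code is not in the transcript. A minimal sketch of the one-sample, one-sided z-test for proportion, using the day-1 numbers from the deck (448 installs, 123 returned, 30% target) and only the standard library. Using the target proportion in the standard error is one common convention and an assumption here; some libraries default to the sample proportion instead.

```python
from math import sqrt
from statistics import NormalDist

installs = 448
returned = 123
target_retention = 0.30  # H0: retention >= 30%

observed = returned / installs
# Standard error under the null hypothesis (uses the target proportion).
standard_error = sqrt(target_retention * (1 - target_retention) / installs)
z_value = (observed - target_retention) / standard_error
# One-sided p-value for the alternative "retention < 30%": lower tail.
p_value = NormalDist().cdf(z_value)

print(f"retention = {observed:.4f}, z = {z_value:.3f}, p = {p_value:.3f}")
```

With these numbers z is about -1.18 and the one-sided p-value about 0.12, so at the usual 5% level the observed 27.46% is not enough evidence that retention is below target.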

Page 27: How to conclude online experiments in python

So what is my confidence interval?
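
The slide's answer is not in the transcript. One simple way to get an interval, sketched here as an assumption about the method, is the normal-approximation (Wald) interval around the observed proportion. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, trials, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / trials
    # Critical z value that leaves (1 - confidence) / 2 in each tail.
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin

# Day-1 retention numbers from the deck: 123 returned out of 448 installs.
low, high = proportion_confidence_interval(123, 448)
print(f"95% CI for day-1 retention: ({low:.4f}, {high:.4f})")
```

For the deck's numbers this gives roughly 23.3% to 31.6%, an interval that still contains the 30% target.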

Page 28: How to conclude online experiments in python

Day-5

Connect with Facebook or Die!

The First A/B test

Page 29: How to conclude online experiments in python

A/B test 1 - connect to Facebook

Page 30: How to conclude online experiments in python

A/B test design

Group A

Group B Start Level 1

Start Level 1

Finish Level 1

50%

50%

Have seen prompt 2501

Connected 1104

Connect rate 44.1%

Have seen prompt 2141

Connected 1076

Connect rate 50.2%

Fruit Crush Epic

Page 31: How to conclude online experiments in python

Is it statistically significant?

Fruit Crush Epic

Page 32: How to conclude online experiments in python

Taxonomy of Classical stat testing: Which Test? (same diagram as on Page 11)

Page 33: How to conclude online experiments in python

Two samples z-test for proportion

Null Hypothesis:
- avg. connection rate is the same in both groups: P1 = P2

Alternative Hypothesis:
- P1 ≠ P2

Test:
- two samples, two-sided z-test for proportion

Page 34: How to conclude online experiments in python

Two samples z-test for proportion in Python
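
The code from this slide is missing. A sketch of the two-sided, two-sample z-test with a pooled standard error, applied to the connect counts from the A/B test design slide (1104 of 2501 and 1076 of 2141). The function name is hypothetical and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(successes_1, trials_1, successes_2, trials_2):
    """Two-sided z-test for H0: P1 = P2, using the pooled proportion."""
    p1 = successes_1 / trials_1
    p2 = successes_2 / trials_2
    # Under H0 both groups share one proportion, estimated from pooled data.
    pooled = (successes_1 + successes_2) / (trials_1 + trials_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / trials_1 + 1 / trials_2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value: probability mass in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_value, p_value = two_proportion_ztest(1104, 2501, 1076, 2141)
print(f"z = {z_value:.2f}, p-value = {p_value:.6f}")
```

For these counts |z| is a little above 4, so the p-value is far below 0.05. Applied to the payer counts that appear later in the deck (450 and 390 payers out of 30,000 users each), the same function gives a p-value of about 0.037.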

Page 35: How to conclude online experiments in python

Confidence interval for difference in proportion

Page 36: How to conclude online experiments in python

In Python
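
The slide's code is not in the transcript. A sketch of a normal-approximation confidence interval for the difference in connect rates, assuming the usual unpooled standard error (each group keeps its own variance estimate). The function name is hypothetical; the counts are the ones from the A/B test design slide.

```python
from math import sqrt
from statistics import NormalDist

def proportion_difference_ci(successes_1, trials_1, successes_2, trials_2, confidence=0.95):
    """Normal-approximation CI for P2 - P1 with an unpooled standard error."""
    p1 = successes_1 / trials_1
    p2 = successes_2 / trials_2
    # Unpooled: the interval does not assume the two proportions are equal.
    standard_error = sqrt(p1 * (1 - p1) / trials_1 + p2 * (1 - p2) / trials_2)
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p2 - p1
    return diff - z_crit * standard_error, diff + z_crit * standard_error

low, high = proportion_difference_ci(1104, 2501, 1076, 2141)
print(f"95% CI for the difference in connect rate: ({low:.4f}, {high:.4f})")
```

For the deck's counts the interval is roughly 3.2 to 9.0 percentage points and excludes zero, which agrees with the significant z-test.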

Page 37: How to conclude online experiments in python

What should we measure, exactly?

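[Diagram: two funnels of 1000 users each, from Start Level 1 to Start Level 2. Node counts in the first funnel: 150, 400, 450, 30, 390, 430; in the second: 160, 840, 40, 400, 400. First funnel: connected 47%, retained 82%. Second funnel: connected 50%, retained 80%.]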

Page 38: How to conclude online experiments in python

What about Bayesian Stats?

Page 39: How to conclude online experiments in python

Bayesian Credible Interval vs. CI
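
The slide's comparison is not in the transcript. As a sketch of how a credible interval could be obtained without MCMC, the day-1 retention numbers from the deck (123 of 448) can be combined with a uniform Beta(1, 1) prior. The prior is an assumption made for this example, not something stated in the talk.

```python
from scipy import stats

installs, returned = 448, 123

# Assumed uniform Beta(1, 1) prior; with binomial data the posterior is
# Beta(1 + successes, 1 + failures).
posterior = stats.beta(1 + returned, 1 + installs - returned)

# 95% equal-tailed credible interval for the true retention rate.
low, high = posterior.ppf([0.025, 0.975])
# Posterior probability that the true retention is below the 30% target.
prob_below_target = posterior.cdf(0.30)

print(f"95% credible interval: ({low:.4f}, {high:.4f})")
print(f"P(retention < 30% | data) = {prob_below_target:.3f}")
```

Unlike a confidence interval, the credible interval is a direct probability statement about the parameter given the data and the prior; with this much data and a flat prior the two intervals end up numerically close.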

Page 40: How to conclude online experiments in python

Day-30

Do you want to buy last chance?

A/B testing Revenue

Page 41: How to conclude online experiments in python

How much is an extra life worth?

LOSER!!!

Purchase another chance

for only..

$0.99

LOSER!!!

Purchase another chance

for only..

$1.99

Fruit Crush Epic

Page 42: How to conclude online experiments in python

How are we going to test it?

Consider

● There are multiple items to buy in the game (lives, boosters, blenders, etc.)

● We expect more people to make a $0.99 purchase, so we hope to make more money overall, even at the lower price

A/B test Design

● We will show the A/B test to new users only

● It will run for 2 months

● We will measure overall revenue per user in the first 30 days

● Null-hypothesis: we make more money from the $0.99 group

Measurements

● Difference in Average Revenue Per User (ARPU) in 30 days

● Difference in Conversion Rate (% of users who make at least 1 purchase)

Page 43: How to conclude online experiments in python

Results

count   450      390
mean    151.9    214.2
25%     20.8     26.5
50%     55.3     69.4
75%     147.3    231.3
max     3960     3647.8

Fruit Crush Epic

* random generator used in the example is available in ipython notebooks

** distribution is made more extreme than what is normally observed in a casual game, like our imaginary match-3 title

Page 44: How to conclude online experiments in python

Results

30,000 users in each group

450 payers | 390 payers

p-value = 0.037: Significant

p-value = ???: Is it Significant?

Page 45: How to conclude online experiments in python

Taxonomy of Classical stat testing: Which Test? (same diagram as on Page 11)

Page 46: How to conclude online experiments in python

Welch's t-test (σ1≠σ2)

Can we actually use t-test?
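
The slide's code and data are not in the transcript. A sketch of Welch's t-test with `scipy.stats.ttest_ind(..., equal_var=False)` on synthetic data: the log-normal parameters below are assumptions chosen only to produce heavily skewed revenue-like numbers, and they are not the random generator from the talk's notebooks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical, heavily skewed revenue per payer (log-normal draws).
# Group sizes match the payer counts in the deck; the values do not.
revenue_a = rng.lognormal(mean=4.0, sigma=1.2, size=450)
revenue_b = rng.lognormal(mean=4.2, sigma=1.2, size=390)

# equal_var=False selects Welch's t-test, which does not assume
# that the two groups have the same variance.
t_value, p_value = stats.ttest_ind(revenue_a, revenue_b, equal_var=False)

print(f"mean A = {revenue_a.mean():.1f}, mean B = {revenue_b.mean():.1f}")
print(f"Welch's t = {t_value:.3f}, two-sided p = {p_value:.4f}")
```

Whether the t-test is appropriate for data this skewed depends on the sample means being approximately normal, which is the question the slide raises and the simulation-based checks on the following slides address.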

Page 47: How to conclude online experiments in python

Poor man's non-parametric test: split 5

p < 3%

Page 48: How to conclude online experiments in python

If you don’t know enough stats - simulate!

This is very close to p-value from t-test
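
The simulation itself is not in the transcript. One common way to simulate a p-value is a permutation test on the difference in means, sketched below; this is an assumed technique, and the talk's notebooks may use a different scheme (for example bootstrapping). The function name is hypothetical.

```python
import numpy as np

def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    observed = abs(sample_b.mean() - sample_a.mean())
    pooled = np.concatenate([sample_a, sample_b])
    n_a = sample_a.size
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 group labels are exchangeable, so shuffle and re-split.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[n_a:].mean() - shuffled[:n_a].mean())
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)

# Usage on hypothetical skewed data (not the talk's data).
rng = np.random.default_rng(1)
group_a = rng.lognormal(mean=4.0, sigma=1.2, size=450)
group_b = rng.lognormal(mean=4.2, sigma=1.2, size=390)
print(f"permutation p-value = {permutation_test(group_a, group_b, 2000):.4f}")
```

The result is the share of random relabelings that produce a difference at least as large as the observed one, which is exactly the definition of a p-value under the null hypothesis of no difference between groups.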

Page 49: How to conclude online experiments in python

Can we improve sensitivity?

27 players have spent > $1000 across both groups: 10 in the $0.99 group and 17 in the $1.99 group.

Max spent = $3960

Page 50: How to conclude online experiments in python

And we re-run our analysis

Again, we can use t-test

Page 51: How to conclude online experiments in python

Final Thoughts

Page 52: How to conclude online experiments in python

Can we analyse distributions?

You can quantify the difference between two curves

Area under the curve is Average Revenue per User

Fruit Crush Epic

* random generator used in the example is available in ipython notebooks

** distribution is made more extreme than what is normally observed in a casual game, like our imaginary match-3 title

Page 53: How to conclude online experiments in python

Is 30 day revenue a good metric?

LTV projection A

LTV projection B

Fruit Crush Epic

Page 54: How to conclude online experiments in python

Summary:

● There are only a few stats tests that every Data Scientist must know

● t-tests are robust enough to be useful even with skewed data sets

● Bayesian methods and MCMC are cool, but don’t use MCMC for trivial cases

● It is hard to detect a difference in heavily-skewed cases

IPython Notebooks for this tutorial are available at: http://nbviewer.ipython.org/github/VolodymyrK/stats-testing-in-python