Introduction to A/B testing
Crash Course in A/B Testing: A Statistical Perspective
Wayne Tai Lee
Roadmap
- What is A/B testing?
- Good experiments and the role of statistics
- Similar to proof by contradiction
- Tests
- Big data meets classic asymptotics
- Complaints with classical hypothesis testing
- Alternatives?
What is A/B Testing?
An industry term for a controlled, randomized experiment between treatment and control groups. An age-old problem, especially with humans.
What most people know:
[Diagram: two groups, A and B: gather samples, assign treatments, apply treatments, measure outcome, compare.]
The only difference is in the treatment!
Reality:
[Diagram: variability from the samples/inputs, variability from the treatment/function, variability from the measurement. How do we account for all that?]
If there are variabilities in addition to the treatment effect, how can we identify and isolate the effect of the treatment?
Confounding: 3 types of variability
- Controlled variability: systematic and desired, i.e. our treatment.
- Bias: systematic but not desired; anything that can confound our study.
- Noise: random error, not desired; won't confound the study but makes it hard to make a decision.
How do we categorize each source of variability in the diagram?
Reality:
- Variability from measurement? Good instrumentation!
- Variability from samples/inputs? Randomize assignment! Randomization converts bias into noise.
- Your population can still be skewed or biased, but that only restricts the generalizability of the results.
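The randomization step above can be sketched as a few lines of Python. This is a minimal illustration, not a production assignment scheme, and the function name is mine:

```python
import random

def randomize_assignment(units, seed=None):
    """Randomly split units into treatment (A) and control (B).

    Randomization does not remove differences between units; it turns
    any systematic difference (bias) into random noise, which a
    statistical test can then account for.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Assign 100 hypothetical users to the two groups.
a, b = randomize_assignment(range(100), seed=7)
```

Any fixed rule (alphabetical order, signup date, and so on) risks being correlated with the outcome; a seeded shuffle makes the assignment reproducible while still being random with respect to the units.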
Reality:
Think about what you want to measure and how! Minimize the noise level/variability in the metric.
Example: wine tasting, pairing versus just two groups.
A good experiment in general:
- Good design and implementation should be used to avoid bias.
- For unavoidable biases, use randomization to turn them into noise.
- Good planning to minimize the noise in the data.
How do we deal with noise? The bread and butter of statisticians!
- Quantify the magnitude of the treatment effect.
- Quantify the magnitude of the noise.
- Just compare... most of the time.
Formalizing the Comparison
Similar to proof by contradiction:
- You assume the difference is by chance (noise).
- See how the data contradicts the assumption.
- If the surprise surpasses a threshold, we reject the assumption. ...Nothing is 100%.
Difference due to chance?

ID         PV
Person 1    39
Person 2   209
Person 3    31
Person 4    98
Person 5     9
Person 6   151

(Red rows = treatment; black rows = control on the original slide.)
Let's measure the difference in means! The two group means are 124.5 and 72, so Diff = 52.5. ...So what?
Difference due to chance?
If there was no difference from the treatment, shuffling the treatment status can emulate the randomization of the samples. Shuffle the labels and recompute the difference in means:

Shuffle 1: Diff = 122.25 - 24 = 98.25
Shuffle 2: Diff = 107.5 - 53.5 = 54
Difference due to chance?
50,000 repeats later... 46.5% of the permutations yielded a difference at least as large in magnitude as our original 52.5. Are you surprised by the initial results?
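With only six people, the shuffling above can be done exhaustively instead of by random sampling. The slide's red/black coloring is lost in this transcript, but the only two-versus-four split of these page views consistent with the stated group means of 124.5 and 72 is treatment = {98, 151}, so that split is assumed below (variable names are mine):

```python
from itertools import combinations

pv = [39, 209, 31, 98, 9, 151]         # page views for Persons 1-6
treatment = [98, 151]                  # assumed red group, mean 124.5
control = [39, 209, 31, 9]             # assumed black group, mean 72.0
observed = sum(treatment) / 2 - sum(control) / 4   # 52.5

# Exhaustive permutation test: relabel every possible pair of the
# six people as "treatment" and recompute the difference in means.
splits = list(combinations(range(6), 2))
extreme = 0
for t_idx in splits:
    t = [pv[i] for i in t_idx]
    c = [pv[i] for i in range(6) if i not in t_idx]
    diff = sum(t) / len(t) - sum(c) / len(c)
    if abs(diff) >= abs(observed):     # at least as large in magnitude
        extreme += 1

p_value = extreme / len(splits)        # 7/15, about 0.467
```

Enumerating all 15 label assignments gives p = 7/15, about 46.7%, consistent with the slide's 46.5% obtained from 50,000 random shuffles.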
Tests
Congratulations! You just learned the permutation test! The 46.5% is the p-value under the permutation test.
Problems:
- Permuting the labels can be computationally costly. (Not possible before computers!)
- Statistical theory says there are many other tests out there.
Tests
Standard t-test:
1) Calculate the delta: delta = mean_treatment - mean_control.
2) Assume delta follows a Normal distribution, then calculate the p-value.
3) If the p-value < 0.05, we reject the assumption that there is no difference between treatment and control.
The p-value = the sum of the red areas (the two tails of the Normal curve centered at 0).
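The three steps can be sketched directly, using the Normal approximation for delta (the large-sample justification comes on the next slides). This is an illustrative z-flavored version rather than the exact t-test, and the function name is mine:

```python
import math
import random
from statistics import mean, variance

def ab_test_pvalue(treatment, control):
    """Steps 1-2: delta = mean_treatment - mean_control, assume delta
    is Normal, and return the two-sided p-value (both tail areas)."""
    delta = mean(treatment) - mean(control)
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    z = delta / se
    # Two-sided p-value: the two red tail areas of the Normal curve.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Simulated example: a real difference of 1 between the groups.
rng = random.Random(0)
a = [rng.gauss(10, 2) for _ in range(500)]
b = [rng.gauss(11, 2) for _ in range(500)]

# Step 3: reject "no difference" when p < 0.05.
p = ab_test_pvalue(b, a)
reject = p < 0.05
```

With 500 samples per group and a true shift of 1, the p-value comes out far below 0.05, so the test rejects.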

Big Data Meets Classic Stats
Wait, our metrics may not be Normal!
We care about the mean of the metric, not the actual metric distribution.
Central Limit Theorem: the mean of the metric will be approximately Normal if the sample size is LARGE!
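A quick simulation makes the Central Limit Theorem concrete. The exponential metric below is a stand-in I chose for a long-tailed quantity like page views per user; it looks nothing like a Normal, yet its sample means do:

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# A heavily skewed metric: individual values are nothing like Normal.
def draw_metric():
    return rng.expovariate(1.0)        # true mean = 1.0

# CLT: the *mean* of n = 1000 such values is approximately
# Normal(1, 1/sqrt(1000)), even though the metric itself is skewed.
sample_means = [mean(draw_metric() for _ in range(1000))
                for _ in range(300)]

center = mean(sample_means)            # close to the true mean 1.0
spread = stdev(sample_means)           # close to 1/sqrt(1000) ~ 0.032
```

Plotting `sample_means` as a histogram would show the familiar bell shape; the skewness of the raw metric only shows up as a need for a larger sample size before the approximation kicks in.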
Assumptions of the t-test:
- Normality of the %delta: guaranteed with large sample sizes.
- Independent samples.
- Not too many 0s.
That's IT! Easy to automate. Simple and general.
What are Tests?
Statistical tests are just procedures that depend on data to make a decision.
Engineerify: statistical tests are functions that take in data and treatments and return a boolean.
Guarantees: by comparing the p-value to a 5% threshold, we control
P(test says a difference exists | in reality there is NO difference) = 5%.
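That false-positive guarantee can be checked by simulation: run many A/A comparisons where there is truly no difference and count how often the boolean function cries "difference". A sketch, reusing the Normal-approximation test from before (function names are mine):

```python
import math
import random
from statistics import mean, variance

def significant(treatment, control, alpha=0.05):
    """'Engineerified' statistical test: data in, boolean out."""
    delta = mean(treatment) - mean(control)
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(delta / se) / math.sqrt(2))))
    return p < alpha

# A/A simulation: both groups drawn from the SAME distribution, so
# every rejection is a false positive; the rate should sit near 5%.
rng = random.Random(1)
trials = 2000
false_positives = sum(
    significant([rng.gauss(0, 1) for _ in range(100)],
                [rng.gauss(0, 1) for _ in range(100)])
    for _ in range(trials)
)
rate = false_positives / trials        # close to 0.05
```

This is exactly the guarantee stated above: under the null, the test returns True about 5% of the time, no more, by construction of the threshold.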