About the talk
Why
What
Framework
Break / hack
Tech Details
Test design
Analysis
Tuesday, July 26, 2011
Why?
Questions
“What will happen if I do X”?
“Is X better than Y?”
The future &
alternate universes (We’re bad at those.)
Then what?
Experiments
Try it out.
Data beats speculation.
Try different alternatives
on different people.
Which is better?
vs.
Not a great experiment
Web apps
Front end experiments
• Layout, colors, images, copy, ...
• No functional changes
• Impact can be surprisingly high
A little more complex...
• Multipage flows
• Functionality changes
Backend experiments
• Why not?
• Algorithms, architectures, batch processes, ...
The Etsy search backend
• New algorithm
• New RPC protocol
• New result data structure
• New Solr trunk snapshot
[Diagram: the web app's search() calls either searchA() on search cluster A or searchB() on search cluster B.]
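A minimal sketch of that dispatch, assuming the variant has already been chosen for the current subject. The names search(), searchA() and searchB() come from the slide; their bodies here are hypothetical stand-ins, not Etsy's actual code.

```php
<?php
// Hypothetical stand-ins for the two backends; real code would make RPC calls.
function searchA(string $query): array {
    return array('backend' => 'A', 'query' => $query);
}

function searchB(string $query): array {
    return array('backend' => 'B', 'query' => $query);
}

// Route one call to cluster A or B based on the variant already chosen
// for this subject. Unknown variants fall back to the existing cluster.
function search(string $query, string $variant): array {
    if ($variant === 'B') {
        return searchB($query);
    }
    return searchA($query);
}
```

Callers keep using search(); only the dispatcher knows there are two clusters.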
DB re-architecture
• Postgres => Sharded MySQL
• Multiple experiments
Whole new features
New pages + new DB tables + new batch jobs + ...
Not just 2 variants
• A/B/C... tests
• Multivariate tests
Caveats
• Content not under your control
• Price tests?
• Hard-to-measure/quantify things
• Long term impact?
Other tests
• Internal users testing
• Whitelisted user testing
Opt-in experiments
Complementary techniques
• Observed/recorded testing
- show different people the same thing
• Side-by-side testing
- show each person 2 alternatives
Side by side testing
How
A common approach
• JS-based
• Non-techie UI
• “No IT!”
• “Designed For Marketers, By Marketers”
Our approach
• The developer is the user
• Code as configuration
• An integral part of the dev process
Developer as the user
• The builder of the feature writes the test
• Not just a marketing tool
Code as config
• Simplicity
• Expressivity
• Quality
• Version => complete system state
• Revision history
Part of the dev process
Every change is an experiment!
What does it look like?
Default => Experiment => (new) Default
To add a new feature...
+ $config['new_search'] = array(
+     'enabled' => 'off'
+ );

  function search() {
+     if ($cfg->isEnabled('new_search')) {
+         return do_new_search();
+     }
      // existing stuff
  }
Deploy that
Now we go crazy...
function do_new_search() {
    // exciting new stuff
    // that might or might not work
    // but we can deploy it anyway
    // since it's flagged off
}
Internal user testing
  $config['new_search'] = array(
+     'enabled' => 'rampup',
+     'rampup' => array(
+         'admin' => true
+     )
  );
Whitelists
  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'whitelist' => array('zhida'),
          'admin' => true
      )
  );
Opt-in experiments
  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'group' => 12345,
          'admin' => true
      )
  );
A/B
  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'percent' => 1.5,
          'admin' => true
      )
  );
If it works...
  $config['new_search'] = array(
+     'enabled' => 'on'
  );
Order matters
Whitelist / Blacklist > Internal > Opt-in > Random
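One way the ordering could be applied when reading a rampup config, as a rough sketch. The config keys 'enabled', 'rampup', 'whitelist', 'admin', 'group' and 'percent' appear on the slides; the 'blacklist' key, the shape of the $user array, the function names and the use of md5 are assumptions made for illustration.

```php
<?php
// Map a string to [0, 100) deterministically, so the same user always
// lands in the same spot. md5 is an assumption, not the talk's choice.
function ab_bucket_percent(string $key): float {
    $h = hexdec(substr(md5($key), 0, 7));   // 28 bits, fits in any PHP int
    return $h / 0x10000000 * 100.0;
}

function feature_enabled(array $config, string $feature, array $user): bool {
    $cfg = $config[$feature] ?? array('enabled' => 'off');
    if ($cfg['enabled'] === 'on') {
        return true;
    }
    if ($cfg['enabled'] !== 'rampup') {
        return false;
    }
    $r = $cfg['rampup'] ?? array();

    // 1. Whitelist / blacklist win over everything else.
    if (in_array($user['name'], $r['blacklist'] ?? array(), true)) {
        return false;
    }
    if (in_array($user['name'], $r['whitelist'] ?? array(), true)) {
        return true;
    }
    // 2. Internal users.
    if (!empty($r['admin']) && !empty($user['is_admin'])) {
        return true;
    }
    // 3. Opt-in group membership.
    if (isset($r['group']) && in_array($r['group'], $user['groups'] ?? array(), true)) {
        return true;
    }
    // 4. Random percentage, keyed on feature + user so tests stay independent.
    if (isset($r['percent'])) {
        return ab_bucket_percent($feature . ':' . $user['name']) < $r['percent'];
    }
    return false;
}
```

Because the checks run in a fixed order, a blacklisted admin stays out and a whitelisted user gets in regardless of the percentage.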
The framework
As easy as...
1. Pick a variant
2. Do what it says
3. Log the event
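The three steps, sketched in one function. run_test(), the callables passed as $selector and $logger, and the example variants are all hypothetical; they only illustrate the flow and are not the framework's actual API.

```php
<?php
// $selector and $logger are hypothetical callables standing in for the
// framework's selector and logger objects.
function run_test(string $testKey, string $subjectId, array $variants,
                  callable $selector, callable $logger): array {
    $name = $selector($subjectId);          // 1. pick a variant
    $props = $variants[$name];              // 2. the app does what it says
    $logger($testKey, $name, $subjectId);   // 3. log the event
    return $props;
}

// Example: variants are key-value pairs that the app interprets.
$variants = array(
    'control' => array('results_per_page' => 20),
    'bigger'  => array('results_per_page' => 40),
);
$log = array();
$props = run_test('search_test', 'user42', $variants,
    function ($id) { return 'bigger'; },
    function ($t, $v, $s) use (&$log) { $log[] = "$t $v $s"; });
```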
What's in a test?
Variants
• Key-value pairs
• interpreted by the app
• Name
• mostly for logging
SubjectIdProvider
• Why?
• hashing and other selectors
• logging
• Types of subjects
• Users...but not always
• Different groups of users - sellers vs buyers, etc.
• Different ways to identify them - signed in vs signed out
function getID()
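A sketch of two providers behind the getID() method named above. The interface layout, the class names and the ID prefixes are assumptions; the slide only gives the method name and the signed-in vs signed-out distinction.

```php
<?php
interface SubjectIdProvider {
    public function getID(): string;
}

// Signed-in users: identified by account id (hypothetical constructor arg).
class SignedInUserIdProvider implements SubjectIdProvider {
    private $userId;

    public function __construct(int $userId) {
        $this->userId = $userId;
    }

    public function getID(): string {
        return 'user:' . $this->userId;
    }
}

// Signed-out visitors: identified by a browser cookie value,
// assumed to be issued elsewhere.
class VisitorIdProvider implements SubjectIdProvider {
    private $cookieValue;

    public function __construct(string $cookieValue) {
        $this->cookieValue = $cookieValue;
    }

    public function getID(): string {
        return 'visitor:' . $this->cookieValue;
    }
}
```

Selectors and loggers can then work with the ID string alone, whatever kind of subject it came from.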
Selectors
function select($subjectID) => Variant Name
Combining multiple selectors
• OR
• breaks blacklists
• AND
• breaks whitelists
• Sequence
• works!
Selector sequence
• Defines an ordering
• Returns A/B/C/... or <don't care>
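A selector sequence might look like the sketch below: each selector answers with a variant name or null for <don't care>, and the first real answer wins. The function name, the example selectors and the subject IDs are hypothetical.

```php
<?php
// Each selector is a callable: subject id in, variant name out,
// or null for <don't care>. The first non-null answer wins.
function select_in_sequence(array $selectors, string $subjectId, string $default): string {
    foreach ($selectors as $selector) {
        $variant = $selector($subjectId);
        if ($variant !== null) {
            return $variant;
        }
    }
    return $default;
}

// Blacklist forces A, whitelist forces B, everyone else is <don't care>.
$blacklist = function ($id) { return in_array($id, array('grumpy', 'both'), true) ? 'A' : null; };
$whitelist = function ($id) { return in_array($id, array('zhida', 'both'), true) ? 'B' : null; };
$sequence = array($blacklist, $whitelist);
```

Putting the blacklist first means a subject on both lists stays in A, which plain OR or AND combinations cannot guarantee.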
Loggers
function log($testKey, $variantKey, $subjectKey)
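A sketch of loggers sharing that signature, including one that fans an event out to several destinations. The interface and class names are hypothetical; only the log($testKey, $variantKey, $subjectKey) signature comes from the slide.

```php
<?php
interface AbLogger {
    public function log($testKey, $variantKey, $subjectKey);
}

// Keeps events in memory; handy in tests. Hypothetical class.
class MemoryLogger implements AbLogger {
    public $events = array();

    public function log($testKey, $variantKey, $subjectKey) {
        $this->events[] = "$testKey.$variantKey $subjectKey";
    }
}

// Fans one event out to several loggers (access log, analytics, custom...).
class CompositeLogger implements AbLogger {
    private $loggers;

    public function __construct(array $loggers) {
        $this->loggers = $loggers;
    }

    public function log($testKey, $variantKey, $subjectKey) {
        foreach ($this->loggers as $logger) {
            $logger->log($testKey, $variantKey, $subjectKey);
        }
    }
}
```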
More => better
• More data
• More ways to track
• access logs
• 3P analytics
• custom
Access log augmentation
• Apache note
• Lots of log analysis tools
• grep
• $$
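A sketch of attaching the variant to the request as an Apache note, using PHP's apache_note(). The note name 'ab', the helper names and the LogFormat line are assumptions; this simple version also holds just one test per request.

```php
<?php
// Build the value to attach to the request, e.g. "search_test.variantC".
function ab_note_value(string $testKey, string $variantKey): string {
    return $testKey . '.' . $variantKey;
}

// apache_note() exists only when PHP runs as an Apache module,
// so guard the call to keep CLI scripts and tests working.
function ab_annotate_request(string $testKey, string $variantKey): bool {
    if (!function_exists('apache_note')) {
        return false;
    }
    apache_note('ab', ab_note_value($testKey, $variantKey));
    return true;
}

// Assumed httpd.conf line that writes the note into the access log:
//   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{ab}n\"" ab_combined
```

Once the note is in the access log, grep and the usual log analysis tools can split traffic by variant.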
3P Analytics
• Quick to start
• May be cheap
• Volume?
• Lag time?
• Flexibility / customization?
3P Analytics - how
• Custom variables
• take note of number & size limits
• Custom segments
• Canned metrics
3P Analytics - example
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-1234567-8");
pageTracker._initData();
pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
pageTracker._trackPageview();
</script>
Our own event tracking
• HTML beacons
• Hadoop
• Cloud
[Diagram: web app (HTML, JS) -> event beacon -> event log -> Hadoop -> results]
Break / hack
https://github.com/etsy/ab
Building on top of the core API
Test builders
• Capture common patterns
• feature ramp ups
• opt-in experiments
• Help with test design
• weight equalization
• multivariate testing
Automatic Dispatchers
• Separate dispatching and work
• Work with components that have well-defined invocation APIs
• Define a particular level of granularity
• Feel like magic
Dispatcher example - MVC
• View dispatch
• Controller dispatch
• Spring framework, etc.
Selector Registry
• Reuse
• Clarity
• Documentation
$selectorReg = array(
    'staff'     => 'InternalUserSelector',
    'whitelist' => 'WhitelistSelector',
    'percent'   => 'WeightedSelector'
);
Randomized Selector
What does it mean?
• Independent of subject attributes
• Independent of other tests
• Independent of (coarse-grained) time
Persistence
• Better experience
• Better data
• Multi-part tests
• ...but not forever
Ramping up/down
• Vary group sizes
• Reduce risk
• Distribute load
Persistence + Ramping
• Minimize inconsistency
• Ramping up
• Should just add people to the treatment group
• Ramping down
• Should just remove part of the treatment group
rand()
• Explicit persistence
• Cookie
• DB
• Scaling
• Maintenance
Hashing
variant = H(id)
• Persistence
• Attribute independence
• Test independence?
variant = H(test id, id)
• Persistence
• Attribute independence
• Test independence
• What else? Weights!
h = H(test id, id)
variant = P(h, weights)
• Persistence
• Attribute independence
• Test independence
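A sketch of h = H(test id, id), scaled to a number in [0, 1). The function name, the choice of md5 and the '|' separator are assumptions; the slides do not name the hash used here.

```php
<?php
// h = H(test id, id), scaled to [0, 1). md5 is an assumption;
// the separator keeps ('ab', 'c') and ('a', 'bc') from colliding.
function ab_hash_unit(string $testId, string $subjectId): float {
    $hex = substr(md5($testId . '|' . $subjectId), 0, 7);  // 28 bits
    return hexdec($hex) / 0x10000000;
}

// Same inputs always give the same h: persistence with nothing stored.
$first  = ab_hash_unit('search_test', 'user:42');
$second = ab_hash_unit('search_test', 'user:42');

// A different test id reshuffles the same subject: test independence.
$other = ab_hash_unit('cart_test', 'user:42');
```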
Partitioning
[Diagram: the hash places each subject on a line from 0 to 1; a partition at .5 splits the line into variant A (left) and variant B (right).]
Ramping up
[Diagram: the hash line from 0 to 1 with the partition at .7, so A covers 0 to .7 and B covers .7 to 1.]
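A sketch of the partition step, variant = P(h, weights), taking h as an already computed number in [0, 1). The function name and the cumulative-weight walk are assumptions about one reasonable way to do it.

```php
<?php
// variant = P(h, weights): walk the cumulative weights until h falls
// inside a slice. $h is assumed to be in [0, 1); weights need not sum to 1.
function ab_partition(float $h, array $weights): string {
    $total = array_sum($weights);
    $upper = 0.0;
    foreach ($weights as $variant => $weight) {
        $upper += $weight / $total;
        if ($h < $upper) {
            return $variant;
        }
    }
    // Guard against floating-point rounding at the very top of the range.
    $names = array_keys($weights);
    return end($names);
}

$before = array('A' => 0.5, 'B' => 0.5);
$after  = array('A' => 0.7, 'B' => 0.3);   // ramp A up from .5 to .7
```

Moving the boundary from .5 to .7 only changes subjects whose h lies between the two values; everyone already in A stays in A.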
Which hash function?
• MD5/SHA-256/...
• Test it!
• But be careful...
A/B + opt-in
• Need to separate the groups for analysis
• Solution: use more than 2 variants!
• Act according to variant properties
• Track by variant name
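A sketch of the "more than 2 variants" idea: the opt-in group gets its own variant name but the same properties as the treatment group. The variant names, the 'new_search' property and the helper function are hypothetical.

```php
<?php
// Three variants, two behaviors. 'optin' and 'treatment' share the same
// property, so the app acts identically, but they log under different names.
$variants = array(
    'control'   => array('new_search' => false),
    'treatment' => array('new_search' => true),
    'optin'     => array('new_search' => true),
);

// The app acts on the variant's properties, never on its name.
function uses_new_search(array $variants, string $variantName): bool {
    return $variants[$variantName]['new_search'];
}
```

Analysis can then compare 'control' against 'treatment' and leave the self-selected 'optin' group out.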
Analysis
... Confidence interval ... something something ... Binomial ... blah blah ...
Confidence Intervals
• How sure are we?
• What if it were random?
Binomial experiments
H T H T T T H T H H
T H T H T T H H T H
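A rough sketch of a confidence interval for the first coin-flip sequence above. It uses the normal-approximation (Wald) interval, which is an assumption since the slides do not say which method the talk used, and which is crude for only 10 flips. The function name is hypothetical.

```php
<?php
// Normal-approximation (Wald) interval for a binomial proportion.
// $z = 1.96 corresponds to roughly 95% confidence.
function ab_confidence_interval(int $successes, int $trials, float $z = 1.96): array {
    $p = $successes / $trials;
    $margin = $z * sqrt($p * (1 - $p) / $trials);
    return array(max(0.0, $p - $margin), min(1.0, $p + $margin));
}

$flips = 'HTHTTTHTHH';                  // first sequence from the slide
$heads = substr_count($flips, 'H');     // 5 heads out of 10
list($low, $high) = ab_confidence_interval($heads, strlen($flips));
// $low is about 0.19 and $high about 0.81: ten flips say very little.
```

The same calculation applies to conversions: successes are converting subjects, trials are all subjects in the variant.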
Results
Dashboards
A few test design tips
What's the question?
What metrics?
How much better?
Who?
• Different roles
• Old vs new
• Novelty
• Habit
• Expectation
When?
• User types vary
• Activity patterns vary
• Site content might vary
• Performance might vary
• Full weeks are often a good starting point
Summary
Better living through experimentation
• More risk taking => better product
• Lower MTTR (mean time to recovery)
• Lower stress
You can too.