Experimenting on Humans
Aviran Mordo Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Sagy Rozman Back-end Guild master
www.linkedin.com/in/sagyrozman
@sagyrozman
Wix In Numbers
• Over 55M users + 1M new users/month
• Static storage is >1.5 PB of data
• 3 data centers + 3 clouds (Google, Amazon, Azure)
• 1.5B HTTP requests/day
• 900 people work at Wix, of which ~300 in R&D
1542 (A/B Tests in 3 months)
• Basic A/B testing
• Experiment driven development
• PETRI – Wix’s 3rd generation open source experiment system
• Challenges and best practices
• How to (code samples)
Agenda
11:31 A/B Test
To B or NOT to B?
A
B
Home page results (How many registered)
Experiment Driven Development
This is the Wix editor
Our gallery manager
What can we improve?
Is this better?
Don’t be a loser
Product Experiments Toggles & Reporting
Infrastructure
How do you know what is running?
If I “know” it is better, do I really need to test it?
Why so many?
Sign-up → Choose Template → Edit site → Publish → Premium
The theory
Result = Fail
Intent matters
• EVERY new feature is A/B tested
• We open the new feature to a % of users
○ Measure success
○ If it is better, we keep it
○ If worse, we check why and improve
• If flawed, the impact is just for % of our users
Conclusion
Start with 50% / 50% ?
• New code can have bugs
• Conversion can drop
• Usage can drop
• Unexpected cross test dependencies
Sh*t happens (Test could fail)
• Language
• GEO
• Browser
• User-agent
• OS
Minimize affected users (in case of failure) Gradual exposure (percentage of…)
• Company employees
• User roles
• Any other criteria you have (extendable)
• All users
• First time visitors = Never visited wix.com
• New registered users = Untainted users
Not all users are equal
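The filters above (language, GEO, user type, and so on) can be thought of as predicates that must all pass before a visitor is tossed into an experiment. The following is a minimal sketch of that idea; the `Visitor` class, its fields, and the `EligibilityFilters` helpers are hypothetical names for illustration and are not Petri's API.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical visitor model; field names are illustrative, not Petri's.
final class Visitor {
    final String language;
    final String country;
    final boolean employee;
    final boolean newlyRegistered;

    Visitor(String language, String country, boolean employee, boolean newlyRegistered) {
        this.language = language;
        this.country = country;
        this.employee = employee;
        this.newlyRegistered = newlyRegistered;
    }
}

// Hypothetical helper: composes filters so that a visitor must pass all of them.
final class EligibilityFilters {
    // A visitor is eligible only if every filter accepts them.
    static boolean isEligible(Visitor v, List<Predicate<Visitor>> filters) {
        return filters.stream().allMatch(f -> f.test(v));
    }

    static Predicate<Visitor> languageIs(String language) {
        return v -> language.equals(v.language);
    }

    static Predicate<Visitor> countryIs(String country) {
        return v -> country.equals(v.country);
    }

    static Predicate<Visitor> employeesOnly() {
        return v -> v.employee;
    }

    static Predicate<Visitor> newRegisteredUsersOnly() {
        return v -> v.newlyRegistered;
    }
}
```

Keeping each criterion as its own predicate is one way to make the filter set extendable: a new criterion is just another predicate in the list.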
We need that feature
…and failure is not an option
Defensive Testing
Adding a mobile view
First trial failed
Performance had to be improved
Halting the test results in loss of data. What can we do about it?
Solution – Pause the experiment!
• Maintain NEW experience for already exposed users
• No additional users will be exposed to the NEW feature
PETRI’s pause implementation
• Use cookies to persist assignment
○ If the user changes browser, the assignment is unknown
• Server-side persistence solves this
○ You pay in performance & scalability
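The pause behavior can be sketched as follows: users with a persisted assignment keep it, while users without one get the old experience and nothing is stored for them. This is an illustrative model only, not PETRI's implementation; `PausableExperiment` is a hypothetical class, and the in-memory map stands in for whichever store is chosen (cookie or server side).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a pausable experiment with "old" and "new" groups.
final class PausableExperiment {
    // Stands in for the persisted assignment (cookie or server-side store).
    private final Map<String, String> persisted = new HashMap<>();
    private boolean paused = false;

    void pause() { paused = true; }

    void resume() { paused = false; }

    // tossedGroup is the group a fresh toss would assign to this visitor.
    String groupFor(String visitorId, String tossedGroup) {
        String existing = persisted.get(visitorId);
        if (existing != null) {
            return existing;   // already exposed: keep the same experience
        }
        if (paused) {
            return "old";      // paused: no new exposure, nothing is persisted
        }
        persisted.put(visitorId, tossedGroup);
        return tossedGroup;
    }
}
```

Because nothing is persisted while paused, a visitor who saw the old experience during the pause can still be tossed normally once the experiment resumes.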
Decision
Keep feature Drop feature
Improve code & resume experiment
Keep backwards compatibility for exposed users forever?
Migrate users to another equivalent feature
Drop it altogether (users lose data/work)
The road to success
• Numbers look good but sample size is small
• We need more data!
• Expand
Reaching statistical significance
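To illustrate why good-looking numbers on a small sample are not enough, here is a minimal sketch of a standard two-proportion z-test on conversion counts. The slides do not say which statistical test Wix uses, so the choice of test, the 1.96 threshold (95% confidence, two-sided), and the class name `TwoProportionZTest` are assumptions made for illustration.

```java
// Sketch of a pooled two-proportion z-test; not taken from Petri.
final class TwoProportionZTest {
    // z-score for the difference between group B's and group A's conversion rates.
    // Returns NaN or infinity when the pooled rate is exactly 0 or 1.
    static double zScore(long convertedA, long totalA, long convertedB, long totalB) {
        double rateA = (double) convertedA / totalA;
        double rateB = (double) convertedB / totalB;
        double pooled = (double) (convertedA + convertedB) / (totalA + totalB);
        double standardError =
                Math.sqrt(pooled * (1 - pooled) * (1.0 / totalA + 1.0 / totalB));
        return (rateB - rateA) / standardError;
    }

    // True when |z| exceeds 1.96, i.e. significant at the 95% level (two-sided).
    static boolean significantAt95(long convertedA, long totalA, long convertedB, long totalB) {
        return Math.abs(zScore(convertedA, totalA, convertedB, totalB)) > 1.96;
    }
}
```

With the same 10% vs. 11% conversion rates, `significantAt95(10, 100, 11, 100)` is false while `significantAt95(1000, 10000, 1100, 10000)` is true, which is the reason for expanding the experiment to more users.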
Test Group (B): 25% 50% 75% 100%
Control Group (A): 75% 50% 25% 0%
Keep user experience consistent
Control Group (A)
Test Group (B)
• Signed-in user (Editor)
○ Test group assignment is determined by the user ID
○ Guarantees toss persistence across browsers
• Anonymous user (Home page)
○ Test group assignment is randomly determined
○ Cannot guarantee a persistent experience if the user changes browser
• 11% of Wix users use more than one desktop browser
Keeping persistent UX
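One common way to derive a group from the user ID is to hash it into a fixed bucket and compare the bucket with the current exposure percentage. The sketch below assumes that approach; the slides only say the assignment is determined by the user ID, so the CRC32 hash, the 100-bucket scheme, and the `DeterministicToss` class are illustrative assumptions, not Petri's actual algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical deterministic toss based on a hash of experiment key and user ID.
final class DeterministicToss {
    // Maps (experimentKey, userId) to a stable bucket in the range [0, 100).
    static int bucket(String experimentKey, String userId) {
        CRC32 crc = new CRC32();
        crc.update((experimentKey + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // Users whose bucket is below testPercent get the test group (B).
    static String group(String experimentKey, String userId, int testPercent) {
        return bucket(experimentKey, userId) < testPercent ? "B" : "A";
    }
}
```

Because the bucket never changes, the same user gets the same group in any browser, and raising the exposure from 25% to 50% only moves users from A to B, never the other way. Including the experiment key in the hash keeps the buckets of different experiments independent of each other.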
There is MORE than one
# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824
Possible states >= 2^(# experiments)
Wix has ~200 active experiments: 2^200 ≈ 1.606938e+60 possible states
A/B testing introduces complexity
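The numbers above follow from treating each experiment as having at least two groups, so n independent experiments give at least 2^n combinations. A small sketch that reproduces them (the class name `ExperimentStates` is made up for illustration):

```java
import java.math.BigInteger;

// Lower bound on the number of system states, assuming two groups per experiment.
final class ExperimentStates {
    static BigInteger minStates(int experiments) {
        return BigInteger.valueOf(2).pow(experiments);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 30, 200}) {
            System.out.println(n + " experiments -> " + minStates(n) + " states");
        }
    }
}
```

Experiments with more than two test groups push the count higher, which is why the slide states the bound as "greater than or equal to".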
• Override options (URL parameters, cookies, headers…)
• Near real time user BI tools
• Integrated developer tools in the product
Support tools
Define
Code
Experiment
Expand
Merge code
Close
• Spec = Experiment template (in the code)
○ Define test groups
○ Mandatory limitations (filters, user types)
○ Scope = Group of related experiments (usually by product)
• Why is it needed
○ Type safety
○ Preventing human errors (typos, user types)
○ Controlled by the developer (developer knows about the context)
○ Conducting experiments in batch
Define spec
public class ExampleSpecDefinition extends SpecDefinition {
    @Override
    protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) {
        return builder
            .withOwner("OWNERS_EMAIL_ADDRESS")
            .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE"))
            .withTestGroups(asList("Group A", "Group B"));
    }
}
Spec code snippet
• Experiment = “If” statement in the code
Conducting experiment
final String result = laboratory.conductExperiment(key, fallback, new StringConverter());
if (result.equals("group a")) {
    // execute group a's logic
} else if (result.equals("group b")) {
    // execute group b's logic
}
// In case conducting the experiment failed, the fallback value is returned.
// In this case you would usually execute the 'old' logic.
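The fallback contract described in the comments can be illustrated with a small stand-in: if the toss fails for any reason, the caller still gets a usable value. This is a sketch of the contract only, not Petri's implementation; `SafeConduct` and its `conduct` method are hypothetical, and the `Supplier` stands in for the real call to the experiment system.

```java
import java.util.function.Supplier;

// Hypothetical stand-in that shows "return the fallback on failure".
final class SafeConduct {
    // Returns the tossed group, or the fallback if the toss throws or yields null.
    static String conduct(Supplier<String> toss, String fallback) {
        try {
            String result = toss.get();
            return result != null ? result : fallback;
        } catch (RuntimeException e) {
            // A failing experiment system must not break the product flow.
            return fallback;
        }
    }
}
```

Choosing the value of the old behavior as the fallback means an outage of the experiment system degrades to the experience users already had.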
• Upload the specs to Petri server
○ Enables defining an experiment instance
Upload spec
{
  "creationDate" : "2014-01-09T13:11:26.846Z",
  "updateDate" : "2014-01-09T13:11:26.846Z",
  "scopes" : [
    { "name" : "html-editor", "onlyForLoggedInUsers" : true },
    { "name" : "html-viewer", "onlyForLoggedInUsers" : false }
  ],
  "testGroups" : [ "old", "new" ],
  "persistent" : true,
  "key" : "clientExperimentFullFlow1",
  "owner" : ""
}
Start new experiment (limited population)
Manage experiment states
1. Convert A/B Test to Feature Toggle (100% ON)
2. Merge the code
3. Close the experiment
4. Remove experiment instance
Ending successful experiment
• Define spec
• Use Petri client to conduct experiment in the code (defaults to old)
• Sync spec
• Open experiment
• Manage experiment state
• End experiment
Experiment lifecycle
Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing
• Expose features internally to company employees
• Enable continuous deployment with feature toggles
• Select assignment by sites (not only by users)
• Automatic selection of winning group*
• Exposing feature to #n of users*
• Integration with Jira
* Planned feature
Other things we (will) do with Petri
Petri is now an open source project https://github.com/wix/petri
Q&A
https://github.com/wix/petri http://goo.gl/L7pHnd
• Modeled experiment lifecycle
• Open source (developed using TDD from day 1)
• Running at scale on production
• No deployment necessary
• Both back-end and front-end experiments
• Flexible architecture
Why Petri
[Architecture diagram labels: PETRI Server, Your app, Laboratory, DB, Logs]