Measuring For Impact: Knowing What, and How to A/B Test
You know you should A/B test.
You also know you should exercise more, eat less sugar, spend less on coffee, wear sunscreen, etc., etc.
(Don’t worry, I’m not going to say anything else about sugar or sunscreen.)
So, how do you create a culture in which people will constructively A/B test?
Do six things.
1. Embrace “I don’t know”
We have 2+ ideas.
I don’t know which one will be more effective.
2. Have Data, Choose Metrics
To test, you need:
• People using your product
• (Approximate) agreement on the metrics that matter
Not Many Users? Don’t A/B test!
• Laserlike has ~60 users and has never run an A/B test
• We will run many, many tests when we have enough users
• A test should have at least a few hundred instances (and a lot more if effect sizes are likely to be small)
• Test iff you can have “business significance”
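To make "a few hundred instances, and a lot more if effects are small" concrete, here is a rough sketch using the standard normal-approximation formula for comparing two conversion rates. The function name, the 5% significance level, and the 80% power are my assumptions for illustration, not numbers from the talk.

```python
from math import ceil
from statistics import NormalDist


def users_per_bucket(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH bucket to detect a relative lift
    in a conversion rate (two-sided test, normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 10% baseline conversion.
big_effect = users_per_bucket(0.10, 0.50)    # hoping for a 50% relative lift
small_effect = users_per_bucket(0.10, 0.05)  # hoping for a 5% relative lift
print(big_effect, small_effect)
```

Under these assumptions, a big expected effect needs only hundreds of users per bucket, while a small one needs tens of thousands. With ~60 users, neither is reachable.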
Know What You Want to Optimize
• If it’s important, you should be running tests to improve it
• If it’s not important, spend time on other things
• Most tests should be aimed at improving 1-2 specific variables
3. Have Clear Process, Tech for Testing
A/B Testing Process
• New feature: if possible, roll out to a small test subset first (10s or 100s of thousands)
• Version change: always test things that could (cumulatively) have business impact
• Everyone on the product team should be running and resolving tests
A/B Testing Tech
• Using a third-party testing service is akin to building your site on WordPress: great at some scales/competency levels
• No matter how you’re testing, a new test should be at most a few lines of code
• It should be easy to see how each side of a test compares across many variables
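One way "a few lines of code" per test can look in practice is deterministic, hash-based bucketing. This is a minimal sketch, not any particular company's framework; the `bucket` helper, the test name, and the variant labels are all hypothetical.

```python
import hashlib


def bucket(user_id, test_name, variants=("control", "treatment")):
    """Assign a user to a variant. The same user and test always map to
    the same variant, and different tests split users independently."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Starting a new test is then a single branch at the call site.
if bucket(user_id=42, test_name="signup_button_copy") == "treatment":
    button_text = "Join free"
else:
    button_text = "Sign up"
```

Hashing on the test name as well as the user ID keeps a user's bucket in one test from predicting their bucket in another. Logging the assigned variant alongside each metric is what makes the side-by-side comparison across many variables possible.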
4. Understand the Math of What to Test
Process: Same vs. New Tweak
• What’s the probability your tweak will have a positive effect?
• What kind of effect might that have, and how might that effect change the company’s prospects?
• Will you be able to measure the change?
• Optimize on one variable, but look at others
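The first two questions above can be combined into a back-of-the-envelope expected value for ranking candidate tweaks. This is only a sketch: the formula ignores possible negative effects, and the candidate names and all numbers below are made up for illustration.

```python
def expected_gain(p_positive, lift_if_positive, baseline_value):
    """Chance the tweak helps, times how much it helps, times what the
    metric is worth today (e.g. annual revenue flowing through the page)."""
    return p_positive * lift_if_positive * baseline_value


# Hypothetical candidates: (name, P(positive), relative lift, baseline value)
candidates = [
    ("new email subject", 0.5, 0.01, 200_000),
    ("shorter signup form", 0.3, 0.02, 1_000_000),
    ("button color", 0.2, 0.002, 1_000_000),
]

ranked = sorted(candidates, key=lambda c: expected_gain(*c[1:]), reverse=True)
for name, *params in ranked:
    print(f"{name}: {expected_gain(*params):,.0f}")
```

The guesses will be rough, but writing them down forces the "how might this change the company's prospects?" conversation before the test is built.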
Process: Same vs. Big Change
• What’s the probability that your change will have a negative impact?
• How big an impact might there be?
• Will you be able to measure the change?
• Holistic approach
A/B Test for Quality
• Circle of Moms: tested “warning” users when their questions seemed short or low quality
• Resulting questions were graded for quality, without grader knowing test bucket
• End result: warning yielded ~5% fewer questions, but much higher quality
5. Understand the Math of Picking Winners
Resolving Too Soon vs. Resolving Too Late
• How big is the potential audience for this test?
• Example 1: end of year “most popular baby names” email that will never be sent again
• Example 2: Facebook signup flow
Longitudinal Tests vs. Immediate Tests
• Longitudinal: change home page, email frequency, product framing
• Need to examine effect over a long period
• Immediate: change button color, email subject
• Likely that long-term effects will be minimal
Automatically Resolve Tests?
• Longitudinal tests should not be automatically resolved
• Example: new home page design
• Immediate tests can be automatically resolved when speed is important and there is one clear objective function
• Example: Circle of Moms email subject optimization
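For an immediate test with one clear objective, an automatic resolver might look like the sketch below. It uses a pooled two-proportion z-test; the function name, the minimum bucket size, and the strict 1% threshold are assumptions of mine, not the Circle of Moms implementation. Checking a test repeatedly inflates false positives, which is one reason to pair a minimum sample with a strict threshold.

```python
from math import sqrt
from statistics import NormalDist


def resolve(conv_a, n_a, conv_b, n_b, min_per_bucket=1000, alpha=0.01):
    """Return "A" or "B" if one side has clearly won, else None."""
    if min(n_a, n_b) < min_per_bucket:
        return None  # too early to call
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return None  # no conversions (or all conversions) on both sides
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= alpha:
        return None  # difference not yet convincing
    return "B" if z > 0 else "A"


# Hypothetical email subject test: opens out of sends for each subject line.
print(resolve(conv_a=100, n_a=2000, conv_b=160, n_b=2000))
print(resolve(conv_a=100, n_a=2000, conv_b=105, n_b=2000))
```

A resolver like this only sees the one metric it is given, which is why it suits email subjects and not a new home page design.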
Choose robust statistics
• Bad: # of page views
• Good: % of users viewing at least [5, 25, 100] pages
• Potentially bad: # of sales (when small)
• Potentially good: # of people getting through the second step of a sales funnel
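A small illustration of why raw page views is a fragile metric and a threshold share is not. The per-user numbers are invented for the example; the single outlier stands in for a crawler or one extremely heavy user.

```python
def mean_page_views(views):
    """Average page views per user: sensitive to a single extreme user."""
    return sum(views) / len(views)


def share_reaching(views, threshold):
    """Fraction of users with at least `threshold` page views."""
    return sum(v >= threshold for v in views) / len(views)


# Made-up page views per user in each bucket.
control = [1, 2, 3, 6, 8, 4, 2, 7, 5, 3]
treatment = [2, 3, 5, 6, 9, 5, 3, 8, 6, 4]

# Same control bucket, but one user is replaced by a runaway outlier.
control_with_outlier = list(control)
control_with_outlier[4] = 5000

for name, views in [("control", control),
                    ("control + outlier", control_with_outlier),
                    ("treatment", treatment)]:
    print(name, mean_page_views(views), share_reaching(views, 5))
```

With the outlier, the mean says control wins by a huge margin, while the share of users reaching 5 pages is unchanged and still favors treatment. Small raw counts like # of sales are fragile for a similar reason: a few events swing the result.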
6. Celebrate A/B Testing Successes