Experimental Design
Sergei Vassilvitskii, Columbia University
Computational Social Science, April 5, 2013
Thursday, April 25, 13
Measurement
"Half the money I spend on advertising is wasted; the trouble is, I don't know which half."
– John Wanamaker, 1875
Helping John:
Idea 1
Measure the final effect:
– Track total store sales, compare to advertising budget
Findings:
– Total sales are typically higher after intense advertising
Problems:
– Stores advertise when people tend to spend anyway: Christmas shopping periods, travel during the summer, ski gear in winter, etc.
Correlation vs. Causation
Idea 1
This is a within-subject pre-test/post-test design.
Idea 2
"Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it."
– Magid Abraham, CEO, President & Co-Founder of comScore, in an HBR article (2008)
Idea 2
Measure the difference between people who see ads and those who don't.
Findings:
– People who see the ads are more likely to react to them
Problems:
– Ads are finely targeted: these are exactly the people who are likely to click! (Don't advertise cars in fashion magazines.)
– The effect is even more extreme online: which ads are shown depends on the user's propensity to click on them.
Idea 3
Matching:
– Compare people who saw an ad with similar people who didn't see it but are otherwise "the same."
Problems:
– Hard to define "the same." Beware of lurking variables.
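The matching idea can be sketched in a few lines. The users, covariates, and outcomes below are entirely hypothetical; the sketch only illustrates pairing each exposed user with an unexposed user who looks "the same" on observed covariates:

```python
# Toy matching sketch (hypothetical users and covariates): pair each
# ad-exposed user with an unexposed user who has the same observed profile.
exposed = [
    {"age": "18-25", "usage": "heavy", "purchased": 1},
    {"age": "26-35", "usage": "light", "purchased": 0},
]
unexposed = [
    {"age": "18-25", "usage": "heavy", "purchased": 0},
    {"age": "26-35", "usage": "light", "purchased": 0},
    {"age": "26-35", "usage": "heavy", "purchased": 1},
]

def profile(user):
    # "The same" here means: identical on the observed covariates only.
    return (user["age"], user["usage"])

pairs = []
pool = list(unexposed)
for e in exposed:
    match = next((u for u in pool if profile(u) == profile(e)), None)
    if match is not None:
        pool.remove(match)
        pairs.append((e, match))

# Estimated effect: average outcome difference within matched pairs.
effect = sum(e["purchased"] - u["purchased"] for e, u in pairs) / len(pairs)
print(effect)  # 0.5
```

Any covariate left out of `profile` is a potential lurking variable: two "matched" users can still differ on it.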
Case Study: Ad Wear-Out
What is the optimal number of times to show an ad?
Few:
– Don't want the user to be annoyed
– No need to waste money if the ad is ineffective
Many:
– Make sure the user sees it
– Reinforce the message
Observational Study
Look through the data:
– Find the users who saw the ad once
– Find the users who saw the ad many times
– Measure revenue for the two sets of users
Conclusion: limit the number of impressions.
Correlations
Why did some users see the ad only once?
– They must use the web differently
– Some sign on once a week just to check email
– Others are always online
Correct conclusion:
– People who visit the homepage often are unlikely to click on ads
– We have not measured the effect of wear-out
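The confound can be seen in a toy simulation. The user counts and click rates below are made up for illustration: every user's click probability is constant per impression, so there is no wear-out at all in this world, yet the one-impression group still shows a far higher click-through rate:

```python
import random

random.seed(0)
# Hypothetical population: light users see the ad once and click often;
# heavy users see it 20 times and almost never click.
users = ([{"imps": 1, "p_click": 0.02} for _ in range(5000)] +
         [{"imps": 20, "p_click": 0.001} for _ in range(5000)])

def group_ctr(group):
    clicks = sum(random.random() < u["p_click"]
                 for u in group for _ in range(u["imps"]))
    imps = sum(u["imps"] for u in group)
    return clicks / imps

once = [u for u in users if u["imps"] == 1]
many = [u for u in users if u["imps"] > 1]
# The "once" group has a much higher CTR even though no individual user's
# click probability changes with repeated exposure.
print(group_ctr(once), group_ctr(many))
```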
Simpson's Paradox
Kidney stones [real data].
You have kidney stones. There are two treatments, A and B.
– Empirically, treatment A is effective 78% of the time
– Empirically, treatment B is effective 83% of the time
– Which one do you choose?
Simpson's Paradox
Digging into the data, you see:
If the stones are large:
– Treatment A is effective 73% of the time
– Treatment B is effective 69% of the time
If the stones are small:
– Treatment A is effective 93% of the time
– Treatment B is effective 87% of the time
Overall:
– Treatment A is effective 78% of the time
– Treatment B is effective 83% of the time
Simpson's Paradox: Summary Stats

             A               B
Small        81/87 (93%)     234/270 (87%)
Large        192/263 (73%)   55/80 (69%)
Combined     273/350 (78%)   289/350 (83%)
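The reversal can be checked directly from the counts in the table: treatment A wins within each stone size, yet loses overall, because the severe (large-stone) cases were disproportionately given treatment A:

```python
# Counts from the kidney-stone summary table: (successes, trials).
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

for treatment, groups in data.items():
    per_group = {size: s / n for size, (s, n) in groups.items()}
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    print(treatment, per_group, "combined:", total_s / total_n)

# A is better for small stones (93% vs 87%) and for large stones
# (73% vs 69%), but worse combined (78% vs 83%).
```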
Idea 3
Matching:
– Compare people who saw an ad with similar people who didn't see it but are otherwise "the same."
Problems:
– Hard to define "the same." Beware of lurking variables.
– Simpson's Paradox
Getting at Causation
Randomized, controlled experiments:
– Select a target population
– Randomly decide whom to show the ad
– Subjects cannot influence whether they are in the treatment or control group
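A randomized experiment along these lines can be sketched as follows. The population size, baseline conversion rate, and lift are invented purely for illustration:

```python
import random

random.seed(1)
population = list(range(10_000))   # target population of user IDs
random.shuffle(population)
treatment = population[:5_000]     # randomly chosen half sees the ad
control = population[5_000:]       # the other half does not

def converts(saw_ad):
    # Hypothetical behavior: 5% baseline conversion, +5pp lift from the ad.
    return random.random() < (0.10 if saw_ad else 0.05)

t_rate = sum(converts(True) for _ in treatment) / len(treatment)
c_rate = sum(converts(False) for _ in control) / len(control)
print(f"estimated lift: {t_rate - c_rate:+.3f}")
```

Because assignment is random, user traits (shopping season, ad targeting, browsing habits) are balanced across the two groups in expectation, so the difference in rates estimates the ad's causal effect rather than a correlation.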
Measuring Wear-Out
[Figure: "parallel universe" diagram showing the same population split into control and treatment groups]
Creating Parallel Universes
When a user first arrives:
– Check the browser cookie, assign to the control or treatment group
– Control group: shown a PSA
– Treatment group: shown the ad
– The treatment stays the same on repeated visits
Advertising effects:
– Positive!
– But smaller than reported by observational studies
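One common way to implement the cookie-based split is to hash the cookie ID, so the same user lands in the same group on every visit. This is a generic sketch of that idea, not Yahoo!'s actual implementation:

```python
import hashlib

def assign_group(cookie_id: str, experiment: str = "ad-wearout") -> str:
    # Hash the (experiment, cookie) pair so different experiments get
    # independent splits, and the same user always gets the same group.
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

# Stable on repeated visits: the same cookie always maps to the same group.
assert assign_group("cookie-abc123") == assign_group("cookie-abc123")
```

Hashing rather than storing the assignment keeps the split stateless: any server can recompute a user's group without a lookup.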
Online Experiments
Advantages:
– Can reach tens of millions of people!
  • Can estimate very small effects: Lewis et al., "Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising" (WWW 2011) estimate effects as small as 0.01%!
– Can be relatively cheap (e.g., Mechanical Turk)
– Can recruit diverse subjects
  • Not just "20 students in a large Midwestern university." Try to avoid subjects drawn only from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
WEIRD People
Which line is longer?
– Henrich, Joseph; Heine, Steven J.; Norenzayan, Ara (2010). "The weirdest people in the world?" Working Paper Series des Rates für Sozial- und Wirtschaftsdaten.
Online Experiments
More advantages:
– Access: subjects in other countries, geographically diverse
– Can be quick
Challenges:
– Limited choice in the range of treatments (no MRI studies)
– Do people behave differently offline?
External Validity
A major challenge in all lab experiments:
– Virtual and physical labs alike
– Do findings hold outside the lab?
Enter:
– Natural experiments
Natural Experiments
The experimental condition:
– Is not decided by the experimenter
– But is exogenous: subjects have no influence over which condition they end up in
Case Study: Ad Wear-Out
Back to ad wear-out.
Natural experiment:
– When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
– This was by engineering design: both campaigns got an equal share of pageviews, and random selection is simpler and easier to distribute than a round-robin system.
Experiments:
– Compare the behavior of people who saw the same total number of ads, but a different number from each campaign.
Case Study: Ad Wear-Out: Results
Yes, wear-out exists for some ads:
– Some advertisements see a 5x drop in click-through rate after the first exposure
– These typically have very high initial click-through rates
No, not for others:
– Other ads see no decrease in click-through rate even after ten exposures
– These have lower, but steady, click-through rates
Case Study 2: Yelp
Does a higher Yelp rating lead to higher revenue?
How to do the experiment?
– Observational: no causality.
– Controlled: would require deception.
– Natural?
Case Study 2: Yelp
Natural experiment:
– Yelp rounds ratings to the nearest half star.
– A 4.24 average becomes 4 stars; a 4.26 average becomes 4.5 stars.
Data:
– Raw ratings from Yelp
– Restaurant revenue (from tax records)
Finding: a one-star increase leads to a 5-9% increase in revenue.
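The rounding threshold that powers this natural experiment is easy to state in code: restaurants on either side of a boundary are essentially identical in raw quality, yet display half a star apart.

```python
def display_stars(raw_rating: float) -> float:
    # Round to the nearest half star, as the slide describes.
    return round(raw_rating * 2) / 2

print(display_stars(4.24))  # 4.0
print(display_stars(4.26))  # 4.5
```

Comparing revenue just below vs. just above a boundary is a regression-discontinuity-style design: any revenue difference can be attributed to the displayed stars rather than to underlying quality.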
Case Study 3: Badges
How do badges influence user behavior?
Specifically:
– The "Epic" badge on Stack Overflow.
– Awarded after hitting the daily maximum number of points (through posts, responses, etc.) on 50 distinct days.
Experimental design:
– Within-subject pre-test/post-test (again)
– Look at user behavior before/after receiving the badge
– Average over different users, different timings, and (hopefully) all other factors.
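The per-user pre/post comparison can be sketched as follows; the data layout and the toy numbers are hypothetical:

```python
from datetime import date, timedelta

def pre_post_change(daily_activity, badge_day, window=30):
    """Average daily activity in the `window` days after the badge
    minus the average in the `window` days before it."""
    pre = [daily_activity.get(badge_day - timedelta(days=d), 0)
           for d in range(1, window + 1)]
    post = [daily_activity.get(badge_day + timedelta(days=d), 0)
            for d in range(1, window + 1)]
    return sum(post) / window - sum(pre) / window

# Toy user: 2 actions/day before the badge, 3 actions/day after.
badge = date(2013, 4, 5)
activity = {badge + timedelta(days=d): (3 if d > 0 else 2)
            for d in range(-30, 31) if d != 0}
print(pre_post_change(activity, badge))  # 1.0
```

Averaging this change over many users and many badge dates is what lets the design (hopefully) wash out other time-varying factors.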
Case Study 3: Badges
Results: [figure]
Overall
Experimental design is hard!
– Be extra skeptical in your analyses: there are lots of spurious correlations.
Experiments:
– Natural and controlled experiments are the best way to measure effects.
Observational data:
– Sometimes the best you can do
– Can lead to interesting descriptive insights
– But beware of correlations!