
Computational Social Science, Lecture 10: Online Experiments


Page 1: Computational Social Science, Lecture 10: Online Experiments

Experimental Design

Sergei VassilvitskiiColumbia University

Computational Social ScienceApril 5, 2013

Thursday, April 25, 13

Page 2: Computational Social Science, Lecture 10: Online Experiments

Measurement

“Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”

- John Wanamaker

Page 3: Computational Social Science, Lecture 10: Online Experiments

Measurement

“Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”

- John Wanamaker, 1875

Page 4: Computational Social Science, Lecture 10: Online Experiments

Helping John:

Page 5: Computational Social Science, Lecture 10: Online Experiments

Helping John:

Idea 1: Measure the final effect:
– Track total store sales, compare to advertising budget

Page 6: Computational Social Science, Lecture 10: Online Experiments

Idea 1:

Idea 1: Measure the final effect:
– Track total store sales, compare to advertising budget

Findings:
– Total sales typically higher after intense advertising

Page 7: Computational Social Science, Lecture 10: Online Experiments

Idea 1:

Idea 1: Measure the final effect:
– Track total store sales, compare to advertising budget

Findings:
– Total sales typically higher after intense advertising

Problems:
– Stores advertise when people tend to spend
– Christmas shopping periods
– Travel during the summer
– Ski gear in winter, etc.

Page 8: Computational Social Science, Lecture 10: Online Experiments

Correlation vs. Causation

Page 9: Computational Social Science, Lecture 10: Online Experiments

Idea 1

Within-subject pre-test, post-test design.

Page 10: Computational Social Science, Lecture 10: Online Experiments

Idea 2

“Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”

Page 11: Computational Social Science, Lecture 10: Online Experiments

Idea 2

“Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”
– Magid Abraham, CEO, President & Co-Founder of ComScore, in HBR article (2008)

Page 12: Computational Social Science, Lecture 10: Online Experiments

Idea 2

Measure the difference between people who see ads and who don’t.

Page 13: Computational Social Science, Lecture 10: Online Experiments

Idea 2

Measure the difference between people who see ads and who don’t.

Findings:
– People who see the ads are more likely to react to them

Page 14: Computational Social Science, Lecture 10: Online Experiments

Idea 2

Measure the difference between people who see ads and who don’t.

Findings:
– People who see the ads are more likely to react to them

Problems:
– Ads are finely targeted. These are exactly the people who are likely to click!
– Don’t advertise cars in fashion magazines.
– Even more extreme online -- which ads are shown depends on the propensity of the user to click on the ad.

Page 15: Computational Social Science, Lecture 10: Online Experiments

Idea 3

Matching:
– Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.”

Page 16: Computational Social Science, Lecture 10: Online Experiments

Idea 3

Matching:
– Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.”

Problems:
– Hard to define “the same.” Beware of lurking variables.

Page 17: Computational Social Science, Lecture 10: Online Experiments

Ad Wear-out

What is the optimal number of times to show an ad?

Page 18: Computational Social Science, Lecture 10: Online Experiments

Case Study: Ad Wear-out

What is the optimal number of times to show an ad?

Few:
– Don’t want the user to be annoyed
– No need to waste money if the ad is ineffective

Many:
– Make sure the user sees it
– Reinforce the message

Page 19: Computational Social Science, Lecture 10: Online Experiments

Observational Study

Look through the data:
– Find the users who saw the ad once
– Find the users who saw the ad many times

Page 20: Computational Social Science, Lecture 10: Online Experiments

Observational Study

Look through the data:
– Find the users who saw the ad once
– Find the users who saw the ad many times

Measure revenue for the two sets of users.

Conclusion: Limit the number of impressions

Page 21: Computational Social Science, Lecture 10: Online Experiments

Correlations

Why did some users only see the ad once?
– They must use the web differently
– Some: sign on once a week to check email
– Others: are always online

Page 22: Computational Social Science, Lecture 10: Online Experiments

Correlations

Why did some users only see the ad once?
– They must use the web differently
– Some: sign on once a week to check email
– Others: are always online

Correct conclusion:
– People who visit the homepage often are unlikely to click on ads
– Have not measured the effect of wear-out

Page 23: Computational Social Science, Lecture 10: Online Experiments

Idea 3

Matching:
– Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.”

Problems:
– Hard to define “the same.” Beware of lurking variables.

Page 24: Computational Social Science, Lecture 10: Online Experiments

Simpson’s Paradox

Kidney Stones [Real Data].

You have kidney stones. There are two treatments, A & B.
– Empirically, treatment A is effective 78% of the time
– Empirically, treatment B is effective 83% of the time
– Which one do you choose?

Page 25: Computational Social Science, Lecture 10: Online Experiments

Simpson’s Paradox

Kidney Stones [Real Data].

You have kidney stones. There are two treatments, A & B. Digging into the data you see:

If they are large:
– Treatment A is effective 73% of the time
– Treatment B is effective 69% of the time

If they are small:
– Treatment A is effective 93% of the time
– Treatment B is effective 87% of the time

Page 26: Computational Social Science, Lecture 10: Online Experiments

Simpson’s Paradox

If they are large:
– Treatment A is effective 73% of the time
– Treatment B is effective 69% of the time

If they are small:
– Treatment A is effective 93% of the time
– Treatment B is effective 87% of the time

Overall:
– Treatment A is effective 78% of the time
– Treatment B is effective 83% of the time

Page 27: Computational Social Science, Lecture 10: Online Experiments

Simpson’s Paradox: Summary Stats

A B

Small 81/87 (93%) 234/270 (87%)

Large 192/263 (73%) 55/80 (69%)

Combined 273/350 (78%) 289/350 (83%)
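The paradox in the table can be verified directly; a small sketch, using the numbers from the table above and exact arithmetic via fractions:

```python
from fractions import Fraction

# (successes, total) from the summary table
data = {
    "Small": {"A": (81, 87), "B": (234, 270)},
    "Large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return Fraction(successes, total)

# Treatment A wins within each subgroup...
for size, arms in data.items():
    assert rate(*arms["A"]) > rate(*arms["B"]), size

# ...yet B wins overall: A was disproportionately given the harder (large) cases.
totals = {arm: (sum(data[s][arm][0] for s in data),
                sum(data[s][arm][1] for s in data)) for arm in ("A", "B")}
assert rate(*totals["A"]) < rate(*totals["B"])
print(float(rate(*totals["A"])), float(rate(*totals["B"])))  # 0.78 vs. ~0.826
```

The flip comes entirely from the unequal case mix: the subgroup sizes (87 vs. 263 for A, 270 vs. 80 for B) act as the lurking variable.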


Page 28: Computational Social Science, Lecture 10: Online Experiments

Idea 3

Matching:
– Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.”

Problems:
– Hard to define “the same.” Beware of lurking variables.
– Simpson’s Paradox

Page 29: Computational Social Science, Lecture 10: Online Experiments

Getting at Causation

Randomized, controlled experiments.
– Select a target population
– Randomly decide whom to show the ad
– Subjects cannot influence whether they are in the treatment or control groups

Page 30: Computational Social Science, Lecture 10: Online Experiments

Measuring Wear-out

Parallel Universe

Page 31: Computational Social Science, Lecture 10: Online Experiments

Measuring Wear-out

Parallel Universe

Control    Treatment

Page 32: Computational Social Science, Lecture 10: Online Experiments

Measuring Wear-out

Parallel Universe

Control    Treatment

Page 33: Computational Social Science, Lecture 10: Online Experiments

Creating Parallel Universes

When a user first arrives:
– Check browser cookie, assign to control or treatment group
– Control group: shown a PSA
– Treatment group: shown the ad
– Treatment stays the same on repeated visits
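A minimal sketch of this assignment scheme (a hypothetical helper, not from the lecture): hash the browser cookie ID so the same visitor always lands in the same group, which also keeps the treatment stable across repeated visits.

```python
import hashlib

def assign_group(cookie_id: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically map a cookie ID to 'control' or 'treatment'."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"

# Stable on repeated visits, and the subject cannot choose their group:
assert assign_group("cookie-abc") == assign_group("cookie-abc")
```

Because the assignment depends only on the hash, no server-side state is needed: any machine computes the same answer for the same cookie.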


Page 34: Computational Social Science, Lecture 10: Online Experiments

Creating Parallel Universes

When a user first arrives:
– Check browser cookie, assign to control or treatment group
– Control group: shown a PSA
– Treatment group: shown the ad
– Treatment stays the same on repeated visits

Advertising effects:
– Positive!
– But smaller than reported through observational studies

Page 35: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:

Page 36: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!

Page 37: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!
– Can be relatively cheap (Mechanical Turk)

Page 38: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects. Lewis et al., “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising” (WWW 2011) estimate effects of 0.01%!
– Can be relatively cheap
– Can recruit diverse subjects
• “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).

Page 39: Computational Social Science, Lecture 10: Online Experiments

WEIRD People

Which line is longer?

– Henrich, Joseph; Heine, Steven J.; Norenzayan, Ara (2010): “The weirdest people in the world?” Working Paper Series des Rates für Sozial- und Wirtschaftsdaten

Page 40: Computational Social Science, Lecture 10: Online Experiments

WEIRD People

Page 41: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects.
– Can be relatively cheap
– Can recruit diverse subjects
• “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
– Access: subjects in other countries, geographically diverse
– Can be quick

Page 42: Computational Social Science, Lecture 10: Online Experiments

Online Experiments

Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects.
– Can be relatively cheap
– Can recruit diverse subjects
• “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
– Access: subjects in other countries, geographically diverse
– Can be quick

Challenges:
– Limited choice in range of treatments (no MRI studies)
– Do people behave differently offline?

Page 43: Computational Social Science, Lecture 10: Online Experiments

External Validity

Major challenge in all lab experiments:
– Virtual and physical labs
– Do findings hold outside the lab?

Enter:
– Natural experiments

Page 44: Computational Social Science, Lecture 10: Online Experiments

Natural Experiments

The experimental condition:
– Is not decided by the experimenter
– But is exogenous (subjects have no influence on which condition they receive)

Page 45: Computational Social Science, Lecture 10: Online Experiments

Case Study: Ad Wear-out

Back to ad wear-out.

Natural Experiment:
– When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
– This was by engineering design -- both campaigns got an equal share of pageviews. (Less complex and easier to distribute than a round-robin system.)

Few:
– Don’t want the user to be annoyed
– No need to waste money if the ad is ineffective

Many:
– Make sure the user sees it
– Reinforce the message

Page 46: Computational Social Science, Lecture 10: Online Experiments

Case Study: Ad Wear-out

Natural Experiment:
– When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
– This was by engineering design -- both campaigns got an equal share of pageviews. (Less complex and easier to distribute than a round-robin system.)

Experiments:
– Compare the behavior of people who saw the same total number of ads, but different numbers of each campaign.

Page 47: Computational Social Science, Lecture 10: Online Experiments

Case Study: Ad Wear-out

Yes:
– Some advertisements see a 5x drop in click-through rate after the first exposure
– These typically have very high click-through rates

No:
– Others see no decrease in click-through rate even after ten exposures
– Have lower, but steady, click-through rates
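The wear-out comparison boils down to computing click-through rate as a function of exposure count. A sketch over a hypothetical impression log (the time-ordered (user, clicked) tuple format is an assumption, not Yahoo!'s actual schema):

```python
from collections import defaultdict

def ctr_by_exposure(impressions):
    """CTR at the 1st, 2nd, ... exposure, from a time-ordered log."""
    seen = defaultdict(int)    # exposures so far, per user
    shown = defaultdict(int)   # impressions at each exposure index
    clicks = defaultdict(int)  # clicks at each exposure index
    for user, clicked in impressions:
        seen[user] += 1
        k = seen[user]
        shown[k] += 1
        clicks[k] += int(clicked)
    return {k: clicks[k] / shown[k] for k in sorted(shown)}

log = [("u1", True), ("u2", False), ("u1", False), ("u2", False), ("u1", False)]
print(ctr_by_exposure(log))  # {1: 0.5, 2: 0.0, 3: 0.0}
```

A wear-out curve that collapses after exposure 1 looks like the "Yes" ads above; a flat curve looks like the "No" ads.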


Page 48: Computational Social Science, Lecture 10: Online Experiments

Case Study 2: Yelp

Does a higher Yelp rating lead to higher revenue?

How to do the experiment?

Page 49: Computational Social Science, Lecture 10: Online Experiments

Case Study 2: Yelp

Does a higher Yelp rating lead to higher revenue?

How to do the experiment?
– Observational -- no causality.
– Controlled -- would require deception.
– Natural?

Page 50: Computational Social Science, Lecture 10: Online Experiments

Case Study 2: Yelp

Does a higher Yelp rating lead to higher revenue?

Natural Experiment:
– Yelp rounds ratings to the nearest half star.
– 4.24 becomes 4 stars; 4.26 becomes 4.5 stars
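The rounding rule is just nearest-half-star; a sketch (Python's built-in round breaks exact ties toward even, which the slide's examples avoid):

```python
def displayed_stars(raw_rating: float) -> float:
    """Round a raw average rating to the nearest half star, as displayed."""
    return round(raw_rating * 2) / 2

print(displayed_stars(4.24))  # 4.0
print(displayed_stars(4.26))  # 4.5
```

Restaurants at 4.24 and 4.26 are nearly identical in quality but display different stars, so comparing revenue just on either side of a rounding threshold isolates the effect of the displayed rating itself.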


Page 51: Computational Social Science, Lecture 10: Online Experiments

Case Study 2: Yelp

Natural Experiment:
– Yelp rounds ratings to the nearest half star.
– 4.24 becomes 4 stars; 4.26 becomes 4.5 stars

Data:
– Raw ratings from Yelp
– Restaurant revenue (from tax records)

Page 52: Computational Social Science, Lecture 10: Online Experiments

Case Study 2: Yelp

Natural Experiment:
– Yelp rounds ratings to the nearest half star.
– 4.24 becomes 4 stars; 4.26 becomes 4.5 stars

Data:
– Raw ratings from Yelp
– Restaurant revenue (from tax records)
– Finding: a one-star increase leads to a 5-9% increase in revenue.

Page 53: Computational Social Science, Lecture 10: Online Experiments

Case Study 3: Badges

How do badges influence user behavior?

Specifically:
– The “epic” badge on stackoverflow.
– Awarded after hitting the maximum number of points (through posts, responses, etc.) on 50 distinct days.

Page 54: Computational Social Science, Lecture 10: Online Experiments

Case Study 3: Badges

How do badges influence user behavior?

Specifically:
– The “epic” badge on stackoverflow.
– Awarded after hitting the maximum number of points (through posts, responses, etc.) on 50 distinct days.

Experimental Design:
– Within-subject pre-post test (again)
– Look at user behavior before/after receiving the badge

– Averaged over different users, different timings, and (hopefully) all other factors.
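The pre-post design can be sketched as a per-user before/after average; averaging this difference across users then gives the estimate. The function and data shape below are hypothetical, not the study's actual code:

```python
def pre_post_effect(daily_activity, badge_day, window=30):
    """Change in mean daily activity in the window after vs. before the badge day."""
    pre = [daily_activity.get(d, 0) for d in range(badge_day - window, badge_day)]
    post = [daily_activity.get(d, 0) for d in range(badge_day + 1, badge_day + 1 + window)]
    return sum(post) / window - sum(pre) / window

# A user who posts 5x/day before the badge and 3x/day after:
activity = {d: 5 for d in range(70, 100)} | {d: 3 for d in range(101, 131)}
print(pre_post_effect(activity, badge_day=100))  # -2.0
```

Because users receive the badge at different times, averaging over users also averages out (hopefully) time-of-year and site-wide trends.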


Page 55: Computational Social Science, Lecture 10: Online Experiments

Case Study 3: Badges

Results:

Page 56: Computational Social Science, Lecture 10: Online Experiments

Overall

Experimental design is hard!
– Be extra skeptical in your analyses. Lots of spurious correlations.

Experiments:
– Natural and controlled experiments are the best way to measure effects

Observational Data:
– Sometimes the best you can do
– Can lead to interesting descriptive insights
– But beware of correlations!