3/12/2016
Massive Online Experiments: Practical Advice
Joseph A. Konstan
Moderator
Massive Online Experiments?
In September 2000, Amazon.com outraged some customers when its own price discrimination was revealed. One buyer reportedly deleted the cookies on his computer that identified him as a regular Amazon customer. The result? He watched the price of a DVD offered to him for sale drop from $26.24 to $22.74. The company said the difference was the result of a random price test and offered to refund customers who paid the higher prices.
Is Offline Different?
Introducing Your Panelists
Duncan Watts, Microsoft Research
Jeff Hancock, Stanford University
Elizabeth Churchill, Google
Tweet Your Questions
#ACMatSXSW16
The A/B Illusion
Duncan Watts, Microsoft Research
SXSW Panel on Massive Online Experiments
Individual Decision Making
Traditionally, Policy Making Has Looked the Same
“Which Policy is Best: A or B?”
After some argument… POLICY A.
But also possibly… POLICY B.
We only ever see A or B, so we never know which would have been better.
Increasingly, Technology Allows Us to “A/B Test” Rather Than Argue
A/B Testing is pervasive in online settings
A/B Testing Can be Applied To Many (But Not All) Policy Decisions
Randomly assign POLICY A to 50% of users and POLICY B to the other 50%.
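A 50/50 split like this is commonly implemented by stable hashing of a user ID, so that each user always lands in the same bucket. A minimal sketch of that pattern (the function and experiment names here are illustrative, not from the panel):

```python
import hashlib

def assign_policy(user_id: str, experiment: str = "policy-test") -> str:
    """Deterministically bucket a user into POLICY A or POLICY B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user always gets the same bucket, and a large population
# splits roughly 50/50.
buckets = [assign_policy(f"user-{i}") for i in range(10_000)]
share_a = buckets.count("A") / len(buckets)
print(round(share_a, 2))
```

Seeding the hash with an experiment name keeps bucket assignments independent across different concurrent experiments.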
The A/B Illusion
• When it is unclear ex ante which of A and B is better:
  – It seems ethical to apply policy A to 100% of people
  – It seems ethical to apply policy B to 100% of people
• So why does it seem unethical to randomly assign policy A to 50% of people and policy B to the other 50%?
  – Facebook “Emotional Contagion” experiment
  – OkCupid matching experiment
• It is not because 50% are getting the “worse” treatment
  – That would be unethical, but it would also be unethical to give it to 100% of people
• Call this the “A/B Illusion” (Meyer and Chabris, 2015)
Why The Illusion?
• Is it just that “experiment” and “manipulation” have negative connotations?
  – Would “policy testing” or “learning what works” be better?
• Is it that randomization itself is bad?
  – Are arbitrary human decisions better than algorithms?
• Is it that experimentation concedes ignorance?
  – Would we rather keep making mistakes than acknowledge the limits of our knowledge?
• Is it that changing how we make decisions shifts power from traditional decision makers?
  – The “Moneyballization” of policy
• Something else?
You Manipulated What? Lessons from the FB Emotional Contagion Experiment
Jeff Hancock, Stanford University
SXSW Panel on Massive Online Experiments
Reaction from Users
1. Newsfeed is manipulated? (“How dare you manipulate my Newsfeed!”)
2. Newsfeed is important (“My friend’s dad just passed away, and if I was in this experiment I’d never have known”)
3. Emotions are distinct (“mood control” vs. “mind control”)
4. Big data is personal (“I want to know if I was in your experiment”)
Reaction from Media
“unwitting guinea pigs”
“treating people like laboratory rats”
“tweaking the newsfeed algorithm”
“tinkering with people’s emotions”
“manipulated users’ emotions”
[Chart: % of articles using “tinker,” “guinea pig,” and “manipulation”]
[Chart: % word count of first-person singular, anger, and anxiety words in FB coverage, emotion vs. non-emotion articles]
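The “% word count” metric behind these charts is the share of a text’s words that fall in a category lexicon (anger, anxiety, first-person singular, and so on). A toy sketch of that computation, with an illustrative word list rather than the actual category dictionaries used in the analysis:

```python
# Illustrative anxiety lexicon (not the real category dictionary).
ANXIETY = {"worried", "nervous", "afraid", "fear"}

def pct_word_count(text: str, category: set) -> float:
    """Percentage of words in `text` that belong to `category`."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    hits = sum(1 for w in words if w in category)
    return 100 * hits / len(words) if words else 0.0

# 2 of the 7 words are anxiety words.
print(round(pct_word_count("I am worried and afraid about this.", ANXIETY), 1))  # 28.6
```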
Reaction from Academia
1. Understanding users’ folk theories
2. Consent
3. Assessing risk besides privacy
3/12/2016
17
1. Evidence-based design
2. What is the control group?
3. Autonomous experiments
4. The 2% vs. 50% problem
EXPERIMENT(AL) DESIGN
Elizabeth F. Churchill
Key takeaway
Design the experiment as you design the experience.
Design programmatically for the bigger picture.
An engagement story, a cautionary tale
[Chart: per-session engagement metrics: duration of chat session, # of play/pause events, # of chats, # of scrubs]
Work with David Ayman Shamma
What to collect to measure engagement?
• Type of event (e.g., a player command or a normal chat message)
• Anonymous hash (i.e., uniquely identifies the sender and the receiver without exposing personal account data)
• Timestamp for the event
• The player time (with respect to the specific video) at the point the event occurred
• The number of characters and the number of words typed (for chat messages)
• Emoticons used in the chat message
• URL to the shared video
Work with David Ayman Shamma
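The fields above amount to an event schema. A minimal sketch of one such record (the class, field names, and hashing choice are my own illustration, not the panel’s actual schema):

```python
import hashlib
import time
from dataclasses import dataclass, asdict

@dataclass
class EngagementEvent:
    event_type: str        # e.g. "play", "pause", "scrub", "chat"
    pair_hash: str         # anonymous hash of the sender/receiver pair
    timestamp: float       # wall-clock time of the event
    player_time: float     # position in the video when the event occurred
    char_count: int = 0    # chat messages only
    word_count: int = 0    # chat messages only
    emoticons: tuple = ()  # emoticons found in the chat message
    video_url: str = ""    # URL of the shared video

def make_pair_hash(sender: str, receiver: str) -> str:
    """Identify the pair without exposing account data."""
    return hashlib.sha256(f"{sender}|{receiver}".encode()).hexdigest()[:16]

event = EngagementEvent(
    event_type="chat",
    pair_hash=make_pair_hash("alice", "bob"),
    timestamp=time.time(),
    player_time=42.5,
    char_count=11,
    word_count=3,
    emoticons=(":)",),
    video_url="https://example.com/video",
)
print(asdict(event)["event_type"])  # chat
```

Hashing the sender/receiver pair rather than storing raw IDs is one way to keep the log joinable per conversation while staying anonymous.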
Volume of actions over time – an arbitrary session
Work with David Ayman Shamma
Volume of actions over time – a human session
Work with David Ayman Shamma
Chat activity follows the video.
Work with David Ayman Shamma
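A “volume of actions over time” view like the ones above is just event timestamps bucketed into time bins. A minimal sketch, assuming per-event timestamps in seconds (function name and bin size are my own):

```python
from collections import Counter

def action_volume(timestamps: list[float], bin_seconds: int = 60) -> Counter:
    """Count events per fixed-width time bin."""
    return Counter(int(t // bin_seconds) for t in timestamps)

# Five events: two in minute 0, two in minute 1, one in minute 2.
vol = action_volume([5.0, 12.0, 61.0, 65.0, 130.0])
print(vol[0], vol[1], vol[2])  # 2 2 1
```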
www.90percentofeverything.com
Discover → Define → Develop → Deliver
(Explore, Evaluate; Explore, Evaluate)
Double Diamond model, Design Council: http://www.designcouncil.org.uk
A framework
EXPLORE (global): open questions, ethnographic studies, surveys, observations, market analysis
EXPLORE (local): based on prior and related data, A/B tests explore potential for large gains from small changes
EVALUATE (global): rough prototypes, A/B-tested; triangulate approaches & studies
EVALUATE (local): A/B tests for small changes, strong hypotheses, clarifying questions
Work with Rochelle King and Caitlin Tan
Key takeaway
Design the experiment as you design the experience.
Design programmatically for the bigger picture.
Questions?
Also: more information in our upcoming book.
Q&A
Remember: Tweet to #ACMatSXSW16