3/12/2016
Massive Online Experiments: Practical Advice
Joseph A. Konstan
Moderator
Massive Online Experiments?
In September 2000, Amazon.com outraged some customers when its own price discrimination was revealed. One buyer reportedly deleted the cookies on his computer that identified him as a regular Amazon customer. The result? He watched the price of a DVD offered to him for sale drop from $26.24 to $22.74. The company said the difference was the result of a random price test and offered to refund customers who paid the higher prices.
Is Offline Different?
Introducing Your Panelists
Duncan Watts, Microsoft Research
Jeff Hancock, Stanford University
Elizabeth Churchill, Google
Tweet Your Questions
#ACMatSXSW16
The A/B Illusion
Duncan Watts, Microsoft Research
SXSW Panel on Massive Online Experiments
Individual Decision Making
Traditionally, Policy Making Has Looked the Same
“Which Policy is Best: A or B?”
After some argument… POLICY A.
But also possibly… POLICY B.
We only ever see A or B, so we never know which would have been better.
Increasingly, Technology Allows Us to “A/B Test” Rather Than Argue
A/B Testing is pervasive in online settings
A/B Testing Can be Applied To Many (But Not All) Policy Decisions
Randomly assign POLICY A to 50% of users and POLICY B to the other 50%.
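A 50/50 split like this is commonly implemented by stable hashing of a user ID, so that each user always lands in the same bucket. A minimal sketch of that pattern (the function and experiment names here are illustrative, not from the panel):

```python
import hashlib

def assign_policy(user_id: str, experiment: str = "policy-test") -> str:
    """Deterministically bucket a user into POLICY A or POLICY B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user always gets the same bucket, and a large population
# splits roughly 50/50.
buckets = [assign_policy(f"user-{i}") for i in range(10_000)]
share_a = buckets.count("A") / len(buckets)
print(round(share_a, 2))
```

Seeding the hash with an experiment name keeps bucket assignments independent across different concurrent experiments.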
The A/B Illusion
• When it is unclear ex ante which of A and B is better:
  – It seems ethical to apply policy A to 100% of people
  – It seems ethical to apply policy B to 100% of people
• So why does it seem unethical to randomly assign policy A to 50% of people and policy B to the other 50%?
  – Facebook “Emotional Contagion” experiment
  – OkCupid matching experiment
• It is not because 50% are getting the “worse” treatment
  – That would be unethical, but it would also be unethical to give it to 100% of people
• Call this the “A/B Illusion” (Meyer and Chabris, 2015)
Why The Illusion?
• Is it just that “experiment” and “manipulation” have negative connotations?
  – Would “policy testing” or “learning what works” be better?
• Is it that randomization itself is bad?
  – Are arbitrary human decisions better than algorithms?
• Is it that experimentation concedes ignorance?
  – Would we rather keep making mistakes than acknowledge the limits of our knowledge?
• Is it that changing how we make decisions shifts power from traditional decision makers?
  – The “Moneyballization” of policy
• Something else?
You Manipulated What? Lessons from the FB Emotional Contagion Experiment
Jeff Hancock, Stanford University
SXSW Panel on Massive Online Experiments
Reaction from Users
1. Newsfeed is manipulated? (“How dare you manipulate my Newsfeed!”)
2. Newsfeed is important (“My friend’s dad just passed away, and if I was in this experiment I’d never have known”)
3. Emotions are distinct (“mood control” vs. “mind control”)
4. Big data is personal (“I want to know if I was in your experiment”)
Reaction from Media
“unwitting guinea pigs”
“treating people like laboratory rats”
“tweaking the newsfeed algorithm”
“tinkering with people’s emotions”
“manipulated users’ emotions”
[Chart: % of articles using “tinker,” “guinea pig,” and “manipulation”]
[Chart: % word count of first-person singular, anger, and anxiety words in FB coverage, emotion vs. non-emotion articles]
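The “% word count” metric behind these charts is the share of a text’s words that fall in a category lexicon (anger, anxiety, first-person singular, and so on). A toy sketch of that computation, with an illustrative word list rather than the actual category dictionaries used in the analysis:

```python
# Illustrative anxiety lexicon (not the real category dictionary).
ANXIETY = {"worried", "nervous", "afraid", "fear"}

def pct_word_count(text: str, category: set) -> float:
    """Percentage of words in `text` that belong to `category`."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    hits = sum(1 for w in words if w in category)
    return 100 * hits / len(words) if words else 0.0

# 2 of the 7 words are anxiety words.
print(round(pct_word_count("I am worried and afraid about this.", ANXIETY), 1))  # 28.6
```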
Reaction from Academia
1. Understanding users’ folk theories
2. Consent
3. Assessing risk besides privacy
3/12/2016
17
1. Evidence-based design
2. What is the control group?
3. Autonomous experiments
4. The 2% vs. 50% problem
EXPERIMENT(AL) DESIGN
Elizabeth F. Churchill
Key takeaway
Design the experiment as you design the experience.
Design programmatically for the bigger picture.
An engagement story, a cautionary tale
[Chart: per-session engagement metrics: duration of chat session, # of play/pause events, # of chats, # of scrubs]
Work with David Ayman Shamma
What to collect to measure engagement?
• Type of event (e.g., a player command or a normal chat message)
• Anonymous hash (i.e., uniquely identifies the sender and the receiver without exposing personal account data)
• Timestamp for the event
• The player time (with respect to the specific video) at the point the event occurred
• The number of characters and the number of words typed (for chat messages)
• Emoticons used in the chat message
• URL to the shared video
Work with David Ayman Shamma
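The fields above amount to an event schema. A minimal sketch of one such record (the class, field names, and hashing choice are my own illustration, not the panel’s actual schema):

```python
import hashlib
import time
from dataclasses import dataclass, asdict

@dataclass
class EngagementEvent:
    event_type: str        # e.g. "play", "pause", "scrub", "chat"
    pair_hash: str         # anonymous hash of the sender/receiver pair
    timestamp: float       # wall-clock time of the event
    player_time: float     # position in the video when the event occurred
    char_count: int = 0    # chat messages only
    word_count: int = 0    # chat messages only
    emoticons: tuple = ()  # emoticons found in the chat message
    video_url: str = ""    # URL of the shared video

def make_pair_hash(sender: str, receiver: str) -> str:
    """Identify the pair without exposing account data."""
    return hashlib.sha256(f"{sender}|{receiver}".encode()).hexdigest()[:16]

event = EngagementEvent(
    event_type="chat",
    pair_hash=make_pair_hash("alice", "bob"),
    timestamp=time.time(),
    player_time=42.5,
    char_count=11,
    word_count=3,
    emoticons=(":)",),
    video_url="https://example.com/video",
)
print(asdict(event)["event_type"])  # chat
```

Hashing the sender/receiver pair rather than storing raw IDs is one way to keep the log joinable per conversation while staying anonymous.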
Volume of actions over time – an arbitrary session
Work with David Ayman Shamma
Volume of actions over time – a human session
Work with David Ayman Shamma
Chat activity follows the video.
Work with David Ayman Shamma
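A “volume of actions over time” view like the ones above is just event timestamps bucketed into time bins. A minimal sketch, assuming per-event timestamps in seconds (function name and bin size are my own):

```python
from collections import Counter

def action_volume(timestamps: list[float], bin_seconds: int = 60) -> Counter:
    """Count events per fixed-width time bin."""
    return Counter(int(t // bin_seconds) for t in timestamps)

# Five events: two in minute 0, two in minute 1, one in minute 2.
vol = action_volume([5.0, 12.0, 61.0, 65.0, 130.0])
print(vol[0], vol[1], vol[2])  # 2 2 1
```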
www.90percentofeverything.com
Discover → Define → Develop → Deliver
(Explore, Evaluate; Explore, Evaluate)
Double Diamond model, Design Council: http://www.designcouncil.org.uk
A framework
EXPLORE (global): open questions, ethnographic studies, surveys, observations, market analysis
EXPLORE (local): based on prior and related data, A/B tests explore potential for large gains from small changes
EVALUATE (global): rough prototypes, A/B-tested; triangulate approaches & studies
EVALUATE (local): A/B tests for small changes, strong hypotheses, clarifying questions
Work with Rochelle King and Caitlin Tan
Key takeaway
Design the experiment as you design the experience.
Design programmatically for the bigger picture.
Questions?
Also: more information in our upcoming book.
Q&A
Remember: Tweet to #ACMatSXSW16