- Home
- Data & Analytics
*A/B Testing for Game Design*

prev

next

out of 102

View

73Category

## Data & Analytics

Embed Size (px)

Steve Collins CTO / Swrve

A/B Testing for Game Design Iteration: A Bayesian Approach

Biography

Push Notifications In-App MessagingA/B TestingTargeting & Analytics

Introduction to A/B Testing

Game Service

Test

Dev

Upd

ate Test

Dev

Upd

ate Test

Dev

Upd

ate Test

Dev

Upd

ate

Development Beta

Soft Launch

LAUNCHTest

Dev

Launch

On-board

1st experience Tutorial Flow Incentives / Promo

MonetizationIAPs, virtual items Promotions / oers / Ads Economy / Game Balance

RetentionMessaging / Push Notify Core loops, events Community / Social

EngagementFun factor! Content refresh / evolution Level balance, progress

LTV (+$)

Acquisition

Ads (CPC, CPI) Cross Promo Discovery, PR

UAC (-$)

ROI = LTV - UAC

ROI UACLTV =nX

x=1

ARPUx COSTx(1 +WACC)x

CACARPULTV =nX

x=1

ARPUx COSTx(1 +WACC)x

CACLTV =nX

x=1

ARPUx COSTx(1 +WACC)x

CACd

d

lifetime?

LTV

{

1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17" 18" 19" 20" 21" 22" 23" 24" 25" 26" 27" 28" 29" 30" 31" 32"

High Mean Low

Install-day-cohort 30-day revenue

Test%Hypotheses%Data$driven$

Take%ac'on%Iterate,'fail+fast'

Understand)Metrics()>(Analy0cs()>(Insight(

1. Split population 2. Show variations 4. Choose winner 4. Deploy winner3. Measure response

State Sync State Sync

Game State DBData Warehouse

Analyse

Events

(tagg

ed wit

h A or

B)

Game Server

Content Management System

A or B variant

A/B Test Server

What to test?

vs

Message layouts / content

choke point

Tutorial Flow

vs

25 25 -20% -40%

Promotion Discounts

Elasticity testing: exchange rate

$4.99 $4.99

100 coins

A B

200 coins

vs

Price set A

Price set B

Price set C

Store Inventory

(f) Zoomed in version of the previous scatterplot. Note that jitter and transparencyare used to make scatterplot easier to read; apparent banding is due to jitter. Itappears that most users who initially buy in for small amounts tend to contributeonly a small amount of total revenue.

2.1.4 Cheaters/Pirates

Up to now, all charts have removed spending that is not verified, or has beenfrom users marked as Cheaters.

We now look at the cumulative revenue histogram for just those users whohave been marked as cheaters, or failed verification or who have bought in,but never had a purchase verified.

8

Value of first purchase

Tota

l sp

end

$70$30$15

A/B Test

VIPs

Repeat Purchases

$5$2

Timing

Wait Sessions (3)

Wait Event(movie_closed)

In-app Msg"Request Facebook

Registration"FB Reg

when where

0 10000 20000 30000 40000 50000

0.0

0.2

0.4

0.6

0.8

1.0

number of user trials

Prob

abilit

y of w

inning

Canacandycrushbalt*

Day 1 retention = 30%

* Apologies to King and Adam Saltsman

Beta Test

Original New Version

Expecting 30% day-1 retention

After 50 users, we see 0%

Is this bad?

Null Hypothesis Testing (NHT) View

The Null Hypothesis

f(x) =1p22

e(x)2)

22

H0 : x !

H0 : x !

Accept

Reject

H0 : x =

30%

f(x) =1p22

e(x)2)

22Reject

0%

30%

Conclusion: 30% is unlikely to be the retention rate

Issue #1 p-value

p-Value

95%

2.5% 2.5%

f(x) =1p22

e(x)2)

22

P [r, n] =n!

(n r)!r!pr(1 p)nr

= np =np(1 p)

z =x

z =x

n

SE(x) =n

n =(z + z)2 2

2

+ 1.96

1.96

P [r, n] =n!

(n r)!r!pr(1 p)nr

= np =np(1 p)

z =x

z =x

n

SE(x) =n

n =(z + z)2 2

2

+ 1.96

1.96

p-Value

The probability of observing as extreme a result assuming the null hypothesis is true

The probability of the data given the model

OR

Null Hypothesis: H0

H H

Accept H Type-II Error

Reject H Type-I Error

Truth

Observation

False Positive

All we can ever say is either

not enough evidence that retention rates are the same

the retention rates are different, 95% of the time

p < 0.05

p-Value

actually...

The evidence supports a rejection of the null hypothesis, i.e.

the probability of seeing a result as extreme as this, assuming

the retention rate is actually 30%, is less than 5%.

p < 0.05

Issue #2 Peeking

Required change

Standard deviationNumber of

participants per group n 16

2

2

Peeks 5% Equivalent

1 2.9%

2 2.2%

3 1.8%

5 1.4%

10 1%

To get 5% false positive rate you need...

i.e. 5 timeshttp://www.evanmiller.org/how-not-to-run-an-ab-test.html

Issue #3 Family-wise Error

Family-wise Error

p = P( Type-I Error) = 0.05

5% of the time we will get a false positive - for one treatment

P( no Type-I Error) = 0.95

P( at least 1 Type-I Error for 2 treatments) = (1 - 0.9025) = 0.0975P( no Type-I Error for 2 treatments) = (0.95)(0.95) = 0.9025

Bayesian View

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

Retention

Density

Expected retention rate~30%

Belief

uncertainty

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

Retention

Density

~15%New belief after 0 retained users

Stronger belief after observing evidence

The probability of the model given the data

p(heads) = p(tails) = 0.5

Tossing the Coin

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

Number of flips

Prop

ortio

n of

Hea

dsHeads

Tails

THTHTTTHHTHH

1 5 10 50 500

0.0

0.2

0.4

0.6

0.8

1.0

Number of flips

Prop

ortio

n of

Hea

dsTossing the Coin

Long run average = 0.5

p(y|x) = p(y, x)p(x)

p(x|y) = p(x, y)p(y)

p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

p(x)

p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

Probability of xp(y|x) = p(y, x)p(x)

p(x|y) = p(x, y)p(y)

p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

p(x)

p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

Probability of x and y(conjoint)

p(y|x) = p(y, x)p(x)

p(x|y) = p(x, y)p(y)

p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

p(x)

p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

Probability of x given y(conditional)

Terminology

The Bernoulli distribution

For a fair coin, = 0.5

Head (H) = 1, Tails (T) = 0

p(x|) = x(1 )(1x)

1

A single toss:

p(x|) = x(1 )(1x)

p(heads = 1|0.5) = 0.51(10.5)(11) = 0.5

p(tails = 0|0.5) = 0.50(10.5)(10) = 0.5

1

p(x|) = x(1 )(1x)

p(heads = 1|0.5) = 0.51(10.5)(11) = 0.5

p(tails = 0|0.5) = 0.50(10.5)(10) = 0.5

1

The Binomial

p(x|) = x(1 )(1x)

1

Probability of heads in a single throw:

Probability of x heads in n throws:

p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

p(a > b) =

a>b

p(a|xa)p(b|xb) dA dB

p(x|, n) = x(1 )(nx)

B(a, b) =(a)(b)

(a+ b)=

(a 1)!(b 1)!(a+ b 1)!

beta(|a, b) = (a1)(1 )(b1) / B(a, b)

p(|x) = x(1 )(nx) p() / p(x)p0(|x0) = p(x0|) p() / p(x0)

p1(|x1) = p(x1|) p0() / p0(x1)

pn(|xn) = p(xn|) pn1() / pn1(xn)

p(|x) = x(1 )(nx)p()

x(1 )(nx)p()dztest =

pc pvpc(1pc)+pv(1pv)

n

p(h, 20|0.2)p() = p(x|)p()

p(x|) = x(1 )(1x)

p(heads = 1|0.5) = 0.51(1 0.5)(11) = 0.5

p(tails = 0|0.5) = 0.50(1 0.5)(10) = 0.5

p(h|n, ) =(

nh

)h(1 )(nh)

p(x|, n) =(

nx

)x(1 )(nx)

1

20 tosses of a fair coin

f(x) =1p22

e(x)2)

22

p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

p(a > b) =

a>b

p(a|xa)p(b|xb) dA dB

p(x|, n) = x(1 )(nx)

B(a, b) =(a)(b)

(a+ b)=

(a 1)!(b 1)!(a+ b 1)!

beta(|a, b) = (a1)(1 )(b1) / B(a, b)

p(|x) = x(1 )(nx) p() / p(x)p0(|x0) = p(x0|) p() / p(x0)

p1(|x1) = p(x1|) p0() / p0(x1)

pn(|xn) = p(xn|) pn1() / pn1(xn)

p(|x) = x(1 )(nx)p()

x(1 )(nx)p()dztest =

pc pvpc(1pc)+pv(1pv)

n

p(x|0.2, 20) p(x|0.5, 20)p() = p(x|)p()

p(x|) = x(1 )(1x)

p(heads = 1|0.5) = 0.51(1 0.5)(11) = 0.5

p(tails = 0|0.5) = 0.50(1 0.5)(10) = 0.5

p(h|n, ) =(

nh

)h(1 )(nh)

p(x|, n) =(

nx

)x(1 )(nx)

1

20 tosses of an un-fair coin

f(x) =1p22

e(x)2)

22

p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

p(a > b) =

a>b

p(a|xa)p(b|xb) dA dB

p(x|, n) = x(1 )(nx)

B(