A/B Testing for Game Design

Embed Size (px)

Text of A/B Testing for Game Design

  • Steve Collins CTO / Swrve

    A/B Testing for Game Design Iteration: A Bayesian Approach

  • Biography

  • Push Notifications In-App MessagingA/B TestingTargeting & Analytics

  • Introduction to A/B Testing

  • Game Service

    Test

    Dev

    Upd

    ate Test

    Dev

    Upd

    ate Test

    Dev

    Upd

    ate Test

    Dev

    Upd

    ate

    Development Beta

    Soft Launch

    LAUNCHTest

    Dev

    Launch

  • On-board

    1st experience Tutorial Flow Incentives / Promo

    MonetizationIAPs, virtual items Promotions / oers / Ads Economy / Game Balance

    RetentionMessaging / Push Notify Core loops, events Community / Social

    EngagementFun factor! Content refresh / evolution Level balance, progress

    LTV (+$)

    Acquisition

    Ads (CPC, CPI) Cross Promo Discovery, PR

    UAC (-$)

  • ROI = LTV - UAC

  • ROI UACLTV =nX

    x=1

    ARPUx COSTx(1 +WACC)x

    CACARPULTV =nX

    x=1

    ARPUx COSTx(1 +WACC)x

    CACLTV =nX

    x=1

    ARPUx COSTx(1 +WACC)x

    CACd

    d

    lifetime?

    LTV

    {

  • 1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17" 18" 19" 20" 21" 22" 23" 24" 25" 26" 27" 28" 29" 30" 31" 32"

    High Mean Low

    Install-day-cohort 30-day revenue

  • Test%Hypotheses%Data$driven$

    Take%ac'on%Iterate,'fail+fast'

    Understand)Metrics()>(Analy0cs()>(Insight(

  • 1. Split population 2. Show variations 4. Choose winner 4. Deploy winner3. Measure response

  • State Sync State Sync

    Game State DBData Warehouse

    Analyse

    Events

    (tagg

    ed wit

    h A or

    B)

    Game Server

    Content Management System

    A or B variant

    A/B Test Server

  • What to test?

  • vs

    Message layouts / content

  • choke point

    Tutorial Flow

  • vs

    25 25 -20% -40%

    Promotion Discounts

  • Elasticity testing: exchange rate

    $4.99 $4.99

    100 coins

    A B

    200 coins

    vs

  • Price set A

    Price set B

    Price set C

    Store Inventory

  • (f) Zoomed in version of the previous scatterplot. Note that jitter and transparencyare used to make scatterplot easier to read; apparent banding is due to jitter. Itappears that most users who initially buy in for small amounts tend to contributeonly a small amount of total revenue.

    2.1.4 Cheaters/Pirates

    Up to now, all charts have removed spending that is not verified, or has beenfrom users marked as Cheaters.

    We now look at the cumulative revenue histogram for just those users whohave been marked as cheaters, or failed verification or who have bought in,but never had a purchase verified.

    8

    Value of first purchase

    Tota

    l sp

    end

    $70$30$15

    A/B Test

    VIPs

    Repeat Purchases

    $5$2

  • Timing

    Wait Sessions (3)

    Wait Event(movie_closed)

    In-app Msg"Request Facebook

    Registration"FB Reg

    when where

  • 0 10000 20000 30000 40000 50000

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    number of user trials

    Prob

    abilit

    y of w

    inning

  • Canacandycrushbalt*

    Day 1 retention = 30%

    * Apologies to King and Adam Saltsman

  • Beta Test

    Original New Version

  • Expecting 30% day-1 retention

    After 50 users, we see 0%

    Is this bad?

  • Null Hypothesis Testing (NHT) View

  • The Null Hypothesis

    f(x) =1p22

    e(x)2)

    22

    H0 : x !

    H0 : x !

    Accept

    Reject

    H0 : x =

    30%

  • f(x) =1p22

    e(x)2)

    22Reject

    0%

    30%

  • Conclusion: 30% is unlikely to be the retention rate

  • Issue #1 p-value

  • p-Value

    95%

    2.5% 2.5%

    f(x) =1p22

    e(x)2)

    22

    P [r, n] =n!

    (n r)!r!pr(1 p)nr

    = np =np(1 p)

    z =x

    z =x

    n

    SE(x) =n

    n =(z + z)2 2

    2

    + 1.96

    1.96

    P [r, n] =n!

    (n r)!r!pr(1 p)nr

    = np =np(1 p)

    z =x

    z =x

    n

    SE(x) =n

    n =(z + z)2 2

    2

    + 1.96

    1.96

  • p-Value

    The probability of observing as extreme a result assuming the null hypothesis is true

    The probability of the data given the model

    OR

  • Null Hypothesis: H0

    H H

    Accept H Type-II Error

    Reject H Type-I Error

    Truth

    Observation

    False Positive

  • All we can ever say is either

    not enough evidence that retention rates are the same

    the retention rates are different, 95% of the time

    p < 0.05

    p-Value

  • actually...

    The evidence supports a rejection of the null hypothesis, i.e.

    the probability of seeing a result as extreme as this, assuming

    the retention rate is actually 30%, is less than 5%.

    p < 0.05

  • Issue #2 Peeking

  • Required change

    Standard deviationNumber of

    participants per group n 16

    2

    2

  • Peeks 5% Equivalent

    1 2.9%

    2 2.2%

    3 1.8%

    5 1.4%

    10 1%

    To get 5% false positive rate you need...

    i.e. 5 timeshttp://www.evanmiller.org/how-not-to-run-an-ab-test.html

  • Issue #3 Family-wise Error

  • Family-wise Error

    p = P( Type-I Error) = 0.05

    5% of the time we will get a false positive - for one treatment

    P( no Type-I Error) = 0.95

    P( at least 1 Type-I Error for 2 treatments) = (1 - 0.9025) = 0.0975P( no Type-I Error for 2 treatments) = (0.95)(0.95) = 0.9025

  • Bayesian View

  • 0.0 0.2 0.4 0.6 0.8 1.0

    02

    46

    810

    Retention

    Density

    Expected retention rate~30%

    Belief

    uncertainty

  • 0.0 0.2 0.4 0.6 0.8 1.0

    02

    46

    810

    Retention

    Density

    ~15%New belief after 0 retained users

    Stronger belief after observing evidence

  • The probability of the model given the data

  • p(heads) = p(tails) = 0.5

  • Tossing the Coin

    0 10 20 30 40 50

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Number of flips

    Prop

    ortio

    n of

    Hea

    dsHeads

    Tails

    THTHTTTHHTHH

  • 1 5 10 50 500

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Number of flips

    Prop

    ortio

    n of

    Hea

    dsTossing the Coin

    Long run average = 0.5

  • p(y|x) = p(y, x)p(x)

    p(x|y) = p(x, y)p(y)

    p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

    p(x)

    p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

    Probability of xp(y|x) = p(y, x)p(x)

    p(x|y) = p(x, y)p(y)

    p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

    p(x)

    p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

    Probability of x and y(conjoint)

    p(y|x) = p(y, x)p(x)

    p(x|y) = p(x, y)p(y)

    p(y, x) = p(x, y) = p(y|x)p(x) = p(x|y)p(y)p(y|x) = p(x|y)p(y)

    p(x)

    p(y|x) = p(x|y)p(y)Py p(x|y)p(y)

    Probability of x given y(conditional)

    Terminology

  • The Bernoulli distribution

    For a fair coin, = 0.5

    Head (H) = 1, Tails (T) = 0

    p(x|) = x(1 )(1x)

    1

    A single toss:

    p(x|) = x(1 )(1x)

    p(heads = 1|0.5) = 0.51(10.5)(11) = 0.5

    p(tails = 0|0.5) = 0.50(10.5)(10) = 0.5

    1

    p(x|) = x(1 )(1x)

    p(heads = 1|0.5) = 0.51(10.5)(11) = 0.5

    p(tails = 0|0.5) = 0.50(10.5)(10) = 0.5

    1

  • The Binomial

    p(x|) = x(1 )(1x)

    1

    Probability of heads in a single throw:

    Probability of x heads in n throws:

    p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

    p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

    p(a > b) =

    a>b

    p(a|xa)p(b|xb) dA dB

    p(x|, n) = x(1 )(nx)

    B(a, b) =(a)(b)

    (a+ b)=

    (a 1)!(b 1)!(a+ b 1)!

    beta(|a, b) = (a1)(1 )(b1) / B(a, b)

    p(|x) = x(1 )(nx) p() / p(x)p0(|x0) = p(x0|) p() / p(x0)

    p1(|x1) = p(x1|) p0() / p0(x1)

    pn(|xn) = p(xn|) pn1() / pn1(xn)

    p(|x) = x(1 )(nx)p()

    x(1 )(nx)p()dztest =

    pc pvpc(1pc)+pv(1pv)

    n

    p(h, 20|0.2)p() = p(x|)p()

    p(x|) = x(1 )(1x)

    p(heads = 1|0.5) = 0.51(1 0.5)(11) = 0.5

    p(tails = 0|0.5) = 0.50(1 0.5)(10) = 0.5

    p(h|n, ) =(

    nh

    )h(1 )(nh)

    p(x|, n) =(

    nx

    )x(1 )(nx)

    1

  • 20 tosses of a fair coin

    f(x) =1p22

    e(x)2)

    22

    p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

    p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

    p(a > b) =

    a>b

    p(a|xa)p(b|xb) dA dB

    p(x|, n) = x(1 )(nx)

    B(a, b) =(a)(b)

    (a+ b)=

    (a 1)!(b 1)!(a+ b 1)!

    beta(|a, b) = (a1)(1 )(b1) / B(a, b)

    p(|x) = x(1 )(nx) p() / p(x)p0(|x0) = p(x0|) p() / p(x0)

    p1(|x1) = p(x1|) p0() / p0(x1)

    pn(|xn) = p(xn|) pn1() / pn1(xn)

    p(|x) = x(1 )(nx)p()

    x(1 )(nx)p()dztest =

    pc pvpc(1pc)+pv(1pv)

    n

    p(x|0.2, 20) p(x|0.5, 20)p() = p(x|)p()

    p(x|) = x(1 )(1x)

    p(heads = 1|0.5) = 0.51(1 0.5)(11) = 0.5

    p(tails = 0|0.5) = 0.50(1 0.5)(10) = 0.5

    p(h|n, ) =(

    nh

    )h(1 )(nh)

    p(x|, n) =(

    nx

    )x(1 )(nx)

    1

  • 20 tosses of an un-fair coin

    f(x) =1p22

    e(x)2)

    22

    p(|x, n) = x(1 )(nx) (a1)(1 )(b1) / B(a, b) p(x, n)

    p(|x, n) = x+a1(1 )(nx+b1) / B(x+ a, n x+ b)

    p(a > b) =

    a>b

    p(a|xa)p(b|xb) dA dB

    p(x|, n) = x(1 )(nx)

    B(