Applied Bayesian Inference with PyMC @MrSantoni


Page 1

Page 2

Which color will sell more?

[Figure: two mock product pages, Page A and Page B, each titled "A Tea Pot" with placeholder text and a BUY button]

Page 3

[Figure: the same mock pages A and B, each annotated with its conversion rate, #buy / N]

Page 4

• What if N is small?
• How large must N be to have 90% confidence?
• What if N is different on A and B?

Page 5

Bayesian Inference

Page 6

Claim: we think Bayesian

Probability:
• Frequentist: frequency
• Bayesian: belief

Page 7

Claim: we think Bayesian

[Figure: "no-bugs confidence" plotted after test 1, test 2 and test 3]

Page 8

Bayesian Inference = update your prior belief with new evidence
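To make "prior belief + new evidence" concrete, here is a minimal sketch in plain Python (not PyMC) for a click probability p. It assumes a Beta(alpha, beta) prior, which is conjugate to the Bernoulli likelihood, so each observed visit only increments a count; Beta(1, 1) is the Uniform(0, 1) prior used later with PyMC. The visit sequence is hypothetical.

```python
def update(alpha, beta, click):
    """Return the Beta(alpha, beta) posterior after one Bernoulli observation."""
    if click not in (0, 1):
        raise ValueError("click must be 0 or 1")
    return (alpha + 1, beta) if click else (alpha, beta + 1)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Beta(1, 1) is the Uniform(0, 1) prior: every value of p is equally plausible.
alpha, beta = 1, 1
visits = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical: one BUY click in ten visits
for click in visits:
    alpha, beta = update(alpha, beta, click)  # each posterior becomes the next prior

print(alpha, beta, round(posterior_mean(alpha, beta), 3))  # 2 10 0.167
```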

Page 9

The Developer View

Statistical Problem

def frequentist(): return 80%
def bayesian(): return [a probability distribution over 0% to 100%]

Page 10

How to?

[Figure: a probability distribution over 0% to 100%]

Page 11

How to?

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Closed-form solution: available for toy examples, not for realistic cases.

[Figure: a probability distribution over 0% to 100%]
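Bayes' theorem can also be evaluated numerically when no closed form is at hand. A minimal sketch, assuming a grid of 101 candidate values for a click probability p and a uniform prior over them: the denominator P(B) becomes a plain sum over the grid. The counts (2 clicks out of 50 visits) are the ones reported later in this deck for N = 50. Grids scale badly as the number of parameters grows, which is one motivation for sampling methods such as MCMC.

```python
def grid_posterior(clicks, visitors, grid_size=101):
    """Posterior over a grid of p values: P(p | data) = P(data | p) P(p) / P(data)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # uniform prior over the grid points
    likelihood = [p ** clicks * (1 - p) ** (visitors - clicks) for p in grid]
    unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # P(data): the denominator of Bayes' theorem
    return grid, [u / evidence for u in unnormalized]


grid, posterior = grid_posterior(clicks=2, visitors=50)
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode)  # 0.04, the observed frequency 2 / 50
```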

Page 12

PyMC

Page 13

PyMC

• Perform Bayesian inference
• Markov chain Monte Carlo (MCMC) techniques
• A.k.a. probabilistic programming

Page 14

Show me the code!

Page 15

Example A/B test

Page 16

Only one difference between A and B

[Figure: mock product pages A and B ("A Tea Pot"), each with a BUY button]

Page 17

[Figure: mock product pages A and B ("A Tea Pot"), each with a BUY button]

Page 18

Assume there is:
• p_a: the probability of clicking BUY when landing on A
• p_b: the probability of clicking BUY when landing on B

How do we compute p_a and p_b?

Page 19

Page A
– N_a visitors
– C_a BUY-clicks on page A

Page B
– N_b visitors
– C_b BUY-clicks on page B

Page 20

Frequentist: C_a / N_a

BUT: the observed frequency does not necessarily equal p_a
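A quick simulation illustrates why C_a / N_a is not p_a itself. It is a minimal sketch using only Python's standard library, assuming a true p_a of 0.05 (the value used in the PyMC examples that follow); it repeats the experiment many times and shows that the observed frequency scatters around p_a, far more for N_a = 50 than for N_a = 1500.

```python
import random
import statistics


def observed_frequencies(p_true, n_visitors, n_repeats, seed=0):
    """Repeat the experiment n_repeats times and return C / N for each run."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_repeats):
        clicks = sum(rng.random() < p_true for _ in range(n_visitors))
        freqs.append(clicks / n_visitors)
    return freqs


small = observed_frequencies(p_true=0.05, n_visitors=50, n_repeats=500)
large = observed_frequencies(p_true=0.05, n_visitors=1500, n_repeats=500)

# Both averages sit near 0.05, but single runs spread much more when N is small.
print("N=50:   mean %.4f, stdev %.4f" % (statistics.mean(small), statistics.stdev(small)))
print("N=1500: mean %.4f, stdev %.4f" % (statistics.mean(large), statistics.stdev(large)))
```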

Page 21

Bayesian: infer the true frequency from the observed data

Page 22

[Figure: mock product page A ("A Tea Pot") with a BUY button]

Page 23

Bayesian Workflow

1. Define prior
2. Fit to observations
3. Get posteriors

Page 24

import numpy as np
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt

# simulate N visitors to page A, each clicking BUY with probability p_A_true
p_A_true = 0.05
N = 1500
occurrences = rbernoulli(p_A_true, N)

print('Click-BUY:')
print(occurrences.sum())
print('Observed frequency:')
print(occurrences.sum() / float(N))

Output:
Click-BUY:
68
Observed frequency:
0.0453333333333

Page 25

Clicking BUY

Bernoulli distribution

$$P(\text{click}) = \begin{cases} p & \text{click} = 1 \\ 1 - p & \text{click} = 0 \end{cases}$$

[Figure: bar chart of P(click) for click = 1 and click = 0]
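The Bernoulli formula above translates directly into code. A minimal sketch in plain Python: the probability of a whole sequence of independent clicks is the product of the per-visit probabilities, computed here as a sum of logs to avoid underflow. This is roughly what PyMC evaluates for an observed Bernoulli variable; the function names and the visit data are illustrative, not PyMC API.

```python
import math


def bernoulli_pmf(click, p):
    """P(click) = p if click == 1, and 1 - p if click == 0."""
    if click not in (0, 1):
        raise ValueError("click must be 0 or 1")
    return p if click == 1 else 1 - p


def log_likelihood(clicks, p):
    """Log-probability of a sequence of independent Bernoulli(p) clicks."""
    return sum(math.log(bernoulli_pmf(c, p)) for c in clicks)


visits = [1] * 2 + [0] * 48  # hypothetical: 2 BUY clicks in 50 visits
for p in (0.04, 0.2, 0.5):
    print(p, round(log_likelihood(visits, p), 2))
```

Values of p close to the observed frequency give the highest log-likelihood.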

Page 26

p_A = Uniform('p_A', lower=0, upper=1)

[Figure: uniform prior density of p_A on the interval from 0 to 1]

# random() draws a new value from the prior and stores it in p_A.value
print(p_A.random())
print(p_A.value)

Output:
array(0.906086144982998)
array(0.906086144982998)

print(p_A.random())
print(p_A.value)

Output:
array(0.285313846133313)
array(0.285313846133313)

Page 27

p_A = Uniform('p_A', lower=0, upper=1)  # prior belief about p_A

obs = Bernoulli('obs', p_A, value=occurrences, observed=True)  # observed clicks, fixed to the data

Page 28

p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)  # 20000 iterations, the first 1000 discarded as burn-in

print(mcmc.trace('p_A')[:])

Output:
[-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
[ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667 0.03803667]

Page 29

plt.figure(figsize=(8, 7))
# density=True replaces normed=True, which was removed in matplotlib 3.1
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()

Page 30

Can we say, with 90% confidence, that p_A is between X and Y?

p_A_samples = mcmc.trace('p_A')[:]
# the 5th and 95th percentiles of the posterior samples bound a 90% credible interval
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)

print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))

Output:
There is 90% probability that p_A is between 0.0373019596856 and 0.0548052806892
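Because the prior is Uniform(0, 1), this particular model also has a closed-form posterior, Beta(1 + C_a, 1 + N_a - C_a), which gives a way to sanity-check the MCMC interval. A minimal sketch with the standard library, assuming the 68 clicks out of 1500 visitors printed earlier: it draws posterior samples with random.betavariate and reads off the 5th and 95th percentiles by index into the sorted samples (np.percentile interpolates instead, so the last digits differ slightly).

```python
import random


def beta_posterior_interval(clicks, visitors, mass=0.90, n_samples=20000, seed=0):
    """Central credible interval of p under a Uniform(0, 1) prior (conjugate Beta posterior)."""
    rng = random.Random(seed)
    a, b = 1 + clicks, 1 + visitors - clicks  # Beta(1, 1) prior updated with the counts
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    tail = (1 - mass) / 2
    lower = samples[int(tail * (n_samples - 1))]
    upper = samples[int((1 - tail) * (n_samples - 1))]
    return lower, upper


lower, upper = beta_posterior_interval(clicks=68, visitors=1500)
print("There is 90%% probability that p_A is between %.4f and %.4f" % (lower, upper))
```

The result should land close to the MCMC interval above; a large disagreement would point to a sampling problem.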

Page 31

What if N_a is lower?

Page 32

import numpy as np
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt

# same experiment with only 50 visitors
p_A_true = 0.05
N = 50
occurrences = rbernoulli(p_A_true, N)

print('Click-BUY:')
print(occurrences.sum())
print('Observed frequency:')
print(occurrences.sum() / float(N))

Output:
Click-BUY:
2
Observed frequency:
0.04

Page 33

p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)  # 20000 iterations, 1000 burn-in

print(mcmc.trace('p_A')[:])

Output:
[-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
[ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419 0.01864419]

Page 34

plt.figure(figsize=(8, 7))
# density=True replaces normed=True, which was removed in matplotlib 3.1
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', density=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()

Page 35

Can we say, with 90% confidence, that p_A is between X and Y?

p_A_samples = mcmc.trace('p_A')[:]
# the 5th and 95th percentiles of the posterior samples bound a 90% credible interval
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)

print('There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound))

Output:
There is 90% probability that p_A is between 0.0160966147705 and 0.114655284797

Page 36

[Figure: posterior histograms of p_A side by side, for N_a = 1500 and N_a = 50]
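The widening of the posterior as N_a shrinks can be quantified without sampling, again relying on the conjugate Beta(1 + C_a, 1 + N_a - C_a) posterior of this Uniform-prior model. A minimal sketch, assuming the click counts printed on the earlier slides (68 of 1500 and 2 of 50): it compares the posterior standard deviations, which shrink roughly like 1 / sqrt(N_a).

```python
import math


def beta_posterior_sd(clicks, visitors):
    """Standard deviation of Beta(1 + clicks, 1 + visitors - clicks)."""
    a, b = 1 + clicks, 1 + visitors - clicks
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


sd_large = beta_posterior_sd(clicks=68, visitors=1500)
sd_small = beta_posterior_sd(clicks=2, visitors=50)
print("N_a = 1500: posterior sd %.4f" % sd_large)  # about 0.0054
print("N_a = 50:   posterior sd %.4f" % sd_small)  # about 0.0320
```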

Page 37

Does the red one have a larger probability of being clicked?

[Figure: mock product pages A and B ("A Tea Pot"), each with a BUY button]

Page 38

from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
from matplotlib import pyplot as plt

p_A_true = 0.05
p_B_true = 0.04
N_A = 1500
N_B = 750

occurrences_A = rbernoulli(p_A_true, N_A)
occurrences_B = rbernoulli(p_B_true, N_B)

print('Observed frequency:')
print('A')
print(occurrences_A.sum() / float(N_A))
print('B')
print(occurrences_B.sum() / float(N_B))

Output:
Observed frequency:
A
0.0533333333333
B
0.0413333333333

Page 39

p_A = Uniform('p_A', lower=0, upper=1)
p_B = Uniform('p_B', lower=0, upper=1)

@deterministic
def delta(p_A=p_A, p_B=p_B):
    # difference between the two click probabilities
    return p_A - p_B

obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)

mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)

Output:
[-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec

Page 40

p_A_samples = mcmc.trace('p_A')[:]
p_B_samples = mcmc.trace('p_B')[:]
delta_samples = mcmc.trace('delta')[:]

Page 41

# density=True replaces normed=True, which was removed in matplotlib 3.1
plt.subplot(3, 1, 1)
plt.xlim(0, 0.1)
plt.hist(p_A_samples, bins=35, histtype='stepfilled', density=True, color='blue', label='Posterior of p_A')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
plt.xlabel('Probability of clicking BUY via A')
plt.legend()

plt.subplot(3, 1, 2)
plt.xlim(0, 0.1)
plt.hist(p_B_samples, bins=35, histtype='stepfilled', density=True, color='green', label='Posterior of p_B')
plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
plt.xlabel('Probability of clicking BUY via B')
plt.legend()

plt.subplot(3, 1, 3)
plt.xlim(-0.05, 0.05)  # delta = p_A - p_B can be negative, so the axis must include 0 and below
plt.hist(delta_samples, bins=35, histtype='stepfilled', density=True, color='red', label='Posterior of delta')
plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
plt.xlabel('p_A - p_B')
plt.legend()

plt.savefig('A_and_B.png')
plt.show()

Page 42

Page 43

p_A > p_B? How confident are we?

# fraction of posterior samples in which delta = p_A - p_B is positive
print('Probability that p_A > p_B:')
print((delta_samples > 0).mean())

Output:
Probability that p_A > p_B:
0.8919
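The same conjugacy gives an independent estimate of P(p_A > p_B) for this two-page model. A minimal sketch, assuming 80 clicks out of 1500 on A and 31 out of 750 on B (the counts implied by the observed frequencies 0.0533 and 0.0413 printed earlier): it samples both Beta posteriors and counts how often p_A exceeds p_B, which is what (delta_samples > 0).mean() does with the MCMC trace.

```python
import random


def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(p_A > p_B) under independent Uniform(0, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        p_a = rng.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        p_b = rng.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        wins += p_a > p_b  # the sign of delta = p_A - p_B
    return wins / n_samples


prob = prob_a_beats_b(80, 1500, 31, 750)
print("Probability that p_A > p_B:", prob)
```

The estimate should be close to the MCMC value above; both approximate the same posterior quantity.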

Page 44

N_A = 1500
N_B = 750

# rerun the same experiment with fewer visitors on B
N_A = 1500
N_B = 200

Page 45

print('Probability that p_A > p_B:')
print((delta_samples > 0).mean())

Output:
Probability that p_A > p_B:
0.73455

Page 46

MCMC

Page 47

mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)

• Returns the posterior P(p_A, p_B, delta | obs_A, obs_B) as samples
• 25000 iterations, of which the first 5000 are discarded as burn-in
• Metropolis-Hastings algorithm
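To open the black box a little, here is a minimal random-walk Metropolis sampler in plain Python for the single-page model (Uniform(0, 1) prior, Bernoulli clicks). It is an illustrative sketch, not PyMC's implementation: PyMC 2's Metropolis step method also tunes its proposal scale while sampling, which this version does not. It assumes the 68-of-1500 counts from earlier and mirrors sample(25000, 5000): 25000 iterations, the first 5000 discarded as burn-in.

```python
import math
import random


def log_posterior(p, clicks, visitors):
    """Unnormalized log posterior: Uniform(0, 1) prior times the Bernoulli likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # the prior density is zero outside (0, 1)
    return clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)


def metropolis(clicks, visitors, n_iter=25000, burn=5000, step=0.01, seed=0):
    """Random-walk Metropolis; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    p = 0.5  # arbitrary starting point
    current = log_posterior(p, clicks, visitors)
    trace = []
    for i in range(n_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, clicks, visitors)
        # Accept with probability min(1, posterior ratio). The proposal is
        # symmetric, so the Hastings correction term cancels out.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn:
            trace.append(p)  # a rejected move repeats the current value
    return trace


trace = metropolis(clicks=68, visitors=1500)
print(len(trace), sum(trace) / len(trace))
```

Rejected proposals repeat the previous value, which is why consecutive entries in the PyMC traces printed earlier are often identical.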

Page 48

Open the black box

mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)

from pymc.Matplot import plot as mcplot

mcplot(mcmc)  # diagnostic plots (trace, autocorrelation, histogram) for each variable

Page 49

Page 50

Page 51

Page 52

PyMC

• Easy-to-interpret results: confidence, no p-values!
• No crazy math
• Computationally expensive

Page 53

Page 54

Thank you

@MrSantoni

Page 55

Back

Page 56

Serie A 13/14

Page 57

Date        HomeTeam    AwayTeam    FTHG  FTAG  FTR  HTHG  HTAG  HTR
24/08/2013  Sampdoria   Juventus    0     1     A    0     0     D
24/08/2013  Verona      Milan       2     1     H    1     1     D
25/08/2013  Cagliari    Atalanta    2     1     H    1     1     D
25/08/2013  Inter       Genoa       2     0     H    0     0     D
25/08/2013  Lazio       Udinese     2     1     H    2     0     H
25/08/2013  Livorno     Roma        0     2     A    0     0     D
25/08/2013  Napoli      Bologna     3     0     H    2     0     H
25/08/2013  Parma       Chievo      0     0     D    0     0     D
25/08/2013  Torino      Sassuolo    2     0     H    1     0     H
26/08/2013  Fiorentina  Catania     2     1     H    2     1     H
31/08/2013  Chievo      Napoli      2     4     A    2     2     D
31/08/2013  Juventus    Lazio       4     1     H    2     1     H
01/09/2013  Atalanta    Torino      2     0     H    0     0     D
01/09/2013  Bologna     Sampdoria   2     2     D    1     1     D
01/09/2013  Catania     Inter       0     3     A    0     1     A
01/09/2013  Genoa       Fiorentina  2     5     A    0     3     A
01/09/2013  Milan       Cagliari    3     1     H    2     1     H
01/09/2013  Roma        Verona      3     0     H    0     0     D
01/09/2013  Sassuolo    Livorno     1     4     A    0     1     A
01/09/2013  Udinese     Parma       3     1     H    1     0     H
14/09/2013  Inter       Juventus    1     1     D    0     0     D
14/09/2013  Napoli      Atalanta    2     0     H    0     0     D
14/09/2013  Torino      Milan       2     2     D    0     0     D
15/09/2013  Fiorentina  Cagliari    1     1     D    0     0     D

FTHG, FTAG: full-time home and away goals; FTR: full-time result (H = home win, D = draw, A = away win); HTHG, HTAG, HTR: the same at half time.

https://datahub.io/dataset/italian-football-data-serie-a-b

Page 58

Win-rate

Did it change?

Page 59

Bayesian Workflow

1. Define prior
2. Fit to observations
3. Get posteriors

Page 60

Winning a Match

Bernoulli distribution

$$P(w) = \begin{cases} p & w = 1 \\ 1 - p & w = 0 \end{cases}$$

[Figure: bar chart of P(w) for Win (w = 1) and Lose (w = 0)]

Page 61

$p$: is there a switchpoint?

Page 62

Model the switchpoint

$$p = \begin{cases} p_1 & t < \tau \\ p_2 & t \ge \tau \end{cases}$$

Goal: infer $p_1$, $p_2$ and $\tau$
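The piecewise definition of p can be checked in plain Python before involving PyMC. A minimal sketch: p_per_match builds the per-match probabilities for a given tau the same way as out[:tau] = p_1; out[tau:] = p_2 in the PyMC code, with matches indexed from 0, and tau_posterior enumerates every tau under a uniform prior while holding p_1 and p_2 fixed (a simplification; the PyMC model infers all three jointly). The win sequence and the 20-match season are hypothetical.

```python
import math


def p_per_match(tau, p_1, p_2, num_matches):
    """p_t = p_1 for matches t < tau and p_2 for t >= tau (t counted from 0)."""
    return [p_1 if t < tau else p_2 for t in range(num_matches)]


def tau_posterior(wins, p_1, p_2):
    """P(tau = k | wins) for k = 1..len(wins), uniform prior, p_1 and p_2 held fixed."""
    n = len(wins)
    log_liks = []
    for tau in range(1, n + 1):
        ps = p_per_match(tau, p_1, p_2, n)
        log_liks.append(sum(math.log(p if w else 1 - p) for w, p in zip(wins, ps)))
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]  # subtract the max for stability
    total = sum(weights)
    return {tau: w / total for tau, w in zip(range(1, n + 1), weights)}


wins = [0] * 10 + [1] * 10  # hypothetical: 10 matches without a win, then 10 wins
posterior = tau_posterior(wins, p_1=0.2, p_2=0.8)
print(max(posterior, key=posterior.get))  # 10
```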

Page 63

Bayesian Workflow

1. Define prior
2. Fit to observations
3. Get posteriors

Page 64

Let's model this

• Goal: infer the unknown p1, p2 and TAU
• First step of Bayesian inference: assign a prior probability to the different possible values of p
• What would be a good prior for p1, p2? Use uniform priors:
  – p1 ~ Uniform(0, 1)
  – p2 ~ Uniform(0, 1)
  – TAU ~ DiscreteUniform(1, 38)
• P(TAU = k) = 1/38 for every k = 1, ..., 38

Page 65

import numpy as np
from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC

p_1 = Uniform('p_1', lower=0, upper=1)
p_2 = Uniform('p_2', lower=0, upper=1)
tau = DiscreteUniform('tau', lower=1, upper=38)

print('Random output:', tau.random(), tau.random(), tau.random())

Output:
Random output: 14 24 33

@deterministic
def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
    # concatenate p_1 and p_2 based on tau: matches before tau use p_1, the rest use p_2
    out = np.empty(num_matches)
    out[:tau] = p_1
    out[tau:] = p_2
    return out

Page 66

Load Data

import pandas as pd

# parse_date, compute_extra_columns and team are defined elsewhere (not shown on the slide)
df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)

matches = df[(df.HomeTeam == 'Milan') | (df.AwayTeam == 'Milan')]
matches = matches.set_index(['Date'])
matches = compute_extra_columns(matches, team)
# some pandas manipulations occur here
matches['Win'] = …  # 1 if Milan won, 0 otherwise
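The slide leaves the computation of the Win column out. One possible way to derive it, sketched in plain Python rather than pandas, needs only the HomeTeam, AwayTeam and FTR columns of the Serie A table shown earlier; win_flag is a hypothetical helper, not the author's code. Draws count as 0, matching the "1 if Milan won, 0 otherwise" comment.

```python
def win_flag(home_team, away_team, ftr, team):
    """Hypothetical helper: 1 if `team` won the match, else 0 (draws and losses are 0).

    ftr is the full-time result: 'H' home win, 'D' draw, 'A' away win.
    """
    if team == home_team:
        return 1 if ftr == "H" else 0
    if team == away_team:
        return 1 if ftr == "A" else 0
    raise ValueError("%s did not play in this match" % team)


# Milan's matches among the sample rows of the Serie A 13/14 table
milan_matches = [
    ("Verona", "Milan", "H"),
    ("Milan", "Cagliari", "H"),
    ("Torino", "Milan", "D"),
]
print([win_flag(home, away, ftr, "Milan") for home, away, ftr in milan_matches])  # [0, 1, 0]
```

In pandas, the same logic could be applied row by row with DataFrame.apply(..., axis=1).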

Page 67

Fit the Model

# a 1-D array with one 0/1 entry per match, matching the shape of p_
observed_matches = Bernoulli('obs', p=p_, value=matches['Win'].values, observed=True)

model = Model([observed_matches, p_1, p_2, tau])
mcmc = MCMC(model)
mcmc.sample(40000, 10000)  # 40000 iterations, 10000 burn-in

p_1_samples = mcmc.trace('p_1')[:]
p_2_samples = mcmc.trace('p_2')[:]
tau_samples = mcmc.trace('tau')[:]

print(p_1_samples[:10])
print(p_2_samples[:10])
print(tau_samples[:10])

Output:
[ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391]
[ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176 0.67416932 0.68382528 0.6069458 0.60062698]
[10 10 24 35 35 35 35 27 27 27]

Page 68

# density=True replaces normed=True, which was removed in matplotlib 3.1
plt.figure(figsize=(14.5, 10))
ax = plt.subplot(311)
ax.set_autoscaley_on(False)
plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_1', color='#A60628', density=True, bins=30)
plt.legend(loc='upper left')

ax = plt.subplot(312)
plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_2', color='#7A68A6', density=True, bins=30)
plt.legend(loc='upper left')

ax = plt.subplot(313)
plt.hist(tau_samples, histtype='stepfilled', alpha=0.85, label='posterior of tau', color='#467821', density=True, bins=30)
plt.legend(loc='upper left')
plt.show()

Page 69

Page 70

Expected Win Probability

num_matches = 38
N = tau_samples.shape[0]
expected_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    # in samples where this match comes before the switchpoint, its p is p_1; otherwise p_2
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    expected_p_per_match[match] = np.percentile(p_samples_match, 50)  # posterior median

Page 71

Page 72

Compute Confidence Bounds

lower_p_per_match = np.zeros(num_matches)
upper_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    # 5th and 95th percentiles: a 90% credible interval for this match
    lower_p_per_match[match] = np.percentile(p_samples_match, 5)
    upper_p_per_match[match] = np.percentile(p_samples_match, 95)

Page 73

Bayesian inference returns a distribution. What have we gained? We can see the uncertainty in our estimates: the wider the distribution, the less certain our posterior belief should be.