Probabilistic Machine Learning in Computational Advertising Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela Online Services and Advertising NIPS 2009 – December 2009


Probabilistic Machine Learning in Computational Advertising

Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela

Online Services and Advertising
Microsoft Research Cambridge, UK

NIPS 2009 – December 2009


Outline

• Online Advertising and Paid Search
• AdPredictor™: Predicting User Clicks on Ads

[Appendix]
• Model shrinking
• Parallel training


ONLINE ADVERTISING AND PAID SEARCH


Advertising Industry Business: Size

[Figure: annual expenditure (in billion USD) by year, 2001–2006, for Outdoor, Cinema, Radio, TV, Print and Online; vertical axis from 0 to 600; reference marks: GDP Denmark (2006) and Microsoft Revenue (2008).]

Data: World Advertising Research Center Report 2007


Advertising Industry Business: Growth

[Figure: growth by year, 2001–2006, for Outdoor, Cinema, Radio, TV, Print and Online; vertical axis from -20% to 50%.]

Data: World Advertising Research Center Report 2007


Display to users (expected bid)     Charge advertisers (per click)
$1.00 * 10% = $0.10                 $0.80
$2.00 * 4%  = $0.08                 $1.25
$0.10 * 50% = $0.05                 $0.05
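The numbers on this slide can be reproduced with a short sketch. It assumes that ads are ranked by bid times click probability, and that each advertiser is charged the next ad's expected bid divided by their own click probability (a generalized second-price rule). That pricing rule is an assumption: it matches the $0.80 and $1.25 charges, but the slide does not say how the last ad's $0.05 charge is set, so the sketch leaves it undefined. The function name `rank_and_price` is hypothetical.

```python
from typing import List, Optional, Tuple


def rank_and_price(
    ads: List[Tuple[float, float]],
) -> List[Tuple[float, float, float, Optional[float]]]:
    """Rank ads by expected bid and price them (assumed second-price rule).

    ads: list of (bid per click, predicted click probability).
    Returns rows of (bid, p_click, expected_bid, charge_per_click).
    """
    # Display order: highest expected bid (bid * pClick) first.
    ranked = sorted(ads, key=lambda ad: ad[0] * ad[1], reverse=True)
    rows = []
    for i, (bid, p_click) in enumerate(ranked):
        expected_bid = bid * p_click
        if i + 1 < len(ranked):
            next_bid, next_p = ranked[i + 1]
            # Smallest per-click price that keeps this ad above the next one.
            charge = (next_bid * next_p) / p_click
        else:
            # The slide gives no rule for the last slot; left undefined here.
            charge = None
        rows.append((bid, p_click, expected_bid, charge))
    return rows


rows = rank_and_price([(1.00, 0.10), (2.00, 0.04), (0.10, 0.50)])
for bid, p_click, expected_bid, charge in rows:
    print(f"bid ${bid:.2f} * {p_click:.0%} = ${expected_bid:.2f}, charge {charge}")
```

Because the charge depends on the predicted click probability, a miscalibrated pClick changes both the ranking and the price, which is why the rest of the talk focuses on predicting clicks well.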


The Scale of Things

• Realistic training set for proof of concept: 7,000,000,000 impressions
• 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
• Learning algorithm speed requirement:
  – 5,787 impression updates / sec
  – 172.8 μs per impression update
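The speed requirement follows directly from the two figures above. A quick check of the arithmetic (variable names are illustrative only):

```python
# Figures taken from the slide.
impressions = 7_000_000_000          # training set size
seconds = 2 * 7 * 86_400             # two weeks of CPU time

# Required throughput and the resulting time budget per update.
updates_per_second = impressions / seconds
microseconds_per_update = 1e6 / updates_per_second

print(seconds)                             # total seconds available
print(int(updates_per_second))             # whole updates per second
print(round(microseconds_per_update, 1))   # time budget per update in microseconds
```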


ADPREDICTOR
Bayesian Linear Probit Regression


Impression Level Predictions


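One Weight per Feature Value

[Figure: one weight per feature value. Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. The weights of the active values are summed (+) to give pClick.]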


Click Potential

Linear: click potential = sum of feature click contributions

[Figure: click-potential axis centred at 0 with "click" and "no click" regions, showing the contributions of PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831 and ClientIP = 98.0.101.23, and the resulting impression click potential.]
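As a minimal sketch of the linear model on this slide: each (feature, value) pair owns one weight, and the click potential of an impression is the sum of the weights of its active values. The feature names and values come from the slide; the numeric weights and the function name `click_potential` are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical weights, one per (feature, value) pair.
weights: Dict[Tuple[str, str], float] = {
    ("PageNumber/DisplayPosition/ReturnedAds", "0/ML-1/2"): 0.4,
    ("ListingId", "798831"): -0.9,
    ("ClientIP", "98.0.101.23"): 0.2,
}


def click_potential(
    impression: Dict[str, str],
    weights: Dict[Tuple[str, str], float],
    default: float = 0.0,
) -> float:
    """Sum the weights of the feature values present in the impression."""
    # impression.items() yields (feature, value) pairs, which are the weight keys.
    return sum(weights.get(item, default) for item in impression.items())


impression = {
    "PageNumber/DisplayPosition/ReturnedAds": "0/ML-1/2",
    "ListingId": "798831",
    "ClientIP": "98.0.101.23",
}
potential = click_potential(impression, weights)
print(potential)
```

Only the active values are touched, so the cost of a prediction grows with the number of features per impression, not with the total number of weights.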


Gaussian Noise

Probit: area under Gaussian tail as a function of click potential

[Figure: Gaussian noise around the impression click potential on an axis centred at 0, with "click" and "no click" regions.]

P(click) = P(potential > 0)


Probit

Probit: area under Gaussian tail as a function of click potential

[Figure: P(click|Impression), rising to 100%, as a function of the impression click potential on an axis centred at 0.]
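The probit link can be written down in a few lines. Assuming zero-mean Gaussian noise with standard deviation `beta` is added to the click potential (the symbol `beta` is an assumption; the slide does not name the noise scale), the probability that the noisy potential exceeds 0 is the standard normal CDF evaluated at potential / beta.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Phi(x), the area under the standard Gaussian to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_click(potential: float, beta: float = 1.0) -> float:
    """P(potential + noise > 0) with noise ~ N(0, beta^2)."""
    return standard_normal_cdf(potential / beta)


for s in (-2.0, 0.0, 2.0):
    print(s, p_click(s))
```

A potential of 0 gives a click probability of exactly one half, and the probability approaches 100% as the potential grows.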


Modelling Uncertainty

[Figure: uncertain contributions on a click-potential axis centred at 0 for PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831 and ClientIP = 98.0.101.23.]


Uncertainty about the Potential

[Figure: distribution over the impression click potential on an axis centred at 0.]


Probability of Click

[Figure: probability of click, rising to 100%, as a function of the impression click potential on an axis centred at 0.]
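A sketch of how weight uncertainty enters the prediction, assuming each active weight has an independent Gaussian belief with its own mean and variance, and that the noise has standard deviation `beta` (both symbols are assumptions, since the slide shows only pictures). The sum of independent Gaussians is Gaussian, so the click probability is the normal CDF of the total mean divided by the square root of the total variance plus beta squared.

```python
import math
from typing import Iterable, Tuple


def standard_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predict(beliefs: Iterable[Tuple[float, float]], beta: float = 1.0) -> float:
    """Click probability from (mean, variance) beliefs of the active weights."""
    beliefs = list(beliefs)
    total_mean = sum(mean for mean, _ in beliefs)
    # Weight variances and noise variance add up under independence.
    total_variance = beta ** 2 + sum(var for _, var in beliefs)
    return standard_normal_cdf(total_mean / math.sqrt(total_variance))


confident = predict([(0.5, 0.01), (0.5, 0.01)])
uncertain = predict([(0.5, 2.0), (0.5, 2.0)])
print(confident, uncertain)
```

With the same total mean, larger variance pulls the prediction towards 50%: the model is less sure which side of 0 the potential falls on.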


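Uncertainty: Bayesian Probabilities

[Figure: one uncertain weight per feature value. Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. The beliefs of the active values are summed (+) to give p(pClick).]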


Principled Exploration


Training Algorithm in Action

[Figure: graph over the weights w1 and w2, their sum (+), and the variables z and c, shown in two modes: Prediction and Training/Update.]


Posterior Updates for the Click Event
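The equations on this slide did not survive extraction. As a hedged sketch, the block below implements the usual assumed-density-filtering update for Bayesian probit regression with independent Gaussian weight beliefs, which is the form given in the published description of adPredictor. The data layout, the function name `update` and the default `beta` are assumptions.

```python
import math
from typing import Dict, Iterable, Tuple


def normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def update(
    beliefs: Dict[str, Tuple[float, float]],
    active: Iterable[str],
    y: int,
    beta: float = 1.0,
) -> None:
    """Update (mean, variance) beliefs in place; y is +1 (click) or -1 (no click)."""
    active = list(active)
    total_mean = sum(beliefs[k][0] for k in active)
    total_var = beta ** 2 + sum(beliefs[k][1] for k in active)
    total_std = math.sqrt(total_var)
    t = y * total_mean / total_std
    # Correction terms of the truncated Gaussian. For very negative t the
    # CDF underflows; a production version would need a stable approximation.
    v = normal_pdf(t) / normal_cdf(t)
    w = v * (v + t)
    for k in active:
        mean, var = beliefs[k]
        new_mean = mean + y * (var / total_std) * v
        new_var = var * (1.0 - (var / total_var) * w)
        beliefs[k] = (new_mean, new_var)


beliefs = {"ClientIP=98.0.101.23": (0.0, 1.0), "Position=ML-1": (0.0, 1.0)}
update(beliefs, beliefs.keys(), y=+1)
print(beliefs)
```

A click moves the means of the active weights up and a non-click moves them down, and in both cases the variances shrink. Surprising outcomes (small predicted probability for what was observed) cause the largest shifts.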


Client IP: Mean & Variance


Calibrated Predictions


Joint Updates vs. Independent Aggregation

Naive Bayes


adPredictor Wrap Up


Thank you!


APPENDIX


Dealing with Millions of Variables

• Observation 1: Large variable bags follow a power law w.r.t. frequency of items
• Observation 2: Weight posteriors of rare items are close to their prior
• Idea:
  1. Initially, the belief of each new item is compactly represented by one (and the same) prior
  2. After observing an item for the first time, the posterior is allocated
  3. At regular intervals, all weight posteriors with a small deviation from the prior are removed
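The three steps above can be sketched as a sparse store of Gaussian beliefs. The slide does not say how "deviation from the prior" is measured; this sketch assumes the KL divergence between the posterior and the prior, with a hypothetical threshold. The class name `SparseBeliefs` and the prior N(0, 1) are also assumptions.

```python
import math
from typing import Dict, Tuple

Belief = Tuple[float, float]  # (mean, variance)


def kl_divergence(posterior: Belief, prior: Belief) -> float:
    """KL(posterior || prior) for two univariate Gaussians."""
    (m1, v1), (m0, v0) = posterior, prior
    return 0.5 * (math.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)


class SparseBeliefs:
    def __init__(self, prior: Belief = (0.0, 1.0)) -> None:
        self.prior = prior                      # step 1: one shared prior
        self.posteriors: Dict[str, Belief] = {}

    def get(self, key: str) -> Belief:
        # Items never stored (or pruned) fall back to the shared prior.
        return self.posteriors.get(key, self.prior)

    def set(self, key: str, belief: Belief) -> None:
        # Step 2: storage is allocated once an item has been observed.
        self.posteriors[key] = belief

    def prune(self, threshold: float) -> int:
        # Step 3: drop posteriors that barely differ from the prior.
        stale = [k for k, b in self.posteriors.items()
                 if kl_divergence(b, self.prior) < threshold]
        for k in stale:
            del self.posteriors[k]
        return len(stale)


store = SparseBeliefs()
store.set("frequent-item", (1.5, 0.2))
store.set("rare-item", (0.01, 0.98))
removed = store.prune(threshold=0.01)
print(removed, store.get("rare-item"))
```

Pruning loses little information because a removed item is read back as the prior, which is nearly what its posterior was.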


Naïve Approach – Shared Memory

• Does not scale
  – Constant contention for locks
  – Some features are very frequent
  – Synchronization issues

[Figure: Training Node 1 processes Impression A (MSNH11, 10.0.0.1, USA, etc.) and Training Node 2 processes Impression B (MSNH11, Canada, 10.0.1.25, etc.). Both nodes send updates to the same model file, which produces a conflict.]


Proposal: Approximate Learning

[Figure: Train Node 1 processes Impression A (MSNH11, 10.0.0.1, USA, etc.) and Train Node 2 processes Impression B (MSNH11, Canada, 10.0.1.25, etc.). Each node applies its updates independently; the deltas are then merged into the final model file.]
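The slide does not specify how the deltas are merged. One plausible scheme, offered here purely as an assumption, works in the natural parameters of the Gaussian beliefs (precision and precision times mean): each node's delta is its change relative to the shared starting model, and the deltas are added to that starting model. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Belief = Tuple[float, float]  # (mean, variance)


def to_natural(belief: Belief) -> Tuple[float, float]:
    mean, var = belief
    return mean / var, 1.0 / var      # (precision * mean, precision)


def from_natural(natural: Tuple[float, float]) -> Belief:
    precision_mean, precision = natural
    return precision_mean / precision, 1.0 / precision


def merge_deltas(
    base: Dict[str, Belief],
    node_models: List[Dict[str, Belief]],
    prior: Belief = (0.0, 1.0),
) -> Dict[str, Belief]:
    """Add every node's change relative to the base model, per weight."""
    keys = set(base)
    for model in node_models:
        keys |= set(model)
    merged = {}
    for key in keys:
        start = base.get(key, prior)
        pm0, p0 = to_natural(start)
        pm, p = pm0, p0
        for model in node_models:
            # A node that never saw this weight contributes a zero delta.
            pm_n, p_n = to_natural(model.get(key, start))
            pm += pm_n - pm0
            p += p_n - p0
        merged[key] = from_natural((pm, p))
    return merged


base = {"MSNH11": (0.0, 1.0)}
node1 = {"MSNH11": (0.5, 0.8), "10.0.0.1": (0.2, 0.9)}
node2 = {"MSNH11": (0.4, 0.5), "Canada": (-0.1, 0.9)}
final = merge_deltas(base, [node1, node2])
print(final["MSNH11"])
```

The merge is approximate because each node computed its updates without seeing the other's, but no locks are needed during training: a weight shared by both impressions, such as MSNH11, simply accumulates evidence from both nodes.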