
Probabilistic Machine Learning in Computational Advertising

Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela

Online Services and Advertising, Microsoft Research Cambridge, UK

NIPS 2009 – December 2009

Outline

• Online Advertising and Paid Search
• AdPredictor™: Predicting User Clicks on Ads

[Appendix]
• Model shrinking
• Parallel training

ONLINE ADVERTISING AND PAID SEARCH

Advertising Industry Business: Size

[Chart: annual advertising expenditure (in billion USD), 2001–2006, broken down by medium: Outdoor, Cinema, Radio, TV, Print, Online. Reference lines mark the GDP of Denmark (2006) and Microsoft's revenue (2008). Data: World Advertising Research Center Report 2007]

Advertising Industry Business: Growth

[Chart: year-on-year growth of advertising expenditure (−20% to +50%), 2001–2006, by medium: Outdoor, Cinema, Radio, TV, Print, Online. Data: World Advertising Research Center Report 2007]

Ranking and pricing: display to users by expected bid (bid × pClick), charge advertisers per click (see the sketch below the table).

Bid      pClick   Expected bid   Charge per click
$1.00    10%      $0.10          $0.80
$2.00     4%      $0.08          $1.25
$0.10    50%      $0.05          $0.05
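A minimal sketch reproducing the slide's numbers: rank by expected bid, then charge each clicked ad the lowest per-click price that would keep its rank (generalized second price). The $0.05 reserve price for the last slot and all names are my assumptions; the slide only shows the figures.

RESERVE = 0.05  # assumed per-click reserve price for the last slot

ads = [
    {"bid": 1.00, "p_click": 0.10},
    {"bid": 2.00, "p_click": 0.04},
    {"bid": 0.10, "p_click": 0.50},
]

def expected_bid(ad):
    # Expected revenue per impression if this ad is shown.
    return ad["bid"] * ad["p_click"]

ranked = sorted(ads, key=expected_bid, reverse=True)
for ad, nxt in zip(ranked, ranked[1:] + [None]):
    # Generalized second price: pay the next ad's expected bid, converted
    # back into a per-click price; the last slot pays the reserve.
    price = expected_bid(nxt) / ad["p_click"] if nxt else RESERVE
    print(f"bid ${ad['bid']:.2f} x pClick {ad['p_click']:.0%}"
          f" = ${expected_bid(ad):.2f}; charge ${price:.2f} per click")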

The Scale of Things

• Realistic training set for proof of concept: 7,000,000,000 impressions

• 2 weeks of CPU time during training:
  2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
• Learning algorithm speed requirement (checked in the sketch below):
  – 5,787 impression updates / sec
  – 172.8 μs per impression update
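A quick arithmetic check of those requirements, using only the numbers from the slide:

impressions = 7_000_000_000       # proof-of-concept training set size
budget_s = 2 * 7 * 86_400         # two weeks in seconds = 1,209,600
rate = impressions / budget_s
print(f"{rate:.0f} updates/s, {1e6 / rate:.1f} microseconds per update")
# -> 5787 updates/s, 172.8 microseconds per update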

ADPREDICTOR
Bayesian Linear Probit Regression

Impression Level Predictions

One Weight per Feature Value

[Diagram: each feature value gets its own weight. Example features and values: Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86), MatchType (Exact Match, Broad Match), Position (ML-1, SB-1, SB-2). The weights of the impression's active feature values are added up to produce pClick.]

Click Potential

Linear: click potential = sum of feature click contributions

[Diagram: click-potential axis with 0 separating "no click" from "click"; the contributions of the active feature values (PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23) add up to the impression click potential.]
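A minimal sketch of this linear scoring step, using the slide's example feature values; the weight numbers and the dictionary layout are made up for illustration.

# One weight per (feature, value) pair; the click potential of an impression
# is the sum of the weights of its active feature values.
weights = {
    ("ClientIP", "98.0.101.23"): 0.3,       # illustrative values
    ("ListingId", "798831"): -0.1,
    ("PageNumber/DisplayPosition/ReturnedAds", "0/ML-1/2"): 0.5,
}

impression = {
    "ClientIP": "98.0.101.23",
    "ListingId": "798831",
    "PageNumber/DisplayPosition/ReturnedAds": "0/ML-1/2",
}

# Unseen feature values contribute 0 (they stay at the prior mean).
click_potential = sum(weights.get(item, 0.0) for item in impression.items())
print(click_potential)  # 0.3 - 0.1 + 0.5 = 0.7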

Gaussian Noise

Probit: area under Gaussian tail as a function of click potential

[Diagram: Gaussian noise centered on the impression click potential; P(click) = P(potential > 0) is the area of the Gaussian to the right of 0.]

Probit

Probit: area under Gaussian tail as a function of click potential

[Diagram: sweeping the impression click potential along the axis traces out the probit curve P(click | Impression), rising from 0% to 100%; example impression: PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23]
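The probit link in code: the Gaussian tail above zero equals the standard normal CDF evaluated at potential/β. A sketch using only the standard library; β = 1 is my assumption.

import math

def p_click(click_potential, beta=1.0):
    # Area of the Gaussian noise tail above zero: the standard normal CDF
    # evaluated at potential / beta (beta is the noise standard deviation).
    return 0.5 * (1.0 + math.erf(click_potential / (beta * math.sqrt(2.0))))

print(p_click(0.7))  # potential 0.7 with beta = 1 -> P(click) ~ 0.758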

Modelling Uncertainty

[Diagram: uncertainty about the impression click potential, shown as a distribution on the click-potential axis, induces uncertainty about the probability of click (between 0% and 100%).]

Uncertainty: Bayesian Probabilities

[Diagram: the same feature layout as before (Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86; MatchType: Exact Match, Broad Match; Position: ML-1, SB-1, SB-2), but each weight now carries a Gaussian belief; summing them yields a distribution p(pClick) rather than a point estimate.]
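With a Gaussian belief N(μᵢ, σᵢ²) per active weight, the click potential is itself Gaussian, and averaging the probit over it has a closed form: P(click) = Φ(Σμᵢ / sqrt(β² + Σσᵢ²)). A sketch of that predictive computation; the belief parameters are made up.

import math

def predictive_p_click(beliefs, beta=1.0):
    # beliefs: (mean, variance) Gaussian per active feature value. The click
    # potential is Gaussian with summed mean and variance, and averaging the
    # probit over it gives Phi(mu / sqrt(beta^2 + total variance)).
    mu = sum(m for m, _ in beliefs)
    var = sum(v for _, v in beliefs)
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0 * (beta**2 + var))))

# Same mean potential 0.7 as before, but now with uncertain weights:
print(predictive_p_click([(0.3, 0.2), (-0.1, 0.4), (0.5, 0.3)]))
# ~ 0.694: uncertainty pulls the prediction toward 50%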

Principled Exploration

Training Algorithm in Action

[Diagram: factor graph with weights w1 and w2 feeding a sum node, latent click potential z, and click variable c; messages flow forward for prediction and backward for training/update.]
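The backward (training) pass has a closed form. A sketch in the spirit of the adPredictor update equations (as published in Graepel et al., ICML 2010): each active weight's Gaussian is updated via the moments of the truncated Gaussian. β = 1, binary features, and the list layout are my choices.

import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def train_update(beliefs, y, beta=1.0):
    # One online Bayesian probit update of the active weights' Gaussians;
    # y = +1 for click, -1 for no click.
    total_mean = sum(m for m, _ in beliefs)
    total_var = beta**2 + sum(v for _, v in beliefs)
    sigma = math.sqrt(total_var)
    t = y * total_mean / sigma
    v = pdf(t) / cdf(t)   # mean shift of the truncated Gaussian
    w = v * (v + t)       # variance reduction of the truncated Gaussian
    for b in beliefs:
        mean, var = b
        b[0] = mean + y * (var / sigma) * v          # move mean toward label
        b[1] = var * (1.0 - (var / total_var) * w)   # shrink variance

beliefs = [[0.0, 1.0], [0.0, 1.0]]  # two active weights at the prior
train_update(beliefs, y=+1)         # observe a click
print(beliefs)                      # means increase, variances shrink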

Posterior Updates for the Click Event

Client IP: Mean & Variance

Calibrated Predictions

Joint Updates vs. Independent Aggregation

Naive Bayes

adPredictor Wrap Up

Thank you! thoreg@microsoft.com

APPENDIX

Dealing with Millions of Variables

• Observation 1: Large variable bags follow a power law w.r.t. frequency of items
• Observation 2: Weight posteriors of rare items are close to their prior
• Idea (sketched in code below):

1. Initially, the belief of each new item is compactly represented by one (and the same) prior
2. After observing an item for the first time, its posterior is allocated
3. At regular intervals, all weight posteriors with a small deviation from the prior are removed
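A sketch of the pruning step (3). The deviation measure here is the KL divergence between posterior and prior, and the threshold is my assumption; the talk does not specify either.

import math

PRIOR = (0.0, 1.0)   # shared prior N(0, 1) for every unseen item
EPSILON = 1e-3       # assumed pruning threshold (not given in the talk)

def kl_to_prior(mu, var):
    # KL( N(mu, var) || prior ) for 1-D Gaussians; small KL = "close to prior".
    mu0, var0 = PRIOR
    return 0.5 * (var / var0 + (mu - mu0)**2 / var0 - 1.0 + math.log(var0 / var))

def shrink(posteriors):
    # Keep only posteriors that deviate noticeably from the prior; pruned
    # items fall back to the shared prior representation (step 1).
    return {k: (mu, var) for k, (mu, var) in posteriors.items()
            if kl_to_prior(mu, var) >= EPSILON}

posteriors = {"ClientIP=98.0.101.23": (0.46, 0.79),    # frequent, informative
              "ClientIP=10.0.0.1":   (0.001, 0.999)}   # rare, still near prior
print(shrink(posteriors))  # only the informative posterior survives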

Naïve Approach – Shared Memory

• Does not scale:
  – Constant contention for locks
  – Some features are very frequent
  – Synchronization issues

[Diagram: Training Node 1 (Impression A: MSNH11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSNH11, Canada, 10.0.1.25, etc.) both send updates directly to one shared ModelFile; concurrent updates to the same weight (e.g. MSNH11) conflict.]

Proposal: Approximate Learning

[Diagram: Training Node 1 (Impression A: MSNH11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSNH11, Canada, 10.0.1.25, etc.) each update their own local model copy; the per-node deltas are then merged into the Final Model File.]
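A sketch of one way to merge deltas: Gaussian updates are additive in natural (precision) parameters, so each node's change relative to the shared prior can be folded into the final belief. This merge rule is a common choice for combining Gaussian messages, not something the slide spells out.

def to_natural(mean, var):
    # Gaussian natural parameters: precision-weighted mean and precision.
    return mean / var, 1.0 / var

def merge(prior, node_posteriors):
    # Fold each node's delta (local posterior minus shared prior) into one
    # final belief; deltas are additive in natural parameters.
    pm0, tau0 = to_natural(*prior)
    pm, tau = pm0, tau0
    for post in node_posteriors:
        pm_i, tau_i = to_natural(*post)
        pm += pm_i - pm0
        tau += tau_i - tau0
    return pm / tau, 1.0 / tau   # back to (mean, variance)

# Two nodes start from the shared prior N(0, 1) and train the same weight:
print(merge((0.0, 1.0), [(0.3, 0.8), (0.5, 0.7)]))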
