Part of the Search Engine course given in the Technion (2011)
Computational advertising
Kira Radinsky
Slides based on material from the paper
“Bandits for Taxonomies: A Model-based Approach” by
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, and Vanja Josifovski, SDM 2007
The Content Match Problem
[Diagram: advertisers place ads into an ads database; the ad server shows ads on content pages]
Ad impression: showing an ad to a user
The Content Match Problem
Ad click: a user click leads to revenue for the ad server and the content provider
The Content Match Problem:
Match ads to pages to maximize clicks
The Content Match Problem
Maximizing the number of clicks means: for each webpage, find the ad with the best Click-Through Rate (CTR), but without wasting too many impressions in learning this.
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Background: Bandits
[Diagram: bandit “arms” with unknown payoff probabilities p1, p2, p3]
Pull arms sequentially so as to maximize the total expected reward:
• Estimate the payoff probabilities pi
• Bias the estimation process towards better arms
Background: Bandit Solutions
• Try 1: Greedy solution:
• Compute the sample mean of an arm by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step, choose the arm with the highest sample mean.
• Try 2: Naïve solution:
• Pull each arm an equal number of times.
• Epsilon-greedy strategy:
• The arm with the best sample mean is selected for a proportion 1 − ε of the trials, and a random arm is selected (with uniform probability) for a proportion ε.
• Many more strategies exist
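The epsilon-greedy strategy above can be sketched as a minimal simulation (the arm CTRs, ε value, and pull count here are made up for illustration; they are not from the slides):

```python
import random

def epsilon_greedy(true_ctrs, epsilon=0.1, n_pulls=10000, seed=0):
    """Simulate epsilon-greedy on a set of bandit arms.

    true_ctrs are the payoff probabilities, unknown to the policy;
    the policy only sees the simulated click/no-click rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms    # times each arm was pulled
    clicks = [0] * n_arms   # total reward per arm
    total_reward = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # explore: pick an arm uniformly at random
            arm = rng.randrange(n_arms)
        else:
            # exploit: arm with the highest sample mean (unpulled arms first)
            arm = max(range(n_arms),
                      key=lambda a: clicks[a] / pulls[a] if pulls[a] else float("inf"))
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        pulls[arm] += 1
        clicks[arm] += reward
        total_reward += reward
    return total_reward, pulls

reward, pulls = epsilon_greedy([0.02, 0.05, 0.10], epsilon=0.1)
```

With enough pulls, most impressions concentrate on the arm with the highest true CTR, while the ε fraction keeps estimating the others.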
Ad matching as a bandit problem
[Diagram: each webpage (Webpage 1, Webpage 2, Webpage 3) is a bandit whose arms are the ads]
~10^6 ads, ~10^9 pages
Ad matching as a bandit problem
Content Match = a matrix: rows are webpages, columns are ads
• Each row is a bandit
• Each cell has an unknown CTR
One row = one instance of the MAB problem (one bandit)
Background: Bandits
Bandit policy:
1. Assign a priority to each arm
2. “Pull” the arm with max priority, and observe the reward
3. Update the priorities
(Step 2 is the allocation part; steps 1 and 3 are the estimation part.)
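The slides leave the priority function unspecified; one standard instantiation of this assign/pull/update loop is the UCB1 rule (sample mean plus an exploration bonus). The sketch below is an assumption for illustration, not the paper's exact policy:

```python
import math
import random

def ucb1(true_ctrs, n_pulls=3000, seed=0):
    """Priority-based bandit loop with the UCB1 priority:
    priority(arm) = sample mean + sqrt(2 ln t / n_pulls(arm))."""
    rng = random.Random(seed)
    n = len(true_ctrs)
    pulls, rewards = [0] * n, [0.0] * n
    for t in range(1, n_pulls + 1):
        # 1. assign a priority to each arm (unpulled arms get infinite priority)
        def priority(a):
            if pulls[a] == 0:
                return float("inf")
            mean = rewards[a] / pulls[a]
            bonus = math.sqrt(2.0 * math.log(t) / pulls[a])
            return mean + bonus
        # 2. pull the arm with max priority and observe the reward
        arm = max(range(n), key=priority)
        r = 1.0 if rng.random() < true_ctrs[arm] else 0.0
        # 3. update the priorities (via the per-arm counts and rewards)
        pulls[arm] += 1
        rewards[arm] += r
    return pulls
```

The exploration bonus shrinks as an arm accumulates pulls, so under-explored arms keep getting tried without the uniform waste of the naïve solution.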
Background: Bandits
Why not simply apply a bandit policy directly to the problem?
• Convergence is too slow: ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit)
• Additional structure is available that can help: taxonomies
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Multi-level Policy
[Diagram: taxonomies over ads and webpages, grouped into classes]
Consider only two levels
Multi-level Policy
Consider only two levels
[Diagram: page parent classes (rows) crossed with ad parent classes (columns: Apparel, Computers, Travel); each ad parent class splits into ad child classes. A block (one page class crossed with one ad parent class) is one MAB problem instance (one bandit).]
Multi-level Policy
Key idea: CTRs in a block are homogeneous
[Diagram: the same page-class by ad-class matrix; each block is one MAB problem instance (one bandit)]
Multi-level Policy
• CTRs in a block are homogeneous
– Used in allocation (picking ad for each new page)
– Used in estimation (updating priorities after each observation)
Multi-level Policy
• CTRs in a block are homogeneous
– Used in allocation (picking the ad for each new page)
– Used in estimation (updating priorities after each observation)
Multi-level Policy (Allocation)
[Diagram: a page classifier maps the incoming webpage into the class taxonomy]
• Classify the webpage → page class → parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
Multi-level Policy (Allocation)
• Classify the webpage → page class → parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells → pick one ad class
• In general, continue from root to leaf → final ad
Multi-level Policy (Allocation)
Bandits at higher levels:
• use aggregated information
• have fewer bandit arms
→ Quickly figure out the best ad parent class
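The root-to-leaf allocation can be sketched as two chained bandit choices. This is a minimal sketch under assumptions: the data layout (`stats`, `taxonomy`) and class names are hypothetical, and epsilon-greedy stands in for whatever priority rule the deployed policy uses:

```python
import random

def two_level_allocate(page_class, stats, taxonomy, epsilon=0.1, rng=random):
    """Top-down allocation: run a bandit over ad parent classes,
    then a bandit among the child-class cells of the chosen parent.

    stats[(page_class, node)] = (clicks, impressions) for that cell.
    taxonomy maps each ad parent class to its list of child classes.
    """
    def pick(choices):
        if rng.random() < epsilon:
            return rng.choice(choices)          # explore
        def mean(node):
            c, n = stats.get((page_class, node), (0, 0))
            return c / n if n else float("inf")  # try unseen cells first
        return max(choices, key=mean)            # exploit: best observed CTR

    parent = pick(list(taxonomy))   # level 1: ad parent classes (few arms)
    child = pick(taxonomy[parent])  # level 2: cells within the chosen block
    return parent, child
```

Because the level-1 bandit has only a handful of arms and aggregates clicks over whole blocks, it converges quickly; the level-2 bandit then refines the choice inside the winning block.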
Multi-level Policy
• CTRs in a block are homogeneous
– Used in allocation (picking the ad for each new page)
– Used in estimation (updating priorities after each observation)
Multi-level Policy (Estimation)
• CTRs in a block are homogeneous
– Observations from one cell also give information about others in the block
– How can we model this dependence?
Multi-level Policy (Estimation)
• Shrinkage model:
S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
CTR_cell ~ Beta(Params_block)
where S_cell = # clicks in the cell and N_cell = # impressions in the cell
• All cells in a block come from the same distribution
Multi-level Policy (Estimation)
• Intuitively, this leads to shrinkage of cell CTRs towards block CTRs:
E[CTR] = α · Prior_block + (1 − α) · S_cell / N_cell
where E[CTR] is the estimated CTR, Prior_block is the Beta prior mean (the “block CTR”), and S_cell / N_cell is the observed CTR.
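The shrinkage estimate can be sketched with the Beta-Binomial posterior mean. The prior strength a + b (how many pseudo-impressions the block prior is worth) is an assumed hyperparameter here, not a value given in the slides:

```python
def shrunk_ctr(clicks, impressions, block_prior_mean, prior_strength=100.0):
    """Empirical-Bayes shrinkage of a cell CTR towards its block CTR.

    With a Beta(a, b) prior of mean block_prior_mean and strength
    a + b = prior_strength, the posterior mean of the Binomial cell is
    (a + clicks) / (a + b + impressions), i.e. a weighted average
    alpha * prior + (1 - alpha) * observed with
    alpha = prior_strength / (prior_strength + impressions).
    """
    alpha = prior_strength / (prior_strength + impressions)
    observed = clicks / impressions if impressions else block_prior_mean
    return alpha * block_prior_mean + (1.0 - alpha) * observed
```

A cell with few impressions stays close to the block CTR; as impressions accumulate, the estimate moves towards the cell's own observed CTR, exactly the shrinkage behaviour the slide describes.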
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Conclusions
Experiments [S. Pandey et al. 2007]
[Taxonomy structure: Root at depth 0; 20 nodes at depth 1; 221 nodes at depth 2; …; ~7000 leaves at depth 7. The policy uses the two levels at depths 1 and 2.]
Experiments
• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for purposes of confidentiality
Experiments (Multi-level Policy)
[Plot: clicks vs. number of pulls]
Multi-level gives a much higher number of clicks
Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls]
Multi-level gives a much better mean-squared error: it has learnt more from its explorations
Conclusions
• In a CTR-guided system, exploration is a key component
• The short-term penalty for exploration needs to be limited (an exploration budget)
• Most exploration mechanisms use a weighted combination of the predicted CTR (mean) and the CTR uncertainty (variance)
• Exploration can be done in a reduced-dimensional space: the class hierarchy
• A top-down traversal of the hierarchy determines the class of the ad to show