Ensemble Contextual Bandits for Personalized Recommendation
Liang Tang, Yexi Jiang, Lei Li, Tao Li
Florida International University
10/7/14 ACM RecSys 2014 1
Cold Start Problem for Learning-based Recommendation
• Issue: not enough appropriate data.
 – Historical user log data is biased.
 – User interest may change over time.
 – New items (or users) are added.
• Approach: Exploitation and Exploration
 – Contextual Multi-Arm Bandit Algorithm
The contextual information is item features and user features.
Contextual Bandit Algorithm with Personalized Recommendation
• Contextual Bandit
 – Let a1, …, am be a set of arms.
 – Given a context xt, the model decides which arm to pull.
 – After each pull, you receive a random reward, determined by the pulled arm and xt.
 – Goal: maximize the total received reward.
• Online Recommendation
 – Arm → Item, Pull → Recommend
 – Context → User features
 – Reward → Click
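The arm/item mapping above can be sketched as a minimal interaction loop. This is an illustrative epsilon-greedy bandit, not the paper's algorithm; the `select`/`update` interface and the simulated click probabilities are assumptions, and the context is accepted but ignored for simplicity:

```python
import random

class EpsilonGreedyBandit:
    """Minimal bandit: epsilon-greedy over per-arm mean rewards (CTRs).
    Illustrative only; the context x_t is accepted but ignored here."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm (recommendations per item)
        self.values = [0.0] * n_arms  # running mean reward (CTR) per arm

    def select(self, x_t):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the pulled arm only.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulated loop: 3 items with (assumed) true CTRs 0.1, 0.5, 0.2.
bandit = EpsilonGreedyBandit(n_arms=3)
for t in range(1000):
    x_t = None                    # user features would go here
    arm = bandit.select(x_t)      # "pull" = recommend item `arm`
    reward = 1 if random.random() < [0.1, 0.5, 0.2][arm] else 0  # click or not
    bandit.update(arm, reward)
```

The cold-start tension is visible here: without exploration the bandit can lock onto a mediocre item, and without exploitation it wastes traffic on bad ones.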
Problem Statement
• Problem Setting: we have many different recommendation models (or policies):
 – Different CTR prediction algorithms.
 – Different exploration-exploitation algorithms.
 – Different parameter choices.
• No data for model validation.
• Problem Statement: how to build an ensemble model that is close to the best model in the cold-start situation?
How to Ensemble?
• Classifier ensemble methods do not work in this setting:
 – The recommendation decision is NOT purely based on the predicted CTR.
• Each individual model only tells us:
 – Which item to recommend.
Ensemble Method
• Our Method:
 – Allocate recommendation chances to individual models.
• Problem:
 – Better models should get more chances.
 – We do not know in advance which models are good or bad.
 – Ideal solution: allocate all chances to the best one.
Current Practice: Online Evaluation (or A/B testing)
Let π1, π2, …, πm be the individual models.
1. Deploy π1, π2, …, πm into the online system at the same time.
2. Dispatch a small percentage of user traffic to each model.
3. After a period, choose the model with the best CTR as the production model.
If we have too many models, this will hurt the performance of the online system.
Our Idea 1 (HyperTS)
• The CTR of model πi is an unknown random variable, Ri.
• Goal:
 – Maximize (1/N) Σ_{t=1}^{N} r_t, the CTR of our ensemble model, where rt is a random reward drawn from Rs(t), s(t) = 1, 2, …, or m. For each t = 1, …, N, we decide s(t).
• Solution:
 – Bernoulli Thompson Sampling (flat prior: Beta(1, 1)).
 – π1, π2, …, πm are the bandit arms.
 – No tricky parameters.
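The HyperTS layer described above can be sketched in a few lines: Bernoulli Thompson sampling over the m individual policies, each policy's CTR Ri tracked by a Beta posterior starting from the flat Beta(1, 1) prior. The class name and interface are illustrative, not the paper's code:

```python
import random

class HyperTS:
    """Bernoulli Thompson sampling over m individual policies.
    Policy i's CTR R_i gets a Beta(alpha_i, beta_i) posterior,
    starting from the flat prior Beta(1, 1)."""

    def __init__(self, n_policies):
        self.alpha = [1.0] * n_policies  # 1 + observed clicks
        self.beta = [1.0] * n_policies   # 1 + observed non-clicks

    def select(self):
        # Sample a plausible CTR for every policy, pick the argmax.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, k, clicked):
        # Only the selected policy pi_k is credited with the feedback r_t.
        if clicked:
            self.alpha[k] += 1
        else:
            self.beta[k] += 1

# One user visit: pick a policy, serve its recommendation, observe a click.
hyper = HyperTS(n_policies=3)
k = hyper.select()
hyper.update(k, clicked=True)
```

Because the flat prior has no hyperparameters to tune, this matches the "no tricky parameters" point on the slide: all adaptation comes from the observed clicks.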
An Example of HyperTS
• In memory, we keep estimated CTRs R1, R2, …, Rm for π1, π2, …, πm.
• For each user visit (with context features xt):
 1. HyperTS selects a candidate model, πk.
 2. πk recommends item A to the user.
 3. We observe the reward rt (click or not).
 4. HyperTS updates the estimate of Rk based on rt.
[Figure: the estimated CTRs R1, R2, …, Rk, …, Rm kept in memory, with Rk updated after the feedback.]
Two-Layer Decision
[Figure: a two-layer decision. The first layer (Bernoulli Thompson Sampling) selects a model πk from π1, π2, …, πm; the second layer lets the selected πk pick the item to recommend (Item A, Item B, Item C, …).]
Our Idea 2 (HyperTSFB)
• Limitation of the previous idea:
 – For each recommendation, the user feedback is used by only one individual model (e.g., πk).
• Motivation:
 – Can we update all of R1, R2, …, Rm with every user feedback? (Share every user feedback with every individual model.)
Our Idea 2 (HyperTSFB)
• Assume each model can output the probability of recommending any item given xt.
 – E.g., for a deterministic recommendation, it is 1 or 0.
• For a user visit xt:
 1. πk is selected to perform the recommendation (k = 1, 2, …, or m).
 2. Item A is recommended by πk given xt.
 3. Receive the user feedback (click or no click), rt.
 4. Ask every model π1, π2, …, πm: what is the probability of recommending A given xt?
Estimate the CTR of π1, π2, …, πm (Importance Sampling).
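The importance-sampling idea can be sketched as follows: each policy reports p_i(A | xt), the probability it would have recommended the served item A, and the observed reward is weighted by p_i(A | xt) / q(A | xt), where q is the serving probability. This is a generic importance-sampling sketch under those assumptions, not the paper's exact estimator:

```python
def update_all_ctr_estimates(policies_prob, serving_prob, reward, stats):
    """Share one feedback (served item A, reward r_t) with every policy.

    policies_prob[i]: p_i(A | x_t), prob. that policy i recommends A.
    serving_prob:     q(A | x_t), prob. that the ensemble served A.
    stats[i]:         (weighted_reward_sum, weight_sum) per policy.
    """
    for i, p_i in enumerate(policies_prob):
        w = p_i / serving_prob            # importance weight
        r_sum, w_sum = stats[i]
        stats[i] = (r_sum + w * reward, w_sum + w)
    return stats

def estimated_ctr(stats, i):
    # Self-normalized estimate: weighted clicks over total weight.
    r_sum, w_sum = stats[i]
    return r_sum / w_sum if w_sum > 0 else 0.0

# Three policies; the served item A had probability 0.5 under the ensemble,
# and the user clicked. Policy 1 would never have recommended A (p = 0),
# so it is (correctly) not credited with this click.
stats = [(0.0, 0.0)] * 3
stats = update_all_ctr_estimates([1.0, 0.0, 0.5], 0.5, reward=1, stats=stats)
```

Every policy's CTR estimate moves on every feedback, which is exactly the data-sharing advantage of HyperTSFB over HyperTS.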
Experimental Setup
• Experimental Data:
 – Yahoo! Today News data logs (randomly displayed).
 – KDD Cup 2012 online advertising data set.
• Evaluation Methods:
 – Yahoo! Today News: Replay (see Lihong Li et al.'s WSDM 2011 paper).
 – KDD Cup 2012 data: simulation by a logistic regression model.
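The Replay evaluator (Li et al., WSDM 2011) can be sketched in a few lines: scan a uniformly-randomly logged stream, and count an event toward a policy's CTR only when the policy's choice matches the logged item. The log format here is an assumption for illustration:

```python
def replay_ctr(policy, log):
    """Unbiased offline CTR estimate on uniformly-random logged data.

    log: iterable of (context, logged_item, click) events, where
    logged_item was chosen uniformly at random at logging time.
    """
    matched, clicks = 0, 0
    for context, logged_item, click in log:
        if policy(context) == logged_item:   # keep only matching events
            matched += 1
            clicks += click
    return clicks / matched if matched else 0.0

# Toy log evaluated for a policy that always picks item "a":
# it matches 3 of the 4 events, 2 of which were clicks.
log = [(None, "a", 1), (None, "b", 1), (None, "a", 0), (None, "a", 1)]
ctr = replay_ctr(lambda ctx: "a", log)
```

Uniform logging is what makes the matched subset an unbiased sample of what the policy would have served online; that is why the Yahoo! data's "randomly displayed" property matters.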
Comparative Methods
• CTR Prediction Algorithm:
 – Logistic Regression
• Exploitation-Exploration Algorithms:
 – Random, ε-greedy, LinUCB, Softmax, Epoch-greedy, Thompson sampling
• HyperTS and HyperTSFB
Results for Yahoo! News Data
• Every 100,000 impressions are aggregated into one bucket.
Results for Yahoo! News Data (Cont.)
Conclusions from Experimental Results
1. The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
 – In the cold-start situation, there is not enough data to tune parameters.
2. HyperTS and HyperTSFB can get close to the optimal baseline algorithm (with no guarantee of beating the optimal one), even when some bad individual models are included.
3. For contextual Thompson sampling, the performance depends on the choice of prior distribution for the logistic regression.
 – For online Bayesian learning, the posterior approximation is not accurate (we cannot store the past data).
Questions & Thank You
• Thank you!
• Questions?