
Ensemble Contextual Bandits for Personalized Recommendation


Page 1: Ensemble Contextual Bandits for Personalized Recommendation

Ensemble Contextual Bandits for Personalized Recommendation

Liang Tang, Yexi Jiang, Lei Li, Tao Li
Florida International University

10/7/14, ACM RecSys 2014

Page 2: Ensemble Contextual Bandits for Personalized Recommendation

Cold Start Problem for Learning-based Recommendation

• Issue: we do not have enough appropriate data.
– Historical user log data is biased.
– User interest may change over time.
– New items (or users) are added.

• Approach: Exploitation and Exploration
– Contextual Multi-Arm Bandit Algorithm

 


The contextual information consists of item features and user features.

Page 3: Ensemble Contextual Bandits for Personalized Recommendation

Contextual Bandit Algorithm with Personalized Recommendation

• Contextual Bandit
– Let a1, …, am be a set of arms.
– Given a context xt, the model decides which arm to pull.
– After each pull, you receive a random reward, which is determined by the pulled arm and xt.
– Goal: maximize the total received reward.

• Online Recommendation
– Arm → Item; Pull → Recommend
– Context → User features
– Reward → Click
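The mapping above can be sketched as a generic interaction loop. This is an illustrative Python sketch, not code from the paper; `run_contextual_bandit`, `RandomPolicy`, and `reward_fn` are hypothetical names chosen for clarity.

```python
import random

def run_contextual_bandit(policy, contexts, reward_fn):
    """Generic contextual-bandit loop: observe a context, pull an arm
    (recommend an item), receive a reward (click or not), and let the
    policy learn from the feedback."""
    total_reward = 0.0
    for x_t in contexts:
        arm = policy.select(x_t)      # decide which item to recommend
        r_t = reward_fn(arm, x_t)     # observe the click feedback
        policy.update(arm, x_t, r_t)  # learn from the feedback
        total_reward += r_t
    return total_reward

class RandomPolicy:
    """Trivial baseline policy: recommend a uniformly random item."""
    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = random.Random(seed)

    def select(self, x_t):
        return self.rng.randrange(self.n_arms)

    def update(self, arm, x_t, r_t):
        pass  # a random policy ignores feedback
```

Any of the concrete algorithms discussed later (ε-greedy, LinUCB, Thompson Sampling, …) would plug into this loop through the same `select`/`update` interface.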


Page 4: Ensemble Contextual Bandits for Personalized Recommendation

Problem Statement
• Problem Setting: we have many different recommendation models (or policies):
– Different CTR prediction algorithms.
– Different exploration-exploitation algorithms.
– Different parameter choices.

• No data to do model validation.

• Problem Statement: how to build an ensemble model that is close to the best model in the cold-start situation?


Page 5: Ensemble Contextual Bandits for Personalized Recommendation

How to Ensemble?

• Classifier ensemble methods do not work in this setting:
– The recommendation decision is NOT purely based on the predicted CTR.

• Each individual model only tells us:
– Which item to recommend.


Page 6: Ensemble Contextual Bandits for Personalized Recommendation

Ensemble Method

• Our Method:
– Allocate recommendation chances to individual models.

• Problem:
– Better models should have more chances.
– We do not know which one is good or bad in advance.

– Ideal solution: allocate all chances to the best one.


Page 7: Ensemble Contextual Bandits for Personalized Recommendation

Current Practice: Online Evaluation (or A/B Testing)

Let π1, π2, …, πm be the individual models.
1. Deploy π1, π2, …, πm into the online system at the same time.
2. Dispatch a small percentage of user traffic to each model.
3. After a period, choose the model with the best CTR as the production model.


Page 8: Ensemble Contextual Bandits for Personalized Recommendation

Current Practice: Online Evaluation (or A/B Testing)



If we have too many models, this will hurt the performance of the online system.

Page 9: Ensemble Contextual Bandits for Personalized Recommendation

Our Idea 1 (HyperTS)
• The CTR of model πi is an unknown random variable, Ri.

• Goal:
– Maximize (1/N) Σ_{t=1}^{N} r_t (the CTR of our ensemble model), where r_t is a random reward drawn from R_{s(t)}, s(t) = 1, 2, …, or m. For each t = 1, …, N, we decide s(t).

• Solution:
– Bernoulli Thompson Sampling (flat prior: Beta(1, 1)).
– π1, π2, …, πm are the bandit arms.

• No tricky parameters.
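A minimal Python sketch of the HyperTS idea, assuming the Beta(1, 1) prior and Bernoulli (click/no-click) rewards described above; the class and method names are mine, not the paper's.

```python
import random

class HyperTS:
    """Bernoulli Thompson Sampling over the models pi_1..pi_m,
    with a flat Beta(1, 1) prior on each model's unknown CTR R_i."""

    def __init__(self, n_models, seed=0):
        self.alpha = [1] * n_models  # prior 1 + observed clicks
        self.beta = [1] * n_models   # prior 1 + observed non-clicks
        self.rng = random.Random(seed)

    def select_model(self):
        # Draw one sample from each model's Beta posterior and pick
        # the model whose sampled CTR is largest.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, k, clicked):
        # Only the selected model k sees the feedback.
        if clicked:
            self.alpha[k] += 1
        else:
            self.beta[k] += 1
```

Over time the sampler concentrates its draws on the model with the best empirical CTR, so most recommendation chances flow to the best individual model, with no exploration parameter to tune.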

Page 10: Ensemble Contextual Bandits for Personalized Recommendation

An Example of HyperTS

In memory, we keep the estimated CTRs R1, R2, …, Rk, …, Rm for π1, π2, …, πm.

Page 11: Ensemble Contextual Bandits for Personalized Recommendation

An Example of HyperTS

1. A user visit arrives.
2. HyperTS selects a candidate model, πk, based on the estimated CTRs R1, R2, …, Rm.

Page 12: Ensemble Contextual Bandits for Personalized Recommendation

An Example of HyperTS

1. A user visit arrives (xt: context features).
2. HyperTS selects a candidate model, πk.
3. πk recommends item A to the user.

Page 13: Ensemble Contextual Bandits for Personalized Recommendation

An Example of HyperTS

1. A user visit arrives (xt: context features).
2. HyperTS selects a candidate model, πk.
3. πk recommends item A to the user.
4. Receive the feedback rt (click or not).
5. HyperTS updates the estimate of Rk based on rt.

Page 14: Ensemble Contextual Bandits for Personalized Recommendation

Two-Layer Decision

Layer 1: Bernoulli Thompson Sampling selects one of the models π1, π2, …, πk, …, πm.
Layer 2: the selected model πk recommends an item (Item A, Item B, Item C, …).

Page 15: Ensemble Contextual Bandits for Personalized Recommendation

Our Idea 2 (HyperTSFB)

• Limitation of the previous idea:
– For each recommendation, the user feedback is used by only one individual model (e.g., πk).

• Motivation:
– Can we update all of R1, R2, …, Rm with every user feedback? (Share every user feedback with every individual model.)


Page 16: Ensemble Contextual Bandits for Personalized Recommendation

Our Idea 2 (HyperTSFB)

• Assume each model can output the probability of recommending any item given xt.
– E.g., for a deterministic recommendation, it is 1 or 0.

• For a user visit xt:
1. πk is selected to perform the recommendation (k = 1, 2, …, or m).
2. Item A is recommended by πk given xt.
3. Receive the user feedback (click or not click), rt.
4. Ask every model π1, π2, …, πm: what is the probability of recommending A given xt?


Page 17: Ensemble Contextual Bandits for Personalized Recommendation

Our Idea 2 (HyperTSFB)



Estimate the CTR of π1, π2, …, πm (Importance Sampling).
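A sketch of how such shared, importance-sampled CTR estimates could be maintained. This is one plausible reading of the slide, not the paper's exact estimator: I assume the propensity of displaying item A is q(A | xt) = Σ_j P(πj selected) · p_j(A | xt), and use a self-normalized estimate.

```python
class ImportanceCTR:
    """Importance-sampling CTR estimates shared across all models.
    Every impression credits every model pi_i with weight
    p_i(A | x_t) / q(A | x_t), so one piece of feedback updates
    all of R_1, ..., R_m at once."""

    def __init__(self, n_models):
        self.weighted_reward = [0.0] * n_models
        self.weight = [0.0] * n_models

    def update(self, probs, selection_probs, r_t):
        # probs[i] = p_i(A | x_t): model i's probability of recommending A.
        # selection_probs[i] = probability the ensemble picked model i.
        q = sum(s * p for s, p in zip(selection_probs, probs))
        if q == 0:
            return  # item A could never have been shown; nothing to learn
        for i, p in enumerate(probs):
            w = p / q
            self.weighted_reward[i] += w * r_t
            self.weight[i] += w

    def ctr(self, i):
        # Self-normalized importance-sampling estimate of model i's CTR.
        return self.weighted_reward[i] / self.weight[i] if self.weight[i] else 0.0
```

A model that would never have recommended the displayed item (p_i = 0) receives zero weight, which is exactly why HyperTS alone cannot share feedback across models.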

Page 18: Ensemble Contextual Bandits for Personalized Recommendation

Experimental Setup
• Experimental Data
– Yahoo! Today News data logs (randomly displayed).
– KDD Cup 2012 Online Advertising data set.

• Evaluation Methods
– Yahoo! Today News: Replay (see Lihong Li et al.'s WSDM 2011 paper).
– KDD Cup 2012 Data: simulation by a Logistic Regression model.
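The replay method can be sketched as follows, assuming uniformly random logging as in Li et al.'s setup; the `policy` interface used here is hypothetical.

```python
def replay_evaluate(policy, log):
    """Replay evaluation on randomly-logged data (in the spirit of
    Li et al., WSDM 2011): only impressions where the policy picks the
    same item that was actually displayed count toward the CTR estimate."""
    clicks, matches = 0, 0
    for x_t, logged_item, r_t in log:
        if policy.select(x_t) == logged_item:
            matches += 1
            clicks += r_t
            policy.update(logged_item, x_t, r_t)  # learn only from matched events
    return clicks / matches if matches else 0.0
```

Because the logged items were displayed uniformly at random, the matched subset behaves like traffic served by the policy itself, making the estimate unbiased.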


Page 19: Ensemble Contextual Bandits for Personalized Recommendation

Comparative Methods

• CTR Prediction Algorithm
– Logistic Regression

• Exploitation-Exploration Algorithms
– Random, ε-greedy, LinUCB, Softmax, Epoch-greedy, Thompson Sampling

• HyperTS and HyperTSFB
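For concreteness, the simplest baseline in this list, ε-greedy, can be sketched as follows (an illustrative implementation, not the paper's; the parameter sensitivity noted in the conclusions is exactly the choice of `epsilon` here).

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy baseline: with probability epsilon explore
    a random item, otherwise exploit the item with the best empirical CTR."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.clicks = [0] * n_arms
        self.views = [0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.views))  # explore
        ctr = [c / v if v else 0.0 for c, v in zip(self.clicks, self.views)]
        return max(range(len(ctr)), key=ctr.__getitem__)  # exploit

    def update(self, arm, clicked):
        self.views[arm] += 1
        self.clicks[arm] += int(clicked)
```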


Page 20: Ensemble Contextual Bandits for Personalized Recommendation

Results for Yahoo! News Data
• Every 100,000 impressions are aggregated into a bucket.


Page 21: Ensemble Contextual Bandits for Personalized Recommendation

Results for Yahoo! News Data (Cont.)


Page 22: Ensemble Contextual Bandits for Personalized Recommendation

Conclusions from the Experimental Results

1. The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
– In a cold-start situation, there is not enough data to tune parameters.

2. HyperTS and HyperTSFB can be close to the optimal baseline algorithm (with no guarantee of being better than the optimal one), even when some bad individual models are included.

3. For contextual Thompson Sampling, the performance depends on the choice of prior distribution for the logistic regression.
– For online Bayesian learning, the posterior distribution approximation is not accurate (we cannot store the past data).


Page 23: Ensemble Contextual Bandits for Personalized Recommendation

Questions & Thank You

• Thank you!

• Questions?
