RecSys 2019: Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction
SI Ktena, A Tejani, L Theis, P Kumar Myana, D Dilipkumar, F Huszár, S Yoo, W Shi



Page 1

RecSys 2019

Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction

SI Ktena, A Tejani, L Theis, P Kumar Myana, D Dilipkumar, F Huszár, S Yoo, W Shi

Page 2

Why continuous training?


Background


Page 4


New campaign IDs + non-stationary features

Background

Page 5

Challenge: Delayed feedback

Fact:

Users may click an ad 1 second, 1 minute, or 1 hour after the impression

Page 6

Challenge: Delayed feedback

Why is it a challenge?

Should we wait? → Delays model training

Should we not wait? How do we decide the label?

[Figure: trade-off between training delay and model quality]

Page 12

Solution: accept “fake negative”

Event              Label   Weight
(user1, ad1, t1)   imp     1
(user2, ad1, t2)   imp     1
(user1, ad1, t3)   click   1

(Rows are ordered by time; the impression at t1 and the click at t3 share the same features.)

Assume X clicks out of Y impressions.
Works well when CTR is low, where X/Y ≈ X/(X+Y).
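
A quick worked example with illustrative numbers (not from the slides): with X = 10 clicks out of Y = 1,000 impressions, the true CTR is X/Y = 1.00%, while the positive rate in the fake-negative stream is X/(X+Y) = 10/1010 ≈ 0.99%, a relative error of about 1%. At a hypothetical CTR of 50% (X = 500, Y = 1,000), X/(X+Y) = 500/1500 ≈ 33%, so the approximation only holds when CTR is low.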


Page 17

Background

Delayed feedback models

● The probability of a click is not constant through time [Chapelle 2014]

● A second model, similar to survival analysis models, captures the delay between impression and click

● Assume an exponential distribution or a non-parametric distribution for the delay


Page 19

Background

Delayed feedback models

Page 20

Our approach

Page 21

Importance sampling

● p is the actual data distribution
● b is the biased data distribution

Importance weights

Our approach
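
The importance-weight formula on this slide was an image that did not survive extraction; the standard importance-sampling identity behind it (not copied from the slide) is

  \mathbb{E}_{(x,y)\sim p}[\ell(x,y)] = \mathbb{E}_{(x,y)\sim b}\left[\frac{p(x,y)}{b(x,y)}\,\ell(x,y)\right], \qquad w(x,y) = \frac{p(x,y)}{b(x,y)}

so the loss computed on the biased (fake-negative) stream b is reweighted by w(x, y) to approximate the expected loss under the actual distribution p.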


Page 23

Our approach

● Continuous training scheme → we may potentially wait an infinite time for a positive engagement

● Two models
  ○ Logistic regression
  ○ Wide-and-deep model

● Four loss functions
  ○ Delayed feedback loss [Chapelle, 2014]
  ○ Positive-unlabeled loss [du Plessis et al., 2015]
  ○ Fake negative weighted
  ○ Fake negative calibration

(Fake negative weighted and fake negative calibration both rely on importance sampling.)

Page 24


Delayed feedback loss

Assume exponential distribution for time delay

Loss functions
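
The loss itself appeared as an image on the slide; the following is a sketch of Chapelle's [2014] delayed feedback formulation in my own notation, where f(x) is the predicted click probability, \lambda(x) the predicted rate of the exponential delay distribution, d the observed delay for clicked examples, and e the elapsed time since the impression for not-yet-clicked examples:

  Clicked example (delay d):       \ell = -\log f(x) - \log \lambda(x) + \lambda(x)\, d
  No click after elapsed time e:   \ell = -\log\left[\, 1 - f(x) + f(x)\, e^{-\lambda(x)\, e} \,\right]

The second term keeps open the possibility that an unlabeled example may still be clicked later, which is exactly the fake-negative ambiguity.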


Page 26


Fake negative weighted & calibration

Fake negative calibration does not apply any weights to the training samples; it only calibrates the output of the network using the formulation below

Loss functions
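
The formula on the slide was an image; the following is a reconstruction of the two fake-negative corrections and should be read as a sketch rather than the slide's exact notation. Under the fake-negative scheme every impression first enters the stream as a negative and clicked impressions are later re-injected as positives, so the biased click probability relates to the true one as

  b(y=1 \mid x) = \frac{p(y=1 \mid x)}{1 + p(y=1 \mid x)}

Fake negative calibration trains with the plain log loss and only corrects the output at serving time:

  \hat{p}(y=1 \mid x) = \frac{b(y=1 \mid x)}{1 - b(y=1 \mid x)}

Fake negative weighted instead applies importance weights to the cross-entropy loss, plugging the model's own prediction in for p(y=1|x) (with gradients stopped through the weights):

  w(x, y=1) = 1 + p(y=1 \mid x), \qquad w(x, y=0) = \big(1 - p(y=1 \mid x)\big)\big(1 + p(y=1 \mid x)\big)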

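
As a concrete illustration of the two corrections, here is a minimal NumPy sketch (my own illustrative code, not the paper's or Twitter's implementation; function and variable names are made up):

```python
import numpy as np

def fn_weighted_loss(y_true, y_pred, eps=1e-7):
    """Cross-entropy on the fake-negative stream, reweighted with the
    importance weights sketched above. y_pred plays the role of p(y=1|x);
    in a real framework the weights would be computed with gradients stopped."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    w_pos = 1.0 + p                         # weight for positives (clicks)
    w_neg = (1.0 - p) * (1.0 + p)           # weight for negatives (incl. fake negatives)
    ce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    weights = y_true * w_pos + (1.0 - y_true) * w_neg
    return float(np.mean(weights * ce))

def fn_calibrate(b_pred, eps=1e-7):
    """Map the biased probability b(y=1|x), learned on the fake-negative
    stream with a plain log loss, back to an estimate of p(y=1|x)."""
    b = np.clip(b_pred, eps, 0.5 - eps)     # b should not exceed 0.5 under this scheme
    return b / (1.0 - b)
```

For example, fn_calibrate(np.array([0.3])) returns 0.3 / 0.7 ≈ 0.43.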

Page 28

Experiments

Page 29

Offline experiments

Criteo data

  ○ Small dataset & public
  ○ Training: 15.5M / Testing: 3.5M examples

RCE: normalised version of cross-entropy (higher values are better)
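
The slides do not spell the metric out; relative cross entropy (RCE), as typically defined in Twitter's ads-prediction work, is the percentage improvement in average cross entropy over a naive baseline that always predicts the empirical CTR (treat the exact form as an assumption):

  \mathrm{RCE} = 100 \times \left(1 - \frac{\mathrm{CE}(\text{model})}{\mathrm{CE}(\text{baseline})}\right)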


Page 31

Offline experiments

Twitter data

  ○ Large & proprietary due to user information
  ○ Training: 668M ads with fake negatives (FN) / Testing: 7M ads

RCE: normalised version of cross-entropy (higher values are better)


Page 33


Online experiment

Online (A/B test)

Pooled RCE: RCE on the combined traffic generated by the models
RPMq: revenue per thousand requests
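
Reading RPMq literally (the slide gives no formula, so treat this as an assumption): RPMq = 1000 × total ad revenue / number of ad requests.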

Page 34


Conclusions

● Solve the problem of delayed feedback in continuous training by relying on importance weights

● FN weighted and FN calibration losses proposed and applied for the first time

● Offline evaluation on large proprietary dataset and online A/B test

Page 35


Future directions

● Address catastrophic forgetting and overfitting

● Exploration / exploitation strategies

Page 36

Questions?

https://careers.twitter.com

@s0f1ra