
Page 1: MLconf NYC Claudia Perlich

Claudia Perlich Chief Scientist, Dstillery

Adjunct Professor, Stern (NYU)

@claudia_perlich

Tales from data trenches of display advertising

Page 2: MLconf NYC Claudia Perlich

Ad Exchange

Shopping at one of our campaign sites (cookies)

10 Million URLs
200 Million browsers
10 Billion auctions per day
conversion base rate: 0.0001% to 1%

Where should we advertise, and at what price?
Does the ad have a causal effect?
What data should we pay for?
Attribution?
Who should we target for a marketer?
What requests are fraudulent?

Page 3: MLconf NYC Claudia Perlich

The Non-Branded Web / The Branded Web

A consumer's online/mobile activity gets recorded like this:

Our Browser Data: Agnostic

I do not want to 'understand' who you are …

Browsing History (hashed URLs):
date1 abkcc
date2 kkllo
date3 88iok
date4 7uiol

Brand Events (encoded):
date1 3012L20
date2 4199L30
…
date n 3075L50

Page 4: MLconf NYC Claudia Perlich

Targeting Model

Bidding Model

Fraud

Causal Analysis

Analytical Decomposition

Page 5: MLconf NYC Claudia Perlich

The Heart and Soul

Predictive modeling on hashed browsing history

10 Million dimensions for URLs (binary indicators)

extremely sparse data

positives are extremely rare

Targeting Model: P(Buy | URLs, inventory, ad)
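Concretely, each browser becomes one row of a 10-million-column binary matrix with only a handful of non-zero entries. A minimal sketch of that encoding (the hash function, dimensionality constant, and helper names are illustrative assumptions, not Dstillery's production pipeline):

# Sketch: hashed browsing history as one row of a sparse binary matrix.
# hash_url/encode_history are hypothetical helpers for illustration.
import hashlib
from scipy.sparse import csr_matrix

N_DIMS = 10_000_000  # one binary indicator per hashed URL

def hash_url(url):
    # Map a URL to one of N_DIMS columns via a stable hash.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % N_DIMS

def encode_history(urls):
    # Encode one browser's history as a 1 x N_DIMS binary row.
    cols = sorted({hash_url(u) for u in urls})
    return csr_matrix(([1] * len(cols), ([0] * len(cols), cols)),
                      shape=(1, N_DIMS))

row = encode_history(["news.example.com/a", "shop.example.com/cart"])
print(row.nnz, "non-zeros out of", N_DIMS)  # extremely sparse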

Page 6: MLconf NYC Claudia Perlich

How can we learn from 10M features with no/few positives?

We cheat.

In ML, cheating is called “Transfer Learning”

Page 7: MLconf NYC Claudia Perlich

The heart and soul

Has to deal with the 10 Million URLs

Need to find more positives!

Targeting Model: P(Buy | URLs, inventory, ad)

Page 8: MLconf NYC Claudia Perlich

Experiment

Data

• Randomized targeting across 58 different large display ad campaigns
• Served ads to users with active, stable cookies
• Targeted ~5000 random users per day for each marketer; campaigns ran for 1 to 5 months, between 100K and 4MM impressions per campaign
• Observed outcomes: clicks on ads, post-impression (PI) purchases (conversions)

Targeting

• Optimize targeting using Click and PI Purchase
• Technographic info and web history as input variables
• Each model is trained/evaluated using Logistic Regression
• Evaluate each separately trained model on its ability to rank order users for PI Purchase, using AUC (the Mann-Whitney-Wilcoxon statistic); a small sketch of the statistic follows
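Since AUC does all the evaluating here, a minimal self-contained sketch (toy scores, not campaign data) of AUC computed directly as the Mann-Whitney-Wilcoxon statistic, i.e. the probability that a randomly chosen positive outranks a randomly chosen negative:

# Sketch: AUC as the Mann-Whitney-Wilcoxon statistic; toy scores only.
from itertools import product

def auc_mww(pos_scores, neg_scores):
    # Count pairwise wins of positives over negatives (ties count half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))

print(auc_mww([0.9, 0.7, 0.4], [0.6, 0.3, 0.2]))  # 8/9 ≈ 0.89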

Page 9: MLconf NYC Claudia Perlich

[Chart: AUC distribution (0.2 to 0.8), train on Click vs. train on Purchase]

Predictive performance* (AUC) for purchase learning [Dalessandro et al. 2012]

*Restricted feature set used for these modeling results; qualitative conclusions generalize

Page 10: MLconf NYC Claudia Perlich

[Chart: AUC distribution (0.2 to 0.8), train on Click vs. train on Purchase; evaluated on predicting purchases (AUC in the target domain)]

Predictive performance* (AUC) for click learning [Dalessandro et al. 2012]

*Restricted feature set used for these modeling results; qualitative conclusions generalize

Page 11: MLconf NYC Claudia Perlich

[Chart: AUC distribution (0.2 to 1), train on Clicks vs. Site Visits vs. Purchase; evaluated on predicting purchases (AUC in the target domain)]

Predictive performance* (AUC) for Site Visit learning [Dalessandro et al. 2012]

Significantly better targeting when training on the source task

*Restricted feature set used for these modeling results; qualitative conclusions generalize

Page 12: MLconf NYC Claudia Perlich

Why is learning the wrong thing better???

Page 13: MLconf NYC Claudia Perlich

Transfer: Navigating Bias-Variance

Page 14: MLconf NYC Claudia Perlich

[Chart: AUC distribution (0.2 to 1), train on Clicks vs. Site Visits vs. Purchase]

Predictive performance* (AUC) across 58 different display ad campaigns [Dalessandro et al. 2012]

Significantly better targeting when training on the source task:

Train on Purchase: high cost, high correlation, high variance
Train on Clicks: low cost, low correlation, high bias
Train on Site Visits: low cost, high correlation, low bias & variance

*Restricted feature set used for these modeling results; qualitative conclusions generalize

Page 15: MLconf NYC Claudia Perlich

The heart and soul

Has to deal with the 10 Million URLs

Transfer learning:

Use all kinds of site visits instead of new purchases

Biased sample in every possible way to reduce variance

Negatives are 'everything else': pre-campaign, without impression

Stacking for transfer learning (a sketch follows below)

Targeting Model

Organic: P(SiteVisit | URLs)

P(Buy | URLs, inventory, ad)

MLJ 2014
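A minimal sketch of the stacking idea on synthetic data. The two-stage structure follows the slide; the data, variable names, and single-feature second stage are illustrative assumptions, not the MLJ 2014 system:

# Sketch: stacking for transfer learning, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 100)).astype(float)  # toy URL indicators
y_visit = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 5000) > 3).astype(int)
y_buy = np.zeros(5000, dtype=int)
y_buy[rng.choice(5000, 25, replace=False)] = 1          # purchases are rare

# Stage 1 (source task): site-visit model over the wide URL space,
# where positives are plentiful.
visit_model = LogisticRegression(max_iter=1000).fit(X, y_visit)
visit_score = visit_model.predict_proba(X)[:, 1].reshape(-1, 1)

# Stage 2 (target task): purchase model on top of the transferred score,
# so the few purchase positives only have to fit a tiny model.
buy_model = LogisticRegression().fit(visit_score, y_buy)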

Page 16: MLconf NYC Claudia Perlich

Logistic regression in 10 Million dimensions

Stochastic Gradient Descent

L1 and L2 constraints

Automatic estimation of optimal learning rates

Bayesian empirical industry priors

Streaming updates of the models

Fully automated: ~10,000 models per week

KDD 2014

Targeting Model

p(sv | urls) = 1 / (1 + exp(-(w0 + Σj wj xj))), where xj is the binary indicator for URL j
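A compact sketch of such a learner. The lazy sparse update, fixed learning rate, and simple penalty handling are simplifications assumed for illustration; the production system adds the adaptive rates and priors listed above:

# Sketch: streaming logistic regression via SGD with L1/L2 shrinkage,
# touching only the active URL columns of each example.
import numpy as np

class SparseLogisticSGD:
    def __init__(self, n_dims, lr=0.1, l1=1e-6, l2=1e-6):
        self.w = np.zeros(n_dims)
        self.lr, self.l1, self.l2 = lr, l1, l2

    def partial_fit(self, active_cols, y):
        # One streaming update from a single binary-feature example.
        p = 1.0 / (1.0 + np.exp(-self.w[active_cols].sum()))
        g = p - y                              # log-loss gradient wrt score
        w = self.w[active_cols]
        w = w - self.lr * (g + self.l2 * w)    # gradient step + L2 shrinkage
        w = np.sign(w) * np.maximum(np.abs(w) - self.lr * self.l1, 0.0)  # L1
        self.w[active_cols] = w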

Page 17: MLconf NYC Claudia Perlich

Real-time Scoring of a User

[Diagram: a user's ProspectRank over time, relative to the targeting threshold; site visits with positive correlation raise the score, site visits with negative correlation lower it; phases labeled OBSERVATION and ENGAGEMENT, with ads served while above threshold and purchase observed]

Some prospects fall out of favor once their in-market indicators decline.

Page 18: MLconf NYC Claudia Perlich

[Chart: lift over RON vs. total impressions, per campaign; median lift = 5x]

Lift over random for 66 campaigns for online display ad prospecting

Note: the top prospects are consistently rated as excellent compared to alternatives by advertising clients' internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc.


Page 19: MLconf NYC Claudia Perlich

The Pokerface

Bidding Model: P(SiteVisit | ProspectRank, Inventory, ad)

KDD 2012 Best Paper

Marginal Inventory Score:

Convert into bid price:
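The two formulas on this slide are not preserved in the transcript. The following is only a rough sketch of the general shape, assuming the score measures how much the inventory shifts P(SiteVisit) beyond the user's own rank and the bid scales conversion value by the adjusted probability (a reading of the slide, not the exact KDD 2012 definitions):

# Hedged sketch of inventory scoring and bid conversion; illustrative
# formulas only, NOT the exact ones from the KDD 2012 paper.
def marginal_inventory_score(p_sv_full, p_sv_user):
    # How much this inventory shifts P(SiteVisit) beyond the user alone.
    return p_sv_full / max(p_sv_user, 1e-12)

def bid_price(p_sv_user, inv_score, value_per_visit):
    # Scale the value of a site visit by the adjusted probability.
    return value_per_visit * p_sv_user * inv_score

print(bid_price(0.002, 1.8, 2.50))  # toy numbers: 0.009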

Page 20: MLconf NYC Claudia Perlich

Inventory for Hotel Campaign

[Chart: lift by inventory source]

Page 21: MLconf NYC Claudia Perlich

Measuring causal effect?

E[Y_(A=ad)] − E[Y_(A=no ad)]

A/B Testing: practical concerns

Estimate causal effects from observational data:

Using targeted maximum likelihood (TMLE) to estimate causal impact

Can be done ex-post for different questions

Need to control for confounding

Data has to be 'rich' and cover all combinations of confounding and treatment

ADKDD 2011
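To make the confounding control concrete, a small sketch on synthetic data using inverse-propensity weighting, a simpler relative of the TMLE estimator the slide refers to (the data-generating process and models are toy assumptions):

# Sketch: E[Y_(A=ad)] - E[Y_(A=no ad)] from observational data via
# inverse-propensity weighting (a simpler stand-in for TMLE); toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(20000, 3))                   # confounders, e.g. past browsing
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # targeting follows the confounders
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.3 * A - 2))))

# Model the treatment mechanism, then reweight to break the confounding.
ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
ps = np.clip(ps, 0.01, 0.99)                      # guard against extreme weights
ate = np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))
print(f"IPW estimate of the ad effect: {ate:.4f}")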

Page 22: MLconf NYC Claudia Perlich

An important decision…

I think she is hot!

Hmm – so what should I write to her to get her number?

Page 23: MLconf NYC Claudia Perlich

Source: OK Trends


Page 24: MLconf NYC Claudia Perlich

Hardships of causality.

Beauty is Confounding

"You are beautiful."

Beauty determines both the probability of getting the number and the probability that James will say it.

Need to control for the actual beauty, or it can appear that making compliments is a bad idea.

Page 25: MLconf NYC Claudia Perlich

Hardships of causality.

Targeting is Confounding

We only show ads to people we know are more likely to convert (ad or not).

[Chart: conversion rates, DID NOT SEE AD vs. SAW AD]

Need to control for confounding.

Data has to be 'rich' and cover all combinations of confounding and treatment.

Page 26: MLconf NYC Claudia Perlich

Observational Causal Methods: TMLE

Negative Test: wrong ad

Positive Test: A/B comparison

Page 27: MLconf NYC Claudia Perlich

Some creatives do not work …


Page 28: MLconf NYC Claudia Perlich

The Police: Fraud

Tracking artificial co-visitation patterns

Blacklist inventory in the exchanges

Ignore the browser

KDD 2013

Page 29: MLconf NYC Claudia Perlich

Unreasonable Performance Increase, Spring '12

[Chart: performance index roughly doubling (2x) over 2 weeks]

Page 30: MLconf NYC Claudia Perlich

Oddly predictive websites?

Page 31: MLconf NYC Claudia Perlich

36% of traffic is Non-Intentional

2011: 6% → 2012: 36%

Page 32: MLconf NYC Claudia Perlich

Traffic patterns are 'non-human'

[Diagram: 50% of website 1's visitors also appear on website 2]

Data from Bid Requests in Ad-Exchanges

Page 33: MLconf NYC Claudia Perlich

Node: hostname

Edge: 50% co-visitation

WWW 2010
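A minimal sketch of how such a graph can be built from bid-request logs. The (cookie, hostname) log format and the 50% threshold on the smaller site's audience are assumptions for illustration, following the idea of the KDD 2013 paper rather than its exact construction:

# Sketch: co-visitation graph -- nodes are hostnames, edges connect
# pairs whose audiences overlap by >= 50%; assumed toy log format.
from collections import defaultdict
from itertools import combinations

def covisitation_edges(requests, threshold=0.5):
    # requests: iterable of (cookie_id, hostname) pairs.
    visitors = defaultdict(set)
    for cookie, host in requests:
        visitors[host].add(cookie)
    edges = []
    for a, b in combinations(visitors, 2):
        overlap = len(visitors[a] & visitors[b])
        smaller = min(len(visitors[a]), len(visitors[b]))
        if smaller and overlap / smaller >= threshold:
            edges.append((a, b))
    return edges

print(covisitation_edges([("c1", "siteA"), ("c1", "siteB"),
                          ("c2", "siteA"), ("c2", "siteB")]))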

Page 34: MLconf NYC Claudia Perlich

Boston Herald

Page 35: MLconf NYC Claudia Perlich

Boston Herald

Page 36: MLconf NYC Claudia Perlich

womenshealthbase?

Page 37: MLconf NYC Claudia Perlich
Page 38: MLconf NYC Claudia Perlich
Page 39: MLconf NYC Claudia Perlich
Page 40: MLconf NYC Claudia Perlich

WWW 2012

Page 41: MLconf NYC Claudia Perlich

Unreasonable Performance Increase, Spring '12

[Chart: performance index roughly doubling (2x) over 2 weeks]

Page 42: MLconf NYC Claudia Perlich

Now it is also coming to brands

• ‘Cookie Stuffing’ increases the value of the ad for retargeting

• Messing up Web analytics …

• Messes up my models because a botnet is easier to predict than a human

Page 43: MLconf NYC Claudia Perlich

Fraud pollutes my models

• Don’t show ads on those sites

• Don’t show ads to a hijacked browser

• Need to remove the visits to the fraud sites

• Need to remove the fraudulent brand visits

When we see a browser caught up in fraudulent activity: send him to the penalty box, where we ignore all his actions (a sketch of the idea follows).
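A minimal sketch of that quarantine logic; the duration and data structures are illustrative assumptions, not the production system:

# Sketch: a penalty box that drops all events from browsers seen in
# fraudulent activity; the 30-day duration is an assumed parameter.
import time

PENALTY_SECONDS = 30 * 24 * 3600
penalty_box = {}  # browser_id -> release timestamp

def flag_browser(browser_id):
    penalty_box[browser_id] = time.time() + PENALTY_SECONDS

def accept_event(browser_id):
    # Ignore every action from a penalized browser until release.
    release = penalty_box.get(browser_id)
    return release is None or time.time() >= release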

Page 44: MLconf NYC Claudia Perlich

Using the penalty box: all back to normal

3 more weeks in spring 2012

[Chart: performance index back to normal]

Page 45: MLconf NYC Claudia Perlich

On a personal note

[email protected]

Page 46: MLconf NYC Claudia Perlich

Some References

1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy-Friendly Social Network Targeting. KDD 2009.
2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating the Effect of Online Display Advertising on Browser Conversion. ADKDD 2011.
3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award).
4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012.
5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. ADKDD 2012.
6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. MLJ 2014.
7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013.
8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-Visitation Networks for Classifying Non-Intentional Traffic. KDD 2013.