MLconf NYC Claudia Perlich


Claudia Perlich, Chief Scientist, Dstillery

Adjunct Professor, Stern (NYU)

@claudia_perlich

Tales from the data trenches of display advertising

The Ad Exchange setting:

10 million URLs
200 million browsers (cookies)
10 billion auctions per day
Conversion = shopping at one of our campaign sites, with base rates of 0.0001% to 1%

Where should we advertise, and at what price?
Does the ad have a causal effect?
What data should we pay for? Attribution?
Who should we target for a marketer?
What requests are fraudulent?

A consumer's online/mobile activity, across the non-branded web and the branded web, gets recorded like this:

Our Browser Data: Agnostic

I do not want to ‘understand’ who you are …

Browsing History (hashed URLs):
date1 abkcc
date2 kkllo
date3 88iok
date4 7uiol

Brand Events (encoded):
date1 3012L20
date2 4199L30
…
date n 3075L50
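To make the encoding concrete, here is a minimal sketch of how a browsing history could be reduced to (date, hashed-URL) pairs. The hashing scheme and token format are assumptions for illustration, not Dstillery's actual pipeline:

```python
import hashlib
from datetime import date

def hash_url(url: str, length: int = 5) -> str:
    """Map a raw URL to a short, content-agnostic token
    (hypothetical scheme; the talk does not specify one)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:length]

# A browsing history becomes a list of (date, hashed-URL) events,
# with no human-readable notion of what was actually visited.
history = [
    (date(2014, 4, 1), hash_url("http://example.com/some/page")),
    (date(2014, 4, 2), hash_url("http://another-example.org/")),
]
print(history)
```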

Analytical Decomposition:

Targeting Model
Bidding Model
Fraud
Causal Analysis

The Heart and Soul

Predictive modeling on hashed browsing history:

10 million dimensions for URLs (binary indicators)
extremely sparse data
positives are extremely rare

Targeting Model: P(Buy | URLs, inventory, ad)

How can we learn from 10M features with no/few positives?

We cheat.

In ML, cheating is called “Transfer Learning”

The heart and soul has to deal with the 10 million URLs. We need to find more positives!

Targeting Model: P(Buy | URLs, inventory, ad)

Experiment

Randomized targeting across 58 different large display ad campaigns.
Served ads to users with active, stable cookies.
Targeted ~5,000 random users per day for each marketer. Campaigns ran for 1 to 5 months, with between 100K and 4MM impressions per campaign.
Observed outcomes: clicks on ads, and post-impression (PI) purchases (conversions).

Data and Targeting

• Optimize targeting using Click and PI Purchase
• Technographic info and web history as input variables
• Evaluate each separately trained model on its ability to rank-order users for PI Purchase, using AUC (the Mann-Whitney-Wilcoxon statistic)
• Each model is trained/evaluated using logistic regression (see the sketch after this list)
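A minimal, self-contained sketch of that evaluation protocol on synthetic data: train one logistic regression on the proxy label (clicks) and one on the target label (purchases), then compare both by AUC at ranking held-out purchasers. All data, names, and numbers here are made up for illustration:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = sparse_random(5000, 1000, density=0.01, format="csr", random_state=0)
X.data[:] = 1.0                       # sparse binary URL indicators
w = rng.normal(size=1000)             # latent "interest" weights
logit = np.asarray(X @ w)
purchases = rng.binomial(1, 1 / (1 + np.exp(-(logit - 2))))  # rare target label
clicks = rng.binomial(1, 1 / (1 + np.exp(                    # noisy proxy label
    -(0.3 * logit + rng.normal(scale=2, size=5000) - 2))))

train, test = slice(0, 4000), slice(4000, 5000)
for name, y in [("click", clicks), ("purchase", purchases)]:
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    score = model.predict_proba(X[test])[:, 1]
    # Both models are always evaluated on the purchase outcome.
    print(f"trained on {name}: AUC on purchases = "
          f"{roc_auc_score(purchases[test], score):.3f}")
```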

[Figure: Predictive performance* (AUC, y-axis 0.2–0.8) for purchase learning, for models trained on Click vs. trained on Purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize. Dalessandro et al. 2012]

[Figure: Predictive performance* (AUC, y-axis 0.2–0.8) for click learning, for models trained on Click vs. trained on Purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize. Dalessandro et al. 2012]

[Figure: Predictive performance* (AUC) for Site Visit learning. AUC distribution (y-axis 0.2–1), evaluated on predicting purchases (AUC in the target domain), for models trained on Clicks, Site Visits, or Purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize. Dalessandro et al. 2012]

Significantly better targeting when training on the source task.


Why is learning the wrong thing better???

Transfer: Navigating Bias-Variance

[Figure: Predictive performance* (AUC) across 58 different display ad campaigns. AUC distribution (y-axis 0.2–1), evaluated on predicting purchases, for models trained on Clicks, Site Visits, or Purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize. Dalessandro et al. 2012]

Significantly better targeting when training on the source task.

Train on Purchase: high cost, high correlation with the target, high variance
Train on Clicks: low cost, low correlation, high bias
Train on Site Visits: low cost, high correlation, low bias & variance

The heart and soul has to deal with the 10 million URLs.

Transfer learning:
Use all kinds of site visits instead of new purchases
Biased sample in every possible way, to reduce variance
Negatives are 'everything else': pre-campaign, without impression
Stacking for transfer learning

Targeting Model (stacked; a sketch follows):
Organic: P(SiteVisit | URLs)
P(Buy | URLs, inventory, ad)
MLJ 2014
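A minimal stacking sketch, assuming the two-stage structure named above: a first-stage model estimates P(SiteVisit | URLs) from plentiful organic labels, and its score becomes a feature for a small second-stage purchase model. Function and variable names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_stacked(X_urls, site_visits, Z_context, purchases):
    """X_urls: sparse URL indicators; Z_context: dense inventory/ad features."""
    # Stage 1: learn P(site visit | URL history) from abundant labels.
    stage1 = LogisticRegression(max_iter=1000).fit(X_urls, site_visits)
    score = stage1.predict_proba(X_urls)[:, 1].reshape(-1, 1)
    # Stage 2 is tiny: the huge URL space is collapsed into one
    # "prospect" score plus the context features, trained on the
    # few available purchase labels.
    stage2 = LogisticRegression(max_iter=1000).fit(
        np.hstack([score, Z_context]), purchases)
    return stage1, stage2
```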

Logistic regression in 10 million dimensions:

Stochastic gradient descent
L1 and L2 constraints
Automatic estimation of optimal learning rates
Bayesian empirical industry priors
Streaming updates of the models
Fully automated: ~10,000 models per week

KDD 2014
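One way to approximate this setup with off-the-shelf tools, as a sketch rather than the production system: hash URLs into a fixed high-dimensional space and update an L1/L2-regularized logistic model incrementally as mini-batches stream in. The automatic learning-rate estimation and Bayesian priors from the slide are not reproduced here:

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

# 10M-dimensional hashed URL space, elastic net = L1 + L2 constraints.
hasher = FeatureHasher(n_features=10_000_000, input_type="string")
model = SGDClassifier(loss="log_loss", penalty="elasticnet",
                      l1_ratio=0.15, alpha=1e-6)

def update(model, url_lists, labels):
    """Consume one mini-batch of (browsing histories, site-visit labels)."""
    X = hasher.transform(url_lists)            # sparse, 10M columns
    model.partial_fit(X, labels, classes=[0, 1])
    return model

# Example mini-batch: each user is a list of hashed URL tokens.
model = update(model,
               [["abkcc", "kkllo"], ["88iok"], ["7uiol", "abkcc"]],
               [1, 0, 0])
```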

Targeting Model: p(sv | urls) = 1 / (1 + exp(−(w₀ + Σᵤ wᵤ · 1[URL u in history])))

Real-time scoring of a user
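With a sparse linear model, real-time scoring can be as cheap as a dictionary lookup per visited URL plus a sigmoid. A hypothetical sketch, with made-up tokens and weights:

```python
import math

# Only nonzero coefficients are stored; everything else contributes 0.
weights = {"abkcc": 1.2, "kkllo": -0.4, "88iok": 0.7}
bias = -5.0

def p_site_visit(user_urls):
    """Score a user from their hashed URL history at serve time."""
    z = bias + sum(weights.get(u, 0.0) for u in user_urls)
    return 1.0 / (1.0 + math.exp(-z))

print(p_site_visit(["abkcc", "88iok", "7uiol"]))
```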

[Figure: a user's timeline of observations (site visits with positive or negative correlation, purchases) mapped to a ProspectRank score against an engagement threshold.]

Some prospects fall out of favor once their in-market indicators decline.

[Figure: Lift over random for 66 campaigns for online display ad prospecting. X-axis: total impressions (0 to 6.0M); y-axis: lift over RON (run-of-network), 0 to 25; median lift = 5x.]

Note: the top prospects are consistently rated as excellent compared to alternatives by advertising clients' internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc.

<snip>

The Pokerface Bidding Model: P(SiteVisit | ProspectRank, inventory, ad)

KDD 2012 Best Paper

Marginal inventory score, converted into a bid price (a hedged sketch follows).

[Figure: lift by inventory for a hotel campaign.]
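The slide's formulas were images that did not survive extraction, so the following is only a hedged sketch: it assumes the marginal inventory score is the ratio of the site-visit rate on a given inventory to the baseline rate for prospects of the same rank, and that the bid scales a base price by that score. The exact formulas are in the KDD 2012 paper:

```python
def marginal_inventory_score(p_sv_given_rank_inventory, p_sv_given_rank):
    """Assumed form: how much this inventory shifts the site-visit
    rate relative to the prospect-rank baseline."""
    return p_sv_given_rank_inventory / p_sv_given_rank

def bid_price(base_bid, score, floor=0.0, cap=10.0):
    """Assumed conversion: scale a base bid by the score, clamped."""
    return min(max(base_bid * score, floor), cap)

print(bid_price(base_bid=1.50, score=marginal_inventory_score(0.012, 0.004)))
```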

Measuring causal effect: E[Y_{A=ad}] − E[Y_{A=no ad}]

A/B testing: practical concerns.

Estimate causal effects from observational data:
Using targeted maximum likelihood estimation (TMLE) to estimate causal impact (a simplified stand-in is sketched below)
Can be done ex post for different questions
Need to control for confounding
Data has to be 'rich' and cover all combinations of confounding and treatment

ADKDD 2011
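As a simplified stand-in for TMLE (which the talk actually uses), here is an inverse-propensity-weighted (IPW) estimate of E[Y_{A=ad}] − E[Y_{A=no ad}] on synthetic data, showing how the naive comparison is inflated by confounded targeting:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
in_market = rng.binomial(1, 0.2, n)               # confounder: targeting signal
saw_ad = rng.binomial(1, 0.05 + 0.5 * in_market)  # targeted users see more ads
convert = rng.binomial(1, 0.002 + 0.02 * in_market + 0.002 * saw_ad)

# Naive comparison: confounded, because targeting picks likely converters.
naive = convert[saw_ad == 1].mean() - convert[saw_ad == 0].mean()

# IPW: reweight by the (here, known) propensity of seeing the ad.
p = np.where(in_market == 1, 0.55, 0.05)
ipw = (np.mean(saw_ad * convert / p)
       - np.mean((1 - saw_ad) * convert / (1 - p)))
print(f"naive: {naive:.4f}  IPW: {ipw:.4f}  truth: 0.0020")
```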

An important decision…

"I think she is hot! Hmm, so what should I write to her to get her number?"

Source: OK Trends

Hardships of causality: beauty is confounding.

Beauty determines both the probability of getting the number and the probability that James will say "You are beautiful." We need to control for actual beauty, or it can appear that paying compliments is a bad idea.

Hardships of causality: targeting is confounding.

We only show ads to people we know are more likely to convert (ad or not).

[Figure: conversion rates for users who did not see the ad vs. users who saw the ad.]

Need to control for confounding. Data has to be 'rich' and cover all combinations of confounding and treatment.

Observational causal methods (TMLE):
Negative test: wrong ad
Positive test: A/B comparison

Some creatives do not work…


The Police: Fraud

Tracking artificial co-visitation patterns
Blacklist inventory in the exchanges
Ignore the browser

KDD 2013

Unreasonable performance increase, spring 2012. [Figure: performance index doubling (2x) within 2 weeks.]

Oddly predictive websites?

36% of traffic is non-intentional (2011: 6%; 2012: 36%).

Traffic patterns are 'non-human': [diagram: 50% co-visitation between website 1 and website 2.]

Data from bid requests in ad exchanges:
Node: hostname
Edge: 50% co-visitation (construction sketched below)

WWW 2010
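A sketch of how such a graph could be built from bid-request logs; the exact edge definition (here, 50% visitor overlap relative to the smaller site) is an assumption:

```python
from collections import defaultdict
from itertools import combinations

def covisitation_edges(requests, threshold=0.5):
    """requests: iterable of (browser_id, hostname) pairs from bid requests."""
    visitors = defaultdict(set)
    for browser_id, hostname in requests:
        visitors[hostname].add(browser_id)
    edges = []
    for a, b in combinations(visitors, 2):
        overlap = len(visitors[a] & visitors[b])
        # Edge when a large fraction of one site's browsers also hit the other.
        if overlap / min(len(visitors[a]), len(visitors[b])) >= threshold:
            edges.append((a, b))
    return edges

requests = [(1, "site-a.com"), (1, "site-b.com"), (2, "site-a.com"),
            (2, "site-b.com"), (3, "news-site.com")]
print(covisitation_edges(requests))  # [('site-a.com', 'site-b.com')]
```

Human visitors rarely produce such overlaps at scale, so dense clusters in this graph flag non-intentional traffic.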

[Graph visualizations: the co-visitation network around the Boston Herald, and a suspicious hub, womenshealthbase?]

WWW 2012

Back to the unreasonable performance increase of spring 2012. [Figure: performance index doubling (2x) within 2 weeks.]

Now it is also coming to brands:

• 'Cookie stuffing' increases the value of the ad for retargeting
• It messes up web analytics …
• It messes up my models, because a botnet is easier to predict than a human

Fraud pollutes my models.

• Don’t show ads on those sites

• Don’t show ads to a high jacked browser

• Need to remove the visits to the fraud sites

• Need to remove the fraudulent brand visits

When we see a browser on caught up in fraudulent

activity: send him to the penalty box where we

ignore all his actions
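A hypothetical sketch of the penalty box: once a browser is flagged, all of its events are ignored for a quarantine period (the 30-day window and all names are assumptions):

```python
import time

PENALTY_SECONDS = 30 * 24 * 3600   # assumed 30-day quarantine
_penalty_box = {}                  # browser_id -> release timestamp

def flag_fraud(browser_id, now=None):
    """Put a browser seen in fraudulent activity into the penalty box."""
    _penalty_box[browser_id] = (now or time.time()) + PENALTY_SECONDS

def keep_event(browser_id, now=None):
    """True if this browser's event should feed the models."""
    release = _penalty_box.get(browser_id)
    return release is None or (now or time.time()) >= release

flag_fraud("cookie-123")
print(keep_event("cookie-123"))    # False: in the penalty box
print(keep_event("cookie-456"))    # True
```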

Using the penalty box: all back to normal

3 more weeks in spring 2012. [Figure: performance index back to normal.]

On a personal note

claudia.perlich@gmail.com

Some References

1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy Friendly Social Network Targeting. KDD 2009.
2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating the Effect of Online Display Advertising on Browser Conversion. ADKDD 2011.
3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award).
4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012.
5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. ADKDD 2012.
6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. Machine Learning Journal, 2014.
7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013.
8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-Visitation Networks for Classifying Non-Intentional Traffic. KDD 2013.