How did you know this Ad will be relevant for me?!


DESCRIPTION

Predicting the most relevant ad at any point in time for every individual is how Rocket Fuel optimizes ROI for an advertiser. One of the factors influencing this prediction is a consumer's online interactions and behavioral profile. With more than 45 billion interactions processed daily, this data runs into several petabytes in our Hadoop warehouse. Running machine-learning and artificial-intelligence algorithms at this scale requires addressing many practical issues. First, behavioral patterns are short-lived, so to accurately reflect a consumer's tendencies, we need to curate and refresh their profile as quickly as possible while avoiding multiple scans over the raw data and coping with issues like transient system outages. Second, we must build models from behavioral profiles without overwhelming our Hadoop cluster: at this scale, frequent refreshes of several models can place an undue burden on even a thousand-node cluster. In this talk, we will dive into (a) the practical challenges of designing a highly scalable and efficient solution for building behavioral profiles on the Hadoop framework and (b) techniques for ensuring the reliability and availability of mission-critical machine-learning pipelines.


Proprietary & Confidential. Copyright © 2014.

Behavioral Targeting @ Scale - How did we know that this Ad was relevant for you?

Savin Goyal, Sivasankaran Chandrasekar

ADVERTISER

ROCKET FUEL

200+ RTB advertising supply partners

50+ Mn websites

50+ Bn daily impressions

3B consumers worldwide

100,000+ devices


Exchanges

AdExchange

Rocket Fuel Platform

Auto Optimization

Real-Time Bidding

Agencies

Data Partners

Display Advertising Ecosystem

Programmatic Buying

[Figure: the programmatic buying flow among Publishers, Exchange Partners, Data Partners, and the Rocket Fuel Platform (Smart Ad Servers, Response Prediction Models, Campaign & Audience Data). The diagram numbers ten steps; its labels include Page Request (Web Browser), Bid Request, User Data, Qualify Campaign, Calculate Propensity Score, Bid on Ad, Rocket Fuel Winning Ad / Ad Request, Ad Served to User, User Engages with Ad, User Engagement Recorded, and Refresh learning.]

Real Time Auction

[Figure: bid prices (in dollars) competing in a real-time auction for an impression, driven by signals such as Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User.]

Real Time Auction

[Figure: impression scorecards for three goals (Leads & sales, Coupon downloads, Brand awareness). Each impression is scored on Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market, Behavior, and Response, with per-factor scores ranging from -70 to +100; the overall results shown are +9.7%, +0.7%, and +1.4%.]

Scalable Predictive Models

[Figure: candidate features (Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social, Interests, Lifestyle) evaluated across many models; each cell shows a lift between -23 and +28 or an X, with a legend of Positive Lift, Marginal Impact, and Negative Lift.]

Automated Feature Selection

▪ Effectively unlimited number of candidate models
▪ Determine the right model size
▪ Balance fit to past data against generalization to future data

Learn-Test-Refine

▪ Automatically learn from each response
▪ Cross-validation: A/B testing infrastructure
▪ Training pipeline
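The slide does not say which selection method is used. As one illustration of "balance fit to past data against generalization to future data", here is a minimal sketch of greedy forward selection that keeps adding the feature which most improves a held-out validation score and stops when the gain falls below a threshold. The `validation_score` callback and `min_gain` value are hypothetical placeholders for whatever back-testing metric a real pipeline would use.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_select(
    candidates: Iterable[str],
    validation_score: Callable[[FrozenSet[str]], float],  # hypothetical held-out metric
    min_gain: float = 1e-3,  # hypothetical stopping threshold
) -> List[str]:
    """Greedy forward selection driven by a held-out (validation) score.

    Model size is chosen implicitly: a feature is added only if it improves
    the validation score by at least `min_gain`, so features that merely fit
    past data without helping on held-out data are left out.
    """
    remaining = set(candidates)
    chosen: List[str] = []
    best = validation_score(frozenset())
    while remaining:
        # Score every one-feature extension of the current model.
        scores = {f: validation_score(frozenset(chosen + [f])) for f in remaining}
        feature = max(scores, key=scores.get)
        if scores[feature] - best < min_gain:
            break  # the best remaining feature no longer pays for itself
        chosen.append(feature)
        remaining.remove(feature)
        best = scores[feature]
    return chosen
```

Because the stopping rule uses held-out data, the size of the selected model is decided by generalization rather than by training fit alone.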


Throughput

Rocket Fuel Scale

▪ 34,474 CPU processor cores
▪ 2,655 servers
▪ 187.4 teraflops of computing
▪ 188 terabytes of memory: 13X the memory of Jeopardy-winning IBM Watson
▪ 42 petabytes of storage: 106X the data volume of the entire Library of Congress

Data Warehouse Growth

[Figure: over 1 year, the data warehouse grew from 200 servers to 1,400 servers and from 5 PB to 41 PB (8x).]

Behavioral Targeting

Behavioral Targeting

▪ Leverage online activities on the web to learn about a user's:
  ▪ Long-term interests, e.g. the user is interested in luxury cars
  ▪ Short-term interests, e.g. the user is looking for a pizza right now
▪ Expand the user set beyond retargeting
▪ Explore vs. exploit
  ▪ Identify relevant users even if they have never been targeted previously

Behavioral Targeting @ Rocket Fuel

[Figure: the BT pipeline. Feature Generation turns the Pixel Stream and Ad Logs into BT Features stored in HBase. Model training runs Training Events → Label Data → Train Model → Back Test → Calibrate to produce a Model. Profile Generation and Scoring (Score Profiles) apply the Model and ship the results to the Ad Serving Data Centers.]

Hadoop/HBase @ Rocket Fuel

▪ Cluster highlights
  ▪ 650+ slaves (64 GB + 12 × 3 TB)
  ▪ 20 PB storage
  ▪ HA NameNode setup
  ▪ 9k map slots + 5.5k reduce slots
  ▪ Co-located to run HBase for offline processing
▪ HBase 0.94.15
  ▪ 5-node ZooKeeper quorum
  ▪ Monitoring with OpenTSDB
  ▪ Dual master setup

Behavioral Targeting @ Rocket Fuel

Example events on a user's timeline (30 minutes):
▪ bmw.com (Cars) at 11:23
▪ pizzahut.com (Food) at 11:26
▪ honda.com (Cars) at 11:27

Read the events of the last N days and aggregate them into the Behavioral Targeting profile (Recency, Frequency, others…):
▪ honda.com, last seen 11:27: recent 6 hours: 5; between 6 and 12 hours: 3; between 12 hours and …
▪ Food, last seen 11:26: recent 6 hours: 2; between 6 and 12 hours: 7; between 12 hours and …
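The profile example above turns raw events into per-feature recency (last seen) and frequency (counts per time window). A minimal sketch of that aggregation, assuming events arrive as `(feature, timestamp)` pairs and using the slide's 6-hour and 12-hour window edges; the bucket names, the 7-day `LOOKBACK` standing in for "last N days", and `build_profile` itself are hypothetical, not Rocket Fuel's code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Window edges from the slide: most recent 6 hours, then 6 to 12 hours.
BUCKETS = [("recent_6h", timedelta(hours=6)), ("6h_to_12h", timedelta(hours=12))]
OLDER = "older_than_12h"  # hypothetical name for the open-ended last bucket
LOOKBACK = timedelta(days=7)  # hypothetical value for "last N days"


def bucket_for(age: timedelta) -> str:
    """Return the first bucket whose upper edge is greater than the event's age."""
    for name, upper in BUCKETS:
        if age < upper:
            return name
    return OLDER


def build_profile(events, now: datetime) -> dict:
    """Aggregate (feature, timestamp) events into recency and frequency per feature.

    Returns {feature: {"last_seen": datetime, "counts": {bucket: count}}}.
    Events older than LOOKBACK are ignored.
    """
    profile = {}
    for feature, ts in events:
        age = now - ts
        if age > LOOKBACK:
            continue
        entry = profile.setdefault(feature, {"last_seen": ts, "counts": defaultdict(int)})
        entry["last_seen"] = max(entry["last_seen"], ts)  # recency
        entry["counts"][bucket_for(age)] += 1  # frequency per time window
    return profile
```

A single pass over the events produces the whole profile, which matches the goal of avoiding multiple scans over the raw data.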


HBase Data Model

▪ Row key: user_id (e.g. ABCD06EFG)
▪ Single column family “u”
▪ Column qualifier: <date><hour>:<type>:<value>, e.g. 2014060416:site:bmw.com or 2014060416:category:food
▪ Cell value: [Protobuf] the most recent timestamp, plus event details relative to that timestamp (e.g. one cell holds details relative to 11:23, another relative to 11:26)

• Efficient look-up for a given user
• Access a range of events by event date, hour, and type

Key Challenges

▪ User Profile Freshness
▪ Scaling Issues
▪ Pipeline Failures

User Profile Freshness

▪ Strict latency requirements
▪ Recent activity is a much better predictor

Solutions:
▪ Staggered Pipelines
▪ Real Time Behavioral Targeting

Staggered Pipelines

[Figure: five overlapping runs of the Extract → Score → Filter → Upload pipeline over the Source Data, each started at a staggered offset from the previous one.]
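One way to read the staggered-pipelines figure is as K overlapping copies of the same pipeline, started evenly across one run duration so that a freshly scored batch lands every D/K instead of every D. The sketch below works through that arithmetic under the assumption that every run takes a constant duration D and processes data up to its start time; `staggered_starts` and `worst_case_staleness` are hypothetical helpers, not part of the original system.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start: datetime, run_duration: timedelta,
                     copies: int, horizon: timedelta) -> list:
    """Start times for `copies` overlapping pipeline runs, offset evenly by D/K."""
    step = run_duration / copies
    starts, t = [], first_start
    while t < first_start + horizon:
        starts.append(t)
        t += step
    return starts


def worst_case_staleness(run_duration: timedelta, copies: int) -> timedelta:
    """Maximum age of the data behind the freshest uploaded profile.

    A run started at time s uploads at s + D using data up to s. The next
    upload lands D/K later, so just before it arrives the served data can be
    up to D + D/K old. With one back-to-back pipeline (K = 1) that is 2D.
    """
    return run_duration + run_duration / copies
```

For example, with a 4-hour pipeline, four staggered copies cut the worst-case staleness from 8 hours to 5 hours, at the cost of running four jobs concurrently.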


Real Time Behavioral Targeting

Batched Profile

▪ Blackbird: HBase instance tuned for 2 ms latencies
▪ Refreshed every N hours

Real Time Behavioral Targeting

[Figure: the Offline BT Pipeline builds the BT Profile from Logs; events for users are also recorded in real time into an Online Profile in Blackbird. On each Request, the Ad Servers merge the two profiles before sending the Response.]

Batched Updates vs. Real Time Updates

Batched Profile vs. Real Time Profile:

▪ Event granularity: the Batched Profile is aggregated over several hours/days; the Real Time Profile appends raw recorded events for the recent N hours
▪ Processing load: the Batched Profile requires minimal CPU processing; the Real Time Profile needs aggregation on the fly
▪ Disk footprint: the Batched Profile uses a compact representation that captures several days; the Real Time Profile has strict limits to ensure read times are acceptable
▪ Coverage: the Batched Profile sees all interactions; the Real Time Profile sees only interactions at a data center

▪ Real Time Profile updated in milliseconds
▪ Batched Profile refreshed every N hours
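The comparison above implies that the ad server must aggregate raw real-time events on the fly and combine them with the batched aggregate. A minimal sketch of that merge, assuming the batched profile is a `{feature: count}` map valid up to a known cutoff time and that only real-time events after the cutoff are added (to avoid double counting); the function name, cutoff convention, and data shapes are hypothetical.

```python
from collections import Counter


def merge_profiles(batched_counts: dict, batch_cutoff: int, realtime_events) -> dict:
    """Combine a batched profile with raw real-time events at request time.

    batched_counts: {feature: count} aggregated from events up to batch_cutoff
                    (epoch seconds).
    realtime_events: iterable of (feature, epoch_seconds) recorded at this data
                     center.
    Events at or before the cutoff are already in the batched profile, so only
    newer ones are counted; this is the on-the-fly aggregation step.
    """
    merged = Counter(batched_counts)
    for feature, ts in realtime_events:
        if ts > batch_cutoff:
            merged[feature] += 1
    return dict(merged)
```

Keeping the real-time side raw and bounded to recent N hours keeps this loop short, which is why strict size limits on the real-time profile matter for read latency.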


Scaling Issues

▪ 3X growth in events processed per year
  ▪ First-party data
  ▪ App interactions
  ▪ Geo-location data
  ▪ …
▪ Case studies
  ▪ HBase region hot-spotting
  ▪ Network bandwidth troubles

HBase Region Hot Spotting

HBase Region Hot-spotting

[Figure: a high write load concentrates on one HBase region. Splitting the region (painful!) is still problematic because the distribution stays non-uniform: some users are more active than others, and there is no control over the user IDs generated.]

HBase Region Hot-spotting

▪ Uneven write-load distribution
  ▪ Non-uniform row key distribution
▪ Salt row keys to ensure uniform distribution
  ▪ Fixed-length hashed prefix: Murmur-hash-based prefix + original user ID
▪ Uniform pre-splits
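A minimal sketch of salting with a fixed-length Murmur-hash prefix and computing uniform pre-split points. Python's standard library has no MurmurHash, so the sketch includes a pure-Python MurmurHash3 (x86, 32-bit). Using the full 32-bit hash as a 4-byte big-endian prefix, and splitting that prefix space into equal ranges, are assumptions chosen to match the four-byte prefixes visible in the region boundaries; `salted_key` and `uniform_split_keys` are hypothetical names.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    tail, k = data[4 * nblocks:], 0  # tail: remaining 1 to 3 bytes
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


PREFIX_LEN = 4  # hypothetical: the whole 32-bit hash as the salt


def salted_key(user_id: str) -> bytes:
    """Fixed-length hashed prefix + original user ID, so rows spread evenly."""
    uid = user_id.encode("utf-8")
    return murmur3_32(uid).to_bytes(PREFIX_LEN, "big") + uid


def uniform_split_keys(num_regions: int) -> list:
    """num_regions - 1 split points dividing the prefix space into equal ranges."""
    return [(i * 2**32 // num_regions).to_bytes(PREFIX_LEN, "big")
            for i in range(1, num_regions)]
```

Because the original user ID is kept after the prefix, a point lookup only needs to recompute the hash, while writes from very active users or clustered IDs are spread across all pre-split regions.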


HBase Region Hot-spotting

▪ Don’t stop at salting
  ▪ Map input splits configured for region boundaries

[Figure: the Key Partitioner turns 'k' input splits of raw user IDs (e.g. 1234557, 1234568, …, ZZAHT654, ZZZZKLO1) into 'm' splits for 'm' regions. Each output split holds salted keys (e.g. \x01\x85\x1E\xB811ZKL1 … \x03\x85\x1E\xB8ZZZKL1) that fall within a single region's boundaries (Region 1: \x03\x85\x1E\xB8ZZZZZZ, Region 2: \x07\x5C\xF5\xC2928ZZ, …, Region m: \xFF\xAE\x14\xE1Z28ZZ).]

HBase Key Partitioner

▪ As many splits as regions, to maximize parallelism
▪ Key Partitioner (MR):
  ▪ Reads the region boundaries of the HBase table
  ▪ Salts and sorts row keys accordingly
  ▪ Multiple Output Format to optimize the reduce phase
  ▪ Each generated split file corresponds to a single region
▪ Drastically reduces read latencies
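The partitioning step can be sketched in a single process: look up which region each salted key belongs to from the table's sorted region start keys, then emit one sorted group per region. In the real MapReduce job this corresponds to a custom partitioner with one reducer (and one output file) per region; HBase's own bulk-load path, `HFileOutputFormat.configureIncrementalLoad`, takes a similar approach using a `TotalOrderPartitioner`. The function names and the in-memory dict standing in for split files are assumptions.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import defaultdict


def region_index(row_key: bytes, region_starts: list[bytes]) -> int:
    """Index of the region whose [start, next_start) range contains row_key.

    region_starts must be sorted and begin with b"" (the first region's start).
    """
    return bisect_right(region_starts, row_key) - 1


def partition_by_region(salted_keys, region_starts: list[bytes]) -> dict:
    """Group salted row keys into one sorted split per region.

    Map side: tag each key with its region. Reduce side (one reducer per
    region): sort the keys and write a single file, so each output split
    lines up with exactly one region.
    """
    splits = defaultdict(list)
    for key in salted_keys:
        splits[region_index(key, region_starts)].append(key)
    return {idx: sorted(keys) for idx, keys in sorted(splits.items())}
```

Since every split maps to exactly one region, a downstream job reading these splits touches one region server per task instead of scattering requests across the cluster.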


Network Bandwidth Troubles

Data Center Expansion

Network Bandwidth Constraints

▪ Consistently overshot the bandwidth limit during uploads
▪ All sorts of delays (Redis, MySQL, Blackbird…)
▪ Bidding hampered

Solutions

▪ Intelligent storage – protobufs everywhere

▪ Throttle writes

▪ Geo-splitting
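The slides do not say how writes were throttled. A token bucket is one common way to cap upload bandwidth while still allowing short bursts, shown here as a minimal sketch: `rate` is the sustained budget (e.g. bytes per second) and `capacity` the largest burst. The class and parameter names are hypothetical, and the injectable `clock`/`sleep` exist only so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Token-bucket throttle: sustained `rate` units/s, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last refill, up to capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, amount: float) -> bool:
        """Take `amount` tokens if available; return False if the caller must wait."""
        self._refill()
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False

    def consume(self, amount: float, sleep=time.sleep) -> None:
        """Block until `amount` tokens are available, then take them."""
        if amount > self.capacity:
            raise ValueError("request larger than bucket capacity")
        while not self.try_consume(amount):
            # Sleep just long enough for the missing tokens to accumulate.
            sleep((amount - self.tokens) / self.rate)
```

An uploader would call `consume(len(batch))` before sending each batch, keeping long-run throughput at or below `rate` so bidding traffic on the same links is not starved.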


Geo Splitting


Geo-splitting

▪ Tag each user’s location history and predict future data center visits
▪ ⨍(dc, geo_history, bt_profile)
▪ A separate workflow periodically generates geo-split rules:
  ▪ Clusters users and analyzes migration patterns
  ▪ Ensures maximal look-up coverage of profiles
  ▪ Minimizes the total number of profiles stored
▪ Ensures efficient use of resources, with minimal impact on performance
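The slide names the rule only as ⨍(dc, geo_history, bt_profile). As a minimal sketch of what such a rule could look like: ship a profile to a data center if that data center is the user's most frequent one, or if it served at least a threshold share of the user's recent requests. The threshold, the fallback of replicating everywhere when there is no history, and the data-center names are all hypothetical; the real rules come from the clustering workflow described above.

```python
from collections import Counter

MIN_SHARE = 0.2  # hypothetical share threshold


def should_store(dc: str, geo_history: list, bt_profile: dict) -> bool:
    """Hypothetical f(dc, geo_history, bt_profile): upload this profile to `dc`?

    geo_history lists the data centers that served the user's recent requests.
    """
    if not bt_profile:
        return False  # nothing worth shipping
    if not geo_history:
        return True  # no history: replicate everywhere to preserve look-up coverage
    counts = Counter(geo_history)
    top_dc, _ = counts.most_common(1)[0]
    share = counts[dc] / len(geo_history)
    return dc == top_dc or share >= MIN_SHARE
```

Raising the threshold stores fewer profiles (less upload bandwidth) at the cost of more look-up misses when users travel, which is exactly the coverage-versus-storage trade-off the rules balance.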


Geo-splitting

[Figure: the BT pipeline (Feature Generation from the Pixel Stream and Ad Logs into BT Features in HBase; model training via Label Data, Train Model, Back Test, and Calibrate; Profile Generation and Scoring for the Ad Serving Data Centers), extended with a geo-split workflow: Cluster Users → Analyze Patterns → Generate Rules, whose rules drive a Geo-split step.]

Quick Recovery From Failures

▪ Break the pipeline into short payloads
▪ Fail fast, recover fast!
▪ Actionable alerts; cut down noise

Quick Recovery From Failures

▪ Materialize data as frequently as possible
▪ Cross-system fault tolerance
▪ Idempotency
▪ Backfill at EOD to plug holes if needed
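Idempotency and end-of-day backfill fit together naturally when output is materialized per hour: rerunning an hour simply replaces its partition, and the backfill job only needs the list of hours with no output. A minimal sketch, using the local filesystem as a stand-in for HDFS and a hypothetical `<yyyymmddhh>.dat` naming scheme; the write-to-temp-then-rename pattern gives atomic replacement on a single filesystem.

```python
from __future__ import annotations

import os
import tempfile
from datetime import datetime, timedelta


def hourly_partitions(day: datetime) -> list[str]:
    """All 24 hourly partition names (yyyymmddhh) for the given day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [(start + timedelta(hours=h)).strftime("%Y%m%d%H") for h in range(24)]


def missing_partitions(day: datetime, done: set[str]) -> list[str]:
    """Hours with no materialized output: the holes an EOD backfill should plug."""
    return [p for p in hourly_partitions(day) if p not in done]


def write_partition(out_dir: str, partition: str, payload: bytes) -> str:
    """Idempotent write: rerunning the same hour replaces its output, never duplicates it.

    Data goes to a temp file first and is then atomically renamed into place,
    so a crash mid-write never leaves a partial partition behind.
    """
    final = os.path.join(out_dir, f"{partition}.dat")
    fd, tmp = tempfile.mkstemp(dir=out_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(tmp, final)
    return final
```

With short, idempotent payloads like this, a failed hour can be retried immediately (fail fast, recover fast), and anything still missing at the end of the day is picked up by `missing_partitions`.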


Shout-outs!


We Are Hiring!


Questions ?

Thank You!

Sivasankaran Chandrasekar, chandra@rocketfuel.com

Savin Goyal, savin@rocketfuel.com

We are hiring! (as always)

http://rocketfuel.com/careers

savin@rocketfuel.com, chandra@rocketfuel.com