
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)



Learn Like a Human – Taking Machine Learning from Batch to Real-Time

Elad Rosenheim


Who am I

Architect at Dynamic Yield, “Predictors” Team Lead

Previously: AlphaCSP, SAP

Performance & Scale, DevOps

Measure All the Things!

East-Asia & Japan


Who’s Dynamic Yield?

We’ve been optimizing & personalizing websites since 2011

Start-up in Tel Aviv, headed by Liad Agmon

I joined as the 5th employee; we’re 50 now and growing fast


On the Agenda

Our clients’ problem

Old School Solutions

Meet the ML Bandits


Our clients’ problem

Publishers, retailers, SaaS: all share a common problem

They know their domain, but not how to optimize for each user

Screen real estate is limited, yet everyone sees the same thing


What top videos to show on NBC News’ site?

What user segments should see this element at this location?

What’s the best layout for this element?


Both the layout of this page and each element in it deserve testing

What’s the best layout?

What types of products to show whom?


What articles to show on ynet’s homepage? What titles and images?

In what order?


What is the best default sort order for products on Adika?

Does it differ significantly between user segments?


The Beginning

First, there was the educated guess

Then, there was the A/B test
  “Data Beats Opinion”
  Freedom to experiment (with nice tools)
  Hopefully: less fear of change, less politics

How does it work?
  Split traffic between baseline and alternative variations
  In theory: sit & wait for significant results
  In practice: peek at the numbers till the nice “95% confidence”
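As a rough illustration of what "significant results" can mean here, the sketch below runs a two-proportion z-test on conversion counts. The slides do not say which test is used, so the choice of test, the function name and the counts are all assumptions made for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variations convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: baseline A vs. alternative B, 10,000 impressions each.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
significant = p < 0.05  # the "95% confidence" threshold
```

Note that the p-value is only valid if the sample size is fixed in advance; checking it repeatedly while the test runs ("peeking") inflates the false-positive rate.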


A/B Tests: Already Old School?

While you wait, you're bleeding clicks

clicks == money

What about the really dynamic stuff?
  Campaigns, Current Headlines, Products on Sale


Enter the Multi-Arm Bandits

A Single-Arm Bandit

Suppose I have multiple arms in front of me, each with its unknown mean reward…

How do I optimize income from multiple machines?
  Caution or Haste?

Explore vs. Exploit

In our context: How do I optimize multiple variations?


Bandits - A Classic Problem

(Very) Simple Solutions

ε-greedy, ε-decreasing
  First 100% random explore, then ~90% exploit?
  Magic numbers, built-in revenue loss
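A minimal ε-greedy sketch, to make the "magic numbers" concrete. The class name, the fixed ε of 0.1, and the simulated conversion rates are illustrative assumptions, not values from the talk.

```python
import random

class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # fraction of traffic spent exploring
        self.pulls = [0] * n_arms         # impressions per variation
        self.rewards = [0.0] * n_arms     # conversions per variation
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, or while some arm has no data yet.
        if self.rng.random() < self.epsilon or 0 in self.pulls:
            return self.rng.randrange(len(self.pulls))
        # Otherwise exploit the arm with the best observed mean reward.
        means = [r / n for r, n in zip(self.rewards, self.pulls)]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

def simulate(true_rates, steps=20_000, epsilon=0.1, seed=0):
    """Run the bandit against simulated Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon, seed=seed + 1)
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.pulls

# Hypothetical conversion rates for three variations.
pulls = simulate([0.024, 0.017, 0.004])
```

The built-in revenue loss is visible in `select`: a fixed ε share of traffic keeps going to random arms no matter how much evidence has accumulated.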

Bayesian-based approaches
  Smoother curve from explore to exploit

“Winner” is now a less relevant term
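The slides do not name a specific Bayesian method. One common choice is Thompson sampling with a Beta prior per variation, sketched below as an assumption. Exploration shrinks on its own as the posteriors narrow, which is one way to get the smoother explore-to-exploit curve.

```python
import random

class ThompsonSampling:
    def __init__(self, n_arms, seed=None):
        # Beta(1, 1) prior: uniform belief over each arm's conversion rate.
        self.successes = [1] * n_arms
        self.failures = [1] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible conversion rate per arm from its posterior,
        # then show the arm whose draw is highest.
        samples = [self.rng.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

def simulate(true_rates, steps=5_000, seed=0):
    """Run against simulated Bernoulli arms; return impressions per arm."""
    rng = random.Random(seed)
    bandit = ThompsonSampling(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(steps):
        arm = bandit.select()
        bandit.update(arm, rng.random() < true_rates[arm])
        pulls[arm] += 1
    return pulls

# Hypothetical conversion rates for three variations.
pulls = simulate([0.024, 0.017, 0.004])
```

There is no point at which a "winner" is declared: traffic simply shifts toward whichever arm currently looks best, and can shift back if the data changes.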


Bandits work well when…

We want to find the variation “best on average”

…but we’re not improving the conversion rate of any single variation

(Figure: three variations with conversion rates of 2.4%, 1.7% and 0.4%)


Enter Personalization

Each of us is a beautiful and unique feature vector!

By showing the right variation to the right people, we can improve conversions per variation

and beat the best variation

ML Challenge Accepted


The Usual Suspects

Collaborative Filtering?
  Very big, very sparse matrix
  Cold Start
  Batch
  Not suitable in this case

Classifiers?
  Logistic Regression, Random Forest et al.
  Periodically learn over all converters so far
  More data == more time, bigger model
  Not the classic question


What We Need

Like a bandit, we need to learn as we go (not in batch), but this time with “context” - the user’s data

Incremental Learning over the stream of impressions & rewards (“Partial Fit”)

We’re looking to…
  Start learning from the first impression
  Handle the explore-exploit curve
  Run fast (enough)
  In the worst case: converge on the best variation, like a bandit
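To illustrate incremental learning, here is a minimal online logistic regression updated by stochastic gradient descent. The method name `partial_fit` mirrors the scikit-learn convention for incremental estimators; the class itself, the learning rate and the toy stream are assumptions for the sketch, not the speaker's implementation.

```python
import math

class OnlineLogisticRegression:
    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y):
        """One SGD pass over a mini-batch; earlier data is never revisited."""
        for x, label in zip(X, y):
            error = self.predict_proba(x) - label  # gradient of log loss w.r.t. z
            self.bias -= self.learning_rate * error
            self.weights = [w - self.learning_rate * error * xi
                            for w, xi in zip(self.weights, x)]
        return self

# Toy stream: users with feature 1.0 convert, users with feature 0.0 do not.
model = OnlineLogisticRegression(n_features=1)
for _ in range(200):
    model.partial_fit([[1.0], [0.0]], [1, 0])  # one small batch per iteration
```

The model is usable after the very first call, and memory stays constant however long the stream runs, because only the weights are kept.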


Meet the Contextual Bandits

They “eat” the data stream

They demand fast access to user data
  Historical or immediate

Their model is always ready for action

In the Papers
  Linear Bayes, LinUCB

What we do: Per-Variation Logistic Regression
  A variant supporting updates in “mini-batches”
  Exploration-on-top
  Worst case: “Garbage In Multi Arm Bandit Out”
  Light on memory, compact output
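The following sketch shows one way the pieces listed above could fit together: one logistic regression per variation, updated in mini-batches, with exploration layered on top. The slides do not say how exploration is done, so the ε-greedy layer, the batch size and all names here are assumptions.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class ArmModel:
    """Logistic regression for one variation, updated by SGD."""
    def __init__(self, n_features, learning_rate):
        self.weights = [0.0] * (n_features + 1)  # last weight is the bias
        self.learning_rate = learning_rate

    def predict(self, x):
        xb = list(x) + [1.0]
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, xb)))

    def update(self, batch):
        for x, reward in batch:
            xb = list(x) + [1.0]
            error = self.predict(x) - reward
            self.weights = [w - self.learning_rate * error * xi
                            for w, xi in zip(self.weights, xb)]

class ContextualBandit:
    def __init__(self, n_arms, n_features, epsilon=0.1,
                 batch_size=10, learning_rate=0.1, seed=None):
        self.models = [ArmModel(n_features, learning_rate) for _ in range(n_arms)]
        self.buffers = [[] for _ in range(n_arms)]
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def select(self, x):
        # Exploration on top: a random arm with probability epsilon (assumed).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.models))
        scores = [m.predict(x) for m in self.models]
        return max(range(len(scores)), key=scores.__getitem__)

    def record(self, arm, x, reward):
        # Buffer events and update the arm's model one mini-batch at a time.
        buf = self.buffers[arm]
        buf.append((x, reward))
        if len(buf) >= self.batch_size:
            self.models[arm].update(buf)
            buf.clear()

def simulate(steps=5_000, seed=0):
    rng = random.Random(seed)
    bandit = ContextualBandit(n_arms=2, n_features=1, epsilon=0.2, seed=seed + 1)
    for _ in range(steps):
        x = [float(rng.randint(0, 1))]
        arm = bandit.select(x)
        # Hypothetical ground truth: arm 0 suits users with x=1, arm 1 suits x=0.
        rate = 0.8 if (arm == 0) == (x[0] == 1.0) else 0.1
        bandit.record(arm, x, 1.0 if rng.random() < rate else 0.0)
    return bandit

bandit = simulate()
```

If the features carry no signal, each arm's model can learn little beyond its bias term, so the policy degrades toward a plain bandit over average conversion rates, which matches the worst case named on the slide.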


How We Do It: Online & Offline

Online should be fast & scale

Offline: a testbed for iteratively testing new ideas
  New algorithms
  Tweaked parameters
  Feature transformations


The Online Flow

(Diagram components: DY Web Servers (a. get our script; b. log impressions, conversions); Queue Per Test; Learn Workers; User DB; Persist Model; Load to Predict Server; Predictions; A, B, C)
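A toy, in-process sketch of the flow between the components named in the diagram: web servers log events into a queue per test, a learn worker drains the queue and persists a model, and the predict side reads the persisted model. The real system's queues, storage and models are not described in the slides; here plain Python containers stand in for them, the "model" is reduced to per-variation counts, and every name is hypothetical.

```python
from collections import defaultdict, deque

event_queues = defaultdict(deque)  # stand-in for "Queue Per Test"
model_store = {}                   # stand-in for the persisted models

def log_event(test_id, variation, converted):
    """Web-server side: enqueue an impression and whether it converted."""
    event_queues[test_id].append((variation, converted))

def learn_worker(test_id, max_events=100):
    """Drain up to max_events from one test's queue and persist updated counts."""
    counts = model_store.setdefault(test_id, {})
    queue = event_queues[test_id]
    for _ in range(min(max_events, len(queue))):
        variation, converted = queue.popleft()
        shown, conversions = counts.get(variation, (0, 0))
        counts[variation] = (shown + 1, conversions + int(converted))

def predict(test_id):
    """Predict-server side: pick the variation with the best observed rate."""
    counts = model_store.get(test_id)
    if not counts:
        return None
    return max(counts, key=lambda v: counts[v][1] / counts[v][0])

# Usage with made-up events for one test.
for converted in (True, False, False):
    log_event("homepage-test", "A", converted)
for converted in (True, True, False):
    log_event("homepage-test", "B", converted)
learn_worker("homepage-test")
best = predict("homepage-test")  # "B": 2 of 3 converted vs. 1 of 3 for "A"
```

Keeping a separate queue per test means one busy test cannot delay learning for the others, and the predict path never waits on the learn path.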


The Offline Evaluator

Test, Improve, Iterate

Using real-world data

Using generated data
  From easy to hard
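One way to read "generated data, from easy to hard" is a synthetic environment with a difficulty knob. The sketch below assumes a single binary user feature and a `noise` parameter that fades the personalization signal from strong (0.0) to absent (1.0). The generator, the policies and the rates are all hypothetical.

```python
import random

def conversion_probability(feature, variation, noise):
    # Variation "A" suits users with feature 1, "B" suits users with feature 0.
    matches = (variation == "A") == (feature == 1)
    # The lift from a good match shrinks to zero as noise goes to 1.0.
    return 0.1 + (0.4 * (1.0 - noise) if matches else 0.0)

def evaluate(policy, n_users, noise, seed=0):
    """Simulate n_users impressions and return the policy's conversion rate."""
    rng = random.Random(seed)
    conversions = 0
    for _ in range(n_users):
        feature = rng.randint(0, 1)
        variation = policy(feature, rng)
        if rng.random() < conversion_probability(feature, variation, noise):
            conversions += 1
    return conversions / n_users

def random_policy(feature, rng):
    return rng.choice("AB")

def personalized_policy(feature, rng):
    return "A" if feature == 1 else "B"

# Sweep from easy to hard and compare the two policies at each level.
for noise in (0.0, 0.5, 1.0):
    baseline = evaluate(random_policy, 20_000, noise)
    personalized = evaluate(personalized_policy, 20_000, noise)
    print(f"noise={noise}: random={baseline:.3f} personalized={personalized:.3f}")
```

Because the ground truth is known, the gap between a candidate policy and the baseline can be measured directly at each difficulty level before anything is tried on real traffic.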


Going Global

Learn in the center site, fast predict in each geo. How?

Push models via local Redis slaves
  Compressed SSH tunnel

User data - daily aggregation
  Storage into LMDB (simple, fast memory-mapped K/V DB)
  Sync via S3 (LZ4 compressed), read from SSD

Learn & Predict services
  Python as ML lingua franca: NumPy, SciPy, scikit-learn