Yhat presentation at PyData Boston 2013. Predictive Models for Production Apps with Yhat. Building a beer recommender with Python and Yhat.
Predictive Models for Production Applications
July 2013
Why building analytical apps is hard
Overcoming the challenge
Case study: building a beer recommender
"If you double the number of experiments you do per year, you're going to double your inventiveness."
- Jeff Bezos
We need to reduce churn. Okay. I'll look into it.
Lots of conversations like this
I figured out that....some complex stuff about vector space that'll improve...
....and that's how we'll reduce churn.
Sounds good. Let's do that...
The "a ha" moment isn't the end.
Now what?
Any of you know what Gradient Boosting is?
So when can we go live with the new model?
It's hard to incorporate analytical work into day-to-day operations
We know finding a data scientist is tough.
http://drewconway.com/
Building applications from their insights is tougher.
"cool. what do we do now?" scenarios
http://scikit-learn.org/stable/auto_examples/index.html
Product Page Search Results
Order Confirmation / Checkout Page
Reduce QA time for classifying purchases made online
How to go from prototype to product?
How do companies address this problem?
Common Approach: Challenge
Rewriting Code: Cross-environment validation
Batch Jobs: High maintenance and config
PMML: Limited to certain libraries, still rewriting
More people, more tools, more time to market.
Can we build and bring to market smarter applications faster?
A platform for running predictive models in production applications.
Key Tenets
1. Work with the tools you already know
2. Iterate quickly
3. Low touch
4. No rewriting code
demo
A Beer Recommender in Python
What beer should I drink?
Tell us a beer you like
We'll tell you some other beers you'll like
Plan
1. Import the data
2. Find common reviewers
3. Calculate review distance
4. Rank beers
The Dataset
Find common reviewers
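A minimal sketch of the common-reviewer step in plain Python. The review tuples and the helper name `common_reviewers` are made up for illustration and are not taken from the talk's notebook, which works on the real dataset.

```python
from collections import defaultdict

# Toy reviews: (reviewer, beer, overall score). Values are illustrative only.
reviews = [
    ("alice", "Dale's Pale Ale", 4.5),
    ("alice", "Fat Tire Amber Ale", 4.0),
    ("bob", "Dale's Pale Ale", 3.5),
    ("bob", "Fat Tire Amber Ale", 3.5),
    ("carol", "Dale's Pale Ale", 4.0),
    ("dave", "Fat Tire Amber Ale", 2.0),
]


def common_reviewers(reviews, beer_a, beer_b):
    """Return the sorted reviewers who rated both beers (hypothetical helper)."""
    reviewers_by_beer = defaultdict(set)
    for reviewer, beer, _score in reviews:
        reviewers_by_beer[beer].add(reviewer)
    # Set intersection keeps only people who reviewed both beers.
    return sorted(reviewers_by_beer[beer_a] & reviewers_by_beer[beer_b])


shared = common_reviewers(reviews, "Dale's Pale Ale", "Fat Tire Amber Ale")
print(shared)
```

Restricting the comparison to shared reviewers means two beers are compared on the opinions of the same people rather than on two unrelated audiences.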
Comparing 2 Similar Beers
vs
Dale's Pale Ale and Fat Tire Amber Ale
"Perfect Agreement"
Similar reviews
Comparing 2 Dissimilar Beers
vs
Dale's Pale Ale and Fat Tire Amber Ale
Michelob Ultra and Fat Tire Amber Ale
Measuring distance (yes, there are other ways to do this)
Calculating Distance
Distance Implementation
Calculate the Distance
● Generate all beer pairs
● Calculate distance between each pair
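The two steps above could be sketched as follows. This assumes Euclidean distance over the scores of shared reviewers, which is only one of several possible measures; the `scores` data and the `review_distance` helper are hypothetical.

```python
import math
from itertools import combinations

# Toy scores: beer -> {reviewer: overall score}. Values are illustrative only.
scores = {
    "Dale's Pale Ale": {"alice": 4.5, "bob": 3.5, "carol": 4.0},
    "Fat Tire Amber Ale": {"alice": 4.0, "bob": 3.5, "dave": 2.0},
    "Michelob Ultra": {"alice": 1.5, "bob": 2.0, "dave": 3.0},
}


def review_distance(scores, beer_a, beer_b):
    """Euclidean distance over reviewers who rated both beers, or None if none did."""
    shared = sorted(set(scores[beer_a]) & set(scores[beer_b]))
    if not shared:
        return None
    return math.sqrt(
        sum((scores[beer_a][r] - scores[beer_b][r]) ** 2 for r in shared)
    )


# Generate every unordered beer pair once and compute its distance.
distances = {
    (a, b): review_distance(scores, a, b)
    for a, b in combinations(sorted(scores), 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

A smaller distance means the shared reviewers scored the two beers more alike. With this toy data the Dale's Pale Ale and Fat Tire Amber Ale pair comes out closer than either pairing with Michelob Ultra.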
So if I like Coors Light, what other beers might I like?
Shipping your work with Yhat
Make analytical routines available to other apps
{"beer": "Coors Light", "weights": [3, 2, 0, 1]}
[["Bud Light", 9.2], ["Budweiser", 12.2], ["Sierra Nevada", 21.2]]
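One way to turn a request like the one above into a ranked response is sketched here. It assumes the four weights apply to four review aspects and that the result is sorted by weighted distance, smallest first; the slides do not spell this out, so treat both as assumptions. The `aspect_distances` values and the `recommend` function are hypothetical and do not reproduce the numbers on the slide.

```python
# Hypothetical per-aspect distances from "Coors Light" to other beers.
# Four numbers per beer, one per review aspect; values are illustrative only.
aspect_distances = {
    "Coors Light": {
        "Bud Light": [1.0, 2.0, 5.0, 1.0],
        "Budweiser": [2.0, 1.0, 0.5, 3.0],
        "Sierra Nevada": [4.0, 3.0, 1.0, 2.0],
    },
}


def recommend(request, aspect_distances, top_n=3):
    """Rank other beers by weighted distance, most similar (smallest) first."""
    weights = request.get("weights", [1, 1, 1, 1])
    candidates = aspect_distances[request["beer"]]
    ranked = sorted(
        (
            [beer, sum(w * d for w, d in zip(weights, dists))]
            for beer, dists in candidates.items()
        ),
        key=lambda pair: pair[1],
    )
    return ranked[:top_n]


result = recommend({"beer": "Coors Light", "weights": [3, 2, 0, 1]}, aspect_distances)
print(result)
```

A weight of 0 drops an aspect from the comparison entirely, so the caller can decide which parts of a review matter for the recommendation.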
1. Pre-processing & transformations
2. Prediction & post-processing
Use the same code you wrote during exploration and modeling.
We're ready to deploy
Analytical routine is now ready to be deployed.
Go to market with as little overhead as possible.
Deploy
1. Pass objects you'd like included in your model as named arguments
2. Specify User Defined Functions you want to include in your project
3. Pass the name of your model and your BaseModel object to the deploy function
4. Execute deploy to host your model on Yhat
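The shape of a deployable model can be sketched without the Yhat client. `SketchModel` below is a hypothetical stand-in written for this example, not Yhat's `BaseModel`; it only illustrates passing objects as named arguments, bundling user-defined functions, and splitting the work into a transform step and a predict step. The final deploy call is left out because it needs the real client and account credentials.

```python
class SketchModel:
    """Hypothetical stand-in for a deployable base class (not the Yhat client)."""

    def __init__(self, udfs=(), **objects):
        # Objects passed as named arguments become attributes of the model.
        for name, value in objects.items():
            setattr(self, name, value)
        # User-defined functions are stored by name so they travel with the model.
        self.udfs = {f.__name__: f for f in udfs}

    def execute(self, raw_data):
        # Run pre-processing, then prediction, as a single call.
        return self.predict(self.transform(raw_data))


def normalize(weights):
    """User-defined helper: scale weights so they sum to 1."""
    total = float(sum(weights))
    return [w / total for w in weights]


class BeerRec(SketchModel):
    def transform(self, raw_data):
        # 1. Pre-processing & transformations
        weights = self.udfs["normalize"](raw_data.get("weights", [1, 1, 1, 1]))
        return raw_data["beer"], weights

    def predict(self, data):
        # 2. Prediction & post-processing
        beer, weights = data
        scored = [
            [other, sum(w * d for w, d in zip(weights, dists))]
            for other, dists in self.aspect_distances[beer].items()
        ]
        return sorted(scored, key=lambda pair: pair[1])


# Illustrative distances only; a real model would carry the computed ones.
model = BeerRec(
    aspect_distances={
        "Coors Light": {
            "Bud Light": [1.0, 2.0, 5.0, 1.0],
            "Budweiser": [2.0, 1.0, 0.5, 3.0],
        }
    },
    udfs=[normalize],
)
prediction = model.execute({"beer": "Coors Light", "weights": [3, 2, 0, 1]})
print(prediction)
```

Keeping the transform and predict steps on one object is what lets the code written during exploration be reused unchanged when the model is hosted.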
Make predictions in a production app
pydata-beer.herokuapp.com
https://github.com/yhat/Beer-Rec-Flask
Data/Code Bundle
Webapp: https://github.com/yhat/Beer-Rec-Flask
IPython Notebook: http://bit.ly/1bkCTHz
Dataset: http://bit.ly/14Wl64k
yhathq.com
@YhatHQ
blog.yhathq.com
Appendix
Learn by iteration from the context of real-world business applications.
Deployment
Execute the deploy function to host your model on Yhat
Pass the name of your model and your BaseModel object to the deploy function
Pass objects you'd like included in your model as named arguments
Specify User Defined Functions you want to include in your project