Machine Learning Real-Life Data & ML in Production @benfreu Ben Freundorfer

Machine Learning in Production


Page 1: Machine Learning in Production

Machine Learning

Real-Life Data & ML in Production

@benfreu Ben Freundorfer

Page 2: Machine Learning in Production
Page 3: Machine Learning in Production

Costs

Page 4: Machine Learning in Production

What’s a model

Page 5: Machine Learning in Production

Many algorithms are a bunch of matrix calculations.

• Costly to train models

• Cheap to apply models (predict)
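To illustrate why applying a model is cheap: once training has produced the weights, a prediction is only a weighted sum and one function call. This is a minimal sketch assuming a logistic model; the weights, bias, and feature values are hypothetical and not taken from the talk.

```python
import math

def predict(weights, bias, features):
    """Apply an already-trained logistic model: one dot product plus a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, as if produced by an earlier (expensive) training run.
weights = [0.8, -0.5]
bias = 0.1

# Prediction for one new example costs a handful of multiplications.
p = predict(weights, bias, [1.0, 2.0])
```

Training has to search for good values of `weights` and `bias` over many examples, which is where the cost goes.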

Page 6: Machine Learning in Production

Human work

Page 7: Machine Learning in Production

Real-Life Data

Page 8: Machine Learning in Production

Transformation

Transform relational data into vectors

All algos need: matrices of numbers

Some need 0.0 ≤ x ≤ 1.0 or mean = 0, σ = 1

Look out for algos requiring "normalized" or "standardized" values → feature scaling
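The two kinds of feature scaling mentioned here can be sketched as follows: min-max scaling maps values into 0.0 ≤ x ≤ 1.0, and standardization shifts them to mean 0 and σ = 1. The function names and the `ages` column are illustrative assumptions; in practice a data library would usually do this for you.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50]  # hypothetical feature column
scaled = min_max_scale(ages)
standardized = standardize(ages)
```

Whichever scaling is used, the parameters (min/max or mean/σ) should come from the training data and be reused unchanged at prediction time.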

Page 9: Machine Learning in Production

Categories

• Features with no numerical relation

• Category 5 doesn’t have 5x the y of category 1

• Fix: Dummy variables

• cat_1, cat_2, … cat_5 with values 0 or 1
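A small sketch of the dummy-variable fix: each category value becomes its own 0/1 column, so no numeric order is implied between categories. The helper name `one_hot` and the sample values are assumptions made for illustration.

```python
def one_hot(values, categories):
    """Turn each categorical value into a dict of cat_<id> columns holding 0 or 1."""
    return [
        {f"cat_{c}": int(v == c) for c in categories}
        for v in values
    ]

categories = [1, 2, 3, 4, 5]
# Three hypothetical examples belonging to categories 5, 1 and 3.
rows = one_hot([5, 1, 3], categories)
```

Each row has exactly one column set to 1, e.g. the first row has `cat_5 = 1` and all other columns 0.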

Page 10: Machine Learning in Production

Missing Values

• days_since_last_purchase = null

How to deal with this? 0 or 999?

• Often intuitively clear from the data domain

One solution: max(days_since_last_purchase of other users)

• HAS to be addressed
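The "max of other users" idea can be sketched like this: a user who never purchased is treated like the user whose last purchase is longest ago. `None` stands in for null, and the sample values are hypothetical.

```python
def impute_with_max(values):
    """Replace missing entries (None) with the largest known value."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: every value is missing")
    fill = max(known)
    return [fill if v is None else v for v in values]

# Hypothetical column with two users who have never purchased.
days_since_last_purchase = [3, None, 41, 12, None]
filled = impute_with_max(days_since_last_purchase)
```

Whether the maximum, 0, or some other value is the right fill depends on what the feature means in the data domain.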

Page 11: Machine Learning in Production

Outliers

• days_since_last_purchase = 2837 for a legacy customer

• If it’s irrelevant, get rid of the whole example (legacy customer)

• Or cap at a max/min value
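Capping can be sketched in a few lines. The bounds 0 and 365 are assumptions chosen for the example; a sensible cap depends on the data.

```python
def cap(values, lower, upper):
    """Clamp every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical column containing the legacy customer's outlier.
days_since_last_purchase = [3, 41, 2837, 12]
capped = cap(days_since_last_purchase, lower=0, upper=365)
```

The outlier 2837 becomes 365, while ordinary values pass through unchanged.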

Page 12: Machine Learning in Production

Reduce Features

• Check for correlation between features. Of each group of highly correlated features, keep only one

• Get rid of intuitively useless features
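One possible way to do the correlation check is to compute the Pearson correlation between feature columns and keep a feature only if it is not strongly correlated with one already kept. The threshold of 0.95, the feature names, and the values are assumptions for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept

# Hypothetical feature columns; items_bought is almost 2 * orders.
features = {
    "orders": [1, 2, 3, 4, 5],
    "items_bought": [2, 4, 6, 8, 11],
    "days_since_last_purchase": [30, 2, 18, 7, 25],
}
reduced = drop_correlated(features)
```

Here `items_bought` is dropped because it adds almost nothing beyond `orders`. Which feature of a correlated pair survives depends on iteration order, so put the features you trust most first.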

Page 13: Machine Learning in Production

A Better Model

• Fewer features, i.e. simpler

• Trained on more training examples

Page 14: Machine Learning in Production

Moving to Production

Page 15: Machine Learning in Production
Page 16: Machine Learning in Production

Online vs Offline

OFFLINE: From time to time, retrain the whole model and upload the new model

ONLINE: The algorithm runs each time a new example is added and adapts the model a bit

Examples should be presented in random order
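The online variant can be sketched as a single stochastic gradient step per incoming example. This assumes a logistic regression model; the learning rate, the toy data, and the function names are illustrative assumptions, not the speaker's implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, features, label, learning_rate=0.1):
    """One gradient step on the logistic loss for a single new example."""
    prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
    error = prediction - label
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Toy data: the first feature is a constant 1.0 (bias term);
# the label is 1 when the second feature is positive.
examples = [([1.0, x], 1 if x > 0 else 0) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]

rng = random.Random(0)  # fixed seed so the run is reproducible
weights = [0.0, 0.0]
for _ in range(200):
    rng.shuffle(examples)  # randomized order, as the slide recommends
    for features, label in examples:
        weights = online_update(weights, features, label)
```

In the offline setting the same data would instead be collected, the model retrained from scratch, and the resulting weights uploaded in one go.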

Page 17: Machine Learning in Production

Example

Predict which category a user will buy from after newsletter signup

Page 18: Machine Learning in Production

Build Model

• Collect data

Traffic source, categories looked at prior to signup, etc. and y = category of purchase after signup

• Analyze: try to make predictions using e.g. logistic regression

• Train final model

• Save weights to DB or JSON or file
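Saving the weights to JSON might look like the sketch below. The model layout (one bias and one coefficient per feature for each category), the feature names, the numbers, and the file name are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical result of training: one weight vector per product category.
model = {
    "features": ["from_search", "viewed_shoes", "viewed_books"],
    "weights": {
        "shoes": {"bias": -0.4, "coefficients": [0.2, 1.5, -0.3]},
        "books": {"bias": -0.1, "coefficients": [0.1, -0.2, 1.2]},
    },
}

def save_model(model, path):
    """Write the model to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, indent=2)

def load_model(path):
    """Read a model back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "newsletter_model.json")
save_model(model, path)
restored = load_model(path)
```

Storing the feature order next to the weights helps the prediction side build its input vector in the same order the model was trained on.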

Page 19: Machine Learning in Production

Predict

• User signs up

• Load weights and predict probabilities of categories.

• If P(category X) > threshold, classify user as "interested in category X"

• Send out newsletters
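The prediction steps can be sketched as: compute one score per category from the stored weights, turn the scores into probabilities, and compare against a threshold. This assumes a multinomial (softmax) logistic regression; the weights, the feature vector, and the threshold of 0.5 are hypothetical.

```python
import math

def softmax(scores):
    """Convert per-category scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}

def predict_interests(weights, features, threshold=0.5):
    """Return category probabilities and the categories above the threshold."""
    scores = {
        cat: w["bias"] + sum(c * x for c, x in zip(w["coefficients"], features))
        for cat, w in weights.items()
    }
    probabilities = softmax(scores)
    interested = [cat for cat, p in probabilities.items() if p > threshold]
    return probabilities, interested

# Hypothetical weights, e.g. loaded from a DB, JSON, or file.
weights = {
    "shoes": {"bias": -0.4, "coefficients": [0.2, 1.5, -0.3]},
    "books": {"bias": -0.1, "coefficients": [0.1, -0.2, 1.2]},
}

# New signup: came from search, viewed shoes, did not view books.
new_user = [1, 1, 0]
probabilities, interested = predict_interests(weights, new_user)
```

For this user the "shoes" probability is well above the threshold, so only that category is flagged. The newsletter system would then pick content based on `interested`.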

Page 20: Machine Learning in Production

Tips

• Use R or Python/Jupyter/Pandas to analyze data

• Test if you need a separate system for predictions or just for training

• Try not to implement algos yourself. If you do, use numerical computation libraries (probably wrappers for C or Fortran code)

• Be sure the past predicts the future

Page 21: Machine Learning in Production

Ethics

• Your model might turn into a racially profiling sexist.

• Be aware of what your input features mean & what you actually base your predictions on

• Relatively harmless when predicting product categories - questionable for credit ratings

Page 22: Machine Learning in Production

Thank you

Ben Freundorfer

@benfreu