ML: from research to practice @ AI4Life, 05/2018

Kenneth Tran

Principal ML Scientist

Microsoft Research

About MSR

• Microsoft Research is the academic research arm of Microsoft

• Mission: advance the state of the art in computing and solve challenging world problems

• The most academic of the industry research labs

• Multiple Turing Award and Fields Medal winners, among others

• MSR was doing ML research long before ML became a phenomenon

• MSR AI = AI division of MSR

My Background

• Chuyên Toán at PTNK – ĐHQG TPHCM (99-02)

• B.S. in Math and CS at UT Austin (2006)

• Ph.D. in Computational & Applied Mathematics at UT Austin (2012)

• Microsoft Research ML Groups (2012 – present), currently as a Principal ML Scientist/Engineer

Previous Research Work

• DL in Computer Vision and Language Understanding

• Object Detection and Semantic Segmentation

• Image Captioning (check out https://www.captionbot.ai)

• Black-box Optimization

• ML/DL Platforms

Current Research Work

• Reinforcement Learning

• Sample-efficient learning

• Off-policy

• Non-Markovian

• Model-based


• Deep4Cast

• Time series forecasting using CNN + RNN

• With uncertainty estimates

• Open source: https://github.com/MSRDL/Deep4Cast

• ML in Controlled Environment Agriculture (CEA)

AI beyond prediction and perception

ML/DL: from research to practice

10 Lessons

Inspirations

#1. Metrics

Data and models are great. You know what’s even better?

The right evaluation approach.

Case: Fraud Detection

Characteristics

• Class imbalance

• Different costs for false positives (FP) and false negatives (FN)

So what metrics should we use?

• Accuracy
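To see why accuracy alone is misleading under class imbalance, here is a minimal sketch on hypothetical data (1% fraud): a "model" that never flags anything scores 99% accuracy while catching no fraud at all. The helpers `accuracy` and `recall` are illustrative, not taken from any particular library.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of actual fraud cases (label 1) that were flagged."""
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return true_positives / positives


# Hypothetical data: 100 fraudulent transactions out of 10,000 (1%).
y_true = [1] * 100 + [0] * 9_900
never_flag = [0] * len(y_true)  # a "model" that never flags anything

print(accuracy(y_true, never_flag))  # 0.99
print(recall(y_true, never_flag))    # 0.0
```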

Metrics


• ROC AUC

• PR AUC

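• How about simple economics? Total cost of errors, where $N_{FP}$ and $N_{FN}$ are the numbers of false positives and false negatives and $C_{FP}$ and $C_{FN}$ are the cost of each:

$$L = N_{FP} \times C_{FP} + N_{FN} \times C_{FN}$$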
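The cost formula can also drive the choice of decision threshold. A minimal sketch, assuming the model outputs a fraud score per transaction and that `C_FP` and `C_FN` are known constants (the costs 1 and 100 and the scores below are hypothetical): evaluate L at every candidate threshold and keep the cheapest one.

```python
from typing import List, Tuple


def error_counts(y_true: List[int], scores: List[float], threshold: float) -> Tuple[int, int]:
    """Count (N_FP, N_FN) when transactions with score >= threshold are flagged."""
    n_fp = n_fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 0:
            n_fp += 1
        elif not flagged and y == 1:
            n_fn += 1
    return n_fp, n_fn


def total_cost(y_true, scores, threshold, c_fp, c_fn) -> float:
    """L = N_FP * C_FP + N_FN * C_FN."""
    n_fp, n_fn = error_counts(y_true, scores, threshold)
    return n_fp * c_fp + n_fn * c_fn


def cheapest_threshold(y_true, scores, c_fp, c_fn) -> float:
    # Every distinct score is a candidate; +inf means "never flag anything".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, c_fp, c_fn))


# Hypothetical data: 95 legitimate transactions, 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5
scores = [i / 100 for i in range(95)] + [0.5, 0.7, 0.9, 0.95, 0.99]
C_FP, C_FN = 1.0, 100.0  # assumed: a manual review costs 1, a missed fraud costs 100

t = cheapest_threshold(y_true, scores, C_FP, C_FN)
print(t, total_cost(y_true, scores, t, C_FP, C_FN))  # 0.5 45.0
```

On this toy data the chosen threshold has an accuracy of only 0.55 (45 false positives, no false negatives), yet its cost of 45 is far below the cost of 500 for never flagging, which has 0.95 accuracy.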

#2. Uncertainty: your model should be able to tell what it doesn’t know

Predict with uncertainty

www.CaptionBot.ai

Seeing AI: https://www.youtube.com/watch?v=bqeQByqf_f8

Other use cases

• Demand forecasting

• Autonomous or semi-autonomous driving

• Health care

• Any application in which a misclassification (or misprediction) is costly, or in which the prediction is an input to a decision-making process
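One common way to let a classifier say "I don’t know" is to average the class probabilities of several independently trained models (an ensemble, or repeated Monte Carlo dropout passes) and abstain when the averaged prediction has high entropy. The sketch below assumes such per-member probabilities are already available; the member outputs and the 0.5-nat threshold are hypothetical, and this is not necessarily how CaptionBot or Seeing AI compute their uncertainty.

```python
import math
from typing import List, Optional, Tuple


def predictive_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_abstain(
    member_probs: List[List[float]], max_entropy: float
) -> Tuple[Optional[int], float]:
    """Average the members' class probabilities; return (None, entropy)
    to abstain when the averaged prediction is too uncertain."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    entropy = predictive_entropy(mean)
    if entropy > max_entropy:
        return None, entropy  # defer to a human or a fallback
    return max(range(n_classes), key=lambda c: mean[c]), entropy


confident = [[0.90, 0.10], [0.85, 0.15], [0.95, 0.05]]    # members agree
disagreeing = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]  # members disagree
print(predict_or_abstain(confident, max_entropy=0.5))    # label 0, entropy ~0.33
print(predict_or_abstain(disagreeing, max_entropy=0.5))  # None, entropy ~0.69
```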

#3. Understand the interdependency between models and features

The fact that a more complex model does not improve things does not mean you don’t need one

Better models and features that don’t work

Imagine the following scenario:

• You have a random forest model, and for some time you have been selecting and optimizing features for it

• If you try a neural network with the same features, you are unlikely to see any improvement

• If you add more expressive features (e.g., text embeddings), the existing random forest is unlikely to capture them, so again you are unlikely to see any improvement

It’s important to understand the interplay between features and models.

The fact that a complex model doesn’t work well with features tuned for a simpler one doesn’t imply that you should discard it.

A counter scenario

• A company/team is tasked with solving an ML problem

• They spend lots of effort on a DL model and very little on other approaches

• They later claim to have improved the results using DL

• A slight improvement using DL used to generate more PR (and hence promotions/investment) than a more substantial improvement with non-DL methods

• This was fairly common in “Silicon Valley”

#4. Model performance is a monotone function of engineering effort

Better results don’t always imply a smarter model. Be aware of hype vs. substance.

#5. You may not need all your Big Data

“Big data is like teenage sex; everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”

- Dan Ariely, Duke University (2013)

Slide Credit: Xavier Amatriain
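A quick way to check whether more data would still help is a learning curve: train on growing subsets and evaluate on a fixed held-out set. If the error has flattened, the rest of the data adds cost but little accuracy. A minimal sketch on hypothetical synthetic data (y = 2x + 1 plus Gaussian noise), with a closed-form least-squares line standing in for the model:

```python
import random
from typing import List, Tuple

Data = Tuple[List[float], List[float]]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model: Tuple[float, float], xs: List[float], ys: List[float]) -> float:
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train: Data, held_out: Data, sizes: List[int]) -> List[Tuple[int, float]]:
    """Fit on the first n training examples for each n; score on a fixed held-out set."""
    xs, ys = train
    return [(n, mse(fit_line(xs[:n], ys[:n]), *held_out)) for n in sizes]


rng = random.Random(0)


def make_data(n: int) -> Data:
    # Hypothetical ground truth: y = 2x + 1 plus Gaussian noise with variance 1.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


train_set = make_data(20_000)
held_out = make_data(1_000)
for n, err in learning_curve(train_set, held_out, [10, 100, 1_000, 10_000, 20_000]):
    print(f"{n:>6} examples -> held-out MSE {err:.3f}")
```

On this toy problem the held-out error should flatten out near the noise variance (1.0) long before all 20,000 examples are used; with a real model the curve is noisier, but the question is the same.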

#6. You mostly don’t need distributed ML

Distributed ML is another “dangerous” trend, similar to “Big Data”

#7. You likely don’t need online learning

Most commercially deployed models are trained/retrained offline!

What is online learning?

Online learning updates (mutates) the model with every new data point

Common confusion: online learning != online prediction (serving predictions on live requests)
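To make the distinction concrete, the sketch below contrasts a frozen, offline-trained model that only serves predictions (online prediction) with one that takes a stochastic gradient step on every new labeled example (online learning). The logistic model, its starting weights, and the two-event stream are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LogisticModel:
    weights: List[float]
    bias: float = 0.0

    def predict_proba(self, x: List[float]) -> float:
        """Read-only: computing a prediction never changes the model."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


def online_update(model: LogisticModel, x: List[float], y: int, lr: float = 0.1) -> None:
    """One SGD step on the log loss of a single example; mutates the model in place."""
    err = model.predict_proba(x) - y  # gradient of the log loss w.r.t. the logit
    model.weights = [w - lr * err * xi for w, xi in zip(model.weights, x)]
    model.bias -= lr * err


stream = [([1.0, 2.0], 1), ([0.0, 1.0], 0)]  # hypothetical labeled events

# Online prediction: a model trained offline is frozen and only serves requests.
frozen = LogisticModel(weights=[0.5, -0.25], bias=0.1)
served = [frozen.predict_proba(x) for x, _ in stream]

# Online learning: every new labeled example updates the model in place.
learner = LogisticModel(weights=[0.5, -0.25], bias=0.1)
for x, y in stream:
    online_update(learner, x, y)

print(frozen.weights)   # [0.5, -0.25]
print(learner.weights)  # differs from the starting weights
```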

Why not online learning?

• In most applications, you don’t need online learning

• Your model should learn to capture the patterns in the data

• Patterns don’t change from one example to the next

• Optimization perspective: online learning is sub-optimal because you can visit each example only once

• Software engineering perspective: online learning is bad because the model is mutable and hence hard to debug

#8. Be aware of feedback loops

• A model may directly influence the selection of its own future training data.

• Learn about contextual bandits
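Contextual bandits address this kind of feedback loop by exploring on purpose and logging the probability of each action taken. Those logged probabilities later allow an unbiased offline estimate of how a different policy would have performed, via inverse propensity scoring (IPS). A minimal sketch with an epsilon-greedy logging policy; the two contexts, the reward rule, and ε = 0.2 are hypothetical simulation choices.

```python
import random
from typing import Callable, List, Tuple


def epsilon_greedy(scores: List[float], epsilon: float, rng: random.Random) -> Tuple[int, float]:
    """Choose an action and return it with the probability it had of being chosen."""
    n_actions = len(scores)
    best = max(range(n_actions), key=lambda a: scores[a])
    action = rng.randrange(n_actions) if rng.random() < epsilon else best
    prob = epsilon / n_actions + ((1.0 - epsilon) if action == best else 0.0)
    return action, prob


def ips_estimate(logs, target_policy: Callable[[int], int]) -> float:
    """Inverse propensity scoring: estimate the average reward target_policy would
    have earned, using only logs of (context, action, reward, logged_prob)."""
    total = 0.0
    for context, action, reward, prob in logs:
        if target_policy(context) == action:
            total += reward / prob
    return total / len(logs)


rng = random.Random(42)
logs = []
for _ in range(20_000):
    context = rng.randrange(2)  # hypothetical: two user types
    # The deployed model always prefers action 0, regardless of context.
    action, prob = epsilon_greedy([1.0, 0.0], epsilon=0.2, rng=rng)
    reward = 1.0 if action == context else 0.0  # hypothetical reward rule
    logs.append((context, action, reward, prob))

# Offline estimate for a policy the deployed model almost never follows.
print(ips_estimate(logs, target_policy=lambda c: c))  # true value of this policy is 1.0
```

Without the logged probabilities, the logs would mostly show action 0 and say little about how action 1 performs; reweighting each matching record by 1/prob corrects for that sampling bias.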

Direct Feedback Loops

• Example: two stock-market prediction models

Hidden Feedback Loops

#9. Transfer Learning from Software Engineering

• Model versioning

• Experiment review

• Hierarchical and compositional mindset

• DL was inspired by this!
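For model versioning, one simple approach is to derive a content-addressed version id from the training configuration and the serialized weights, so any deployed model can be traced back to exactly what produced it. `model_version` and the config keys below are hypothetical; the hashing uses only Python’s standard `hashlib` and `json` modules.

```python
import hashlib
import json


def model_version(config: dict, weights: bytes) -> str:
    """Content-addressed version id: identical config and weights give an identical id."""
    digest = hashlib.sha256()
    # sort_keys makes the id independent of dict insertion order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    digest.update(b"\x00")  # separator; json.dumps never emits a raw NUL byte
    digest.update(weights)
    return digest.hexdigest()[:12]


config = {"model": "random_forest", "n_trees": 200, "features": "v3"}  # hypothetical
weights = bytes([1, 2, 3])            # stand-in for serialized model weights
retrained_weights = bytes([1, 2, 4])  # stand-in for weights after retraining

v1 = model_version(config, weights)
v2 = model_version(dict(reversed(list(config.items()))), weights)  # same config, other order
v3 = model_version(config, retrained_weights)
print(v1 == v2, v1 == v3)  # True False
```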

#10. AI – The revolution hasn’t happened yet

• Start simple and take incremental steps

• Make sure you understand what you are learning/doing at each step

• Start with real problems, then identify the technologies

• Common mistake: starting with (hyped) technologies, then looking for applications

• Examples: chat bots, personal assistants, etc.

• There are many simple, unsexy, but high-value ML problems

Thank you!

[email protected]
