Scaling Machine Learning @ Careem

Scaling Machine Learning @ Careem-Care… · Predict ETAs (Expected time of arrival) ... Feature Engineering takes long time Training needs lots of resources Needs to be on Production


Page 1:

Scaling Machine Learning @ Careem

Page 2:

Who?

Ahmed Kamal

- Machine Learning Platform Lead @ Careem

- Computer Engineer by training

I blog @ ahmedkamal.me and tweet @_akamal_

Representing the work of Yoda team and other awesome colleagues @ Careem

Page 3:

Who are we?

Page 4:

Careem was founded in 2012 in Dubai, UAE...

Page 5:

Our mission is to simplify and improve the lives of people and build an awesome organization that inspires

Page 6:

Multi-vertical platform of Mobility, Delivery, Payments

15 Countries, 120 Cities, 3,500 Colleagues

33M Customers, 1M Captains

Page 7:

We are the leading technology platform in the Middle East!

Page 8:

AI Use cases we have at Careem

Page 9:

Sneak Peek into ML @ Careem

● Predict demand and supply
  ○ Essential for peak and dynamic pricing
  ○ Plan ahead while dispatching

● Fulfill our promise to customers and captains
  ○ Predict ETAs (Estimated time of arrival)
  ○ Predict prices

● Predict customer and captain behavior
  ○ Predict cancellations
  ○ Predict captain acceptance

● Platform Integrity
  ○ Fraud Prevention and Detection
  ○ Anomaly Detection

Page 10:

Example

[Diagram: Captain Oussama and Maryam, 6 min apart]

Page 11:

Example

[Diagram: Captain Oussama, with Maryam 6 min away and Rashid 8 min away]

Page 12:

Example

[Diagram: Captain Oussama, with Maryam 6 min away and Rashid 8 min away; labels: Proba(Cancellation) = 0.8 and Proba(Cancellation) = 0.1]

Page 13:

Demand prediction (Time Series Forecasting)

[Plot: two series, labeled Reality and Prediction]

Page 14:

You are now one of us


Page 15:

Let’s Solve a Problem!

Page 16:

Estimated Time of Arrival

Problem Definition

Accurately calculate, in real time, the post-assignment ETA of a captain for a specific booking

Problem Type

Regression

Page 17:

Dataset Sample

"user_id": 1,
"car_type": "Business",
"captain_long": 55.1486643,
"captain_lat": 25.0911563,
"booking_long": 54.1506833,
"booking_lat": 25.084343
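A natural first feature for an ETA model built on a row like this is the straight-line distance between the captain and the booking. The sketch below is only an illustration and not necessarily what the Careem pipeline computes: the function names `haversine_km` and `add_distance_feature` and the `distance_km` column are assumptions, while the field names and coordinates come from the sample above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance_feature(row):
    """Return a copy of the row with a hypothetical 'distance_km' feature."""
    out = dict(row)
    out["distance_km"] = haversine_km(
        row["captain_lat"], row["captain_long"],
        row["booking_lat"], row["booking_long"],
    )
    return out


# The row shown on the slide, copied as-is.
sample = {
    "user_id": 1,
    "car_type": "Business",
    "captain_long": 55.1486643,
    "captain_lat": 25.0911563,
    "booking_long": 54.1506833,
    "booking_lat": 25.084343,
}

features = add_distance_feature(sample)
print(round(features["distance_km"], 1))
```

Straight-line distance ignores the road network and traffic, so it is at best one input to the regression, not an ETA by itself.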

Page 18:

R&D workflow

Get Data

Train Model

Persist Model

Modeling
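The three workflow steps can be sketched end to end in a few lines. This is a toy version under stated assumptions: the training rows are made-up numbers, the model is a one-feature least-squares line, and the function names and the JSON file format are illustrative rather than anything the slides describe.

```python
import json
import os
import tempfile


def get_data():
    """Stand-in for 'Get Data': made-up (distance_km, eta_minutes) pairs."""
    return [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def train_model(rows):
    """'Train Model': fit eta = slope * distance + intercept by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def persist_model(model, path):
    """'Persist Model': write the fitted parameters to disk as JSON."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    with open(path) as f:
        return json.load(f)


def predict(model, distance_km):
    return model["slope"] * distance_km + model["intercept"]


model = train_model(get_data())
model_path = os.path.join(tempfile.mkdtemp(), "eta_model.json")
persist_model(model, model_path)
restored = load_model(model_path)
print(predict(restored, 5.0))
```

On a laptop with a small sample this loop takes minutes to write, which is the contrast the following slides draw with running the same thing in production.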

Page 19:

ML in the real world (or why we need an MLE)


Page 25:

ML in the real world (or why we need an MLE)

Data is huge :)

Feature Engineering takes a long time

Training needs lots of resources

Needs to be on Production

Build an API

Low Latency & High Throughput

Monitoring & Alerting

Fault Tolerance & Autoscaling
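To make the latency and alerting requirements concrete, here is a minimal sketch of measuring per-request latency for a prediction function and flagging when the 95th percentile exceeds a budget. Everything in it is an assumption for illustration: `dummy_predict` stands in for a real model, the 50 ms budget is arbitrary, and a production setup would normally rely on a metrics system rather than in-process lists.

```python
import math
import time


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(predict_fn, requests):
    """Call predict_fn once per request and record wall-clock time in ms."""
    latencies = []
    for request in requests:
        start = time.perf_counter()
        predict_fn(request)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def should_alert(latencies_ms, p95_budget_ms):
    """True when the observed p95 latency is above the budget."""
    return percentile(latencies_ms, 95) > p95_budget_ms


def dummy_predict(request):
    """Hypothetical model call; returns a constant prediction."""
    return 2.5


latencies = measure_latencies_ms(dummy_predict, [{}] * 100)
alert = should_alert(latencies, p95_budget_ms=50.0)
print(len(latencies), alert)
```

Tracking a high percentile rather than the mean is a common choice here, because a few slow requests can hide behind a healthy average.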

Page 26:

How much time?

Page 27:

Congrats on the amazing model!


Page 33:

Post Deployment Challenges

Continuously refresh and update deployed models

Additional 119 models?

Performance monitoring and alerting

A/B testing between new/old or new/new models

Too many APIs? Integration headache!

Which cities are ready for ML?
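One common way to run an A/B test between an old and a new model is to assign each request to a variant deterministically by hashing a stable identifier. The sketch below assumes that approach; it is not taken from the slides, and the experiment name, variant names, and weights are all hypothetical.

```python
import hashlib


def assign_variant(uuid, experiment, variants):
    """Pick a variant for this uuid.

    variants is a list of (name, weight) pairs whose weights sum to 1.0.
    The same uuid and experiment always map to the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{uuid}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding


# Hypothetical experiment: 90% of traffic on the old model, 10% on the new one.
variants = [("eta_model_old", 0.9), ("eta_model_new", 0.1)]
choice = assign_variant("9fdsaf9as9da9sd9", "eta_ab_test", variants)
print(choice)
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same booking is not always in the treatment group.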

Page 34:

Is this real? Google says yes :)

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Page 35:

Yoda

by

Page 36:

Yoda

● Our mission as Yoda team: Accelerate the adoption of AI at Careem.

● We are building an in-house Machine Learning Platform to automate the end-to-end experience.

● To enable expeditious development and deployment of 1000+ ML models in production

Machine learning cycle

Fetch data, Clean data, Format data, Feature engineering, Building models, Evaluating models, Serve models, Monitoring and alerting

Page 37:

Your Life After Yoda

Page 38:

Dataset Generation

Page 39:

Model Training

Page 40:

Model Report

Page 41:

One Click Deployment

- Configuration-based deployment service enabling deployment in a few minutes.

- Latency, integration and API tests

- Auto-rollout capabilities to smooth the model update experience

Configs → One Click! → Production Level API
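The slides do not show what a Yoda deployment config contains, so the following is purely a hypothetical sketch of the idea: a small validated config plus an auto-rollout rule that increases the new model's traffic share step by step while latency stays within budget and rolls back otherwise. Every field name, the step values, and the latency budget are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical deployment config; not the real Yoda schema."""
    model_id: int
    endpoint: str
    max_p95_latency_ms: float
    rollout_steps: tuple  # increasing traffic shares, ending at 1.0


def validate(config):
    """Reject configs whose rollout plan is malformed."""
    steps = config.rollout_steps
    if not steps or steps[-1] != 1.0:
        raise ValueError("rollout must end at 100% of traffic")
    if steps[0] <= 0.0:
        raise ValueError("first rollout step must be positive")
    if any(b <= a for a, b in zip(steps, steps[1:])):
        raise ValueError("rollout steps must be strictly increasing")
    return config


def next_traffic_share(config, current_share, observed_p95_ms):
    """Advance one rollout step if latency is healthy, else roll back to 0."""
    if observed_p95_ms > config.max_p95_latency_ms:
        return 0.0
    for step in config.rollout_steps:
        if step > current_share:
            return step
    return 1.0


config = validate(DeploymentConfig(
    model_id=101,                 # illustrative value
    endpoint="/v1/101/eta",       # illustrative value
    max_p95_latency_ms=50.0,
    rollout_steps=(0.05, 0.25, 1.0),
))
print(next_traffic_share(config, 0.05, observed_p95_ms=20.0))
```

Keeping the rollout rule a pure function of the config and the observed metric makes it easy to cover with the kind of latency and integration tests the slide mentions.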

Page 42:

Now you have an API

URL => eta-service.yoda.com/v1/101/eta

Request =>
[
  {
    "uuid": "9fdsaf9as9da9sd9",
    "assignment_time": "2019-03-14 14:09:46",
    "captain_lat": 30.0039,
    "captain_long": 31.1422,
    "booking_lat": 30.0022,
    "booking_long": 31.1405
  }
]

Response =>
{
  "response": [
    {
      "uuid": "9fdsaf9as9da9sd9",
      "prediction": 2.5
    }
  ],
  "result": "ok"
}
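A client for this API mostly needs to serialize a batch of bookings and map predictions back to their `uuid`. The sketch below assumes only the request and response shapes shown on the slide; the helper names `build_request` and `parse_response` are hypothetical, the HTTP call itself is left out, and the unit of `prediction` is not stated in the slides.

```python
import json

# Fields that appear in the slide's example request.
REQUIRED_FIELDS = (
    "uuid", "assignment_time",
    "captain_lat", "captain_long",
    "booking_lat", "booking_long",
)


def build_request(bookings):
    """Serialize a batch of bookings into the JSON array the slide shows."""
    for booking in bookings:
        missing = [field for field in REQUIRED_FIELDS if field not in booking]
        if missing:
            raise ValueError(f"booking is missing fields: {missing}")
    return json.dumps(bookings)


def parse_response(body):
    """Return {uuid: prediction} from a response body, or raise on failure."""
    payload = json.loads(body)
    if payload.get("result") != "ok":
        raise RuntimeError(f"unexpected result: {payload.get('result')!r}")
    return {item["uuid"]: item["prediction"] for item in payload["response"]}


request_body = build_request([{
    "uuid": "9fdsaf9as9da9sd9",
    "assignment_time": "2019-03-14 14:09:46",
    "captain_lat": 30.0039,
    "captain_long": 31.1422,
    "booking_lat": 30.0022,
    "booking_long": 31.1405,
}])

# The response body from the slide, used here instead of a live call.
response_body = (
    '{"response": [{"uuid": "9fdsaf9as9da9sd9", "prediction": 2.5}],'
    ' "result": "ok"}'
)
predictions = parse_response(response_body)
print(predictions)
```

Because the request is a list, one call can carry many bookings, and keying the result by `uuid` avoids relying on response order.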

Page 43:

Model Serving

Page 44:

Summary of Yoda

Page 45:

What’s Next for Yoda?

• Enrich Farabi DB (Feature Store)

• Auto-ML

• AI for everyone

• Time Series Forecasting

• Anomaly Detection

• Natural Language Processing

Page 46:

Thank you! Danke! Teşekkürler! شكراً! شکریہ!

For Slides: @_akamal8_