26
Arif Wider & Christian Deger DATA SCIENCE, DELIVERED CONTINUOUSLY

Data Science - Delivered Continuously - XConf 2017

Embed Size (px)

Citation preview

Page 1: Data Science - Delivered Continuously - XConf 2017

Arif Wider & Christian Deger

DATA SCIENCE,DELIVERED CONTINUOUSLY

Page 2: Data Science - Delivered Continuously - XConf 2017

A CONFERENCE ALL ABOUT TECHNOLOGY

Page 3: Data Science - Delivered Continuously - XConf 2017

Christian DegerChief [email protected]@cdeger

Page 4: Data Science - Delivered Continuously - XConf 2017

Arif WiderSenior Consultant/[email protected]@arifwider

Page 5: Data Science - Delivered Continuously - XConf 2017

PL

S

RUS

UA

RO

CZ

D

NL

B

FA

HRI

E

BG

TR

18countries

2.4m+cars & motos

10m+users per

month

Page 6: Data Science - Delivered Continuously - XConf 2017

The task: A consumer-facing data product

6XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 7: Data Science - Delivered Continuously - XConf 2017

The task: A consumer-facing data product

7XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 8: Data Science - Delivered Continuously - XConf 2017

The task: A consumer-facing data product

8XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 9: Data Science - Delivered Continuously - XConf 2017

The prediction model: Random forest

9

Volkswagen GolfCar listings oflast two years

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 10: Data Science - Delivered Continuously - XConf 2017

How to turn an R-based prediction model into a high-performance web application?

10

?

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 11: Data Science - Delivered Continuously - XConf 2017

How to turn an R-based prediction model into a high-performance web application?

11XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 12: Data Science - Delivered Continuously - XConf 2017

How to turn an R-based prediction model into a high-performance web application?

12XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 13: Data Science - Delivered Continuously - XConf 2017

How to turn an R-based prediction model into a high-performance web application?

13

Continuous Delivery!

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 14: Data Science - Delivered Continuously - XConf 2017

Application code in one repository per

service.

CI

Deployment packageas artifact.

CD

Deliver package to servers

Typical delivery pipeline

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 15: Data Science - Delivered Continuously - XConf 2017

Continuous delivery pipelines

15

Prediction Model Pipeline

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 16: Data Science - Delivered Continuously - XConf 2017

Continuous delivery pipelines

16

Prediction Model Pipeline

Web Application Pipeline

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 17: Data Science - Delivered Continuously - XConf 2017

The price for CD: Extensive model validation

17XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 18: Data Science - Delivered Continuously - XConf 2017

The price for CD: Extensive model validation

18XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 19: Data Science - Delivered Continuously - XConf 2017

Lessons learned

19

Form a cross-functional team of data scientists & software engineers!

Software engineers… learn how data scientists work… and understand the quirks of a prediction model

Data Scientist… learn about unit testing, stable interfaces, git, etc.... get quick feedback about the impact of their work

Model and product iterations become much faster!

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 20: Data Science - Delivered Continuously - XConf 2017

Lessons learned

20

Generating gigabytes of Java code

is a challenge for the JVM

Use the G1 garbage collector

Turn off Tiered Compilation

Do extensive warm-ups

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 21: Data Science - Delivered Continuously - XConf 2017

Lessons learned – Warm up

21XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 22: Data Science - Delivered Continuously - XConf 2017

Lessons learned

22

The approach of applying Continuous Delivery to

Data Science is useful independently of the tech

Successfully applied similarly to a Python- and

Spark-based project

Even more useful when quick model evolution

is required because of rapidly changing inputs

(e.g. user interaction)

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 23: Data Science - Delivered Continuously - XConf 2017

Conclusions

23

Continuous Delivery allows us to bring prediction

model changes live very quickly.

Only extensive automated end-to-end tests

provide confidence to deploy to production

automatically.

Java code generation allows for very low response

times and excellent scalability for high loads but

requires plenty of memory.

XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 24: Data Science - Delivered Continuously - XConf 2017

Conclusions: Price evaluation everywhere

24XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 25: Data Science - Delivered Continuously - XConf 2017

Conclusions: Price evaluation everywhere

25XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger

Page 26: Data Science - Delivered Continuously - XConf 2017

QUESTIONS?

THANK YOU

Arif Wider & Christian Deger

@arifwider @cdeger