PRETZEL: Opening the Black Box of ML Prediction Serving Systems

Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, Matteo Interlandi


Machine Learning Prediction Serving

1. Models are learned from data
2. Models are deployed and served together

[Figure: training learns a model from data; the model is deployed to a server that serves predictions to users]

Performance goals:
1) Low latency
2) High throughput
3) Minimal resource usage


ML Prediction Serving Systems: State-of-the-art

• Assumption: models are black boxes
  • Re-use the same code as in the training phase
  • Encapsulate all operations into a function call (e.g., predict())
• Apply external optimizations
  • Result caching, replication, ensembles, request batching

[Figure: a prediction serving system (labels: ML.Net, Clipper, TF Serving) hosting black-box models: Text Analysis (input “Pretzel is tasty”) and Image Recognition (outputs cat, car)]


How do Models Look inside Boxes?

<Example: Sentiment Analysis>

“Pretzel is tasty” (text) → Model → ☺ vs. ☹ (positive vs. negative)

Inside the box, the model is a DAG of operators: featurizers followed by a predictor.

• Tokenizer: split text into tokens
• CharNgram, WordNgram: extract N-grams
• Concat: merge two vectors
• LogisticRegression: compute final score

[Figure: “Pretzel is tasty” → Tokenizer → CharNgram / WordNgram → Concat → LogisticRegression → ☺ vs. ☹]
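To make the operator DAG concrete, here is a minimal Python sketch of the sentiment-analysis pipeline. It is only an illustration under stated assumptions: the function names, the n-gram sizes, and the toy weights are hypothetical and are not taken from ML.Net or PRETZEL.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the operators named on the slide.

def tokenize(text):
    # Tokenizer: split text into lowercase word tokens.
    return text.lower().split()

def char_ngrams(tokens, n=3):
    # CharNgram: count character n-grams over the space-joined tokens.
    s = " ".join(tokens)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def word_ngrams(tokens, n=2):
    # WordNgram: count word n-grams.
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def concat(char_feats, word_feats):
    # Concat: merge the two sparse vectors, prefixing keys to keep them apart.
    merged = {("c", k): v for k, v in char_feats.items()}
    merged.update({("w", k): v for k, v in word_feats.items()})
    return merged

def logistic_regression(features, weights, bias=0.0):
    # LogisticRegression: compute the final score in (0, 1).
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def predict(text, weights):
    # The whole DAG hidden behind one black-box call.
    tokens = tokenize(text)
    features = concat(char_ngrams(tokens), word_ngrams(tokens))
    return logistic_regression(features, weights)

# Toy weights, invented for this example only.
toy_weights = {("w", "is tasty"): 2.0, ("c", "bad"): -2.0}
score = predict("Pretzel is tasty", toy_weights)
```

A score above 0.5 reads as positive. Note that both n-gram operators consume the same tokenizer output, which is what makes the pipeline a DAG rather than a chain.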


Many Models Have Similar Structures

• Many parts of a model can be re-used in other models

• Customer personalization, Templates, Transfer Learning

• Identical set of operators with different parameters


Outline

• Prediction Serving Systems
• Limitations of Black Box Approaches
• PRETZEL: White-box Prediction Serving System
• Evaluation
• Conclusion


Limitation 1: Resource Waste

• Resources are isolated across black boxes

1. Unable to share memory space
   → Memory is wasted maintaining duplicate objects (despite similarities between models)
2. No coordination of CPU resources between boxes
   → Serving many models can use too many threads


Limitation 2: No Consideration of Operators’ Characteristics

1. Operators have different performance characteristics
   • Concat materializes a vector
   • LogReg takes only 0.3% of the latency (contrary to the training phase)
2. A better plan is possible if such characteristics are considered
   • Re-use the existing vectors
   • Apply in-place updates in LogReg

Latency breakdown (% of total):

Operator     Share (%)
CharNgram    23.1
WordNgram    34.2
Concat       32.7
LogReg        0.3
Others        9.6

[Figure: pipeline Tokenizer → CharNgram / WordNgram → Concat → LogReg]


Limitation 3: Lazy Initialization

• ML.Net initializes code and memory lazily (efficient in the training phase)
• Run 250 Sentiment Analysis models 100 times
  → cold: first execution / hot: average of the remaining 99
• Long-tail latency in the cold case
  • Code analysis, just-in-time (JIT) compilation, memory allocation, etc.
• Difficult to provide a strong Service-Level Agreement (SLA)

[Figure: cold vs. hot latency for the pipeline Tokenizer → CharNgram / WordNgram → Concat → LogReg; annotations: 13x and 444x]
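The cold/hot split described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the benchmark harness used for the slides: `measure_cold_hot` and `LazyModel` are hypothetical names, and the lazily built table merely stands in for lazy code and memory initialization.

```python
import time

def measure_cold_hot(fn, runs=100):
    # Time every call; "cold" is the first execution,
    # "hot" is the average of the remaining runs - 1 executions.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    cold = latencies[0]
    hot = sum(latencies[1:]) / (runs - 1)
    return cold, hot

class LazyModel:
    # Hypothetical model that allocates its state on first use.
    def __init__(self):
        self._table = None

    def predict(self):
        if self._table is None:
            # One-time cost paid by the first (cold) request.
            self._table = [i * i for i in range(200_000)]
        return self._table[-1]

model = LazyModel()
cold, hot = measure_cold_hot(model.predict, runs=100)
```

On a typical run the cold latency is far larger than the hot average, because only the first call pays for initialization.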


PRETZEL: White-box Prediction Serving

• We analyze models to optimize the internal execution
• We let models co-exist on the same runtime, sharing computation and memory resources
• We optimize models in two directions:
  1. End-to-end optimizations
  2. Multi-model optimizations


End-to-End Optimizations

Optimize the execution of individual models from start to end

1. [Ahead-of-time Compilation] Compile operators’ code in advance
   → No JIT overhead
2. [Vector Pooling] Pre-allocate data structures
   → No memory allocation on the data path
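Vector pooling can be illustrated with a small Python sketch. The `VectorPool` class and its `allocations` counter are hypothetical and exist only to show that, once buffers are pre-allocated, the request path re-uses them instead of allocating new ones.

```python
class VectorPool:
    # Hypothetical pool: buffers are created up front, then recycled.
    def __init__(self, count, size):
        self._size = size
        self._free = [[0.0] * size for _ in range(count)]
        self.allocations = count  # counts every buffer ever created

    def acquire(self):
        if self._free:
            return self._free.pop()
        # Fallback when the pool is exhausted; not hit in this example.
        self.allocations += 1
        return [0.0] * self._size

    def release(self, vec):
        # Reset in place so the next request sees a clean buffer.
        for i in range(len(vec)):
            vec[i] = 0.0
        self._free.append(vec)

pool = VectorPool(count=2, size=4)

def score(values, weights):
    buf = pool.acquire()
    try:
        for i, v in enumerate(values):
            buf[i] = v * weights[i]
        return sum(buf)
    finally:
        pool.release(buf)

results = [score([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]) for _ in range(1000)]
```

After 1000 requests the pool has still created only its two initial buffers, which is the property the slide summarizes as "no memory allocation on the data path".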


Multi-model Optimizations

Share computation and memory across models

1. [Object Store] Share operators’ parameters/weights
   → Maintain only one copy
2. [Sub-plan Materialization] Reuse intermediate results computed by other models
   → Save computation
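Both multi-model ideas can be sketched in Python. Everything here is an assumption made for illustration: `ObjectStore`, `SubPlanCache`, the JSON-based content key, and the toy parameters are hypothetical and say nothing about how PRETZEL implements either mechanism.

```python
import json

class ObjectStore:
    # Hypothetical store keeping one copy per distinct parameter value.
    def __init__(self):
        self._objects = {}

    def intern(self, params):
        # Key by content so equal parameters map to the same stored object.
        key = json.dumps(params, sort_keys=True)
        return self._objects.setdefault(key, params)

    def __len__(self):
        return len(self._objects)

class SubPlanCache:
    # Hypothetical cache of intermediate results shared across models.
    def __init__(self):
        self._results = {}
        self.computed = 0

    def run(self, stage_id, fn, request):
        key = (stage_id, request)
        if key not in self._results:
            self._results[key] = fn(request)
            self.computed += 1
        return self._results[key]

store = ObjectStore()
# Two models that happen to use identical n-gram parameters.
params_a = store.intern({"ngram_length": 3, "vocabulary": ["pre", "ret", "etz"]})
params_b = store.intern({"ngram_length": 3, "vocabulary": ["pre", "ret", "etz"]})

cache = SubPlanCache()

def tokenize(text):
    return tuple(text.lower().split())

tokens_model1 = cache.run("tokenizer", tokenize, "Pretzel is tasty")
tokens_model2 = cache.run("tokenizer", tokenize, "Pretzel is tasty")
```

Both models end up pointing at the same parameter object, and the shared tokenizer stage runs once for the repeated input.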


System Components

1. Flour: Intermediate Representation
2. Oven: Compiler/Optimizer
3. Runtime: Execute inference queries
4. FrontEnd: Handle user requests

[Figure: an ML.NET model is written as a Flour program (var fContext = ...; var Tokenizer = ...; return fPrgm.Plan();). Oven turns it into a Model Plan consisting of logical stages (S1, S2, S3), physical stages (S1, S2, S3), params, and stats. The Runtime contains the Object Store and the Scheduler and sits behind the FrontEnd.]


Prediction Serving with PRETZEL

1. Offline
   • Analyze structural information of models
   • Build a ModelPlan for optimal execution
   • Register the ModelPlan to the Runtime
2. Online
   • Handle prediction requests
   • Coordinate CPU and memory resources

[Figure: Model → (Analyze) → ModelPlan → (Register) → Runtime, with the FrontEnd in front of the Runtime]


System Design: Offline Phase

1. Translate Model into Flour Program

<Model>: Tokenizer → CharNgram / WordNgram → Concat → LogReg

<Flour Program>

    // Read the input text and tokenize it.
    var fContext = new FlourContext(...);
    var tTokenizer = fContext.CSV
                             .FromText(fields, fieldsType, sep)
                             .Tokenize();

    // Both featurizers consume the tokenizer output.
    var tCNgram = tTokenizer.CharNgram(numCNgrms, ...);
    var tWNgram = tTokenizer.WordNgram(numWNgrms, ...);

    // Concatenate the features and apply the linear classifier.
    var fPrgrm = tCNgram
                 .Concat(tWNgram)
                 .ClassifierBinaryLinear(cParams);

    return fPrgrm.Plan();
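The fluent style of the Flour program can be mimicked with a tiny Python builder that records operators as DAG nodes and linearizes them on `plan()`. This is a hypothetical analogue for illustration only; the `Transform` class and its methods are not the Flour API.

```python
class Transform:
    # Hypothetical DAG node: an operator name plus its input nodes.
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

    def then(self, op, *others):
        # Chain a new operator that consumes this node (and any others).
        return Transform(op, (self,) + others)

    def plan(self):
        # Depth-first traversal that emits each operator once,
        # after all of its inputs (a topological order).
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.inputs:
                visit(parent)
            order.append(node.op)

        visit(self)
        return order

source = Transform("FromText")
tokens = source.then("Tokenize")
char_ngram = tokens.then("CharNgram")
word_ngram = tokens.then("WordNgram")
program = char_ngram.then("Concat", word_ngram).then("ClassifierBinaryLinear")
plan = program.plan()
```

Because the program is kept as a graph rather than executed eagerly, an optimizer can inspect and rewrite it before any physical code is chosen.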


System Design: Offline Phase

2. Oven optimizer/compiler builds the Model Plan

• Rule-based optimizer: push the linear predictor & remove Concat
• Group operators into stages (Stage 1, Stage 2)
• The resulting <Model Plan> contains:
  • Logical DAG of stages (S1, S2)
  • Parameters, e.g., dictionary, N-gram length
  • Statistics, e.g., dense vs. sparse, maximum vector size

[Figure: <Flour Program> → rule-based optimizer → <Model Plan>]
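Why a linear predictor can be pushed past Concat follows from the dot product: the score over a concatenated vector equals the sum of the scores over its parts, provided the weights are split at the same boundary. The Python sketch below demonstrates that identity with made-up numbers; the function names are hypothetical, and it is not a description of Oven's actual rewrite rule.

```python
def dot(weights, vector):
    return sum(w * x for w, x in zip(weights, vector))

def score_with_concat(char_vec, word_vec, weights, bias):
    # Baseline plan: materialize the concatenated vector, then score it.
    merged = char_vec + word_vec
    return bias + dot(weights, merged)

def split_weights(weights, char_size):
    # Done once, ahead of time: cut the weights at the Concat boundary.
    return weights[:char_size], weights[char_size:]

def score_without_concat(char_vec, word_vec, char_w, word_w, bias):
    # Rewritten plan: score each branch directly; no merged vector exists.
    return bias + dot(char_w, char_vec) + dot(word_w, word_vec)

# Toy values, invented for this example.
char_vec = [1.0, 0.0, 2.0]
word_vec = [3.0, 1.0]
weights = [0.5, -1.0, 0.25, 2.0, -0.5]
bias = 0.25

char_w, word_w = split_weights(weights, len(char_vec))
baseline = score_with_concat(char_vec, word_vec, weights, bias)
rewritten = score_without_concat(char_vec, word_vec, char_w, word_w, bias)
```

Both plans return the same score, but the rewritten one never allocates the merged vector.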


System Design: Offline Phase

3. Model Plan is registered to the Runtime

1. Store parameters & the mapping between logical stages in the Object Store
2. Find the most efficient physical implementation using params & stats (e.g., N-gram length 1 vs. 3, sparse vs. dense)
3. Register the selected physical stages to the Catalog

[Figure: <Model Plan> (logical DAG, parameters, statistics) → Runtime holding the logical stages of Model1 (S1, S2), the Object Store, and the physical stages (S1, S2) in the Catalog]
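Selecting a physical implementation from statistics might look like the following Python sketch. The density threshold, the statistic names (`avg_nonzeros`, `max_vector_size`), and the `catalog` dictionary are all assumptions made for this example rather than details of PRETZEL.

```python
def dot_dense(weights, vector):
    # Dense physical implementation: vector is a list of values.
    return sum(w * x for w, x in zip(weights, vector))

def dot_sparse(weights, vector):
    # Sparse physical implementation: vector maps index -> value.
    return sum(weights[i] * x for i, x in vector.items())

def choose_physical_stage(stats, sparse_threshold=0.1):
    # Hypothetical rule: use the sparse kernel when few entries are non-zero.
    density = stats["avg_nonzeros"] / stats["max_vector_size"]
    return dot_sparse if density < sparse_threshold else dot_dense

catalog = {}

def register(model_id, stage_id, stats):
    # Record the selected implementation for this logical stage.
    impl = choose_physical_stage(stats)
    catalog[(model_id, stage_id)] = impl
    return impl

sparse_impl = register("Model1", "S2", {"avg_nonzeros": 5, "max_vector_size": 1000})
dense_impl = register("Model2", "S2", {"avg_nonzeros": 180, "max_vector_size": 200})

weights = [0.5, 1.0, 2.0]
sparse_score = sparse_impl(weights, {0: 2.0, 2: 1.0})
dense_score = dense_impl(weights, [2.0, 0.0, 1.0])
```

The two kernels compute the same score for the same logical vector; the statistics only decide which representation is cheaper to run.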


System Design: Online Phase

1. A prediction request arrives, e.g., <Model1, “Pretzel is tasty”>
2. Instantiate physical stages along with parameters (from the Object Store)
3. Execute stages using thread pools managed by the Scheduler
4. Send the result back to the client

[Figure: Runtime holding the logical stages of Model1 (S1, S2) and Model2 (S1’, S2’), the physical stages, and the Object Store]
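The online path can be approximated with a shared thread pool in Python. This sketch is deliberately simplified and hypothetical: the `Runtime` class below runs a whole request as one task, whereas the slide describes stages being scheduled by a Scheduler, and the registered "stages" are plain string functions chosen only to keep the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor

class Runtime:
    # Hypothetical runtime: all models share one pool of worker threads.
    def __init__(self, workers=4):
        self._plans = {}
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def register(self, model_id, stages):
        # Offline step: remember the ordered stages of a model.
        self._plans[model_id] = list(stages)

    def _execute(self, model_id, payload):
        # Run the stages of one request in order.
        value = payload
        for stage in self._plans[model_id]:
            value = stage(value)
        return value

    def submit(self, model_id, payload):
        # Online step: enqueue a request and return a future for the result.
        return self._pool.submit(self._execute, model_id, payload)

    def shutdown(self):
        self._pool.shutdown()

runtime = Runtime(workers=2)
runtime.register("Model1", [str.lower, str.split, len])
runtime.register("Model2", [str.upper])

futures = [
    runtime.submit("Model1", "Pretzel is tasty"),
    runtime.submit("Model2", "Pretzel is tasty"),
]
results = [f.result() for f in futures]
runtime.shutdown()
```

Because every model draws from the same pool, the number of threads stays bounded no matter how many models are registered.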


Evaluation

• Q. How does PRETZEL improve performance over black-box approaches?
  • In terms of latency, memory, and throughput
• 500 models from the Microsoft Machine Learning Team
  • 250 Sentiment Analysis (memory-bound)
  • 250 Attendee Count (compute-bound)
• System configuration
  • 16-core CPU, 32GB RAM
  • Windows 10, .NET Core 2.0


Evaluation: Latency

• Micro-benchmark (no server-client communication)
• Score 250 Sentiment Analysis models 100 times each
• Compare ML.Net vs. PRETZEL

Latency (ms)    ML.Net    PRETZEL    Improvement
P99 (hot)          0.6        0.2    3x
P99 (cold)         8.1        0.8    10x
Worst (cold)     280.2        6.2    45x

[Figure: CDF (%) of latency (ms, log-scaled) for ML.Net (hot, cold) and PRETZEL (hot, cold); lower latency is better]
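The 3x, 10x, and 45x labels are simply the ratios of the ML.Net and PRETZEL latencies reported on the slide. A short Python check, using only those reported numbers (the variable names are ours):

```python
# (ML.Net, PRETZEL) latencies in milliseconds, as reported on the slide.
latency_ms = {
    "P99 (hot)": (0.6, 0.2),
    "P99 (cold)": (8.1, 0.8),
    "Worst (cold)": (280.2, 6.2),
}

# Speedup = ML.Net latency divided by PRETZEL latency.
speedups = {name: mlnet / pretzel for name, (mlnet, pretzel) in latency_ms.items()}
rounded = {name: round(value) for name, value in speedups.items()}
```

Rounded to the nearest integer, the ratios match the slide's annotations.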


Evaluation: Memory

• Measure cumulative memory usage after loading 250 models
• Attendee Count models (smaller than the Sentiment Analysis models)
• 4 settings for comparison

Settings                       Shared Objects   Shared Runtime   Memory (250 pipelines)
ML.Net + Clipper                                                 9.7GB
ML.Net                                          ✓                3.7GB
PRETZEL without ObjectStore                     ✓                2.9GB
PRETZEL                        ✓                ✓                164MB

PRETZEL’s footprint is labeled 25x smaller than ML.Net and 62x smaller than ML.Net + Clipper.

[Figure: cumulative memory usage (log-scaled, 10MB to 32GB) vs. number of pipelines (0 to 250) for the four settings; lower is better]


Evaluation: Throughput

• Micro-benchmark
• Score 250 Attendee Count models 1000 times each
• Request 1000 queries in a batch
• Compare ML.Net vs. PRETZEL

[Figure: throughput (K QPS) vs. number of CPU cores (1, 2, 4, 8, 13) for PRETZEL and ML.Net, each with an ideal-scaling line; panels (SA) and (AC); higher is better. Annotations: 10x; close to ideal scalability]

More results in the paper!


Conclusion

• PRETZEL is the first white-box prediction serving system for ML pipelines
• By using models’ structural information, we enable two types of optimizations:
  • End-to-end optimizations generate efficient execution plans for a model
  • Multi-model optimizations let models share computation and memory resources
• Our evaluation shows that PRETZEL can improve performance compared to black-box systems (e.g., ML.Net)
  • Decrease latency and memory footprint
  • Increase resource utilization and throughput


PRETZEL: a White-Box ML Prediction Serving System

Thank you! Questions?