PRETZEL: Opening the Black Box of ML Prediction Serving Systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, Matteo Interlandi
Machine Learning Prediction Serving

1. Models are learned from data
2. Models are deployed and served together

(Diagram: Data → Training → Model → Deploy → Prediction serving → Users)

Performance goals:
1) Low latency
2) High throughput
3) Minimal resource usage
ML Prediction Serving Systems: State-of-the-art

• Assumption: models are black boxes
• Re-use the same code as in the training phase
• Encapsulate all operations into a function call (e.g., predict())
• Apply external optimizations: result caching, replication, model ensembles, request batching

(Diagram: a Prediction Serving System (e.g., ML.Net, TF Serving, Clipper) hosting black-box models such as Text Analysis ("Pretzel is tasty") and Image Recognition (cat vs. car).)
How do Models Look inside Boxes?

Example: Sentiment Analysis
"Pretzel is tasty" (text) → Model → ☺ vs. ☹ (positive vs. negative)
How do Models Look inside Boxes?

Example: Sentiment Analysis. Inside, the model is a DAG of operators:

"Pretzel is tasty" → Tokenizer → { CharNgram, WordNgram } → Concat → LogisticRegression → ☺ vs. ☹

• Tokenizer: split text into tokens
• CharNgram / WordNgram: extract n-grams
• Concat: merge the two vectors
• LogisticRegression: compute the final score

Tokenizer, CharNgram, WordNgram, and Concat are featurizers; LogisticRegression is the predictor.
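To make the DAG concrete, here is a minimal Python sketch of such a pipeline. The vocabularies and weights are invented placeholders, and each function only approximates what the real ML.Net operator does:

```python
# Hypothetical sketch of the Sentiment Analysis DAG; vocabularies and
# weights below are made up for illustration, not learned values.
from math import exp

def tokenizer(text):
    # Split text into tokens
    return text.lower().split()

def char_ngram(tokens, n=3):
    # Extract character n-grams from each token
    return [tok[i:i + n] for tok in tokens for i in range(len(tok) - n + 1)]

def word_ngram(tokens, n=2):
    # Extract word n-grams
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(grams, vocab):
    # Map n-grams to a count vector over a fixed vocabulary
    vec = [0.0] * len(vocab)
    for g in grams:
        if g in vocab:
            vec[vocab[g]] += 1.0
    return vec

def concat(a, b):
    # Merge the two feature vectors (materializes a new vector)
    return a + b

def logistic_regression(x, weights, bias=0.0):
    # Compute the final score (probability of "positive")
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + exp(-z))

# Toy vocabularies and weights (placeholders)
char_vocab = {"pre": 0, "ret": 1, "tas": 2}
word_vocab = {"pretzel is": 0, "is tasty": 1}
weights = [0.5, 0.1, 0.9, 0.3, 1.2]

tokens = tokenizer("Pretzel is tasty")
score = logistic_regression(
    concat(featurize(char_ngram(tokens), char_vocab),
           featurize(word_ngram(tokens), word_vocab)),
    weights)
```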
Many Models Have Similar Structures

• Many parts of a model can be re-used in other models
• Examples: customer personalization, templates, transfer learning
• Such models use an identical set of operators with different parameters
Outline

• Prediction Serving Systems
• Limitations of Black Box Approaches
• PRETZEL: White-box Prediction Serving System
• Evaluation
• Conclusion
Limitation 1: Resource Waste

• Resources are isolated across black boxes:

1. Black boxes cannot share memory space
   → memory is wasted on duplicate objects (despite similarities between models)
2. There is no coordination of CPU resources between boxes
   → serving many models can use too many threads
Limitation 2: Operator Characteristics Are Ignored

1. Operators have different performance characteristics
   • Concat materializes a new vector
   • LogReg takes only 0.3% of latency (contrary to the training phase)
2. A better plan exists if such characteristics are considered
   • Re-use existing vectors instead of materializing new ones
   • Apply in-place updates in LogReg

(Latency breakdown of the example pipeline: CharNgram 23.1%, WordNgram 34.2%, Concat 32.7%, LogReg 0.3%, Others 9.6%)
Limitation 3: Lazy Initialization

• ML.Net initializes code and memory lazily (efficient in the training phase)
• Experiment: run 250 Sentiment Analysis models 100 times each
  → cold: first execution / hot: average of the remaining 99
• Long-tail latency in the cold case
  • Caused by code analysis, just-in-time (JIT) compilation, memory allocation, etc.
• Makes it difficult to provide a strong Service Level Agreement (SLA)

(Figure: cold executions of the example pipeline's operators are 13x to 444x slower than hot ones.)
Outline

• (Black-box) Prediction Serving Systems
• Limitations of Black Box Approaches
• PRETZEL: White-box Prediction Serving System
• Evaluation
• Conclusion
PRETZEL: White-box Prediction Serving

• We analyze models to optimize their internal execution
• We let models co-exist on the same runtime, sharing computation and memory resources
• We optimize models in two directions:
  1. End-to-end optimizations
  2. Multi-model optimizations
End-to-End Optimizations

Optimize the execution of individual models from start to end:
1. [Ahead-of-time compilation] Compile operators' code in advance
   → no JIT overhead
2. [Vector pooling] Pre-allocate data structures
   → no memory allocation on the data path
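The vector-pooling idea can be sketched as follows, assuming a hand-rolled pool class (not PRETZEL's actual implementation): all buffers are allocated once up front, and the data path only checks them out and returns them:

```python
# Sketch of vector pooling: buffers are pre-allocated once, then checked
# out and returned per request, so inference performs no allocation.
class VectorPool:
    def __init__(self, vector_size, count):
        # Pre-allocate every buffer up front (analogous to doing this at
        # plan-registration time rather than on the data path)
        self._free = [bytearray(4 * vector_size) for _ in range(count)]

    def acquire(self):
        # Hand out an existing buffer; no allocation happens here
        return self._free.pop()

    def release(self, buf):
        # Return the buffer for reuse by later requests
        self._free.append(buf)

pool = VectorPool(vector_size=200, count=8)
buf = pool.acquire()
# ... an operator writes its output into buf ...
pool.release(buf)
```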
Multi-model Optimizations

Share computation and memory across models:
1. [Object Store] Share operators' parameters/weights
   → maintain only one copy
2. [Sub-plan Materialization] Reuse intermediate results computed by other models
   → save computation
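Both optimizations can be sketched in a few lines; the class and key names below are invented for illustration, not PRETZEL's API:

```python
# Sketch of the two multi-model optimizations (illustrative names).

class ObjectStore:
    # Keeps a single copy of each distinct parameter object; models with
    # identical weights (e.g., a shared featurizer vocabulary) receive the
    # same reference instead of a duplicate.
    def __init__(self):
        self._objects = {}

    def get_or_add(self, key, factory):
        if key not in self._objects:
            self._objects[key] = factory()
        return self._objects[key]

store = ObjectStore()
vocab_a = store.get_or_add("charngram-vocab-v1", lambda: {"pre": 0, "tas": 1})
vocab_b = store.get_or_add("charngram-vocab-v1", lambda: {"pre": 0, "tas": 1})
shared = vocab_a is vocab_b  # both models now hold one shared object

# Sub-plan materialization: when two models share a DAG prefix (same
# operators, same parameters), the intermediate result computed for one
# request can be reused by the other instead of recomputed.
subplan_cache = {}

def run_stage(stage_key, inputs, compute):
    cache_key = (stage_key, inputs)
    if cache_key not in subplan_cache:
        subplan_cache[cache_key] = compute(inputs)
    return subplan_cache[cache_key]

tokens1 = run_stage("tokenize-v1", "Pretzel is tasty", lambda t: tuple(t.split()))
tokens2 = run_stage("tokenize-v1", "Pretzel is tasty", lambda t: tuple(t.split()))  # cache hit
```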
System Components

1. Flour: an intermediate representation in which a model is expressed as a program
   (e.g., var fContext = ...; var tTokenizer = ...; return fPrgrm.Plan();)
2. Oven: the compiler/optimizer that turns logical stages (S1, S2, S3), parameters (e.g., int[100], float[200]), and statistics into physical stages
3. Runtime: executes inference queries; contains the Object Store and the Scheduler
4. FrontEnd: handles user requests

(Diagram: an ML.Net model is translated to Flour, compiled by Oven into a Model Plan of logical stages, physical stages, parameters, and statistics, and executed by the Runtime behind the FrontEnd.)
Prediction Serving with PRETZEL

1. Offline
   • Analyze the structural information of models
   • Build a Model Plan for optimal execution
   • Register the Model Plan with the Runtime
2. Online
   • Handle prediction requests
   • Coordinate CPU and memory resources

(Diagram: Model → Analyze → Model Plan → Register → Runtime / FrontEnd)
System Design: Offline Phase

1. Translate the Model into a Flour Program

<Model> Tokenizer → { CharNgram, WordNgram } → Concat → LogReg

<Flour Program>
var fContext = new FlourContext(...);
var tTokenizer = fContext.CSV
    .FromText(fields, fieldsType, sep)
    .Tokenize();
var tCNgram = tTokenizer.CharNgram(numCNgrms, ...);
var tWNgram = tTokenizer.WordNgram(numWNgrms, ...);
var fPrgrm = tCNgram
    .Concat(tWNgram)
    .ClassifierBinaryLinear(cParams);
return fPrgrm.Plan();
System Design: Offline Phase

2. The Oven optimizer/compiler builds the Model Plan

Starting from the Flour program, the rule-based optimizer groups operators into stages (Stage 1, Stage 2), pushing the linear predictor down and removing Concat.

<Model Plan> Logical DAG: S1 → S2
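The "push linear predictor and remove Concat" rule rests on simple arithmetic: for a linear predictor, the dot product over a concatenated vector equals the sum of the dot products over its parts, so the merged vector never needs to materialize. A small illustrative sketch (not PRETZEL's code; the vectors and weights are made up):

```python
# Why Concat can be removed under a linear predictor:
# w . concat(a, b) == w[:len(a)] . a + w[len(a):] . b

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def score_with_concat(a, b, weights):
    merged = a + b                    # Concat materializes a new vector
    return dot(weights, merged)

def score_pushed(a, b, weights):
    # Split the weights and push the predictor into each branch:
    # no intermediate vector is ever built
    w_a, w_b = weights[:len(a)], weights[len(a):]
    return dot(w_a, a) + dot(w_b, b)

a, b = [1.0, 2.0], [3.0, 4.0, 5.0]
w = [0.1, 0.2, 0.3, 0.4, 0.5]
```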
System Design: Offline Phase

2. The Oven optimizer/compiler builds the Model Plan (cont.)

In addition to the logical DAG, the Model Plan records:
• Parameters, e.g., dictionaries and n-gram lengths
• Statistics, e.g., dense vs. sparse representation and maximum vector size
System Design: Offline Phase

3. The Model Plan is registered with the Runtime

1. Store parameters in the Object Store, along with the mapping to the model's logical stages (Model1: S1, S2)
2. Find the most efficient physical implementation using the parameters and statistics (e.g., n-gram length 1 vs. 3, sparse vs. dense vectors)
3. Register the selected physical stages in the Catalog
System Design: Online Phase

1. A prediction request arrives: <Model1, "Pretzel is tasty">
2. The Runtime instantiates physical stages, binding parameters from the Object Store
3. Stages execute on thread pools managed by the Scheduler
4. The result is sent back to the client

(Diagram: logical stages per model (Model1: S1, S2; Model2: S1', S2') mapped onto shared physical stages.)
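The four online steps can be sketched as follows; every name here (object_store, model_catalog, stage_impls) is invented for illustration and stands in for PRETZEL's internal structures:

```python
# Sketch of the online phase: a request names a model; the runtime looks
# up that model's logical stages, binds each to shared parameters from the
# Object Store, and runs them on one shared thread pool rather than
# per-model threads.
from concurrent.futures import ThreadPoolExecutor

object_store = {"tokenize-params": {"lowercase": True}}  # single shared copy

# Logical stages registered per model (stage name, parameter key)
model_catalog = {"Model1": [("tokenize", "tokenize-params")]}

def run_tokenize(text, params):
    return text.lower().split() if params["lowercase"] else text.split()

stage_impls = {"tokenize": run_tokenize}

pool = ThreadPoolExecutor(max_workers=4)  # shared across all models

def serve(model_id, text):
    data = text
    for stage_name, param_key in model_catalog[model_id]:
        impl = stage_impls[stage_name]
        params = object_store[param_key]        # bind shared parameters
        data = pool.submit(impl, data, params).result()
    return data                                  # sent back to the client

result = serve("Model1", "Pretzel is tasty")
```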
Outline

• (Black-box) Prediction Serving Systems
• Limitations of Black Box Approaches
• PRETZEL: White-box Prediction Serving System
• Evaluation
• Conclusion
Evaluation

• Q: How does PRETZEL improve performance over black-box approaches?
  • In terms of latency, memory, and throughput
• 500 models from the Microsoft Machine Learning Team
  • 250 Sentiment Analysis models (memory-bound)
  • 250 Attendee Count models (compute-bound)
• System configuration
  • 16-core CPU, 32 GB RAM
  • Windows 10, .NET Core 2.0
Evaluation: Latency

• Micro-benchmark (no server-client communication)
• Score each of the 250 Sentiment Analysis models 100 times
• Compare ML.Net vs. PRETZEL

Latency (ms)    ML.Net   PRETZEL
P99 (hot)       0.6      0.2      (3x better)
P99 (cold)      8.1      0.8      (10x better)
Worst (cold)    280.2    6.2      (45x better)

(Figure: latency CDFs on a log scale for ML.Net and PRETZEL, hot and cold.)
Evaluation: Memory

• Measure cumulative memory usage after loading 250 models
• Attendee Count models (smaller than Sentiment Analysis models)
• 4 settings for comparison:

Setting                       Shared Objects   Shared Runtime
ML.Net + Clipper              -                -
ML.Net                        -                ✓
PRETZEL without ObjectStore   -                ✓
PRETZEL                       ✓                ✓

(Figure: cumulative memory usage, log scale, vs. number of pipelines. At 250 pipelines: ML.Net + Clipper 9.7 GB, ML.Net 3.7 GB, PRETZEL without ObjectStore 2.9 GB, PRETZEL 164 MB, i.e., about 25x less than ML.Net and 62x less than ML.Net + Clipper.)
Evaluation: Throughput

• Micro-benchmark
• Score each of the 250 Attendee Count models 1000 times
• Request 1000 queries in a batch
• Compare ML.Net vs. PRETZEL

(Figure: throughput (KQPS) vs. number of CPU cores (1, 2, 4, 8, 13) for Sentiment Analysis and Attendee Count. PRETZEL scales close to ideal and reaches up to 10x the throughput of ML.Net. More results in the paper.)
Conclusion

• PRETZEL is the first white-box prediction serving system for ML pipelines
• Using models' structural information, it enables two types of optimizations:
  • End-to-end optimizations generate efficient execution plans for a model
  • Multi-model optimizations let models share computation and memory resources
• Our evaluation shows that PRETZEL improves performance over black-box systems (e.g., ML.Net):
  • Lower latency and memory footprint
  • Higher resource utilization and throughput
PRETZEL: a White-Box ML Prediction Serving System
Thank you! Questions?