33
RISELab: Enabling Intelligent Real-time Decisions Ion Stoica February 8, 2017

RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica

Embed Size (px)

Citation preview

RISELab: Enabling Intelligent Real-time Decisions

Ion StoicaFebruary 8, 2017

Berkeley’s AMPLab (2011-2016)

2

Algorithms

Machines People

Goal: Next generation of open sourcedata analytics stack for industry & academia

Berkeley Data Analytics Stack (BDAS)

Berkeley’s AMPLab (2011-2016)

3

Algorithms

Machines People

Goal: Next generation of open sourcedata analytics stack for industry & academia

Berkeley Data Analytics Stack (BDAS)

RISE: Real-time Intelligent Secure Execution

From batch data to advanced analytics

AMPLab

5

From live data to real-time decisions

RISELab

RISE Lab (2017-2022)

12 faculty across AI, systems, security, and architectures

11 Founding sponsors

6

Why?Data only as valuable as the decisions it enables

7

Why?

What does this mean?• Faster decisions better than slower decisions• Decisions on fresh data better than decisions on stale data• Decisions on personalized data better than on aggregate data

8

Data only as valuable as the decisions it enables

Goal

Real-time decisions

on live data

with strong security

9

decide in ms

the current state of the environment

privacy, confidentiality, integrity

Typical decision system

Decision SystemQuery

Decision

Environment+

sensors & actuators

Observations, Feedback

Preprocess Intermediatedata

DecisionEngine

Decision

QueryDecisionEnginePreprocess Intermediate

data

Environment+

sensors & actuators

Typical decision system

Decision System

Observations, Feedback

LiveUpdate latency(e.g., ~1 seconds)

Decision

QueryDecisionEnginePreprocess Intermediate

data

Environment+

sensors & actuators

Typical decision system

Decision System

Observations, Feedback

LiveUpdate latency(e.g., ~1 seconds)

Secure

Real-timedecision latency

(e.g., ~10 ms)

Example of decision systems

Decision SystemQuery

DecisionTraining

Models(diff. tradeoffs

complexity/accuracy)

ModelServing

FeedbackObservations, Feedback

ML Pipeline(e.g., Clipper +

Spark/Tensorflow)

Decision SystemObs.

Action

Update Policy

Policyobs àaction

QueryPolicy

Observations, Rewards

ReinforcementLearning Systems

(e.g., Ray)

Pre-process

Interm-ediateData

DecisionEngine

What else do we want from decisions?

Intelligent: complex decisions in uncertain environments

Robust: handle complex noise, unforeseen inputs, failures

Explainable: ability to explain non-obvious decisions

Goal

Develop open source platforms, tools, and algorithms for

intelligent real-time decisions on live-data

Some Proposed Research

Secure Real-time Decisions Stack (SRDS) • Open source platform to develop of RISE apps• Secure from ground up• Reinforcement Learning (RL) as one of key app patterns

Learning control hierarchies: speedup learning, training

Shared learning: learn over confidential data

16

Secure Real-time Decisions Stack (SRDS) • Open source platform to develop of RISE apps• Secure from ground up• Reinforcement Learning (RL) as one of key app patterns

Learning control hierarchies: speedup learning, training

Shared learning: learn over confidential data

17

Some Proposed Research

Secure Real-time Decision Stack (SRDS)

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Minimalist execution engine:• Support both data flow and task-parallel execution models• High-throughput, low-latency: ~ 1M tasks/sec @ ms latency

Secure Real-time Decision Stack (SRDS)

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Central repository for models, APIs to capture the context in which data gets used and producedStatus: ongoing project with industry partners

Secure Real-time Decision Stack (SRDS)

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Replaying of apps at fine granularity• Simplify development, debugging• Robustness: replay against perturbed inputs• Explainability: identify inputs causing decision• Security: confirm vulnerabilities, test security

patches, compliance auditing

Secure Real-time Decision Stack (SRDS)

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Dramatically simplify development of RISE applications• Apache Spark: improve latency and security• Clipper: model serving for Apache Spark, Scikit learn, etc• Ray: framework for RL applications

Secure Real-time Decision Stack (SRDS)

Improving Apache SparkDrizzle• Decrease latency of Structured Streaming and ML algorithms by ~10x• Techniques: group scheduling, shared variables

23

Streaming Latency: YCSB benchmark

24

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Streaming Latency: YCSB benchmark

25

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Streaming Latency: YCSB benchmark

26

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Streaming Latency: YCSB benchmark

27

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Streaming Latency: YCSB benchmark

28

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

15x

MLlib: SGD Performance

29

0

20

40

60

4 8 16 32 64 128

Tim

e / i

ter(

ms)

Machines

Spark Drizzle

MLlib: SGD Performance

30

0

20

40

60

4 8 16 32 64 128

Tim

e / i

ter (

ms)

Machines

Spark Drizzle

6x

Improving Apache SparkDrizzle• Decrease latency of Structured Streaming and ML algorithms by ~10x• Techniques: group scheduling, shared variables• Some of these techniques will make their way to Apache Spark

Opaque• Full data encryption, authentication, and verification (Intel’s SGX)• Oblivious mode: hide data access pattern• Support most SparkSQL functionality• See Wenting’s talk later

31

RISELab

Already promising results

Expect much more over the next five years!

32

Goal: Develop open source platforms, tools, and algorithms for intelligent real-time decisions on live-

data

Thank you