
Distributed Inference Between Mobile Edge Devices and the Cloud

Sandeep Chinchali*, Jenya Pergament*, Eyal Cidon*, Marco Pavone, Sachin Katti


Can robot perception tasks be done in the cloud?

• Automated Sensing from Video/LIDAR

• Compute-intensive Deep Neural Nets (DNNs)

• Can resource-constrained robots scalably use “the cloud?”

Uplink-limited

Credit: Alexander Kazeka, https://www.youtube.com/watch?v=1j_3fh34E44


[Figure: system overview. Labels: Sensory Input, Mobile Robot, Offload Logic, Robot Model, Local Compute, Limited Network, Offload Compute, Cloud, Cloud Model, Image and Map Databases.]

Query the cloud for better accuracy? Latency vs. Accuracy vs. Power …

Outline

Learning-Based Approach to Cloud Offloading in Robotics. Sandeep Chinchali, Apoorva Sharma, James Harrison, Amine Elhafsi, Daniel Kang, Jenya Pergament, Eyal Cidon, Sachin Katti, Marco Pavone [accepted to Robotics: Science and Systems (RSS) 2019]

1. Accuracy vs Compute-Efficiency Trade-offs of DNNs

2. Network Costs of Streaming Video/LIDAR

3. A learning-based approach to Cloud Offloading

4. Simulation and Hardware Experiments

Accuracy of Robot and Cloud DNNs

[Figure: panels labeled Cloud Model and Robot Model.]

If embedded AI gets better, will I still need the cloud?

Cloud is still useful to:

1. Pool video from multiple robots

2. Access large map, image databases

3. Query models trained on more/newer data

“Cloud”: could even be a bigger on-board model

Jetson TX2 GPU (~$480), Google Edge TPU (~$150), Jetson Nano (~$99)

Model                        Raspberry Pi 3   Raspberry Pi 3 + Intel Neural Compute Stick   Jetson Nano   Edge TPU
SSD MobileNet-v2 (300x300)   1 FPS            11 FPS                                        39 FPS        48 FPS

Source: https://devblogs.nvidia.com/jetson-nano-ai-computing/

Outline

1. Accuracy vs Compute-Efficiency Trade-offs of DNNs

2. Network Costs of Streaming Video/LIDAR

Uplink-limited

Network Costs of Cloud Communication

1. Congested Wireless Links

2. High Bandwidth: Designed for Human, Not Robot Perception

J. Emmons, S. Fouladi, G. Ananthanarayanan, S. Venkataraman, S. Savarese, K. Winstein, “Cracking Open the DNN blackbox”

Our Network Congestion Experiments

“ROS Ate My Network Bandwidth!” (ROS User Forums)

~70 Mbps

Cloud Offloading as a Decision Problem

[Figure labels: Robot Confidence, Robot Correct, Cloud Queries, Wasted Queries]

Contending goals

• Maximize Accuracy

• Minimize latency

• Limited Network Share

Optimal Control

Limited Cloud Queries
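One simple way to see the tension between these goals is a fixed confidence-threshold rule, sketched here in Python. This is an illustrative baseline, not the method from the talk: the threshold value, the `Frame` record, and the toy data are all assumptions. A cloud query is counted as wasted when the robot model was already correct.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    robot_confidence: float  # robot model's confidence in its own prediction (0..1)
    robot_correct: bool      # whether the robot model's prediction was right

def threshold_policy(frames, threshold=0.8):
    """Query the cloud whenever robot confidence falls below `threshold`.

    Returns (cloud_queries, wasted_queries); a query is wasted when the
    robot model was already correct on that frame.
    """
    cloud_queries = 0
    wasted_queries = 0
    for frame in frames:
        if frame.robot_confidence < threshold:
            cloud_queries += 1
            if frame.robot_correct:
                wasted_queries += 1
    return cloud_queries, wasted_queries

# Toy data, made up for illustration.
frames = [
    Frame(0.95, True),
    Frame(0.60, True),   # low confidence but correct: a wasted query
    Frame(0.40, False),  # low confidence and wrong: a useful query
    Frame(0.85, False),  # confident but wrong: missed by this rule
]
queries, wasted = threshold_policy(frames)
print(queries, wasted)  # 2 1
```

A fixed threshold both wastes queries and misses confident mistakes, which is one reason to treat offloading as a sequential decision problem instead.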

RL Approach to Cloud Offloading

[Figure labels: DNN, Edge, Cloud]

Reinforcement Learning (RL)

Goal: Maximize the total reward

Agent Environment

Observe state $s_t$

Action $a_t$

Reward $r_t$

Adapted from Pensieve (SIGCOMM ’17, Mao et al.)
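The agent-environment loop can be sketched in a few lines of Python. The `ToyEnvironment` class, its reward rule, and the two policies are hypothetical stand-ins chosen only to make the loop runnable; they are not the offloading environment from the talk.

```python
import random

class ToyEnvironment:
    """Hypothetical two-action environment: action 1 pays 1.0, action 0 pays 0.0."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is just the time step here

    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done

def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                  # agent picks an action from the state
        state, reward, done = env.step(action)  # environment returns next state and reward
        total_reward += reward
    return total_reward

def random_policy(state):
    return random.choice([0, 1])

def always_one(state):
    return 1

print(run_episode(ToyEnvironment(), always_one))  # 5.0
```

An RL algorithm would replace `random_policy` with a policy that is updated from the observed rewards so that the total reward grows over episodes.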

Exploration vs. Exploitation Tradeoff

Exploit: On-board Robot Model

Explore: Utility of Cloud by learning
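The explore/exploit tension can be illustrated with an epsilon-greedy rule, a standard textbook technique that is swapped in here for illustration and is not claimed to be the algorithm used in the talk. The mean rewards, epsilon, and the mapping of index 0 to the robot model and index 1 to the cloud model are assumptions.

```python
import random

def epsilon_greedy(value_estimates, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])

def update_estimate(value_estimates, counts, action, reward):
    """Incremental running-mean update of the chosen action's value."""
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

# Hypothetical mean rewards: index 0 = robot model, index 1 = cloud model.
TRUE_MEAN_REWARD = [0.5, 0.7]

rng = random.Random(0)
values, counts = [0.0, 0.0], [0, 0]
for _ in range(2000):
    action = epsilon_greedy(values, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < TRUE_MEAN_REWARD[action] else 0.0
    update_estimate(values, counts, action, reward)
print(values, counts)
```

With epsilon set to zero the agent would only exploit its current estimates and might never learn how useful the cloud is.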

[Figure: robot offloading diagram. Labels: Robot, Limited Network, Cloud, Cloud Model, Offload, Predict, Past Predictions, State, Reward.]

The Robot Offloading MDP

[Figure: robot offloading MDP diagram. Labels: Robot, Limited Network, Cloud, Cloud Model, Offload, Past Predictions, State, Reward.]

The Robot Offloading MDP: Action Space

[Figure: robot offloading MDP diagram. Labels: Cloud, Cloud Model, Offload, State, Reward.]
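The labels in this deck suggest the offloader chooses among the robot model, the cloud model, and reusing past predictions. A minimal Python sketch of such an action space follows; the enum names, integer codes, and the `needs_network` helper are assumptions for illustration and are not taken from the slides.

```python
from enum import Enum

class OffloadAction(Enum):
    # Integer codes are arbitrary placeholders, not the talk's encoding.
    REUSE_PAST_PREDICTION = 0  # reuse an earlier prediction, no new inference
    QUERY_ROBOT_MODEL = 1      # run the on-board model on the new input
    QUERY_CLOUD_MODEL = 2      # send the input over the network to the cloud model

def needs_network(action):
    """Only a cloud query consumes the limited network link."""
    return action is OffloadAction.QUERY_CLOUD_MODEL

for action in OffloadAction:
    print(action.name, needs_network(action))
```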

The Robot Offloading MDP: State Space

[Figure: robot offloading MDP diagram. Labels: Cloud, Cloud Model, Offload, State, Reward.]

The Robot Offloading MDP: Reward

[Figure: robot offloading MDP diagram. Labels: Cloud, Cloud Model, Offload, State, Reward.]
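The exact reward is not recoverable from this slide. As a hedged sketch, the contending goals stated earlier in the deck (maximize accuracy, minimize latency, respect a limited network) could be folded into one scalar by subtracting a weighted action cost from an accuracy term. The weights, per-action costs, and action names below are hypothetical.

```python
# Hypothetical per-action costs (latency plus network use, in arbitrary units).
ACTION_COST = {
    "past_prediction": 0.0,
    "robot_model": 1.0,
    "cloud_model": 5.0,
}

def offload_reward(correct, action, accuracy_weight=10.0, cost_weight=1.0):
    """Reward = accuracy_weight * [prediction correct] - cost_weight * cost(action)."""
    accuracy_term = accuracy_weight if correct else 0.0
    return accuracy_term - cost_weight * ACTION_COST[action]

print(offload_reward(True, "cloud_model"))   # 10.0 - 5.0 = 5.0
print(offload_reward(True, "robot_model"))   # 10.0 - 1.0 = 9.0
print(offload_reward(False, "robot_model"))  # 0.0 - 1.0 = -1.0
```

Under these made-up numbers a cloud query only pays off when it turns a wrong robot prediction into a correct one, which matches the intuition that the cloud should be queried sparingly.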





[Figure: face-recognition example. Labels: Robot Model, FaceNet, Embed, SVM Classifier, Face A, 90% Conf, Query Cloud, Coherence Time.]

RL beats benchmark offloading policies

> 2.6x reward of benchmarks

RL: 70% of oracle reward

All-Robot: today’s de-facto

RL queries the cloud intelligently but sparingly

Hardware Experiments on Live Video + Embedded Compute Platform

RL for Cloud Offloading in Robotics

• Compute model size and sensory data will grow

• Judicious use of Cloud in Robotics

• RL: General Two-Stage Decision Problem


Thanks! Please See Sandeep, Eyal, Jenya


Emmons et. al, “Neural Networks Are Networks Too”

Uplink-limited

Sensor Representation for Machine Perception

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

Can we send fewer, relevant bits for the same accuracy?

Google Edge TPU ($150), Nvidia Jetson Nano ($99), TX2 ($600)

Future Directions

Emmons et. al, “Neural Networks Are Networks Too”

Uplink-limited

Network Costs of Cloud Communication

1. Congested Wireless Links

2. High Bandwidth: Designed for Human Perception

System Architecture

[Figure labels: DNN, Edge, Cloud]

Should we split Vision DNNs between edge/cloud?

[Figure labels: Edge, Google, Split at Layer 5, Pixels, Intermediates, Predict, Off-the-shelf; Google v1, v2, Split at Layer 5, 10]

Do not split rapidly-evolving DNNs!

NeuroSurgeon, ASPLOS ’17

Idea: Keep Vision DNNs Intact

[Figure labels: Pixels, Edge, Encoder, Decoder, Off-the-shelf, Google, FB, Black-Box w/ API, Predict]

Benefit: Extends beyond video or DNNs (e.g. robotic map-making)

Learning-based Approach

[Figure labels: DNN, Edge, Cloud]

System Architecture

[Figure labels: Video, Edge, Coded Features, Decoder, Pixel Estimate, Off-the-Shelf, Predict, Feedback Reward (Training)]

Many Open Questions

• Machines (DNNs) will watch most future video

• Research Avenues:

  • Small-scale RL simulations [Hotnets 18]

  • Practical systems prototype [Under review]

  • Active Learning to query the cloud [Under review]

  • Deep RL with Real Vision DNNs – next!

Simplified Systems Prototype

[Figure labels: DNN, Edge, Cloud]

1. Active Edge Encoders

[Figure labels: Edge Device, Video, Coded Features, Feature Feedback]

Dynamically Encode Task-Relevant Content

Modify Sub-Image Resolution, Crop Regions, “Machine” features, …
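Two of the encoder operations named here, cropping a region and lowering sub-image resolution, can be sketched in plain Python on a frame stored as a list of rows. The function names, the nearest-neighbor decimation, and the 4x4 toy frame are assumptions made for illustration; a real encoder would operate on camera frames and choose regions from task feedback.

```python
def crop(image, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def downsample(image, factor):
    """Keep every `factor`-th pixel in each dimension (nearest-neighbor decimation)."""
    return [row[::factor] for row in image[::factor]]

# A made-up 4x4 grayscale frame, stored as a list of rows.
frame = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
region = crop(frame, top=1, left=1, height=2, width=2)  # [[5, 6], [9, 10]]
coarse = downsample(frame, factor=2)                    # [[0, 2], [8, 10]]
print(region, coarse)
```

Both operations send a quarter of the original pixels in this toy case, trading detail or field of view for bandwidth.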

2. Centralized Active Decoder

[Figure labels: Code 1, Camera 1; Code 2, Camera 2; DNN]

Estimate Edge Scenes, “Fill-in” Missing Pixels w/ memory
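The idea of filling in missing pixels from memory can be sketched as a decoder that keeps its latest scene estimate and overwrites only the region it just received. The `StatefulDecoder` class, its patch format, and the zero fill value are hypothetical; a learned decoder would predict the missing pixels instead of simply holding old ones.

```python
class StatefulDecoder:
    """Keeps the latest estimate of one camera's scene and patches in new crops."""

    def __init__(self, height, width, fill=0):
        self.estimate = [[fill] * width for _ in range(height)]

    def update(self, patch, top, left):
        """Overwrite the remembered pixels with a freshly received patch."""
        for r, row in enumerate(patch):
            for c, pixel in enumerate(row):
                self.estimate[top + r][left + c] = pixel
        return self.estimate

decoder = StatefulDecoder(height=3, width=3)
decoder.update([[7, 7]], top=0, left=0)            # first message: top-left strip
scene = decoder.update([[9], [9]], top=1, left=2)  # later message: part of the right column
print(scene)  # [[7, 7, 0], [0, 0, 9], [0, 0, 9]]
```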

3. Feature Feedback from the Cloud

[Figure labels: Edge Device, Codes, State-ful Decoder, Pixel Estimates, Pixels, DNN, Predict, Feature Feedback]

What content matters?

Content Priorities, Camera Angle, …

Mobile Offloading for Vision

[Figure labels: Audio, Video, Edge, MobileNet, Low Latency Result, Bandwidth, Cloud, Cloud Model, Accurate Cloud Result, Offload, Don’t Offload]

AI Offloader:

• New Content?

• BW Sufficient?

• Edge Correct?

1.2-2.1x accuracy of all-edge, 60-90% BW savings compared to all-cloud
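The three questions the AI offloader asks can be combined in many ways. The Python sketch below is a hand-written rule standing in for whatever the offloader actually learns; the function name, the boolean inputs, and the order of the checks are assumptions.

```python
def should_offload(new_content, bandwidth_sufficient, edge_likely_correct):
    """Offload only when there is new content, the link can carry it,
    and the edge model is not already expected to be right."""
    if not new_content:
        return False  # nothing new to analyze; reuse the earlier result
    if not bandwidth_sufficient:
        return False  # cannot afford the upload right now
    return not edge_likely_correct

print(should_offload(True, True, False))   # True: new, affordable, edge unsure
print(should_offload(True, True, True))    # False: edge result is good enough
print(should_offload(False, True, False))  # False: no new content
```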


Results: Mobile Offloading for Vision

1. Trade-off Accuracy for BW Savings

2. Adapt to edge model accuracy

Results (normalized to all-cloud):

1. 60-90% BW savings

2. 80-90% accuracy of oracle

3. 1.2-2.1x accuracy of all-edge
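For readers unfamiliar with this kind of normalization, the arithmetic can be sketched as follows. The helper names and the numbers are made up for illustration and are not measurements from the talk; the sketch assumes bandwidth savings are measured against the bytes an all-cloud policy would send, and accuracy is reported as a multiple of a baseline policy.

```python
def bandwidth_savings(bytes_sent, bytes_all_cloud):
    """Fraction of all-cloud traffic avoided (0.0 = no savings, 1.0 = nothing sent)."""
    return 1.0 - bytes_sent / bytes_all_cloud

def relative_accuracy(accuracy, baseline_accuracy):
    """Accuracy expressed as a multiple of a baseline (e.g. all-edge or oracle)."""
    return accuracy / baseline_accuracy

# Made-up numbers for illustration only.
print(bandwidth_savings(bytes_sent=250, bytes_all_cloud=1000))  # 0.75
print(relative_accuracy(accuracy=0.75, baseline_accuracy=0.5))  # 1.5
```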


[Figure labels: Edge MobileNet v1, v2; Accuracy]

Insight: Bandwidth and Task-Aware Delivery

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

3. Use Off-the-Shelf DNNs

[Figure labels: Feature Extractor/Filter, Decoder / Estimator, Black-box]

Problem Insights

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

3. Use Off-the-Shelf DNNs

Proposal: Bandwidth and Task-Aware Video Delivery

Machine Perception

Deep-dive into components

[Figure labels: Edge, Cloud]

1. Distributed Edge Encoders

[Figure: edge encoder diagram. Labels: Edge Device, Wireless Network, Data Center, Feature Feedback. Equation illegible in source.]

2. Centralized Active Decoder

[Figure: active decoder diagram. Labels: Data Center, Decoder, Pretrain, Predict. Equations illegible in source.]

3. Feature Feedback from the Cloud

[Figure: feature feedback diagram. Labels: Edge Device, Active Decoder, Pretrain, Predict, Feature Feedback. Equation illegible in source.]
