LAVEA: Latency-Aware Video Analytics on Edge Computing Platform
Shanhe Yi¹, Zijiang Hao¹, Qingyang Zhang²³, Quan Zhang², Weisong Shi², Qun Li¹
¹College of William and Mary  ²Wayne State University  ³Anhui University, China
Video data is a gold mine.
But you have to mine it quickly.
How to do low-latency video analytics?
Motivation - Amber Alert
• Image acquisition • License plate extraction • License plate analysis • Character recognition
• Video analytic tasks are computation-intensive and bandwidth-hungry
• Run on mobile or IoT devices • Computational latency • Battery drain • Heat dissipation
• Run on cloud: • Transmission latency • Bandwidth cost
How Can Edge Computing Help?
• Edge Computing Network
[Diagram: clients connect through WiFi base stations over a LAN to edge computing servers, and over the WAN to the cloud]
Edge-front: edge computing services can be as ubiquitous as Internet access.
How Can Edge Computing Help?
• Feasibility of leveraging edge computing node
[Figure: round-trip time (ms) from wired, WiFi 5 GHz, and WiFi 2.4 GHz clients to the wired edge, WiFi 5 GHz edge, WiFi 2.4 GHz edge, EC2 east, and EC2 west nodes]
Round-trip time (RTT): the wired connection is the best; WiFi 5 GHz has a larger mean and variance compared to the cloud node in the closest region.
[Figure: bandwidth (Mbps) from wired, WiFi 5 GHz, and WiFi 2.4 GHz clients to the same five nodes]
How Can Edge Computing Help?
• Feasibility of leveraging edge computing node
Bandwidth (BW): all clients benefit from utilizing a wired or advanced-wireless edge computing node.
Edge computing node has huge advantages in latency & bandwidth!
How shall we provide low-latency video analytics in edge computing system?
Video Analytics meets Edge Computing
Latency Flexibility
Response Time Minimization Problem
• Client task offloading selection • Offloaded task prioritization • Offloaded task placement
Edge Computing Platform Design
• Serverless architecture • Edge computing service
• Offloading service • Queueing service • Scheduling service
• Introduction • System Design Overview • Edge Computing Services • Evaluation • Conclusion
System Design - Edge Client
• Resource-constrained devices
  • run lightweight data processing locally
  • offload heavy tasks to nearby edge computing nodes
• Profiler
  • collects task performance
• Offloading Controller
  • acts as an agent to fulfill offloading decisions
System Design - Edge Computing Node
[Diagram: edge computing node. The host OS provides OS-level virtualization; containers host the Data Store, Offloading, Queueing, Scheduling, Monitoring, and Profiler services behind an Edge Front Gateway. Producers feed a Task Queue managed by a Queue Prioritizer and a Workload Optimizer; a Task Scheduler dispatches tasks to pools of Task Workers.]
• Docker containers: resource allocation/isolation, easy deployment
• Modular services
• Serverless architecture (Function-as-a-Service)
  • AWS Lambda@Edge, Apache OpenWhisk
  • Event-based micro-service framework
System Design - Edge Computing Node
• A user request contains:
  • the event of interest (e.g., plate, face, car)
  • the input source
  • the event handler code (function), used to build the Docker image
  • an execution configuration (e.g., a cron job, where to save the result, or triggering another event)
  • a resource configuration (limits on container resources)
System Design - Edge Computing Node
• The user request becomes a task (a Docker command along with its input):
  • the event of interest (e.g., plate, face, car)
  • the input source
  • the event handler code (function) -> Docker image
  • a script for Docker to run
System Design - Edge Computing Node
• Workers consume the tasks in Docker container instances
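The consumption step can be sketched as a worker loop that launches one container per task; the task fields and the injectable runner are assumptions for illustration, not LAVEA's worker code:

```python
import queue
import subprocess

def worker_loop(task_queue, run=lambda cmd: subprocess.run(cmd, check=True)):
    """Pull tasks until a None sentinel; run each as a one-shot container."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        # e.g. ["docker", "run", "--rm", "alpr:latest", "frame.jpg"]
        run(["docker", "run", "--rm", task["image"], *task["args"]])

q = queue.Queue()
q.put({"image": "alpr:latest", "args": ["frame.jpg"]})
q.put(None)
worker_loop(q, run=print)  # dry run: print the docker command instead of executing
```

Injecting `run` keeps the loop testable without a Docker daemon present.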
• Introduction • System Design Overview • Edge Computing Services
  • Offloading Service - Client Task Offloading Problem
  • Queueing Service - Offloaded Task Prioritizing
  • Scheduling Service - Offloaded Task Placement
• Evaluation • Conclusion
Client Task Offloading - System Model
OpenALPR Task Graph
Directed Acyclic Graph
G = (V,E)
Each vertex weight is the computation cost of a task
Each edge weight is the data size of intermediate result
Weights are gathered via profilers, as pre-runtime information.
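The task-graph model above can be sketched in code. The stage names follow the OpenALPR pipeline, while the cost and data-size numbers below are purely illustrative, not profiled values:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Vertex weight = computation cost of the task; edge weight = size of the
# intermediate result passed downstream (both would come from profilers).
graph = {
    "motion_detection": {"cost": 5.0,  "out": {"plate_detection": 1.2}},
    "plate_detection":  {"cost": 20.0, "out": {"plate_analysis": 0.1}},
    "plate_analysis":   {"cost": 8.0,  "out": {"ocr": 0.05}},
    "ocr":              {"cost": 12.0, "out": {}},
}

# Predecessor map for the topological sort: u precedes v iff edge u -> v exists.
preds = {v: [u for u in graph if v in graph[u]["out"]] for v in graph}
order = list(TopologicalSorter(preds).static_order())
# order == ["motion_detection", "plate_detection", "plate_analysis", "ocr"]
```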
Client Task Offloading - Problem Formulation
For each client i and task j, we use an indicator x_{ij} ∈ {0, 1}: if x_{ij} = 0, task j of client i runs locally; otherwise, it runs remotely.

The local execution time of client i:
t_{ij}^{local} = c_{ij} / s_i
(c_{ij}: the computation cost, s_i: the client's processor speed)

The network delay:
t_{ij}^{net} = d_{ij} / r_i + l_i
(d_{ij}: the data size of intermediate results, r_i: the connection rate assigned, l_i: the communication latency)

The remote execution time:
t_{ij}^{remote} = c_{ij} / s_e
(s_e: the edge processor speed)
Client Task Offloading - Problem Formulation
• Bandwidth constraint
• Ping-pong avoidance constraint
• Delay-tolerance constraint
Mixed-Integer Non-Linear Programming (MINLP)
• relax the integer constraints, then solve the resulting NLP with a constrained nonlinear optimization solver (SQP)
• branch & bound
• brute force
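As a toy illustration of the brute-force option, the sketch below enumerates the single local-to-remote split point of a linear pipeline (the ping-pong constraint forbids bouncing back and forth); one client, fixed speeds, and all numbers are hypothetical:

```python
def best_split(costs, data, s_local, s_edge, bw, lat):
    """Enumerate offloading points of a linear task chain.

    costs[i]: computation cost of task i
    data[k]:  size of the input to task k (data[0] = raw input)
    Returns (response_time, k): the first k tasks run locally, the rest
    run on the edge node.
    """
    n = len(costs)
    best = (float("inf"), -1)
    for k in range(n + 1):
        t = sum(costs[:k]) / s_local + sum(costs[k:]) / s_edge
        if k < n:                       # something is offloaded: pay the transfer
            t += data[k] / bw + lat
        best = min(best, (t, k))
    return best

# Hypothetical numbers: the edge is 4x faster, raw frames are large,
# intermediate results are small.
t, k = best_split(costs=[4, 8, 2], data=[10, 2, 1],
                  s_local=1.0, s_edge=4.0, bw=1.0, lat=0.5)
# best: run task 0 locally (it shrinks the data), offload the rest (t == 9.0, k == 1)
```

The real formulation covers multiple clients and shared bandwidth, so a MINLP solver is needed; this only shows why an early, data-reducing stage tends to stay on the client.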
Prioritizing Edge Task Queue
• An offloaded task has two stages:
  • T1: wait for the input or intermediate data (e.g., image or video)
  • T2: process the data and return the result
Stages: T1 (data) -> T2 (compute)
Prioritizing Edge Task Queue
[1] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[2] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
• Schedule offloaded tasks to minimize the makespan time
• Flow shop model and Johnson's rule [1]

JOB  T1  T2
J1    1   3
J2    5   2
J3    4   6

Pick the job with the smallest stage time: smallest on T1 -> head; smallest on T2 -> tail.
Prioritizing Edge Task Queue
Applying the rule: J1 (T1 = 1, the smallest remaining stage time) goes to the head; J2 (T2 = 2) goes to the tail; J3 fills the remaining slot.
Final order: J1, J3, J2
Prioritizing Edge Task Queue
• How to apply Johnson's rule when there are dependencies among tasks?
• Our heuristic solution for tasks with dependencies [2]
  • Group tasks with dependencies
  • In each group, find the topological order with the minimal makespan time
  • Apply Johnson's rule directly on the task groups
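Johnson's rule on the three-job example can be sketched as follows; this is a minimal two-machine flow-shop implementation for illustration, not LAVEA's actual scheduler code:

```python
def johnson_order(jobs):
    """Johnson's rule for the two-machine flow shop.
    jobs: {name: (t1, t2)}. The globally smallest remaining stage time
    decides: a stage-1 minimum goes to the head, a stage-2 minimum to the tail."""
    head, tail, remaining = [], [], dict(jobs)
    while remaining:
        name, (t1, t2) = min(remaining.items(), key=lambda kv: min(kv[1]))
        (head.append(name) if t1 <= t2 else tail.insert(0, name))
        del remaining[name]
    return head + tail

def makespan(order, jobs):
    end1 = end2 = 0
    for name in order:
        t1, t2 = jobs[name]
        end1 += t1                   # stage 1 runs the jobs back to back
        end2 = max(end2, end1) + t2  # stage 2 must wait for stage 1's output
    return end2

jobs = {"J1": (1, 3), "J2": (5, 2), "J3": (4, 6)}
order = johnson_order(jobs)  # ["J1", "J3", "J2"], makespan 13
```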
Inter Edge Collaboration
• Motivation
  • As the number of nearby client nodes increases, the edge-front node can become overloaded and unresponsive to new requests
  • Collaborate with nearby edge nodes by placing tasks on not-so-busy neighbors
• Problem
  • Given an edge-front node and its neighbors, when the edge-front is overloaded, how do we select a neighbor as the task placement target?
Inter Edge Collaboration
• Naive schemes
  • Shortest Transmission Time First (STTF)
    • Periodically recalibrates the latency of transmitting data
    • Neglects the existing workload of the neighbor
  • Shortest Queue Length First (SQLF)
    • Queries nearby edge nodes about their task queue lengths
    • Neglects network latency
    • Scalability issue: pull-based
• Our scheme: Shortest Scheduling Latency First (SSLF)
  • Dispatches a special probe task to nearby edge nodes
  • When the special task is executed, the node sends a response to the edge-front node: push-based
  • Predicts the response time (regression analysis)
  • Piggybacked updates
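The response-time prediction in SSLF can be sketched with a one-variable least-squares fit over probe-task measurements; the data points and the choice of queue length as the feature are illustrative assumptions:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, in pure Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical probe history: (queue length when the probe was dispatched,
# scheduling latency in seconds reported by the push-based response).
history = [(2, 0.5), (4, 0.9), (6, 1.3), (8, 1.7)]
a, b = fit_line([q for q, _ in history], [t for _, t in history])

def predicted_latency(queue_len):
    return a * queue_len + b
```

The edge-front node would then place tasks on the neighbor with the smallest predicted latency, refreshing the fit as piggybacked updates arrive.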
• Introduction • System Design Overview • Edge Computing Services • Evaluation • Conclusion
Evaluations - Setup
• Datasets
  • Caltech Vision Group 2001 testing image database: 126 images at 896x592 resolution
  • Self-collected video containing license plates, converted to different resolutions (640x480, 960x720, 1280x960, 1600x1200)
  • OpenALPR testing dataset: 22 car images of various resolutions (405x540 to 2514x1210 pixels, 316 KB to 2.85 MB)
• Testbed
  • Four edge nodes (one edge-front, three neighbors), cable-connected, quad-core CPU, 4 GB memory
  • Two types of client nodes: Raspberry Pi 2 (cable) and Raspberry Pi 3 (WiFi)
  • Cloud node: t2.large EC2 instance
Evaluations - Task Profiling
[Figure: per-stage execution time (ms) of Motion Detection, Plate Detection, Plate Analysis, and OCR, for workload 1 (896x592) and workload 2 (640x480, 960x720, 1280x960, 1600x1200), on four platforms: client RPi2, client RPi3, edge (i7 quad-core), and cloud (EC2 t2.large)]
Evaluations - Task Offloading Selection
• Overall, by offloading tasks to the edge computing platform, the chosen application achieved a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration.
• For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.
[Figure: response time per frame per client (s) at resolutions 640x480 to 1600x1200 under workload 2, comparing Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only; left panel: wired RPi2, right panel: 2.4 GHz WiFi RPi3]
Evaluations - Edge Task Queue Prioritizing
• Simulation • Baselines: shortest IO first, longest CPU last • Result shows LCPUL is the worst among three schemes and
our scheme outperforms the shortest IO first scheme.
[Figure: response time (s) vs. number of task offloading requests (5 to 35) for our scheme, SIOF, and LCPUL]
Evaluations - Inter-Edge Collaboration
• The STTF scheme tends to place tasks on the edge node with the lowest transmission overhead but the heaviest workload (node 1)
• The SQLF scheme tends to place tasks on the edge node with the lightest workload but the highest transmission overhead (node 3)
• The SSLF scheme considers both the transmission time and the waiting time in the queue, and therefore achieves the best performance
[Figure: four panels (No task placement, STTF, SQLF, SSLF) showing throughput (tasks/sec, 0 to 3) over time (0 to 12 min) for the edge-front node and edge nodes #1, #2, and #3]
Conclusion
• We built LAVEA, a low-latency edge video analytics system that
  • coordinates nearby client, edge, and remote cloud nodes, and
  • turns video feeds into semantic information in early stages, at places closer to the users.
• We formulated an optimization problem for offloading task selection and prioritized the task queue to minimize the response time.
• For a saturating workload on the edge-front node, we proposed and compared various task placement schemes tailored for inter-edge collaboration.
Q&A
End. Thank you.