Making OpenVX Really “Real Time” Ming Yang 1 , Tanya Amert 1 , Kecheng Yang 1,2 , Nathan Otterness 1 , James H. Anderson 1 , F. Donelson Smith 1 , and Shige Wang 3 1 The University of North Carolina at Chapel Hill 2 Texas State University 3 General Motors Research

Page 1: Making OpenVX Really “Real Time”

Making OpenVX Really “Real Time”

Ming Yang (1), Tanya Amert (1), Kecheng Yang (1, 2), Nathan Otterness (1), James H. Anderson (1), F. Donelson Smith (1), and Shige Wang (3)

(1) The University of North Carolina at Chapel Hill
(2) Texas State University
(3) General Motors Research

Page 2: Making OpenVX Really “Real Time”

700 ms

Page 3: Making OpenVX Really “Real Time”
Page 4: Making OpenVX Really “Real Time”

A new approach for graph scheduling

Page 5: Making OpenVX Really “Real Time”

Shorter response time + Less capacity loss

Page 6: Making OpenVX Really “Real Time”

1. State of the art

2. Our approach

3. Future work


Page 7: Making OpenVX Really “Real Time”

Source: https://www.khronos.org/openvx/

[Figure: Example OpenVX Graph, with four OpenVX nodes between Native Camera Control and Downstream Application Processing.]

Graph-based architecture

[Figure: applications on top of GPU, FPGA, and DSP hardware.]

Portability to diverse hardware

Does OpenVX really target “real-time” processing?

Page 8: Making OpenVX Really “Real Time”

Source: https://www.khronos.org/openvx/

Does OpenVX really target “real-time” processing?

1. It lacks real-time concepts
2. Entire graphs = monolithic schedulable entities

[Figure: Example OpenVX Graph, with four OpenVX nodes between Native Camera Control and Downstream Application Processing.]

Page 9: Making OpenVX Really “Real Time”


Page 10: Making OpenVX Really “Real Time”

Source: https://www.khronos.org/openvx/

Does OpenVX really target “real-time” processing?

1. It lacks real-time concepts
2. Entire graphs = monolithic schedulable entities

[Figure: graph with nodes A, B, C, D. Monolithic scheduling timeline: A, B, C, D, A, …]

Page 11: Making OpenVX Really “Real Time”

Prior Work

• OpenVX nodes = schedulable entities [23, 51]

Coarse-grained scheduling

[Figure: graph with nodes A, B, C, D. Coarse-grained scheduling timeline with one row per task: Task A, Task B, Task C, Task D.]

Page 12: Making OpenVX Really “Real Time”

Prior Work

• OpenVX nodes = schedulable entities [23, 51]

Coarse-grained scheduling

Remaining problems:
1. More parallelism remains to be explored.
2. Suspension-oblivious analysis was applied, which causes capacity loss.

Page 13: Making OpenVX Really “Real Time”

This Work: Fine-Grained Scheduling

Page 14: Making OpenVX Really “Real Time”


1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Page 15: Making OpenVX Really “Real Time”


1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Page 16: Making OpenVX Really “Real Time”

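Coarse-Grained Scheduling

[Figure: graph with nodes A, B, C, D and a timeline with rows Task A, Task B, Task C, Task D. Annotation: suspension for GPU execution.]

Fine-Grained Scheduling

[Figure: graph with nodes A, C, D, E, F, G and a timeline with rows Task A, Task E, Task C, Task D, Task F, Task G. Annotation: GPU execution.]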

Page 17: Making OpenVX Really “Real Time”


1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Page 18: Making OpenVX Really “Real Time”

Deriving Response-Time Bounds for a DAG*

Step 1: Schedule the nodes as sporadic tasks

Step 2: Compute bounds for every node

Step 3: Sum the bounds of nodes on the critical path


* C. Liu and J. Anderson, “Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss,” in RTSS, 2013.
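As a rough illustration of Step 3, the sketch below sums per-node bounds along every source-to-sink path of a small DAG and keeps the largest sum. The graph shape, the node names, and the bound values are hypothetical placeholders, and treating the critical path as the path with the largest summed bound is an assumption made for this example, not a statement of the cited analysis.

```python
from functools import lru_cache

# Hypothetical per-node response-time bounds in ms (Step 2 would produce these).
node_bound = {"A": 10.0, "B": 25.0, "C": 15.0, "D": 5.0}

# Hypothetical edges: A -> B, A -> C, B -> D, C -> D.
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def end_to_end_bound(source):
    """Largest sum of node bounds over all paths that start at `source`."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = [longest_from(nxt) for nxt in successors[node]]
        return node_bound[node] + (max(tails) if tails else 0.0)

    return longest_from(source)


# The heaviest path here is A -> B -> D: 10 + 25 + 5.
dag_bound = end_to_end_bound("A")
print(dag_bound)
```

Memoizing `longest_from` keeps the traversal linear in the number of edges, which matters once a graph has many converging paths.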

Page 19: Making OpenVX Really “Real Time”

Deriving Response-Time Bounds for a DAG

[Figure: DAG with nodes A, B, C, D, E, F.]

Page 20: Making OpenVX Really “Real Time”


Page 21: Making OpenVX Really “Real Time”

Deriving Response-Time Bounds for a DAG

[Figure: DAG with nodes A, B, C, D, E, F, with nodes marked as running on the CPU or the GPU.]

Need a response-time bound analysis for GPU tasks

Page 22: Making OpenVX Really “Real Time”

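A System Model of GPU Tasks

τ_i = (C_i, T_i, B_i, H_i)

• C_i: per-block worst-case workload
• T_i: period
• B_i: number of blocks
• H_i: number of threads per block (or block size)

Example: τ_1 = (3076, 6, 2, 1024)

[Figure: schedule of τ_1 on SM0 and SM1 over time 0 to 6, with 3 marked on the time axis and 2048 marked on each SM. C_1, T_1, B_1, and H_1 = 1024 are annotated.]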
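The four task parameters can be held in a small record, as in the sketch below. The class name `GpuTask` and the two derived properties are illustrative assumptions: they are simple products of the slide's parameters, not definitions taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTask:
    """One GPU task, using the slide's tuple order (C, T, B, H)."""

    C: int  # per-block worst-case workload
    T: int  # period
    B: int  # number of blocks
    H: int  # number of threads per block (block size)

    @property
    def job_workload(self):
        # Assumed derived quantity: worst-case workload summed over all blocks of a job.
        return self.B * self.C

    @property
    def threads_per_job(self):
        # Assumed derived quantity: threads requested when all blocks run at once.
        return self.B * self.H


# The example task from the slide.
tau_1 = GpuTask(C=3076, T=6, B=2, H=1024)
print(tau_1.job_workload, tau_1.threads_per_job)
```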

Page 23: Making OpenVX Really “Real Time”


Page 24: Making OpenVX Really “Real Time”

Response-Time Bounds Proof Sketch

[Figure: job τ_{k,j} of task τ_k, with the time r_{k,j} + R_k marked.]

1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.

[Figure: timeline of releases 1 to 5, showing jobs 1 to 5 without intra-task parallelism and with intra-task parallelism.]
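The role of intra-task parallelism can be illustrated with a toy model, sketched below. It is not the counterexample from the talk: the period, the execution time, and the assumption that enough processing capacity is always free are all hypothetical. When each job takes longer than the period and consecutive jobs of one task must run one after another, response times keep growing; when jobs of the same task may overlap, they stay constant.

```python
def response_times(period, exec_time, num_jobs, intra_task_parallelism):
    """Response time of each job of one periodic task (toy model).

    Assumes capacity is always available, so the only possible delay is
    waiting for the previous job of the same task to finish.
    """
    responses = []
    prev_finish = 0
    for j in range(num_jobs):
        release = j * period
        if intra_task_parallelism:
            start = release  # jobs of the same task may overlap
        else:
            start = max(release, prev_finish)  # jobs run strictly in sequence
        finish = start + exec_time
        prev_finish = finish
        responses.append(finish - release)
    return responses


# Hypothetical numbers: five releases, execution time longer than the period.
serial = response_times(period=2, exec_time=3, num_jobs=5, intra_task_parallelism=False)
parallel = response_times(period=2, exec_time=3, num_jobs=5, intra_task_parallelism=True)
print(serial)    # [3, 4, 5, 6, 7]
print(parallel)  # [3, 3, 3, 3, 3]
```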

Page 25: Making OpenVX Really “Real Time”

Response-Time Bounds Proof Sketch

[Figure: schedule on SM0 and SM1 over time, showing job τ_{k,j} of task τ_k, the time r_{k,j}, and the interval R_k ending at r_{k,j} + R_k.]

1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.
2. We then bound the unfinished workload from jobs released at or before r_{k,j}.
3. We prove the job finishes before r_{k,j} + R_k.

Page 26: Making OpenVX Really “Real Time”


1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Page 27: Making OpenVX Really “Real Time”

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

• Application: Histogram of Oriented Gradients (HOG)

[Figure: HOG processing graphs in two panels, "CPU+GPU Execution (Coarse-Grained)" and "GPU Execution (Fine-Grained)". Node labels: vxHOGCellsNode, vxHOGFeaturesNode. Stage labels: Resize Image, Compute Gradients, Compute Orientation Histograms, Normalize Orientation Histograms.]

Page 28: Making OpenVX Really “Real Time”

• Application: Histogram of Oriented Gradients (HOG)

• 6 instances

• 33 ms period

• 30,000 samples

• Platform: NVIDIA Titan V GPU + Two eight-core Intel CPUs.

• Schedulers: G-EDF, G-FL (fair-lateness)


Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

Page 29: Making OpenVX Really “Real Time”

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

[Figure: response-time distribution plot. x-axis: Time; y-axis: % of samples. Left is better. Example annotation: 50% of samples have a response time less than 60 ms.]
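A point on such a plot can be reproduced from raw measurements, as in the sketch below. The sample values and the helper names `fraction_below` and `percentile` are made up for illustration and are not data from the case study.

```python
import math


def fraction_below(samples, threshold):
    """Fraction of samples with a response time strictly below `threshold`."""
    return sum(1 for s in samples if s < threshold) / len(samples)


def percentile(samples, pct):
    """Smallest sample such that at least pct% of the samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in ms.
response_ms = [42.0, 55.0, 58.0, 61.0, 75.0, 90.0]

frac = fraction_below(response_ms, 60.0)  # three of six samples are below 60 ms
median_like = percentile(response_ms, 50)
print(frac, median_like)
```

A curve that sits further left reaches any given percentage of samples at a smaller response time, which is why "left is better".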

Page 30: Making OpenVX Really “Real Time”

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

FL: fair-lateness

| | [1] Fine-grained (G-FL) | [2] Coarse-grained (G-EDF) | [3] Monolithic (G-EDF) |
|---|---|---|---|
| Average Response Time (ms) | 65.99 | 136.57 | 84669.47 |
| Maximum Response Time (ms) | 125.66 | 427.07 | 170091.06 |

Page 31: Making OpenVX Really “Real Time”


Page 32: Making OpenVX Really “Real Time”


Page 33: Making OpenVX Really “Real Time”


Page 34: Making OpenVX Really “Real Time”

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

FL: fair-lateness

| | [1] Fine-grained (G-FL) | [2] Coarse-grained (G-EDF) | [3] Monolithic (G-EDF) |
|---|---|---|---|
| Average Response Time (ms) | 65.99 | 136.57 | 84669.47 |
| Maximum Response Time (ms) | 125.66 | 427.07 | 170091.06 |

[Figure: response-time distribution plots with curves labeled [1], [2], and [3].]

• Half the average response time
• One-third the maximum response time

Page 35: Making OpenVX Really “Real Time”


Page 36: Making OpenVX Really “Real Time”


Page 37: Making OpenVX Really “Real Time”


Page 38: Making OpenVX Really “Real Time”


Page 39: Making OpenVX Really “Real Time”

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

FL: fair-lateness

| | [1] Fine-grained (G-FL) | [2] Coarse-grained (G-EDF) | [3] Monolithic (G-EDF) |
|---|---|---|---|
| Average Response Time (ms) | 65.99 | 136.57 | 84669.47 |
| Maximum Response Time (ms) | 125.66 | 427.07 | 170091.06 |
| Analytical Bound (ms) | 542.39 | N/A | N/A |

[Figure: response-time distribution plots with curves labeled [1], [2], and [3].]

An alert driver takes 700 ms to react.

• The fair-lateness-based scheduler is beneficial: it reduced node response times by up to 9.9%.

• The overhead of supporting fine-grained scheduling was 14.15%.
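The comparisons drawn from the table can be checked with a few lines of arithmetic. The sketch below uses only the reported numbers; the variable names are illustrative.

```python
# Reported values in ms, copied from the case-study table.
avg_ms = {"fine": 65.99, "coarse": 136.57, "monolithic": 84669.47}
max_ms = {"fine": 125.66, "coarse": 427.07, "monolithic": 170091.06}
fine_grained_bound_ms = 542.39
driver_reaction_ms = 700.0

# Fine-grained relative to coarse-grained scheduling.
avg_ratio = avg_ms["fine"] / avg_ms["coarse"]  # roughly one half
max_ratio = max_ms["fine"] / max_ms["coarse"]  # roughly one third

# Slack between the analytical bound and the driver reaction time.
margin_ms = driver_reaction_ms - fine_grained_bound_ms

print(f"average ratio: {avg_ratio:.2f}")
print(f"maximum ratio: {max_ratio:.2f}")
print(f"margin below 700 ms: {margin_ms:.2f} ms")
```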

Page 40: Making OpenVX Really “Real Time”

Conclusions

1. Fine-grained scheduling

2. Response-time bounds analysis for GPU tasks

3. Case study


Page 41: Making OpenVX Really “Real Time”

Future Work

1. Cycles in the graph

2. Other resource constraints

3. Schedulability studies


Page 42: Making OpenVX Really “Real Time”

Thanks!