POWERING THE DEEP LEARNING ECOSYSTEM

COMPUTER VISION

OBJECT DETECTION

IMAGE CLASSIFICATION

SPEECH & AUDIO

VOICE RECOGNITION

LANGUAGE TRANSLATION

NATURAL LANGUAGE PROCESSING

RECOMMENDATION ENGINES

SENTIMENT ANALYSIS

DEEP LEARNING FRAMEWORKS

Mocha.jl

NVIDIA DEEP LEARNING SDK

developer.nvidia.com/deep-learning-software

INFERENCE WITH TENSORRT

❑ Introduction to TensorRT

❑ TensorRT 7: What’s New

❑ TensorRT workflow

❑ TensorRT Plugin

❑ Plugin Sample

TENSORRT: GPU INFERENCE ENGINE

A COMPLETE DL PLATFORM

MANAGE, TRAIN, DEPLOY

DIGITS

MANAGE / AUGMENT, TRAIN, TEST

GPU INFERENCE ENGINE

DATACENTER, AUTOMOTIVE, EMBEDDED

TensorRT works at the deploy stage

Why TensorRT?

1.6L Engine

ONNX: Added ConstantOfShape, DequantizeLinear, Equal, Erf, Expand, Greater, GRU, Less, Loop, LRN, LSTM, Not, PRelu, QuantizeLinear, RandomUniform, RandomUniformLike, Range, RNN, Scan, Sqrt, Tile, and Where

NEURAL NETWORK (pre-trained FP32 model and network) → OPTIMIZATION ENGINE → PLAN

PLAN → EXECUTION ENGINE: Input → Output

● Optimized execution engine on GPU for deployment

A serialized PLAN can be reloaded from disk into the TensorRT runtime, so the optimization step does not have to be performed again.
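The build-once, reload-later pattern behind the PLAN file can be sketched in plain Python. This is only a toy illustration under stated assumptions: `build_plan`, `load_or_build`, and the JSON file format are hypothetical stand-ins, not TensorRT calls, and a real PLAN is an opaque binary produced by the TensorRT builder.

```python
import json
import os
import tempfile


def build_plan(network):
    """Hypothetical stand-in for the slow optimization step."""
    build_plan.calls += 1
    return {"layers": [name.upper() for name in network]}


build_plan.calls = 0  # counts how often the expensive step really runs


def load_or_build(network, plan_path):
    """Reload a cached plan from disk, or build and cache it once."""
    if os.path.exists(plan_path):
        with open(plan_path, "r", encoding="utf-8") as f:
            return json.load(f)
    plan = build_plan(network)
    with open(plan_path, "w", encoding="utf-8") as f:
        json.dump(plan, f)
    return plan


plan_path = os.path.join(tempfile.mkdtemp(), "model.plan")
first = load_or_build(["conv", "relu"], plan_path)   # builds and saves
second = load_or_build(["conv", "relu"], plan_path)  # reloads from disk
print(build_plan.calls)  # prints 1: the optimization ran only once
```

In TensorRT itself, the corresponding steps are serializing the built engine and deserializing it through the runtime; the sketch only mirrors the control flow.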

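Figure (un-optimized network): input (from the previous concat) feeds four branches: 1x1 conv.; 1x1 conv. → 3x3 conv.; 1x1 conv. → 5x5 conv.; max pool → 1x1 conv. Each conv. is followed by separate bias and relu layers. The branch outputs are joined by concat and passed to next input.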

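Figure (fused layers): every conv. + bias + relu sequence is replaced by a single CBR layer. input (from the previous concat) feeds 1x1 CBR; 1x1 CBR → 3x3 CBR; 1x1 CBR → 5x5 CBR; max pool → 1x1 CBR. The branch outputs are joined by concat and passed to next input.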
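The CBR boxes stand for conv. + bias + relu fused into one layer. A minimal sketch of such a fusion pass over a linear list of layers follows; it is a toy graph rewrite in plain Python, not TensorRT code, and the `(op, name)` tuple representation and the function name `fuse_cbr` are assumptions made for illustration.

```python
def fuse_cbr(layers):
    """Return a new layer list with each adjacent conv/bias/relu run fused.

    `layers` is a list of (op, name) tuples in execution order, for example
    ("conv", "3x3"). Only three directly adjacent ops are fused.
    """
    fused = []
    i = 0
    while i < len(layers):
        ops = [op for op, _ in layers[i:i + 3]]
        if ops == ["conv", "bias", "relu"]:
            kernel = layers[i][1]          # keep the kernel size as the name
            fused.append(("CBR", kernel))
            i += 3                         # skip the three fused layers
        else:
            fused.append(layers[i])        # leave any other layer untouched
            i += 1
    return fused


branch = [
    ("conv", "1x1"), ("bias", ""), ("relu", ""),
    ("conv", "3x3"), ("bias", ""), ("relu", ""),
    ("max pool", ""),
]
print(fuse_cbr(branch))
# prints [('CBR', '1x1'), ('CBR', '3x3'), ('max pool', '')]
```

Fusing the three steps means one kernel launch and one pass over memory instead of three, which is the motivation for this kind of rewrite.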

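Figure (merged 1x1 layers): the three 1x1 CBR layers that read the same input are combined into one wider 1x1 CBR, whose output feeds 3x3 CBR, 5x5 CBR, and concat. The max pool → 1x1 CBR branch is unchanged. concat passes the result to next input.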
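Merging several 1x1 CBR layers that share one input amounts to stacking their weights into a single wider layer and splitting the output afterwards. The sketch below checks this for one pixel in plain Python; the helper names (`cbr_1x1`, `fuse_horizontal`, `split`) and the sample weights are hypothetical, chosen only to make the equivalence visible.

```python
def cbr_1x1(weights, bias, pixel):
    """1x1 conv + bias + relu at one pixel: one output per weight row."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, pixel)) + b)
        for row, b in zip(weights, bias)
    ]


def fuse_horizontal(layers):
    """Stack (weights, bias) of layers that all read the same input."""
    weights = [row for w, _ in layers for row in w]
    bias = [b for _, bs in layers for b in bs]
    sizes = [len(bs) for _, bs in layers]   # output channels per layer
    return weights, bias, sizes


def split(outputs, sizes):
    """Cut the wide output back into one slice per original layer."""
    parts, start = [], 0
    for n in sizes:
        parts.append(outputs[start:start + n])
        start += n
    return parts


pixel = [1.0, -2.0, 0.5]                                  # 3 input channels
layer_a = ([[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]], [0.0, 1.0])
layer_b = ([[-1.0, 1.0, 0.0]], [0.5])
layer_c = ([[0.0, 0.0, 4.0]], [-1.0])

separate = [cbr_1x1(w, b, pixel) for w, b in (layer_a, layer_b, layer_c)]
w, b, sizes = fuse_horizontal([layer_a, layer_b, layer_c])
fused = split(cbr_1x1(w, b, pixel), sizes)
print(fused == separate)  # prints True
```

The wide layer reads the input once and runs as one larger operation, instead of three small ones reading the same data.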

Concat elision

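Figure: input feeds the merged 1x1 CBR (→ 3x3 CBR, 5x5 CBR) and max pool → 1x1 CBR. The concat layers are gone; the branch outputs go straight to next input.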
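Concat elision can be pictured as letting each branch write directly into its slice of a preallocated output buffer, so no separate copy step is needed. The following is a toy comparison in plain Python; the branch functions and buffer layout are assumptions for illustration and do not reflect how TensorRT manages memory internally.

```python
def scale_branch(factor, width):
    """Hypothetical branch: writes factor * x[i] into out[offset + i]."""
    def write(x, out, offset):
        for i in range(width):
            out[offset + i] = factor * x[i]
    return write


def run_with_concat(branches, widths, x):
    """Each branch fills a temporary buffer, then concat copies them."""
    merged = []
    for branch, width in zip(branches, widths):
        temp = [0.0] * width
        branch(x, temp, 0)
        merged.extend(temp)          # the extra copy performed by concat
    return merged


def run_concat_elided(branches, widths, x):
    """Branches write straight into their slice of the final buffer."""
    out = [0.0] * sum(widths)
    offset = 0
    for branch, width in zip(branches, widths):
        branch(x, out, offset)       # no temporary, no copy
        offset += width
    return out


x = [1.0, 2.0, 3.0]
widths = [2, 3]
branches = [scale_branch(10.0, 2), scale_branch(-1.0, 3)]
print(run_with_concat(branches, widths, x) == run_concat_elided(branches, widths, x))
# prints True
```

Both versions produce the same output; the elided version simply skips the intermediate buffers and the copy.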

Concurrency

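Figure: the independent branches (input → 1x1 CBR → 3x3 CBR and 5x5 CBR; input → max pool → 1x1 CBR) can execute concurrently before their outputs reach next input.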

main()

Build engine:

• create parser

• setWorkSpace

• setkType

• serialize

• free

PluginFactory:

• createPlugin

• deserialization plugin implementation

• isPlugin

• free

MyPlugin:

• MyPlugin()/~MyPlugin()

• getNbOutputs()

• getOutputDimensions()

• initialize()

• terminate()

• enqueue() → CUDA kernel function

• serialize()/deserialize()

do_inference:

• bind the buffers

• create GPU buffers and a stream

• transfer data

• enqueue

• release the stream and the buffers
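The division of work between the build phase, the plugin, and inference can be sketched with a mock class. This is a simplified illustration in plain Python, not the TensorRT plugin interface: the method names are borrowed from the MyPlugin list, but their signatures, the `scale` parameter, the JSON serialization, and the call order shown are assumptions. In a real plugin, `enqueue()` would launch a CUDA kernel on the given stream.

```python
import json


class MyPlugin:
    """Mock plugin that scales its input and records lifecycle calls."""

    def __init__(self, scale=2.0):
        self.scale = scale
        self.calls = []

    def getNbOutputs(self):
        self.calls.append("getNbOutputs")
        return 1

    def getOutputDimensions(self, input_dims):
        self.calls.append("getOutputDimensions")
        return list(input_dims)      # element-wise op keeps the shape

    def initialize(self):
        self.calls.append("initialize")

    def enqueue(self, inputs):
        self.calls.append("enqueue")
        # A real plugin would launch its CUDA kernel here.
        return [self.scale * v for v in inputs]

    def terminate(self):
        self.calls.append("terminate")

    def serialize(self):
        self.calls.append("serialize")
        return json.dumps({"scale": self.scale}).encode("utf-8")

    @classmethod
    def deserialize(cls, blob):
        return cls(**json.loads(blob.decode("utf-8")))


# Build phase: the builder queries the plugin, then the engine is serialized.
build_side = MyPlugin(scale=2.0)
nb_outputs = build_side.getNbOutputs()
dims = build_side.getOutputDimensions([3])
blob = build_side.serialize()

# Inference phase: the plugin is recreated from the serialized data and run.
run_side = MyPlugin.deserialize(blob)
run_side.initialize()
result = run_side.enqueue([1.0, 2.0, 3.0])
run_side.terminate()
print(result)          # prints [2.0, 4.0, 6.0]
print(run_side.calls)  # prints ['initialize', 'enqueue', 'terminate']
```

The point of the sketch is the separation: shape queries and serialization happen while building, while initialize, enqueue, and terminate belong to execution.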

IPluginV2Ext

IPluginV2IOExt

IPluginV2DynamicExt

IPluginCreator

MyPlugin enqueue() → CUDA kernel

https://developer.nvidia-china.com

THANK YOU