40
Bill Veenhuis | [email protected] NVIDIA FOR DEEP LEARNING

NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

  • Upload
    others

  • View
    16

  • Download
    0

Embed Size (px)

Citation preview

Page 1: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

Bill Veenhuis | [email protected]

NVIDIA FOR DEEP LEARNING

Page 2: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

2

Nvidia is the world’s leading ai platform

ONE ARCHITECTURE — CUDA

Page 3: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

33

GPUCPU

GPU: Perfect Companion for Accelerating Apps & A.I.

Page 4: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

44

AGENDA&

TOPICS

• Intro to AI

• Deep Learning Intro

• NVIDIA’s DIGITS

• Autoencoding Enhancement

• TensorRT

Page 5: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

5

Intro to AI

Page 6: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

6

ARTIFICIAL NEURONS

From Stanford cs231n lecture notes

Biological neuron

w1 w2 w3

x1 x2 x3

y

y=F(w1x1+w2x2+w3x3)

Artificial neuron

Weights (Wn) = parameters

Page 7: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

7

ARTIFICIAL NEURAL NETWORKA collection of simple, trainable mathematical units that collectively

learn complex functions

Input layer Output layer

Hidden layers

Given sufficient training data an artificial neural network can approximate very complexfunctions mapping raw data to output decisions

Page 8: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

8

DEEP NEURAL NETWORK (DNN)

Input Result

Application components:

Task objectivee.g. Identify face

Training data10-100M images

Network architecture~10s-100s of layers1B parameters

Learning algorithm~30 Exaflops1-30 GPU days

Raw data Low-level features Mid-level features High-level features

Page 9: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

9

WHAT IS DEEP LEARNING?

Page 10: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

10

Accomplishing complex goals

Page 11: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

11

Difference in Workflow

Input

Hand

Designed

Features

Output

Classic Machine Learning [ 1990 : now ]Examples [ Regression and SVMs ]

Model /

Mapping

Example [ Conv Net ]

InputSimple

FeaturesOutput

Deep/End-to-End Learning [ 2012 : now ]

Model/

Mapping

Complex

Features

Page 12: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

12

Traditional Workflow

Input

Hand

Designed

Features

Output

Classic Machine Learning [ 1990 : now ]Examples [ Regression and SVMs ]

Model /

Mapping

Challenge in Slack channel: How would you describe this image to someone (or something) blind?

Difficult: From it’s raw pixels.Medium: From geometric primitives (lines, curves, colors)Easy: Using any words that you may know

Page 13: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

13

Deep Learning Workflow

Example [ Conv Net ]

InputSimple

FeaturesOutput

Deep/End-to-End Learning [ 2012 : now ]

Model/

Mapping

Complex

Features

Experience: Trust Neural Network to learn features and model by providing inputs and outputs.

Key Skill: Experience (data) creation

Page 14: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

14

NVIDIA’S DIGITS

Page 15: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

15

NVIDIA’S DIGITSInteractive Deep Learning GPU Training System

• Simplifies common deep learning tasks such as:

• Managing data

• Designing and training neural networks on multi-GPU systems

• Monitoring performance in real time with advanced visualizations

• Completely interactive so data scientists can focus on designing and training networks rather than programming and debugging

• Open source

Page 16: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

1616

Process Data Configure DNN VisualizationMonitor Progress

Interactive Deep Learning GPU Training System

NVIDIA’S DIGITS

Page 17: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

17

DIGITS - MODEL

Differences may exist between model tasks

Can anneal the learning rate

Define custom layers with Python

Page 18: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

18

DIGITS – VISUALIZATION RESULTS

Page 19: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

19

ENHANCING IMAGES WITH AN AI AUTOENCODER

Page 20: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

20

A great candidate for Deep Learning!

Page 21: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

21

Training Set of images.

1 sample per pixel

• It requires pairs of noisy and noise-free

images. The network will learn to

remove the noise from the images.

• We can then deploy this trained model

to any image we want to denoise.

(inference)

Page 22: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

228

Apply trained network to noisy

images

Deep Learning for Image Denoising

Inferencing

Collect iamhes

Add Noise to training images

Training on progression of images

Training Data

Training

Trained network detects

noise and reconstructs

Trained Neural Network

Page 23: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

23

Learning about images (CNN)

Input Result

Application components:

Task objectivee.g. Identify face

Training data10-100M images

Network architecture~10s-100s of layers1B parameters

Learning algorithm~30 Exaflops1-30 GPU days

Raw data Low-level features Mid-level features High-level features

Page 24: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

24

Autoencoder – in Action

Page 25: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

25

Page 26: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

26

10 VCAs 15 Minutes

1 VCA 2.5 hours

1 M6000 20 hours

Apply Noise Apply Noise

Page 27: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

2727

Provide image to autoencoder enhance

Page 28: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

28

Page 29: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

29

TensorRT

SOFTWARE INFERENCING PERFROMANCE EHANCEMENT

Page 30: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3030

NVIDIA DEEP LEARNING SOFTWARE PLATFORM

NVIDIA DEEP LEARNING SDK

TensorRT

Embedded

Automotive

Data center

TRAINING FRAMEWORK

TrainingData

Training

Data Management

Model Assessment

Trained NeuralNetwork

developer.nvidia.com/deep-learning-software

Page 31: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3131

NVIDIA TensorRTHigh-performance deep learning inference for production deployment

developer.nvidia.com/tensorrt

High performance neural network inference engine for production deployment

Generate optimized and deployment-ready models for datacenter, embedded and automotive platforms

Deliver high-performance, low-latency inference demanded by real-time services

Deploy faster, more responsive and memory efficient deep learning applications with INT8 and FP16 optimized precision support

0

1,000

2,000

3,000

4,000

5,000

6,000

7,000

2 8 128

CPU-Only

Tesla P40 + TensorRT (FP32)

Tesla P40 + TensorRT (INT8)

Up to 36x More Image/sec

Batch Size

GoogLenet, CPU-only vs Tesla P40 + TensorRTCPU: 1 socket E4 2690 v4 @2.6 GHz, HT-onGPU: 2 socket E5-2698 v3 @2.3 GHz, HT off, 1 P40 card in the box

Images/

Second

Page 32: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3232

TENSORRT

• Image Classification (AlexNet, GoogleNet, VGG, ResNet)

• Object Detection

• Segmentation

Networks Supported

Not Yet Supported

• RNN/LSTM

• 3D convolutions

• Custom user layers

Page 33: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3333

TENSORRT

• Convolution: Currently only 2D convolutions

• Activation: ReLU, tanh and sigmoid

• Pooling: max and average

• Scale: similar to Caffe Power layer (shift+scale*x)^p

• ElementWise: sum, product or max of two tensors

• LRN: cross-channel only

• Fully-connected: with or without bias

• SoftMax: cross-channel only

• Deconvolution

Layers Types Supported

Page 34: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

34

TENSORRTWorkflow

Training FrameworkOPTIMIZATION USING TensorRT

RUNTIMEUSING TensorRT

PLANNEURALNETWORK

developer.nvidia.com/tensorrt

Page 35: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3535

TENSORRTOptimizations

• Fuse network layers

• Eliminate concatenation layers

• Kernel specialization

• Auto-tuning for target platform

• Tuned for given batch sizeTRAINED

NEURAL NETWORK

OPTIMIZEDINFERENCERUNTIME

developer.nvidia.com/tensorrt

Page 36: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

36

GRAPH OPTIMIZATIONUnoptimized network

concat

max pool

input

next input

3x3 conv.

relu

bias

1x1 conv.

relu

bias

1x1 conv.

relu

bias

1x1 conv.

relu

bias

concat

1x1 conv.

relu

bias

5x5 conv.

relu

bias

Page 37: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

37

GRAPH OPTIMIZATIONVertical fusion

concat

max pool

input

next input

concat

1x1 CBR 3x3 CBR 5x5 CBR 1x1 CBR

1x1 CBR 1x1 CBR

Page 38: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

38

GRAPH OPTIMIZATIONHorizontal fusion

concat

max pool

input

next input

concat

3x3 CBR 5x5 CBR 1x1 CBR

1x1 CBR

Page 39: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

3939

INT8 PRECISIONNew in TensorRT

ACCURACYEFFICIENCYPERFORMANCE

0

1000

2000

3000

4000

5000

6000

7000

2 4 128

FP32 INT8

Up To 3x More Images/sec with INT8

Precision

Batch Size

GoogLenet, FP32 vs INT8 precision + TensorRT on

Tesla P40 GPU, 2 Socket Haswell E5-2698 [email protected] with HT off

Images/

Second

0

200

400

600

800

1000

1200

1400

2 4 128

FP32 INT8

Deploy 2x Larger Models with INT8

Precision

Batch Size

Mem

ory

(M

B)

0%

20%

40%

60%

80%

100%

Top 1Accuracy

Top 5Accuracy

FP32 INT8

Deliver full accuracy with INT8

precision

% A

ccura

cy

Page 40: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System

40

THANK YOU