
Page 1:

Page 2:

Soumith Chintala

Usable while performant: the challenges building PyTorch

Page 3:

Problem Statement • Deep Learning Workloads

Page 4:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()            # clear gradients accumulated in the previous step
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

Page 5:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

training_data: N samples, each of some shape D

Page 6:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

data: a mini-batch of M samples (M << N), each of shape D

Page 7:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

model: a neural network with weights

Page 8:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

loss.backward(): backpropagation, i.e. compute derivatives of the loss with respect to the weights using the chain rule

Page 9:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

optimizer.step(): update the weights using the computed gradients

Page 10:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

Page 11:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

neural network with weights

Page 12:

Types of typical operators: Convolution

Figure by Vincent Dumoulin: https://github.com/vdumoulin/conv_arithmetic

Page 13:

Types of typical operators: Convolution

Figure by Vincent Dumoulin: https://github.com/vdumoulin/conv_arithmetic

Page 14:

Types of typical operators: Convolution

Figure by Vincent Dumoulin: https://github.com/vdumoulin/conv_arithmetic

# naive direct convolution (no padding or stride)
for oc in range(output_channels):
    for ic in range(input_channels):
        for h in range(output_height):
            for w in range(output_width):
                for kh in range(kernel_height):
                    for kw in range(kernel_width):
                        output[oc, h, w] += input[ic, h + kh, w + kw] * kernel[oc, ic, kh, kw]

Page 15:

Types of typical operators

Figure by Vincent Dumoulin: https://github.com/vdumoulin/conv_arithmetic

Page 16:

Types of typical operators: Matrix Multiply

Figure by Wikipedia: https://en.wikipedia.org/wiki/Matrix_multiplication
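For parity with the convolution, pointwise, and reduction sketches on the neighboring slides, here is a naive triple-loop matrix multiply (added for illustration, not part of the original deck; n, k, m, A, B, C are placeholder names):

# C = A x B, where A is n x k and B is k x m; C starts as zeros
for i in range(n):
    for j in range(m):
        for p in range(k):
            C[i, j] += A[i, p] * B[p, j]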

Page 17:

Types of typical operators: Pointwise operations


for (int i = 0; i < data_length; i++) {
    output[i] = input1[i] + input2[i];
}

Page 18:

Types of typical operators: Reduction operations


double sum = 0.0;
for (int i = 0; i < data_length; i++) {
    sum += input[i];
}

Page 19:

Chained Together

[Diagram: Input -> one Conv2d / BatchNorm / ReLU block -> Output]

Page 20:

Chained Together

[Diagram: Input -> three chained Conv2d / BatchNorm / ReLU blocks -> Output]

Page 21:

Chained Together

[Diagram: Input -> a long chain of repeated Conv2d / BatchNorm / ReLU blocks -> Output, annotated "deep"]

Page 22:

Chained Together

[Diagram: Input -> a long chain of repeated Conv2d / BatchNorm / ReLU blocks -> Output, annotated "deep" and "recurrent"]

Page 23:

Trained with Gradient Descent

[Diagram: the same deep chain of Conv2d / BatchNorm / ReLU blocks from Input to Output, annotated "deep" and "recurrent"]
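To make "trained with gradient descent" concrete (this formula is added for reference and is not on the slide): once backpropagation has produced a gradient for every weight tensor W, each weight is updated as

W \leftarrow W - \eta \, \nabla_W \mathcal{L}

where \eta is the learning rate and \nabla_W \mathcal{L} is the gradient of the loss with respect to W. In the plain-SGD case this is exactly the update that optimizer.step() applies in the training loop shown earlier.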

Page 24:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

an easy way to see recurrence

Page 25:

Problem Statement • Deep Learning Workloads

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output, hidden = [], torch.zeros(...)    # initial hidden state
        for t in range(data.size(0)):            # unroll over the time dimension
            out, hidden = model(data[t], hidden)
            output.append(out)
        loss = F.nll_loss(torch.stack(output), target)
        loss.backward()
        optimizer.step()

an easy way to see recurrence

Page 26:

Problem Statement • Deep Learning Workloads

- Vision models
  - model is very deep: a straight-line chain with no recurrence
  - lots of convolutions
  - typically run on GPUs

Page 27:

Problem Statement • Deep Learning Workloads

- Vision models
  - model is very deep: a straight-line chain with no recurrence
  - lots of convolutions
  - typically run on GPUs

- NLP models (LSTM-RNN)
  - model is 1 to 4 "layers" deep
  - two matmuls across space and time, along with pointwise ops (see the sketch below)
  - typically run on CPUs if small, GPUs if large
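A minimal sketch (added here, not from the slides) of the "two matmuls across space and time" per step: one matmul projects the current input, one projects the previous hidden state, and the rest is pointwise work. It reuses the W_x / W_h / prev_h names from the autograd example later in the deck; x_t (the input at time step t) is a placeholder, and a real LSTM adds gating on top of the same two matmuls:

i2h = torch.mm(W_x, x_t.t())       # "space": project the input at this time step
h2h = torch.mm(W_h, prev_h.t())    # "time": project the previous hidden state
next_h = (i2h + h2h).tanh()        # pointwise add + nonlinearity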

Page 28:

Deep Learning Frameworks • Make this easy to program

for epoch in range(max_epochs):
    for data, target in training_data:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

Page 29:

Pre-PyTorch

[Slide: pre-PyTorch frameworks, labeled "meta programming", "meta programming", and "imperative"]

Page 30:

Caffe

Page 31:

Caffe

define the network as a protobuf, run it via a command-line utility

Page 32:

Caffe

define the network as a protobuf, run it via a command-line utility

small, efficient library; could do convnets well

Page 33:

Theano

Page 34:

Theano

meta-program the Theano VM via a Python API

Page 35:

Theano

meta-program the Theano VM via a Python API

whole-program optimizations, graph fusion

Page 36:

Theano

meta-program the Theano VM via a Python API

whole-program optimizations, graph fusion

graphs took minutes to hours to compile and start
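To make the meta-programming style concrete, a small sketch in the classic Theano pattern (added for illustration; it uses the standard theano.tensor / theano.function API): the Python code only builds a symbolic graph, and turning it into runnable, optimized code happens inside theano.function.

import theano
import theano.tensor as T

# declare symbolic inputs: nothing is computed yet
x = T.matrix('x')
y = T.matrix('y')
z = (x + y).sum()              # build a symbolic expression graph

# compile the graph into a callable; this is where optimization/fusion happens
f = theano.function([x, y], z)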

Page 37:

Torch-7

Page 38:

Torch-7

imperative programming in Lua

Page 39:

Torch-7

imperative programming in Lua

tied closely to the underlying C89 implementations

Page 40:

Torch-7

imperative programming in Lua

tied closely to the underlying C89 implementations

Lua lacked good tooling and an ecosystem

Page 41:

What is PyTorch?

- Ndarray library with GPU support
- automatic differentiation engine
- gradient-based optimization package
- Utilities (data loading, etc.)

Used for: Deep Learning, Reinforcement Learning, and as a Numpy alternative

Page 42:

ndarray library

- np.ndarray <-> torch.Tensor
- 200+ operations, similar to numpy (see the short example below)
- very fast acceleration on NVIDIA GPUs
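A short illustrative snippet (added, not on the slide) of the numpy-like API; the particular operations are arbitrary:

import torch

x = torch.randn(3, 4)          # like np.random.randn(3, 4)
y = torch.ones(3, 4)
z = (x + y).sum(dim=1)         # broadcasting and reductions mirror numpy
print(z.shape)                 # torch.Size([3])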

Page 43:

ndarray library

[Side-by-side comparison: Numpy code vs. the equivalent PyTorch code]

Page 44:

ndarray / Tensor library

Page 45:

ndarray / Tensor library

Page 46:

ndarray / Tensor library

Page 47:

ndarray / Tensor library

Page 48:

NumPy bridge

Page 49:

NumPy bridge

Zero memory-copy, very efficient
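A minimal sketch of the bridge (added, not on the slide): torch.from_numpy and Tensor.numpy share the underlying memory, so no copy is made and in-place changes are visible on both sides.

import numpy as np
import torch

a = np.ones(5)
t = torch.from_numpy(a)        # no copy: t shares a's memory
t.add_(1)                      # in-place update through the tensor...
print(a)                       # ...is visible from numpy: [2. 2. 2. 2. 2.]
b = t.numpy()                  # back to numpy, still zero-copy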

Page 50:

NumPy bridge

Page 51:

NumPy bridge

Page 52:

Seamless GPU Tensors
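An illustrative snippet (added, not on the slide) of moving data and compute to the GPU; it assumes a CUDA-capable machine and falls back to the CPU otherwise:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(1024, 1024, device=device)   # allocated directly on the device
y = torch.randn(1024, 1024).to(device)       # or moved there from CPU memory
z = x @ y                                    # the matmul runs on the GPU
z_cpu = z.cpu()                              # copy the result back when needed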

Page 53:

automatic differentiation engine

for deep learning and reinforcement learning

Page 54:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

Page 55:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())

[Graph so far: MM, MM]

Page 56:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h

[Graph so far: MM, MM]

Page 57:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h

[Graph so far: MM, MM -> Add]

Page 58:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

[Graph so far: MM, MM -> Add -> Tanh]

Page 59:

PyTorch Autograd

W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

next_h.backward(torch.ones(20, 1))   # upstream gradient, shaped like next_h (20 x 1)

[Graph: MM, MM -> Add -> Tanh]
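As a follow-up (not on the slide): after backward(), the gradients have been accumulated into the leaf tensors that were created with requires_grad=True.

print(W_h.grad.shape)   # torch.Size([20, 20])
print(W_x.grad.shape)   # torch.Size([20, 10])
print(x.grad)           # None: x was created without requires_grad=True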

Page 60:

Neural Networks

Page 61:

Neural Networks

Page 62:

Neural Networks
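The code shown on these slides did not survive extraction. As a stand-in, here is a minimal, hypothetical example of how a small network is typically written with torch.nn (the layer sizes are arbitrary and chosen only so the snippet runs):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3)
        self.bn = nn.BatchNorm2d(16)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = F.relu(self.bn(self.conv(x)))    # Conv2d -> BatchNorm -> ReLU, as in the earlier diagrams
        x = x.mean(dim=(2, 3))               # global average pool over the spatial dims
        return F.log_softmax(self.fc(x), dim=1)

model = Net()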

Page 63:

Optimization package

SGD, Adagrad, RMSProp, LBFGS, etc.
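An illustrative use of the package (added, not on the slide), matching the training loop shown earlier; the hyperparameters are arbitrary:

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# inside the training loop:
optimizer.zero_grad()    # clear gradients from the previous step
loss.backward()          # compute new gradients
optimizer.step()         # apply the update (plain SGD with momentum here)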

Page 64:

Bootstrapping

- Writing dataset loaders
- Building models
- Implementing the training loop
- Checkpointing models
- Interfacing with environments
- Dealing with GPUs
- Building optimizers
- Building baselines

Python + PyTorch: an environment to do all of this

Page 65:

Bootstrapping

- Writing dataset loaders
- Building models
- Implementing the training loop
- Checkpointing models
- Interfacing with environments
- Dealing with GPUs
- Building optimizers
- Building baselines

Python + PyTorch: an environment to do all of this

bootstrapping the Python tooling stack for good UX

Page 66:

Python is slow, interpreted

- Global interpreter lock (GIL)
- application logic is an order of magnitude slower than in C++
- moved the autograd engine to C++
- moved everything to ATen
  - side effect: a clean C++ API