64

Click here to load reader

PyDresden 20170824 - Deep Learning for Computer Vision

Embed Size (px)

Citation preview

Page 1: PyDresden 20170824 - Deep Learning for Computer Vision

Deep Learningfor Computer Vision

Alex Conwayalex @ numberboost.com

PyDresden Meetup 24/08/2017

Page 2: PyDresden 20170824 - Deep Learning for Computer Vision

Hands up!

Page 3: PyDresden 20170824 - Deep Learning for Computer Vision

Image Classification

3

Page 4: PyDresden 20170824 - Deep Learning for Computer Vision

https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html

Object detection

Page 5: PyDresden 20170824 - Deep Learning for Computer Vision

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Page 6: PyDresden 20170824 - Deep Learning for Computer Vision

Image Captioning & Visual Attention

XXX

6

https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning

Page 7: PyDresden 20170824 - Deep Learning for Computer Vision

Image Q&A

7https://arxiv.org/pdf/1612.00837.pdf

Page 8: PyDresden 20170824 - Deep Learning for Computer Vision

Video Q&A

XXX

8

https://www.youtube.com/watch?v=UeheTiBJ0Io

Page 9: PyDresden 20170824 - Deep Learning for Computer Vision

Pix2Pix

https://affinelayer.com/pix2pix/https://github.com/affinelayer/pix2pix-tensorflow

9

Page 10: PyDresden 20170824 - Deep Learning for Computer Vision

Pix2Pix

https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66

10

Page 11: PyDresden 20170824 - Deep Learning for Computer Vision

11

Page 12: PyDresden 20170824 - Deep Learning for Computer Vision

Style Transfer

https://github.com/junyanz/CycleGAN12

Page 13: PyDresden 20170824 - Deep Learning for Computer Vision

Style Transfer

https://github.com/junyanz/CycleGAN13

Page 14: PyDresden 20170824 - Deep Learning for Computer Vision

1. What is a neural network?

2. What is a convolutional neural network?

3. Quick tutorial

4. More advanced methods

5. Practical tips

6. Where to from here?

14

Page 15: PyDresden 20170824 - Deep Learning for Computer Vision

Big Shout Outs

Jeremy Howard & Rachel Thomashttp://course.fast.ai

Andrej Karpathyhttp://cs231n.github.io

15

Page 16: PyDresden 20170824 - Deep Learning for Computer Vision

1.What is a neural network?

Page 17: PyDresden 20170824 - Deep Learning for Computer Vision

What is a single neuron?

17

• 3 inputs [x1,x2,x3] • 3 weights [w1,w2,w3]• Element-wise multiply and sum • Apply activation function f

Page 18: PyDresden 20170824 - Deep Learning for Computer Vision

What is an Activation Function?

18

Sigmoid Tanh ReLU

Nonlinearities … “squashing functions” … transform neuron’s outputNB: sigmoid output in [0,1]

Page 19: PyDresden 20170824 - Deep Learning for Computer Vision

What is a Deep Neural Network?

19

Inputs outputs

hiddenlayer 1

hiddenlayer 2

hiddenlayer 3

Outputs of one layer are inputs into the next layer

Page 20: PyDresden 20170824 - Deep Learning for Computer Vision

How does a neural network learn?

20

• You need labelled examples “training data”

• Initially, the network makes random predictions (weights initialized randomly)

• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (“loss function” e.g. mean squared error)

• Using a method called ‘backpropagation’ (really just the chain rule from calculus 1), we use the error to update the weights (using ‘gradient descent’) so that the error is a little bit smaller next time (it learns from past errors)

• We show the network many examples and update its parameters

• An epoch is when we’ve shown the network all the training examples

• We train for many epochs, measuring the error rate as we go

• (Actually, we split out our data into training and validation so we don’t overfit)

• Stop training when the validation error rate stops decreasing

Page 21: PyDresden 20170824 - Deep Learning for Computer Vision

http://playground.tensorflow.org

Page 22: PyDresden 20170824 - Deep Learning for Computer Vision

What is a Neural Network?

For much more detail, see:

1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html

2. Anrej Karpathy’s CS231n Noteshttp://neuralnetworksanddeeplearning.com/chap1.html

22

Page 23: PyDresden 20170824 - Deep Learning for Computer Vision

2. What is a convolutional

neural network?

Page 24: PyDresden 20170824 - Deep Learning for Computer Vision

What is a Convolutional Neural Network?

24

“like a simple neural network but with special types of layers that work well on images”

(math works on numbers)

• Pixel = 3 colour channels (R, G, B)• Pixel intensity = number in [0,255]• Image has width w and height h• Therefore image is w x h x 3 numbers

Page 25: PyDresden 20170824 - Deep Learning for Computer Vision

25This is VGGNet – don’t panic, we’ll come back to it later!

Example Architecture

Page 26: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

26

http://setosa.io/ev/image-kernels/

Page 27: PyDresden 20170824 - Deep Learning for Computer Vision

New Layer Type: Convolutional Layer

27

• 2-d weighted average when multiply kernel over pixel patches• We slide the kernel over all pixels of the image (handle borders)• Kernel starts off with “random” values and network updates the

kernel values (using backpropagation) to try minimize loss• The kernels are learned• Kernels are shared across the whole image (parameter sharing)

Page 28: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

28

“imagine taking this 3x3 matrix (“kernel”) and positioning

it over a 3x3 area of an image, and let's multiply each

overlapping value. Next, let's sum up these products, and

let's replace the center pixel with this new value. If we

slide this 3x3 matrix over the entire image, we can

construct a new image by replacing each pixel in the

same manner just described.”

http://cs231n.github.io/convolutional-networks/

Page 29: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

29

“ ...for example, we can start with 8 randomly

generated filters; that is 8 3x3 matrices with random

elements. Given labeled inputs, we can then use

stochastic gradient descent to determine what the

optimal values of these filters are, and therefore we

allow the neural network to learn what things are

most important to detect in classifying images. “

http://cs231n.github.io/convolutional-networks/

Page 30: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

30

https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

Page 31: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

31

Page 32: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

32

Page 33: PyDresden 20170824 - Deep Learning for Computer Vision

Convolutions

33

Page 34: PyDresden 20170824 - Deep Learning for Computer Vision

Great vid

34

https://www.youtube.com/watch?v=AgkfIQ4IGaM

Page 35: PyDresden 20170824 - Deep Learning for Computer Vision

New Layer Type: Max Pooling

• Reduces dimensionality from one layer to next• By replacing NxN sub-area with max value• Makes network “look” at larger areas of the image at a time e.g.

Instead of identifying fur, identify cat• Reduces computational load• Controls for overfitting since losing information helps the network

generalize35

Page 36: PyDresden 20170824 - Deep Learning for Computer Vision

New Layer Type: Dropout

• Form of regularization (helps prevent overfitting)• Trades ability to fit training data to help generalize to new data• Used during training (not test)• Randomly set weights in hidden layers to 0 with some probability p

36

Page 37: PyDresden 20170824 - Deep Learning for Computer Vision

Bringing it all together

xxx

37

Page 38: PyDresden 20170824 - Deep Learning for Computer Vision

ImageNet

38

http://image-net.org/explore

Page 39: PyDresden 20170824 - Deep Learning for Computer Vision

ImageNet

39

Page 40: PyDresden 20170824 - Deep Learning for Computer Vision

3. Quick tutorial

Page 41: PyDresden 20170824 - Deep Learning for Computer Vision

Using a Pre-Trained ImageNet-Winning CNN

41

• We’ll be using “VGGNet”

• Oxford Visual Geometry Group (VGG)

• The runner-up in ILSVRC 2014

• Network is 16 layers (deep!)

• Its main contribution was in showing that the depth of the network is a critical component for good performance.

• Only 3x3 (stride 1 pad 1) convolutions and 2x2 max pooling

• Easy to fine-tune

Page 42: PyDresden 20170824 - Deep Learning for Computer Vision

Using a Pre-Trained ImageNet-Winning CNN

42

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Page 43: PyDresden 20170824 - Deep Learning for Computer Vision

43

CODE TIME! https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Page 44: PyDresden 20170824 - Deep Learning for Computer Vision

44

Page 45: PyDresden 20170824 - Deep Learning for Computer Vision

Fine-tuning A CNN To Solve A New Problem

• Fix weights in convolutional layers (set trainable=False)• Re-train final dense layer(s)

45

Page 46: PyDresden 20170824 - Deep Learning for Computer Vision

Visual Similarity

46

• Chop off last 2 VGG layers

• Use dense layer with 4096 activations

• Compute nearest neighbours in the space of these activations

Page 47: PyDresden 20170824 - Deep Learning for Computer Vision

47

Page 48: PyDresden 20170824 - Deep Learning for Computer Vision

48

Page 49: PyDresden 20170824 - Deep Learning for Computer Vision

4. More advanced topics

Page 50: PyDresden 20170824 - Deep Learning for Computer Vision

cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Lots of Computer Vision Tasks

Page 51: PyDresden 20170824 - Deep Learning for Computer Vision

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015

Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015

Semantic Segmentation

Page 52: PyDresden 20170824 - Deep Learning for Computer Vision

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Object detection

Page 53: PyDresden 20170824 - Deep Learning for Computer Vision

Object detection

Page 54: PyDresden 20170824 - Deep Learning for Computer Vision

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Page 55: PyDresden 20170824 - Deep Learning for Computer Vision
Page 56: PyDresden 20170824 - Deep Learning for Computer Vision

http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py

Style TransferLoss = content_loss + style_loss

Content loss = convolutions from pre-trained networkStyle loss = gram matrix from style image convolutions

Page 57: PyDresden 20170824 - Deep Learning for Computer Vision

https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/

Page 58: PyDresden 20170824 - Deep Learning for Computer Vision

http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/

Computer Vision Pipelines

https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html

Page 59: PyDresden 20170824 - Deep Learning for Computer Vision

5. Practical tips

Page 60: PyDresden 20170824 - Deep Learning for Computer Vision

Practical Tips• use a GPU – AWS p2 instances (use spot!)

• when overfitting (validation_accuracy <<< training_accuracy)– Increase dropout – Early stopping

• when underfitting (low training_accuracy)1. Add more data2. Use data augmentation

– Flipping / stretching / rotating– Slightly change hues / brightness

3. Use more complex model– Resnet / Inception / Densenet– Ensamble (average <<< weighted average with learned weights)

60

Page 61: PyDresden 20170824 - Deep Learning for Computer Vision

6. Where to from here?

Page 62: PyDresden 20170824 - Deep Learning for Computer Vision

Where to From Here?

• Clone the repo and try use your own data

• Do the fast.ai course

• Do Stanford cs231n

• Read colah.github.io/posts

• Email me questions /ideas :) [email protected]

62

Page 63: PyDresden 20170824 - Deep Learning for Computer Vision

We’re hiring!

Page 64: PyDresden 20170824 - Deep Learning for Computer Vision

THANKS!https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Alex Conwayalex @ numberboost.com