Deep Learning for Computer Vision
Alex Conway
alex @ numberboost.com
PyDresden Meetup 24/08/2017
Hands up!
Image Classification
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Image Captioning & Visual Attention
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
Image Q&A
https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
Style Transfer
https://github.com/junyanz/CycleGAN
1. What is a neural network?
2. What is a convolutional neural network?
3. Quick tutorial
4. More advanced methods
5. Practical tips
6. Where to from here?
Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
1. What is a neural network?
What is a single neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
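Those four bullets can be sketched in a few lines of NumPy (the inputs and weights below are toy values, not from the talk):

```python
import numpy as np

def neuron(x, w, f):
    # element-wise multiply inputs by weights, sum, then apply
    # the activation function f to the result
    return f(np.dot(x, w))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])    # 3 inputs
w = np.array([0.5, -0.25, 0.1])  # 3 weights
print(neuron(x, w, sigmoid))     # sigmoid(0.3) ≈ 0.574
```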
What is an Activation Function?
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform the neuron’s output
NB: sigmoid output is in [0,1]
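A minimal NumPy sketch of the three squashing functions above (toy inputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes output into (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes output into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs, linear otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```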
What is a Deep Neural Network?
Inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
Outputs of one layer are inputs into the next layer
How does a neural network learn?
• You need labelled examples (“training data”)
• Initially, the network makes random predictions (weights initialized randomly)
• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (“loss function” e.g. mean squared error)
• Using a method called ‘backpropagation’ (really just the chain rule from calculus 1), we use the error to update the weights (using ‘gradient descent’) so that the error is a little bit smaller next time (it learns from past errors)
• We show the network many examples and update its parameters
• An epoch is when we’ve shown the network all the training examples
• We train for many epochs, measuring the error rate as we go
• (Actually, we split out our data into training and validation so we don’t overfit)
• Stop training when the validation error rate stops decreasing
http://playground.tensorflow.org
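The training procedure in the bullets above, boiled down to a toy one-parameter-pair model in NumPy (the data, learning rate, and epoch count are illustrative assumptions, not from the talk):

```python
import numpy as np

np.random.seed(0)

# labelled training data: y = 2*x + 1 plus a little noise
X = np.random.rand(100, 1)
y = 2 * X[:, 0] + 1 + 0.01 * np.random.randn(100)

# split into training and validation so we don't overfit
X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

w, b = np.random.randn(), np.random.randn()  # weights initialized randomly
lr = 0.5

for epoch in range(200):  # one epoch = one pass over all training examples
    pred = w * X_train[:, 0] + b
    error = pred - y_train
    loss = np.mean(error ** 2)                # loss function: mean squared error
    # gradients via the chain rule (backpropagation), then gradient descent
    w -= lr * np.mean(2 * error * X_train[:, 0])
    b -= lr * np.mean(2 * error)

val_loss = np.mean((w * X_val[:, 0] + b - y_val) ** 2)
print(w, b, val_loss)  # w ≈ 2, b ≈ 1
```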
What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes
http://cs231n.github.io
2. What is a convolutional neural network?
What is a Convolutional Neural Network?
“like a simple neural network but with special types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity = number in [0,255]
• Image has width w and height h
• Therefore an image is w x h x 3 numbers
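A quick NumPy illustration of those bullets (a made-up 4x3 image):

```python
import numpy as np

# a tiny RGB "image": height h=4, width w=3, 3 colour channels
h, w = 4, 3
img = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

print(img.shape)  # (4, 3, 3): h x w x 3 numbers
print(img[0, 0])  # one pixel = 3 intensities, each in [0, 255]
```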
This is VGGNet – don’t panic, we’ll come back to it later!
Example Architecture
Convolutions
http://setosa.io/ev/image-kernels/
New Layer Type: Convolutional Layer
• A 2-d weighted sum: the kernel is multiplied over patches of pixels
• We slide the kernel over all pixels of the image (handling the borders)
• The kernel starts off with random values and the network updates them (using backpropagation) to try to minimize the loss
• The kernels are learned
• Kernels are shared across the whole image (parameter sharing)
Convolutions
“imagine taking this 3x3 matrix (“kernel”) and positioning it over a 3x3 area of an image, and let's multiply each overlapping value. Next, let's sum up these products, and let's replace the center pixel with this new value. If we slide this 3x3 matrix over the entire image, we can construct a new image by replacing each pixel in the same manner just described.”
http://cs231n.github.io/convolutional-networks/
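The sliding-window description above can be sketched directly in NumPy (a naive loop that simply crops the borders; purely illustrative, not how real frameworks implement it):

```python
import numpy as np

def convolve2d(image, kernel):
    # position the kernel over each 3x3 patch, multiply the overlapping
    # values, sum the products, and write the result into the output pixel
    # (borders are cropped here rather than padded)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
print(convolve2d(image, identity))  # the identity kernel returns the image's 3x3 interior
```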
Convolutions
“...for example, we can start with 8 randomly generated filters; that is 8 3x3 matrices with random elements. Given labeled inputs, we can then use stochastic gradient descent to determine what the optimal values of these filters are, and therefore we allow the neural network to learn what things are most important to detect in classifying images.”
http://cs231n.github.io/convolutional-networks/
Convolutions
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Great vid
https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to the next
• By replacing each NxN sub-area with its max value
• Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify a cat
• Reduces computational load
• Controls for overfitting, since losing information helps the network generalize
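A minimal NumPy sketch of 2x2 max pooling as described above (toy 4x4 input):

```python
import numpy as np

def max_pool(image, n=2):
    # replace each non-overlapping n x n sub-area with its max value,
    # halving (for n=2) both spatial dimensions
    h, w = image.shape
    return image[:h // n * n, :w // n * n].reshape(h // n, n, w // n, n).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool(x))  # [[4, 8], [9, 7]]
```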
New Layer Type: Dropout
• Form of regularization (helps prevent overfitting)
• Trades ability to fit the training data for better generalization to new data
• Used during training (not at test time)
• Randomly set hidden-layer activations to 0 with some probability p
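A sketch of the idea in NumPy, using the common “inverted dropout” formulation (the rescaling by 1/(1-p) is my assumption, not from the slides):

```python
import numpy as np

np.random.seed(0)

def dropout(activations, p=0.5, training=True):
    # during training: randomly zero each activation with probability p,
    # rescaling survivors by 1/(1-p) so the expected output is unchanged
    # at test time: no-op
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask

a = np.ones(10)
print(dropout(a, p=0.5))           # some entries zeroed, survivors scaled to 2.0
print(dropout(a, training=False))  # unchanged at test time
```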
Bringing it all together
ImageNet
http://image-net.org/explore
3. Quick tutorial
Using a Pre-Trained ImageNet-Winning CNN
• We’ll be using “VGGNet”
• Oxford Visual Geometry Group (VGG)
• The runner-up in ILSVRC 2014
• Network is 16 layers (deep!)
• Its main contribution was in showing that the depth of the network is a critical component for good performance.
• Only 3x3 (stride 1 pad 1) convolutions and 2x2 max pooling
• Easy to fine-tune
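A small sanity check of that last bullet: since 3x3 stride-1 pad-1 convolutions preserve width and height, only the 2x2 max-pool stages shrink VGG's feature maps (the five-pool count below is standard VGG16, assumed here rather than stated on the slide):

```python
def vgg_spatial_size(input_size=224, num_pool_layers=5):
    # 3x3 convolutions with stride 1 and padding 1 keep width/height
    # unchanged; each 2x2 max pool halves them
    size = input_size
    for _ in range(num_pool_layers):
        size //= 2
    return size

print(vgg_spatial_size())  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```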
Using a Pre-Trained ImageNet-Winning CNN
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
CODE TIME! https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Fine-tuning A CNN To Solve A New Problem
• Fix the weights in the convolutional layers (set trainable=False)
• Re-train the final dense layer(s)
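The idea behind those two bullets, sketched in plain NumPy rather than Keras: a frozen random “feature extractor” stands in for the pre-trained convolutional layers, and only the new final dense layer is updated (everything here is a toy stand-in, not the talk's actual code):

```python
import numpy as np

np.random.seed(0)

# stand-in for the frozen convolutional layers: a fixed feature
# extractor whose weights are never updated (trainable=False)
W_frozen = np.random.randn(4, 8)

def features(x):
    return np.maximum(0.0, x @ W_frozen)  # "pre-trained" layers + ReLU

def loss(w, X, y):
    # binary cross-entropy of a sigmoid output layer
    p = 1.0 / (1.0 + np.exp(-features(X) @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

X = np.random.randn(200, 4)
y = (X[:, 0] > 0).astype(float)  # toy labels for the new problem

w = np.zeros(8)                  # the new final dense layer: the only trained weights
loss_before = loss(w, X, y)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-features(X) @ w))
    w -= 0.1 * features(X).T @ (p - y) / len(X)  # gradient step on the dense layer only
loss_after = loss(w, X, y)
print(loss_before, loss_after)   # loss drops while W_frozen stays fixed
```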
Visual Similarity
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
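Those steps reduce to a nearest-neighbour search in activation space; a minimal NumPy sketch with made-up 4096-d activation vectors:

```python
import numpy as np

def nearest_neighbours(query, activations, k=3):
    # Euclidean distance from the query image's activation vector to
    # every other image's activation vector; smaller = more similar
    dists = np.linalg.norm(activations - query, axis=1)
    return np.argsort(dists)[:k]

np.random.seed(0)
activations = np.random.rand(10, 4096)  # one 4096-d row per image
query = activations[7] + 0.001          # almost identical to image 7
print(nearest_neighbours(query, activations))  # image 7 comes first
```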
4. More advanced topics
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Lots of Computer Vision Tasks
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015
Semantic Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = convolutions from a pre-trained network
Style loss = gram matrix from the style image’s convolutions
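The gram matrix mentioned above is just the matrix of pairwise dot products between a layer's flattened feature-map channels; channel correlations capture “style” independently of spatial layout. A minimal NumPy sketch (the normalization choice is mine):

```python
import numpy as np

def gram_matrix(feature_maps):
    # feature_maps: (channels, height, width) convolution activations
    # flatten each channel, then take all pairwise channel dot products
    c, h, w = feature_maps.shape
    F = feature_maps.reshape(c, h * w)
    return F @ F.T / (h * w)

fm = np.ones((2, 4, 4))
print(gram_matrix(fm))  # identical channels are perfectly correlated
```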
https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
5. Practical tips
Practical Tips
• Use a GPU – AWS p2 instances (use spot!)
• When overfitting (validation_accuracy <<< training_accuracy):
 – Increase dropout
 – Early stopping
• When underfitting (low training_accuracy):
 1. Add more data
 2. Use data augmentation
  – Flipping / stretching / rotating
  – Slightly change hues / brightness
 3. Use a more complex model
  – ResNet / Inception / DenseNet
  – Ensemble (average <<< weighted average with learned weights)
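The augmentation tip can be sketched in NumPy (a horizontal flip plus a brightness jitter; the probability and jitter range are arbitrary choices, not from the talk):

```python
import numpy as np

np.random.seed(0)

def augment(image):
    # randomly flip left-right and jitter brightness: cheap ways to
    # show the network slightly different versions of each example
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]           # horizontal flip
    delta = np.random.uniform(-20, 20)      # brightness shift
    return np.clip(image.astype(float) + delta, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(augment(img).shape)  # same shape, slightly different pixels
```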
6. Where to from here?
Where to From Here?
• Clone the repo and try using your own data
• Do the fast.ai course
• Do Stanford cs231n
• Read colah.github.io/posts
• Email me questions / ideas :) [email protected]
We’re hiring!
THANKS!
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Alex Conway
alex @ numberboost.com