Deep Learning for Computer Vision
Alex Conway
alex @ numberboost.com
PyDresden Meetup 24/08/2017
Hands up!
Image Classification
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
Image Captioning & Visual Attention
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
Image Q&A
https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
Style Transfer
https://github.com/junyanz/CycleGAN
1. What is a neural network?
2. What is a convolutional neural network?
3. Quick tutorial
4. More advanced methods
5. Practical tips
6. Where to from here?
Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
1. What is a neural network?
What is a single neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
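Those four bullets can be sketched in a few lines of NumPy (the inputs and weights below are toy values, not from the talk):

```python
import numpy as np

def neuron(x, w, f):
    # element-wise multiply inputs by weights, sum, then apply
    # the activation function f to the result
    return f(np.dot(x, w))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])    # 3 inputs
w = np.array([0.5, -0.25, 0.1])  # 3 weights
print(neuron(x, w, sigmoid))     # sigmoid(0.3) ≈ 0.574
```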
What is an Activation Function?
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform the neuron’s output
NB: sigmoid output is in [0,1]
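A minimal NumPy sketch of the three squashing functions above (toy inputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes output into (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes output into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs, linear otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```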
What is a Deep Neural Network?
Inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
Outputs of one layer are inputs into the next layer
How does a neural network learn?
• You need labelled examples (“training data”)
• Initially, the network makes random predictions (weights initialized randomly)
• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (“loss function” e.g. mean squared error)
• Using a method called ‘backpropagation’ (really just the chain rule from calculus 1), we use the error to update the weights (using ‘gradient descent’) so that the error is a little bit smaller next time (it learns from past errors)
• We show the network many examples and update its parameters
• An epoch is when we’ve shown the network all the training examples
• We train for many epochs, measuring the error rate as we go
• (Actually, we split out our data into training and validation so we don’t overfit)
• Stop training when the validation error rate stops decreasing
http://playground.tensorflow.org
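The training procedure in the bullets above, boiled down to a toy one-parameter-pair model in NumPy (the data, learning rate, and epoch count are illustrative assumptions, not from the talk):

```python
import numpy as np

np.random.seed(0)

# labelled training data: y = 2*x + 1 plus a little noise
X = np.random.rand(100, 1)
y = 2 * X[:, 0] + 1 + 0.01 * np.random.randn(100)

# split into training and validation so we don't overfit
X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

w, b = np.random.randn(), np.random.randn()  # weights initialized randomly
lr = 0.5

for epoch in range(200):  # one epoch = one pass over all training examples
    pred = w * X_train[:, 0] + b
    error = pred - y_train
    loss = np.mean(error ** 2)                # loss function: mean squared error
    # gradients via the chain rule (backpropagation), then gradient descent
    w -= lr * np.mean(2 * error * X_train[:, 0])
    b -= lr * np.mean(2 * error)

val_loss = np.mean((w * X_val[:, 0] + b - y_val) ** 2)
print(w, b, val_loss)  # w ≈ 2, b ≈ 1
```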
What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes
http://cs231n.github.io
2. What is a convolutional neural network?
What is a Convolutional Neural Network?
“like a simple neural network but with special types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity = number in [0,255]
• Image has width w and height h
• Therefore an image is w x h x 3 numbers
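A quick NumPy illustration of those bullets (a made-up 4x3 image):

```python
import numpy as np

# a tiny RGB "image": height h=4, width w=3, 3 colour channels
h, w = 4, 3
img = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

print(img.shape)  # (4, 3, 3): h x w x 3 numbers
print(img[0, 0])  # one pixel = 3 intensities, each in [0, 255]
```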
This is VGGNet – don’t panic, we’ll come back to it later!
Example Architecture
Convolutions
http://setosa.io/ev/image-kernels/
New Layer Type: Convolutional Layer
• A 2-d weighted sum: the kernel is multiplied over patches of pixels
• We slide the kernel over all pixels of the image (handling the borders)
• The kernel starts off with random values and the network updates them (using backpropagation) to try to minimize the loss
• The kernels are learned
• Kernels are shared across the whole image (parameter sharing)
Convolutions
“imagine taking this 3x3 matrix (“kernel”) and positioning it over a 3x3 area of an image, and let's multiply each overlapping value. Next, let's sum up these products, and let's replace the center pixel with this new value. If we slide this 3x3 matrix over the entire image, we can construct a new image by replacing each pixel in the same manner just described.”
http://cs231n.github.io/convolutional-networks/
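The sliding-window description above can be sketched directly in NumPy (a naive loop that simply crops the borders; purely illustrative, not how real frameworks implement it):

```python
import numpy as np

def convolve2d(image, kernel):
    # position the kernel over each 3x3 patch, multiply the overlapping
    # values, sum the products, and write the result into the output pixel
    # (borders are cropped here rather than padded)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
print(convolve2d(image, identity))  # the identity kernel returns the image's 3x3 interior
```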
Convolutions
“...for example, we can start with 8 randomly generated filters; that is 8 3x3 matrices with random elements. Given labeled inputs, we can then use stochastic gradient descent to determine what the optimal values of these filters are, and therefore we allow the neural network to learn what things are most important to detect in classifying images.”
http://cs231n.github.io/convolutional-networks/
Convolutions
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Great vid
https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to the next
• By replacing each NxN sub-area with its max value
• Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify a cat
• Reduces computational load
• Controls for overfitting, since losing information helps the network generalize
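A minimal NumPy sketch of 2x2 max pooling as described above (toy 4x4 input):

```python
import numpy as np

def max_pool(image, n=2):
    # replace each non-overlapping n x n sub-area with its max value,
    # halving (for n=2) both spatial dimensions
    h, w = image.shape
    return image[:h // n * n, :w // n * n].reshape(h // n, n, w // n, n).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool(x))  # [[4, 8], [9, 7]]
```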
New Layer Type: Dropout
• Form of regularization (helps prevent overfitting)
• Trades ability to fit the training data for better generalization to new data
• Used during training (not at test time)
• Randomly set hidden-layer activations to 0 with some probability p
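A sketch of the idea in NumPy, using the common “inverted dropout” formulation (the rescaling by 1/(1-p) is my assumption, not from the slides):

```python
import numpy as np

np.random.seed(0)

def dropout(activations, p=0.5, training=True):
    # during training: randomly zero each activation with probability p,
    # rescaling survivors by 1/(1-p) so the expected output is unchanged
    # at test time: no-op
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask

a = np.ones(10)
print(dropout(a, p=0.5))           # some entries zeroed, survivors scaled to 2.0
print(dropout(a, training=False))  # unchanged at test time
```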
Bringing it all together
ImageNet
http://image-net.org/explore
3. Quick tutorial
Using a Pre-Trained ImageNet-Winning CNN
• We’ll be using “VGGNet”
• Oxford Visual Geometry Group (VGG)
• The runner-up in ILSVRC 2014
• Network is 16 layers (deep!)
• Its main contribution was in showing that the depth of the network is a critical component for good performance.
• Only 3x3 (stride 1 pad 1) convolutions and 2x2 max pooling
• Easy to fine-tune
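A small sanity check of that last bullet: since 3x3 stride-1 pad-1 convolutions preserve width and height, only the 2x2 max-pool stages shrink VGG's feature maps (the five-pool count below is standard VGG16, assumed here rather than stated on the slide):

```python
def vgg_spatial_size(input_size=224, num_pool_layers=5):
    # 3x3 convolutions with stride 1 and padding 1 keep width/height
    # unchanged; each 2x2 max pool halves them
    size = input_size
    for _ in range(num_pool_layers):
        size //= 2
    return size

print(vgg_spatial_size())  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```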
Using a Pre-Trained ImageNet-Winning CNN
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
CODE TIME! https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Fine-tuning A CNN To Solve A New Problem
• Fix the weights in the convolutional layers (set trainable=False)
• Re-train the final dense layer(s)
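The idea behind those two bullets, sketched in plain NumPy rather than Keras: a frozen random “feature extractor” stands in for the pre-trained convolutional layers, and only the new final dense layer is updated (everything here is a toy stand-in, not the talk's actual code):

```python
import numpy as np

np.random.seed(0)

# stand-in for the frozen convolutional layers: a fixed feature
# extractor whose weights are never updated (trainable=False)
W_frozen = np.random.randn(4, 8)

def features(x):
    return np.maximum(0.0, x @ W_frozen)  # "pre-trained" layers + ReLU

def loss(w, X, y):
    # binary cross-entropy of a sigmoid output layer
    p = 1.0 / (1.0 + np.exp(-features(X) @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

X = np.random.randn(200, 4)
y = (X[:, 0] > 0).astype(float)  # toy labels for the new problem

w = np.zeros(8)                  # the new final dense layer: the only trained weights
loss_before = loss(w, X, y)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-features(X) @ w))
    w -= 0.1 * features(X).T @ (p - y) / len(X)  # gradient step on the dense layer only
loss_after = loss(w, X, y)
print(loss_before, loss_after)   # loss drops while W_frozen stays fixed
```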
Visual Similarity
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
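Those steps reduce to a nearest-neighbour search in activation space; a minimal NumPy sketch with made-up 4096-d activation vectors:

```python
import numpy as np

def nearest_neighbours(query, activations, k=3):
    # Euclidean distance from the query image's activation vector to
    # every other image's activation vector; smaller = more similar
    dists = np.linalg.norm(activations - query, axis=1)
    return np.argsort(dists)[:k]

np.random.seed(0)
activations = np.random.rand(10, 4096)  # one 4096-d row per image
query = activations[7] + 0.001          # almost identical to image 7
print(nearest_neighbours(query, activations))  # image 7 comes first
```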
4. More advanced topics
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Lots of Computer Vision Tasks
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015
Semantic Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss
http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = convolutions from a pre-trained network
Style loss = gram matrix from the style image’s convolutions
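The gram matrix mentioned above is just the matrix of pairwise dot products between a layer's flattened feature-map channels; channel correlations capture “style” independently of spatial layout. A minimal NumPy sketch (the normalization choice is mine):

```python
import numpy as np

def gram_matrix(feature_maps):
    # feature_maps: (channels, height, width) convolution activations
    # flatten each channel, then take all pairwise channel dot products
    c, h, w = feature_maps.shape
    F = feature_maps.reshape(c, h * w)
    return F @ F.T / (h * w)

fm = np.ones((2, 4, 4))
print(gram_matrix(fm))  # identical channels are perfectly correlated
```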
https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
5. Practical tips
Practical Tips
• Use a GPU – AWS p2 instances (use spot!)
• When overfitting (validation_accuracy <<< training_accuracy):
 – Increase dropout
 – Early stopping
• When underfitting (low training_accuracy):
 1. Add more data
 2. Use data augmentation
  – Flipping / stretching / rotating
  – Slightly change hues / brightness
 3. Use a more complex model
  – ResNet / Inception / DenseNet
  – Ensemble (average <<< weighted average with learned weights)
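The augmentation tip can be sketched in NumPy (a horizontal flip plus a brightness jitter; the probability and jitter range are arbitrary choices, not from the talk):

```python
import numpy as np

np.random.seed(0)

def augment(image):
    # randomly flip left-right and jitter brightness: cheap ways to
    # show the network slightly different versions of each example
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]           # horizontal flip
    delta = np.random.uniform(-20, 20)      # brightness shift
    return np.clip(image.astype(float) + delta, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(augment(img).shape)  # same shape, slightly different pixels
```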
6. Where to from here?
Where to From Here?
• Clone the repo and try using your own data
• Do the fast.ai course
• Do Stanford cs231n
• Read colah.github.io/posts
• Email me questions / ideas :) [email protected]
We’re hiring!
THANKS!
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Alex Conway
alex @ numberboost.com