
Page 1: Understanding Convolutional Neural Networks

Understanding Convolutional Neural Networks

Jeremy Nixon

Page 2: Understanding Convolutional Neural Networks

Jeremy Nixon● Machine Learning Engineer at the Spark Technology Center● Contributor to MLlib, dedicated to scalable deep learning

○ Author of Deep Neural Network Regression

● Previously, Applied Mathematics to Computer Science & Economics at Harvard

Page 3: Understanding Convolutional Neural Networks

Structure
1. Introduction / About
2. Motivation
   a. Comparison with major machine learning algorithms
   b. Tasks achieving State of the Art
   c. Applications / Specific Concrete Use Cases
3. The Model / Forward Pass
4. Framing Deep Learning
   a. Automated Feature Engineering
   b. Non-local generalization
   c. Compositionality
      i. Hierarchical Learning
      ii. Exponential Model Flexibility
   d. Learning Representation
      i. Transformation for Linear Separability
      ii. Input Space Contortion
   e. Extreme flexibility allowing benefits to large datasets
5. Optimization / Backward Pass
6. Conclusion

Page 4: Understanding Convolutional Neural Networks

Many Successes of Deep Learning
1. CNNs - State of the Art
   a. Object Recognition
   b. Object Localization
   c. Image Segmentation
   d. Image Restoration
   e. Music Recommendation
2. RNNs (LSTM) - State of the Art
   a. Speech Recognition
   b. Question Answering
   c. Machine Translation
   d. Text Summarization
   e. Named Entity Recognition
   f. Natural Language Generation
   g. Word Sense Disambiguation
   h. Image / Video Captioning
   i. Sentiment Analysis

Page 5: Understanding Convolutional Neural Networks

Ever trained a Linear Regression Model?

Page 6: Understanding Convolutional Neural Networks

Linear Regression Models

Major Downsides:

Cannot discover non-linear structure in data.

Requires manual feature engineering by the data scientist, which is time-consuming and can be infeasible for high-dimensional data.

Page 7: Understanding Convolutional Neural Networks

Decision Tree-Based Model? (Random Forests, Gradient Boosting)

Page 8: Understanding Convolutional Neural Networks

Decision Tree Models

Upside:

Capable of automatically picking up on non-linear structure.

Downsides:

Incapable of generalizing outside of the range of the input data.

Restricted to axis-aligned cut points when modeling relationships.

Thankfully, there’s an algorithmic solution.

Page 9: Understanding Convolutional Neural Networks

Neural Networks

Properties
1. Non-local generalization
2. Learning non-linear structure
3. Automated feature generation

Page 10: Understanding Convolutional Neural Networks

Generalization Outside Data Range

Page 11: Understanding Convolutional Neural Networks

Feedforward Neural Network

X = Normalized Data; W1, W2 = Weights; b1, b2 = Biases

Forward:

1. Multiply data by first layer weights | (X*W1 + b1)
2. Put output through non-linear activation | max(0, X*W1 + b1)
3. Multiply output by second layer weights | max(0, X*W1 + b1) * W2 + b2
4. Return predicted outputs
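
A minimal sketch of this forward pass in NumPy (an assumed choice for illustration; shapes and values are toy):

import numpy as np

# Two-layer forward pass matching the four steps above.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))         # 4 normalized examples, 3 features
W1 = rng.standard_normal((3, 5)) * 0.1  # first-layer weights
b1 = np.zeros(5)
W2 = rng.standard_normal((5, 1)) * 0.1  # second-layer weights
b2 = np.zeros(1)

hidden = np.maximum(0, X @ W1 + b1)     # steps 1-2: affine transform + ReLU
predictions = hidden @ W2 + b2          # step 3: second affine transform
print(predictions.shape)                # step 4: (4, 1) predicted outputs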

Page 12: Understanding Convolutional Neural Networks

The Model / Forward Pass

● Forward
   ○ Convolutional layer
      ■ Procedure + Implementation
      ■ Parameter sharing
      ■ Sparse interactions
      ■ Priors & Assumptions
   ○ Nonlinearity
      ■ ReLU
      ■ Tanh
   ○ Pooling Layer
      ■ Procedure + Implementation
      ■ Extremely strong prior on the image: invariance to small translations.
   ○ Fully Connected + Output Layer
   ○ Putting it All Together

Page 13: Understanding Convolutional Neural Networks

Convolutional Layer

Input Components:

1. Input Image / Feature Map
2. Convolutional Filter / Kernel / Parameters / Weights

Output Component:

1. Computed Output Image / Feature Map
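
A toy sketch of how the output feature map is computed, assuming a single-channel input, one 3x3 filter, stride 1, and no padding (NumPy, for illustration only; CNN libraries compute this same sliding-window operation, just far more efficiently):

import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image and sum element-wise products
    # ("valid" output: no padding, stride 1).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input feature map
kernel = np.array([[1., 0., -1.],                  # toy 3x3 filter (vertical edge detector)
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(conv2d_valid(image, kernel).shape)           # (3, 3) computed output feature map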

Page 14: Understanding Convolutional Neural Networks

Convolutional Layer

Goodfellow, Bengio, Courville

Page 15: Understanding Convolutional Neural Networks

Convolutional Layer

Leow Wee Kheng

Page 16: Understanding Convolutional Neural Networks

Convolutional Layer

Page 17: Understanding Convolutional Neural Networks

Parameter Sharing

1. Every filter weight is used over the entire input.
   a. This differs strongly from a fully connected network, where each weight corresponds to a single feature.

2. Rather than learning a separate set of parameters for each location, we learn a single set.

3. This dramatically reduces the number of parameters we need to store.
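
A rough illustration of the savings, assuming a 32x32 single-channel input and a single 3x3 filter (the numbers are purely illustrative):

# Fully connected: one weight per (input pixel, output unit) pair,
# with an output map the same size as the input.
input_pixels = 32 * 32
fully_connected_params = input_pixels * input_pixels   # 1,048,576 weights
shared_filter_params = 3 * 3                           # one 3x3 kernel reused at every location
print(fully_connected_params, shared_filter_params)    # 1048576 vs. 9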

Page 18: Understanding Convolutional Neural Networks

Bold Assumptions
1. Convolution can be thought of as a fully connected layer with an infinitely strong prior probability that
   a. The weights for one hidden unit must be identical to the weights of its neighbor (Parameter Sharing)
   b. Weights must be zero except in a small receptive field (Sparse Interactions)
2. Prior assumption of invariance to locality
   a. The assumptions overcome the need for data augmentation with translational shifts
      i. Other useful transformations include rotations, flips, color perturbations, etc.
   b. Equivariant to translation as a result of parameter sharing, but not to rotation or scale (closer in / farther)
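
A one-dimensional sketch of point 1: a single shared 3-tap filter is equivalent to a fully connected layer whose rows repeat the same weights (parameter sharing) and are zero outside each row's small receptive field (sparse interactions). NumPy, toy values:

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
w = np.array([1., 0., -1.])                        # one shared 3-tap filter

W = np.zeros((len(x) - len(w) + 1, len(x)))        # equivalent dense weight matrix
for i in range(W.shape[0]):
    W[i, i:i + len(w)] = w                         # identical weights, shifted per output unit

dense_out = W @ x                                  # fully connected view
conv_out = np.array([w @ x[i:i + len(w)] for i in range(W.shape[0])])  # sliding-window view
print(np.allclose(dense_out, conv_out))            # True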

Page 19: Understanding Convolutional Neural Networks

Sparse Interactions

Strong prior on the locality of information.

Units in deeper layers still end up indirectly connected to most of the input, since receptive fields grow with depth.

Page 20: Understanding Convolutional Neural Networks

Non-Linearities

● Element-wise transformation (applied individually over every element)

ReLU / Tanh
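
A minimal sketch of both activations applied element-wise (NumPy, toy values):

import numpy as np

z = np.array([[-2.0, 0.5], [1.5, -0.1]])   # toy pre-activation map
relu_out = np.maximum(0, z)                # ReLU: clamps negatives to zero
tanh_out = np.tanh(z)                      # Tanh: squashes values into (-1, 1)
print(relu_out, tanh_out, sep="\n")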

Page 21: Understanding Convolutional Neural Networks

Max Pooling

Downsampling.

Takes the max value of regions of the input image or filter map.

Imposes extremely strong prior of invariance to translation.
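
A toy sketch of 2x2 max pooling with stride 2, assuming even input dimensions (NumPy, for illustration):

import numpy as np

def max_pool_2x2(feature_map):
    # Downsample by taking the max over non-overlapping 2x2 regions.
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))   # 2x2 output; small shifts of the input often leave it unchanged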

Page 22: Understanding Convolutional Neural Networks

Mean Pooling

Page 23: Understanding Convolutional Neural Networks

Output Layer

● Output for classification is often a softmax function + cross-entropy loss.
● Output for regression is a single output from a linear (identity) layer with a sum-of-squared-error loss.
● The feature map can be flattened into a vector to transition to a fully connected layer / softmax.
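
A minimal sketch of the classification head: flatten, fully connected layer, softmax, cross-entropy loss (NumPy; the feature values and weight shapes are toy assumptions):

import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Negative log-probability of the true class.
    return -np.log(probs[label])

flattened = np.array([0.2, -1.3, 0.5, 2.0])                  # flattened feature map (toy values)
W = np.random.default_rng(1).standard_normal((4, 3)) * 0.1   # fully connected weights: 4 features -> 3 classes
probs = softmax(flattened @ W)
print(cross_entropy(probs, label=2))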

Page 24: Understanding Convolutional Neural Networks

Putting it All Together

We can construct architectures that combine convolution, pooling, and fully connected layers similar to the examples given here.
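
One possible way to express such an architecture, assuming Keras as the framework purely for illustration (the slides do not prescribe a library); the layer sizes here are arbitrary:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                          # single-channel input image
    layers.Conv2D(32, kernel_size=3, activation='relu'),     # convolutional layer + ReLU
    layers.MaxPooling2D(pool_size=2),                        # 2x2 max pooling
    layers.Flatten(),                                        # flatten feature maps to a vector
    layers.Dense(64, activation='relu'),                     # fully connected layer
    layers.Dense(10, activation='softmax'),                  # softmax output over 10 classes
])
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.summary()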

Page 25: Understanding Convolutional Neural Networks

Framing Deep Learning
1. Automated Feature Engineering
2. Non-local generalization
3. Compositionality
   a. Hierarchical Learning
   b. Exponential Model Flexibility
4. Extreme flexibility opens up benefits to large datasets
5. Learning Representation
   a. Input Space Contortion
   b. Transformation for Linear Separability

Page 26: Understanding Convolutional Neural Networks

Automated Feature Generation

● Pixel - Edges - Shapes - Parts - Objects : Prediction
● Learns features that are optimized for the data

Page 27: Understanding Convolutional Neural Networks

Non-Local Generalization

Page 28: Understanding Convolutional Neural Networks

Hierarchical Learning

● Pixel - Edges - Shapes - Parts - Objects : Prediction

Page 29: Understanding Convolutional Neural Networks

Hierarchical Learning

● Pixel - Edges - Shapes - Parts - Objects : Prediction

Page 30: Understanding Convolutional Neural Networks

Exponential Model Flexibility

● Deep Learning assumes data was generated by a composition of factors or features.

○ DL has been most successful when this assumption holds.

● Exponential gain in the number of relationships that can be efficiently modeled through composition.

Page 31: Understanding Convolutional Neural Networks

Model Flexibility and Dataset Size

Large datasets allow the fitting of extremely wide & deep models, which would have overfit in the past.

A combination of large datasets, large & flexible models, and regularization techniques (dropout, early stopping, weight decay) are responsible for success.

Page 32: Understanding Convolutional Neural Networks

Learning Representation: Transform for Linear Separability

Hidden Layer + Nonlinearity

Chris Olah: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
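
A hand-built sketch of the idea: XOR labels are not linearly separable in the input space, but after one hidden layer with a ReLU nonlinearity (weights chosen by hand here, purely for illustration) a single linear rule separates the classes:

import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 1, 1, 0])                 # XOR labels

W1 = np.array([[1., 1.], [1., 1.]])        # both hidden units sum the two inputs
b1 = np.array([0., -1.])                   # second unit only fires when both inputs are on
hidden = np.maximum(0, X @ W1 + b1)        # hidden representation of each point

scores = hidden @ np.array([-0.5, 1.0])    # a single linear rule in the hidden space
print(hidden)
print((scores < -0.25).astype(int) == y)   # all True: the transformed points are linearly separable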

Page 33: Understanding Convolutional Neural Networks

Backward Pass / Optimization

The goal:

Iteratively improve the filter weights so that they generate correct predictions.

We receive an error signal from the difference between our predictions and the true outcome.

Our weights are adjusted to reduce that difference.

The process of computing the correct adjustment to our weights at each layer is called backpropagation.
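
A minimal sketch of this loop for a single linear layer with squared error (NumPy, toy data); backpropagation applies the same gradient rule layer by layer via the chain rule:

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((8, 4))        # toy inputs
y = rng.standard_normal((8, 1))        # toy targets
W = rng.standard_normal((4, 1)) * 0.1  # weights to be improved
lr = 0.1                               # learning rate

for step in range(5):
    preds = X @ W                      # forward pass
    error = preds - y                  # error signal: prediction minus true outcome
    loss = np.mean(error ** 2)
    grad_W = 2 * X.T @ error / len(X)  # gradient of the loss with respect to W
    W -= lr * grad_W                   # adjust weights to reduce the difference
    print(step, round(loss, 4))        # loss decreases over the iterations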

Page 34: Understanding Convolutional Neural Networks

Convolutional Neural Networks

State of the Art in:

● Computer Vision Applications
   ○ Autonomous Cars
      ■ Navigation System
      ■ Pedestrian Detection / Localization
      ■ Car Detection / Localization
      ■ Traffic Sign Recognition
   ○ Facial Recognition Systems
   ○ Augmented Reality
      ■ Visual Language Translation
   ○ Character Recognition

Page 35: Understanding Convolutional Neural Networks

Convolutional Neural Networks

State of the Art in:

● Computer Vision Applications
   ○ Video Content Analysis
   ○ Object Counting
   ○ Mobile Mapping
   ○ Gesture Recognition
   ○ Human Facial Emotion Recognition
   ○ Automatic Image Annotation
   ○ Mobile Robots
   ○ Many, many more

Page 36: Understanding Convolutional Neural Networks

References

● CS 231: http://cs231n.github.io/
● Goodfellow, Bengio, Courville: http://www.deeplearningbook.org/
● Detection as DNN Regression: http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf
● Object Localization: http://arxiv.org/pdf/1312.6229v4.pdf
● Pose Regression: https://www.robots.ox.ac.uk/~vgg/publications/2014/Pfister14a/pfister14a.pdf
● Yuhao Yang CNN: https://issues.apache.org/jira/browse/SPARK-9273
● Neural Network Image: http://cs231n.github.io/assets/nn1/neural_net.jpeg
● Zeiler / Fergus: https://arxiv.org/pdf/1311.2901v3.pdf