Overview of Convolutional Neural Network

Seoul National University, Deep Learning, September-December 2019

Source: stat.snu.ac.kr/mcp/Lecture_5_DNN.pdf · 2019-09-23

Artificial Neural Networks

Perceptron: Building block

• The perceptron was intended to be a machine rather than a program; the perceptron machine was designed for image recognition using an array of 400 photocells.
• The perceptron is an algorithm for a binary classifier: $f(x) = 1$ if $w \cdot x + b > 0$, and $0$ otherwise.
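The decision rule above can be written in a few lines. This is a minimal sketch; the weights `w_and` and `b_and` are hypothetical values chosen by hand (not learned) so that the perceptron computes a logical AND of two binary inputs.

```python
def perceptron(x, w, b):
    """Binary classifier: return 1 if w.x + b > 0, otherwise 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


# Hypothetical hand-picked weights implementing AND on inputs in {0, 1}.
w_and, b_and = [1.0, 1.0], -1.5
and_outputs = [perceptron([a, c], w_and, b_and) for a in (0, 1) for c in (0, 1)]
```

Only the input (1, 1) pushes w·x + b above zero, so it is the only input classified as 1.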


Single-layered neural network

• The perceptron model is called a single-layered neural network.


An example of learned filters or weights for input images

• Note that the filter size is the same as the input size.


Multi-layered feedforward neural network

figure from slides of Andrej Karpathy

Feedforward neural networks take input x and predict

$P(y = 1 \mid x, \theta) = f_k(\cdots f_3(f_2(f_1(x; \theta_1); \theta_2); \theta_3) \cdots; \theta_k)$.


Layers of Artificial Neural Network (ANN)

$f_l(\cdot)$ is commonly a repeated composition of linear and nonlinear transformations.

The aim is to estimate an invariant function in a compositional manner.

A unit of layers is composed of known and unknown transformations.

Convolutional layer: at the $l$th layer, $Z^l = W^l h^{l-1} + b^l$, where $h^0 = x$.

$W$ = filters, $Z^l$ = neurons. The $W$'s and $b$'s are unknown and are to be estimated (trained).

Pooling layer

Activation layer: $h^l = g_l(Z^l)$, a nonlinear transformation

The last layer: softmax, $h^K_i = \exp(Z_i) / \sum_{l=1}^{k} \exp(Z_l)$.
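A minimal sketch of the softmax in the last layer. Subtracting the maximum before exponentiating is a standard numerical-stability trick that is not on the slide; it does not change the result because the common factor cancels in the ratio.

```python
import math


def softmax(z):
    """Return h_i = exp(z_i) / sum_l exp(z_l) for a list of scores z."""
    m = max(z)  # shift by the max to avoid overflow; the ratio is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
```

The outputs are positive and sum to one, so they can be read as class probabilities.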


Convolutional neural network (CNN)

A CNN is a special case of a feedforward neural network with locality and weight-sharing restrictions.

This characteristic is referred to as ‘shift invariance’.

The restrictions reduce the number of parameters and help capture local characteristics.


Convolutional layer

figure from slides of Andrej Karpathy

The resulting output is a 28 by 28 activation map.


Convolutional layer

figure from slides of Andrej Karpathy

Apply 6 filters and obtain 6 activation maps.
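To make "one filter, one activation map" concrete, here is a naive sketch that slides each filter over a small single-channel image. It assumes a square image, square filters, stride 1 and no padding; the image and the two filters are made-up illustrative values. As is usual in deep learning, the "convolution" is computed without flipping the filter.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a square 2D image (stride 1, no padding)."""
    n, f = len(image), len(kernel)
    out = n - f + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(f) for j in range(f))
            for c in range(out)
        ]
        for r in range(out)
    ]


# Hypothetical 4x4 image with entries 0..15 and two 2x2 filters.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
filters = [
    [[1.0, 0.0], [0.0, -1.0]],      # difference along the diagonal
    [[0.25, 0.25], [0.25, 0.25]],   # local average
]
activation_maps = [conv2d_valid(image, k) for k in filters]
```

Each filter yields its own 3x3 map here; with 6 filters there would be 6 maps, stacked along the depth dimension.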


Role of locality and sharing of convolutional layer

How do locality and sharing reduce the number of parameters?

If a 32x32x3 volume is mapped to a 28x28x6 volume, as in the figure, using a fully connected layer, the number of parameters is (32*32*3)*(28*28*6) ≈ 14.5 million.

With six 5x5 filters, we use only (5*5*3)*6 = 450 parameters.
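The two counts can be checked directly. The sketch below counts weights only and leaves out bias terms, as the slide does; the function names are illustrative.

```python
def fc_weights(in_shape, out_shape):
    """Weights of a fully connected map between two H x W x C volumes."""
    in_units = in_shape[0] * in_shape[1] * in_shape[2]
    out_units = out_shape[0] * out_shape[1] * out_shape[2]
    return in_units * out_units


def conv_weights(filter_h, filter_w, in_channels, n_filters):
    """Weights of a convolutional layer: each filter spans all input channels."""
    return filter_h * filter_w * in_channels * n_filters


fc = fc_weights((32, 32, 3), (28, 28, 6))   # every input connects to every output
conv = conv_weights(5, 5, 3, 6)             # six shared 5x5x3 filters
```

The convolutional count does not depend on the spatial size of the input, which is where the saving comes from.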


Pooling layer

figure from slides of Andrej Karpathy

Average pooling or max pooling shrinks the representations.

Recall that averaging or integration can extract invariant features of the images.

Integration over all rotations
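A small sketch of how pooling shrinks a representation. It assumes non-overlapping 2x2 windows with stride 2 on an image with even side lengths; the 4x4 input is made up for illustration.

```python
def pool2x2(image, reduce_fn):
    """Apply non-overlapping 2x2 pooling (stride 2) to a 2D list."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce_fn([image[r][c], image[r][c + 1],
                       image[r + 1][c], image[r + 1][c + 1]])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [4, 6, 3, 5],
]
max_pooled = pool2x2(image, max)    # keeps the largest value per window
avg_pooled = pool2x2(image, mean)   # keeps the window average
```

Both variants halve each spatial dimension, so a 4x4 input becomes 2x2.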


Activation layer

$\mathrm{sigm}(Z) = \frac{1}{1 + \exp(-Z)}$

$\tanh(Z)$

Rectified Linear Unit: $\mathrm{ReLU}(Z) = \max(Z, 0)$
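The three activation functions, written as scalar functions for illustration (in practice they are applied elementwise to a whole layer).

```python
import math


def sigm(z):
    """Logistic sigmoid: 1 / (1 + exp(-z)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)


# Evaluate all three at a few sample points.
samples = [(z, sigm(z), math.tanh(z), relu(z)) for z in (-2.0, 0.0, 2.0)]
```

The sigmoid and tanh saturate for large |z|, while ReLU is linear for positive inputs and exactly zero for negative ones.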


Stacked layers

The first layer:

$Z^1 = W^1 h^0 + b^1$, where $h^0 = x$.

$h^1 = g_1(Z^1)$, where $g_1(\cdot)$ is the activation function.

The $l$th layer:

$Z^l = W^l h^{l-1} + b^l$

$h^l = g_l(Z^l)$
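The recursion can be sketched as a loop over layers. This is an illustration with dense weight matrices stored as lists of rows and ReLU as every activation; the weights in `layers` are arbitrary made-up numbers.

```python
def affine(W, h, b):
    """Z = W h + b for a matrix W (list of rows), vector h and bias b."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]


def relu(z):
    return [max(v, 0.0) for v in z]


def forward(x, layers):
    """Apply h^l = g_l(W^l h^{l-1} + b^l) for each layer, starting at h^0 = x."""
    h = x
    for W, b, g in layers:
        h = g(affine(W, h, b))
    return h


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[1.0, 2.0]], [0.5], relu),
]
output = forward([2.0, 1.0], layers)
```

For this input the hidden layer is [1.0, 0.5] and the output is 1.0 + 2 * 0.5 + 0.5 = 2.5.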


Stride

• Shrink dimensions by subsampling.

Source: http://adeshpande4.github.io/A-Beginner%


Padding

Source: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
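Stride and padding together determine the spatial output size through the standard relation (n - f + 2p)/s + 1 for input size n, filter size f, padding p and stride s. The sketch below is an illustration: raising an error when the filter does not fit evenly is a choice made here, whereas many libraries round down instead.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: (n - f + 2*pad) / stride + 1."""
    span = n - f + 2 * pad
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // stride + 1


no_pad = conv_output_size(32, 5)                     # 5x5 filter on a 32x32 input
same_pad = conv_output_size(32, 3, stride=1, pad=1)  # padding preserves the size
strided = conv_output_size(7, 3, stride=2)           # stride 2 subsamples
```

With no padding a 5x5 filter turns 32 into 28; a 3x3 filter with pad 1 keeps 32; stride 2 shrinks 7 to 3.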


Role of multiple layers via visualization


Different architectures

The popularity of CNNs was triggered by the debut of ‘AlexNet’ by Krizhevsky et al. (2012), which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The ImageNet competition is an annual computer vision contest that has run since 2010, after Li launched ImageNet, a free database of more than 14 million labeled images.

Successful training is due to a large dataset, computational power from GPUs, and some aspects of the algorithm.

Every year, new architectures and optimization tips have been proposed through the ImageNet competition and have improved classification accuracy. We cover AlexNet, VGGNet and ResNet.


AlexNet by Krizhevsky et al. (2012)

Start with 224x224x3 input. End with three fully connected layers.

layer  filter size (stride)  # filters  maxpool (stride)  output
1.1    11x11x3 (4)           48x2                         55x55x96
1.2                                     3x3 (2)           27x27x96
2.1    5x5x96                128x2                        27x27x256
2.2                                     3x3 (2)           13x13x256
3      3x3x256               192x2                        13x13x384
4      3x3x384               192x2                        13x13x384
5.1    3x3x384               128x2                        13x13x256
5.2                                     3x3 (2)           6x6x256=9216


AlexNet (Krizhevsky et al. 2012)

Used ReLU

Heavy data augmentation

Dropout

SGD with batch size 128 and momentum 0.9; the learning rate starts at 0.01 and is reduced manually.

Ensemble of 7 CNNs
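For reference, one common form of the SGD-with-momentum update is sketched below, using the momentum and initial learning rate quoted above. It is an illustration on a one-dimensional toy objective f(w) = w^2, not a reproduction of the AlexNet training code, which is not given here.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One update: the velocity accumulates past gradients, then moves w."""
    v_new = momentum * v - lr * grad
    return w + v_new, v_new


# Toy objective f(w) = w**2 with gradient 2*w (an assumption for illustration).
w, v = 5.0, 0.0
for _ in range(500):
    grad = 2.0 * w
    w, v = sgd_momentum_step(w, v, grad)
```

After a few hundred steps w is very close to the minimizer at 0; in a real setting `grad` would be the mini-batch gradient of the loss.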


VGGNet, OxfordNet (Simonyan and Zisserman, 2014)

Deeper model. More layers: 16 layers excluding maxpool and softmax, compared to 8 for AlexNet (5 convolutional plus 3 fully connected).

Simpler structure.

Only 3x3 filters with stride 1 and pad 1, and 2x2 maxpool with stride 2, are used.

The number of filters is multiplied by two from block to block (64, 128, 256, 512).

Source: https://blog.heuritech.com/2016/02/29


VGGNet, OxfordNet (Simonyan and Zisserman, 2014)

Table: Structure of VGGNet

block  layers              filter size  # filters (units)    followed by
1      2 conv              3x3          64                   maxpool
2      2 conv              3x3          128                  maxpool
3      3 conv              3x3          256                  maxpool
4      3 conv              3x3          512                  maxpool
5      3 conv              3x3          512                  maxpool
6      3 fully connected                4096 (2), 1000 (1)   softmax

• maxpool after each block
• 140M parameters (heavy from FC layers)
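The parameter total can be reproduced from the table. The sketch below assumes a 224x224x3 input (not stated on this slide), includes bias terms, and halves the spatial size after each block because of the 2x2 maxpool with stride 2; the function name is illustrative.

```python
def vgg16_param_count(input_hw=224, in_channels=3, n_classes=1000):
    """Return (conv_params, fc_params) for the block structure in the table."""
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    conv_params = 0
    channels, hw = in_channels, input_hw
    for n_conv, n_filters in blocks:
        for _ in range(n_conv):
            # 3x3 filters spanning all input channels, plus one bias per filter
            conv_params += 3 * 3 * channels * n_filters + n_filters
            channels = n_filters
        hw //= 2  # 2x2 maxpool with stride 2 after each block

    fc_params = 0
    units = hw * hw * channels  # flattened volume entering the first FC layer
    for out_units in (4096, 4096, n_classes):
        fc_params += units * out_units + out_units
        units = out_units
    return conv_params, fc_params


conv_params, fc_params = vgg16_param_count()
total_params = conv_params + fc_params
```

Under these assumptions the total is about 138 million, in line with the rounded 140M figure, and roughly 89% of it sits in the three fully connected layers.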


VGGNet: Number of parameters and memory


Role of a small filter

If we stack two 3x3 convolutional layers, a neuron in the second layer covers a 5x5 input region.

If we stack three 3x3 convolutional layers, a neuron in the third layer covers a 7x7 input region.

If the number of filters is $C$: a 7x7 filter needs $C \times (7 \times 7 \times C) = 49C^2$ parameters; three 3x3 filters need $3 \times C \times (3 \times 3 \times C) = 27C^2$. Three 3x3 filters need fewer parameters and provide more nonlinearity.
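Both claims are easy to check numerically. The sketch assumes stride 1, C input channels and C filters in every layer, and ignores biases; C = 64 is an arbitrary example value.

```python
def stacked_receptive_field(n_layers, filter_size=3):
    """Input region seen by one neuron after n stacked stride-1 conv layers."""
    return 1 + n_layers * (filter_size - 1)


def conv_weights(filter_size, channels):
    """Weights of one conv layer with `channels` inputs and `channels` filters."""
    return channels * (filter_size * filter_size * channels)


C = 64  # arbitrary example value
single_7x7 = conv_weights(7, C)       # 49 * C^2
three_3x3 = 3 * conv_weights(3, C)    # 27 * C^2
```

The stack covers the same 7x7 region with 27/49, about 55%, of the weights.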

How about an even smaller filter?


Role of a 1x1 filter

• For an HxWxC input, a 1x1 convolution with C/2 filters outputs HxWx(C/2) (with stride 1 and padding to preserve H and W).
• Compare the sequence (1) 1x1 with C/2 filters, (2) 3x3 with C/2 filters, (3) 1x1 with C filters against a single 3x3 with C filters. The former needs fewer parameters and less computation, with more nonlinearity.
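Counting weights makes the comparison concrete. The sketch ignores biases and uses C = 256 as an arbitrary example; the function names are illustrative.

```python
def conv_weights(filter_size, in_channels, n_filters):
    """Weights of a conv layer whose filters span all input channels."""
    return filter_size * filter_size * in_channels * n_filters


def bottleneck_weights(C):
    """1x1 down to C/2 channels, 3x3 at C/2 channels, 1x1 back up to C."""
    half = C // 2
    return (conv_weights(1, C, half)
            + conv_weights(3, half, half)
            + conv_weights(1, half, C))


C = 256  # arbitrary example value
bottleneck = bottleneck_weights(C)   # C^2/2 + 9C^2/4 + C^2/2 = 3.25 * C^2
single = conv_weights(3, C, C)       # 9 * C^2
```

The three-layer sequence uses 3.25 C^2 weights against 9 C^2 for the single 3x3 layer, roughly 36%.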


GoogLeNet (Szegedy et al., 2014)

Design a good local network topology and stack these modules.

Use of average pooling before the classification

Computationally expensive

Auxiliary classifiers connected to intermediate layers


ResNet (He, Zhang, Ren and Sun, 2015)

Is deeper always better? He et al. (2015) showed that deeper models can have higher training error than shallower models.

Instead of $f_2(f_1(x w_1) w_2)$ as in AlexNet or VGGNet, ResNet models the residual, i.e., $f_1(x w_1) + f_2(f_1(x w_1) w_2)$, so that $w_2 = 0$ reduces to a shallow model.
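A scalar sketch of the difference. It assumes both f1 and f2 are ReLU, so that f2(0) = 0; with that assumption, setting w2 = 0 leaves the shallow output intact in the residual form, whereas the plain form collapses to zero. The values of `x` and `w1` are arbitrary.

```python
def relu(z):
    return max(z, 0.0)


def plain_block(x, w1, w2):
    """Plain stacking: f2(f1(x*w1) * w2)."""
    return relu(relu(x * w1) * w2)


def residual_block(x, w1, w2):
    """Residual form: f1(x*w1) + f2(f1(x*w1) * w2)."""
    shallow = relu(x * w1)
    return shallow + relu(shallow * w2)


x, w1 = 1.5, 2.0  # arbitrary example values
shallow_output = relu(x * w1)
residual_at_zero = residual_block(x, w1, 0.0)
plain_at_zero = plain_block(x, w1, 0.0)
```

The extra layer therefore only has to learn a correction to the shallow model, and doing nothing is as easy as driving w2 to zero.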


ResNet

• 152-layer model
• Every residual block has 3x3 conv layers
• Periodically, double the number of filters and downsample spatially using stride 2
• Additional conv layer at the beginning
• No FC layers at the end
• For deeper networks (50+ layers), use a bottleneck layer to improve efficiency: 1x1 → 3x3 → 1x1
• No dropout
• Batch normalization
• No maxpooling


ResNet (He, Zhang, Ren and Sun, 2015)


Performance of various architectures

Source: Canziani, Culurciello and Paszke (2017)


Regularizations

In most cases, the number of parameters exceeds the number of training samples. To avoid overfitting, some regularization is necessary.

ReLU (non-negative thresholding operator)

Early stopping

L1, L2 penalty on weights

Dropout

Batch normalization

Data augmentation

Ensemble
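As one example from this list, dropout can be sketched in a few lines. The version below uses the "inverted" convention, in which kept units are rescaled by 1/(1 - p) during training so that nothing needs to change at test time; that convention, the function signature and the seed are assumptions made here, not details from the slides.

```python
import random


def dropout(h, p_drop, rng, training=True):
    """Zero each unit with probability p_drop; rescale the kept units."""
    if not training or p_drop == 0.0:
        return list(h)  # at test time the activations pass through unchanged
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in h]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
h = [1.0] * 10
dropped = dropout(h, 0.5, rng)
unchanged = dropout(h, 0.5, rng, training=False)
```

With p = 0.5, every entry of `dropped` is either 0.0 or 2.0, so the expected value of each unit stays at 1.0.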
