
Convolutional Neural Network & Backpropagation Algorithm

Xuelin Qian

Content

1.Neural Network

2.Backpropagation Algorithm

3.Convolutional Neural Network

4.Backpropagation on CNN


“neuron”

where $f$ is called the activation function; in the UFLDL notation a single neuron computes $h_{W,b}(x) = f\left(\sum_{i} W_i x_i + b\right)$

1. Neural Network

neural network: consists of many simple “neurons”

1. Neural Network

Input Layer

Hidden Layer

Output Layer

Forward Propagation

1. Neural Network

$n_l$ : the number of layers
$L_l$ : layer $l$
$W_{ij}^{(l)}$ : the weight associated with the connection between unit $j$ in layer $l$ and unit $i$ in layer $l+1$
$b_i^{(l)}$ : the bias associated with unit $i$ in layer $l+1$
$a_i^{(l)}$ : the activation of unit $i$ in layer $l$
$z_i^{(l)}$ : the total weighted sum of inputs to unit $i$ in layer $l$

e.g.

$$z_1^{(2)} = \sum_{j=1}^{3} W_{1j}^{(1)} x_j + b_1^{(1)}, \qquad a_1^{(2)} = f\left(z_1^{(2)}\right)$$
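The same computation as a minimal numpy sketch; the sigmoid choice of $f$, the layer sizes, and all array names (x, W1, b1) are illustrative assumptions, not part of the slides:

```python
import numpy as np

def sigmoid(z):
    # the activation function f (an assumed choice)
    return 1.0 / (1.0 + np.exp(-z))

# three inputs x_1..x_3, four units in layer 2
x = np.array([0.5, -1.0, 2.0])
W1 = 0.1 * np.random.randn(4, 3)   # W^{(1)}: weights from layer 1 to layer 2
b1 = np.zeros(4)                   # b^{(1)}: biases of layer 2

z2 = W1 @ x + b1    # z^{(2)}_i = sum_j W^{(1)}_{ij} x_j + b^{(1)}_i
a2 = sigmoid(z2)    # a^{(2)} = f(z^{(2)})
print(a2)
```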

Content

1.Neural Network

2.Backpropagation Algorithm

3.Convolutional Neural Network

4.Backpropagation on CNN

2. Backpropagation Algorithm

“Loss function” (squared error for a single training example, in the UFLDL notation):

$$J(W,b;x,y) = \frac{1}{2}\left\| h_{W,b}(x) - y \right\|^2$$

2. Backpropagation Algorithm

“Gradient Descent”
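The update rule itself (following the UFLDL tutorial cited in the references, with learning rate $\alpha$):

$$W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b), \qquad b_{i}^{(l)} := b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W,b)$$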

2. Backpropagation Algorithm

“Derivative chain rule”

$$a_i^{(l+1)} = f\left(z_i^{(l+1)}\right) \quad \text{(from Forward Propagation)}$$

$$\frac{\partial J(W,b;x,y)}{\partial W_{ij}^{(l)}} = \frac{\partial J(W,b;x,y)}{\partial z_i^{(l+1)}} \times \frac{\partial z_i^{(l+1)}}{\partial W_{ij}^{(l)}}$$

$$z_i^{(l+1)} = \sum_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$$

2. Backpropagation Algorithm

“error term”

$$\delta_i^{(l+1)} = \frac{\partial}{\partial z_i^{(l+1)}} J(W,b;x,y)$$

which measures how much that node was “responsible” for any errors in the output.

2. Backpropagation Algorithm

“error term”

a) For each output unit $i$ in layer $n_l$ (the output layer), with the squared-error loss:

$$\delta_i^{(n_l)} = \frac{\partial}{\partial z_i^{(n_l)}} J(W,b;x,y) = -\left(y_i - a_i^{(n_l)}\right) \cdot f'\left(z_i^{(n_l)}\right)$$

using, from Forward Propagation,

$$z_i^{(l+1)} = \sum_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}, \qquad a_i^{(l+1)} = f\left(z_i^{(l+1)}\right)$$

2. Backpropagation Algorithm

“error term”

b) For $l = n_l - 1,\ n_l - 2,\ n_l - 3,\ \ldots,\ 2$:

$$\delta_i^{(l)} = \left( \sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \delta_j^{(l+1)} \right) f'\left(z_i^{(l)}\right)$$

again using $z_i^{(l+1)} = \sum_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$ and $a_i^{(l+1)} = f\left(z_i^{(l+1)}\right)$.


2. Backpropagation Algorithm

Putting the pieces together:

“Gradient Descent” needs the partial derivatives of the loss.

“Derivative chain rule”

$$\frac{\partial J(W,b;x,y)}{\partial W_{ij}^{(l)}} = \frac{\partial J(W,b;x,y)}{\partial z_i^{(l+1)}} \times \frac{\partial z_i^{(l+1)}}{\partial W_{ij}^{(l)}}$$

“error term”

$$\delta_i^{(l+1)} = \frac{\partial}{\partial z_i^{(l+1)}} J(W,b;x,y), \qquad z_i^{(l+1)} = \sum_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$$

so $\dfrac{\partial J(W,b;x,y)}{\partial W_{ij}^{(l)}} = \delta_i^{(l+1)} a_j^{(l)}$.

2. Backpropagation Algorithm

“error term”


Conclusion: the error term (residual) of a node (neuron) equals the weighted sum, by connection weight, of the error terms of the units in the next layer that used that node during the forward pass.
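Rules a) and b) above as a minimal numpy sketch for one hidden layer; the sigmoid activation and the names (W1, W2, d2, d3) are illustrative assumptions:

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation (an assumed choice)

def f_prime(z):
    return f(z) * (1.0 - f(z))         # its derivative f'

def backprop(x, y, W1, b1, W2, b2):
    # forward pass (Section 1)
    z2 = W1 @ x + b1
    a2 = f(z2)
    z3 = W2 @ a2 + b2
    a3 = f(z3)
    # a) output layer: delta^{(3)} = -(y - a^{(3)}) * f'(z^{(3)})
    d3 = -(y - a3) * f_prime(z3)
    # b) hidden layer: delta^{(2)} = (W^{(2)T} delta^{(3)}) * f'(z^{(2)})
    d2 = (W2.T @ d3) * f_prime(z2)
    # gradients: dJ/dW^{(l)}_{ij} = delta^{(l+1)}_i * a^{(l)}_j, dJ/db^{(l)} = delta^{(l+1)}
    return np.outer(d3, a2), d3, np.outer(d2, x), d2
```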

Content

1.Neural Network

2.Backpropagation Algorithm

3.Convolutional Neural Network

4.Backpropagation on CNN

3. Convolutional Neural Network


convolution

How to calculate the number of hidden units? 1,000,000 hidden units?

3. Convolutional Neural Network

convolution layer

stride, pad, kernel(size), number(output)

$$H^{l+1} = \frac{H^{l} + 2 \times pad\_h - kernel\_h}{stride\_h} + 1$$

$$W^{l+1} = \frac{W^{l} + 2 \times pad\_w - kernel\_w}{stride\_w} + 1$$

e.g. $kernel\_h = kernel\_w = 3$, $H^{l} = W^{l} = 5$, $pad = 0$, $stride = 1$, so $H^{l+1} = W^{l+1} = 3$.
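The size formula as a small helper (a sketch; the function name is illustrative):

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    # (H^l + 2*pad - kernel) / stride + 1, per the formula above
    return (in_size + 2 * pad - kernel) // stride + 1

assert conv_output_size(5, kernel=3, pad=0, stride=1) == 3
```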

3. Convolutional Neural Network

pooling layer

Max Pooling & Avg Pooling

3. Convolutional Neural Network

fully connected layer

activation function

ReLU Sigmoid tanh
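The three activations as plain numpy one-liners (a sketch):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)
```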

3. Convolutional Neural Network

MNIST

3. Convolutional Neural Network

convolution layer

a. compute the input size

b. set up stride, pad, kernel(size), number(output)

c. compute the size of filter and output

d. *initialize weights (the first iteration)

e. convolution

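Steps c–e as a minimal direct (loop-based) convolution sketch, assuming a single channel; the helper name conv2d and its defaults are illustrative:

```python
import numpy as np

def conv2d(a, W, b=0.0, stride=1, pad=0):
    """Direct 2-D convolution of input a (layer l) with kernel W, single channel."""
    a = np.pad(a, pad)                          # zero padding on every side
    kh, kw = W.shape
    out_h = (a.shape[0] - kh) // stride + 1     # output size, per the formula above
    out_w = (a.shape[1] - kw) // stride + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = a[i*stride:i*stride+kh, j*stride:j*stride+kw]
            z[i, j] = np.sum(patch * W) + b     # one output unit (step e)
    return z
```

For a 5×5 input and a 3×3 kernel with stride=1, pad=0, conv2d returns the 3×3 map that feeds layer l+1.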

Content

1.Neural Network

2.Backpropagation Algorithm

3.Convolutional Neural Network

4.Backpropagation on CNN

4. Backpropagation on CNN

• Backpropagation on pooling layer

• Backpropagation on convolution layer

4. Backpropagation on CNN

“error term”

Conclusion: the error term (residual) of a node (neuron) equals the weighted sum, by connection weight, of the error terms of the units in the next layer that used that node during the forward pass.

4. Backpropagation on CNN

Backpropagation on pooling layer

max pooling (kernel_size=2, stride=2, pad=0)

Forward:

1 2 3 4
5 6 7 8
9 2 3 5
6 5 1 1

→

6 8
9 5

Backward ($\delta^{(l+1)} \rightarrow \delta^{(l)}$): each error is routed back to the position that produced its window's max:

1 2
3 4

→

0 0 0 0
0 1 0 2
3 0 0 4
0 0 0 0
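A sketch of that routing for this 2×2, stride-2 example (illustrative names; non-overlapping windows assumed):

```python
import numpy as np

def maxpool_backward(a, delta_out, k=2, stride=2):
    # route each delta^{(l+1)} entry to the argmax position of its window in a
    delta_in = np.zeros_like(a, dtype=float)
    for i in range(delta_out.shape[0]):
        for j in range(delta_out.shape[1]):
            win = a[i*stride:i*stride+k, j*stride:j*stride+k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            delta_in[i*stride + r, j*stride + c] += delta_out[i, j]
    return delta_in

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 2, 3, 5], [6, 5, 1, 1]])
d = np.array([[1, 2], [3, 4]])
print(maxpool_backward(a, d))   # matches the 4x4 grid above
```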

4. Backpropagation on CNN

Backpropagation on pooling layer

avg pooling (kernel_size=2, stride=2, pad=0)

Forward:

1 2 3 4
5 6 7 8
9 2 3 5
6 5 1 1

→

3.5 5.5
5.5 2.5

Backward ($\delta^{(l+1)} \rightarrow \delta^{(l)}$): each error is spread evenly over its 2×2 window:

1 2
3 4

→

0.25 0.25 0.5 0.5
0.25 0.25 0.5 0.5
0.75 0.75 1 1
0.75 0.75 1 1
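The average-pooling counterpart, dividing each error equally among its inputs (same non-overlapping assumption, illustrative names):

```python
import numpy as np

def avgpool_backward(delta_out, k=2, stride=2):
    # each delta^{(l+1)} entry is split equally among the k*k inputs of its window
    H, W = delta_out.shape
    delta_in = np.zeros((H * stride, W * stride))
    for i in range(H):
        for j in range(W):
            delta_in[i*stride:i*stride+k, j*stride:j*stride+k] += delta_out[i, j] / (k * k)
    return delta_in

print(avgpool_backward(np.array([[1., 2.], [3., 4.]])))  # the 0.25 ... 1 grid above
```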

4. Backpropagation on CNN

Backpropagation on convolution layer

1 1 1 0 0

0 0 0 1 0

0 1 1 1 1

0 1 1 0 0

1 1 1 1 0

w1 w2 w3

w4 w5 w6

w7 w8 w9

convolution, stride=1, pad=0: the 5×5 input of layer $l$ is convolved with the 3×3 kernel $W^l$ to produce the 3×3 input to layer $l+1$.

4. Backpropagation on CNN

Backpropagation on convolution layer

w1 w2 w3

w4 w5 w6

w7 w8 w9

*

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 2 3 0 0
0 0 4 5 6 0 0
0 0 7 8 9 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

To recover the 5×5 $\delta^{(l)}$ from the 3×3 $\delta^{(l+1)}$, pad $\delta^{(l+1)}$ with zeros and slide the kernel $W^l$ over it. The size formula

$$W^{l+1} = \frac{W^{l} + 2 \times pad - kernel}{stride} + 1$$

with stride=1, pad=? gives $(3 + 2 \times pad - 3)/1 + 1 = 5$, i.e. pad = 2.

4. Backpropagation on CNN

Backpropagation on convolution layer

With stride=1, pad=2, convolving $W^l$ over the zero-padded $\delta^{(l+1)}$ produces the 5×5 $\delta^{(l)}$.

Conclusion (as in Section 2): the error term of a node (neuron) equals the weighted sum, by connection weight, of the error terms of the next-layer units that used that node during the forward pass.


4. Backpropagation on CNN

Backpropagation on convolution layer

For each $\delta^{(l)}$ entry to pick up exactly the weights that used it in the forward pass, the kernel $W^l$ must first be rotated by 180° (rotation):

w9 w8 w7
w6 w5 w4
w3 w2 w1

*

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 2 3 0 0
0 0 4 5 6 0 0
0 0 7 8 9 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

stride=1, pad=2 → $\delta^{(l)}$ (5×5)

4. Backpropagation on CNN

Backpropagation on convolution layer

a. pad $\delta^{(l+1)}$ with zeros (pad = kernel − 1)

b. rotate $W^l$ by 180°

c. compute $\delta^{(l)}$ (convolution)

d. compute the gradient $\partial J / \partial W$

e. update the weights
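Steps a–d as a numpy sketch, reusing the conv2d helper sketched in Section 3 (single channel, stride 1; all names are illustrative):

```python
import numpy as np

def conv_backward(a, W, delta_out):
    kernel = W.shape[0]
    # a. pad delta^{(l+1)} with kernel-1 zeros on every side (pad = 2 for a 3x3 kernel)
    padded = np.pad(delta_out, kernel - 1)
    # b. rotate W^l by 180 degrees
    W_rot = np.rot90(W, 2)
    # c. compute delta^{(l)} by convolving the rotated kernel over the padded map
    delta_in = conv2d(padded, W_rot)
    # d. compute dJ/dW by sliding delta^{(l+1)} over the layer-l input a
    grad_W = conv2d(a, delta_out)
    return delta_in, grad_W
```

Step e is then the usual gradient descent update from Section 2, e.g. `W -= alpha * grad_W`.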

Reference

1. UFLDL: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
2. Blogs:
   http://www.cnblogs.com/tornadomeet/tag/Deep%20Learning/
   http://blog.csdn.net/zouxy09/article/category/1387932

THANK YOU
