Backpropagation
Hung-yi Lee
Background
• Cost function $C(\theta)$
  • Given training examples: $(x^1, \hat y^1), \cdots, (x^r, \hat y^r), \cdots, (x^R, \hat y^R)$
  • Find a set of parameters $\theta^*$ minimizing $C(\theta)$
  • $C(\theta) = \frac{1}{R}\sum_r C^r(\theta)$, where $C^r(\theta) = \left\| f(x^r;\theta) - \hat y^r \right\|$
• Gradient descent
  • $\nabla C(\theta) = \frac{1}{R}\sum_r \nabla C^r(\theta)$
  • Given $w_{ij}^l$ and $b_i^l$, we have to compute $\partial C^r/\partial w_{ij}^l$ and $\partial C^r/\partial b_i^l$
• There is an efficient way to compute the gradients of the network parameters – backpropagation.
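The averaging above can be checked numerically. The sketch below assumes a squared-error per-example cost and a linear model $f(x;\theta) = Wx$; both choices are illustrative, not from the slides. It verifies that the average of the per-example gradients $\nabla C^r$ matches a finite-difference gradient of the averaged cost $C$.

```python
import numpy as np

rng = np.random.default_rng(0)
R, d_in, d_out = 5, 3, 2
X = rng.normal(size=(R, d_in))      # inputs x^r
Y = rng.normal(size=(R, d_out))     # targets yhat^r
W = rng.normal(size=(d_out, d_in))  # parameters theta (one weight matrix)

def cost_r(W, x, y):
    # Per-example cost C^r = ||W x - yhat||^2 (assumed squared error)
    return np.sum((W @ x - y) ** 2)

def grad_r(W, x, y):
    # Analytic gradient of C^r with respect to W
    return 2.0 * np.outer(W @ x - y, x)

# grad C(theta) = (1/R) sum_r grad C^r(theta)
grad_C = np.mean([grad_r(W, X[r], Y[r]) for r in range(R)], axis=0)

# Finite-difference gradient of the averaged cost, for comparison
eps = 1e-6
fd = np.zeros_like(W)
for i in range(d_out):
    for j in range(d_in):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        Cp = np.mean([cost_r(Wp, X[r], Y[r]) for r in range(R)])
        Cm = np.mean([cost_r(Wm, X[r], Y[r]) for r in range(R)])
        fd[i, j] = (Cp - Cm) / (2 * eps)
```

The two gradients agree to finite-difference accuracy, confirming that the gradient of the average is the average of the per-example gradients.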
Chain Rule
• Case 1: $y = g(x)$, $z = h(y)$
  $\Delta x \to \Delta y \to \Delta z$, so $\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$
• Case 2: $x = g(s)$, $y = h(s)$, $z = k(x, y)$
  $\Delta s \to \Delta x, \Delta y \to \Delta z$, so $\frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}$
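Case 2 can be sanity-checked numerically. The functions $g$, $h$, $k$ below are arbitrary choices for illustration; the chain-rule value is compared against a centered finite difference of the composed function.

```python
import math

g = lambda s: math.sin(s)          # x = g(s)
h = lambda s: s ** 2               # y = h(s)
k = lambda x, y: x * y + x         # z = k(x, y)

def z_of_s(s):
    return k(g(s), h(s))

s = 0.7
x, y = g(s), h(s)
dz_dx = y + 1.0                    # partial k / partial x
dz_dy = x                          # partial k / partial y
dx_ds = math.cos(s)                # g'(s)
dy_ds = 2 * s                      # h'(s)

# Chain rule: dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds)
chain = dz_dx * dx_ds + dz_dy * dy_ds

# Finite-difference reference
eps = 1e-6
fd = (z_of_s(s + eps) - z_of_s(s - eps)) / (2 * eps)
```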
$\partial C^r/\partial w_{ij}^l$
• $\partial C^r/\partial w_{ij}^l$ is the multiplication of two terms:
  $\Delta w_{ij}^l \to \Delta z_i^l \to \Delta C^r$
  $\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\frac{\partial C^r}{\partial z_i^l}$
(Figure: layers $l-1$ and $l$; the weight $w_{ij}^l$ connects neuron $j$ of layer $l-1$ to neuron $i$ of layer $l$, whose input is $z_i^l$ and output is $a_i^l$.)
$\partial C^r/\partial w_{ij}^l$ - First Term
• If $l > 1$: $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$
  $\Rightarrow \frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$
$\partial C^r/\partial w_{ij}^l$ - First Term
• If $l = 1$: $z_i^1 = \sum_j w_{ij}^1 x_j^r + b_i^1$
  $\Rightarrow \frac{\partial z_i^1}{\partial w_{ij}^1} = x_j^r$
• If $l > 1$: $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$
  $\Rightarrow \frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$
In both cases, the first term is simply the value fed into the weight: the output of the previous layer, or the network input when $l = 1$.
$\partial C^r/\partial w_{ij}^l$ - Second Term
• The second term is $\frac{\partial C^r}{\partial z_i^l}$; define
  $\delta_i^l \equiv \frac{\partial C^r}{\partial z_i^l}$
$\partial C^r/\partial w_{ij}^l$ - Second Term
Two questions remain:
1. How to compute $\delta^L$ (at the output layer $L$)
2. The relation of $\delta^l$ and $\delta^{l+1}$
With both answered, we compute $\delta^L$ first and then propagate backward through the layers: $\delta^L \to \delta^{L-1} \to \cdots \to \delta^{l+1} \to \delta^l$.
$\partial C^r/\partial w_{ij}^l$ - Second Term
1. How to compute $\delta^L$: at the output layer, $y^r = a^L = \sigma(z^L)$, so
  $\delta_n^L = \frac{\partial C^r}{\partial z_n^L} = \frac{\partial y_n^r}{\partial z_n^L}\frac{\partial C^r}{\partial y_n^r} = \sigma'(z_n^L)\frac{\partial C^r}{\partial y_n^r}$
where $\partial C^r/\partial y_n^r$ depends on the definition of the cost function.
$\partial C^r/\partial w_{ij}^l$ - Second Term
In vector form,
  $\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$
where $\odot$ is element-wise multiplication, $\sigma'(z^L) = \left[\sigma'(z_1^L), \sigma'(z_2^L), \cdots, \sigma'(z_n^L)\right]^T$, and $\nabla C^r(y^r) = \left[\partial C^r/\partial y_1^r, \cdots, \partial C^r/\partial y_n^r\right]^T$.
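A concrete instance of $\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$: the sketch below assumes the sigmoid for $\sigma$ and squared error for $C^r$ (the slides leave the cost unspecified), and checks the result against a finite difference of $C^r$ with respect to $z_n^L$.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
zL = rng.normal(size=3)            # output-layer inputs z^L
yhat = rng.normal(size=3)          # target yhat^r
y = sigmoid(zL)                    # network output y^r = sigma(z^L)

grad_y = 2.0 * (y - yhat)          # dC^r/dy for C^r = ||y - yhat||^2
# sigma'(z) = sigma(z)(1 - sigma(z)) for the sigmoid
deltaL = sigmoid(zL) * (1 - sigmoid(zL)) * grad_y

# Finite-difference check of dC^r/dz_n^L, one component at a time
eps = 1e-6
fd = np.array([
    (np.sum((sigmoid(zL + eps * np.eye(3)[n]) - yhat) ** 2)
     - np.sum((sigmoid(zL - eps * np.eye(3)[n]) - yhat) ** 2)) / (2 * eps)
    for n in range(3)
])
```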
$\partial C^r/\partial w_{ij}^l$ - Second Term
2. The relation of $\delta^l$ and $\delta^{l+1}$: a change in $z_i^l$ affects $C^r$ through $a_i^l$, and from there through every $z_k^{l+1}$ in layer $l+1$:
  $\Delta z_i^l \to \Delta a_i^l \to \Delta z_1^{l+1}, \Delta z_2^{l+1}, \cdots, \Delta z_k^{l+1} \to \Delta C^r$
By the chain rule,
  $\delta_i^l = \frac{\partial C^r}{\partial z_i^l} = \frac{\partial a_i^l}{\partial z_i^l}\sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l}\frac{\partial C^r}{\partial z_k^{l+1}} = \frac{\partial a_i^l}{\partial z_i^l}\sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l}\delta_k^{l+1}$
$\partial C^r/\partial w_{ij}^l$ - Second Term
Since $z_k^{l+1} = \sum_i w_{ki}^{l+1} a_i^l + b_k^{l+1}$ and $a_i^l = \sigma(z_i^l)$,
  $\frac{\partial z_k^{l+1}}{\partial a_i^l} = w_{ki}^{l+1}$, $\quad \frac{\partial a_i^l}{\partial z_i^l} = \sigma'(z_i^l)$
Therefore
  $\delta_i^l = \sigma'(z_i^l)\sum_k w_{ki}^{l+1}\delta_k^{l+1}$
$\partial C^r/\partial w_{ij}^l$ - Second Term
This formula can be viewed as a new type of neuron: the inputs are $\delta_1^{l+1}, \delta_2^{l+1}, \cdots, \delta_k^{l+1}$, weighted by $w_{1i}^{l+1}, w_{2i}^{l+1}, \cdots, w_{ki}^{l+1}$ and summed, and the sum is then multiplied by the constant $\sigma'(z_i^l)$ to produce the output $\delta_i^l$.
$\partial C^r/\partial w_{ij}^l$ - Second Term
In vector form,
  $\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$
where $\sigma'(z^l)$ collects $\sigma'(z_1^l), \sigma'(z_2^l), \cdots$ element-wise.
Compare
• Forward pass: $a^{l+1} = \sigma(W^{l+1} a^l + b^{l+1})$
• Backward pass: $\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$
The backward pass behaves like the same network running in the reverse direction: starting from $\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$ at the output layer, each step multiplies by a transposed weight matrix $(W^L)^T, \cdots, (W^{l+1})^T$ and by $\sigma'(z^l)$. This answers both questions: 1. how to compute $\delta^L$, and 2. the relation of $\delta^l$ and $\delta^{l+1}$.
Concluding Remarks
  $\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\frac{\partial C^r}{\partial z_i^l} = a_j^{l-1}\delta_i^l$ (with $a^0 = x^r$)
Forward Pass:
  $z^1 = W^1 x^r + b^1$
  $a^l = \sigma(z^l)$
  $z^{l+1} = W^{l+1} a^l + b^{l+1}$
Backward Pass:
  $\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$
  $\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$
Acknowledgement
• Thanks to Ryan Sun for writing in to point out errors in the slides.