11
1
Backpropagation
11
2
Multilayer Perceptron
R – S1 – S2 – S3 Network
11
3
Example
11
4
Elementary Decision Boundaries
First Subnetwork
First Boundary:a1
1hardlim 1– 0 p 0.5+( )=
Second Boundary:
a21
hardlim 0 1– p 0.75+( )=
11
5
Elementary Decision Boundaries
Third Boundary:
Fourth Boundary:
Second Subnetwork
a31
hardlim 1 0 p 1.5–( )=
a41
hardlim 0 1 p 0.25–( )=
11
6
Total Network
W1
1– 0
0 1–1 0
0 1
= b1
0.50.75
1.5–0.25–
=
W2 1 1 0 0
0 0 1 1= b2 1.5–
1.5–=
W3
1 1= b30.5–=
11
7
Function Approximation Example
f1
n( ) 1
1 en–
+-----------------=
f 2 n( ) n=
w1 1,1
10= w 2 1,1
10= b11
10–= b21
10=
w1 1,2 1= w1 2,
2 1= b2
0=
Nominal Parameter Values
11
8
Nominal Response
-2 -1 0 1 2-1
0
1
2
3
11
9
Parameter Variations
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
1– w1 1,2
1≤ ≤
1– w1 2,2
1≤ ≤
0 b21
20≤ ≤
1– b2
1≤ ≤
11
10
Multilayer Network
am 1+
fm 1+
Wm 1+
amb
m 1++( )= m 0 2 … M 1–, , ,=
a0
p=
a aM=
11
11
Performance Index
p1 t1{ , } p2 t2{ , } … pQ tQ{ , }, , ,
Training Set
F x( ) E e2 ][= E t a–( )2 ][=
Mean Square Error
F x( ) E eTe ][= E t a–( )
Tt a–( )][=
Vector Case
F̂ x( ) t k( ) a k( )–( )T t k( ) a k( )–( ) eT k( )e k( )= =
Approximate Mean Square Error (Single Sample)
w i j,m
k 1+( ) wi j,m
k( ) α F̂∂
w i j,m∂
------------–= bim
k 1+( ) bim
k( ) αF̂∂
bim∂
---------–=
Approximate Steepest Descent
11
12
Chain Rule
f n w( )( )dwd
-----------------------f n( )d
nd--------------
n w( )dwd
---------------×=
f n( ) n( )cos= n e2w
= f n w( )( ) e2w( )cos=
f n w( )( )dwd
-----------------------f n( )d
nd--------------
n w( )dwd
---------------× n( )sin–( ) 2e2w( ) e
2w( )sin–( ) 2e2w( )= = =
Example
Application to Gradient Calculation
F̂∂
w i j,m
∂------------
F̂∂ni
m∂---------
nim∂
wi j,m∂
------------×= F̂∂
bim∂
---------F̂∂
nim∂
---------ni
m∂
bim∂
---------×=
11
13
Gradient Calculation
nim
wi j,m
ajm 1–
j 1=
Sm 1–
∑ bim
+=
nim∂
wi j,m∂
------------ a jm 1–
=ni
m∂
bim∂
--------- 1=
sim F̂∂
nim∂
---------≡
Sensitivity
F̂∂w i j,
m∂------------ si
maj
m 1–=
F̂∂
bim
∂--------- si
m=
Gradient
11
14
Steepest Descent
wi j,m
k 1+( ) wi j,m
k( ) αsim
a jm 1–
–= bim
k 1+( ) bim
k( ) αsim
–=
Wm
k 1+( ) Wm
k( ) αsma
m 1–( )
T–= bm
k 1+( ) bmk( ) αsm–=
sm F̂∂
nm
∂----------≡
F̂∂
n1m∂
---------
F̂∂
n2m
∂---------
…F̂∂
nS
mm
∂-----------
=
Next Step: Compute the Sensitivities (Backpropagation)
11
15
Jacobian Matrix
nm 1+
∂
nm
∂-----------------
n1m 1+∂
n1m
∂----------------
n1m 1+∂
n2m
∂---------------- …
n1m 1+∂
nS
mm
∂----------------
n2m 1+
∂
n1m
∂----------------
n2m 1+
∂
n2m
∂---------------- …
n2m 1+
∂
nS
mm
∂----------------
… … …n
Sm 1+m 1+∂
n1m
∂----------------
nSm 1+m 1+∂
n2m
∂---------------- …
nSm 1+m 1+∂
nS
mm
∂----------------
≡
nim 1+∂
n jm∂
----------------
wi l,m 1+
alm
l 1=
Sm
∑ bim 1+
+
∂
n jm∂
----------------------------------------------------------- wi j,m 1+ a j
m∂
n jm∂
---------= =
nim 1+∂
n jm∂
---------------- wi j,m 1+ f m n j
m( )∂
njm∂
--------------------- wi j,m 1+
fÝm
n jm( )= =
fÝm
n jm( )
f m n jm( )∂
njm∂
---------------------=
nm 1+
∂
nm∂----------------- Wm 1+ FÝ
mnm( )= FÝ
mn
m( )
fÝm
n1m( ) 0 … 0
0 fÝm
n2m( )… 0
… … …
0 0 … fÝm
nS
mm( )
=
11
16
Backpropagation (Sensitivities)
sm F̂∂
nm∂----------
nm 1+
∂
nm
∂-----------------
T
F̂∂
nm 1+
∂----------------- FÝ
mnm( ) Wm 1+( )
T F̂∂
nm 1+
∂-----------------= = =
sm
FÝmn
m( ) W
m 1+( )
Ts
m 1+=
The sensitivities are computed by starting at the last layer, andthen propagating backwards through the network to the first layer.
sM
sM 1–
… s2
s1
→ → → →
11
17
Initialization (Last Layer)
siM F̂∂
niM∂
----------t a–( )
Tt a–( )∂
niM∂
---------------------------------------
tj a j–( )2
j 1=
SM
∑∂
niM∂
----------------------------------- 2 ti ai–( )–ai∂
niM∂
----------= = = =
sM
2FÝMn
M( ) t a–( )–=
ai∂
niM∂
----------ai
M∂
niM∂
----------f
Mn i
M( )∂
niM∂
----------------------- fÝM
n iM( )= = =
siM
2 ti ai–( )– fÝM
n iM( )=
11
18
Summary
am 1+
fm 1+
Wm 1+
amb
m 1++( )= m 0 2 … M 1–, , ,=
a0
p=
a aM=
sM
2FÝMn
M( ) t a–( )–=
sm
FÝmn
m( ) W
m 1+( )
Ts
m 1+= m M 1– … 2 1, , ,=
Wm
k 1+( ) Wm
k( ) αsma
m 1–( )
T–= b
mk 1+( ) b
mk( ) αs
m–=
Forward Propagation
Backpropagation
Weight Update
11
19
Example: Function Approximation
g p( ) 1π4---p
sin+=
1-2-1Network
+
-
t
a
ep
11
20
Network
1-2-1Network
ap
11
21
Initial Conditions
W10( ) 0.27–
0.41–= b1
0( ) 0.48–
0.13–= W2
0( ) 0.09 0.17–= b20( ) 0.48=
Network ResponseSine Wave
-2 -1 0 1 2-1
0
1
2
3
11
22
Forward Propagation
a0
p 1= =
a1 f1 W1a0 b1+( ) l ogsig 0.27–
0.41–1
0.48–
0.13–+
logsig 0.75–
0.54–
= = =
a1
1
1 e0.75
+--------------------
1
1 e0.54
+--------------------
0.321
0.368= =
a2
f2 W2a1 b2
+( ) purelin 0.09 0.17–0.321
0.3680.48+( ) 0.446= = =
e t a– 1π4---p
sin+
a2
– 1π4---1
sin+
0.446– 1.261= = = =
11
23
Transfer Function Derivatives
fÝ1
n( )nd
d 1
1 en–
+-----------------
en–
1 en–
+( )2
------------------------ 11
1 en–
+-----------------–
1
1 en–
+-----------------
1 a1
–( ) a1( )= = = =
fÝ2
n( )nd
dn( ) 1= =
11
24
Backpropagation
s2
2FÝ2n
2( ) t a–( )– 2 fÝ
2n
2( ) 1.261( )– 2 1 1.261( )– 2.522–= = = =
s 1 FÝ1n1( ) W2( )
Ts 2 1 a1
1–( ) a1
1( ) 0
0 1 a21–( ) a2
1( )
0.09
0.17–2.522–= =
s1 1 0.321–( ) 0.321( ) 0
0 1 0.368–( ) 0.368( )0.090.17–
2.522–=
s 1 0.218 0
0 0.233
0.227–
0.429
0.0495–
0.0997= =
11
25
Weight Update
W21( ) W2
0( ) αs2 a1( )T
– 0.09 0.17– 0.1 2.522– 0.321 0.368–= =
α 0.1=
W21( ) 0.171 0.0772–=
b21( ) b2
0( ) αs2– 0.48 0.1 2.522–– 0.732= = =
W11( ) W1
0( ) αs 1 a0( )T
– 0.27–
0.41–0.1 0.0495–
0.09971– 0.265–
0.420–= = =
b11( ) b1
0( ) α s1– 0.48–
0.13–0.1 0.0495–
0.0997– 0.475–
0.140–= = =
11
26
Choice of Architecture
g p( ) 1i π4----- p
sin+=
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
1-3-1 Network
i = 1 i = 2
i = 4 i = 8
11
27
Choice of Network Architecture
g p( ) 16 π4
------ p sin+=
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
1-5-1
1-2-1 1-3-1
1-4-1
11
28
Convergence
g p( ) 1 πp( )sin+=
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
1
23
4
5
0
1
2
34
5
0
11
29
Generalization
p1 t1{ , } p2 t2{ , } … pQ tQ{ , }, , ,
g p( ) 1π4---p
sin+= p 2– 1.6– 1.2– … 1.6 2, , , , ,=
-2 -1 0 1 2-1
0
1
2
3
-2 -1 0 1 2-1
0
1
2
3
1-2-1 1-9-1
11
30
Dudas ???
11
31
Hasta la próxima !!!