
Page 1: Backpropagation

Page 2: Multilayer Perceptron

$R$ – $S^1$ – $S^2$ – $S^3$ Network

Page 3: Example

Page 4: Elementary Decision Boundaries

First Subnetwork

First Boundary: $a_1^1 = \mathrm{hardlim}\left(\begin{bmatrix} -1 & 0 \end{bmatrix} p + 0.5\right)$

Second Boundary: $a_2^1 = \mathrm{hardlim}\left(\begin{bmatrix} 0 & -1 \end{bmatrix} p + 0.75\right)$

Page 5: Elementary Decision Boundaries

Second Subnetwork

Third Boundary: $a_3^1 = \mathrm{hardlim}\left(\begin{bmatrix} 1 & 0 \end{bmatrix} p - 1.5\right)$

Fourth Boundary: $a_4^1 = \mathrm{hardlim}\left(\begin{bmatrix} 0 & 1 \end{bmatrix} p - 0.25\right)$

Page 6: Total Network

$W^1 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad b^1 = \begin{bmatrix} 0.5 \\ 0.75 \\ -1.5 \\ -0.25 \end{bmatrix}$

$W^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \qquad b^2 = \begin{bmatrix} -1.5 \\ -1.5 \end{bmatrix}$

$W^3 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \qquad b^3 = -0.5$
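As a minimal sketch (not part of the original slides), the three-layer hardlim network above can be evaluated directly in NumPy; the `classify` helper and the test points are illustrative:

```python
import numpy as np

# Three-layer hardlim network from this page; a sketch for checking
# which side of the decision regions a point falls on.
hardlim = lambda n: (n >= 0).astype(float)

W1 = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b1 = np.array([0.5, 0.75, -1.5, -0.25])
W2 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
b2 = np.array([-1.5, -1.5])
W3 = np.array([[1, 1]], dtype=float)
b3 = np.array([-0.5])

def classify(p):
    a1 = hardlim(W1 @ p + b1)   # four elementary boundaries
    a2 = hardlim(W2 @ a1 + b2)  # AND the boundaries of each subnetwork
    a3 = hardlim(W3 @ a2 + b3)  # OR the two subnetwork regions
    return a3

print(classify(np.array([0.0, 0.0])))  # inside the first region -> [1.]
print(classify(np.array([1.0, 1.0])))  # outside both regions    -> [0.]
```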

Page 7: Function Approximation Example

$f^1(n) = \frac{1}{1 + e^{-n}}, \qquad f^2(n) = n$

Nominal Parameter Values:

$w_{1,1}^1 = 10, \quad w_{2,1}^1 = 10, \quad b_1^1 = -10, \quad b_2^1 = 10$

$w_{1,1}^2 = 1, \quad w_{1,2}^2 = 1, \quad b^2 = 0$
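As a minimal sketch (assuming the nominal parameters above), the 1-2-1 logsig/purelin network can be evaluated over the input range used on the next page; the sample grid is illustrative:

```python
import numpy as np

# 1-2-1 network with the nominal parameters; evaluating it over
# p in [-2, 2] reproduces the shape of the "Nominal Response" plot.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

W1 = np.array([[10.0], [10.0]]); b1 = np.array([-10.0, 10.0])
W2 = np.array([[1.0, 1.0]]);     b2 = np.array([0.0])

p = np.linspace(-2, 2, 9)                    # illustrative sample grid
a1 = logsig(W1 @ p[None, :] + b1[:, None])   # hidden layer, 2 x len(p)
a2 = W2 @ a1 + b2[:, None]                   # purelin output layer
print(np.round(a2.ravel(), 3))
```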

Page 8: Nominal Response

[Plot: network output a versus input p, for p from -2 to 2 and a from -1 to 3.]

Page 9: Parameter Variations

[Four plots of the network response as individual parameters vary over the ranges below; axes as in the nominal response plot.]

$-1 \le w_{1,1}^2 \le 1, \qquad -1 \le w_{1,2}^2 \le 1, \qquad 0 \le b_2^1 \le 20, \qquad -1 \le b^2 \le 1$

Page 10: Multilayer Network

$a^{m+1} = f^{m+1}\left(W^{m+1} a^m + b^{m+1}\right), \qquad m = 0, 1, \ldots, M-1$

$a^0 = p, \qquad a = a^M$
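As a minimal sketch of this recurrence (the parallel-list container is illustrative, not from the slides), forward propagation is a single loop over the layers:

```python
import numpy as np

# Forward recurrence a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}).
# `weights`, `biases`, `transfers` are hypothetical parallel lists,
# one entry per layer m = 1, ..., M.
def forward(p, weights, biases, transfers):
    a = p                                # a^0 = p
    for W, b, f in zip(weights, biases, transfers):
        a = f(W @ a + b)                 # apply layer m+1
    return a                             # network output a = a^M
```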

Page 11: Performance Index

Training Set: $\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$

Mean Square Error: $F(x) = E[e^2] = E[(t - a)^2]$

Vector Case: $F(x) = E[e^T e] = E[(t - a)^T (t - a)]$

Approximate Mean Square Error (Single Sample): $\hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k)\, e(k)$

Approximate Steepest Descent:

$w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha \frac{\partial \hat{F}}{\partial w_{i,j}^m}, \qquad b_i^m(k+1) = b_i^m(k) - \alpha \frac{\partial \hat{F}}{\partial b_i^m}$
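As a tiny sketch (illustrative names), the single-sample performance index is just the squared error for one training pair:

```python
import numpy as np

# F-hat = (t - a)^T (t - a): the approximate (single-sample) MSE
# that the stochastic updates above descend.
def approx_mse(t, a):
    e = t - a
    return e @ e
```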

Page 12: Chain Rule

$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw}$

Example:

$f(n) = \cos(n), \qquad n = e^{2w}, \qquad f(n(w)) = \cos\left(e^{2w}\right)$

$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw} = (-\sin(n))\left(2e^{2w}\right) = -\sin\left(e^{2w}\right) 2e^{2w}$

Application to Gradient Calculation:

$\frac{\partial \hat{F}}{\partial w_{i,j}^m} = \frac{\partial \hat{F}}{\partial n_i^m} \cdot \frac{\partial n_i^m}{\partial w_{i,j}^m}, \qquad \frac{\partial \hat{F}}{\partial b_i^m} = \frac{\partial \hat{F}}{\partial n_i^m} \cdot \frac{\partial n_i^m}{\partial b_i^m}$
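The chain-rule example is easy to verify numerically; this quick check (not from the slides, test point arbitrary) compares a central difference against the analytic derivative:

```python
import numpy as np

# Check: d/dw cos(e^{2w}) should equal -sin(e^{2w}) * 2 e^{2w}.
f = lambda w: np.cos(np.exp(2 * w))

w, h = 0.3, 1e-6                                   # arbitrary test point
numeric = (f(w + h) - f(w - h)) / (2 * h)          # central difference
analytic = -np.sin(np.exp(2 * w)) * 2 * np.exp(2 * w)
print(numeric, analytic)                           # agree to ~6 digits
```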

Page 13: Gradient Calculation

$n_i^m = \sum_{j=1}^{S^{m-1}} w_{i,j}^m\, a_j^{m-1} + b_i^m$

$\frac{\partial n_i^m}{\partial w_{i,j}^m} = a_j^{m-1}, \qquad \frac{\partial n_i^m}{\partial b_i^m} = 1$

Sensitivity: $s_i^m \equiv \frac{\partial \hat{F}}{\partial n_i^m}$

Gradient: $\frac{\partial \hat{F}}{\partial w_{i,j}^m} = s_i^m\, a_j^{m-1}, \qquad \frac{\partial \hat{F}}{\partial b_i^m} = s_i^m$

Page 14: Steepest Descent

$w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha\, s_i^m\, a_j^{m-1}, \qquad b_i^m(k+1) = b_i^m(k) - \alpha\, s_i^m$

Matrix form:

$W^m(k+1) = W^m(k) - \alpha\, s^m \left(a^{m-1}\right)^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$

$s^m = \frac{\partial \hat{F}}{\partial n^m} = \begin{bmatrix} \dfrac{\partial \hat{F}}{\partial n_1^m} \\ \dfrac{\partial \hat{F}}{\partial n_2^m} \\ \vdots \\ \dfrac{\partial \hat{F}}{\partial n_{S^m}^m} \end{bmatrix}$

Next Step: Compute the Sensitivities (Backpropagation)

Page 15: Jacobian Matrix

$\frac{\partial n^{m+1}}{\partial n^m} = \begin{bmatrix} \dfrac{\partial n_1^{m+1}}{\partial n_1^m} & \dfrac{\partial n_1^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_1^{m+1}}{\partial n_{S^m}^m} \\ \dfrac{\partial n_2^{m+1}}{\partial n_1^m} & \dfrac{\partial n_2^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_2^{m+1}}{\partial n_{S^m}^m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_1^m} & \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_{S^m}^m} \end{bmatrix}$

$\frac{\partial n_i^{m+1}}{\partial n_j^m} = \frac{\partial \left( \sum_{l=1}^{S^m} w_{i,l}^{m+1} a_l^m + b_i^{m+1} \right)}{\partial n_j^m} = w_{i,j}^{m+1}\, \frac{\partial a_j^m}{\partial n_j^m}$

$\frac{\partial n_i^{m+1}}{\partial n_j^m} = w_{i,j}^{m+1}\, \frac{\partial f^m(n_j^m)}{\partial n_j^m} = w_{i,j}^{m+1}\, \dot{f}^m(n_j^m), \qquad \dot{f}^m(n_j^m) = \frac{\partial f^m(n_j^m)}{\partial n_j^m}$

$\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1}\, \dot{F}^m(n^m), \qquad \dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n_1^m) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n_2^m) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n_{S^m}^m) \end{bmatrix}$
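As a small illustrative sketch (not from the slides), the layer-to-layer Jacobian assembles directly from these pieces; a logsig transfer function is assumed here:

```python
import numpy as np

# Jacobian d n^{m+1} / d n^m = W^{m+1} Fdot^m(n^m) for a logsig layer.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def layer_jacobian(W_next, n_m):
    a_m = logsig(n_m)
    Fdot = np.diag(a_m * (1.0 - a_m))  # diagonal of logsig derivatives
    return W_next @ Fdot
```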

Page 16: Backpropagation (Sensitivities)

$s^m = \frac{\partial \hat{F}}{\partial n^m} = \left( \frac{\partial n^{m+1}}{\partial n^m} \right)^T \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^m(n^m) \left(W^{m+1}\right)^T \frac{\partial \hat{F}}{\partial n^{m+1}}$

$s^m = \dot{F}^m(n^m) \left(W^{m+1}\right)^T s^{m+1}$

The sensitivities are computed by starting at the last layer and then propagating backwards through the network to the first layer:

$s^M \rightarrow s^{M-1} \rightarrow \cdots \rightarrow s^2 \rightarrow s^1$
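As a minimal sketch of this backward recurrence (container names are illustrative): `weights` holds $W^2, \ldots, W^M$ and `Fdots` holds $\dot{F}^1(n^1), \ldots, \dot{F}^{M-1}(n^{M-1})$, saved during the forward pass:

```python
import numpy as np

# Propagate sensitivities from the last layer back to the first:
# s^m = Fdot^m(n^m) (W^{m+1})^T s^{m+1}.
def backpropagate(s_M, weights, Fdots):
    s = [s_M]                                  # start with s^M
    for W_next, Fdot in zip(reversed(weights), reversed(Fdots)):
        s.insert(0, Fdot @ W_next.T @ s[0])    # prepend s^m
    return s                                   # [s^1, ..., s^M]
```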

Page 17: Initialization (Last Layer)

$s_i^M = \frac{\partial \hat{F}}{\partial n_i^M} = \frac{\partial (t-a)^T (t-a)}{\partial n_i^M} = \frac{\partial \sum_{j=1}^{S^M} (t_j - a_j)^2}{\partial n_i^M} = -2(t_i - a_i)\, \frac{\partial a_i}{\partial n_i^M}$

$\frac{\partial a_i}{\partial n_i^M} = \frac{\partial a_i^M}{\partial n_i^M} = \frac{\partial f^M(n_i^M)}{\partial n_i^M} = \dot{f}^M(n_i^M)$

$s_i^M = -2(t_i - a_i)\, \dot{f}^M(n_i^M)$

$s^M = -2\, \dot{F}^M(n^M)\, (t - a)$

Page 18: Summary

Forward Propagation:

$a^0 = p$

$a^{m+1} = f^{m+1}\left(W^{m+1} a^m + b^{m+1}\right), \qquad m = 0, 1, \ldots, M-1$

$a = a^M$

Backpropagation:

$s^M = -2\, \dot{F}^M(n^M)\, (t - a)$

$s^m = \dot{F}^m(n^m) \left(W^{m+1}\right)^T s^{m+1}, \qquad m = M-1, \ldots, 2, 1$

Weight Update:

$W^m(k+1) = W^m(k) - \alpha\, s^m \left(a^{m-1}\right)^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$
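As an end-to-end sketch of one iteration (assuming the 1-2-1 logsig/purelin architecture of the worked example that follows; the function and its signature are illustrative, not from the slides):

```python
import numpy as np

# One backpropagation iteration: forward pass, sensitivities, update.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def train_step(p, t, W1, b1, W2, b2, alpha=0.1):
    # Forward propagation
    a1 = logsig(W1 @ p + b1)
    a2 = W2 @ a1 + b2                      # purelin output layer
    e = t - a2
    # Backpropagation of sensitivities
    s2 = -2 * 1.0 * e                      # purelin derivative is 1
    F1dot = np.diag((1 - a1) * a1)         # logsig derivative, diagonal
    s1 = F1dot @ W2.T @ s2
    # Steepest-descent weight update
    W2 = W2 - alpha * np.outer(s2, a1); b2 = b2 - alpha * s2
    W1 = W1 - alpha * np.outer(s1, p);  b1 = b1 - alpha * s1
    return W1, b1, W2, b2
```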

Page 19: Example: Function Approximation

$g(p) = 1 + \sin\left(\frac{\pi}{4} p\right)$

1-2-1 Network

[Diagram: the network receives p and produces a; the target t = g(p) is compared with a at a summing junction to form the error e = t - a.]

Page 20: Network

1-2-1 Network

[Diagram: the 1-2-1 network, with input p and output a.]

Page 21: Initial Conditions

$W^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad b^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad W^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad b^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}$

[Plot: initial network response vs. the sine wave g(p), for p from -2 to 2.]

Page 22: Forward Propagation

$a^0 = p = 1$

$a^1 = f^1\left(W^1 a^0 + b^1\right) = \mathrm{logsig}\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} 1 + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right) = \mathrm{logsig}\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right)$

$a^1 = \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\ \dfrac{1}{1 + e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}$

$a^2 = f^2\left(W^2 a^1 + b^2\right) = \mathrm{purelin}\left(\begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48\right) = 0.446$

$e = t - a = \left\{ 1 + \sin\left(\frac{\pi}{4} p\right) \right\} - a^2 = \left\{ 1 + \sin\left(\frac{\pi}{4}\right) \right\} - 0.446 = 1.261$
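These numbers are easy to reproduce; a quick check (not from the slides) of the forward pass:

```python
import numpy as np

# Verify the forward-pass numbers above.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([-0.48, -0.13])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([0.48])

p = np.array([1.0])
a1 = logsig(W1 @ p + b1)                 # [0.321, 0.368]
a2 = W2 @ a1 + b2                        # [0.446]
e = (1 + np.sin(np.pi / 4 * p)) - a2     # [1.261]
print(a1, a2, e)
```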

Page 23: Transfer Function Derivatives

$\dot{f}^1(n) = \frac{d}{dn}\left(\frac{1}{1 + e^{-n}}\right) = \frac{e^{-n}}{\left(1 + e^{-n}\right)^2} = \left(1 - \frac{1}{1 + e^{-n}}\right)\left(\frac{1}{1 + e^{-n}}\right) = (1 - a^1)(a^1)$

$\dot{f}^2(n) = \frac{d}{dn}(n) = 1$

Page 24: Backpropagation

$s^2 = -2\, \dot{F}^2(n^2)\, (t - a) = -2\, \dot{f}^2(n^2)\, (1.261) = -2 (1) (1.261) = -2.522$

$s^1 = \dot{F}^1(n^1) \left(W^2\right)^T s^2 = \begin{bmatrix} (1 - a_1^1)(a_1^1) & 0 \\ 0 & (1 - a_2^1)(a_2^1) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} (-2.522)$

$s^1 = \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} (-2.522)$

$s^1 = \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix} \begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}$

Page 25: Weight Update

With $\alpha = 0.1$:

$W^2(1) = W^2(0) - \alpha\, s^2 \left(a^1\right)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1\, (-2.522) \begin{bmatrix} 0.321 & 0.368 \end{bmatrix}$

$W^2(1) = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}$

$b^2(1) = b^2(0) - \alpha\, s^2 = 0.48 - 0.1\, (-2.522) = 0.732$

$W^1(1) = W^1(0) - \alpha\, s^1 \left(a^0\right)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}$

$b^1(1) = b^1(0) - \alpha\, s^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}$
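A quick check (not from the slides) of the sensitivity and weight-update numbers on the last two pages, continuing from the forward pass computed earlier:

```python
import numpy as np

# Verify s^2, s^1, and the updated weights and biases.
a1 = np.array([0.321, 0.368]); e = np.array([1.261]); alpha = 0.1
W2 = np.array([[0.09, -0.17]])

s2 = -2 * 1.0 * e                          # [-2.522]
F1dot = np.diag((1 - a1) * a1)
s1 = F1dot @ W2.T @ s2                     # [-0.0495, 0.0997]

W2_new = W2 - alpha * np.outer(s2, a1)     # [0.171, -0.0772]
b2_new = np.array([0.48]) - alpha * s2     # [0.732]
W1_new = np.array([[-0.27], [-0.41]]) - alpha * np.outer(s1, [1.0])
b1_new = np.array([-0.48, -0.13]) - alpha * s1
print(s2, s1, W2_new, b2_new, W1_new, b1_new)
```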

Page 26: Choice of Architecture

$g(p) = 1 + \sin\left(\frac{i\pi}{4} p\right)$

1-3-1 Network

[Four plots: response of the 1-3-1 network for i = 1, i = 2, i = 4, and i = 8; axes p from -2 to 2, g(p) from -1 to 3.]

Page 27: Choice of Network Architecture

$g(p) = 1 + \sin\left(\frac{6\pi}{4} p\right)$

[Four plots: responses of 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks; axes p from -2 to 2, g(p) from -1 to 3.]

Page 28: Convergence

$g(p) = 1 + \sin(\pi p)$

[Two plots showing the network response at intermediate training iterations, labeled 0 through 5; axes p from -2 to 2, g(p) from -1 to 3.]

Page 29: Generalization

Training Set: $\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$

$g(p) = 1 + \sin\left(\frac{\pi}{4} p\right), \qquad p = -2, -1.6, -1.2, \ldots, 1.6, 2$

[Two plots comparing the trained responses of a 1-2-1 network and a 1-9-1 network on the training points; axes p from -2 to 2, g(p) from -1 to 3.]