
Neural Networks - Berrin Yanıkoğlu

Applications and Examples

From Mitchell Chp. 4


ALVINN drives 70 mph on highways


Speech Recognition


Hidden Node Functions


Head Pose Recognition


MLP & Backpropagation IssuesMLP & Backpropagation Issues


Considerations

• Network architecture
  – Typically feedforward; however, you may also use local receptive fields for hidden nodes, and recurrent nodes in sequence learning
• Number of input, hidden, output nodes
  – The number of hidden nodes is quite important; the others are determined by the problem setup
• Activation functions
  – Careful: regression requires linear activation on the output
  – For other tasks, sigmoid or hyperbolic tangent is a good choice (see the sketch after this list)
• Learning rate
  – Typically the software adjusts this
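A minimal sketch of these choices in Python (an illustration, not from the slides), assuming a one-hidden-layer feedforward network for regression: tanh activation on the hidden layer, a linear output as regression requires, and the hidden-node count as the key free parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Network sizes: the hidden-node count is the important design choice;
# input/output sizes are fixed by the problem setup.
n_in, n_hidden, n_out = 4, 10, 1
W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # nonlinear hidden activation (tanh)
    return W2 @ h + b2         # linear output, as regression requires

print(forward(np.ones(n_in)))
```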


Considerations

• Preprocessing
  – Important (see next slides)
• Learning algorithm
  – Backpropagation with momentum, or Levenberg-Marquardt, is suggested
• When to stop training
  – Important (see next slides)


Preprocessing

Input variables should be decorrelated and have roughly equal variance.

But typically, a very simple linear transformation is applied to the input to obtain zero-mean, unit-variance inputs:

$\hat{x}_i = (x_i - \bar{x}_i)/\sigma_i$, where $\sigma_i^2 = \frac{1}{N-1}\sum_{p}(x_{pi} - \bar{x}_i)^2$, summing over the patterns $p$.

More complex preprocessing is also commonly done, e.g. principal component analysis.
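A minimal sketch of this standardization in Python (an illustration, not part of the slides), applied column-wise to an N x d matrix of input patterns:

```python
import numpy as np

def standardize(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # 1/(N-1) normalization, as in the formula
    return (X - mean) / std, mean, std

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
Xs, mean, std = standardize(X)
print(Xs.mean(axis=0), Xs.std(axis=0, ddof=1))  # ~0 and 1 per input variable
```

In practice the mean and standard deviation computed on the training set are reused to transform validation and test inputs.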


When to stop training

No precise formula:

1) At a local minimum, the gradient magnitude is 0
   – Stop when the gradient is sufficiently small
     • Need to calculate the gradient over the whole set of patterns
     • May need to measure the gradient in several directions, to avoid errors caused by numerical instability

2) A local minimum is a stationary point of the performance index (the error)
   – Stop when the absolute change in weights is small
     • How to measure? Typically as rates: 0.01%

3) We are interested in generalization ability
   – Stop when the generalization error, measured as the error on a validation set, starts to increase (see the sketch after this list)
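A minimal early-stopping sketch in Python for criterion 3. `train_epoch` and `validation_error` are hypothetical stand-ins for your own training and evaluation code, and the weight object is assumed to support `.copy()` (e.g. a NumPy array); none of these names come from the slides.

```python
def train_with_early_stopping(net, train_epoch, validation_error,
                              patience=10, max_epochs=1000):
    best_err, best_weights, bad_epochs = float("inf"), net.copy(), 0
    for epoch in range(max_epochs):
        train_epoch(net)                 # one pass over the training set
        err = validation_error(net)      # error on the held-out validation set
        if err < best_err:               # still generalizing: remember weights
            best_err, best_weights, bad_epochs = err, net.copy(), 0
        else:
            bad_epochs += 1              # validation error started to increase
            if bad_epochs >= patience:   # stop once the increase persists
                break
    return best_weights                  # weights with best validation error
```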


Effects of Sequential versus Batch Mode: Summary

– Batch:
  – Better estimation of the gradient

– Sequential (online):
  – Better if data is highly correlated
  – Better in terms of local minima (stochastic search)
  – Easier to implement
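A minimal sketch contrasting the two modes in Python, for squared error on a linear model (an illustration under assumed names, not from the slides): the batch step estimates the gradient over all patterns at once, while the sequential step updates after each pattern in random order.

```python
import numpy as np

def batch_update(w, X, y, lr):
    # Gradient of the mean squared error, estimated over ALL patterns
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sequential_update(w, X, y, lr, rng):
    # One pattern at a time, in random order: a noisy, stochastic search
    for i in rng.permutation(len(y)):
        grad_i = (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad_i
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
w = batch_update(w, X, y, lr=0.1)
w = sequential_update(w, X, y, lr=0.1, rng=np.random.default_rng(0))
print(w)
```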


Performance Surface

Motivation for some of the practical issues


Local Minima of the Performance Criteria

- The performance surface is a very high-dimensional ($W$-dimensional) space full of local minima.

- Your best bet using gradient descent is to locate one of the local minima.
  – Start the training from different random locations (we will later see how we can make use of several networks trained this way); a sketch of this restart strategy follows below.
  – You may also use simulated annealing or genetic algorithms to improve the search in the weight space.
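A minimal multi-restart sketch in Python: train from several random initializations and keep the network with the lowest error. `init_net`, `train`, and `error` are hypothetical placeholders for your own initialization, training, and evaluation routines; none appear in the slides.

```python
def train_with_restarts(init_net, train, error, n_restarts=5):
    best_net, best_err = None, float("inf")
    for seed in range(n_restarts):
        net = train(init_net(seed))   # train from a different random start
        err = error(net)
        if err < best_err:            # keep the best local minimum found
            best_net, best_err = net, err
    return best_net
```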


Performance Surface Example

Network architecture: a 1-2-1 network (layer numbers are shown as superscripts).

Nominal parameter values:

$w^1_{1,1} = 10$, $w^1_{2,1} = 10$, $b^1_1 = -5$, $b^1_2 = 5$, $w^2_{1,1} = 1$, $w^2_{1,2} = 1$, $b^2 = -1$

[Figure: the nominal function of this network, plotted over inputs from -2 to 2 with outputs ranging from 0 to 1.]


Squared Error vs. $w^1_{1,1}$ and $b^1_1$

[Figure: surface and contour plots of the squared error as a function of $w^1_{1,1}$ and $b^1_1$.]


Squared Error vs. $w^1_{1,1}$ and $w^2_{1,1}$

[Figure: surface and contour plots of the squared error as a function of $w^1_{1,1}$ and $w^2_{1,1}$.]


Squared Error vs. $b^1_1$ and $b^1_2$

[Figure: surface and contour plots of the squared error as a function of $b^1_1$ and $b^1_2$.]


MLP & Backpropagation Summary MLP & Backpropagation Summary REST of the SLIDES are REST of the SLIDES are

ADVANCED MATERIAL ADVANCED MATERIAL (read only if you are interested, or (read only if you are interested, or

if there is something you do^’t understand…)if there is something you do^’t understand…)

These slides are thanks to John Bullinaria



Alternatives to Gradient Descent

ADVANCED MATERIAL

(read only if interested)


SUMMARY

There are alternatives to standard backpropagation, intended to speed up its convergence.

These either choose a different search direction ($p$) or a different step size ($\alpha$).

In this course, we will cover updates to standard backpropagation as an overview, namely momentum and variable-rate learning, skipping the other alternatives (those that do not follow steepest descent, such as the conjugate gradient method).
– Remember that you are never responsible for the HIDDEN slides (those that do not show in show mode but are visible when you step through the slides!)


• Variations of Backpropagation
  – Momentum: adds a momentum term to effectively increase the step size when successive updates are in the same direction.
  – Adaptive Learning Rate: tries to increase the step size, and backs off if the effect is bad (causes oscillations, as evidenced by a decrease in performance).

• Newton's Method
• Conjugate Gradient
• Levenberg-Marquardt
• Line search


Motivation for momentum (Bishop 7.5)


Effect of momentum

$\Delta w_{ij}(n) = -\eta \, \partial E/\partial w_{ij}(n) + \alpha \, \Delta w_{ij}(n-1)$

Unrolling over time:

$\Delta w_{ij}(n) = -\eta \sum_{t=0}^{n} \alpha^{n-t} \, \partial E/\partial w_{ij}(t)$

If the derivative has the same sign in consecutive iterations => the magnitude grows
If it has the opposite sign in consecutive iterations => the magnitude shrinks

• For $\Delta w_{ij}(n)$ not to diverge, $\alpha$ must be < 1.
• Effectively adds inertia to the motion through the weight space and smoothes out the oscillations.
• The smaller the $\eta$, the smoother the trajectory.
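A minimal sketch of this momentum update in Python (names are illustrative, not from the slides), showing the step magnitude growing while consecutive gradients keep the same sign:

```python
import numpy as np

def momentum_step(w, grad, prev_dw, lr=0.1, alpha=0.8):
    # dw(n) = -eta * dE/dw(n) + alpha * dw(n-1), with alpha < 1
    dw = -lr * grad + alpha * prev_dw
    return w + dw, dw

w = np.zeros(3)
prev_dw = np.zeros(3)
for grad in [np.ones(3)] * 5:            # same gradient sign every iteration
    w, prev_dw = momentum_step(w, grad, prev_dw)
    print(np.linalg.norm(prev_dw))       # step magnitude grows each time
```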


Convergence Example of Backpropagation

[Figure: backpropagation trajectory on the contour plot of the squared error versus $w^1_{1,1}$ and $w^2_{1,1}$.]


Learning Rate Too Large

[Figure: trajectory on the same $w^1_{1,1}$–$w^2_{1,1}$ error contour when the learning rate is too large.]


Momentum Backpropagation

[Figure: trajectory on the $w^1_{1,1}$–$w^2_{1,1}$ error contour using momentum backpropagation, with momentum coefficient 0.8.]


Variable Learning RateVariable Learning Rate

• If the squared error decreases after a weight update•the weight update is accepted•the learning rate is multiplied by some factor >1. •If the momentum coefficient has been previously set to zero, it is reset to its original value.

• If the squared error (over the entire training set) increases by more than some set percentage after a weight update

•weight update is discarded•the learning rate is multiplied by some factor (1>>0)•the momentum coefficient is set to zero.

• If the squared error increases by less than , then the weight update is accepted, but the learning rate and the momentum coefficient are unchanged.
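A minimal sketch of this rule in Python, using the symbols above with the values of the example that follows (1.05, 0.7, 4%) as defaults; `sq_error` and `try_update` are hypothetical stand-ins for your own error and update computations, not names from the slides.

```python
def vlr_step(w, lr, gamma, gamma0, sq_error, try_update,
             eta=1.05, rho=0.7, zeta=0.04):
    old_err = sq_error(w)                # squared error over the training set
    w_new = try_update(w, lr, gamma)     # tentative weight update
    new_err = sq_error(w_new)
    if new_err < old_err:                # error decreased: accept, speed up,
        return w_new, lr * eta, (gamma0 if gamma == 0 else gamma)
    if new_err > old_err * (1 + zeta):   # grew by more than zeta: discard,
        return w, lr * rho, 0.0          # slow down, and zero the momentum
    return w_new, lr, gamma              # small increase: accept unchanged
```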


Example

$\eta = 1.05$, $\rho = 0.7$, $\zeta = 4\%$

[Figure: variable-learning-rate trajectory on the $w^1_{1,1}$–$w^2_{1,1}$ error contour, with plots of the squared error and the learning rate versus iteration number.]