Neural Networks - Berrin Yanıkoğlu
Applications and Examples (from Mitchell, Chapter 4)


ALVINN drives 70 mph on highways


Speech Recognition


Hidden Node Functions


Head Pose Recognition


MLP & Backpropagation Issues


Considerations

• Network architecture: typically feedforward; however, you may also use local receptive fields for the hidden nodes, and recurrent nodes for sequence learning.

• Number of input, hidden, and output nodes: the number of hidden nodes is quite important; the others are determined by the problem setup.

• Activation functions: careful, regression requires a linear activation on the output; for the others, sigmoid or hyperbolic tangent is a good choice (see the sketch after this list).

• Learning rate: typically the software adjusts this.
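To make these choices concrete, here is a minimal sketch (not from the original slides) of a one-hidden-layer feedforward network; the layer sizes, names, and the tanh/linear/sigmoid activations are illustrative assumptions consistent with the advice above.

import numpy as np

def init_mlp(n_in, n_hidden, n_out, seed=0):
    # n_in and n_out are fixed by the problem setup; n_hidden is the
    # important free choice discussed above.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_out, n_hidden)), "b2": np.zeros(n_out),
    }

def forward(params, x, task="regression"):
    # Hidden layer with hyperbolic tangent activation.
    h = np.tanh(params["W1"] @ x + params["b1"])
    o = params["W2"] @ h + params["b2"]
    if task == "regression":
        return o                              # linear output for regression
    return 1.0 / (1.0 + np.exp(-o))           # sigmoid output otherwise

# Example: 3 inputs, 5 hidden nodes, 1 output.
params = init_mlp(n_in=3, n_hidden=5, n_out=1)
y = forward(params, np.array([0.2, -1.0, 0.5]))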


Considerations

• Preprocessing: important (see the next slides).

• Learning algorithm: backpropagation with momentum, or Levenberg-Marquardt, is suggested.

• When to stop training: important (see the next slides).


Preprocessing

Input variables should be decorrelated and have roughly equal variance. Typically, a very simple linear transformation is applied to the input to obtain zero-mean, unit-variance inputs:

x_pi ← (x_pi − x̄_i) / σ_i,   where   σ_i² = 1/(N−1) · Σ_p (x_pi − x̄_i)²   (the sum runs over the patterns p)

More complex preprocessing is also commonly done, e.g. principal component analysis.
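As an illustration of this transformation (a sketch, not from the original slides; it assumes the data is held in an N × d array with one row per pattern):

import numpy as np

def standardize(X):
    # X has shape (N, d): one row per pattern p, one column per input variable x_i.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)        # ddof=1 gives the 1/(N-1) variance estimate
    return (X - mean) / std, mean, std

# The same mean and std must be reused to transform validation and test data:
# X_test_scaled = (X_test - mean) / std
# For decorrelation, a PCA step can be applied after this standardization.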


When to stop training

There is no precise formula:

1) At a local minimum the gradient magnitude is 0: stop when the gradient is sufficiently small.
   • You need to calculate the gradient over the whole set of patterns.
   • You may need to measure the gradient in several directions, to avoid errors caused by numerical instability.

2) A local minimum is a stationary point of the performance index (the error): stop when the absolute change in the weights is small.
   • How to measure? Typically as a rate of change, e.g. 0.01%.

3) We are interested in generalization ability: stop when the generalization error, measured as the error on a validation set, starts to increase (early stopping; see the sketch below).
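The sketch below illustrates criterion 3 (early stopping). It is not from the slides; train_epoch and validation_error stand in for whatever training and evaluation routines are in use.

def train_with_early_stopping(params, train_epoch, validation_error,
                              max_epochs=1000, patience=10):
    # train_epoch(params)      -> params after one pass over the training set
    # validation_error(params) -> scalar error on a held-out validation set
    best_err, best_params, bad_epochs = float("inf"), params, 0
    for _ in range(max_epochs):
        params = train_epoch(params)
        err = validation_error(params)
        if err < best_err:
            best_err, best_params, bad_epochs = err, params, 0
        else:
            bad_epochs += 1            # validation error did not improve
            if bad_epochs >= patience:
                break                  # generalization has started to degrade
    return best_params, best_err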


Effects of Sequential versus Batch Mode: Summary

• Batch:
  – Better estimation of the gradient.

• Sequential (online):
  – Better if the data is highly correlated.
  – Better in terms of local minima (stochastic search).
  – Easier to implement.

(A sketch contrasting the two modes is given below.)
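A rough sketch of the two update modes (not from the slides; grad_on_set and grad_on_pattern are assumed helpers returning the error gradient):

import numpy as np

def batch_epoch(w, patterns, grad_on_set, lr=0.1):
    # Batch mode: a single update per pass, using the gradient accumulated
    # over the whole training set (better estimate of the true gradient).
    return w - lr * grad_on_set(w, patterns)

def sequential_epoch(w, patterns, grad_on_pattern, lr=0.1, seed=0):
    # Sequential (online) mode: one update per pattern, in random order,
    # giving a stochastic search that can help with local minima.
    rng = np.random.default_rng(seed)
    for p in rng.permutation(len(patterns)):
        w = w - lr * grad_on_pattern(w, patterns[p])
    return w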


Performance Surface

Motivation for some of the practical issues


Local Minima of the Performance Criterion

• The performance surface is a very high-dimensional space (one dimension per weight) that is full of local minima.

• Your best bet using gradient descent is to locate one of the local minima:
  – Start the training from different random locations (we will later see how we can make use of several networks trained this way); see the sketch below.
  – You may also use simulated annealing or genetic algorithms to improve the search in the weight space.
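A sketch of the random-restart idea (illustrative only; init_params, train, and validation_error are assumed stand-ins for the routines in use):

def train_with_restarts(init_params, train, validation_error, n_restarts=5):
    # Each run starts from different random weights and may therefore end
    # in a different local minimum; keep the best run.
    best_params, best_err = None, float("inf")
    for seed in range(n_restarts):
        params = train(init_params(seed))
        err = validation_error(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err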


Performance Surface Example: Network Architecture

A 1-2-1 network (layer numbers are shown as superscripts) with the nominal parameter values

w¹₁,₁ = 10,  w¹₂,₁ = 10,  b¹₁ = −5,  b¹₂ = 5,  w²₁,₁ = 1,  w²₁,₂ = 1,  b² = −1

[Plot of the nominal function realized by these parameter values, over the input range −2 to 2.]

A small evaluation sketch follows below.
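For reference, a sketch that evaluates this 1-2-1 network with the nominal parameter values; the log-sigmoid hidden units and linear output are an assumption, since the extracted slide lists only the parameters.

import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def nominal_network(p):
    # Nominal parameter values from the slide (superscript = layer number).
    W1 = np.array([[10.0], [10.0]])   # w1_{1,1} = 10, w1_{2,1} = 10
    b1 = np.array([-5.0, 5.0])        # b1_1 = -5,  b1_2 = 5
    W2 = np.array([[1.0, 1.0]])       # w2_{1,1} = 1, w2_{1,2} = 1
    b2 = np.array([-1.0])             # b2 = -1
    a1 = logsig(W1 @ np.atleast_1d(p) + b1)   # assumed log-sigmoid hidden layer
    return float((W2 @ a1 + b2)[0])           # assumed linear output layer

# Evaluate over the input range shown in the plot, p in [-2, 2]:
print([round(nominal_network(x), 3) for x in np.linspace(-2, 2, 9)])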

Squared Error vs. w¹₁,₁ and b¹₁

[Surface and contour plots of the squared error as a function of w¹₁,₁ and b¹₁.]

Squared Error vs. w¹₁,₁ and w²₁,₁

[Surface and contour plots of the squared error as a function of w¹₁,₁ and w²₁,₁.]

Squared Error vs. b¹₁ and b¹₂

[Surface and contour plots of the squared error as a function of b¹₁ and b¹₂.]


MLP & Backpropagation Summary

The REST of the SLIDES are ADVANCED MATERIAL (read only if you are interested, or if there is something you don't understand…).

These slides are thanks to John Bullinaria.



Alternatives to Gradient Descent

ADVANCED MATERIAL

(read only if interested)


SUMMARY

There are alternatives to standard backpropagation, intended to speed up its convergence.

These either choose a different search direction (p) or a different step size (the learning rate).

In this course, we cover the updates to standard backpropagation only as an overview, namely momentum and variable-rate learning, skipping the alternatives that do not follow steepest descent (such as the conjugate gradient method).
– Remember that you are never responsible for the HIDDEN slides (those that do not show in presentation mode but are visible when you step through the slides!).


• Variations of Backpropagation
  – Momentum: adds a momentum term to effectively increase the step size when successive updates are in the same direction.
  – Adaptive learning rate: tries to increase the step size, and reduces it again if the effect is bad (causes oscillations, as evidenced by a decrease in performance).

• Alternative search directions and step sizes:
  – Newton's method
  – Conjugate gradient
  – Levenberg-Marquardt
  – Line search


Motivation for momentum (Bishop 7.5)


Effect of momentum

Δw_ij(n) = −η ∂E/∂w_ij(n) + α Δw_ij(n−1)

where η is the learning rate and α the momentum coefficient. Unrolling over the iterations gives

Δw_ij(n) = −η Σ_{t=0..n} α^(n−t) ∂E/∂w_ij(t)

• If the gradient has the same sign in consecutive iterations, the magnitude of the update grows; if it has opposite signs, the magnitude shrinks.

• For Δw_ij(n) not to diverge, α must be < 1.

• Momentum effectively adds inertia to the motion through the weight space and smooths out the oscillations.

• The smaller the learning rate η, the smoother the trajectory.
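A minimal sketch of this update rule in code (grad is an assumed function returning ∂E/∂w for the current weights):

def momentum_step(w, delta_w_prev, grad, lr=0.1, momentum=0.8):
    # delta_w(n) = -lr * grad(w) + momentum * delta_w(n-1)
    # The momentum coefficient must be < 1 for the updates not to diverge.
    delta_w = -lr * grad(w) + momentum * delta_w_prev
    return w + delta_w, delta_w

# Usage: carry the previous update from one iteration to the next.
# w, dw = momentum_step(w, dw, grad)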


Convergence Example of Backpropagation

[Contour plot of the squared error over (w¹₁,₁, w²₁,₁), showing the trajectory followed by standard backpropagation.]

Learning Rate Too Large

[Contour plot of the squared error over (w¹₁,₁, w²₁,₁), showing the trajectory when the learning rate is too large.]

Momentum Backpropagation

[Contour plot of the squared error over (w¹₁,₁, w²₁,₁), showing the trajectory of backpropagation with momentum, momentum coefficient = 0.8.]

Variable Learning Rate

• If the squared error (over the entire training set) decreases after a weight update:
  – the weight update is accepted;
  – the learning rate is multiplied by some factor greater than 1;
  – if the momentum coefficient had previously been set to zero, it is reset to its original value.

• If the squared error increases by more than some set percentage after a weight update:
  – the weight update is discarded;
  – the learning rate is multiplied by some factor between 0 and 1;
  – the momentum coefficient is set to zero.

• If the squared error increases by less than that percentage, the weight update is accepted, but the learning rate and the momentum coefficient are left unchanged.

A sketch of these rules is given below.
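The sketch below implements these rules (illustrative only; squared_error and gradient are assumed helpers, and the default factors follow the example values on the next slide):

def variable_lr_step(w, delta_w_prev, state, gradient, squared_error,
                     inc=1.05, dec=0.7, max_increase=0.04, momentum0=0.8):
    # state holds the current learning rate and momentum coefficient,
    # e.g. state = {"lr": 0.1, "momentum": momentum0}.
    old_err = squared_error(w)
    delta_w = -state["lr"] * gradient(w) + state["momentum"] * delta_w_prev
    new_w = w + delta_w
    new_err = squared_error(new_w)

    if new_err < old_err:                        # error decreased: accept, speed up
        state["lr"] *= inc
        state["momentum"] = momentum0            # restore momentum if it was zeroed
        return new_w, delta_w, state
    if new_err > old_err * (1 + max_increase):   # error grew too much: reject, slow down
        state["lr"] *= dec
        state["momentum"] = 0.0
        return w, delta_w_prev, state
    return new_w, delta_w, state                 # small increase: accept, leave settings unchanged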


Example

[Trajectory of variable-learning-rate backpropagation in the (w¹₁,₁, w²₁,₁) plane, together with plots of the squared error and the learning rate against the iteration number. Parameter values: learning-rate increase factor 1.05, decrease factor 0.7, error-increase threshold 4%.]
