
Supervised learning

1. Early learning algorithms

2. First order gradient methods

3. Second order gradient methods

Early learning algorithms

• Designed for single layer neural networks

• Generally more limited in their applicability

• Some of them are:
– Perceptron learning
– LMS or Widrow-Hoff learning
– Grossberg learning

Perceptron learning

1. Randomly initialize all the network weights.

2. Apply the inputs and compute the outputs (feedforward).

3. Compute the errors.

4. Update each weight as

$$w_{ij}(k+1) = w_{ij}(k) + e_j(k)\,p_i(k)$$

where $e_j(k)$ is the error of output neuron $j$ and $p_i(k)$ is the $i$-th input.

5. Repeat steps 2 to 4 until the errors reach a satisfactory level.
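
A minimal sketch of steps 1–5 above. The dataset, the stopping test, the hard-limit threshold, and the bias update are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def hardlim(n):
    """Hard-limit activation: 1 if n >= 0, else 0."""
    return (n >= 0).astype(float)

def perceptron_train(P, T, max_epochs=100):
    """Single-layer perceptron learning with the rule
    w_ij(k+1) = w_ij(k) + e_j(k) * p_i(k)  (bias update b += e added here as an assumption)."""
    n_inputs, n_samples = P.shape
    n_outputs = T.shape[0]
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5, 0.5, size=(n_outputs, n_inputs))  # step 1: random initialization
    b = rng.uniform(-0.5, 0.5, size=(n_outputs, 1))
    for epoch in range(max_epochs):
        total_errors = 0
        for k in range(n_samples):
            p = P[:, k:k+1]                 # step 2: apply input
            a = hardlim(W @ p + b)          #         feedforward output
            e = T[:, k:k+1] - a             # step 3: compute error
            W = W + e @ p.T                 # step 4: weight update
            b = b + e
            total_errors += int(np.abs(e).sum())
        if total_errors == 0:               # step 5: stop when errors are satisfactory
            break
    return W, b

# Example: learn the logical AND of two binary inputs
P = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)
T = np.array([[0, 0, 0, 1]], dtype=float)
W, b = perceptron_train(P, T)
print(hardlim(W @ P + b))   # expected: [[0. 0. 0. 1.]]
```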

Performance Optimization: Gradient-Based Methods

Basic Optimization Algorithm

Steepest Descent (first order Taylor expansion)
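
For reference, the steepest descent update follows from the first-order Taylor expansion of the performance index $F(x)$ about the current point $x_k$ (a restatement of the standard derivation, not copied from the slide):

$$F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k, \qquad g_k = \nabla F(x)\big|_{x=x_k}$$

Choosing $\Delta x_k = -\alpha g_k$ with learning rate $\alpha > 0$ makes $g_k^T \Delta x_k = -\alpha \lVert g_k \rVert^2 \le 0$, so for small $\alpha$ the index decreases, giving the update

$$x_{k+1} = x_k - \alpha g_k.$$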

Example

Plot

[Plot omitted; both axes range from −2 to 2.]

LMS or Widrow-Hoff learning

• First we introduce the ADALINE (ADAptive LInear NEuron) network

LMS or Widrow-Hoff learning, or Delta Rule

• The ADALINE network has the same basic structure as the perceptron network
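
For reference (standard ADALINE definition, not shown explicitly on the slide), the difference is that the ADALINE output is linear rather than hard-limited:

$$a = \mathrm{purelin}(Wp + b) = Wp + b$$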

Approximate Steepest Descent

Approximate Gradient Calculation

LMS Algorithm

This algorithm is inspired by the steepest descent algorithm.

Multiple-Neuron Case
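
A minimal sketch of the LMS rule in the multiple-neuron (matrix) form W(k+1) = W(k) + 2α e(k) p(k)ᵀ, b(k+1) = b(k) + 2α e(k), as given in Hagan et al.; the training data and learning rate below are illustrative assumptions.

```python
import numpy as np

def lms_train(P, T, alpha=0.05, epochs=100):
    """ADALINE / LMS (Widrow-Hoff) learning, multiple-neuron matrix form:
       W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T,  b(k+1) = b(k) + 2*alpha*e(k)."""
    n_inputs, n_samples = P.shape
    n_outputs = T.shape[0]
    W = np.zeros((n_outputs, n_inputs))
    b = np.zeros((n_outputs, 1))
    for _ in range(epochs):
        for k in range(n_samples):
            p = P[:, k:k+1]
            a = W @ p + b            # linear (purelin) output
            e = T[:, k:k+1] - a      # approximate gradient uses only the current error
            W = W + 2 * alpha * e @ p.T
            b = b + 2 * alpha * e
    return W, b

# Example: fit the linear mapping t = 2*p1 - p2 + 1 from a few samples
P = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, 1.0,  2.0, -1.0]])
T = 2 * P[0:1, :] - P[1:2, :] + 1
W, b = lms_train(P, T)
print(np.round(W, 2), np.round(b, 2))   # converges toward W ~ [[2. -1.]] and b ~ [[1.]]
```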

Difference between perceptron learning and LMS learning

• The key difference is the DERIVATIVE

• The linear activation function has a derivative, but

• the sign activation function (bipolar or unipolar) has no derivative, so an error gradient cannot be computed through it.

Grossberg learning (associative learning)

• Sometimes known as instar and outstar training

• Updating rule:

$$w_i(k+1) = w_i(k) + \alpha\,\big[x_i(k) - w_i(k)\big]$$

• Where $x_i$ could be the desired input values (instar training, example: clustering) or the desired output values (outstar training), depending on the network structure.

• Grossberg network (see Hagan for more details)
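
A minimal sketch of the instar form of this rule, where the weights drift toward a repeatedly presented input pattern; the data, learning rate, and single-unit setup are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def instar_update(w, x, alpha=0.1):
    """Grossberg instar rule: w_i(k+1) = w_i(k) + alpha * (x_i(k) - w_i(k))."""
    return w + alpha * (x - w)

# Example: the weight vector converges toward the presented input pattern
x = np.array([1.0, 0.0, 0.5])       # illustrative input pattern
w = np.zeros(3)                      # initial weights
for k in range(50):
    w = instar_update(w, x)
print(np.round(w, 3))                # close to [1.  0.  0.5]
```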

First order gradient method

Back propagation

Multilayer Perceptron

R – S1 – S2 – S3 Network

Example

Elementary Decision Boundaries

Total Network

Function Approximation Example

Nominal Response

[Plot omitted; horizontal axis from −2 to 2, vertical axis from −1 to 3.]

Parameter Variations

Multilayer Network

Performance Index

Chain Rule

Gradient Calculation

Steepest Descent

Jacobian Matrix

Backpropagation (Sensitivities)

Initialization (Last Layer)
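
For reference, the backpropagation algorithm summarized by the preceding slides can be written in the notation of Hagan et al. (a restatement of the textbook result these slides follow, not the slides' exact expressions):

Approximate performance index (per sample):
$$\hat F(x) = e^T(k)\,e(k) = \big(t(k)-a(k)\big)^T \big(t(k)-a(k)\big)$$

Forward propagation:
$$a^0 = p, \qquad a^{m+1} = f^{m+1}\!\big(W^{m+1}a^m + b^{m+1}\big),\ \ m = 0,\dots,M-1, \qquad a = a^M$$

Sensitivity initialization (last layer) and backpropagation:
$$s^M = -2\,\dot F^M(n^M)\,(t-a), \qquad s^m = \dot F^m(n^m)\,\big(W^{m+1}\big)^T s^{m+1},\ \ m = M-1,\dots,1$$

Steepest descent weight update:
$$W^m(k+1) = W^m(k) - \alpha\, s^m \big(a^{m-1}\big)^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$$

where $\dot F^m(n^m)$ is the diagonal matrix of transfer-function derivatives at layer $m$.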

Summary

• Back-propagation training algorithm

• Backprop adjusts the weights of the NN in order to minimize the network total mean squared error.

Network activation: Forward Step

Error propagation: Backward Step

Summary

Example: Function Approximation

Network

Initial Conditions

Forward Propagation

Transfer Function Derivatives
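
Assuming the usual textbook setup of a log-sigmoid hidden layer and a linear output layer (the slide's exact architecture is an assumption here), the required derivatives are:

$$\frac{d}{dn}\,\mathrm{logsig}(n) = \frac{e^{-n}}{(1+e^{-n})^2} = (1-a)\,a, \qquad \frac{d}{dn}\,\mathrm{purelin}(n) = 1$$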

Backpropagation

Weight Update
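
A numerical sketch of one iteration of this function approximation example, assuming a 1-2-1 network with a log-sigmoid hidden layer and linear output; the initial weights, training point, and learning rate follow the standard example in Hagan et al. and should be treated as illustrative rather than the slide's actual values.

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

# Target function to approximate (textbook example): g(p) = 1 + sin(pi*p/4)
g = lambda p: 1 + np.sin(np.pi * p / 4)

# Illustrative initial conditions for a 1-2-1 network
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
alpha = 0.1
p = np.array([[1.0]])
t = g(p)

# Forward propagation
a0 = p
n1 = W1 @ a0 + b1; a1 = logsig(n1)           # hidden layer (log-sigmoid)
n2 = W2 @ a1 + b2; a2 = n2                    # output layer (linear)
e = t - a2

# Transfer function derivatives and sensitivities (backpropagation)
F2 = np.array([[1.0]])                        # derivative of purelin
F1 = np.diagflat(a1 * (1 - a1))               # derivative of logsig
s2 = -2 * F2 @ e                              # last-layer sensitivity
s1 = F1 @ W2.T @ s2                           # backpropagated sensitivity

# Weight update (steepest descent)
W2 = W2 - alpha * s2 @ a1.T; b2 = b2 - alpha * s2
W1 = W1 - alpha * s1 @ a0.T; b1 = b1 - alpha * s1
print(e.item())                               # error before the update
```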

Choice of Architecture

Choice of Network Architecture

Convergence: global minimum (left), local minimum (right)

Generalization

Disadvantage of BP algorithm

• Slow convergence speed

• Sensitivity to initial conditions

• Trapped in local minima

• Instability if the learning rate is too large

• Note: despite the above disadvantages, BP is widely used in the control community, and there are numerous extensions that improve the basic algorithm.

Improved BP algorithms (first order gradient methods)

1. BP with momentum (see the sketch after this list)

2. Delta- bar- delta

3. Decoupled momentum

4. RProp

5. Adaptive BP

6. Trinary BP

7. BP with adaptive gain

8. Extended BP
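
As a sketch of item 1 (BP with momentum): the standard modification low-pass filters the weight change with a momentum coefficient γ, shown here in the form used by Hagan et al.; the quadratic test function, learning rate, and momentum value below are illustrative assumptions.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, alpha=0.1, gamma=0.9):
    """One BP-with-momentum update (momentum filtering of the step):
       dw(k) = gamma * dw(k-1) - (1 - gamma) * alpha * grad(k)
       w(k+1) = w(k) + dw(k)
    Momentum smooths oscillations and can speed convergence in flat regions."""
    delta = gamma * prev_delta - (1 - gamma) * alpha * grad
    return w + delta, delta

# Illustrative use on a toy quadratic F(w) = 0.5 * w^T A w
A = np.diag([1.0, 50.0])                  # badly conditioned, where momentum helps
w = np.array([1.0, 1.0]); delta = np.zeros(2)
for k in range(200):
    grad = A @ w
    w, delta = momentum_step(w, grad, delta, alpha=0.02, gamma=0.9)
print(np.round(w, 4))                      # approaches the minimum at [0, 0]
```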
