
CHAPTER 5

STOCHASTIC GRADIENT FORM OF STOCHASTIC APPROXIMATION

• Organization of chapter in ISSO
  – Stochastic gradient
    • Core algorithm
    • Basic principles
    • Nonlinear regression
    • Connections to LMS
  – Neural network training
  – Discrete event dynamic systems
  – Image processing

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall


5-2

Stochastic Gradient Formulation

• For differentiable L(θ), recall familiar set of p equations and p unknowns for use in finding a minimum θ*:

$$g(\theta) = \frac{\partial L(\theta)}{\partial \theta} = 0$$

• Above is special case of root-finding problem

• Suppose cannot observe L(θ) and g(θ) except in presence of noise

  – Adaptive control (target tracking)

  – Simulation-based optimization

  – Etc.

• Seek unbiased measurement of ∂L/∂θ for optimization
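A minimal sketch of this setup (not from ISSO): a hypothetical quadratic L(θ) with gradient g(θ) = Aθ − b, where the algorithm is only allowed a noise-corrupted measurement Y(θ) of g(θ); the matrix A, vector b, and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic loss, for illustration only (not from ISSO):
# L(theta) = 0.5*theta'A theta - b'theta, so g(theta) = A theta - b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def g(theta):
    """True (unobservable) gradient of L."""
    return A @ theta - b

def Y(theta, noise_sd=0.5):
    """Noisy gradient measurement: Y(theta) = g(theta) + e, with E[e] = 0."""
    return g(theta) + noise_sd * rng.standard_normal(theta.shape)

theta_star = np.linalg.solve(A, b)   # root of g(theta) = 0 (the minimum)
print(theta_star, g(theta_star))     # gradient is (numerically) zero at theta*
```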


5-3

Stochastic Gradient Formulation (Cont’d)

• Suppose L(θ) = E[Q(θ,V)]

  – V represents all random effects

  – Q(θ,V) represents “observed” cost (noisy measurement of L(θ))

• Seek a representation where ∂Q/∂θ is an unbiased measurement of ∂L/∂θ

  – Not true when distribution function for V depends on θ

• Above implies that desired representation is

$$E[Q(\theta, V)] = \int Q(\theta, v)\, p_V(v)\, dv,$$

not

$$E[Q(\theta, V)] = \int Q(\theta, v)\, p_V(v \mid \theta)\, dv,$$

where p_V(·) is density function for V
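A quick numerical check of the unbiasedness claim, under an assumed toy choice Q(θ,V) = (θ − V)² with V ~ N(μ, σ²), so the distribution of V does not depend on θ; here L(θ) = (θ − μ)² + σ² and ∂L/∂θ = 2(θ − μ), and the sample average of ∂Q/∂θ should match it. The values of μ, σ, and θ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy choice for illustration (not from ISSO): Q(theta, V) = (theta - V)**2,
# V ~ N(mu, sigma^2).  The distribution of V does not depend on theta, and
# L(theta) = E[Q] = (theta - mu)**2 + sigma**2, so dL/dtheta = 2*(theta - mu).
mu, sigma, theta = 1.5, 2.0, 0.3

V = rng.normal(mu, sigma, size=1_000_000)
dQ_dtheta = 2.0 * (theta - V)      # sample values of dQ/dtheta
print(dQ_dtheta.mean())            # approx. 2*(theta - mu) = -2.4  -> unbiased
print(2.0 * (theta - mu))          # exact dL/dtheta, for comparison
```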


5-4

Stochastic Gradient Measurement and Algorithm

• When density p_V(·) is independent of θ,

$$Y(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta}$$

is unbiased measurement of ∂L/∂θ

  – Above requires derivative–integral interchange in ∂L/∂θ = ∂E[Q(θ,V)]/∂θ = E[∂Q(θ,V)/∂θ] to be valid

• Can use root-finding (Robbins–Monro) SA algorithm to attempt to find θ*:

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$$

• Unbiased measurement satisfies key convergence conditions of SA (Section 4.3 in ISSO)
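A minimal sketch of the recursion above on an assumed toy problem (L(θ) = ½‖θ − t‖², so θ* = t), with the noisy gradient measurement, the target t, and the gain constants a, A, α all chosen arbitrarily for illustration; a_k = a/(k+1+A)^α is one common decaying-gain choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem for illustration: L(theta) = 0.5*||theta - t||^2, so the true
# gradient is g(theta) = theta - t and theta* = t; only noisy gradients are seen.
t = np.array([2.0, -1.0])

def Y(theta):
    """Unbiased noisy measurement of dL/dtheta."""
    return (theta - t) + 0.5 * rng.standard_normal(2)

theta = np.zeros(2)                      # theta_hat_0
a, A, alpha = 1.0, 10.0, 0.602           # illustrative gain constants
for k in range(5000):
    a_k = a / (k + 1 + A) ** alpha       # decaying gain sequence a_k
    theta = theta - a_k * Y(theta)       # theta_{k+1} = theta_k - a_k Y_k(theta_k)

print(theta)                             # close to theta* = [2, -1]
```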


5-5

Stochastic Gradient Tendency to Move Iterate in Correct Direction


5-6

Stochastic Gradient and LMS Connections

• Recall basic linear model from Chapter 3:

$$z_k = h_k^T \theta + v_k$$

• Consider standard MSE loss: $L(\theta) = \tfrac{1}{2} E\bigl[(z_k - h_k^T \theta)^2\bigr]$

  – Implies $Q = \tfrac{1}{2}(z_k - h_k^T \theta)^2$

• Recall basic LMS algorithm from Chapter 3:

$$\hat{\theta}_{k+1} = \hat{\theta}_k + a_k h_{k+1}\bigl(z_{k+1} - h_{k+1}^T \hat{\theta}_k\bigr)$$

• Hence LMS is direct application of stochastic gradient SA, with Y_k = ∂Q/∂θ as the gradient measurement

• Proposition 5.1 in ISSO shows how SA convergence theory applies to LMS

  – Implies convergence of LMS to θ*
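A short sketch of LMS viewed as a stochastic gradient recursion; the true parameter vector, regressor distribution, noise level, and small constant gain are invented for this sketch and are not values from ISSO.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative linear model (parameter values, noise level, and gain are
# invented for this sketch): z = h' theta_true + v.
theta_true = np.array([1.0, -2.0, 0.5])
p = theta_true.size

theta = np.zeros(p)                                   # theta_hat_0
a_k = 0.01                                            # small constant gain
for k in range(20000):
    h = rng.standard_normal(p)                        # regressor h_{k+1}
    z = h @ theta_true + 0.3 * rng.standard_normal()  # measurement z_{k+1}
    # LMS update = stochastic gradient step on Q = 0.5*(z - h'theta)^2
    theta = theta + a_k * h * (z - h @ theta)

print(theta)   # close to theta_true
```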


5-7

Neural Networks

• Neural networks (NNs) are general function approximators

• Actual output z_k represented by a NN according to standard model z_k = h(θ, x_k) + v_k

  – h(θ, x_k) represents NN output for input x_k and weight values θ

  – v_k represents noise

• Diagram of simple feedforward NN on next slide

• Most popular training method is backpropagation (mean-squared-type loss function)

• Backpropagation is following stochastic gradient recursion:

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \frac{\partial Q(\hat{\theta}_k, V_{k+1})}{\partial \theta} = \hat{\theta}_k + a_k \frac{\partial h(\hat{\theta}_k, x_{k+1})}{\partial \theta}\,\bigl[z_{k+1} - h(\hat{\theta}_k, x_{k+1})\bigr]$$
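A minimal sketch of backpropagation as a per-sample stochastic gradient update for a one-hidden-layer network y = w2ᵀ tanh(W1 x + b1) + b2; the target function, network size, gain, and noise level are invented for illustration and are not the example in ISSO.

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal sketch (not the ISSO example): one-hidden-layer network
# h(theta, x) = w2' tanh(W1 x + b1) + b2, trained on noisy samples of an
# assumed target function via the per-sample stochastic gradient (backprop) step.
n_in, n_hid = 2, 5
W1 = 0.5 * rng.standard_normal((n_hid, n_in))
b1 = np.zeros(n_hid)
w2 = 0.5 * rng.standard_normal(n_hid)
b2 = 0.0

def target(x):                      # "true" system generating z_k (illustrative)
    return np.sin(x[0]) + 0.5 * x[1]

a_k = 0.05                          # small constant gain for simplicity
for k in range(50000):
    x = rng.uniform(-2, 2, size=n_in)
    z = target(x) + 0.1 * rng.standard_normal()   # noisy measurement z_k
    a = np.tanh(W1 @ x + b1)                       # hidden activations
    y = w2 @ a + b2                                # network output h(theta, x)
    e = y - z                                      # dQ/dy for Q = 0.5*(z - y)^2
    # Backpropagated partial derivatives of Q with respect to each weight group
    gw2 = e * a
    gb2 = e
    gb1 = e * w2 * (1.0 - a**2)
    gW1 = np.outer(gb1, x)
    # Stochastic gradient step: theta <- theta - a_k * dQ/dtheta
    W1 -= a_k * gW1
    b1 -= a_k * gb1
    w2 -= a_k * gw2
    b2 -= a_k * gb2

print(y, target(x))   # rough check: output tracks the target on the last sample
```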


5-8

Simple Feedforward Neural Network with p = 25 Weight Parameters


5-9

Discrete-Event Dynamic Systems

• Many applications of stochastic gradient methods in simulation-based optimization

• Discrete-event dynamic systems frequently modeled by simulation

  – Trajectories of process are piecewise constant

• Derivative–integral interchange critical

  – Interchange not valid in many realistic systems

  – Interchange condition checked on case-by-case basis

• Overall approach requires knowledge of inner workings of simulation

  – Needed to obtain ∂Q(θ,V)/∂θ

  – Chapters 14 and 15 of ISSO have extensive discussion of simulation-based optimization
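A toy illustration (not from ISSO) of why the interchange can fail for piecewise-constant trajectories: with the assumed cost Q(θ,V) = 1{V ≤ θ} and V ~ Uniform(0,1), L(θ) = E[Q] = θ so ∂L/∂θ = 1, yet ∂Q/∂θ = 0 for almost every sample, so averaging pathwise derivatives gives the wrong answer.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy discrete-event-style cost (illustrative only): Q(theta, V) = 1{V <= theta},
# V ~ Uniform(0, 1).  Then L(theta) = E[Q(theta, V)] = theta and dL/dtheta = 1.
theta = 0.4
V = rng.uniform(0.0, 1.0, size=1_000_000)

# Q is piecewise constant in theta for every fixed V, so its pathwise
# derivative dQ/dtheta is 0 for almost every sample:
dQ_dtheta = np.zeros_like(V)
print(dQ_dtheta.mean())            # 0.0 -- not dL/dtheta = 1 (interchange fails)

def L_hat(th):
    """Monte Carlo estimate of L(th) = E[Q(th, V)]."""
    return (V <= th).mean()

# Differencing the estimated *mean* does recover the derivative of L:
print((L_hat(theta + 0.05) - L_hat(theta)) / 0.05)   # approx. 1
```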


5-10

Image Restoration

• Aim is to recover true image subject to having recorded image corrupted by noise

• Common to construct least-squares type problem

$$\min_s \|Z - Hs\|^2$$

where Hs represents a convolution of the measurement process (H) and the true pixel-by-pixel image (s)

• Can be solved by either batch linear regression methods or the LMS/RLS methods

• Nonlinear measurements need full power of stochastic gradient method

  – Measurements modeled as Z = F(s, x, V)
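A tiny 1-D sketch of the batch least-squares route mentioned above; the blur matrix H, image size, and noise level are invented for illustration, and the LMS/RLS route would instead update s one measurement at a time.

```python
import numpy as np

rng = np.random.default_rng(6)

# Tiny 1-D restoration sketch (sizes, blur kernel, and noise level invented
# for illustration): recorded data Z = H s + noise for a simple blur H.
n = 50
s_true = np.zeros(n)
s_true[20:30] = 1.0                                    # "true image": a bright bar
H = 0.6 * np.eye(n) + 0.2 * np.eye(n, k=1) + 0.2 * np.eye(n, k=-1)   # 3-tap blur
Z = H @ s_true + 0.01 * rng.standard_normal(n)         # recorded, noisy image

# Batch linear least-squares solution of  min_s ||Z - H s||^2
s_hat, *_ = np.linalg.lstsq(H, Z, rcond=None)
print(np.max(np.abs(s_hat - s_true)))                  # small relative to signal level 1.0
```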