NLP Unconstrained Multivariable

Optimization in Engineering Design

Georgia Institute of Technology, Systems Realization Laboratory

What you can do for one variable, you can do for many (in principle).


    Method of Steepest Descent

The method of steepest descent (also known as the gradient method) is the simplest example of a gradient-based method for minimizing a function of several variables.

    Its core is the following recursion formula:

$x_{k+1} = x_k - \alpha_k \nabla F(x_k)$

$x_k$, $x_{k+1}$ = values of the variables at iterations k and k+1.

F(x) = objective function to be minimized (or maximized)

$\nabla F$ = gradient of the objective function, constituting the direction of travel; $\alpha_k$ = the size of the step in the direction of travel.

    Advantage: Simple

    Disadvantage: Seldom converges reliably.

Remember: Direction = $d_k = S^{(k)} = -\nabla F(x^{(k)})$

    Refer to Section 3.5 for Algorithm and Stopping Criteria
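As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this recursion; the quadratic test function, the fixed step size, the gradient-norm stopping test, and the iteration cap are illustrative assumptions:

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize F by repeatedly stepping along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)                  # direction of travel is -g
        if np.linalg.norm(g) < tol:    # stop when the gradient is (nearly) zero
            break
        x = x - alpha * g              # x_{k+1} = x_k - alpha_k * grad F(x_k)
    return x

# Example: F(x) = x1^2 + 4*x2^2, gradient (2*x1, 8*x2); minimum at the origin.
x_star = steepest_descent(lambda x: np.array([2.0 * x[0], 8.0 * x[1]]), x0=[3.0, -2.0])
```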


    Newton's Method (multi-variable case)

How do we extend Newton's method to the multivariable case? A first attempt is simply to reuse the single-variable update:

$x_{k+1} = x_k - \dfrac{y'(x_k)}{y''(x_k)}$

Is this correct? No. Why?

Start again with the Taylor expansion (the higher-order remainder is dropped; what is the significance of that?):

$y(x) = y(x_k) + \nabla y(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T H(x_k)\,(x - x_k)$

Note that H is the Hessian, containing the second-order derivatives.

$x_{k+1} = x_k - \dfrac{\nabla y(x_k)}{H(x_k)}$

Is this correct? Not yet. Why? (Don't confuse $H^{-1}$ with $1/H$.)

Newton's method for finding an extreme point is

$x_{k+1} = x_k - H^{-1}(x_k)\,\nabla y(x_k)$

Like the steepest descent method, Newton's method searches in the negative gradient direction. See Sec. 1.4.
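A minimal Python sketch of this final update, solving $H(x_k)\,d = \nabla y(x_k)$ rather than forming $H^{-1}$ explicitly; the quadratic test function, tolerance, and iteration cap are illustrative assumptions:

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Find an extreme point by iterating x_{k+1} = x_k - H^{-1}(x_k) grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), g)   # solve H d = g instead of inverting H
        x = x - d
    return x

# Example: y(x) = x1^2 + 4*x2^2 + x1*x2 (quadratic, so Newton converges in one step).
grad = lambda x: np.array([2.0 * x[0] + x[1], 8.0 * x[1] + x[0]])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 8.0]])
x_star = newton_method(grad, hess, x0=[5.0, -3.0])
```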


    Properties of Newton's Method

Good properties (fast convergence) if started near the solution.

However, it needs modifications if started far away from the solution.

Also, the (inverse) Hessian is expensive to calculate.

    To overcome this, several modifications are often made.

One of them is to add a search (step-size) parameter in front of the Hessian term (similar to steepest descent). This is often referred to as the modified Newton's method.

Other modifications focus on enhancing the combination of first- and second-order gradient information. Quasi-Newton methods build up curvature information by observing the behavior of the objective function and its first-order gradient. This information is used to generate an approximation of the Hessian.
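As an illustration of the modified Newton idea above, here is a minimal sketch with a step parameter $\alpha_k$ in front of the Newton step, chosen by simple backtracking; the backtracking rule and its constants are illustrative assumptions, not from the slides:

```python
import numpy as np

def modified_newton(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton's method with a step parameter alpha_k applied to the Newton step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(hess(x), g)   # full Newton direction
        alpha = 1.0
        # Backtrack until the step actually decreases f (a safeguard far from the solution).
        while f(x + alpha * d) > f(x) and alpha > 1e-8:
            alpha *= 0.5
        x = x + alpha * d
    return x
```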


    Conjugate Directions Method

Conjugate direction methods can be regarded as somewhat in between steepest descent and Newton's method, having the positive features of both of them.

Motivation: Desire to accelerate the slow convergence of steepest descent, but avoid the expensive evaluation, storage, and inversion of the Hessian.

Application: Conjugate direction methods are invariably invented and analyzed for the quadratic problem:

Minimize: $y(x) = \tfrac{1}{2}\, x^T Q x - b^T x$

Note: The condition for optimality is $\nabla y = Qx - b = 0$, or $Qx = b$ (a linear equation).

Note: The textbook uses A instead of Q.
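As a quick check of that optimality condition (a one-line derivation, assuming Q is symmetric as above):

$\nabla y(x) = \nabla\!\left(\tfrac{1}{2} x^T Q x - b^T x\right) = \tfrac{1}{2}(Q + Q^T)x - b = Qx - b$, so $\nabla y(x^*) = 0$ exactly when $Qx^* = b$.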


    Basic Principle

Definition: Given a symmetric matrix Q, two vectors $d_1$ and $d_2$ are said to be Q-orthogonal, or Q-conjugate (with respect to Q), if $d_1^T Q d_2 = 0$.

Note that orthogonal vectors ($d_1^T d_2 = 0$) are a special case of conjugate vectors.

So, since the vectors $d_i$ are independent, the solution to the $n \times n$ quadratic problem can be rewritten as

$x^* = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$

Multiplying by Q and taking the scalar product with $d_i$, you can express $\alpha_i$ in terms of $d_i$, Q, and either $x^*$ or $b$.

Note that A is used instead of Q in your textbook.
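Spelling that step out (a short derivation using only the Q-conjugacy of the $d_i$ and $Qx^* = b$ from the previous slide):

$d_i^T Q x^* = d_i^T Q\,(\alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}) = \alpha_i\, d_i^T Q d_i$, since all cross terms vanish by conjugacy, so

$\alpha_i = \dfrac{d_i^T Q x^*}{d_i^T Q d_i} = \dfrac{d_i^T b}{d_i^T Q d_i}.$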


    Conjugate Gradient Method

The conjugate gradient method is the conjugate direction method obtained by selecting the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses.

    You generate the conjugate directions as you go along.

Search direction at iteration k:

$d_k = -g_k + \sum_{i=0}^{k-1} \beta_i d_i$, or $d_{k+1} = -g_{k+1} + \beta_k d_k$

    Three advantages:

1) The gradient is always nonzero and linearly independent of all previous direction vectors.

2) Simple formula to determine the new direction; only slightly more complicated than steepest descent.

    3) Process makes good progress because it is based on gradients.


Pure Conjugate Gradient Method (Quadratic Case)

0 - Starting at any $x_0$, define $d_0 = -g_0 = b - Qx_0$, where $g_k$ is the column vector of gradients of the objective function at the point $x_k$.

1 - Using $d_k$, calculate the new point $x_{k+1} = x_k + \alpha_k d_k$, where

$\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k}$

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where

$\beta_k = \dfrac{g_{k+1}^T Q d_k}{d_k^T Q d_k}$

This is slightly different from your current textbook.

Note that $\alpha_k$ is calculated analytically here, rather than with a line search.
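A minimal Python sketch of this pure conjugate gradient loop; the example Q, b, and starting point are illustrative assumptions:

```python
import numpy as np

def conjugate_gradient_quadratic(Q, b, x0, tol=1e-10):
    """Minimize y(x) = 0.5 x^T Q x - b^T x (Q symmetric positive definite)."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b              # gradient of y at x
    d = -g                     # d_0 = -g_0 = b - Q x_0
    for _ in range(len(b)):    # at most n steps are needed for a quadratic
        if np.linalg.norm(g) < tol:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)      # alpha_k = -g_k^T d_k / (d_k^T Q d_k)
        x = x + alpha * d
        g = Q @ x - b
        beta = (g @ Qd) / (d @ Qd)       # beta_k = g_{k+1}^T Q d_k / (d_k^T Q d_k)
        d = -g + beta * d
    return x

# Example with an assumed 2x2 system: the minimizer satisfies Q x = b.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = conjugate_gradient_quadratic(Q, b, x0=np.zeros(2))
```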


    Non-Quadratic Conjugate Gradient Methods

For non-quadratic cases, you have the problem that you do not know Q, and you would have to make an approximation.

One approach is to substitute the Hessian H(x_k) for Q.

The problem is that the Hessian has to be evaluated at each point.

Other approaches avoid Q completely by using line searches.

Examples: the Fletcher-Reeves and Polak-Ribière methods.

Differences from the pure conjugate gradient algorithm: $\alpha_k$ is found through a line search, and different formulas are used for calculating $\beta_k$.


Polak-Ribière & Fletcher-Reeves Methods for Minimizing f(x)

0 - Starting at any $x_0$, define $d_0 = -g_0$, where $g$ is the column vector of gradients of the objective function at the point $x$.

1 - Using $d_k$, find the new point $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is found using a line search that minimizes $f(x_k + \alpha_k d_k)$.

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $\beta_k$ varies depending on which (update) formula you use.

Fletcher-Reeves: $\beta_k = \dfrac{g_{k+1}^T\, g_{k+1}}{g_k^T\, g_k}$

Polak-Ribière: $\beta_k = \dfrac{(g_{k+1} - g_k)^T\, g_{k+1}}{g_k^T\, g_k}$

Note: $g_{k+1}$ is the gradient of the objective function at the point $x_{k+1}$.
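A minimal Python sketch of steps 0-2 with both $\beta_k$ updates; the use of scipy.optimize.minimize_scalar for the line search, the search bounds, and the Rosenbrock test problem are illustrative assumptions, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, variant="fletcher-reeves", tol=1e-6, max_iter=200):
    """Nonlinear conjugate gradient with a line search for alpha_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Line search: choose alpha_k to minimize f(x_k + alpha * d_k).
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 10.0), method="bounded").x
        x = x + alpha * d
        g_new = grad(x)
        if variant == "fletcher-reeves":
            beta = (g_new @ g_new) / (g @ g)
        else:                                 # Polak-Ribiere update
            beta = ((g_new - g) @ g_new) / (g @ g)
        d = -g_new + beta * d                 # d_{k+1} = -g_{k+1} + beta_k d_k
        g = g_new
    return x

# Example on the Rosenbrock function (an assumed test problem).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x_star = nonlinear_cg(f, grad, x0=[-1.2, 1.0], variant="polak-ribiere")
```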


    Fletcher-Reeves Method for Minimizing f(x)

0 - Starting at any $x_0$, define $d_0 = -g_0$, where $g$ is the column vector of gradients of the objective function at the point $x$.

1 - Using $d_k$, find the new point $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is found using a line search that minimizes $f(x_k + \alpha_k d_k)$.

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where

$\beta_k = \dfrac{g_{k+1}^T\, g_{k+1}}{g_k^T\, g_k}$

See also Example 3.9 (page 73) in your textbook.


    Conjugate Gradient Method Advantages

The simple formulae for updating the direction vector are attractive.

The method is slightly more complicated than steepest descent, but it converges faster.

For animations of all of the preceding search techniques (see them in action!), check out:

http://www.esm.vt.edu/~zgurdal/COURSES/4084/4084-Docs/Animation.html