NLP Unconstrained Multivariable

Optimization in Engineering Design

Georgia Institute of Technology, Systems Realization Laboratory

What you can do for one variable, you can do for many (in principle).


    Method of Steepest Descent

The method of steepest descent (also known as the gradient method) is the simplest example of a gradient-based method for minimizing a function of several variables.

    Its core is the following recursion formula:

$x_{k+1} = x_k - \alpha_k \nabla F(x_k)$

$x_k$, $x_{k+1}$ = values of the variables at iterations k and k+1.

F(x) = objective function to be minimized (or maximized)

$\nabla F$ = gradient of the objective function, constituting the direction of travel; $\alpha_k$ = the size of the step in the direction of travel.

    Advantage: Simple

    Disadvantage: Seldom converges reliably.

Remember: Direction = $d_k = S^{(k)} = -\nabla F(x^{(k)})$

    Refer to Section 3.5 for Algorithm and Stopping Criteria
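As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this recursion; the quadratic test function, the fixed step size, the gradient-norm stopping test, and the iteration cap are illustrative assumptions:

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize F by repeatedly stepping along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)                  # direction of travel is -g
        if np.linalg.norm(g) < tol:    # stop when the gradient is (nearly) zero
            break
        x = x - alpha * g              # x_{k+1} = x_k - alpha_k * grad F(x_k)
    return x

# Example: F(x) = x1^2 + 4*x2^2, gradient (2*x1, 8*x2); minimum at the origin.
x_star = steepest_descent(lambda x: np.array([2.0 * x[0], 8.0 * x[1]]), x0=[3.0, -2.0])
```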


    Newton's Method (multi-variable case)

How do we extend Newton's method to the multivariable case? A first attempt is simply to reuse the single-variable update:

$x_{k+1} = x_k - \dfrac{y'(x_k)}{y''(x_k)}$

Is this correct? No. Why?

Start again with the Taylor expansion (the higher-order remainder is dropped; what is the significance of that?):

$y(x) = y(x_k) + \nabla y(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T H(x_k)\,(x - x_k)$

Note that H is the Hessian, containing the second-order derivatives.

$x_{k+1} = x_k - \dfrac{\nabla y(x_k)}{H(x_k)}$

Is this correct? Not yet. Why? (Don't confuse $H^{-1}$ with $1/H$.)

Newton's method for finding an extreme point is

$x_{k+1} = x_k - H^{-1}(x_k)\,\nabla y(x_k)$

Like the steepest descent method, Newton's method searches in the negative gradient direction. See Sec. 1.4.
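A minimal Python sketch of this final update, solving $H(x_k)\,d = \nabla y(x_k)$ rather than forming $H^{-1}$ explicitly; the quadratic test function, tolerance, and iteration cap are illustrative assumptions:

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Find an extreme point by iterating x_{k+1} = x_k - H^{-1}(x_k) grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), g)   # solve H d = g instead of inverting H
        x = x - d
    return x

# Example: y(x) = x1^2 + 4*x2^2 + x1*x2 (quadratic, so Newton converges in one step).
grad = lambda x: np.array([2.0 * x[0] + x[1], 8.0 * x[1] + x[0]])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 8.0]])
x_star = newton_method(grad, hess, x0=[5.0, -3.0])
```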


    Properties of Newton's Method

Good properties (fast convergence) if started near the solution.

However, it needs modifications if started far away from the solution.

Also, the (inverse) Hessian is expensive to calculate.

    To overcome this, several modifications are often made.

One of them is to add a search (step-size) parameter in front of the Hessian term (similar to steepest descent). This is often referred to as the modified Newton's method.

Other modifications focus on enhancing the combination of first- and second-order gradient information. Quasi-Newton methods build up curvature information by observing the behavior of the objective function and its first-order gradient. This information is used to generate an approximation of the Hessian.
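As an illustration of the modified Newton idea above, here is a minimal sketch with a step parameter $\alpha_k$ in front of the Newton step, chosen by simple backtracking; the backtracking rule and its constants are illustrative assumptions, not from the slides:

```python
import numpy as np

def modified_newton(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton's method with a step parameter alpha_k applied to the Newton step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(hess(x), g)   # full Newton direction
        alpha = 1.0
        # Backtrack until the step actually decreases f (a safeguard far from the solution).
        while f(x + alpha * d) > f(x) and alpha > 1e-8:
            alpha *= 0.5
        x = x + alpha * d
    return x
```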


    Conjugate Directions Method

Conjugate direction methods can be regarded as somewhat in between steepest descent and Newton's method, having the positive features of both of them.

Motivation: Desire to accelerate the slow convergence of steepest descent, but avoid the expensive evaluation, storage, and inversion of the Hessian.

Application: Conjugate direction methods are invariably invented and analyzed for the quadratic problem:

Minimize: $y(x) = \tfrac{1}{2}\, x^T Q x - b^T x$

Note: The condition for optimality is $\nabla y = Qx - b = 0$, or $Qx = b$ (a linear equation).

Note: The textbook uses A instead of Q.
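As a quick check of that optimality condition (a one-line derivation, assuming Q is symmetric as above):

$\nabla y(x) = \nabla\!\left(\tfrac{1}{2} x^T Q x - b^T x\right) = \tfrac{1}{2}(Q + Q^T)x - b = Qx - b$, so $\nabla y(x^*) = 0$ exactly when $Qx^* = b$.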


    Basic Principle

Definition: Given a symmetric matrix Q, two vectors $d_1$ and $d_2$ are said to be Q-orthogonal, or Q-conjugate (with respect to Q), if $d_1^T Q d_2 = 0$.

Note that orthogonal vectors ($d_1^T d_2 = 0$) are a special case of conjugate vectors.

So, since the vectors $d_i$ are independent, the solution to the $n \times n$ quadratic problem can be rewritten as

$x^* = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$

Multiplying by Q and taking the scalar product with $d_i$, you can express $\alpha_i$ in terms of $d_i$, Q, and either $x^*$ or $b$.

Note that A is used instead of Q in your textbook.
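Spelling that step out (a short derivation using only the Q-conjugacy of the $d_i$ and $Qx^* = b$ from the previous slide):

$d_i^T Q x^* = d_i^T Q\,(\alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}) = \alpha_i\, d_i^T Q d_i$, since all cross terms vanish by conjugacy, so

$\alpha_i = \dfrac{d_i^T Q x^*}{d_i^T Q d_i} = \dfrac{d_i^T b}{d_i^T Q d_i}.$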


    Conjugate Gradient Method

The conjugate gradient method is the conjugate direction method obtained by selecting the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses.

    You generate the conjugate directions as you go along.

Search direction at iteration k:

$d_k = -g_k + \sum_{i=0}^{k-1} \beta_i d_i$, or $d_{k+1} = -g_{k+1} + \beta_k d_k$

    Three advantages:

1) The gradient is always nonzero and linearly independent of all previous direction vectors.

2) Simple formula to determine the new direction; only slightly more complicated than steepest descent.

    3) Process makes good progress because it is based on gradients.


Pure Conjugate Gradient Method (Quadratic Case)

0 - Starting at any $x_0$, define $d_0 = -g_0 = b - Qx_0$, where $g_k$ is the column vector of gradients of the objective function at the point $x_k$.

1 - Using $d_k$, calculate the new point $x_{k+1} = x_k + \alpha_k d_k$, where

$\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k}$

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where

$\beta_k = \dfrac{g_{k+1}^T Q d_k}{d_k^T Q d_k}$

This is slightly different from your current textbook.

Note that $\alpha_k$ is calculated analytically here, rather than with a line search.
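A minimal Python sketch of this pure conjugate gradient loop; the example Q, b, and starting point are illustrative assumptions:

```python
import numpy as np

def conjugate_gradient_quadratic(Q, b, x0, tol=1e-10):
    """Minimize y(x) = 0.5 x^T Q x - b^T x (Q symmetric positive definite)."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b              # gradient of y at x
    d = -g                     # d_0 = -g_0 = b - Q x_0
    for _ in range(len(b)):    # at most n steps are needed for a quadratic
        if np.linalg.norm(g) < tol:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)      # alpha_k = -g_k^T d_k / (d_k^T Q d_k)
        x = x + alpha * d
        g = Q @ x - b
        beta = (g @ Qd) / (d @ Qd)       # beta_k = g_{k+1}^T Q d_k / (d_k^T Q d_k)
        d = -g + beta * d
    return x

# Example with an assumed 2x2 system: the minimizer satisfies Q x = b.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = conjugate_gradient_quadratic(Q, b, x0=np.zeros(2))
```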


    Non-Quadratic Conjugate Gradient Methods

For non-quadratic cases, you have the problem that you do not know Q, and you would have to make an approximation.

One approach is to substitute the Hessian H(x_k) for Q.

The problem is that the Hessian has to be evaluated at each point.

Other approaches avoid Q completely by using line searches.

Examples: the Fletcher-Reeves and Polak-Ribière methods.

Differences from the pure conjugate gradient algorithm: $\alpha_k$ is found through a line search, and different formulas are used for calculating $\beta_k$.


Polak-Ribière & Fletcher-Reeves Methods for Minimizing f(x)

0 - Starting at any $x_0$, define $d_0 = -g_0$, where $g$ is the column vector of gradients of the objective function at the point $x$.

1 - Using $d_k$, find the new point $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is found using a line search that minimizes $f(x_k + \alpha_k d_k)$.

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $\beta_k$ varies depending on which (update) formula you use.

Fletcher-Reeves: $\beta_k = \dfrac{g_{k+1}^T\, g_{k+1}}{g_k^T\, g_k}$

Polak-Ribière: $\beta_k = \dfrac{(g_{k+1} - g_k)^T\, g_{k+1}}{g_k^T\, g_k}$

Note: $g_{k+1}$ is the gradient of the objective function at the point $x_{k+1}$.
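A minimal Python sketch of steps 0-2 with both $\beta_k$ updates; the use of scipy.optimize.minimize_scalar for the line search, the search bounds, and the Rosenbrock test problem are illustrative assumptions, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, variant="fletcher-reeves", tol=1e-6, max_iter=200):
    """Nonlinear conjugate gradient with a line search for alpha_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Line search: choose alpha_k to minimize f(x_k + alpha * d_k).
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 10.0), method="bounded").x
        x = x + alpha * d
        g_new = grad(x)
        if variant == "fletcher-reeves":
            beta = (g_new @ g_new) / (g @ g)
        else:                                 # Polak-Ribiere update
            beta = ((g_new - g) @ g_new) / (g @ g)
        d = -g_new + beta * d                 # d_{k+1} = -g_{k+1} + beta_k d_k
        g = g_new
    return x

# Example on the Rosenbrock function (an assumed test problem).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x_star = nonlinear_cg(f, grad, x0=[-1.2, 1.0], variant="polak-ribiere")
```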


    Fletcher-Reeves Method for Minimizing f(x)

0 - Starting at any $x_0$, define $d_0 = -g_0$, where $g$ is the column vector of gradients of the objective function at the point $x$.

1 - Using $d_k$, find the new point $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is found using a line search that minimizes $f(x_k + \alpha_k d_k)$.

2 - Calculate the new conjugate gradient direction $d_{k+1}$, according to $d_{k+1} = -g_{k+1} + \beta_k d_k$, where

$\beta_k = \dfrac{g_{k+1}^T\, g_{k+1}}{g_k^T\, g_k}$

See also Example 3.9 (page 73) in your textbook.


    Conjugate Gradient Method Advantages

The simple formulae for updating the direction vector are attractive.

The method is slightly more complicated than steepest descent, but it converges faster.

For animations of all of the preceding search techniques (see them in action!), check out:

http://www.esm.vt.edu/~zgurdal/COURSES/4084/4084-Docs/Animation.html