Higher-Order Derivatives in Engineering Derivatives in Engineering Applications, ... By applying the chain rule of derivative calculus ... Higher order derivatives from AD have not

Embed Size (px)

Text of Higher-Order Derivatives in Engineering Derivatives in Engineering Applications, ... By applying...

  • Higher-Order Derivatives in Computational Systems Engineering

    Problem Solving

    Wolfgang MarquardtRWTH Aachen, AVT.PSE, CCES and AICES

    5th International Conference on Automatic Differentiation AD08Bonn, August 11-15, 2008

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 2

    AD and its Applications

    Automatic Differentiation (AD) is a set of techniques based on the mechanical application of the chain rule to obtain derivatives of a function given as a computer program.

    AD is used in the following areas: Numerical Methods Sensitivity Analysis Design Optimization Data Assimilation & Inverse Problems

    www.autodiff.org

    http://www.autodiff.org/

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 3

    Applications of AD in this Talk

    Numerical Methods nonlinear, differential / algebraic equation solving bifurcation analysis nonlinear programming Sensitivity Analysis first and second order sensitivitiesDesign Optimization under uncertaintyInverse Problems state estimation (data assimiliation) parameter and function estimation model-based optimal experimental design

    Optimal Control trajectory optimization dynamic real-time optimization

    topics listed at www.autodiff.org

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 4

    Higher-Order Derivatives and AD

    AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations such as additions or elementary functions such as exp(). By applying the chain rule of derivative calculus repeatedly to these operations, derivatives of arbitrary order can be computed automatically, and accurate to working precision.

    Are higher-order derivatives just a nice-to-have in applications ? Nonlinear Programming Optimal Control Dynamic Real-time Optimization Inverse Problems Model-based Experimental Design Design Optimization under Uncertainty

    www.autodiff.org

    http://www.autodiff.org/

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 5

    AD tools implement the semantic transformation that systematically applies the chain rule of differential calculus to source code written in various programming languages. 29 AD tools listed on www.autodiff.org

    Higher order derivatives from AD have not reached the level of attention (at least in applications) they deserve !

    Are current tools only supporting expert users?Is there sufficient awareness of higher-order AD in applications ?

    AD Tools for the User

    derivative order # tools # citations1 18 2962 6 17n 5 45

    http://www.autodiff.org/

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 6

    Agenda

    Introduction AD for Higher Order Derivatives and their Applications AD Tools for the User

    Higher-Order Derivatives in Applications Numerical Methods Nonlinear Programming Optimal Control Dynamic Real-time Optimization Inverse Problems Model-Based Experimental Design Design Optimization under Uncertainty

    Software Tools Summary and Conclusions

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 7

    Agenda

    Introduction AD for Higher Order Derivatives and their Applications AD Tools for the User

    Higher-Order Derivatives in Applications Numerical Methods Nonlinear Programming Optimal Control Dynamic Real-time Optimization Inverse Problems Model-Based Experimental Design Design Optimization under Uncertainty

    Software Tools Summary and Conclusions

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 8

    Numerical Optimization Methods

    Why optimize ?Engineering design means to search a large space of decision variables for an appropriate solution. Optimization supports this search effectively.

    Applications often result in Nonlinear Programming problems (NLP):

    Solution algorithms are based on first order necessary (Karush-Kuhn-Tucker) conditions.

    I E i ,0(x)cs.t.

    (x) min

    i

    x

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 9

    Necessary Conditions of Optimality

    Lagrange function

    first-order necessary (KKT) conditions

    Iixc

    Ii

    Iixc

    ixc

    xL

    ii

    i

    i

    i

    x

    =

    =

    =

    ,0)(

    ,0

    ,0)(

    ,0)(

    0),(

    **

    *

    *

    *

    **

    )()(),()(

    xcxfxL ixAi

    i

    =

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 10

    Hessian of Lagrange Function ?

    gradient based evaluation of KKT conditions requires Hessian of Lagrange function

    first-order BFGS updates of approximate Hessian vs.

    exact Hessian

    improved robustness and faster convergence with exact Hessian and suitable NLP solvers warm start facilitated by exact Hessian in repetitive optimization check of sufficient conditions of optimality requires projected Hessian of Lagrange function

    NLP algorithms evaluate symmetrically projected Hessiancan be and should be exploited by AD algorithms

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 11

    Agenda

    Introduction AD for Higher Order Derivatives and their Applications AD Tools for the User

    Higher-Order Derivatives in Applications Numerical Methods Nonlinear Programming Optimal Control Dynamic Real-time Optimization Inverse Problems Model-Based Experimental Design Design Optimization under Uncertainty

    Software Tools Summary and Conclusions

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 12

    0),,,,( s.t.

    min,

    =avxvxf

    t fta f&& Finish

    .)(,0)(ftff

    xtxtv =

    Start

    0)0(),0( =vx

    max)( vtv

    0 10 20 30

    -1

    -0.5

    0

    0.5

    1MAX

    MIN

    PATH

    acceleration a

    time [second]0 10 20 300

    2

    4

    6

    8

    10

    0 10 20 30 0

    50

    100

    150

    200

    250

    velocity vdistance x

    time [second]

    A Simple Optimal Control Problem

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 13

    Optimal Control Optimization with ODEs

    Mayer-type optimal control problemfixed final time without loss of generality

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 14

    Semi-Discretization by Control Vector Parameterization

    ui(t)

    i,j(t)

    pij

    discretization of the continuous control u(t) by projection intofinite-dimensional function space (e.g. low-order B-splines)

    example: piecewise constantapproximation of the control

    finite dimensional decision vector p

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 15

    Semi-Discretization Results in NLP

    Lagrangian functional

    How to efficiently determine the Hessian of the Lagrangian ?

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 16

    Composite Adjoint Approach for Hessian Evaluation

    composite adjoint approach is a slight modification of second-order adjoint sensitivity analysis

    set of ODEs involving terms with second-orderderivatives but partly separable functions

    (zyurt & Barton, SISC, 2005)

    (Hannemann & Marquardt, 2008)

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 17

    Example: Optimal Control of van der Pol Oscillator (1)

    0

    10

    20

    30

    40

    50

    60

    np=25 np=50 np=100 np=250

    BFGS (1st-order derivatives)

    Hessian (discrete compositeadjoints, RK4)qual. Hessian (discretecomposite adjoint, Euler)

    number of NLP iterations

    total CPU time for optimization (sec)30 % reduction in NLPiterationsreduced computational effortimproved robustness

    (Hannemann & Marquardt, 2008)

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 18

    Hessian evaluation scales cost for high-accuracy approximate Hessian is same as for gradient evaluation !

    scaled CPU time for Jacobian and Hessian evaluation [msec / # parameters]

    Jacobian (RK4): forward sensitivities, RK4HESS (RK4): exact Hessian, discrete composite adjoints, RK4HESS (Euler): approx. Hessian, discrete composite adjoints, Euler

    (Hannemann & Marquardt, 2008)

    Example: Optimal Control of van der Pol Oscillator (2)

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 19

    adaptive discretization of control #3initial guess: 25 pars. adaptive parameterization at final solution: 129 pars.equivalent equidistant parameterization: 3072 pars.

    CPU time per sensitivity integration (first-order derivative): > 2 h total CPU time for (adaptive) optimization: > 1 month

    A Real World Problem

    large-scale chemical plant (Shell)optimal load change14.000 DAEs, 4 controls, sparse Jacobiantime horizon >> 24 hrs

    successful solution of with adaptive discretzation (~4 million sensitivity equations)

    non-adaptive solution impossible (~100 million sensitivity equations)

    next challenge: use of second-order derivatives

  • Higher-Order Derivatives in Engineering Applications, AD 2008, August 11 - 15 20

    Agenda

    Introduction AD for Higher Order Derivatives and their Applications AD Tools for the User

    Higher-Order Derivatives in Applications Numerica