
Optimal Control

Perspectives from the Variational Principles of Mechanics

Ismail Hameduddin

Purdue University


Abstract

Optimal control is a tremendously important (and popular) area of research in modern control engineering. The extraordinary elegance of optimal control results, the significance of their implications and the unresolved nature of their practical implementation have excited the minds of generations of engineers and mathematicians. The sheer amount of recent research dedicated to the topic, even more than five decades after the first publication of results, is a testament to this. Despite this widespread interest, an appreciation of the philosophical origins of optimal control, rooted in analytical mechanics, is still lacking. By weaving in analogies from the variational principles of mechanics in the wider context of an overview of optimal control theory, this work attempts to expose the deeper connections between optimal control and the early, philosophically oriented results in analytical mechanics. Rather than as a dry, rigorous exercise, this is often done through more intellectually satisfying heuristic discussions and insights. Although the two-point boundary value problem is given due importance (with its parallel in analytical mechanics), special emphasis is placed on the feedback form of optimal control (the Hamilton-Jacobi-Bellman equation) since this ties in closely with the exceedingly beautiful Hamilton-Jacobi theory. Numerical solutions to the optimal control problem, and in particular the generalized Hamilton-Jacobi-Bellman equation with successive Galerkin approximations, are also discussed to highlight recent trends and motivations behind optimal control research.


1 Introduction

Optimal control is the area of study that deals with choosing free parameters in a set of differential equations such that a cost function is minimized over an evolution of time. Optimal control is an extremely important field with applications ranging from engineering and operations research to finance and economics [19, 25]. For instance, the same tool used to study dynamical systems in economic theory was used to design the controllers on the Apollo spacecraft [15].

Much of the development of optimal control mirrors that of analytical mechanics. From a philosophical point of view, optimal control is a mimicry of nature. By the principle of least action, nature chooses the motion of a system (or particle) so as to minimize a certain form of "energy". From the point of view of nature, then, it "uses optimal control" to minimize the energy used by systems in their motion. Optimal control is simply the turning of the tables so that this tool is available for controlling the behavior of dynamical systems in an optimal manner (with respect to a cost) subject to the (dynamic) constraints already imposed by nature.

This report introduces the ideas of optimal control to an audience familiar with analytical mechanics and variational principles. The intent is to provide a basic understanding of the fundamental results and then delve into some more advanced/recent results. The report can also be seen broadly in a chronological manner: it starts with a short review of some basic results from the calculus of variations (1700-1900), then proceeds to optimal control theory (1950-1970), which is followed by a discussion of the generalized Hamilton-Jacobi-Bellman equation (1979) and, finally, the paper is capped off by a discussion of a numerical scheme developed in the 1990s.

An effort has been made in the presentation to make the material relevant and intellectually stimulating by establishing connections between classical analytical mechanics and optimal control.

2 History

Optimal control is an outgrowth of the variational principles of mechanics and it is difficult to pinpoint exactly when a transition was made from examining systems moving freely under their own influence to determining a reference control for a system to achieve a certain objective while minimizing a cost function. A popular choice is the formulation of the brachistochrone problem:

Given two fixed points in a vertical plane, let a particle start from rest at the higher point and travel to the lower point under its own weight in a uniform gravity field. What path or curve must the particle follow in order to reach the second point in the shortest amount of time?

An obvious solution to the minimum-length problem is the straight line between the two points. However, the straight line does not minimize the amount of time.


The correct solution is a cycloid between the two points A and B. This problem was first proposed by Galileo in 1638 in his book Two New Sciences. Galileo accompanied the problem with an incorrect solution based on the geometry of the problem. Instead of a cycloid, he suggested a circle through the two points with its center located a certain distance away (on an axis) [26, 28].

Nearly sixty years later, oblivious to Galileo's introduction of the problem, Johann Bernoulli proposed the following "challenge" in the June 1696 issue of Acta Eruditorum [28]:

If in a vertical plane two points A and B are given, then it is required to specify the orbit AMB of the moveable point M, along which it, starting from A, and under the influence of its own weight, arrives at B in the shortest possible time. So that those who are keen on such matters will be tempted to solve this problem, it is good to know that it is not, as it may seem, purely speculative and without practical use. Rather it even appears, and this may be hard to believe, that it is very useful also for other branches of science than mechanics. In order to avoid a hasty conclusion, it should be remarked that the straight line is certainly the line of shortest distance between A and B, but it is not the one which is travelled in the shortest time. However, the curve AMB - which I shall divulge if by the end of this year nobody else has found it - is very well known among geometers.

This problem is precisely a minimum-time optimal control problem. Five mathematicians solved the brachistochrone problem: Johann Bernoulli himself, Leibniz, de l'Hopital, Jakob Bernoulli (Johann's brother) and Isaac Newton. Jakob Bernoulli formulated a more difficult version of the brachistochrone problem and solved it using a different type of proof. Jakob Bernoulli was mocked by his brother [28, 26] for using a sloppy proof, but that proof formed the foundation of the future calculus of variations and the work of Lagrange, Hamilton and Jacobi.

From the brachistochrone problem to the development of control, the history of optimal control closely parallels that of analytical mechanics (the variational principles of mechanics). Kalman's work in introducing the state-space architecture to control revolutionized the field and reopened the door for significant developments in optimal control [18].

Two schools of optimal control developed during the 1950s and 1960s. The first was led by Richard E. Bellman and was centered in the USA. Bellman was a mathematician and worked as a research scientist at The RAND Corporation in Santa Monica, California [7]. His research was focused on optimizing the control of multistage (discrete) systems [4, 6]. Two years after joining RAND from Princeton, Bellman published his first book "The Theory of Dynamic Programming" [5]. His development led to the Bellman equation, which provides sufficient conditions for optimality. Later this was generalized to continuous-time systems, where it bore a striking similarity to the Hamilton-Jacobi equation of analytical mechanics. In fact, both equations derive from the same principle of minimizing an (integral) performance index subject to nonholonomic constraints. Thus, the continuous-time version of the Bellman equation is known as the Hamilton-Jacobi-Bellman equation [8]. The derivations in this paper will focus on the Hamilton-Jacobi-Bellman formulation.

The other school of optimal control was centered in the USSR and led by the acclaimed Soviet mathematician Lev Semenovich Pontryagin. Pontryagin developed his famous maximum principle at roughly the same time as Bellman [22], but his work was, until later, available only in Russian [23]. Pontryagin approached the problem of optimal control from the more classical direction of the calculus of variations. The famous Pontryagin minimum principle generalized the necessary conditions for optimality, and it was shown that the standard Euler-Lagrange equations are simply a special case of this principle [8].

Ever since these theoretical foundations were laid in optimal control, much of the development has been focused on applications and numerical techniques [18]. Even half a century after the solution of the optimal control problem was first formulated, efficient numerical methods for the computation of these solutions are still an active area of research. In general, the problem remains unresolved, since there is no efficient numerical scheme applicable in all cases, even with the exponentially larger computational resources available today versus five decades ago.

3 The Optimal Control Problem

Consider a nonlinear time-varying dynamical system described by the equations

\dot{x}(t) = f(x(t), u(t), t); \quad x(t_0) \ \text{given}; \quad t_0 \le t \le t_f   (1)

where x(t) ∈ R^n is the vector of internal states and u(t) ∈ R^m is the vector of control inputs. Suppose we are given an objective to drive the dynamical system from some initial state x(t0) at initial time t = t0 to some specified final state x(tf) at final time t = tf, given freedom over the assigned control input u(t). In general, there are an infinite number of u(t) that satisfy this objective. The goal of optimal control is to determine a u(t) that not only achieves the objective but is also optimal with respect to a specified performance index or cost. The performance index is chosen by the designer and therefore the optimal control u*(t) is not optimal in a universal sense but only with respect to the performance index.

A general performance index is given by

J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt   (2)

where L(x(t), u(t), t) is the weighting function and φ(x(t_f), t_f) is the final-state weighting function. The final-state weighting function is a function that we desire to minimize at the final state. An example of this might be the final energy. The weighting function, on the other hand, is a function that we desire to minimize throughout the time interval [t0, tf]. The weighting function is commonly a function of the control input u(t). This is because we often want to minimize the control "effort" expended to achieve the control objective. During the reorientation of a spacecraft, for example, minimizing the control input u(t) over the entire interval reduces the amount of valuable fuel consumed.

The control objective may be stated not only directly in terms of the final state x(t_f) but may be a function of the final state and time. This function is called the final state constraint and is given by

\Psi(x(t_f), t_f) = 0   (3)

where Ψ ∈ R^p. Henceforth, Ψ(x(t_f), t_f) will be treated as the control objective. Since this is a control objective, it differs from the final-state weighting function φ(x(t_f), t_f) in that φ(x(t_f), t_f) only needs to be minimized at the final time, while Ψ(x(t_f), t_f) = 0 is a strict condition that must be satisfied at the final time.

The optimal control problem may be pictured as the problem of finding an optimal path from an initial point to a final surface described by Ψ(x(t_f), t_f) = 0. Consider the case where we have x ∈ R². The optimal control problem is then to find an optimal path from a point in R³, i.e., (x(t0), t0), to the family of points satisfying Ψ(x(t_f), t_f) = 0. Now if we have a fixed final time and a fixed end state, this family of points is restricted to a single point. Otherwise, if the final time is fixed but the final states are related by a function, we have a line. If we have a free final time (as in a minimum-time problem) and final states given by a function, we have a surface. This type of visualization is a handy tool when dealing with optimal control problems.

The next section begins with a discussion of a basic result from the calculus of variations. This is then used to develop a solution to the optimal control problem presented here.
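To make the ingredients (1)-(3) concrete, the following minimal sketch (an illustrative choice of system and cost, not taken from the report) encodes the dynamics, the weighting functions and a final state constraint for a double integrator that must be brought to rest at the origin:

```python
import numpy as np

# Illustrative problem data (assumed example): a double integrator
# x = (position, velocity) driven by a scalar force u.

def f(x, u, t):
    """System dynamics xdot = f(x, u, t), as in (1)."""
    return np.array([x[1], u])

def L(x, u, t):
    """Weighting function in (2): here, a pure control-effort penalty."""
    return 0.5 * u**2

def phi(xf, tf):
    """Final-state weighting function in (2) (none used in this example)."""
    return 0.0

def Psi(xf, tf):
    """Final state constraint (3): reach the origin at rest at t = tf."""
    return np.array([xf[0], xf[1]])

x0, t0, tf = np.array([1.0, 0.0]), 0.0, 2.0   # given initial state and time window
```

Any u(t) on [t0, tf] that drives Psi to zero is feasible; the sections that follow characterize the one that also minimizes J.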

4 Variation with Auxiliary Conditions

It is instructive to first consider the problem of minimizing an integral

I = \int_{t_0}^{t_f} F(q, \dot{q}, t)\, dt   (4)

where q ∈ R^n, subject to the constraints

\phi(q, t) = 0   (5)

where φ ∈ R^m. What follows is a derivation from the calculus of variations. The parallels with optimal control will become clear in the next section.

For an unconstrained problem, it is sufficient for a minimum that the integral (4) be stationary, i.e., that the variation of I vanish, assuming that the second variation ensures a minimum (this is not required for problems of dynamics). Thus, we require

\delta I = \delta \int_{t_0}^{t_f} F(q, \dot{q}, t)\, dt = 0   (6)

This is not correct for integrals with constraints as above since, although we are taking variations of all n generalized coordinates, we only have n − m degrees of freedom. Thus, in essence, we are only allowed to take free variations of n − m generalized coordinates.

We use what is known as the "Lagrange Multiplier Method" to deal with such a problem. Taking a variation of the constraint vector, we have

\delta \phi = \frac{\partial \phi}{\partial q}\, \delta q = 0   (7)

Multiplying the variation of the constraint vector by a time-dependent function vector λ^T(t) and integrating with respect to time (between t0 and tf) gives the scalar term

\int_{t_0}^{t_f} \lambda^T(t)\, \delta\phi\, dt = \int_{t_0}^{t_f} \lambda^T(t) \left( \frac{\partial \phi}{\partial q}\, \delta q \right) dt = 0   (8)

which can be added to (6) without changing the result, since we are simply adding zero:

\delta I' = \int_{t_0}^{t_f} \left[ \delta F(q, \dot{q}, t) + \lambda^T(t)\, \delta\phi \right] dt = 0   (9)

We can collect terms in δq in the first term of (9) to give

\delta \int_{t_0}^{t_f} F\, dt = \int_{t_0}^{t_f} E^T \delta q\, dt   (10)

where E is the vector of Euler-Lagrange expressions, E_i = \partial F/\partial q_i - \frac{d}{dt}\, \partial F/\partial \dot{q}_i, obtained after the usual integration by parts with fixed endpoints.

Thus, from (9) and (10), we can write δI' entirely in terms of integrals of terms affine in the δq. The original problem of eliminating m generalized coordinates from the system now becomes straightforward. We choose suitable λ_i such that the coefficients of m of the generalized coordinate variations vanish. The stationarity condition still holds on the remaining independent δq and hence, by the Euler-Lagrange equations, we need

\frac{\partial F}{\partial q} - \frac{d}{dt} \frac{\partial F}{\partial \dot{q}} + \lambda^T(t)\, \frac{\partial \phi}{\partial q} = 0   (11)

Alternatively, we can achieve the same result by defining an augmented function F' as

F' = F + \lambda^T(t)\, \phi   (12)

and thus, similarly to the above, we have

I' = \int_{t_0}^{t_f} F'\, dt = \int_{t_0}^{t_f} \left[ F + \lambda^T(t)\, \phi \right] dt   (13)

Setting δI' = 0 with an appropriate λ(t) recovers the result (11).

For nonholonomic constraints

d\phi = a\, dq = 0   (14)

the result (11) still holds, except that the partial derivatives ∂φ/∂q are replaced by the coefficient matrix a of the nonholonomic constraint (14). We thus have

\frac{\partial F}{\partial q} - \frac{d}{dt} \frac{\partial F}{\partial \dot{q}} + \lambda^T(t)\, a = 0   (15)
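As a brief illustration of (11) (a standard example, not part of the report's own discussion), take a particle of mass m moving in a vertical plane with

F = \tfrac{1}{2} m (\dot{x}^2 + \dot{y}^2) - m g y, \qquad \phi(x, y) = x^2 + y^2 - \ell^2 = 0,

i.e., a pendulum written as a free particle with one holonomic constraint. Equation (11) gives

-m\ddot{x} + 2\lambda(t)\, x = 0, \qquad -mg - m\ddot{y} + 2\lambda(t)\, y = 0,

so the multiplier term λ(t) ∂φ/∂q supplies the constraint force of the rod (its magnitude 2|λ|ℓ is the rod tension), and eliminating λ recovers the familiar pendulum equation.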

A similar result for the optimal control problem, using the same method of derivation, is shown in the next section.

5 Optimal Control by the Euler-Lagrange Method

The approach of optimal control is to treat the problem of finding the optimal control u(t) as one of finding the stationary value of the performance index subject to nonholonomic constraints, which are precisely the system dynamics. In this philosophy we are, in effect, turning the problem upside down. Rather than approaching the system dynamics first and then finding a control that would minimize a performance index, we approach the performance index first and treat the system dynamics as auxiliary constraints on the system. It is this simple, yet groundbreaking, change of perspective that spurred on the decades of research and produced some of the most significant results of the past half century. After this perspective change, the problem can be solved almost identically to the previous section.

Consider first the case when there is no final state constraint but we have fixed initial and final times. Begin by rearranging the system dynamics (1), multiplying by an undetermined time-dependent vector λ^T(t) and integrating between the limits to give

\int_{t_0}^{t_f} \lambda^T(t) \left[ f(x(t), u(t), t) - \dot{x}(t) \right] dt = 0   (16)

We can then augment the performance index (2) with (16) without any impact, since we are simply adding zero, similar to what we did in the general Lagrange multiplier method:

J' = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} \left\{ L(x(t), u(t), t) + \lambda^T(t) \left[ f(x(t), u(t), t) - \dot{x}(t) \right] \right\} dt   (17)

As in analytical mechanics, define the Hamiltonian function as

H(x(t), u(t), \lambda(t), t) = L(x(t), u(t), t) + \lambda^T(t)\, f(x(t), u(t), t)   (18)

which, substituted into (17), yields

J' = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} \left[ H(x(t), u(t), \lambda(t), t) - \lambda^T(t)\, \dot{x}(t) \right] dt   (19)

Integrating the last term of (19) by parts,

\int_{t_0}^{t_f} \lambda^T(t)\, \dot{x}(t)\, dt = \left[ \lambda^T(t)\, x(t) \right]_{t_0}^{t_f} - \int_{t_0}^{t_f} \dot{\lambda}^T(t)\, x(t)\, dt   (20)

Substituting (20) into (19) and evaluating the limits gives us

J' = \phi(x(t_f), t_f) - \lambda^T(t_f)\, x(t_f) + \lambda^T(t_0)\, x(t_0) + \int_{t_0}^{t_f} \left[ H(x(t), u(t), \lambda(t), t) + \dot{\lambda}^T(t)\, x(t) \right] dt   (21)

We now consider variations in J' due to variations in the control vector u(t) while holding the initial time t0 and final time tf fixed. After collecting terms in the variation, we have

\delta J' = \left[ \left( \frac{\partial \phi}{\partial x} - \lambda^T \right) \delta x \right]_{t=t_f} + \left[ \lambda^T \delta x \right]_{t=t_0} + \int_{t_0}^{t_f} \left[ \left( \frac{\partial H}{\partial x} + \dot{\lambda}^T \right) \delta x + \frac{\partial H}{\partial u}\, \delta u \right] dt   (22)

To achieve a stationary point δJ' = 0, we choose the arbitrary multiplier functions λ(t) such that the coefficients of the δx(t) vanish. This reduces the number of free variables in our problem and we avoid the need to determine the variations δx(t) produced by a given δu(t). Hence, we first define the dynamics of the multiplier functions as

\dot{\lambda}^T(t) = -\frac{\partial H}{\partial x} = -\frac{\partial L}{\partial x} - \lambda^T(t)\, \frac{\partial f}{\partial x}   (23)

which eliminates the coefficient of δx inside the integral in (22). We also define the boundary condition on these dynamics as

\lambda^T(t_f) = \left. \frac{\partial \phi}{\partial x} \right|_{t=t_f}   (24)

which eliminates the first term in (22). We then have

\delta J' = \lambda^T(t_0)\, \delta x(t_0) + \int_{t_0}^{t_f} \frac{\partial H}{\partial u}\, \delta u\, dt   (25)

For J ′ to be stationary, i.e., δJ ′ = 0, we must have

\frac{\partial H}{\partial u} = 0, \qquad t_0 \le t \le t_f   (26)

The above equations (23), (24) and (26) are precisely the conditions needed for the performance index to be stationary, i.e., for u(t) to be an optimal control. We are thus left to solve the following differential equations to determine the optimal control:

\dot{x} = f(x, u, t)   (27)

\dot{\lambda} = -\left( \frac{\partial f}{\partial x} \right)^T \lambda - \left( \frac{\partial L}{\partial x} \right)^T   (28)

where u(t) is determined by

\left( \frac{\partial f}{\partial u} \right)^T \lambda + \left( \frac{\partial L}{\partial u} \right)^T = 0   (29)

and the boundary conditions are

x(t_0) \ \text{given}   (30)

\lambda(t_f) = \left. \left( \frac{\partial \phi}{\partial x} \right)^T \right|_{t=t_f}   (31)

The equations (27) through (31) parallel the Euler-Lagrange equations from standard variational calculus and are referred to as the stationarity conditions. Notice the similarity between (11) and (28), (29).

The elements of the multiplier vector λ are known as the "costates" because the optimal control is determined by solving the state dynamics (27) together with the multiplier dynamics (28).

Since the boundary conditions are specified at both the initial and final times, the problem itself is often called the two-point boundary-value problem (2PBVP). We are required to specify both the initial and final time for such a problem. This restriction (of specifying both initial and final time) is overcome later by using another method of solution of the optimal control problem that utilizes elements from Hamilton-Jacobi theory.
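To illustrate the 2PBVP numerically, here is a minimal sketch under assumptions not made above: the scalar system ẋ = u with L = ½(x² + u²), no final-state weighting (φ = 0) and a fixed horizon. Then (26) gives u = −λ, and (27)-(28) become ẋ = −λ, λ̇ = −x with x(t0) = x0 and, from (31), λ(tf) = 0. A standard boundary-value solver such as SciPy's solve_bvp handles this directly:

```python
import numpy as np
from scipy.integrate import solve_bvp

T, x0 = 5.0, 1.0   # assumed horizon and initial state

def ode(t, y):
    # y[0] = x, y[1] = lambda; the stationarity condition gives u* = -lambda,
    # so the state equation is x' = -lambda and the costate equation is lambda' = -x.
    return np.vstack((-y[1], -y[0]))

def bc(ya, yb):
    # Boundary conditions at both ends: x(t0) = x0 and lambda(tf) = 0.
    return np.array([ya[0] - x0, yb[1]])

t = np.linspace(0.0, T, 50)
sol = solve_bvp(ode, bc, t, np.zeros((2, t.size)))
u_star = -sol.sol(t)[1]   # open-loop optimal control u*(t) = -lambda(t)
```

The resulting u*(t) is valid only for this particular x(t0); a different initial state requires re-solving the boundary-value problem, which is precisely the limitation the feedback (Hamilton-Jacobi) approach of the later sections removes.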

No final state constraint was assumed in the derivation of the previous stationarity conditions. This is not the case in many problems. The problem where a final state constraint vector

\Psi(x(t_f), t_f) = 0   (32)

is specified is dealt with below.

Analogous to the previous treatment, we form a performance index that is augmented by a multiple of the final state constraint vector, with the effect of adding a multiple of zero:

J' = \phi(x(t_f), t_f) + \nu^T \Psi(x(t_f), t_f) + \int_{t_0}^{t_f} \left\{ L(x(t), u(t), t) + \lambda^T(t) \left[ f(x(t), u(t), t) - \dot{x}(t) \right] \right\} dt   (33)

where ν^T is a vector of undetermined multipliers. The previous derivation may be repeated if we define

Φ = φ+ νT Ψ (34)

and substitute into the performance index, except that the ν^T will not be specified. This can be resolved with some incremental effort, and the previous stationarity conditions can be shown to hold with a minor modification to (31):

\lambda(t_f) = \left. \left( \frac{\partial \phi}{\partial x} + \nu^T \frac{\partial \Psi}{\partial x} \right)^T \right|_{t=t_f}   (35)

This completes our discussion of optimal control by the Euler-Lagrange method. Although many further extensions to these results exist, they are not treated in this report.

Another approach to solving the optimal control problem is to use parallels with the Hamilton-Jacobi theory of analytical mechanics. Thus, a short review of Hamilton-Jacobi theory is given in the next section, with an emphasis on the parts of the theory that prove useful in optimal control.

6 Hamilton-Jacobi Theory

Hamilton's problem deals with solving for the motion of a dynamic system such that its generalized coordinates are reduced to quadratures. According to the principle of least action, the motion of a dynamic system, i.e., the solution of Hamilton's problem, is such that it minimizes the total energy or "action". By Hamilton's principle, this "action" is the canonical integral. Thus, achieving a stationary point of the canonical integral implies that a minimum-energy motion has been achieved and Hamilton's problem has been solved. The stationary point is not verified via a second variation because, in general, for problems in dynamics a stationary point cannot imply a maximum (since the feasible generalized coordinates are theoretically unbounded). Only a basic discussion of this problem and its solution will be presented in this section, as a complete derivation is beyond the scope of the report. The reader is referred to references [16, 13, 21] for more details.

The canonical integral in analytical mechanics is given by

I = \int_{t_0}^{t_f} L(q, \dot{q}, t)\, dt = I(q_0, \dot{q}_0, t_0, t_f)   (36)

where L is the Lagrangian, q is the generalized coordinate vector, q̇ is the generalized velocity vector and q_0, q̇_0 are the vectors of initial conditions. For a stationary point, the first variation of the canonical integral must be zero:

δI = 0. (37)

A motion that satisfies such a condition is achieved in Hamilton-Jacobi theory via a canonical transformation, i.e., a transformation that does not violate Hamilton's principle in the dynamics of the system.

The statement of (36) is that the canonical integral, including integration constants, is fully determined once we have the initial generalized coordinates and velocities. Hamilton-Jacobi theory (which will not be derived here) introduces a generating function S called "Hamilton's Principal Function" based on the canonical integral formulation in (36):

S(q_0, q_f, t_0, t_f) = \int_{t_0}^{t_f} L\, dt   (38)

where q_f are the generalized coordinates at the final time t = t_f. The key difference between (36) and (38) is that we do not require the initial generalized velocities; we instead replace these, via a canonical transformation, by the generalized coordinates at the final time. In analytical mechanics, finding such a transformation implies that we have found a complete solution of Hamilton's problem. This is because we transform the system from a moving point in configuration space to a fixed point. It is natural, therefore, that Hamilton's Principal Function holds a special importance in analytical mechanics (and, by extension, in Hamilton-Jacobi theory and optimal control theory).

By Hamilton-Jacobi theory, the principal function is the solution of the following partial differential equation, known as the Hamilton-Jacobi equation:

\frac{\partial S}{\partial t} + H\!\left( q, \frac{\partial S}{\partial q}, t \right) = 0   (39)

where H is the Hamiltonian (defined in the sense of analytical mechanics). Once the solution S to the Hamilton-Jacobi equation is found, we can generate a canonical transformation that transforms the moving point in configuration space representing the motion of the system to a fixed point in configuration space.

In the special case where the Hamiltonian does not depend explicitly on time (conservative systems), ∂S/∂t reduces to a constant −E and the Hamilton-Jacobi equation becomes

H\!\left( q, \frac{\partial S}{\partial q} \right) = E   (40)
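As a brief illustration (a standard textbook case, not taken from the report), consider a free particle of mass m with H = p²/(2m). The Hamilton-Jacobi equation (39) reads

\frac{\partial S}{\partial t} + \frac{1}{2m} \left( \frac{\partial S}{\partial q} \right)^2 = 0

and separating S = W(q) − Et as in (40) gives \tfrac{1}{2m}(dW/dq)^2 = E, so that S = \sqrt{2mE}\, q - Et. Differentiating S with respect to the constant E and setting the result equal to a new constant recovers the expected uniform motion q(t) = q(t_0) + \sqrt{2E/m}\,(t - t_0).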

The results of this section will be exploited later, at the end of the next section, to find an elegant solution to the optimal control problem. First, however, a basic derivation of this result for the optimal control problem, not drawing on the analogy from analytical mechanics, is presented in the next section.

7 Optimal Feedback Control via the Hamilton-Jacobi-Bellman Formulation

The problem of finding an optimal control u*(t) to proceed from a specified initial state x(t0) to a terminal surface described by Ψ(x(t_f), t_f) = 0 has been considered so far. A result was derived (Euler-Lagrange optimal control) to determine the optimal control that minimizes the performance index

J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt   (41)


and satisfies the final-state constraint (or terminal surface)

Ψ(x(tf ), tf ) = 0 (42)

where the system dynamics are given by

\dot{x}(t) = f(x(t), u(t), t); \quad x(t_0) \ \text{given}; \quad t_0 \le t \le t_f   (43)

Implicit in this discussion was that if the initial state x(t0) was changed and selected on the path from the initial point to the terminal surface determined by optimal control, then the resulting (new) optimal path would lie on the same path as before, except for beginning at the new initial state. In a significant omission, the possibility of other, completely arbitrary initial states that do not lie on the original optimal path was not considered. Indeed, according to the previous discussion, if another initial state that does not lie on the original path is specified, then the optimal control problem must be considered anew and the optimal control Euler-Lagrange equations must be solved anew. Since in reality an infinite number of initial conditions exist, if an efficient method for solving the optimal control Euler-Lagrange equations is not available (and often it is not), the previous optimal control results do not prove very useful. The optimal control Euler-Lagrange equations provide an open-loop or feedforward control that does not require the system state information at any time other than the initial and final times (hence the name: two-point boundary-value problem).

It is preferred to have a family of paths that reach the terminal surface Ψ(x(t_f), t_f) = 0 from a family of arbitrary initial states x(t0). Each of these paths is the optimal path, with respect to the performance index, from its initial state to the terminal surface. Thus, the family of paths is a family of optimal paths or extremals which, in a continuous setting, should be representable by an initial-state-dependent function. This allows the formation of a feedback control law rather than the feedforward type of control provided by the Euler-Lagrange formulation.

The most obvious strategy for forming this initial-state-dependent function is to use the only two properties possessed by all the optimal paths: each path is optimal with respect to the performance index and each path ends at the terminal surface Ψ(x(t_f), t_f) = 0. Consider, then, the cost of an optimal path starting from an arbitrary initial state (initial state x at time t) and ending at the terminal surface. This function is called the value function or optimal return function and is given by

V(x, t) = \min_{u(t)} \left\{ \phi(x(t_f), t_f) + \int_{t}^{t_f} L(x(\tau), u(\tau), \tau)\, d\tau \right\}   (44)

with boundary condition

V(x, t) = \phi(x(t), t)   (45)

on the terminal surface Ψ(x(t), t) = 0. For the considerations here, we assume that the value function V(x, t) ∈ C² over the interval of interest. The qualifier min_{u(t)} implies that the evaluation of the value function is along the optimal trajectory.

A complete derivation of the Hamilton-Jacobi-Bellman equation is shown below, after which another, heuristic derivation will be shown using parallels from the Hamilton-Jacobi theory of analytical mechanics.

Suppose that the system starts at an arbitrary initial condition (x, t) and proceeds using a non-optimal control u(t) for a short period of time Δt to reach the point (by a first-order approximation, assuming Δt is sufficiently small)

(x + \dot{x}\,\Delta t,\; t + \Delta t) = (x + f(x, u, t)\,\Delta t,\; t + \Delta t)   (46)

Correspondingly, by another first-order approximation, the value function for this small non-optimal path is given by

\tilde{V}_{\Delta}(x, t) = \frac{d\tilde{V}(x, t)}{dt}\,\Delta t = L(x, u, t)\,\Delta t   (47)

where the subscript Δ on Ṽ signifies a first-order approximation over the small path and the tilde represents the non-optimal nature of the path.

Now suppose optimal control is used for the remainder of the path, i.e., from (x + f(x, u, t)Δt, t + Δt) to the terminal surface Ψ(x(t_f), t_f) = 0. The (suboptimal) total value function Ṽ(x, t) is then the sum of the (optimal) value function beginning at the initial state (x + f(x, u, t)Δt, t + Δt) and the first-order approximation to the value function of the small non-optimal path at the beginning, Ṽ_Δ(x, t):

\tilde{V}(x, t) = V(x + f(x, u, t)\,\Delta t,\; t + \Delta t) + \tilde{V}_{\Delta}(x, t)   (48)
              = V(x + f(x, u, t)\,\Delta t,\; t + \Delta t) + L(x, u, t)\,\Delta t   (49)

Obviously, since Ṽ(x, t) is suboptimal (due to the small suboptimal path at the beginning), it can never be less than the actual (optimal) return function V(x, t):

V(x, t) \le \tilde{V}(x, t)   (50)

Equality holds in (50) only when the optimal control is chosen over the interval Δt, i.e., when Ṽ(x, t) is minimized, from which we have

V(x, t) = \min_{u} \left\{ V(x + f(x, u, t)\,\Delta t,\; t + \Delta t) + L(x, u, t)\,\Delta t \right\}   (51)

Due to the assumption V(x, t) ∈ C², the right-hand side of (51) can be expanded as a Taylor series about (x, t):

V(x, t) = \min_{u} \left\{ V(x, t) + \frac{\partial V}{\partial x} f(x, u, t)\,\Delta t + \frac{\partial V}{\partial t}\,\Delta t + L(x, u, t)\,\Delta t \right\}   (52)

Since V and ∂V/∂t do not explicitly depend on u, they can be taken out of the minimization; cancelling V(x, t) from both sides of (52) and dividing by Δt (letting Δt → dt) gives

-\frac{\partial V}{\partial t} = \min_{u} \left\{ L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t) \right\}   (53)

Now consider the differential (with respect to time) of the value function written in terms of the Hamiltonian, analogous to (19):

dV = \lambda^T dx - H\, dt   (54)

where

H(x, \lambda, u, t) = L(x, u, t) + \lambda^T f(x, u, t)   (55)

From (54), we have on the optimal trajectory

\lambda^T = \frac{\partial V}{\partial x}   (56)

and

H = -\frac{\partial V}{\partial t}   (57)

Substituting (56) into (55) gives

H(x, \lambda, u, t) = L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t)   (58)

which, when substituted into (53), gives the Hamilton-Jacobi-Bellman equation

-\frac{\partial V}{\partial t} = \min_{u}\, H\!\left( x, \frac{\partial V}{\partial x}, u, t \right)   (59)

which is solved with the boundary condition

V (x, t) = φ(x(t), t) (60)

on the terminal surface Ψ(x, t) = 0. Solving the Hamilton-Jacobi-Bellman (HJB) equation gives us V(x, t), which we can use, along with the specified performance index and the stationarity condition, to determine the optimal control u(x, t) independently of the initial state. Since the HJB equation is a sufficient condition for optimality, we thus have a function that provides the optimal control in feedback form.

7.1 The Hamilton-Jacobi-Bellman equation from the standpoint of analytical mechanics

We can perform a heuristic derivation of the HJB equation by appealing to the Hamilton-Jacobi theory of analytical mechanics, which shows the parallels between optimal control theory and the variational principles of mechanics.

Recall that we defined Hamilton's principal function (38) as the canonical integral transformed such that it is a function of the generalized coordinates at the final time rather than the generalized velocities, i.e.,

S = S(q0,qf , t0, tf ). (61)

Substitute ẋ = f(x, u, t) into the constrained performance index (19) and let the initial states and control be arbitrarily assigned:

J' = \phi(x_f, t_f) + \int_{t_0}^{t_f} \left[ H(x(t), u(t), \lambda(t), t) - \lambda^T(t)\, \dot{x}(t) \right] dt   (62)
   = J'(x_0, x_f, u_0, u_f, t_0, t_f)   (63)

where the subscript f indicates evaluation at the final time.

Now, since J' = J'(x_0, x_f, u_0, u_f, t_0, t_f) is not a function of the velocities ẋ, and because φ(x_f, t_f) is simply a function evaluated at a single point, i.e., a constant, defining x and u as an extended system of generalized coordinates allows us to set

S = J'(x_0, x, u_0, u, t_0, t_f)   (64)

Then the new S function is stationary with respect to the first variation if it satisfies the Hamilton-Jacobi equation (39). Rearranging (39) and changing the arguments, we have

\frac{\partial S}{\partial t} = -H\!\left( x, \frac{\partial S}{\partial x}, u, t \right)   (65)

which is simply another statement of the HJB equation (59), since by Hamilton-Jacobi theory S satisfying the previous partial differential equation immediately implies that the first variation of the canonical integral (in this case, the performance index) vanishes.

7.2 A Special Case

A special case is discussed here that utilizes the previous results to show an example of deriving a feedback optimal control u* based on the HJB equation. Specifically, consider a nonlinear system of the form

\dot{x} = f(x) + g(x)\, u   (66)

where x ∈ R^n (as before), f : R^n → R^n, g : R^n → R^{n×m}, f(0) = 0 and u is a control to be determined.

Let the value function (from the corresponding performance index) be given by

V(x, u) = \int_{t}^{\infty} \left( x^T Q x + u^T R u \right) dt   (67)
        = \int_{t}^{\infty} L(x, u)\, dt   (68)

where Q ∈ R^{n×n} and R ∈ R^{m×m} are symmetric weighting matrices whose choice is left as a design consideration. The expression in (67) evaluates the total cost up to t_f = ∞. It represents the weighted (by Q and R) squared sum of the total control effort and state "effort" expended, which is commonly a quantity that needs to be minimized. There are no final state constraints specified and therefore the problem is simply one of regulation, i.e., the system must be driven to its equilibrium x = 0. Furthermore, there is no final-state weighting function. Also, notice that the value function (67) is not dependent on time because the original system is not dependent on time. This property will play an important role in the following discussion.

Similar to the development in (16) through (19), augmenting (67) with the system dynamics multiplied by the costates yields

V(x, u) = \int_{t}^{\infty} \left( H(x, u, \lambda) - \lambda^T \dot{x} \right) dt   (69)

where

H = x^T Q x + u^T R u + \lambda^T \left[ f(x) + g(x)\, u \right]   (70)

Rewriting the stationarity condition (29) in terms of the new system equations gives

\frac{\partial}{\partial u} \left[ \lambda^T \left( f(x) + g(x)\, u \right) + L \right] = \frac{\partial H}{\partial u} = 0   (71)

and hence, from (70),

\frac{\partial H}{\partial u} = 2 u^T R + \lambda^T g(x) = 0   (72)

where it must be noted that the costate λ is not arbitrary: satisfying (72) implies that λ is a costate of the optimal control u*. We denote this special costate λ*. For purposes of clarity, the expression (72) is transposed and then rewritten to reflect this:

\frac{\partial H}{\partial u^*} = 2 R u^* + g^T(x)\, \lambda^* = 0   (73)

Rearranging (73) gives an expression for the optimal control,

u^* = -\tfrac{1}{2} R^{-1} g^T(x)\, \lambda^*   (74)

where everything on the right-hand side is known except the "optimal costate" λ*. This is precisely where the HJB equation enters the picture. Since, by (56), we have on the optimal trajectory

\lambda^* = \left( \frac{\partial V}{\partial x} \right)^T   (75)

the expression for the optimal control (74) can be written as

u^* = -\tfrac{1}{2} R^{-1} g^T(x) \left( \frac{\partial V}{\partial x} \right)^T   (76)

and hence finding the solution to the HJB equation (which gives V) allows an explicit analytic expression for the optimal control u*.

Notice that since the system under consideration is conservative (time-invariant), i.e., f = f(x) and g = g(x), the Hamiltonian (70) is not dependent on time,

H = H(x, u, \lambda)   (77)

and furthermore, the value function (69) is also not dependent on time,

V = V(x, u)   (78)

Therefore, we have

\frac{\partial V}{\partial t} = 0   (79)

which implies that the HJB equation (59) reduces to

\min_{u}\, H\!\left( x, \frac{\partial V}{\partial x}, u \right) = 0   (80)

over the optimal trajectory. From the expression (70) for the Hamiltonian H,

\min_{u} H = \min_{u} \left\{ x^T Q x + u^T R u + \lambda^T \left[ f(x) + g(x)\, u \right] \right\}   (81)
           = x^T Q x + u^{*T} R\, u^* + \frac{\partial V}{\partial x} \left[ f(x) + g(x)\, u^* \right] = 0   (82)

which was obtained by using the relationship (75).

Substituting the optimal control (74) into the modified HJB equation (82) yields the partial differential equation

x^T Q x + \tfrac{1}{4} \left( R^{-1} g^T(x)\, \lambda^* \right)^T R \left( R^{-1} g^T(x)\, \lambda^* \right) + \frac{\partial V}{\partial x} \left[ f(x) - \tfrac{1}{2}\, g(x)\, R^{-1} g^T(x)\, \lambda^* \right] = 0   (83)

or, by simplifying and using (75),

\frac{\partial V}{\partial x} f(x) + x^T Q x - \tfrac{1}{4} \frac{\partial V}{\partial x}\, g(x)\, R^{-1} g^T(x) \left( \frac{\partial V}{\partial x} \right)^T = 0   (84)

The only unknown in (84) is ∂V/∂x, the partial derivative (with respect to the states) of the optimal return function (value function). Therefore, solving (84) is sufficient to determine the optimal control (74).
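For the linear-quadratic special case (an assumed illustration, not treated explicitly above), the difficulty largely disappears: with f(x) = Ax, g(x) = B and the quadratic candidate V(x) = x^T P x, equation (84) collapses to the algebraic Riccati equation A^T P + P A + Q − P B R^{-1} B^T P = 0, and (76) becomes the familiar state feedback u* = −R^{-1} B^T P x. A minimal sketch using SciPy:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed linear example: a double integrator x1' = x2, x2' = u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # state weighting
R = np.array([[1.0]])    # control weighting

# With V(x) = x'Px, the HJB PDE (84) reduces to the algebraic Riccati equation
#   A'P + PA + Q - P B R^{-1} B' P = 0
P = solve_continuous_are(A, B, Q, R)

# Optimal feedback (76): u* = -1/2 R^{-1} g'(x) (dV/dx)' = -R^{-1} B' P x
K = np.linalg.solve(R, B.T @ P)
u_star = lambda x: -K @ x
```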

Unfortunately, solving the partial differential equation (84) is extremely difficult and frequently impossible. Thus, even though a feedback optimal control based on the HJB equation, as in (74), is very attractive, especially compared with the feedforward Euler-Lagrange optimal control solution, the added complexity of solving a partial differential equation such as (84) strictly limits its direct application [8].

Although several techniques have been proposed to provide a solution to the HJB equation under special conditions, the problem is still, even after five decades, an active area of research. One such technique is presented in the next section in significant detail.


8 Generalized Hamilton-Jacobi-Bellman Equation

Traditionally, the challenge of solving a partial differential equation like (84) was tackled using what is known as the "method of characteristics" [8]. The basic idea behind this method is to reduce the partial differential equation to a family of ordinary differential equations which are then integrated over different initial conditions to the terminal surface to obtain solutions to the partial differential equation. Such a scheme is very useful in studying the qualitative behavior of partial differential equations and has extensive applications in (computational) fluid mechanics, where it is used to study phenomena such as turbulence and shockwaves via the Navier-Stokes equations. However, its application in optimal control is not particularly beneficial. Firstly, the computation and storage of solutions of (infinitely) large sets of ordinary differential equations and initial conditions is prohibitive. In fact, this eliminates one of the main reasons for using the HJB solution to the optimal control problem: to avoid computation of arbitrarily large numbers of solutions to the two-point boundary value problem. Secondly, the solutions via the characteristic equations are not always well-defined. Specifically, under certain conditions, multivalued solutions might appear. Thirdly, in many cases, the method of characteristics does not cover the entire domain of the partial differential equation and the solution only exists in a weak sense. Despite these apparently critical shortcomings, during the early years of optimal control the method of characteristics was often considered the only route to a practical solution of the optimal control problem via the HJB equation.

During the 1970s, other, more efficient techniques hinging on system linearity were developed to solve the HJB equation to obtain a feedback optimal control. If the system nonlinearities are small, perturbation methods can be used to achieve second-order approximations to the optimal control, as was shown in [12, 20, 10, 11]. An explicit assumption in these is that the optimal control has a sufficiently accurate second-order Taylor series expansion about the origin. This type of assumption severely limits the class of systems to which the method is applicable. The stability region of the resulting control is also almost always impossible to determine. Perturbation methods, therefore, did not gain much momentum as viable schemes for numerical feedback optimal control.

As feedback linearization (or dynamic inversion) and geometric control gained popularity during the late 1980s and 1990s, several new attempts were made at attacking the numerical feedback optimal control problem. All of these involved canceling system nonlinearities via feedback (dynamic inversion) and then applying optimal control theory to the resulting linearized system [14, 9, 27]. This method has several drawbacks: significant control effort is expended in forcing the nonlinear system to behave linearly, useful nonlinearities that may help in control are eliminated, the dynamic inversion of the control matrix is not always a global transformation, the dynamic inversion itself is computationally expensive and, finally, the dynamic inversion is fragile to modeling uncertainties and disturbances.

Another approach to utilizing the HJB equation for optimal feedback control tackles the problem not by determining an optimal control u* directly but rather by successively optimizing an existing stabilizing suboptimal control u^{(0)}. The method utilizes an alternative formulation of the Hamilton-Jacobi equation known as the generalized Hamilton-Jacobi-Bellman equation and was first proposed by Saridis and Lee in [24]. The design methodology was further refined in [2, 17, 3] by introducing the use of Galerkin's spectral method to approximate partial differential equations. The following is a detailed mathematical treatment of this methodology using previously derived results in this report.

Consider a suboptimal stabilizing feedback control u(x) for the (conservative) nonlinear system (66). Analogous to (67), let the suboptimal value function for this particular control be given by

V(x) = \int_{t}^{\infty} \left[ x^T Q x + u^T(x)\, R\, u(x) \right] dt   (85)

We say that a feedback control u ∈ Ω_u is admissible if u is continuous and renders (66) asymptotically stable.

Assuming an admissible but suboptimal u is given, can the HJB equation be exploited to optimize this control successively over time? This question was first addressed by Saridis and Lee in [24], where they introduced the concept of the generalized Hamilton-Jacobi-Bellman equation. The equation was thus named because it applies to all admissible u and not just an optimal control. It is introduced here, based on previous results, in a nonrigorous fashion.

Differentiating the suboptimal value function (85) along the trajectories of the system yields the differential form of the (suboptimal) value function,

GHJB: \quad \frac{\partial V^T}{\partial x} \left[ f(x) + g(x)\, u(x) \right] + x^T Q x + u^T(x)\, R\, u(x) = 0   (86)

This differential form of the (suboptimal) value function is known as the generalized Hamilton-Jacobi-Bellman (GHJB) equation. The solution V of the GHJB equation is a Lyapunov function for (66) under the suboptimal control u [1]. It represents the value function under a suboptimal control.

The development below closely follows Saridis and Lee [24]. Key theorems are reproduced (in a standardized form) and presented without proofs. The first lemma relates the suboptimal value function V(x) to the true value function V*(x) under optimal control.

Lemma 1. Assume the optimal control u* and the optimal value function V*(x) exist. Then these satisfy the GHJB equation (86) and

0 < V^*(x) \le V(x)   (87)

The next theorem presents an approach to ensure a successively (at each step or iteration) smaller suboptimal value function.


Theorem 1. If a sequence of pairs {u^{(i)}, V^{(i)}} satisfying the GHJB equation (86) is generated by selecting the control u^{(i)} to minimize the GHJB equation associated with the previous value function V^{(i-1)}, i.e.,

u^{(i)} = -\tfrac{1}{2} R^{-1} g^T(x)\, \frac{\partial V^{(i-1)}}{\partial x}   (88)

then the corresponding value function satisfies the inequality

V^{(i)} \le V^{(i-1)}   (89)

Note the similarity between (88) and the general expression for the optimal control (76). The corollary that follows is intuitively immediate from Lemma 1 and Theorem 1. It deals with the convergence of a sequence of suboptimal value functions to the optimal value function given a control such as (88).

Corollary 1. By selecting pairs {u^{(i)}, V^{(i)}} with

u^{(i)} = -\tfrac{1}{2} R^{-1} g^T(x)\, \frac{\partial V^{(i-1)}}{\partial x}   (90)

the resulting sequence {V^{(i)}} converges monotonically to the optimal value function V*(x) associated with the optimal control, i.e.,

V^{(0)} \ge V^{(1)} \ge V^{(2)} \ge \cdots \ge V^*   (91)

The final two theorems deal with the construction of upper and lower bounds for the true value function V*(x). This is accomplished by obtaining functions that only marginally fail to satisfy the GHJB equation on either side (< 0 and > 0).

Theorem 2. Suppose that, for a given u_s(x) and some

s(x), \quad |s(x)| < \infty   (92)

there exists a continuously differentiable positive definite function V_s = V(x, u_s) satisfying

\frac{\partial V_s^T}{\partial x} \left[ f(x) + g(x)\, u_s(x) \right] + x^T Q x + u_s^T(x)\, R\, u_s(x) = \Delta V_s \le s(x) < 0   (93)

Then V_s(x) is an upper bound of the optimal value function V*(x):

V_s(x) > V^*(x)   (94)


And similarly for the lower bound, we have the last theorem.

Theorem 3. Suppose that, for a given u_s(x) and some

s(x), \quad |s(x)| < \infty   (95)

there exists a continuously differentiable positive definite function V_s = V(x, u_s) satisfying

\frac{\partial V_s^T}{\partial x} \left[ f(x) + g(x)\, u_s(x) \right] + x^T Q x + u_s^T(x)\, R\, u_s(x) = \Delta V_s \ge s(x) > 0   (96)

Then V_s(x) is a lower bound of the optimal value function V*(x):

V_s(x) < V^*(x)   (97)

An exact design procedure for optimizing an initial admissible control u^{(0)} ∈ Ω_u can now be formed from the previous results.

1. Select an initial admissible control u^{(0)} ∈ Ω_u for the system (66).

2. Solve the GHJB partial differential equation to find V^{(0)}:

\frac{\partial V^{(0)T}}{\partial x} \left[ f(x) + g(x)\, u^{(0)}(x) \right] + x^T Q x + u^{(0)T}(x)\, R\, u^{(0)}(x) = 0   (98)

Then, by Lemma 1, V^{(0)} ≥ V*.

3. Obtain an improved controller u^{(1)} using Corollary 1:

u^{(1)} = -\tfrac{1}{2} R^{-1} g^T(x)\, \frac{\partial V^{(0)}}{\partial x}   (99)

4. Solve the GHJB partial differential equation to find V^{(1)}:

\frac{\partial V^{(1)T}}{\partial x} \left[ f(x) + g(x)\, u^{(1)}(x) \right] + x^T Q x + u^{(1)T}(x)\, R\, u^{(1)}(x) = 0   (100)

Then, by Lemma 1, V^{(0)} > V^{(1)} ≥ V*.

5. Determine a lower bound V_s to the optimal value function using Theorem 3.

6. Use V^{(1)} − V_s as a measure of how close an approximation u^{(1)} is to the optimal control u*. If acceptable, stop at this iteration.

7. Otherwise, if the approximation is not acceptable, repeat from step 2 onwards with a new iteration.


The benefit of using the GHJB equation and the control design procedure outlined above is that we do not need to solve the HJB partial differential equation (84) directly. Rather, a much more amenable partial differential equation needs to be solved, in the form of the GHJB equation (86). Furthermore, the GHJB allows for an iteratively improving solution that addresses several implementation challenges. Rather than having to solve the entire optimal control problem at once, the solution is divided into successively improving iterations, each of which is useful in the control action since each is always better than the initially designed stabilizing controller.
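For a linear system, the loop of steps 1-7 can be written in a few lines, because with V^{(i)}(x) = x^T P_i x the GHJB equation reduces to a Lyapunov equation. The sketch below is an assumed linear illustration of the successive-improvement idea (the general nonlinear case requires a PDE approximation such as the Galerkin method discussed next):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Assumed linear test system and weights (a double integrator again).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Step 1: an initial admissible (stabilizing but suboptimal) feedback u(x) = -K x.
K = np.array([[1.0, 2.0]])   # places both closed-loop eigenvalues at -1

for i in range(50):
    Acl = A - B @ K
    # Steps 2/4: for V(x) = x'Px the GHJB equation becomes the Lyapunov equation
    #   Acl'P + P Acl + Q + K'RK = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Step 3: improved control u = -1/2 R^{-1} g' dV/dx = -R^{-1} B'P x
    K_new = np.linalg.solve(R, B.T @ P)
    if np.max(np.abs(K_new - K)) < 1e-9:
        break                 # steps 6/7: stop once the iteration has converged
    K = K_new
```

Each iterate is itself an admissible stabilizing controller, so the loop can be stopped early and still yield a control no worse than the initial design, as noted above.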

A method to solve the GHJB equation is considered below.

9 Successive Galerkin Approximation to the GHJB Equation

The solution to the GHJB equation (86) needs to be determined numerically in order to utilize the design procedure outlined above. This problem was tackled by Beard in his doctoral work [2] and in the subsequent journal publication [3]. An algorithm called Successive Galerkin Approximation (SGA) was developed based on the spectral method of Galerkin. A numerically efficient version of the algorithm was also developed in [17]. Most famously, a discussion of the method by Beard, Saridis and Wen appeared in the IEEE Control Systems Magazine [1]. This section provides an outline of the method with its key points.

Let the system (66) be Lipschitz continuous on a set Ω ⊂ R^n containing the origin. Furthermore, let there exist a continuous control on Ω that asymptotically stabilizes the system, i.e., the system is controllable over Ω. Now assume the existence of a set of basis functions \{\phi_j\}_{1}^{\infty}, where the φ_j : Ω → R are continuous, φ_j(0) = 0 and \mathrm{span}\{\phi_j\}_{1}^{\infty} \subseteq L_2(\Omega). Then the solution V of the GHJB equation (86) can be written as

V(x) = \sum_{j=1}^{\infty} c_j\, \phi_j(x)   (101)

where the c_j are constants to be determined. It is not practical to have an infinite summation as an approximation, and thus a large enough number N is chosen to truncate the solution. This truncated solution is referred to as V_N and, from (101), it is given by

V_N(x) = c_N^T \Phi_N(x)   (102)

where

c_N^T = \left[ c_1 \ \ldots \ c_N \right]   (103)

and

\Phi_N(x) = \left[ \phi_1(x) \ \ldots \ \phi_N(x) \right]^T   (104)


The vector of N constants c_N is determined by ensuring orthogonality between the GHJB equation expressed in terms of V_N(x) and Φ_N(x), i.e.,

\left\langle \mathrm{GHJB}\!\left[ V_N(x) \right],\, \Phi_N(x) \right\rangle_{\Omega} = 0   (105)

where ⟨·, ·⟩_Ω denotes the function inner product (integral) over the set Ω. Note that in (105) the expression (101) is used. It follows that (105) is a system of N linear equations with N unknowns. The system can be easily inverted to determine c_N, as is shown in the following discussion.

The GHJB equation from (105) (in terms of the truncated approximation of the suboptimal value function) is written as

\frac{\partial V_N^T}{\partial x} \left[ f(x) + g(x)\, u(x) \right] + x^T Q x + u^T(x)\, R\, u(x)
  = c_N^T\, \frac{\partial \Phi_N(x)}{\partial x} \left[ f(x) + g(x)\, u(x) \right] + x^T Q x + u^T(x)\, R\, u(x)   (106)

where ∂Φ_N/∂x ∈ R^{N×n} is a matrix quantity. For convenience, denote this as

\frac{\partial \Phi_N(x)}{\partial x} = \nabla \Phi_N(x) = \left[ \frac{\partial \phi_1(x)}{\partial x} \ \ldots \ \frac{\partial \phi_N(x)}{\partial x} \right]^T   (107)

Then, from (106), it follows that the GHJB equation is

c_N^T\, \nabla \Phi_N(x) \left[ f(x) + g(x)\, u(x) \right] + \left[ x^T Q x + u^T(x)\, R\, u(x) \right] = 0   (108)

Transposing (108),

\left[ f(x) + g(x)\, u(x) \right]^T \nabla \Phi_N^T(x)\, c_N + \left[ x^T Q x + u^T(x)\, R\, u(x) \right] = 0   (109)

and then substituting into (105) yields

\left\langle \left[ f(x) + g(x)\, u(x) \right]^T \nabla \Phi_N^T(x),\, \Phi_N \right\rangle_{\Omega} c_N + \left\langle x^T Q x,\, \Phi_N \right\rangle_{\Omega} + \left\langle u^T(x)\, R\, u(x),\, \Phi_N \right\rangle_{\Omega} = 0   (110)

or

\left( \int_{\Omega} \left[ f(x) + g(x)\, u(x) \right]^T \nabla \Phi_N^T(x)\, \Phi_N \right) c_N + \int_{\Omega} x^T Q x\, \Phi_N + \int_{\Omega} u^T(x)\, R\, u(x)\, \Phi_N
  = \left( \int_{\Omega} \left[ f(x) + g(x)\, u(x) \right]^T \nabla \Phi_N^T(x)\, \Phi_N \right) c_N + \left( \int_{\Omega} \left( x^T Q x + u^T(x)\, R\, u(x) \right) \Phi_N \right) = a\, c_N + b = 0   (111)

where a ∈ R^{N×N}, c_N ∈ R^N and b ∈ R^N. Thus c_N may be found by inverting the matrix a,

c_N = -a^{-1} b   (112)

Once c_N is determined, (102) is used to form the truncated approximation of the suboptimal value function. The convergence and validity proofs for this type of approximation are dealt with in [2].
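As a concrete (assumed) miniature of this computation, take the scalar system ẋ = u with the admissible control u(x) = −x, Q = R = 1 and Ω = [−1, 1]. With even polynomial basis functions, the orthogonality conditions (105) become a small linear system for c_N, and the known exact solution V(x) = x² of the corresponding GHJB equation is recovered:

```python
import numpy as np

# Assumed scalar example: xdot = u, admissible control u0(x) = -x,
# running cost x^2 + u^2 (Q = R = 1), domain Omega = [-1, 1].
f  = lambda x: 0.0 * x
g  = lambda x: np.ones_like(x)
u0 = lambda x: -x
L  = lambda x: x**2 + u0(x)**2

# Polynomial basis phi_j(x) = x^(2j), j = 1..N, so that phi_j(0) = 0.
N = 3
phi  = [lambda x, j=j: x**(2 * j)             for j in range(1, N + 1)]
dphi = [lambda x, j=j: 2 * j * x**(2 * j - 1) for j in range(1, N + 1)]

xs = np.linspace(-1.0, 1.0, 2001)          # quadrature grid on Omega
dx = xs[1] - xs[0]
inner = lambda a, b: np.sum(a * b) * dx    # L2 inner product over Omega

# Galerkin conditions <GHJB[V_N], phi_k>_Omega = 0  =>  a c_N = -b, cf. (111).
closed_loop = f(xs) + g(xs) * u0(xs)
a = np.array([[inner(dphi[j](xs) * closed_loop, phi[k](xs)) for j in range(N)]
              for k in range(N)])
b = np.array([inner(L(xs), phi[k](xs)) for k in range(N)])
c = np.linalg.solve(a, -b)   # coefficients of V_N(x) = sum_j c_j phi_j(x)
# Here c is approximately [1, 0, 0], i.e. V_N(x) ≈ x^2, the exact solution.
```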

The basis functions have not been discussed so far. Polynomials, in most cases, are sufficient. Moreover, if these are orthogonal, better results are expected. Increasing the number of basis functions, i.e., increasing N, has an exponential effect on the computation required [17]. It is, therefore, important to choose the basis functions carefully. Lawton and Beard showed in [17] that choosing the basis functions such that they are separable, and assuming the domain Ω to be rectangular, allows for the formulation of significantly computationally cheaper versions of the SGA algorithm. Polynomials are separable functions and therefore play an important role in that work.

Despite the attractiveness of the methods presented, they still pose challenges when it comes to addressing one of the prime reasons for utilizing the HJB equation for optimal control: to allow for a closed-form solution to the optimal feedback problem that can be used efficiently in realistic scenarios. In this respect, the GHJB/SGA algorithm is not unique among the methodologies for numerical optimal feedback control. As the system order increases and computational resources become more restrictive, most methodologies become infeasible. Thus, using such algorithms in embedded systems or to efficiently control complex systems (like aircraft) is often impossible.

10 Conclusion

A broad discussion of optimal control was presented. A history and the basic problem of optimal control were shown. This was followed by a derivation of standard results in optimal control theory, along with discussions of the connections between classical mechanics and optimal control theory. The report ended with a discussion of more recent results in optimal control theory, namely results aimed at making optimal control theory more practically viable.

Even half a century after the initial results published independently by Bellman and Pontryagin, optimal control remains a vibrant area of research with much-sought-after results. Rather than recede into the background in light of the latest developments, optimal control is becoming more and more relevant. This is not least because of the huge strides achieved in computational power. Mathematical developments and the race towards achieving computationally viable schemes for simulation also indirectly benefit optimal control theory. With its wide applications and promise for future research, optimal control remains a high-value research area. Since the theoretical foundation of optimal control theory has already been laid, this high-value research is geared towards achieving numerical schemes to make optimal control more practical.


References

[1] R. Beard, G. Saridis, and J. Wen, "Improving the performance of stabilizing controls for nonlinear systems," IEEE Control Systems Magazine, vol. 16, no. 5, pp. 27–35, 1996.

[2] R. Beard, "Improving the closed-loop performance of nonlinear systems," Ph.D. dissertation, Rensselaer Polytechnic Institute, 1995.

[3] R. Beard, G. Saridis, and J. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation," Automatica, vol. 33, no. 12, pp. 2159–2177, 1997.

[4] R. Bellman, "On the theory of dynamic programming," Proceedings of the National Academy of Sciences of the United States of America, vol. 38, no. 8, p. 716, 1952.

[5] ——, The Theory of Dynamic Programming. Defense Technical Information Center, 1954.

[6] ——, "An introduction to the theory of dynamic programming," 1953.

[7] ——, Eye of the Hurricane: An Autobiography. World Scientific, 1984.

[8] A. Bryson and Y. Ho, Applied Optimal Control. American Institute of Aeronautics and Astronautics, 1979.

[9] L. Gao, L. Chen, Y. Fan, and H. Ma, "A nonlinear control design for power systems," Automatica, vol. 28, no. 5, pp. 975–979, 1992.

[10] W. Garrard, "Suboptimal feedback control for nonlinear systems," Automatica, vol. 8, no. 2, pp. 219–221, 1972.

[11] W. Garrard and J. Jordan, "Design of nonlinear automatic flight control systems," Automatica, vol. 13, no. 5, pp. 497–505, 1977.

[12] W. Garrard, N. McClamroch, and L. Clark, "An approach to sub-optimal feedback control of non-linear systems," International Journal of Control, vol. 5, no. 5, pp. 425–435, 1967.

[13] H. Goldstein, C. Poole, J. Safko, and S. Addison, "Classical mechanics," American Journal of Physics, vol. 70, p. 782, 2002.

[14] A. Isidori, Nonlinear Control Systems. Springer Verlag, 1995.

[15] A. Klumpp, "Apollo lunar descent guidance," Automatica, vol. 10, no. 2, pp. 133–146, 1974.

[16] C. Lanczos, The Variational Principles of Mechanics. Dover Publications, 1970.

[17] J. Lawton and R. Beard, "Numerically efficient approximations to the Hamilton-Jacobi-Bellman equation," in Proceedings of the 1998 American Control Conference, vol. 1. IEEE, 1998, pp. 195–199.

[18] F. Lewis, Applied Optimal Control and Estimation. Prentice Hall PTR, 1992.

[19] F. Lewis and V. Syrmos, Optimal Control. Wiley-Interscience, 1995.

[20] Y. Nishikawa, N. Sannomiya, and H. Itakura, "A method for suboptimal design of nonlinear feedback systems," Automatica, vol. 7, no. 6, pp. 703–712, 1971.

[21] J. Papastavridis, Analytical Mechanics. Oxford University Press, 2002.

[22] L. Pontryagin, "Optimal regulation processes," Uspekhi Matematicheskikh Nauk, vol. 14, no. 1, pp. 3–20, 1959.

[23] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathematical Theory of Optimal Control Processes. Interscience, New York, 1962.

[24] G. Saridis and C. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 3, pp. 152–159, 1979.

[25] S. Sethi and G. Thompson, Optimal Control Theory: Applications to Management Science and Economics. Springer Verlag, 2005.

[26] H. Sussmann and J. Willems, "300 years of optimal control: from the brachystochrone to the maximum principle," IEEE Control Systems Magazine, vol. 17, no. 3, pp. 32–44, 1997.

[27] Y. Wang, D. Hill, R. Middleton, and L. Gao, "Transient stabilization of power systems with an adaptive control law," Automatica, vol. 30, no. 9, pp. 1409–1413, 1994.

[28] J. Willems, "1696: the birth of optimal control," in Proceedings of the 35th IEEE Conference on Decision and Control, vol. 2. IEEE, 1996, pp. 1586–1587.
