An Overview of Risk-Sensitive Stochastic Optimal Control
Matt James
Australian National University
Workshop:
Stochastic control, communications and numerics perspective
UniSA, Adelaide, Sept. 2004.
Outline
• A brief history
• Approaches to controller design
• Relationships among approaches
• Some limit results (state feedback)
• Robust interpretation
• Partially observed case
• Quantum systems
• References
A brief history
The risk-sensitive criterion introduced by Jacobson 1973.
[Linear-Exponential-Quadratic-Gaussian (LEQG)]
Linear system with Gaussian noise:
dx = (Ax + Bu)dt + dw
Minimize average of exponential of quadratic cost:
$$J^\mu = E_{x,t}\Big[\exp \mu \Big(\int_t^T \big[\tfrac12 x'Qx + \tfrac12 |u|^2\big]\,ds + \tfrac12 x_T'Rx_T\Big)\Big]$$
µ > 0 - risk-sensitive
µ < 0 - risk-seeking
solution
Optimal state feedback control is
$$u^*(x,t) = -B'P^\mu_t x,$$
where, for 0 ≤ t ≤ T,
$$\dot P^\mu_t = -P^\mu_t A - A'P^\mu_t + P^\mu_t (BB' - \mu I) P^\mu_t - Q, \qquad P^\mu_T = R$$
[control type Riccati equation]
Limit µ → 0: recover standard LQG control (risk-neutral).
[Linear-Quadratic-Gaussian (LQG)]
Linear system with Gaussian noise:
dx = (Ax + Bu)dt + dw
Minimize average of quadratic cost:
$$E_{x,t}\Big[\int_t^T \big[\tfrac12 x'Qx + \tfrac12 |u|^2\big]\,ds + \tfrac12 x_T'Rx_T\Big]$$
Solution: Optimal state feedback control is
$$u^*(x,t) = -B'P_t x,$$
where
$$\dot P_t = -P_t A - A'P_t + P_t BB'P_t - Q, \qquad P_T = R$$
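For a scalar system, the risk-sensitive and risk-neutral Riccati equations can be integrated numerically and compared. The sketch below is a minimal illustration, assuming scalar coefficients a, b, q, r and a fixed-step RK4 scheme run backward from t = T; the function name `riccati_p0` and all numerical values are illustrative assumptions, not taken from the talk.

```python
def riccati_p0(a, b, q, r, mu, T=1.0, steps=2000):
    """Integrate the scalar Riccati equation backward from T to 0 and return P(0).

    dP/dt = -2 a P + (b^2 - mu) P^2 - q,  P(T) = r.
    Setting mu = 0 gives the risk-neutral (LQG) equation.
    """
    def rhs(p):
        return -2.0 * a * p + (b * b - mu) * p * p - q

    h = T / steps
    p = r
    for _ in range(steps):
        # RK4 step in reversed time s = T - t, where dP/ds = -rhs(P)
        k1 = -rhs(p)
        k2 = -rhs(p + 0.5 * h * k1)
        k3 = -rhs(p + 0.5 * h * k2)
        k4 = -rhs(p + h * k3)
        p += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p


# Hypothetical coefficients, chosen only for illustration
p_lqg = riccati_p0(a=-1.0, b=1.0, q=1.0, r=0.0, mu=0.0)
p_rs = riccati_p0(a=-1.0, b=1.0, q=1.0, r=0.0, mu=0.5)
p_small = riccati_p0(a=-1.0, b=1.0, q=1.0, r=0.0, mu=1e-6)
```

In this example the risk-sensitive gain `p_rs` exceeds the LQG gain `p_lqg`, and `p_small` is numerically close to `p_lqg`, in line with the µ → 0 limit.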
partially observed case
Linear system with Gaussian noise:
dx = (Ax + Bu)dt + dw
dy = Cxdt + dv
Minimize average of exponential of quadratic cost
$$J^\mu = E_{x,t}\Big[\exp \mu \Big(\int_t^T \big[\tfrac12 x'Qx + \tfrac12 |u|^2\big]\,ds + \tfrac12 x_T'Rx_T\Big)\Big]$$
over causal maps $y(\cdot) \mapsto u(\cdot)$.
Solution: Whittle 1981, Bensoussan-van Schuppen 1985.
Optimal partially observed control is
$$u^*_t = -B'P^\mu_t x^\mu_t$$
where
$$dx^\mu = ([A + \mu Y^\mu Q]x^\mu + Bu)\,dt + Y^\mu C'(dy - Cx^\mu\,dt)$$
and
$$\dot Y^\mu = AY^\mu + Y^\mu A' + \mu Y^\mu QY^\mu + I - Y^\mu C'CY^\mu$$
Significantly, this is not the Kalman filter.
When µ → 0, LEQG → LQG, whose solution is expressed in terms of
the Kalman filter.
Some subsequent developments:
• LEQG. Connection with dynamic games. (Jacobson, 1973)
• LEQG. Connection with robust control (H∞). (Glover-Doyle, 1988)
• Risk-sensitive control for nonlinear systems, state feedback.
Connection with dynamic games. (Fleming-McEneaney 1992, James,
1992, Nagai 1996)
• Solution of risk-sensitive control for nonlinear systems, output
feedback. Solution and connection with dynamic games and nonlinear
robust control. (James-Baras-Elliott 1994, Charalambous-Hibey 1996,
Dai Pra-Meneghini-Runggaldier 1996)
• Robustness properties of risk-sensitive criterion. Also stochastic IQC.
(James, Fleming, Boel, Dupuis, Petersen, 1995, 1998)
• Risk-sensitive control for finance. (Nagai 2000)
• Risk-sensitive control for quantum systems. (James 2004)
Approaches to controller design
The basic aim of robust control is:
Design a controller to achieve “satisfactory performance” in
the presence of disturbances and uncertainty.
Stochastic control has similar aims.
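H∞ robust control
Dynamics
$$\dot x_s = f(x_s) + g(x_s)d_s, \quad 0 < s < T,$$
$$z_s = h(x_s)$$
$$f(0) = 0, \quad h(0) = 0, \quad x_0 = 0$$
Disturbance-Output Map
$$T_{d,z} : d(\cdot) \mapsto z(\cdot)$$
H∞ Norm
$$\| T_{d,z} \|_\infty \triangleq \sup_{d \in L_2[0,T]} \frac{\| z \|_2}{\| d \|_2}$$
Note:
$$\| T_{d,z} \|_\infty \le \gamma \quad \text{iff} \quad \int_0^T |z(t)|^2\,dt \le \int_0^T \gamma^2 |d(t)|^2\,dt \quad \forall d \in L_2[0,T]$$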
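The energy inequality in the note can be checked numerically for particular disturbances. The sketch below assumes a hypothetical scalar example, f(x) = -x, g(x) = 1, h(x) = x, for which the bound holds with γ = 1, and uses a simple Euler discretization; the function name `energies` and the test signals are illustrative. Any finite set of disturbances only gives a lower estimate of the norm, so this is a sanity check rather than a computation of ‖T‖∞.

```python
import math


def energies(d, h, a=-1.0):
    """Euler-simulate dx/ds = a*x + d_s, z = x, x_0 = 0.

    Returns (approx. integral of |z|^2, approx. integral of |d|^2).
    """
    x, ez, ed = 0.0, 0.0, 0.0
    for dk in d:
        ez += x * x * h
        ed += dk * dk * h
        x += h * (a * x + dk)
    return ez, ed


h = 0.001
n = 5000  # horizon T = n * h = 5
signals = {
    "constant": [1.0] * n,
    "sine": [math.sin(2.0 * math.pi * k * h) for k in range(n)],
    "pulse": [1.0 if k < 500 else 0.0 for k in range(n)],
}
gamma = 1.0
ratios = {}
for name, d in signals.items():
    ez, ed = energies(d, h)
    ratios[name] = ez / ed  # should not exceed gamma**2
```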
D-H2 deterministic optimal control
disturbance: none
controller: gives optimal performance without
special consideration of disturbances.
Optimal cost
$$W(x,t) = \inf_u \Big[\int_t^T L(x_s,u_s)\,ds + \Phi(x_T)\Big]$$
Dynamics
$$\dot x_s = b(x_s,u_s), \quad t < s < T, \qquad x_t = x$$
S-H2 (RN) stochastic optimal control
disturbance: noise
controller: gives optimal average
performance
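Optimal cost
$$W^\varepsilon(x,t) = \inf_u E_{x,t}\Big[\int_t^T L(x^\varepsilon_s,u_s)\,ds + \Phi(x^\varepsilon_T)\Big]$$
Dynamics
$$dx^\varepsilon_s = b(x^\varepsilon_s,u_s)\,ds + \sqrt{\varepsilon}\,dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$
(ε > 0 - noise intensity)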
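The role of the noise intensity ε in these dynamics can be seen by simulating sample paths. The following is a minimal Euler-Maruyama sketch for a scalar state; the drift `drift`, the feedback `feedback`, and the function name `simulate` are hypothetical choices made for illustration. With ε = 0 the scheme reduces to an Euler discretization of the deterministic (D-H2) dynamics.

```python
import math
import random


def simulate(b, u, x0, eps, T=1.0, steps=1000, seed=0):
    """Euler-Maruyama path of dx = b(x, u(x)) ds + sqrt(eps) dB on [0, T]."""
    rng = random.Random(seed)
    h = T / steps
    x = x0
    path = [x]
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(h))  # Brownian increment ~ N(0, h)
        x = x + b(x, u(x)) * h + math.sqrt(eps) * dB
        path.append(x)
    return path


def drift(x, u):
    """Hypothetical drift b(x, u) = -x + u."""
    return -x + u


def feedback(x):
    """Hypothetical state feedback u(x) = -x/2."""
    return -0.5 * x


noiseless = simulate(drift, feedback, x0=1.0, eps=0.0)
noisy = simulate(drift, feedback, x0=1.0, eps=0.01)
```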
RS stochastic risk-sensitive optimal control
disturbance: noise
controller: gives optimal average performance
using exponential cost
(heavily penalizes large values)
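Optimal cost
$$S^{\mu,\varepsilon}(x,t) = \inf_u E_{x,t}\Big[\exp \frac{\mu}{\varepsilon}\Big(\int_t^T L(x^\varepsilon_s,u_s)\,ds + \Phi(x^\varepsilon_T)\Big)\Big]$$
Dynamics
$$dx^\varepsilon_s = b(x^\varepsilon_s,u_s)\,ds + \sqrt{\varepsilon}\,dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$
(µ > 0 - risk sensitivity)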
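To see how the exponential cost penalizes large values, it helps to compare the quantity (1/µ) log E[exp(µC)] with the plain average E[C] for a simple random cost C. The sketch below assumes a hypothetical three-point cost distribution and absorbs the scaling µ/ε of the slide into a single parameter `mu`; the function name `risk_sensitive_value` is illustrative.

```python
import math


def risk_sensitive_value(costs, probs, mu):
    """Return (1/mu) * log E[exp(mu * C)] for a discrete cost distribution, mu > 0."""
    m = max(costs)
    # Subtract the maximum before exponentiating to avoid overflow (log-sum-exp)
    s = sum(p * math.exp(mu * (c - m)) for c, p in zip(costs, probs))
    return m + math.log(s) / mu


# Hypothetical cost distribution: usually small, occasionally large
costs = [1.0, 2.0, 10.0]
probs = [0.6, 0.3, 0.1]
mean_cost = sum(c * p for c, p in zip(costs, probs))

near_neutral = risk_sensitive_value(costs, probs, mu=1e-6)
moderate = risk_sensitive_value(costs, probs, mu=1.0)
extreme = risk_sensitive_value(costs, probs, mu=50.0)
```

For small `mu` the value is close to the mean cost; as `mu` grows it moves toward the worst-case cost, which is the sense in which the criterion interpolates between average and worst-case design.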
H∞ H∞ robust control
disturbance: L2 signal
controller: achieves specified H∞
norm constraint
(worst-case design)
Find u(·) such that
$$\| T_{d,z} \|_\infty \le \gamma$$
Dynamics
$$\dot x_s = b(x_s,u_s) + d_s, \quad t < s < T, \qquad x_t = x, \qquad z_s = h(x_s)$$
(γ = 1/√µ - H∞ constraint)
D-DG deterministic differential game
disturbance: L2 signal
controller: gives optimal performance subject
to opposing efforts of disturbance
(worst-case design)
Optimal cost
$$W^\mu(x,t) = \sup_d \inf_u \Big[\int_t^T \Big(L(x_s,u_s) - \frac{1}{2\mu}|d_s|^2\Big)\,ds + \Phi(x_T)\Big]$$
Dynamics
$$\dot x_s = b(x_s,u_s) + d_s, \quad t < s < T, \qquad x_t = x$$
S-DG stochastic differential game
disturbances: noise
L2 signal
controller: gives optimal average performance
subject to opposing efforts of
disturbance (worst-case design)
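Optimal cost
$$W^{\mu,\varepsilon}(x,t) = \sup_d \inf_u E_{x,t}\Big[\int_t^T \Big(L(x^\varepsilon_s,u_s) - \frac{1}{2\mu}|d_s|^2\Big)\,ds + \Phi(x^\varepsilon_T)\Big]$$
Dynamics
$$dx^\varepsilon_s = (b(x^\varepsilon_s,u_s) + d_s)\,ds + \sqrt{\varepsilon}\,dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$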
Relationships among approaches
[Diagram: RS → S-H2 as µ → 0; RS → D-DG as ε → 0; S-H2 → D-H2 as ε → 0; D-DG → D-H2 as µ → 0]
RS, S-DG, D-DG, and S-H2 are “perturbations” of D-H2.
(James, Fleming-McEneaney, Whittle)
RS ≡ S-DG
≡ D-H2
+ (noise variance term)
+ (disturbance energy term)
Equivalence via
• logarithmic transformation,
• dynamic programming PDEs, and
• representation theorem
Expansion valid for
• small noise intensity and
• small risk-sensitivity
S-H2 ≡ D-H2 + (noise variance term)
H∞ ≡ D-H2 + (disturbance energy term)
RS ≡ H2 + H∞ ???
Some limit results (state feedback)
Assumptions
• b(x, u) = f(x) + g(x)u, where f : Rn → Rn, g : Rn → Rn×m and
their first order derivatives Df, Dg are bounded and uniformly
Lipschitz continuous.
• Control values: U ⊂ Rm is compact and convex.
• Disturbance values: D = Rn.
• Φ : Rn → R is non–negative; Φ and DΦ are bounded and
uniformly Lipschitz continuous.
• L : Rn × Rm → R is C2, non-negative; L(·, u) and DxL(·, u) are
bounded and uniformly Lipschitz continuous, uniformly in u ∈ U;
and $D_u^2 L(x,\cdot) > 0$, uniformly in x ∈ Rn.
• There exists a locally Lipschitz U∗(x, p) achieving the minimum in
$$\min_{u\in U}\,[\,p \cdot b(x,u) + L(x,u)\,]$$
• Use Elliott–Kalton formulation for two–player zero–sum
differential games (strategies, upper and lower values, etc)
• For later use, assume also
– $L(x,u) = \tfrac12 |u|^2 + V(x)$, and the functions f, g, V and Φ are of
class C∞ and have compact support.
– U∗(x, p) is of class C2.
Viscosity Solutions
Viscosity solutions introduced by Crandall-Lions 1983.
The definition for a fully nonlinear parabolic PDE
$$\frac{\partial v}{\partial t} + F(x, Dv, D^2v) = 0 \ \text{ in } \mathbb{R}^n \times (0,T), \qquad v(x,T) = \Phi(x) \ \text{ in } \mathbb{R}^n$$
is as follows.
Definition
An upper semicontinuous (u.s.c.) function $\overline v$ (resp. lower semicontinuous (l.s.c.) function $\underline v$)
is called a viscosity subsolution (resp. supersolution) if
$$\overline v \le \Phi \quad (\text{resp. } \underline v \ge \Phi) \quad \text{on } \mathbb{R}^n \times \{T\}$$
and
$$\frac{\partial}{\partial t}\phi(x,t) + F(x, D\phi(x,t), D^2\phi(x,t)) \ge 0 \quad (\text{resp. } \le 0)$$
for every smooth function φ and any local maximum (resp.
minimum) (x, t) ∈ Rn × [0, T) of $\overline v - \phi$ (resp. $\underline v - \phi$).
A continuous function is called a viscosity solution if it is both a
subsolution and a supersolution.
Comparison Theorem
If $\overline v$ is a subsolution and $\underline v$ is a supersolution, then
$$\overline v \le \underline v \quad \text{in } \mathbb{R}^n \times [0,T].$$
Thus a continuous viscosity solution, if one exists, is unique.
Dynamic Programming PDEs
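RS (ε > 0, µ > 0)
$$\frac{\partial S^{\mu,\varepsilon}}{\partial t} + \frac{\varepsilon}{2}\Delta S^{\mu,\varepsilon} + \min_{u\in U}\Big[DS^{\mu,\varepsilon}\cdot b(x,u) + \frac{\mu}{\varepsilon}L(x,u)S^{\mu,\varepsilon}\Big] = 0 \ \text{ in } \mathbb{R}^n \times (0,T)$$
$$S^{\mu,\varepsilon}(x,T) = \exp\frac{\mu}{\varepsilon}\Phi(x) \ \text{ in } \mathbb{R}^n$$
Logarithmic transformation: (Fleming, 1970's)
$$W^{\mu,\varepsilon}(x,t) = \frac{\varepsilon}{\mu}\log S^{\mu,\varepsilon}(x,t)$$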
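As a formal check of the logarithmic transformation (assuming S is smooth and strictly positive, and dropping the superscripts µ, ε for readability), write S = exp((µ/ε)W) and compute its derivatives:

```latex
\frac{\partial S}{\partial t} = \frac{\mu}{\varepsilon}\, S\, \frac{\partial W}{\partial t},
\qquad
DS = \frac{\mu}{\varepsilon}\, S\, DW,
\qquad
\Delta S = \frac{\mu}{\varepsilon}\, S\, \Delta W + \frac{\mu^2}{\varepsilon^2}\, S\, |DW|^2 .
```

Substituting into the RS equation and dividing by (µ/ε)S > 0 gives

```latex
\frac{\partial W}{\partial t} + \frac{\varepsilon}{2}\Delta W + \frac{\mu}{2}|DW|^2
+ \min_{u\in U}\big[\, DW \cdot b(x,u) + L(x,u) \,\big] = 0,
\qquad W(x,T) = \Phi(x).
```

The quadratic term (µ/2)|DW|² is the value of max over d of [DW · d − |d|²/(2µ)], which is how a maximizing disturbance player appears in the transformed equation.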
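S-DG (ε > 0, µ > 0)
$$\frac{\partial W^{\mu,\varepsilon}}{\partial t} + \frac{\varepsilon}{2}\Delta W^{\mu,\varepsilon} + \min_{u\in U}\max_{d\in D}\Big[DW^{\mu,\varepsilon}\cdot(b(x,u) + d) + L(x,u) - \frac{1}{2\mu}|d|^2\Big] = 0 \ \text{ in } \mathbb{R}^n \times (0,T)$$
$$W^{\mu,\varepsilon}(x,T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$
Note:
$$\min_{u\in U}\max_{d\in D}\Big[p\cdot(b(x,u) + d) + L(x,u) - \frac{1}{2\mu}|d|^2\Big] = \min_{u\in U}\big[p\cdot b(x,u) + L(x,u)\big] + \frac{\mu}{2}|p|^2$$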
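The identity in the note rests on the inner maximization over d, which is a concave quadratic with maximizer d = µp. A small numerical check for the scalar case is sketched below; the function names and the values of `p` and `mu` are illustrative assumptions.

```python
def inner_max_closed_form(p, mu):
    """max over d of p*d - d^2/(2*mu), evaluated at the maximizer d* = mu*p."""
    d_star = mu * p
    return p * d_star - d_star ** 2 / (2.0 * mu)


def inner_max_grid(p, mu, d_max=10.0, n=200001):
    """Brute-force the same maximum over a uniform grid of d in [-d_max, d_max]."""
    best = float("-inf")
    for i in range(n):
        d = -d_max + 2.0 * d_max * i / (n - 1)
        best = max(best, p * d - d * d / (2.0 * mu))
    return best


p, mu = 1.5, 0.8  # hypothetical values
closed = inner_max_closed_form(p, mu)  # should equal mu * p**2 / 2
grid = inner_max_grid(p, mu)
```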
Representation theorem
Viscosity formulation of stochastic differential games developed by
• Fleming–Souganidis (1989).
Theorem (James, 1992; Fleming-McEneaney 1992) The function
$W^{\mu,\varepsilon}$ defined by the logarithmic transformation is the optimal cost
function of the stochastic differential game defined above.
i.e. RS ≡ S-DG
Proof:
• Methods of Fleming (1971) imply
$$|DW^{\mu,\varepsilon}(x,t)| \le C^* \quad \text{for all } (x,t) \in \mathbb{R}^n \times [0,T].$$
• In view of this bound, the set of disturbance values D can be
taken as bounded.
• The result now follows from Fleming–Souganidis (1989). □
Limit PDEs
Obtained as ε ↓ 0 and/or µ ↓ 0.
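D-DG (ε ↓ 0, µ > 0)
$$\frac{\partial W^\mu}{\partial t} + \min_{u\in U}\max_{d\in D}\Big[DW^\mu\cdot(b(x,u) + d) + L(x,u) - \frac{1}{2\mu}|d|^2\Big] = 0 \ \text{ in } \mathbb{R}^n \times (0,T)$$
$$W^\mu(x,T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$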
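S-H2 (ε > 0, µ ↓ 0)
$$\frac{\partial W^\varepsilon}{\partial t} + \frac{\varepsilon}{2}\Delta W^\varepsilon + \min_{u\in U}\big[DW^\varepsilon\cdot b(x,u) + L(x,u)\big] = 0 \ \text{ in } \mathbb{R}^n \times (0,T)$$
$$W^\varepsilon(x,T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$
D-H2 (ε ↓ 0, µ ↓ 0)
$$\frac{\partial W}{\partial t} + \min_{u\in U}\big[DW\cdot b(x,u) + L(x,u)\big] = 0 \ \text{ in } \mathbb{R}^n \times (0,T)$$
$$W(x,T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$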
Limit theorems
Limit theorems depend on general viscosity limit techniques
developed by
• Evans–Ishii (1984)
• Barles–Perthame (1987)
Theorem (James, 1992) We have
$$\lim_{\varepsilon\downarrow 0} W^{\mu,\varepsilon}(x,t) = W^\mu(x,t)$$
uniformly on compact subsets, where $W^\mu \in C(\mathbb{R}^n \times [0,T])$ is the unique
bounded continuous viscosity solution of the corresponding PDE and is the
optimal cost function of the deterministic differential game defined above.
i.e. S-DG → D-DG as ε ↓ 0.
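Proof:
• Assumptions imply
$$0 \le W^{\mu,\varepsilon}(x,t) \le C \quad \text{for all } (x,t) \in \mathbb{R}^n \times [0,T], \ \mu, \varepsilon > 0$$
• The function
$$\overline v(x,t) = \limsup_{\varepsilon\downarrow 0,\ y\to x,\ s\to t} W^{\mu,\varepsilon}(y,s)$$
is u.s.c. and a viscosity subsolution.
• The function
$$\underline v(x,t) = \liminf_{\varepsilon\downarrow 0,\ y\to x,\ s\to t} W^{\mu,\varepsilon}(y,s)$$
is l.s.c. and a viscosity supersolution.
• By construction
$$\underline v \le \overline v$$
• By the comparison theorem
$$\overline v \le \underline v$$
• Therefore
$$\lim_{\varepsilon\downarrow 0} W^{\mu,\varepsilon}(x,t) = \overline v = \underline v \triangleq v$$
• From Evans–Souganidis (1984) the value $W^\mu$ of the D-DG is
the unique viscosity solution, and hence
$$W^\mu = v. \qquad \Box$$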
Expansion of optimal cost for S-DG
$$W^{a\varepsilon,\,b\varepsilon}(x,t) = W(x,t) + \varepsilon\,\big(W_g(x,t),\ W_n(x,t)\big)\cdot\begin{pmatrix} a \\ b \end{pmatrix} + \dots$$
[methods of Fleming, 1970's]
Robust interpretation
(Boel, Dupuis, James, Petersen 1998, 2000)
We wish to interpret the finiteness of the risk-sensitive criterion for a
nominal system G0 in terms of a performance bound for a perturbed
system G.
[Block diagram: nominal system G0 with input w and output Z]
Nominal system dynamics:
$$G_0: \quad dX_t = b(X_t)\,dt + \varepsilon^{1/2}\sigma(X_t)\,dw_t, \qquad Z_t = h(X_t)$$
Risk-sensitive criterion:
$$S(x,T) \doteq E_x\Big[\exp\frac{1}{2\gamma^2\varepsilon}\int_0^T |Z_t|^2\,dt\Big]$$
γ² = 1/µ
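Key tool is the representation [Dupuis-Ellis 1997]
$$\gamma^2\varepsilon\log S(x,T) = \sup_{v\in\mathcal{V}_T} E_x\Big[\frac12\int_0^T \big(|Z_t|^2 - \gamma^2|v_t|^2\big)\,dt\Big]$$
where $X_t$ is given by perturbed system dynamics:
$$G: \quad dX_t = b(X_t)\,dt + \sigma(X_t)v_t\,dt + \varepsilon^{1/2}\sigma(X_t)\,dw_t, \qquad Z_t = h(X_t)$$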
[Block diagram: perturbed system G with input w and output Z, connected in a loop with a perturbation block (signals X, v)]
robustness inequality
$$E_x\Big[\frac12\int_0^T |Z_t|^2\,dt\Big] \le \gamma^2 E_x\Big[\frac12\int_0^T |v_t|^2\,dt\Big] + \gamma^2\varepsilon\log S(x,T)$$
(analogous to H∞ inequality)
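The robustness inequality is a direct rearrangement of the representation formula. As a sketch of the step, assuming the expectations involved are finite: since the right-hand side of the representation is a supremum over all admissible perturbations v, each individual v satisfies

```latex
E_x\Big[\frac12\int_0^T \big(|Z_t|^2 - \gamma^2 |v_t|^2\big)\,dt\Big]
\;\le\;
\sup_{v'\in\mathcal{V}_T} E_x\Big[\frac12\int_0^T \big(|Z_t|^2 - \gamma^2 |v'_t|^2\big)\,dt\Big]
\;=\; \gamma^2\varepsilon \log S(x,T),
```

and moving the term in |v_t|² to the right-hand side gives the stated bound. Finiteness of the risk-sensitive criterion for the nominal system therefore bounds the output energy of every perturbed system in terms of the perturbation energy.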
Partially observed (output feedback) case
(James-Baras-Elliott, 1994)
Dynamics
$$dx_t = b(x_t,u_t)\,dt + \sqrt{\varepsilon}\,dw_t$$
$$dy_t = h(x_t)\,dt + \sqrt{\varepsilon}\,dv_t$$
Objective (γ² = 1/µ)
$$J(u) = E\Big[\exp\frac{\mu}{\varepsilon}\Big(\int_0^T L(x_t,u_t)\,dt + \Phi(x_T)\Big)\Big]$$
solution of RS
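Information state $\sigma^{\mu,\varepsilon}_t(x)$ (not a conditional probability):
$$\langle \sigma^{\mu,\varepsilon}_t, \eta \rangle = E^0\Big[\eta(x^\varepsilon_t)\,Z^\varepsilon_t \exp\Big(\frac{\mu}{\varepsilon}\int_0^t L(x^\varepsilon_s,u_s)\,ds\Big) \,\Big|\, \mathcal{Y}_t\Big] \quad \text{for all } \eta$$
Dynamics (stochastic PDE):
$$d\sigma^{\mu,\varepsilon}_t = \Big(A^{u_t\,*} + \frac{\mu}{\varepsilon}L^{u_t}\Big)\sigma^{\mu,\varepsilon}_t\,dt + \frac{1}{\varepsilon}h\,\sigma^{\mu,\varepsilon}_t\,dy^\varepsilon_t, \qquad \sigma^{\mu,\varepsilon}_0 = \rho \ \text{ in } L^1(\mathbb{R}^n)$$
Representation:
$$J(u) = E^0\Big[\Big\langle \sigma^{\mu,\varepsilon}_T, \exp\frac{\mu}{\varepsilon}\Phi \Big\rangle\Big]$$
Linear case: Bensoussan-van Schuppen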
Dynamic programming
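Value function
$$S^{\mu,\varepsilon}(\sigma,t) = \inf_{u\in\mathcal{U}_{t,T}} E^0_{\sigma,t}\Big[\big\langle \sigma^{\mu,\varepsilon}_T, e^{\frac{\mu}{\varepsilon}\Phi} \big\rangle\Big]$$
Dynamic programming equation (infinite-dimensional, nonlinear, second-order)
$$\frac{\partial}{\partial t}S^{\mu,\varepsilon} + \frac{1}{2\varepsilon}D^2S^{\mu,\varepsilon}(h\sigma, h\sigma) + \inf_{u\in U}\Big\{DS^{\mu,\varepsilon}\cdot\Big(A^{u\,*}\sigma + \frac{\mu}{\varepsilon}L^u\sigma\Big)\Big\} = 0$$
$$S^{\mu,\varepsilon}(\sigma,T) = \big\langle \sigma, e^{\frac{\mu}{\varepsilon}\Phi} \big\rangle$$
Optimal controller
$u^*_t(\sigma)$ achieves the minimum in the dynamic programming equation (information state feedback)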
Small Noise limit
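Duality
$$\langle \sigma, \nu \rangle \triangleq \int_{\mathbb{R}^n} \sigma(x)\nu(x)\,dx$$
max-plus
$$(p, q) \triangleq \sup_{x\in\mathbb{R}^n}\{\,p(x) + q(x)\,\}$$
large deviations
$$\lim_{\varepsilon\to 0}\frac{\varepsilon}{\mu}\log\big\langle e^{\frac{\mu}{\varepsilon}p},\, e^{\frac{\mu}{\varepsilon}q} \big\rangle = (p, q)$$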
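The large deviations limit above can be observed numerically in one dimension: the scaled logarithm of the integral pairing approaches the max-plus pairing as ε decreases. The sketch below assumes hypothetical quadratic functions p and q on a truncated grid and approximates the integral by a Riemann sum; the function name `scaled_log_pairing` is illustrative.

```python
import math


def scaled_log_pairing(p_vals, q_vals, dx, mu, eps):
    """(eps/mu) * log of the Riemann sum of exp((mu/eps)*(p+q)) * dx, via log-sum-exp."""
    k = mu / eps
    s_vals = [pv + qv for pv, qv in zip(p_vals, q_vals)]
    m = max(s_vals)
    # Factor out exp(k*m) so the exponentials stay in floating-point range
    total = sum(math.exp(k * (s - m)) for s in s_vals) * dx
    return m + math.log(total) / k


# Grid on [-3, 3]; p and q are hypothetical concave quadratics
xs = [-3.0 + 6.0 * i / 6000 for i in range(6001)]
dx = xs[1] - xs[0]
p_vals = [-(x - 1.0) ** 2 for x in xs]
q_vals = [-0.5 * x ** 2 for x in xs]

max_plus = max(pv + qv for pv, qv in zip(p_vals, q_vals))  # sup of p + q on the grid
approx = {
    eps: scaled_log_pairing(p_vals, q_vals, dx, mu=1.0, eps=eps)
    for eps in (1.0, 0.1, 0.01)
}
errors = {eps: abs(val - max_plus) for eps, val in approx.items()}
```

The gap between the scaled logarithm and the supremum shrinks as ε decreases, which is the mechanism behind the information state and value function limits.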
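Information state limit
$$\lim_{\varepsilon\to 0}\frac{\varepsilon}{\mu}\log\sigma^{\mu,\varepsilon}_t(x) = p^\mu_t(x)$$
Value function limit
$$\lim_{\varepsilon\to 0}\frac{\varepsilon}{\mu}\log S^{\mu,\varepsilon}\big(e^{\frac{\mu}{\varepsilon}p}, t\big) = W^\mu(p, t)$$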
Some consequences
• Can solve output feedback deterministic dynamic games (D-DG)
using analogous, though deterministic, information state methods
(cf. max-plus).
• Output feedback nonlinear H∞ control problem can be solved
using the D-DG methods.
Risk-sensitive control of quantum systems
(James, 2004 - current)
[Diagram: system (atom) coupled to a bath (EM field); labels X, H, Γ, dB†, dB]
System: e.g. atom, can be controlled, Hilbert space H
Bath: e.g. electromagnetic field, continuously monitored, Hilbert space Γ (Fock space)
Dynamics on H ⊗ Γ: [Hudson-Parthasarathy quantum stochastic DE, 1980’s]
$$dU(t) = \big\{-K(u(t))\,dt + L\,dB^\dagger(t) - L^\dagger dB(t)\big\}\,U(t)$$
with initial condition U(0) = I, where
$$K(u) = \frac{i}{\hbar}H(u) + \frac12 L^\dagger L$$
System operators evolve according to
$$X(t) = j_t(u, X) = U^\dagger(t)\,(X \otimes I)\,U(t)$$
i.e.
$$dX(t) + \big(X(t)K(t) + K^\dagger(t)X(t) - L^\dagger(t)X(t)L(t)\big)\,dt = [X(t), L(t)]\,dB^\dagger(t) - [X(t), L^\dagger(t)]\,dB(t)$$
where
$$L(t) = j_t(u, L), \qquad K(t) = j_t(u, K(u(t)))$$
Measurements of real quadrature of field: $Q_t = B_t + B_t^\dagger$
$$dY(t) = (L(t) + L^\dagger(t))\,dt + dQ(t)$$
(dY: output field; dQ: input field)
Risk-sensitive cost: [James, 2004]
$$J^\mu = \big\langle \pi_0 \otimes vv^\dagger,\ R^\dagger(T)\,e^{\mu C_2(T)}\,R(T) \big\rangle$$
where [⟨A, B⟩ = tr(A†B)]
$$\frac{dR(t)}{dt} = \frac{\mu}{2}C_1(t)R(t), \qquad C_1(t) = j_t(u, C_1(u(t))), \qquad C_2(t) = j_t(u, C_2)$$
π0 = initial system state
vv† = field vacuum state
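Stochastic representation:
$$J^\mu = E^0\big[\langle \sigma^\mu_T, e^{\mu C_2} \rangle\big]$$
where $P^0$ is standard Wiener measure (with expectation $E^0$), and the information state (an
unnormalized density operator) satisfies
$$d\sigma^\mu_t + \big[K\sigma^\mu_t + \sigma^\mu_t K^\dagger - L\sigma^\mu_t L^\dagger\big]\,dt = \frac{\mu}{2}\big[C_1\sigma^\mu_t + \sigma^\mu_t C_1\big]\,dt + \big[L\sigma^\mu_t + \sigma^\mu_t L^\dagger\big]\,dy(t)$$
[modified stochastic master equation]
$$\pi^\mu_t = \frac{\sigma^\mu_t}{\langle \sigma^\mu_t, 1 \rangle}$$
[unnormalized state, new to physics]
When µ → 0 recover the Belavkin quantum filtering equation
(stochastic master equation) (1980’s) [unnormalized state σt,
normalized state πt = σt/〈σt, 1〉].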
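The dynamic programming equation is, formally,
$$\frac{\partial}{\partial t}S^\mu(\sigma,t) + \inf_{u\in U}\mathcal{L}^{\mu;u}S^\mu(\sigma,t) = 0, \quad 0 \le t < T$$
$$S^\mu(\sigma,T) = \langle \sigma, e^{\mu C_2} \rangle$$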
Some related current work
• Discrete time (James 2004)
• Robustness interpretation, discrete time (James-Petersen 2004)
• Robustness, linear/quadratic case (Doherty, d’Helon, James,
Petersen, Wilson)
References
[1] T. Başar and P. Bernhard. H∞-Optimal Control and Related Minimax Design Problems. Birkhäuser,
Boston, 1995.
[2] A. Bensoussan and J. H. van Schuppen. Optimal control of partially observable stochastic
systems with an exponential-of-integral performance index. SIAM Journal on Control and
Optimization, 23:599–613, 1985.
[3] P. Dupuis and R.S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. Wiley, New
York, 1997.
[4] W.H. Fleming and W.M. McEneaney. Risk-sensitive control on an infinite time horizon. SIAM J.
Control and Optimization, 33:1881–1915, 1995.
[5] K. Glover and J. Doyle. State space formulae for all stabilizing controllers that satisfy an H∞-norm
bound and relations to risk-sensitivity. Systems and Control Letters, 11:167–172, 1988.
[6] J.W. Helton and M.R. James. Extending H∞ Control to Nonlinear Systems: Control of Nonlinear
Systems to Achieve Performance Objectives, volume 1 of Advances in Design and Control. SIAM,
Philadelphia, 1999.
[7] D.H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and
their relation to deterministic differential games. IEEE Trans. Aut. Control, 18(2):124–131, 1973.
[8] M. R. James, J.S. Baras, and R.J. Elliott. Risk-sensitive control and dynamic games for
partially observed discrete-time nonlinear systems. IEEE Trans. Automatic Control, 39(4):780–792,
1994.
[9] M.R. James. Asymptotic analysis of nonlinear stochastic risk-sensitive control and differential
games. Mathematics of Control, Signals and Systems, 5(4):401–417, 1992.
[10] M.R. James. Risk-sensitive optimal control of quantum systems. Phys. Rev. A, 69:032108, 2004.
[11] P. Dai Pra, L. Meneghini, and W.J. Runggaldier. Connections between stochastic control and
dynamic games. Math. Control, Signals and Systems, 9(4):303–326, 1996.
[12] A.J. van der Schaft. L2-Gain and Passivity Techniques in Nonlinear Control. Springer Verlag, New
York, 1996.
[13] P. Whittle. Risk-sensitive linear quadratic Gaussian control. Advances in Applied Probability,
13:764–777, 1981.
[14] G. Zames. On the input-output stability of time-varying nonlinear feedback systems. part i:
Conditions derived using concepts of loop gain, conicity, and positivity. part ii: Conditions
involving circles in the frequency plane and sector nonlinearities. IEEE Trans. Aut. Control,
11:228–238, 465–476, 1966.
[15] G. Zames. Feedback and optimal sensitivity: Model reference transformation, multiplicative
seminorms and approximate inverses. IEEE Trans. Aut. Control, 26:301–320, 1981.
[16] C.D. Charalambous and J.L. Hibey, "Minimum principle for partially observable nonlinear
risk-sensitive control problems using measure-valued decompositions," Stochastics and
Stochastics Reports, vol. 57, no. 3+4, pp. 247-288, August 1996.
[17] W.H. Fleming and M.R. James, The Risk-Sensitive Index and the H2 and H∞ Norms for Nonlinear Systems,
Mathematics of Control, Signals, and Systems, 8(3), 1995, 199-221.
[18] H. Nagai, Bellman Equation of Risk-Sensitive Control, SIAM J. Control Optim., 34(1), 1996, 74-101.
[19] M.G. Crandall, L.C. Evans and P.L. Lions, Some Properties of Viscosity Solutions of Hamilton–Jacobi
Equations, Trans. AMS, 282 (2) (1984) 487–502.
[20] M.G. Crandall, H. Ishii and P.L. Lions, User’s Guide to Viscosity Solutions of Second Order Partial Differential
Equations, CEREMADE Report No. 9039, (1990).
[21] P. Dupuis, M.R. James, and I.R. Petersen, Robust Properties of Risk-Sensitive Control, Math. Control,
Signals and Systems, 13, 318-332, 2000.
[22] M. Bardi and I. Capuzzo-Dolcetta, “Optimal Control and Viscosity Solutions of
Hamilton-Jacobi-Bellman Equations”, Birkhäuser, Boston, 1997.
[23] W.H. Fleming and H.M. Soner, “Controlled Markov Processes and Viscosity Solutions”,
Springer, New York, 1993.