Introduction · Maximum principle · Extensions to stochastic games · Applications · Conclusions and more extensions
Maximum principle for one kind of discrete-time stochastic optimal control problem and its applications
Zhen WU
School of Mathematics, Shandong University
Joint work with Dr. Feng ZHANG (SDUFE)
Outline
1 Introduction
2 Maximum principle: Problem formulation and preliminaries; Maximum principle
3 Extensions to stochastic games: MP for discrete-time nonzero-sum game; MP for discrete-time zero-sum game
4 Applications: Application to one investment/consumption choice problem; Application to one kind of discrete-time LQ problem
5 Conclusions and more extensions
Stochastic MP
The maximum principle (MP) is a milestone of optimal control theory.
For stochastic MP, see e.g. J.M. Bismut (1978), A. Bensoussan (1981), U.G. Haussmann (1986), S.G. Peng (1990), J.M. Yong (2010) and Z. Wu (2013).
Discrete-time stochastic optimal control problems
Discrete-time optimal control problems are as important as continuous-time ones.
Difference equations are important mathematical models in their own right. In many situations it is sufficient, or natural, to describe a system by a discrete-time model:
- signal values are sometimes only available at certain times;
- discretization of the dynamics of continuous-time problems;
- computer manipulation.
Difficulties in studying discrete-time stochastic optimal control problems:
- differentiation and integration operations no longer work;
- well-known tools of Ito's calculus do not necessarily hold (Ito's formula, the BDG inequality, etc.).
Works for discrete-time stochastic optimal controls
To turn the original problem into a deterministic one and then use a matrix version of MP: A. Beghi and D. D'Alessandro (J. Optim. Theory Appl., 1998); M. Ait Rami, X. Chen and X.Y. Zhou (J. Global Optim., 2002).
To use the completion of squares for LQ problems: J.B. Moore, X.Y. Zhou and A.E.B. Lim (Syst. Control Lett., 1999); R.J. Elliott, X. Li and Y.H. Ni (Automatica, 2013).
See also Oswaldo L.V. Costa, D. Li, J.F. Zhang, H.S. Zhang, W.H. Zhang, ...
Comments on one related work

X.Y. Lin and W.H. Zhang, A maximum principle for optimal control of discrete-time stochastic systems with multiplicative noise, IEEE Transactions on Automatic Control, 60, 1121-1126, 2015.

Main contributions:
- necessary as well as sufficient conditions are established for one kind of discrete-time stochastic optimal control problem;
- one kind of new backward stochastic difference equation is introduced as the adjoint equation.

Some deficiencies:
- the assumptions are strict, excluding quadratic cost functionals;
- the convex perturbation is not appropriately defined;
- the necessary condition does not necessarily hold for general convex control domains;
- the square-integrability of the solution to the adjoint equation is not guaranteed.
Results and contributions of this talk
This talk considers one kind of discrete-time stochastic optimal control problem with convex control domains.
Main results:
- necessary MP;
- sufficient MP (verification theorem);
- extensions and applications.
Main contributions:
- a rigorous version of the discrete-time stochastic MP;
- a clear and concise proof;
- a road to further related topics.
Problem formulation and preliminaries · Maximum principle
Notations
$(\Omega, \mathcal{F}, P)$: a probability space.
For fixed $N \in \mathbb{N}$, $\mathcal{T} := \{0, 1, \dots, N-1\}$, $\bar{\mathcal{T}} := \{1, 2, \dots, N\}$.
Noises: $W_k = (W_k^1, \dots, W_k^d)^\top \in \mathbb{R}^d$, $k \in \mathcal{T}$.
Filtration $\mathcal{F}_0, \mathcal{F}_1, \dots, \mathcal{F}_N \subset \mathcal{F}$:
\[ \mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_k = \sigma\{W_0, W_1, \dots, W_{k-1}\}, \quad k \in \bar{\mathcal{T}}. \]
Noises and admissible controls
Assume that, for some $\alpha > 2$,
\[ E[|W_k|^\alpha \mid \mathcal{F}_k] = C(\alpha) < \infty, \quad \forall k \in \mathcal{T}. \tag{1} \]
Let $U_0, \dots, U_{N-1}$ be nonempty convex subsets of $\mathbb{R}^m$. $u = \{u_0, \dots, u_{N-1}\}$ is called an admissible control if
- $u_k$ is a $U_k$-valued, $\mathcal{F}_k$-measurable random variable for each $k \in \mathcal{T}$;
- for some $\beta > 2$,
\[ E\sum_{k=0}^{N-1} |u_k|^\beta < \infty. \tag{2} \]
Denote by $\mathcal{U}$ the set of admissible controls.
Problem formulation
The discrete-time stochastic optimal control problem, Problem (OCP), is to find $u^* = \{u_0^*, \dots, u_{N-1}^*\} \in \mathcal{U}$ such that
\[ J(u^*) = \inf_{u \in \mathcal{U}} J(u), \]
where the state $X$ is defined by
\[ X_{k+1} = b(k, X_k, u_k) + \sigma(k, X_k, u_k) W_k, \quad k \in \mathcal{T}, \qquad X_0 = x_0 \in \mathbb{R}^n, \tag{3} \]
and the cost $J$ is given by
\[ J(u) = E\Big[ \sum_{k=0}^{N-1} l(k, X_k, u_k) + h(X_N) \Big]. \]
Such a $u^*$ is called an optimal control; the corresponding system state is denoted by $X^* = \{X_k^*, k \in \bar{\mathcal{T}}\}$.
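The formulation above can be sketched numerically. The following is a minimal Monte Carlo sketch of the state recursion (3) and the cost $J(u)$, with made-up scalar coefficients $b, \sigma, l, h$ and Gaussian noises (all hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_paths = 5, 20000

# Illustrative (hypothetical) scalar coefficients of Problem (OCP).
b = lambda k, x, u: 0.9 * x + 0.5 * u
sigma = lambda k, x, u: 0.2 * x + 0.3 * u
l = lambda k, x, u: 0.5 * (x**2 + u**2)
h = lambda x: 0.5 * x**2

def cost(u):
    """Monte Carlo estimate of J(u) for a deterministic control sequence u[0..N-1]."""
    X = np.full(n_paths, 1.0)          # X_0 = x_0 = 1
    J = np.zeros(n_paths)
    for k in range(N):
        W = rng.standard_normal(n_paths)           # i.i.d. noises W_k
        J += l(k, X, u[k])
        X = b(k, X, u[k]) + sigma(k, X, u[k]) * W  # state recursion (3)
    return (J + h(X)).mean()

J0 = cost(np.zeros(N))
print(J0 > 0)  # True: this quadratic cost of the zero control is positive
```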
Assumptions
The coefficients are defined by $b(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}^n$, $\sigma(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}^{n \times d}$, $l(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}$ and $h(\cdot,\cdot): \mathbb{R}^n \times \Omega \to \mathbb{R}$ for $k \in \mathcal{T}$.
(H1) $b(k, x, v)$ and $\sigma(k, x, v)$ are $\mathcal{F}_k$-measurable for any $(k, x, v)$, and continuously differentiable in $(x, v)$ with bounded derivatives. In addition, $|b(k, 0, 0)| + |\sigma(k, 0, 0)| \le C$ for some $C > 0$.
(H2) $l(k, x, v)$ is $\mathcal{F}_k$-measurable and $h(x)$ is $\mathcal{F}_N$-measurable for any $(k, x, v)$; $(l, h)$ are continuously differentiable in $(x, v)$, and the derivatives are bounded by $C(1 + |x| + |v|)$. Besides, $E\big[\sum_{k=0}^{N-1} |l(k, 0, 0)| + |h(0)|\big] < \infty$.
Solvability of the state equation
Set $\gamma = \min\{\alpha, \beta\}\,(> 2)$.
Lemma 1. Under (H1), Eq. (3) admits a unique solution $X = \{X_k, k \in \bar{\mathcal{T}}\}$ such that $X_k \in L^\gamma(\Omega, \mathcal{F}_k, P; \mathbb{R}^n)$ for each $k$. Moreover, there exists $C = C(\alpha, \gamma, N) > 0$ satisfying
\[ E\sum_{k=1}^{N} |X_k|^\gamma \le C E\Big[ 1 + |x_0|^\gamma + \sum_{k=0}^{N-1} |u_k|^\gamma \Big]. \tag{4} \]
Proof. The existence of $X$ is obvious; we only need to check (4). In fact, by (H1),
\[ E|X_{k+1}|^\gamma \le C(\gamma) E\Big\{ \big[1 + |X_k|^\gamma + |u_k|^\gamma\big] + \big[1 + |X_k|^\gamma + |u_k|^\gamma\big] |W_k|^\gamma \Big\}, \quad k \in \mathcal{T}. \]
By (1), Hölder's inequality yields $E[|W_k|^\gamma \mid \mathcal{F}_k] \le C(\alpha, \gamma) =: C_1$. Then
\[ E\big[|X_k|^\gamma |W_k|^\gamma\big] = E\big\{ |X_k|^\gamma E[|W_k|^\gamma \mid \mathcal{F}_k] \big\} \le C_1 E|X_k|^\gamma. \]
Similarly, $E[|u_k|^\gamma |W_k|^\gamma] \le C_1 E|u_k|^\gamma$. So there exists $C_2 := C_2(\alpha, \gamma)$ such that
\[ E|X_{k+1}|^\gamma \le C_2 E\{1 + |X_k|^\gamma + |u_k|^\gamma\}, \quad k \in \mathcal{T}. \]
Then, by induction, we get for $k \in \bar{\mathcal{T}}$
\[ E|X_k|^\gamma \le \sum_{j=1}^{k} (C_2)^j + (C_2)^k |x_0|^\gamma + \sum_{j=0}^{k-1} (C_2)^{k-j} E|u_j|^\gamma. \]
Thus we can draw the conclusion.
Remark. The conditions (1) and (2) are not sufficient to get the $L^r$-estimate of $X_k$ ($k \ge 2$) for $r > \gamma$. In fact, if $\xi \in L^p$ and $\eta \in L^q$, then in general we can only get $\xi\eta \in L^r$ for $r \le pq/(p+q)\,(< \min\{p, q\})$.
Convex perturbation
Let us define the convex perturbation. For any $\mu = \{\mu_0, \dots, \mu_{N-1}\} \in \mathcal{U}$, define
\[ \nu_k = \mu_k - u_k^*, \quad \nu = \{\nu_0, \dots, \nu_{N-1}\}, \qquad u_k^\varepsilon = u_k^* + \varepsilon \nu_k, \quad u^\varepsilon = \{u_0^\varepsilon, \dots, u_{N-1}^\varepsilon\}. \]
Then $u^\varepsilon \in \mathcal{U}$.
Similar to (4), it is easy to check that
\[ E\sum_{k=0}^{N} |X_k^\varepsilon - X_k^*|^2 \le C \varepsilon^2 E\sum_{k=0}^{N-1} |\nu_k|^2, \]
where $C$ is a positive constant independent of $\varepsilon$.
The variational equation
Let us introduce the variational equation
\[ Z_{k+1} = \big[ b_x^\top(k) Z_k + b_v^\top(k) \nu_k \big] + \sum_{i=1}^{d} \big[ (\sigma_x^i(k))^\top Z_k + (\sigma_v^i(k))^\top \nu_k \big] W_k^i, \quad k \in \mathcal{T}, \qquad Z_0 = 0, \tag{5} \]
where $\sigma = (\sigma^1, \dots, \sigma^d)$.
Eq. (5) admits a unique solution $Z = \{Z_k, k \in \bar{\mathcal{T}}\}$ such that $Z_k \in L^2(\Omega, \mathcal{F}_k, P; \mathbb{R}^n)$ for each $k$. Moreover, there exists $C > 0$ such that
\[ E\sum_{k=1}^{N} |Z_k|^2 \le C E\sum_{k=0}^{N-1} |\nu_k|^2. \]
As a linear stochastic difference equation, the variational equation (5) can be explicitly solved. In fact, (5) is equivalent to
\[ Z_{k+1} = M_k Z_k + R_k, \quad k \in \mathcal{T}; \qquad Z_0 = 0, \]
where
\[ M_k = b_x^\top(k) + \sum_{i=1}^{d} (\sigma_x^i(k))^\top W_k^i, \qquad R_k = b_v^\top(k) \nu_k + \sum_{i=1}^{d} (\sigma_v^i(k))^\top W_k^i \nu_k. \]
Then induction yields
\[ Z_k = \sum_{i=0}^{k-1} M_i^{k-1} R_i, \tag{6} \]
where $M_i^j = M_j M_{j-1} \cdots M_{i+1}$ when $i < j$, and $M_i^j = I_n$ when $i = j$.
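The product formula (6) can be checked directly against the recursion. Below is a small sketch with hypothetical matrix data standing in for $M_k$ and $R_k$ (any fixed values work, since (6) is a pathwise identity):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 6, 3

# Hypothetical data for the linear recursion Z_{k+1} = M_k Z_k + R_k, Z_0 = 0.
M = rng.standard_normal((N, n, n))
R = rng.standard_normal((N, n))

# Direct recursion up to Z_N.
Z = np.zeros(n)
for k in range(N):
    Z = M[k] @ Z + R[k]

def M_prod(i, j):
    """M_i^j = M_j M_{j-1} ... M_{i+1}; the identity when i == j."""
    P = np.eye(n)
    for m in range(j, i, -1):
        P = P @ M[m]
    return P

# Closed form (6): Z_N = sum_{i=0}^{N-1} M_i^{N-1} R_i.
Z_formula = sum(M_prod(i, N - 1) @ R[i] for i in range(N))
print(np.allclose(Z, Z_formula))  # True
```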
The variational inequality
Lemma 2. Under (H1) and (H2), it holds that
\[ LJ(u^*) := E\Big\{ \sum_{k=0}^{N-1} \big[ \langle l_x(k), Z_k \rangle + \langle l_v(k), \nu_k \rangle \big] + \langle h_x(X_N^*), Z_N \rangle \Big\} \ge 0. \]
The proof is similar to that of the continuous-time counterpart.
Define the Hamiltonian $H$ by
\[ H(k, x, v, p, q) = \langle b(k, x, v), p \rangle + \sum_{i=1}^{d} \langle \sigma^i(k, x, v), q^i \rangle + l(k, x, v) \]
with $q = (q^1, \dots, q^d)$. Note that $\sum_{i=1}^{d} \langle \sigma^i, q^i \rangle = \mathrm{tr}\big[\sigma^\top q\big]$.
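The trace identity used above is elementary linear algebra; a one-line numerical sketch (with arbitrary hypothetical matrices) confirms it:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3

# sigma and q are n x d matrices whose columns are sigma^i and q^i.
sigma = rng.standard_normal((n, d))
q = rng.standard_normal((n, d))

lhs = sum(sigma[:, i] @ q[:, i] for i in range(d))  # sum_i <sigma^i, q^i>
rhs = np.trace(sigma.T @ q)                         # tr[sigma^T q]
print(np.isclose(lhs, rhs))  # True
```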
The adjoint equation
Introduce the following adjoint equation:
\[
\begin{cases}
P_k = E\big[ H_x(k+1, X_{k+1}^*, u_{k+1}^*, P_{k+1}, Q_{k+1}) \mid \mathcal{F}_k \big], \\
Q_k = E\big[ H_x(k+1, X_{k+1}^*, u_{k+1}^*, P_{k+1}, Q_{k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[h_x(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E[h_x(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}].
\end{cases} \tag{7}
\]
This adjoint equation, introduced first by Lin and Zhang (IEEE TAC, 2015), is a kind of backward stochastic difference equation, which is quite different from the continuous-time counterpart.
Note that Cohen and Elliott (SPA, 2010; SICON, 2011) also studied one kind of backward stochastic difference equation and obtained an existence and uniqueness result. However, our adjoint equation is different from the equations studied in those two references. On the one hand, they have different forms. On the other hand, our adjoint equation arises naturally as the dual equation of the variational equation, while, in our opinion, the equations studied in those two references are not appropriate to serve as adjoint equations for Problem (OCP).
The main concern with the adjoint equation (7) is the integrability of its solution $\{(P_k, Q_k), k \in \mathcal{T}\}$.
For later use we need these integrability conditions:
\[
\begin{cases}
E[|P_k| + |Q_k|] < \infty, & k \in \mathcal{T}, \\
E\big[(|P_{k+1}| + |Q_{k+1}|)|W_k|\big] < \infty, & k = 0, \dots, N-2, \\
E\big[(|P_k| + |Q_k|)|X_k| + (|P_k| + |Q_k|)|u_k|\big] < \infty, & k \in \mathcal{T}.
\end{cases} \tag{8}
\]
Let us point out that the first two integrability conditions in (8) come from the adjoint equation itself, while the third is needed in the proofs of the necessary as well as the sufficient MP.
The classical square-integrability conditions on the noises and admissible controls are insufficient for our purpose.
- If the $W_k$ are only square-integrable ($\alpha = 2$), then $X_k$, $1 \le k \le N$, are at most square-integrable even when the $u_k$ are bounded. In this case $(P_k, Q_k)$, $1 \le k \le N-2$, are not square-integrable, so $E[(|P_{k+1}| + |Q_{k+1}|)|W_k|]$ and $E[(|P_k| + |Q_k|)|X_k|]$ are not well defined. Hence a stronger integrability condition on $W_k$ is needed.
- If the $u_k$ are only square-integrable ($\beta = 2$) and the $W_k$ are not bounded, then $X_k$, $1 \le k \le N$, are at most square-integrable, and $(P_k, Q_k)$, $1 \le k \le N-2$, are still not square-integrable. Thus $E[(|P_k| + |Q_k|)|X_k|]$ and $E[(|P_k| + |Q_k|)|u_k|]$ are not well defined. Hence a stronger integrability condition on $u_k$ is needed.
Based on the previous discussion, let us assume moreover
(H3) $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$.
Lemma 3. Under (H1)-(H3), the integrability conditions in (8) hold true.
Proof. As a corollary of Hölder's inequality or Young's inequality, the following assertion holds: if $\xi \in L^a$ and $\eta \in L^b$ with $a, b > 0$, then $\xi\eta \in L^{\frac{ab}{a+b}}$, and $\xi\eta \in L^1$ if $\frac{1}{a} + \frac{1}{b} \le 1$, i.e. $\frac{ab}{a+b} \ge 1$.
Denote $\theta_k = \frac{\alpha\gamma}{\alpha + \gamma(N-1-k)}$ for $k \in \mathcal{T}$. Note that $\theta_k$ is increasing in $k$ and $\theta_k \le \gamma$.
Let us use induction to show that
\[ |P_k| \in L^{\theta_k}, \quad |Q_k| \in L^{\theta_{k-1}}, \quad k = 1, \dots, N-1. \tag{9} \]
Since $|h_x(X_N^*)| \le C(1 + |X_N^*|)$, $|X_N^*| \in L^\gamma$ and $|W_{N-1}| \in L^\alpha$, we have $|P_{N-1}| \in L^\gamma$ and $|Q_{N-1}| \in L^{\frac{\alpha\gamma}{\alpha+\gamma}}$. Thus (9) holds for $k = N-1$.
Let us use induction to proceed. Assume
\[ |P_{j+1}| \in L^{\theta_{j+1}}, \quad |Q_{j+1}| \in L^{\theta_j} \tag{10} \]
for some $j = 1, \dots, N-2$. Firstly, note that
\[ |P_j| \le C E\big[ \big(1 + |X_{j+1}^*| + |u_{j+1}^*| + |P_{j+1}| + |Q_{j+1}|\big) \mid \mathcal{F}_j \big]. \]
Since $|X_{j+1}^*| \in L^\gamma$, $|u_{j+1}^*| \in L^\beta$, $|P_{j+1}| \in L^{\theta_{j+1}}$, $|Q_{j+1}| \in L^{\theta_j}$, and $\theta_j < \theta_{j+1} \le \gamma \le \beta$, it follows that $P_j \in L^{\theta_j}$. Secondly, we have
\[ |Q_j| \le C E\big[ \big(1 + |X_{j+1}^*| + |u_{j+1}^*| + |P_{j+1}| + |Q_{j+1}|\big) |W_j| \mid \mathcal{F}_j \big]. \]
Note that $|W_j| \in L^\alpha$, $|X_{j+1}^*||W_j| \in L^{\frac{\alpha\gamma}{\alpha+\gamma}}$, $|u_{j+1}^*||W_j| \in L^{\frac{\alpha\beta}{\alpha+\beta}}$, $|P_{j+1}||W_j| \in L^{\theta_j}$, $|Q_{j+1}||W_j| \in L^{\theta_{j-1}}$. Since $\theta_{j-1} < \theta_j \le \frac{\alpha\gamma}{\alpha+\gamma} \le \frac{\alpha\beta}{\alpha+\beta} < \alpha$, we have $|Q_j| \in L^{\theta_{j-1}}$.
With these preparations, the integrability conditions in (8) hold if
\[ |Q_1||W_0|, \ |Q_1||X_1|, \ |Q_2||X_2| \in L^1. \]
It is easy to check that the above holds true if
\[ \gamma(\alpha - N + 1) \ge 2\alpha. \tag{11} \]
Since $\gamma = \min\{\alpha, \beta\}$, (11) is equivalent to
\[ \alpha > N - 1, \qquad \alpha(\alpha - N + 1) \ge 2\alpha, \qquad \beta(\alpha - N + 1) \ge 2\alpha. \]
Solving it yields
\[ \alpha \ge N + 1, \qquad \beta \ge 2\alpha(\alpha - N + 1)^{-1}. \]
Thus, it is proper and sufficient to choose $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$, which is just (H3).
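The elementary arithmetic behind (H3) can be checked directly: with $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$ one always has $\beta \le \alpha$, so $\gamma = \beta$ and (11) holds with equality. A quick numerical sketch (the value of $N$ is arbitrary):

```python
# Sanity check of (H3): with alpha >= N + 1 and beta = 2*alpha/(alpha - N + 1),
# we get beta <= alpha, hence gamma = beta, and gamma*(alpha - N + 1) = 2*alpha,
# which is condition (11) with equality.
N = 5
for alpha in [N + 1, N + 2, N + 5, 4 * N]:
    beta = 2 * alpha / (alpha - N + 1)
    gamma = min(alpha, beta)
    assert beta <= alpha                                   # so gamma = beta
    assert abs(gamma * (alpha - N + 1) - 2 * alpha) < 1e-9
print("(11) holds with equality under (H3)")
```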
Remark. (i) The proof of Lemma 3 shows why we need, and how we worked out, the assumption (H3). Under the preconditions (H1)-(H2), (H3) is almost the minimal condition that ensures (8).
(ii) Lemma 3 guarantees that the adjoint equation is well defined with a unique solution $\{(P_k, Q_k), k \in \mathcal{T}\}$, and that all the expectations involving $(P_k, Q_k)$, $k \in \mathcal{T}$, in this talk are well defined.
(iii) One would expect that both $\alpha$ and $\beta$ could be small; however, that is not the case. In fact, (H3) implies $2 < \beta \le \alpha$; in addition, the smaller $\beta$ is, the larger $\alpha$ must be.
Lemma 4. Under (H1)-(H3), the variational equation (5) and the adjoint equation (7) admit the following duality:
\[ E\sum_{k=0}^{N-1} \Big\langle b_v(k) P_k + \sum_{j=1}^{d} \sigma_v^j(k) Q_k^j, \ \nu_k \Big\rangle = E\sum_{k=1}^{N} \langle l_x(k), Z_k \rangle, \tag{12} \]
where we use the convention $l_x(N) := h_x(X_N^*)$.
In fact, the adjoint equation (7) can be solved explicitly to get
\[ P_k = \sum_{i=k+1}^{N} E\big[ (M_k^{i-1})^\top l_x(i) \mid \mathcal{F}_k \big], \qquad Q_k = \sum_{i=k+1}^{N} E\big[ (M_k^{i-1})^\top l_x(i) W_k^\top \mid \mathcal{F}_k \big]. \tag{13} \]
Then Lemma 4 can be proved by interchanging the order of summation and making use of the properties of conditional expectations.
Lemma 4 shows the main role played by $(P, Q)$; (13) is the origin of the adjoint equation (7).
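The duality (12) can be checked numerically on a small example. The sketch below uses made-up scalar linear coefficients, Rademacher noises $W_k \in \{-1, +1\}$ (so every conditional expectation is an exact finite average over the $2^N$ equally likely paths), $l(k,x,v) = x^2/2$, $h(x) = x^2/2$, and the convention $l_x(N) = h_x(X_N^*)$. It simulates the state, the variational equation (5), and the adjoint recursion (7), then compares the two sides of (12):

```python
import itertools
import numpy as np

# Hypothetical scalar model: b = bx*x + bv*v, sigma = sx*x + sv*v,
# l(k,x,v) = x^2/2 and h(x) = x^2/2, so l_x(k) = X*_k and h_x(X*_N) = X*_N.
N = 3
bx, bv, sx, sv = 0.9, 0.5, 0.2, 0.3
paths = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))  # all noise paths

def cond_exp(Y, k):
    """Exact E[Y | F_k]: average over the paths sharing the first k noise values."""
    out = np.empty_like(Y)
    for p in range(len(paths)):
        same = (paths[:, :k] == paths[p, :k]).all(axis=1)
        out[p] = Y[same].mean()
    return out

n_paths = len(paths)
X = np.zeros((n_paths, N + 1)); X[:, 0] = 1.0   # the state X*
Z = np.zeros((n_paths, N + 1))                  # variational process, Z_0 = 0
u = np.zeros((n_paths, N))                      # F_k-measurable control u*_k
nu = np.zeros((n_paths, N))                     # F_k-measurable perturbation nu_k
for k in range(N):
    u[:, k] = 0.1 * k + 0.05 * paths[:, :k].sum(axis=1)
    nu[:, k] = 0.2 + 0.1 * paths[:, :k].sum(axis=1)
    w = paths[:, k]
    X[:, k + 1] = bx * X[:, k] + bv * u[:, k] + (sx * X[:, k] + sv * u[:, k]) * w
    Z[:, k + 1] = (bx + sx * w) * Z[:, k] + (bv + sv * w) * nu[:, k]

# Backward adjoint recursion (7); G carries H_x(k+1, ...) resp. h_x(X*_N).
P = np.zeros((n_paths, N)); Q = np.zeros((n_paths, N))
G = X[:, N].copy()                               # h_x(X*_N)
for k in range(N - 1, -1, -1):
    P[:, k] = cond_exp(G, k)
    Q[:, k] = cond_exp(G * paths[:, k], k)
    if k >= 1:
        G = bx * P[:, k] + sx * Q[:, k] + X[:, k]  # H_x(k, ...), l_x(k) = X*_k

lhs = np.mean(sum((bv * P[:, k] + sv * Q[:, k]) * nu[:, k] for k in range(N)))
rhs = np.mean(sum(X[:, k] * Z[:, k] for k in range(1, N)) + X[:, N] * Z[:, N])
print(np.isclose(lhs, rhs))  # True: both sides of (12) agree
```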
Necessary MP
Theorem 1. Assume (H1)-(H3). If $u^* = \{u_k^*, k \in \mathcal{T}\}$ is an optimal control of Problem (OCP), then it satisfies
\[ \langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle \ge 0, \quad \forall v \in U_k, \ k \in \mathcal{T}. \tag{14} \]
Proof. By (12), the variational inequality becomes
\[ E\sum_{i=0}^{N-1} \langle H_v(i, X_i^*, u_i^*, P_i, Q_i), \mu_i - u_i^* \rangle \ge 0, \quad \forall \mu = \{\mu_i, i \in \mathcal{T}\} \in \mathcal{U}. \]
For any $k \in \mathcal{T}$, $v \in U_k$ and $A \in \mathcal{F}_k$, define $\mu_k = v\chi_A + u_k^*\chi_{A^c}$, and $\mu_i = u_i^*$ if $i \ne k$. Then $\mu \in \mathcal{U}$. Applying the last inequality gives
\[ E\big[ \langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle \chi_A \big] \ge 0. \]
Since $\langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle$ is $\mathcal{F}_k$-measurable and $A \in \mathcal{F}_k$ is arbitrarily chosen, the result follows directly.
Sufficient MP
Theorem 2. Assume (H1)-(H3). Let $(u^*, X^*)$ be an admissible pair and $(P, Q)$ the solution of the adjoint equation (7). Assume that $h(\cdot)$ and $H(k, \cdot, \cdot, P_k, Q_k)$ are convex functions for each $k \in \mathcal{T}$. Then $u^*$ is an optimal control of Problem (OCP) if it satisfies (14).
Proof. For an admissible pair $(u, X)$, set $\tilde{b}(k) = b(k, X_k, u_k) - b(k, X_k^*, u_k^*)$ and $\tilde{\sigma}(k) = \sigma(k, X_k, u_k) - \sigma(k, X_k^*, u_k^*)$. Then
\[
J(u) - J(u^*) = E\sum_{k=0}^{N-1} \big[ H(k, X_k, u_k, P_k, Q_k) - H(k, X_k^*, u_k^*, P_k, Q_k) \big] - E\sum_{k=0}^{N-1} \Big[ \langle \tilde{b}(k), P_k \rangle + \sum_{j=1}^{d} \langle \tilde{\sigma}^j(k), Q_k^j \rangle \Big] + E\big[ h(X_N) - h(X_N^*) \big].
\]
Note that for the first two expectations to be well defined, the third integrability condition in (8) is needed.
If $h$ and $H$ are convex in $(x, v)$ and (14) holds, then
\[ J(u) - J(u^*) \ge E\sum_{k=1}^{N} \langle l_x(k), \tilde{X}_k \rangle - E\sum_{k=0}^{N-1} \Big[ \langle \alpha_k, P_k \rangle + \sum_{j=1}^{d} \langle \beta_k^j, Q_k^j \rangle \Big], \]
where $\tilde{X} = X - X^*$, $\alpha_k = \tilde{b}(k) - b_x^\top(k)\tilde{X}_k$ and $\beta_k^j = \tilde{\sigma}^j(k) - (\sigma_x^j(k))^\top \tilde{X}_k$. Similar to (6), we have
\[ \tilde{X}_k = \sum_{i=0}^{k-1} M_i^{k-1} \delta_i, \quad k \in \bar{\mathcal{T}}, \qquad \text{where } \delta_k = \alpha_k + \sum_{j=1}^{d} \beta_k^j W_k^j, \ k \in \mathcal{T}. \]
Finally, in view of (13), by interchanging the order of summation and using the properties of conditional expectations, we can show that the right-hand side of the last inequality is equal to zero.
Extensions to stochastic games: MP for discrete-time nonzero-sum game; MP for discrete-time zero-sum game
Let us consider a system described by
\[ X_{k+1} = b(k, X_k, u_{1,k}, u_{2,k}) + \sigma(k, X_k, u_{1,k}, u_{2,k}) W_k, \qquad X_0 = x_0 \in \mathbb{R}^n, \tag{15} \]
and introduce two cost functionals
\[ J_i(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l_i(k, X_k, u_{1,k}, u_{2,k}) + h_i(X_N) \Big], \quad i = 1, 2. \]
Player $i$ hopes to minimize the cost $J_i$ by selecting an appropriate control $u_i^*$, $i = 1, 2$. The problem is to find $(u_1^*, u_2^*) \in \mathcal{U}_1 \times \mathcal{U}_2$ such that
\[ J_1(u_1^*, u_2^*) = \inf_{u_1 \in \mathcal{U}_1} J_1(u_1, u_2^*), \qquad J_2(u_1^*, u_2^*) = \inf_{u_2 \in \mathcal{U}_2} J_2(u_1^*, u_2). \tag{16} \]
This is a discrete-time nonzero-sum stochastic game; we call it Problem (NZSG). Such a $(u_1^*, u_2^*)$ is called an equilibrium point.
Similar assumptions on the noises, the controls and the coefficients are supposed as in the previous section.
Let us define
\[ H_i = \langle b, p \rangle + \mathrm{tr}[\sigma^\top q] + l_i, \quad i = 1, 2. \]
Introduce the following adjoint equations for $i = 1, 2$:
\[
\begin{cases}
P_{i,k} = E\big[ H_{ix}(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{i,k+1}, Q_{i,k+1}) \mid \mathcal{F}_k \big], \\
Q_{i,k} = E\big[ H_{ix}(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{i,k+1}, Q_{i,k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{i,N-1} = E[h_{ix}(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{i,N-1} = E[h_{ix}(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}].
\end{cases}
\]
Note that Problem (NZSG) consists of two discrete-time stochastic optimal control problems of Problem (OCP)'s type.
Theorem 3. Suppose $(u_1^*, u_2^*)$ is an equilibrium point of Problem (NZSG). Then, for all $v_{i,k} \in U_{i,k}$, $k \in \mathcal{T}$, $i = 1, 2$,
\[ \langle H_{iu_i}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_{i,k}, Q_{i,k}), v_{i,k} - u_{i,k}^* \rangle \ge 0. \tag{17} \]
Theorem 4. Let $(u_1^*, u_2^*)$ be a pair of admissible controls for Problem (NZSG). Suppose that $H_1(k, \cdot, \cdot, u_{2,k}^*, P_{1,k}, Q_{1,k})$, $H_2(k, \cdot, u_{1,k}^*, \cdot, P_{2,k}, Q_{2,k})$, $h_1(\cdot)$ and $h_2(\cdot)$ are convex for each $k \in \mathcal{T}$. Then $(u_1^*, u_2^*)$ is an equilibrium point if it satisfies (17).
Next let us consider a discrete-time zero-sum stochastic game. We still use the system (15), where $u_1$ and $u_2$ are the controls of the two players. The discrete-time zero-sum stochastic game, Problem (ZSG), is to find $(u_1^*, u_2^*) \in \mathcal{U}_1 \times \mathcal{U}_2$ such that
\[ J(u_1^*, u_2) \le J(u_1^*, u_2^*) \le J(u_1, u_2^*), \quad \forall (u_1, u_2) \in \mathcal{U}_1 \times \mathcal{U}_2, \tag{18} \]
where
\[ J(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l(k, X_k, u_{1,k}, u_{2,k}) + h(X_N) \Big]. \]
Such a $(u_1^*, u_2^*)$ is called a saddle point.
By a transformation, Problem (ZSG) can be turned into a nonzero-sum stochastic game of Problem (NZSG)'s type. In fact, by defining
\[ l_1 = l, \quad h_1 = h, \quad J_1 = J, \qquad l_2 = -l, \quad h_2 = -h, \quad J_2 = -J, \]
Problem (ZSG) is equivalent to finding $(u_1^*, u_2^*)$ such that
\[ J_1(u_1^*, u_2^*) = \inf_{u_1 \in \mathcal{U}_1} J_1(u_1, u_2^*), \qquad J_2(u_1^*, u_2^*) = \inf_{u_2 \in \mathcal{U}_2} J_2(u_1^*, u_2), \]
where
\[ J_i(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l_i(k, X_k, u_{1,k}, u_{2,k}) + h_i(X_N) \Big], \quad i = 1, 2. \]
Define the Hamiltonian by $H = \langle b, p \rangle + \mathrm{tr}\big[\sigma^\top q\big] + l$, and the adjoint equation by
\[
\begin{cases}
P_k = E\big[ H_x(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{k+1}, Q_{k+1}) \mid \mathcal{F}_k \big], \\
Q_k = E\big[ H_x(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{k+1}, Q_{k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[h_x(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E\big[h_x(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}\big].
\end{cases}
\]
By the necessary MP of Problem (NZSG) [see Theorem 3], we get the following necessary MP for Problem (ZSG).
Theorem 5. Suppose $(u_1^*, u_2^*)$ is a saddle point of Problem (ZSG). Then, for each $k \in \mathcal{T}$,
\[
\begin{cases}
\big\langle H_{u_1}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_k, Q_k), v_{1,k} - u_{1,k}^* \big\rangle \ge 0, & \forall v_{1,k} \in U_{1,k}, \\
\big\langle H_{u_2}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_k, Q_k), v_{2,k} - u_{2,k}^* \big\rangle \le 0, & \forall v_{2,k} \in U_{2,k}.
\end{cases} \tag{19}
\]
By the sufficient MP of Problem (NZSG) [see Theorem 4], we get the following sufficient MP for Problem (ZSG).
Theorem 6. Let $(u_1^*, u_2^*)$ be a pair of admissible controls for Problem (ZSG). Assume that $H(k, \cdot, \cdot, u_{2,k}^*, P_k, Q_k)$ is convex, $H(k, \cdot, u_{1,k}^*, \cdot, P_k, Q_k)$ is concave, and $h(x) = Rx + \xi$ with $R$ a bounded $\mathcal{F}_N$-measurable random variable and $\xi$ an integrable $\mathcal{F}_N$-measurable random variable. Then $(u_1^*, u_2^*)$ is a saddle point if (19) holds.
Applications: Application to one investment/consumption choice problem; Application to one kind of discrete-time LQ problem
Example 1. Suppose there is a market in which $m + 1$ assets are traded at discrete times. The price $\{B_k, k \in \mathcal{T}\}$ of one risk-free asset satisfies
\[ B_{k+1} - B_k = r_k B_k, \quad k \in \mathcal{T}; \qquad B_0 = b_0. \tag{20} \]
The prices of $m$ risky assets, $\{S_{i,k}, k \in \mathcal{T}\}$, $i = 1, 2, \dots, m$, satisfy
\[ S_{i,k+1} - S_{i,k} = S_{i,k}(\mu_{i,k} + \sigma_{i,k} W_k), \quad k \in \mathcal{T}; \qquad S_{i,0} = s_i. \tag{21} \]
We now consider an investor whose wealth at time $k$ is $X_k$. Denote by $u_{i,k}$ the wealth invested in the $i$-th stock at time $k$, and call $u_k = (u_{1,k}, u_{2,k}, \dots, u_{m,k})^\top$ a portfolio of the investor at time $k$. Assume moreover that consumption occurs at each time $k \in \mathcal{T}$, and denote by $\{v_k, k \in \mathcal{T}\}$ the consumption sequence. Then
\[ X_{k+1} - X_k = \frac{X_k - \sum_{i=1}^{m} u_{i,k}}{B_k}(B_{k+1} - B_k) + \sum_{i=1}^{m} \frac{u_{i,k}}{S_{i,k}}(S_{i,k+1} - S_{i,k}) - v_k. \]
Thus the wealth dynamics satisfies
\[ X_{k+1} = \big[ (r_k + 1)X_k + b_k^\top u_k - v_k \big] + \sigma_k^\top u_k W_k, \qquad X_0 = x_0, \]
where $b_k = (\mu_{1,k} - r_k, \mu_{2,k} - r_k, \dots, \mu_{m,k} - r_k)^\top$ and $\sigma_k = (\sigma_{1,k}, \sigma_{2,k}, \dots, \sigma_{m,k})^\top$. Assume that $r, b, \sigma$ are bounded.
Let $\mathcal{U}$ be the set of $u = \{u_k, k \in \mathcal{T}\}$, where each $u_k$ is an $\mathcal{F}_k$-measurable, $\mathbb{R}^m$-valued random variable with the necessary integrability. Denote by $\mathcal{V}$ the collection of all $v = \{v_k, k \in \mathcal{T}\}$ such that each $v_k$ is real-valued and $\mathcal{F}_k$-measurable, $v_k \ge a$ for some constant $a > 0$, and $v$ has the necessary integrability.
The objective of the investor is to find $(u^*, v^*) \in \mathcal{U} \times \mathcal{V}$ such that
\[ J(u^*, v^*) = \inf_{(u,v) \in \mathcal{U} \times \mathcal{V}} J(u, v) \]
with
\[ J(u, v) = E\Big\{ \sum_{k=0}^{N-1} \Big[ \frac{1}{2} L_k |u_k - \eta_k|^2 - G_k \frac{(v_k)^{1-\delta}}{1-\delta} \Big] - R X_N \Big\}. \]
Assume that $L_k$, $G_k$ and $R$ are bounded scalar-valued random variables, $\eta_k$ is a bounded $\mathbb{R}^m$-valued random variable representing a benchmark, and $\delta$ is a constant representing the Arrow-Pratt index of risk aversion. Assume $0 < \delta < 1$, $G_k > 0$, $L_k > c$ and $R > c$ for some $c > 0$.
Note that the objective consists of three parts: the first minimizes the difference between the portfolio and a given benchmark, the second maximizes a utility of the consumption, and the last maximizes the terminal wealth.
Let (H1) and (H2) hold true. The Hamiltonian is defined by
\[ H = \big[ (r_k + 1)x + b_k^\top u - v \big] p + \sigma_k^\top u\, q + \frac{L_k}{2}|u - \eta_k|^2 - G_k \frac{v^{1-\delta}}{1-\delta}. \]
It is easily seen that $H$ and $h(x) = -Rx$ are convex functions of $(x, u, v)$. Applying Theorem 1 shows that the optimal control $(u^*, v^*)$ satisfies
\[ \langle b_k P_k + \sigma_k Q_k + L_k(u_k^* - \eta_k), u - u_k^* \rangle \ge 0, \quad \forall u \in \mathbb{R}^m, \tag{22} \]
\[ \big[ -P_k - G_k (v_k^*)^{-\delta} \big](v - v_k^*) \ge 0, \quad \forall v \ge a, \tag{23} \]
for each $k$, where $\{(P_k, Q_k), k \in \mathcal{T}\}$ uniquely solves
\[
\begin{cases}
P_k = E[(r_{k+1} + 1) P_{k+1} \mid \mathcal{F}_k], & k = 0, 1, \dots, N-2, \\
Q_k = E[(r_{k+1} + 1) P_{k+1} W_k \mid \mathcal{F}_k], & k = 0, 1, \dots, N-2, \\
P_{N-1} = -E[R \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = -E[R W_{N-1} \mid \mathcal{F}_{N-1}].
\end{cases} \tag{24}
\]
By Theorem 2, an admissible control $(u^*, v^*)$ satisfying (22) and (23) is indeed an optimal control.
From (22) we derive
\[ u_k^* = \eta_k - L_k^{-1}(b_k P_k + \sigma_k Q_k). \tag{25} \]
Since $\tilde{H}(v) := -P_k v - G_k \frac{v^{1-\delta}}{1-\delta}$ is a differentiable convex function, (23) implies that $\tilde{H}(v)$, $v \in [a, +\infty)$, attains its minimum at $v_k^*$. Note that $P_k < 0$, $k \in \mathcal{T}$. Then the solution of (23) is
\[ v_k^* = \begin{cases} \bar{v}_k, & \text{if } \bar{v}_k \ge a, \\ a, & \text{if } \bar{v}_k < a, \end{cases} \tag{26} \]
with $\bar{v}_k = \big( \frac{-G_k}{P_k} \big)^{\frac{1}{\delta}}$.
It is easy to see that each $P_k$ is bounded. Thus $Q_k \in L^\alpha$, and so $Q_k \in L^\beta$ since $\beta \le \alpha$. This implies that $u_k^*$ defined by (25) is indeed admissible. Using induction we can get $P_k < -c$ for $k \in \mathcal{T}$, which shows that $\bar{v}_k$ is bounded. Thus $v_k^*$ defined by (26) is bounded and hence admissible. Consequently, $(u_k^*, v_k^*)$ defined by (25) and (26) is the optimal strategy of this investment/consumption choice problem.
Eq. (24) can be solved explicitly in some special cases. If $\{r_k, k = 1, 2, \dots, N-1\}$ and $R$ are deterministic constants, and $E[W_k \mid \mathcal{F}_k] = 0$, $k \in \mathcal{T}$, then (24) is uniquely solved by $Q_k \equiv 0$ and
\[ P_k = \begin{cases} -R(r_{N-1} + 1)(r_{N-2} + 1) \cdots (r_{k+1} + 1), & 0 \le k \le N-2, \\ -R, & k = N-1. \end{cases} \]
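The deterministic special case of (24), and the resulting strategy (25)-(26), can be sketched numerically. All data below (rates, benchmark, risk aversion) are made up for illustration, with $m = 1$:

```python
import numpy as np

# Deterministic special case of (24): r_k and R constants, E[W_k | F_k] = 0,
# so Q_k = 0 and P_k = -R * prod_{j=k+1}^{N-1} (r_j + 1).
# All numbers are hypothetical (scalar case m = 1).
N = 4
r = np.array([0.05, 0.04, 0.03, 0.02])           # r_0, ..., r_{N-1}
R = 2.0

# Backward recursion P_k = (r_{k+1} + 1) P_{k+1}, P_{N-1} = -R.
P = np.zeros(N)
P[N - 1] = -R
for k in range(N - 2, -1, -1):
    P[k] = (r[k + 1] + 1.0) * P[k + 1]

# Closed form for comparison.
P_closed = np.array([-R * np.prod(r[k + 1:] + 1.0) for k in range(N)])
assert np.allclose(P, P_closed)

# Optimal portfolio (25) and consumption (26), with Q_k = 0 and hypothetical
# benchmark eta, weights L, G, consumption floor a and risk aversion delta.
eta, L, G, b, sigma = 1.0, 2.0, 1.5, 0.08, 0.3
a, delta = 0.5, 0.5
u_star = eta - (b * P + sigma * 0.0) / L          # (25)
v_bar = (-G / P) ** (1.0 / delta)                 # unconstrained minimizer
v_star = np.maximum(v_bar, a)                     # (26)
print(np.all(v_star >= a))  # True: consumption respects the floor
```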
Example 2. Let us consider one kind of discrete-time LQ stochastic optimal control problem in the case where $d = 1$, $E[W_k \mid \mathcal{F}_k] = 0$ and $U_k = \mathbb{R}^m$ for each $k \in \mathcal{T}$.
Let us define
\[ b(k, x, v) = A_k x + B_k v + C_k, \qquad \sigma(k, x, v) = D_k x + E_k v + F_k, \]
\[ l(k, x, v) = \big[ \langle G_k x, x \rangle + \langle L_k v, v \rangle \big]/2, \qquad h(x) = \langle Rx, x \rangle, \]
where the coefficients are deterministic and bounded matrices. Moreover, $G_k, R \ge 0$ and $L_k > 0$ for each $k$.
By Theorems 1 and 2, $u_k^* = -L_k^{-1}(B_k^\top P_k + E_k^\top Q_k)$ is an optimal control of this LQ problem if it is admissible, where $(P, Q)$ solves
\[
\begin{cases}
P_k = E\big[ A_{k+1}^\top P_{k+1} + D_{k+1}^\top Q_{k+1} + G_{k+1} X_{k+1}^* \mid \mathcal{F}_k \big], \\
Q_k = E\big[ \big(A_{k+1}^\top P_{k+1} + D_{k+1}^\top Q_{k+1} + G_{k+1} X_{k+1}^*\big) W_k \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[R X_N^* \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E[R X_N^* W_{N-1} \mid \mathcal{F}_{N-1}].
\end{cases}
\]
Let us give a state-feedback form of $u^*$ in the simple case where $D_k = 0$ and $E_k = 0$ for each $k \in \mathcal{T}$. In this case we have
\[ u_k^* = -L_k^{-1} B_k^\top P_k, \tag{27} \]
where
\[
\begin{cases}
P_k = E\big[ A_{k+1}^\top P_{k+1} + G_{k+1} X_{k+1}^* \mid \mathcal{F}_k \big], & k = 0, 1, \dots, N-2, \\
P_{N-1} = E[R X_N^* \mid \mathcal{F}_{N-1}].
\end{cases}
\]
To conclude, we only need to express $P_k$ in terms of $X_k^*$.
Since the diffusion term is independent of $x$ and $v$, and $Q_k$ is not needed, the square-integrability issue disappears in this case.
By Lemma 4, we have
\[
\begin{cases}
P_k = (A^{N-1}_k)^{\top}R\,E[X^*_N|\mathcal{F}_k] + \displaystyle\sum_{j=k+1}^{N-1}(A^{j-1}_k)^{\top}G_j\,E[X^*_j|\mathcal{F}_k], \quad k = 0, \cdots, N-2,\\
P_{N-1} = R\,E[X^*_N|\mathcal{F}_{N-1}],
\end{cases} \tag{28}
\]
where \(A^j_i = A_j A_{j-1} \cdots A_{i+1}\) if i < j, and \(A^j_i = I_n\) if i = j.
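The products \(A^j_i\) are easy to form with a small helper. The sketch below (plain-Python 2×2 matrices with hypothetical entries) also checks the convention \(A^i_i = I_n\) and the splitting \(A^j_i = A^j_m A^m_i\) for i ≤ m ≤ j, which follows directly from the definition:

```python
# Helper for the transition products A^j_i = A_j A_{j-1} ... A_{i+1},
# with A^i_i = I_n; the matrices A_1, ..., A_4 are hypothetical data.
def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = {k: [[1.0, 0.1 * k], [0.0, 1.0 + 0.05 * k]] for k in range(1, 5)}

def A_prod(j, i):
    """A^j_i = A_j A_{j-1} ... A_{i+1}; identity when i == j."""
    out = I2
    for s in range(j, i, -1):  # left-multiply so A_j ends up leftmost
        out = matmul(out, A[s])
    return out
```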
The subsequent idea is: \(P_k \Longleftarrow E[X^*_{k+1}|\mathcal{F}_k] \Longleftarrow X^*_k\). Let us assume
\[
P_k = \alpha_k\,E[X^*_{k+1}|\mathcal{F}_k] + \beta_k, \quad k \in \mathcal{T}, \tag{29}
\]
where α_k, β_k, k ∈ T, are deterministic matrices to be determined later. Then by (27), we have
\[
u^*_k = -L_k^{-1}B_k^{\top}\big(\alpha_k\,E[X^*_{k+1}|\mathcal{F}_k] + \beta_k\big). \tag{30}
\]
By substituting (30) into the state equation and taking the conditional expectation, we get
\[
\Phi_k\,E[X^*_{k+1}|\mathcal{F}_k] = A_k X^*_k + \Psi_k, \quad k \in \mathcal{T},
\]
where \(\Phi_k = I_n + B_k L_k^{-1} B_k^{\top}\alpha_k\) and \(\Psi_k = C_k - B_k L_k^{-1} B_k^{\top}\beta_k\).

Let us assume
\[
\Phi_k^{-1} \text{ exists and is bounded}, \quad \forall k \in \mathcal{T}. \tag{31}
\]
Then
\[
E[X^*_{k+1}|\mathcal{F}_k] = \varphi_k X^*_k + \psi_k, \quad k \in \mathcal{T}, \tag{32}
\]
with
\[
\varphi_k = \Phi_k^{-1}A_k, \qquad \psi_k = \Phi_k^{-1}\Psi_k. \tag{33}
\]
Consequently, by (30) and (32) we get
\[
u^*_k = -L_k^{-1}B_k^{\top}\alpha_k\varphi_k X^*_k - L_k^{-1}B_k^{\top}(\alpha_k\psi_k + \beta_k). \tag{34}
\]
This is the feedback form. It remains to determine (α_k, β_k).

By (32), we can use induction to get
\[
E[X^*_i|\mathcal{F}_k] = \varphi^{i-1}_k\,E[X^*_{k+1}|\mathcal{F}_k] + \sum_{j=k+1}^{i-1}\varphi^{i-1}_j\psi_j, \quad i \ge k+2, \tag{35}
\]
where \(\varphi^j_i\) is defined by \(\varphi^j_i = \varphi_j\varphi_{j-1}\cdots\varphi_{i+1}\) if i < j, and \(\varphi^j_i = I_n\) if i = j. That is, we can express \(E[X^*_i|\mathcal{F}_k]\), i ≥ k+2, in terms of \(E[X^*_{k+1}|\mathcal{F}_k]\) only.
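Formula (35) is just the k-to-i iteration of the affine map in (32). In a scalar, noise-free toy setting (where conditional expectations reduce to plain values; the φ_j, ψ_j values below are hypothetical), it can be checked directly:

```python
# Check of (35) in a scalar toy setting: iterate x_{j+1} = phi_j x_j + psi_j
# and compare with the closed form; phi, psi values are hypothetical.
N = 6
phi = {j: 0.9 + 0.02 * j for j in range(N)}
psi = {j: 0.5 - 0.05 * j for j in range(N)}

def phi_prod(j, i):
    """phi^j_i = phi_j phi_{j-1} ... phi_{i+1}; equals 1 when i == j."""
    out = 1.0
    for s in range(i + 1, j + 1):
        out *= phi[s]
    return out

k, i = 1, 5
x = {k + 1: 2.0}                       # plays the role of E[X*_{k+1} | F_k]
for j in range(k + 1, i):
    x[j + 1] = phi[j] * x[j] + psi[j]  # iterate (32)

# Right-hand side of (35):
rhs = phi_prod(i - 1, k) * x[k + 1] + sum(
    phi_prod(i - 1, j) * psi[j] for j in range(k + 1, i))
```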
Taking i = N and k = N−2 in (35) gives
\[
E[X^*_N|\mathcal{F}_{N-2}] = \varphi_{N-1}\,E[X^*_{N-1}|\mathcal{F}_{N-2}] + \psi_{N-1}. \tag{36}
\]
By (28) we have
\[
P_{N-2} = \big[(A_{N-1})^{\top}R\varphi_{N-1} + G_{N-1}\big]E[X^*_{N-1}|\mathcal{F}_{N-2}] + (A_{N-1})^{\top}R\psi_{N-1}.
\]
Recall from (28) that P_k satisfies
\[
\begin{cases}
P_k = (A^{N-1}_k)^{\top}R\,E[X^*_N|\mathcal{F}_k] + \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\,E[X^*_j|\mathcal{F}_k]\\
\qquad\quad + G_{k+1}\,E[X^*_{k+1}|\mathcal{F}_k], \quad k = 0, 1, \cdots, N-3,\\
P_{N-2} = \big[(A_{N-1})^{\top}R\varphi_{N-1} + G_{N-1}\big]E[X^*_{N-1}|\mathcal{F}_{N-2}] + (A_{N-1})^{\top}R\psi_{N-1},\\
P_{N-1} = R\,E[X^*_N|\mathcal{F}_{N-1}].
\end{cases} \tag{37}
\]
Next, substituting (29) and (35) into (37) and comparing the coefficients leads to
\[
\begin{cases}
\alpha_{N-1} = R, \quad \beta_{N-1} = 0,\\
\alpha_{N-2} = G_{N-1} + (A_{N-1})^{\top}R\varphi_{N-1}, \quad \beta_{N-2} = (A_{N-1})^{\top}R\psi_{N-1},\\
\alpha_k = G_{k+1} + \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\varphi^{j-1}_k + (A^{N-1}_k)^{\top}R\varphi^{N-1}_k,\\
\beta_k = \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\sum_{s=k+1}^{j-1}\varphi^{j-1}_s\psi_s + (A^{N-1}_k)^{\top}R\sum_{j=k+1}^{N-1}\varphi^{N-1}_j\psi_j,\\
\qquad k = 0, 1, \cdots, N-3.
\end{cases} \tag{38}
\]
Note that (α_k, β_k), k ∈ T, defined above are bounded.
Finally, the conclusion is as follows. Let (φ_k, ψ_k) be defined by (33) and (α_k, β_k), k ∈ T, be defined by (38). If (31) holds, then u^* defined by (34) is the admissible optimal control of this LQ problem.
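To see the whole scheme close up numerically, here is a scalar sketch (m = n = 1, all coefficients hypothetical, noise switched off so that conditional expectations coincide with actual values, and assuming the controlled dynamics take the form X_{k+1} = A_k X_k + B_k u_k + C_k plus noise, which is consistent with the relation Φ_k E[X*_{k+1}|F_k] = A_k X*_k + Ψ_k above): compute (α_k, β_k) backward via (38) interleaved with (φ_k, ψ_k) from (33), run the closed-loop state under the feedback (34), and check that the ansatz (29) reproduces the representation (28).

```python
# Scalar (m = n = 1) sketch of the backward scheme (33), (38) and the
# feedback (34); all coefficients are hypothetical, noise is set to zero.
N = 5
A = {k: 1.05 for k in range(N)}   # A_k
B = {k: 0.5 for k in range(N)}    # B_k
C = {k: 0.2 for k in range(N)}    # C_k
G = {k: 1.0 for k in range(N)}    # G_k >= 0
L = {k: 2.0 for k in range(N)}    # L_k > 0
R = 3.0                           # R >= 0

def prod(d, j, i):
    """d^j_i = d_j d_{j-1} ... d_{i+1}; equals 1 when i == j."""
    out = 1.0
    for s in range(i + 1, j + 1):
        out *= d[s]
    return out

# Backward pass: alpha_k, beta_k by (38), then phi_k, psi_k by (33).
alpha, beta, phi, psi = {}, {}, {}, {}
for k in range(N - 1, -1, -1):
    if k == N - 1:
        alpha[k], beta[k] = R, 0.0
    else:
        alpha[k] = (G[k + 1]
                    + sum(prod(A, j - 1, k) * G[j] * prod(phi, j - 1, k)
                          for j in range(k + 2, N))
                    + prod(A, N - 1, k) * R * prod(phi, N - 1, k))
        beta[k] = (sum(prod(A, j - 1, k) * G[j]
                       * sum(prod(phi, j - 1, s) * psi[s]
                             for s in range(k + 1, j))
                       for j in range(k + 2, N))
                   + prod(A, N - 1, k) * R
                   * sum(prod(phi, N - 1, j) * psi[j]
                         for j in range(k + 1, N)))
    Phi = 1.0 + B[k] ** 2 * alpha[k] / L[k]   # Phi_k; (31) holds here
    phi[k] = A[k] / Phi
    psi[k] = (C[k] - B[k] ** 2 * beta[k] / L[k]) / Phi

# Forward pass: closed-loop state under the feedback (34).
X = {0: 1.0}
for k in range(N):
    u = -B[k] / L[k] * (alpha[k] * phi[k] * X[k] + alpha[k] * psi[k] + beta[k])
    X[k + 1] = A[k] * X[k] + B[k] * u + C[k]

# Direct evaluation of (28) with expectations replaced by actual values.
def P_direct(k):
    return (prod(A, N - 1, k) * R * X[N]
            + sum(prod(A, j - 1, k) * G[j] * X[j] for j in range(k + 1, N)))
```

The two assertions below confirm (i) that the closed loop reproduces (32), i.e. X_{k+1} = φ_k X_k + ψ_k, and (ii) that the ansatz (29), P_k = α_k X_{k+1} + β_k, matches (28) at every k, so the derivation chain closes in this toy case.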
Conclusions.

One kind of discrete-time stochastic optimal control problem is studied. The main result establishes the necessary as well as sufficient stochastic maximum principle in a rigorous and clear way. A difficulty arising from the lack of necessary integrability of the solution to the adjoint equation is overcome. The results are extended to two kinds of discrete-time stochastic games. Two applications are studied, for which the optimal controls are derived. This research paves the way for further related works.
More extensions.

general control domain case

different kinds of control systems, e.g., delayed systems, mean-field systems, ...

state/control constrained cases

partial information / partial observation cases
Thank You!