Introduction · Maximum principle · Extensions to stochastic games · Applications · Conclusions and more extensions
Maximum principle for one kind of discrete-time stochastic optimal control problem and its applications
Zhen WU
School of Mathematics, Shandong University
Joint work with Dr. Feng ZHANG (SDUFE)
Outline
1 Introduction
2 Maximum principle: Problem formulation and preliminaries; Maximum principle
3 Extensions to stochastic games: MP for discrete-time nonzero-sum game; MP for discrete-time zero-sum game
4 Applications: Application to one investment/consumption choice problem; Application to one kind of discrete-time LQ problem
5 Conclusions and more extensions
Stochastic MP
The maximum principle (MP) is a milestone of optimal control theory.
For stochastic MP, see e.g. J.M. Bismut (1978), A. Bensoussan (1981), U.G. Haussmann (1986), S.G. Peng (1990), J.M. Yong (2010) and Z. Wu (2013).
Discrete-time stochastic optimal control problems
Discrete-time optimal control problems are as important as continuous-time ones.
Difference equations are important mathematical models in their own right. In many situations it is sufficient, or natural, to describe a system by a discrete-time model:
- signal values are sometimes only available at certain times;
- discretization of the dynamics of continuous-time problems;
- computer manipulation.
Difficulties in studying discrete-time stochastic optimal control problems:
- differentiation and integration operations no longer work;
- well-known tools of Ito's calculus do not necessarily hold (Ito's formula, the BDG inequality, etc.).
Works for discrete-time stochastic optimal controls
To turn the original problem into a deterministic one and then use a matrix version of MP: A. Beghi and D. D'Alessandro (J. Optim. Theory Appl., 1998); M. Ait Rami, X. Chen and X.Y. Zhou (J. Global Optim., 2002).
To use the completion of squares for LQ problems: J.B. Moore, X.Y. Zhou and A.E.B. Lim (Syst. Control Lett., 1999); R.J. Elliott, X. Li and Y.H. Ni (Automatica, 2013).
See also Oswaldo L.V. Costa, D. Li, J.F. Zhang, H.S. Zhang, W.H. Zhang, ...
Comments on one related work

X.Y. Lin and W.H. Zhang, A maximum principle for optimal control of discrete-time stochastic systems with multiplicative noise, IEEE Transactions on Automatic Control, 60, 1121-1126, 2015.

Main contributions:
- necessary as well as sufficient conditions are established for one kind of discrete-time stochastic optimal control problem;
- one kind of new backward stochastic difference equation is introduced as the adjoint equation.

Some deficiencies:
- the assumptions are strict, excluding quadratic cost functionals;
- the convex perturbation is not appropriately defined;
- the necessary condition does not necessarily hold for general convex control domains;
- the square-integrability of the solution to the adjoint equation is not guaranteed.
Results and contributions of this talk
This talk considers one kind of discrete-time stochastic optimal control problem with convex control domains.
Main results:
- necessary MP;
- sufficient MP (verification theorem);
- extensions and applications.
Main contributions:
- a rigorous version of the discrete-time stochastic MP;
- a clear and concise proof;
- a road to further related topics.
Problem formulation and preliminaries · Maximum principle
Notations
$(\Omega, \mathcal{F}, P)$: a probability space.
For fixed $N \in \mathbb{N}$, $\mathcal{T} := \{0, 1, \dots, N-1\}$, $\bar{\mathcal{T}} := \{1, 2, \dots, N\}$.
Noises: $W_k = (W_k^1, \dots, W_k^d)^\top \in \mathbb{R}^d$, $k \in \mathcal{T}$.
Filtration $\mathcal{F}_0, \mathcal{F}_1, \dots, \mathcal{F}_N \subset \mathcal{F}$:
\[ \mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_k = \sigma\{W_0, W_1, \dots, W_{k-1}\}, \quad k \in \bar{\mathcal{T}}. \]
Noises and admissible controls
Assume that, for some $\alpha > 2$,
\[ E[|W_k|^\alpha \mid \mathcal{F}_k] = C(\alpha) < \infty, \quad \forall k \in \mathcal{T}. \tag{1} \]
Let $U_0, \dots, U_{N-1}$ be nonempty convex subsets of $\mathbb{R}^m$. $u = \{u_0, \dots, u_{N-1}\}$ is called an admissible control if
- $u_k$ is a $U_k$-valued, $\mathcal{F}_k$-measurable random variable for each $k \in \mathcal{T}$;
- for some $\beta > 2$,
\[ E\sum_{k=0}^{N-1} |u_k|^\beta < \infty. \tag{2} \]
Denote by $\mathcal{U}$ the set of admissible controls.
Problem formulation
The discrete-time stochastic optimal control problem, Problem (OCP), is to find $u^* = \{u_0^*, \dots, u_{N-1}^*\} \in \mathcal{U}$ such that
\[ J(u^*) = \inf_{u \in \mathcal{U}} J(u), \]
where the state $X$ is defined by
\[ X_{k+1} = b(k, X_k, u_k) + \sigma(k, X_k, u_k) W_k, \quad k \in \mathcal{T}, \qquad X_0 = x_0 \in \mathbb{R}^n, \tag{3} \]
and the cost $J$ is given by
\[ J(u) = E\Big[ \sum_{k=0}^{N-1} l(k, X_k, u_k) + h(X_N) \Big]. \]
Such a $u^*$ is called an optimal control; the corresponding system state is denoted by $X^* = \{X_k^*, k \in \bar{\mathcal{T}}\}$.
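The formulation above can be sketched numerically. The following is a minimal Monte Carlo sketch of the state recursion (3) and the cost $J(u)$, with made-up scalar coefficients $b, \sigma, l, h$ and Gaussian noises (all hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_paths = 5, 20000

# Illustrative (hypothetical) scalar coefficients of Problem (OCP).
b = lambda k, x, u: 0.9 * x + 0.5 * u
sigma = lambda k, x, u: 0.2 * x + 0.3 * u
l = lambda k, x, u: 0.5 * (x**2 + u**2)
h = lambda x: 0.5 * x**2

def cost(u):
    """Monte Carlo estimate of J(u) for a deterministic control sequence u[0..N-1]."""
    X = np.full(n_paths, 1.0)          # X_0 = x_0 = 1
    J = np.zeros(n_paths)
    for k in range(N):
        W = rng.standard_normal(n_paths)           # i.i.d. noises W_k
        J += l(k, X, u[k])
        X = b(k, X, u[k]) + sigma(k, X, u[k]) * W  # state recursion (3)
    return (J + h(X)).mean()

J0 = cost(np.zeros(N))
print(J0 > 0)  # True: this quadratic cost of the zero control is positive
```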
Assumptions
The coefficients are defined by $b(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}^n$, $\sigma(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}^{n \times d}$, $l(k,\cdot,\cdot,\cdot): \mathbb{R}^n \times U_k \times \Omega \to \mathbb{R}$ and $h(\cdot,\cdot): \mathbb{R}^n \times \Omega \to \mathbb{R}$ for $k \in \mathcal{T}$.
(H1) $b(k, x, v)$ and $\sigma(k, x, v)$ are $\mathcal{F}_k$-measurable for any $(k, x, v)$, and continuously differentiable in $(x, v)$ with bounded derivatives. In addition, $|b(k, 0, 0)| + |\sigma(k, 0, 0)| \le C$ for some $C > 0$.
(H2) $l(k, x, v)$ is $\mathcal{F}_k$-measurable and $h(x)$ is $\mathcal{F}_N$-measurable for any $(k, x, v)$; $(l, h)$ are continuously differentiable in $(x, v)$, and the derivatives are bounded by $C(1 + |x| + |v|)$. Besides, $E\big[\sum_{k=0}^{N-1} |l(k, 0, 0)| + |h(0)|\big] < \infty$.
Solvability of the state equation
Set $\gamma = \min\{\alpha, \beta\}\,(> 2)$.
Lemma 1. Under (H1), Eq. (3) admits a unique solution $X = \{X_k, k \in \bar{\mathcal{T}}\}$ such that $X_k \in L^\gamma(\Omega, \mathcal{F}_k, P; \mathbb{R}^n)$ for each $k$. Moreover, there exists $C = C(\alpha, \gamma, N) > 0$ satisfying
\[ E\sum_{k=1}^{N} |X_k|^\gamma \le C E\Big[ 1 + |x_0|^\gamma + \sum_{k=0}^{N-1} |u_k|^\gamma \Big]. \tag{4} \]
Proof. The existence of $X$ is obvious; we only need to check (4). In fact, by (H1),
\[ E|X_{k+1}|^\gamma \le C(\gamma) E\Big\{ \big[1 + |X_k|^\gamma + |u_k|^\gamma\big] + \big[1 + |X_k|^\gamma + |u_k|^\gamma\big] |W_k|^\gamma \Big\}, \quad k \in \mathcal{T}. \]
By (1), Hölder's inequality yields $E[|W_k|^\gamma \mid \mathcal{F}_k] \le C(\alpha, \gamma) =: C_1$. Then
\[ E\big[|X_k|^\gamma |W_k|^\gamma\big] = E\big\{ |X_k|^\gamma E[|W_k|^\gamma \mid \mathcal{F}_k] \big\} \le C_1 E|X_k|^\gamma. \]
Similarly, $E[|u_k|^\gamma |W_k|^\gamma] \le C_1 E|u_k|^\gamma$. So there exists $C_2 := C_2(\alpha, \gamma)$ such that
\[ E|X_{k+1}|^\gamma \le C_2 E\{1 + |X_k|^\gamma + |u_k|^\gamma\}, \quad k \in \mathcal{T}. \]
Then, by induction, we get for $k \in \bar{\mathcal{T}}$
\[ E|X_k|^\gamma \le \sum_{j=1}^{k} (C_2)^j + (C_2)^k |x_0|^\gamma + \sum_{j=0}^{k-1} (C_2)^{k-j} E|u_j|^\gamma. \]
Thus we can draw the conclusion.
Remark. The conditions (1) and (2) are not sufficient to get the $L^r$-estimate of $X_k$ ($k \ge 2$) for $r > \gamma$. In fact, if $\xi \in L^p$ and $\eta \in L^q$, then in general we can only get $\xi\eta \in L^r$ for $r \le pq/(p+q)\,(< \min\{p, q\})$.
Convex perturbation
Let us define the convex perturbation. For any $\mu = \{\mu_0, \dots, \mu_{N-1}\} \in \mathcal{U}$, define
\[ \nu_k = \mu_k - u_k^*, \quad \nu = \{\nu_0, \dots, \nu_{N-1}\}, \qquad u_k^\varepsilon = u_k^* + \varepsilon \nu_k, \quad u^\varepsilon = \{u_0^\varepsilon, \dots, u_{N-1}^\varepsilon\}. \]
Then $u^\varepsilon \in \mathcal{U}$.
Similar to (4), it is easy to check that
\[ E\sum_{k=0}^{N} |X_k^\varepsilon - X_k^*|^2 \le C \varepsilon^2 E\sum_{k=0}^{N-1} |\nu_k|^2, \]
where $C$ is a positive constant independent of $\varepsilon$.
The variational equation
Let us introduce the variational equation
\[ Z_{k+1} = \big[ b_x^\top(k) Z_k + b_v^\top(k) \nu_k \big] + \sum_{i=1}^{d} \big[ (\sigma_x^i(k))^\top Z_k + (\sigma_v^i(k))^\top \nu_k \big] W_k^i, \quad k \in \mathcal{T}, \qquad Z_0 = 0, \tag{5} \]
where $\sigma = (\sigma^1, \dots, \sigma^d)$.
Eq. (5) admits a unique solution $Z = \{Z_k, k \in \bar{\mathcal{T}}\}$ such that $Z_k \in L^2(\Omega, \mathcal{F}_k, P; \mathbb{R}^n)$ for each $k$. Moreover, there exists $C > 0$ such that
\[ E\sum_{k=1}^{N} |Z_k|^2 \le C E\sum_{k=0}^{N-1} |\nu_k|^2. \]
As a linear stochastic difference equation, the variational equation (5) can be explicitly solved. In fact, (5) is equivalent to
\[ Z_{k+1} = M_k Z_k + R_k, \quad k \in \mathcal{T}; \qquad Z_0 = 0, \]
where
\[ M_k = b_x^\top(k) + \sum_{i=1}^{d} (\sigma_x^i(k))^\top W_k^i, \qquad R_k = b_v^\top(k) \nu_k + \sum_{i=1}^{d} (\sigma_v^i(k))^\top W_k^i \nu_k. \]
Then induction yields
\[ Z_k = \sum_{i=0}^{k-1} M_i^{k-1} R_i, \tag{6} \]
where $M_i^j = M_j M_{j-1} \cdots M_{i+1}$ when $i < j$, and $M_i^j = I_n$ when $i = j$.
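The product formula (6) can be checked directly against the recursion. Below is a small sketch with hypothetical matrix data standing in for $M_k$ and $R_k$ (any fixed values work, since (6) is a pathwise identity):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 6, 3

# Hypothetical data for the linear recursion Z_{k+1} = M_k Z_k + R_k, Z_0 = 0.
M = rng.standard_normal((N, n, n))
R = rng.standard_normal((N, n))

# Direct recursion up to Z_N.
Z = np.zeros(n)
for k in range(N):
    Z = M[k] @ Z + R[k]

def M_prod(i, j):
    """M_i^j = M_j M_{j-1} ... M_{i+1}; the identity when i == j."""
    P = np.eye(n)
    for m in range(j, i, -1):
        P = P @ M[m]
    return P

# Closed form (6): Z_N = sum_{i=0}^{N-1} M_i^{N-1} R_i.
Z_formula = sum(M_prod(i, N - 1) @ R[i] for i in range(N))
print(np.allclose(Z, Z_formula))  # True
```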
The variational inequality
Lemma 2. Under (H1) and (H2), it holds that
\[ LJ(u^*) := E\Big\{ \sum_{k=0}^{N-1} \big[ \langle l_x(k), Z_k \rangle + \langle l_v(k), \nu_k \rangle \big] + \langle h_x(X_N^*), Z_N \rangle \Big\} \ge 0. \]
The proof is similar to that of the continuous-time counterpart.
Define the Hamiltonian $H$ by
\[ H(k, x, v, p, q) = \langle b(k, x, v), p \rangle + \sum_{i=1}^{d} \langle \sigma^i(k, x, v), q^i \rangle + l(k, x, v) \]
with $q = (q^1, \dots, q^d)$. Note that $\sum_{i=1}^{d} \langle \sigma^i, q^i \rangle = \mathrm{tr}\big[\sigma^\top q\big]$.
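The trace identity used above is elementary linear algebra; a one-line numerical sketch (with arbitrary hypothetical matrices) confirms it:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3

# sigma and q are n x d matrices whose columns are sigma^i and q^i.
sigma = rng.standard_normal((n, d))
q = rng.standard_normal((n, d))

lhs = sum(sigma[:, i] @ q[:, i] for i in range(d))  # sum_i <sigma^i, q^i>
rhs = np.trace(sigma.T @ q)                         # tr[sigma^T q]
print(np.isclose(lhs, rhs))  # True
```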
The adjoint equation
Introduce the following adjoint equation:
\[
\begin{cases}
P_k = E\big[ H_x(k+1, X_{k+1}^*, u_{k+1}^*, P_{k+1}, Q_{k+1}) \mid \mathcal{F}_k \big], \\
Q_k = E\big[ H_x(k+1, X_{k+1}^*, u_{k+1}^*, P_{k+1}, Q_{k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[h_x(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E[h_x(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}].
\end{cases} \tag{7}
\]
This adjoint equation, introduced first by Lin and Zhang (IEEE TAC, 2015), is a kind of backward stochastic difference equation, which is quite different from the continuous-time counterpart.
Note that Cohen and Elliott (SPA, 2010; SICON, 2011) also studied one kind of backward stochastic difference equation and obtained an existence and uniqueness result. However, our adjoint equation is different from the equations studied in those two references. On the one hand, they have different forms. On the other hand, our adjoint equation arises naturally as the dual equation of the variational equation, while, in our opinion, the equations studied in those two references are not appropriate to serve as adjoint equations for Problem (OCP).
The main concern with the adjoint equation (7) is the integrability of its solution $\{(P_k, Q_k), k \in \mathcal{T}\}$.
For later use we need these integrability conditions:
\[
\begin{cases}
E[|P_k| + |Q_k|] < \infty, & k \in \mathcal{T}, \\
E\big[(|P_{k+1}| + |Q_{k+1}|)|W_k|\big] < \infty, & k = 0, \dots, N-2, \\
E\big[(|P_k| + |Q_k|)|X_k| + (|P_k| + |Q_k|)|u_k|\big] < \infty, & k \in \mathcal{T}.
\end{cases} \tag{8}
\]
Let us point out that the first two integrability conditions in (8) come from the adjoint equation itself, while the third is needed in the proofs of the necessary as well as the sufficient MP.
The classical square-integrability conditions on the noises and admissible controls are insufficient for our purpose.
- If the $W_k$ are only square-integrable ($\alpha = 2$), then $X_k$, $1 \le k \le N$, are at most square-integrable even when the $u_k$ are bounded. In this case $(P_k, Q_k)$, $1 \le k \le N-2$, are not square-integrable, so $E[(|P_{k+1}| + |Q_{k+1}|)|W_k|]$ and $E[(|P_k| + |Q_k|)|X_k|]$ are not well defined. Hence a stronger integrability condition on $W_k$ is needed.
- If the $u_k$ are only square-integrable ($\beta = 2$) and the $W_k$ are not bounded, then $X_k$, $1 \le k \le N$, are at most square-integrable, and $(P_k, Q_k)$, $1 \le k \le N-2$, are still not square-integrable. Thus $E[(|P_k| + |Q_k|)|X_k|]$ and $E[(|P_k| + |Q_k|)|u_k|]$ are not well defined. Hence a stronger integrability condition on $u_k$ is needed.
Based on the previous discussion, let us assume moreover
(H3) $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$.
Lemma 3. Under (H1)-(H3), the integrability conditions in (8) hold true.
Proof. As a corollary of Hölder's inequality or Young's inequality, the following assertion holds: if $\xi \in L^a$ and $\eta \in L^b$ with $a, b > 0$, then $\xi\eta \in L^{\frac{ab}{a+b}}$, and $\xi\eta \in L^1$ if $\frac{1}{a} + \frac{1}{b} \le 1$, i.e. $\frac{ab}{a+b} \ge 1$.
Denote $\theta_k = \frac{\alpha\gamma}{\alpha + \gamma(N-1-k)}$ for $k \in \mathcal{T}$. Note that $\theta_k$ is increasing in $k$ and $\theta_k \le \gamma$.
Let us use induction to show that
\[ |P_k| \in L^{\theta_k}, \quad |Q_k| \in L^{\theta_{k-1}}, \quad k = 1, \dots, N-1. \tag{9} \]
Since $|h_x(X_N^*)| \le C(1 + |X_N^*|)$, $|X_N^*| \in L^\gamma$ and $|W_{N-1}| \in L^\alpha$, we have $|P_{N-1}| \in L^\gamma$ and $|Q_{N-1}| \in L^{\frac{\alpha\gamma}{\alpha+\gamma}}$. Thus (9) holds for $k = N-1$.
Let us use induction to proceed. Assume
\[ |P_{j+1}| \in L^{\theta_{j+1}}, \quad |Q_{j+1}| \in L^{\theta_j} \tag{10} \]
for some $j = 1, \dots, N-2$. Firstly, note that
\[ |P_j| \le C E\big[ \big(1 + |X_{j+1}^*| + |u_{j+1}^*| + |P_{j+1}| + |Q_{j+1}|\big) \mid \mathcal{F}_j \big]. \]
Since $|X_{j+1}^*| \in L^\gamma$, $|u_{j+1}^*| \in L^\beta$, $|P_{j+1}| \in L^{\theta_{j+1}}$, $|Q_{j+1}| \in L^{\theta_j}$, and $\theta_j < \theta_{j+1} \le \gamma \le \beta$, it follows that $P_j \in L^{\theta_j}$. Secondly, we have
\[ |Q_j| \le C E\big[ \big(1 + |X_{j+1}^*| + |u_{j+1}^*| + |P_{j+1}| + |Q_{j+1}|\big) |W_j| \mid \mathcal{F}_j \big]. \]
Note that $|W_j| \in L^\alpha$, $|X_{j+1}^*||W_j| \in L^{\frac{\alpha\gamma}{\alpha+\gamma}}$, $|u_{j+1}^*||W_j| \in L^{\frac{\alpha\beta}{\alpha+\beta}}$, $|P_{j+1}||W_j| \in L^{\theta_j}$, $|Q_{j+1}||W_j| \in L^{\theta_{j-1}}$. Since $\theta_{j-1} < \theta_j \le \frac{\alpha\gamma}{\alpha+\gamma} \le \frac{\alpha\beta}{\alpha+\beta} < \alpha$, we have $|Q_j| \in L^{\theta_{j-1}}$.
With these preparations, the integrability conditions in (8) hold if
\[ |Q_1||W_0|, \ |Q_1||X_1|, \ |Q_2||X_2| \in L^1. \]
It is easy to check that the above holds true if
\[ \gamma(\alpha - N + 1) \ge 2\alpha. \tag{11} \]
Since $\gamma = \min\{\alpha, \beta\}$, (11) is equivalent to
\[ \alpha > N - 1, \qquad \alpha(\alpha - N + 1) \ge 2\alpha, \qquad \beta(\alpha - N + 1) \ge 2\alpha. \]
Solving it yields
\[ \alpha \ge N + 1, \qquad \beta \ge 2\alpha(\alpha - N + 1)^{-1}. \]
Thus, it is proper and sufficient to choose $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$, which is just (H3).
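The elementary arithmetic behind (H3) can be checked directly: with $\alpha \ge N + 1$ and $\beta = 2\alpha(\alpha - N + 1)^{-1}$ one always has $\beta \le \alpha$, so $\gamma = \beta$ and (11) holds with equality. A quick numerical sketch (the value of $N$ is arbitrary):

```python
# Sanity check of (H3): with alpha >= N + 1 and beta = 2*alpha/(alpha - N + 1),
# we get beta <= alpha, hence gamma = beta, and gamma*(alpha - N + 1) = 2*alpha,
# which is condition (11) with equality.
N = 5
for alpha in [N + 1, N + 2, N + 5, 4 * N]:
    beta = 2 * alpha / (alpha - N + 1)
    gamma = min(alpha, beta)
    assert beta <= alpha                                   # so gamma = beta
    assert abs(gamma * (alpha - N + 1) - 2 * alpha) < 1e-9
print("(11) holds with equality under (H3)")
```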
Remark. (i) The proof of Lemma 3 shows why we need, and how we worked out, the assumption (H3). Under the preconditions (H1)-(H2), (H3) is almost the minimal condition that ensures (8).
(ii) Lemma 3 guarantees that the adjoint equation is well defined with a unique solution $\{(P_k, Q_k), k \in \mathcal{T}\}$, and that all the expectations involving $(P_k, Q_k)$, $k \in \mathcal{T}$, in this talk are well defined.
(iii) One would expect that both $\alpha$ and $\beta$ could be small; however, that is not the case. In fact, (H3) implies $2 < \beta \le \alpha$; in addition, the smaller $\beta$ is, the larger $\alpha$ must be.
Lemma 4. Under (H1)-(H3), the variational equation (5) and the adjoint equation (7) admit the following duality:
\[ E\sum_{k=0}^{N-1} \Big\langle b_v(k) P_k + \sum_{j=1}^{d} \sigma_v^j(k) Q_k^j, \ \nu_k \Big\rangle = E\sum_{k=1}^{N} \langle l_x(k), Z_k \rangle, \tag{12} \]
where we use the convention $l_x(N) := h_x(X_N^*)$.
In fact, the adjoint equation (7) can be solved explicitly to get
\[ P_k = \sum_{i=k+1}^{N} E\big[ (M_k^{i-1})^\top l_x(i) \mid \mathcal{F}_k \big], \qquad Q_k = \sum_{i=k+1}^{N} E\big[ (M_k^{i-1})^\top l_x(i) W_k^\top \mid \mathcal{F}_k \big]. \tag{13} \]
Then Lemma 4 can be proved by interchanging the order of summation and making use of the properties of conditional expectations.
Lemma 4 shows the main role played by $(P, Q)$; (13) is the origin of the adjoint equation (7).
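The duality (12) can be checked numerically on a small example. The sketch below uses made-up scalar linear coefficients, Rademacher noises $W_k \in \{-1, +1\}$ (so every conditional expectation is an exact finite average over the $2^N$ equally likely paths), $l(k,x,v) = x^2/2$, $h(x) = x^2/2$, and the convention $l_x(N) = h_x(X_N^*)$. It simulates the state, the variational equation (5), and the adjoint recursion (7), then compares the two sides of (12):

```python
import itertools
import numpy as np

# Hypothetical scalar model: b = bx*x + bv*v, sigma = sx*x + sv*v,
# l(k,x,v) = x^2/2 and h(x) = x^2/2, so l_x(k) = X*_k and h_x(X*_N) = X*_N.
N = 3
bx, bv, sx, sv = 0.9, 0.5, 0.2, 0.3
paths = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))  # all noise paths

def cond_exp(Y, k):
    """Exact E[Y | F_k]: average over the paths sharing the first k noise values."""
    out = np.empty_like(Y)
    for p in range(len(paths)):
        same = (paths[:, :k] == paths[p, :k]).all(axis=1)
        out[p] = Y[same].mean()
    return out

n_paths = len(paths)
X = np.zeros((n_paths, N + 1)); X[:, 0] = 1.0   # the state X*
Z = np.zeros((n_paths, N + 1))                  # variational process, Z_0 = 0
u = np.zeros((n_paths, N))                      # F_k-measurable control u*_k
nu = np.zeros((n_paths, N))                     # F_k-measurable perturbation nu_k
for k in range(N):
    u[:, k] = 0.1 * k + 0.05 * paths[:, :k].sum(axis=1)
    nu[:, k] = 0.2 + 0.1 * paths[:, :k].sum(axis=1)
    w = paths[:, k]
    X[:, k + 1] = bx * X[:, k] + bv * u[:, k] + (sx * X[:, k] + sv * u[:, k]) * w
    Z[:, k + 1] = (bx + sx * w) * Z[:, k] + (bv + sv * w) * nu[:, k]

# Backward adjoint recursion (7); G carries H_x(k+1, ...) resp. h_x(X*_N).
P = np.zeros((n_paths, N)); Q = np.zeros((n_paths, N))
G = X[:, N].copy()                               # h_x(X*_N)
for k in range(N - 1, -1, -1):
    P[:, k] = cond_exp(G, k)
    Q[:, k] = cond_exp(G * paths[:, k], k)
    if k >= 1:
        G = bx * P[:, k] + sx * Q[:, k] + X[:, k]  # H_x(k, ...), l_x(k) = X*_k

lhs = np.mean(sum((bv * P[:, k] + sv * Q[:, k]) * nu[:, k] for k in range(N)))
rhs = np.mean(sum(X[:, k] * Z[:, k] for k in range(1, N)) + X[:, N] * Z[:, N])
print(np.isclose(lhs, rhs))  # True: both sides of (12) agree
```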
Necessary MP
Theorem 1. Assume (H1)-(H3). If $u^* = \{u_k^*, k \in \mathcal{T}\}$ is an optimal control of Problem (OCP), then it satisfies
\[ \langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle \ge 0, \quad \forall v \in U_k, \ k \in \mathcal{T}. \tag{14} \]
Proof. By (12), the variational inequality becomes
\[ E\sum_{i=0}^{N-1} \langle H_v(i, X_i^*, u_i^*, P_i, Q_i), \mu_i - u_i^* \rangle \ge 0, \quad \forall \mu = \{\mu_i, i \in \mathcal{T}\} \in \mathcal{U}. \]
For any $k \in \mathcal{T}$, $v \in U_k$ and $A \in \mathcal{F}_k$, define $\mu_k = v\chi_A + u_k^*\chi_{A^c}$, and $\mu_i = u_i^*$ if $i \ne k$. Then $\mu \in \mathcal{U}$. Applying the last inequality gives
\[ E\big[ \langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle \chi_A \big] \ge 0. \]
Since $\langle H_v(k, X_k^*, u_k^*, P_k, Q_k), v - u_k^* \rangle$ is $\mathcal{F}_k$-measurable and $A \in \mathcal{F}_k$ is arbitrarily chosen, the result follows directly.
Sufficient MP
Theorem 2. Assume (H1)-(H3). Let $(u^*, X^*)$ be an admissible pair and $(P, Q)$ the solution of the adjoint equation (7). Assume that $h(\cdot)$ and $H(k, \cdot, \cdot, P_k, Q_k)$ are convex functions for each $k \in \mathcal{T}$. Then $u^*$ is an optimal control of Problem (OCP) if it satisfies (14).
Proof. For an admissible pair $(u, X)$, set $\tilde{b}(k) = b(k, X_k, u_k) - b(k, X_k^*, u_k^*)$ and $\tilde{\sigma}(k) = \sigma(k, X_k, u_k) - \sigma(k, X_k^*, u_k^*)$. Then
\[
J(u) - J(u^*) = E\sum_{k=0}^{N-1} \big[ H(k, X_k, u_k, P_k, Q_k) - H(k, X_k^*, u_k^*, P_k, Q_k) \big] - E\sum_{k=0}^{N-1} \Big[ \langle \tilde{b}(k), P_k \rangle + \sum_{j=1}^{d} \langle \tilde{\sigma}^j(k), Q_k^j \rangle \Big] + E\big[ h(X_N) - h(X_N^*) \big].
\]
Note that for the first two expectations to be well defined, the third integrability condition in (8) is needed.
If $h$ and $H$ are convex in $(x, v)$ and (14) holds, then
\[ J(u) - J(u^*) \ge E\sum_{k=1}^{N} \langle l_x(k), \tilde{X}_k \rangle - E\sum_{k=0}^{N-1} \Big[ \langle \alpha_k, P_k \rangle + \sum_{j=1}^{d} \langle \beta_k^j, Q_k^j \rangle \Big], \]
where $\tilde{X} = X - X^*$, $\alpha_k = \tilde{b}(k) - b_x^\top(k)\tilde{X}_k$ and $\beta_k^j = \tilde{\sigma}^j(k) - (\sigma_x^j(k))^\top \tilde{X}_k$. Similar to (6), we have
\[ \tilde{X}_k = \sum_{i=0}^{k-1} M_i^{k-1} \delta_i, \quad k \in \bar{\mathcal{T}}, \qquad \text{where } \delta_k = \alpha_k + \sum_{j=1}^{d} \beta_k^j W_k^j, \ k \in \mathcal{T}. \]
Finally, in view of (13), by interchanging the order of summation and using the properties of conditional expectations, we can show that the right-hand side of the last inequality is equal to zero.
Extensions to stochastic games: MP for discrete-time nonzero-sum game; MP for discrete-time zero-sum game
Let us consider a system described by
\[ X_{k+1} = b(k, X_k, u_{1,k}, u_{2,k}) + \sigma(k, X_k, u_{1,k}, u_{2,k}) W_k, \qquad X_0 = x_0 \in \mathbb{R}^n, \tag{15} \]
and introduce two cost functionals
\[ J_i(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l_i(k, X_k, u_{1,k}, u_{2,k}) + h_i(X_N) \Big], \quad i = 1, 2. \]
Player $i$ hopes to minimize the cost $J_i$ by selecting an appropriate control $u_i^*$, $i = 1, 2$. The problem is to find $(u_1^*, u_2^*) \in \mathcal{U}_1 \times \mathcal{U}_2$ such that
\[ J_1(u_1^*, u_2^*) = \inf_{u_1 \in \mathcal{U}_1} J_1(u_1, u_2^*), \qquad J_2(u_1^*, u_2^*) = \inf_{u_2 \in \mathcal{U}_2} J_2(u_1^*, u_2). \tag{16} \]
This is a discrete-time nonzero-sum stochastic game; we call it Problem (NZSG). Such a $(u_1^*, u_2^*)$ is called an equilibrium point.
Similar assumptions on the noises, the controls and the coefficients are supposed as in the previous section.
Let us define
\[ H_i = \langle b, p \rangle + \mathrm{tr}[\sigma^\top q] + l_i, \quad i = 1, 2. \]
Introduce the following adjoint equations for $i = 1, 2$:
\[
\begin{cases}
P_{i,k} = E\big[ H_{ix}(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{i,k+1}, Q_{i,k+1}) \mid \mathcal{F}_k \big], \\
Q_{i,k} = E\big[ H_{ix}(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{i,k+1}, Q_{i,k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{i,N-1} = E[h_{ix}(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{i,N-1} = E[h_{ix}(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}].
\end{cases}
\]
Note that Problem (NZSG) consists of two discrete-time stochastic optimal control problems of Problem (OCP)'s type.
Theorem 3. Suppose $(u_1^*, u_2^*)$ is an equilibrium point of Problem (NZSG). Then, for all $v_{i,k} \in U_{i,k}$, $k \in \mathcal{T}$, $i = 1, 2$,
\[ \langle H_{iu_i}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_{i,k}, Q_{i,k}), v_{i,k} - u_{i,k}^* \rangle \ge 0. \tag{17} \]
Theorem 4. Let $(u_1^*, u_2^*)$ be a pair of admissible controls for Problem (NZSG). Suppose that $H_1(k, \cdot, \cdot, u_{2,k}^*, P_{1,k}, Q_{1,k})$, $H_2(k, \cdot, u_{1,k}^*, \cdot, P_{2,k}, Q_{2,k})$, $h_1(\cdot)$ and $h_2(\cdot)$ are convex for each $k \in \mathcal{T}$. Then $(u_1^*, u_2^*)$ is an equilibrium point if it satisfies (17).
Next let us consider a discrete-time zero-sum stochastic game. We still use the system (15), where $u_1$ and $u_2$ are the controls of the two players. The discrete-time zero-sum stochastic game, Problem (ZSG), is to find $(u_1^*, u_2^*) \in \mathcal{U}_1 \times \mathcal{U}_2$ such that
\[ J(u_1^*, u_2) \le J(u_1^*, u_2^*) \le J(u_1, u_2^*), \quad \forall (u_1, u_2) \in \mathcal{U}_1 \times \mathcal{U}_2, \tag{18} \]
where
\[ J(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l(k, X_k, u_{1,k}, u_{2,k}) + h(X_N) \Big]. \]
Such a $(u_1^*, u_2^*)$ is called a saddle point.
By a transformation, Problem (ZSG) can be turned into a nonzero-sum stochastic game of Problem (NZSG)'s type. In fact, by defining
\[ l_1 = l, \quad h_1 = h, \quad J_1 = J, \qquad l_2 = -l, \quad h_2 = -h, \quad J_2 = -J, \]
Problem (ZSG) is equivalent to finding $(u_1^*, u_2^*)$ such that
\[ J_1(u_1^*, u_2^*) = \inf_{u_1 \in \mathcal{U}_1} J_1(u_1, u_2^*), \qquad J_2(u_1^*, u_2^*) = \inf_{u_2 \in \mathcal{U}_2} J_2(u_1^*, u_2), \]
where
\[ J_i(u_1, u_2) = E\Big[ \sum_{k=0}^{N-1} l_i(k, X_k, u_{1,k}, u_{2,k}) + h_i(X_N) \Big], \quad i = 1, 2. \]
Define the Hamiltonian by $H = \langle b, p \rangle + \mathrm{tr}\big[\sigma^\top q\big] + l$, and the adjoint equation by
\[
\begin{cases}
P_k = E\big[ H_x(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{k+1}, Q_{k+1}) \mid \mathcal{F}_k \big], \\
Q_k = E\big[ H_x(k+1, X_{k+1}^*, u_{1,k+1}^*, u_{2,k+1}^*, P_{k+1}, Q_{k+1}) W_k^\top \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[h_x(X_N^*) \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E\big[h_x(X_N^*) W_{N-1}^\top \mid \mathcal{F}_{N-1}\big].
\end{cases}
\]
By the necessary MP of Problem (NZSG) [see Theorem 3], we get the following necessary MP for Problem (ZSG).
Theorem 5. Suppose $(u_1^*, u_2^*)$ is a saddle point of Problem (ZSG). Then, for each $k \in \mathcal{T}$,
\[
\begin{cases}
\big\langle H_{u_1}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_k, Q_k), v_{1,k} - u_{1,k}^* \big\rangle \ge 0, & \forall v_{1,k} \in U_{1,k}, \\
\big\langle H_{u_2}(k, X_k^*, u_{1,k}^*, u_{2,k}^*, P_k, Q_k), v_{2,k} - u_{2,k}^* \big\rangle \le 0, & \forall v_{2,k} \in U_{2,k}.
\end{cases} \tag{19}
\]
By the sufficient MP of Problem (NZSG) [see Theorem 4], we get the following sufficient MP for Problem (ZSG).
Theorem 6. Let $(u_1^*, u_2^*)$ be a pair of admissible controls for Problem (ZSG). Assume that $H(k, \cdot, \cdot, u_{2,k}^*, P_k, Q_k)$ is convex, $H(k, \cdot, u_{1,k}^*, \cdot, P_k, Q_k)$ is concave, and $h(x) = Rx + \xi$ with $R$ a bounded $\mathcal{F}_N$-measurable random variable and $\xi$ an integrable $\mathcal{F}_N$-measurable random variable. Then $(u_1^*, u_2^*)$ is a saddle point if (19) holds.
Applications: Application to one investment/consumption choice problem; Application to one kind of discrete-time LQ problem
Example 1. Suppose there is a market in which $m + 1$ assets are traded at discrete times. The price $\{B_k, k \in \mathcal{T}\}$ of one risk-free asset satisfies
\[ B_{k+1} - B_k = r_k B_k, \quad k \in \mathcal{T}; \qquad B_0 = b_0. \tag{20} \]
The prices of $m$ risky assets, $\{S_{i,k}, k \in \mathcal{T}\}$, $i = 1, 2, \dots, m$, satisfy
\[ S_{i,k+1} - S_{i,k} = S_{i,k}(\mu_{i,k} + \sigma_{i,k} W_k), \quad k \in \mathcal{T}; \qquad S_{i,0} = s_i. \tag{21} \]
We now consider an investor whose wealth at time $k$ is $X_k$. Denote by $u_{i,k}$ the wealth invested in the $i$-th stock at time $k$, and call $u_k = (u_{1,k}, u_{2,k}, \dots, u_{m,k})^\top$ a portfolio of the investor at time $k$. Assume moreover that consumption occurs at each time $k \in \mathcal{T}$, and denote by $\{v_k, k \in \mathcal{T}\}$ the consumption sequence. Then
\[ X_{k+1} - X_k = \frac{X_k - \sum_{i=1}^{m} u_{i,k}}{B_k}(B_{k+1} - B_k) + \sum_{i=1}^{m} \frac{u_{i,k}}{S_{i,k}}(S_{i,k+1} - S_{i,k}) - v_k. \]
Thus the wealth dynamics satisfies
\[ X_{k+1} = \big[ (r_k + 1)X_k + b_k^\top u_k - v_k \big] + \sigma_k^\top u_k W_k, \qquad X_0 = x_0, \]
where $b_k = (\mu_{1,k} - r_k, \mu_{2,k} - r_k, \dots, \mu_{m,k} - r_k)^\top$ and $\sigma_k = (\sigma_{1,k}, \sigma_{2,k}, \dots, \sigma_{m,k})^\top$. Assume that $r, b, \sigma$ are bounded.
Let $\mathcal{U}$ be the set of $u = \{u_k, k \in \mathcal{T}\}$, where each $u_k$ is an $\mathcal{F}_k$-measurable, $\mathbb{R}^m$-valued random variable with the necessary integrability. Denote by $\mathcal{V}$ the collection of all $v = \{v_k, k \in \mathcal{T}\}$ such that each $v_k$ is real-valued and $\mathcal{F}_k$-measurable, $v_k \ge a$ for some constant $a > 0$, and $v$ has the necessary integrability.
The objective of the investor is to find $(u^*, v^*) \in \mathcal{U} \times \mathcal{V}$ such that
\[ J(u^*, v^*) = \inf_{(u,v) \in \mathcal{U} \times \mathcal{V}} J(u, v) \]
with
\[ J(u, v) = E\Big\{ \sum_{k=0}^{N-1} \Big[ \frac{1}{2} L_k |u_k - \eta_k|^2 - G_k \frac{(v_k)^{1-\delta}}{1-\delta} \Big] - R X_N \Big\}. \]
Assume that $L_k$, $G_k$ and $R$ are bounded scalar-valued random variables, $\eta_k$ is a bounded $\mathbb{R}^m$-valued random variable representing a benchmark, and $\delta$ is a constant representing the Arrow-Pratt index of risk aversion. Assume $0 < \delta < 1$, $G_k > 0$, $L_k > c$ and $R > c$ for some $c > 0$.
Note that the objective consists of three parts: the first minimizes the difference between the portfolio and a given benchmark, the second maximizes a utility of the consumption, and the last maximizes the terminal wealth.
Let (H1) and (H2) hold true. The Hamiltonian is defined by
\[ H = \big[ (r_k + 1)x + b_k^\top u - v \big] p + \sigma_k^\top u\, q + \frac{L_k}{2}|u - \eta_k|^2 - G_k \frac{v^{1-\delta}}{1-\delta}. \]
It is easily seen that $H$ and $h(x) = -Rx$ are convex functions of $(x, u, v)$. Applying Theorem 1 shows that the optimal control $(u^*, v^*)$ satisfies
\[ \langle b_k P_k + \sigma_k Q_k + L_k(u_k^* - \eta_k), u - u_k^* \rangle \ge 0, \quad \forall u \in \mathbb{R}^m, \tag{22} \]
\[ \big[ -P_k - G_k (v_k^*)^{-\delta} \big](v - v_k^*) \ge 0, \quad \forall v \ge a, \tag{23} \]
for each $k$, where $\{(P_k, Q_k), k \in \mathcal{T}\}$ uniquely solves
\[
\begin{cases}
P_k = E[(r_{k+1} + 1) P_{k+1} \mid \mathcal{F}_k], & k = 0, 1, \dots, N-2, \\
Q_k = E[(r_{k+1} + 1) P_{k+1} W_k \mid \mathcal{F}_k], & k = 0, 1, \dots, N-2, \\
P_{N-1} = -E[R \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = -E[R W_{N-1} \mid \mathcal{F}_{N-1}].
\end{cases} \tag{24}
\]
By Theorem 2, an admissible control $(u^*, v^*)$ satisfying (22) and (23) is indeed an optimal control.
From (22) we derive
\[ u_k^* = \eta_k - L_k^{-1}(b_k P_k + \sigma_k Q_k). \tag{25} \]
Since $\tilde{H}(v) := -P_k v - G_k \frac{v^{1-\delta}}{1-\delta}$ is a differentiable convex function, (23) implies that $\tilde{H}(v)$, $v \in [a, +\infty)$, attains its minimum at $v_k^*$. Note that $P_k < 0$, $k \in \mathcal{T}$. Then the solution of (23) is
\[ v_k^* = \begin{cases} \bar{v}_k, & \text{if } \bar{v}_k \ge a, \\ a, & \text{if } \bar{v}_k < a, \end{cases} \tag{26} \]
with $\bar{v}_k = \big( \frac{-G_k}{P_k} \big)^{\frac{1}{\delta}}$.
It is easy to see that each $P_k$ is bounded. Thus $Q_k \in L^\alpha$, and so $Q_k \in L^\beta$ since $\beta \le \alpha$. This implies that $u_k^*$ defined by (25) is indeed admissible. Using induction we can get $P_k < -c$ for $k \in \mathcal{T}$, which shows that $\bar{v}_k$ is bounded. Thus $v_k^*$ defined by (26) is bounded and hence admissible. Consequently, $(u_k^*, v_k^*)$ defined by (25) and (26) is the optimal strategy of this investment/consumption choice problem.
Eq. (24) can be solved explicitly in some special cases. If $\{r_k, k = 1, 2, \dots, N-1\}$ and $R$ are deterministic constants, and $E[W_k \mid \mathcal{F}_k] = 0$, $k \in \mathcal{T}$, then (24) is uniquely solved by $Q_k \equiv 0$ and
\[ P_k = \begin{cases} -R(r_{N-1} + 1)(r_{N-2} + 1) \cdots (r_{k+1} + 1), & 0 \le k \le N-2, \\ -R, & k = N-1. \end{cases} \]
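The deterministic special case of (24), and the resulting strategy (25)-(26), can be sketched numerically. All data below (rates, benchmark, risk aversion) are made up for illustration, with $m = 1$:

```python
import numpy as np

# Deterministic special case of (24): r_k and R constants, E[W_k | F_k] = 0,
# so Q_k = 0 and P_k = -R * prod_{j=k+1}^{N-1} (r_j + 1).
# All numbers are hypothetical (scalar case m = 1).
N = 4
r = np.array([0.05, 0.04, 0.03, 0.02])           # r_0, ..., r_{N-1}
R = 2.0

# Backward recursion P_k = (r_{k+1} + 1) P_{k+1}, P_{N-1} = -R.
P = np.zeros(N)
P[N - 1] = -R
for k in range(N - 2, -1, -1):
    P[k] = (r[k + 1] + 1.0) * P[k + 1]

# Closed form for comparison.
P_closed = np.array([-R * np.prod(r[k + 1:] + 1.0) for k in range(N)])
assert np.allclose(P, P_closed)

# Optimal portfolio (25) and consumption (26), with Q_k = 0 and hypothetical
# benchmark eta, weights L, G, consumption floor a and risk aversion delta.
eta, L, G, b, sigma = 1.0, 2.0, 1.5, 0.08, 0.3
a, delta = 0.5, 0.5
u_star = eta - (b * P + sigma * 0.0) / L          # (25)
v_bar = (-G / P) ** (1.0 / delta)                 # unconstrained minimizer
v_star = np.maximum(v_bar, a)                     # (26)
print(np.all(v_star >= a))  # True: consumption respects the floor
```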
Example 2. Let us consider one kind of discrete-time LQ stochastic optimal control problem in the case where $d = 1$, $E[W_k \mid \mathcal{F}_k] = 0$ and $U_k = \mathbb{R}^m$ for each $k \in \mathcal{T}$.
Let us define
\[ b(k, x, v) = A_k x + B_k v + C_k, \qquad \sigma(k, x, v) = D_k x + E_k v + F_k, \]
\[ l(k, x, v) = \big[ \langle G_k x, x \rangle + \langle L_k v, v \rangle \big]/2, \qquad h(x) = \langle Rx, x \rangle, \]
where the coefficients are deterministic and bounded matrices. Moreover, $G_k, R \ge 0$ and $L_k > 0$ for each $k$.
By Theorems 1 and 2, $u_k^* = -L_k^{-1}(B_k^\top P_k + E_k^\top Q_k)$ is an optimal control of this LQ problem if it is admissible, where $(P, Q)$ solves
\[
\begin{cases}
P_k = E\big[ A_{k+1}^\top P_{k+1} + D_{k+1}^\top Q_{k+1} + G_{k+1} X_{k+1}^* \mid \mathcal{F}_k \big], \\
Q_k = E\big[ \big(A_{k+1}^\top P_{k+1} + D_{k+1}^\top Q_{k+1} + G_{k+1} X_{k+1}^*\big) W_k \mid \mathcal{F}_k \big], \qquad k = 0, 1, \dots, N-2, \\
P_{N-1} = E[R X_N^* \mid \mathcal{F}_{N-1}], \quad Q_{N-1} = E[R X_N^* W_{N-1} \mid \mathcal{F}_{N-1}].
\end{cases}
\]
Let us give a state-feedback form of $u^*$ in the simple case where $D_k = 0$ and $E_k = 0$ for each $k \in \mathcal{T}$. In this case we have
\[ u_k^* = -L_k^{-1} B_k^\top P_k, \tag{27} \]
where
\[
\begin{cases}
P_k = E\big[ A_{k+1}^\top P_{k+1} + G_{k+1} X_{k+1}^* \mid \mathcal{F}_k \big], & k = 0, 1, \dots, N-2, \\
P_{N-1} = E[R X_N^* \mid \mathcal{F}_{N-1}].
\end{cases}
\]
To conclude, we only need to express $P_k$ in terms of $X_k^*$.
Since the diffusion term is independent of $x$ and $v$, and $Q_k$ is not needed, the square-integrability issue disappears in this case.
By Lemma 4, we have
\[
\begin{cases}
P_k = (A^{N-1}_k)^{\top}R\,E[X^*_N|\mathcal{F}_k] + \displaystyle\sum_{j=k+1}^{N-1}(A^{j-1}_k)^{\top}G_j\,E[X^*_j|\mathcal{F}_k], \quad k = 0, \cdots, N-2,\\
P_{N-1} = R\,E[X^*_N|\mathcal{F}_{N-1}],
\end{cases} \tag{28}
\]
where \(A^j_i = A_j A_{j-1} \cdots A_{i+1}\) if i < j, and \(A^j_i = I_n\) if i = j.
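The products \(A^j_i\) are easy to form with a small helper. The sketch below (plain-Python 2×2 matrices with hypothetical entries) also checks the convention \(A^i_i = I_n\) and the splitting \(A^j_i = A^j_m A^m_i\) for i ≤ m ≤ j, which follows directly from the definition:

```python
# Helper for the transition products A^j_i = A_j A_{j-1} ... A_{i+1},
# with A^i_i = I_n; the matrices A_1, ..., A_4 are hypothetical data.
def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = {k: [[1.0, 0.1 * k], [0.0, 1.0 + 0.05 * k]] for k in range(1, 5)}

def A_prod(j, i):
    """A^j_i = A_j A_{j-1} ... A_{i+1}; identity when i == j."""
    out = I2
    for s in range(j, i, -1):  # left-multiply so A_j ends up leftmost
        out = matmul(out, A[s])
    return out
```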
The subsequent idea is: \(P_k \Longleftarrow E[X^*_{k+1}|\mathcal{F}_k] \Longleftarrow X^*_k\). Let us assume
\[
P_k = \alpha_k\,E[X^*_{k+1}|\mathcal{F}_k] + \beta_k, \quad k \in \mathcal{T}, \tag{29}
\]
where α_k, β_k, k ∈ T, are deterministic matrices to be determined later. Then by (27), we have
\[
u^*_k = -L_k^{-1}B_k^{\top}\big(\alpha_k\,E[X^*_{k+1}|\mathcal{F}_k] + \beta_k\big). \tag{30}
\]
By substituting (30) into the state equation and taking the conditional expectation, we get
\[
\Phi_k\,E[X^*_{k+1}|\mathcal{F}_k] = A_k X^*_k + \Psi_k, \quad k \in \mathcal{T},
\]
where \(\Phi_k = I_n + B_k L_k^{-1} B_k^{\top}\alpha_k\) and \(\Psi_k = C_k - B_k L_k^{-1} B_k^{\top}\beta_k\).

Let us assume
\[
\Phi_k^{-1} \text{ exists and is bounded}, \quad \forall k \in \mathcal{T}. \tag{31}
\]
Then
\[
E[X^*_{k+1}|\mathcal{F}_k] = \varphi_k X^*_k + \psi_k, \quad k \in \mathcal{T}, \tag{32}
\]
with
\[
\varphi_k = \Phi_k^{-1}A_k, \qquad \psi_k = \Phi_k^{-1}\Psi_k. \tag{33}
\]
Consequently, by (30) and (32) we get
\[
u^*_k = -L_k^{-1}B_k^{\top}\alpha_k\varphi_k X^*_k - L_k^{-1}B_k^{\top}(\alpha_k\psi_k + \beta_k). \tag{34}
\]
This is the feedback form. It remains to determine (α_k, β_k).

By (32), we can use induction to get
\[
E[X^*_i|\mathcal{F}_k] = \varphi^{i-1}_k\,E[X^*_{k+1}|\mathcal{F}_k] + \sum_{j=k+1}^{i-1}\varphi^{i-1}_j\psi_j, \quad i \ge k+2, \tag{35}
\]
where \(\varphi^j_i\) is defined by \(\varphi^j_i = \varphi_j\varphi_{j-1}\cdots\varphi_{i+1}\) if i < j, and \(\varphi^j_i = I_n\) if i = j. That is, we can express \(E[X^*_i|\mathcal{F}_k]\), i ≥ k+2, in terms of \(E[X^*_{k+1}|\mathcal{F}_k]\) only.
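Formula (35) is just the k-to-i iteration of the affine map in (32). In a scalar, noise-free toy setting (where conditional expectations reduce to plain values; the φ_j, ψ_j values below are hypothetical), it can be checked directly:

```python
# Check of (35) in a scalar toy setting: iterate x_{j+1} = phi_j x_j + psi_j
# and compare with the closed form; phi, psi values are hypothetical.
N = 6
phi = {j: 0.9 + 0.02 * j for j in range(N)}
psi = {j: 0.5 - 0.05 * j for j in range(N)}

def phi_prod(j, i):
    """phi^j_i = phi_j phi_{j-1} ... phi_{i+1}; equals 1 when i == j."""
    out = 1.0
    for s in range(i + 1, j + 1):
        out *= phi[s]
    return out

k, i = 1, 5
x = {k + 1: 2.0}                       # plays the role of E[X*_{k+1} | F_k]
for j in range(k + 1, i):
    x[j + 1] = phi[j] * x[j] + psi[j]  # iterate (32)

# Right-hand side of (35):
rhs = phi_prod(i - 1, k) * x[k + 1] + sum(
    phi_prod(i - 1, j) * psi[j] for j in range(k + 1, i))
```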
Taking i = N and k = N−2 in (35) gives
\[
E[X^*_N|\mathcal{F}_{N-2}] = \varphi_{N-1}\,E[X^*_{N-1}|\mathcal{F}_{N-2}] + \psi_{N-1}. \tag{36}
\]
By (28) we have
\[
P_{N-2} = \big[(A_{N-1})^{\top}R\varphi_{N-1} + G_{N-1}\big]E[X^*_{N-1}|\mathcal{F}_{N-2}] + (A_{N-1})^{\top}R\psi_{N-1}.
\]
Recall from (28) that P_k satisfies
\[
\begin{cases}
P_k = (A^{N-1}_k)^{\top}R\,E[X^*_N|\mathcal{F}_k] + \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\,E[X^*_j|\mathcal{F}_k]\\
\qquad\quad + G_{k+1}\,E[X^*_{k+1}|\mathcal{F}_k], \quad k = 0, 1, \cdots, N-3,\\
P_{N-2} = \big[(A_{N-1})^{\top}R\varphi_{N-1} + G_{N-1}\big]E[X^*_{N-1}|\mathcal{F}_{N-2}] + (A_{N-1})^{\top}R\psi_{N-1},\\
P_{N-1} = R\,E[X^*_N|\mathcal{F}_{N-1}].
\end{cases} \tag{37}
\]
Next, substituting (29) and (35) into (37) and comparing the coefficients leads to
\[
\begin{cases}
\alpha_{N-1} = R, \quad \beta_{N-1} = 0,\\
\alpha_{N-2} = G_{N-1} + (A_{N-1})^{\top}R\varphi_{N-1}, \quad \beta_{N-2} = (A_{N-1})^{\top}R\psi_{N-1},\\
\alpha_k = G_{k+1} + \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\varphi^{j-1}_k + (A^{N-1}_k)^{\top}R\varphi^{N-1}_k,\\
\beta_k = \displaystyle\sum_{j=k+2}^{N-1}(A^{j-1}_k)^{\top}G_j\sum_{s=k+1}^{j-1}\varphi^{j-1}_s\psi_s + (A^{N-1}_k)^{\top}R\sum_{j=k+1}^{N-1}\varphi^{N-1}_j\psi_j,\\
\qquad k = 0, 1, \cdots, N-3.
\end{cases} \tag{38}
\]
Note that (α_k, β_k), k ∈ T, defined above are bounded.
Finally, the conclusion is as follows. Let (φ_k, ψ_k) be defined by (33) and (α_k, β_k), k ∈ T, be defined by (38). If (31) holds, then u^* defined by (34) is the admissible optimal control of this LQ problem.
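To see the whole scheme close up numerically, here is a scalar sketch (m = n = 1, all coefficients hypothetical, noise switched off so that conditional expectations coincide with actual values, and assuming the controlled dynamics take the form X_{k+1} = A_k X_k + B_k u_k + C_k plus noise, which is consistent with the relation Φ_k E[X*_{k+1}|F_k] = A_k X*_k + Ψ_k above): compute (α_k, β_k) backward via (38) interleaved with (φ_k, ψ_k) from (33), run the closed-loop state under the feedback (34), and check that the ansatz (29) reproduces the representation (28).

```python
# Scalar (m = n = 1) sketch of the backward scheme (33), (38) and the
# feedback (34); all coefficients are hypothetical, noise is set to zero.
N = 5
A = {k: 1.05 for k in range(N)}   # A_k
B = {k: 0.5 for k in range(N)}    # B_k
C = {k: 0.2 for k in range(N)}    # C_k
G = {k: 1.0 for k in range(N)}    # G_k >= 0
L = {k: 2.0 for k in range(N)}    # L_k > 0
R = 3.0                           # R >= 0

def prod(d, j, i):
    """d^j_i = d_j d_{j-1} ... d_{i+1}; equals 1 when i == j."""
    out = 1.0
    for s in range(i + 1, j + 1):
        out *= d[s]
    return out

# Backward pass: alpha_k, beta_k by (38), then phi_k, psi_k by (33).
alpha, beta, phi, psi = {}, {}, {}, {}
for k in range(N - 1, -1, -1):
    if k == N - 1:
        alpha[k], beta[k] = R, 0.0
    else:
        alpha[k] = (G[k + 1]
                    + sum(prod(A, j - 1, k) * G[j] * prod(phi, j - 1, k)
                          for j in range(k + 2, N))
                    + prod(A, N - 1, k) * R * prod(phi, N - 1, k))
        beta[k] = (sum(prod(A, j - 1, k) * G[j]
                       * sum(prod(phi, j - 1, s) * psi[s]
                             for s in range(k + 1, j))
                       for j in range(k + 2, N))
                   + prod(A, N - 1, k) * R
                   * sum(prod(phi, N - 1, j) * psi[j]
                         for j in range(k + 1, N)))
    Phi = 1.0 + B[k] ** 2 * alpha[k] / L[k]   # Phi_k; (31) holds here
    phi[k] = A[k] / Phi
    psi[k] = (C[k] - B[k] ** 2 * beta[k] / L[k]) / Phi

# Forward pass: closed-loop state under the feedback (34).
X = {0: 1.0}
for k in range(N):
    u = -B[k] / L[k] * (alpha[k] * phi[k] * X[k] + alpha[k] * psi[k] + beta[k])
    X[k + 1] = A[k] * X[k] + B[k] * u + C[k]

# Direct evaluation of (28) with expectations replaced by actual values.
def P_direct(k):
    return (prod(A, N - 1, k) * R * X[N]
            + sum(prod(A, j - 1, k) * G[j] * X[j] for j in range(k + 1, N)))
```

The two assertions below confirm (i) that the closed loop reproduces (32), i.e. X_{k+1} = φ_k X_k + ψ_k, and (ii) that the ansatz (29), P_k = α_k X_{k+1} + β_k, matches (28) at every k, so the derivation chain closes in this toy case.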
Conclusions.

One kind of discrete-time stochastic optimal control problem is studied. The main result establishes the necessary as well as sufficient stochastic maximum principle in a rigorous and clear way. A difficulty arising from the lack of necessary integrability of the solution to the adjoint equation is overcome. The results are extended to two kinds of discrete-time stochastic games. Two applications are studied, for which the optimal controls are derived. This research paves the way for further related works.
More extensions.

general control domain case

different kinds of control systems, e.g., delayed systems, mean-field systems, ...

state/control constrained cases

partial information / partial observation cases
Thank You!