
Dynamic programming · ados/Bellman/Bellman.pdf · 2017. 6. 10.


CHAPTER 1

Dynamic programming

Dynamic programming is an algorithmic method which enables one to solve a certain class of problems by an induction argument which reduces them to simpler sub-problems. Or, to put it in the reverse direction, this approach allows one to tackle difficult problems by solving simpler ones first and relating these solutions to the harder context by intrinsic recurrence relations. It plays an important role in computer science, as it can be used to construct effective algorithms of polynomial complexity. Furthermore, the method is utilized in optimal planning problems (e.g. in problems of the optimal distribution of available resources, the theory of inventory management, replacement of equipment, and so on). From the viewpoint of our further purposes, we would like to emphasize that dynamic programming lies at the heart of the Bellman function technique which will be developed later.

The basic idea behind dynamic programming can be formulated as follows. Suppose that a given system S is controlled with a procedure which consists of N steps, where N is a fixed positive integer. At the beginning, the system is in some state x_0. At the m-th step of the procedure, there is a possibility of applying a class of controls, each of which transforms the state x_{m-1} obtained through the previous operations into some new state x_m. Formally, if y_m denotes the control applied at the m-th step, then we have the identity

x_m = f_m(x_{m-1}, y_m)

for some f_m (the transition function associated with the m-th step). An important feature of the method is the absence of after-effects, i.e., the controls selected for a given step may only affect the state of the system at that moment.

As the result of the whole procedure y_1, y_2, ..., y_N, the system is converted from the state x_0 into the final state x_N. Now the problem can be stated as follows: given the initial state x_0 and a fixed objective function

F(x_0, x_1, ..., x_N, y_1, y_2, ..., y_N) = ∑_{m=1}^{N} g_m(x_{m-1}, y_m),

identify the controls y*_1, y*_2, ..., y*_N which yield the optimal performance, i.e., such that we have

F(x_0, x_1, ..., x_N, y*_1, y*_2, ..., y*_N) = sup_{y_1, y_2, ..., y_N} F(x_0, x_1, ..., x_N, y_1, y_2, ..., y_N).

Conventional methods of tackling such problems are often either inapplicable or involve lengthy and elaborate calculations. Dynamic programming allows one to solve the above problem by a recursive formula, which rests on the so-called optimality principle of Richard Bellman. For simplicity, let us assume that the optimal controls y*_1, y*_2, ..., y*_N exist (i.e., the optimal performance is actually attained, not merely in the limit). Suppose that for some 1 ≤ n ≤ N, we have chosen certain controls y_1, y_2, ..., y_n up to step n, transforming the system from the state x_0 to the state x_n. So, to conclude the procedure, we need to select the remaining controls y_{n+1}, y_{n+2}, ..., y_N. Then, if the part of F corresponding to these remaining controls is not optimized, that is, if the maximum of the function

F_n(x_n, x_{n+1}, ..., x_N, y_{n+1}, ..., y_N) = ∑_{m=n+1}^{N} g_m(x_{m-1}, y_m)

is not attained, then the procedure as a whole will not be optimal. Now, for any n = 0, 1, 2, ..., N and any state x, define B_n(x) to be the maximal possible value of the value function F_n, assuming that after the n-th step the system is in the state x. Then B_N(x) = 0, and the optimality principle explained above yields the identity

(1.1)    B_{n-1}(x) = sup_y { B_n(f_n(x, y)) + g_n(x, y) },    n = 1, 2, ..., N,

called the Bellman equation. Solving this system of functional equations, we obtain B_0(x_0), the value of the optimal performance under the assumption that the system is initially in the state x_0. We also identify the optimal controls y*_1, y*_2, ..., y*_N: these are the parameters for which the suprema in (1.1) are attained. Specifically, having determined B_0, B_1, ..., B_N (called the Bellman sequence in the sequel), we find y*_1 by the requirement

B_0(x_0) = B_1(f_1(x_0, y*_1)) + g_1(x_0, y*_1).

Let x_1 = f_1(x_0, y*_1) be the position of the system after the optimal first move. Then we get the control y*_2 from the equation

B_1(x_1) = B_2(f_2(x_1, y*_2)) + g_2(x_1, y*_2),

and so on.

Summarizing, the approach rests on solving the initial problem by imbedding it into a class of similar sub-problems, the collection of which can be treated, as a whole, by means of the recursive formulas. We should point out, however, that still, in many cases, the analysis of the obtained setting can be technically involved and require plenty of laborious computations.
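The abstract scheme above can be sketched in code. The following is a minimal illustration, not taken from the text: finite state and control sets, the backward recursion (1.1) computing the Bellman sequence, and a forward pass recovering a sequence of optimal controls. The transition and gain functions in the toy instance are hypothetical choices.

```python
def bellman_sequence(states, controls, f, g, N):
    # Backward recursion (1.1): B_{n-1}(x) = max_y [ B_n(f_n(x, y)) + g_n(x, y) ],
    # with the terminal condition B_N(x) = 0.
    B = [dict() for _ in range(N + 1)]
    B[N] = {x: 0.0 for x in states}
    for n in range(N, 0, -1):
        for x in states:
            B[n - 1][x] = max(B[n][f(n, x, y)] + g(n, x, y) for y in controls)
    return B

def optimal_controls(B, x0, controls, f, g, N):
    # Forward pass: at each step pick a control attaining the supremum in (1.1).
    x, ys = x0, []
    for n in range(1, N + 1):
        y = max(controls, key=lambda c: B[n][f(n, x, c)] + g(n, x, c))
        ys.append(y)
        x = f(n, x, y)
    return ys

# A hypothetical toy instance: states 0..12, controls 0..3,
# transition x -> min(x + y, 12), gain g_n(x, y) = x - y*y.
states = range(13)
controls = range(4)
f = lambda n, x, y: min(x + y, 12)
g = lambda n, x, y: x - y * y
N = 4
B = bellman_sequence(states, controls, f, g, N)
ys = optimal_controls(B, 0, controls, f, g, N)
```

Here B[0][x0] is the optimal value; on an instance this small, exhaustive search over all control sequences returns the same number, which is a convenient correctness check.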

In what follows, we will see the above reasoning in various disguises. We have purposefully restrained ourselves from the discussion concerning the state space or the class of feasible controls; this would probably complicate the above presentation, as these objects can be multidimensional or change from step to step. Sometimes it will be convenient to enumerate the steps by the numbers 1, 2, ..., N instead of 0, 1, ..., N. Moreover, in many cases it will be more natural to work with the reversed sequence B_N, B_{N-1}, ..., B_1 instead of B_1, B_2, ..., B_N (then the lower index indicates the number of steps up to the termination of the process). However, the main idea remains essentially unchanged.

Instead of exploring further the abstract description, we continue with the analysis of several examples which will serve as an illustration of the above concepts.

1.1. A toy example: AM-GM inequality

We will show how dynamic programming can yield one of the most fundamental inequalities in mathematics.


Theorem 1.1. For any positive integer N and any nonnegative numbers a_1, a_2, ..., a_N we have

(a_1 + a_2 + ... + a_N)/N ≥ (a_1 a_2 ... a_N)^{1/N}.

Equality holds if and only if a_1 = a_2 = ... = a_N.

Proof. The first step in the analysis is to rephrase the desired claim as the problem of optimizing some value function. This can be done as follows. Fix a nonnegative number x and consider the quantity

sup{ a_1 a_2 ... a_N : a_1, a_2, ..., a_N ≥ 0, a_1 + a_2 + ... + a_N = x }.

We need to show that for any x, the above supremum does not exceed (x/N)^N. This problem can be studied with the use of the dynamic programming approach, where the sub-problems correspond to AM-GM inequalities applied to subsequences of a_1, a_2, ..., a_N. More precisely, fix n ∈ {1, 2, ..., N}, a nonnegative number x, and consider the quantity

B_n(x) = sup{ a_n a_{n+1} ... a_N },

where the supremum is taken over all sequences a_n, a_{n+1}, ..., a_N of nonnegative numbers satisfying a_n + a_{n+1} + ... + a_N = x. To put this into the framework presented in the previous section, the state of the system at step k is described by x, the sum of the numbers a_k + a_{k+1} + ... + a_N, while the control in step k is the value of the variable a_k ∈ [0, x], which transforms the state x into x − a_k.

The dynamic approach rests on writing the system of equations which governs the evolution of the sequence (B_n)_{n=1}^{N}. Bellman's optimality principle implies that for any n = 1, 2, ..., N − 1 and any x ≥ 0 we have

(1.2)    B_n(x) = sup_{t ∈ [0,x]} { B_{n+1}(x − t) · t }.

Indeed, if a_n, a_{n+1}, ..., a_N are arbitrary nonnegative numbers summing up to x and we denote t = a_n ∈ [0, x], then

a_n a_{n+1} ... a_N = t · a_{n+1} a_{n+2} ... a_N ≤ B_{n+1}(x − t) · t,

by the definition of B_{n+1} and the fact that a_{n+1} + a_{n+2} + ... + a_N = x − t. Taking the supremum over all a_n, a_{n+1}, ..., a_N as above gives the inequality ≤ in (1.2). To get the reverse, we fix t ∈ [0, x] and nonnegative numbers a_{n+1}, a_{n+2}, ..., a_N summing up to x − t. Then t + a_{n+1} + a_{n+2} + ... + a_N = x, so the definition of B_n yields

t · a_{n+1} a_{n+2} ... a_N ≤ B_n(x).

Now, taking the supremum over a_{n+1}, a_{n+2}, ..., a_N as above gives t · B_{n+1}(x − t) ≤ B_n(x), and since t ∈ [0, x] was arbitrary, the identity (1.2) follows. The description of the sequence (B_n)_{n=1}^{N} is completed by observing that B_N(x) = x, directly from the definition.

It remains to solve the recurrence. It is easy to compute that B_{N-1}(x) = (x/2)^2 and B_{N-2}(x) = (x/3)^3, which leads to the conjecture

(1.3)    B_n(x) = (x/(N − n + 1))^{N−n+1}.

This can be verified by a straightforward induction. Let us briefly present the calculations, as they will be useful in the identification of the optimal controls. For


n = N the hypothesis is true. Assuming the validity for a fixed n + 1 ∈ {2, 3, ..., N}, we derive that the expression

B_{n+1}(x − t) · t = ((x − t)/(N − n))^{N−n} · t,

considered as a function of t, attains its maximum for t = x/(N − n + 1) (only). Furthermore, this maximal value is equal to (x/(N − n + 1))^{N−n+1}. This yields (1.3) and the claim follows.
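The inductive step can also be sanity-checked numerically: substituting the closed form (1.3) for B_{n+1} and maximizing B_{n+1}(x − t) · t over a fine grid of t ∈ [0, x] should reproduce the closed form for B_n. A throwaway sketch; the choices N = 5, x = 3 and the grid size are arbitrary:

```python
def B_closed(n, x, N):
    # Conjectured closed form (1.3): B_n(x) = (x / (N - n + 1))^(N - n + 1).
    return (x / (N - n + 1)) ** (N - n + 1)

N, x, M = 5, 3.0, 20000
errors = []
for n in range(1, N):
    # One step of the recurrence (1.2): B_n(x) = sup_{t in [0, x]} B_{n+1}(x - t) * t,
    # evaluated by brute force on a grid of t values.
    grid_max = max(B_closed(n + 1, x - x * k / M, N) * (x * k / M)
                   for k in range(M + 1))
    errors.append(abs(grid_max - B_closed(n, x, N)))
```

For each n the grid maximum agrees with (1.3) up to the grid resolution, and the maximizing t sits near x/(N − n + 1).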

The above calculations encode, for any fixed x ≥ 0, the optimal sequence a*_1, a*_2, ..., a*_N for B_1(x). To see this, assume that x > 0 (for x = 0 there is nothing to prove). We go back to the above proof of (1.2). We have

B_1(x) = sup_{t ∈ [0,x]} { B_2(x − t) · t },

and the supremum is attained for the unique choice t = x/N. This necessarily implies that a*_1 must be equal to x/N. Otherwise, we would have

B_1(x) = sup_{t ∈ [0,x]} { B_2(x − t) · t } > B_2(x − a*_1) · a*_1 ≥ a*_1 · a*_2 a*_3 ... a*_N,

where in the last passage we have exploited the definition of B_2 and the fact that a*_2, a*_3, ..., a*_N sum up to x − a*_1. This would contradict the optimality of (a*_k)_{k=1}^{N}.

Therefore we have a*_1 = x/N and a*_2 + a*_3 + ... + a*_N = (N − 1)x/N. How do we get any information on a*_2, a*_3, ..., a*_N? Write

a*_1 a*_2 ... a*_N = B_1(x) = B_2(x − x/N) · x/N = B_2(x − x/N) · a*_1,

or, equivalently (recall that we have assumed x > 0),

a*_2 a*_3 ... a*_N = B_2(x − x/N).

This brings us to the same position as above, with the length of the unknown extremal sequence decreased by 1 (and the required sum x replaced by x − x/N). Repeating the arguments, we show that a*_2 = (x − x/N)/(N − 1) = x/N, a*_3 + a*_4 + ... + a*_N = x − 2x/N, and the numbers a*_3, a*_4, ..., a*_N satisfy

a*_3 a*_4 ... a*_N = B_3(x − 2x/N),

and so on. The procedure can be carried out until we get all the values of a*_1, a*_2, ..., a*_N: one easily checks by induction that a*_1 = a*_2 = ... = a*_N = x/N is the extremal sequence we have searched for.

Remark 1.1. We could have shown the estimate by considering the dual problem

inf{ a_1 + a_2 + ... + a_N : a_1, a_2, ..., a_N ≥ 0, a_1 a_2 ... a_N = x }

for x ≥ 0, imbedded into the simpler sub-problems

B_n(x) = inf{ a_n + a_{n+1} + ... + a_N : a_n, a_{n+1}, ..., a_N ≥ 0, a_n a_{n+1} ... a_N = x }

for n = 1, 2, ..., N and x ≥ 0.


1.2. Two one-dimensional DP problems

We continue the presentation of the examples, starting with dimension 1, i.e., with the case in which the Bellman sequence (B_n)_n consists of functions of one variable.

Theorem 1.2. For any nonnegative numbers a_1, a_2, ..., a_N we have

(1.4)    (1 + a_1)(1 + a_2) ... (1 + a_N) ≥ (1 + (a_1 a_2 ... a_N)^{1/N})^{N}.

Equality holds if and only if a_1 = a_2 = ... = a_N.

Proof. We proceed as previously, but we will use a slightly different enumeration for the Bellman sequence. For any integer n = 1, 2, ..., N and x ≥ 0, let

B_n(x) = inf{ (1 + a_1)(1 + a_2) ... (1 + a_n) },

where the infimum is taken over all sequences a_1, a_2, ..., a_n of nonnegative numbers satisfying a_1 a_2 ... a_n = x. We need to show that B_N(x) ≥ (1 + x^{1/N})^{N}. Directly from the definition, we see that B_1(x) = 1 + x, and the Bellman optimality principle yields the recurrence

(1.5)    B_{n+1}(x) = inf_{t > 0} { (1 + t) B_n(x/t) }.

We easily show inductively that B_n(x) = (1 + x^{1/n})^{n} satisfies this recurrence (and the initial requirement B_1(x) = 1 + x). Furthermore, the infimum in (1.5) is attained for t = x^{1/(n+1)} only. Consequently, the desired estimate holds, and the optimal controls a*_1, a*_2, ..., a*_N must satisfy a*_{n+1} = (a*_1 a*_2 ... a*_{n+1})^{1/(n+1)} for all n = 1, 2, ..., N − 1. The latter condition is equivalent to saying that a*_1 = a*_2 = ... = a*_N = x^{1/N}, and the proof is complete.
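The candidate solution of (1.5) can be checked numerically as well: the infimum of (1 + t)B_n(x/t) over a grid of t > 0 should match (1 + x^{1/(n+1)})^{n+1}. A rough sketch; the value x = 2 and the grid are arbitrary:

```python
def B(n, x):
    # Candidate solution of the recurrence (1.5): B_n(x) = (1 + x^(1/n))^n.
    return (1.0 + x ** (1.0 / n)) ** n

x = 2.0
errors = []
for n in range(1, 5):
    # Minimize (1 + t) * B_n(x / t) over the grid t = 0.001, 0.002, ..., 10.
    grid_inf = min((1.0 + 0.001 * k) * B(n, x / (0.001 * k))
                   for k in range(1, 10001))
    errors.append(abs(grid_inf - B(n + 1, x)))
```

The grid infimum reproduces the closed form at every step, with the minimizing t close to x^{1/(n+1)}.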

Now we will turn our attention to a slightly more involved result. The following Hardy-type estimate can be easily established with the use of Schwarz' inequality. However, it is instructive to present an alternative approach based on dynamic programming.

Theorem 1.3. For any positive integer n and any real numbers a_1, a_2, ..., a_n we have the inequality

(1.6)    ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k ≤ (2n − ∑_{k=1}^{n} 1/k)^{1/2} (∑_{k=1}^{n} a_k^2)^{1/2}.

For any n, the constant (2n − ∑_{k=1}^{n} 1/k)^{1/2} cannot be improved.

Proof of (1.6). In the light of the previous estimates, it is natural to study the function

B_n(x) = sup{ ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k : ∑_{k=1}^{n} a_k^2 = x }

for a fixed x ≥ 0. However, it seems that the successful treatment of such a problem with the use of dynamic programming requires the introduction of another parameter. Indeed, if we try to express B_{n+1}(x) in terms of B_n at some point, we see that we must introduce the free variable t (i.e., the variable over which the


supremum in the recurrence is taken) equal to a_{n+1}: this enables the handling of the sum ∑_{k=1}^{n+1} a_k^2 = ∑_{k=1}^{n} a_k^2 + t^2, as previously. However, we have

∑_{k=1}^{n+1} (a_1 + a_2 + ... + a_k)/k = ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k + (a_1 + a_2 + ... + a_n + t)/(n + 1),

and we do not have any control over the second term on the right. To remedy this, let us introduce the additional parameter y = a_1 + a_2 + ... + a_n. Then, if we set

B_n(x, y) = sup{ ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k : ∑_{k=1}^{n} a_k^2 = x, ∑_{k=1}^{n} a_k = y },

then the optimality principle yields

B_{n+1}(x, y) = sup{ B_n(x − t^2, y − t) + y/(n + 1) },

where the supremum is taken over all t ≥ 0 satisfying t ≤ y and t^2 ≤ x. Thus, we would have to perform the analysis of a two-dimensional problem, which in general is much more difficult than one-dimensional ones.

This raises the question whether the estimate (1.6) can be reformulated so that it can be treated with a Bellman sequence which consists of functions of one variable. The answer turns out to be positive and involves a linearization of the inequality first. Namely, the desired estimate is equivalent to

(1.7)    ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k − ∑_{k=1}^{n} a_k^2 ≤ n/2 − (1/4) ∑_{k=1}^{n} 1/k.

Indeed, the above bound follows from (1.6) and Schwarz' inequality. To prove the reverse implication, note that given a_1, a_2, ..., a_n, the application of (1.7) to the sequence λa_1, λa_2, ..., λa_n with λ = (n/2 − (1/4)∑_{k=1}^{n} 1/k)^{1/2} / (∑_{k=1}^{n} a_k^2)^{1/2} gives an inequality equivalent to (1.6).

Now, for any positive integer n and any x ∈ R, consider the quantity

B_n(x) = sup{ ∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k − ∑_{k=1}^{n} a_k^2 },

where the supremum is taken over all sequences a_1, a_2, ..., a_n of real numbers satisfying a_1 + a_2 + ... + a_n = x. Directly from this definition, we check that

(1.8)    B_1(x) = x − x^2.

Furthermore, the Bellman optimality principle becomes, as the reader verifies readily,

(1.9)    B_{n+1}(x) = sup_{t ∈ R} { B_n(x − t) + x/(n + 1) − t^2 }.

To solve this recurrence, one can compute that

B_2(x) = x − x^2/2 + 1/8,    B_3(x) = x − x^2/3 + 1/8 + 1/6,

which suggests that for general n we have the formula

(1.10)    B_n(x) = x − x^2/n + b_n,


for some (b_n)_{n≥1} to be found. We proceed by induction. We have already checked (1.10) for n = 1, 2, 3, with b_1 = 0, b_2 = 1/8 and b_3 = 1/8 + 1/6. Straightforward calculations show that the expression

B_n(x − t) + x/(n + 1) − t^2 = x − t − (x − t)^2/n + b_n + x/(n + 1) − t^2,

considered as a function of t, attains its maximal value at

(1.11)    t = (x − n/2)/(n + 1),

equal to x − x^2/(n + 1) + b_n + n/(4(n + 1)). This proves the validity of (1.10), with b_{n+1} = b_n + n/(4(n + 1)). Consequently, we have

b_n = ∑_{k=1}^{n−1} k/(4(k + 1)),

and by the very definition of B_n(x), we have established the estimate

∑_{k=1}^{n} (a_1 + a_2 + ... + a_k)/k − ∑_{k=1}^{n} a_k^2 ≤ x − x^2/n + ∑_{k=1}^{n−1} k/(4(k + 1)).

It remains to observe that x − x^2/n ≤ n/4 and do some simple manipulations with the right-hand side to obtain (1.7).

To identify the optimal controls a*_1, a*_2, ..., a*_n, we must ensure that all the inequalities we obtained on our way become equalities. First, note that we must have x = a*_1 + a*_2 + ... + a*_n = n/2: then x − x^2/n = n/4, so we do not lose anything in the last step of the proof of (1.7). Next, it follows from the proof of (1.9) and the formula (1.11) that for each k ∈ {0, 1, 2, ..., n − 1}, the optimal number a*_{k+1} must satisfy

a*_{k+1} = (a*_1 + a*_2 + ... + a*_{k+1} − k/2)/(k + 1),

or k a*_{k+1} = a*_1 + a*_2 + ... + a*_k − k/2. This recurrence can be solved by induction, and the outcome is a*_k = ∑_{j=k}^{n} (2j)^{−1}.
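The claimed extremal sequence can be verified with exact rational arithmetic: for a*_k = ∑_{j=k}^{n} 1/(2j), the left-hand side of (1.7) should equal n/2 − (1/4)∑_{k=1}^{n} 1/k exactly. A small sketch; the choice n = 6 is arbitrary:

```python
from fractions import Fraction

n = 6
# Claimed optimal controls: a*_k = sum_{j=k}^{n} 1/(2j).
a = [sum(Fraction(1, 2 * j) for j in range(k, n + 1)) for k in range(1, n + 1)]

# Left-hand side of (1.7): sum of the running averages minus the sum of squares.
lhs = sum(sum(a[:k]) / k for k in range(1, n + 1)) - sum(t * t for t in a)

# Right-hand side of (1.7): n/2 - (1/4) * sum_{k=1}^{n} 1/k.
rhs = Fraction(n, 2) - Fraction(1, 4) * sum(Fraction(1, k) for k in range(1, n + 1))
```

Equality lhs == rhs holds exactly, and the total ∑ a*_k equals n/2, in accordance with the identification of the optimal controls above; perturbing any a*_k makes the left-hand side strictly smaller.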

1.3. A few higher-dimensional DP problems

The first example is an inequality equivalent to (1.4), but it is a nice starter for two-dimensional problems.

Theorem 1.4. For any nonnegative numbers a_1, a_2, ..., a_N, b_1, b_2, ..., b_N we have

(1.12)    (a_1 + b_1)(a_2 + b_2) ... (a_N + b_N) ≥ ((a_1 a_2 ... a_N)^{1/N} + (b_1 b_2 ... b_N)^{1/N})^{N}.

Proof. We may assume that all the numbers a_i and b_j are non-zero (otherwise, the claim is obvious). For any x, y > 0, consider the function

B_n(x, y) = inf{ (a_1 + b_1)(a_2 + b_2) ... (a_n + b_n) },

where the infimum is taken over all sequences a_1, a_2, ..., a_n, b_1, b_2, ..., b_n of positive numbers such that a_1 a_2 ... a_n = x and b_1 b_2 ... b_n = y. Clearly, we have B_1(x, y) = x + y, and the Bellman optimality principle yields

(1.13)    B_{n+1}(x, y) = inf{ (s + t) B_n(x/s, y/t) },


where the infimum is taken over all s, t > 0. One easily shows by induction that the solution to this recurrence is given by B_n(x, y) = (x^{1/n} + y^{1/n})^{n}, and the infimum in (1.13) is attained for s, t such that y/x = (t/s)^{n+1}. This establishes the desired inequality; furthermore, we obtain that the optimal controls a*_1, a*_2, ..., a*_N, b*_1, b*_2, ..., b*_N satisfy

a*_{n+1}/b*_{n+1} = ((a*_1 a*_2 ... a*_{n+1})/(b*_1 b*_2 ... b*_{n+1}))^{1/(n+1)},    n = 1, 2, ..., N − 1,

i.e., equality in (1.12) is attained if and only if a_1/b_1 = a_2/b_2 = ... = a_N/b_N. Of course, this is in perfect consistency with the assertion of Theorem 1.2.
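A quick numerical test of (1.12) and its equality case; the seed, sequence length and proportionality constant below are arbitrary:

```python
import math
import random

def lhs(a, b):
    # Left-hand side of (1.12): the product of the sums a_k + b_k.
    return math.prod(x + y for x, y in zip(a, b))

def rhs(a, b):
    # Right-hand side of (1.12): (GM(a) + GM(b))^N with the geometric means.
    N = len(a)
    return (math.prod(a) ** (1.0 / N) + math.prod(b) ** (1.0 / N)) ** N

random.seed(0)
a = [random.uniform(0.5, 2.0) for _ in range(6)]
b = [random.uniform(0.5, 2.0) for _ in range(6)]
prop = [3.0 * x for x in a]   # a_k / prop_k is constant: the equality case
```

For generic data the inequality is strict; for the proportional pair (a, prop) the two sides coincide up to floating-point error.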

The next inequality is related to the inclusion-exclusion principle in probability theory.

Theorem 1.5. For any N ≥ 2 and any numbers a_1, a_2, ..., a_N belonging to [0, 1] we have

∏_{n=1}^{N} (1 − a_n) ≤ 1 − ∑_{n=1}^{N} a_n + ∑_{1≤n<m≤N} a_n a_m.

Proof. We rewrite the inequality in the equivalent form

∏_{n=1}^{N} (1 − a_n) ≤ 1 − ∑_{n=1}^{N} a_n + (1/2) [ (∑_{n=1}^{N} a_n)^2 − ∑_{n=1}^{N} a_n^2 ],

which suggests considering the functions

B_n(x, y) = sup{ ∏_{k=1}^{n} (1 − a_k) },    x, y ≥ 0,

the supremum taken over all sequences a_1, a_2, ..., a_n of numbers in [0, 1] such that a_1 + a_2 + ... + a_n = x and a_1^2 + a_2^2 + ... + a_n^2 = y. The assumption a_1, a_2, ..., a_n ∈ [0, 1] implies that x and y must satisfy certain additional relations: for instance, we must have y ≤ min{x, x^2}. In other words, the domain of B_n has a slightly more complicated structure. In general, this is an important issue: the analysis of the Bellman sequence (or Bellman function) involves the careful handling of the corresponding domains (which is already visible from the examples we have studied so far: the domain imposes some restrictions on the controls allowed in the recurrence).

However, in the above example we need not worry about this. Directly from the definition, we have B_2(x, y) = 1 − x + (x^2 − y)/2. Furthermore, the Bellman optimality principle reads

(1.14)    B_{n+1}(x, y) = sup{ (1 − t) B_n(x − t, y − t^2) },

where the supremum is taken over all t ∈ [0, 1] such that (x − t, y − t^2) belongs to the domain of B_n. Solving this recurrence seems to be a somewhat complicated task (there are several cases to consider). This is why we will develop a slightly different argument, which will be of fundamental importance in our future considerations. Namely, the assertion of the theorem is equivalent to checking that B_n(x, y) ≤ 1 − x + (x^2 − y)/2. Consider the function U(x, y) = 1 − x + (x^2 − y)/2, given for all x, y ≥ 0 satisfying y ≤ x^2. We will show that this auxiliary function satisfies


the equation (1.14); however, the crucial difference is that now in the supremum we allow all t such that (x − t, y − t^2) lies in the domain of U. This is a strictly larger set of controls, which will actually make U bigger than B_n (this is why we will sometimes call U a supersolution). It is enough to prove that

(1.15)    U(x, y) ≥ sup{ (1 − t) U(x − t, y − t^2) }.

To this end, take t such that x − t ≥ 0, y − t^2 ≥ 0 and y − t^2 ≤ (x − t)^2. Note that

(1 − t) U(x − t, y − t^2) = (1 − t) ( 1 − (x − t) + ((x − t)^2 − (y − t^2))/2 )
= ( 1 − x + (x^2 − y)/2 ) + t · (y − t^2 − (x − t)^2)/2 ≤ U(x, y),

since y − t^2 ≤ (x − t)^2 (the point (x − t, y − t^2) lies in the domain of U). This yields the supersolution property (1.15). The point is that the function U carries all the information necessary to prove the desired bound. Indeed, by (1.15), for any a_1, a_2, ..., a_N ∈ [0, 1] and any n = 2, 3, ..., N,

U(a_1 + ... + a_n, a_1^2 + ... + a_n^2) ≥ (1 − a_n) U(a_1 + ... + a_{n−1}, a_1^2 + ... + a_{n−1}^2),

which by induction implies

(1 − a_2) ... (1 − a_N) U(a_1, a_1^2) ≤ U(a_1 + a_2 + ... + a_N, a_1^2 + a_2^2 + ... + a_N^2).

Since U(a_1, a_1^2) = 1 − a_1, this is precisely the desired inequality.
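Both the supersolution property (1.15) and the final bound are easy to probe numerically on random admissible points; a disposable sketch with arbitrary seed and sample sizes:

```python
import random

def U(x, y):
    # The supersolution U(x, y) = 1 - x + (x^2 - y)/2, defined for 0 <= y <= x^2.
    return 1.0 - x + (x * x - y) / 2.0

random.seed(1)
sup_ok = True
for _ in range(1000):
    # Random admissible data: t in [0, 1], x - t >= 0, t^2 <= y <= t^2 + (x - t)^2.
    t = random.uniform(0.0, 1.0)
    x = t + random.uniform(0.0, 3.0)
    y = t * t + random.uniform(0.0, (x - t) ** 2)
    if (1.0 - t) * U(x - t, y - t * t) > U(x, y) + 1e-12:
        sup_ok = False

thm_ok = True
for _ in range(1000):
    # Theorem 1.5 for random points of [0, 1]^5.
    a = [random.uniform(0.0, 1.0) for _ in range(5)]
    prod = 1.0
    for v in a:
        prod *= 1.0 - v
    s = sum(a)
    bound = 1.0 - s + (s * s - sum(v * v for v in a)) / 2.0
    if prod > bound + 1e-12:
        thm_ok = False
```

No violation of either inequality appears on the sample, as the supersolution argument guarantees.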

Now we turn our attention to a more involved estimate, the classical L^p estimate of Hardy, Littlewood and Pólya.

Theorem 1.6. Let 1 < p < ∞. Then for any positive numbers a_1, a_2, ... and λ_1, λ_2, ... we have the inequality

(1.16)    ∑_{n=1}^{∞} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p ≤ (p/(p − 1))^p ∑_{n=1}^{∞} λ_n a_n^p.

The constant (p/(p − 1))^p is the best possible.

Remark 1.2. Setting λ_1 = λ_2 = ... transforms the inequality into the more familiar form

∑_{n=1}^{∞} ((a_1 + a_2 + ... + a_n)/n)^p ≤ (p/(p − 1))^p ∑_{n=1}^{∞} a_n^p,

in which the constant (p/(p − 1))^p is also optimal.

Proof of (1.16). A natural idea is to try to find, for any N ≥ 1, the best constant C_{p,N} in the truncated inequality

∑_{n=1}^{N} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p ≤ C_{p,N} ∑_{n=1}^{N} λ_n a_n^p

(to which dynamic programming can obviously be applied) and then show that lim_{N→∞} C_{p,N} = (p/(p − 1))^p. However, the resulting system of functional equations seems to be very difficult to solve. We will proceed in a different manner. It is convenient to split the reasoning into a few intermediate steps.


Step 1. Bellman function and optimality principle. Clearly, it suffices to prove the inequality for finite sequences a_1, a_2, ... only (i.e., we may assume that there is N such that a_N = a_{N+1} = a_{N+2} = ... = 0). Furthermore, we may assume that the sequence a_1, a_2, ... is nonincreasing: indeed, if we replace it by its nonincreasing rearrangement, then the right-hand side does not change, while the left can only increase.

Let C_p be the best constant in the inequality (1.16). The first step is to identify the state space: it will be governed by the values of the initial parameters a_1, λ_1. Set

B(a_1, λ_1) = sup{ ∑_{n=1}^{∞} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p },

where the supremum is taken over all λ_2, λ_3, ... > 0 and nonnegative a_2, a_3, ... which are zero for sufficiently large n. Actually, we will slightly change the definition of B later on; we shall see that in the analysis of (1.16) there is a somewhat more convenient expression to maximize. To get the functional equation governing B, note that

(1.17)    ∑_{n=1}^{∞} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p = λ_1 a_1^p − C_p λ_1 a_1^p + [ ∑_{n=2}^{∞} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p − C_p ∑_{n=2}^{∞} λ_n a_n^p ].

Now the idea is to transform the expression in the square brackets into a shape similar to that appearing in the definition of B, by changing the parameters a_1, a_2, ..., λ_1, λ_2, ... appropriately. This is simple: the first summand in the first sum is

λ_2 ((λ_1 a_1 + λ_2 a_2)/(λ_1 + λ_2))^p,

while the n-th summand is

λ_n ( ( ((λ_1 a_1 + λ_2 a_2)/(λ_1 + λ_2)) · (λ_1 + λ_2) + λ_3 a_3 + ... + λ_n a_n ) / ( (λ_1 + λ_2) + λ_3 + ... + λ_n ) )^p.

Therefore, we have

∑_{n=2}^{∞} λ_n ((λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/(λ_1 + λ_2 + ... + λ_n))^p − C_p ∑_{n=2}^{∞} λ_n a_n^p
= ∑_{n=1}^{∞} λ'_n ((λ'_1 a'_1 + λ'_2 a'_2 + ... + λ'_n a'_n)/(λ'_1 + λ'_2 + ... + λ'_n))^p − C_p ∑_{n=1}^{∞} λ'_n (a'_n)^p − λ_1 (a'_1)^p − C_p λ_2 a_2^p + C_p λ'_1 (a'_1)^p,

where a'_1 = (λ_1 a_1 + λ_2 a_2)/(λ_1 + λ_2), λ'_1 = λ_1 + λ_2, and a'_n = a_{n+1}, λ'_n = λ_{n+1} for n = 2, 3, .... Note that a'_1 ≤ a_1 (since the sequence (a_n)_{n≥1} is nonincreasing) and


λ'_1 > λ_1. Plugging this into (1.17), we get

∑_{n=1}^{∞} λ_n ((λ_1 a_1 + ... + λ_n a_n)/(λ_1 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p − λ_1 a_1^p + C_p λ_1 a_1^p
= ∑_{n=1}^{∞} λ'_n ((λ'_1 a'_1 + ... + λ'_n a'_n)/(λ'_1 + ... + λ'_n))^p − C_p ∑_{n=1}^{∞} λ'_n (a'_n)^p − λ'_1 (a'_1)^p + C_p λ'_1 (a'_1)^p
+ (λ'_1 − λ_1)(a'_1)^p − C_p (λ'_1 − λ_1)^{1−p} (λ'_1 a'_1 − λ_1 a_1)^p,

since λ_2 = λ'_1 − λ_1 and λ_2 a_2 = λ'_1 a'_1 − λ_1 a_1. This identity indicates that it is more convenient to study the function

B(a_1, λ_1) = sup{ ∑_{n=1}^{∞} λ_n ((λ_1 a_1 + ... + λ_n a_n)/(λ_1 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p + (C_p − 1) λ_1 a_1^p },

where the supremum is taken over the same parameters as previously. The preceding identity yields

∑_{n=1}^{∞} λ_n ((λ_1 a_1 + ... + λ_n a_n)/(λ_1 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p − λ_1 a_1^p + C_p λ_1 a_1^p
≤ B(a'_1, λ'_1) + (λ'_1 − λ_1)(a'_1)^p − C_p (λ'_1 − λ_1)^{1−p} (λ'_1 a'_1 − λ_1 a_1)^p,

and hence

B(a_1, λ_1) ≤ sup{ B(a'_1, λ'_1) + (λ'_1 − λ_1)(a'_1)^p − C_p (λ'_1 − λ_1)^{1−p} (λ'_1 a'_1 − λ_1 a_1)^p },

where the supremum is taken over all a'_1 ≤ a_1 and λ'_1 ≥ λ_1 such that λ'_1 a'_1 > λ_1 a_1 (the latter restriction being equivalent to a_2 > 0). A similar argument, involving taking the supremum over the remaining parameters first, gives the reverse estimate. Thus we have established the functional equation

(1.18)    B(a_1, λ_1) = sup{ B(a'_1, λ'_1) + (λ'_1 − λ_1)(a'_1)^p − C_p (λ'_1 − λ_1)^{1−p} (λ'_1 a'_1 − λ_1 a_1)^p },

the supremum taken over all a'_1, λ'_1 satisfying the same requirements as previously.

Step 2. On the guess of C_p and B. An important homogeneity-type property of B is that

(1.19)    B(a, λ) = λ a^p B(1, 1) for any a, λ > 0.


Indeed, given any sequences a_1, a_2, ..., λ_1, λ_2, ... as in the definition of B(1, 1), we see that

λ a^p [ ∑_{n=1}^{∞} λ_n ((λ_1 a_1 + ... + λ_n a_n)/(λ_1 + ... + λ_n))^p − C_p ∑_{n=1}^{∞} λ_n a_n^p + (C_p − 1) λ_1 a_1^p ]
= ∑_{n=1}^{∞} λλ_n ((λλ_1 · aa_1 + λλ_2 · aa_2 + ... + λλ_n · aa_n)/(λλ_1 + λλ_2 + ... + λλ_n))^p − C_p ∑_{n=1}^{∞} λλ_n (aa_n)^p + (C_p − 1) λλ_1 (aa_1)^p ≤ B(a, λ),

since aa_1 = a and λλ_1 = λ. Taking the supremum over all a_1, a_2, ..., λ_1, λ_2, ... as above gives B(a, λ) ≥ λ a^p B(1, 1). The reverse estimate is shown similarly. Plugging (1.19) into the functional equation for B and dividing both sides by λ a^p

gives

B(1, 1) = sup_{α ≤ 1, r > 1, rα > 1} { r α^p B(1, 1) + (r − 1) α^p − C_p (r − 1)^{1−p} (rα − 1)^p },

where we have set α = a'_1/a_1 and r = λ'_1/λ_1. Observe that the supremum is attained in the limit as r → 1, α → 1. Let us look at what happens with the identity when r and α are close to 1, say, r = 1 + δ and α = 1 − uδ, for some u ∈ (0, 1) and some small δ > 0. We have

B(1, 1) ≥ r α^p B(1, 1) + (r − 1) α^p − C_p (r − 1)^{1−p} (rα − 1)^p.

If u ∈ (1/p, 1) and δ is sufficiently small, then r α^p < 1 and the inequality can be rewritten in the form

B(1, 1) ≥ [ δ(1 − uδ)^p − C_p δ^{1−p} ((1 + δ)(1 − uδ) − 1)^p ] / [ 1 − (1 + δ)(1 − uδ)^p ].

Letting δ → 0 gives

B(1, 1) ≥ (1 − C_p (1 − u)^p)/(pu − 1).

Now we let u ↓ 1/p. If C_p < (p/(p − 1))^p, then the right-hand side converges to infinity: a contradiction, since B(1, 1) ≤ C_p − 1, by the very definition of B (and the fact that the Hardy-Littlewood-Pólya inequality holds with the constant C_p). This shows how the constant (p/(p − 1))^p appears in the estimate. Suppose for a moment that C_p = (p/(p − 1))^p; then letting u → 1/p returns B(1, 1) ≥ p/(p − 1). This suggests assuming that we actually have equality: B(1, 1) = p/(p − 1), and then by (1.19),

B(a, λ) = (p/(p − 1)) λ a^p.

Step 3. A formal proof. Now we will show that the function B and the constant C_p we have guessed in the previous step actually work. The key step is the verification of the identity (1.18). It suffices to check the inequality ≥, since the reverse bound is obtained by letting a'_1 → a_1 and λ'_1 → λ_1. By (1.19), the inequality is equivalent to

p/(p − 1) ≥ (p/(p − 1)) r α^p + (r − 1) α^p − (p/(p − 1))^p (r − 1)^{1−p} (rα − 1)^p


for α ∈ [0, 1] and r ≥ α^{−1}. Substitute x = α^p and rewrite the bound in the form

F(r, x) = (p/(p − 1))^p (r − 1)^{1−p} (r x^{1/p} − 1)^p + (p/(p − 1))(1 − rx) − (r − 1)x ≥ 0.

One easily checks that F does not have extremal points inside the domain, which, by standard arguments, yields the inequality. So, we have

B(a_1, λ_1) − B(a'_1, λ'_1) ≥ (λ'_1 − λ_1)(a'_1)^p − (p/(p − 1))^p (λ'_1 − λ_1)^{1−p} (λ'_1 a'_1 − λ_1 a_1)^p
= λ_2 ((λ_1 a_1 + λ_2 a_2)/(λ_1 + λ_2))^p − (p/(p − 1))^p λ_2 a_2^p.

Apply this inequality to the sequence a'_1, a'_2, ..., λ'_1, λ'_2, .... As a result, we get

B((λ_1 a_1 + λ_2 a_2)/(λ_1 + λ_2), λ_1 + λ_2) − B((λ_1 a_1 + λ_2 a_2 + λ_3 a_3)/(λ_1 + λ_2 + λ_3), λ_1 + λ_2 + λ_3)
= B(a'_1, λ'_1) − B(a''_1, λ''_1)
≥ λ'_2 ((λ'_1 a'_1 + λ'_2 a'_2)/(λ'_1 + λ'_2))^p − (p/(p − 1))^p λ'_2 (a'_2)^p
= λ_3 ((λ_1 a_1 + λ_2 a_2 + λ_3 a_3)/(λ_1 + λ_2 + λ_3))^p − (p/(p − 1))^p λ_3 a_3^p,

and so on. By induction, if Λ_n = λ_1 + λ_2 + ... + λ_n and A_n = (λ_1 a_1 + λ_2 a_2 + ... + λ_n a_n)/Λ_n, then

B(A_n, Λ_n) − B(A_{n+1}, Λ_{n+1}) ≥ λ_{n+1} A_{n+1}^p − (p/(p − 1))^p λ_{n+1} a_{n+1}^p.

Write down these inequalities for n = 1, 2, ..., N − 1 and sum them up to obtain

B(a_1, λ_1) − B(A_N, Λ_N) ≥ ∑_{n=2}^{N} λ_n A_n^p − (p/(p − 1))^p ∑_{n=2}^{N} λ_n a_n^p.

However, B(AN , λN ) ≥ 0 and

B(a1, λ1) =p

p− 1λ1a

p1 ≤

((p

p− 1

)p− 1

)λ1a

p1.

Combining this with the previous estimate yields

N∑n=1

λnApn ≤

(p

p− 1

)p N∑n=1

λnapn,

which is the desired assertion, since N was arbitrary.
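The resulting weighted inequality $\sum_{n=1}^N \lambda_n A_n^p \le (p/(p-1))^p \sum_{n=1}^N \lambda_n a_n^p$ is easy to probe numerically. The following sketch (the helper name `check_weighted_hardy` is ours, not the book's) tests it on random data; it is an illustration, not part of the proof.

```python
import random

def check_weighted_hardy(p, a, lam):
    # A_n is the lambda-weighted average of a_1, ..., a_n;
    # verify sum lam_n * A_n^p <= (p/(p-1))^p * sum lam_n * a_n^p.
    C = (p / (p - 1)) ** p
    lhs = rhs = wsum = asum = 0.0
    for an, ln in zip(a, lam):
        wsum += ln
        asum += ln * an
        lhs += ln * (asum / wsum) ** p
        rhs += ln * an ** p
    return lhs <= C * rhs + 1e-9

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 20)
    a = [random.uniform(0.0, 5.0) for _ in range(n)]
    lam = [random.uniform(0.01, 5.0) for _ in range(n)]
    assert check_weighted_hardy(2.0, a, lam)
```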

Remark 1.3. The above proof of (1.16) could be shortened significantly. Indeed, as the reader immediately verifies, the essential argumentation for the proof is contained in Step 3 only, where we have derived the crucial inductive estimate. The preceding two steps have a preparatory character: their purpose is to indicate how the inductive step should look.


1.4. Integral inequalities

1.4.1. Description of the method. Fix $d \ge 1$. Introduce a function $\Phi : (0,1] \times [0,\infty) \to \mathbb{R}^d$ and, for each $t \in (0,1]$, the auxiliary functional $X_t$, acting on nonnegative functions on $(0,t]$ by the formula
\[
(1.20)\qquad X_t(f) = \frac{1}{t}\int_0^t \Phi(u,f(u))\,du, \qquad t \in (0,1].
\]

Suppose that all the functionals $X_t$ have equal range, denoted by $D \subseteq \mathbb{R}^d$. Next, let $F : (0,1] \times [0,\infty) \times D \to [0,\infty)$ and $G : D \to [0,\infty)$ be two given Borel functions. Suppose that we are interested in showing the estimate
\[
(1.21)\qquad \int_0^1 F(t,f(t),X_t(f))\,dt \le G(X_1(f))
\]

for all $f$. For instance, fix $p \in (1,\infty)$ and let $\Phi(u,v) = (v,v^p)$; then
\[
X_t(f) = \Big(\frac{1}{t}\int_0^t f(u)\,du,\; \frac{1}{t}\int_0^t f(u)^p\,du\Big)
\]
takes values in the set $D = \{(x,y) \in [0,\infty)^2 : x^p \le y\}$. Now, if we additionally take $F(t,u,x) = F(t,u,x_1,x_2) = x_1^p$ and $G(x) = G(x_1,x_2) = c_p x_2$, where $c_p$ is a given constant, then (1.21) becomes the localized Hardy inequality
\[
\int_0^1 \Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p dt \le c_p \int_0^1 f^p(t)\,dt
\]

(where by localized we mean that the integrals are taken over the interval $[0,1]$; the non-localized version would involve integration over the whole half-line $[0,\infty)$). To give another important example, take $X_t$ as above, but set $F(t,u,x_1,x_2) = t^\alpha x_1^q$ and $G(x_1,x_2) = c_{p,q}x_2^{q/p}$, where $\alpha$, $q$ and $c_{p,q}$ are some positive constants. Then (1.21) becomes the localized Bliss inequality
\[
\int_0^1 t^\alpha \Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^q dt \le c_{p,q}\Big(\int_0^1 f^p(t)\,dt\Big)^{q/p}.
\]
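Both localized inequalities are easy to sanity-check numerically before any Bellman-function work. Below is a minimal midpoint-rule sketch for the Hardy case with $c_p = (p/(p-1))^p$; the sample functions and step count are arbitrary choices.

```python
def hardy_check(f, p, n=20000):
    # Midpoint-rule approximation of both sides of the localized Hardy inequality
    #   int_0^1 ((1/t) int_0^t f)^p dt  <=  (p/(p-1))^p int_0^1 f^p dt.
    h = 1.0 / n
    running = lhs = rhs = 0.0
    for k in range(1, n + 1):
        t = k * h
        v = f(t - h / 2)      # midpoint value of f on the k-th cell
        running += v * h      # approximates int_0^t f(u) du
        lhs += (running / t) ** p * h
        rhs += v ** p * h
    return lhs, rhs

p = 3.0
cp = (p / (p - 1)) ** p
for f in (lambda t: 1.0, lambda t: t, lambda t: t ** (-1.0 / 6.0)):
    lhs, rhs = hardy_check(f, p)
    assert lhs <= cp * rhs
```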

The inequality (1.21) can be studied with the use of a version of dynamic programming. Suppose there exists a family $(U_t)_{t\in(0,1]}$ of functions on $D$ satisfying the following three conditions.

1° (Nonnegativity) For any $t \in (0,1]$ and $x \in D$, we have $U_t(x) \ge 0$.
2° (Majorization) For any $x \in D$ we have $U_1(x) \le G(x)$.
3° (Monotonicity) For any nonnegative $f$ on $(0,1]$, the function
\[
t \mapsto U_t(X_t(f)) + \int_t^1 F(u,f(u),X_u(f))\,du
\]
is nondecreasing on $(0,1]$.

Lemma 1.1. If there exists a family satisfying 1°, 2° and 3°, then (1.21) holds.

Proof. Fix a nonnegative function $f$ on $(0,1]$. Applying the majorization condition 2° and then the monotonicity property 3°, we get
\[
\begin{aligned}
G(X_1(f)) \ge U_1(X_1(f)) &= U_1(X_1(f)) + \int_1^1 F(u,f(u),X_u(f))\,du \\
&\ge U_t(X_t(f)) + \int_t^1 F(u,f(u),X_u(f))\,du
\end{aligned}
\]
for any $t \in (0,1)$. Combining this with the nonnegativity condition 1°, we obtain
\[
\int_t^1 F(u,f(u),X_u(f))\,du \le G(X_1(f))
\]
for any $t$. Letting $t \to 0$ yields the desired bound, by virtue of Fatou's lemma. □

A beautiful fact is that the implication of Lemma 1.1 can be reversed. For any $T \in (0,1]$, consider the function $B_T : D \to \mathbb{R}$ given by
\[
B_T(x) = \sup \int_0^T F(t,f(t),X_t(f))\,dt,
\]
where the supremum is taken over all $f : (0,T] \to [0,\infty)$ satisfying $X_T(f) = x$. The family $(B_T)_{T\in(0,1]}$ is the continuous version of the Bellman sequence $(B_n)_{n=1}^N$ appearing in the preceding considerations.

Lemma 1.2. If the inequality (1.21) holds true, then the family $(B_t)_{t\in(0,1]}$ satisfies the conditions 1°, 2° and 3°.

Proof. The property 1° follows from the fact that $F$ is nonnegative, while the majorization 2° is a direct consequence of (1.21). Let us express 3° in a slightly more general form, exhibiting the connection between this condition and the optimality principle. Namely, we will show that for any $0 < S < T \le 1$ we have
\[
(1.22)\qquad B_T(x) = \sup\Big\{ B_S(X_S(f)) + \int_S^T F(t,f(t),X_t(f))\,dt \Big\},
\]
where the supremum is taken over all $f : (0,T] \to [0,\infty)$ satisfying $X_T(f) = x$. To this end, take an arbitrary $f$ on $(0,T]$ satisfying $X_T(f) = x$. Then, directly from the definition of $B_S$ applied to $f|_{(0,S]}$,
\[
\begin{aligned}
\int_0^T F(t,f(t),X_t(f))\,dt &= \int_0^S F(t,f(t),X_t(f))\,dt + \int_S^T F(t,f(t),X_t(f))\,dt \\
&\le B_S(X_S(f)) + \int_S^T F(t,f(t),X_t(f))\,dt.
\end{aligned}
\]

Taking the supremum over all $f$ as above yields the inequality $\le$ in (1.22). To prove the reverse bound, take an arbitrary function $f : (0,T] \to [0,\infty)$ satisfying $X_T(f) = x$. Next, pick any $\tilde f : (0,S] \to [0,\infty)$ such that $X_S(\tilde f) = X_S(f)$, and let us splice $\tilde f$ with $f$ by the formula $g = \tilde f\chi_{(0,S]} + f\chi_{(S,T]}$. Observe that if $t \in (S,T]$, then
\[
\begin{aligned}
X_t(g) = \frac{1}{t}\int_0^t \Phi(u,g(u))\,du &= \frac{1}{t}\Big(\int_0^S \Phi(u,\tilde f(u))\,du + \int_S^t \Phi(u,f(u))\,du\Big) \\
&= \frac{1}{t}\Big(\int_0^S \Phi(u,f(u))\,du + \int_S^t \Phi(u,f(u))\,du\Big) = X_t(f).
\end{aligned}
\]
Consequently,
\[
B_T(x) \ge \int_0^T F(t,g(t),X_t(g))\,dt = \int_0^S F(t,\tilde f(t),X_t(\tilde f))\,dt + \int_S^T F(t,f(t),X_t(f))\,dt,
\]
so taking the supremum over all $\tilde f$, and then over all $f$, yields the inequality $\ge$ in (1.22) and completes the proof. □

Let us also record here the following minimality principle for the Bellman family $(B_t)_{t\in(0,1]}$ constructed above.

Lemma 1.3. The Bellman family is the pointwise least family satisfying 1°, 2° and 3°.

Proof. We will prove that if a family $(U_t)_{t\in(0,1]}$ satisfies 1° and 3°, then we have the pointwise bound $B_t \le U_t$ for all $t \in (0,1]$. This will clearly yield the claim. For given $T > 0$ and $x \in D$, we take a nonnegative function $f$ on $(0,T]$ satisfying $X_T(f) = x$ and observe that
\[
U_T(x) = U_T(X_T(f)) + \int_T^T F(u,f(u),X_u(f))\,du \ge U_t(X_t(f)) + \int_t^T F(u,f(u),X_u(f))\,du
\]
for any $t > 0$. Since $U_t(X_t(f)) \ge 0$, letting $t \to 0$ above yields
\[
U_T(x) \ge \int_0^T F(u,f(u),X_u(f))\,du,
\]
and taking the supremum over all $f$ as above gives $B_T(x) \le U_T(x)$. □

Summarizing the above discussion, the problem of showing (1.21) is reduced to that of finding a family satisfying 1°, 2° and 3°. While the first two conditions are simple pointwise estimates, the third property might look a little more difficult. However, under appropriate regularity requirements, one may differentiate (with respect to $t$) the expression appearing in 3°, making it much easier to handle. To state this observation precisely, let us introduce the function $u(t,x) := U_t(x)$, defined for $(t,x) \in (0,1] \times D$.

Lemma 1.4. Assume that $\Phi$ is a continuous function. Furthermore, suppose that $u$ is continuous, of class $C^1$ in the interior of its domain, and satisfies
\[
(1.23)\qquad u_t(t,x) + \Big\langle \nabla_x u(t,x),\, \frac{\Phi(t,d) - x}{t} \Big\rangle - F(t,d,x) \ge 0
\]
for any $(t,x) \in (0,1] \times D$ and $d \ge 0$. Then 3° holds true for any continuous $f$.

Proof. This follows from a direct differentiation of the function
\[
t \mapsto U_t(X_t(f)) + \int_t^1 F(u,f(u),X_u(f))\,du. \qquad \square
\]
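For the reader's convenience, the differentiation can be made explicit. Since $X_t(f) = \frac{1}{t}\int_0^t \Phi(u,f(u))\,du$, the quotient rule and then the chain rule give

```latex
\frac{d}{dt}X_t(f)
  = -\frac{1}{t^2}\int_0^t \Phi(u,f(u))\,du + \frac{\Phi(t,f(t))}{t}
  = \frac{\Phi(t,f(t)) - X_t(f)}{t},
\qquad\text{whence}\qquad
\frac{d}{dt}\Big[U_t(X_t(f)) + \int_t^1 F(u,f(u),X_u(f))\,du\Big]
  = u_t(t,X_t(f))
    + \Big\langle \nabla_x u(t,X_t(f)),\,\frac{\Phi(t,f(t)) - X_t(f)}{t}\Big\rangle
    - F(t,f(t),X_t(f)).
```

The right-hand side is nonnegative by (1.23) applied with $d = f(t)$ and $x = X_t(f)$, so the function in 3° is indeed nondecreasing.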

Therefore, if the assumptions of the above lemma are satisfied, then the estimate (1.21) holds true for all continuous functions $f$. In most cases, the passage to general $f$ is carried out by straightforward approximation arguments, and then the analysis is complete.

Let us present several examples illustrating the above approach.


1.4.2. A weak-type estimate. We will prove the following statement.

Theorem 1.7. Let $p \in [0,1)$. Then for any $f \in L^1(0,\infty)$ and any $\lambda > 0$ we have
\[
(1.24)\qquad \lambda\,\Big|\Big\{t > 0 : t^{p-1}\int_0^t |f(u)|\,du \ge \lambda\Big\}\Big|^{1-p} \le \int_0^\infty |f(u)|\,du.
\]

Proof. Clearly, we may restrict ourselves to nonnegative functions and, by homogeneity, we may assume that $\lambda = 1$. It is enough to show that for any $T > 0$,
\[
(1.25)\qquad \Big|\Big\{t \in (0,T] : t^{p-1}\int_0^t f(u)\,du \ge 1\Big\}\Big|^{1-p} \le \int_0^T f(u)\,du.
\]

However, if we fix $T$ and substitute $t = Ts$, then the above inequality becomes
\[
\Big|\Big\{s \in (0,1] : s^{p-1}\int_0^s \tilde f(u)\,du > 1\Big\}\Big|^{1-p} \le \int_0^1 \tilde f(u)\,du,
\]
with the new function $\tilde f(u) := T^p f(Tu)$, $u \in (0,1]$. Hence, the validity of (1.25) for all $T > 0$ is actually equivalent to the validity of the single estimate with $T = 1$. We can rewrite this bound in the form (1.21) as follows. First, we set $\Phi(u,v) = v \in [0,\infty)$, so that $X_t(f) = \frac{1}{t}\int_0^t f(u)\,du \in D := [0,\infty)$. Next, take
\[
F(t,u,x) = \chi_{\{t^p x > 1\}} \qquad \text{and} \qquad G(x) = x^{1/(1-p)}.
\]

The associated Bellman family is given by
\[
B_t(x) = \sup\,\Big|\Big\{s \in (0,t] : s^{p-1}\int_0^s f(u)\,du > 1\Big\}\Big|, \qquad t \in (0,1],
\]
where the supremum runs over all $f \ge 0$ satisfying $X_t(f) = x$. It is easy to find the above supremum: it is clear that we should consider only those functions which have very small support concentrated near $0$ (for instance, of the form $f^{(a)} = \frac{tx}{a}\chi_{(0,a]}$, for some $a \in (0,t]$). Computing the integral over these functions and letting $a \to 0$ leads to the following candidate for $B$:
\[
U_t(x) = \min\big\{(tx)^{1/(1-p)},\, t\big\}.
\]

Of course, for any $t \in (0,1]$ we have $U_t \le B_t$, directly from the definition of the Bellman family. To show the reverse bound, it is enough to verify the condition 3° (see Lemma 1.3). To this end, observe first that for any $x \ge 0$, $t \mapsto U_t(x)$ is nondecreasing. Next, pick $0 \le t < u \le 1$; we must show that for any $f$ on $(0,u]$,
\[
\min\big\{(uX_u(f))^{1/(1-p)},\, u\big\} - \min\big\{(tX_t(f))^{1/(1-p)},\, t\big\} \ge \int_t^u \chi_{\{s^p X_s(f) > 1\}}\,ds.
\]
Fix $t$ and look at the behavior of both sides as $u$ increases. The right-hand side increases only on the open set $A = \{s \in (t,u) : s^p X_s(f) > 1\}$, and its derivative is equal to $1$ there. On the other hand, the left-hand side is a nondecreasing function of $u$, and we have $\min\{(sX_s(f))^{1/(1-p)},\, s\} = s$ on $A$, implying that the derivative of the left-hand side on $A$ is also equal to $1$. This proves 3° and establishes the identity $U_t = B_t$ for all $t$. Since $B_1(x) \le G(x)$, the desired inequality follows. □
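As a quick numerical illustration (not part of the proof), one can test the bound with $T = 1$ on a grid; the sample functions and step count below are arbitrary choices.

```python
def weak_type_holds(f, p, n=20000):
    # Grid check of |{t in (0,1] : t^{p-1} int_0^t f >= 1}|^{1-p} <= int_0^1 f,
    # for p in [0,1) and a nonnegative f on (0,1].
    h = 1.0 / n
    running = measure = total = 0.0
    for k in range(1, n + 1):
        t = k * h
        v = f(t - h / 2)
        running += v * h          # approximates int_0^t f(u) du
        total += v * h
        if t ** (p - 1) * running >= 1.0:
            measure += h
    return measure ** (1 - p) <= total + 1e-6

for c in (0.5, 2.0, 5.0, 20.0):
    assert weak_type_holds(lambda t: c, 0.5)
```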


1.4.3. Hardy inequality in $L^p$. Our next example is the following.

Theorem 1.8. For any $1 < p < \infty$ and any $f \in L^p(0,\infty)$ we have
\[
(1.26)\qquad \int_0^\infty \Big|\frac{1}{t}\int_0^t f(u)\,du\Big|^p dt \le \Big(\frac{p}{p-1}\Big)^p \int_0^\infty |f(t)|^p\,dt,
\]
and the constant is the best possible.

In the proof of this result, we will need the following auxiliary fact.

Lemma 1.5. There is an increasing continuous function $\varphi : [1,\infty) \to \mathbb{R}$ satisfying the differential equation
\[
(1.27)\qquad (p-1)\bigg[1 - \Big(\frac{s\varphi'(s) - \varphi(s)}{\varphi'(s)}\Big)^{1/(p-1)}\bigg]\big(s\varphi'(s) - \varphi(s)\big) = 1
\]
for $s \in (1,\infty)$ and the initial condition $\varphi(1) = 1$. Furthermore,
\[
(1.28)\qquad \varphi(s) \le \Big(\frac{p}{p-1}\Big)^p s \qquad \text{for } s \ge 1.
\]

Proof. Consider the function $\psi : [1,\infty) \to \mathbb{R}$ given by
\[
(1.29)\qquad \psi(s) = s\Big(1 - \frac{1}{p} + \frac{1}{ps}\Big)^p.
\]
One easily verifies that this function is smooth, strictly increasing and maps $[1,\infty)$ onto itself. Let $\varphi$ be the inverse of $\psi$; then $\varphi(1) = 1$ and we have
\[
(1.30)\qquad s = \Big(1 - \frac{1}{p} + \frac{1}{p\varphi(s)}\Big)^p \varphi(s).
\]
This, by a direct differentiation, yields
\[
1 = \varphi'(s)\Big(1 - \frac{1}{p} + \frac{1}{p\varphi(s)}\Big)^p - \frac{\varphi'(s)}{\varphi(s)}\Big(1 - \frac{1}{p} + \frac{1}{p\varphi(s)}\Big)^{p-1},
\]

or, equivalently,
\[
(1.31)\qquad \frac{1}{\varphi'(s)} = \frac{p-1}{p}\Big(1 - \frac{1}{p} + \frac{1}{p\varphi(s)}\Big)^{p-1}\Big(1 - \frac{1}{\varphi(s)}\Big).
\]
Multiply both sides by $\varphi(s)$ and subtract the obtained equality from (1.30). We get
\[
(1.32)\qquad s - \frac{\varphi(s)}{\varphi'(s)} = \Big(1 - \frac{1}{p} + \frac{1}{p\varphi(s)}\Big)^{p-1}
\]

and hence (1.31) can be rewritten in the form
\[
\frac{1}{\varphi'(s)} = (p-1)\Big(s - \frac{\varphi(s)}{\varphi'(s)}\Big)\bigg[1 - \Big(s - \frac{\varphi(s)}{\varphi'(s)}\Big)^{1/(p-1)}\bigg],
\]
which is the desired differential equation (1.27). To show (1.28), note that
\[
\Big(\frac{\varphi(s)}{s}\Big)' = \frac{\varphi'(s)s - \varphi(s)}{s^2} \ge 0,
\]


where the latter bound comes from (1.32) (and the estimate $\varphi'(s) > 0$, which is a consequence of the strict monotonicity of $\psi$). Thus, (1.28) follows at once from
\[
\lim_{s\to\infty} \frac{\varphi(s)}{s} = \lim_{s\to\infty} \frac{s}{\psi(s)} = \Big(\frac{p}{p-1}\Big)^p.
\]
This finishes the proof. □
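Since $\varphi$ is defined implicitly as the inverse of $\psi$, it is easy to tabulate numerically and to confirm both the ODE (1.27) and the bound (1.28). The sketch below (bisection inversion, arbitrary sample points and tolerances) is purely illustrative.

```python
def psi(s, p):
    return s * (1.0 - 1.0 / p + 1.0 / (p * s)) ** p     # (1.29)

def phi(s, p, tol=1e-12):
    # Invert the strictly increasing function psi on [1, infinity) by bisection.
    lo, hi = 1.0, 2.0
    while psi(hi, p) < s:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, p) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 2.0
for s in (1.5, 3.0, 10.0):
    v = phi(s, p)
    d = 1e-6
    dv = (phi(s + d, p) - phi(s - d, p)) / (2 * d)       # numerical phi'
    ode = (p - 1) * (1 - ((s * dv - v) / dv) ** (1 / (p - 1))) * (s * dv - v)
    assert abs(ode - 1.0) < 1e-3                          # ODE (1.27)
    assert v <= (p / (p - 1)) ** p * s                    # bound (1.28)
```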

A direct proof of (1.26). Suppose that $c_p$ is the smallest constant such that
\[
\int_0^\infty \Big|\frac{1}{t}\int_0^t f(u)\,du\Big|^p dt \le c_p \int_0^\infty |f(t)|^p\,dt
\]
for all $f \in L^p(0,\infty)$. Clearly, we may restrict ourselves to nonnegative and continuous functions (having established the above bound for such functions, we easily get the general case by standard density arguments). Using a similar truncation as in the proof of the previous estimate, we see that it is enough to show that
\[
(1.33)\qquad \int_0^1 \Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p dt \le c_p \int_0^1 f^p(t)\,dt,
\]

i.e., we may deal with functions on $(0,1]$ only. As we have explained above, if we take $\Phi(u,v) = (v,v^p)$, then
\[
X_t(f) = \Big(\frac{1}{t}\int_0^t f(u)\,du,\; \frac{1}{t}\int_0^t f^p(u)\,du\Big) \in D = \{(x,y) \in [0,\infty)^2 : x^p \le y\},
\]

and the choice $F(t,x_1,x_2) = x_1^p$, $G(x_1,x_2) = c_p x_2$ corresponds to (1.33). Write down the formula for the corresponding Bellman family:
\[
B_t(x,y) = \sup \int_0^t \Big(\frac{1}{s}\int_0^s f(r)\,dr\Big)^p ds,
\]
where the supremum is taken over all nonnegative $f$ satisfying $X_t(f) = (x,y)$. Directly from this definition we extract several structural properties of the family. First, observe that $X_t(f) = (x,y)$ if and only if the dilated function $f^{(T)} : u \mapsto f(Tu)$ satisfies $X_{t/T}(f^{(T)}) = (x,y)$. Thus, substituting in the integrals defining the Bellman family, we see that
\[
B_t(x,y) = t\,\sup \int_0^1 \Big(\frac{1}{s}\int_0^s f^{(t)}(r)\,dr\Big)^p ds = t\,B_1(x,y).
\]

Another important property comes from the observation that for any $\lambda > 0$, we have $X_t(f) = (x,y)$ if and only if $X_t(\lambda f) = (\lambda x, \lambda^p y)$. This implies $B_t(\lambda x, \lambda^p y) = \lambda^p B_t(x,y)$, and hence, taking $\lambda = 1/x$, we obtain
\[
B_t(x,y) = tB_1(x,y) = tx^p B_1(1, y/x^p)
\]
for all $(x,y,t)$. Denote $\varphi(s) = B_1(1,s)$: all we need is to find the explicit formula for $\varphi$ (there is no abuse of notation: as we will see shortly, $\varphi$ is precisely the function studied in Lemma 1.5 above). Here the reasoning will be a little informal: we will obtain the desired function after imposing some (reasonable) assumptions. The key indication for the search is contained in the inequality (1.23). Plugging $u(t,x,y) = B_t(x,y)$ there, we get
\[
x^p\varphi(y/x^p) + p\big(x^p\varphi(y/x^p) - y\varphi'(y/x^p)\big)\Big(\frac{d}{x} - 1\Big) + \varphi'(y/x^p)(d^p - y) - x^p \ge 0
\]


for all $d \ge 0$. Divide both sides by $x^p$ and substitute $s = y/x^p$, $r = d/x$, obtaining
\[
(1.34)\qquad \varphi(s) + p\big(\varphi(s) - s\varphi'(s)\big)(r-1) + \varphi'(s)(r^p - s) - 1 \ge 0.
\]
This inequality should hold for all $r \ge 0$, if we want our method to work. Suppose for a while that $\varphi'(s) > 0$ (when we have found the candidate for $\varphi$, we will verify that this condition is satisfied). Then, optimizing over $r$, we see that the above inequality yields
\[
(p-1)\bigg[1 - \Big(s - \frac{\varphi(s)}{\varphi'(s)}\Big)^{1/(p-1)}\bigg]\big(s\varphi'(s) - \varphi(s)\big) - 1 \ge 0.
\]
Since we are studying the sharp estimate, it is natural to expect that we should actually have equality here. This leads us to the differential equation (1.27). To get the initial condition for $\varphi(1) = B_1(1,1)$, note that there is only one continuous function $f$ for which $X_1(f) = (1,1)$: the constant function $f \equiv 1$. By the definition of $B_1$, we get $\varphi(1) = 1$, and hence $\varphi$ is the function constructed in Lemma 1.5 (in particular, we have $\varphi' > 0$, which we needed above).

The above reasoning has brought us to the candidate $U_t$ given by $U_t(x,y) = tx^p\varphi(y/x^p)$, $t \in (0,1]$. By the very construction, we see that the family $(U_t)_{t\in(0,1]}$ satisfies 1° and 3°. Furthermore, (1.28) shows that 2° is also true, if we only assume that $c_p \ge (p/(p-1))^p$. Hence, we set $c_p = (p/(p-1))^p$: the method implies that the inequality (1.33), and hence also the inequality of Theorem 1.8, are valid. □

Sharpness. A beautiful feature of the approach is that it also gives an indication about the extremal functions in (1.26). The idea is very simple: we want to find an $f$ such that all the intermediate inequalities appearing during the application of the method become equalities. Let us be more specific. As we already know, for a given smooth $f : (0,1] \to [0,\infty)$, the function
\[
t \mapsto B(X_t(f), Y_t(f), t) + \int_t^1 F(X_s(f), Y_s(f), s)\,ds
\]
(here $X_t(f)$ and $Y_t(f)$ denote the two coordinates of the functional introduced above) is nondecreasing. This, by direct differentiation, gives (1.34) with the parameters $s = Y_t(f)/X_t^p(f)$ and $r = f(t)/X_t(f)$. Then the optimization over $r$ leads to (1.27): the extremal choice for $r$ is $(s - \varphi(s)/\varphi'(s))^{1/(p-1)}$, as one easily verifies. This, in the light of (1.32), is equal to $1 - p^{-1} + (p\varphi(s))^{-1}$, or just $(s/\varphi(s))^{1/p}$, by the definition of the function $\psi$. Since the extremal function $f$ should correspond to this extremal choice of $r$, we should have
\[
(1.35)\qquad \frac{f(t)}{X_t(f)} = \frac{\big(Y_t(f)/X_t^p(f)\big)^{1/p}}{\varphi\big(Y_t(f)/X_t^p(f)\big)^{1/p}},
\]
or $\varphi(Y_t(f)/X_t^p(f)) = Y_t(f)/f^p(t)$. Plugging this into the definition (1.29) of $\psi$ and working a little bit, we get
\[
\frac{f(t)}{X_t(f)} = 1 - \frac{1}{p} + \frac{f^p(t)}{pY_t(f)},
\]
or
\[
\frac{f(t)}{\int_0^t f(s)\,ds} = \Big(1 - \frac{1}{p}\Big)t^{-1} + \frac{f^p(t)}{p\int_0^t f^p(s)\,ds}.
\]
The latter identity means that the function
\[
t \mapsto \log\Big[\int_0^t f(s)\,ds\Big] - \Big(1 - \frac{1}{p}\Big)\log t - \frac{1}{p}\log\Big[\int_0^t f^p(s)\,ds\Big]
\]
is constant. Equivalently, we have $Y_t(f)/X_t^p(f) = c$ for some constant $c$. Coming back to (1.35), we see that $f$ and $X_\cdot(f)$ should be proportional: $f(t) = CX_t(f)$, where $C = c^{1/p}/\varphi(c)^{1/p}$. This, in turn, implies $tf(t) = C\int_0^t f(s)\,ds$ and hence $f(t) = at^{C-1}$ for some constant $a$. One easily verifies that this candidate indeed yields the sharpness of (1.33), and hence also of (1.26): it suffices to take $a = 1$ and $C$ close to (but a little larger than) $1 - 1/p$.
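For $f(t) = t^{C-1}$ both sides of (1.33) are available in closed form, which makes the sharpness claim easy to check numerically; the sketch below takes $a = 1$ as suggested, with an arbitrary sample value of $p$.

```python
p = 2.5
cp = (p / (p - 1)) ** p
prev = 0.0
for eps in (0.1, 0.01, 0.001):
    C = 1.0 - 1.0 / p + eps
    # For f(t) = t^{C-1}: (1/t) int_0^t f(u) du = t^{C-1}/C, so
    #   LHS = C^{-p} / ((C-1)p + 1),   RHS = 1 / ((C-1)p + 1).
    lhs = C ** (-p) / ((C - 1.0) * p + 1.0)
    rhs = 1.0 / ((C - 1.0) * p + 1.0)
    ratio = lhs / rhs            # equals C^{-p}
    assert prev < ratio <= cp    # the ratio increases toward c_p as eps -> 0
    prev = ratio
```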

A simpler proof of Theorem 1.8. The above direct application of the method may look a little complicated, but fortunately one can simplify the argumentation. It turns out that a proper reformulation of the inequality leads us to Bellman functions of one variable only. To see this, rewrite (1.33) in the form
\[
\int_0^1 \bigg[\Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p - c_p f^p(t)\bigg]\,dt \le 0.
\]
Take $\Phi(u,v) = v$: then $X_t(f) = \frac{1}{t}\int_0^t f(u)\,du \in D := [0,\infty)$, and the choice $F(t,u,x) = x^p - c_p u^p$, $G(x) \equiv 0$ transforms (1.21) into the above bound. The evident obstacle we face here is that the function $F$ may take negative values; on the other hand, in the above abstract discussion of the method we have used the inequality $F \ge 0$ several times. However, as we will see, we can modify the arguments at appropriate places and make the technique applicable also in this more general setting.

So, let us proceed as above and introduce the associated Bellman family by
\[
B_T(x) = \sup \int_0^T \bigg[\Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p - c_p f^p(t)\bigg]\,dt, \qquad x \ge 0,\ T \in (0,1],
\]
where the supremum is taken over all nonnegative and continuous functions $f$ on $(0,T]$ satisfying $\frac{1}{T}\int_0^T f(u)\,du = x$. Arguing as above, we show the following structural properties of $(B_t)_{t\in(0,1]}$:
\[
B_T(x) = TB_1(x) \qquad \text{for all } x \ge 0 \text{ and } T \in (0,1],
\]
and
\[
B_T(x) = x^p B_T(1) \qquad \text{for all } x \ge 0 \text{ and } T \in (0,1].
\]
Putting these two observations together, we see that
\[
(1.36)\qquad B_T(x) = -\alpha T x^p
\]

for some constant $\alpha$ depending only on $p$; we have put the minus sign above, since $B$ should be nonpositive if the inequality (1.33) is to hold. Let us try to verify the validity of (1.23). One easily computes that it reads
\[
-\alpha p x^{p-1}(d-x) + (-\alpha - 1)x^p + c_p d^p \ge 0.
\]
For a fixed $x$, the left-hand side, considered as a function of $d$, attains its minimal value for $d$ satisfying $d = (\alpha/c_p)^{1/(p-1)}x$. Plugging this above, we see that $\alpha$ must satisfy the inequality
\[
\frac{(1-p)\,\alpha^{p/(p-1)}}{c_p^{1/(p-1)}} + \alpha(p-1) - 1 \ge 0.
\]


The left-hand side, considered as a function of $\alpha$, attains its maximum for $\alpha = \big(\frac{p}{p-1}\big)^{1-p}c_p$, and this maximal value is equal to $\big(\frac{p}{p-1}\big)^{-p}c_p - 1$. Therefore, summarizing the above calculations, if we take $c_p = \big(\frac{p}{p-1}\big)^p$, then there is $\alpha$ (equal to $p/(p-1)$) such that the family $(B_T)_{T\in(0,1]}$, given by (1.36), satisfies (1.23). Consequently, arguing as in the proof of Lemma 1.1, we get
\[
\begin{aligned}
0 = G(X_1(f)) &\ge B_t(X_t(f)) + \int_t^1 F(u,f(u),X_u(f))\,du \\
&= -\alpha t\Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p + \int_t^1 \bigg[\Big(\frac{1}{s}\int_0^s f(u)\,du\Big)^p - c_p f^p(s)\bigg]\,ds.
\end{aligned}
\]

The point is that we cannot remove the term $-\alpha t\big(\frac{1}{t}\int_0^t f(u)\,du\big)^p$, since it is negative; furthermore, letting $t \to 0$ in the second term is also problematic, since the integrand takes negative values. However, these obstacles are easily removed. We rewrite the inequality in the form
\[
(1.37)\qquad c_p\int_t^1 f^p(u)\,du \ge -\alpha t\Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p + \int_t^1 \Big(\frac{1}{s}\int_0^s f(u)\,du\Big)^p ds
\]

and let $t \to 0$. Then $\int_t^1 f^p(u)\,du \to \int_0^1 f^p(u)\,du$ and, by Hölder's inequality,
\[
t\Big(\frac{1}{t}\int_0^t f(u)\,du\Big)^p \le \int_0^t f^p(u)\,du \to 0.
\]
Finally, the integrand in the last integral in (1.37) is nonnegative. Therefore, Fatou's lemma yields (1.33). □
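The pointwise inequality at the heart of this argument, $-\alpha p x^{p-1}(d-x) + (-\alpha-1)x^p + c_p d^p \ge 0$ with $\alpha = p/(p-1)$ and $c_p = (p/(p-1))^p$, can be probed on random points; the sketch below is an illustration, independent of the proof.

```python
import random

random.seed(1)
p = 3.0
cp = (p / (p - 1)) ** p
alpha = p / (p - 1)
for _ in range(10000):
    x = random.uniform(0.0, 10.0)
    d = random.uniform(0.0, 10.0)
    val = -alpha * p * x ** (p - 1) * (d - x) + (-alpha - 1.0) * x ** p + cp * d ** p
    # equality is attained on the line d = ((p-1)/p) * x
    assert val >= -1e-9
```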

1.5. Problems

1. Prove that for any $n \ge 1$ and any numbers $a_1, a_2, \ldots, a_n \in [0,1]$ we have
\[
(1-a_1)(1-a_2)\cdots(1-a_n) \ge 1 - a_1 - a_2 - \ldots - a_n.
\]

2. Prove that for any $n \ge 1$ and any real numbers $a_1, a_2, \ldots, a_n$ we have
\[
a_1a_2\cdots a_n \le \frac{a_1^2}{2} + \frac{a_2^4}{4} + \frac{a_3^8}{8} + \ldots + \frac{a_n^{2^n}}{2^n} + \frac{1}{2^n}.
\]

3. Show that for any $n \ge 2$ and any positive numbers $a_1, a_2, \ldots, a_n$ we have
\[
\sum_{k=1}^n \frac{1}{1+a_k} \ge \min\Big\{1,\ \frac{n}{1 + (a_1a_2\cdots a_n)^{1/n}}\Big\}.
\]

4. (J. V. Whittaker) Nonnegative numbers $a_1, a_2, \ldots, a_N$ satisfy the condition $\sum_{n=1}^N (1+a_n)^{-1} \le a$. Find the maximal value of the expression $\sum_{n=1}^N 2^{-a_n}$.

5. Prove that for any positive numbers $a_1, a_2, \ldots, a_n$ satisfying $a_1 + a_2 + \ldots + a_n < 1$ we have
\[
\frac{a_1a_2\cdots a_n\big(1 - (a_1 + a_2 + \ldots + a_n)\big)}{(a_1 + a_2 + \ldots + a_n)(1-a_1)(1-a_2)\cdots(1-a_n)} \le \frac{1}{n^{n+1}}.
\]

6. Prove that for any positive numbers $a_1, a_2, \ldots, a_n$ satisfying $a_1a_2\cdots a_n = 1$ we have
\[
\frac{1}{n-1+a_1} + \frac{1}{n-1+a_2} + \ldots + \frac{1}{n-1+a_n} \le 1.
\]


7. Prove that for any positive numbers $a_1, a_2, \ldots, a_n$ we have
\[
\Big(1 + \frac{1}{a_1}\Big)\Big(1 + \frac{1}{a_2}\Big)\cdots\Big(1 + \frac{1}{a_n}\Big) \ge \Big(1 + \frac{n}{a_1 + a_2 + \ldots + a_n}\Big)^n.
\]

8. Prove that for any positive numbers $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$ we have
\[
\Big(\frac{a_1 + a_2 + \ldots + a_n}{b_1 + b_2 + \ldots + b_n}\Big)^2 \le \Big(\frac{a_1}{b_1}\Big)^2 + \Big(\frac{a_2}{b_2}\Big)^2 + \ldots + \Big(\frac{a_n}{b_n}\Big)^2.
\]

9. (R. Bellman [1]) At any time, a particle can be in one of two states, called $S$ and $T$. At the beginning, the particle is in the state $T$. We perform a sequence of operations of type $A$ and $L$. If the operation $A$ is used and the probability that the particle is in the state $T$ is equal to $x$, then this probability decreases to $ax$ ($0 < a < 1$ is a given constant). The operation $L$ consists of observing the particle and tells us definitely which state it is in. Given the strategy, let $\tau$ be the number of operations after which the particle is observed in $S$ (with certainty). Find the strategy giving the minimal expectation of $\tau$.

10. (E. Cerri, M. Silvestri and S. Villan) Let $g : [1,\infty) \to \mathbb{R}$ be a continuous function and let
\[
V_n(x) = \inf\Big\{ g(a_1) + \frac{g(a_2)}{a_1} + \frac{g(a_3)}{a_1a_2} + \ldots + \frac{g(a_n)}{a_1a_2\cdots a_{n-1}} \Big\},
\]
where the infimum is taken over all $a_1, a_2, \ldots, a_n \ge 1$ satisfying $a_1a_2\cdots a_n = x$. Show that for any $n \ge 1$ we have
\[
V_{n+1}(x) = \inf_{t\ge 1}\Big\{ g(t) + \frac{1}{t}V_n\Big(\frac{x}{t}\Big) \Big\}.
\]
Solve the problem explicitly for $g(y) = y^p$, $p > 0$.

11. (T. Carleman [4]) Using dynamic programming, prove that for any nonnegative numbers $a_1, a_2, \ldots$ we have the sharp estimate
\[
\sum_{n=1}^\infty (a_1a_2\cdots a_n)^{1/n} \le e\sum_{n=1}^\infty a_n.
\]

12. (G. H. Hardy, J. E. Littlewood, G. Pólya [10]) Using dynamic programming, prove that if $0 < p < 1$ and $a_1, a_2, \ldots$ is a sequence of nonnegative numbers such that $\sum_{n=1}^\infty a_n^p < \infty$, then we have the sharp estimate
\[
\Big(1 + \frac{1}{1-p}\Big)\Big(\frac{a_1 + a_2 + \ldots}{1}\Big)^p + \sum_{n=2}^\infty \Big(\frac{a_n + a_{n+1} + \ldots}{n}\Big)^p > \Big(\frac{p}{1-p}\Big)^p \sum_{n=1}^\infty a_n^p.
\]

13. (F. Carlson [5]) Prove that for any $f : [0,\infty) \to \mathbb{R}$ we have
\[
\int_0^\infty |f(t)|\,dt \le \sqrt{\pi}\,\Big(\int_0^\infty f^2(u)\,du\Big)^{1/4}\Big(\int_0^\infty f^2(u)u^2\,du\Big)^{1/4}.
\]


CHAPTER 2

Optimal stopping

2.1. Discrete time

2.1.1. Martingale approach: description. Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a given probability space, equipped with a filtration $(\mathcal{F}_n)_{n\ge 0}$, i.e., a nondecreasing family of sub-$\sigma$-algebras of $\mathcal{F}$. Let $G = (G_n)_{n\ge 0}$ be an adapted sequence of random variables, i.e., we assume that for each $n \ge 0$ the random variable $G_n$ is measurable with respect to $\mathcal{F}_n$. Our objective will be to stop this sequence so that the expected return is maximized. The stopping procedure is described by a random variable $\tau : \Omega \to \{0, 1, 2, \ldots\}$, which returns the value of the time when the sequence $(G_n)_{n\ge 0}$ should be stopped. Clearly, a reasonable procedure decides whether to stop the sequence at time $n$ or not based on the observations up to time $n$; this means that
\[
\{\tau = n\} \in \mathcal{F}_n \qquad \text{for each } n \ge 0.
\]

A random variable $\tau$ satisfying the above condition will be called a stopping time.

Let us put the discussion into a more precise framework. The optimal stopping problem concerns the study of
\[
(2.1)\qquad V_0 = \sup_\tau \mathbb{E} G_\tau,
\]
where the supremum is taken over a certain family of stopping times $\tau$ (which depends on the problem). We should point out that the study consists of two parts: (i) to compute the value $V_0$ as explicitly as possible; (ii) to identify the optimal stopping time $\tau_*$ (or the family of almost-optimal stopping times) for which the supremum $V_0$ is attained.

The first problem we encounter concerns the existence of $\mathbb{E} G_\tau$; to overcome it, we need to impose some additional assumptions on $G$ and $\tau$. For example, if
\[
(2.2)\qquad \mathbb{E}\sup_{n\ge 0}|G_n| < \infty,
\]
then the expectation $\mathbb{E} G_\tau$ is well defined for all stopping times $\tau$. Another possibility is to restrict in (2.1) to those $\tau$ for which the expectation exists. One way or another, we should emphasize that in general this obstacle is just a technicality, which is easily removed by some straightforward arguments (which might depend on the problem under study). For the sake of simplicity and the clarity of the statements, we will assume that the condition (2.2) is satisfied, but it will be evident how to relax this requirement in other contexts.

So, let us assume that the supremum in (2.1) is taken over the family $\mathcal{M}$ of all stopping times $\tau$. A successful treatment of this problem requires the introduction, for each $n \le N$, of the smaller family
\[
\mathcal{M}_n^N = \{\tau \in \mathcal{M} : n \le \tau \le N\}.
\]


We will also write $\mathcal{M}^N = \mathcal{M}_0^N$ and $\mathcal{M}_n = \mathcal{M}_n^\infty$. These families give rise to the related value functions
\[
(2.3)\qquad V_n^N = \sup_{\tau\in\mathcal{M}_n^N} \mathbb{E} G_\tau,
\]
and we will use the notation $V^N = V_0^N$, $V_n = V_n^\infty$ and $V = V_0^\infty$. The primary goal of this section is to present the solution to (2.3) with the use of the martingale approach.

2.1.2. Martingale approach: finite horizon. If $N < \infty$ (the case of finite horizon), then the problem (2.3) can be easily solved by means of backward induction. Indeed, let us fix a nonnegative integer $N$ and inspect the value functions as $n$ decreases from $N$ to $0$. If $n = N$, then the class $\mathcal{M}_N^N$ consists of one stopping time $\tau \equiv N$ only, and hence the optimal gain is equal to $G_N$ (and $V_N^N = \mathbb{E} G_N$). If $n = N-1$, then we have two choices for the stopping time: we can either stop at time $N-1$ or continue and stop at time $N$. In the first case our gain is $G_{N-1}$; in the second case we do not know what the random variable $G_N$ will be, so we can only say that, on average, we will obtain $\mathbb{E}(G_N|\mathcal{F}_{N-1})$. Therefore, if $G_{N-1} \ge \mathbb{E}(G_N|\mathcal{F}_{N-1})$, we should stop immediately; otherwise, we should continue. For smaller values of $n$ we proceed similarly. More precisely, define recursively the sequence $(B_n^N)_{0\le n\le N}$, representing the optimal gains at times $0, 1, 2, \ldots, N$, as follows:
\[
(2.4)\qquad B_N^N = G_N, \qquad B_n^N = \max\big\{G_n,\ \mathbb{E}(B_{n+1}^N|\mathcal{F}_n)\big\}, \quad n = N-1, N-2, \ldots, 0.
\]
The above discussion also suggests to consider the family of stopping times
\[
(2.5)\qquad \tau_n^N = \inf\big\{k \in \{n, n+1, \ldots, N\} : B_k^N = G_k\big\}
\]
for $n = 0, 1, 2, \ldots, N$.
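On a concrete Markov example the recursion (2.4) is only a few lines of code. Below is a sketch for a symmetric random walk $S_n$ (so that conditional expectations reduce to averages over the two children of each node); the payoffs and horizon are arbitrary illustrative choices, and the helper name `snell` is ours.

```python
def snell(N, gain):
    # Backward induction (2.4) on the binomial tree of a symmetric random walk:
    # B[n][s] = optimal expected gain from time n onward, given S_n = s.
    B = {N: {s: gain(N, s) for s in range(-N, N + 1, 2)}}
    for n in range(N - 1, -1, -1):
        B[n] = {}
        for s in range(-n, n + 1, 2):
            cont = 0.5 * (B[n + 1][s - 1] + B[n + 1][s + 1])  # E(B_{n+1} | F_n)
            B[n][s] = max(gain(n, s), cont)
    return B

# With the martingale payoff G_n = S_n, every stopping time gives expectation 0:
assert snell(4, lambda n, s: float(s))[0][0] == 0.0
# With G_n = max(S_n, 0) it pays to continue at the start:
assert snell(4, lambda n, s: float(max(s, 0)))[0][0] == 0.75
```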

Theorem 2.1. Suppose that $N$ is a fixed integer and the sequence $G = (G_k)_{k=n}^N$ satisfies $\mathbb{E}\max_{n\le k\le N}|G_k| < \infty$. Consider the optimal stopping problem (2.3) and the sequence $(B_k^N)_{k=n}^N$ defined by (2.4).

(i) The sequence $(B_k^N)_{k=n}^N$ is the smallest supermartingale majorizing $(G_k)_{k=n}^N$. In addition, the stopped sequence $\big(B^N_{\tau_n^N\wedge k}\big)_{k=n}^N$ is a martingale.

(ii) For any $0 \le n \le N$ we have, with probability 1,
\[
(2.6)\qquad B_n^N \ge \mathbb{E}(G_\tau|\mathcal{F}_n) \quad \text{for any } \tau \in \mathcal{M}_n^N,
\]
\[
(2.7)\qquad B_n^N = \mathbb{E}(G_{\tau_n^N}|\mathcal{F}_n).
\]

(iii) The stopping time $\tau_n^N$ is optimal in (2.3), and any other optimal stopping time $\tau_*$ satisfies $\tau_* \ge \tau_n^N$ almost surely.

Proof. (i) The supermartingale property and the majorization follow directly from the definition of the sequence $(B_n^N)_{n=0}^N$. If $(\tilde B_n^N)_{n=0}^N$ is another supermartingale majorizing $(G_k)_{k=n}^N$, then the desired inequality $B_k^N \le \tilde B_k^N$ almost surely, $k = n, n+1, n+2, \ldots, N$, can be proved by backward induction. Indeed, the estimate is trivial for $k = N$ (we have $B_N^N = G_N \le \tilde B_N^N$, by the majorization property of $\tilde B$), and assuming its validity for $k$, we see that
\[
\tilde B_{k-1}^N \ge \max\big\{G_{k-1},\ \mathbb{E}(\tilde B_k^N|\mathcal{F}_{k-1})\big\} \ge \max\big\{G_{k-1},\ \mathbb{E}(B_k^N|\mathcal{F}_{k-1})\big\} = B_{k-1}^N.
\]


So, it remains to prove the martingale property of the stopped process(BNτN

n ∧k

)Nk=n

.

We compute directly that

E[BNτN

n ∧(k+1) | Fk]

= E[BNτN

n ∧(k+1)1τNn ≤k | Fk

]+ E

[BNτN

n ∧(k+1)1τNn >k | Fk

]= E

[BNτN

n ∧k1τN

n ≤k | Fk]

+ E[BNk+11τN

n >k | Fk]

= BNτNn ∧k

1τNn ≤k + 1τN

n >kE[BNk+1 | Fk

].

However, on the set τNn > k we have BNk > Gk and hence BNk = E(BNk+1|Fk).

This shows the identity E[BNτN

n ∧(k+1) | Fk]

= BNτNn ∧k

1τNn ≤k+1τN

n >kBNk = BNτN

n ∧kand part (i) is established.

(ii) This follows at once from (i) and Doob's optional sampling theorem.

(iii) Taking the expectations in (2.6) and (2.7) gives EGτ ≤ EBNn = EGτNn

for

all τ ∈MNn , showing that τ

Nn is indeed the optimal stopping time. Suppose that τ∗

is another optimal stopping time. Then BNτ∗ = Gτ∗ almost surely, since otherwisewe would have

EGτ∗ < EBNτ∗ ≤ EBNn = EGτNn,

where the second inequality follows from Doob's optional sampling theorem andthe supermartingale property of the sequence (BNk )Nk=n. The contradiction showsthat Bτ∗ and Gτ∗ must coincide, and clearly τNn is the smallest stopping time whichhas this property.

Remark 2.1. The following observation is an immediate consequence of theabove considerations. Namely, as we have just proved, the stopping time τN0 isoptimal for V N0 . If it was not optimal to stop at the instances 0, 1, . . ., n − 1,then the optimal policy coincides with that corresponding to the problem V Nn .This naturally brings us into the context of dynamic programming studied in theprevious chapter.

2.1.3. Martingale approach: innite horizon. The above method re-quired N to be a nite integer, since we have needed the variable GN to startthe backward recurrence. In the case N =∞ one could try to use approximation-type arguments (of the form V∞n = limN→∞ V Nn ), but these do not necessarily workin general, so we will proceed in a dierent manner. By (2.6) and (2.7) it seemstempting to write

BNn = supτ∈MN

n

E(Gτ |Fn).

However, two problems arise. The rst is that (2.6) and (2.7) hold true on a set offull measure only which might depend on the stopping time, so the above identitymight fail to hold. A second obstacle is that the supremum on the right need not beeven measurable. To overcome these diculties, a typical argument in the theoryof optimal stopping is to introduce the concept of essential supremum.

Definition 2.1. Let (Z_α)_{α∈I} be a family of random variables. Then there is a countable subset J of I such that the random variable Z = sup_{α∈J} Z_α satisfies the following two properties:

(i) P(Z_α ≤ Z) = 1 for each α ∈ I,

(ii) if Z' is another random variable satisfying (i) in place of Z, then P(Z ≤ Z') = 1.

The random variable Z is called the essential supremum of (Z_α)_{α∈I} and is denoted by ess sup_{α∈I} Z_α. In addition, if {Z_α : α ∈ I} is upwards directed, in the sense that for any α, β ∈ I there is γ ∈ I such that max{Z_α, Z_β} ≤ Z_γ, then the countable set J = {α_1, α_2, ...} can be chosen so that Z_{α_1} ≤ Z_{α_2} ≤ ... and ess sup_{α∈I} Z_α = lim_{n→∞} Z_{α_n}.

Now we see that (2.6) and (2.7) give the identity

(2.8) B^N_n = ess sup_{τ ∈ M^N_n} E(G_τ | F_n)

with probability 1. A nice feature of this alternative characterization of the sequence (B^N_n)_{n=0}^N is that it extends naturally to the setting of infinite horizon (i.e., to N = ∞) and, as we shall prove now, provides the desired solution.

So, consider the optimal stopping problem (2.3) for N = ∞:

(2.9) V_n = sup_{τ ≥ n} E G_τ.

For n = 0, 1, 2, ..., introduce the random variable

(2.10) B_n = ess sup_{τ ≥ n} E(G_τ | F_n)

and the stopping time

(2.11) τ_n = inf{k ≥ n : B_k = G_k},

with the usual convention inf ∅ = ∞. In the literature, the sequence (B_n)_{n≥0} is often referred to as the Snell envelope of G.

We will establish the following analogue of Theorem 2.1.

Theorem 2.2. Suppose that the sequence (G_n)_{n≥0} satisfies E sup_{n≥0} |G_n| < ∞ and consider the optimal stopping problem (2.9). Then the following statements hold true.

(i) For any n ≥ 0 we have the recurrence relation

B_n = max{G_n, E(B_{n+1} | F_n)}.

(ii) We have P(B_n ≥ E(G_τ | F_n)) = 1 for all τ ∈ M_n and, if the stopping time τ_n is finite almost surely, then P(B_n = E(G_{τ_n} | F_n)) = 1.

(iii) If P(τ_n < ∞) = 1, then τ_n is optimal in (2.9). Furthermore, if τ* is another optimal stopping time for (2.9), then τ_n ≤ τ* almost surely.

(iv) The sequence (B_k)_{k≥n} is the smallest supermartingale which majorizes (G_k)_{k≥n}. Moreover, the stopped process (B_{τ_n ∧ k})_{k≥n} is a martingale.

Proof. We will only establish (i); the other parts can be shown with argumentation similar to that used in the proof of Theorem 2.1. We need to show two inequalities to prove the identity. Take τ ∈ M_n and let τ' = τ ∨ (n+1). Then τ' ∈ M_{n+1} and, since {τ ≥ n+1} ∈ F_n, we may write

E(G_τ | F_n) = E(G_τ 1_{τ=n} | F_n) + E(G_τ 1_{τ≥n+1} | F_n)
= 1_{τ=n} G_n + 1_{τ≥n+1} E(G_{τ'} | F_n)
= 1_{τ=n} G_n + 1_{τ≥n+1} E[E(G_{τ'} | F_{n+1}) | F_n]
≤ 1_{τ=n} G_n + 1_{τ≥n+1} E(B_{n+1} | F_n)
≤ max{G_n, E(B_{n+1} | F_n)}.

This proves the inequality ≤. To show the reverse, observe that the family {E(G_τ | F_{n+1}) : τ ∈ M_{n+1}} is upwards directed. Indeed, if α, β ∈ M_{n+1} and we set γ = α 1_A + β 1_{Ω\A}, where A = {E(G_α | F_{n+1}) ≥ E(G_β | F_{n+1})}, then γ is a stopping time belonging to M_{n+1} and

E(G_γ | F_{n+1}) = E(G_α 1_A + G_β 1_{Ω\A} | F_{n+1})
= 1_A E(G_α | F_{n+1}) + 1_{Ω\A} E(G_β | F_{n+1})
= max{E(G_α | F_{n+1}), E(G_β | F_{n+1})}.

Therefore, there is a sequence {σ_k : k ≥ 1} in M_{n+1} such that

ess sup_{τ ∈ M_{n+1}} E(G_τ | F_{n+1}) = lim_{k→∞} E(G_{σ_k} | F_{n+1})

and E(G_{σ_1} | F_{n+1}) ≤ E(G_{σ_2} | F_{n+1}) ≤ ... with probability 1. Now we can write, by Lebesgue's monotone convergence theorem,

E(B_{n+1} | F_n) = E( ess sup_{τ ∈ M_{n+1}} E(G_τ | F_{n+1}) | F_n )
= E( lim_{k→∞} E(G_{σ_k} | F_{n+1}) | F_n )
= lim_{k→∞} E( E(G_{σ_k} | F_{n+1}) | F_n )
= lim_{k→∞} E(G_{σ_k} | F_n) ≤ B_n.

Since B_n ≥ G_n (which can be obtained trivially by considering τ ≡ n in the definition of B_n), we get the desired identity.

In the remaining part of this subsection, let us inspect the connection between the contexts of finite and infinite horizons. One easily checks that the random variables B^N_n and τ^N_n do not decrease as we increase N. Consequently, the limits

B^∞_n := lim_{N→∞} B^N_n and τ^∞_n := lim_{N→∞} τ^N_n

exist on a set of full measure. Furthermore, we also see that the sequence (V^N_n)_{N=n}^∞ is nondecreasing, so the quantity V^∞_n = lim_{N→∞} V^N_n is well-defined. Now it follows directly from (2.5), (2.8), (2.10) and (2.11) that

(2.12) B^∞_n ≤ B_n and τ^∞_n ≤ τ_n

almost surely. Therefore, we also have

(2.13) V^∞_n ≤ V_n.

Theorem 2.3. Suppose that E sup_{n≥0} |G_n| < ∞ and consider the optimal stopping problems (2.3) and (2.9). Then equalities hold in (2.12) and (2.13).


Proof. Letting N → ∞ in the recurrence relation (2.4) yields

B^∞_n = max{G_n, E(B^∞_{n+1} | F_n)}, n = 0, 1, 2, ...,

by Lebesgue's monotone convergence theorem. Consequently, (B^∞_n)_{n≥0} is an adapted supermartingale dominating (G_n)_{n≥0}. Thus B^∞_n ≥ B_n for each n, by the fourth part of the preceding theorem. This shows the identity B^∞ = B almost surely, and the remaining equalities follow immediately.

Example 2.1. Strict inequalities may hold in (2.12) and (2.13) if the integrability condition on sup_{n≥0} |G_n| is not imposed. To see this, let ε_0, ε_1, ε_2, ... be a sequence of independent Rademacher variables and set G_n = ε_0 + ε_1 + ... + ε_n. Then the process (G_n)_{n≥0} is a martingale with respect to the natural filtration, so V^N_n = 0, B^N_n = G_n and τ^N_n = n for all 0 ≤ n ≤ N < ∞. Consequently, these identities are preserved in the limit: V^∞_n = 0, B^∞_n = G_n and τ^∞_n = n for all n. On the other hand, it is well known that for any positive integer a, the stopping time σ_n = inf{k ≥ n : G_k = a} is finite almost surely and hence V_n ≥ E G_{σ_n} = a. Since a was arbitrary, we get V_n = ∞, B_n = ∞ and τ_n = ∞ with probability 1.

2.1.4. An application: a difference prophet inequality for bounded random variables. We take the opportunity to present here some information about the so-called prophet inequalities, a distinguished class of estimates arising in the theory of optimal stopping. As in the preceding section, we assume that we are given a sequence G (finite or infinite) of random variables adapted to some filtration. The idea is as follows: under some boundedness condition on G, find universal inequalities which compare M = E sup_n G_n, the expected supremum of the sequence, with V = sup_τ E G_τ, the optimal stopping value of the sequence; here τ runs over the class of all finite stopping times adapted to the underlying filtration. The term prophet inequality arises from the optimal-stopping interpretation of M, which is the optimal expected return of a player endowed with complete foresight; this player observes the sequence G and may stop whenever he wants, incurring a reward equal to the variable at the time of stopping. With complete foresight, such a player obviously stops when the largest value is observed, and on average his reward is equal to M. On the other hand, the quantity V corresponds to the optimal return of the non-prophet player.

The literature on the subject is quite large (see the bibliographic details at the end of this chapter). We will mention only two results here. The first result in this direction is the estimate of Krengel, Sucheston and Garling, which asserts that if G_1, G_2, ... are independent and nonnegative, then we have

M ≤ 2V

and the constant 2 is the best possible. Sometimes such estimates are called ratio prophet inequalities, as they provide an efficient upper bound for the ratio M/V. The above estimate is left as an exercise: see Problem 2.3 below. Instead, we will study the following related difference prophet inequality.

Theorem 2.4. If G_1, G_2, ... are independent random variables taking values in an interval [a, b], then

M ≤ V + (b − a)/4,

and the constant (b − a)/4 cannot be decreased.


Note that it suffices to show the claim for a = 0 and b = 1 only, by simple translation and dilation arguments. The proof of the above statement will involve a certain transformation of random variables, which we describe now. Namely, if Y is an arbitrary integrable random variable (taking values in R) and a < b, then we define

Y^b_a(ω) = Y(ω) if Y(ω) ∉ [a, b], and Y^b_a(ω) ∈ {a, b} if Y(ω) ∈ [a, b],

with

P(Y^b_a = a) = (1/(b − a)) E[(b − Y) 1_{Y∈[a,b]}], P(Y^b_a = b) = (1/(b − a)) E[(Y − a) 1_{Y∈[a,b]}].

The probabilities P(Y^b_a = a), P(Y^b_a = b) are uniquely determined by the requirement E Y^b_a = E Y. We will need the following two simple lemmas.

Lemma 2.1. If X is a random variable independent of Y and Y^b_a, then

E max{X, Y} ≤ E max{X, Y^b_a}.

Proof. By the convexity of the function y ↦ max{x, y} (where x ∈ R is fixed), we get

E[max{x, Y} 1_{Y∈[a,b]}] ≤ (1/(b − a)) E[(b − Y) 1_{Y∈[a,b]}] max{x, a} + (1/(b − a)) E[(Y − a) 1_{Y∈[a,b]}] max{x, b}.

Therefore, since X and Y are independent, we get

E[max{X, Y} 1_{Y∈[a,b]}] ≤ (1/(b − a)) E[(b − Y) 1_{Y∈[a,b]} max{X, a}] + (1/(b − a)) E[(Y − a) 1_{Y∈[a,b]} max{X, b}].

Thus we obtain

E max{X, Y} = E[max{X, Y} 1_{Y∉[a,b]}] + E[max{X, Y} 1_{Y∈[a,b]}]
≤ E[max{X, Y^b_a} 1_{Y∉[a,b]}] + (1/(b − a)) E[(b − Y) 1_{Y∈[a,b]} max{X, a}] + (1/(b − a)) E[(Y − a) 1_{Y∈[a,b]} max{X, b}]
= E max{X, Y^b_a}.

In the second lemma, we will use the notation

D(G_1, G_2, ..., G_n) = E max_{1≤k≤n} G_k − sup_{1≤τ≤n} E G_τ.

The purpose of the statement below is to show that for any sequence of n > 2 random variables there is a sequence of n − 1 random variables offering at least as large an additive advantage to the prophet.

Lemma 2.2. For any n ≥ 2 and any independent random variables G_1, G_2, ..., G_n with values in [0, 1], there is a random variable W with values in {0, 1}, independent of G_2, ..., G_{n−2}, satisfying

D(G_1, G_2, ..., G_n) ≤ D(µ, G_2, ..., G_{n−2}, W),

where µ = sup_{2≤τ≤n} E G_τ.


Proof. By the theory of optimal stopping, we know that

V^n_1 = V^n_1(G_1, G_2, ..., G_n) = E max{G_1, E max{G_2, ...}} = E max{µ, E max{G_2, ...}} + E(G_1 − µ)⁺

(note that the first summand in the last expression is just E max{µ, µ} = µ, but we write it in the above more complicated form). Furthermore,

E max{G_1, G_2, ..., G_n} ≤ E max{µ, G_2, ..., G_n} + E(G_1 − µ)⁺,

and hence D(G_1, G_2, ..., G_n) ≤ D(µ, G_2, ..., G_n). Now consider the random variables Y = (G_{n−1})^1_{E G_n} and Z = (G_n)^1_0, independent of each other and of the sequence G_2, G_3, ..., G_{n−2}. We have

E max{Y, E Z} = E max{Y, E G_n} = E max{G_{n−1}, E G_n}

and hence V^n(µ, G_2, ..., G_n) = V^n(µ, G_2, ..., G_{n−2}, Y, Z). By the previous lemma,

E max{µ, G_2, ..., G_n} ≤ E max{µ, G_2, ..., G_{n−2}, Y, Z},

so D(µ, G_2, ..., G_n) ≤ D(µ, G_2, ..., G_{n−2}, Y, Z). Take any random variable W independent of G_2, ..., G_{n−2} and satisfying P(W = 1) = E max{Y, E Z} = 1 − P(W = 0). We have E W = E max{Y, E Z} and E Z = E G_n ≤ µ, so

D(µ, G_2, ..., G_{n−2}, Y, Z) = D(µ, G_2, ..., G_{n−2}, W)

and the proof is complete.

Proof of Theorem 2.4. Clearly, it suffices to show the claim for a finite number of variables. By the second lemma above, we may reduce this number to 2: it suffices to prove that D(G_1, G_2) ≤ 1/4. However, the above proof shows that if we set µ = E G_2, then D(G_1, G_2) ≤ D(µ, Z), where P(Z = 1) = µ = 1 − P(Z = 0). It suffices to note that

D(µ, Z) = E max{µ, Z} − E max{µ, E Z} = µ·1 + (1 − µ)·µ − µ = µ − µ² ≤ 1/4.

Equality holds for µ = 1/2, and the above considerations immediately yield an example for which the difference 1/4 is attained. Indeed, take the pair (G_1, G_2), where G_1 ≡ 1/2 and the distribution of G_2 is given by P(G_2 = 0) = P(G_2 = 1) = 1/2.
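For the extremal pair just described, both quantities can be computed exactly; a quick sketch in rational arithmetic:

```python
from fractions import Fraction

half = Fraction(1, 2)
# Extremal pair: G1 = 1/2 almost surely, G2 uniform on {0, 1}.
# Prophet's value: M = E max{G1, G2}.
M = half * max(half, Fraction(0)) + half * max(half, Fraction(1))
# Non-prophet's value: either stop at time 1 (reward 1/2) or always
# continue (reward E G2 = 1/2); G2 is independent of the past, so no
# strategy can beat the larger of the two.
V = max(half, half * Fraction(0) + half * Fraction(1))
assert M == Fraction(3, 4) and V == half
assert M - V == Fraction(1, 4)   # the difference (b - a)/4 = 1/4 is attained
```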

2.2. Markovian approach

Throughout this section, we assume that X = (X_0, X_1, X_2, ...) is a Markov family defined on (Ω, F, (F_n)_{n≥0}, (P_x)_{x∈E}), taking values in some topological space (E, B(E)). For the sake of simplicity, we will assume that E = R^d for some d ≥ 1, though the reasoning remains essentially the same for other topological spaces. As usual, we assume that for each x ∈ E we have X_0 ≡ x P_x-almost surely. Let us also introduce the transition operator T of X, which acts by the formula

Tf(x) = E_x f(X_1) for x ∈ E,

on the class I of all measurable functions f : E → R such that f(X_1) is P_x-integrable for all x ∈ E.

Suppose that N is a nonnegative integer and let G : E → R be a measurable function satisfying

(2.14) E_x( sup_{0≤n≤N} |G(X_n)| ) < ∞ for all x ∈ E.


Consider the associated finite-horizon optimal stopping problem

(2.15) V^N(x) = sup E_x G(X_τ),

where x ∈ E and the supremum is taken over all τ ∈ M^N. Obviously, if we define G_n = G(X_n) for n = 0, 1, 2, ..., then for each separate x this problem is of the form considered in the preceding sections (with P and E replaced by P_x and E_x). However, the joint study of the whole family of optimal stopping problems depending on the initial value x enables the exploitation of the additional Markovian structure of the sequence (X_n)_{n≥0}.

For a given x, let us consider the random variables B^N_n and the stopping times τ^N_n, n = 0, 1, 2, ..., N, defined by (2.4) and (2.5). We also introduce the sets

C_n = {x ∈ E : V^{N−n}(x) > G(x)},
D_n = {x ∈ E : V^{N−n}(x) = G(x)},

for n = 0, 1, 2, ..., N; we will call these the continuation and stopping regions, respectively. Finally, define the stopping time

τ_D = inf{0 ≤ n ≤ N : X_n ∈ D_n}.

Since V^0 = G by the very definition (2.15), we see that X_N ∈ D_N and hence the stopping time τ_D is finite (it does not exceed N).

Theorem 2.5. Assume that the function G satisfies the integrability condition (2.14) and consider the optimal stopping problem (2.15).

(i) For any n = 0, 1, 2, ..., N we have B^N_n = V^{N−n}(X_n).

(ii) The function x ↦ V^n(x) satisfies the Wald-Bellman equation

(2.16) V^n(x) = max{G(x), T V^{n−1}(x)}, x ∈ E,

for n = 1, 2, ..., N.

(iii) The stopping time τ_D is optimal in (2.15). If τ* is another optimal stopping time, then τ_D ≤ τ* P_x-almost surely for all x ∈ E.

(iv) For each x ∈ E, the sequence (V^{N−n}(X_n))_{n=0}^N is the smallest P_x-supermartingale majorizing (G(X_n))_{n=0}^N, and the stopped sequence (V^{N−(n∧τ_D)}(X_{n∧τ_D}))_{n=0}^N is a P_x-martingale.

Proof. We only need to establish (i) and (ii); the remaining parts follow at once from Theorem 2.1. To verify (i), recall that

B^N_n = E_x[G(X_{τ^N_n}) | F_n]

for all n = 0, 1, 2, ..., N. This shows the claim for n = 0, by the very definition of V^N(x). On the other hand, for n ≥ 1 we apply the Markov property to get

B^N_n = E_y[G(X_{τ^{N−n}_0})] |_{y=X_n} = V^{N−n}(y) |_{y=X_n} = V^{N−n}(X_n).

(ii) We apply the definition of the sequence (B^N_n)_{n=0}^N and part (i) to obtain that P_x-almost surely,

V^N(x) = V^N(X_0) = B^N_0 = max{G(X_0), E_x(B^N_1 | F_0)} = max{G(x), E_x(V^{N−1}(X_1) | F_0)} = max{G(x), T V^{N−1}(x)}.


Part (ii) above gives the following iterative method of solving (2.15). Define the operator Q, acting on f ∈ I, by the formula

Qf(x) = max{G(x), Tf(x)}, x ∈ E.

Corollary 2.1. We have V^N(x) = Q^N G(x) for all x ∈ E and all nonnegative integers N.

Let us illustrate the above considerations by analyzing the following simple example.

Example 2.2. Let (S_n)_{n≥0} be a symmetric random walk over the space E = {−2, −1, 0, 1, 2}, stopped at {−2, 2}. Clearly, (S_n)_{n≥0} is a Markov family on E. Set G(x) = x²(x + 2) and consider the optimal stopping problem

V^N(x) = sup_{τ≤N} E_x G(S_τ), x ∈ E.

To treat the problem successfully, we compute the sequence V^0, V^1, V^2, V^3, .... Directly from (2.16), we have

V^n(x) = max{G(x), T V^{n−1}(x)} = max{G(x), V^{n−1}(x)} if x ∈ {−2, 2}, and
V^n(x) = max{G(x), (V^{n−1}(x − 1) + V^{n−1}(x + 1))/2} if x ∈ {−1, 0, 1}.

For notational simplicity, let us identify a function f : E → R with the sequence of its values (f(−2), f(−1), f(0), f(1), f(2)). Using the above recurrence, we compute that

V^0 = G : 0, 1, 0, 3, 16,
V^1 : 0, 1, 2, 8, 16,
V^2 : 0, 1, 4 1/2, 9, 16,
V^3 : 0, 2 1/4, 5, 10 1/4, 16,
V^4 : 0, 2 1/2, 6 1/4, 10 1/2, 16,

and so on. Suppose that we want to solve the problem

V^4(x) = sup_{τ≤4} E_x G(S_τ), x ∈ E.

The value function V^4 has been derived above; to describe the optimal stopping strategy, let us write down the continuation and stopping regions C_i and D_i, i = 0, 1, 2, 3, 4. Directly from the above formulas for V^i, we see that

C_0 = {−1, 0, 1}, D_0 = {−2, 2},
C_1 = {−1, 0, 1}, D_1 = {−2, 2},
C_2 = {0, 1}, D_2 = {−2, −1, 2},
C_3 = {0, 1}, D_3 = {−2, −1, 2},
C_4 = ∅, D_4 = {−2, −1, 0, 1, 2}.

The optimal strategy is to wait for the first step n at which we visit the corresponding stopping set D_n; then we stop the process for good.
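The computation of the table above is mechanical; a short sketch in exact rational arithmetic:

```python
from fractions import Fraction as F

# Wald-Bellman iteration V^n = Q V^{n-1} for the symmetric random walk on
# E = {-2, ..., 2} absorbed at -2 and 2, with gain G(x) = x^2 (x + 2).
E = range(-2, 3)
G = {x: F(x * x * (x + 2)) for x in E}

def Q(v):
    # One application of the operator Q; the endpoints are absorbing.
    return {x: max(G[x], v[x]) if abs(x) == 2
            else max(G[x], (v[x - 1] + v[x + 1]) / 2) for x in E}

V = G
for _ in range(4):
    V = Q(V)                 # after four steps, V holds V^4

assert [V[x] for x in E] == [0, F(5, 2), F(25, 4), F(21, 2), 16]
```

The final assertion reproduces the row V^4 : 0, 2 1/2, 6 1/4, 10 1/2, 16 of the table.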

We turn our attention to the case of infinite horizon, i.e., we consider the optimal stopping problem (or rather a family of optimal stopping problems)

(2.17) V(x) = sup E_x G(X_τ), x ∈ E,

where the supremum is taken over the class M of all adapted stopping times. Recall that the class I consists of all measurable functions f : E → R such that f(X_1) is P_x-integrable for all x ∈ E. The following notion will be crucial in our further considerations.

Definition 2.2. The function f ∈ I is called superharmonic (or excessive) if we have

Tf(x) ≤ f(x) for all x ∈ E.

We have the following simple observation.

Lemma 2.3. The function f ∈ I is superharmonic if and only if (f(X_n))_{n≥0} is a supermartingale under each P_x, x ∈ E.

Proof. If f is superharmonic, then by the Markov property,

E_x(f(X_{n+1}) | F_n) = E_y f(X_1) |_{y=X_n} = Tf(X_n) ≤ f(X_n)

for each n. To show the reverse implication, observe that if (f(X_n))_{n≥0} is a supermartingale under each P_x, then in particular

Tf(x) = E_x(f(X_1) | F_0) ≤ f(x).

To formulate the main theorem, we introduce the corresponding continuation set C and stopping set D by

C = {x ∈ E : V(x) > G(x)},
D = {x ∈ E : V(x) = G(x)}.

Moreover, we define the stopping time τ_D = inf{n : X_n ∈ D}. In contrast to the case of finite horizon, this stopping time need not be finite (which will force us to impose some additional assumptions: see the statement below).

Theorem 2.6. Consider the optimal stopping problem (2.17) and assume that

(2.18) E_x sup_{n≥0} |G(X_n)| < ∞, x ∈ E.

Then the following statements hold.

(i) The function V satisfies the Wald-Bellman equation

(2.19) V(x) = max{G(x), TV(x)}.

(ii) If τ_D is finite P_x-almost surely for all x ∈ E, then τ_D is the optimal stopping time. If τ* is another optimal stopping time, then τ* ≥ τ_D P_x-almost surely.

(iii) The value function V is the smallest superharmonic function which majorizes the gain function G on E.

(iv) The stopped sequence (V(X_{τ_D ∧ n}))_{n≥0} is a P_x-martingale for every x ∈ E.

Proof. This follows immediately from the case of finite horizon and the limit Theorem 2.3.

Let us make here an important comment on the uniqueness of solutions to the Wald-Bellman equations (2.16) and (2.19). Clearly, in the case of finite horizon there is only one solution: indeed, the starting function V^0 coincides with G and formula (2.16) produces a unique sequence V^1, V^2, ..., V^N. In the case of infinite horizon, the situation is less transparent. For instance, if G is a constant function, say G ≡ c, then any constant function V ≡ c' with c' ≥ c satisfies the Wald-Bellman equation. However, any solution to (2.19) is a superharmonic function majorizing G, so part (iii) of Theorem 2.6 immediately yields the following minimality principle.

Corollary 2.2. The value function V is the minimal solution to (2.19).

Example 2.3. Let us provide the solution to the infinite-horizon version of Example 2.2. In the notation used there, we study the optimal stopping problem

V(x) = sup_{τ∈M} E_x G(S_τ), x ∈ E.

The function G is bounded, so the integrability assumption of Theorem 2.6 is satisfied. Thus, we know that V is the least superharmonic function which majorizes G; here superharmonicity means that

V(x) ≥ (V(x − 1) + V(x + 1))/2 for x ∈ {−1, 0, 1}.

In other words, we search for the smallest concave function on {−2, −1, 0, 1, 2} majorizing the function G. One easily checks that the function x ↦ 4(x + 2) is concave (since it is linear), majorizes G and coincides with G at the endpoints ±2. Thus it is the smallest such majorant of G and hence it must be equal to the value function V.
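Numerically, this least concave majorant can be recovered by simply iterating the Wald-Bellman operator until it stabilizes; a small sketch:

```python
# Iterate V <- max(G, TV) for the stopped walk on {-2, ..., 2}; the limit
# should be the least concave majorant of G, namely x -> 4(x + 2).
G = {x: float(x * x * (x + 2)) for x in range(-2, 3)}
V = dict(G)
for _ in range(200):
    V = {x: max(G[x], V[x]) if abs(x) == 2
         else max(G[x], (V[x - 1] + V[x + 1]) / 2) for x in G}

assert all(abs(V[x] - 4 * (x + 2)) < 1e-9 for x in G)
```

The iteration converges geometrically here, since away from the endpoints it is eventually plain averaging with fixed boundary values 0 and 16.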

Example 2.4. Let ξ_1, ξ_2, ... be a sequence of i.i.d. random variables with the distribution given by P(ξ_i = 1) = p, P(ξ_i = −1) = q, where p + q = 1 and p < q. For a given integer x, define S_n = x + ξ_1 + ξ_2 + ... + ξ_n, n = 0, 1, 2, .... Then the sequences (S_n)_{n≥0} (with varying x) form a Markov family. Consider the optimal stopping problem

V(x) = sup_{τ∈M} E_x S_τ⁺, x ∈ Z.

One easily checks that the integrability assumption (2.18) (with G(x) = x⁺) is satisfied. This follows from the well-known fact that

(2.20) P( sup_{n≥0} (ξ_1 + ξ_2 + ... + ξ_n) ≥ k ) = (p/q)^k, k = 0, 1, 2, ....

Thus, we need to find the least superharmonic majorant of G: V is the least function on Z satisfying

V(x) = max{x⁺, pV(x + 1) + qV(x − 1)}, x ∈ Z.

To identify this object, let us try to inspect the properties of the continuation set C and the stopping region D. A little thought suggests that these sets should be of the form C = {..., b − 2, b − 1}, D = {b, b + 1, ...} for some positive integer b (possibly infinite). While this is more or less clear by some intuitive argumentation, we should point out here that it can also be shown rigorously. Indeed, pick x ∈ Z₋ and take the stopping time τ ≡ −x + 1. Then V(x) ≥ E_x S_τ⁺ = p^{−x+1} > 0 = G(x), so in particular C contains all nonpositive integers. Furthermore, if x > 0 lies in C, then so does x − 1. To see this, note that for any a ∈ Z we have

(x + a)⁺ − x⁺ ≤ (x − 1 + a)⁺ − (x − 1)⁺

(which is equivalent to the trivial bound (x + a)⁺ ≤ (x − 1 + a)⁺ + 1) and hence, for any stopping time τ, if we plug in a = ξ_1 + ξ_2 + ... + ξ_τ,

E_x S_τ⁺ − G(x) ≤ E_{x−1} S_τ⁺ − G(x − 1).


This yields

(2.21) 0 < V(x) − G(x) ≤ V(x − 1) − G(x − 1)

and thus x − 1 ∈ C, as we have claimed. This shows that C and D are of the form postulated above and hence, by the general theory,

V(x) = x if x ≥ b, and V(x) = pV(x + 1) + qV(x − 1) if x < b.

Let us first identify V on C. Solving the linear recurrence, we check that

V(x) = α + β (q/p)^x, x < b,

for some constants α, β ∈ R. It follows from (2.20) that V(x) → 0 as x → −∞ (simply use the estimate E_x S_τ⁺ ≤ E_x sup_{n≥0} S_n⁺); this implies α = 0 and β ≥ 0.

This also shows that b < ∞. Indeed, otherwise V(x) would explode exponentially as x → ∞, while on the other hand, by (2.21), for x > 0 we would have

V(x) ≤ G(x) + V(0) − G(0) = x + V(0) − G(0).

It remains to find β and the boundary b. First, exploiting the Wald-Bellman equation, we see that V(b − 1) = pV(b) + qV(b − 2). This implies V(b) = β (q/p)^b and hence

V(x) = b (q/p)^{x−b} for x ≤ b.

Secondly, again by the Wald-Bellman equation, we see that V(b) ≥ pV(b + 1) + qV(b − 1), which is equivalent to b ≥ p/(q − p). Finally, observe that if x > b, then

V(x) = x = px + qx > p(x + 1) + q(x − 1) = pV(x + 1) + qV(x − 1).

Therefore, if b satisfies the inequality b ≥ p/(q − p), then the function

V(x) = x for x ≥ b, and V(x) = b (q/p)^{x−b} for x < b,

is excessive. Let us now check for which b the inequality V ≥ G holds. This majorization is clear on {b, b + 1, b + 2, ...}. Since the function x ↦ b(q/p)^{x−b} is nonnegative, convex and coincides with G at x = b, it suffices to check whether it dominates G at x = b − 1. The latter bound is equivalent to b < q/(q − p) = p/(q − p) + 1. This actually forces us to take b = ⌈p/(q − p)⌉: this is the only choice of the parameter for which the resulting function V is superharmonic and majorizes G. Summarizing, we have shown that

V(x) = x if x ≥ ⌈p/(q − p)⌉, and V(x) = ⌈p/(q − p)⌉ (q/p)^{x − ⌈p/(q−p)⌉} if x < ⌈p/(q − p)⌉.

Observe that by (2.20), the stopping time

τ = inf{n : S_n ≥ ⌈p/(q − p)⌉}

is infinite with positive probability. Therefore, there is no optimal stopping time τ* which would be finite P_x-almost surely for all x. Hence, the value function is attained asymptotically along the stopping times

τ^{(M)} = inf{n : S_n ∉ [M, ⌈p/(q − p)⌉)},

as M → −∞.
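A numerical sanity check of the closed form, with illustrative parameters p = 0.3, q = 0.7 (so that b = ⌈p/(q − p)⌉ = 1) and an arbitrarily chosen truncation of the state space:

```python
import math

p, q = 0.3, 0.7                       # downward drift: p < q
b = math.ceil(p / (q - p))            # optimal boundary; here b = 1

# Value iteration for V(x) = max(x^+, p V(x+1) + q V(x-1)) on {-40, ..., 40}.
# Outside the window we use V ~ 0 far below (the value decays geometrically)
# and V(x) = x far above (deep inside the stopping region).
lo, hi = -40, 40
V = {x: max(x, 0.0) for x in range(lo, hi + 1)}
for _ in range(2000):
    V = {x: max(max(x, 0.0),
                p * (V[x + 1] if x < hi else float(hi + 1))
                + q * (V[x - 1] if x > lo else 0.0))
         for x in range(lo, hi + 1)}

closed_form = lambda x: x if x >= b else b * (q / p) ** (x - b)
assert all(abs(V[x] - closed_form(x)) < 1e-6 for x in range(-10, 11))
```

The truncation levels and iteration count are heuristic choices; the agreement with b ⌈·⌉ (q/p)^{x−b} on the central window is the point of the check.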

2.3. Problems

1. Let G_1, G_2, ... be a sequence of independent random variables, each of which has the uniform distribution on [0, 1]. Solve the optimal stopping problems

V^N = sup_{τ∈M^N} E G_τ and V_0 = sup_{τ∈M} E G_τ,

where N is an arbitrary integer.

2. (Krengel, Sucheston and Garling [12, 13]) Let G_1, G_2, ... be a family of independent, nonnegative random variables. Show that

E sup_{n≥1} G_n ≤ 2 sup_{τ∈M} E G_τ.

3. (Hill and Kertz [11]) Let G_0, G_1, G_2, ..., G_N be a family of arbitrarily dependent random variables taking values in [0, 1]. Prove the inequality

E sup_{0≤n≤N} G_n ≤ sup_{τ∈M^N_0} E G_τ + (N/(N + 1))^{N+1}

and show that the constant (N/(N + 1))^{N+1} cannot be decreased.

4. Let ε_1, ε_2, ... be a sequence of independent Rademacher variables and set X_0 ≡ 0 and X_n = ε_1 + ε_2 + ... + ε_n for n = 1, 2, .... Find the smallest constant C such that for any stopping time τ adapted to the natural filtration of X we have

E X_τ⁴ ≤ C E τ².

5. Let ε_1, ε_2, ... be a sequence of independent Rademacher variables and set X_0 ≡ 0 and X_n = ε_1 + ε_2 + ... + ε_n for n = 1, 2, .... For any 1 < p < ∞, find the smallest constant C_p such that for any stopping time τ ∈ L^{p/2} adapted to the natural filtration of X we have

E sup_{n≤τ} |X_n|^p ≤ C_p E |X_τ|^p.

6. (Chow and Robbins [6]) Let G_0, G_1, G_2, ... be i.i.d. nonnegative random variables and let c > 0 be a fixed constant. Solve the optimal stopping problem

V = sup_{τ∈M} E[max{G_0, G_1, G_2, ..., G_τ} − cτ].

7. (Robbins [18]) We flip a coin infinitely many times. For n ≥ 1, let G_n = n 2^n/(n + 1) if there were no tails in the first n flips, and G_n = 0 otherwise. Solve the optimal stopping problem

V = sup_{τ∈M} E G_τ.

8. Let (S_n)_{n≥0} be a symmetric random walk over the integers and let G(x) = arctan x − x⁻, x ∈ Z. Solve the optimal stopping problem

V(x) = sup E_x G(S_τ), x ∈ Z,

where the supremum is taken over (i) all stopping times τ, (ii) integrable stopping times τ, (iii) bounded stopping times τ.

9. We flip a coin at most five times; at each point we may decide whether to stop or not (in particular, we are allowed to stop at the very beginning, without flipping the coin even once). Having stopped, we look at the outcomes we have obtained. We get 1 if there are no heads and 2 if we got at least three heads. What is the strategy which yields the largest expected gain?

10. Let (S_n)_{n≥0} be a symmetric random walk over the integers and let β ∈ (0, 1) be a fixed parameter. Solve the optimal stopping problem

V(x) = sup_{τ∈M} E_x[β^τ (1 − exp(S_τ))⁺], x ∈ Z.

11. (Dubins, Shepp and Shiryaev [9]) Let (S_n)_{n≥0} be a symmetric random walk over the integers, started at some point x. Prove that for any stopping time τ we have

E sup_{n≤τ} S_n ≤ √(E τ)

and that the inequality is sharp.


CHAPTER 3

Inequalities for martingale transforms and

differentially subordinated processes

3.1. Description of the method

We turn our attention to another class of estimates arising in probability theory and harmonic analysis. Though we will mainly focus on the martingale setting, we should emphasize that the results we will obtain have their analytic counterparts, which can be expressed in terms of unconditional-type properties of the Haar system. Furthermore, the martingale inequalities which will be studied can be applied to obtain corresponding tight estimates for wide classes of Fourier multipliers. In other words, despite the probabilistic language, the contents of this chapter are meaningful from the point of view of harmonic analysis.

As we will see below, our approach to the estimates for martingale transforms/under differential subordination is quite similar to that used in the previous chapter in the context of Markovian optimal stopping. We will use analogous notation to make this similarity even more visible. We assume that (Ω, F, P) is a probability space, filtered by (F_n)_{n≥0}, a nondecreasing family of sub-σ-algebras of F. Let f = (f_n)_{n≥0}, g = (g_n)_{n≥0} be real-valued martingales, with difference sequences df = (df_n)_{n≥0}, dg = (dg_n)_{n≥0} given by

df_0 = f_0, df_n = f_n − f_{n−1}, n = 1, 2, ...,

and similarly for dg. The martingale f is simple if for every n the random variable f_n takes only a finite number of values. We say that g is a transform of f if there is a predictable sequence v = (v_n)_{n≥0} such that dg_n = v_n df_n for each n ≥ 1 (here by predictability of v we mean that for any n ≥ 0, the variable v_n is F_{(n−1)∨0}-measurable). This definition is slightly different from that used in the literature, where it is assumed that the equality dg_n = v_n df_n holds also for n = 0 (in such a case, we will say that g is a full transform of f by v). We will say that g is a (full) ±1-transform of f if the transforming sequence v takes values in the set {−1, 1}.
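These definitions are easy to experiment with; the following sketch (an illustrative construction, not from the text) builds a full ±1-transform of a simple random walk, choosing the predictable sign from the past of g, and verifies two basic facts by exhaustive enumeration of all sign paths.

```python
from itertools import product

# A full +-1-transform: df_k = eps_k and dg_k = v_k eps_k, where the sign
# v_k is predictable -- it may depend only on eps_0, ..., eps_{k-1}.
n = 8
Eg = Eg2 = Ef2 = 0.0
for eps in product((-1, 1), repeat=n):
    f = g = 0
    for k in range(n):
        v = 1 if k == 0 or g >= 0 else -1    # depends on g_{k-1} only
        f += eps[k]
        g += v * eps[k]
    prob = 2.0 ** -n                         # all paths equally likely
    Eg += prob * g
    Eg2 += prob * g * g
    Ef2 += prob * f * f

assert abs(Eg) < 1e-12           # transforms of martingales are martingales
assert abs(Eg2 - Ef2) < 1e-12    # a +-1-transform preserves E f_n^2 (= n here)
```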

The central problem of this chapter can be formulated as follows. Let G : R² → R be a given function and suppose that we are interested in the inequality

(3.1) E G(f_n, g_n) ≤ 0

for all n and all pairs (f, g) of simple martingales such that g is a full transform of f by a predictable sequence bounded in absolute value by 1 (i.e., we assume that for any n, the random variable v_n takes values in [−1, 1]). Note that we do not need to assume anything about the regularity or integrability of G: the fact that f and g are simple guarantees the existence of the expectation. Motivated by the results obtained in the preceding chapter, we may search, for each pair (f, g) as above, for a supermartingale (U_n)_{n≥0} majorizing (G(f_n, g_n))_{n≥0} and satisfying


U_0 ≤ 0. Clearly, if such a supermartingale exists, then (3.1) holds true. The Bellman function approach rests on reducing the search to supermartingales of the form (U(f_n, g_n))_{n≥0} for some special function U to be found; then the condition U_0 ≤ 0, the majorization U_n ≥ G(f_n, g_n) and the supermartingale property translate into appropriate pointwise properties of U. Specifically, consider the following conditions which a function U : R² → R might satisfy.

1° (initial condition) We have U(x, y) ≤ 0 for all |y| ≤ |x|.

2° (majorization) We have U ≥ G on R².

3° (concavity) The function U is concave along any line of slope in [−1, 1].

The statement below describes the relation between the above set of conditions and the desired estimate (3.1).

Lemma 3.1. If there is a function U satisfying 1°, 2° and 3°, then (3.1) holds.

Proof. Fix any pair (f, g) of simple martingales such that g is the full trans-form of f by a certain predictable sequence v bounded in absolute value by 1. Theconcavity condition implies that (U(fn, gn))n≥0 is a supermartingale: for any n ≥ 0we have

E[U(fn+1, gn+1)|Fn] = E[U(fn + dfn+1, gn + dgn+1)|Fn] = E[U(fn + dfn+1, gn + vn+1dfn+1)|Fn] ≤ U(fn, gn),

by 3° and the conditional version of Jensen's inequality. Therefore, applying 2° and then 1°, we obtain that for each n,

EG(fn, gn) ≤ EU(fn, gn) ≤ EU(f0, g0) ≤ 0.
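The mechanism of this proof can be watched numerically. The sketch below is an illustration under assumed parameters (step size 0.3 and an ad hoc predictable sequence v with |v| ≤ 1): it enumerates all sign paths of a simple martingale f, builds the transform g, and checks that E U(fn, gn) is nonincreasing for the candidate U(x, y) = y² − x² (which is concave along lines of slope in [−1, 1]):

```python
import itertools

def U(x, y):
    # Candidate concave along lines of slope in [-1, 1]
    # (an assumption here; this U appears later, in Example 3.1).
    return y * y - x * x

x0, y0, n = 0.5, 0.25, 8
means = []
for steps in range(n + 1):
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=steps):
        f, g = x0, y0
        for s in signs:
            v = 0.9 if g >= 0 else -0.4   # predictable: depends on the past only
            df = 0.3 * s                  # symmetric simple increment
            f, g = f + df, g + v * df
        total += U(f, g)
    means.append(total / 2 ** steps)

# Supermartingale property: E U(f_n, g_n) is nonincreasing in n.
ok = all(means[i + 1] <= means[i] + 1e-12 for i in range(n))
print(ok)
```

Since the paths are enumerated exactly, the expectations are computed without Monte Carlo error.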

There is a natural question about the existence of a special function U and how to identify this object. To understand this issue, we will adopt the reasoning developed in the preceding chapter. Our starting observation is that the inequality (3.1) is closely related to the following optimal stopping problem

(3.2) Vᴺ = sup EG(fN, gN),

where the supremum is taken over all pairs (f, g) of simple martingales such that g is a full transform of f by a predictable sequence bounded in absolute value by 1. Here the filtration is also allowed to vary, as well as the probability space. At first glance, the phrase "optimal stopping problem" seems a bit misleading here, since no stopping times enter the above problem. However, a little thought reveals that the stopping times actually do appear: one easily checks that if g is a full transform of f and τ ≤ N is a stopping time, then the stopped process gτ is a full transform of fτ (by the same sequence). Thus (3.2) can indeed be regarded as an optimal stopping problem, and the significant complication (in comparison to the preceding chapter) lies in the fact that we do not stop a given Markov process, but a much larger class of martingale pairs.

The problem (3.2) is of finite-horizon type and admits a natural version for N = ∞:

(3.3) V = V∞ = sup EG(fn, gn),

where the supremum is taken over all n and all pairs (f, g) as above. Motivated by the theory of optimal stopping, our first step is to enlarge the class of problems so


3.1. DESCRIPTION OF THE METHOD

that the initial values of the martingales f, g can be taken into account. Namely, for each (x, y) ∈ R² and any N, set

(3.4) Vᴺ(x, y) = sup EG(fN, gN),

where the supremum is taken over all pairs (f, g) of simple martingales starting from (x, y) such that g is a transform of f by some predictable sequence bounded in absolute value by 1. An infinite-horizon version of (3.4) is introduced analogously:

(3.5) V(x, y) = sup EG(fn, gn),

where the supremum is taken over all n and all pairs (f, g) as above.

Theorem 3.1. Consider the finite-horizon problem (3.4). The sequence (Vⁿ)n≥0 can be computed inductively from the relations V⁰ = G and, for n ≥ 1,

(3.6) Vⁿ(x, y) = sup { α1Vⁿ⁻¹(x + t1, y + at1) + α2Vⁿ⁻¹(x + t2, y + at2) },

where the supremum is taken over all numbers α1, α2 ≥ 0, t1, t2 ∈ R and a ∈ [−1, 1] such that α1 + α2 = 1 and α1t1 + α2t2 = 0.

The equality (3.6) can be regarded as a version of the Wald–Bellman equation. It has a very nice geometric meaning: to find Vⁿ(x, y), we take an arbitrary a ∈ [−1, 1] and consider the restriction of Vⁿ⁻¹ to the line of slope a passing through the point (x, y), i.e., the function t ↦ Vⁿ⁻¹(x + t, y + at). If ζa is the smallest concave function which majorizes this restriction, then Vⁿ(x, y) is the supremum of ζa(0) over all a. Directly from this interpretation, we see that (3.6) can be rewritten in the form

(3.7) Vⁿ(x, y) = sup EVⁿ⁻¹(x + ξ, y + aξ),

where the supremum is taken over all random variables a with values in [−1, 1] and all simple centered random variables ξ satisfying E(ξ|a) = 0.
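The recursion can be carried out directly on a grid: sample t ↦ Vⁿ⁻¹(x + t, y + at), compute the least concave majorant of the sampled points (their upper convex hull) at t = 0, and maximize over slopes. A rough numerical sketch follows, using as V⁰ the weak-type gain function of Example 3.1 with C = 1 (an assumption made here only for the test); the value V¹(0, 0.5) is compared with the closed form 1 − C + C(y² − x²) derived there:

```python
import numpy as np

def G(x, y, C=1.0):
    # Gain function of Example 3.1 (assumed here as the test case).
    return (1.0 if abs(y) >= 1 else 0.0) - C * x * x

def envelope_at_zero(ts, vals):
    # Least concave majorant of the sampled points, evaluated at t = 0:
    # build the upper convex hull (monotone chain), then interpolate.
    hull = []
    for p in zip(ts, vals):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    for (t1, w1), (t2, w2) in zip(hull, hull[1:]):
        if t1 <= 0 <= t2:
            return w1 + (w2 - w1) * (0 - t1) / (t2 - t1) if t2 > t1 else w1
    return vals[int(np.argmin(np.abs(ts)))]

def bellman_step(prev, x, y, slopes, ts):
    # One Wald-Bellman iteration: V^n(x, y) = sup over a of zeta_a(0), cf. (3.6)-(3.7).
    return max(envelope_at_zero(ts, [prev(x + t, y + a * t) for t in ts])
               for a in slopes)

ts = np.linspace(-4.0, 4.0, 8001)
v1_approx = bellman_step(G, 0.0, 0.5, (-1.0, 1.0), ts)
print(abs(v1_approx - 0.25) < 0.01)
```

Since the sampled points lie on the graph of V⁰, the hull value can only underestimate ζa(0), so the approximation approaches the true value from below as the grid is refined.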

Proof. The equality V⁰ = G follows from the very definition of the sequence (Vⁿ)n≥0. To show the recurrence (3.6), fix α1, α2, t1, t2 and a as in its statement and take any martingales (f¹, g¹) and (f², g²) as in the definition of Vⁿ⁻¹(x + t1, y + at1) and Vⁿ⁻¹(x + t2, y + at2), respectively. Suppose that these pairs are given on two probability spaces (Ω1, F1, P1) and (Ω2, F2, P2). Let us glue these pairs into one pair (f, g). To this end, let Ω = Ω1 ∪ Ω2 (a disjoint union), F = σ(F1 ∪ F2) and define the probability measure P on F by requiring that P(A1 ∪ A2) = α1P1(A1) + α2P2(A2) for any A1 ∈ F1 and A2 ∈ F2. The pair (f, g) is given by (f0, g0) = (x, y) and, for k ≥ 1,

(fₖ(ω), gₖ(ω)) = (f¹ₖ₋₁(ω), g¹ₖ₋₁(ω)) if ω ∈ Ω1, and (fₖ(ω), gₖ(ω)) = (f²ₖ₋₁(ω), g²ₖ₋₁(ω)) if ω ∈ Ω2.

Notice that the sequence (f, g) is a martingale with respect to its natural filtration (Fn)n≥0: this follows at once from the fact that (f¹, g¹), (f², g²) are martingales and the identity

E((f1, g1)|F0) = E(f1, g1) = α1(x + t1, y + at1) + α2(x + t2, y + at2) = (x, y).

Consequently, by the very definition of the functions Vⁿ and Vⁿ⁻¹,

Vⁿ(x, y) ≥ EG(fn, gn) = α1E1G(f¹ₙ₋₁, g¹ₙ₋₁) + α2E2G(f²ₙ₋₁, g²ₙ₋₁)



(where Ei is the expectation with respect to Pi), so taking the supremum over all (f¹, g¹), (f², g²) as above, we get

Vⁿ(x, y) ≥ α1Vⁿ⁻¹(x + t1, y + at1) + α2Vⁿ⁻¹(x + t2, y + at2).

Taking the supremum over all αi, ti and a, we obtain the inequality "≥" in (3.6) and hence also in (3.7). To show the reverse estimate, take an arbitrary pair (f, g) as in the definition of Vⁿ(x, y) and note that

Vⁿ⁻¹(f1, g1) ≥ E[G(fn, gn)|(f1, g1)],

by the very definition of Vⁿ⁻¹ applied conditionally on the σ-algebra generated by (f1, g1). Therefore

EG(fn, gn) = E[E(G(fn, gn)|(f1, g1))] ≤ EVⁿ⁻¹(x + df1, y + v1df1)

does not exceed the right-hand side of (3.7). Taking the supremum over all (f, g), we get the claim.

The passage to the case of infinite horizon requires a simple limiting argument. Indeed, it follows directly from (3.4) and (3.5) that the functional sequence (Vⁿ)n≥0 is pointwise increasing (any martingale f0, f1, f2, . . ., fn−1 of length n can be treated as a martingale f0, f1, f2, . . ., fn−1, fn−1 of length n + 1) and V(x, y) = V∞(x, y) = limn→∞ Vⁿ(x, y). Therefore, we obtain the following

Corollary 3.1. Consider the problem (3.5). Then V is the smallest function which satisfies the conditions 2° and 3°. Furthermore, if the inequality (3.1) is valid, then V satisfies 1° as well.

Proof. We have Vⁿ ≥ V⁰ = G, so indeed V majorizes G. Letting n → ∞ in (3.6), we see that

V(x, y) = sup { α1V(x + t1, y + at1) + α2V(x + t2, y + at2) },

where the supremum is taken over all ti, αi and a as above. This implies that V has the required concavity property. To see that V is the smallest, fix an arbitrary function 𝒱 satisfying 2° and 3°, a point (x, y) and a pair (f, g) as in the definition of V(x, y). As we have seen in the proof of Lemma 3.1, the sequence (𝒱(fn, gn))n≥0 is a supermartingale majorizing (G(fn, gn))n≥0, so

EG(fn, gn) ≤ E𝒱(fn, gn) ≤ E𝒱(f0, g0) = 𝒱(x, y).

Taking the supremum over all (f, g) yields the desired bound V(x, y) ≤ 𝒱(x, y). It remains to show that if (3.1) holds true, then 1° holds. This follows immediately from the fact that if |y| ≤ |x|, the martingale pair (f, g) starts from (x, y) and g is a transform of f by a predictable sequence v bounded in absolute value by 1, then actually g is a full transform of f (with v0 ∈ [−1, 1] determined by the condition y = v0x).

The above discussion concerns the case when the transforming sequence takes values in [−1, 1]. However, the approach can be easily modified to apply to other possibilities, e.g. to the case of ±1-transforms (when vn ∈ {−1, 1} for all n), or to the case when vn ∈ [0, 1] for all n, etc. Let us be more precise. Suppose we are interested in proving (3.1) for all (f, g) such that g is a full transform of f by a predictable sequence v taking values in some fixed set A ⊂ R. A straightforward



modification of the above arguments shows that the validity of this estimate is equivalent to the existence of a function U satisfying

1° (initial condition) We have U(x, y) ≤ 0 for all (x, y) such that y = vx for some v ∈ A.

2° (majorization) We have U ≥ G on R².

3° (concavity) The function U is concave along any line of slope in A.

An important observation is that the passage from the set [−1, 1] to {−1, 1} simplifies the analysis and in many cases does not reduce the validity of the results. More precisely, we will frequently perform the following procedure. Suppose that we want to establish (3.1) for all f, g such that g is a full transform of f by a predictable sequence with values in [−1, 1]. The first step is to consider the more restrictive (and a little simpler) case of ±1-transforms first and identify the corresponding special function U. The second step is to verify that this function U actually satisfies all the properties needed for the more general case of [−1, 1]-valued transforming sequences. This phenomenon occurs in most interesting estimates and is what one might expect: when studying the [−1, 1] case, it seems reasonable that the extremal martingales (for which equality or almost equality occurs) should be constructed with the use of extremal transforming sequences, with values in {−1, 1}. Analogous reasoning can be applied when reducing the case of [0, 1]-valued to {0, 1}-valued transforming sequences, and so on.

Actually, the special functions U obtained via the analysis of ±1-transforms often lead to a much wider class of inequalities for martingales satisfying the so-called differential subordination, a condition which is of significant importance from the point of view of applications. Here is the precise definition.

Definition 3.1. A martingale g is differentially subordinate to f if for any n ≥ 0 we have |dgn| ≤ |dfn| almost surely.

Of course, if g is a transform of f by a predictable sequence with values in [−1, 1], then g is differentially subordinate to f. However, the differential subordination allows a much wider class of martingale pairs (f, g). Nevertheless, an approach similar to that used above can be applied to the study of estimates in this new setting. Suppose that G : R² → R is a given function and assume we are interested in showing the inequality (3.1) for any pair (f, g) of simple martingales such that g is differentially subordinate to f. We should stress here that in a typical situation, the assumption on the simplicity of the sequences can be skipped, most often by imposing some boundedness-type conditions on f; for the sake of clarity, we will work with simple martingales only. Suppose that U : R² → R satisfies the following properties:

1° (initial condition) We have U(x, y) ≤ 0 for all (x, y) such that |y| ≤ |x|.

2° (majorization) We have U ≥ G on R².

3° (concavity) For any (x, y) ∈ R² there are two numbers A = A(x, y) and B = B(x, y) such that if h, k ∈ R satisfy |k| ≤ |h|, then

(3.8) U(x + h, y + k) ≤ U(x, y) + A(x, y)h + B(x, y)k.

As we have mentioned above, in many cases the special function obtained from the analysis of the corresponding estimate for ±1-transforms satisfies these conditions; in 3°, typically one takes the partial derivatives A(x, y) = Ux(x, y) and B(x, y) = Uy(x, y), or their one-sided versions.



Lemma 3.2. If there is a function U satisfying 1°, 2° and 3°, then (3.1) is valid for any pair (f, g) such that g is differentially subordinate to f.

Proof. As in the case of transforms, the main ingredient of the proof is the supermartingale property of the sequence (U(fn, gn))n≥0. Fix n ≥ 0 and apply 3° to x = fn, y = gn, h = dfn+1 and k = dgn+1 (the condition |k| ≤ |h| follows from the differential subordination) to obtain

U(fn+1, gn+1) ≤ U(fn, gn) + A(fn, gn)dfn+1 + B(fn, gn)dgn+1.

Taking the conditional expectation with respect to Fn yields the desired supermartingale property. The remainder of the proof is as previously:

EG(fn, gn) ≤ EU(fn, gn) ≤ EU(f0, g0) ≤ 0.
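Lemma 3.2 can be sanity-checked by exact enumeration. In the sketch below (an illustration with ad hoc parameters), df takes values ±0.3 and dg = u·df with u drawn independently and uniformly from {−1, 0.2, 0.9}; thus |dgn| ≤ |dfn| holds although g is not a transform of f by a predictable sequence, and the expectations E U(fn, gn) for the assumed candidate U(x, y) = y² − x² are nonincreasing:

```python
import itertools

def U(x, y):
    # Candidate with property (3.8) for A = -2x, B = 2y (an assumption here;
    # this function is discussed in Example 3.1).
    return y * y - x * x

x0, y0, step = 0.5, 0.25, 0.3
moves = [(s * step, u * s * step) for s in (-1, 1) for u in (-1.0, 0.2, 0.9)]
# The 6 moves are equally likely, so both increments are conditionally
# centered, and |dg| <= |df|: g is differentially subordinate to f.

means = []
for n in range(6):
    total = 0.0
    for path in itertools.product(moves, repeat=n):
        f, g = x0, y0
        for df, dg in path:
            f, g = f + df, g + dg
        total += U(f, g)
    means.append(total / 6 ** n)

ok = all(means[i + 1] <= means[i] + 1e-12 for i in range(5))
print(ok)
```

Here the decrease is even strict, since E[dg² − df²] < 0 at every step.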

The remaining part of this section is a list of useful comments and observations which will be used frequently in our later considerations.

(a) In general, the special function U satisfying 1°, 2° and 3°, if it exists, is not unique. In some situations, an appropriate choice of the function does simplify the calculations involved.

(b) In many cases, the special function inherits certain structural properties from G. For example, if G is symmetric with respect to the x-variable (i.e., we have G(x, y) = G(−x, y) for all x, y), then we may search for U in the class of functions enjoying this property. That is, if there is U satisfying 1°, 2° and 3°, then there exists a function Ũ which enjoys these conditions as well as the additional property Ũ(x, y) = Ũ(−x, y) for all x, y. To see this, one easily verifies that

Ũ(x, y) = min{U(x, y), U(−x, y)},

or

Ũ(x, y) = (U(x, y) + U(−x, y))/2,

is such a function. Alternatively, one can check that the function V given by (3.5) meets this requirement (directly from the definition). Similarly, if G is homogeneous of order p and there exists a function U satisfying 1°, 2° and 3°, then there is also a function Ũ enjoying these properties which is homogeneous of order p. Indeed, it suffices to take

Ũ(x, y) = infλ>0 { λᵖU(x/λ, y/λ) },

or check that the solution to (3.5) has the required property.
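As a toy illustration of the homogenization formula, the sketch below takes p = 2 and the assumed non-homogeneous candidate U(x, y) = y² − x² + 1, approximates Ũ numerically over a grid of λ, and observes both the homogeneity of order 2 and, in this particular case, the limit Ũ(x, y) = y² − x²:

```python
import numpy as np

def U(x, y):
    # A non-homogeneous candidate (an assumption for this illustration).
    return y * y - x * x + 1.0

lams = np.linspace(1e-3, 10.0, 20000)

def U_hom(x, y):
    # U~(x, y) = inf over lambda > 0 of lambda^p U(x/lambda, y/lambda), p = 2.
    return float(np.min(lams ** 2 * U(x / lams, y / lams)))

for (x, y) in [(0.7, 0.3), (-1.2, 0.5), (0.0, 2.0)]:
    a, b = U_hom(x, y), U_hom(2 * x, 2 * y)
    assert abs(b - 4 * a) < 1e-2            # homogeneous of order 2
    assert abs(a - (y * y - x * x)) < 1e-2  # here the infimum equals y^2 - x^2
print("ok")
```

Here λ²U(x/λ, y/λ) = y² − x² + λ², so the infimum is approached as λ → 0; the grid merely approximates it.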

(c) In the above discussion we have considered real-valued martingales f and g only. This can be easily modified to the case when the sequences take values in some other domains. For instance, suppose we are interested in showing (3.1) under the assumption that f takes values in [0, 1] and g is its transform by a predictable sequence with values in [−1, 1] (in particular, g may take negative values). Then only some minor straightforward modifications of the approach are required. Namely, one needs to construct a function on [0, 1] × R which satisfies

1° (initial condition) We have U(x, y) ≤ 0 for all |y| ≤ x ≤ 1.

2° (majorization) We have U ≥ G on [0, 1] × R.

3° (concavity) The function U is concave along any line segment of slope in [−1, 1] contained in [0, 1] × R.



Analogously, one can extend the method so that it works for Hilbert- or Banach-space-valued sequences. We will not pursue the discussion in this direction. See the bibliographical details at the end of the chapter.

(d) It should also be mentioned here that in the above martingale context, there are very natural analogues of the continuation and stopping regions C and D which were so meaningful in the previous chapter. These sets can be defined with the same formula as in the theory of optimal stopping:

C = {(x, y) ∈ R² : V(x, y) > G(x, y)}, D = {(x, y) ∈ R² : V(x, y) = G(x, y)}.

If one looks at the formula (3.5), these sets have a very natural interpretation. Namely, if (x, y) belongs to the set D, then the optimal choice for the pair (f, g) (for which the supremum defining V(x, y) is attained) is just the constant pair (f, g) ≡ (x, y). In other words, when starting from the set D, it is optimal to stop the martingale pair instantly. In contrast, when (x, y) ∈ C, the optimal pair (f, g) for V(x, y) must make some nontrivial moves, that is, it must continue its evolution.

3.2. Examples

In this section we will illustrate the above method on several important examples. We assume throughout that all the processes we work with are simple.

Example 3.1. Suppose that g is differentially subordinate to f and we are interested in the best constant C in the inequality

(3.9) λ²P(|gn| ≥ λ) ≤ CE|fn|², n = 0, 1, 2, . . . ,

where λ > 0 is a fixed parameter. Note that we may assume that λ = 1, replacing f, g with f/λ and g/λ, respectively (this replacement does not affect the differential subordination). There are two objects to be determined: the a priori unknown optimal value of the constant C and an appropriate special Bellman function. As we have explained above, we first restrict ourselves to the case of ±1-transforms and hope that the function U we obtain will also work in the general setting. We rewrite the estimate in the form

EG(fn, gn) ≤ 0,

with G(x, y) = 1_{|y|≥1} − C|x|². We will identify the sequence V⁰, V¹, V², . . .. By the very definition, V⁰ = G. Note that the function 𝒱(x, y) = 1 − C|x|² is concave; this, by a straightforward induction, implies Vⁿ ≤ 𝒱 on R². Since G = 𝒱 outside the strip R × (−1, 1), we see that

Vⁿ(x, y) = 𝒱(x, y) = 1 − C|x|² if |y| ≥ 1.

To find Vⁿ on the remaining part of the domain, fix (x, y) ∈ R × (−1, 1) and write down the Wald–Bellman equation for V¹:

V¹(x, y) = sup { α1G(x + t1, y + at1) + α2G(x + t2, y + at2) }.

Here the supremum is taken over all αi ≥ 0, ti ∈ R and a = ±1 such that α1 + α2 = 1 and α1t1 + α2t2 = 0. So, we need to find the least concave majorant



ζ1 of t ↦ G(x + t, y + t), the least concave majorant ζ−1 of t ↦ G(x + t, y − t), and set V¹(x, y) = max{ζ1(0), ζ−1(0)}. The function ξ(t) = G(x + t, y + t) has the formula

ξ(t) = −C(x + t)² if t ∈ (−1 − y, 1 − y), and ξ(t) = 1 − C(x + t)² otherwise,

and hence

(3.10) ζ1(0) ≥ ((1 + y)/2)ξ(1 − y) + ((1 − y)/2)ξ(−1 − y) = 1 − C + C(y² − x²).

This inequality yields a useful lower bound for C. Namely, if we take x = y ∈ (0, 1), then, by 1°,

0 ≥ V(x, y) ≥ V¹(x, y) ≥ 1 − C + C(y² − x²) = 1 − C,

or C ≥ 1. Under this additional assumption, it is not difficult to check that

ζ1(t) = 1 − C + C((y + t)² − (x + t)²) if t ∈ (−1 − y, 1 − y), and ζ1(t) = 1 − C(x + t)² otherwise,

and

ζ−1(t) = 1 − C + C((y − t)² − (x + t)²) if t ∈ (−1 + y, 1 + y), and ζ−1(t) = 1 − C(x + t)² otherwise,

so V¹(x, y) = 1 − C + C(y² − x²) for (x, y) ∈ R × (−1, 1). Now, one easily checks that V¹ is concave along the lines of slope ±1. This implies that the sequence (Vⁿ)n≥0 stabilizes: we have V¹ = V² = . . . = V. Since V¹(x, y) ≤ 0 for y = ±x, we have proved the validity of (3.9) for any C ≥ 1 (under the assumption that g is a ±1-transform of f). Therefore, C = 1 is the best constant there.

Now we can pass to the more general classes of martingales. Setting C = 1, one can check that the function

V(x, y) = y² − x² if |y| < 1, and V(x, y) = 1 − x² if |y| ≥ 1

is concave along lines of slope belonging to [−1, 1] and satisfies V(x, y) ≤ 0 for |y| ≤ |x|. Therefore, we obtain that the inequality (3.9) holds for transforming sequences with values in [−1, 1]. Actually, this inequality holds even in the less restrictive setting of differential subordination. Indeed, the appropriate versions of the initial and majorization conditions are satisfied, so it remains to verify the concavity inequality (3.8). We set A(x, y) = −2x, B(x, y) = 2y·1_{|y|<1} (note that these are essentially the partial derivatives of V) and consider two cases. If |y| ≥ 1, we use the fact that the function 𝒱(x, y) = 1 − x² is concave and majorizes V. Therefore, for any h, k ∈ R,

V(x + h, y + k) ≤ 𝒱(x + h, y + k) ≤ 𝒱(x, y) + 𝒱x(x, y)h + 𝒱y(x, y)k = V(x, y) + A(x, y)h + B(x, y)k.

On the other hand, if |y| < 1 and |k| ≤ |h|, then

V(x, y) + A(x, y)h + B(x, y)k = (y + k)² − (x + h)² + h² − k² ≥ (y + k)² − (x + h)² ≥ V(x + h, y + k).



This proves the validity of (3.9) under the assumption of the differential subordination of g to f.
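The two cases of the verification can also be cross-checked numerically. The sketch below samples random points and subordinate increments (|k| ≤ |h|) and tests inequality (3.8) for the function V above with A(x, y) = −2x and B(x, y) = 2y·1_{|y|<1}; it is an illustration only, not a proof:

```python
import random

def V(x, y):
    # The stabilized Bellman function of Example 3.1 (C = 1).
    return y * y - x * x if abs(y) < 1 else 1 - x * x

def A(x, y):
    return -2 * x

def B(x, y):
    return 2 * y if abs(y) < 1 else 0.0

random.seed(1)
ok = True
for _ in range(20000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    h = random.uniform(-2, 2)
    k = random.uniform(-1, 1) * abs(h)   # differential subordination: |k| <= |h|
    if V(x + h, y + k) > V(x, y) + A(x, y) * h + B(x, y) * k + 1e-9:
        ok = False
print(ok)
```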

We conclude this lengthy analysis of the weak-type inequality with the observation that the function V we used above is the smallest possible, but not the simplest possible. There is a different, nicer choice of the function which yields the validity of (3.9): it is easy to see that U(x, y) = y² − x² has all the required properties. See Remark (a) above.

However, in general the computation of the whole sequence V⁰, V¹, V², . . . is a formidable task and the calculations are hard to push through. Therefore, a typical approach rests on the direct search for the function V. Let us move to the next example.

Example 3.2. We will identify the best constant C1 in the estimate

λP(|gn| ≥ λ) ≤ C1E|fn|, n = 0, 1, 2, . . . ,

where λ > 0 and g is assumed to be differentially subordinate to f. As previously, we start the analysis in the more restrictive setting of ±1-transforms and assume that λ = 1. Then the problem can be rewritten in the form (3.1) with G(x, y) = 1_{|y|≥1} − C1|x|. To gain some intuition about the special function to be found, let us write down the definition of V:

(3.11) V(x, y) = sup { P(|gn| ≥ 1) − C1E|fn| },

the supremum being taken over the usual parameters. We divide the analysis into a few intermediate steps.

Step 1. The case |y| ≥ 1. We know that V is the smallest majorant of G which is concave along the lines of slope ±1. On the other hand, observe that the function 𝒱(x, y) = 1 − C1|x| is concave and majorizes G: this implies V ≤ 𝒱 on the whole R². Since 𝒱 and G coincide on {(x, y) : |y| ≥ 1}, we must have

V(x, y) = G(x, y) = 1 − C1|x| provided |y| ≥ 1.

Step 2. The case |x| + |y| ≥ 1. Actually, the above equality holds for all x, y satisfying |x| + |y| ≥ 1. This can be easily seen geometrically, by looking at the graph of the function G; the formal proof goes as follows. Suppose that x, y ≥ 0 (for the remaining cases the reasoning is analogous) and y < 1. We have

V(x, y) ≥ ((1 − y)/2)V(x + y + 1, −1) + ((1 + y)/2)V(x + y − 1, 1) = 1 − C1x = 𝒱(x, y).

Since 𝒱 ≥ V on the whole R², we must actually have equality above.

Step 3. The case |x| + |y| < 1. Here we make a guess. Take a line of slope 1 passing through (x, y): this line intersects the set {(x, y) : |x| + |y| = 1} at the two points

P1 = ((1 + x − y)/2, (1 − x + y)/2), P2 = ((−1 + x − y)/2, (−1 − x + y)/2).

We have, by the concavity of V ,

V(x, y) ≥ ((1 + x + y)/2)V(P1) + ((1 − x − y)/2)V(P2) = 1 − C1(1 + x² − y²)/2.



We assume that we actually have equality here. Since V(0, 0) must be nonpositive, this implies C1 ≥ 2 (note that this is a formal proof that the weak-type constant cannot be smaller than 2). This leads us to the candidate

U(x, y) = 1 − C1(1 + x² − y²)/2 if |x| + |y| < 1, and U(x, y) = 1 − C1|x| if |x| + |y| ≥ 1.

It is easy to check that this function has all the required properties, and the inequality P(|gn| ≥ 1) ≤ C1E|fn| is established (in the setting of ±1-transforms). Since C1 ≥ 2 was arbitrary, we see that the weak-type constant is equal to 2.

The passage to the more general context of [−1, 1]-valued transforming sequences and the setting of differential subordination goes along the same lines as previously. Clearly, it is enough to study the less restrictive context of differential subordination. Fix C1 = 2. Then the function

U(x, y) = y² − x² if |x| + |y| < 1, and U(x, y) = 1 − 2|x| if |x| + |y| ≥ 1

satisfies the appropriate initial condition and majorization, so it suffices to check (3.8). We leave the straightforward verification to the reader and just mention that for the functions A and B, one can take

A(x, y) = −2x if |x| + |y| < 1, and A(x, y) = −2 sgn x if |x| + |y| ≥ 1; B(x, y) = 2y if |x| + |y| < 1, and B(x, y) = 0 if |x| + |y| ≥ 1

(as in the previous example, A and B are essentially the partial derivatives of U). This establishes the sharp inequality

λP(|gn| ≥ λ) ≤ 2E|fn|, n = 0, 1, 2, . . . ,

for differentially subordinate martingales.
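The verification left to the reader can be sampled numerically. The sketch below tests, at random points and subordinate increments, the initial condition, the majorization over G(x, y) = 1_{|y|≥1} − 2|x|, and inequality (3.8) with the A and B above; it is an illustration only:

```python
import random

def U(x, y):
    # The weak-type candidate of Example 3.2 with C1 = 2.
    return y * y - x * x if abs(x) + abs(y) < 1 else 1 - 2 * abs(x)

def G(x, y):
    return (1.0 if abs(y) >= 1 else 0.0) - 2 * abs(x)

def sgn(t):
    return (t > 0) - (t < 0)

def A(x, y):
    return -2 * x if abs(x) + abs(y) < 1 else -2 * sgn(x)

def B(x, y):
    return 2 * y if abs(x) + abs(y) < 1 else 0.0

random.seed(2)
ok = True
for _ in range(20000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    if U(x, y) < G(x, y) - 1e-9:                 # 2° (majorization)
        ok = False
    if abs(y) <= abs(x) and U(x, y) > 1e-9:      # 1° (initial condition)
        ok = False
    h = random.uniform(-1.5, 1.5)
    k = random.uniform(-1, 1) * abs(h)           # |k| <= |h|
    if U(x + h, y + k) > U(x, y) + A(x, y) * h + B(x, y) * k + 1e-9:
        ok = False
print(ok)
```

Along lines of slope ±1 inside the unit diamond, (3.8) holds with equality, which the tolerance 1e-9 accommodates.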

It is instructive to see that Steps 1, 2 and 3 have a very transparent probabilistic interpretation. The idea can be described as follows. Let us first look at (3.11). A little informally, we want to make the probability P(|gn| ≥ 1) as large as possible and the expectation E|fn| as small as possible. Furthermore, the process (|fn|)n≥0 is a submartingale, so its expectation does not decrease as we increase n; however, it stays on the same level if the process does not change its sign.

Step 1. The case |y| ≥ 1. Here the reasoning is simple: the supremum defining V(x, y) must be attained for the constant pair (f, g) ≡ (x, y). Indeed, for such a pair the probability P(|gn| ≥ 1) is one (so we cannot make it larger); on the other hand, any nontrivial movement of f can only increase E|fn|. Thus V(x, y) = 1 − C1|x|.

Step 2. The case |x| + |y| ≥ 1. Here it is again possible to send g outside the strip R × (−1, 1) almost surely and simultaneously keep E|fn| at the level |x|. Indeed, suppose that x ≥ 0 and y ≥ 0 (and y < 1); for the remaining cases the construction is similar. We consider the martingale pair (f, g) starting from (x, y) and moving along the line of slope −1 at the first step: formally, we require that (f1, g1) ∈ {(x + y − 1, 1), (x + y + 1, −1)} (the corresponding probabilities are uniquely determined by the fact that E((f1, g1)|F0) = (x, y)). The construction is completed by the condition f1 = f2 = f3 = . . . and g1 = g2 = g3 = . . . almost surely. Then P(|g1| ≥ 1) = 1 and E|f1| = |x|, so V(x, y) = 1 − C1|x|.

Step 3. The case |x| + |y| < 1. Here we need to experiment a bit. A little thought leads to the following idea: start the pair (f, g) at (x, y); then, at the first



step, send it to the set {(x, y) : x + y ∈ {−1, 1}}, and then move according to the pattern described in Step 2. Precisely, consider the following Markov martingale (f, g):

(i) It starts from (x, y): (f0, g0) ≡ (x, y).

(ii) The random variable df1 = dg1 is centered and takes values in {(1 − x − y)/2, (−1 − x − y)/2}.

(iii) Conditionally on {df1 > 0}, the random variable df2 = −dg2 is centered and takes values in {−f1, g1 + 1}; conditionally on {df1 < 0}, it is centered and takes values in {−f1, g1 − 1}.

(iv) Put dfn = dgn ≡ 0 for n ≥ 3.

Then we have P(|g2| ≥ 1) = 1. Furthermore, we easily derive that df1 takes the values (1 − x − y)/2 and (−1 − x − y)/2 with probabilities p+ = (1 + x + y)/2 and p− = (1 − x − y)/2, respectively. In consequence, since f2 has the same sign as f1, we may write

E|f2| = E|f1| = |x + (1 − x − y)/2| · (1 + x + y)/2 + |x + (−1 − x − y)/2| · (1 − x − y)/2 = (1 + |x|² − |y|²)/2,

and hence we get

V(x, y) ≥ 1 − C1(1 + |x|² − |y|²)/2.
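The four atoms of this construction can be enumerated in exact arithmetic. The sketch below assumes, in addition, that on the branch {df1 < 0} the second move takes values {−f1, g1 − 1} (the mirror image of the other branch, so that the increment can be centered), and verifies the claims P(|g2| ≥ 1) = 1 and E|f2| = (1 + x² − y²)/2 at a sample starting point:

```python
from fractions import Fraction as Fr

def atoms(x, y):
    # Exact distribution of (f2, g2); each branch probability is chosen so
    # that the corresponding increment is conditionally centered.
    out = []
    for df1, p1 in [((1 - x - y) / 2, (1 + x + y) / 2),
                    ((-1 - x - y) / 2, (1 - x - y) / 2)]:
        f1, g1 = x + df1, y + df1
        up = g1 + 1 if df1 > 0 else g1 - 1   # second move: df2 in {-f1, up}
        q = up / (up + f1)                   # P(df2 = -f1), makes df2 centered
        for df2, p2 in [(-f1, q), (up, 1 - q)]:
            out.append((f1 + df2, g1 - df2, p1 * p2))  # dg2 = -df2
    return out

x, y = Fr(1, 5), Fr(3, 10)                   # sample point with |x| + |y| < 1
atom_list = atoms(x, y)
assert sum(p for _, _, p in atom_list) == 1
assert sum(f * p for f, _, p in atom_list) == x       # E f2 = x (martingale)
assert sum(g * p for _, g, p in atom_list) == y       # E g2 = y (martingale)
assert all(abs(g) == 1 for _, g, _ in atom_list)      # |g2| = 1 almost surely
assert sum(abs(f) * p for f, _, p in atom_list) == (1 + x * x - y * y) / 2
print("checks passed")
```

Using fractions.Fraction keeps all probabilities and expectations exact, so the identities above are verified with equality rather than within a tolerance.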

The remaining analysis is the same as previously. The above probabilistic analysis very clearly illustrates the notions of the continuation and stopping regions described in Remark (d) above. Indeed, from the formulas for G and V we infer that

C = {(x, y) : V(x, y) > G(x, y)} = R × (−1, 1),

D = {(x, y) : V(x, y) = G(x, y)} = R × ((−∞, −1] ∪ [1, ∞)),

and it is evident from Steps 1, 2 and 3 above that the optimal pairs (f, g) corresponding to V(x, y) have some nontrivial evolution if and only if (x, y) ∈ C.

Example 3.3. Fix 1 < p < 2. The purpose of this example is to identify the best constant Cp in the Lᵖ estimate

E|gn|ᵖ ≤ CpᵖE|fn|ᵖ, n = 0, 1, 2, . . . ,

under the assumption of the differential subordination of g to f. As previously, our first step is to consider the case of ±1-transforms. The inequality is of the form (3.1) with G(x, y) = |y|ᵖ − Cpᵖ|x|ᵖ, and the formula for the smallest Bellman function becomes

(3.12) V(x, y) = sup { E|gn|ᵖ − CpᵖE|fn|ᵖ },

where the supremum is taken over all n and all pairs (f, g) starting from (x, y) such that g is a ±1-transform of f. As we have explained in Remark (b), we may search for the Bellman function in the class of homogeneous functions:

U(λx, ±λy) = λᵖU(x, y), x, y ∈ R, λ > 0.

Thus it is enough to find the formula for the restriction

(3.13) u(x) = U(x, 1− x).

This function must be concave and majorize the function w(x) = G(x, 1 − x) on [0, 1]. Clearly, this is not the full set of requirements: we must somehow guarantee



Figure 3.1. The graph of the function w(x) = G(x, 1 − x) for p = 6/5 and Cp = 2.

that the function x ↦ U(x, 1 − x) is concave on the full real line. To take this into account, let us first inspect the behavior of this restriction on (1, ∞). To this end, fix x > 1 and note that

U(x, 1 − x) = U(x, x − 1) = (2x − 1)ᵖ u(x/(2x − 1)), and hence the analysis of the one-sided derivatives yields

2pu(1) − u′(1−) = (d/dx)U(x, 1 − x)|x=1+ ≤ (d/dx)U(x, 1 − x)|x=1− = u′(1−),

or

(3.14) u′(1−) ≥ pu(1).

A natural guess for u is a linear function whose graph is tangent to that of w at some point x0 ∈ (0, 1). In other words, as a candidate for u, let us take

u(x) = w(x0) + w′(x0)(x− x0),

for some x0 to be found. An application of (3.14) yields p[w(x0) + w′(x0)(1 − x0)] ≤ w′(x0) or, equivalently,

(3.15) Cpᵖ ≥ [(1 − x0)ᵖ⁻² − (p − 1)(1 − x0)ᵖ⁻¹] / [(p − 1)x0ᵖ⁻¹].

Now we specify x0 by requiring that the right-hand side above is the least possible, and assume that Cpᵖ is equal to that minimal value. Simple calculations show that we must take x0 = 1 − 1/p and Cp = (p − 1)⁻¹; this leads us to the candidate

u(x) = −(p³⁻ᵖ/(p − 1))(x − 1 + 1/p), and the function

U(x, y) = (|x| + |y|)ᵖ u(|x|/(|x| + |y|)) = p²⁻ᵖ(|y| − |x|/(p − 1))(|x| + |y|)ᵖ⁻¹.
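The "simple calculations" above can be checked numerically. The sketch below (for the sample value p = 1.5) minimizes the right-hand side of (3.15) over x0, compares the minimizer and minimum with 1 − 1/p and (p − 1)⁻ᵖ, and verifies that the resulting tangent line majorizes w on [0, 1] (here w(x0) = 0 for this particular x0 and Cp):

```python
import numpy as np

p = 1.5
x0 = np.linspace(0.01, 0.99, 9801)
# Right-hand side of (3.15) as a function of x0.
rhs = ((1 - x0) ** (p - 2) - (p - 1) * (1 - x0) ** (p - 1)) / ((p - 1) * x0 ** (p - 1))
i = int(np.argmin(rhs))
assert abs(x0[i] - (1 - 1 / p)) < 1e-3        # minimizer: x0 = 1 - 1/p
assert abs(rhs[i] - (p - 1) ** (-p)) < 1e-4   # minimum: Cp^p = (p - 1)^(-p)

Cp = 1 / (p - 1)
xs = 1 - 1 / p
t = np.linspace(0.0, 1.0, 2001)
w = (1 - t) ** p - Cp ** p * t ** p
wp = -p * (1 - xs) ** (p - 1) - Cp ** p * p * xs ** (p - 1)  # w'(x0)
u = wp * (t - xs)              # tangent line; w(x0) = 0 for this x0 and Cp
assert np.all(u >= w - 1e-9)   # the linear candidate majorizes w on [0, 1]
print("ok")
```

The numerical minimization only supports the stated closed-form answer; it does not replace the calculus.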



Now one has to verify rigorously that the properties 1°, 2° and 3° are satisfied. One can perform this analysis right away in the context of the differential subordination (we omit the details) and thus obtain that

E|gn|ᵖ ≤ (p − 1)⁻ᵖE|fn|ᵖ, n = 0, 1, 2, . . . .

In contrast to the preceding examples, the above analysis does not imply that the constant (p − 1)⁻¹ is the best possible. To prove this, we will exploit the abstract properties of the function V. Suppose that the Lᵖ inequality holds with some constant Cp and introduce the function v(x) = V(x, 1 − x), x ∈ [0, 1]. We have

(3.16) v((Cp + 1)⁻¹) ≥ w((Cp + 1)⁻¹) = (1 − (Cp + 1)⁻¹)ᵖ − Cpᵖ(Cp + 1)⁻ᵖ = 0.

Therefore, exploiting the concavity of v, we get

((Cp + 1)/Cp) v(1) ≥ (v(1) − v((Cp + 1)⁻¹))/(1 − (Cp + 1)⁻¹) ≥ v′(1−) ≥ pv(1).

In the last passage we have used the fact that v, being appropriately symmetric and concave, must satisfy (3.14). The above estimate is equivalent to Cp ≥ (p − 1)⁻¹, since v(1) is strictly negative. The latter follows from the concavity of v, (3.16) and the estimate v(0) ≥ w(0) = 1.

We should point out here that the function U discovered above does not coincide with the smallest Bellman function V. One can show that the two functions coincide only on a part of R²; more precisely, we have

V(x, y) = U(x, y) if |y| ≤ |x|/(p − 1), and V(x, y) = |y|ᵖ − (p − 1)⁻ᵖ|x|ᵖ if |y| > |x|/(p − 1).

It is instructive to see that the above result can be obtained with the use of a different argumentation, which is of probabilistic nature. Let us write down the formula for the function V:

V(x, y) = sup { E|gn|ᵖ − CpᵖE|fn|ᵖ },

where the supremum is taken over all n and all martingale pairs (f, g) starting from (x, y) such that g is a ±1-transform of f. As in the example concerning the weak-type estimate, we begin with a little informal observation. Namely, when searching for the supremum, we need to make E|gn|ᵖ relatively big in comparison with E|fn|ᵖ. Both processes (|gn|ᵖ)n≥0 and (|fn|ᵖ)n≥0 are submartingales, so their moments grow as we increase n. However, we can steer the pair (f, g) so that the p-th moment of g grows appropriately faster than that of f. To do this, note that the second derivative of the function t ↦ |t|ᵖ is decreasing: this implies that it is profitable to evolve the pair (f, g) when gn is small and fn is big; on the other hand, if g is big when compared to f, the best strategy seems to be to stop the processes at once. This observation can be very nicely expressed in terms of the continuation and stopping regions C and D: we should have C = {(x, y) : |y| < γp|x|} and D = {(x, y) : |y| ≥ γp|x|}, for some parameter γp to be found. In other words,

V (x, y) = |y|p − Cpp |x|p if |y| ≥ γp|x|and V (x, y) > |y|p − Cpp |x|p for |y| < γp|x|. To identify V on C = (x, y) : |y| <γp|x|, we make the second observation. Namely, a little thought suggests that thefollowing principle should hold: if g approaches 0, then f should be go away fromthis value, and vice versa. This means that when (x, y) ∈ C ∩ xy > 0, then V


54 3. MARTINGALE INEQUALITIES

should be linear along the lines of slope −1, and linear along the lines of slope 1 on the remaining part of C. This actually brings us back to the analytic reasoning similar to that presented above. Indeed, one first notes that V is homogeneous of order p and thus it is enough to identify v(x) = V(x, 1 − x) for x ∈ [0, 1]. The above analysis suggests that if 1 − x ≥ γ_p x (equivalently: x ≤ (1 + γ_p)^{-1}), then

v(x) = w(x) = (1 − x)^p − C_p^p x^p.

On the other hand, on [(1 + γ_p)^{-1}, 1] the function v should be linear and majorize w: this uniquely determines v:

v(x) = w′((1 + γ_p)^{-1}) (x − (1 + γ_p)^{-1}) + w((1 + γ_p)^{-1}),

and the condition v′(1−) ≥ p v(1) implies (3.15), with x_0 = (1 + γ_p)^{-1}.

There is a natural question about explicit examples showing that the constant (p − 1)^{-1} cannot be improved. The idea behind the construction is straightforward: we want to start the pair (f, g) at some point (x, x) for which U(x, x) = 0 and then require that the martingales evolve on the line segments along which U is linear. If, in addition, we ensure that the final pair (f_N, g_N) terminates at the set D = {(x, y) : |y| = (p − 1)^{-1}|x|}, then all the inequalities

E|g_N|^p − C_p^p E|f_N|^p ≤ EU(f_N, g_N) ≤ EU(f_{N−1}, g_{N−1}) ≤ … ≤ EU(f_0, g_0) ≤ 0

would become equalities, so the constant C_p^p would be attained. Unfortunately, this cannot be done in such a simple manner. The only pair (f, g) which satisfies the above set of conditions is the pair (f, g) ≡ (0, 0), and clearly, the equality E|g_N|^p = C_p^p E|f_N|^p is then not meaningful. We will show that for any ε > 0, there is a martingale pair (f, g) such that if n is sufficiently large, then

E|g_n|^p > ((p − 1)^{-1} − ε)^p E|f_n|^p.

To guarantee this inequality, we may relax a little the requirements formulated above. Fix β ∈ (1, (p − 1)^{-1}). First, we will allow the martingale (f, g) to start from the point (1, 1): we will see that the loss EU(f_0, g_0) < 0 we experience here is insignificant in comparison to the overall size of the martingales f and g. Next, consider the following Markov transitions:

(i) The states lying in the set {(x, y) : |y| ≥ β|x|} are absorbing.
(ii) The state (x, y) with 0 < y < βx leads to (x + y, 0) or to ((x + y)/(β + 1), β(x + y)/(β + 1)) (the move along the line of slope −1).
(iii) The state (x, 0) with x > 0 leads to (x + δx, δx) or to (x/(β + 1), −βx/(β + 1)) (the move along the line of slope 1).
(iv) The remaining states (x, y) behave in a symmetrical way when compared to (ii) and (iii).

It is easy to check that g is a ±1-transform of f. Furthermore, (f, g) converges to a nontrivial random variable (f_∞, g_∞) with values in {(x, y) : |y| = β|x|}. Therefore we will be done if we check that f is L^p-bounded. This can be verified readily from the above construction; we leave the details to the reader.

Example 3.4. The purpose of our final example is to illustrate the modification of the method when the values of f are restricted to the interval [−1, 1] and the transforming sequence takes values in [0, 1]. Namely, in such a setting, we will identify the best constant C_λ in the estimate

(3.17) P(|g_n| ≥ λ) ≤ C_λ, n = 0, 1, 2, …,


3.2. EXAMPLES 55

where λ ≥ 3/2. As in the previous examples, we start with the extremal case in which the transforming sequence takes values in {0, 1}. The inequality (3.17) can be rewritten in the form

EG(f_n, g_n) ≤ C_λ, n = 0, 1, 2, …,

with G(x, y) = 1_{|y|≥λ}, (x, y) ∈ [−1, 1] × R. This inequality is of a slightly different form than (3.1), due to the appearance of the term C_λ on the right. Of course, we could have put it on the left and used the function G(x, y) = 1_{|y|≥λ} − C_λ, but instead we prefer to say that the method described above works perfectly fine; the only change we need is the requirement U(x, y) ≤ C_λ in the initial condition 1°.

Let us start with the smallest Bellman function

V(x, y) = sup EG(f_n, g_n) = sup { P(|g_n| ≥ λ) − C_λ },

where the supremum is taken over all associated parameters. The function G is symmetric with respect to x and y; this implies that we have V(x, y) = V(−x, −y) for all (x, y) ∈ [−1, 1] × R (on the contrary, we do not have V(x, −y) = V(x, y): this is due to the fact that the transforming sequence is non-symmetric, i.e., takes values in a non-symmetric set).

Step 1. As in the preceding examples, it is easy to identify V on some part of its domain, directly from the definition. It will be convenient to express the observations in terms of the continuation and stopping sets C and D. Directly from the above definition of V, we see that our objective is to send g outside (−λ, λ) with as large probability as possible, keeping f inside the interval [−1, 1]. This observation immediately gives C = (−1, 1) × (−λ, λ) and D = ([−1, 1] × R) \ C. Indeed, we have G = 0 on (−1, 1) × (−λ, λ) and for any (x, y) ∈ (−1, 1) × (−λ, λ) one easily constructs a martingale pair (f, g) starting from (x, y) for which the probability P(|g_n| ≥ λ) is strictly positive for some n. On the other hand, for any (x, y) ∉ (−1, 1) × (−λ, λ), the best choice is to take the constant martingale (f, g) ≡ (x, y). Indeed, if |x| = 1, this is due to the fact that f must be stopped immediately (otherwise it leaves [−1, 1]), while for the remaining points the process |g| already reaches the desired set [λ, ∞) at its initial position. This reasoning shows that V(x, y) = 1_{|y|≥λ} on D and it remains to identify the formula for V on C. We will actually guess this formula based on a number of (reasonable) assumptions: this is done in the next four steps and our reasoning will be a little informal. The rigorous verification that the obtained candidate enjoys all the required properties is a separate issue (see Step 6). To stress that we are working with a candidate, we will use a different letter and, from now on, denote the investigated function by U.

Step 2. In all the considerations below, we will treat U as a C^1 function (though the candidate we will end up with will not even be continuous). Let us consider the case when x − 1 + λ ≤ y < λ. Consider the line of slope 1 passing through (x, y). The restriction of G to this line (more formally, to an appropriate line segment) is given by

ξ(t) = 0 if t ∈ [−1 − x, λ − y),   ξ(t) = 1 if t ∈ [λ − y, 1 − x],


so the least concave majorant is ζ(t) = min{(t + x + 1)/(x − y + λ + 1), 1}. Consequently, we have

V(x, y) ≥ (x + 1)/(x − y + λ + 1).

The right-hand side is linear along lines of slope 1 and concave along lines of slope 0 when y < λ. These desired properties suggest to assume that

U(x, y) = (x + 1)/(x − y + λ + 1) if x − 1 + λ ≤ y < λ

(with a symmetric formula for −x − 1 + λ ≤ −y < λ).
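The least concave majorant computation can be spot-checked numerically; the following is an illustrative sketch (not part of the original text), verifying on a grid that ζ is concave, majorizes the step function ξ, and matches its values at t = −1 − x and t = λ − y.

```python
# Spot-check (illustrative) that zeta(t) = min{(t + x + 1)/(x - y + lam + 1), 1}
# is a concave majorant of the step function xi on [-1 - x, 1 - x].
x, y, lam = 0.0, 1.5, 2.0      # a sample point with x - 1 + lam <= y < lam

def xi(t):
    return 0.0 if t < lam - y else 1.0

def zeta(t):
    return min((t + x + 1) / (x - y + lam + 1), 1.0)

a, b = -1 - x, 1 - x
grid = [a + (b - a) * i / 400 for i in range(401)]
assert all(zeta(t) >= xi(t) - 1e-12 for t in grid)             # majorization
for s, t in zip(grid, grid[2:]):                               # midpoint concavity
    assert zeta((s + t) / 2) >= (zeta(s) + zeta(t)) / 2 - 1e-12
assert abs(zeta(a)) < 1e-12 and abs(zeta(lam - y) - 1.0) < 1e-12
```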

Step 3. Let us turn our attention to the case y < x − 1 + λ. Suppose, for a while, that y is positive and not too close to 0. A little thought and experimentation suggest the following behavior of U: there should be some curve γ splitting the set {(x, y) ∈ [−1, 1] × R : y < x − 1 + λ} such that U is linear along the lines of slope 1 on the left of γ, and linear along the horizontal lines on the right of γ: see Figure 3.2 below. Let us parametrize this curve as {(γ(y), y) : y ∈ I} for some interval I. So, if A denotes the restriction of U to the curve {(γ(y), y) : y ∈ I} (meaning that A(y) = U(γ(y), y)) and x > γ(y), then

(3.18) U(x, y) = [(x − γ(y))/(1 − γ(y))] V(1, y) + [(1 − x)/(1 − γ(y))] A(y) = [(1 − x)/(1 − γ(y))] A(y)

(in the last equality we used that V(1, y) = 0 for |y| < λ).

Figure 3.2. The curve γ. The dotted lines indicate the directions along which U should be linear.

Let us assume that the function U is of class C^1 on the set {y < x − 1 + λ} (actually, this condition will not hold, but it leads to the right candidate). Then the


aforementioned linearity of U along line segments of slope 1 lying on the left of γ implies that

U_x(γ(y), y) + U_y(γ(y), y) = [U(γ(y), y) − U(−1, y − γ(y) − 1)] / (γ(y) + 1) = A(y) / (γ(y) + 1)

(in the last equality we used that U vanishes on {−1} × (−λ, λ), by Step 1).

The partial derivatives can be identified with the use of (3.18): as the result, we obtain the differential equation

A′(y)/A(y) + γ′(y)/(1 − γ(y)) = 2/(1 − γ^2(y)).

Solving this equation, we get

A(y)/(1 − γ(y)) = c · exp( ∫_0^y 2 du/(1 − γ^2(u)) ),

for some unknown constant c. To find this parameter, let (γ(y_0), y_0) be the intersection of the curve γ with the line y = x − 1 + λ (the upper end of γ: see Figure 3.2). We have assumed that U is continuous, so by the formula guessed at the previous step we may write

c · exp( ∫_0^{y_0} 2 du/(1 − γ^2(u)) ) = A(y_0)/(1 − γ(y_0)) = (1 + γ(y_0)) / (2(1 − γ(y_0))).

Consequently, we have obtained that

A(y)/(1 − γ(y)) = [(1 + γ(y_0)) / (2(1 − γ(y_0)))] · exp( ∫_{y_0}^y 2 du/(1 − γ^2(u)) ).

Now suppose that y is a positive number and suppose that x > γ(y). Exploiting the linearity of U along the horizontal segments as in Figure 3.2, we may write

(3.19) U(x, y) = [(1 − x)/(1 − γ(y))] A(y) + [(x − γ(y))/(1 − γ(y))] U(1, y)
             = [(1 + γ(y_0)) / (2(1 − γ(y_0)))] (1 − x) exp( ∫_{y_0}^y 2 du/(1 − γ^2(u)) ).

At this point it is not difficult to see what the optimal choice for γ should be. The function γ should be continuous, take values in (−1, 1) and satisfy γ(y) ≥ y + 1 − λ (the latter condition means that the curve γ lies below or on the line y = x − 1 + λ). Now if we keep y_0 and γ(y_0) fixed, it is clear that in order to maximize U(x, y), we need to make γ as close to 0 as possible: this will guarantee that the integral ∫_{y_0}^y 2(1 − γ^2(u))^{-1} du will be maximal (recall that y ≤ y_0). This leads to the assumption γ ≡ 0 and y_0 = λ − 1. Having assumed this, we can find U on a large part of the domain. Namely, if λ − 1 ≤ y < x + λ − 1, then

U(x, y) = [(1 − x)/(λ − y)] U(y + 1 − λ, y) + [(x − y − 1 + λ)/(λ − y)] U(1, y) = (1 − x)(y − λ + 2) / (2(λ − y)).

If x ∈ [0, 1] and y < λ − 1 is sufficiently big (this will be made more precise later), then the above considerations (see (3.19)) give

(3.20) U(x, y) = [(1 − x)/2] exp(2(y − λ + 1)).


Finally, if x ∈ [−1, 0] and y < x + λ − 1 is large enough, we have

(3.21) U(x, y) = −x U(−1, −x + y − 1) + (1 + x) U(0, y − x)
             = (1 + x) A(−x + y)
             = [(1 + x)/2] exp(2(−x + y − λ + 1)).

Step 4. Now it is time to specify what we have meant by saying that the formulas (3.20) and (3.21) hold for sufficiently large y. Clearly, they cannot hold for all y ≤ λ − 1, since then the symmetry condition U(x, y) = U(−x, −y) would be violated. A closer look at this symmetry condition suggests that (3.20) should hold true for D_1 = {(x, y) : x ∈ [0, 1], 1/2 ≤ y ≤ λ − 1} and (3.21) should be valid for D_2 = {(x, y) : x ∈ [−1, 0], x + 1/2 ≤ y ≤ x + λ − 1}: indeed, the lower boundaries y = 1/2 in D_1 and y = x + 1/2 in D_2 are the smallest with the property that the interiors of the sets D_1, D_2 and their reflections −D_1 = {(x, y) : (−x, −y) ∈ D_1}, −D_2 are pairwise disjoint. See Figure 3.3.

Figure 3.3. The sets D_1, D_2, −D_1 and −D_2 have pairwise disjoint interiors.

Step 5. It remains to find U on the set {(x, y) : y − 1/2 ≤ x ≤ y + 1/2, y ∈ [−1/2, 1/2]}. Now we make the following guess. Namely, take the line segment of slope 1 passing through (x, y), with endpoints lying on the lines y = ±1/2. We assume that U is linear on this line segment, which leads us to

U(x, y) = (1/2 − y) U(x − y − 1/2, −1/2) + (y + 1/2) U(x − y + 1/2, 1/2) = exp(3 − 2λ) (y^2 − xy + 1/4).


Summarizing, we have obtained the function uniquely determined by the condition U(x, y) = U(−x, −y) and the formula

U(x, y) =
  1, if y ≥ λ,
  (x + 1)/(x − y + λ + 1), if x − 1 + λ ≤ y < λ,
  (1 − x)(y − λ + 2)/(2(λ − y)), if λ − 1 ≤ y < x + λ − 1,
  [(1 − x)/2] exp(2(y − λ + 1)), if x ∈ [0, 1], 1/2 ≤ y < λ − 1,
  [(1 + x)/2] exp(2(−x + y − λ + 1)), if x ∈ [−1, 0], 1/2 ≤ y − x < λ − 1,
  exp(3 − 2λ)(y^2 − xy + 1/4), if −1/2 ≤ x − y ≤ 1/2, y ∈ [−1/2, 1/2].

The initial condition 1° reads

U(x, y) ≤ C_λ if x ∈ [−1, 1] and y ∈ {0, x},

so in particular, taking x = y = 0, we get C_λ ≥ exp(3 − 2λ)/4. We assume that we have equality here.
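As an illustrative sketch (not part of the original text), the candidate above, extended by the symmetry U(x, y) = U(−x, −y), can be implemented and spot-checked directly; the case conditions below follow the displayed formula.

```python
import math

def U(x, y, lam):
    """Candidate Bellman function of Example 3.4 (lam >= 3/2), extended by
    the symmetry U(x, y) = U(-x, -y).  Illustrative sketch only."""
    def direct(x, y):
        if y >= lam:
            return 1.0
        if x - 1 + lam <= y < lam:
            return (x + 1) / (x - y + lam + 1)
        if lam - 1 <= y < x + lam - 1:
            return (1 - x) * (y - lam + 2) / (2 * (lam - y))
        if 0 <= x <= 1 and 0.5 <= y < lam - 1:
            return (1 - x) / 2 * math.exp(2 * (y - lam + 1))
        if -1 <= x <= 0 and 0.5 <= y - x < lam - 1:
            return (1 + x) / 2 * math.exp(2 * (-x + y - lam + 1))
        if -0.5 <= x - y <= 0.5 and -0.5 <= y <= 0.5:
            return math.exp(3 - 2 * lam) * (y * y - x * y + 0.25)
        return None

    val = direct(x, y)
    if val is None:
        val = direct(-x, -y)      # use the symmetry condition
    if val is None:
        raise ValueError("point not covered by the candidate formula")
    return val

# spot check: U(0, 0) equals exp(3 - 2*lam)/4, the candidate constant C_lam
assert abs(U(0.0, 0.0, 2.0) - math.exp(-1) / 4) < 1e-12
```

The sketch only makes the case analysis explicit; the rigorous verification of the conditions 1°–3° is the content of Step 6.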

Step 6. The above U is just a candidate for the Bellman function. Furthermore, it was constructed under the assumption that the transforming sequence takes values in {0, 1}. The remaining part of the analysis is to verify the conditions

1° We have U(x, y) ≤ exp(3 − 2λ)/4 for all x ∈ [−1, 1] and y ∈ {0, x}.
2° We have U(x, y) ≥ 1_{|y|≥λ} for all (x, y) ∈ [−1, 1] × R.
3° The function U is concave along any line segment of slope belonging to [0, 1], contained in [−1, 1] × R.

We leave the lengthy, but rather straightforward verification to the reader. Having done this, we will have shown the estimate

P(|g_n| ≥ λ) ≤ exp(3 − 2λ)/4

under the assumption that f is bounded by 1 and g is its full transform by a predictable sequence with values in [0, 1].

Step 7. Finally, let us address the issue of the sharpness of the constant C_λ. It is not difficult to see the structure of the (almost) extremal examples. Informally speaking, the corresponding pair (f, g) should start from some point (x, x) for which we have U(x, x) = C_λ, and then it should move along the segments of linearity of U. This will guarantee that in the chain of inequalities

EU(f_n, g_n) ≤ EU(f_{n−1}, g_{n−1}) ≤ … ≤ EU(f_0, g_0) ≤ C_λ,

we actually have a chain of equalities (or almost equalities). If in addition we ensure that (f, g) terminates at the stopping set D, we will be able to write that for a large n we have P(|g_n| ≥ λ) = EG(f_n, g_n) = C_λ (or rather ≈ C_λ) and we will be done.

Let us put the above idea into a rigorous framework: for simplicity, we will consider the strict inequality λ > 3/2 only. We assume that (f, g) starts from the point (1/2, 1/2) and, at the first step, jumps either to (0, 1/2) or to (1, 1/2) (with probabilities 1/2). The latter point belongs to the stopping set {(x, y) : U(x, y) = G(x, y)}, so we finish the evolution there (we can also argue that the martingale f must stop, since otherwise it would leave the interval [−1, 1]). On the contrary, if (f_1, g_1) = (0, 1/2), then the movement continues. Unfortunately, there is no line segment of slope 0 or 1, containing (0, 1/2) in its interior, along which U is linear. To


overcome this difficulty, we fix a small positive δ and move (f, g) along the line segment with endpoints (−1, −1/2) and (δ, 1/2 + δ): we expect that the loss caused by this non-optimal move (i.e., the difference E(U(f_2, g_2) | (f_1, g_1) = (0, 1/2)) − U(0, 1/2)) will be of order o(δ) and hence insignificant if we let δ → 0 at the very end. Formally, on the set {(f_1, g_1) = (0, 1/2)}, we assume that the random variable (f_2, g_2) takes values in the set {(−1, −1/2), (δ, 1/2 + δ)} (the corresponding probabilities δ/(1 + δ) and 1/(1 + δ) are determined by the requirement that (f, g) is a martingale). If (f_2, g_2) = (−1, −1/2), the movement stops; if (f_2, g_2) = (δ, 1/2 + δ), we let (f_3, g_3) jump to (0, 1/2 + δ) or to (1, 1/2 + δ). If the latter occurs, the evolution is over; if (f_3, g_3) = (0, 1/2 + δ), we move (f_4, g_4) along the line segment with endpoints (−1, −1/2 + δ) and (δ, 1/2 + 2δ), and so on.

Now suppose that δ is of the special form δ = (λ − 3/2)/N for some large integer N. It is easy to see that after 2N + 1 steps we have two possibilities: either f_{2N+1} = ±1 (and we have stopped the evolution of the martingale pair), or (f_{2N+1}, g_{2N+1}) = (0, λ − 1). If the latter occurs, we change the above scheme and, at the (2N + 2)-nd step, move (f, g) along the line segment of slope 1 with endpoints (−1, λ − 2) and (1, λ). Then we ultimately stop the process.

It is clear that the martingale f just constructed takes its values in [−1, 1] and g is its full transform by a sequence with values in {0, 1}. Furthermore, we have

P(g_{2N+2} ≥ λ) = P(f_0 = 1/2, f_1 = 0, f_2 = δ, f_3 = 0, …, f_{2N+1} = 0, f_{2N+2} = 1)
 = (1/2) · [1/(1 + δ)] · (1 − δ) · [1/(1 + δ)] · (1 − δ) · … · (1 − δ) · (1/2)
 = (1/4) ((1 − δ)/(1 + δ))^N.

If we recall that δ = (λ − 3/2)/N and let N → ∞, we see that the above quantity converges to C_λ. This proves the sharpness of the estimate and completes the analysis.
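This limit can be checked numerically; the sketch below (illustrative, not from the original text) evaluates the probability (1/4)((1 − δ)/(1 + δ))^N for δ = (λ − 3/2)/N and compares it with C_λ = exp(3 − 2λ)/4.

```python
import math

def success_probability(lam, N):
    """P(g_{2N+2} >= lam) for the construction above, with delta = (lam - 3/2)/N."""
    delta = (lam - 1.5) / N
    return 0.25 * ((1 - delta) / (1 + delta)) ** N

lam = 2.0
C_lam = math.exp(3 - 2 * lam) / 4
# the probability increases with N and approaches C_lam from below
assert success_probability(lam, 10) < success_probability(lam, 100) < C_lam
assert abs(success_probability(lam, 10**4) - C_lam) < 1e-8
```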

3.3. Continuous-time analogues

Now we will discuss the possibility of extending the above approach and results to the continuous-time setting. First we need to introduce the necessary background. Let (Ω, F, P) be a complete probability space, filtered by a nondecreasing right-continuous family (F_t)_{t≥0} of sub-σ-fields of F. In addition, for technical reasons, we will assume that F_0 contains all the events of probability 0. Suppose that X = (X_t)_{t≥0}, Y = (Y_t)_{t≥0} are adapted real-valued martingales. We assume that these processes have right-continuous paths with left limits. The symbol [X, X] = ([X, X]_t)_{t≥0} will stand for the quadratic variation (square bracket) of the process X.

It is straightforward to see that the continuous-time counterpart of a martingale transform is the notion of a stochastic integral:

Y_t = (H ⋆ X)_t := H_0 X_0 + ∫_{0+}^t H_s dX_s, t ≥ 0,

where H = (H_t)_{t≥0} is a predictable process. If we assume in addition that H takes values in the set {−1, 1}, we get precisely the continuous-time version of a full ±1-transform. To see that the above concept does generalize martingale transforms, suppose that f = (f_n)_{n≥0} and g = (g_n)_{n≥0} are discrete-time martingales such that


g is the full transform of f by a certain predictable sequence v. Let us embed these sequences into continuous time via the formulas

(3.22) X_t = f_{⌊t⌋}, Y_t = g_{⌊t⌋} and H_t = v_{⌊t⌋}

for t ≥ 0. Then we have the obvious identity Y = H ⋆ X, proving the desired consistency.

To introduce the continuous-time differential subordination, we rewrite its discrete version in the equivalent form:

the sequence ( Σ_{k=0}^n |df_k|^2 − Σ_{k=0}^n |dg_k|^2 )_{n≥0} is nonnegative and nondecreasing.

This immediately suggests the following definition.

Definition 3.2. Suppose that X, Y are two martingales. We say that Y is differentially subordinate to X, if the process ([X, X]_t − [Y, Y]_t)_{t≥0} is nonnegative and nondecreasing.

Treating discrete-time martingales as continuous-time processes as in (3.22), we check, as previously, that this definition is consistent with the previous one. Furthermore, in analogy to the discrete-time setting, if Y is a stochastic integral, with respect to X, of some predictable process H taking values in [−1, 1], then Y is differentially subordinate to X. This follows at once from the identity

[X, X]_t − [Y, Y]_t = X_0^2 (1 − H_0^2) + ∫_{0+}^t (1 − H_s^2) d[X, X]_s.
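In the discrete-time embedding this is elementary to verify by hand; the following sketch (illustrative, with made-up increments and a transforming sequence with values in [−1, 1]) checks that the partial sums of |df_k|^2 − |dg_k|^2 are nonnegative and nondecreasing.

```python
# For dg_k = v_k * df_k with |v_k| <= 1, the partial sums of
# |df_k|^2 - |dg_k|^2 form a nonnegative, nondecreasing sequence.
df = [0.5, -1.2, 0.3, 0.9, -0.4]      # increments of f (arbitrary example)
v = [1.0, -0.7, 0.0, 0.3, -1.0]       # transforming sequence, values in [-1, 1]
dg = [vk * dfk for vk, dfk in zip(v, df)]

partial = []
s = 0.0
for dfk, dgk in zip(df, dg):
    s += dfk ** 2 - dgk ** 2
    partial.append(s)

assert all(p >= -1e-12 for p in partial)            # nonnegative
assert all(b >= a for a, b in zip(partial, partial[1:]))   # nondecreasing
```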

Finally, we will also need the following fact from stochastic analysis, the proof of which is left as an easy exercise for the reader.

Lemma 3.3. (i) For any martingale X there exists a unique continuous martingale part X^c of X satisfying

[X, X]_t = |X_0|^2 + [X^c, X^c]_t + Σ_{0<s≤t} |ΔX_s|^2

for all t ≥ 0 (here ΔX_s = X_s − X_{s−} is the jump of X at time s). Furthermore, [X^c, X^c] is equal to [X, X]^c, the pathwise continuous part of [X, X].

(ii) If X and Y are martingales, then Y is differentially subordinate to X if and only if Y^c is differentially subordinate to X^c, the inequality |ΔY_t| ≤ |ΔX_t| holds for all t > 0 and |Y_0| ≤ |X_0|.

Now we can pose a problem similar to that introduced at the beginning of this chapter. Namely, suppose that G : R^2 → R is a given function (for technical reasons, let us assume that it is Borel measurable) and assume we are interested in proving the inequality

(3.23) EG(X_t, Y_t) ≤ 0, t ≥ 0,

under one of the above dominations. Here, for formal reasons, we need to assume that we work with processes for which the above expectation makes sense. For the sake of simplicity, we will restrict ourselves to bounded martingales X, Y and locally bounded functions G, but we would like to stress that in general these assumptions can be relaxed significantly (depending on the problem).

Roughly speaking, any reasonable inequality which holds for full martingale transforms extends, with unchanged constants, to the continuous-time setting, even


in the context of the differential subordination. More precisely, we will prove the following statement.

Theorem 3.2. Suppose that the inequality (3.1) holds in the context of full martingale transforms by sequences taking values in [−1, 1] and assume further that there is a continuous Bellman function U leading to this estimate.

(i) Then (3.23) is valid for stochastic integrals of predictable processes with values in [−1, 1].

(ii) If in addition U is convex in its second variable, then (3.23) holds true under the differential subordination.

Proof. We will exploit the properties of the Bellman function U, combined with Itô's formula. The latter requires sufficient regularity of U, so we start with a standard mollification argument. Fix a C^∞ function h : R^2 → [0, ∞), supported on the unit ball of R^2 and satisfying ∫_{R^2} h = 1. Given a positive parameter δ, define the function U^δ : R^2 → R by the convolution

(3.24) U^δ(x, y) = ∫_{R^2} U(x + δu, y + δv) h(u, v) du dv.

Clearly, this function is of class C^∞ and inherits the concavity along lines of slope belonging to [−1, 1]. In particular, this implies that if (x, y) ∈ R^2 and |k| ≤ |h|, then

(3.25) U^δ_xx(x, y) h^2 ± 2U^δ_xy(x, y) hk + U^δ_yy(x, y) k^2 ≤ 0

and

(3.26) U^δ(x + h, y + k) − U^δ(x, y) − ⟨∇U^δ(x, y), (h, k)⟩ ≤ 0.

Now, an application of Itô's formula yields

(3.27) U^δ(X_t, Y_t) = U^δ(X_0, Y_0) + I_1 + I_2 + I_3/2,

where

I_1 = ∫_{0+}^t U^δ_x(X_{s−}, Y_{s−}) dX_s + ∫_{0+}^t U^δ_y(X_{s−}, Y_{s−}) dY_s,

I_2 = Σ_{0<s≤t} [ U^δ(X_s, Y_s) − U^δ(X_{s−}, Y_{s−}) − ⟨∇U^δ(X_{s−}, Y_{s−}), (ΔX_s, ΔY_s)⟩ ],

I_3 = ∫_{0+}^t U^δ_xx(X_{s−}, Y_{s−}) d[X, X]^c_s + 2 ∫_{0+}^t U^δ_xy(X_{s−}, Y_{s−}) d[X, Y]^c_s + ∫_{0+}^t U^δ_yy(X_{s−}, Y_{s−}) d[Y, Y]^c_s.

Let us study the properties of the terms I_1, I_2 and I_3. The first term, treated as a function of t, defines a centered, L^2-bounded martingale: this is due to the fact that the processes X and Y are bounded and U^δ_x, U^δ_y are locally bounded. Consequently, EI_1 = 0. The second term is nonpositive by (3.26), since |ΔY_s| ≤ |ΔX_s|. To deal with I_3, we need to apply different arguments depending on whether we prove (i) or (ii). First, if Y is the stochastic integral, with respect to X, of some predictable process H taking values in [−1, 1], then

I_3 = ∫_{0+}^t [ U^δ_xx(X_{s−}, Y_{s−}) + 2U^δ_xy(X_{s−}, Y_{s−}) H_s + U^δ_yy(X_{s−}, Y_{s−}) H_s^2 ] d[X, X]^c_s ≤ 0,


by (3.25). Suppose, on the other hand, that Y is differentially subordinate to X and U is convex in y. The inequality (3.25) implies

U^δ_xx + 2|U^δ_xy| + U^δ_yy ≤ 0 on R^2,

and hence, for any (x, y) ∈ R^2 and any h, k ∈ R (we do not require |k| ≤ |h| here),

U^δ_xx(x, y)(h^2 + k^2) ± 4U^δ_xy(x, y) hk + U^δ_yy(x, y)(h^2 + k^2) ≤ 0.

This is equivalent to saying that

U^δ_xx(x, y) h^2 ± 2U^δ_xy(x, y) hk + U^δ_yy(x, y) k^2 ≤ [(U^δ_xx(x, y) − U^δ_yy(x, y))/2] (h^2 − k^2).

Approximating I_3 with discrete sums, we easily check that the above bound implies

I_3 ≤ ∫_{0+}^t [(U^δ_xx(X_{s−}, Y_{s−}) − U^δ_yy(X_{s−}, Y_{s−}))/2] d([X, X]^c_s − [Y, Y]^c_s) ≤ 0,

since U^δ_xx ≤ 0 (see (3.25)), U^δ_yy ≥ 0 (here we use the assumption of convexity of U in y) and [X, X]^c − [Y, Y]^c is nondecreasing (since Y^c is differentially subordinate to X^c). Taking the expectation of both sides of (3.27) and putting all the above facts together, we obtain EU^δ(X_t, Y_t) ≤ EU^δ(X_0, Y_0). It remains to perform a limiting procedure. Since U is continuous, we have U^δ → U pointwise. But X, Y are assumed to be bounded, so Lebesgue's dominated convergence theorem implies

EU(X_t, Y_t) ≤ EU(X_0, Y_0).

Applying the properties 1° and 2°, we get the desired bound (3.23). The proof is complete.

Though we have worked with Bellman functions defined on R^2, a similar proof can be carried out if the domain of U is a strict subset of the plane. One can also try to relax the assumption on the convexity of U in y, appearing in the second part of the theorem above: for example, it is clear that it suffices to guarantee that the expression U^δ_xx − U^δ_yy is nonpositive, or to apply appropriate localizing techniques. We will not go into details here and refer the reader to the literature (see the bibliographical notes at the end of this chapter).

We can summarize the above discussion on the inequalities for continuous-time martingales as follows. If one wants to show (3.23) (say, under the assumption of the differential subordination), one should first take a look at the counterpart of this inequality in the discrete-time setting, in the context of ±1-transforms. Having successfully identified the corresponding Bellman function, one often realizes that it enjoys all the structural properties required to yield the validity of the more general continuous-time version.

3.4. Orthogonal martingales and conjugate harmonic functions

In this section we will study pairs of processes enjoying the differential subordination and the following additional structural property.

Definition 3.3. Two martingales X = (X_t)_{t≥0}, Y = (Y_t)_{t≥0} are said to be orthogonal, if their square bracket [X, Y] is constant as a function of time.


It follows from Lemma 3.3 and a standard polarization argument that for any martingales X, Y we have

[X, Y]_t = X_0 Y_0 + [X^c, Y^c]_t + Σ_{0<s≤t} ΔX_s · ΔY_s.

Therefore, in particular, the orthogonality condition implies ΔX_s · ΔY_s = 0 for all s > 0. This has a very important further consequence: if Y is differentially subordinate to X and the processes are orthogonal, then necessarily Y has continuous trajectories (indeed, at any jump of Y we would have ΔX_s = 0 and hence |ΔY_s| ≤ |ΔX_s| = 0, a contradiction).

We will present two examples, which will be important to our subsequent considerations below.

Example 3.5. Suppose that (X, Y) is a two-dimensional Brownian motion starting from some given point (x, y) with |y| ≤ |x|. Then Y is differentially subordinate to X and both processes are orthogonal.

The second example is very important from the viewpoint of applications to analysis. It links the above probabilistic concepts with that of the conjugate harmonic function (the Hilbert transform). Let us discuss this issue for a while.

Example 3.6. Suppose that D is the unit disc in R^2 and let u, v : D → R be two harmonic functions, normalized so that |v(0)| ≤ |u(0)|. We say that v is conjugate to u, if the Cauchy–Riemann equations

(3.28) u_x = v_y, u_y = −v_x

hold on D. Equivalently, this amounts to saying that the function u + iv is analytic on D. Let D_0 ⊂ D be a disc of center 0 and radius r < 1 and let W = (W^1, W^2) be the standard Brownian motion in R^2, started at the origin, and stopped upon reaching the boundary of D_0. Define the processes X, Y by

X_t = u(W_{τ∧t}), Y_t = v(W_{τ∧t}), t ≥ 0

(here τ denotes the first time W reaches ∂D_0). Clearly, X and Y are bounded processes. Actually, it follows directly from Itô's formula that X and Y are martingales:

X_t = u(0) + ∫_{0+}^{τ∧t} ∇u(W_s) · dW_s, Y_t = v(0) + ∫_{0+}^{τ∧t} ∇v(W_s) · dW_s.

Since

[X, X]_t = u(0)^2 + ∫_{0+}^{τ∧t} |∇u(W_s)|^2 ds, [Y, Y]_t = v(0)^2 + ∫_{0+}^{τ∧t} |∇v(W_s)|^2 ds

and |v(0)| ≤ |u(0)|, we see that Y is differentially subordinate to X (indeed: by (3.28), we have |∇u| = |∇v|). Furthermore, we have

[X, Y]_t = u(0)v(0) + ∫_{0+}^{τ∧t} ⟨∇u(W_s), ∇v(W_s)⟩ ds = u(0)v(0)

(the integral vanishes since, by (3.28), the gradients ∇u and ∇v are orthogonal at every point), so X and Y are orthogonal. Therefore, any estimate of the form (3.23) has an appropriate counterpart in the analytic setting: indeed, if we manage to show that

EG(u(W_{τ∧t}), v(W_{τ∧t})) = EG(X_t, Y_t) ≤ 0,


then letting t → ∞ we obtain, by Lebesgue's dominated convergence theorem (recall that u, v, G are bounded on D_0),

(1/2π) ∫_{−π}^{π} G(u(re^{iθ}), v(re^{iθ})) dθ = EG(u(W_τ), v(W_τ)) ≤ 0.

So, for instance, if G(x, y) = |y|^p − C_p^p |x|^p is the function corresponding to the sharp moment inequality for orthogonal, differentially subordinate martingales, we obtain

(3.29) ||v||_{L^p(T)} = sup_{0<r<1} [ (1/2π) ∫_{−π}^{π} |v(re^{iθ})|^p dθ ]^{1/p} ≤ C_p sup_{0<r<1} [ (1/2π) ∫_{−π}^{π} |u(re^{iθ})|^p dθ ]^{1/p} = C_p ||u||_{L^p(T)}.

This inequality can be expressed in terms of the so-called periodic Hilbert transform H_T. This operator is a singular integral operator, acting on integrable functions f : T → C by the formula

H_T f(e^{iθ}) = (1/2π) p.v. ∫_{−π}^{π} f(e^{it}) cot((θ − t)/2) dt.

More explicitly, if f(ζ) = Σ_{n∈Z} a_n ζ^n, ζ ∈ T, then we have

H_T f(ζ) = −i Σ_{n∈Z} sgn(n) a_n ζ^n, ζ ∈ T.

In other words, H_T is the Fourier multiplier with the symbol m(n) = −i sgn(n).

The connection between this setup and the context of the conjugate harmonic functions is the following. Suppose that f is an integrable function on the unit circle T. Then we can extend f to a harmonic function u on the whole disc D with the use of the Poisson formula:

u(re^{iθ}) = (1/2π) ∫_{−π}^{π} P_r(θ − t) f(e^{it}) dt, 0 ≤ r < 1, θ ∈ (−π, π],

where P_r denotes the Poisson kernel

P_r(θ) = Re( (1 + re^{iθ}) / (1 − re^{iθ}) ) = (1 − r^2) / (1 − 2r cos θ + r^2).
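The two expressions for P_r can be checked against each other numerically; the following is an illustrative sketch, not part of the original text.

```python
import cmath
import math

def poisson_kernel_complex(r, theta):
    # real part of (1 + r e^{i theta}) / (1 - r e^{i theta})
    z = r * cmath.exp(1j * theta)
    return ((1 + z) / (1 - z)).real

def poisson_kernel_real(r, theta):
    # closed-form expression (1 - r^2) / (1 - 2 r cos(theta) + r^2)
    return (1 - r * r) / (1 - 2 * r * math.cos(theta) + r * r)

for r in (0.1, 0.5, 0.9):
    for theta in (-3.0, -1.0, 0.0, 0.7, 2.5):
        assert abs(poisson_kernel_complex(r, theta) - poisson_kernel_real(r, theta)) < 1e-12
```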

It turns out that u has a radial limit lim_{r→1−} u(re^{iθ}), which is equal to f(e^{iθ}) almost everywhere. Now, let v be a harmonic conjugate function to u, satisfying v(0) = 0. Then one can show that the radial limit lim_{r→1−} v(re^{iθ}) exists for almost all θ ∈ (−π, π], and this limit is precisely H_T f(e^{iθ}). Since 0 = |v(0)| ≤ |u(0)|, we can use (3.29), which combined with Fatou's lemma implies

||H_T f||_{L^p(T)} ≤ C_p ||f||_{L^p(T)}.
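The multiplier description m(n) = −i sgn(n) is easy to illustrate on a trigonometric polynomial (an illustrative sketch, not part of the text): for f(e^{iθ}) = cos(3θ) it produces H_T f(e^{iθ}) = sin(3θ).

```python
import cmath
import math

def hilbert_T(coeffs):
    """Apply the multiplier m(n) = -i sgn(n) to Fourier coefficients {n: a_n}."""
    sgn = lambda n: (n > 0) - (n < 0)
    return {n: -1j * sgn(n) * a for n, a in coeffs.items()}

def evaluate(coeffs, theta):
    """Evaluate the trigonometric polynomial sum a_n e^{i n theta}."""
    return sum(a * cmath.exp(1j * n * theta) for n, a in coeffs.items())

# f(e^{i theta}) = cos(3 theta) has coefficients a_3 = a_{-3} = 1/2
f = {3: 0.5, -3: 0.5}
Hf = hilbert_T(f)
for theta in (-2.0, 0.1, 1.3):
    assert abs(evaluate(Hf, theta) - math.sin(3 * theta)) < 1e-12
```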

The above analysis can be carried out in any locally connected domain D ⊂ R^2. Distinguish a point ξ ∈ D and let u, v : D → R be two harmonic functions satisfying (3.28) and |v(ξ)| ≤ |u(ξ)|. Then, if D_0 is any bounded subdomain of D whose closure is contained in D and μ_{D_0,ξ} is the harmonic measure on ∂D_0 corresponding to ξ, the inequality (3.23) implies

(3.30) ∫_{∂D_0} G(u, v) dμ_{D_0,ξ} ≤ 0.


Example 3.7. Let us continue the above considerations in the context of L^p estimates for the non-periodic Hilbert transform H_R. This operator is a singular integral operator, acting on integrable functions f : R → R by the formula

H_R f(x) = (1/π) p.v. ∫_R f(y)/(x − y) dy.

One can show that H_R is the Fourier multiplier with the symbol m(ξ) = −i sgn(ξ), ξ ∈ R, that is,

(H_R f)^∧(ξ) = −i sgn(ξ) f̂(ξ), ξ ∈ R.

To link this operator with the context of orthogonal harmonic functions, we set D = R × (0, ∞), the upper halfplane. Suppose that f is a function belonging to the Schwartz class S(R). We can extend it to a harmonic function u on D by the Poisson formula

u(x, y) = (1/π) ∫_R y f(t) / ((x − t)^2 + y^2) dt, (x, y) ∈ D.

This function is indeed an extension: one easily shows that $\lim_{y\downarrow 0}u(x,y)=f(x)$ for all $x\in\mathbb{R}$. Note that we can rewrite the formula for $u$ in the form
\[ u(x,y)=\Re\left(\frac{i}{\pi}\int_{\mathbb{R}}\frac{f(t)}{z-t}\,dt\right), \]
where $z=x+iy$. Therefore, the function
\[ v(x,y)=\Im\left(\frac{i}{\pi}\int_{\mathbb{R}}\frac{f(t)}{z-t}\,dt\right)=\frac{1}{\pi}\int_{\mathbb{R}}\frac{f(t)(x-t)}{(x-t)^2+y^2}\,dt \]
is conjugate to $u$, since the function $z\mapsto\frac{i}{\pi}\int_{\mathbb{R}}\frac{f(t)}{z-t}\,dt$ is analytic on $D$. It can be shown that if $y\downarrow 0$, then $v(x,y)$ converges to $H_{\mathbb{R}}f(x)$ for almost all $x$. To make use of (3.29), we need an appropriate behavior of $u$ and $v$ at some base point $\xi$. To this end, fix $y_0>0$, take $\xi=iy_0$ and consider the pair $u$, $v_{y_0}=v-v(\xi)$ of harmonic functions. Of course, their conjugacy is preserved and we have $|u(\xi)|\ge 0=|v_{y_0}(\xi)|$. For a given $\varepsilon>0$, consider the half-disc
\[ D_0=\{(x,y):y\ge\varepsilon,\ x^2+(y-\varepsilon)^2\le\varepsilon^{-1}\}. \]

If $\varepsilon$ is sufficiently small, then $\xi$ belongs to $D_0$ and hence (3.30) yields
\[ \int_{\partial D_0}|v_{y_0}|^p\,d\mu_{D_0,\xi}\le C_p^p\int_{\partial D_0}|u|^p\,d\mu_{D_0,\xi}. \]

If we let $\varepsilon\to 0$, then $\mu_{D_0,\xi}$ converges weakly to
\[ d\mu_\xi(x)=\frac{1}{\pi}\,\frac{y_0}{x^2+y_0^2}\,dx, \]
the harmonic measure on $\mathbb{R}$ with respect to the half-plane $\mathbb{R}\times(0,\infty)$ and the point $\xi$. But $u$ is bounded (which follows directly from the assumption $f\in\mathcal{S}(\mathbb{R})$), so Lebesgue's dominated convergence theorem and Fatou's lemma give

\[ \frac{1}{\pi}\int_{\mathbb{R}}|H_{\mathbb{R}}f(x)-v(iy_0)|^p\,\frac{y_0}{x^2+y_0^2}\,dx\le\frac{C_p^p}{\pi}\int_{\mathbb{R}}|f(x)|^p\,\frac{y_0}{x^2+y_0^2}\,dx. \]
Multiply both sides by $\pi y_0$ and let $y_0\to\infty$. Then $v(iy_0)\to 0$ and $y_0^2/(x^2+y_0^2)$ increases to $1$, and the above bound yields $\|H_{\mathbb{R}}f\|_{L^p(\mathbb{R})}\le C_p\|f\|_{L^p(\mathbb{R})}$.


3.4. ORTHOGONAL MARTINGALES AND CONJUGATE HARMONIC FUNCTIONS 67

Let us now illustrate on some examples how the Bellman function method works in the above context. That is, suppose that $G$ is a Borel, locally bounded function and assume that we are interested in showing the estimate
\[ (3.31)\qquad \mathbb{E}G(X_t,Y_t)\le 0,\qquad t\ge 0, \]
for all bounded, orthogonal martingales $X$, $Y$ such that $Y$ is differentially subordinate to $X$. A closer look at the reasoning presented in the previous section reveals that it is enough to find a continuous function $U:\mathbb{R}^2\to\mathbb{R}$ satisfying the following conditions:

1° $U(x,y)\le 0$ if $|y|\le|x|$.

2° $U\ge G$ on $\mathbb{R}^2$.

3° $U$ is superharmonic on $\mathbb{R}^2$ and concave in its first variable.

Theorem 3.3. If $U$ satisfies 1°, 2° and 3°, then (3.31) holds true.

Proof. We argue as in the preceding section; just a few modifications are necessary. Pick a mollifying function $h:\mathbb{R}^2\to[0,\infty)$ and define $U^\delta$ by the formula (3.24). This new function is of class $C^\infty$, so Itô's formula implies
\[ (3.32)\qquad U^\delta(X_t,Y_t)=U^\delta(X_0,Y_0)+I_1+I_2+I_3/2, \]
where (recall that $Y$ has continuous paths)
\begin{align*}
I_1&=\int_{0+}^t U^\delta_x(X_{s-},Y_s)\,dX_s+\int_{0+}^t U^\delta_y(X_{s-},Y_s)\,dY_s,\\
I_2&=\sum_{0<s\le t}\big[U^\delta(X_s,Y_s)-U^\delta(X_{s-},Y_s)-U^\delta_x(X_{s-},Y_s)\Delta X_s\big],\\
I_3&=\int_{0+}^t U^\delta_{xx}(X_{s-},Y_s)\,d[X,X]^c_s+2\int_{0+}^t U^\delta_{xy}(X_{s-},Y_s)\,d[X,Y]^c_s+\int_{0+}^t U^\delta_{yy}(X_{s-},Y_s)\,d[Y,Y]_s\\
&=\int_{0+}^t U^\delta_{xx}(X_{s-},Y_s)\,d[X,X]^c_s+\int_{0+}^t U^\delta_{yy}(X_{s-},Y_s)\,d[Y,Y]_s,
\end{align*}
the last equation following from the orthogonality of $X$ and $Y$. Now, arguing as previously, we have $\mathbb{E}I_1=0$. Furthermore, the function $U^\delta$ inherits the property 3° and hence each summand in $I_2$ is nonpositive. Finally, since $Y$ is differentially subordinate to $X^c$ (see Lemma 3.3), we see that
\[ I_3\le\int_{0+}^t U^\delta_{xx}(X_{s-},Y_s)\,d[Y,Y]_s+\int_{0+}^t U^\delta_{yy}(X_{s-},Y_s)\,d[Y,Y]_s\le 0, \]
where in the last passage we have exploited the superharmonicity of $U^\delta$. Putting all the above facts together, we see that
\[ \mathbb{E}U^\delta(X_t,Y_t)\le\mathbb{E}U^\delta(X_0,Y_0). \]
Letting $\delta\to 0$ we get $\mathbb{E}U(X_t,Y_t)\le\mathbb{E}U(X_0,Y_0)$ (here we use the continuity of $U$ and the boundedness properties of $X$ and $Y$). It remains to apply 1° and 2° to obtain the desired claim. □


Remark 3.1. Two important observations are in order.

(a) In some problems it is enough to consider only those pairs $(X,Y)$ of orthogonal differentially subordinate martingales for which the dominated process $Y$ starts from $0$ (see the above discussion concerning conjugate harmonic functions on the unit disc). Then the inequality of condition 1° needs to be checked only for $x\in\mathbb{R}$ and $y=0$.

(b) Sometimes we are interested in inequalities for orthogonal processes $X$, $Y$ satisfying $|Y_0|\le|X_0|$ and $[Y,Y]=[X,X]$. Then in 3° we only need to verify the superharmonicity property; the concavity along the first variable is not necessary (indeed, superharmonicity alone suffices for the nonpositivity of the term $I_3$ in the above proof).

Example 3.8. Let $1<p<2$. We will identify the best constant $C_p$ in the inequality
\[ (3.33)\qquad \mathbb{E}|Y_t|^p\le C_p^p\,\mathbb{E}|X_t|^p,\qquad t\ge 0, \]
where $X$, $Y$ are bounded orthogonal martingales such that $Y$ is differentially subordinate to $X$. We rewrite the above estimate in the form (3.31), with $G(x,y)=|y|^p-C_p^p|x|^p$. As we have discussed above, we need to find a function $U_p:\mathbb{R}^2\to\mathbb{R}$ satisfying the conditions 1°, 2° and 3°. Clearly, we may assume that $U_p$ is symmetric with respect to $x$ and $y$, since $G$ has this property. It is instructive to come back to the $L^p$ estimate for differentially subordinate martingales (without the orthogonality assumption). Recall the least special function corresponding to this inequality: when $1<p\le 2$, then
\[ U(x,y)=\begin{cases}\alpha_p\big(|y|-(p-1)^{-1}|x|\big)(|x|+|y|)^{p-1}&\text{if }|y|<(p-1)^{-1}|x|,\\ |y|^p-(p-1)^{-p}|x|^p&\text{if }|y|\ge(p-1)^{-1}|x|,\end{cases} \]
for an appropriate $\alpha_p>0$. It is natural to conjecture that the special function $U_p$ in the orthogonal case should also be of this form. So, we assume that there is a threshold $\gamma_p>0$ such that
\[ U_p(x,y)>|y|^p-C_p^p|x|^p\quad\text{if }|y|<\gamma_p|x| \]
and
\[ U_p(x,y)=|y|^p-C_p^p|x|^p\quad\text{if }|y|\ge\gamma_p|x|. \]
In the language of optimal stopping, we expect the continuation region to be of the form $\mathcal{C}=\{(x,y):|y|<\gamma_p|x|\}$. However, as is always the case, on the continuation set the concavity part of condition 3° should turn into linearity. More precisely, this means that $U_p$ should be harmonic on $\mathcal{C}$. These observations bring us much closer to the discovery of $U_p$. On $\mathcal{C}$, we write $U_p$ in polar coordinates:
\[ U_p(x,y)=R^pg(\theta),\qquad\theta\in[-\theta_p,\theta_p], \]
where $\theta_p=\arctan\gamma_p$. By the symmetry of $U_p$ with respect to the variable $y$, we see that $g$ is an even function. Furthermore, the condition $\Delta U_p=0$ on $\mathcal{C}$ is equivalent to saying that $g''(\theta)+p^2g(\theta)=0$ for $\theta\in[-\theta_p,\theta_p]$. We easily solve this differential equation, obtaining
\[ g(\theta)=\alpha_p\cos(p\theta)+\beta_p\sin(p\theta) \]
for some unknown parameters $\alpha_p$, $\beta_p$. Since $g$ is an even function, we conclude that $\beta_p=0$. Furthermore, the continuity of $U_p$ implies
\[ \alpha_pR^p\cos(p\theta_p)=R^p\big[\sin^p\theta_p-C_p^p\cos^p\theta_p\big], \]
or $\alpha_p\cos(p\theta_p)=\sin^p\theta_p-C_p^p\cos^p\theta_p$. Finally, the superharmonicity of $U_p$ together with the majorization 2° enforce
\[ U_{p,x}(1-,\gamma_p)=U_{p,x}(1+,\gamma_p), \]
or, equivalently,
\[ -C_p^p\cos^{p-1}\theta_p=\alpha_p\cos((p-1)\theta_p). \]
Combining this with the preceding equation, we obtain
\[ C_p^p=\tan^{p-1}\theta_p\cot((p-1)\theta_p). \]
It is natural to conjecture that the desired value of the parameter $\gamma_p$ is such that the right-hand side is minimized. Denoting the expression on the right by $f(\theta_p)$, we compute that
\[ f'(\theta_p)=\frac{(p-1)\tan^{p-2}\theta_p}{2\cos^2\theta_p\sin^2((p-1)\theta_p)}\big(\sin(2(p-1)\theta_p)-\sin(2\theta_p)\big). \]
Therefore $f$, considered as a function on $(0,\pi/2)$, attains its minimum at $\theta_p=\pi/(2p)$. Plugging this above, we get $C_p=\tan\frac{\pi}{2p}$, $\alpha_p=-\sin^{p-1}\frac{\pi}{2p}\big/\cos\frac{\pi}{2p}$ and the explicit formula for a candidate for the Bellman function. Now one can easily check that this function does satisfy the properties 1°, 2° and 3°.
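The optimization step above is easy to confirm numerically. The sketch below (my own illustration; the grid resolution is an arbitrary choice) minimizes $f(\theta)=\tan^{p-1}\theta\,\cot((p-1)\theta)$ on a grid and compares the minimizer with the claimed value $\pi/(2p)$.

```python
import math

# Minimize f(theta) = tan^{p-1}(theta) * cot((p-1)*theta) over (0, pi/2)
# on a grid and compare with the claimed minimizer theta_p = pi/(2p).
def f(theta, p):
    return math.tan(theta) ** (p - 1) / math.tan((p - 1) * theta)

def grid_argmin(p, n=200_000):
    a, b = 1e-4, math.pi / 2 - 1e-4
    return min((a + k * (b - a) / n for k in range(n + 1)), key=lambda t: f(t, p))

p = 1.5
theta_star = grid_argmin(p)
print(theta_star, math.pi / (2 * p))                       # both approx 1.0472
print(f(theta_star, p), math.tan(math.pi / (2 * p)) ** p)  # both approx 2.2795
```

At the minimizer the value of $f$ agrees with $C_p^p=\tan^p(\pi/(2p))$, as the identity $\cot((p-1)\pi/(2p))=\tan(\pi/(2p))$ predicts.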

The following observation will be useful later. Namely, using a simple localization argument, we can show that (3.33) holds true for any pair $(X,Y)$ such that $Y$ is differentially subordinate to $X$ (i.e., we do not need to assume the boundedness of the processes). To see this, fix $t\ge 0$ and assume that $\mathbb{E}|X_t|^p<\infty$ (otherwise there is nothing to prove). Fix a huge positive number $M$ and consider the stopping time $\tau=\inf\{s\ge 0:|X_s|+|Y_s|\ge M\}$. Introduce the truncated martingales
\[ X^M_s=X_{\tau\wedge s}1_{\{\tau>0\}},\qquad Y^M_s=Y_{\tau\wedge s}1_{\{\tau>0\}},\qquad s\ge 0. \]
Clearly, $X^M$ and $Y^M$ are bounded. Furthermore, $Y^M$ is differentially subordinate to $X^M$, since
\[ [X^M,X^M]_s=[X,X]_{\tau\wedge s}1_{\{\tau>0\}}\quad\text{and}\quad[Y^M,Y^M]_s=[Y,Y]_{\tau\wedge s}1_{\{\tau>0\}}. \]
Consequently, by (3.33) for bounded processes, which we have already established, we may write
\[ \mathbb{E}|Y^M_t|^p\le\tan^p\Big(\frac{\pi}{2p}\Big)\mathbb{E}|X^M_t|^p\le\tan^p\Big(\frac{\pi}{2p}\Big)\mathbb{E}|X_t|^p. \]
Here in the last line we have exploited Doob's optional sampling theorem applied to the submartingale $(|X_s|^p)_{0\le s\le t}$. Letting $M\to\infty$, we see that $|Y^M_t|^p\to|Y_t|^p$ almost surely. Therefore it suffices to apply Fatou's lemma to get the desired estimate (3.33).

It remains to check that the constant $\tan\frac{\pi}{2p}$ cannot be decreased. To this end, let $\kappa$ be a fixed positive number smaller than $\tan(\pi/(2p))$. Consider a two-dimensional Brownian motion $W=(W^1_t,W^2_t)_{t\ge 0}$ starting from $(1,0)$ and let $\tau$ be the first exit time of $W$ from the set $E=\{(x,y):|y|\le\kappa x\}$. By well-known properties of Brownian motion, we have $\tau<\infty$ almost surely. Since $U_p$ is harmonic on $E$, we get
\[ U_p(1,0)=\mathbb{E}U_p(W^1_{\tau\wedge t},W^2_{\tau\wedge t})=\mathbb{E}\big[(W^1_{\tau\wedge t})^pU_p(1,W^2_{\tau\wedge t}/W^1_{\tau\wedge t})\big]\le U_p(1,\kappa)\,\mathbb{E}(W^1_{\tau\wedge t})^p. \]


Here we have used the fact that $U_p(1,\cdot)$ is increasing on $[0,\gamma_p)$: a little calculation shows that
\[ U_{p,y}(1,y)=-p\alpha_pR^{p-1}\sin((p-1)\arctan y)>0. \]
Furthermore, $U_p(1,\kappa)<U_p(1,\gamma_p)=0$, so the preceding estimate reads
\[ (3.34)\qquad \mathbb{E}(W^1_{\tau\wedge t})^p\le U_p(1,0)/U_p(1,\kappa). \]
Consider the (unbounded) martingales
\[ X_t=W^1_{\tau\wedge t},\qquad Y_t=W^2_{\tau\wedge t},\qquad t\ge 0. \]
Then $Y$ is differentially subordinate to $X$ and both processes are orthogonal. Furthermore, we infer from (3.34) that $X$ is bounded in $L^p$ and hence, by (3.33), the same is true for $Y$. Therefore,
\[ \lim_{t\to\infty}\frac{\mathbb{E}|Y_t|^p}{\mathbb{E}|X_t|^p}=\frac{\mathbb{E}|W^2_\tau|^p}{\mathbb{E}|W^1_\tau|^p}=\kappa^p, \]
so the best constant in (3.33) cannot be smaller than $\kappa$. It remains to observe that $\kappa$ could be taken arbitrarily close to $\tan(\pi/(2p))$.

Example 3.9. Our second example concerns the sharp weak-type $(1,1)$ estimate for orthogonal martingales. We will find the best constant $K$ in the estimate
\[ \mathbb{P}(|Y_t|\ge 1)\le K\,\mathbb{E}|X_t|,\qquad t\ge 0, \]
where $X$, $Y$ run over the class of all orthogonal martingales such that $Y$ is differentially subordinate to $X$ and $Y_0=0$.

Clearly, this inequality is of the form (3.31), with $G(x,y)=1_{\{|y|\ge 1\}}-K|x|$. Let us first assume that both $X$ and $Y$ are bounded. In the light of Theorem 3.3, we need to construct a function $U:\mathbb{R}^2\to\mathbb{R}$ satisfying the conditions 1°, 2° and 3°. As in the case of $L^p$ estimates, it is useful to take a look at the weak-type $(1,1)$ inequality for differentially subordinate martingales without the orthogonality: the corresponding Bellman function, associated with $G(x,y)=1_{\{|y|\ge 1\}}-2|x|$, was
\[ U(x,y)=\begin{cases}y^2-x^2&\text{if }|x|+|y|\le 1,\\ 1-2|x|&\text{if }|x|+|y|>1.\end{cases} \]
An important feature of this object is that the continuation set $\mathcal{C}$ was the strip $(\mathbb{R}\times(-1,1))\setminus\{(0,0)\}$. A natural idea is to assume that in the orthogonal setting the continuation set is the same. This immediately implies that we have
\[ U(x,y)=1-K|x|\quad\text{if }|y|\ge 1 \]
and
\[ U(x,y)>-K|x|\quad\text{if }|y|<1,\ (x,y)\ne(0,0). \]
On the continuation set the function $U$ must be harmonic: this gives that $U$ is the harmonic lift of $G$ on the strip. To identify the explicit formula for this function, we translate the problem into a more friendly context with the use of conformal mappings. Specifically, consider the conformal map
\[ \varphi(z)=i\exp(\pi z/2)=\big({-e^{\pi x/2}\sin(\pi y/2)},\,e^{\pi x/2}\cos(\pi y/2)\big), \]
which maps the strip $S=\mathbb{R}\times(-1,1)$ onto the upper half-plane $H=\mathbb{R}\times(0,\infty)$. This conformal map sends the boundary function $G|_{\partial S}$ onto the function
\[ f(t)=1-K\cdot\Big|\frac{2}{\pi}\log|t|\Big| \]
on $\mathbb{R}=\partial H$. Consequently, if $(x,y)\in S$, then $U(x,y)=\widetilde{U}(\varphi(x,y))$, where $\widetilde{U}$ is the harmonic extension of $f$ obtained via the Poisson formula
\[ \widetilde{U}(\alpha,\beta)=\frac{1}{\pi}\int_{\mathbb{R}}\frac{\beta f(t)}{(\alpha-t)^2+\beta^2}\,dt=1-\frac{K}{\pi}\int_{\mathbb{R}}\frac{\beta\big|\frac{2}{\pi}\log|t|\big|}{(\alpha-t)^2+\beta^2}\,dt. \]

Plugging the above formula for $\varphi$ and substituting $t=se^{\pi x/2}$, we see that $U$ admits the following formula:
\[ (3.35)\qquad U(x,y)=\begin{cases}\displaystyle 1-\frac{K}{\pi}\int_{\mathbb{R}}\frac{\cos(\frac{\pi y}{2})\big|\frac{2}{\pi}\log|s|+x\big|}{\big(\sin(\frac{\pi y}{2})+s\big)^2+\cos^2(\frac{\pi y}{2})}\,ds&\text{if }|y|<1,\\[3mm] 1-K|x|&\text{if }|y|\ge 1.\end{cases} \]

To find $K$, let us exploit the initial condition 1°. We have
\begin{align*}
U(0,0)&=1-K\cdot\frac{4}{\pi^2}\int_0^\infty\frac{|\log s|}{s^2+1}\,ds\\
&=1-K\cdot\frac{4}{\pi^2}\int_{-\infty}^\infty\frac{|t|e^t}{e^{2t}+1}\,dt\\
&=1-K\cdot\frac{8}{\pi^2}\int_0^\infty te^{-t}\sum_{k=0}^\infty(-e^{-2t})^k\,dt\\
&=1-K\cdot\frac{8}{\pi^2}\sum_{k=0}^\infty\frac{(-1)^k}{(2k+1)^2},
\end{align*}
which leads us to the choice
\[ K=\left(\frac{8}{\pi^2}\sum_{k=0}^\infty\frac{(-1)^k}{(2k+1)^2}\right)^{-1}=1.34\ldots. \]
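The numerical value of $K$, and the vanishing of $U(0,0)$ for this choice, can be double-checked as follows (a sketch; the alternating series sums to Catalan's constant, and the quadrature based on the substitution $s=\tan u$ is my own choice).

```python
import math

# K from the alternating series above (its sum is Catalan's constant G),
# then a quadrature cross-check that U(0,0) = 1 - (4K/pi^2) * I vanishes,
# where I = int_0^infty |log s|/(s^2+1) ds = int_0^{pi/2} |log tan u| du.
G = sum((-1) ** k / (2 * k + 1) ** 2 for k in range(10 ** 6))
K = math.pi ** 2 / (8 * G)

n = 200_000
h = (math.pi / 2) / n
I = sum(abs(math.log(math.tan((j + 0.5) * h))) for j in range(n)) * h  # midpoint rule

print(K)                                  # approx 1.3468
print(1 - 4 * K * I / math.pi ** 2)       # approx 0
```

The midpoint rule handles the integrable logarithmic singularities at the endpoints, so the residual is small but not at machine precision.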

Let us check that the function $U$ defined in (3.35) indeed enjoys the properties 1°, 2° and 3° (since we assume that $Y_0=0$, the initial condition amounts to saying that $U(x,0)\le 0$; see Remark 3.1 above). Let us first observe that for any fixed $y$, the function $x\mapsto U(x,y)$ is concave. This is evident for $|y|\ge 1$, while for the remaining $y$ it follows directly from the fact that $x\mapsto\big|x+\frac{2}{\pi}\log|s|\big|$ is convex. By harmonicity of $U$ on the strip, we get $U_{yy}\ge 0$ there and hence, to check 1° and 2°, we need to verify that $U(x,0)\ge-K|x|$. Differentiating, we get
\[ |U_x(x,0)|=\left|-\frac{K}{\pi}\int_{\mathbb{R}}\frac{\mathrm{sgn}\big(\frac{2}{\pi}\log|s|+x\big)}{s^2+1}\,ds\right|\le\frac{K}{\pi}\int_{\mathbb{R}}\frac{ds}{s^2+1}=K, \]
so the aforementioned concavity of $x\mapsto U(x,0)$ and the equality $U(0,0)=0$ complete the proof of 1° and 2°. Finally, to show 3°, we only need to check the superharmonicity of $U$. Clearly, if $x\in\mathbb{R}$ and $|y|\ne 1$, then $U$ is superharmonic in some neighborhood of $(x,y)$. To check the remaining points, note that
\[ U(x,y)\le 1-K|x|\quad\text{on }\mathbb{R}\times[-1,1], \]
because the function $(x,y)\mapsto 1-K|x|$ is superharmonic and coincides with $U$ on the boundary of the strip. Therefore, if $x\in\mathbb{R}$, then the superharmonicity in some neighborhood of $(x,\pm 1)$ follows directly from the superharmonicity of $(x,y)\mapsto 1-K|x|$ and the mean value property.

The passage from bounded to unbounded processes $X$, $Y$ is performed with the same arguments as in the $L^p$ case above. To see that the constant $K$ is optimal, consider the two-dimensional Brownian motion $(X,Y)$ started at the origin and stopped upon leaving the strip $\mathbb{R}\times(-1,1)$. Then $X$, $Y$ are orthogonal and $Y$ is differentially subordinate to $X$. Furthermore, $U$ is harmonic in the interior of the strip $\mathbb{R}\times[-1,1]$ (in which the pair $(X,Y)$ takes its values), so Itô's formula implies that $(U(X_t,Y_t))_{t\ge 0}$ is a martingale starting from $U(0,0)=0$. However, as we will show now, this martingale is bounded in $L^2$. Indeed, first note that $\mathbb{E}|X_t|^2=\mathbb{E}|Y_t|^2\le 1$ for all $t$; hence $X$ is square-integrable. Furthermore, as we have already shown above, we have
\[ -K|x|\le U(x,y)\le 1-K|x|\quad\text{on }\mathbb{R}\times[-1,1]. \]
Thus $(U(X_t,Y_t))_{t\ge 0}$ is bounded in $L^2$ and hence in particular $\mathbb{E}U(X_\infty,Y_\infty)=0$. Since $|Y_\infty|=1$ and $U$ coincides with $G$ on the boundary of the strip $\mathbb{R}\times[-1,1]$, this implies $\mathbb{P}(|Y_\infty|\ge 1)-K\mathbb{E}|X_\infty|=0$. It remains to write
\[ \lim_{t\to\infty}\frac{\mathbb{P}(|Y_t|\ge 1)}{\mathbb{E}|X_t|}=\frac{\mathbb{P}(|Y_\infty|\ge 1)}{\mathbb{E}|X_\infty|}=K, \]
thus proving the desired sharpness.

3.5. Problems

1. Find the smallest Bellman function corresponding to the inequality
\[ \mathbb{P}(g_n\ge 1)\le 2\mathbb{E}|f_n|,\qquad n=0,1,2,\ldots. \]

2. (Burkholder [2], Suh [20]) Let $1\le p<\infty$. Prove that the best constant in the weak-type inequality
\[ \mathbb{P}(|g_n|\ge 1)\le C_p^p\,\mathbb{E}|f_n|^p,\qquad n=0,1,2,\ldots, \]
is given by
\[ C_p^p=\begin{cases}\dfrac{2}{\Gamma(p+1)}&\text{if }1\le p\le 2,\\[2mm] \dfrac{p^{p-1}}{2}&\text{if }p>2.\end{cases} \]

3. (Burkholder [3]) Let $a<b$ be fixed real numbers. Find the best constant $C$ in the weak-type inequality
\[ \mathbb{P}(|g_n|\ge 1)\le C\,\mathbb{E}|f_n|,\qquad n=0,1,2,\ldots, \]
under the assumption that $g$ is a transform of $f$ by a predictable sequence with values in $[a,b]$.

4. (Osękowski [15]) For any $K>1$, find the least constant $L(K)$ in the logarithmic estimate
\[ \mathbb{E}|g_n|\le K\,\mathbb{E}|f_n|\log|f_n|+L(K),\qquad n=0,1,2,\ldots. \]

5. (Osękowski [16]) For any $1<p<\infty$, find the least constant $C_p$ in the inequality
\[ \mathbb{E}|g_n|\le C_p\,(\mathbb{E}|f_n|^p)^{1/p},\qquad n=0,1,2,\ldots, \]
where $f$, $g$ are martingales such that $g$ is differentially subordinate to $f$.

6. Find the best constants in the estimates
\[ \mathbb{P}(|X_t|+|Y_t|\ge 1)\le C_1\mathbb{E}|X_t|,\qquad t\ge 0, \]
\[ \mathbb{P}(X_t^2+Y_t^2\ge 1)\le C_2\mathbb{E}X_t^2,\qquad t\ge 0, \]
under the assumption that $X$, $Y$ are orthogonal martingales such that $Y$ is differentially subordinate to $X$.

7. (Tomaszewski [21]) Find the best constant $C$ in the inequality
\[ \mathbb{P}(X_t^2+Y_t^2\ge 1)\le C\,\mathbb{E}|X_t|,\qquad t\ge 0, \]
under the assumption that $X$, $Y$ are orthogonal martingales such that $Y$ is differentially subordinate to $X$.

8. For a given $K>0$, find the least $L=L(K)$ such that if $X$, $Y$ are orthogonal martingales, $X$ is nonnegative and $Y$ is differentially subordinate to $X$, then
\[ \mathbb{E}|Y_t|\le K\,\mathbb{E}X_t\log X_t+L(K),\qquad t\ge 0. \]

9. (Strzelecki [19]) Let $X$, $Y$ be orthogonal martingales such that $Y$ is differentially subordinate to $X$. Prove that the inequality
\[ \mathbb{P}(Y_t\ge 1)\le\mathbb{E}|X_t|^2,\qquad t\ge 0, \]
is sharp. Find the least Bellman function corresponding to this estimate.

10. (Osękowski [17]) Let $X$, $Y$ be orthogonal martingales such that $Y$ is differentially subordinate to $X$. Prove that for any $t\ge 0$ we have the sharp inequality
\[ \mathbb{P}(Y_t\ge 1)\le\mathbb{E}|X_t|. \]


CHAPTER 4

Inequalities for maximal functions

4.1. Definitions and notation

In this chapter we will study the boundedness properties of maximal operators. Such estimates play a fundamental role in various areas of mathematics, and it is often of interest to have sharp, or at least tight, information on the constants involved.

We start with the classical Euclidean setting. Let $d\ge 1$ be a fixed dimension. The (centered) Hardy–Littlewood maximal operator $\mathcal{M}_c$ acts on locally integrable functions $f:\mathbb{R}^d\to\mathbb{R}$ by the formula
\[ \mathcal{M}_cf(x)=\sup\frac{1}{|B(x,r)|}\int_{B(x,r)}|f(y)|\,dy, \]
where the supremum is taken over $r>0$ (here $B(x,r)$ is the ball of center $x$ and radius $r$). The uncentered version of $\mathcal{M}_c$, denoted by $\mathcal{M}$ (i.e., without the lower index $c$), is given by the same formula, but this time the supremum is taken over a larger family: we consider all balls $B\subset\mathbb{R}^d$ which contain $x$.

Actually, from the viewpoint of boundedness properties, the choice of the maximal operator (i.e., centered or uncentered) is irrelevant, since both operators are equivalent up to a constant depending on the dimension. Specifically, we have $\mathcal{M}_cf\le\mathcal{M}f$ (which follows trivially from the very definition) and $\mathcal{M}f\le 2^d\mathcal{M}_cf$. To show the latter bound, note that for any $x\in\mathbb{R}^d$ and any ball $B\subset\mathbb{R}^d$ of radius $r$ containing $x$ we have $B\subset B(x,2r)$, so
\[ \frac{1}{|B|}\int_B|f(y)|\,dy\le\frac{2^d}{|B(x,2r)|}\int_{B(x,2r)}|f(y)|\,dy\le 2^d\mathcal{M}_cf(x), \]
and taking the supremum over all $B$ gives the desired bound. We should also mention here that sometimes in the definitions of $\mathcal{M}_c$ and $\mathcal{M}$ one considers cubes with sides parallel to the axes instead of balls. Arguing as above, we get that these objects are essentially of the same size (pointwise), up to a constant depending on the dimension only.
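For $d=1$ the pointwise equivalence can be tested on a grid. The sketch below is my own discrete analogue (averages over windows of grid cells, with centered windows clipped at the boundary); it verifies $\mathcal{M}_cf\le\mathcal{M}f\le 2\mathcal{M}_cf$ for a random nonnegative sample.

```python
import numpy as np

# Discrete 1-D analogue (d = 1): averages of a nonnegative sample over
# windows of grid cells; centered windows are clipped at the boundary.
rng = np.random.default_rng(0)
f = np.abs(rng.normal(size=60))
n = len(f)

Mc = np.array([max(f[max(i - k, 0):min(i + k, n - 1) + 1].mean()
                   for k in range(n)) for i in range(n)])
M = np.array([max(f[a:b + 1].mean()
                  for a in range(i + 1) for b in range(i, n)) for i in range(n)])

print(np.all(Mc <= M + 1e-12), np.all(M <= 2 * Mc + 1e-12))   # True True
```

The argument of the text survives the clipping because $f\ge 0$: the clipped centered window still contains the given window and has at most twice as many cells.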

Example 4.1. Consider the following simple application. Let $\Phi:\mathbb{R}^d\to[0,\infty)$ be a $C^1$ function satisfying $\int_{\mathbb{R}^d}\Phi\,dx=1$ and let $(\Phi_t)_{t>0}$ be the associated approximation of the identity given by $\Phi_t(x)=t^{-d}\Phi(x/t)$. In addition, we assume that $\Phi$ is radially decreasing, i.e., there exists a $C^1$ nonincreasing function $\varphi:[0,\infty)\to[0,\infty)$ such that $\Phi(x)=\varphi(|x|)$. Then for any locally integrable function $f:\mathbb{R}^d\to\mathbb{R}$ and any $x\in\mathbb{R}^d$ we have
\[ (4.1)\qquad \sup_{t>0}|(f*\Phi_t)(x)|\le\mathcal{M}_cf(x). \]


To see this, fix $t>0$ and observe that
\[ (4.2)\qquad \Phi(x)=\varphi(|x|)=-\int_{|x|}^\infty\varphi'(r)\,dr=\int_0^\infty(-\varphi'(r))\chi_{B(0,r)}(x)\,dr. \]
Consequently,
\begin{align*}
|(f*\Phi_t)(x)|&=\left|\int_{\mathbb{R}^d}\int_0^\infty f(y)\cdot t^{-d}(-\varphi'(r))\chi_{B(0,r)}((x-y)/t)\,dr\,dy\right|\\
&\le\int_0^\infty\int_{\mathbb{R}^d}\big|f(y)\cdot t^{-d}(-\varphi'(r))\chi_{B(0,r)}((x-y)/t)\big|\,dy\,dr\\
&=\int_0^\infty t^{-d}(-\varphi'(r))\int_{\mathbb{R}^d}|f(y)|\chi_{B(x,rt)}(y)\,dy\,dr\\
&\le\int_0^\infty t^{-d}(-\varphi'(r))|B(x,rt)|\,\mathcal{M}_cf(x)\,dr\\
&=\mathcal{M}_cf(x)\int_0^\infty(-\varphi'(r))|B(0,r)|\,dr\\
&=\mathcal{M}_cf(x)\cdot\int_{\mathbb{R}^d}\Phi(x)\,dx,
\end{align*}
where in the last line we have exploited (4.2) and Fubini's theorem. Since the integral of $\Phi$ is equal to $1$, (4.1) follows.

Here is a further application. Suppose that $1<p<\infty$ and let $f\in L^p(\mathbb{R}^d)$. Then for $\Phi$ and $(\Phi_t)_{t>0}$ as above, we have
\[ (4.3)\qquad f*\Phi_t\to f\ \text{in }L^p\ \text{as }t\to 0. \]
Let us first show the convergence for continuous functions with bounded support. Pick such a function $f$ and observe that it is uniformly continuous. Therefore, a standard argument shows that $f*\Phi_t\to f$ pointwise as $t\to 0$. But
\[ |f*\Phi_t(x)-f(x)|\le\sup_{t>0}|f*\Phi_t(x)|+|f(x)| \]
and both summands on the right are in $L^p$: here we use (4.1) and the fact that $\mathcal{M}_c$ is bounded on $L^p$, so Lebesgue's dominated convergence theorem applies. This proves (4.3) for continuous, compactly supported functions, and the passage to the general case follows from a standard approximation argument.
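A quick numerical illustration of (4.3) (a sketch with arbitrary choices: a Gaussian profile for $\Phi$, an indicator for $f$, and $p=2$): the $L^2$ distance between $f*\Phi_t$ and $f$ decreases as $t\downarrow 0$.

```python
import numpy as np

# f * Phi_t -> f in L^2 as t -> 0, with Phi a Gaussian approximation of
# the identity and f the indicator of (-1, 1), both sampled on a grid.
x = np.linspace(-4, 4, 4001)
dx = x[1] - x[0]
f = ((x > -1) & (x < 1)).astype(float)

def l2_error(t):
    phi_t = np.exp(-(x / t) ** 2 / 2)
    phi_t /= phi_t.sum() * dx                      # normalize: integral = 1
    conv = np.convolve(f, phi_t, mode="same") * dx
    return np.sqrt(np.sum((conv - f) ** 2) * dx)

errors = [l2_error(t) for t in (0.8, 0.4, 0.2, 0.1, 0.05)]
print(errors)   # strictly decreasing
```

Note that the indicator is discontinuous, so there is no uniform convergence here, only the $L^p$ convergence asserted by (4.3).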

Unfortunately, it seems that the Bellman function technique does not work for the above maximal operators (i.e., there seems to be no modification of the method which would lead to sharp estimates). Roughly speaking, the reason is that, given $x$, the class of all balls/cubes containing $x$, or the class of all balls/cubes having $x$ as its center, does not possess any ordered structure which would enable the inductive argument lying at the very heart of the Bellman function method. This is why we will work with yet another maximal operator, associated with the dyadic lattice in $\mathbb{R}^d$. Let $\mathcal{D}$ denote the class of all dyadic cubes in $\mathbb{R}^d$, i.e., all cubes of the form
\[ (4.4)\qquad Q=(2^na_1,2^n(a_1+1)]\times(2^na_2,2^n(a_2+1)]\times\ldots\times(2^na_d,2^n(a_d+1)], \]
where $n,a_1,a_2,\ldots,a_d$ run over the set of all integers. Note that in the dyadic setting we have the following dichotomy: any two cubes $Q_1,Q_2\in\mathcal{D}$ are either disjoint, or one of them is contained in the other. This property induces an appropriate structure on the dyadic lattice which makes the Bellman function method applicable.


The dyadic maximal operator $\mathcal{M}$ acts on locally integrable functions $f$ on $\mathbb{R}^d$ by the formula
\[ \mathcal{M}f(x)=\sup\left\{\frac{1}{|Q|}\int_Q|f(y)|\,dy:x\in Q\in\mathcal{D}\right\}. \]
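In one dimension the dyadic maximal operator is easy to implement for step functions on $(0,1]$ (a sketch; `dyadic_maximal` is a hypothetical helper name, and the input length must be a power of two): on each cell one takes the largest average of $|f|$ over the dyadic intervals containing it.

```python
import numpy as np

# f is constant on 2^J equal cells of (0, 1]; Mf on a cell is the largest
# average of |f| over a dyadic interval containing that cell.
def dyadic_maximal(cells):
    vals = np.abs(np.asarray(cells, dtype=float))
    best = vals.copy()
    level = vals
    while len(level) > 1:
        level = level.reshape(-1, 2).mean(axis=1)   # averages one scale up
        best = np.maximum(best, np.repeat(level, len(vals) // len(level)))
    return best

print(dyadic_maximal([1, 0, 0, 0, 0, 0, 0, 0]))
# cellwise: 1, 0.5, 0.25, 0.25, 0.125, 0.125, 0.125, 0.125
```

The nestedness dichotomy of the dyadic lattice is exactly what makes this a single bottom-up pass: every cell sees one chain of ancestors.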

Actually, one can study dyadic maximal operators in a yet more general context which arises naturally in probability theory. Suppose that $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space, equipped with some discrete-time filtration $(\mathcal{F}_n)_{n\ge 0}$. Given an integrable random variable $f$, let $(f_n)_{n\ge 0}$ be the associated martingale (with a slight abuse of notation, this martingale will again be denoted by $f$), given by $f_n=\mathbb{E}(f|\mathcal{F}_n)$ for all $n$. Then the maximal function of $f$ is defined by
\[ f^*=\sup_{n\ge 0}|f_n|. \]

We will also consider its truncated version, given by $f^*_N=\max_{0\le n\le N}|f_n|$. To see how the above probabilistic setup generalizes the preceding analytic setting, set $\Omega=(0,1]^d$, $\mathcal{F}=\mathcal{B}((0,1]^d)$ and let $\mathbb{P}$ be the Lebesgue measure on $(0,1]^d$. Next, for any $n\ge 0$, let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the dyadic cubes contained in $(0,1]^d$ with volume at least $2^{-nd}$. Then $(\mathcal{F}_n)_{n\ge 0}$ is a nondecreasing family of sub-$\sigma$-algebras of $\mathcal{F}$. Furthermore, if $f$ is an integrable function on $(0,1]^d$, then
\[ \sup\left\{\frac{1}{|Q|}\int_Q|f|\,dx:x\in Q\subseteq(0,1]^d,\ Q\ \text{dyadic}\right\} \]
is precisely the maximal function of the martingale induced by the random variable $|f|$. Consequently, having proved any inequality for the martingale maximal function, we immediately recover its version (with the same constants involved) for the localized dyadic maximal operator, i.e., restricted to the unit cube $(0,1]^d$. Now the use of standard dilation and translation arguments allows us to extend such an inequality to the non-local dyadic maximal operator, without any change in the constants.

Another important application of the probabilistic setup concerns the one-sided Hardy–Littlewood maximal operator on the nonnegative half-line (which can be regarded as a different approach to the integral inequalities studied in Chapter 1). Let $\Omega=(0,1]$, $\mathcal{F}=\mathcal{B}((0,1])$ and suppose that $\mathbb{P}$ is the Lebesgue measure on $(0,1]$. Fix a large positive integer $N$ and consider the finite filtration $(\mathcal{F}_n)_{n=0}^N$, where $\mathcal{F}_n$ is generated by the intervals $(0,1-n/N]$, $(1-n/N,1-(n-1)/N]$, $(1-(n-1)/N,1-(n-2)/N]$, \ldots, $(1-1/N,1]$. Extend the filtration by setting $\mathcal{F}_k=\mathcal{F}_N$ for $k>N$. Given an integrable and nonnegative function $f$ on $(0,1]$, consider the associated martingale $(f_n)_{n\ge 0}$. This martingale evolves up to time $N$ and its terminal variable $f_N$ is constant on any interval of the form $(k/N,(k+1)/N]$ (where $k=0,1,2,\ldots,N-1$), equal to $N\int_{k/N}^{(k+1)/N}f$ there. On the other hand, we easily compute that the maximal function $f^*$ of the martingale satisfies
\[ f^*(x)=f^*_N(x)=\max_{0\le n\le N}f_n(x)\ge\frac{1}{\lceil Nx\rceil/N}\int_0^{\lceil Nx\rceil/N}f(u)\,du. \]
Now, if we let $N\to\infty$, then the expression on the right converges to $\frac{1}{x}\int_0^xf(u)\,du$, the one-sided Hardy–Littlewood maximal function of $f$ evaluated at $x$. On the other hand, the terminal variable $f_N$ converges almost everywhere to $f$, in the light of Lebesgue's differentiation theorem. In practice, this often means that if we manage to establish the martingale inequality
\[ \mathbb{E}G(f_N,f^*_N)\le 0, \]
then it automatically proves the analytic estimate
\[ \int_0^1G\left(f(x),\frac{1}{x}\int_0^xf(u)\,du\right)dx\le 0. \]
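The discretization just described can be carried out directly (a sketch; $N$ and the random cell values are arbitrary choices): the reversed filtration averages $f$ over the shrinking left block, and the resulting maximal function dominates every running average of $f$.

```python
import numpy as np

# f is nonnegative and constant on the N cells ((i-1)/N, i/N].  At step n
# the martingale averages f over (0, 1 - n/N] and keeps the cells to the
# right; its maximal function dominates the one-sided averages.
N = 16
rng = np.random.default_rng(1)
f = rng.random(N)                            # cell values of f

fn = np.empty((N + 1, N))
for n in range(N + 1):
    fn[n] = f
    if n < N:
        fn[n, : N - n] = f[: N - n].mean()   # cells inside (0, 1 - n/N]
fstar = fn.max(axis=0)

avgs = np.cumsum(f) / np.arange(1, N + 1)    # (1/x) int_0^x f at x = i/N
print(np.all(fstar >= avgs - 1e-12))         # True
```

Each prefix average appears as one of the martingale values, which is exactly why the displayed inequality for $f^*_N$ holds cell by cell.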

The above discussion justifies why in our considerations below we restrict ourselves to the probabilistic setup and nonnegative martingales. We should point out that essentially all the estimates we will establish will be sharp even in the special analytic context of dyadic operators.

4.2. On the method

Now we will describe the modification of the Bellman function method which allows the study of sharp estimates for the pair $(f,f^*)$, where $f$ is a simple nonnegative martingale and $f^*$ is its maximal function. In all cases, the simplicity assumption can be imposed by some standard approximation arguments and, as in the previous chapters, it will enable us to avoid technical integrability issues. The pair $(f,f^*)$ takes values in the domain
\[ \mathrm{Dom}=\{(x,y)\in\mathbb{R}^2:0\le x\le y\}. \]
Suppose that $G:\mathrm{Dom}\to\mathbb{R}$ is a given function and assume that we are interested in showing the estimate
\[ (4.5)\qquad \mathbb{E}G(f_n,f^*_n)\le 0,\qquad n=0,1,2,\ldots, \]
for any pair $(f,f^*)$ as above. The underlying idea of the approach is the same as in the previous chapters: for a given $(f,f^*)$, we search for a supermartingale $(U_n)_{n\ge 0}$ majorizing the sequence $(G(f_n,f^*_n))_{n\ge 0}$ and satisfying $\mathbb{E}U_0\le 0$. Actually, we look for a function $U:\mathrm{Dom}\to\mathbb{R}$, not depending on $f$, such that the supermartingale is of the form $U_n=U(f_n,f^*_n)$ for all $n$. Then the above requirements for $(U_n)_{n\ge 0}$ imply that $U$ should satisfy the following properties.

1° (initial condition) For any $x\ge 0$, we have $U(x,x)\le 0$.

2° (majorization) We have $U\ge G$ on $\mathrm{Dom}$.

3° (concavity) For any $(x,y)\in\mathrm{Dom}$ and any numbers $t_1,t_2\ge-x$, $\alpha_1,\alpha_2\in(0,1)$ such that $\alpha_1+\alpha_2=1$ and $\alpha_1t_1+\alpha_2t_2=0$, we have
\[ (4.6)\qquad U(x,y)\ge\alpha_1U(x+t_1,(x+t_1)\vee y)+\alpha_2U(x+t_2,(x+t_2)\vee y). \]

Let us study precisely the relation between the validity of (4.5) and the existence of a function $U$ possessing the above three properties.

Lemma 4.1. If $U$ satisfies 1°, 2° and 3°, then (4.5) holds true.

Proof. This is done in the same manner as in the previous chapters. Fix a pair $(f,f^*)$. Condition 3°, by standard induction, implies that if $(x,y)\in\mathrm{Dom}$ and $X$ is a simple mean-zero random variable taking values in $[-x,\infty)$, then $U(x,y)\ge\mathbb{E}U(x+X,(x+X)\vee y)$. Applying this inequality conditionally yields
\begin{align*}
\mathbb{E}\big[U(f_{n+1},f^*_{n+1})|\mathcal{F}_n\big]&=\mathbb{E}\big[U(f_{n+1},f_{n+1}\vee f^*_n)|\mathcal{F}_n\big]\\
&=\mathbb{E}\big[U(f_n+df_{n+1},(f_n+df_{n+1})\vee f^*_n)|\mathcal{F}_n\big]\\
&\le U(f_n,f^*_n),
\end{align*}
i.e., the desired supermartingale property. Consequently, applying 2° and then 1°, we get
\[ \mathbb{E}G(f_n,f^*_n)\le\mathbb{E}U(f_n,f^*_n)\le\mathbb{E}U(f_0,f^*_0)=\mathbb{E}U(f_0,f_0)\le 0 \]
and we are done. □
and we are done.

The implication of the above lemma can be reversed. The Bellman function $V:\mathrm{Dom}\to\mathbb{R}$ associated with the problem (4.5) is defined by
\[ (4.7)\qquad V(x,y)=\sup\mathbb{E}G(f_n,f^*_n\vee y), \]
where the supremum is taken over all $n$ and over the class of all simple martingales $f$ starting from $x$ (the use of $f^*_n\vee y$ in the second variable guarantees that the maximal process on the second coordinate starts from $y$). The probability space is also allowed to vary, unless it is assumed to be nonatomic.

Lemma 4.2. If (4.5) is valid, then the function $V$ satisfies 1°, 2° and 3°.

Proof. The condition 1° is a direct consequence of the validity of (4.5): for any $x\ge 0$ and any simple martingale $f$ starting from $x$ we have
\[ \mathbb{E}G(f_n,f^*_n\vee x)=\mathbb{E}G(f_n,f^*_n)\le 0, \]
and taking the supremum over all such $f$ gives the inequality. The majorization 2° follows by considering the constant martingale $f\equiv x$ in the definition of $V(x,y)$. To show 3°, fix $(x,y)\in\mathrm{Dom}$ and $t_1,t_2,\alpha_1,\alpha_2$ as in the statement of the condition. Let $f^1$, $f^2$ be simple martingales as in the definition of $V(x+t_1,(x+t_1)\vee y)$ and $V(x+t_2,(x+t_2)\vee y)$, respectively. Suppose that these processes are given on some probability spaces $(\Omega_1,\mathcal{F}^1,\mathbb{P}_1)$ and $(\Omega_2,\mathcal{F}^2,\mathbb{P}_2)$. We splice these probability spaces into one triple $(\Omega_1\cup\Omega_2,\sigma(\mathcal{F}^1,\mathcal{F}^2),\alpha_1\mathbb{P}_1+\alpha_2\mathbb{P}_2)$. Here, as in the setting of martingale transforms, the probability measure is given by
\[ (\alpha_1\mathbb{P}_1+\alpha_2\mathbb{P}_2)(A_1\cup A_2)=\alpha_1\mathbb{P}_1(A_1)+\alpha_2\mathbb{P}_2(A_2) \]
for any $A_1\in\mathcal{F}^1$ and $A_2\in\mathcal{F}^2$. We splice the martingales $f^1$ and $f^2$ into one sequence $f=(f_n)_{n\ge 0}$ on the new probability space, setting $f_0\equiv x$ and, for $n\ge 1$,
\[ f_n(\omega)=\begin{cases}f^1_{n-1}(\omega)&\text{if }\omega\in\Omega_1,\\ f^2_{n-1}(\omega)&\text{if }\omega\in\Omega_2.\end{cases} \]
Then it is easy to see that $f$ is a simple martingale with respect to its natural filtration $(\mathcal{F}_n)_{n\ge 0}$: the assumption $\alpha_1t_1+\alpha_2t_2=0$ implies that the first move of the sequence $f$ is of martingale type (precisely, we have $\mathbb{E}(f_1|\mathcal{F}_0)=\mathbb{E}f_1=x$), and the martingale properties of $f^i$ yield that $f$ also enjoys this property at later steps. Furthermore, since $x\le y$, we see that
\[ f^*_n(\omega)\vee y=\Big(\max_{0\le k\le n}f_k(\omega)\Big)\vee y=\Big(\max_{1\le k\le n}f_k(\omega)\Big)\vee y=\begin{cases}(f^1_{n-1})^*(\omega)\vee y&\text{if }\omega\in\Omega_1,\\ (f^2_{n-1})^*(\omega)\vee y&\text{if }\omega\in\Omega_2.\end{cases} \]
Therefore,
\[ V(x,y)\ge\mathbb{E}G(f_n,f^*_n\vee y)=\alpha_1\mathbb{E}_1G(f^1_{n-1},(f^1_{n-1})^*\vee y)+\alpha_2\mathbb{E}_2G(f^2_{n-1},(f^2_{n-1})^*\vee y), \]
and taking the supremum over all $n$ and all $f^1$, $f^2$, we get the desired concavity condition. □

The above approach translates the problem of proving (4.5) into that of thesearch for an appropriate function U . How can we nd such an object? The rststep towards the answer to this question, in analogy to the context of martingaletransforms, lies in considering the nite-horizon version of the problem. Namely,for any n = 0, 1, 2, . . ., dene V n : Dom→ R by the formula

V n(x, y) = supEG(fn, f∗n ∨ y),

where the supremum is taken over all simple martingales f starting from x. Hereis the analogue of Theorem 3.1. As the proof requires only some minor changes, weleave it to the reader.

Theorem 4.1. The sequence (V^n)n≥0 can be computed inductively from the relations V^0 = G and, for n ≥ 1,

V^n(x, y) = sup { α1 V^{n−1}(x + t1, (x + t1) ∨ y) + α2 V^{n−1}(x + t2, (x + t2) ∨ y) },

where the supremum is taken over all numbers α1, α2 ≥ 0, t1, t2 ∈ R such that α1 + α2 = 1 and α1t1 + α2t2 = 0.
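It is instructive to see the recurrence of Theorem 4.1 run numerically. The sketch below is not part of the text: it restricts the supremum to moves between points of a fixed grid (so it computes a lower approximation of each V^n) and instantiates the weak-type example treated later in Section 4.3, i.e. G(x, y) = 1_{y≥1} − x^p with p = 2 and constant 1; the grid resolution and iteration count are arbitrary choices.

```python
import numpy as np

p = 2.0
xs = np.arange(21) / 10.0          # grid 0.0, 0.1, ..., 2.0 for both coordinates

def G(x, y):
    # gain function of the weak-type (p,p) estimate with constant 1
    return (1.0 if y >= 1.0 else 0.0) - x ** p

n = len(xs)
NEG = -1e18                        # placeholder outside Dom = {(x, y) : x <= y}
V = np.array([[G(xs[i], xs[j]) if i <= j else NEG for j in range(n)]
              for i in range(n)])

for _ in range(15):                # V^0 = G, then V^k -> V^{k+1}
    W = V.copy()
    for i in range(n):             # current position x = xs[i]
        for j in range(i, n):      # current running maximum y = xs[j]
            best = V[i, j]         # the trivial move t1 = t2 = 0
            for a in range(i):             # jump down to xs[a] < x, or ...
                for b in range(i + 1, n):  # ... up to xs[b] > x
                    alpha = (xs[b] - xs[i]) / (xs[b] - xs[a])
                    # after the up-move the running maximum becomes xs[b] v y
                    val = alpha * V[a, j] + (1 - alpha) * V[b, max(b, j)]
                    best = max(best, val)
            W[i, j] = best
    V = W

# the iterates stabilize at the candidate of Example 4.2:
# 0 for y < 1 and 1 - x^p for y >= 1
print(V[5, 5], V[5, 12])           # V(0.5, 0.5) and V(0.5, 1.2)
```

On this grid the iterates reach V(0.5, 0.5) = 0 (move the mass to 0 and to 1) and keep V(0.5, 1.2) = 1 − 0.5² = 0.75, matching the candidate function found in Example 4.2 below.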

Furthermore, a simple limiting argument shows that V(x, y) = lim_{n→∞} V^n(x, y). This, as in the contexts of optimal stopping or martingale transforms, gives us a method of searching for V, by trying to compute the sequence (V^n)n≥0 explicitly. Unfortunately, in many cases the calculations arising during the study of such a sequence are technically involved and impossible to push through. An alternative approach rests on trying to find V directly, by understanding the condition 3. It is easy to see that this condition implies that for any fixed y > 0, the section x ↦ U(x, y) must be a concave function on [0, y], simply by restricting the inequality (4.6) to t1, t2 satisfying x + t1 ≤ y, x + t2 ≤ y. In addition, (4.6) imposes the following monotonicity requirement on U on the diagonal x = y. Suppose for a moment that U extends to a C1 function on some open set D containing Dom inside. Then applying (4.6) to x = y, α1 = α2 = 1/2 and t1 = −t2 = δ > 0 gives

U(x, x) ≥ (1/2) U(x + δ, x + δ) + (1/2) U(x − δ, x).

Putting all the terms on the right, dividing throughout by δ and letting δ → 0 gives the partial differential inequality Uy(x, x) ≤ 0.

We will continue the reasoning by studying a slightly more specific example. Suppose that Φ, Ψ are fixed C1 functions on [0,∞), Ψ is strictly convex, and assume that we are interested in proving the estimate

(4.8) EΦ(f∗n) ≤ EΨ(fn), n = 0, 1, 2, . . . .

Obviously, this corresponds to the choice G(x, y) = Φ(y) − Ψ(x). Let us provide an informal reasoning which will lead us to an appropriate candidate for the special function. Let V be defined by (4.7). Introduce the continuation and the stopping sets by

C = {(x, y) ∈ Dom : V(x, y) > G(x, y)},   D = {(x, y) ∈ Dom : V(x, y) = G(x, y)}.


4.2. ON THE METHOD 81

What can be said about the shape of C and D? The first observation is that if (x, y) ∈ D, then also (x′, y) ∈ D for x′ < x. The reason for this implication is the following. Directly from the formula for V, we see that when studying V(x, y), we want to find a martingale f starting from x for which EΦ(f∗n ∨ y) is relatively big while EΨ(fn) is relatively small. However, the process f∗n ∨ y increases only if it gets on the diagonal {(z, z) : z ≥ y}; otherwise it remains constant. On the other hand, since the function Ψ is strictly convex, the expectation EΨ(fn) increases at each step. This suggests that if the point (x, y) is too far from the diagonal, then it is not beneficial to send the process in the neighborhood of this set, since the potential gain obtained from the increase of EΦ(f∗n) would be totally consumed by the increase of EΨ(fn). Therefore, if (x, y) ∈ D (the point (x, y) is already too far from the diagonal), then the whole set [0, x] × {y} should be contained in the stopping set.

The further observation is the following. As usual, we expect that on the continuation set, the concavity condition is actually the linearity condition. That is (we pass from the letter V to U, as usual), if (x, y) ∈ C and x < y, then U should be linear along some small line segment (x − δ, x + δ) × {y}, and if (x, x) ∈ C, then we expect the equality Uy(x, x) = 0 (or perhaps rather Uy(x, x+) = 0; since (x, x) lies on the diagonal, we should treat the partial derivative as a one-sided object). However, if Φ is strictly increasing (which is very often the case), then the whole diagonal x = y lies in C: indeed, if for some x we have U(x, x) = G(x, x), then for any δ ∈ [0, x] we may write

G(x, x) = U(x, x) ≥ (1/2) (U(x − δ, x) + U(x + δ, x + δ)) ≥ (1/2) (G(x − δ, x) + G(x + δ, x + δ)),

which by a similar limiting argument as above gives Gy(x, x+) = Φ′(x) ≤ 0, a contradiction.

Putting all the above facts together, we arrive at the following general method of handling (4.8). There should be a curve γ : [0,∞) → [0,∞) satisfying γ(y) < y, which is a boundary between C and D: see Figure 4.1 below. That is, we expect

Figure 4.1. The structure of the continuation and the stopping regions.


that the continuation and the stopping regions should be of the form

C = {(x, y) ∈ Dom : x > γ(y)},   D = {(x, y) ∈ Dom : x ≤ γ(y)}.

We put U(x, y) = G(x, y) on D. To find V on C, we first note that for any y > 0 such that γ(y) > 0 we must have Ux(γ(y), y) = Gx(γ(y), y). Indeed, otherwise the concavity along the variable x or the majorization 2 would not be satisfied at some point lying in the neighborhood of (γ(y), y). Consequently, combining this observation with the linearity of V(·, y) on the continuation set, we see that necessarily

U(γ(y) + t, y) = G(γ(y), y) +Gx(γ(y), y)t

for t ∈ [0, y − γ(y)]. Now the condition Uy(x, x+) = 0 yields a differential equation for γ. In many cases this can be solved explicitly and we obtain the candidate for the special function: if it satisfies all the required conditions, then (4.8) follows.

4.3. Examples

Example 4.2. Let 1 ≤ p < ∞. We will find the best constant in the weak-type (p, p) inequality

(4.9) P(f∗n ≥ 1) ≤ cp Ef^p_n, n = 0, 1, 2, . . . ,

where f = (fn)n≥0 is a simple nonnegative martingale. We let G(x, y) = 1_{y≥1} − cp x^p and write down the formula for the Bellman function

V(x, y) = sup { P(f∗n ∨ y ≥ 1) − cp Ef^p_n },

the supremum taken over all simple nonnegative martingales f starting from x. If y ≥ 1, then the optimal choice is to take the constant martingale: then Ef^p_n is the least, while P(f∗n ∨ y ≥ 1) = 1 is the largest possible. This shows that

V(x, y) = 1 − cp x^p if y ≥ 1.

On the other hand, if y < 1, then we may write

V(x, y) = sup { P(f∗n ≥ 1) − cp Ef^p_n },

and hence V(x, y) depends only on the x variable: V(x, y) = v(x) for some v : [0, 1) → R, with v(0) = 0 and, by the expected continuity of V, v(1−) = 1 − cp. In the light of the above discussion, we expect the diagonal {(x, x) : x > 0} to lie in the continuation region; this implies v(x) > −cp x^p for x ∈ (0, 1). This in turn leads to the estimate V(x, y) > G(x, y) for all 0 < x ≤ y < 1 and consequently, v must be linear: v(x) = ax for some a. By the initial condition, we must have a ≤ 0; on the other hand, the majorization 2 enforces a ≥ 0. Hence v ≡ 0 and cp = 1: putting all the facts together, we obtain the candidate

U(x, y) = 0 if y < 1,   1 − x^p if y ≥ 1.

It is very easy to check that U enjoys the conditions 1, 2 and 3, so (4.9) holds with the constant 1. Obviously, this choice is optimal, which can be immediately verified by considering the constant martingale f ≡ 1.
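The check of the concavity condition 3 can also be probed by brute force. The sketch below is my own (sample sizes and ranges are arbitrary): it draws random admissible data (x, y) ∈ Dom, moves t1, t2 keeping x + ti ≥ 0 (the martingale is nonnegative), weights with α1t1 + α2t2 = 0, and records the smallest slack of the inequality.

```python
import numpy as np

p = 2.0

def U(x, y):
    # candidate from Example 4.2 (weak-type estimate, constant 1)
    return 0.0 if y < 1.0 else 1.0 - x ** p

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(20000):
    y = rng.uniform(0.0, 2.0)
    x = rng.uniform(0.0, y)            # (x, y) in Dom: 0 <= x <= y
    t1 = rng.uniform(-x, 2.0)          # keep x + t1 >= 0
    t2 = rng.uniform(-x, 2.0)
    if t1 * t2 >= 0:                   # need moves on both sides of x
        continue
    a1 = t2 / (t2 - t1)                # weights with a1*t1 + a2*t2 = 0
    a2 = 1.0 - a1
    rhs = a1 * U(x + t1, max(x + t1, y)) + a2 * U(x + t2, max(x + t2, y))
    worst = min(worst, float(U(x, y) - rhs))
print(worst)   # condition 3 demands that this slack is never negative
```

The recorded minimum stays at (numerical) zero, in accordance with the claim that U enjoys condition 3.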


Example 4.3. Now we will find the best constant in Doob's inequality

(4.10) E(f∗n)^p ≤ C_p^p Ef^p_n, n = 0, 1, 2, . . . ,

where 1 < p < ∞ is fixed. This inequality corresponds to the choice G(x, y) = y^p − C_p^p x^p and the associated Bellman function is given by

V(x, y) = sup E[ (f∗n ∨ y)^p − C_p^p f^p_n ],

where the supremum is taken over all n and all simple martingales starting from x. Using a standard homogenization argument, we see that

V(λx, λy) = λ^p V(x, y)

for any λ > 0 and any (x, y) ∈ Dom. Consequently, we may write V(x, y) = y^p v(x/y), for some unknown function v : [0, 1] → R to be found. Clearly, v(0) = 1. Furthermore, directly from the above abstract reasoning, we expect the existence of some number γ0 ∈ (0, 1) such that

v(x) = 1 − C_p^p x^p if x ∈ [0, γ0],   1 − C_p^p γ0^p − p C_p^p γ0^{p−1}(x − γ0) if x ∈ [γ0, 1].

The requirement Vy(1, 1) = 0 is equivalent to the equality pv(1) − v′(1) = 0 or, after some manipulations,

C_p^p = 1 / ((p − 1)(γ0^{p−1} − γ0^p)).

The right-hand side, considered as a function of γ0, attains its minimal value equal to (p/(p − 1))^p at the point γ0 = (p − 1)/p. This suggests taking Cp = p/(p − 1) and the special function

U(x, y) = y^p − (p/(p − 1))^p x^p if x ≤ ((p − 1)/p) y,   p y^{p−1} ( y − (p/(p − 1)) x ) if x > ((p − 1)/p) y.
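The verification of the required properties of this two-piece function can be sketched numerically. The code below is my own (p = 2 and the sampling grids are arbitrary choices); it tests the majorization U ≥ G on Dom and the concavity of the sections x ↦ U(x, y) via discrete second differences.

```python
import numpy as np

p = 2.0
Cp = p / (p - 1.0)                 # candidate constant, = 2 for p = 2

def U(x, y):
    if x <= (p - 1.0) / p * y:
        return y ** p - Cp ** p * x ** p
    return p * y ** (p - 1.0) * (y - Cp * x)

def G(x, y):
    return y ** p - Cp ** p * x ** p

ok_major, ok_concave = True, True
for y in np.linspace(0.5, 3.0, 6):
    xs = np.linspace(0.0, y, 201)          # Dom: 0 <= x <= y
    u = np.array([U(x, y) for x in xs])
    g = np.array([G(x, y) for x in xs])
    ok_major = ok_major and bool(np.all(u >= g - 1e-12))
    # second differences of x -> U(x, y) must be <= 0 (concavity in x)
    ok_concave = ok_concave and bool(np.all(u[2:] - 2 * u[1:-1] + u[:-2] <= 1e-12))
print(ok_major, ok_concave)
```

For p = 2 the slack U − G equals (y − 2x)² on the linear piece, so the majorization is exact there; the two pieces also match in value and first derivative along x = (p − 1)y/p, which is what makes the concatenation concave.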

One can verify readily that U has the desired properties and hence the inequality (4.10) holds with the constant p/(p − 1). Suppose that the best constant is equal to Cp and let V be the Bellman function associated with this sharp estimate. Fix δ > 0 and use 3 with x = y = 1, t1 = δ, t2 = −1/p and α1 = 1 − α2 = (1 + pδ)^{−1}

to get

V(1, 1) ≥ (1/(1 + pδ)) V(1 + δ, 1 + δ) + (pδ/(1 + pδ)) V(1 − 1/p, 1)
≥ ((1 + δ)^p/(1 + pδ)) V(1, 1) + (pδ/(1 + pδ)) G(1 − 1/p, 1),   (4.11)

where in the last passage we have used the homogeneity of V and the majorization property 2. (The reason why we have chosen these particular constants is given at the end.) This estimate can be rewritten in the equivalent form

((1 + pδ − (1 + δ)^p)/(pδ)) V(1, 1) ≥ 1 − C_p^p ((p − 1)/p)^p.

Letting δ → 0 makes the left-hand side vanish; this means Cp ≥ p/(p − 1) and we are done.


It remains to comment on the parameters which have led us to (4.11). This choice actually encodes the move of the almost-optimal pair (f, f∗) starting from (1, 1). It follows from the previous considerations that the continuation set is expected to be {(x, y) : ((p − 1)/p) y ≤ x ≤ y}. Therefore, a natural candidate for the pair (f, f∗) is the following: in the first step, either go to the boundary between the stopping and the continuation set, i.e., to ((p − 1)/p, 1), or move a little bit along the diagonal (i.e., jump to the point (1 + δ, 1 + δ)). If the first possibility occurs (which happens with probability α2), then stop; otherwise, continue using the same pattern (i.e., go to ((1 + δ)², (1 + δ)²) or to ((1 + δ)(p − 1)/p, 1 + δ), and so on). Then the stopping when reaching ∂C ∩ ∂D is reflected in (4.11) by replacing V with G, while the phrase "continue using the same pattern" is precisely the replacement of V(1 + δ, 1 + δ) with (1 + δ)^p V(1, 1).

4.4. An alternative approach: Carleson sequences

There is a different method, also based on the construction of a certain special function, which can be used to study maximal estimates. Suppose that Φ, Ψ : [0,∞) → [0,∞) are two given nondecreasing functions and suppose we are interested in showing the estimate

(4.12) EΦ(f∗n) ≤ EΨ(fn), n = 0, 1, 2, . . . ,

for any n and any nonnegative simple martingale f. We may and will assume that the filtration has the following tree-type structure:

(i) F0 = {∅, Ω}.
(ii) For any k = 0, 1, 2, . . . and any atom A of Fk with P(A) > 0, there is a finite collection A1, A2, . . ., Am, m ≥ 2, of pairwise disjoint elements of Fk+1 such that P(Aj) > 0 and ⋃_{j=1}^{m} Aj = A.

The second condition above simply asserts that any nontrivial atom of any Fk is split, in Fk+1, into at least two atoms of positive measure. Let At be the collection of all the atoms of the algebras F0, F1, F2, . . ..

Consider a function U : [0,∞) × [0, 1] → R which satisfies the following properties.

1 We have U(x, 1) ≤ 0 for any x ≥ 0.

2 We have U(x, 0) ≥ −Ψ(x) for all x ≥ 0.

3 For any (x, y) ∈ [0,∞) × [0, 1], any α1, α2 ∈ (0, 1), any x1, x2 ∈ [0,∞) and any y1, y2 ∈ [0, 1] such that α1 + α2 = 1, α1x1 + α2x2 = x and α1y1 + α2y2 ≤ y, we have

(4.13) U(x, y) ≥ α1U(x1, y1) + α2U(x2, y2) + Φ(x)(y − α1y1 − α2y2).

In particular, assuming y = α1y1 + α2y2 in 3, we see that U must be concave on [0,∞) × [0, 1]. Furthermore, the inequality (4.13) implies that if x ≥ 0, y ∈ [0, 1], and (αk)_{k=1}^{m} ⊂ (0, 1), (xk)_{k=1}^{m} ⊂ [0,∞) and (yk)_{k=1}^{m} satisfy

∑_{k=1}^{m} αk = 1,   ∑_{k=1}^{m} αk xk = x,   ∑_{k=1}^{m} αk yk ≤ y,

then

(4.14) U(x, y) ≥ ∑_{k=1}^{m} αk U(xk, yk) + Φ(x) ( y − ∑_{k=1}^{m} αk yk ).


To see this, apply first (4.13) to the weights α1, α′2 = α2 + α3 + . . . + αm and the points

x1,   x′2 = (α2x2 + . . . + αmxm)/(α2 + . . . + αm),   y1,   y′2 = (α2y2 + . . . + αmym)/(α2 + . . . + αm).

As a result, one gets

U(x, y) ≥ α1U(x1, y1) + α′2U(x′2, y′2) + Φ(x) ( y − ∑_{k=1}^{m} αk yk )

and it suffices to combine this with the estimate

U(x′2, y′2) ≥ ∑_{k=2}^{m} (αk/α′2) U(xk, yk),

which follows from the concavity of U.

The relation between the existence of the function U and the validity of (4.12)

is described in the two lemmas below.

Lemma 4.3. If there is U satisfying the properties 1, 2 and 3, then (4.12) holds true.

Proof. As usual, we will show that the composition of U with certain processes yields a supermartingale-type process. Fix a martingale f = (fn)n≥0 and fix a nonnegative integer n; we will prove the estimate

EΦ(f∗n) ≤ EΨ(fn).

We may modify (fk)k≥0 by setting fk = fn for k ≥ n; this will not affect the above inequality. The starting point is the following simple observation: since f∗n = max_{0≤k≤n} fk, for each ω ∈ Ω there is an integer m = m(ω) ≤ n and an atom A = A(ω) of Fm such that

f∗n(ω) = fm(ω) = (1/P(A)) ∫_A fn dP.

Of course, such an m may not be unique; in such a case, we take m(ω) to be the smallest possible. For any A ∈ At, let

E(A) = {ω ∈ Ω : A(ω) = A}.

By the very definition, we have E(A) ⊆ A, the sets (E(A))_{A∈At} are pairwise disjoint, and their union is the full Ω. Furthermore, since we consider the maximal function f∗n up to time n, we must have E(A) = ∅ if A is an atom of Fk, k ≥ n + 1. Define an adapted sequence g = (gk)k≥0 of random variables given as follows: for any k and any atom A of Fk,

gk|A = (1/P(A)) ∑_{B⊆A, B∈At} P(E(B)).


This sequence is a nonnegative supermartingale. Indeed, if k ≥ 0, A is an atom of Fk and A1, A2, . . ., Am are its children from Fk+1, then

E(gk+1|A) = ∑_{j=1}^{m} (P(Aj)/P(A)) · (1/P(Aj)) ∑_{B⊆Aj, B∈At} P(E(B))
= (1/P(A)) ∑_{B⊊A, B∈At} P(E(B)) = gk|A − P(E(A))/P(A) ≤ gk|A,

so the supermartingale property is verified. Now, keeping the above notation, we apply (4.14) with

αj = P(Aj)/P(A),   x = fk|A,   y = gk|A,   xj = fk+1|Aj,   yj = gk+1|Aj

for j = 1, 2, . . . , m. As a result, we get an estimate equivalent to

(4.15) ∫_A U(fk, gk) dP ≥ ∫_A E[U(fk+1, gk+1) | A] dP + ∫_{E(A)} Φ(fk) dP,

or

∫_A U(fk, gk) dP ≥ ∫_A U(fk+1, gk+1) dP + ∫_{E(A)} Φ(f∗n) dP,

by the very definition of E(A). Summing over all atoms A of Fk, we get

EU(fk, gk) ≥ EU(fk+1, gk+1) + ∑_{A atom of Fk} ∫_{E(A)} Φ(f∗n) dP.

Write such inequalities for k = 0, 1, 2, . . . , n and add them. As a result, we obtain

EU(f0, g0) ≥ EU(fn+1, gn+1) + EΦ(f∗n).

However, we have g0 = 1, gn+1 = 0 and fn+1 = fn. Therefore, the application of 1 and 2 gives

(4.16) 0 ≥ −EΨ(fn) + EΦ(f∗n),

which is the desired estimate.
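As a quick sanity check of the estimate just proved, one can enumerate a small dyadic martingale exactly. The sketch below is my own: it instantiates Φ(x) = x^p and Ψ(x) = C_p^p x^p with p = 2 and C_p = 2 (the Doob case studied later in Section 4.6.1), builds a nonnegative simple martingale on a depth-6 dyadic filtration from random leaf values, and compares both sides of (4.12).

```python
import numpy as np

p, Cp = 2.0, 2.0                         # Phi(x) = x^p, Psi(x) = Cp^p x^p
rng = np.random.default_rng(7)
depth = 6
leaves = rng.uniform(0.1, 3.0, 2 ** depth)   # f_depth on the 2^depth dyadic atoms

# rebuild f_k for k = depth-1, ..., 0 by averaging children (martingale property)
levels = [leaves]
while len(levels[-1]) > 1:
    v = levels[-1]
    levels.append((v[0::2] + v[1::2]) / 2.0)
levels.reverse()                          # levels[k] lists f_k on the atoms of F_k

# pathwise running maximum f*_n at each leaf; leaf j lies in atom j >> (depth-k) of F_k
fstar = np.array([max(levels[k][j >> (depth - k)] for k in range(depth + 1))
                  for j in range(2 ** depth)])

EPhi = np.mean(fstar ** p)                # E Phi(f*_n): atoms have equal measure
EPsi = Cp ** p * np.mean(leaves ** p)     # E Psi(f_n)
print(EPhi <= EPsi)
```

Any choice of positive leaf values must pass this check, since the averaging step enforces the martingale property exactly.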

Remark 4.1. If we do not assume the validity of the condition 1, the above approach yields the estimate

EΦ(f∗n) ≤ EΨ(fn) + U(Efn, 1).

This can be used, for example, in the study of the estimates where the controlling term depends only on the first moment of f: see e.g. Exercise 4.7.

Here is the result in the reverse direction.

Lemma 4.4. If (4.12) is valid, then there is a function U satisfying the conditions 1, 2 and 3.

Proof. For any x ≥ 0 and y ∈ [0, 1], set

(4.17) V(x, y) = sup { ∫_A Φ(f∗n) dP − EΨ(fn) },

where the supremum is taken over all n, all simple nonnegative martingales f = (fn)n≥0 starting from x and all measurable sets A with P(A) = y. Then the condition 1 is a direct consequence of (4.12), while 2 follows by considering the


constant martingale f ≡ x. It remains to establish the third property. Fix the parameters as in the statement and let A1, A2, f^{(1)}, f^{(2)} be arbitrary sets and simple positive martingales as in the definition of V(x1, y1) and V(x2, y2), respectively. We may assume that these processes are given on some disjoint probability spaces (Ω1, F1, P1) and (Ω2, F2, P2), equipped with some filtrations (F^1_n)_{n≥0}, (F^2_n)_{n≥0}. As usual, we splice these two spaces into one: Ω = Ω1 ∪ Ω2, F = σ(F1, F2), while the probability is given by P(A1 ∪ A2) = α1P1(A1) + α2P2(A2) if Ai ∈ Fi, i = 1, 2. We also splice the filtrations, setting F0 = {∅, Ω} and Fn = σ(F^1_{n−1}, F^2_{n−1}). Then the

process f = (fn)n≥0, given by

fn(ω) = x if n = 0,   f^{(1)}_{n−1}(ω) if ω ∈ Ω1 and n ≥ 1,   f^{(2)}_{n−1}(ω) if ω ∈ Ω2 and n ≥ 1,

is an adapted martingale. Consider any subset A ∈ F with P(A) = y such that A1 ∪ A2 ⊆ A. Then for any n ≥ 1,

V(x, y) ≥ ∫_A Φ(f∗n) dP − EΨ(fn)
= ∫_{A\(A1∪A2)} Φ(f∗n) dP + ∑_{i=1}^{2} [ ∫_{Ai} Φ(f∗n) dP − ∫_{Ωi} Ψ(fn) dP ]
≥ P(A \ (A1 ∪ A2)) · Φ(x) + ∑_{i=1}^{2} αi [ ∫_{Ai} Φ((f^{(i)}_{n−1})∗) dPi − ∫_{Ωi} Ψ(f^{(i)}_{n−1}) dPi ].

Taking the supremum over all n ≥ 1, all Ai and all f^{(i)} as above, we get the desired claim.

Thus we have reduced the problem of proving (4.12) to that of finding a Bellman function U satisfying the conditions 1, 2 and 3. As usual, a successful treatment of this approach requires the understanding of the concavity-type condition 3. To gain some intuition about this property, let us assume that the function U is of class C2. We have already mentioned above that the choice of y = α1y1 + α2y2

in (4.13) implies that U must be concave on [0,∞) × [0, 1]. Consequently, the Hessian matrix

D2U = ( Uxx  Uxy
        Uxy  Uyy )

must be nonpositive-definite. The second observation is that if we take x = x1 = x2

and y1 = y2 = y − δ ≥ 0, the inequality (4.13) becomes U(x, y) − U(x, y − δ) ≥ Φ(x)δ and hence in particular Uy(x, y) ≥ Φ(x). It turns out that these implications can be reversed. Let us state this as a separate lemma.

Lemma 4.5. Suppose that U is of class C2 and its Hessian matrix is nonpositive-definite. If Uy(x, 1) ≥ Φ(x) for any x, then (4.13) is satisfied.

Proof. Fix the parameters x, y, αi, xi and yi as in the statement of 3. The assumption on the Hessian matrix implies the concavity of U and hence

α1U(x1, y1) + α2U(x2, y2) ≤ U(x, α1y1 + α2y2).


Furthermore, the function z ↦ U(x, z) is also concave, so Uy(x, z) ≥ Uy(x, 1) ≥ Φ(x) for all z ∈ (0, 1). Therefore,

U(x, α1y1 + α2y2) − U(x, y) = −∫_{α1y1+α2y2}^{y} Uy(x, z) dz ≤ −Φ(x)(y − α1y1 − α2y2),

and we are done.

4.5. On solutions to the Monge–Ampère equation in planar domains

As we have seen in Lemma 4.5 above, a successful treatment of maximal estimates can be reduced to finding a function which satisfies the appropriate differential inequalities. As is usually the case when studying the optimal constants involved, one actually searches for those functions for which these differential inequalities actually become equalities. This, in particular, leads us to the question about the structure of the solutions to the equation

(4.18) detD2U(x, y) = Uxx(x, y)Uyy(x, y)− (Uxy(x, y))2 = 0,

the so-called Monge-Ampère equation. We will establish the following statement.

Theorem 4.2. Suppose that D is an open connected subset of R2 and assume that U : D → R is a C2 function satisfying (4.18) whose Hessian matrix does not vanish at any point (x, y) ∈ D. Then the domain D can be foliated, i.e., it can be written as a union of pairwise disjoint line segments along which the function U is linear. Furthermore, the partial derivatives of U are constant on the leaves of the foliation.

Proof. Fix a point z ∈ D. For each point (x, y) belonging to some neighborhood V of z there is a nonzero vector v = v(x, y) such that D2U(x, y)v = 0; we can assume that v forms a continuous vector field on V. Let ϕ = (ϕ1, ϕ2) be the integral curve of this vector field passing through (x, y): that is, ϕ is a function defined on some interval I containing 0 such that ϕ(0) = (x, y) and D2U(ϕ(t))ϕ′(t) = 0 for each t ∈ I. Observe that Ux and Uy are constant along ϕ: we have

(4.19) (d/dt) Ux(ϕ(t)) = Uxx(ϕ(t))ϕ′1(t) + Uxy(ϕ(t))ϕ′2(t) = (D2U(ϕ(t))ϕ′(t))1 = 0

and

(d/dt) Uy(ϕ(t)) = Uxy(ϕ(t))ϕ′1(t) + Uyy(ϕ(t))ϕ′2(t) = (D2U(ϕ(t))ϕ′(t))2 = 0.

Consequently, there is a function F such that Uy(x, y) = F(Ux(x, y)). A direct differentiation yields

Uxy(x, y) = F ′(Ux(x, y))Uxx(x, y),

which gives that on any integral curve of v the ratio Uxy/Uxx is constant. Hence, by (4.19), the integral curves must be line segments. A similar calculation shows that the function (x, y) ↦ U(x, y) − xUx(x, y) − yUy(x, y) is also constant along ϕ. Indeed, if ξ(t) = U(ϕ(t)) − ϕ1(t)Ux(ϕ(t)) − ϕ2(t)Uy(ϕ(t)), then we easily check that

ξ′(t) = −ϕ1(t) · (d/dt)Ux(ϕ(t)) − ϕ2(t) · (d/dt)Uy(ϕ(t)) = 0.

Thus we have proved that

U(x, y) = Ux(x, y)x+ Uy(x, y)y + c(x, y),


where c is some function which is constant on any integral curve of the vector field v. This is precisely the desired linearity property of U.

4.6. Examples

4.6.1. Doob's inequality revisited. Fix 1 < p < ∞ and let us see how the above approach yields the best constant Cp in the estimate

E(f∗n)^p ≤ C_p^p Ef^p_n,

under the assumption that f is a nonnegative, simple martingale. We have Φ(x) = x^p and Ψ(x) = C_p^p x^p, and we need to find a function U on [0,∞) × [0, 1] satisfying the conditions 1, 2 and 3. In this case, we will not need to exploit the structural properties of the solutions to the Monge–Ampère equation studied in the preceding section. Instead, directly from the proof of Lemma 4.4, we infer that we may search for U of the form

U(x, y) = x^p ϕ(y),

for some ϕ : [0, 1] → R to be found. Assuming that U is of class C2, we apply Lemma 4.5. The inequality Uy(x, 1) ≥ Φ(x) becomes

(4.20) ϕ′(1) ≥ 1.

Furthermore, the Hessian matrix is

( p(p − 1)x^{p−2}ϕ(y)   px^{p−1}ϕ′(y)
  px^{p−1}ϕ′(y)         x^p ϕ″(y) ),

so, by Sylvester's criterion, we need ϕ to be concave and satisfy

(p − 1)ϕϕ″ ≥ p(ϕ′)².

It is natural to expect that we actually have equality here: solving this equation, we get

ϕ(y) = α/(y + β)^{p−1},

for some α ∈ R, β > 0 to be found. Then the inequality (4.20) gives

−(p − 1)α(1 + β)^{−p} ≥ 1,

while the initial and the majorization conditions 1, 2 yield

α(1 + β)^{1−p} = ϕ(1) ≤ 0,   αβ^{1−p} = ϕ(0) ≥ −C_p^p. Combining these properties, we obtain

C_p^p ≥ −αβ^{1−p} ≥ (β + 1)^p/((p − 1)β^{p−1}).

The right-hand side, considered as a function of β > 0, attains its minimal value (p/(p − 1))^p for β = p − 1. Thus, if we take α = −p^p/(p − 1) and β = p − 1, then the above ϕ leads to the Bellman function

U(x, y) = −p^p x^p / ((p − 1)(y + p − 1)^{p−1}),

which gives Doob's estimate with the constant Cp = p/(p− 1).
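The hypotheses of Lemma 4.5 for this U can be confirmed numerically. In the sketch below (my own; p = 2 and the sampling ranges are arbitrary) the partial derivatives of U(x, y) = αx^p(y + β)^{1−p} are coded in closed form: the determinant of the Hessian vanishes identically (U solves the Monge–Ampère equation), the diagonal entries are nonpositive, and Uy(x, 1) equals x^p exactly.

```python
import numpy as np

p = 2.0
a = -p ** p / (p - 1.0)                  # alpha = -p^p/(p-1)
b = p - 1.0                              # beta = p-1

# closed-form partial derivatives of U(x, y) = a x^p (y + b)^(1-p)
def Uxx(x, y): return a * p * (p - 1) * x ** (p - 2) * (y + b) ** (1 - p)
def Uyy(x, y): return a * p * (p - 1) * x ** p * (y + b) ** (-1 - p)
def Uxy(x, y): return a * p * (1 - p) * x ** (p - 1) * (y + b) ** (-p)
def Uy(x, y):  return a * (1 - p) * x ** p * (y + b) ** (-p)

ok = True
for x in np.linspace(0.1, 3.0, 30):
    for y in np.linspace(0.0, 1.0, 11):
        det = Uxx(x, y) * Uyy(x, y) - Uxy(x, y) ** 2
        ok = ok and Uxx(x, y) <= 0 and Uyy(x, y) <= 0      # diagonal terms
        ok = ok and abs(det) < 1e-9 * (1 + abs(Uxx(x, y) * Uyy(x, y)))
        ok = ok and Uy(x, 1.0) >= x ** p - 1e-12           # Uy(x,1) >= Phi(x)
print(ok)
```

The cancellation in the determinant is exact: both Uxx·Uyy and (Uxy)² equal α²p²(p − 1)²x^{2p−2}(y + β)^{−2p}.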

It is very instructive to study how to extract extremal examples from the above function. The starting point is the proof of Lemma 4.3. If we want to obtain an (almost) equality in (4.16), we need to make sure that all the intermediate inequalities actually become (almost) equalities. The key property of the sequence


(fn, gn) considered there is that it must evolve along the linearity regions of U: this will guarantee that in (4.15) both sides will be (almost) equal. Let us be more precise here: we will construct an appropriate filtration and an adapted martingale on the probability space ([0, 1], F, | · |). We study the sharpness of the estimate

E(f∗n)^p ≤ (p/(p − 1))^p Ef^p_n,

in which (f∗n)^p is integrated over Ω, the set of full measure; thus we take g0 ≡ 1. Since U(f0, g0) = U(f0, 1) ≤ 0 no matter what f0 is, we can take f0 to be an arbitrary number; let us set f0 ≡ 1, for example. To construct the further variables fm and gm, we must split the space appropriately; the information about this splitting is encoded in the function U. The general rule is the following: suppose that we have successfully constructed the variables fk and gk, and we aim at constructing fk+1 and gk+1 at some atom A of Fk. Let x = fk|A and y = gk|A.

Step 1. The linearity paths of U. First let us find parameters α1, α2, x1, x2, y1, y2 for which both sides of (4.13) are equal, or almost equal. If y = 0, then we set fm = fk and gm = 0 for all m ≥ k, and the construction is over. So, suppose that y > 0. Observe that U is linear along any line segment of the form Ia = {(s, t) : s = a(t + p − 1)}, where a is an arbitrary positive parameter. So, if

Figure 4.2. The strip [0,∞) × [0, 1] is foliated, i.e., split into the line segments Ia = {(s, t) : s = a(t + p − 1), t ∈ [0, 1]}, a ≥ 0, along which U is linear.

y < 1, then we can take for (x1, y1) and (x2, y2) the endpoints of the line segment Ia passing through (x, y), and for α1, α2 the weights for which

α1(x1, y1) + α2(x2, y2) = (x, y).

On the contrary, if y = 1, then such a maneuver is impossible, since then (x, y) is already an endpoint of some Ia. However, it follows from the above construction that Uy(x, 1) = x^p, so if we take x1 = x2 = x, y1 = y2 = 1 − δ for some small positive δ and α1 = 1, α2 = 0, then

U(x, 1) ≈ U(x, 1 − δ) + x^p δ.

Here we make an error of size o(δ); the sum of such terms will be irrelevant as it will go to 0 when we let δ → 0. See below.

Step 2. Splitting. Having found the parameters x1, x2, y1, y2 and α1, α2, we split A = A1 ∪ A2 so that P(Ai) = αiP(A) (if α2 = 0, then take A2 = ∅). Next, set fk+1|A1 = x1, fk+1|A2 = x2, gk+1|A1 = y1 and gk+1|A2 = y2. After performing this


construction for all atoms A of Fk, we obtain the variables fk+1, gk+1 and the σ-algebra Fk+1 generated by them.

Summarizing, the construction is the following. We start from the point (1, 1) and then move along the leaves of the foliation so that the moves are of martingale type. Furthermore, the points of the form (x, 0) are absorbing: the evolution stops there; the points of the form (x, 1) lead to the point (x, 1 − δ) (where δ is a small parameter sent eventually to 0). Then the x-variable of the obtained process is precisely the extremal or almost extremal martingale we search for.

This construction can be expressed in a very convenient form in the Markovian language. The process (f, g) is a Markov process on the state space [0,∞) × [0, 1] with a distribution uniquely determined by the following conditions.

(i) We have (f0, g0) ≡ (1, 1).
(ii) Any state of the form (x, 1) leads to (x, 1 − δ).
(iii) Any state of the form (x, y) with y ∈ (0, 1) leads to the endpoints of the line segment Ia passing through (x, y), i.e., to one of the points
((p − 1)x/(y + p − 1), 0),   (px/(y + p − 1), 1),
with probabilities 1 − y and y, respectively.
(iv) The states of the form (x, 0) are absorbing.

Step 3. Final calculation. Now, directly from the above construction, we see that the process (fn) evolves only at even moves. It starts from 1 and then it jumps (at time 2) to (p − 1)/(p − δ) or to p/(p − δ). If the first possibility occurs, then the process stops (and then f∗ = 1). In the second case, the evolution continues: the process goes (at time 4) to (p/(p − δ)) · ((p − 1)/(p − δ)) (and stays there forever; then f∗ = p/(p − δ)), or goes to (p/(p − δ))², and the pattern of the movement is repeated. Summarizing, we see that the random variable f2n takes values in the set

{ (p − 1)/(p − δ),   (p/(p − δ)) · (p − 1)/(p − δ),   (p/(p − δ))² · (p − 1)/(p − δ),   . . . ,   (p/(p − δ))^{n−1} · (p − 1)/(p − δ),   (p/(p − δ))^n }

and

P( f2n = (p/(p − δ))^k · (p − 1)/(p − δ) ) = (1 − δ)^k δ,   k = 0, 1, 2, . . . , n − 1.

Furthermore, we have f∗2n = (p/(p − δ))^k = ((p − δ)/(p − 1)) f2n on the set where f2n = (p/(p − δ))^k · (p − 1)/(p − δ), k = 0, 1, 2, . . . , n − 1, and f∗2n = (p/(p − δ))^n when f2n = (p/(p − δ))^n. Consequently,

E(f∗2n)^p ≥ ∑_{k=0}^{n−1} (p/(p − δ))^{pk} (1 − δ)^k δ
= ((p − δ)/(p − 1))^p Ef^p_{2n} − ((p − δ)/(p − 1))^p (p/(p − δ))^{pn} (1 − δ)^n.   (4.21)

We easily check that the function h(δ) = (p/(p − δ))^p (1 − δ) satisfies h(0) = 1, h′(0) = 0 and h″(0) = (1 − p)/p < 0, so h(δ) < 1 for δ > 0 sufficiently close to 0. Pick such a δ and let n → ∞ to see that the last term in (4.21) tends to 0 as n → ∞. Since Ef^p_{2n} ≥ Ef^p_0 = 1, we see that the ratio ‖f∗2n‖p/‖f2n‖p can be made larger


than (p − δ)/(p − 1) − δ if n is sufficiently big. Since δ can be as small as we wish, the constant p/(p − 1) is indeed the best.
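The final calculation can be reproduced in closed form, since both expectations in (4.21) are geometric sums with common ratio r = (p/(p − δ))^p (1 − δ). A minimal sketch (my own; p = 2, δ = 0.01 and n = 10^6 are arbitrary choices):

```python
# closed-form evaluation of both sides of (4.21) for the extremal example
p, delta, n = 2.0, 0.01, 10 ** 6

r = (p / (p - delta)) ** p * (1 - delta)       # common ratio; r < 1 for small delta
s = delta * (1 - r ** n) / (1 - r)             # sum_{k<n} (p/(p-d))^{pk} (1-d)^k d

EPhi = s + r ** n                              # E (f*_{2n})^p, tail term included
Efp = ((p - 1) / (p - delta)) ** p * s + r ** n   # E f_{2n}^p
ratio = EPhi / Efp
print(ratio)
```

For these parameters the tail r^n is negligible and the ratio is close to ((p − δ)/(p − 1))^p = 1.99² = 3.9601, just below the optimal (p/(p − 1))^p = 4, as the sharpness argument predicts.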

4.6.2. Weak-type estimates revisited. Suppose that 1 ≤ p < ∞ and let us try to apply the approach to the study of

P(f∗n ≥ 1) ≤ c_p^p Ef^p_n, n = 0, 1, 2, . . . ,

where, as usual, we assume that f = (fn)n≥0 is a simple and nonnegative martingale. We will actually restrict ourselves to the case p > 1: the boundary case p = 1 follows from a simple limiting argument.

The inequality is of the form (4.12), with Φ(x) = 1_{x≥1} and Ψ(x) = c_p^p x^p, and hence we need to construct a function U which satisfies the conditions 1, 2 and 3. The condition 2 gives U(x, 0) ≥ −c_p^p x^p. Let us now try to exploit the third condition. To this end, we will apply Lemma 4.5: assume for a moment that U is of class C2 (as we shall see, the function we will end up with will not have this property). The concavity condition will be satisfied if the Hessian matrix of U is nonpositive-definite and Uy(x, 1) ≥ 1_{x≥1}. Let us suppose that both these requirements hold true with equality, so

(4.22) det D2U = 0 on (0,∞) × (0, 1),
(4.23) Uy(x, 1) = 1_{x≥1} for x ≥ 0.

We will also assume that the partial derivatives Ux and Uy can be extended to continuous functions on ([0,∞) × [0, 1]) \ {(1, 1)} (the reason for the exclusion of the point (1, 1) is evident, see (4.23)). By the argumentation from the previous section, the set [0,∞) × [0, 1] can be foliated, i.e., written as a union of pairwise disjoint line segments, along which U is linear. Furthermore, the partial derivatives of U are constant along the line segments of the foliation. Let us now split the further reasoning into a few intermediate steps.

Step 1. Special parts of the domain. Directly from the abstract formula (4.17), we see that we can take U(x, y) = y − c_p^p x^p when x ≥ 1, since this is true for the optimal Bellman function V. A similar argument shows that we may assume that U(0, y) = 0 for all y ∈ [0, 1] and U(x, 0) = −c_p^p x^p for all x ≥ 0. Thus, the set ({0} ∪ [1,∞)) × [0, 1] is foliated into the line segments ({x} × [0, 1])_{x∈{0}∪[1,∞)} and obviously, along each such segment, the function U is linear and the partial derivatives of U are constant. Note that by 1, we must have 1 − c_p^p = U(1, 1) ≤ 0, so cp ≥ 1.

Step 2. The case cp = 1. Under this assumption, we have U(1, 1) = 0 and hence, by concavity of U, the equality U(x, 1) = 0 must hold for all x ∈ (0, 1) as well. Take such an x and consider the line segment I of the foliation having (x, 1) as one of its endpoints. Then the second endpoint of I cannot be of the form (x′, 0) for some x′ > 0. Indeed, by Theorem 4.2, we have Uy(z) = 0 at any point z lying on the left of I. This in particular would mean that the function y ↦ U(x′, y) would be constant: a contradiction, since U(x′, 0) = −(x′)^p and U(x′, 1) = 0.

Thus, the second endpoint of I is of the form (0, y), y ∈ [0, 1]. Take the leaf of the foliation starting from (0, 0); it follows from the above argumentation that the second endpoint must be equal to (1, 1). This, in turn, implies that the remaining leaves of the foliation must be the collection of all line segments joining (1, 1) with some (x′, 0).


4.6. EXAMPLES 93

Having identified the foliation, we immediately find the formula for U. Indeed, U vanishes on {0} × [0, 1] and [0, 1] × {1}, which implies that U is zero on the triangle with vertices (0, 0), (1, 1), (0, 1). Furthermore, if 0 ≤ y < x < 1, then by the above considerations,

U(x, y) = (1 − y) U((x − y)/(1 − y), 0) + y U(1, 1) = −(x − y)^p (1 − y)^{1−p}.

Summarizing, we have discovered the Bellman function

U(x, y) =
    0,                                  if 0 ≤ x ≤ y ≤ 1,
    −(x − y)^p (1 − y)^{1−p},           if 0 ≤ y < x < 1,
    y − x^p,                            if x ≥ 1.

The conditions 1° and 2° hold obviously. To check 3°, note that U is concave on each of the sets {(x, y) : 0 ≤ x ≤ y ≤ 1}, {(x, y) : 0 < y < x < 1} and [1,∞) × [0, 1] (the claim for the second set follows from the observation that det D²U = 0 and U_xx ≤ 0 there). However, U is continuous on [0,∞) × [0, 1] and of class C¹ in the interior of this set, so it is concave on its full domain. Finally, we have U_y(x, 1) = 1_{x≥1} and hence, repeating the proof of Lemma 4.5, the property 3° is satisfied. This proves the weak-type bound with the constant 1. The estimate is sharp, which can be easily seen by considering the constant martingale f ≡ 1.

Figure 4.3. The foliation in the case c_p = 1.
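As a quick sanity check of the function just found, one can test the continuity across the region boundaries and the claimed concavity numerically. The sketch below is illustrative only: it fixes p = 2 (the argument above works for any p > 1) and checks midpoint concavity at random pairs of points.

```python
import random

p = 2  # illustrative exponent; the formula above holds for general p > 1

def U(x, y):
    # The Bellman function for the case c_p = 1, region by region
    if x >= 1:
        return y - x**p
    if x <= y:  # triangle 0 <= x <= y <= 1
        return 0.0
    # region 0 <= y < x < 1
    return -((x - y)**p) * (1 - y)**(1 - p)

# Continuity across the boundaries x = y and x = 1
for y in [0.0, 0.3, 0.7, 0.99]:
    assert abs(U(y + 1e-9, y)) < 1e-6
    assert abs(U(1 - 1e-9, y) - U(1 + 1e-9, y)) < 1e-6

# Midpoint concavity: U(midpoint) >= average of U at the endpoints
rng = random.Random(0)
for _ in range(10000):
    x1, y1 = 3 * rng.random(), rng.random()
    x2, y2 = 3 * rng.random(), rng.random()
    mid = U((x1 + x2) / 2, (y1 + y2) / 2)
    assert mid >= (U(x1, y1) + U(x2, y2)) / 2 - 1e-9

print("continuity and midpoint concavity verified")
```

Such a check is of course no substitute for the proof above, but it catches transcription errors in the piecewise formula at once.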

Step 3. The case c_p > 1. Though the reasoning presented in the previous step fully answers the question about the best constant in the weak-type estimate, it is instructive to look at the shape of the Bellman function in the case of bigger c_p. The first observation is that in such a case there is no line segment of the foliation which would join points of the form (x, 1) and (0, y). Suppose, on the contrary, that such a leaf exists, and let I_{x′} be the line segment of the foliation starting from (0, 0): it must end at some point (x′, 1) with x′ ∈ (x, 1]. Then U must vanish on the triangle with vertices (0, 0), (x′, 1) and (0, 1). Indeed, by (4.23), the partial derivative U_y is zero in the triangle. Furthermore, we have U_x ≥ 0 on {0} × [0, 1], since otherwise we would have U(x, y) < −c_p^p x^p for some (x, y) from the triangle. Thus, again by the fact that the partial derivatives are constant along the leaves of the foliation, we have U_x ≥ 0 on the triangle and hence U(x, 1) ≥ 0 for each x ∈ [0, x′]. Applying 1°, we get that we actually have equality here, which implies that U_x vanishes on {0} × [0, 1] and hence U is zero on the triangle. This, in turn, gives that x′ < 1, since otherwise the concavity along the horizontal half-line [0,∞) × {1} would be violated. So, there is a line segment I_{x′′} starting from some point (x′′, 0), 0 < x′′ < x′, and ending at some point (x′′′, 1) with x′′′ ∈ (x′, 1). But U_y is constant along the leaves of the foliation, so U_y is zero on the part of [0, 1] × [0, 1] contained between the segments I_{x′} and I_{x′′}. This is a contradiction: we would have U(x′′, y) = −c_p^p (x′′)^p for small y and U(x′′, y) = 0 for y close to 1.

94 4. INEQUALITIES FOR MAXIMAL FUNCTIONS

Step 4. The case c_p > 1, continuation. In the previous step we have shown that for each (x, 1), x ∈ (0, 1), there is a point x_0 = x_0(x) ∈ (0, 1) such that the leaf starting from (x, 1) ends at (x_0, 0). Let x_1 = sup_{x∈(0,1)} x_0(x). Then, by an argument similar to that used above, we have U_y = 0 on [0, x_1) × [0, 1] and hence U(x, y) = −c_p^p x^p there. This, in turn, implies x_1 < 1, since otherwise the concavity of U would fail. Consequently, the set [0, x_1] × [0, 1] is foliated into vertical line segments {x} × [0, 1], which implies that x_0(x) = x_1 for all x ∈ [x_1, 1). Therefore, U_y = 0 and U_x = −p c_p^p x_1^{p−1} on the triangle with vertices (x_1, 0), (1, 1) and (x_1, 1), and hence

U(x, y) = −c_p^p x_1^p − p c_p^p x_1^{p−1} (x − x_1)

there. However, U must be continuous at (1, 1), which enforces the equality

(4.24)    (p − 1) c_p^p x_1^p − p c_p^p x_1^{p−1} = 1 − c_p^p.

A straightforward analysis shows that the left-hand side, considered as a function of x_1 ∈ [0, 1], is decreasing and contains 1 − c_p^p in its range, so the point x_1 is uniquely determined.
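The uniqueness claim is easy to confirm numerically: since the left-hand side of (4.24) decreases on [0, 1], a bisection search finds the root. The sketch below uses the purely illustrative values p = 2 and c_p = 1.2 (the optimal constant itself is determined elsewhere); for these values the root happens to be exactly x_1 = 1/6.

```python
# Illustrative parameters only; any p > 1 and c_p > 1 work the same way.
p, c = 2.0, 1.2

def lhs(x1):
    # Left-hand side of (4.24): (p-1) c_p^p x1^p - p c_p^p x1^(p-1)
    return (p - 1) * c**p * x1**p - p * c**p * x1**(p - 1)

target = 1 - c**p  # right-hand side of (4.24)

# lhs is decreasing on [0, 1] with lhs(0) = 0 >= target >= -c_p^p = lhs(1),
# so bisection locates the unique root x1 in [0, 1].
lo, hi = 0.0, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    if lhs(mid) > target:
        lo = mid
    else:
        hi = mid
x1 = (lo + hi) / 2
print(x1)  # for p = 2, c_p = 1.2 the root is 1/6
```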

Step 5. Final part of the domain. It is clear that on the triangle with vertices (x_1, 0), (1, 0), (1, 1), the only choice for the foliation is the family of line segments joining (x, 0) with (1, 1), x ∈ (x_1, 1). Therefore, for any (x, y) from the triangle,

U(x, y) = (1 − y) U((x − y)/(1 − y), 0) + y U(1, 1) = −c_p^p (x − y)^p (1 − y)^{1−p} + y(1 − c_p^p).

Figure 4.4. The foliation in the case c_p > 1.

So, we have obtained the following candidate for the Bellman function:

U(x, y) =
    −c_p^p x^p,                                      if 0 ≤ x ≤ x_1,
    −c_p^p x_1^p − p c_p^p x_1^{p−1} (x − x_1),      if x_1 ≤ x ≤ 1 and (x − x_1)/(1 − x_1) ≤ y ≤ 1,
    −c_p^p (x − y)^p (1 − y)^{1−p} + y(1 − c_p^p),   if 0 ≤ y ≤ (x − x_1)/(1 − x_1) < 1,
    y − c_p^p x^p,                                   if x ≥ 1.
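Before verifying the conditions, it is reassuring to check numerically that the four regions glue together continuously. A minimal sketch, with the illustrative parameters p = 2 and c_p = 1.2, for which (4.24) gives x_1 = 1/6 exactly:

```python
# Illustrative parameters; x1 is the unique root of (4.24) for these values.
p, c = 2.0, 1.2
x1 = 1 / 6

def U(x, y):
    # The candidate Bellman function for the case c_p > 1, region by region
    if x >= 1:
        return y - c**p * x**p
    if x <= x1:
        return -c**p * x**p
    if y >= (x - x1) / (1 - x1):  # above the segment joining (x1, 0) and (1, 1)
        return -c**p * x1**p - p * c**p * x1**(p - 1) * (x - x1)
    # triangle below that segment; the term y (1 - c_p^p) comes from y U(1, 1)
    return -c**p * (x - y)**p * (1 - y)**(1 - p) + y * (1 - c**p)

eps = 1e-9
# continuity across x = x1 and across x = 1 ...
for y in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(U(x1 - eps, y) - U(x1 + eps, y)) < 1e-6
    assert abs(U(1 - eps, y) - U(1 + eps, y)) < 1e-6
# ... and across the segment y = (x - x1)/(1 - x1)
for x in [0.3, 0.5, 0.9]:
    yb = (x - x1) / (1 - x1)
    assert abs(U(x, yb - eps) - U(x, yb + eps)) < 1e-6

print("candidate U is continuous across all region boundaries")
```

The check across x = 1 would fail without relation (4.24), which is precisely how that relation was derived above.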

As in the case c_p = 1, we easily check that this function satisfies the conditions 1°, 2° and 3°. Therefore, the application of the Bellman function method yields the estimate

(4.25)    P(f_n^* ≥ 1) ≤ c_p^p E f_n^p + U(E f_n, 1).


This bound is sharp, no matter what E f_n is. Indeed, if a := E f_n ∈ [0, x_1] or a ≥ 1, then the constant martingale f ≡ a gives equality in (4.25). Of course, this optimal choice is encoded in the foliation: the point (E f_n, 1) is the endpoint of a vertical segment, and the martingale evolution along this segment is constant in the x-variable (see the similar reasoning described in the study of the L^p estimates). If a ∈ (x_1, 1), then the foliation illustrated in Figure 4.4 suggests introducing the martingale which behaves as follows. It starts from a and then, at each step, it either jumps to x_1 and stays there forever, or moves a little bit to the right, until it reaches 1. More precisely, the distribution of the process is given by

P(f_n = a(1 + δ)^n) = 1 − P(f_n = x_1) = (a − x_1)/(a(1 + δ)^n − x_1),    n = 0, 1, 2, …, N,

where δ and N are chosen so that a(1 + δ)^N = 1. This martingale satisfies

P(f_N^* ≥ 1) = (a − x_1)/(1 − x_1),    E f_N^p = x_1^p · (1 − a)/(1 − x_1) + 1 · (a − x_1)/(1 − x_1)

and hence, by (4.24),

P(f_N^* ≥ 1) − c_p^p E f_N^p = U(a, 1).
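The extremal example is easy to test numerically. The sketch below uses illustrative choices throughout: p = 2 and c_p = 1.2 (so that x_1 = 1/6 by (4.24)), starting point a = 1/2 and N = 20 steps; it verifies the martingale property and the displayed identities.

```python
# Illustrative parameters: p, c_p, the root x1 of (4.24), start a, horizon N.
p, c, x1 = 2.0, 1.2, 1 / 6
a, N = 0.5, 20
delta = (1 / a) ** (1 / N) - 1  # chosen so that a (1 + delta)^N = 1

def prob_up(n):
    # P(f_n = a (1 + delta)^n); the remaining mass sits at x1
    return (a - x1) / (a * (1 + delta) ** n - x1)

# Martingale property: E f_n = a at every step n
for n in range(N + 1):
    top = a * (1 + delta) ** n
    assert abs(prob_up(n) * top + (1 - prob_up(n)) * x1 - a) < 1e-12

# The two displayed identities for the terminal variable f_N;
# the event {f_N^* >= 1} coincides with {f_N = 1}
P_max = prob_up(N)
Efp = prob_up(N) * 1.0 ** p + (1 - prob_up(N)) * x1 ** p
assert abs(P_max - (a - x1) / (1 - x1)) < 1e-9
assert abs(Efp - (x1 ** p * (1 - a) / (1 - x1) + (a - x1) / (1 - x1))) < 1e-9

# Sharpness: P(f_N^* >= 1) - c_p^p E f_N^p = U(a, 1),
# with U(a, 1) given by the linear formula from Step 4
U_a1 = -c ** p * x1 ** p - p * c ** p * x1 ** (p - 1) * (a - x1)
assert abs(P_max - c ** p * Efp - U_a1) < 1e-9
print("martingale property and sharpness identities verified")
```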

4.7. Problems

1. For a given K > 1, find the best constant L(K) in the logarithmic inequality

‖f_n^*‖_1 ≤ K E f_n log f_n + L(K),    n = 0, 1, 2, …,

where f is assumed to be a nonnegative martingale.

2. For a fixed 1 < p < ∞, find the best constant C_p in the inequality

‖f_n^*‖_1 ≤ C_p ‖f_n‖_p,    n = 0, 1, 2, ….

3. For a fixed 0 < p < 1, find the best constant C_p in the inequality

‖f_n^*‖_p ≤ C_p ‖f_n‖_1,    n = 0, 1, 2, ….

4. (Coifman and Rochberg [7]) A nonnegative random variable w is said to be an A_1 weight if there is a finite constant c such that w^* ≤ cw almost surely. The smallest c with this property is called the A_1 characteristic of w and denoted by [w]_{A_1}. Prove that if 0 < p < 1 and f is a nonnegative random variable, then (f^*)^p is an A_1 weight satisfying [(f^*)^p]_{A_1} ≤ (1 − p)^{−1}.

5. (Nikolidakis [14]) For a fixed 1 < p < ∞, find the best constant c_p in the inequality

‖f_n^*‖_{p,∞} ≤ c_p ‖f_n‖_{p,∞},    n = 0, 1, 2, ….


Bibliography

[1] Bellman, R. Dynamic programming. Reprint of the 1957 edition. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 2010.
[2] Burkholder, D. L. Boundary value problems and sharp inequalities for martingale transforms, Ann. Probab. 12 (1984), 647–702.
[3] Burkholder, D. L. An extension of a classical martingale inequality, Probability Theory and Harmonic Analysis (J.-A. Chao and A. W. Woyczyński, eds.), Marcel Dekker, New York, 1986, 21–30.
[4] Carleman, T. Sur les fonctions quasi-analytiques, Conférences faites au cinquième congrès des mathématiciens scandinaves, Helsinki (1923), 181–196.
[5] Carlson, F. Une inégalité, Ark. Mat. Astron. Fys. 25 (1934), 1–5.
[6] Chow, Y. S. and Robbins, H. A martingale system theorem and applications, Proc. Fourth Berkeley Symp. on Math., Stat. and Prob. 1 (1960), 93–104.
[7] Coifman, R. R. and Rochberg, R. Another characterization of BMO, Proc. Amer. Math. Soc. 79 (1980), 249–254.
[8] Cox, A. Optimal stopping and applications, available at www.maths.bath.ac.uk/mapamgc/optimalstopping v5.pdf.
[9] Dubins, L. E., Shepp, L. A. and Shiryaev, A. N. Optimal stopping rules and maximal inequalities for Bessel processes, Theory Probab. Appl. 38 (1993), 226–261.
[10] Hardy, G. H., Littlewood, J. E. and Pólya, G. Inequalities. Reprint of the 1952 edition. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 1988.
[11] Hill, T. P. and Kertz, R. P. Stop rule inequalities for uniformly bounded sequences of random variables, Trans. Amer. Math. Soc. 278 (1983), 197–207.
[12] Krengel, U. and Sucheston, L. Semiamarts and finite values, Bull. Amer. Math. Soc. 83 (1977), 745–747.
[13] Krengel, U. and Sucheston, L. On semiamarts, amarts, and processes with finite value, Probability on Banach Spaces (J. Kuelbs, ed.), Marcel Dekker, New York, 1978.
[14] Nikolidakis, E. N. Extremal problems related to maximal dyadic-like operators, J. Math. Anal. Appl. 369 (2010), 377–385.
[15] Osękowski, A. Sharp LlogL inequality for differentially subordinated martingales, Illinois J. Math. 52 (2008), no. 3, 745–756.
[16] Osękowski, A. Sharp moment inequalities for differentially subordinated martingales, Studia Math. 201 (2010), 103–131.
[17] Osękowski, A. A sharp one-sided bound for the Hilbert transform, Proc. Amer. Math. Soc. 141 (2013), 873–882.
[18] Robbins, H. Optimal stopping, Amer. Math. Monthly 77 (1970), 333–343.
[19] Strzelecki, M. A note on sharp one-sided bounds for the Hilbert transform, Proc. Amer. Math. Soc. 144 (2016), 1171–1181.
[20] Suh, Y. A sharp weak-type (p, p) inequality (p > 2) for martingale transforms and other subordinate martingales, Trans. Amer. Math. Soc. 357 (2005), 1545–1564.
[21] Tomaszewski, B. The best constant in a weak-type H¹-inequality, complex analysis and applications, Complex Variables 4 (1984), 35–38.
