
A Course in Stochastic Processes

José Manuel Corcuera

Universitat de Barcelona (UB)


Contents

1 Basic Concepts and Definitions
   1.1 Definition of a Stochastic Process
   1.2 Examples
   1.3 Equivalence of Stochastic Processes
   1.4 Continuity Concepts
   1.5 Main Classes of Random Processes
   1.6 Some Applications
       1.6.1 Model of a Germination Process
       1.6.2 Modeling of Soil Erosion Effect
       1.6.3 Brownian motion
   1.7 Problems

2 Discrete time martingales
   2.1 Conditional expectation
   2.2 Discrete Time Martingales
   2.3 Stopping times
   2.4 Optional stopping theorem
   2.5 The Snell envelope and optimal stopping
   2.6 Decomposition of supermartingales
   2.7 Martingale convergence theorems

3 Markov Processes
   3.1 Discrete-time Markov chains
       3.1.1 Class structure
       3.1.2 Hitting times and absorption probabilities
       3.1.3 Recurrence and transience
       3.1.4 Recurrence and transience in a random walk
       3.1.5 Invariant distributions
       3.1.6 Limiting behaviour
       3.1.7 Ergodic theorem
   3.2 Poisson process
       3.2.1 Exponential distribution
       3.2.2 Poisson process
   3.3 Continuous-time Markov chains
       3.3.1 Definition and examples
       3.3.2 Computing the transition probability
       3.3.3 Jump chain and holding times
       3.3.4 Birth processes
       3.3.5 Class structure
       3.3.6 Hitting times
       3.3.7 Recurrence and transience
       3.3.8 Invariant distributions
       3.3.9 Convergence to equilibrium


Chapter 1

Basic Concepts and Definitions

1.1 Definition of a Stochastic Process

A stochastic process with state space S is a collection of random variables {Xt, t ∈ T} defined on the same probability space (Ω, F, P). The set T is called its parameter set. If T = N = {0, 1, 2, ...}, the process is said to be a discrete parameter process. If T is not countable, the process is said to have a continuous parameter. In the latter case the usual examples are T = R+ = [0, ∞) and T = [a, b] ⊂ R. The index t represents time, and one thinks of Xt as the state or the position of the process at time t. The state space is R in most usual examples, and then the process is said to be real-valued. There will also be examples where S is N, the set Z of all integers, or a finite set.

For every fixed ω ∈ Ω , the mapping

t −→ Xt(ω)

defined on the parameter set T , is called a realization, trajectory, sample pathor sample function of the process.

Let {Xt, t ∈ T} be a real-valued stochastic process and {t1 < · · · < tn} ⊂ T. Then the probability distribution Pt1,...,tn = P ∘ (Xt1, ..., Xtn)^{−1} of the random vector

(Xt1 , ..., Xtn) : Ω −→ Rn

is called a finite-dimensional marginal distribution of the process {Xt, t ∈ T}. A theorem due to Kolmogorov establishes the existence of a stochastic process associated with a given family of finite-dimensional distributions satisfying a consistency condition. Let Ω = R^T, the collection of all functions ω := (ω(t))t∈T from T to R, and define Xt(ω) = ω(t). A set

C = {ω : Xt1(ω) ∈ B1, ..., Xtn(ω) ∈ Bn}


for {t1 < · · · < tn} ⊂ T and B1, ..., Bn ∈ B(R) is called a cylinder set. Consider the σ-field F generated by the cylinder sets.

Theorem 1.1.1 Consider a family of probability measures

{Pt1,...,tn , t1 < · · · < tn, n ≥ 1, ti ∈ T}

such that:

1. Pt1,...,tn is a probability on R^n.

2. (Consistency condition) If {tk1 < · · · < tkm} ⊂ {t1 < · · · < tn}, then Ptk1,...,tkm is the marginal of Pt1,...,tn corresponding to the indexes k1, ..., km.

Then there exists a unique probability P on F which has the family {Pt1,...,tn} as finite-dimensional marginal distributions.

1.2 Examples

A real-valued process {Xt, t ≥ 0} is called a second order process provided E(Xt²) < ∞ for all t ≥ 0. The mean and the covariance function of a second order process {Xt, t ≥ 0} are defined by

mX(t) = E(Xt),
ΓX(s, t) = Cov(Xs, Xt) = E((Xs − mX(s))(Xt − mX(t))).

The variance of the process {Xt, t ≥ 0} is defined by

σ2X(t) = ΓX(t, t) = Var(Xt).

Example 1.2.1 Let X and Y be independent random variables. Consider the stochastic process with parameter t ∈ [0, ∞)

Xt = tX + Y.

The sample paths of this process are lines with random coefficients. The finite-dimensional marginal distributions are given by

P(Xt1 ≤ x1, ..., Xtn ≤ xn) = ∫_R FX( min_{1≤i≤n} (xi − y)/ti ) PY(dy).
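The one-dimensional case of this formula, P(Xt ≤ x) = ∫_R FX((x − y)/t) PY(dy), is easy to check by simulation. A minimal sketch, assuming (hypothetically) that X and Y are standard normal, in which case Xt = tX + Y ∼ N(0, t² + 1):

```python
import math
import random

random.seed(0)

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical setup: X, Y independent standard normal, X_t = tX + Y.
t, x = 2.0, 1.0
n = 200_000

# Left-hand side: empirical P(X_t <= x).
lhs = sum(t * random.gauss(0, 1) + random.gauss(0, 1) <= x for _ in range(n)) / n

# Right-hand side: E[F_X((x - Y)/t)], estimated by Monte Carlo over Y.
rhs = sum(phi((x - random.gauss(0, 1)) / t) for _ in range(n)) / n

# Both estimate P(N(0, t^2 + 1) <= x) = phi(x / sqrt(t^2 + 1)).
exact = phi(x / math.sqrt(t * t + 1.0))
```

The second estimator integrates the known distribution FX against samples of Y, so it has a smaller variance than the direct one.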

Example 1.2.2 Consider the stochastic process

Xt = A cos(ϕ + λt),

where A and ϕ are independent random variables such that E(A) = 0, E(A²) < ∞, and ϕ is uniformly distributed on [0, 2π]. This is a second order process with

mX(t) = 0,
ΓX(s, t) = (1/2) E(A²) cos λ(t − s).


Example 1.2.3 Arrival process: Consider the process of arrivals of customers at a store, and suppose the experiment is set up to measure the interarrival times. Suppose that the interarrival times are positive random variables X1, X2, .... Then, for each t ∈ [0, ∞), we put Nt = k if and only if the integer k is such that

X1 + ... + Xk ≤ t < X1 + ... + Xk+1,

and we put Nt = 0 if t < X1. Then Nt is the number of arrivals in the time interval [0, t]. Notice that for each t ≥ 0, Nt is a random variable taking values in the set S = N. Thus, {Nt, t ≥ 0} is a continuous time process with values in the state space N. The sample paths of this process are non-decreasing, right continuous, and they increase by jumps of size 1 at the times X1 + ... + Xk. On the other hand, Nt < ∞ for all t ≥ 0 if and only if

∑_{k=1}^{∞} Xk = ∞.
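The construction of Nt from the interarrival times can be sketched as follows; the exponential interarrival law and its rate are assumptions for illustration (the text allows any positive interarrival distribution):

```python
import random

random.seed(1)

def arrivals_up_to(t, rate):
    """Simulate N_t: count the arrivals in [0, t] built from i.i.d.
    exponential interarrival times X_1, X_2, ... (an assumed law)."""
    total, count = 0.0, 0
    while True:
        total += random.expovariate(rate)  # next interarrival time X_k
        if total > t:
            return count
        count += 1

# With exponential interarrivals, N_t is Poisson(rate * t), so E N_t = rate * t.
t, rate, trials = 10.0, 2.0, 20_000
samples = [arrivals_up_to(t, rate) for _ in range(trials)]
mean_nt = sum(samples) / len(samples)
```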

Example 1.2.4 Consider a discrete time stochastic process {Xn, n = 0, 1, 2, ...} with a finite number of states, S = {1, 2, 3}. The dynamics of the process is as follows. You move from state 1 to state 2 with probability 1. From state 3 you move either to 1 or to 2 with equal probability 1/2, and from 2 you jump to 3 with probability 1/3, otherwise you stay at 2. This is an example of a Markov chain.
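A minimal simulation of this three-state chain. The dictionary P below encodes the transition probabilities stated above; the comparison value 2/3 is the weight of state 2 in the invariant distribution π = (1/9, 2/3, 2/9), which one can check solves π = πP (invariant distributions are treated in Chapter 3):

```python
import random

random.seed(2)

# Transition probabilities of the three-state chain described above.
P = {
    1: [(2, 1.0)],
    2: [(3, 1 / 3), (2, 2 / 3)],
    3: [(1, 0.5), (2, 0.5)],
}

def step(state):
    """One move of the chain: draw the next state from the row P[state]."""
    u, acc = random.random(), 0.0
    for nxt, p in P[state]:
        acc += p
        if u < acc:
            return nxt
    return P[state][-1][0]

# Long-run fraction of time spent in each state.
steps = 100_000
counts = {1: 0, 2: 0, 3: 0}
state = 1
for _ in range(steps):
    state = step(state)
    counts[state] += 1
```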

A real-valued stochastic process {Xt, t ∈ T} is said to be Gaussian or normal if its finite-dimensional marginal distributions are multi-dimensional Gaussian laws. The mean mX(t) and the covariance function ΓX(s, t) of a Gaussian process determine its finite-dimensional marginal distributions. Conversely, suppose that we are given an arbitrary function m : T −→ R, and a symmetric function Γ : T × T −→ R which is nonnegative definite, that is,

∑_{i,j=1}^{n} Γ(ti, tj) ai aj ≥ 0

for all ti ∈ T, ai ∈ R, and n ≥ 1. Then there exists a Gaussian process withmean m and covariance function Γ.
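One standard way to realize such a process at finitely many times t1, ..., tn is to factor the covariance matrix (Γ(ti, tj)) as L L^T (Cholesky) and set (Xt1, ..., Xtn) = L Z with Z standard normal. A sketch, using the hypothetical choice m = 0 and Γ(s, t) = min(s, t) on a small grid:

```python
import math
import random

random.seed(3)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive
    definite matrix a (standard Cholesky recursion)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Hypothetical choice: Gamma(s, t) = min(s, t) on the grid t = 1, 2, 3.
ts = [1.0, 2.0, 3.0]
gamma = [[min(s, t) for t in ts] for s in ts]
L = cholesky(gamma)

def sample():
    """(X_t1, ..., X_tn) = L Z has covariance matrix gamma."""
    z = [random.gauss(0, 1) for _ in ts]
    return [sum(L[i][k] * z[k] for k in range(len(ts))) for i in range(len(ts))]

n = 50_000
xs = [sample() for _ in range(n)]
cov_12 = sum(x[0] * x[1] for x in xs) / n  # should be near min(1, 2) = 1
```

For this particular Γ the factor L turns out to be the lower-triangular matrix of ones, so the sampler reproduces the partial-sum construction of Brownian motion on the grid.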

Example 1.2.5 Let X and Y be random variables with joint Gaussian distri-bution. Then the process Xt = tX + Y , t ≥ 0, is Gaussian with mean andcovariance functions

mX(t) = t E(X) + E(Y),
ΓX(s, t) = st Var(X) + (s + t) Cov(X, Y) + Var(Y).

Example 1.2.6 Gaussian white noise: Consider a stochastic process {Xt, t ∈ T} such that the random variables Xt are independent and with the same law N(0, σ²). Then this process is Gaussian, with mean and covariance functions

mX(t) = 0

ΓX(s, t) = σ² δst.


1.3 Equivalence of Stochastic Processes

Definition 1.3.1 A stochastic process {Xt, t ∈ T} is equivalent to another stochastic process {Yt, t ∈ T} if for each t ∈ T

P{Xt = Yt} = 1.

We also say that {Xt, t ∈ T} is a version of {Yt, t ∈ T}. Two equivalent processes may have quite different sample paths.

Example 1.3.1 Let ξ be a nonnegative random variable with continuous dis-tribution function. Set T = [0,∞). The processes

Xt = 0,
Yt = 0 if ξ ≠ t, and Yt = 1 if ξ = t,

are equivalent but their sample paths are different.

Definition 1.3.2 Two stochastic processes {Xt, t ∈ T} and {Yt, t ∈ T} are said to be indistinguishable if X·(ω) = Y·(ω) for all ω ∉ N, with P(N) = 0.

Two equivalent stochastic processes with right continuous sample paths are indistinguishable.

Two equivalent discrete time stochastic processes are also indistinguishable.

1.4 Continuity Concepts

Definition 1.4.1 A real-valued stochastic process {Xt, t ∈ T}, where T is an interval of R, is said to be continuous in probability if, for any ε > 0 and every t ∈ T,

lim_{s→t} P(|Xt − Xs| > ε) = 0.

Definition 1.4.2 Fix p ≥ 1. Let {Xt, t ∈ T} be a real-valued stochastic process, where T is an interval of R, such that E(|Xt|^p) < ∞ for all t ∈ T. The process {Xt, t ∈ T} is said to be continuous in mean of order p if

lim_{s→t} E(|Xt − Xs|^p) = 0.

Continuity in mean of order p implies continuity in probability. However, continuity in probability (or in mean of order p) does not necessarily imply that the sample paths of the process are continuous.

In order to show that a given stochastic process has continuous sample paths it is enough to have suitable estimates on the moments of the increments of the process. The following continuity criterion by Kolmogorov provides a sufficient condition of this type:


Proposition 1.4.1 (Kolmogorov continuity criterion) Let {Xt, t ∈ T} be a real-valued stochastic process, where T is a finite interval. Suppose that there exist constants α > 1 and p > 0 such that

E(|Xt −Xs|p) ≤ cT |t− s|α (1.1)

for all s, t ∈ T . Then, there exists a version of the process Xt, t ∈ T withcontinuous sample paths.

Condition (1.1) also provides some information about the modulus of continuity of the sample paths of the process, mT(ω, δ) := sup_{s,t∈T, |t−s|<δ} |Xt(ω) − Xs(ω)|. In particular, if T = [a, b], for each ε > 0 there exists a random variable Gε such that, with probability one,

mT(ω, δ) ≤ Gε δ^{(α−1)/p − ε}.

Moreover, E(Gε^p) < ∞.

1.5 Main Classes of Random Processes

1. Processes with independent increments. Let X := {Xt, t ∈ T} be a real-valued stochastic process on a probability space (Ω, F, P), where T ⊂ R is an interval.

Definition 1.5.1 The process X is said to have independent increments if for any finite subset {t0 < · · · < tn} ⊂ T, the increments

Xt0 , Xt1 −Xt0 , ..., Xtn −Xtn−1

are independent random variables.

Remark 1.5.1 If T = [0, ∞), it follows from the definition that all the marginal distributions are determined by X0 and Xt − Xs, s < t ∈ T. The most important processes with independent increments are the Poisson process and the Wiener process (or Brownian motion).

2. Markov processes.

Definition 1.5.2 The process X is said to be a Markov process with state space (R, B(R)) if for any {t1 < · · · < tn} ⊂ T and B ∈ B(R),

P(Xtn ∈ B | Xt1, ..., Xtn−1) = P(Xtn ∈ B | Xtn−1) (a.s.) (1.2)

Remark 1.5.2 Property (1.2) is called the Markov property.

3. Strictly stationary processes.


Definition 1.5.3 The process X is said to be strictly stationary if for any {t1 < · · · < tn} ⊂ T with {t1 + h < · · · < tn + h} ⊂ T, and any B1, ..., Bn ∈ B(R),

P (Xt1+h ∈ B1, ..., Xtn+h ∈ Bn) = P (Xt1 ∈ B1, ..., Xtn ∈ Bn)

4. Wide sense stationary processes.

Definition 1.5.4 A second order process X is said to be wide sense stationaryif for all t ∈ T and t + h ∈ T

E(Xt+h) = E(Xt)

and

Cov(Xt+h, Xs+h) = Cov(Xt, Xs).

5. Martingales.

Definition 1.5.5 A process X such that E(|Xt|) < ∞ for all t ∈ T is said to be a martingale if, for every {t1 < · · · < tn} ⊂ T,

E(Xtn|Xt1 , ..., Xtn−1) = Xtn−1 (a.s.).

If

E(Xtn | Xt1, ..., Xtn−1) ≤ Xtn−1 (a.s.),

the process X is a supermartingale. Finally if

E(Xtn |Xt1 , ..., Xtn−1) ≥ Xtn−1 (a.s.),

it is said to be a submartingale.

1.6 Some Applications

1.6.1 Model of a Germination Process

Let p be the probability that a seed fails to germinate. If there is germination, assume that the germination time, T, follows an exponential distribution with parameter λ. Then

P (T ≤ t) = P (T ≤ t|germination)P (germination)

= (1− e−λt)(1− p).

If we have N plants and Xt is the number of them that have germinated by time t, then

Xt = ∑_{i=1}^{N} 1_{Ti ≤ t},

where Ti is the germination time of the i-th plant. If we assume that {Ti}_{i=1}^{N} is an i.i.d. sequence, then

Xt ∼ Bi(N, (1 − e^{−λt})(1 − p)).
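A quick simulation check of this binomial law; the parameter values below are arbitrary choices for illustration:

```python
import math
import random

random.seed(4)

def germination_count(N, p, lam, t):
    """Number of plants (out of N) germinated by time t: each seed fails
    with probability p; otherwise its germination time is Exp(lam)."""
    count = 0
    for _ in range(N):
        if random.random() >= p:                # the seed germinates...
            if random.expovariate(lam) <= t:    # ...and has done so by time t
                count += 1
    return count

# Empirical mean of X_t versus the binomial mean N (1 - e^{-lam t})(1 - p).
N, p, lam, t = 50, 0.2, 1.0, 2.0
trials = 20_000
mean_xt = sum(germination_count(N, p, lam, t) for _ in range(trials)) / trials
expected = N * (1 - math.exp(-lam * t)) * (1 - p)
```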


1.6.2 Modeling of Soil Erosion Effect

Let Y1, Y2, ..., Yn, ... be the sequence of annual yields of a given crop in an area without erosion, and let Z1, Z2, ..., Zn, ... be a sequence of factors with common support in (0, 1]. Then we can model the crop yield in year n as

Xn = Yn ∏_{i=1}^{n} Zi,

where we assume that (Yn) and (Zn) are sequences of i.i.d. random variables, and that the sequences Y and Z are independent of each other.
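A sketch of one way to simulate this model; the uniform laws chosen for Yn and Zn below are assumptions for illustration, under which independence gives E(Xn) = E(Y1) (E(Z1))^n:

```python
import random

random.seed(5)

def crop_yield_path(n, base_mean=100.0):
    """One sample path X_1, ..., X_n of the eroded yields, assuming
    (hypothetically) Y_i uniform around base_mean and Z_i uniform on
    (0.9, 1.0]."""
    erosion = 1.0
    path = []
    for _ in range(n):
        erosion *= random.uniform(0.9, 1.0)       # cumulative factor prod Z_i
        y = random.uniform(0.8, 1.2) * base_mean  # erosion-free yield Y_n
        path.append(y * erosion)
    return path

# E X_n = E(Y_1) * (E Z_1)^n = 100 * 0.95^n under these assumed laws.
n, trials = 10, 20_000
mean_xn = sum(crop_yield_path(n)[-1] for _ in range(trials)) / trials
expected = 100.0 * 0.95 ** n
```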

1.6.3 Brownian motion

In 1827 Robert Brown noticed the irregular but ceaseless motion of pollen particles suspended in a liquid. Assume that ξt is one coordinate, say x, of the position of a pollen particle. Then (ξt) is a continuous-time process. If we assume that ξt − ξs has a distribution depending only on t − s, independent of what happened before s, and that (ξt) is a continuous process, then it can be proved that

ξt ∼ N(µt, σ2t),

where µ ∈ R and σ ∈ R+. If µ = 0 and σ = 1 we have the so-called standard Brownian motion. Since

(ξt − ξs)² = ξt² + ξs² − 2 ξt ξs,

we have that

|t − s| = E((ξt − ξs)²) = E(ξt²) + E(ξs²) − 2 Cov(ξt, ξs),

and

Cov(ξt, ξs) = (1/2)(t + s − |t − s|) = min(s, t).
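This covariance is easy to check by simulation, approximating standard Brownian motion by sums of independent N(0, dt) increments on a grid (a sketch; the grid size and sample count are arbitrary choices):

```python
import random

random.seed(6)

def brownian_pair(s, t, steps=100):
    """Sample (xi_s, xi_t) of a standard Brownian motion, 0 < s < t,
    by summing independent N(0, dt) increments on a grid of [0, t]."""
    dt = t / steps
    xi, xi_s = 0.0, None
    for k in range(1, steps + 1):
        xi += random.gauss(0.0, dt ** 0.5)
        if xi_s is None and k * dt >= s:
            xi_s = xi
    return xi_s, xi

# Empirical Cov(xi_s, xi_t) should be close to min(s, t) = 1.
s, t, n = 1.0, 2.0, 20_000
pairs = [brownian_pair(s, t) for _ in range(n)]
cov_st = sum(a * b for a, b in pairs) / n  # both means are zero
```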


1.7 Problems

1. Let Ω = [0, 1], F = B([0, 1]) and let P be the uniform probability measure on Ω. Consider the process Xt(ω) = tω, t ∈ [0, 1]. Find the trajectories and the one- and two-dimensional distributions.

2. Let Xt = A cos αt + B sin αt and Yt = R cos(αt + Θ), where α > 0 and A and B are independent random variables with distribution N(0, σ²). R and Θ are also independent random variables, with Θ ∼ Unif(0, 2π) and R having density

   fR(x) = (x/σ²) exp(−x²/(2σ²)) 1_(0,+∞)(x).

   Show that the processes {Xt}t≥0 and {Yt}t≥0 have the same law.

3. Let X1 and X2 be independent random variables defined on the same probability space (Ω, F, P), both with the standard normal distribution. Let {Yt}t≥0 be the stochastic process defined by

   Yt = (X1 + X2) t.

   Determine the finite-dimensional distributions of the process. If A is the set of nonnegative trajectories of the process, compute P(A).

4. Let {Yt}t≥0 be the stochastic process defined by

   Yt = X + αt, α > 1,

   where X is a random variable with law N(0, 1). Let D ⊂ [0, +∞) be a finite or countably infinite set. Determine:

   (a) P(Yt = 0 for at least one t ∈ D),
   (b) P(Yt = 0 for at least one t ∈ [1, 2]).

5. Let X and Y be random variables defined on (Ω, F, P), where Y ∼ N(0, 1). Let {Zt}t≥0 be the stochastic process

   Zt = X + t(Y + t).

   Let A be the set of nondecreasing trajectories of this process. Determine P(A).

6. Let {Xi}i=1,...,n and {Yi}i=1,...,n be random variables such that E[Xi] = E[Yi] = 0, Var[Xi] = Var[Yi] = σi² < +∞ and E[XiXj] = E[YiYj] = E[XiYj] = 0 for all i ≠ j. Let {Zt}t≥0 be the stochastic process defined by

   Zt = ∑_{i=1}^{n} (Xi cos(λit) + Yi sin(λit)).

   Determine its covariance function. Is {Zt}t≥0 wide sense stationary?

7. We say that a process {Nt}t≥0 is a homogeneous Poisson process with parameter λ > 0 if it has independent increments, N0 = 0, and for all 0 < t1 < t2,

   P(Nt2 − Nt1 = k) = ((λ(t2 − t1))^k / k!) e^{−λ(t2−t1)}, k ∈ Z+.

   Let {Xk}k≥0 be a sequence of i.i.d. random variables with E[Xk] = 0 and Var[Xk] = σ² < +∞, and let {Nt}t≥0 be a homogeneous Poisson process with parameter λ > 0, independent of {Xk}k≥0. Is the process {Yt}t≥0 defined by Yt = X_{Nt} wide sense stationary? Is it strictly stationary?

8. Let {Yt}t≥0 be the stochastic process defined by

   Yt = α sin(βt + X),

   where α, β > 0 and X ∼ N(0, 1). Compute E[Yt] and E[YsYt]. Is the process {Yt}t≥0 wide sense stationary?


Chapter 2

Discrete time martingales

We will first introduce the notion of conditional expectation of a random variable X with respect to a σ-field B ⊂ F in a probability space (Ω, F, P).

2.1 Conditional expectation

Consider an integrable random variable X defined on a probability space (Ω, F, P), and a σ-field B ⊂ F. We define the conditional expectation of X given B (denoted by E(X|B)) to be any integrable random variable Z that satisfies the following two properties:

(i) Z is measurable with respect to B.

(ii) For all A ∈ B, E(Z1A) = E(X1A)

It can be proved that there exists a unique (up to modifications on sets of probability zero) random variable satisfying these properties. That is, if Z and Z′ satisfy the above properties, then Z = Z′, P-almost surely.

Property (ii) implies that for any bounded and B-measurable random vari-able Y we have

E(E(X|B)Y ) = E(XY ) (2.1)

Example 2.1.1 Consider the particular case where the σ-field B is generated by a finite partition {B1, ..., Bm}. In this case, the conditional expectation E(X|B) is a discrete random variable that takes the constant value E(X|Bj) on each set Bj:

E(X|B) = ∑_{j=1}^{m} (E(X 1Bj) / P(Bj)) 1Bj.
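The partition formula lends itself to a quick numerical check. A sketch, assuming a hypothetical setup with U uniform on [0, 1), X = U², and the partition Bj = {j/4 ≤ U < (j + 1)/4}, j = 0, ..., 3:

```python
import random

random.seed(7)

# Hypothetical setup: U uniform on [0, 1), X = U^2, and B generated by
# the four sets B_j = {j/4 <= U < (j+1)/4}.
n = 100_000
us = [random.random() for _ in range(n)]
xs = [u * u for u in us]

def cond_exp(j):
    """Empirical E(X 1_{B_j}) / P(B_j): the value E(X|B) takes on B_j."""
    hits = [x for u, x in zip(us, xs) if j / 4 <= u < (j + 1) / 4]
    return sum(hits) / len(hits)

values = [cond_exp(j) for j in range(4)]
# Exact values here: E(U^2 | B_j) = ((j+1)^3 - j^3) / 48.
```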

Here are some properties of the conditional expectations in the general case:

1. The conditional expectation is linear: E(aX + bY |B) = aE(X|B) + bE(Y |B).


2. A random variable and its conditional expectation have the same expecta-tion: E(E(X|B)) = E(X). This follows from property (ii) taking A = Ω.

3. If X and B are independent, then E(X|B) = E(X). In fact, the constantE(X) is clearly B-measurable, and for all A ∈ B we have E(X1A) =E(X)E(1A) = E(E(X)1A).

4. If X is B-measurable, then E(X|B) = X.

5. If Y is a bounded and B-measurable random variable, then E(Y X|B) =Y E(X|B). In fact, the random variable Y E(X|B) is integrable and B-measurable, and for all A ∈ B we have

E(E(X|B)Y 1A) = E(XY 1A),

where the equality follows from (2.1). This property means that B-measurable random variables behave as constants and can be factorized out of the conditional expectation with respect to B. This property also holds if X, Y ∈ L2(Ω).

6. Given two σ-fields C ⊂ B, then E(E(X|B)|C) = E(X|C)

7. Consider two random variables X and Z, such that Z is B-measurable and X is independent of B. Consider a measurable function h(x, z) such that the composition h(X, Z) is an integrable random variable. Then we have E(h(X, Z)|B) = E(h(X, z))|z=Z. That is, we first compute the expectation E(h(X, z)) for any fixed value z of the random variable Z and, afterwards, we replace z by Z.

Conditional expectation has properties similar to those of ordinary expecta-tion. For instance, the following monotone property holds:

X ≤ Y ⇒ E(X|B) ≤ E(Y |B).

This implies |E(X|B)| ≤ E(|X| |B).

Jensen's inequality also holds. That is, if ϕ is a convex function defined on an open interval, such that E(|ϕ(X)|) < ∞, then

ϕ(E(X|B)) ≤ E(ϕ(X)|B). (2.2)

In particular, if we take ϕ(x) = |x|p with p ≥ 1, and E(|X|p) <∞, we obtain

|E(X|B)|p ≤ E(|X|p|B),

hence, taking expectations,

E(|E(X|B)|p) ≤ E(|X|p). (2.3)

We can define the conditional probability of an event C ∈ F given a σ-field B as

P (C|B) = E(1C |B).


Suppose that the σ-field B is generated by a finite collection of random variables Y1, ..., Ym. In this case, we will denote the conditional expectation of X given B by E(X|Y1, ..., Ym), and this conditional expectation is the mean of the conditional distribution of X given Y1, ..., Ym. The conditional distribution of X given Y1, ..., Ym is a family of distributions p(dx|y1, ..., ym) parametrized by the possible values y1, ..., ym of the random variables Y1, ..., Ym, such that for all a < b

P(a ≤ X ≤ b | Y1, ..., Ym) = ∫_a^b p(dx|Y1, ..., Ym).

Then, this implies that

E(X|Y1, ..., Ym) = ∫_R x p(dx|Y1, ..., Ym).

Notice that the conditional expectation E(X|Y1, ..., Ym) is a function g(Y1, ..., Ym) of the variables Y1, ..., Ym, where

g(y1, ..., ym) = ∫_R x p(dx|y1, ..., ym).

In particular, if the random variables X, Y1, ..., Ym have a joint density f(x, y1, ..., ym), then the conditional distribution has the density

f(x|y1, ..., ym) = f(x, y1, ..., ym) / ∫_{−∞}^{∞} f(x, y1, ..., ym) dx,

and

E(X|Y1, ..., Ym) = ∫_{−∞}^{∞} x f(x|Y1, ..., Ym) dx.

The set of all square integrable random variables, denoted by L2, is a Hilbert space with the scalar product

〈Z, Y 〉 = E(ZY ).

Then the set of square integrable and B-measurable random variables, denoted by L2(Ω, B, P), is a closed subspace of L2(Ω, F, P). Hence, given a random variable X such that E(X²) < ∞, the conditional expectation E(X|B) is the projection of X on the subspace L2(Ω, B, P). In fact, we have:

(i) E(X|B) belongs to L2(Ω,B, P ) because it is a B-measurable random vari-able and it is square integrable due to (2.3).

(ii) X − E(X|B) is orthogonal to the subspace L2(Ω, B, P). In fact, for all Z ∈ L2(Ω, B, P) we have, using property 5,

E[(X − E(X|B))Z] = E(XZ) − E(E(X|B)Z)
= E(XZ) − E(E(XZ|B)) = 0.


As a consequence, E(X|B) is the random variable in L2(Ω,B, P ) that minimizesthe mean square error:

E[(X − E(X|B))²] = min_{Y ∈ L2(Ω,B,P)} E[(X − Y)²].

This follows from the relation

E[(X − Y )2] = E[(X − E(X|B))2] + E[(E(X|B)− Y )2],

and it means that the conditional expectation is the optimal estimator of X given the σ-field B.

2.2 Discrete Time Martingales

In this section we consider a probability space (Ω,F , P ) and a nondecreasingsequence of σ- fields

F0 ⊆ F1 ⊆ · · · ⊆ Fn ⊆ · · ·

contained in F. The sequence F := {Fn}n≥0 is called a filtration. A sequence of real random variables M = {Mn}n≥0 is called a martingale with respect to the filtration F if:

(i) For each n ≥ 0, Mn is Fn-measurable (that is, M is adapted to the filtration F).

(ii) For each n ≥ 0, E(|Mn|) < ∞.

(iii) For each n ≥ 0,

E(Mn+1|Fn) = Mn.

The sequence M = {Mn}n≥0 is called a supermartingale (or submartingale) if property (iii) is replaced by E(Mn+1|Fn) ≤ Mn (or E(Mn+1|Fn) ≥ Mn).

Notice that the martingale property implies that E(Mn) = E(M0) for alln ≥ 0.

On the other hand, condition (iii) can also be written as

E(∆Mn|Fn−1) = 0,

for all n ≥ 1, where ∆Mn = Mn −Mn−1.

Example 2.2.1 Suppose that {ξn, n ≥ 1} are i.i.d. centered random variables. Set M0 = 0 and Mn = ξ1 + ... + ξn, for n ≥ 1. Then (Mn) is a martingale w.r.t. (Fn), with Fn = σ(ξ1, ..., ξn) for n ≥ 1, and F0 = {∅, Ω}. In fact, E(Mn+1|Fn) = Mn + E(ξn+1|Fn) = Mn + E(ξn+1) = Mn.

Example 2.2.2 Suppose that {ξn, n ≥ 1} are i.i.d. random variables such that P(ξn = 1) = 1 − P(ξn = −1) = p, where 0 < p < 1. Then

Mn = ((1 − p)/p)^{ξ1+...+ξn},


M0 = 1, is a martingale w.r.t. (Fn), with Fn = σ(ξ1, ..., ξn) for n ≥ 1 and F0 = {∅, Ω}. In fact,

E(Mn+1|Fn) = E( ((1 − p)/p)^{ξ1+...+ξn+1} | Fn )
= ((1 − p)/p)^{ξ1+...+ξn} E( ((1 − p)/p)^{ξn+1} | Fn )
= Mn E( ((1 − p)/p)^{ξn+1} ) = Mn,

since E( ((1 − p)/p)^{ξn+1} ) = p · (1 − p)/p + (1 − p) · p/(1 − p) = (1 − p) + p = 1.
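Since the martingale property gives E(Mn) = E(M0) = 1 for every n, a simulation provides a quick sanity check of Example 2.2.2. A sketch, with the hypothetical choice p = 0.4:

```python
import random

random.seed(8)

def martingale_value(n, p):
    """M_n = ((1 - p)/p)^(xi_1 + ... + xi_n) for i.i.d. steps
    xi_k = +/-1 with P(xi_k = 1) = p."""
    s = sum(1 if random.random() < p else -1 for _ in range(n))
    return ((1 - p) / p) ** s

# The martingale property forces E(M_n) = E(M_0) = 1 for every n.
p, n, trials = 0.4, 10, 200_000
mean_mn = sum(martingale_value(n, p) for _ in range(trials)) / trials
```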

In the two previous examples, Fn = σ(M0, ..., Mn) for all n ≥ 0; that is, (Fn) is the filtration generated by the process (Mn). Usually, when the filtration is not mentioned, we will take Fn = σ(M0, ..., Mn) for all n ≥ 0. This is always possible due to the following result:

Proposition 2.2.1 Suppose (Mn)n≥0 is a martingale with respect to a filtration(Gn). Let Fn = σ(M0, ...,Mn) ⊆ Gn. Then (Mn)n≥0 is a martingale with respectto the filtration (Fn).

Proof. We have, by the properties 6 and 4 of the conditional expectation,

E(Mn+1|Fn) = E(E(Mn+1|Gn)|Fn) = E(Mn|Fn) = Mn.

Some elementary properties of martingales:

1. If (Mn) is a martingale, then for all m ≥ n we have

E(Mm|Fn) = Mn.

In fact, for m > n + 1,

E(Mm|Fn) = E(E(Mm|Fm−1)|Fn) = E(Mm−1|Fn) = ... = E(Mn+1|Fn) = Mn.

2. (Mn) is a submartingale if and only if (−Mn) is a supermartingale.

3. If (Mn) is a martingale and ϕ is a convex function, defined on an open interval, such that E(|ϕ(Mn)|) < ∞ for all n ≥ 0, then (ϕ(Mn)) is a submartingale. In fact, by Jensen's inequality for the conditional expectation we have

E(ϕ (Mn+1) |Fn) ≥ ϕ (E(Mn+1|Fn)) = ϕ (Mn) .

In particular, if (Mn) is a martingale such that E(|Mn|^p) < ∞ for all n ≥ 0 and for some p ≥ 1, then (|Mn|^p) is a submartingale.


4. If (Mn) is a submartingale and ϕ is a convex and increasing function such that E(|ϕ(Mn)|) < ∞ for all n ≥ 0, then (ϕ(Mn)) is a submartingale. In fact, by Jensen's inequality for the conditional expectation we have

E(ϕ (Mn+1) |Fn) ≥ ϕ (E(Mn+1|Fn)) ≥ ϕ (Mn) .

In particular, if (Mn) is a submartingale, then (Mn+) and (Mn ∨ a) are submartingales.

Suppose that (Fn)n≥0 is a given filtration. We say that (Hn)n≥1 is a predictable sequence of random variables if, for each n ≥ 1, Hn is Fn−1-measurable. We define the martingale transform of (Mn) by (Hn) as the sequence

dictable sequence of random variables if for each n ≥ 1, Hn is Fn−1-measurable.We define the martingale transform of (Mn) by (Hn) as the sequence

(H · M)n = M0 + ∑_{j=1}^{n} Hj ΔMj, n ≥ 1.

Proposition 2.2.2 If (Mn)n≥0 is a (sub)martingale and (Hn)n≥1 is a bounded (nonnegative) predictable sequence, then the martingale transform ((H · M)n) is a (sub)martingale.

Proof. Clearly, for each n ≥ 0 the random variable (H · M)n is Fn-measurable and integrable. On the other hand, if n ≥ 0 we have

E (∆(H ·M)n+1|Fn) = E (Hn+1∆Mn+1|Fn)= Hn+1E (∆Mn+1|Fn) = 0.

Remark 2.2.1 We may think of Hn as the amount of money a gambler bets at time n. Suppose that ΔMn = Mn − Mn−1 is the amount of money a gambler can win or lose at every step of the game if the bet is 1 Euro, and M0 is the initial capital of the gambler. Then Mn is the fortune of the gambler at time n if in each step the bet is 1 Euro, and (H · M)n is the fortune of the gambler if he uses the gambling system (Hn). The fact that (Mn) is a martingale means that the game is fair. So, the previous proposition tells us that if a game is fair, it remains fair regardless of the gambling system (Hn).

Example 2.2.3 Suppose that M0 = 0 and Mn = ξ1 + ... + ξn, n ≥ 1, where {ξn, n ≥ 1} are i.i.d. random variables such that P(ξn = 1) = P(ξn = −1) = 1/2. (Mn) is a martingale. A famous betting system is "the doubling strategy", defined as

H1 = 1,
Hn = 2^{n−1} if ξ1 = ... = ξn−1 = −1, and Hn = 0 otherwise, for n > 1,


then, if we stop at τ = inf{k : ξ_k = 1}, we will have

$$(H \cdot M)_\tau = \sum_{n=1}^{\tau} H_n \Delta M_n = -\sum_{n=1}^{\tau-1} 2^{n-1} + 2^{\tau-1} = 1 \quad \text{(a.s.)}$$

and our initial capital was zero! Note that P(τ = k) = (1/2)^k, so P(τ < ∞) = 1 and we stop at a finite time. However, this does not contradict the previous proposition: if we stop the process at τ and look at what happens at a fixed time n in relation to the previous times, we obtain a martingale, that is,

$$E((H \cdot M)_{n \wedge \tau} \,|\, F_{n-1}) = (H \cdot M)_{(n-1) \wedge \tau} \quad \text{for all } n \ge 1.$$

It is remarkable that the term martingale originates from the French name of the gambling strategy described above.
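The doubling strategy is easy to simulate. The sketch below (plain Python; the function name and parameters are illustrative, not from the text) plays the strategy on i.i.d. fair ±1 bets and confirms that the stopped gain (H·M)_τ equals 1 on every path, while τ itself is unbounded:

```python
import random

def doubling_strategy(rng):
    """Bet 1, 2, 4, ... on fair coin flips until the first win.

    Returns (final gain (H.M)_tau, number of rounds tau)."""
    capital, bet, rounds = 0, 1, 0
    while True:
        rounds += 1
        step = rng.choice([-1, 1])   # Delta M_n: one fair +/-1 step
        capital += bet * step
        if step == 1:                # tau = inf{k : xi_k = 1}
            return capital, rounds
        bet *= 2                     # double the stake after each loss

rng = random.Random(0)
results = [doubling_strategy(rng) for _ in range(10_000)]
assert all(gain == 1 for gain, _ in results)   # (H.M)_tau = 1 on every path
```

The simulation also makes the catch visible: although the final gain is always 1, the stake 2^{τ−1} required for the winning round has infinite expectation.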

2.3 Stopping times

Consider a filtration F = {F_n}_{n≥0} on a probability space (Ω, F, P). Intuitively, a (generalized) random variable T is a stopping time if T indicates the time when an observable event happens: at time n we know the value of T ∧ (n + 1); in other words, T ∧ (n + 1) is F_n-measurable, or equivalently {T ≤ n} ∈ F_n for all n ≥ 0.

Definition 2.3.1 A (generalized) random variable T : Ω → Z_+ ∪ {∞} is called a stopping time if {T ≤ n} ∈ F_n for all n ≥ 0.

Proposition 2.3.1 T is an F-stopping time iff {T = n} ∈ F_n for all n ≥ 0.

Proof. {T ≤ n} = ∪_{j=0}^{n} {T = j} and {T = n} = {T ≤ n} ∩ {T ≤ n − 1}^c.

Example 2.3.1 Consider a discrete time stochastic process (X_n) adapted to F. Let A be a subset of the state space. Then the first hitting time of A, T_A, is a stopping time because

$$\{T_A = n\} = \{X_0 \notin A, X_1 \notin A, \dots, X_{n-1} \notin A, X_n \in A\}.$$

Remark 2.3.1 We could consider random times with a certain delay, for instance {T ≤ n − 1} ∈ F_n, or even predictable times: {T ≤ n} ∈ F_{n−1}.


Remark 2.3.2 The extension of the notion of stopping time to continuous time is evident: the (generalized) random variable T : Ω → [0, ∞] is a stopping time if {T ≤ t} ∈ F_t. Random times with infinitesimal delay are called optional times: {T < t} ∈ F_t (equivalently {T ≤ t} ∈ F_{t+} := ∩_{s>t} F_s); random times with infinitesimal anticipation are called predictable times: {T ≤ t} ∈ F_{t−} := σ(∪_{s<t} F_s).

Consider a real-valued stochastic process {X_t, t ≥ 0} with continuous trajectories, adapted to F = {F_t, t ≥ 0}. Assume X_0 = 0 and a > 0. The first passage time for the level a, defined by

$$T_a := \inf\{t > 0 : X_t = a\},$$

is a stopping time because

$$\{T_a \le t\} = \Big\{\sup_{0\le s\le t} X_s \ge a\Big\} = \Big\{\sup_{0\le s\le t,\, s\in\mathbb{Q}} X_s \ge a\Big\} \in F_t.$$

Properties of stopping times:

1. If S and T are stopping times, so are S ∨ T and S ∧ T. In fact, this is a consequence of the relations

$$\{S \vee T \le t\} = \{S \le t\} \cap \{T \le t\}, \qquad \{S \wedge T \le t\} = \{S \le t\} \cup \{T \le t\}.$$

2. Given a stopping time T, we can define the σ-field

$$F_T = \{A : A \cap \{T \le t\} \in F_t \text{ for all } t \ge 0\}.$$

F_T is a σ-field because it contains the empty set, it is stable under complements due to

$$A^c \cap \{T \le t\} = (A \cup \{T > t\})^c = \big((A \cap \{T \le t\}) \cup \{T > t\}\big)^c,$$

and it is stable under countable intersections.

3. If S and T are stopping times such that S ≤ T, then F_S ⊂ F_T. In fact, if A ∈ F_S, then

$$A \cap \{T \le t\} = (A \cap \{S \le t\}) \cap \{T \le t\} \in F_t$$

for all t ≥ 0.

Remark 2.3.3 The σ-field F_T has the meaning of the information available up to the stopping time T: we observe what happens until time T, time T included.


4. Let (X_t) be an adapted stochastic process (with discrete or continuous parameter) and let T be a stopping time. If the parameter is continuous, we assume that the trajectories of the process are right-continuous. Then the random variable

$$X_T(\omega) := X_{T(\omega)}(\omega)$$

is F_T-measurable. In discrete time this property follows from the relation

$$\{X_T \in B\} \cap \{T = n\} = \{X_n \in B\} \cap \{T = n\} \in F_n$$

for any subset B of the state space (Borel set if the state space is R).

2.4 Optional stopping theorem

Consider a discrete time filtration and suppose that T is a stopping time. Then the process

$$H_n = 1_{\{T \ge n\}}$$

is predictable. In fact, {T ≥ n} = {T ≤ n − 1}^c ∈ F_{n−1}. The martingale transform of M_n by this sequence is

$$(H \cdot M)_n = M_0 + \sum_{j=1}^{n} 1_{\{T \ge j\}}\Delta M_j = M_0 + \sum_{j=1}^{T \wedge n} \Delta M_j = M_{T \wedge n};$$

as a consequence, if M_n is a (sub)martingale, the stopped process M^T_n := M_{T∧n} will be a (sub)martingale.

Theorem 2.4.1 (Optional Stopping Theorem) Suppose that M_n is a submartingale and S ≤ T ≤ m are two stopping times bounded by a fixed time m. Then

$$E(M_T \,|\, F_S) \ge M_S,$$

with equality in the martingale case.

Proof. We give the proof only in the martingale case. Notice first that M_T is integrable because

$$|M_T| \le \sum_{n=0}^{m} |M_n|.$$

Consider the predictable process H_n = 1_{\{S<n\le T\}\cap A}, where A ∈ F_S. Indeed, H_n is predictable because

$$\{S < n \le T\} \cap A = \{T < n\}^c \cap (\{S \le n-1\} \cap A) \in F_{n-1}.$$


Moreover, the random variables H_n are nonnegative and bounded by one. Therefore, by Proposition 2.2.2, (H · M)_n is a martingale. We have

$$(H \cdot M)_0 = M_0, \qquad (H \cdot M)_m = M_0 + 1_A(M_T - M_S).$$

The martingale property of (H · M)_n implies that E(1_A(M_T − M_S)) = 0 for all A ∈ F_S, and this implies that E(M_T | F_S) = M_S, because M_S is F_S-measurable.

Theorem 2.4.2 (Doob's Maximal Inequality) Suppose that M_n is a submartingale and λ > 0. Then

$$P\Big(\sup_{0\le n\le N} M_n \ge \lambda\Big) \le \frac{1}{\lambda}\, E\big(M_N 1_{\{\sup_{0\le n\le N} M_n \ge \lambda\}}\big) \le \frac{1}{\lambda}\, E(M_N^+).$$

Proof. Consider the stopping time

$$T = \inf\{n \ge 0 : M_n \ge \lambda\} \wedge N.$$

Then, by the Optional Stopping Theorem,

$$E(M_N) \ge E(M_T) = E\big(M_T 1_{\{\sup_{0\le n\le N} M_n \ge \lambda\}}\big) + E\big(M_T 1_{\{\sup_{0\le n\le N} M_n < \lambda\}}\big) \ge E\big(\lambda 1_{\{\sup_{0\le n\le N} M_n \ge \lambda\}}\big) + E\big(M_N 1_{\{\sup_{0\le n\le N} M_n < \lambda\}}\big),$$

since M_T ≥ λ on the first event and T = N on the second. Subtracting the last term from both sides gives the first inequality; the second follows from M_N 1_{\{\cdot\}} ≤ M_N^+.

Corollary 2.4.1 If M_n is a martingale, p ≥ 1 and E(|M_n|^p) < ∞, then

$$P\Big(\sup_{0\le n\le N} |M_n| \ge \lambda\Big) \le \frac{1}{\lambda^p}\, E(|M_N|^p),$$

since (|M_n|^p) is a submartingale.

Remark 2.4.1 Note that the last inequality is a generalization of the Chebyshevinequality.
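As a numerical sanity check of the case p = 1 applied to the submartingale |M_n|, where M_n is a fair ±1 random walk, one can compare the empirical P(sup_{0≤n≤N} |M_n| ≥ λ) with the bound E(|M_N|)/λ. A sketch with illustrative parameters:

```python
import random

rng = random.Random(1)
N, trials, lam = 50, 20_000, 10.0
peaks, finals = [], []
for _ in range(trials):
    m, peak = 0, 0
    for _ in range(N):
        m += rng.choice([-1, 1])      # fair +/-1 martingale steps
        peak = max(peak, abs(m))      # running sup of the submartingale |M_n|
    peaks.append(peak)
    finals.append(abs(m))
p_hat = sum(pk >= lam for pk in peaks) / trials   # P(sup_n |M_n| >= lam)
bound = sum(finals) / trials / lam                # (1/lam) E|M_N|
assert p_hat <= bound                             # Doob's bound holds
```

With these parameters the bound is far from tight (roughly 0.3 against 0.55), which is typical: the maximal inequality trades sharpness for complete generality.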

2.5 The Snell envelope and optimal stopping

Let (Y_n)_{0≤n≤N} be a process adapted to (F_n) with finite expectation, and define

$$X_N = Y_N, \qquad X_n = \max\big(Y_n, E(X_{n+1} \,|\, F_n)\big), \quad 0 \le n \le N-1.$$

We say that (X_n) is the Snell envelope of (Y_n).

Proposition 2.5.1 The sequence (X_n) is the smallest supermartingale that dominates the sequence (Y_n).


Proof. (X_n) is adapted and by construction

$$E(X_{n+1} \,|\, F_n) \le X_n.$$

Let (T_n) be another supermartingale that dominates (Y_n); then T_N ≥ Y_N = X_N. Assume that T_{n+1} ≥ X_{n+1}. Then, by the monotonicity of the conditional expectation and since (T_n) is a supermartingale,

$$T_n \ge E(T_{n+1} \,|\, F_n) \ge E(X_{n+1} \,|\, F_n);$$

moreover, (T_n) dominates (Y_n), so

$$T_n \ge \max\big(Y_n, E(X_{n+1} \,|\, F_n)\big) = X_n.$$

Remark 2.5.1 For fixed ω, if X_n is strictly greater than Y_n then X_n = E(X_{n+1}|F_n), so up to this n the process X behaves as a martingale; this indicates that if we "stop" X_n properly we can obtain a martingale.

Proposition 2.5.2 The random variable

$$\nu = \inf\{n \ge 0 : X_n = Y_n\}$$

is a stopping time and the stopped process (X^ν_n) is a martingale.

Proof.

$$\{\nu = n\} = \{X_0 > Y_0\} \cap \cdots \cap \{X_{n-1} > Y_{n-1}\} \cap \{X_n = Y_n\} \in F_n.$$

And

$$X^\nu_n = X_0 + \sum_{j=1}^{n} 1_{\{j \le \nu\}}(X_j - X_{j-1}),$$

therefore

$$X^\nu_{n+1} - X^\nu_n = 1_{\{n+1 \le \nu\}}(X_{n+1} - X_n)$$

and

$$E(X^\nu_{n+1} - X^\nu_n \,|\, F_n) = 1_{\{n+1 \le \nu\}} E(X_{n+1} - X_n \,|\, F_n) = 0,$$

since on {ν ≤ n} the indicator vanishes, while on {ν > n} we have X_n = E(X_{n+1}|F_n).

We denote by τ_{n,N} the set of stopping times with values in {n, n + 1, ..., N}.

Corollary 2.5.1

$$X_0 = E(Y_\nu \,|\, F_0) = \sup_{\tau \in \tau_{0,N}} E(Y_\tau \,|\, F_0).$$


Proof. (X^ν_n) is a martingale and consequently

$$X_0 = E(X^\nu_N \,|\, F_0) = E(X_{N\wedge\nu} \,|\, F_0) = E(X_\nu \,|\, F_0) = E(Y_\nu \,|\, F_0).$$

On the other hand, (X_n) is a supermartingale, and then so is (X^τ_n) for every τ ∈ τ_{0,N}, so

$$X_0 \ge E(X^\tau_N \,|\, F_0) = E(X_\tau \,|\, F_0) \ge E(Y_\tau \,|\, F_0),$$

therefore

$$E(Y_\nu \,|\, F_0) \ge E(Y_\tau \,|\, F_0), \quad \forall \tau \in \tau_{0,N}.$$

Remark 2.5.2 Analogously we could prove

$$X_n = E(Y_{\nu_n} \,|\, F_n) = \sup_{\tau \in \tau_{n,N}} E(Y_\tau \,|\, F_n),$$

where ν_n = inf{j ≥ n : X_j = Y_j}.

Definition 2.5.1 A stopping time ν is said to be optimal for the sequence (Y_n) if

$$E(Y_\nu \,|\, F_0) = \sup_{\tau \in \tau_{0,N}} E(Y_\tau \,|\, F_0).$$

Remark 2.5.3 The stopping time ν = inf{n : X_n = Y_n} (where X is the Snell envelope of Y) is then an optimal stopping time for Y. We shall see that it is the smallest optimal stopping time.
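On a finite tree, the backward recursion defining the Snell envelope can be carried out exactly. A minimal sketch (the payoff and the fair binomial-walk setting are hypothetical illustrations, not an example from the text):

```python
from fractions import Fraction

def snell_envelope(payoff, N, p=Fraction(1, 2)):
    """Snell envelope X_n of Y_n = payoff(n, S_n) on the binomial tree
    S_{n+1} = S_n +/- 1, S_0 = 0:  X_N = Y_N,
    X_n = max(Y_n, E(X_{n+1} | F_n)).  X[n][k] is the value at the node
    with k up-steps among the first n steps (so S_n = 2k - n)."""
    X = [[None] * (n + 1) for n in range(N + 1)]
    for k in range(N + 1):
        X[N][k] = payoff(N, 2 * k - N)
    for n in range(N - 1, -1, -1):
        for k in range(n + 1):
            cont = p * X[n + 1][k + 1] + (1 - p) * X[n + 1][k]  # E(X_{n+1}|F_n)
            X[n][k] = max(payoff(n, 2 * k - n), cont)
    return X

# hypothetical payoff: Y_n = max(3 - S_n, 0), a "put" on the walk
X = snell_envelope(lambda n, s: Fraction(max(3 - s, 0)), N=4)
assert X[0][0] == Fraction(49, 16)   # X_0 = E(Y_4) here: waiting beats stopping at 0
```

For this convex payoff the continuation value always dominates immediate stopping, so X_0 = E(Y_4) = 49/16 > Y_0 = 3 and ν > 0.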

The following theorem characterizes the optimal stopping times.

Theorem 2.5.1 τ is an optimal stopping time if and only if X_τ = Y_τ and (X^τ_n) is a martingale.

Proof. If (X^τ_n) is a martingale and X_τ = Y_τ, then

$$X_0 = E(X^\tau_N \,|\, F_0) = E(X_{N\wedge\tau} \,|\, F_0) = E(X_\tau \,|\, F_0) = E(Y_\tau \,|\, F_0).$$

On the other hand, for every stopping time π, (X^π_n) is a supermartingale, so

$$X_0 \ge E(X^\pi_N \,|\, F_0) = E(X_\pi \,|\, F_0) \ge E(Y_\pi \,|\, F_0).$$

Conversely, we know by the previous corollary that X_0 = sup_{τ∈τ_{0,N}} E(Y_τ|F_0). Then, if τ is optimal,

$$X_0 = E(Y_\tau \,|\, F_0) \le E(X_\tau \,|\, F_0) \le X_0,$$


where the last inequality is due to the fact that (X^τ_n) is a supermartingale. So we have

$$E(X_\tau - Y_\tau \,|\, F_0) = 0,$$

and since X_τ − Y_τ ≥ 0, we conclude that X_τ = Y_τ.

Now we can also see that (X^τ_n) is a martingale. We know that it is a supermartingale, so

$$X_0 \ge E(X^\tau_n \,|\, F_0) \ge E(X^\tau_N \,|\, F_0) = E(X_\tau \,|\, F_0) = X_0,$$

as we saw before. Then, for all n,

$$E\big(X^\tau_n - E(X_\tau \,|\, F_n) \,\big|\, F_0\big) = 0,$$

and since (X^τ_n) is a supermartingale,

$$X^\tau_n \ge E(X^\tau_N \,|\, F_n) = E(X_\tau \,|\, F_n);$$

therefore X^τ_n = E(X_τ | F_n).

2.6 Decomposition of supermartingales

Proposition 2.6.1 Any supermartingale (X_n) has a unique decomposition

$$X_n = M_n - A_n,$$

where (M_n) is a martingale and (A_n) is a non-decreasing predictable process with A_0 = 0.

Proof. It is enough to write

$$X_n = \sum_{j=1}^{n}\big(X_j - E(X_j \,|\, F_{j-1})\big) - \sum_{j=1}^{n}\big(X_{j-1} - E(X_j \,|\, F_{j-1})\big) + X_0$$

and to identify

$$M_n = \sum_{j=1}^{n}\big(X_j - E(X_j \,|\, F_{j-1})\big) + X_0, \qquad A_n = \sum_{j=1}^{n}\big(X_{j-1} - E(X_j \,|\, F_{j-1})\big),$$

where we define M_0 = X_0 and A_0 = 0. Then (M_n) is a martingale:

$$M_n - M_{n-1} = X_n - E(X_n \,|\, F_{n-1}), \qquad 1 \le n \le N,$$

in such a way that

$$E(M_n - M_{n-1} \,|\, F_{n-1}) = 0, \qquad 1 \le n \le N.$$


Finally, since (X_n) is a supermartingale,

$$A_n - A_{n-1} = X_{n-1} - E(X_n \,|\, F_{n-1}) \ge 0, \qquad 1 \le n \le N,$$

and this increment is F_{n−1}-measurable, so (A_n) is predictable. Now we prove uniqueness. If

$$M_n - A_n = M'_n - A'_n, \qquad 0 \le n \le N,$$

we have

$$M_n - M'_n = A_n - A'_n, \qquad 0 \le n \le N,$$

but then, since (M_n) and (M'_n) are martingales and (A_n) and (A'_n) are predictable, it turns out that

$$A_{n-1} - A'_{n-1} = M_{n-1} - M'_{n-1} = E(M_n - M'_n \,|\, F_{n-1}) = E(A_n - A'_n \,|\, F_{n-1}) = A_n - A'_n, \qquad 1 \le n \le N,$$

that is,

$$A_N - A'_N = A_{N-1} - A'_{N-1} = \cdots = A_0 - A'_0 = 0,$$

since by hypothesis A_0 = A'_0 = 0.

This decomposition is known as the Doob decomposition.
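The construction in the proof is algorithmic whenever conditional expectations can be computed. A sketch on the binary tree of fair ±1 steps (the example X_n = −S_n² is hypothetical; for it E(X_{n+1}|F_n) = X_n − 1, so A_n = n and M_n = n − S_n²):

```python
from fractions import Fraction
from itertools import product

def doob_decomposition(X, N):
    """Doob decomposition X_n = M_n - A_n for a process on the binary tree.

    X(path) is the process value after the +/-1 steps in `path`; each step
    is fair, so E(X_n | F_{n-1}) averages the two one-step extensions.
    A has predictable increments A_n - A_{n-1} = X_{n-1} - E(X_n | F_{n-1})."""
    M = {(): Fraction(X(()))}
    A = {(): Fraction(0)}
    for n in range(1, N + 1):
        for path in product((-1, 1), repeat=n):
            parent = path[:-1]
            cond = Fraction(X(parent + (-1,)) + X(parent + (1,)), 2)
            A[path] = A[parent] + (X(parent) - cond)   # non-decreasing here
            M[path] = X(path) + A[path]                # the martingale part
    return M, A

X = lambda path: -sum(path) ** 2        # supermartingale: -S_n^2
M, A = doob_decomposition(X, 3)
assert all(A[p] == len(p) for p in A)   # A_n = n for this example
```

The martingale property of M (each value equals the average of its two children) can be checked node by node.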

Proposition 2.6.2 The biggest optimal stopping time for (Y_n) is given by

$$\nu_{\max} = \begin{cases} N & \text{if } A_N = 0, \\ \inf\{n : A_{n+1} > 0\} & \text{if } A_N > 0, \end{cases}$$

where (X_n), the Snell envelope of (Y_n), has Doob decomposition X_n = M_n − A_n.

Proof. {ν_max = n} = {A_1 = 0, A_2 = 0, ..., A_n = 0, A_{n+1} > 0} ∈ F_n for 0 ≤ n ≤ N − 1, and {ν_max = N} = {A_N = 0} ∈ F_{N−1}, since (A_n) is predictable. So it is a stopping time.

$$X^{\nu_{\max}}_n = X_{n\wedge\nu_{\max}} = M_{n\wedge\nu_{\max}} - A_{n\wedge\nu_{\max}} = M_{n\wedge\nu_{\max}},$$

since A_{n∧ν_max} = 0. Therefore (X^{ν_max}_n) is a martingale. So, to see that this stopping time is optimal, we have to prove that X_{ν_max} = Y_{ν_max}. Now

$$X_{\nu_{\max}} = \sum_{j=0}^{N-1} 1_{\{\nu_{\max}=j\}} X_j + 1_{\{\nu_{\max}=N\}} X_N = \sum_{j=0}^{N-1} 1_{\{\nu_{\max}=j\}} \max\big(Y_j, E(X_{j+1}|F_j)\big) + 1_{\{\nu_{\max}=N\}} Y_N,$$

but on {ν_max = j} we have A_j = 0 and A_{j+1} > 0, so

$$E(X_{j+1} \,|\, F_j) = E(M_{j+1} \,|\, F_j) - A_{j+1} < E(M_{j+1} \,|\, F_j) = M_j = X_j,$$


therefore X_j = Y_j on {ν_max = j}, and consequently X_{ν_max} = Y_{ν_max}. Finally, we see that it is the biggest optimal stopping time. Let τ ≥ ν_max with P(τ > ν_max) > 0. Then

$$E(X_\tau) = E(M_\tau) - E(A_\tau) = E(M_0) - E(A_\tau) = X_0 - E(A_\tau) < X_0,$$

so (X_{τ∧n}) cannot be a martingale.

2.7 Martingale convergence theorems

Let {M_n}_{n≥0} be a supermartingale and let a < b be real numbers; let U_N[a, b](ω) be the number of upcrossings of [a, b] made by (M_n(ω)) by time N. Define a predictable strategy by H_1 = 1_{\{M_0<a\}} and

$$H_n = 1_{\{M_{n-1}<a,\, H_{n-1}=0\}} + 1_{\{M_{n-1}\le b,\, H_{n-1}=1\}}, \qquad n \ge 2.$$

Then

$$(H \cdot M)_n := \sum_{j=1}^{n} H_j(M_j - M_{j-1})$$

is a supermartingale; it represents the gain by time n in the game where we bet 1 euro per step while the process is completing an upcrossing of [a, b]. It is easy to see (by a simple picture) that

$$(H \cdot M)_n \ge (b-a)\,U_n[a, b] - (M_n - a)^-, \qquad n \ge 1;$$

then we have:

Lemma 2.7.1 Let {M_n}_{n≥0} be a supermartingale and let U_N[a, b] be the number of upcrossings of [a, b] by time N. Then

$$(b-a)\,E(U_N[a, b]) \le E\big((M_N - a)^-\big).$$

Proof. The process (H · M)_n above is a supermartingale with (H · M)_0 = 0, so E((H · M)_N) ≤ 0.

Corollary 2.7.1 Let {M_n}_{n≥0} be a supermartingale such that sup_n E(|M_n|) < ∞. Let a < b and U_∞[a, b] := lim_{N→∞} U_N[a, b]. Then

$$(b-a)\,E(U_\infty[a, b]) \le |a| + \sup_n E(|M_n|) < \infty,$$

so that

$$P(U_\infty[a, b] = \infty) = 0.$$

Proof. By the previous lemma, and since (M_N − a)^− ≤ |M_N| + |a|,

$$(b-a)\,E(U_N[a, b]) \le |a| + E(|M_N|),$$

and taking the limit as N → ∞ and using the monotone convergence theorem we obtain the result.


Theorem 2.7.1 Let {M_n}_{n≥0} be a supermartingale such that sup_n E(|M_n|) < ∞. Then, almost surely, M_∞ := lim_{n→∞} M_n exists and is finite.

Proof. Let Λ = {ω : M_n(ω) does not converge to a limit in [−∞, +∞]}. Then

$$\Lambda = \bigcup_{a<b,\ a,b\in\mathbb{Q}} \{\omega : \liminf M_n(\omega) < a < b < \limsup M_n(\omega)\} = \bigcup_{a<b,\ a,b\in\mathbb{Q}} \Lambda_{ab} \ \text{(say)},$$

but

$$\Lambda_{ab} \subset \{U_\infty[a, b] = \infty\},$$

and by the previous corollary P(Λ_{ab}) = 0, hence P(Λ) = 0. So M_∞ exists in [−∞, +∞] a.s., but by Fatou's lemma

$$E(|M_\infty|) = E(\liminf |M_n|) \le \liminf E(|M_n|) \le \sup_n E(|M_n|) < \infty,$$

so M_∞ is finite a.s.

Corollary 2.7.2 If {M_n}_{n≥0} is a non-negative supermartingale, then lim_{n→∞} M_n exists and is finite almost surely.

Proof. E(|M_n|) = E(M_n) ≤ E(M_0).

Remark 2.7.1 It is important to note that it need not be true that M_n → M_∞ in L¹.

Example 2.7.1 Suppose that ξ_n, n ≥ 1, are i.i.d. random variables with distribution N(0, 2σ²). Set M_0 = 1 and

$$M_n = \exp\Big(\sum_{j=1}^{n} \xi_j - n\sigma^2\Big).$$

Then M_n is a nonnegative martingale (since E(e^{ξ_j}) = e^{σ²}) such that M_n → 0 a.s. by the strong law of large numbers, but E(M_n) = 1 for all n.
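A quick simulation (illustrative parameters) shows what the failure of L¹ convergence looks like numerically: essentially every simulated path of M_n is indistinguishable from 0, and even the empirical mean collapses, because the mass that keeps E(M_n) = 1 lives on exponentially rare paths that a finite sample never sees:

```python
import math
import random

rng = random.Random(2)
sigma2 = 1.0                 # sigma^2; the steps xi_j are N(0, 2*sigma^2)
n, trials = 400, 5_000
samples = []
for _ in range(trials):
    s = sum(rng.gauss(0.0, math.sqrt(2 * sigma2)) for _ in range(n))
    samples.append(math.exp(s - n * sigma2))   # M_n = exp(sum xi_j - n sigma^2)
mean = sum(samples) / trials
frac_small = sum(m < 1e-6 for m in samples) / trials
assert frac_small > 0.9      # nearly every path has already collapsed to ~0
assert mean < 1e-3           # the sample mean misses E(M_n) = 1 entirely
```

The exponent of M_n is roughly N(−nσ², 2nσ²), so the event M_n ≥ 1 has probability of order e^{−n σ²/4}: no feasible Monte Carlo sample ever contains the paths carrying the expectation.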

Theorem 2.7.2 Let {M_n}_{n≥0} be a martingale such that sup_n E(M_n²) < ∞. Then M_n → M_∞ a.s. and in L².

Proof. By Cauchy–Schwarz, sup_n E(|M_n|) < ∞, so we have almost sure convergence to M_∞. On the other hand, by the orthogonality of the increments,

$$E(M_n^2) = E(M_0^2) + \sum_{k=1}^{n} E\big((M_k - M_{k-1})^2\big),$$

so

$$\sum_{k=1}^{\infty} E\big((M_k - M_{k-1})^2\big) < \infty,$$


and applying Fatou's lemma in

$$E\big((M_{n+r} - M_n)^2\big) = \sum_{k=n+1}^{n+r} E\big((M_k - M_{k-1})^2\big),$$

as r → ∞, we have

$$E\big((M_\infty - M_n)^2\big) \le \sum_{k=n+1}^{\infty} E\big((M_k - M_{k-1})^2\big),$$

so

$$\lim_{n\to\infty} E\big((M_\infty - M_n)^2\big) = 0.$$


Chapter 3

Markov Processes

3.1 Discrete-time Markov chains

Definition 3.1.1 Let I be a countable set. We say that (X_n)_{n≥0} is a Markov chain with initial distribution λ := (λ_i, i ∈ I) and transition matrix P := (p_{ij}, i, j ∈ I) if

(i) P(X_0 = i) = λ_i, i ∈ I;

(ii) for all n ≥ 0, P(X_{n+1} = i_{n+1} | X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = p_{i_n i_{n+1}}.

We say that (X_n)_{n≥0} is Markov(λ, P) for short.

We regard λ as a row vector whose components are indexed by I, and P as a matrix whose entries are indexed by I × I. We write p^{(n)}_{ij} = (P^n)_{ij} for the (i, j) entry of P^n. We agree that P^0 is the identity matrix, that is, (P^0)_{ij} = δ_{ij}.

Proposition 3.1.1 Let (X_n)_{n≥0} be Markov(λ, P). Then for all n, m ≥ 0

$$P(X_{n+m} = j \,|\, X_0 = i_0, \dots, X_n = i) = P(X_{n+m} = j \,|\, X_n = i) = P(X_m = j \,|\, X_0 = i).$$


Proof.

$$P(X_{n+m} = j \,|\, X_n = i) = \frac{P(X_{n+m} = j, X_n = i)}{P(X_n = i)}$$
$$= \frac{\sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I}\,\sum_{i_{n+1}\in I}\cdots\sum_{i_{n+m-1}\in I} P(X_0 = i_0, \dots, X_n = i)\, p_{i i_{n+1}} \cdots p_{i_{n+m-1} j}}{\sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} P(X_0 = i_0, \dots, X_n = i)}$$
$$= \sum_{i_{n+1}\in I}\cdots\sum_{i_{n+m-1}\in I} p_{i i_{n+1}} \cdots p_{i_{n+m-1} j}$$
$$= \frac{\sum_{i_{n+1}\in I}\cdots\sum_{i_{n+m-1}\in I} P(X_0 = i, X_1 = i_{n+1}, \dots, X_{m-1} = i_{n+m-1}, X_m = j)}{P(X_0 = i)}$$
$$= \frac{P(X_0 = i, X_m = j)}{P(X_0 = i)} = P(X_m = j \,|\, X_0 = i).$$

Remark 3.1.1 Note also that

$$P(X_{n+1} = j_1, X_{n+2} = j_2, \dots, X_{n+m} = j_m \,|\, X_0 = i_0, \dots, X_n = i) = P(X_1 = j_1, X_2 = j_2, \dots, X_m = j_m \,|\, X_0 = i).$$

Proposition 3.1.2 Let (X_n)_{n≥0} be Markov(λ, P). Then, for all n, m ≥ 0,

(i) P(X_n = j) = (λP^n)_j, j ∈ I;

(ii) P_i(X_n = j) := P(X_n = j | X_0 = i) = P(X_{n+m} = j | X_m = i) = p^{(n)}_{ij}.

Proof. (i):

$$P(X_n = j) = \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} P(X_0 = i_0, \dots, X_{n-1} = i_{n-1}, X_n = j)$$
$$= \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} P(X_n = j \,|\, X_{n-1} = i_{n-1}, \dots, X_0 = i_0)\, P(X_{n-1} = i_{n-1}, \dots, X_0 = i_0)$$
$$= \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} p_{i_{n-1} j}\, P(X_{n-1} = i_{n-1}, \dots, X_0 = i_0)$$
$$= \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} p_{i_{n-2} i_{n-1}}\, p_{i_{n-1} j}\, P(X_{n-2} = i_{n-2}, \dots, X_0 = i_0)$$
$$= \cdots = \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I} \lambda_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-2} i_{n-1}}\, p_{i_{n-1} j} = (\lambda P^n)_j.$$

(ii): $P(X_n = j \,|\, X_0 = i) = \sum_{i_1\in I}\cdots\sum_{i_{n-1}\in I} p_{i i_1} \cdots p_{i_{n-1} j} = (P^n)_{ij} = p^{(n)}_{ij}$.

Example 3.1.1 The most general two-state chain has transition matrix

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$$


If we want to calculate p^{(n)}_{11} we can use the relation P^{n+1} = P^n P; then

$$p^{(n+1)}_{11} = p^{(n)}_{11}(1-\alpha) + p^{(n)}_{12}\beta.$$

We know that

$$p^{(n)}_{11} + p^{(n)}_{12} = P_1(X_n = 1 \text{ or } 2) = 1,$$

so

$$p^{(n+1)}_{11} = p^{(n)}_{11}(1-\alpha-\beta) + \beta, \qquad n \ge 0,$$

with p^{(0)}_{11} = 1. This recursion has a unique solution:

$$p^{(n)}_{11} = \begin{cases} \dfrac{\beta}{\alpha+\beta} + \dfrac{\alpha}{\alpha+\beta}(1-\alpha-\beta)^n & \text{for } \alpha+\beta > 0, \\[1ex] 1 & \text{for } \alpha+\beta = 0. \end{cases}$$
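The closed form can be checked against direct matrix powers; a small sketch with illustrative values of α and β:

```python
def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def p11(n, alpha, beta):
    """Closed form for p_11^(n) when alpha + beta > 0."""
    s = alpha + beta
    return beta / s + (alpha / s) * (1 - s) ** n

alpha, beta = 0.3, 0.2
P = [[1 - alpha, alpha], [beta, 1 - beta]]
Pn = [[1.0, 0.0], [0.0, 1.0]]            # P^0 = identity
for n in range(10):
    assert abs(Pn[0][0] - p11(n, alpha, beta)) < 1e-12
    Pn = mat_mul(Pn, P)                  # Pn becomes P^(n+1)
```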

3.1.1 Class structure

It is sometimes possible to split a Markov chain into smaller pieces, each of which is relatively easy to analyze and which together describe the behavior of the whole chain. This is done by identifying the communicating classes.

We say that i leads to j and write i → j if

Pi(Xn = j for some n ≥ 0) > 0.

We say that i communicates with j and write i←→ j if both i → j and j → i.

Proposition 3.1.3 The following are equivalent:

(i) i → j;

(ii) p_{i_0 i_1} · · · p_{i_{n−1} i_n} > 0 for some states i_0, i_1, ..., i_n with i_0 = i and i_n = j;

(iii) p^{(n)}_{ij} > 0 for some n ≥ 0.

Proof. (i) ⇔ (iii):

$$p^{(n)}_{ij} \le P_i(X_n = j \text{ for some } n \ge 0) \le \sum_{n=0}^{\infty} p^{(n)}_{ij}.$$

(ii) ⇔ (iii):

$$p^{(n)}_{ij} = \sum_{i_1, i_2, \dots, i_{n-1}} p_{i i_1} \cdots p_{i_{n-1} j}.$$

It is clear that i ←→ j is an equivalence relation, so it partitions the state space I into communicating classes. We say that a class C is closed if

$$i \in C,\ i \to j \ \text{imply}\ j \in C.$$

We cannot escape from a closed class. A state i is absorbing if {i} is a closed class. A chain or matrix P is called irreducible if I is a single class.


Example 3.1.2 Consider the stochastic matrix

$$P = \begin{pmatrix} \frac12 & \frac12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ \frac13 & 0 & 0 & \frac13 & \frac13 & 0 \\ 0 & 0 & 0 & \frac12 & \frac12 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix};$$

the communicating classes associated with it are {1, 2, 3}, {4} and {5, 6}, with only {5, 6} being closed.

3.1.2 Hitting times and absorption probabilities

Let τ_A be the hitting time of a subset A ⊆ I:

$$\tau_A = \inf\{n \ge 0 : X_n \in A\}.$$

One purpose of this subsection is to learn how to calculate the probability, starting from i, that (X_n)_{n≥0} ever hits A, that is,

$$h^A_i := P_i(\tau_A < \infty).$$

When A is a closed class this probability is called the absorption probability. The other purpose is to calculate the mean time taken for (X_n)_{n≥0} to reach A:

$$k^A_i := E_i(\tau_A) = \sum_{1\le n<\infty} n\, P_i(\tau_A = n) + \infty \cdot P_i(\tau_A = \infty).$$

Proposition 3.1.4

$$E_i(\tau_A) = \sum_{1\le n\le\infty} P_i(\tau_A \ge n).$$

Proof.

$$P_i(\tau_A \ge n) = \sum_{n\le m\le\infty} P_i(\tau_A = m),$$

then

$$\sum_{1\le n\le\infty} P_i(\tau_A \ge n) = \sum_{1\le n\le\infty} \sum_{n\le m\le\infty} P_i(\tau_A = m) = \sum_{1\le m\le\infty} \sum_{1\le n\le m} P_i(\tau_A = m) = \sum_{1\le m\le\infty} m\, P_i(\tau_A = m).$$


Example 3.1.3 Consider the following transition matrix:

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \frac12 & 0 & \frac12 & 0 \\ 0 & \frac12 & 0 & \frac12 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Introduce h_i := h^{\{4\}}_i and k_i := k^{\{1,4\}}_i. Then it is clear that h_1 = 0, h_4 = 1 and k_1 = k_4 = 0. Also we have

$$h_2 = \tfrac12 h_1 + \tfrac12 h_3, \qquad k_2 = 1 + \tfrac12 k_1 + \tfrac12 k_3,$$

and

$$h_3 = \tfrac12 h_2 + \tfrac12 h_4, \qquad k_3 = 1 + \tfrac12 k_2 + \tfrac12 k_4.$$

Therefore

$$h_2 = \tfrac12 h_3, \qquad k_2 = 1 + \tfrac12 k_3,$$
$$h_3 = \tfrac12 h_2 + \tfrac12, \qquad k_3 = 1 + \tfrac12 k_2,$$

and then

$$h_2 = \tfrac13, \quad h_3 = \tfrac23, \qquad k_2 = k_3 = 2.$$
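The one-step equations of the example can also be solved numerically by iterating them from h ≡ 0 on the non-target states; the iterates increase monotonically to the minimal non-negative solution. A sketch (the dict-of-dicts encoding of P is illustrative):

```python
# Transition matrix of the example, stored sparsely: P[i][j] = p_ij
P = {1: {1: 1.0},
     2: {1: 0.5, 3: 0.5},
     3: {2: 0.5, 4: 0.5},
     4: {4: 1.0}}

def minimal_hitting_probs(P, A, sweeps=200):
    """Start from h = 0 and repeatedly apply the one-step equations;
    the iterates increase to the minimal non-negative solution."""
    h = {i: 0.0 for i in P}
    for _ in range(sweeps):
        for i in P:
            h[i] = 1.0 if i in A else sum(p * h[j] for j, p in P[i].items())
    return h

h = minimal_hitting_probs(P, A={4})
assert abs(h[2] - 1/3) < 1e-9 and abs(h[3] - 2/3) < 1e-9
```

Note that the iteration automatically picks out the minimal solution: the absorbing state 1 keeps h_1 = 0 even though h_1 = 1 would also satisfy its one-step equation.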

Proposition 3.1.5 The vector of hitting probabilities h^A = (h^A_i : i ∈ I) is the minimal non-negative solution to the system of linear equations

$$\begin{cases} h^A_i = 1 & \text{for } i \in A, \\ h^A_i = \sum_{j\in I} p_{ij} h^A_j & \text{for } i \notin A, \end{cases} \tag{3.1}$$

where minimality means that if x = (x_i : i ∈ I) is another solution with x_i ≥ 0 for all i, then x_i ≥ h^A_i for all i.

Proof. First we prove that h^A satisfies (3.1). If X_0 = i ∈ A, then τ_A = 0, so h^A_i = 1. If X_0 = i ∉ A, then τ_A ≥ 1 and by the Markov property

$$P_i(\tau_A < \infty \,|\, X_1 = j) = P_j(\tau_A < \infty) = h^A_j.$$

Then

$$h^A_i = \sum_{j\in I} P_i(\tau_A < \infty, X_1 = j) = \sum_{j\in I} P_i(\tau_A < \infty \,|\, X_1 = j)\, P_i(X_1 = j) = \sum_{j\in I} P_j(\tau_A < \infty)\, P_i(X_1 = j) = \sum_{j\in I} p_{ij} h^A_j.$$


Suppose now that x = (x_i : i ∈ I) is another non-negative solution to (3.1). Then x_i = h^A_i = 1 for i ∈ A; consequently, if i ∉ A,

$$x_i = \sum_{j\in I} p_{ij} x_j = \sum_{j\in A} p_{ij} + \sum_{j\notin A} p_{ij} x_j.$$

If we now substitute for x_j we obtain

$$x_i = \sum_{j\in A} p_{ij} + \sum_{j\notin A} p_{ij}\Big(\sum_{k\in A} p_{jk} + \sum_{k\notin A} p_{jk} x_k\Big) = \sum_{j\in A} p_{ij} + \sum_{j\notin A}\sum_{k\in A} p_{ij} p_{jk} + \sum_{j\notin A}\sum_{k\notin A} p_{ij} p_{jk} x_k$$
$$= P_i(X_1 \in A) + P_i(X_1 \notin A, X_2 \in A) + \sum_{j\notin A}\sum_{k\notin A} p_{ij} p_{jk} x_k.$$

By repeating the substitution, after n steps we have

$$x_i = P_i(X_1 \in A) + \cdots + P_i(X_1 \notin A, \dots, X_{n-1} \notin A, X_n \in A) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A} p_{i j_1} p_{j_1 j_2} \cdots p_{j_{n-1} j_n} x_{j_n}.$$

So, since the last term is non-negative, for all n ≥ 1

$$x_i \ge P_i(X_1 \in A) + \cdots + P_i(X_1 \notin A, \dots, X_{n-1} \notin A, X_n \in A) = P_i(\tau_A \le n).$$

Finally,

$$x_i \ge \lim_{n\to\infty} P_i(\tau_A \le n) = P_i(\tau_A < \infty) = h^A_i.$$

Proposition 3.1.6 The vector of mean hitting times k^A = (k^A_i : i ∈ I) is the minimal non-negative solution to the system of linear equations

$$\begin{cases} k^A_i = 0 & \text{for } i \in A, \\ k^A_i = 1 + \sum_{j\notin A} p_{ij} k^A_j & \text{for } i \notin A. \end{cases} \tag{3.2}$$

Proof. First we prove that k^A satisfies the system (3.2). If i ∈ A, then τ_A = 0 and k^A_i = 0. If X_0 = i ∉ A, then τ_A ≥ 1 and by the Markov property

$$E_i(\tau_A \,|\, X_1 = j) = 1 + E_j(\tau_A).$$

Then

$$E_i(\tau_A) = \sum_{j\in I} E_i(\tau_A 1_{\{X_1=j\}}) = \sum_{j\in I} E_i(\tau_A \,|\, X_1 = j)\, P_i(X_1 = j) = 1 + \sum_{j\in I} E_j(\tau_A)\, p_{ij} = 1 + \sum_{j\notin A} p_{ij} k^A_j.$$


Now, assume that y = (y_i : i ∈ I) is a non-negative solution of the system (3.2). Then, for i ∉ A,

$$y_i = 1 + \sum_{j\notin A} p_{ij} y_j = 1 + \sum_{j\notin A} p_{ij}\Big(1 + \sum_{k\notin A} p_{jk} y_k\Big) = 1 + \sum_{j\notin A} p_{ij} + \sum_{j\notin A}\sum_{k\notin A} p_{ij} p_{jk} y_k$$
$$= P_i(\tau_A \ge 1) + P_i(\tau_A \ge 2) + \sum_{j\notin A}\sum_{k\notin A} p_{ij} p_{jk} y_k$$
$$= \cdots = P_i(\tau_A \ge 1) + \cdots + P_i(\tau_A \ge n) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A} p_{i j_1} \cdots p_{j_{n-1} j_n} y_{j_n},$$

and, since the last term is non-negative,

$$y_i \ge \sum_{1\le n\le\infty} P_i(\tau_A \ge n) = k^A_i,$$

by Proposition 3.1.4.

(Gambler's ruin) Consider the transition probabilities on {0, 1, 2, ...}

$$p_{00} = 1, \qquad p_{i,i-1} = q > 0, \quad p_{i,i+1} = p = 1 - q > 0, \quad \text{for } i = 1, 2, \dots$$

Imagine you enter a casino with a fortune of i euros: at each round you win 1 euro with probability p or lose it with probability q. The resources of the casino are regarded as infinite, so there is no upper bound to your fortune. But what is the probability that you finish broke?

Set h_i = P_i(τ_0 < ∞); then h_i is the minimal non-negative solution of the system

$$h_0 = 1, \qquad h_i = q h_{i-1} + p h_{i+1}, \quad i = 1, 2, \dots$$

If p ≠ q, this recurrence equation has general solution

$$h_i = A + B\left(\frac{q}{p}\right)^i.$$

If p < q (quite often), the restriction 0 ≤ h_i ≤ 1 forces B to be zero, so h_i = 1 for all i. If p > q, since h_0 = 1, we have the family of solutions

$$h_i = \left(\frac{q}{p}\right)^i + A\left(1 - \left(\frac{q}{p}\right)^i\right),$$

and since h_i ≥ 0 we must have A ≥ 0 (let i → ∞), so the minimal solution is h_i = (q/p)^i. Finally, if p = q, the recurrence relation has general solution

$$h_i = A + Bi,$$


and again the restriction 0 ≤ h_i ≤ 1 forces B to be zero, so h_i = 1 for all i. Thus, even if you find a fair casino, and however much money you start with, you are certain to end up broke.
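The ruin probability h_i = (q/p)^i for p > q is easy to check by Monte Carlo (illustrative parameters; surviving paths are truncated after a fixed number of steps, which is harmless here because by then the walk sits far above 0 with overwhelming probability):

```python
import random

def ruin_prob_mc(i, p, trials=10_000, cap=500, rng=random.Random(3)):
    """Estimate h_i = P_i(hit 0) for the walk with up-prob p, down-prob 1-p."""
    broke = 0
    for _ in range(trials):
        x, steps = i, 0
        while 0 < x and steps < cap:      # stop at ruin or after `cap` steps
            x += 1 if rng.random() < p else -1
            steps += 1
        broke += (x == 0)
    return broke / trials

p, i = 0.6, 3
exact = ((1 - p) / p) ** i                # minimal solution (q/p)^i = 8/27
assert abs(ruin_prob_mc(i, p) - exact) < 0.05
```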

Proposition 3.1.7 (Strong Markov property) Let (X_n)_{n≥0} be Markov(λ, P), let τ be a finite stopping time associated to (X_n)_{n≥0}, and let A ∈ F_τ. Then for all m ≥ 0

$$P(X_{\tau+m} = j \,|\, A, X_\tau = i) = P(X_m = j \,|\, X_0 = i).$$

That is, (X_{τ+n})_{n≥0} is Markov(µ, P), where µ_i = P(X_τ = i).

Proof.

$$P(X_{\tau+m} = j \,|\, A, X_\tau = i) = \frac{P(X_{\tau+m} = j, A, X_\tau = i)}{P(A, X_\tau = i)} = \frac{\sum_{n<\infty} P(X_{n+m} = j, A, X_n = i, \tau = n)}{\sum_{n<\infty} P(A, X_n = i, \tau = n)}$$
$$= \frac{\sum_{n<\infty} P(X_{n+m} = j \,|\, X_n = i)\, P(A, X_n = i, \tau = n)}{\sum_{n<\infty} P(A, X_n = i, \tau = n)} = P(X_m = j \,|\, X_0 = i)\, \frac{\sum_{n<\infty} P(A, X_n = i, \tau = n)}{\sum_{n<\infty} P(A, X_n = i, \tau = n)} = P(X_m = j \,|\, X_0 = i),$$

where we used that A ∩ {τ = n} ∈ F_n together with the Markov property.

Remark 3.1.2 Note also that if τ is a finite stopping time associated to (X_n)_{n≥0} and A ∈ F_τ, then for all m ≥ 0

$$P(X_{\tau+1} = j_1, X_{\tau+2} = j_2, \dots, X_{\tau+m} = j_m \,|\, A, X_\tau = i) = P(X_1 = j_1, X_2 = j_2, \dots, X_m = j_m \,|\, X_0 = i).$$

In particular, take J ⊂ I and assume that

$$\tau_0 = \inf\{n \ge 0 : X_n \in J\}$$

and, for m = 0, 1, 2, ...,

$$\tau_{m+1} = \inf\{n > \tau_m : X_n \in J\};$$

then

$$P(X_{\tau_{m+1}} = j \,|\, X_{\tau_0} = i_0, X_{\tau_1} = i_1, \dots, X_{\tau_m} = i) = P(X_{\tau_1} = j \,|\, X_{\tau_0} = i),$$

where for i, j ∈ J

$$P(X_{\tau_1} = j \,|\, X_{\tau_0} = i) = P_i(X_n = j \text{ for some } n \ge 1).$$


3.1.3 Recurrence and transience

Let (X_n)_{n≥0} be a Markov chain with transition matrix P. We say that a state i is recurrent if

$$P_i(X_n = i \text{ for infinitely many } n) = 1.$$

We say that i is transient if

$$P_i(X_n = i \text{ for infinitely many } n) = 0.$$

Recall that the first passage time to state i is the random variable τ^{(i)} defined as

$$\tau^{(i)} = \inf\{n \ge 1 : X_n = i\};$$

then we can define inductively the r-th passage time τ^{(i)}_r to state i, for r = 1, 2, ..., by

$$\tau^{(i)}_1 = \tau^{(i)}, \qquad \tau^{(i)}_{r+1} = \inf\{n \ge \tau^{(i)}_r + 1 : X_n = i\}.$$

The length of the r-th excursion, for r ≥ 2, is given by

$$\delta^{(i)}_r := \begin{cases} \tau^{(i)}_r - \tau^{(i)}_{r-1} & \text{if } \tau^{(i)}_{r-1} < \infty, \\ 0 & \text{if } \tau^{(i)}_{r-1} = \infty. \end{cases}$$

Lemma 3.1.1 For r ≥ 2, conditional on {τ^{(i)}_{r−1} < ∞}, and for all A ∈ F_{τ^{(i)}_{r−1}}, we have

$$P(\delta^{(i)}_r = n \,|\, \tau^{(i)}_{r-1} < \infty, A) = P_i(\tau^{(i)} = n).$$

Proof. By the strong Markov property, conditional on {τ^{(i)}_{r−1} < ∞}, the process (X_{τ^{(i)}_{r−1}+n})_{n≥0} is a Markov chain with transition matrix P and initial distribution λ_j = δ_{ij}, j ∈ I. Then δ^{(i)}_r is the first passage time of this chain to state i, so

$$P(\delta^{(i)}_r = n \,|\, \tau^{(i)}_{r-1} < \infty, A) = P(\tau^{(i)} = n \,|\, X_0 = i).$$

Let us introduce the number of visits V_i to i:

$$V_i = \sum_{n=0}^{\infty} 1_{\{X_n = i\}};$$

note that

$$E_i(V_i) = \sum_{n=0}^{\infty} E_i(1_{\{X_n = i\}}) = \sum_{n=0}^{\infty} P_i(X_n = i) = \sum_{n=0}^{\infty} p^{(n)}_{ii}.$$

Also, we can compute the distribution of V_i under P_i in terms of the return probability

$$f_i = P_i(\tau^{(i)} < \infty).$$


Lemma 3.1.2 For r ≥ 0 we have P_i(V_i > r) = f_i^r (we assume 0^0 = 1).

Proof. P_i(V_i > 0) = 1 since X_0 = i, so the result is true for r = 0. Moreover, {X_0 = i, V_i > r} = {X_0 = i, τ^{(i)}_r < ∞}. Assume the result is true for r; then

$$P_i(V_i > r+1) = P_i(\tau^{(i)}_{r+1} < \infty) = P_i(\tau^{(i)}_r < \infty, \delta^{(i)}_{r+1} < \infty) = P_i(\delta^{(i)}_{r+1} < \infty \,|\, \tau^{(i)}_r < \infty)\, P_i(\tau^{(i)}_r < \infty) = P_i(\tau^{(i)} < \infty)\, f_i^r = f_i^{r+1}.$$

Theorem 3.1.1 The following dichotomy holds:

(i) if P_i(τ^{(i)} < ∞) = 1, then i is recurrent and ∑_{n=0}^{∞} p^{(n)}_{ii} = ∞;

(ii) if P_i(τ^{(i)} < ∞) < 1, then i is transient and ∑_{n=0}^{∞} p^{(n)}_{ii} < ∞.

Proof.

$$P_i(V_i = \infty) = \lim_{n\to\infty} P_i(V_i > n) = \lim_{n\to\infty} f_i^n;$$

then, if f_i := P_i(τ^{(i)} < ∞) = 1, we get P_i(V_i = ∞) = 1, so i is recurrent and

$$\sum_{n=0}^{\infty} p^{(n)}_{ii} = E_i(V_i) = \infty.$$

On the other hand, if f_i < 1,

$$\sum_{n=0}^{\infty} p^{(n)}_{ii} = E_i(V_i) = \sum_{n=0}^{\infty} P_i(V_i > n) = \sum_{n=0}^{\infty} f_i^n = \frac{1}{1-f_i} < \infty,$$

so P_i(V_i = ∞) = 0 and i is transient.

Recurrence and transience are class properties.

Theorem 3.1.2 Let C be a communicating class. Then either all states in C are transient or all are recurrent.

Proof. Take a pair of states i, j ∈ C and assume that i is transient. There exist n, m such that p^{(n)}_{ij} > 0 and p^{(m)}_{ji} > 0, and for all r ≥ 0

$$p^{(n+m+r)}_{ii} \ge p^{(n)}_{ij}\, p^{(r)}_{jj}\, p^{(m)}_{ji},$$


then

$$\sum_{r=0}^{\infty} p^{(r)}_{jj} \le \frac{1}{p^{(n)}_{ij}\, p^{(m)}_{ji}} \sum_{r=0}^{\infty} p^{(n+m+r)}_{ii} < \infty,$$

so by the previous theorem j is also transient.

Theorem 3.1.3 Every recurrent class is closed.

Proof. Let C be a class that is not closed. Then there exist i ∈ C, j ∉ C and m ≥ 1 such that

$$P_i(X_m = j) > 0.$$

Since we have that

$$P_i(X_m = j,\ X_n = i \text{ for infinitely many } n) = 0,$$

this implies that

$$P_i(X_n = i \text{ for infinitely many } n) < 1,$$

so i is not recurrent, and so neither is C.

Theorem 3.1.4 Every finite closed class is recurrent.

Proof. Suppose that C is closed and finite and that (X_n)_{n≥0} starts in C. Then, since the class is finite, for some i ∈ C

$$P(X_n = i \text{ for infinitely many } n) > 0.$$

Let τ = inf{n ≥ 0 : X_n = i}; then

$$P(X_n = i \text{ for infinitely many } n) = P(\tau < \infty)\, P(X_{\tau+m} = i \text{ for infinitely many } m \,|\, \tau < \infty) = P(\tau < \infty)\, P_i(X_m = i \text{ for infinitely many } m) > 0,$$

which shows that i is not transient, hence i, and therefore C, is recurrent.

3.1.4 Recurrence and transience of a random walk

Consider the transition probabilities on Z

$$p_{i,i-1} = q > 0, \qquad p_{i,i+1} = p = 1 - q > 0, \quad \text{for } i \in \mathbb{Z},$$

that is, a simple random walk on Z. It is clear that

$$p^{(2n+1)}_{00} = 0,$$

and it is easy to see that

$$p^{(2n)}_{00} = \binom{2n}{n} p^n q^n.$$


By Stirling's formula,

$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}, \quad \text{as } n \to \infty,$$

so

$$p^{(2n)}_{00} = \frac{(2n)!}{(n!)^2}(pq)^n \sim \frac{(4pq)^n}{\sqrt{\pi n}}, \quad \text{as } n \to \infty.$$

If p = q = 1/2, then p^{(2n)}_{00} ∼ 1/√(πn), and since

$$\sum_{n=1}^{\infty} \frac{1}{\sqrt{n}} = \infty,$$

we have that

$$\sum_{n=0}^{\infty} p^{(n)}_{00} = \infty,$$

so the random walk is recurrent. If, on the contrary, p ≠ q, then 4pq = r < 1 and p^{(2n)}_{00} ∼ r^n/√(πn); now, since

$$\sum_{n=1}^{\infty} \frac{r^n}{\sqrt{n}} < \infty,$$

we have that

$$\sum_{n=0}^{\infty} p^{(n)}_{00} < \infty,$$

and the random walk is transient.
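The dichotomy is visible numerically in the partial sums of p^{(2n)}_{00}, which can be computed stably from the ratio of consecutive terms (a small sketch; parameters are illustrative):

```python
def sum_p00(p, N):
    """Sum of p_00^(2n) = C(2n, n) (pq)^n for n = 1..N, built from the
    exact term ratio a_{n+1}/a_n = 2(2n+1)/(n+1) * p * q."""
    q = 1 - p
    term, total = 1.0, 0.0        # term starts at p_00^(0) = 1
    for n in range(N):
        term *= 2 * (2 * n + 1) / (n + 1) * p * q
        total += term
    return total

sym = sum_p00(0.5, 20_000)    # terms ~ 1/sqrt(pi n): partial sums keep growing
asym = sum_p00(0.6, 20_000)   # terms ~ 0.96^n/sqrt(pi n): the series converges
assert sym > 100 and asym < 5
```

For p = 1/2 the partial sum grows like 2√(N/π) (about 159 here), while for p = 0.6 the series has essentially reached its finite limit.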

3.1.5 Invariant distributions

Many of the long-time properties of Markov chains are connected with the notion of an invariant distribution or measure. We say that λ is invariant if

$$\lambda P = \lambda.$$

Theorem 3.1.5 Let (X_n)_{n≥0} be Markov(λ, P) and suppose that λ is invariant. Then (X_{m+n})_{n≥0} is also Markov(λ, P).

Proof. By the Markov property, (X_{m+n})_{n≥0} has the same transition matrix, and its initial distribution is P(X_m = i) = (λP^m)_i = λ_i.

Theorem 3.1.6 Let I be finite. Suppose that for some i ∈ I

$$p^{(n)}_{ij} \to \pi_j \quad \text{as } n \to \infty, \text{ for all } j \in I.$$

Then π = (π_j : j ∈ I) is an invariant distribution.


Proof. We have

$$\sum_{j\in I} \pi_j = \sum_{j\in I} \lim_{n\to\infty} p^{(n)}_{ij} = \lim_{n\to\infty} \sum_{j\in I} p^{(n)}_{ij} = 1,$$

and

$$\pi_j = \lim_{n\to\infty} p^{(n)}_{ij} = \lim_{n\to\infty} \sum_{k\in I} p^{(n-1)}_{ik} p_{kj} = \sum_{k\in I} \Big(\lim_{n\to\infty} p^{(n-1)}_{ik}\Big) p_{kj} = \sum_{k\in I} \pi_k p_{kj},$$

where the interchange of limit and sum is justified because I is finite.

Example 3.1.4 Consider the two-state Markov chain with transition matrix

P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.

Ignore the trivial cases α = β = 0 and α = β = 1. Then, as we saw before,

P^n \to \begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \\ \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix} \quad as n \to \infty,

so, by the previous theorem, the distribution \left( \frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta} \right) must be invariant. Of course we can solve

(\lambda_1, \lambda_2) \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix} = (\lambda_1, \lambda_2),

and we obtain

(\lambda_1, \lambda_2) = \left( \frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta} \right).

Note also that for the case α = β = 1 we do not have convergence of P^n.
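This can be verified numerically; a minimal sketch with illustrative values α = 0.3, β = 0.2 (any values in (0, 1) work):

```python
import numpy as np

alpha, beta = 0.3, 0.2   # illustrative rates, not from the text
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Closed-form invariant distribution from the example.
lam = np.array([beta, alpha]) / (alpha + beta)
print(np.allclose(lam @ P, lam))                 # True: lambda P = lambda

# Both rows of P^n converge to lambda.
Pn = np.linalg.matrix_power(P, 100)
print(np.allclose(Pn, np.vstack([lam, lam])))    # True
```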

Example 3.1.5 Consider the three-state chain with matrix

P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.

Then, since the eigenvalues are 1, i/2, −i/2, we deduce that

p_{11}^{(n)} = a + b\left(\frac{i}{2}\right)^n + c\left(-\frac{i}{2}\right)^n,

for some constants a, b and c. Since p_{11}^{(n)} is real and

\left(\pm\frac{i}{2}\right)^n = \left(\frac{1}{2}\right)^n e^{\pm i n\pi/2} = \left(\frac{1}{2}\right)^n \left(\cos\frac{n\pi}{2} \pm i \sin\frac{n\pi}{2}\right),

then we can write

p_{11}^{(n)} = \alpha + \left(\frac{1}{2}\right)^n \left(\beta\cos\frac{n\pi}{2} + \gamma\sin\frac{n\pi}{2}\right),

for constants α, β and γ that can be determined by the boundary conditions

1 = p_{11}^{(0)} = \alpha + \beta
0 = p_{11}^{(1)} = \alpha + \frac{1}{2}\gamma
0 = p_{11}^{(2)} = \alpha - \frac{1}{4}\beta.

So α = 1/5, β = 4/5, γ = −2/5, and

p_{11}^{(n)} = \frac{1}{5} + \left(\frac{1}{2}\right)^n\left(\frac{4}{5}\cos\frac{n\pi}{2} - \frac{2}{5}\sin\frac{n\pi}{2}\right).

Then, to find the invariant distribution, we can write down the components of the vector equation π = πP:

\pi_1 = \frac{1}{2}\pi_3
\pi_2 = \pi_1 + \frac{1}{2}\pi_2
\pi_3 = \frac{1}{2}\pi_2 + \frac{1}{2}\pi_3.

One equation is redundant, but we have the additional condition

\pi_1 + \pi_2 + \pi_3 = 1,

and the solution is π = (1/5, 2/5, 2/5). Note that

p_{11}^{(n)} \to \frac{1}{5} \quad as n \to \infty,

so this confirms the previous theorem; also, we could have used that result to identify α = 1/5 instead of working out p_{11}^{(2)}.
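A quick numerical check of the closed form and of the invariant distribution of this example:

```python
import numpy as np

P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

def p11(n):
    """Closed form from the example:
    1/5 + (1/2)^n (4/5 cos(n pi/2) - 2/5 sin(n pi/2))."""
    return 0.2 + 0.5 ** n * (0.8 * np.cos(n * np.pi / 2) - 0.4 * np.sin(n * np.pi / 2))

# The closed form agrees with the (1,1) entry of P^n.
print(all(abs(np.linalg.matrix_power(P, n)[0, 0] - p11(n)) < 1e-12 for n in range(12)))

pi_inv = np.array([0.2, 0.4, 0.4])
print(np.allclose(pi_inv @ P, pi_inv))   # (1/5, 2/5, 2/5) is invariant
```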

For a fixed state k, consider for each i the expected time spent in i between visits to k:

\gamma_i^k := E_k \sum_{n=0}^{\tau(k)-1} 1_{\{X_n = i\}}.

Theorem 3.1.7 Let P be irreducible and recurrent. Then

(i) \gamma_k^k = 1;

(ii) \gamma^k := (\gamma_i^k, i ∈ I) satisfies \gamma^k P = \gamma^k;

(iii) 0 < \gamma_i^k < \infty for all i ∈ I.


Proof. (i) is obvious. (ii) Since P is recurrent, under P_k we have τ(k) < ∞ and X_0 = X_{\tau(k)} = k, so

\gamma_j^k = E_k \sum_{n=1}^{\tau(k)} 1_{\{X_n = j\}} = E_k \left( \sum_{n=1}^{\infty} 1_{\{X_n = j\}} 1_{\{n \le \tau(k)\}} \right)
= \sum_{n=1}^{\infty} P_k(X_n = j, n \le \tau(k)) = \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_{n-1} = i, X_n = j, n \le \tau(k))
= \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_n = j \mid X_{n-1} = i, n \le \tau(k))\, P_k(X_{n-1} = i, n \le \tau(k))
= \sum_{i \in I} p_{ij} \sum_{n=1}^{\infty} P_k(X_{n-1} = i, n \le \tau(k)) = \sum_{i \in I} p_{ij} E_k \sum_{m=0}^{\tau(k)-1} 1_{\{X_m = i\}} = \sum_{i \in I} p_{ij} \gamma_i^k.

(iii) Fix i ∈ I. Since P is irreducible, there exist m, n ≥ 0 such that p_{ki}^{(n)} > 0 and p_{ik}^{(m)} > 0; then, by (i) and (ii), \gamma_i^k \ge \gamma_k^k p_{ki}^{(n)} > 0 and \gamma_k^k \ge \gamma_i^k p_{ik}^{(m)}.

Theorem 3.1.8 Let P be irreducible and let λ be an invariant measure for P with λ_k = 1. Then λ ≥ \gamma^k. If, in addition, P is recurrent, then λ = \gamma^k.

Proof.

\lambda_j = \sum_{i_0 \in I} \lambda_{i_0} p_{i_0 j} = \sum_{i_0 \ne k} \lambda_{i_0} p_{i_0 j} + p_{kj}
= \sum_{i_0 \ne k} \sum_{i_1 \ne k} \lambda_{i_1} p_{i_1 i_0} p_{i_0 j} + p_{kj} + \sum_{i_0 \ne k} p_{k i_0} p_{i_0 j}
= \cdots = \sum_{i_0 \ne k} \sum_{i_1 \ne k} \cdots \sum_{i_n \ne k} \lambda_{i_n} p_{i_n i_{n-1}} \cdots p_{i_0 j}
+ p_{kj} + \sum_{i_0 \ne k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \ne k} \sum_{i_1 \ne k} \cdots \sum_{i_{n-1} \ne k} p_{k i_{n-1}} \cdots p_{i_0 j}
\ge p_{kj} + \sum_{i_0 \ne k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \ne k} \sum_{i_1 \ne k} \cdots \sum_{i_{n-1} \ne k} p_{k i_{n-1}} \cdots p_{i_0 j}
\to \gamma_j^k \quad as n goes to infinity.

If P is recurrent, then \gamma^k is invariant, so \mu := \lambda - \gamma^k is also invariant and non-negative. Now, since P is irreducible, for any fixed i there is n ≥ 0 such that p_{ik}^{(n)} > 0 and

0 = \mu_k = \sum_j \mu_j p_{jk}^{(n)} \ge \mu_i p_{ik}^{(n)},

so \mu_i = 0.


A recurrent state i is positive recurrent if the expected return time

m_i := E_i(\tau(i))

is finite; otherwise it is called null recurrent.

Remark 3.1.3 Note that

m_i = E_i \left( \sum_{n=0}^{\tau(i)-1} \sum_{j \ne i} 1_{\{X_n = j\}} \right) + 1 = \sum_{j \in I} \gamma_j^i.

Theorem 3.1.9 Let P be irreducible. Then the following are equivalent:

(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) P has an invariant distribution, π say.

Moreover, when (iii) holds we have m_i = 1/\pi_i for all i.

Proof. (i)⟹(ii) is evident. (ii)⟹(iii) If i is positive recurrent then P is recurrent and, by Theorem 3.1.7, \gamma^i is an invariant measure. But

\sum_{j \in I} \gamma_j^i = m_i,

and m_i < ∞, so \pi_j := \gamma_j^i / m_i defines an invariant distribution. (iii)⟹(i) Take any state k. Since P is irreducible, there exists n ≥ 1 such that \pi_k = \sum_{i \in I} \pi_i p_{ik}^{(n)} > 0. Set \lambda_i = \pi_i / \pi_k; then, by the previous theorem, \lambda \ge \gamma^k and

m_k = \sum_{i \in I} \gamma_i^k \le \sum_{i \in I} \frac{\pi_i}{\pi_k} = \frac{1}{\pi_k} < \infty,

and k is positive recurrent. Note finally that, since P is recurrent, by the previous theorem the above inequality is in fact an equality.

Theorem 3.1.10 If the state space is finite there is at least one stationary distribution.

Proof. We can assume the chain is irreducible. Then, since the state space is finite, every state is positive recurrent: indeed, since

m_k = \sum_{i \in I} \gamma_i^k

and I is finite, it suffices to see that \gamma_i^k < \infty, which follows from Theorem 3.1.7.


3.1.6 Limiting behaviour

We say a state i is aperiodic if p_{ii}^{(n)} > 0 for all sufficiently large n. It can be shown that i is aperiodic iff the set \{n \ge 0 : p_{ii}^{(n)} > 0\} has no common divisor other than 1. It can also be shown that aperiodicity is a class property.

Theorem 3.1.11 Let P be irreducible and aperiodic, and suppose that P has an invariant distribution π. Then

p_{ij}^{(n)} \to \pi_j \quad as n \to \infty, for all i, j.

3.1.7 Ergodic theorem

Ergodic theory concerns the limiting behaviour of averages over time. We shall prove a theorem which identifies, for Markov chains, the long-run proportion of time spent in each state.

Denote by V_i(n) the number of visits to i before n:

V_i(n) := \sum_{k=0}^{n-1} 1_{\{X_k = i\}}.

Theorem 3.1.12 Let P be irreducible. Then

\frac{V_i(n)}{n} \to \frac{1}{m_i} \quad a.s. as n \to \infty.

Moreover, in the positive recurrent case, for any bounded function f : I → R we have

\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to \sum_{i \in I} \pi_i f(i), \quad a.s. as n \to \infty,

where π is the unique invariant distribution.

Proof. Without loss of generality we assume that we start at state i. If P is transient then V_i(n) is bounded by the total number of visits V_i, so

\frac{V_i(n)}{n} \le \frac{V_i}{n} \to 0 = \frac{1}{m_i}.

Suppose that P is recurrent. Let \delta_r^{(i)} be the length of the r-th excursion to i. Then, by the strong Markov property, the \delta_r^{(i)} are i.i.d. with E(\delta_r^{(i)}) = m_i, and

\frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)-1}^{(i)}}{V_i(n)} \le \frac{n}{V_i(n)} \le \frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)}^{(i)}}{V_i(n)},

and we can apply the strong law of large numbers.


For the second part, we can assume w.l.o.g. that |f| ≤ 1. Then for any J ⊆ I,

\left| \frac{1}{n} \sum_{k=0}^{n-1} f(X_k) - \sum_{i \in I} \pi_i f(i) \right| = \left| \sum_{i \in I} \left( \frac{V_i(n)}{n} - \pi_i \right) f(i) \right|
\le \sum_{i \in J} \left| \frac{V_i(n)}{n} - \pi_i \right| + \sum_{i \notin J} \left| \frac{V_i(n)}{n} - \pi_i \right|
\le \sum_{i \in J} \left| \frac{V_i(n)}{n} - \pi_i \right| + \sum_{i \notin J} \left( \frac{V_i(n)}{n} + \pi_i \right)
\le 2 \sum_{i \in J} \left| \frac{V_i(n)}{n} - \pi_i \right| + 2 \sum_{i \notin J} \pi_i,

but we saw that

\frac{V_i(n)}{n} \to \pi_i \quad as n \to \infty, for all i.

Then, given ε > 0, we can take J finite such that

\sum_{i \notin J} \pi_i < \frac{\varepsilon}{4},

and N(ω) such that for all n ≥ N(ω),

\sum_{i \in J} \left| \frac{V_i(n)}{n} - \pi_i \right| < \frac{\varepsilon}{4}.
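As an informal illustration of the ergodic theorem (a simulation sketch, not part of the proof), one can run the three-state chain of Example 3.1.5 and watch the occupation frequencies approach π = (1/5, 2/5, 2/5):

```python
import random

# Transition matrix of Example 3.1.5.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

random.seed(0)
x, n = 0, 200000          # illustrative run length
visits = [0, 0, 0]
for _ in range(n):
    visits[x] += 1
    x = random.choices(range(3), weights=P[x])[0]

freqs = [v / n for v in visits]
print([round(f, 2) for f in freqs])   # close to [0.2, 0.4, 0.4]
```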

3.2 Poisson process

3.2.1 Exponential distribution

Definition 3.2.1 We say that T : Ω → [0, ∞] follows an exponential distribution with parameter λ (we shall write T ∼ E(λ) for short) if its density is

f_T(t) = \lambda e^{-\lambda t} 1_{[0,\infty)}(t).

It is easy to see the following properties:

• E(T) = 1/λ, var(T) = 1/λ².

• T/µ ∼ E(µλ).

• P(T > t + s | T > t) = P(T > s) (lack of memory).

• If S ∼ E(λ) and T ∼ E(µ) are independent, then min(S, T) ∼ E(λ + µ) and P(S < T) = λ/(λ + µ).


• If t1, t2, ..., tn are independent E(λ), then Tn := t1 + t2 + ... + tn ∼ gamma(n, λ).
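These properties are easy to check by simulation. A small Monte Carlo sketch for the minimum and comparison properties, with illustrative rates λ = 2, µ = 3:

```python
import random

random.seed(1)
lam, mu = 2.0, 3.0        # illustrative rates
N = 200000
S = [random.expovariate(lam) for _ in range(N)]
T = [random.expovariate(mu) for _ in range(N)]

# min(S, T) should be E(lam + mu), so its mean is 1/(lam + mu) = 0.2.
m = sum(min(s, t) for s, t in zip(S, T)) / N
print(round(m, 2))   # ~ 0.2

# P(S < T) should be lam / (lam + mu) = 0.4.
p = sum(s < t for s, t in zip(S, T)) / N
print(round(p, 2))   # ~ 0.4
```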

Theorem 3.2.1 Let T1, T2, ... be a sequence of independent random variables with Tn ∼ E(λn) and 0 < λn < ∞ for all n.

(i) If \sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty, then P\left(\sum_{n=1}^{\infty} T_n < \infty\right) = 1.

(ii) If \sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty, then P\left(\sum_{n=1}^{\infty} T_n = \infty\right) = 1.

Proof. (i) Suppose \sum_{n=1}^{\infty} 1/\lambda_n < \infty. Then, by the monotone convergence theorem,

E\left( \sum_{n=1}^{\infty} T_n \right) = \sum_{n=1}^{\infty} E(T_n) = \sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty,

so P(\sum_{n=1}^{\infty} T_n < \infty) = 1.

(ii) Suppose instead that \sum_{n=1}^{\infty} 1/\lambda_n = \infty. Then \prod_{n=1}^{\infty} (1 + 1/\lambda_n) = \infty, and by the monotone convergence theorem and independence,

E\left( \exp\left(-\sum_{n=1}^{\infty} T_n\right) \right) = \prod_{n=1}^{\infty} E(e^{-T_n}) = \prod_{n=1}^{\infty} \left(1 + \frac{1}{\lambda_n}\right)^{-1} = 0,

so P(\sum_{n=1}^{\infty} T_n = \infty) = 1.

Theorem 3.2.2 Let I be a countable set and let T_k, k ∈ I, be independent random variables with T_k ∼ E(q_k) and q := \sum_{k \in I} q_k < \infty. Set T = \inf_k T_k. Then this infimum is attained at a unique random value K of k, with probability 1. Moreover, T and K are independent, with T ∼ E(q) and P(K = k) = q_k/q.

Proof.

P(K = k, T \ge t) = P(T_k \ge t, T_j > T_k \text{ for all } j \ne k)
= \int_t^{\infty} q_k e^{-q_k s} P(T_j > s \text{ for all } j \ne k)\, ds
= \int_t^{\infty} q_k e^{-q_k s} \prod_{j \ne k} e^{-q_j s}\, ds = \frac{q_k}{q} e^{-qt}.

Hence P(K = k for some k) = 1, and K and T have the claimed joint distribution.

3.2.2 Poisson process

Definition 3.2.2 Let t1, t2, ... be a sequence of independent E(λ) random variables, let Tn = t1 + t2 + ... + tn for n ≥ 1, T0 = 0, and define Ns = max{n : Tn ≤ s}. Then the process (Ns)s≥0 is named a Poisson process with parameter (or rate) λ.


Remark 3.2.1 We think of the tn as the times between arrivals of customers at a bank. Then Ns is the number of arrivals by time s.

Lemma 3.2.1 Ns ∼ Poisson(λs).

Proof. P(Ns = n) = P(Tn ≤ s < Tn+1). Since Tn+1 = tn+1 + Tn and tn+1 is independent of Tn,

P(T_n \le s < T_{n+1}) = \int_0^s \int_s^{\infty} f_{T_n}(t) f_{t_{n+1}}(u - t)\, du\, dt
= \int_0^s \int_s^{\infty} \frac{1}{(n-1)!} \lambda^n t^{n-1} e^{-\lambda t}\, \lambda e^{-\lambda(u-t)}\, du\, dt
= \frac{\lambda^n}{(n-1)!} e^{-\lambda s} \int_0^s t^{n-1}\, dt = e^{-\lambda s} \frac{(\lambda s)^n}{n!}.
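The definition and the lemma can be illustrated numerically: simulating E(λ) interarrival times and counting arrivals up to time s should reproduce the Poisson(λs) law. A sketch with illustrative values λ = 1.5, s = 4:

```python
import random
from math import exp, factorial

random.seed(2)
lam, s, trials = 1.5, 4.0, 100000   # illustrative parameters

def count_arrivals():
    """Count arrivals up to time s with i.i.d. E(lam) interarrival times."""
    t, n = random.expovariate(lam), 0
    while t <= s:
        n += 1
        t += random.expovariate(lam)
    return n

counts = [count_arrivals() for _ in range(trials)]
mean = sum(counts) / trials
print(round(mean, 1))                        # ~ lam * s = 6.0

# Compare the empirical P(N_s = 6) with the Poisson(lam*s) mass function.
emp = counts.count(6) / trials
thy = exp(-lam * s) * (lam * s) ** 6 / factorial(6)
print(abs(emp - thy) < 0.01)                 # True
```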

Lemma 3.2.2 (N_{t+s} − N_s)_{t≥0} is a Poisson process with rate λ, independent of (N_r)_{0≤r≤s}.

Proof. Assume that by time s there were four arrivals and that T4 = u4. Then

P(t_5 > s - u_4 + t \mid t_5 > s - u_4) = P(t_5 > t) = e^{-\lambda t}.

So the first arrival after s is E(λ) and independent of T1, T2, T3, T4. It is clear that t5, t6, ... are independent of T1, T2, T3, T4.

Lemma 3.2.3 (Nt)t≥0 has independent increments.

Theorem 3.2.3 If (Nt)t≥0 is a Poisson process, then

(i) N0 = 0;
(ii) Nt+s − Ns ∼ Poisson(λt);
(iii) (Nt)t≥0 has independent increments.

Conversely, if (i), (ii) and (iii) hold and the process is right continuous, then (Nt)t≥0 is a Poisson process.

Proof. (The converse.)

P(t_1 > t) = P(N_t = 0) = e^{-\lambda t},

P(t_2 > t \mid t_1 \le s) = P(N_{t+s} - N_s = 0 \mid N_r \le 1, 0 \le r \le s) = P(N_{t+s} - N_s = 0) = e^{-\lambda t},

so t2 ∼ E(λ), independent of t1.


Theorem 3.2.4 (Nt)t≥0 is a Poisson process iff

(i) N0 = 0;
(ii) (Nt)t≥0 is an increasing, right-continuous, integer-valued process with independent increments;
(iii) P(Nt+h − Nt = 0) = 1 − λh + o(h) and P(Nt+h − Nt = 1) = λh + o(h), as h ↓ 0, uniformly in t.

Proof. Suppose that (Nt)t≥0 is a Poisson process. Then

P(N_{t+h} - N_t \ge 1) = P(N_h \ge 1) = 1 - e^{-\lambda h} = \lambda h + o(h),
P(N_{t+h} - N_t \ge 2) = P(N_h \ge 2) = P(t_1 + t_2 \le h) \le P(t_1 \le h, t_2 \le h) = P(t_1 \le h)P(t_2 \le h) = (1 - e^{-\lambda h})^2 = o(h).

If (i), (ii) and (iii) hold, then for i = 2, 3, ... we have P(Nt+h − Nt = i) = o(h) as h ↓ 0, uniformly in t. Set p_j(t) = P(N_t = j). Then, for j = 1, 2, ...,

p_j(t + h) = P(N_{t+h} = j) = \sum_{i=0}^{j} P(N_{t+h} - N_t = i) P(N_t = j - i)
= (1 - \lambda h + o(h)) p_j(t) + (\lambda h + o(h)) p_{j-1}(t) + o(h).

So

\frac{p_j(t+h) - p_j(t)}{h} = -\lambda p_j(t) + \lambda p_{j-1}(t) + o(1),

and, since this estimate is uniform in t, we can put t = s − h to obtain

\frac{p_j(s) - p_j(s-h)}{h} = -\lambda p_j(s-h) + \lambda p_{j-1}(s-h) + o(1).

Now, letting h ↓ 0, we obtain that the p_j(t) are differentiable and satisfy the differential equation

p_j'(t) = -\lambda p_j(t) + \lambda p_{j-1}(t).

Analogously we can see that

p_0'(t) = -\lambda p_0(t),

and since

p_0(0) = 1, \quad p_j(0) = 0 \quad for j = 1, 2, ...,

we have that

p_j(t) = e^{-\lambda t} \frac{(\lambda t)^j}{j!} \quad for j = 0, 1, 2, ....


3.3 Continuous-time Markov chains

3.3.1 Definition and examples

Definition 3.3.1 (Xt)t≥0 is a Markov chain if for any 0 ≤ s0 < s1 < · · · < sn < s and possible states i0, ..., in, i, j we have

P(X_{t+s} = j \mid X_s = i, X_{s_n} = i_n, ..., X_{s_0} = i_0) = P(X_{t+s} = j \mid X_s = i) = P(X_t = j \mid X_0 = i).

Example 3.3.1 Let (Nt)t≥0 be a Poisson process with rate λ and let (Yn)n≥0 be a discrete-time Markov chain with transition probabilities πij, independent of (Nt)t≥0. Then X_t = Y_{N_t} is a continuous-time Markov chain. Intuitively, this follows from the memoryless property of the exponential distribution: if Xs = i then, independently of what happened in the past, the time to the next jump will be exponentially distributed with rate λ, and the chain will go to state j with probability πij. If we write

p_{ij}(t) = P(X_t = j \mid X_0 = i),

we have

p_{ij}(t) = \frac{P(X_t = j, X_0 = i)}{P(X_0 = i)} = \frac{P(Y_{N_t} = j, Y_0 = i)}{P(Y_0 = i)}
= \sum_{k=0}^{\infty} \frac{P(Y_k = j, Y_0 = i)}{P(Y_0 = i)} P(N_t = k)
= \sum_{k=0}^{\infty} \pi_{ij}^{(k)} e^{-\lambda t} \frac{(\lambda t)^k}{k!} = \sum_{k=0}^{\infty} e^{-\lambda t} \frac{\left((t\lambda\Pi)^k\right)_{ij}}{k!}
= \left( e^{-\lambda t} e^{t\lambda\Pi} \right)_{ij} = \left( e^{t(\lambda\Pi - \lambda I)} \right)_{ij} = \left( e^{tQ} \right)_{ij},

where Q = λ(Π − I).

Chapman-Kolmogorov equation

Proposition 3.3.1

\sum_{k \in I} p_{ik}(s) p_{kj}(t) = p_{ij}(t + s). \quad (3.3)

Proof.

p_{ij}(t+s) = P(X_{t+s} = j \mid X_0 = i) = \sum_{k \in I} P(X_{t+s} = j, X_s = k \mid X_0 = i)
= \sum_{k \in I} P(X_{t+s} = j \mid X_s = k) P(X_s = k \mid X_0 = i).


Equation (3.3) shows that if we know the transition probabilities for t < t0, for any t0 > 0, then we know them for all t. This observation suggests that the transition probabilities pij(t) can be determined from their derivatives at 0:

q_{ij} = \lim_{h \downarrow 0} \frac{p_{ij}(h)}{h}, \quad j \ne i.

If this limit exists we call qij the jump rate from i to j. For Example 3.3.1, qij = λπij.

Example 3.3.1 is atypical: there we started with the Markov chain and then derived its rates. In most cases it is much simpler to describe the system by writing down its transition rates qij for j ≠ i, which describe the rates at which jumps are made from i to j.

Example 3.3.2 For the Poisson process,

q_{n,n+1} = \lambda \quad for all n ≥ 0.

Example 3.3.3 (M/M/s queue) Imagine a bank with s servers that serve customers, who queue in a single line if all of the servers are busy. We imagine that customers arrive at the times of a Poisson process with rate λ, and that each service time is an independent exponential with rate µ. As in the previous example, qn,n+1 = λ. To model the departures we let

q_{n,n-1} = \begin{cases} n\mu & 0 \le n \le s \\ s\mu & n \ge s \end{cases}

3.3.2 Computing the transition probability

Theorem 3.3.1 Let Q be a matrix on a finite set I and set P(t) = e^{tQ}. Then (P(t), t ≥ 0) has the following properties:

(i) P(s + t) = P(s)P(t) for all s, t (semigroup property);
(ii) (P(t), t ≥ 0) is the unique solution to the forward equation \frac{d}{dt}P(t) = P(t)Q, P(0) = I;
(iii) (P(t), t ≥ 0) is the unique solution to the backward equation \frac{d}{dt}P(t) = QP(t), P(0) = I;
(iv) for k = 0, 1, 2, ..., we have \left(\frac{d}{dt}\right)^k\big|_{t=0} P(t) = Q^k.

Proof. For any s, t ∈ R, sQ and tQ commute, so

e^{sQ} e^{tQ} = e^{(s+t)Q},

proving the semigroup property. In fact, if Q1 and Q2 commute,

e^{Q_1 + Q_2} = \sum_{n=0}^{\infty} \frac{(Q_1 + Q_2)^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} Q_1^k Q_2^{n-k}
= \sum_{k=0}^{\infty} \frac{Q_1^k}{k!} \sum_{n=k}^{\infty} \frac{Q_2^{n-k}}{(n-k)!} = e^{Q_1} e^{Q_2}.


The matrix-valued power series

P(t) = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!}

has infinite radius of convergence, so each component is differentiable, with derivative given by term-by-term differentiation:

P'(t) = \sum_{k=1}^{\infty} \frac{t^{k-1} Q^k}{(k-1)!} = P(t)Q = QP(t).

Hence P(t) satisfies the forward and backward equations. By repeated term-by-term differentiation we obtain (iv). To show uniqueness, assume that M(t) is another solution of the forward equation; then

\frac{d}{dt}\left( M(t) e^{-tQ} \right) = \frac{d}{dt}(M(t))\, e^{-tQ} + M(t) \frac{d}{dt}\left(e^{-tQ}\right) = M(t)Q e^{-tQ} + M(t)(-Q) e^{-tQ} = 0,

so M(t)e^{-tQ} is constant, and M(t) = e^{tQ}. Similarly for the backward equation.

Example 3.3.4 Let

Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix},

for which

\det(Q - \lambda I) = -\lambda(\lambda + 2)(\lambda + 4).

Then Q has eigenvalues 0, −2, −4, so there is an invertible matrix U such that

Q = U \begin{pmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -4 \end{pmatrix} U^{-1},

so

e^{tQ} = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!} = U \left( \sum_{k=0}^{\infty} \frac{1}{k!} \begin{pmatrix} 0^k & 0 & 0 \\ 0 & (-2t)^k & 0 \\ 0 & 0 & (-4t)^k \end{pmatrix} \right) U^{-1} = U \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^{-2t} & 0 \\ 0 & 0 & e^{-4t} \end{pmatrix} U^{-1}.

In particular each entry of e^{tQ} has the form a + b e^{-2t} + c e^{-4t}. To determine the constants we use

P(0) = I, \quad P'(0) = Q, \quad P''(0) = Q^2.
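Numerically, the diagonalization route of this example can be carried out as follows (a sketch: np.linalg.eig recovers the eigenvalues 0, −2, −4, and the resulting P(t) satisfies the forward equation up to finite-difference error):

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])

# Eigendecomposition Q = U diag(w) U^{-1} gives e^{tQ} = U diag(e^{tw}) U^{-1}.
w, U = np.linalg.eig(Q)
print(sorted(round(x, 6) for x in w.real))   # [-4.0, -2.0, 0.0]

def P(t):
    return (U @ np.diag(np.exp(t * w)) @ np.linalg.inv(U)).real

t, h = 0.3, 1e-6
print(np.allclose(P(t).sum(axis=1), 1.0))            # rows sum to 1
print(np.allclose((P(t + h) - P(t)) / h, P(t) @ Q,   # P'(t) = P(t) Q (forward equation)
                  atol=1e-4))
```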

Page 59: A Course in Stochastic Processes - UB · PDF fileA Course in Stochastic Processes Jos´e Manuel Corcuera. ... laws. The mean m X(t) and ... valued stochastic process on a probability

3.3. CONTINUOUS-TIME MARKOV CHAINS 55

Theorem 3.3.2 Consider a continuous-time Markov chain with a finite set of states I. Assume that as h ↓ 0 we have

p_{ij}(h) = \delta_{ij} + q_{ij} h + o(h), \quad for all i, j ∈ I,

uniformly in t. Then (p_{ij}(t), i, j ∈ I, t ≥ 0) is the solution of the forward equation

p'(t) = p(t)Q, \quad p(0) = I,

where (p(t))_{ij} = p_{ij}(t) and Q_{ij} = q_{ij}.

Proof. Similar to Theorem 3.2.4:

p_{ij}(t+h) = \sum_{k \in I} p_{ik}(t)(\delta_{kj} + q_{kj} h + o(h)).

Since I is finite,

\frac{p_{ij}(t+h) - p_{ij}(t)}{h} = \sum_{k \in I} p_{ik}(t) q_{kj} + o(1).

Remark 3.3.1 Note that in the previous theorem q_{ii} = -\sum_{j \ne i} q_{ij}. By Theorem 3.3.1, (p_{ij}(t), i, j ∈ I, t ≥ 0) is also the solution of the backward equation

p'(t) = Qp(t), \quad p(0) = I.

Remark 3.3.2 We have similar results in the case of an infinite state space, but then we have to look for a minimal non-negative solution of the backward and forward equations.

Example 3.3.5 Consider a two-state chain where, for concreteness, I = {1, 2}. In this case we have to specify only two rates, q12 = λ and q21 = µ, so

Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.

Writing the backward equation in matrix form,

\begin{pmatrix} p_{11}'(t) & p_{12}'(t) \\ p_{21}'(t) & p_{22}'(t) \end{pmatrix} = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix} \begin{pmatrix} p_{11}(t) & p_{12}(t) \\ p_{21}(t) & p_{22}(t) \end{pmatrix},

to solve this we guess that

p_{11}(t) = \frac{\mu}{\mu+\lambda} + \frac{\lambda}{\mu+\lambda} e^{-(\mu+\lambda)t},
p_{21}(t) = \frac{\mu}{\mu+\lambda} - \frac{\mu}{\mu+\lambda} e^{-(\mu+\lambda)t}.


Example 3.3.6 (The Yule process) In this model each particle splits into two at rate β, so q_{i,i+1} = βi. It can be shown that

p_{1j}(t) = e^{-\beta t}(1 - e^{-\beta t})^{j-1} \quad for j ≥ 1. \quad (3.4)

From this, since the chain starting at i is the sum of i copies of the chain starting at 1, we have

p_{ij}(t) = \binom{j-1}{i-1} \left(e^{-\beta t}\right)^i (1 - e^{-\beta t})^{j-i}.

3.3.3 Jump chain and holding times

We say that a matrix Q = (q_{ij}, i, j ∈ I) is a rate matrix if

(i) 0 ≤ −q_{ii} < ∞ for all i;

(ii) q_{ij} ≥ 0 for all i ≠ j;

(iii) \sum_{j \in I} q_{ij} = 0 for all i.

We shall write q_i or q(i) as an alternative notation for −q_{ii}. From a rate matrix Q we can obtain a stochastic matrix Π, the jump matrix:

\pi_{ij} = \begin{cases} q_{ij}/q_i & \text{if } j \ne i \text{ and } q_i \ne 0 \\ 0 & \text{if } j \ne i \text{ and } q_i = 0 \end{cases}
\qquad
\pi_{ii} = \begin{cases} 0 & \text{if } q_i \ne 0 \\ 1 & \text{if } q_i = 0. \end{cases}

Example 3.3.7 If

Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix},

then

\Pi = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1 & 0 & 0 \\ 2/3 & 1/3 & 0 \end{pmatrix}.

Definition 3.3.2 A minimal right-continuous process (Xt)t≥0 on I is a Markov chain with initial distribution λ and rate matrix Q if its jump chain (Yn)n≥0 is a discrete-time Markov chain Markov(λ, Π) and if, for each n ≥ 1, conditional on Y0, Y1, ..., Yn−1, its holding times S1, S2, ..., Sn are independent exponential random variables of parameters q(Y0), q(Y1), ..., q(Yn−1). In such a way that, if we write the jump times as Jn = S1 + S2 + ... + Sn, then

X_{J_n} = Y_n,

and

X_t = \begin{cases} Y_n & \text{if } J_n \le t < J_{n+1} \text{ for some } n \\ \infty & \text{otherwise (it is minimal).} \end{cases}


The chain can be constructed as follows. Let X0 = Y0 have distribution λ, and take an array (T_n^j, n ≥ 1, j ∈ I) of independent exponential random variables of parameter 1. Then, inductively for n = 0, 1, 2, ..., if Yn = i we set

S_{n+1}^j = T_{n+1}^j / q_{ij}, \quad for j ≠ i,
S_{n+1} = \inf_{j \ne i} S_{n+1}^j,
Y_{n+1} = \begin{cases} j & \text{if } S_{n+1}^j = S_{n+1} < \infty \\ i & \text{if } S_{n+1} = \infty. \end{cases}

Then, conditional on Yn = i, Sn+1 is E(q_i) with q_i = \sum_{j \ne i} q_{ij}, Yn+1 has distribution (πij, j ∈ I), and Sn+1 and Yn+1 are independent, and independent of Y0, Y1, ..., Yn−1 and S1, ..., Sn.
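The construction above translates directly into a simulation scheme: run independent exponential clocks out of the current state and jump where the minimum occurs. A sketch using the rate matrix of Example 3.3.7; the long-run occupation fractions should approach the invariant distribution (3/8, 1/2, 1/8) obtained by solving λQ = 0 (see Section 3.3.8):

```python
import random

# Rate matrix of Example 3.3.7.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 1.0, -3.0]]

def step(i):
    """From state i, let independent E(q_ij) clocks compete; the minimum
    gives the holding time and the next state."""
    clocks = {j: random.expovariate(Q[i][j])
              for j in range(3) if j != i and Q[i][j] > 0}
    j = min(clocks, key=clocks.get)
    return clocks[j], j

random.seed(3)
i, t, time_in = 0, 0.0, [0.0, 0.0, 0.0]
for _ in range(100000):          # illustrative number of jumps
    hold, j = step(i)
    time_in[i] += hold
    t += hold
    i = j

print([round(x / t, 2) for x in time_in])   # long-run occupation fractions
```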

3.3.4 Birth processes

A birth process (Xt)t≥0 is a generalization of a Poisson process in which the parameter λ is allowed to depend on the current state of the process. The data for a birth process consist of birth rates 0 ≤ q_{i,i+1} < ∞, i = 0, 1, 2, .... That is, (Xt)t≥0 is a right-continuous process with values in Z+ ∪ {∞} such that, conditional on X0 = i, its holding times are E(q_{i,i+1}), E(q_{i+1,i+2}), ... and its jump chain is given by Yn = i + n. So

Q = \begin{pmatrix} -q_{01} & q_{01} & & \\ & -q_{12} & q_{12} & \\ & & -q_{23} & q_{23} \\ & & & \ddots & \ddots \end{pmatrix}.

Example 3.3.8 (Simple birth process or Yule process) Here q_{i,i+1} = βi. Write Xt for the number of individuals at time t. Suppose that X0 = 1 and let J1 denote, as above, the time of the first birth; then J1 ∼ E(β) and

E(X_t) = E(X_t 1_{\{J_1 \le t\}}) + E(X_t 1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s} E(X_t \mid J_1 = s)\, ds + E(1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s} E(X_t \mid J_1 = s)\, ds + e^{-\beta t}.

If we put µ(t) = E(Xt), then E(Xt | J1 = s) = 2µ(t − s): if we have a birth at s, thereafter we have two independent copies of the same type of process as X. Then

\mu(t) = 2 \int_0^t \beta e^{-\beta s} \mu(t - s)\, ds + e^{-\beta t},


and from here we obtain

\mu'(t) = \beta\mu(t),

so the mean population size grows exponentially:

E(X_t) = e^{\beta t}.

Note also that we can obtain µ(t) from (3.4):

\mu(t) = \sum_{j=1}^{\infty} j\, p_{1j}(t) = \sum_{j=1}^{\infty} j e^{-\beta t}(1 - e^{-\beta t})^{j-1}
= \frac{1}{\beta} \frac{d}{dt} \sum_{j=1}^{\infty} (1 - e^{-\beta t})^j = \frac{1}{\beta} \frac{d}{dt}\, \frac{1 - e^{-\beta t}}{e^{-\beta t}} = e^{\beta t}.

Definition 3.3.3 If (Jn)n≥1 are the jump times and (Sn)n≥1 the holding times of our Markov chain, then

\zeta := \sup_n J_n = \sum_{n=1}^{\infty} S_n

is called the explosion time.

Remark 3.3.3 Note that our Markov chains, say (Xt)t≥0, are minimal processes in the sense that

X_t = \infty \quad if t ≥ ζ.

Remark 3.3.4 It is easy to see from Theorem 3.2.1 that the birth process starting from zero explodes if and only if

\sum_{i=0}^{\infty} \frac{1}{q_{i,i+1}} < \infty.

Example 3.3.9 (Non-minimal chain) Consider a birth process (Xt)t≥0 starting from 0 with rates q_{i,i+1} = 2^i for i ≥ 0. By the previous remark the process explodes. We have insisted that Xt = ∞ if t ≥ ζ, where ζ is the explosion time, but another obvious possibility is to start the process off again from zero at time ζ, and to do the same for all subsequent explosions. Using the memoryless property of the exponential distribution it can be shown that the new process is a continuous-time Markov chain in the sense of Definition 3.3.1.

3.3.5 Class structure

Hereafter we deal only with minimal chains, those that die after explosion. Then the class structure is simply the discrete-time class structure of the jump chain (Yn)n≥0. We say that i leads to j, and write i → j, if

P_i(X_t = j \text{ for some } t \ge 0) > 0.


We say i communicates with j, and write i ←→ j, if both i → j and j → i. The notions of communicating class, closed class, absorbing state and irreducibility are inherited from the jump chain.

Theorem 3.3.3 For distinct states i and j the following are equivalent:

(i) i → j;
(ii) i → j for the jump chain;
(iii) q_{i_0 i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n} > 0 for some states i_0, i_1, ..., i_n with i_0 = i and i_n = j;
(iv) p_{ij}(t) > 0 for all t > 0;
(v) p_{ij}(t) > 0 for some t > 0.

3.3.6 Hitting times

Let (Xt)t≥0 be a Markov chain with rate matrix Q. The hitting time of a subset A of I is the random variable D_A defined by

D_A(\omega) = \inf\{t \ge 0 : X_t(\omega) \in A\},

with the usual convention that inf ∅ = ∞. Since (Xt)t≥0 is minimal, if H_A is the hitting time of A for the jump chain, then

\{H_A < \infty\} = \{D_A < \infty\},

and on this set

D_A = J_{H_A}.

The probability, starting from i, that (Xt)t≥0 ever hits A is then

h_i^A = P_i(D_A < \infty) = P_i(H_A < \infty).

Theorem 3.3.4 The vector of hitting probabilities h^A = (h_i^A, i ∈ I) is the minimal non-negative solution to the system of linear equations

h_i^A = 1 \quad for i ∈ A,
\sum_{j \in I} q_{ij} h_j^A = 0 \quad for i ∉ A.


The average time taken, starting from i, for (Xt)t≥0 to reach A is given by

k_i^A = E_i(D_A).

Theorem 3.3.5 Assume that q_i > 0 for all i ∉ A. The vector of expected hitting times k^A = (k_i^A, i ∈ I) is the minimal non-negative solution to the system of linear equations

k_i^A = 0 \quad for i ∈ A,
-\sum_{j \in I} q_{ij} k_j^A = 1 \quad for i ∉ A. \quad (3.5)

Proof. First note that k^A satisfies (3.5): if X0 = i ∈ A then D_A = 0, so k_i^A = 0; if X0 = i ∉ A, by the Markov property

E_i(D_A - J_1 \mid Y_1 = j) = E_j(D_A),

so

k_i^A = E_i(D_A) = E_i(J_1) + E_i(D_A - J_1)
= E_i(J_1) + \sum_{j \ne i} E_i(D_A - J_1 \mid Y_1 = j) P_i(Y_1 = j)
= \frac{1}{q_i} + \sum_{j \ne i} E_j(D_A)\pi_{ij} = \frac{1}{q_i} + \sum_{j \ne i} \pi_{ij} k_j^A = \frac{1}{q_i} + \sum_{j \ne i} \frac{q_{ij}}{q_i} k_j^A,

and then

-q_{ii} k_i^A - \sum_{j \ne i} q_{ij} k_j^A = 1.

Suppose now that y = (y_i, i ∈ I) is another non-negative solution to (3.5). Then k_i^A = y_i = 0 for i ∈ A. If i ∉ A, then

y_i = \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij} y_j = \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij} \left( \frac{1}{q_j} + \sum_{k \notin A} \pi_{jk} y_k \right)
= E_i(S_1) + E_i(S_2 1_{\{H_A \ge 2\}}) + \sum_{j \notin A} \sum_{k \notin A} \pi_{ij}\pi_{jk} y_k.

By repeated substitution for y we obtain

y_i = E_i(S_1) + \cdots + E_i(S_n 1_{\{H_A \ge n\}}) + \sum_{j_1 \notin A} \cdots \sum_{j_n \notin A} \pi_{i j_1} \cdots \pi_{j_{n-1} j_n} y_{j_n}.

So

y_i \ge \sum_{m=1}^{n} E_i(S_m 1_{\{H_A \ge m\}}) = E_i\left( \sum_{m=1}^{n \wedge H_A} S_m \right) \to E_i(D_A) = k_i^A \quad as n \to \infty.
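Theorem 3.3.5 reduces the computation of expected hitting times to a linear system. A sketch for the rate matrix of Example 3.3.7, taking A to be the third state (an illustrative choice); solving by hand gives expected hitting times 2 and 3 from the first and second states:

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])
rest = [0, 1]   # states outside A = {third state}

# Solve -sum_j q_ij k_j = 1 for i outside A, with k_j = 0 on A:
M = -Q[np.ix_(rest, rest)]
k = np.linalg.solve(M, np.ones(len(rest)))
print(k)   # [2. 3.]
```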


3.3.7 Recurrence and transience

Let (Xt)t≥0 be a Markov chain with rate matrix Q; recall that (Xt)t≥0 is minimal. We say a state i is recurrent if

P_i(\{t \ge 0 : X_t = i\} \text{ is unbounded}) = 1,

and transient if

P_i(\{t \ge 0 : X_t = i\} \text{ is unbounded}) = 0.

Remark 3.3.5 Note that if (Xt)t≥0 can explode starting from i, then i is certainly not recurrent.

Theorem 3.3.6 We have:

(i) if i is recurrent for the jump chain (Yn)n≥0, then i is recurrent for (Xt)t≥0;
(ii) if i is transient for the jump chain (Yn)n≥0, then i is transient for (Xt)t≥0;
(iii) every state is either recurrent or transient;
(iv) recurrence and transience are class properties.

Proof. (i) Suppose i is recurrent for the jump chain (Yn)n≥0. If X0 = i then (Xt)t≥0 does not explode and Jn → ∞. Also X(Jn) = Yn = i infinitely often, so {t ≥ 0 : Xt = i} is unbounded, with probability 1.

(ii) Suppose i is transient for (Yn)n≥0. If X0 = i then

N = \sup\{n \ge 0 : Y_n = i\} < \infty,

so {t ≥ 0 : Xt = i} is bounded by J_{N+1}, which is finite with probability 1, because (Yn, n ≤ N) cannot include an absorbing state.

We denote by T_i the first passage time of (Xt)t≥0 to state i, defined by

T_i(\omega) = \inf\{t \ge J_1(\omega) : X_t(\omega) = i\}.

Theorem 3.3.7 The following dichotomy holds:

(i) if q_i = 0 or P_i(T_i < \infty) = 1, then i is recurrent and \int_0^{\infty} p_{ii}(t)\, dt = \infty;

(ii) if q_i > 0 and P_i(T_i < \infty) < 1, then i is transient and \int_0^{\infty} p_{ii}(t)\, dt < \infty.

Proof. If q_i = 0 then (Xt)t≥0 cannot leave i, and p_{ii}(t) = 1. Suppose q_i > 0 and let N_i denote the first passage time of the jump chain (Yn)n≥0 to the state i. Then

P_i(N_i < \infty) = P_i(T_i < \infty),


so i is recurrent if and only if P_i(T_i < \infty) = 1, by the previous theorem and the corresponding result for the jump chain (Yn)n≥0.

Write \pi_{ii}^{(n)} for (\Pi^n)_{ii}. We shall prove that

\int_0^{\infty} p_{ii}(t)\, dt = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)}.

Indeed,

\int_0^{\infty} p_{ii}(t)\, dt = \int_0^{\infty} E_i(1_{\{X_t = i\}})\, dt = E_i\left( \int_0^{\infty} 1_{\{X_t = i\}}\, dt \right) = E_i\left( \sum_{n=0}^{\infty} S_{n+1} 1_{\{Y_n = i\}} \right)
= \sum_{n=0}^{\infty} E_i(S_{n+1} \mid Y_n = i) P_i(Y_n = i) = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)}.

3.3.8 Invariant distributions

We say that λ is invariant if

\lambda Q = 0.

Theorem 3.3.8 Let Q be a rate matrix with jump matrix Π and let λ be a measure. The following are equivalent:

(i) λ is invariant;
(ii) µΠ = µ, where µ_i = λ_i q_i.

Proof. We have q_i(\pi_{ij} - \delta_{ij}) = q_{ij} for all i, j, so

(\mu(\Pi - I))_j = \sum_{i \in I} \mu_i(\pi_{ij} - \delta_{ij}) = \sum_{i \in I} \lambda_i q_{ij} = (\lambda Q)_j.

Theorem 3.3.9 Suppose that Q is irreducible and recurrent. Then Q has an invariant measure λ, unique up to scalar multiples.

Recall that a state i is recurrent if q_i = 0 or P_i(T_i < \infty) = 1. If q_i = 0 or the expected return time m_i := E_i(T_i) is finite, we say i is positive recurrent; otherwise it is called null recurrent.

Theorem 3.3.10 Let Q be an irreducible rate matrix. Then the following are equivalent:

(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) Q has an invariant distribution, λ say.

Moreover, when (iii) holds we have m_i = 1/(\lambda_i q_i) for all i.


Theorem 3.3.11 Let Q be an irreducible rate matrix and let λ be a measure. Let s > 0 be given. The following are equivalent:

(i) λQ = 0;
(ii) λP(s) = λ.

3.3.9 Convergence to equilibrium

Theorem 3.3.12 Let Q be an irreducible non-explosive rate matrix with semigroup P(t) and having an invariant distribution λ. Then for all states i, j we have

p_{ij}(t) \to \lambda_j \quad as t \to \infty.

Theorem 3.3.13 Let Q be an irreducible rate matrix and let ν be any distribution. Suppose that (Xt)t≥0 is Markov(ν, Q). Then

P(X_t = j) \to 1/(q_j m_j) \quad as t \to \infty, for all j ∈ I,

where m_j is the expected return time to state j.