
DS-GA 1002 Lecture notes 5 Fall 2016

Random processes

1 Introduction

Random processes, also known as stochastic processes, allow us to model quantities that evolve in time (or space) in an uncertain way: the trajectory of a particle, the price of oil, the temperature in New York, the national debt of the United States, etc. In these notes we introduce a mathematical framework that allows us to reason probabilistically about such quantities.

2 Definition

We denote random processes using a tilde over an upper case letter X. This is not standard notation, but we want to emphasize the difference with random variables and random vectors.

Formally, a random process X is a function that maps elements in a sample space Ω to real-valued functions.

Definition 2.1 (Random process). Given a probability space (Ω, F, P), a random process X is a function that maps each element ω in the sample space Ω to a function X(ω, ·) : T → R, where T is a discrete or continuous set.

There are two possible interpretations for X (ω, t):

• If we fix ω, then X(ω, t) is a deterministic function of t known as a realization of the random process.

• If we fix t, then X(ω, t) is a random variable, which we usually just denote by X(t).

We can consequently interpret X as an infinite collection of random variables indexed by t. The set of possible values that the random variable X(t) can take for fixed t is called the state space of the random process. Random processes can be classified according to the indexing variable or to their state space.

• If the indexing variable t is defined on R, or on a semi-infinite interval (t0, ∞) for some t0 ∈ R, then X is a continuous-time random process.


• If the indexing variable t is defined on a discrete set, usually the integers or the natural numbers, then X is a discrete-time random process. In such cases we often use a different letter from t, such as i, as an indexing variable.

• If X(t) is a discrete random variable for all t, then X is a discrete-state random process. If the discrete random variable takes a finite number of values that is the same for all t, then X is a finite-state random process.

• If X(t) is a continuous random variable for all t, then X is a continuous-state random process.

Note that there are continuous-state discrete-time random processes and discrete-state continuous-time random processes. Any combination is possible.

The underlying probability space (Ω, F, P) mentioned in the definition completely determines the stochastic behavior of the random process. In principle we can specify random processes by defining the probability space (Ω, F, P) and the mapping from elements in Ω to continuous or discrete functions, as illustrated in the following example. As we will discuss later on, this way of specifying random processes is only tractable for very simple cases.

Example 2.2 (Puddle). Bob asks Mary to model a puddle probabilistically. When the puddle is formed, it contains an amount of water that is distributed uniformly between 0 and 1 gallon. As time passes, the water evaporates: after a time interval t the water that is left is the initial quantity divided by t.

Mary models the water in the puddle as a continuous-state continuous-time random process C. The underlying sample space is (0, 1), the σ-algebra is the corresponding Borel σ-algebra (all possible countable unions of intervals in (0, 1)) and the probability measure is the uniform probability measure on (0, 1). For a particular element in the sample space ω ∈ (0, 1)

C(ω, t) := ω / t,   t ∈ [1, ∞),    (1)

where the unit of t is days in this example. Figure 1 shows different realizations of the random process. Each realization is a deterministic function on [1, ∞).

Bob points out that he only cares what the state of the puddle is each day, as opposed to at any time t. Mary decides to simplify the model by using a continuous-state discrete-time random process D. The underlying probability space is exactly the same as before, but the time index is now discrete. For a particular element in the sample space ω ∈ (0, 1)

D(ω, i) := ω / i,   i = 1, 2, . . .    (2)

Figure 1 shows different realizations of the discrete-time random process. Note that each realization is just a deterministic discrete sequence.
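The following minimal sketch (ours, not part of the original notes; the function name and grid are illustrative) shows how realizations of C and D can be drawn exactly as in the example: sample ω uniformly once, then evaluate ω/t on a grid of times or at integer days.

```python
import numpy as np

rng = np.random.default_rng(0)

def puddle_realization(times):
    """Sample omega ~ Uniform(0, 1) and return the realization omega / t at the given times."""
    omega = rng.uniform(0.0, 1.0)
    return omega / times

t = np.linspace(1.0, 10.0, 500)        # continuous-time process C evaluated on a fine grid
realization_C = puddle_realization(t)

i = np.arange(1, 11)                   # discrete-time process D evaluated at days 1, ..., 10
realization_D = puddle_realization(i)

print(realization_D)
```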


Figure 1: Realizations of the continuous-time (left) and discrete-time (right) random processes defined in Example 2.2.

When working with random processes in a probabilistic model, we are often interested in the joint distribution of the process sampled at several fixed times. This is given by the nth-order distribution of the random process.

Definition 2.3 (nth-order distribution). The nth-order distribution of a random process X is the joint distribution of the random variables X(t1), X(t2), . . . , X(tn) for any n samples t1, t2, . . . , tn of the time index t.

Example 2.4 (Puddle (continued)). The first-order cdf of C (t) in Example 2.2 is

FC(t)(x) := P(C(t) ≤ x)    (3)
          = P(ω ≤ t x)    (4)
          = 0                          if x < 0,
            ∫_{u=0}^{t x} du = t x     if 0 ≤ x ≤ 1/t,
            1                          if x > 1/t.    (5)

We obtain the first-order pdf by differentiating,

fC(t)(x) = t   if 0 ≤ x ≤ 1/t,
           0   otherwise.    (6)
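As a quick numerical sanity check (ours, not part of the original notes), we can compare the empirical cdf of C(t), obtained by sampling ω, with the expression t x from (5):

```python
import numpy as np

rng = np.random.default_rng(1)
t, x = 2.0, 0.3                            # a point with 0 <= x <= 1/t
omega = rng.uniform(0.0, 1.0, size=100_000)
empirical_cdf = np.mean(omega / t <= x)    # fraction of realizations with C(t) <= x
print(empirical_cdf, t * x)                # both should be close to 0.6
```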


If the nth-order distribution of a random process is shift-invariant, then the process is said to be strictly or strongly stationary.

Definition 2.5 (Strictly/strongly stationary process). A process is stationary in a strict or strong sense if for any n ≥ 0, any n samples t1, t2, . . . , tn and any displacement τ, the random variables X(t1), X(t2), . . . , X(tn) have the same joint distribution as X(t1 + τ), X(t2 + τ), . . . , X(tn + τ).

The random processes in Example 2.2 are clearly not strictly stationary because their first-order distributions are not the same at every time point. Important examples of strictly stationary processes are independent identically-distributed sequences, presented in Section 4.1.

As in the case of random variables and random vectors, defining the underlying probability space in order to specify a random process is usually not very practical, except for very simple cases like the one in Example 2.2. The reason is that it is challenging to come up with a probability space that gives rise to a given nth-order distribution of interest. Fortunately, we can also specify a random process by directly specifying its nth-order distribution for all values of n = 1, 2, . . . This completely characterizes the random process. Most of the random processes described in Section 4, e.g. independent identically-distributed sequences, Markov chains, Poisson processes and Gaussian processes, are specified in this way.

Finally, random processes can also be specified by expressing them as functions of other random processes. A function Y := g(X) of a random process X is also a random process, as it maps any element ω in the sample space Ω to a function Y(ω, ·) := g(X(ω, ·)). In Section 4.4 we define random walks in this way.

3 Mean and autocovariance functions

The expectation operator allows us to derive quantities that summarize the behavior of the random process through weighted averaging. The mean of the random process is the mean of X(t) at any fixed time t.

Definition 3.1 (Mean). The mean of a random process is the function

µX(t) := E(X(t)).    (7)

Note that the mean is a deterministic function of t. The autocovariance of a random process is another deterministic function that is equal to the covariance of X(t1) and X(t2) for any two points t1 and t2. If we set t1 := t2, then the autocovariance equals the variance at t1.


Definition 3.2 (Autocovariance). The autocovariance of a random process is the function

RX(t1, t2) := Cov(X(t1), X(t2)).    (8)

In particular,

RX(t, t) = Var(X(t)).    (9)

Intuitively, the autocovariance quantifies the correlation between the process at two different time points. If this correlation only depends on the separation between the two points, then the process is said to be wide-sense stationary.

Definition 3.3 (Wide-sense/weakly stationary process). A process is stationary in a wide or weak sense if its mean is constant,

µX(t) = µ,    (10)

and its autocovariance function is shift invariant, i.e.

RX(t1, t2) = RX(t1 + τ, t2 + τ)    (11)

for any t1 and t2 and any shift τ. For weakly stationary processes, the autocovariance is usually expressed as a function of the difference between the two time points,

RX(s) := RX(t, t + s) for any t.    (12)

Note that any strictly stationary process is necessarily weakly stationary because its first- and second-order distributions are shift invariant.

Figure 2 shows several stationary random processes with different autocovariance functions. If the autocovariance function is only nonzero at the origin, then the values of the random process at different points are uncorrelated. This results in erratic fluctuations. When the autocovariance at neighboring times is high, the trajectory of the random process becomes smoother. The autocovariance can also induce more structured behavior, as in the right column of the figure. In that example X(i) is negatively correlated with its two neighbors X(i − 1) and X(i + 1), but positively correlated with X(i − 2) and X(i + 2). This results in rapid periodic fluctuations.

4 Important random processes

In this section we describe some important examples of random processes.


Figure 2: Realizations (bottom three rows) of Gaussian processes with zero mean and the autocovariance functions shown on the top row.


Figure 3: Realizations of an iid uniform sequence in (0, 1) (first row) and an iid geometric sequence with parameter p = 0.4 (second row).

4.1 Independent identically-distributed sequences

An independent identically-distributed (iid) sequence X is a discrete-time random process where X(i) has the same distribution for any fixed i and X(i1), X(i2), . . . , X(in) are mutually independent for any n fixed indices and any n ≥ 2. If X(i) is a discrete random variable (or equivalently the state space of the random process is discrete), then we denote the pmf associated to the distribution of each entry by pX. This pmf completely characterizes the random process, since for any n indices i1, i2, . . . , in and any n:

pX(i1),X(i2),...,X(in)(xi1, xi2, . . . , xin) = ∏_{j=1}^{n} pX(xij).    (13)

Note that the distribution does not vary if we shift every index by the same amount, so the process is strictly stationary.

Similarly, if X(i) is a continuous random variable, then we denote the pdf associated to the distribution by fX. For any n indices i1, i2, . . . , in and any n we have

fX(i1),X(i2),...,X(in)(xi1, xi2, . . . , xin) = ∏_{j=1}^{n} fX(xij).    (14)

Figure 3 shows several realizations from iid sequences which follow a uniform and a geometric distribution.

The mean of an iid random sequence is constant and equal to the mean of its associated distribution, which we denote by µ,

µX(i) := E(X(i))    (15)
       = µ.    (16)

Let us denote the variance of the distribution associated to the iid sequence by σ². The autocovariance function is given by

RX(i, j) := E(X(i) X(j)) − E(X(i)) E(X(j))    (17)
          = σ²   if i = j,
            0    otherwise.    (18)

This is not surprising: X(i) and X(j) are independent for all i ≠ j, so they are also uncorrelated.
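The constant mean (15)-(16) and the diagonal autocovariance (17)-(18) are easy to check numerically. The sketch below (ours, with illustrative parameters) estimates them for an iid uniform sequence, whose entries have µ = 1/2 and σ² = 1/12.

```python
import numpy as np

rng = np.random.default_rng(2)
n_realizations, n_times = 50_000, 5
X = rng.uniform(0.0, 1.0, size=(n_realizations, n_times))  # each row is one iid realization

mean_estimate = X.mean(axis=0)           # should be close to 1/2 at every index
cov_estimate = np.cov(X, rowvar=False)   # should be close to (1/12) times the identity
print(np.round(mean_estimate, 3))
print(np.round(cov_estimate, 3))
```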

4.2 Gaussian process

A random process X is Gaussian if the joint distribution of the random variables X(t1), X(t2), . . . , X(tn) is Gaussian for all t1, t2, . . . , tn and any n ≥ 1. An interesting feature of Gaussian processes is that they are fully characterized by their mean and autocovariance function. Figure 2 shows realizations of several discrete Gaussian processes with different autocovariances.
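Because a Gaussian process is fully characterized by its mean and autocovariance, its values at a finite set of times can be drawn from the corresponding multivariate Gaussian. The sketch below is ours (the squared-exponential autocovariance and the length scale ell are illustrative assumptions, not taken from the notes); realizations like the ones in Figure 2 can be produced this way, with larger length scales giving smoother trajectories.

```python
import numpy as np

rng = np.random.default_rng(3)
i = np.arange(15)                      # time indices at which the process is sampled

# Assumed autocovariance R(i, j) = exp(-(i - j)^2 / (2 * ell^2)).
ell = 2.0
R = np.exp(-((i[:, None] - i[None, :]) ** 2) / (2 * ell**2))

mean = np.zeros(len(i))                # zero-mean process, as in Figure 2
realizations = rng.multivariate_normal(mean, R, size=3)
print(realizations.shape)              # (3, 15): three realizations of length 15
```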

4.3 Poisson process

In Lecture Notes 2 we motivated the definition of the Poisson random variable by deriving the distribution of the number of events that occur in a fixed time interval under the following conditions:

1. Each event occurs independently from every other event.


2. Events occur uniformly.

3. Events occur at a rate of λ events per time interval.

We now assume that these conditions hold in the semi-infinite interval [0, ∞) and define a random process N that counts the events. To be clear, N(t) is the number of events that happen between 0 and t.

By the same reasoning as in Example 3.7 of Lecture Notes 2, the random variable N(t2) − N(t1), which represents the number of events that occur between t1 and t2, is a Poisson random variable with parameter λ(t2 − t1). This holds for any t1 and t2.

In addition, the random variables N(t2) − N(t1) and N(t4) − N(t3) are independent as long as the intervals [t1, t2] and [t3, t4] do not overlap, by Condition 1. A Poisson process is a discrete-state continuous-time random process that satisfies these two properties. Poisson processes are often used to model events such as earthquakes, telephone calls, decay of radioactive particles, neural spikes, etc. In Lecture Notes 2, Figure 5 shows an example of a real scenario where the number of calls received at a call center is well approximated as a Poisson process (as long as we only consider a few hours). Note that here we are using the word event to mean something that happens, such as the arrival of an email, instead of a set within a sample space, which is the meaning that it usually has elsewhere in these notes.

Definition 4.1 (Poisson process). A Poisson process with parameter λ is a discrete-state continuous-time random process N such that

1. N(0) = 0.

2. For any t1 < t2, N(t2) − N(t1) is a Poisson random variable with parameter λ(t2 − t1).

3. For any t1 < t2 < t3 < t4, the random variables N(t2) − N(t1) and N(t4) − N(t3) are independent.

We now check that the random process is well defined, by proving that we can derive the joint pmf of N at any n points t1 < t2 < . . . < tn for any n ≥ 1. To alleviate notation, let p(λ, x) be the value of the pmf of a Poisson random variable with parameter λ at x, i.e.

p(λ, x) := λ^x e^{−λ} / x!.    (19)


Figure 4: Events corresponding to the realizations of a Poisson process N for different values of the parameter λ (columns: λ = 0.2, λ = 1, λ = 2). N(t) equals the number of events up to time t.


We have

pN(t1),...,N(tn)(x1, . . . , xn)    (20)
  = P(N(t1) = x1, . . . , N(tn) = xn)    (21)
  = P(N(t1) = x1, N(t2) − N(t1) = x2 − x1, . . . , N(tn) − N(tn−1) = xn − xn−1)    (22)
  = P(N(t1) = x1) P(N(t2) − N(t1) = x2 − x1) · · · P(N(tn) − N(tn−1) = xn − xn−1)
  = p(λ t1, x1) p(λ(t2 − t1), x2 − x1) · · · p(λ(tn − tn−1), xn − xn−1).    (23)

In words, we have expressed the event that N(ti) = xi for 1 ≤ i ≤ n in terms of the random variables N(t1) and N(ti) − N(ti−1), 2 ≤ i ≤ n, which are independent Poisson random variables with parameters λ t1 and λ(ti − ti−1) respectively.

Figure 4 shows several sequences of events corresponding to the realizations of a Poisson process N for different values of the parameter λ (N(t) equals the number of events up to time t). Interestingly, the interarrival time of the events, i.e. the time between contiguous events, always has the same distribution: it is an exponential random variable. This allows us to simulate Poisson processes by sampling from an exponential distribution. Figure 4 was generated in this way.
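A minimal simulation sketch along these lines (ours, not the code actually used to generate Figure 4): draw exponential interarrival times and accumulate them to obtain the event times, so that N(t) is simply the number of event times not exceeding t.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_poisson_process(lam, t_max):
    """Return the event times of a Poisson process with parameter lam on [0, t_max]."""
    event_times = []
    t = rng.exponential(1.0 / lam)          # first interarrival time
    while t <= t_max:
        event_times.append(t)
        t += rng.exponential(1.0 / lam)     # next interarrival time
    return np.array(event_times)

events = simulate_poisson_process(lam=2.0, t_max=10.0)
print(len(events), np.sum(events <= 5.0))   # total number of events, and N(5)
```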

Lemma 4.2 (Interarrival times of a Poisson process are exponential). Let T denote the time between two contiguous events in a Poisson process with parameter λ. T is an exponential random variable with parameter λ.

The proof is in Section A of the appendix. Figure 8 in Lecture Notes 2 shows that the interarrival times of telephone calls at a call center are indeed well modeled as exponential.

The following lemma, which derives the mean and autocovariance functions of a Poisson process, is proved in Section B.

Lemma 4.3 (Mean and autocovariance of a Poisson process). The mean and autocovariance of a Poisson process equal

E(X(t)) = λ t,    (24)
RX(t1, t2) = λ min{t1, t2}.    (25)

The mean of the Poisson process is not constant and its autocovariance is not shift-invariant, so the process is neither strictly nor wide-sense stationary.


Example 4.4 (Earthquakes). The number of earthquakes with intensity at least 3 on the Richter scale occurring in the San Francisco peninsula is modeled using a Poisson process with parameter 0.3 earthquakes/year. What is the probability that there are no earthquakes in the next ten years and then at least one earthquake over the following twenty years?

We define a Poisson process X with parameter 0.3 to model the problem. The number of earthquakes in the next 10 years, i.e. X(10), is a Poisson random variable with parameter 0.3 · 10 = 3. The number of earthquakes in the following 20 years, X(30) − X(10), is Poisson with parameter 0.3 · 20 = 6. The two random variables are independent because the intervals do not overlap.

P(X(10) = 0, X(30) ≥ 1) = P(X(10) = 0, X(30) − X(10) ≥ 1)    (26)
  = P(X(10) = 0) P(X(30) − X(10) ≥ 1)    (27)
  = P(X(10) = 0) (1 − P(X(30) − X(10) = 0))    (28)
  = e^{−3} (1 − e^{−6}) ≈ 4.97 · 10^{−2}.    (29)

The probability is 4.97%.
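A quick numerical check of (29) (ours), using only the Poisson pmf evaluated at zero:

```python
import math

lam = 0.3
p_none_first_10 = math.exp(-lam * 10)             # P(X(10) = 0) = e^{-3}
p_at_least_one_next_20 = 1 - math.exp(-lam * 20)  # P(X(30) - X(10) >= 1) = 1 - e^{-6}
print(p_none_first_10 * p_at_least_one_next_20)   # approximately 0.0497
```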

4.4 Random walk

A random walk is a discrete-time random process that is used to model a sequence that evolves by taking steps in random directions. To define a random walk formally, we first define an iid sequence of steps S such that

S(i) = +1 with probability 1/2,
       −1 with probability 1/2.    (30)

We define a random walk X as the discrete-state discrete-time random process

X(i) := 0                   for i = 0,
        ∑_{j=1}^{i} S(j)    for i = 1, 2, . . .    (31)

We have specified X as a function of an iid sequence, so it is well defined. Figure 5 shows several realizations of the random walk.
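Realizations like those in Figure 5 follow directly from definition (31) by accumulating iid ±1 steps; a small sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n_steps = 14
steps = rng.choice([1, -1], size=n_steps)        # iid steps S(1), ..., S(n)
walk = np.concatenate(([0], np.cumsum(steps)))   # X(0) = 0, X(i) = S(1) + ... + S(i)
print(walk)
```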

X is symmetric (there is the same probability of taking a positive step and a negative step) and begins at the origin. It is easy to define variations where the walk is non-symmetric and begins at another point. Generalizations to higher-dimensional spaces, for instance to model random processes on a 2D surface, are also possible.

Figure 5: Realizations of the random walk defined in Section 4.4.

We derive the first-order pmf of the random walk in the following lemma, proved in Section C of the appendix.

Lemma 4.5 (First-order pmf of a random walk). The first-order pmf of the random walk X is

pX(i)(x) = (i choose (i + x)/2) (1/2^i)   if i + x is even and −i ≤ x ≤ i,
           0                              otherwise.    (32)
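The pmf (32) can be evaluated directly and compared with a Monte Carlo estimate; a short sketch (ours):

```python
import numpy as np
from math import comb

def random_walk_pmf(i, x):
    """First-order pmf p_{X(i)}(x) of the symmetric random walk, following equation (32)."""
    if (i + x) % 2 != 0 or abs(x) > i:
        return 0.0
    return comb(i, (i + x) // 2) / 2**i

rng = np.random.default_rng(6)
i, x = 10, 2
samples = rng.choice([1, -1], size=(200_000, i)).sum(axis=1)  # many realizations of X(10)
print(random_walk_pmf(i, x), np.mean(samples == x))           # analytic vs empirical
```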

The first-order distribution of the random walk is clearly time-dependent, so the random process is not strictly stationary. By the following lemma, the mean of the random walk is constant (it equals zero). The autocovariance, however, is not shift invariant, so the process is not weakly stationary either.

Lemma 4.6 (Mean and autocovariance of a random walk). The mean and autocovariance of the random walk X are

µX(i) = 0,    (33)
RX(i, j) = min{i, j}.    (34)


Proof.

µX(i) := E(X(i))    (35)
       = E( ∑_{j=1}^{i} S(j) )    (36)
       = ∑_{j=1}^{i} E(S(j))   by linearity of expectation    (37)
       = 0.    (38)

RX(i, j) := E(X(i) X(j)) − E(X(i)) E(X(j))    (39)
          = E( ∑_{k=1}^{i} ∑_{l=1}^{j} S(k) S(l) )    (40)
          = E( ∑_{k=1}^{min{i,j}} S(k)² + ∑_{k=1}^{i} ∑_{l=1, l≠k}^{j} S(k) S(l) )    (41)
          = ∑_{k=1}^{min{i,j}} 1 + ∑_{k=1}^{i} ∑_{l=1, l≠k}^{j} E(S(k)) E(S(l))    (42)
          = min{i, j},    (43)

where (42) follows from linearity of expectation and independence.

The variance of X at i equals RX(i, i) = i, which means that the standard deviation of the random walk scales as √i.

Example 4.7 (Gambler). A gambler is playing the following game. A fair coin is flipped sequentially. Every time the result is heads the gambler wins a dollar; every time it lands on tails she loses a dollar. We can model the amount of money earned (or lost) by the gambler as a random walk, as long as the flips are independent. This allows us to compute, for instance, that the expected gain equals zero, or that the probability that the gambler is up 6 dollars or more after the first 10 flips is

P(gambler is up $6 or more) = pX(10)(6) + pX(10)(8) + pX(10)(10)    (44)
  = (10 choose 8) / 2^10 + (10 choose 9) / 2^10 + 1 / 2^10    (45)
  ≈ 5.47 · 10^{−2}.    (46)
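A one-line verification of (44)-(46) (ours):

```python
from math import comb

prob_up_6_or_more = sum(comb(10, k) for k in (8, 9, 10)) / 2**10
print(prob_up_6_or_more)   # 56 / 1024 = 0.0546875, i.e. about 5.47%
```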


4.5 Markov chains

We begin by defining the Markov property, which is satisfied by any random process for which the future is conditionally independent from the past given the present.

Definition 4.8 (Markov property). A random process satisfies the Markov property if X(ti+1) is conditionally independent of X(t1), . . . , X(ti−1) given X(ti) for any t1 < t2 < . . . < ti < ti+1. If the state space of the random process is discrete, then for any x1, x2, . . . , xi+1

pX(ti+1)|X(t1),X(t2),...,X(ti)(xi+1|x1, x2, . . . , xi) = pX(ti+1)|X(ti)(xi+1|xi).    (47)

If the state space of the random process is continuous (and the distribution has a joint pdf),

fX(ti+1)|X(t1),X(t2),...,X(ti)(xi+1|x1, x2, . . . , xi) = fX(ti+1)|X(ti)(xi+1|xi).    (48)

Any iid sequence satisfies the Markov property, since all conditional pmfs or pdfs are just equal to the marginals. The random walk also satisfies the property, since once we fix where the walk is at a certain time i, the path that it took before i has no influence on its next steps.

Lemma 4.9. The random walk satisfies the Markov property.

Proof. Let X denote the random walk defined in Section 4.4. Conditioned on X(j) = xj for j ≤ i, X(i + 1) equals xi + S(i + 1). This does not depend on x1, . . . , xi−1, which implies (47).

A Markov chain is a random process that satisfies the Markov property. In these notes we will consider discrete-time Markov chains with a finite state space, which means that the process can only take a finite number of values at any given time point. To specify such a Markov chain, we only need to define the pmf of the random process at its starting point (which we will assume is at i = 0) and its transition probabilities. This follows from the Markov property, since for any n ≥ 0

pX(0),X(1),...,X(n)(x0, x1, . . . , xn) = ∏_{i=0}^{n} pX(i)|X(0),...,X(i−1)(xi|x0, . . . , xi−1)    (49)
                                      = ∏_{i=0}^{n} pX(i)|X(i−1)(xi|xi−1).    (50)


If these transition probabilities are the same at every time step (i.e. they are constant and do not depend on i), then the Markov chain is said to be time homogeneous. In this case, we can store the probability of each possible transition in an s × s matrix TX, where s is the number of states,

(TX)jk := pX(i+1)|X(i)(xj|xk).    (51)

In the rest of this section we will focus on time-homogeneous finite-state Markov chains. The transition probabilities of these chains can be visualized using a state diagram, which shows each state and the probability of every possible transition. See Figure 6 below for an example.

To simplify notation we define an s-dimensional vector ~pX(i) called the state vector, which contains the marginal pmf of the Markov chain at each time i,

~pX(i) := ( pX(i)(x1), pX(i)(x2), . . . , pX(i)(xs) ).    (52)

Each entry in the state vector contains the probability that the Markov chain is in that particular state at time i. It is not the value of the Markov chain, which is a random variable.

The initial state vector ~pX(0) and the transition matrix TX suffice to completely specify a time-homogeneous finite-state Markov chain. Indeed, we can compute the joint distribution of the chain at any n time points i1, i2, . . . , in for any n ≥ 1 from ~pX(0) and TX by applying (50) and marginalizing over any times that we are not interested in. We illustrate this in the following example.

Example 4.10 (Car rental). A car-rental company hires you to model the location of their cars. The company operates in Los Angeles, San Francisco and San Jose. Customers regularly take a car in a city and drop it off in another. It would be very useful for the company to be able to compute how likely it is for a car to end up in a given city. You decide to model the location of the car as a Markov chain, where each time step corresponds to a new customer taking the car. The company allocates new cars evenly between the three cities. The transition probabilities, obtained from past data, are given by


                 San Francisco   Los Angeles   San Jose
San Francisco         0.6             0.1          0.3
Los Angeles           0.2             0.8          0.3
San Jose              0.2             0.1          0.4

Figure 6: State diagram of the Markov chain described in Example 4.10 (top). Each arrow shows the probability of a transition between the two states. Below we show three realizations of the Markov chain.

To be clear, the probability that a customer moves the car from San Francisco to LA is 0.2, the probability that the car stays in San Francisco is 0.6, and so on.

The initial state vector and the transition matrix of the Markov chain are

~pX(0) := ( 1/3, 1/3, 1/3 ),

TX := [ 0.6  0.1  0.3
        0.2  0.8  0.3
        0.2  0.1  0.4 ].    (53)

State 1 is assigned to San Francisco, state 2 to Los Angeles and state 3 to San Jose. Figure 6 shows a state diagram and some realizations of the Markov chain.

The researcher now wishes to estimate the probability that the car starts in San Francisco and is in San Jose after the second customer. This is given by

pX(0),X(2)(1, 3) = ∑_{i=1}^{3} pX(0),X(1),X(2)(1, i, 3)    (54)
                 = ∑_{i=1}^{3} pX(0)(1) pX(1)|X(0)(i|1) pX(2)|X(1)(3|i)    (55)
                 = (~pX(0))_1 ∑_{i=1}^{3} (TX)i1 (TX)3i    (56)
                 = (0.6 · 0.2 + 0.2 · 0.1 + 0.2 · 0.4) / 3 ≈ 7.33 · 10^{−2}.    (57)

The probability is 7.33%.
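The same computation can be reproduced numerically from the state vector and transition matrix in (53); a short sketch (ours):

```python
import numpy as np

p0 = np.array([1/3, 1/3, 1/3])        # initial state vector (SF, LA, SJ)
T = np.array([[0.6, 0.1, 0.3],
              [0.2, 0.8, 0.3],
              [0.2, 0.1, 0.4]])       # T[j, k] = P(next state j | current state k)

# p_{X(0), X(2)}(1, 3): start in SF (index 0) and be in SJ (index 2) after two customers.
prob = p0[0] * sum(T[i, 0] * T[2, i] for i in range(3))
print(prob)                           # approximately 0.0733
```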

The following lemma provides a simple expression for the state vector at time i, ~pX(i), in terms of TX and the previous state vector.

Lemma 4.11 (State vector and transition matrix). For a Markov chain X with transition matrix TX

~pX(i) = TX ~pX(i−1).    (58)


If the Markov chain starts at time 0 then

~pX(i) = TX^i ~pX(0),    (59)

where TX^i denotes multiplying i times by the matrix TX.

Proof. The proof follows directly from the definitions. The kth entry of ~pX(i) is

pX(i)(xk) = ∑_{j=1}^{s} pX(i−1)(xj) pX(i)|X(i−1)(xk|xj)    (60)
          = ∑_{j=1}^{s} (TX)kj (~pX(i−1))_j,    (61)

which is exactly the kth entry of the matrix-vector product TX ~pX(i−1), so ~pX(i) = TX ~pX(i−1).

Equation (59) is obtained by applying (58) i times and taking into account the Markov property.

Example 4.12 (Car rental (continued)). The company wants to estimate the distribution of locations right after the 5th customer has used a car. Applying Lemma 4.11 we obtain

~pX(5) = TX^5 ~pX(0)    (62)
       = ( 0.281, 0.534, 0.185 ).    (63)

The model estimates that after 5 customers more than half of the cars are in Los Angeles.
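Equation (59) makes this a one-line computation; a sketch (ours) reproducing (63):

```python
import numpy as np

p0 = np.array([1/3, 1/3, 1/3])
T = np.array([[0.6, 0.1, 0.3],
              [0.2, 0.8, 0.3],
              [0.2, 0.1, 0.4]])

p5 = np.linalg.matrix_power(T, 5) @ p0   # state vector after 5 customers
print(np.round(p5, 3))                   # approximately [0.281, 0.534, 0.185]
```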

The states of a Markov chain can be classified depending on whether the Markov chain may eventually stop visiting them or not.


Definition 4.13 (Recurrent and transient states). Let X be a time-homogeneous finite-state Markov chain. We consider a particular state x. If

P(X(j) = x for some j > i | X(i) = x) = 1    (64)

then the state is recurrent. In words, given that the Markov chain is at x, the probability that it returns to x is one. In contrast, if

P(X(j) ≠ x for all j > i | X(i) = x) > 0    (65)

the state is transient. Given that the Markov chain is at x, there is nonzero probability that it will never return.

The following example illustrates the difference between recurrent and transient states.

Example 4.14 (Employment dynamics). A researcher is interested in modeling the employment dynamics of young people using a Markov chain.

She determines that at age 18 a person is either a student with probability 0.9 or an intern with probability 0.1. After that she estimates the following transition probabilities:

              Student   Intern   Employed   Unemployed
Student         0.8       0.5       0           0
Intern          0.1       0.5       0           0
Employed        0.1       0         0.9         0.4
Unemployed      0         0         0.1         0.6

The Markov assumption is obviously not completely precise: someone who has been a student for longer is probably less likely to remain a student. But such Markov models are easier to fit (we only need to estimate the transition probabilities) and often yield useful insights.

The initial state vector and the transition matrix of the Markov chain are

~pX(0) := ( 0.9, 0.1, 0, 0 ),

TX := [ 0.8  0.5  0    0
        0.1  0.5  0    0
        0.1  0    0.9  0.4
        0    0    0.1  0.6 ].    (66)

Figure 7 shows the state diagram and some realizations of the Markov chain.


Figure 7: State diagram of the Markov chain described in Example 4.14 (top). Below we show three realizations of the Markov chain.


States 1 (student) and 2 (intern) are transient states. Note that the probability that the Markov chain returns to those states after visiting state 3 (employed) is zero, so

P(X(j) ≠ 1 for all j > i | X(i) = 1) ≥ P(X(i + 1) = 3 | X(i) = 1)    (67)
                                      = 0.1 > 0,    (68)

P(X(j) ≠ 2 for all j > i | X(i) = 2) ≥ P(X(i + 2) = 3 | X(i) = 2)    (69)
                                      = 0.5 · 0.1 > 0.    (70)

In contrast, states 3 (employed) and 4 (unemployed) are recurrent. We prove this for state 3 (the argument for state 4 is exactly the same):

P(X(j) ≠ 3 for all j > i | X(i) = 3)    (71)
  = P(X(j) = 4 for all j > i | X(i) = 3)    (72)
  = lim_{k→∞} P(X(i + 1) = 4 | X(i) = 3) ∏_{j=1}^{k} P(X(i + j + 1) = 4 | X(i + j) = 4)    (73)
  = lim_{k→∞} 0.1 · 0.6^k    (74)
  = 0.    (75)

In this example, it is not possible to reach the states student and intern from the states employed or unemployed. Markov chains for which there is a possible transition between any two states (even if it is not direct) are called irreducible.

Definition 4.15 (Irreducible Markov chain). A time-homogeneous finite-state Markov chain is irreducible if for any state x, the probability of reaching every other state y ≠ x in a finite number of steps is nonzero, i.e. there exists m ≥ 0 such that

P(X(i + m) = y | X(i) = x) > 0.    (76)

One can easily check that the Markov chain in Example 4.10 is irreducible, whereas the one in Example 4.14 is not. An important result is that all states in an irreducible Markov chain are recurrent.

Theorem 4.16 (Irreducible Markov chains). All states in an irreducible Markov chain are recurrent.

The result is proved in Section D of the appendix.

We end this section by defining the period of a state.


Figure 8: State diagram of a Markov chain where the states have period two.

Definition 4.17 (Period of a state). Let X be a time-homogeneous finite-state Markov chain and x a state of the Markov chain. The period m of x is the smallest integer such that it is only possible to return to x in a number of steps that is a multiple of m, i.e. km for some positive integer k, with nonzero probability.

Figure 8 shows a Markov chain where the states have a period equal to two. Aperiodic Markov chains do not contain states with periods greater than one.

Definition 4.18 (Aperiodic Markov chain). A time-homogeneous finite-state Markov chain X is aperiodic if all states have period equal to one.

The Markov chains in Examples 4.10 and 4.14 are both aperiodic.

A Proof of Lemma 4.2

We begin by deriving the cdf of T,

FT(t) := P(T ≤ t)    (77)
      = 1 − P(T > t)    (78)
      = 1 − P(no events in an interval of length t)    (79)
      = 1 − e^{−λ t},    (80)

because the number of points in an interval of length t follows a Poisson distribution with parameter λ t. Differentiating we conclude that

fT(t) = λ e^{−λ t}.    (81)

B Proof of Lemma 4.3

By definition the number of events between 0 and t is distributed as a Poisson random variable with parameter λ t and hence its mean is equal to λ t.


The autocovariance equals

RX(t1, t2) := E(X(t1) X(t2)) − E(X(t1)) E(X(t2))    (82)
            = E(X(t1) X(t2)) − λ² t1 t2.    (83)

Assume t1 ≤ t2. By assumption X(t1) and X(t2) − X(t1) are independent, so that

E(X(t1) X(t2)) = E( X(t1) (X(t2) − X(t1)) + X(t1)² )    (84)
               = E(X(t1)) E(X(t2) − X(t1)) + E(X(t1)²)    (85)
               = λ² t1 (t2 − t1) + λ t1 + λ² t1²    (86)
               = λ² t1 t2 + λ t1.    (87)

Substituting into (83) gives RX(t1, t2) = λ t1 = λ min{t1, t2}; the case t2 ≤ t1 follows by symmetry.

C Proof of Lemma 4.5

Let us define the number of positive steps S+ that the random walk takes in the first i steps. Given the assumptions on S, this is a binomial random variable with parameters i and 1/2. The number of negative steps is S− := i − S+. In order for X(i) to equal x we need the net number of steps to equal x, which implies

x = S+ − S−    (88)
  = 2 S+ − i.    (89)

This means that S+ must equal (i + x)/2. We conclude that

pX(i)(x) = P( ∑_{j=1}^{i} S(j) = x )    (90)
         = (i choose (i + x)/2) (1/2^i)   if (i + x)/2 is an integer between 0 and i.    (91)

D Proof of Theorem 4.16

In any finite-state Markov chain there must be at least one state that is recurrent. If all the states were transient, there would be a nonzero probability that the chain leaves all of the states forever, which is not possible. Without loss of generality let us assume that state x is recurrent. We will now prove that another arbitrary state y must also be recurrent. To alleviate notation let

px,x := P(X(j) = x for some j > i | X(i) = x),    (92)
py,x := P(X(j) = x for some j > i | X(i) = y).    (93)

The chain is irreducible so there is a nonzero probability pm > 0 of reaching y from x in at most m steps for some m > 0. The probability that the chain goes from x to y and never goes back to x is consequently at least pm (1 − py,x). However, x is recurrent, so this probability must be zero! Since pm > 0 this implies py,x = 1.

Consider the following event:

1. X goes from y to x.

2. X does not return to y in m steps after reaching x.

3. X eventually reaches x again at a time m′ > m.

The probability of this event is equal to py,x (1 − pm) px,x = 1 − pm (recall that x is recurrent so px,x = 1). Now imagine that steps 2 and 3 repeat k times, i.e. that X fails to go from x to y in m steps k times. The probability of this event is py,x (1 − pm)^k px,x^k = (1 − pm)^k. Taking k → ∞, we conclude that the probability that X never returns to y must be zero, so y is recurrent.
