A WILEY PUBLICATION IN APPLIED STATISTICS

The Elements of Stochastic Processes with Applications to the Natural Sciences

NORMAN T. J. BAILEY
Reader in Biometry, University of Oxford

John Wiley & Sons, Inc.
New York · London · Sydney



COPYRIGHT 1964 BY

    JOHN WILEY & SONS, INC.

    All Rights Reserved

    7 8 9 10

    ISBN 0 471 04165 3

    LIBRARY OF CONGRESS CATALOG CARD NUMBER: 63-23220

    PRINTED IN THE UNITED STATES OF AMERICA

(Everything flows, nothing is stationary)
HERACLITUS

"... I love everything that flows, everything that has time in it and becoming, that brings us back to the beginning where there is never end: ..."

HENRY MILLER, Tropic of Cancer


Preface

Stochastic models are being used to an ever-increasing extent by those who wish to investigate phenomena that are essentially concerned with a flow of events in time, especially those exhibiting such highly variable characteristics as birth, death, transformation, evolution, etc. The rigorous probability theory of these processes presents very considerable mathematical difficulties, but no attempt has been made to handle this aspect here. Instead, the more heuristic approach of applied mathematics has been adopted to introduce the reader to a wide variety of theoretical principles and applied techniques, which are developed simultaneously.

An acquaintance with basic probability and statistics is assumed. Some knowledge of matrix algebra is required, and complex variable theory is extensively used in the latter half of the book. Differential equations, both ordinary and partial, occur frequently, though none is of a very advanced nature. Where special mathematical methods are required repeatedly, such as the use of generating functions or the solution of linear partial differential equations, additional sections are included to give a detailed discussion.

The book starts with a general introduction, and then a whole chapter on generating functions. The next four chapters deal with various kinds of processes in discrete time such as Markov chains and random walks. This is followed by the treatment of continuous-time processes, including such special applications as birth-and-death processes, queues, epidemics, etc. After this there is a discussion of diffusion processes. The two final chapters deal with the use of various approximate methods of handling certain kinds of processes, and a brief introduction to the treatment of some non-Markovian processes.

It is hoped that students will find the book useful as a first text in the theory and application of stochastic processes, after which they will be better equipped to tackle some of the more mathematically sophisticated treatises. Readers who already have some acquaintance with the subject may find the discussions of special topics to be of value.



The text of this book is based on a course of lectures given during the fall and winter quarters, 1961-2, at the Department of Biostatistics, The Johns Hopkins University, Baltimore. It gives me great pleasure to acknowledge the encouragement received from Professor Allyn Kimball, while I was a Visiting Professor in his department. I am also indebted to the Milbank Memorial Fund of New York City for support during my tenure at The Johns Hopkins University. A shortened version of the lectures was presented at the same time at the National Institutes of Health, Bethesda, where I was fortunate to receive further stimulus from Dr. Jerome Cornfield and his colleagues.

The first six chapters were drafted on Sanibel Island, Florida, during February 1962. The following nine chapters were completed during my appointment as Visiting Professor in the Department of Statistics, Stanford University, in the spring and summer quarters of 1962. I should specially like to thank Professor Herbert Solomon for arranging the financial support for the production and distribution of the multilith draft of the first fifteen chapters. I am also greatly indebted to Mrs. Betty Jo Prine, who supervised the latter, and to Mrs. Carolyn Knutsen, who undertook the actual typing, a task which she performed with great rapidity, enthusiasm, and accuracy.

The final chapter was completed in Oxford, England, during the fall of 1962, and the usual introductory material, problems for solution, references, etc., were added. A revised version of the whole book was then prepared, and I am delighted to be able to acknowledge the help received from Professor Ralph Bradley, who read most of the draft typescript and provided me with detailed comments and criticisms. I must add, of course, that any errors that remain are entirely my own responsibility. I should also like to thank Dr. J. F. C. Kingman of the University of Cambridge for making a number of helpful comments and suggestions.

Finally, I must thank my secretary in Oxford, Mrs. Kay Earnshaw, for her assistance in preparing the final version of the book for printing.

NORMAN T. J. BAILEY
Oxford
October, 1963

Contents

1 INTRODUCTION AND GENERAL ORIENTATION

2 GENERATING FUNCTIONS
  2.1 Introduction
  2.2 Definitions and elementary results
  2.3 Convolutions
  2.4 Compound distributions
  2.5 Partial fraction expansions
  2.6 Moment- and cumulant-generating functions
  Problems for solution

3 RECURRENT EVENTS
  3.1 Introduction
  3.2 Definitions
  3.3 Basic theorems
  3.4 Illustration
  3.5 Delayed recurrent events
  Problems for solution

4 RANDOM WALK MODELS
  4.1 Introduction
  4.2 Gambler's ruin
  4.3 Probability distribution of ruin at nth trial
  4.4 Extensions
  Problems for solution

5 MARKOV CHAINS
  5.1 Introduction
  5.2 Notation and definitions
  5.3 Classification of states
  5.4 Classification of chains
  5.5 Evaluation of P^n
  5.6 Illustrations
  Problems for solution

6 DISCRETE BRANCHING PROCESSES
  6.1 Introduction
  6.2 Basic theory
  6.3 Illustration
  Problems for solution

7 MARKOV PROCESSES IN CONTINUOUS TIME
  7.1 Introduction
  7.2 The Poisson process
  7.3 Use of generating functions
  7.4 "Random-variable" technique
  7.5 Solution of linear partial differential equations
  7.6 General theory
  Problems for solution

8 HOMOGENEOUS BIRTH AND DEATH PROCESSES
  8.1 Introduction
  8.2 The simple birth process
  8.3 The general birth process
  8.4 Divergent birth processes
  8.5 The simple death process
  8.6 The simple birth-and-death process
  8.7 The effect of immigration
  8.8 The general birth-and-death process
  8.9 Multiplicative processes
  Problems for solution

9 SOME NON-HOMOGENEOUS PROCESSES
  9.1 Introduction
  9.2 The Pólya process
  9.3 A simple non-homogeneous birth-and-death process
  9.4 The effect of immigration
  Problems for solution

10 MULTI-DIMENSIONAL PROCESSES
  10.1 Introduction
  10.2 Population growth with two sexes
  10.3 The cumulative population
  10.4 Mutation in bacteria
  10.5 A multiple-phase birth process
  Problems for solution

11 QUEUEING PROCESSES
  11.1 Introduction
  11.2 Equilibrium theory
  11.3 Queues with many servers
  11.4 Monte Carlo methods in appointment systems
  11.5 Non-equilibrium treatment of a simple queue
  11.6 First passage times
  Problems for solution

12 EPIDEMIC PROCESSES
  12.1 Introduction
  12.2 Simple epidemics
  12.3 General epidemics
  12.4 Recurrent epidemics
  12.5 Chain-binomial models
  Problems for solution

13 COMPETITION AND PREDATION
  13.1 Introduction
  13.2 Competition between two species
  13.3 A prey-predator model

14 DIFFUSION PROCESSES
  14.1 Introduction
  14.2 Diffusion limit of a random walk
  14.3 Diffusion limit of a discrete branching process
  14.4 General theory
  14.5 Application to population growth

15 APPROXIMATIONS TO STOCHASTIC PROCESSES
  15.1 Introduction
  15.2 Continuous approximations to discrete processes
  15.3 Saddle-point approximations
  15.4 Neglect of high-order cumulants
  15.5 Stochastic linearization

16 SOME NON-MARKOVIAN PROCESSES
  16.1 Introduction
  16.2 Renewal theory and chromosome mapping
  16.3 Use of integral equations

REFERENCES
SOLUTIONS TO PROBLEMS
AUTHOR INDEX
SUBJECT INDEX

CHAPTER 1

    Introduction and General Orientation

The universal spectacle of birth and death, growth and decay, change and transformation has fascinated mankind since the earliest times. Poets and philosophers alike have been preoccupied with the remorseless flow of events. In ancient Greece Heraclitus made the idea of perpetual change and flow the central concept of his philosophy. Sometimes the whole process of change has been conceived as an endless series of repetitive cycles and sometimes as a continuous line of advance. From the time of Darwin, however, previous philosophical speculations about the possibility of continuous evolutionary development were given a sound basis in observed facts.

Since the work of Newton in the 17th Century a mathematical understanding of the physical processes of the universe has been continually broadened and deepened. Biological processes, however, have proved more difficult to master. This is largely due to the much greater inherent variability of biological material. In many, though certainly not all, areas of physics variation arises mainly from mere errors of observation, and can be averaged out to negligible proportions by the use of repeated measurements. But in biological contexts variability usually has to be accepted as basic and handled as such. Hence the tremendous emphasis now placed on the use of statistical methods in designing and analyzing experiments involving any kind of biological material.

Nevertheless, when dealing with general mathematical descriptions of biological phenomena (as opposed to the problem of interpreting critical experiments) the first step has often been to ignore substantial amounts of variation. Thus the simplest procedure in dealing with population growth is to adopt such concepts as birth-rate, death-rate, immigration-rate, etc., and to treat these as operating continuously and steadily. So that, given the rates, we can write down a differential equation whose solution specifies exactly the population size (regarded as a continuous variable) at any instant of time. A good deal of actuarial work is of this kind. Again, the first mathematical accounts of the spread of epidemic disease (see Bailey, 1957, for references) assumed that in a short interval of time the number of new cases would be precisely proportional to the product of the time interval, the number of susceptibles and the number of infectious persons. Assumptions of this type lead to differential equations whose solutions predict the exact numbers of susceptibles and infectives to be found at any given time.

If we are dealing with large populations, it may be legitimate to assume that the statistical fluctuations are small enough to be ignored. In this case a deterministic model, i.e. the type just considered, may be a sufficiently close approximation to reality for certain purposes.

There is nothing inappropriate in predicting that the present population of, say, 51,226 persons in a small town will have grown to 60,863 in 10 years' time. Provided the latter estimate is correct to within a few hundred units, it may well be a useful figure for various economic and sociological purposes. (A finer analysis would of course take the age and sex structure into account.) But if, in a small family of four children, one of whom develops measles today, we predict the total number of cases in a week's time as 2.37, it is most unlikely that this figure will have any particular significance: a practical knowledge of the variability normally observed in epidemic situations suggests that the deterministic type of model is inappropriate here.

It is clear that the only satisfactory type of prediction about the future course of an epidemic in a small family must be on a probability basis. That is, in the family of 4 mentioned above we should want to be able to specify the probability distribution at any time of the existing number of cases (0, 1, 2, 3, or 4). We might also want to go further and make probability statements about the actual times of occurrence of the various cases, and to distinguish between susceptibles, infectives and those who were isolated or recovered. A model which specified the complete joint probability distribution of the numbers of different kinds of individuals at each point of time would be a stochastic model, and the whole process, conceived as a continuous development in time, would be called a stochastic process (or probability process).

Whenever the group of individuals under consideration is sufficiently small for chance fluctuations to be appreciable, it is very likely that a stochastic representation will be essential. And it often happens that the properties of a stochastic model are markedly different from those of the corresponding deterministic analog. In short, we cannot assume that the former is merely a more detailed version of the latter. This is especially the case with queues and epidemics, for example.
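The contrast between deterministic and stochastic descriptions can be made concrete with a small simulation (a Python illustration added to this transcription, not part of the original text; the rates and population size are hypothetical). In a simple birth process each of n individuals independently reproduces at rate λ, so the deterministic model dn/dt = λn predicts n0·e^{λt} exactly, while stochastic realizations agree with this only on average and vary widely from run to run.

```python
import math
import random

def simple_birth_process(n0, lam, t_end, rng):
    """One realization of a simple birth process: with n individuals
    present, the next birth occurs after an exponential waiting time
    with rate lam * n."""
    n, t = n0, 0.0
    while True:
        t += rng.expovariate(lam * n)
        if t > t_end:
            return n
        n += 1

rng = random.Random(1)
n0, lam, t_end = 5, 1.0, 2.0
runs = [simple_birth_process(n0, lam, t_end, rng) for _ in range(2000)]

deterministic = n0 * math.exp(lam * t_end)   # solution of dn/dt = lam * n
average = sum(runs) / len(runs)

print(deterministic)         # about 36.9
print(average)               # close to the deterministic prediction
print(min(runs), max(runs))  # but individual realizations spread widely
```

The average over many realizations tracks the deterministic curve, yet single runs can finish far above or below it, which is exactly the situation in the small-family epidemic discussed above.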

There is now a very large literature dealing with stochastic processes of many different kinds. Much of it makes difficult reading, especially for those who are primarily concerned with applications, rather than with abstract formulations and fundamental theory. For the latter, the reader should consult the treatise of Doob (1953). However, it is quite possible to gain a working knowledge of the subject, so far as handling a variety of practically useful stochastic processes is concerned, by adopting the less rigorous and more heuristic approach characteristic of applied mathematics. An attempt has been made to develop this approach in the present volume.

Those readers who require an introductory course to stochastic processes should read the whole book through in the order in which it is written, but those who already possess some knowledge of the subject may of course take any chapter on its own. No special effort has been made to separate theory and applications. Indeed, this has on the contrary been deliberately avoided, and the actual balance achieved varies from chapter to chapter in accordance with what seems to the author a natural development of the subject. Thus in dealing with generating functions, the basic mathematical material is introduced as a tool to be used in later chapters, whereas the discussion of birth-and-death processes is developed from the start in a frankly biological context. In this way a close connection is maintained between applied problems and the theory required to handle them.

An acquaintance with the elements of probability theory and basic statistics is assumed. For an extended coverage of such topics, including proofs of many results quoted and used in the present book, the reader should refer to the textbooks of Feller (1957) and Parzen (1960).

Some knowledge of matrix algebra is required (e.g. Frazer, Duncan, and Collar, 1946), though special topics such as the spectral resolution of a matrix are explained in the text.

Considerable use is made of complex variable theory, and the reader should be familiar with the basic ideas at least as far as contour integration and the calculus of residues. Many standard textbooks, such as Copson (1948), are available for this purpose. Frequent use is also made of the Laplace transform technique, though no abstruse properties are involved. In this connection some readers may like to refer to McLachlan (1953), which deals both with complex variable theory and with practical applications of Laplace transforms in a straightforward and comparatively non-rigorous fashion.

Ordinary differential equations and partial differential equations are a fairly common occurrence in this book. No advanced methods are employed, but a number of special procedures are described and illustrated in detail. At the same time the reader might find it helpful to consult some textbook such as Forsyth (1929) for a fuller exposition of standard methods of solution.

Finally, the whole subject of approximation and asymptotics is one which no applied mathematician can afford to neglect. This aspect will crop up continually throughout the book as many of the simplest stochastic models entail considerable mathematical difficulty: even moderate degrees of reality in the model may result in highly intractable mathematics! A variety of approximation techniques will therefore appear in the subsequent discussions, and it cannot be too strongly emphasized that some facility in handling this aspect is vital to making any real progress with applied problems in stochastic processes. While some space in the present book is devoted to specialized topics such as the method of steepest descents, which is not as widely understood as it should be, the reader might like to refer to the recent book by De Bruijn (1958) for a detailed and connected account of asymptotic theory. Unfortunately, there appears to be no single book dealing with the very wide range of approximate methods which are in principle available to the applied mathematician.

It will be seen from the foregoing that the whole philosophy of the present volume is essentially rooted in real phenomena: in the attempt to deepen insight into the real world by developing adequate mathematical descriptions of processes involving a flow of events in time, and exhibiting such characteristics as birth, death, growth, transformation, etc. The majority of biological and medical applications at present appear to entail processes of this type. These may be loosely termed evolutionary, in contradistinction to the stationary type of time-series process. There is a considerable literature on the latter subject alone, and in the present book we have confined our attention to the evolutionary processes. This is partly because stationary processes are adequately dealt with elsewhere; and partly because the evolutionary models are more appropriate for most biological situations. For a more rigorous outline of the mathematical theory of stochastic processes the reader may like to consult Takács (1960). Proofs of theorems are, however, generally omitted, but the book contains an excellent collection of problems with solutions. A more detailed treatment of this material is given in Parzen (1962), and for more advanced reading the now classic text by Bartlett (1955) is to be recommended. The recent book by Bharucha-Reid (1960) also contains some interesting discussions and a very extensive set of bibliographies. For a wider coverage, including time-series, Rosenblatt (1962) should be consulted.

CHAPTER 2

    Generating Functions

    2.1 Introduction

Before embarking on a discussion of some of the more elementary kinds of stochastic processes, namely the recurrent events of Chapter 3, it will be convenient to present the main properties of generating functions. This topic is of central importance in the handling of stochastic processes involving integral-valued random variables, not only in theoretical analyses but also in practical applications. Moreover, all processes dealing with populations of individuals, whether these are biological organisms, radioactive atoms, or telephone calls, are basically of this type. Considerable use will be made of the generating function method in the sequel, the main advantage lying in the fact that we use a single function to represent a whole collection of individual items. And as this technique is often thought to constitute an additional basic difficulty in studying stochastic processes, although it frequently makes possible very substantial and convenient simplifications, it seems worth while devoting space to a specific review of the main aspects of the subject. This chapter is in no sense a complete treatment: for a more extensive discussion see Feller (1957, Chapter 11 and following).

    2.2 Definitions and elementary results

First, let us suppose that we have a sequence of real numbers a0, a1, a2, .... Then, introducing the dummy variable x, we may define a function

    A(x) = a0 + a1 x + a2 x^2 + ... = Σ_{j=0}^∞ aj x^j.    (2.1)

If the series converges in some real interval -x0 < x < x0, the function A(x) is called the generating function of the sequence {aj}. This may be regarded as a transformation carrying the sequence {aj} into the function A(x). For the moment we can take x as real, but in more advanced applications it is convenient to work with a complex variable z.

It is clear that if the sequence {aj} is bounded, then a comparison with the geometric series shows that A(x) converges at least for |x| < 1.

In many, but by no means all, cases of interest to us the aj are probabilities, that is we introduce the restriction

    aj ≥ 0,  Σ_{j=0}^∞ aj = 1.    (2.2)

The corresponding function A(x) is then a probability-generating function. Let us consider specifically the probability distribution given by

    P{X = j} = pj,    (2.3)

where X is an integral-valued random variable assuming the values 0, 1, 2, .... We can also define the "tail" probabilities

    P{X > j} = qj.    (2.4)

The usual distribution function is thus

    P{X ≤ j} = 1 - qj.    (2.5)

We now have the probability-generating function

    P(x) = Σ_{j=0}^∞ pj x^j = E(x^X),    (2.6)

where the operator E indicates an expectation. We can also define a generating function for the "tail" probabilities, i.e.

    Q(x) = Σ_{j=0}^∞ qj x^j.    (2.7)

Note that Q(x) is not a probability-generating function as defined above. Although the coefficients are probabilities, they do not in general constitute a probability distribution.

Now the probabilities pj sum to unity. Therefore P(1) = 1, and

    |P(x)| ≤ Σ_j |pj x^j| ≤ Σ_j pj = 1,  if |x| ≤ 1.

Thus P(x) is absolutely convergent at least for |x| ≤ 1. So far as Q(x) is concerned, all coefficients are less than unity, and so Q(x) converges absolutely at least in the open interval |x| < 1.

A useful result connecting P(x) and Q(x) is that

    (1 - x)Q(x) = 1 - P(x),    (2.8)

as is easily verified by comparing coefficients on both sides.

Simple formulas are available giving the mean and variance of the probability distribution pj in terms of particular values of the generating functions and their derivatives. Thus the mean is

    m ≡ E(X) = Σ_j j pj = P'(1)    (2.9)
             = Σ_j qj = Q(1),    (2.10)

where the prime in (2.9) indicates differentiation. We also have

    E{X(X - 1)} = Σ_j j(j - 1) pj = P''(1) = 2Q'(1).

Hence the variance is

    σ^2 ≡ var(X) = P''(1) + P'(1) - {P'(1)}^2    (2.11)
                 = 2Q'(1) + Q(1) - {Q(1)}^2.    (2.12)

Similarly we can obtain the rth factorial moment μ'[r] about the origin as

    E{X(X - 1) ... (X - r + 1)} = Σ_j j(j - 1) ... (j - r + 1) pj
                                = P^(r)(1) = rQ^(r-1)(1),    (2.13)

i.e. differentiating P(x) r times and putting x = 1.

Several other sets of quantities characterizing probability distributions, such as moments, factorial moments, and cumulants, can also be handled by means of the appropriate generating functions. For an extended discussion of the relevant theory a good standard textbook of mathematical statistics should be consulted. Some of the main results required in the present book are summarized in Section 2.6 below.
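Formulas (2.9)-(2.12) are easy to check numerically. The sketch below (a Python illustration added to this transcription, not part of the original text) uses the binomial distribution with n = 4, p = 0.3 as a concrete example: the derivatives of P(x) at x = 1 are read directly off the coefficients pj, and the tail generating function gives the mean a second way via (2.10).

```python
from math import comb

# Binomial(n, p) as a concrete distribution p_j, j = 0, ..., n.
n, p = 4, 0.3
pj = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

# Derivatives of P(x) at x = 1 in terms of the coefficients:
# P'(1) = sum_j j p_j,  P''(1) = sum_j j(j-1) p_j.
P1 = sum(j * pj[j] for j in range(n + 1))
P2 = sum(j * (j - 1) * pj[j] for j in range(n + 1))

mean = P1              # equation (2.9)
var = P2 + P1 - P1**2  # equation (2.11)

# Tail probabilities q_j = P{X > j}; Q(1) = sum_j q_j is the mean again, (2.10).
qj = [sum(pj[j + 1:]) for j in range(n + 1)]
Q1 = sum(qj)

# mean = np and var = np(1-p) for the binomial, and Q(1) agrees with the mean
print(mean, var, Q1)
```

Here both routes give the binomial mean np = 1.2 and variance np(1 - p) = 0.84, as they should.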

2.3 Convolutions

Let us now consider two non-negative independent integral-valued random variables X and Y, having the probability distributions

    P{X = j} = aj,  P{Y = k} = bk.    (2.14)

The probability of the event (X = j, Y = k) is therefore aj bk. Suppose we now form a new random variable

    S = X + Y.    (2.15)

Then the event (S = r) comprises the mutually exclusive events

    (X = 0, Y = r), (X = 1, Y = r - 1), ..., (X = r, Y = 0).

If the distribution of S is given by

    P{S = r} = cr,

it follows that

    cr = a_0 b_r + a_1 b_{r-1} + ... + a_r b_0.    (2.16)

This method of compounding two sequences of numbers (which need not necessarily be probabilities) is called a convolution. We shall use the notation

    {cj} = {aj} * {bj}.

Let us define the generating functions

    A(x) = Σ_j aj x^j,    (2.17)
    B(x) = Σ_j bj x^j,    (2.18)
    C(x) = Σ_j cj x^j.    (2.19)

It follows almost immediately that

    C(x) = A(x)B(x),

since multiplying the two series A(x) and B(x), and using (2.16), gives the coefficient of x^r as cr.

In the special case of probability distributions, the probability-generating function of the sum S of two independent non-negative integral-valued random variables X and Y is simply the product of the latter's probability-generating functions.

More generally, we can consider the convolution of several sequences, and the generating function of the convolution is simply the product of the individual generating functions, i.e. the generating function of {aj}*{bj}*{cj}* ... is A(x)B(x)C(x) ....

If, in particular, we have the sum of several independent random variables of the previous type, e.g.

    Sn = X1 + X2 + ... + Xn,    (2.20)

where the Xk have a common probability distribution given by {pj}, with probability-generating function P(x), then the probability-generating function of Sn is {P(x)}^n. Further, the distribution of Sn is given by a sequence of probabilities which is the n-fold convolution of {pj} with itself. This is written as

    {pj} * {pj} * ... * {pj}  (n factors)  = {pj}^n*.    (2.21)

2.4 Compound distributions

Let us extend the notion at the end of the previous section to the case where the number of random variables contributing to the sum is itself a random variable. Suppose that

    S_N = X_1 + X_2 + ... + X_N,    (2.22)

where

    P{X_k = j} = f_j,  P{N = n} = g_n,  P{S_N = l} = h_l.    (2.23)

Let the corresponding probability-generating functions be

    F(x) = Σ_j f_j x^j,  G(x) = Σ_n g_n x^n,  H(x) = Σ_l h_l x^l.    (2.24)

Now simple probability considerations show that we can write the probability distribution of S_N as

    h_l = P{S_N = l}
        = Σ_n P{N = n} P{S_n = l | N = n}
        = Σ_n g_n P{S_n = l | N = n}.    (2.25)

For fixed n, the distribution of S_n is the n-fold convolution of {f_j} with itself, i.e. {f_j}^n*. Therefore

    Σ_l P{S_n = l | N = n} x^l = {F(x)}^n.    (2.26)

Thus the probability-generating function H(x) can be expressed as

    H(x) = Σ_l h_l x^l
         = Σ_l x^l Σ_n g_n P{S_n = l | N = n},  using (2.25)
         = Σ_n g_n Σ_l P{S_n = l | N = n} x^l
         = Σ_n g_n {F(x)}^n
         = G(F(x)),  using (2.26).    (2.27)

This gives a functionally simple form for the probability-generating function of the compound distribution {h_l} of the sum S_N.

The formula in (2.27) can be used to obtain several standard results in the theory of distributions. However, in the context of the present book we shall be concerned with applications to a special kind of stochastic process, namely the branching process examined in Chapter 6.
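Both results lend themselves to a direct numerical check: the convolution rule C(x) = A(x)B(x) by convolving coefficient sequences as in (2.16), and the compound formula (2.27) by accumulating h = Σ_n g_n {f}^{n*} term by term. The following Python sketch is an illustration added to this transcription (the dice distribution and the particular {g_n} are hypothetical examples, not from the text).

```python
def convolve(a, b):
    """Coefficients of A(x)B(x): c_r = a_0 b_r + a_1 b_{r-1} + ... + a_r b_0,
    as in equation (2.16)."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

die = [0.0] + [1 / 6] * 6      # P{X = j} for one fair die, j = 0, ..., 6
two_dice = convolve(die, die)  # distribution of the sum of two dice
print(two_dice[7])             # P{sum = 7} = 6/36; the pgf of the sum is {P(x)}^2

# Compound sum S_N = X_1 + ... + X_N with N random:
# h = sum_n g_n * {f}^{n*}, the coefficient form of H(x) = G(F(x)) in (2.27).
g = [0.25, 0.5, 0.25]          # P{N = n} for n = 0, 1, 2 (illustrative choice)
f = die
h = [0.0] * (2 * (len(f) - 1) + 1)
fn = [1.0]                     # {f}^{0*}: all probability concentrated at 0
for n, gn in enumerate(g):
    for l, prob in enumerate(fn):
        h[l] += gn * prob
    fn = convolve(fn, f)

print(sum(h))                  # a proper distribution: the h_l sum to 1
```

The resulting {h_l} is a genuine probability distribution, and its mean equals E(N)·E(X), as the relation H(x) = G(F(x)) implies on differentiating at x = 1.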

2.5 Partial fraction expansions

In many problems some desired probability distribution is obtained by first deriving the corresponding probability-generating function, and then picking out the coefficients in the latter's Taylor expansion. Thus if we have found the probability-generating function P(x), where P(x) = Σ_{j=0}^∞ pj x^j, as usual, the coefficients pj are given by repeated differentiation of P(x), i.e.

    pj = (1/j!) (d/dx)^j P(x) |_{x=0} = P^(j)(0)/j!.    (2.28)

In practice explicit calculation may be very difficult or impossible, and some kind of approximation is desirable. One very convenient method is based on a development in terms of partial fractions.

For the time being let us suppose that the function P(x) is a rational function of the form

    P(x) = U(x)/V(x),    (2.29)

where U(x) and V(x) are polynomials without common roots. To begin with we can assume that the degree of U(x) is less than that of V(x), the latter being m, say. Let the equation V(x) = 0 have m distinct roots x_1, x_2, ..., x_m, so that we can take

    V(x) = (x_1 - x)(x_2 - x) ... (x_m - x).    (2.30)

It is a standard result that the function P(x) can be expressed in partial fractions as

    P(x) = ρ_1/(x_1 - x) + ρ_2/(x_2 - x) + ... + ρ_m/(x_m - x),    (2.31)

where the constant ρ_1, for example, is given by

    ρ_1 = lim_{x→x_1} (x_1 - x)P(x) = lim_{x→x_1} (x_1 - x)U(x)/V(x)
        = U(x_1)/{(x_1 - x_2)(x_1 - x_3) ... (x_1 - x_m)}
        = -U(x_1)/V'(x_1),    (2.32)

and in general

    ρ_i = -U(x_i)/V'(x_i).    (2.33)

The chief difficulty so far is in obtaining the partial fraction expansion (2.31), and in practice considerable numerical calculation may be required. When, however, the expansion is available, the coefficient p_k of x^k is easily derived. For

    1/(x_i - x) = (1/x_i)(1 - x/x_i)^{-1} = Σ_{k=0}^∞ x^k / x_i^{k+1},

and if this expression is substituted in equation (2.31) for i = 1, 2, ..., m, we can pick out the required coefficient as

    p_k = ρ_1/x_1^{k+1} + ρ_2/x_2^{k+1} + ... + ρ_m/x_m^{k+1}.    (2.34)

The formula in (2.34) is of course exact, but involves the disadvantage that all m roots of V(x) = 0 must be calculated, and this may be prohibitive in practice. Frequently, however, the main contribution comes from a single term. Thus suppose that x_1 is smaller in absolute value than all other roots. The first term ρ_1 x_1^{-k-1} will then dominate as k increases. More exactly, we can write

    p_k ~ ρ_1 x_1^{-k-1}  as  k → ∞.    (2.35)

This formula is often surprisingly good, even for relatively small values of k. Moreover, if we apply this method to the generating function Q(x), we shall obtain an approximation to the whole tail of the probability distribution beyond some specified point (see Feller, 1957, Chapter 11, for a numerical example).

Most of the restrictions in the foregoing derivation can be relaxed. Thus if U(x) is of degree m + r, division by V(x) leads to the form already considered plus a polynomial of degree r, affecting only the first r + 1 terms of the distribution. Again, if x_h is a double root, then (2.31) contains an additional term of the form σ_h(x - x_h)^{-2}, which adds a quantity σ_h(k + 1)x_h^{-k-2} to the exact expression for p_k in (2.34). Similarly for roots of greater multiplicity. Thus we can see that provided x_1 is a simple root, smaller in absolute value than all other roots, the asymptotic formula (2.35) will still hold.
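As a worked illustration of (2.29)-(2.35) (a Python sketch added to this transcription; the generating function chosen is a hypothetical example, not one from the text), take A(x) = 1/{(2 - x)(3 - x)}, so that U(x) = 1 and V(x) = 6 - 5x + x^2 with roots x_1 = 2 and x_2 = 3. Equation (2.33) gives the partial-fraction constants, (2.34) the exact coefficients, and (2.35) the one-term approximation; an independent check on the coefficients comes from the recurrence implied by V(x)A(x) = 1.

```python
# A(x) = U(x)/V(x) with U = 1, V = (2 - x)(3 - x) = 6 - 5x + x^2.
x1, x2 = 2.0, 3.0

def V_prime(x):
    return -5.0 + 2.0 * x

rho1 = -1.0 / V_prime(x1)  # (2.33): rho_i = -U(x_i)/V'(x_i); here rho1 = 1
rho2 = -1.0 / V_prime(x2)  # here rho2 = -1

def a_exact(k):
    # (2.34): a_k = rho1/x1^(k+1) + rho2/x2^(k+1)
    return rho1 / x1 ** (k + 1) + rho2 / x2 ** (k + 1)

# Independent check: V(x)A(x) = 1 gives 6 a_0 = 1, 6 a_1 = 5 a_0, and
# 6 a_k - 5 a_{k-1} + a_{k-2} = 0 for k >= 2.
a = [1 / 6, 5 / 36]
for k in range(2, 12):
    a.append((5 * a[k - 1] - a[k - 2]) / 6)

print(a[10], a_exact(10))  # the two routes agree
print(rho1 / x1 ** 11)     # (2.35): dominant-root term alone is already close
```

The recurrence and the partial-fraction formula (2.34) agree to machine precision, and the single dominant-root term of (2.35) is already within about one per cent at k = 10.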

It can further be shown that if x_1 is a root of V(x), smaller in absolute value than all other roots, but of multiplicity r, then

    p_k ~ ρ_1 {(k + r - 1)!/(k!(r - 1)!)} x_1^{-k-r}  as  k → ∞,    (2.36)

where now

    ρ_1 = (-1)^r r! U(x_1)/V^(r)(x_1).    (2.37)

As mentioned in Section 2.2 it is sometimes convenient in advanced applications to define the generating function in terms of a complex dummy variable z instead of the real quantity x used above. We can still use the foregoing method of partial fraction expansion for rational functions, but a much wider class of functions in complex variable theory admits of such expansions. Thus suppose that the function f(z) is regular except for poles in any finite region of the z-plane. Then if ζ is not a pole, we have, subject to certain important conditions (see, for example, Copson, 1948, Section 6.8), the result that f(ζ) is equal to the sum of the residues of f(z)/(ζ - z) at the poles of f(z). That this does provide a partial fraction expansion is seen from the fact that if α is a pole of f(z) with principal part Σ_{r=1}^s b_r (z - α)^{-r}, the contribution of α to the required sum is Σ_{r=1}^s b_r (ζ - α)^{-r}.

2.6 Moment- and cumulant-generating functions

So far we have been discussing the properties of generating functions, mainly, though not entirely, in the context of probability-generating functions for discrete-valued random variables. There are of course a number of other functions that generate moments and cumulants, and which apply to random variables that are continuous as well as to those that are discrete. Thus we may define the moment-generating function M(θ) of a random variable X quite generally as

    M(θ) = E(e^{θX}).    (2.38)

If X is discrete and takes the value j with probability pj we have

    M(θ) = Σ_j e^{θj} pj ≡ P(e^θ),    (2.39)

while if X is continuous with a frequency function f(u) we can write

    M(θ) = ∫_{-∞}^{∞} e^{θu} f(u) du.    (2.40)

In either case the Taylor expansion of M(θ) generates the moments, given by

    M(θ) = 1 + Σ_{r=1}^∞ μ'r θ^r/r!,    (2.41)

where μ'r is the rth moment about the origin.

Unfortunately, moment-generating functions do not always exist, and it is often better to work with the characteristic function defined by

    φ(θ) = E(e^{iθX}),    (2.42)

which does always exist. The expansion corresponding to (2.41) is then

    φ(θ) = 1 + Σ_{r=1}^∞ μ'r (iθ)^r/r!.    (2.43)

When there is a continuous frequency function f(u), we have, as the analog of (2.40),

    φ(θ) = ∫_{-∞}^{∞} e^{iθu} f(u) du,    (2.44)

for which there is a simple inversion formula given by the Fourier transform

    f(u) = (1/2π) ∫_{-∞}^{∞} e^{-iθu} φ(θ) dθ.    (2.45)

It should be noted that this is a complex integral, and usually requires contour integration and the theory of residues for its evaluation. For a fuller account of the theory of characteristic functions, and the appropriate form taken by the inversion theorem under less restrictive conditions than those required above, the reader should consult Parzen (1960, Chapter 9).

  • 14 STOCHASTIC PROCESSES GENERATING FUNCTIONS 15

    PROBLEMS FOR SOLUTION

    where P[r]' the rth factorial moment about the origin, is defined as in (2.13).

    where K r is the rth cumulant.Finally, we note the factorial moment-generating function, sometimes

    useful in handling discrete variables, defined by

    8. Show that the cumulant-generating function of the normal distributiongiven by

    I 2 2feu) = --- e-( u-m) /20 - (JJ < U < (JJ. a(27T)Y:, , ,

    is K(B) = mB + ia2B2.9. Suppose that SN = Xl + X2 + ... + X N, where the Xi are independent

    random variables with identical distributions given by

    P{Xi = O} =q, P{Xi = I}= p,and N is a random variable with a Poisson distribution having mean A.Prove that SN has a Poisson distribution with mean Ap.

    (2.48)(2.49)

    (2.46)

    (2.47)00

    == L KrOrjr!,r= 1

    K(8) = log M(O)

    Q(y) = pel + y) = E{(l + y)j}== 1 + L P[r]yrjr!,

    r=1

    For many reasons it often turns out to be simpler to work in terms ofcumulants, rather than moments.

    The cumulant-generating function is simply the natural logarithm ofthe moment-generating function or characteristic function, whichever ismost convenient. In the former case we have the cumulant-generatingfunction K(8) given by

    I. Show that the probability-generating function for the binomial distribution

    Pi = C) piqn-i, where q = I - p, is P(x) = (q +px)n. Hence prove that themean is m = np, and the variance is a2= npq.

    2. What is the probability-generating function for the Poisson distributionPi = e-At..J/j!,j = 0, 1,2, ... ? Hence show that the mean and variance of thedistribution are both equal to A.

    3. Find the probability-generating function of the geometric distributionPi = pqi,j = 0,1,2, . What are the mean and variance?

    4. If M x(B) =E(eoX) is the moment-generating function ofa random variableX, show that the moment-generating function of Y = (X - a)/b is M y(B)= e- aO / bM x(B/b).

    5. Using the result of Problem 4, show that the moment-generating function ofthe binomial variable in Problem I, measured from its mean value, is(qe- PO +pe-qo)n. Hence calculate the first four moments about the mean.

    6. Obtain the probability-generating function of the negative-binomial dis-

    tribution given by Pi = (n + j- 1) pnqi, j = 0, 1,2, .... Find the mean andvariance.

    7. Show that the negative-binomial distribution in Problem 6 is the convolu-tion of n independent geometric distributions, each of the form shown inProblem 3.
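The moment and cumulant machinery above lends itself to a quick numerical sketch (the parameter values are illustrative): for the binomial distribution of Problem 1, the first two derivatives of K(θ) at θ = 0 should reproduce the mean np and the variance npq.

```python
import math

# Numerical check of K(theta) = log M(theta) for a binomial distribution.
# Illustrative parameters; finite differences approximate K'(0), K''(0).
n, p = 10, 0.3
q = 1 - p
pmf = [math.comb(n, j) * p**j * q**(n - j) for j in range(n + 1)]

def M(theta):                  # moment-generating function, as in (2.39)
    return sum(math.exp(theta * j) * pj for j, pj in enumerate(pmf))

def K(theta):                  # cumulant-generating function, as in (2.46)
    return math.log(M(theta))

h = 1e-4                       # central finite differences at theta = 0
kappa1 = (K(h) - K(-h)) / (2 * h)
kappa2 = (K(h) - 2 * K(0) + K(-h)) / h**2
# kappa1 should be close to np = 3.0, and kappa2 close to npq = 2.1
```

The same three lines of finite differencing applied to M(θ) itself would give the raw moments μ'_1 and μ'_2 of (2.41) instead of the cumulants.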

CHAPTER 3

Recurrent Events

3.1 Introduction

Some of the simplest kinds of stochastic process occur in relation to a repetitive series of experimental trials, such as occur when a coin is tossed over and over again. Thus we might suppose that the chance of heads occurring at any particular trial is p, independently of all previous tosses. It is then not difficult to use standard probability theory to calculate the chance that the accumulated number of heads in n tosses lies between specified limits, or the probability of occurrence of runs of heads of a given length, and so on. We might hesitate to dignify such simple schemes with the title of stochastic process. Nevertheless, with only a little generalization, they provide the basis for a general theory of recurrent events. And the importance of this topic in the context of this book is that it provides a number of results and theorems that will be of immediate application to our discussion of Markov chains in Chapter 5. The present chapter introduces the chief concepts required later, and outlines the main results in a comparatively non-rigorous fashion. For a more precise set of definitions and closely reasoned discussion see Chapter 13 of Feller (1957).

3.2 Definitions

Let us suppose that we have a succession of trials, which are not necessarily independent, and each of which has a number of possible outcomes, say E_j (j = 1, 2, ...). We now concentrate attention on some specific event E, which could be one of the outcomes or some more general pattern of outcomes. More generally, we need only consider an event E which may or may not occur at each of a succession of instants, n = 1, 2, ....

Thus in a sequence of coin-tossing trials E could simply be the occurrence of heads at any particular trial, or it could be the contingency that the accumulated numbers of heads and tails were equal. It is convenient to introduce a mild restriction as to the kind of events we are prepared to consider: namely, that whenever E occurs, we assume the series of trials or observations to start again from scratch for the purposes of looking for another occurrence of E. This ensures that the event "E occurs at the nth trial" depends only on the outcomes of the first n trials, and not on the future, which would lead to unnecessary complications. The practical consequence of this assumption is that if we consider the event E constituted by two successive heads, say HH, then the sequence HHHT (where T stands for tails) shows E appearing only at the second trial. After that, we start again, and the remaining trials HT do not yield E. Similarly, the sequence HHH|HT|HHH|HHH provides just three occurrences of "a run of three heads", as indicated by the divisions shown.

With these definitions, let us specify the probability that E occurs at the nth trial (not necessarily for the first time) as u_n; and the probability that E does occur for the first time at the nth trial as f_n. For convenience we define

    u_0 = 1,  f_0 = 0,    (3.1)

and introduce the generating functions

    U(x) = Σ_{n=0}^∞ u_n x^n,  F(x) = Σ_{n=0}^∞ f_n x^n.    (3.2)

Note that the sequence {u_n} is not a probability distribution, as the probabilities refer to events that are not mutually exclusive. And in many important cases we find Σ u_n = ∞. However, the events "E occurs for the first time at the nth trial" are mutually exclusive, and so

    f = Σ_{n=1}^∞ f_n = F(1) ≤ 1.    (3.3)

The quantity 1 - f can clearly be interpreted as the probability that E does not occur at all in an indefinitely prolonged series of trials.

If f = 1, the f_n constitute a genuine set of probabilities, and we can legitimately talk about the mean of this distribution, μ = Σ_{n=1}^∞ n f_n. Indeed, even when f < 1, we can regard the f_n as providing a probability distribution if we formally assign the value ∞ to the chance of non-occurrence, 1 - f. In this case we automatically have μ = ∞. The probability distribution f_n therefore refers to the waiting time for E, defined as the number of trials up to and including the first occurrence of E; or, more generally, to the recurrence time between successive occurrences of E.
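The restart convention can be made concrete with a few lines of code (a sketch; the function name is our own choice):

```python
def count_occurrences(seq, r=3):
    # Count occurrences of the event "a run of r heads" under the restart
    # convention of the text: whenever E occurs, the series starts again
    # from scratch for the purposes of looking for another occurrence.
    count = run = 0
    for c in seq:
        run = run + 1 if c == 'H' else 0
        if run == r:
            count += 1
            run = 0          # start again from scratch
    return count
```

Applied to the sequence HHH|HT|HHH|HHH discussed above (i.e. the string "HHHHTHHHHHH") the function returns 3, and for HHHT with r = 2 it returns 1, the event E appearing only at the second trial.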

Definitions

We call the recurrent event E persistent if f = 1, and transient if f < 1. (There is some variation in the literature regarding the terms employed. The present usage, recommended in Feller (1957, Chapter 13), seems preferable where applications to Markov chains are concerned.)

We call the recurrent event E periodic with period t, if there exists an integer t > 1 such that E can occur only at trials numbered t, 2t, 3t, ... (i.e. u_n = 0 whenever n is not divisible by t, and t is the largest integer with this property).

3.3 Basic theorems

The first thing we require is the fundamental relationship between the two sequences {u_n} and {f_n}. This is most easily expressed in terms of the corresponding generating functions. We use the following argument.

To begin with we decompose the event "E occurs at the nth trial", for which the probability is u_n, into the set of mutually exclusive and exhaustive events given by "E occurs for the first time at the jth trial, and again at the nth trial" for j = 1, 2, 3, ..., n. The probability of the latter is f_j u_{n-j}, since the chance that E occurs for the first time at the jth trial is f_j, and, having thus occurred, the chance that it occurs again n - j trials later is u_{n-j}, the two chances being by definition independent. It immediately follows that

    u_n = f_1 u_{n-1} + f_2 u_{n-2} + ... + f_n u_0,  n ≥ 1.    (3.4)

Now, remembering that f_0 ≡ 0, the sequence on the right of (3.4) is, for n ≥ 1, just the convolution {u_n}*{f_n}, for which the generating function is U(x)F(x). But the sequence on the left of (3.4) is, for n ≥ 1, the set of quantities {u_n} with the first term u_0 = 1 missing. The generating function for this sequence is thus U(x) - 1. It follows that U(x) - 1 = U(x)F(x), or

    U(x) = 1/{1 - F(x)}.    (3.5)

Suppose now that we regard u_j as the expectation of a random variable that takes the values 1 and 0 as E does or does not occur at the jth trial. The quantity Σ_{j=1}^n u_j is thus the expected number of occurrences in n trials. If we define

    U = Σ_{j=0}^∞ u_j,    (3.6)

then U - 1 is the expected number of occurrences in an infinite sequence of trials (using the fact that u_0 ≡ 1).

Theorem 3.1

A necessary and sufficient condition for the event E to be persistent is that Σ u_j should diverge.

This theorem is useful for determining the status of a particular kind of event, given the relevant probabilities. The proof is as follows. If E is persistent, then f = 1 and F(x) → 1 as x → 1. Thus, from (3.5), U(x) → ∞ and Σ u_j diverges. The converse argument also holds.

Again, if E is transient, F(x) → f as x → 1. In this case it follows from (3.5) that U(x) → (1 - f)^{-1}. Now U(x) is a series with non-negative coefficients. So by Abel's convergence theorem, the series converges for the value x = 1, and has the value U(1) = (1 - f)^{-1}. Thus in the transient case we have

    f = (U - 1)/U,    (3.7)
    U = 1/(1 - f).    (3.8)

A theorem of considerable importance in the theory of recurrent events, and especially in later applications to Markov chains, is the following.

Theorem 3.2

Let E be a persistent, not periodic, recurrent event with mean recurrence time μ = Σ n f_n = F'(1); then

    u_n → μ^{-1} as n → ∞.    (3.9)

In particular u_n → 0 if μ = ∞.

We shall not give the proof here: it can be carried through in a relatively elementary way, but is rather lengthy (see, for example, Feller, 1957, Chapter 13, Section 10). The corresponding result for periodic events (which can be deduced quite easily from Theorem 3.2) is:

Theorem 3.3

If E is persistent with period t, then

    u_{nt} → t/μ as n → ∞,

and u_k = 0 for every k not divisible by t.

3.4 Illustration

As an illustration of some of the foregoing ideas let us consider a series of independent trials, at each of which there is a result that is either a success or a failure, the probabilities being p and q = 1 - p, respectively. Let E be the event "the cumulative numbers of successes and failures are

equal". We might suppose, for example, that a gambler wins or loses a unit sum of money at each trial with chances p or q, and E represents the contingency that his accumulated losses and gains are exactly zero.

Clearly, the numbers of successes and failures can only be equal at even-numbered trials, i.e.

    u_{2n} = (2n choose n) p^n q^n,  u_{2n+1} = 0.    (3.10)

Using Stirling's approximation to large factorials, we can write

    u_{2n} ~ (4pq)^n/(πn)^{1/2}.    (3.11)

Thus if p ≠ ½, we have 4pq < 1 and Σ u_{2n} converges faster than the geometric series with ratio 4pq. According to Theorem 3.1 the event E cannot be persistent, and is therefore transient. If, on the other hand, p = q = ½, we have u_{2n} ~ (πn)^{-1/2} and Σ u_{2n} diverges, although u_{2n} → 0 as n → ∞. In this case E is persistent, but the mean recurrence time will be infinite (the latter follows directly from Theorem 3.2, but a direct proof is given below).

In terms of the gambling interpretation, we could say that when p > ½ there is probability one that the accumulated gains and losses will be zero only a finite number of times: the game is favorable, and after some initial fluctuation the net gain will remain positive. But if p = ½, the situation of net zero gain will recur infinitely often.

An exact calculation of the various properties of the process and of the distribution of recurrence times is easily effected using the generating functions of the previous section. First, from (3.10) we have

    U(x) = Σ_{n=0}^∞ u_{2n} x^{2n} = (1 - 4pqx²)^{-1/2}.    (3.12)

If p ≠ ½, we have

    u = U(1) = (1 - 4pq)^{-1/2} = |p - q|^{-1},    (3.13)

so that the expected number of occurrences of E, i.e. of returns to net zero gain, is

    u - 1 = 1/|p - q| - 1,

and the chance that E will occur at least once is

    f = (u - 1)/u = 1 - |p - q|.    (3.14)

We also have

    F(x) = {U(x) - 1}/U(x) = 1 - (1 - 4pqx²)^{1/2}.    (3.15)

When p = ½, this becomes

    F(x) = 1 - (1 - x²)^{1/2},    (3.16)

from which we obtain the probability distribution

    f_{2n} = (2n - 2 choose n - 1) · 1/(n 2^{2n-1}),  n ≥ 1;  f_{2n+1} = 0,  n ≥ 0.    (3.17)

The mean value of this distribution is of course μ = F'(1) = ∞, as already remarked above.

3.5 Delayed recurrent events

It is convenient to introduce at this point a certain extension of the above account of recurrent events, mainly because we shall want to make use of the appropriate modification of Theorem 3.2 in our discussion of Markov chains in Chapter 5. Actually, the whole subject of recurrent events can be treated as a special case of the more general theory of renewal processes. For an extensive account of this topic see the recent book by D. R. Cox (1962).

In formulating the basic model for ordinary recurrent events in Section 3.2, we used the sequence {f_n} to represent the probability distribution of recurrence times between successive occurrences of the event E, including the time up to the first appearance of E. We now modify this model by supposing that the probability distribution of the time up to the first occurrence of E is given by the sequence {b_n}, which is in general different from {f_n}. It is as though we missed the beginning of the whole series of trials, and only started recording at some later point. Following Feller (1957, Chapter 13, Section 5), we shall call E a delayed recurrent event. The basic equation for the corresponding process is obtained by the following modification of the earlier argument.

As before, we decompose the event "E occurs at the nth trial", for which the probability is u_n, into the set of mutually exclusive and exhaustive

events given by "E occurs for the first time at the jth trial, and again at the nth trial" for j = 1, 2, 3, ..., n. The probability of the latter is f_j u_{n-j} for 1 ≤ j ≤ n - 1, but b_n for j = n. Hence this time we have

    u_n = f_1 u_{n-1} + f_2 u_{n-2} + ... + f_{n-1} u_1 + b_n,  n ≥ 1.    (3.18)

For delayed events it is convenient to adopt the convention that

    u_0 = f_0 = b_0 = 0.    (3.19)

Equation (3.18) can then be written in terms of the sequences {u_n}, {f_n} and {b_n} as

    {u_n} = {u_n}*{f_n} + {b_n}.    (3.20)

If the corresponding generating functions are

    U(x) = Σ_{n=0}^∞ u_n x^n,  F(x) = Σ_{n=0}^∞ f_n x^n,  B(x) = Σ_{n=0}^∞ b_n x^n,    (3.21)

equation (3.20) leads immediately to

    U(x) = U(x)F(x) + B(x),

or

    U(x) = B(x)/{1 - F(x)},    (3.22)

which is the required extension of (3.5). (Alternatively, we could simply multiply (3.18) by x^n, and sum for n = 0, 1, 2, ....)

The theorem we require is the following appropriate extension of Theorem 3.2, which we simply quote here without proof (see Feller, 1957, Chapter 13, Sections 4 and 5).

Theorem 3.4

Let E be a persistent and not periodic delayed recurrent event, with mean recurrence time μ = Σ n f_n = F'(1); then

    u_n → μ^{-1} Σ b_n as n → ∞.    (3.23)

In the case when E is transient, and f = Σ f_n < 1, we can easily show that

    U = Σ u_n = (1 - f)^{-1} Σ b_n,    (3.24)

which is the obvious extension of the result in equation (3.7).

PROBLEMS FOR SOLUTION

1. Consider a series of independent trials, at each of which there is success or failure, with probabilities p and q = 1 - p, respectively. Let E be the very simple event "success". Find the generating functions U(x) and F(x). Hence show that the number of trials between consecutive successes has a geometric distribution.

2. Suppose that two distinguishable unbiased coins are tossed repeatedly. Let E be the event "the accumulated number of heads is the same for each coin". Prove that E is persistent with an infinite mean recurrence time.

3. An unbiased die is thrown successively. The event E is defined by "all six spot-numbers have appeared in equal numbers". Show that E is periodic and transient.

4. Suppose we have a sequence of mutually independent random variables X_k with a common probability distribution given by

    P{X_k = a} = b/(a + b),  P{X_k = -b} = a/(a + b),

where a and b are positive integers. Define the sum S_n by S_n = X_1 + X_2 + ... + X_n. Show that the event "S_n = 0" is persistent.

5. Suppose a gambler wins or loses a unit sum at each of a series of independent trials with equal probability. Let E be the event "the gambler's net gain is zero after the present trial, but was negative after the last trial". What is the distribution of the recurrence time of this event?
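The results of Section 3.4 can be verified exactly in rational arithmetic (a sketch; the truncation point N is an arbitrary choice): starting from u_{2n} in (3.10) with p = q = ½, the recurrence implied by (3.4) recovers the first-return probabilities f_n, which should agree with the closed form (3.17).

```python
from fractions import Fraction
from math import comb

# Recover f_n from u_n via the fundamental relation (3.4), for the
# symmetric case p = q = 1/2, and compare with the closed form (3.17).
N = 20
u = [Fraction(0)] * (N + 1)
u[0] = Fraction(1)
for n in range(2, N + 1, 2):
    u[n] = Fraction(comb(n, n // 2), 2**n)        # (3.10) with p = q = 1/2

f = [Fraction(0)] * (N + 1)
for n in range(1, N + 1):
    # (3.4): u_n = f_1 u_{n-1} + ... + f_n u_0, solved for f_n
    f[n] = u[n] - sum(f[j] * u[n - j] for j in range(1, n))

closed = {2 * n: Fraction(comb(2 * n - 2, n - 1), n * 2**(2 * n - 1))
          for n in range(1, N // 2 + 1)}          # (3.17)
```

For instance f_2 = 1/2 and f_4 = 1/8, and the agreement is exact for every even index computed, while all odd-index terms vanish.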

CHAPTER 4

Random Walk Models

4.1 Introduction

There are often several approaches that can be tried for the investigation of a stochastic process. One of these is to formulate the basic model in terms of the motion of a particle which moves in discrete jumps with certain probabilities from point to point. At its simplest, we imagine the particle starting at the point x = k on the x-axis at time t = 0, and at each subsequent time t = 1, 2, ... it moves one unit to the right or left with probabilities p or q = 1 - p, respectively. A model of this kind is used in physics as a crude approximation to one-dimensional diffusion or Brownian motion, where a particle may be supposed to be subject to a large number of random molecular collisions. With equal probabilities of a transition to left or right, i.e. p = q = ½, we say the walk is symmetric. If p > ½, there is a drift to the right; and if p < ½ the drift is to the left.

Such a model can clearly be used to represent the progress of the fortunes of the gambler in Section 3.4, who at each trial won or lost one unit with chance p or q, respectively. The actual position of the particle on the x-axis at any time indicates the gambler's accumulated loss or gain.

In the diffusion application we might wish to have no limits on the extent to which the particle might move from the origin. We should then say that the walk was unrestricted. If, however, we wanted the position of the particle to represent the size of a population of individuals, where a step to the right corresponded to a birth and a step to the left meant a death, then the process would have to stop if the particle ever reached the origin x = 0. The random walk would then be restricted, and we should say that x = 0 was an absorbing barrier.

Again, to return to the gambler, suppose that the latter starts with a capital of amount k and plays against an adversary who has an amount a - k, so that the combined capital is a. Then the game will cease if either x = 0 and the gambler has lost all his money, or x = a and the gambler has won all his opponent's money. In this case there are absorbing barriers at x = 0 and x = a.

As an extension of these ideas we can also take into account the possibility of the particle's remaining where it is. We could also introduce transitions from any point on the line to any other point, and not merely to those in positions one unit to the left or right. Other modifications are the adoption of reflecting barriers. Thus if x = 0 is a reflecting barrier, a particle at x = 1 moves to x = 2 with probability p, and remains at x = 1 with probability q (i.e. it moves to x = 0 and is immediately reflected back to x = 1). Further, we can have an elastic barrier, in which the reflecting barrier in the scheme just mentioned is modified so as either to reflect or to absorb with probabilities βq and (1 - β)q, respectively.

4.2 Gambler's ruin

Historically, the classical random walk has been the gambler's ruin problem. An analysis of the main features of this problem will illustrate a number of techniques that may be of use in the investigation of more difficult situations.

Let us suppose, as suggested in the previous section, that we have a gambler with initial capital k. He plays against an opponent whose initial capital is a - k. The game proceeds by stages, and at each step the first gambler has a chance p of winning one unit from his adversary, and a chance q = 1 - p of losing one unit to his adversary. The actual capital possessed by the first gambler is thus represented by a random walk on the non-negative integers with absorbing barriers at x = 0 and x = a, absorption being interpreted as the ruin of one or the other of the gamblers.

Probability of ruin

Now, the probability q_k of the first gambler's ruin when he starts with an initial capital of k can be obtained by quite elementary considerations as follows. After the first trial this gambler's capital is either k + 1 or k - 1, according as he wins or loses that game. Hence

    q_k = p q_{k+1} + q q_{k-1},  1 ≤ k ≤ a - 1,    (4.1)

provided we adopt the conventions that

    q_0 = 1,  q_a = 0.    (4.2)

The difference equation in (4.1) is easily solved by a standard method, e.g. Goldberg (1958, Section 3.3). If we put q_k = w^k in (4.1) we obtain the auxiliary equation

    pw² - w + q = 0,    (4.3)

which has roots w = 1, q/p.

Thus if p ≠ ½ the equation (4.3) has two different roots, and we can write the general solution of (4.1) as

    q_k = A(1)^k + B(q/p)^k,    (4.4)

where A and B are constants which must be chosen to satisfy the boundary conditions appearing in (4.2). We therefore have the two equations

    1 = A + B,  0 = A + B(q/p)^a,

with solutions

    A = (q/p)^a/{(q/p)^a - 1},  B = -1/{(q/p)^a - 1}.    (4.5)

The required solution for q_k is thus

    q_k = {(q/p)^a - (q/p)^k}/{(q/p)^a - 1},    (4.6)

and it is not difficult to show that this is unique.

In a similar way we can calculate the probability p_k of the first gambler's success, i.e. his opponent's ruin. We merely have to interchange p and q, and write a - k for k. These substitutions applied to formula (4.6) then yield, after rearrangement,

    p_k = {(q/p)^k - 1}/{(q/p)^a - 1}.    (4.7)

It now follows from (4.6) and (4.7) that

    p_k + q_k = 1,    (4.8)

so that the chance of an unending contest between the two gamblers is zero, a fact we were careful not to assume in advance.

If p = ½, the auxiliary equation has two equal roots w = 1, and the general solution must be written in the form

    q_k = C(1)^k + Dk(1)^k.    (4.9)

Using the boundary conditions to determine the constants C and D leads to the final result

    q_k = 1 - k/a.    (4.10)

(Alternatively, we could let p → ½ in (4.6) and use l'Hôpital's rule on the right-hand side.) Similarly, we find

    p_k = k/a,    (4.11)

and so (4.8) holds as before.

It is worth remarking at this point that the difference equation in (4.1) can also be solved by a generating-function method in which we put Q(x) = Σ_{k=0}^a q_k x^k, but the above derivation is somewhat shorter.

The implications of these results for gamblers are discussed at length in Feller (1957, Chapter 14). It follows, for example, from (4.11) that given equal skill (i.e. p = ½) a gambler's chance of winning all his adversary's money is proportional to his share of their total wealth. However, we are more concerned here with the methods of analysis of the random walk model.

Expected duration of game

The expected duration of the game can also be obtained in a comparatively elementary way, without recourse to a full discussion of the whole probability distribution, undertaken in the next section. It is worth emphasizing the importance of shortcuts of this kind, which are frequently possible if we are content to evaluate mean values only.

Suppose the duration of the game has a finite expectation d_k, starting the random walk as before from the point x = k. If the first trial leads to a win for the first gambler, the conditional duration from that point on is d_{k+1}. The expected duration of the whole game if the first trial is a win (occurring with probability p) is thus 1 + d_{k+1}. Similarly, the expected duration of the whole game if the first trial is a loss (occurring with probability q) is 1 + d_{k-1}. Therefore

    d_k = p(1 + d_{k+1}) + q(1 + d_{k-1})
        = 1 + p d_{k+1} + q d_{k-1},    (4.12)

the extreme values k = 1 and k = a - 1 being included on the assumptions

    d_0 = 0,  d_a = 0.    (4.13)

Now the difference equation (4.12) is simply a non-homogeneous version of (4.1) with d_k written for q_k. We can, therefore, utilize the previous

general solution of the homogeneous equation, adding to it any particular solution of the complete equation (4.12). Such a particular solution is d_k = k/(q - p), so that the general solution of (4.12) is

    d_k = A + B(q/p)^k + k/(q - p).    (4.14)

The constants A and B are easily evaluated using the boundary conditions (4.13). The final form is then

    d_k = k/(q - p) - {a/(q - p)}·{1 - (q/p)^k}/{1 - (q/p)^a}.    (4.15)

If p = ½, we can either let p → ½ in (4.15), or solve from the beginning, using the general solution of the homogeneous equation, i.e. the result in (4.9). This time the particular solution d_k = k/(q - p) breaks down, but we easily obtain another particular solution d_k = -k². In either case, we obtain the result for p = ½ as

    d_k = k(a - k).    (4.16)

4.3 Probability distribution of ruin at nth trial

Let us now consider the full probability distribution, and write q_{kn} for the probability that absorption takes place at x = 0 (i.e. that the first gambler is ruined) exactly at the nth trial, the initial position being x = k. Conditioning on the outcome of the first trial, as in the argument leading to (4.1), gives

    q_{k,n+1} = p q_{k+1,n} + q q_{k-1,n},  1 ≤ k ≤ a - 1,  n ≥ 0,    (4.17)

with boundary conditions

    q_{0n} = q_{an} = 0 for n ≥ 1,    (4.18)
    q_{00} = 1;  q_{k0} = 0 for k > 0.    (4.19)

We now introduce a generating function for variations in n, namely

    Q_k(x) = Σ_{n=0}^∞ q_{kn} x^n.    (4.20)

Multiplying (4.17) by x^{n+1} and summing over n = 0, 1, 2, ... yields

    Q_k(x) = px Q_{k+1}(x) + qx Q_{k-1}(x),  1 ≤ k ≤ a - 1,    (4.21)

with boundary conditions

    Q_0(x) = 1,  Q_a(x) = 0.    (4.22)

Substituting a trial solution Q_k(x) = w^k(x) in (4.21) gives the quadratic pxw² - w + qx = 0, whose roots are

    w_1(x) = {1 + (1 - 4pqx²)^{1/2}}/(2px),    (4.23)
    w_2(x) = {1 - (1 - 4pqx²)^{1/2}}/(2px),    (4.24)

so that the general solution may be written

    Q_k(x) = A w_1^k(x) + B w_2^k(x),    (4.25)

where A and B are functions of x, to be determined from the boundary conditions (4.22).
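Both closed forms obtained above, the ruin probability (4.6) and the expected duration (4.15), are easy to check numerically (a sketch with illustrative parameter values), by simply sweeping the difference equations (4.1) and (4.12) repeatedly until they converge:

```python
# Solve the gambler's-ruin difference equations by repeated sweeps
# (Gauss-Seidel style) and compare with the closed forms (4.6), (4.15).
# Parameter values are illustrative.
p, q, a = 0.6, 0.4, 10
ruin = [0.0] * (a + 1); ruin[0] = 1.0        # boundary conditions (4.2)
dur = [0.0] * (a + 1)                        # boundary conditions (4.13)
for _ in range(5000):
    for k in range(1, a):
        ruin[k] = p * ruin[k + 1] + q * ruin[k - 1]      # equation (4.1)
        dur[k] = 1 + p * dur[k + 1] + q * dur[k - 1]     # equation (4.12)

r = q / p
qk_closed = [(r**a - r**k) / (r**a - 1) for k in range(a + 1)]       # (4.6)
dk_closed = [k / (q - p) - (a / (q - p)) * (1 - r**k) / (1 - r**a)
             for k in range(a + 1)]                                   # (4.15)
```

The sweeps converge geometrically, since each one is an averaging of neighboring values; a few thousand iterations are far more than enough for a = 10.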

The constants A and B in the general solution are determined by the boundary conditions (4.22); after a little reduction, we find

    Q_k(x) = (q/p)^k {w_1^{a-k}(x) - w_2^{a-k}(x)}/{w_1^a(x) - w_2^a(x)}    (4.26)

(using, for example, the fact that w_1(x)w_2(x) = q/p), and this is the generating function for the probability of absorption at x = 0. The corresponding generating function P_k(x) for the probability p_{kn} of absorption at x = a at the nth trial is clearly obtained by replacing p, q, and k in (4.26) by q, p, and a - k, respectively. Finally, the sum of these two generating functions, Q_k(x) + P_k(x), must give the generating function of the probability that the process terminates at the nth trial (either at x = 0 or x = a).

Special case a = ∞

Let us now consider for a moment the special case a = ∞, which corresponds to having an absorbing barrier only at x = 0. In the gambling model we can imagine the case of a gambler playing against an infinitely rich adversary. It can be seen from (4.23) and (4.24) that, for |x| < 1, the roots w_1 and w_2 satisfy the inequality 0 < w_2 < 1 < w_1. If, therefore, we let a → ∞ in (4.26), we obtain the solution

    Q_k(x) = (q/p)^k w_1^{-k}(x) = w_2^k(x) = [{1 - (1 - 4pqx²)^{1/2}}/(2px)]^k.    (4.27)

Alternatively, we can solve the basic difference equation subject only to the boundary condition Q_0(x) = 1. The solution will be unbounded unless A = 0. Hence the final result shown in (4.27).

Now Q_k(x) in (4.27) is the generating function for the probability that, starting from the point x = k, the particle will be absorbed at the origin on exactly the nth step. Alternatively, if we choose to envisage a completely unrestricted random walk starting from x = k, Q_k(x) is the generating function for the distribution of first-passage times through a point k units to the left of the initial position. First-passage time distributions for points to the right of the initial point are given simply by interchanging p and q, and the appropriate generating function comes out to be w_1^{-k}(x) instead of w_2^k(x). It will be noticed that w_1^{-k}(x) is the kth power of the generating function for the first-passage time to a point just one unit to the right of the starting point. This is what we should expect, since a little reflection shows that the first-passage time to a point k units to the right will be the sum of k independent first-passage times for successive displacements each of one unit to the right.

If we put x = 1 in (4.27) we obtain the total chance of ruin at some stage when playing against an infinitely rich opponent, i.e.

    q_k = (q/p)^k,  p > q;
    q_k = 1,        p ≤ q,    (4.28)

a result which could have been obtained directly from (4.6) by letting a → ∞.

Exact values of probabilities in general case

Let us now return to the general case with two finite absorbing barriers at x = 0 and x = a, and see how an explicit expression for q_{kn} may be derived. In principle we merely pick out the coefficient of x^n in the generating function Q_k(x) shown in (4.26), but a little care is needed in actually doing this in practice.

First, we notice that Q_k(x) appears to involve the quantity (1 - 4pqx²)^{1/2} through the roots w_1 and w_2. It is easily shown, however, that w_1^r - w_2^r (for integral r) is a rational function in x multiplied by (1 - 4pqx²)^{1/2}. The latter quantity appears in both numerator and denominator, and so cancels out to leave Q_k(x) as the ratio of two polynomials, where the degree of the denominator is a - 1 for a odd and a - 2 for a even; and the degree of the numerator is a - 1 for a - k odd and a - 2 for a - k even. Thus the degree of the numerator can never exceed that of the denominator by more than one unit.

For n > 1 we could therefore calculate q_{kn} using the kind of partial fraction expansion already discussed in Chapter 2. But the algebra simplifies considerably if we first introduce a new variable θ given by

    1/cos θ = 2(pq)^{1/2} x.    (4.29)

We can easily find

    2px = (p/q)^{1/2} (1/cos θ),    (4.30)

and

    1 ± (1 - 4pqx²)^{1/2} = (cos θ ± i sin θ)/cos θ.

Thus

    w_{1,2}(x) = (q/p)^{1/2}(cos θ ± i sin θ) = (q/p)^{1/2} e^{±iθ}.    (4.31)
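A quick numerical confirmation of this trigonometric substitution (the values of p, q and θ below are arbitrary illustrative choices):

```python
import cmath
import math

# Check (4.29)-(4.31): with 1/cos(theta) = 2*sqrt(pq)*x, the roots
# w_1, w_2 of p*x*w^2 - w + q*x = 0 become sqrt(q/p)*exp(+-i*theta).
p, q = 0.3, 0.7
theta = 0.4
x = 1 / (2 * math.sqrt(p * q) * math.cos(theta))      # equation (4.29)
s = cmath.sqrt(1 - 4 * p * q * x * x)                 # = i tan(theta)
w1 = (1 + s) / (2 * p * x)
w2 = (1 - s) / (2 * p * x)
target1 = math.sqrt(q / p) * cmath.exp(1j * theta)
target2 = math.sqrt(q / p) * cmath.exp(-1j * theta)
```

Note that 1 - 4pqx² = -tan²θ is negative here, so the square root is genuinely complex; the principal branch gives +i tan θ for 0 < θ < π/2, which is what the formulas above assume.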

Finally, we can pick out the coefficient q_{kn} of x^n in the partial-fraction expansion (4.34), for n > 1, as

    q_{kn} = a^{-1} 2^n p^{(n-k)/2} q^{(n+k)/2} Σ_{j=1}^{a-1} cos^{n-1}(jπ/a) sin(jπ/a) sin(kjπ/a),        (4.36)

using the method previously described in Section 2.5.

The special case a = ∞ can be dealt with more simply, since we have only to expand the simpler form given by the generating function in (4.27). We can do this most easily by using Lagrange's expansion (see, for example, Copson, 1948, Section 6.23). This expansion is often very useful and easy to handle, but seems not to be as widely known as it should be. It is therefore worth quoting the main theorem required in some detail, though the interested reader should consult the reference given for fuller explanations.

Lagrange's formula for the reversion of series
Suppose f(w) is a function of the complex variable w, regular in the neighborhood of w_0, and f(w_0) = z_0, where f′(w_0) ≠ 0, and write

    (w − w_0)/{f(w) − z_0} = φ(w).        (4.38)

Then the equation

    f(w) = z        (4.37)

has a unique solution, regular in a neighborhood of z_0, given by

    w = w_0 + Σ_{n=1}^∞ {(z − z_0)^n / n!} [d^{n-1}/dw^{n-1} {φ(w)}^n]_{w=w_0}.        (4.39)

More generally, we can obtain the expansion of a function ψ(w) as

    ψ(w) = ψ(w_0) + Σ_{n=1}^∞ {(z − z_0)^n / n!} [d^{n-1}/dw^{n-1} ψ′(w){φ(w)}^n]_{w=w_0}.        (4.40)

We can apply this theorem to (4.27) quite easily. Treating x as a complex variable, we want to expand the function

    Q_k(x) = w^k        (4.41)

in powers of x, where w is the root of (4.23) that vanishes when x = 0, i.e. w_0 = 0, x_0 = 0. Now we can rewrite (4.23) as

    f(w) ≡ w/(pw² + q) = x.        (4.42)

Hence, in the notation of the above theorem, we have

    φ(w) = pw² + q.        (4.43)
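The finite-a result (4.36) can be checked directly against a step-by-step computation. The following sketch (our own illustration, not from the book) evaluates the sum in (4.36) and compares it with the probability of absorption at the origin at exactly step n obtained by iterating the walk between the two barriers:

```python
import math

def qkn_formula(k, n, a, p):
    """Eq. (4.36): absorption at the origin at exactly step n, barriers at
    0 and a, start at k, right-step probability p."""
    q = 1.0 - p
    s = sum(math.cos(j * math.pi / a) ** (n - 1) * math.sin(j * math.pi / a)
            * math.sin(k * j * math.pi / a) for j in range(1, a))
    return (2.0 ** n / a) * p ** ((n - k) / 2) * q ** ((n + k) / 2) * s

def qkn_direct(k, n, a, p):
    """Same probability by propagating the interior distribution."""
    q = 1.0 - p
    dist = [0.0] * (a + 1)
    dist[k] = 1.0
    hit = 0.0
    for _ in range(n):
        hit = q * dist[1]                  # mass absorbed at 0 this step
        new = [0.0] * (a + 1)
        for pos in range(1, a):
            new[pos + 1] += p * dist[pos]  # mass reaching a is never revisited
            if pos > 1:
                new[pos - 1] += q * dist[pos]
        dist = new
    return hit
```

With a = 5 and k = 2, for example, the two computations agree to floating-point accuracy for every n.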

The new variable θ is defined by writing x = 1/{2(pq)^{1/2} cos θ}, so that

    w_{1,2}(x) = (q/p)^{1/2} (cos θ ± i sin θ) = (q/p)^{1/2} e^{±iθ}.        (4.31)

We can now write

    Q_k(x) = (q/p)^k (q/p)^{(a-k)/2} (e^{(a-k)iθ} − e^{-(a-k)iθ}) / {(q/p)^{a/2} (e^{aiθ} − e^{-aiθ})}
           = (q/p)^{k/2} sin{(a − k)θ} / sin aθ.        (4.32)

Now the roots of the denominator in (4.32) are θ = 0, π/a, 2π/a, ⋯, and the corresponding values of x are given by

    x_j = 1/{2(pq)^{1/2} cos(jπ/a)}.        (4.33)

All possible values of x are obtained by putting j = 0, 1, ⋯, a. But to j = 0 and j = a there correspond the values θ = 0 and θ = π, which are also roots of the numerator in (4.32). Further, if a is even, there is no x_j corresponding to j = ½a. Thus if a is odd we have all a − 1 roots x_j with j = 1, 2, ⋯, a − 1; but if a is even, the value j = ½a must be omitted. We can now write

    Q_k(x) = Ax + B + ρ_1/(x_1 − x) + ⋯ + ρ_{a-1}/(x_{a-1} − x),        (4.34)

where

    ρ_j = lim_{x→x_j} (x_j − x) (q/p)^{k/2} sin{(a − k)θ} / sin aθ

        = − (q/p)^{k/2} sin{(a − k)jπ/a} / {a cos jπ (dθ/dx)_{x=x_j}},
          using l'Hôpital's rule for the limit of (x_j − x)/sin aθ,

        = − (q/p)^{k/2} sin{(a − k)jπ/a} sin(jπ/a) / {2a(pq)^{1/2} cos jπ cos²(jπ/a)}

        = (q/p)^{k/2} sin(kjπ/a) sin(jπ/a) / {2a(pq)^{1/2} cos²(jπ/a)}.        (4.35)

The required expansion is therefore obtained immediately from (4.40) as

    Q_k(x) = Σ_{n=1}^∞ (x^n / n!) [d^{n-1}/dw^{n-1} {k w^{k-1} (pw² + q)^n}]_{w=0}.        (4.44)

Now if we evaluate the coefficient in square brackets on the right of (4.44) by using the usual formula for the repeated differentiation of a product, we see that the only term surviving when w = 0 arises from differentiating w^{k-1} exactly k − 1 times; the quantity (pw² + q)^n must then be differentiated n − k times, and the only surviving term here comes from the coefficient of (w²)^{(n-k)/2}. Collecting terms gives the required coefficient of x^n as

    q_{kn} = (k/n) (n choose ½(n − k)) p^{(n-k)/2} q^{(n+k)/2},        (4.45)

where we consider only those terms for which ½(n − k) is an integer in the interval [0, n].

4.4 Extensions

So far in this chapter, we have restricted the discussion to certain special kinds of one-dimensional random walk models. For example, we have supposed that the only transitions possible at any trial are from the point in question to one of the points just one unit to the left or right. But, more generally, we could consider the possibility of transitions to any other point, such as occur in applications to sequential sampling procedures (see Feller, 1957, Chapter 14, Section 8).

Again, we can consider extensions to two or more dimensions, and we shall have occasion later in this book to see how certain aspects of rather complicated situations can sometimes be represented in this way. Consider, for example, the epidemic model of Section 12.3, where the numbers of susceptibles and infectives in the population at any moment may be represented by a point in the plane with coordinates (r, s). Only certain kinds of transitions, with specified probabilities, are possible, and there are two barriers, the one given by s = 0 obviously being absorbing (since the epidemic ceases when there are no more infectives in circulation).

We shall not develop any special theory here for such extensions, but shall merely note an interesting result due to Pólya showing an important difference of behavior for an unrestricted symmetric random walk in three dimensions compared with the corresponding results in one and two dimensions.

The case of an unrestricted symmetric random walk in one dimension has already been treated in Section 3.4 in the context of recurrent-event theory. We note that for p = ½ there is probability one that the particle will return to the origin sooner or later, i.e. the event is persistent, though it was shown that the mean recurrence time was infinite.

Now consider the case of a symmetric random walk in two dimensions, where the particle at the point (x, y) has a probability of ¼ of moving at the next step to any one of its four neighbors (x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1). Let u_n be the chance that a particle, starting at the origin, is again at the origin at the nth step. This can only happen if the numbers of steps in each of the positive x and y directions equal those in the negative x and y directions, respectively. Thus u_{2n+1} = 0, and

    u_{2n} = 4^{-2n} Σ_{k=0}^n (2n)! / {k! k! (n − k)! (n − k)!}
           = 4^{-2n} (2n choose n) Σ_{k=0}^n (n choose k)².        (4.46)

The first expression above follows from the appropriate multinomial distribution in which we have k steps in each of the two x directions, and n − k steps in each of the two y directions. The second expression may be developed, using a well-known identity for binomial coefficients, namely,

    Σ_{k=0}^n (n choose k)² = (n choose 0)(n choose n) + (n choose 1)(n choose n − 1) + ⋯
        = coeff. of x^n in {(n choose 0) + (n choose 1)x + ⋯ + (n choose n)x^n}²
        = coeff. of x^n in (1 + x)^{2n}
        = (2n choose n).

Therefore

    u_{2n} = {2^{-2n} (2n choose n)}² ~ 1/(πn),        (4.47)

using Stirling's approximation to the factorials. It is evident from (4.47) that Σ u_{2n} diverges, so that a return to the origin is, once more, a persistent event.

A different state of affairs results, however, when we go up to three dimensions. We now imagine a particle moving on a cubic lattice, the particle at any point having a probability of 1/6 of moving at the next step to any one of its six nearest neighbors.
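Both the closed form (4.45) and the two-dimensional results (4.46)-(4.47) lend themselves to a quick numerical check. The sketch below (ours, not the book's) compares (4.45) with a direct stepping computation of the first-passage probability, and confirms that u_{2n} in (4.47) behaves like 1/(πn):

```python
from math import comb, pi

def qkn_closed(k, n, p):
    """Eq. (4.45): first passage k units leftward at step n, unrestricted walk."""
    if n < k or (n - k) % 2:
        return 0.0
    q = 1.0 - p
    return (k / n) * comb(n, (n - k) // 2) * p ** ((n - k) // 2) * q ** ((n + k) // 2)

def qkn_dp(k, n, p):
    """Same probability by stepping the walk with an absorbing level at -k."""
    q = 1.0 - p
    dist, hit = {0: 1.0}, 0.0
    for _ in range(n):
        new, hit = {}, 0.0
        for pos, pr in dist.items():
            if pos - 1 == -k:
                hit += q * pr                          # absorbed this step
            else:
                new[pos - 1] = new.get(pos - 1, 0.0) + q * pr
            new[pos + 1] = new.get(pos + 1, 0.0) + p * pr
        dist = new
    return hit

def u2n_2d(n):
    """Eqs. (4.46)-(4.47): 2-D return probability u_{2n} = {C(2n,n)/4^n}^2."""
    return (comb(2 * n, n) / 4.0 ** n) ** 2
```

The last assertion below checks the Stirling asymptotics: u_{2n}·πn tends to one as n grows.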

In this case a return to the origin at the 2nth step requires the numbers of steps in the positive and negative directions to balance along each of the three axes, and the return probability is

    u_{2n} = 2^{-2n} (2n choose n) Σ_{j,k} {3^{-n} n! / (j! k! (n − j − k)!)}²,        (4.48)

where the sum is over all j ≥ 0, k ≥ 0 with j + k ≤ n. Using the inequality

    Σ_r p_r² ≤ max{p_r} Σ_r p_r = max{p_r},

applied to the multinomial probabilities p_r = 3^{-n} n!/(j! k! (n − j − k)!), it can be shown that

    u_{2n} = O(n^{-3/2}),        (4.49)

so that Σ u_{2n} now converges, and a return to the origin in three dimensions is a transient event.

PROBLEMS FOR SOLUTION

4. Consider the one-dimensional random walk with absorbing barriers at x = 0 and x = a. Let q_{kn}(ξ) be the probability that a walk starting at x = k carries the particle at the nth step to the point x = ξ. What are the difference equations and boundary conditions which would have to be solved in order to calculate q_{kn}(ξ) explicitly?

5. Suppose we have a two-dimensional unrestricted symmetric random walk starting from the origin. Let r_n be the distance of the moving particle from the origin after the nth step. Prove that E(r_n²) = n.
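Pólya's contrast between two and three dimensions can be seen numerically. The sketch below (our own illustration; the cut-off at n = 80 is an arbitrary choice) evaluates (4.48) with multinomial coefficients and compares the partial sums of Σ u_{2n} in two and three dimensions — the former keeps growing, the latter settles down below a finite limit:

```python
from math import comb

def u2n_3d(n):
    """Eq. (4.48): return probability at step 2n for the symmetric walk on
    the cubic lattice (probability 1/6 for each of the six neighbors)."""
    total = 0.0
    for j in range(n + 1):
        for k in range(n + 1 - j):
            m = comb(n, j) * comb(n - j, k)      # n!/(j! k! (n-j-k)!)
            total += (m / 3 ** n) ** 2
    return comb(2 * n, n) / 4.0 ** n * total

def u2n_2d(n):
    """2-D counterpart (4.47): u_{2n} = {C(2n,n)/4^n}^2 ~ 1/(pi n)."""
    return (comb(2 * n, n) / 4.0 ** n) ** 2

s2 = sum(u2n_2d(n) for n in range(1, 81))   # still growing, like (1/pi) log n
s3 = sum(u2n_3d(n) for n in range(1, 81))   # bounded: the 3-D series converges
```

The individual 3-D terms also die away like n^{-3/2}, in accordance with (4.49).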

CHAPTER 5

Markov Chains

5.1 Introduction

In our previous discussion of recurrent events in Chapter 3 we started off by envisaging a sequence of independent trials, each involving for example the tossing of a coin. From consideration of the outcome of each individual trial we passed to more complex patterns, such as the event "the accumulated numbers of heads and tails are equal". And in the formal definition of a recurrent event it was not necessary to assume that the basic trials were independent of one another; only that once the event in question had occurred the process started off again from scratch.

We now want to generalize this idea slightly so as to be able to take account of several possible outcomes at each stage, but shall introduce the restriction that the future probability behavior of the process is uniquely determined once the state of the system at the present stage is given. This restriction is the characteristic Markov property, and it is not so serious as might appear at first sight. It is equivalent, for example, to expressing the probability distribution of the size of a population tomorrow solely in terms of the size today, taking into account the effects of birth and death. In reality, of course, the population tomorrow is likely to be related not only to the size today, but also to the sizes on many previous days. Nevertheless, the proposed restriction frequently enables us to formulate models that are both mathematically tractable and useful as first approximations to a stochastic picture of reality. Moreover, as will appear in the sequel, there are several ways in which the more elaborate models can often be handled by methods appropriate to the restricted situation, such as, for example, concentrating attention on a series of elements having the Markov property.

For a more extensive account of the theory of Markov chains, see Feller (1957, Chapter 15); or the whole book by Kemeny and Snell (1960).

5.2 Notation and definitions

Before elaborating a mathematical discussion of the kind of process referred to in the last section, we must first introduce an adequate notation and definitions of the basic concepts.

Suppose we have a sequence of consecutive trials, numbered n = 0, 1, 2, ⋯. The outcome of the nth trial is represented by the random variable X_n, which we shall assume to be discrete and to take one of the values i = 1, 2, ⋯. The actual set of outcomes at any trial is a system of events E_i, i = 1, 2, ⋯, that are mutually exclusive and exhaustive: these are the states of the system, and they may be finite or infinite in number.

Thus in a simple series of coin-tossing trials, there are just two events: E_1 ≡ heads, say, and E_2 ≡ tails. For convenience we could label these events 1 and 2. On the other hand, in a series of random digits the events could be the actual numbers 0 to 9. And in a population of animals, the events might be the numbers of new births in each generation.

We next consider the probabilities of the various events. Let us write the absolute probability of outcome E_j at the nth trial as

    P{X_n = j} = p_j^{(n)},        (5.1)

so that the initial distribution is given by p_j^{(0)}.

If we have X_{n-1} = i and X_n = j, we say that the system has made a transition of type E_i → E_j at the nth trial or step. Accordingly, we shall want to know the probabilities of the various transitions that may occur. If the trials are not independent, this means that in general we have to specify

    P{X_n = j | X_{n-1} = i, X_{n-2} = h, ⋯, X_0 = a}.        (5.2)

With independent trials, the expressions in (5.1) and (5.2) are of course identical.

Now, as already indicated in the previous section, the Markov property means that the future behavior of a sequence of events is uniquely decided by a knowledge of the present state. In such a case the transition probabilities in (5.2) depend only on X_{n-1}, and not on the previous random variables. We can therefore define a Markov chain as a sequence of consecutive trials such that

    P{X_n = j | X_{n-1} = i, ⋯, X_0 = a} = P{X_n = j | X_{n-1} = i}.        (5.3)

(It is possible to define Markov chains of a higher order, in which the expression on the right of (5.3) involves some fixed number of stages prior to the nth greater than one. We shall not, however, pursue this generalization here.)

Now an important class of chains defined by the property in (5.3) is that for which the transition probabilities are independent of n. We then have a homogeneous Markov chain, for which

    P{X_n = j | X_{n-1} = i} = p_ij.        (5.4)

Note that the order of the subscripts in p_ij corresponds to the direction of the transition, i.e. i → j. We must have, of course,

    Σ_{j=1}^∞ p_ij = 1,        (5.5)

since, for any fixed i, the p_ij will form a probability distribution.

In typical applications we are likely to be given the initial distribution and the transition probabilities, and we want to determine the probability distribution for each random variable X_n. In particular, we may be interested in the limiting distribution of X_n as n → ∞, if this exists.

The transition probabilities are most conveniently handled in matrix form. Let us write, therefore, P = {p_ij}′, i.e.

    P = { p_11  p_21  p_31  ⋯
          p_12  p_22  p_32  ⋯
          p_13  p_23  p_33  ⋯
          ⋯⋯⋯⋯ }        (5.6)

This is the transition matrix. It will be of finite or infinite order, depending on the number of states involved. The elements will all be non-negative, and in virtue of (5.5) the columns all sum to unity: a matrix with the latter property is often called a stochastic matrix. (We could have used the alternative definition P = {p_ij}, instead of P = {p_ij}′, but the latter transposed form is algebraically a little more convenient.)

In order to determine the absolute probabilities at any stage, we shall need the idea of n-step transition probabilities, where n can now be greater than unity. Let us in fact write

    p_ij^{(n)} = P{X_{n+m} = j | X_m = i},        (5.7)

with p_ij^{(1)} = p_ij. We have written the n-step transition probability p_ij^{(n)} as being independent of m. This is in fact true for homogeneous chains, as we can easily show.

Suppose we write the absolute probabilities p_j^{(n)} at the nth stage as the column vector p^{(n)}, the initial distribution being p^{(0)}. Now the distribution at the first stage is given by

    p_j^{(1)} = Σ_{i=1}^∞ p_ij p_i^{(0)},        (5.8)

since the probability that the system is in state i initially is p_i^{(0)}, the probability of a transition from i to j is p_ij, and we must sum over all transitions leading to state j. In matrix terms, we can express (5.8) as

    p^{(1)} = P p^{(0)}.        (5.9)

Similarly,

    p^{(2)} = P p^{(1)} = P² p^{(0)},        (5.10)

and in general

    p^{(n)} = P^n p^{(0)}.        (5.11)

We also have

    {p_ij^{(n)}}′ = P^n.        (5.12)

The matrix P^n therefore gives the required set of n-step transition probabilities {p_ij^{(n)}}, and the derivation shows that, for homogeneous chains at least, this is independent of m.

Equation (5.11) is of basic importance. It shows how to calculate the absolute probabilities at any stage in terms of the initial distribution p^{(0)} and the transition matrix P. The main labor in actual calculations is of course the evaluation of the nth power of the matrix P. In principle we could always proceed step by step. But in practice there is a considerable advantage in using special methods of calculating P^n more directly. One such method is described in detail in Section 5.5.

5.3 Classification of states

A good deal of the practical importance of Markov chain theory attaches to the fact that the states can be classified in a very distinctive manner according to certain basic properties of the system. An outline of the main classification, together with some of the more important theorems involved, is given below.

If the state E_j can be arrived at from the state E_i in a finite number of steps with non-zero probability, we say that it can be reached; i.e. there is a number n > 0 such that p_ij^{(n)} > 0. Thus in an unrestricted random walk, for example, any point on the line can be reached from any other point.
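The relation p^{(n)} = P^n p^{(0)} of (5.11) is easily put to work. The sketch below uses a hypothetical two-state chain (the numerical entries are ours, not Bailey's) and the book's column convention, in which P[j][i] is the probability of the transition i → j and each column sums to unity:

```python
def step_distribution(P, p0, n):
    """Apply p^{(n)} = P p^{(n-1)} n times (eq. 5.11), column convention."""
    p = list(p0)
    for _ in range(n):
        p = [sum(P[j][i] * p[i] for i in range(len(p))) for j in range(len(P))]
    return p

# Hypothetical chain: state 0 = "dry", state 1 = "wet".
P = [[0.8, 0.4],
     [0.2, 0.6]]
p10 = step_distribution(P, [1.0, 0.0], 10)   # distribution after ten steps
```

Starting from state 0 with certainty, ten steps already bring the distribution very close to (2/3, 1/3), anticipating the stationary distribution of Section 5.4.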

But if there is an absorbing barrier, no other state can be reached once the barrier has been arrived at.

If every state in a Markov chain can be reached from every other state, the chain is said to be irreducible. Alternatively, we may have a set of states C, which are closed in the sense that it is impossible to reach any state outside C from any state of C itself by one-step transitions. More precisely, p_ij = 0 if E_i ∈ C and E_j ∉ C. In particular, if a single state E_k forms a closed set it is called an absorbing state, and p_kk = 1.

It follows from these definitions that in an irreducible Markov chain the set of all states constitutes a closed set, and no other set can be closed.

The importance of the concept of a closed set of states lies in the fact that we can envisage a sub-Markov chain defined on the closed set; and such a chain can be studied independently of the remaining states.

Example
In order to illustrate the items of classification introduced so far, let us consider a Markov chain discussed by Feller (1957, Chapter 14, Section 4) for which the transition matrix is given, at least in skeleton form, below. It is of course necessary to indicate only which elements are zero (shown by a dot) and which are positive (shown by a cross). Suppose we have a nine-state chain with transition matrix

    P = [nine-by-nine skeleton matrix of dots and crosses, equation (5.13); the positions of the crosses are analyzed below].

First we examine the columns of the transition matrix for columns containing a single positive element in the leading diagonal. The fifth column is precisely of this type. It is clear that the actual entry must be p_55 = 1, and that E_5 is accordingly an absorbing state.

Next, we look at columns with a single entry in some position other than on the leading diagonal. The third, fourth, sixth, and eighth columns are of this type, but we notice a reciprocal relation between the third and eighth columns, in that p_38 = 1 = p_83. The states E_3 and E_8 evidently form a closed set, but nothing special can be said about E_4 and E_6.

Now from E_1 we can go to E_4 and E_9 in a single step; from E_4 we return to E_1; and from E_9 we can move to E_4 or remain in E_9. The three states E_1, E_4 and E_9 therefore constitute another closed set.

A clearer picture is obtained by suitably relabeling the states, and the transition matrix now takes the form

    Q = [relabeled skeleton matrix of dots and crosses, equation (5.14)].

The closed sets (E′_1), (E′_2, E′_3) and (E′_4, E′_5, E′_6) are now evident at a glance. Thus we could consider the corresponding stochastic matrices independently of the rest of the array. Note that the last three states E′_7, E′_8 and E′_9 are not independent of the first six, although the latter are independent of the former.

Now it was indicated in Section 5.1 that we proposed to generalize our previous account of recurrent events in Chapter 3 to include situations with several possible outcomes at each stage. It should be immediately obvious that the definition of a Markov chain in the present chapter entails the following propositions. First, if we consider any arbitrary state E_j, and suppose that the system is initially in E_j, then "a return to E_j" is simply a recurrent event as discussed in Section 3.2. Secondly, if the system is initially in some other state E_i, "a passage to E_j" is a delayed recurrent event, as defined in Section 3.5. We can, therefore, make a direct application of the theory of recurrent events to deal with the present case of Markov chains.

Let us write f_j^{(n)} for the probability that, starting from state E_j, the first return to E_j occurs at precisely the nth step. In the notation of Section 3.2, we have

    f_j^{(n)} ≡ f_n,   p_jj^{(n)} ≡ u_n.        (5.15)
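The bookkeeping behind this example — finding which communicating classes of states are closed — can be mechanized. In the sketch below the arc structure of the relabeled chain Q is transcribed from the verbal description above (the actual probabilities are immaterial and the arcs are our reading of the text, not Bailey's matrix); the closure test is exactly the definition p_ij = 0 for E_i in C, E_j outside C:

```python
def reachability(adj):
    """Boolean transitive closure: reach[i][j] means state j can be reached
    from state i in one or more steps (cf. Section 5.3)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def closed_classes(adj):
    """Communicating classes from which no arc leads outside the class."""
    n = len(adj)
    reach = reachability(adj)
    seen, closed = set(), []
    for i in range(n):
        if i in seen:
            continue
        cls = {i} | {j for j in range(n) if reach[i][j] and reach[j][i]}
        seen |= cls
        if all(v in cls for u in cls for v in range(n) if adj[u][v]):
            closed.append(sorted(cls))
    return closed

# States 0..8 stand for E'1..E'9; each pair marks a positive one-step arc.
arcs = {(0, 0),                           # E'1 absorbing
        (1, 2), (2, 1),                   # E'2 <-> E'3
        (3, 4), (4, 5), (5, 3), (3, 3),   # E'4, E'5, E'6 closed
        (6, 0), (6, 1), (6, 3),           # E'7 can enter each closed set
        (7, 7), (7, 6),                   # E'8 sooner or later reaches E'7
        (8, 6)}                           # E'9 goes to E'7
adj = [[(i, j) in arcs for j in range(9)] for i in range(9)]
```

Running closed_classes(adj) recovers the three closed sets {E′_1}, {E′_2, E′_3} and {E′_4, E′_5, E′_6}; the remaining states fall in classes that are not closed.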

Formula (3.4) now takes the form

    p_jj^{(n)} = Σ_{m=1}^n f_j^{(m)} p_jj^{(n-m)}, n ≥ 1,        (5.16)

where, conventionally, p_jj^{(0)} = 1. If we put (5.16) in the form

    f_j^{(n)} = p_jj^{(n)} − Σ_{m=1}^{n-1} f_j^{(m)} p_jj^{(n-m)}, n ≥ 1,        (5.17)

then it is clear that we have a set of recurrence relations giving the recurrence-time distribution {f_j^{(n)}} in terms of the p_jj^{(n)}.

In particular, we can define the probability that the system returns at least once to E_j as f_j, where

    f_j = Σ_{n=1}^∞ f_j^{(n)}.        (5.18)

We can therefore call the state E_j persistent if f_j = 1, with mean recurrence time

    μ_j = Σ_{n=1}^∞ n f_j^{(n)},        (5.19)

and transient if f_j < 1, in which case we have μ_j = ∞. From Theorem 3.2, it follows that a necessary and sufficient condition for E_j to be persistent is that Σ_n p_jj^{(n)} should diverge, though this is often hard to apply.

If the state E_j is persistent, but μ_j = ∞, we say that E_j is a null state.

We must also consider the possibility of a periodicity in the return to a given state. Thus the definition at the end of Section 3.2 can be applied, and we say that the state E_j is periodic with period t > 1, if E_j can occur only at stages numbered t, 2t, 3t, ⋯. When t = 1, we say that E_j is aperiodic. If E_j is persistent, aperiodic, and not null, it is said to be ergodic (i.e. f_j = 1, μ_j < ∞, t = 1).

Now let T be the set of transient states. The remaining persistent states can be divided into mutually disjoint closed sets C_1, C_2, ⋯, such that from any state of one of these sets all states of this set, and no others, can be reached. Moreover, states in C_n may be reached from T, but not conversely. It can be shown that all states of a closed set C_n must have the same period, and so we can speak of the period of C_n.

An important result in the classification of Markov chains is that all states of an irreducible chain are of the same type: that is, they are all transient; all persistent and null; or all persistent and non-null. Moreover, in each case all states have the same period.

We now consider for a moment the previous illustration of a Markov chain whose transition matrix Q is given by (5.14). Dropping primes for convenience, we have E_1 as an absorbing state, and therefore persistent. From E_2 the system must pass to E_3, and then back to E_2. The states E_2 and E_3 are thus persistent, and are periodic with period 2. We can also see that states E_4, E_5 and E_6 form a closed subset which is persistent and aperiodic. So far as the last three states are concerned, we first note that from E_7 each of the three closed sets can be reached, and on entering one of these sets the system stays there. There is thus a non-zero chance of no return, and E_7 is accordingly transient. Similarly, from E_9 we go to E_7, with no possible return to E_9. So the latter is also transient. Finally, from E_8 the system arrives sooner or later at E_7, never to return. Hence E_7, E_8, and E_9 are all transient states.

Having given the main classification required for individual states, based on the definitions of recurrent events introduced in Section 3.2, we next examine the delayed type of recurrent event which occurs when we consider a passage to E_j from some other state E_i. Let us write f_ij^{(n)} for the probability that, starting from state E_i, the first passage to E_j occurs at precisely the nth step. In the notation of Section 3.2 we now have

    f_ij^{(n)} ≡ f_n,        (5.20)

and the analog of equation (5.16) is

    p_ij^{(n)} = Σ_{m=1}^n f_ij^{(m)} p_jj^{(n-m)}, n ≥ 1,        (5.21)

which may be put in the form

    f_ij^{(n)} = p_ij^{(n)} − Σ_{m=1}^{n-1} f_ij^{(m)} p_jj^{(n-m)}, n ≥ 1,        (5.22)

corresponding to (5.17). The probability that, starting from E_i, the system ever reaches E_j, is f_ij, where

    f_ij = Σ_{n=1}^∞ f_ij^{(n)},        (5.23)

which is the appropriate extension of (5.18).

We are now in a position to state some results of fundamental importance to the theory of Markov chains. These are obtained directly from Theorems 3.2, 3.3, and 3.4, suitably interpreted in the present context.

Theorem 5.1
If E_j is persistent and aperiodic, with mean recurrence time μ_j, then

    p_jj^{(n)} → 1/μ_j as n → ∞,        (5.24)

and in particular, if E_j is also null, i.e. μ_j = ∞,

    p_jj^{(n)} → 0 as n → ∞.
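Relations (5.16)-(5.19) are directly computable. The sketch below (a hypothetical two-state chain in the column convention; the entries are ours, not Bailey's) obtains p_jj^{(n)} by repeated multiplication, inverts (5.17) to get the first-return distribution, and checks that f_j = 1 and that p_jj^{(n)} approaches 1/μ_j as in Theorem 5.1:

```python
def return_probs(P, j, N):
    """p_jj^{(n)} for n = 0..N, starting the chain in state j (column
    convention: P[r][i] is the transition probability i -> r)."""
    size = len(P)
    col = [1.0 if i == j else 0.0 for i in range(size)]
    out = [1.0]                       # p_jj^{(0)} = 1 by convention
    for _ in range(N):
        col = [sum(P[r][i] * col[i] for i in range(size)) for r in range(size)]
        out.append(col[j])
    return out

def first_return_probs(pjj):
    """Invert eq. (5.17): f_j^{(n)} = p_jj^{(n)} - sum_{m<n} f_j^{(m)} p_jj^{(n-m)}."""
    N = len(pjj) - 1
    f = [0.0] * (N + 1)
    for n in range(1, N + 1):
        f[n] = pjj[n] - sum(f[m] * pjj[n - m] for m in range(1, n))
    return f

P = [[0.8, 0.4],
     [0.2, 0.6]]
pjj = return_probs(P, 0, 60)
f = first_return_probs(pjj)
mu = sum(n * f[n] for n in range(len(f)))
```

Here f sums to one (state 0 is persistent) and μ_0 = 1.5, the reciprocal of the limiting value p_00^{(n)} → 2/3.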

5.4 Classification of chains

A probability distribution p satisfying

    p = Pp,        (5.28)

with Σ p_i = 1 and p_i ≥ 0, is a stationary distribution in the sense that if it is chosen for an initial distribution, all subsequent distributions p^{(n)} will also be identical with p.

Another valuable theorem is that if a Markov chain is ergodic, then the limiting distribution, lim_{n→∞} p^{(n)}, is stationary; this is the only stationary distribution, and it is obtained by solving equation (5.28) above.

Two further theorems are worth quoting in respect of the existence of ergodicity:

Theorem 5.4
All states of a finite, aperiodic, irreducible Markov chain are ergodic, and hence the chain itself is ergodic.

Theorem 5.5
An irreducible and aperiodic Markov chain is ergodic if we can find a non-null solution of

    x = Px,

with Σ |x_i| < ∞ (this is Foster's theorem).

As indicated at the end of Section 5.2, a problem of fundamental practical importance in the handling of Markov chains is the evaluation of the matrix P^n, which gives the whole set of n-step transition probabilities. Because of its importance in actually constructing the required solution, it is worth giving the appropriate mathematical derivation in some detail.

Let us begin by supposing that we have a Markov chain whose eigenvalues λ, given by the characteristic equation

    |P − λI| = 0,        (5.29)

are all distinct. Now the equations