Stochastic Chemical Kinetics
A Special Course on Probability and
Stochastic Processes for Physicists,
Chemists, and Biologists
Summer Term 2011
Version of July 7, 2011
Peter Schuster
Institut für Theoretische Chemie der Universität Wien
Währingerstraße 17, A-1090 Wien, Austria
Phone: +43 1 4277 527 36, Fax: +43 1 4277 527 93
E-Mail: pks@tbi.univie.ac.at
Internet: http://www.tbi.univie.ac.at/~pks
Preface
The current text contains notes that were prepared first for a course on
‘Stochastic Chemical Kinetics’ held at Vienna University in the winter term
1999/2000 and repeated in the winter term 2006/2007. The current version
refers to the summer term 2011 but no claim is made that it is free of errors.
The course is addressed to students of chemistry, biochemistry, molecular
biology, mathematical and theoretical biology, bioinformatics, and systems
biology with particular interests in phenomena observed at small particle
numbers. Stochastic kinetics is an interdisciplinary subject and hence the
course will contain elements from various disciplines, mainly from probability
theory, mathematical statistics, stochastic processes, chemical kinetics, evo-
lutionary biology, and computer science. Considerable usage of mathematical
language and analytical tools is indispensable, but we have consciously
avoided dwelling on deeper and more formal mathematical topics.
This series of lectures will concentrate on principles rather than technical
details. At the same time it will be necessary to elaborate tools that allow
us to treat real problems. Analytical results on stochastic processes are rare
and thus it will be unavoidable to deal also with approximation methods and
numerical techniques that are able to produce (approximate) results through
computer calculations (see, for example, the articles [1–4]). The applicability
of simulations to real problems depends critically on the population sizes
that can be handled. Present day computers can deal with 10⁶ to 10⁷ particles,
which is commonly not enough for chemical reactions but sufficient for
most biological problems; accordingly, the sections dealing with practical
examples will contain more biological than chemical problems.
The major goal of this text is to spare the audience the distraction of taking
notes and to facilitate the understanding of subjects that are, at least in
parts, quite sophisticated. At the same time the text allows for a repetition
of the major issues of the course. Accordingly, an attempt was made to prepare a
useful and comprehensive list of references. Studying the literature in detail
is recommended to every serious scholar who wants to progress towards a
deeper understanding of this rather demanding discipline. Apart from a
respectable number of publications mentioned during the progress of the
course the following books were used in the preparation [5–9], the German
textbooks [10, 11], elementary texts in English [12, 13] and in German [14]. In
addition, we mention several other books on probability theory and stochastic
processes [15–26]. More references will be given in the chapters on chemical
and biological applications of stochastic processes.
Peter Schuster Wien, March 2011.
1. History and Classical Probability
An experimentalist reproduces an experiment. What can he expect to find?
There are certainly limits to precision and these limits confine the repro-
ducibility of experiments and at the same time restrict the predictability
of outcomes. The limitations of correct predictions are commonplace: We
witness them every day by watching the failures of various forecasts from
the weather to the stock market. Daily experience also tells us that there
is an enormous variability in the sensitivity of events with respect to preci-
sion of observation. It ranges from the highly sensitive and hard to predict
phenomena like the ones just mentioned to the enormous accuracy of astro-
nomical predictions, for example, the precise dating of the eclipse of the sun
in Europe on August 11, 1999. Most cases lie between these two extremes
and careful analysis of unavoidable randomness becomes important. In this
series of lectures we are heading for a formalism, which allows for proper ac-
counting of the limitations of the deterministic approach and which extends
the conventional description by differential equations.
In order to be able to study reproducibility and sensitivity of prediction
by means of a scientific approach we require a solid mathematical basis. An
appropriate conceptual frame is provided by the theory of probability and
stochastic processes. Conventional or deterministic variables have to be re-
placed by random variables which fluctuate in the sense that different values
are obtained in consecutive measurements under (within the limits of control)
identical conditions. The solutions of differential equations, commonly used
to describe time dependent phenomena, will not be sufficient and we shall
search for proper formalisms to model stochastic processes. Fluctuations play
a central role in the stochastic description of processes. The search for the
origin of fluctuations is an old topic of physics, which still is of high current
interest. Some examples will be mentioned in the next section 1.1. Recently,
the theory of fluctuations became important because of the progress in spec-
troscopic techniques, particularly in fluorescence spectroscopy. Fluctuations
became directly measurable in fluorescence correlation spectroscopy [27–29].
Even single molecules can be efficiently detected, identified, and analyzed by
means of these new techniques [30, 31].
In this series of lectures we shall adopt a phenomenological approach
to sources of randomness or fluctuations. Thus, we shall not search here
for the various mechanisms setting the limits to reproducibility (except for
a short account in the next section), but we will develop a mathematical
technique that allows us to handle stochastic problems. In the philosophy of
such a phenomenological approach the size of fluctuations, for example, is
given through external parameters that can be provided by means of a deeper
or more basic theory or derived from experiments. This course starts with a
very brief primer of current probability theory (chapter 2), which is largely
based on an undergraduate text by Kai Lai Chung [5] and an introduction to
stochasticity by Hans-Otto Georgii [11]. Then, we shall be concerned with
the general formalism to describe stochastic processes (chapter 3, [6]) and
present several analytical as well as numerical tools to derive or compute
solutions. The following two chapters 4 and 5 deal with applications to
selected problems in chemistry and biology.
1.1 Precision limits and fluctuations
Conventional chemical kinetics handles ensembles of molecules with large
numbers of particles, N ≈ 10²⁰ and more. Under many conditions¹ random
fluctuations of particle numbers are proportional to √N. Dealing with 10⁻⁴
moles, tantamount to N ≈ 10²⁰ particles, natural fluctuations typically involve
√N = 10¹⁰ particles and thus are in the range of ±10⁻¹⁰·N. Under these
conditions the detection of fluctuations would require a precision of the
order of one part in 10¹⁰, which is (almost always) impossible to achieve.²
Accordingly, the chemist uses concentrations rather than particle numbers,
c = N/(N_L·V), wherein N_L = 6.022×10²³ mol⁻¹ and V are Avogadro's number
and the volume (in dm³), respectively. Conventional chemical kinetics considers
concentrations as continuous variables and applies deterministic methods,
in essence differential equations, for the modeling and analysis of reactions.
Thereby, it is implicitly assumed that particle numbers are sufficiently large
that the limit of infinite particle numbers, in which fluctuations are
neglected, is fulfilled.
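The √N law is easy to check numerically. The following minimal Python sketch – an illustration added to these notes, not part of the original text, assuming NumPy is available – samples Poisson-distributed particle numbers, whose standard deviation is exactly √N, and shows how the relative fluctuations shrink as N grows:

    import numpy as np

    rng = np.random.default_rng(1)
    for n_mean in (1e2, 1e4, 1e6):
        # Poisson-distributed particle numbers: standard deviation = sqrt(N)
        samples = rng.poisson(n_mean, size=10_000)
        rel = samples.std() / samples.mean()
        print(f"N = {n_mean:.0e}: relative fluctuation {rel:.2e},"
              f" expected 1/sqrt(N) = {1 / n_mean**0.5:.2e}")

For N = 10²⁰ the same scaling predicts relative fluctuations of 10⁻¹⁰, far below any realistic detection limit.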
In 1827 the British botanist Robert Brown detected and analyzed ir-
regular motions of particles in aqueous suspensions that turned out to be
independent of the nature of the suspended materials – pollen grains, fine
particles of glass or minerals [32]. Although Brown himself had already
demonstrated that Brownian motion is not caused by some (mysterious)
biological effect, its origin remained kind of a riddle until Albert Einstein [33]
and, independently, Marian von Smoluchowski [34], published a satisfactory
explanation in 1905 which contained two main points:
(i) The motion is caused by highly frequent impacts on the pollen grain of
the steadily moving molecules in the liquid in which it is suspended.
(ii) The motion of the molecules in the liquid is so complicated in detail
that its effect on the pollen grain can only be described probabilistically
¹ Computation of fluctuations and their time course will be the subject of this course.
Here we mention only that the √N law is always fulfilled in the approach towards
equilibrium and towards stable stationary states.
² Most techniques of analytical chemistry meet serious difficulties when accuracies in
particle numbers of 10⁻⁴ or higher are required.
in terms of frequent statistically independent impacts.
In particular, Einstein showed that the number of particles per unit volume,
f(x, t),³ fulfils the already known differential equation of diffusion,

    ∂f/∂t = D ∂²f/∂x²   with the solution   f(x, t) = (N / √(4πDt)) exp(−x²/(4Dt)) ,
where N is the total number of particles. From the solution of the diffusion
equation Einstein computed the square root of the mean square displacement,
λ_x, which the particle experiences in x-direction:

    λ_x = √⟨x²⟩ = √(2Dt) .
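As a numerical aside (not part of the original text), Einstein's result λ_x = √(2Dt) can be verified by simulating an ensemble of one-dimensional random walks with Gaussian increments; the sketch below assumes NumPy and arbitrary illustrative values for D and the time step:

    import numpy as np

    rng = np.random.default_rng(2)
    D, dt, n_steps, n_particles = 1.0, 0.01, 1000, 5000

    # independent Gaussian increments with variance 2*D*dt per step
    steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_particles, n_steps))
    x = steps.cumsum(axis=1)                    # trajectories x(t)

    t_final = dt * n_steps
    lambda_x = np.sqrt((x[:, -1] ** 2).mean())  # root mean square displacement
    print(lambda_x, np.sqrt(2 * D * t_final))   # the two numbers nearly coincide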
Einstein’s treatment is based on discrete time steps and thus contains an
approximation – that is well justified – but it represents the first analysis
based on a probabilistic concept of a process that is comparable to the current
theories and we may consider Einstein’s paper as the beginning of stochastic
modeling. Brownian motion was indeed the first completely random process
that became accessible to a description that was satisfactory by the standards
of classical physics. Thermal motion as such had been used previously as the
irregular driving force causing collisions of molecules in gases by James Clerk
Maxwell and Ludwig Boltzmann. The physicists in the second half of the
nineteenth century, however, were concerned with molecular motion only as
it is required to describe systems in the thermodynamic limit. They derived
the desired results by means of global averaging statistics.
Thermal motion as an uncontrollable source of random fluctuation has
been complemented by quantum mechanical uncertainty as another limita-
tion of achievable precision. For the purpose of this course the sensitivity of
processes to small (and uncontrolled) changes in initial conditions, however,
is of more relevance than the consequences of uncertainty. Analysis of com-
plex dynamical systems was initiated in essence by Edward Lorenz [35] who
detected through computer integration of differential equations what is nowa-
days called deterministic chaos. Complex dynamics in physics and chemistry
³ For the sake of simplicity we consider only motion in one spatial direction, x.
has been known already earlier, as the works of the French mathematician
Henri Poincaré and the German chemist Wilhelm Ostwald demonstrate. New
in the second half of the twentieth century were not the ideas but the tools to study
complex dynamics. A previously unknown power in the analysis by numerical
computation became available through easy access to electronic computers.
These studies have shown that the majority of dynamical systems modeled
by nonlinear differential equations show irregular – that means non-periodic
– oscillations for certain ranges of parameter values. In these chaotic regimes
solution curves were found to be extremely sensitive to small changes in the
initial conditions. Solution curves which are almost identical at the beginning
deviate exponentially from each other. Limitations in the control of initial
conditions, which are inevitable because of the natural limits to achievable
precision, result in upper bounds of the time spans for which the dynamics of
the system can be predicted with sufficient accuracy. It is not accidental that
Lorenz detected chaotic dynamics first in equations for atmospheric motions
which are thought to be so complex that forecast is inevitably limited to
rather short times.
Fluctuations play an important role in highly sensitive dynamical sys-
tems. Commonly fluctuations increase with time and any description of such
a system will be incomplete when it does not consider their development
in time. Thermal fluctuations are highly relevant at low concentration of
one, two or more reaction partners or intermediates and such situations oc-
cur almost regularly in oscillating or chaotic systems. An excellent and well
studied example in chemistry is the famous Belousov-Zhabotinskii reaction.
In biology, on the other hand, we encounter regularly situations that are
driven by fluctuations. Every mutation leading to a new variant produces
a single individual at first. Whether or not the mutant will be amplified to
the population level depends on both the properties of the new individual and
on events that are completely governed by chance.
Both phenomena, quantum mechanical uncertainty and the sensitivity of
complex dynamics, put an ultimate end to the deterministic view of the
world. Quantum mechanics sets a limit in principle to determinism, one that
commonly becomes evident only in the world of atoms and molecules. Limited
predictability of complex dynamics is more of a practical nature: Although
the differential equations used to describe and analyze chaos are still deter-
ministic, initial conditions of a precision that can never be achieved in reality
would be required for correct long-time predictions.
1.2 Thinking in terms of probability
The concept of probability originated from the desire to analyze gambling
by rigorous mathematical thoughts. An early study that has largely re-
mained unnoticed but contained already the basic ideas of probability was
done in the sixteenth century by the Italian mathematician Gerolamo Car-
dano. Commonly, the beginning of classical probability theory is attributed
to the French mathematician Blaise Pascal who wrote in the middle of the
seventeenth century – 100 years later – several letters to Pierre de Fermat. The
most famous of these letters, dated July 29, 1654, reports the careful observation
of a professional gambler, the Chevalier de Méré. The Chevalier had observed
that obtaining at least one "six" with one die in 4 throws is successful in more
than 50% of cases, whereas obtaining at least one double six with two dice in
24 throws has a less than 50% chance to win. He considered this finding
a paradox because he calculated naïvely and erroneously that the chances
should be the same:
    4 throws with one die yield     4 × 1/6 = 2/3 ,

    24 throws with two dice yield   24 × 1/36 = 2/3 .
Blaise Pascal became interested in the problem and calculated correctly the
probability as we do it now in classical probability theory by careful counting
of events:
    probability = Prob = (number of favorable events) / (total number of events) .   (1.1)
Probability according to equation (1.1) is always a quantity between
zero and one, 0 ≤ Prob ≤ 1. The sum of the probabilities that an event
has occurred or did not occur thus always equals one. Sometimes, as in
Pascal’s example, it is easier to calculate the probability of the unfavorable
case, q, and to obtain the desired probability as p = 1 − q. In the one-
die example the probability not to throw a “six” is 5/6, in the two-dice
example we have 35/36 as the probability of failure. In the case of independent
events probabilities are multiplied⁴ and we finally obtain for 4 and 24 trials,
respectively:
    q(1) = (5/6)⁴    and   p(1) = 1 − (5/6)⁴ = 0.5177 ,

    q(2) = (35/36)²⁴  and   p(2) = 1 − (35/36)²⁴ = 0.4914 .
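Pascal's counting argument is easily reproduced by exact rational arithmetic; the following fragment (an added illustration using Python's standard library) recovers the two probabilities:

    from fractions import Fraction

    p1 = 1 - Fraction(5, 6) ** 4     # at least one six in 4 throws of one die
    p2 = 1 - Fraction(35, 36) ** 24  # at least one double six in 24 double throws
    print(float(p1))                 # 0.5177...
    print(float(p2))                 # 0.4914...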
It is remarkable that the gambler could observe this rather small difference
in the probability of success – he must have tried the game very often indeed!
Statistics in biology has been pioneered by the Augustinian monk Gregor
Mendel. In table 1.1 we list the results of two typical experiments distin-
guishing roundish or wrinkled seeds with yellow or green color. The ratios
observed for single plants show large scatter. In the mean values for ten
plants some averaging has occurred but still the deviations from the ideal
values are substantial. Mendel carefully investigated several hundred plants
and then the statistical law of inheritance demanding a ratio of 3:1 became
evident [36]. Ronald Fisher in a rather polemic publication [37] reanalyzed
Mendel's experiments, questioned Mendel's statistics, and accused him of
intentionally manipulating his data because the results are too close to the
ideal ratio. Fisher's publication initiated a long-lasting debate during which
the majority of scientists spoke up in favor of Mendel, until a recent book (2008)
declared the end of the Mendel–Fisher controversy [38]. In chapter 5 we shall
discuss statistical laws and Mendel’s statistics in the light of present day
mathematical statistics.
The third example we mention here can be used to demonstrate the usual
weakness of people in estimating probabilities. Let your friends guess –
without calculating – how many persons you need in a group such that there
is a fifty percent chance that at least two of them celebrate their birthday on
the same day. You will be surprised by the oddness of the answers!
⁴ We shall come back to the problem of independent events later when we introduce
current probability theory in section 2, which is based on set theory.
Table 1.1: Statistics of Gregor Mendel's experiments with the garden
pea (Pisum sativum). The results of two typical experiments with ten plants are
shown. In total Mendel analyzed 7324 seeds from 253 hybrid plants in the second
trial year; 5474 were round or roundish and 1850 angular wrinkled, yielding a ratio
of 2.96:1. The color was recorded for 8023 seeds from 258 plants, out of which 6022
were yellow and 2001 were green, with a ratio of 3.01:1.

              Form of seeds                 Color of seeds
    plant   round  angular  ratio       yellow  green  ratio
      1       45     12      3.75         25     11     2.27
      2       27      8      3.38         32      7     4.57
      3       24      7      3.43         14      5     2.80
      4       19     10      1.90         70     27     2.59
      5       32     11      2.91         24     13     1.85
      6       26      6      4.33         20      6     3.33
      7       88     24      3.67         32     13     2.46
      8       22     10      2.20         44      9     4.89
      9       28      6      4.67         50     14     3.57
     10       25      7      3.57         44     18     2.44
    total    336    101      3.33        355    123     2.89
With our knowledge of the gambling problem the probability is easy to calculate.
First we compute the negative event: all persons celebrate their birthdays on
different days of the year – 365 days, no leap year – and find for n people in
the group:⁵
    q = (365/365) · (364/365) · (363/365) · … · ((365 − (n−1))/365)   and   p = 1 − q .
⁵ The expression is obtained by the argument that the first person can choose his
birthday freely. The second person must not choose the same day and so he has 364
possible choices. For the third 363 choices remain and the nth person, ultimately, has
365 − (n−1) possibilities.
Figure 1.1: The birthday puzzle. The curve shows the probability p(n) that at
least two persons in a group of n people celebrate their birthday on the same day
of the year.
The function p(n) is shown in figure 1.1. For the above mentioned 50%
chance we need only 23 persons; with 41 people we already have a more than
90% chance that two celebrate their birthday on the same day; 57 yield more
than 99%, and 70 persons exceed 99.9%.
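The product formula for q is readily evaluated; a small Python function, added here for illustration, computes p(n) for any group size:

    def p_shared_birthday(n: int) -> float:
        """Probability that at least two of n people share a birthday (365 days)."""
        q = 1.0
        for k in range(n):
            q *= (365 - k) / 365
        return 1.0 - q

    for n in (23, 41, 57, 70):
        print(n, round(p_shared_birthday(n), 4))  # 0.5073, 0.9032, 0.9901, 0.9992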
The fourth and final example deals again with counterintuitive probabilities:
the coin toss game Penney Ante invented by Walter Penney [39].
Before a sufficiently long sequence of heads and tails is determined by flipping,
each of two players chooses a sequence of n consecutive flips – commonly
n = 3 is applied, and this leaves the choice of the eight triples shown in
table 1.2. The second player has the advantage of knowing the choice of the
first player. Then a sufficiently long sequence of coin flips is recorded until
one of the two chosen triples appears in the sequence; the player whose
sequence appeared first has won. The advantage of the second player
is commonly largely underestimated when guessed without calculation. A
simple argument illustrates the disadvantage of player 1: Assume he had
chosen '111'. If the second player chooses a triple starting with '0', the only
chances for player 1 to win are expressed by the sequences beginning '111…',
and these have a probability of p = 1/8, leading to odds of 7 to 1 for player 2.
Eventually, we mention the optimal strategy for player 2: Take the first two
digits of the three-bit sequence of player 1 and precede them with the opposite
of the symbol in the middle of the triple (the shifted pair is shown in red,
the switched symbol in bold in table 1.2).
Table 1.2: Advantage of the second player in Penney's game. Two players
choose two triples of digits one after the other, player 2 after player 1. Coins are
flipped until one of the two triples appears; the player whose triple came first has
won. An optimally gambling player 2 (column 2) has the advantage shown in
column 3. Code: 1 = head and 0 = tail. The optimal strategy for player 2 is
encoded by color and boldface (see text).

    Player 1   Player 2   Odds in favor of player 2
      111        011             7 to 1
      110        011             3 to 1
      101        110             2 to 1
      100        110             2 to 1
      011        001             2 to 1
      010        001             2 to 1
      001        100             3 to 1
      000        100             7 to 1
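The quoted odds can be checked by direct simulation. The sketch below – added here for illustration, with an arbitrary random seed – plays Penney's game repeatedly for the worst choice of player 1:

    import random

    def winner(triple1: str, triple2: str, rng: random.Random) -> int:
        seq = ""
        while True:                    # flip a fair coin until a triple appears
            seq += rng.choice("01")
            if seq.endswith(triple1):
                return 1
            if seq.endswith(triple2):
                return 2

    rng = random.Random(3)
    trials = 100_000
    wins2 = sum(winner("111", "011", rng) == 2 for _ in range(trials))
    print(wins2 / trials)              # close to 7/8, i.e. odds of 7 to 1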
Probability theory in its classical form is more than 300 years old. Not
accidentally the concept arose in thinking about gambling, which was con-
sidered as a domain of chance in contrast to rigorous science. It took indeed
a rather long time before the concept of probability entered scientific thought
in the nineteenth century. The main obstacle for the acceptance of probabilities
in physics was the strong belief in determinism that was not overcome
before the advent of quantum theory. Probabilistic concepts in nineteenth
century physics were still based on deterministic thinking, although
the details of individual events were considered to be too numerous and too
complex to be accessible to calculation. It is worth mentioning that thinking
in terms of probabilities entered biology earlier, already in the second
half of the nineteenth century, through the reported works of Gregor Mendel
on genetic inheritance. The reason for this difference appears to lie in the
very nature of biology: small sample sizes are typical, most of the regular-
ities are probabilistic and become observable only through the application
of probability theory. Ironically, Mendel’s investigations and papers did not
attract a broad scientific audience before their rediscovery at the beginning
of the twentieth century. The scientific community in the second half of
the nineteenth century was simply not yet prepared for the acceptance of
probabilistic concepts.
Although classical probability theory can be applied successfully to a
great variety of problems, a more elaborate notion of probability that is
derived from set theory is advantageous and absolutely necessary for extrap-
olation to infinitely large sample size. Here we shall use the latter concept
because it is easily extended to probability measures on continuous variables
where numbers of sample points are not only infinite but also uncountable.
2. Probabilities, Random Variables, and
Densities
The development of set theory initiated by Georg Cantor and Richard Dedekind
in the 1870s provided a possibility to build the concept of probability
on a firm basis that allows for an extension to certain families of
uncountable samples as they occur, for example, with continuous variables.
Present day probability theory thus can be understood as a convenient ex-
tension of the classical concept by means of set and measure theory. We start
by repeating a few indispensable notions and operations of set theory.
2.1 Sets and sample spaces
Sets are collections of objects with two restrictions: (i) each object belongs
to one set and cannot be a member of more than one set, and (ii) a member of
a set must not appear twice or more often. In other words, objects are assigned
to sets unambiguously. In application to probability theory we shall denote the
elementary objects by the small Greek letter omega, ω – if necessary with
various sub- and superscripts – and call them sample points or individual
results; the collection of all objects ω under consideration, the sample
space, is denoted by Ω with ω ∈ Ω. Events, A, are subsets of sample points
that fulfil some condition¹

    A = {ω, ωₖ ∈ Ω : f(ω) = c} ,     (2.1)

with ω = (ω₁, ω₂, …) being some set of individual results and f(ω) = c
encapsulating the condition imposed on the ensemble of sample points ωₖ.
¹ What a condition means will become clear later. For the moment it is sufficient to
understand a condition as a function providing a restriction, which implies that not all
subsets of sample points belong to A.
Any partial collection of points is a subset of Ω. We shall be dealing
with fixed Ω and, for simplicity, often call these subsets of Ω just sets. There
are two extreme cases, the entire sample space Ω and the empty set, ∅.
The number of points in some set S is called its size, |S|, and thus is a
nonnegative integer or ∞. In particular, the size of the empty set is |∅| = 0.
The unambiguous assignment of points to sets can be expressed by²

    ω ∈ S   exclusive or   ω ∉ S .
Consider two sets A and B. If every point of A belongs to B, then A is
contained in B. A is a subset of B and B is a superset of A:
A ⊂ B and B ⊃ A .
Two sets are identical if they contain exactly the same points, and then we
write A = B. In other words, A = B iff (if and only if) A ⊂ B and B ⊂ A.
The basic operations with sets are illustrated in figure 2.1. We briefly repeat
them here:
Complement. The complement of the set A is denoted by Aᶜ and consists
of all points not belonging to A:³

    Aᶜ = {ω | ω ∉ A} .     (2.2)
There are three evident relations which can be verified easily: (Aᶜ)ᶜ = A,
Ωᶜ = ∅, and ∅ᶜ = Ω.
Union. The union of the two sets A and B, A∪B, is the set of points which
belong to at least one of the two sets:

    A ∪ B = {ω | ω ∈ A or ω ∈ B} .     (2.3)
² In order to be unambiguously clear we shall write or for and/or and exclusive or for
or in the strict sense.
³ Since we are considering only fixed sample sets Ω these points are uniquely defined.
Figure 2.1: Some definitions and examples from set theory. Part a shows
the complement Aᶜ of a set A in the sample space Ω. In part b we explain the
two basic operations union and intersection, A ∪ B and A ∩ B, respectively. Parts
c and d show the set-theoretic difference, A \ B and B \ A, and the symmetric
difference, A∆B. In parts e and f we demonstrate that a vanishing intersection of
three sets does not imply pairwise disjoint sets.
Intersection. The intersection of the two sets A and B, A ∩ B, is the set
of points which belong to both sets (for short, A ∩ B is sometimes written
AB):

    A ∩ B = {ω | ω ∈ A and ω ∈ B} .     (2.4)
Unions and intersections can be executed in sequence and are also defined
for more than two sets, or even for an infinite number of sets:
⋃
n=1,...
An = A1 ∪ A2 ∪ · · · = ω|ω ∈ An for at least one value of n ,
⋂
n=1,...
An = A1 ∩ A2 ∩ · · · = ω|ω ∈ An for all values of n .
These relations are true because the commutative and the associative laws
are fulfilled by both operations, intersection and union:
A ∪ B = B ∪A , A ∩B = B ∩ A ;
(A ∪ B) ∪ C = A ∪ (B ∪ C) , (A ∩B) ∩ C = A ∩ (B ∩ C) .
Difference. The set A \ B is the set of points which belong to A but not
to B:

    A \ B = A ∩ Bᶜ = {ω | ω ∈ A and ω ∉ B} .     (2.5)

In case A ⊃ B we write A − B for A \ B and have A \ B = A − (A ∩ B) as
well as Aᶜ = Ω − A.
Symmetric difference. The symmetric difference A∆B is the set of points
which belong exactly to one of the two sets A and B. It is used in advanced
set theory and is symmetric as it fulfils the commutative law, A∆B = B∆A:

    A∆B = (A ∩ Bᶜ) ∪ (Aᶜ ∩ B) = (A \ B) ∪ (B \ A) .     (2.6)
Disjoint sets. Disjoint sets A and B have no points in common and hence
their intersection, A ∩ B, is empty. They fulfil the following relations:

    A ∩ B = ∅ , A ⊂ Bᶜ and B ⊂ Aᶜ .     (2.7)

A number of sets are disjoint only if they are pairwise disjoint. For three sets,
A, B and C, this requires A ∩ B = ∅, B ∩ C = ∅, and C ∩ A = ∅. When
two sets are disjoint the addition symbol is (sometimes) used for the union,
A + B for A ∪ B. Clearly we always have the decomposition Ω = A + Aᶜ.
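All operations introduced above have direct counterparts in the built-in set type of Python, which can serve as a quick sandbox for the definitions (an added illustration with arbitrary example sets):

    Omega = set(range(1, 11))
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}

    print(A | B)             # union A ∪ B
    print(A & B)             # intersection A ∩ B
    print(A - B)             # difference A \ B
    print(A ^ B)             # symmetric difference A ∆ B
    print(Omega - A)         # complement of A with respect to Omega
    print(A & B == set())    # test for disjointness (False here)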
Figure 2.2: Sizes of sample sets and countability. Finite, countably infinite,
and uncountable sets are distinguished; we show examples of every class. A set
is countably infinite when its elements can be assigned uniquely to the natural
numbers (1, 2, 3, …, n, …).
Sample spaces may contain finite or infinite numbers of sample points.
As shown in figure 2.2 it is important to distinguish further between different
classes of infinity: countable and uncountable numbers of points. The set of
rational numbers, for example, is countably infinite since the numbers can
be labeled and assigned uniquely to the positive integers 1 < 2 < 3 < ⋯ <
n < ⋯. The set of real numbers cannot be ordered in such a way and hence
it is uncountable.
2.2 Probability measure on countable sample spaces
Although we are, in principle, equipped now with the tools of probability
theory that shall enable us to handle uncountable sets under certain
conditions, the starting point pursued in this section is chosen under the
assumption that our sets are countable.
2.2.1 Probabilities on countable sample spaces
For countable sets it is straightforward to measure the size of sets by counting
the numbers of points they contain. The proportion

    P(A) = |A| / |Ω|     (2.8)

is identified as the probability of the event represented by the elements of
subset A. For another event we have, for example, P(B) = |B|/|Ω|. Calculating
the sum of the two probabilities, P(A) + P(B), requires some care since we
know only (figure 2.1):
    |A| + |B| ≥ |A ∪ B| .

The excess of |A| + |B| over the size of the union, |A ∪ B|, is precisely the size
of the intersection, |A ∩ B|, and thus we find

    |A| + |B| = |A ∪ B| + |A ∩ B| ,

or, by division through the size of the sample space Ω,

    P(A) + P(B) = P(A ∪ B) + P(A ∩ B) .

Only when the intersection is empty, A ∩ B = ∅, are the two sets disjoint,
their sizes additive, |A ∪ B| = |A| + |B|, and hence

    P(A + B) = P(A) + P(B) iff A ∩ B = ∅ .     (2.9)
It is important to memorize this condition for later use, because it represents
an implicitly made assumption for computing probabilities.
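For a countable sample space the relation P(A) + P(B) = P(A ∪ B) + P(A ∩ B) can be verified directly by counting; the following added snippet uses a single fair die as sample space:

    Omega = set(range(1, 7))   # one fair die
    A = {2, 4, 6}              # even score
    B = {4, 5, 6}              # score of at least four

    def P(S):
        return len(S) / len(Omega)

    print(P(A) + P(B))             # 1.0
    print(P(A | B) + P(A & B))     # 1.0 as well - the two sides agree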
Now we can define a probability measure by means of the basic axioms
of probability theory (for alternative axioms in probability theory see, for
example [40, 41]):
A probability measure on the sample space Ω is a function of subsets of
Ω, P : S → P (S) or P (·) for short, which is defined by the three axioms:
(i) For every set A ⊂ Ω, the value of the probability measure is a nonneg-
ative number, P (A) ≥ 0 for all A,
(ii) the probability measure of the entire sample set – as a subset – is equal
to one, P (Ω) = 1, and
(iii) for any two disjoint subsets A and B, the value of the probability measure
for the union, A ∪ B = A + B, is equal to the sum of its values for
A and for B,

    P(A ∪ B) = P(A + B) = P(A) + P(B) provided A ∩ B = ∅ .
Condition (iii) implies that for any countable – eventually infinite – collection
of disjoint or non-overlapping sets, Aᵢ (i = 1, 2, 3, …) with Aᵢ ∩ Aⱼ = ∅ for
all i ≠ j, the relation called σ-additivity

    P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ)   or   P(Σ_{k=1}^{∞} Aₖ) = Σ_{k=1}^{∞} P(Aₖ)     (2.10)
holds. Clearly we also have P(Aᶜ) = 1 − P(A), P(A) = 1 − P(Aᶜ) ≤ 1, and
P(∅) = 0. For any two sets with A ⊂ B we have P(A) ≤ P(B) and P(B − A) =
P(B) − P(A). For any two arbitrary sets A and B we can write A ∪ B as a sum of
disjoint sets:

    A ∪ B = A + Aᶜ ∩ B   and   P(A ∪ B) = P(A) + P(Aᶜ ∩ B) .

Since Aᶜ ∩ B ⊂ B we obtain P(A ∪ B) ≤ P(A) + P(B).
The set of all subsets of Ω is the powerset Π(Ω) (figure 2.3). It contains
the empty set ∅, the sample space Ω, and all subsets of Ω, and this includes
the results of all set-theoretic operations that were listed above.
Figure 2.3: The powerset. The powerset Π(Ω) is a set containing all subsets
of Ω including the empty set ∅ and Ω itself. The figure sketches the powerset of
three events A, B, and C.
The relation
between the sample point ω, an event A, the sample space Ω, and the powerset
Π(Ω) is illustrated by means of an example presented already in section 1.2
as Penney's game: the repeated coin toss. Flipping a coin has two outcomes:
'0' for heads and '1' for tails (see also the Bernoulli process in subsection 2.7.5). The
sample points for flipping the coin n times are binary n-tuples or strings, ω =
(ω₁, ω₂, …, ωₙ) with ωᵢ ∈ {0, 1}.⁴ It is useful to consider also infinite numbers
of repeats, in particular for computing limits n → ∞: ω = (ω₁, ω₂, …) =
(ωᵢ)_{i∈ℕ} with ωᵢ ∈ {0, 1}. Then we are dealing with infinitely long binary
strings, and the sample space Ω = {0, 1}^ℕ is the space of all infinitely long
binary strings. Whereas every finite binary string represents the binary
encoding of a natural number Nₖ ∈ ℕ₀, and the set of finite strings is therefore
countable, the space of all infinitely long binary strings is uncountable, as
Cantor's diagonal argument shows (see also section 2.3).
A subset of Ω will be called an event A when a probability measure
⁴ There is a trivial but important distinction between strings or n-tuples and sets: In
a string the position of an element matters, whereas in a set it does not. The following
three sets are identical: {1, 2, 3} = {3, 1, 2} = {1, 2, 2, 3}. In order to avoid ambiguities
strings are written in (normal) parentheses and sets in curly brackets.
derived from axioms (i), (ii), and (iii) has been assigned. Commonly, one is
not interested in the full detail of a probabilistic result, and events can easily
be adapted by lumping together sample points. We ask, for example, for
the probability of the event A that n coin flips yield at least k times tail (the
score for tail is 1):

    A = {ω = (ω₁, ω₂, …, ωₙ) ∈ Ω : Σ_{i=1}^{n} ωᵢ ≥ k} ,
where the sample space is Ω = {0, 1}ⁿ. The task is now to find a system
of events F that allows for a consistent assignment of a probability P(A)
to every event A. For countable sample spaces Ω the powerset Π(Ω) represents
such a system F; we characterize P(A) as a probability measure
on (Ω, Π(Ω)), and the further handling of probabilities as outlined below
is straightforward. In the case of uncountable sample spaces Ω, however, the
powerset Π(Ω) is too large and a more sophisticated procedure is required
(section 2.3).
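For the countable case the probability of the event A defined above follows from simple counting of binary strings; a short added sketch evaluates it with the binomial coefficient from the standard library:

    from math import comb

    def p_at_least_k_tails(n: int, k: int) -> float:
        # number of n-strings with at least k ones, divided by 2**n
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    print(p_at_least_k_tails(10, 7))   # 0.171875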
So far we have constructed and compared sets but not yet introduced
numbers for actual computations. In order to construct a probability measure
that is adaptable for numerical calculations on some countable sample space,
Ω = {ω₁, ω₂, …, ωₙ, …}, we assign a weight ϱₙ to every sample point ωₙ
subject to the conditions

    ∀ n : ϱₙ ≥ 0 ;   Σₙ ϱₙ = 1 .     (2.11)

Then, for P(ωₙ) = ϱₙ ∀ n, the following two equations

    P(A) = Σ_{ω∈A} ϱ(ω) for A ∈ Π(Ω)   and   ϱ(ω) = P({ω}) for ω ∈ Ω     (2.12)

represent a bijective relation between the probability measure P on (Ω, Π(Ω))
and the sequences ϱ = (ϱ(ω))_{ω∈Ω} in [0, 1] with Σ_{ω∈Ω} ϱ(ω) = 1. Such a
sequence is called a (discrete) probability density.
The function ϱ(ωₙ) = ϱₙ has to be estimated or determined empirically
because it is the result of factors lying outside mathematics or probability
theory.
Figure 2.4: Probabilities of throwing two dice. The probabilities of obtaining
two to twelve counts by throwing two perfect or fair dice are based on the
equal-probability assumption for the individual faces of a single die.
The probability P(N) rises linearly from two to seven and then decreases linearly
between seven and twelve (P(N) is a discretized tent map), and the additivity
condition requires Σ_{k=2}^{12} P(N = k) = 1.
In physics and chemistry the correct assignment of probabilities has
to meet the conditions of the experimental setup. An example will make this
point clear: Whether a die is fair and shows all of its six faces with equal
probability, or has been manipulated and shows the 'six' more frequently
than the other numbers, is a matter of physics, not mathematics. For many
purposes the discrete uniform distribution, U_Ω, is applied: All results
ω ∈ Ω appear with equal probability and hence ϱ(ω) = 1/|Ω|.

With the assumption of the uniform distribution U_Ω we can measure the size
of sets by counting sample points, as illustrated best by considering throws of dice.
Figure 2.5: An ordered partial sum of a random variable. The sum
Sₙ = Σ_{k=1}^{n} Xₖ represents the cumulative outcome of a series of events described
by a class of random variables, Xₖ. The series can be extended to +∞, and such a
case will be encountered, for example, with probability distributions. The ordering
criterion is not yet specified; it could be time t, for example.
For one die the sample space is Ω = {1, 2, 3, 4, 5, 6} and for the fair
die we make the assumption

    P(k) = 1/6 ; k = 1, 2, 3, 4, 5, 6 ,

that all six outcomes corresponding to different faces of the die are equally
likely. Based on the assumption of U_Ω we obtain the probabilities for the
outcome of two simultaneously thrown fair dice shown in figure 2.4. The
most likely outcome is a count of seven points because it has the largest
multiplicity: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.
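The multiplicities behind figure 2.4 can be enumerated exhaustively; the following added fragment counts all 36 equally likely outcomes:

    from collections import Counter
    from fractions import Fraction

    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
    for score in range(2, 13):
        print(score, Fraction(counts[score], 36))
    # the score 7 has the largest probability, 6/36 = 1/6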
2.2.2 Random variables and functions
For the definition of random variables on countable sets a probability triple
(Ω,Π(Ω), P ) is required: Ω contains the sample points or individual results,
the powerset Π(Ω) provides the events A as subsets, and P eventually represents
the probability measure defined by equation (2.12). Based on such
a probability triple we define a random variable as a numerically valued
function X of ω on the domain of the entire sample space Ω,

    ω ∈ Ω : ω → X(ω) .     (2.13)
Random variables, X (ω) and Y(ω), can be subject to operations to yield
other random variables, such as
    X(ω) + Y(ω) , X(ω) − Y(ω) , X(ω)·Y(ω) , X(ω)/Y(ω) [Y(ω) ≠ 0] ,
and, in particular, also any linear combination of random variables such as
aX (ω) + bY(ω) is a random variable too. Just as a function of a function is
still a function, a function of a random variable is a random variable,
ω ∈ Ω : ω → ϕ (X (ω),Y(ω)) = ϕ(X ,Y) .
Particularly important cases are the (partial) sums of n variables:

    Sₙ(ω) = X₁(ω) + … + Xₙ(ω) = Σ_{k=1}^{n} Xₖ(ω) .     (2.14)

Such a partial sum Sₙ could be, for example, the cumulative outcome of
n successive throws of a die.⁵ Consider, for example, an ordered series of
events where the current cumulative outcome is given by the partial sum
Sₙ = Σ_{k=1}^{n} Xₖ, as shown in figure 2.5. In principle, the series can be extended
to t → ∞.
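A partial sum of this kind is generated in one line from a sequence of simulated die throws; the added sketch below (assuming NumPy) produces a realization of the step function sketched in figure 2.5:

    import numpy as np

    rng = np.random.default_rng(4)
    throws = rng.integers(1, 7, size=20)   # X_1, ..., X_20: scores of a fair die
    S = np.cumsum(throws)                  # partial sums S_n = X_1 + ... + X_n
    print(throws)
    print(S)                               # monotonically increasing steps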
The ordered partial sum is a step function, and its precise definition is
hidden in equations (2.13) and (2.14). Three definitions are possible
for the value of the function at the discontinuity. We present them for the
Heaviside step function:

    H(x) = { 0             if x < 0 ,
           { 0, 1/2, or 1  if x = 0 ,     (2.15)
           { 1             if x > 0 .
⁵ The use of partial in this context implies that the sum does not cover the entire
sample space at the moment. Series of throws of dice, for example, could be continued in
the future.
Figure 2.6: The cumulative distribution function of fair dice. The cumulative
probability distribution function (cdf) or probability distribution is a mapping
from the sample space Ω onto the unit interval [0, 1] of ℝ. It corresponds to the
ordered partial sum with the ordering parameter being the score determined by
the stochastic variable. The example shown deals with throwing fair dice: The
distribution for one die (black) consists of six steps of equal height at the scores
1, 2, …, 6. The second curve (red) is the probability distribution for throwing two
dice (figure 2.4).
The value '0' at x = 0 implies left-hand continuity for H(x) and in terms
of a probability distribution would correspond to a definition P(X < x)
in equation (2.16); the value 1/2 implies that H(x) is neither right-hand nor
left-hand semi-differentiable at x = 0 but is useful in many applications that
make use of the inherent symmetry of the Heaviside function, for example the
relation H(x) = (1 + sgn(x))/2, where sgn(x) is the sign or signum function:

    sgn(x) = { −1  if x < 0 ,
             {  0  if x = 0 ,
             {  1  if x > 0 .
The functions used in probability theory follow the third definition, determined
by P(X ≤ x) or H(0) = 1 in the case of the Heaviside function.

Right-hand continuity is an important definition in the conventional handling
of stochastic processes, for example for semimartingales (subsection 3.1.1).
Often the property of right-hand continuity with left limits is denoted as
càdlàg, which is an acronym from the French "continue à droite, limites à
gauche".
An important step function for the characterization of a discrete probability
distribution is the cumulative distribution function (cdf, see also
subsection 2.2.3). It is a mapping from sample space into the real numbers
on the unit interval, P(X ≤ x; Ω) ⇒ F_X(x) ∈ ℝ with 0 ≤ F_X(x) ≤ 1, defined
by

    F_X(x) = P(X ≤ x)  with  lim_{x→−∞} F_X(x) = 0  and  lim_{x→+∞} F_X(x) = 1 .  (2.16)
Two examples for throwing one die or two dice are shown in figure 2.6. The
distribution function is defined for the entire x-axis, x ∈ R, but cannot be
integrated by conventional Riemann integration. The cumulative distribution
function and the partial sums of random variables, however, are continuous
and differentiable on the right-hand side of the steps and therefore they are
Stieltjes-Lebesgue integrable (see subsection 2.3.5).
The probability mass function (pmf) is also a mapping from sample
space into the real numbers and gives the probability that a discrete random
variable X attains exactly some value x. We assume that X is a discrete
random variable on the sample space Ω, X : Ω → ℝ, and then we define the
probability mass function as a mapping onto the unit interval, f_X : ℝ → [0, 1],
by

    f_X(x) = P(X = x) = P({s ∈ Ω : X(s) = x}) .     (2.17)
Sometimes it is useful to be able to treat a discrete probability distribution
as if it were continuous. The function f_X(x) is therefore defined for all real
numbers x ∈ ℝ, including those outside the sample set; then we have
f_X(x) = 0 ∀ x ∉ X(Ω). Figure 2.7 shows the probability mass functions of
fair dice corresponding to the cumulative distributions in figure 2.6.
Figure 2.7: Probability mass functions of fair dice. The probability mass
functions (pmf), f_X(x), belong to the two probability distributions F_X(x) shown
in figure 2.6. The upper part of the figure represents the scores x obtained with
one die. The pmf is zero everywhere on the x-axis except at a set of points,
x = {1, 2, 3, 4, 5, 6}, of measure zero where it adopts the value 1/6 (black). The
lower part contains the probability mass function for simultaneously throwing two
dice (red, see also figure 2.4). The maximal probability value is obtained for the
score x = 7.
For throwing one die the pmf consists of six peaks, f_X(k) = 1/6 with
k = 1, 2, …, 6, and has the value f_X(x) = 0 everywhere else (x ≠ k). In the
case of two dice the probability mass function corresponds to the discrete
probabilities shown in figure 2.4. It contains, in essence, the same information
as the cumulative distribution function or the listing of discrete probabilities.
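The pmf and cdf of the two-dice example are obtained from each other by summation and differencing; the following added lines (assuming NumPy) tabulate both:

    import numpy as np

    scores = np.arange(2, 13)
    pmf = np.array([min(s - 1, 13 - s) for s in scores]) / 36  # two fair dice
    cdf = np.cumsum(pmf)                                       # F(x) = P(X <= x)

    for s, f, F in zip(scores, pmf, cdf):
        print(f"x = {s:2d}   f(x) = {f:.4f}   F(x) = {F:.4f}")
    print(cdf[-1])   # 1.0 up to rounding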
2.2.3 Probabilities on intervals
Now we write down sets which are defined by the range of a random variable
on the closed interval [a, b],⁶

    {a ≤ X ≤ b} = {ω | a ≤ X(ω) ≤ b} ,

and define their probabilities by P(a ≤ X ≤ b). More generally, the set
A of sample points can be defined by the open interval ]a, b[, the half-open
intervals [a, b[ and ]a, b], the infinite intervals ]−∞, b[ and ]a, +∞[, as well
as the set of real numbers, ℝ = ]−∞, +∞[. When A is reduced to the single
point x, it is called the singleton {x}:

    P(X = x) = P(X ∈ {x}) .
For countable, finite or countably infinite, sample spaces Ω the exact range
of X is just the set of real numbers vᵢ below:

    V_X = ⋃_{ω∈Ω} {X(ω)} = {v₁, v₂, …, vₙ, …} .

Now we introduce the probabilities

    pₙ = P(X = vₙ) , vₙ ∈ V_X ,

and clearly we have P(X = x) = 0 if x ∉ V_X.
Knowledge of all pₙ-values implies full information on all probabilities
concerning the random variable X:

    P(a ≤ X ≤ b) = Σ_{a≤vₙ≤b} pₙ   or, in general,   P(X ∈ A) = Σ_{vₙ∈A} pₙ .  (2.18)
⁶ The notation we are applying here uses square brackets, '[·]', for closed intervals,
inverted square brackets, ']·[', for open intervals, and ']·]' and '[·[' for left-hand and
right-hand half-open intervals, respectively. An alternative, less common notation uses
parentheses instead of inverted square brackets, e.g., '(·)'.
An especially important case, which has been discussed already in the
previous subsection 2.2.2, is obtained when A is the infinite interval ]−∞, x].
The function x → F_X(x), defined on ℝ with values in the unit interval
[0, 1], 0 ≤ F_X(x) ≤ 1, is the cumulative distribution function of X:

    F_X(x) = P(X ≤ x) = Σ_{vₙ≤x} pₙ .     (2.16')
It fulfils several easy-to-verify properties:

    F_X(b) − F_X(a) = P(X ≤ b) − P(X ≤ a) = P(a < X ≤ b) ,

    P(X = x) = lim_{ε→0} (F_X(x + ε) − F_X(x − ε)) , and

    P(a < X < b) = lim_{ε→0} (F_X(b − ε) − F_X(a + ε)) .
An important special case is an integer-valued positive random variable X
corresponding to a countably infinite sample space, which is the set of
nonnegative integers, Ω = ℕ₀ = {0, 1, 2, …, n, …}, with

    pₙ = P(X = n) , n ∈ ℕ₀   and   F_X(x) = Σ_{0≤n≤x} pₙ .     (2.19)
Integer valued random variables will be used, for example, for modeling par-
ticle numbers in stochastic processes.
Two (or more) random variables,⁷ X and Y, form a random vector
(X, Y), which is defined by the probabilities

    P(X = xᵢ, Y = yⱼ) = p(xᵢ, yⱼ) .     (2.20)

These probabilities constitute the joint probability distribution of the
random vector. By summation over one variable we obtain the probabilities
for the two marginal distributions,

    P(X = xᵢ) = Σ_{yⱼ} p(xᵢ, yⱼ) = p(xᵢ, ∗)   and   P(Y = yⱼ) = Σ_{xᵢ} p(xᵢ, yⱼ) = p(∗, yⱼ) ,  (2.21)

of X and Y, respectively.
⁷ For simplicity we restrict ourselves to the two-variable case here. The extension to
any finite number of variables is straightforward.
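Equation (2.21) amounts to row and column sums of the joint probability table; a minimal added example with an arbitrary joint distribution of two dependent variables:

    import numpy as np

    # joint probabilities p(x_i, y_j), rows = values of X, columns = values of Y
    p = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

    p_x = p.sum(axis=1)          # marginal of X: p(x_i, *)
    p_y = p.sum(axis=0)          # marginal of Y: p(*, y_j)
    print(p_x, p_y, p.sum())     # [0.3 0.7] [0.4 0.6] 1.0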
2.3 Probability measure on uncountable sample spaces
A new and more difficult situation arises when the sample space Ω is
uncountable. Then problems with measurability arise from the impossibility
of assigning a probability to every subset of Ω. The general task of developing
measures for uncountable sets that are based on countably infinite subsets is
highly demanding and requires advanced mathematics. For the probability
concept we are using here, however, a restriction of the measure to sets of a
certain family called Borel sets or Borel fields, F, and the introduction of
the Lebesgue measure is sufficient.
2.3.1 Existence of non-measurable sets
The notion that is essential for the construction of a probability measure for
an uncountable set is again the powerset Π(Ω), which is defined as the set of
all subsets of Ω (figure 2.3). It would seem straightforward to proceed exactly
as we did in the case of countability; the powerset, however, is too large since
it contains uncountably many subsets. Giuseppe Vitali [42] provided a proof
by example that no mapping P : Π(Ω) → [0, 1] exists for the infinitely repeated
coin flip, Ω = {0, 1}^ℕ, which fulfils the three indispensable properties
for probabilities [11, p.9,10]:

(N) normalization: P(Ω) = 1 ,

(A) σ-additivity: for pairwise disjoint events A₁, A₂, … ⊂ Ω holds

    P(⋃_{i≥1} Aᵢ) = Σ_{i≥1} P(Aᵢ) ,

(I) invariance: for all A ⊂ Ω and k ≥ 1 holds P(Tₖ A) = P(A), where Tₖ
is an operator that inverts the outcome of the k-th toss,

    Tₖ : ω = (ω₁, …, ω_{k−1}, ωₖ, ω_{k+1}, …) → (ω₁, …, ω_{k−1}, 1 − ωₖ, ω_{k+1}, …) ,

and Tₖ A = {Tₖ(ω) : ω ∈ A} is the image of A under the operation Tₖ.
The first two conditions are the criteria for probability measures, and the
invariance condition (I) is specific for coin flipping and encapsulates the
properties derived from the uniform distribution, U_Ω. In general, such a relation
will always exist, with the details depending on the stochastic process – coin
flip – and its implementation in the real world – uniform distribution – be
it an experimental setup, a census in sociology, or the rules of gambling. We
dispense first with the details of the proof and mention only the nature of
the constructed contradiction:

    1 = P(Ω) = Σ_{S∈𝒮} P(T_S A) = Σ_{S∈𝒮} P(A)     (2.22)

cannot be fulfilled for infinitely large sequences of coin tosses: all values
P(A) or P(T_S A) are the same, and an infinite summation of the same number
yields either 0 or ∞ but never 1.
The construction of the proof starts from the finite subsets of ℕ:
𝒮 = {S ⊂ ℕ : |S| < ∞}; 𝒮 is countable as it is a union of countably many
finite sets {S ⊂ ℕ : max S = m}. For S = {k₁, …, kₙ} ∈ 𝒮 we define
T_S = ∏_{k∈S} Tₖ = T_{k₁} ∘ … ∘ T_{kₙ} and an equivalence relation '≡' on Ω: ω ≡ ω̃ iff
ωₖ = ω̃ₖ for sufficiently large k. The axiom of choice guarantees the existence
of a set A ⊂ Ω which contains exactly one element of each equivalence class.
Then we have

(i) for each ω ∈ Ω there exists an ω̃ ∈ A with ω ≡ ω̃, and therefore an
S ∈ 𝒮 with ω = T_S ω̃ ∈ T_S A, and hence Ω = ⋃_{S∈𝒮} T_S A, and

(ii) the sets (T_S A)_{S∈𝒮} are pairwise disjoint, as can easily be verified: Assume
that T_S A ∩ T_{S̃} A ≠ ∅ for S, S̃ ∈ 𝒮; then there exists a pair ω, ω̃ ∈ A with
T_S ω = T_{S̃} ω̃ and consequently ω ≡ T_S ω = T_{S̃} ω̃ ≡ ω̃, and according to the
choice of A we have ω = ω̃ and consequently S = S̃.

Application of the properties (N), (A), and (I) to P and taking m to infinity
yields equation (2.22) and completes the proof.
Accordingly, the proof of Vitali's theorem demonstrates the existence of
a non-measurable subset of the real numbers, a so-called Vitali set – precisely,
a subset of the real numbers that is not Lebesgue measurable (see
subsection 2.3.2).
Figure 2.8: Conceptual levels of sets in probability theory. The lowest
level is the sample space Ω; it contains the sample points or individual results ω as
elements, and events A are subsets of Ω: ω ∈ Ω and A ⊂ Ω. The next higher level
is the powerset Π(Ω). Events A are its elements and event systems F constitute its
subsets: A ∈ Π(Ω) and F ⊂ Π(Ω). The highest level, finally, is the power powerset
Π(Π(Ω)) that houses event systems F as elements: F ∈ Π(Π(Ω)).
The problem to be solved is a reduction of the powerset to an
event system F such that the subsets causing non-measurability are
eliminated.
2.3.2 Borel σ-algebra and Lebesgue measure
Before we define minimal requirements for an event system F, the three levels
of sets that are relevant for our construction are considered (figure 2.8).
The objects on the lowest level are the sample points corresponding
to individual results, ω ∈ Ω. The next higher level is the powerset Π(Ω)
housing the events A ∈ Π(Ω); the elements of the powerset are subsets
of the sample space, A ⊂ Ω. To illustrate the role of event systems F we
need one more, higher level, the powerset of the powerset, Π(Π(Ω)): Event
systems F are elements of the power powerset, F ∈ Π(Π(Ω)), and subsets of
the powerset, F ⊂ Π(Ω).
Minimal requirements for an event system F are summarized in the
following definition of a σ-algebra on Ω with Ω ≠ ∅ and F ⊂ Π(Ω):

(a) Ω ∈ F ,

(b) A ∈ F ⟹ Aᶜ := Ω \ A ∈ F , and

(c) A₁, A₂, … ∈ F ⟹ ⋃_{i≥1} Aᵢ ∈ F .
Condition (b) defines the logical negation as expressed by the difference be-
tween the entire sample space and the event A, and condition (c) represents
the logical ’or’ operation. The pair (Ω,F) is called an event space or a
measurable space. From the three properties (a) to (c) other properties
follow. The intersection, for example, is the complement of the union of the
complements, A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ ∈ F. The consideration is easily extended
to the intersection of countably many subsets of F, which also belongs to F.
Thus, a σ-algebra is closed under the operations 'c', '∪', and '∩'.⁸ Trivial
examples of σ-algebras are {∅, Ω}, {∅, A, Aᶜ, Ω}, or the family of all subsets.
A construction principle for σ-algebras starts out from an event
system G ⊂ Π(Ω) (Ω ≠ ∅) that is sufficiently small and arbitrary. Then
there exists exactly one smallest σ-algebra F = σ(G) in Ω with F ⊃ G, and
F is called the σ-algebra induced by G, or G is the generator of F. Here are
three important examples:

(i) the powerset with Ω being countable, where G = {{ω} : ω ∈ Ω} is the
system of the subsets of Ω containing a single element; σ(G) = Π(Ω),
each A ∈ Π(Ω) is countable, and A = ⋃_{ω∈A} {ω} ∈ σ(G) (see section 2.2),
(ii) the Borel σ-algebra Bⁿ (see below), and

(iii) the product σ-algebra for sample spaces Ω that are Cartesian products
of sets Eₖ, Ω = ∏_{k∈I} Eₖ, where I is a set of indices with I ≠ ∅. We
assume that ℰₖ is a σ-algebra on Eₖ, with Xₖ : Ω → Eₖ being the projection
onto the k-th coordinate; the generator G = {Xₖ⁻¹ Aₖ : k ∈ I, Aₖ ∈
ℰₖ} is the system of all sets in Ω which are determined by an event
on a single coordinate. Then ⊗_{k∈I} ℰₖ := σ(G) is called the product
σ-algebra of the sets ℰₖ on Ω. In the important case of equivalent
Cartesian coordinates, Eₖ = E and ℰₖ = ℰ for all k ∈ I, the short-hand
notation ℰ^{⊗I} is common. The Borel σ-algebra on ℝⁿ is represented by
the n-dimensional product σ-algebra of the Borel σ-algebra B = B¹ on
ℝ: Bⁿ = B^{⊗n}.

⁸ A family of sets is called closed under an operation when the operation can be applied
a countable number of times without producing a set that lies outside the family.
All three examples are required for a deeper understanding of probability
measures: the powerset (i) provides the frame for discrete sample spaces, the
Borel σ-algebra (ii), to be discussed below, sets the stage for one-dimensional
continuous sample spaces, and the product σ-algebra (iii) represents the
natural extension to n-dimensional Cartesian space.
For the construction of the Borelian σ-algebra⁹ we define a generator
representing the set of all compact cuboids in n-dimensional Cartesian space,
Ω = ℝⁿ, which have rational corners:

    G = {∏_{k=1}^{n} [aₖ, bₖ] : aₖ < bₖ; aₖ, bₖ ∈ ℚ} ,     (2.23)

where ℚ is the set of all rational numbers. The σ-algebra induced by this
generator is denoted as the Borelian σ-algebra, Bⁿ := σ(G) on ℝⁿ, and each
A ∈ Bⁿ is a Borel set (for n = 1 one commonly writes B instead of B¹).
Five properties of the Borel σ-algebra are useful for applications and for
an appreciation of its enormous size:

(i) Each open set ']..[' A ⊂ ℝⁿ is Borelian. Every ω ∈ A has a neighborhood
Q ∈ G with Q ⊂ A, and therefore we have A = ⋃_{Q∈G, Q⊂A} Q, representing
a union of countably many sets in Bⁿ, which follows from condition (c)
for σ-algebras.

(ii) Each closed set '[..]' A ⊂ ℝⁿ is Borelian since Aᶜ is open and Borelian
according to item (i).
⁹ Sometimes a Borel σ-algebra is also called a Borel field.
(iii) The σ-algebra Bⁿ cannot be described in a constructive way, because it
consists of much more than the union of cuboids and their complements.
In order to create Bⁿ, the operation of adding complements and countable
unions has to be repeated as often as there are countable ordinal
numbers (and this leads to uncountably many times [43, pp.24, 29]). It
is sufficient to memorize for practical purposes that Bⁿ covers almost
all sets in ℝⁿ – but not all of them.

(iv) The Borelian σ-algebra B on ℝ is generated not only by the system
of compact intervals (2.23) but also by the system of closed, left-unbounded
infinite intervals:

    G′ = { ]−∞, c] : c ∈ ℝ } .     (2.23')

Condition (b) requires G′ ⊂ B and – because of the minimality of σ(G′) –
σ(G′) ⊂ B too. Conversely, σ(G′) contains all left-open intervals, since
]a, b] = ]−∞, b] \ ]−∞, a], and all compact or closed intervals, since
[a, b] = ⋂_{n≥1} ]a − 1/n, b], and accordingly also the σ-algebra B generated
from these intervals (2.23). In full analogy, B is also generated from all
open left-unbounded, from all closed, and from all open right-unbounded
intervals.
(v) The event system Bⁿ_Ω = {A ∩ Ω : A ∈ Bⁿ} on Ω ⊂ ℝⁿ, Ω ≠ ∅, represents
a σ-algebra on Ω, which is denoted as the Borelian σ-algebra on Ω.

All intervals discussed in items (i) to (iv) are (Lebesgue) measurable, while
other sets are not.
The Lebesgue measure is the conventional means of assigning lengths, areas, and volumes to subsets of three-dimensional Euclidean space, and in formal Cartesian spaces to objects with higher-dimensional volumes. Sets to which generalized volumes10 can be assigned are called Lebesgue measurable, and the measure or the volume of such a set A is denoted by λ(A). The Lebesgue measure on $\mathbb{R}^n$ has the following properties:11
10 We generalize volume here to arbitrary dimension n: The generalized volume for n = 1 is a length, for n = 2 an area, for n = 3 a (conventional) volume, and for arbitrary dimension n the volume of a cuboid in n-dimensional space.

11 Slightly modified from Wikipedia: Lebesgue measure, version March 04, 2011.
(1) If A is a Lebesgue measurable set, then λ(A) ≥ 0.
(2) If A is a Cartesian product of intervals, I1 ⊗ I2 ⊗ . . . ⊗ In, then A is
Lebesgue measurable and λ(A) = |I1| · |I2| · . . . · |In|.
(3) If A is Lebesgue measurable, its complement Ac is so too.
(4) If A is a union of countably many disjoint Lebesgue measurable sets, $A = \bigcup_k A_k$, then A is itself Lebesgue measurable and $\lambda(A) = \sum_k \lambda(A_k)$.
(5) If A and B are Lebesgue measurable and $A \subset B$, then $\lambda(A) \leq \lambda(B)$ holds.
(6) Countable unions and countable intersections of Lebesgue measurable
sets are Lebesgue measurable.12
(7) If A is an open or closed subset or Borel set of Rn, then A is Lebesgue
measurable.
(8) The Lebesgue measure is strictly positive on non-empty open sets, and
so its support is the entire Rn.
(9) If A is a Lebesgue measurable set with λ(A) = 0, called a null set, then
every subset of A is also a null set, and every subset of A is measurable.
(10) If A is Lebesgue measurable and r is an element of $\mathbb{R}^n$, then the translation of A by r, defined by $A + r = \{a + r : a \in A\}$, is also Lebesgue measurable and has the same measure as A.

(11) If A is Lebesgue measurable and δ > 0, then the dilation of A by δ, defined by $\delta A = \{\delta r : r \in A\}$, is also Lebesgue measurable and has measure $\delta^n \lambda(A)$.
12 This is not a consequence of items (3) and (4): A family of sets, which is closed under complements and countable disjoint unions, need not be closed under (non-disjoint) countable unions, for example the family $\{\emptyset, \{1,2\}, \{1,3\}, \{2,4\}, \{3,4\}, \{1,2,3,4\}\}$.
(12) In generalization of items (10) and (11), if T is a linear transformation and A is a measurable subset of $\mathbb{R}^n$, then T(A) is also measurable and has the measure $|\det(T)|\, \lambda(A)$.
All twelve items listed above can be succinctly summarized in one lemma:

The Lebesgue measurable sets form a σ-algebra on $\mathbb{R}^n$ containing all products of intervals, and λ is the unique complete translation-invariant measure on that σ-algebra with $\lambda\big([0,1] \otimes [0,1] \otimes \ldots \otimes [0,1]\big) = 1$.
We conclude this subsection on Borel algebra and Lebesgue measure by men-
tioning a few characteristic and illustrative examples:
• Any closed interval [a, b] of real numbers is Lebesgue measurable, and its Lebesgue measure is the length b − a. The open interval ]a, b[ has the same measure, since the difference between the two sets consists of the two endpoints a and b only and has measure zero.

• Any Cartesian product of intervals [a, b] and [c, d] is Lebesgue measurable, and its Lebesgue measure is (b − a) · (d − c), the area of the corresponding rectangle.
• The Lebesgue measure of the set of rational numbers in an interval of
the line is zero, although this set is dense in the interval.
• The Cantor set13 is an example of an uncountable set that has Lebesgue
measure zero.
• Vitali sets are examples of sets that are not measurable with respect
to the Lebesgue measure.
In the forthcoming sections we make use of the fact that the system of intervals on the real axis becomes countable, with all intervals Lebesgue measurable, if rational numbers are chosen as beginnings and end points of the intervals. Hence, we can work with real numbers with almost no restriction for practical purposes.
13 The Cantor set is generated from the interval [0, 1] through consecutively taking out the open middle third: $[0,1] \to [0,\frac{1}{3}] \cup [\frac{2}{3},1] \to [0,\frac{1}{9}] \cup [\frac{2}{9},\frac{1}{3}] \cup [\frac{2}{3},\frac{7}{9}] \cup [\frac{8}{9},1] \to \ldots$ An explicit formula for the set is: $C = [0,1] \setminus \bigcup_{m=1}^{\infty} \bigcup_{k=0}^{3^{m-1}-1} \left]\frac{3k+1}{3^m}, \frac{3k+2}{3^m}\right[$.
2.3.3 Random variables on uncountable sets
Sufficient for dealing with random variables on uncountable sets is a probability triple (Ω, F, P). The sets in F, the Borel σ-algebra, are measurable and they alone have probabilities. We are now in the position to handle probabilities on uncountable sets:

$\{\omega \,|\, X(\omega) \leq x\} \in F$ and $P(X \leq x) = \dfrac{|\{X(\omega) \leq x\}|}{|\Omega|}$ ,  (2.24a)

$\{a < X \leq b\} = \{X \leq b\} - \{X \leq a\} \in F$ with $a < b$ ,  (2.24b)

$P(a < X \leq b) = \dfrac{|\{a < X \leq b\}|}{|\Omega|} = F(b) - F(a)$ .  (2.24c)

Equation (2.24a) contains the definition of a real-valued function X that is called a random variable iff $\{\omega \,|\, X(\omega) \leq x\} \in F$, and hence $P(X \leq x)$ is defined, for any real number x; equation (2.24b) is valid since F is closed under difference; and finally equation (2.24c) provides the basis for defining and handling probabilities on uncountable sets. The three equations (2.24) together constitute the basis of the probability concept on uncountable sample spaces that will be applied throughout this course.
Random variables on uncountable sets Ω are commonly characterized by probability density functions (pdf). The probability density function – or density for short – is the continuous analogue to the (discrete) probability mass function (pmf). A density is a function f on $\mathbb{R} = ]-\infty, +\infty[$, $u \to f(u)$, which satisfies the two conditions:

(i) $f(u) \geq 0$ for all u, and

(ii) $\int_{-\infty}^{+\infty} f(u)\, du = 1$ .
Now we can define a class of random variables14 on general sample spaces: X is a function on Ω, $\omega \to X(\omega)$, whose probabilities are prescribed by means of a density function f(u). For any interval [a, b] the probability is given by

$P(a \leq X \leq b) = \int_a^b f(u)\, du$ .  (2.25)
14Random variables having a density are often called continuous in order to distinguish
them from discrete random variables defined on countable sample spaces.
If A is the union of not necessarily disjoint intervals (some of which may even be infinite), the probability can be derived in general from the density,

$P(X \in A) = \int_A f(u)\, du$ ;

in particular, A can be split into disjoint intervals, $A = \bigcup_{j=1}^k [a_j, b_j]$, and then the integral can be rewritten as

$\int_A f(u)\, du = \sum_{j=1}^k \int_{a_j}^{b_j} f(u)\, du$ .

For $A = ]-\infty, x]$ we derive the (cumulative probability) distribution function F(x) of the continuous random variable X:

$F(x) = P(X \leq x) = \int_{-\infty}^x f(u)\, du$ .

If f is continuous then it is the derivative of F, as follows from the fundamental theorem of calculus:

$F'(x) = \dfrac{dF(x)}{dx} = f(x)$ .

If the density f is not continuous everywhere, the relation is still true for every x at which f is continuous.
If the random variable X has a density, then we find by setting a = b = x:

$P(X = x) = \int_x^x f(u)\, du = 0$ ,
reflecting the trivial geometric result that every line segment has zero area.
It seems somewhat paradoxical that X (ω) must be some number for every ω
whereas any given number has probability zero. The paradox can be resolved
by looking at countable and uncountable sets in more depth.
Extension to two variables X and Y, forming a random vector (X, Y), yields the joint probability distribution with density f:

$P(X \leq x, Y \leq y) = \int_{-\infty}^x \int_{-\infty}^y f(u, v)\, du\, dv$ .  (2.26)

Again we have to restrict the definition of probabilities to Borel sets S, which could be, for example, polygons filling the two-dimensional plane:

$P\big((X, Y) \in S\big) = \iint_S f(u, v)\, du\, dv$ .
Figure 2.9: Discretization of a probability density. The segment [x1, xm] on
the u-axis is divided up into m− 1 not necessarily equal intervals and elementary
probabilities are obtained by integration.
The joint density function f satisfies the following conditions:

(i) $f(u, v) \geq 0$ for all (u, v), and

(ii) $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u, v)\, du\, dv = 1$ .
Condition (ii) implies that f is integrable over the whole plane. As in the discrete case we may compute the probabilities of the individual variables from marginal density functions, $u \to f(u, *)$ and $v \to f(*, v)$, respectively:

$P(X \leq x) = \int_{-\infty}^x f(u, *)\, du$ where $f(u, *) = \int_{-\infty}^{+\infty} f(u, v)\, dv$ ,
$P(Y \leq y) = \int_{-\infty}^y f(*, v)\, dv$ where $f(*, v) = \int_{-\infty}^{+\infty} f(u, v)\, du$ .  (2.27)
In the most general case we may also define a joint distribution function
F of (X ,Y) by
F (x, y) = P (X ≤ x,Y ≤ y) for all (x, y) ,
Table 2.1: Comparison of the formalism of probability theory on countable and uncountable sample spaces.

                              Countable case                    Uncountable case
  Range                       $v_n,\ n = 1, 2, \ldots$          $-\infty < u < +\infty$
  Probability element         $p_n$                             $f(u)\,du = dF(u)$
  $P(a \leq X \leq b)$        $\sum_{a \leq v_n \leq b} p_n$    $\int_a^b f(u)\,du$
  $P(X \leq x) = F(x)$        $\sum_{v_n \leq x} p_n$           $\int_{-\infty}^x f(u)\,du$
  $E(X)$                      $\sum_n p_n v_n$                  $\int_{-\infty}^{\infty} u\, f(u)\,du$
  proviso                     $\sum_n p_n |v_n| < \infty$       $\int_{-\infty}^{\infty} |u|\, f(u)\,du < \infty$
and obtain the marginal distribution functions as limits:

$\lim_{y\to\infty} F(x, y) = F(x, \infty) = P(X \leq x, Y < \infty) = P(X \leq x)$ and
$\lim_{x\to\infty} F(x, y) = F(\infty, y) = P(X < \infty, Y \leq y) = P(Y \leq y)$ .  (2.28)
We note that the relations X < ∞ and Y < ∞ put no restrictions on the
variables X and Y .
Let us finally consider the process of discretization of a density function in order to yield a set of elementary probabilities. The x-axis is divided up into m + 1 pieces (figure 2.9), not necessarily equal and not necessarily small, and we denote the piece of the integral between $x_n$ and $x_{n+1}$ by

$p_n = \int_{x_n}^{x_{n+1}} f(u)\, du$ , $0 \leq n \leq m$ .  (2.29)

When $x_0 = -\infty$ and $x_{m+1} = +\infty$ we have

$p_n \geq 0$ for all n, and $\sum_n p_n = 1$ .
The partition is not finite but countable, provided we label the intervals
suitably, for example . . . , p−2, p−1, p0, p1, p2, . . . . Now we consider a random
variable Y such that
P (Y = xn) = pn , (2.29’)
where we may replace xn by any number in the subinterval [xn, xn+1]. The
random variable Y can be interpreted as the discrete analogue of the random
variable X .
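The discretization (2.29) is straightforward to carry out numerically. The following minimal sketch in Python – an illustration added here, with a standard normal density and an arbitrarily chosen grid as assumptions – obtains the elementary probabilities $p_n$ as differences of the cumulative distribution function:

```python
import numpy as np
from scipy.stats import norm

# Discretize a standard normal density into elementary probabilities p_n
# by integrating f(u) between consecutive grid points (equation 2.29).
x = np.linspace(-4.0, 4.0, 17)                    # interior points x_1, ..., x_m
edges = np.concatenate(([-np.inf], x, [np.inf]))  # x_0 = -inf, x_{m+1} = +inf
p = np.diff(norm.cdf(edges))                      # p_n = F(x_{n+1}) - F(x_n)

print(p.sum())    # the p_n sum to 1, as required
print(p.min())    # and none of them is negative
```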
We end by presenting a comparison between probability measures on
countable and uncountable sample spaces where the latter are based on a
probability density f(u) in table 2.1.15
2.3.4 Limits of sequences of random variables

Limits of sequences are required for problems of convergence of random variables and for approximations to them. The problem of limits arises because there is ambiguity in the definition of limits.
A sequence of random variables, $X_n$, is defined on a probability space Ω and it is assumed to have the limit

$X = \lim_{n\to\infty} X_n$ .  (2.30)

The probability space Ω, we assume now, has elements ω which have a probability density p(ω). Four different definitions of the limit are common in probability theory [6, pp.40,41].

Almost certain limit. The sequence $X_n$ converges almost certainly to X if for all ω except a set of probability zero

$X(\omega) = \lim_{n\to\infty} X_n(\omega)$  (2.31)

is fulfilled, and each realization of $X_n$ converges to X.
Limit in the mean. The limit in the mean or the mean square limit of a sequence requires that the mean square deviation of $X_n(\omega)$ from $X(\omega)$ vanishes in the limit, and the condition is

$\lim_{n\to\infty} \int_\Omega d\omega\, p(\omega) \big(X_n(\omega) - X(\omega)\big)^2 \equiv \lim_{n\to\infty} \big\langle (X_n - X)^2 \big\rangle = 0$ .  (2.32)
15Expectation values E(X ) and higher moments of probability distributions are dis-
cussed in section 2.5.
Stochastic Kinetics 47
The mean square limit is the standard limit in Hilbert space theory and it is
commonly used in quantum mechanics.
Stochastic limit. The limit in probability, also called the stochastic limit, fulfils the condition: X is the stochastic limit if for any ε > 0 the relation

$\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$  (2.33)

holds.

Limit in distribution. Probability theory uses also a weaker form of convergence than the previous three limits, the limit in distribution, which requires that for any continuous and bounded function f(x) the relation

$\lim_{n\to\infty} \langle f(X_n) \rangle = \langle f(X) \rangle$  (2.34)

holds. This limit is particularly useful for characteristic functions, $f(x) = \exp(i x s)$: If two characteristic functions approach each other, then the probability density of $X_n$ converges to that of X.
2.3.5 Stieltjes and Lebesgue integration
This subsection provides a short repetition of some generalizations of the
conventional Riemann integral, which are important in probability theory.
We start with a sketch comparing the Riemann and the Lebesgue approach
to integration presented in figure 2.10. One difference between the two inte-
gration methods for a non-negative function – like the probability functions
– can be visualized in three dimensional space: The volume below a surface
given by the non-negative function g(x, y) is measured by summation of the
volumes of cuboids with squares of edge length ∆d whereas the Lebesgue in-
tegral is summing the volumes of layers with thickness ∆d between constant
level sets.
The Stieltjes integral is an important generalization of Riemannian integration:

$\int_a^b g(x)\, dh(x)$ .  (2.35)
Herein g(x) is the integrand and h(x) is the integrator, and the conventional
Riemann integral is obtained for h(x) = x. The integrator can be visualized
Figure 2.10: Comparison of Riemann and Lebesgue integrals. In the conventional Riemannian-Darboux integrationa the integrand is embedded between an upper sum (light blue) and a lower sum (blue) of rectangles. The integral exists iff the upper sum and the lower sum converge to the integrand in the limit ∆d → 0. The Lebesgue integral can be visualized as an approach to calculating the area enclosed by the x-axis and the integrand through partitioning into horizontal stripes (red) and considering the limit ∆d → 0. The definite integral $\int_a^b g(x)\, dx$ is confining the integrand to a closed interval: [a, b] or a ≤ x ≤ b.

a The concept of representing the integral by the convergence of two sums is due to the French mathematician Gaston Darboux. A function is Darboux integrable iff it is Riemann integrable, and the values of the Riemann and the Darboux integral are equal in case they exist.
best as a weighting function for the integrand. In case g(x) and h(x) are continuous and continuously differentiable, the Stieltjes integral can be resolved by partial integration:

$\int_a^b g(x)\, dh(x) = \int_a^b g(x) \dfrac{dh(x)}{dx}\, dx = \big(g(x)\, h(x)\big)\Big|_{x=a}^{b} - \int_a^b \dfrac{dg(x)}{dx}\, h(x)\, dx = g(b)\, h(b) - g(a)\, h(a) - \int_a^b \dfrac{dg(x)}{dx}\, h(x)\, dx$ .
The integrator F(x) may also be a step function: For g(x) continuous and F(x) making jumps at the points $x_1, \ldots, x_n \in\, ]a, b[$ with the heights $\Delta F_1, \ldots, \Delta F_n \in \mathbb{R}$, and $\sum_{i=1}^n \Delta F_i \leq 1$, the Stieltjes integral is of the form

$\int_a^b g(x)\, dF(x) = \sum_{i=1}^n g(x_i)\, \Delta F_i$ .  (2.36)

With g(x) = 1 and in the limit a → −∞ the integral becomes identical with the (discrete) cumulative probability distribution function (cdf), F(b).
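For a pure step-function integrator the Stieltjes integral (2.36) is just a weighted sum, which is how expectation values of discrete random variables are computed. A minimal sketch, with hypothetical jump positions and heights:

```python
import numpy as np

def stieltjes_step(g, xi, dF):
    """Evaluate int g dF = sum_i g(x_i) * Delta F_i for a pure jump integrator F."""
    return np.sum(g(xi) * dF)

xi = np.array([1.0, 2.0, 3.0])   # jump positions x_i (hypothetical)
dF = np.array([0.2, 0.5, 0.3])   # jump heights Delta F_i, summing to 1

print(stieltjes_step(lambda x: x, xi, dF))      # first moment: 2.1
print(stieltjes_step(lambda x: x**2, xi, dF))   # second raw moment: 4.9
```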
Riemann-Stieltjes integration is used in probability theory to compute, for example, moments of probability densities (section 2.5). If F(x) is the cumulative probability distribution of a random variable X, the expected value (see section 2.5) for any function g(X) is obtained from

$E\big(g(X)\big) = \int_{-\infty}^{+\infty} g(x)\, dF(x) = \sum_i g(x_i)\, \Delta F_i$ ,

and this is the equation for the discrete case. If the random variable X has a probability density f(x) = dF(x)/dx with respect to the Lebesgue measure, continuous integration can be used:

$E\big(g(X)\big) = \int_{-\infty}^{+\infty} g(x)\, f(x)\, dx$ .

Important special cases are the moments: $E(X^n) = \int_{-\infty}^{+\infty} x^n\, dF(x)$.
Lebesgue theory of integration assumes the existence of a probability space defined by the triple (Ω, F, µ) representing the sample space Ω, a σ-algebra F of subsets $A \subset \Omega$, and a (non-negative) probability measure µ satisfying µ(Ω) = 1. Lebesgue integrals are defined for measurable functions g fulfilling

$g^{-1}\big([a, b]\big) \in F$ for all a < b .  (2.37)

This condition is equivalent to the requirement that the pre-image of any Borel subset [a, b] of $\mathbb{R}$ is an element of the event system F. The set of measurable functions is closed under algebraic operations and also closed under certain pointwise sequential limits like $\sup_{k\in\mathbb{N}} g_k$, $\liminf_{k\in\mathbb{N}} g_k$ or $\limsup_{k\in\mathbb{N}} g_k$, which are measurable if the sequence of functions $(g_k)_{k\in\mathbb{N}}$ contains only measurable functions.
The construction of an integral $\int_\Omega g\, d\mu = \int_\Omega g(x)\, \mu(dx)$ is done in steps and we begin with the indicator function:

$\mathbf{1}_A(x) = \begin{cases} 1 & \text{iff } x \in A \\ 0 & \text{otherwise} \end{cases}$ ,  (2.38)

which provides a possibility to define the integral over $A \in \mathcal{B}^n$ by

$\int_A g(x)\, dx := \int \mathbf{1}_A(x)\, g(x)\, dx$ ,

and which assigns a volume to Lebesgue measurable sets by setting g ≡ 1:

$\int \mathbf{1}_A\, d\mu = \mu(A)$ ,

which is the Lebesgue measure, $\mu(A) = \lambda(A)$, for a mapping $\lambda: \mathcal{B} \to \mathbb{R}$.
Simple functions are finite linear combinations of indicator functions, $g = \sum_j \alpha_j \mathbf{1}_{A_j}$. They are measurable if the coefficients $\alpha_j$ are real numbers and the sets $A_j$ are measurable subsets of Ω. For non-negative coefficients $\alpha_j$ the linearity property of the integral leads to a measure for non-negative simple functions:

$\int \Big(\sum_j \alpha_j \mathbf{1}_{A_j}\Big)\, d\mu = \sum_j \alpha_j \int \mathbf{1}_{A_j}\, d\mu = \sum_j \alpha_j\, \mu(A_j)$ .

Often a simple function can be written in several ways as a linear combination of indicator functions, and then the value of the integral will always be the same. Sometimes some care is needed in the construction of a real-valued simple function $g = \sum_j \alpha_j \mathbf{1}_{A_j}$ in order to avoid undefined expressions of the kind ∞ − ∞. Choosing $\alpha_i = 0$ implies that $\alpha_i\, \mu(A_i) = 0$, because 0 · ∞ = 0 by convention in measure theory.
An arbitrary non-negative function $g: (\Omega, F, \mu) \to (\mathbb{R}_+, \mathcal{B}, \lambda)$ is measurable iff there exists a sequence of simple functions $(g_k)_{k\in\mathbb{N}}$ that converges pointwise16 and monotonously to g. The Lebesgue integral of a non-negative and measurable function g is defined by

$\int_\Omega g\, d\mu = \lim_{k\to\infty} \int_\Omega g_k\, d\mu$  (2.39)

with $g_k$ being simple functions that converge pointwise and monotonously towards g. The limit is independent of the particular choice of the functions $g_k$. Such a sequence of simple functions is easily visualized, for example, by the bands below the function g(x) in figure 2.10: The band widths ∆d decrease and converge to zero as the index increases, k → ∞.
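The construction (2.39) can be imitated numerically. The following sketch, added for illustration, approximates a Lebesgue integral by simple functions built from horizontal bands of width ∆d below the integrand, as in figure 2.10; the assumed example is g(x) = x² on [0, 1] with the uniform measure, where the exact value is 1/3.

```python
import numpy as np

def lebesgue_approx(g, x, k):
    """Integral of the simple function g_k = dd * floor(g/dd) <= g,
    built from k horizontal bands of width dd (uniform measure on [0, 1])."""
    dd = g(x).max() / k               # band width Delta d
    gk = dd * np.floor(g(x) / dd)     # simple function lying below g
    return np.mean(gk)                # integral over [0, 1]

x = np.linspace(0.0, 1.0, 100001)
for k in (10, 100, 1000):
    print(k, lebesgue_approx(lambda u: u**2, x, k))  # increases toward 1/3
```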
The extension to general functions with positive and negative value domains is straightforward. There is one important major difference between Riemann and Lebesgue integration: The contribution to the Riemann integral changes sign when the function changes sign, whereas all partial areas enclosed between the function and the axis of integration are summed up in the Lebesgue integral. The improper Riemann integral $\int_0^\infty \cos x\, dx$, for example, has partial integrals $\int_0^T \cos x\, dx = \sin T$ with limit inferior −1 and limit superior +1, whereas the corresponding Lebesgue integral does not exist.
For $\Omega = \mathbb{R}$ and the Lebesgue measure λ the following holds: Functions that are Riemann integrable on a compact interval [a, b] are Lebesgue integrable too, and the values of both integrals are the same. The inverse is not true: Not every Lebesgue integrable function is Riemann integrable (see the Dirichlet function below). A function that has an improper Riemann integral need not be Lebesgue integrable on the whole domain. We consider one example for each case:
(i) The Dirichlet (step) function, D(x), is the characteristic function of the rational numbers and assumes the value 1 for rational x and the value 0 for irrational x:

$D(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases}$ or $D(x) = \lim_{m\to\infty} \lim_{n\to\infty} \cos^{2n}(m!\, \pi x)$ .
16Pointwise convergence of a sequence of functions fn, limn→∞ fn = f pointwise is
fulfilled iff limn→∞ fn(x) = f(x) for every x in the domain.
Figure 2.11: The alternating harmonic series. The alternating harmonic step function, $h(x) = n_k = (-1)^{k+1}/k$ with $k - 1 \leq x < k$ and $k \in \mathbb{N}$, has an improper Riemann integral since $\sum_{k=1}^\infty n_k = \ln 2$. It is not Lebesgue integrable because the series $\sum_{k=1}^\infty |n_k|$ diverges.
D(x) is lacking Riemann integrability for every arbitrarily small interval:
Each partitioning S of the integration domain [a, b] into intervals [xk−1, xk]
leads to parts that contain necessarily at least one rational and one irrational
number. Hence the lower Darboux sum,
$\Sigma_{\rm low}(S) = \sum_{k=1}^n (x_k - x_{k-1}) \cdot \inf_{x_{k-1} < x < x_k} D(x) = 0$ ,

vanishes because the infimum is always zero, and the upper Darboux sum,

$\Sigma_{\rm high}(S) = \sum_{k=1}^n (x_k - x_{k-1}) \cdot \sup_{x_{k-1} < x < x_k} D(x) = b - a$ ,

is the length of the integration interval, b − a, because the supremum is always one and the summation runs over all partial intervals. Since Riemann integrability requires

$\sup_S \Sigma_{\rm low}(S) = \int_a^b f(x)\, dx = \inf_S \Sigma_{\rm high}(S)$ ,
D(x) cannot be Riemann integrated.
D(x), on the other hand, has a Lebesgue integral for every interval: D(x) is a non-negative simple function, and therefore we can write the Lebesgue integral over an interval S through sorting into irrational and rational numbers:

$\int_S D\, d\lambda = 0 \cdot \lambda(S \cap \mathbb{R}\setminus\mathbb{Q}) + 1 \cdot \lambda(S \cap \mathbb{Q})$ ,

with λ being the Lebesgue measure. The evaluation of the integral is straightforward: The first term vanishes no matter how large $\lambda(S \cap \mathbb{R}\setminus\mathbb{Q})$ is, because 0 · ∞ is zero by the convention of measure theory, and the second term is also zero because $\lambda(S \cap \mathbb{Q})$ is zero, since the set of rational numbers, $\mathbb{Q}$, is countable. Hence we have $\int_S D\, d\lambda = 0$.
(ii) The step function with alternatingly positive and negative areas of size $\frac{1}{n}$, i.e. $(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \ldots)$ (see figure 2.11), is an example of a function that has an improper Riemann integral whereas the Lebesgue integral diverges. The function $h(x) = (-1)^{k+1}/k$ with $k - 1 \leq x < k$ and $k \in \mathbb{N}$ yields a series of contributions of alternating sign on Riemann integration, with the finite sum

$\int_0^\infty h(x)\, dx = 1 - \frac{1}{2} + \frac{1}{3} - \ldots = \ln 2$ ,

whereas Lebesgue integrability of h requires $\int_{\mathbb{R}_+} |h|\, d\lambda < \infty$, and this is not fulfilled since the harmonic series, $\sum_{k=1}^\infty k^{-1}$, diverges.
The first case is the more important issue, since it makes use of the fact that the set of rational numbers, $\mathbb{Q}$, is of Lebesgue measure zero.
Finally, we introduce the Lebesgue-Stieltjes integral in a way that allows us to summarize the most important results of this subsection. For each right-continuous and monotonously increasing function $F: \mathbb{R} \to \mathbb{R}$ there exists a uniquely determined Lebesgue-Stieltjes measure $\lambda_F$ that fulfils

$\lambda_F\big((a, b]\big) = F(b) - F(a)$ for all $(a, b] \subset \mathbb{R}$ .
Such right-continuous and monotonously increasing functions $F: \mathbb{R} \to \mathbb{R}$ are therefore called measure generating. The Lebesgue integral of a $\lambda_F$-integrable function g is called the Lebesgue-Stieltjes integral,

$\int_A g\, d\lambda_F$ with $A \in \mathcal{B}$  (2.40)

being Borel measurable. If F is the identity function on $\mathbb{R}$,17 $F = {\rm id}: \mathbb{R} \to \mathbb{R}$, ${\rm id}(x) = x$, then the corresponding Lebesgue-Stieltjes measure is the Lebesgue measure itself: $\lambda_F = \lambda_{\rm id} = \lambda$. For (properly) Riemann integrable functions g we have stated that the Lebesgue integral is identical with the Riemann integral:

$\int_{[a,b]} g\, d\lambda = \int_a^b g(x)\, dx$ .
The interval $[a, b] = \{a \leq x \leq b\}$ is partitioned into a sequence $\sigma_n = (a = x_0^{(n)}, x_1^{(n)}, \ldots, x_r^{(n)} = b)$, where the superscript '(n)' indicates a Riemann sum with $|\sigma_n| \to 0$, and the Riemann integral on the right-hand side is replaced by the limit of the Riemann summation:

$\int_{[a,b]} g\, d\lambda = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big(x_k^{(n)} - x_{k-1}^{(n)}\big) = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big({\rm id}(x_k^{(n)}) - {\rm id}(x_{k-1}^{(n)})\big)$ .
The Lebesgue measure λ has been introduced above as the special case F = id, and therefore we find the Stieltjes-Lebesgue integral by replacing λ by $\lambda_F$ and 'id' by F:

$\int_{[a,b]} g\, d\lambda_F = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big(F(x_k^{(n)}) - F(x_{k-1}^{(n)})\big)$ .

The details of the derivation are found in [44, 45].
The details of the derivation are found in [44, 45].
17The identity function id(x).= x, it maps a domain, for example [a, b], point by point
onto itself.
In summary we define a Stieltjes-Lebesgue integral or F-integral as follows: Let $F, g: \mathbb{R} \to \mathbb{R}$ be two functions and let the interval [a, b] be partitioned by the sequence $\sigma = (a = x_0, x_1, \ldots, x_r = b)$; then we define

$\sum_\sigma g\, dF := \sum_{k=1}^r g(x_{k-1}) \big(F(x_k) - F(x_{k-1})\big)$ .

The function g is F-integrable on [a, b] if

$\int_a^b g\, dF := \lim_{|\sigma|\to 0} \sum_\sigma g\, dF$  (2.41)

exists in $\mathbb{R}$, and $\int_a^b g\, dF$ is called the Stieltjes-Lebesgue integral or F-integral of g. This formulation will be required for the presentation of the Itô integral used in Itô calculus in section 3.
2.4 Conditional probabilities and independence
The conventional probability has been defined on the entire sample space Ω, $P(A) = |A|/|\Omega| = \sum_{\omega\in A} P(\omega) \big/ \sum_{\omega\in\Omega} P(\omega)$.18 We shall now define a probability of set A relative to another set, say S. This means that we are interested in the proportional weight of the part of A in S, which is expressed by the intersection A ∩ S relative to S, and obtain

$\sum_{\omega\in A\cap S} P(\omega) \Big/ \sum_{\omega\in S} P(\omega)$ .

In other words, we switch from Ω to S as the new universe and consider the conditional probability of A relative to S:

$P(A|S) = \dfrac{P(A \cap S)}{P(S)} = \dfrac{P(AS)}{P(S)}$  (2.42)

provided P(S) ≠ 0. Apparently, the conditional probability vanishes when the intersection is empty: P(A|S) = 0 if A ∩ S = ∅. From here on we shall always use the short notation for the intersection, AS ≡ A ∩ S.
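Definition (2.42) is easily made concrete. The following sketch – an added example on the sample space of two dice, not taken from the text – conditions the event "the sum equals 8" on "the first die shows an even number":

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # sample space of two dice
A = {w for w in omega if sum(w) == 8}          # event A: the sum equals 8
S = {w for w in omega if w[0] % 2 == 0}        # event S: first die is even

P = lambda E: Fraction(len(E), len(omega))     # uniform probability measure
print(P(A & S) / P(S))   # P(A|S) = (3/36) / (18/36) = 1/6
print(P(A))              # unconditional probability: P(A) = 5/36
```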
Next we mention several simple but fundamental relations involving con-
ditional probabilities that we present here, in essence, without proof (for
details see [5], pp.111-144). For n arbitrary events Ai we have
P (A1, A2, . . . , An) = P (A1)P (A2|A1)P (A3|A1A2) . . . P (An|A1A2 . . . An−1)
provided P (A1A2 . . . An−1) > 0. Under this proviso all conditional probabil-
ities are well defined since
P (A1) ≥ P (A1A2) ≥ . . . ≥ P (A1A2 . . . An−1) > 0 .
Let us assume that the sample space Ω is partitioned into n disjoint sets, $\Omega = \sum_n A_n$. For any set B we then have

$P(B) = \sum_n P(A_n)\, P(B|A_n)$ .

18 The sample space Ω is assumed to be countable and the weight $P(\omega) = P(\{\omega\})$ is assigned to every point. Generalization to Lebesgue measures is straightforward.
From this relation it is straightforward to derive the conditional probability

$P(A_j|B) = \dfrac{P(A_j)\, P(B|A_j)}{\sum_n P(A_n)\, P(B|A_n)}$ , provided P(B) > 0.
Independence of random variables will be a highly relevant problem in the forthcoming chapters. Countably-valued random variables $X_1, \ldots, X_n$ are defined to be independent if and only if for any combination $x_1, \ldots, x_n$ of real numbers the joint probabilities can be factorized:

$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot \ldots \cdot P(X_n = x_n)$ .  (2.43)

A major extension of equation (2.43) replaces the single values $x_i$ by arbitrary sets $S_i$:

$P(X_1 \in S_1, \ldots, X_n \in S_n) = P(X_1 \in S_1) \cdot \ldots \cdot P(X_n \in S_n)$ .
In order to prove this extension we sum over all points belonging to the sets $S_1, \ldots, S_n$:

$\sum_{x_1\in S_1} \cdots \sum_{x_n\in S_n} P(X_1 = x_1, \ldots, X_n = x_n) = \sum_{x_1\in S_1} \cdots \sum_{x_n\in S_n} P(X_1 = x_1) \cdot \ldots \cdot P(X_n = x_n) = \Big(\sum_{x_1\in S_1} P(X_1 = x_1)\Big) \cdot \ldots \cdot \Big(\sum_{x_n\in S_n} P(X_n = x_n)\Big)$ ,

which is equal to the right-hand side of the equation to be proven.

Since the factorization is fulfilled for arbitrary sets $S_1, \ldots, S_n$, it holds also for all subsets of $(X_1, \ldots, X_n)$, and accordingly the events

$\{X_1 \in S_1\}, \ldots, \{X_n \in S_n\}$

are also independent. It can also be verified that for arbitrary real-valued functions $\varphi_1, \ldots, \varphi_n$ on $(-\infty, +\infty)$ the random variables $\varphi_1(X_1), \ldots, \varphi_n(X_n)$ are independent too.
Independence can be extended in a straightforward manner to the joint distribution function of the random vector $(X_1, \ldots, X_n)$:

$F(x_1, \ldots, x_n) = F_1(x_1) \cdot \ldots \cdot F_n(x_n)$ ,

where the $F_j$'s are the marginal distributions of the $X_j$'s, $1 \leq j \leq n$. Thus, the marginal distributions determine the joint distribution in case of independence of the random variables.
For the continuous case we can formulate the definition of independence for the sets $S_1, \ldots, S_n$ forming a Borel family. In particular, if there is a joint density function f, then we have

$P(X_1 \in S_1, \ldots, X_n \in S_n) = \Big(\int_{S_1} f_1(u)\, du\Big) \cdot \ldots \cdot \Big(\int_{S_n} f_n(u)\, du\Big) = \int_{S_1} \cdots \int_{S_n} f_1(u_1) \ldots f_n(u_n)\, du_1 \ldots du_n$ ,

where $f_1, \ldots, f_n$ are the marginal densities. The probability is also equal to

$\int_{S_1} \cdots \int_{S_n} f(u_1, \ldots, u_n)\, du_1 \ldots du_n$ ,

and hence we finally find for the density case:

$f(u_1, \ldots, u_n) = f_1(u_1) \ldots f_n(u_n)$ .  (2.44)
As we have seen here, stochastic independence makes it possible to factorize
joint probabilities, distributions or densities.
Applications of conditional probabilities to problems in biology are found
in chapter 5. Genetics was indeed one of the first cases in science where
probabilities were used in the interpretation of experimental results (see, for
example, the works of Gregor Mendel as described in [46, 47] and section 1.2).
Finally, we mention that a whole branch of probability theory, Bayesian statistics, is based on conditional probabilities. It is named after the English mathematician and Presbyterian minister Thomas Bayes, who initiated an alternative way to think about probabilities by the formulation of Bayes's theorem about hypothesis H and data D [48]:

$P(H|D) = \dfrac{P(D|H)\, P(H)}{P(D)}$ ,  (2.45)

wherein P(H) is the prior probability that H is correct before the data are seen; P(D) is the marginal probability of D, giving the probability of witnessing the data under all possible hypotheses, and as such it depends on the prior probabilities given to them,

$P(D) = \sum_i P(D, H_i) = \sum_i P(D|H_i)\, P(H_i)$ ;

P(D|H) is the conditional probability of seeing the data D given that the hypothesis H is true; and P(H|D) eventually is the posterior probability, which is the probability that the hypothesis H is true given the data D and the previous state of belief about the hypothesis (for the current status of Bayesian statistics see, for example, [49–51]). Bayesian statistics is thus dealing with statistical inference rather than the conventional frequency-based interpretation of probabilities and accordingly is much closer to formal logic.
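A minimal numerical sketch of Bayes's theorem (2.45), with two hypothetical hypotheses – a fair coin H1 and a biased coin H2 with P(head) = 4/5 – updated on the data D of three heads in a row:

```python
from fractions import Fraction

prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}                 # P(H)
likelihood = {"H1": Fraction(1, 2)**3, "H2": Fraction(4, 5)**3}      # P(D|H)

evidence = sum(prior[h] * likelihood[h] for h in prior)              # P(D)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}  # P(H|D)
print(posterior)   # H2 is now roughly four times as probable as H1
```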
2.5 Expectation values and higher moments
Random variables are accessible to analysis via their probabilities. In addi-
tion, straightforward information can be derived also from ensembles defined
on the entire sample space Ω. The most important example is the expectation value, $E(X) = \langle X \rangle$. We start with a countable sample space:

$E(X) = \sum_{\omega\in\Omega} X(\omega)\, P(\omega) = \sum_n p_n v_n$ .  (2.46)

In the special case of a random variable X on $\mathbb{N}_0$ we have

$E(X) = \sum_{n=0}^\infty n\, p_n$ .

The expectation value (2.46) exists when the series converges in absolute values, $\sum_{\omega\in\Omega} |X(\omega)|\, P(\omega) < \infty$. Whenever the random variable X is bounded, which means that there exists a number m such that $|X(\omega)| \leq m$ for all $\omega \in \Omega$, then it is summable and in fact

$E(|X|) = \sum_\omega |X(\omega)|\, P(\omega) \leq m \sum_\omega P(\omega) = m$ .
It is straightforward to show that the sum of two random variables, X + Y, is summable iff X and Y are summable:

$E(X + Y) = E(X) + E(Y)$ .

The relation can be extended to an arbitrary countable number of random variables:

$E\Big(\sum_{k=1}^n X_k\Big) = \sum_{k=1}^n E(X_k)$ .

In addition, the expectation values fulfill the relations E(a) = a and $E(aX) = a \cdot E(X)$, which can be combined into

$E\Big(\sum_{k=1}^n a_k X_k\Big) = \sum_{k=1}^n a_k \cdot E(X_k)$ .  (2.47)

Thus, E(·) is a linear operator.
For a random variable X on an arbitrary sample space Ω the expectation value may be written as an abstract integral on Ω or – provided the density f(u) exists and we know it – as an integral over $\mathbb{R}$:

$E(X) = \int_\Omega X(\omega)\, d\omega = \int_{-\infty}^{+\infty} u\, f(u)\, du$ .  (2.48)

It is worthwhile to reconsider the discretization of a continuous density in this context (see figure 2.9 and section 2.3): The discrete expression for the expectation value is based upon $p_n = P(Y = x_n)$ as outlined in equations (2.29) and (2.29'),

$E(Y) = \sum_n x_n p_n \approx E(X) = \int_{-\infty}^{+\infty} u\, f(u)\, du$ ,

and approximates the exact value in the sense of an approximation to the Riemann integral.
2.5.1 First and second moments
For two or more variables, for example (X, Y) described by a joint density f(u, v), we have

$E(X) = \int_{-\infty}^{+\infty} u\, f(u, *)\, du$ and $E(Y) = \int_{-\infty}^{+\infty} v\, f(*, v)\, dv$ .

The expectation value of the sum of the variables, X + Y, can be evaluated by iterated integration:

$E(X + Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (u + v)\, f(u, v)\, du\, dv = \int_{-\infty}^{+\infty} u\, du \Big(\int_{-\infty}^{+\infty} f(u, v)\, dv\Big) + \int_{-\infty}^{+\infty} v\, dv \Big(\int_{-\infty}^{+\infty} f(u, v)\, du\Big) = \int_{-\infty}^{+\infty} u\, f(u, *)\, du + \int_{-\infty}^{+\infty} v\, f(*, v)\, dv = E(X) + E(Y)$ ,

which establishes the previously derived expression.
The multiplication theorem of probability theory requires that the two variables X and Y are independent and summable; this implies for the discrete and the continuous case,

$E(X \cdot Y) = E(X) \cdot E(Y)$ and  (2.49a)

$E(X \cdot Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} u v\, f(u, v)\, du\, dv = \int_{-\infty}^{+\infty} u\, f(u, *)\, du \int_{-\infty}^{+\infty} v\, f(*, v)\, dv = E(X) \cdot E(Y)$ ,  (2.49b)

respectively. The multiplication theorem is easily extended to any finite number of independent and summable random variables:

$E(X_1 \cdot \ldots \cdot X_n) = E(X_1) \cdot \ldots \cdot E(X_n)$ .  (2.49c)
Let us now consider the expectation values of special functions of random variables, in particular their powers, which give rise to the moments of the probability distribution. For a random variable X we distinguish the r-th raw moments $E(X^r)$ and the so-called centered moments19 $\mu_r = E(\bar{X}^r)$ referring to the random variable

$\bar{X} = X - E(X)$ .

Clearly, the first raw moment is the expectation value and the first centered moment vanishes, $E(\bar{X}) = \mu_1 = 0$. Often the expectation value is denoted by $\mu = E(X)$, a notation that we shall use too for the sake of convenience, but it is important not to confuse µ with the first centered moment $\mu_1$.

In general, a moment is defined about some point a by means of the random variable

$X^{(a)} = X - a$ .

19 Since the moments centered around the expectation value will be used more frequently than the raw moments, we denote them by $\mu_r$ and the raw moments by $\alpha_r$. The r-th moment of a distribution is also called the moment of order r.
For a = 0 we obtain the raw moments,

$\alpha_r = E(X^r)$ ,  (2.50)

whereas a = E(X) yields the centered moments. The general expressions for the r-th raw and centered moments as derived from the density f(u) are

$E(X^r) = \alpha_r(X) = \int_{-\infty}^{+\infty} u^r f(u)\, du$ and  (2.51a)

$E(\bar{X}^r) = \mu_r(X) = \int_{-\infty}^{+\infty} (u - \mu)^r f(u)\, du$ .  (2.51b)
The second centered moment is called the variance, $\sigma^2(X)$, and its positive square root the standard deviation, $\sigma(X)$. The variance is always a non-negative quantity, as can easily be shown. Further we can derive:

$\sigma^2(X) = E(\bar{X}^2) = E\big((X - E(X))^2\big) = E\big(X^2 - 2X E(X) + E(X)^2\big) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2$ .  (2.52)

If $E(X^2)$ is finite, then $E(|X|)$ is finite too and fulfils the inequality

$E(|X|)^2 \leq E(X^2)$ ,

and since $E(X)^2 \leq E(|X|)^2$ the variance is necessarily a non-negative quantity, $\sigma^2(X) \geq 0$.
If X and Y are independent and have finite variances, then we obtain

$\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y)$ ,

as follows readily by a simple calculation with the centered variables:

$E\big((\bar{X} + \bar{Y})^2\big) = E\big(\bar{X}^2 + 2\bar{X}\bar{Y} + \bar{Y}^2\big) = E(\bar{X}^2) + 2E(\bar{X})E(\bar{Y}) + E(\bar{Y}^2) = E(\bar{X}^2) + E(\bar{Y}^2)$ .

Here we have used the fact that the first centered moments vanish: $E(\bar{X}) = E(\bar{Y}) = 0$.
For two general – not necessarily independent – random variables X and Y, the Cauchy-Schwarz inequality holds for the mixed expectation value:

$E(XY)^2 \leq E(X^2)\, E(Y^2)$ .  (2.53)

If both random variables have finite variances, the covariance is defined by

${\rm Cov}(X, Y) = E\big((X - E(X))(Y - E(Y))\big) = E\big(XY - X E(Y) - E(X)Y + E(X)E(Y)\big) = E(XY) - E(X)E(Y)$ .  (2.54)

The covariance Cov(X, Y) and the coefficient of correlation ρ(X, Y),

${\rm Cov}(X, Y) = E(XY) - E(X)E(Y)$ and $\rho(X, Y) = \dfrac{{\rm Cov}(X, Y)}{\sigma(X)\, \sigma(Y)}$ ,  (2.54')
are a measure of correlation between the two variables. As a consequence
of the Cauchy-Schwarz inequality we have −1 ≤ ρ(X ,Y) ≤ 1. If covariance
and correlation coefficient are equal to zero, the two random variables X and
Y are uncorrelated. Independence implies lack of correlation but the latter
is in general the weaker property.
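Covariance and correlation (2.54') are readily estimated from samples, and the distinction between lack of correlation and independence can be seen directly. A sketch with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100000)
y = 2.0 * x + rng.normal(size=100000)   # linearly dependent on x

print(np.cov(x, y)[0, 1])               # Cov(X,Y), close to 2
print(np.corrcoef(x, y)[0, 1])          # rho, close to 2/sqrt(5) = 0.894
print(np.corrcoef(x, x**2)[0, 1])       # near 0: uncorrelated, yet dependent
```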
In addition to the expectation value two more quantities are used to characterize the center of probability distributions (figure 2.12): (i) The median $\bar{\mu}$ is the value at which the number of points of a distribution at lower values matches exactly the number of points at higher values, as expressed in terms of two inequalities,

$P(X \leq \bar{\mu}) \geq \frac{1}{2}$ and $P(X \geq \bar{\mu}) \geq \frac{1}{2}$ , or $\int_{-\infty}^{\bar{\mu}} dF(x) \geq \frac{1}{2}$ and $\int_{\bar{\mu}}^{+\infty} dF(x) \geq \frac{1}{2}$ ,  (2.55)

where Lebesgue-Stieltjes integration is applied; in case of an absolutely continuous distribution the condition simplifies to

$P(X \leq \bar{\mu}) = P(X \geq \bar{\mu}) = \int_{-\infty}^{\bar{\mu}} f(x)\, dx = \frac{1}{2}$ ,  (2.55')
Figure 2.12: Probability densities and moments. As an example of an asymmetric distribution with highly different values for mode, median, and mean, the lognormal density $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\big(-(\ln x - \nu)^2/(2\sigma^2)\big)$ is shown. Parameter values $\sigma = \sqrt{\ln 2}$ and $\nu = \ln 2$ were chosen; they yield $\tilde{\mu} = \exp(\nu - \sigma^2) = 1$ for the mode, $\bar{\mu} = \exp(\nu) = 2$ for the median, and $\mu = \exp(\nu + \sigma^2/2) = 2\sqrt{2}$ for the mean, respectively. The sequence mode < median < mean is characteristic for distributions with positive skewness, whereas the opposite sequence mean < median < mode is found in cases of negative skewness (see also figure 2.14).
and (ii) the mode $\tilde{\mu}$ of a distribution is the most frequent value – the value that is most likely to be obtained through sampling – and it is given by the maximum of the probability mass function for discrete distributions or by the maximum of the probability density in the continuous case. An illustrative example for the discrete case is the probability mass function of the scores for throwing two dice (the mode in the lower part of figure 2.7 is $\tilde{\mu} = 7$). A probability distribution may have more than one mode. Bimodal distributions occur occasionally, and then the two modes provide much more information on the expected outcomes than mean or median (see also subsection 2.7.9).
For many purposes a generalization of the median from two to n equally
Figure 2.13: Definition and determination of quantiles. A quantile q with $p_q = k/n$ defines a value $x_q$ at which the (cumulative) probability distribution reaches the value $F(x_q) = p_q$ corresponding to $P(X < x) \leq p$. The figure shows how the position of the quantile $p_q = k/n$ is used to determine its value $x_q(p)$. In particular we use here the normal distribution Φ(x) as function F(x), and the computation yields $\Phi(x_q) = \frac{1}{2}\big(1 + {\rm erf}\big(\frac{x_q - \nu}{\sqrt{2\sigma^2}}\big)\big) = p_q$. Parameter choice: ν = 2, $\sigma^2 = \frac{1}{2}$, and for the quantile (n = 5, k = 2), yielding $p_q = 2/5$ and $x_q = 1.8209$.
sized data sets is useful. The quantiles are points taken at regular intervals from the cumulative distribution function F(x) of a random variable X. Ordered data are divided into n essentially equal-sized subsets, and accordingly (n − 1) points on the x-axis separate the subsets. Then, the k-th n-quantile is defined by $P(X < x) \leq \frac{k}{n} = p$, or equivalently

$F^{-1}(p) := \inf\{x \in \mathbb{R} : F(x) \geq p\}$ and $p = \int_{-\infty}^x dF(u)$ .  (2.56)

In case the random variable has a probability density the integral simplifies to $p = \int_{-\infty}^x f(u)\, du$. The median is simply the value of x for $p = \frac{1}{2}$. For partitioning into four parts we have the first or lower quartile at $p = \frac{1}{4}$, the second quartile or median at $p = \frac{1}{2}$, and the third or upper quartile at $p = \frac{3}{4}$. The lower quartile contains 25% of the data, the median 50%, and the upper quartile eventually 75%.
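Quantiles in the sense of (2.56) are the generalized inverse of F. A short sketch, reproducing the numbers assumed in figure 2.13:

```python
import numpy as np
from scipy.stats import norm

nu, sigma2 = 2.0, 0.5
n, k = 5, 2
xq = norm.ppf(k / n, loc=nu, scale=np.sqrt(sigma2))   # x_q = F^{-1}(2/5)
print(xq)   # 1.8209..., the value quoted in the caption of figure 2.13

data = norm.rvs(loc=nu, scale=np.sqrt(sigma2), size=100000, random_state=3)
print(np.quantile(data, [0.25, 0.5, 0.75]))           # empirical quartiles
```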
The interpretation of expectation value and variance or standard devia-
tion is straightforward: The expectation value is the mean or average value
of a distribution and the variance measures the width.
2.5.2 Higher moments
Two other quantities related to higher moments are frequently used for a more detailed characterization of probability distributions:20 (i) the skewness,

$\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\mu_3}{\sigma^3} = \dfrac{E\big((X - E(X))^3\big)}{\Big(E\big((X - E(X))^2\big)\Big)^{3/2}}$ ,  (2.57)

and (ii) the kurtosis, which is either defined as the fourth standardized moment $\beta_2$ or, in terms of cumulants, given as the excess kurtosis $\gamma_2$:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{\mu_4}{\sigma^4} = \dfrac{E\big((X - E(X))^4\big)}{\Big(E\big((X - E(X))^2\big)\Big)^2}$ and $\gamma_2 = \dfrac{\kappa_4}{\kappa_2^2} = \dfrac{\mu_4}{\sigma^4} - 3 = \beta_2 - 3$ .  (2.58)
Skewness is a measure of the asymmetry of the probability density: curves that are symmetric about the mean have zero skew, negative skew implies a longer left tail of the distribution, and positive skew is characteristic for a distribution with a longer right tail. Kurtosis characterizes the degree of peakedness of a distribution. High kurtosis implies a sharper peak and fatter tails, whereas low kurtosis characterizes flat or round peaks and thinner tails. Distributions are called leptokurtic if they have a positive excess kurtosis and therefore a sharper peak and thicker tails than the normal distribution, which is taken as a reference with zero excess kurtosis (see section 2.7); they are characterized as platykurtic if they have a negative excess kurtosis, a broader peak, and thinner tails (see figure 2.14). One property of skewness and kurtosis, which follows from their definition, is important
20In contrast to expectation value, variance and standard deviation, skewness and kur-
tosis are not uniquely defined and it is necessary therefore to check carefully the author’s
definitions when reading text from literature.
Figure 2.14: Skewness and kurtosis. The upper part of the figure illustrates the sign of skewness with asymmetric density functions. The examples are taken from the binomial distribution $B_k(n, p)$: $\gamma_1 = (1 - 2p)/\sqrt{np(1-p)}$ with p = 0.1 (red), 0.5 (black; symmetric), and 0.9 (blue), with the values $\gamma_1$ = 0.596, 0, −0.596. Densities with different kurtosis are compared in the lower part of the figure: The Laplace distribution (D, red), the hyperbolic secant distribution (S, orange), and the logistic distribution (L, green) are leptokurtic with excess kurtosis values 3, 2, and 1.2, respectively. The normal distribution is the reference curve with excess kurtosis 0 (N, black). The raised cosine distribution (C, cyan), the Wigner semicircle distribution (W, blue), and the uniform distribution (U, magenta) are platykurtic with excess kurtosis values of −0.593762, −1, and −1.2, respectively (the picture is taken from http://en.wikipedia.org/wiki/Kurtosis, March 30, 2011).
to note: The expectation value, the standard deviation, and the variance are
quantities with dimensions, mass, length and/or time, whereas skewness and
kurtosis are dimensionless numbers.
The cumulants $\kappa_i$ are the coefficients of a series expansion of the logarithm of the characteristic function (see section 2.7), which in turn is the Fourier transform of the probability density function f(x):

$\phi(z) = \int_{-\infty}^{+\infty} \exp(i z x)\, f(x)\, dx$ and $\ln \phi(z) = \sum_{i=1}^\infty \kappa_i \dfrac{(i z)^i}{i!}$ .  (2.59)

The first five cumulants $\kappa_i$ (i = 1, ..., 5), expressed in terms of the expectation value µ and the central moments $\mu_i$ ($\mu_1 = 0$), are

$\kappa_1 = \mu$ ,
$\kappa_2 = \mu_2$ ,
$\kappa_3 = \mu_3$ ,
$\kappa_4 = \mu_4 - 3\mu_2^2$ , and
$\kappa_5 = \mu_5 - 10\mu_2\mu_3$ .
We shall come back to the use of cumulants κi in the next section 2.6 when
we apply k-statistics in order to compute empirical moments from incomplete
data sets and in section 2.7 where we shall compare frequently used individual
probability densities.
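The cumulant relations can be checked numerically; the sketch below – an added illustration with normally distributed samples, for which all cumulants beyond $\kappa_2$ vanish – estimates $\kappa_3 = \mu_3$ and $\kappa_4 = \mu_4 - 3\mu_2^2$:

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

mu = x.mean()
m = lambda r: np.mean((x - mu) ** r)   # estimated central moments mu_r
print(mu, m(2))                        # kappa_1 ~ 1, kappa_2 ~ 4
print(m(3))                            # kappa_3 ~ 0
print(m(4) - 3 * m(2) ** 2)            # kappa_4 ~ 0
```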
2.6 Mathematical statistics
Although mathematical statistics is a discipline in its own right and would
require a separate course, we mention here briefly the basic concept which
is of general importance for every scientist.21 In practice, we can collect
data for all sample points of the sample space Ω only in very exceptional
cases. Otherwise we have to rely on limited samples as they are obtained in
experiments or in opinion polls. Among other things mathematical statistics
is dealing extensively with the problem of incomplete data sets.
For a given incomplete random sample (X1, . . . ,Xn) some function Z is
evaluated and yields a random variable Z = Z(X1, . . . ,Xn) as output. From
a limited set of data, $x = \{x_1, x_2, \ldots, x_n\}$, sample expectation values (also called sample means), sample variances, sample standard deviations, or other quantities are computed as estimators in the same way as if the sample set covered the complete sample space. In particular we compute the
sample mean,

$m = \dfrac{1}{n} \sum_{i=1}^n x_i$ ,  (2.60)

and the moments around the sample mean. For the sample variance we obtain

$m_2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \Big(\dfrac{1}{n} \sum_{i=1}^n x_i\Big)^2$ ,  (2.61)
21For the reader who is interested in more details on mathematical statistics we recom-
mend the textbook by Fisz [10] and the comprehensive treatise by Stuart and Ord [25, 26]
which is a new edition of Kendall’s classic on statistics. The monograph by Cooper [52]
is particularly addressed to experimentalists practicing statistics. A great variety of other
and equally well suitable texts are, of course, available in the rich literature on mathemat-
ical statistics.
and for the third and fourth moments after some calculations

$m_3 = \dfrac{1}{n} \sum_{i=1}^n x_i^3 - \dfrac{3}{n^2} \Big(\sum_{i=1}^n x_i\Big) \Big(\sum_{j=1}^n x_j^2\Big) + \dfrac{2}{n^3} \Big(\sum_{i=1}^n x_i\Big)^3$ ,  (2.62a)

$m_4 = \dfrac{1}{n} \sum_{i=1}^n x_i^4 - \dfrac{4}{n^2} \Big(\sum_{i=1}^n x_i\Big) \Big(\sum_{j=1}^n x_j^3\Big) + \dfrac{6}{n^3} \Big(\sum_{i=1}^n x_i\Big)^2 \Big(\sum_{j=1}^n x_j^2\Big) - \dfrac{3}{n^4} \Big(\sum_{i=1}^n x_i\Big)^4$ .  (2.62b)
These naïve estimators, $m_i$ (i = 2, 3, 4, ...), contain a bias because the exact expectation value µ around which the moments are centered is not known and has to be approximated by the sample mean m. For the variance we illustrate the systematic deviation by calculating a correction factor known as Bessel's correction, but more properly attributed to Gauss [53, part 2, p.161]. In order to obtain an expectation value for the sample moments we repeat the drawing of samples with n elements and denote their expectation values by $\langle m_i \rangle$. In particular we have
$m_2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \Big(\dfrac{1}{n} \sum_{i=1}^n x_i\Big)^2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \dfrac{1}{n^2} \Big(\sum_{i=1}^n x_i^2 + \sum_{i,j=1,\, i\neq j}^n x_i x_j\Big) = \dfrac{n-1}{n^2} \sum_{i=1}^n x_i^2 - \dfrac{1}{n^2} \sum_{i,j=1,\, i\neq j}^n x_i x_j$ .

The expectation value is now of the form

$\langle m_2 \rangle = \dfrac{n-1}{n} \Big\langle \dfrac{1}{n} \sum_{i=1}^n x_i^2 \Big\rangle - \dfrac{1}{n^2} \Big\langle \sum_{i,j=1,\, i\neq j}^n x_i x_j \Big\rangle$ ,

and by using $\langle x_i x_j \rangle = \langle x_i \rangle \langle x_j \rangle = \langle x_i \rangle^2$ we find

$\langle m_2 \rangle = \dfrac{n-1}{n} \Big\langle \dfrac{1}{n} \sum_{i=1}^n x_i^2 \Big\rangle - \dfrac{n(n-1)}{n^2} \langle x_i \rangle^2 = \dfrac{n-1}{n} \alpha_2 - \dfrac{n(n-1)}{n^2} \mu^2 = \dfrac{n-1}{n} (\alpha_2 - \mu^2)$ ,

where $\alpha_2$ is the second moment about zero. Using the identity $\alpha_2 = \mu_2 + \mu^2$ we find eventually

$\langle m_2 \rangle = \dfrac{n-1}{n} \mu_2$ and ${\rm var}(x) = \dfrac{1}{n-1} \sum_{i=1}^n (x_i - m)^2$ .  (2.63)
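The bias factor (n − 1)/n in (2.63) shows up immediately in simulation. A sketch, assuming standard normal samples of size n = 5 so that the true variance is 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n, repeats = 5, 200000
samples = rng.normal(size=(repeats, n))

m2 = samples.var(axis=1, ddof=0)    # naive estimator m_2
var = samples.var(axis=1, ddof=1)   # Bessel-corrected variance, divisor n - 1
print(m2.mean())    # ~ (n-1)/n = 0.8
print(var.mean())   # ~ 1.0, unbiased
```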
Further useful measures of correlation between pairs of random variables can be derived straightforwardly: (i) the unbiased sample covariance,

$M_{XY} = \dfrac{1}{n-1} \sum_{i=1}^n (x_i - m_x)(y_i - m_y)$ ,  (2.64)

and (ii) the sample correlation coefficient,

$R_{XY} = \dfrac{\sum_{i=1}^n (x_i - m_x)(y_i - m_y)}{\sqrt{\big(\sum_{i=1}^n (x_i - m_x)^2\big) \big(\sum_{i=1}^n (y_i - m_y)^2\big)}}$ ,  (2.65)

where $m_x$ and $m_y$ are the sample means of the two data series.
For practical purposes Bessel's correction is often unimportant when the data sets are sufficiently large, but the recognition of the principle is important, in particular for statistical properties more involved than variances. Sometimes a problem is encountered in cases where the second moment of a distribution, $\mu_2$, does not exist, that is, it diverges. Then computing variances from incomplete data sets is also unstable, and one may choose the mean absolute deviation,

$D(X) = \dfrac{1}{n} \sum_{i=1}^n |X_i - m|$ ,  (2.66)

as a measure for the width of the distribution [54, pp.455-459], because it is commonly more robust than variance or standard deviation.
In order to derive unbiased estimators for the cumulants of a probability distribution, $\langle k_i \rangle = \kappa_i$, k-statistics has been invented [55, pp.99-100]. The first four terms of k-statistics for n sample points are

$k_1 = m$ ,
$k_2 = \dfrac{n}{n-1}\, m_2$ ,
$k_3 = \dfrac{n^2}{(n-1)(n-2)}\, m_3$ , and
$k_4 = \dfrac{n^2 \big((n+1)\, m_4 - 3(n-1)\, m_2^2\big)}{(n-1)(n-2)(n-3)}$ ,  (2.67)

which result from inversion of the relationships

$\langle m \rangle = \mu$ ,
$\langle m_2 \rangle = \dfrac{n-1}{n}\, \mu_2$ ,
$\langle m_3 \rangle = \dfrac{(n-1)(n-2)}{n^2}\, \mu_3$ ,
$\langle m_2^2 \rangle = \dfrac{(n-1)\big((n-1)\mu_4 + (n^2 - 2n + 3)\mu_2^2\big)}{n^3}$ , and
$\langle m_4 \rangle = \dfrac{(n-1)\big((n^2 - 3n + 3)\mu_4 + 3(2n-3)\mu_2^2\big)}{n^3}$ .  (2.68)
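Equations (2.67) translate directly into code. The sketch below computes $k_1$ to $k_4$ from the sample moments and compares them with scipy's built-in k-statistics (the exponential test data are an arbitrary choice):

```python
import numpy as np
from scipy.stats import kstat

def k_statistics(x):
    n = len(x)
    m = lambda r: np.mean((x - x.mean()) ** r)   # sample moments m_r
    k1 = x.mean()
    k2 = n / (n - 1) * m(2)
    k3 = n**2 / ((n - 1) * (n - 2)) * m(3)
    k4 = (n**2 * ((n + 1) * m(4) - 3 * (n - 1) * m(2)**2)
          / ((n - 1) * (n - 2) * (n - 3)))
    return k1, k2, k3, k4

x = np.random.default_rng(2).exponential(size=1000)
print(k_statistics(x))
print([kstat(x, r) for r in (1, 2, 3, 4)])   # agrees term by term
```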
The usefulness of these relations becomes evident in various applications. The statistician computes moments and other functions from his empirical data sets, for example $\{x_1, \ldots, x_n\}$ or $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, by means of the equations (2.60) and (2.63) to (2.65). The main issue of mathematical statistics, however, has always been and still is the development of independent tests that allow for the derivation of information on the quality of data. The underlying assumption commonly is that the values of the empirical functions converge to the corresponding exact moments as the random sample increases. Predictions on the reliability of the computed values are made by means of a great variety of tools. We dispense with details, which are extensively treated in the literature [10, 25, 26].
2.7 Distributions, densities and generating functions
In this section we introduce a few frequently used probability distributions
and analyze their properties. For this goal we define and make use of two
auxiliary functions, which allow for the derivation of compact representa-
tions of the distributions and which provide convenient tools for handling
functions of probabilities. In particular we shall make use of probability gen-
erating functions, g(s), moment generating functions M(s) and characteristic
functions φ(s). The characteristic function φ(s) exists for all distributions
but we shall encounter cases where no generating function exists (see, for
example, the Cauchy-Lorentz distribution in subsection 2.7.8). In addition
to the three generating functions mentioned here other functions are in use
as well. An example is the cumulant generating function that is lacking a
uniform definition. It is either the logarithm of the moment generation or
the logarithm of the characteristic function.
2.7.1 Probability generating functions
Let X be a random variable taking only non-negative integer values with a
probability distribution given by
P (X = j) = aj ; j = 0, 1, 2, . . . . (2.69)
A dummy variable s is introduced and the probability generating function is expressed by an infinite power series,

$g(s) = a_0 + a_1 s + a_2 s^2 + \ldots = \sum_{j=0}^\infty a_j s^j$ .  (2.70)

As we shall show later, the full information on the probability distribution is encapsulated in the coefficients $a_j$ (j ≥ 0). In most cases s is a real variable, although it can be of advantage to consider also complex s. Recalling that $\sum_j a_j = 1$ we verify easily that the power series (2.70) converges for |s| ≤ 1:

$|g(s)| \leq \sum_j |a_j| \cdot |s|^j \leq \sum_j a_j = 1$ , for |s| ≤ 1 .
For |s| < 1 we can differentiate the series term by term to calculate the derivatives of the generating function g:

$g'(s) = \dfrac{dg}{ds} = a_1 + 2a_2 s + 3a_3 s^2 + \ldots = \sum_{n=1}^\infty n\, a_n s^{n-1}$ ,

$g''(s) = \dfrac{d^2 g}{ds^2} = 2a_2 + 6a_3 s + \ldots = \sum_{n=2}^\infty n(n-1)\, a_n s^{n-2}$ ,

and, in general, we have

$g^{(j)}(s) = \dfrac{d^j g}{ds^j} = \sum_{n=j}^\infty n(n-1)\ldots(n-j+1)\, a_n s^{n-j} = \sum_{n=j}^\infty \binom{n}{j} j!\, a_n s^{n-j}$ .

Setting s = 0, all terms vanish except the constant term:

$g^{(j)}(0) = j!\, a_j$ or $a_j = \dfrac{1}{j!}\, g^{(j)}(0)$ .

In this way all $a_j$'s may be obtained by consecutive differentiation of the generating function, and alternatively the generating function can be determined from the known probability distribution.
Putting s = 1 in g'(s) and g''(s) we can compute the first and second moments of the distribution of X:

$g'(1) = \sum_{n=0}^\infty n\, a_n = E(X)$ ,
$g''(1) = \sum_{n=0}^\infty n^2 a_n - \sum_{n=0}^\infty n\, a_n = E(X^2) - E(X)$ , and hence
$E(X) = g'(1)$ and $E(X^2) = g'(1) + g''(1)$ .  (2.71)

We summarize: The probability distribution of a non-negative integer valued random variable can be converted into a generating function without losing information. The generating function is uniquely determined by the distribution and vice versa.
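The correspondence between distribution and generating function can be exercised symbolically. A sketch using sympy with, as an anticipatory example, the Poisson generating function $g(s) = e^{\alpha(s-1)}$ from subsection 2.7.4:

```python
import sympy as sp

s, alpha = sp.symbols('s alpha', positive=True)
g = sp.exp(alpha * (s - 1))         # probability generating function

# a_j = g^(j)(0) / j! recovers the probabilities of the distribution
a = [sp.simplify(g.diff(s, j).subs(s, 0) / sp.factorial(j)) for j in range(4)]
print(a)                            # alpha**j * exp(-alpha) / j!

# first and second moments via equation (2.71)
mean = g.diff(s).subs(s, 1)
second = mean + g.diff(s, 2).subs(s, 1)
print(mean, sp.expand(second))      # alpha and alpha + alpha**2
```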
2.7.2 Moment generating functions
The basis of the moment generating function is the series expansion of the exponential of the random variable X:

$e^{Xs} = 1 + Xs + \dfrac{X^2}{2!} s^2 + \dfrac{X^3}{3!} s^3 + \ldots$ .

The moment generating function allows for direct computation of the moments of a probability distribution as defined in equation (2.69), since we have

$M_X(s) = E(e^{Xs}) = 1 + \alpha_1 s + \dfrac{\alpha_2}{2!} s^2 + \dfrac{\alpha_3}{3!} s^3 + \ldots$ ,  (2.72)

wherein $\alpha_i$ is the i-th raw moment. The moments are obtained by differentiating $M_X(s)$ i times with respect to s and then setting s = 0:

$E(X^n) = \alpha_n = M_X^{(n)}(0) = \dfrac{d^n M_X}{ds^n}\Big|_{s=0}$ .

A probability distribution thus has (at least) as many moments as the number of times the moment generating function can be continuously differentiated (see also the characteristic function in subsection 2.7.3). If two distributions have the same moment generating function they are identical at all points:

$M_X(s) = M_Y(s) \iff F_X(x) = F_Y(x)$ .

This statement, however, does not imply that two distributions are identical when they have the same moments, because in some cases the moments exist but the moment generating function does not, since the limit $\lim_{n\to\infty} \sum_{k=0}^n \frac{\alpha_k s^k}{k!}$ diverges – as, for example, in case of the logarithmic normal distribution.
2.7.3 Characteristic functions
Like the moment generating function, the characteristic function φ(s) of a random variable X completely describes the probability distribution F(x). It is defined by

$\phi(s) = \int_{-\infty}^{+\infty} \exp(i s x)\, dF(x) = \int_{-\infty}^{+\infty} \exp(i s x)\, f(x)\, dx$ ,  (2.59')

where the integral over dF(x) is of Riemann-Stieltjes type. In case a probability density f(x) exists for the random variable X, the characteristic function is (almost) the Fourier transform of the density.22 From equation (2.59') follows the useful expression $\phi(s) = E(e^{i s X})$ that we shall use, for example, in solving the equations for stochastic processes (subsection 3.4).

The characteristic function exists for all random variables since it is an integral of a bounded continuous function over a space of finite measure. There is a bijection between distribution functions and characteristic functions:

$\phi_X(s) = \phi_Y(s) \iff F_X(x) = F_Y(x)$ .

If a random variable X has moments up to k-th order, then the characteristic function φ(s) is k times continuously differentiable on the entire real line. Vice versa, if a characteristic function φ(s) has a k-th derivative at zero, then the random variable X has all moments up to k if k is even and up to k − 1 if k is odd:

$E(X^k) = (-i)^k \dfrac{d^k \phi(s)}{ds^k}\Big|_{s=0}$ and $\dfrac{d^k \phi(s)}{ds^k}\Big|_{s=0} = i^k\, E(X^k)$ .  (2.73)

An interesting example is presented by the Cauchy distribution (subsection 2.7.8) with $\phi(s) = \exp(-|s|)$: It is not differentiable at s = 0, and the distribution has no moments, not even the expectation value.

The moment generating function is related to the probability generating function g(s) (subsection 2.7.1) and the characteristic function φ(s) (subsection 2.7.3) by

$g(e^s) = E(e^{Xs}) = M_X(s)$ and $\phi(s) = M_{iX}(s) = M_X(is)$ .

All three generating functions are closely related, but it may happen that not all three exist. As said, characteristic functions exist for all probability distributions.
22 The difference between the Fourier transform ψ(s) and the characteristic function φ(s),

$\psi(s) = \int_{-\infty}^{+\infty} \exp(-2\pi i s x)\, f(x)\, dx$ and $\phi(s) = \int_{-\infty}^{+\infty} \exp(+i s x)\, f(x)\, dx$ ,

is only a matter of the factor 2π and the choice of the sign.
Table 2.2: Comparison of several common probability densities. Abbreviations and notations used in the table: $\Gamma(r, x) = \int_x^\infty s^{r-1} e^{-s}\, ds$ and $\gamma(r, x) = \int_0^x s^{r-1} e^{-s}\, ds$ are the upper and lower incomplete gamma functions, respectively; $I_x(a, b) = B(x; a, b)/B(1; a, b)$ is the regularized incomplete beta function with $B(x; a, b) = \int_0^x s^{a-1} (1-s)^{b-1}\, ds$. For more details see [56].

Poisson π(α), α > 0, support $k \in \mathbb{N}_0$: pmf $\frac{\alpha^k}{k!} e^{-\alpha}$; cdf $\Gamma(\lfloor k+1 \rfloor, \alpha)/\lfloor k \rfloor!$; mean α; median $\approx \lfloor \alpha + \frac{1}{3} - \frac{0.02}{\alpha} \rfloor$; mode $\lceil \alpha \rceil - 1$; variance α; skewness $1/\sqrt{\alpha}$; excess kurtosis $1/\alpha$; mgf $\exp\big(\alpha(e^s - 1)\big)$; cf $\exp\big(\alpha(e^{is} - 1)\big)$.

Binomial B(n, p), $n \in \mathbb{N}$, $p \in [0, 1]$, support $k \in \mathbb{N}_0$, k ≤ n: pmf $\binom{n}{k} p^k (1-p)^{n-k}$; cdf $I_{1-p}(n-k, 1+k)$; mean np; median ⌊np⌋ or ⌈np⌉; mode ⌊(n+1)p⌋ or ⌊(n+1)p⌋ − 1; variance np(1−p); skewness $\frac{1-2p}{\sqrt{np(1-p)}}$; excess kurtosis $\frac{1-6p(1-p)}{np(1-p)}$; mgf $(1-p+pe^s)^n$; cf $(1-p+pe^{is})^n$.

Normal N(ν, σ), $\nu \in \mathbb{R}$, $\sigma^2 \in \mathbb{R}_+$, support $x \in \mathbb{R}$: pdf $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\nu)^2/(2\sigma^2)}$; cdf $\frac{1}{2}\big(1 + {\rm erf}\big(\frac{x-\nu}{\sqrt{2\sigma^2}}\big)\big)$; mean, median, mode ν; variance σ²; skewness 0; excess kurtosis 0; mgf $\exp(\nu s + \frac{1}{2}\sigma^2 s^2)$; cf $\exp(i\nu s - \frac{1}{2}\sigma^2 s^2)$.

Chi-square χ²(k), $k \in \mathbb{N}$, support $x \in [0, \infty[$: pdf $\frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}$; cdf $\frac{\gamma(k/2,\, x/2)}{\Gamma(k/2)}$; mean k; median $\approx k\big(1 - \frac{2}{9k}\big)^3$; mode max{k − 2, 0}; variance 2k; skewness $\sqrt{8/k}$; excess kurtosis 12/k; mgf $(1-2s)^{-k/2}$ for s < 1/2; cf $(1-2is)^{-k/2}$.

Logistic L(a, b), $a \in \mathbb{R}$, b > 0, support $x \in \mathbb{R}$: pdf $\frac{1}{4b}\, {\rm sech}^2\big((x-a)/2b\big)$; cdf $\frac{1}{1+\exp(-(x-a)/b)}$; mean, median, mode a; variance $\pi^2 b^2/3$; skewness 0; excess kurtosis 1.2; mgf $e^{as}\, \pi b s/\sin(\pi b s)$; cf $e^{ias}\, \pi b s/\sinh(\pi b s)$.

Laplace, $\nu \in \mathbb{R}$, b > 0, support $x \in \mathbb{R}$: pdf $\frac{1}{2b} e^{-|x-\nu|/b}$; cdf $\frac{1}{2} e^{-(\nu-x)/b}$ for x < ν and $1 - \frac{1}{2} e^{-(x-\nu)/b}$ for x ≥ ν; mean, median, mode ν; variance 2b²; skewness 0; excess kurtosis 3; mgf $\frac{\exp(\nu s)}{1 - b^2 s^2}$ for |s| < 1/b; cf $\frac{\exp(i\nu s)}{1 + b^2 s^2}$.

Uniform U(a, b), $a, b \in \mathbb{R}$, a < b, support $x \in [a, b]$: pdf $\frac{1}{b-a}$ for x ∈ [a, b] and 0 otherwise; cdf 0 for x < a, $\frac{x-a}{b-a}$ for x ∈ [a, b], and 1 for x ≥ b; mean $\frac{a+b}{2}$; median $\frac{a+b}{2}$; mode any m ∈ [a, b]; variance $\frac{(b-a)^2}{12}$; skewness 0; excess kurtosis −6/5; mgf $\frac{e^{bs} - e^{as}}{(b-a)s}$; cf $\frac{e^{ibs} - e^{ias}}{i(b-a)s}$.

Cauchy, $x_0 \in \mathbb{R}$, $\gamma \in \mathbb{R}_+$, support $x \in \mathbb{R}$: pdf $\dfrac{1}{\pi\gamma\big(1 + ((x-x_0)/\gamma)^2\big)}$; cdf $\frac{1}{\pi} \arctan\big(\frac{x-x_0}{\gamma}\big) + \frac{1}{2}$; mean undefined; median $x_0$; mode $x_0$; variance, skewness, and kurtosis undefined; mgf does not exist; cf $\exp(i x_0 s - \gamma|s|)$.
Figure 2.15: The Poisson probability density. Two examples of Poisson distributions, $\pi_k(\alpha) = \alpha^k e^{-\alpha}/k!$, with α = 1 (black) and α = 5 (red) are shown. The distribution with the larger α has the mode shifted further to the right and a thicker tail.
Before entering the discussion of individual common probability distribu-
tions we present an overview over the most important characteristics of these
distributions in table 2.2.
2.7.4 The Poisson distribution
The Poisson distribution, named after the French physicist and mathemati-
cian Simeon Denis Poisson, is a discrete probability distribution. It is used
to model the number of independent events occurring in a given interval of
time. In physics and chemistry the Poisson process is the stochastic basis of
first order processes, for example radioactive decay or irreversible first order
reactions and the Poisson distribution is the probability distribution underly-
ing the time course of particle numbers, N(t). Despite its major importance
in physics and biology the Poisson distribution, π(α), is a fairly simple math-
ematical object. It contains a single parameter only, the real valued positive
number $\alpha$:23
$$ P(\mathcal{X} = k) = \pi_k(\alpha) = \frac{\alpha^k}{k!}\, e^{-\alpha} \ ; \quad k \in \mathbb{N}_0 . \qquad (2.74) $$
We leave it as an exercise to verify the following properties:
$$ \sum_{k=0}^{\infty} \pi_k = 1 , \quad \sum_{k=0}^{\infty} k\, \pi_k = \alpha , \quad\text{and}\quad \sum_{k=0}^{\infty} k^2\, \pi_k = \alpha + \alpha^2 . $$
Examples of Poisson distributions with two different parameter values, α = 1
and 5, are shown in figure 2.15.
By means of a Taylor expansion we can find the generating function of the Poisson distribution,
$$ g(s) = e^{\alpha(s-1)} . \qquad (2.75) $$
From the generating function we calculate easily
$$ g'(s) = \alpha\, e^{\alpha(s-1)} \quad\text{and}\quad g''(s) = \alpha^2\, e^{\alpha(s-1)} . $$
Expectation value and second moment follow straightforwardly from equation (2.71):
$$ E(\mathcal{X}) = g'(1) = \alpha , \qquad (2.75a) $$
$$ E(\mathcal{X}^2) = g'(1) + g''(1) = \alpha + \alpha^2 , \quad\text{and} \qquad (2.75b) $$
$$ \sigma^2(\mathcal{X}) = \alpha . \qquad (2.75c) $$
Both the expectation value and the variance are equal to the parameter $\alpha$, and hence the standard deviation amounts to $\sigma(\mathcal{X}) = \sqrt{\alpha}$. This remarkable property of the Poisson distribution is not limited to the second moment. The factorial moments, $\langle \mathcal{X}^r \rangle_f$, fulfil the equation
$$ \langle \mathcal{X}^r \rangle_f = E\bigl(\mathcal{X}(\mathcal{X}-1)\cdots(\mathcal{X}-r+1)\bigr) = \alpha^r . \qquad (2.75d) $$
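These relations are easy to check numerically. The following minimal sketch (assuming NumPy and SciPy are available; the truncation point k = 200 is an arbitrary choice at which the tail is negligible for $\alpha = 5$) verifies normalization, the first two moments, and the factorial moments (2.75d):

    import numpy as np
    from scipy.stats import poisson

    alpha = 5.0
    k = np.arange(0, 200)                  # truncation of the infinite sums
    pk = poisson.pmf(k, alpha)

    print(pk.sum())                        # ~ 1, normalization
    print((k * pk).sum())                  # ~ alpha, expectation value
    print((k**2 * pk).sum())               # ~ alpha + alpha^2, second moment

    # factorial moments E(X(X-1)...(X-r+1)) = alpha^r, eq. (2.75d)
    for r in range(1, 5):
        falling = np.ones_like(k, dtype=float)
        for j in range(r):
            falling = falling * (k - j)    # falling factorial k(k-1)...(k-r+1)
        print(r, (falling * pk).sum(), alpha**r)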
23In order to solve the problems one requires knowledge of some basic infinite series: $e = \sum_{n=0}^{\infty} 1/n!$, $e^x = \sum_{n=0}^{\infty} x^n/n!$ for $|x| < \infty$, $e = \lim_{n\to\infty}(1 + 1/n)^n$, and $e^{-\alpha} = \lim_{n\to\infty}(1 - \alpha/n)^n$.
Figure 2.16: The binomial probability density. Two examples of binomial distributions, $B_k(n,p) = \binom{n}{k} p^k (1-p)^{n-k}$, with $n = 10$, $p = 0.5$ and $p = 0.1$ are shown. The former distribution is symmetric with respect to the expectation value $E(B_k) = n/2$.
2.7.5 The binomial distribution
The binomial distribution, $B(n,p)$, characterizes the cumulative result of $n$ independent trials with two-valued outcomes, for example successive coin tosses as we discussed in sections 1.2 and 2.2: $S_n = \sum_{i=1}^{n} \mathcal{X}_i$. The $\mathcal{X}_i$'s are commonly called Bernoulli random variables, named after the Swiss mathematician Jakob Bernoulli; accordingly, the sequence of events is known as a Bernoulli process, and the corresponding random variable is said to have a Bernoulli or binomial distribution:
$$ P(S_n = k) = B_k(n,p) = \binom{n}{k}\, p^k\, q^{n-k} , \quad q = 1-p \quad (k \in \mathbb{N}_0 ,\ k \le n) . \qquad (2.76) $$
Two examples are shown in figure 2.16. The distribution with p = 0.5 is
symmetric with respect to k = n/2.
The generating function for the single trial is $g(s) = q + ps$. Since we have $n$ independent trials the complete generating function is
$$ g(s) = (q + ps)^n = \sum_{k=0}^{n} \binom{n}{k}\, q^{n-k}\, p^k\, s^k . \qquad (2.77) $$
From the derivatives of the generating function,
$$ g'(s) = np\,(q+ps)^{n-1} \quad\text{and}\quad g''(s) = n(n-1)\,p^2\,(q+ps)^{n-2} , $$
we compute readily expectation value and variance:
$$ E(S_n) = g'(1) = np , \qquad (2.77a) $$
$$ E(S_n^2) = g'(1) + g''(1) = np + n^2 p^2 - np^2 = npq + n^2 p^2 , \qquad (2.77b) $$
$$ \sigma^2(S_n) = npq , \quad\text{and} \qquad (2.77c) $$
$$ \sigma(S_n) = \sqrt{npq} . \qquad (2.77d) $$
For $p = 1/2$, the case of the unbiased coin, we have the symmetric binomial distribution with $E(S_n) = n/2$, $\sigma^2(S_n) = n/4$, and $\sigma(S_n) = \sqrt{n}/2$. We note that the expectation value is proportional to the number of trials, $n$, and the standard deviation is proportional to its square root, $\sqrt{n}$.
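The step from the generating function to the moments can also be reproduced symbolically. The following sketch (assuming SymPy is available) differentiates $g(s) = (q+ps)^n$ and recovers (2.77a) and (2.77c):

    import sympy as sp

    s, p, n = sp.symbols('s p n', positive=True)
    q = 1 - p
    g = (q + p*s)**n                        # generating function, eq. (2.77)

    g1 = sp.diff(g, s).subs(s, 1)           # g'(1)
    g2 = sp.diff(g, s, 2).subs(s, 1)        # g''(1)
    print(sp.simplify(g1))                  # n*p, eq. (2.77a)
    print(sp.factor(sp.simplify(g1 + g2 - g1**2)))   # n*p*(1 - p), eq. (2.77c)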
The binomial distribution B(n, p) can be transformed into the Poisson
distribution π(α) in the limit n→∞. In order to show this we start from
$$ B_k(n,p) = \binom{n}{k}\, p^k\, (1-p)^{n-k} , \quad (k \in \mathbb{N}_0 ,\ k \le n) . $$
The symmetry parameter $p$ is assumed to vary with $n$, $p(n) = \alpha/n$ for $n \ge 1$, and thus we have
$$ B_k\Bigl(n, \frac{\alpha}{n}\Bigr) = \binom{n}{k} \Bigl(\frac{\alpha}{n}\Bigr)^k \Bigl(1 - \frac{\alpha}{n}\Bigr)^{n-k} , \quad (k \in \mathbb{N}_0 ,\ k \le n) . $$
We let n go to infinity for fixed k and start with B0(n, p):
$$ \lim_{n\to\infty} B_0\Bigl(n, \frac{\alpha}{n}\Bigr) = \lim_{n\to\infty} \Bigl(1 - \frac{\alpha}{n}\Bigr)^n = e^{-\alpha} . $$
Now we compute the ratio of two consecutive terms, $B_{k+1}/B_k$:
$$ \frac{B_{k+1}(n, \alpha/n)}{B_k(n, \alpha/n)} = \frac{n-k}{k+1} \cdot \frac{\alpha}{n} \cdot \Bigl(1 - \frac{\alpha}{n}\Bigr)^{-1} = \frac{\alpha}{k+1} \cdot \Bigl[\frac{n-k}{n} \cdot \Bigl(1 - \frac{\alpha}{n}\Bigr)^{-1}\Bigr] . $$
Both terms in the square brackets converge to one as $n \to \infty$, and hence we find:
$$ \lim_{n\to\infty} \frac{B_{k+1}(n, \alpha/n)}{B_k(n, \alpha/n)} = \frac{\alpha}{k+1} . $$
From the two results we compute all terms starting from the limit value of $B_0$ and find $\lim B_1 = \alpha\, e^{-\alpha}$, $\lim B_2 = \alpha^2 e^{-\alpha}/2!$, $\ldots$, $\lim B_k = \alpha^k e^{-\alpha}/k!$. Accordingly we have verified Poisson's limit law:
$$ \lim_{n\to\infty} B_k\Bigl(n, \frac{\alpha}{n}\Bigr) = \pi_k(\alpha) , \quad k \in \mathbb{N}_0 . \qquad (2.78) $$
It is worth remembering that the limit was performed in a peculiar way, since the symmetry parameter $p(n)$ shrinks with increasing $n$ and indeed vanishes in the limit $n \to \infty$.
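Poisson's limit law can also be observed numerically. In the following sketch (assuming SciPy is available; $\alpha = 2$ is an arbitrary choice) the maximal deviation between the binomial and the Poisson probability mass functions shrinks as $n$ grows:

    import numpy as np
    from scipy.stats import binom, poisson

    alpha = 2.0
    k = np.arange(0, 11)
    for n in (10, 100, 1000, 10000):
        b = binom.pmf(k, n, alpha / n)     # B_k(n, alpha/n)
        err = np.max(np.abs(b - poisson.pmf(k, alpha)))
        print(n, err)                      # deviation decreases with n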
2.7.6 The normal distribution

The normal distribution is of central importance in probability theory because many distributions converge to it in the limit of large numbers. It is fundamental for the estimation of statistical errors, and thus we shall discuss it in some detail. The general normal distribution has the density24
$$ \varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\nu)^2}{2\sigma^2}} \quad\text{with}\quad \int_{-\infty}^{+\infty} \varphi(x)\, dx = 1 , \qquad (2.79) $$
and the corresponding random variable $\mathcal{X}$ has the moments $E(\mathcal{X}) = \nu$, $\sigma^2(\mathcal{X}) = \sigma^2$, and $\sigma(\mathcal{X}) = \sigma$.
For many purposes it is convenient to use the normal density in centered and normalized form:
$$ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \quad\text{with}\quad \int_{-\infty}^{+\infty} \varphi(x)\, dx = 1 . \qquad (2.79') $$
In this form we have $E(\mathcal{X}) = 0$, $\sigma^2(\mathcal{X}) = 1$, and $\sigma(\mathcal{X}) = 1$. Integration of
the density yields the distribution function
$$ P(\mathcal{X} \le x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du . \qquad (2.80) $$
The function Φ(x) is not available in analytical form, but it can be easily
formulated in terms of the error function, erf(x). This function as well as its
24The notations applied here for the normal distribution are $\Phi(x; \nu, \sigma)$ for the cumulative distribution and $\varphi(x; \nu, \sigma)$ for the density. Commonly, the parameters $(\nu, \sigma)$ are omitted.
complement, $\operatorname{erfc}(x)$, defined by
$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \quad\text{and}\quad \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt , $$
are available in tables and in standard mathematical packages.25 Examples of the normal density $\varphi(x)$ with different values of the standard deviation $\sigma$ and one example of the integrated distribution $\Phi(x)$ are shown in figure 2.17. The normal distribution is also used in statistics to define confidence intervals: 68.2% of the data points lie within an interval $\nu \pm \sigma$, 95.4% within an interval $\nu \pm 2\sigma$, and 99.7% within an interval $\nu \pm 3\sigma$.
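The quoted confidence levels follow directly from the error function, since the probability mass inside $\nu \pm m\sigma$ equals $\operatorname{erf}(m/\sqrt{2})$; a one-line check with the Python standard library:

    from math import erf, sqrt

    for m in (1, 2, 3):                    # intervals nu +/- m*sigma
        print(m, erf(m / sqrt(2)))         # 0.6827, 0.9545, 0.9973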
The normal density function $\varphi(x)$ has, among other remarkable properties, derivatives of all orders. Each derivative can be written as the product of $\varphi(x)$ and a polynomial whose degree equals the order of the derivative, known as a Hermite polynomial. The existence of all derivatives makes the bell-shaped curve $x \to \varphi(x)$ particularly smooth. In addition, the function $\varphi(x)$ decreases to zero very rapidly as $|x| \to \infty$.
This smoothness makes the moment generating function of the normal distribution especially convenient to handle (see subsection 2.7.2). $M(s)$ can be obtained directly by integration:
$$ M(s) = \int_{-\infty}^{+\infty} e^{xs}\, \varphi(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\Bigl(xs - \frac{x^2}{2}\Bigr)\, dx = $$
$$ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\Bigl(\frac{s^2}{2} - \frac{(x-s)^2}{2}\Bigr)\, dx = e^{s^2/2} \int_{-\infty}^{+\infty} \varphi(x-s)\, dx = e^{s^2/2} . \qquad (2.81) $$
All raw moments of the normal distribution,
$$ \mu^{(n)} = \int_{-\infty}^{+\infty} x^n\, \varphi(x)\, dx , \qquad (2.82) $$
can be obtained, for example, by successive differentiation of $M(s)$ with respect to $s$ (subsection 2.7.2). The moments are obtained more efficiently
25We remark that $\operatorname{erf}(x)$ and $\operatorname{erfc}(x)$ are not normalized in the same way as the normal density: $\operatorname{erf}(x) + \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2}\, dt = 1$, but $\int_0^{\infty} \varphi(x)\, dx = \frac{1}{2} \int_{-\infty}^{+\infty} \varphi(x)\, dx = \frac{1}{2}$.
Figure 2.17: The normal probability density. In the upper part we show the density function of the normal distribution, $\varphi(x) = \exp\bigl(-(x-\nu)^2/(2\sigma^2)\bigr)/(\sqrt{2\pi}\,\sigma)$, for $\nu = 5$ and $\sigma = 0.5$ (red), $1.0$ (black), and $2.0$ (blue). The smaller the standard deviation $\sigma$, the sharper is the curve. The lower part shows the density function $\varphi(x)$ (red) together with the distribution $\Phi(x) = \int_{-\infty}^{x} \varphi(u)\, du = 0.5\bigl[1 + \operatorname{erf}\bigl((x-\nu)/(\sqrt{2}\,\sigma)\bigr)\bigr]$ (black) for $\nu = 5$ and $\sigma = 0.5$.
by expanding the first and the last expression in the previous equation (2.81)
Figure 2.18: A fit of the normal distribution to the binomial distribution. The curves represent normal densities (red), which were fitted to the points of the binomial distribution (black). Parameter choices for the binomial distribution in the three examples: $(n = 4, p = 0.5)$, $(n = 10, p = 0.5)$, and $(n = 10, p = 0.1)$ for the upper, middle, and lower plot, respectively.
in a power series of $s$,
$$ \int_{-\infty}^{+\infty} \Bigl(1 + xs + \frac{(xs)^2}{2!} + \ldots + \frac{(xs)^n}{n!} + \ldots\Bigr) \varphi(x)\, dx = 1 + \frac{s^2}{2} + \frac{1}{2!}\Bigl(\frac{s^2}{2}\Bigr)^2 + \ldots + \frac{1}{n!}\Bigl(\frac{s^2}{2}\Bigr)^n + \ldots , $$
or, expressed in terms of the moments $\mu^{(n)}$,
$$ \sum_{n=0}^{\infty} \frac{\mu^{(n)}}{n!}\, s^n = \sum_{n=0}^{\infty} \frac{1}{2^n\, n!}\, s^{2n} , $$
from which we compute the moments of $\Phi(x)$ by equating the coefficients of the powers of $s$ on both sides of the expansion and find for $n \ge 1$:26
$$ \mu^{(2n-1)} = 0 \quad\text{and}\quad \mu^{(2n)} = \frac{(2n)!}{2^n\, n!} . \qquad (2.83) $$
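Equation (2.83) can be verified by numerical quadrature; a minimal sketch assuming SciPy is available:

    import numpy as np
    from math import factorial
    from scipy.integrate import quad

    phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # unit normal density
    for n in range(1, 9):
        mu_n, _ = quad(lambda x: x**n * phi(x), -np.inf, np.inf)
        ref = 0 if n % 2 else factorial(n) // (2**(n // 2) * factorial(n // 2))
        print(n, round(mu_n, 6), ref)      # odd moments vanish, even follow (2.83)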
All odd moments vanish because of symmetry. In the case of the fourth moment, the kurtosis, a kind of standardization is common which assigns zero excess kurtosis, $\gamma_2 = 0$, to the normal distribution. In other words, excess kurtosis monitors peak shape with respect to the normal distribution: positive excess kurtosis implies peaks that are sharper than the normal density, negative excess kurtosis peaks that are broader than the normal density.
Multivariate normal distribution. For general applications it is worth
considering the normal distribution in multiple dimensions. The random
variable X is replaced by a random vector,27 ~X = (X1, . . . ,Xn) with the joint
26The definite integrals are
$$ \int_{-\infty}^{+\infty} x^n\, e^{-x^2}\, dx = \begin{cases} \sqrt{\pi} & n = 0 \\ 0 & n \ge 1 \text{, odd} \\ \frac{(n-1)!!}{2^{n/2}}\, \sqrt{\pi} & n \ge 2 \text{, even} \end{cases} $$
where $(n-1)!! = 1 \cdot 3 \cdot \ldots \cdot (n-1)$.
27The notation we use here for vectors is '$\vec{\cdot}$' or bold-face letters. Matrices are denoted either by upper-case Roman or upper-case Greek letters. Vectors are commonly understood as columns or one-column matrices. Transposition, indicated by '$\cdot\,'$', converts them into row vectors or one-row matrices, and vice versa transposition of a row vector yields a column vector. If no confusion is possible, row vectors are used instead of column vectors in order to save space.
probability distribution
$$ P(\mathcal{X}_1 = x_1, \ldots, \mathcal{X}_n = x_n) = p(x_1, \ldots, x_n) = p(\mathbf{x}) . $$
The multivariate normal probability density can be written as
$$ \varphi(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\, |\Sigma|}} \exp\Bigl(-\frac{1}{2}\, (\mathbf{x} - \boldsymbol{\nu})'\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\nu})\Bigr) . $$
The vector $\boldsymbol{\nu}$ consists of the (raw) first moments along the different coordinates, $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$, and the variance-covariance matrix $\Sigma$ contains the $n$ variances in the diagonal, while the covariances form the off-diagonal elements:
$$ \Sigma = \begin{pmatrix} \sigma^2(\mathcal{X}_1) & \operatorname{Cov}(\mathcal{X}_1,\mathcal{X}_2) & \ldots & \operatorname{Cov}(\mathcal{X}_1,\mathcal{X}_n) \\ \operatorname{Cov}(\mathcal{X}_2,\mathcal{X}_1) & \sigma^2(\mathcal{X}_2) & \ldots & \operatorname{Cov}(\mathcal{X}_2,\mathcal{X}_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(\mathcal{X}_n,\mathcal{X}_1) & \operatorname{Cov}(\mathcal{X}_n,\mathcal{X}_2) & \ldots & \sigma^2(\mathcal{X}_n) \end{pmatrix} , $$
which is symmetric by the definition of covariances: $\operatorname{Cov}(\mathcal{X}_i,\mathcal{X}_j) = \operatorname{Cov}(\mathcal{X}_j,\mathcal{X}_i)$. With the mean given by the vector $\boldsymbol{\nu}$ and the variances and covariances by the matrix $\Sigma$, the moment generating function expressed in the dummy vector variable $\mathbf{s} = (s_1, \ldots, s_n)$ is of the form
$$ M(\mathbf{s}) = \exp(\boldsymbol{\nu}'\mathbf{s}) \cdot \exp\Bigl(\frac{1}{2}\, \mathbf{s}'\,\Sigma\,\mathbf{s}\Bigr) , $$
and, finally, the characteristic function is given by
$$ \phi(\mathbf{s}) = \exp(i\,\boldsymbol{\nu}'\mathbf{s}) \cdot \exp\Bigl(-\frac{1}{2}\, \mathbf{s}'\,\Sigma\,\mathbf{s}\Bigr) . $$
Without showing the details we remark that this particularly simple characteristic function implies that all moments of order higher than two can be expressed in terms of first and second moments, in particular expectation values, variances, and covariances. To give an example that we shall require later in subsection 3.5.2, the fourth order moments can be derived from
$$ E(\mathcal{X}_1 \mathcal{X}_2 \mathcal{X}_3 \mathcal{X}_4) = \sigma_{12}\sigma_{34} + \sigma_{41}\sigma_{23} + \sigma_{13}\sigma_{24} \quad\text{and}\quad E(\mathcal{X}_1^4) = 3\,\sigma_{11}^2 . \qquad (2.84) $$
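The fourth-order relations (2.84) lend themselves to a Monte Carlo check; the following sketch assumes NumPy, and the matrix chosen below is an arbitrary positive definite example:

    import numpy as np

    rng = np.random.default_rng(1)
    Sigma = np.array([[1.0, 0.3, 0.2, 0.1],
                      [0.3, 1.5, 0.4, 0.2],
                      [0.2, 0.4, 2.0, 0.3],
                      [0.1, 0.2, 0.3, 1.0]])      # illustrative covariance matrix
    x = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)

    lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
    rhs = Sigma[0, 1]*Sigma[2, 3] + Sigma[3, 0]*Sigma[1, 2] + Sigma[0, 2]*Sigma[1, 3]
    print(lhs, rhs)                               # agree within sampling error

    print(np.mean(x[:, 0]**4), 3 * Sigma[0, 0]**2)   # E(X1^4) = 3 sigma11^2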
Normal and binomial distribution. The normal distribution is of general importance since it may be derived, for example, from the binomial distribution
$$ B_k(n, p) = \binom{n}{k}\, p^k\, (1-p)^{n-k} , \quad 0 \le k \le n , $$
through extrapolation to large values of n at constant p.28 As a matter of
fact the expression normal distribution originated from the idea that many
distributions can be transformed in a natural way for large n to yield the
distribution Φ(x). The transformation from the binomial distribution to the
normal distribution is properly done in two steps (see also [5, pp.210-217]):
(i) At first we make the binomial distribution comparable by shifting the
maximum towards x = 0 and adjusting the width (figures 2.18 and 2.19).
For 0 < p < 1 and q = 1 − p we define a new variable ξk in order to
replace the discrete variable k.29 The new variables, X ∗k and S∗
n =∑n
k X ∗k , are
centered and adjusted to the standard Gaussian, ϕ(x) = exp(−x2/2)/√
2π,
by making use of the expectation value, E(Sn) = np, and the standard
deviation, σ(Sn) =√npq, of the binomial distribution:
ξk =k − np√npq
; 0 ≤ k ≤ n .
We assume now an arbitrary but fixed positive constant $c$. In the range of $k$ defined by $|\xi_k| \le c$ we approximate
$$ \binom{n}{k}\, p^k\, q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\xi_k^2/2} . $$
The convergence is uniform with respect to $k$ in the range specified above.
(ii) The limit $n \to \infty$ is performed by means of the deMoivre-Laplace theorem, which proves convergence of the (centered and adjusted) distribution of the random variable $S_n^*$ towards the normal distribution $\Phi(x)$ on any finite
28This is different from the extrapolation in the previous subsection because the limit $\lim_{n\to\infty} B_k(n, \alpha/n) = \pi_k(\alpha)$ leading to the Poisson distribution was performed in the limit of vanishing $p = \alpha/n$.
29The new variable $\xi_k$ depends also on $n$, but for brevity we dispense with a second subscript.
Figure 2.19: Normalization of the binomial distribution. The figure shows a symmetric binomial distribution $B(20, \frac{1}{2})$, which is centered around $\mu = \nu = 10$ (black). The transformation yields a binomial distribution centered around the origin with unit variance, $\sigma = \sigma^2 = 1$ (red). The grey and the pink continuous curves are normal distributions $\varphi = \exp\bigl(-(x-\nu)^2/(2\sigma^2)\bigr)/\sqrt{2\pi\sigma^2}$ with the parameters $(\nu = 10, \sigma^2 = np(1-p) = 5)$ and $(\nu = 0, \sigma^2 = 1)$, respectively.
interval. For an arbitrary constant interval $]a, b]$ with $a < b$, we have
$$ \lim_{n\to\infty} P\Bigl(\frac{S_n - np}{\sqrt{npq}} \in\, ]a, b]\Bigr) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx . \qquad (2.85) $$
In the proof the definite integral $\int_a^b \varphi(x)\, dx$ is partitioned into $n$ (small) segments as in Riemannian integration: the segments still reflect the discrete distribution. In the limit $n \to \infty$ the partition becomes finer and eventually converges to the continuous function described by the integral. A comparison of figures 2.18 and 2.19 shows that the convergence is particularly effective in the symmetric case, $p = q = 0.5$, where only minor differences are observable already for $n = 20$ (see also the next subsection 2.7.7).
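The deMoivre-Laplace limit (2.85) is easy to watch at work; a sketch assuming SciPy is available, with the interval $]a, b] = \,]-1, 1]$ as an arbitrary choice:

    import numpy as np
    from scipy.stats import binom, norm

    a, b, p = -1.0, 1.0, 0.5
    for n in (20, 200, 2000):
        k = np.arange(0, n + 1)
        xi = (k - n*p) / np.sqrt(n*p*(1 - p))       # centered and scaled values
        mass = binom.pmf(k, n, p)[(xi > a) & (xi <= b)].sum()
        print(n, mass, norm.cdf(b) - norm.cdf(a))   # approaches Phi(b) - Phi(a)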
2.7.7 Central limit theorem and the law of large numbers
The central limit theorem is, in essence, a more general formulation of the convergence of the binomial distribution to the normal distribution in the limit of large $n$ just discussed, as expressed by the deMoivre-Laplace theorem (2.85). The outcome of a number of successive trials described by random variables $\mathcal{X}_j$ is summed up to yield the random variable
$$ S_n = \mathcal{X}_1 + \mathcal{X}_2 + \ldots + \mathcal{X}_n , \quad n \ge 1 . $$
The individual random variables Xj are assumed to be independent and
identically distributed. By identically distributed we mean that the variables
have a common distribution which need not be specified. In the previous
subsection we considered the binomial distribution as the outcome of succes-
sive Bernoulli trials. Here, the only assumptions concerning the distributions
P (Xj ≤ x) = Fj(x) and P (Sn ≤ x) = Fn(x) are that the means µ and the
variances σ2 of all random variables Xj are the same and finite.
The first step towards the central limit theorem is again a transformation
of variables shifting the maximum to the origin and adjusting the width of
the distribution:
$$ \mathcal{X}_j^* = \frac{\mathcal{X}_j - E(\mathcal{X}_j)}{\sigma(\mathcal{X}_j)} \quad\text{and}\quad S_n^* = \frac{S_n - E(S_n)}{\sigma(S_n)} = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \mathcal{X}_j^* . \qquad (2.86) $$
The values for the individual first and second moments are $E(\mathcal{X}_j) = \mu$, $E(S_n) = n\mu$, $\sigma(\mathcal{X}_j) = \sigma$, and $\sigma(S_n) = \sqrt{n}\,\sigma$. The transformation is in full analogy to the one performed with the random variable $S_n$ of the binomial distribution in the previous subsection, and eventually we obtain:
$$ E(\mathcal{X}_j^*) = 0 , \quad \sigma^2(\mathcal{X}_j^*) = 1 , \quad E(S_n^*) = 0 , \quad \sigma^2(S_n^*) = 1 . \qquad (2.87) $$
If $I$ is the finite interval $]a, b]$, then $F(I) = F(b) - F(a)$ for any distribution function $F$, and we can write the deMoivre-Laplace theorem in the compact form
$$ \lim_{n\to\infty} F_n(I) = \Phi(I) . \qquad (2.88) $$
Under the generalized conditions the central limit theorem states that for any interval $]a, b]$ with $a < b$ the limit
$$ \lim_{n\to\infty} P\Bigl(\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \in\, ]a, b]\Bigr) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx \qquad (2.89) $$
is fulfilled.
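That the common distribution of the $\mathcal{X}_j$ need not be specified can be illustrated with a decidedly non-normal choice; the sketch below (assuming NumPy; exponential random variables with $\mu = \sigma = 1$) shows the interval probability approaching the normal value:

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(2)
    mu = sigma = 1.0
    a, b = -1.0, 1.0
    for n in (4, 40, 400):
        sums = rng.exponential(mu, size=(20_000, n)).sum(axis=1)
        z = (sums - n*mu) / (sqrt(n) * sigma)        # standardized sums S_n^*
        est = np.mean((z > a) & (z <= b))
        exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))
        print(n, est, exact)                         # approaches Phi(b) - Phi(a)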
A proof of the central limit theorem makes use of the characteristic function of the unit normal distribution ($\nu = 0$, $\sigma^2 = 1$):
$$ \phi(s) = \exp\Bigl(i \nu s - \frac{1}{2}\, \sigma^2 s^2\Bigr) = e^{-s^2/2} . \qquad (2.90) $$
Assume that for every $s$ the characteristic function of $S_n^*$ converges to the characteristic function $\phi(s)$,
$$ \lim_{n\to\infty} \phi_n(s) = \phi(s) = e^{-s^2/2} . $$
Since the $\phi_n(s)$ are the characteristic functions associated with the distribution functions $F_n(x)$, it follows for every $x$ that
$$ \lim_{n\to\infty} F_n(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du , \qquad (2.91) $$
and in particular the deMoivre-Laplace theorem follows as a special case.
Characteristic functions $h(s)$ of random variables $\mathcal{X}$ with mean zero, $\nu = 0$, and variance one, $\sigma^2 = 1$, have the Taylor expansion
$$ h(s) = 1 - \frac{s^2}{2}\bigl(1 + \varepsilon(s)\bigr) \quad\text{with}\quad \lim_{s\to 0} \varepsilon(s) = 0 $$
at $s = 0$ and truncation after the second term. The proof is straightforward and starts from the full Taylor expansion up to the second term:
$$ h(s) = h(0) + h'(0)\, s + \frac{h''(0)}{2}\, s^2 \bigl(1 + \varepsilon(s)\bigr) . $$
From $h(s) = E(e^{is\mathcal{X}})$ follows by differentiation
$$ h'(s) = E\bigl(i\mathcal{X}\, e^{is\mathcal{X}}\bigr) \quad\text{and}\quad h''(s) = E\bigl(-\mathcal{X}^2\, e^{is\mathcal{X}}\bigr) , $$
and hence $h'(0) = i\,E(\mathcal{X}) = 0$ and $h''(0) = -E(\mathcal{X}^2) = -1$, yielding the equation given above. Next we consider the characteristic function of $S_n^*$:
$$ E\bigl(\exp(i s S_n^*)\bigr) = E\Bigl(\exp\Bigl(i s \Bigl(\sum_{j=1}^{n} \mathcal{X}_j^*\Bigr)\Big/\sqrt{n}\Bigr)\Bigr) . $$
The right-hand side of the equation can be factorized and yields
$$ E\Bigl(e^{i s (\sum_{j=1}^{n} \mathcal{X}_j^*)/\sqrt{n}}\Bigr) = E\Bigl(e^{i s \mathcal{X}_j^*/\sqrt{n}}\Bigr)^n = h\Bigl(\frac{s}{\sqrt{n}}\Bigr)^n , $$
where $h(s)$ is the characteristic function of the random variable $\mathcal{X}_j^*$. Insertion
where h(s) is the characteristic function of the random variable Xj. Insertion
into the expression for the Taylor series yields now
h( s√
n
)= 1 − s2
2n
(1 + ε
( s√n
)).
Herein $s$ is fixed and $n$ approaches infinity:
$$ \lim_{n\to\infty} E\bigl(e^{i s S_n^*}\bigr) = \lim_{n\to\infty} \Bigl(1 - \frac{s^2}{2n}\Bigl(1 + \varepsilon\Bigl(\frac{s}{\sqrt{n}}\Bigr)\Bigr)\Bigr)^n = e^{-s^2/2} . \qquad (2.92) $$
For taking the limit in the last step of the derivation we recall the summation of infinite series:
$$ \lim_{n\to\infty} \Bigl(1 - \frac{\alpha_n}{n}\Bigr)^n = e^{-\alpha} \quad\text{for}\quad \lim_{n\to\infty} \alpha_n = \alpha . \qquad (2.93) $$
This is a stronger result than the convergence of the conventional exponential series, $\lim_{n\to\infty}(1 - \alpha/n)^n = e^{-\alpha}$. Thus we have shown that the characteristic function of the normalized sum of random variables, $S_n^*$, converges to the characteristic function of the unit normal distribution; therefore, by equation (2.91), the distribution $F_n(x)$ converges to the unit normal distribution $\Phi(x)$, and the validity of (2.88) follows straightforwardly.
Summarizing the results of this part we conclude that the distribution of the sum $S_n$ of every sequence of $n$ independent random variables $\mathcal{X}_j$ with finite variance converges to the normal distribution for $n \to \infty$. This convergence is independent of the particular distribution of the random variables, provided all variables follow the same distribution and have finite mean $\mu$ and variance $\sigma^2$.
In contrast to the rigorous mathematical derivation, simple practical applications used in the large sample theory of statistics turn the limit theorem encapsulated in equation (2.92) into a rough approximation,
$$ P\bigl(\sigma\sqrt{n}\, x_1 < S_n - nm < \sigma\sqrt{n}\, x_2\bigr) \approx \Phi(x_2) - \Phi(x_1) , \qquad (2.94) $$
or, for the spread around the sample mean $m$, by setting $x_1 = -x_2$,
$$ P\bigl(|S_n - nm| < \sigma\sqrt{n}\, x\bigr) \approx 2\Phi(x) - 1 . \qquad (2.94') $$
For practical purposes equation (2.94) was used in pre-computer times together with extensive tabulations of the functions $\Phi(x)$ and $\Phi^{-1}(x)$, which are still found in statistics textbooks.
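What the tabulations once provided is a single function call today; a sketch assuming SciPy:

    from scipy.stats import norm

    print(norm.cdf(1.96) - norm.cdf(-1.96))   # ~ 0.95, cf. eq. (2.94) with x1 = -x2
    print(norm.ppf(0.975))                    # ~ 1.96, the inverse Phi^{-1}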
The law of large numbers can be derived as a straightforward consequence of the central limit theorem (2.92) [5, pp.227-233]. For any fixed but arbitrary constant $c > 0$ we have
$$ \lim_{n\to\infty} P\Bigl(\Bigl|\frac{S_n}{n} - \mu\Bigr| < c\Bigr) = 1 . \qquad (2.95) $$
Related to and a consequence of equation (2.95) is Chebyshev's inequality, named after the Russian mathematician Pafnuty Chebyshev, for random variables $\mathcal{X}$ that have a finite second moment:
$$ P(|\mathcal{X}| \ge c) \le \frac{E(\mathcal{X}^2)}{c^2} \qquad (2.96) $$
is fulfilled for any constant $c$. We dispense here with a proof, which is found in [5, pp.228-233].
Equation (2.95) can be extended to a sequence of independent random variables $\mathcal{X}_j$ with different expectation values and variances, $E(\mathcal{X}_j) = \mu^{(j)}$ and $\sigma^2(\mathcal{X}_j) = \sigma_j^2$, with the restriction that there exists a constant $\Sigma^2 < \infty$ such that $\sigma_j^2 \le \Sigma^2$ is fulfilled for all $\mathcal{X}_j$. Then we have for each $c > 0$:
$$ \lim_{n\to\infty} P\Bigl(\Bigl|\frac{\mathcal{X}_1 + \ldots + \mathcal{X}_n}{n} - \frac{\mu^{(1)} + \ldots + \mu^{(n)}}{n}\Bigr| < c\Bigr) = 1 . \qquad (2.97) $$
For the purpose of illustration we consider a Bernoulli sequence of coin tosses as described by a binomial distribution. First we rewrite equation (2.97) by introducing centered random variables $\tilde{\mathcal{X}}_j = \mathcal{X}_j - \mu^{(j)}$ (we note that these variables are not normalized with respect to variance, in contrast to $\mathcal{X}_j^* = \tilde{\mathcal{X}}_j/\sigma(\mathcal{X}_j)$) and their sum $\tilde{S}_n = \sum_{i=1}^{n} \tilde{\mathcal{X}}_i$ with
$$ E(\tilde{S}_n) = 0 \quad\text{and}\quad E\bigl(\tilde{S}_n^2\bigr) = \sigma^2(\tilde{S}_n) = \sum_{i=1}^{n} \sigma^2(\mathcal{X}_i) = \sum_{i=1}^{n} \sigma_i^2 , $$
make use of the boundedness of the variances, $\sigma_j^2 \le \Sigma^2$,
$$ E\bigl(\tilde{S}_n^2\bigr) \le n\,\Sigma^2 \quad\text{and}\quad E\Bigl(\Bigl(\frac{\tilde{S}_n}{n}\Bigr)^2\Bigr) \le \frac{\Sigma^2}{n} , $$
and find through application of Chebyshev's inequality (2.96)
$$ P\Bigl(\Bigl|\frac{\tilde{S}_n}{n}\Bigr| \ge c\Bigr) \le \frac{E\bigl((\tilde{S}_n/n)^2\bigr)}{c^2} \le \frac{\Sigma^2}{n c^2} . $$
Hence the probability $P$ above converges to zero in the limit $n \to \infty$. Second, insertion of the specific data for the Bernoulli series yields
$$ P\Bigl(\Bigl|\frac{S_n}{n} - p\Bigr| \ge c\Bigr) \le \frac{p(1-p)}{n c^2} \le \frac{1}{4 n c^2} , $$
where the last inequality results from $p(1-p) \le 1/4$ for $0 \le p \le 1$. In order to keep this error probability below $\varepsilon$, the number of trials has to exceed $n \ge 1/(4 c^2 \varepsilon)$.
Since the Chebyshev inequality provides a rather crude estimate, we present also a sharper bound that leads to the approximation
$$ P\Bigl(\Bigl|\frac{S_n - np}{\sqrt{n\, p(1-p)}}\Bigr| \ge \sqrt{\frac{n}{p(1-p)}}\; c\Bigr) \approx 2\Bigl(1 - \Phi\Bigl(\sqrt{\frac{n}{p(1-p)}}\; c\Bigr)\Bigr) . $$
Eventually we put $\eta = c\sqrt{n}/\sqrt{p(1-p)}$ and find
$$ 2\bigl(1 - \Phi(\eta)\bigr) \le \varepsilon \quad\text{or}\quad \Phi(\eta) \ge 1 - \frac{\varepsilon}{2} , $$
which is suitable for numerical evaluation.
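For a fair coin the two bounds can indeed be evaluated side by side; a sketch assuming SciPy, with the tolerance $c = 0.01$ and the error level $\varepsilon = 0.05$ as arbitrary choices:

    from math import ceil
    from scipy.stats import norm

    c, eps = 0.01, 0.05
    n_chebyshev = ceil(1 / (4 * c**2 * eps))     # crude bound: n >= 1/(4 c^2 eps)
    eta = norm.ppf(1 - eps / 2)                  # Phi(eta) >= 1 - eps/2
    n_normal = ceil(eta**2 * 0.25 / c**2)        # from eta = c sqrt(n/(p(1-p))), p = 1/2
    print(n_chebyshev, n_normal)                 # 50000 versus roughly 9604 trials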
The main message of the law of large numbers is that for a sufficiently large number of independent events the statistical errors in the sum vanish and the sample mean converges to the exact expectation value. Hence, the law of large numbers provides the basis for the assumption of convergence in mathematical statistics.
2.7.8 The Cauchy-Lorentz distribution
The Cauchy-Lorentz distribution is named after the French mathematician
Augustin Louis Cauchy and the Dutch physicist Hendrik Antoon Lorentz
and is important in mathematics and in particular in physics where it occurs
as the solution to the differential equation for forced resonance. In spec-
troscopy the Lorentz curve is used for the description of spectral lines that
are homogeneously broadened. The Cauchy density function is of the form
$$ f(x) = \frac{1}{\pi\gamma} \cdot \frac{1}{1 + \bigl(\frac{x - x_0}{\gamma}\bigr)^2} = \frac{1}{\pi} \cdot \frac{\gamma}{(x - x_0)^2 + \gamma^2} , \qquad (2.98) $$
which yields the cumulative distribution function
$$ F(x) = \frac{1}{\pi} \arctan\Bigl(\frac{x - x_0}{\gamma}\Bigr) + \frac{1}{2} . \qquad (2.99) $$
The two parameters define the position of the peak, x0, and the width of the
distribution, γ (figure 2.20). The peak height or amplitude is 1/(πγ). The
function F (x) can be inverted
$$ F^{-1}(p) = x_0 + \gamma\, \tan\Bigl(\pi\Bigl(p - \frac{1}{2}\Bigr)\Bigr) , \qquad (2.99') $$
and we obtain for the quartiles and the median the values $(x_0 - \gamma,\ x_0,\ x_0 + \gamma)$.
The characteristic function of the Cauchy distribution is given by
$$ \phi_{\mathcal{X}}(s) = E\bigl(e^{i\mathcal{X}s}\bigr) = \int_{-\infty}^{\infty} f(x)\, e^{isx}\, dx = \exp\bigl(i x_0 s - \gamma|s|\bigr) , \qquad (2.100) $$
which is the Fourier transform of the probability density; the density in turn can be recovered from
$$ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_{\mathcal{X}}(s)\, e^{-ixs}\, ds . $$
We remark that the conventional Fourier transformation differs only in the choice of a factor $2\pi$ and the sign in the exponent. The Cauchy distribution has a well defined median and a well defined mode, both given by $x_0$; quantiles can be calculated readily from (2.99'), but the mean and all higher moments do not exist because the integral $\int_{-\infty}^{\infty} x f(x)\, dx$ diverges.
Figure 2.20: The Cauchy-Lorentz probability density. The figure shows three examples of Cauchy-Lorentz densities, $f(x) = \gamma/\bigl(\pi((x - x_0)^2 + \gamma^2)\bigr)$, centered around the median $x_0 = 5$. The width of the distribution increases with $\gamma$, which was chosen to be $0.5$ (red), $1.0$ (black), and $2.0$ (blue).
2.7.9 Bimodal distributions
As the name indicates, the density function $f(x)$ of a bimodal distribution has two maxima. It arises commonly as a mixture of two unimodal distributions in the sense that the bimodally distributed random variable $\mathcal{X}$ is defined as
$$ \mathcal{X} = \begin{cases} \mathcal{Y}_1 & \text{with probability } \alpha , \\ \mathcal{Y}_2 & \text{with probability } 1 - \alpha . \end{cases} $$
Bimodal distributions commonly arise from the statistics of populations that are split into two subpopulations with sufficiently different properties. The sizes of weaver ants give rise to a bimodal distribution because of the existence of two classes of workers [57]. If the differences are too small, as in the case of the combined distribution of body heights of men and women, only a single mode is observed [58].
As an illustrative model we choose the superposition of two normal dis-
tributions with different means and variances (figure 2.21). The probability
Figure 2.21: A bimodal probability density. The figure illustrates a bimodal distribution modeled as a superposition of two normal distributions (2.101) with $\alpha = 1/2$ and different values for mean and variance, $(\nu_1 = 2, \sigma_1^2 = 1/2)$ and $(\nu_2 = 6, \sigma_2^2 = 1)$: $f(x) = \bigl(\sqrt{2}\, e^{-(x-2)^2} + e^{-(x-6)^2/2}\bigr)/(2\sqrt{2\pi})$. The upper part shows the probability density with the two modes $\hat{\mu}_1 = \nu_1 = 2$ and $\hat{\mu}_2 = \nu_2 = 6$. Median $\tilde{\mu} = 3.65685$ and mean $\mu = 4$ are situated near the density minimum between the two maxima. The lower part presents the cumulative probability distribution, $F(x) = \frac{1}{4}\bigl(2 + \operatorname{erf}(x - 2) + \operatorname{erf}\bigl(\frac{x-6}{\sqrt{2}}\bigr)\bigr)$, as well as the construction of the median. The second moments in this example are $\mu_2 = 20.75$ (raw) and $\bar{\mu}_2 = 4.75$ (centered).
density for $\alpha = 1/2$ is then of the form
$$ f(x) = \frac{1}{2\sqrt{2\pi}} \Bigl( e^{-\frac{(x-\nu_1)^2}{2\sigma_1^2}} \Big/ \sqrt{\sigma_1^2} \; + \; e^{-\frac{(x-\nu_2)^2}{2\sigma_2^2}} \Big/ \sqrt{\sigma_2^2} \Bigr) . \qquad (2.101) $$
The cumulative distribution function is readily obtained by integration. As in
the case of the normal distribution the result is not analytical but formulated
in terms of the error function, which is available only numerically through
integration:
$$ F(x) = \frac{1}{4} \Bigl( 2 + \operatorname{erf}\Bigl(\frac{x - \nu_1}{\sqrt{2\sigma_1^2}}\Bigr) + \operatorname{erf}\Bigl(\frac{x - \nu_2}{\sqrt{2\sigma_2^2}}\Bigr) \Bigr) . \qquad (2.102) $$
In the numerical example shown in figure 2.21 the distribution function shows
two distinct steps corresponding to the maxima of the density f(x).
As an exercise, the first and second moments of the bimodal distribution can readily be computed analytically. The results are:
$$ \mu_1 = \mu = \frac{1}{2}(\nu_1 + \nu_2) , \quad \bar{\mu}_1 = 0 , \quad\text{and} $$
$$ \mu_2 = \frac{1}{2}(\nu_1^2 + \nu_2^2) + \frac{1}{2}(\sigma_1^2 + \sigma_2^2) , \quad \bar{\mu}_2 = \frac{1}{4}(\nu_1 - \nu_2)^2 + \frac{1}{2}(\sigma_1^2 + \sigma_2^2) . $$
The centered second moment illustrates the two contributions to the variance of the bimodal density: it is composed of the mean of the variances of the subpopulations and a quarter of the squared difference between the two means, $(\nu_1 - \nu_2)^2/4$.
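The moment formulas can be confirmed by sampling from the mixture of figure 2.21; a sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1_000_000
    pick = rng.random(n) < 0.5                             # alpha = 1/2
    x = np.where(pick, rng.normal(2.0, np.sqrt(0.5), n),   # (nu1, sigma1^2) = (2, 1/2)
                       rng.normal(6.0, 1.0, n))            # (nu2, sigma2^2) = (6, 1)

    print(x.mean())        # ~ 4     = (nu1 + nu2)/2
    print((x**2).mean())   # ~ 20.75 = (nu1^2 + nu2^2)/2 + (sigma1^2 + sigma2^2)/2
    print(x.var())         # ~ 4.75  = (nu1 - nu2)^2/4 + (sigma1^2 + sigma2^2)/2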
In table 2.2 we have listed also several other probability densities which
are of importance for special applications. In the forthcoming chapters 4 and
5 dealing with applications we shall make use of them, in particular of the
logistic distribution that describes the stochasticity of growth following the
logistic equation.
3. Stochastic processes
Systems evolving probabilistically in time can be described and modeled in
mathematical terms by stochastic processes. More precisely, we postulate
the existence of a time dependent random variable $\mathcal{X}(t)$ or random vector $\vec{\mathcal{X}}(t)$.1 We shall distinguish the simpler discrete case,
$$ P_n(t) = P\bigl(\mathcal{X}(t) = n\bigr) \quad\text{with}\quad n \in \mathbb{N}_0 , $$
and the continuous or probability density case,
$$ dF(x, t) = f(x, t)\, dx = P\bigl(x \le \mathcal{X}(t) \le x + dx\bigr) \quad\text{with}\quad x \in \mathbb{R} . $$
In both cases an experiment, or a trajectory, is understood as a recording of the particular values of $\mathcal{X}$ at certain times:
$$ \mathcal{T} = \bigl((x_1, t_1), (x_2, t_2), (x_3, t_3), \cdots, (y_1, \tau_1), (y_2, \tau_2), \cdots\bigr) . \qquad (3.1) $$
Although it is not essential for the application of probability theory, we shall assume for the sake of clarity that the recorded values are always time ordered here, with the earliest or oldest values in the rightmost positions and the most recent value as the leftmost entry (figure 3.1):2
$$ t_1 \ge t_2 \ge t_3 \ge \cdots \ge \tau_1 \ge \tau_2 \ge \cdots . $$
A trajectory is thus a sequence of time ordered doubles $(x, t)$.
A general comment on the meaning of variables is required. So far we
have only used the vague notion of scores and not yet specified what kinds of
1At first we need not specify whether $\mathcal{X}(t)$ is a simple random variable or a random vector. Later on, when a distinction between problems of different dimensionality becomes necessary, we shall make clear in which sense $\mathcal{X}(t)$ is used (variable in one dimension or vector $\vec{\mathcal{X}}(t)$).
2It is worth noticing that the conventional time axis in drawings of stochastic processes goes in the opposite direction, from left to right.
Figure 3.1: Time order in modeling stochastic processes. Time is progressing from left to right and the most recent event is given by the rightmost recording at time $t_1$. The Chapman-Kolmogorov equation describing stochastic processes comes in two forms: (i) the forward equation predicting the future from past and present, and (ii) the backward equation that extrapolates back in time from present to past.
quantities the random variables $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}) \in \Omega$ describe or what their realizations in some measurable space, denoted by $(x, y, z) \in \mathbb{R}^n$, are. Depending on the chemical or biological model these variables can be discrete numbers of particles in ensembles of atoms or molecules or continuous concentrations, they can be the numbers of individuals in populations, or they can be positions in three-dimensional space when migration processes are considered. In the last example we shall tacitly assume that the one-dimensional variables can be replaced by vectors, $\bigl(\mathcal{X}(t), \mathcal{Y}(t), \mathcal{Z}(t)\bigr) \Longrightarrow \bigl(\vec{\mathcal{X}}(t), \vec{\mathcal{Y}}(t), \vec{\mathcal{Z}}(t)\bigr)$ or $(x, y, z) \Longrightarrow (\mathbf{x}, \mathbf{y}, \mathbf{z})$, without changing the equations; if this is not the case the differences will be stated. In the more involved models of chemical reaction-diffusion systems the variables will be functions of space and time, for example $\mathcal{X}(\mathbf{r}, t)$.
In this chapter we shall present a general formalism to describe stochastic processes and distinguish between different classes, in particular drift, diffusion, and jump processes. In essence, we shall use the notation introduced by Crispin Gardiner [59]. The introduction given here is essentially based on the two textbooks [6, 16]. A few examples of stochastic processes of general importance will be discussed in this chapter in order to illustrate the formalism. Applications are presented in the following two chapters 4 and 5.
3.1 Markov processes
A stochastic process, as we shall assume, is determined by a set of joint probability densities, the existence of which is taken for granted and which determine the system completely:3
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) . \qquad (3.2) $$
By the phrase ’the determination is complete’ we mean that no additional
information is needed to describe the progress in terms of a time ordered
series (3.1) and we shall call such a process a separable stochastic pro-
cess. Although more general processes are conceivable, they play little role
in current physics, chemistry, and biology and therefore we shall not consider
them here.
Calculation of probabilities from (3.2) by means of marginal densities (2.21) and (2.27) is straightforward. For the continuous case we obtain
$$ P(\mathcal{X}_1 = x_1 \in [a,b]) = \int_a^b dx_1 \int\!\!\int\!\!\int_{-\infty}^{\infty} dx_2\, dx_3 \cdots dx_n \cdots\; p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) , $$
and in the discrete case the result is obvious:
$$ P(\mathcal{X} = x_1) = p(x_1, *) = \sum_{x_k \ne x_1} p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) . $$
Time ordering allows us to formulate predictions of future values from the known past in terms of conditional probabilities,
$$ p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots) = \frac{p(x_1, t_1;\, x_2, t_2;\, \cdots;\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots)}{p(y_1, \tau_1;\, y_2, \tau_2;\, \cdots)} , $$
with $t_1 \ge t_2 \ge \cdots \ge \tau_1 \ge \tau_2 \ge \cdots$. In other words, we may compute $(x_1, t_1), (x_2, t_2), \cdots$ from known $(y_1, \tau_1), (y_2, \tau_2), \cdots$. Before we derive
3The joint density $p$ is defined in the same way as in equations (2.20) and (2.26) but with a slightly different notation. In describing stochastic processes we are always dealing with doubles $(x, t)$, and therefore we separate individual doubles by a semicolon: $\cdots;\, x_k, t_k;\, x_{k+1}, t_{k+1};\, \cdots$.
a general concept that allows for flexible modeling and tractable stochastic
description of processes we introduce a few common and characteristic classes
of stochastic processes.
3.1.1 Simple stochastic processes
The simplest class of stochastic processes is characterized by complete independence,
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots) = \prod_i p(x_i, t_i) , \qquad (3.3) $$
which implies that the current value $\mathcal{X}(t)$ is completely independent of its values in the past. A special case is the sequence of Bernoulli trials (see $S_n$ in chapter 2, in particular in subsections 2.2.1 and 2.7.5), where the probability densities are also independent of time, $p(x_i, t_i) = p(x_i)$, and then we have
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots) = \prod_i p(x_i) . \qquad (3.3') $$
Further simplification occurs, of course, when all trials are based on the same probability distribution (for example, if the same coin is tossed in Bernoulli trials), and then the product is replaced by $p(x)^n$.
The notion of a martingale was introduced by the French mathematician Paul Pierre Lévy, and the development of the theory of martingales is due to the American mathematician Joseph Leo Doob. The conditional mean value of the random variable $\mathcal{X}(t)$, provided $\mathcal{X}(t_0) = x_0$, is defined as
$$ E\bigl(\mathcal{X}(t) \,|\, (x_0, t_0)\bigr) \doteq \int dx\; x\, p(x, t \,|\, x_0, t_0) . $$
In a martingale the conditional mean is simply given by
$$ E\bigl(\mathcal{X}(t) \,|\, (x_0, t_0)\bigr) = x_0 . \qquad (3.4) $$
The mean value at time $t$ is identical to the initial value of the process. The martingale property is rather strong, and we shall use it for several specific situations.
The somewhat relaxed notion of a semimartingale is of importance because it covers the majority of processes that are accessible to modeling by stochastic differential equations (section 3.5). A semimartingale is composed of a local martingale and a cadlag adapted process with bounded variation:
$$ \mathcal{X}(t) = \mathcal{M}(t) + \mathcal{A}(t) . $$
A local martingale is a stochastic process that satisfies the martingale property (3.4) locally, but its expectation value $\langle \mathcal{M}(t) \rangle$ may be distorted at long times by large values of low probability. Hence, every martingale is a local martingale and every bounded local martingale is a martingale. In particular, every driftless diffusion process is a local martingale but need not be a martingale. An adapted or nonanticipating process is a process that cannot see into the future. An informal interpretation [60, section II.25] would say: a stochastic process $\mathcal{X}(t)$ is adapted iff for every realization and for every time $t$, $\mathcal{X}(t)$ is known at time $t$. Cadlag stands for right-hand continuous with left limits (for the cadlag property of processes see also section 2.2.2).
More formally, the definition of an adapted process reads: for a probability space $(\Omega, \mathcal{F}, P)$ with $I$ being an index set with total order $(\le)$, for instance $I = \mathbb{N}$, $I = \mathbb{N}_0$, $I = [0, t]$, or $I = [0, +\infty)$, with $\mathcal{F}_{\cdot} = (\mathcal{F}_i)_{i \in I}$ being a filtration4 of the $\sigma$-algebra $\mathcal{F}$, $(S, \Sigma)$ being a measurable state space, and $\mathcal{X}: I \otimes \Omega \to S$ being a stochastic process, the process $\mathcal{X}$ is said to be adapted to the filtration $(\mathcal{F}_i)_{i \in I}$ if the random variable
$$ \mathcal{X}_i: \Omega \to S \ \text{is an}\ (\mathcal{F}_i, \Sigma)\text{-measurable function for each}\ i \in I . $$
The concept of adapted processes is essential for the Itô stochastic integral, which requires that the integrand be an adapted process.
4A filtration is an indexed set $S_i$ of subobjects of a given algebraic structure $S$, with the index $i$ running over some index set $I$ that is totally ordered with respect to the condition: $i \le j$, $(i, j) \in I \Rightarrow S_i \subseteq S_j$. Alternatively, in a filtered algebra there is instead the requirement that the $S_i$ are subobjects with respect to certain operations, for example vector addition, whereas other operations are compatible with the index structure, for example a multiplication that satisfies $S_i \cdot S_j \subset S_{i \oplus j}$, where the index set is the natural numbers, $I \equiv \mathbb{N}$ (see also [61]).
Another simple concept assumes that knowledge of the present alone is sufficient to predict the future. It is realized in Markov processes, named after the Russian mathematician Andrey Markov, and can be formulated easily in terms of conditional probabilities:
$$ p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots) = p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1) . \qquad (3.5) $$
In essence, the Markov condition expresses more precisely the assumptions of Albert Einstein and Marian von Smoluchowski in their derivation of the diffusion process. In particular, we have
$$ p(x_1, t_1;\, x_2, t_2;\, y_1, \tau_1) = p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, y_1, \tau_1)\; p(y_1, \tau_1) . $$
As we have seen in section 2.4, any arbitrary joint probability can be simply expressed as a product of conditional probabilities:
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n) = p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, x_3, t_3) \cdots p(x_{n-1}, t_{n-1} \,|\, x_n, t_n)\; p(x_n, t_n) \qquad (3.5') $$
under the assumption of time ordering $t_1 \ge t_2 \ge t_3 \ge \ldots \ge t_{n-1} \ge t_n$.
3.1.2 The Chapman-Kolmogorov equation
From joint probabilities it also follows that summation over all mutually exclusive events of one kind eliminates the corresponding variable:
$$ \sum_{B} P(A \cap B \cap C) = P(A \cap C) . $$
By the same token we find
$$ p(x_1, t_1) = \int dx_2\; p(x_1, t_1;\, x_2, t_2) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2) . $$
Extension to three events leads to
$$ p(x_1, t_1 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1;\, x_2, t_2 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2;\, x_3, t_3)\; p(x_2, t_2 \,|\, x_3, t_3) . $$
For $t_1 \ge t_2 \ge t_3$ and making use of the Markov assumption we obtain the Chapman-Kolmogorov equation, which is named after the British geophysicist and mathematician Sydney Chapman and the Russian mathematician Andrey Kolmogorov:
$$ p(x_1, t_1 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, x_3, t_3) . \qquad (3.6) $$
In case we are dealing with a discrete random variable $\mathcal{N} \in \mathbb{N}_0$ defined on the integers, we replace the integral by a sum and obtain
$$ P(n_1, t_1 \,|\, n_3, t_3) = \sum_{n_2} P(n_1, t_1 \,|\, n_2, t_2)\; P(n_2, t_2 \,|\, n_3, t_3) . \qquad (3.7) $$
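For a time-homogeneous chain on finitely many states, equation (3.7) reduces to matrix multiplication: the two-step transition probabilities are the square of the one-step transition matrix. A sketch assuming NumPy, with an arbitrary illustrative 3-state matrix:

    import numpy as np

    P = np.array([[0.8, 0.1, 0.1],    # P[i, j]: probability of a jump i -> j per step
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])

    P2 = P @ P                        # sum over the intermediate state n2, eq. (3.7)
    print(P2)                         # transition probabilities over two steps
    print(P2.sum(axis=1))             # rows still sum to one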
The Chapman-Kolmogorov equation can be interpreted in two different ways, known as the forward and the backward equation. In the forward equation the double $(x_3, t_3)$ is considered to be fixed and $(x_1, t_1)$ represents the variable $x_1(t_1)$, with the time $t_1$ proceeding in the positive direction. The backward equation explores the past of a given situation: the double $(x_1, t_1)$ is fixed and $(x_3, t_3)$ is propagating backwards in time. The forward equation is better suited to describe actual processes, whereas the backward equation is the appropriate tool to compute the evolution towards given events, for example first passage times. In order to discuss the structure of solutions of equations (3.6) and (3.7), we shall derive the equations in differential form.
Continuity of processes. Before we can do so we need to consider a condition for the continuity of Markov processes. The process goes from position $z$ at time $t$ to position $x$ at time $t + \Delta t$. Continuity of the process implies that the probability of $x$ being finitely different from $z$ goes to zero faster than $\Delta t$ in the limit $\Delta t \to 0$:
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x - z| > \varepsilon} dx\; p(x, t + \Delta t \,|\, z, t) = 0 , \qquad (3.8) $$
and this uniformly in $z$, $t$, and $\Delta t$. In other words, the difference in probability as a function of $|x - z|$ converges sufficiently fast to zero, and thus no jumps occur in the random variable $\mathcal{X}(t)$.
Figure 3.2: Continuity in Markov processes. Continuity is illustrated by
means of two stochastic processes of the random variable X (t), the Wiener process
W(t) (3.9) and the Cauchy process C(t) (3.10). The Wiener process describes
Brownian motion and is continuous but almost nowhere differentiable. The even
more irregular Cauchy process is wildly discontinuous.
As two illustrative examples for the analysis of continuity we choose in figure 3.2 the Einstein-Smoluchowski solution of Brownian motion,5 which leads to a normally distributed probability,
$$ p(x, t + \Delta t \,|\, z, t) = \frac{1}{\sqrt{4\pi D \Delta t}} \exp\Bigl(-\frac{(x - z)^2}{4 D \Delta t}\Bigr) , \qquad (3.9) $$
and the so-called Cauchy process following the Cauchy-Lorentz distribution,
$$ p(x, t + \Delta t \,|\, z, t) = \frac{\Delta t}{\pi}\, \frac{1}{(x - z)^2 + \Delta t^2} . \qquad (3.10) $$
In the case of the Wiener process we exchange the limit and the integral, introduce $\vartheta = (\Delta t)^{-1}$, perform the limit $\vartheta \to \infty$, and have
5Later on we shall discuss this particular stochastic process in detail and call it a Wiener process.
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| > \varepsilon} dx\; \frac{1}{\sqrt{4\pi D}}\, \frac{1}{\sqrt{\Delta t}} \exp\Bigl(-\frac{(x-z)^2}{4 D \Delta t}\Bigr) = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \frac{1}{\sqrt{4\pi D}}\, \frac{1}{\sqrt{\Delta t}} \exp\Bigl(-\frac{(x-z)^2}{4 D \Delta t}\Bigr) = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\vartheta \to \infty} \frac{1}{\sqrt{4\pi D}}\, \vartheta^{3/2} \exp\Bigl(-\frac{(x-z)^2}{4D}\, \vartheta\Bigr) , \quad\text{where} $$
$$ \lim_{\vartheta \to \infty} \frac{\vartheta^{3/2}}{1 + \frac{(x-z)^2}{4D}\,\vartheta + \frac{1}{2!}\Bigl(\frac{(x-z)^2}{4D}\Bigr)^2 \vartheta^2 + \frac{1}{3!}\Bigl(\frac{(x-z)^2}{4D}\Bigr)^3 \vartheta^3 + \ldots} = 0 . $$
Since the power expansion of the exponential in the denominator increases faster than every finite power of $\vartheta$, the ratio vanishes in the limit $\vartheta \to \infty$ and the value of the integral is zero.
In the second example, the Cauchy process, we exchange limit and integral as in the case of the Wiener process and perform the limit $\Delta t \to 0$:
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| > \varepsilon} dx\; \frac{\Delta t}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \frac{\Delta t}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = $$
$$ = \int_{|x-z| > \varepsilon} \frac{1}{\pi\, (x-z)^2}\, dx \ne 0 . $$
The last integral, $I = \int_{|x-z| > \varepsilon} dx/(x-z)^2$, takes a value of the order $I \approx 1/\varepsilon$.
Although it is continuous, the curve of Brownian motion is indeed extremely irregular, since it is nowhere differentiable (figure 3.2). The Cauchy-process curve is also irregular but, in addition, discontinuous. Both processes, as required for consistency, fulfill the relation
$$ \lim_{\Delta t \to 0} p(x, t + \Delta t \,|\, z, t) = \delta(x - z) , $$
where $\delta(\cdot)$ is the so-called delta-function.6 It is also straightforward to show that the Chapman-Kolmogorov equation is fulfilled in both cases. Note that a small but finite difference $|x - z| > \varepsilon$ is required to avoid the collapse of the distribution onto the delta-function, which would make the detection of continuity impossible.
6The delta-function is not a proper function but a generalized function or distribution. It was introduced by Paul Dirac in quantum mechanics. For more details see, for example, [62, pp.585-590] and [63, pp.38-42].
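The contrast between the two processes becomes visible in simulated trajectories built from independent increments drawn from (3.9) and (3.10); a sketch assuming NumPy, with $D = 1/2$ and step size $\Delta t = 10^{-3}$ as arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(5)
    dt, n = 1e-3, 10_000
    # Wiener-type increments: Gaussian with variance 2*D*dt (here D = 1/2)
    wiener = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
    # Cauchy increments: Cauchy-Lorentz with scale parameter dt, cf. eq. (3.10)
    cauchy = np.cumsum(dt * rng.standard_cauchy(n))

    print(np.abs(np.diff(wiener)).max())   # largest step stays of order sqrt(dt)
    print(np.abs(np.diff(cauchy)).max())   # dominated by a few jump-like steps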
Differential Chapman-Kolmogorov equation. Now we shall develop a differential version of the Chapman-Kolmogorov equation, which is based on the continuity condition just discussed. This requires a technique to divide differentiability conditions into parts corresponding either to continuous motion under generic conditions or to discontinuous motion. The partitioning is based on the following conditions for all $\varepsilon > 0$: (3.11)
(i) $\lim_{\Delta t \to 0} \frac{1}{\Delta t}\, p(x, t + \Delta t \,|\, z, t) = W(x \,|\, z, t)$, uniformly in $x$, $z$, and $t$ for $|x - z| \ge \varepsilon$;
(ii) $\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)\, p(x, t + \Delta t \,|\, z, t) = a_i(z, t) + O(\varepsilon)$, uniformly in $z$ and $t$;
(iii) $\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)(x_j - z_j)\, p(x, t + \Delta t \,|\, z, t) = B_{ij}(z, t) + O(\varepsilon)$, uniformly in $z$ and $t$;
where $x_i$, $x_j$, $z_i$, and $z_j$ refer to particular components of the vectors $x$ and $z$, respectively. In (i), $W(x \,|\, z, t)$ is the jump probability from $z$ to $x$ at time
t. It is important to notice that all higher-order coefficients of motion Cijk,
defined in analogy to ai in (ii) or Bij in (iii), must vanish by symmetry
considerations [6, p.47-48]. As an example we consider the third order term
defined by
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)(x_j - z_j)(x_k - z_k)\, p(x, t + \Delta t \,|\, z, t) = C_{ijk}(z, t) + O(\varepsilon) . $$
The function $C_{ijk}(z, t)$ is symmetric in the indices $i$, $j$, and $k$, and in order to check the consequences of this symmetry we define
$$ C(\boldsymbol{\alpha}, z, t) \equiv \sum_{i,j,k} \alpha_i \alpha_j \alpha_k\, C_{ijk}(z, t) , $$
which can be written as
$$ C_{ijk}(z, t) = \frac{1}{3!}\, \frac{\partial^3}{\partial \alpha_i\, \partial \alpha_j\, \partial \alpha_k}\, C(\boldsymbol{\alpha}, z, t) . $$
By comparison with item (iii) we find
$$ |C(\boldsymbol{\alpha}, z, t)| \le \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; \bigl|\boldsymbol{\alpha}\cdot(x - z)\bigr|\, \bigl(\boldsymbol{\alpha}\cdot(x - z)\bigr)^2\, p(x, t + \Delta t \,|\, z, t) + O(\varepsilon) $$
$$ \le |\boldsymbol{\alpha}|\, \varepsilon \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; \bigl(\boldsymbol{\alpha}\cdot(x - z)\bigr)^2\, p(x, t + \Delta t \,|\, z, t) + O(\varepsilon) $$
$$ = \varepsilon\, |\boldsymbol{\alpha}| \Bigl(\sum_{i,j} \alpha_i \alpha_j\, B_{ij}(z, t) + O(\varepsilon)\Bigr) + O(\varepsilon) = O(\varepsilon) , $$
and accordingly $C$ vanishes. It can be shown by an analogous derivation that all quantities of higher than third order are zero, too.
According to the continuity condition (3.8) a Markov process can only
have a continuous path if W (x|z, t) vanishes for all x 6= z. It is sugges-
tive therefore that this function describes discontinuous motion, whereas the
quantities ai and Bij are connected with aspects of continuous motion.
In order to derive a differential version of the Chapman-Kolmogorov equation we consider the time evolution of the expectation of a function $f(z)$ which is (at least) twice differentiable:
$$ \frac{\partial}{\partial t} \int dx\; f(x)\, p(x, t \,|\, y, t') = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigl( \int dx\; f(x) \bigl( p(x, t + \Delta t \,|\, y, t') - p(x, t \,|\, y, t') \bigr) \Bigr) = $$
$$ = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigl\{ \int\!\!\int dx\, dz\; f(x)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') - \int dz\; f(z)\, p(z, t \,|\, y, t') \Bigr\} , $$
where we have used the Chapman-Kolmogorov equation in the positive term in order to produce the $\int dz$ expression. In the negative term we made use of the fact that $\int dx\; p(x, t + \Delta t \,|\, z, t) = 1$, because $p(x, t + \Delta t \,|\, z, t)$ is a conditional probability. The derivation of the expression is visualized much more easily by considering the association of variables with times: $x \leftrightarrow t + \Delta t$, $z \leftrightarrow t$, and $y \leftrightarrow t'$ with $t + \Delta t > t > t'$. Then we obtain the second term just by the appropriate change of variables, $x \leftrightarrow z$.
The integral over dx is now divided into two regions, |x − z| ≥ ε and
|x− z| < ε. Since f(z) is assumed to be twice continuously differentiable, we
find by means of a Taylor expansion up to second order:
$$ f(x) = f(z) + \sum_i \frac{\partial f(z)}{\partial z_i}\, (x_i - z_i) + \sum_{i,j} \frac{1}{2}\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j}\, (x_i - z_i)(x_j - z_j) + |x - z|^2\, R(x, z) . $$
From the condition of twice continuous differentiability follows |R(x, z)| → 0
as |x−z| → 0 where R(x, z) is the remainder term after the truncation of the
Taylor series. Substitution in the partial time derivative of the expectation
value from above yields:
$$ \frac{\partial}{\partial t} \int dx\; f(x)\, p(x, t \,|\, y, t') = $$
$$ = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \biggl( \int\!\!\int_{|x-z|<\varepsilon} dx\, dz \Bigl( \sum_i (x_i - z_i)\, \frac{\partial f(z)}{\partial z_i} + \sum_{i,j} \frac{1}{2}\, (x_i - z_i)(x_j - z_j)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr) \times $$
$$ \times\; p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12a) $$
$$ +\; \int\!\!\int_{|x-z|<\varepsilon} dx\, dz\; |x - z|^2\, R(x, z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12b) $$
$$ +\; \int\!\!\int_{|x-z|<\varepsilon} dx\, dz\; f(z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12c) $$
$$ +\; \int\!\!\int_{|x-z|\ge\varepsilon} dx\, dz\; f(x)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; - \qquad (3.12d) $$
$$ -\; \int\!\!\int dx\, dz\; f(z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \biggr) . \qquad (3.12e) $$
In the last term of the equation, line (3.12e), the integral over $x$ is simply one, since $p(x, t + \Delta t \,|\, z, t)$ is a probability and the integration covers the entire sample space.
In the following we consider the individual terms separately. As we have assumed uniform convergence, we can take the limit inside the integral and obtain by means of conditions (ii) and (iii) from (3.11) for the term (3.12a):
$$ \int dz \Bigl( \sum_i a_i(z, t)\, \frac{\partial f(z)}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr)\, p(z, t \,|\, y, t') \; + \; O(\varepsilon) . $$
The next term, (3.12b), is a remainder term and vanishes in the limit $\varepsilon \to 0$ [6, p.49]:
$$ \Bigl| \frac{1}{\Delta t} \int_{|x-z|<\varepsilon} dx\; (x - z)^2\, R(x, z)\, p(x, t + \Delta t \,|\, z, t) \Bigr| \le $$
$$ \le \Bigl| \frac{1}{\Delta t} \int_{|x-z|<\varepsilon} dx\; (x - z)^2\, p(x, t + \Delta t \,|\, z, t) \Bigr| \Bigl( \max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \Bigr) \to $$
$$ \to \Bigl| \sum_{i,j} B_{ij}(z, t) + O(\varepsilon) \Bigr| \Bigl( \max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \Bigr) . $$
From the previously stated requirement of twice continuous differentiability follows $\max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \to 0$ as $\varepsilon \to 0$.
The remaining three terms, (3.12c), (3.12d), and (3.12e), can be combined and yield:7
$$ \int\!\!\int_{|x-z|\ge\varepsilon} dx\, dz\; f(z) \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . $$
The whole right-hand side of equation (3.12) is independent of $\varepsilon$. Thus we can take the limit $\varepsilon \to 0$ and find
$$ \frac{\partial}{\partial t} \int dz\; f(z)\, p(z, t \,|\, y, t') = \int dz \Bigl( \sum_i a_i(z, t)\, \frac{\partial f(z)}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr)\, p(z, t \,|\, y, t') \; + $$
$$ +\; \int dz\; f(z)\; \operatorname{PV}\!\!\int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) , $$
where $\operatorname{PV}\!\int dx$ stands for a principal value integral, for example
where –∫
dx stands for a principal value integral, for example,
limε→0
∫
|x−z|>ε
dxF (x, z) ≡ —
∫dxF (x, z) ,
7Note that we interchanged the variables x and z in the two positive terms (3.12c) and
(3.12d). This is generally admissible since the integration extends over both domains.
the principal value integral of a function $F(x, z)$. For any realistic process we can assume that this integral exists. Condition (i) of (3.11) defines $W(x \,|\, z, t)$ for $x \ne z$ only ($\varepsilon = |x - z| > 0$) and hence leaves open the possibility that it becomes infinite at $x = z$.8 In general, when $p(x, t \,|\, y, t')$ is continuous and once differentiable, the principal value integral exists. In the forthcoming part we shall dispense with spelling out the principal value integral explicitly, since singular cases like the Cauchy process, for which contour integration based on complex function theory is required, are considered only rarely.
The final step in our derivation is now integration by parts, for which we recall $\int f'(x)\, g(x)\, dx = f(x)\, g(x) - \int f(x)\, g'(x)\, dx$ from elementary calculus. Some careful computation finally yields
$$ \int dz\; f(z)\, \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = \int dz\; f(z) \Bigl( -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) \; + $$
$$ +\; \sum_{i,j} \frac{1}{2}\, \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) \; + $$
$$ +\; \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) \Bigr) \; + \; \text{surface terms} . $$
So far we have not yet specified the range of the integrals. The process under consideration is assumed to be confined to a region $R \subset \Omega$ in sample space with some surface $S$. Clearly, probabilities vanish when variables are outside $R$, and by definition we have
$$ p(x, t \,|\, z, t') = 0 \quad\text{and}\quad W(x \,|\, z, t) = 0 \quad\text{unless both}\ x, z \in R . $$
The situation with the functions $a_i(z, t)$ and $B_{ij}(z, t)$ is more subtle, since the conditions on them can lead to discontinuities: the conditional probability $p(x, t + \Delta t \,|\, z, t)$ may well change discontinuously as $z$ crosses the boundary of $R$. Such a discontinuity could, for example, result from the requirement that no transitions are allowed from outside $R$ to inside $R$ or vice versa. Integration by parts requires, as initially stated, that $a_i$ and $B_{ij}$ are once and twice differentiable, respectively. In order to avoid problems related to discontinuous behavior at the surface $S$ we may choose the function $f(z)$ to be arbitrary but non-vanishing only in a region $R' = R \setminus S \subset R$, and according to this choice of $f(z)$ the surface terms vanish necessarily.
8This is indeed the case for the Cauchy process (figure 3.2), for which $W(x \,|\, z, t) = 1/[\pi (x - z)^2]$.
Then we have for all $z$ in the interior of $R$:
$$ \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) \; + \qquad (3.13a) $$
$$ +\; \sum_{i,j} \frac{1}{2}\, \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) \; + \qquad (3.13b) $$
$$ +\; \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.13c) $$
This equation has been called the differential Chapman-Kolmogorov
equation by Crispin Gardiner [64]. Precisely, it is the forward equation
since it specifies initial conditions to lie in the past and it describes the
development of the probability density with increasing time. Later on, we
shall also discuss a backward equation.
From a mathematical puristic’s point of view it is not clear from the
derivation given here, that solutions of the differential Chapman-Kolmogorov
equation (3.13) exist or that the solutions of (3.13) are also solutions to the
Chapman-Kolmogorov equation (3.6). It is true, however, that the set of
conditional probabilities obeying equation (3.13) does generate a Markov
process in the sense that the joint probabilities produced satisfy all prob-
ability axioms. It has been shown, however, that a non-negative solution
to the differential Chapman-Kolmogorov equations exists and satisfies the
Chapman-Kolmogorov equation under certain conditions (see [65, Vol.II]):
(i) $a(x, t) = \{a_i(x, t);\ i = 1, \ldots\}$ and $B(x, t) = \{B_{ij}(x, t);\ i, j = 1, \ldots\}$ are vectors and positive semidefinite matrices9 of functions, respectively,
(ii) the $W(x \,|\, y, t)$ are non-negative quantities,
(iii) the initial condition has to satisfy $p(z, t \,|\, y, t) = \delta(y - z)$, which follows from the definition of a conditional probability density, and
(iv) appropriate boundary conditions have to be fulfilled.
The boundary conditions are very hard to specify for the full equation but
can be discussed precisely for special cases, for example in the case of the
Fokker-Planck equation [9].
9A positive definite matrix has exclusively positive eigenvalues, λk > 0 whereas a
positive semidefinite matrix has non-negative eigenvalues, λk ≥ 0.
3.2 Classes of stochastic processes

The differential Chapman-Kolmogorov equation describes three classes of components of stochastic processes, which allow us to define several important special cases. These classes refer to the three conditions (i), (ii), and (iii) discussed in (3.11) as well as to combinations derived from them. Here we shall be concerned only with the general aspect of classification. Specific examples will be discussed in the forthcoming section 3.4.
3.2.1 Jump process and master equation
For the jump process we consider the last term of the differential Chapman-Kolmogorov equation, (3.13c), and set $a_i(z, t) = 0$ and $B_{ij}(z, t) = 0$ for all $i$ and $j$. The resulting equation is known as the master equation:
$$ \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.14) $$
Figure 3.3: Jump process. The figure shows a typical trajectory $\mathcal{J}(t)$ of a jump process as described, for example, by a master equation. The random variable $\mathcal{X}(t)$ stays constant except at certain discrete points where the jumps occur.
In order to illustrate the general process described by the master equation (3.14) we consider the evolution during a short time interval. For this goal we solve approximately to first order in $\Delta t$ and use the initial condition $p(z, t \,|\, y, t) = \delta(y - z)$ representing a sharp probability density at the initial time:10
$$ p(z, t + \Delta t \,|\, y, t) = p(z, t \,|\, y, t) + \frac{\partial}{\partial t}\, p(z, t \,|\, y, t)\, \Delta t + \ldots \approx p(z, t \,|\, y, t) + \frac{\partial}{\partial t}\, p(z, t \,|\, y, t)\, \Delta t = $$
$$ = \delta(y - z) + \Bigl( W(z \,|\, y, t) - \delta(y - z) \int dx\; W(x \,|\, y, t) \Bigr)\, \Delta t = $$
$$ = \Bigl( 1 - \int dx\; W(x \,|\, y, t)\, \Delta t \Bigr)\, \delta(y - z) \; + \; W(z \,|\, y, t)\, \Delta t . $$
In the first term, the coefficient of $\delta(y - z)$ is the (finite) probability for the particle to stay at the original position $y$, whereas the distribution of particles that have jumped is given, after normalization, by $W(z \,|\, y, t)$. A typical path $\mathcal{X}(t)$ thus will consist of constant sections, $\mathcal{X}(t) = \text{const}$, and discontinuous jumps distributed according to $W(z \,|\, y, t)$ (figure 3.3). It is worth noticing that a pure jump process occurs here even though the variable $\mathcal{X}(t)$ can take on a continuous range of values.
A highly relevant special case of the master equation is obtained when the sample space is mapped onto the space of integers, $\Omega \to \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. Then we can use conditional probabilities rather than probability densities in the master equation:
$$ \frac{\partial P(n, t \,|\, n', t')}{\partial t} = \sum_m \bigl( W(n \,|\, m, t)\, P(m, t \,|\, n', t') - W(m \,|\, n, t)\, P(n, t \,|\, n', t') \bigr) . \qquad (3.14') $$
Clearly, the process is confined to jumps since only discrete values of the random variable $\mathcal{N}(t)$ are allowed. The master equation on the even more restricted sample space $\Omega \to \mathbb{N}_0 = \{0, 1, 2, \ldots\}$ is of particular importance in chemical kinetics. The random variable $\mathcal{N}(t)$ then counts particle numbers, which are necessarily non-negative integers.
10We recall a basic property of the delta-function:∫f(x)δ(x− y) dx = f(y).
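For the prototypical jump process of chemical kinetics, the irreversible first-order decay A → B with $W(n-1 \,|\, n) = k\,n$, trajectories of the master equation can be sampled exactly by drawing exponential waiting times (often called Gillespie's direct method). A sketch assuming NumPy; the rate constant k_dec = 1 and the initial particle number n0 = 100 are illustrative choices, not taken from the text:

    import numpy as np

    rng = np.random.default_rng(6)
    k_dec, n0 = 1.0, 100                         # illustrative rate and initial number
    t, n = 0.0, n0
    times, numbers = [t], [n]
    while n > 0:
        total_rate = k_dec * n                   # total jump rate out of the state n
        t += rng.exponential(1.0 / total_rate)   # exponential waiting time
        n -= 1                                   # the only allowed jump: n -> n - 1
        times.append(t)
        numbers.append(n)

    # on average the trajectory follows n(t) = n0 * exp(-k_dec * t)
    print(times[numbers.index(50)])              # ~ ln(2)/k_dec, the half-life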
Figure 3.4: Drift and diffusion process. The figure shows a typical trajec-
tory of a drift-and-diffusion process whose probability density is described by a
Fokker-Planck equation. The sample curve D(t) is characterized by drift (red)
and diffusion (pink) of the random variable X (t). The band indicates a confidence
interval of about ν±σ that contains 68.2 % of the points (when they are normally
distributed; see subsection 2.7.6).
3.2.2 Diffusion process and Fokker-Planck equation
The Fokker-Planck equation is in a way complementary to the master equation, since the quantities $W(z \,|\, x, t)$ are assumed to be zero and hence jumps are excluded:
$$ \frac{\partial p(z, t \,|\, y, t')}{\partial t} = -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.15) $$
The process corresponding to the Fokker-Planck equation is a diffusion pro-
cess with a(z, t) being the drift vector and B(z, t) the diffusion matrix.
As a result of the definition given in condition (iii) of (3.11), the diffusion ma-
trix is positive semi-definite and symmetric. The trajectories of the diffusion
120 Peter Schuster
process are continuous as follows directly from W (z|x, t) = 0 in condition (i)
of (3.11).
Making use of the initial condition $p(z, t \,|\, y, t) = \delta(z - y)$ and neglecting the derivatives of $a_i(z, t)$ and $B_{ij}(z, t)$ for small $\Delta t$, we find the approximation
$$ \frac{\partial p(z, t \,|\, y, t')}{\partial t} = -\sum_i a_i(z, t)\, \frac{\partial p(z, t \,|\, y, t')}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 p(z, t \,|\, y, t')}{\partial z_i\, \partial z_j} , $$
which can be solved for small $\Delta t = t - t'$, yielding in matrix form11
which can be solved for small ∆t = t− t′ and obtain in matrix form11
p(z, t+ ∆t|y, t) =1√
(2π)n ∆t
(|B(y, t)|
)−1/2×
× exp
(−1
2
(z− y− a(y, t) ∆t
)′(B(y, t)
)−1(z− y− a(y, t) ∆t
)
∆t
),
which is a normal distribution with variance-covariance matrix $\Sigma = B(y, t)\, \Delta t$ and expectation value $\boldsymbol{\nu} = y + a(y, t)\, \Delta t$. The general picture thus is a sample point moving with a systematic drift, whose velocity is $a(y, t)$, and a superimposed Gaussian fluctuation with covariance matrix $B(y, t)\, \Delta t$:
$$ y(t + \Delta t) = y(t) + a\bigl(y(t), t\bigr)\, \Delta t + \eta(t)\, \sqrt{\Delta t} , $$
with $E\bigl(\eta(t)\bigr) = 0$ and $E\bigl(\eta(t)\, \eta(t)'\bigr) = B(y, t)$. Accordingly, this picture gives
(i) trajectories which are always continuous, since $y(t + \Delta t) \to y(t)$ is fulfilled for $\Delta t \to 0$, and (ii) trajectories which are nowhere differentiable because of the $\sqrt{\Delta t}$ dependence of the fluctuations.12 An example of a typical solution of a Fokker-Planck equation is shown in figure 3.4.
¹¹ For readers not familiar with matrix formalism it is a straightforward exercise to write
down the one-dimensional solution of the problem (see also subsection 2.7.6).
¹² The role of the √∆t dependence will become clear when we discuss the Wiener process
in subsection 3.4.3.
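The short-time Gaussian propagator suggests a direct simulation recipe. The following Python sketch (not part of the original notes; the constant drift a and diffusion B are arbitrary illustrative values) propagates a single sample point according to y(t + ∆t) = y(t) + a ∆t + η √∆t with η ∼ N(0, B):

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(y, a, B, dt, rng):
    """One short-time step y -> y + a*dt + eta*sqrt(dt) with eta ~ N(0, B)."""
    eta = rng.normal(0.0, np.sqrt(B))
    return y + a * dt + eta * np.sqrt(dt)

# illustrative choice: constant drift a = 1 and diffusion B = 0.5
dt, n_steps, y = 1e-3, 1000, 0.0
for _ in range(n_steps):
    y = propagate(y, 1.0, 0.5, dt, rng)
print(y)  # fluctuates around the drift value a*t = 1.0
```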
3.2.3 Deterministic processes and Liouville’s equation
When the differential Chapman-Kolmogorov equation contains only the first
term, all others being zero, the resulting differential equation is a special case
of the Liouville equation

∂p(z, t|y, t′)/∂t = −∑_i ∂/∂z_i ( a_i(z, t) p(z, t|y, t′) ) ,   (3.16)

which is known from classical mechanics. Equation (3.16) describes a completely
deterministic motion which is a solution of the ordinary differential equation

dx(t)/dt = a(x(t), t) with x(y, t′) = y .   (3.17)

The (probabilistic) solution of the differential equation (3.16) with the initial
condition p(z, t′|y, t′) = δ(z − y) is p(z, t|y, t′) = δ(z − x(y, t)).
The proof of this assertion is obtained by direct substitution [6, p.54]:

∑_i ∂/∂z_i ( a_i(z, t) δ(z − x(y, t)) ) = ∑_i ∂/∂z_i ( a_i(x(y, t), t) δ(z − x(y, t)) ) =
= ∑_i a_i(x(y, t), t) ∂/∂z_i δ(z − x(y, t)) ,

and

∂/∂t δ(z − x(y, t)) = −∑_i ∂/∂z_i δ(z − x(y, t)) · dx_i(y, t)/dt ,

and by means of equation (3.17) the last two lines become equal.
If a particle is in a well-defined position y at time t′ it will remain on
the trajectory obtained by solving the corresponding ordinary differential
equation (ODE). Deterministic motion can therefore be visualized as an elementary
form of a Markov process, which can be formulated as a drift-diffusion
process with zero diffusion matrix.
3.3 Forward and backward equations
Equations which reproduce the time development with respect to initial vari-
ables (y, t′) of the probability density p(x, t|y, t′) are readily derived:
lim_{∆t′→0} (1/∆t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|y, t′) ) =
= lim_{∆t′→0} (1/∆t′) ∫ dz p(z, t′ + ∆t′|y, t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|z, t′ + ∆t′) ) .
Figure 3.5: Illustration of forward and backward equations. The forward
differential Chapman-Kolmogorov equation starts from an initial condition corresponding
to the sharp distribution δ(y − z); (y, t′) is fixed (blue), and the probability
density unfolds with time t ≥ t′ (black). It is well suited for the description of
actual experimental situations. The backward equation, although somewhat more
convenient and easier to handle from the mathematical point of view, is less suited
to describe typical experiments and is commonly applied to first passage time or
exit problems. Here (x, t) is held constant (blue) and the time dependence of the
probability density corresponds to samples unfolding into the past, t′ ≤ t (red).
The initial condition, δ(y − z), in this case is rather a final condition represented
by a sharp final distribution.
Thereby we used the Chapman-Kolmogorov equation in the second term and
the fact that the first term yields 1× p(x, t|y, t′ + ∆t′) on integration.
Under the usual conditions that p(x, t′|y, t′) is continuous and bounded
in x, t, and t′ for a finite range t− t′ > δ > 0 and that all relevant derivatives
exist, we may rewrite the left-hand side of the last equation as

lim_{∆t′→0} (1/∆t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|y, t′) ) =
= lim_{∆t′→0} (1/∆t′) ∫ dz p(z, t′ + ∆t′|y, t′) ( p(x, t|y, t′) − p(x, t|z, t′) ) .
Similarly as in section 3.1.2 we can proceed and derive a differential version
of this class of the Chapman-Kolmogorov equation:
∂p(x, t|y, t′)/∂t′ = −∑_i a_i(y, t′) ∂p(x, t|y, t′)/∂y_i −
− (1/2) ∑_{i,j} B_ij(y, t′) ∂²p(x, t|y, t′)/(∂y_i ∂y_j) +
+ ∫ dz W(z|y, t′) ( p(x, t|y, t′) − p(x, t|z, t′) ) .   (3.18)
This equation is called the backward differential Chapman-Kolmogorov
equation, in contrast to the previously derived forward equation (3.13). In
purely mathematical terms the backward equation is (somewhat) better defined
than its forward analogue. The appropriate initial condition is

p(x, t|y, t) = δ(x − y) for all t ,

which expresses the fact that the probability density for finding the particle at
position x at time t, if it is at y at the same time, is δ(x − y); in other words,
the (classical, non-quantum-mechanical) particle can be simultaneously
at x and y if and only if x ≡ y.
The forward and the backward equations are equivalent to each other.
The basic difference concerns the set of variables which is held fixed. In the case
of the forward equation we hold (y, t′) fixed, and consequently solutions
exist for t ≥ t′, so that p(x, t|y, t) = δ(x − y) is an initial condition for the
forward equation. The backward equation has solutions for t′ ≤ t and hence
it expresses development in t′. Accordingly, p(x, t|y, t) = δ(x− y) is a final
condition rather than an initial condition.
Both differential expressions, the forward and the backward equation, are
useful in their own right. The forward equation gives more directly the values
of measurable quantities as functions of the observed time. Accordingly,
it is more commonly used in modeling experimental systems. The backward
equation finds applications in the study of first passage time or exit prob-
lems in which we search for the probability that a particle leaves a region at
a certain time.
3.4 Examples of special stochastic processes
In this section we discuss a few stochastic processes which are of special
importance and, at the same time, well understood in terms of mathematical
analysis.
3.4.1 Poisson process
The Poisson process is commonly used to model certain classes of cumulative
random events. These may be, for example, electrons arriving at an anode,
customers entering a shop, or telephone calls arriving at a switchboard. The
cumulative number of these events is denoted by the random variable N (t).
The probability of arrival is assumed to be λ per unit time, or λ·∆t
in a time interval of length ∆t. The master equation for this process is
derived from the conditional probability quantities W(·|·), which we denote
as transition probabilities:

W(n + 1|n, t) = λ and otherwise W(m|n, t) = 0 ∀ m ≠ n + 1 .   (3.19)

Accordingly, the master equation is of the form

∂P(n, t|n′, t′)/∂t = λ ( P(n − 1, t|n′, t′) − P(n, t|n′, t′) )   (3.20)

and represents a one-sided random walk with a probability λ for the walker
to step to the right within a unit time interval.
The increase in the probability to have n recorded events at time t is
proportional to the difference in the probabilities of n − 1 and n recorded
events, because of the elementary processes (n − 1 → n) and (n → n + 1)
of a single arrival, which increase or decrease the probability of n events, re-
spectively. We solve the master equation by introducing the time-dependent
characteristic function (see equations (2.59) and (2.59′)):

φ(s, t) = E(e^{isN(t)}) = ∑_n P(n, t|n′, t′) exp(isn) .
Now we differentiate φ(s, t) with respect to time and obtain by combining it
with the master equation
∂φ(s, t)/∂t = ∑_n ∂P(n, t|n′, t′)/∂t · e^{isn} =
= λ ∑_n ( P(n − 1, t|n′, t′) − P(n, t|n′, t′) ) e^{isn} =
= λ ( ∑_n P(n − 1, t|n′, t′) e^{is(n−1)} e^{is} − ∑_n P(n, t|n′, t′) e^{isn} ) =
= λ ( e^{is} − 1 ) φ(s, t) .
Since time t is the only explicit variable, it is straightforward to compute
the solution:

φ(s, t) = φ(s, 0) exp( λ (e^{is} − 1) t ) .   (3.21)
It is meaningful to assume that there are no electrons or customers at time
t = 0, which implies P (0, 0) = 1, P (n, 0) = 0 for all n 6= 0, and φ(s, 0) = 1.
We now obtain the corresponding solution

P(n, t|0, 0) = e^{−λt} (λt)ⁿ/n! = e^{−α} αⁿ/n! .   (3.22)

With α = λt this is our old friend, the Poisson distribution (2.74), which has
the expectation value E(N(t)) = λt.
In case the probability space is discrete, n ∈ N0, and the initial conditions
are simple, for example P(n, 0) = δ(n) or P(n, 0) = δ(n − m), we
shall prefer a simpler notation for the probability of particle numbers:

P_n(t) = Prob( N(t) = n ) .   (3.23)

With the transition probabilities given in (3.19) we can write the equation and
solution of the Poisson process in the following way:

dP_n(t)/dt = λ ( P_{n−1}(t) − P_n(t) ) and P_n(t) = ((λt)ⁿ/n!) e^{−λt} .   (3.24)
This notation will conveniently be used for the majority of stochastic pro-
cesses in chemistry and biology.
As said at the beginning, the Poisson process can also be viewed from a slightly
different perspective by considering the arrival times of individual independent
events as random variables T1, T2, .... We shall assume that they are
positive and follow an exponential density ϱ(a, t) = a·e^{−at} with a > 0 and
∫₀^∞ ϱ(a, t) dt = 1, and thus for each index j we have

P(T_j ≤ t) = 1 − e^{−at} and thus P(T_j > t) = e^{−at} , t ≥ 0 .

Independence of the individual events implies the validity of

P(T1 > t1, ..., T_n > t_n) = P(T1 > t1) · ... · P(T_n > t_n) = e^{−a(t1+...+t_n)} ,

which determines the joint probability distribution of the arrival times T_j.
The expectation value of the inter-arrival times is simply given by E(T_j) = 1/a.
Clearly, the smaller a is, the longer the mean inter-arrival time will be, and
thus a can be addressed as the intensity of the flow. In comparison to the previous
derivation we have a ≡ λ. For S0 = 0 and n ≥ 1 we define the cumulative
random variable

S_n = T1 + ... + T_n = ∑_{j=1}^n T_j

as the waiting time until the nth arrival. The event I = {S_n ≤ t} implies
that the nth arrival has occurred before time t. The connection between the
arrival times and the cumulative number of arrived objects, N(t), is easily
established and illustrates the usefulness of the dual point of view:

P(I) = P(S_n ≤ t) = P(N(t) ≥ n) .
More precisely, N (t) is determined by the whole sequence (Tj , j ≥ 1), and
depends on the elements ω of the sample space through the individual arrival
times T_j. In fact, we can compute the number of objects exactly by

{N(t) = n} = {S_n ≤ t} − {S_{n+1} ≤ t} = {S_n ≤ t ≤ S_{n+1}} .
We may interpret this equation directly: there are exactly n arrivals in [0, t]
if and only if the arrival n occurs before t and the arrival (n+1) occurs after
t. For each value of t the probability distribution of the random variable
N(t) is given by

P(N(t) = n) = P(S_n ≤ t) − P(S_{n+1} ≤ t) , n ∈ N0 ,

where we have already used the initial condition S0 = 0. As we have shown before,
this distribution of N(t) is the Poisson distribution π(at) = π(λt) = π(α).
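As a brief numerical illustration – a minimal Python sketch that is not part of the original notes, with λ, t, and the sample size chosen arbitrarily – the dual view can be checked directly: arrival times are accumulated exponential inter-arrival times with intensity a = λ, and the resulting counts N(t) follow the Poisson distribution (3.22):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(42)
lam, t, n_runs = 2.0, 3.0, 50_000   # intensity, observation time, sample size

# simulate N(t) by accumulating exponential inter-arrival times T_j
counts = np.empty(n_runs, dtype=int)
for k in range(n_runs):
    s, n = 0.0, 0
    while True:
        s += rng.exponential(1.0 / lam)   # T_j with E(T_j) = 1/lam
        if s > t:
            break
        n += 1
    counts[k] = n

# compare with P_n(t) = (lam*t)^n e^{-lam*t} / n! from equation (3.22)
for n in range(6):
    empirical = np.mean(counts == n)
    analytic = (lam * t) ** n * exp(-lam * t) / factorial(n)
    print(n, round(float(empirical), 4), round(analytic, 4))
```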
3.4.2 Random walk in one dimension
The random walk in one dimension is now a classical and famous problem
in probability theory. A walker moves along a line and takes steps of length ℓ
to the left or to the right with equal probability. The position of the
walker is thus n ℓ with n an integer, n ∈ Z. The first problem to solve
is the computation of the probability that the walker reaches a given point
at distance n ℓ from the origin within a predefined time span. For this goal
we encapsulate the random walk in a master equation and try to find an
analytical solution.
For the master equation we have the following transition probabilities per
unit time:

W(n + 1|n, t) = W(n − 1|n, t) = ϑ and W(m|n, t) = 0 ∀ m ≠ n + 1, n − 1 .   (3.25)

Hence, the master equation describing the evolution of the probability for
the walker to be at position n ℓ at time t when he started at n′ ℓ at time t′ is

∂P(n, t|n′, t′)/∂t = ϑ ( P(n + 1, t|n′, t′) + P(n − 1, t|n′, t′) − 2 P(n, t|n′, t′) ) .   (3.26)
As for the Poisson process, the master equation can be solved by means of
the time-dependent characteristic function (see equations (2.59) and (2.59′)):

φ(s, t) = E(e^{isN(t)}) = ∑_n P(n, t|n′, t′) exp(isn) .   (3.27)
Combining (3.26) and (3.27) yields

∂φ(s, t)/∂t = ϑ ( e^{is} + e^{−is} − 2 ) φ(s, t) .
Figure 3.6: Probability distribution of the random walk. The figure
presents the conditional probabilities P (n, t|0, 0) of a random walker to be in posi-
tion n ∈ Z at time t for the initial condition to be at n = 0 at time t = t0 = 0. The
n-values of the individual curves are: n = 0 (black), n = 1 (blue), n = 2 (purple),
and n = 3 (red). Parameter choice: ϑ = 1.
Accordingly, the solution for the initial condition n′ = 0 at t′ = 0 is

φ(s, t) = φ(s, 0) exp( ϑ t (e^{is} + e^{−is} − 2) ) = exp( ϑ t (e^{is} + e^{−is} − 2) ) .
Comparison of the coefficients of the individual powers of e^{is} yields the
individual conditional probabilities:

P(n, t|0, 0) = I_n(4ϑt) e^{−2ϑt} , n ∈ Z , or
P_n(t) = I_n(4ϑt) e^{−2ϑt} , n ∈ Z for P_n(0) = δ(n) ,   (3.28)

where the pre-exponential term is written in terms of modified Bessel functions
I_k(θ) with θ = 4ϑt, which are defined by

I_k(θ) = ∑_{j=0}^∞ (θ/4)^{2j+k} / ( j! (j + k)! ) .   (3.29)
It is straightforward to calculate the first and second moments from the characteristic
function φ(s, t) by means of equation (2.73), and the result is:

E(N(t)) = n0 and σ²(N(t)) = 2ϑ (t − t0) .   (3.30)
The expectation value is constant and coincides with the starting point of
the random walk and the variance increases linearly with time.
In figure 3.6 we illustrate the probabilities Pn(t) by means of a concrete
example. The probability distribution is symmetric for a symmetric initial
condition Pn(0) = δ(n) and hence Pn(t) = P−n(t). For long times the proba-
bility density P (n, t) becomes flatter and flatter and eventually converges to
the uniform distribution over the spatial domain. In case n ∈ Z all probabil-
ities vanish: limt→∞ Pn(t) = 0 for all n.
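For a numerical check – a hedged Python sketch that is not part of the original notes, with all parameter values chosen arbitrarily – note that the Bessel convention (3.29) with argument 4ϑt coincides with the standard modified Bessel function I_n of argument 2ϑt, available as scipy.special.iv. The simulated continuous-time walk (jump events at total rate 2ϑ, each step ±1 with probability 1/2) then reproduces (3.28):

```python
import numpy as np
from scipy.special import iv   # standard modified Bessel function I_n(x)

rng = np.random.default_rng(0)
theta, t, n_runs = 1.0, 2.0, 20_000

positions = np.zeros(n_runs, dtype=int)
for k in range(n_runs):
    tau, n = 0.0, 0
    while True:
        tau += rng.exponential(1.0 / (2.0 * theta))  # waiting time at rate 2*theta
        if tau > t:
            break
        n += rng.choice((-1, 1))                     # symmetric +-1 step
    positions[k] = n

# P_n(t) = I_n(4 theta t) e^{-2 theta t} in the convention of (3.29),
# which equals the standard iv(n, 2 theta t) e^{-2 theta t}
for n in range(4):
    print(n, round(float(np.mean(positions == n)), 4),
          round(float(iv(n, 2.0 * theta * t) * np.exp(-2.0 * theta * t)), 4))
```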
In the random walk we may also discretize time in the sense that the
walker takes a step precisely every time interval τ . Then, time is a discrete
variable, t = m · τ with m ∈ N0. In addition we assume the random walk to
be symmetric in the sense that steps to the left and to the right are taken
with equal probability and find:
P(n, (m + 1)τ |n′, m′τ) = (1/2) ( P(n + 1, mτ |n′, m′τ) + P(n − 1, mτ |n′, m′τ) ) .
For small τ the continuous and the discrete process represent approximations
to each other, with t = mτ, t′ = m′τ, and ϑ = (1/2) τ⁻¹. The transition
probability per unit time, ϑ, in the master equation model corresponds to
one half of the inverse waiting time, τ⁻¹, in the discrete model. Again, it is
straightforward to apply the same generating function – with m = t/τ –
which leads to the solution

φ(s, m) = ( (1/2) (e^{is} + e^{−is}) )^m
and finally to the probability distribution

P(n, mτ |0, 0) = (1/2)^m m! ( ((m − n)/2)! ((m + n)/2)! )⁻¹ ,   (3.31)

which is also known as the Bernoulli distribution.
It is also straightforward to consider the continuous-time random walk
in the limit of continuous space. This is achieved by setting the distance
traveled to x = n ℓ and performing the limit ℓ → 0. For that purpose we can
start from the characteristic function of the distribution in x,

φ(s, t) = E(e^{isx}) = φ(ℓs, t) = exp( ϑ t (e^{iℓs} + e^{−iℓs} − 2) ) ,
and take the limit of infinitesimally small steps, ℓ → 0:

lim_{ℓ→0} exp( ϑ t (e^{iℓs} + e^{−iℓs} − 2) ) = lim_{ℓ→0} exp( ϑ t (−ℓ²s² + ...) ) = exp( −s²Dt/2 ) ,

where we used the definition D = 2 lim_{ℓ→0}(ℓ²ϑ). Since this is the characteristic
function of the normal distribution we have for the density (2.79):

p(x, t|0, 0) = (1/√(2πDt)) exp( −x²/(2Dt) ) .   (2.79)
We could also have proceeded directly from equation (3.26) and expanded
the right-hand side as a function of x up to second order in ℓ, which gives

∂p(x, t|0, 0)/∂t = (D/2) ∂²p(x, t|0, 0)/∂x² ,   (3.32)

where D again stands for 2 lim_{ℓ→0}(ℓ²ϑ). This equation will be considered in
detail in the next section 3.4.3, which deals with the Wiener process.
3.4.3 Wiener process and the diffusion problem
The Wiener process, named after the American mathematician and logician
Norbert Wiener, is fundamental in many respects. It is closely tied to Brownian
motion and white noise and describes, among other things, random fluctuations
caused by thermal motion. From the point of view of stochastic processes
the Wiener process is the solution of the Fokker-Planck equation in one random
variable, W(t), with the probability

P(W(t) ≤ w′) = ∫_{−∞}^{w′} p(w, t) dw ,

and with drift coefficient zero and diffusion coefficient D = 1. This equation
reads:

∂p(w, t|w0, t0)/∂t = (1/2) ∂²p(w, t|w0, t0)/∂w² .   (3.33)
We solve for the initial condition on the conditional probability, p(w, t0|w0, t0) =
δ(w − w0), by using the characteristic function

φ(s, t) = ∫ dw p(w, t|w0, t0) exp(isw) ,
which fulfils ∂φ(s, t)/∂t = −(1/2) s² φ(s, t), as can be shown by applying
integration by parts twice and making use of the fact that p(w, t|w0, t0), like
every probability density, has to vanish in the limits w → ±∞; the same is
true for the first partial derivative ∂p/∂w. Next we compute the characteristic
function by integration:

φ(s, t) = φ(s, t0) · exp( −(1/2) s² (t − t0) ) .   (3.34)
With the initial condition φ(s, t0) = exp(isw0) we complete the characteristic
function

φ(s, t) = exp( isw0 − (1/2) s² (t − t0) )   (3.35)

and eventually obtain the probability density through inverse Fourier
transformation:

p(w, t|w0, t0) = (1/√(2π(t − t0))) exp( −(w − w0)²/(2(t − t0)) ) .   (3.36)
The density function is a normal distribution with expectation value
E(W(t)) = w0 = ν and variance E((W(t) − w0)²) = t − t0 = σ(t)², so
that an initially sharp distribution spreads in time as illustrated in figure 3.7.
The Wiener process may be characterized by three important features:

(i) irregularity of the sample paths,
(ii) non-differentiability of the sample paths, and
(iii) independence of increments.
Although the mean value E(W(t)) is well defined and independent of time,
w0, in the sense of a martingale, the mean square E(W(t)²) becomes infinite
as t → ∞. This implies that the individual trajectories W(t) are extremely
variable and diverge after short times (see, for example, the three trajectories
of the forward equation in figure 3.5). We shall encounter such a situation
with finite mean but diverging variance also in biology in the case of multiplication
as a birth-and-death process (chapter 5): although the mean is well
defined, it loses its value in practice when the standard deviation becomes
much larger than the expectation value.
Figure 3.7: Probability density of the Wiener process. The figure shows
the conditional probability density of the Wiener process, which is identical
with the normal distribution (figure 2.17),
p(w, t|w0, t0) = exp( −(w − w0)²/(2(t − t0)) )/√(2π(t − t0)).
The values used are w0 = 5 and t − t0 = 0.01 (red), 0.5 (purple), 1.0 (violet), and
2.0 (blue). The initially sharp distribution, p(w, t0|w0, t0) = δ(w − w0), spreads
with increasing time until it becomes completely flat in the limit t → ∞.
Continuity of the sample paths of the Wiener process has already been demonstrated
in subsection 3.1.2. In order to show that the trajectories of the
Wiener process are not differentiable we consider the probability

P( |(W(t + h) − W(t))/h| > k ) = 2 ∫_{kh}^∞ dw (1/√(2πh)) exp( −w²/(2h) ) ,

which can be readily computed from the conditional probability (3.36). In
the limit h → 0 the integral becomes 1/2 and the probability is one. The result
implies that, no matter what finite k we choose, |(W(t + h) − W(t))/h| will
almost certainly be greater than this value. In other words, the derivative of
the Wiener process is infinite with probability one and the sample path
is not differentiable.
Diffusion is closely related to the Wiener process and hence it is important
to prove statistical independence of the increments of W(t). Since we are
dealing with a Markov process we can write the joint probability as

p(w_n, t_n; w_{n−1}, t_{n−1}; ...; w0, t0) = ∏_{i=0}^{n−1} p(w_{i+1}, t_{i+1}|w_i, t_i) · p(w0, t0) .

Now we express the conditional probabilities in terms of (3.36) and find

p(w_n, t_n; w_{n−1}, t_{n−1}; ...; w0, t0) = ∏_{i=0}^{n−1} [ exp( −(w_{i+1} − w_i)²/(2(t_{i+1} − t_i)) ) / √(2π(t_{i+1} − t_i)) ] · p(w0, t0) .

We simplify the notation by introducing the new variables ∆W_i ≡ W(t_i) − W(t_{i−1}) and
∆t_i ≡ t_i − t_{i−1}. The joint probability density for W(t_n) now becomes¹³

p(∆w_n; ∆w_{n−1}; ...; ∆w_1; w0) = ∏_{i=1}^n [ exp( −∆w_i²/(2∆t_i) ) / √(2π∆t_i) ] · p(w0, t0) ,

where the factorization shows independence of the variables ∆W_i of each
other and of W(t0).
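This factorization is exactly what a simulation exploits. A minimal Python sketch (not part of the original notes; all numerical parameters are arbitrary) builds Wiener paths from independent Gaussian increments ∆W_i ∼ N(0, ∆t_i) and checks the mean and variance stated below (3.36):

```python
import numpy as np

rng = np.random.default_rng(7)
t0, t, n_steps, n_paths = 0.0, 2.0, 400, 50_000
dt = (t - t0) / n_steps
w0 = 5.0

# independent Gaussian increments dW_i ~ N(0, dt); paths are cumulative sums
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = w0 + np.cumsum(dW, axis=1)

print(W[:, -1].mean())  # ~ w0 = 5.0 (the mean is constant in time)
print(W[:, -1].var())   # ~ t - t0 = 2.0, the variance of (3.36)
```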
The Wiener process is readily extended to higher dimensions. The
multivariate Wiener process, defined as

W(t) = ( W_1(t), ..., W_n(t) ) ,   (3.37)

satisfies the Fokker-Planck equation

∂p(w, t|w0, t0)/∂t = (1/2) ∑_i ∂²p(w, t|w0, t0)/∂w_i² .   (3.38)

The solution is a multivariate normal density,

p(w, t|w0, t0) = ( 2π(t − t0) )^{−n/2} exp( −(w − w0)²/(2(t − t0)) ) ,   (3.39)
with mean E(W(t)) = w0 and variance-covariance matrix

(Σ)_{ij} = E( (W_i(t) − w_{0i}) (W_j(t) − w_{0j}) ) = (t − t0) δ_{ij} ,

where all off-diagonal elements – the covariances – are zero. Hence, Wiener
processes along different Cartesian coordinates are independent.
¹³ Since we shall refer frequently to the Wiener process in the forthcoming section 3.5,
the calligraphic notation for the random variable will be replaced by W(t) and the
expectation value E(·) by ⟨·⟩ for better readability.
3.4.4 Ornstein-Uhlenbeck process
All three examples of stochastic processes discussed so far had one feature in
common: all individual trajectories diverged to +∞ (Poisson process)
or ±∞ (random walk and Wiener process) in the long-time limit, and
no stationary solutions exist. In this subsection we shall consider a general
stochastic process that allows for the approach towards a stationary solution, the
Ornstein-Uhlenbeck process, named after the Dutch-born physicists George
Uhlenbeck and Leonard Ornstein [66]. Many further examples of processes with
stationary solutions will follow in chapters 4 and 5. The Ornstein-Uhlenbeck
process is obtained through addition of a drift term to the Wiener process:

∂p(x, t|x0, 0)/∂t = ∂/∂x ( kx p(x, t|x0, 0) ) + (1/2) D ∂²p(x, t|x0, 0)/∂x² .   (3.40)
In order to solve equation (3.40) for the probability density we make use of
the characteristic function

φ(s, t) = ∫_{−∞}^∞ e^{isx} p(x, t|x0, 0) dx ,   (3.41)

which converts (3.40) into the partial differential equation

∂φ/∂t + ks ∂φ/∂s = −(1/2) Ds² φ .
For the initial condition p(x, 0|x0, 0) = δ(x − x0) one can calculate the
solution for the characteristic function:

φ(s, t) = exp( −(Ds²/(4k)) (1 − e^{−2kt}) + isx0 e^{−kt} ) .   (3.42)
The characteristic function corresponds to a normal distribution and can be
used to calculate the moments of the Ornstein-Uhlenbeck process:

E(X(t)) = µ = x0 exp(−kt) and σ²(X(t)) = µ2 = (D/(2k)) (1 − exp(−2kt)) .   (3.43)
The expectation value E(X(t)) decreases exponentially from X(0) = x0 to
lim_{t→∞} E(X(t)) = µ = 0. In other words, all trajectories start from the same
point X(0) = x0 and converge to a final distribution around the mean µ = 0.
Accordingly, the variance is initially zero, σ²(X(0)) = 0, and increases with
time until it reaches the value lim_{t→∞} σ²(X(t)) = D/(2k). This behavior is
in contrast to the previously discussed stochastic processes, in particular the
Wiener process, where the variance diverges.
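The moments (3.43) translate directly into an exact simulation rule: over a step ∆t the process is multiplied by e^{−k∆t} and receives a Gaussian kick of variance (D/2k)(1 − e^{−2k∆t}). The following Python sketch (not part of the original notes; k, D, and x0 are arbitrary illustrative values) uses this exact update:

```python
import numpy as np

rng = np.random.default_rng(3)
k, D, x0 = 1.0, 0.5, 2.0
dt, n_steps, n_paths = 0.01, 1000, 20_000

# exact Ornstein-Uhlenbeck update: Gaussian transition with
# mean x e^{-k dt} and variance (D/2k)(1 - e^{-2k dt}), cf. (3.43)
decay = np.exp(-k * dt)
step_sd = np.sqrt(D / (2.0 * k) * (1.0 - np.exp(-2.0 * k * dt)))

x = np.full(n_paths, x0)
for _ in range(n_steps):
    x = x * decay + step_sd * rng.normal(size=n_paths)

print(x.mean())  # ~ x0 e^{-k t} ~ 0 for k t = 10
print(x.var())   # ~ D/(2k) = 0.25, the stationary variance
```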
The stationary solution of the Ornstein-Uhlenbeck process, p(x), can be
readily computed from (3.40) by setting ∂p(x, t|x0, 0)/∂t = 0, which yields
the differential equation

d/dx ( kx p(x) + (1/2) D dp(x)/dx ) = 0 ,

and its solution is a Gaussian with mean x̄ = 0 and variance σ² = D/(2k):

p(x) = √(k/(πD)) exp( −kx²/D ) .   (3.44)
An explicit solution can be derived by means of stochastic differential equa-
tions (see section 3.5).
One additional point concerning the probability density and the mean of
the Ornstein-Uhlenbeck process is worth stressing: the mean of the stationary
distribution can easily be shifted by replacing kx by k(x − m) in equation (3.40),
where m = E(X) is the mean of the stationary distribution. The drift term –
the first term on the right-hand side of the equation – prevents the distribution
from becoming completely flat in the long-time limit, but the distribution still
extends from −∞ to +∞ and there are individual trajectories (of measure
zero) that diverge.
3.5 Stochastic differential equations
The idea of stochastic differential equations goes back to the French math-
ematician Paul Langevin who conceived an equation named after him that
allows for the introduction of random fluctuations into conventional differ-
ential equations [67]. The idea was to find a sufficiently simple approach
to model Brownian motion successfully. In its original form the Langevin
equation

m d²r/dt² = −γ dr/dt + ξ(t)   (3.45)

describes the motion of a Brownian particle, where r(t) and v(t) = dr/dt
are the position and velocity of the particle. Often the Langevin equation is
formulated in terms of the velocities and is then written in the more familiar
form

dv(t)/dt = −(γ/m) v(t) + (1/m) ξ(t) .   (3.45′)
The parameter γ = 6πηr is the friction coefficient according to Stokes' law,
with η being the viscosity coefficient of the medium and r the size of the
particle, m is the particle mass, and ξ(t) is a fluctuating random force. The
analogy of (3.45) to Newton's equation of motion is evident: the deterministic
force, f(x) = −(∂V/∂x) with V(x) being the potential energy, is replaced
by the friction force plus the random force ξ(t).
The forthcoming discussion of stochastic differential equations follows the
presentation by Crispin Gardiner [6, pp.77-96]. In the literature one can find
an enormous variety of more detailed treatises. We mention here only the
monograph [68] and two books that are available on the internet: [61, 69].
3.5.1 Derivation of the stochastic differential equation
Generalization of equation (3.45) yields

dx/dt = a(x, t) + b(x, t) ξ(t) ,   (3.46)
where x is the variable under consideration, a(x, t) and b(x, t) are functions
defined by the model investigated, and ξ(t) is a rapidly fluctuating term.
From the mathematical point of view we require statistical independence for
ξ(t) and ξ(t′) whenever t ≠ t′, and furthermore we assume ⟨ξ(t)⟩ = 0, since any drift
term can be absorbed into a(x, t); we cast all requirements into the condition

⟨ξ(t) ξ(t′)⟩ = δ(t − t′) .   (3.47)
This assumption has the consequence that σ2(ξ(t)
)is infinite and leads to
the idealized concept of white noise.
The differential equation (3.46) makes sense only if it can be integrated,
and hence an integral of the form

u(t) = ∫₀^t ξ(τ) dτ
exists. The assumption that u(t) is a continuous function of time has the
consequence that u(t) has the Markov property, which can be proven by
splitting the integral
u(t′) = ∫₀^t ξ(τ) dτ + ∫_t^{t′} ξ(τ′) dτ′ = lim_{ε→0} ( ∫₀^{t−ε} ξ(τ) dτ ) + ∫_t^{t′} ξ(τ′) dτ′ ,
and hence for every ε > 0 the ξ(τ) in the first integral is independent of the
ξ(τ ′) in the second integral. By continuity u(t) and u(t′)−u(t) are statistically
independent in the limit ε → 0, and further u(t′) − u(t) is independent of
all u(t′′) with t′′ < t. In other words, u(t′) is completely determined in
probabilistic terms by the value u(t) and no information on any past values
is required: u(t) is Markovian.
According to the differential Chapman-Kolmogorov equation (3.13) and
because of the continuity of u(t), it must be possible to find a Fokker-Planck
equation for the description of u(t) (see subsection 3.2.2), and we can com-
pute the drift and diffusion coefficients (with u(t) = u):

⟨ u(t + ∆t) − u | (u, t) ⟩ = ⟨ ∫_t^{t+∆t} ξ(τ) dτ ⟩ = 0 and

⟨ (u(t + ∆t) − u)² | (u, t) ⟩ = ∫_t^{t+∆t} dτ ∫_t^{t+∆t} ⟨ξ(τ)ξ(τ′)⟩ dτ′ =
= ∫_t^{t+∆t} dτ ∫_t^{t+∆t} δ(τ − τ′) dτ′ = ∆t ,
and we obtain for the drift and diffusion coefficients

A(u, t) = lim_{∆t→0} ⟨ u(t + ∆t) − u | (u, t) ⟩ / ∆t = 0 and
B(u, t) = lim_{∆t→0} ⟨ (u(t + ∆t) − u)² | (u, t) ⟩ / ∆t = 1 .   (3.48)

Accordingly, the Fokker-Planck equation we are looking for is that of the Wiener
process, and we have

∫₀^t ξ(τ) dτ = u(t) = W(t) .
Considering the consequences of equation (3.48) we are left with the paradox
that the integral of ξ(t) is W (t), which is continuous but not differentiable,
and hence the Langevin equation (3.45) and the stochastic differential equa-
tion (3.46) do not exist in strict mathematical terms. The corresponding
integral equation,
x(t) − x(0) = ∫₀^t a(x(τ), τ) dτ + ∫₀^t b(x(τ), τ) ξ(τ) dτ ,   (3.49)
however, is accessible to consistent interpretation. Eventually, we make the
relation to the Wiener process more visible by using
dW (t) ≡ W (t+ dt) − W (t) = ξ(t) dt
and obtain:
x(t) − x(0) = ∫₀^t a(x(τ), τ) dτ + ∫₀^t b(x(τ), τ) dW(τ) .   (3.49′)
The second integral is a stochastic Stieltjes integral the evaluation of which
will be discussed in the next subsection 3.5.2.
Finally, we remark that we have presented Crispin Gardiner's approach
here and assumed continuity of the function u(t). The result was that ξ(t)
follows the normal distribution. An alternative approach starts out from the
assumption of the Gaussian nature of the probability density of ξ(t). It is
definitely a matter of taste which assumption is preferred, but the requirement
of continuity seems more natural.
Figure 3.8: Stochastic integral. The time interval [t0, t] is partitioned into n
segments and an intermediate point τi is defined in each segment: ti−1 ≤ τi ≤ ti.
3.5.2 Stochastic integration
In this subsection we define the stochastic integral and present practical
recipes for integration (for more details see [70]). Let G(t) be an arbitrary
function of time and W (t) the Wiener process, then the stochastic integral
is defined as a Riemann-Stieltjes integral (2.35) of the form
I(t, t0) = ∫_{t0}^t G(τ) dW(τ) .   (3.50)
The integral is partitioned into n subintervals, which are separated by the
points ti: t0 ≤ t1 ≤ t2 ≤ · · · ≤ tn−1 ≤ t (figure 3.8). Intermediate points
are defined within the subintervals ti−1 ≤ τi ≤ ti for the evaluation of the
function G(τi) and as we shall see the value of the integral depends on the
position of the τ ’s within the subintervals.
The stochastic integral ∫_{t0}^t G(τ) dW(τ) is defined as the limit of the partial
sums

S_n = ∑_{i=1}^n G(τ_i) ( W(t_i) − W(t_{i−1}) ) ,
and it is not difficult to realize that the integral is different for different choices
of the intermediate points τ_i. As an example we consider the important case
G(τ_i) = W(τ_i):

⟨S_n⟩ = ⟨ ∑_{i=1}^n W(τ_i) ( W(t_i) − W(t_{i−1}) ) ⟩ =
= ∑_{i=1}^n ( min(τ_i, t_i) − min(τ_i, t_{i−1}) ) = ∑_{i=1}^n (τ_i − t_{i−1}) .
Next we choose the same relative intermediate position for all subintervals i,

τ_i = α t_i + (1 − α) t_{i−1} with 0 ≤ α ≤ 1 ,   (3.51)

and obtain for the sum

⟨S_n⟩ = ∑_{i=1}^n (t_i − t_{i−1}) α = (t − t0) α .

Accordingly, the mean value of the integral may adopt any value between
zero and (t − t0), depending on the choice of the position of the intermediate
points as expressed by the parameter α.
Ito stochastic integral. The most frequently used definition of the stochastic
integral is due to the Japanese mathematician Kiyoshi Ito [71, 72]. The
choice α = 0 or τ_i = t_{i−1} defines the Ito stochastic integral of a function G(t)
to be

∫_{t0}^t G(τ) dW(τ) = lim_{n→∞} ∑_{i=1}^n G(t_{i−1}) ( W(t_i) − W(t_{i−1}) ) ,   (3.52)

where the limit is taken as the mean square limit (2.32).
As an example we compute the previously discussed integral ∫_{t0}^t W(τ) dW(τ)
and find for the sum S_n, where we abbreviate W(t_i) by W_i:

S_n = ∑_{i=1}^n W_{i−1} (W_i − W_{i−1}) ≡ ∑_{i=1}^n W_{i−1} ∆W_i =
= (1/2) ∑_{i=1}^n ( (W_{i−1} + ∆W_i)² − W_{i−1}² − ∆W_i² ) =
= (1/2) ( W(t)² − W(t0)² ) − (1/2) ∑_{i=1}^n ∆W_i² ,
where the second line results from 2ab = (a + b)² − a² − b². It is now
necessary to calculate the mean square limit of the second term in the last
line of the equation. For a finite sum we have the expectation values

⟨ ∑_{i=1}^n ∆W_i² ⟩ = ∑_i ⟨(W_i − W_{i−1})²⟩ = ∑_i (t_i − t_{i−1}) = t − t0 ,   (3.53)

where the second equality results from the Gaussian nature of the probability
density (3.36): ⟨(W_i − W_j)²⟩ = ⟨W_i²⟩ − ⟨W_j²⟩ = σ²(W_i) − σ²(W_j) = t_i − t_j .¹⁴
Next we calculate the expectation of the mean square deviation in (3.53):

⟨ ( ∑_{i=1}^n (W_i − W_{i−1})² − (t − t0) )² ⟩ = ⟨ ∑_i (W_i − W_{i−1})⁴ ⟩ +
+ 2 ∑_{i<j} ⟨ (W_i − W_{i−1})² (W_j − W_{j−1})² ⟩ −
− 2 (t − t0) ∑_i ⟨ (W_i − W_{i−1})² ⟩ + (t − t0)² .
We start the evaluation with the second line and make use again of the
independence of Gaussian increments:

⟨ (W_i − W_{i−1})² (W_j − W_{j−1})² ⟩ = (t_i − t_{i−1}) (t_j − t_{j−1}) .

According to (2.84) the fourth moment of a Gaussian variable can be expressed
in terms of the variance:

⟨ (W_i − W_{i−1})⁴ ⟩ = 3 ⟨ (W_i − W_{i−1})² ⟩² = 3 (t_i − t_{i−1})² ,
¹⁴ For the derivation of this relation we used the independence of non-overlapping
increments of the Wiener process, ⟨W_j (W_i − W_j)⟩ = 0 for t_j ≤ t_i, which implies
⟨W_i W_j⟩ = ⟨W_j²⟩, together with the variance σ²(W_i) = ⟨W_i²⟩ − ⟨W_i⟩².
and insertion into the expectation value eventually yields:

⟨ ( ∑_{i=1}^n (W_i − W_{i−1})² − (t − t0) )² ⟩ =
= 2 ∑_i (t_i − t_{i−1})² + ( ∑_i (t_i − t_{i−1}) − (t − t0) ) ( ∑_j (t_j − t_{j−1}) − (t − t0) ) =
= 2 ∑_i (t_i − t_{i−1})² → 0 as n → ∞ .

Accordingly, lim_{n→∞} ∑_i (W_i − W_{i−1})² = t − t0 in the mean square limit.
Eventually, we obtain for the Ito stochastic integral of the Wiener process:

∫_{t0}^t W(τ) dW(τ) = (1/2) ( W(t)² − W(t0)² − (t − t0) ) .   (3.54)
We remark that the Ito integral differs from the conventional Riemann-Stieltjes
integral, where the term t − t0 is absent. An illustrative explanation
for this unusual behavior of the limit of the sum S_n is the fact that the quantity
|W(t + ∆t) − W(t)| is almost always of order √∆t, and hence – unlike
in ordinary integration – the terms of second order in ∆W(t) do not vanish
on taking the limit.
It is also worth noticing that the expectation value of the integral (3.54)
vanishes,

⟨ ∫_{t0}^t W(τ) dW(τ) ⟩ = (1/2) ( ⟨W(t)²⟩ − ⟨W(t0)²⟩ − (t − t0) ) = 0 ,   (3.55)

since the intermediate terms ⟨W_{i−1} ∆W_i⟩ vanish because ∆W_i and W_{i−1} are
statistically independent.
Semimartingales (subsection 3.1.1), in particular local martingales, are the
most common stochastic processes that allow for straightforward application
of Ito’s formulation of stochastic calculus.
Stratonovich stochastic integral. As already outlined, the value of a
stochastic integral depends on the particular choice of the intermediate points
τ_i. The Russian physicist and engineer Ruslan Leontevich Stratonovich [73]
and the American mathematician Donald LeRoy Fisk [74] developed simultaneously
an alternative approach to Ito's stochastic integration, which is
commonly called Stratonovich integration. The intermediate points are chosen
such that the unconventional term (t − t0) does not appear any more:
the integrand as a function of W(t) is evaluated precisely in the middle,
namely at the value (W(t_i) + W(t_{i−1}))/2, and it is straightforward to show that
the mean square limit converges to the expression for the integral in conventional
calculus:

∫_{t0}^t W(τ) dW(τ) = lim_{n→∞} ∑_{i=1}^n ( (W(t_i) + W(t_{i−1}))/2 ) ( W(t_i) − W(t_{i−1}) ) =
= (1/2) ( W(t)² − W(t0)² ) .   (3.56)
It is important to stress that a stand-alone Stratonovich integral has no
relationship to an Ito integral; in other words, there is no connection
between the two classes of integrals for an arbitrary function G(t). Only
when the stochastic differential equation to which the two integrals refer is
known can a formula be derived that relates one integral to the other.
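The difference between the two prescriptions is easy to see numerically. The following Python sketch (not part of the original notes; step number and path count are arbitrary) evaluates both partial sums for G(t) = W(t) on the same simulated paths; the Ito sum has mean zero, cf. (3.55), while the midpoint (Stratonovich) sum exceeds it by (t − t0)/2 on average:

```python
import numpy as np

rng = np.random.default_rng(11)
t, n_steps, n_paths = 1.0, 2000, 5000
dt = t / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

ito = np.sum(W[:, :-1] * dW, axis=1)                       # G at t_{i-1}
strat = np.sum(0.5 * (W[:, :-1] + W[:, 1:]) * dW, axis=1)  # midpoint value

print(ito.mean())              # ~ 0, cf. equation (3.55)
print(strat.mean())            # ~ (t - t0)/2 = 0.5
print(np.mean(strat - ito))    # ~ 0.5, the extra term in (3.54)
```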
Nonanticipating functions. The concept of a nonanticipating or adaptive
process has been discussed in subsection 3.1.1. Here we shall require
this property in order to be able to solve certain classes of Ito stochastic
integrals. The situation we are referring to requires that all functions can be
expressed as functions or functionals¹⁵ of the Wiener process W(t) by means
of a stochastic differential or integral equation of the form

x(t) − x(t0) = ∫_{t0}^t a(x(τ), τ) dτ + ∫_{t0}^t b(x(τ), τ) dW(τ) .   (3.49′)

A function G(t) is nonanticipating (with respect to t) if G(t) is probabilistically
independent of (W(s) − W(t)) for all s and t with s > t. In other
words, G(t) is independent of the behavior of the Wiener process in the future,
s > t. This is a natural and physically reasonable requirement for a solution
of equation (3.49′) because it boils down to the condition that x(t) involves
W(τ) only for τ ≤ t. Examples of important nonanticipating functions are:

(i) W(t) ,
(ii) ∫_{t0}^t F(W(τ)) dτ ,
(iii) ∫_{t0}^t F(W(τ)) dW(τ) ,
(iv) ∫_{t0}^t G(τ) dτ , when G(t) itself is nonanticipating, and
(v) ∫_{t0}^t G(τ) dW(τ) , when G(t) itself is nonanticipating.

¹⁵ A function assigns a value to the argument of the function, x0 → f(x0), whereas a
functional assigns a value to a function, f → f(x0).
Items (iii) and (v) depend on the fact that in Ito's version the stochastic
integral is defined as the limit of a sequence in which G(τ) and W(τ) are
involved exclusively for τ < t.

Three reasons make the specific discussion of nonanticipating functions
important:

1. many results can be derived that are valid only for nonanticipating functions,
2. nonanticipating functions occur naturally in situations in which causality
can be expected, in the sense that the future cannot affect the present, and
3. the definition of stochastic differential equations requires nonanticipating
functions.

In conventional calculus we never encounter situations in which the future
acts back on the present or even on the past.
Several relations are useful and required in Ito calculus:

dW(t)² = dt ,
dW(t)^{2+n} = 0 for n > 0 ,
dW(t) dt = 0 ,

∫_{t0}^t W(τ)ⁿ dW(τ) = (1/(n + 1)) ( W(t)^{n+1} − W(t0)^{n+1} ) − (n/2) ∫_{t0}^t W(τ)^{n−1} dτ ,

df(W(t), t) = ( ∂f/∂t + (1/2) ∂²f/∂W² ) dt + (∂f/∂W) dW(t) ,

⟨ ∫_{t0}^t G(τ) dW(τ) ⟩ = 0 , and

⟨ ∫_{t0}^t G(τ) dW(τ) ∫_{t0}^t H(τ) dW(τ) ⟩ = ∫_{t0}^t ⟨G(τ) H(τ)⟩ dτ .

The expressions are easier to memorize when we assign a dimension [t^{1/2}] to
W(t) and discard all terms of order t^{1+n} with n > 0.
At the end of this subsection we are left with the dilemma that the Ito
integral is mathematically and technically the most satisfactory, whereas the more
natural choice would be the Stratonovich integral, which enables the usage of
conventional calculus. In addition, the noise term ξ(t) in the Stratonovich
interpretation can be real noise with finite correlation time, whereas the idealized
white noise assumed as reference in Ito's formalism gives rise to divergence
of variances and correlations.
3.5.3 Integration of stochastic differential equations
A stochastic variable x(t) is consistent with an Ito stochastic differential
equation (SDE)

dx(t) = a(x(t), t) dt + b(x(t), t) dW(t)   (3.46′)

if for all t and t0 the integral equation (3.49′) is fulfilled. Time is ordered,

t0 < t1 < t2 < ... < t_n = t ,

and the time axis may be assumed to be split into (equal or unequal) increments,
∆t_i = t_{i+1} − t_i. We visualize a particular solution curve of the SDE
for the initial condition x(t0) = x0 by means of the discretized version

x_{i+1} = x_i + a(x_i, t_i) ∆t_i + b(x_i, t_i) ∆W_i ,   (3.49″)
wherein xi = x(ti), ∆ti = ti+1 − ti, and ∆Wi = W (ti+1)−W (ti). Figure 3.9
illustrates the partitioning of the stochastic process into a deterministic drift
Figure 3.9: Stochastic integration. The figure illustrates the Cauchy-Euler
procedure for the construction of an approximate solution of the stochastic dif-
ferential equation (3.46’). The stochastic process consists of two different compo-
nents: (i) the drift term, which is the solution of the ODE in absence of diffusion
(red; b(xi, ti) = 0) and (ii) the diffusion term representing a Wiener process W (t)
(blue; a(xi, ti) = 0). The superposition of the two terms gives the stochastic pro-
cess (black). The two lower plots show the two components in separation. The
increments of the Wiener process ∆Wi are independent or uncorrelated. An ap-
proximation to a particular solution of the stochastic process is constructed by
letting the mesh size approach zero, lim ∆t→ 0.
component, which is the discretized solution curve of the ODE obtained by
setting b(x(t), t) = 0 in equation (3.49″), and a stochastic diffusion component,
which is a random Wiener process W(t) obtained by setting
a(x(t), t) = 0 in the SDE. The increment of the Wiener process in the
stochastic term, ∆W_i, is independent of x_i provided (i) x0 is independent
of all W(t) − W(t0) for t > t0 and (ii) a(x, t) is a nonanticipating function of
t for any fixed x. Condition (i) is tantamount to the requirement that any
random initial condition must be nonanticipating.
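The Cauchy-Euler construction (3.49″) is precisely what the Euler-Maruyama scheme implements. The following Python sketch (not part of the original notes) applies it, with the Ornstein-Uhlenbeck SDE dx = −kx dt + √D dW chosen here merely as an illustrative test case:

```python
import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(a, b, x0, t0, t1, n_steps, rng):
    """Discretized solution x_{i+1} = x_i + a(x_i,t_i) dt + b(x_i,t_i) dW_i,
    cf. equation (3.49'')."""
    dt = (t1 - t0) / n_steps
    x, t, path = x0, t0, [x0]
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))   # independent Wiener increment
        x = x + a(x, t) * dt + b(x, t) * dW
        t += dt
        path.append(x)
    return np.array(path)

# illustrative test case: Ornstein-Uhlenbeck drift and diffusion
k, D = 1.0, 0.5
path = euler_maruyama(lambda x, t: -k * x, lambda x, t: np.sqrt(D),
                      x0=2.0, t0=0.0, t1=10.0, n_steps=10_000, rng=rng)
print(path[-1])   # one sample trajectory; the ensemble variance tends to D/(2k)
```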
A particular solution to equation (3.49″) is constructed by letting the
mesh size go to zero, n → ∞ implying ∆t → 0. In the construction
of an approximate solution x_i is always independent of ∆W_j for j ≥ i, as
is easily verified by inspection of (3.49″). Uniqueness of solutions refers
to individual trajectories in the sense that a particular solution is uniquely
obtained for a given sample function W(t) of the Wiener process. The
existence of a solution is defined for the whole ensemble of sample functions:
a solution of equation (3.49″) exists if – with probability one – a particular
solution exists for any choice of sample function W(t) of the Wiener process.
Existence and uniqueness of solutions to Ito stochastic differential equations
can be proven under two conditions [68, pp.100-115]: (i) the Lipschitz
condition and (ii) the growth condition. Existence and uniqueness of a
nonanticipating solution x(t) of an Ito SDE within the time interval [t0, t]
require:

(i) Lipschitz condition: there exists a κ such that
|a(x, τ) − a(y, τ)| + |b(x, τ) − b(y, τ)| ≤ κ |x − y|
for all x and y and τ ∈ [t0, t], and

(ii) growth condition: a κ exists such that for all τ ∈ [t0, t]
|a(x, τ)|² + |b(x, τ)|² ≤ κ² (1 + |x|²) .
The Lipschitz condition is almost always fulfilled for stochastic differential
equations in practice, because in essence it is a smoothness condition. The
growth condition, however, may often be violated in abstract model equations,
for example when a solution explodes to infinity. In other words, the
value of x may become infinite in a finite (random) time. We shall encounter
such situations in the applied chapters 4 and 5. As a matter of fact this is
typical model behavior, since no population or spatial variable can approach
infinity at finite times in a finite world.

Several other properties known to apply to solutions of ordinary differential
equations carry over without major modifications to SDEs: continuity in the
dependence on parameters and boundary conditions, as well as the Markov
property (for proofs we refer to [68]).
3.5.4 Changing variables in stochastic differential equations
In order to see the effect of a change of variables in Ito's stochastic differential
equations we consider an arbitrary function x(t) ⇒ f(x(t)). We start with
the simpler single-variable case and then introduce the multidimensional
situation.
Single-variable case. Making use of our previous results on nonanticipating
functions we expand df(x(t)) up to second order in dW(t):

df(x(t)) = f( x(t) + dx(t) ) − f( x(t) ) =
= f′(x(t)) dx(t) + (1/2) f″(x(t)) dx(t)² + ... =
= f′(x(t)) ( a(x(t), t) dt + b(x(t), t) dW(t) ) + (1/2) f″(x(t)) b(x(t), t)² dW(t)² ,

where all terms higher than second order have been neglected. Introducing
dW(t)² = dt into the last line of this equation we obtain Ito's formula:

df(x(t)) = ( a(x(t), t) f′(x(t)) + (1/2) b(x(t), t)² f″(x(t)) ) dt +
+ b(x(t), t) f′(x(t)) dW(t) .   (3.57)
It is worth noticing that Ito's formula and ordinary calculus lead to different
results unless f(x(t)) is linear in x(t) and thus f″(x(t)) vanishes.
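A quick numerical check of this difference (a Python sketch, not part of the original notes): for f(x) = x² and the pure diffusion dx = dW(t), Ito's formula (3.57) gives df = dt + 2W dW, so ⟨W(t)²⟩ grows like t, whereas ordinary calculus, lacking the f″ term, would predict a vanishing mean:

```python
import numpy as np

rng = np.random.default_rng(13)
t, n_steps, n_paths = 1.0, 1000, 50_000
dt = t / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)

# Ito: d(W^2) = dt + 2 W dW, hence E[W(t)^2] = t; ordinary calculus
# (d(W^2) = 2 W dW only) would give E[W(t)^2] = 0
print(np.mean(W[:, -1] ** 2))   # ~ t = 1.0, the Ito correction at work
```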
Many-variable case. The application of Ito's formalism to many dimensions
becomes, in general, quite complicated. The most straightforward simplification
is the extension of Ito calculus to the multivariate case by making
use of the rule that dW(t) is an infinitesimal of order t^{1/2}. Then one can
show that the following relations hold for an n-dimensional Wiener process
W(t) = ( W1(t), W2(t), ..., Wn(t) ):

dW_i(t) dW_j(t) = δ_ij dt ,
dW_i(t)^{2+N} = 0 , (N > 0) ,
dW_i(t) dt = 0 ,
dt^{1+N} = 0 , (N > 0) .
The first relation is a consequence of the independence of the increments of
Wiener processes along different coordinate axes, dW_i(t) and dW_j(t). Making
use of the drift vector A(x, t) and the diffusion matrix B(x, t), the multidimensional
stochastic differential equation reads

dx = A(x, t) dt + B(x, t) dW(t) .   (3.58)

Following Ito's procedure we obtain for an arbitrary well-behaved function
f(x(t)) the result

df(x) = ( ∑_i A_i(x, t) ∂f(x)/∂x_i + (1/2) ∑_{i,j} ( B(x, t)·B′(x, t) )_{ij} ∂²f(x)/(∂x_i ∂x_j) ) dt +
+ ∑_{i,j} B_ij(x, t) ∂f(x)/∂x_i dW_j(t) .   (3.59)
(3.59)
Again we observe the additional term introduced through the definition of
the Ito integral.
3.5.5 Fokker-Planck and stochastic differential equations
Next we calculate the expectation value of an arbitrary function f(x(t)) by
means of Ito's formula and begin with a single variable:

⟨df(x(t))⟩/dt = ⟨ df(x(t))/dt ⟩ = d⟨f(x(t))⟩/dt =
= ⟨ a(x(t), t) ∂f(x(t))/∂x + (1/2) b(x(t), t)² ∂²f(x(t))/∂x² ⟩ .
The stochastic variable x(t) has the conditional probability density p(x, t|x0, t0),
and hence we can compute the expectation value by integration – where
we simplify the notation to f(x) ≡ f(x(t)) and p(x, t) ≡ p(x, t|x0, t0):

d⟨f(x)⟩/dt = ∫ dx f(x) ∂p(x, t)/∂t =
= ∫ dx ( a(x, t) ∂f(x)/∂x + (1/2) b(x, t)² ∂²f(x)/∂x² ) p(x, t) .
The further derivation follows the procedure we used in the case of the
differential Chapman-Kolmogorov equation in subsection 3.1.2 – in particular
integration by parts and the neglect of surface terms – and we obtain

∫ dx f(x) ∂p(x, t)/∂t = ∫ dx f(x) ( −∂/∂x ( a(x, t) p(x, t) ) + (1/2) ∂²/∂x² ( b(x, t)² p(x, t) ) ) .
Since the choice of the function f(x) was arbitrary we can now drop it
and finally obtain a forward equation of the Fokker-Planck type:

∂p(x, t|x0, t0)/∂t = −∂/∂x ( a(x, t) p(x, t|x0, t0) ) +
+ (1/2) ∂²/∂x² ( b(x, t)² p(x, t|x0, t0) ) .   (3.60)
(3.60)
The probability density p(x, t) thus obeys an equation that is completely
equivalent to the equation for a diffusion process characterized by a drift coef-
ficient a(x, t) and a diffusion coefficient b(x, t) as derived from the Chapman-
Kolmogorov equation. Hence, Ito’s stochastic differential equation provides
indeed a local approximation to a (drift and) diffusion process in probability
space. An example comparing a change form Cartesian in polar coordinates
in an Ito stochastic differential equation and in the corresponding Fokker-
Planck equation is shown in subsubsection 3.6.3.1.
The extension to the multidimensional case based on Ito's formula (3.59)
is straightforward, and we obtain for the conditional probability density
p(x, t|x0, t0) ≡ p the Fokker-Planck equation:

∂p/∂t = −∑_i ∂/∂x_i ( A_i(x, t) p ) + (1/2) ∑_{i,j} ∂²/(∂x_i ∂x_j) ( ( B(x, t)·B′(x, t) )_{ij} p ) .   (3.61)
Here we derive one additional property which is relevant in practice. The
stochastic differential equation (3.58),

dx = A(x, t) dt + B(x, t) dW(t) ,

is mapped onto a Fokker-Planck equation that depends only on the matrix
product B·B′; accordingly, the same Fokker-Planck equation arises from
all matrices B that give rise to the same product B·B′. Thus the Fokker-Planck
equation is invariant under a replacement B ⇒ B·S where S is an orthogonal
matrix, S·S′ = I. If S fulfils the orthogonality relation it may depend
on x(t), but for the stochastic handling it has to be nonanticipating.
Now we want to prove this redundancy directly from the SDE and define
a transformed Wiener process

dV(t) = S(t) dW(t) .

The random vector V(t) is a normalized linear combination of the Gaussian
variables dW_i(t), and S(t) is nonanticipating; accordingly, dV(t) is itself
Gaussian with the same correlation matrix. Averages of the dW_i(t) to various
powers taken at different times factorize, and the same is true for the
dV_i(t). Accordingly, the infinitesimal elements dV(t) are increments of a
Wiener process: the orthogonal transformation mixes sample paths without,
however, changing the stochastic nature of the process.

Equation (3.58) can be rewritten and yields

dx = A(x, t) dt + B(x, t) S′(t)·S(t) dW(t) =
= A(x, t) dt + B(x, t) S′(t)·dV(t) =
= A(x, t) dt + B(x, t) S′(t)·dW(t) ,

since V(t) is as good a Wiener process as W(t), and both SDEs give rise
to the same Fokker-Planck equation.
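This redundancy is easily confirmed numerically. In the following Python sketch (not part of the original notes; B and the rotation angle are arbitrary choices), the increments B dW and (B·S) dW have the same covariance B·B′ dt and therefore drive statistically identical diffusion processes:

```python
import numpy as np

rng = np.random.default_rng(17)
dt, n_samples = 1e-3, 200_000

B = np.array([[1.0, 0.5],
              [0.0, 0.8]])
phi = 0.7
S = np.array([[np.cos(phi), -np.sin(phi)],   # rotation matrix: S S' = I
              [np.sin(phi),  np.cos(phi)]])

dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, 2))
inc1 = dW @ B.T          # increments B dW
inc2 = dW @ (B @ S).T    # increments (B S) dW

# both empirical covariances approach B B' (after dividing by dt)
print(np.cov(inc1.T) / dt)
print(np.cov(inc2.T) / dt)
print(B @ B.T)
```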
3.6 Fokker-Planck equations
The name Fokker-Planck equation originated from two independent works by
the Dutch physicist Adriaan Daniel Fokker on Brownian motion of electric
dipoles in a radiation field [75] and by the German physicist Max Planck
who aimed at a comprehensive theory of fluctuations [76]. Other frequently
used notations for this equation are Kolmogorov’s forward equation preferred
by mathematicians because Kolmogorov developed the rigorous basis for it
[77] or Smoluchowski equation because of Smoluchowski’s use of the equa-
tion in random motion of colloidal particles. Fokker-Planck equations are
related to stochastic differential equations in the sense that they describe the
(deterministic) time evolution of a probability distribution p(x, t|x0, t0) that
is derived from the ensemble of trajectories obtained by integration of the
stochastic differential equation with different time courses of the underlying
Wiener process W(t) (subsection 3.5.5).
The Fokker-Planck equation (3.15) is a parabolic partial differential equation¹⁶
and thus its solution requires boundary conditions in addition to
the initial conditions. The boundary conditions are determined by the nature
of the stochastic process: reflecting boundary conditions, for example,
conserve the number of particles, whereas particles disappear at the boundaries
when these are absorbing. General boundary conditions may be
much more complex than reflection or absorption, but these two simple special
cases may be used to characterize the extremes, impermeable and permeable
boundaries.
¹⁶ The classification of partial differential equations of the form

A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F = 0 ,

where u(x, y) is a function and the subscripts stand for partial differentiation, for example
u_x ≡ ∂u/∂x, makes use of the determinant of the matrix Z = (A B; B C), det(Z) = AC − B²,
and defines an elliptic PDE by the condition that Z is positive definite, a parabolic PDE by
det(Z) = 0, and a hyperbolic PDE by det(Z) < 0.
3.6.1 Probability currents and boundary conditions
For the definition of a probability current and the derivation of boundary
conditions we consider a multivariable forward Fokker-Planck equation
(where we omit the explicit statement of initial conditions):

∂p(z, t)/∂t = −∑_i ∂/∂z_i ( A_i(z, t) p(z, t) ) + (1/2) ∑_{i,j} ∂²/(∂z_i ∂z_j) ( B_ij(z, t) p(z, t) ) .   (3.62)

We rewrite this equation by introducing a vectorial flux J(z, t), denoted as
the probability current, which in components is defined as

J_i(z, t) = A_i(z, t) p(z, t) − (1/2) ∑_j ∂/∂z_j ( B_ij(z, t) p(z, t) ) ,   (3.63)

and insertion into equation (3.62) yields

∂p(z, t)/∂t + ∑_i ∂J_i(z, t)/∂z_i = 0 .   (3.64)
This equation can be interpreted as a local conservation condition. By
integration over a volume V with boundary S we obtain for the probability
that the random variable Z lies in V:

P(V, t) = Prob(Z ∈ V) = ∫_V dz p(z, t) .

The time derivative is conveniently formulated by means of the surface
integral

∂P(V, t)/∂t = −∫_S dS n_S · J(z, t) ,

where n_S is a unit vector perpendicular to the surface and pointing outward,
and n_S·J is the component of J perpendicular to the surface. The total
change of probability is given by the surface integral of the current J over
the boundary of V.
The sketch in figure 3.10 is used now to show that the surface integral over
the current J for any arbitrary surface S yields the net flow of probability
across this surface. The volume V is split into two parts, V1 and V2, and
the two volumes are separated by the surface S12. Then, V1 is enclosed by
Figure 3.10: Probability current. The figure presents a sketch which is used
to prove that the probability current (3.63) measures the flow of probability. A
total volume V = V1 + V2 with surface S = S1 + S2 is split by a surface S12 into
two parts, V1 and V2; Φ12 (red) and Φ21 (blue) are the probability fluxes from V2
to V1 and vice versa. The unit vector n defines the local direction perpendicular
to the differential surface element dS. For the calculation of the net flow of
probability, Φ = Φ12 − Φ21, see text.
S1 + S12 and V2 by S2 + S12. In order to compute the net flow of probability
we make use of the fact that the sample paths of the stochastic process are
continuous (because a process described by a Fokker-Planck equation is free
of jumps). We denote by Φ12 the probability flow crossing the boundary S12
from V2 to V1, and by Φ21 the flux going in the opposite direction from V1 to
V2. Choosing a sufficiently small time interval ∆t, the probability of crossing
the boundary S12 from V2 to V1 can be expressed by the joint probability of
being in V2 at time t and in V1 at time t + ∆t:

Φ12(t, ∆t) = ∫_{V1} dx ∫_{V2} dy p(x, t + ∆t; y, t) .
The net flow of probability from V2 to V1 is obtained from the difference
between the flows in opposite directions, Φ12 − Φ21, through division by ∆t
and taking the limit ∆t → 0:

Φ(t) = lim_{∆t→0} (1/∆t) ∫_{V1} dx ∫_{V2} dy ( p(x, t + ∆t; y, t) − p(y, t + ∆t; x, t) ) .
In the limit ∆t = 0 the integrals Φ12(t, ∆t) and Φ21(t, ∆t) vanish because

∫_{V1} dx ∫_{V2} dy p(x, t; y, t) = 0 and ∫_{V1} dx ∫_{V2} dy p(y, t; x, t) = 0 ,
since the probability of being simultaneously in both compartments is zero,
and we obtain further

Φ(t) = ∫_{V1} dx ∫_{V2} dy ( ∂p(x, τ; y, t)/∂τ − ∂p(y, τ; x, t)/∂τ )|_{τ=t} .
Now we may use the flow version of the Fokker-Planck equation (3.64) and
obtain

Φ(t) = −∫_{V1} dx ∑_i ∂J_i(x, t; V2, t)/∂x_i + ∫_{V2} dy ∑_i ∂J_i(y, t; V1, t)/∂y_i ,

where the integration over V2 or V1 is encapsulated in the definition of the
probability that applies for the flow J(x, t; V2, t), which is calculated according
to (3.63) from

p(x, t; V2, t) = ∫_{V2} dy p(x, t; y, t) .
Volume integrals are now converted into surface integrals. The integrals over
the boundaries S2 and S1 vanish (except for sets of measure zero) because
they involve probabilities p(x, t; V2, t) with x neither in V2 nor in its boundary,
and vice versa. The only non-vanishing terms are those where the integration
extends over S12, and this yields

Φ(t) = lim_{∆t→0} (1/∆t) ∫_{V1} dx ∫_{V2} dy ( p(x, t + ∆t; y, t) − p(y, t + ∆t; x, t) ) =
= ∫_{S12} dS n·( J(x, t; V2, t) + J(x, t; V1, t) ) =
= ∫_Σ dS n·J(x, t) ,   (3.65)

where Σ is some surface separating two regions and n is a unit vector pointing
from region V2 to region V1.
With the precise definition and the known properties of the probability
current we are now in the position to discuss different types of boundary
conditions.
Reflecting boundary conditions. A reflecting barrier prevents particles
from leaving the volume V, and accordingly there is zero net flow across S,
the boundary of V:

n·J(z, t) = 0 for z ∈ S and n normal to S ,   (3.66)

where J(z, t) is defined by equation (3.63). Since a particle cannot cross S it
must be reflected there, which explains the name reflecting barrier.
Absorbing boundary conditions. An absorbing barrier is defined by the
fact that every particle that reaches the boundary is instantaneously removed
from the system. In other words, since the fate of the particle outside V is not
considered, the barrier absorbs the particle and accordingly the probability
of finding a particle at the barrier is zero:

p(z, t) = 0 for z ∈ S .   (3.67)
Discontinuities at boundaries. Both classes of coefficients, A_i and B_ij,
may have discontinuities at the surface S even though the particles are supposed
to move freely across the boundary S. In order to allow for free motion, both
the probability, p(z), and the normal component of the current, n·J(z), have
to be continuous at the boundary S:

p(z)|_{S+} = p(z)|_{S−} and n·J(z)|_{S+} = n·J(z)|_{S−} ,   (3.68)

where the subscripts S+ and S− indicate the limits of the quantities taken
from the two sides of the surface. The definition of the probability current
in equation (3.63) is thus compatible with discontinuities in the derivatives
of p(z) at the surface S.
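To make the two extreme cases concrete, here is a finite-difference Python sketch (not part of the original notes; grid, time step, and D are arbitrary choices) of the one-dimensional diffusion equation with reflecting versus absorbing walls; the total probability is conserved in the first case and leaks out in the second:

```python
import numpy as np

# finite-difference sketch of dp/dt = (D/2) d^2 p/dx^2 on [0, 1]
D, nx, dt, n_steps = 1.0, 101, 1e-5, 20_000
dx = 1.0 / (nx - 1)
p0 = np.zeros(nx)
p0[nx // 2] = 1.0 / dx            # initial sharp peak in the middle

def step(p, boundary):
    q = p.copy()
    q[1:-1] += 0.5 * D * dt / dx**2 * (p[2:] - 2 * p[1:-1] + p[:-2])
    if boundary == "reflecting":   # zero current at the walls, cf. (3.66)
        q[0], q[-1] = q[1], q[-2]
    else:                          # absorbing: p = 0 at the walls, cf. (3.67)
        q[0], q[-1] = 0.0, 0.0
    return q

for bc in ("reflecting", "absorbing"):
    q = p0.copy()
    for _ in range(n_steps):
        q = step(q, bc)
    print(bc, q.sum() * dx)        # ~1 (conserved) versus <1 (absorbed)
```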
3.6.2 Fokker-Planck equation in one dimension
The general Fokker-Planck equation in one variable is of the simple form:
∂f(x, t)
∂t= − ∂
∂x
(A(x, t) f(x, t)
)+
1
2
∂2
∂x2
(B(x, t) f(x, t)
). (3.69)
So far we have applied the Fokker-Planck operator always to the conditional
probability
f(x, t) = p(x, t|x0, t0) with p(x, t0|x0, t0) = δ(x− x0) (3.70)
as initial condition. In order to allow for more general initial conditions we
need only to redefine the one time probability
p(x, t) =
∫dx0 p(x, t; x0, t0) ≡
∫dx0 p(x, t|x0, t0) p(x0, t0) , (3.71)
which is compatible with the general initial probability density
p(x, t)∣∣∣t=t0
= p(x, t0) .
In the previous section 3.5 we showed that the stochastic process described
by the conditional probability (3.70), which satisfies the Fokker-Planck equation
(3.69), is equivalent to the Ito stochastic differential equation

dx(t) = A(x(t), t) dt + √(B(x(t), t)) dW(t) .   (3.72)

In a way the two descriptions are complementary to each other. In particular,
perturbation theories derived from the Fokker-Planck equation are very
different from those based on the stochastic differential equation, but both are
suitable in their own right for specific applications.
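The correspondence between (3.69) and (3.72) can be made concrete by simulation. The following minimal Python sketch integrates an Ito SDE of the form (3.72) with the Euler-Maruyama scheme; the Ornstein-Uhlenbeck drift A(x) = −x and the constant diffusion B(x) = 1 are illustrative assumptions, not taken from the text, and the ensemble of trajectories approaches the Gaussian stationary density of this process.

import numpy as np

rng = np.random.default_rng(1)

def A(x): return -x                  # drift A(x, t) = -x (assumption)
def B(x): return 1.0                 # diffusion B(x, t) = 1 (assumption)

n_paths, n_steps, dt = 10000, 2000, 0.005
x = np.zeros(n_paths)                # sharp initial condition x0 = 0
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # Wiener increments
    x += A(x) * dt + np.sqrt(B(x)) * dW          # Euler-Maruyama step

print(x.mean(), x.var())             # approx. 0 and 1/2 (stationary Gaussian)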
Boundary conditions in one dimensional systems. General boundary
conditions have been discussed in subsection 3.6.1. Systems in one dimension
allow for the introduction of additional special boundary conditions that are
useful for certain classes of problems.
Periodic boundary conditions. An, in principle, infinite or cyclic system
is partitioned into identical intervals of finite size. Then the stochastic
process takes place on an interval [a, b], the endpoints of which are assumed
to be identical. Even for a discontinuity at the boundary we obtain:

lim_{x→b−} p(x, t) = lim_{x→a+} p(x, t) and lim_{x→b−} J(x, t) = lim_{x→a+} J(x, t) .   (3.73)
Continuous boundary conditions simply imply that the two functions A(x, t)
and B(x, t) are periodic on the same interval:

A(b, t) = A(a, t) and B(b, t) = B(a, t) , and hence
p(a, t) = p(b, t) and J(b, t) = J(a, t) ;   (3.74)

the probability and its derivatives are identical at the endpoints of the interval.
Prescribed boundary conditions. Under the assumption that the diffusion
coefficient vanishes at the boundary, B(a, t) = 0, that diffusive motion occurs
only for x > a, and further that A(x, t) and √(B(x, t)) obey the Lipschitz
condition at x = a and B(x, t) is differentiable there, we have

∂B(a, t)/∂x = 0 , the SDE dx(t) = A(x, t) dt + √(B(x, t)) dW(t)

has solutions, and the nature of the boundary conditions is exclusively determined
by the sign of A(x, t) at x = a:

(i) exit boundary: A(a, t) < 0; if a particle reaches the point x = a it will
certainly proceed out of the interval [a, b] into the open region x < a,

(ii) entrance boundary: A(a, t) > 0; if a particle reaches the point x = a it
will certainly return to the region x > a, or in other words a particle at
the right-hand side of x = a can never leave the interval, and if the particle
is introduced at x = a it will certainly enter the region x > a, and

(iii) natural boundary: A(a, t) = 0; a particle that has reached x = a would
remain there, however, it can be shown that this point cannot even be
reached from x > a and, moreover, a particle introduced there will stay,
and thus this boundary neither absorbs nor releases particles.
The Feller classification of boundaries. William Feller [78] gave very general
criteria for the classification of boundary conditions into four classes: regular,
exit, entrance, and natural. For this goal definitions of four classes of
functions are required:

(i) f(x) = exp(−2 ∫_{x0}^{x} ds A(s)/B(s)) ,

(ii) g(x) = 2/(B(x) f(x)) ,

(iii) h1(x) = f(x) ∫_{x0}^{x} g(s) ds , and

(iv) h2(x) = g(x) ∫_{x0}^{x} f(s) ds ,

wherein x0 is fixed and taken from the interval, x0 ∈ ]a, b[. Now we write L(x1, x2)
for the set of all functions which can be integrated on the interval ]x1, x2[,
and then the Feller classification is of the form:

(I) regular: if f(x) ∈ L(a, x0) and g(x) ∈ L(a, x0),

(II) exit: if g(x) ∉ L(a, x0) and h1(x) ∈ L(a, x0),

(III) entrance: if g(x) ∈ L(a, x0) and h2(x) ∈ L(a, x0), and

(IV) natural: all other cases.
The classification becomes important in the context of stationary solutions
of the one-dimensional Fokker-Planck equation. Many results concerning the
compatibility of stationary solutions with certain classes of boundaries are
self-evident and will be discussed in the next paragraph.
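The criteria are easy to check numerically for a concrete model. The Python sketch below does so for the autocatalytic reaction treated later in this subsection, with drift A(x) = ax − x² and diffusion B(x) = ax + x² in rescaled units; the closed-form antiderivative used for f(x) and the parameter values are assumptions of this example. The integral of g diverges at the boundary a = 0 while that of h1 stays finite, which is the signature of an exit boundary.

import numpy as np
from scipy.integrate import quad

a, x0 = 1.0, 1.0                     # boundary at x = 0, interior point x0

def f(x):
    # f(x) = exp(-2 int_{x0}^{x} A(s)/B(s) ds) with 2A/B = 2(a - s)/(a + s);
    # an antiderivative of 2A/B is F(s) = 4a log(a + s) - 2s
    F = lambda s: 4.0 * a * np.log(a + s) - 2.0 * s
    return np.exp(F(x0) - F(x))

def g(x):
    return 2.0 / ((a * x + x * x) * f(x))

def h1(x):
    return f(x) * quad(g, x0, x)[0]

for eps in (1e-2, 1e-4, 1e-6):
    print(quad(g, eps, x0)[0])       # grows like log(1/eps): g not in L(0, x0)
print(quad(lambda x: abs(h1(x)), 1e-8, x0, limit=200)[0])  # finite: exit boundary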
Boundaries at infinity. In principle, all kinds of boundaries can exist at
infinity, but the requirement to obtain a probability density p(x, t) that can
be normalized and is sufficiently well behaved is a severe restriction. These
requirements are

lim_{x→∞} p(x, t) = 0 and lim_{x→∞} ∂p(x, t)/∂x = 0 ,   (3.75)

where the second condition excludes cases in which the probability oscillates
infinitely fast as x → ∞. Accordingly, a nonzero current at infinity can only
occur when either A(x, t) or B(x, t) diverges in the limit x → ∞.
Only two currents at boundaries x = ±∞ are compatible with conserva-
tion of probability: (i) J(±∞, t) = 0 and (ii) J(+∞, t) = J(−∞, t) corre-
sponding to reflecting and periodic boundary conditions, respectively.
Stationary solutions for homogeneous Fokker-Planck equations. In
a homogeneous process the drift and the diffusion coefficients do not depend
on time, and hence the stationary probability density satisfies the ordinary
differential equation

d/dx (A(x) p(x)) − (1/2) d²/dx² (B(x) p(x)) = 0 or dJ(x)/dx = 0 ,   (3.76)

which evidently has the solution

J(x) = constant .   (3.76')
For a process on an interval [a, b] we therefore have J(a) = J(x) = J(b) ≡ J.
One reflecting boundary implies that the other boundary is reflecting too, and
hence the current vanishes, J = 0. For boundaries that are not reflecting, the
only case satisfying equation (3.76) is periodic boundary conditions fulfilling
(3.73). The conditions for these cases are:
Zero current condition. From J = 0 follows

A(x) p(x) − (1/2) d/dx (B(x) p(x)) = 0 ,

which can be solved by integration:

p(x) = (N/B(x)) exp(2 ∫_{a}^{x} dξ A(ξ)/B(ξ)) ,

where the normalization constant N is determined by the probability conservation
condition, ∫_{a}^{b} dx p(x) = 1.
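This integral representation translates directly into a short numeric routine. The Python sketch below evaluates the zero-current stationary density on a grid; the Ornstein-Uhlenbeck drift and the constant diffusion chosen here are illustrative assumptions.

import numpy as np
from scipy.integrate import cumulative_trapezoid

a, b = -5.0, 5.0                     # reflecting boundaries (assumption)
x = np.linspace(a, b, 2001)
A = -x                               # Ornstein-Uhlenbeck drift (assumption)
B = np.ones_like(x)                  # constant diffusion (assumption)

I = cumulative_trapezoid(2.0 * A / B, x, initial=0.0)   # 2 int_a^x A/B dxi
p = np.exp(I) / B
p /= np.trapz(p, x)                  # normalization fixes N
print(np.trapz(x * x * p, x))        # variance approx. 1/2 for this choice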
Periodic boundary condition. The nonzero current of periodic boundary
conditions fulfils the equation

A(x) p(x) − (1/2) d/dx (B(x) p(x)) = J .

It is, however, not arbitrary, since it is restricted by normalization and the
conditions for periodic boundary conditions, p(a) = p(b) and J(a) = J(b).
In order to simplify the expressions we define

ψ(x) = exp(2 ∫_{a}^{x} dξ A(ξ)/B(ξ)) ,

integrate and obtain

p(x) B(x)/ψ(x) = p(a) B(a)/ψ(a) − 2J ∫_{a}^{x} dξ/ψ(ξ) .

Making use of the periodicity condition p(b) = p(a) we obtain for the current

J = p(a) (B(a)/ψ(a) − B(b)/ψ(b)) / (2 ∫_{a}^{b} dξ/ψ(ξ)) ,

and eventually find

p(x) = p(a) ( (B(b)/ψ(b)) ∫_{a}^{x} dξ/ψ(ξ) + (B(a)/ψ(a)) ∫_{x}^{b} dξ/ψ(ξ) ) / ( (B(x)/ψ(x)) ∫_{a}^{b} dξ/ψ(ξ) ) .
An infinite range of the stochastic variable and singular boundaries may
complicate the situation, and a full enumeration and analysis of the possible
cases is extremely hard. Commonly one relies on the handling of special cases;
a typical one is given in the next paragraph.
A chemical reaction as model. For the purpose of illustration we consider
an autocatalytic chemical reaction, although chemical reactions are
commonly modeled better by master equations,

X + A ⇌ 2X with forward rate parameter k1 and backward rate parameter k2 .   (3.77)

The stochastic variable X(t) describes the number of molecules X and

p(x, t) = P(X(t) = x)   (3.78)

is the corresponding probability density, whereby we assume that particle
numbers are sufficiently large in order to justify modeling by continuous
variables.

Figure 3.11: “Stationary” probability density of the reaction X + A ⇌ 2X.
The figure shows the “stationary” solution of the Fokker-Planck equation of the
autocatalytic reaction (3.77) according to equation (3.80). The result is not an
ordinary stationary solution because it is not normalizable.

The reaction system is of special interest since it has an exit barrier at x = 0,
which has a simple physical explanation: if no molecule X is present in the
system, no X can be produced.17 The Fokker-Planck equation for reaction
(3.77), which will be derived in chapter 4, is of the form
∂p(x, t)/∂t = − ∂/∂x ((k1 a x − k2 x(x−1)) p(x, t)) + (1/2) ∂²/∂x² ((k1 a x + k2 x(x−1)) p(x, t)) ≈

≈ − ∂/∂x ((k1 a x − k2 x²) p(x, t)) + (1/2) ∂²/∂x² ((k1 a x + k2 x²) p(x, t)) .   (3.79)
Reflecting boundaries are introduced at the positions x = α and x = β, and
the stationary probability density is computed to be

p(x) = (a + x)^{4a−1} x^{−1} e^{−2x} .   (3.80)

17Actually this is an artifact of a system violating thermodynamics. Correct thermodynamic
handling of catalysis requires that for every catalyzed reaction the uncatalyzed
reaction is taken into account too. This is A ⇌ X for the current example, and if this is
done properly the singularity at x = 0 disappears.
This function (figure 3.11) is not normalizable for α = 0. In fact the pole
at x = 0 is a result of absorption occurring there, and this becomes evident
when we compute the Fokker-Planck coefficients

B(0, t) = (a x + x²)|_{x=0} = 0 ,
A(0, t) = (a x − x²)|_{x=0} = 0 , and
∂B(x, t)/∂x|_{x=0} = (a + 2x)|_{x=0} > 0 ,   (3.81)

which meet the conditions of an exit boundary. The stationary solution is
strictly relevant only for α > 0. The meaning of the reflecting barrier is quite
simple: whenever a molecule X disappears, another one is added instantaneously.
Despite the mathematical difficulties, the stationary solution (3.80)
is a useful representation of the probability distribution in practice, except
near the point x = 0, because the time required for all molecules X to disappear
is extraordinarily long in real chemical systems and exceeds the duration
of an experiment by many orders of magnitude.
3.6.3 Fokker-Planck equation in several dimensions
Although multidimensional Fokker-Planck equations are characterized by
essentially more complex behavior than in the one-dimensional case (in particular,
boundary problems are much more involved and show much higher
variability), some analogies between one and many dimensions are quite useful
and will be reported here.
3.6.3.1 Change of variables
A multidimensional Fokker-Planck equation written in general variables, x =
(x1, x2, . . . , xn)′,

∂p(x, t)/∂t = − Σ_{i=1}^{n} ∂/∂xi (Ai(x) p(x, t)) + (1/2) Σ_{i,j=1}^{n} ∂²/∂xi∂xj (Bij(x) p(x, t)) ,   (3.82)

is to be transformed into the corresponding equation for the new variables
ξi = fi(x) (with i = 1, . . . , n). The functions fi are assumed to be independent
and differentiable. If π(ξ, t) is the probability density for the new
variable, we can obtain it from

π(ξ, t) = p(x, t) |det(∂xi/∂ξj)_{i,j=1,...,n}| = p(x, t) · |J| ,   (3.83)

where J is the Jacobian matrix and |J| the Jacobian determinant of the
transformation of coordinates.
Often the easiest way to implement the change of variables is to make use of
the corresponding stochastic differential equation,

dx(t) = A(x) dt + b(x) dW(t) with b(x) · b(x)′ = B(x) ,   (3.58')

and then to recompute the Fokker-Planck equation for π(ξ, t) from the stochastic
differential equations. Commonly both procedures are quite involved, and
the calculations are quite messy unless symmetries or simplifications facilitate
the problem.
Transformation from Cartesian to polar coordinates. As an example
we consider the Rayleigh process, also called Rayleigh fading. The commonly
applied model uses two orthogonal Ornstein-Uhlenbeck processes along the
real and the imaginary axis of an electric field, E = (E1, E2). The stochastic
differential equation,

dE1(t) = − γ E1(t) dt + ε dW1(t) and
dE2(t) = − γ E2(t) dt + ε dW2(t) ,   (3.84)

is converted into polar coordinates,

E1(t) = α(t) cos φ(t) and E2(t) = α(t) sin φ(t) .

With α(t) = exp(µ(t)) we can write E1 + i E2 = exp(µ(t) + i φ(t)), and for
the Wiener processes we define

dWα = dW1(t) cos φ(t) + dW2(t) sin φ(t) and
dWφ = − dW1(t) sin φ(t) + dW2(t) cos φ(t) ,

and obtain for Rayleigh fading in polar coordinates

dφ(t) = (ε/α(t)) dWφ(t) and
dα(t) = (− γ α(t) + ε²/(2 α(t))) dt + ε dWα(t) .   (3.85)

The interesting result is that the phase angle φ diffuses without a drift term.
Now we shall perform the analogous transformation on the Fokker-Planck
equation describing the probability density p(E1, E2, t),

∂p(E1, E2, t)/∂t = γ (∂/∂E1 (E1 p) + ∂/∂E2 (E2 p)) + (1/2) ε² (∂²p/∂E1² + ∂²p/∂E2²) .   (3.86)
The transformation of coordinates, (E1, E2) ⟹ (α, φ) with E1 = α cos φ
and E2 = α sin φ, yields the Jacobian determinant

|J| = |∂(E1, E2)/∂(α, φ)| = (cos φ)(α cos φ) − (−α sin φ)(sin φ) = α ,
and the Laplacian transformed to polar coordinates (the second term on the
right-hand side of equation (3.86)) reads

∂²/∂E1² + ∂²/∂E2² = (1/α) ∂/∂α (α ∂/∂α) + (1/α²) ∂²/∂φ² .
For the inverse transformation we obtain α = √(E1² + E2²) and φ = tan⁻¹(E2/E1),
and the Jacobian determinant

|J⁻¹| = |∂(α, φ)/∂(E1, E2)| = (cos φ)(cos φ/α) − (sin φ)(− sin φ/α) = α⁻¹ .
Further calculations are straightforward and yield

∂/∂E1 (E1 p) + ∂/∂E2 (E2 p) =

= 2 p + E1 (∂p/∂α · ∂α/∂E1 + ∂p/∂φ · ∂φ/∂E1) + E2 (∂p/∂α · ∂α/∂E2 + ∂p/∂φ · ∂φ/∂E2) =

= 2 p + α ∂p/∂α = (1/α) ∂/∂α (α² p) ,
and the probability density in polar coordinates becomes

π(α, φ) = |∂(E1, E2)/∂(α, φ)| p(E1, E2) = α p(E1, E2) .
Collection of all previous results leads to

∂π(α, φ, t)/∂t = − ∂/∂α ((− γ α + ε²/(2α)) π) + (ε²/2) ((1/α²) ∂²π/∂φ² + ∂²π/∂α²) .   (3.87)
The Fokker-Planck equation (3.87) corresponds to the two stochastic differ-
ential equations (3.85), which were derived by changing variables according
to Ito’s formalism.
3.6.3.2 Stationary solutions
Here we give a brief overview of the stationary solutions of many-variable
Fokker-Planck equations. Some general aspects of boundary conditions have
been discussed already in subsection 3.6.1, and we just summarize them.
For the forward Fokker-Planck equation the reflecting barrier boundary
condition has to satisfy

n · J = 0 for x ∈ S ,

where S is the surface of the domain V under consideration (figure 3.10), n is
a local vector normal to the surface, and J is the probability current with the
components

Ji(x, t) = Ai(x, t) p(x, t) − (1/2) Σ_j ∂/∂xj (Bij(x, t) p(x, t)) .
The absorbing barrier condition is

p(x, t) = 0 for x ∈ S .

In reality, parts of the surface may be reflecting whereas other parts are
absorbing. Then, at discontinuities on the surface the conditions

n · J1(x, t) = n · J2(x, t) and p1(x, t) = p2(x, t) for x ∈ S

have to be fulfilled.
The stationarity condition for the multidimensional Fokker-Planck equation
implies that the probability current J (3.63) vanishes for all x ∈ V and
leads to the equation for the stationary probability density p(x):

(1/2) Σ_j Bij(x) ∂p(x)/∂xj = p(x) (Ai(x) − (1/2) Σ_j ∂Bij(x)/∂xj) .   (3.88)
Under the assumption that the matrix B is invertible, this equation can be
rewritten as

∂ log(p(x))/∂xi = Σ_k B⁻¹_{ik}(x) (2 Ak(x) − Σ_j ∂Bkj(x)/∂xj) ≡ Zi(A, B, x) .   (3.89)
The stationarity condition (3.89) cannot be satisfied for arbitrary drift vectors
A and diffusion matrices B, since the left-hand side of the equation is
a gradient and thus Z has to fulfill the (necessary and sufficient) condition
for a vanishing curl, (∂Zi/∂xj) = (∂Zj/∂xi). If the vanishing curl condition
is fulfilled, the stationary solution can be obtained by straightforward
integration,

p(x) = exp(∫^{x} dξ · Z(A, B, ξ)) .
This condition is often addressed as the potential condition, because the gradients
Zi can be associated with the existence of a potential −Φ(x), and it is better
illustrated by

p(x) = exp(−Φ(x)) with Φ(x) = − ∫^{x} dξ · Z(A, B, ξ) .   (3.90)
Not every Fokker-Planck equation has a stationary solution but if it sustains
one, the solution can be obtained by simple integration.
The Rayleigh process in polar coordinates (subsubsection 3.6.3.1) is used
as an illustrative example. From equation (3.87) we obtain

A = (−γα + ε²/(2α), 0)′ and B = ε² I (the 2×2 unit matrix multiplied by ε²),
Figure 3.12: Detailed balance in a reaction cycle A ⇌ B ⇌ C ⇌ A. In the
cycle of three monomolecular reactions the condition of detailed balance is stronger
than the stationarity condition. The common condition for the stationary state,
d[A]/dt = d[B]/dt = d[C]/dt = 0, requires that the probability currents for
the individual reaction steps are equal, J12 = J23 = J31 = J, whereas detailed
balance is satisfied only when all individual currents vanish, J = 0.
from which we obtain

Σ_j ∂Bα,j/∂xj = 0 and Σ_j ∂Bφ,j/∂xj = 0 , and hence Z = 2 B⁻¹ · A = (−2γα/ε² + 1/α, 0)′ ,

with (∂Zα/∂φ) = 0 and (∂Zφ/∂α) = 0; the stationary solution is then of
the form (with N being a normalization constant)

p(α, φ) = exp(∫^{(α,φ)} (dα Zα + dφ Zφ)) =

= N exp(log α − γα²/ε²) =

= N α exp(−γα²/ε²) .   (3.91)
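The stationary density (3.91) can be checked against a direct simulation of the Cartesian equations (3.84). The short Python sketch below is such a check; the step size and parameter values are assumptions, and the comparison uses the mean amplitude implied by (3.91), namely ε √(π/(4γ)).

import numpy as np

rng = np.random.default_rng(7)
gamma, eps, dt = 1.0, 0.5, 0.005     # parameters are assumptions
E = np.zeros((2, 20000))             # components E1, E2 for 20000 samples
for _ in range(4000):                # Euler-Maruyama steps of (3.84)
    E += -gamma * E * dt + eps * rng.normal(0.0, np.sqrt(dt), E.shape)

alpha = np.hypot(E[0], E[1])         # amplitude alpha = sqrt(E1^2 + E2^2)
# mean amplitude of p(alpha) = N alpha exp(-gamma alpha^2 / eps^2):
print(alpha.mean(), eps * np.sqrt(np.pi / (4.0 * gamma)))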
3.6.3.3 Detailed balance
The stationary solutions of certain Fokker-Planck equations correspond to
the condition of vanishing probability current, and this property suggests that
the condition is a particular version of the physical principle of detailed balance.
In statistical mechanics detailed balance was studied first by Richard
Tolman [79]. A Markov process fulfils detailed balance if in the stationary
state the frequency of every possible transition is balanced by that of the
corresponding reverse transition. A special example showing that detailed balance is
stronger than the stationarity condition is presented in figure 3.12. In an
n-membered cycle stationarity is obtained iff all individual probability currents
are equal, J1 = J2 = · · · = Jn = J, whereas detailed balance requires
J = 0. The local flux for a single reaction step is Ji = ki,i+1 [Ii] − ki+1,i [Ii+1];
the condition of a constant flux, Ji = J, can be fulfilled even in the case of
irreversible reactions, for example kj,i = 0 for all i = 1, 2, . . . , n and
j = i + 1 (mod n).
Detailed balance in Markov processes. In general, the condition of detailed
balance in Markov processes is formulated best by means of time reversal,
visualized as a transformation to reversed variables, xi ⟹ εi xi with
εi = ±1, depending on whether the variable shows even or odd behavior under
time reversal. Detailed balance requires

p(x, t+τ ; z, t) = p(ζ, t+τ ; ξ, t) with ξ = (ε1 x1, ε2 x2, . . . ) and ζ = (ε1 z1, ε2 z2, . . . ) .   (3.92)

By setting τ = 0 in equation (3.92) we find

δ(x − z) p(z) = δ(ξ − ζ) p(ξ) .

The two delta functions are equal as only simultaneous changes of sign are involved,
and hence p(x) = p(ξ) as a consequence of the formulation of detailed
balance. The condition (3.92) can now be rewritten in terms of conditional
probabilities:

p(x, τ |z, 0) p(z) = p(ζ, τ |ξ, 0) p(x) .   (3.92')
Equation (3.92') has consequences for several stationary quantities and functions.
For the mean holds

〈x〉 = 〈ξ〉 ,   (3.93)

and this has the consequence that all odd variables (variables xi with εi = −1)
have zero mean at the stationary state with detailed balance. For the autocorrelation
functions and the spectrum one obtains, with ε = (ε1, . . . , εn)′,

G(τ) = ε G′(τ) ε′ and S(ω) = ε S′(ω) ε′ ,

with the covariance matrix Σ fulfilling Σ ε = ε′ Σ for G(τ) at τ = 0.
A somewhat different situation arises in cases where the vectorial quantity
transforms like an axial vector or pseudovector and represents angular
momentum, like mechanical rotation or magnetic fields. Then there exist
two or more stationary solutions, and the condition (3.92) must be relaxed to

pλ(x, t+τ |z, t) = pελ(ζ, t+τ |ξ, t) ,   (3.92'')

where λ = (λ1, λ2, . . . ) is a vector of quantities that are constant under rotation
and change to (ε1λ1, ε2λ2, . . . ) under time reversal. Crispin Gardiner suggests
calling this property time reversal invariance instead of detailed balance [6,
p.145].
Detailed balance in the differential Chapman-Kolmogorov equation.
The necessary and sufficient conditions for a homogeneous Markov
process to have a stationary state that fulfils detailed balance are:

(i) W(x|z) p(z) = W(ζ|ξ) p(x) ,   (3.94)

(ii) εi Ai(ξ) p(x) = − Ai(x) p(x) + Σ_j ∂/∂xj (Bij(x) p(x)) ,   (3.95)

(iii) εi εj Bij(ξ) = Bij(x) .   (3.96)

The corresponding conditions for the Fokker-Planck equation are obtained
simply by setting the jump probabilities equal to zero: W(x|z) = 0. A
considerable simplification arises for exclusively even variables, because the
conditions simplify to:

(i) W(x|z) p(z) = W(z|x) p(x) ,   (3.94')

(ii) Ai(x) p(x) = (1/2) Σ_j ∂/∂xj (Bij(x) p(x)) ,   (3.95')

(iii) Bij(x) = Bij(x) ,   (3.96')

where the last condition is fulfilled trivially.
We remark that condition (3.95') is identical with the condition of a vanishing
flux at the stationary state, equation (3.88), which was the requirement
for the existence of a potential Φ(x).
Special for the Fokker-Planck equation is the partitioning of the drift term into a
reversible and an irreversible part [80–82]:

Di(x) = (1/2) (Ai(x) + εi Ai(ξ))   irreversible drift ,   (3.97)

Ii(x) = (1/2) (Ai(x) − εi Ai(ξ))   reversible drift .   (3.98)

Making use again of the probability formulated by means of a potential,
p(x) = exp(−Φ(x)), the conditions for detailed balance are of the form

εi εj Bij(ξ) = Bij(x) ,

Di(x) − (1/2) Σ_j ∂Bij(x)/∂xj = − (1/2) Σ_j Bij(x) ∂Φ(x)/∂xj , and

Σ_i (∂Ii(x)/∂xi − Ii(x) ∂Φ(x)/∂xi) = 0 .
Under the assumption that these conditions are fulfilled by the functions
Di(x) and Bij(x), and that the matrix B is invertible, the equation in the middle
can be rewritten in the form

∂Zi/∂xj = ∂Zj/∂xi with Zi = Σ_k B⁻¹_{ik}(x) (2 Dk(x) − Σ_j ∂Bkj(x)/∂xj) ,

and the stationary probability density p(x) fulfilling detailed balance is of
the form

p(x) = exp(−Φ(x)) = exp(∫^{x} dz · Z) ,   (3.99)

and can be calculated explicitly as an integral.
Finally, we mention that the reciprocity relations in linear irreversible thermodynamics,
developed by the Norwegian-American physicist Lars Onsager
and therefore also called Onsager relations, are also a consequence of
detailed balance [83, 84].
3.7 Autocorrelation functions and spectra
Analysis of experimentally recorded or computer-generated sample paths is often
largely facilitated by the usage of additional tools complementing moments
and probability distributions, since they can, in principle, be derived from a
single trajectory. These tools are autocorrelation functions and spectra of
random variables (for an extensive treatment of time series analysis see, for
example, [85]). They provide direct insight into the dynamics of the process
as they deal with relations between sample points collected at different times.
The autocorrelation function of the random variable X(t) is a measure
of the influence the value x recorded at time t has on the measurement of
the same variable at time t + τ:

G(τ) = lim_{t→∞} (1/t) ∫_{0}^{t} dθ x(θ) x(θ + τ) .   (3.100)
It represents the time average, taken over sufficiently long times, of the product
of two values recorded a time span τ apart. The autocorrelation function is of particular
importance in the analysis of experimental data because technical devices
called autocorrelators have been built which sample data and can record
directly the autocorrelation function of a process under investigation.
Another relevant quantity is the spectrum or the spectral density of the
quantity x(t). In order to derive the spectrum, we construct a new variable
y(ω) by means of the transformation y(ω) = ∫_{0}^{t} dθ e^{iωθ} x(θ). The spectrum
is then obtained from y by performing the limit t → ∞:

S(ω) = lim_{t→∞} (1/(2πt)) |y(ω)|² = lim_{t→∞} (1/(2πt)) |∫_{0}^{t} dθ e^{iωθ} x(θ)|² .   (3.101)
The autocorrelation function and the spectrum are closely connected. By
some calculations one finds

S(ω) = lim_{t→∞} ( (1/π) ∫_{0}^{t} cos(ωτ) dτ (1/t) ∫_{0}^{t−τ} x(θ) x(θ + τ) dθ ) .

Under certain assumptions, which ensure the validity of the interchanges of
order, we may take the limit t → ∞ and find

S(ω) = (1/π) ∫_{0}^{∞} cos(ωτ) G(τ) dτ .
This result relates the Fourier transform of the autocorrelation function to
the spectrum and can be cast in an even prettier form by using

G(−τ) = lim_{t→∞} (1/t) ∫_{−τ}^{t−τ} dθ x(θ) x(θ + τ) = G(τ)

to yield the Wiener-Khinchin theorem, named after Norbert Wiener and
the Russian mathematician Aleksandr Khinchin:

S(ω) = (1/2π) ∫_{−∞}^{+∞} e^{−iωτ} G(τ) dτ and G(τ) = ∫_{−∞}^{+∞} e^{iωτ} S(ω) dω .   (3.102)

Spectrum and autocorrelation function are related to each other by the
Fourier transformation and its inversion.
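The theorem is easily verified numerically. For an Ornstein-Uhlenbeck process the autocorrelation function G(τ) = (ε²/2γ) e^{−γ|τ|} is known, and by (3.102) its spectrum is the Lorentzian S(ω) = ε²/(2π(γ² + ω²)); the Python sketch below compares this with the periodogram (3.101) of a simulated path. The discretization and parameter values are assumptions.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
gamma, eps, dt, N = 1.0, 1.0, 0.01, 2**20
phi = np.exp(-gamma * dt)            # exact AR(1) discretization of the OU process
innov = rng.normal(0.0, np.sqrt(eps**2 * (1 - phi**2) / (2 * gamma)), N)
x = lfilter([1.0], [1.0, -phi], innov)

omega = 2.0 * np.pi * np.fft.rfftfreq(N, dt)                # angular frequencies
S_est = np.abs(np.fft.rfft(x))**2 * dt / (2.0 * np.pi * N)  # periodogram (3.101)
S_lorentz = eps**2 / (2.0 * np.pi * (gamma**2 + omega**2))  # transform of G(tau)
print(S_est[1:2000].mean(), S_lorentz[1:2000].mean())       # rough agreement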
Equation (3.102) allows for a straightforward proof that the Wiener process
W(t) gives rise to white noise (subsection 3.4.3): its formal derivative
Ẇ(t) = dW(t)/dt is a zero-mean process with delta-correlated values. Let w be
a zero-mean random vector with the identity matrix as (auto)covariance or
autocorrelation matrix:

E(w) = µ = 0 and Cov(w, w) = E(w w′) = σ² I .

Then Ẇ(t) fulfils the relations

µẆ(t) = E(Ẇ(t)) = 0 and GẆ(τ) = E(Ẇ(t) Ẇ(t+τ)) = δ(τ) ,

defining it as a zero-mean process with infinite power at zero time shift. For
its spectral density we obtain:

SẆ(ω) = (1/2π) ∫_{−∞}^{+∞} e^{−iωτ} δ(τ) dτ = 1/2π .   (3.103)
This spectral density is a constant, and hence all frequencies in the noise are
represented with equal weight. All colors mixed with equal weight yield white
light, and this property of visible light gave white noise its name; in colored
noise the noise frequencies do not follow a uniform distribution.
The time average of a signal as expressed by an autocorrelation function
is complemented by the ensemble average, 〈·〉, or, expressed by the expectation
value of the corresponding random variable, E(·), which implies an
(infinite) number of repeats of the same measurement. In case the assumption
of ergodic behavior is true, the time average is equal to the ensemble
average. Thus we find for a fluctuating quantity X(t) in the ergodic limit

E(X(t) X(t+τ)) = 〈x(t) x(t+τ)〉 = G(τ) .
It is straightforward to consider dual quantities which are related by Fourier
transformation:

x(t) = (1/2π) ∫ dω c(ω) e^{iωt} and c(ω) = ∫ dt x(t) e^{−iωt} .
We use this relation to derive several important results. Measurements refer
to real quantities x(t), and this implies c(ω) = c*(−ω). From the condition
of stationarity, 〈x(t) x(t′)〉 = f(t − t′) with no other dependence on t,
follows

〈c(ω) c*(ω′)〉 = (1/(2π)²) ∫∫ dt dt′ e^{−iωt + iω′t′} 〈x(t) x(t′)〉 =

= (δ(ω − ω′)/2π) ∫ dτ e^{iωτ} G(τ) = δ(ω − ω′) S(ω) .

The last expression not only relates the mean square 〈|c(ω)|²〉 to the spectrum
of the random variable, it also shows that stationarity alone implies
that c(ω) and c*(ω′) are uncorrelated for ω ≠ ω′.
4. Applications in chemistry
The chapter starts with a discussion of stochasticity in chemical reactions and
then presents the chemical master equation as an appropriate tool for modeling
chemical reactions. Then the birth-and-death process is introduced
as a well justified and useful approximation to general jump processes. The
equilibration of particle numbers or concentrations in the flow reactor is used
as a simple example to demonstrate the analysis of a birth-and-death process.
Next follow discussions of mono- and bimolecular chemical reactions
that can still be solved exactly by means of time dependent probability generating
functions. The last sections handle the transition from microscopic
to macroscopic systems by means of the size expansion technique and the
numerical approach to stochastic chemical kinetics.
4.1 Stochasticity in chemical reactions
Stochastic chemical kinetics is based on the assumption that knowledge of
the transformation of molecules in chemical reactions is not accessible in full
detail, or, if it were, the information would be overwhelming and would obscure
essential features. Thus it is assumed that chemical reactions have a
probabilistic element and can be modeled properly by means of stochastic
processes. The random processes are caused by thermal noise as well as by
the random encounter of molecules in collisions. Fluctuations, therefore, play an
important role, and they are responsible for the limitations in the reproduction
of experiments. We shall model chemical reactions as Markov processes
and analyze the corresponding master and Fokker-Planck equations. As an
appropriate criterion for classification we shall use the molecularity of reactions1
and the complexity of the dynamical behavior.
1The molecularity of a reaction is the number of molecules that are involved in the
reaction, for example two in a reactive collision between molecules or one in a conformational
change.
The stochastic approach to chemical reaction kinetics has a long tradition and
started in the late fifties from two different initiatives: (i) approximation of
the highly complex vibrational relaxation [86–88] and its application to chemical
reactions, and (ii) direct simulation of chemical reactions as stochastic
processes [89–91]. The latter approach is in the spirit of the initially mentioned
limited information on details and has been taken up and developed
further by several groups [92–96]. The major part of these works has been
summarized in an early review [97], which is particularly recommended here
for further reading. Bartholomay's work is also highly relevant for biological
models of evolution, because he studied reproduction as a linear birth-and-death
process. Exact solutions to master or Fokker-Planck equations can be
found only for particularly simple special cases. Often approximations were
used or the analysis was restricted to the expectation values of variables.
Later on, computer assisted approximation techniques and numerical simulation
methods were developed which allow for handling stochastic phenomena
in chemical kinetics on a more general level [64, 98].
4.1.1 Elementary steps of chemical reactions
Chemical reactions are defined by mechanisms, which can be decomposed
into elementary processes. An elementary process describes the transforma-
tion of one or two molecules into products. Elementary processes involving
three or more molecules are unlikely to happen in the vapor phase or in
dilute solutions, because trimolecular encounters are very rare under these
conditions. Therefore, elementary steps of three molecules are not consid-
ered in conventional chemical kinetics.2 Two additional events which occur
in open systems, for example in flow reactors, are the creation of a molecule
through influx or the annihilation of a molecule through outflux. Common
elementary steps are:
2Exceptions are reactions involving surfaces as third partner, which are important in
gas phase kinetics, and biochemical reactions involving macromolecules.
? −−−→ A (4.1a)
A −−−→ (4.1b)
A −−−→ B (4.1c)
A −−−→ 2B (4.1d)
A −−−→ B + C (4.1e)
A + B −−−→ C (4.1f)
A + B −−−→ 2A (4.1g)
A + B −−−→ C + B (4.1h)
A + B −−−→ C + D (4.1i)
2A −−−→ B (4.1j)
2A −−−→ 2B (4.1k)
2A −−−→ B + C (4.1l)
Depending on the number of reacting molecules, the elementary processes are
called mono-, bi-, or trimolecular. Tri- and higher molecular elementary
steps are excluded in conventional chemical reaction kinetics, as said above.
The example shown in equation (4.1g) is an autocatalytic elementary
process. In practice autocatalytic reactions commonly involve many elementary
steps and thus are the results of complex reaction mechanisms. In order
to study basic features of autocatalysis or chemical self-enhancement, single
step autocatalysis is often used as a model system. One particular trimolecular
autocatalytic process,
2A + B −−−→ 3A , (4.2)
became very famous [99] despite its trimolecular nature, which makes it unlikely
to occur in real systems. The elementary step (4.2) is the essential step
in the so-called Brusselator model; it can be straightforwardly addressed by
analytical mathematical techniques, and it gives rise to complex dynamical
phenomena in space and time which are otherwise not observed in chemical
reaction systems. Among other features such special phenomena are: (i)
multiple stationary states, (ii) chemical hysteresis, (iii) oscillations in concentrations,
(iv) deterministic chaos, and (v) spontaneous formation of spatial
structures. The last example is known as the Turing instability [100] and is frequently
used as a model for pattern formation or morphogenesis in biology
[101].
4.1.2 The master equation in chemistry
Provided particle numbers are assigned to the variables describing the progress
of chemical reactions, the stochastic variable N(t) with the probability Pn(t) =
P(N(t) = n) can take only nonnegative integer values, n ∈ N0. In addition
we introduce a few simplifications and some conventions in our notation.
We shall use the forward equation unless stated differently and assume an
infinitely sharp initial density: P(n, 0|n0, 0) = δn,n0 with n0 = n(0). Then
we can simplify the full notation by P(n, t|n0, 0) ⇒ Pn(t) with the implicit
assumption of the initial condition specified above. Other sharp initial values
or extended initial probability densities will be given explicitly. The
expectation value of the stochastic variable N(t) will be denoted by

E(N(t)) = 〈n(t)〉 = Σ_{n=0}^{∞} n · Pn(t) .   (4.3)

Its stationary value, provided it exists, will be written as

n̄ = lim_{t→∞} 〈n(t)〉 .   (4.4)

Almost always n̄ will be identical with the long time value of the corresponding
deterministic variable. The running index of integers will be denoted by
either m or n′. Then the chemical master equation is of the form

∂Pn(t)/∂t = Σ_m (W(n|m) Pm(t) − W(m|n) Pn(t)) .   (4.5)

The transition probabilities may be time dependent in certain cases, W(n|m, t);
most frequently we shall assume that they are not. The probabilities W(n|m)
can be understood as the elements of a transition matrix W = {Wnm ; n, m ∈
N0}. Diagonal elements Wnn cancel in the master equation (4.5) and hence
need not be defined. According to their nature as transition probabilities,
all Wnm with n ≠ m have to be nonnegative. We may define, nevertheless,
Σ_m Wmn = 0, which implies Wnn = − Σ_{m≠n} Wmn, and then insertion into
(4.5) leads to a compact form of the master equation

∂Pn(t)/∂t = Σ_m Wnm Pm(t) .   (4.5')

Introducing vector notation, P(t)′ = (P1(t), . . . , Pn(t), . . .), we obtain

∂P(t)/∂t = W · P(t) .   (4.5'')
With the initial condition Pn(0) = δn,n0 stated above we can solve equation
(4.5'') formally for each n0 and obtain

P(n, t|n0, 0) = (exp(W t))_{n,n0} ,

where the element (n, n0) of the matrix exp(W t) is the probability to have
n particles at time t, N(t) = n, when there were n0 particles at time t0 = 0.
The evaluation of this equation boils down to diagonalizing the matrix W, which
can be done analytically in rather few cases only.
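For small transition matrices the formal solution is nevertheless directly computable by numerical matrix exponentiation. The Python sketch below does this for the irreversible monomolecular reaction A → B treated in section 4.2 (a pure death process with w−(n) = kn); the parameter values are assumptions.

import numpy as np
from scipy.linalg import expm

k, n0 = 1.0, 10                      # rate constant and initial particle number
n = np.arange(n0 + 1)
W = np.zeros((n0 + 1, n0 + 1))
W[n[:-1], n[1:]] = k * n[1:]         # W(n|n+1) = k(n+1): one death, n+1 -> n
np.fill_diagonal(W, -k * n)          # diagonal makes every column sum to zero

P0 = np.zeros(n0 + 1)
P0[n0] = 1.0                         # sharp initial condition P_n(0) = delta(n, n0)
Pt = expm(W * 0.5) @ P0              # P(n, t|n0, 0) = (exp(W t))_{n, n0} at t = 0.5
print(Pt @ n, n0 * np.exp(-k * 0.5)) # expectation value n0 exp(-kt), cf. (4.25)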
For the forthcoming considerations we shall derive the so-called jump
moments

αp(n) = Σ_{m=0}^{∞} (m − n)^p W(m|n) ; p = 1, 2, . . . .   (4.6)

The usefulness of the first two jump moments (p = 1, 2) is easily demonstrated:
we multiply equation (4.5) by n and obtain through summation

d〈n〉/dt = Σ_{n=0}^{∞} Σ_{m=0}^{∞} (m W(n|m) Pm(t) − n W(m|n) Pn(t)) =

= Σ_{n=0}^{∞} Σ_{m=0}^{∞} (m − n) W(m|n) Pn(t) = 〈α1(n)〉 .
Only in case α1(n) is a linear function of n may the formation of moment and
expectation value be interchanged, and we have the simple equation

d〈n〉/dt = α1(〈n〉) .

Otherwise this is only a zeroth order approximation, which can be improved
through expansion of α1(n) in (n − 〈n〉). Breaking off after the second derivative
yields

d〈n〉/dt = α1(〈n〉) + (1/2) σn² d²α1(〈n〉)/dn² .   (4.6')

In order to obtain a consistent approximation one may apply a similar approximation
to the time development of the variance and finds [98]:

dσn²/dt = α2(〈n〉) + 2 σn² dα1(〈n〉)/dn .   (4.6'')
These expressions will be simplified in case of the forthcoming examples. We
proceed now by discussing first some important special cases where exact
solutions are derivable and then present a general and systematic approx-
imation scheme which allows to solve the master equation for sufficiently
large systems [64, 98]. This scheme is based on a power series expansion in
some extensive physical parameter Ω, for example the size of the system or
the total number of particles. It will turn out that Ω^{−1/2} is the appropri-
ate quantity for the expansion and thus the approximation is based on the
smallness of fluctuations. This implies that we shall encounter the limits
of reliability of the technique at small population sizes or in situations of
self-enhancing fluctuations, for example at instabilities or phase transitions.
The chemical master equation has been shown to be based on a rigorous
microscopic concept of chemical reactions in the vapor phase within the
frame of classical collision theory [3]. The two general requirements that
have to be fulfilled are: (i) a homogeneous mixture as it is assumed to exist
through well stirring and (ii) thermal equilibrium implying that the velocities
of molecules follow a Maxwell-Boltzmann distribution. Daniel Gillespie's
approach focusses on chemical reactions rather than molecular species
approach focusses on chemical reactions rather than molecular species and
is well suited to handle reaction networks. In addition the algorithm can be
easily implemented for computer simulation. We shall discuss the Gillespie
formalism together with the computer program in section 4.4.
Figure 4.1: Sketch of the transition probabilities in master equations. In
the general jump process steps of any size are admitted (upper drawing) whereas
in birth-and-death processes all jumps have the same size. The simplest and most
common case is dealing with the condition that the particles are born and die one
at a time (lower drawing).
4.1.3 Birth-and-death master equations
The concept of birth-and-death processes has been created in biology and is
based on the assumption that only a finite number of individuals are produced
(born) or destroyed (die) in a single event. The simplest case, and the only
one we shall discuss here, occurs when birth and death are confined to single
individuals of only one species. These processes are commonly denoted as
one step birth-and-death processes.3 In figure 4.1 the transitions in a general
jump process and in a birth-and-death process are illustrated. Restriction to
single events is tantamount to the choice of a sufficiently small time interval
of recording, ∆t, such that the simultaneous occurrence of two events has
a probability of measure zero (see also section 4.4). This small time step is
often called the blind interval, because no information on things happening
within ∆t is available.

3In addition, one commonly distinguishes between birth-and-death processes in one
variable and in many variables [64]. We shall restrict the analysis to the simpler single
variable case here.
Then the transition probabilities can be written in the form

W(n|m, t) = w+(m) δn,m+1 + w−(m) δn,m−1 ,   (4.7)

since we are dealing with only two allowed processes:

n → n+1 with w+(n) as transition probability per unit time and
n → n−1 with w−(n) as transition probability per unit time.

In subsection 3.4.1 we discussed the Poisson process, which can be understood
as a birth-and-death process on n ∈ N0 with zero death rate. Modeling of
chemical reactions by birth-and-death processes turns out to be a very useful
approach for reaction mechanisms which can be described by changes in a
single variable.

The stochastic process can now be described by a birth-and-death master
equation

∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) .   (4.8)
There is no general technique that allows to find the time-dependent solutions
of equation (4.8), and therefore we shall present some special cases later on.
In subsection 5.1.2 we shall also give a detailed overview of exactly solvable
single step birth-and-death processes. It is, however, possible to analyze the
stationary case in full generality.

Provided a stationary solution of equation (4.8), lim_{t→∞} Pn(t) = P̄n, exists,
we can compute it in a straightforward manner. It is useful to define a
probability current J(n) for the n-th step in the series,

Particle number: 0, 1, . . . , n−1, n, n+1, . . .
Reaction step:      1, 2, . . . , n−1, n, n+1, . . .
which is of the form

J(n) = w−(n) Pn − w+(n−1) Pn−1 .   (4.9)
Now the conditions for the stationary solution are given by

∂Pn(t)/∂t = 0 = J(n+1) − J(n) .   (4.10)

Restriction to nonnegative particle numbers, n ∈ N0, implies w−(0) = 0 and
Pn(t) = 0 for n < 0, which in turn leads to J(0) = 0.
Now we sum the vanishing flow terms according to equation (4.10) and
obtain:

0 = Σ_{z=0}^{n−1} (J(z+1) − J(z)) = J(n) − J(0) .

Thus we find J(n) = 0 for arbitrary n, which leads to

P̄n = (w+(n−1)/w−(n)) P̄n−1 and finally P̄n = P̄0 Π_{z=1}^{n} w+(z−1)/w−(z) .

The condition J(n) = 0 for every reaction step is known in chemical kinetics
as the principle of detailed balance, which was formulated first
by the American mathematical physicist Richard Tolman [79] (see also
section 3.6.3.3 and [6, pp.142-158]).
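The product formula is immediately useful in computations. The Python sketch below evaluates the stationary distribution for the rates of the flow reactor treated in the next subsection, w+(n) = r n̄ and w−(n) = r n, where the factor r cancels; the value of n̄ and the truncation are assumptions.

import numpy as np

nbar, nmax = 20.0, 200               # stationary value and truncation (assumptions)
w_plus = lambda n: nbar              # birth rate: constant influx (r cancels)
w_minus = lambda n: float(n)         # death rate proportional to n

P = np.ones(nmax + 1)
for z in range(1, nmax + 1):
    P[z] = P[z - 1] * w_plus(z - 1) / w_minus(z)
P /= P.sum()                         # normalization fixes P_0

print(P @ np.arange(nmax + 1))       # approx. nbar: the Poissonian of section 4.1.4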
The macroscopic rate equations are readily derived from the master equation
through computation of the expectation value:

∂E(n(t))/∂t = ∂/∂t (Σ_{n=0}^{∞} n Pn(t)) =

= Σ_{n=0}^{∞} n (w+(n−1) Pn−1(t) − w+(n) Pn(t)) + Σ_{n=0}^{∞} n (w−(n+1) Pn+1(t) − w−(n) Pn(t)) =

= Σ_{n=0}^{∞} ((n+1) w+(n) − n w+(n) + (n−1) w−(n) − n w−(n)) Pn(t) =

= Σ_{n=0}^{∞} w+(n) Pn(t) − Σ_{n=0}^{∞} w−(n) Pn(t) = E(w+(n)) − E(w−(n)) .
Neglect of fluctuations yields the deterministic rate equation of the birth-and-death
process

d〈n〉/dt = w+(〈n〉) − w−(〈n〉) .   (4.11)

The condition of stationarity yields n̄ = lim_{t→∞} 〈n(t)〉, for which w+(n̄) =
w−(n̄) holds. Compared to this result we note that the maximum value of the
stationary probability density, max P̄n, n ∈ N0, is defined by P̄n+1 − P̄n ≈
−(P̄n − P̄n−1) or P̄n+1 ≈ P̄n−1, which coincides with the deterministic value
for large n.
4.1.4 The flow reactor
The flow reactor is introduced as an experimental device that allows for
investigations of systems off thermodynamic equilibrium. The establishment
of a stationary state or flow equilibrium in a flow reactor (figure 4.2) is a
suitable case study for the illustration of the search for a solution of a birth-and-death
master equation. At the same time the non-reactive flow of a single
compound represents the simplest conceivable process in such a reactor. The
stock solution contains A at the concentration [A]influx = a0 = ā [mole·l⁻¹].
The influx concentration a0 is equal to the stationary concentration ā, because
no reaction is assumed to take place in the reactor. The flow is measured by
means of the flow rate r [l·sec⁻¹]: this implies an influx of a0 · r [mole·sec⁻¹]
of A into the reactor, instantaneous mixing with the content of the reactor,
and an outflux of the mixture from the reactor at the same flow rate r.4 The
reactor has a volume V [l], and thus a volume element dV has a mean residence
time of τR = V · r⁻¹ [sec] in the reactor.

In- and outflux of compound A into and from the reactor are modeled by
two formal elementary steps or pseudo-reactions:

? −−−→ A
A −−−→ .   (4.12)

4The assumption of equal influx and outflux rates is required because we are dealing
with a flow reactor of constant volume V (CSTR, figure 4.2).
Figure 4.2: The flow reactor. The reactor shown in the sketch is a device of
chemical reaction kinetics which is used to carry out chemical reactions in an open
system. The stock solution contains materials, for example A at the concentration
[A]influx = a0, which are usually consumed during the reaction to be studied. The
reaction mixture is stirred in order to guarantee a spatially homogeneous reaction
medium. Constant volume implies an outflux from the reactor that compensates
precisely the influx. The flow rate r is equivalent to the inverse mean residence
time of the solution in the reactor multiplied by the reactor volume, τR⁻¹ · V = r. The
reactor shown here is commonly called a continuously stirred tank reactor (CSTR).
In chemical kinetics the differential equations are almost always formulated in
molecular concentrations. For the stochastic treatment, however, we replace
concentrations by the numbers of particles, n = a · V · NL with n ∈ N0 and
NL being Loschmidt's or Avogadro's number,5 the number of particles per
mole.
The particle number of A in the reactor is a stochastic variable N(t) with the
probability Pn(t) = P(N(t) = n). The time derivative of the probability
distribution is described by means of the master equation

∂Pn(t)/∂t = r (n̄ Pn−1(t) + (n+1) Pn+1(t) − (n + n̄) Pn(t)) ; n ∈ N0 .   (4.13)

Equation (4.13) describes a birth-and-death process with w+(n) = r n̄ and
w−(n) = r n, where n̄ = ā · V · NL is the stationary particle number. Thus we
have a constant birth rate and a death rate which
is proportional to n. Solutions of the master equation can be found in text
books listing stochastic processes with known solutions, for example [19].
Here we shall derive the solution by means of probability generating functions,
as introduced in subsection 2.7.1, equation (2.70), in order to illustrate one
particularly powerful approach:

g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n .   (2.70')
Sometimes the initial state is included in the notation: gn0(s, t) implies
Pn(0) = δn,n0. Partial derivatives with respect to time t and the dummy
variable s are readily computed:

∂g(s, t)/∂t = Σ_{n=0}^{∞} (∂Pn(t)/∂t) s^n = r Σ_{n=0}^{∞} (n̄ Pn−1(t) + (n+1) Pn+1(t) − (n + n̄) Pn(t)) s^n and

∂g(s, t)/∂s = Σ_{n=0}^{∞} n Pn(t) s^{n−1} .
5As a matter of fact there is a difference between Loschmidt's and Avogadro's constant
that is often ignored in the literature: Avogadro's constant, NL = 6.02214179 × 10²³ mole⁻¹,
refers to one mole of substance, whereas Loschmidt's constant, n0 = 2.6867774 × 10²⁵ m⁻³,
counts the number of particles in one cubic meter of gas under normal conditions. The
conversion factor between the two constants is the molar volume of an ideal gas, which
amounts to 22.414 dm³·mole⁻¹.
Proper collection of terms and rearrangement of summations (taking into
account w−(0) = 0) yields

∂g(s, t)/∂t = r n̄ Σ_{n=0}^{∞} (Pn−1(t) − Pn(t)) s^n + r Σ_{n=0}^{∞} ((n+1) Pn+1(t) − n Pn(t)) s^n .

Evaluation of the four infinite sums,

Σ_{n=0}^{∞} Pn−1(t) s^n = s Σ_{n=0}^{∞} Pn−1(t) s^{n−1} = s g(s, t) ,

Σ_{n=0}^{∞} Pn(t) s^n = g(s, t) ,

Σ_{n=0}^{∞} (n+1) Pn+1(t) s^n = ∂g(s, t)/∂s , and

Σ_{n=0}^{∞} n Pn(t) s^n = s Σ_{n=0}^{∞} n Pn(t) s^{n−1} = s ∂g(s, t)/∂s ,

and regrouping of terms yields a linear partial differential equation of first
order:

∂g(s, t)/∂t = r (n̄ (s−1) g(s, t) − (s−1) ∂g(s, t)/∂s) .   (4.14)
The partial differential equation (PDE) is solved through consecutive
substitutions,

φ(s, t) = g(s, t) exp(−n̄ s) ⟶ ∂φ(s, t)/∂t = −r (s−1) ∂φ(s, t)/∂s , and

s − 1 = e^ρ and ψ(ρ, t) = φ(s, t) ⟶ ∂ψ(ρ, t)/∂t + r ∂ψ(ρ, t)/∂ρ = 0 .

Computation of the characteristic manifold is equivalent to solving the
ordinary differential equation (ODE) r dt = dρ. We find ρ − rt = C,
where C is the integration constant. The general solution of the PDE is an
arbitrary function of the combined variable ρ − rt:

ψ(ρ, t) = f(exp(ρ − rt)) · e^{−n̄} and φ(s, t) = f((s−1) e^{−rt}) · e^{−n̄} ,

and the probability generating function is

g(s, t) = f((s−1) e^{−rt}) · exp((s−1) n̄) .

Normalization of probabilities (for s = 1) requires g(1, t) = 1 and hence
f(0) = 1. The initial condition as expressed by the conditional probability
P(n, 0|n0, 0) = Pn(0) = δn,n0 leads to the final expression:

g(s, 0) = f(s−1) · exp((s−1) n̄) = s^{n0} ,

f(ζ) = (ζ + 1)^{n0} · exp(−ζ n̄) with ζ = (s−1) e^{−rt} ,

g(s, t) = (1 + (s−1) e^{−rt})^{n0} · exp(−n̄ (s−1) e^{−rt}) · exp(n̄ (s−1)) =

= (1 + (s−1) e^{−rt})^{n0} · exp(n̄ (s−1)(1 − e^{−rt})) .   (4.15)
From the generating function we compute, with somewhat tedious but straightforward
algebra, the probability distribution

Pn(t) = Σ_{k=0}^{min{n0,n}} ( (n0 choose k) n̄^{n−k} e^{−krt} (1 − e^{−rt})^{n0+n−2k} / (n−k)! ) e^{−n̄ (1−e^{−rt})}   (4.16)

with n, n0 ∈ N0. In the limit t → ∞ we obtain a non-vanishing contribution
to the stationary probability only from the first term, k = 0, and
find

lim_{t→∞} Pn(t) = (n̄^n / n!) exp(−n̄) .

This is a Poissonian distribution with parameter and expectation value α =
n̄. The Poissonian distribution also has a variance which is numerically
identical with the expectation value, σ²(NA) = E(NA) = n̄, and thus the
distribution of particle numbers fulfils the √N-law at the stationary state.
The time dependent probability distribution allows to compute the expectation
value and the variance of the particle number as functions of time:

E(N(t)) = n̄ + (n0 − n̄) · e^{−rt} ,
σ²(N(t)) = (n̄ + n0 · e^{−rt}) · (1 − e^{−rt}) .   (4.17)

As expected, the expectation value coincides with the solution
curve of the deterministic differential equation

dn/dt = w+(n) − w−(n) = r (n̄ − n) ,   (4.18)

which is of the form

n(t) = n̄ + (n0 − n̄) · e^{−rt} .   (4.18')
Figure 4.3: Establishment of the flow equilibrium in the CSTR. The
upper part shows the evolution of the probability density, Pn(t), of the number of
molecules of a compound A which flows through a reactor of the type illustrated in
figure 4.2. The initially infinitely sharp density becomes broader with time until the
variance reaches its maximum and then sharpens again until it reaches stationarity.
The stationary density is a Poissonian distribution with expectation value and
variance E(N) = σ²(N) = n̄. In the lower part we show the expectation value
E(N(t)) within the confidence interval E ± σ. Parameters used: n̄ = 20, n0 = 200,
and V = 1; sampling times (upper part): τ = r · t = 0 (black), 0.05 (green), 0.2
(blue), 0.5 (violet), 1 (pink), and ∞ (red).
Since we start from sharp initial densities, variance and standard deviation are
zero at time t = 0. The qualitative time dependence of σ²(NA(t)), however,
depends on the sign of (n0 − n̄):

(i) For n0 ≤ n̄ the standard deviation increases monotonously until it
reaches the value √n̄ in the limit t → ∞, and

(ii) for n0 > n̄ the standard deviation increases until it passes through a
maximum at

t(σmax) = (1/r) (ln 2 + ln n0 − ln(n0 − n̄))

and approaches the long-time value √n̄ from above.

In figure 4.3 we show an example for the evolution of the probability density
(4.16). In addition, the figure contains a plot of the expectation value
E(N(t)) inside the band E − σ < E < E + σ. In the case of a normally
distributed stochastic variable we find 68.3% of all values within this confidence
interval. In the interval E − 2σ < E < E + 2σ we would find even 95.4% of
all stochastic trajectories (subsection 2.7.6).
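The analytic results can be reproduced by a stochastic simulation in the spirit of Gillespie's algorithm (section 4.4). The minimal Python sketch below draws exponential waiting times from the total rate of equation (4.13), weights each state by its sojourn time, and recovers the stationary mean and variance n̄; the parameter values mirror figure 4.3, and the transient cutoff is an assumption.

import numpy as np

rng = np.random.default_rng(5)
r, nbar, n = 1.0, 20.0, 200          # flow rate, stationary value, initial n0
t, t_end = 0.0, 1000.0
weights = np.zeros(1000)             # sojourn time accumulated per state
while t < t_end:
    a_plus, a_minus = r * nbar, r * n   # w+(n) and w-(n) of equation (4.13)
    a_tot = a_plus + a_minus
    tau = rng.exponential(1.0 / a_tot)  # waiting time to the next event
    if t > 50.0:                        # discard the initial transient
        weights[n] += tau
    t += tau
    n += 1 if rng.random() < a_plus / a_tot else -1

p = weights / weights.sum()
m = p @ np.arange(1000)
print(m, p @ np.arange(1000)**2 - m**2)  # mean and variance both approx. nbar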
4.2 Classes of chemical reactions
In this section we shall present exact solutions of the chemical master equation
for two classes of chemical reactions: monomolecular and bimolecular.
Molecularity of a reaction refers to the number of molecules in the reaction
complex, and in most cases the molecularity is also reflected by the chemical
rate law of reaction kinetics in the form of the reaction order. In particular, we
distinguish first order and second order kinetics, which are typically observed
with monomolecular and bimolecular reactions, respectively.
4.2.1 Monomolecular chemical reactions
The reversible mono- or unimolecular chemical reaction can be split into two
irreversible elementary reactions,

A −−k1−→ B   (4.19a)
A ←−k2−− B ,   (4.19b)

wherein the reaction rate parameters, k1 and k2, are called reaction rate
constants. The reaction rate parameters depend on temperature, pressure,
and other environmental factors. At equilibrium the rate of the forward
reaction (4.19a) is precisely compensated by the rate of the reverse reaction
(4.19b), k1 · [A] = k2 · [B], leading to the condition for the thermodynamic
equilibrium:

K = k1/k2 = [B]/[A] .   (4.20)

The parameter K is called the equilibrium constant; like the reaction rate
parameters, it depends on temperature, pressure, and other environmental
factors. In an isolated or in a closed system we have a conservation law:

(NA(t) + NB(t))/(V · NL) = [A] + [B] = c(t) = c0 = c̄ = constant ,   (4.21)

with c being the total concentration and c̄ the corresponding equilibrium
value, lim_{t→∞} c(t) = c̄.
4.2.1.1 Irreversible monomolecular chemical reaction
We start by discussing the simpler irreversible case,

A −−k−→ B ,   (4.19a')

which can be modeled and analyzed in full analogy to the previous case of
the flow equilibrium. Although we are dealing with two molecular species,
A and B, the process is described by a single stochastic variable, NA(t),
since because of the conservation relation (4.21) we have NB(t) = n0 − NA(t),
with n0 = n(0) being the number of A molecules initially present. If
a sufficiently small time interval is applied, the irreversible monomolecular
reaction is modeled by a single step birth-and-death process with w+(n) = 0
and w−(n) = kn.6 The probability density is defined by Pn(t) = P(NA(t) = n)
and its time dependence obeys

∂Pn(t)/∂t = k (n+1) Pn+1(t) − k n Pn(t) .   (4.22)

The master equation (4.22) is solved again by means of the probability generating
function,

g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n ; |s| ≤ 1 ,

which is determined by the PDE

∂g(s, t)/∂t − k (1−s) ∂g(s, t)/∂s = 0 .

The computation of the characteristic manifold of this PDE is tantamount
to solving the ODE

k dt = ds/(s−1) ⟹ (s−1) e^{−kt} = constant .

With φ(s, t) = (s−1) exp(−kt) + γ, g(s, t) = f(φ), the normalization condition
g(1, t) = 1, and the boundary condition g(s, 0) = f(φ)|_{t=0} = s^{n0} we
find

g(s, t) = (s · e^{−kt} + 1 − e^{−kt})^{n0} .   (4.23)
6We remark that w−(0) = 0 and w+(0) = 0 are fulfilled, which are the conditions for
a natural absorbing barrier at n = 0 (section 5.1.2).
This expression is easily expanded in binomial form, ordered with respect
to increasing powers of s,

g(s, t) = (1 − e^{−kt})^{n0} + (n0 choose 1) s e^{−kt} (1 − e^{−kt})^{n0−1} + (n0 choose 2) s² e^{−2kt} (1 − e^{−kt})^{n0−2} +

+ . . . + (n0 choose n0−1) s^{n0−1} e^{−(n0−1)kt} (1 − e^{−kt}) + s^{n0} e^{−n0kt} .

Comparison of coefficients yields the time dependent probability density

Pn(t) = (n0 choose n) (e^{−kt})^n (1 − e^{−kt})^{n0−n} .   (4.24)
It is straightforward to compute the expectation value of the stochastic variable
NA, which coincides again with the deterministic solution, and its variance:

E(NA(t)) = n0 e^{−kt} ,
σ²(NA(t)) = n0 e^{−kt} (1 − e^{−kt}) .   (4.25)

The half-life of a population of n0 particles,

t1/2 : E(NA(t1/2)) = n0/2 = n0 · e^{−k t1/2} ⟹ t1/2 = (ln 2)/k ,

is the time of maximum variance and standard deviation, dσ²/dt = 0 and dσ/dt =
0, respectively. An example of the time course of the probability density of
an irreversible monomolecular reaction is shown in figure 4.4.
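Since (4.24) is a binomial distribution with single-molecule survival probability e^{−kt}, the moments (4.25) can be cross-checked in a few lines of Python; the parameter values below are assumptions.

import numpy as np
from scipy.stats import binom

k, n0, t = 1.0, 200, 0.5             # parameter values are assumptions
p = np.exp(-k * t)                   # survival probability of a single molecule
print(binom.mean(n0, p), n0 * np.exp(-k * t))                          # E, (4.25)
print(binom.var(n0, p), n0 * np.exp(-k * t) * (1.0 - np.exp(-k * t)))  # sigma^2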
4.2.1.2 Reversible monomolecular chemical reaction
The analysis of the irreversible reaction is readily extended to the reversible
case (4.19), where we are dealing with a one step birth-and-death process.
Again we are dealing with a closed system; the conservation relation NA(t) +
NB(t) = n0 (with n0 being again the number of molecules of class A initially
present, Pn(0) = δn,n0) holds, and the transition probabilities are given by
w+(n) = k2(n0 − n) and w−(n) = k1 n.7 The master equation is now of the
7Here we note the existence of barriers at n = 0 and n = n0, which are characterized
by w−(0) = 0, w+(0) = k2n0 > 0 and w+(n0) = 0, w−(n0) = k1n0 > 0, respectively.
These equations fulfil the conditions for reflecting barriers (section 5.1.2).
Figure 4.4: Probability density of an irreversible monomolecular reaction.
The three plots show the evolution of the probability
density, Pn(t), of the number of molecules of a compound A which undergoes the
reaction A → B. The initially infinitely sharp density Pn(0) = δn,n0 becomes broader
with time until the variance reaches its maximum at time t = t1/2 = ln 2/k and
then sharpens again until it approaches full transformation, lim_{t→∞} Pn(t) = δn,0.
Also shown are the expectation value E(NA(t)) and the confidence intervals
E ± σ (68.3%, red) and E ± 2σ (95.4%, blue), with σ²(NA(t)) being the variance.
Parameters used: n0 = 200, 2000, and 20 000; k = 1 [t⁻¹]; sampling times: 0 (black),
0.01 (green), 0.1 (blue), 0.2 (violet), 0.3 (magenta), 0.5 (pink), 0.75 (red), 1
(pink), 1.5 (magenta), 2 (violet), 3 (blue), and 5 (green).
form

∂Pn(t)/∂t = k2 (n0 − n + 1) Pn−1(t) + k1 (n+1) Pn+1(t) − (k1 n + k2 (n0 − n)) Pn(t) .   (4.26)

Making use of the probability generating function g(s, t) we derive the PDE

∂g(s, t)/∂t = (k1 + (k2 − k1) s − k2 s²) ∂g(s, t)/∂s + n0 k2 (s − 1) g(s, t) .
The solutions of the PDE are simpler when expressed in terms of the parameter
combinations κ = k1 + k2 and λ = k1/k2, and the function
ω(t) = λ exp(−κt) + 1:

g(s, t) = (1 + (s−1) (1 + λ e^{−κt})/(1 + λ))^{n0} =

= ( (λ (1 − e^{−κt}) + s (λ e^{−κt} + 1)) / (1 + λ) )^{n0} =

= Σ_{n=0}^{n0} (n0 choose n) (λ e^{−κt} + 1)^n (λ (1 − e^{−κt}))^{n0−n} s^n / (1 + λ)^{n0} .

The probability density for the reversible reaction is then obtained as

Pn(t) = (n0 choose n) (1/(1 + λ)^{n0}) (λ e^{−κt} + 1)^n (λ (1 − e^{−κt}))^{n0−n} .   (4.27)
Expectation value and variance of the numbers of molecules are readily computed
(with ω(t) = λ exp(−κt) + 1):

E(NA(t)) = n0 ω(t)/(1 + λ) ,
σ²(NA(t)) = (n0 ω(t)/(1 + λ)) (1 − ω(t)/(1 + λ)) ,   (4.28)

and the stationary values are

lim_{t→∞} E(NA(t)) = n0 k2/(k1 + k2) ,

lim_{t→∞} σ²(NA(t)) = n0 k1 k2/(k1 + k2)² ,

lim_{t→∞} σ(NA(t)) = √n0 √(k1 k2)/(k1 + k2) .   (4.29)
This result shows that the √N-law is fulfilled up to a factor that is independent
of N: E/σ = √n0 k2/√(k1 k2).
Starting from a sharp distribution, Pn(0) = δn,n0, the variance increases,
may or may not pass through a maximum, and eventually reaches the equilibrium
value σ̄² = k1 k2 n0/(k1 + k2)². The time of maximal fluctuations is
easily calculated from the condition dσ²/dt = 0, and one obtains

tvar max = (1/(k1 + k2)) ln(2k1/(k1 − k2)) .   (4.30)
Figure 4.5: Probability density of a reversible monomolecular reaction.
The three plots show the evolution of the probability density,
Pn(t), of the number of molecules of a compound A which undergoes the reaction A ⇌ B.
The initially infinitely sharp density Pn(0) = δn,n0 becomes broader with time until
the variance settles down at the equilibrium value, eventually passing a point of
maximum variance. Also shown are the expectation value E(NA(t)) and the
confidence intervals E ± σ (68.3%, red) and E ± 2σ (95.4%, blue), with σ²(NA(t)) being
the variance. Parameters used: n0 = 200, 2000, and 20 000; k1 = 2 k2 = 1 [t⁻¹];
sampling times: 0 (black), 0.01 (dark green), 0.025 (green), 0.05 (turquoise), 0.1
(blue), 0.175 (blue violet), 0.3 (purple), 0.5 (magenta), 0.8 (deep pink), 2 (red).
Depending on the sign of (k1 − k2) the variance does or does not pass through a maximum on the approach towards equilibrium. The maximum is readily detected from the height of the mode of Pn(t), as seen in figure 4.5, where a case with k1 > k2 is presented.
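The closed-form results (4.27), (4.28), and (4.30) are easy to evaluate numerically; the following is a minimal sketch (assuming NumPy and SciPy are available), with parameters chosen as in figure 4.5 (k1 = 2k2 = 1, n0 = 200):

```python
import numpy as np
from scipy.special import comb

# Reversible monomolecular reaction A <=> B, eqs. (4.27)-(4.30).
k1, k2, n0 = 1.0, 0.5, 200        # rate parameters and initial number of A
kappa, lam = k1 + k2, k1 / k2     # kappa = k1 + k2, lambda = k1/k2

def P_n(n, t):
    """Probability of n molecules A at time t, eq. (4.27) in binomial form."""
    w = lam * np.exp(-kappa * t) + 1.0    # omega(t)
    p = w / (1.0 + lam)                   # binomial success probability
    return comb(n0, n) * p**n * (1.0 - p)**(n0 - n)

def mean_var(t):
    """Expectation value and variance, eq. (4.28)."""
    w = lam * np.exp(-kappa * t) + 1.0
    E = n0 * w / (1.0 + lam)
    return E, E * (1.0 - w / (1.0 + lam))

t_varmax = np.log(2 * k1 / (k1 - k2)) / kappa   # eq. (4.30), requires k1 > k2
print(t_varmax, mean_var(t_varmax))             # variance peaks at n0/4 = 50
```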
In order to illustrate fluctuations and their value under equilibrium conditions the Austrian physicist Paul Ehrenfest designed a game called Ehrenfest's urn model [102], which was indeed played in order to verify the √N-law. Balls, 2N in total, are numbered consecutively, 1, 2, . . . , 2N, and distributed arbitrarily over two containers, say A and B. A lottery machine draws lots, which carry the numbers of the balls. When the number of a ball is drawn, the ball is put from one container into the other. This setup is already sufficient for a simulation of the equilibrium condition. The more balls are in a
container, the more likely it is that the number of one of its balls is drawn
and a transfer occurs into the other container. Just as it occurs with chem-
ical reactions we have self-controlling fluctuations: whenever a fluctuation becomes large it creates a compensating force which is proportional to the size of the fluctuation.
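The game is easily replayed in silico. A minimal simulation sketch (assuming NumPy), in which the stationary occupancy of container A is binomially distributed with mean N and standard deviation √(N/2), so relative fluctuations scale as 1/√N:

```python
import numpy as np

rng = np.random.default_rng(42)

def ehrenfest(N=1000, steps=200_000):
    """Ehrenfest urn with 2N balls; returns occupancy of container A over time."""
    nA = 2 * N                      # start with all balls in container A
    traj = np.empty(steps, dtype=int)
    for t in range(steps):
        # drawing a ball uniformly: with probability nA/(2N) it sits in A
        if rng.random() < nA / (2 * N):
            nA -= 1                 # ball moves A -> B
        else:
            nA += 1                 # ball moves B -> A
        traj[t] = nA
    return traj

traj = ehrenfest()
eq = traj[len(traj) // 2 :]         # discard the transient
print(eq.mean(), eq.std())          # mean ~ N, standard deviation ~ sqrt(N/2)
```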
4.2.2 Bimolecular chemical reactions
Two classes of bimolecular reactions are accessible to full stochastic analysis:
$$\mathrm{A} + \mathrm{B} \xrightarrow{\;k\;} \mathrm{C} \qquad\text{and} \tag{4.31a}$$

$$2\,\mathrm{A} \xrightarrow{\;k\;} \mathrm{B}\,. \tag{4.31b}$$
Bimolecularity gives rise to nonlinearities in the kinetic differential equations
and in the master equations and complicates substantially the analysis of the
individual cases. At the same time, these classes of bimolecular equations
do not show essential differences in the qualitative behavior compared to
the corresponding monomolecular or linear case A −→ B in contrast to
autocatalytic processes (section 5.1). The following derivations are based
upon two publications [94, 95].
4.2.2.1 Addition reaction
In the first example (4.31a) we are dealing with three dependent stochastic variables NA(t), NB(t), and NC(t). Following McQuarrie et al. we define the probability Pn(t) = P(NA(t) = n) and apply the standard initial conditions Pn(0) = δn,n0, P(NB(0) = b) = δb,b0, and P(NC(0) = c) = δc,0. Accordingly, we have from the laws of stoichiometry NB(t) = b0 − n0 + NA(t) and NC(t) = n0 − NA(t). For simplicity we denote b0 − n0 = ∆0. Then the master equation for the chemical reaction is of the form

$$\frac{\partial P_n(t)}{\partial t} = k\,(n+1)(\Delta_0+n+1)\,P_{n+1}(t) - k\,n\,(\Delta_0+n)\,P_n(t)\,. \tag{4.31a'}$$
Figure 4.6: Irreversible bimolecular addition reaction A + B → C. The plot shows the probability distribution Pn(t) = Prob(NC(t) = n) describing the number of molecules of species C as a function of time, calculated from equation (4.37). The initial conditions are chosen to be NA(0) = a0, NB(0) = b0, and NC(0) = 0. With increasing time the peak of the distribution moves from left to right. The state n = min(a0, b0) is an absorbing state and hence the long time limit of the system is lim t→∞ Pn(t) = δn,min(a0,b0). Parameters used: a0 = 50, b0 = 51, k = 0.02 [t⁻¹·M⁻¹]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), and ∞ (black).
We remark that the birth and death rates are no longer linear in n. The corresponding PDE for the generating function is readily calculated:

$$\frac{\partial g(s,t)}{\partial t} = k\,(\Delta_0+1)(1-s)\,\frac{\partial g(s,t)}{\partial s} + k\,s(1-s)\,\frac{\partial^2 g(s,t)}{\partial s^2}\,. \tag{4.32}$$

The derivation of solutions of this PDE is quite demanding. It can be achieved by separation of variables:

$$g(s,t) = \sum_{m=0}^{\infty} A_m\,Z_m(s)\,T_m(t)\,. \tag{4.33}$$
We dispense with details and list only the coefficients and functions of the solution:

$$A_m = (-1)^m\,\frac{(2m+\Delta_0)\,\Gamma(m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(m+1)\,\Gamma(\Delta_0+1)\,\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,,$$

$$Z_m(s) = J_m(\Delta_0,\,\Delta_0+1,\,s)\,, \quad\text{and}\quad T_m(t) = \exp\bigl(-m(m+\Delta_0)\,k t\bigr)\,.$$

Herein, Γ represents the conventional gamma function with the definition Γ(x+1) = x Γ(x), and the Jn(p, q, s) are the Jacobi polynomials named after the German mathematician Carl Jacobi [103, ch. 22, pp. 773-802], which are solutions of the differential equation

$$s(1-s)\,\frac{d^2 J_n(p,q,s)}{ds^2} + \bigl(q-(p+1)\,s\bigr)\,\frac{d J_n(p,q,s)}{ds} + n(n+p)\,J_n(p,q,s) = 0\,.$$

These polynomials fulfil the following conditions:

$$\frac{d J_n(p,q,s)}{ds} = -\,\frac{n(n+p)}{q}\,J_{n-1}(p+2,\,q+1,\,s) \quad\text{and}$$

$$\int_0^1 s^{q-1}(1-s)^{p-q}\,J_n(p,q,s)\,J_\ell(p,q,s)\,ds = \frac{n!\,\bigl(\Gamma(q)\bigr)^2\,\Gamma(n+p-q+1)}{(2n+p)\,\Gamma(n+p)\,\Gamma(n+q)}\;\delta_{\ell,n}\,.$$
At the relevant value of the dummy variable, s = 1, we differentiate twice and find:

$$\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} = \sum_{m=1}^{n_0}\frac{(2m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,T_m(t)\,, \tag{4.34}$$

$$\left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} = \sum_{m=2}^{n_0}\frac{(m-1)(m+\Delta_0+1)(2m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,T_m(t)\,, \tag{4.35}$$

from which we obtain expectation value and variance according to subsection 2.7.1:

$$E\bigl(N_A(t)\bigr) = \left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} \quad\text{and}$$

$$\sigma^2\bigl(N_A(t)\bigr) = \left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} + \left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} - \left(\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1}\right)^{\!2}\,. \tag{2.71'}$$
As we see in the current example, and as we shall see in the next subsubsection, bimolecularity complicates the solution of the chemical master equations substantially and makes it quite sophisticated. We dispense here with the detailed expressions but provide the results for the special case of vast excess of one reaction partner, |∆0| ≫ n0 > 1, which is known as the pseudo first order condition. Then the sums can be approximated well by their first terms and we find (with k′ = ∆0 k):

$$\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} \approx\; n_0\,\frac{\Delta_0+2}{n_0+\Delta_0+1}\;e^{-(\Delta_0+1)\,k t} \;\approx\; n_0\,e^{-k' t} \quad\text{and}\quad \left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} \approx\; n_0\,(n_0-1)\,e^{-2 k' t}\,,$$

and we obtain finally

$$E\bigl(N_A(t)\bigr) = n_0\,e^{-k' t} \quad\text{and}\quad \sigma^2\bigl(N_A(t)\bigr) = n_0\,e^{-k' t}\bigl(1-e^{-k' t}\bigr)\,, \tag{4.36}$$

which is essentially the same result as obtained for the irreversible first order reaction.
For the calculation of the probability density we make use of a slightly different definition of the stochastic variables and use NC(t), counting the number of molecules C in the system: Pn(t) = P(NC(t) = n). With the initial condition Pn(0) = δn,0 and the upper limit of n being c = min(a0, b0), where a0 and b0 are the sharply defined numbers of A and B molecules initially present (NA(0) = a0, NB(0) = b0), we have lim t→∞ Pn(t) = δn,c,

$$\sum_{n=0}^{c} P_n(t) = 1 \quad\text{and thus}\quad P_n(t) = 0 \;\;\forall\;(n\notin[0,c],\,n\in\mathbb{Z})\,,$$

and the master equation is now of the form

$$\frac{\partial P_n(t)}{\partial t} = k\,\bigl(a_0-(n-1)\bigr)\bigl(b_0-(n-1)\bigr)\,P_{n-1}(t) - k\,(a_0-n)(b_0-n)\,P_n(t)\,. \tag{4.31a''}$$
In order to solve the master equation (4.31a'') the probability distribution Pn(t) is Laplace transformed, which converts the master equation, a set of differential-difference equations, into a set of pure difference equations:

$$q_n(s) = \int_0^{\infty}\exp(-s\,t)\,P_n(t)\,dt\,,$$

and with the initial condition Pn(0) = δn,0 we obtain

$$-1 + s\,q_0(s) = -\,k\,a_0 b_0\,q_0(s)\,,$$

$$s\,q_n(s) = k\,\bigl(a_0-(n-1)\bigr)\bigl(b_0-(n-1)\bigr)\,q_{n-1}(s) - k\,(a_0-n)(b_0-n)\,q_n(s)\,, \quad 1\le n\le c\,.$$

Successive iteration yields the solutions in terms of the functions qn(s):

$$q_n(s) = \binom{a_0}{n}\binom{b_0}{n}\,(n!)^2\,k^n\prod_{j=0}^{n}\frac{1}{s+k(a_0-j)(b_0-j)}\,, \quad 0\le n\le c\,,$$

and after converting the product into partial fractions and inverse transformation one finds the result

$$P_n(t) = (-1)^n\binom{a_0}{n}\binom{b_0}{n}\sum_{j=0}^{n}(-1)^j\left(1+\frac{n-j}{a_0+b_0-n-j}\right)\binom{n}{j}\binom{a_0+b_0-j}{n}^{\!-1} e^{-k(a_0-j)(b_0-j)\,t}\,. \tag{4.37}$$
An illustrative example is shown in figure 4.6. The difference between the two irreversible reactions, the monomolecular conversion (figure 4.4) and the bimolecular addition reaction (figure 4.6), is indeed not spectacular.
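Since a naive evaluation of (4.37) runs into factorial overflows for larger a0 and b0, a robust numerical check is to integrate the master equation (4.31a'') directly as a linear ODE system. A minimal sketch (assuming NumPy and SciPy), with the parameters of figure 4.6:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Master equation (4.31a'') for A + B -> C, with P_n(t) = Prob(N_C(t) = n).
a0, b0, k = 50, 51, 0.02
c = min(a0, b0)                          # absorbing upper state

def rhs(t, P):
    n = np.arange(c + 1)
    loss = k * (a0 - n) * (b0 - n) * P   # outflow from state n
    dP = -loss
    dP[1:] += loss[:-1]                  # inflow from state n - 1
    return dP

P0 = np.zeros(c + 1); P0[0] = 1.0        # P_n(0) = delta_{n,0}
sol = solve_ivp(rhs, (0.0, 20.0), P0, t_eval=[0.5, 5.0, 20.0], rtol=1e-8)
print(sol.y[:, -1].argmax())             # peak approaches n = min(a0, b0)
```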
4.2.2.2 Dimerization reaction
When the dimerization reaction (4.31b) is modeled by means of a master equation [94] we have to take into account that two molecules A vanish at a time, and an individual jump always involves ∆n = 2:

$$\frac{\partial P_n(t)}{\partial t} = \frac{1}{2}\,k\,(n+2)(n+1)\,P_{n+2}(t) - \frac{1}{2}\,k\,n(n-1)\,P_n(t)\,, \tag{4.31b'}$$
Figure 4.7: Irreversible dimerization reaction 2A → C. The plot shows the probability distribution Pn(t) = Prob(NA(t) = n) describing the number of molecules of species A as a function of time, calculated from equation (4.42). The number of molecules C is given by the distribution Pm(t) = Prob(NC(t) = m). The initial conditions are chosen to be NA(0) = a0 and NC(0) = 0, and hence we have n + 2m = a0. With increasing time the peak of the distribution moves from right to left. The state n = 0 is an absorbing state and hence the long time limit of the system is lim t→∞ Pn(t) = δn,0 and lim t→∞ Pm(t) = δm,a0/2. Parameters used: a0 = 100 and k = 0.02 [t⁻¹·M⁻¹]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), 50.0 (chartreuse), and ∞ (black).
which gives rise to the following PDE for the probability generating function:

$$\frac{\partial g(s,t)}{\partial t} = \frac{k}{2}\,(1-s^2)\,\frac{\partial^2 g(s,t)}{\partial s^2}\,. \tag{4.38}$$
The analysis of this PDE is more involved than it might look at first glance. Nevertheless, an exact solution similar to (4.33) is available:

$$g(s,t) = \sum_{m=0}^{\infty} A_m\,C_m^{-1/2}(s)\,T_m(t)\,, \tag{4.39}$$

wherein the parameters and functions are defined by

$$A_m = \frac{1-2m}{2m}\cdot\frac{\Gamma(n_0+1)\,\Gamma\bigl[(n_0-m+1)/2\bigr]}{\Gamma(n_0-m+1)\,\Gamma\bigl[(n_0+m+1)/2\bigr]}\,,$$

$$C_m^{-1/2}(s):\qquad (1-s^2)\,\frac{d^2 C_m^{-1/2}(s)}{ds^2} + m(m-1)\,C_m^{-1/2}(s) = 0\,,$$

$$T_m(t) = \exp\Bigl(-\tfrac{1}{2}\,k\,m(m-1)\,t\Bigr)\,.$$

The functions C_m^{-1/2}(s) are ultraspherical or Gegenbauer polynomials, named after the Austrian mathematician Leopold Gegenbauer [103, ch. 22, pp. 773-802]. They are solutions of the differential equation shown above and belong to the family of hypergeometric functions. It is straightforward to write down expressions for the expectation value and the variance of the stochastic variable NA(t) (µ stands for an integer running index, µ ∈ ℕ):

$$E\bigl(N_A(t)\bigr) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2\rfloor} A_m\,T_m(t) \quad\text{and}$$

$$\sigma^2\bigl(N_A(t)\bigr) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2\rfloor}\left(\frac{1}{2}\,(m^2-m+2)\,A_m\,T_m(t) + A_m^2\,T_m^2(t)\right). \tag{4.40}$$
(4.40)
In order to obtain concrete results these expressions can be readily evaluated
numerically.
There is one interesting detail in the deterministic version of the dimerization reaction. It is conventionally modeled by the differential equation (4.41a), which can be solved readily. The correct ansatz, however, would be (4.41b), for which we also have an exact solution (with [A] = a(t) and a(0) = a0):

$$-\frac{da}{dt} = k\,a^2 \;\;\Longrightarrow\;\; a(t) = \frac{a_0}{1+a_0 k t} \qquad\text{and} \tag{4.41a}$$

$$-\frac{da}{dt} = k\,a(a-1) \;\;\Longrightarrow\;\; a(t) = \frac{a_0}{a_0+(1-a_0)\,e^{-kt}}\,. \tag{4.41b}$$
The expectation value of the stochastic solution lies always between the solu-
tion curves (4.41a) and (4.41b). An illustrative example is shown in figure 4.7.
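A minimal sketch (assuming NumPy) comparing the two deterministic solutions; for a particle number as small as a0 = 10 (an illustrative value) the difference is clearly visible:

```python
import numpy as np

# Deterministic dimerization: conventional ansatz (4.41a) vs. corrected (4.41b).
a0, k = 10.0, 0.02
t = np.linspace(0.0, 50.0, 6)

a_conv = a0 / (1.0 + a0 * k * t)                   # eq. (4.41a), a(t) -> 0
a_corr = a0 / (a0 + (1.0 - a0) * np.exp(-k * t))   # eq. (4.41b), a(t) -> 1

for ti, x, y in zip(t, a_conv, a_corr):
    print(f"t={ti:5.1f}  a(4.41a)={x:7.4f}  a(4.41b)={y:7.4f}")
```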
As in the previous subsubsection 4.2.2.1 we consider also a solution of the master equation by means of a Laplace transformation [95]. Since we are dealing with a step size of two molecules A converted into one molecule C, the master equation is defined only for odd or only for even numbers of molecules A. For an initial number of 2a0 molecules and a probability P2n(t) = P(NA(t) = 2n) we have the initial conditions NA(0) = 2a0, NC(0) = 0 and the condition that all probabilities outside the interval [0, 2a0] as well as the odd probabilities P2n−1 (n = 1, . . . , a0) vanish:

$$\frac{\partial P_{2n}(t)}{\partial t} = -\frac{1}{2}\,k\,(2n)(2n-1)\,P_{2n}(t) + \frac{1}{2}\,k\,(2n+2)(2n+1)\,P_{2n+2}(t)\,. \tag{4.31b''}$$

The probability distribution P2n(t) is derived as in the previous subsection by Laplace transformation,

$$q_{2n}(s) = \int_0^{\infty}\exp(-s\,t)\,P_{2n}(t)\,dt\,,$$

yielding the set of difference equations

$$-1 + s\,q_{2a_0}(s) = -\frac{1}{2}\,k\,(2a_0)(2a_0-1)\,q_{2a_0}(s)\,,$$

$$s\,q_{2n}(s) = -\frac{1}{2}\,k\,(2n)(2n-1)\,q_{2n}(s) + \frac{1}{2}\,k\,(2n+2)(2n+1)\,q_{2n+2}(s)\,, \quad 0\le n\le a_0-1\,,$$
which again can be solved by successive iteration. It is straightforward to calculate first the Laplace transform for the state with 2(a0 − m) molecules A remaining, where m = [C] with 0 ≤ m ≤ a0 counts the molecules C that have been formed:

$$q_{2(a_0-m)}(s) = \left(\frac{k}{2}\right)^{m}\binom{2a_0}{2m}\,(2m)!\prod_{j=0}^{m}\Bigl(s+\frac{k}{2}\,\bigl(2(a_0-j)\bigr)\bigl(2(a_0-j)-1\bigr)\Bigr)^{-1}\,,$$

and a somewhat tedious but straightforward exercise in algebra yields the inverse Laplace transform:

$$P_{2(a_0-m)}(t) = (-1)^m\,\frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!}\times\sum_{j=0}^{m}(-1)^j\,\frac{(4a_0-4j-1)\,(4a_0-2m-2j-3)!!}{j!\,(m-j)!\,(4a_0-2j-1)!!}\;e^{-k\,(a_0-j)\bigl(2(a_0-j)-1\bigr)\,t}\,.$$
The substitution i = a0 − j leads to

$$P_{2(a_0-m)}(t) = (-1)^m\,\frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!}\times\sum_{i=a_0-m}^{a_0}(-1)^{a_0-i}\,\frac{(4i-1)\,(2a_0-2m+2i-3)!!}{(a_0-i)!\,(i+m-a_0)!\,(2a_0+2i-1)!!}\;e^{-k\,i(2i-1)\,t}\,.$$

Setting now n = a0 − m in accord with the definition of m we obtain the final result

$$P_{2n}(t) = (-1)^{n}\,\frac{a_0!\,(2a_0-1)!!}{n!\,(2n-1)!!}\sum_{i=n}^{a_0}(-1)^{i}\,\frac{(4i-1)\,(2n+2i-3)!!}{(a_0-i)!\,(i-n)!\,(2a_0+2i-1)!!}\;e^{-k\,i(2i-1)\,t}\,. \tag{4.42}$$

The results are illustrated by means of a numerical example in figure 4.7.
4.3 Fokker-Planck approximation of master equations
It is often desirable to use a single differential equation for the description of probability distributions P(x) of jump processes. The obvious strategy is to approximate the master equation by a Fokker-Planck equation, and implicitly such an approach had already been used by Albert Einstein in his famous work on Brownian motion [33]. A nonrigorous but straightforward derivation of an expansion of the master equation was given by the Dutch physicist Hendrik Kramers [104], and later the Kramers approach was substantially improved by the mathematical physicist José Moyal [105]. This approach is generally known as the Kramers-Moyal expansion. A differently motivated and rigorous expansion is due to the Dutch theoretical physicist Nicolaas van Kampen, who was a PhD student of Hendrik Kramers. His expansion, known as the van Kampen size expansion, is based on a systematic expansion in a size-dependent parameter and retains only those terms that do not vanish in the limit of macroscopic extensions.
Before we discuss these expansions and approximations, we shall first introduce the reverse approach: a diffusion process can always be approximated by a jump process, whereas the inverse is not always true. We shall encounter such a situation in the case of the Poisson process, which cannot be simulated by a suitable Fokker-Planck equation. Basic for the conversion in the limit of vanishing step size is an expansion of the transition probabilities that is truncated after the second term, as in the case of the derivation of the Chapman-Kolmogorov equation (subsection 3.1.2).
4.3.1 Diffusion process approximated by a jump process
The typical result is derived for a random walk (subsection 3.4.2), where the master equation becomes a Fokker-Planck equation in the limit of infinitely small step size. Jumps, in this case, must become smaller and more probable simultaneously, and this is taken care of by a scaling assumption that is encapsulated in a parameter δ in the following way: the average step size and the variance of the step size are proportional to δ, whereas the jump probabilities increase as δ becomes smaller.
First we introduce a new variable y = (z − x − A(x)δ)/√δ and write for the jump probability

$$W_\delta(z|x) = \delta^{-3/2}\,\Phi(y,x) \quad\text{with}\quad \int dy\,\Phi(y,x) = Q \quad\text{and}\quad \int dy\;y\,\Phi(y,x) = 0\,. \tag{4.43}$$

Now we define three terms for a series expansion in the jump moments,

$$\alpha_0(x) \equiv \int dz\,W_\delta(z|x) = Q/\delta\,,$$

$$\alpha_1(x) \equiv \int dz\,(z-x)\,W_\delta(z|x) = A(x)\,Q\,, \tag{4.44}$$

$$\alpha_2(x) \equiv \int dz\,(z-x)^2\,W_\delta(z|x) = \int dy\;y^2\,\Phi(y,x)\,,$$

and assume that the function Φ(y, x) vanishes sufficiently fast as y → ∞ in order to guarantee that

$$\lim_{\delta\to 0} W_\delta(z|x) = \lim_{y\to\infty}\left(\Bigl(\frac{y}{z-x}\Bigr)^{3}\,\Phi(y,x)\right) = 0 \quad\text{for}\; z\ne x\,.$$

Next we choose a twice differentiable function f(z) and carry out a procedure that is very similar to the derivation of the differential Chapman-Kolmogorov equation in section 3.1.2 and find

$$\lim_{\delta\to 0}\left\langle\frac{\partial f(z)}{\partial t}\right\rangle = \left\langle\alpha_1(z)\,\frac{\partial f(z)}{\partial z} + \frac{1}{2}\,\alpha_2(z)\,\frac{\partial^2 f(z)}{\partial z^2}\right\rangle.$$
This result has the consequence that in the limit δ → 0 the master equation

$$\frac{\partial P(x)}{\partial t} = \int dz\,\bigl(W(x|z)\,P(z) - W(z|x)\,P(x)\bigr) \tag{4.45a}$$

becomes the FPE

$$\frac{\partial P(x)}{\partial t} = -\frac{\partial}{\partial x}\bigl(\alpha_1\,P(x)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl(\alpha_2\,P(x)\bigr)\,. \tag{4.45b}$$

Accordingly, one can always construct such an approximating master equation if the requirements imposed by the three α-functions (4.44) are met. In case these criteria are not fulfilled, no approximation is possible. The approximation is illustrated by means of three examples:
Random walk. Based on the notation introduced in subsection 3.4.2 we find for x = n·l:

$$W(x|z) = \vartheta\,(\delta_{z,x-l}+\delta_{z,x+l}) \;\Longrightarrow\; \alpha_0(x) = 2\vartheta\,,\;\; \alpha_1(x) = 0\,,\;\; \alpha_2(x) = 2\,l^2\vartheta\,.$$

With δ = l² and D = l²ϑ we obtain the familiar diffusion equation

$$\frac{\partial P(x,t)}{\partial t} = D\,\frac{\partial^2 P(x,t)}{\partial x^2}\,. \tag{4.46}$$
Poisson process. With the notation used in subsection 3.4.1 (except α ↔ ϑ) and x = n·l we find:

$$W(x|z) = \vartheta\,\delta_{z,x+l} \;\Longrightarrow\; \alpha_0(x) = \vartheta\,,\;\; \alpha_1(x) = l\,\vartheta\,,\;\; \alpha_2(x) = l^2\vartheta\,.$$

In this case there is no way to define l and ϑ as functions of δ such that α1(x) and α2(x) both remain finite and nonvanishing in the limit l → 0. There is no Fokker-Planck limit for the Poisson process.
General approximation of diffusion by birth-and-death master equations. We begin with a master equation of the class

$$W_\delta(z|x) = \left(\frac{A(x)}{2\delta}+\frac{B(x)}{2\delta^2}\right)\delta_{z,x+\delta} + \left(-\frac{A(x)}{2\delta}+\frac{B(x)}{2\delta^2}\right)\delta_{z,x-\delta}\,, \tag{4.47}$$

where Wδ(z|x) is positive for sufficiently small δ. Under the assumption that this is fulfilled for the entire range of interest for x, the process takes place on a range of x that is composed of integer multiples of δ.⁸ In the limit δ → 0 the birth-and-death process is converted into a Fokker-Planck equation with α0(x) = B(x)/δ², α1(x) = A(x), α2(x) = B(x), and

$$\lim_{\delta\to 0} W_\delta(z|x) = 0 \quad\text{for}\; z\ne x\,. \tag{4.48}$$

⁸ We remark that the scaling relations (4.43) and (4.47) are not the same, but both lead to a Fokker-Planck equation.
Although α0(x) diverges with 1/δ² – in contrast to (4.44), where we prescribed the required 1/δ behavior – and the picture of jumps converging smoothly into a continuous distribution is no longer valid, there exists a limiting Fokker-Planck equation, because the behavior of α0(x) is irrelevant:

$$\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl(A(x)\,P(x,t)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl(B(x)\,P(x,t)\bigr)\,. \tag{4.49}$$

Equation (4.47) provides a tool for the simulation of a diffusion process by an approximating birth-and-death process. The method, however, fails if B(x) = 0 on the range of x of interest, since then Wδ(z|x) does not fulfil the criterion of being nonnegative.
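A minimal simulation sketch (assuming NumPy) of the recipe (4.47), here for the illustrative choice A(x) = −x and B(x) = 1 (an Ornstein-Uhlenbeck-like diffusion; these functions are not an example from the text). The stationary variance of the jump process approaches the Fokker-Planck value 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Birth-and-death approximation (4.47) of dx = A(x) dt + sqrt(B(x)) dW.
A = lambda x: -x        # drift (illustrative choice)
B = lambda x: 1.0       # diffusion coefficient (illustrative choice)
delta = 0.05            # lattice spacing; jump rates scale as in (4.47)

def simulate(x0=0.0, t_end=200.0):
    x, t, samples = x0, 0.0, []
    while t < t_end:
        wp = A(x) / (2 * delta) + B(x) / (2 * delta**2)   # rate x -> x + delta
        wm = -A(x) / (2 * delta) + B(x) / (2 * delta**2)  # rate x -> x - delta
        total = wp + wm
        t += rng.exponential(1.0 / total)                 # waiting time
        x += delta if rng.random() < wp / total else -delta
        if t > 20.0:                                      # discard transient
            samples.append(x)
    return np.array(samples)

print(simulate().var())   # ~ 0.5, the stationary variance of the FPE limit
```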
4.3.2 Kramers-Moyal expansion
The derivation starts from the master equation (4.45a) and a substitution of z defined by y = x − z in the first term and y = z − x in the second term. We redefine also the elements of the transition matrix,

$$T(y,x) = W(x+y\,|\,x)\,,$$

and the master equation is now of the form

$$\frac{\partial P(x,t)}{\partial t} = \int dy\,\bigl(T(y,\,x-y)\,P(x-y,t) - T(y,x)\,P(x,t)\bigr)\,, \tag{4.45a'}$$

and the integral is readily expanded in a power series:

$$\frac{\partial P(x,t)}{\partial t} = \int dy\sum_{n=1}^{\infty}\frac{(-y)^n}{n!}\,\frac{\partial^n}{\partial x^n}\bigl(T(y,x)\,P(x,t)\bigr) = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\,\frac{\partial^n}{\partial x^n}\bigl(\alpha_n(x)\,P(x,t)\bigr)\,, \tag{4.50}$$

where the n-th derivative moment is defined by

$$\alpha_n(x) = \int dz\,(z-x)^n\,W(z|x) = \int dy\;y^n\,T(y,x)\,. \tag{4.50'}$$

In case the Kramers-Moyal expansion is terminated at the second term, the result is a Fokker-Planck equation of the form (4.45b).
4.3.3 Size expansion of the chemical master equation
Although we were able to analyze a number of representative examples by solving the one step birth-and-death master equation exactly, the actual applicability of this technique to specific problems of chemical kinetics is rather limited. In order to apply a chemical master equation to a problem in practice one is commonly dealing with about 10¹² particles or more. Upscaling discloses one particular problem that is related to size expansion and that becomes virulent in the transition from the master equation to a Fokker-Planck equation. The problem is intimately related to the parameter volume V, which is the best possible estimator of system size. We distinguish two classes of quantities: (i) intensive quantities that are independent of system size, and (ii) extensive quantities that grow proportional to system size. Examples of intensive properties are temperature, pressure, and density; extensive properties are volume, energy, or entropy. In upscaling from, say, 1000 to 10¹² particles, extensive properties grow by a factor of 10⁹ whereas intensive properties remain the same. Some pairs of properties – one extensive and one intensive – are of particular importance, for example particle number N and concentration c = N/(V·NL), or mass M and (volume) density ϱ = M/V.
In order to compensate for the lack of generality, approximation methods were developed which turned out to be particularly useful in the limit of sufficiently large particle numbers [98]. The Dutch theoretical physicist Nicolaas van Kampen expands the master equation in the inverse square root of some extensive quantity – particle number, mass, or volume – which is characteristic of system size and which will be denoted by Ω. In van Kampen's notation,

$$a \propto \Omega = \text{extensive variable}\,, \qquad x = a/\Omega = \text{intensive variable}\,, \tag{4.51}$$

the limit of interest is a large value of Ω at fixed x, which is tantamount to the transition to a macroscopic system. The transition probabilities are reformulated as

$$W(a|a') = W(a';\,\Delta a) \quad\text{with}\quad \Delta a = a-a'\,,$$

and scaled according to the assumption

$$W(a|a') = \Omega\,\psi\!\left(\frac{a'}{\Omega},\,\Delta a\right).$$
The essential trick in the van Kampen expansion is that the size of the jump is expressed in terms of an extensive quantity, ∆a, whereas the intensive variable x is used for the expression of the dependence on the position variable, a′. The expansion is made now in the new variable z defined by

$$a = \Omega\,\phi(t) + \Omega^{1/2}\,z\,,$$

where the function φ(t) is still to be determined. The derivative moments αn(a) are now proportional to the system size Ω and therefore we can scale them accordingly: αn(a) = Ω αn(x). In the next step the new variable z is introduced into the Kramers-Moyal expansion (4.50):

$$\frac{\partial P(z,t)}{\partial t} - \Omega^{1/2}\,\frac{\partial\phi}{\partial t}\,\frac{\partial P(z,t)}{\partial z} = \sum_{n=1}^{\infty}\frac{(-1)^n\,\Omega^{1-n/2}}{n!}\,\frac{\partial^n}{\partial z^n}\Bigl(\alpha_n\bigl(\phi(t)+\Omega^{-1/2}z\bigr)\,P(z,t)\Bigr)\,.$$

For general validity of an expansion all terms of a certain order in the expansion parameter must vanish separately. We make use of this property to define φ(t) such that the terms of order Ω^{1/2} are eliminated by demanding

$$\frac{\partial\phi}{\partial t} = \alpha_1\bigl(\phi(t)\bigr)\,. \tag{4.52}$$

This equation is an ODE determining φ(t) and is, of course, in full agreement with the deterministic equation for the expectation value of the random variable. Accordingly, φ(t) is indeed the deterministic part of the solution.
The next step is an expansion of αn(φ(t)+Ω⁻¹ᐟ²z) in Ω⁻¹ᐟ² and a reordering of terms, yielding

$$\frac{\partial P(z,t)}{\partial t} = \sum_{m=2}^{\infty}\frac{\Omega^{-(m-2)/2}}{m!}\sum_{n=1}^{m}(-1)^n\binom{m}{n}\,\alpha_n^{(m-n)}\bigl(\phi(t)\bigr)\,\frac{\partial^n}{\partial z^n}\bigl(z^{m-n}\,P(z,t)\bigr)\,.$$

In taking the limit of large system size Ω all terms vanish except the one with m = 2, and we find the result

$$\frac{\partial P(z,t)}{\partial t} = -\,\alpha_1^{(1)}\bigl(\phi(t)\bigr)\,\frac{\partial}{\partial z}\bigl(z\,P(z,t)\bigr) + \frac{1}{2}\,\alpha_2\bigl(\phi(t)\bigr)\,\frac{\partial^2}{\partial z^2}\,P(z,t)\,, \tag{4.53}$$
where α₁⁽¹⁾ stands for the linear drift term. Since the size expansion is of fundamental importance for an understanding of the relation between microscopic and macroscopic processes, we shall also provide van Kampen's original, slightly different derivation in section 5.2.

It is straightforward to compare with the result of the Kramers-Moyal expansion (4.50) terminated after two terms:

$$\frac{\partial P(a,t)}{\partial t} = -\frac{\partial}{\partial a}\bigl(\alpha_1(a)\,P(a,t)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial a^2}\bigl(\alpha_2(a)\,P(a,t)\bigr)\,.$$

The change of variables x = a/Ω leads to

$$\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl(\alpha_1(x)\,P(x,t)\bigr) + \frac{1}{2\Omega}\,\frac{\partial^2}{\partial x^2}\bigl(\alpha_2(x)\,P(x,t)\bigr)\,.$$

The application of small noise theory with ε² = Ω⁻¹ and the substitution z = Ω¹ᐟ²(x − φ(t)) yields the lowest order Fokker-Planck equation, which is exactly the same as the lowest order approximation in the van Kampen expansion. This result has an important consequence: if we are only interested in the lowest order approximation we may use the Kramers-Moyal equation, which is much easier to derive than the van Kampen equation.
If only the small noise approximation is approximately valid, then it is appropriate to consider only the linearization of the drift term, and individual solutions of this equation are represented by the trajectories of the stochastic differential equation

$$dz = \alpha_1^{(1)}\bigl(\phi(t)\bigr)\,z\,dt + \sqrt{\alpha_2\bigl(\phi(t)\bigr)}\;dW(t)\,. \tag{4.54}$$

Eventually, we have found a procedure that approximately relates master equations, Fokker-Planck equations, and stochastic differential equations, and closes the gap between microscopic stochasticity and macroscopic behavior.
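A minimal sketch (assuming NumPy) of how trajectories of equation (4.54) can be generated with the Euler-Maruyama scheme; the drift and diffusion functions are those of the A⇌B example treated below, α₁⁽¹⁾ = −k₁ and α₂(φ(t)) = k₂β + k₁φ(t):

```python
import numpy as np

rng = np.random.default_rng(7)

# Euler-Maruyama integration of the linear-noise SDE (4.54) for A <=> B.
k1, k2, beta = 2.0, 1.0, 40.0
phi, z = 0.0, 0.0
dt, steps = 1e-3, 5000

for _ in range(steps):
    phi += (k2 * beta - k1 * phi) * dt     # deterministic part, eq. (4.52)
    z += -k1 * z * dt + np.sqrt((k2 * beta + k1 * phi) * dt) * rng.normal()

# phi -> k2*beta/k1 = 20; z fluctuates around 0 with stationary variance ~ 20
print(phi, z)
```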
The chemical reaction A⇌B as an example. The transition probabilities for the interval t′ → t of the corresponding single step birth-and-death master equation – with [A]t = a(t), [A]t′ = a′, a fixed buffered concentration [B] = b, and the reaction rate parameters k1 and k2 – are:

$$W(a|a') = \delta_{a,a'+1}\,k_2 b + \delta_{a,a'-1}\,k_1 a'\,.$$
Figure 4.8: Comparison of expansions of the master equation. The reaction A⇌B with B buffered, [B] = b = b0, is chosen as example and the exact solution (black) is compared with the results of the Kramers-Moyal expansion (red) and the van Kampen size expansion (blue). Parameter choice: V = 1, k1 = 2, k2 = 1, b = 40.
Now we choose the volume of the system, V, as size parameter and have a = αV and b = βV. This leads to the scaled transition probability

$$W(\alpha';\,\Delta a) = V\bigl(k_2\beta\,\delta_{\Delta a,1} + k_1\alpha'\,\delta_{\Delta a,-1}\bigr)\,,$$

and the first two derivative moments

$$\alpha_1 = \sum_{(a')}(a'-a)\,W(a'|a) = k_2 b - k_1 a = V(k_2\beta - k_1\alpha)\,,$$

$$\alpha_2 = \sum_{(a')}(a'-a)^2\,W(a'|a) = k_2 b + k_1 a = V(k_2\beta + k_1\alpha)\,.$$
Following the procedure of van Kampen's expansion we define

$$a = V\phi(t) + V^{1/2}\,z \tag{4.55}$$

and obtain for the deterministic differential equation and its solution:

$$\frac{d\phi(t)}{dt} = k_2\beta - k_1\,\phi(t) \quad\text{and}\quad \phi(t) = \phi(0)\,e^{-k_1 t} + \frac{k_2\beta}{k_1}\,\bigl(1-e^{-k_1 t}\bigr)\,.$$

The Fokker-Planck equation takes on the form

$$\frac{\partial P(z)}{\partial t} = k_1\,\frac{\partial}{\partial z}\bigl(z\,P(z)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial z^2}\Bigl(\bigl(k_2\beta + k_1\,\phi(t)\bigr)\,P(z)\Bigr)\,.$$

The expectation value of z is readily computed to be ⟨z(t)⟩ = z(0) e^{−k₁t}. Since the partition of the variable a in equation (4.55) is arbitrary we can assume z(0) = 0 – as usual⁹ – and find for the variance in z

$$\sigma^2\bigl(z(t)\bigr) = \left(\frac{k_2\beta}{k_1} + \phi(0)\,e^{-k_1 t}\right)\bigl(1-e^{-k_1 t}\bigr)\,,$$

and eventually obtain for the solutions in the macroscopic variable a, with a(0) = Vφ(0),

$$\langle a(t)\rangle = V\phi(t) = a(0)\,e^{-k_1 t} + \frac{k_2 b}{k_1}\,\bigl(1-e^{-k_1 t}\bigr)\,,$$

$$\sigma^2\bigl(a(t)\bigr) = V\,\sigma^2\bigl(z(t)\bigr) = \left(\frac{k_2 b}{k_1} + a(0)\,e^{-k_1 t}\right)\bigl(1-e^{-k_1 t}\bigr)\,.$$
Finally, we compare the different stationary state solutions. With α = k2 b/k1 the van Kampen expansion yields

$$P(z) = \frac{1}{\sqrt{\pi\alpha/2}\,\bigl(1+\operatorname{erf}(\sqrt{\alpha/2})\bigr)}\,\exp\!\left(-\frac{(z-\alpha)^2}{2\alpha}\right),$$

the Kramers-Moyal expansion yields

$$P(a) = \mathcal{N}\,(k_2 b + k_1 a)^{-1+4 k_2 b/k_1}\,e^{-2a}\,,$$

and the exact solution is

$$P(a) = \frac{\bigl(k_2 b/k_1\bigr)^a\,\exp\bigl(-k_2 b/k_1\bigr)}{a!} = \frac{\alpha^a\,e^{-\alpha}}{a!}\,,$$

which is a Poissonian. A comparison of numerical plots is shown in figure 4.8.
⁹ The assumption z(0) = 0 implies ⟨z(t)⟩ = 0 and hence the corresponding stochastic variable Z(t) describes the fluctuations around zero.
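The comparison of figure 4.8 is easy to reproduce on a grid. A minimal numerical sketch (assuming NumPy and SciPy), in which the Kramers-Moyal density is normalized numerically, absorbing the constant N:

```python
import numpy as np
from scipy.special import erf, gammaln

# Stationary distributions for A <=> B with buffered B (V = 1, k1 = 2, k2 = 1, b = 40).
k1, k2, b = 2.0, 1.0, 40.0
alpha = k2 * b / k1                    # = 20, mean of the exact Poissonian
a = np.arange(0, 61)

# exact solution of the master equation: Poisson distribution
P_exact = np.exp(a * np.log(alpha) - alpha - gammaln(a + 1))

# van Kampen size expansion: Gaussian restricted to a >= 0
P_vK = np.exp(-(a - alpha) ** 2 / (2 * alpha))
P_vK /= np.sqrt(np.pi * alpha / 2) * (1 + erf(np.sqrt(alpha / 2)))

# Kramers-Moyal expansion: P(a) ~ (k2 b + k1 a)^(-1 + 4 k2 b/k1) exp(-2a)
logP = (-1 + 4 * k2 * b / k1) * np.log(k2 * b + k1 * a) - 2 * a
P_KM = np.exp(logP - logP.max()); P_KM /= P_KM.sum()

print(P_exact.argmax(), P_vK.argmax(), P_KM.argmax())  # all peak near alpha = 20
```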
4.4 Numerical simulation of master equations
In this section we introduce a model for computer simulation of stochastic chemical kinetics that has been developed and put upon a solid basis by the American physicist and mathematical chemist Daniel Gillespie [1–4]. Considered is a population of N molecular species, S1, S2, . . . , SN, in the gas phase, which interact through M elementary chemical reactions (R1, R2, . . . , RM).¹⁰ Two conditions are assumed to be fulfilled by the system: (i) the container with constant volume V, in the sense of a flow reactor (CSTR in figure 4.2), is assumed to be well mixed by efficient stirring, and (ii) the system is assumed to be in thermal equilibrium at constant temperature T. The goals of the simulation are the computation of the time course of the stochastic variables – Xk(t) being the number of molecules of species Sk at time t – and the description of the evolution of the population. A single computation yields a single trajectory, very much in the sense of a single solution of a stochastic differential equation (figure 3.9), and observable results are commonly derived through sampling of trajectories. Exceptions are single molecule techniques, which allow for experimental observation of single events including whole trajectories of biopolymer folding and unfolding (see, for example, [28, 30, 106, 107]).
4.4.1 Definitions and conditions
For a reaction mechanism involving N species in M reactions the entire population is characterized by an N-dimensional random vector counting the numbers of molecules of the various species Sk,

$$\vec{\mathcal X}(t) = \bigl(\mathcal X_1(t),\,\mathcal X_2(t),\,\ldots,\,\mathcal X_N(t)\bigr)\,. \tag{4.56}$$

The common variables in chemistry are concentrations rather than particle numbers:

$$\mathrm x = \bigl(\mathrm x_1(t),\,\mathrm x_2(t),\,\ldots,\,\mathrm x_N(t)\bigr) \quad\text{with}\quad \mathrm x_k = \frac{\mathcal X_k}{V\cdot N_L}\,, \tag{4.57}$$

¹⁰ Elementary steps of chemical reactions are defined and discussed in subsection 4.1.1.
where the volume V is the appropriate expansion parameter Ω for the system size (subsection 4.3.3).¹¹ The following derivation of the chemical master equation [3, pp. 407-417] focuses on reaction channels Rµ of bimolecular nature,

$$\mathrm S_a + \mathrm S_b \longrightarrow \mathrm S_c + \ldots\,, \tag{4.58}$$

like (4.1f, 4.1i, 4.1j, and 4.1k) shown in the list (4.1). An extension to monomolecular and trimolecular reaction channels is straightforward, and zero-molecular processes like the influx of material into the reactor in the elementary step (4.1a) provide no major problems. Reversible reactions, for example (4.19), are handled as two elementary steps, A + B −→ C + D and C + D −→ A + B. In equation (4.58) we distinguish between reactant species, A and B, and product species, C . . . , of a reaction Rµ.
The two stipulations, (i) perfect mixing and (ii) thermal equilibrium, can now be cast into precise physical meanings. Premise (i) requires that the probability of finding the center of an arbitrarily chosen molecule inside a container subregion with a volume ∆V is equal to ∆V/V. The system is spatially homogeneous on macroscopic scales but it allows for random fluctuations from homogeneity. Formally, requirement (i) asserts that the position of a randomly selected molecule is described by a random variable, which is uniformly distributed over the interior of the container. Premise (ii) implies that the probability that the velocity of a randomly chosen molecule of mass m will be found to lie within an infinitesimal region dv³ around the velocity v is equal to

$$P_{MB}\,dv^3 = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m v^2/(2 k_B T)}\,dv^3\,.$$
Here, the velocity vector is denoted by v = (vx, vy, vz) in Cartesian coordinates, the infinitesimal volume element fulfils dv³ = dvx dvy dvz, the square of the velocity is v² = vx² + vy² + vz², and kB is Boltzmann's constant. Premise (ii) asserts that the velocities of molecules follow a Maxwell-Boltzmann distribution; formally it states that each Cartesian velocity component of a randomly selected molecule of mass m is represented by a random variable, which is normally distributed with mean 0 and variance kBT/m. Implicitly, the two stipulations assert that the molecular position and velocity components are all statistically independent of each other. For practical purposes, we expect premises (i) and (ii) to be valid for any dilute gas system at constant temperature in which nonreactive molecular collisions occur much more frequently than reactive molecular collisions.

¹¹ In order to distinguish random and deterministic variables, stochastic concentrations are indicated by upright fonts.

Figure 4.9: Sketch of a molecular collision in dilute gases. A spherical molecule Sa with radius ra moves with a velocity v = vb − va relative to a spherical molecule Sb with radius rb. If the two molecules are to collide within the next infinitesimal time interval dt, the center of Sb has to lie inside a cylinder of radius r = ra + rb and height v dt. The upper and lower surfaces of the cylinder are deformed into identically oriented hemispheres of radius r, and therefore the volume of the deformed cylinder is identical with that of the non-deformed one.
4.4.2 The probabilistic rate parameter
In order to derive a chemical master equation for the population variables Xk(t) we need some properties of the probability πµ(t, dt), with µ = 1, . . . , M, that a randomly selected combination of the reactant molecules for reaction Rµ at time t will react to yield products within the next infinitesimal time interval [t, t+dt[. With the assumptions made in the previous subsection 4.4.1, virtually all chemical reaction channels fulfil the condition

$$\pi_\mu(t,dt) = \gamma_\mu\,dt\,, \tag{4.59}$$
where the specific probability rate parameter γµ is independent of dt. First,
we calculate the rate parameter for a general bimolecular reaction by means
of classical collision theory and then extend briefly to mono- and trimolec-
ular reactions. Apart from the quantum mechanical approach the theory of
collisions in dilute gases is the best developed microscopic model for chemi-
cal reactions and well suited for a rigorous derivation of the master equation
from molecular motion and events.
4.4.2.1 Bimolecular reactions
The occurrence of a reaction A + B −→ C has to be preceded by a collision of an Sa molecule with an Sb molecule, and first we shall calculate the probability of such a collision in the reaction volume V. For simplicity molecular species are regarded as spheres with specific masses and radii, for example ma and ra for Sa, and mb and rb for Sb, respectively. A collision occurs whenever the center-to-center distance of the two molecules RAB decreases to (RAB)min = ra + rb. Next we define the probability that a randomly selected pair of Rµ reactant molecules at time t will collide within the next infinitesimal time interval [t, t+dt[ by π*µ(t, dt), and calculate it from the Maxwell-Boltzmann distribution of molecular velocities according to figure 4.9.

The probability that a randomly selected pair of reactant molecules of Rµ, one molecule Sa and one molecule Sb, has a relative velocity v = vb − va lying in an infinitesimal volume element dv³ about v at time t is denoted by P(v(t), Rµ) and can be readily obtained from the kinetic theory of gases:

$$P(v(t),R_\mu) = \left(\frac{\mu}{2\pi k_B T}\right)^{3/2}\exp\bigl(-\mu v^2/(2 k_B T)\bigr)\,dv^3\,.$$
Herein v = |v| = (vx² + vy² + vz²)¹ᐟ² is the value of the relative velocity and µ = ma mb/(ma + mb) is the reduced mass of the two Rµ molecules. Two properties of the probabilities P(v(t), Rµ) for different velocities v are important: (i) the elements in the set of all velocity combinations, E_{v(t),Rµ}, are mutually exclusive, and (ii) they are collectively exhaustive since v is varied over the entire three-dimensional velocity space.
Now we relate the probability P(v(t), Rµ) to a collision event Ecol by calculating the conditional probability P(Ecol(t, dt)|E_{v(t),Rµ}). In figure 4.9 we sketch the geometry of the collision event between two randomly selected spherical molecules Sa and Sb that is assumed to occur within an infinitesimal time interval dt:¹² A randomly selected molecule Sa moves along the vector v of the relative velocity vb − va between Sa and an also randomly selected molecule Sb. A collision between the molecules will take place in the interval [t, t+dt[ if and only if the center of molecule Sb is inside the spherically distorted cylinder (figure 4.9) at time t. Thus P(Ecol(t, dt)|E_{v(t),Rµ}) is the probability that the center of a randomly selected Sb molecule moving with velocity v(t) relative to the randomly selected Sa molecule will be situated at time t within a certain subregion of V, which has the volume Vcol = v dt · π(ra + rb)², and by scaling with the total volume V we obtain:¹³

$$P\bigl(E_{col}(t,dt)\,\big|\,E_{v(t),R_\mu}\bigr) = \frac{v(t)\,dt\cdot\pi\,(r_a+r_b)^2}{V}\,. \tag{4.60}$$
By substitution and integration over the entire velocity space we can calculate the desired probability:

$$\pi^*_\mu(t,dt) = \iiint_{v}\left(\frac{\mu}{2\pi k_B T}\right)^{3/2} e^{-\mu v^2/(2 k_B T)}\;\frac{v(t)\,dt\cdot\pi\,(r_a+r_b)^2}{V}\;dv^3\,.$$

Evaluation of the integral is straightforward and yields

$$\pi^*_\mu(t,dt) = \left(\frac{8\pi k_B T}{\mu}\right)^{1/2}\frac{(r_a+r_b)^2}{V}\;dt\,. \tag{4.61}$$

Apart from natural constants the expression contains the macroscopic quantities, volume V and temperature T, as well as the molecular parameters, the radii ra and rb and the reduced mass µ.
¹² The absolute time t comes into play because the positions of the molecules, ra and rb, and their velocities, va and vb, depend on t.
¹³ Implicitly in the derivation we made use of the infinitesimally small size of dt. Only if the distance v dt is vanishingly small can the possibility of collisional interference of a third molecule be neglected.
A collision is a necessary but not a sufficient condition for a reaction to take place, and therefore we introduce a collision-conditioned reaction probability pµ, the probability that a randomly selected pair of colliding Rµ reactant molecules will indeed react according to Rµ. By multiplication of independent probabilities we have

$$\pi_\mu(t,dt) = p_\mu\,\pi^*_\mu(t,dt)\,,$$

and with respect to equation (4.59) we find

$$\gamma_\mu = p_\mu\left(\frac{8\pi k_B T}{\mu}\right)^{1/2}\frac{(r_a+r_b)^2}{V}\,. \tag{4.62}$$
As said before, it is crucial for the forthcoming analysis that γµ is independent
of dt and this will be the case if and only if pµ does not depend on dt. This
is highly plausible for the above given definition, and an illustrative check
through the detailed examination of bimolecular reactions can be found in
[3, pp.413-417]. It has to be remarked, however, that the application of
classical collision theory to molecular details of chemical reactions can be
an illustrative and useful heuristic at best, because the molecular domain
falls into the realm of quantum phenomena and any theory that aims at a
derivation of reaction probabilities from first principles has to be built upon
a quantum mechanical basis.
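As a numerical sketch (assuming NumPy), equation (4.62) is straightforward to evaluate; the molecular radii, masses, and the reaction probability pµ below are illustrative values, not taken from the text:

```python
import numpy as np

# Specific probability rate parameter, eq. (4.62), for a bimolecular reaction.
kB = 1.380649e-23          # Boltzmann's constant [J/K]
T  = 300.0                 # temperature [K]
V  = 1e-18                 # reaction volume [m^3], ~ one cubic micrometre
ra, rb = 2.0e-10, 2.5e-10  # molecular radii [m] (illustrative)
ma, mb = 4.7e-26, 5.3e-26  # molecular masses [kg] (illustrative)
p_mu = 1e-3                # collision-conditioned reaction probability (illustrative)

mu = ma * mb / (ma + mb)   # reduced mass of the colliding pair
gamma_mu = p_mu * np.sqrt(8 * np.pi * kB * T / mu) * (ra + rb) ** 2 / V
print(gamma_mu)            # reaction probability per unit time for one pair [1/s]
```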
4.4.2.2 Monomolecular, trimolecular, and other reactions
A monomolecular reaction is of the form A −→ C and describes the spontaneous conversion

$$\mathrm S_a \longrightarrow \mathrm S_c\,. \tag{4.63}$$

One molecule Sa is converted into one molecule Sc. This reaction is different from a catalyzed conversion,

$$\mathrm S_a + \mathrm S_b \longrightarrow \mathrm S_c + \mathrm S_b\,, \tag{4.58'}$$
where the conversion A −→ C is initiated by a collision of an A molecule with a B molecule,¹⁴ and a description as an ordinary bimolecular process is straightforward.

The true monomolecular conversion (4.63) is driven by some quantum mechanical mechanism, similarly to the case of the radioactive decay of a nucleus. Time-dependent perturbation theory in quantum mechanics [108, pp. 724-739] shows that almost all weakly perturbed energy-conserving transitions have linear probabilities of occurrence in time intervals δt, when δt is microscopically large but macroscopically small. Therefore, to a good approximation, the probability for a radioactive nucleus to decay within the next infinitesimal time interval dt is of the form α dt, where α is some time-independent constant. On the basis of this analogy we may expect πµ(t, dt), the probability for a monomolecular conversion, to be approximately of the form γµ dt with γµ being independent of dt.
Trimolecular reactions of the form

$$\mathrm S_a + \mathrm S_b + \mathrm S_c \longrightarrow \mathrm S_d + \ldots \tag{4.64}$$

need not be considered, because collisions of three particles occur only with a probability of measure zero. There may be, however, special situations where the approximation of complicated processes by trimolecular events is justified. One example is a set of three coupled reactions with four reactant molecules [109, pp. 359-361], where it was shown that πµ(t, dt) is essentially linear in dt.

The last class of reactions to be considered here is no proper chemical reaction but an influx of material into the reactor. It is often denoted as the zeroth order reaction (4.1a):

$$* \longrightarrow \mathrm S_a\,. \tag{4.65}$$

Here, the definition of the influx and the efficient mixing or homogeneity condition is helpful, because it guarantees that the number of molecules entering the homogeneous system is a constant and does not depend on dt.

¹⁴ Remembering what has been said in subsection 3.6.2, the two reactions are related by rigorous thermodynamics: whenever the catalyzed reaction is described, an incorporation of the uncatalyzed process in the reaction mechanism is required.
4.4.3 Simulation of chemical master equations
So far we have succeeded in deriving the fundamental fact that for each elementary reaction channel Rµ with µ = 1, . . . , M, which is accessible to the molecules of a well-mixed and thermally equilibrated system in the gas phase, there exists a scalar quantity γµ, independent of dt, such that [3, p. 418]

γµ dt = probability that a randomly selected combination of Rµ reactant molecules at time t will react accordingly in the next infinitesimal time interval [t, t+dt[. (4.66)

The specific probability rate constant γµ is one of three quantities that are required to fully characterize a particular reaction channel Rµ. In addition we shall require a function hµ(n), where the vector n = (n1, . . . , nN)′ contains the exact numbers of all molecules at time t, ~N(t) = (N1(t), . . . , NN(t))′ = n(t),

hµ(n) ≡ the number of distinct combinations of Rµ reactant molecules in the system when the numbers of molecules Sk are exactly nk with k = 1, . . . , N, (4.67)

and an N × M matrix of integers S = {νkµ; k = 1, . . . , N, µ = 1, . . . , M}, where

νkµ ≡ the change in the Sk molecular population caused by the occurrence of one Rµ reaction. (4.68)
The functions hµ(n) and the matrix S are readily deduced by inspection of the algebraic structure of the reaction channels. We illustrate by means of an example:

$$R_1:\;\mathrm S_1+\mathrm S_2 \longrightarrow \mathrm S_3+\mathrm S_4\,, \qquad R_2:\;2\,\mathrm S_1 \longrightarrow \mathrm S_1+\mathrm S_5\,, \qquad R_3:\;\mathrm S_3 \longrightarrow \mathrm S_5\,. \tag{4.69}$$
The functions hµ(n) are obtained by simple combinatorics,

$$h_1(n) = n_1 n_2\,, \qquad h_2(n) = n_1(n_1-1)/2\,, \qquad h_3(n) = n_3\,,$$

and the matrix S is of the form

$$\mathbb S = \begin{pmatrix} -1 & -1 & 0\\ -1 & 0 & 0\\ +1 & 0 & -1\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix}.$$

It is worth noticing that the functional form of hµ is determined exclusively by the reactant side of Rµ. In particular, it has precisely the same form as the mass action term in the deterministic kinetic equations, with the exception that the particle numbers have to be counted exactly in small systems – n(n−1) instead of n², for example. The stoichiometric matrix S counts the net production of molecular species per elementary reaction event: νkµ is the number of molecules Sk produced by one occurrence of reaction Rµ; these numbers are integers, and negative values indicate the number of molecules which have disappeared during one reaction. In the forthcoming analysis we shall make use of the vectors corresponding to individual reactions Rµ: νµ = (ν1µ, . . . , νNµ)′.
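A small sketch (assuming NumPy) encoding the mechanism (4.69) through the matrix S and the combinatorial functions hµ(n); the rate parameters γµ below are placeholder values:

```python
import numpy as np

# Mechanism (4.69): R1: S1+S2 -> S3+S4, R2: 2 S1 -> S1+S5, R3: S3 -> S5.
# Columns of S are the state-change vectors nu_mu (products minus reactants).
S = np.array([[-1, -1,  0],
              [-1,  0,  0],
              [ 1,  0, -1],
              [ 1,  0,  0],
              [ 0,  1,  1]])

def h(n):
    """Combinatorial functions h_mu(n) for the three reaction channels."""
    return np.array([n[0] * n[1],            # h1 = n1*n2
                     n[0] * (n[0] - 1) / 2,  # h2 = n1*(n1-1)/2
                     n[2]], dtype=float)     # h3 = n3

gamma = np.array([1.0, 0.5, 2.0])   # placeholder rate parameters gamma_mu
n = np.array([100, 50, 0, 0, 0])
print(gamma * h(n))                 # propensities a_mu = gamma_mu * h_mu(n)
```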
Analogy to deterministic kinetics. It is illustrative to consider now the analogy to conventional chemical kinetics. If we denote the concentration vector of our molecular species Sk by x = (x1, . . . , xN)′ and the flux vector by ϕ = (ϕ1, . . . , ϕM)′, the kinetic equation can be expressed by

$$\frac{d\mathrm x}{dt} = \mathbb S\cdot\varphi\,. \tag{4.70}$$

The individual elements of the flux vector in mass action kinetics are

$$\varphi_\mu = k_\mu\prod_{k=1}^{N} x_k^{g_{k\mu}} \quad\text{for the reaction}\quad g_{1\mu}\,\mathrm S_1 + g_{2\mu}\,\mathrm S_2 + \ldots + g_{N\mu}\,\mathrm S_N \longrightarrow \text{products}\,,$$
wherein the factors gkµ are the stoichiometric coefficients on the reactant side of the reaction equations. It is sometimes useful to define analogous factors qkµ for the product side; both classes of factors can be summarized in matrices G and Q, and then the stoichiometric matrix is simply given by the difference S = Q − G. We illustrate by means of the model mechanism (4.69) in our example:

$$\mathbb Q - \mathbb G = \begin{pmatrix} 0 & +1 & 0\\ 0 & 0 & 0\\ +1 & 0 & 0\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix} - \begin{pmatrix} +1 & +2 & 0\\ +1 & 0 & 0\\ 0 & 0 & +1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} -1 & -1 & 0\\ -1 & 0 & 0\\ +1 & 0 & -1\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix} = \mathbb S\,.$$

We remark that the entries of G and Q are nonnegative integers by definition. The flux ϕ has the same structure as in the stochastic approach: γµ corresponds to the kinetic rate parameter or rate constant kµ, and the combinatorial function hµ and the mass action product are identical apart from the simplifications for large particle numbers.
Occurrence of reactions. The probability of the occurrence of reaction events within an infinitesimal time interval dt is cast into three theorems:

Theorem 1. If ~X(t) = n, then the probability that exactly one Rµ reaction will occur in the system within the time interval [t, t+dt[ is equal to γµ hµ(n) dt + o(dt), where o(dt) denotes terms that approach zero faster than dt.

Theorem 2. If ~X(t) = n, then the probability that no reaction will occur within the time interval [t, t+dt[ is equal to 1 − Σµ γµ hµ(n) dt + o(dt).

Theorem 3. The probability of more than one reaction occurring in the system within the time interval [t, t+dt[ is of order o(dt).

Proofs for all three theorems are found in [3, pp. 420-421].
Based on the three theorems an analytical description of the evolution of the population vector ~X(t) can be given. The initial state of the system at some initial time t0 is fixed: ~X(t0) = n0. Although there is no chance to derive a deterministic equation for the time-evolution of ~X(t) itself, a deterministic equation for the time-evolution of the probability function P(n, t|n0, t0) for t ≥ t0 will be obtained. We express the probability P(n, t+dt|n0, t0) as the sum of the probabilities of several mutually exclusive and collectively exhaustive routes from ~X(t0) = n0 to ~X(t+dt) = n. These routes are distinguished from one another with respect to the event that happened in the time interval [t, t+dt[:

$$P(n,t+dt|n_0,t_0) = P(n,t|n_0,t_0)\times\left(1-\sum_{\mu=1}^{M}\gamma_\mu h_\mu(n)\,dt+o(dt)\right) + \sum_{\mu=1}^{M}P(n-\nu_\mu,t|n_0,t_0)\times\bigl(\gamma_\mu h_\mu(n-\nu_\mu)\,dt+o(dt)\bigr) + o(dt)\,. \tag{4.71}$$
The different routes from ~X(t0) = n0 to ~X(t+dt) = n are obvious from the balance equation (4.71):

(i) One route from ~X(t0) = n0 to ~X(t+dt) = n is given by the first term on the right-hand side of the equation: no reaction occurs in the time interval [t, t+dt[ and hence ~X(t) = n was fulfilled already at time t. The joint probability for route (i) is therefore the probability to be in ~X(t) = n, conditioned by ~X(t0) = n0, times the probability that no reaction has occurred in [t, t+dt[. In other words, the probability for this route is the probability to go from n0 at time t0 to n at time t and to stay in this state during the next interval dt.

(ii) An alternative route from ~X(t0) = n0 to ~X(t+dt) = n is accounted for by one particular term in the sum on the right-hand side of the equation: an Rµ reaction occurs in the time interval [t, t+dt[ and hence ~X(t) = n − νµ was fulfilled at time t. The joint probability for route (ii) is therefore the probability to be in ~X(t) = n − νµ, conditioned by ~X(t0) = n0, times the probability that exactly one Rµ reaction has occurred in [t, t+dt[. In other words, the probability for this route is the probability to go from n0 at time t0 to n − νµ at time t and to undergo an Rµ reaction during the next interval dt. Obviously, the same consideration is valid for every elementary reaction and we have M terms of this kind.

(iii) A third possibility – neither no reaction nor exactly one reaction chosen from the set {Rµ; µ = 1, . . . , M} – must inevitably invoke more than one reaction within the time interval [t, t+dt[. The probability for such events, however, is o(dt) or of measure zero by theorem 3.

All routes (i) and (ii) are mutually exclusive, since different events take place within the last interval [t, t+dt[.
The last step to derive the chemical master equation is straightforward: P(n, t|n0, t0) is subtracted from both sides of equation (4.71), then both sides are divided by dt, the limit dt ↓ 0 is taken, all o(dt) terms vanish, and finally we obtain

$$\frac{\partial}{\partial t}P(n,t|n_0,t_0) = \sum_{\mu=1}^{M}\Bigl(\gamma_\mu h_\mu(n-\nu_\mu)\,P(n-\nu_\mu,t|n_0,t_0) - \gamma_\mu h_\mu(n)\,P(n,t|n_0,t_0)\Bigr)\,. \tag{4.72}$$

Initial conditions are required to calculate the time evolution of the probability P(n, t|n0, t0), and we can easily express them in the form

$$P(n,t_0|n_0,t_0) = \begin{cases} 1\,, & \text{if } n = n_0\,,\\ 0\,, & \text{if } n \ne n_0\,, \end{cases} \tag{4.72'}$$

which is precisely the initial condition used in the derivation of equation (4.71). Any sharp probability distribution P(nk, t0|nk⁽⁰⁾, t0) = δ(nk − nk⁽⁰⁾) is admitted for the molecular particle numbers at t0. The assumption of extended initial distributions is, of course, also possible, but the corresponding master equation becomes more sophisticated.
4.4.4 The simulation algorithm
The chemical master equation (4.72) as derived in the last subsection 4.4.3 is closely related to a stochastic simulation algorithm for chemical reactions [1, 2, 4], and it is important to realize how the simulation tool fits into the general theoretical framework of the chemical master equation. The algorithm is not based on the probability function P(n, t|n0, t0) but on another, related probability density p(τ, µ|n, t), which expresses the probability that, given ~X(t) = n, the next reaction in the system will occur in the infinitesimal time interval [t+τ, t+τ+dτ[ and will be an Rµ reaction.

Figure 4.10: Partitioning of the time interval [t, t+τ+dτ[. The entire interval is subdivided into (k+1) nonoverlapping subintervals. The first k intervals are of equal size ε = τ/k and the (k+1)-th interval is of length dτ.
Considering the theory of random variables, p(τ, µ|n, t) is the joint density function of two random variables: (i) the time to the next reaction, τ, and (ii) the index of the next reaction, µ. The possible values of the two random variables are given by the domain of the real variable 0 ≤ τ < ∞ and the integer variable 1 ≤ µ ≤ M. In order to derive an explicit formula for the probability density p(τ, µ|n, t) we introduce the quantity

$$a(n) = \sum_{\mu=1}^{M}\gamma_\mu h_\mu(n)$$

and consider the time interval [t, t+τ+dτ[ to be partitioned into k+1 subintervals, k > 1. The first k of these intervals are chosen to be of equal length ε = τ/k, and together they cover the interval [t, t+τ[, leaving the interval [t+τ, t+τ+dτ[ as the remaining (k+1)-th part (figure 4.10).
With ~X(t) = n the probability p(τ, µ|n, t) describes the event of no reaction occurring in each of the k ε-size subintervals and exactly one Rµ reaction in the final infinitesimal dτ interval. Making use of theorems 1 and 2 and the multiplication law of probabilities we find

$$p(\tau,\mu|n,t) = \bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k\,\bigl(\gamma_\mu h_\mu(n)\,d\tau+o(d\tau)\bigr)\,.$$

Dividing both sides by dτ and taking the limit dτ ↓ 0 yields

$$p(\tau,\mu|n,t) = \bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k\,\gamma_\mu h_\mu(n)\,.$$

This equation is valid for any integer k > 1 and hence its validity is also guaranteed for k → ∞. Next we rewrite the first factor on the right-hand side of the equation,

$$\bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k = \left(1-\frac{a(n)\,k\varepsilon-k\,o(\varepsilon)}{k}\right)^{k} = \left(1-\frac{a(n)\,\tau-\tau\,o(\varepsilon)/\varepsilon}{k}\right)^{k},$$

and take now the limit k → ∞, whereby we make use of the simultaneously occurring convergence o(ε)/ε ↓ 0:

$$\lim_{k\to\infty}\bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k = \lim_{k\to\infty}\left(1-\frac{a(n)\,\tau}{k}\right)^{k} = e^{-a(n)\,\tau}\,.$$

By substituting this result into the initial equation for the probability density of the occurrence of a reaction we find

$$p(\tau,\mu|n,t) = a(n)\,e^{-a(n)\,\tau}\;\frac{\gamma_\mu h_\mu(n)}{a(n)} = \gamma_\mu h_\mu(n)\;e^{-\sum_{\nu=1}^{M}\gamma_\nu h_\nu(n)\,\tau}\,. \tag{4.73}$$
Equation (4.73) provides the mathematical basis for the stochastic simulation algorithm. Given ~X(t) = n, the probability density consists of two independent factors, where the first factor describes the time to the next reaction and the second factor the index of the next reaction. These factors correspond to two statistically independent random variables, which will be sampled by means of the uniform random numbers r1 and r2.
4.4.5 Implementation of the simulation algorithm
Equation (4.73) is implemented now for computer simulation, and we inspect the probability densities of the two random variables in order to find the conditions to be imposed on a statistically exact sample pair (τ, µ) generated from two unit-interval uniform random numbers r1 and r2: the time τ has an exponential density function with decay constant a(n),

$$\tau = \frac{1}{a(n)}\,\ln(1/r_1)\,, \tag{4.74a}$$

and µ is taken to be the smallest integer m which fulfils

$$\mu = \inf\left\{m\;\Big|\;\sum_{\nu=1}^{m}\gamma_\nu h_\nu(n) > a(n)\,r_2\right\}. \tag{4.74b}$$

After the values for τ and µ have been determined accordingly, the advancement of the state vector ~X(t) of the system takes place:

$$\vec{\mathcal X}(t) = n \;\longrightarrow\; \vec{\mathcal X}(t+\tau) = n+\nu_\mu\,.$$

Repeated application of the advancement procedure is the essence of the stochastic simulation algorithm. It is important to realize that this advancement procedure is exact as far as r1 and r2 are obtained by fair samplings from a unit-interval uniform random number generator; in other words, the correctness of the procedure depends on the quality of the random number generator applied. Two further issues are important: (i) the algorithm operates with an internal time control that corresponds to the real time of the chemical process, and (ii) contrary to the situation in differential equation solvers, the discrete time steps are not finite-interval approximations of an infinitesimal time step; instead, the population vector ~X(t) maintains the value ~X(t) = n throughout the entire finite time interval [t, t+τ[ and then changes abruptly to ~X(t+τ) = n + νµ at the instant t+τ when the Rµ reaction occurs. In other words, there is no blind interval during which the algorithm is unable to record changes.
Table 4.1: The combinatorial functions hµ(n) for elementary reactions. Reactions are ordered with respect to reaction order, which in the case of mass action is identical to the molecularity of the reaction. Order zero implies that no reactant molecule is involved and the products come from an external source, for example from the influx in a flow reactor. Orders 1, 2, and 3 mean that one, two, or three molecules are involved in the elementary step, respectively.

No.  Reaction                   Order  hµ(n)
1    ∗ −→ products              0      1
2    A −→ products              1      nA
3    A + B −→ products          2      nA nB
4    2A −→ products             2      nA(nA − 1)/2
5    A + B + C −→ products      3      nA nB nC
6    2A + B −→ products         3      nA(nA − 1) nB/2
7    3A −→ products             3      nA(nA − 1)(nA − 2)/6
Structure of the algorithm. The time evolution of the population is described by the vector ~X(t) = n(t), which is updated after every individual reaction event. Reactions are chosen from the set R = {Rµ; µ = 1, . . . , M}, which is defined by the reaction mechanism under consideration. They are classified according to the criteria listed in table 4.1. The reaction probabilities corresponding to the reaction rates of deterministic kinetics are contained in a vector a(n) = (γ1 h1(n), . . . , γM hM(n))′, which is also updated after every individual reaction event. Updating is performed according to the stoichiometric vectors νµ of the individual reactions Rµ, which represent the columns of the stoichiometric matrix S. We repeat that the combinatorial functions hµ(n) are determined exclusively by the reactant side of the reaction equation, whereas the stoichiometric vectors νµ represent the net production, (products) − (reactants).
The algorithm comprises five steps:

(i) Step 0. Initialization: The time variable is set to t = 0, the initial values of all N variables X1, . . . , XN for the species – Xk for species Sk – are stored, the values of the M parameters of the reactions Rµ, γ1, . . . , γM, are stored, and the combinatorial expressions are incorporated as factors for the calculation of the reaction rate vector a(n) according to table 4.1 and the probability density P(τ, µ). Sampling times t1 < t2 < · · · and the stopping time tstop are specified, the first sampling time is set to t1 and stored, and the pseudorandom number generator is initialized by means of seeds or at random.

(ii) Step 1. Monte Carlo step: A pair of random numbers (τ, µ) is created by the random number generator according to the joint probability density P(τ, µ). In essence two explicit methods can be used: the direct method and the first-reaction method.

(iii) Step 2. Propagation step: (τ, µ) is used to advance the simulation time t and to update the population vector n, t → t + τ and n → n + νµ; then all changes are incorporated in a recalculation of the reaction rate vector a.

(iv) Step 3. Time control: Check whether or not the simulation time has been advanced through the next sampling time ti, and for t > ti send the current t and the current n(t) to the output storage and advance the sampling time, ti → ti+1. Then, if t > tstop or if no more reactant molecules remain, leading to hµ = 0 ∀ µ = 1, . . . , M, finalize the calculation by switching to step 4; otherwise continue with step 1.

(v) Step 4. Termination: Prepare for final output by setting flags for early termination or other unforeseen stops, send the final time t and the final n to the output storage, and terminate the computation.

A caveat is needed for the integration of stiff systems, where the values of individual variables can vary by many orders of magnitude; such a situation might catch the calculation in a trap by slowing down the progress of time.
The Monte Carlo step. Pseudorandom numbers are drawn from a random number generator of sufficient quality, whereby quality is meant in terms of no or very long recurrence cycles and the closeness of the distribution of the pseudorandom numbers r to the uniform distribution on the unit interval:

$$0\le\alpha<\beta\le 1 \;\Longrightarrow\; P(\alpha\le r\le\beta) = \beta-\alpha\,.$$

With this prerequisite we discuss now two methods which use two output values r of the pseudorandom number generator to generate a random pair (τ, µ) with the prescribed probability density function P(τ, µ).
The direct method. The two-variable probability density is written as the
product of two one-variable density functions:
P (τ,µ) = P1(τ) · P2(µ|τ) .
Here, P1(τ) dτ is the probability that the next reaction will occur between
times t + τ and t + τ + dτ , irrespective of which reaction it might be, and
P2(µ|τ) is the probability that the next reaction will be an Rµ given that
the next reaction occurs at time t+ τ .
By the addition theorem of probabilities, P1(τ) dτ is obtained by summation of P(τ, µ) dτ over all reactions Rµ:

P1(τ) = Σ_{µ=1}^{M} P(τ, µ) . (4.75)

Combining the last two equations we obtain for P2(µ|τ):

P2(µ|τ) = P(τ, µ) / Σ_{ν=1}^{M} P(τ, ν) . (4.76)
Equations (4.75) and (4.76) express the two one-variable density functions in terms of the original two-variable density function P(τ, µ). From equation (4.73) we substitute into P(τ, µ) = p(τ, µ|n, t), simplifying the notation by using

aµ ≡ γµ hµ(n) and a = Σ_{µ=1}^{M} aµ ≡ Σ_{µ=1}^{M} γµ hµ(n) ,
and find

P1(τ) = a exp(−a τ) , 0 ≤ τ < ∞ , and
P2(µ|τ) = P2(µ) = aµ/a , µ = 1, . . . , M . (4.77)
As indicated, in this particular case P2(µ|τ) turns out to be independent of τ. Both one-variable density functions are properly normalized over their domains of definition:

∫_0^∞ P1(τ) dτ = ∫_0^∞ a e^{−aτ} dτ = 1 and Σ_{µ=1}^{M} P2(µ) = Σ_{µ=1}^{M} aµ/a = 1 .
Thus, in the direct method a random value τ is created from a random number r1 on the unit interval and the distribution P1(τ) by taking

τ = −(ln r1)/a . (4.78)
The second task is to generate a random integer µ according to P2(µ|τ) in
such a way that the pair (τ,µ) will be distributed as prescribed by P (τ,µ).
For this goal another random number, r2, will be drawn from the unit interval
and then µ is taken to be the integer that fulfils
Σ_{ν=1}^{µ−1} aν < r2 a ≤ Σ_{ν=1}^{µ} aν . (4.79)
The values a1, a2, . . . are cumulatively added in sequence until their sum is observed to equal or exceed r2 a, and then µ is set equal to the index of the last aν term that had been added. Rigorous justifications for equations (4.78) and (4.79) are found in [1, pp.431-433]. If a fast and reliable uniform random number generator is available, the direct method can be easily programmed and rapidly executed. Thus it represents a simple, fast, and rigorous procedure for the implementation of the Monte Carlo step of the simulation algorithm.
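As an illustration – our own sketch, not the code of [1], and with 0-based indexing of the reactions – the direct method may be written in Python as follows; equation (4.78) produces τ and equation (4.79) the index µ:

    import math

    def direct_method(a, a0, rng):
        """Draw (tau, mu) from P(tau, mu); a is the list (a_1,...,a_M), a0 = sum(a)."""
        r1, r2 = rng.random(), rng.random()
        tau = -math.log(1.0 - r1) / a0         # equation (4.78); 1-r1 avoids log(0)
        threshold, partial = r2 * a0, 0.0
        for mu, a_mu in enumerate(a):          # cumulative summation, equation (4.79)
            partial += a_mu
            if partial >= threshold:
                return tau, mu
        return tau, len(a) - 1                 # guard against round-off errors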
The first-reaction method. This alternate method for the implementa-
tion of the Monte Carlo step of the simulation algorithm is not quite as
efficient as the direct method but it is worth presenting here because it adds
insight into the stochastic simulation approach. Adopting again the notation
aν ≡ γνhν(n) it is straightforward to derive
Pν(τ) dτ = aν exp(−aν τ) dτ (4.80)
from (4.66) and (4.67). Then, Pν(τ) would indeed be the probability at time
t for an Rν reaction to occur in the time interval [t+ τ, t+ τ+dτ [ were it not
for the fact that the number of Rν reactant combinations might have been
altered between t and t+ τ by the occurrence of other reactions. Taking this
into account, a tentative reaction time τν for Rν is generated according to
the probability density function Pν(τ), and in fact, the same can be done for
all reactions Rµ. We draw a random number rν from the unit interval and
compute
τν = −(ln rν)/aν , ν = 1, . . . , M . (4.81)
From these M tentative next reactions the one which occurs first is chosen to be the actual next reaction:

τ = smallest τν for all ν = 1, . . . , M ,
µ = the ν for which τν is smallest . (4.82)
Daniel Gillespie [1, pp.420-421] provides a straightforward proof that the
random (τ,µ) obtained by the first reaction method is in full agreement
with the probability density P (τ,µ) from equation (4.73).
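A sketch of the corresponding routine – again our own illustration, interchangeable with direct_method in the simulation loop given above – reads:

    import math

    def first_reaction_method(a, a0, rng):
        """Tentative times tau_nu from equation (4.81); the earliest wins, (4.82).
        (a0 is accepted only to keep the call signature of direct_method.)"""
        tau, mu = float("inf"), None
        for nu, a_nu in enumerate(a):
            if a_nu > 0.0:
                tau_nu = -math.log(1.0 - rng.random()) / a_nu   # equation (4.81)
                if tau_nu < tau:
                    tau, mu = tau_nu, nu                        # equation (4.82)
        return tau, mu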
It is tempting to try to extend the first-reaction method by letting the second next reaction be the one for which τν has the second smallest value. This, however, is in conflict with correct updating of the vector of particle numbers, n, because the results of the first reaction are not incorporated into the combinatorial terms hµ(n). Using the second earliest reaction would, for example, allow the second reaction to involve molecules already destroyed in the first reaction but would not allow the second reaction to involve molecules created in the first reaction.
Thus, the first-reaction method is just as rigorous as the direct method, and it is probably easier to implement in computer code than the direct method. From the point of view of computational efficiency, however, the direct method is preferable, because for M ≥ 3 it requires fewer random numbers, and hence the first-reaction method is wasteful. This question of economic use of computer time is not unimportant because stochastic simulations in general tax the random number generator quite heavily. For M ≥ 3, and in particular for large M, the direct method is probably the method of choice for the Monte Carlo step.
An early computer code of the simple version of the algorithm described here – still in FORTRAN – is found in [1]. Meanwhile many attempts have been made to speed up computations and to allow for the simulation of stiff systems (see, e.g., [110]). A recent review of the simulation methods also contains a discussion of various improvements of the original code [4].
5. Applications of stochastic processes in
biology
Compared to stochasticity in chemistry, stochastic phenomena in biology are not only more important but also much harder to control. The major sources of the problem are small particle numbers and the lack of sufficiently simple reference systems that are accessible to experimental studies. In biology we regularly encounter reaction mechanisms that lead to an enhancement of fluctuations under non-equilibrium conditions, and biology in essence deals with processes and stationary states far away from equilibrium, whereas in chemistry autocatalysis in non-equilibrium systems became an object of general interest and intensive investigation only some forty years ago. We start therefore with the analysis of simple autocatalysis modeled by means of a simple birth-and-death process. Then we present an overview of solvable birth-and-death processes (section 5.1) and discuss the role of boundaries in the form of different barriers (section 5.1.2). In section 5.2 we come back to the size expansion for stochastic processes and analyze it with biological problems in focus. Finally, the Poisson representation is presented in section 5.3, because it finds very useful applications in biology.
5.1 Autocatalysis, replication, and extinction
In the previous chapter we already analyzed bimolecular reactions, the addition and the dimerization reaction, which gave rise to perfectly normal behavior although the analysis was quite sophisticated (subsection 4.2.2). The nonlinearity became manifest in the task of finding solutions but did not effectively change the qualitative behavior of the reaction systems; for example, the √N law for the fluctuations in the stationary states retained its validity. As an exactly solvable example we shall first study a simple reaction mechanism
consisting of two elementary steps, replication and extinction. In this case the √N law is not valid and the fluctuations do not settle down to some value proportional to the square root of the size of the system but grow in time without limit, as we saw in the case of the Wiener process (3.4.3).
5.1.1 Autocatalytic growth and death
Reproduction of individuals is modeled by a simple duplication mechanism
and death is represented by first order decay. In the language of chemical
kinetics these two steps are:
A + X −−λ−→ 2X , (5.1a)
X −−µ−→ B . (5.1b)
The rate parameters for reproduction and extinction are denoted by λ and µ, respectively.1 The material required for reproduction is assumed to be replenished as it is consumed, and hence the amount of A available is constant and assumed to be included in the birth parameter: λ = f · [A]. The degradation product B does not enter the kinetic equation because reaction (5.1b) is irreversible. The stochastic process corresponding to equations (5.1) belongs to the class of linear birth-and-death processes with w+(n) = λ · n and w−(n) = µ · n.2 The master equation is of the form
∂Pn(t)/∂t = λ (n−1) Pn−1(t) + µ (n+1) Pn+1(t) − (λ+µ) n Pn(t) , (5.2)
1 Reproduction is to be understood as asexual reproduction here. Sexual reproduction, of course, requires two partners and gives rise to a process of order 2 (table 4.1).
2 Here we use the symbols commonly applied in biology: λ(n) for birth, µ(n) for death, ν for immigration, and ρ for emigration (tables 5.1 and 5.2). These notions were created especially for application to biological problems, in particular for problems in theoretical ecology. Other notions and symbols are common in chemistry: a birth corresponds to the production of a molecule, f ≡ λ, a death to its decomposition or degradation through a chemical reaction, d ≡ µ. Influx and outflux are the proper notions for immigration and emigration.
Figure 5.1: A growing linear birth-and-death process. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ and µ, respectively. The upper part shows the evolution of the probability density, Pn(t) = Prob{X(t) = n}. The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases with time. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. Parameters used: n0 = 100, λ = √2, and µ = 1/√2; sampling times (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.5 (magenta), 0.75 (red), and 1.0 (yellow).
and after introduction of the probability generating function g(s, t) gives rise to the PDE

∂g(s, t)/∂t − (s − 1)(λs − µ) ∂g(s, t)/∂s = 0 . (5.3)
Figure 5.2: A decaying linear birth-and-death process. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ and µ, respectively. The upper part shows the evolution of the probability density, Pn(t) = Prob{X(t) = n}. The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases, but then sharpens again as the process approaches the absorbing barrier at n = 0. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. Parameters used: n0 = 40, λ = 1/√2, and µ = √2; sampling times (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.35 (blue), 0.65 (violet), 1.0 (magenta), 1.5 (red), 2.0 (orange), 2.5 (yellow), and lim t→∞ (black).
Solution of this PDE yields different results for different or equal replication and extinction rate coefficients, λ ≠ µ and λ = µ, respectively. In the first case we substitute γ = λ/µ (≠ 1) and η(t) = exp((λ − µ)t), and find:

g(s, t) = [ ((η(t) − 1) + (γ − η(t)) s) / ((γη(t) − 1) + γ(1 − η(t)) s) ]^{n0} and

Pn(t) = γ^n Σ_{m=0}^{min(n,n0)} (−1)^m C(n0+n−m−1, n−m) C(n0, m) × ((1 − η(t))/(1 − γη(t)))^{n0+n−m} × ((γ − η(t))/(γ(1 − η(t))))^m , (5.4)

where C(n, k) denotes the binomial coefficient.
In the derivation of the expression for the probability distribution we expand numerator and denominator of the expression in the generating function g(s, t), using the sums (1 + s)^n = Σ_{k=0}^{n} C(n, k) s^k and (1 + s)^{−n} = 1 + Σ_{k=1}^{∞} (−1)^k [n(n+1)· · ·(n+k−1)/k!] s^k, multiply, order the terms with respect to powers of s, and compare with the expansion of the generating function, g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n.
Computations of expectation value and variance are straightforward:

E(N_X(t)) = n0 e^{(λ−µ)t} and
σ²(N_X(t)) = n0 [(λ+µ)/(λ−µ)] e^{(λ−µ)t} (e^{(λ−µ)t} − 1) . (5.5)
Illustrative examples of linear birth-and-death processes with growing (λ >
µ) and decaying (λ < µ) populations are shown in figures 5.1 and 5.2, re-
spectively.
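Equation (5.5) is easily checked by simulation. The following sketch – with the parameter values of figure 5.1, everything else being our own choice – runs an ensemble of Gillespie trajectories of mechanism (5.1) and compares the sample mean and variance at time t with the analytical expressions:

    import math, random

    def birth_death_run(n0, lam, mu, t_end, rng):
        """One trajectory of A + X -> 2X (rate lam*n) and X -> B (rate mu*n)."""
        t, n = 0.0, n0
        while n > 0:
            t += -math.log(1.0 - rng.random()) / ((lam + mu) * n)
            if t > t_end:
                break
            n += 1 if rng.random() < lam / (lam + mu) else -1
        return n

    rng = random.Random(42)
    n0, lam, mu, t_end, runs = 100, 2**0.5, 2**-0.5, 1.0, 5000
    sample = [birth_death_run(n0, lam, mu, t_end, rng) for _ in range(runs)]
    mean = sum(sample) / runs
    var = sum((n - mean)**2 for n in sample) / (runs - 1)
    w = math.exp((lam - mu) * t_end)
    print(mean, n0 * w)                                     # expectation, (5.5)
    print(var, n0 * (lam + mu) / (lam - mu) * w * (w - 1))  # variance, (5.5)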
In the degenerate case of neutrality with respect to growth, µ = λ, the
same procedure yields:

g(s, t) = [ (λt + (1 − λt) s) / (1 + λt − λt s) ]^{n0} , (5.6a)

Pn(t) = (λt/(1+λt))^{n0+n} Σ_{m=0}^{min(n,n0)} C(n0+n−m−1, n−m) C(n0, m) ((1 − λ²t²)/(λ²t²))^m , (5.6b)

E(N_X(t)) = n0 , and (5.6c)

σ²(N_X(t)) = 2 n0 λ t . (5.6d)
Comparison of the last two expressions shows the inherent instability of this
reaction system. The expectation value is constant whereas the fluctuations
increase with time. The degenerate birth-and-death process is illustrated in
figure 5.3. The case of steadily increasing fluctuations is in contrast to an
equilibrium situation where both expectation value and variance approach constant values. Recalling the Ehrenfest urn game, where fluctuations were negatively correlated with the deviation from equilibrium, we have here two uncorrelated processes, replication and extinction. The particle number n fulfils a kind of random walk on the natural numbers, and indeed in the case of the random walk (see equation (3.30) in subsection 3.4.2) we had also obtained a constant expectation value E = n0 and a variance that increases linearly with time, σ²(t) = 2ϑ(t − t0).

A constant expectation value accompanied by a variance that increases with time has an easily recognized consequence: there is a critical time, tcr = n0/(2λ), above which the standard deviation exceeds the expectation value. From this instant on, predictions of the evolution of the system based on the expectation value become obsolete. Then we have to rely on individual probabilities or other quantities. Useful in this context is the probability of extinction of all particles, which can be readily computed:

P0(t) = (λt/(1 + λt))^{n0} . (5.7)
Provided we wait long enough, the system will die out with probability one,
since we have limt→∞ P0(t) = 1. This seems to be a contradiction to the
Figure 5.3: Probability density of a linear birth-and-death process with equal birth and death rates. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ = µ. The upper and the middle part show the evolution of the probability density, Pn(t) = Prob(X(t) = n). The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases, but then sharpens again as the process approaches the absorbing barrier at n = 0. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. The variance increases linearly with time, and at t = n0/(2λ) = 50 the standard deviation is as large as the expectation value. Parameters used: n0 = 100, λ = 1; sampling times, upper part: t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.49999 (magenta), 0.99999 (red), 2.0 (orange), 10 (yellow); middle part: t = 10 (yellow), 20 (green), 50 (cyan), 100 (blue), and lim t→∞ (black).
constant expectation value. As a matter of fact it is not: In almost all
individual runs the system will go extinct, but there are very few cases of
probability measure zero where the particle number grows to infinity for
t→∞. These rare cases are responsible for the finite expectation value.
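The distribution (5.6b) and the extinction probability (5.7) can be evaluated directly; the following sketch (with an arbitrarily chosen small example, n0 = 5 and λt = 0.5) verifies the normalization as well as equations (5.6c), (5.6d), and (5.7):

    import math

    def p_n(n, n0, lt):
        """Equation (5.6b) in the degenerate case mu = lambda; lt stands for lambda*t."""
        z = (1.0 - lt * lt) / (lt * lt)
        s = sum(math.comb(n0 + n - m - 1, n - m) * math.comb(n0, m) * z**m
                for m in range(min(n, n0) + 1))
        return (lt / (1.0 + lt))**(n0 + n) * s

    n0, lt = 5, 0.5
    probs = [p_n(n, n0, lt) for n in range(400)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean)**2 * p for n, p in enumerate(probs))
    print(sum(probs), mean, var)              # -> 1, n0, 2*n0*lt  (5.6c,d)
    print(probs[0], (lt / (1.0 + lt))**n0)    # extinction probability, (5.7)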
Equation (5.7) can be used to derive a simple model for random selection [111]. We assume a population of n different species:

A + Xj −−λ−→ 2Xj , j = 1, . . . , n , (5.1a')
Xj −−µ−→ B , j = 1, . . . , n . (5.1b')
The joint probability distribution of the population is described by

P_{x1...xn} = P(X1(t) = x1, . . . , Xn(t) = xn) = P^{(1)}_{x1} · . . . · P^{(n)}_{xn} , (5.8)

wherein all probability distributions for the individual species are given by equation (5.6b), and the independence of individual birth events as well as death events allows for the simple product expression. In the spirit of Motoo Kimura's neutral theory of evolution [7] all birth and all death parameters are assumed to be equal, λj = λ and µj = µ for all j = 1, . . . , n. For convenience we assume that every species is initially present in a single copy: P_{nj}(0) = δ_{nj,1}.
Figure 5.4: The distribution of sequential extinction times Tk. Shown are the expectation values E(Tk) for n = 20 according to equation (5.10). Since E(T0) diverges, T1 is the last extinction event that occurs, on average, at a finite time. A single species is present above T1, and random selection has occurred in the population.
We introduce a new random variable that has the nature of a first passage time: Tk is the time up to the extinction of n − k species, and we characterize it as a sequential extinction time. Accordingly, n species are present in the population between Tn, which fulfils Tn ≡ 0 by definition, and Tn−1, n − 1 species between Tn−1 and Tn−2, and eventually a single species between T1 and T0, which is the moment of extinction of the entire population. After T0 no particle X exists any more.
Next we consider the probability distribution of the sequential extinction times,

Hk(t) = P(Tk < t) . (5.9)

The probability of extinction of the whole population is readily calculated: since individual reproduction and extinction events are independent, we find

H0 = P_{0,...,0} = P^{(1)}_0 · . . . · P^{(n)}_0 = (λt/(1 + λt))^n .
The event T1 < t can happen in several ways: either X1 is present and all other species have become extinct already, or only X2 is present, or only X3, and so on; but T1 < t is also fulfilled if the whole population has died out:

H1 = P_{x1≠0,0,...,0} + P_{0,x2≠0,...,0} + · · · + P_{0,0,...,xn≠0} + H0 .
The probability that a given species has not yet disappeared is obtained by exclusion, since existence and nonexistence are complementary,

P_{x≠0} = 1 − P0 = 1 − λt/(1 + λt) = 1/(1 + λt) ,

which yields the expression for the presence of a single species,

H1(t) = (n + λt)(λt)^{n−1} / (1 + λt)^n ,

and by similar arguments a recursion formula is found for the extinction probabilities with higher indices,

Hk(t) = C(n, k) (λt)^{n−k} / (1 + λt)^n + Hk−1(t) ,

that eventually leads to the expression

Hk(t) = Σ_{j=0}^{k} C(n, j) (λt)^{n−j} / (1 + λt)^n .
The moments of the sequential extinction times are computed straightforwardly by means of a handy trick: Hk is partitioned into the terms for the individual powers of λt, Hk(t) = Σ_{j=0}^{k} hj(t), with

hj(t) = C(n, j) (λt)^{n−j} / (1 + λt)^n ,

which are then differentiated with respect to time t:

dhj(t)/dt = h′j = [λ/(1 + λt)^{n+1}] ( C(n, j)(n − j)(λt)^{n−j−1} − C(n, j) j (λt)^{n−j} ) .

The summation of the derivatives is simple because h′k + h′k−1 + . . . + h′0 is a telescopic sum, and we find

dHk(t)/dt = C(n, k) (n − k) λ^{n−k} t^{n−k−1} / (1 + λt)^{n+1} .
Making use of the definite integral [112, p.338]

∫_0^∞ t^{n−k} / (1 + λt)^{n+1} dt = λ^{−(n−k+1)} / (k C(n, k)) ,

we finally obtain for the expectation values of the sequential extinction times

E(Tk) = ∫_0^∞ (dHk(t)/dt) t dt = (n − k)/k · 1/λ , n ≥ k ≥ 1 , (5.10)
and E(T0) = ∞ (see figure 5.4). It is worth recognizing here another paradox of probability theory: although extinction is certain, the expectation value for the time to extinction diverges. Similarly to the expectation values, we calculate the variances of the sequential extinction times:

σ²(Tk) = [n(n − k) / (k²(k − 1))] · 1/λ² , n ≥ k ≥ 2 , (5.11)

from which we see that the variance diverges for k = 0 and k = 1.
For distinct birth parameters, λ1, . . . , λn, and different initial particle
numbers, x1(0), . . . , xn(0), the expressions for the expectation values become
considerably more complicated, but the main conclusion remains unaffected:
E(T1) is finite whereas E(T0) diverges.
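Equation (5.10) is easily verified numerically. The substitution u = λt/(1 + λt) – our own convenience, it maps [0, ∞[ onto [0, 1[ – turns E(Tk) = ∫ t (dHk/dt) dt into C(n,k)(n − k)/λ · ∫_0^1 u^{n−k}(1 − u)^{k−1} du, which a simple midpoint rule handles without any truncation of the time axis:

    import math

    def mean_Tk(n, k, lam, steps=20000):
        """Numerical E(T_k): midpoint rule for the transformed integral (a sketch)."""
        h, total = 1.0 / steps, 0.0
        for i in range(steps):
            u = (i + 0.5) * h
            total += u**(n - k) * (1.0 - u)**(k - 1)
        return total * h * math.comb(n, k) * (n - k) / lam

    n, lam = 20, 1.0
    for k in (1, 2, 5, 10, 19):
        print(k, mean_Tk(n, k, lam), (n - k) / (k * lam))   # equation (5.10)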
5.1.2 Boundaries in one step birth-and-death processes
One step birth-and-death processes have been studied extensively and analytical solutions are available in table form [19]. For transition probabilities at most linear in n, w+(n) = ν + λn and w−(n) = ρ + µn, one distinguishes birth (λ), death (µ), immigration (ν), and emigration (ρ) terms. Analytical solutions exist for all one step birth-and-death processes whose transition probabilities are not of higher order than linear in the particle number n.
It is necessary, however, to consider also the influence of boundaries on
these processes. For this goal we define an interval [a, b] for the stochastic
process. There are two classes of boundary conditions, absorbing and reflect-
ing boundaries. In the former case, a particle that left the interval is not
allowed to return to it whereas the latter boundary implies that it is forbid-
den to exit from the interval. Boundary conditions can be easily implemented
by ad hoc definitions of transition probabilities:
Reflecting Absorbing
Boundary at a w−(a) = 0 w+(a− 1) = 0
Boundary at b w+(b) = 0 w−(b+ 1) = 0
The reversible chemical reaction with w−(n) = k1n and w+(n) = k2(n0 − n), for example, had two reflecting barriers at a = 0 and b = n0. Among the examples we have studied so far we found an absorbing boundary in the replication-extinction process between N = 1 and N = 0, tantamount to the boundary a = 1; the state n = 0 is an end point of all trajectories reaching it.
Compared, for example, to an unrestricted random walk on positive and
negative integers, n ∈ Z, a chemical reaction or a biological process has to
be restricted by definition, n ∈ N0, since negative particle numbers are not
allowed. In general, the one step birth-and-death master equation (4.8),
∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) ,

is not restricted to n ∈ N0 and thus does not automatically fulfil the proper boundary conditions to model a chemical reaction. What we need is a modification of the equation at n = 0 which introduces a proper boundary of the process:

∂P0(t)/∂t = w−(1) P1(t) − w+(0) P0(t) . (4.8')
This occurs naturally if w−(n) vanishes for n = 0, which is always the case when the constant (emigration) term vanishes, ρ = 0. With w−(0) = 0 we only need to make sure that P−1(t) = 0 in order to obtain equation (4.8'). This will be the case whenever we take an initial state with Pn(0) = 0 ∀ n < 0, and it is certainly true for our conventional initial condition, Pn(0) = δn,n0 with n0 ≥ 0. By the same token we prove that the upper reflecting boundary for chemical reactions, b = n0, fulfils the conditions of being natural too.
Equipped with natural boundary conditions the stochastic process can be
solved for the entire integer range, n ∈ Z, which is often much easier than
with artificial boundaries. All the barriers we have encountered so far were
natural.
Table 5.1: Comparison of results for some unrestricted processes. Data are taken from [19, pp.10,11]. Abbreviations and notations: γ ≡ λ/µ, σ ≡ e^{(λ−µ)t}, (n, n0) ≡ min{n, n0}, and In(x) is the modified Bessel function.

Poisson (pure immigration): λn = ν, µn = 0 [16]
  g_{n0}(s, t) = s^{n0} e^{ν(s−1)t}
  P_{n,n0}(t) = [(νt)^{n−n0}/(n − n0)!] e^{−νt} , n ≥ n0
  mean: n0 + νt ; variance: νt

Poisson (pure emigration): λn = 0, µn = ρ [16]
  g_{n0}(s, t) = s^{n0} e^{ρ(1−s)t/s}
  P_{n,n0}(t) = [(ρt)^{n0−n}/(n0 − n)!] e^{−ρt} , n ≤ n0
  mean: n0 − ρt ; variance: ρt

Immigration and emigration: λn = ν, µn = ρ [113]
  g_{n0}(s, t) = s^{n0} e^{−(ν+ρ)t+(νs+ρ/s)t}
  P_{n,n0}(t) = (ν/ρ)^{(n−n0)/2} I_{n0−n}(2t√(νρ)) e^{−(ν+ρ)t}
  mean: n0 + (ν − ρ)t ; variance: (ν + ρ)t

Birth: λn = λn, µn = 0 [114]
  g_{n0}(s, t) = (1 − e^{λt}(1 − 1/s))^{−n0}
  P_{n,n0}(t) = C(n−1, n0−1) e^{−n0λt} (1 − e^{−λt})^{n−n0} , n ≥ n0
  mean: n0 e^{λt} ; variance: n0 e^{λt}(e^{λt} − 1)

Death: λn = 0, µn = µn [114]
  g_{n0}(s, t) = (1 − e^{−µt}(1 − s))^{n0}
  P_{n,n0}(t) = C(n0, n) e^{−nµt} (1 − e^{−µt})^{n0−n} , n ≤ n0
  mean: n0 e^{−µt} ; variance: n0 e^{−µt}(1 − e^{−µt})

Immigration and death: λn = ν, µn = µn [16]
  g_{n0}(s, t) = (1 − e^{−µt}(1 − s))^{n0} exp( ν(s − 1)(1 − e^{−µt})/µ )
  P_{n,n0}(t) = exp(−(ν/µ)(1 − e^{−µt})) Σ_{k=0}^{(n,n0)} C(n0, k) [e^{−µtk}(1 − e^{−µt})^{n+n0−2k}/(n − k)!] (ν/µ)^{n−k}
  mean: n0 e^{−µt} + (ν/µ)(1 − e^{−µt}) ; variance: (ν/µ + n0 e^{−µt})(1 − e^{−µt})

Birth and death: λn = λn, µn = µn [114]
  g_{n0}(s, t) = [ ((σ−1) + (γ−σ)s) / ((γσ−1) + γ(1−σ)s) ]^{n0}
  P_{n,n0}(t) = γ^n Σ_{k=0}^{(n,n0)} (−1)^k C(n+n0−k−1, n−k) C(n0, k) ((1−σ)/(1−γσ))^{n+n0−k} ((1−σ/γ)/(1−σ))^k
  mean: n0 σ ; variance: n0 σ (γ+1)(σ−1)/(γ−1)

Birth and death with λ = µ: λn = λn, µn = λn
  g_{n0}(s, t) = [ (λt + (1−λt)s) / (1 + λt − λt s) ]^{n0}
  P_{n,n0}(t) = (λt/(1+λt))^{n+n0} Σ_{k=0}^{(n,n0)} C(n0, k) C(n+n0−k−1, n−k) ((1−λ²t²)/(λ²t²))^k
  mean: n0 ; variance: 2 n0 λ t
Table 5.2: Comparison of results for some restricted processes. Data are taken from [19, pp.16,17]. Abbreviations and notations used in the table: γ ≡ λ/µ, σ ≡ e^{(λ−µ)t}, α ≡ (ν/ρ)^{(n−n0)/2} e^{−(ν+ρ)t}; In = I−n ≡ In(2(νρ)^{1/2} t), where In(x) is a modified Bessel function; Gn ≡ Gn(ξj, γ) and Ḡn ≡ Gn(ξ̄j, γ), where Gn is a Gottlieb polynomial, Gn(x, γ) ≡ γ^n Σ_{k=0}^{n} (1 − γ^{−1})^k C(n, k) C(x, k) = γ^n F(−n, −x; 1; 1 − γ^{−1}), with F a hypergeometric function; ξj and ξ̄j are the roots of G_{u−l}(ξj, γ) = 0, j = 0, . . . , u−l−1, and of G_{u−l+1}(ξ̄j, γ) = γ G_{u−l}(ξ̄j, γ), j = 0, . . . , u−l, respectively; Hn ≡ Hn(ζj, γ) and H̄n ≡ Hn(ζ̄j, γ) with Hn(x, γ) = Gn(x, γ^{−1}), where H_{u−l}(ζj, γ) = 0, j = 0, . . . , u−l−1, and H_{u−l+1}(ζ̄j, γ) = H_{u−l}(ζ̄j, γ)/γ, j = 0, . . . , u−l, respectively.

λn = ν, µn = ρ; u: absorbing, l: −∞ [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} − I_{2u−n−n0} )

λn = ν, µn = ρ; u: +∞, l: absorbing [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} − I_{n+n0−2l} )

λn = ν, µn = ρ; u: reflecting, l: −∞ [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} + (ν/ρ)^{1/2} I_{2u+1−n−n0} + (1 − ρ/ν) Σ_{j=2}^{∞} (ν/ρ)^{j/2} I_{2u−n−n0+j} )

λn = ν, µn = ρ; u: +∞, l: reflecting [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} + (ν/ρ)^{1/2} I_{n+n0+1−2l} + (1 − ρ/ν) Σ_{j=2}^{∞} (ν/ρ)^{j/2} I_{n+n0−2l+j} )

λn = ν, µn = ρ; u: absorbing, l: absorbing [16, 115]
  P_{n,n0}(t) = α ( Σ_{k=−∞}^{∞} I_{n−n0+2k(u−l)} − Σ_{k=0}^{∞} ( I_{n+n0−2l+2k(u−l)} + I_{2l−n−n0+2k(u−l)} ) )

λn = λ(n−l+1), µn = µ(n−l); u: absorbing, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{l−n} Σ_{k=0}^{u−l−1} G_{n0−l} G_{n−l} σ^{ξk} ( Σ_{j=0}^{u−l−1} Gj γ^{j} )^{−1}

λn = λ(n−l+1), µn = µ(n−l); u: reflecting, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{l−n} Σ_{k=0}^{u−l} Ḡ_{n0−l} Ḡ_{n−l} σ^{ξ̄k} ( Σ_{j=0}^{u−l} Ḡj γ^{j} )^{−1}

λn = λ(u−n), µn = µ(u−n+1); u: reflecting, l: absorbing [116, 117]
  P_{n,n0}(t) = γ^{u−n} Σ_{k=0}^{u−l−1} H_{u−n0} H_{u−n} σ^{−ζk} ( Σ_{j=0}^{u−l−1} Hj γ^{j} )^{−1}

λn = λ(u−n), µn = µ(u−n+1); u: reflecting, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{u−n} Σ_{k=0}^{u−l} H̄_{u−n0} H̄_{u−n} σ^{−ζ̄k} ( Σ_{j=0}^{u−l} H̄j γ^{j} )^{−1}
For the sake of completeness we summarize the conditions, which can be
introduced in the master equation in order to sustain reflecting or absorbing
barriers for processes described by forward and backward master equations
on the interval [a, b] (see e.g. [6, pp.283-284]).
Forward master equation on [a, b]
Reflecting Absorbing
Boundary at a w−(a)Pa(t) = w+(a− 1)Pa−1(t) Pa−1(t) = 0
Boundary at b w+(b)Pb(t) = w−(b+ 1)Pb+1(t) Pb+1(t) = 0
Backward master equation on [a, b]
Reflecting Absorbing
Boundary at a P (., .|a− 1, t′) = P (., .|a, t′) P (., .|a− 1, t′) = 0
Boundary at b P (., .|b+ 1, t′) = P (., .|b, t′) P (., .|b+ 1, t′) = 0
An overview of a few selected birth-and-death processes is given in tables 5.1 and 5.2. Commonly, unrestricted and restricted processes are distinguished [19]. An unrestricted process is characterized by the possibility of reaching all states N(t) = n. A requirement imposed by physics demands that all changes in state space be finite for finite times (growth condition in subsection 3.5.3), and hence the probabilities of reaching infinity at finite times must vanish: lim_{n→±∞} P_{n,n0} = 0. The linear birth-and-death process in table 5.1 is unrestricted only in the positive direction, and the state N(t) = 0 is special because it represents an absorbing barrier. The restriction is hidden here and met by the condition P_{n,n0}(t) = 0 ∀ n < 0.
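In a computer simulation, the ad hoc definitions of the transition probabilities that implement the boundary conventions of this subsection can be collected in a small wrapper. The following Python sketch is our own illustration; names and flags are arbitrary:

    def rates_with_boundaries(n, a, b, w_plus, w_minus, lower="refl", upper="refl"):
        """One step rates on [a, b]: 'refl' suppresses the jump that would leave
        the interval, 'abs' suppresses the jump that would lead back into it."""
        wp, wm = w_plus(n), w_minus(n)
        if lower == "refl" and n == a:       wm = 0.0   # w-(a) = 0
        if lower == "abs" and n == a - 1:    wp = 0.0   # w+(a-1) = 0
        if upper == "refl" and n == b:       wp = 0.0   # w+(b) = 0
        if upper == "abs" and n == b + 1:    wm = 0.0   # w-(b+1) = 0
        return wp, wm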
5.2 Size expansion in biology
In the previous chapter we introduced a size expansion for the chemical master equation (section 4.3.3). Here we shall introduce this useful technique once more by means of a simple example, the spreading of an epidemic, which is nevertheless sufficiently general to be transferable to other cases [98, pp.251-258].
Before we discuss this example, however, we come back to the birth-and-death master equation (4.8) for one step processes, W(n|n′) = w+(n′) δ_{n,n′+1} + w−(n′) δ_{n,n′−1}, where w+(n) and w−(n) are usually analytic functions, which we shall assume to be (at least) twice differentiable. We repeat the one step master equation:

∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) ,

and define a single step difference operator D by

D f(n) = f(n + 1) and D^{−1} f(n) = f(n − 1) . (5.12)

Using this operator we can rewrite the master equation and find

∂P(t)/∂t = { (D − 1) w−(n) + (D^{−1} − 1) w+(n) } P(t) . (4.8')

The jump moments are now

αp(n) = w+(n) + (−1)^p w−(n) . (4.6')

We repeat the macroscopic rate equation,

d⟨n⟩/dt = w+(⟨n⟩) − w−(⟨n⟩) ,
and find for the coupled equations for expectation value and variance the simpler expressions

d⟨n⟩/dt = w+(⟨n⟩) − w−(⟨n⟩) + (1/2) [ d²w+(⟨n⟩)/dn² − d²w−(⟨n⟩)/dn² ] σn² , (5.13a)

dσn²/dt = w+(⟨n⟩) + w−(⟨n⟩) + 2 [ dw+(⟨n⟩)/dn − dw−(⟨n⟩)/dn ] σn² , (5.13b)
Figure 5.5: The macroscopic part of a stochastic variable N. The variable n is partitioned according to equation (5.15) into a macroscopic part and the fluctuations around it, n = Ωφ(t) + Ω^{1/2} x(t), wherein Ω is a size parameter, for example the size of the population or the volume of the system. Computations: Ωφ(t) = 5 n0 (1 − 0.8 e^{−kt}) with n0 = 2 and k = 0.5; the fluctuating part is illustrated by the Gaussian density p(n, t) = e^{−(n−Ωφ(t))²/(2σ²)}/√(2πσ²) with σ = 0.1, 0.17, 0.24, 0.285, 0.30.
and we are now in a position to handle the example by means of an expansion technique (see [98, pp.251-254] and section 4.3.3).

An epidemic spreads in a population of Ω individuals. We assume that n(t) individuals are already infected. The probability of a new infection is proportional both to the number of infected and to the number of uninfected individuals, w+(n) = β n (Ω − n). No cure is possible and thus
w−(n) = 0. Finally, we have

W(n|n′) = β δ_{n,n′+1} n′ (Ω − n′) ,

which leads to the master equation

∂Pn(t)/∂t = β (n−1)(Ω−n+1) Pn−1(t) − β n (Ω−n) Pn(t) or
∂P/∂t = β (D^{−1} − 1) n (Ω − n) P(t) . (5.14)
Basic to the expansion is the idea that the density of the stochastic variable N can be split into a macroscopic part, Ωφ(t), and fluctuations of order Ω^{1/2} around it. As shown in figure 5.5 we assume that P(n, t) is represented by a (relatively) sharp peak located approximately at Ωφ(t) with a width of order Ω^{1/2}. In other words, we assume that the fluctuations fulfil a √N law, and we make the ansatz

n(t) = Ωφ(t) + Ω^{1/2} x(t) , (5.15)

where x is a new variable describing the fluctuations and the function φ(t) has to be chosen in accordance with the master equation. As said above, Ωφ(t) is called the macroscopic part and Ω^{1/2} x the fluctuating part of n. We may refer to the new variables as an Ω language. The probability density of n now becomes a probability density Π(x, t) of x:

P(n, t) ∆n = Π(x, t) ∆x , Π(x, t) = Ω^{1/2} P(Ωφ(t) + Ω^{1/2} x, t) . (5.16)
Differentiation yields3

∂Π/∂x = Ω ∂P/∂n , ∂Π/∂t = Ω^{1/2} ( Ω (dφ/dt) ∂P/∂n + ∂P/∂t ) ,

and eventually we obtain

Ω^{1/2} ∂P/∂t = ∂Π/∂t − Ω^{1/2} (dφ/dt) ∂Π/∂x . (5.17)

3 The somewhat unclear differentiation ∂P/∂n can be circumvented through direct variation of t by δt and simultaneously of x by −Ω^{1/2} (dφ/dt) δt, which leads to the same final result.
Now the difference operators are also size expanded in power series of differential operators:

D = 1 + Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² + . . . , (5.18a)

D^{−1} = ( 1 + Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² + . . . )^{−1} =
 = 1 − Ω^{−1/2} ∂/∂x − (1/2) Ω^{−1} ∂²/∂x² + Ω^{−1} ∂²/∂x² + . . . ≈
 ≈ 1 − Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² , (5.18b)

D^{−1} − 1 = −Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² . (5.18c)
Insertion of the operator and substitution of the new variables into the master equation (5.14) yields, after cancellation of an overall factor Ω^{−1/2},

∂Π/∂t − Ω^{1/2} (dφ/dt) ∂Π/∂x = βΩ² ( −Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² ) ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) .

The right-hand side requires two consecutive differentiations of three factors:

−Ω^{−1/2} ∂/∂x ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) =
 = −Ω^{−1} (1 − 2φ − 2Ω^{−1/2}x) Π(x, t) − Ω^{−1/2} (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) ∂Π/∂x ,

(1/2) Ω^{−1} ∂²/∂x² ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) =
 = (1/2) Ω^{−1} (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) ∂²Π/∂x² + O(Ω^{−3/2}) .
For convenience we introduce a new time scale, τ = βΩt, in order to absorb one factor Ω – and for convenience also the factor β – into the time variable. Collecting the terms corresponding to the largest powers of Ω now yields

Ω^{1/2} : (dφ/dτ) ∂Π(x, τ)/∂x = φ(1 − φ) ∂Π(x, τ)/∂x ,

Ω^{0} : ∂Π(x, τ)/∂τ = −(1 − 2φ) ∂/∂x ( x Π(x, τ) ) + (1/2) φ(1 − φ) ∂²Π(x, τ)/∂x² .
The largest term cancels if

dφ/dτ = φ (1 − φ) , (5.19')

and this yields the differential equation for the macroscopic variable φ(t), which after transformation back into the original variables leads to the macroscopic rate equation

dn/dt = β n (Ω − n) . (5.19)

The terms of order Ω^{0} constitute a linear Fokker-Planck equation with time dependent coefficients φ(t):

∂Π(x, τ)/∂τ = −(1 − 2φ) ∂/∂x ( x Π(x, τ) ) + (1/2) φ(1 − φ) ∂²Π(x, τ)/∂x² . (5.20)
Equation (5.20) describes the fluctuations of the random variable N(t) around the macroscopic part, and these fluctuations are of order Ω^{1/2}, as expected and initially assumed.

The strategy for solving the master equation (5.14) is now obvious: at first one determines φ(τ) by integrating the differential equation (5.19') with the initial value φ(0) = n0/Ω, then one solves the Fokker-Planck equation (5.20) with the initial condition Π(x, 0) = δ(x), and finally one obtains the desired probabilities from

P(n, t|n0, 0) = Ω^{−1/2} Π( (n − Ωφ(τ))/Ω^{1/2} , τ ) . (5.21)

A typical solution is sketched in figure 5.5, and it compares perfectly with the exact solutions for sufficiently large systems (see, for example, figures 4.4 and 4.5). Recalling the derivation, we remark that terms of relative order Ω^{−1/2} and smaller have been neglected.
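A numerical sketch of the recipe: the logistic equation (5.19') has the closed-form solution φ(τ) = φ0/(φ0 + (1 − φ0)e^{−τ}), and for the linear Fokker-Planck equation (5.20) the variance of Π obeys dσ²/dτ = 2(1 − 2φ)σ² + φ(1 − φ). This moment equation is standard for linear Fokker-Planck equations but is our own addition, and all parameter values below are arbitrary illustrations.

    import math

    def phi(tau, phi0):
        """Closed-form solution of dphi/dtau = phi(1 - phi), equation (5.19')."""
        return phi0 / (phi0 + (1.0 - phi0) * math.exp(-tau))

    def sigma2(tau, phi0, steps=100000):
        """Euler integration of dsigma2/dtau = 2(1-2phi)sigma2 + phi(1-phi),
        starting from the sharp initial condition Pi(x,0) = delta(x)."""
        s2, h = 0.0, tau / steps
        for i in range(steps):
            p = phi(i * h, phi0)
            s2 += h * (2.0 * (1.0 - 2.0 * p) * s2 + p * (1.0 - p))
        return s2

    omega, n0, beta, t = 1000, 20, 1.0e-4, 30.0
    tau = beta * omega * t                 # rescaled time tau = beta*Omega*t
    m = omega * phi(tau, n0 / omega)       # location of the peak of P(n,t)
    s = math.sqrt(omega * sigma2(tau, n0 / omega))
    print(m, s)   # by (5.21), P(n,t) is approximately Gaussian with mean m, width s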
5.3 The Poisson representation
For master equations that are constructed on the basis of combinatorial or mass action kinetics, an expansion of the probability distribution in Poisson distributions provides a useful technique for the setup of Fokker-Planck equations and stochastic differential equations, which are exactly equivalent to birth-and-death master equations with many variables as they are used for the description of chemical or biological reaction networks. The expansion method is called Poisson representation and has been developed by Crispin Gardiner and Subhash Chaturvedi [118, 119] (see also [6, pp.301-335]). In this context the system size expansion takes on the form of a small noise expansion of the Fokker-Planck equation obtained by the Poisson representation.

In simple cases the Poisson representation yields stochastic differential equations which can be solved straightforwardly, but there are other cases where the corresponding stochastic differential equation can be formulated only in the complex plane. Then the gauge Poisson representation [120] provides an appropriate frame for solving the more complicated cases, and recent applications of the theory to practical problems in population biology have demonstrated the usefulness of the approach [121].
In a very wide class of systems the time development is considered as
the result of individual encounters between members of a population. Such
systems comprise, for example, (i) chemical reactions being caused by colli-
sions between molecules, (ii) biological population dynamics resulting from
giving birth, mating, eating each other and dying, and (iii) systems in epi-
demiology where diseases are transmitted by contacts between individuals.
Encounter based time evolution gives rise to combinatorial kinetics, which on the macroscopic level is known as mass action kinetics in chemistry. We therefore include a consideration of birth-and-death processes in many variables in this section (subsection 5.3.2).
5.3.1 Motivation of the Poisson representation
Statistical physics relates microscopic processes to macroscopic quantities and thus provides the link between the realm of atoms and molecules and thermodynamics. Under the conditions of an ideal gas or an ideal solution of reacting species the probability distributions of microstates are assumed to be Poissonian, and the correctness or suitability of this assumption has been well established empirically. Considering a system of n molecular species Sj involved in m reactions Rµ we obtain for the distribution function in the grand canonical ensemble4

P(σk) = N · exp( β ( Σj µj nj(σk) − εk ) ) , (5.22)

where σk is a microscopic state characterized by an index (set) 'k', nj(σk) is the number of particles of species Sj in state σk, εk is the energy of this particular state, µj is the chemical potential of species Sj, N is a normalization factor, and β = 1/(kBT) with T being the (absolute) temperature in Kelvin.
Chemical reactions convert chemical species, and consequently equilibrium thermodynamics requires that the chemical potentials fulfil certain relations. If a state σk is supposed to be converted into a state σℓ then

Σj ν^{(S)}_j nj(σk) = Σi ν^{(S)}_i ni(σℓ) , S = 1, 2, 3, . . . ,

which implies the stoichiometric restrictions on the reaction system. The canonical and the grand canonical ensembles are defined by the requirements

Σj ν^{(S)}_j nj(σk) = τ^{(S)} and Σk P(σk) Σj ν^{(S)}_j nj(σk) ≡ Σj ν^{(S)}_j ⟨nj⟩ = τ^{(S)} ,
4In statistical thermodynamics three ensembles are distinguished [122, pp.513-518]:
The microcanonical ensemble referring to constant energy and maximizing entropy, the
canonical ensemble defined for constant volume and minimizing Helmholtz free energy,
and eventually the grand canonical ensemble is defined by constant chemical potentials µj
and the quantity minimized is the product p V .
respectively. The grand canonical probability distribution results from maximizing the entropy at fixed mean energy under the stoichiometric constraint, and this implies that the chemical potential satisfies the relation

µj = ΣS ν^{(S)}_j µS . (5.23)

Eventually one finds for the probability distribution of the population numbers of the individual states nj:

P({nj}) = N exp( β Σj µj nj ) Πj (1/nj!) ( Σk exp(−β ε^{(A)}_k) )^{nj} . (5.24)

Herein the ε^{(A)}_k-values are the energy eigenstates of a single molecule of A. Equation (5.24) is a multivariate Poissonian distribution with the expectation values

⟨nj⟩ = exp(β µj) Σk exp(−β ε^{(A)}_k) . (5.25)
Equation (5.25) combined with the chemical potential (5.23) yields the law of mass action. Implementation of the stronger constraint for the canonical ensemble leads to

P({nj}) = N ΠA δ( Σj ν^{(A)}_j nj , τ^{(A)} ) × Πj (1/nj!) ( Σk exp(−β ε^{(A)}_k) )^{nj} . (5.26)
Local fluctuations can be included in the considerations by partitioning the system into individual cells interrelated by transport; the stoichiometric relations then involve summations over all cells [118]. The canonical distribution allows for a straightforward proof that one obtains locally Poissonian distributions, which become uncorrelated in the large volume limit. For local calculations there is no difference between the canonical and the grand canonical distribution, but the latter is much easier to handle for the description of thermodynamic equilibrium.

The application of pure statistical mechanics thus leads to Poissonian distributions, locally as well as globally, and it is therefore suggestive to approximate otherwise hard to derive probability distributions from master equations by multivariate Poissonians.
5.3.2 Many variable birth-and-death systems
Combinatorial kinetics is best introduced by means of examples, and we shall start by illustrating a reversible dimerization process (see subsubsection 4.2.2.2): X ⇌ 2Y. The forward reaction X → 2Y occurs by a kind of spontaneous fission giving rise to two identical molecules Y. Such a spontaneous process can be visualized as a kind of degenerate encounter involving just one molecule of X. For the transition in the molecular population we have the probability

w(x → x − 1, y → y + 2) = k+ x ,

and for the reverse reaction, 2Y → X, pairs of molecules Y have to be assembled, which have a combinatorial probability of y(y − 1)/2, and hence we find

w(x → x + 1, y → y − 2) = k− y(y − 1) .
Chemical multi-variate master equation. Generalization to a reaction system with n reaction components or chemical species – reactants and/or products Xj – involved in m individual reactions Rµ, where nµG and nµQ are the numbers of variable chemical reactant species and product species,5 respectively, yields:

Σ_{j=1}^{nµG} ν^{(µG)}_j Xj ⇌[k+µ, k−µ] Σ_{i=1}^{nµQ} ν^{(µQ)}_i Xi with µ = 1, 2, . . . , m . (5.27)

The stoichiometric coefficients ν^{(µG)}_j and ν^{(µQ)}_i represent the numbers of molecules of species Xj or Xi, respectively, which are involved in the elementary step of reaction Rµ (5.27).6 The coefficients are properly understood as
5Concentrations or particle numbers of species that are kept constant are assumed to
be incorporated into rate parameters.6For consistency we shall use the formal indices G and Q for the reactant side and the
product side of a general reaction Rµ, respectively. For clarity we use different indices
for summation and products – ‘j’ for reactants and ‘i’ for products – although this is not
demanded by mathematical rules.
elements of three matrices: (i) the stoichiometric reactant matrix G, (ii) the stoichiometric product matrix Q, and (iii) the stoichiometric matrix S, which fulfil the relation Q − G = S. The particle numbers or concentrations of the individual chemical species are subsumed in the vector

x(t) = ([X1], [X2], . . . , [Xn]) = (x1, x2, . . . , xn)′ ,

and making use of the individual, reaction specific columns of the three matrices G, Q, and S, we obtain the changes in particle numbers for one elementary step of the reaction Rµ. The stoichiometric matrix collects the column vectors ν^{(µ)}:

S = ( ν^{(1)}, ν^{(2)}, . . . , ν^{(µ)}, . . . , ν^{(m)} ) with ν^{(µ)} = (ν^{(µ)}_1, ν^{(µ)}_2, . . . , ν^{(µ)}_n)′ .
For each reaction we define a vector

r^{(µ)} = α ( ν^{(µ)}_1, ν^{(µ)}_2, . . . , ν^{(µ)}_n )′ , (5.28)

which, in principle, accounts for α steps of the reaction Rµ. It is straightforward now to write down the changes in particle numbers caused by α elementary steps of reaction Rµ in either direction, forward or backward:

x → x + r^{(µ)} in the forward direction, and
x → x − r^{(µ)} in the backward direction. (5.29)
The reaction rates or transition probabilities are readily calculated from the reaction rate parameters and the combinatorics of molecular encounters:

w+µ = k+µ Π_{j=1}^{nµG} xj! / (xj − ν^{(µG)}_j)! and
w−µ = k−µ Π_{i=1}^{nµQ} xi! / (xi − ν^{(µQ)}_i)! . (5.30)
For large particle numbers and in macroscopic deterministic kinetics the combinatorial expressions are approximated by the terms with the highest power in the particle numbers:

w+µ ≈ k+µ Π_{j=1}^{nµG} xj^{ν^{(µG)}_j} / ν^{(µG)}_j! and w−µ ≈ k−µ Π_{i=1}^{nµQ} xi^{ν^{(µQ)}_i} / ν^{(µQ)}_i! .
Finally, we obtain the master equation for an arbitrary chemical reaction step (5.27) of a reaction network,

∂P(x, t)/∂t = Σ_{µ=1}^{m} ( w−µ(x + r^{(µ)}) P(x + r^{(µ)}, t) − w+µ(x) P(x, t) + w+µ(x − r^{(µ)}) P(x − r^{(µ)}, t) − w−µ(x) P(x, t) ) , (5.31)

or written in terms of the original parameters:

∂P(x, t)/∂t = Σ_{µ=1}^{m} k+µ { ( Π_{j=1}^{n} (xj + ν^{(µG)}_j − ν^{(µQ)}_j)! / (xj − ν^{(µQ)}_j)! ) P(x + ν^{(µG)} − ν^{(µQ)}, t) − ( Π_{j=1}^{n} xj! / (xj − ν^{(µG)}_j)! ) P(x, t) } +
 + Σ_{µ=1}^{m} k−µ { ( Π_{j=1}^{n} (xj − ν^{(µG)}_j + ν^{(µQ)}_j)! / (xj − ν^{(µG)}_j)! ) P(x − ν^{(µG)} + ν^{(µQ)}, t) − ( Π_{j=1}^{n} xj! / (xj − ν^{(µQ)}_j)! ) P(x, t) } . (5.31')
In equations (5.28) and (5.31) steps of any size α are permitted, the single-
step birth-and-death master equation for the chemical reaction network, how-
ever, is tantamount to the restriction α = 1 (see also subsection 4.4.3).
Generating functions. For combinatorial kinetics we can derive a fairly simple differential equation for the probability generating function

g(s, t) = Σ_{x=0}^{xmax} ( Π_{j=1}^{n} sj^{xj} ) P(x, t) , (5.32)

with xmax = (x1^{(max)}, x2^{(max)}, . . . , xn^{(max)}) being a vector collecting the (individually) maximal values of the particle numbers of the individual species and 0 = (0, 0, . . . , 0) being the zero-vector. Now we shall derive the corresponding partial differential equation. For this goal we separate the two parts corresponding to the step-up and step-down transition probabilities, w+µ and w−µ, respectively:

∂g(s, t)/∂t = ∂+g(s, t)/∂t + ∂−g(s, t)/∂t .
As follows from equation (5.30), the step-up equation is of the form

∂+g(s, t)/∂t = Σ_{µ,x} k+µ ( Π_{j=1}^{nµG} [ (xj − r^{(µ)}_j)! / (xj − r^{(µ)}_j − ν^{(µG)}_j)! ] sj^{xj} P(x − r^{(µ)}, t) − Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj} P(x, t) ) , (5.33)
whereby the step-down equation (∂−g(s, t)/∂t) has the same structure; the two equations are related by exchanging the superscripts, + ↔ −, the running indices, i ↔ j, the numbers of species, nµG ↔ nµQ,7 and the stoichiometric coefficients, ν^{(µG)}_j ↔ ν^{(µQ)}_i. Next we change the summation variable in the first term from x to x − r^{(µ)} and obtain – because the summation extends over the entire domain of x and all probabilities for x-values outside this domain vanish –

∂+g(s, t)/∂t = Σ_{µ,x} k+µ ( Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj + r^{(µ)}_j} − Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj} ) P(x, t) .
For further simplification we make use of the easily verified expressions

Πj sj^{xj} xj! / (xj − ν^{(µG)}_j)! = Πj ( ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} sj^{xj} ) sj^{ν^{(µG)}_j} and

Πj sj^{xj + r^{(µ)}_j} xj! / (xj − ν^{(µG)}_j)! = Πj ( ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} sj^{xj} ) sj^{ν^{(µQ)}_j} ,
7The numbers nµG and nµQ need not be the same in case some reactions are considered
to proceed irreversibly. An alternative approach uses nµG = nµQ and zero values for some
rate parameters.
and obtain for the step-up equation

∂+g(s, t)/∂t = Σµ k+µ ( Πi si^{ν^{(µQ)}_i} − Πj sj^{ν^{(µG)}_j} ) Πj ∂^{ν^{(µG)}_j} g(s, t) / ∂sj^{ν^{(µG)}_j} .

A similar formula is derived for the step-down equation, and summation of the step-up and the step-down equations finally yields the general expression of the differential equation for the generating function of chemical networks:

∂g(s, t)/∂t = Σµ ( Πi si^{ν^{(µQ)}_i} − Πj sj^{ν^{(µG)}_j} ) ( k+µ Πj ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} − k−µ Πi ∂^{ν^{(µQ)}_i}/∂si^{ν^{(µQ)}_i} ) g(s, t) . (5.34)
We illustrate by means of examples.
Two reactions with a single variable. The two-step reaction mechanism

A + X −−k1−→ 2X + D and
B + X ⇌[k2, k3] C (5.35)
consists of an irreversible autocatalytic reaction step and a reversible addition reaction. The concentrations of three molecular species are assumed to be fixed: [A] = a0, [B] = b0, and [C] = c0; species D does not enter the kinetic equations since it is the product of an irreversible reaction, and hence only a single variable x = [X] has to be considered. Among other applications this model is a simple implementation of the processes taking place in a nuclear reactor. The first reaction represents nuclear fission: one neutron (X) hits a nucleus A and releases two neutrons (2X) and residual products (D). The second reaction describes the absorption of neutrons by B and their creation by the reverse process.
The three stoichiometric matrices have a very simple form – n = 1 and
m = 2 – in this case:
G = (1 1) , Q = (2 0) , and S = Q − G = (1 − 1) .
The constant concentrations can be absorbed into the rate parameters and we have

k+1 = k1 a0 = α , k−1 = 0 , k+2 = k2 b0 = β , k−2 = k3 c0 = γ .

From equation (5.34) we obtain by insertion and some calculation:

∂g(s, t)/∂t = (1 − s)(β − α s) ∂g(s, t)/∂s − (1 − s) γ g(s, t) . (5.36)
Formal division by dg yields the characteristic equations

dt/1 = −ds/((1 − s)(β − αs)) = dg/(−γ(1 − s) g) ,

leading to two ODEs,

−dt = ds/((1 − s)(β − αs)) and (1/γ) dg/g = ds/(β − αs) ,

that can be readily integrated and yield the solutions

u = [(1 − s)/(β − αs)] e^{(α−β)t} and v = (β − αs)^{γ/α} g .

The general solution can now be written as v = f(u), where f(u) is some function still to be determined:

g(s, t) = (β − αs)^{−γ/α} f( [(1 − s)/(β − αs)] e^{(α−β)t} ) .
This equation sustains a variety of time-dependent solutions. From the conditional probability P(x, t|x0, 0) that represents the initial conditions we obtain g(s, 0) = s^{x0} and

f(z) = (1 − βz)^{x0} (1 − αz)^{−γ/α−x0} (β − α)^{γ/α} ,

and with λ = β − α the probability generating function takes on the form

g(s, t) = λ^{γ/α} ( β(1 − e^{−λt}) − s(α − βe^{−λt}) )^{x0} × ( (β − αe^{−λt}) − αs(1 − e^{−λt}) )^{−γ/α−x0} . (5.37)
The first and second moments of the probability distribution of the random variable X(t) are readily computed from the derivatives of the generating function by means of equation (2.71):

∂g(s, t)/∂s |_{s=1} = ⟨x(t)⟩ and ∂²g(s, t)/∂s² |_{s=1} = ⟨x(t)(x(t) − 1)⟩ . (2.71')
The computation then yields for the time derivatives

d⟨x(t)⟩/dt = (α − β) ⟨x(t)⟩ + γ and
d⟨x(t)(x(t) − 1)⟩/dt = 2(α − β) ⟨x(t)(x(t) − 1)⟩ + 2(α + γ) ⟨x(t)⟩ .

For α < β the two equations have a stationary solution and the stationary mean and variance are

⟨x⟩ = γ/(β − α) and σ²(x) = βγ/(α − β)² . (5.38)
Equation (5.37) allows for a straightforward calculation of the stationary state:

lim_{t→∞} g(s, t) = (β − α)^{γ/α} (β − sα)^{x0} (β − sα)^{−γ/α−x0} , i.e.
g(s) = ( (β − α)/(β − sα) )^{γ/α} , and
P(x) = [ Γ(x + γ/α) (α/β)^x / (Γ(γ/α) x!) ] ( (β − α)/β )^{γ/α} . (5.39)

A stationary solution exists only if α < β or [A] < k2·[B]/k1 is fulfilled. For α > β the system is unstable in the sense that ⟨x(t)⟩ diverges exponentially, lim_{t→∞} ⟨x(t)⟩ = ∞. The expectation value and variance at the steady state are readily computed from (∂g(s)/∂s)|_{s=1} and (∂²g(s)/∂s²)|_{s=1} and, of course, yield the same result as in equation (5.38).
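The stationary distribution (5.39) is a negative binomial; the following sketch (arbitrary parameter values with α < β) checks its normalization and reproduces the stationary mean and variance of equation (5.38):

    import math

    def p_stat(x, alpha, beta, gamma):
        """Stationary distribution, equation (5.39), via log-gamma for stability."""
        return math.exp(math.lgamma(x + gamma/alpha) - math.lgamma(gamma/alpha)
                        - math.lgamma(x + 1) + x * math.log(alpha/beta)
                        + (gamma/alpha) * math.log((beta - alpha)/beta))

    alpha, beta, gamma = 1.0, 2.0, 3.0
    probs = [p_stat(x, alpha, beta, gamma) for x in range(300)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean)**2 * p for x, p in enumerate(probs))
    print(sum(probs))                             # -> 1
    print(mean, gamma / (beta - alpha))           # equation (5.38)
    print(var, beta * gamma / (beta - alpha)**2)  # equation (5.38)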
Interesting phenomena arise when α = k1a0 approaches β = k2b0, that is, when the number of particles X created by reaction one approximates the number of particles X annihilated in reaction two. Since the two quantities α and β involve the input concentrations of A and B, respectively, the critical quantity β − α can easily be fine-tuned in experiments. It is interesting that both moments become very large when the critical value α = β is approached, since

σ²(x)/⟨x⟩ = β/(β − α) = k2b0/(k2b0 − k1a0) → ∞ for α → β ,

and accordingly we are dealing with very large fluctuations in x near the critical point. Since the system is Markovian and the differential equation for the expectation value is linear, we have for the autocorrelation

⟨x(t), x(0)⟩ = σ²(x) e^{(α−β)t} = σ²(x) e^{(k1a0−k2b0)t} ,

and the relaxation of the fluctuations becomes very slow near the critical point, a phenomenon known as critical slowing down.
5.3.3 The formalism of the Poisson representation
The basic assumption is that a probability distribution P(x, t) can be expanded as a superposition of uncorrelated, multivariate Poisson distributions:

P(x, t) = ∫ dα Πk [ e^{−αk} αk^{xk} / xk! ] f(α, t) , (5.40)

where f(α, t) is the quasiprobability of the Poisson representation; the probability generating function g(s, t) can then be written as

g(s, t) = ∫ dα exp( Σk (sk − 1) αk ) f(α, t) . (5.41)
Substitution of g(s, t) into the differential equation (5.34) for the probability generating function, which stems from the general master equation (5.31') for a chemical reaction, leads to

∂g(s, t)/∂t = Σµ ∫ dα ( Πi (∂/∂αi + 1)^{ν^{(µQ)}_i} − Πj (∂/∂αj + 1)^{ν^{(µG)}_j} ) ( k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} ) e^{Σk(sk−1)αk} f(α, t) .
Integration by parts, neglect of the surface terms, and finally comparison of the coefficients in the exponential functions yields [6, pp.301,302]

∂f(α, t)/∂t = Σµ ( Πi (1 − ∂/∂αi)^{ν^{(µQ)}_i} − Πj (1 − ∂/∂αj)^{ν^{(µG)}_j} ) ( k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} ) f(α, t) . (5.42)
The function f(α, t) is obtained as solution of equation (5.42) and insertion
into equation (5.40) yields the approximation to the probability distribution
P (x, t).
The introduction of a reaction flux J(α) with the components

Jµ(α) = k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} (5.43)
facilitates the formulation of a Fokker-Planck equation. If the repertoire of elementary steps does not contain reaction orders larger than two, or at maximum bimolecular reactions – which is almost always the case in realistic systems – then the Fokker-Planck equation contains no derivatives higher than second order:

∂f(α, t)/∂t = −Σ_{i=1}^{n} ∂/∂αi ( Σµ A^{(µ)}_i Jµ(α) ) f(α, t) + (1/2) Σ_{i,j=1}^{n} ∂²/∂αi∂αj Bij(J(α)) f(α, t) with

A^{(µ)}_i = ν^{(µQ)}_i − ν^{(µG)}_i and

Bij(J(α)) = δi,j Σµ ( ν^{(µQ)}_i (ν^{(µQ)}_i − 1) − ν^{(µG)}_i (ν^{(µG)}_i − 1) ) Jµ(α) +
 + (1 − δi,j) Σµ ( ν^{(µQ)}_i ν^{(µQ)}_j − ν^{(µG)}_i ν^{(µG)}_j ) Jµ(α) . (5.44)
For practical purposes it is much more convenient not to work with the
Fokker-Planck equation (5.44) but with the equivalent stochastic differential
equation in Ito form:

dαi = Σµ A^{(µ)}_i Jµ(α) dt + Σ_{j=1}^{n} cij(J(α)) dWj(t) with ( c(J(α)) c(J(α))′ )ij = Bij(J(α)) , (5.45)

where the dWj(t) are the increments of independent Wiener processes.
In order to expand quantities of interest calculated from equation (5.45) in inverse powers of the system size V, it is useful to define the intensive quantities

η = α/V , κ+µ = k+µ V^{Σj ν^{(µG)}_j − 1} , and κ−µ = k−µ V^{Σi ν^{(µQ)}_i − 1} ,

whereupon the stochastic differential equation takes on the form

dηi = Σµ A^{(µ)}_i Jµ(η) dt + ε Σ_{j=1}^{n} cij(J(η)) dWj(t) with ε = 1/√V .
We illustrate by means of two examples.
The monomolecular reversible reaction. Here we consider the monomolecular reversible reaction

X ⇌[k1, k2] Y (5.46)

as an exercise in two variables, although mass conservation would allow for the elimination of one variable. It can be described by the master equation

∂P(x, y, t)/∂t = k1 ( (x+1) P(x+1, y−1, t) − x P(x, y, t) ) + k2 ( (y+1) P(x−1, y+1, t) − y P(x, y, t) ) , (5.47)

where P(x, y, t) = Prob(X(t) = x, Y(t) = y).8 Now we expand P(x, y, t) in Poisson distributions according to equation (5.40), whereby N is a normalization factor and the region of integration is still to be determined:

P(x, y, t) = N ∫ dαx dαy e^{−αx} (αx^x / x!) e^{−αy} (αy^y / y!) f(αx, αy, t) . (5.48)
8It is important to keep in mind that depending on the experimental setup the stochas-
tic variables X and Y may be dependent or independent. In closed systems we have the
conservation relation X + Y = N or x+ y = n with N and n being constants.
In general, a Poisson representation in terms of a function f(α, t) need not exist, but it can be proven to exist in terms of generalized functions: any distribution in x can be realized as a linear combination of Kronecker deltas δx,z, which can be chosen to fulfil9

δx,z = ∫ dα e^{−α} (α^x / x!) ( δ^{(z)}(−α) e^{α} ) .

Now we choose fz(α) = (−1)^z δ^{(z)}(α) e^{α} and find through integration by parts

∫ dα fz(α) e^{−α} (α^x / x!) = ∫ dα (α^x / x!) ( −d/dα )^z δ(α) = δx,z ,

and for a one-variable probability distribution we can write

P(x) = ∫ dα e^{−α} (α^x / x!) ( Σz (−1)^z P(z) δ^{(z)}(α) e^{α} ) ;

in a formal sense a function f(α) can thus always be found for any P(x).
Commonly, however, one need not rely on such rather bizarre distributions as the one used for the proof of existence of a Poisson quasiprobability. If the quasiprobability vanishes at the boundary of the region of integration, substitution of (5.48) into (5.47) and integration by parts yields the Fokker-Planck equation for f(αx, αy, t):

∂f(αx, αy, t)/∂t = ( −∂/∂αx + ∂/∂αy ) ( (k1αx − k2αy) f(αx, αy, t) ) . (5.49)
It is important to note that the diffusion coefficient in equation (5.49) is
zero and this is generally observed for all first-order reactions giving rise to
linear kinetic equations: All fluctuations obey the Poissonian law of noise as
it is encapsulated in the Poissonian distribution. The range of integration
in equation (5.48) follows from the solution of (5.49) by searching for the
manifold in the (αx, αy) plane as the boundary on which f(αx, αy) vanishes.
The general steady state solution satisfying the boundary condition of
vanishing f(αx, αy) at the limits of the range of integration can be written
f(αx, αy) = δ(k1αx − k2αy)φ(αx, αy) , (5.50)
9 The n-th derivative of the delta function with respect to its argument is denoted by δ^{(n)}(x).
where φ(αx, αy) is still some arbitrary function.10

If we choose, for example, φ(αx, αy) = δ(αx − x̄), the corresponding steady state solution P(x, y) is

P(x, y) = N ∫ dαx dαy e^{−αx} (αx^x / x!) e^{−αy} (αy^y / y!) δ(k1αx − k2αy) δ(αx − x̄) .

The range of integration is any region which contains the point where the arguments of both delta functions vanish. Eventually we find for the equilibrium distribution

P(x, y) = (x̄^x / x!) (ȳ^y / y!) e^{−(x̄+ȳ)} , (5.51)

which is a Poisson distribution in x and y wherein the values x̄ and ȳ are related by the condition for the deterministic equilibrium: k1x̄ − k2ȳ = 0.
Alternatively we could use

φ(αx, αy) = (−1)^n δ^{(n)}(αy) e^{αx+αy} .

Then we obtain for the stationary probability distribution

P(x, y) = (n!/n^n) (x̄^x / x!) (ȳ^y / y!) δ(x + y − n) and, with y = n − x,
P(x) = (1/n^n) C(n, x) x̄^x ȳ^{n−x} , (5.52)

which is a binomial distribution. In this case the condition for the deterministic equilibrium is k1x̄ = k2ȳ = n k1k2/(k1 + k2), with ȳ = n − x̄. The two choices correspond to the grand canonical and the canonical ensemble, respectively, and we see nicely that the latter is more restricted than the former: in the closed system the total number of molecules X and Y is constant, X + Y = N.
10An alternative but equivalent ansatz for the stationary solution is
f(αx, αy) = g(αx + αy)/(k1αx − k2αy) but because of the pole at k1αx = k2αy it requires
integration in the complex plane (subsection 5.3.4). If integration is restricted to a region
along the real axis (5.50) is unique.
Two reactions with a single variable. The reaction mechanism studied here is closely related to the nuclear reactor model (5.35):

A + X ⇌[k1, k2] 2X and
B + X ⇌[k3, k4] C , (5.53)

where both reactions are assumed to be reversible and the amounts of three molecular species are assumed to be constant: [A] = a0, [B] = b0, and [C] = c0.11
stoichiometric matrices for n = 1 and m = 2 have a very simple form:
G = (1 1) , Q = (2 0) , and S = Q − G = (1 − 1) .
The constant concentrations can be absorbed into the rate parameters and
we have
k+1 = k1 a0 , k−1 = k2 , k+2 = k3 b0 , k−2 = k4 c0 .
Equation (5.42) for the function f(α, t) in the Poisson representation now takes on the form

∂f(α, t)/∂t = ( (1 − ∂/∂α)² − (1 − ∂/∂α) ) (k1a0 α − k2 α²) f(α, t) + ( 1 − (1 − ∂/∂α) ) (k3b0 α − k4c0) f(α, t) ,

eventually leading to

∂f(α, t)/∂t = ( −∂/∂α ( k4c0 + (k1a0 − k3b0) α − k2 α² ) + ∂²/∂α² ( k1a0 α − k2 α² ) ) f(α, t) . (5.54)
^{11}We remark that the mechanism (5.53) is compatible with equilibrium thermodynamics if and only if the relation k_1 a_0 · k_3 b_0 − k_2 · k_4 c_0 = 0, i.e. ϑ = 0, is fulfilled.
This equation is of Fokker-Planck form as long as D = k_1 a_0 α − k_2 α² > 0 is fulfilled. The critical values of α where the expression for D vanishes are α_1 = 0 and α_2 = k_1 a_0/k_2; between these two values we have D > 0, and a conventional Fokker-Planck equation exists.
For one variable the factorial moments are obtained from the simple relationship (2.75d), and they are of the general form
\[
\langle x^{(r)}\rangle \;\equiv\; \sum_{x}\int d\alpha\;\bigl(x(x-1)\cdots(x-r+1)\bigr)\,
\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)
\;=\;\int d\alpha\;\alpha^{r}\,f(\alpha) \;\equiv\; \langle\alpha^{r}\rangle\; . \tag{5.55}
\]
There is, however, one caveat: the quasiprobability f(α) need not fulfil the conditions required for a probability.
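Relation (5.55) is easily illustrated numerically in the purely Poissonian case f(α) = δ(α − α_0), where the r-th factorial moment of x must equal α_0^r. The sample size and parameter values in the following Monte-Carlo sketch are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, r = 4.0, 3                   # arbitrary Poisson mean and moment order
x = rng.poisson(alpha0, 10**6).astype(float)

# Sample average of the falling factorial x(x-1)(x-2), i.e. r = 3:
factorial_moment = np.mean(x * (x - 1.0) * (x - 2.0))
print(factorial_moment, alpha0**r)   # both close to 64
```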
The stationary solution of the Fokker-Planck equation (5.54) is, up to normalization,
\[
f(\alpha) = \mathcal{N}\,\bigl(k_1 a_0 - k_2\,\alpha\bigr)^{\,k_3 b_0/k_2 \,-\, k_4 c_0/(k_1 a_0) \,-\, 1}\;
\alpha^{\,k_4 c_0/(k_1 a_0)\,-\,1}\;e^{\alpha}\; . \tag{5.56}
\]
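That (5.56) indeed solves the stationary, zero-current form of (5.54), d(Df)/dα = Af with drift A(α) and diffusion coefficient D(α), can be confirmed by symbolic differentiation. In the sketch below the products k_1 a_0, k_3 b_0, and k_4 c_0 are treated as single positive symbols; this is a verification sketch, not part of the original derivation.

```python
import sympy as sp

a, K1, K2, K3, K4 = sp.symbols('alpha k1a0 k2 k3b0 k4c0', positive=True)

A = K4 + (K1 - K3) * a - K2 * a**2   # drift term of eq. (5.54)
D = K1 * a - K2 * a**2               # diffusion coefficient of eq. (5.54)

# Candidate stationary solution (5.56), normalization constant omitted:
f = (K1 - K2 * a)**(K3/K2 - K4/K1 - 1) * a**(K4/K1 - 1) * sp.exp(a)

residual = sp.simplify(sp.diff(D * f, a) / f - A)
print(residual)                      # prints 0
```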
Although this is a fairly smooth function, the prerequisites for a probability density – f(α) being nonnegative everywhere and normalizable – need not be fulfilled. With ϑ = k_3 b_0/k_2 − k_4 c_0/(k_1 a_0), normalization is possible on the interval ]0, k_1 a_0/k_2[ provided ϑ > 0 and k_4 > 0 are fulfilled. In addition, one has to check whether the surface terms vanish under these conditions, which is the case in the current example. Thus there exists a genuine Fokker-Planck equation for ϑ > 0 and k_4 > 0 that is equivalent to the stochastic differential equation
\[
d\alpha = \bigl(k_4 c_0 + (k_1 a_0 - k_3 b_0)\,\alpha - k_2\,\alpha^2\bigr)\,dt
+ \sqrt{2\,(k_1 a_0\,\alpha - k_2\,\alpha^2)}\;dW(t)\; . \tag{5.57}
\]
The domain of the variable is 0 < α < k_1 a_0/k_2, and one can readily verify that both boundaries satisfy the criteria for entrance boundaries. Accordingly, it is not possible to leave the interval ]0, k_1 a_0/k_2[. Outside this interval the coefficient of dW(t) becomes imaginary, and an interpretation of the SDE on the real axis alone is no longer possible.
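The entrance-boundary behavior can be illustrated with a naive Euler-Maruyama integration of (5.57). The parameter values below are hypothetical and chosen such that ϑ > 0; the clipping of the squared noise amplitude merely guards against small numerical overshoots of the boundaries, which the exact dynamics never produces.

```python
import numpy as np

# Hypothetical parameters with theta = k3b0/k2 - k4c0/(k1a0) = 0.35 > 0:
k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 0.5, 0.3
dt, steps = 1.0e-4, 200_000
rng = np.random.default_rng(7)

alpha = 1.0                                    # start inside ]0, k1a0/k2[ = ]0, 2[
for _ in range(steps):
    drift = k4c0 + (k1a0 - k3b0) * alpha - k2 * alpha**2
    noise2 = 2.0 * (k1a0 * alpha - k2 * alpha**2)
    alpha += drift * dt + np.sqrt(max(noise2, 0.0) * dt) * rng.normal()
print(alpha)                                   # trajectory stays inside ]0, 2[
```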
5.3.4 Real, complex, and positive Poisson representations
In the case of a single variable only, the Poisson representation of the probability distribution P(x, t) is of the form
\[
P(x,t) = \int_{D} d\mu(\alpha)\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; , \tag{5.58}
\]
where μ(α) is a measure that can be chosen in different ways and D is the domain of integration, which can take on various forms depending on the choice of μ(α).
In real Poisson representations the choice of the measure is dµ(α) = dα
and D is a section [a, b] of the real line. As we have seen in the second
example of the previous subsection 5.3.3 there may be situations where the
diffusion coefficient becomes negative and then a real Poisson representation
does not exist.
For complex Poisson representations we choose again the simple measure dμ(α) = dα, but D is a contour C in the complex plane. In order to analyze the existence of a complex Poisson representation we choose
\[
f_z(\alpha) = \frac{z!}{2\pi i}\;\alpha^{-z-1}\,e^{\alpha}
\]
with C being a contour surrounding the origin, instead of the expression f_z(α) = (−1)^z δ^{(z)}(α) e^α used previously for the real representation. Then we can show that
\[
P_z(x) = \frac{1}{2\pi i}\,\frac{z!}{x!}\oint_{C} d\alpha\;\alpha^{\,x-z-1} = \delta_{x,z}\; .
\]
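The identity P_z(x) = δ_{x,z} is just a variant of Cauchy's integral formula and can be checked by brute force, discretizing the contour integral over the unit circle; the number of quadrature points is an arbitrary choice.

```python
import numpy as np

# Discretize the closed contour C: the unit circle around the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)[:-1]
alpha = np.exp(1j * theta)                   # points on C
dalpha = 1j * alpha * (theta[1] - theta[0])  # d(alpha) = i*alpha*d(theta)

z = 4
for x in range(8):
    val = np.sum(alpha**(x - z - 1) * dalpha) / (2.0j * np.pi)
    print(x, round(val.real, 10))            # 1.0 for x = z, 0.0 otherwise
```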
By appropriate summation we may express a given probability distribution P(x) in terms of functions f(α), which are now given by
\[
f(\alpha) = \frac{1}{2\pi i}\sum_{z} P(z)\;e^{\alpha}\,\alpha^{-z-1}\,z!\; . \tag{5.59}
\]
Provided the P(z) have the property that z!P(z) is bounded for all z, the series has a finite radius of convergence outside which f(α) is analytic. By choosing the contour C to lie outside this circle of convergence, the integration can be taken inside the summation, and we find that P(x) is finally given by
\[
P(x) = \oint_{C} d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; . \tag{5.60}
\]

Figure 5.6: Definition of the contour C in the complex plane. The sketch refers to the reaction mechanism (5.53). The contour C is chosen to enclose the pole on the real axis at α = k_1 a_0/k_2. The range on the real axis that is accessible to the real Poisson representation, the stretch ]0, k_1 a_0/k_2[, is shown as a full line.
Equation (5.60) is the analogue of the real representation in the complex plane and can be used, for example, for functions f(α) that have poles on the real axis.
The complex Poisson representation allows for the calculation of stationary solutions in analytic form, to which exact or asymptotic methods are easily applicable. In general, it is not so useful for the derivation of time-dependent solutions.
Two reactions with a single variable. Again we adopt the reaction model (5.53) of subsection 5.3.3 with the notation introduced there. When a steady state exists, the quantity ϑ = k_3 b_0/k_2 − k_4 c_0/(k_1 a_0) provides a measure for the relative direction in which the two reactions are progressing: ϑ > 0 implies that the first reaction produces and the second reaction consumes X, at ϑ = 0 both reactions balance separately and we have thermodynamic equilibrium (see the footnote to the mechanism (5.53)), and ϑ < 0 implies that the first reaction consumes X whereas the second reaction produces it.^{12} Now we discuss the three cases separately:
(i) ϑ > 0: This case has been analyzed in subsection 5.3.3. The conditions for f(α) to be a valid quasiprobability on the real interval ]0, k_1 a_0/k_2[ are fulfilled. Within this range the diffusion coefficient D = k_1 a_0 α − k_2 α² is positive, and the deterministic mean of α, given by
\[
\bar{\alpha} = \langle\alpha\rangle = \frac{k_1 a_0 - k_3 b_0 + \sqrt{(k_1 a_0 - k_3 b_0)^2 + 4\,k_2\,k_4 c_0}}{2\,k_2}\; ,
\]
lies within the interval under consideration, ]0, k_1 a_0/k_2[. Hence we are dealing with a genuine Fokker-Planck equation, and f(α) is a function that vanishes at both ends of the interval and has its maximum near the deterministic mean.
(ii) ϑ = 0: Both reactions balance separately, and the existence of a Poissonian steady state is expected. The quasiprobability f(α) has a pole at α = k_1 a_0/k_2, and the range of α is chosen to be a contour C in the complex plane enclosing this pole. Partial integration produces no boundary terms for a closed contour C, and hence the P(x) resulting from this type of Poisson representation satisfies the steady state master equation. By the calculus of residues we find
\[
P(x) = \frac{e^{-\alpha_0}\,\alpha_0^{\,x}}{x!} \qquad\text{with}\qquad \alpha_0 = \frac{k_1 a_0}{k_2}\; . \tag{5.61}
\]
(iii) ϑ < 0: The steady state no longer satisfies the condition ϑ > 0. If, however, the range of α is chosen to be a contour C in the complex plane as shown in figure 5.6, the complex Poisson representation can be used to construct a steady state solution of the master equation that has the form
\[
P(x) = \oint_{C} d\alpha\; f(\alpha)\,\frac{e^{-\alpha}\alpha^{x}}{x!}\; .
\]
Now the deterministic steady state corresponds to a point on the real axis that is situated to the right of the singularity at α = k_1 a_0/k_2, and asymptotic
^{12}This example demonstrates nicely the difference between stationarity and detailed balance: all three cases describe steady states, but only the condition ϑ = 0, the state of thermodynamic equilibrium, is compatible with detailed balance.
evaluation of means, higher moments, and other quantities may be performed by choosing C to pass through the saddle point occurring there. Then the variance σ²(α) = ⟨α²⟩ − ⟨α⟩² is negative. As a consequence the variance in x, which is obtained as the variance in α plus the variance of the Poisson distribution, ⟨α⟩,
\[
\sigma^2(x) = \langle x^2\rangle - \langle x\rangle^2 = \langle \alpha^2\rangle - \langle \alpha\rangle^2 + \langle \alpha\rangle\; , \tag{5.62}
\]
is smaller than the variance of the Poissonian: σ²(x) < ⟨x⟩. In other words, the steady state distribution is narrower than the Poisson distribution.
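The sub-Poissonian character of the steady state for ϑ < 0 can be verified independently of the Poisson representation: the one-variable master equation of mechanism (5.53) is a birth-and-death process, so its stationary distribution follows exactly from detailed balance of neighboring states. The rate combinations below are hypothetical and give ϑ = −0.5.

```python
import numpy as np

k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 0.5, 2.0   # theta = 0.5 - 1.0 = -0.5 < 0

# Birth rate x -> x+1: k1a0*x + k4c0;  death rate x -> x-1: k2*x*(x-1) + k3b0*x.
xmax = 200
p = np.zeros(xmax + 1)
p[0] = 1.0
for x in range(xmax):
    birth = k1a0 * x + k4c0
    death = k2 * (x + 1) * x + k3b0 * (x + 1)
    p[x + 1] = p[x] * birth / death          # detailed balance recursion
p /= p.sum()

xs = np.arange(xmax + 1)
mean = (xs * p).sum()
var = ((xs - mean)**2 * p).sum()
print(mean, var, var / mean)                 # Fano factor < 1: sub-Poissonian
```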
Finally, we stress that all three cases may be obtained from the contour C. For ϑ = 0 the cut from the singularity at α = k_1 a_0/k_2 to α = −∞ disappears, and C can be distorted into a simple contour around the pole. If ϑ > 0 the singularity becomes integrable, and the contour C may be collapsed onto the cut. The integral to be evaluated is then a discontinuity integral over the full range [0, k_1 a_0/k_2], which might need modifications for ϑ being a positive integer.
For the positive Poisson representation we choose α to be a genuine complex variable with two real components, α ≡ α_x + i α_y, with
\[
d\mu(\alpha) = d^2\alpha = d\alpha_x\,d\alpha_y\; ,
\]
and D being the entire complex plane. Then it can be proven [6, pp. 309–312] that for any P(x) there exists a positive f(α) such that
\[
P(x) = \int d^2\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; . \tag{5.63}
\]
A positive Poisson representation f_p(α), however, need not be unique, as we show by means of an example. We choose
\[
f_p(\alpha) = \frac{1}{2\pi\sigma^2}\,\exp\!\left(-\,\frac{|\alpha-\alpha_0|^2}{2\sigma^2}\right)
\]
and g(α) an analytic function of α that can be expanded according to
\[
g(\alpha) = g(\alpha_0) + \sum_{n=1}^{\infty} g^{(n)}(\alpha_0)\,\frac{(\alpha-\alpha_0)^{n}}{n!}
\qquad\text{such that}\qquad
\int d^2\alpha\;\frac{1}{2\pi\sigma^2}\,\exp\!\left(-\,\frac{|\alpha-\alpha_0|^2}{2\sigma^2}\right) g(\alpha) = g(\alpha_0)\; ,
\]
since all terms with n ≥ 1 vanish upon integration. As the Poissonian e^{−α}α^x/x! itself is an analytic function, we find for any positive value of σ²
\[
P(x) = \int d^2\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f_p(\alpha) = \frac{e^{-\alpha_0}\,\alpha_0^{\,x}}{x!}\; .
\]
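This statement is readily verified by Monte-Carlo integration: sampling α from the rotationally symmetric Gaussian f_p and averaging the analytic Poissonian kernel reproduces its value at α_0. All numerical values below are arbitrary choices for the sketch.

```python
import math
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
alpha0, sigma, x = 3.0, 0.5, 4       # arbitrary center, width, and argument

# Sample alpha = alpha_x + i*alpha_y from the Gaussian f_p centered at alpha0.
m = 10**5
alpha = alpha0 + sigma * rng.normal(size=m) + 1j * sigma * rng.normal(size=m)

# Analytic Poissonian kernel e^{-alpha} alpha^x / x! for complex alpha:
kernel = np.exp(-alpha + x * np.log(alpha) - gammaln(x + 1))
print(kernel.mean().real)                                  # Monte-Carlo estimate
print(math.exp(-alpha0) * alpha0**x / math.factorial(x))   # exact: 0.16803...
```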
Nonuniqueness of this kind is an advantage in practice rather than a problem. As an example we consider again mechanism (5.53), which now gives rise to an SDE in the complex plane, since α = α_x + i α_y:
\[
d\alpha = \bigl(k_4 c_0 + (k_1 a_0 - k_3 b_0)\,\alpha - k_2\,\alpha^2\bigr)\,dt
+ \sqrt{2\,(k_1 a_0\,\alpha - k_2\,\alpha^2)}\;dW(t)\; . \tag{5.64}
\]
Again the term ϑ plays the decisive role: For ϑ > 0 the noise term vanishes at α = 0 and α = k_1 a_0/k_2 and is positive between these points, and the drift term takes care that α returns to the range ]0, k_1 a_0/k_2[ whenever it reaches one of the endpoints. For ϑ > 0 equation (5.64) is thus the real stochastic differential equation (5.57) on the real interval [0, k_1 a_0/k_2].
For ϑ < 0 the stationary point lies outside the interval [0, k_1 a_0/k_2], and a point inside the interval will migrate according to (5.64) along the interval until it reaches the right-hand end, where the noise vanishes but the drift continues to drive it further towards the right. On leaving the interval the noise becomes imaginary, and the point starts to diffuse in the complex plane until it eventually returns to the interval [0, k_1 a_0/k_2]. Thus the behavior for the entire range of ϑ is encapsulated in a single SDE.
Application to logistic growth. Based on the mechanism (5.53), an extensive stochastic analysis of the logistic model in population dynamics has been performed by Alexei and Peter Drummond [121, 123]. The deterministic dynamics is described by the differential equation
\[
\frac{dx}{dt} = x\,(g - c\,x)\; ,
\]
which is identical with the logistic equation of Pierre-François Verhulst [124]. In this formulation g is the unconstrained growth rate and g/c the carrying capacity of the ecosystem. The dynamical system has an equilibrium point at x = g/c and a second stationary state at x = 0. Drummond and Drummond add a death reaction in addition to the competition process that limits the population size. The implementation of the logistic model makes use of three processes:
\[
\begin{aligned}
\text{Death:}\qquad & \mathrm{X} \;\overset{a}{\longrightarrow}\; \varnothing\; ,\\
\text{Birth:}\qquad & \mathrm{X} \;\overset{b}{\longrightarrow}\; 2\,\mathrm{X}\; ,\\
\text{Competition:}\qquad & 2\,\mathrm{X} \;\overset{c}{\longrightarrow}\; \mathrm{X}\; .
\end{aligned} \tag{5.65}
\]
Parametrization yields the three quantities: (i) the net growth rate g = b − a, (ii) the carrying capacity N_c = g/c, and (iii) the reproductive ratio r = b/a.
Comparison with the mechanism (5.53) shows that this logistic model is a special case of it with the substitutions k_1 a_0 → b, k_2 → c, k_3 b_0 → a, and k_4 c_0 → 0.
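A stochastic realization of the three processes (5.65) can be generated with Gillespie's simulation algorithm [2]. The sketch below uses hypothetical rate values and the propensity c·x(x−1) for the competition step (conventions differing by a factor 1/2 exist), so that the deterministic limit is dx/dt ≈ (b − a)x − cx².

```python
import numpy as np

a, b, c = 0.1, 1.0, 0.01     # hypothetical death, birth, and competition rates
rng = np.random.default_rng(0)

x, t, t_end = 10, 0.0, 100.0
while t < t_end and x > 0:   # x = 0 is absorbing: extinction is possible
    rates = np.array([a * x, b * x, c * x * (x - 1)])
    total = rates.sum()
    t += rng.exponential(1.0 / total)        # waiting time to the next event
    event = rng.choice(3, p=rates / total)   # which of the three processes fires
    x += (-1, 1, -1)[event]                  # death, birth, competition
print(t, x)   # x fluctuates around the carrying capacity (b - a)/c = 90
```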
Bibliography
[1] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comp. Phys., 22:403–434, 1976.
[2] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81:2340–2361, 1977.
[3] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A, 188:404–425, 1992.
[4] D. T. Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem., 58:35–55, 2007.
[5] K. L. Chung. Elementary Probability Theory with Stochastic Processes. Springer-Verlag, New York, 3rd edition, 1979.
[6] C. W. Gardiner. Stochastic Methods. A Handbook for the Natural Sciences and Social Sciences. Springer Series in Synergetics. Springer-Verlag, Berlin, fourth edition, 2009.
[7] M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK, 1983.
[8] D. S. Moore, G. P. McCabe, and B. Craig. Introduction to the Practice of Statistics. W. H. Freeman & Co., New York, sixth edition, 2009.
[9] H. Risken. The Fokker-Planck Equation. Methods of Solution and Applications. Springer-Verlag, Berlin, 2nd edition, 1989.
[10] M. Fisz. Wahrscheinlichkeitsrechnung und mathematische Statistik. VEB Deutscher Verlag der Wissenschaft, Berlin, 1989.
[11] H. Georgii. Stochastik. Einführung in die Wahrscheinlichkeitstheorie und Statistik. Walter de Gruyter GmbH & Co., Berlin, third edition, 2007.
[12] D. S. Moore and W. I. Notz. Statistics. Concepts and Controversies. W. H. Freeman & Co., New York, seventh edition, 2009.
[13] W. I. Notz and M. A. Fligner. Study Guide for Moore and McCabe's Introduction to the Practice of Statistics. W. H. Freeman, New York, third edition, 1999.
[14] N. Henze. Stochastik für Einsteiger. Eine Einführung in die faszinierende Welt des Zufalls. Vieweg Verlag, Braunschweig, DE, fourth edition, 2003.
[15] M. S. Bartlett. An Introduction to Stochastic Processes with Special Reference to Methods and Applications. Cambridge University Press, Cambridge, UK, 3rd edition, 1978.
[16] D. R. Cox and H. D. Miller. The Theory of Stochastic Processes. Methuen, London, 1965.
[17] J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.
[18] W. Feller. An Introduction to Probability Theory and its Applications, volume I and II. John Wiley, New York, 1966.
[19] N. S. Goel and N. Richter-Dyn. Stochastic Models in Biology. Academic Press, New York, 1974.
[20] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 2nd edition, 1994.
[21] M. Iosifescu and P. Tautu. Stochastic Processes and Application in Biology and Medicine. I. Theory, volume 3 of Biomathematics. Springer-Verlag, Berlin, 1973.
[22] M. Iosifescu and P. Tautu. Stochastic Processes and Application in Biology and Medicine. II. Models, volume 4 of Biomathematics. Springer-Verlag, Berlin, 1973.
[23] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. Academic Press, New York, 1975.
[24] S. Karlin and H. M. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981.
[25] A. Stuart and J. K. Ord. Kendall's Advanced Theory of Statistics. Volume 1: Distribution Theory. Charles Griffin & Co., London, fifth edition, 1987.
[26] A. Stuart and J. K. Ord. Kendall's Advanced Theory of Statistics. Volume 2: Classical Inference and Relationship. Edward Arnold, London, fifth edition, 1991.
[27] A. Loman, I. Gregor, C. Stutz, M. Mund, and J. Enderlein. Measuring rotational diffusion of macromolecules by fluorescence correlation spectroscopy. Photochem. Photobiol. Sci., 9:627–636, 2010.
[28] M. Gösch and R. Rigler. Fluorescence correlation spectroscopy of molecular motions and kinetics. Advanced Drug Delivery Reviews, 57:169–190, 2005.
[29] S. T. Hess, S. Huang, A. A. Heikal, and W. W. Webb. Biological and chemical applications of fluorescence correlation spectroscopy: A review. Biochemistry, 41:697–705, 2002.
[30] J. Hohlbein, K. Gryte, M. Heilemann, and A. N. Kapanidis. Surfing on a new wave of single-molecule fluorescence methods. Phys. Biol., 7:031001, 2010.
[31] E. Haustein and P. Schwille. Single-molecule spectroscopic methods. Curr. Op. Struct. Biol., 14:531–540, 2004.
[32] R. Brown. A brief description of microscopical observations made in the months of June, July and August 1827, on the particles contained in the pollen of plants, and on the general existence of active molecules in organic and inorganic bodies. Phil. Mag., Series 2, 4:161–173, 1828.
[33] A. Einstein. Über die von der molekular-kinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annal. Phys. (Leipzig), 17:549–560, 1905.
[34] M. von Smoluchowski. Zur kinetischen Theorie der Brownschen Molekularbewegung und der Suspensionen. Annal. Phys. (Leipzig), 21:756–780, 1906.
[35] E. N. Lorenz. Deterministic nonperiodic flow. J. Atmospheric Sciences, 20:130–141, 1963.
[36] G. Mendel. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereins in Brünn, 4:3–47, 1866.
[37] R. A. Fisher. Has Mendel's work been rediscovered? Annals of Science, 1:115–137, 1936.
[38] A. Franklin, A. W. F. Edwards, D. Fairbanks, D. Hartl, and T. Seidenfeld. Ending the Mendel-Fisher Controversy. University of Pittsburgh Press, Pittsburgh, PA, 2008.
[39] W. Penney. Problem: Penney-Ante. J. Recreational Math., 2(October):241, 1969.
[40] R. T. Cox. The Algebra of Probable Inference. The Johns Hopkins Press, Baltimore, MD, 1961.
[41] E. T. Jaynes. Probability Theory. The Logic of Science. Cambridge University Press, Cambridge, UK, 2003.
[42] G. Vitali. Sul problema della misura dei gruppi di punti di una retta. Tipi Gamberini e Parmeggiani, Bologna, 1905.
[43] P. Billingsley. Probability and Measure. Wiley-Interscience, New York, third edition, 1995.
[44] M. Carter and B. van Brunt. The Lebesgue-Stieltjes Integral. A Practical Introduction. Springer-Verlag, Berlin, 2007.
[45] D. Meintrup and S. Schäffler. Stochastik. Theorie und Anwendungen. Springer-Verlag, Berlin, 2005.
[46] G. de Beer, Sir. Mendel, Darwin, and Fisher. Notes and Records of the Royal Society of London, 19:192–226, 1964.
[47] K. Sander. Darwin und Mendel. Wendepunkte im biologischen Denken. Biologie in unserer Zeit, 18:161–167, 1988.
[48] T. Bayes and R. Price. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S. Phil. Trans. Roy. Soc. London, 53:370–418, 1763.
[49] C. P. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, Berlin, 2007.
[50] P. M. Lee. Bayesian Statistics. An Introduction. Hodder Arnold, New York, third edition, 2004.
[51] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall / CRC, Boca Raton, FL, second edition, 2004.
[52] B. E. Cooper. Statistics for Experimentalists. Pergamon Press, Oxford, 1969.
[53] J. F. Kenney and E. S. Keeping. Mathematics of Statistics. Van Nostrand, Princeton, NJ, second edition, 1951.
[54] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes. The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 1986.
[55] J. F. Kenney and E. S. Keeping. The k-Statistics. In Mathematics of Statistics. Part I, § 7.9. Van Nostrand, Princeton, NJ, third edition, 1962.
[56] M. Evans, N. A. J. Hastings, and J. B. Peacock. Statistical Distributions. John Wiley & Sons, New York, third edition, 2000.
[57] N. A. Weber. Dimorphism of the African Oecophylla worker and an anomaly (Hymenoptera Formicidae). Annals of the Entomological Society of America, 39:7–10, 1946.
[58] M. F. Schilling, A. E. Watkins, and W. Watkins. Is human height bimodal? The American Statistician, 56:223–229, 2002.
[59] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, first edition, 1983.
[60] D. Williams. Diffusions, Markov Processes and Martingales. Volume 1: Foundations. John Wiley & Sons, Chichester, UK, 1979.
[61] B. K. Øksendal. Stochastic Differential Equations. An Introduction with Applications. Springer-Verlag, Berlin, sixth edition, 2003.
[62] M. Schubert and G. Weber. Quantentheorie. Grundlagen und Anwendungen. Spektrum Akademischer Verlag, Heidelberg, DE, 1993.
[63] R. W. Robinett. Quantum Mechanics. Classical Results, Modern Systems, and Visualized Examples. Oxford University Press, New York, 1997.
[64] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, second edition, 1985.
[65] I. F. Gihman and A. V. Skorohod. The Theory of Stochastic Processes. Vol. I, II, and III. Springer-Verlag, Berlin, 1975.
[66] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Phys. Rev., 36:823–841, 1930.
[67] P. Langevin. On the theory of Brownian motion. C. R. Acad. Sci. (Paris), 146:530–533, 1908.
[68] L. Arnold. Stochastic Differential Equations. Theory and Applications. John Wiley & Sons, New York, 1974.
[69] P. Medvegyev. Stochastic Integration Theory. Oxford University Press, New York, 2007.
[70] P. E. Protter. Stochastic Integration and Differential Equations, volume 21 of Applications of Mathematics. Springer-Verlag, Berlin, second edition, 2004.
[71] K. Ito. Stochastic integral. Proc. Imp. Acad. Tokyo, 20:519–524, 1944.
[72] K. Ito. On stochastic differential equations. Mem. Amer. Math. Soc., 4:1–51, 1951.
[73] R. L. Stratonovich. Introduction to the Theory of Random Noise. Gordon and Breach, New York, 1963.
[74] D. L. Fisk. Quasi-martingales. Trans. Amer. Math. Soc., 120:369–389, 1965.
[75] A. D. Fokker. Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld. Annal. Phys. (Leipzig), 43:810–820, 1914.
[76] M. Planck. Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie. Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl., 1917/I:324–341, 1917.
[77] A. N. Kolmogorov. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Mathematische Annalen, 104:415–418, 1931.
[78] W. Feller. The parabolic differential equations and the associated semi-groups of transformations. Annals of Mathematics, Second Series, 55:468–519, 1952.
[79] R. C. Tolman. The Principles of Statistical Mechanics. Oxford University Press, Oxford, UK, 1938.
[80] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. I. Even variables only. Physica, 23:707–719, 1957.
[81] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. II. Even and odd variables. Physica, 23:816–829, 1957.
[82] R. Graham and H. Haken. Generalized thermodynamic potential for Markoff systems in detailed balance and far from thermal equilibrium. Z. Physik, 243:289–302, 1971.
[83] L. Onsager. Reciprocal relations in irreversible processes. I. Phys. Rev., 37:405–426, 1931.
[84] L. Onsager. Reciprocal relations in irreversible processes. II. Phys. Rev., 38:2265–2279, 1931.
[85] W. W. S. Wei. Time Series Analysis. Univariate and Multivariate Methods. Addison-Wesley Publ. Co., Redwood City, CA, 1990.
[86] E. W. Montroll and K. E. Shuler. Studies in nonequilibrium rate processes: I. The relaxation of a system of harmonic oscillators. J. Chem. Phys., 26:454–464, 1956.
[87] K. E. Shuler. Studies in nonequilibrium rate processes: II. The relaxation of vibrational nonequilibrium distributions in chemical reactions and shock waves. J. Phys. Chem., 61:849–856, 1957.
[88] N. W. Bazley, E. W. Montroll, R. J. Rubin, and K. E. Shuler. Studies in nonequilibrium rate processes: III. The vibrational relaxation of a system of anharmonic oscillators. J. Chem. Phys., 28:700–704, 1958.
[89] A. F. Bartholomay. On the linear birth and death processes of biology as Markoff chains. Bull. Math. Biophys., 20:97–118, 1958.
[90] A. F. Bartholomay. Stochastic models for chemical reactions: I. Theory of the unimolecular reaction process. Bull. Math. Biophys., 20:175–190, 1958.
[91] A. F. Bartholomay. Stochastic models for chemical reactions: II. The unimolecular rate constant. Bull. Math. Biophys., 21:363–373, 1959.
[92] S. K. Kim. Mean first passage time for a random walker and its application to chemical kinetics. J. Chem. Phys., 28:1057–1067, 1958.
[93] D. A. McQuarrie. Kinetics of small systems. I. J. Chem. Phys., 38:433–436, 1962.
[94] D. A. McQuarrie, C. J. Jachimowski, and M. E. Russell. Kinetics of small systems. II. J. Chem. Phys., 40:2914–2921, 1964.
[95] K. Ishida. Stochastic model for bimolecular reaction. J. Chem. Phys., 41:2472–2478, 1964.
[96] I. G. Darvey and P. J. Staff. Stochastic approach to first-order chemical reaction kinetics. J. Chem. Phys., 44:990–997, 1966.
[97] D. A. McQuarrie. Stochastic approach to chemical kinetics. J. Appl. Prob., 4:413–478, 1967.
[98] N. G. van Kampen. The expansion of the master equation. Adv. Chem. Phys., 34:245–309, 1976.
[99] G. Nicolis and I. Prigogine. Self-Organization in Nonequilibrium Systems. John Wiley & Sons, New York, 1977.
[100] A. M. Turing. The chemical basis of morphogenesis. Phil. Trans. Roy. Soc. London B, 237(641):37–72, 1952.
[101] H. Meinhardt. Models of Biological Pattern Formation. Academic Press, London, 1982.
[102] P. Ehrenfest and T. Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Z. Phys., 8:311–314, 1907.
[103] M. Abramowitz and I. A. Segun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York, 1965. Dover Publications.
[104] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.
[105] J. E. Moyal. Stochastic processes and statistical physics. J. Roy. Statist. Soc. B, 11:151–210, 1949.
[106] A. Janshoff, M. Neitzert, Y. Oberdörfer, and H. Fuchs. Force spectroscopy of molecular systems – single molecule spectroscopy of polymers and biomolecules. Angew. Chem. Int. Ed., 39:3212–3237, 2000.
[107] W. K. Zhang and X. Zhang. Single molecule mechanochemistry of macromolecules. Prog. Polym. Sci., 28:1271–1295, 2003.
[108] A. Messiah. Quantum Mechanics, volume II. North-Holland Publishing Company, Amsterdam, NL, 1970.
[109] D. T. Gillespie. Markov Processes: An Introduction for Physical Scientists. Academic Press, San Diego, CA, 1992.
[110] Y. Cao, D. T. Gillespie, and L. R. Petzold. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys., 124:044109, 2006.
[111] P. Schuster and K. Sigmund. Random selection – A simple model based on linear birth and death processes. Bull. Math. Biol., 46:11–17, 1984.
[112] I. S. Gradstein and I. M. Ryshik. Tables of Series, Products, and Integrals, volume 1. Verlag Harri Deutsch, Thun, DE, 1981.
[113] C. R. Heathcote and J. E. Moyal. The random walk (in continuous time) and its application to the theory of queues. Biometrika, 46:400–411, 1959.
[114] N. T. J. Bailey. The Elements of Stochastic Processes with Application in the Natural Sciences. Wiley, New York, 1964.
[115] E. W. Montroll. Stochastic processes and chemical kinetics. In W. M. Muller, editor, Energetics in Metallurgical Phenomena, volume 3, pages 123–187. Gordon & Breach, New York, 1967.
[116] E. W. Montroll and K. E. Shuler. The application of the theory of stochastic processes to chemical kinetics. Adv. Chem. Phys., 1:361–399, 1958.
[117] K. E. Shuler, G. H. Weiss, and K. Anderson. Studies in nonequilibrium rate processes. V. The relaxation of moments derived from a master equation. J. Math. Phys., 3:550–556, 1962.
[118] C. W. Gardiner and S. Chaturvedi. The Poisson representation. I. A new technique for chemical master equations. J. Statist. Phys., 17:429–468, 1977.
[119] S. Chaturvedi and C. W. Gardiner. The Poisson representation. II. Two-time correlation functions. J. Statist. Phys., 18:501–522, 1978.
[120] P. D. Drummond. Gauge Poisson representation for birth-death master equations. Eur. Phys. J. B, 38:617–634, 2004.
[121] P. D. Drummond, T. G. Vaughan, and A. J. Drummond. Extinction times in autocatalytic systems. J. Phys. Chem. A, 114:10481–10491, 2010.
[122] R. S. Berry, S. A. Rice, and J. Ross. Physical Chemistry. Oxford University Press, New York, second edition, 2000.
[123] A. J. Drummond and P. D. Drummond. Extinction in a self-regulating population with demographic and environmental noise. ArXiv: 0807.4772v2, The University of Auckland, Auckland, NZ, and The University of Queensland, Brisbane, QLD, AU, 2008.
[124] P. Verhulst. Recherches mathématiques sur la loi d'accroissement de la population. Nouv. Mém. de l'Académie Royale des Sci. et Belles-Lettres de Bruxelles, 18:1–41, 1845.
Contents
1 History and Classical Probability 5
1.1 Precision limits and fluctuations . . . . . . . . . . . . . . . . . 7
1.2 Thinking in terms of probability . . . . . . . . . . . . . . . . . 10
2 Probabilities, Random Variables, and Densities 17
2.1 Sets and sample spaces . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Probability measure on countable sample spaces . . . . . . . . 22
2.2.1 Probabilities on countable sample spaces . . . . . . . . 22
2.2.2 Random variables and functions . . . . . . . . . . . . . 27
2.2.3 Probabilities on intervals . . . . . . . . . . . . . . . . . 32
2.3 Probability measure on uncountable sample spaces . . . . . . 34
2.3.1 Existence of non-measurable sets . . . . . . . . . . . . 34
2.3.2 Borel σ-algebra and Lebesgue measure . . . . . . . . . 36
2.3.3 Random variables on uncountable sets . . . . . . . . . 42
2.3.4 Limits of series of random variables . . . . . . . . . . . 46
2.3.5 Stieljes and Lebesgue integration . . . . . . . . . . . . 47
2.4 Conditional probabilities and independence . . . . . . . . . . . 56
2.5 Expectation values and higher moments . . . . . . . . . . . . 60
2.5.1 First and second moments . . . . . . . . . . . . . . . . 61
2.5.2 Higher moments . . . . . . . . . . . . . . . . . . . . . . 67
2.6 Mathematical statistics . . . . . . . . . . . . . . . . . . . . . . 70
2.7 Distributions, densities and generating functions . . . . . . . . 74
2.7.1 Probability generating functions . . . . . . . . . . . . . 74
2.7.2 Moment generating functions . . . . . . . . . . . . . . 76
2.7.3 Characteristic functions . . . . . . . . . . . . . . . . . 76
2.7.4 The Poisson distribution . . . . . . . . . . . . . . . . . 79
2.7.5 The binomial distribution . . . . . . . . . . . . . . . . 81
2.7.6 The normal distribution . . . . . . . . . . . . . . . . . 83
2.7.7 Central limit theorem and the law of large numbers . . 91
2.7.8 The Cauchy-Lorentz distribution . . . . . . . . . . . . 96
2.7.9 Bimodal distributions . . . . . . . . . . . . . . . . . . . 97
3 Stochastic processes 101
3.1 Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . 103
  3.1.1 Simple stochastic processes . . . . . . . . . . . . . . . . 104
  3.1.2 The Chapman-Kolmogorov equation . . . . . . . . . . . . 106
3.2 Classes of stochastic processes . . . . . . . . . . . . . . . . . . 117
  3.2.1 Jump process and master equation . . . . . . . . . . . . . 117
  3.2.2 Diffusion process and Fokker-Planck equation . . . . . . . 119
  3.2.3 Deterministic processes and Liouville's equation . . . . . . 121
3.3 Forward and backward equations . . . . . . . . . . . . . . . . . 122
3.4 Examples of special stochastic processes . . . . . . . . . . . . . 125
  3.4.1 Poisson process . . . . . . . . . . . . . . . . . . . . . . 125
  3.4.2 Random walk in one dimension . . . . . . . . . . . . . . 128
  3.4.3 Wiener process and the diffusion problem . . . . . . . . . 131
  3.4.4 Ornstein-Uhlenbeck process . . . . . . . . . . . . . . . . 135
3.5 Stochastic differential equations . . . . . . . . . . . . . . . . . 137
  3.5.1 Derivation of the stochastic differential equation . . . . . . 137
  3.5.2 Stochastic integration . . . . . . . . . . . . . . . . . . . 140
  3.5.3 Integration of stochastic differential equations . . . . . . . 146
  3.5.4 Changing variables in stochastic differential equations . . . 149
  3.5.5 Fokker-Planck and stochastic differential equations . . . . 150
3.6 Fokker-Planck equations . . . . . . . . . . . . . . . . . . . . . 153
  3.6.1 Probability currents and boundary conditions . . . . . . . 154
  3.6.2 Fokker-Planck equation in one dimension . . . . . . . . . 158
  3.6.3 Fokker-Planck equation in several dimensions . . . . . . . 164
    3.6.3.1 Change of variables . . . . . . . . . . . . . . . . . . 164
    3.6.3.2 Stationary solutions . . . . . . . . . . . . . . . . . . 167
    3.6.3.3 Detailed balance . . . . . . . . . . . . . . . . . . . 169
3.7 Autocorrelation functions and spectra . . . . . . . . . . . . . . 173
4 Applications in chemistry 177
4.1 Stochasticity in chemical reactions . . . . . . . . . . . . . . . . 177
  4.1.1 Elementary steps of chemical reactions . . . . . . . . . . 178
  4.1.2 The master equation in chemistry . . . . . . . . . . . . . 180
  4.1.3 Birth-and-death master equations . . . . . . . . . . . . . 183
  4.1.4 The flow reactor . . . . . . . . . . . . . . . . . . . . . . 186
4.2 Classes of chemical reactions . . . . . . . . . . . . . . . . . . . 193
  4.2.1 Monomolecular chemical reactions . . . . . . . . . . . . . 193
    4.2.1.1 Irreversible monomolecular chemical reaction . . . . . 194
    4.2.1.2 Reversible monomolecular chemical reaction . . . . . . 195
  4.2.2 Bimolecular chemical reactions . . . . . . . . . . . . . . 201
    4.2.2.1 Addition reaction . . . . . . . . . . . . . . . . . . . 201
    4.2.2.2 Dimerization reaction . . . . . . . . . . . . . . . . . 205
4.3 Fokker-Planck approximation of master equations . . . . . . . . 210
  4.3.1 Diffusion process approximated by a jump process . . . . . 210
  4.3.2 Kramers-Moyal expansion . . . . . . . . . . . . . . . . . 213
  4.3.3 Size expansion of the chemical master equation . . . . . . 214
4.4 Numerical simulation of master equations . . . . . . . . . . . . 219
  4.4.1 Definitions and conditions . . . . . . . . . . . . . . . . . 219
  4.4.2 The probabilistic rate parameter . . . . . . . . . . . . . . 221
    4.4.2.1 Bimolecular reactions . . . . . . . . . . . . . . . . . 222
    4.4.2.2 Monomolecular, trimolecular, and other reactions . . . 224
  4.4.3 Simulation of chemical master equations . . . . . . . . . . 226
  4.4.4 The simulation algorithm . . . . . . . . . . . . . . . . . 231
  4.4.5 Implementation of the simulation algorithm . . . . . . . . 233
5 Applications of stochastic processes in biology 241
5.1 Autocatalysis, replication, and extinction . . . . . . . . . . . . 241
  5.1.1 Autocatalytic growth and death . . . . . . . . . . . . . . 242
  5.1.2 Boundaries in one step birth-and-death processes . . . . . 251
5.2 Size expansion in biology . . . . . . . . . . . . . . . . . . . . . 256
5.3 The Poisson representation . . . . . . . . . . . . . . . . . . . . 261
  5.3.1 Motivation of the Poisson representation . . . . . . . . . . 262
  5.3.2 Many variable birth-and-death systems . . . . . . . . . . 264
  5.3.3 The formalism of the Poisson representation . . . . . . . . 271
  5.3.4 Real, complex, and positive Poisson representations . . . . 278
Index
algebra
  filtered, 105
assumption
  scaling, 210
Bayes, Thomas, 58
Bernoulli trials, 104
Bernoulli, Jakob, 81
Boltzmann, Ludwig, 8, 220
Borel, Émile, 34
boundary
  absorbing, 157, 167, 194, 251
  entrance, 159, 160, 277
  exit, 159, 160
  natural, 159, 160
  periodic, 158, 161
  prescribed, 159
  reflecting, 157, 167, 251
  regular, 160
Brown, Robert, 7
Cantor, Georg, 17
Cardano, Gerolamo, 10
carrying capacity, 282
Cauchy, Augustin Louis, 96
central limit theorem, 91
chaos
  deterministic, 8
Chapman, Sydney, 107
Chebyshev, Pafnuty, 94
Chung, Kai Lai, 6
collisions
  classical theory, 222
  nonreactive, 221
  reactive, 221
condition
  growth, 148
  Lipschitz, 148
  Markov, 106
  potential, 168
  pseudo first order, 204
conditions
  boundary, 153
confidence interval, 84, 192
constant
  equilibrium, 193
  reaction rate, 193
controversy
  Mendel-Fisher, 11
convergence
  pointwise, 51
correction
  Bessel, 71
correlation
  coefficient, 64
covariance, 64
  sample, 72
current
  probability, 154
Darboux, Gaston, 48
de Moivre, Abraham, 91
Dedekind, Richard, 17
density
  discrete probability, 25
  joint, 44, 61, 103
  marginal, 44
  spectral, 173
description
  deterministic, 5, 7
detailed balance, 170, 185
diffusion matrix, 119
Dirac, Paul, 109
Dirichlet, Peter Gustav Lejeune, 51
distribution
  bimodal, 65
  discrete uniform, 26
  joint, 33, 43, 44, 88
  marginal, 33, 45, 58
  Maxwell-Boltzmann, 220
  normal, 67
  uniform, 35
Doob, Joseph, 104
drift vector, 119
dynamics
  complex, 8
Ehrenfest, Paul, 200
Einstein, Albert, 7, 106
ensemble
  canonical, 262
  grand canonical, 262
  microcanonical, 262
ensemble average, 174
equation
  backward, 107, 115, 123
  Chapman-Kolmogorov, 107, 123, 138
  chemical master, 230
  Fokker-Planck, 116, 119, 138, 153, 177
  forward, 107, 115, 123
  Kolmogorov, 153
  Langevin, 137
  Liouville, 121
  master, 117, 177, 180
  Smoluchowski, 153
ergodicity, 175
estimator, 70
event, 24
exit problem, 124
expectation value, 49, 60
Feller, William, 160
filtration, 105
Fisher, Ronald, 11
Fisk, Donald, 144
fluctuations, 5, 7, 9
Fokker, Adriaan, 153
function
  autocorrelation, 173
  characteristic, 74, 76
  cumulative distribution, 30, 33, 66
  density, 42
  distribution, 43
  Heaviside, 28
  indicator, 50
  marginal distribution, 45
  measurable, 49
  moment generating, 74, 76
  nonanticipating, 144
  probability generating, 74
  probability mass, 30
  signum, 29
  simple, 50
game
  Penney's, 13
Gardiner, Crispin, 102, 115, 137
Gegenbauer, Leopold, 207
genetics
  Mendelian, 11
Gillespie, Daniel, 182, 219
Heaviside, Oliver, 28
independence, 57
inequality
  Cauchy-Schwarz, 64
inference, statistical, 59
integral
  Lebesgue, 47
  Riemann, 47
  Stieltjes, 47, 139, 140
  Stratonovich, 144
integration
  Cauchy-Euler, 147
Ito, Kiyoshi, 141
Jacobi, Carl, 203
Khinchin, Aleksandr, 174
Kimura, Motoo, 248
kinetic theory
  gases, 222
kinetics
  combinatorial, 261
  mass action, 261
Kolmogorov, Andrey, 107
Kramers, Hendrik, 210
kurtosis, 67
  excess, 67
Lévy, Paul Pierre, 104
Langevin, Paul, 137
Laplace, Pierre-Simon, 91
law
  large numbers, 94
Lebesgue, Henri Léon, 30
limit
  almost certain, 46
  in distribution, 47
  mean square, 46, 141
  stochastic, 47
Liouville, Joseph, 121
Lorentz, Hendrik, 96
Markov, Andrey, 106
martingale, 104
  local, 105, 143
mass action, 227
matrix
  stoichiometric, 265
Maxwell, James Clerk, 8, 220
mean
  sample, 70
measure
  Lebesgue, 39
median, 64
Mendel, Gregor, 11, 58
mode, 65
molecularity, 193
moment
  centered, 62
  jump, 181
  raw, 62
moments
  factorial, 80
  raw, 84
  sample, unbiased, 71
motion
  Brownian, 7, 137
  thermal, 8
Moyal, José, 210
noise
  colored, 174
  white, 138, 174
numbers
  rational, 41
object
  elementary, 17
Onsager, Lars, 172
operator
  linear, 60
Ornstein, Leonard, 135
Ostwald, Wilhelm, 9
Pascal, Blaise, 10
PDE
  parabolic, 153
Planck, Max, 153
Poincaré, Henri, 9
Poisson representation, 261
Poisson, Siméon Denis, 79
pre-image, 50
probability
  conditional, 56
  density, 42, 49
  distribution, 43, 49
  elementary, 45
  mass function, 30
  net flow, 154
  posterior, 59
  prior, 59
  triple, 27, 42
process
  adapted, 105
  adaptive, 144
  Bernoulli, 81
  birth-and-death, 178
  cadlag, 30, 105
  diffusion, 119, 151
  jump, 117
  Markov, 138, 177
  nonanticipating, 105
  Ornstein-Uhlenbeck, 135
  Poisson, 79, 125
  Rayleigh, 165, 168
  Wiener, 108, 120, 131, 135, 143, 147, 174
product, reaction, 220
property
  extensive, 214
  intensive, 214
pseudovector, 171
quantile, 66
quasiprobability, 271
random walk, 128, 135
rate parameter
  probabilistic, 222
Rayleigh fading, 165
reactant, 220
reaction
  Belousov-Zhabotinskii, 9
reaction order, 193
Riemann, Bernhard, 47
sample
  space, 17
selection
  random, 248
semimartingale, 30, 105, 143
sensitivity
  to fluctuations, 9
set
  Borel, 34, 38
  Cantor, 41
  countable, 21
  empty, 18
  power, 34
  uncountable, 21
  Vitali, 35, 41
sets
  disjoint, 20
sigma-algebra, 37
  Borelian, 38
skewness, 67
space
  event, 37
  measurable, 37
spectroscopy
  fluorescence correlation, 6
  single molecule, 6
spectrum, 173
standard deviation, 63
  sample, 70
statistics, Bayesian, 58
Stieltjes, Thomas Jean, 30
stochastic process, 101
  independent, 104
  Markov, 106
  separable, 103
Stratonovich, Ruslan, 144
system
  closed, 193, 195
  event, 36, 37
  isolated, 193
systems
  dynamical, 8
theorem
  multiplication, 62
theory
  large samples, 94
time
  arrival, 127
  first passage, 107, 124, 249
  sequential extinction, 249
Tolman, Richard Chace, 170, 185
trajectory, 101
translation, 40
Uhlenbeck, George, 135
uncertainty
  quantum mechanical, 8
van Kampen, Nicholas, 210, 214
variable
  random, 28
variables
  continuous, 42
  discrete, 42
variance, 63
  sample, 70
vector
  axial, 171
  random, 87
Vitali, Giuseppe, 34
volume
  generalized, 39
von Smoluchowski, Marian, 7, 106, 153
Wiener, Norbert, 131