Stochastic Chemical Kinetics
A Special Course on Probability and
Stochastic Processes for Physicists,
Chemists, and Biologists
Summer Term 2011
Version of July 7, 2011
Peter Schuster
Institut für Theoretische Chemie der Universität Wien
Währingerstraße 17, A-1090 Wien, Austria
Phone: +43 1 4277 527 36, Fax: +43 1 4277 527 93
E-Mail: pks@tbi.univie.ac.at
Internet: http://www.tbi.univie.ac.at/~pks
Preface
The current text contains notes that were prepared first for a course on
‘Stochastic Chemical Kinetics’ held at Vienna University in the winter term
1999/2000 and repeated in the winter term 2006/2007. The current version
refers to the summer term 2011 but no claim is made that it is free of errors.
The course is addressed to students of chemistry, biochemistry, molecular
biology, mathematical and theoretical biology, bioinformatics, and systems
biology with particular interests in phenomena observed at small particle
numbers. Stochastic kinetics is an interdisciplinary subject and hence the
course will contain elements from various disciplines, mainly from probability
theory, mathematical statistics, stochastic processes, chemical kinetics, evo-
lutionary biology, and computer science. Considerable usage of mathematical
language and analytical tools is indispensable, but we have consciously
avoided dwelling on deeper and more formal mathematical topics.
This series of lectures will concentrate on principles rather than technical
details. At the same time it will be necessary to elaborate tools that allow
us to treat real problems. Analytical results on stochastic processes are rare
and thus it will be unavoidable to deal also with approximation methods and
numerical techniques that are able to produce (approximate) results through
computer calculations (see, for example, the articles [1–4]). The applicability
of simulations to real problems depends critically on the population sizes
that can be handled. Present day computers can deal with 10⁶ to 10⁷ particles,
which is commonly not enough for chemical reactions but sufficient for
most biological problems; accordingly, the sections dealing with practical
examples will contain more biological than chemical problems.
The major goal of this text is to spare the audience the distraction of taking
notes and to facilitate the understanding of subjects that are, at least in
parts, quite sophisticated. At the same time the text allows for a repetition
of the major issues of the course. Accordingly, an attempt was made to prepare a
useful and comprehensive list of references. Studying the literature in detail
is recommended to every serious scholar who wants to progress towards a
deeper understanding of this rather demanding discipline. Apart from a
respectable number of publications mentioned during the progress of the
course the following books were used in the preparation [5–9], the German
textbooks [10, 11], elementary texts in English [12, 13] and in German [14]. In
addition, we mention several other books on probability theory and stochastic
processes [15–26]. More references will be given in the chapters on chemical
and biological applications of stochastic processes.
Peter Schuster Wien, March 2011.
1. History and Classical Probability
An experimentalist reproduces an experiment. What can he expect to find?
There are certainly limits to precision and these limits confine the repro-
ducibility of experiments and at the same time restrict the predictability
of outcomes. The limitations of correct predictions are commonplace: We
witness them every day by watching the failures of various forecasts from
the weather to the stock market. Daily experience also tells us that there
is an enormous variability in the sensitivity of events with respect to preci-
sion of observation. It ranges from the highly sensitive and hard to predict
phenomena like the ones just mentioned to the enormous accuracy of astro-
nomical predictions, for example, the precise dating of the eclipse of the sun
in Europe on August 11, 1999. Most cases lie between these two extremes
and careful analysis of unavoidable randomness becomes important. In this
series of lectures we are heading for a formalism, which allows for proper ac-
counting of the limitations of the deterministic approach and which extends
the conventional description by differential equations.
In order to be able to study reproducibility and sensitivity of prediction
by means of a scientific approach we require a solid mathematical basis. An
appropriate conceptual frame is provided by the theory of probability and
stochastic processes. Conventional or deterministic variables have to be re-
placed by random variables which fluctuate in the sense that different values
are obtained in consecutive measurements under (within the limits of control)
identical conditions. The solutions of differential equations, commonly used
to describe time dependent phenomena, will not be sufficient and we shall
search for proper formalisms to model stochastic processes. Fluctuations play
a central role in the stochastic description of processes. The search for the
origin of fluctuations is an old topic of physics, which still is of high current
interest. Some examples will be mentioned in the next section 1.1. Recently,
the theory of fluctuations became important because of the progress in spec-
troscopic techniques, particularly in fluorescence spectroscopy. Fluctuations
became directly measurable in fluorescence correlation spectroscopy [27–29].
Even single molecules can be efficiently detected, identified, and analyzed by
means of these new techniques [30, 31].
In this series of lectures we shall adopt a phenomenological approach
to sources of randomness or fluctuations. Thus, we shall not search here
for the various mechanisms setting the limits to reproducibility (except for
a short account in the next section), but we will develop a mathematical
technique that allows us to handle stochastic problems. In the philosophy of
such a phenomenological approach the size of fluctuations, for example, is
given through external parameters that can be provided by means of a deeper
or more basic theory or derived from experiments. This course starts with a
very brief primer of current probability theory (chapter 2), which is largely
based on an undergraduate text by Kai Lai Chung [5] and an introduction to
stochasticity by Hans-Otto Georgii [11]. Then, we shall be concerned with
the general formalism to describe stochastic processes (chapter 3, [6]) and
present several analytical as well as numerical tools to derive or compute
solutions. The following two chapters 4 and 5 deal with applications to
selected problems in chemistry and biology.
1.1 Precision limits and fluctuations
Conventional chemical kinetics handles ensembles of molecules with large
numbers of particles, N ≈ 10²⁰ and more. Under many conditions¹ random
fluctuations of particle numbers are proportional to √N. Dealing with 10⁻⁴
moles, tantamount to N ≈ 10²⁰ particles, natural fluctuations typically involve
√N = 10¹⁰ particles and thus are in the range of ±10⁻¹⁰·N. Under these
conditions the detection of fluctuations would require a precision of the
order of one part in 10¹⁰, which is (almost always) impossible to achieve.²
Accordingly, the chemist uses concentrations rather than particle numbers,
c = N/(N_L·V), wherein N_L = 6.022×10²³ mol⁻¹ and V are Avogadro's number
and the volume (in dm³), respectively. Conventional chemical kinetics considers
concentrations as continuous variables and applies deterministic methods,
in essence differential equations, for the modeling and analysis of reactions.
Thereby, it is implicitly assumed that particle numbers are sufficiently large
that the limit of infinite particle numbers, in which fluctuations are
neglected, is fulfilled.
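The √N law is easy to check numerically. The following minimal Python sketch – an illustration added to these notes, not part of the original text, assuming NumPy is available – samples Poisson-distributed particle numbers, whose standard deviation is exactly √N, and shows how the relative fluctuations shrink as N grows:

    import numpy as np

    rng = np.random.default_rng(1)
    for n_mean in (1e2, 1e4, 1e6):
        # Poisson-distributed particle numbers: standard deviation = sqrt(N)
        samples = rng.poisson(n_mean, size=10_000)
        rel = samples.std() / samples.mean()
        print(f"N = {n_mean:.0e}: relative fluctuation {rel:.2e},"
              f" expected 1/sqrt(N) = {1 / n_mean**0.5:.2e}")

For N = 10²⁰ the same scaling predicts relative fluctuations of 10⁻¹⁰, far below any realistic detection limit.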
In 1827 the British botanist Robert Brown detected and analyzed ir-
regular motions of particles in aqueous suspensions that turned out to be
independent of the nature of the suspended materials – pollen grains, fine
particles of glass or minerals [32]. Although Brown himself had already
demonstrated that Brownian motion is not caused by some (mysterious)
biological effect, its origin remained kind of a riddle until Albert Einstein [33]
and, independently, Marian von Smoluchowski [34], published a satisfactory
explanation in 1905 which contained two main points:
(i) The motion is caused by highly frequent impacts on the pollen grain of
the steadily moving molecules in the liquid in which it is suspended.
(ii) The motion of the molecules in the liquid is so complicated in detail
that its effect on the pollen grain can only be described probabilistically
¹ Computation of fluctuations and their time course will be the subject of this course.
Here we mention only that the √N law is always fulfilled in the approach towards
equilibrium and towards stable stationary states.
² Most techniques of analytical chemistry meet serious difficulties when accuracies in
particle numbers of 10⁻⁴ or higher are required.
in terms of frequent statistically independent impacts.
In particular, Einstein showed that the number of particles per unit volume,
f(x, t),³ fulfils the already known differential equation of diffusion,

    ∂f/∂t = D ∂²f/∂x²   with the solution   f(x, t) = (N / √(4πDt)) exp(−x²/(4Dt)) ,
where N is the total number of particles. From the solution of the diffusion
equation Einstein computed the square root of the mean square displacement,
λ_x, which the particle experiences in x-direction:

    λ_x = √⟨x²⟩ = √(2Dt) .
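As a numerical aside (not part of the original text), Einstein's result λ_x = √(2Dt) can be verified by simulating an ensemble of one-dimensional random walks with Gaussian increments; the sketch below assumes NumPy and arbitrary illustrative values for D and the time step:

    import numpy as np

    rng = np.random.default_rng(2)
    D, dt, n_steps, n_particles = 1.0, 0.01, 1000, 5000

    # independent Gaussian increments with variance 2*D*dt per step
    steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_particles, n_steps))
    x = steps.cumsum(axis=1)                    # trajectories x(t)

    t_final = dt * n_steps
    lambda_x = np.sqrt((x[:, -1] ** 2).mean())  # root mean square displacement
    print(lambda_x, np.sqrt(2 * D * t_final))   # the two numbers nearly coincide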
Einstein’s treatment is based on discrete time steps and thus contains an
approximation – that is well justified – but it represents the first analysis
based on a probabilistic concept of a process that is comparable to the current
theories and we may consider Einstein’s paper as the beginning of stochastic
modeling. Brownian motion was indeed the first completely random process
that became accessible to a description that was satisfactory by the standards
of classical physics. Thermal motion as such had been used previously as the
irregular driving force causing collisions of molecules in gases by James Clerk
Maxwell and Ludwig Boltzmann. The physicists in the second half of the
nineteenth century, however, were concerned with molecular motion only as
it is required to describe systems in the thermodynamic limit. They derived
the desired results by means of global averaging statistics.
Thermal motion as an uncontrollable source of random fluctuation has
been complemented by quantum mechanical uncertainty as another limita-
tion of achievable precision. For the purpose of this course the sensitivity of
processes to small (and uncontrolled) changes in initial conditions, however,
is of more relevance than the consequences of uncertainty. Analysis of com-
plex dynamical systems was initiated in essence by Edward Lorenz [35] who
detected through computer integration of differential equations what is nowa-
days called deterministic chaos. Complex dynamics in physics and chemistry
³ For the sake of simplicity we consider only motion in one spatial direction, x.
has been known already earlier, as the works of the French mathematician
Henri Poincaré and the German chemist Wilhelm Ostwald demonstrate. New
in the second half of the twentieth century were not the ideas but the tools to study
complex dynamics. A previously unknown power in the analysis by numerical
computation became available through easy access to electronic computers.
These studies have shown that the majority of dynamical systems modeled
by nonlinear differential equations show irregular – that means non-periodic
– oscillations for certain ranges of parameter values. In these chaotic regimes
solution curves were found to be extremely sensitive to small changes in the
initial conditions. Solution curves which are almost identical at the beginning
deviate exponentially from each other. Limitations in the control of initial
conditions, which are inevitable because of the natural limits to achievable
precision, result in upper bounds of the time spans for which the dynamics of
the system can be predicted with sufficient accuracy. It is not accidental that
Lorenz detected chaotic dynamics first in equations for atmospheric motions
which are thought to be so complex that forecast is inevitably limited to
rather short times.
Fluctuations play an important role in highly sensitive dynamical sys-
tems. Commonly fluctuations increase with time and any description of such
a system will be incomplete when it does not consider their development
in time. Thermal fluctuations are highly relevant at low concentration of
one, two or more reaction partners or intermediates and such situations oc-
cur almost regularly in oscillating or chaotic systems. An excellent and well
studied example in chemistry is the famous Belousov-Zhabotinskii reaction.
In biology, on the other hand, we encounter regularly situations that are
driven by fluctuations. Every mutation leading to a new variant produces
a single individual at first. Whether or not the mutant will be amplified to
the population level depends on both the properties of the new individual and
on events that are completely governed by chance.
Both phenomena, quantum mechanical uncertainty and the sensitivity of
complex dynamics, put an ultimate end to the deterministic view of the
world. Quantum mechanics sets a limit in principle to determinism, one that
commonly becomes evident only in the world of atoms and molecules. Limited
predictability of complex dynamics is more of a practical nature: Although
the differential equations used to describe and analyze chaos are still deter-
ministic, initial conditions of a precision that can never be achieved in reality
would be required for correct long-time predictions.
1.2 Thinking in terms of probability
The concept of probability originated from the desire to analyze gambling
by rigorous mathematical thoughts. An early study that has largely re-
mained unnoticed but contained already the basic ideas of probability was
done in the sixteenth century by the Italian mathematician Gerolamo Car-
dano. Commonly, the beginning of classical probability theory is attributed
to the French mathematician Blaise Pascal who wrote in the middle of the
seventeenth century – 100 years later – several letters to Pierre de Fermat. The
most famous of these letters, dated July 29, 1654, reports the careful observation
of a professional gambler, the Chevalier de Méré. The Chevalier had observed
that obtaining at least one "six" with one die in 4 throws is successful in more
than 50% of cases, whereas obtaining at least one double six with two dice in
24 throws has a less than 50% chance to win. He considered this finding
a paradox because he calculated naïvely and erroneously that the chances
should be the same:
    4 throws with one die yield     4 × 1/6 = 2/3 ,

    24 throws with two dice yield   24 × 1/36 = 2/3 .
Blaise Pascal became interested in the problem and calculated correctly the
probability as we do it now in classical probability theory by careful counting
of events:
    probability = Prob = (number of favorable events) / (total number of events) .   (1.1)
Probability according to equation (1.1) is always a quantity between
zero and one, 0 ≤ Prob ≤ 1. The sum of the probabilities that an event
has occurred or did not occur thus always equals one. Sometimes, as in
Pascal’s example, it is easier to calculate the probability of the unfavorable
case, q, and to obtain the desired probability as p = 1 − q. In the one-
die example the probability not to throw a “six” is 5/6, in the two-dice
example we have 35/36 as the probability of failure. In the case of independent
events probabilities are multiplied⁴ and we finally obtain for 4 and 24 trials,
respectively:
    q(1) = (5/6)⁴    and   p(1) = 1 − (5/6)⁴ = 0.5177 ,

    q(2) = (35/36)²⁴  and   p(2) = 1 − (35/36)²⁴ = 0.4914 .
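Pascal's counting argument is easily reproduced by exact rational arithmetic; the following fragment (an added illustration using Python's standard library) recovers the two probabilities:

    from fractions import Fraction

    p1 = 1 - Fraction(5, 6) ** 4     # at least one six in 4 throws of one die
    p2 = 1 - Fraction(35, 36) ** 24  # at least one double six in 24 double throws
    print(float(p1))                 # 0.5177...
    print(float(p2))                 # 0.4914...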
It is remarkable that the gambler could observe this rather small difference
in the probability of success – he must have tried the game very often indeed!
Statistics in biology has been pioneered by the Augustinian monk Gregor
Mendel. In table 1.1 we list the results of two typical experiments distin-
guishing roundish or wrinkled seeds with yellow or green color. The ratios
observed for single plants show large scatter. In the mean values for ten
plants some averaging has occurred but still the deviations from the ideal
values are substantial. Mendel carefully investigated several hundred plants
and then the statistical law of inheritance demanding a ratio of 3:1 became
evident [36]. Ronald Fisher in a rather polemic publication [37] reanalyzed
Mendel's experiments, questioned Mendel's statistics, and accused him of
intentionally manipulating his data because the results are too close to the
ideal ratio. Fisher's publication initiated a long-lasting debate during which
the majority of scientists spoke up in favor of Mendel, until a recent book (2008)
declared the end of the Mendel–Fisher controversy [38]. In chapter 5 we shall
discuss statistical laws and Mendel’s statistics in the light of present day
mathematical statistics.
The third example we mention here can be used to demonstrate the usual
weakness of people in estimating probabilities. Let your friends guess –
without calculating – how many persons you need in a group such that there
is a fifty percent chance that at least two of them celebrate their birthday on
the same day. You will be surprised by the oddness of the answers!
⁴ We shall come back to the problem of independent events later when we introduce
current probability theory in section 2, which is based on set theory.
Table 1.1: Statistics of Gregor Mendel's experiments with the garden
pea (Pisum sativum). The results of two typical experiments with ten plants are
shown. In total Mendel analyzed 7324 seeds from 253 hybrid plants in the second
trial year; 5474 were round or roundish and 1850 angular wrinkled, yielding a ratio
of 2.96:1. The color was recorded for 8023 seeds from 258 plants, out of which 6022
were yellow and 2001 were green, with a ratio of 3.01:1.

              Form of seeds                 Color of seeds
    plant   round  angular  ratio       yellow  green  ratio
      1       45     12      3.75         25     11     2.27
      2       27      8      3.38         32      7     4.57
      3       24      7      3.43         14      5     2.80
      4       19     10      1.90         70     27     2.59
      5       32     11      2.91         24     13     1.85
      6       26      6      4.33         20      6     3.33
      7       88     24      3.67         32     13     2.46
      8       22     10      2.20         44      9     4.89
      9       28      6      4.67         50     14     3.57
     10       25      7      3.57         44     18     2.44
    total    336    101      3.33        355    123     2.89
With our knowledge of the gambling problem the probability is easy to calculate.
First we compute the negative event: all persons celebrate their birthdays on
different days of the year – 365 days, no leap year – and find for n people in
the group:⁵
    q = (365/365) · (364/365) · (363/365) · … · ((365 − (n−1))/365)   and   p = 1 − q .
⁵ The expression is obtained by the argument that the first person can choose his
birthday freely. The second person must not choose the same day and so he has 364
possible choices. For the third 363 choices remain and the nth person, ultimately, has
365 − (n−1) possibilities.
Figure 1.1: The birthday puzzle. The curve shows the probability p(n) that at
least two persons in a group of n people celebrate their birthday on the same day
of the year.
The function p(n) is shown in figure 1.1. For the above mentioned 50%
chance we need only 23 persons; with 41 people we already have a more than
90% chance that two celebrate their birthday on the same day; 57 yield more
than 99%, and 70 persons exceed 99.9%.
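The product formula for q is readily evaluated; a small Python function, added here for illustration, computes p(n) for any group size:

    def p_shared_birthday(n: int) -> float:
        """Probability that at least two of n people share a birthday (365 days)."""
        q = 1.0
        for k in range(n):
            q *= (365 - k) / 365
        return 1.0 - q

    for n in (23, 41, 57, 70):
        print(n, round(p_shared_birthday(n), 4))  # 0.5073, 0.9032, 0.9901, 0.9992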
The fourth and final example deals again with counterintuitive probabilities:
the coin toss game Penney Ante invented by Walter Penney [39].
Before a sufficiently long sequence of heads and tails is determined by flipping,
each of two players chooses a sequence of n consecutive flips – commonly
n = 3 is applied, and this leaves the choice of the eight triples shown in
table 1.2. The second player has the advantage of knowing the choice of the
first player. Then a sufficiently long sequence of coin flips is recorded until
one of the two chosen triples appears in the sequence; the player whose
sequence appeared first has won. The advantage of the second player
is commonly largely underestimated when guessed without calculation. A
simple argument illustrates the disadvantage of player 1: Assume he had
chosen '111'. If the second player chooses a triple starting with '0', the only
chances for player 1 to win are expressed by the sequences beginning '111…',
and these have a probability of p = 1/8, leading to odds of 7 to 1 for player 2.
Eventually, we mention the optimal strategy for player 2: Take the first two
digits of the three-bit sequence of player 1 and precede them with the opposite
of the symbol in the middle of the triple (the shifted pair is shown in red,
the switched symbol in bold in table 1.2).
Table 1.2: Advantage of the second player in Penney's game. Two players
choose two triples of digits one after the other, player 2 after player 1. Coins are
flipped until one of the two triples appears; the player whose triple came first has
won. An optimally gambling player 2 (column 2) has the advantage shown in
column 3. Code: 1 = head and 0 = tail. The optimal strategy for player 2 is
encoded by color and boldface (see text).

    Player 1   Player 2   Odds in favor of player 2
      111        011             7 to 1
      110        011             3 to 1
      101        110             2 to 1
      100        110             2 to 1
      011        001             2 to 1
      010        001             2 to 1
      001        100             3 to 1
      000        100             7 to 1
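The quoted odds can be checked by direct simulation. The sketch below – added here for illustration, with an arbitrary random seed – plays Penney's game repeatedly for the worst choice of player 1:

    import random

    def winner(triple1: str, triple2: str, rng: random.Random) -> int:
        seq = ""
        while True:                    # flip a fair coin until a triple appears
            seq += rng.choice("01")
            if seq.endswith(triple1):
                return 1
            if seq.endswith(triple2):
                return 2

    rng = random.Random(3)
    trials = 100_000
    wins2 = sum(winner("111", "011", rng) == 2 for _ in range(trials))
    print(wins2 / trials)              # close to 7/8, i.e. odds of 7 to 1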
Probability theory in its classical form is more than 300 years old. Not
accidentally the concept arose in thinking about gambling, which was con-
sidered as a domain of chance in contrast to rigorous science. It took indeed
a rather long time before the concept of probability entered scientific thought
in the nineteenth century. The main obstacle for the acceptance of probabilities
in physics was the strong belief in determinism that was not overcome
before the advent of quantum theory. Probabilistic concepts in nineteenth
century physics were still based on deterministic thinking, although
the details of individual events were considered to be too numerous and too
complex to be accessible to calculation. It is worth mentioning that thinking
in terms of probabilities entered biology earlier, already in the second
half of the nineteenth century, through the reported works of Gregor Mendel
on genetic inheritance. The reason for this difference appears to lie in the
very nature of biology: small sample sizes are typical, most of the regular-
ities are probabilistic and become observable only through the application
of probability theory. Ironically, Mendel’s investigations and papers did not
attract a broad scientific audience before their rediscovery at the beginning
of the twentieth century. The scientific community in the second half of
the nineteenth century was simply not yet prepared for the acceptance of
probabilistic concepts.
Although classical probability theory can be applied successfully to a
great variety of problems, a more elaborate notion of probability that is
derived from set theory is advantageous and absolutely necessary for extrap-
olation to infinitely large sample size. Here we shall use the latter concept
because it is easily extended to probability measures on continuous variables
where numbers of sample points are not only infinite but also uncountable.
2. Probabilities, Random Variables, and
Densities
The development of set theory initiated by Georg Cantor and Richard Dedekind
in the 1870s provided a possibility to build the concept of probability
on a firm basis that allows for an extension to certain families of
uncountable samples as they occur, for example, with continuous variables.
Present day probability theory thus can be understood as a convenient ex-
tension of the classical concept by means of set and measure theory. We start
by repeating a few indispensable notions and operations of set theory.
2.1 Sets and sample spaces
Sets are collections of objects with two restrictions: (i) each object belongs
to one set and cannot be a member of more than one set, and (ii) a member of
a set must not appear twice or more often. In other words, objects are assigned
to sets unambiguously. In application to probability theory we shall denote the
elementary objects by the small Greek letter omega, ω – if necessary with
various sub- and superscripts – and call them sample points or individual
results; the collection of all objects ω under consideration, the sample
space, is denoted by Ω with ω ∈ Ω. Events, A, are subsets of sample points
that fulfil some condition¹

    A = {ω, ωₖ ∈ Ω : f(ω) = c} ,     (2.1)

with ω = (ω₁, ω₂, …) being some set of individual results and f(ω) = c
encapsulating the condition imposed on the ensemble of sample points ωₖ.
¹ What a condition means will become clear later. For the moment it is sufficient to
understand a condition as a function providing a restriction, which implies that not all
subsets of sample points belong to A.
Any partial collection of points is a subset of Ω. We shall be dealing
with fixed Ω and, for simplicity, often call these subsets of Ω just sets. There
are two extreme cases, the entire sample space Ω and the empty set, ∅.
The number of points in some set S is called its size, |S|, and thus is a
nonnegative integer or ∞. In particular, the size of the empty set is |∅| = 0.
The unambiguous assignment of points to sets can be expressed by²

    ω ∈ S   exclusive or   ω ∉ S .
Consider two sets A and B. If every point of A belongs to B, then A is
contained in B. A is a subset of B and B is a superset of A:
A ⊂ B and B ⊃ A .
Two sets are identical if they contain exactly the same points, and then we
write A = B. In other words, A = B iff (if and only if) A ⊂ B and B ⊂ A.
The basic operations with sets are illustrated in figure 2.1. We briefly repeat
them here:
Complement. The complement of the set A is denoted by Aᶜ and consists
of all points not belonging to A:³

    Aᶜ = {ω | ω ∉ A} .     (2.2)
There are three evident relations which can be verified easily: (Aᶜ)ᶜ = A,
Ωᶜ = ∅, and ∅ᶜ = Ω.
Union. The union of the two sets A and B, A∪B, is the set of points which
belong to at least one of the two sets:

    A ∪ B = {ω | ω ∈ A or ω ∈ B} .     (2.3)
² In order to be unambiguously clear we shall write or for and/or and exclusive or for
or in the strict sense.
³ Since we are considering only fixed sample sets Ω these points are uniquely defined.
Figure 2.1: Some definitions and examples from set theory. Part a shows
the complement Aᶜ of a set A in the sample space Ω. In part b we explain the
two basic operations union and intersection, A ∪ B and A ∩ B, respectively. Parts
c and d show the set-theoretic difference, A \ B and B \ A, and the symmetric
difference, A∆B. In parts e and f we demonstrate that a vanishing intersection of
three sets does not imply pairwise disjoint sets.
Intersection. The intersection of the two sets A and B, A ∩ B, is the set
of points which belong to both sets (for short, A ∩ B is sometimes written
AB):

    A ∩ B = {ω | ω ∈ A and ω ∈ B} .     (2.4)
Unions and intersections can be executed in sequence and are also defined
for more than two sets, or even for an infinite number of sets:
⋃
n=1,...
An = A1 ∪ A2 ∪ · · · = ω|ω ∈ An for at least one value of n ,
⋂
n=1,...
An = A1 ∩ A2 ∩ · · · = ω|ω ∈ An for all values of n .
These relations are true because the commutative and the associative laws
are fulfilled by both operations, intersection and union:
A ∪ B = B ∪A , A ∩B = B ∩ A ;
(A ∪ B) ∪ C = A ∪ (B ∪ C) , (A ∩B) ∩ C = A ∩ (B ∩ C) .
Difference. The set A \ B is the set of points which belong to A but not
to B:

    A \ B = A ∩ Bᶜ = {ω | ω ∈ A and ω ∉ B} .     (2.5)

In case A ⊃ B we write A − B for A \ B and have A \ B = A − (A ∩ B) as
well as Aᶜ = Ω − A.
Symmetric difference. The symmetric difference A∆B is the set of points
which belong exactly to one of the two sets A and B. It is used in advanced
set theory and is symmetric as it fulfils the commutative law, A∆B = B∆A:

    A∆B = (A ∩ Bᶜ) ∪ (Aᶜ ∩ B) = (A \ B) ∪ (B \ A) .     (2.6)
Disjoint sets. Disjoint sets A and B have no points in common and hence
their intersection, A ∩ B, is empty. They fulfil the following relations:

    A ∩ B = ∅ , A ⊂ Bᶜ and B ⊂ Aᶜ .     (2.7)

A number of sets are disjoint only if they are pairwise disjoint. For three sets,
A, B and C, this requires A ∩ B = ∅, B ∩ C = ∅, and C ∩ A = ∅. When
two sets are disjoint the addition symbol is (sometimes) used for the union,
A + B for A ∪ B. Clearly we always have the decomposition Ω = A + Aᶜ.
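All operations introduced above have direct counterparts in the built-in set type of Python, which can serve as a quick sandbox for the definitions (an added illustration with arbitrary example sets):

    Omega = set(range(1, 11))
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}

    print(A | B)             # union A ∪ B
    print(A & B)             # intersection A ∩ B
    print(A - B)             # difference A \ B
    print(A ^ B)             # symmetric difference A ∆ B
    print(Omega - A)         # complement of A with respect to Omega
    print(A & B == set())    # test for disjointness (False here)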
Figure 2.2: Sizes of sample sets and countability. Finite, countably infinite,
and uncountable sets are distinguished; we show examples of every class. A set
is countably infinite when its elements can be assigned uniquely to the natural
numbers (1, 2, 3, …, n, …).
Sample spaces may contain finite or infinite numbers of sample points.
As shown in figure 2.2 it is important to distinguish further between different
classes of infinity: countable and uncountable numbers of points. The set of
rational numbers, for example, is countably infinite since the numbers can
be labeled and assigned uniquely to the positive integers 1 < 2 < 3 < ⋯ <
n < ⋯. The set of real numbers cannot be ordered in such a way and hence
it is uncountable.
2.2 Probability measure on countable sample spaces
Although we are, in principle, equipped now with the tools of probability
theory that shall enable us to handle uncountable sets under certain
conditions, the starting point pursued in this section is chosen under the
assumption that our sets are countable.
2.2.1 Probabilities on countable sample spaces
For countable sets it is straightforward to measure the size of sets by counting
the numbers of points they contain. The proportion

    P(A) = |A| / |Ω|     (2.8)

is identified as the probability of the event represented by the elements of
subset A. For another event we have, for example, P(B) = |B|/|Ω|. Calculating
the sum of the two probabilities, P(A) + P(B), requires some care since we
know only (figure 2.1):
    |A| + |B| ≥ |A ∪ B| .

The excess of |A| + |B| over the size of the union, |A ∪ B|, is precisely the size
of the intersection, |A ∩ B|, and thus we find

    |A| + |B| = |A ∪ B| + |A ∩ B| ,

or, by division through the size of the sample space Ω,

    P(A) + P(B) = P(A ∪ B) + P(A ∩ B) .

Only when the intersection is empty, A ∩ B = ∅, are the two sets disjoint,
their sizes additive, |A ∪ B| = |A| + |B|, and hence

    P(A + B) = P(A) + P(B) iff A ∩ B = ∅ .     (2.9)
It is important to memorize this condition for later use, because it represents
an implicitly made assumption for computing probabilities.
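For a countable sample space the relation P(A) + P(B) = P(A ∪ B) + P(A ∩ B) can be verified directly by counting; the following added snippet uses a single fair die as sample space:

    Omega = set(range(1, 7))   # one fair die
    A = {2, 4, 6}              # even score
    B = {4, 5, 6}              # score of at least four

    def P(S):
        return len(S) / len(Omega)

    print(P(A) + P(B))             # 1.0
    print(P(A | B) + P(A & B))     # 1.0 as well - the two sides agree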
Now we can define a probability measure by means of the basic axioms
of probability theory (for alternative axioms in probability theory see, for
example [40, 41]):
A probability measure on the sample space Ω is a function of subsets of
Ω, P : S → P (S) or P (·) for short, which is defined by the three axioms:
(i) For every set A ⊂ Ω, the value of the probability measure is a nonneg-
ative number, P (A) ≥ 0 for all A,
(ii) the probability measure of the entire sample set – as a subset – is equal
to one, P (Ω) = 1, and
(iii) for any two disjoint subsets A and B, the value of the probability measure
for the union, A ∪ B = A + B, is equal to the sum of its values for
A and for B,

    P(A ∪ B) = P(A + B) = P(A) + P(B) provided A ∩ B = ∅ .
Condition (iii) implies that for any countable – eventually infinite – collection
of disjoint or non-overlapping sets, Aᵢ (i = 1, 2, 3, …) with Aᵢ ∩ Aⱼ = ∅ for
all i ≠ j, the relation called σ-additivity

    P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ)   or   P(Σ_{k=1}^{∞} Aₖ) = Σ_{k=1}^{∞} P(Aₖ)     (2.10)
holds. Clearly we also have P(Aᶜ) = 1 − P(A), P(A) = 1 − P(Aᶜ) ≤ 1, and
P(∅) = 0. For any two sets with A ⊂ B we have P(A) ≤ P(B) and P(B − A) =
P(B) − P(A). For any two arbitrary sets A and B we can write A ∪ B as a sum of
disjoint sets:

    A ∪ B = A + Aᶜ ∩ B   and   P(A ∪ B) = P(A) + P(Aᶜ ∩ B) .

Since Aᶜ ∩ B ⊂ B we obtain P(A ∪ B) ≤ P(A) + P(B).
The set of all subsets of Ω is the powerset Π(Ω) (figure 2.3). It contains
the empty set ∅, the sample space Ω, and all subsets of Ω, and this includes
the results of all set-theoretic operations that were listed above.
Figure 2.3: The powerset. The powerset Π(Ω) is a set containing all subsets
of Ω including the empty set ∅ and Ω itself. The figure sketches the powerset of
three events A, B, and C.
The relation
between the sample point ω, an event A, the sample space Ω, and the powerset
Π(Ω) is illustrated by means of an example presented already in section 1.2
as Penney's game: the repeated coin toss. Flipping a coin has two outcomes:
'0' for heads and '1' for tails (see also the Bernoulli process in subsection 2.7.5). The
sample points for flipping the coin n times are binary n-tuples or strings, ω =
(ω₁, ω₂, …, ωₙ) with ωᵢ ∈ {0, 1}.⁴ It is useful to consider also infinite numbers
of repeats, in particular for computing limits n → ∞: ω = (ω₁, ω₂, …) =
(ωᵢ)_{i∈ℕ} with ωᵢ ∈ {0, 1}. Then we are dealing with infinitely long binary
strings, and the sample space Ω = {0, 1}^ℕ is the space of all infinitely long
binary strings. Whereas every finite binary string represents the binary
encoding of a natural number Nₖ ∈ ℕ₀, and the set of finite strings is therefore
countable, the space of all infinitely long binary strings is uncountable, as
Cantor's diagonal argument shows (see also section 2.3).
A subset of Ω will be called an event A when a probability measure
⁴ There is a trivial but important distinction between strings or n-tuples and sets: In
a string the position of an element matters, whereas in a set it does not. The following
three sets are identical: {1, 2, 3} = {3, 1, 2} = {1, 2, 2, 3}. In order to avoid ambiguities
strings are written in (normal) parentheses and sets in curly brackets.
derived from axioms (i), (ii), and (iii) has been assigned. Commonly, one is
not interested in the full detail of a probabilistic result, and events can easily
be adapted by lumping together sample points. We ask, for example, for
the probability of the event A that n coin flips yield at least k times tail (the
score for tail is 1):

    A = {ω = (ω₁, ω₂, …, ωₙ) ∈ Ω : Σ_{i=1}^{n} ωᵢ ≥ k} ,
where the sample space is Ω = {0, 1}ⁿ. The task is now to find a system
of events F that allows for a consistent assignment of a probability P(A)
to every event A. For countable sample spaces Ω the powerset Π(Ω) represents
such a system F; we characterize P(A) as a probability measure
on (Ω, Π(Ω)), and the further handling of probabilities as outlined below
is straightforward. In the case of uncountable sample spaces Ω, however, the
powerset Π(Ω) is too large and a more sophisticated procedure is required
(section 2.3).
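For the countable case the probability of the event A defined above follows from simple counting of binary strings; a short added sketch evaluates it with the binomial coefficient from the standard library:

    from math import comb

    def p_at_least_k_tails(n: int, k: int) -> float:
        # number of n-strings with at least k ones, divided by 2**n
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    print(p_at_least_k_tails(10, 7))   # 0.171875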
So far we have constructed and compared sets but not yet introduced
numbers for actual computations. In order to construct a probability measure
that is adaptable for numerical calculations on some countable sample space,
Ω = {ω₁, ω₂, …, ωₙ, …}, we assign a weight ϱₙ to every sample point ωₙ
subject to the conditions

    ∀ n : ϱₙ ≥ 0 ;   Σₙ ϱₙ = 1 .     (2.11)

Then, for P(ωₙ) = ϱₙ ∀ n, the following two equations

    P(A) = Σ_{ω∈A} ϱ(ω) for A ∈ Π(Ω)   and   ϱ(ω) = P({ω}) for ω ∈ Ω     (2.12)

represent a bijective relation between the probability measure P on (Ω, Π(Ω))
and the sequences ϱ = (ϱ(ω))_{ω∈Ω} in [0, 1] with Σ_{ω∈Ω} ϱ(ω) = 1. Such a
sequence is called a (discrete) probability density.
The function ϱ(ωₙ) = ϱₙ has to be estimated or determined empirically
because it is the result of factors lying outside mathematics or probability
theory.
Figure 2.4: Probabilities of throwing two dice. The probabilities of obtaining
two to twelve counts by throwing two perfect or fair dice are based on the
equal-probability assumption for the individual faces of a single die.
The probability P(N) rises linearly from two to seven and then decreases linearly
between seven and twelve (P(N) is a discretized tent map), and the additivity
condition requires Σ_{k=2}^{12} P(N = k) = 1.
In physics and chemistry the correct assignment of probabilities has
to meet the conditions of the experimental setup. An example will make this
point clear: Whether a die is fair and shows all of its six faces with equal
probability, or has been manipulated and shows the 'six' more frequently
than the other numbers, is a matter of physics, not mathematics. For many
purposes the discrete uniform distribution, U_Ω, is applied: All results
ω ∈ Ω appear with equal probability and hence ϱ(ω) = 1/|Ω|.

With the assumption of the uniform distribution U_Ω we can measure the size
of sets by counting sample points, as illustrated best by considering throws of dice.
Figure 2.5: An ordered partial sum of a random variable. The sum
Sₙ = Σ_{k=1}^{n} Xₖ represents the cumulative outcome of a series of events described
by a class of random variables, Xₖ. The series can be extended to +∞, and such a
case will be encountered, for example, with probability distributions. The ordering
criterion is not yet specified; it could be time t, for example.
For one die the sample space is Ω = {1, 2, 3, 4, 5, 6} and for the fair
die we make the assumption

    P(k) = 1/6 ; k = 1, 2, 3, 4, 5, 6 ,

that all six outcomes corresponding to different faces of the die are equally
likely. Based on the assumption of U_Ω we obtain the probabilities for the
outcome of two simultaneously thrown fair dice shown in figure 2.4. The
most likely outcome is a count of seven points because it has the largest
multiplicity: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.
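The multiplicities behind figure 2.4 can be enumerated exhaustively; the following added fragment counts all 36 equally likely outcomes:

    from collections import Counter
    from fractions import Fraction

    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
    for score in range(2, 13):
        print(score, Fraction(counts[score], 36))
    # the score 7 has the largest probability, 6/36 = 1/6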
2.2.2 Random variables and functions
For the definition of random variables on countable sets a probability triple
(Ω,Π(Ω), P ) is required: Ω contains the sample points or individual results,
the powerset Π(Ω) provides the events A as subsets, and P eventually represents
the probability measure defined by equation (2.12). Based on such
a probability triple we define a random variable as a numerically valued
function X of ω on the domain of the entire sample space Ω,

    ω ∈ Ω : ω → X(ω) .     (2.13)
Random variables, X (ω) and Y(ω), can be subject to operations to yield
other random variables, such as
    X(ω) + Y(ω) , X(ω) − Y(ω) , X(ω)·Y(ω) , X(ω)/Y(ω) [Y(ω) ≠ 0] ,
and, in particular, also any linear combination of random variables such as
aX (ω) + bY(ω) is a random variable too. Just as a function of a function is
still a function, a function of a random variable is a random variable,
ω ∈ Ω : ω → ϕ (X (ω),Y(ω)) = ϕ(X ,Y) .
Particularly important cases are the (partial) sums of n variables:

    Sₙ(ω) = X₁(ω) + … + Xₙ(ω) = Σ_{k=1}^{n} Xₖ(ω) .     (2.14)

Such a partial sum Sₙ could be, for example, the cumulative outcome of
n successive throws of a die.⁵ Consider, for example, an ordered series of
events where the current cumulative outcome is given by the partial sum
Sₙ = Σ_{k=1}^{n} Xₖ, as shown in figure 2.5. In principle, the series can be extended
to t → ∞.
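A partial sum of this kind is generated in one line from a sequence of simulated die throws; the added sketch below (assuming NumPy) produces a realization of the step function sketched in figure 2.5:

    import numpy as np

    rng = np.random.default_rng(4)
    throws = rng.integers(1, 7, size=20)   # X_1, ..., X_20: scores of a fair die
    S = np.cumsum(throws)                  # partial sums S_n = X_1 + ... + X_n
    print(throws)
    print(S)                               # monotonically increasing steps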
The ordered partial sum is a step function, and its precise definition is
hidden in equations (2.13) and (2.14). Three definitions are possible
for the value of the function at the discontinuity. We present them for the
Heaviside step function:

    H(x) = { 0             if x < 0 ,
           { 0, 1/2, or 1  if x = 0 ,     (2.15)
           { 1             if x > 0 .
⁵ The use of partial in this context implies that the sum does not cover the entire
sample space at the moment. Series of throws of dice, for example, could be continued in
the future.
Figure 2.6: The cumulative distribution function of fair dice. The cumulative
probability distribution function (cdf) or probability distribution is a mapping
from the sample space Ω onto the unit interval [0, 1] of ℝ. It corresponds to the
ordered partial sum with the ordering parameter being the score determined by
the stochastic variable. The example shown deals with throwing fair dice: The
distribution for one die (black) consists of six steps of equal height at the scores
1, 2, …, 6. The second curve (red) is the probability distribution for throwing two
dice (figure 2.4).
The value '0' at x = 0 implies left-hand continuity for H(x) and in terms
of a probability distribution would correspond to a definition P(X < x)
in equation (2.16); the value 1/2 implies that H(x) is neither right-hand nor
left-hand semi-differentiable at x = 0 but is useful in many applications that
make use of the inherent symmetry of the Heaviside function, for example the
relation H(x) = (1 + sgn(x))/2, where sgn(x) is the sign or signum function:

    sgn(x) = { −1  if x < 0 ,
             {  0  if x = 0 ,
             {  1  if x > 0 .
The functions used in probability theory follow the third definition, determined
by P(X ≤ x) or H(0) = 1 in the case of the Heaviside function.

Right-hand continuity is an important definition in the conventional handling
of stochastic processes, for example for semimartingales (subsection 3.1.1).
Often the property of right-hand continuity with left limits is denoted as
càdlàg, which is an acronym from the French "continue à droite, limites à
gauche".
An important step function for the characterization of a discrete probability
distribution is the cumulative distribution function (cdf, see also
subsection 2.2.3). It is a mapping from sample space into the real numbers
on the unit interval, P(X ≤ x; Ω) ⇒ F_X(x) ∈ ℝ with 0 ≤ F_X(x) ≤ 1, defined
by

    F_X(x) = P(X ≤ x)  with  lim_{x→−∞} F_X(x) = 0  and  lim_{x→+∞} F_X(x) = 1 .  (2.16)
Two examples for throwing one die or two dice are shown in figure 2.6. The
distribution function is defined for the entire x-axis, x ∈ R, but cannot be
integrated by conventional Riemann integration. The cumulative distribution
function and the partial sums of random variables, however, are continuous
and differentiable on the right-hand side of the steps and therefore they are
Stieltjes-Lebesgue integrable (see subsection 2.3.5).
The probability mass function (pmf) is also a mapping from sample
space into the real numbers and gives the probability that a discrete random
variable X attains exactly some value x. We assume that X is a discrete
random variable on the sample space Ω, X : Ω → ℝ, and then we define the
probability mass function as a mapping onto the unit interval, f_X : ℝ → [0, 1],
by

    f_X(x) = P(X = x) = P({s ∈ Ω : X(s) = x}) .     (2.17)
Sometimes it is useful to be able to treat a discrete probability distribution
as if it were continuous. The function f_X(x) is therefore defined for all real
numbers x ∈ ℝ, including those outside the sample set; then we have
f_X(x) = 0 ∀ x ∉ X(Ω). Figure 2.7 shows the probability mass functions of
fair dice corresponding to the cumulative distributions in figure 2.6.
Figure 2.7: Probability mass functions of fair dice. The probability mass
functions (pmf), f_X(x), belong to the two probability distributions F_X(x) shown
in figure 2.6. The upper part of the figure represents the scores x obtained with
one die. The pmf is zero everywhere on the x-axis except at a set of points,
x = {1, 2, 3, 4, 5, 6}, of measure zero where it adopts the value 1/6 (black). The
lower part contains the probability mass function for simultaneously throwing two
dice (red, see also figure 2.4). The maximal probability value is obtained for the
score x = 7.
For throwing one die the pmf consists of six peaks, f_X(k) = 1/6 with
k = 1, 2, …, 6, and has the value f_X(x) = 0 everywhere else (x ≠ k). In the
case of two dice the probability mass function corresponds to the discrete
probabilities shown in figure 2.4. It contains, in essence, the same information
as the cumulative distribution function or the listing of discrete probabilities.
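The pmf and cdf of the two-dice example are obtained from each other by summation and differencing; the following added lines (assuming NumPy) tabulate both:

    import numpy as np

    scores = np.arange(2, 13)
    pmf = np.array([min(s - 1, 13 - s) for s in scores]) / 36  # two fair dice
    cdf = np.cumsum(pmf)                                       # F(x) = P(X <= x)

    for s, f, F in zip(scores, pmf, cdf):
        print(f"x = {s:2d}   f(x) = {f:.4f}   F(x) = {F:.4f}")
    print(cdf[-1])   # 1.0 up to rounding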
2.2.3 Probabilities on intervals
Now we write down sets which are defined by the range of a random variable
on the closed interval [a, b],⁶

    {a ≤ X ≤ b} = {ω | a ≤ X(ω) ≤ b} ,

and define their probabilities by P(a ≤ X ≤ b). More generally, the set
A of sample points can be defined by the open interval ]a, b[, the half-open
intervals [a, b[ and ]a, b], the infinite intervals ]−∞, b[ and ]a, +∞[, as well
as the set of real numbers, ℝ = ]−∞, +∞[. When A is reduced to the single
point x, it is called the singleton {x}:

    P(X = x) = P(X ∈ {x}) .
For countable, finite or countably infinite, sample spaces Ω the exact range
of X is just the set of real numbers vᵢ below:

    V_X = ⋃_{ω∈Ω} {X(ω)} = {v₁, v₂, …, vₙ, …} .

Now we introduce the probabilities

    pₙ = P(X = vₙ) , vₙ ∈ V_X ,

and clearly we have P(X = x) = 0 if x ∉ V_X.
Knowledge of all pₙ-values implies full information on all probabilities
concerning the random variable X:

    P(a ≤ X ≤ b) = Σ_{a≤vₙ≤b} pₙ   or, in general,   P(X ∈ A) = Σ_{vₙ∈A} pₙ .  (2.18)
⁶ The notation we are applying here uses square brackets, '[·]', for closed intervals,
inverted square brackets, ']·[', for open intervals, and ']·]' and '[·[' for left-hand and
right-hand half-open intervals, respectively. An alternative, less common notation uses
parentheses instead of inverted square brackets, e.g., '(·)'.
An especially important case, which has been discussed already in the
previous subsection 2.2.2, is obtained when A is the infinite interval ]−∞, x].
The function x → F_X(x), defined on ℝ with values in the unit interval
[0, 1], 0 ≤ F_X(x) ≤ 1, is the cumulative distribution function of X:

    F_X(x) = P(X ≤ x) = Σ_{vₙ≤x} pₙ .     (2.16')
It fulfils several easy-to-verify properties:

    F_X(b) − F_X(a) = P(X ≤ b) − P(X ≤ a) = P(a < X ≤ b) ,

    P(X = x) = lim_{ε→0} (F_X(x + ε) − F_X(x − ε)) , and

    P(a < X < b) = lim_{ε→0} (F_X(b − ε) − F_X(a + ε)) .
An important special case is an integer-valued positive random variable X
corresponding to a countably infinite sample space, which is the set of
nonnegative integers, Ω = ℕ₀ = {0, 1, 2, …, n, …}, with

    pₙ = P(X = n) , n ∈ ℕ₀   and   F_X(x) = Σ_{0≤n≤x} pₙ .     (2.19)
Integer valued random variables will be used, for example, for modeling par-
ticle numbers in stochastic processes.
Two (or more) random variables,⁷ X and Y, form a random vector
(X, Y), which is defined by the probabilities

    P(X = xᵢ, Y = yⱼ) = p(xᵢ, yⱼ) .     (2.20)

These probabilities constitute the joint probability distribution of the
random vector. By summation over one variable we obtain the probabilities
for the two marginal distributions,

    P(X = xᵢ) = Σ_{yⱼ} p(xᵢ, yⱼ) = p(xᵢ, ∗)   and   P(Y = yⱼ) = Σ_{xᵢ} p(xᵢ, yⱼ) = p(∗, yⱼ) ,  (2.21)

of X and Y, respectively.
⁷ For simplicity we restrict ourselves to the two-variable case here. The extension to
any finite number of variables is straightforward.
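Equation (2.21) amounts to row and column sums of the joint probability table; a minimal added example with an arbitrary joint distribution of two dependent variables:

    import numpy as np

    # joint probabilities p(x_i, y_j), rows = values of X, columns = values of Y
    p = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

    p_x = p.sum(axis=1)          # marginal of X: p(x_i, *)
    p_y = p.sum(axis=0)          # marginal of Y: p(*, y_j)
    print(p_x, p_y, p.sum())     # [0.3 0.7] [0.4 0.6] 1.0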
2.3 Probability measure on uncountable sample spaces
A new and more difficult situation arises when the sample space Ω is
uncountable. Then problems with measurability arise from the impossibility
of assigning a probability to every subset of Ω. The general task of developing
measures for uncountable sets that are based on countably infinite subsets is
highly demanding and requires advanced mathematics. For the probability
concept we are using here, however, a restriction of the measure to sets of a
certain family called Borel sets or Borel fields, F, and the introduction of
the Lebesgue measure is sufficient.
2.3.1 Existence of non-measurable sets
The notion that is essential for the construction of a probability measure for
an uncountable set is again the powerset Π(Ω), which is defined as the set of
all subsets of Ω (figure 2.3). It would seem straightforward to proceed exactly
as we did in the case of countability; the powerset, however, is too large since
it contains uncountably many subsets. Giuseppe Vitali [42] provided a proof
by example that no mapping P : Π(Ω) → [0, 1] exists for the infinitely repeated
coin flip, Ω = {0, 1}^ℕ, which fulfils the three indispensable properties
for probabilities [11, p.9,10]:

(N) normalization: P(Ω) = 1 ,

(A) σ-additivity: for pairwise disjoint events A₁, A₂, … ⊂ Ω holds

    P(⋃_{i≥1} Aᵢ) = Σ_{i≥1} P(Aᵢ) ,

(I) invariance: for all A ⊂ Ω and k ≥ 1 holds P(Tₖ A) = P(A), where Tₖ
is an operator that inverts the outcome of the k-th toss,

    Tₖ : ω = (ω₁, …, ω_{k−1}, ωₖ, ω_{k+1}, …) → (ω₁, …, ω_{k−1}, 1 − ωₖ, ω_{k+1}, …) ,

and Tₖ A = {Tₖ(ω) : ω ∈ A} is the image of A under the operation Tₖ.
The first two conditions are the criteria for probability measures, and the
invariance condition (I) is specific for coin flipping and encapsulates the
properties derived from the uniform distribution, U_Ω. In general, such a relation
will always exist, with the details depending on the stochastic process – coin
flip – and its implementation in the real world – uniform distribution – be
it an experimental setup, a census in sociology, or the rules of gambling. We
dispense first with the details of the proof and mention only the nature of
the constructed contradiction:

    1 = P(Ω) = Σ_{S∈𝒮} P(T_S A) = Σ_{S∈𝒮} P(A)     (2.22)

cannot be fulfilled for infinitely large sequences of coin tosses: all values
P(A) or P(T_S A) are the same, and an infinite summation of the same number
yields either 0 or ∞ but never 1.
The construction of the proof starts from the finite subsets of ℕ:
𝒮 = {S ⊂ ℕ : |S| < ∞}; 𝒮 is countable as it is a union of countably many
finite sets {S ⊂ ℕ : max S = m}. For S = {k₁, …, kₙ} ∈ 𝒮 we define
T_S = ∏_{k∈S} Tₖ = T_{k₁} ∘ … ∘ T_{kₙ} and an equivalence relation '≡' on Ω: ω ≡ ω̃ iff
ωₖ = ω̃ₖ for sufficiently large k. The axiom of choice guarantees the existence
of a set A ⊂ Ω which contains exactly one element of each equivalence class.
Then we have

(i) for each ω ∈ Ω there exists an ω̃ ∈ A with ω ≡ ω̃, and therefore an
S ∈ 𝒮 with ω = T_S ω̃ ∈ T_S A, and hence Ω = ⋃_{S∈𝒮} T_S A, and

(ii) the sets (T_S A)_{S∈𝒮} are pairwise disjoint, as can easily be verified: Assume
that T_S A ∩ T_{S̃} A ≠ ∅ for S, S̃ ∈ 𝒮; then there exists a pair ω, ω̃ ∈ A with
T_S ω = T_{S̃} ω̃ and consequently ω ≡ T_S ω = T_{S̃} ω̃ ≡ ω̃, and according to the
choice of A we have ω = ω̃ and consequently S = S̃.

Application of the properties (N), (A), and (I) to P and taking m to infinity
yields equation (2.22) and completes the proof.
Accordingly, the proof of Vitali's theorem demonstrates the existence of
a non-measurable subset of the real numbers, a so-called Vitali set – precisely,
a subset of the real numbers that is not Lebesgue measurable (see
subsection 2.3.2).
Figure 2.8: Conceptual levels of sets in probability theory. The lowest
level is the sample space Ω; it contains the sample points or individual results ω as
elements, and events A are subsets of Ω: ω ∈ Ω and A ⊂ Ω. The next higher level
is the powerset Π(Ω). Events A are its elements and event systems F constitute its
subsets: A ∈ Π(Ω) and F ⊂ Π(Ω). The highest level, finally, is the power powerset
Π(Π(Ω)) that houses event systems F as elements: F ∈ Π(Π(Ω)).
The problem to be solved is a reduction of the powerset to an
event system F such that the subsets causing non-measurability are
eliminated.
2.3.2 Borel σ-algebra and Lebesgue measure
Before we define minimal requirements for an event system F, the three levels
of sets that are relevant for our construction are considered (figure 2.8).
The objects on the lowest level are the sample points corresponding
to individual results, ω ∈ Ω. The next higher level is the powerset Π(Ω)
housing the events A ∈ Π(Ω); the elements of the powerset are subsets
of the sample space, A ⊂ Ω. To illustrate the role of event systems F we
need one more, higher level, the powerset of the powerset, Π(Π(Ω)): Event
systems F are elements of the power powerset, F ∈ Π(Π(Ω)), and subsets of
the powerset, F ⊂ Π(Ω).
Minimal requirements for an event system F are summarized in the
following definition of a σ-algebra on Ω with Ω ≠ ∅ and F ⊂ Π(Ω):

(a) Ω ∈ F ,

(b) A ∈ F ⟹ Aᶜ := Ω \ A ∈ F , and

(c) A₁, A₂, … ∈ F ⟹ ⋃_{i≥1} Aᵢ ∈ F .
Condition (b) defines the logical negation as expressed by the difference be-
tween the entire sample space and the event A, and condition (c) represents
the logical ’or’ operation. The pair (Ω,F) is called an event space or a
measurable space. From the three properties (a) to (c) other properties
follow. The intersection, for example, is the complement of the union of the
complements, A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ ∈ F. The consideration is easily extended
to the intersection of countably many subsets of F, which also belongs to F.
Thus, a σ-algebra is closed under the operations 'c', '∪', and '∩'.⁸ Trivial
examples of σ-algebras are {∅, Ω}, {∅, A, Aᶜ, Ω}, or the family of all subsets.
A construction principle for σ-algebras starts out from an event
system G ⊂ Π(Ω) (Ω ≠ ∅) that is sufficiently small and arbitrary. Then
there exists exactly one smallest σ-algebra F = σ(G) in Ω with F ⊃ G, and
F is called the σ-algebra induced by G, or G is the generator of F. Here are
three important examples:

(i) the powerset with Ω being countable, where G = {{ω} : ω ∈ Ω} is the
system of the subsets of Ω containing a single element; σ(G) = Π(Ω),
each A ∈ Π(Ω) is countable, and A = ⋃_{ω∈A} {ω} ∈ σ(G) (see section 2.2),
(ii) the Borel σ-algebra Bⁿ (see below), and

(iii) the product σ-algebra for sample spaces Ω that are Cartesian products
of sets Eₖ, Ω = ∏_{k∈I} Eₖ, where I is a set of indices with I ≠ ∅. We
assume that ℰₖ is a σ-algebra on Eₖ, with Xₖ : Ω → Eₖ being the projection
onto the k-th coordinate; the generator G = {Xₖ⁻¹ Aₖ : k ∈ I, Aₖ ∈
ℰₖ} is the system of all sets in Ω which are determined by an event
on a single coordinate. Then ⊗_{k∈I} ℰₖ := σ(G) is called the product
σ-algebra of the sets ℰₖ on Ω. In the important case of equivalent
Cartesian coordinates, Eₖ = E and ℰₖ = ℰ for all k ∈ I, the short-hand
notation ℰ^{⊗I} is common. The Borel σ-algebra on ℝⁿ is represented by
the n-dimensional product σ-algebra of the Borel σ-algebra B = B¹ on
ℝ: Bⁿ = B^{⊗n}.

⁸ A family of sets is called closed under an operation when the operation can be applied
a countable number of times without producing a set that lies outside the family.
All three examples are required for a deeper understanding of probability
measures: the powerset (i) provides the frame for discrete sample spaces, the
Borel σ-algebra (ii), to be discussed below, sets the stage for one-dimensional
continuous sample spaces, and the product σ-algebra (iii) represents the
natural extension to n-dimensional Cartesian space.
For the construction of the Borelian σ-algebra⁹ we define a generator
representing the set of all compact cuboids in n-dimensional Cartesian space,
Ω = ℝⁿ, which have rational corners:

    G = {∏_{k=1}^{n} [aₖ, bₖ] : aₖ < bₖ; aₖ, bₖ ∈ ℚ} ,     (2.23)

where ℚ is the set of all rational numbers. The σ-algebra induced by this
generator is denoted as the Borelian σ-algebra, Bⁿ := σ(G) on ℝⁿ, and each
A ∈ Bⁿ is a Borel set (for n = 1 one commonly writes B instead of B¹).
Five properties of the Borel σ-algebra are useful for applications and for
an appreciation of its enormous size:

(i) Each open set ']..[' A ⊂ ℝⁿ is Borelian. Every ω ∈ A has a neighborhood
Q ∈ G with Q ⊂ A, and therefore we have A = ⋃_{Q∈G, Q⊂A} Q, representing
a union of countably many sets in Bⁿ, which follows from condition (c)
for σ-algebras.

(ii) Each closed set '[..]' A ⊂ ℝⁿ is Borelian since Aᶜ is open and Borelian
according to item (i).
⁹ Sometimes a Borel σ-algebra is also called a Borel field.
(iii) The σ-algebra Bⁿ cannot be described in a constructive way, because it
consists of much more than the union of cuboids and their complements.
In order to create Bⁿ, the operation of adding complements and countable
unions has to be repeated as often as there are countable ordinal
numbers (and this leads to uncountably many times [43, pp.24, 29]). It
is sufficient to memorize for practical purposes that Bⁿ covers almost
all sets in ℝⁿ – but not all of them.

(iv) The Borelian σ-algebra B on ℝ is generated not only by the system
of compact intervals (2.23) but also by the system of closed, left-unbounded
infinite intervals:

    G′ = { ]−∞, c] : c ∈ ℝ } .     (2.23')

Condition (b) requires G′ ⊂ B and – because of the minimality of σ(G′) –
σ(G′) ⊂ B too. Conversely, σ(G′) contains all left-open intervals, since
]a, b] = ]−∞, b] \ ]−∞, a], and all compact or closed intervals, since
[a, b] = ⋂_{n≥1} ]a − 1/n, b], and accordingly also the σ-algebra B generated
from these intervals (2.23). In full analogy, B is also generated from all
open left-unbounded, from all closed, and from all open right-unbounded
intervals.
(v) The event system Bⁿ_Ω = {A ∩ Ω : A ∈ Bⁿ} on Ω ⊂ ℝⁿ, Ω ≠ ∅, represents
a σ-algebra on Ω, which is denoted as the Borelian σ-algebra on Ω.

All intervals discussed in items (i) to (iv) are (Lebesgue) measurable, while
other sets are not.
The Lebesgue measure is the conventional means of assigning lengths, areas, and volumes to subsets of three-dimensional Euclidean space, and in formal Cartesian spaces to objects with higher-dimensional volumes. Sets to which generalized volumes10 can be assigned are called Lebesgue measurable, and the measure or the volume of such a set A is denoted by λ(A). The Lebesgue measure on $\mathbb{R}^n$ has the following properties:11
10 We generalize volume here to arbitrary dimension n: The generalized volume for n = 1 is a length, for n = 2 an area, for n = 3 a (conventional) volume, and for arbitrary dimension n the volume of a cuboid in n-dimensional space.

11 Slightly modified from Wikipedia: Lebesgue measure, version March 04, 2011.
(1) If A is a Lebesgue measurable set, then λ(A) ≥ 0.
(2) If A is a Cartesian product of intervals, I1 ⊗ I2 ⊗ . . . ⊗ In, then A is
Lebesgue measurable and λ(A) = |I1| · |I2| · . . . · |In|.
(3) If A is Lebesgue measurable, its complement Ac is so too.
(4) If A is a union of countably many disjoint Lebesgue measurable sets, $A = \bigcup_k A_k$, then A is itself Lebesgue measurable and $\lambda(A) = \sum_k \lambda(A_k)$.
(5) If A and B are Lebesgue measurable and $A \subset B$, then $\lambda(A) \leq \lambda(B)$ holds.
(6) Countable unions and countable intersections of Lebesgue measurable
sets are Lebesgue measurable.12
(7) If A is an open or closed subset or Borel set of Rn, then A is Lebesgue
measurable.
(8) The Lebesgue measure is strictly positive on non-empty open sets, and
so its support is the entire Rn.
(9) If A is a Lebesgue measurable set with λ(A) = 0, called a null set, then
every subset of A is also a null set, and every subset of A is measurable.
(10) If A is Lebesgue measurable and r is an element of $\mathbb{R}^n$, then the translation of A by r, defined by $A + r = \{a + r : a \in A\}$, is also Lebesgue measurable and has the same measure as A.

(11) If A is Lebesgue measurable and δ > 0, then the dilation of A by δ, defined by $\delta A = \{\delta r : r \in A\}$, is also Lebesgue measurable and has measure $\delta^n \lambda(A)$.
12 This is not a consequence of items (3) and (4): A family of sets, which is closed under complements and countable disjoint unions, need not be closed under (non-disjoint) countable unions, for example the family $\{\emptyset, \{1,2\}, \{1,3\}, \{2,4\}, \{3,4\}, \{1,2,3,4\}\}$.
(12) In generalization of items (10) and (11), if T is a linear transformation and A is a measurable subset of $\mathbb{R}^n$, then T(A) is also measurable and has the measure $|\det(T)|\, \lambda(A)$.
All twelve items listed above can be succinctly summarized in one lemma:

The Lebesgue measurable sets form a σ-algebra on $\mathbb{R}^n$ containing all products of intervals, and λ is the unique complete translation-invariant measure on that σ-algebra with $\lambda\big([0,1] \otimes [0,1] \otimes \ldots \otimes [0,1]\big) = 1$.
We conclude this subsection on Borel algebra and Lebesgue measure by men-
tioning a few characteristic and illustrative examples:
• Any closed interval [a, b] of real numbers is Lebesgue measurable, and its Lebesgue measure is the length b − a. The open interval ]a, b[ has the same measure, since the difference between the two sets consists of the two endpoints a and b only and has measure zero.

• Any Cartesian product of intervals [a, b] and [c, d] is Lebesgue measurable, and its Lebesgue measure is (b − a) · (d − c), the area of the corresponding rectangle.
• The Lebesgue measure of the set of rational numbers in an interval of
the line is zero, although this set is dense in the interval.
• The Cantor set13 is an example of an uncountable set that has Lebesgue
measure zero.
• Vitali sets are examples of sets that are not measurable with respect
to the Lebesgue measure.
In the forthcoming sections we make use of the fact that the system of intervals on the real axis becomes countable, with all intervals Lebesgue measurable, if rational numbers are chosen as beginnings and end points of the intervals. Hence, we can work with real numbers with almost no restriction for practical purposes.
13 The Cantor set is generated from the interval [0, 1] through consecutively taking out the open middle third: $[0,1] \to [0,\frac{1}{3}] \cup [\frac{2}{3},1] \to [0,\frac{1}{9}] \cup [\frac{2}{9},\frac{1}{3}] \cup [\frac{2}{3},\frac{7}{9}] \cup [\frac{8}{9},1] \to \ldots$ An explicit formula for the set is: $C = [0,1] \setminus \bigcup_{m=1}^{\infty} \bigcup_{k=0}^{3^{m-1}-1} \left]\frac{3k+1}{3^m}, \frac{3k+2}{3^m}\right[$.
2.3.3 Random variables on uncountable sets
Sufficient for dealing with random variables on uncountable sets is a probability triple (Ω, F, P). The sets in F, the Borel σ-algebra, are measurable and they alone have probabilities. We are now in the position to handle probabilities on uncountable sets:

$\{\omega \,|\, X(\omega) \leq x\} \in F$ and $P(X \leq x) = \dfrac{|\{X(\omega) \leq x\}|}{|\Omega|}$ ,  (2.24a)

$\{a < X \leq b\} = \{X \leq b\} - \{X \leq a\} \in F$ with $a < b$ ,  (2.24b)

$P(a < X \leq b) = \dfrac{|\{a < X \leq b\}|}{|\Omega|} = F(b) - F(a)$ .  (2.24c)

Equation (2.24a) contains the definition of a real-valued function X that is called a random variable iff $\{\omega \,|\, X(\omega) \leq x\} \in F$, and hence $P(X \leq x)$ is defined, for any real number x; equation (2.24b) is valid since F is closed under difference; and finally equation (2.24c) provides the basis for defining and handling probabilities on uncountable sets. The three equations (2.24) together constitute the basis of the probability concept on uncountable sample spaces that will be applied throughout this course.
Random variables on uncountable sets Ω are commonly characterized by probability density functions (pdf). The probability density function – or density for short – is the continuous analogue to the (discrete) probability mass function (pmf). A density is a function f on $\mathbb{R} = ]-\infty, +\infty[$, $u \to f(u)$, which satisfies the two conditions:

(i) $f(u) \geq 0$ for all u, and

(ii) $\int_{-\infty}^{+\infty} f(u)\, du = 1$ .
Now we can define a class of random variables14 on general sample spaces: X is a function on Ω, $\omega \to X(\omega)$, whose probabilities are prescribed by means of a density function f(u). For any interval [a, b] the probability is given by

$P(a \leq X \leq b) = \int_a^b f(u)\, du$ .  (2.25)
14Random variables having a density are often called continuous in order to distinguish
them from discrete random variables defined on countable sample spaces.
If A is the union of not necessarily disjoint intervals (some of which may even be infinite), the probability can be derived in general from the density,

$P(X \in A) = \int_A f(u)\, du$ ;

in particular, A can be split into disjoint intervals, $A = \bigcup_{j=1}^k [a_j, b_j]$, and then the integral can be rewritten as

$\int_A f(u)\, du = \sum_{j=1}^k \int_{a_j}^{b_j} f(u)\, du$ .

For $A = ]-\infty, x]$ we derive the (cumulative probability) distribution function F(x) of the continuous random variable X:

$F(x) = P(X \leq x) = \int_{-\infty}^x f(u)\, du$ .

If f is continuous then it is the derivative of F, as follows from the fundamental theorem of calculus:

$F'(x) = \dfrac{dF(x)}{dx} = f(x)$ .

If the density f is not continuous everywhere, the relation is still true for every x at which f is continuous.
If the random variable X has a density, then we find by setting a = b = x:

$P(X = x) = \int_x^x f(u)\, du = 0$ ,
reflecting the trivial geometric result that every line segment has zero area.
It seems somewhat paradoxical that X (ω) must be some number for every ω
whereas any given number has probability zero. The paradox can be resolved
by looking at countable and uncountable sets in more depth.
Extension to two variables X and Y, forming a random vector (X, Y), yields the joint probability distribution with density f:

$P(X \leq x, Y \leq y) = \int_{-\infty}^x \int_{-\infty}^y f(u, v)\, du\, dv$ .  (2.26)

Again we have to restrict the definition of probabilities to Borel sets S, which could be, for example, polygons filling the two-dimensional plane:

$P\big((X, Y) \in S\big) = \iint_S f(u, v)\, du\, dv$ .
Figure 2.9: Discretization of a probability density. The segment [x1, xm] on
the u-axis is divided up into m− 1 not necessarily equal intervals and elementary
probabilities are obtained by integration.
The joint density function f satisfies the following conditions:

(i) $f(u, v) \geq 0$ for all (u, v), and

(ii) $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u, v)\, du\, dv = 1$ .
Condition (ii) implies that f is integrable over the whole plane. As in the discrete case we may compute the probabilities of the individual variables from marginal density functions, $u \to f(u, *)$ and $v \to f(*, v)$, respectively:

$P(X \leq x) = \int_{-\infty}^x f(u, *)\, du$ where $f(u, *) = \int_{-\infty}^{+\infty} f(u, v)\, dv$ ,
$P(Y \leq y) = \int_{-\infty}^y f(*, v)\, dv$ where $f(*, v) = \int_{-\infty}^{+\infty} f(u, v)\, du$ .  (2.27)
In the most general case we may also define a joint distribution function
F of (X ,Y) by
F (x, y) = P (X ≤ x,Y ≤ y) for all (x, y) ,
Table 2.1: Comparison of the formalism of probability theory on countable and uncountable sample spaces.

                              Countable case                    Uncountable case
  Range                       $v_n,\ n = 1, 2, \ldots$          $-\infty < u < +\infty$
  Probability element         $p_n$                             $f(u)\,du = dF(u)$
  $P(a \leq X \leq b)$        $\sum_{a \leq v_n \leq b} p_n$    $\int_a^b f(u)\,du$
  $P(X \leq x) = F(x)$        $\sum_{v_n \leq x} p_n$           $\int_{-\infty}^x f(u)\,du$
  $E(X)$                      $\sum_n p_n v_n$                  $\int_{-\infty}^{\infty} u\, f(u)\,du$
  proviso                     $\sum_n p_n |v_n| < \infty$       $\int_{-\infty}^{\infty} |u|\, f(u)\,du < \infty$
and obtain the marginal distribution functions as limits:

$\lim_{y\to\infty} F(x, y) = F(x, \infty) = P(X \leq x, Y < \infty) = P(X \leq x)$ and
$\lim_{x\to\infty} F(x, y) = F(\infty, y) = P(X < \infty, Y \leq y) = P(Y \leq y)$ .  (2.28)
We note that the relations X < ∞ and Y < ∞ put no restrictions on the
variables X and Y .
Let us finally consider the process of discretization of a density function in order to yield a set of elementary probabilities. The x-axis is divided up into m + 1 pieces (figure 2.9), not necessarily equal and not necessarily small, and we denote the piece of the integral between $x_n$ and $x_{n+1}$ by

$p_n = \int_{x_n}^{x_{n+1}} f(u)\, du$ , $0 \leq n \leq m$ .  (2.29)

When $x_0 = -\infty$ and $x_{m+1} = +\infty$ we have

$p_n \geq 0$ for all n, and $\sum_n p_n = 1$ .
The partition is not finite but countable, provided we label the intervals
suitably, for example . . . , p−2, p−1, p0, p1, p2, . . . . Now we consider a random
variable Y such that
P (Y = xn) = pn , (2.29’)
where we may replace xn by any number in the subinterval [xn, xn+1]. The
random variable Y can be interpreted as the discrete analogue of the random
variable X .
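The discretization (2.29) is straightforward to carry out numerically. The following minimal sketch in Python – an illustration added here, with a standard normal density and an arbitrarily chosen grid as assumptions – obtains the elementary probabilities $p_n$ as differences of the cumulative distribution function:

```python
import numpy as np
from scipy.stats import norm

# Discretize a standard normal density into elementary probabilities p_n
# by integrating f(u) between consecutive grid points (equation 2.29).
x = np.linspace(-4.0, 4.0, 17)                    # interior points x_1, ..., x_m
edges = np.concatenate(([-np.inf], x, [np.inf]))  # x_0 = -inf, x_{m+1} = +inf
p = np.diff(norm.cdf(edges))                      # p_n = F(x_{n+1}) - F(x_n)

print(p.sum())    # the p_n sum to 1, as required
print(p.min())    # and none of them is negative
```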
We end by presenting a comparison between probability measures on
countable and uncountable sample spaces where the latter are based on a
probability density f(u) in table 2.1.15
2.3.4 Limits of sequences of random variables

Limits of sequences are required for problems of convergence of random variables and for approximations to them. The problem of limits arises because there is ambiguity in the definition of limits.
A sequence of random variables, $X_n$, is defined on a probability space Ω and it is assumed to have the limit

$X = \lim_{n\to\infty} X_n$ .  (2.30)

The probability space Ω, we assume now, has elements ω which have a probability density p(ω). Four different definitions of the limit are common in probability theory [6, pp.40,41].

Almost certain limit. The sequence $X_n$ converges almost certainly to X if for all ω except a set of probability zero

$X(\omega) = \lim_{n\to\infty} X_n(\omega)$  (2.31)

is fulfilled, and each realization of $X_n$ converges to X.
Limit in the mean. The limit in the mean or the mean square limit of a sequence requires that the mean square deviation of $X_n(\omega)$ from $X(\omega)$ vanishes in the limit, and the condition is

$\lim_{n\to\infty} \int_\Omega d\omega\, p(\omega) \big(X_n(\omega) - X(\omega)\big)^2 \equiv \lim_{n\to\infty} \big\langle (X_n - X)^2 \big\rangle = 0$ .  (2.32)
15Expectation values E(X ) and higher moments of probability distributions are dis-
cussed in section 2.5.
Stochastic Kinetics 47
The mean square limit is the standard limit in Hilbert space theory and it is
commonly used in quantum mechanics.
Stochastic limit. The limit in probability, also called the stochastic limit, fulfils the condition: X is the stochastic limit if for any ε > 0 the relation

$\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$  (2.33)

holds.

Limit in distribution. Probability theory uses also a weaker form of convergence than the previous three limits, the limit in distribution, which requires that for any continuous and bounded function f(x) the relation

$\lim_{n\to\infty} \langle f(X_n) \rangle = \langle f(X) \rangle$  (2.34)

holds. This limit is particularly useful for characteristic functions, $f(x) = \exp(i x s)$: If two characteristic functions approach each other, then the probability density of $X_n$ converges to that of X.
2.3.5 Stieltjes and Lebesgue integration
This subsection provides a short repetition of some generalizations of the
conventional Riemann integral, which are important in probability theory.
We start with a sketch comparing the Riemann and the Lebesgue approach
to integration presented in figure 2.10. One difference between the two inte-
gration methods for a non-negative function – like the probability functions
– can be visualized in three dimensional space: The volume below a surface
given by the non-negative function g(x, y) is measured by summation of the
volumes of cuboids with squares of edge length ∆d whereas the Lebesgue in-
tegral is summing the volumes of layers with thickness ∆d between constant
level sets.
The Stieltjes integral is an important generalization of Riemannian integration:

$\int_a^b g(x)\, dh(x)$ .  (2.35)
Herein g(x) is the integrand and h(x) is the integrator, and the conventional
Riemann integral is obtained for h(x) = x. The integrator can be visualized
Figure 2.10: Comparison of Riemann and Lebesgue integrals. In the conventional Riemannian-Darboux integrationa the integrand is embedded between an upper sum (light blue) and a lower sum (blue) of rectangles. The integral exists iff the upper sum and the lower sum converge to the integrand in the limit ∆d → 0. The Lebesgue integral can be visualized as an approach to calculating the area enclosed by the x-axis and the integrand through partitioning into horizontal stripes (red) and considering the limit ∆d → 0. The definite integral $\int_a^b g(x)\, dx$ is confining the integrand to a closed interval: [a, b] or a ≤ x ≤ b.

a The concept of representing the integral by the convergence of two sums is due to the French mathematician Gaston Darboux. A function is Darboux integrable iff it is Riemann integrable, and the values of the Riemann and the Darboux integral are equal in case they exist.
best as a weighting function for the integrand. In case g(x) and h(x) are continuous and continuously differentiable, the Stieltjes integral can be resolved by partial integration:

$\int_a^b g(x)\, dh(x) = \int_a^b g(x) \dfrac{dh(x)}{dx}\, dx = \big(g(x)\, h(x)\big)\Big|_{x=a}^{b} - \int_a^b \dfrac{dg(x)}{dx}\, h(x)\, dx = g(b)\, h(b) - g(a)\, h(a) - \int_a^b \dfrac{dg(x)}{dx}\, h(x)\, dx$ .
The integrator F(x) may also be a step function: For g(x) continuous and F(x) making jumps at the points $x_1, \ldots, x_n \in\, ]a, b[$ with the heights $\Delta F_1, \ldots, \Delta F_n \in \mathbb{R}$, and $\sum_{i=1}^n \Delta F_i \leq 1$, the Stieltjes integral is of the form

$\int_a^b g(x)\, dF(x) = \sum_{i=1}^n g(x_i)\, \Delta F_i$ .  (2.36)

With g(x) = 1 and in the limit a → −∞ the integral becomes identical with the (discrete) cumulative probability distribution function (cdf), F(b).
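For a pure step-function integrator the Stieltjes integral (2.36) is just a weighted sum, which is how expectation values of discrete random variables are computed. A minimal sketch, with hypothetical jump positions and heights:

```python
import numpy as np

def stieltjes_step(g, xi, dF):
    """Evaluate int g dF = sum_i g(x_i) * Delta F_i for a pure jump integrator F."""
    return np.sum(g(xi) * dF)

xi = np.array([1.0, 2.0, 3.0])   # jump positions x_i (hypothetical)
dF = np.array([0.2, 0.5, 0.3])   # jump heights Delta F_i, summing to 1

print(stieltjes_step(lambda x: x, xi, dF))      # first moment: 2.1
print(stieltjes_step(lambda x: x**2, xi, dF))   # second raw moment: 4.9
```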
Riemann-Stieltjes integration is used in probability theory to compute, for example, moments of probability densities (section 2.5). If F(x) is the cumulative probability distribution of a random variable X, the expected value (see section 2.5) for any function g(X) is obtained from

$E\big(g(X)\big) = \int_{-\infty}^{+\infty} g(x)\, dF(x) = \sum_i g(x_i)\, \Delta F_i$ ,

and this is the equation for the discrete case. If the random variable X has a probability density f(x) = dF(x)/dx with respect to the Lebesgue measure, continuous integration can be used:

$E\big(g(X)\big) = \int_{-\infty}^{+\infty} g(x)\, f(x)\, dx$ .

Important special cases are the moments: $E(X^n) = \int_{-\infty}^{+\infty} x^n\, dF(x)$.
Lebesgue theory of integration assumes the existence of a probability space defined by the triple (Ω, F, µ) representing the sample space Ω, a σ-algebra F of subsets $A \subset \Omega$, and a (non-negative) probability measure µ satisfying µ(Ω) = 1. Lebesgue integrals are defined for measurable functions g fulfilling

$g^{-1}\big([a, b]\big) \in F$ for all a < b .  (2.37)

This condition is equivalent to the requirement that the pre-image of any Borel subset [a, b] of $\mathbb{R}$ is an element of the event system F. The set of measurable functions is closed under algebraic operations and also closed under certain pointwise sequential limits like $\sup_{k\in\mathbb{N}} g_k$, $\liminf_{k\in\mathbb{N}} g_k$ or $\limsup_{k\in\mathbb{N}} g_k$, which are measurable if the sequence of functions $(g_k)_{k\in\mathbb{N}}$ contains only measurable functions.
The construction of an integral $\int_\Omega g\, d\mu = \int_\Omega g(x)\, \mu(dx)$ is done in steps and we begin with the indicator function:

$\mathbf{1}_A(x) = \begin{cases} 1 & \text{iff } x \in A \\ 0 & \text{otherwise} \end{cases}$ ,  (2.38)

which provides a possibility to define the integral over $A \in \mathcal{B}^n$ by

$\int_A g(x)\, dx := \int \mathbf{1}_A(x)\, g(x)\, dx$ ,

and which assigns a volume to Lebesgue measurable sets by setting g ≡ 1:

$\int \mathbf{1}_A\, d\mu = \mu(A)$ ,

which is the Lebesgue measure, $\mu(A) = \lambda(A)$, for a mapping $\lambda: \mathcal{B} \to \mathbb{R}$.
Simple functions are finite linear combinations of indicator functions, $g = \sum_j \alpha_j \mathbf{1}_{A_j}$. They are measurable if the coefficients $\alpha_j$ are real numbers and the sets $A_j$ are measurable subsets of Ω. For non-negative coefficients $\alpha_j$ the linearity property of the integral leads to a measure for non-negative simple functions:

$\int \Big(\sum_j \alpha_j \mathbf{1}_{A_j}\Big)\, d\mu = \sum_j \alpha_j \int \mathbf{1}_{A_j}\, d\mu = \sum_j \alpha_j\, \mu(A_j)$ .

Often a simple function can be written in several ways as a linear combination of indicator functions, and then the value of the integral will always be the same. Sometimes some care is needed in the construction of a real-valued simple function $g = \sum_j \alpha_j \mathbf{1}_{A_j}$ in order to avoid undefined expressions of the kind ∞ − ∞. Choosing $\alpha_i = 0$ implies that $\alpha_i\, \mu(A_i) = 0$, because 0 · ∞ = 0 by convention in measure theory.
An arbitrary non-negative function $g: (\Omega, F, \mu) \to (\mathbb{R}_+, \mathcal{B}, \lambda)$ is measurable iff there exists a sequence of simple functions $(g_k)_{k\in\mathbb{N}}$ that converges pointwise16 and monotonously to g. The Lebesgue integral of a non-negative and measurable function g is defined by

$\int_\Omega g\, d\mu = \lim_{k\to\infty} \int_\Omega g_k\, d\mu$  (2.39)

with $g_k$ being simple functions that converge pointwise and monotonously towards g. The limit is independent of the particular choice of the functions $g_k$. Such a sequence of simple functions is easily visualized, for example, by the bands below the function g(x) in figure 2.10: The band widths ∆d decrease and converge to zero as the index increases, k → ∞.
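The construction (2.39) can be imitated numerically. The following sketch, added for illustration, approximates a Lebesgue integral by simple functions built from horizontal bands of width ∆d below the integrand, as in figure 2.10; the assumed example is g(x) = x² on [0, 1] with the uniform measure, where the exact value is 1/3.

```python
import numpy as np

def lebesgue_approx(g, x, k):
    """Integral of the simple function g_k = dd * floor(g/dd) <= g,
    built from k horizontal bands of width dd (uniform measure on [0, 1])."""
    dd = g(x).max() / k               # band width Delta d
    gk = dd * np.floor(g(x) / dd)     # simple function lying below g
    return np.mean(gk)                # integral over [0, 1]

x = np.linspace(0.0, 1.0, 100001)
for k in (10, 100, 1000):
    print(k, lebesgue_approx(lambda u: u**2, x, k))  # increases toward 1/3
```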
The extension to general functions with positive and negative value domains is straightforward. There is one important major difference between Riemann and Lebesgue integration: The contribution to the Riemann integral changes sign when the function changes sign, whereas all partial areas enclosed between the function and the axis of integration are summed up in the Lebesgue integral. The improper Riemann integral $\int_0^\infty \cos x\, dx$, for example, has partial integrals $\int_0^T \cos x\, dx = \sin T$ with limit inferior −1 and limit superior +1, whereas the corresponding Lebesgue integral does not exist.
For $\Omega = \mathbb{R}$ and the Lebesgue measure λ the following holds: Functions that are Riemann integrable on a compact interval [a, b] are Lebesgue integrable too, and the values of both integrals are the same. The inverse is not true: Not every Lebesgue integrable function is Riemann integrable (see the Dirichlet function below). A function that has an improper Riemann integral need not be Lebesgue integrable on the whole domain. We consider one example for each case:
(i) The Dirichlet (step) function, D(x), is the characteristic function of the rational numbers and assumes the value 1 for rational x and the value 0 for irrational x:

$D(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases}$ or $D(x) = \lim_{m\to\infty} \lim_{n\to\infty} \cos^{2n}(m!\, \pi x)$ .
16Pointwise convergence of a sequence of functions fn, limn→∞ fn = f pointwise is
fulfilled iff limn→∞ fn(x) = f(x) for every x in the domain.
Figure 2.11: The alternating harmonic series. The alternating harmonic step function, $h(x) = n_k = (-1)^{k+1}/k$ with $k - 1 \leq x < k$ and $k \in \mathbb{N}$, has an improper Riemann integral since $\sum_{k=1}^\infty n_k = \ln 2$. It is not Lebesgue integrable because the series $\sum_{k=1}^\infty |n_k|$ diverges.
D(x) is lacking Riemann integrability for every arbitrarily small interval:
Each partitioning S of the integration domain [a, b] into intervals [xk−1, xk]
leads to parts that contain necessarily at least one rational and one irrational
number. Hence the lower Darboux sum,
$\Sigma_{\rm low}(S) = \sum_{k=1}^n (x_k - x_{k-1}) \cdot \inf_{x_{k-1} < x < x_k} D(x) = 0$ ,

vanishes because the infimum is always zero, and the upper Darboux sum,

$\Sigma_{\rm high}(S) = \sum_{k=1}^n (x_k - x_{k-1}) \cdot \sup_{x_{k-1} < x < x_k} D(x) = b - a$ ,

is the length of the integration interval, b − a, because the supremum is always one and the summation runs over all partial intervals. Since Riemann integrability requires

$\sup_S \Sigma_{\rm low}(S) = \int_a^b f(x)\, dx = \inf_S \Sigma_{\rm high}(S)$ ,
D(x) cannot be Riemann integrated.
D(x), on the other hand, has a Lebesgue integral for every interval: D(x) is a non-negative simple function, and therefore we can write the Lebesgue integral over an interval S through sorting into irrational and rational numbers:

$\int_S D\, d\lambda = 0 \cdot \lambda(S \cap \mathbb{R}\setminus\mathbb{Q}) + 1 \cdot \lambda(S \cap \mathbb{Q})$ ,

with λ being the Lebesgue measure. The evaluation of the integral is straightforward: The first term vanishes no matter how large $\lambda(S \cap \mathbb{R}\setminus\mathbb{Q})$ is, because 0 · ∞ is zero by the convention of measure theory, and the second term is also zero because $\lambda(S \cap \mathbb{Q})$ is zero, since the set of rational numbers, $\mathbb{Q}$, is countable. Hence we have $\int_S D\, d\lambda = 0$.
(ii) The step function with alternatingly positive and negative areas of size $\frac{1}{n}$, i.e. $(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \ldots)$ (see figure 2.11), is an example of a function that has an improper Riemann integral whereas the Lebesgue integral diverges. The function $h(x) = (-1)^{k+1}/k$ with $k - 1 \leq x < k$ and $k \in \mathbb{N}$ yields a series of contributions of alternating sign on Riemann integration, with the finite sum

$\int_0^\infty h(x)\, dx = 1 - \frac{1}{2} + \frac{1}{3} - \ldots = \ln 2$ ,

whereas Lebesgue integrability of h requires $\int_{\mathbb{R}_+} |h|\, d\lambda < \infty$, and this is not fulfilled since the harmonic series, $\sum_{k=1}^\infty k^{-1}$, diverges.
The first case is the more important issue, since it makes use of the fact that the set of rational numbers, $\mathbb{Q}$, is of Lebesgue measure zero.
Finally, we introduce the Lebesgue-Stieltjes integral in a way that allows us to summarize the most important results of this subsection. For each right-continuous and monotonously increasing function $F: \mathbb{R} \to \mathbb{R}$ there exists a uniquely determined Lebesgue-Stieltjes measure $\lambda_F$ that fulfils

$\lambda_F\big((a, b]\big) = F(b) - F(a)$ for all $(a, b] \subset \mathbb{R}$ .
Such right-continuous and monotonously increasing functions $F: \mathbb{R} \to \mathbb{R}$ are therefore called measure generating. The Lebesgue integral of a $\lambda_F$-integrable function g is called the Lebesgue-Stieltjes integral,

$\int_A g\, d\lambda_F$ with $A \in \mathcal{B}$  (2.40)

being Borel measurable. If F is the identity function on $\mathbb{R}$,17 $F = {\rm id}: \mathbb{R} \to \mathbb{R}$, ${\rm id}(x) = x$, then the corresponding Lebesgue-Stieltjes measure is the Lebesgue measure itself: $\lambda_F = \lambda_{\rm id} = \lambda$. For (properly) Riemann integrable functions g we have stated that the Lebesgue integral is identical with the Riemann integral:

$\int_{[a,b]} g\, d\lambda = \int_a^b g(x)\, dx$ .
The interval $[a, b] = \{a \leq x \leq b\}$ is partitioned into a sequence $\sigma_n = (a = x_0^{(n)}, x_1^{(n)}, \ldots, x_r^{(n)} = b)$, where the superscript '(n)' indicates a Riemann sum with $|\sigma_n| \to 0$, and the Riemann integral on the right-hand side is replaced by the limit of the Riemann summation:

$\int_{[a,b]} g\, d\lambda = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big(x_k^{(n)} - x_{k-1}^{(n)}\big) = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big({\rm id}(x_k^{(n)}) - {\rm id}(x_{k-1}^{(n)})\big)$ .
The Lebesgue measure λ has been introduced above as the special case F = id, and therefore we find the Stieltjes-Lebesgue integral by replacing λ by $\lambda_F$ and 'id' by F:

$\int_{[a,b]} g\, d\lambda_F = \lim_{n\to\infty} \sum_{k=1}^r g(x_{k-1}^{(n)}) \big(F(x_k^{(n)}) - F(x_{k-1}^{(n)})\big)$ .

The details of the derivation are found in [44, 45].
The details of the derivation are found in [44, 45].
17The identity function id(x).= x, it maps a domain, for example [a, b], point by point
onto itself.
In summary we define a Stieltjes-Lebesgue integral or F-integral as follows: Let $F, g: \mathbb{R} \to \mathbb{R}$ be two functions and let the interval [a, b] be partitioned by the sequence $\sigma = (a = x_0, x_1, \ldots, x_r = b)$; then we define

$\sum_\sigma g\, dF := \sum_{k=1}^r g(x_{k-1}) \big(F(x_k) - F(x_{k-1})\big)$ .

The function g is F-integrable on [a, b] if

$\int_a^b g\, dF := \lim_{|\sigma|\to 0} \sum_\sigma g\, dF$  (2.41)

exists in $\mathbb{R}$, and $\int_a^b g\, dF$ is called the Stieltjes-Lebesgue integral or F-integral of g. This formulation will be required for the presentation of the Itô integral used in Itô calculus in section 3.
2.4 Conditional probabilities and independence
The conventional probability has been defined on the entire sample space Ω, $P(A) = |A|/|\Omega| = \sum_{\omega\in A} P(\omega) \big/ \sum_{\omega\in\Omega} P(\omega)$.18 We shall now define a probability of set A relative to another set, say S. This means that we are interested in the proportional weight of the part of A in S, which is expressed by the intersection A ∩ S relative to S, and obtain

$\sum_{\omega\in A\cap S} P(\omega) \Big/ \sum_{\omega\in S} P(\omega)$ .

In other words, we switch from Ω to S as the new universe and consider the conditional probability of A relative to S:

$P(A|S) = \dfrac{P(A \cap S)}{P(S)} = \dfrac{P(AS)}{P(S)}$  (2.42)

provided P(S) ≠ 0. Apparently, the conditional probability vanishes when the intersection is empty: P(A|S) = 0 if A ∩ S = ∅. From here on we shall always use the short notation for the intersection, AS ≡ A ∩ S.
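Definition (2.42) is easily made concrete. The following sketch – an added example on the sample space of two dice, not taken from the text – conditions the event "the sum equals 8" on "the first die shows an even number":

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # sample space of two dice
A = {w for w in omega if sum(w) == 8}          # event A: the sum equals 8
S = {w for w in omega if w[0] % 2 == 0}        # event S: first die is even

P = lambda E: Fraction(len(E), len(omega))     # uniform probability measure
print(P(A & S) / P(S))   # P(A|S) = (3/36) / (18/36) = 1/6
print(P(A))              # unconditional probability: P(A) = 5/36
```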
Next we mention several simple but fundamental relations involving con-
ditional probabilities that we present here, in essence, without proof (for
details see [5], pp.111-144). For n arbitrary events Ai we have
P (A1, A2, . . . , An) = P (A1)P (A2|A1)P (A3|A1A2) . . . P (An|A1A2 . . . An−1)
provided P (A1A2 . . . An−1) > 0. Under this proviso all conditional probabil-
ities are well defined since
P (A1) ≥ P (A1A2) ≥ . . . ≥ P (A1A2 . . . An−1) > 0 .
Let us assume that the sample space Ω is partitioned into n disjoint sets, $\Omega = \sum_n A_n$. For any set B we then have

$P(B) = \sum_n P(A_n)\, P(B|A_n)$ .

18 The sample space Ω is assumed to be countable and the weight $P(\omega) = P(\{\omega\})$ is assigned to every point. Generalization to Lebesgue measures is straightforward.
From this relation it is straightforward to derive the conditional probability

$P(A_j|B) = \dfrac{P(A_j)\, P(B|A_j)}{\sum_n P(A_n)\, P(B|A_n)}$ , provided P(B) > 0.
Independence of random variables will be a highly relevant problem in the forthcoming chapters. Countably-valued random variables $X_1, \ldots, X_n$ are defined to be independent if and only if for any combination $x_1, \ldots, x_n$ of real numbers the joint probabilities can be factorized:

$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot \ldots \cdot P(X_n = x_n)$ .  (2.43)

A major extension of equation (2.43) replaces the single values $x_i$ by arbitrary sets $S_i$:

$P(X_1 \in S_1, \ldots, X_n \in S_n) = P(X_1 \in S_1) \cdot \ldots \cdot P(X_n \in S_n)$ .
In order to prove this extension we sum over all points belonging to the sets $S_1, \ldots, S_n$:

$\sum_{x_1\in S_1} \cdots \sum_{x_n\in S_n} P(X_1 = x_1, \ldots, X_n = x_n) = \sum_{x_1\in S_1} \cdots \sum_{x_n\in S_n} P(X_1 = x_1) \cdot \ldots \cdot P(X_n = x_n) = \Big(\sum_{x_1\in S_1} P(X_1 = x_1)\Big) \cdot \ldots \cdot \Big(\sum_{x_n\in S_n} P(X_n = x_n)\Big)$ ,

which is equal to the right-hand side of the equation to be proven.

Since the factorization is fulfilled for arbitrary sets $S_1, \ldots, S_n$, it holds also for all subsets of $(X_1, \ldots, X_n)$, and accordingly the events

$\{X_1 \in S_1\}, \ldots, \{X_n \in S_n\}$

are also independent. It can also be verified that for arbitrary real-valued functions $\varphi_1, \ldots, \varphi_n$ on $(-\infty, +\infty)$ the random variables $\varphi_1(X_1), \ldots, \varphi_n(X_n)$ are independent too.
Independence can be extended in a straightforward manner to the joint distribution function of the random vector $(X_1, \ldots, X_n)$:

$F(x_1, \ldots, x_n) = F_1(x_1) \cdot \ldots \cdot F_n(x_n)$ ,

where the $F_j$'s are the marginal distributions of the $X_j$'s, $1 \leq j \leq n$. Thus, the marginal distributions determine the joint distribution in case of independence of the random variables.
For the continuous case we can formulate the definition of independence for the sets $S_1, \ldots, S_n$ forming a Borel family. In particular, if there is a joint density function f, then we have

$P(X_1 \in S_1, \ldots, X_n \in S_n) = \Big(\int_{S_1} f_1(u)\, du\Big) \cdot \ldots \cdot \Big(\int_{S_n} f_n(u)\, du\Big) = \int_{S_1} \cdots \int_{S_n} f_1(u_1) \ldots f_n(u_n)\, du_1 \ldots du_n$ ,

where $f_1, \ldots, f_n$ are the marginal densities. The probability is also equal to

$\int_{S_1} \cdots \int_{S_n} f(u_1, \ldots, u_n)\, du_1 \ldots du_n$ ,

and hence we finally find for the density case:

$f(u_1, \ldots, u_n) = f_1(u_1) \ldots f_n(u_n)$ .  (2.44)
As we have seen here, stochastic independence makes it possible to factorize
joint probabilities, distributions or densities.
Applications of conditional probabilities to problems in biology are found
in chapter 5. Genetics was indeed one of the first cases in science where
probabilities were used in the interpretation of experimental results (see, for
example, the works of Gregor Mendel as described in [46, 47] and section 1.2).
Finally, we mention that a whole branch of probability theory, Bayesian statistics, is based on conditional probabilities. It is named after the English mathematician and Presbyterian minister Thomas Bayes, who initiated an alternative way to think about probabilities by the formulation of Bayes's theorem about hypothesis H and data D [48]:

$P(H|D) = \dfrac{P(D|H)\, P(H)}{P(D)}$ ,  (2.45)

wherein P(H) is the prior probability that H is correct before the data are seen; P(D) is the marginal probability of D, giving the probability of witnessing the data under all possible hypotheses, and as such it depends on the prior probabilities given to them,

$P(D) = \sum_i P(D, H_i) = \sum_i P(D|H_i)\, P(H_i)$ ;

P(D|H) is the conditional probability of seeing the data D given that the hypothesis H is true; and P(H|D) eventually is the posterior probability, which is the probability that the hypothesis H is true given the data D and the previous state of belief about the hypothesis (for the current status of Bayesian statistics see, for example, [49–51]). Bayesian statistics is thus dealing with statistical inference rather than the conventional frequency-based interpretation of probabilities and accordingly is much closer to formal logic.
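A minimal numerical sketch of Bayes's theorem (2.45), with two hypothetical hypotheses – a fair coin H1 and a biased coin H2 with P(head) = 4/5 – updated on the data D of three heads in a row:

```python
from fractions import Fraction

prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}                 # P(H)
likelihood = {"H1": Fraction(1, 2)**3, "H2": Fraction(4, 5)**3}      # P(D|H)

evidence = sum(prior[h] * likelihood[h] for h in prior)              # P(D)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}  # P(H|D)
print(posterior)   # H2 is now roughly four times as probable as H1
```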
2.5 Expectation values and higher moments
Random variables are accessible to analysis via their probabilities. In addi-
tion, straightforward information can be derived also from ensembles defined
on the entire sample space Ω. The most important example is the expectation value, $E(X) = \langle X \rangle$. We start with a countable sample space:

$E(X) = \sum_{\omega\in\Omega} X(\omega)\, P(\omega) = \sum_n p_n v_n$ .  (2.46)

In the special case of a random variable X on $\mathbb{N}_0$ we have

$E(X) = \sum_{n=0}^\infty n\, p_n$ .

The expectation value (2.46) exists when the series converges in absolute values, $\sum_{\omega\in\Omega} |X(\omega)|\, P(\omega) < \infty$. Whenever the random variable X is bounded, which means that there exists a number m such that $|X(\omega)| \leq m$ for all $\omega \in \Omega$, then it is summable and in fact

$E(|X|) = \sum_\omega |X(\omega)|\, P(\omega) \leq m \sum_\omega P(\omega) = m$ .
It is straightforward to show that the sum of two random variables, X + Y, is summable iff X and Y are summable:

$E(X + Y) = E(X) + E(Y)$ .

The relation can be extended to an arbitrary countable number of random variables:

$E\Big(\sum_{k=1}^n X_k\Big) = \sum_{k=1}^n E(X_k)$ .

In addition, the expectation values fulfill the relations E(a) = a and $E(aX) = a \cdot E(X)$, which can be combined into

$E\Big(\sum_{k=1}^n a_k X_k\Big) = \sum_{k=1}^n a_k \cdot E(X_k)$ .  (2.47)

Thus, E(·) is a linear operator.
For a random variable X on an arbitrary sample space Ω the expectation value may be written as an abstract integral on Ω or – provided the density f(u) exists and we know it – as an integral over $\mathbb{R}$:

$E(X) = \int_\Omega X(\omega)\, d\omega = \int_{-\infty}^{+\infty} u\, f(u)\, du$ .  (2.48)

It is worthwhile to reconsider the discretization of a continuous density in this context (see figure 2.9 and section 2.3): The discrete expression for the expectation value is based upon $p_n = P(Y = x_n)$ as outlined in equations (2.29) and (2.29'),

$E(Y) = \sum_n x_n p_n \approx E(X) = \int_{-\infty}^{+\infty} u\, f(u)\, du$ ,

and approximates the exact value in the sense of an approximation to the Riemann integral.
2.5.1 First and second moments
For two or more variables, for example (X, Y) described by a joint density f(u, v), we have

$E(X) = \int_{-\infty}^{+\infty} u\, f(u, *)\, du$ and $E(Y) = \int_{-\infty}^{+\infty} v\, f(*, v)\, dv$ .

The expectation value of the sum of the variables, X + Y, can be evaluated by iterated integration:

$E(X + Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (u + v)\, f(u, v)\, du\, dv = \int_{-\infty}^{+\infty} u\, du \Big(\int_{-\infty}^{+\infty} f(u, v)\, dv\Big) + \int_{-\infty}^{+\infty} v\, dv \Big(\int_{-\infty}^{+\infty} f(u, v)\, du\Big) = \int_{-\infty}^{+\infty} u\, f(u, *)\, du + \int_{-\infty}^{+\infty} v\, f(*, v)\, dv = E(X) + E(Y)$ ,

which establishes the previously derived expression.
The multiplication theorem of probability theory requires that the two variables X and Y are independent and summable; this implies for the discrete and the continuous case,

$E(X \cdot Y) = E(X) \cdot E(Y)$ and  (2.49a)

$E(X \cdot Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} u v\, f(u, v)\, du\, dv = \int_{-\infty}^{+\infty} u\, f(u, *)\, du \int_{-\infty}^{+\infty} v\, f(*, v)\, dv = E(X) \cdot E(Y)$ ,  (2.49b)

respectively. The multiplication theorem is easily extended to any finite number of independent and summable random variables:

$E(X_1 \cdot \ldots \cdot X_n) = E(X_1) \cdot \ldots \cdot E(X_n)$ .  (2.49c)
Let us now consider the expectation values of special functions of random variables, in particular their powers, which give rise to the moments of the probability distribution. For a random variable X we distinguish the r-th raw moments $E(X^r)$ and the so-called centered moments19 $\mu_r = E(\bar{X}^r)$ referring to the random variable

$\bar{X} = X - E(X)$ .

Clearly, the first raw moment is the expectation value and the first centered moment vanishes, $E(\bar{X}) = \mu_1 = 0$. Often the expectation value is denoted by $\mu = E(X)$, a notation that we shall use too for the sake of convenience, but it is important not to confuse µ with the first centered moment $\mu_1$.

In general, a moment is defined about some point a by means of the random variable

$X^{(a)} = X - a$ .

19 Since the moments centered around the expectation value will be used more frequently than the raw moments, we denote them by $\mu_r$ and the raw moments by $\alpha_r$. The r-th moment of a distribution is also called the moment of order r.
For a = 0 we obtain the raw moments,

$\alpha_r = E(X^r)$ ,  (2.50)

whereas a = E(X) yields the centered moments. The general expressions for the r-th raw and centered moments as derived from the density f(u) are

$E(X^r) = \alpha_r(X) = \int_{-\infty}^{+\infty} u^r f(u)\, du$ and  (2.51a)

$E(\bar{X}^r) = \mu_r(X) = \int_{-\infty}^{+\infty} (u - \mu)^r f(u)\, du$ .  (2.51b)
The second centered moment is called the variance, $\sigma^2(X)$, and its positive square root the standard deviation, $\sigma(X)$. The variance is always a non-negative quantity, as can easily be shown. Further we can derive:

$\sigma^2(X) = E(\bar{X}^2) = E\big((X - E(X))^2\big) = E\big(X^2 - 2X E(X) + E(X)^2\big) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2$ .  (2.52)

If $E(X^2)$ is finite, then $E(|X|)$ is finite too and fulfils the inequality

$E(|X|)^2 \leq E(X^2)$ ,

and since $E(X)^2 \leq E(|X|)^2$ the variance is necessarily a non-negative quantity, $\sigma^2(X) \geq 0$.
If X and Y are independent and have finite variances, then we obtain

$\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y)$ ,

as follows readily by a simple calculation with the centered variables:

$E\big((\bar{X} + \bar{Y})^2\big) = E\big(\bar{X}^2 + 2\bar{X}\bar{Y} + \bar{Y}^2\big) = E(\bar{X}^2) + 2E(\bar{X})E(\bar{Y}) + E(\bar{Y}^2) = E(\bar{X}^2) + E(\bar{Y}^2)$ .

Here we have used the fact that the first centered moments vanish: $E(\bar{X}) = E(\bar{Y}) = 0$.
For two general – not necessarily independent – random variables X and Y, the Cauchy-Schwarz inequality holds for the mixed expectation value:

$E(XY)^2 \leq E(X^2)\, E(Y^2)$ .  (2.53)

If both random variables have finite variances, the covariance is defined by

${\rm Cov}(X, Y) = E\big((X - E(X))(Y - E(Y))\big) = E\big(XY - X E(Y) - E(X)Y + E(X)E(Y)\big) = E(XY) - E(X)E(Y)$ .  (2.54)

The covariance Cov(X, Y) and the coefficient of correlation ρ(X, Y),

${\rm Cov}(X, Y) = E(XY) - E(X)E(Y)$ and $\rho(X, Y) = \dfrac{{\rm Cov}(X, Y)}{\sigma(X)\, \sigma(Y)}$ ,  (2.54')
are a measure of correlation between the two variables. As a consequence
of the Cauchy-Schwarz inequality we have −1 ≤ ρ(X ,Y) ≤ 1. If covariance
and correlation coefficient are equal to zero, the two random variables X and
Y are uncorrelated. Independence implies lack of correlation but the latter
is in general the weaker property.
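Covariance and correlation (2.54') are readily estimated from samples, and the distinction between lack of correlation and independence can be seen directly. A sketch with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100000)
y = 2.0 * x + rng.normal(size=100000)   # linearly dependent on x

print(np.cov(x, y)[0, 1])               # Cov(X,Y), close to 2
print(np.corrcoef(x, y)[0, 1])          # rho, close to 2/sqrt(5) = 0.894
print(np.corrcoef(x, x**2)[0, 1])       # near 0: uncorrelated, yet dependent
```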
In addition to the expectation value two more quantities are used to characterize the center of probability distributions (figure 2.12): (i) The median $\bar{\mu}$ is the value at which the number of points of a distribution at lower values matches exactly the number of points at higher values, as expressed in terms of two inequalities,

$P(X \leq \bar{\mu}) \geq \frac{1}{2}$ and $P(X \geq \bar{\mu}) \geq \frac{1}{2}$ , or $\int_{-\infty}^{\bar{\mu}} dF(x) \geq \frac{1}{2}$ and $\int_{\bar{\mu}}^{+\infty} dF(x) \geq \frac{1}{2}$ ,  (2.55)

where Lebesgue-Stieltjes integration is applied; in case of an absolutely continuous distribution the condition simplifies to

$P(X \leq \bar{\mu}) = P(X \geq \bar{\mu}) = \int_{-\infty}^{\bar{\mu}} f(x)\, dx = \frac{1}{2}$ ,  (2.55')
Figure 2.12: Probability densities and moments. As an example of an asymmetric distribution with highly different values for mode, median, and mean, the lognormal density $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\big(-(\ln x - \nu)^2/(2\sigma^2)\big)$ is shown. Parameter values $\sigma = \sqrt{\ln 2}$ and $\nu = \ln 2$ were chosen; they yield $\tilde{\mu} = \exp(\nu - \sigma^2) = 1$ for the mode, $\bar{\mu} = \exp(\nu) = 2$ for the median, and $\mu = \exp(\nu + \sigma^2/2) = 2\sqrt{2}$ for the mean, respectively. The sequence mode < median < mean is characteristic for distributions with positive skewness, whereas the opposite sequence mean < median < mode is found in cases of negative skewness (see also figure 2.14).
and (ii) the mode $\tilde{\mu}$ of a distribution is the most frequent value – the value that is most likely to be obtained through sampling – and it is given by the maximum of the probability mass function for discrete distributions or by the maximum of the probability density in the continuous case. An illustrative example for the discrete case is the probability mass function of the scores for throwing two dice (the mode in the lower part of figure 2.7 is $\tilde{\mu} = 7$). A probability distribution may have more than one mode. Bimodal distributions occur occasionally, and then the two modes provide much more information on the expected outcomes than mean or median (see also subsection 2.7.9).
For many purposes a generalization of the median from two to n equally
Figure 2.13: Definition and determination of quantiles. A quantile q with $p_q = k/n$ defines a value $x_q$ at which the (cumulative) probability distribution reaches the value $F(x_q) = p_q$ corresponding to $P(X < x) \leq p$. The figure shows how the position of the quantile $p_q = k/n$ is used to determine its value $x_q(p)$. In particular we use here the normal distribution Φ(x) as function F(x), and the computation yields $\Phi(x_q) = \frac{1}{2}\big(1 + {\rm erf}\big(\frac{x_q - \nu}{\sqrt{2\sigma^2}}\big)\big) = p_q$. Parameter choice: ν = 2, $\sigma^2 = \frac{1}{2}$, and for the quantile (n = 5, k = 2), yielding $p_q = 2/5$ and $x_q = 1.8209$.
sized data sets is useful. The quantiles are points taken at regular intervals from the cumulative distribution function F(x) of a random variable X. Ordered data are divided into n essentially equal-sized subsets, and accordingly (n − 1) points on the x-axis separate the subsets. Then, the k-th n-quantile is defined by $P(X < x) \leq \frac{k}{n} = p$, or equivalently

$F^{-1}(p) := \inf\{x \in \mathbb{R} : F(x) \geq p\}$ and $p = \int_{-\infty}^x dF(u)$ .  (2.56)

In case the random variable has a probability density the integral simplifies to $p = \int_{-\infty}^x f(u)\, du$. The median is simply the value of x for $p = \frac{1}{2}$. For partitioning into four parts we have the first or lower quartile at $p = \frac{1}{4}$, the second quartile or median at $p = \frac{1}{2}$, and the third or upper quartile at $p = \frac{3}{4}$. The lower quartile contains 25% of the data, the median 50%, and the upper quartile eventually 75%.
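Quantiles in the sense of (2.56) are the generalized inverse of F. A short sketch, reproducing the numbers assumed in figure 2.13:

```python
import numpy as np
from scipy.stats import norm

nu, sigma2 = 2.0, 0.5
n, k = 5, 2
xq = norm.ppf(k / n, loc=nu, scale=np.sqrt(sigma2))   # x_q = F^{-1}(2/5)
print(xq)   # 1.8209..., the value quoted in the caption of figure 2.13

data = norm.rvs(loc=nu, scale=np.sqrt(sigma2), size=100000, random_state=3)
print(np.quantile(data, [0.25, 0.5, 0.75]))           # empirical quartiles
```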
The interpretation of expectation value and variance or standard devia-
tion is straightforward: The expectation value is the mean or average value
of a distribution and the variance measures the width.
2.5.2 Higher moments
Two other quantities related to higher moments are frequently used for a more detailed characterization of probability distributions:20 (i) the skewness,

$\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\mu_3}{\sigma^3} = \dfrac{E\big((X - E(X))^3\big)}{\Big(E\big((X - E(X))^2\big)\Big)^{3/2}}$ ,  (2.57)

and (ii) the kurtosis, which is either defined as the fourth standardized moment $\beta_2$ or, in terms of cumulants, given as the excess kurtosis $\gamma_2$:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{\mu_4}{\sigma^4} = \dfrac{E\big((X - E(X))^4\big)}{\Big(E\big((X - E(X))^2\big)\Big)^2}$ and $\gamma_2 = \dfrac{\kappa_4}{\kappa_2^2} = \dfrac{\mu_4}{\sigma^4} - 3 = \beta_2 - 3$ .  (2.58)
Skewness is a measure of the asymmetry of the probability density: curves that are symmetric about the mean have zero skew, negative skew implies a longer left tail of the distribution, and positive skew is characteristic for a distribution with a longer right tail. Kurtosis characterizes the degree of peakedness of a distribution. High kurtosis implies a sharper peak and fatter tails, whereas low kurtosis characterizes flat or round peaks and thinner tails. Distributions are called leptokurtic if they have a positive excess kurtosis and therefore a sharper peak and thicker tails than the normal distribution, which is taken as a reference with zero excess kurtosis (see section 2.7); they are characterized as platykurtic if they have a negative excess kurtosis, a broader peak, and thinner tails (see figure 2.14). One property of skewness and kurtosis, which follows from their definition, is important
20In contrast to expectation value, variance and standard deviation, skewness and kur-
tosis are not uniquely defined and it is necessary therefore to check carefully the author’s
definitions when reading text from literature.
Figure 2.14: Skewness and kurtosis. The upper part of the figure illustrates the sign of skewness with asymmetric density functions. The examples are taken from the binomial distribution $B_k(n, p)$: $\gamma_1 = (1 - 2p)/\sqrt{np(1-p)}$ with p = 0.1 (red), 0.5 (black; symmetric), and 0.9 (blue), with the values $\gamma_1$ = 0.596, 0, −0.596. Densities with different kurtosis are compared in the lower part of the figure: The Laplace distribution (D, red), the hyperbolic secant distribution (S, orange), and the logistic distribution (L, green) are leptokurtic with excess kurtosis values 3, 2, and 1.2, respectively. The normal distribution is the reference curve with excess kurtosis 0 (N, black). The raised cosine distribution (C, cyan), the Wigner semicircle distribution (W, blue), and the uniform distribution (U, magenta) are platykurtic with excess kurtosis values of −0.593762, −1, and −1.2, respectively (the picture is taken from http://en.wikipedia.org/wiki/Kurtosis, March 30, 2011).
to note: The expectation value, the standard deviation, and the variance are
quantities with dimensions, mass, length and/or time, whereas skewness and
kurtosis are dimensionless numbers.
The cumulants $\kappa_i$ are the coefficients of a series expansion of the logarithm of the characteristic function (see section 2.7), which in turn is the Fourier transform of the probability density function f(x):

$\phi(z) = \int_{-\infty}^{+\infty} \exp(i z x)\, f(x)\, dx$ and $\ln \phi(z) = \sum_{i=1}^\infty \kappa_i \dfrac{(i z)^i}{i!}$ .  (2.59)

The first five cumulants $\kappa_i$ (i = 1, ..., 5), expressed in terms of the expectation value µ and the central moments $\mu_i$ ($\mu_1 = 0$), are

$\kappa_1 = \mu$ ,
$\kappa_2 = \mu_2$ ,
$\kappa_3 = \mu_3$ ,
$\kappa_4 = \mu_4 - 3\mu_2^2$ , and
$\kappa_5 = \mu_5 - 10\mu_2\mu_3$ .
We shall come back to the use of cumulants κi in the next section 2.6 when
we apply k-statistics in order to compute empirical moments from incomplete
data sets and in section 2.7 where we shall compare frequently used individual
probability densities.
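The cumulant relations can be checked numerically; the sketch below – an added illustration with normally distributed samples, for which all cumulants beyond $\kappa_2$ vanish – estimates $\kappa_3 = \mu_3$ and $\kappa_4 = \mu_4 - 3\mu_2^2$:

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

mu = x.mean()
m = lambda r: np.mean((x - mu) ** r)   # estimated central moments mu_r
print(mu, m(2))                        # kappa_1 ~ 1, kappa_2 ~ 4
print(m(3))                            # kappa_3 ~ 0
print(m(4) - 3 * m(2) ** 2)            # kappa_4 ~ 0
```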
2.6 Mathematical statistics
Although mathematical statistics is a discipline in its own right and would
require a separate course, we mention here briefly the basic concept which
is of general importance for every scientist.21 In practice, we can collect
data for all sample points of the sample space Ω only in very exceptional
cases. Otherwise we have to rely on limited samples as they are obtained in
experiments or in opinion polls. Among other things mathematical statistics
is dealing extensively with the problem of incomplete data sets.
For a given incomplete random sample (X1, . . . ,Xn) some function Z is
evaluated and yields a random variable Z = Z(X1, . . . ,Xn) as output. From
a limited set of data, $x = \{x_1, x_2, \ldots, x_n\}$, sample expectation values (also called sample means), sample variances, sample standard deviations, or other quantities are computed as estimators in the same way as if the sample set covered the complete sample space. In particular we compute the
sample mean,

$m = \dfrac{1}{n} \sum_{i=1}^n x_i$ ,  (2.60)

and the moments around the sample mean. For the sample variance we obtain

$m_2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \Big(\dfrac{1}{n} \sum_{i=1}^n x_i\Big)^2$ ,  (2.61)
21For the reader who is interested in more details on mathematical statistics we recom-
mend the textbook by Fisz [10] and the comprehensive treatise by Stuart and Ord [25, 26]
which is a new edition of Kendall’s classic on statistics. The monograph by Cooper [52]
is particularly addressed to experimentalists practicing statistics. A great variety of other
and equally well suitable texts are, of course, available in the rich literature on mathemat-
ical statistics.
and for the third and fourth moments after some calculations

$m_3 = \dfrac{1}{n} \sum_{i=1}^n x_i^3 - \dfrac{3}{n^2} \Big(\sum_{i=1}^n x_i\Big) \Big(\sum_{j=1}^n x_j^2\Big) + \dfrac{2}{n^3} \Big(\sum_{i=1}^n x_i\Big)^3$ ,  (2.62a)

$m_4 = \dfrac{1}{n} \sum_{i=1}^n x_i^4 - \dfrac{4}{n^2} \Big(\sum_{i=1}^n x_i\Big) \Big(\sum_{j=1}^n x_j^3\Big) + \dfrac{6}{n^3} \Big(\sum_{i=1}^n x_i\Big)^2 \Big(\sum_{j=1}^n x_j^2\Big) - \dfrac{3}{n^4} \Big(\sum_{i=1}^n x_i\Big)^4$ .  (2.62b)
These naïve estimators, $m_i$ (i = 2, 3, 4, ...), contain a bias because the exact expectation value µ around which the moments are centered is not known and has to be approximated by the sample mean m. For the variance we illustrate the systematic deviation by calculating a correction factor known as Bessel's correction, but more properly attributed to Gauss [53, part 2, p.161]. In order to obtain an expectation value for the sample moments we repeat the drawing of samples with n elements and denote their expectation values by $\langle m_i \rangle$. In particular we have
$m_2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \Big(\dfrac{1}{n} \sum_{i=1}^n x_i\Big)^2 = \dfrac{1}{n} \sum_{i=1}^n x_i^2 - \dfrac{1}{n^2} \Big(\sum_{i=1}^n x_i^2 + \sum_{i,j=1,\, i\neq j}^n x_i x_j\Big) = \dfrac{n-1}{n^2} \sum_{i=1}^n x_i^2 - \dfrac{1}{n^2} \sum_{i,j=1,\, i\neq j}^n x_i x_j$ .

The expectation value is now of the form

$\langle m_2 \rangle = \dfrac{n-1}{n} \Big\langle \dfrac{1}{n} \sum_{i=1}^n x_i^2 \Big\rangle - \dfrac{1}{n^2} \Big\langle \sum_{i,j=1,\, i\neq j}^n x_i x_j \Big\rangle$ ,

and by using $\langle x_i x_j \rangle = \langle x_i \rangle \langle x_j \rangle = \langle x_i \rangle^2$ we find

$\langle m_2 \rangle = \dfrac{n-1}{n} \Big\langle \dfrac{1}{n} \sum_{i=1}^n x_i^2 \Big\rangle - \dfrac{n(n-1)}{n^2} \langle x_i \rangle^2 = \dfrac{n-1}{n} \alpha_2 - \dfrac{n(n-1)}{n^2} \mu^2 = \dfrac{n-1}{n} (\alpha_2 - \mu^2)$ ,

where $\alpha_2$ is the second moment about zero. Using the identity $\alpha_2 = \mu_2 + \mu^2$ we find eventually

$\langle m_2 \rangle = \dfrac{n-1}{n} \mu_2$ and ${\rm var}(x) = \dfrac{1}{n-1} \sum_{i=1}^n (x_i - m)^2$ .  (2.63)
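The bias factor (n − 1)/n in (2.63) shows up immediately in simulation. A sketch, assuming standard normal samples of size n = 5 so that the true variance is 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n, repeats = 5, 200000
samples = rng.normal(size=(repeats, n))

m2 = samples.var(axis=1, ddof=0)    # naive estimator m_2
var = samples.var(axis=1, ddof=1)   # Bessel-corrected variance, divisor n - 1
print(m2.mean())    # ~ (n-1)/n = 0.8
print(var.mean())   # ~ 1.0, unbiased
```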
Further useful measures of correlation between pairs of random variables can be derived straightforwardly: (i) the unbiased sample covariance,

$M_{XY} = \dfrac{1}{n-1} \sum_{i=1}^n (x_i - m_x)(y_i - m_y)$ ,  (2.64)

and (ii) the sample correlation coefficient,

$R_{XY} = \dfrac{\sum_{i=1}^n (x_i - m_x)(y_i - m_y)}{\sqrt{\big(\sum_{i=1}^n (x_i - m_x)^2\big) \big(\sum_{i=1}^n (y_i - m_y)^2\big)}}$ ,  (2.65)

where $m_x$ and $m_y$ are the sample means of the two data series.
For practical purposes Bessel's correction is often unimportant when the data sets are sufficiently large, but the recognition of the principle is important, in particular for statistical properties more involved than variances. Sometimes a problem is encountered in cases where the second moment of a distribution, $\mu_2$, does not exist, that is, it diverges. Then computing variances from incomplete data sets is also unstable, and one may choose the mean absolute deviation,

$D(X) = \dfrac{1}{n} \sum_{i=1}^n |X_i - m|$ ,  (2.66)

as a measure for the width of the distribution [54, pp.455-459], because it is commonly more robust than variance or standard deviation.
In order to derive unbiased estimators for the cumulants of a probability distribution, $\langle k_i \rangle = \kappa_i$, k-statistics has been invented [55, pp.99-100]. The first four terms of k-statistics for n sample points are

$k_1 = m$ ,
$k_2 = \dfrac{n}{n-1}\, m_2$ ,
$k_3 = \dfrac{n^2}{(n-1)(n-2)}\, m_3$ , and
$k_4 = \dfrac{n^2 \big((n+1)\, m_4 - 3(n-1)\, m_2^2\big)}{(n-1)(n-2)(n-3)}$ ,  (2.67)

which result from inversion of the relationships

$\langle m \rangle = \mu$ ,
$\langle m_2 \rangle = \dfrac{n-1}{n}\, \mu_2$ ,
$\langle m_3 \rangle = \dfrac{(n-1)(n-2)}{n^2}\, \mu_3$ ,
$\langle m_2^2 \rangle = \dfrac{(n-1)\big((n-1)\mu_4 + (n^2 - 2n + 3)\mu_2^2\big)}{n^3}$ , and
$\langle m_4 \rangle = \dfrac{(n-1)\big((n^2 - 3n + 3)\mu_4 + 3(2n-3)\mu_2^2\big)}{n^3}$ .  (2.68)
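Equations (2.67) translate directly into code. The sketch below computes $k_1$ to $k_4$ from the sample moments and compares them with scipy's built-in k-statistics (the exponential test data are an arbitrary choice):

```python
import numpy as np
from scipy.stats import kstat

def k_statistics(x):
    n = len(x)
    m = lambda r: np.mean((x - x.mean()) ** r)   # sample moments m_r
    k1 = x.mean()
    k2 = n / (n - 1) * m(2)
    k3 = n**2 / ((n - 1) * (n - 2)) * m(3)
    k4 = (n**2 * ((n + 1) * m(4) - 3 * (n - 1) * m(2)**2)
          / ((n - 1) * (n - 2) * (n - 3)))
    return k1, k2, k3, k4

x = np.random.default_rng(2).exponential(size=1000)
print(k_statistics(x))
print([kstat(x, r) for r in (1, 2, 3, 4)])   # agrees term by term
```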
The usefulness of these relations becomes evident in various applications. The statistician computes moments and other functions from his empirical data sets, for example $\{x_1, \ldots, x_n\}$ or $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, by means of the equations (2.60) and (2.63) to (2.65). The main issue of mathematical statistics, however, has always been and still is the development of independent tests that allow for the derivation of information on the quality of data. The underlying assumption commonly is that the values of the empirical functions converge to the corresponding exact moments as the random sample increases. Predictions on the reliability of the computed values are made by means of a great variety of tools. We dispense with details, which are extensively treated in the literature [10, 25, 26].
2.7 Distributions, densities and generating functions
In this section we introduce a few frequently used probability distributions
and analyze their properties. For this goal we define and make use of two
auxiliary functions, which allow for the derivation of compact representa-
tions of the distributions and which provide convenient tools for handling
functions of probabilities. In particular we shall make use of probability gen-
erating functions, g(s), moment generating functions M(s) and characteristic
functions φ(s). The characteristic function φ(s) exists for all distributions
but we shall encounter cases where no generating function exists (see, for
example, the Cauchy-Lorentz distribution in subsection 2.7.8). In addition
to the three generating functions mentioned here other functions are in use
as well. An example is the cumulant generating function that is lacking a
uniform definition. It is either the logarithm of the moment generation or
the logarithm of the characteristic function.
2.7.1 Probability generating functions
Let X be a random variable taking only non-negative integer values with a
probability distribution given by
P (X = j) = aj ; j = 0, 1, 2, . . . . (2.69)
A dummy variable s is introduced and the probability generating function is expressed by an infinite power series,

$g(s) = a_0 + a_1 s + a_2 s^2 + \ldots = \sum_{j=0}^\infty a_j s^j$ .  (2.70)

As we shall show later, the full information on the probability distribution is encapsulated in the coefficients $a_j$ (j ≥ 0). In most cases s is a real variable, although it can be of advantage to consider also complex s. Recalling that $\sum_j a_j = 1$ we verify easily that the power series (2.70) converges for |s| ≤ 1:

$|g(s)| \leq \sum_j |a_j| \cdot |s|^j \leq \sum_j a_j = 1$ , for |s| ≤ 1 .
For |s| < 1 we can differentiate the series term by term to calculate the derivatives of the generating function g:

$g'(s) = \dfrac{dg}{ds} = a_1 + 2a_2 s + 3a_3 s^2 + \ldots = \sum_{n=1}^\infty n\, a_n s^{n-1}$ ,

$g''(s) = \dfrac{d^2 g}{ds^2} = 2a_2 + 6a_3 s + \ldots = \sum_{n=2}^\infty n(n-1)\, a_n s^{n-2}$ ,

and, in general, we have

$g^{(j)}(s) = \dfrac{d^j g}{ds^j} = \sum_{n=j}^\infty n(n-1)\ldots(n-j+1)\, a_n s^{n-j} = \sum_{n=j}^\infty \binom{n}{j} j!\, a_n s^{n-j}$ .

Setting s = 0, all terms vanish except the constant term:

$g^{(j)}(0) = j!\, a_j$ or $a_j = \dfrac{1}{j!}\, g^{(j)}(0)$ .

In this way all $a_j$'s may be obtained by consecutive differentiation of the generating function, and alternatively the generating function can be determined from the known probability distribution.
Putting s = 1 in g'(s) and g''(s) we can compute the first and second moments of the distribution of X:

$g'(1) = \sum_{n=0}^\infty n\, a_n = E(X)$ ,
$g''(1) = \sum_{n=0}^\infty n^2 a_n - \sum_{n=0}^\infty n\, a_n = E(X^2) - E(X)$ , and hence
$E(X) = g'(1)$ and $E(X^2) = g'(1) + g''(1)$ .  (2.71)

We summarize: The probability distribution of a non-negative integer valued random variable can be converted into a generating function without losing information. The generating function is uniquely determined by the distribution and vice versa.
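The correspondence between distribution and generating function can be exercised symbolically. A sketch using sympy with, as an anticipatory example, the Poisson generating function $g(s) = e^{\alpha(s-1)}$ from subsection 2.7.4:

```python
import sympy as sp

s, alpha = sp.symbols('s alpha', positive=True)
g = sp.exp(alpha * (s - 1))         # probability generating function

# a_j = g^(j)(0) / j! recovers the probabilities of the distribution
a = [sp.simplify(g.diff(s, j).subs(s, 0) / sp.factorial(j)) for j in range(4)]
print(a)                            # alpha**j * exp(-alpha) / j!

# first and second moments via equation (2.71)
mean = g.diff(s).subs(s, 1)
second = mean + g.diff(s, 2).subs(s, 1)
print(mean, sp.expand(second))      # alpha and alpha + alpha**2
```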
2.7.2 Moment generating functions
The basis of the moment generating function is the series expansion of the exponential of the random variable X:

$e^{Xs} = 1 + Xs + \dfrac{X^2}{2!} s^2 + \dfrac{X^3}{3!} s^3 + \ldots$ .

The moment generating function allows for direct computation of the moments of a probability distribution as defined in equation (2.69), since we have

$M_X(s) = E(e^{Xs}) = 1 + \alpha_1 s + \dfrac{\alpha_2}{2!} s^2 + \dfrac{\alpha_3}{3!} s^3 + \ldots$ ,  (2.72)

wherein $\alpha_i$ is the i-th raw moment. The moments are obtained by differentiating $M_X(s)$ i times with respect to s and then setting s = 0:

$E(X^n) = \alpha_n = M_X^{(n)}(0) = \dfrac{d^n M_X}{ds^n}\Big|_{s=0}$ .

A probability distribution thus has (at least) as many moments as the number of times the moment generating function can be continuously differentiated (see also the characteristic function in subsection 2.7.3). If two distributions have the same moment generating function they are identical at all points:

$M_X(s) = M_Y(s) \iff F_X(x) = F_Y(x)$ .

This statement, however, does not imply that two distributions are identical when they have the same moments, because in some cases the moments exist but the moment generating function does not, since the limit $\lim_{n\to\infty} \sum_{k=0}^n \frac{\alpha_k s^k}{k!}$ diverges – as, for example, in case of the logarithmic normal distribution.
2.7.3 Characteristic functions
Like the moment generating function, the characteristic function φ(s) of a random variable X completely describes the probability distribution F(x). It is defined by

$\phi(s) = \int_{-\infty}^{+\infty} \exp(i s x)\, dF(x) = \int_{-\infty}^{+\infty} \exp(i s x)\, f(x)\, dx$ ,  (2.59')

where the integral over dF(x) is of Riemann-Stieltjes type. In case a probability density f(x) exists for the random variable X, the characteristic function is (almost) the Fourier transform of the density.22 From equation (2.59') follows the useful expression $\phi(s) = E(e^{i s X})$ that we shall use, for example, in solving the equations for stochastic processes (subsection 3.4).

The characteristic function exists for all random variables since it is an integral of a bounded continuous function over a space of finite measure. There is a bijection between distribution functions and characteristic functions:

$\phi_X(s) = \phi_Y(s) \iff F_X(x) = F_Y(x)$ .

If a random variable X has moments up to k-th order, then the characteristic function φ(s) is k times continuously differentiable on the entire real line. Vice versa, if a characteristic function φ(s) has a k-th derivative at zero, then the random variable X has all moments up to k if k is even and up to k − 1 if k is odd:

$E(X^k) = (-i)^k \dfrac{d^k \phi(s)}{ds^k}\Big|_{s=0}$ and $\dfrac{d^k \phi(s)}{ds^k}\Big|_{s=0} = i^k\, E(X^k)$ .  (2.73)

An interesting example is presented by the Cauchy distribution (subsection 2.7.8) with $\phi(s) = \exp(-|s|)$: It is not differentiable at s = 0, and the distribution has no moments, not even the expectation value.

The moment generating function is related to the probability generating function g(s) (subsection 2.7.1) and the characteristic function φ(s) (subsection 2.7.3) by

$g(e^s) = E(e^{Xs}) = M_X(s)$ and $\phi(s) = M_{iX}(s) = M_X(is)$ .

All three generating functions are closely related, but it may happen that not all three exist. As said, characteristic functions exist for all probability distributions.
22 The difference between the Fourier transform ψ(s) and the characteristic function φ(s),

$\psi(s) = \int_{-\infty}^{+\infty} \exp(-2\pi i s x)\, f(x)\, dx$ and $\phi(s) = \int_{-\infty}^{+\infty} \exp(+i s x)\, f(x)\, dx$ ,

is only a matter of the factor 2π and the choice of the sign.
Table 2.2: Comparison of several common probability densities. Abbreviations and notations used in the table: $\Gamma(r, x) = \int_x^\infty s^{r-1} e^{-s}\, ds$ and $\gamma(r, x) = \int_0^x s^{r-1} e^{-s}\, ds$ are the upper and lower incomplete gamma functions, respectively; $I_x(a, b) = B(x; a, b)/B(1; a, b)$ is the regularized incomplete beta function with $B(x; a, b) = \int_0^x s^{a-1} (1-s)^{b-1}\, ds$. For more details see [56].

Poisson π(α), α > 0, support $k \in \mathbb{N}_0$: pmf $\frac{\alpha^k}{k!} e^{-\alpha}$; cdf $\Gamma(\lfloor k+1 \rfloor, \alpha)/\lfloor k \rfloor!$; mean α; median $\approx \lfloor \alpha + \frac{1}{3} - \frac{0.02}{\alpha} \rfloor$; mode $\lceil \alpha \rceil - 1$; variance α; skewness $1/\sqrt{\alpha}$; excess kurtosis $1/\alpha$; mgf $\exp\big(\alpha(e^s - 1)\big)$; cf $\exp\big(\alpha(e^{is} - 1)\big)$.

Binomial B(n, p), $n \in \mathbb{N}$, $p \in [0, 1]$, support $k \in \mathbb{N}_0$, k ≤ n: pmf $\binom{n}{k} p^k (1-p)^{n-k}$; cdf $I_{1-p}(n-k, 1+k)$; mean np; median ⌊np⌋ or ⌈np⌉; mode ⌊(n+1)p⌋ or ⌊(n+1)p⌋ − 1; variance np(1−p); skewness $\frac{1-2p}{\sqrt{np(1-p)}}$; excess kurtosis $\frac{1-6p(1-p)}{np(1-p)}$; mgf $(1-p+pe^s)^n$; cf $(1-p+pe^{is})^n$.

Normal N(ν, σ), $\nu \in \mathbb{R}$, $\sigma^2 \in \mathbb{R}_+$, support $x \in \mathbb{R}$: pdf $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\nu)^2/(2\sigma^2)}$; cdf $\frac{1}{2}\big(1 + {\rm erf}\big(\frac{x-\nu}{\sqrt{2\sigma^2}}\big)\big)$; mean, median, mode ν; variance σ²; skewness 0; excess kurtosis 0; mgf $\exp(\nu s + \frac{1}{2}\sigma^2 s^2)$; cf $\exp(i\nu s - \frac{1}{2}\sigma^2 s^2)$.

Chi-square χ²(k), $k \in \mathbb{N}$, support $x \in [0, \infty[$: pdf $\frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}$; cdf $\frac{\gamma(k/2,\, x/2)}{\Gamma(k/2)}$; mean k; median $\approx k\big(1 - \frac{2}{9k}\big)^3$; mode max{k − 2, 0}; variance 2k; skewness $\sqrt{8/k}$; excess kurtosis 12/k; mgf $(1-2s)^{-k/2}$ for s < 1/2; cf $(1-2is)^{-k/2}$.

Logistic L(a, b), $a \in \mathbb{R}$, b > 0, support $x \in \mathbb{R}$: pdf $\frac{1}{4b}\, {\rm sech}^2\big((x-a)/2b\big)$; cdf $\frac{1}{1+\exp(-(x-a)/b)}$; mean, median, mode a; variance $\pi^2 b^2/3$; skewness 0; excess kurtosis 1.2; mgf $e^{as}\, \pi b s/\sin(\pi b s)$; cf $e^{ias}\, \pi b s/\sinh(\pi b s)$.

Laplace, $\nu \in \mathbb{R}$, b > 0, support $x \in \mathbb{R}$: pdf $\frac{1}{2b} e^{-|x-\nu|/b}$; cdf $\frac{1}{2} e^{-(\nu-x)/b}$ for x < ν and $1 - \frac{1}{2} e^{-(x-\nu)/b}$ for x ≥ ν; mean, median, mode ν; variance 2b²; skewness 0; excess kurtosis 3; mgf $\frac{\exp(\nu s)}{1 - b^2 s^2}$ for |s| < 1/b; cf $\frac{\exp(i\nu s)}{1 + b^2 s^2}$.

Uniform U(a, b), $a, b \in \mathbb{R}$, a < b, support $x \in [a, b]$: pdf $\frac{1}{b-a}$ for x ∈ [a, b] and 0 otherwise; cdf 0 for x < a, $\frac{x-a}{b-a}$ for x ∈ [a, b], and 1 for x ≥ b; mean $\frac{a+b}{2}$; median $\frac{a+b}{2}$; mode any m ∈ [a, b]; variance $\frac{(b-a)^2}{12}$; skewness 0; excess kurtosis −6/5; mgf $\frac{e^{bs} - e^{as}}{(b-a)s}$; cf $\frac{e^{ibs} - e^{ias}}{i(b-a)s}$.

Cauchy, $x_0 \in \mathbb{R}$, $\gamma \in \mathbb{R}_+$, support $x \in \mathbb{R}$: pdf $\dfrac{1}{\pi\gamma\big(1 + ((x-x_0)/\gamma)^2\big)}$; cdf $\frac{1}{\pi} \arctan\big(\frac{x-x_0}{\gamma}\big) + \frac{1}{2}$; mean undefined; median $x_0$; mode $x_0$; variance, skewness, and kurtosis undefined; mgf does not exist; cf $\exp(i x_0 s - \gamma|s|)$.
Figure 2.15: The Poisson probability density. Two examples of Poisson distributions, $\pi_k(\alpha) = \alpha^k e^{-\alpha}/k!$, with α = 1 (black) and α = 5 (red) are shown. The distribution with the larger α has the mode shifted further to the right and a thicker tail.
Before entering the discussion of individual common probability distribu-
tions we present an overview over the most important characteristics of these
distributions in table 2.2.
2.7.4 The Poisson distribution
The Poisson distribution, named after the French physicist and mathemati-
cian Simeon Denis Poisson, is a discrete probability distribution. It is used
to model the number of independent events occurring in a given interval of
time. In physics and chemistry the Poisson process is the stochastic basis of
first order processes, for example radioactive decay or irreversible first order
reactions and the Poisson distribution is the probability distribution underly-
ing the time course of particle numbers, N(t). Despite its major importance
in physics and biology the Poisson distribution, π(α), is a fairly simple math-
ematical object. It contains a single parameter only, the real valued positive
number $\alpha$:23
$$ P(\mathcal{X} = k) = \pi_k(\alpha) = \frac{\alpha^k}{k!}\, e^{-\alpha} \ ; \quad k \in \mathbb{N}_0 . \qquad (2.74) $$
We leave it as an exercise to verify the following properties:
$$ \sum_{k=0}^{\infty} \pi_k = 1 , \quad \sum_{k=0}^{\infty} k\, \pi_k = \alpha , \quad\text{and}\quad \sum_{k=0}^{\infty} k^2\, \pi_k = \alpha + \alpha^2 . $$
Examples of Poisson distributions with two different parameter values, α = 1
and 5, are shown in figure 2.15.
By means of a Taylor expansion we can find the generating function of the Poisson distribution,
$$ g(s) = e^{\alpha(s-1)} . \qquad (2.75) $$
From the generating function we calculate easily
$$ g'(s) = \alpha\, e^{\alpha(s-1)} \quad\text{and}\quad g''(s) = \alpha^2\, e^{\alpha(s-1)} . $$
Expectation value and second moment follow straightforwardly from equation (2.71):
$$ E(\mathcal{X}) = g'(1) = \alpha , \qquad (2.75a) $$
$$ E(\mathcal{X}^2) = g'(1) + g''(1) = \alpha + \alpha^2 , \quad\text{and} \qquad (2.75b) $$
$$ \sigma^2(\mathcal{X}) = \alpha . \qquad (2.75c) $$
Both the expectation value and the variance are equal to the parameter $\alpha$, and hence the standard deviation amounts to $\sigma(\mathcal{X}) = \sqrt{\alpha}$. This remarkable property of the Poisson distribution is not limited to the second moment. The factorial moments, $\langle \mathcal{X}^r \rangle_f$, fulfil the equation
$$ \langle \mathcal{X}^r \rangle_f = E\bigl(\mathcal{X}(\mathcal{X}-1)\cdots(\mathcal{X}-r+1)\bigr) = \alpha^r . \qquad (2.75d) $$
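These relations are easy to check numerically. The following minimal sketch (assuming NumPy and SciPy are available; the truncation point k = 200 is an arbitrary choice at which the tail is negligible for $\alpha = 5$) verifies normalization, the first two moments, and the factorial moments (2.75d):

    import numpy as np
    from scipy.stats import poisson

    alpha = 5.0
    k = np.arange(0, 200)                  # truncation of the infinite sums
    pk = poisson.pmf(k, alpha)

    print(pk.sum())                        # ~ 1, normalization
    print((k * pk).sum())                  # ~ alpha, expectation value
    print((k**2 * pk).sum())               # ~ alpha + alpha^2, second moment

    # factorial moments E(X(X-1)...(X-r+1)) = alpha^r, eq. (2.75d)
    for r in range(1, 5):
        falling = np.ones_like(k, dtype=float)
        for j in range(r):
            falling = falling * (k - j)    # falling factorial k(k-1)...(k-r+1)
        print(r, (falling * pk).sum(), alpha**r)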
23In order to solve the problems one requires knowledge of some basic infinite series: $e = \sum_{n=0}^{\infty} 1/n!$, $e^x = \sum_{n=0}^{\infty} x^n/n!$ for $|x| < \infty$, $e = \lim_{n\to\infty}(1 + 1/n)^n$, and $e^{-\alpha} = \lim_{n\to\infty}(1 - \alpha/n)^n$.
Figure 2.16: The binomial probability density. Two examples of binomial distributions, $B_k(n,p) = \binom{n}{k} p^k (1-p)^{n-k}$, with $n = 10$, $p = 0.5$ and $p = 0.1$ are shown. The former distribution is symmetric with respect to the expectation value $E(B_k) = n/2$.
2.7.5 The binomial distribution
The binomial distribution, $B(n,p)$, characterizes the cumulative result of $n$ independent trials with two-valued outcomes, for example successive coin tosses as we discussed in sections 1.2 and 2.2: $S_n = \sum_{i=1}^{n} \mathcal{X}_i$. The $\mathcal{X}_i$'s are commonly called Bernoulli random variables, named after the Swiss mathematician Jakob Bernoulli; accordingly, the sequence of events is known as a Bernoulli process, and the corresponding random variable is said to have a Bernoulli or binomial distribution:
$$ P(S_n = k) = B_k(n,p) = \binom{n}{k}\, p^k\, q^{n-k} , \quad q = 1-p \quad (k \in \mathbb{N}_0 ,\ k \le n) . \qquad (2.76) $$
Two examples are shown in figure 2.16. The distribution with p = 0.5 is
symmetric with respect to k = n/2.
The generating function for the single trial is $g(s) = q + ps$. Since we have $n$ independent trials the complete generating function is
$$ g(s) = (q + ps)^n = \sum_{k=0}^{n} \binom{n}{k}\, q^{n-k}\, p^k\, s^k . \qquad (2.77) $$
From the derivatives of the generating function,
$$ g'(s) = np\,(q+ps)^{n-1} \quad\text{and}\quad g''(s) = n(n-1)\,p^2\,(q+ps)^{n-2} , $$
we compute readily expectation value and variance:
$$ E(S_n) = g'(1) = np , \qquad (2.77a) $$
$$ E(S_n^2) = g'(1) + g''(1) = np + n^2 p^2 - np^2 = npq + n^2 p^2 , \qquad (2.77b) $$
$$ \sigma^2(S_n) = npq , \quad\text{and} \qquad (2.77c) $$
$$ \sigma(S_n) = \sqrt{npq} . \qquad (2.77d) $$
For $p = 1/2$, the case of the unbiased coin, we have the symmetric binomial distribution with $E(S_n) = n/2$, $\sigma^2(S_n) = n/4$, and $\sigma(S_n) = \sqrt{n}/2$. We note that the expectation value is proportional to the number of trials, $n$, and the standard deviation is proportional to its square root, $\sqrt{n}$.
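The step from the generating function to the moments can also be reproduced symbolically. The following sketch (assuming SymPy is available) differentiates $g(s) = (q+ps)^n$ and recovers (2.77a) and (2.77c):

    import sympy as sp

    s, p, n = sp.symbols('s p n', positive=True)
    q = 1 - p
    g = (q + p*s)**n                        # generating function, eq. (2.77)

    g1 = sp.diff(g, s).subs(s, 1)           # g'(1)
    g2 = sp.diff(g, s, 2).subs(s, 1)        # g''(1)
    print(sp.simplify(g1))                  # n*p, eq. (2.77a)
    print(sp.factor(sp.simplify(g1 + g2 - g1**2)))   # n*p*(1 - p), eq. (2.77c)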
The binomial distribution B(n, p) can be transformed into the Poisson
distribution π(α) in the limit n→∞. In order to show this we start from
$$ B_k(n,p) = \binom{n}{k}\, p^k\, (1-p)^{n-k} , \quad (k \in \mathbb{N}_0 ,\ k \le n) . $$
The symmetry parameter $p$ is assumed to vary with $n$, $p(n) = \alpha/n$ for $n \ge 1$, and thus we have
$$ B_k\Bigl(n, \frac{\alpha}{n}\Bigr) = \binom{n}{k} \Bigl(\frac{\alpha}{n}\Bigr)^k \Bigl(1 - \frac{\alpha}{n}\Bigr)^{n-k} , \quad (k \in \mathbb{N}_0 ,\ k \le n) . $$
We let n go to infinity for fixed k and start with B0(n, p):
$$ \lim_{n\to\infty} B_0\Bigl(n, \frac{\alpha}{n}\Bigr) = \lim_{n\to\infty} \Bigl(1 - \frac{\alpha}{n}\Bigr)^n = e^{-\alpha} . $$
Now we compute the ratio of two consecutive terms, $B_{k+1}/B_k$:
$$ \frac{B_{k+1}(n, \alpha/n)}{B_k(n, \alpha/n)} = \frac{n-k}{k+1} \cdot \frac{\alpha}{n} \cdot \Bigl(1 - \frac{\alpha}{n}\Bigr)^{-1} = \frac{\alpha}{k+1} \cdot \Bigl[\frac{n-k}{n} \cdot \Bigl(1 - \frac{\alpha}{n}\Bigr)^{-1}\Bigr] . $$
Both terms in the square brackets converge to one as $n \to \infty$, and hence we find:
$$ \lim_{n\to\infty} \frac{B_{k+1}(n, \alpha/n)}{B_k(n, \alpha/n)} = \frac{\alpha}{k+1} . $$
From the two results we compute all terms starting from the limit value of $B_0$ and find $\lim B_1 = \alpha\, e^{-\alpha}$, $\lim B_2 = \alpha^2 e^{-\alpha}/2!$, $\ldots$, $\lim B_k = \alpha^k e^{-\alpha}/k!$. Accordingly we have verified Poisson's limit law:
$$ \lim_{n\to\infty} B_k\Bigl(n, \frac{\alpha}{n}\Bigr) = \pi_k(\alpha) , \quad k \in \mathbb{N}_0 . \qquad (2.78) $$
It is worth remembering that the limit was performed in a peculiar way, since the symmetry parameter $p(n)$ shrinks with increasing $n$ and indeed vanishes in the limit $n \to \infty$.
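Poisson's limit law can also be observed numerically. In the following sketch (assuming SciPy is available; $\alpha = 2$ is an arbitrary choice) the maximal deviation between the binomial and the Poisson probability mass functions shrinks as $n$ grows:

    import numpy as np
    from scipy.stats import binom, poisson

    alpha = 2.0
    k = np.arange(0, 11)
    for n in (10, 100, 1000, 10000):
        b = binom.pmf(k, n, alpha / n)     # B_k(n, alpha/n)
        err = np.max(np.abs(b - poisson.pmf(k, alpha)))
        print(n, err)                      # deviation decreases with n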
2.7.6 The normal distribution

The normal distribution is of central importance in probability theory because many distributions converge to it in the limit of large numbers. It is fundamental for the estimation of statistical errors, and thus we shall discuss it in some detail. The general normal distribution has the density24
$$ \varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\nu)^2}{2\sigma^2}} \quad\text{with}\quad \int_{-\infty}^{+\infty} \varphi(x)\, dx = 1 , \qquad (2.79) $$
and the corresponding random variable $\mathcal{X}$ has the moments $E(\mathcal{X}) = \nu$, $\sigma^2(\mathcal{X}) = \sigma^2$, and $\sigma(\mathcal{X}) = \sigma$.
For many purposes it is convenient to use the normal density in centered and normalized form:
$$ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \quad\text{with}\quad \int_{-\infty}^{+\infty} \varphi(x)\, dx = 1 . \qquad (2.79') $$
In this form we have $E(\mathcal{X}) = 0$, $\sigma^2(\mathcal{X}) = 1$, and $\sigma(\mathcal{X}) = 1$. Integration of
the density yields the distribution function
$$ P(\mathcal{X} \le x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du . \qquad (2.80) $$
The function Φ(x) is not available in analytical form, but it can be easily
formulated in terms of the error function, erf(x). This function as well as its
24The notations applied here for the normal distribution are $\Phi(x; \nu, \sigma)$ for the cumulative distribution and $\varphi(x; \nu, \sigma)$ for the density. Commonly, the parameters $(\nu, \sigma)$ are omitted.
complement, $\operatorname{erfc}(x)$, defined by
$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \quad\text{and}\quad \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt , $$
are available in tables and in standard mathematical packages.25 Examples of the normal density $\varphi(x)$ with different values of the standard deviation $\sigma$ and one example of the integrated distribution $\Phi(x)$ are shown in figure 2.17. The normal distribution is also used in statistics to define confidence intervals: 68.2% of the data points lie within an interval $\nu \pm \sigma$, 95.4% within an interval $\nu \pm 2\sigma$, and 99.7% within an interval $\nu \pm 3\sigma$.
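The quoted confidence levels follow directly from the error function, since the probability mass inside $\nu \pm m\sigma$ equals $\operatorname{erf}(m/\sqrt{2})$; a one-line check with the Python standard library:

    from math import erf, sqrt

    for m in (1, 2, 3):                    # intervals nu +/- m*sigma
        print(m, erf(m / sqrt(2)))         # 0.6827, 0.9545, 0.9973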
The normal density function $\varphi(x)$ has, among other remarkable properties, derivatives of all orders. Each derivative can be written as the product of $\varphi(x)$ and a polynomial whose degree equals the order of the derivative, known as a Hermite polynomial. The existence of all derivatives makes the bell-shaped curve $x \to \varphi(x)$ particularly smooth. In addition, the function $\varphi(x)$ decreases to zero very rapidly as $|x| \to \infty$.
This smoothness makes the moment generating function of the normal distribution especially convenient to handle (see subsection 2.7.2). $M(s)$ can be obtained directly by integration:
$$ M(s) = \int_{-\infty}^{+\infty} e^{xs}\, \varphi(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\Bigl(xs - \frac{x^2}{2}\Bigr)\, dx = $$
$$ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\Bigl(\frac{s^2}{2} - \frac{(x-s)^2}{2}\Bigr)\, dx = e^{s^2/2} \int_{-\infty}^{+\infty} \varphi(x-s)\, dx = e^{s^2/2} . \qquad (2.81) $$
All raw moments of the normal distribution,
$$ \mu^{(n)} = \int_{-\infty}^{+\infty} x^n\, \varphi(x)\, dx , \qquad (2.82) $$
can be obtained, for example, by successive differentiation of $M(s)$ with respect to $s$ (subsection 2.7.2). The moments are obtained more efficiently
25We remark that $\operatorname{erf}(x)$ and $\operatorname{erfc}(x)$ are not normalized in the same way as the normal density: $\operatorname{erf}(x) + \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2}\, dt = 1$, but $\int_0^{\infty} \varphi(x)\, dx = \frac{1}{2} \int_{-\infty}^{+\infty} \varphi(x)\, dx = \frac{1}{2}$.
Figure 2.17: The normal probability density. In the upper part we show the density function of the normal distribution, $\varphi(x) = \exp\bigl(-(x-\nu)^2/(2\sigma^2)\bigr)/(\sqrt{2\pi}\,\sigma)$, for $\nu = 5$ and $\sigma = 0.5$ (red), $1.0$ (black), and $2.0$ (blue). The smaller the standard deviation $\sigma$, the sharper is the curve. The lower part shows the density function $\varphi(x)$ (red) together with the distribution $\Phi(x) = \int_{-\infty}^{x} \varphi(u)\, du = 0.5\bigl[1 + \operatorname{erf}\bigl((x-\nu)/(\sqrt{2}\,\sigma)\bigr)\bigr]$ (black) for $\nu = 5$ and $\sigma = 0.5$.
by expanding the first and the last expression in the previous equation (2.81)
Figure 2.18: A fit of the normal distribution to the binomial distribution. The curves represent normal densities (red), which were fitted to the points of the binomial distribution (black). Parameter choices for the binomial distribution in the three examples: $(n = 4, p = 0.5)$, $(n = 10, p = 0.5)$, and $(n = 10, p = 0.1)$ for the upper, middle, and lower plot, respectively.
in a power series of $s$,
$$ \int_{-\infty}^{+\infty} \Bigl(1 + xs + \frac{(xs)^2}{2!} + \ldots + \frac{(xs)^n}{n!} + \ldots\Bigr) \varphi(x)\, dx = 1 + \frac{s^2}{2} + \frac{1}{2!}\Bigl(\frac{s^2}{2}\Bigr)^2 + \ldots + \frac{1}{n!}\Bigl(\frac{s^2}{2}\Bigr)^n + \ldots , $$
or, expressed in terms of the moments $\mu^{(n)}$,
$$ \sum_{n=0}^{\infty} \frac{\mu^{(n)}}{n!}\, s^n = \sum_{n=0}^{\infty} \frac{1}{2^n\, n!}\, s^{2n} , $$
from which we compute the moments of $\Phi(x)$ by equating the coefficients of the powers of $s$ on both sides of the expansion and find for $n \ge 1$:26
$$ \mu^{(2n-1)} = 0 \quad\text{and}\quad \mu^{(2n)} = \frac{(2n)!}{2^n\, n!} . \qquad (2.83) $$
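Equation (2.83) can be verified by numerical quadrature; a minimal sketch assuming SciPy is available:

    import numpy as np
    from math import factorial
    from scipy.integrate import quad

    phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # unit normal density
    for n in range(1, 9):
        mu_n, _ = quad(lambda x: x**n * phi(x), -np.inf, np.inf)
        ref = 0 if n % 2 else factorial(n) // (2**(n // 2) * factorial(n // 2))
        print(n, round(mu_n, 6), ref)      # odd moments vanish, even follow (2.83)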
All odd moments vanish because of symmetry. In the case of the fourth moment, the kurtosis, a kind of standardization is common which assigns zero excess kurtosis, $\gamma_2 = 0$, to the normal distribution. In other words, excess kurtosis monitors peak shape with respect to the normal distribution: positive excess kurtosis implies peaks that are sharper than the normal density, negative excess kurtosis peaks that are broader than the normal density.
Multivariate normal distribution. For general applications it is worth
considering the normal distribution in multiple dimensions. The random
variable X is replaced by a random vector,27 ~X = (X1, . . . ,Xn) with the joint
26The definite integrals are
$$ \int_{-\infty}^{+\infty} x^n\, e^{-x^2}\, dx = \begin{cases} \sqrt{\pi} & n = 0 \\ 0 & n \ge 1 \text{, odd} \\ \frac{(n-1)!!}{2^{n/2}}\, \sqrt{\pi} & n \ge 2 \text{, even} \end{cases} $$
where $(n-1)!! = 1 \cdot 3 \cdot \ldots \cdot (n-1)$.
27The notation we use here for vectors is '$\vec{\cdot}$' or bold-face letters. Matrices are denoted either by upper-case Roman or upper-case Greek letters. Vectors are commonly understood as columns or one-column matrices. Transposition, indicated by '$\cdot\,'$', converts them into row vectors or one-row matrices, and vice versa transposition of a row vector yields a column vector. If no confusion is possible, row vectors are used instead of column vectors in order to save space.
probability distribution
$$ P(\mathcal{X}_1 = x_1, \ldots, \mathcal{X}_n = x_n) = p(x_1, \ldots, x_n) = p(\mathbf{x}) . $$
The multivariate normal probability density can be written as
$$ \varphi(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\, |\Sigma|}} \exp\Bigl(-\frac{1}{2}\, (\mathbf{x} - \boldsymbol{\nu})'\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\nu})\Bigr) . $$
The vector $\boldsymbol{\nu}$ consists of the (raw) first moments along the different coordinates, $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$, and the variance-covariance matrix $\Sigma$ contains the $n$ variances in the diagonal, while the covariances form the off-diagonal elements:
$$ \Sigma = \begin{pmatrix} \sigma^2(\mathcal{X}_1) & \operatorname{Cov}(\mathcal{X}_1,\mathcal{X}_2) & \ldots & \operatorname{Cov}(\mathcal{X}_1,\mathcal{X}_n) \\ \operatorname{Cov}(\mathcal{X}_2,\mathcal{X}_1) & \sigma^2(\mathcal{X}_2) & \ldots & \operatorname{Cov}(\mathcal{X}_2,\mathcal{X}_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(\mathcal{X}_n,\mathcal{X}_1) & \operatorname{Cov}(\mathcal{X}_n,\mathcal{X}_2) & \ldots & \sigma^2(\mathcal{X}_n) \end{pmatrix} , $$
which is symmetric by the definition of covariances: $\operatorname{Cov}(\mathcal{X}_i,\mathcal{X}_j) = \operatorname{Cov}(\mathcal{X}_j,\mathcal{X}_i)$. With the mean given by the vector $\boldsymbol{\nu}$ and the variances and covariances by the matrix $\Sigma$, the moment generating function expressed in the dummy vector variable $\mathbf{s} = (s_1, \ldots, s_n)$ is of the form
$$ M(\mathbf{s}) = \exp(\boldsymbol{\nu}'\mathbf{s}) \cdot \exp\Bigl(\frac{1}{2}\, \mathbf{s}'\,\Sigma\,\mathbf{s}\Bigr) , $$
and, finally, the characteristic function is given by
$$ \phi(\mathbf{s}) = \exp(i\,\boldsymbol{\nu}'\mathbf{s}) \cdot \exp\Bigl(-\frac{1}{2}\, \mathbf{s}'\,\Sigma\,\mathbf{s}\Bigr) . $$
Without showing the details we remark that this particularly simple characteristic function implies that all moments of order higher than two can be expressed in terms of first and second moments, in particular expectation values, variances, and covariances. To give an example that we shall require later in subsection 3.5.2, the fourth order moments can be derived from
$$ E(\mathcal{X}_1 \mathcal{X}_2 \mathcal{X}_3 \mathcal{X}_4) = \sigma_{12}\sigma_{34} + \sigma_{41}\sigma_{23} + \sigma_{13}\sigma_{24} \quad\text{and}\quad E(\mathcal{X}_1^4) = 3\,\sigma_{11}^2 . \qquad (2.84) $$
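The fourth-order relations (2.84) lend themselves to a Monte Carlo check; the following sketch assumes NumPy, and the matrix chosen below is an arbitrary positive definite example:

    import numpy as np

    rng = np.random.default_rng(1)
    Sigma = np.array([[1.0, 0.3, 0.2, 0.1],
                      [0.3, 1.5, 0.4, 0.2],
                      [0.2, 0.4, 2.0, 0.3],
                      [0.1, 0.2, 0.3, 1.0]])      # illustrative covariance matrix
    x = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)

    lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
    rhs = Sigma[0, 1]*Sigma[2, 3] + Sigma[3, 0]*Sigma[1, 2] + Sigma[0, 2]*Sigma[1, 3]
    print(lhs, rhs)                               # agree within sampling error

    print(np.mean(x[:, 0]**4), 3 * Sigma[0, 0]**2)   # E(X1^4) = 3 sigma11^2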
Normal and binomial distribution. The normal distribution is of general importance since it may be derived, for example, from the binomial distribution
$$ B_k(n, p) = \binom{n}{k}\, p^k\, (1-p)^{n-k} , \quad 0 \le k \le n , $$
through extrapolation to large values of n at constant p.28 As a matter of
fact the expression normal distribution originated from the idea that many
distributions can be transformed in a natural way for large n to yield the
distribution Φ(x). The transformation from the binomial distribution to the
normal distribution is properly done in two steps (see also [5, pp.210-217]):
(i) At first we make the binomial distribution comparable by shifting the
maximum towards x = 0 and adjusting the width (figures 2.18 and 2.19).
For 0 < p < 1 and q = 1 − p we define a new variable ξk in order to
replace the discrete variable k.29 The new variables, X ∗k and S∗
n =∑n
k X ∗k , are
centered and adjusted to the standard Gaussian, ϕ(x) = exp(−x2/2)/√
2π,
by making use of the expectation value, E(Sn) = np, and the standard
deviation, σ(Sn) =√npq, of the binomial distribution:
ξk =k − np√npq
; 0 ≤ k ≤ n .
We assume now an arbitrary but fixed positive constant $c$. In the range of $k$ defined by $|\xi_k| \le c$ we approximate
$$ \binom{n}{k}\, p^k\, q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\xi_k^2/2} . $$
The convergence is uniform with respect to $k$ in the range specified above.
(ii) The limit $n \to \infty$ is performed by means of the deMoivre-Laplace theorem, which proves convergence of the (centered and adjusted) distribution of the random variable $S_n^*$ towards the normal distribution $\Phi(x)$ on any finite
28This is different from the extrapolation in the previous subsection because the limit $\lim_{n\to\infty} B_k(n, \alpha/n) = \pi_k(\alpha)$ leading to the Poisson distribution was performed in the limit of vanishing $p = \alpha/n$.
29The new variable $\xi_k$ depends also on $n$, but for brevity we dispense with a second subscript.
Figure 2.19: Normalization of the binomial distribution. The figure shows a symmetric binomial distribution $B(20, \frac{1}{2})$, which is centered around $\mu = \nu = 10$ (black). The transformation yields a binomial distribution centered around the origin with unit variance, $\sigma = \sigma^2 = 1$ (red). The grey and the pink continuous curves are normal distributions $\varphi = \exp\bigl(-(x-\nu)^2/(2\sigma^2)\bigr)/\sqrt{2\pi\sigma^2}$ with the parameters $(\nu = 10, \sigma^2 = np(1-p) = 5)$ and $(\nu = 0, \sigma^2 = 1)$, respectively.
interval. For an arbitrary constant interval $]a, b]$ with $a < b$, we have
$$ \lim_{n\to\infty} P\Bigl(\frac{S_n - np}{\sqrt{npq}} \in\, ]a, b]\Bigr) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx . \qquad (2.85) $$
In the proof the definite integral $\int_a^b \varphi(x)\, dx$ is partitioned into $n$ (small) segments as in Riemannian integration: the segments still reflect the discrete distribution. In the limit $n \to \infty$ the partition becomes finer and eventually converges to the continuous function described by the integral. A comparison of figures 2.18 and 2.19 shows that the convergence is particularly effective in the symmetric case, $p = q = 0.5$, where only minor differences are observable already for $n = 20$ (see also the next subsection 2.7.7).
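The deMoivre-Laplace limit (2.85) is easy to watch at work; a sketch assuming SciPy is available, with the interval $]a, b] = \,]-1, 1]$ as an arbitrary choice:

    import numpy as np
    from scipy.stats import binom, norm

    a, b, p = -1.0, 1.0, 0.5
    for n in (20, 200, 2000):
        k = np.arange(0, n + 1)
        xi = (k - n*p) / np.sqrt(n*p*(1 - p))       # centered and scaled values
        mass = binom.pmf(k, n, p)[(xi > a) & (xi <= b)].sum()
        print(n, mass, norm.cdf(b) - norm.cdf(a))   # approaches Phi(b) - Phi(a)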
2.7.7 Central limit theorem and the law of large numbers
The central limit theorem is, in essence, a more general formulation of the convergence of the binomial distribution to the normal distribution in the limit of large $n$ just discussed, as expressed by the deMoivre-Laplace theorem (2.85). The outcome of a number of successive trials described by random variables $\mathcal{X}_j$ is summed up to yield the random variable
$$ S_n = \mathcal{X}_1 + \mathcal{X}_2 + \ldots + \mathcal{X}_n , \quad n \ge 1 . $$
The individual random variables Xj are assumed to be independent and
identically distributed. By identically distributed we mean that the variables
have a common distribution which need not be specified. In the previous
subsection we considered the binomial distribution as the outcome of succes-
sive Bernoulli trials. Here, the only assumptions concerning the distributions
P (Xj ≤ x) = Fj(x) and P (Sn ≤ x) = Fn(x) are that the means µ and the
variances σ2 of all random variables Xj are the same and finite.
The first step towards the central limit theorem is again a transformation
of variables shifting the maximum to the origin and adjusting the width of
the distribution:
$$ \mathcal{X}_j^* = \frac{\mathcal{X}_j - E(\mathcal{X}_j)}{\sigma(\mathcal{X}_j)} \quad\text{and}\quad S_n^* = \frac{S_n - E(S_n)}{\sigma(S_n)} = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \mathcal{X}_j^* . \qquad (2.86) $$
The values for the individual first and second moments are $E(\mathcal{X}_j) = \mu$, $E(S_n) = n\mu$, $\sigma(\mathcal{X}_j) = \sigma$, and $\sigma(S_n) = \sqrt{n}\,\sigma$. The transformation is in full analogy to the one performed with the random variable $S_n$ of the binomial distribution in the previous subsection, and eventually we obtain:
$$ E(\mathcal{X}_j^*) = 0 , \quad \sigma^2(\mathcal{X}_j^*) = 1 , \quad E(S_n^*) = 0 , \quad \sigma^2(S_n^*) = 1 . \qquad (2.87) $$
If $I$ is the finite interval $]a, b]$, then $F(I) = F(b) - F(a)$ for any distribution function $F$, and we can write the deMoivre-Laplace theorem in the compact form
$$ \lim_{n\to\infty} F_n(I) = \Phi(I) . \qquad (2.88) $$
Under the generalized conditions the central limit theorem states that for any interval $]a, b]$ with $a < b$ the limit
$$ \lim_{n\to\infty} P\Bigl(\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \in\, ]a, b]\Bigr) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx \qquad (2.89) $$
is fulfilled.
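That the common distribution of the $\mathcal{X}_j$ need not be specified can be illustrated with a decidedly non-normal choice; the sketch below (assuming NumPy; exponential random variables with $\mu = \sigma = 1$) shows the interval probability approaching the normal value:

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(2)
    mu = sigma = 1.0
    a, b = -1.0, 1.0
    for n in (4, 40, 400):
        sums = rng.exponential(mu, size=(20_000, n)).sum(axis=1)
        z = (sums - n*mu) / (sqrt(n) * sigma)        # standardized sums S_n^*
        est = np.mean((z > a) & (z <= b))
        exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))
        print(n, est, exact)                         # approaches Phi(b) - Phi(a)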
A proof of the central limit theorem makes use of the characteristic function of the unit normal distribution ($\nu = 0$, $\sigma^2 = 1$):
$$ \phi(s) = \exp\Bigl(i \nu s - \frac{1}{2}\, \sigma^2 s^2\Bigr) = e^{-s^2/2} . \qquad (2.90) $$
Assume that for every $s$ the characteristic function of $S_n^*$ converges to the characteristic function $\phi(s)$,
$$ \lim_{n\to\infty} \phi_n(s) = \phi(s) = e^{-s^2/2} . $$
Since the $\phi_n(s)$ are the characteristic functions associated with the distribution functions $F_n(x)$, it follows for every $x$ that
$$ \lim_{n\to\infty} F_n(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du , \qquad (2.91) $$
and in particular the deMoivre-Laplace theorem follows as a special case.
Characteristic functions $h(s)$ of random variables $\mathcal{X}$ with mean zero, $\nu = 0$, and variance one, $\sigma^2 = 1$, have the Taylor expansion
$$ h(s) = 1 - \frac{s^2}{2}\bigl(1 + \varepsilon(s)\bigr) \quad\text{with}\quad \lim_{s\to 0} \varepsilon(s) = 0 $$
at $s = 0$ and truncation after the second term. The proof is straightforward and starts from the full Taylor expansion up to the second term:
$$ h(s) = h(0) + h'(0)\, s + \frac{h''(0)}{2}\, s^2 \bigl(1 + \varepsilon(s)\bigr) . $$
From $h(s) = E(e^{is\mathcal{X}})$ follows by differentiation
$$ h'(s) = E\bigl(i\mathcal{X}\, e^{is\mathcal{X}}\bigr) \quad\text{and}\quad h''(s) = E\bigl(-\mathcal{X}^2\, e^{is\mathcal{X}}\bigr) , $$
and hence $h'(0) = i\,E(\mathcal{X}) = 0$ and $h''(0) = -E(\mathcal{X}^2) = -1$, yielding the equation given above. Next we consider the characteristic function of $S_n^*$:
$$ E\bigl(\exp(i s S_n^*)\bigr) = E\Bigl(\exp\Bigl(i s \Bigl(\sum_{j=1}^{n} \mathcal{X}_j^*\Bigr)\Big/\sqrt{n}\Bigr)\Bigr) . $$
The right-hand side of the equation can be factorized and yields
$$ E\Bigl(e^{i s (\sum_{j=1}^{n} \mathcal{X}_j^*)/\sqrt{n}}\Bigr) = E\Bigl(e^{i s \mathcal{X}_j^*/\sqrt{n}}\Bigr)^n = h\Bigl(\frac{s}{\sqrt{n}}\Bigr)^n , $$
where $h(s)$ is the characteristic function of the random variable $\mathcal{X}_j^*$. Insertion
where h(s) is the characteristic function of the random variable Xj. Insertion
into the expression for the Taylor series yields now
h( s√
n
)= 1 − s2
2n
(1 + ε
( s√n
)).
Herein $s$ is fixed and $n$ approaches infinity:
$$ \lim_{n\to\infty} E\bigl(e^{i s S_n^*}\bigr) = \lim_{n\to\infty} \Bigl(1 - \frac{s^2}{2n}\Bigl(1 + \varepsilon\Bigl(\frac{s}{\sqrt{n}}\Bigr)\Bigr)\Bigr)^n = e^{-s^2/2} . \qquad (2.92) $$
For taking the limit in the last step of the derivation we recall the summation of infinite series:
$$ \lim_{n\to\infty} \Bigl(1 - \frac{\alpha_n}{n}\Bigr)^n = e^{-\alpha} \quad\text{for}\quad \lim_{n\to\infty} \alpha_n = \alpha . \qquad (2.93) $$
This is a stronger result than the convergence of the conventional exponential series, $\lim_{n\to\infty}(1 - \alpha/n)^n = e^{-\alpha}$. Thus we have shown that the characteristic function of the normalized sum of random variables, $S_n^*$, converges to the characteristic function of the unit normal distribution; therefore, by equation (2.91), the distribution $F_n(x)$ converges to the unit normal distribution $\Phi(x)$, and the validity of (2.88) follows straightforwardly.
Summarizing the results of this part we conclude that the distribution of the sum $S_n$ of every sequence of $n$ independent random variables $\mathcal{X}_j$ with finite variance converges to the normal distribution for $n \to \infty$. This convergence is independent of the particular distribution of the random variables, provided all variables follow the same distribution and have finite mean $\mu$ and variance $\sigma^2$.
In contrast to the rigorous mathematical derivation, simple practical applications used in the large sample theory of statistics turn the limit theorem encapsulated in equation (2.92) into a rough approximation,
$$ P\bigl(\sigma\sqrt{n}\, x_1 < S_n - nm < \sigma\sqrt{n}\, x_2\bigr) \approx \Phi(x_2) - \Phi(x_1) , \qquad (2.94) $$
or, for the spread around the sample mean $m$, by setting $x_1 = -x_2$,
$$ P\bigl(|S_n - nm| < \sigma\sqrt{n}\, x\bigr) \approx 2\Phi(x) - 1 . \qquad (2.94') $$
For practical purposes equation (2.94) was used in pre-computer times together with extensive tabulations of the functions $\Phi(x)$ and $\Phi^{-1}(x)$, which are still found in statistics textbooks.
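What the tabulations once provided is a single function call today; a sketch assuming SciPy:

    from scipy.stats import norm

    print(norm.cdf(1.96) - norm.cdf(-1.96))   # ~ 0.95, cf. eq. (2.94) with x1 = -x2
    print(norm.ppf(0.975))                    # ~ 1.96, the inverse Phi^{-1}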
The law of large numbers can be derived as a straightforward consequence of the central limit theorem (2.92) [5, pp.227-233]. For any fixed but arbitrary constant $c > 0$ we have
$$ \lim_{n\to\infty} P\Bigl(\Bigl|\frac{S_n}{n} - \mu\Bigr| < c\Bigr) = 1 . \qquad (2.95) $$
Related to and a consequence of equation (2.95) is Chebyshev's inequality, named after the Russian mathematician Pafnuty Chebyshev, for random variables $\mathcal{X}$ that have a finite second moment:
$$ P(|\mathcal{X}| \ge c) \le \frac{E(\mathcal{X}^2)}{c^2} \qquad (2.96) $$
is fulfilled for any constant $c$. We dispense here with a proof, which is found in [5, pp.228-233].
Equation (2.95) can be extended to a sequence of independent random variables $\mathcal{X}_j$ with different expectation values and variances, $E(\mathcal{X}_j) = \mu^{(j)}$ and $\sigma^2(\mathcal{X}_j) = \sigma_j^2$, with the restriction that there exists a constant $\Sigma^2 < \infty$ such that $\sigma_j^2 \le \Sigma^2$ is fulfilled for all $\mathcal{X}_j$. Then we have for each $c > 0$:
$$ \lim_{n\to\infty} P\Bigl(\Bigl|\frac{\mathcal{X}_1 + \ldots + \mathcal{X}_n}{n} - \frac{\mu^{(1)} + \ldots + \mu^{(n)}}{n}\Bigr| < c\Bigr) = 1 . \qquad (2.97) $$
For the purpose of illustration we consider a Bernoulli sequence of coin tosses as described by a binomial distribution. First we rewrite equation (2.97) by introducing centered random variables $\tilde{\mathcal{X}}_j = \mathcal{X}_j - \mu^{(j)}$ (we note that these variables are not normalized with respect to variance, in contrast to $\mathcal{X}_j^* = \tilde{\mathcal{X}}_j/\sigma(\mathcal{X}_j)$) and their sum $\tilde{S}_n = \sum_{i=1}^{n} \tilde{\mathcal{X}}_i$ with
$$ E(\tilde{S}_n) = 0 \quad\text{and}\quad E\bigl(\tilde{S}_n^2\bigr) = \sigma^2(\tilde{S}_n) = \sum_{i=1}^{n} \sigma^2(\mathcal{X}_i) = \sum_{i=1}^{n} \sigma_i^2 , $$
make use of the boundedness of the variances, $\sigma_j^2 \le \Sigma^2$,
$$ E\bigl(\tilde{S}_n^2\bigr) \le n\,\Sigma^2 \quad\text{and}\quad E\Bigl(\Bigl(\frac{\tilde{S}_n}{n}\Bigr)^2\Bigr) \le \frac{\Sigma^2}{n} , $$
and find through application of Chebyshev's inequality (2.96)
$$ P\Bigl(\Bigl|\frac{\tilde{S}_n}{n}\Bigr| \ge c\Bigr) \le \frac{E\bigl((\tilde{S}_n/n)^2\bigr)}{c^2} \le \frac{\Sigma^2}{n c^2} . $$
Hence the probability $P$ above converges to zero in the limit $n \to \infty$. Second, insertion of the specific data for the Bernoulli series yields
$$ P\Bigl(\Bigl|\frac{S_n}{n} - p\Bigr| \ge c\Bigr) \le \frac{p(1-p)}{n c^2} \le \frac{1}{4 n c^2} , $$
where the last inequality results from $p(1-p) \le 1/4$ for $0 \le p \le 1$. In order to keep this error probability below $\varepsilon$, the number of trials has to exceed $n \ge 1/(4 c^2 \varepsilon)$.
Since the Chebyshev inequality provides a rather crude estimate, we present also a sharper bound that leads to the approximation
$$ P\Bigl(\Bigl|\frac{S_n - np}{\sqrt{n\, p(1-p)}}\Bigr| \ge \sqrt{\frac{n}{p(1-p)}}\; c\Bigr) \approx 2\Bigl(1 - \Phi\Bigl(\sqrt{\frac{n}{p(1-p)}}\; c\Bigr)\Bigr) . $$
Eventually we put $\eta = c\sqrt{n}/\sqrt{p(1-p)}$ and find
$$ 2\bigl(1 - \Phi(\eta)\bigr) \le \varepsilon \quad\text{or}\quad \Phi(\eta) \ge 1 - \frac{\varepsilon}{2} , $$
which is suitable for numerical evaluation.
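For a fair coin the two bounds can indeed be evaluated side by side; a sketch assuming SciPy, with the tolerance $c = 0.01$ and the error level $\varepsilon = 0.05$ as arbitrary choices:

    from math import ceil
    from scipy.stats import norm

    c, eps = 0.01, 0.05
    n_chebyshev = ceil(1 / (4 * c**2 * eps))     # crude bound: n >= 1/(4 c^2 eps)
    eta = norm.ppf(1 - eps / 2)                  # Phi(eta) >= 1 - eps/2
    n_normal = ceil(eta**2 * 0.25 / c**2)        # from eta = c sqrt(n/(p(1-p))), p = 1/2
    print(n_chebyshev, n_normal)                 # 50000 versus roughly 9604 trials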
The main message of the law of large numbers is that for a sufficiently large number of independent events the statistical errors in the sum vanish and the sample mean converges to the exact expectation value. Hence, the law of large numbers provides the basis for the assumption of convergence in mathematical statistics.
2.7.8 The Cauchy-Lorentz distribution
The Cauchy-Lorentz distribution is named after the French mathematician
Augustin Louis Cauchy and the Dutch physicist Hendrik Antoon Lorentz
and is important in mathematics and in particular in physics where it occurs
as the solution to the differential equation for forced resonance. In spec-
troscopy the Lorentz curve is used for the description of spectral lines that
are homogeneously broadened. The Cauchy density function is of the form
$$ f(x) = \frac{1}{\pi\gamma} \cdot \frac{1}{1 + \bigl(\frac{x - x_0}{\gamma}\bigr)^2} = \frac{1}{\pi} \cdot \frac{\gamma}{(x - x_0)^2 + \gamma^2} , \qquad (2.98) $$
which yields the cumulative distribution function
$$ F(x) = \frac{1}{\pi} \arctan\Bigl(\frac{x - x_0}{\gamma}\Bigr) + \frac{1}{2} . \qquad (2.99) $$
The two parameters define the position of the peak, x0, and the width of the
distribution, γ (figure 2.20). The peak height or amplitude is 1/(πγ). The
function F (x) can be inverted
$$ F^{-1}(p) = x_0 + \gamma\, \tan\Bigl(\pi\Bigl(p - \frac{1}{2}\Bigr)\Bigr) , \qquad (2.99') $$
and we obtain for the quartiles and the median the values $(x_0 - \gamma,\ x_0,\ x_0 + \gamma)$.
The characteristic function of the Cauchy distribution is given by
$$ \phi_{\mathcal{X}}(s) = E\bigl(e^{i\mathcal{X}s}\bigr) = \int_{-\infty}^{\infty} f(x)\, e^{isx}\, dx = \exp\bigl(i x_0 s - \gamma|s|\bigr) , \qquad (2.100) $$
which is the Fourier transform of the probability density; the density in turn can be recovered from
$$ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_{\mathcal{X}}(s)\, e^{-ixs}\, ds . $$
We remark that the conventional Fourier transformation differs only in the choice of a factor $2\pi$ and the sign in the exponent. The Cauchy distribution has a well defined median and a well defined mode, both given by $x_0$; quantiles can be calculated readily from (2.99'), but the mean and all higher moments do not exist because the integral $\int_{-\infty}^{\infty} x f(x)\, dx$ diverges.
Figure 2.20: The Cauchy-Lorentz probability density. The figure shows three examples of Cauchy-Lorentz densities, $f(x) = \gamma/\bigl(\pi((x - x_0)^2 + \gamma^2)\bigr)$, centered around the median $x_0 = 5$. The width of the distribution increases with $\gamma$, which was chosen to be $0.5$ (red), $1.0$ (black), and $2.0$ (blue).
2.7.9 Bimodal distributions
As the name indicates, the density function $f(x)$ of a bimodal distribution has two maxima. It arises commonly as a mixture of two unimodal distributions in the sense that the bimodally distributed random variable $\mathcal{X}$ is defined as
$$ \mathcal{X} = \begin{cases} \mathcal{Y}_1 & \text{with probability } \alpha , \\ \mathcal{Y}_2 & \text{with probability } 1 - \alpha . \end{cases} $$
Bimodal distributions commonly arise from the statistics of populations that are split into two subpopulations with sufficiently different properties. The sizes of weaver ants give rise to a bimodal distribution because of the existence of two classes of workers [57]. If the differences are too small, as in the case of the combined distribution of body heights of men and women, only a single mode is observed [58].
As an illustrative model we choose the superposition of two normal dis-
tributions with different means and variances (figure 2.21). The probability
Figure 2.21: A bimodal probability density. The figure illustrates a bimodal distribution modeled as a superposition of two normal distributions (2.101) with $\alpha = 1/2$ and different values for mean and variance, $(\nu_1 = 2, \sigma_1^2 = 1/2)$ and $(\nu_2 = 6, \sigma_2^2 = 1)$: $f(x) = \bigl(\sqrt{2}\, e^{-(x-2)^2} + e^{-(x-6)^2/2}\bigr)/(2\sqrt{2\pi})$. The upper part shows the probability density with the two modes $\hat{\mu}_1 = \nu_1 = 2$ and $\hat{\mu}_2 = \nu_2 = 6$. Median $\tilde{\mu} = 3.65685$ and mean $\mu = 4$ are situated near the density minimum between the two maxima. The lower part presents the cumulative probability distribution, $F(x) = \frac{1}{4}\bigl(2 + \operatorname{erf}(x - 2) + \operatorname{erf}\bigl(\frac{x-6}{\sqrt{2}}\bigr)\bigr)$, as well as the construction of the median. The second moments in this example are $\mu_2 = 20.75$ (raw) and $\bar{\mu}_2 = 4.75$ (centered).
density for $\alpha = 1/2$ is then of the form
$$ f(x) = \frac{1}{2\sqrt{2\pi}} \Bigl( e^{-\frac{(x-\nu_1)^2}{2\sigma_1^2}} \Big/ \sqrt{\sigma_1^2} \; + \; e^{-\frac{(x-\nu_2)^2}{2\sigma_2^2}} \Big/ \sqrt{\sigma_2^2} \Bigr) . \qquad (2.101) $$
The cumulative distribution function is readily obtained by integration. As in
the case of the normal distribution the result is not analytical but formulated
in terms of the error function, which is available only numerically through
integration:
$$ F(x) = \frac{1}{4} \Bigl( 2 + \operatorname{erf}\Bigl(\frac{x - \nu_1}{\sqrt{2\sigma_1^2}}\Bigr) + \operatorname{erf}\Bigl(\frac{x - \nu_2}{\sqrt{2\sigma_2^2}}\Bigr) \Bigr) . \qquad (2.102) $$
In the numerical example shown in figure 2.21 the distribution function shows
two distinct steps corresponding to the maxima of the density f(x).
As an exercise, the first and second moments of the bimodal distribution can readily be computed analytically. The results are:
$$ \mu_1 = \mu = \frac{1}{2}(\nu_1 + \nu_2) , \quad \bar{\mu}_1 = 0 , \quad\text{and} $$
$$ \mu_2 = \frac{1}{2}(\nu_1^2 + \nu_2^2) + \frac{1}{2}(\sigma_1^2 + \sigma_2^2) , \quad \bar{\mu}_2 = \frac{1}{4}(\nu_1 - \nu_2)^2 + \frac{1}{2}(\sigma_1^2 + \sigma_2^2) . $$
The centered second moment illustrates the two contributions to the variance of the bimodal density: it is composed of the mean of the variances of the subpopulations and a quarter of the squared difference between the two means, $(\nu_1 - \nu_2)^2/4$.
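The moment formulas can be confirmed by sampling from the mixture of figure 2.21; a sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1_000_000
    pick = rng.random(n) < 0.5                             # alpha = 1/2
    x = np.where(pick, rng.normal(2.0, np.sqrt(0.5), n),   # (nu1, sigma1^2) = (2, 1/2)
                       rng.normal(6.0, 1.0, n))            # (nu2, sigma2^2) = (6, 1)

    print(x.mean())        # ~ 4     = (nu1 + nu2)/2
    print((x**2).mean())   # ~ 20.75 = (nu1^2 + nu2^2)/2 + (sigma1^2 + sigma2^2)/2
    print(x.var())         # ~ 4.75  = (nu1 - nu2)^2/4 + (sigma1^2 + sigma2^2)/2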
In table 2.2 we have listed also several other probability densities which
are of importance for special applications. In the forthcoming chapters 4 and
5 dealing with applications we shall make use of them, in particular of the
logistic distribution that describes the stochasticity of growth following the
logistic equation.
3. Stochastic processes
Systems evolving probabilistically in time can be described and modeled in
mathematical terms by stochastic processes. More precisely, we postulate
the existence of a time dependent random variable $\mathcal{X}(t)$ or random vector $\vec{\mathcal{X}}(t)$.1 We shall distinguish the simpler discrete case,
$$ P_n(t) = P\bigl(\mathcal{X}(t) = n\bigr) \quad\text{with}\quad n \in \mathbb{N}_0 , $$
and the continuous or probability density case,
$$ dF(x, t) = f(x, t)\, dx = P\bigl(x \le \mathcal{X}(t) \le x + dx\bigr) \quad\text{with}\quad x \in \mathbb{R} . $$
In both cases an experiment, or a trajectory, is understood as a recording of the particular values of $\mathcal{X}$ at certain times:
$$ \mathcal{T} = \bigl((x_1, t_1), (x_2, t_2), (x_3, t_3), \cdots, (y_1, \tau_1), (y_2, \tau_2), \cdots\bigr) . \qquad (3.1) $$
Although it is not essential for the application of probability theory, we shall assume for the sake of clarity that the recorded values are always time ordered here, with the earliest or oldest values in the rightmost positions and the most recent value as the leftmost entry (figure 3.1):2
$$ t_1 \ge t_2 \ge t_3 \ge \cdots \ge \tau_1 \ge \tau_2 \ge \cdots . $$
A trajectory is thus a sequence of time ordered doubles $(x, t)$.
A general comment on the meaning of variables is required. So far we
have only used the vague notion of scores and not yet specified what kinds of
1At first we need not specify whether $\mathcal{X}(t)$ is a simple random variable or a random vector. Later on, when a distinction between problems of different dimensionality becomes necessary, we shall make clear in which sense $\mathcal{X}(t)$ is used (variable in one dimension or vector $\vec{\mathcal{X}}(t)$).
2It is worth noticing that the conventional time axis in drawings of stochastic processes goes in the opposite direction, from left to right.
Figure 3.1: Time order in modeling stochastic processes. Time is progressing from left to right and the most recent event is given by the rightmost recording at time $t_1$. The Chapman-Kolmogorov equation describing stochastic processes comes in two forms: (i) the forward equation predicting the future from past and present, and (ii) the backward equation that extrapolates back in time from present to past.
quantities the random variables $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}) \in \Omega$ describe or what their realizations in some measurable space, denoted by $(x, y, z) \in \mathbb{R}^n$, are. Depending on the chemical or biological model these variables can be discrete numbers of particles in ensembles of atoms or molecules or continuous concentrations, they can be the numbers of individuals in populations, or they can be positions in three-dimensional space when migration processes are considered. In the last example we shall tacitly assume that the one-dimensional variables can be replaced by vectors, $\bigl(\mathcal{X}(t), \mathcal{Y}(t), \mathcal{Z}(t)\bigr) \Longrightarrow \bigl(\vec{\mathcal{X}}(t), \vec{\mathcal{Y}}(t), \vec{\mathcal{Z}}(t)\bigr)$ or $(x, y, z) \Longrightarrow (\mathbf{x}, \mathbf{y}, \mathbf{z})$, without changing the equations; if this is not the case the differences will be stated. In the more involved models of chemical reaction-diffusion systems the variables will be functions of space and time, for example $\mathcal{X}(\mathbf{r}, t)$.
In this chapter we shall present a general formalism to describe stochastic processes and distinguish between different classes, in particular drift, diffusion, and jump processes. In essence, we shall use the notation introduced by Crispin Gardiner [59]. The introduction given here is essentially based on the two textbooks [6, 16]. A few examples of stochastic processes of general importance will be discussed in this chapter in order to illustrate the formalism. Applications are presented in the following two chapters 4 and 5.
3.1 Markov processes
A stochastic process, as we shall assume, is determined by a set of joint probability densities, the existence of which is taken for granted and which determine the system completely:3
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) . \qquad (3.2) $$
By the phrase ’the determination is complete’ we mean that no additional
information is needed to describe the progress in terms of a time ordered
series (3.1) and we shall call such a process a separable stochastic pro-
cess. Although more general processes are conceivable, they play little role
in current physics, chemistry, and biology and therefore we shall not consider
them here.
Calculation of probabilities from (3.2) by means of marginal densities (2.21) and (2.27) is straightforward. For the continuous case we obtain
$$ P(\mathcal{X}_1 = x_1 \in [a,b]) = \int_a^b dx_1 \int\!\!\int\!\!\int_{-\infty}^{\infty} dx_2\, dx_3 \cdots dx_n \cdots\; p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) , $$
and in the discrete case the result is obvious:
$$ P(\mathcal{X} = x_1) = p(x_1, *) = \sum_{x_k \ne x_1} p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n;\, \cdots) . $$
Time ordering allows us to formulate predictions of future values from the known past in terms of conditional probabilities,
$$ p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots) = \frac{p(x_1, t_1;\, x_2, t_2;\, \cdots;\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots)}{p(y_1, \tau_1;\, y_2, \tau_2;\, \cdots)} , $$
with $t_1 \ge t_2 \ge \cdots \ge \tau_1 \ge \tau_2 \ge \cdots$. In other words, we may compute $(x_1, t_1), (x_2, t_2), \cdots$ from known $(y_1, \tau_1), (y_2, \tau_2), \cdots$. Before we derive
3The joint density $p$ is defined in the same way as in equations (2.20) and (2.26) but with a slightly different notation. In describing stochastic processes we are always dealing with doubles $(x, t)$, and therefore we separate individual doubles by a semicolon: $\cdots;\, x_k, t_k;\, x_{k+1}, t_{k+1};\, \cdots$.
a general concept that allows for flexible modeling and tractable stochastic
description of processes we introduce a few common and characteristic classes
of stochastic processes.
3.1.1 Simple stochastic processes
The simplest class of stochastic processes is characterized by complete independence,
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots) = \prod_i p(x_i, t_i) , \qquad (3.3) $$
which implies that the current value $\mathcal{X}(t)$ is completely independent of its values in the past. A special case is the sequence of Bernoulli trials (see $S_n$ in chapter 2, in particular in subsections 2.2.1 and 2.7.5), where the probability densities are also independent of time, $p(x_i, t_i) = p(x_i)$, and then we have
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots) = \prod_i p(x_i) . \qquad (3.3') $$
Further simplification occurs, of course, when all trials are based on the same probability distribution (for example, if the same coin is tossed in Bernoulli trials), and then the product is replaced by $p(x)^n$.
The notion of a martingale was introduced by the French mathematician Paul Pierre Lévy, and the development of the theory of martingales is due to the American mathematician Joseph Leo Doob. The conditional mean value of the random variable $\mathcal{X}(t)$, provided $\mathcal{X}(t_0) = x_0$, is defined as
$$ E\bigl(\mathcal{X}(t) \,|\, (x_0, t_0)\bigr) \doteq \int dx\; x\, p(x, t \,|\, x_0, t_0) . $$
In a martingale the conditional mean is simply given by
$$ E\bigl(\mathcal{X}(t) \,|\, (x_0, t_0)\bigr) = x_0 . \qquad (3.4) $$
The mean value at time $t$ is identical to the initial value of the process. The martingale property is rather strong, and we shall use it for several specific situations.
The somewhat relaxed notion of a semimartingale is of importance because it covers the majority of processes that are accessible to modeling by stochastic differential equations (section 3.5). A semimartingale is composed of a local martingale and a cadlag adapted process with bounded variation:
$$ \mathcal{X}(t) = \mathcal{M}(t) + \mathcal{A}(t) . $$
A local martingale is a stochastic process that satisfies the martingale property (3.4) locally, but its expectation value $\langle \mathcal{M}(t) \rangle$ may be distorted at long times by large values of low probability. Hence, every martingale is a local martingale and every bounded local martingale is a martingale. In particular, every driftless diffusion process is a local martingale but need not be a martingale. An adapted or nonanticipating process is a process that cannot see into the future. An informal interpretation [60, section II.25] would say: a stochastic process $\mathcal{X}(t)$ is adapted iff for every realization and for every time $t$, $\mathcal{X}(t)$ is known at time $t$. Cadlag stands for right-hand continuous with left limits (for the cadlag property of processes see also section 2.2.2).
More formally, the definition of an adapted process reads: for a probability space $(\Omega, \mathcal{F}, P)$ with $I$ being an index set with total order $(\le)$, for instance $I = \mathbb{N}$, $I = \mathbb{N}_0$, $I = [0, t]$, or $I = [0, +\infty)$, with $\mathcal{F}_{\cdot} = (\mathcal{F}_i)_{i \in I}$ being a filtration4 of the $\sigma$-algebra $\mathcal{F}$, $(S, \Sigma)$ being a measurable state space, and $\mathcal{X}: I \otimes \Omega \to S$ being a stochastic process, the process $\mathcal{X}$ is said to be adapted to the filtration $(\mathcal{F}_i)_{i \in I}$ if the random variable
$$ \mathcal{X}_i: \Omega \to S \ \text{is an}\ (\mathcal{F}_i, \Sigma)\text{-measurable function for each}\ i \in I . $$
The concept of adapted processes is essential for the Itô stochastic integral, which requires that the integrand be an adapted process.
4A filtration is an indexed set $S_i$ of subobjects of a given algebraic structure $S$, with the index $i$ running over some index set $I$ that is totally ordered with respect to the condition: $i \le j$, $(i, j) \in I \Rightarrow S_i \subseteq S_j$. Alternatively, in a filtered algebra there is instead the requirement that the $S_i$ are subobjects with respect to certain operations, for example vector addition, whereas other operations are compatible with the index structure, for example a multiplication that satisfies $S_i \cdot S_j \subset S_{i \oplus j}$, where the index set is the natural numbers, $I \equiv \mathbb{N}$ (see also [61]).
Another simple concept assumes that knowledge of the present alone is sufficient to predict the future. It is realized in Markov processes, named after the Russian mathematician Andrey Markov, and can be formulated easily in terms of conditional probabilities:
$$ p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1;\, y_2, \tau_2;\, \cdots) = p(x_1, t_1;\, x_2, t_2;\, \cdots \,|\, y_1, \tau_1) . \qquad (3.5) $$
In essence, the Markov condition expresses more precisely the assumptions of Albert Einstein and Marian von Smoluchowski in their derivation of the diffusion process. In particular, we have
$$ p(x_1, t_1;\, x_2, t_2;\, y_1, \tau_1) = p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, y_1, \tau_1)\; p(y_1, \tau_1) . $$
As we have seen in section 2.4, any arbitrary joint probability can be simply expressed as a product of conditional probabilities:
$$ p(x_1, t_1;\, x_2, t_2;\, x_3, t_3;\, \cdots;\, x_n, t_n) = p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, x_3, t_3) \cdots p(x_{n-1}, t_{n-1} \,|\, x_n, t_n)\; p(x_n, t_n) \qquad (3.5') $$
under the assumption of time ordering $t_1 \ge t_2 \ge t_3 \ge \ldots \ge t_{n-1} \ge t_n$.
3.1.2 The Chapman-Kolmogorov equation
From joint probabilities it also follows that summation over all mutually exclusive events of one kind eliminates the corresponding variable:
$$ \sum_{B} P(A \cap B \cap C) = P(A \cap C) . $$
By the same token we find
$$ p(x_1, t_1) = \int dx_2\; p(x_1, t_1;\, x_2, t_2) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2) . $$
Extension to three events leads to
$$ p(x_1, t_1 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1;\, x_2, t_2 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2;\, x_3, t_3)\; p(x_2, t_2 \,|\, x_3, t_3) . $$
For $t_1 \ge t_2 \ge t_3$ and making use of the Markov assumption we obtain the Chapman-Kolmogorov equation, which is named after the British geophysicist and mathematician Sydney Chapman and the Russian mathematician Andrey Kolmogorov:
$$ p(x_1, t_1 \,|\, x_3, t_3) = \int dx_2\; p(x_1, t_1 \,|\, x_2, t_2)\; p(x_2, t_2 \,|\, x_3, t_3) . \qquad (3.6) $$
In case we are dealing with a discrete random variable $\mathcal{N} \in \mathbb{N}_0$ defined on the integers, we replace the integral by a sum and obtain
$$ P(n_1, t_1 \,|\, n_3, t_3) = \sum_{n_2} P(n_1, t_1 \,|\, n_2, t_2)\; P(n_2, t_2 \,|\, n_3, t_3) . \qquad (3.7) $$
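For a time-homogeneous chain on finitely many states, equation (3.7) reduces to matrix multiplication: the two-step transition probabilities are the square of the one-step transition matrix. A sketch assuming NumPy, with an arbitrary illustrative 3-state matrix:

    import numpy as np

    P = np.array([[0.8, 0.1, 0.1],    # P[i, j]: probability of a jump i -> j per step
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])

    P2 = P @ P                        # sum over the intermediate state n2, eq. (3.7)
    print(P2)                         # transition probabilities over two steps
    print(P2.sum(axis=1))             # rows still sum to one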
The Chapman-Kolmogorov equation can be interpreted in two different ways, known as the forward and the backward equation. In the forward equation the double $(x_3, t_3)$ is considered to be fixed and $(x_1, t_1)$ represents the variable $x_1(t_1)$, with the time $t_1$ proceeding in the positive direction. The backward equation explores the past of a given situation: the double $(x_1, t_1)$ is fixed and $(x_3, t_3)$ is propagating backwards in time. The forward equation is better suited to describe actual processes, whereas the backward equation is the appropriate tool to compute the evolution towards given events, for example first passage times. In order to discuss the structure of solutions of equations (3.6) and (3.7), we shall derive the equations in differential form.
Continuity of processes. Before we can do so we need to consider a condition for the continuity of Markov processes. The process goes from position $z$ at time $t$ to position $x$ at time $t + \Delta t$. Continuity of the process implies that the probability of $x$ being finitely different from $z$ goes to zero faster than $\Delta t$ in the limit $\Delta t \to 0$:
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x - z| > \varepsilon} dx\; p(x, t + \Delta t \,|\, z, t) = 0 , \qquad (3.8) $$
and this uniformly in $z$, $t$, and $\Delta t$. In other words, the difference in probability as a function of $|x - z|$ converges sufficiently fast to zero, and thus no jumps occur in the random variable $\mathcal{X}(t)$.
Figure 3.2: Continuity in Markov processes. Continuity is illustrated by
means of two stochastic processes of the random variable X (t), the Wiener process
W(t) (3.9) and the Cauchy process C(t) (3.10). The Wiener process describes
Brownian motion and is continuous but almost nowhere differentiable. The even
more irregular Cauchy process is wildly discontinuous.
As two illustrative examples for the analysis of continuity we choose in figure 3.2 the Einstein-Smoluchowski solution of Brownian motion,5 which leads to a normally distributed probability,
$$ p(x, t + \Delta t \,|\, z, t) = \frac{1}{\sqrt{4\pi D \Delta t}} \exp\Bigl(-\frac{(x - z)^2}{4 D \Delta t}\Bigr) , \qquad (3.9) $$
and the so-called Cauchy process following the Cauchy-Lorentz distribution,
$$ p(x, t + \Delta t \,|\, z, t) = \frac{\Delta t}{\pi}\, \frac{1}{(x - z)^2 + \Delta t^2} . \qquad (3.10) $$
In the case of the Wiener process we exchange the limit and the integral, introduce $\vartheta = (\Delta t)^{-1}$, perform the limit $\vartheta \to \infty$, and have
5Later on we shall discuss this particular stochastic process in detail and call it a Wiener process.
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| > \varepsilon} dx\; \frac{1}{\sqrt{4\pi D}}\, \frac{1}{\sqrt{\Delta t}} \exp\Bigl(-\frac{(x-z)^2}{4 D \Delta t}\Bigr) = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \frac{1}{\sqrt{4\pi D}}\, \frac{1}{\sqrt{\Delta t}} \exp\Bigl(-\frac{(x-z)^2}{4 D \Delta t}\Bigr) = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\vartheta \to \infty} \frac{1}{\sqrt{4\pi D}}\, \vartheta^{3/2} \exp\Bigl(-\frac{(x-z)^2}{4D}\, \vartheta\Bigr) , \quad\text{where} $$
$$ \lim_{\vartheta \to \infty} \frac{\vartheta^{3/2}}{1 + \frac{(x-z)^2}{4D}\,\vartheta + \frac{1}{2!}\Bigl(\frac{(x-z)^2}{4D}\Bigr)^2 \vartheta^2 + \frac{1}{3!}\Bigl(\frac{(x-z)^2}{4D}\Bigr)^3 \vartheta^3 + \ldots} = 0 . $$
Since the power expansion of the exponential in the denominator increases faster than every finite power of $\vartheta$, the ratio vanishes in the limit $\vartheta \to \infty$ and the value of the integral is zero.
In the second example, the Cauchy process, we exchange limit and integral as in the case of the Wiener process and perform the limit $\Delta t \to 0$:
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| > \varepsilon} dx\; \frac{\Delta t}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = $$
$$ = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \frac{\Delta t}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = \int_{|x-z| > \varepsilon} dx \lim_{\Delta t \to 0} \frac{1}{\pi}\, \frac{1}{(x-z)^2 + \Delta t^2} = $$
$$ = \int_{|x-z| > \varepsilon} \frac{1}{\pi\, (x-z)^2}\, dx \ne 0 . $$
The last integral, $I = \int_{|x-z| > \varepsilon} dx/(x-z)^2$, takes a value of the order $I \approx 1/\varepsilon$.
Although it is continuous, the curve of Brownian motion is indeed extremely irregular, since it is nowhere differentiable (figure 3.2). The Cauchy-process curve is also irregular but, in addition, discontinuous. Both processes, as required for consistency, fulfill the relation
$$ \lim_{\Delta t \to 0} p(x, t + \Delta t \,|\, z, t) = \delta(x - z) , $$
where $\delta(\cdot)$ is the so-called delta-function.6 It is also straightforward to show that the Chapman-Kolmogorov equation is fulfilled in both cases. Note that a small but finite difference $|x - z| > \varepsilon$ is required to avoid the collapse of the distribution onto the delta-function, which would make the detection of continuity impossible.
6The delta-function is not a proper function but a generalized function or distribution. It was introduced by Paul Dirac in quantum mechanics. For more details see, for example, [62, pp.585-590] and [63, pp.38-42].
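The contrast between the two processes becomes visible in simulated trajectories built from independent increments drawn from (3.9) and (3.10); a sketch assuming NumPy, with $D = 1/2$ and step size $\Delta t = 10^{-3}$ as arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(5)
    dt, n = 1e-3, 10_000
    # Wiener-type increments: Gaussian with variance 2*D*dt (here D = 1/2)
    wiener = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
    # Cauchy increments: Cauchy-Lorentz with scale parameter dt, cf. eq. (3.10)
    cauchy = np.cumsum(dt * rng.standard_cauchy(n))

    print(np.abs(np.diff(wiener)).max())   # largest step stays of order sqrt(dt)
    print(np.abs(np.diff(cauchy)).max())   # dominated by a few jump-like steps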
Differential Chapman-Kolmogorov equation. Now we shall develop a differential version of the Chapman-Kolmogorov equation, which is based on the continuity condition just discussed. This requires a technique to divide differentiability conditions into parts corresponding either to continuous motion under generic conditions or to discontinuous motion. The partitioning is based on the following conditions for all $\varepsilon > 0$: (3.11)
(i) $\lim_{\Delta t \to 0} \frac{1}{\Delta t}\, p(x, t + \Delta t \,|\, z, t) = W(x \,|\, z, t)$, uniformly in $x$, $z$, and $t$ for $|x - z| \ge \varepsilon$;
(ii) $\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)\, p(x, t + \Delta t \,|\, z, t) = a_i(z, t) + O(\varepsilon)$, uniformly in $z$ and $t$;
(iii) $\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)(x_j - z_j)\, p(x, t + \Delta t \,|\, z, t) = B_{ij}(z, t) + O(\varepsilon)$, uniformly in $z$ and $t$;
where $x_i$, $x_j$, $z_i$, and $z_j$ refer to particular components of the vectors $x$ and $z$, respectively. In (i), $W(x \,|\, z, t)$ is the jump probability from $z$ to $x$ at time
t. It is important to notice that all higher-order coefficients of motion Cijk,
defined in analogy to ai in (ii) or Bij in (iii), must vanish by symmetry
considerations [6, p.47-48]. As an example we consider the third order term
defined by
$$ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; (x_i - z_i)(x_j - z_j)(x_k - z_k)\, p(x, t + \Delta t \,|\, z, t) = C_{ijk}(z, t) + O(\varepsilon) . $$
The function $C_{ijk}(z, t)$ is symmetric in the indices $i$, $j$, and $k$, and in order to check the consequences of this symmetry we define
$$ C(\boldsymbol{\alpha}, z, t) \equiv \sum_{i,j,k} \alpha_i \alpha_j \alpha_k\, C_{ijk}(z, t) , $$
which can be written as
$$ C_{ijk}(z, t) = \frac{1}{3!}\, \frac{\partial^3}{\partial \alpha_i\, \partial \alpha_j\, \partial \alpha_k}\, C(\boldsymbol{\alpha}, z, t) . $$
By comparison with item (iii) we find
$$ |C(\boldsymbol{\alpha}, z, t)| \le \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; \bigl|\boldsymbol{\alpha}\cdot(x - z)\bigr|\, \bigl(\boldsymbol{\alpha}\cdot(x - z)\bigr)^2\, p(x, t + \Delta t \,|\, z, t) + O(\varepsilon) $$
$$ \le |\boldsymbol{\alpha}|\, \varepsilon \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|x-z| < \varepsilon} dx\; \bigl(\boldsymbol{\alpha}\cdot(x - z)\bigr)^2\, p(x, t + \Delta t \,|\, z, t) + O(\varepsilon) $$
$$ = \varepsilon\, |\boldsymbol{\alpha}| \Bigl(\sum_{i,j} \alpha_i \alpha_j\, B_{ij}(z, t) + O(\varepsilon)\Bigr) + O(\varepsilon) = O(\varepsilon) , $$
and accordingly $C$ vanishes. It can be shown by an analogous derivation that all quantities of higher than third order are zero, too.
According to the continuity condition (3.8) a Markov process can only
have a continuous path if W (x|z, t) vanishes for all x 6= z. It is sugges-
tive therefore that this function describes discontinuous motion, whereas the
quantities ai and Bij are connected with aspects of continuous motion.
In order to derive a differential version of the Chapman-Kolmogorov equation we consider the time evolution of the expectation of a function $f(z)$ which is (at least) twice differentiable:
$$ \frac{\partial}{\partial t} \int dx\; f(x)\, p(x, t \,|\, y, t') = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigl( \int dx\; f(x) \bigl( p(x, t + \Delta t \,|\, y, t') - p(x, t \,|\, y, t') \bigr) \Bigr) = $$
$$ = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigl\{ \int\!\!\int dx\, dz\; f(x)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') - \int dz\; f(z)\, p(z, t \,|\, y, t') \Bigr\} , $$
where we have used the Chapman-Kolmogorov equation in the positive term in order to produce the $\int dz$ expression. In the negative term we made use of the fact that $\int dx\; p(x, t + \Delta t \,|\, z, t) = 1$, because $p(x, t + \Delta t \,|\, z, t)$ is a conditional probability. The derivation of the expression is visualized much more easily by considering the association of variables with times: $x \leftrightarrow t + \Delta t$, $z \leftrightarrow t$, and $y \leftrightarrow t'$ with $t + \Delta t > t > t'$. Then we obtain the second term just by the appropriate change of variables, $x \leftrightarrow z$.
The integral over dx is now divided into two regions, |x − z| ≥ ε and
|x− z| < ε. Since f(z) is assumed to be twice continuously differentiable, we
find by means of a Taylor expansion up to second order:
$$ f(x) = f(z) + \sum_i \frac{\partial f(z)}{\partial z_i}\, (x_i - z_i) + \sum_{i,j} \frac{1}{2}\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j}\, (x_i - z_i)(x_j - z_j) + |x - z|^2\, R(x, z) . $$
From the condition of twice continuous differentiability follows |R(x, z)| → 0
as |x−z| → 0 where R(x, z) is the remainder term after the truncation of the
Taylor series. Substitution in the partial time derivative of the expectation
value from above yields:
$$ \frac{\partial}{\partial t} \int dx\; f(x)\, p(x, t \,|\, y, t') = $$
$$ = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \biggl( \int\!\!\int_{|x-z|<\varepsilon} dx\, dz \Bigl( \sum_i (x_i - z_i)\, \frac{\partial f(z)}{\partial z_i} + \sum_{i,j} \frac{1}{2}\, (x_i - z_i)(x_j - z_j)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr) \times $$
$$ \times\; p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12a) $$
$$ +\; \int\!\!\int_{|x-z|<\varepsilon} dx\, dz\; |x - z|^2\, R(x, z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12b) $$
$$ +\; \int\!\!\int_{|x-z|<\varepsilon} dx\, dz\; f(z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; + \qquad (3.12c) $$
$$ +\; \int\!\!\int_{|x-z|\ge\varepsilon} dx\, dz\; f(x)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \; - \qquad (3.12d) $$
$$ -\; \int\!\!\int dx\, dz\; f(z)\, p(x, t + \Delta t \,|\, z, t)\, p(z, t \,|\, y, t') \biggr) . \qquad (3.12e) $$
In the last term of the equation, line (3.12e), the integral over $x$ is simply one, since $p(x, t + \Delta t \,|\, z, t)$ is a probability and the integration covers the entire sample space.
In the following we consider the individual terms separately. As we have assumed uniform convergence, we can take the limit inside the integral and obtain by means of conditions (ii) and (iii) from (3.11) for the term (3.12a):
$$ \int dz \Bigl( \sum_i a_i(z, t)\, \frac{\partial f(z)}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr)\, p(z, t \,|\, y, t') \; + \; O(\varepsilon) . $$
The next term, (3.12b), is a remainder term and vanishes in the limit $\varepsilon \to 0$ [6, p.49]:
$$ \Bigl| \frac{1}{\Delta t} \int_{|x-z|<\varepsilon} dx\; (x - z)^2\, R(x, z)\, p(x, t + \Delta t \,|\, z, t) \Bigr| \le $$
$$ \le \Bigl| \frac{1}{\Delta t} \int_{|x-z|<\varepsilon} dx\; (x - z)^2\, p(x, t + \Delta t \,|\, z, t) \Bigr| \Bigl( \max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \Bigr) \to $$
$$ \to \Bigl| \sum_{i,j} B_{ij}(z, t) + O(\varepsilon) \Bigr| \Bigl( \max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \Bigr) . $$
From the previously stated requirement of twice continuous differentiability follows $\max_{|x-z|<\varepsilon} \bigl| R(x, z) \bigr| \to 0$ as $\varepsilon \to 0$.
The remaining three terms, (3.12c), (3.12d), and (3.12e), can be combined and yield:7
$$ \int\!\!\int_{|x-z|\ge\varepsilon} dx\, dz\; f(z) \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . $$
The whole right-hand side of equation (3.12) is independent of $\varepsilon$. Thus we can take the limit $\varepsilon \to 0$ and find
$$ \frac{\partial}{\partial t} \int dz\; f(z)\, p(z, t \,|\, y, t') = \int dz \Bigl( \sum_i a_i(z, t)\, \frac{\partial f(z)}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 f(z)}{\partial z_i\, \partial z_j} \Bigr)\, p(z, t \,|\, y, t') \; + $$
$$ +\; \int dz\; f(z)\; \operatorname{PV}\!\!\int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) , $$
where $\operatorname{PV}\!\int dx$ stands for a principal value integral, for example
where –∫
dx stands for a principal value integral, for example,
limε→0
∫
|x−z|>ε
dxF (x, z) ≡ —
∫dxF (x, z) ,
7Note that we interchanged the variables x and z in the two positive terms (3.12c) and
(3.12d). This is generally admissible since the integration extends over both domains.
the principal value integral of a function $F(x, z)$. For any realistic process we can assume that this integral exists. Condition (i) of (3.11) defines $W(x \,|\, z, t)$ for $x \ne z$ only ($\varepsilon = |x - z| > 0$) and hence leaves open the possibility that it becomes infinite at $x = z$.8 In general, when $p(x, t \,|\, y, t')$ is continuous and once differentiable, the principal value integral exists. In the forthcoming part we shall dispense with spelling out the principal value integral explicitly, since singular cases like the Cauchy process, for which contour integration based on complex function theory is required, are considered only rarely.
The final step in our derivation is now integration by parts, for which we recall $\int f'(x)\, g(x)\, dx = f(x)\, g(x) - \int f(x)\, g'(x)\, dx$ from elementary calculus. Some careful computation finally yields
$$ \int dz\; f(z)\, \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = \int dz\; f(z) \Bigl( -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) \; + $$
$$ +\; \sum_{i,j} \frac{1}{2}\, \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) \; + $$
$$ +\; \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) \Bigr) \; + \; \text{surface terms} . $$
So far we have not yet specified the range of the integrals. The process under consideration is assumed to be confined to a region $R \subset \Omega$ in sample space with some surface $S$. Clearly, probabilities vanish when variables are outside $R$, and by definition we have
$$ p(x, t \,|\, z, t') = 0 \quad\text{and}\quad W(x \,|\, z, t) = 0 \quad\text{unless both}\ x, z \in R . $$
The situation with the functions $a_i(z, t)$ and $B_{ij}(z, t)$ is more subtle, since the conditions on them can lead to discontinuities: the conditional probability $p(x, t + \Delta t \,|\, z, t)$ may well change discontinuously as $z$ crosses the boundary of $R$. Such a discontinuity could, for example, result from the requirement that no transitions are allowed from outside $R$ to inside $R$ or vice versa. Integration by parts requires, as initially stated, that $a_i$ and $B_{ij}$ are once and twice differentiable, respectively. In order to avoid problems related to discontinuous behavior at the surface $S$ we may choose the function $f(z)$ to be arbitrary but non-vanishing only in a region $R' = R \setminus S \subset R$, and according to this choice of $f(z)$ the surface terms vanish necessarily.
8This is indeed the case for the Cauchy process (figure 3.2), for which $W(x \,|\, z, t) = 1/[\pi (x - z)^2]$.
Then we have for all $z$ in the interior of $R$:
$$ \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) \; + \qquad (3.13a) $$
$$ +\; \sum_{i,j} \frac{1}{2}\, \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) \; + \qquad (3.13b) $$
$$ +\; \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.13c) $$
This equation has been called the differential Chapman-Kolmogorov
equation by Crispin Gardiner [64]. Precisely, it is the forward equation
since it specifies initial conditions to lie in the past and it describes the
development of the probability density with increasing time. Later on, we
shall also discuss a backward equation.
From a mathematical puristic’s point of view it is not clear from the
derivation given here, that solutions of the differential Chapman-Kolmogorov
equation (3.13) exist or that the solutions of (3.13) are also solutions to the
Chapman-Kolmogorov equation (3.6). It is true, however, that the set of
conditional probabilities obeying equation (3.13) does generate a Markov
process in the sense that the joint probabilities produced satisfy all prob-
ability axioms. It has been shown, however, that a non-negative solution
to the differential Chapman-Kolmogorov equations exists and satisfies the
Chapman-Kolmogorov equation under certain conditions (see [65, Vol.II]):
(i) $a(x, t) = \{a_i(x, t);\ i = 1, \ldots\}$ and $B(x, t) = \{B_{ij}(x, t);\ i, j = 1, \ldots\}$ are vectors and positive semidefinite matrices9 of functions, respectively,
(ii) the $W(x \,|\, y, t)$ are non-negative quantities,
(iii) the initial condition has to satisfy $p(z, t \,|\, y, t) = \delta(y - z)$, which follows from the definition of a conditional probability density, and
(iv) appropriate boundary conditions have to be fulfilled.
The boundary conditions are very hard to specify for the full equation but
can be discussed precisely for special cases, for example in the case of the
Fokker-Planck equation [9].
9A positive definite matrix has exclusively positive eigenvalues, λk > 0 whereas a
positive semidefinite matrix has non-negative eigenvalues, λk ≥ 0.
3.2 Classes of stochastic processes

The differential Chapman-Kolmogorov equation describes three classes of components of stochastic processes, which allow us to define several important special cases. These classes refer to the three conditions (i), (ii), and (iii) discussed in (3.11) as well as to combinations derived from them. Here we shall be concerned only with the general aspect of classification. Specific examples will be discussed in the forthcoming section 3.4.
3.2.1 Jump process and master equation
For the jump process we consider the last term of the differential Chapman-Kolmogorov equation, (3.13c), and set $a_i(z, t) = 0$ and $B_{ij}(z, t) = 0$ for all $i$ and $j$. The resulting equation is known as the master equation:
$$ \frac{\partial}{\partial t}\, p(z, t \,|\, y, t') = \int dx \bigl( W(z \,|\, x, t)\, p(x, t \,|\, y, t') - W(x \,|\, z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.14) $$
Figure 3.3: Jump process. The figure shows a typical trajectory $\mathcal{J}(t)$ of a jump process as described, for example, by a master equation. The random variable $\mathcal{X}(t)$ stays constant except at certain discrete points where the jumps occur.
In order to illustrate the general process described by the master equation (3.14) we consider the evolution during a short time interval. For this goal we solve approximately to first order in $\Delta t$ and use the initial condition $p(z, t \,|\, y, t) = \delta(y - z)$ representing a sharp probability density at the initial time:10
$$ p(z, t + \Delta t \,|\, y, t) = p(z, t \,|\, y, t) + \frac{\partial}{\partial t}\, p(z, t \,|\, y, t)\, \Delta t + \ldots \approx p(z, t \,|\, y, t) + \frac{\partial}{\partial t}\, p(z, t \,|\, y, t)\, \Delta t = $$
$$ = \delta(y - z) + \Bigl( W(z \,|\, y, t) - \delta(y - z) \int dx\; W(x \,|\, y, t) \Bigr)\, \Delta t = $$
$$ = \Bigl( 1 - \int dx\; W(x \,|\, y, t)\, \Delta t \Bigr)\, \delta(y - z) \; + \; W(z \,|\, y, t)\, \Delta t . $$
In the first term, the coefficient of $\delta(y - z)$ is the (finite) probability for the particle to stay at the original position $y$, whereas the distribution of particles that have jumped is given, after normalization, by $W(z \,|\, y, t)$. A typical path $\mathcal{X}(t)$ thus will consist of constant sections, $\mathcal{X}(t) = \text{const}$, and discontinuous jumps distributed according to $W(z \,|\, y, t)$ (figure 3.3). It is worth noticing that a pure jump process occurs here even though the variable $\mathcal{X}(t)$ can take on a continuous range of values.
A highly relevant special case of the master equation is obtained when the sample space is mapped onto the space of integers, $\Omega \to \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. Then we can use conditional probabilities rather than probability densities in the master equation:
$$ \frac{\partial P(n, t \,|\, n', t')}{\partial t} = \sum_m \bigl( W(n \,|\, m, t)\, P(m, t \,|\, n', t') - W(m \,|\, n, t)\, P(n, t \,|\, n', t') \bigr) . \qquad (3.14') $$
Clearly, the process is confined to jumps since only discrete values of the random variable $\mathcal{N}(t)$ are allowed. The master equation on the even more restricted sample space $\Omega \to \mathbb{N}_0 = \{0, 1, 2, \ldots\}$ is of particular importance in chemical kinetics. The random variable $\mathcal{N}(t)$ then counts particle numbers, which are necessarily non-negative integers.
10We recall a basic property of the delta-function:∫f(x)δ(x− y) dx = f(y).
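For the prototypical jump process of chemical kinetics, the irreversible first-order decay A → B with $W(n-1 \,|\, n) = k\,n$, trajectories of the master equation can be sampled exactly by drawing exponential waiting times (often called Gillespie's direct method). A sketch assuming NumPy; the rate constant k_dec = 1 and the initial particle number n0 = 100 are illustrative choices, not taken from the text:

    import numpy as np

    rng = np.random.default_rng(6)
    k_dec, n0 = 1.0, 100                         # illustrative rate and initial number
    t, n = 0.0, n0
    times, numbers = [t], [n]
    while n > 0:
        total_rate = k_dec * n                   # total jump rate out of the state n
        t += rng.exponential(1.0 / total_rate)   # exponential waiting time
        n -= 1                                   # the only allowed jump: n -> n - 1
        times.append(t)
        numbers.append(n)

    # on average the trajectory follows n(t) = n0 * exp(-k_dec * t)
    print(times[numbers.index(50)])              # ~ ln(2)/k_dec, the half-life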
Figure 3.4: Drift and diffusion process. The figure shows a typical trajec-
tory of a drift-and-diffusion process whose probability density is described by a
Fokker-Planck equation. The sample curve D(t) is characterized by drift (red)
and diffusion (pink) of the random variable X (t). The band indicates a confidence
interval of about ν±σ that contains 68.2 % of the points (when they are normally
distributed; see subsection 2.7.6).
3.2.2 Diffusion process and Fokker-Planck equation
The Fokker-Planck equation is in a way complementary to the master equation, since the quantities $W(z \,|\, x, t)$ are assumed to be zero and hence jumps are excluded:
$$ \frac{\partial p(z, t \,|\, y, t')}{\partial t} = -\sum_i \frac{\partial}{\partial z_i} \bigl( a_i(z, t)\, p(z, t \,|\, y, t') \bigr) + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial z_i\, \partial z_j} \bigl( B_{ij}(z, t)\, p(z, t \,|\, y, t') \bigr) . \qquad (3.15) $$
The process corresponding to the Fokker-Planck equation is a diffusion pro-
cess with a(z, t) being the drift vector and B(z, t) the diffusion matrix.
As a result of the definition given in condition (iii) of (3.11), the diffusion ma-
trix is positive semi-definite and symmetric. The trajectories of the diffusion
120 Peter Schuster
process are continuous as follows directly from W (z|x, t) = 0 in condition (i)
of (3.11).
Making use of the initial condition $p(z, t \,|\, y, t) = \delta(z - y)$ and neglecting the derivatives of $a_i(z, t)$ and $B_{ij}(z, t)$ for small $\Delta t$, we find the approximation
$$ \frac{\partial p(z, t \,|\, y, t')}{\partial t} = -\sum_i a_i(z, t)\, \frac{\partial p(z, t \,|\, y, t')}{\partial z_i} + \frac{1}{2} \sum_{i,j} B_{ij}(z, t)\, \frac{\partial^2 p(z, t \,|\, y, t')}{\partial z_i\, \partial z_j} , $$
which can be solved for small $\Delta t = t - t'$, yielding in matrix form11
which can be solved for small ∆t = t− t′ and obtain in matrix form11
p(z, t+ ∆t|y, t) =1√
(2π)n ∆t
(|B(y, t)|
)−1/2×
× exp
(−1
2
(z− y− a(y, t) ∆t
)′(B(y, t)
)−1(z− y− a(y, t) ∆t
)
∆t
),
which is a normal distribution with variance-covariance matrix $\Sigma = B(y, t)\, \Delta t$ and expectation value $\boldsymbol{\nu} = y + a(y, t)\, \Delta t$. The general picture thus is a sample point moving with a systematic drift, whose velocity is $a(y, t)$, and a superimposed Gaussian fluctuation with covariance matrix $B(y, t)\, \Delta t$:
$$ y(t + \Delta t) = y(t) + a\bigl(y(t), t\bigr)\, \Delta t + \eta(t)\, \sqrt{\Delta t} , $$
with $E\bigl(\eta(t)\bigr) = 0$ and $E\bigl(\eta(t)\, \eta(t)'\bigr) = B(y, t)$. Accordingly, this picture gives
(i) trajectories which are always continuous, since $y(t + \Delta t) \to y(t)$ is fulfilled for $\Delta t \to 0$, and (ii) trajectories which are nowhere differentiable because of the $\sqrt{\Delta t}$ dependence of the fluctuations.12 An example of a typical solution of a Fokker-Planck equation is shown in figure 3.4.
¹¹ For readers not familiar with matrix formalism it is a straightforward exercise to write
down the one-dimensional solution of the problem (see also subsection 2.7.6).
¹² The role of the √∆t dependence will become clear when we discuss the Wiener process
in subsection 3.4.3.
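The short-time Gaussian propagator suggests a direct simulation recipe. The following Python sketch (not part of the original notes; the constant drift a and diffusion B are arbitrary illustrative values) propagates a single sample point according to y(t + ∆t) = y(t) + a ∆t + η √∆t with η ∼ N(0, B):

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(y, a, B, dt, rng):
    """One short-time step y -> y + a*dt + eta*sqrt(dt) with eta ~ N(0, B)."""
    eta = rng.normal(0.0, np.sqrt(B))
    return y + a * dt + eta * np.sqrt(dt)

# illustrative choice: constant drift a = 1 and diffusion B = 0.5
dt, n_steps, y = 1e-3, 1000, 0.0
for _ in range(n_steps):
    y = propagate(y, 1.0, 0.5, dt, rng)
print(y)  # fluctuates around the drift value a*t = 1.0
```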
3.2.3 Deterministic processes and Liouville’s equation
When the differential Chapman-Kolmogorov equation contains only the first
term, all others being zero, the resulting differential equation is a special case
of the Liouville equation

∂p(z, t|y, t′)/∂t = −∑_i ∂/∂z_i ( a_i(z, t) p(z, t|y, t′) ) ,   (3.16)

which is known from classical mechanics. Equation (3.16) describes a completely
deterministic motion which is a solution of the ordinary differential equation

dx(t)/dt = a(x(t), t) with x(y, t′) = y .   (3.17)

The (probabilistic) solution of the differential equation (3.16) with the initial
condition p(z, t′|y, t′) = δ(z − y) is p(z, t|y, t′) = δ(z − x(y, t)).
The proof of this assertion is obtained by direct substitution [6, p.54]:

∑_i ∂/∂z_i ( a_i(z, t) δ(z − x(y, t)) ) = ∑_i ∂/∂z_i ( a_i(x(y, t), t) δ(z − x(y, t)) ) =
= ∑_i a_i(x(y, t), t) ∂/∂z_i δ(z − x(y, t)) ,

and

∂/∂t δ(z − x(y, t)) = −∑_i ∂/∂z_i δ(z − x(y, t)) · dx_i(y, t)/dt ,

and by means of equation (3.17) the last two lines become equal.
If a particle is in a well-defined position y at time t′ it will remain on
the trajectory obtained by solving the corresponding ordinary differential
equation (ODE). Deterministic motion can therefore be visualized as an elementary
form of a Markov process, which can be formulated as a drift-diffusion
process with zero diffusion matrix.
3.3 Forward and backward equations
Equations which reproduce the time development with respect to initial vari-
ables (y, t′) of the probability density p(x, t|y, t′) are readily derived:
lim_{∆t′→0} (1/∆t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|y, t′) ) =
= lim_{∆t′→0} (1/∆t′) ∫ dz p(z, t′ + ∆t′|y, t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|z, t′ + ∆t′) ) .
Figure 3.5: Illustration of forward and backward equations. The forward
differential Chapman-Kolmogorov equation starts from an initial condition corresponding
to the sharp distribution δ(y − z); (y, t′) is fixed (blue), and the probability
density unfolds with time t ≥ t′ (black). It is well suited for the description of
actual experimental situations. The backward equation, although somewhat more
convenient and easier to handle from the mathematical point of view, is less suited
to describe typical experiments and is commonly applied to first passage time or
exit problems. Here (x, t) is held constant (blue) and the time dependence of the
probability density corresponds to samples unfolding into the past, t′ ≤ t (red).
The initial condition, δ(y − z), in this case is rather a final condition represented
by a sharp final distribution.
Thereby we used the Chapman-Kolmogorov equation in the second term and
the fact that the first term yields 1× p(x, t|y, t′ + ∆t′) on integration.
Under the usual conditions that p(x, t′|y, t′) is continuous and bounded
in x, t, and t′ for a finite range t− t′ > δ > 0 and that all relevant derivatives
exist, we may rewrite the left-hand side of the last equation as

lim_{∆t′→0} (1/∆t′) ( p(x, t|y, t′ + ∆t′) − p(x, t|y, t′) ) =
= lim_{∆t′→0} (1/∆t′) ∫ dz p(z, t′ + ∆t′|y, t′) ( p(x, t|y, t′) − p(x, t|z, t′) ) .
Similarly as in section 3.1.2 we can proceed and derive a differential version
of this class of the Chapman-Kolmogorov equation:
∂p(x, t|y, t′)/∂t′ = −∑_i a_i(y, t′) ∂p(x, t|y, t′)/∂y_i −
− (1/2) ∑_{i,j} B_ij(y, t′) ∂²p(x, t|y, t′)/(∂y_i ∂y_j) +
+ ∫ dz W(z|y, t′) ( p(x, t|y, t′) − p(x, t|z, t′) ) .   (3.18)
This equation is called the backward differential Chapman-Kolmogorov
equation, in contrast to the previously derived forward equation (3.13). In
purely mathematical terms the backward equation is (somewhat) better defined
than its forward analogue. The appropriate initial condition is

p(x, t|y, t) = δ(x − y) for all t ,

which expresses the fact that the probability density for finding the particle at
position x at time t, if it is at y at the same time, is δ(x − y); in other words,
the (classical, non-quantum-mechanical) particle can be simultaneously
at x and y if and only if x ≡ y.
The forward and the backward equations are equivalent to each other.
The basic difference concerns the set of variables which is held fixed. In the case
of the forward equation we hold (y, t′) fixed, and consequently solutions
exist for t ≥ t′, so that p(x, t|y, t) = δ(x − y) is an initial condition for the
forward equation. The backward equation has solutions for t′ ≤ t and hence
it expresses development in t′. Accordingly, p(x, t|y, t) = δ(x− y) is a final
condition rather than an initial condition.
Both differential expressions, the forward and the backward equation, are
useful in their own right. The forward equation gives more directly the values
of measurable quantities as functions of the observed time. Accordingly,
it is more commonly used in modeling experimental systems. The backward
equation finds applications in the study of first passage time or exit prob-
lems in which we search for the probability that a particle leaves a region at
a certain time.
3.4 Examples of special stochastic processes
In this section we discuss a few stochastic processes which are of special
importance and, at the same time, well understood in terms of mathematical
analysis.
3.4.1 Poisson process
The Poisson process is commonly used to model certain classes of cumulative
random events. These may be, for example, electrons arriving at an anode,
customers entering a shop, or telephone calls arriving at a switchboard. The
cumulative number of these events is denoted by the random variable N (t).
The probability of arrival is assumed to be λ per unit time, or λ·∆t
in a time interval of length ∆t. The master equation for this process is
derived from the conditional probability quantities W(·|·), which we denote
as transition probabilities:

W(n + 1|n, t) = λ and otherwise W(m|n, t) = 0 ∀ m ≠ n + 1 .   (3.19)

Accordingly, the master equation is of the form

∂P(n, t|n′, t′)/∂t = λ ( P(n − 1, t|n′, t′) − P(n, t|n′, t′) )   (3.20)

and represents a one-sided random walk with a probability λ for the walker
to step to the right within a unit time interval.
The increase in the probability to have n recorded events at time t is
proportional to the difference in the probabilities of n − 1 and n recorded
events, because of the elementary processes (n − 1 → n) and (n → n + 1)
of a single arrival, which increase or decrease the probability of n events, re-
spectively. We solve the master equation by introducing the time-dependent
characteristic function (see equations (2.59) and (2.59′)):

φ(s, t) = E(e^{isN(t)}) = ∑_n P(n, t|n′, t′) exp(isn) .
Now we differentiate φ(s, t) with respect to time and obtain by combining it
with the master equation
∂φ(s, t)/∂t = ∑_n ∂P(n, t|n′, t′)/∂t · e^{isn} =
= λ ∑_n ( P(n − 1, t|n′, t′) − P(n, t|n′, t′) ) e^{isn} =
= λ ( ∑_n P(n − 1, t|n′, t′) e^{is(n−1)} e^{is} − ∑_n P(n, t|n′, t′) e^{isn} ) =
= λ ( e^{is} − 1 ) φ(s, t) .
Since time t is the only explicit variable, it is straightforward to compute
the solution:

φ(s, t) = φ(s, 0) exp( λ (e^{is} − 1) t ) .   (3.21)
It is meaningful to assume that there are no electrons or customers at time
t = 0, which implies P (0, 0) = 1, P (n, 0) = 0 for all n 6= 0, and φ(s, 0) = 1.
We now obtain the corresponding solution

P(n, t|0, 0) = e^{−λt} (λt)ⁿ/n! = e^{−α} αⁿ/n! .   (3.22)

With α = λt this is our old friend, the Poisson distribution (2.74), which has
the expectation value E(N(t)) = λt.
In case the probability space is discrete, n ∈ N0, and the initial conditions
are simple, for example P(n, 0) = δ(n) or P(n, 0) = δ(n − m), we
shall prefer a simpler notation for the probability of particle numbers:

P_n(t) = Prob( N(t) = n ) .   (3.23)

With the transition probabilities given in (3.19) we can write the equation and
solution of the Poisson process in the following way:

dP_n(t)/dt = λ ( P_{n−1}(t) − P_n(t) ) and P_n(t) = ((λt)ⁿ/n!) e^{−λt} .   (3.24)
This notation will conveniently be used for the majority of stochastic pro-
cesses in chemistry and biology.
As said at the beginning, the Poisson process can also be viewed from a slightly
different perspective by considering the arrival times of individual independent
events as random variables T1, T2, .... We shall assume that they are
positive and follow an exponential density ϱ(a, t) = a·e^{−at} with a > 0 and
∫₀^∞ ϱ(a, t) dt = 1, and thus for each index j we have

P(T_j ≤ t) = 1 − e^{−at} and thus P(T_j > t) = e^{−at} , t ≥ 0 .

Independence of the individual events implies the validity of

P(T1 > t1, ..., T_n > t_n) = P(T1 > t1) · ... · P(T_n > t_n) = e^{−a(t1+...+t_n)} ,

which determines the joint probability distribution of the arrival times T_j.
The expectation value of the inter-arrival times is simply given by E(T_j) = 1/a.
Clearly, the smaller a is, the longer the mean inter-arrival time will be, and
thus a can be addressed as the intensity of the flow. In comparison to the previous
derivation we have a ≡ λ. For S0 = 0 and n ≥ 1 we define the cumulative
random variable

S_n = T1 + ... + T_n = ∑_{j=1}^n T_j

as the waiting time until the nth arrival. The event I = {S_n ≤ t} implies
that the nth arrival has occurred before time t. The connection between the
arrival times and the cumulative number of arrived objects, N(t), is easily
established and illustrates the usefulness of the dual point of view:

P(I) = P(S_n ≤ t) = P(N(t) ≥ n) .
More precisely, N (t) is determined by the whole sequence (Tj , j ≥ 1), and
depends on the elements ω of the sample space through the individual arrival
times T_j. In fact, we can compute the number of objects exactly by

{N(t) = n} = {S_n ≤ t} − {S_{n+1} ≤ t} = {S_n ≤ t ≤ S_{n+1}} .
We may interpret this equation directly: there are exactly n arrivals in [0, t]
if and only if the arrival n occurs before t and the arrival (n+1) occurs after
t. For each value of t the probability distribution of the random variable
N(t) is given by

P(N(t) = n) = P(S_n ≤ t) − P(S_{n+1} ≤ t) , n ∈ N0 ,

where we have already used the initial condition S0 = 0. As we have shown before,
this distribution of N(t) is the Poisson distribution π(at) = π(λt) = π(α).
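As a brief numerical illustration – a minimal Python sketch that is not part of the original notes, with λ, t, and the sample size chosen arbitrarily – the dual view can be checked directly: arrival times are accumulated exponential inter-arrival times with intensity a = λ, and the resulting counts N(t) follow the Poisson distribution (3.22):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(42)
lam, t, n_runs = 2.0, 3.0, 50_000   # intensity, observation time, sample size

# simulate N(t) by accumulating exponential inter-arrival times T_j
counts = np.empty(n_runs, dtype=int)
for k in range(n_runs):
    s, n = 0.0, 0
    while True:
        s += rng.exponential(1.0 / lam)   # T_j with E(T_j) = 1/lam
        if s > t:
            break
        n += 1
    counts[k] = n

# compare with P_n(t) = (lam*t)^n e^{-lam*t} / n! from equation (3.22)
for n in range(6):
    empirical = np.mean(counts == n)
    analytic = (lam * t) ** n * exp(-lam * t) / factorial(n)
    print(n, round(float(empirical), 4), round(analytic, 4))
```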
3.4.2 Random walk in one dimension
The random walk in one dimension is now a classical and famous problem
in probability theory. A walker moves along a line and takes steps of length ℓ
to the left or to the right with equal probability. The position of the
walker is thus n ℓ with n an integer, n ∈ Z. The first problem to solve
is the computation of the probability that the walker reaches a given point
at distance n ℓ from the origin within a predefined time span. For this goal
we encapsulate the random walk in a master equation and try to find an
analytical solution.
For the master equation we have the following transition probabilities per
unit time:

W(n + 1|n, t) = W(n − 1|n, t) = ϑ and W(m|n, t) = 0 ∀ m ≠ n + 1, n − 1 .   (3.25)

Hence, the master equation describing the evolution of the probability for
the walker to be at position n ℓ at time t when he started at n′ ℓ at time t′ is

∂P(n, t|n′, t′)/∂t = ϑ ( P(n + 1, t|n′, t′) + P(n − 1, t|n′, t′) − 2 P(n, t|n′, t′) ) .   (3.26)
As for the Poisson process, the master equation can be solved by means of
the time-dependent characteristic function (see equations (2.59) and (2.59′)):

φ(s, t) = E(e^{isN(t)}) = ∑_n P(n, t|n′, t′) exp(isn) .   (3.27)
Combining (3.26) and (3.27) yields

∂φ(s, t)/∂t = ϑ ( e^{is} + e^{−is} − 2 ) φ(s, t) .
Figure 3.6: Probability distribution of the random walk. The figure
presents the conditional probabilities P (n, t|0, 0) of a random walker to be in posi-
tion n ∈ Z at time t for the initial condition to be at n = 0 at time t = t0 = 0. The
n-values of the individual curves are: n = 0 (black), n = 1 (blue), n = 2 (purple),
and n = 3 (red). Parameter choice: ϑ = 1.
Accordingly, the solution for the initial condition n′ = 0 at t′ = 0 is

φ(s, t) = φ(s, 0) exp( ϑ t (e^{is} + e^{−is} − 2) ) = exp( ϑ t (e^{is} + e^{−is} − 2) ) .
Comparison of the coefficients of the individual powers of e^{is} yields the
individual conditional probabilities:

P(n, t|0, 0) = I_n(4ϑt) e^{−2ϑt} , n ∈ Z , or
P_n(t) = I_n(4ϑt) e^{−2ϑt} , n ∈ Z for P_n(0) = δ(n) ,   (3.28)

where the pre-exponential term is written in terms of modified Bessel functions
I_k(θ) with θ = 4ϑt, which are defined by

I_k(θ) = ∑_{j=0}^∞ (θ/4)^{2j+k} / ( j! (j + k)! ) .   (3.29)
It is straightforward to calculate the first and second moments from the characteristic
function φ(s, t) by means of equation (2.73), and the result is:

E(N(t)) = n0 and σ²(N(t)) = 2ϑ (t − t0) .   (3.30)
The expectation value is constant and coincides with the starting point of
the random walk and the variance increases linearly with time.
In figure 3.6 we illustrate the probabilities Pn(t) by means of a concrete
example. The probability distribution is symmetric for a symmetric initial
condition Pn(0) = δ(n) and hence Pn(t) = P−n(t). For long times the proba-
bility density P (n, t) becomes flatter and flatter and eventually converges to
the uniform distribution over the spatial domain. In case n ∈ Z all probabil-
ities vanish: limt→∞ Pn(t) = 0 for all n.
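For a numerical check – a hedged Python sketch that is not part of the original notes, with all parameter values chosen arbitrarily – note that the Bessel convention (3.29) with argument 4ϑt coincides with the standard modified Bessel function I_n of argument 2ϑt, available as scipy.special.iv. The simulated continuous-time walk (jump events at total rate 2ϑ, each step ±1 with probability 1/2) then reproduces (3.28):

```python
import numpy as np
from scipy.special import iv   # standard modified Bessel function I_n(x)

rng = np.random.default_rng(0)
theta, t, n_runs = 1.0, 2.0, 20_000

positions = np.zeros(n_runs, dtype=int)
for k in range(n_runs):
    tau, n = 0.0, 0
    while True:
        tau += rng.exponential(1.0 / (2.0 * theta))  # waiting time at rate 2*theta
        if tau > t:
            break
        n += rng.choice((-1, 1))                     # symmetric +-1 step
    positions[k] = n

# P_n(t) = I_n(4 theta t) e^{-2 theta t} in the convention of (3.29),
# which equals the standard iv(n, 2 theta t) e^{-2 theta t}
for n in range(4):
    print(n, round(float(np.mean(positions == n)), 4),
          round(float(iv(n, 2.0 * theta * t) * np.exp(-2.0 * theta * t)), 4))
```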
In the random walk we may also discretize time in the sense that the
walker takes a step precisely every time interval τ . Then, time is a discrete
variable, t = m · τ with m ∈ N0. In addition we assume the random walk to
be symmetric in the sense that steps to the left and to the right are taken
with equal probability and find:
P(n, (m + 1)τ |n′, m′τ) = (1/2) ( P(n + 1, mτ |n′, m′τ) + P(n − 1, mτ |n′, m′τ) ) .
For small τ the continuous and the discrete process represent approximations
to each other, with t = mτ, t′ = m′τ, and ϑ = (1/2) τ⁻¹. The transition
probability per unit time, ϑ, in the master equation model corresponds to
one half of the inverse waiting time, τ⁻¹, in the discrete model. Again, it is
straightforward to apply the same generating function – with m = t/τ –
which leads to the solution

φ(s, m) = ( (1/2) (e^{is} + e^{−is}) )^m
and finally to the probability distribution

P(n, mτ |0, 0) = (1/2)^m m! ( ((m − n)/2)! ((m + n)/2)! )⁻¹ ,   (3.31)

which is also known as the Bernoulli distribution.
It is also straightforward to consider the continuous-time random walk
in the limit of continuous space. This is achieved by setting the distance
traveled to x = n ℓ and performing the limit ℓ → 0. For that purpose we can
start from the characteristic function of the distribution in x,

φ(s, t) = E(e^{isx}) = φ(ℓs, t) = exp( ϑ t (e^{iℓs} + e^{−iℓs} − 2) ) ,
and take the limit of infinitesimally small steps, ℓ → 0:

lim_{ℓ→0} exp( ϑ t (e^{iℓs} + e^{−iℓs} − 2) ) = lim_{ℓ→0} exp( ϑ t (−ℓ²s² + ...) ) = exp( −s²Dt/2 ) ,

where we used the definition D = 2 lim_{ℓ→0}(ℓ²ϑ). Since this is the characteristic
function of the normal distribution we have for the density (2.79):

p(x, t|0, 0) = (1/√(2πDt)) exp( −x²/(2Dt) ) .   (2.79)
We could also have proceeded directly from equation (3.26) and expanded
the right-hand side as a function of x up to second order in ℓ, which gives

∂p(x, t|0, 0)/∂t = (D/2) ∂²p(x, t|0, 0)/∂x² ,   (3.32)

where D again stands for 2 lim_{ℓ→0}(ℓ²ϑ). This equation will be considered in
detail in the next section 3.4.3, which deals with the Wiener process.
3.4.3 Wiener process and the diffusion problem
The Wiener process, named after the American mathematician and logician
Norbert Wiener, is fundamental in many respects. It is closely tied to Brownian
motion and white noise and describes, among other things, random fluctuations
caused by thermal motion. From the point of view of stochastic processes
the Wiener process is the solution of the Fokker-Planck equation in one random
variable, W(t), with the probability

P(W(t) ≤ w′) = ∫_{−∞}^{w′} p(w, t) dw ,

and with drift coefficient zero and diffusion coefficient D = 1. This equation
reads:

∂p(w, t|w0, t0)/∂t = (1/2) ∂²p(w, t|w0, t0)/∂w² .   (3.33)
We solve for the initial condition on the conditional probability, p(w, t0|w0, t0) =
δ(w − w0), by using the characteristic function

φ(s, t) = ∫ dw p(w, t|w0, t0) exp(isw) ,
which fulfils ∂φ(s, t)/∂t = −(1/2) s² φ(s, t), as can be shown by applying
integration by parts twice and making use of the fact that p(w, t|w0, t0), like
every probability density, has to vanish in the limits w → ±∞; the same is
true for the first partial derivative ∂p/∂w. Next we compute the characteristic
function by integration:

φ(s, t) = φ(s, t0) · exp( −(1/2) s² (t − t0) ) .   (3.34)
With the initial condition φ(s, t0) = exp(isw0) we complete the characteristic
function

φ(s, t) = exp( isw0 − (1/2) s² (t − t0) )   (3.35)

and eventually obtain the probability density through inverse Fourier
transformation:

p(w, t|w0, t0) = (1/√(2π(t − t0))) exp( −(w − w0)²/(2(t − t0)) ) .   (3.36)
The density function is a normal distribution with expectation value
E(W(t)) = w0 = ν and variance E((W(t) − w0)²) = t − t0 = σ(t)², so
that an initially sharp distribution spreads in time as illustrated in figure 3.7.
The Wiener process may be characterized by three important features:

(i) irregularity of the sample paths,
(ii) non-differentiability of the sample paths, and
(iii) independence of increments.
Although the mean value E(W(t)) is well defined and independent of time,
w0, in the sense of a martingale, the mean square E(W(t)²) becomes infinite
as t → ∞. This implies that the individual trajectories W(t) are extremely
variable and diverge after short times (see, for example, the three trajectories
of the forward equation in figure 3.5). We shall encounter such a situation
with finite mean but diverging variance also in biology in the case of multiplication
as a birth-and-death process (chapter 5): although the mean is well
defined, it loses its value in practice when the standard deviation becomes
much larger than the expectation value.
Figure 3.7: Probability density of the Wiener process. The figure shows
the conditional probability density of the Wiener process, which is identical
with the normal distribution (figure 2.17),
p(w, t|w0, t0) = exp( −(w − w0)²/(2(t − t0)) )/√(2π(t − t0)).
The values used are w0 = 5 and t − t0 = 0.01 (red), 0.5 (purple), 1.0 (violet), and
2.0 (blue). The initially sharp distribution, p(w, t0|w0, t0) = δ(w − w0), spreads
with increasing time until it becomes completely flat in the limit t → ∞.
Continuity of the sample paths of the Wiener process has already been demonstrated
in subsection 3.1.2. In order to show that the trajectories of the
Wiener process are not differentiable we consider the probability

P( |(W(t + h) − W(t))/h| > k ) = 2 ∫_{kh}^∞ dw (1/√(2πh)) exp( −w²/(2h) ) ,

which can be readily computed from the conditional probability (3.36). In
the limit h → 0 the integral becomes 1/2 and the probability is one. The result
implies that, no matter what finite k we choose, |(W(t + h) − W(t))/h| will
almost certainly be greater than this value. In other words, the derivative of
the Wiener process is infinite with probability one and the sample path
is not differentiable.
Diffusion is closely related to the Wiener process and hence it is important
to prove statistical independence of the increments of W(t). Since we are
dealing with a Markov process we can write the joint probability as

p(w_n, t_n; w_{n−1}, t_{n−1}; ...; w0, t0) = ∏_{i=0}^{n−1} p(w_{i+1}, t_{i+1}|w_i, t_i) · p(w0, t0) .

Now we express the conditional probabilities in terms of (3.36) and find

p(w_n, t_n; w_{n−1}, t_{n−1}; ...; w0, t0) = ∏_{i=0}^{n−1} [ exp( −(w_{i+1} − w_i)²/(2(t_{i+1} − t_i)) ) / √(2π(t_{i+1} − t_i)) ] · p(w0, t0) .

We simplify the notation by introducing the new variables ∆W_i ≡ W(t_i) − W(t_{i−1}) and
∆t_i ≡ t_i − t_{i−1}. The joint probability density for W(t_n) now becomes¹³

p(∆w_n; ∆w_{n−1}; ...; ∆w_1; w0) = ∏_{i=1}^n [ exp( −∆w_i²/(2∆t_i) ) / √(2π∆t_i) ] · p(w0, t0) ,

where the factorization shows independence of the variables ∆W_i of each
other and of W(t0).
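This factorization is exactly what a simulation exploits. A minimal Python sketch (not part of the original notes; all numerical parameters are arbitrary) builds Wiener paths from independent Gaussian increments ∆W_i ∼ N(0, ∆t_i) and checks the mean and variance stated below (3.36):

```python
import numpy as np

rng = np.random.default_rng(7)
t0, t, n_steps, n_paths = 0.0, 2.0, 400, 50_000
dt = (t - t0) / n_steps
w0 = 5.0

# independent Gaussian increments dW_i ~ N(0, dt); paths are cumulative sums
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = w0 + np.cumsum(dW, axis=1)

print(W[:, -1].mean())  # ~ w0 = 5.0 (the mean is constant in time)
print(W[:, -1].var())   # ~ t - t0 = 2.0, the variance of (3.36)
```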
The Wiener process is readily extended to higher dimensions. The
multivariate Wiener process, defined as

W(t) = ( W_1(t), ..., W_n(t) ) ,   (3.37)

satisfies the Fokker-Planck equation

∂p(w, t|w0, t0)/∂t = (1/2) ∑_i ∂²p(w, t|w0, t0)/∂w_i² .   (3.38)

The solution is a multivariate normal density,

p(w, t|w0, t0) = ( 2π(t − t0) )^{−n/2} exp( −(w − w0)²/(2(t − t0)) ) ,   (3.39)
with mean E(W(t)) = w0 and variance-covariance matrix

(Σ)_{ij} = E( (W_i(t) − w_{0i}) (W_j(t) − w_{0j}) ) = (t − t0) δ_{ij} ,

where all off-diagonal elements – the covariances – are zero. Hence, Wiener
processes along different Cartesian coordinates are independent.
¹³ Since we shall refer frequently to the Wiener process in the forthcoming section 3.5,
the calligraphic notation for the random variable will be replaced by W(t) and the
expectation value E(·) by ⟨·⟩ for better readability.
3.4.4 Ornstein-Uhlenbeck process
All three examples of stochastic processes discussed so far had one feature in
common: all individual trajectories diverged to +∞ (Poisson process)
or ±∞ (random walk and Wiener process) in the long-time limit, and
no stationary solutions exist. In this subsection we shall consider a general
stochastic process that allows for the approach towards a stationary solution, the
Ornstein-Uhlenbeck process, named after the Dutch-born physicists George
Uhlenbeck and Leonard Ornstein [66]. Many further examples of processes with
stationary solutions will follow in chapters 4 and 5. The Ornstein-Uhlenbeck
process is obtained through addition of a drift term to the Wiener process:

∂p(x, t|x0, 0)/∂t = ∂/∂x ( kx p(x, t|x0, 0) ) + (1/2) D ∂²p(x, t|x0, 0)/∂x² .   (3.40)
In order to solve equation (3.40) for the probability density we make use of
the characteristic function

φ(s, t) = ∫_{−∞}^∞ e^{isx} p(x, t|x0, 0) dx ,   (3.41)

which converts (3.40) into the partial differential equation

∂φ/∂t + ks ∂φ/∂s = −(1/2) Ds² φ .
For the initial condition p(x, 0|x0, 0) = δ(x − x0) one can calculate the
solution for the characteristic function:

φ(s, t) = exp( −(Ds²/(4k)) (1 − e^{−2kt}) + isx0 e^{−kt} ) .   (3.42)
The characteristic function corresponds to a normal distribution and can be
used to calculate the moments of the Ornstein-Uhlenbeck process:

E(X(t)) = µ = x0 exp(−kt) and σ²(X(t)) = µ2 = (D/(2k)) (1 − exp(−2kt)) .   (3.43)
The expectation value E(X(t)) decreases exponentially from X(0) = x0 to
lim_{t→∞} E(X(t)) = µ = 0. In other words, all trajectories start from the same
point X(0) = x0 and converge to a final distribution around the mean µ = 0.
Accordingly, the variance is initially zero, σ²(X(0)) = 0, and increases with
time until it reaches the value lim_{t→∞} σ²(X(t)) = D/(2k). This behavior is
in contrast to the previously discussed stochastic processes, in particular the
Wiener process, where the variance diverges.
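The moments (3.43) translate directly into an exact simulation rule: over a step ∆t the process is multiplied by e^{−k∆t} and receives a Gaussian kick of variance (D/2k)(1 − e^{−2k∆t}). The following Python sketch (not part of the original notes; k, D, and x0 are arbitrary illustrative values) uses this exact update:

```python
import numpy as np

rng = np.random.default_rng(3)
k, D, x0 = 1.0, 0.5, 2.0
dt, n_steps, n_paths = 0.01, 1000, 20_000

# exact Ornstein-Uhlenbeck update: Gaussian transition with
# mean x e^{-k dt} and variance (D/2k)(1 - e^{-2k dt}), cf. (3.43)
decay = np.exp(-k * dt)
step_sd = np.sqrt(D / (2.0 * k) * (1.0 - np.exp(-2.0 * k * dt)))

x = np.full(n_paths, x0)
for _ in range(n_steps):
    x = x * decay + step_sd * rng.normal(size=n_paths)

print(x.mean())  # ~ x0 e^{-k t} ~ 0 for k t = 10
print(x.var())   # ~ D/(2k) = 0.25, the stationary variance
```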
The stationary solution of the Ornstein-Uhlenbeck process, p(x), can be
readily computed from (3.40) by setting ∂p(x, t|x0, 0)/∂t = 0, which yields
the differential equation

d/dx ( kx p(x) + (1/2) D dp(x)/dx ) = 0 ,

and its solution is a Gaussian with mean x̄ = 0 and variance σ² = D/(2k):

p(x) = √(k/(πD)) exp( −kx²/D ) .   (3.44)
An explicit solution can be derived by means of stochastic differential equa-
tions (see section 3.5).
One additional point concerning the probability density and the mean of
the Ornstein-Uhlenbeck process is worth stressing: the mean of the stationary
distribution can easily be shifted by replacing kx by k(x − m) in equation (3.40),
where m = E(X) is the mean of the stationary distribution. The drift term –
the first term on the right-hand side of the equation – prevents the distribution
from becoming completely flat in the long-time limit, but the distribution still
extends from −∞ to +∞ and there are individual trajectories (of measure
zero) that diverge.
3.5 Stochastic differential equations
The idea of stochastic differential equations goes back to the French math-
ematician Paul Langevin who conceived an equation named after him that
allows for the introduction of random fluctuations into conventional differ-
ential equations [67]. The idea was to find a sufficiently simple approach
to model Brownian motion successfully. In its original form the Langevin
equation

m d²r/dt² = −γ dr/dt + ξ(t)   (3.45)

describes the motion of a Brownian particle, where r(t) and v(t) = dr/dt
are the position and velocity of the particle. Often the Langevin equation is
formulated in terms of the velocities and is then written in the more familiar
form

dv(t)/dt = −(γ/m) v(t) + (1/m) ξ(t) .   (3.45′)
The parameter γ = 6πηr is the friction coefficient according to Stokes' law,
with η being the viscosity coefficient of the medium and r the size of the
particle, m is the particle mass, and ξ(t) is a fluctuating random force. The
analogy of (3.45) to Newton's equation of motion is evident: the deterministic
force, f(x) = −(∂V/∂x) with V(x) being the potential energy, is replaced
by the friction force plus the random force ξ(t).
The forthcoming discussion of stochastic differential equations follows the
presentation by Crispin Gardiner [6, pp.77-96]. In the literature one can find
an enormous variety of more detailed treatises. We mention here only the
monograph [68] and two books that are available on the internet: [61, 69].
3.5.1 Derivation of the stochastic differential equation
Generalization of equation (3.45) yields

dx/dt = a(x, t) + b(x, t) ξ(t) ,   (3.46)
where x is the variable under consideration, a(x, t) and b(x, t) are functions
defined by the model investigated, and ξ(t) is a rapidly fluctuating term.
From the mathematical point of view we require statistical independence for
ξ(t) and ξ(t′) whenever t ≠ t′, and furthermore we assume ⟨ξ(t)⟩ = 0, since any drift
term can be absorbed into a(x, t); we cast all requirements into the condition

⟨ξ(t) ξ(t′)⟩ = δ(t − t′) .   (3.47)
This assumption has the consequence that σ2(ξ(t)
)is infinite and leads to
the idealized concept of white noise.
The differential equation (3.46) makes sense only if it can be integrated,
and hence an integral of the form

u(t) = ∫₀^t ξ(τ) dτ
exists. The assumption that u(t) is a continuous function of time has the
consequence that u(t) has the Markov property, which can be proven by
splitting the integral
u(t′) = ∫₀^t ξ(τ) dτ + ∫_t^{t′} ξ(τ′) dτ′ = lim_{ε→0} ( ∫₀^{t−ε} ξ(τ) dτ ) + ∫_t^{t′} ξ(τ′) dτ′ ,
and hence for every ε > 0 the ξ(τ) in the first integral is independent of the
ξ(τ ′) in the second integral. By continuity u(t) and u(t′)−u(t) are statistically
independent in the limit ε → 0, and further u(t′) − u(t) is independent of
all u(t′′) with t′′ < t. In other words, u(t′) is completely determined in
probabilistic terms by the value u(t) and no information on any past values
is required: u(t) is Markovian.
According to the differential Chapman-Kolmogorov equation (3.13) and
because of the continuity of u(t), it must be possible to find a Fokker-Planck
equation for the description of u(t) (see subsection 3.2.2), and we can com-
pute the drift and diffusion coefficients (with u(t) = u):

⟨ u(t + ∆t) − u | (u, t) ⟩ = ⟨ ∫_t^{t+∆t} ξ(τ) dτ ⟩ = 0 and

⟨ (u(t + ∆t) − u)² | (u, t) ⟩ = ∫_t^{t+∆t} dτ ∫_t^{t+∆t} ⟨ξ(τ)ξ(τ′)⟩ dτ′ =
= ∫_t^{t+∆t} dτ ∫_t^{t+∆t} δ(τ − τ′) dτ′ = ∆t ,
and we obtain for the drift and diffusion coefficients

A(u, t) = lim_{∆t→0} ⟨ u(t + ∆t) − u | (u, t) ⟩ / ∆t = 0 and
B(u, t) = lim_{∆t→0} ⟨ (u(t + ∆t) − u)² | (u, t) ⟩ / ∆t = 1 .   (3.48)

Accordingly, the Fokker-Planck equation we are looking for is that of the Wiener
process, and we have

∫₀^t ξ(τ) dτ = u(t) = W(t) .
Considering the consequences of equation (3.48) we are left with the paradox
that the integral of ξ(t) is W (t), which is continuous but not differentiable,
and hence the Langevin equation (3.45) and the stochastic differential equa-
tion (3.46) do not exist in strict mathematical terms. The corresponding
integral equation,
x(t) − x(0) = ∫₀^t a(x(τ), τ) dτ + ∫₀^t b(x(τ), τ) ξ(τ) dτ ,   (3.49)
however, is accessible to consistent interpretation. Eventually, we make the
relation to the Wiener process more visible by using
dW (t) ≡ W (t+ dt) − W (t) = ξ(t) dt
and obtain:
x(t) − x(0) = ∫₀^t a(x(τ), τ) dτ + ∫₀^t b(x(τ), τ) dW(τ) .   (3.49′)
The second integral is a stochastic Stieltjes integral the evaluation of which
will be discussed in the next subsection 3.5.2.
Finally, we remark that we have presented Crispin Gardiner's approach
here and assumed continuity of the function u(t). The result was that ξ(t)
follows the normal distribution. An alternative approach starts out from the
assumption of the Gaussian nature of the probability density of ξ(t). It is
definitely a matter of taste which assumption is preferred, but the requirement
of continuity seems more natural.
Figure 3.8: Stochastic integral. The time interval [t0, t] is partitioned into n
segments and an intermediate point τi is defined in each segment: ti−1 ≤ τi ≤ ti.
3.5.2 Stochastic integration
In this subsection we define the stochastic integral and present practical
recipes for integration (for more details see [70]). Let G(t) be an arbitrary
function of time and W (t) the Wiener process, then the stochastic integral
is defined as a Riemann-Stieltjes integral (2.35) of the form
I(t, t0) = ∫_{t0}^t G(τ) dW(τ) .   (3.50)
The integral is partitioned into n subintervals, which are separated by the
points ti: t0 ≤ t1 ≤ t2 ≤ · · · ≤ tn−1 ≤ t (figure 3.8). Intermediate points
are defined within the subintervals ti−1 ≤ τi ≤ ti for the evaluation of the
function G(τi) and as we shall see the value of the integral depends on the
position of the τ ’s within the subintervals.
The stochastic integral ∫_{t0}^t G(τ) dW(τ) is defined as the limit of the partial
sums

S_n = ∑_{i=1}^n G(τ_i) ( W(t_i) − W(t_{i−1}) ) ,
and it is not difficult to realize that the integral is different for different choices
of the intermediate points τ_i. As an example we consider the important case
G(τ_i) = W(τ_i):

⟨S_n⟩ = ⟨ ∑_{i=1}^n W(τ_i) ( W(t_i) − W(t_{i−1}) ) ⟩ =
= ∑_{i=1}^n ( min(τ_i, t_i) − min(τ_i, t_{i−1}) ) = ∑_{i=1}^n (τ_i − t_{i−1}) .
Next we choose the same relative intermediate position for all subintervals i,

τ_i = α t_i + (1 − α) t_{i−1} with 0 ≤ α ≤ 1 ,   (3.51)

and obtain for the sum

⟨S_n⟩ = ∑_{i=1}^n (t_i − t_{i−1}) α = (t − t0) α .

Accordingly, the mean value of the integral may adopt any value between
zero and (t − t0), depending on the choice of the position of the intermediate
points as expressed by the parameter α.
Ito stochastic integral. The most frequently used definition of the stochastic
integral is due to the Japanese mathematician Kiyoshi Ito [71, 72]. The
choice α = 0 or τ_i = t_{i−1} defines the Ito stochastic integral of a function G(t)
to be

∫_{t0}^t G(τ) dW(τ) = lim_{n→∞} ∑_{i=1}^n G(t_{i−1}) ( W(t_i) − W(t_{i−1}) ) ,   (3.52)

where the limit is taken as the mean square limit (2.32).
As an example we compute the previously discussed integral ∫_{t0}^t W(τ) dW(τ)
and find for the sum S_n, where we abbreviate W(t_i) by W_i:

S_n = ∑_{i=1}^n W_{i−1} (W_i − W_{i−1}) ≡ ∑_{i=1}^n W_{i−1} ∆W_i =
= (1/2) ∑_{i=1}^n ( (W_{i−1} + ∆W_i)² − W_{i−1}² − ∆W_i² ) =
= (1/2) ( W(t)² − W(t0)² ) − (1/2) ∑_{i=1}^n ∆W_i² ,
where the second line results from 2ab = (a + b)² − a² − b². It is now
necessary to calculate the mean square limit of the second term in the last
line of the equation. For a finite sum we have the expectation values

⟨ ∑_{i=1}^n ∆W_i² ⟩ = ∑_i ⟨(W_i − W_{i−1})²⟩ = ∑_i (t_i − t_{i−1}) = t − t0 ,   (3.53)

where the second equality results from the Gaussian nature of the probability
density (3.36): ⟨(W_i − W_j)²⟩ = ⟨W_i²⟩ − ⟨W_j²⟩ = σ²(W_i) − σ²(W_j) = t_i − t_j .¹⁴
Next we calculate the expectation of the mean square deviation in (3.53):

⟨ ( ∑_{i=1}^n (W_i − W_{i−1})² − (t − t0) )² ⟩ = ⟨ ∑_i (W_i − W_{i−1})⁴ ⟩ +
+ 2 ∑_{i<j} ⟨ (W_i − W_{i−1})² (W_j − W_{j−1})² ⟩ −
− 2 (t − t0) ∑_i ⟨ (W_i − W_{i−1})² ⟩ + (t − t0)² .
We start the evaluation with the second line and make use again of the
independence of Gaussian increments:

⟨ (W_i − W_{i−1})² (W_j − W_{j−1})² ⟩ = (t_i − t_{i−1}) (t_j − t_{j−1}) .

According to (2.84) the fourth moment of a Gaussian variable can be expressed
in terms of the variance:

⟨ (W_i − W_{i−1})⁴ ⟩ = 3 ⟨ (W_i − W_{i−1})² ⟩² = 3 (t_i − t_{i−1})² ,
¹⁴ For the derivation of this relation we used the independence of non-overlapping
increments of the Wiener process, ⟨W_j (W_i − W_j)⟩ = 0 for t_j ≤ t_i, which implies
⟨W_i W_j⟩ = ⟨W_j²⟩, together with the variance σ²(W_i) = ⟨W_i²⟩ − ⟨W_i⟩².
and insertion into the expectation value eventually yields:

⟨ ( ∑_{i=1}^n (W_i − W_{i−1})² − (t − t0) )² ⟩ =
= 2 ∑_i (t_i − t_{i−1})² + ( ∑_i (t_i − t_{i−1}) − (t − t0) ) ( ∑_j (t_j − t_{j−1}) − (t − t0) ) =
= 2 ∑_i (t_i − t_{i−1})² → 0 as n → ∞ .

Accordingly, lim_{n→∞} ∑_i (W_i − W_{i−1})² = t − t0 in the mean square limit.
Eventually, we obtain for the Ito stochastic integral of the Wiener process:

∫_{t0}^t W(τ) dW(τ) = (1/2) ( W(t)² − W(t0)² − (t − t0) ) .   (3.54)
We remark that the Ito integral differs from the conventional Riemann-Stieltjes
integral, where the term t − t0 is absent. An illustrative explanation
for this unusual behavior of the limit of the sum S_n is the fact that the quantity
|W(t + ∆t) − W(t)| is almost always of order √∆t, and hence – unlike
in ordinary integration – the terms of second order in ∆W(t) do not vanish
on taking the limit.
It is also worth noticing that the expectation value of the integral (3.54)
vanishes,

⟨ ∫_{t0}^t W(τ) dW(τ) ⟩ = (1/2) ( ⟨W(t)²⟩ − ⟨W(t0)²⟩ − (t − t0) ) = 0 ,   (3.55)

since the intermediate terms ⟨W_{i−1} ∆W_i⟩ vanish because ∆W_i and W_{i−1} are
statistically independent.
Semimartingales (subsection 3.1.1), in particular local martingales, are the
most common stochastic processes that allow for straightforward application
of Ito’s formulation of stochastic calculus.
Stratonovich stochastic integral. As already outlined, the value of a
stochastic integral depends on the particular choice of the intermediate points
τ_i. The Russian physicist and engineer Ruslan Leontevich Stratonovich [73]
and the American mathematician Donald LeRoy Fisk [74] developed simultaneously
an alternative approach to Ito's stochastic integration, which is
commonly called Stratonovich integration. The intermediate points are chosen
such that the unconventional term (t − t0) does not appear any more:
the integrand as a function of W(t) is evaluated precisely in the middle,
namely at the value (W(t_i) + W(t_{i−1}))/2, and it is straightforward to show that
the mean square limit converges to the expression for the integral in conventional
calculus:

∫_{t0}^t W(τ) dW(τ) = lim_{n→∞} ∑_{i=1}^n ( (W(t_i) + W(t_{i−1}))/2 ) ( W(t_i) − W(t_{i−1}) ) =
= (1/2) ( W(t)² − W(t0)² ) .   (3.56)
It is important to stress that a stand-alone Stratonovich integral has no
relationship to an Ito integral; in other words, there is no connection
between the two classes of integrals for an arbitrary function G(t). Only
when the stochastic differential equation to which the two integrals refer is
known can a formula be derived that relates one integral to the other.
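The difference between the two prescriptions is easy to see numerically. The following Python sketch (not part of the original notes; step number and path count are arbitrary) evaluates both partial sums for G(t) = W(t) on the same simulated paths; the Ito sum has mean zero, cf. (3.55), while the midpoint (Stratonovich) sum exceeds it by (t − t0)/2 on average:

```python
import numpy as np

rng = np.random.default_rng(11)
t, n_steps, n_paths = 1.0, 2000, 5000
dt = t / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

ito = np.sum(W[:, :-1] * dW, axis=1)                       # G at t_{i-1}
strat = np.sum(0.5 * (W[:, :-1] + W[:, 1:]) * dW, axis=1)  # midpoint value

print(ito.mean())              # ~ 0, cf. equation (3.55)
print(strat.mean())            # ~ (t - t0)/2 = 0.5
print(np.mean(strat - ito))    # ~ 0.5, the extra term in (3.54)
```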
Nonanticipating functions. The concept of a nonanticipating or adaptive
process has been discussed in subsection 3.1.1. Here we shall require
this property in order to be able to solve certain classes of Ito stochastic
integrals. The situation we are referring to requires that all functions can be
expressed as functions or functionals¹⁵ of the Wiener process W(t) by means
of a stochastic differential or integral equation of the form

x(t) − x(t0) = ∫_{t0}^t a(x(τ), τ) dτ + ∫_{t0}^t b(x(τ), τ) dW(τ) .   (3.49′)

A function G(t) is nonanticipating (with respect to t) if G(t) is probabilistically
independent of (W(s) − W(t)) for all s and t with s > t. In other
words, G(t) is independent of the behavior of the Wiener process in the future,
s > t. This is a natural and physically reasonable requirement for a solution
of equation (3.49′) because it boils down to the condition that x(t) involves
W(τ) only for τ ≤ t. Examples of important nonanticipating functions are:

(i) W(t) ,
(ii) ∫_{t0}^t F(W(τ)) dτ ,
(iii) ∫_{t0}^t F(W(τ)) dW(τ) ,
(iv) ∫_{t0}^t G(τ) dτ , when G(t) itself is nonanticipating, and
(v) ∫_{t0}^t G(τ) dW(τ) , when G(t) itself is nonanticipating.

¹⁵ A function assigns a value to the argument of the function, x0 → f(x0), whereas a
functional assigns a value to a function, f → f(x0).
Items (iii) and (v) depend on the fact that in Ito's version the stochastic
integral is defined as the limit of a sequence in which G(τ) and W(τ) are
involved exclusively for τ < t.

Three reasons make the specific discussion of nonanticipating functions
important:

1. many results can be derived that are valid only for nonanticipating functions,
2. nonanticipating functions occur naturally in situations in which causality
can be expected, in the sense that the future cannot affect the present, and
3. the definition of stochastic differential equations requires nonanticipating
functions.

In conventional calculus we never encounter situations in which the future
acts back on the present or even on the past.
Several relations are useful and required in Ito calculus:

dW(t)² = dt ,
dW(t)^{2+n} = 0 for n > 0 ,
dW(t) dt = 0 ,

∫_{t0}^t W(τ)ⁿ dW(τ) = (1/(n + 1)) ( W(t)^{n+1} − W(t0)^{n+1} ) − (n/2) ∫_{t0}^t W(τ)^{n−1} dτ ,

df(W(t), t) = ( ∂f/∂t + (1/2) ∂²f/∂W² ) dt + (∂f/∂W) dW(t) ,

⟨ ∫_{t0}^t G(τ) dW(τ) ⟩ = 0 , and

⟨ ∫_{t0}^t G(τ) dW(τ) ∫_{t0}^t H(τ) dW(τ) ⟩ = ∫_{t0}^t ⟨G(τ) H(τ)⟩ dτ .

The expressions are easier to memorize when we assign a dimension [t^{1/2}] to
W(t) and discard all terms of order t^{1+n} with n > 0.
At the end of this subsection we are left with the dilemma that the Ito
integral is mathematically and technically the most satisfactory, whereas the more
natural choice would be the Stratonovich integral, which enables the usage of
conventional calculus. In addition, the noise term ξ(t) in the Stratonovich
interpretation can be real noise with finite correlation time, whereas the idealized
white noise assumed as reference in Ito's formalism gives rise to divergence
of variances and correlations.
3.5.3 Integration of stochastic differential equations
A stochastic variable x(t) is consistent with an Ito stochastic differential
equation (SDE)

dx(t) = a(x(t), t) dt + b(x(t), t) dW(t)   (3.46′)

if for all t and t0 the integral equation (3.49′) is fulfilled. Time is ordered,

t0 < t1 < t2 < ... < t_n = t ,

and the time axis may be assumed to be split into (equal or unequal) increments,
∆t_i = t_{i+1} − t_i. We visualize a particular solution curve of the SDE
for the initial condition x(t0) = x0 by means of the discretized version

x_{i+1} = x_i + a(x_i, t_i) ∆t_i + b(x_i, t_i) ∆W_i ,   (3.49″)
wherein xi = x(ti), ∆ti = ti+1 − ti, and ∆Wi = W (ti+1)−W (ti). Figure 3.9
illustrates the partitioning of the stochastic process into a deterministic drift
Figure 3.9: Stochastic integration. The figure illustrates the Cauchy-Euler
procedure for the construction of an approximate solution of the stochastic dif-
ferential equation (3.46’). The stochastic process consists of two different compo-
nents: (i) the drift term, which is the solution of the ODE in absence of diffusion
(red; b(xi, ti) = 0) and (ii) the diffusion term representing a Wiener process W (t)
(blue; a(xi, ti) = 0). The superposition of the two terms gives the stochastic pro-
cess (black). The two lower plots show the two components in separation. The
increments of the Wiener process ∆Wi are independent or uncorrelated. An ap-
proximation to a particular solution of the stochastic process is constructed by
letting the mesh size approach zero, lim ∆t→ 0.
component, which is the discretized solution curve of the ODE obtained by
setting b(x(t), t) = 0 in equation (3.49″), and a stochastic diffusion component,
which is a random Wiener process W(t) obtained by setting
a(x(t), t) = 0 in the SDE. The increment of the Wiener process in the
stochastic term, ∆W_i, is independent of x_i provided (i) x0 is independent
of all W(t) − W(t0) for t > t0 and (ii) a(x, t) is a nonanticipating function of
t for any fixed x. Condition (i) is tantamount to the requirement that any
random initial condition must be nonanticipating.
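The Cauchy-Euler construction (3.49″) is precisely what the Euler-Maruyama scheme implements. The following Python sketch (not part of the original notes) applies it, with the Ornstein-Uhlenbeck SDE dx = −kx dt + √D dW chosen here merely as an illustrative test case:

```python
import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(a, b, x0, t0, t1, n_steps, rng):
    """Discretized solution x_{i+1} = x_i + a(x_i,t_i) dt + b(x_i,t_i) dW_i,
    cf. equation (3.49'')."""
    dt = (t1 - t0) / n_steps
    x, t, path = x0, t0, [x0]
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))   # independent Wiener increment
        x = x + a(x, t) * dt + b(x, t) * dW
        t += dt
        path.append(x)
    return np.array(path)

# illustrative test case: Ornstein-Uhlenbeck drift and diffusion
k, D = 1.0, 0.5
path = euler_maruyama(lambda x, t: -k * x, lambda x, t: np.sqrt(D),
                      x0=2.0, t0=0.0, t1=10.0, n_steps=10_000, rng=rng)
print(path[-1])   # one sample trajectory; the ensemble variance tends to D/(2k)
```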
A particular solution to equation (3.49″) is constructed by letting the
mesh size go to zero, n → ∞ implying ∆t → 0. In the construction
of an approximate solution x_i is always independent of ∆W_j for j ≥ i, as
is easily verified by inspection of (3.49″). Uniqueness of solutions refers
to individual trajectories in the sense that a particular solution is uniquely
obtained for a given sample function W(t) of the Wiener process. The
existence of a solution is defined for the whole ensemble of sample functions:
a solution of equation (3.49″) exists if – with probability one – a particular
solution exists for any choice of sample function W(t) of the Wiener process.
Existence and uniqueness of solutions to Ito stochastic differential equations
can be proven under two conditions [68, pp.100-115]: (i) the Lipschitz
condition and (ii) the growth condition. Existence and uniqueness of a
nonanticipating solution x(t) of an Ito SDE within the time interval [t0, t]
require:

(i) Lipschitz condition: there exists a κ such that
|a(x, τ) − a(y, τ)| + |b(x, τ) − b(y, τ)| ≤ κ |x − y|
for all x and y and τ ∈ [t0, t], and

(ii) growth condition: a κ exists such that for all τ ∈ [t0, t]
|a(x, τ)|² + |b(x, τ)|² ≤ κ² (1 + |x|²) .
The Lipschitz condition is almost always fulfilled for stochastic differential
equations in practice, because in essence it is a smoothness condition. The
growth condition, however, may often be violated in abstract model equations,
for example when a solution explodes to infinity. In other words, the
value of x may become infinite in a finite (random) time. We shall encounter
such situations in the applied chapters 4 and 5. As a matter of fact this is
typical model behavior, since no population or spatial variable can approach
infinity at finite times in a finite world.

Several other properties known to apply to solutions of ordinary differential
equations carry over without major modifications to SDEs: continuity in the
dependence on parameters and boundary conditions, as well as the Markov
property (for proofs we refer to [68]).
3.5.4 Changing variables in stochastic differential equations
In order to see the effect of a change of variables in Ito's stochastic differential
equations we consider an arbitrary function x(t) ⇒ f(x(t)). We start with
the simpler single-variable case and then introduce the multidimensional
situation.
Single-variable case. Making use of our previous results on nonanticipating
functions we expand df(x(t)) up to second order in dW(t):

df(x(t)) = f( x(t) + dx(t) ) − f( x(t) ) =
= f′(x(t)) dx(t) + (1/2) f″(x(t)) dx(t)² + ... =
= f′(x(t)) ( a(x(t), t) dt + b(x(t), t) dW(t) ) + (1/2) f″(x(t)) b(x(t), t)² dW(t)² ,

where all terms higher than second order have been neglected. Introducing
dW(t)² = dt into the last line of this equation we obtain Ito's formula:

df(x(t)) = ( a(x(t), t) f′(x(t)) + (1/2) b(x(t), t)² f″(x(t)) ) dt +
+ b(x(t), t) f′(x(t)) dW(t) .   (3.57)
It is worth noticing that Ito's formula and ordinary calculus lead to different
results unless f(x(t)) is linear in x(t) and thus f″(x(t)) vanishes.
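A quick numerical check of this difference (a Python sketch, not part of the original notes): for f(x) = x² and the pure diffusion dx = dW(t), Ito's formula (3.57) gives df = dt + 2W dW, so ⟨W(t)²⟩ grows like t, whereas ordinary calculus, lacking the f″ term, would predict a vanishing mean:

```python
import numpy as np

rng = np.random.default_rng(13)
t, n_steps, n_paths = 1.0, 1000, 50_000
dt = t / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)

# Ito: d(W^2) = dt + 2 W dW, hence E[W(t)^2] = t; ordinary calculus
# (d(W^2) = 2 W dW only) would give E[W(t)^2] = 0
print(np.mean(W[:, -1] ** 2))   # ~ t = 1.0, the Ito correction at work
```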
Many-variable case. The application of Ito's formalism to many dimensions
becomes, in general, quite complicated. The most straightforward simplification
is the extension of Ito calculus to the multivariate case by making
use of the rule that dW(t) is an infinitesimal of order t^{1/2}. Then one can
show that the following relations hold for an n-dimensional Wiener process
W(t) = ( W1(t), W2(t), ..., Wn(t) ):

dW_i(t) dW_j(t) = δ_ij dt ,
dW_i(t)^{2+N} = 0 , (N > 0) ,
dW_i(t) dt = 0 ,
dt^{1+N} = 0 , (N > 0) .
The first relation is a consequence of the independence of the increments of
Wiener processes along different coordinate axes, dW_i(t) and dW_j(t). Making
use of the drift vector A(x, t) and the diffusion matrix B(x, t), the multidimensional
stochastic differential equation reads

dx = A(x, t) dt + B(x, t) dW(t) .   (3.58)

Following Ito's procedure we obtain for an arbitrary well-behaved function
f(x(t)) the result

df(x) = ( ∑_i A_i(x, t) ∂f(x)/∂x_i + (1/2) ∑_{i,j} ( B(x, t)·B′(x, t) )_{ij} ∂²f(x)/(∂x_i ∂x_j) ) dt +
+ ∑_{i,j} B_ij(x, t) ∂f(x)/∂x_i dW_j(t) .   (3.59)
(3.59)
Again we observe the additional term introduced through the definition of
the Ito integral.
3.5.5 Fokker-Planck and stochastic differential equations
Next we calculate the expectation value of an arbitrary function f(x(t)) by
means of Ito's formula and begin with a single variable:

⟨df(x(t))⟩/dt = ⟨ df(x(t))/dt ⟩ = d⟨f(x(t))⟩/dt =
= ⟨ a(x(t), t) ∂f(x(t))/∂x + (1/2) b(x(t), t)² ∂²f(x(t))/∂x² ⟩ .
The stochastic variable x(t) has the conditional probability density p(x, t|x0, t0),
and hence we can compute the expectation value by integration – where
we simplify the notation to f(x) ≡ f(x(t)) and p(x, t) ≡ p(x, t|x0, t0):

d⟨f(x)⟩/dt = ∫ dx f(x) ∂p(x, t)/∂t =
= ∫ dx ( a(x, t) ∂f(x)/∂x + (1/2) b(x, t)² ∂²f(x)/∂x² ) p(x, t) .
The further derivation follows the procedure we used in the case of the
differential Chapman-Kolmogorov equation in subsection 3.1.2 – in particular
integration by parts and the neglect of surface terms – and we obtain

∫ dx f(x) ∂p(x, t)/∂t = ∫ dx f(x) ( −∂/∂x ( a(x, t) p(x, t) ) + (1/2) ∂²/∂x² ( b(x, t)² p(x, t) ) ) .
Since the choice of the function f(x) was arbitrary we can now drop it
and finally obtain a forward equation of the Fokker-Planck type:

∂p(x, t|x0, t0)/∂t = −∂/∂x ( a(x, t) p(x, t|x0, t0) ) +
+ (1/2) ∂²/∂x² ( b(x, t)² p(x, t|x0, t0) ) .   (3.60)
(3.60)
The probability density p(x, t) thus obeys an equation that is completely
equivalent to the equation for a diffusion process characterized by a drift coef-
ficient a(x, t) and a diffusion coefficient b(x, t) as derived from the Chapman-
Kolmogorov equation. Hence, Ito’s stochastic differential equation provides
indeed a local approximation to a (drift and) diffusion process in probability
space. An example comparing a change form Cartesian in polar coordinates
in an Ito stochastic differential equation and in the corresponding Fokker-
Planck equation is shown in subsubsection 3.6.3.1.
The extension to the multidimensional case based on Ito's formula (3.59)
is straightforward, and we obtain for the conditional probability density
p(x, t|x0, t0) ≡ p the Fokker-Planck equation:

∂p/∂t = −∑_i ∂/∂x_i ( A_i(x, t) p ) + (1/2) ∑_{i,j} ∂²/(∂x_i ∂x_j) ( ( B(x, t)·B′(x, t) )_{ij} p ) .   (3.61)
Here we derive one additional property which is relevant in practice. The
stochastic differential equation (3.58),

dx = A(x, t) dt + B(x, t) dW(t) ,

is mapped onto a Fokker-Planck equation that depends only on the matrix
product B·B′; accordingly, the same Fokker-Planck equation arises from
all matrices B that give rise to the same product B·B′. Thus the Fokker-Planck
equation is invariant under a replacement B ⇒ B·S where S is an orthogonal
matrix, S·S′ = I. If S fulfils the orthogonality relation it may depend
on x(t), but for the stochastic handling it has to be nonanticipating.
Now we want to prove this redundancy directly from the SDE and define
a transformed Wiener process

dV(t) = S(t) dW(t) .

The random vector V(t) is a normalized linear combination of the Gaussian
variables dW_i(t), and S(t) is nonanticipating; accordingly, dV(t) is itself
Gaussian with the same correlation matrix. Averages of the dW_i(t) to various
powers taken at different times factorize, and the same is true for the
dV_i(t). Accordingly, the infinitesimal elements dV(t) are increments of a
Wiener process: the orthogonal transformation mixes sample paths without,
however, changing the stochastic nature of the process.

Equation (3.58) can be rewritten and yields

dx = A(x, t) dt + B(x, t) S′(t)·S(t) dW(t) =
= A(x, t) dt + B(x, t) S′(t)·dV(t) =
= A(x, t) dt + B(x, t) S′(t)·dW(t) ,

since V(t) is as good a Wiener process as W(t), and both SDEs give rise
to the same Fokker-Planck equation.
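This redundancy is easily confirmed numerically. In the following Python sketch (not part of the original notes; B and the rotation angle are arbitrary choices), the increments B dW and (B·S) dW have the same covariance B·B′ dt and therefore drive statistically identical diffusion processes:

```python
import numpy as np

rng = np.random.default_rng(17)
dt, n_samples = 1e-3, 200_000

B = np.array([[1.0, 0.5],
              [0.0, 0.8]])
phi = 0.7
S = np.array([[np.cos(phi), -np.sin(phi)],   # rotation matrix: S S' = I
              [np.sin(phi),  np.cos(phi)]])

dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, 2))
inc1 = dW @ B.T          # increments B dW
inc2 = dW @ (B @ S).T    # increments (B S) dW

# both empirical covariances approach B B' (after dividing by dt)
print(np.cov(inc1.T) / dt)
print(np.cov(inc2.T) / dt)
print(B @ B.T)
```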
3.6 Fokker-Planck equations
The name Fokker-Planck equation originated from two independent works by
the Dutch physicist Adriaan Daniel Fokker on Brownian motion of electric
dipoles in a radiation field [75] and by the German physicist Max Planck
who aimed at a comprehensive theory of fluctuations [76]. Other frequently
used notations for this equation are Kolmogorov’s forward equation preferred
by mathematicians because Kolmogorov developed the rigorous basis for it
[77] or Smoluchowski equation because of Smoluchowski’s use of the equa-
tion in random motion of colloidal particles. Fokker-Planck equations are
related to stochastic differential equations in the sense that they describe the
(deterministic) time evolution of a probability distribution p(x, t|x0, t0) that
is derived from the ensemble of trajectories obtained by integration of the
stochastic differential equation with different time courses of the underlying
Wiener process W(t) (subsection 3.5.5).
The Fokker-Planck equation (3.15) is a parabolic partial differential equation¹⁶
and thus its solution requires boundary conditions in addition to
the initial conditions. The boundary conditions are determined by the nature
of the stochastic process: reflecting boundary conditions, for example,
conserve the number of particles, whereas particles disappear at the boundaries
when these are absorbing. General boundary conditions may be
much more complex than reflection or absorption, but these two simple special
cases may be used to characterize the extremes, impermeable and permeable
boundaries.
¹⁶ The classification of partial differential equations of the form

A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F = 0 ,

where u(x, y) is a function and the subscripts stand for partial differentiation, for example
u_x ≡ ∂u/∂x, makes use of the determinant of the matrix Z = (A B; B C), det(Z) = AC − B²,
and defines an elliptic PDE by the condition that Z is positive definite, a parabolic PDE by
det(Z) = 0, and a hyperbolic PDE by det(Z) < 0.
3.6.1 Probability currents and boundary conditions
For the definition of a probability current and the derivation of boundary
conditions we consider a multivariable forward Fokker-Planck equation
(where we omit the explicit statement of initial conditions):

∂p(z, t)/∂t = −∑_i ∂/∂z_i ( A_i(z, t) p(z, t) ) + (1/2) ∑_{i,j} ∂²/(∂z_i ∂z_j) ( B_ij(z, t) p(z, t) ) .   (3.62)

We rewrite this equation by introducing a vectorial flux J(z, t), denoted as
the probability current, which in components is defined as

J_i(z, t) = A_i(z, t) p(z, t) − (1/2) ∑_j ∂/∂z_j ( B_ij(z, t) p(z, t) ) ,   (3.63)

and insertion into equation (3.62) yields

∂p(z, t)/∂t + ∑_i ∂J_i(z, t)/∂z_i = 0 .   (3.64)
This equation can be interpreted as a local conservation condition. By
integration over a volume V with boundary S we obtain for the probability
that the random variable Z lies in V:

P(V, t) = Prob(Z ∈ V) = ∫_V dz p(z, t) .

The time derivative is conveniently formulated by means of the surface
integral

∂P(V, t)/∂t = −∫_S dS n_S · J(z, t) ,

where n_S is a unit vector perpendicular to the surface and pointing outward,
and n_S·J is the component of J perpendicular to the surface. The total
change of probability is given by the surface integral of the current J over
the boundary of V.
The sketch in figure 3.10 is used now to show that the surface integral over
the current J for any arbitrary surface S yields the net flow of probability
across this surface. The volume V is split into two parts, V1 and V2, and
the two volumes are separated by the surface S12. Then, V1 is enclosed by
Figure 3.10: Probability current. The figure presents a sketch which is used
to prove that the probability current (3.63) measures the flow of probability. A
total volume V = V1 + V2 with surface S = S1 + S2 is split by a surface S12 into
two parts, V1 and V2; Φ12 (red) and Φ21 (blue) are the probability fluxes from V2
to V1 and vice versa. The unit vector n defines the local direction perpendicular
to the differential surface element dS. For the calculation of the net flow of
probability, Φ = Φ12 − Φ21, see text.
S1 + S12 and V2 by S2 + S12. In order to compute the net flow of probability
we make use of the fact that the sample paths of the stochastic process are
continuous (because a process described by a Fokker-Planck equation is free
of jumps). We denote by Φ12 the probability flow crossing the boundary S12
from V2 to V1, and by Φ21 the flux going in the opposite direction from V1 to
V2. Choosing a sufficiently small time interval ∆t, the probability of crossing
the boundary S12 from V2 to V1 can be expressed by the joint probability of
being in V2 at time t and in V1 at time t + ∆t:

Φ12(t, ∆t) = ∫_{V1} dx ∫_{V2} dy p(x, t + ∆t; y, t) .
The net flow of probability from V2 to V1 is obtained from the difference
between the flows in opposite directions, Φ12 − Φ21, through division by ∆t
and taking the limit ∆t → 0:

Φ(t) = lim_{∆t→0} (1/∆t) ∫_{V1} dx ∫_{V2} dy ( p(x, t + ∆t; y, t) − p(y, t + ∆t; x, t) ) .
In the limit ∆t = 0 the integrals Φ12(t, ∆t) and Φ21(t, ∆t) vanish because

∫_{V1} dx ∫_{V2} dy p(x, t; y, t) = 0 and ∫_{V1} dx ∫_{V2} dy p(y, t; x, t) = 0 ,
since the probability of being simultaneously in both compartments is zero,
and we obtain further

Φ(t) = ∫_{V1} dx ∫_{V2} dy ( ∂p(x, τ; y, t)/∂τ − ∂p(y, τ; x, t)/∂τ )|_{τ=t} .
Now we may use the flow version of the Fokker-Planck equation (3.64) and
obtain

Φ(t) = −∫_{V1} dx ∑_i ∂J_i(x, t; V2, t)/∂x_i + ∫_{V2} dy ∑_i ∂J_i(y, t; V1, t)/∂y_i ,

where the integration over V2 or V1 is encapsulated in the definition of the
probability that applies for the flow J(x, t; V2, t), which is calculated according
to (3.63) from

p(x, t; V2, t) = ∫_{V2} dy p(x, t; y, t) .
Volume integrals are now converted into surface integrals. The integrals over
the boundaries S2 and S1 vanish (except for sets of measure zero) because
they involve probabilities p(x, t; V2, t) with x neither in V2 nor in its boundary,
and vice versa. The only non-vanishing terms are those where the integration
extends over S12, and this yields

Φ(t) = lim_{∆t→0} (1/∆t) ∫_{V1} dx ∫_{V2} dy ( p(x, t + ∆t; y, t) − p(y, t + ∆t; x, t) ) =
= ∫_{S12} dS n·( J(x, t; V2, t) + J(x, t; V1, t) ) =
= ∫_Σ dS n·J(x, t) ,   (3.65)

where Σ is some surface separating two regions and n is a unit vector pointing
from region V2 to region V1.
With the precise definition and the known properties of the probability
current we are now in the position to discuss different types of boundary
conditions.
Reflecting boundary conditions. A reflecting barrier prevents particles
from leaving the volume V, and accordingly there is zero net flow across S,
the boundary of V:

n·J(z, t) = 0 for z ∈ S and n normal to S ,   (3.66)

where J(z, t) is defined by equation (3.63). Since a particle cannot cross S it
must be reflected there, which explains the name reflecting barrier.
Absorbing boundary conditions. An absorbing barrier is defined by the
fact that every particle that reaches the boundary is instantaneously removed
from the system. In other words, since the fate of the particle outside V is not
considered, the barrier absorbs the particle and accordingly the probability
of finding a particle at the barrier is zero:

p(z, t) = 0 for z ∈ S .   (3.67)
Discontinuities at boundaries. Both classes of coefficients, A_i and B_ij,
may have discontinuities at the surface S even though the particles are supposed
to move freely across the boundary S. In order to allow for free motion, both
the probability, p(z), and the normal component of the current, n·J(z), have
to be continuous at the boundary S:

p(z)|_{S+} = p(z)|_{S−} and n·J(z)|_{S+} = n·J(z)|_{S−} ,   (3.68)

where the subscripts S+ and S− indicate the limits of the quantities taken
from the two sides of the surface. The definition of the probability current
in equation (3.63) is thus compatible with discontinuities in the derivatives
of p(z) at the surface S.
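To make the two extreme cases concrete, here is a finite-difference Python sketch (not part of the original notes; grid, time step, and D are arbitrary choices) of the one-dimensional diffusion equation with reflecting versus absorbing walls; the total probability is conserved in the first case and leaks out in the second:

```python
import numpy as np

# finite-difference sketch of dp/dt = (D/2) d^2 p/dx^2 on [0, 1]
D, nx, dt, n_steps = 1.0, 101, 1e-5, 20_000
dx = 1.0 / (nx - 1)
p0 = np.zeros(nx)
p0[nx // 2] = 1.0 / dx            # initial sharp peak in the middle

def step(p, boundary):
    q = p.copy()
    q[1:-1] += 0.5 * D * dt / dx**2 * (p[2:] - 2 * p[1:-1] + p[:-2])
    if boundary == "reflecting":   # zero current at the walls, cf. (3.66)
        q[0], q[-1] = q[1], q[-2]
    else:                          # absorbing: p = 0 at the walls, cf. (3.67)
        q[0], q[-1] = 0.0, 0.0
    return q

for bc in ("reflecting", "absorbing"):
    q = p0.copy()
    for _ in range(n_steps):
        q = step(q, bc)
    print(bc, q.sum() * dx)        # ~1 (conserved) versus <1 (absorbed)
```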
3.6.2 Fokker-Planck equation in one dimension
The general Fokker-Planck equation in one variable is of the simple form:
∂f(x, t)
∂t= − ∂
∂x
(A(x, t) f(x, t)
)+
1
2
∂2
∂x2
(B(x, t) f(x, t)
). (3.69)
So far we have applied the Fokker-Planck operator always to the conditional
probability
f(x, t) = p(x, t|x0, t0) with p(x, t0|x0, t0) = δ(x− x0) (3.70)
as initial condition. In order to allow for more general initial conditions we
need only to redefine the one time probability
p(x, t) =
∫dx0 p(x, t; x0, t0) ≡
∫dx0 p(x, t|x0, t0) p(x0, t0) , (3.71)
which is compatible with the general initial probability density
p(x, t)∣∣∣t=t0
= p(x, t0) .
In the previous section 3.5 we showed that the stochastic process described
by the conditional probability (3.70), which satisfies the Fokker-Planck equation
(3.69), is equivalent to the Ito stochastic differential equation

dx(t) = A(x(t), t) dt + √(B(x(t), t)) dW(t) .   (3.72)

In a way the two descriptions are complementary to each other. In particular,
perturbation theories derived from the Fokker-Planck equation are very
different from those based on the stochastic differential equation, but both are
suitable in their own right for specific applications.
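The correspondence between (3.69) and (3.72) can be made concrete by simulation. The following minimal Python sketch integrates an Ito SDE of the form (3.72) with the Euler-Maruyama scheme; the Ornstein-Uhlenbeck drift A(x) = −x and the constant diffusion B(x) = 1 are illustrative assumptions, not taken from the text, and the ensemble of trajectories approaches the Gaussian stationary density of this process.

import numpy as np

rng = np.random.default_rng(1)

def A(x): return -x                  # drift A(x, t) = -x (assumption)
def B(x): return 1.0                 # diffusion B(x, t) = 1 (assumption)

n_paths, n_steps, dt = 10000, 2000, 0.005
x = np.zeros(n_paths)                # sharp initial condition x0 = 0
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # Wiener increments
    x += A(x) * dt + np.sqrt(B(x)) * dW          # Euler-Maruyama step

print(x.mean(), x.var())             # approx. 0 and 1/2 (stationary Gaussian)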
Boundary conditions in one dimensional systems. General boundary
conditions have been discussed in subsection 3.6.1. Systems in one dimension
allow for the introduction of additional special boundary conditions that are
useful for certain classes of problems.
Periodic boundary conditions. An, in principle, infinite or cyclic system
is partitioned into identical intervals of finite size. Then the stochastic
process takes place on an interval [a, b], the endpoints of which are assumed
to be identical. Even for a discontinuity at the boundary we obtain:

lim_{x→b−} p(x, t) = lim_{x→a+} p(x, t) and lim_{x→b−} J(x, t) = lim_{x→a+} J(x, t) .   (3.73)
Continuous boundary conditions simply imply that the two functions A(x, t)
and B(x, t) are periodic on the same interval:

A(b, t) = A(a, t) and B(b, t) = B(a, t) , and hence
p(a, t) = p(b, t) and J(b, t) = J(a, t) ;   (3.74)

the probability and its derivatives are identical at the endpoints of the interval.
Prescribed boundary conditions. Under the assumption that the diffusion
coefficient vanishes at the boundary, B(a, t) = 0, that diffusive motion occurs
only for x > a, and further that A(x, t) and √(B(x, t)) obey the Lipschitz
condition at x = a and B(x, t) is differentiable there, we have

∂B(a, t)/∂x = 0 , the SDE dx(t) = A(x, t) dt + √(B(x, t)) dW(t)

has solutions, and the nature of the boundary conditions is exclusively determined
by the sign of A(x, t) at x = a:

(i) exit boundary: A(a, t) < 0; if a particle reaches the point x = a it will
certainly proceed out of the interval [a, b] into the open region x < a,

(ii) entrance boundary: A(a, t) > 0; if a particle reaches the point x = a it
will certainly return to the region x > a, or in other words a particle at
the right-hand side of x = a can never leave the interval, and if the particle
is introduced at x = a it will certainly enter the region x > a, and

(iii) natural boundary: A(a, t) = 0; a particle that has reached x = a would
remain there, however, it can be shown that this point cannot even be
reached from x > a and, moreover, a particle introduced there will stay,
and thus this boundary neither absorbs nor releases particles.
The Feller classification of boundaries. William Feller [78] gave very general
criteria for the classification of boundary conditions into four classes: regular,
exit, entrance, and natural. For this goal definitions of four classes of
functions are required:

(i) f(x) = exp(−2 ∫_{x0}^{x} ds A(s)/B(s)) ,

(ii) g(x) = 2/(B(x) f(x)) ,

(iii) h1(x) = f(x) ∫_{x0}^{x} g(s) ds , and

(iv) h2(x) = g(x) ∫_{x0}^{x} f(s) ds ,

wherein x0 is fixed and taken from the interval, x0 ∈ ]a, b[. Now we write L(x1, x2)
for the set of all functions which can be integrated on the interval ]x1, x2[,
and then the Feller classification is of the form:

(I) regular: if f(x) ∈ L(a, x0) and g(x) ∈ L(a, x0),

(II) exit: if g(x) ∉ L(a, x0) and h1(x) ∈ L(a, x0),

(III) entrance: if g(x) ∈ L(a, x0) and h2(x) ∈ L(a, x0), and

(IV) natural: all other cases.
The classification becomes important in the context of stationary solutions
of the one-dimensional Fokker-Planck equation. Many results concerning the
compatibility of stationary solutions with certain classes of boundaries are
self-evident and will be discussed in the next paragraph.
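The criteria are easy to check numerically for a concrete model. The Python sketch below does so for the autocatalytic reaction treated later in this subsection, with drift A(x) = ax − x² and diffusion B(x) = ax + x² in rescaled units; the closed-form antiderivative used for f(x) and the parameter values are assumptions of this example. The integral of g diverges at the boundary a = 0 while that of h1 stays finite, which is the signature of an exit boundary.

import numpy as np
from scipy.integrate import quad

a, x0 = 1.0, 1.0                     # boundary at x = 0, interior point x0

def f(x):
    # f(x) = exp(-2 int_{x0}^{x} A(s)/B(s) ds) with 2A/B = 2(a - s)/(a + s);
    # an antiderivative of 2A/B is F(s) = 4a log(a + s) - 2s
    F = lambda s: 4.0 * a * np.log(a + s) - 2.0 * s
    return np.exp(F(x0) - F(x))

def g(x):
    return 2.0 / ((a * x + x * x) * f(x))

def h1(x):
    return f(x) * quad(g, x0, x)[0]

for eps in (1e-2, 1e-4, 1e-6):
    print(quad(g, eps, x0)[0])       # grows like log(1/eps): g not in L(0, x0)
print(quad(lambda x: abs(h1(x)), 1e-8, x0, limit=200)[0])  # finite: exit boundary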
Boundaries at infinity. In principle, all kinds of boundaries can exist at
infinity, but the requirement to obtain a probability density p(x, t) that can
be normalized and is sufficiently well behaved is a severe restriction. These
requirements are

lim_{x→∞} p(x, t) = 0 and lim_{x→∞} ∂p(x, t)/∂x = 0 ,   (3.75)

where the second condition excludes cases in which the probability oscillates
infinitely fast as x → ∞. Accordingly, a nonzero current at infinity can only
occur when either A(x, t) or B(x, t) diverges in the limit x → ∞.
Only two currents at boundaries x = ±∞ are compatible with conserva-
tion of probability: (i) J(±∞, t) = 0 and (ii) J(+∞, t) = J(−∞, t) corre-
sponding to reflecting and periodic boundary conditions, respectively.
Stationary solutions for homogeneous Fokker-Planck equations. In
a homogeneous process the drift and the diffusion coefficients do not depend
on time, and hence the stationary probability density satisfies the ordinary
differential equation

d/dx (A(x) p(x)) − (1/2) d²/dx² (B(x) p(x)) = 0 or dJ(x)/dx = 0 ,   (3.76)

which evidently has the solution

J(x) = constant .   (3.76')
For a process on an interval [a, b] we therefore have J(a) = J(x) = J(b) ≡ J.
One reflecting boundary implies that the other boundary is reflecting too, and
hence the current vanishes, J = 0. For boundaries that are not reflecting, the
only case satisfying equation (3.76) is periodic boundary conditions fulfilling
(3.73). The conditions for these cases are:
Zero current condition. From J = 0 follows

A(x) p(x) − (1/2) d/dx (B(x) p(x)) = 0 ,

which can be solved by integration:

p(x) = (N/B(x)) exp(2 ∫_{a}^{x} dξ A(ξ)/B(ξ)) ,

where the normalization constant N is determined by the probability conservation
condition, ∫_{a}^{b} dx p(x) = 1.
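This integral representation translates directly into a short numeric routine. The Python sketch below evaluates the zero-current stationary density on a grid; the Ornstein-Uhlenbeck drift and the constant diffusion chosen here are illustrative assumptions.

import numpy as np
from scipy.integrate import cumulative_trapezoid

a, b = -5.0, 5.0                     # reflecting boundaries (assumption)
x = np.linspace(a, b, 2001)
A = -x                               # Ornstein-Uhlenbeck drift (assumption)
B = np.ones_like(x)                  # constant diffusion (assumption)

I = cumulative_trapezoid(2.0 * A / B, x, initial=0.0)   # 2 int_a^x A/B dxi
p = np.exp(I) / B
p /= np.trapz(p, x)                  # normalization fixes N
print(np.trapz(x * x * p, x))        # variance approx. 1/2 for this choice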
Periodic boundary condition. The nonzero current of periodic boundary
conditions fulfils the equation

A(x) p(x) − (1/2) d/dx (B(x) p(x)) = J .

It is, however, not arbitrary, since it is restricted by normalization and the
conditions for periodic boundary conditions, p(a) = p(b) and J(a) = J(b).
In order to simplify the expressions we define

ψ(x) = exp(2 ∫_{a}^{x} dξ A(ξ)/B(ξ)) ,

integrate and obtain

p(x) B(x)/ψ(x) = p(a) B(a)/ψ(a) − 2J ∫_{a}^{x} dξ/ψ(ξ) .

Making use of the periodicity condition p(b) = p(a) we obtain for the current

J = p(a) (B(a)/ψ(a) − B(b)/ψ(b)) / (2 ∫_{a}^{b} dξ/ψ(ξ)) ,

and eventually find

p(x) = p(a) ( (B(b)/ψ(b)) ∫_{a}^{x} dξ/ψ(ξ) + (B(a)/ψ(a)) ∫_{x}^{b} dξ/ψ(ξ) ) / ( (B(x)/ψ(x)) ∫_{a}^{b} dξ/ψ(ξ) ) .
An infinite range of the stochastic variable and singular boundaries may
complicate the situation, and a full enumeration and analysis of the possible
cases is extremely hard. Commonly one relies on the handling of special cases;
a typical one is given in the next paragraph.
A chemical reaction as model. For the purpose of illustration we consider
an autocatalytic chemical reaction, although chemical reactions are
commonly modeled better by master equations,

X + A ⇌ 2X with forward rate parameter k1 and backward rate parameter k2 .   (3.77)

The stochastic variable X(t) describes the number of molecules X and

p(x, t) = P(X(t) = x)   (3.78)

is the corresponding probability density, whereby we assume that particle
numbers are sufficiently large in order to justify modeling by continuous
variables.

Figure 3.11: “Stationary” probability density of the reaction X + A ⇌ 2X.
The figure shows the “stationary” solution of the Fokker-Planck equation of the
autocatalytic reaction (3.77) according to equation (3.80). The result is not an
ordinary stationary solution because it is not normalizable.

The reaction system is of special interest since it has an exit barrier at x = 0,
which has a simple physical explanation: if no molecule X is present in the
system, no X can be produced.17 The Fokker-Planck equation for reaction
(3.77), which will be derived in chapter 4, is of the form
∂p(x, t)/∂t = − ∂/∂x ((k1 a x − k2 x(x−1)) p(x, t)) + (1/2) ∂²/∂x² ((k1 a x + k2 x(x−1)) p(x, t)) ≈

≈ − ∂/∂x ((k1 a x − k2 x²) p(x, t)) + (1/2) ∂²/∂x² ((k1 a x + k2 x²) p(x, t)) .   (3.79)
Reflecting boundaries are introduced at the positions x = α and x = β, and
the stationary probability density is computed to be

p(x) = (a + x)^{4a−1} x^{−1} e^{−2x} .   (3.80)

17Actually this is an artifact of a system violating thermodynamics. Correct thermodynamic
handling of catalysis requires that for every catalyzed reaction the uncatalyzed
reaction is taken into account too. This is A ⇌ X for the current example, and if this is
done properly the singularity at x = 0 disappears.
This function (figure 3.11) is not normalizable for α = 0. In fact the pole
at x = 0 is a result of absorption occurring there, and this becomes evident
when we compute the Fokker-Planck coefficients

B(0, t) = (a x + x²)|_{x=0} = 0 ,
A(0, t) = (a x − x²)|_{x=0} = 0 , and
∂B(x, t)/∂x|_{x=0} = (a + 2x)|_{x=0} > 0 ,   (3.81)

which meet the conditions of an exit boundary. The stationary solution is
strictly relevant only for α > 0. The meaning of the reflecting barrier is quite
simple: whenever a molecule X disappears, another one is added instantaneously.
Despite the mathematical difficulties, the stationary solution (3.80)
is a useful representation of the probability distribution in practice, except
near the point x = 0, because the time required for all molecules X to disappear
is extraordinarily long in real chemical systems and exceeds the duration
of an experiment by many orders of magnitude.
3.6.3 Fokker-Planck equation in several dimensions
Although multidimensional Fokker-Planck equations are characterized by
essentially more complex behavior than in the one-dimensional case (in particular,
boundary problems are much more involved and show much higher
variability), some analogies between one and many dimensions are quite useful
and will be reported here.
3.6.3.1 Change of variables
A multidimensional Fokker-Planck equation written in general variables, x =
(x1, x2, . . . , xn)′,

∂p(x, t)/∂t = − Σ_{i=1}^{n} ∂/∂xi (Ai(x) p(x, t)) + (1/2) Σ_{i,j=1}^{n} ∂²/∂xi∂xj (Bij(x) p(x, t)) ,   (3.82)

is to be transformed into the corresponding equation for the new variables
ξi = fi(x) (with i = 1, . . . , n). The functions fi are assumed to be independent
and differentiable. If π(ξ, t) is the probability density for the new
variable, we can obtain it from

π(ξ, t) = p(x, t) |det(∂xi/∂ξj)_{i,j=1,...,n}| = p(x, t) · |J| ,   (3.83)

where J is the Jacobian matrix and |J| the Jacobian determinant of the
transformation of coordinates.
Often the easiest way to implement the change of variables is to make use of
the corresponding stochastic differential equation,

dx(t) = A(x) dt + b(x) dW(t) with b(x) · b(x)′ = B(x) ,   (3.58')

and then to recompute the Fokker-Planck equation for π(ξ, t) from the stochastic
differential equations. Commonly both procedures are quite involved, and
the calculations are quite messy unless symmetries or simplifications facilitate
the problem.
Transformation from Cartesian to polar coordinates. As an example
we consider the Rayleigh process, also called Rayleigh fading. The commonly
applied model uses two orthogonal Ornstein-Uhlenbeck processes along the
real and the imaginary axis of an electric field, E = (E1, E2). The stochastic
differential equation,

dE1(t) = − γ E1(t) dt + ε dW1(t) and
dE2(t) = − γ E2(t) dt + ε dW2(t) ,   (3.84)

is converted into polar coordinates,

E1(t) = α(t) cos φ(t) and E2(t) = α(t) sin φ(t) .

With α(t) = exp(µ(t)) we can write E1 + i E2 = exp(µ(t) + i φ(t)), and for
the Wiener processes we define

dWα = dW1(t) cos φ(t) + dW2(t) sin φ(t) and
dWφ = − dW1(t) sin φ(t) + dW2(t) cos φ(t) ,

and obtain for Rayleigh fading in polar coordinates

dφ(t) = (ε/α(t)) dWφ(t) and
dα(t) = (− γ α(t) + ε²/(2 α(t))) dt + ε dWα(t) .   (3.85)

The interesting result is that the phase angle φ diffuses without a drift term.
Now we shall perform the analogous transformation on the Fokker-Planck
equation describing the probability density p(E1, E2, t),

∂p(E1, E2, t)/∂t = γ (∂/∂E1 (E1 p) + ∂/∂E2 (E2 p)) + (1/2) ε² (∂²p/∂E1² + ∂²p/∂E2²) .   (3.86)
The transformation of coordinates, (E1, E2) ⟹ (α, φ) with E1 = α cos φ
and E2 = α sin φ, yields the Jacobian determinant

|J| = |∂(E1, E2)/∂(α, φ)| = (cos φ)(α cos φ) − (−α sin φ)(sin φ) = α ,
and the Laplacian transformed to polar coordinates (the second term on the
right-hand side of equation (3.86)) reads

∂²/∂E1² + ∂²/∂E2² = (1/α) ∂/∂α (α ∂/∂α) + (1/α²) ∂²/∂φ² .
For the inverse transformation we obtain α = √(E1² + E2²) and φ = tan⁻¹(E2/E1),
and the Jacobian determinant

|J⁻¹| = |∂(α, φ)/∂(E1, E2)| = (cos φ)(cos φ/α) − (sin φ)(− sin φ/α) = α⁻¹ .
Further calculations are straightforward and yield

∂/∂E1 (E1 p) + ∂/∂E2 (E2 p) =

= 2 p + E1 (∂p/∂α · ∂α/∂E1 + ∂p/∂φ · ∂φ/∂E1) + E2 (∂p/∂α · ∂α/∂E2 + ∂p/∂φ · ∂φ/∂E2) =

= 2 p + α ∂p/∂α = (1/α) ∂/∂α (α² p) ,
and the probability density in polar coordinates becomes

π(α, φ) = |∂(E1, E2)/∂(α, φ)| p(E1, E2) = α p(E1, E2) .
Collection of all previous results leads to

∂π(α, φ, t)/∂t = − ∂/∂α ((− γ α + ε²/(2α)) π) + (ε²/2) ((1/α²) ∂²π/∂φ² + ∂²π/∂α²) .   (3.87)
The Fokker-Planck equation (3.87) corresponds to the two stochastic differ-
ential equations (3.85), which were derived by changing variables according
to Ito’s formalism.
3.6.3.2 Stationary solutions
Here we give a brief overview of the stationary solutions of many-variable
Fokker-Planck equations. Some general aspects of boundary conditions have
been discussed already in subsection 3.6.1, and we just summarize them.
For the forward Fokker-Planck equation the reflecting barrier boundary
condition has to satisfy

n · J = 0 for x ∈ S ,

where S is the surface of the domain V under consideration (figure 3.10), n is
a local vector normal to the surface, and J is the probability current with the
components

Ji(x, t) = Ai(x, t) p(x, t) − (1/2) Σ_j ∂/∂xj (Bij(x, t) p(x, t)) .
The absorbing barrier condition is

p(x, t) = 0 for x ∈ S .

In reality, parts of the surface may be reflecting whereas other parts are
absorbing. Then, at discontinuities on the surface the conditions

n · J1(x, t) = n · J2(x, t) and p1(x, t) = p2(x, t) for x ∈ S

have to be fulfilled.
The stationarity condition for the multidimensional Fokker-Planck equation
implies that the probability current J (3.63) vanishes for all x ∈ V and
leads to the equation for the stationary probability density p(x):

(1/2) Σ_j Bij(x) ∂p(x)/∂xj = p(x) (Ai(x) − (1/2) Σ_j ∂Bij(x)/∂xj) .   (3.88)
Under the assumption that the matrix B is invertible, this equation can be
rewritten as

∂ log(p(x))/∂xi = Σ_k B⁻¹_{ik}(x) (2 Ak(x) − Σ_j ∂Bkj(x)/∂xj) ≡ Zi(A, B, x) .   (3.89)
The stationarity condition (3.89) cannot be satisfied for arbitrary drift vectors
A and diffusion matrices B, since the left-hand side of the equation is
a gradient and thus Z has to fulfill the (necessary and sufficient) condition
for a vanishing curl, (∂Zi/∂xj) = (∂Zj/∂xi). If the vanishing curl condition
is fulfilled, the stationary solution can be obtained by straightforward
integration,

p(x) = exp(∫^{x} dξ · Z(A, B, ξ)) .
This condition is often addressed as the potential condition, because the gradients
Zi can be associated with the existence of a potential −Φ(x), and it is better
illustrated by

p(x) = exp(−Φ(x)) with Φ(x) = − ∫^{x} dξ · Z(A, B, ξ) .   (3.90)
Not every Fokker-Planck equation has a stationary solution but if it sustains
one, the solution can be obtained by simple integration.
The Rayleigh process in polar coordinates (subsubsection 3.6.3.1) is used
as an illustrative example. From equation (3.87) we obtain

A = (−γα + ε²/(2α), 0)′ and B = ε² I (the 2×2 unit matrix multiplied by ε²),
Figure 3.12: Detailed balance in a reaction cycle A ⇌ B ⇌ C ⇌ A. In the
cycle of three monomolecular reactions the condition of detailed balance is stronger
than the stationarity condition. The common condition for the stationary state,
d[A]/dt = d[B]/dt = d[C]/dt = 0, requires that the probability currents for
the individual reaction steps are equal, J12 = J23 = J31 = J, whereas detailed
balance is satisfied only when all individual currents vanish, J = 0.
from which we obtain

Σ_j ∂Bα,j/∂xj = 0 and Σ_j ∂Bφ,j/∂xj = 0 , and hence Z = 2 B⁻¹ · A = (−2γα/ε² + 1/α, 0)′ ,

with (∂Zα/∂φ) = 0 and (∂Zφ/∂α) = 0; the stationary solution is then of
the form (with N being a normalization constant)

p(α, φ) = exp(∫^{(α,φ)} (dα Zα + dφ Zφ)) =

= N exp(log α − γα²/ε²) =

= N α exp(−γα²/ε²) .   (3.91)
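The stationary density (3.91) can be checked against a direct simulation of the Cartesian equations (3.84). The short Python sketch below is such a check; the step size and parameter values are assumptions, and the comparison uses the mean amplitude implied by (3.91), namely ε √(π/(4γ)).

import numpy as np

rng = np.random.default_rng(7)
gamma, eps, dt = 1.0, 0.5, 0.005     # parameters are assumptions
E = np.zeros((2, 20000))             # components E1, E2 for 20000 samples
for _ in range(4000):                # Euler-Maruyama steps of (3.84)
    E += -gamma * E * dt + eps * rng.normal(0.0, np.sqrt(dt), E.shape)

alpha = np.hypot(E[0], E[1])         # amplitude alpha = sqrt(E1^2 + E2^2)
# mean amplitude of p(alpha) = N alpha exp(-gamma alpha^2 / eps^2):
print(alpha.mean(), eps * np.sqrt(np.pi / (4.0 * gamma)))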
3.6.3.3 Detailed balance
The stationary solutions of certain Fokker-Planck equations correspond to
the condition of vanishing probability current, and this property suggests that
the condition is a particular version of the physical principle of detailed balance.
In statistical mechanics detailed balance was studied first by Richard
Tolman [79]. A Markov process fulfils detailed balance if in the stationary
state the frequency of every possible transition is balanced by that of the
corresponding reverse transition. A special example showing that detailed balance is
stronger than the stationarity condition is presented in figure 3.12. In an
n-membered cycle stationarity is obtained iff all individual probability currents
are equal, J1 = J2 = · · · = Jn = J, whereas detailed balance requires
J = 0. The local flux for a single reaction step is Ji = ki,i+1 [Ii] − ki+1,i [Ii+1];
the condition of a constant flux, Ji = J, can be fulfilled even in the case of
irreversible reactions, for example kj,i = 0 for all i = 1, 2, . . . , n and
j = i + 1 (mod n).
Detailed balance in Markov processes. In general, the condition of detailed
balance in Markov processes is formulated best by means of time reversal,
visualized as a transformation to reversed variables, xi ⟹ εi xi with
εi = ±1, depending on whether the variable shows even or odd behavior under
time reversal. Detailed balance requires

p(x, t+τ ; z, t) = p(ζ, t+τ ; ξ, t) with ξ = (ε1 x1, ε2 x2, . . . ) and ζ = (ε1 z1, ε2 z2, . . . ) .   (3.92)

By setting τ = 0 in equation (3.92) we find

δ(x − z) p(z) = δ(ξ − ζ) p(ξ) .

The two delta functions are equal as only simultaneous changes of sign are involved,
and hence p(x) = p(ξ) as a consequence of the formulation of detailed
balance. The condition (3.92) can now be rewritten in terms of conditional
probabilities:

p(x, τ |z, 0) p(z) = p(ζ, τ |ξ, 0) p(x) .   (3.92')
Equation (3.92') has consequences for several stationary quantities and functions.
For the mean holds

〈x〉 = 〈ξ〉 ,   (3.93)

and this has the consequence that all odd variables (variables xi with εi = −1)
have zero mean at the stationary state with detailed balance. For the autocorrelation
functions and the spectrum one obtains, with ε = (ε1, . . . , εn)′,

G(τ) = ε G′(τ) ε′ and S(ω) = ε S′(ω) ε′ ,

with the covariance matrix Σ fulfilling Σ ε = ε′ Σ for G(τ) at τ = 0.
A somewhat different situation arises in cases where the vectorial quantity
transforms like an axial vector or pseudovector and represents angular
momentum, like mechanical rotation or magnetic fields. Then there exist
two or more stationary solutions, and the condition (3.92) must be relaxed to

pλ(x, t+τ |z, t) = pελ(ζ, t+τ |ξ, t) ,   (3.92'')

where λ = (λ1, λ2, . . . ) is a vector of quantities that are constant under rotation
and change to (ε1λ1, ε2λ2, . . . ) under time reversal. Crispin Gardiner suggests
calling this property time reversal invariance instead of detailed balance [6,
p.145].
Detailed balance in the differential Chapman-Kolmogorov equation.
The necessary and sufficient conditions for a homogeneous Markov
process to have a stationary state that fulfils detailed balance are:

(i) W(x|z) p(z) = W(ζ|ξ) p(x) ,   (3.94)

(ii) εi Ai(ξ) p(x) = − Ai(x) p(x) + Σ_j ∂/∂xj (Bij(x) p(x)) ,   (3.95)

(iii) εi εj Bij(ξ) = Bij(x) .   (3.96)

The corresponding conditions for the Fokker-Planck equation are obtained
simply by setting the jump probabilities equal to zero: W(x|z) = 0. A
considerable simplification arises for exclusively even variables, because the
conditions simplify to:

(i) W(x|z) p(z) = W(z|x) p(x) ,   (3.94')

(ii) Ai(x) p(x) = (1/2) Σ_j ∂/∂xj (Bij(x) p(x)) ,   (3.95')

(iii) Bij(x) = Bij(x) ,   (3.96')

where the last condition is fulfilled trivially.
We remark that condition (3.95') is identical with the condition of a vanishing
flux at the stationary state, equation (3.88), which was the requirement
for the existence of a potential Φ(x).
Special for the Fokker-Planck equation is the partitioning of the drift term into a
reversible and an irreversible part [80–82]:

Di(x) = (1/2) (Ai(x) + εi Ai(ξ))   irreversible drift ,   (3.97)

Ii(x) = (1/2) (Ai(x) − εi Ai(ξ))   reversible drift .   (3.98)

Making use again of the probability formulated by means of a potential,
p(x) = exp(−Φ(x)), the conditions for detailed balance are of the form

εi εj Bij(ξ) = Bij(x) ,

Di(x) − (1/2) Σ_j ∂Bij(x)/∂xj = − (1/2) Σ_j Bij(x) ∂Φ(x)/∂xj , and

Σ_i (∂Ii(x)/∂xi − Ii(x) ∂Φ(x)/∂xi) = 0 .
Under the assumption that these conditions are fulfilled by the functions
Di(x) and Bij(x), and that the matrix B is invertible, the equation in the middle
can be rewritten in the form

∂Zi/∂xj = ∂Zj/∂xi with Zi = Σ_k B⁻¹_{ik}(x) (2 Dk(x) − Σ_j ∂Bkj(x)/∂xj) ,

and the stationary probability density p(x) fulfilling detailed balance is of
the form

p(x) = exp(−Φ(x)) = exp(∫^{x} dz · Z) ,   (3.99)

and can be calculated explicitly as an integral.
Finally, we mention that the reciprocity relations in linear irreversible thermodynamics,
developed by the Norwegian-American physicist Lars Onsager
and therefore also called Onsager relations, are also a consequence of
detailed balance [83, 84].
3.7 Autocorrelation functions and spectra
Analysis of experimentally recorded or computer-generated sample paths is often
largely facilitated by the usage of additional tools complementing moments
and probability distributions, since they can, in principle, be derived from a
single trajectory. These tools are autocorrelation functions and spectra of
random variables (for an extensive treatment of time series analysis see, for
example, [85]). They provide direct insight into the dynamics of the process
as they deal with relations between sample points collected at different times.
The autocorrelation function of the random variable X(t) is a measure
of the influence the value x recorded at time t has on the measurement of
the same variable at time t + τ:

G(τ) = lim_{t→∞} (1/t) ∫_{0}^{t} dθ x(θ) x(θ + τ) .   (3.100)
It represents the time average, taken over sufficiently long times, of the product
of two values recorded a time span τ apart. The autocorrelation function is of particular
importance in the analysis of experimental data because technical devices
called autocorrelators have been built which sample data and can record
directly the autocorrelation function of a process under investigation.
Another relevant quantity is the spectrum or the spectral density of the
quantity x(t). In order to derive the spectrum, we construct a new variable
y(ω) by means of the transformation y(ω) = ∫_{0}^{t} dθ e^{iωθ} x(θ). The spectrum
is then obtained from y by performing the limit t → ∞:

S(ω) = lim_{t→∞} (1/(2πt)) |y(ω)|² = lim_{t→∞} (1/(2πt)) |∫_{0}^{t} dθ e^{iωθ} x(θ)|² .   (3.101)
The autocorrelation function and the spectrum are closely connected. By
some calculations one finds

S(ω) = lim_{t→∞} ( (1/π) ∫_{0}^{t} cos(ωτ) dτ (1/t) ∫_{0}^{t−τ} x(θ) x(θ + τ) dθ ) .

Under certain assumptions, which ensure the validity of the interchanges of
order, we may take the limit t → ∞ and find

S(ω) = (1/π) ∫_{0}^{∞} cos(ωτ) G(τ) dτ .
This result relates the Fourier transform of the autocorrelation function to
the spectrum and can be cast in an even prettier form by using

G(−τ) = lim_{t→∞} (1/t) ∫_{−τ}^{t−τ} dθ x(θ) x(θ + τ) = G(τ)

to yield the Wiener-Khinchin theorem, named after Norbert Wiener and
the Russian mathematician Aleksandr Khinchin:

S(ω) = (1/2π) ∫_{−∞}^{+∞} e^{−iωτ} G(τ) dτ and G(τ) = ∫_{−∞}^{+∞} e^{iωτ} S(ω) dω .   (3.102)

Spectrum and autocorrelation function are related to each other by the
Fourier transformation and its inversion.
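The theorem is easily verified numerically. For an Ornstein-Uhlenbeck process the autocorrelation function G(τ) = (ε²/2γ) e^{−γ|τ|} is known, and by (3.102) its spectrum is the Lorentzian S(ω) = ε²/(2π(γ² + ω²)); the Python sketch below compares this with the periodogram (3.101) of a simulated path. The discretization and parameter values are assumptions.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
gamma, eps, dt, N = 1.0, 1.0, 0.01, 2**20
phi = np.exp(-gamma * dt)            # exact AR(1) discretization of the OU process
innov = rng.normal(0.0, np.sqrt(eps**2 * (1 - phi**2) / (2 * gamma)), N)
x = lfilter([1.0], [1.0, -phi], innov)

omega = 2.0 * np.pi * np.fft.rfftfreq(N, dt)                # angular frequencies
S_est = np.abs(np.fft.rfft(x))**2 * dt / (2.0 * np.pi * N)  # periodogram (3.101)
S_lorentz = eps**2 / (2.0 * np.pi * (gamma**2 + omega**2))  # transform of G(tau)
print(S_est[1:2000].mean(), S_lorentz[1:2000].mean())       # rough agreement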
Equation (3.102) allows for a straightforward proof that the Wiener process
W(t) gives rise to white noise (subsection 3.4.3): its formal derivative
Ẇ(t) = dW(t)/dt is a zero-mean process with delta-correlated values. Let w be
a zero-mean random vector with the identity matrix as (auto)covariance or
autocorrelation matrix:

E(w) = µ = 0 and Cov(w, w) = E(w w′) = σ² I .

Then Ẇ(t) fulfils the relations

µẆ(t) = E(Ẇ(t)) = 0 and GẆ(τ) = E(Ẇ(t) Ẇ(t+τ)) = δ(τ) ,

defining it as a zero-mean process with infinite power at zero time shift. For
its spectral density we obtain:

SẆ(ω) = (1/2π) ∫_{−∞}^{+∞} e^{−iωτ} δ(τ) dτ = 1/2π .   (3.103)
This spectral density is a constant, and hence all frequencies in the noise are
represented with equal weight. All colors mixed with equal weight yield white
light, and this property of visible light gave white noise its name; in colored
noise the noise frequencies do not follow a uniform distribution.
The time average of a signal as expressed by an autocorrelation function
is complemented by the ensemble average, 〈·〉, or, expressed by the expectation
value of the corresponding random variable, E(·), which implies an
(infinite) number of repeats of the same measurement. In case the assumption
of ergodic behavior is true, the time average is equal to the ensemble
average. Thus we find for a fluctuating quantity X(t) in the ergodic limit

E(X(t) X(t+τ)) = 〈x(t) x(t+τ)〉 = G(τ) .
It is straightforward to consider dual quantities which are related by Fourier
transformation:

x(t) = (1/2π) ∫ dω c(ω) e^{iωt} and c(ω) = ∫ dt x(t) e^{−iωt} .
We use this relation to derive several important results. Measurements refer
to real quantities x(t), and this implies c(ω) = c*(−ω). From the condition
of stationarity, 〈x(t) x(t′)〉 = f(t − t′) with no other dependence on t,
follows

〈c(ω) c*(ω′)〉 = (1/(2π)²) ∫∫ dt dt′ e^{−iωt + iω′t′} 〈x(t) x(t′)〉 =

= (δ(ω − ω′)/2π) ∫ dτ e^{iωτ} G(τ) = δ(ω − ω′) S(ω) .

The last expression not only relates the mean square 〈|c(ω)|²〉 to the spectrum
of the random variable, it also shows that stationarity alone implies
that c(ω) and c*(ω′) are uncorrelated for ω ≠ ω′.
4. Applications in chemistry
The chapter starts with a discussion of stochasticity in chemical reactions and
then presents the chemical master equation as an appropriate tool for modeling
chemical reactions. Then the birth-and-death process is introduced
as a well justified and useful approximation to general jump processes. The
equilibration of particle numbers or concentrations in the flow reactor is used
as a simple example to demonstrate the analysis of a birth-and-death process.
Next follow discussions of mono- and bimolecular chemical reactions
that can still be solved exactly by means of time dependent probability generating
functions. The last sections handle the transition from microscopic
to macroscopic systems by means of the size expansion technique and the
numerical approach to stochastic chemical kinetics.
4.1 Stochasticity in chemical reactions
Stochastic chemical kinetics is based on the assumption that knowledge of
the transformation of molecules in chemical reactions is not accessible in full
detail, or, if it were, the information would be overwhelming and would obscure
essential features. Thus it is assumed that chemical reactions have a
probabilistic element and can be modeled properly by means of stochastic
processes. The random processes are caused by thermal noise as well as by
the random encounter of molecules in collisions. Fluctuations, therefore, play an
important role, and they are responsible for the limitations in the reproduction
of experiments. We shall model chemical reactions as Markov processes
and analyze the corresponding master and Fokker-Planck equations. As an
appropriate criterion for classification we shall use the molecularity of reactions1
and the complexity of the dynamical behavior.
1The molecularity of a reaction is the number of molecules that are involved in the
reaction, for example two in a reactive collision between molecules or one in a conformational
change.
The stochastic approach to chemical reaction kinetics has a long tradition and
started in the late fifties from two different initiatives: (i) approximation of
the highly complex vibrational relaxation [86–88] and its application to chemical
reactions, and (ii) direct simulation of chemical reactions as stochastic
processes [89–91]. The latter approach is in the spirit of the initially mentioned
limited information on details and has been taken up and developed
further by several groups [92–96]. The major part of these works has been
summarized in an early review [97], which is particularly recommended here
for further reading. Bartholomay's work is also highly relevant for biological
models of evolution, because he studied reproduction as a linear birth-and-death
process. Exact solutions to master or Fokker-Planck equations can be
found only for particularly simple special cases. Often approximations were
used or the analysis was restricted to the expectation values of variables.
Later on, computer assisted approximation techniques and numerical simulation
methods were developed which allow for handling stochastic phenomena
in chemical kinetics on a more general level [64, 98].
4.1.1 Elementary steps of chemical reactions
Chemical reactions are defined by mechanisms, which can be decomposed
into elementary processes. An elementary process describes the transforma-
tion of one or two molecules into products. Elementary processes involving
three or more molecules are unlikely to happen in the vapor phase or in
dilute solutions, because trimolecular encounters are very rare under these
conditions. Therefore, elementary steps of three molecules are not consid-
ered in conventional chemical kinetics.2 Two additional events which occur
in open systems, for example in flow reactors, are the creation of a molecule
through influx or the annihilation of a molecule through outflux. Common
elementary steps are:
2Exceptions are reactions involving surfaces as third partner, which are important in
gas phase kinetics, and biochemical reactions involving macromolecules.
? −−−→ A (4.1a)
A −−−→ (4.1b)
A −−−→ B (4.1c)
A −−−→ 2B (4.1d)
A −−−→ B + C (4.1e)
A + B −−−→ C (4.1f)
A + B −−−→ 2A (4.1g)
A + B −−−→ C + B (4.1h)
A + B −−−→ C + D (4.1i)
2A −−−→ B (4.1j)
2A −−−→ 2B (4.1k)
2A −−−→ B + C (4.1l)
Depending on the number of reacting molecules, the elementary processes are
called mono-, bi-, or trimolecular. Tri- and higher molecular elementary
steps are excluded in conventional chemical reaction kinetics, as said above.
The example shown in equation (4.1g) is an autocatalytic elementary
process. In practice autocatalytic reactions commonly involve many elementary
steps and thus are the results of complex reaction mechanisms. In order
to study basic features of autocatalysis or chemical self-enhancement, single
step autocatalysis is often used as a model system. One particular trimolecular
autocatalytic process,
2A + B −−−→ 3A , (4.2)
became very famous [99] despite its trimolecular nature, which makes it unlikely
to occur in real systems. The elementary step (4.2) is the essential step
in the so-called Brusselator model; it can be straightforwardly addressed by
analytical mathematical techniques, and it gives rise to complex dynamical
phenomena in space and time which are otherwise not observed in chemical
reaction systems. Among other features such special phenomena are: (i)
multiple stationary states, (ii) chemical hysteresis, (iii) oscillations in concentrations,
(iv) deterministic chaos, and (v) spontaneous formation of spatial
structures. The last example is known as the Turing instability [100] and is frequently
used as a model for pattern formation or morphogenesis in biology
[101].
4.1.2 The master equation in chemistry
Provided particle numbers are assigned to the variables describing the progress
of chemical reactions, the stochastic variable N(t) with the probability Pn(t) =
P(N(t) = n) can take only nonnegative integer values, n ∈ N0. In addition
we introduce a few simplifications and some conventions in our notation.
We shall use the forward equation unless stated differently and assume an
infinitely sharp initial density: P(n, 0|n0, 0) = δn,n0 with n0 = n(0). Then
we can simplify the full notation by P(n, t|n0, 0) ⇒ Pn(t) with the implicit
assumption of the initial condition specified above. Other sharp initial values
or extended initial probability densities will be given explicitly. The
expectation value of the stochastic variable N(t) will be denoted by

E(N(t)) = 〈n(t)〉 = Σ_{n=0}^{∞} n · Pn(t) .   (4.3)

Its stationary value, provided it exists, will be written as

n̄ = lim_{t→∞} 〈n(t)〉 .   (4.4)

Almost always n̄ will be identical with the long time value of the corresponding
deterministic variable. The running index of integers will be denoted by
either m or n′. Then the chemical master equation is of the form

∂Pn(t)/∂t = Σ_m (W(n|m) Pm(t) − W(m|n) Pn(t)) .   (4.5)

The transition probabilities may be time dependent in certain cases, W(n|m, t);
most frequently we shall assume that they are not. The probabilities W(n|m)
can be understood as the elements of a transition matrix W = {Wnm ; n, m ∈
N0}. Diagonal elements Wnn cancel in the master equation (4.5) and hence
need not be defined. According to their nature as transition probabilities,
all Wnm with n ≠ m have to be nonnegative. We may define, nevertheless,
Σ_m Wmn = 0, which implies Wnn = − Σ_{m≠n} Wmn, and then insertion into
(4.5) leads to a compact form of the master equation

∂Pn(t)/∂t = Σ_m Wnm Pm(t) .   (4.5')

Introducing vector notation, P(t)′ = (P1(t), . . . , Pn(t), . . .), we obtain

∂P(t)/∂t = W · P(t) .   (4.5'')
With the initial condition Pn(0) = δn,n0 stated above we can solve equation
(4.5'') formally for each n0 and obtain

P(n, t|n0, 0) = (exp(W t))_{n,n0} ,

where the element (n, n0) of the matrix exp(W t) is the probability to have
n particles at time t, N(t) = n, when there were n0 particles at time t0 = 0.
The evaluation of this equation boils down to diagonalizing the matrix W, which
can be done analytically in rather few cases only.
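For small transition matrices the formal solution is nevertheless directly computable by numerical matrix exponentiation. The Python sketch below does this for the irreversible monomolecular reaction A → B treated in section 4.2 (a pure death process with w−(n) = kn); the parameter values are assumptions.

import numpy as np
from scipy.linalg import expm

k, n0 = 1.0, 10                      # rate constant and initial particle number
n = np.arange(n0 + 1)
W = np.zeros((n0 + 1, n0 + 1))
W[n[:-1], n[1:]] = k * n[1:]         # W(n|n+1) = k(n+1): one death, n+1 -> n
np.fill_diagonal(W, -k * n)          # diagonal makes every column sum to zero

P0 = np.zeros(n0 + 1)
P0[n0] = 1.0                         # sharp initial condition P_n(0) = delta(n, n0)
Pt = expm(W * 0.5) @ P0              # P(n, t|n0, 0) = (exp(W t))_{n, n0} at t = 0.5
print(Pt @ n, n0 * np.exp(-k * 0.5)) # expectation value n0 exp(-kt), cf. (4.25)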
For the forthcoming considerations we shall derive the so-called jump
moments

αp(n) = Σ_{m=0}^{∞} (m − n)^p W(m|n) ; p = 1, 2, . . . .   (4.6)

The usefulness of the first two jump moments (p = 1, 2) is easily demonstrated:
we multiply equation (4.5) by n and obtain through summation

d〈n〉/dt = Σ_{n=0}^{∞} Σ_{m=0}^{∞} (m W(n|m) Pm(t) − n W(m|n) Pn(t)) =

= Σ_{n=0}^{∞} Σ_{m=0}^{∞} (m − n) W(m|n) Pn(t) = 〈α1(n)〉 .
Only in case α1(n) is a linear function of n may the formation of moment and
expectation value be interchanged, and we have the simple equation

d〈n〉/dt = α1(〈n〉) .

Otherwise this is only a zeroth order approximation, which can be improved
through expansion of α1(n) in (n − 〈n〉). Breaking off after the second derivative
yields

d〈n〉/dt = α1(〈n〉) + (1/2) σn² d²α1(〈n〉)/dn² .   (4.6')

In order to obtain a consistent approximation one may apply a similar approximation
to the time development of the variance and finds [98]:

dσn²/dt = α2(〈n〉) + 2 σn² dα1(〈n〉)/dn .   (4.6'')
These expressions will be simplified in case of the forthcoming examples. We
proceed now by discussing first some important special cases where exact
solutions are derivable and then present a general and systematic approx-
imation scheme which allows to solve the master equation for sufficiently
large systems [64, 98]. This scheme is based on a power series expansion in
some extensive physical parameter Ω, for example the size of the system or
the total number of particles. It will turn out that Ω^{−1/2} is the appropri-
ate quantity for the expansion and thus the approximation is based on the
smallness of fluctuations. This implies that we shall encounter the limits
of reliability of the technique at small population sizes or in situations of
self-enhancing fluctuations, for example at instabilities or phase transitions.
The chemical master equation has been shown to be based on a rigorous
microscopic concept of chemical reactions in the vapor phase within the
frame of classical collision theory [3]. The two general requirements that
have to be fulfilled are: (i) a homogeneous mixture as it is assumed to exist
through well stirring and (ii) thermal equilibrium implying that the velocities
of molecules follow a Maxwell-Boltzmann distribution. Daniel Gillespie's
approach focusses on chemical reactions rather than molecular species
approach focusses on chemical reactions rather than molecular species and
is well suited to handle reaction networks. In addition the algorithm can be
easily implemented for computer simulation. We shall discuss the Gillespie
formalism together with the computer program in section 4.4.
Figure 4.1: Sketch of the transition probabilities in master equations. In
the general jump process steps of any size are admitted (upper drawing) whereas
in birth-and-death processes all jumps have the same size. The simplest and most
common case is dealing with the condition that the particles are born and die one
at a time (lower drawing).
4.1.3 Birth-and-death master equations
The concept of birth-and-death processes has been created in biology and is
based on the assumption that only a finite number of individuals are produced
(born) or destroyed (die) in a single event. The simplest case, and the only
one we shall discuss here, occurs when birth and death are confined to single
individuals of only one species. These processes are commonly denoted as
one step birth-and-death processes.3 In figure 4.1 the transitions in a general
jump process and in a birth-and-death process are illustrated. Restriction to
single events is tantamount to the choice of a sufficiently small time interval
of recording, ∆t, such that the simultaneous occurrence of two events has
a probability of measure zero (see also section 4.4). This small time step is
often called the blind interval, because no information on things happening
within ∆t is available.

3In addition, one commonly distinguishes between birth-and-death processes in one
variable and in many variables [64]. We shall restrict the analysis to the simpler single
variable case here.
Then the transition probabilities can be written in the form

W(n|m, t) = w+(m) δn,m+1 + w−(m) δn,m−1 ,   (4.7)

since we are dealing with only two allowed processes:

n → n+1 with w+(n) as transition probability per unit time and
n → n−1 with w−(n) as transition probability per unit time.

In subsection 3.4.1 we discussed the Poisson process, which can be understood
as a birth-and-death process on n ∈ N0 with zero death rate. Modeling of
chemical reactions by birth-and-death processes turns out to be a very useful
approach for reaction mechanisms which can be described by changes in a
single variable.

The stochastic process can now be described by a birth-and-death master
equation

∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) .   (4.8)
There is no general technique that allows to find the time-dependent solutions
of equation (4.8), and therefore we shall present some special cases later on.
In subsection 5.1.2 we shall also give a detailed overview of exactly solvable
single step birth-and-death processes. It is, however, possible to analyze the
stationary case in full generality.

Provided a stationary solution of equation (4.8), lim_{t→∞} Pn(t) = P̄n, exists,
we can compute it in a straightforward manner. It is useful to define a
probability current J(n) for the n-th step in the series,

Particle number: 0, 1, . . . , n−1, n, n+1, . . .
Reaction step:      1, 2, . . . , n−1, n, n+1, . . .
which is of the form

J(n) = w−(n) Pn − w+(n−1) Pn−1 .   (4.9)
Now the conditions for the stationary solution are given by

∂Pn(t)/∂t = 0 = J(n+1) − J(n) .   (4.10)

Restriction to nonnegative particle numbers, n ∈ N0, implies w−(0) = 0 and
Pn(t) = 0 for n < 0, which in turn leads to J(0) = 0.
Now we sum the vanishing flow terms according to equation (4.10) and
obtain:

0 = Σ_{z=0}^{n−1} (J(z+1) − J(z)) = J(n) − J(0) .

Thus we find J(n) = 0 for arbitrary n, which leads to

P̄n = (w+(n−1)/w−(n)) P̄n−1 and finally P̄n = P̄0 Π_{z=1}^{n} w+(z−1)/w−(z) .

The condition J(n) = 0 for every reaction step is known in chemical kinetics
as the principle of detailed balance, which was formulated first
by the American mathematical physicist Richard Tolman [79] (see also
section 3.6.3.3 and [6, pp.142-158]).
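The product formula is immediately useful in computations. The Python sketch below evaluates the stationary distribution for the rates of the flow reactor treated in the next subsection, w+(n) = r n̄ and w−(n) = r n, where the factor r cancels; the value of n̄ and the truncation are assumptions.

import numpy as np

nbar, nmax = 20.0, 200               # stationary value and truncation (assumptions)
w_plus = lambda n: nbar              # birth rate: constant influx (r cancels)
w_minus = lambda n: float(n)         # death rate proportional to n

P = np.ones(nmax + 1)
for z in range(1, nmax + 1):
    P[z] = P[z - 1] * w_plus(z - 1) / w_minus(z)
P /= P.sum()                         # normalization fixes P_0

print(P @ np.arange(nmax + 1))       # approx. nbar: the Poissonian of section 4.1.4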
The macroscopic rate equations are readily derived from the master equation
through computation of the expectation value:

∂E(n(t))/∂t = ∂/∂t (Σ_{n=0}^{∞} n Pn(t)) =

= Σ_{n=0}^{∞} n (w+(n−1) Pn−1(t) − w+(n) Pn(t)) + Σ_{n=0}^{∞} n (w−(n+1) Pn+1(t) − w−(n) Pn(t)) =

= Σ_{n=0}^{∞} ((n+1) w+(n) − n w+(n) + (n−1) w−(n) − n w−(n)) Pn(t) =

= Σ_{n=0}^{∞} w+(n) Pn(t) − Σ_{n=0}^{∞} w−(n) Pn(t) = E(w+(n)) − E(w−(n)) .
Neglect of fluctuations yields the deterministic rate equation of the birth-and-death
process

d〈n〉/dt = w+(〈n〉) − w−(〈n〉) .   (4.11)

The condition of stationarity yields n̄ = lim_{t→∞} 〈n(t)〉, for which w+(n̄) =
w−(n̄) holds. Compared to this result we note that the maximum value of the
stationary probability density, max P̄n, n ∈ N0, is defined by P̄n+1 − P̄n ≈
−(P̄n − P̄n−1) or P̄n+1 ≈ P̄n−1, which coincides with the deterministic value
for large n.
4.1.4 The flow reactor
The flow reactor is introduced as an experimental device that allows for
investigations of systems off thermodynamic equilibrium. The establishment
of a stationary state or flow equilibrium in a flow reactor (figure 4.2) is a
suitable case study for the illustration of the search for a solution of a birth-and-death
master equation. At the same time the non-reactive flow of a single
compound represents the simplest conceivable process in such a reactor. The
stock solution contains A at the concentration [A]influx = a0 = ā [mole·l⁻¹].
The influx concentration a0 is equal to the stationary concentration ā, because
no reaction is assumed to take place in the reactor. The flow is measured by
means of the flow rate r [l·sec⁻¹]: this implies an influx of a0 · r [mole·sec⁻¹]
of A into the reactor, instantaneous mixing with the content of the reactor,
and an outflux of the mixture from the reactor at the same flow rate r.4 The
reactor has a volume V [l], and thus a volume element dV has a mean residence
time of τR = V · r⁻¹ [sec] in the reactor.

In- and outflux of compound A into and from the reactor are modeled by
two formal elementary steps or pseudo-reactions:

? −−−→ A
A −−−→ .   (4.12)

4The assumption of equal influx and outflux rates is required because we are dealing
with a flow reactor of constant volume V (CSTR, figure 4.2).
Figure 4.2: The flow reactor. The reactor shown in the sketch is a device of
chemical reaction kinetics which is used to carry out chemical reactions in an open
system. The stock solution contains materials, for example A at the concentration
[A]influx = a0, which are usually consumed during the reaction to be studied. The
reaction mixture is stirred in order to guarantee a spatially homogeneous reaction
medium. Constant volume implies an outflux from the reactor that compensates
precisely the influx. The flow rate r is equivalent to the inverse mean residence
time of the solution in the reactor multiplied by the reactor volume, τR⁻¹ · V = r. The
reactor shown here is commonly called a continuously stirred tank reactor (CSTR).
In chemical kinetics the differential equations are almost always formulated in
molecular concentrations. For the stochastic treatment, however, we replace
concentrations by the numbers of particles, n = a · V · NL with n ∈ N0 and
NL being Loschmidt's or Avogadro's number,5 the number of particles per
mole.
The particle number of A in the reactor is a stochastic variable N(t) with the
probability Pn(t) = P(N(t) = n). The time derivative of the probability
distribution is described by means of the master equation

∂Pn(t)/∂t = r (n̄ Pn−1(t) + (n+1) Pn+1(t) − (n + n̄) Pn(t)) ; n ∈ N0 .   (4.13)

Equation (4.13) describes a birth-and-death process with w+(n) = r n̄ and
w−(n) = r n, where n̄ = ā · V · NL is the stationary particle number. Thus we
have a constant birth rate and a death rate which
is proportional to n. Solutions of the master equation can be found in text
books listing stochastic processes with known solutions, for example [19].
Here we shall derive the solution by means of probability generating functions,
as introduced in subsection 2.7.1, equation (2.70), in order to illustrate one
particularly powerful approach:

g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n .   (2.70')
Sometimes the initial state is included in the notation: gn0(s, t) implies
Pn(0) = δn,n0. Partial derivatives with respect to time t and the dummy
variable s are readily computed:

∂g(s, t)/∂t = Σ_{n=0}^{∞} (∂Pn(t)/∂t) s^n = r Σ_{n=0}^{∞} (n̄ Pn−1(t) + (n+1) Pn+1(t) − (n + n̄) Pn(t)) s^n and

∂g(s, t)/∂s = Σ_{n=0}^{∞} n Pn(t) s^{n−1} .
5As a matter of fact there is a difference between Loschmidt's and Avogadro's constant
that is often ignored in the literature: Avogadro's constant, NL = 6.02214179 × 10²³ mole⁻¹,
refers to one mole of substance, whereas Loschmidt's constant, n0 = 2.6867774 × 10²⁵ m⁻³,
counts the number of particles in one cubic meter of gas under normal conditions. The
conversion factor between the two constants is the molar volume of an ideal gas, which
amounts to 22.414 dm³·mole⁻¹.
Proper collection of terms and rearrangement of summations (taking into
account w−(0) = 0) yields

∂g(s, t)/∂t = r n̄ Σ_{n=0}^{∞} (Pn−1(t) − Pn(t)) s^n + r Σ_{n=0}^{∞} ((n+1) Pn+1(t) − n Pn(t)) s^n .

Evaluation of the four infinite sums,

Σ_{n=0}^{∞} Pn−1(t) s^n = s Σ_{n=0}^{∞} Pn−1(t) s^{n−1} = s g(s, t) ,

Σ_{n=0}^{∞} Pn(t) s^n = g(s, t) ,

Σ_{n=0}^{∞} (n+1) Pn+1(t) s^n = ∂g(s, t)/∂s , and

Σ_{n=0}^{∞} n Pn(t) s^n = s Σ_{n=0}^{∞} n Pn(t) s^{n−1} = s ∂g(s, t)/∂s ,

and regrouping of terms yields a linear partial differential equation of first
order:

∂g(s, t)/∂t = r (n̄ (s−1) g(s, t) − (s−1) ∂g(s, t)/∂s) .   (4.14)
The partial differential equation (PDE) is solved through consecutive
substitutions,

φ(s, t) = g(s, t) exp(−n̄ s) ⟶ ∂φ(s, t)/∂t = −r (s−1) ∂φ(s, t)/∂s , and

s − 1 = e^ρ and ψ(ρ, t) = φ(s, t) ⟶ ∂ψ(ρ, t)/∂t + r ∂ψ(ρ, t)/∂ρ = 0 .

Computation of the characteristic manifold is equivalent to solving the
ordinary differential equation (ODE) r dt = dρ. We find ρ − rt = C,
where C is the integration constant. The general solution of the PDE is an
arbitrary function of the combined variable ρ − rt:

ψ(ρ, t) = f(exp(ρ − rt)) · e^{−n̄} and φ(s, t) = f((s−1) e^{−rt}) · e^{−n̄} ,

and the probability generating function is

g(s, t) = f((s−1) e^{−rt}) · exp((s−1) n̄) .

Normalization of probabilities (for s = 1) requires g(1, t) = 1 and hence
f(0) = 1. The initial condition as expressed by the conditional probability
P(n, 0|n0, 0) = Pn(0) = δn,n0 leads to the final expression:

g(s, 0) = f(s−1) · exp((s−1) n̄) = s^{n0} ,

f(ζ) = (ζ + 1)^{n0} · exp(−ζ n̄) with ζ = (s−1) e^{−rt} ,

g(s, t) = (1 + (s−1) e^{−rt})^{n0} · exp(−n̄ (s−1) e^{−rt}) · exp(n̄ (s−1)) =

= (1 + (s−1) e^{−rt})^{n0} · exp(n̄ (s−1)(1 − e^{−rt})) .   (4.15)
From the generating function we compute, with somewhat tedious but straightforward
algebra, the probability distribution

Pn(t) = Σ_{k=0}^{min{n0,n}} ( (n0 choose k) n̄^{n−k} e^{−krt} (1 − e^{−rt})^{n0+n−2k} / (n−k)! ) e^{−n̄ (1−e^{−rt})}   (4.16)

with n, n0 ∈ N0. In the limit t → ∞ we obtain a non-vanishing contribution
to the stationary probability only from the first term, k = 0, and
find

lim_{t→∞} Pn(t) = (n̄^n / n!) exp(−n̄) .

This is a Poissonian distribution with parameter and expectation value α =
n̄. The Poissonian distribution also has a variance which is numerically
identical with the expectation value, σ²(NA) = E(NA) = n̄, and thus the
distribution of particle numbers fulfils the √N-law at the stationary state.
The time dependent probability distribution allows to compute the expectation
value and the variance of the particle number as functions of time:

E(N(t)) = n̄ + (n0 − n̄) · e^{−rt} ,
σ²(N(t)) = (n̄ + n0 · e^{−rt}) · (1 − e^{−rt}) .   (4.17)

As expected, the expectation value coincides with the solution
curve of the deterministic differential equation

dn/dt = w+(n) − w−(n) = r (n̄ − n) ,   (4.18)

which is of the form

n(t) = n̄ + (n0 − n̄) · e^{−rt} .   (4.18')
Figure 4.3: Establishment of the flow equilibrium in the CSTR. The
upper part shows the evolution of the probability density, Pn(t), of the number of
molecules of a compound A which flows through a reactor of the type illustrated in
figure 4.2. The initially infinitely sharp density becomes broader with time until the
variance reaches its maximum and then sharpens again until it reaches stationarity.
The stationary density is a Poissonian distribution with expectation value and
variance E(N) = σ²(N) = n̄. In the lower part we show the expectation value
E(N(t)) within the confidence interval E ± σ. Parameters used: n̄ = 20, n0 = 200,
and V = 1; sampling times (upper part): τ = r · t = 0 (black), 0.05 (green), 0.2
(blue), 0.5 (violet), 1 (pink), and ∞ (red).
Since we start from sharp initial densities, variance and standard deviation are
zero at time t = 0. The qualitative time dependence of σ²(NA(t)), however,
depends on the sign of (n0 − n̄):

(i) For n0 ≤ n̄ the standard deviation increases monotonously until it
reaches the value √n̄ in the limit t → ∞, and

(ii) for n0 > n̄ the standard deviation increases until it passes through a
maximum at

t(σmax) = (1/r) (ln 2 + ln n0 − ln(n0 − n̄))

and approaches the long-time value √n̄ from above.

In figure 4.3 we show an example for the evolution of the probability density
(4.16). In addition, the figure contains a plot of the expectation value
E(N(t)) inside the band E − σ < E < E + σ. In the case of a normally
distributed stochastic variable we find 68.3% of all values within this confidence
interval. In the interval E − 2σ < E < E + 2σ we would find even 95.4% of
all stochastic trajectories (subsection 2.7.6).
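The analytic results can be reproduced by a stochastic simulation in the spirit of Gillespie's algorithm (section 4.4). The minimal Python sketch below draws exponential waiting times from the total rate of equation (4.13), weights each state by its sojourn time, and recovers the stationary mean and variance n̄; the parameter values mirror figure 4.3, and the transient cutoff is an assumption.

import numpy as np

rng = np.random.default_rng(5)
r, nbar, n = 1.0, 20.0, 200          # flow rate, stationary value, initial n0
t, t_end = 0.0, 1000.0
weights = np.zeros(1000)             # sojourn time accumulated per state
while t < t_end:
    a_plus, a_minus = r * nbar, r * n   # w+(n) and w-(n) of equation (4.13)
    a_tot = a_plus + a_minus
    tau = rng.exponential(1.0 / a_tot)  # waiting time to the next event
    if t > 50.0:                        # discard the initial transient
        weights[n] += tau
    t += tau
    n += 1 if rng.random() < a_plus / a_tot else -1

p = weights / weights.sum()
m = p @ np.arange(1000)
print(m, p @ np.arange(1000)**2 - m**2)  # mean and variance both approx. nbar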
4.2 Classes of chemical reactions
In this section we shall present exact solutions of the chemical master equation
for two classes of chemical reactions: monomolecular and bimolecular.
Molecularity of a reaction refers to the number of molecules in the reaction
complex, and in most cases the molecularity is also reflected by the chemical
rate law of reaction kinetics in the form of the reaction order. In particular, we
distinguish first order and second order kinetics, which are typically observed
with monomolecular and bimolecular reactions, respectively.
4.2.1 Monomolecular chemical reactions
The reversible mono- or unimolecular chemical reaction can be split into two
irreversible elementary reactions,

A −−k1−→ B   (4.19a)
A ←−k2−− B ,   (4.19b)

wherein the reaction rate parameters, k1 and k2, are called reaction rate
constants. The reaction rate parameters depend on temperature, pressure,
and other environmental factors. At equilibrium the rate of the forward
reaction (4.19a) is precisely compensated by the rate of the reverse reaction
(4.19b), k1 · [A] = k2 · [B], leading to the condition for the thermodynamic
equilibrium:

K = k1/k2 = [B]/[A] .   (4.20)

The parameter K is called the equilibrium constant; like the reaction rate
parameters, it depends on temperature, pressure, and other environmental
factors. In an isolated or in a closed system we have a conservation law:

(NA(t) + NB(t))/(V · NL) = [A] + [B] = c(t) = c0 = c̄ = constant ,   (4.21)

with c being the total concentration and c̄ the corresponding equilibrium
value, lim_{t→∞} c(t) = c̄.
4.2.1.1 Irreversible monomolecular chemical reaction
We start by discussing the simpler irreversible case,

A −−k−→ B ,   (4.19a')

which can be modeled and analyzed in full analogy to the previous case of
the flow equilibrium. Although we are dealing with two molecular species,
A and B, the process is described by a single stochastic variable, NA(t),
since because of the conservation relation (4.21) we have NB(t) = n0 − NA(t),
with n0 = n(0) being the number of A molecules initially present. If
a sufficiently small time interval is applied, the irreversible monomolecular
reaction is modeled by a single step birth-and-death process with w+(n) = 0
and w−(n) = kn.6 The probability density is defined by Pn(t) = P(NA(t) = n)
and its time dependence obeys

∂Pn(t)/∂t = k (n+1) Pn+1(t) − k n Pn(t) .   (4.22)

The master equation (4.22) is solved again by means of the probability generating
function,

g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n ; |s| ≤ 1 ,

which is determined by the PDE

∂g(s, t)/∂t − k (1−s) ∂g(s, t)/∂s = 0 .

The computation of the characteristic manifold of this PDE is tantamount
to solving the ODE

k dt = ds/(s−1) ⟹ (s−1) e^{−kt} = constant .

With φ(s, t) = (s−1) exp(−kt) + γ, g(s, t) = f(φ), the normalization condition
g(1, t) = 1, and the boundary condition g(s, 0) = f(φ)|_{t=0} = s^{n0} we
find

g(s, t) = (s · e^{−kt} + 1 − e^{−kt})^{n0} .   (4.23)
6We remark that w−(0) = 0 and w+(0) = 0 are fulfilled, which are the conditions for
a natural absorbing barrier at n = 0 (section 5.1.2).
This expression is easily expanded in binomial form, ordered with respect
to increasing powers of s,

g(s, t) = (1 − e^{−kt})^{n0} + (n0 choose 1) s e^{−kt} (1 − e^{−kt})^{n0−1} + (n0 choose 2) s² e^{−2kt} (1 − e^{−kt})^{n0−2} +

+ . . . + (n0 choose n0−1) s^{n0−1} e^{−(n0−1)kt} (1 − e^{−kt}) + s^{n0} e^{−n0kt} .

Comparison of coefficients yields the time dependent probability density

Pn(t) = (n0 choose n) (e^{−kt})^n (1 − e^{−kt})^{n0−n} .   (4.24)
It is straightforward to compute the expectation value of the stochastic variable
NA, which coincides again with the deterministic solution, and its variance:

E(NA(t)) = n0 e^{−kt} ,
σ²(NA(t)) = n0 e^{−kt} (1 − e^{−kt}) .   (4.25)

The half-life of a population of n0 particles,

t1/2 : E(NA(t1/2)) = n0/2 = n0 · e^{−k t1/2} ⟹ t1/2 = (ln 2)/k ,

is the time of maximum variance and standard deviation, dσ²/dt = 0 and dσ/dt =
0, respectively. An example of the time course of the probability density of
an irreversible monomolecular reaction is shown in figure 4.4.
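Since (4.24) is a binomial distribution with single-molecule survival probability e^{−kt}, the moments (4.25) can be cross-checked in a few lines of Python; the parameter values below are assumptions.

import numpy as np
from scipy.stats import binom

k, n0, t = 1.0, 200, 0.5             # parameter values are assumptions
p = np.exp(-k * t)                   # survival probability of a single molecule
print(binom.mean(n0, p), n0 * np.exp(-k * t))                          # E, (4.25)
print(binom.var(n0, p), n0 * np.exp(-k * t) * (1.0 - np.exp(-k * t)))  # sigma^2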
4.2.1.2 Reversible monomolecular chemical reaction
The analysis of the irreversible reaction is readily extended to the reversible
case (4.19), where we are dealing with a one step birth-and-death process.
Again we are dealing with a closed system; the conservation relation NA(t) +
NB(t) = n0 (with n0 being again the number of molecules of class A initially
present, Pn(0) = δn,n0) holds, and the transition probabilities are given by
w+(n) = k2(n0 − n) and w−(n) = k1 n.7 The master equation is now of the
7Here we note the existence of barriers at n = 0 and n = n0, which are characterized
by w−(0) = 0, w+(0) = k2n0 > 0 and w+(n0) = 0, w−(n0) = k1n0 > 0, respectively.
These equations fulfil the conditions for reflecting barriers (section 5.1.2).
Figure 4.4: Probability density of an irreversible monomolecular reaction.
The three plots show the evolution of the probability
density, Pn(t), of the number of molecules of a compound A which undergoes the
reaction A → B. The initially infinitely sharp density Pn(0) = δn,n0 becomes broader
with time until the variance reaches its maximum at time t = t1/2 = ln 2/k and
then sharpens again until it approaches full transformation, lim_{t→∞} Pn(t) = δn,0.
Also shown are the expectation value E(NA(t)) and the confidence intervals
E ± σ (68.3%, red) and E ± 2σ (95.4%, blue), with σ²(NA(t)) being the variance.
Parameters used: n0 = 200, 2000, and 20 000; k = 1 [t⁻¹]; sampling times: 0 (black),
0.01 (green), 0.1 (blue), 0.2 (violet), 0.3 (magenta), 0.5 (pink), 0.75 (red), 1
(pink), 1.5 (magenta), 2 (violet), 3 (blue), and 5 (green).
form

∂Pn(t)/∂t = k2 (n0 − n + 1) Pn−1(t) + k1 (n+1) Pn+1(t) − (k1 n + k2 (n0 − n)) Pn(t) .   (4.26)

Making use of the probability generating function g(s, t) we derive the PDE

∂g(s, t)/∂t = (k1 + (k2 − k1) s − k2 s²) ∂g(s, t)/∂s + n0 k2 (s − 1) g(s, t) .
The solutions of the PDE are simpler when expressed in terms of the parameter
combinations κ = k1 + k2 and λ = k1/k2, and the function
ω(t) = λ exp(−κt) + 1:

g(s, t) = (1 + (s−1) (1 + λ e^{−κt})/(1 + λ))^{n0} =

= ( (λ (1 − e^{−κt}) + s (λ e^{−κt} + 1)) / (1 + λ) )^{n0} =

= Σ_{n=0}^{n0} (n0 choose n) (λ e^{−κt} + 1)^n (λ (1 − e^{−κt}))^{n0−n} s^n / (1 + λ)^{n0} .

The probability density for the reversible reaction is then obtained as

Pn(t) = (n0 choose n) (1/(1 + λ)^{n0}) (λ e^{−κt} + 1)^n (λ (1 − e^{−κt}))^{n0−n} .   (4.27)
Expectation value and variance of the numbers of molecules are readily computed
(with ω(t) = λ exp(−κt) + 1):

E(NA(t)) = n0 ω(t)/(1 + λ) ,
σ²(NA(t)) = (n0 ω(t)/(1 + λ)) (1 − ω(t)/(1 + λ)) ,   (4.28)

and the stationary values are

lim_{t→∞} E(NA(t)) = n0 k2/(k1 + k2) ,

lim_{t→∞} σ²(NA(t)) = n0 k1 k2/(k1 + k2)² ,

lim_{t→∞} σ(NA(t)) = √n0 √(k1 k2)/(k1 + k2) .   (4.29)
This result shows that the √N-law is fulfilled up to a factor that is independent
of N: E/σ = √n0 k2/√(k1 k2).
Starting from a sharp distribution, Pn(0) = δn,n0, the variance increases,
may or may not pass through a maximum, and eventually reaches the equilibrium
value σ̄² = k1 k2 n0/(k1 + k2)². The time of maximal fluctuations is
easily calculated from the condition dσ²/dt = 0, and one obtains

tvar max = (1/(k1 + k2)) ln(2k1/(k1 − k2)) .   (4.30)
Figure 4.5: Probability density of a reversible monomolecular reaction.
The three plots show the evolution of the probability density,
Pn(t), of the number of molecules of a compound A which undergoes the reaction A ⇌ B.
The initially infinitely sharp density Pn(0) = δn,n0 becomes broader with time until
the variance settles down at the equilibrium value, eventually passing a point of
maximum variance. Also shown are the expectation value E(NA(t)) and the
confidence intervals E ± σ (68.3%, red) and E ± 2σ (95.4%, blue), with σ²(NA(t)) being
the variance. Parameters used: n0 = 200, 2000, and 20 000; k1 = 2 k2 = 1 [t⁻¹];
sampling times: 0 (black), 0.01 (dark green), 0.025 (green), 0.05 (turquoise), 0.1
(blue), 0.175 (blue violet), 0.3 (purple), 0.5 (magenta), 0.8 (deep pink), 2 (red).
Depending on the sign of (k1 − k2) the variance does or does not pass through a maximum on the approach towards equilibrium. The maximum is readily detected from the height of the mode of Pn(t), as seen in figure 4.5, where a case with k1 > k2 is presented.
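The closed-form results (4.27), (4.28), and (4.30) are easy to evaluate numerically; the following is a minimal sketch (assuming NumPy and SciPy are available), with parameters chosen as in figure 4.5 (k1 = 2k2 = 1, n0 = 200):

```python
import numpy as np
from scipy.special import comb

# Reversible monomolecular reaction A <=> B, eqs. (4.27)-(4.30).
k1, k2, n0 = 1.0, 0.5, 200        # rate parameters and initial number of A
kappa, lam = k1 + k2, k1 / k2     # kappa = k1 + k2, lambda = k1/k2

def P_n(n, t):
    """Probability of n molecules A at time t, eq. (4.27) in binomial form."""
    w = lam * np.exp(-kappa * t) + 1.0    # omega(t)
    p = w / (1.0 + lam)                   # binomial success probability
    return comb(n0, n) * p**n * (1.0 - p)**(n0 - n)

def mean_var(t):
    """Expectation value and variance, eq. (4.28)."""
    w = lam * np.exp(-kappa * t) + 1.0
    E = n0 * w / (1.0 + lam)
    return E, E * (1.0 - w / (1.0 + lam))

t_varmax = np.log(2 * k1 / (k1 - k2)) / kappa   # eq. (4.30), requires k1 > k2
print(t_varmax, mean_var(t_varmax))             # variance peaks at n0/4 = 50
```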
In order to illustrate fluctuations and their value under equilibrium conditions the Austrian physicist Paul Ehrenfest designed a game called Ehrenfest's urn model [102], which was indeed played in order to verify the √N-law. Balls, 2N in total, are numbered consecutively, 1, 2, . . . , 2N, and distributed arbitrarily over two containers, say A and B. A lottery machine draws lots, which carry the numbers of the balls. When the number of a ball is drawn, the ball is put from one container into the other. This setup is already sufficient for a simulation of the equilibrium condition. The more balls are in a
container, the more likely it is that the number of one of its balls is drawn
and a transfer occurs into the other container. Just as it occurs with chem-
ical reactions we have self-controlling fluctuations: whenever a fluctuation becomes large it creates a compensating force which is proportional to the size of the fluctuation.
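The game is easily replayed in silico. A minimal simulation sketch (assuming NumPy), in which the stationary occupancy of container A is binomially distributed with mean N and standard deviation √(N/2), so relative fluctuations scale as 1/√N:

```python
import numpy as np

rng = np.random.default_rng(42)

def ehrenfest(N=1000, steps=200_000):
    """Ehrenfest urn with 2N balls; returns occupancy of container A over time."""
    nA = 2 * N                      # start with all balls in container A
    traj = np.empty(steps, dtype=int)
    for t in range(steps):
        # drawing a ball uniformly: with probability nA/(2N) it sits in A
        if rng.random() < nA / (2 * N):
            nA -= 1                 # ball moves A -> B
        else:
            nA += 1                 # ball moves B -> A
        traj[t] = nA
    return traj

traj = ehrenfest()
eq = traj[len(traj) // 2 :]         # discard the transient
print(eq.mean(), eq.std())          # mean ~ N, standard deviation ~ sqrt(N/2)
```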
4.2.2 Bimolecular chemical reactions
Two classes of bimolecular reactions are accessible to full stochastic analysis:
$$\mathrm{A} + \mathrm{B} \xrightarrow{\;k\;} \mathrm{C} \qquad\text{and} \tag{4.31a}$$

$$2\,\mathrm{A} \xrightarrow{\;k\;} \mathrm{B}\,. \tag{4.31b}$$
Bimolecularity gives rise to nonlinearities in the kinetic differential equations
and in the master equations and complicates substantially the analysis of the
individual cases. At the same time, these classes of bimolecular equations
do not show essential differences in the qualitative behavior compared to
the corresponding monomolecular or linear case A −→ B in contrast to
autocatalytic processes (section 5.1). The following derivations are based
upon two publications [94, 95].
4.2.2.1 Addition reaction
In the first example (4.31a) we are dealing with three dependent stochastic variables NA(t), NB(t), and NC(t). Following McQuarrie et al. we define the probability Pn(t) = P(NA(t) = n) and apply the standard initial conditions Pn(0) = δn,n0, P(NB(0) = b) = δb,b0, and P(NC(0) = c) = δc,0. Accordingly, we have from the laws of stoichiometry NB(t) = b0 − n0 + NA(t) and NC(t) = n0 − NA(t). For simplicity we denote b0 − n0 = ∆0. Then the master equation for the chemical reaction is of the form

$$\frac{\partial P_n(t)}{\partial t} = k\,(n+1)(\Delta_0+n+1)\,P_{n+1}(t) - k\,n\,(\Delta_0+n)\,P_n(t)\,. \tag{4.31a'}$$
Figure 4.6: Irreversible bimolecular addition reaction A + B → C. The plot shows the probability distribution Pn(t) = Prob(NC(t) = n) describing the number of molecules of species C as a function of time, calculated from equation (4.37). The initial conditions are chosen to be NA(0) = a0, NB(0) = b0, and NC(0) = 0. With increasing time the peak of the distribution moves from left to right. The state n = min(a0, b0) is an absorbing state and hence the long time limit of the system is lim t→∞ Pn(t) = δn,min(a0,b0). Parameters used: a0 = 50, b0 = 51, k = 0.02 [t⁻¹·M⁻¹]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), and ∞ (black).
We remark that the birth and death rates are no longer linear in n. The corresponding PDE for the generating function is readily calculated:

$$\frac{\partial g(s,t)}{\partial t} = k\,(\Delta_0+1)(1-s)\,\frac{\partial g(s,t)}{\partial s} + k\,s(1-s)\,\frac{\partial^2 g(s,t)}{\partial s^2}\,. \tag{4.32}$$

The derivation of solutions of this PDE is quite demanding. It can be achieved by separation of variables:

$$g(s,t) = \sum_{m=0}^{\infty} A_m\,Z_m(s)\,T_m(t)\,. \tag{4.33}$$
We dispense with details and list only the coefficients and functions of the solution:

$$A_m = (-1)^m\,\frac{(2m+\Delta_0)\,\Gamma(m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(m+1)\,\Gamma(\Delta_0+1)\,\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,,$$

$$Z_m(s) = J_m(\Delta_0,\,\Delta_0+1,\,s)\,, \quad\text{and}\quad T_m(t) = \exp\bigl(-m(m+\Delta_0)\,k t\bigr)\,.$$

Herein, Γ represents the conventional gamma function with the definition Γ(x+1) = x Γ(x), and the Jn(p, q, s) are the Jacobi polynomials named after the German mathematician Carl Jacobi [103, ch. 22, pp. 773-802], which are solutions of the differential equation

$$s(1-s)\,\frac{d^2 J_n(p,q,s)}{ds^2} + \bigl(q-(p+1)\,s\bigr)\,\frac{d J_n(p,q,s)}{ds} + n(n+p)\,J_n(p,q,s) = 0\,.$$

These polynomials fulfil the following conditions:

$$\frac{d J_n(p,q,s)}{ds} = -\,\frac{n(n+p)}{q}\,J_{n-1}(p+2,\,q+1,\,s) \quad\text{and}$$

$$\int_0^1 s^{q-1}(1-s)^{p-q}\,J_n(p,q,s)\,J_\ell(p,q,s)\,ds = \frac{n!\,\bigl(\Gamma(q)\bigr)^2\,\Gamma(n+p-q+1)}{(2n+p)\,\Gamma(n+p)\,\Gamma(n+q)}\;\delta_{\ell,n}\,.$$
At the relevant value of the dummy variable, s = 1, we differentiate twice and find:

$$\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} = \sum_{m=1}^{n_0}\frac{(2m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,T_m(t)\,, \tag{4.34}$$

$$\left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} = \sum_{m=2}^{n_0}\frac{(m-1)(m+\Delta_0+1)(2m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)}\,T_m(t)\,, \tag{4.35}$$

from which we obtain expectation value and variance according to subsection 2.7.1:

$$E\bigl(N_A(t)\bigr) = \left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} \quad\text{and}$$

$$\sigma^2\bigl(N_A(t)\bigr) = \left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} + \left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} - \left(\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1}\right)^{\!2}\,. \tag{2.71'}$$
As we see in the current example, and as we shall see in the next subsubsection, bimolecularity complicates the solution of the chemical master equations substantially and makes it quite sophisticated. We dispense here with the detailed expressions but provide the results for the special case of vast excess of one reaction partner, |∆0| ≫ n0 > 1, which is known as the pseudo first order condition. Then the sums can be approximated well by their first terms and we find (with k′ = ∆0 k):

$$\left(\frac{\partial g(s,t)}{\partial s}\right)_{s=1} \approx\; n_0\,\frac{\Delta_0+2}{n_0+\Delta_0+1}\;e^{-(\Delta_0+1)\,k t} \;\approx\; n_0\,e^{-k' t} \quad\text{and}\quad \left(\frac{\partial^2 g(s,t)}{\partial s^2}\right)_{s=1} \approx\; n_0\,(n_0-1)\,e^{-2 k' t}\,,$$

and we obtain finally

$$E\bigl(N_A(t)\bigr) = n_0\,e^{-k' t} \quad\text{and}\quad \sigma^2\bigl(N_A(t)\bigr) = n_0\,e^{-k' t}\bigl(1-e^{-k' t}\bigr)\,, \tag{4.36}$$

which is essentially the same result as obtained for the irreversible first order reaction.
For the calculation of the probability density we make use of a slightly different definition of the stochastic variables and use NC(t), counting the number of molecules C in the system: Pn(t) = P(NC(t) = n). With the initial condition Pn(0) = δn,0 and the upper limit of n being c = min(a0, b0), where a0 and b0 are the sharply defined numbers of A and B molecules initially present (NA(0) = a0, NB(0) = b0), we have lim t→∞ Pn(t) = δn,c,

$$\sum_{n=0}^{c} P_n(t) = 1 \quad\text{and thus}\quad P_n(t) = 0 \;\;\forall\;(n\notin[0,c],\,n\in\mathbb{Z})\,,$$

and the master equation is now of the form

$$\frac{\partial P_n(t)}{\partial t} = k\,\bigl(a_0-(n-1)\bigr)\bigl(b_0-(n-1)\bigr)\,P_{n-1}(t) - k\,(a_0-n)(b_0-n)\,P_n(t)\,. \tag{4.31a''}$$
In order to solve the master equation (4.31a'') the probability distribution Pn(t) is Laplace transformed, which converts the master equation, a set of differential-difference equations, into a set of pure difference equations:

$$q_n(s) = \int_0^{\infty}\exp(-s\,t)\,P_n(t)\,dt\,,$$

and with the initial condition Pn(0) = δn,0 we obtain

$$-1 + s\,q_0(s) = -\,k\,a_0 b_0\,q_0(s)\,,$$

$$s\,q_n(s) = k\,\bigl(a_0-(n-1)\bigr)\bigl(b_0-(n-1)\bigr)\,q_{n-1}(s) - k\,(a_0-n)(b_0-n)\,q_n(s)\,, \quad 1\le n\le c\,.$$

Successive iteration yields the solutions in terms of the functions qn(s):

$$q_n(s) = \binom{a_0}{n}\binom{b_0}{n}\,(n!)^2\,k^n\prod_{j=0}^{n}\frac{1}{s+k(a_0-j)(b_0-j)}\,, \quad 0\le n\le c\,,$$

and after converting the product into partial fractions and inverse transformation one finds the result

$$P_n(t) = (-1)^n\binom{a_0}{n}\binom{b_0}{n}\sum_{j=0}^{n}(-1)^j\left(1+\frac{n-j}{a_0+b_0-n-j}\right)\binom{n}{j}\binom{a_0+b_0-j}{n}^{\!-1} e^{-k(a_0-j)(b_0-j)\,t}\,. \tag{4.37}$$
An illustrative example is shown in figure 4.6. The difference between the two irreversible reactions, the monomolecular conversion (figure 4.4) and the bimolecular addition reaction (figure 4.6), is indeed not spectacular.
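Since a naive evaluation of (4.37) runs into factorial overflows for larger a0 and b0, a robust numerical check is to integrate the master equation (4.31a'') directly as a linear ODE system. A minimal sketch (assuming NumPy and SciPy), with the parameters of figure 4.6:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Master equation (4.31a'') for A + B -> C, with P_n(t) = Prob(N_C(t) = n).
a0, b0, k = 50, 51, 0.02
c = min(a0, b0)                          # absorbing upper state

def rhs(t, P):
    n = np.arange(c + 1)
    loss = k * (a0 - n) * (b0 - n) * P   # outflow from state n
    dP = -loss
    dP[1:] += loss[:-1]                  # inflow from state n - 1
    return dP

P0 = np.zeros(c + 1); P0[0] = 1.0        # P_n(0) = delta_{n,0}
sol = solve_ivp(rhs, (0.0, 20.0), P0, t_eval=[0.5, 5.0, 20.0], rtol=1e-8)
print(sol.y[:, -1].argmax())             # peak approaches n = min(a0, b0)
```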
4.2.2.2 Dimerization reaction
When the dimerization reaction (4.31b) is modeled by means of a master equation [94] we have to take into account that two molecules A vanish at a time, and an individual jump always involves ∆n = 2:

$$\frac{\partial P_n(t)}{\partial t} = \frac{1}{2}\,k\,(n+2)(n+1)\,P_{n+2}(t) - \frac{1}{2}\,k\,n(n-1)\,P_n(t)\,, \tag{4.31b'}$$
Figure 4.7: Irreversible dimerization reaction 2A → C. The plot shows the probability distribution Pn(t) = Prob(NA(t) = n) describing the number of molecules of species A as a function of time, calculated from equation (4.42). The number of molecules C is given by the distribution Pm(t) = Prob(NC(t) = m). The initial conditions are chosen to be NA(0) = a0 and NC(0) = 0, and hence we have n + 2m = a0. With increasing time the peak of the distribution moves from right to left. The state n = 0 is an absorbing state and hence the long time limit of the system is lim t→∞ Pn(t) = δn,0 and lim t→∞ Pm(t) = δm,a0/2. Parameters used: a0 = 100 and k = 0.02 [t⁻¹·M⁻¹]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), 50.0 (chartreuse), and ∞ (black).
which gives rise to the following PDE for the probability generating function:

$$\frac{\partial g(s,t)}{\partial t} = \frac{k}{2}\,(1-s^2)\,\frac{\partial^2 g(s,t)}{\partial s^2}\,. \tag{4.38}$$
The analysis of this PDE is more involved than it might look at first glance. Nevertheless, an exact solution similar to (4.33) is available:

$$g(s,t) = \sum_{m=0}^{\infty} A_m\,C_m^{-1/2}(s)\,T_m(t)\,, \tag{4.39}$$

wherein the parameters and functions are defined by

$$A_m = \frac{1-2m}{2m}\cdot\frac{\Gamma(n_0+1)\,\Gamma\bigl[(n_0-m+1)/2\bigr]}{\Gamma(n_0-m+1)\,\Gamma\bigl[(n_0+m+1)/2\bigr]}\,,$$

$$C_m^{-1/2}(s):\qquad (1-s^2)\,\frac{d^2 C_m^{-1/2}(s)}{ds^2} + m(m-1)\,C_m^{-1/2}(s) = 0\,,$$

$$T_m(t) = \exp\Bigl(-\tfrac{1}{2}\,k\,m(m-1)\,t\Bigr)\,.$$

The functions C_m^{-1/2}(s) are ultraspherical or Gegenbauer polynomials, named after the Austrian mathematician Leopold Gegenbauer [103, ch. 22, pp. 773-802]. They are solutions of the differential equation shown above and belong to the family of hypergeometric functions. It is straightforward to write down expressions for the expectation value and the variance of the stochastic variable NA(t) (µ stands for an integer running index, µ ∈ ℕ):

$$E\bigl(N_A(t)\bigr) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2\rfloor} A_m\,T_m(t) \quad\text{and}$$

$$\sigma^2\bigl(N_A(t)\bigr) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2\rfloor}\left(\frac{1}{2}\,(m^2-m+2)\,A_m\,T_m(t) + A_m^2\,T_m^2(t)\right). \tag{4.40}$$
(4.40)
In order to obtain concrete results these expressions can be readily evaluated
numerically.
There is one interesting detail in the deterministic version of the dimerization reaction. It is conventionally modeled by the differential equation (4.41a), which can be solved readily. The correct ansatz, however, would be (4.41b), for which we also have an exact solution (with [A] = a(t) and a(0) = a0):

$$-\frac{da}{dt} = k\,a^2 \;\;\Longrightarrow\;\; a(t) = \frac{a_0}{1+a_0 k t} \qquad\text{and} \tag{4.41a}$$

$$-\frac{da}{dt} = k\,a(a-1) \;\;\Longrightarrow\;\; a(t) = \frac{a_0}{a_0+(1-a_0)\,e^{-kt}}\,. \tag{4.41b}$$
The expectation value of the stochastic solution lies always between the solu-
tion curves (4.41a) and (4.41b). An illustrative example is shown in figure 4.7.
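A minimal sketch (assuming NumPy) comparing the two deterministic solutions; for a particle number as small as a0 = 10 (an illustrative value) the difference is clearly visible:

```python
import numpy as np

# Deterministic dimerization: conventional ansatz (4.41a) vs. corrected (4.41b).
a0, k = 10.0, 0.02
t = np.linspace(0.0, 50.0, 6)

a_conv = a0 / (1.0 + a0 * k * t)                   # eq. (4.41a), a(t) -> 0
a_corr = a0 / (a0 + (1.0 - a0) * np.exp(-k * t))   # eq. (4.41b), a(t) -> 1

for ti, x, y in zip(t, a_conv, a_corr):
    print(f"t={ti:5.1f}  a(4.41a)={x:7.4f}  a(4.41b)={y:7.4f}")
```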
As in the previous subsubsection 4.2.2.1 we consider also a solution of the master equation by means of a Laplace transformation [95]. Since we are dealing with a step size of two molecules A converted into one molecule C, the master equation is defined only for odd or only for even numbers of molecules A. For an initial number of 2a0 molecules and a probability P2n(t) = P(NA(t) = 2n) we have the initial conditions NA(0) = 2a0, NC(0) = 0 and the condition that all probabilities outside the interval [0, 2a0] as well as the odd probabilities P2n−1 (n = 1, . . . , a0) vanish:

$$\frac{\partial P_{2n}(t)}{\partial t} = -\frac{1}{2}\,k\,(2n)(2n-1)\,P_{2n}(t) + \frac{1}{2}\,k\,(2n+2)(2n+1)\,P_{2n+2}(t)\,. \tag{4.31b''}$$

The probability distribution P2n(t) is derived as in the previous subsection by Laplace transformation,

$$q_{2n}(s) = \int_0^{\infty}\exp(-s\,t)\,P_{2n}(t)\,dt\,,$$

yielding the set of difference equations

$$-1 + s\,q_{2a_0}(s) = -\frac{1}{2}\,k\,(2a_0)(2a_0-1)\,q_{2a_0}(s)\,,$$

$$s\,q_{2n}(s) = -\frac{1}{2}\,k\,(2n)(2n-1)\,q_{2n}(s) + \frac{1}{2}\,k\,(2n+2)(2n+1)\,q_{2n+2}(s)\,, \quad 0\le n\le a_0-1\,,$$
which again can be solved by successive iteration. It is straightforward to calculate first the Laplace transform for the state with 2(a0 − m) molecules A remaining, where m = [C] with 0 ≤ m ≤ a0 counts the molecules C that have been formed:

$$q_{2(a_0-m)}(s) = \left(\frac{k}{2}\right)^{m}\binom{2a_0}{2m}\,(2m)!\prod_{j=0}^{m}\Bigl(s+\frac{k}{2}\,\bigl(2(a_0-j)\bigr)\bigl(2(a_0-j)-1\bigr)\Bigr)^{-1}\,,$$

and a somewhat tedious but straightforward exercise in algebra yields the inverse Laplace transform:

$$P_{2(a_0-m)}(t) = (-1)^m\,\frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!}\times\sum_{j=0}^{m}(-1)^j\,\frac{(4a_0-4j-1)\,(4a_0-2m-2j-3)!!}{j!\,(m-j)!\,(4a_0-2j-1)!!}\;e^{-k\,(a_0-j)\bigl(2(a_0-j)-1\bigr)\,t}\,.$$
The substitution i = a0 − j leads to

$$P_{2(a_0-m)}(t) = (-1)^m\,\frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!}\times\sum_{i=a_0-m}^{a_0}(-1)^{a_0-i}\,\frac{(4i-1)\,(2a_0-2m+2i-3)!!}{(a_0-i)!\,(i+m-a_0)!\,(2a_0+2i-1)!!}\;e^{-k\,i(2i-1)\,t}\,.$$

Setting now n = a0 − m in accord with the definition of m we obtain the final result

$$P_{2n}(t) = (-1)^{n}\,\frac{a_0!\,(2a_0-1)!!}{n!\,(2n-1)!!}\sum_{i=n}^{a_0}(-1)^{i}\,\frac{(4i-1)\,(2n+2i-3)!!}{(a_0-i)!\,(i-n)!\,(2a_0+2i-1)!!}\;e^{-k\,i(2i-1)\,t}\,. \tag{4.42}$$

The results are illustrated by means of a numerical example in figure 4.7.
4.3 Fokker-Planck approximation of master equations
It is often desirable to use a single differential equation for the description of probability distributions P(x) of jump processes. The obvious strategy is to approximate the master equation by a Fokker-Planck equation, and implicitly such an approach had already been used by Albert Einstein in his famous work on Brownian motion [33]. A nonrigorous but straightforward derivation of an expansion of the master equation was given by the Dutch physicist Hendrik Kramers [104], and later the Kramers approach was substantially improved by the mathematical physicist José Moyal [105]. This approach is generally known as the Kramers-Moyal expansion. A differently motivated and rigorous expansion is due to the Dutch theoretical physicist Nicolaas van Kampen, who was a PhD student of Hendrik Kramers. His expansion, known as the van Kampen size expansion, is based on a systematic expansion in a size-dependent parameter and retains only those terms that do not vanish in the limit of macroscopic extensions.
Before we discuss these expansions and approximations, we shall first introduce the reverse approach: a diffusion process can always be approximated by a jump process, whereas the inverse is not always true. We shall encounter such a situation in the case of the Poisson process, which cannot be simulated by a suitable Fokker-Planck equation. Basic for the conversion in the limit of vanishing step size is an expansion of the transition probabilities that is truncated after the second term, as in the case of the derivation of the Chapman-Kolmogorov equation (subsection 3.1.2).
4.3.1 Diffusion process approximated by a jump process
The typical result is derived for a random walk (subsection 3.4.2), where the master equation becomes a Fokker-Planck equation in the limit of infinitely small step size. Jumps, in this case, must become smaller and more probable simultaneously, and this is taken care of by a scaling assumption that is encapsulated in a parameter δ in the following way: the average step size and the variance of the step size are proportional to δ, whereas the jump probabilities increase as δ becomes smaller.
First we introduce a new variable y = (z − x − A(x)δ)/√δ and write for the jump probability

$$W_\delta(z|x) = \delta^{-3/2}\,\Phi(y,x) \quad\text{with}\quad \int dy\,\Phi(y,x) = Q \quad\text{and}\quad \int dy\;y\,\Phi(y,x) = 0\,. \tag{4.43}$$

Now we define three terms for a series expansion in the jump moments,

$$\alpha_0(x) \equiv \int dz\,W_\delta(z|x) = Q/\delta\,,$$

$$\alpha_1(x) \equiv \int dz\,(z-x)\,W_\delta(z|x) = A(x)\,Q\,, \tag{4.44}$$

$$\alpha_2(x) \equiv \int dz\,(z-x)^2\,W_\delta(z|x) = \int dy\;y^2\,\Phi(y,x)\,,$$

and assume that the function Φ(y, x) vanishes sufficiently fast as y → ∞ in order to guarantee that

$$\lim_{\delta\to 0} W_\delta(z|x) = \lim_{y\to\infty}\left(\Bigl(\frac{y}{z-x}\Bigr)^{3}\,\Phi(y,x)\right) = 0 \quad\text{for}\; z\ne x\,.$$

Next we choose a twice differentiable function f(z) and carry out a procedure that is very similar to the derivation of the differential Chapman-Kolmogorov equation in section 3.1.2 and find

$$\lim_{\delta\to 0}\left\langle\frac{\partial f(z)}{\partial t}\right\rangle = \left\langle\alpha_1(z)\,\frac{\partial f(z)}{\partial z} + \frac{1}{2}\,\alpha_2(z)\,\frac{\partial^2 f(z)}{\partial z^2}\right\rangle.$$
This result has the consequence that in the limit δ → 0 the master equation

$$\frac{\partial P(x)}{\partial t} = \int dz\,\bigl(W(x|z)\,P(z) - W(z|x)\,P(x)\bigr) \tag{4.45a}$$

becomes the FPE

$$\frac{\partial P(x)}{\partial t} = -\frac{\partial}{\partial x}\bigl(\alpha_1\,P(x)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl(\alpha_2\,P(x)\bigr)\,. \tag{4.45b}$$

Accordingly, one can always construct such an approximating master equation if the requirements imposed by the three α-functions (4.44) are met. In case these criteria are not fulfilled, no approximation is possible. The approximation is illustrated by means of three examples:
Random walk. Based on the notation introduced in subsection 3.4.2 we find for x = n·l:

$$W(x|z) = \vartheta\,(\delta_{z,x-l}+\delta_{z,x+l}) \;\Longrightarrow\; \alpha_0(x) = 2\vartheta\,,\;\; \alpha_1(x) = 0\,,\;\; \alpha_2(x) = 2\,l^2\vartheta\,.$$

With δ = l² and D = l²ϑ we obtain the familiar diffusion equation

$$\frac{\partial P(x,t)}{\partial t} = D\,\frac{\partial^2 P(x,t)}{\partial x^2}\,. \tag{4.46}$$
Poisson process. With the notation used in subsection 3.4.1 (except α ↔ ϑ) and x = n·l we find:

$$W(x|z) = \vartheta\,\delta_{z,x+l} \;\Longrightarrow\; \alpha_0(x) = \vartheta\,,\;\; \alpha_1(x) = l\,\vartheta\,,\;\; \alpha_2(x) = l^2\vartheta\,.$$

In this case there is no way to define l and ϑ as functions of δ such that α1(x) and α2(x) both remain finite and nonvanishing in the limit l → 0. There is no Fokker-Planck limit for the Poisson process.
General approximation of diffusion by birth-and-death master equations. We begin with a master equation of the class

$$W_\delta(z|x) = \left(\frac{A(x)}{2\delta}+\frac{B(x)}{2\delta^2}\right)\delta_{z,x+\delta} + \left(-\frac{A(x)}{2\delta}+\frac{B(x)}{2\delta^2}\right)\delta_{z,x-\delta}\,, \tag{4.47}$$

where Wδ(z|x) is positive for sufficiently small δ. Under the assumption that this is fulfilled for the entire range of interest for x, the process takes place on a range of x that is composed of integer multiples of δ.⁸ In the limit δ → 0 the birth-and-death process is converted into a Fokker-Planck equation with α0(x) = B(x)/δ², α1(x) = A(x), α2(x) = B(x), and

$$\lim_{\delta\to 0} W_\delta(z|x) = 0 \quad\text{for}\; z\ne x\,. \tag{4.48}$$

⁸ We remark that the scaling relations (4.43) and (4.47) are not the same, but both lead to a Fokker-Planck equation.
Although α0(x) diverges with 1/δ² – in contrast to (4.44), where we prescribed the required 1/δ behavior – and the picture of jumps converging smoothly into a continuous distribution is no longer valid, there exists a limiting Fokker-Planck equation, because the behavior of α0(x) is irrelevant:

$$\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl(A(x)\,P(x,t)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl(B(x)\,P(x,t)\bigr)\,. \tag{4.49}$$

Equation (4.47) provides a tool for the simulation of a diffusion process by an approximating birth-and-death process. The method, however, fails if B(x) = 0 on the range of x of interest, since then Wδ(z|x) does not fulfil the criterion of being nonnegative.
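A minimal simulation sketch (assuming NumPy) of the recipe (4.47), here for the illustrative choice A(x) = −x and B(x) = 1 (an Ornstein-Uhlenbeck-like diffusion; these functions are not an example from the text). The stationary variance of the jump process approaches the Fokker-Planck value 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Birth-and-death approximation (4.47) of dx = A(x) dt + sqrt(B(x)) dW.
A = lambda x: -x        # drift (illustrative choice)
B = lambda x: 1.0       # diffusion coefficient (illustrative choice)
delta = 0.05            # lattice spacing; jump rates scale as in (4.47)

def simulate(x0=0.0, t_end=200.0):
    x, t, samples = x0, 0.0, []
    while t < t_end:
        wp = A(x) / (2 * delta) + B(x) / (2 * delta**2)   # rate x -> x + delta
        wm = -A(x) / (2 * delta) + B(x) / (2 * delta**2)  # rate x -> x - delta
        total = wp + wm
        t += rng.exponential(1.0 / total)                 # waiting time
        x += delta if rng.random() < wp / total else -delta
        if t > 20.0:                                      # discard transient
            samples.append(x)
    return np.array(samples)

print(simulate().var())   # ~ 0.5, the stationary variance of the FPE limit
```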
4.3.2 Kramers-Moyal expansion
The derivation starts from the master equation (4.45a) and a substitution of z defined by y = x − z in the first term and y = z − x in the second term. We redefine also the elements of the transition matrix,

$$T(y,x) = W(x+y\,|\,x)\,,$$

and the master equation is now of the form

$$\frac{\partial P(x,t)}{\partial t} = \int dy\,\bigl(T(y,\,x-y)\,P(x-y,t) - T(y,x)\,P(x,t)\bigr)\,, \tag{4.45a'}$$

and the integral is readily expanded in a power series:

$$\frac{\partial P(x,t)}{\partial t} = \int dy\sum_{n=1}^{\infty}\frac{(-y)^n}{n!}\,\frac{\partial^n}{\partial x^n}\bigl(T(y,x)\,P(x,t)\bigr) = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\,\frac{\partial^n}{\partial x^n}\bigl(\alpha_n(x)\,P(x,t)\bigr)\,, \tag{4.50}$$

where the n-th derivative moment is defined by

$$\alpha_n(x) = \int dz\,(z-x)^n\,W(z|x) = \int dy\;y^n\,T(y,x)\,. \tag{4.50'}$$

In case the Kramers-Moyal expansion is terminated at the second term, the result is a Fokker-Planck equation of the form (4.45b).
4.3.3 Size expansion of the chemical master equation
Although we were able to analyze a number of representative examples by solving the one step birth-and-death master equation exactly, the actual applicability of this technique to specific problems of chemical kinetics is rather limited. In order to apply a chemical master equation to a problem in practice one is commonly dealing with about 10¹² particles or more. Upscaling discloses one particular problem that is related to size expansion and that becomes virulent in the transition from the master equation to a Fokker-Planck equation. The problem is intimately related to the parameter volume V, which is the best possible estimator of system size. We distinguish two classes of quantities: (i) intensive quantities that are independent of system size, and (ii) extensive quantities that grow proportional to system size. Examples of intensive properties are temperature, pressure, and density; extensive properties are volume, energy, or entropy. In upscaling from, say, 1000 to 10¹² particles, extensive properties grow by a factor of 10⁹ whereas intensive properties remain the same. Some pairs of properties – one extensive and one intensive – are of particular importance, for example particle number N and concentration c = N/(V·NL), or mass M and (volume) density ϱ = M/V.
In order to compensate for the lack of generality, approximation methods were developed which turned out to be particularly useful in the limit of sufficiently large particle numbers [98]. The Dutch theoretical physicist Nicolaas van Kampen expands the master equation in the inverse square root of some extensive quantity – particle number, mass, or volume – which is characteristic of system size and which will be denoted by Ω. In van Kampen's notation,

$$a \propto \Omega = \text{extensive variable}\,, \qquad x = a/\Omega = \text{intensive variable}\,, \tag{4.51}$$

the limit of interest is a large value of Ω at fixed x, which is tantamount to the transition to a macroscopic system. The transition probabilities are reformulated as

$$W(a|a') = W(a';\,\Delta a) \quad\text{with}\quad \Delta a = a-a'\,,$$

and scaled according to the assumption

$$W(a|a') = \Omega\,\psi\!\left(\frac{a'}{\Omega},\,\Delta a\right).$$
The essential trick in the van Kampen expansion is that the size of the jump is expressed in terms of an extensive quantity, ∆a, whereas the intensive variable x is used for the expression of the dependence on the position variable, a′. The expansion is made now in the new variable z defined by

$$a = \Omega\,\phi(t) + \Omega^{1/2}\,z\,,$$

where the function φ(t) is still to be determined. The derivative moments αn(a) are now proportional to the system size Ω and therefore we can scale them accordingly: αn(a) = Ω αn(x). In the next step the new variable z is introduced into the Kramers-Moyal expansion (4.50):

$$\frac{\partial P(z,t)}{\partial t} - \Omega^{1/2}\,\frac{\partial\phi}{\partial t}\,\frac{\partial P(z,t)}{\partial z} = \sum_{n=1}^{\infty}\frac{(-1)^n\,\Omega^{1-n/2}}{n!}\,\frac{\partial^n}{\partial z^n}\Bigl(\alpha_n\bigl(\phi(t)+\Omega^{-1/2}z\bigr)\,P(z,t)\Bigr)\,.$$

For general validity of an expansion all terms of a certain order in the expansion parameter must vanish separately. We make use of this property to define φ(t) such that the terms of order Ω^{1/2} are eliminated by demanding

$$\frac{\partial\phi}{\partial t} = \alpha_1\bigl(\phi(t)\bigr)\,. \tag{4.52}$$

This equation is an ODE determining φ(t) and is, of course, in full agreement with the deterministic equation for the expectation value of the random variable. Accordingly, φ(t) is indeed the deterministic part of the solution.
The next step is an expansion of αn(φ(t)+Ω⁻¹ᐟ²z) in Ω⁻¹ᐟ² and a reordering of terms, yielding

$$\frac{\partial P(z,t)}{\partial t} = \sum_{m=2}^{\infty}\frac{\Omega^{-(m-2)/2}}{m!}\sum_{n=1}^{m}(-1)^n\binom{m}{n}\,\alpha_n^{(m-n)}\bigl(\phi(t)\bigr)\,\frac{\partial^n}{\partial z^n}\bigl(z^{m-n}\,P(z,t)\bigr)\,.$$

In taking the limit of large system size Ω all terms vanish except the one with m = 2, and we find the result

$$\frac{\partial P(z,t)}{\partial t} = -\,\alpha_1^{(1)}\bigl(\phi(t)\bigr)\,\frac{\partial}{\partial z}\bigl(z\,P(z,t)\bigr) + \frac{1}{2}\,\alpha_2\bigl(\phi(t)\bigr)\,\frac{\partial^2}{\partial z^2}\,P(z,t)\,, \tag{4.53}$$
where α₁⁽¹⁾ stands for the linear drift term. Since the size expansion is of fundamental importance for an understanding of the relation between microscopic and macroscopic processes, we shall also provide van Kampen's original, slightly different derivation in section 5.2.

It is straightforward to compare with the result of the Kramers-Moyal expansion (4.50) terminated after two terms:

$$\frac{\partial P(a,t)}{\partial t} = -\frac{\partial}{\partial a}\bigl(\alpha_1(a)\,P(a,t)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial a^2}\bigl(\alpha_2(a)\,P(a,t)\bigr)\,.$$

The change of variables x = a/Ω leads to

$$\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl(\alpha_1(x)\,P(x,t)\bigr) + \frac{1}{2\Omega}\,\frac{\partial^2}{\partial x^2}\bigl(\alpha_2(x)\,P(x,t)\bigr)\,.$$

The application of small noise theory with ε² = Ω⁻¹ and the substitution z = Ω¹ᐟ²(x − φ(t)) yields the lowest order Fokker-Planck equation, which is exactly the same as the lowest order approximation in the van Kampen expansion. This result has an important consequence: if we are only interested in the lowest order approximation we may use the Kramers-Moyal equation, which is much easier to derive than the van Kampen equation.
If only the small noise approximation is approximately valid, then it is appropriate to consider only the linearization of the drift term, and individual solutions of this equation are represented by the trajectories of the stochastic differential equation

$$dz = \alpha_1^{(1)}\bigl(\phi(t)\bigr)\,z\,dt + \sqrt{\alpha_2\bigl(\phi(t)\bigr)}\;dW(t)\,. \tag{4.54}$$

Eventually, we have found a procedure that approximately relates master equations, Fokker-Planck equations, and stochastic differential equations, and closes the gap between microscopic stochasticity and macroscopic behavior.
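A minimal sketch (assuming NumPy) of how trajectories of equation (4.54) can be generated with the Euler-Maruyama scheme; the drift and diffusion functions are those of the A⇌B example treated below, α₁⁽¹⁾ = −k₁ and α₂(φ(t)) = k₂β + k₁φ(t):

```python
import numpy as np

rng = np.random.default_rng(7)

# Euler-Maruyama integration of the linear-noise SDE (4.54) for A <=> B.
k1, k2, beta = 2.0, 1.0, 40.0
phi, z = 0.0, 0.0
dt, steps = 1e-3, 5000

for _ in range(steps):
    phi += (k2 * beta - k1 * phi) * dt     # deterministic part, eq. (4.52)
    z += -k1 * z * dt + np.sqrt((k2 * beta + k1 * phi) * dt) * rng.normal()

# phi -> k2*beta/k1 = 20; z fluctuates around 0 with stationary variance ~ 20
print(phi, z)
```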
The chemical reaction A⇌B as an example. The transition probabilities for the interval t′ → t of the corresponding single step birth-and-death master equation – with [A]t = a(t), [A]t′ = a′, a fixed buffered concentration [B] = b, and the reaction rate parameters k1 and k2 – are:

$$W(a|a') = \delta_{a,a'+1}\,k_2 b + \delta_{a,a'-1}\,k_1 a'\,.$$
Figure 4.8: Comparison of expansions of the master equation. The reaction A⇌B with B buffered, [B] = b = b0, is chosen as example and the exact solution (black) is compared with the results of the Kramers-Moyal expansion (red) and the van Kampen size expansion (blue). Parameter choice: V = 1, k1 = 2, k2 = 1, b = 40.
Now we choose the volume of the system, V, as size parameter and have a = αV and b = βV. This leads to the scaled transition probability

$$W(\alpha';\,\Delta a) = V\bigl(k_2\beta\,\delta_{\Delta a,1} + k_1\alpha'\,\delta_{\Delta a,-1}\bigr)\,,$$

and the first two derivative moments

$$\alpha_1 = \sum_{(a')}(a'-a)\,W(a'|a) = k_2 b - k_1 a = V(k_2\beta - k_1\alpha)\,,$$

$$\alpha_2 = \sum_{(a')}(a'-a)^2\,W(a'|a) = k_2 b + k_1 a = V(k_2\beta + k_1\alpha)\,.$$
Following the procedure of van Kampen's expansion we define

$$a = V\phi(t) + V^{1/2}\,z \tag{4.55}$$

and obtain for the deterministic differential equation and its solution:

$$\frac{d\phi(t)}{dt} = k_2\beta - k_1\,\phi(t) \quad\text{and}\quad \phi(t) = \phi(0)\,e^{-k_1 t} + \frac{k_2\beta}{k_1}\,\bigl(1-e^{-k_1 t}\bigr)\,.$$

The Fokker-Planck equation takes on the form

$$\frac{\partial P(z)}{\partial t} = k_1\,\frac{\partial}{\partial z}\bigl(z\,P(z)\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial z^2}\Bigl(\bigl(k_2\beta + k_1\,\phi(t)\bigr)\,P(z)\Bigr)\,.$$

The expectation value of z is readily computed to be ⟨z(t)⟩ = z(0) e^{−k₁t}. Since the partition of the variable a in equation (4.55) is arbitrary we can assume z(0) = 0 – as usual⁹ – and find for the variance in z

$$\sigma^2\bigl(z(t)\bigr) = \left(\frac{k_2\beta}{k_1} + \phi(0)\,e^{-k_1 t}\right)\bigl(1-e^{-k_1 t}\bigr)\,,$$

and eventually obtain for the solutions in the macroscopic variable a, with a(0) = Vφ(0),

$$\langle a(t)\rangle = V\phi(t) = a(0)\,e^{-k_1 t} + \frac{k_2 b}{k_1}\,\bigl(1-e^{-k_1 t}\bigr)\,,$$

$$\sigma^2\bigl(a(t)\bigr) = V\,\sigma^2\bigl(z(t)\bigr) = \left(\frac{k_2 b}{k_1} + a(0)\,e^{-k_1 t}\right)\bigl(1-e^{-k_1 t}\bigr)\,.$$
Finally, we compare the different stationary state solutions. With α = k2 b/k1 the van Kampen expansion yields

$$P(z) = \frac{1}{\sqrt{\pi\alpha/2}\,\bigl(1+\operatorname{erf}(\sqrt{\alpha/2})\bigr)}\,\exp\!\left(-\frac{(z-\alpha)^2}{2\alpha}\right),$$

the Kramers-Moyal expansion yields

$$P(a) = \mathcal{N}\,(k_2 b + k_1 a)^{-1+4 k_2 b/k_1}\,e^{-2a}\,,$$

and the exact solution is

$$P(a) = \frac{\bigl(k_2 b/k_1\bigr)^a\,\exp\bigl(-k_2 b/k_1\bigr)}{a!} = \frac{\alpha^a\,e^{-\alpha}}{a!}\,,$$

which is a Poissonian. A comparison of numerical plots is shown in figure 4.8.
⁹ The assumption z(0) = 0 implies ⟨z(t)⟩ = 0 and hence the corresponding stochastic variable Z(t) describes the fluctuations around zero.
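The comparison of figure 4.8 is easy to reproduce on a grid. A minimal numerical sketch (assuming NumPy and SciPy), in which the Kramers-Moyal density is normalized numerically, absorbing the constant N:

```python
import numpy as np
from scipy.special import erf, gammaln

# Stationary distributions for A <=> B with buffered B (V = 1, k1 = 2, k2 = 1, b = 40).
k1, k2, b = 2.0, 1.0, 40.0
alpha = k2 * b / k1                    # = 20, mean of the exact Poissonian
a = np.arange(0, 61)

# exact solution of the master equation: Poisson distribution
P_exact = np.exp(a * np.log(alpha) - alpha - gammaln(a + 1))

# van Kampen size expansion: Gaussian restricted to a >= 0
P_vK = np.exp(-(a - alpha) ** 2 / (2 * alpha))
P_vK /= np.sqrt(np.pi * alpha / 2) * (1 + erf(np.sqrt(alpha / 2)))

# Kramers-Moyal expansion: P(a) ~ (k2 b + k1 a)^(-1 + 4 k2 b/k1) exp(-2a)
logP = (-1 + 4 * k2 * b / k1) * np.log(k2 * b + k1 * a) - 2 * a
P_KM = np.exp(logP - logP.max()); P_KM /= P_KM.sum()

print(P_exact.argmax(), P_vK.argmax(), P_KM.argmax())  # all peak near alpha = 20
```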
4.4 Numerical simulation of master equations
In this section we introduce a model for computer simulation of stochastic chemical kinetics that has been developed and put upon a solid basis by the American physicist and mathematical chemist Daniel Gillespie [1–4]. Considered is a population of N molecular species, S1, S2, . . . , SN, in the gas phase, which interact through M elementary chemical reactions (R1, R2, . . . , RM).¹⁰ Two conditions are assumed to be fulfilled by the system: (i) the container with constant volume V, in the sense of a flow reactor (CSTR in figure 4.2), is assumed to be well mixed by efficient stirring, and (ii) the system is assumed to be in thermal equilibrium at constant temperature T. The goals of the simulation are the computation of the time course of the stochastic variables – Xk(t) being the number of molecules of species Sk at time t – and the description of the evolution of the population. A single computation yields a single trajectory, very much in the sense of a single solution of a stochastic differential equation (figure 3.9), and observable results are commonly derived through sampling of trajectories. Exceptions are single molecule techniques, which allow for experimental observation of single events including whole trajectories of biopolymer folding and unfolding (see, for example, [28, 30, 106, 107]).
4.4.1 Definitions and conditions
For a reaction mechanism involving N species in M reactions the entire population is characterized by an N-dimensional random vector counting the numbers of molecules of the various species Sk,

$$\vec{\mathcal X}(t) = \bigl(\mathcal X_1(t),\,\mathcal X_2(t),\,\ldots,\,\mathcal X_N(t)\bigr)\,. \tag{4.56}$$

The common variables in chemistry are concentrations rather than particle numbers:

$$\mathrm x = \bigl(\mathrm x_1(t),\,\mathrm x_2(t),\,\ldots,\,\mathrm x_N(t)\bigr) \quad\text{with}\quad \mathrm x_k = \frac{\mathcal X_k}{V\cdot N_L}\,, \tag{4.57}$$

¹⁰ Elementary steps of chemical reactions are defined and discussed in subsection 4.1.1.
where the volume V is the appropriate expansion parameter Ω for the system size (subsection 4.3.3).¹¹ The following derivation of the chemical master equation [3, pp. 407-417] focuses on reaction channels Rµ of bimolecular nature,

$$\mathrm S_a + \mathrm S_b \longrightarrow \mathrm S_c + \ldots\,, \tag{4.58}$$

like (4.1f, 4.1i, 4.1j, and 4.1k) shown in the list (4.1). An extension to monomolecular and trimolecular reaction channels is straightforward, and zero-molecular processes like the influx of material into the reactor in the elementary step (4.1a) provide no major problems. Reversible reactions, for example (4.19), are handled as two elementary steps, A + B −→ C + D and C + D −→ A + B. In equation (4.58) we distinguish between reactant species, A and B, and product species, C . . . , of a reaction Rµ.
The two stipulations, (i) perfect mixing and (ii) thermal equilibrium, can now be cast into precise physical meanings. Premise (i) requires that the probability of finding the center of an arbitrarily chosen molecule inside a container subregion with a volume ∆V is equal to ∆V/V. The system is spatially homogeneous on macroscopic scales but it allows for random fluctuations from homogeneity. Formally, requirement (i) asserts that the position of a randomly selected molecule is described by a random variable, which is uniformly distributed over the interior of the container. Premise (ii) implies that the probability that the velocity of a randomly chosen molecule of mass m will be found to lie within an infinitesimal region dv³ around the velocity v is equal to

$$P_{MB}\,dv^3 = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m v^2/(2 k_B T)}\,dv^3\,.$$
Here, the velocity vector is denoted by v = (vx, vy, vz) in Cartesian coordinates, the infinitesimal volume element fulfils dv³ = dvx dvy dvz, the square of the velocity is v² = vx² + vy² + vz², and kB is Boltzmann's constant. Premise (ii) asserts that the velocities of molecules follow a Maxwell-Boltzmann distribution; formally it states that each Cartesian velocity component of a randomly selected molecule of mass m is represented by a random variable, which is normally distributed with mean 0 and variance kBT/m. Implicitly, the two stipulations assert that the molecular position and velocity components are all statistically independent of each other. For practical purposes, we expect premises (i) and (ii) to be valid for any dilute gas system at constant temperature in which nonreactive molecular collisions occur much more frequently than reactive molecular collisions.

¹¹ In order to distinguish random and deterministic variables, stochastic concentrations are indicated by upright fonts.

Figure 4.9: Sketch of a molecular collision in dilute gases. A spherical molecule Sa with radius ra moves with a velocity v = vb − va relative to a spherical molecule Sb with radius rb. If the two molecules are to collide within the next infinitesimal time interval dt, the center of Sb has to lie inside a cylinder of radius r = ra + rb and height v dt. The upper and lower surfaces of the cylinder are deformed into identically oriented hemispheres of radius r, and therefore the volume of the deformed cylinder is identical with that of the non-deformed one.
4.4.2 The probabilistic rate parameter
In order to derive a chemical master equation for the population variables Xk(t) we need some properties of the probability πµ(t, dt), with µ = 1, . . . , M, that a randomly selected combination of the reactant molecules for reaction Rµ at time t will react to yield products within the next infinitesimal time interval [t, t+dt[. With the assumptions made in the previous subsection 4.4.1, virtually all chemical reaction channels fulfil the condition

$$\pi_\mu(t,dt) = \gamma_\mu\,dt\,, \tag{4.59}$$
where the specific probability rate parameter γµ is independent of dt. First,
we calculate the rate parameter for a general bimolecular reaction by means
of classical collision theory and then extend briefly to mono- and trimolec-
ular reactions. Apart from the quantum mechanical approach the theory of
collisions in dilute gases is the best developed microscopic model for chemi-
cal reactions and well suited for a rigorous derivation of the master equation
from molecular motion and events.
4.4.2.1 Bimolecular reactions
The occurrence of a reaction A + B −→ C has to be preceded by a collision of an Sa molecule with an Sb molecule, and first we shall calculate the probability of such a collision in the reaction volume V. For simplicity molecular species are regarded as spheres with specific masses and radii, for example ma and ra for Sa, and mb and rb for Sb, respectively. A collision occurs whenever the center-to-center distance of the two molecules RAB decreases to (RAB)min = ra + rb. Next we define the probability that a randomly selected pair of Rµ reactant molecules at time t will collide within the next infinitesimal time interval [t, t+dt[ by π*µ(t, dt), and calculate it from the Maxwell-Boltzmann distribution of molecular velocities according to figure 4.9.

The probability that a randomly selected pair of reactant molecules of Rµ, one molecule Sa and one molecule Sb, has a relative velocity v = vb − va lying in an infinitesimal volume element dv³ about v at time t is denoted by P(v(t), Rµ) and can be readily obtained from the kinetic theory of gases:

$$P(v(t),R_\mu) = \left(\frac{\mu}{2\pi k_B T}\right)^{3/2}\exp\bigl(-\mu v^2/(2 k_B T)\bigr)\,dv^3\,.$$
Herein v = |v| = (vx² + vy² + vz²)¹ᐟ² is the value of the relative velocity and µ = ma mb/(ma + mb) is the reduced mass of the two Rµ molecules. Two properties of the probabilities P(v(t), Rµ) for different velocities v are important: (i) the elements in the set of all velocity combinations, E_{v(t),Rµ}, are mutually exclusive, and (ii) they are collectively exhaustive since v is varied over the entire three-dimensional velocity space.
Now we relate the probability P(v(t), Rµ) to a collision event Ecol by calculating the conditional probability P(Ecol(t, dt)|E_{v(t),Rµ}). In figure 4.9 we sketch the geometry of the collision event between two randomly selected spherical molecules Sa and Sb that is assumed to occur within an infinitesimal time interval dt:¹² A randomly selected molecule Sa moves along the vector v of the relative velocity vb − va between Sa and an also randomly selected molecule Sb. A collision between the molecules will take place in the interval [t, t+dt[ if and only if the center of molecule Sb is inside the spherically distorted cylinder (figure 4.9) at time t. Thus P(Ecol(t, dt)|E_{v(t),Rµ}) is the probability that the center of a randomly selected Sb molecule moving with velocity v(t) relative to the randomly selected Sa molecule will be situated at time t within a certain subregion of V, which has the volume Vcol = v dt · π(ra + rb)², and by scaling with the total volume V we obtain:¹³

$$P\bigl(E_{col}(t,dt)\,\big|\,E_{v(t),R_\mu}\bigr) = \frac{v(t)\,dt\cdot\pi\,(r_a+r_b)^2}{V}\,. \tag{4.60}$$
By substitution and integration over the entire velocity space we can calculate the desired probability:

$$\pi^*_\mu(t,dt) = \iiint_{v}\left(\frac{\mu}{2\pi k_B T}\right)^{3/2} e^{-\mu v^2/(2 k_B T)}\;\frac{v(t)\,dt\cdot\pi\,(r_a+r_b)^2}{V}\;dv^3\,.$$

Evaluation of the integral is straightforward and yields

$$\pi^*_\mu(t,dt) = \left(\frac{8\pi k_B T}{\mu}\right)^{1/2}\frac{(r_a+r_b)^2}{V}\;dt\,. \tag{4.61}$$

Apart from natural constants the expression contains the macroscopic quantities, volume V and temperature T, as well as the molecular parameters, the radii ra and rb and the reduced mass µ.
¹² The absolute time t comes into play because the positions of the molecules, ra and rb, and their velocities, va and vb, depend on t.
¹³ Implicitly in the derivation we made use of the infinitesimally small size of dt. Only if the distance v dt is vanishingly small can the possibility of collisional interference of a third molecule be neglected.
A collision is a necessary but not a sufficient condition for a reaction to take place, and therefore we introduce a collision-conditioned reaction probability pµ, the probability that a randomly selected pair of colliding Rµ reactant molecules will indeed react according to Rµ. By multiplication of independent probabilities we have

$$\pi_\mu(t,dt) = p_\mu\,\pi^*_\mu(t,dt)\,,$$

and with respect to equation (4.59) we find

$$\gamma_\mu = p_\mu\left(\frac{8\pi k_B T}{\mu}\right)^{1/2}\frac{(r_a+r_b)^2}{V}\,. \tag{4.62}$$
As said before, it is crucial for the forthcoming analysis that γµ is independent
of dt and this will be the case if and only if pµ does not depend on dt. This
is highly plausible for the above given definition, and an illustrative check
through the detailed examination of bimolecular reactions can be found in
[3, pp.413-417]. It has to be remarked, however, that the application of
classical collision theory to molecular details of chemical reactions can be
an illustrative and useful heuristic at best, because the molecular domain
falls into the realm of quantum phenomena and any theory that aims at a
derivation of reaction probabilities from first principles has to be built upon
a quantum mechanical basis.
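As a numerical sketch (assuming NumPy), equation (4.62) is straightforward to evaluate; the molecular radii, masses, and the reaction probability pµ below are illustrative values, not taken from the text:

```python
import numpy as np

# Specific probability rate parameter, eq. (4.62), for a bimolecular reaction.
kB = 1.380649e-23          # Boltzmann's constant [J/K]
T  = 300.0                 # temperature [K]
V  = 1e-18                 # reaction volume [m^3], ~ one cubic micrometre
ra, rb = 2.0e-10, 2.5e-10  # molecular radii [m] (illustrative)
ma, mb = 4.7e-26, 5.3e-26  # molecular masses [kg] (illustrative)
p_mu = 1e-3                # collision-conditioned reaction probability (illustrative)

mu = ma * mb / (ma + mb)   # reduced mass of the colliding pair
gamma_mu = p_mu * np.sqrt(8 * np.pi * kB * T / mu) * (ra + rb) ** 2 / V
print(gamma_mu)            # reaction probability per unit time for one pair [1/s]
```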
4.4.2.2 Monomolecular, trimolecular, and other reactions
A monomolecular reaction is of the form A −→ C and describes the spontaneous conversion

$$\mathrm S_a \longrightarrow \mathrm S_c\,. \tag{4.63}$$

One molecule Sa is converted into one molecule Sc. This reaction is different from a catalyzed conversion,

$$\mathrm S_a + \mathrm S_b \longrightarrow \mathrm S_c + \mathrm S_b\,, \tag{4.58'}$$
where the conversion A −→ C is initiated by a collision of an A molecule with a B molecule,¹⁴ and a description as an ordinary bimolecular process is straightforward.

The true monomolecular conversion (4.63) is driven by some quantum mechanical mechanism, similarly to the case of the radioactive decay of a nucleus. Time-dependent perturbation theory in quantum mechanics [108, pp. 724-739] shows that almost all weakly perturbed energy-conserving transitions have linear probabilities of occurrence in time intervals δt, when δt is microscopically large but macroscopically small. Therefore, to a good approximation, the probability for a radioactive nucleus to decay within the next infinitesimal time interval dt is of the form α dt, where α is some time-independent constant. On the basis of this analogy we may expect πµ(t, dt), the probability for a monomolecular conversion, to be approximately of the form γµ dt with γµ being independent of dt.
Trimolecular reactions of the form

$$\mathrm S_a + \mathrm S_b + \mathrm S_c \longrightarrow \mathrm S_d + \ldots \tag{4.64}$$

need not be considered, because collisions of three particles occur only with a probability of measure zero. There may be, however, special situations where the approximation of complicated processes by trimolecular events is justified. One example is a set of three coupled reactions with four reactant molecules [109, pp. 359-361], where it was shown that πµ(t, dt) is essentially linear in dt.

The last class of reactions to be considered here is no proper chemical reaction but an influx of material into the reactor. It is often denoted as the zeroth order reaction (4.1a):

$$* \longrightarrow \mathrm S_a\,. \tag{4.65}$$

Here, the definition of the influx and the efficient mixing or homogeneity condition is helpful, because it guarantees that the number of molecules entering the homogeneous system is a constant and does not depend on dt.

¹⁴ Remembering what has been said in subsection 3.6.2, the two reactions are related by rigorous thermodynamics: whenever the catalyzed reaction is described, an incorporation of the uncatalyzed process in the reaction mechanism is required.
4.4.3 Simulation of chemical master equations
So far we have succeeded in deriving the fundamental fact that for each elementary reaction channel Rµ with µ = 1, . . . , M, which is accessible to the molecules of a well-mixed and thermally equilibrated system in the gas phase, there exists a scalar quantity γµ, independent of dt, such that [3, p. 418]

γµ dt = probability that a randomly selected combination of Rµ reactant molecules at time t will react accordingly in the next infinitesimal time interval [t, t+dt[. (4.66)

The specific probability rate constant γµ is one of three quantities that are required to fully characterize a particular reaction channel Rµ. In addition we shall require a function hµ(n), where the vector n = (n1, . . . , nN)′ contains the exact numbers of all molecules at time t, ~N(t) = (N1(t), . . . , NN(t))′ = n(t),

hµ(n) ≡ the number of distinct combinations of Rµ reactant molecules in the system when the numbers of molecules Sk are exactly nk with k = 1, . . . , N, (4.67)

and an N × M matrix of integers S = {νkµ; k = 1, . . . , N, µ = 1, . . . , M}, where

νkµ ≡ the change in the Sk molecular population caused by the occurrence of one Rµ reaction. (4.68)
The functions hµ(n) and the matrix S are readily deduced by inspection of the algebraic structure of the reaction channels. We illustrate by means of an example:

$$R_1:\;\mathrm S_1+\mathrm S_2 \longrightarrow \mathrm S_3+\mathrm S_4\,, \qquad R_2:\;2\,\mathrm S_1 \longrightarrow \mathrm S_1+\mathrm S_5\,, \qquad R_3:\;\mathrm S_3 \longrightarrow \mathrm S_5\,. \tag{4.69}$$
The functions hµ(n) are obtained by simple combinatorics,

$$h_1(n) = n_1 n_2\,, \qquad h_2(n) = n_1(n_1-1)/2\,, \qquad h_3(n) = n_3\,,$$

and the matrix S is of the form

$$\mathbb S = \begin{pmatrix} -1 & -1 & 0\\ -1 & 0 & 0\\ +1 & 0 & -1\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix}.$$

It is worth noticing that the functional form of hµ is determined exclusively by the reactant side of Rµ. In particular, it has precisely the same form as the mass action term in the deterministic kinetic equations, with the exception that the particle numbers have to be counted exactly in small systems – n(n−1) instead of n², for example. The stoichiometric matrix S counts the net production of molecular species per elementary reaction event: νkµ is the number of molecules Sk produced by one occurrence of reaction Rµ; these numbers are integers, and negative values indicate the number of molecules which have disappeared during one reaction. In the forthcoming analysis we shall make use of the vectors corresponding to individual reactions Rµ: νµ = (ν1µ, . . . , νNµ)′.
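A small sketch (assuming NumPy) encoding the mechanism (4.69) through the matrix S and the combinatorial functions hµ(n); the rate parameters γµ below are placeholder values:

```python
import numpy as np

# Mechanism (4.69): R1: S1+S2 -> S3+S4, R2: 2 S1 -> S1+S5, R3: S3 -> S5.
# Columns of S are the state-change vectors nu_mu (products minus reactants).
S = np.array([[-1, -1,  0],
              [-1,  0,  0],
              [ 1,  0, -1],
              [ 1,  0,  0],
              [ 0,  1,  1]])

def h(n):
    """Combinatorial functions h_mu(n) for the three reaction channels."""
    return np.array([n[0] * n[1],            # h1 = n1*n2
                     n[0] * (n[0] - 1) / 2,  # h2 = n1*(n1-1)/2
                     n[2]], dtype=float)     # h3 = n3

gamma = np.array([1.0, 0.5, 2.0])   # placeholder rate parameters gamma_mu
n = np.array([100, 50, 0, 0, 0])
print(gamma * h(n))                 # propensities a_mu = gamma_mu * h_mu(n)
```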
Analogy to deterministic kinetics. It is illustrative to consider now the analogy to conventional chemical kinetics. If we denote the concentration vector of our molecular species Sk by x = (x1, . . . , xN)′ and the flux vector by ϕ = (ϕ1, . . . , ϕM)′, the kinetic equation can be expressed by

$$\frac{d\mathrm x}{dt} = \mathbb S\cdot\varphi\,. \tag{4.70}$$

The individual elements of the flux vector in mass action kinetics are

$$\varphi_\mu = k_\mu\prod_{k=1}^{N} x_k^{g_{k\mu}} \quad\text{for the reaction}\quad g_{1\mu}\,\mathrm S_1 + g_{2\mu}\,\mathrm S_2 + \ldots + g_{N\mu}\,\mathrm S_N \longrightarrow \text{products}\,,$$
wherein the factors gkµ are the stoichiometric coefficients on the reactant side of the reaction equations. It is sometimes useful to define analogous factors qkµ for the product side; both classes of factors can be summarized in matrices G and Q, and then the stoichiometric matrix is simply given by the difference S = Q − G. We illustrate by means of the model mechanism (4.69) in our example:

$$\mathbb Q - \mathbb G = \begin{pmatrix} 0 & +1 & 0\\ 0 & 0 & 0\\ +1 & 0 & 0\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix} - \begin{pmatrix} +1 & +2 & 0\\ +1 & 0 & 0\\ 0 & 0 & +1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} -1 & -1 & 0\\ -1 & 0 & 0\\ +1 & 0 & -1\\ +1 & 0 & 0\\ 0 & +1 & +1 \end{pmatrix} = \mathbb S\,.$$

We remark that the entries of G and Q are nonnegative integers by definition. The flux ϕ has the same structure as in the stochastic approach: γµ corresponds to the kinetic rate parameter or rate constant kµ, and the combinatorial function hµ and the mass action product are identical apart from the simplifications for large particle numbers.
Occurrence of reactions. The probability of the occurrence of reaction events within an infinitesimal time interval dt is cast into three theorems:

Theorem 1. If ~X(t) = n, then the probability that exactly one Rµ reaction will occur in the system within the time interval [t, t+dt[ is equal to γµ hµ(n) dt + o(dt), where o(dt) denotes terms that approach zero faster than dt.

Theorem 2. If ~X(t) = n, then the probability that no reaction will occur within the time interval [t, t+dt[ is equal to 1 − Σµ γµ hµ(n) dt + o(dt).

Theorem 3. The probability of more than one reaction occurring in the system within the time interval [t, t+dt[ is of order o(dt).

Proofs for all three theorems are found in [3, pp. 420-421].
Based on the three theorems an analytical description of the evolution of the population vector ~X(t) can be given. The initial state of the system at some initial time t0 is fixed: ~X(t0) = n0. Although there is no chance to derive a deterministic equation for the time-evolution of ~X(t) itself, a deterministic equation for the time-evolution of the probability function P(n, t|n0, t0) for t ≥ t0 will be obtained. We express the probability P(n, t+dt|n0, t0) as the sum of the probabilities of several mutually exclusive and collectively exhaustive routes from ~X(t0) = n0 to ~X(t+dt) = n. These routes are distinguished from one another with respect to the event that happened in the time interval [t, t+dt[:

$$P(n,t+dt|n_0,t_0) = P(n,t|n_0,t_0)\times\left(1-\sum_{\mu=1}^{M}\gamma_\mu h_\mu(n)\,dt+o(dt)\right) + \sum_{\mu=1}^{M}P(n-\nu_\mu,t|n_0,t_0)\times\bigl(\gamma_\mu h_\mu(n-\nu_\mu)\,dt+o(dt)\bigr) + o(dt)\,. \tag{4.71}$$
The different routes from ~X(t0) = n0 to ~X(t+dt) = n are obvious from the balance equation (4.71):

(i) One route from ~X(t0) = n0 to ~X(t+dt) = n is given by the first term on the right-hand side of the equation: no reaction occurs in the time interval [t, t+dt[ and hence ~X(t) = n was fulfilled already at time t. The joint probability for route (i) is therefore the probability to be in ~X(t) = n, conditioned by ~X(t0) = n0, times the probability that no reaction has occurred in [t, t+dt[. In other words, the probability for this route is the probability to go from n0 at time t0 to n at time t and to stay in this state during the next interval dt.

(ii) An alternative route from ~X(t0) = n0 to ~X(t+dt) = n is accounted for by one particular term in the sum on the right-hand side of the equation: an Rµ reaction occurs in the time interval [t, t+dt[ and hence ~X(t) = n − νµ was fulfilled at time t. The joint probability for route (ii) is therefore the probability to be in ~X(t) = n − νµ, conditioned by ~X(t0) = n0, times the probability that exactly one Rµ reaction has occurred in [t, t+dt[. In other words, the probability for this route is the probability to go from n0 at time t0 to n − νµ at time t and to undergo an Rµ reaction during the next interval dt. Obviously, the same consideration is valid for every elementary reaction and we have M terms of this kind.

(iii) A third possibility – neither no reaction nor exactly one reaction chosen from the set {Rµ; µ = 1, . . . , M} – must inevitably invoke more than one reaction within the time interval [t, t+dt[. The probability for such events, however, is o(dt) or of measure zero by theorem 3.

All routes (i) and (ii) are mutually exclusive, since different events take place within the last interval [t, t+dt[.
The last step to derive the chemical master equation is straightforward: P(n, t|n0, t0) is subtracted from both sides of equation (4.71), then both sides are divided by dt, the limit dt ↓ 0 is taken, all o(dt) terms vanish, and finally we obtain

$$\frac{\partial}{\partial t}P(n,t|n_0,t_0) = \sum_{\mu=1}^{M}\Bigl(\gamma_\mu h_\mu(n-\nu_\mu)\,P(n-\nu_\mu,t|n_0,t_0) - \gamma_\mu h_\mu(n)\,P(n,t|n_0,t_0)\Bigr)\,. \tag{4.72}$$

Initial conditions are required to calculate the time evolution of the probability P(n, t|n0, t0), and we can easily express them in the form

$$P(n,t_0|n_0,t_0) = \begin{cases} 1\,, & \text{if } n = n_0\,,\\ 0\,, & \text{if } n \ne n_0\,, \end{cases} \tag{4.72'}$$

which is precisely the initial condition used in the derivation of equation (4.71). Any sharp probability distribution P(nk, t0|nk⁽⁰⁾, t0) = δ(nk − nk⁽⁰⁾) is admitted for the molecular particle numbers at t0. The assumption of extended initial distributions is, of course, also possible, but the corresponding master equation becomes more sophisticated.
4.4.4 The simulation algorithm
The chemical master equation (4.72) as derived in the last subsection 4.4.3 is closely related to a stochastic simulation algorithm for chemical reactions [1, 2, 4], and it is important to realize how the simulation tool fits into the general theoretical framework of the chemical master equation. The algorithm is not based on the probability function P(n, t|n0, t0) but on another, related probability density p(τ, µ|n, t), which expresses the probability that, given ~X(t) = n, the next reaction in the system will occur in the infinitesimal time interval [t+τ, t+τ+dτ[ and will be an Rµ reaction.

Figure 4.10: Partitioning of the time interval [t, t+τ+dτ[. The entire interval is subdivided into (k+1) nonoverlapping subintervals. The first k intervals are of equal size ε = τ/k and the (k+1)-th interval is of length dτ.
Considering the theory of random variables, p(τ, µ|n, t) is the joint density function of two random variables: (i) the time to the next reaction, τ, and (ii) the index of the next reaction, µ. The possible values of the two random variables are given by the domain of the real variable 0 ≤ τ < ∞ and the integer variable 1 ≤ µ ≤ M. In order to derive an explicit formula for the probability density p(τ, µ|n, t) we introduce the quantity

$$a(n) = \sum_{\mu=1}^{M}\gamma_\mu h_\mu(n)$$

and consider the time interval [t, t+τ+dτ[ to be partitioned into k+1 subintervals, k > 1. The first k of these intervals are chosen to be of equal length ε = τ/k, and together they cover the interval [t, t+τ[, leaving the interval [t+τ, t+τ+dτ[ as the remaining (k+1)-th part (figure 4.10).
With ~X(t) = n the probability p(τ, µ|n, t) describes the event of no reaction occurring in each of the k ε-size subintervals and exactly one Rµ reaction in the final infinitesimal dτ interval. Making use of theorems 1 and 2 and the multiplication law of probabilities we find

$$p(\tau,\mu|n,t) = \bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k\,\bigl(\gamma_\mu h_\mu(n)\,d\tau+o(d\tau)\bigr)\,.$$

Dividing both sides by dτ and taking the limit dτ ↓ 0 yields

$$p(\tau,\mu|n,t) = \bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k\,\gamma_\mu h_\mu(n)\,.$$

This equation is valid for any integer k > 1 and hence its validity is also guaranteed for k → ∞. Next we rewrite the first factor on the right-hand side of the equation,

$$\bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k = \left(1-\frac{a(n)\,k\varepsilon-k\,o(\varepsilon)}{k}\right)^{k} = \left(1-\frac{a(n)\,\tau-\tau\,o(\varepsilon)/\varepsilon}{k}\right)^{k},$$

and take now the limit k → ∞, whereby we make use of the simultaneously occurring convergence o(ε)/ε ↓ 0:

$$\lim_{k\to\infty}\bigl(1-a(n)\,\varepsilon+o(\varepsilon)\bigr)^k = \lim_{k\to\infty}\left(1-\frac{a(n)\,\tau}{k}\right)^{k} = e^{-a(n)\,\tau}\,.$$

By substituting this result into the initial equation for the probability density of the occurrence of a reaction we find

$$p(\tau,\mu|n,t) = a(n)\,e^{-a(n)\,\tau}\;\frac{\gamma_\mu h_\mu(n)}{a(n)} = \gamma_\mu h_\mu(n)\;e^{-\sum_{\nu=1}^{M}\gamma_\nu h_\nu(n)\,\tau}\,. \tag{4.73}$$
Equation (4.73) provides the mathematical basis for the stochastic simulation algorithm. Given ~X(t) = n, the probability density consists of two independent factors, where the first factor describes the time to the next reaction and the second factor the index of the next reaction. These factors correspond to two statistically independent random variables, which will be sampled by means of the uniform random numbers r1 and r2.
4.4.5 Implementation of the simulation algorithm
Equation (4.73) is implemented now for computer simulation, and we inspect the probability densities of the two random variables in order to find the conditions to be imposed on a statistically exact sample pair (τ, µ) generated from two unit-interval uniform random numbers r1 and r2: the time τ has an exponential density function with decay constant a(n),

$$\tau = \frac{1}{a(n)}\,\ln(1/r_1)\,, \tag{4.74a}$$

and µ is taken to be the smallest integer m which fulfils

$$\mu = \inf\left\{m\;\Big|\;\sum_{\nu=1}^{m}\gamma_\nu h_\nu(n) > a(n)\,r_2\right\}. \tag{4.74b}$$

After the values for τ and µ have been determined accordingly, the advancement of the state vector ~X(t) of the system takes place:

$$\vec{\mathcal X}(t) = n \;\longrightarrow\; \vec{\mathcal X}(t+\tau) = n+\nu_\mu\,.$$

Repeated application of the advancement procedure is the essence of the stochastic simulation algorithm. It is important to realize that this advancement procedure is exact as far as r1 and r2 are obtained by fair samplings from a unit-interval uniform random number generator; in other words, the correctness of the procedure depends on the quality of the random number generator applied. Two further issues are important: (i) the algorithm operates with an internal time control that corresponds to the real time of the chemical process, and (ii) contrary to the situation in differential equation solvers, the discrete time steps are not finite-interval approximations of an infinitesimal time step; instead, the population vector ~X(t) maintains the value ~X(t) = n throughout the entire finite time interval [t, t+τ[ and then changes abruptly to ~X(t+τ) = n + νµ at the instant t+τ when the Rµ reaction occurs. In other words, there is no blind interval during which the algorithm is unable to record changes.
Table 4.1: The combinatorial functions hµ(n) for elementary reactions. Reactions are ordered with respect to reaction order, which in the case of mass action is identical to the molecularity of the reaction. Order zero implies that no reactant molecule is involved and the products come from an external source, for example from the influx in a flow reactor. Orders 1, 2, and 3 mean that one, two, or three molecules are involved in the elementary step, respectively.

No.  Reaction                   Order  hµ(n)
1    ∗ −→ products              0      1
2    A −→ products              1      nA
3    A + B −→ products          2      nA nB
4    2A −→ products             2      nA(nA − 1)/2
5    A + B + C −→ products      3      nA nB nC
6    2A + B −→ products         3      nA(nA − 1) nB/2
7    3A −→ products             3      nA(nA − 1)(nA − 2)/6
Structure of the algorithm. The time evolution of the population is described by the vector ~X(t) = n(t), which is updated after every individual reaction event. Reactions are chosen from the set R = {Rµ; µ = 1, . . . , M}, which is defined by the reaction mechanism under consideration. They are classified according to the criteria listed in table 4.1. The reaction probabilities corresponding to the reaction rates of deterministic kinetics are contained in a vector a(n) = (γ1 h1(n), . . . , γM hM(n))′, which is also updated after every individual reaction event. Updating is performed according to the stoichiometric vectors νµ of the individual reactions Rµ, which represent the columns of the stoichiometric matrix S. We repeat that the combinatorial functions hµ(n) are determined exclusively by the reactant side of the reaction equation, whereas the stoichiometric vectors νµ represent the net production, (products) − (reactants).
The algorithm comprises five steps:

(i) Step 0. Initialization: The time variable is set to t = 0, the initial values of all N variables X1, . . . , XN for the species – Xk for species Sk – are stored, the values of the M parameters of the reactions Rµ, γ1, . . . , γM, are stored, and the combinatorial expressions are incorporated as factors for the calculation of the reaction rate vector a(n) according to table 4.1 and the probability density P(τ, µ). Sampling times t1 < t2 < · · · and the stopping time tstop are specified, the first sampling time is set to t1 and stored, and the pseudorandom number generator is initialized by means of seeds or at random.

(ii) Step 1. Monte Carlo step: A pair of random numbers (τ, µ) is created by the random number generator according to the joint probability density P(τ, µ). In essence two explicit methods can be used: the direct method and the first-reaction method.

(iii) Step 2. Propagation step: (τ, µ) is used to advance the simulation time t and to update the population vector n, t → t + τ and n → n + νµ; then all changes are incorporated in a recalculation of the reaction rate vector a.

(iv) Step 3. Time control: Check whether or not the simulation time has been advanced through the next sampling time ti, and for t > ti send the current t and the current n(t) to the output storage and advance the sampling time, ti → ti+1. Then, if t > tstop or if no more reactant molecules remain, leading to hµ = 0 ∀ µ = 1, . . . , M, finalize the calculation by switching to step 4; otherwise continue with step 1.

(v) Step 4. Termination: Prepare for final output by setting flags for early termination or other unforeseen stops, send the final time t and the final n to the output storage, and terminate the computation.

A caveat is needed for the integration of stiff systems, where the values of individual variables can vary by many orders of magnitude; such a situation might catch the calculation in a trap by slowing down the progress of time.
The Monte Carlo step. Pseudorandom numbers are drawn from a random number generator of sufficient quality, whereby quality is meant in terms of no or very long recurrence cycles and the closeness of the distribution of the pseudorandom numbers r to the uniform distribution on the unit interval:

$$0\le\alpha<\beta\le 1 \;\Longrightarrow\; P(\alpha\le r\le\beta) = \beta-\alpha\,.$$

With this prerequisite we discuss now two methods which use two output values r of the pseudorandom number generator to generate a random pair (τ, µ) with the prescribed probability density function P(τ, µ).
The direct method. The two-variable probability density is written as the
product of two one-variable density functions:
P (τ,µ) = P1(τ) · P2(µ|τ) .
Here, P1(τ) dτ is the probability that the next reaction will occur between
times t + τ and t + τ + dτ , irrespective of which reaction it might be, and
P2(µ|τ) is the probability that the next reaction will be an Rµ given that
the next reaction occurs at time t+ τ .
By the addition theorem of probabilities, P1(τ) dτ is obtained by summation of P(τ, µ) dτ over all reactions Rµ:

P1(τ) = Σ_{µ=1}^{M} P(τ, µ) . (4.75)

Combining the last two equations we obtain for P2(µ|τ):

P2(µ|τ) = P(τ, µ) / Σ_{ν=1}^{M} P(τ, ν) . (4.76)
Equations (4.75) and (4.76) express the two one-variable density functions in terms of the original two-variable density function P(τ, µ). From equation (4.73) we substitute into P(τ, µ) = p(τ, µ|n, t), simplifying the notation by using

aµ ≡ γµ hµ(n) and a = Σ_{µ=1}^{M} aµ ≡ Σ_{µ=1}^{M} γµ hµ(n) ,
and find

P1(τ) = a exp(−a τ) , 0 ≤ τ < ∞ , and
P2(µ|τ) = P2(µ) = aµ/a , µ = 1, . . . , M . (4.77)
As indicated, in this particular case P2(µ|τ) turns out to be independent of τ. Both one-variable density functions are properly normalized over their domains of definition:

∫_0^∞ P1(τ) dτ = ∫_0^∞ a e^{−aτ} dτ = 1 and Σ_{µ=1}^{M} P2(µ) = Σ_{µ=1}^{M} aµ/a = 1 .
Thus, in the direct method a random value τ is created from a random number r1 on the unit interval and the distribution P1(τ) by taking

τ = −(ln r1)/a . (4.78)
The second task is to generate a random integer µ according to P2(µ|τ) in
such a way that the pair (τ,µ) will be distributed as prescribed by P (τ,µ).
For this goal another random number, r2, will be drawn from the unit interval
and then µ is taken to be the integer that fulfils
Σ_{ν=1}^{µ−1} aν < r2 a ≤ Σ_{ν=1}^{µ} aν . (4.79)
The values a1, a2, . . . are cumulatively added in sequence until their sum is observed to equal or exceed r2 a, and then µ is set equal to the index of the last aν term that had been added. Rigorous justifications for equations (4.78) and (4.79) are found in [1, pp.431-433]. If a fast and reliable uniform random number generator is available, the direct method can be easily programmed and rapidly executed. Thus it represents a simple, fast, and rigorous procedure for the implementation of the Monte Carlo step of the simulation algorithm.
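As an illustration – our own sketch, not the code of [1], and with 0-based indexing of the reactions – the direct method may be written in Python as follows; equation (4.78) produces τ and equation (4.79) the index µ:

    import math

    def direct_method(a, a0, rng):
        """Draw (tau, mu) from P(tau, mu); a is the list (a_1,...,a_M), a0 = sum(a)."""
        r1, r2 = rng.random(), rng.random()
        tau = -math.log(1.0 - r1) / a0         # equation (4.78); 1-r1 avoids log(0)
        threshold, partial = r2 * a0, 0.0
        for mu, a_mu in enumerate(a):          # cumulative summation, equation (4.79)
            partial += a_mu
            if partial >= threshold:
                return tau, mu
        return tau, len(a) - 1                 # guard against round-off errors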
The first-reaction method. This alternate method for the implementa-
tion of the Monte Carlo step of the simulation algorithm is not quite as
efficient as the direct method but it is worth presenting here because it adds
insight into the stochastic simulation approach. Adopting again the notation
aν ≡ γνhν(n) it is straightforward to derive
Pν(τ) dτ = aν exp(−aν τ) dτ (4.80)
from (4.66) and (4.67). Then, Pν(τ) would indeed be the probability at time
t for an Rν reaction to occur in the time interval [t+ τ, t+ τ+dτ [ were it not
for the fact that the number of Rν reactant combinations might have been
altered between t and t+ τ by the occurrence of other reactions. Taking this
into account, a tentative reaction time τν for Rν is generated according to
the probability density function Pν(τ), and in fact, the same can be done for
all reactions Rµ. We draw a random number rν from the unit interval and
compute
τν = −(ln rν)/aν , ν = 1, . . . , M . (4.81)
From these M tentative next reactions the one which occurs first is chosen to be the actual next reaction:

τ = smallest τν for all ν = 1, . . . , M ,
µ = the ν for which τν is smallest . (4.82)
Daniel Gillespie [1, pp.420-421] provides a straightforward proof that the
random (τ,µ) obtained by the first reaction method is in full agreement
with the probability density P (τ,µ) from equation (4.73).
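A sketch of the corresponding routine – again our own illustration, interchangeable with direct_method in the simulation loop given above – reads:

    import math

    def first_reaction_method(a, a0, rng):
        """Tentative times tau_nu from equation (4.81); the earliest wins, (4.82).
        (a0 is accepted only to keep the call signature of direct_method.)"""
        tau, mu = float("inf"), None
        for nu, a_nu in enumerate(a):
            if a_nu > 0.0:
                tau_nu = -math.log(1.0 - rng.random()) / a_nu   # equation (4.81)
                if tau_nu < tau:
                    tau, mu = tau_nu, nu                        # equation (4.82)
        return tau, mu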
It is tempting to try to extend the first-reaction method by letting the second next reaction be the one for which τν has the second smallest value. This, however, is in conflict with correct updating of the vector of particle numbers, n, because the results of the first reaction are not incorporated into the combinatorial terms hµ(n). Using the second earliest reaction would, for example, allow the second reaction to involve molecules already destroyed in the first reaction but would not allow the second reaction to involve molecules created in the first reaction.
Thus, the first-reaction method is just as rigorous as the direct method, and it is probably easier to implement in computer code than the direct method. From the point of view of computational efficiency, however, the direct method is preferable, because for M ≥ 3 it requires fewer random numbers, and hence the first-reaction method is wasteful. This question of economic use of computer time is not unimportant because stochastic simulations in general tax the random number generator quite heavily. For M ≥ 3, and in particular for large M, the direct method is probably the method of choice for the Monte Carlo step.
An early computer code of the simple version of the algorithm described here – still in FORTRAN – is found in [1]. Meanwhile many attempts have been made to speed up computations and to allow for the simulation of stiff systems (see, e.g., [110]). A recent review of the simulation methods also contains a discussion of various improvements of the original code [4].
5. Applications of stochastic processes in
biology
Compared to stochasticity in chemistry, stochastic phenomena in biology are not only more important but also much harder to control. The major sources of the problem are small particle numbers and the lack of sufficiently simple reference systems that are accessible to experimental studies. In biology we regularly encounter reaction mechanisms that lead to an enhancement of fluctuations under non-equilibrium conditions, and biology in essence deals with processes and stationary states far away from equilibrium, whereas in chemistry autocatalysis in non-equilibrium systems became an object of general interest and intensive investigation only some forty years ago. We start therefore with the analysis of simple autocatalysis modeled by means of a simple birth-and-death process. Then we present an overview of solvable birth-and-death processes (section 5.1) and discuss the role of boundaries in the form of different barriers (section 5.1.2). In section 5.2 we come back to the size expansion for stochastic processes and analyze it with biological problems in focus. Finally, the Poisson representation is presented in section 5.3, because it finds very useful applications in biology.
5.1 Autocatalysis, replication, and extinction
In the previous chapter we already analyzed bimolecular reactions, the addition and the dimerization reaction, which gave rise to perfectly normal behavior although the analysis was quite sophisticated (subsection 4.2.2). The nonlinearity became manifest in the task of finding solutions but did not effectively change the qualitative behavior of the reaction systems; for example, the √N law for the fluctuations in the stationary states retained its validity. As an exactly solvable example we shall first study a simple reaction mechanism
consisting of two elementary steps, replication and extinction. In this case the √N law is not valid and the fluctuations do not settle down to some value proportional to the square root of the size of the system but grow in time without limit, as we saw in the case of the Wiener process (3.4.3).
5.1.1 Autocatalytic growth and death
Reproduction of individuals is modeled by a simple duplication mechanism
and death is represented by first order decay. In the language of chemical
kinetics these two steps are:
A + X −−λ−→ 2X , (5.1a)
X −−µ−→ B . (5.1b)
The rate parameters for reproduction and extinction are denoted by λ and µ, respectively.1 The material required for reproduction is assumed to be replenished as it is consumed, and hence the amount of A available is constant and assumed to be included in the birth parameter: λ = f · [A]. The degradation product B does not enter the kinetic equation because reaction (5.1b) is irreversible. The stochastic process corresponding to equations (5.1) belongs to the class of linear birth-and-death processes with w+(n) = λ · n and w−(n) = µ · n.2 The master equation is of the form
∂Pn(t)/∂t = λ (n−1) Pn−1(t) + µ (n+1) Pn+1(t) − (λ+µ) n Pn(t) , (5.2)
1 Reproduction is to be understood as asexual reproduction here. Sexual reproduction, of course, requires two partners and gives rise to a process of order 2 (table 4.1).
2 Here we use the symbols commonly applied in biology: λ(n) for birth, µ(n) for death, ν for immigration, and ρ for emigration (tables 5.1 and 5.2). These notions were created especially for application to biological problems, in particular for problems in theoretical ecology. Other notions and symbols are common in chemistry: a birth corresponds to the production of a molecule, f ≡ λ, a death to its decomposition or degradation through a chemical reaction, d ≡ µ. Influx and outflux are the proper notions for immigration and emigration.
Figure 5.1: A growing linear birth-and-death process. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ and µ, respectively. The upper part shows the evolution of the probability density, Pn(t) = Prob{X(t) = n}. The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases with time. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. Parameters used: n0 = 100, λ = √2, and µ = 1/√2; sampling times (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.5 (magenta), 0.75 (red), and 1.0 (yellow).
and after introduction of the probability generating function g(s, t) gives rise to the PDE

∂g(s, t)/∂t − (s − 1)(λs − µ) ∂g(s, t)/∂s = 0 . (5.3)
Figure 5.2: A decaying linear birth-and-death process. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ and µ, respectively. The upper part shows the evolution of the probability density, Pn(t) = Prob{X(t) = n}. The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases, but then sharpens again as the process approaches the absorbing barrier at n = 0. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. Parameters used: n0 = 40, λ = 1/√2, and µ = √2; sampling times (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.35 (blue), 0.65 (violet), 1.0 (magenta), 1.5 (red), 2.0 (orange), 2.5 (yellow), and lim t→∞ (black).
Solution of this PDE yields different results for different or equal replication and extinction rate coefficients, λ ≠ µ and λ = µ, respectively. In the first case we substitute γ = λ/µ (≠ 1) and η(t) = exp((λ − µ)t), and find:

g(s, t) = [ ((η(t) − 1) + (γ − η(t)) s) / ((γη(t) − 1) + γ(1 − η(t)) s) ]^{n0} and

Pn(t) = γ^n Σ_{m=0}^{min(n,n0)} (−1)^m C(n0+n−m−1, n−m) C(n0, m) × ((1 − η(t))/(1 − γη(t)))^{n0+n−m} × ((γ − η(t))/(γ(1 − η(t))))^m , (5.4)

where C(n, k) denotes the binomial coefficient.
In the derivation of the expression for the probability distribution we expand numerator and denominator of the expression in the generating function g(s, t), using the sums (1 + s)^n = Σ_{k=0}^{n} C(n, k) s^k and (1 + s)^{−n} = 1 + Σ_{k=1}^{∞} (−1)^k [n(n+1)· · ·(n+k−1)/k!] s^k, multiply, order the terms with respect to powers of s, and compare with the expansion of the generating function, g(s, t) = Σ_{n=0}^{∞} Pn(t) s^n.
Computations of expectation value and variance are straightforward:

E(N_X(t)) = n0 e^{(λ−µ)t} and
σ²(N_X(t)) = n0 [(λ+µ)/(λ−µ)] e^{(λ−µ)t} (e^{(λ−µ)t} − 1) . (5.5)
Illustrative examples of linear birth-and-death processes with growing (λ >
µ) and decaying (λ < µ) populations are shown in figures 5.1 and 5.2, re-
spectively.
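Equation (5.5) is easily checked by simulation. The following sketch – with the parameter values of figure 5.1, everything else being our own choice – runs an ensemble of Gillespie trajectories of mechanism (5.1) and compares the sample mean and variance at time t with the analytical expressions:

    import math, random

    def birth_death_run(n0, lam, mu, t_end, rng):
        """One trajectory of A + X -> 2X (rate lam*n) and X -> B (rate mu*n)."""
        t, n = 0.0, n0
        while n > 0:
            t += -math.log(1.0 - rng.random()) / ((lam + mu) * n)
            if t > t_end:
                break
            n += 1 if rng.random() < lam / (lam + mu) else -1
        return n

    rng = random.Random(42)
    n0, lam, mu, t_end, runs = 100, 2**0.5, 2**-0.5, 1.0, 5000
    sample = [birth_death_run(n0, lam, mu, t_end, rng) for _ in range(runs)]
    mean = sum(sample) / runs
    var = sum((n - mean)**2 for n in sample) / (runs - 1)
    w = math.exp((lam - mu) * t_end)
    print(mean, n0 * w)                                     # expectation, (5.5)
    print(var, n0 * (lam + mu) / (lam - mu) * w * (w - 1))  # variance, (5.5)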
In the degenerate case of neutrality with respect to growth, µ = λ, the
same procedure yields:

g(s, t) = [ (λt + (1 − λt) s) / (1 + λt − λt s) ]^{n0} , (5.6a)

Pn(t) = (λt/(1+λt))^{n0+n} Σ_{m=0}^{min(n,n0)} C(n0+n−m−1, n−m) C(n0, m) ((1 − λ²t²)/(λ²t²))^m , (5.6b)

E(N_X(t)) = n0 , and (5.6c)

σ²(N_X(t)) = 2 n0 λ t . (5.6d)
Comparison of the last two expressions shows the inherent instability of this
reaction system. The expectation value is constant whereas the fluctuations
increase with time. The degenerate birth-and-death process is illustrated in
figure 5.3. The case of steadily increasing fluctuations is in contrast to an
equilibrium situation where both expectation value and variance approach constant values. Recalling the Ehrenfest urn game, where fluctuations were negatively correlated with the deviation from equilibrium, we have here two uncorrelated processes, replication and extinction. The particle number n fulfils a kind of random walk on the natural numbers, and indeed in the case of the random walk (see equation (3.30) in subsection 3.4.2) we had also obtained a constant expectation value E = n0 and a variance that increases linearly with time, σ²(t) = 2ϑ(t − t0).

A constant expectation value accompanied by a variance that increases with time has an easily recognized consequence: there is a critical time, tcr = n0/(2λ), above which the standard deviation exceeds the expectation value. From this instant on, predictions of the evolution of the system based on the expectation value become obsolete. Then we have to rely on individual probabilities or other quantities. Useful in this context is the probability of extinction of all particles, which can be readily computed:

P0(t) = (λt/(1 + λt))^{n0} . (5.7)
Provided we wait long enough, the system will die out with probability one,
since we have limt→∞ P0(t) = 1. This seems to be a contradiction to the
Figure 5.3: Probability density of a linear birth-and-death process with equal birth and death rates. The two-step reaction mechanism of the process is (X → 2X, X → B) with rate parameters λ = µ. The upper and the middle part show the evolution of the probability density, Pn(t) = Prob(X(t) = n). The initially infinitely sharp density, P(n, 0) = δ(n, n0), becomes broader with time and flattens as the variance increases, but then sharpens again as the process approaches the absorbing barrier at n = 0. In the lower part we show the expectation value E(N(t)) in the confidence interval E ± σ. The variance increases linearly with time, and at t = n0/(2λ) = 50 the standard deviation is as large as the expectation value. Parameters used: n0 = 100, λ = 1; sampling times, upper part: t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.49999 (magenta), 0.99999 (red), 2.0 (orange), 10 (yellow); middle part: t = 10 (yellow), 20 (green), 50 (cyan), 100 (blue), and lim t→∞ (black).
constant expectation value. As a matter of fact it is not: In almost all
individual runs the system will go extinct, but there are very few cases of
probability measure zero where the particle number grows to infinity for
t→∞. These rare cases are responsible for the finite expectation value.
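The distribution (5.6b) and the extinction probability (5.7) can be evaluated directly; the following sketch (with an arbitrarily chosen small example, n0 = 5 and λt = 0.5) verifies the normalization as well as equations (5.6c), (5.6d), and (5.7):

    import math

    def p_n(n, n0, lt):
        """Equation (5.6b) in the degenerate case mu = lambda; lt stands for lambda*t."""
        z = (1.0 - lt * lt) / (lt * lt)
        s = sum(math.comb(n0 + n - m - 1, n - m) * math.comb(n0, m) * z**m
                for m in range(min(n, n0) + 1))
        return (lt / (1.0 + lt))**(n0 + n) * s

    n0, lt = 5, 0.5
    probs = [p_n(n, n0, lt) for n in range(400)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean)**2 * p for n, p in enumerate(probs))
    print(sum(probs), mean, var)              # -> 1, n0, 2*n0*lt  (5.6c,d)
    print(probs[0], (lt / (1.0 + lt))**n0)    # extinction probability, (5.7)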
Equation (5.7) can be used to derive a simple model for random selection [111]. We assume a population of n different species:

A + Xj −−λ−→ 2Xj , j = 1, . . . , n , (5.1a')
Xj −−µ−→ B , j = 1, . . . , n . (5.1b')
The joint probability distribution of the population is described by

P_{x1...xn} = P(X1(t) = x1, . . . , Xn(t) = xn) = P^{(1)}_{x1} · . . . · P^{(n)}_{xn} , (5.8)

wherein all probability distributions for the individual species are given by equation (5.6b), and the independence of individual birth events as well as death events allows for the simple product expression. In the spirit of Motoo Kimura's neutral theory of evolution [7] all birth and all death parameters are assumed to be equal, λj = λ and µj = µ for all j = 1, . . . , n. For convenience we assume that every species is initially present in a single copy: P_{nj}(0) = δ_{nj,1}.
Figure 5.4: The distribution of sequential extinction times Tk. Shown are the expectation values E(Tk) for n = 20 according to equation (5.10). Since E(T0) diverges, T1 is the last extinction event that occurs, on average, at a finite time. A single species is present above T1, and random selection has occurred in the population.
We introduce a new random variable that has the nature of a first passage time: Tk is the time up to the extinction of n − k species, and we characterize it as a sequential extinction time. Accordingly, n species are present in the population between Tn, which fulfils Tn ≡ 0 by definition, and Tn−1, n − 1 species between Tn−1 and Tn−2, and eventually a single species between T1 and T0, which is the moment of extinction of the entire population. After T0 no particle X exists any more.
Next we consider the probability distribution of the sequential extinction times,

Hk(t) = P(Tk < t) . (5.9)

The probability of extinction of the whole population is readily calculated: since individual reproduction and extinction events are independent, we find

H0 = P_{0,...,0} = P^{(1)}_0 · . . . · P^{(n)}_0 = (λt/(1 + λt))^n .
The event T1 < t can happen in several ways: either X1 is present and all other species have become extinct already, or only X2 is present, or only X3, and so on; but T1 < t is also fulfilled if the whole population has died out:

H1 = P_{x1≠0,0,...,0} + P_{0,x2≠0,...,0} + · · · + P_{0,0,...,xn≠0} + H0 .
The probability that a given species has not yet disappeared is obtained by exclusion, since existence and nonexistence are complementary,

P_{x≠0} = 1 − P0 = 1 − λt/(1 + λt) = 1/(1 + λt) ,

which yields the expression for the presence of a single species,

H1(t) = (n + λt)(λt)^{n−1} / (1 + λt)^n ,

and by similar arguments a recursion formula is found for the extinction probabilities with higher indices,

Hk(t) = C(n, k) (λt)^{n−k} / (1 + λt)^n + Hk−1(t) ,

that eventually leads to the expression

Hk(t) = Σ_{j=0}^{k} C(n, j) (λt)^{n−j} / (1 + λt)^n .
The moments of the sequential extinction times are computed straightforwardly by means of a handy trick: Hk is partitioned into the terms for the individual powers of λt, Hk(t) = Σ_{j=0}^{k} hj(t), with

hj(t) = C(n, j) (λt)^{n−j} / (1 + λt)^n ,

which are then differentiated with respect to time t:

dhj(t)/dt = h′j = [λ/(1 + λt)^{n+1}] ( C(n, j)(n − j)(λt)^{n−j−1} − C(n, j) j (λt)^{n−j} ) .

The summation of the derivatives is simple because h′k + h′k−1 + . . . + h′0 is a telescopic sum, and we find

dHk(t)/dt = C(n, k) (n − k) λ^{n−k} t^{n−k−1} / (1 + λt)^{n+1} .
Making use of the definite integral [112, p.338]

∫_0^∞ t^{n−k} / (1 + λt)^{n+1} dt = λ^{−(n−k+1)} / (k C(n, k)) ,

we finally obtain for the expectation values of the sequential extinction times

E(Tk) = ∫_0^∞ (dHk(t)/dt) t dt = (n − k)/k · 1/λ , n ≥ k ≥ 1 , (5.10)
and E(T0) = ∞ (see figure 5.4). It is worth recognizing here another paradox of probability theory: although extinction is certain, the expectation value for the time to extinction diverges. Similarly to the expectation values, we calculate the variances of the sequential extinction times:

σ²(Tk) = [n(n − k) / (k²(k − 1))] · 1/λ² , n ≥ k ≥ 2 , (5.11)

from which we see that the variance diverges for k = 0 and k = 1.
For distinct birth parameters, λ1, . . . , λn, and different initial particle
numbers, x1(0), . . . , xn(0), the expressions for the expectation values become
considerably more complicated, but the main conclusion remains unaffected:
E(T1) is finite whereas E(T0) diverges.
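Equation (5.10) is easily verified numerically. The substitution u = λt/(1 + λt) – our own convenience, it maps [0, ∞[ onto [0, 1[ – turns E(Tk) = ∫ t (dHk/dt) dt into C(n,k)(n − k)/λ · ∫_0^1 u^{n−k}(1 − u)^{k−1} du, which a simple midpoint rule handles without any truncation of the time axis:

    import math

    def mean_Tk(n, k, lam, steps=20000):
        """Numerical E(T_k): midpoint rule for the transformed integral (a sketch)."""
        h, total = 1.0 / steps, 0.0
        for i in range(steps):
            u = (i + 0.5) * h
            total += u**(n - k) * (1.0 - u)**(k - 1)
        return total * h * math.comb(n, k) * (n - k) / lam

    n, lam = 20, 1.0
    for k in (1, 2, 5, 10, 19):
        print(k, mean_Tk(n, k, lam), (n - k) / (k * lam))   # equation (5.10)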
5.1.2 Boundaries in one step birth-and-death processes
One step birth-and-death processes have been studied extensively and analytical solutions are available in table form [19]. For transition probabilities at most linear in n, w+(n) = ν + λn and w−(n) = ρ + µn, one distinguishes birth (λ), death (µ), immigration (ν), and emigration (ρ) terms. Analytical solutions exist for all one step birth-and-death processes whose transition probabilities are not of higher order than linear in the particle number n.
It is necessary, however, to consider also the influence of boundaries on
these processes. For this goal we define an interval [a, b] for the stochastic
process. There are two classes of boundary conditions, absorbing and reflect-
ing boundaries. In the former case, a particle that left the interval is not
allowed to return to it whereas the latter boundary implies that it is forbid-
den to exit from the interval. Boundary conditions can be easily implemented
by ad hoc definitions of transition probabilities:
Reflecting Absorbing
Boundary at a w−(a) = 0 w+(a− 1) = 0
Boundary at b w+(b) = 0 w−(b+ 1) = 0
The reversible chemical reaction with w−(n) = k1n and w+(n) = k2(n0 − n), for example, had two reflecting barriers at a = 0 and b = n0. Among the examples we have studied so far we found an absorbing boundary in the replication-extinction process between N = 1 and N = 0, tantamount to the boundary a = 1; the state n = 0 is an end point of all trajectories reaching it.
Compared, for example, to an unrestricted random walk on positive and
negative integers, n ∈ Z, a chemical reaction or a biological process has to
be restricted by definition, n ∈ N0, since negative particle numbers are not
allowed. In general, the one step birth-and-death master equation (4.8),
∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) ,

is not restricted to n ∈ N0 and thus does not automatically fulfil the proper boundary conditions to model a chemical reaction. What we need is a modification of the equation at n = 0 which introduces a proper boundary of the process:

∂P0(t)/∂t = w−(1) P1(t) − w+(0) P0(t) . (4.8')
This occurs naturally if w−(n) vanishes for n = 0, which is always the case when the constant (emigration) term vanishes, ρ = 0. With w−(0) = 0 we only need to make sure that P−1(t) = 0 in order to obtain equation (4.8'). This will be the case whenever we take an initial state with Pn(0) = 0 ∀ n < 0, and it is certainly true for our conventional initial condition, Pn(0) = δn,n0 with n0 ≥ 0. By the same token we prove that the upper reflecting boundary for chemical reactions, b = n0, fulfils the conditions of being natural too.
Equipped with natural boundary conditions the stochastic process can be
solved for the entire integer range, n ∈ Z, which is often much easier than
with artificial boundaries. All the barriers we have encountered so far were
natural.
Table 5.1: Comparison of results for some unrestricted processes. Data are taken from [19, pp.10,11]. Abbreviations and notations: γ ≡ λ/µ, σ ≡ e^{(λ−µ)t}, (n, n0) ≡ min{n, n0}, and In(x) is the modified Bessel function.

Poisson (pure immigration): λn = ν, µn = 0 [16]
  g_{n0}(s, t) = s^{n0} e^{ν(s−1)t}
  P_{n,n0}(t) = [(νt)^{n−n0}/(n − n0)!] e^{−νt} , n ≥ n0
  mean: n0 + νt ; variance: νt

Poisson (pure emigration): λn = 0, µn = ρ [16]
  g_{n0}(s, t) = s^{n0} e^{ρ(1−s)t/s}
  P_{n,n0}(t) = [(ρt)^{n0−n}/(n0 − n)!] e^{−ρt} , n ≤ n0
  mean: n0 − ρt ; variance: ρt

Immigration and emigration: λn = ν, µn = ρ [113]
  g_{n0}(s, t) = s^{n0} e^{−(ν+ρ)t+(νs+ρ/s)t}
  P_{n,n0}(t) = (ν/ρ)^{(n−n0)/2} I_{n0−n}(2t√(νρ)) e^{−(ν+ρ)t}
  mean: n0 + (ν − ρ)t ; variance: (ν + ρ)t

Birth: λn = λn, µn = 0 [114]
  g_{n0}(s, t) = (1 − e^{λt}(1 − 1/s))^{−n0}
  P_{n,n0}(t) = C(n−1, n0−1) e^{−n0λt} (1 − e^{−λt})^{n−n0} , n ≥ n0
  mean: n0 e^{λt} ; variance: n0 e^{λt}(e^{λt} − 1)

Death: λn = 0, µn = µn [114]
  g_{n0}(s, t) = (1 − e^{−µt}(1 − s))^{n0}
  P_{n,n0}(t) = C(n0, n) e^{−nµt} (1 − e^{−µt})^{n0−n} , n ≤ n0
  mean: n0 e^{−µt} ; variance: n0 e^{−µt}(1 − e^{−µt})

Immigration and death: λn = ν, µn = µn [16]
  g_{n0}(s, t) = (1 − e^{−µt}(1 − s))^{n0} exp( ν(s − 1)(1 − e^{−µt})/µ )
  P_{n,n0}(t) = exp(−(ν/µ)(1 − e^{−µt})) Σ_{k=0}^{(n,n0)} C(n0, k) [e^{−µtk}(1 − e^{−µt})^{n+n0−2k}/(n − k)!] (ν/µ)^{n−k}
  mean: n0 e^{−µt} + (ν/µ)(1 − e^{−µt}) ; variance: (ν/µ + n0 e^{−µt})(1 − e^{−µt})

Birth and death: λn = λn, µn = µn [114]
  g_{n0}(s, t) = [ ((σ−1) + (γ−σ)s) / ((γσ−1) + γ(1−σ)s) ]^{n0}
  P_{n,n0}(t) = γ^n Σ_{k=0}^{(n,n0)} (−1)^k C(n+n0−k−1, n−k) C(n0, k) ((1−σ)/(1−γσ))^{n+n0−k} ((1−σ/γ)/(1−σ))^k
  mean: n0 σ ; variance: n0 σ (γ+1)(σ−1)/(γ−1)

Birth and death with λ = µ: λn = λn, µn = λn
  g_{n0}(s, t) = [ (λt + (1−λt)s) / (1 + λt − λt s) ]^{n0}
  P_{n,n0}(t) = (λt/(1+λt))^{n+n0} Σ_{k=0}^{(n,n0)} C(n0, k) C(n+n0−k−1, n−k) ((1−λ²t²)/(λ²t²))^k
  mean: n0 ; variance: 2 n0 λ t
Table 5.2: Comparison of results for some restricted processes. Data are taken from [19, pp.16,17]. Abbreviations and notations used in the table: γ ≡ λ/µ, σ ≡ e^{(λ−µ)t}, α ≡ (ν/ρ)^{(n−n0)/2} e^{−(ν+ρ)t}; In = I−n ≡ In(2(νρ)^{1/2} t), where In(x) is a modified Bessel function; Gn ≡ Gn(ξj, γ) and Ḡn ≡ Gn(ξ̄j, γ), where Gn is a Gottlieb polynomial, Gn(x, γ) ≡ γ^n Σ_{k=0}^{n} (1 − γ^{−1})^k C(n, k) C(x, k) = γ^n F(−n, −x; 1; 1 − γ^{−1}), with F a hypergeometric function; ξj and ξ̄j are the roots of G_{u−l}(ξj, γ) = 0, j = 0, . . . , u−l−1, and of G_{u−l+1}(ξ̄j, γ) = γ G_{u−l}(ξ̄j, γ), j = 0, . . . , u−l, respectively; Hn ≡ Hn(ζj, γ) and H̄n ≡ Hn(ζ̄j, γ) with Hn(x, γ) = Gn(x, γ^{−1}), where H_{u−l}(ζj, γ) = 0, j = 0, . . . , u−l−1, and H_{u−l+1}(ζ̄j, γ) = H_{u−l}(ζ̄j, γ)/γ, j = 0, . . . , u−l, respectively.

λn = ν, µn = ρ; u: absorbing, l: −∞ [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} − I_{2u−n−n0} )

λn = ν, µn = ρ; u: +∞, l: absorbing [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} − I_{n+n0−2l} )

λn = ν, µn = ρ; u: reflecting, l: −∞ [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} + (ν/ρ)^{1/2} I_{2u+1−n−n0} + (1 − ρ/ν) Σ_{j=2}^{∞} (ν/ρ)^{j/2} I_{2u−n−n0+j} )

λn = ν, µn = ρ; u: +∞, l: reflecting [16, 115]
  P_{n,n0}(t) = α ( I_{n−n0} + (ν/ρ)^{1/2} I_{n+n0+1−2l} + (1 − ρ/ν) Σ_{j=2}^{∞} (ν/ρ)^{j/2} I_{n+n0−2l+j} )

λn = ν, µn = ρ; u: absorbing, l: absorbing [16, 115]
  P_{n,n0}(t) = α ( Σ_{k=−∞}^{∞} I_{n−n0+2k(u−l)} − Σ_{k=0}^{∞} ( I_{n+n0−2l+2k(u−l)} + I_{2l−n−n0+2k(u−l)} ) )

λn = λ(n−l+1), µn = µ(n−l); u: absorbing, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{l−n} Σ_{k=0}^{u−l−1} G_{n0−l} G_{n−l} σ^{ξk} ( Σ_{j=0}^{u−l−1} Gj γ^{j} )^{−1}

λn = λ(n−l+1), µn = µ(n−l); u: reflecting, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{l−n} Σ_{k=0}^{u−l} Ḡ_{n0−l} Ḡ_{n−l} σ^{ξ̄k} ( Σ_{j=0}^{u−l} Ḡj γ^{j} )^{−1}

λn = λ(u−n), µn = µ(u−n+1); u: reflecting, l: absorbing [116, 117]
  P_{n,n0}(t) = γ^{u−n} Σ_{k=0}^{u−l−1} H_{u−n0} H_{u−n} σ^{−ζk} ( Σ_{j=0}^{u−l−1} Hj γ^{j} )^{−1}

λn = λ(u−n), µn = µ(u−n+1); u: reflecting, l: reflecting [116, 117]
  P_{n,n0}(t) = γ^{u−n} Σ_{k=0}^{u−l} H̄_{u−n0} H̄_{u−n} σ^{−ζ̄k} ( Σ_{j=0}^{u−l} H̄j γ^{j} )^{−1}
For the sake of completeness we summarize the conditions, which can be
introduced in the master equation in order to sustain reflecting or absorbing
barriers for processes described by forward and backward master equations
on the interval [a, b] (see e.g. [6, pp.283-284]).
Forward master equation on [a, b]
Reflecting Absorbing
Boundary at a w−(a)Pa(t) = w+(a− 1)Pa−1(t) Pa−1(t) = 0
Boundary at b w+(b)Pb(t) = w−(b+ 1)Pb+1(t) Pb+1(t) = 0
Backward master equation on [a, b]
Reflecting Absorbing
Boundary at a P (., .|a− 1, t′) = P (., .|a, t′) P (., .|a− 1, t′) = 0
Boundary at b P (., .|b+ 1, t′) = P (., .|b, t′) P (., .|b+ 1, t′) = 0
An overview of a few selected birth-and-death processes is given in tables 5.1 and 5.2. Commonly, unrestricted and restricted processes are distinguished [19]. An unrestricted process is characterized by the possibility of reaching all states N(t) = n. A requirement imposed by physics demands that all changes in state space be finite for finite times (growth condition in subsection 3.5.3), and hence the probabilities of reaching infinity at finite times must vanish: lim_{n→±∞} P_{n,n0} = 0. The linear birth-and-death process in table 5.1 is unrestricted only in the positive direction, and the state N(t) = 0 is special because it represents an absorbing barrier. The restriction is hidden here and met by the condition P_{n,n0}(t) = 0 ∀ n < 0.
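In a computer simulation, the ad hoc definitions of the transition probabilities that implement the boundary conventions of this subsection can be collected in a small wrapper. The following Python sketch is our own illustration; names and flags are arbitrary:

    def rates_with_boundaries(n, a, b, w_plus, w_minus, lower="refl", upper="refl"):
        """One step rates on [a, b]: 'refl' suppresses the jump that would leave
        the interval, 'abs' suppresses the jump that would lead back into it."""
        wp, wm = w_plus(n), w_minus(n)
        if lower == "refl" and n == a:       wm = 0.0   # w-(a) = 0
        if lower == "abs" and n == a - 1:    wp = 0.0   # w+(a-1) = 0
        if upper == "refl" and n == b:       wp = 0.0   # w+(b) = 0
        if upper == "abs" and n == b + 1:    wm = 0.0   # w-(b+1) = 0
        return wp, wm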
5.2 Size expansion in biology
In the previous chapter we introduced a size expansion for the chemical master equation (section 4.3.3). Here we shall introduce this useful technique once more by means of a simple example, the spreading of an epidemic, which is nevertheless sufficiently general to be transferable to other cases [98, pp.251-258].
Before we discuss this example, however, we come back to the birth-and-death master equation (4.8) for one step processes, W(n|n′) = w+(n′) δ_{n,n′+1} + w−(n′) δ_{n,n′−1}, where w+(n) and w−(n) are usually analytic functions, which we shall assume to be (at least) twice differentiable. We repeat the one step master equation:

∂Pn(t)/∂t = w+(n−1) Pn−1(t) + w−(n+1) Pn+1(t) − (w+(n) + w−(n)) Pn(t) ,

and define a single step difference operator D by

D f(n) = f(n + 1) and D^{−1} f(n) = f(n − 1) . (5.12)

Using this operator we can rewrite the master equation and find

∂P(t)/∂t = { (D − 1) w−(n) + (D^{−1} − 1) w+(n) } P(t) . (4.8')

The jump moments are now

αp(n) = w+(n) + (−1)^p w−(n) . (4.6')

We repeat the macroscopic rate equation,

d⟨n⟩/dt = w+(⟨n⟩) − w−(⟨n⟩) ,
and find for the coupled equations for expectation value and variance the simpler expressions

d⟨n⟩/dt = w+(⟨n⟩) − w−(⟨n⟩) + (1/2) [ d²w+(⟨n⟩)/dn² − d²w−(⟨n⟩)/dn² ] σn² , (5.13a)

dσn²/dt = w+(⟨n⟩) + w−(⟨n⟩) + 2 [ dw+(⟨n⟩)/dn − dw−(⟨n⟩)/dn ] σn² , (5.13b)
Figure 5.5: The macroscopic part of a stochastic variable N. The variable n is partitioned according to equation (5.15) into a macroscopic part and the fluctuations around it, n = Ωφ(t) + Ω^{1/2} x(t), wherein Ω is a size parameter, for example the size of the population or the volume of the system. Computations: Ωφ(t) = 5 n0 (1 − 0.8 e^{−kt}) with n0 = 2 and k = 0.5; the fluctuating part is illustrated by the Gaussian density p(n, t) = e^{−(n−Ωφ(t))²/(2σ²)}/√(2πσ²) with σ = 0.1, 0.17, 0.24, 0.285, 0.30.
and we are now in a position to handle the example by means of an expansion technique (see [98, pp.251-254] and section 4.3.3).

An epidemic spreads in a population of Ω individuals. We assume that n(t) individuals are already infected. The probability of a new infection is proportional both to the number of infected and to the number of uninfected individuals, w+(n) = β n (Ω − n). No cure is possible and thus
w−(n) = 0. Finally, we have

W(n|n′) = β δ_{n,n′+1} n′ (Ω − n′) ,

which leads to the master equation

∂Pn(t)/∂t = β (n−1)(Ω−n+1) Pn−1(t) − β n (Ω−n) Pn(t) or
∂P/∂t = β (D^{−1} − 1) n (Ω − n) P(t) . (5.14)
Basic to the expansion is the idea that the density of the stochastic variable N can be split into a macroscopic part, Ωφ(t), and fluctuations of order Ω^{1/2} around it. As shown in figure 5.5 we assume that P(n, t) is represented by a (relatively) sharp peak located approximately at Ωφ(t) with a width of order Ω^{1/2}. In other words, we assume that the fluctuations fulfil a √N law, and we make the ansatz

n(t) = Ωφ(t) + Ω^{1/2} x(t) , (5.15)

where x is a new variable describing the fluctuations and the function φ(t) has to be chosen in accordance with the master equation. As said above, Ωφ(t) is called the macroscopic part and Ω^{1/2} x the fluctuating part of n. We may refer to the new variables as an Ω language. The probability density of n now becomes a probability density Π(x, t) of x:

P(n, t) ∆n = Π(x, t) ∆x , Π(x, t) = Ω^{1/2} P(Ωφ(t) + Ω^{1/2} x, t) . (5.16)
Differentiation yields3

∂Π/∂x = Ω ∂P/∂n , ∂Π/∂t = Ω^{1/2} ( Ω (dφ/dt) ∂P/∂n + ∂P/∂t ) ,

and eventually we obtain

Ω^{1/2} ∂P/∂t = ∂Π/∂t − Ω^{1/2} (dφ/dt) ∂Π/∂x . (5.17)

3 The somewhat unclear differentiation ∂P/∂n can be circumvented through direct variation of t by δt and simultaneously of x by −Ω^{1/2} (dφ/dt) δt, which leads to the same final result.
Now the difference operators are also size expanded in power series of differential operators:

D = 1 + Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² + . . . , (5.18a)

D^{−1} = ( 1 + Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² + . . . )^{−1} =
 = 1 − Ω^{−1/2} ∂/∂x − (1/2) Ω^{−1} ∂²/∂x² + Ω^{−1} ∂²/∂x² + . . . ≈
 ≈ 1 − Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² , (5.18b)

D^{−1} − 1 = −Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² . (5.18c)
Insertion of the operator and substitution of the new variables into the master equation (5.14) yields, after cancellation of an overall factor Ω^{−1/2},

∂Π/∂t − Ω^{1/2} (dφ/dt) ∂Π/∂x = βΩ² ( −Ω^{−1/2} ∂/∂x + (1/2) Ω^{−1} ∂²/∂x² ) ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) .

The right-hand side requires two consecutive differentiations of three factors:

−Ω^{−1/2} ∂/∂x ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) =
 = −Ω^{−1} (1 − 2φ − 2Ω^{−1/2}x) Π(x, t) − Ω^{−1/2} (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) ∂Π/∂x ,

(1/2) Ω^{−1} ∂²/∂x² ( (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) Π(x, t) ) =
 = (1/2) Ω^{−1} (φ + Ω^{−1/2}x)(1 − φ − Ω^{−1/2}x) ∂²Π/∂x² + O(Ω^{−3/2}) .
For convenience we introduce a new time scale, τ = βΩt, in order to absorb one factor Ω – and for convenience also the factor β – into the time variable. Collecting the terms corresponding to the largest powers of Ω now yields

Ω^{1/2} : (dφ/dτ) ∂Π(x, τ)/∂x = φ(1 − φ) ∂Π(x, τ)/∂x ,

Ω^{0} : ∂Π(x, τ)/∂τ = −(1 − 2φ) ∂/∂x ( x Π(x, τ) ) + (1/2) φ(1 − φ) ∂²Π(x, τ)/∂x² .
The largest term cancels if

dφ/dτ = φ (1 − φ) , (5.19')

and this yields the differential equation for the macroscopic variable φ(t), which after transformation back into the original variables leads to the macroscopic rate equation

dn/dt = β n (Ω − n) . (5.19)

The terms of order Ω^{0} constitute a linear Fokker-Planck equation with time dependent coefficients φ(t):

∂Π(x, τ)/∂τ = −(1 − 2φ) ∂/∂x ( x Π(x, τ) ) + (1/2) φ(1 − φ) ∂²Π(x, τ)/∂x² . (5.20)
Equation (5.20) describes the fluctuations of the random variable N(t) around the macroscopic part, and these fluctuations are of order Ω^{1/2}, as expected and initially assumed.

The strategy for solving the master equation (5.14) is now obvious: at first one determines φ(τ) by integrating the differential equation (5.19') with the initial value φ(0) = n0/Ω, then one solves the Fokker-Planck equation (5.20) with the initial condition Π(x, 0) = δ(x), and finally one obtains the desired probabilities from

P(n, t|n0, 0) = Ω^{−1/2} Π( (n − Ωφ(τ))/Ω^{1/2} , τ ) . (5.21)

A typical solution is sketched in figure 5.5, and it compares perfectly with the exact solutions for sufficiently large systems (see, for example, figures 4.4 and 4.5). Recalling the derivation, we remark that terms of relative order Ω^{−1/2} and smaller have been neglected.
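A numerical sketch of the recipe: the logistic equation (5.19') has the closed-form solution φ(τ) = φ0/(φ0 + (1 − φ0)e^{−τ}), and for the linear Fokker-Planck equation (5.20) the variance of Π obeys dσ²/dτ = 2(1 − 2φ)σ² + φ(1 − φ). This moment equation is standard for linear Fokker-Planck equations but is our own addition, and all parameter values below are arbitrary illustrations.

    import math

    def phi(tau, phi0):
        """Closed-form solution of dphi/dtau = phi(1 - phi), equation (5.19')."""
        return phi0 / (phi0 + (1.0 - phi0) * math.exp(-tau))

    def sigma2(tau, phi0, steps=100000):
        """Euler integration of dsigma2/dtau = 2(1-2phi)sigma2 + phi(1-phi),
        starting from the sharp initial condition Pi(x,0) = delta(x)."""
        s2, h = 0.0, tau / steps
        for i in range(steps):
            p = phi(i * h, phi0)
            s2 += h * (2.0 * (1.0 - 2.0 * p) * s2 + p * (1.0 - p))
        return s2

    omega, n0, beta, t = 1000, 20, 1.0e-4, 30.0
    tau = beta * omega * t                 # rescaled time tau = beta*Omega*t
    m = omega * phi(tau, n0 / omega)       # location of the peak of P(n,t)
    s = math.sqrt(omega * sigma2(tau, n0 / omega))
    print(m, s)   # by (5.21), P(n,t) is approximately Gaussian with mean m, width s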
5.3 The Poisson representation
For master equations that are constructed on the basis of combinatorial or mass action kinetics, an expansion of the probability distribution in Poisson distributions provides a useful technique for the setup of Fokker-Planck equations and stochastic differential equations, which are exactly equivalent to birth-and-death master equations with many variables as they are used for the description of chemical or biological reaction networks. The expansion method is called Poisson representation and has been developed by Crispin Gardiner and Subhash Chaturvedi [118, 119] (see also [6, pp.301-335]). In this context the system size expansion takes on the form of a small noise expansion of the Fokker-Planck equation obtained by the Poisson representation.

In simple cases the Poisson representation yields stochastic differential equations which can be solved straightforwardly, but there are other cases where the corresponding stochastic differential equation can be formulated only in the complex plane. Then the gauge Poisson representation [120] provides an appropriate frame for solving the more complicated cases, and recent applications of the theory to practical problems in population biology have demonstrated the usefulness of the approach [121].
In a very wide class of systems the time development is considered as
the result of individual encounters between members of a population. Such
systems comprise, for example, (i) chemical reactions being caused by colli-
sions between molecules, (ii) biological population dynamics resulting from
giving birth, mating, eating each other and dying, and (iii) systems in epi-
demiology where diseases are transmitted by contacts between individuals.
Encounter based time evolution gives rise to combinatorial kinetics, which on the macroscopic level is known as mass action kinetics in chemistry. We therefore include a consideration of birth-and-death processes in many variables in this section (subsection 5.3.2).
5.3.1 Motivation of the Poisson representation
Statistical physics relates microscopic processes to macroscopic quantities and thus provides the link between the realm of atoms and molecules and thermodynamics. Under the conditions of an ideal gas or an ideal solution of reacting species the probability distributions of microstates are assumed to be Poissonian, and the correctness or suitability of this assumption has been well established empirically. Considering a system of n molecular species Sj involved in m reactions Rµ we obtain for the distribution function in the grand canonical ensemble4

P(σk) = N · exp( β ( Σj µj nj(σk) − εk ) ) , (5.22)

where σk is a microscopic state characterized by an index (set) 'k', nj(σk) is the number of particles of species Sj in state σk, εk is the energy of this particular state, µj is the chemical potential of species Sj, N is a normalization factor, and β = 1/(kBT) with T being the (absolute) temperature in Kelvin.
Chemical reactions convert chemical species, and consequently equilibrium thermodynamics requires that the chemical potentials fulfil certain relations. If a state σk is supposed to be converted into a state σℓ then

Σj ν^{(S)}_j nj(σk) = Σi ν^{(S)}_i ni(σℓ) , S = 1, 2, 3, . . . ,

which implies the stoichiometric restrictions on the reaction system. The canonical and the grand canonical ensembles are defined by the requirements

Σj ν^{(S)}_j nj(σk) = τ^{(S)} and Σk P(σk) Σj ν^{(S)}_j nj(σk) ≡ Σj ν^{(S)}_j ⟨nj⟩ = τ^{(S)} ,
4In statistical thermodynamics three ensembles are distinguished [122, pp.513-518]:
The microcanonical ensemble referring to constant energy and maximizing entropy, the
canonical ensemble defined for constant volume and minimizing Helmholtz free energy,
and eventually the grand canonical ensemble is defined by constant chemical potentials µj
and the quantity minimized is the product p V .
respectively. The grand canonical probability distribution results from maximizing the entropy at fixed mean energy under the stoichiometric constraint, and this implies that the chemical potential satisfies the relation

µj = ΣS ν^{(S)}_j µS . (5.23)

Eventually one finds for the probability distribution of the population numbers of the individual states nj:

P({nj}) = N exp( β Σj µj nj ) Πj (1/nj!) ( Σk exp(−β ε^{(A)}_k) )^{nj} . (5.24)

Herein the ε^{(A)}_k-values are the energy eigenstates of a single molecule of A. Equation (5.24) is a multivariate Poissonian distribution with the expectation values

⟨nj⟩ = exp(β µj) Σk exp(−β ε^{(A)}_k) . (5.25)
Equation (5.25) combined with the chemical potential (5.23) yields the law of mass action. Implementation of the stronger constraint for the canonical ensemble leads to

P({nj}) = N ΠA δ( Σj ν^{(A)}_j nj , τ^{(A)} ) × Πj (1/nj!) ( Σk exp(−β ε^{(A)}_k) )^{nj} . (5.26)
Local fluctuations can be included in the considerations by partitioning the system into individual cells interrelated by transport; the stoichiometric relations then involve summations over all cells [118]. The canonical distribution allows for a straightforward proof that one obtains locally Poissonian distributions, which become uncorrelated in the large volume limit. For local calculations there is no difference between the canonical and the grand canonical distribution, but the latter is much easier to handle for the description of thermodynamic equilibrium.

The application of pure statistical mechanics thus leads to Poissonian distributions, locally as well as globally, and it is therefore suggestive to approximate otherwise hard to derive probability distributions from master equations by multivariate Poissonians.
5.3.2 Many variable birth-and-death systems
Combinatorial kinetics is best introduced by means of examples, and we shall start by illustrating a reversible dimerization process (see subsubsection 4.2.2.2): X ⇌ 2Y. The forward reaction X → 2Y occurs by a kind of spontaneous fission giving rise to two identical molecules Y. Such a spontaneous process can be visualized as a kind of degenerate encounter involving just one molecule of X. For the transition in the molecular population we have the probability

w(x → x − 1, y → y + 2) = k+ x ,

and for the reverse reaction, 2Y → X, pairs of molecules Y have to be assembled, which have a combinatorial probability of y(y − 1)/2, and hence we find

w(x → x + 1, y → y − 2) = k− y(y − 1) .
Chemical multi-variate master equation. Generalization to a reaction system with n reaction components or chemical species – reactants and/or products Xj – involved in m individual reactions Rµ, where nµG and nµQ are the numbers of variable chemical reactant species and product species,5 respectively, yields:

Σ_{j=1}^{nµG} ν^{(µG)}_j Xj ⇌[k+µ, k−µ] Σ_{i=1}^{nµQ} ν^{(µQ)}_i Xi with µ = 1, 2, . . . , m . (5.27)

The stoichiometric coefficients ν^{(µG)}_j and ν^{(µQ)}_i represent the numbers of molecules of species Xj or Xi, respectively, which are involved in the elementary step of reaction Rµ (5.27).6 The coefficients are properly understood as
5Concentrations or particle numbers of species that are kept constant are assumed to
be incorporated into rate parameters.6For consistency we shall use the formal indices G and Q for the reactant side and the
product side of a general reaction Rµ, respectively. For clarity we use different indices
for summation and products – ‘j’ for reactants and ‘i’ for products – although this is not
demanded by mathematical rules.
elements of three matrices: (i) the stoichiometric reactant matrix G, (ii) the stoichiometric product matrix Q, and (iii) the stoichiometric matrix S, which fulfil the relation Q − G = S. The particle numbers or concentrations of the individual chemical species are subsumed in the vector

x(t) = ([X1], [X2], . . . , [Xn]) = (x1, x2, . . . , xn)′ ,

and making use of the individual, reaction specific columns of the three matrices G, Q, and S, we obtain the changes in particle numbers for one elementary step of the reaction Rµ. The stoichiometric matrix collects the column vectors ν^{(µ)}:

S = ( ν^{(1)}, ν^{(2)}, . . . , ν^{(µ)}, . . . , ν^{(m)} ) with ν^{(µ)} = (ν^{(µ)}_1, ν^{(µ)}_2, . . . , ν^{(µ)}_n)′ .
For each reaction we define a vector

r^{(µ)} = α ( ν^{(µ)}_1, ν^{(µ)}_2, . . . , ν^{(µ)}_n )′ , (5.28)

which, in principle, accounts for α steps of the reaction Rµ. It is straightforward now to write down the changes in particle numbers caused by α elementary steps of reaction Rµ in either direction, forward or backward:

x → x + r^{(µ)} in the forward direction, and
x → x − r^{(µ)} in the backward direction. (5.29)
The reaction rates or transition probabilities are readily calculated from the reaction rate parameters and the combinatorics of molecular encounters:

w+µ = k+µ Π_{j=1}^{nµG} xj! / (xj − ν^{(µG)}_j)! and
w−µ = k−µ Π_{i=1}^{nµQ} xi! / (xi − ν^{(µQ)}_i)! . (5.30)
For large particle numbers and in macroscopic deterministic kinetics the combinatorial expressions are approximated by the terms with the highest power in the particle numbers:

w+µ ≈ k+µ Π_{j=1}^{nµG} xj^{ν^{(µG)}_j} / ν^{(µG)}_j! and w−µ ≈ k−µ Π_{i=1}^{nµQ} xi^{ν^{(µQ)}_i} / ν^{(µQ)}_i! .
Finally, we obtain the master equation for an arbitrary chemical reaction step (5.27) of a reaction network,

∂P(x, t)/∂t = Σ_{µ=1}^{m} ( w−µ(x + r^{(µ)}) P(x + r^{(µ)}, t) − w+µ(x) P(x, t) + w+µ(x − r^{(µ)}) P(x − r^{(µ)}, t) − w−µ(x) P(x, t) ) , (5.31)

or written in terms of the original parameters:

∂P(x, t)/∂t = Σ_{µ=1}^{m} k+µ { ( Π_{j=1}^{n} (xj + ν^{(µG)}_j − ν^{(µQ)}_j)! / (xj − ν^{(µQ)}_j)! ) P(x + ν^{(µG)} − ν^{(µQ)}, t) − ( Π_{j=1}^{n} xj! / (xj − ν^{(µG)}_j)! ) P(x, t) } +
 + Σ_{µ=1}^{m} k−µ { ( Π_{j=1}^{n} (xj − ν^{(µG)}_j + ν^{(µQ)}_j)! / (xj − ν^{(µG)}_j)! ) P(x − ν^{(µG)} + ν^{(µQ)}, t) − ( Π_{j=1}^{n} xj! / (xj − ν^{(µQ)}_j)! ) P(x, t) } . (5.31')
In equations (5.28) and (5.31) steps of any size α are permitted, the single-
step birth-and-death master equation for the chemical reaction network, how-
ever, is tantamount to the restriction α = 1 (see also subsection 4.4.3).
Generating functions. For combinatorial kinetics we can derive a fairly simple differential equation for the probability generating function

g(s, t) = Σ_{x=0}^{xmax} ( Π_{j=1}^{n} sj^{xj} ) P(x, t) , (5.32)

with xmax = (x1^{(max)}, x2^{(max)}, . . . , xn^{(max)}) being a vector collecting the (individually) maximal values of the particle numbers of the individual species and 0 = (0, 0, . . . , 0) being the zero-vector. Now we shall derive the corresponding partial differential equation. For this goal we separate the two parts corresponding to the step-up and step-down transition probabilities, w+µ and w−µ, respectively:

∂g(s, t)/∂t = ∂+g(s, t)/∂t + ∂−g(s, t)/∂t .
As follows from equation (5.30), the step-up equation is of the form

∂+g(s, t)/∂t = Σ_{µ,x} k+µ ( Π_{j=1}^{nµG} [ (xj − r^{(µ)}_j)! / (xj − r^{(µ)}_j − ν^{(µG)}_j)! ] sj^{xj} P(x − r^{(µ)}, t) − Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj} P(x, t) ) , (5.33)
whereby the step-down equation (∂−g(s, t)/∂t) has the same structure; the two equations are related by exchanging the superscripts, + ↔ −, the running indices, i ↔ j, the numbers of species, nµG ↔ nµQ,7 and the stoichiometric coefficients, ν^{(µG)}_j ↔ ν^{(µQ)}_i. Next we change the summation variable in the first term from x to x − r^{(µ)} and obtain – because the summation extends over the entire domain of x and all probabilities for x-values outside this domain vanish –

∂+g(s, t)/∂t = Σ_{µ,x} k+µ ( Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj + r^{(µ)}_j} − Π_{j=1}^{nµG} [ xj! / (xj − ν^{(µG)}_j)! ] sj^{xj} ) P(x, t) .
For further simplification we make use of the easily verified expressions

Πj sj^{xj} xj! / (xj − ν^{(µG)}_j)! = Πj ( ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} sj^{xj} ) sj^{ν^{(µG)}_j} and

Πj sj^{xj + r^{(µ)}_j} xj! / (xj − ν^{(µG)}_j)! = Πj ( ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} sj^{xj} ) sj^{ν^{(µQ)}_j} ,
7The numbers nµG and nµQ need not be the same in case some reactions are considered
to proceed irreversibly. An alternative approach uses nµG = nµQ and zero values for some
rate parameters.
and obtain for the step-up equation

∂+g(s, t)/∂t = Σµ k+µ ( Πi si^{ν^{(µQ)}_i} − Πj sj^{ν^{(µG)}_j} ) Πj ∂^{ν^{(µG)}_j} g(s, t) / ∂sj^{ν^{(µG)}_j} .

A similar formula is derived for the step-down equation, and summation of the step-up and the step-down equations finally yields the general expression of the differential equation for the generating function of chemical networks:

∂g(s, t)/∂t = Σµ ( Πi si^{ν^{(µQ)}_i} − Πj sj^{ν^{(µG)}_j} ) ( k+µ Πj ∂^{ν^{(µG)}_j}/∂sj^{ν^{(µG)}_j} − k−µ Πi ∂^{ν^{(µQ)}_i}/∂si^{ν^{(µQ)}_i} ) g(s, t) . (5.34)
We illustrate by means of examples.
Two reactions with a single variable. The two-step reaction mechanism

A + X −−k1−→ 2X + D and
B + X ⇌[k2, k3] C (5.35)
consists of an irreversible autocatalytic reaction step and a reversible addition reaction. The concentrations of three molecular species are assumed to be fixed: [A] = a0, [B] = b0, and [C] = c0; species D does not enter the kinetic equations since it is the product of an irreversible reaction, and hence only a single variable x = [X] has to be considered. Among other applications this model is a simple implementation of the processes taking place in a nuclear reactor. The first reaction represents nuclear fission: one neutron (X) hits a nucleus A and releases two neutrons (2X) and residual products (D). The second reaction describes the absorption of neutrons by B and their creation by the reverse process.
The three stoichiometric matrices have a very simple form – n = 1 and
m = 2 – in this case:
G = (1 1) , Q = (2 0) , and S = Q − G = (1 − 1) .
The constant concentrations can be absorbed into the rate parameters and we have

k+1 = k1 a0 = α , k−1 = 0 , k+2 = k2 b0 = β , k−2 = k3 c0 = γ .

From equation (5.34) we obtain by insertion and some calculation:

∂g(s, t)/∂t = (1 − s)(β − α s) ∂g(s, t)/∂s − (1 − s) γ g(s, t) . (5.36)
Formal division by dg yields the characteristic equations

dt/1 = −ds/((1 − s)(β − αs)) = dg/(−γ(1 − s) g) ,

leading to two ODEs,

−dt = ds/((1 − s)(β − αs)) and (1/γ) dg/g = ds/(β − αs) ,

that can be readily integrated and yield the solutions

u = [(1 − s)/(β − αs)] e^{(α−β)t} and v = (β − αs)^{γ/α} g .

The general solution can now be written as v = f(u), where f(u) is some function still to be determined:

g(s, t) = (β − αs)^{−γ/α} f( [(1 − s)/(β − αs)] e^{(α−β)t} ) .
This equation sustains a variety of time-dependent solutions. From the conditional probability P(x, t|x0, 0) that represents the initial conditions we obtain g(s, 0) = s^{x0} and

f(z) = (1 − βz)^{x0} (1 − αz)^{−γ/α−x0} (β − α)^{γ/α} ,

and with λ = β − α the probability generating function takes on the form

g(s, t) = λ^{γ/α} ( β(1 − e^{−λt}) − s(α − βe^{−λt}) )^{x0} × ( (β − αe^{−λt}) − αs(1 − e^{−λt}) )^{−γ/α−x0} . (5.37)
The first and second moments of the probability distribution of the random variable X(t) are readily computed from the derivatives of the generating function by means of equation (2.71):

∂g(s, t)/∂s |_{s=1} = ⟨x(t)⟩ and ∂²g(s, t)/∂s² |_{s=1} = ⟨x(t)(x(t) − 1)⟩ . (2.71')
The computation then yields for the time derivatives

d⟨x(t)⟩/dt = (α − β) ⟨x(t)⟩ + γ and
d⟨x(t)(x(t) − 1)⟩/dt = 2(α − β) ⟨x(t)(x(t) − 1)⟩ + 2(α + γ) ⟨x(t)⟩ .

For α < β the two equations have a stationary solution and the stationary mean and variance are

⟨x⟩ = γ/(β − α) and σ²(x) = βγ/(α − β)² . (5.38)
Equation (5.37) allows for a straightforward calculation of the stationary state:

lim_{t→∞} g(s, t) = (β − α)^{γ/α} (β − sα)^{x0} (β − sα)^{−γ/α−x0} , i.e.
g(s) = ( (β − α)/(β − sα) )^{γ/α} , and
P(x) = [ Γ(x + γ/α) (α/β)^x / (Γ(γ/α) x!) ] ( (β − α)/β )^{γ/α} . (5.39)

A stationary solution exists only if α < β or [A] < k2·[B]/k1 is fulfilled. For α > β the system is unstable in the sense that ⟨x(t)⟩ diverges exponentially, lim_{t→∞} ⟨x(t)⟩ = ∞. The expectation value and variance at the steady state are readily computed from (∂g(s)/∂s)|_{s=1} and (∂²g(s)/∂s²)|_{s=1} and, of course, yield the same result as in equation (5.38).
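The stationary distribution (5.39) is a negative binomial; the following sketch (arbitrary parameter values with α < β) checks its normalization and reproduces the stationary mean and variance of equation (5.38):

    import math

    def p_stat(x, alpha, beta, gamma):
        """Stationary distribution, equation (5.39), via log-gamma for stability."""
        return math.exp(math.lgamma(x + gamma/alpha) - math.lgamma(gamma/alpha)
                        - math.lgamma(x + 1) + x * math.log(alpha/beta)
                        + (gamma/alpha) * math.log((beta - alpha)/beta))

    alpha, beta, gamma = 1.0, 2.0, 3.0
    probs = [p_stat(x, alpha, beta, gamma) for x in range(300)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean)**2 * p for x, p in enumerate(probs))
    print(sum(probs))                             # -> 1
    print(mean, gamma / (beta - alpha))           # equation (5.38)
    print(var, beta * gamma / (beta - alpha)**2)  # equation (5.38)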
Interesting phenomena arise when α = k1a0 approaches β = k2b0, that is, when the number of particles X created by reaction one approximates the number of particles X annihilated in reaction two. Since the two quantities α and β involve the input concentrations of A and B, respectively, the critical quantity β − α can easily be fine-tuned in experiments. It is interesting that both moments become very large when the critical value α = β is approached, since

σ²(x)/⟨x⟩ = β/(β − α) = k2b0/(k2b0 − k1a0) → ∞ for α → β ,

and accordingly we are dealing with very large fluctuations in x near the critical point. Since the system is Markovian and the differential equation for the expectation value is linear, we have for the autocorrelation

⟨x(t), x(0)⟩ = σ²(x) e^{(α−β)t} = σ²(x) e^{(k1a0−k2b0)t} ,

and the relaxation of the fluctuations becomes very slow near the critical point, a phenomenon known as critical slowing down.
5.3.3 The formalism of the Poisson representation
The basic assumption is that a probability distribution P(x, t) can be expanded as a superposition of uncorrelated, multivariate Poisson distributions:

P(x, t) = ∫ dα Πk [ e^{−αk} αk^{xk} / xk! ] f(α, t) , (5.40)

where f(α, t) is the quasiprobability of the Poisson representation; the probability generating function g(s, t) can then be written as

g(s, t) = ∫ dα exp( Σk (sk − 1) αk ) f(α, t) . (5.41)
Substitution of g(s, t) into the differential equation (5.34) for the probability generating function, which stems from the general master equation (5.31') for a chemical reaction, leads to

∂g(s, t)/∂t = Σµ ∫ dα ( Πi (∂/∂αi + 1)^{ν^{(µQ)}_i} − Πj (∂/∂αj + 1)^{ν^{(µG)}_j} ) ( k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} ) e^{Σk(sk−1)αk} f(α, t) .
Integration by parts, neglect of the surface terms, and finally comparison of the coefficients in the exponential functions yields [6, pp.301,302]

∂f(α, t)/∂t = Σµ ( Πi (1 − ∂/∂αi)^{ν^{(µQ)}_i} − Πj (1 − ∂/∂αj)^{ν^{(µG)}_j} ) ( k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} ) f(α, t) . (5.42)
The function f(α, t) is obtained as solution of equation (5.42) and insertion
into equation (5.40) yields the approximation to the probability distribution
P (x, t).
The introduction of a reaction flux J(α) with the components

Jµ(α) = k+µ Πj αj^{ν^{(µG)}_j} − k−µ Πi αi^{ν^{(µQ)}_i} (5.43)
facilitates the formulation of a Fokker-Planck equation. If the repertoire of elementary steps does not contain reaction orders larger than two, or at maximum bimolecular reactions – which is almost always the case in realistic systems – then the Fokker-Planck equation contains no derivatives higher than second order:

∂f(α, t)/∂t = −Σ_{i=1}^{n} ∂/∂αi ( Σµ A^{(µ)}_i Jµ(α) ) f(α, t) + (1/2) Σ_{i,j=1}^{n} ∂²/∂αi∂αj Bij(J(α)) f(α, t) with

A^{(µ)}_i = ν^{(µQ)}_i − ν^{(µG)}_i and

Bij(J(α)) = δi,j Σµ ( ν^{(µQ)}_i (ν^{(µQ)}_i − 1) − ν^{(µG)}_i (ν^{(µG)}_i − 1) ) Jµ(α) +
 + (1 − δi,j) Σµ ( ν^{(µQ)}_i ν^{(µQ)}_j − ν^{(µG)}_i ν^{(µG)}_j ) Jµ(α) . (5.44)
For practical purposes it is much more convenient not to work with the
Fokker-Planck equation (5.44) but with the equivalent stochastic differential
equation in Ito form:

dαi = Σµ A^{(µ)}_i Jµ(α) dt + Σ_{j=1}^{n} cij(J(α)) dWj(t) with ( c(J(α)) c(J(α))′ )ij = Bij(J(α)) , (5.45)

where the dWj(t) are the increments of independent Wiener processes.
In order to expand quantities of interest calculated from equation (5.45) in inverse powers of the system size V, it is useful to define the intensive quantities

η = α/V , κ+µ = k+µ V^{Σj ν^{(µG)}_j − 1} , and κ−µ = k−µ V^{Σi ν^{(µQ)}_i − 1} ,

whereupon the stochastic differential equation takes on the form

dηi = Σµ A^{(µ)}_i Jµ(η) dt + ε Σ_{j=1}^{n} cij(J(η)) dWj(t) with ε = 1/√V .
We illustrate by means of two examples.
The monomolecular reversible reaction. Here we consider the monomolecular reversible reaction

X ⇌[k1, k2] Y (5.46)

as an exercise in two variables, although mass conservation would allow for the elimination of one variable. It can be described by the master equation

∂P(x, y, t)/∂t = k1 ( (x+1) P(x+1, y−1, t) − x P(x, y, t) ) + k2 ( (y+1) P(x−1, y+1, t) − y P(x, y, t) ) , (5.47)

where P(x, y, t) = Prob(X(t) = x, Y(t) = y).8 Now we expand P(x, y, t) in Poisson distributions according to equation (5.40), whereby N is a normalization factor and the region of integration is still to be determined:

P(x, y, t) = N ∫ dαx dαy e^{−αx} (αx^x / x!) e^{−αy} (αy^y / y!) f(αx, αy, t) . (5.48)
8It is important to keep in mind that depending on the experimental setup the stochas-
tic variables X and Y may be dependent or independent. In closed systems we have the
conservation relation X + Y = N or x+ y = n with N and n being constants.
In general, a Poisson representation in terms of a function f(α, t) need not exist, but it can be proven to exist in terms of generalized functions: any distribution in x can be realized as a linear combination of Kronecker deltas δx,z, which can be chosen to fulfil9

δx,z = ∫ dα e^{−α} (α^x / x!) ( δ^{(z)}(−α) e^{α} ) .

Now we choose fz(α) = (−1)^z δ^{(z)}(α) e^{α} and find through integration by parts

∫ dα fz(α) e^{−α} (α^x / x!) = ∫ dα (α^x / x!) ( −d/dα )^z δ(α) = δx,z ,

and for a one-variable probability distribution we can write

P(x) = ∫ dα e^{−α} (α^x / x!) ( Σz (−1)^z P(z) δ^{(z)}(α) e^{α} ) ;

in a formal sense a function f(α) can thus always be found for any P(x).
Commonly, however, one need not rely on such rather bizarre distributions as the one used for the proof of existence of a Poisson quasiprobability. If the quasiprobability vanishes at the boundary of the region of integration, substitution of (5.48) into (5.47) and integration by parts yields the Fokker-Planck equation for f(αx, αy, t):

∂f(αx, αy, t)/∂t = ( −∂/∂αx + ∂/∂αy ) ( (k1αx − k2αy) f(αx, αy, t) ) . (5.49)
It is important to note that the diffusion coefficient in equation (5.49) is
zero and this is generally observed for all first-order reactions giving rise to
linear kinetic equations: All fluctuations obey the Poissonian law of noise as
it is encapsulated in the Poissonian distribution. The range of integration
in equation (5.48) follows from the solution of (5.49) by searching for the
manifold in the (αx, αy) plane as the boundary on which f(αx, αy) vanishes.
The general steady state solution satisfying the boundary condition of
vanishing f(αx, αy) at the limits of the range of integration can be written
f(αx, αy) = δ(k1αx − k2αy)φ(αx, αy) , (5.50)
9 The n-th derivative of the delta function with respect to its argument is denoted by δ^{(n)}(x).
where φ(αx, αy) is still some arbitrary function.10

If we choose, for example, φ(αx, αy) = δ(αx − x̄), the corresponding steady state solution P(x, y) is

P(x, y) = N ∫ dαx dαy e^{−αx} (αx^x / x!) e^{−αy} (αy^y / y!) δ(k1αx − k2αy) δ(αx − x̄) .

The range of integration is any region which contains the point where the arguments of both delta functions vanish. Eventually we find for the equilibrium distribution

P(x, y) = (x̄^x / x!) (ȳ^y / y!) e^{−(x̄+ȳ)} , (5.51)

which is a Poisson distribution in x and y wherein the values x̄ and ȳ are related by the condition for the deterministic equilibrium: k1x̄ − k2ȳ = 0.
Alternatively we could use

φ(αx, αy) = (−1)^n δ^{(n)}(αy) e^{αx+αy} .

Then we obtain for the stationary probability distribution

P(x, y) = (n!/n^n) (x̄^x / x!) (ȳ^y / y!) δ(x + y − n) and, with y = n − x,
P(x) = (1/n^n) C(n, x) x̄^x ȳ^{n−x} , (5.52)

which is a binomial distribution. In this case the condition for the deterministic equilibrium is k1x̄ = k2ȳ = n k1k2/(k1 + k2), with ȳ = n − x̄. The two choices correspond to the grand canonical and the canonical ensemble, respectively, and we see nicely that the latter is more restricted than the former: in the closed system the total number of molecules X and Y is constant, X + Y = N.
10An alternative but equivalent ansatz for the stationary solution is
f(αx, αy) = g(αx + αy)/(k1αx − k2αy) but because of the pole at k1αx = k2αy it requires
integration in the complex plane (subsection 5.3.4). If integration is restricted to a region
along the real axis (5.50) is unique.
Two reactions with a single variable. The reaction mechanism studied here is closely related to the nuclear reactor model (5.35):

A + X ⇌[k1, k2] 2X and
B + X ⇌[k3, k4] C , (5.53)

where both reactions are assumed to be reversible and the amounts of three molecular species are assumed to be constant: [A] = a0, [B] = b0, and [C] = c0.11
stoichiometric matrices for n = 1 and m = 2 have a very simple form:
G = (1 1) , Q = (2 0) , and S = Q − G = (1 − 1) .
The constant concentrations can be absorbed into the rate parameters and
we have
k+1 = k1 a0 , k−1 = k2 , k+2 = k3 b0 , k−2 = k4 c0 .
Equation (5.42) for the function f(α, t) in the Poisson representation now takes on the form

∂f(α, t)/∂t = ( (1 − ∂/∂α)² − (1 − ∂/∂α) ) (k1a0 α − k2 α²) f(α, t) + ( 1 − (1 − ∂/∂α) ) (k3b0 α − k4c0) f(α, t) ,

eventually leading to

∂f(α, t)/∂t = ( −∂/∂α ( k4c0 + (k1a0 − k3b0) α − k2 α² ) + ∂²/∂α² ( k1a0 α − k2 α² ) ) f(α, t) . (5.54)
^{11}We remark that the mechanism (5.53) is compatible with equilibrium thermodynamics if and only if the relation k_1 a_0 · k_3 b_0 − k_2 · k_4 c_0 = 0, i.e. ϑ = 0, is fulfilled.
This equation is of Fokker-Planck form as long as D = k_1 a_0 α − k_2 α² > 0 is fulfilled. The critical values of α where the expression for D vanishes are α_1 = 0 and α_2 = k_1 a_0/k_2; between these two values we have D > 0, and a conventional Fokker-Planck equation exists.
For one variable the factorial moments are obtained from the simple relationship (2.75d), and they are of the general form
\[
\langle x^{(r)}\rangle \;\equiv\; \sum_{x}\int d\alpha\;\bigl(x(x-1)\cdots(x-r+1)\bigr)\,
\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)
\;=\;\int d\alpha\;\alpha^{r}\,f(\alpha) \;\equiv\; \langle\alpha^{r}\rangle\; . \tag{5.55}
\]
There is, however, one caveat: the quasiprobability f(α) need not fulfil the conditions required for a probability.
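Relation (5.55) is easily illustrated numerically in the purely Poissonian case f(α) = δ(α − α_0), where the r-th factorial moment of x must equal α_0^r. The sample size and parameter values in the following Monte-Carlo sketch are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, r = 4.0, 3                   # arbitrary Poisson mean and moment order
x = rng.poisson(alpha0, 10**6).astype(float)

# Sample average of the falling factorial x(x-1)(x-2), i.e. r = 3:
factorial_moment = np.mean(x * (x - 1.0) * (x - 2.0))
print(factorial_moment, alpha0**r)   # both close to 64
```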
The stationary solution of the Fokker-Planck equation (5.54) is, up to normalization,
\[
f(\alpha) = \mathcal{N}\,\bigl(k_1 a_0 - k_2\,\alpha\bigr)^{\,k_3 b_0/k_2 \,-\, k_4 c_0/(k_1 a_0) \,-\, 1}\;
\alpha^{\,k_4 c_0/(k_1 a_0)\,-\,1}\;e^{\alpha}\; . \tag{5.56}
\]
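That (5.56) indeed solves the stationary, zero-current form of (5.54), d(Df)/dα = Af with drift A(α) and diffusion coefficient D(α), can be confirmed by symbolic differentiation. In the sketch below the products k_1 a_0, k_3 b_0, and k_4 c_0 are treated as single positive symbols; this is a verification sketch, not part of the original derivation.

```python
import sympy as sp

a, K1, K2, K3, K4 = sp.symbols('alpha k1a0 k2 k3b0 k4c0', positive=True)

A = K4 + (K1 - K3) * a - K2 * a**2   # drift term of eq. (5.54)
D = K1 * a - K2 * a**2               # diffusion coefficient of eq. (5.54)

# Candidate stationary solution (5.56), normalization constant omitted:
f = (K1 - K2 * a)**(K3/K2 - K4/K1 - 1) * a**(K4/K1 - 1) * sp.exp(a)

residual = sp.simplify(sp.diff(D * f, a) / f - A)
print(residual)                      # prints 0
```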
Although this is a fairly smooth function, the prerequisites for a probability density – f(α) being nonnegative everywhere and normalizable – need not be fulfilled. With ϑ = k_3 b_0/k_2 − k_4 c_0/(k_1 a_0), normalization is possible on the interval ]0, k_1 a_0/k_2[ provided ϑ > 0 and k_4 > 0 are fulfilled. In addition, one has to check whether the surface terms vanish under these conditions, which is the case in the current example. Thus there exists a genuine Fokker-Planck equation for ϑ > 0 and k_4 > 0 that is equivalent to the stochastic differential equation
\[
d\alpha = \bigl(k_4 c_0 + (k_1 a_0 - k_3 b_0)\,\alpha - k_2\,\alpha^2\bigr)\,dt
+ \sqrt{2\,(k_1 a_0\,\alpha - k_2\,\alpha^2)}\;dW(t)\; . \tag{5.57}
\]
The domain of the variable is 0 < α < k_1 a_0/k_2, and one can readily verify that both boundaries satisfy the criteria for entrance boundaries. Accordingly, it is not possible to leave the interval ]0, k_1 a_0/k_2[. Outside this interval the coefficient of dW(t) becomes imaginary, and an interpretation of the SDE on the real axis alone is no longer possible.
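The entrance-boundary behavior can be illustrated with a naive Euler-Maruyama integration of (5.57). The parameter values below are hypothetical and chosen such that ϑ > 0; the clipping of the squared noise amplitude merely guards against small numerical overshoots of the boundaries, which the exact dynamics never produces.

```python
import numpy as np

# Hypothetical parameters with theta = k3b0/k2 - k4c0/(k1a0) = 0.35 > 0:
k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 0.5, 0.3
dt, steps = 1.0e-4, 200_000
rng = np.random.default_rng(7)

alpha = 1.0                                    # start inside ]0, k1a0/k2[ = ]0, 2[
for _ in range(steps):
    drift = k4c0 + (k1a0 - k3b0) * alpha - k2 * alpha**2
    noise2 = 2.0 * (k1a0 * alpha - k2 * alpha**2)
    alpha += drift * dt + np.sqrt(max(noise2, 0.0) * dt) * rng.normal()
print(alpha)                                   # trajectory stays inside ]0, 2[
```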
5.3.4 Real, complex, and positive Poisson representations
In the case of a single variable only, the Poisson representation of the probability distribution P(x, t) is of the form
\[
P(x,t) = \int_{D} d\mu(\alpha)\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; , \tag{5.58}
\]
where μ(α) is a measure that can be chosen in different ways and D is the domain of integration, which can take on various forms depending on the choice of μ(α).
In real Poisson representations the choice of the measure is dµ(α) = dα
and D is a section [a, b] of the real line. As we have seen in the second
example of the previous subsection 5.3.3 there may be situations where the
diffusion coefficient becomes negative and then a real Poisson representation
does not exist.
For complex Poisson representations we choose again the simple measure dμ(α) = dα, but D is a contour C in the complex plane. In order to analyze the existence of a complex Poisson representation we choose
\[
f_z(\alpha) = \frac{z!}{2\pi i}\;\alpha^{-z-1}\,e^{\alpha}
\]
with C being a contour surrounding the origin, instead of the expression f_z(α) = (−1)^z δ^{(z)}(α) e^α used previously for the real representation. Then we can show that
\[
P_z(x) = \frac{1}{2\pi i}\,\frac{z!}{x!}\oint_{C} d\alpha\;\alpha^{\,x-z-1} = \delta_{x,z}\; .
\]
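The identity P_z(x) = δ_{x,z} is just a variant of Cauchy's integral formula and can be checked by brute force, discretizing the contour integral over the unit circle; the number of quadrature points is an arbitrary choice.

```python
import numpy as np

# Discretize the closed contour C: the unit circle around the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)[:-1]
alpha = np.exp(1j * theta)                   # points on C
dalpha = 1j * alpha * (theta[1] - theta[0])  # d(alpha) = i*alpha*d(theta)

z = 4
for x in range(8):
    val = np.sum(alpha**(x - z - 1) * dalpha) / (2.0j * np.pi)
    print(x, round(val.real, 10))            # 1.0 for x = z, 0.0 otherwise
```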
By appropriate summation we may express a given probability distribution P(x) in terms of functions f(α), which are now given by
\[
f(\alpha) = \frac{1}{2\pi i}\sum_{z} P(z)\;e^{\alpha}\,\alpha^{-z-1}\,z!\; . \tag{5.59}
\]
Provided the P(z) have the property that z!P(z) is bounded for all z, the series has a finite radius of convergence outside which f(α) is analytic. By choosing the contour C to lie outside this circle of convergence, the integration can be taken inside the summation, and we find that P(x) is finally given by
\[
P(x) = \oint_{C} d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; . \tag{5.60}
\]

Figure 5.6: Definition of the contour C in the complex plane. The sketch refers to the reaction mechanism (5.53). The contour C is chosen to enclose the pole on the real axis at α = k_1 a_0/k_2. The range on the real axis that is accessible to the real Poisson representation, the stretch ]0, k_1 a_0/k_2[, is shown as a full line.
Equation (5.60) is the analogue of the real representation in the complex plane and can be used, for example, for functions f(α) that have poles on the real axis.
The complex Poisson representation allows for the calculation of stationary solutions in analytic form, to which exact or asymptotic methods are easily applicable. In general, it is not so useful for the derivation of time-dependent solutions.
Two reactions with a single variable. Again we adopt the reaction model (5.53) of subsection 5.3.3 with the notation introduced there. When a steady state exists, the quantity ϑ = k_3 b_0/k_2 − k_4 c_0/(k_1 a_0) provides a measure for the relative direction in which the two reactions are progressing: ϑ > 0 implies that the first reaction produces and the second reaction consumes X, at ϑ = 0 both reactions balance separately and we have thermodynamic equilibrium (see the footnote to the mechanism (5.53)), and ϑ < 0 implies that the first reaction consumes X whereas the second reaction produces it.^{12} Now we discuss the three cases separately:
(i) ϑ > 0: This case has been analyzed in subsection 5.3.3. The conditions for f(α) to be a valid quasiprobability on the real interval ]0, k_1 a_0/k_2[ are fulfilled. Within this range the diffusion coefficient D = k_1 a_0 α − k_2 α² is positive, and the deterministic mean of α, given by
\[
\bar{\alpha} = \langle\alpha\rangle = \frac{k_1 a_0 - k_3 b_0 + \sqrt{(k_1 a_0 - k_3 b_0)^2 + 4\,k_2\,k_4 c_0}}{2\,k_2}\; ,
\]
lies within the interval under consideration, ]0, k_1 a_0/k_2[. Hence we are dealing with a genuine Fokker-Planck equation, and f(α) is a function that vanishes at both ends of the interval and has its maximum near the deterministic mean.
(ii) ϑ = 0: Both reactions balance separately, and the existence of a Poissonian steady state is expected. The quasiprobability f(α) has a pole at α = k_1 a_0/k_2, and the range of α is chosen to be a contour C in the complex plane enclosing this pole. Partial integration produces no boundary terms for a closed contour C, and hence the P(x) resulting from this type of Poisson representation satisfies the steady state master equation. By the calculus of residues we find
\[
P(x) = \frac{e^{-\alpha_0}\,\alpha_0^{\,x}}{x!} \qquad\text{with}\qquad \alpha_0 = \frac{k_1 a_0}{k_2}\; . \tag{5.61}
\]
(iii) ϑ < 0: The steady state no longer satisfies the condition ϑ > 0. If, however, the range of α is chosen to be a contour C in the complex plane as shown in figure 5.6, the complex Poisson representation can be used to construct a steady state solution of the master equation that has the form
\[
P(x) = \oint_{C} d\alpha\; f(\alpha)\,\frac{e^{-\alpha}\alpha^{x}}{x!}\; .
\]
Now the deterministic steady state corresponds to a point on the real axis that is situated to the right of the singularity at α = k_1 a_0/k_2, and asymptotic
^{12}This example demonstrates nicely the difference between stationarity and detailed balance: all three cases describe steady states, but only the condition ϑ = 0, the state of thermodynamic equilibrium, is compatible with detailed balance.
evaluation of means, higher moments, and other quantities may be performed by choosing C to pass through the saddle point occurring there. Then the variance σ²(α) = ⟨α²⟩ − ⟨α⟩² is negative. As a consequence the variance in x, which is obtained as the variance in α plus the variance of the Poisson distribution, ⟨α⟩,
\[
\sigma^2(x) = \langle x^2\rangle - \langle x\rangle^2 = \langle \alpha^2\rangle - \langle \alpha\rangle^2 + \langle \alpha\rangle\; , \tag{5.62}
\]
is smaller than the variance of the Poissonian: σ²(x) < ⟨x⟩. In other words, the steady state distribution is narrower than the Poisson distribution.
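The sub-Poissonian character of the steady state for ϑ < 0 can be verified independently of the Poisson representation: the one-variable master equation of mechanism (5.53) is a birth-and-death process, so its stationary distribution follows exactly from detailed balance of neighboring states. The rate combinations below are hypothetical and give ϑ = −0.5.

```python
import numpy as np

k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 0.5, 2.0   # theta = 0.5 - 1.0 = -0.5 < 0

# Birth rate x -> x+1: k1a0*x + k4c0;  death rate x -> x-1: k2*x*(x-1) + k3b0*x.
xmax = 200
p = np.zeros(xmax + 1)
p[0] = 1.0
for x in range(xmax):
    birth = k1a0 * x + k4c0
    death = k2 * (x + 1) * x + k3b0 * (x + 1)
    p[x + 1] = p[x] * birth / death          # detailed balance recursion
p /= p.sum()

xs = np.arange(xmax + 1)
mean = (xs * p).sum()
var = ((xs - mean)**2 * p).sum()
print(mean, var, var / mean)                 # Fano factor < 1: sub-Poissonian
```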
Finally, we stress that all three cases may be obtained from the contour C. For ϑ = 0 the cut from the singularity at α = k_1 a_0/k_2 to α = −∞ disappears, and C can be distorted into a simple contour around the pole. If ϑ > 0 the singularity becomes integrable, and the contour C may be collapsed onto the cut. The integral to be evaluated is then a discontinuity integral over the full range [0, k_1 a_0/k_2], which might need modifications for ϑ being a positive integer.
For the positive Poisson representation we choose α to be a genuine complex variable with two real components, α ≡ α_x + i α_y, with
\[
d\mu(\alpha) = d^2\alpha = d\alpha_x\,d\alpha_y\; ,
\]
and D being the entire complex plane. Then it can be proven [6, pp. 309–312] that for any P(x) there exists a positive f(α) such that
\[
P(x) = \int d^2\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f(\alpha)\; . \tag{5.63}
\]
A positive Poisson representation f_p(α), however, need not be unique, as we show by means of an example. We choose
\[
f_p(\alpha) = \frac{1}{2\pi\sigma^2}\,\exp\!\left(-\,\frac{|\alpha-\alpha_0|^2}{2\sigma^2}\right)
\]
and g(α) an analytic function of α that can be expanded according to
\[
g(\alpha) = g(\alpha_0) + \sum_{n=1}^{\infty} g^{(n)}(\alpha_0)\,\frac{(\alpha-\alpha_0)^{n}}{n!}
\qquad\text{such that}\qquad
\int d^2\alpha\;\frac{1}{2\pi\sigma^2}\,\exp\!\left(-\,\frac{|\alpha-\alpha_0|^2}{2\sigma^2}\right) g(\alpha) = g(\alpha_0)\; ,
\]
since all terms with n ≥ 1 vanish upon integration. As the Poissonian e^{−α}α^x/x! itself is an analytic function, we find for any positive value of σ²
\[
P(x) = \int d^2\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\,f_p(\alpha) = \frac{e^{-\alpha_0}\,\alpha_0^{\,x}}{x!}\; .
\]
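This statement is readily verified by Monte-Carlo integration: sampling α from the rotationally symmetric Gaussian f_p and averaging the analytic Poissonian kernel reproduces its value at α_0. All numerical values below are arbitrary choices for the sketch.

```python
import math
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
alpha0, sigma, x = 3.0, 0.5, 4       # arbitrary center, width, and argument

# Sample alpha = alpha_x + i*alpha_y from the Gaussian f_p centered at alpha0.
m = 10**5
alpha = alpha0 + sigma * rng.normal(size=m) + 1j * sigma * rng.normal(size=m)

# Analytic Poissonian kernel e^{-alpha} alpha^x / x! for complex alpha:
kernel = np.exp(-alpha + x * np.log(alpha) - gammaln(x + 1))
print(kernel.mean().real)                                  # Monte-Carlo estimate
print(math.exp(-alpha0) * alpha0**x / math.factorial(x))   # exact: 0.16803...
```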
Nonuniqueness of this kind is an advantage in practice rather than a problem. As an example we consider again mechanism (5.53), which now gives rise to an SDE in the complex plane, since α = α_x + i α_y:
\[
d\alpha = \bigl(k_4 c_0 + (k_1 a_0 - k_3 b_0)\,\alpha - k_2\,\alpha^2\bigr)\,dt
+ \sqrt{2\,(k_1 a_0\,\alpha - k_2\,\alpha^2)}\;dW(t)\; . \tag{5.64}
\]
Again the term ϑ plays the decisive role: For ϑ > 0 the noise term vanishes at α = 0 and α = k_1 a_0/k_2 and is positive between these points, and the drift term takes care that α returns to the range ]0, k_1 a_0/k_2[ whenever it reaches one of the endpoints. For ϑ > 0 equation (5.64) is thus the real stochastic differential equation (5.57) on the real interval [0, k_1 a_0/k_2].
For ϑ < 0 the stationary point lies outside the interval [0, k_1 a_0/k_2], and a point inside the interval will migrate according to (5.64) along the interval until it reaches the right-hand end, where the noise vanishes but the drift continues to drive it further towards the right. On leaving the interval the noise becomes imaginary, and the point starts to diffuse in the complex plane until it eventually returns to the interval [0, k_1 a_0/k_2]. Thus the behavior for the entire range of ϑ is encapsulated in a single SDE.
Application to logistic growth. Based on the mechanism (5.53), an extensive stochastic analysis of the logistic model in population dynamics has been performed by Alexei and Peter Drummond [121, 123]. The deterministic dynamics is described by the differential equation
\[
\frac{dx}{dt} = x\,(g - c\,x)\; ,
\]
which is identical with the logistic equation of Pierre-François Verhulst [124]. In this formulation g is the unconstrained growth rate and g/c the carrying capacity of the ecosystem. The dynamical system has an equilibrium point at x = g/c and a second stationary state at x = 0. Drummond and Drummond add a death reaction in addition to the competition process that limits the population size. The implementation of the logistic model makes use of three processes:
\[
\begin{aligned}
\text{Death:}\qquad & \mathrm{X} \;\overset{a}{\longrightarrow}\; \varnothing\; ,\\
\text{Birth:}\qquad & \mathrm{X} \;\overset{b}{\longrightarrow}\; 2\,\mathrm{X}\; ,\\
\text{Competition:}\qquad & 2\,\mathrm{X} \;\overset{c}{\longrightarrow}\; \mathrm{X}\; .
\end{aligned} \tag{5.65}
\]
Parametrization yields the three quantities: (i) the net growth rate g = b − a, (ii) the carrying capacity N_c = g/c, and (iii) the reproductive ratio r = b/a.
Comparison with the mechanism (5.53) shows that this logistic model is a special case of it with the substitutions k_1 a_0 → b, k_2 → c, k_3 b_0 → a, and k_4 c_0 → 0.
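A stochastic realization of the three processes (5.65) can be generated with Gillespie's simulation algorithm [2]. The sketch below uses hypothetical rate values and the propensity c·x(x−1) for the competition step (conventions differing by a factor 1/2 exist), so that the deterministic limit is dx/dt ≈ (b − a)x − cx².

```python
import numpy as np

a, b, c = 0.1, 1.0, 0.01     # hypothetical death, birth, and competition rates
rng = np.random.default_rng(0)

x, t, t_end = 10, 0.0, 100.0
while t < t_end and x > 0:   # x = 0 is absorbing: extinction is possible
    rates = np.array([a * x, b * x, c * x * (x - 1)])
    total = rates.sum()
    t += rng.exponential(1.0 / total)        # waiting time to the next event
    event = rng.choice(3, p=rates / total)   # which of the three processes fires
    x += (-1, 1, -1)[event]                  # death, birth, competition
print(t, x)   # x fluctuates around the carrying capacity (b - a)/c = 90
```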
Bibliography
[1] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comp. Phys., 22:403–434, 1976.
[2] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81:2340–2361, 1977.
[3] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A, 188:404–425, 1992.
[4] D. T. Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem., 58:35–55, 2007.
[5] K. L. Chung. Elementary Probability Theory with Stochastic Processes. Springer-Verlag, New York, 3rd edition, 1979.
[6] C. W. Gardiner. Stochastic Methods. A Handbook for the Natural Sciences and Social Sciences. Springer Series in Synergetics. Springer-Verlag, Berlin, fourth edition, 2009.
[7] M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK, 1983.
[8] D. S. Moore, G. P. McCabe, and B. Craig. Introduction to the Practice of Statistics. W. H. Freeman & Co., New York, sixth edition, 2009.
[9] H. Risken. The Fokker-Planck Equation. Methods of Solution and Applications. Springer-Verlag, Berlin, 2nd edition, 1989.
[10] M. Fisz. Wahrscheinlichkeitsrechnung und mathematische Statistik. VEB Deutscher Verlag der Wissenschaft, Berlin, 1989.
[11] H. Georgii. Stochastik. Einführung in die Wahrscheinlichkeitstheorie und Statistik. Walter de Gruyter GmbH & Co., Berlin, third edition, 2007.
[12] D. S. Moore and W. I. Notz. Statistics. Concepts and Controversies. W. H. Freeman & Co., New York, seventh edition, 2009.
[13] W. I. Notz and M. A. Fligner. Study Guide for Moore and McCabe's Introduction to the Practice of Statistics. W. H. Freeman, New York, third edition, 1999.
[14] N. Henze. Stochastik für Einsteiger. Eine Einführung in die faszinierende Welt des Zufalls. Vieweg Verlag, Braunschweig, DE, fourth edition, 2003.
[15] M. S. Bartlett. An Introduction to Stochastic Processes with Special Reference to Methods and Applications. Cambridge University Press, Cambridge, UK, 3rd edition, 1978.
[16] D. R. Cox and H. D. Miller. The Theory of Stochastic Processes. Methuen, London, 1965.
[17] J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.
[18] W. Feller. An Introduction to Probability Theory and its Applications, volume I and II. John Wiley, New York, 1966.
[19] N. S. Goel and N. Richter-Dyn. Stochastic Models in Biology. Academic Press, New York, 1974.
[20] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 2nd edition, 1994.
[21] M. Iosifescu and P. Tautu. Stochastic Processes and Application in Biology and Medicine. I. Theory, volume 3 of Biomathematics. Springer-Verlag, Berlin, 1973.
[22] M. Iosifescu and P. Tautu. Stochastic Processes and Application in Biology and Medicine. II. Models, volume 4 of Biomathematics. Springer-Verlag, Berlin, 1973.
[23] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. Academic Press, New York, 1975.
[24] S. Karlin and H. M. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981.
[25] A. Stuart and J. K. Ord. Kendall's Advanced Theory of Statistics. Volume 1: Distribution Theory. Charles Griffin & Co., London, fifth edition, 1987.
[26] A. Stuart and J. K. Ord. Kendall's Advanced Theory of Statistics. Volume 2: Classical Inference and Relationship. Edward Arnold, London, fifth edition, 1991.
[27] A. Loman, I. Gregor, C. Stutz, M. Mund, and J. Enderlein. Measuring rotational diffusion of macromolecules by fluorescence correlation spectroscopy. Photochem. Photobiol. Sci., 9:627–636, 2010.
[28] M. Gösch and R. Rigler. Fluorescence correlation spectroscopy of molecular motions and kinetics. Advanced Drug Delivery Reviews, 57:169–190, 2005.
[29] S. T. Hess, S. Huang, A. A. Heikal, and W. W. Webb. Biological and chemical applications of fluorescence correlation spectroscopy: A review. Biochemistry, 41:697–705, 2002.
[30] J. Hohlbein, K. Gryte, M. Heilemann, and A. N. Kapanidis. Surfing on a new wave of single-molecule fluorescence methods. Phys. Biol., 7:031001, 2010.
[31] E. Haustein and P. Schwille. Single-molecule spectroscopic methods. Curr. Op. Struct. Biol., 14:531–540, 2004.
[32] R. Brown. A brief description of microscopical observations made in the months of June, July and August 1827, on the particles contained in the pollen of plants, and on the general existence of active molecules in organic and inorganic bodies. Phil. Mag., Series 2, 4:161–173, 1828.
[33] A. Einstein. Über die von der molekular-kinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annal. Phys. (Leipzig), 17:549–560, 1905.
[34] M. von Smoluchowski. Zur kinetischen Theorie der Brownschen Molekularbewegung und der Suspensionen. Annal. Phys. (Leipzig), 21:756–780, 1906.
[35] E. N. Lorenz. Deterministic nonperiodic flow. J. Atmospheric Sciences, 20:130–141, 1963.
[36] G. Mendel. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereins in Brünn, 4:3–47, 1866.
[37] R. A. Fisher. Has Mendel's work been rediscovered? Annals of Science, 1:115–137, 1936.
[38] A. Franklin, A. W. F. Edwards, D. Fairbanks, D. Hartl, and T. Seidenfeld. Ending the Mendel-Fisher Controversy. University of Pittsburgh Press, Pittsburgh, PA, 2008.
[39] W. Penney. Problem: Penney-Ante. J. Recreational Math., 2(October):241, 1969.
[40] R. T. Cox. The Algebra of Probable Inference. The Johns Hopkins Press, Baltimore, MD, 1961.
[41] E. T. Jaynes. Probability Theory. The Logic of Science. Cambridge University Press, Cambridge, UK, 2003.
[42] G. Vitali. Sul problema della misura dei gruppi di punti di una retta. Tipi Gamberini e Parmeggiani, Bologna, 1905.
[43] P. Billingsley. Probability and Measure. Wiley-Interscience, New York, third edition, 1995.
[44] M. Carter and B. van Brunt. The Lebesgue-Stieltjes Integral. A Practical Introduction. Springer-Verlag, Berlin, 2007.
[45] D. Meintrup and S. Schäffler. Stochastik. Theorie und Anwendungen. Springer-Verlag, Berlin, 2005.
[46] G. de Beer, Sir. Mendel, Darwin, and Fisher. Notes and Records of the Royal Society of London, 19:192–226, 1964.
[47] K. Sander. Darwin und Mendel. Wendepunkte im biologischen Denken. Biologie in unserer Zeit, 18:161–167, 1988.
[48] T. Bayes and R. Price. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S. Phil. Trans. Roy. Soc. London, 53:370–418, 1763.
[49] C. P. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, Berlin, 2007.
[50] P. M. Lee. Bayesian Statistics. An Introduction. Hodder Arnold, New York, third edition, 2004.
[51] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall / CRC, Boca Raton, FL, second edition, 2004.
[52] B. E. Cooper. Statistics for Experimentalists. Pergamon Press, Oxford, 1969.
[53] J. F. Kenney and E. S. Keeping. Mathematics of Statistics. Van Nostrand, Princeton, NJ, second edition, 1951.
[54] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes. The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 1986.
[55] J. F. Kenney and E. S. Keeping. The k-Statistics. In Mathematics of Statistics. Part I, § 7.9. Van Nostrand, Princeton, NJ, third edition, 1962.
[56] M. Evans, N. A. J. Hastings, and J. B. Peacock. Statistical Distributions. John Wiley & Sons, New York, third edition, 2000.
[57] N. A. Weber. Dimorphism of the African Oecophylla worker and an anomaly (Hymenoptera Formicidae). Annals of the Entomological Society of America, 39:7–10, 1946.
[58] M. F. Schilling, A. E. Watkins, and W. Watkins. Is human height bimodal? The American Statistician, 56:223–229, 2002.
[59] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, first edition, 1983.
[60] D. Williams. Diffusions, Markov Processes and Martingales. Volume 1: Foundations. John Wiley & Sons, Chichester, UK, 1979.
[61] B. K. Øksendal. Stochastic Differential Equations. An Introduction with Applications. Springer-Verlag, Berlin, sixth edition, 2003.
[62] M. Schubert and G. Weber. Quantentheorie. Grundlagen und Anwendungen. Spektrum Akademischer Verlag, Heidelberg, DE, 1993.
[63] R. W. Robinett. Quantum Mechanics. Classical Results, Modern Systems, and Visualized Examples. Oxford University Press, New York, 1997.
[64] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, second edition, 1985.
[65] I. F. Gihman and A. V. Skorohod. The Theory of Stochastic Processes. Vol. I, II, and III. Springer-Verlag, Berlin, 1975.
[66] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Phys. Rev., 36:823–841, 1930.
[67] P. Langevin. On the theory of Brownian motion. C. R. Acad. Sci. (Paris), 146:530–533, 1908.
[68] L. Arnold. Stochastic Differential Equations. Theory and Applications. John Wiley & Sons, New York, 1974.
[69] P. Medvegyev. Stochastic Integration Theory. Oxford University Press, New York, 2007.
[70] P. E. Protter. Stochastic Integration and Differential Equations, volume 21 of Applications of Mathematics. Springer-Verlag, Berlin, second edition, 2004.
[71] K. Ito. Stochastic integral. Proc. Imp. Acad. Tokyo, 20:519–524, 1944.
[72] K. Ito. On stochastic differential equations. Mem. Amer. Math. Soc., 4:1–51, 1951.
[73] R. L. Stratonovich. Introduction to the Theory of Random Noise. Gordon and Breach, New York, 1963.
[74] D. L. Fisk. Quasi-martingales. Trans. Amer. Math. Soc., 120:369–389, 1965.
[75] A. D. Fokker. Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld. Annal. Phys. (Leipzig), 43:810–820, 1914.
[76] M. Planck. Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie. Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl., 1917/I:324–341, 1917.
[77] A. N. Kolmogorov. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Mathematische Annalen, 104:415–418, 1931.
[78] W. Feller. The parabolic differential equations and the associated semi-groups of transformations. Annals of Mathematics, Second Series, 55:468–519, 1952.
[79] R. C. Tolman. The Principles of Statistical Mechanics. Oxford University Press, Oxford, UK, 1938.
[80] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. I. Even variables only. Physica, 23:707–719, 1957.
[81] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. II. Even and odd variables. Physica, 23:816–829, 1957.
[82] R. Graham and H. Haken. Generalized thermodynamic potential for Markoff systems in detailed balance and far from thermal equilibrium. Z. Physik, 243:289–302, 1971.
[83] L. Onsager. Reciprocal relations in irreversible processes. I. Phys. Rev., 37:405–426, 1931.
[84] L. Onsager. Reciprocal relations in irreversible processes. II. Phys. Rev., 38:2265–2279, 1931.
[85] W. W. S. Wei. Time Series Analysis. Univariate and Multivariate Methods. Addison-Wesley Publ. Co., Redwood City, CA, 1990.
[86] E. W. Montroll and K. E. Shuler. Studies in nonequilibrium rate processes: I. The relaxation of a system of harmonic oscillators. J. Chem. Phys., 26:454–464, 1956.
[87] K. E. Shuler. Studies in nonequilibrium rate processes: II. The relaxation of vibrational nonequilibrium distributions in chemical reactions and shock waves. J. Phys. Chem., 61:849–856, 1957.
[88] N. W. Bazley, E. W. Montroll, R. J. Rubin, and K. E. Shuler. Studies in nonequilibrium rate processes: III. The vibrational relaxation of a system of anharmonic oscillators. J. Chem. Phys., 28:700–704, 1958.
[89] A. F. Bartholomay. On the linear birth and death processes of biology as Markoff chains. Bull. Math. Biophys., 20:97–118, 1958.
[90] A. F. Bartholomay. Stochastic models for chemical reactions: I. Theory of the unimolecular reaction process. Bull. Math. Biophys., 20:175–190, 1958.
[91] A. F. Bartholomay. Stochastic models for chemical reactions: II. The unimolecular rate constant. Bull. Math. Biophys., 21:363–373, 1959.
[92] S. K. Kim. Mean first passage time for a random walker and its application to chemical kinetics. J. Chem. Phys., 28:1057–1067, 1958.
[93] D. A. McQuarrie. Kinetics of small systems. I. J. Chem. Phys., 38:433–436, 1962.
[94] D. A. McQuarrie, C. J. Jachimowski, and M. E. Russell. Kinetics of small systems. II. J. Chem. Phys., 40:2914–2921, 1964.
[95] K. Ishida. Stochastic model for bimolecular reaction. J. Chem. Phys., 41:2472–2478, 1964.
[96] I. G. Darvey and P. J. Staff. Stochastic approach to first-order chemical reaction kinetics. J. Chem. Phys., 44:990–997, 1966.
[97] D. A. McQuarrie. Stochastic approach to chemical kinetics. J. Appl. Prob., 4:413–478, 1967.
[98] N. G. van Kampen. The expansion of the master equation. Adv. Chem. Phys., 34:245–309, 1976.
[99] G. Nicolis and I. Prigogine. Self-Organization in Nonequilibrium Systems. John Wiley & Sons, New York, 1977.
[100] A. M. Turing. The chemical basis of morphogenesis. Phil. Trans. Roy. Soc. London B, 237(641):37–72, 1952.
[101] H. Meinhardt. Models of Biological Pattern Formation. Academic Press, London, 1982.
[102] P. Ehrenfest and T. Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Z. Phys., 8:311–314, 1907.
[103] M. Abramowitz and I. A. Segun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York, 1965. Dover Publications.
[104] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.
[105] J. E. Moyal. Stochastic processes and statistical physics. J. Roy. Statist. Soc. B, 11:151–210, 1949.
[106] A. Janshoff, M. Neitzert, Y. Oberdörfer, and H. Fuchs. Force spectroscopy of molecular systems – single molecule spectroscopy of polymers and biomolecules. Angew. Chem. Int. Ed., 39:3212–3237, 2000.
[107] W. K. Zhang and X. Zhang. Single molecule mechanochemistry of macromolecules. Prog. Polym. Sci., 28:1271–1295, 2003.
[108] A. Messiah. Quantum Mechanics, volume II. North-Holland Publishing Company, Amsterdam, NL, 1970.
[109] D. T. Gillespie. Markov Processes: An Introduction for Physical Scientists. Academic Press, San Diego, CA, 1992.
[110] Y. Cao, D. T. Gillespie, and L. R. Petzold. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys., 124:044109, 2006.
[111] P. Schuster and K. Sigmund. Random selection – A simple model based on linear birth and death processes. Bull. Math. Biol., 46:11–17, 1984.
[112] I. S. Gradstein and I. M. Ryshik. Tables of Series, Products, and Integrals, volume 1. Verlag Harri Deutsch, Thun, DE, 1981.
[113] C. R. Heathcote and J. E. Moyal. The random walk (in continuous time) and its application to the theory of queues. Biometrika, 46:400–411, 1959.
[114] N. T. J. Bailey. The Elements of Stochastic Processes with Application in the Natural Sciences. Wiley, New York, 1964.
[115] E. W. Montroll. Stochastic processes and chemical kinetics. In W. M. Muller, editor, Energetics in Metallurgical Phenomena, volume 3, pages 123–187. Gordon & Breach, New York, 1967.
[116] E. W. Montroll and K. E. Shuler. The application of the theory of stochastic processes to chemical kinetics. Adv. Chem. Phys., 1:361–399, 1958.
[117] K. E. Shuler, G. H. Weiss, and K. Anderson. Studies in nonequilibrium rate processes. V. The relaxation of moments derived from a master equation. J. Math. Phys., 3:550–556, 1962.
[118] C. W. Gardiner and S. Chaturvedi. The Poisson representation. I. A new technique for chemical master equations. J. Statist. Phys., 17:429–468, 1977.
[119] S. Chaturvedi and C. W. Gardiner. The Poisson representation. II. Two-time correlation functions. J. Statist. Phys., 18:501–522, 1978.
[120] P. D. Drummond. Gauge Poisson representation for birth-death master equations. Eur. Phys. J. B, 38:617–634, 2004.
[121] P. D. Drummond, T. G. Vaughan, and A. J. Drummond. Extinction times in autocatalytic systems. J. Phys. Chem. A, 114:10481–10491, 2010.
[122] R. S. Berry, S. A. Rice, and J. Ross. Physical Chemistry. Oxford University Press, New York, second edition, 2000.
[123] A. J. Drummond and P. D. Drummond. Extinction in a self-regulating population with demographic and environmental noise. ArXiv: 0807.4772v2, The University of Auckland, Auckland, NZ, and The University of Queensland, Brisbane, QLD, AU, 2008.
[124] P. Verhulst. Recherches mathématiques sur la loi d'accroissement de la population. Nouv. Mém. de l'Académie Royale des Sci. et Belles-Lettres de Bruxelles, 18:1–41, 1845.
Contents
1 History and Classical Probability 5
1.1 Precision limits and fluctuations . . . . . . . . . . . . . . . . . 7
1.2 Thinking in terms of probability . . . . . . . . . . . . . . . . . 10
2 Probabilities, Random Variables, and Densities 17
2.1 Sets and sample spaces . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Probability measure on countable sample spaces . . . . . . . . 22
2.2.1 Probabilities on countable sample spaces . . . . . . . . 22
2.2.2 Random variables and functions . . . . . . . . . . . . . 27
2.2.3 Probabilities on intervals . . . . . . . . . . . . . . . . . 32
2.3 Probability measure on uncountable sample spaces . . . . . . 34
2.3.1 Existence of non-measurable sets . . . . . . . . . . . . 34
2.3.2 Borel σ-algebra and Lebesgue measure . . . . . . . . . 36
2.3.3 Random variables on uncountable sets . . . . . . . . . 42
2.3.4 Limits of series of random variables . . . . . . . . . . . 46
2.3.5 Stieljes and Lebesgue integration . . . . . . . . . . . . 47
2.4 Conditional probabilities and independence . . . . . . . . . . . 56
2.5 Expectation values and higher moments . . . . . . . . . . . . 60
2.5.1 First and second moments . . . . . . . . . . . . . . . . 61
2.5.2 Higher moments . . . . . . . . . . . . . . . . . . . . . . 67
2.6 Mathematical statistics . . . . . . . . . . . . . . . . . . . . . . 70
2.7 Distributions, densities and generating functions . . . . . . . . 74
2.7.1 Probability generating functions . . . . . . . . . . . . . 74
2.7.2 Moment generating functions . . . . . . . . . . . . . . 76
2.7.3 Characteristic functions . . . . . . . . . . . . . . . . . 76
2.7.4 The Poisson distribution . . . . . . . . . . . . . . . . . 79
2.7.5 The binomial distribution . . . . . . . . . . . . . . . . 81
2.7.6 The normal distribution . . . . . . . . . . . . . . . . . 83
2.7.7 Central limit theorem and the law of large numbers . . 91
2.7.8 The Cauchy-Lorentz distribution . . . . . . . . . . . . 96
2.7.9 Bimodal distributions . . . . . . . . . . . . . . . . . . . 97
3 Stochastic processes 101
3.1 Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . 103
  3.1.1 Simple stochastic processes . . . . . . . . . . . . . . . . 104
  3.1.2 The Chapman-Kolmogorov equation . . . . . . . . . . . . 106
3.2 Classes of stochastic processes . . . . . . . . . . . . . . . . . . 117
  3.2.1 Jump process and master equation . . . . . . . . . . . . . 117
  3.2.2 Diffusion process and Fokker-Planck equation . . . . . . . 119
  3.2.3 Deterministic processes and Liouville's equation . . . . . . 121
3.3 Forward and backward equations . . . . . . . . . . . . . . . . . 122
3.4 Examples of special stochastic processes . . . . . . . . . . . . . 125
  3.4.1 Poisson process . . . . . . . . . . . . . . . . . . . . . . 125
  3.4.2 Random walk in one dimension . . . . . . . . . . . . . . 128
  3.4.3 Wiener process and the diffusion problem . . . . . . . . . 131
  3.4.4 Ornstein-Uhlenbeck process . . . . . . . . . . . . . . . . 135
3.5 Stochastic differential equations . . . . . . . . . . . . . . . . . 137
  3.5.1 Derivation of the stochastic differential equation . . . . . . 137
  3.5.2 Stochastic integration . . . . . . . . . . . . . . . . . . . 140
  3.5.3 Integration of stochastic differential equations . . . . . . . 146
  3.5.4 Changing variables in stochastic differential equations . . . 149
  3.5.5 Fokker-Planck and stochastic differential equations . . . . 150
3.6 Fokker-Planck equations . . . . . . . . . . . . . . . . . . . . . 153
  3.6.1 Probability currents and boundary conditions . . . . . . . 154
  3.6.2 Fokker-Planck equation in one dimension . . . . . . . . . 158
  3.6.3 Fokker-Planck equation in several dimensions . . . . . . . 164
    3.6.3.1 Change of variables . . . . . . . . . . . . . . . . . . 164
    3.6.3.2 Stationary solutions . . . . . . . . . . . . . . . . . . 167
    3.6.3.3 Detailed balance . . . . . . . . . . . . . . . . . . . 169
3.7 Autocorrelation functions and spectra . . . . . . . . . . . . . . 173
4 Applications in chemistry 177
4.1 Stochasticity in chemical reactions . . . . . . . . . . . . . . . . 177
  4.1.1 Elementary steps of chemical reactions . . . . . . . . . . 178
  4.1.2 The master equation in chemistry . . . . . . . . . . . . . 180
  4.1.3 Birth-and-death master equations . . . . . . . . . . . . . 183
  4.1.4 The flow reactor . . . . . . . . . . . . . . . . . . . . . . 186
4.2 Classes of chemical reactions . . . . . . . . . . . . . . . . . . . 193
  4.2.1 Monomolecular chemical reactions . . . . . . . . . . . . . 193
    4.2.1.1 Irreversible monomolecular chemical reaction . . . . . 194
    4.2.1.2 Reversible monomolecular chemical reaction . . . . . . 195
  4.2.2 Bimolecular chemical reactions . . . . . . . . . . . . . . 201
    4.2.2.1 Addition reaction . . . . . . . . . . . . . . . . . . . 201
    4.2.2.2 Dimerization reaction . . . . . . . . . . . . . . . . . 205
4.3 Fokker-Planck approximation of master equations . . . . . . . . 210
  4.3.1 Diffusion process approximated by a jump process . . . . . 210
  4.3.2 Kramers-Moyal expansion . . . . . . . . . . . . . . . . . 213
  4.3.3 Size expansion of the chemical master equation . . . . . . 214
4.4 Numerical simulation of master equations . . . . . . . . . . . . 219
  4.4.1 Definitions and conditions . . . . . . . . . . . . . . . . . 219
  4.4.2 The probabilistic rate parameter . . . . . . . . . . . . . . 221
    4.4.2.1 Bimolecular reactions . . . . . . . . . . . . . . . . . 222
    4.4.2.2 Monomolecular, trimolecular, and other reactions . . . 224
  4.4.3 Simulation of chemical master equations . . . . . . . . . . 226
  4.4.4 The simulation algorithm . . . . . . . . . . . . . . . . . 231
  4.4.5 Implementation of the simulation algorithm . . . . . . . . 233
5 Applications of stochastic processes in biology 241
5.1 Autocatalysis, replication, and extinction . . . . . . . . . . . . 241
  5.1.1 Autocatalytic growth and death . . . . . . . . . . . . . . 242
  5.1.2 Boundaries in one step birth-and-death processes . . . . . 251
5.2 Size expansion in biology . . . . . . . . . . . . . . . . . . . . . 256
5.3 The Poisson representation . . . . . . . . . . . . . . . . . . . . 261
  5.3.1 Motivation of the Poisson representation . . . . . . . . . . 262
  5.3.2 Many variable birth-and-death systems . . . . . . . . . . 264
  5.3.3 The formalism of the Poisson representation . . . . . . . . 271
  5.3.4 Real, complex, and positive Poisson representations . . . . 278
Index
algebra
  filtered, 105
assumption
  scaling, 210
Bayes, Thomas, 58
Bernoulli trials, 104
Bernoulli, Jakob, 81
Boltzmann, Ludwig, 8, 220
Borel, Émile, 34
boundary
  absorbing, 157, 167, 194, 251
  entrance, 159, 160, 277
  exit, 159, 160
  natural, 159, 160
  periodic, 158, 161
  prescribed, 159
  reflecting, 157, 167, 251
  regular, 160
Brown, Robert, 7
Cantor, Georg, 17
Cardano, Gerolamo, 10
carrying capacity, 282
Cauchy, Augustin Louis, 96
central limit theorem, 91
chaos
  deterministic, 8
Chapman, Sydney, 107
Chebyshev, Pafnuty, 94
Chung, Kai Lai, 6
collisions
  classical theory, 222
  nonreactive, 221
  reactive, 221
condition
  growth, 148
  Lipschitz, 148
  Markov, 106
  potential, 168
  pseudo first order, 204
conditions
  boundary, 153
confidence interval, 84, 192
constant
  equilibrium, 193
  reaction rate, 193
controversy
  Mendel-Fisher, 11
convergence
  pointwise, 51
correction
  Bessel, 71
correlation
  coefficient, 64
covariance, 64
  sample, 72
current
  probability, 154
Darboux, Gaston, 48
de Moivre, Abraham, 91
Dedekind, Richard, 17
density
  discrete probability, 25
  joint, 44, 61, 103
  marginal, 44
  spectral, 173
description
  deterministic, 5, 7
detailed balance, 170, 185
diffusion matrix, 119
Dirac, Paul, 109
Dirichlet, Peter Gustav Lejeune, 51
distribution
  bimodal, 65
  discrete uniform, 26
  joint, 33, 43, 44, 88
  marginal, 33, 45, 58
  Maxwell-Boltzmann, 220
  normal, 67
  uniform, 35
Doob, Joseph, 104
drift vector, 119
dynamics
  complex, 8
Ehrenfest, Paul, 200
Einstein, Albert, 7, 106
ensemble
  canonical, 262
  grand canonical, 262
  microcanonical, 262
ensemble average, 174
equation
  backward, 107, 115, 123
  Chapman-Kolmogorov, 107, 123, 138
  chemical master, 230
  Fokker-Planck, 116, 119, 138, 153, 177
  forward, 107, 115, 123
  Kolmogorov, 153
  Langevin, 137
  Liouville, 121
  master, 117, 177, 180
  Smoluchowski, 153
ergodicity, 175
estimator, 70
event, 24
exit problem, 124
expectation value, 49, 60
Feller, William, 160
filtration, 105
Fisher, Ronald, 11
Fisk, Donald, 144
fluctuations, 5, 7, 9
Fokker, Adriaan, 153
function
  autocorrelation, 173
  characteristic, 74, 76
  cumulative distribution, 30, 33, 66
  density, 42
  distribution, 43
  Heaviside, 28
  indicator, 50
  marginal distribution, 45
  measurable, 49
  moment generating, 74, 76
  nonanticipating, 144
  probability generating, 74
  probability mass, 30
  signum, 29
  simple, 50
game
  Penney's, 13
Gardiner, Crispin, 102, 115, 137
Gegenbauer, Leopold, 207
genetics
  Mendelian, 11
Gillespie, Daniel, 182, 219
Heaviside, Oliver, 28
independence, 57
inequality
  Cauchy-Schwarz, 64
inference, statistical, 59
integral
  Lebesgue, 47
  Riemann, 47
  Stieltjes, 47, 139, 140
  Stratonovich, 144
integration
  Cauchy-Euler, 147
Ito, Kiyoshi, 141
Jacobi, Carl, 203
Khinchin, Aleksandr, 174
Kimura, Motoo, 248
kinetic theory
  gases, 222
kinetics
  combinatorial, 261
  mass action, 261
Kolmogorov, Andrey, 107
Kramers, Hendrik, 210
kurtosis, 67
  excess, 67
Lévy, Paul Pierre, 104
Langevin, Paul, 137
Laplace, Pierre-Simon, 91
law
  large numbers, 94
Lebesgue, Henri Léon, 30
limit
  almost certain, 46
  in distribution, 47
  mean square, 46, 141
  stochastic, 47
Liouville, Joseph, 121
Lorentz, Hendrik, 96
Markov, Andrey, 106
martingale, 104
  local, 105, 143
mass action, 227
matrix
  stoichiometric, 265
Maxwell, James Clerk, 8, 220
mean
  sample, 70
measure
  Lebesgue, 39
median, 64
Mendel, Gregor, 11, 58
mode, 65
molecularity, 193
moment
  centered, 62
  jump, 181
  raw, 62
moments
  factorial, 80
  raw, 84
  sample, unbiased, 71
motion
  Brownian, 7, 137
  thermal, 8
Moyal, José, 210
noise
  colored, 174
  white, 138, 174
numbers
  rational, 41
object
  elementary, 17
Onsager, Lars, 172
operator
  linear, 60
Ornstein, Leonard, 135
Ostwald, Wilhelm, 9
Pascal, Blaise, 10
PDE
  parabolic, 153
Planck, Max, 153
Poincaré, Henri, 9
Poisson representation, 261
Poisson, Siméon Denis, 79
pre-image, 50
probability
  conditional, 56
  density, 42, 49
  distribution, 43, 49
  elementary, 45
  mass function, 30
  net flow, 154
  posterior, 59
  prior, 59
  triple, 27, 42
process
  adapted, 105
  adaptive, 144
  Bernoulli, 81
  birth-and-death, 178
  cadlag, 30, 105
  diffusion, 119, 151
  jump, 117
  Markov, 138, 177
  nonanticipating, 105
  Ornstein-Uhlenbeck, 135
  Poisson, 79, 125
  Rayleigh, 165, 168
  Wiener, 108, 120, 131, 135, 143, 147, 174
product, reaction, 220
property
  extensive, 214
  intensive, 214
pseudovector, 171
quantile, 66
quasiprobability, 271
random walk, 128, 135
rate parameter
  probabilistic, 222
Rayleigh fading, 165
reactant, 220
reaction
  Belousov-Zhabotinskii, 9
reaction order, 193
Riemann, Bernhard, 47
sample
  space, 17
selection
  random, 248
semimartingale, 30, 105, 143
sensitivity
  to fluctuations, 9
set
  Borel, 34, 38
  Cantor, 41
  countable, 21
  empty, 18
  power, 34
  uncountable, 21
  Vitali, 35, 41
sets
  disjoint, 20
sigma-algebra, 37
  Borelian, 38
skewness, 67
space
  event, 37
  measurable, 37
spectroscopy
  fluorescence correlation, 6
  single molecule, 6
spectrum, 173
standard deviation, 63
  sample, 70
statistics, Bayesian, 58
Stieltjes, Thomas Jean, 30
stochastic process, 101
  independent, 104
  Markov, 106
  separable, 103
Stratonovich, Ruslan, 144
system
  closed, 193, 195
  event, 36, 37
  isolated, 193
systems
  dynamical, 8
theorem
  multiplication, 62
theory
  large samples, 94
time
  arrival, 127
  first passage, 107, 124, 249
  sequential extinction, 249
Tolman, Richard Chace, 170, 185
trajectory, 101
translation, 40
Uhlenbeck, George, 135
uncertainty
  quantum mechanical, 8
van Kampen, Nicholas, 210, 214
variable
  random, 28
variables
  continuous, 42
  discrete, 42
variance, 63
  sample, 70
vector
  axial, 171
  random, 87
Vitali, Giuseppe, 34
volume
  generalized, 39
von Smoluchowski, Marian, 7, 106, 153
Wiener, Norbert, 131