
Saarland University

Faculty of Natural Sciences and Technology I
Department of Computer Science

Master’s Thesis

Uniformization for Time-Inhomogeneous Markov Population Models

submitted by

Aleksandr Andreychenko

submitted on

2010-07-23

Supervisor

Dr. Verena Wolf

Advisor

Prof. Dr.-Ing. Holger Hermanns

Reviewers

Dr. Verena Wolf
Prof. Dr.-Ing. Holger Hermanns


Declaration of Consent

I agree to make my thesis (with a passing grade) accessible to the public by having it added to the library of the Computer Science Department.

Saarbrücken, (Datum / Date)   (Unterschrift / Signature)


During the time of writing this thesis I have been supported by several people whom I would like to thank sincerely.

Verena Wolf has been mentoring me since the very beginning of my work. Her lecture "Stochastic Dynamics in Systems Biology" inspired my interest in the topic of this work. It was a great pleasure for me to write this thesis under her guidance.

Special thanks go to Christa Schafer and Jens Peter for their great organizational support.


Contents

1 Introduction
   1.1 Contributions
   1.2 Organisation

2 Continuous-time Stochastic Processes
   2.1 Basic Definitions
   2.2 Jump Times and Explosion
   2.3 Markov Processes
   2.4 Poisson Processes
   2.5 Inhomogeneous Poisson Process

3 Uniformization for Markov Chains
   3.1 Uniformization for Time-Homogeneous Markov Chains
   3.2 Uniformization for Time-Inhomogeneous Markov Chains
   3.3 Bounding Method for Transient Probabilities
   3.4 Approximate Uniformization for ICTMC

4 Markov Population Models
   4.1 Uniformization for MPMs
   4.2 State Space Truncation
   4.3 Uniformization Rate Calculation
   4.4 Bounding Approach for MPMs
   4.5 Approximate Uniformization for MPMs
   4.6 Choice of Time Step

5 Algorithms and Implementation
   5.1 On-the-fly Algorithm
   5.2 Step-size Calculation
   5.3 Maximal Exit Rates Calculation
   5.4 Complete Algorithm

6 Case Studies
   6.1 Experiments

7 Conclusions
   7.1 Future work

A Stochastic Chemical Kinetics
   A.1 Transition Probabilities
   A.2 Chemical Master Equation

B MATLAB Routines
   B.1 Solution of CME for Gene Expression


1 Introduction

Time is one of the main factors in any kind of real-life system. When a certain system is analysed one is often interested in its evolution with respect to time. Various phenomena can be described using some form of time-dependency. The varying load in call centres is an example of time-dependency in queueing systems. The migration of biological species in autumn and spring is another illustration of behaviour that changes over time. The ageing process in critical infrastructures (which can result in the failure of system components) is yet another type of time-dependent evolution.

Considering the variability in time of chemical and biological systems, one arrives at the general tasks of systems biology [9]. It is an interdisciplinary field which investigates complex interactions between the components of biological systems and aims to explore their fundamental laws and new features. Systems biology also refers to a certain type of research cycle. It starts with the creation of a model. One tries to describe the behaviour in the most intuitive and informative way, which ensures convenience and clarity of the subsequent analysis. The traditional approach is based on deterministic models where the evolution can be predicted with certainty. This type of model usually operates at a macroscopic scale; for chemical reactions the state of the system is represented by the concentrations of species and a continuous deterministic change is assumed. A set of ordinary differential equations (ODEs) is one way to describe such models. To obtain a solution, numerical methods are applied; the choice of a particular ODE solver depends on the type of the ODE system. Another option is a full description of the chemical reaction system, where we model each single molecule explicitly, operating with their properties and positions in space. Naturally, it is difficult to treat large systems in such a way, and it also imposes restrictions on computational analysis.

However, it turns out that the deterministic formalism is not always sufficient to describe all possible ways for the system to evolve. For instance, the lambda phage decision circuit [1] is a motivating example of such a system. When the lambda phage virus infects an E. coli bacterium, it can evolve in two different ways. The first is lysogeny, where the genome of the virus is integrated into the genome of the bacterium. The viral DNA is then replicated in descendant cells using the replication mechanism of the host cell. The other way is entering the lytic cycle, which means that new phages are synthesized directly in the host cell; finally, its membrane is destroyed and the new phages are released. A deterministic model is not appropriate to describe the choice between these two pathways, as the decision is probabilistic and a stochastic model is needed to give an adequate description.

Another important issue which has to be addressed is the fact that the state of the system changes discretely. This means that one considers not the continuous change of chemical species concentrations but discrete events occurring with different probabilities (which can be time-dependent as well).

In this thesis we will use the formalism of continuous-time Markov Population Models (MPMs) to describe discrete-state stochastic systems. They are continuous-time Markov processes where the state of the system represents populations and is expressed by a vector of natural numbers. Such systems can have infinitely many states. In the case of chemical reaction networks this means that one cannot provide strict upper bounds for the populations of certain species. When analysing these systems one can estimate measures of interest (like the expectation and variance of certain species populations at a given time instant). Besides this, probabilities of certain events can be important (for instance, the probability that a population reaches a threshold or the probability that a given species becomes extinct).

The usual way to investigate properties of these systems is simulation [8], which means that a large number of possible sample trajectories are generated and then analysed. However, it can be difficult to collect a sufficient number of trajectories to provide statistical estimates of good quality. Besides simulation, approaches based on the uniformization technique have proven to be computationally efficient for the analysis of time-independent MPMs. In the case of time-dependent processes only few results concerning the performance of numerical techniques are known [2].

Here we present a method for the analysis of MPMs that may have infinitely many states and whose dynamics are time-dependent. To cope with this problem we combine the ideas of on-the-fly uniformization [5] with the method for treating time-inhomogeneous behaviour presented by Buchholz.

1.1 Contributions

The main contributions of this master's thesis are (i) a method to obtain tight bounds for the transient probability distribution of time-inhomogeneous Markov Population Models with possibly infinite state space, (ii) a procedure to predict the future behaviour of such systems, and (iii) a technique to compute a time step of adaptive size.

1.2 Organisation

In Section 2, the basic definitions and concepts of continuous-time stochastic processes are introduced. We establish results concerning the basic properties of this model such as stability, conservation and explosiveness. Then continuous-time Markov chains are presented as stochastic processes of a special type that turn out to be very useful for practical purposes. We provide the theoretical background needed to prove the existence of a unique solution of the Kolmogorov equation systems both in the case of finite and infinite state spaces.

In Section 3 the uniformization procedure, which allows one to compute the transient probability distribution, is presented for time-homogeneous and time-inhomogeneous Markov chains. Concerning the latter case, two approaches are given. The first one (the bounding approach, Section 3.3) provides tight bounds for the transient probabilities. The second one (approximate uniformization, Section 3.4) provides only an approximate solution.

In Section 4 we introduce Markov Population Models (MPMs) as a subclass of continuous-time Markov chains. Both uniformization approaches for inhomogeneous systems are refined in such a way that they can be applied to compute the transient probability distribution of an MPM. We present the state space truncation procedure in Section 4.2 (since MPMs can have an infinite state space) and the mechanism for choosing the time step size in Section 4.6.

Key points of the algorithms implementing the described techniques are given in Section 5, along with the concept of probability flow (Section 5.1), which allows us to avoid storing matrices in explicit form and thereby speeds up the calculations.

In Section 6 we provide case studies related to chemical reaction networks to evaluate our approach. We conclude this thesis and discuss future work in Section 7.


2 Continuous-time Stochastic Processes

2.1 Basic Definitions

Generally speaking, a stochastic process is any process that takes place in time and is governed by probabilistic laws. There are several types of processes that one can consider, depending on the nature of the time domain and the state space. The most suitable model w.r.t. our assumptions is a continuous-time stochastic process.

Definition 1 (Continuous-time Stochastic Processes). Let I be a countable set and Ω be a sample set. A continuous-time stochastic process X = {Xt, 0 ≤ t < ∞} with values in I is a family of random variables Xt : Ω → I.

As one can see, the values of Xt lie in the countable set I, which puts us in a discrete-state setting. Our task is to specify a probabilistic law for X. This should give us a way to compute probabilities like P(Xt = i for some t). From here on we will consider only right-continuous processes. This means that for almost all ω ∈ Ω and t ≥ 0 there exists ε > 0 such that Xs(ω) = Xt(ω) for t ≤ s ≤ t + ε. This is equivalent to assuming that there are no instantaneous states. Ω is called the sample space of the stochastic process (Xt)t≥0.

Definition 2 (Finite-dimensional Distribution of a Stochastic Process). Let X be a stochastic process with values in I. Then the probabilities P(Xt0 = i0, Xt1 = i1, . . . , Xtn = in) for n ∈ N, 0 ≤ t0 ≤ t1 ≤ . . . ≤ tn and i0, . . . , in ∈ I are the finite-dimensional distributions of the stochastic process X.

The probability of any event depending on a right-continuous stochastic process can be determined from its finite-dimensional distributions. For example,

P(Xt = i for some t ∈ [0,∞)) = 1 − limn→∞ ∑j1,...,jn≠i P(Xq1 = j1, . . . , Xqn = jn),

where q1, q2, . . . is an enumeration of the rationals. An example of the behaviour of X(ω) is shown in Figure 1:


Figure 1: The example behaviour of the continuous-time stochastic process X

2.2 Jump Times and Explosion

Every sample path t ↦ Xt(ω) of a right-continuous process must remain constant for some time in each new state. This means that the process has the following three ways to evolve:

1. it can make infinitely many jumps, but only a finite number of jumps in any finite interval [0, t], t ≥ 0

2. a finite number of jumps are performed and then Xt(ω) stays in some state forever

3. infinitely many jumps are done in a finite time interval (explosion)

To formally describe these evolution possibilities we introduce the following definitions of jump times and holding times.

Definition 3 (Jump Times). For the continuous-time stochastic process X on a sample space Ω we define the jump times Jn(ω) as

J0(ω) = 0,   Jn+1(ω) = inf{t ≥ Jn(ω) : Xt(ω) ≠ XJn(ω)}

for n = 0, 1, . . . and ω ∈ Ω.

We put Jn+1(ω) = ∞ if Xt(ω) = XJn(ω) for all t > Jn(ω). This allows us to describe the second way for X to evolve. The sequence of holding times of (Xt)t≥0 is then defined as

S0(ω) = 0,   Sn+1 = Jn+1 − Jn

for all ω ∈ Ω and n = 0, 1, . . .. Note that Jn is a random variable such that Jn : Ω → R≥0 ∪ {∞}. The process X makes n jumps and then remains in the state XJn for Sn+1 time units.


As mentioned above, it can happen that the process makes infinitely many jumps in a finite time interval, as illustrated in Figure 2:

Figure 2: An explosion of the continuous-time stochastic process X

The explosion time ζ is defined by

ζ = supn Jn = ∑∞n=1 Sn

The process X is explosive if there exists i ∈ I such that Pi(ζ < ∞) > 0. The jump process (or jump chain) of (Xt)t≥0 is the discrete-time process (Yn)n≥0 such that

Yn = XJn .

It is the sequence of values taken by (Xt)t≥0 up to the explosion time ζ. For convenience one can add a new state to the set I (say, ∞) and require that Xt = ∞ for all t ≥ ζ. Any process which satisfies this condition is called minimal. We do not present all the theory concerning stochastic processes here; a more detailed introduction can be found in [4].
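To see that explosion can really occur, consider a pure birth process whose jump rate grows quadratically with the state. The following MATLAB sketch (an illustration added here, not from the thesis; the rates λn = n² are a made-up example) samples its holding times and shows that the jump times accumulate at a finite value.

% Explosion of a hypothetical pure birth process with rates lambda_n = n^2.
% The expected total time E[zeta] = sum_n 1/n^2 = pi^2/6 is finite.
rng(1);                                   % reproducible sample path
nJumps = 1e5;                             % number of simulated jumps
n      = (1:nJumps)';                     % state before each jump
S      = -log(rand(nJumps,1)) ./ n.^2;    % holding times ~ Exp(n^2)
zeta   = cumsum(S);                       % jump times J_1, J_2, ...
fprintf('time of jump %d: %.4f  (limit E[zeta] = pi^2/6 = %.4f)\n', ...
        nJumps, zeta(end), pi^2/6);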

2.3 Markov Processes

Definition 4 (Markov Process). Let X be a continuous-time stochastic process. We call X a Markov process if for all times 0 ≤ t0 ≤ t1 ≤ . . . ≤ tn the conditional distribution at t = tn depends only on Xtn−1 for all n > 0:

f(Xtn | Xt0 , . . . , Xtn−1) = f(Xtn | Xtn−1)


The definition means that, given Xtn−1, a Markov process at the current time t = tn is independent of the random variables Xt0 , . . . , Xtn−2 that relate to past instants of time. In Definition 4 we do not state the nature of the state space. If X has the state space I and it is countable, we can consider the probabilities P(Xtn = in | Xtn−1 = in−1). We call them transition probabilities, that is, the probabilities for a process to be in the state in after the state in−1. Let us use the following notation for all h ≥ 0:

P(Xtn = in | Xtn−1 = in−1) = pin−1in(h),   h = tn − tn−1

One can see that infinitely many pin−1in(tn − tn−1) are needed to describe the behaviour of a Markov process. To make the description more convenient we introduce the infinitesimal generator matrix.

Definition 5 (Stochastic Matrix). A matrix M = (mij : i, j ∈ I) is called a stochastic matrix if it satisfies

1. 0 ≤ mij < ∞ for all i, j

2. ∑j∈I mij = 1 for all i

The definition means that every row (mij : j ∈ I) defines a probability distribution.

Definition 6 (Transition Semigroup). We refer to (P(t))t≥0 = (pij(t))t≥0 as a transition semigroup on the set I if for each t, s ≥ 0

1. P(t) is a stochastic matrix

2. P(0) = E (here E is the identity matrix)

3. P(t + s) = P(t)P(s)

Now let (P(t))t≥0 be a transition semigroup. Suppose that it is continuous at the origin:

limh→0 P(h) = P(0) = E,

where one assumes point-wise convergence. This allows one to establish continuity for any t ≥ 0, that is, for all i, j ∈ I

limh→0 pij(t + h) = pij(t)

We can consider the transition probabilities of a Markov process X with countable state space I for time instants t + h ≥ t and states i, j ∈ I:

pij(h) = P(Xt+h = j | Xt = i)

The probabilities pij(h) form a transition semigroup and we refer to them as P(h). Thus, if (P(t))t≥0 is a continuous transition semigroup on the countable state space I, then for any state i there exists

qi = limh→0 (1 − pii(h)) / h ∈ [0,∞],


and for any pair i, j of different states there exists

qij = limh→0 pij(h) / h ∈ [0,∞)

From here on we put qii = −qi for each state i. The numbers qij are called the local characteristics of the semigroup. They are also referred to as transition rates for a transition from state i into state j.

Definition 7 (Infinitesimal Generator Matrix). Assume that the qij (defined above) are the local characteristics of the continuous transition semigroup (P(t))t≥0. The matrix Q which contains the qij,

Q = (qij)i,j∈I ,

is called the infinitesimal generator matrix of (P(t))t≥0.

Note that the following relationship for the matrix Q can also be used:

Q = limh→0 (P(h) − P(0)) / h

According to this, the matrix Q can be interpreted as the derivative of the matrix function t ↦ P(t) at t = 0.

Definition 8 (Stability and Conservation). If for all states i ∈ I it holds that

qi < ∞,

then the semigroup (P(t))t≥0 is called stable. If for all states i ∈ I it holds that

qi = ∑j∈I,j≠i qij

then it is called conservative.

The latter equality comes from the conservation equality

∑j∈I pij(h) = 1,

which is the same as

(1 − pii(h)) / h = ∑j∈I,j≠i pij(h) / h,

and hence

qi = limh→0 ∑j∈I,j≠i pij(h) / h.

We obtain the desired result if the interchange of summation and limit is allowed. In the case of a finite I this is always possible. If I is infinite, we will use regular jump Markov processes, which are both stable and conservative. For more details on this we refer to [3].


Definition 9 (Regular Jump Markov Process). A stochastic process (Xt)t≥0 taking its values in the state space I (not necessarily countable) is called a jump process if for almost all ω ∈ Ω and all t ≥ 0 there exists ε(t, ω) > 0 such that

Xs(ω) = Xt(ω) for all s ∈ [t, t + ε(t, ω))

It is called a regular jump process if, in addition, for almost all ω ∈ Ω the set A(ω) of discontinuities of the function t ↦ Xt(ω) is σ-discrete, that is, for all c ≥ 0,

|A(ω) ∩ [0, c]| < ∞,

where |B| denotes the number of elements in the set B.

It should be mentioned that if the stochastic process X is regular then it is right-continuous and P (ζ =∞) = 1.

Theorem 1. A regular jump stochastic process is stable and conservative.

From now on we will consider only regular jump stochastic processes, and we can define:

Definition 10 (Continuous-Time Markov Chain). Let I be a countable set and let the stochastic process (Xt)t≥0 be I-valued (i.e. it takes its values in I). Then (Xt)t≥0 is called a continuous-time Markov chain if for all i, j, i1, . . . , ik ∈ I, all t, s ≥ 0, and all s1, . . . , sk ≥ 0 with sl ≤ s for all l ∈ [1, k] it holds that

P(Xt+s = j | Xs = i, Xs1 = i1, . . . , Xsk = ik) = P(Xt+s = j | Xs = i)

A continuous-time Markov chain is called homogeneous (HCTMC) if the right-hand side of the equality is independent of s. Otherwise it is called inhomogeneous (ICTMC).

Let the matrix P(t) have entries pij(t) = P(Xt+s = j | Xs = i), i, j ∈ I. The family (P(t))t≥0 is the transition semigroup of the continuous-time homogeneous Markov chain (HMC). It satisfies the Chapman-Kolmogorov equation

P(t + s) = P(t)P(s)

It also holds that P(0) = E, where E is the identity matrix. If one is interested in the probability for a continuous-time HMC to be in a certain state i ∈ I at a given time instant t, the notion of its distribution has to be introduced:

Definition 11 (Distribution of a Continuous-Time Homogeneous Markov Chain). Let X be a continuous-time HMC. Then µ(0) = (µi(0))i∈I is called the initial distribution if µi(0) = P(X0 = i). The vector µ(t) = (µi(t))i∈I is the distribution at time t of X if µi(t) = P(Xt = i). It is obtained from the initial distribution by the formula

µ(t) = µ(0)P(t)


One can prove that the distribution µ(t) also satisfies the property that for all t, s ≥ 0

µ(t+ s) = µ(t)P (s)

For a continuous-time inhomogeneous Markov chain the distribution is described by

Definition 12 (Distribution of a Continuous-Time Inhomogeneous Markov Chain). Let X be a continuous-time inhomogeneous Markov chain. Then µ(0) = (µi(0))i∈I is called the initial distribution if µi(0) = P(X0 = i). The vector µ(t) = (µi(t))i∈I is the distribution at time t of X if µi(t) = P(Xt = i). It is obtained from the initial distribution by the formula

µ(t) = µ(0)P(0, t)

Notice that in the case of time-inhomogeneous behaviour an additional parameter is introduced that describes the corresponding time step. For instance, in Definition 12 we make a step of size t, starting at time 0. One can prove that for a given distribution at time t and a time step s, the distribution µ(t + s) can be computed via

µ(t + s) = µ(t)P(t, t + s)

A natural question arises: what is the connection between the transition semigroup (P(t))t≥0 of X and the infinitesimal generator matrix Q? In view of the semigroup properties, for all t ≥ 0 and all h ≥ 0

(P(t + h) − P(t)) / h = P(t) (P(h) − E) / h = ((P(h) − E) / h) P(t)

Therefore, if the state space I is finite, the passage to the limit is allowed and one obtains the differential system

d/dt P(t) = P(t)Q = QP(t),

where Q is the infinitesimal generator. The equation

d/dt P(t) = QP(t)

is called Kolmogorov's backward differential system. The forward differential system is

d/dt P(t) = P(t)Q

For a given I, a unique solution of both with the initial condition P(0) = E is

P(t) = etQ,

where the exponential of a finite-dimensional matrix C is defined by

eC = ∑∞n=0 Cn / n!


This is the unique solution for the given initial data. When the state space is infinite, difficulties may arise in the passage to the limit h → 0 because possibly infinite sums are involved. For this case one has the two following theorems.

Theorem 2 (Backward Kolmogorov System). If the continuous semigroup (P(t))t≥0 is stable and conservative, then Kolmogorov's backward differential system is satisfied.

Theorem 3 (Forward Kolmogorov System). If the continuous semigroup (P(t))t≥0 is stable and conservative, and for all states i and all t ≥ 0

∑k∈I pik(t)qk < ∞,

then Kolmogorov's forward differential system is satisfied.

The condition in Theorem 3 is satisfied when the rates are bounded, that is,

supi∈I qi < ∞

A unique solution of Kolmogorov's systems in the case of an infinite state space can be found if the corresponding semigroup (P(t))t≥0 is stable, conservative and the process is not explosive.

Each Markov chain can be described by its state-transition graph, called the intensity graph. Nodes in this graph represent elements of the state space of the chain. It has an edge from state i to state j (labelled by qij) whenever qij > 0. The Markov chain is uniquely determined by its intensity graph since the diagonal entries of the infinitesimal generator matrix Q are given by the negative sum of the edge labels of a state:

qi = ∑j∈I,j≠i qij

An example of such an intensity graph is given in Figure 3:

Figure 3: An intensity graph of the Markov chain
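For a finite chain such as the one sketched in Figure 3, the distribution at time t can be computed directly via the matrix exponential, µ(t) = µ(0)etQ. The following minimal MATLAB sketch (not from the thesis; the three-state generator is a purely hypothetical example) illustrates this.

% Transient distribution of a small finite CTMC via the matrix exponential.
% The 3-state generator Q below is a made-up example, not taken from the thesis.
Q   = [-3  2  1;
        4 -5  1;
        0  2 -2];          % rows sum to zero
mu0 = [1 0 0];             % initial distribution: start in state 1
t   = 0.5;                 % time horizon
mut = mu0 * expm(Q * t);   % mu(t) = mu(0) * e^{tQ}
disp(mut);                 % transient distribution at time t (sums to 1)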


We have given the basic definitions and required properties of Markov processes and chains. Recall that in the following we use only regular jump Markov chains. Therefore, under the assumption of a countable state space, we have no problems with interchanging the limit and summation for the infinitesimal generator matrix. Detailed proofs of the Markov chain properties under the conditions in question can be found in [3].

2.4 Poisson Processes

Poisson processes are among the simplest examples of continuous-time Markov chains. They arise in many applications, e.g. as a model for the arrival process at a store or for the arrivals of calls in a call centre. In general, a Poisson process is the natural model for any uncoordinated stream of discrete events in continuous time, and this fact makes it very useful for practical purposes. Let Ω be a sample space and P a probability measure on it. Then an arrival process N = {Nt, t ≥ 0} is defined on Ω such that for any ω ∈ Ω the mapping t ↦ Nt(ω) is non-decreasing, increases by jumps only, is right-continuous, and has N0(ω) = 0. An example is given in Figure 4:

Figure 4: The function Nt(ω) for some realization ω ∈ Ω

Definition 13 (Poisson Process). An arrival process N = {Nt, t ≥ 0} is called a Poisson process if the following axioms hold:

1. for almost all ω, each jump of t ↦ Nt(ω) is of unit magnitude

2. for any t, s ≥ 0, Nt+s −Nt is independent of (N(u))u≤t

3. for any t, s ≥ 0, the distribution of Nt+s −Nt is independent of t

Axiom 2 in Definition 13 expresses the independence of the number of arrivals in (t, t + s] from the past history of the process until time t. It implies that Nt+s − Nt is independent of Nt1 , Nt2 , . . . , Ntn provided that t1, t2, . . . , tn ≤ t, and also that the numbers of arrivals in disjoint time intervals are independent. Using Axiom 1 one can prove that

limt→0 (1/t) P(Nt ≥ 2) = 0

It means that the probability that the process N makes more than one jump in an infinitesimal time period is 0. We will often use the Poisson process later; therefore we state its distribution law here.

Theorem 4 (Distribution of a Poisson Process). If N = {Nt, t ≥ 0} is a Poisson process, then for any t ≥ 0,

P(Nt = k) = e−λt (λt)k / k!,   k = 0, 1, . . . ,

for some constant λ ≥ 0.

We also refer to the latter probabilities as β(λt, k). So far the constant λ has no particular meaning attached to it. Note that for any t ≥ 0,

E[Nt] = ∑∞n=0 n P(Nt = n) = ∑∞n=0 n e−λt (λt)n / n! = λt

Thus λ is the expected number of arrivals in an interval of unit length. λ is called the arrival rate or the intensity of the Poisson process N. Another important property is given by the following theorem.

Theorem 5. Let N = {Nt, t ≥ 0} be a Poisson process. Then, conditional on N having exactly one jump in the interval [s, s + t], the time at which that jump occurs is uniformly distributed on [s, s + t].

A Poisson process N can also be described by the diagram given in Figure 5.

Figure 5: An intensity graph for the Poisson process N

The associated infinitesimal generator matrix Q is given by

Q =
   −λ    λ    0    0   . . .
    0   −λ    λ    0   . . .
    ⋮     ⋮    ⋮    ⋮    ⋱

Please notice that in this section we consider homogeneous Poisson processes, which means that the intensity λ is independent of the current time t. Otherwise the corresponding process is inhomogeneous; we consider such processes in the next subsection.


2.5 Inhomogeneous Poisson Process

Definition 14. A stochastic process N = {Nλ(t),t, t ≥ 0} is an inhomogeneous Poisson process (IPP) with rate λ(t) if:

1. increments occur one at a time

2. Nλ(t),(t,t+∆) = Nλ(t),t+∆ −Nλ(t),t is independent of Nλ(t),u for 0 ≤ u ≤ t

We assume that for a given time interval [t, t + ∆) the rate function of the IPP Nλ(τ),τ is bounded (0 ≤ λ(τ) < ∞ for τ ∈ [t, t + ∆)).

Theorem 6 (Distribution of the Inhomogeneous Poisson Process). Let us denote the probability for a Poisson process N to make k jumps within the interval [t, t + ∆) of length ∆ by P[Nλ(τ),(t,t+∆) = k]. It can be computed as

P[Nλ(τ),(t,t+∆) = k] = P[Nλ(t,t+∆),(t,t+∆) = k] = ((λ(t, t+∆) · ∆)k / k!) · e−λ(t,t+∆)·∆ = β(λ(t, t+∆) · ∆, k)

where λ(t, t+∆) = (1/∆) ∫[t,t+∆) λ(τ) dτ is the average rate over the interval. The theorem says that, for a fixed interval, the distribution of events generated by an IPP equals the distribution of events generated by a HPP with the average rate of the IPP ([10]). The superposition of independent IPPs satisfies the next property:

Theorem 7. Let Nλ1(τ),(t,t+∆) and Nλ2(τ),(t,t+∆) be two independent IPPs. Then Nλ1(τ),(t,t+∆) + Nλ2(τ),(t,t+∆) is an IPP with rate λ1(τ) + λ2(τ) in the interval [t, t+∆). The process Nλ1(τ),(t,t+∆) + Nλ2(τ),(t,t+∆) is called the superposition of the IPPs Nλ1(τ),(t,t+∆) and Nλ2(τ),(t,t+∆).

The next property describes how to calculate the probability that an event observed at time t is of a certain type:

Theorem 8. Let Nλ1(τ),(t,t+∆) and Nλ2(τ),(t,t+∆) be two IPPs that are superposed. An event observed at time t is an event of type i ∈ {1, 2} with probability

λi(t) / (λ1(t) + λ2(t))

The associated infinitesimal generator matrix Q(t) of an IPP Nλ(t),t at time t is then given by

Q(t) =
   −λ(t)    λ(t)    0      0     . . .
    0      −λ(t)    λ(t)   0     . . .
    ⋮        ⋮        ⋮      ⋮      ⋱
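As a small illustration of Theorem 6 (not taken from the thesis; the rate function λ(τ) = 2τ and the interval are made-up examples), the following MATLAB sketch computes the average rate of an IPP on [t, t+∆) by numerical integration and evaluates the resulting Poisson probabilities β(λ(t, t+∆)·∆, k).

% Probability of k jumps of an IPP on [t, t+Delta), cf. Theorem 6.
% The rate function lambda(tau) = 2*tau is a hypothetical example.
lambda = @(tau) 2*tau;                              % time-dependent rate
t = 1.0;  Delta = 0.5;
lambdaBar = integral(lambda, t, t+Delta) / Delta;   % average rate on the interval
k    = 0:5;
beta = exp(-lambdaBar*Delta) .* (lambdaBar*Delta).^k ./ factorial(k);
disp([k; beta]);                                    % beta(lambdaBar*Delta, k), k = 0..5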


3 Uniformization for Markov Chains

Consider a continuous-time homogeneous Markov chain X with initial distribution µ(0) and infinitesimal generator matrix Q. To simplify the introduction of the uniformization technique in this section, we first suppose the state space I of the HMC X to be finite. The probabilities of interest for us are the state probabilities P(Xt = x), i.e. the probabilities for the process X to be in state x after t time units. Using Kolmogorov's forward differential system for a regular Markov chain (and X is regular as it is finite) we obtain that

P (t) = etQ

and the state probabilities can be calculated as

µ(t) = µ(0)P(t) = µ(0)eQt = µ(0) ∑∞i=0 (Qt)i / i!

where the matrix Q is finite. In the case of an ICTMC the distribution µ(t) cannot be computed in this fashion, since the infinitesimal generator then depends on the time t and the following equation system has to be solved ([14]):

dµ(t)/dt = µ(t)Q(t)

Van Moorsel and Wolter also give a closed-form solution of this equation under the initial condition µ(0). We see that we need to calculate a matrix exponential or apply some ODE solver. This gives us all state probabilities, no matter how small these values are (for instance, w.r.t. some predefined threshold), and storing all these values can be a critical point in the implementation. Another problem is the calculation of the matrix exponential, which is a costly operation, especially for large state spaces. The opportunity to avoid these computational difficulties and to have more precise control over the numerical analysis motivates an alternative solution method called uniformization.
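For comparison, a direct numerical approach is to hand the forward equation to a standard ODE solver. The following MATLAB sketch (not from the thesis; the two-state time-dependent generator is a made-up example) integrates dµ(t)/dt = µ(t)Q(t) with ode45.

% Solving the forward equation d mu/dt = mu * Q(t) with a generic ODE solver.
% The time-dependent 2-state generator below is a hypothetical example.
Qt  = @(t) [-2*t  2*t;
             1   -1 ];                 % rows sum to zero for every t
rhs = @(t, mu) (mu(:)' * Qt(t))';      % ode45 works with column vectors
mu0 = [1; 0];                          % start in state 1
[tt, mu] = ode45(rhs, [0 1], mu0);     % integrate over [0, 1]
disp(mu(end, :));                      % approximate distribution at t = 1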

3.1 Uniformization for Time-Homogeneous Markov Chains

To start with the description of the uniform Markov chain construction we need to introduce the concept of a discrete-time Markov chain.

Definition 15 (Discrete-Time Markov Chain). Let I be a finite set. A discrete-time Markov chain (DTMC) is a family of random variables (Yn)n∈N0 with Yn : Ω → I such that Y fulfils the Markov property

P(Yn+1 = y | Yn = xn, Yn−1 = xn−1, . . . , Y0 = x0) = P(Yn+1 = y | Yn = xn)

for all n and all y, x0, . . . , xn ∈ I.


If the probability P(Yn+1 = y | Yn = xn) does not depend on n, then the DTMC is called homogeneous. Otherwise it is an inhomogeneous DTMC.

We assume that the states in I are ordered by a function f : I → N and that the transition probabilities pij = P(Yn+1 = f−1(j) | Yn = f−1(i)) do not depend on n. Then the transition probabilities can be arranged in a matrix P = (pij)i,j∈{1,2,...}. The matrix P is called the transition probability matrix of the DTMC Y. Note that it fulfils the properties of a stochastic matrix given in Definition 5.

Let p0 be the row vector that contains the initial distribution, i.e., the entries P(Y0 = x) for all x. We now find that p1 = p0 · P since

P(Y1 = x) = ∑y∈I P(Y1 = x | Y0 = y) · P(Y0 = y)

Here we multiply the transition probability (to go from y to x) with the probability of being in state y and sum over all possible y ∈ I. Applying this rule several times results in

pn = pn−1 · P = . . . = p0 · Pn

that is, the probabilities after n steps are obtained by n multiplications with P.

For a given HMC X, the uniformization of X is based on the construction of

1. a discrete-time Markov chain (DTMC) Y that represents the sequence of states visited by X

2. a Poisson process N that represents the jump times of X

Now let (Yn)n∈N0 be a DTMC with discrete state space I and let N be a Poisson process with intensity λ > 0. Assume that N is independent of Y. For t ≥ 0 the uniform Markov chain X is defined as

Xt = YNt

The process X is indeed a CTMC. The process N is called the clock and Y is called the subordinated DTMC. Thus the state probabilities of X are given by

P(Xt = x) = P(YNt = x) = ∑∞n=0 P(Yn = x, Nt = n) = ∑∞n=0 P(Yn = x) · P(Nt = n) = ∑∞n=0 P(Yn = x) · e−λt (λt)n / n!

If p0 is the initial distribution of the DTMC Y and P is its transition probability matrix, then

p(t) = ∑∞n=0 p0 · Pn · e−λt (λt)n / n! = ∑∞n=0 pn · e−λt (λt)n / n!,

where p(t) contains the state probabilities of X and pn represents the state probabilities of Y. It can be seen from the formula that one has to weight the state probabilities of the subordinated DTMC after n steps with the corresponding Poisson probabilities.


Now we need to construct a Poisson process N and a DTMC Y for a given CTMC X in order to express it as the corresponding uniform Markov chain. Let Q be the infinitesimal generator matrix of X. If X is a finite regular jump chain, then

supi |qii| < ∞

We choose λ ≥ supi |qii|. In the case where the state space I is infinite we can still apply the method if the condition supi |qii| < ∞ is satisfied. Otherwise special truncation procedures have to be applied. Using the chosen λ we put P = E + (1/λ)Q. Then Q = λ(P − E) and

p(t) = p(0) · eQt = p(0) · eλtP−λtE = p(0) · eλtP · e−λt

Using the definition of the matrix exponential,

p(t) = p(0) · eλtP · e−λt = ∑∞n=0 p(0) · Pn · e−λt (λt)n / n! = ∑∞n=0 p(0) · Pn · β(λt, n)

Note that P = E + (1/λ)Q is a stochastic matrix since for all i, k ∈ N with i ≠ k we have pik = qik/λ ≥ 0, pii = 1 + qii/λ ≥ 0 and

∑j∈N pij = 1 + ∑j∈N qij/λ = 1 + qii/λ + ∑j∈N,j≠i qij/λ = 1

Now P defines a DTMC Y with the vectors pn of state probabilities P(Yn = i), and we can identify e−λt (λt)n / n! as the probabilities of a Poisson process N with intensity λ. One can truncate the obtained sum w.r.t. a predefined accuracy ε > 0 for the Poisson probabilities ([7]). To avoid a large number of states having non-zero probability in pn, a threshold δ can be introduced for the entries of pn; all values in pn smaller than δ are then set to zero. Another reason for using the latter infinite sum is the fact that it contains only non-negative summands (whereas in the sum with powers of the matrix Q the summands can also be negative). The uniformization method for calculating the state probabilities of a CTMC X has been proven to be computationally efficient and numerically stable [12].
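The following MATLAB sketch (not part of the thesis; the generator Q, the time horizon and the accuracy ε are hypothetical values) implements this standard scheme: it chooses λ, builds P = E + Q/λ, and accumulates the Poisson-weighted powers until the truncated Poisson mass reaches 1 − ε.

% Standard uniformization for a finite homogeneous CTMC (sketch).
% Q, t and eps below are made-up example values.
Q   = [-3  2  1;
        4 -5  1;
        0  2 -2];
p0  = [1 0 0];  t = 0.5;  eps = 1e-8;
lambda = max(abs(diag(Q)));            % uniformization rate
P   = eye(size(Q)) + Q / lambda;       % subordinated DTMC matrix
pn  = p0;                              % p_n = p0 * P^n, starting with n = 0
w   = exp(-lambda*t);                  % Poisson weight beta(lambda*t, 0)
pt  = w * pn;  mass = w;  n = 0;
while mass < 1 - eps                   % right truncation of the Poisson sum
    n    = n + 1;
    w    = w * lambda * t / n;         % beta(lambda*t, n) computed iteratively
    pn   = pn * P;
    pt   = pt + w * pn;
    mass = mass + w;
end
disp(pt);                              % approximation of p(t), truncation error <= eps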

3.2 Uniformization for Time-Inhomogeneous Markov Chains

Consider an ICTMC X with the finite state space I = {1, . . . , n}. In the homogeneous case transitions are described by a generator matrix Q; in the inhomogeneous case the generator depends on the time t and is denoted by Q(t). Usually the ICTMC results from a model description in which certain types of events happen. Each transition belongs to a certain event type, which implies that all transitions of the same type have the same time-dependent behaviour. Assume that there are M different functions λ(m)(t), defined for all t ≥ 0. They determine the behaviour of the rates in the model w.r.t. an event of type m. At the state level of the ICTMC, event m causes transitions that are described in a matrix Q(m), and the rate of the transition from state i to state j at time t for event m is given by λ(m)(t)Q(m)ij. In addition, let the row sums of the matrices Q(m) be 0:

Q(m)ii = −∑j≠i Q(m)ij

One can see that Q(m) is independent of time and the transition rates depend on t only via the functions λ(m)(t). We can rescale them to obtain

maxi |Q(m)ii| = 1

Using these definitions we can express the infinitesimal generator matrix as

Q(t) = ∑Mm=1 λ(m)(t)Q(m)

Example 1. Consider the following ICTMC with 4 different event types (we assume that the i-th transition refers to the i-th event type) and 3 states. Assume the rate functions are α1(x, t) = k1x1t = x1t, α2(x, t) = k2x1t2 = 2x1t2, α3(x, t) = k3x12t2 sin(t) = 4x12t2 sin(t), α4(x, t) = k4x12t = x12t. The generator matrix Q(t) is then given by

Q(t) =
   −2t − 4t2     2t            4t2
    0           −6t            6t
    0            8t2 sin t    −8t2 sin t

It can be expressed as the sum

Q(t) = ∑4m=1 λ(m)(t)Q(m) = t · Q(1) + t2 · Q(2) + 2t · Q(3) + 2t2 sin t · Q(4)

with

Q(1) =
   −2  2  0
    0  0  0
    0  0  0

Q(2) =
   −4  0  4
    0  0  0
    0  0  0

Q(3) =
    0  0  0
    0 −3  3
    0  0  0

Q(4) =
    0  0  0
    0  0  0
    0  4 −4

where λ(1)(t) = t, λ(2)(t) = t2, λ(3)(t) = 2t, λ(4)(t) = 2t2 sin t are the functions describing the time-dependent behaviour.


We assume that 0 ≤ λ(m)(t) < ∞ and that all λ(m)(t) are right-continuous functions on the interval [0, T], T < ∞, on which the ICTMC is analysed. Let p(t) be the distribution at time t > 0 for a given initial distribution p(0). The matrix

P(m) = Q(m) + E

is a stochastic matrix since the maximum absolute value of the elements of Q(m) is 1. Thus P(m) determines the transition probabilities of the embedded DTMC for events of type m. Now our goal is to compute the distribution p(T) with p(0) as the initial one. It can be obtained from p(0) by solving, for t > 0,

dp(t)/dt = p(t)Q(t)

with the initial condition p(0). In contrast to the homogeneous case, for an ICTMC p(t) is usually not equal to p(0) exp(∫[0,t] Q(τ)dτ). Therefore the system has to be solved step by step. In each step we compute p(t + ∆) from a known vector p(t). For the event type m the rates are denoted by

λ(m)−(t, t + ∆) = minτ∈[t,t+∆) λ(m)(τ)
λ(m)+(t, t + ∆) = maxτ∈[t,t+∆) λ(m)(τ)
λ(m)(t, t + ∆) = (1/∆) ∫[t,t+∆) λ(m)(τ) dτ

which describe the minimum, the maximum and the average rate in the interval. In the same manner we define the overall event rate of the ICTMC:

Λ(τ) = ∑Mm=1 λ(m)(τ)
Λ−(t, t + ∆) = minτ∈[t,t+∆) Λ(τ)
Λ+(t, t + ∆) = maxτ∈[t,t+∆) Λ(τ)
Λ(t, t + ∆) = (1/∆) ∫[t,t+∆) Λ(τ) dτ

As opposed to an HMC, where a homogeneous Poisson process is used during the uniformization procedure, here we exploit the inhomogeneous Poisson process (IPP) introduced in Section 2.5. Let us formulate two basic theorems about the ICTMC uniformization procedure using the concept of an IPP. Assume first that the total number of event types is M = 1, which means that all rates follow the same function up to a constant factor. Then the following holds:


Theorem 9. Let Q(t) = λ(t)Q be an infinitesimal generator matrix (where Q and λ(t) are defined as before), let p(0) be the distribution of the ICTMC X at time t = 0 and let P = Q + E. Then

p(t) = p(0) ∑∞k=0 e−λ(0,t)t (λ(0, t)t)k / k! · Pk

Observe that the ICTMC is uniformized using the average rate of the state with maximal exit rate, in contrast to the uniformization of an HMC, where the constant rate of the state with maximal exit rate is used.

Now let us consider the case of M > 1 event types and assume that all functions λ(m)(t) are independent.

Theorem 10. Let the infinitesimal generator be given by

Q(t) = ∑Mm=1 λ(m)(t)Q(m)

Transitions of the corresponding ICTMC occur according to an IPP with rate Λ(t) and an inhomogeneous DTMC with matrix

P(t) = ∑Mm=1 (λ(m)(t)/Λ(t)) P(m)

This means that the ICTMC is composed of M ICTMCs defined by the matrices λ(m)(t)Q(m). Each of these ICTMCs is described by an IPP with rate λ(m)(t) and a DTMC with matrix P(m). According to Theorem 7, the M IPPs together describe an IPP with rate function Λ. Theorem 8 says that event m occurs at time t with probability λ(m)(t)/Λ(t). Given that event m occurs, the transition matrix of the embedded IDTMC equals P(m), and the resulting matrix at time t is obtained via a weighted summation of the matrices P(m) over all M event types. The latter theorem does not provide a straightforward approach for computing the distribution; however, a step-by-step procedure can be based on it. It has the advantage that the uniformization rate for an HMC has to be at least the maximal exit rate, whereas for an ICTMC the mean value over time, Λ(t, t+∆), is used. The mean value is computed for the rate function of the state having the maximal exit rate among all states in the state space.
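As an illustration of the construction in Theorem 10 (not from the thesis; the two event types, their matrices and rate functions are hypothetical), the following MATLAB sketch assembles the time-dependent subordinated DTMC matrix P(t) at a single time point.

% Building P(t) = sum_m lambda_m(t)/Lambda(t) * P_m at one time point (sketch).
% The two event types below, on 3 states, are made-up examples.
Q1 = [-1 1 0; 0 0 0; 0 0 0];   P1 = Q1 + eye(3);   lam1 = @(t) t;
Q2 = [0 0 0; 0 -1 1; 0 0 0];   P2 = Q2 + eye(3);   lam2 = @(t) t.^2;
t  = 2;
Lambda = lam1(t) + lam2(t);                           % overall event rate at time t
Pt = (lam1(t)/Lambda) * P1 + (lam2(t)/Lambda) * P2;   % weighted DTMC matrix
disp(Pt);                                             % rows sum to 1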

3.3 Bounding Method for Transient Probabilities

The theorems given above allow us to compute lower and upper bounds for the transient distribution. Let p−(t) be an element-wise lower bound for p(t), i.e. p−i(t) ≤ pi(t) for all i. Our aim now is to compute p−(t + ∆) from the given p−(t). Assume we have M possible events; then the probability for an event of type m to occur at time τ is λ(m)(τ)/Λ(τ). Denote by r(m)− the lower bound for the probability of an m-event occurrence:

r(m)− = minτ∈[t,t+∆] (λ(m)(τ)/Λ(τ))

Using it we can determine the matrix of the subordinated DTMC. It is given by the next theorem.

Theorem 11. If r(m)− ≤ λ(m)(τ)/Λ(τ) for all τ ∈ [t, t + ∆], then the matrix of the subordinated DTMC satisfies

P− = ∑Mm=1 r(m)− P(m) ≤ P(τ)

Assume now that initially the lower bound is given by π0(−) = p−(t) and πk(−) = πk−1(−)P− for k > 0. Then a lower bound for the probabilities after ∆ time units is given by

p−(t+∆) = ∑∞k=0 β(Λ(t, t+∆)∆, k) πk(−)

This infinite sum can be truncated using the required error bound in a way similar to HMC uniformization. The left truncation point in our case is often 0 because ∆ is usually small. The vectors πk(−) are multiplied with the sub-stochastic matrix P− since

∑Mm=1 r(m)− ≤ 1

If we denote this sum by ρ−,

ρ− = ∑Mm=1 r(m)−,

then one loses a fraction 1 − ρ− of the probability mass in each iteration (eT is the unit column vector):

πk(−)eT = ρ− · πk−1(−)eT.

The bound can be improved if a fixed number of events is assumed to occur in the interval. Assume that exactly one event takes place in the interval [t, t+∆]. In this case

r(m) = λ(m)(t, t+∆) / Λ(t, t+∆)

The conditional distribution in the case of exactly one event is given by

φ = p−(t) (∑Mm=1 (λ(m)(t, t+∆) / Λ(t, t+∆)) P(m))

Then the improved bound for p−(t+∆) can be found using

p−(t+∆) = β(Λ(t, t+∆)∆, 0) p−(t) + β(Λ(t, t+∆)∆, 1) φ + ∑Kk=2 β(Λ(t, t+∆)∆, k) πk(−)

Observe that the second term in the sum is now exact when the accurate p(t) is given. One can extend this procedure to get refined probability values for the cases of k = 2, 3, . . . events occurring in the interval. Having the lower bound, the upper bound is obtained via

p+(t+∆) = p−(t+∆) + εb e

where εb = 1 − p−(t+∆)eT is the missing probability mass.
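A minimal MATLAB sketch of one bounding step (not from the thesis; the event types, rate functions and step size are hypothetical, and the minima are approximated on a fine grid) could look as follows. It computes r(m)−, builds the sub-stochastic matrix P−, accumulates the Poisson-weighted iterates, and derives the upper bound from the missing mass.

% One step of the bounding method on [t, t+Delta) (sketch, hypothetical model).
P1 = [0 1 0; 0 1 0; 0 0 1];   lam1 = @(s) s;       % event type 1
P2 = [1 0 0; 0 0 1; 0 0 1];   lam2 = @(s) 1 + s;   % event type 2
pLow = [1 0 0];  t = 1;  Delta = 0.1;  K = 20;
Lam     = @(s) lam1(s) + lam2(s);
tauGrid = linspace(t, t+Delta, 1000);
r1 = min(lam1(tauGrid) ./ Lam(tauGrid));           % r^(1)- approximated on a grid
r2 = min(lam2(tauGrid) ./ Lam(tauGrid));
Pminus  = r1*P1 + r2*P2;                           % sub-stochastic matrix P-
LamBar  = integral(Lam, t, t+Delta) / Delta;       % average overall rate
pNext = zeros(size(pLow));  piK = pLow;
for k = 0:K
    w     = exp(-LamBar*Delta) * (LamBar*Delta)^k / factorial(k);
    pNext = pNext + w * piK;                       % lower bound accumulation
    piK   = piK * Pminus;
end
epsB = 1 - sum(pNext);                             % missing probability mass
pUp  = pNext + epsB;                               % element-wise upper bound
disp([pNext; pUp]);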

3.4 Approximate Uniformization for ICTMC

In the method described above one obtains strict bounds for the transient state probabilities. When one uses ODE solvers for the Kolmogorov equation, only an approximate solution can be obtained. It is also possible to use uniformization as an approximation method. To do this, the time-dependent rates in the interval [t, t + ∆) are substituted by the average rates in [t, t+∆):

p(t+∆) = p(t) ∑∞k=0 β(Λ(t, t+∆) · ∆, k) (∑Mm=1 (λ(m)(t, t+∆)/Λ(t, t+∆)) P(m))k

The truncation point can be computed similarly as for HMC uniformization, via the truncation of the Poisson probabilities. This approximation is based on the assumption of uniformly distributed Poisson events in the interval [t, t + ∆). Due to the time-dependent rates the exact distribution of the event times is not uniform; using this fact, the approximation of p can be refined. The quality of the approximation depends on the time step size ∆ and on the behaviour of the time-dependent rate functions λ(m)(t) in the interval [t, t + ∆).
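A sketch of one approximate uniformization step in MATLAB (not part of the thesis; it reuses the hypothetical event types from the previous sketch):

% One step of approximate uniformization on [t, t+Delta) (sketch).
% P1, P2, lam1, lam2 are the hypothetical event types from the previous sketch.
P1 = [0 1 0; 0 1 0; 0 0 1];   lam1 = @(s) s;
P2 = [1 0 0; 0 0 1; 0 0 1];   lam2 = @(s) 1 + s;
p = [1 0 0];  t = 1;  Delta = 0.1;  K = 20;
lamBar1 = integral(lam1, t, t+Delta) / Delta;       % average rates on the interval
lamBar2 = integral(lam2, t, t+Delta) / Delta;
LamBar  = lamBar1 + lamBar2;
Pbar = (lamBar1/LamBar)*P1 + (lamBar2/LamBar)*P2;   % averaged DTMC matrix
pNew = zeros(size(p));  pk = p;
for k = 0:K
    pNew = pNew + exp(-LamBar*Delta)*(LamBar*Delta)^k/factorial(k) * pk;
    pk   = pk * Pbar;
end
disp(pNew);                                         % approximation of p(t+Delta)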

For both uniformization methods the time step size ∆ has to be chosen. We will address this problem later because it is closely related to the type of models considered and therefore to the particular types of rate functions λ(m)(t).

4 Markov Population Models

Markov chains with large (or infinite) state spaces are usually described by some formalism which allows one to generate a possibly infinite set of states and transitions. We stick to the Markov population model (MPM) formalism and use transition classes to specify it. An MPM is a continuous-time Markov chain (CTMC) {Xt, t ≥ 0} with state space I = Zn+ = {0, 1, . . .}n. Here the k-th state variable represents the number of instances of the k-th species. Species are interpreted w.r.t. the type of system; they can represent numbers of customers, calls, molecules, etc.

Definition 16 (Transition Class). A transition class η is a triple (G, v, α) where G ⊆ Zn+ is the guard, v ∈ Zn is the change vector, and α : G × R≥0 → R≥0 is the rate function.

The rate function α(x, t) determines the time-dependent transition probabilities for an infinitesimal time step of length dt:

P(Xt+dt = x + v | Xt = x) = α(x, t) · dt

The guard is the set of states where an instance of η is possible, and if the current state is x ∈ G then x + v ∈ Zn+ is the state after an instance of η has occurred. A CTMC X can be specified by a set of M transition classes η1, . . . , ηM; for i ∈ {1, . . . , M} let ηi = (Gi, vi, αi). The infinitesimal generator matrix Q(t) of X then has the entry αi(x, t) at position Q(t)x,x+vi whenever x ∈ Gi. Observe that the number of transition classes corresponds to the number of events described in Section 3. Assume that each change vector vi has at least one non-zero entry and that all rate functions αi are right-continuous.

In this thesis we concentrate on biological systems; the relation between the transition class concept and stochastic chemical kinetics is shown in the next example (the basics are addressed in Appendix A).

Example 2. Consider a simple gene expression model for E. coli cells [13]. It consists of the transcription of a gene into messenger RNA (mRNA) and the subsequent translation of the latter into proteins. A state of the system is uniquely determined by the numbers of mRNA and protein molecules, that is, a state is a pair (xR, xP) ∈ Z2+. We assume that initially there are no mRNA molecules and no proteins in the system, P(X(0) = (0, 0)) = 1. There are four types of reactions that can occur in the system. Let j ∈ {1, . . . , 4} and let ηj = (Gj, uj, αj) be the transition class that describes the j-th reaction type. We define the guard sets G1, . . . , G4 and the update functions u1, . . . , u4.

1. Transition class η1 models gene transcription. The corresponding stoichiometric equation is ∅ → mRNA. If an η1-transition occurs, the number of mRNA molecules increases by one. Thus, u1(xR, xP) = (xR + 1, xP). This transition class is possible in all states, i.e., G1 = Z2+.

2. Transition class η2 represents the translation of mRNA into protein. The corresponding reaction is mRNA → mRNA + P. An η2-transition is only possible if there is at least one mRNA molecule in the system. We set G2 = {(xR, xP) ∈ Z2+ | xR > 0} and u2(xR, xP) = (xR, xP + 1). Note that in this case mRNA is a reactant that is not consumed.

3. Both mRNA and protein molecules can degrade, which is modelled by η3 and η4 (mRNA → ∅ and P → ∅). Hence, G3 = G2, G4 = {(xR, xP) ∈ Z2+ | xP > 0}, u3(xR, xP) = (xR − 1, xP), and u4(xR, xP) = (xR, xP − 1).


Let k1, k2, k3, k4 be real-valued positive constants. We assume that transcription happens at rate α1(xR, xP, t) = k1 · V(t), that is, the rate is proportional to the cell volume V(t) [15]. The time-independent translation rate depends linearly on the number of mRNA molecules; therefore, α2(xR, xP, t) = k2 · xR. Finally, for degradation, we set α3(xR, xP, t) = k3 · xR and α4(xR, xP, t) = k4 · xP.
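The transition classes of Example 2 could be encoded as follows in MATLAB (a sketch, not the thesis implementation; the rate constants and the volume function V(t) are placeholders chosen only for illustration).

% Transition classes of the gene expression model as a struct array (sketch).
% States are x = [xR, xP]; the rate constants and V(t) below are placeholders.
k = [1 5 0.1 0.05];  V = @(t) 1 + 0.1*t;          % hypothetical parameters
tc(1) = struct('guard', @(x) true,     'v', [ 1  0], 'alpha', @(x,t) k(1)*V(t));
tc(2) = struct('guard', @(x) x(1) > 0, 'v', [ 0  1], 'alpha', @(x,t) k(2)*x(1));
tc(3) = struct('guard', @(x) x(1) > 0, 'v', [-1  0], 'alpha', @(x,t) k(3)*x(1));
tc(4) = struct('guard', @(x) x(2) > 0, 'v', [ 0 -1], 'alpha', @(x,t) k(4)*x(2));
% Example: all transitions enabled in state x at time t, with their rates.
x = [2 7];  t = 3;
for j = 1:4
    if tc(j).guard(x)
        fprintf('class %d: x -> [%d %d], rate %.3f\n', ...
                j, x + tc(j).v, tc(j).alpha(x, t));
    end
end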

Now we consider the transient probability distribution of an MPM. For a pair of states (x, y) the transition probability is P(Xt+∆ = y | Xt = x) for t, ∆ ≥ 0. For a given initial distribution p(0)(x) = P(X0 = x) for all x ∈ I, the state probabilities are given by

p(t)(y) = ∑x∈I p(0)(x) · P(0, t)xy

As mentioned in Section 2.3, we assume that the ICTMC X is a regular jump Markov chain. This means that the MPM has the same properties. For the given state space I (which is now infinite) we assume that limit and summation signs can be exchanged and that Q(t) satisfies the forward and backward Kolmogorov equations for 0 ≤ t0 ≤ t:

d/dt P(t0, t) = Q(t)P(t0, t)

d/dt P(t0, t) = P(t0, t)Q(t)

By multiplying with the initial distribution row vector one obtains

d/dt p(t) = p(t)Q(t)

4.1 Uniformization for MPMs

Let us reformulate the ICTMC uniformization method given in Section 3.2 using the MPM formalism. As before, we need to compute the state probabilities P(Xt = x) of an MPM X for any t ∈ [0, T]. This is done via the construction of a subordinated DTMC Y and a clock N. Assume that we have M possible transition classes. If Y is in state x at time t then the probability for the j-th transition (j = 1, . . . , M) to occur is given by

P(t)(Yi+1 = x + vj | Yi = x) = αj(x, t) / Λ(t)

These probabilities define the IDTMC Y. However, to use Theorem 10 we need to extract the time-dependent part and define the set of functions λj(t).

For the rate functions αj(x, t), j ∈ {1, . . . , M}, we make the following assumptions:

1. The function αj(x, t) can be split into two parts, rj(x) and λj(t), such that αj(x, t) = λj(t) · rj(x) (note that such a splitting can always be done in the case of chemical reaction networks). Thus, the functions λj : R≥0 → R≥0 contain the time-dependent part (but are state-independent) and the functions rj : I → R≥0 contain the state-dependent part (but are time-independent).


2. The function αj(x, t) grows at most polynomially in the state variables, and the polynomial is of at most second order.

3. The function αj(x, t) increases monotonically in the state variables.

The second and third assumptions come from standard mass-action kinetics. Therefore, the transition probabilities of the IDTMC can be computed via

P(t)(Yi+1 = x + vj | Yi = x) = αj(x, t) / Λ(t) = λj(t) · rj(x) / Λ(t)

Let the infinitesimal generator matrix of the MPM X be Q(t) = ∑Mm=1 λm(t)Q(m) and let Λ(t) = ∑Mm=1 λm(t). The subordinated IDTMC Y is determined by the matrix

P(t) = ∑Mm=1 (λm(t)/Λ(t)) P(m) = ∑Mm=1 (λm(t)/Λ(t)) (Q(m) + E)

where the matrices Q(m) describe the possible transitions caused by an event of type m. The elements Q(j)(x, y) = rj(x) ≠ 0 whenever the transition from state x to y = x + vj is possible (for j = 1, . . . , M). The diagonal elements are Q(j)(x, x) = −∑y≠x Q(j)(x, y), and by rescaling the functions λj(t) we obtain supx |Q(j)(x, x)| = 1. This rescaling can always be done if the special truncation procedure introduced below is applied to the state space.

Now we have to describe the clock process N. Having Y and N we can compute the state probabilities:

P(Xt = x) = P(YNt = x)

N is an IPP with a rate function Λ(t) satisfying

Λ(t) ≥ supx∈I ∑Mj=1 αj(x, t)

To use Theorem 10 we need I to be finite. Here we relax this condition and allow it to be infinite. In this case we can have

supx∈I ∑Mj=1 αj(x, t) = ∞

This means that the IPP N is not well-defined and a truncation of the state space I has to be performed for a given time t.

If the supreme of the function∑M

j=1 αj(x, t) is ∞, then it means that for some valuesof j

supx∈I

αj(x, t) = supx∈I

λj(x, t) · rj(x) =∞

25

Page 30: Uniformization for Time-Inhomogeneous Markov Population Models · PDF fileUniformization for Time-Inhomogeneous Markov Population Models ... 3.2 Uniformization for Time-Inhomogeneous

and, as |Q^{(j)}(x, x)| \le 1,

\sup_{x \in I} \lambda_j(t) \cdot r_j(x) \le \sup_{t \in [0,T]} \lambda_j(t) = \infty

According to the second assumption about the rate functions αj(x, t), the state-dependent part rj(x) has to be a polynomial of at most second order in the state variables. To have supx∈I |Q(j)(x, x)| = 1 we need to rescale the time-dependent part λj(t). Thus the functions λj(t) have the following general form for j = 1, . . . , M, k = 1, . . . , n:

\lambda_j(t) = f_j(t) \cdot N_j, \qquad N_j = \prod_k N_{jk}

where fj(t) captures the time dependency and Njk is the maximal value of the k-th state variable that is changed by the j-th event type (reaction). For unimolecular reactions (A → . . .) the constant is N = NA. For bimolecular reactions (A + B → . . .) this constant is given by N = NA · NB. The next example shows how this rescaling procedure can result in an infinite supremum.

Example 3. Assume that we have a system consisting of only one reaction ∅ → A with rate function α(x, t) = xA · t, where xA stands for the number of molecules of type A. The state space of the system is I = Z+. Consider a time interval of interest [0, T], T < ∞. Splitting the rate function into time-dependent and time-independent parts gives α(x, t) = t · xA = r(x) · λ(t). To have 1 as the maximum absolute value of the elements in the matrix Q, we rescale:

\lambda(t) = t \cdot N_A, \qquad r(x) = \frac{x_A}{N_A}

where NA is the maximum number of molecules of type A that can be in the system. As x ∈ I and I is an infinite state space, we can obtain NA = ∞ and supt∈[0,T] λ(t) = ∞.
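The following small C++ sketch illustrates this splitting and rescaling for the reaction of Example 3 under a hypothetical finite truncation bound NA on the population of A; the bound, the function names and the chosen state are illustrative only and not part of the actual implementation.

#include <cstdio>

// Illustration of assumption 1 and the rescaling of Example 3: the propensity
// alpha(x, t) = t * x_A is split into a time-dependent part lambda(t) and a
// state-dependent part r(x) with sup_x r(x) <= 1. N_A is an assumed finite
// truncation bound; without such a bound, sup_t lambda(t) would be infinite.
const double N_A = 1000.0;                                    // assumed bound for species A

double alpha(double xA, double t) { return t * xA; }          // original propensity
double lambda(double t)           { return t * N_A; }         // rescaled time-dependent part
double r(double xA)               { return xA / N_A; }        // state-dependent part, <= 1

int main() {
    double xA = 250.0, t = 1.5;
    // The product of the two parts reproduces the original propensity.
    std::printf("alpha      = %g\n", alpha(xA, t));
    std::printf("lambda * r = %g\n", lambda(t) * r(xA));
    return 0;
}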

4.2 State Space Truncation

Consider a time interval [t, t + ∆) of length ∆. Assume that the transient distribution of X at time t is p(t) and that it has finite support It,0. We wish to approximate the distribution p(t + ∆). Denote by It,R the set of states reachable from It,0 after at most R events happen, R ≥ 1. To make an appropriate truncation of the state space, let us consider the probability that the IPP N makes i steps within the time interval [t, t + ∆):

P\left(N_{(t, t+\Delta)} = i\right) = P(N_{t+\Delta} - N_t = i)

For a fixed 0 < ε ≪ 1, let R, ∆ and the rate function Λ be such that N performs at most R steps with probability at least 1 − ε:

\sum_{i=0}^{R} P\left(N_{(t, t+\Delta)} = i\right) \ge 1 - \varepsilon


and for any τ ∈ [t, t+ ∆) the rate must satisfy

\Lambda(\tau) \ge \sup_{x \in I_{t,R}} \sum_{j=1}^{M} \alpha_j(x, \tau)

In this way we truncate the state space to It,R, allowing only R steps within [t, t + ∆), where R is a finite positive constant that can be found for a given ε. Then it is guaranteed that

\sup_{x \in I_{t,R}} \sum_{j=1}^{M} \alpha_j(x, t) < \infty

and the IPP N is non-explosive. Note that the function Λ(τ) depends on τ, t, ∆, It,0, and R, so finding appropriate values for ∆ and R for given t, It,0 and ε is not trivial. Λ(τ) determines the speed of the IPP N and thereby influences the value of R (because the probability of making at most R steps should be greater than or equal to 1 − ε). R determines the size of the set It,R and thus influences Λ(τ) (because Λ(τ) should be greater than or equal to the maximal exit rate for x ∈ It,R).

Assume we have appropriate ∆, R, Λ(τ) and that the distribution at time t is given by the vector p(t). Then for all states x ∈ I we get the ε-approximation

P(X_{t+\Delta} = x) \ge \sum_{i=0}^{R} P\left(Y_i = x \wedge N_{(t, t+\Delta)} = i\right)

Note that it is beneficial to have R small, since fewer probabilities and matrix multiplications have to be computed. It is also better to choose Λ(τ) as small as possible while still satisfying the predefined properties, because then R is also small (N jumps at a slower rate and P(N_{(t,t+\Delta)} > i) becomes smaller).
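For a given mean rate and interval length, the constant R can be found as the smallest integer for which the Poisson distribution with parameter Λ(t, t + ∆) · ∆ accumulates at least 1 − ε of its mass. A minimal C++ sketch is given below; the direct Poisson recursion is a simplification of the Fox–Glynn computation used later in the implementation.

#include <cmath>
#include <cstdio>

// Smallest R such that sum_{i=0}^{R} Poisson(mean; i) >= 1 - eps.
// The terms are accumulated with the recursion beta(i+1) = beta(i) * mean / (i+1);
// for large means the Fox-Glynn algorithm is numerically preferable.
int rightTruncationPoint(double mean, double eps) {
    double beta = std::exp(-mean);   // Poisson probability of 0 events
    double cumulative = beta;
    int i = 0;
    while (cumulative < 1.0 - eps) {
        ++i;
        beta *= mean / i;
        cumulative += beta;
    }
    return i;
}

int main() {
    double lambdaBar = 2.0, delta = 5.0, eps = 1e-6;   // illustrative values
    std::printf("R = %d\n", rightTruncationPoint(lambdaBar * delta, eps));
    return 0;
}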

4.3 Uniformization Rate Calculation

To introduce the step-by-step uniformization approach we need to know how the rate of the IPP N can be computed. Consider a time interval [t, t + ∆) and a fixed R ≥ 1. The probabilities P(N_{(t,t+\Delta)} = i) follow a Poisson distribution with parameter Λ(t, t + ∆) · ∆, where

\Lambda(t, t+\Delta) = \frac{1}{\Delta} \int_{t}^{t+\Delta} \Lambda(\tau)\, d\tau = \frac{1}{\Delta} \int_{t}^{t+\Delta} \left( \sum_{m=1}^{M} \lambda_m(\tau) \right) d\tau

Using the general form of the time-dependent parts λj(t) of the exit rate function, we can rewrite this as

\Lambda(t, t+\Delta) = \sum_{m=1}^{M} \frac{1}{\Delta} \int_{t}^{t+\Delta} f_m(\tau)\, N_m\, d\tau = \sum_{m=1}^{M} \bar{f}_m N_m


where

\bar{f}_m = \frac{1}{\Delta} \int_{t}^{t+\Delta} f_m(\tau)\, d\tau

Then for all states x, y ∈ I we have

P\left[ Y_{N_{t+\Delta}} = y \mid Y_{N_t} = x \right] = p(t) \sum_{k=0}^{\infty} \beta\left( \Lambda(t, t+\Delta) \cdot \Delta,\, k \right) \left( P_{[t, t+\Delta)} \right)^k \ge p(t) \sum_{k=0}^{R} \beta\left( \Lambda(t, t+\Delta) \cdot \Delta,\, k \right) \left( P_{[t, t+\Delta)} \right)^k

where y is a state reachable from x via some sequence of transitions and the matrix P[t,t+∆) is an approximation of the matrix P(τ) for τ ∈ [t, t + ∆). The matrix P(τ) defines the IDTMC Y for a given time interval (as defined in Section 3.2). We will discuss two approaches for defining the matrix P[t,t+∆) in Section 4.4 and Section 4.5.
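If the integrals of the time-dependent parts cannot be solved analytically, the mean rate Λ(t, t + ∆) can be obtained numerically. The following C++ sketch uses the trapezoidal rule; the concrete rate function (a linearly growing volume term plus a constant) and the grid size are illustrative assumptions.

#include <cstdio>
#include <functional>

// Mean uniformization rate over [t, t+Delta):
// (1/Delta) * integral of Lambda(tau) over the interval,
// approximated with the trapezoidal rule on a fixed grid.
double meanRate(const std::function<double(double)>& Lambda,
                double t, double Delta, int gridPoints = 1000) {
    double h = Delta / gridPoints, integral = 0.0;
    for (int i = 0; i < gridPoints; ++i) {
        double a = t + i * h, b = a + h;
        integral += 0.5 * (Lambda(a) + Lambda(b)) * h;
    }
    return integral / Delta;
}

int main() {
    // Illustrative exit rate of a fixed maximal state with linear volume growth.
    auto Lambda = [](double tau) { return 0.05 * (1.0 + tau / 3600.0) + 1.2; };
    std::printf("mean rate = %g\n", meanRate(Lambda, 0.0, 100.0));
    return 0;
}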

4.4 Bounding Approach for MPMs

Consider the IDTMC transition probabilities

P^{(t)}(Y_{i+1} = x + v_j \mid Y_i = x) = \frac{\alpha_j(x, t)}{\Lambda(t)}

To find a lower bound for these transition probabilities we use the under-approximation described in Section 3.3:

\frac{\alpha_j(x, t)}{\Lambda(t)} \ge \min_{\tau \in [t, t+\Delta)} \frac{\alpha_j(x, \tau)}{\Lambda(\tau)}

Denote the latter minimum by uj(x, t, t + ∆). It determines the minimal probability for the j-th event to happen in a given state x ∈ I. The self-loop probability is given by

u_0(x, t, t+\Delta) = \min_{\tau \in [t, t+\Delta)} \left( 1 - \sum_{j=1}^{M} \frac{\alpha_j(x, \tau)}{\Lambda(\tau)} \right)

Assume we have R defined as before. Then for i ∈ {1, . . . , R} we can approximate the probabilities P(Yi = x) as

P(Y_i = y) \ge \sum_{x, j : y = x + v_j} P(Y_{i-1} = x) \cdot u_j(x, t, t+\Delta) + P(Y_{i-1} = y) \cdot u_0(y, t, t+\Delta)

where the sum ranges over all possible predecessor states of y. As before, we can use the splitting of the rate functions αj(x, t) so that uj can be minimized w.r.t. the time-dependent part only:

u_j(x, t, t+\Delta) = \min_{\tau \in [t, t+\Delta)} \frac{\alpha_j(x, \tau)}{\Lambda(\tau)} = \left( \min_{\tau \in [t, t+\Delta)} \frac{\lambda_j(\tau)}{\Lambda(\tau)} \right) \cdot r_j(x)


u_0(x, t, t+\Delta) = \min_{\tau \in [t, t+\Delta)} \left( 1 - \sum_{j=1}^{M} \frac{\lambda_j(\tau)}{\Lambda(\tau)} \cdot r_j(x) \right)

When λj and Λ are monotone in τ, the above minimum can be found analytically. We can thus compute a lower bound p−(t + ∆) for the probabilities P(Yi = x). For this purpose we compute a sequence of sub-stochastic vectors π(1), . . . , π(R). Initially we start with the approximation π(0) = p(t) of the previous step. Then π(i+1) is computed from π(i) using the transition probabilities uj(x, t, t + ∆), j ∈ {0, . . . , M}, for i ∈ {0, 1, . . . , R}. The transition probabilities can sum up to less than one:

\sum_{j=0}^{M} u_j(x, t, t+\Delta) \le 1

This means that the resulting vector can also sum up to less than one:

\sum_{k \in I_{t,i}} \pi^{(i)}_k \le 1

where It,i denotes the current truncated state space after i (i ≥ 1) steps are done. Note that the vectors π(i) contain the DTMC probabilities of the subordinated Markov chain. For the computation of p(t + ∆) (the probabilities of the MPM X) we sum up all the vectors π(i) with weights given by the Poisson probabilities. This introduces an approximation error since R was chosen w.r.t. ε (see Section 4.2). Taking the minimum of the transition probabilities also introduces an error (except when the transition probabilities are constant in [t, t + ∆)), because we substitute αj(x, τ)/Λ(τ) by the single value uj(x, t, t + ∆) for the whole interval (Figure 6 illustrates this process).
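A compact C++ sketch of this propagation scheme is given below for a one-dimensional birth-death model (production with a linearly growing rate, degradation proportional to the population). The model, the truncation bound, the rate constants and the grid-based minimisation are illustrative simplifications and not the thesis implementation.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

const int    NMAX = 50;                 // assumed truncation of the state space
const double K1 = 0.05, K2 = 0.005;     // illustrative rate constants

double a1(int /*x*/, double tau) { return K1 * (1.0 + tau / 3600.0); } // production
double a2(int x, double /*tau*/) { return K2 * x; }                    // degradation
double Lambda(double tau)        { return a1(NMAX, tau) + a2(NMAX, tau); }

int main() {
    double t = 0.0, Delta = 100.0;
    int R = 20, grid = 200;

    // Mean rate over the interval, used for the Poisson weights.
    double lamBar = 0.0;
    for (int g = 0; g < grid; ++g) lamBar += Lambda(t + (g + 0.5) * Delta / grid);
    lamBar /= grid;

    std::vector<double> pi(NMAX + 1, 0.0), pOut(NMAX + 1, 0.0);
    pi[0] = 1.0;                                  // p(t): all mass in state 0

    double beta = std::exp(-lamBar * Delta);      // Poisson weight for 0 jumps
    for (int i = 0; i <= R; ++i) {
        for (int x = 0; x <= NMAX; ++x) pOut[x] += beta * pi[x];
        beta *= lamBar * Delta / (i + 1);         // next Poisson weight

        std::vector<double> next(NMAX + 1, 0.0);
        for (int x = 0; x <= NMAX; ++x) {
            if (pi[x] == 0.0) continue;
            // Minima of the transition probabilities over the interval,
            // found here on a grid (analytic for monotone rates).
            double u1 = 1e300, u2 = 1e300, u0 = 1e300;
            for (int g = 0; g <= grid; ++g) {
                double tau = t + g * Delta / grid, L = Lambda(tau);
                u1 = std::min(u1, a1(x, tau) / L);
                u2 = std::min(u2, a2(x, tau) / L);
                u0 = std::min(u0, 1.0 - (a1(x, tau) + a2(x, tau)) / L);
            }
            if (x + 1 <= NMAX) next[x + 1] += pi[x] * u1;   // production event
            if (x - 1 >= 0)    next[x - 1] += pi[x] * u2;   // degradation event
            next[x] += pi[x] * u0;                          // self-loop
        }
        pi.swap(next);
    }
    double mass = 0.0;
    for (double v : pOut) mass += v;
    std::printf("lower-bound mass after Delta: %g\n", mass);  // <= 1
    return 0;
}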


Figure 6: The substitution of the transition probabilities αj(x, τ)/Λ(τ) by the minimum value in the interval.

In general, the larger the time step ∆, the worse the under-approximations we obtain, which results in a worse approximation of p(t + ∆). This is illustrated by the next example.

Example 4. In the gene expression of Example 2, the time-dependence is due to the volume and only affects the rate function α1 of the first transition class. The time for an E. coli cell to divide varies from about 20 minutes to several hours and depends on growth conditions. We assume a cell cycle time of one hour and linear growth [1]. Thus, if at time t = 0 we consider a cell immediately after division, then the cell volume doubles after 3600 sec. Assume that ∆ ≤ 3600. Then α1(x, τ) = k′1 · (1 + τ/3600) for all x ∈ I. Assume we have R with the properties defined above and

\Lambda(\tau) = \max_{x_R, x_P} \left( k'_1 \cdot \left(1 + \frac{\tau}{3600}\right) + (k_2 + k_3) \cdot x_R + k_4 \cdot x_P \right)

where xR and xP range over all states (xR, xP) ∈ I0,R. Then for every τ ∈ [0, ∆) we look for a state for which the exit rate α0(x, τ) = \sum_{j=1}^{M} αj(x, τ) is maximal. Let


(x^{max}_R, x^{max}_P) denote this state. In general several states can have the same maximal exit rate; for instance, this can happen when the system has reactions of the form A + B → . . ., which depend both on the cell volume and on the population numbers. We discuss how to overcome this difficulty in Section 5.3. The transition probabilities of the IDTMC Y are defined as

u_1(x_R, x_P, 0, \Delta) = \min_{t' \in [0,\Delta)} \frac{\alpha_1(x_R, x_P, t')}{\Lambda(t')} = \frac{\alpha_1(x, 0)}{\Lambda(0)} = \frac{k'_1}{k'_1 + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}

and, for j ∈ {2, 3},

u_j(x_R, x_P, 0, \Delta) = \min_{t' \in [0,\Delta)} \frac{\alpha_j(x_R, x_P, t')}{\Lambda(t')} = \frac{k_j \cdot x_R}{\Lambda(\Delta)} = \frac{k_j \cdot x_R}{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}

u_4(x_R, x_P, 0, \Delta) = \frac{k_4 \cdot x_P}{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}

For the self-loop probability we have:

u_0(x_R, x_P, 0, \Delta) = \min_{t' \in [0,\Delta)} \left( 1 - \sum_{j=1}^{4} \frac{\alpha_j(x_R, x_P, t')}{\Lambda(t')} \right) = 1 - \max_{t' \in [0,\Delta)} \sum_{j=1}^{4} \frac{\alpha_j(x_R, x_P, t')}{\Lambda(t')}

= 1 - \sum_{j=1}^{4} \frac{\alpha_j(x_R, x_P, \Delta)}{\Lambda(\Delta)} = 1 - \frac{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x_R + k_4 \cdot x_P}{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}

Now the fraction of probability lost during the computation of π(i+1) from π(i) can be computed:

1 - \sum_{j=0}^{4} u_j(x_R, x_P, 0, \Delta) = \frac{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right)}{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P} - \frac{k'_1}{k'_1 + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}

= \frac{(k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}{k'_1 + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P} - \frac{(k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}{k'_1 \cdot \left(1 + \frac{\Delta}{3600}\right) + (k_2 + k_3) \cdot x^{max}_R + k_4 \cdot x^{max}_P}.

For ∆ = 0 the probability loss is 0, and for ∆ > 0 the probability loss increases with increasing ∆.
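The following few lines of C++ evaluate this probability-loss expression numerically; the rate constants and the maximal populations are illustrative values only.

#include <cstdio>

int main() {
    // Illustrative constants: k1 plays the role of k'_1 at t = 0.
    const double k1 = 0.05, k2 = 0.0058, k3 = 0.0029, k4 = 1e-4;
    const double xRmax = 20.0, xPmax = 200.0;          // assumed maximal populations
    const double c = (k2 + k3) * xRmax + k4 * xPmax;   // state-dependent part of the exit rate
    const double deltas[] = {0.0, 100.0, 1000.0, 3600.0};
    for (double Delta : deltas) {
        double loss = c / (k1 + c) - c / (k1 * (1.0 + Delta / 3600.0) + c);
        std::printf("Delta = %6.0f   probability loss per step = %g\n", Delta, loss);
    }
    return 0;
}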

4.5 Approximate Uniformization for MPMs

Consider the same IDTMC transition probabilities as before

P^{(t)}(Y_{i+1} = x + v_j \mid Y_i = x) = \frac{\alpha_j(x, t)}{\Lambda(t)}


When we use uniformization as an approximation method to calculate the distribution P(Yi = x), the IDTMC probabilities for a given time interval [t, t + ∆) are computed using the mean approximation as described in Section 3.4:

\frac{\alpha_j(x, \tau)}{\Lambda(\tau)} \approx \frac{\bar{\alpha}_j}{\bar{\Lambda}}

where τ ∈ [t, t + ∆). We denote this mean approximation by lj(x, t, t + ∆). For the self-loop probability we sum up all the transition probabilities and take the remainder:

l_0(x, t, t+\Delta) = 1 - \sum_{j=1}^{M} l_j(x, t, t+\Delta)

For a fixed R we approximate state probabilities as

P(Y_i = y) \approx \sum_{x, j : y = x + v_j} P(Y_{i-1} = x) \cdot l_j(x, t, t+\Delta) + P(Y_{i-1} = y) \cdot l_0(y, t, t+\Delta)

As before, x ranges over all possible predecessors of y. Splitting the rate functions gives:

l_j(x, t, t+\Delta) = \frac{\bar{\alpha}_j}{\bar{\Lambda}} = \frac{\frac{1}{\Delta} \int_{t}^{t+\Delta} \alpha_j(x, \tau)\, d\tau}{\bar{\Lambda}} = \frac{1}{\bar{\Lambda}} \left( \frac{1}{\Delta} \int_{t}^{t+\Delta} \lambda_j(\tau)\, d\tau \right) r_j(x) = \frac{\bar{\lambda}_{j,[t,t+\Delta)} \cdot r_j(x)}{\bar{\Lambda}}

If all λj are integrable in terms of elementary functions, then the mean value on [t, t + ∆) can easily be computed. To obtain the approximation p(t + ∆) we calculate the sequence of vectors π(1), . . . , π(R) using the same principle as in Section 4.4. The transition probabilities sum up to one:

\sum_{j=0}^{M} l_j(x, t, t+\Delta) = 1

The approximation error is introduced by the truncation of the Poisson probabilities and by the substitution of αj(x, τ)/Λ(τ) by the single (mean) value on the interval [t, t + ∆). It is important to note that we cannot provide lower and upper bounds for the transient probability distribution, but only an approximation of it. Figure 7 illustrates the process.
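A small C++ sketch of the mean-value transition probabilities for the gene expression example follows; the constants, the current state and the chosen maximal state are illustrative assumptions, and the mean of the only time-dependent rate is computed analytically.

#include <cstdio>

int main() {
    const double k1 = 0.05, k2 = 0.0058, k3 = 0.0029, k4 = 1e-4;
    const double xR = 10.0, xP = 100.0;            // current state (illustrative)
    const double xRmax = 20.0, xPmax = 200.0;      // maximal state defining Lambda
    const double t = 0.0, Delta = 100.0;

    // Mean of the time-dependent part lambda_1(tau) = k1 * (1 + tau/3600)
    // over [t, t + Delta), integrated analytically.
    double lam1Bar   = k1 * (1.0 + (t + Delta / 2.0) / 3600.0);
    double LambdaBar = lam1Bar + (k2 + k3) * xRmax + k4 * xPmax;

    double l1 = lam1Bar / LambdaBar;          // production of mRNA
    double l2 = k2 * xR / LambdaBar;          // production of protein
    double l3 = k3 * xR / LambdaBar;          // degradation of mRNA
    double l4 = k4 * xP / LambdaBar;          // degradation of protein
    double l0 = 1.0 - (l1 + l2 + l3 + l4);    // self-loop

    std::printf("l1..l4 = %g %g %g %g, self-loop = %g, sum = %g\n",
                l1, l2, l3, l4, l0, l0 + l1 + l2 + l3 + l4);
    return 0;
}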


Figure 7: The substitution of the transition probabilities αj(x, τ)/Λ(τ) by the mean value in the interval.

4.6 Choice of Time Step

As we can see from Example 4, a large time step may lead to decreased accuracy. This motivates us to introduce a partitioning of the time period of interest [0, T], T < ∞. We divide it into a finite number H of subintervals, determined by the time points

0 = t0 < t1 < . . . < tH = T

Here we set the left bound equal to 0, but this does not restrict the generality of the approach: we can consider an arbitrary time interval [t0, T] and substitute the variable using the shift t = t − t0. The time step size ∆ is not a constant value and can vary during the iteration process. Arns et al. [2] give an adaptive method for the time step choice in the case of a finite state space. Due to the truncation of I and the fact that Λ(τ) is of complex structure, we need to refine this procedure. Details are given in Section 5.2. The general pipeline for both MPM uniformization methods is the following. In each step we compute an approximation of the transient distribution at the current time instant, p(t). Then it is used as the initial condition for the next step. In each step the number of considered states


grows (represented by |It,R|). The probabilities of all remaining states in I are approximated as zero. Thus each step yields a vector p(t + ∆) with positive entries for all states x ∈ It,R (it approximates P(Xt+∆ = x)). In the next step the vector p(t + ∆) with support It,R = It+∆,0 is used as the initial distribution to approximate the vector p(t + ∆ + ∆′) (see Figure 8 for a sketch of the state truncation approach).

The total error of the uniformization procedure is given by the sum of the errors of each step. There are two sources of error, namely the error due to the substitution of the time-dependent rate functions λj(τ) by a single value on the interval and the error corresponding to the truncation of the Poisson probability sum (choice of the constant R). One can use exact formulas for the Poisson probabilities (given in Section 3.3) and provide an accurate computation of the first n terms of the sum for P(Xt+∆ = x). This implies that R should be chosen small to obtain smaller errors. At the same time it means that only small time steps ∆ can be taken and more iterations are needed to obtain the result, so a certain trade-off between running time and accuracy has to be found.

5 Algorithms and Implementation

To start the description of the implementation details, we need to show how all the intermediate tasks are solved. In the following sections we show how to cope with the problems of truncating the infinite state space, choosing the time step and the corresponding uniformization rate Λ. Truncation of the state space plays an important role for chemical reaction networks since the species populations can grow very fast in time. The value of the time step influences both running time and accuracy. The rate function Λ defines the IPP N and has to be determined for each subinterval, which shows the connection with the value of ∆. Also the truncation point R for the Poisson probabilities has to be chosen. We address these interdependencies later and give our algorithm to compute all needed values. Note that here we will use the notation p(t) for the transient probability distribution at time t (p(t)− and p(t)+ for the lower and upper bound, respectively).

5.1 On-the-fly Algorithm

During the iteration procedure, we create a state space truncation It,i for i ∈ {0, . . . , R}. As we can see in Figure 8, the number of states needed to compute the approximation of p(T) grows in each step. In each iteration in time we initially have It,0, and for every state x ∈ It,0 all states within a radius of R transitions are added. This creates restrictions for the application of the method since the approach becomes infeasible for MPMs with infinite (or large) state spaces. Therefore we use a strategy similar to the one described in [5] to relax the memory requirements. Another problem is the computational cost of matrix-vector and matrix-matrix multiplications. We use a method that finds the transient distribution without conducting these operations explicitly and achieves faster running times, as described by Didier et al. The principle of this approach is to store only that part of the state space where most of the transient probability mass is located. To achieve this,


states are added and removed in an on-the-fly fashion. A small probability threshold δ > 0 is used to decide whether a state should be added to or removed from the current snapshot of the state space. Therefore we store only those states that have a ”significant” probability value w.r.t. δ.

State Space Data Structure A state x ∈ It,i is represented in the implementation as a record with the following fields (a sketch of such a record in C++ is given after the list):

• x.s containing the set of state variables (that indeed describe state x)

• x.dtmc containing the current DTMC probability π(i)(x)

• x.ctmc containing the current CTMC probability p(t)(x)

• x.temp which accumulates the incoming probability mass

• x.next containing pointers to all direct successor states x + vj, j = 1, . . . , M
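A possible C++ realisation of such a record is sketched below; the concrete container types are an assumption of this sketch.

#include <cstdint>
#include <vector>

// State record of the on-the-fly algorithm; the fields mirror the list above.
struct State {
    std::vector<int64_t> s;       // state variables (population vector)
    double dtmc = 0.0;            // current DTMC probability pi^(i)(x)
    double ctmc = 0.0;            // current CTMC probability p(t)(x)
    double temp = 0.0;            // accumulator for incoming probability mass
    std::vector<State*> next;     // pointers to the direct successors x + v_j
};

int main() {
    State x;
    x.s = {0, 0};                 // e.g. (x_R, x_P) = (0, 0)
    x.ctmc = 1.0;                 // initial distribution concentrated in x
    return 0;
}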

Figure 8: Illustration of the state space truncation approach for the two-dimensional case. Given the distribution p(t) with support It,0, a truncation point R and a time step ∆, we compute in the first step the distribution p(t + ∆) with support It,R = It+∆,0. For the next step we consider the set It+∆,R. From left to right: support at time t, truncation for the first step, truncation for the second step.

Move of Probability Mass Assume that for some time instant t = th (1 ≤ h < H) we have the truncated state space It,0, denoted by I(0), and that the corresponding distribution of state probabilities is given by a vector v(0)(x) in which all entries are greater than 0. For the vector of DTMC probabilities we have the initialization π(0) = p(th−1). Let It,i be the set of states that we consider to compute π(i+1) from π(i). This process is shown in Algorithm 5, where we use the function FoxGlynnProbability to obtain the Poisson probabilities β(Λ(t, t + ∆) · ∆, i) w.r.t. the predefined ε. For each state x ∈ I(i) we compute all the transition probabilities uj or lj (depending on the used uniformization method) for all possible events j ∈ {1, . . . , M}. We stick to the bounding


method (Section 4.4), as it does not change the pipeline, and in the following we use only the uj notation. We add the value x.dtmc · uj to the field (x + vj).temp for each j and x.dtmc · x.u0 to the field x.temp. Afterwards we iterate once more over all states x ∈ I(i) and set x.dtmc = x.temp and x.temp = 0 if the value x.temp ≥ δ. Otherwise we remove the state x, i.e., I(i+1) contains all states x with π(i+1)(x) ≥ δ. Similarly, if the direct successor x + vj does not exist yet and the probability flow x.dtmc · uj from x to x + vj is greater than δ, then we create it and add it to I(i+1). If the flow from x is at most δ, we do not create the successor, even though x + vj might in total receive more than δ; this improves efficiency. In this way we avoid creating states only to test whether the sum of their incoming probability flow is large enough, after which they would immediately be deleted again. The process of propagating probability mass is shown in Algorithm 1 (in Algorithm 2 for the approximation approach, respectively), and the collection is shown in Algorithm 3. As can be seen, we never store the approximation of the stochastic matrix P(t) explicitly, but only use the definition of the transition classes to determine the values of uj. Figure 9 shows an example of how the mass can be moved. Using this technique seriously relaxes the memory requirements and allows working with large state spaces. We also get rid of the time-consuming matrix-vector multiplications, although they are still performed implicitly: we compute the transition probabilities from the state x to every successor x + vj, which is the same as carrying out the matrix-vector multiplication for all non-zero entries of P(t). Storing only the states with significant probability mass in addition decreases the speed Λ of the IPP N, since the set It,R is smaller and the maximum of the uniformization rate is taken over fewer states. This means that the mean value Λ becomes smaller as well. This effect is particularly important if during some time interval the dynamics of the system is fast in certain parts of the state space while it is slow in other parts that contain the main part of the probability mass. On the other hand, the threshold δ introduces another approximation error, which may become large if many iterations in time have to be done (recall that the time step size is adaptive). We can easily track how much probability has been neglected by adding up the probability inflow that was not added to any income field.

For the approximate uniformization technique we cannot establish error bounds; however, for the bounding approach we know the total error at time t

\varepsilon_b = 1 - \sum_{x \in I_{t,R}} p(t)^-(x)

and the upper bound is given by

p(t)^+ = p(t)^- + \varepsilon_b \, e


Figure 9: Illustration of the probability mass move for the two-dimensional case. Given the distribution p(t) with support It,0, a truncation point R and a time step ∆, we compute in the first step the distribution p(t + ∆) with approximate support It+∆,0 ⊂ It,R. For the next step we consider the set It+∆,R. From left to right: support at time t, truncation for the first step, truncation for the second step.

5.2 Step-size Calculation

In Section 4.6 we stated the partitioning of the time interval of interest [0, T]:

0 = t0 < t1 < . . . < tH = T

Recall that we can always shift to an arbitrary time instant using a substitution of variables. As the size of the time step can vary, we cannot determine the number of intervals (the constant H) in advance. Given an error bound ε and a time point t ∈ [0, T] for which the support of p(t) is It,0, we calculate a time step ∆. The probabilities of the IPP N depend both on R and Λ, thus we propose to first choose a desired right truncation point R∗ for the Poisson probabilities. This eases the whole process of choosing ∆ and removes complex interdependencies. We perform an iteration where in each step we systematically choose different values for ∆ and compare the associated right truncation point R with R∗. Recall that the rate of N is given by

\Lambda(t, t+\Delta) = \frac{1}{\Delta} \int_{t}^{t+\Delta} \Lambda(\tau)\, d\tau

To reflect the dependency on R we denote it by ΛR,∆. Due to the monotonicity of the integral in ∆, the search can be done in a binary search fashion. Initially we set ∆ to T − t. Then the two bounds ∆− and ∆+ are determined and the search is performed as described in Algorithm 4. The function FindMaxState(∆, R∗) finds a state xmax such that for all τ ∈ [t, t + ∆) we have

\sum_{j=1}^{M} \alpha_j(x^{max}, \tau) \ge \max_{x' \in I_{t,R^*}} \sum_{j=1}^{M} \alpha_j(x', \tau)


The choice of state xmax determines the uniformization rate

\Lambda(\tau) = \sum_{j=1}^{M} \alpha_j(x^{max}, \tau)

It indeed satisfies the properties defined in Section 4.1. The function CalculateUniformizationRate(t, t + ∆, xmax) computes the integral of Λ(τ). It was shown in Section 4.3 how this integral can be simplified using the splitting of the functions αj(x, t). If possible we compute the integral analytically, otherwise a numerical integration technique is used. The function FoxGlynn(Λ · ∆, ε) computes the right truncation point of a homogeneous Poisson process with rate Λ for a given error bound ε, i.e. the smallest positive integer R such that

\sum_{i=0}^{R} \frac{(\Lambda \cdot \Delta)^i}{i!}\, e^{-\Lambda \cdot \Delta} \ge 1 - \varepsilon

For the refinement of the bounds ∆− and ∆+ in the while loop of Algorithm 4 we exploit the fact that R is monotone in ∆. For the last time interval, from t = tH−1 to tH = T, we can have R < R∗. In this case we put R∗ = R and ∆ = tH − t = T − t.

5.3 Maximal Exit Rates Calculation

The function FindMaxState(∆, R∗) in Algorithm 4 finds a state xmax such that its exit rate is greater than or equal to the maximal exit rate α0(x, τ) = \sum_{j=1}^{M} αj(x, τ) over all states x in It,R∗. In principle it is enough to find a function Λ(τ) with this property. For instance, it could be max_{x∈It,R∗} \sum_{j=1}^{M} αj(x, τ), but this function may be hard to determine analytically and it is also not clear how to represent such a function practically in an implementation. Selecting a state xmax and defining Λ(τ) to be the exit rate of this state solves these problems. We describe three ways of implementing the function FindMaxState(∆, R∗).

Full State Space Exploration Assume that for the time instant t we have p(t) with support It,0. We consider all states x ∈ It,0 and create the possible one-step successors y = x + vj, neglecting the DTMC probability values. In this way we obtain It,1. The same procedure is repeated for all states y ∈ It,1, and so on. Finally we obtain the R-step extension of It,0 w.r.t. all possible events of type m, m ∈ {1, . . . , M}. It is important to note that we create only ”fake” states which contain no values for the DTMC or CTMC probability. This results in a huge growth of the state set because we cannot judge whether a certain state will be needed later (i.e. whether it will receive enough probability mass) or not. Having the set It,R we can find the state xmax. However, it can be the case that several states have the maximal exit rate for different values of τ within the same interval [t, t + ∆). Such a case is shown in Figure 10.


Figure 10: Two exit rate functions can attain the maximal value on different subintervals of [t, t + ∆). The resulting overall rate function Λ(τ) is shown in red.

For such a case we have to divide the interval [t, t + ∆) into a finite number H of subintervals

t = t0 < t1 < . . . < tH = t+ ∆

and then define the function Λ(τ) as a sum

\Lambda(\tau) = \sum_{h=0}^{H} \alpha_0\left(x^{max}_h, \tau\right), \qquad x^{max}_h = \arg\max_{x \in I_{t,R}} \alpha_0(x, \tau), \quad \tau \in [t_h, t_{h+1})

Now we can fill in these ”fake” states with real probability values. The implementation of this technique requires much more memory and is expensive in terms of running time, since a large number of states is created and then deleted as redundant. The only advantage of this technique is that it can be applied to systems where the αj(x, t) are not monotonically increasing functions in the state variables. For chemical reaction networks with a large time horizon T this method becomes infeasible. Figure 11 shows how this method can work for some MPM.


Figure 11: Illustration of the full state space exploration for the two-dimensional case with rate functions that are not monotone in the state variables. The state x^max_{It,0} has the maximal exit rate among all states in It,0. Then we scan for the state with the maximal exit rate in the R-transition extension of It,0, that is x^max_{It,R}. We put Λ(τ) = α(x^max_{It,R}, τ) and fill in the ”fake” states with probability values.

Modified DTMC Evolution Method Recall that all rate functions αj(x, t) are assumed to be monotonically increasing in the state variables (see Section 4.1). We exploit the fact that the change vectors are constant and define the subclass of events whose change vectors contain at least one positive entry. This is done because a state with lower values of the state variables cannot have a larger exit rate. However, if a change vector contains both positive and negative entries, we need to consider the corresponding event as well. Without loss of generality we can assume that these vectors correspond to the first M′ events. For each dimension k ∈ {1, . . . , n} we set

v^{max}_k = \max_{j \in \{1, \ldots, M'\}} v_{jk}

where vjk is the k-th entry of the change vector vj. For the set It,0 we determine the maximum value of the state variables in each dimension k ∈ {1, . . . , n}:

y^{max}_k = \max_{y \in I_{t,0}} y_k

We now find the state xmax, which is guaranteed to have an exit rate greater than or equal to that of any state in It,R∗ for all time points in the interval [t, t + ∆), as follows:

x^{max}_k = y^{max}_k + R^* \cdot v^{max}_k

so the state variables of xmax are upper bounds for the state variables appearing in It,R∗. Thus the exit rate of the state xmax = (x^{max}_1, . . . , x^{max}_n) is an upper bound for the exit rates:

\alpha_0(x^{max}, \tau) \ge \alpha_0(x, \tau)

for all τ ∈ [t, t + ∆). We can then define the function Λ(τ) = α0(xmax, τ) as before. Essentially the same idea as before is used, but we do not need to create all intermediate states to determine which one has the maximal exit rate. This results in savings


of memory storage and is computationally much more effective. Figure 12 shows an example of determining xmax in this fashion.
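The computation of xmax in the modified DTMC evolution method reduces to a few component-wise maxima, as the following C++ sketch shows; the change vectors and the support are illustrative.

#include <algorithm>
#include <cstdio>
#include <vector>

// xmax_k = ymax_k + R* * vmax_k, computed without creating intermediate states.
int main() {
    const int Rstar = 20;
    // Change vectors with at least one positive entry (the first M' events).
    std::vector<std::vector<int>> v = {{1, 0}, {0, 1}};
    // Current support I_{t,0}, given as population vectors (illustrative).
    std::vector<std::vector<int>> support = {{3, 7}, {5, 2}, {4, 4}};

    size_t n = v[0].size();
    std::vector<int> ymax(n, 0), vmax(n, 0), xmax(n, 0);
    for (const auto& y : support)
        for (size_t k = 0; k < n; ++k) ymax[k] = std::max(ymax[k], y[k]);
    for (const auto& vj : v)
        for (size_t k = 0; k < n; ++k) vmax[k] = std::max(vmax[k], vj[k]);
    for (size_t k = 0; k < n; ++k) xmax[k] = ymax[k] + Rstar * vmax[k];

    std::printf("xmax = (%d, %d)\n", xmax[0], xmax[1]);
    return 0;
}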

Figure 12: Illustration of the modified DTMC evolution method for the two-dimensional case with rate functions that are monotone in the state variables. The state x^max_{It,0} is calculated as x^max_{It,0} = (x^max_1, x^max_2). Then the state x^max_{It,R} is approximated and we can propagate the probability using Λ(τ) = α(x^max_{It,R}, τ).

Statistical Moments Approximation In the paper of Engblom [6] a method is introduced to approximate the moments of n-th order of an MPM in the case when the system is described by the chemical master equation (CME, see Section A.2). This approximation assumes that the expectations and the covariances change continuously and deterministically in time. We approximate the means Ek(τ) = E[Xk,τ] and the variances σ²k(τ) = VAR[Xk,τ] for all dimensions (species) k ∈ {1, . . . , n}. Recall that for a given n-dimensional random vector X the mean is given by

E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix}

and we use the following equation for the computation:

\frac{d}{dt} m_k = -\sum_{j=1}^{M} v^k_j \left( \alpha_j(m) + \sum_{k_1, k_2 = 1}^{n} \frac{\partial^2 \alpha_j(m)}{\partial x_{k_1} \partial x_{k_2}} \, \frac{C_{k_1,k_2}}{2!} \right)

where mk = mk(τ) = Ek(τ). Denote by xmean the state whose state variables are given by the vector m, so that αj(m) = αj(xmean, t). However, it can be the case that the newly obtained values of the state variables of xmean are not integers. Then for a state variable s we take the value sl = ⌊s⌋ (the closest integer smaller than s) if the overall rate function is decreasing in s. On the other hand, if the function Λ(τ) is increasing in a state variable s, we take the value su = ⌈s⌉ (the closest integer that is


greater than s). For the covariance matrix Ck1k2 = E[(Xk1 − mk1)(Xk2 − mk2)] we have

\frac{d}{dt} C_{k_1 k_2} = -\sum_{j=1}^{M} \left( v^{k_1}_j \sum_{k_3=1}^{n} \frac{\partial \alpha_j(m)}{\partial x_{k_3}} \, \frac{C_{k_3 k_2}}{1!} + v^{k_2}_j \sum_{k_4=1}^{n} \frac{\partial \alpha_j(m)}{\partial x_{k_4}} \, \frac{C_{k_1 k_4}}{1!} \right) + \sum_{j=1}^{M} v^{[k_1,k_2]}_j \left( \alpha_j(m) + \sum_{k_3, k_4 = 1}^{n} \frac{\partial^2 \alpha_j(m)}{\partial x_{k_3} \partial x_{k_4}} \, \frac{C_{k_3 k_4}}{2!} \right)

where v^{[k_1,k_2]}_j = v^{k_1}_j v^{k_2}_j and, following the sign convention of [6], vj here stands for the negated change vector −vj. In this way we obtain the ODE system which has to be solved. Note that we calculate the covariance between the dimensions (species) of the system. It is important that the mean values and the covariances are calculated w.r.t. the CTMC probabilities of the corresponding states:

has to be solved. Note that we calculate covariance between dimensions (species) of thesystem. It is important that the mean values and the covariances are calculated w.r.t.CTMC probability of the correspondent states:

E(Xk) =∑x∈It,0

x.ctmc · x.sk

Ck1,k2 =∑x∈It,0

p(x.sk1 , x.sk2) (x.sk1 − E(X1)) (x.sk2 − E(X2))

where p(x.sk1, x.sk2) refers to the joint distribution P(Sk1 = sk1, Sk2 = sk2) for all pairs of species (dimensions) k1, k2 ∈ {1, . . . , n}. The state x has certain values of the state variables x.s1, . . . , x.sn, and its probability can be considered as P([S1, . . . , Sn](t) = (s1, . . . , sn)). Summing up the probabilities of all states with the same pair (sk1, sk2) we get the needed value of the joint distribution.

Finally we obtain an equation system with n · (n + 1) equations. By solving it, we derive σ²k(τ) = Ck,k for each dimension k and determine the time instant τ ∈ [t, t + ∆) at which Ek(τ) + ℓ · σk(τ) is maximal for some fixed ℓ. We use this maximum to determine the spread of the distribution, i.e. we assume that the values of Xτ will stay below x^max_k = Ek(τ) + ℓ · σk(τ). Note that a more precise approach would be to consider the multivariate normal distribution with mean E[Xτ] and covariance matrix COV[Xτ]. But since the spread of a multivariate normal distribution is difficult to derive in higher dimensions, we consider each dimension independently. We now have xmax = (x^max_1, . . . , x^max_n). If during the analysis a state is found which exceeds xmax in one dimension, then we repeat our computation with a higher value for ℓ. The state xmax indeed defines the overall rate function Λ(τ) and the speed of the corresponding IPP N. Figure 13 shows an example of determining xmax in this fashion.


Figure 13: Illustration of the Engblom approximation method for the two-dimensional case. The state xmax is approximated using the calculated mean values and covariances. By putting Λ(τ) = α(xmax, τ) we can propagate the probability mass. The state xmax is a state in the significant part of the state space which has the maximal exit rate after the propagation process.

It should also be noted that we solve this ODE system for the time interval [t, t + ∆), where xmax is chosen on the fly together with ∆. In comparison to the previous approaches, where we know xmax in advance and only then calculate ∆ using it, here we have to determine the time step and the maximal reachable state simultaneously. This procedure is repeated until the condition R = R∗ on the right truncation point for the Poisson probabilities is satisfied. The computational effectiveness here depends on the system in question.

5.4 Complete Algorithm

Our complete algorithm proceeds as follows. Given an initial distribution p(t0) with finite support It0,0, a time bound T, thresholds δ and ε, and a desired right truncation point R∗, we first set t = 0. Now we compute a time step ∆ and the state xmax using Algorithm 4 with inputs R∗, t, T, and ε. We then approximate the transient distribution p(t + ∆) using an on-the-fly algorithm for the bounding approach or for approximate uniformization, where the state space is dynamically maintained and states with probability less than δ are discarded (see Section 5.1). For the rate function Λ we use the exit rate of the state xmax. When computing the DTMC probabilities, in the case of the bounding approach we use the exact formulas for the first two terms of the sum given by Arns et al. [2], otherwise we use the mean approximation. This gives us the approximation p(t + ∆) with finite support It+∆,0. We now set t = t + ∆ and repeat the above step until t = T. The corresponding pseudo-code is given in Algorithm 6.


Algorithm 1: Propagate phase for bounding approach
Input: th: current time instant, th+1: next time instant, δ: probability threshold
1:  for x ∈ It,i do
2:      for j ∈ {1, . . . , M} do
3:          uj := min_{τ∈[th,th+1)} αj(x, τ)/Λ(τ);
4:          zj := uj · x.dtmc;
5:          if zj > δ then
                // Calculate the successor state
6:              x(new) := x + vj;
7:              if x(new) ∉ It,i then
8:                  It,i := It,i ∪ {x(new)};
9:              end
10:             x(new).temp := x(new).temp + zj;
11:         end
12:     end
        // Calculate z0
13:     z0 := u0 · x.dtmc;
14:     x.temp := x.temp + z0;
15: end


Algorithm 2: Propagate phase for approximate uniformization approach
Input: th: current time instant, th+1: next time instant, δ: probability threshold
    // Calculate the mean rate Λ for the given time interval
1:  Λ := 1/(th+1 − th) · ∫_{th}^{th+1} Λ(τ) dτ;
2:  for x ∈ It,i do
3:      for j ∈ {1, . . . , M} do
4:          lj := lj(x, th, th+1) (mean approximation, see Section 4.5);
5:          zj := lj · x.dtmc;
6:          if zj > δ then
                // Calculate the successor state
7:              x(new) := x + vj;
8:              if x(new) ∉ It,i then
9:                  It,i := It,i ∪ {x(new)};
10:             end
11:             x(new).temp := x(new).temp + zj;
12:         end
13:     end
        // Calculate z0
14:     z0 := (1 − Σ_{j=1}^{M} lj) · x.dtmc;
15:     x.temp := x.temp + z0;
16: end

Algorithm 3: Collect phase
Input: δ: DTMC probability threshold
1:  for x ∈ It,i do
2:      x.dtmc := x.temp;
3:      x.temp := 0;
        // Remove the state if it does not have enough probability mass
4:      if x.dtmc < δ then
5:          It,i := It,i \ {x};
6:      end
7:  end


Algorithm 4: Time step size search
Input: R∗: right truncation point for Poisson probabilities, ε: threshold for Poisson probabilities, t: current time instant, T: time horizon
Output: ∆: resulting time step, xmax: predicted state with maximal exit rate in It,R∗
    // Determine the upper bound for ∆
1:  ∆+ := T − t;
    // Determine the state with the maximal exit rate
2:  xmax := FindMaxState(∆+, R∗);
3:  Λ[t,t+∆+) := CalculateUniformizationRate(t, t + ∆+, xmax);
4:  R+ := FoxGlynn(Λ[t,t+∆+) · ∆+, ε);
5:  if R+ ≤ R∗ then
6:      R− := R+;
7:      ∆− := ∆+;
8:  end
9:  else
        // Determine the lower bound for ∆
10:     R− := 0; ∆− := 0;
11: end
    // Start binary search
12: while R ≠ R∗ do
13:     ∆ := (∆− + ∆+)/2;
14:     Λ[t,t+∆) := CalculateUniformizationRate(t, t + ∆, xmax);
15:     R := FoxGlynn(Λ[t,t+∆) · ∆, ε);
16:     if R− < R∗ < R then
17:         R+ := R;
18:         ∆+ := ∆;
19:     end
20:     else if R < R∗ < R+ then
21:         R− := R;
22:         ∆− := ∆;
23:     end
24: end


Algorithm 5: Probability mass move procedure
Input: p(th−1): transient probability distribution from the previous time iteration, R∗: right truncation point for Poisson probabilities, [th, th+1): time interval, Λ: uniformization rate, δ: DTMC probability threshold
Output: p(th+1): transient probability distribution at time th+1
    // Initialize the distribution vector p(th+1)
1:  for x ∈ It,0 do
2:      p(th+1)(x) := 0;
3:  end
4:  i := 0;
5:  while i < R∗ do
        // Compute the vector of DTMC probabilities π(i)
6:      Collect(δ);
7:      for x ∈ It,i do
            // Accumulate the CTMC probabilities
8:          βi := FoxGlynnProbability(Λ · (th+1 − th), i, ε);
9:          p(th+1)(x) := p(th+1)(x) + βi · π(i)(x);
10:     end
11:     Propagate(th, th+1, δ);
12:     i := i + 1;
13: end

Algorithm 6: General procedure of uniformization for MPM
Input: p(t0): initial distribution, T: time horizon, δ: threshold for DTMC probability, ε: threshold for Poisson probabilities sum, R∗: right truncation point for Poisson probabilities
Output: p(t1), p(t2), . . . , p(T): sequence of transient probability distributions
1:  tcurrent := t0;
    // Calculate the time step ∆ and the state xmax with maximal exit rate using Algorithm 4
2:  (∆, xmax) := TimeStepSearch(R∗, ε, t0, T);
3:  tnext := tcurrent + ∆;
4:  Λ[tcurrent,tnext) := CalculateUniformizationRate(tcurrent, tnext, xmax);
5:  while tcurrent < T do
        // Conduct the probability mass move using Algorithm 5
6:      p(tnext) := ProbabilityMassMove(p(tcurrent), R∗, [tcurrent, tnext), Λ[tcurrent,tnext), δ);
7:      tcurrent := tnext; (∆, xmax) := TimeStepSearch(R∗, ε, tcurrent, T);
8:      tnext := tcurrent + ∆;
9:      Λ[tcurrent,tnext) := CalculateUniformizationRate(tcurrent, tnext, xmax);
10: end


6 Case Studies

We implemented the described uniformization approaches in C/C++ and ran experiments on a 2.4 GHz Linux machine with 4 GB of RAM. We apply these methods to Markov population models that correspond to chemical reaction networks. According to the theory of stochastic chemical kinetics (see Section A), the form of the rate function of a reaction depends on how many molecules of each chemical species are needed for one instance of the reaction to occur. In the case of time-dependent systems (we assume that all chemical reactions are elementary) this gives the following dependencies:

• If no reactants are needed then the reaction is of the form ∅ → . . . and αj(x, t) = kj · V(t), where kj is a positive constant and V(t) is the volume of the compartment in which the reaction takes place.

• If one molecule is needed (the reaction refers to Si → . . .) then αj(x, t) = kj · xi, where xi is the number of molecules of type Si. Thus, in this case, αj(x, t) is independent of time.

• If two distinct molecules are needed (case Si + Sℓ → . . .) then αj(x, t) = (kj / V(t)) · xi · xℓ.

Other examples may contain non-elementary reactions, and thus a realistic biological model may contain different volume dependencies. But since the focus is on the numerical algorithm, we do not aim for an accurate biological description here. The first considered system is the gene expression which was introduced in Example 2. The parameters are chosen as k1 = 0.05, k2 = 0.0058, k3 = 0.0029, and k4 = 10−4, where k3 and k4 correspond to a half-life of 4 minutes for mRNA and 2 hours for the protein [13]. As explained in Example 2, we assume that the cell cycle time is one hour and that the cell growth is linear. We assume that only the first transition class is time-dependent and that the values uj(x, t, t + ∆) are determined analytically. Thus the system is described by the following reactions, propensities and change vectors:

1. ∅ → mRNA, α1(x, t) = k1 · V(t) = 0.05 · (1 + t/3600), v1 = (1, 0)

2. mRNA → mRNA + P, α2(x, t) = k2 · x.sR = 0.0058 · xR, v2 = (0, 1)

3. mRNA → ∅, α3(x, t) = k3 · x.sR = 0.0029 · xR, v3 = (−1, 0)

4. P → ∅, α4(x, t) = k4 · x.sP = 10−4 · xP, v4 = (0, −1)

where xR and xP denote the values of the state variables in state x. We started the system at time t = 0 in state s = (0, 0) with probability 1 (a small code sketch of these propensities is given below).
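The model can be encoded directly in terms of its propensity functions and change vectors, as the following C++ sketch shows; encoding it as plain functions is an illustration of the transition classes above, not a description of the actual implementation.

#include <cstdio>

const double k1 = 0.05, k2 = 0.0058, k3 = 0.0029, k4 = 1e-4;
const int M = 4;
const int v[M][2] = { {1, 0}, {0, 1}, {-1, 0}, {0, -1} };   // change vectors

// Propensity of reaction j in state (xR, xP) at time t.
double alpha(int j, double xR, double xP, double t) {
    switch (j) {
        case 0:  return k1 * (1.0 + t / 3600.0);   // 0 -> mRNA (volume dependent)
        case 1:  return k2 * xR;                   // mRNA -> mRNA + P
        case 2:  return k3 * xR;                   // mRNA -> 0
        default: return k4 * xP;                   // P -> 0
    }
}

int main() {
    double exit = 0.0;
    for (int j = 0; j < M; ++j) exit += alpha(j, 10.0, 100.0, 0.0);
    std::printf("exit rate of state (10,100) at t = 0: %g (v1 = (%d,%d))\n",
                exit, v[0][0], v[0][1]);
    return 0;
}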

The second system we consider is a gene regulatory network called the exclusive switch [11]. It consists of two genes with a common promotor region. Each of the two gene products P1 and P2 inhibits the expression of the other product if a molecule is bound to the promotor region. More precisely, if the promotor region is free, molecules of both types P1 and P2 are produced. If a molecule of type P1 is bound to the promotor region, only molecules of type P1 are produced. If a molecule of type P2 is bound to the promotor


region, only molecules of type P2 are produced. No other configuration of the promotor region exists. The probability distribution of the exclusive switch is bistable, which means that after a certain amount of time the probability mass concentrates on two distinct regions of the state space. The system has five chemical species (however, in the implementation it is easier to distinguish between the molecules with bound P1 and P2), of which two have an infinite range, namely P1 and P2. All the others can be either zero or one. Assume that the state variables are s = (sGe1, sGe2, sP1, sP2, sGe2.P1, sGe1.P2), where Ge1.P2 and Ge2.P1 denote the molecule with P2 and P1 attached to the promotor, respectively. We define the transition classes ηj = (Gj, uj, αj), j ∈ {1, . . . , 8}, as follows:

1. The first reaction Ge1 → Ge1 + P1 describes the production of P1, with G1 = {x ∈ N6 | xGe1 > 0}, u1(x) = x + e3, α1(x, t) = k1 · xGe1 = 0.05 · xGe1.

2. The second reaction Ge2 → Ge2 + P2 describes the production of P2, with G2 = {x ∈ N6 | xGe2 > 0}, u2(x) = x + e4, α2(x, t) = k1 · xGe2 = 0.05 · xGe2.

3. The third reaction P1 → ∅ describes the degradation of P1, with G3 = {x ∈ N6 | xP1 > 0}, u3(x) = x − e3, α3(x, t) = k2 · xP1 = 0.005 · xP1.

4. The fourth reaction P2 → ∅ describes the degradation of P2, with G4 = {x ∈ N6 | xP2 > 0}, u4(x) = x − e4, α4(x, t) = k2 · xP2 = 0.005 · xP2.

5. We model the binding of P2 to the promotor by Ge1 + Ge2 + P2 → Ge1.P2 + Ge2, with G5 = {x ∈ N6 | xGe1 = 1, xGe2 = 1, xP2 > 0}, u5(x) = x − e1 − e4 + e5, α5(x, t) = (k3 / V(t)) · xGe1 · xGe2 · xP2 = 0.1 · xGe1 · xGe2 · xP2 · 1/(1 + t/3600).

6. We model the binding of P1 to the promotor by Ge1 + Ge2 + P1 → Ge2.P1 + Ge1, with G6 = {x ∈ N6 | xGe1 = 1, xGe2 = 1, xP1 > 0}, u6(x) = x − e2 − e3 + e6, α6(x, t) = (k3 / V(t)) · xGe1 · xGe2 · xP1 = 0.1 · xGe1 · xGe2 · xP1 · 1/(1 + t/3600).

7. For the unbinding of P2 we define Ge1.P2 → Ge1 + P2, with G7 = {x ∈ N6 | xGe1.P2 > 0}, u7(x) = x + e1 + e4 − e5, α7(x, t) = k4 · xGe1.P2 = 0.005 · xGe1.P2.

8. For the unbinding of P1 we define Ge2.P1 → Ge2 + P1, with G8 = {x ∈ N6 | xGe2.P1 > 0}, u8(x) = x + e2 + e3 − e6, α8(x, t) = k4 · xGe2.P1 = 0.005 · xGe2.P1.

Here the vector ej is such that all its entries are zero except the j-th entry, which is one. Only the rate functions α5 and α6 are time-dependent (they refer to the binding of a protein to the promotor region). The reason is the following: if the cell volume grows, it becomes less likely that a protein molecule is located close to the promotor region. We started the system at time t = 0 in state s = (1, 1, 0, 0, 0, 0) with probability 1.

6.1 Experiments

Let us start by investigating the properties of the numerical algorithm for the first described system (gene expression). We need to analyse the influence of the parameters that have to be chosen. The most important ones are the DTMC probability threshold δ and the


right truncation point R = R∗ for the Poisson probabilities (R∗ is defined as in Section 5.2). We also consider two methods to find the state xmax ∈ It,R having the maximal exit rate. For the gene expression system the running time needed to cover the whole period of 3600 seconds (see Example 2) is large, so we restrict ourselves to the smaller time interval [0, T] = [0, 100].

Approximate Uniformization As described in Section 4.5, we cannot establish any bounds for the values of the transient probabilities p(t) when using the approximate uniformization method. It can, however, be used as a fast way to obtain an approximation for the given MPM. Let us show several examples of how it behaves; later on we stick to the bounding approach.

xmax search method    | R  | Total error    | Exec. time | Iterations in time | Total number of iterations
DTMC evolution method | 10 | 2.2013 · 10−2  | 80         | 144                | 1436
                      | 20 | 6.8257 · 10−4  | 3          | 17                 | 368
                      | 40 | 1.8377 · 10−7  | 1          | 7                  | 248
Engblom approximation | 10 | 1.7116 · 10−2  | 272        | 1299               | 13021
                      | 20 | 2.0376 · 10−3  | 5          | 51                 | 1071
                      | 40 | 4.4894 · 10−5  | 1          | 7                  | 346

Table 1: Approximate uniformization for gene expression, δ = 10−10

For the right truncation point we choose R = 10, 20, 40 and observe the total CTMC probability loss (column ”Total error”), the execution time in seconds (column ”Exec. time”), the number of iterations in time and the total number of iterations (columns ”Iterations in time” and ”Total number of iterations”). The total number of iterations is determined as Ni = H · R∗ + Rlast, where H denotes the number of subintervals in [0, T] and Rlast is the number of DTMC steps for the last interval [tH−1, tH] (it can be less than R∗). We stop the computation process when the total CTMC probability loss reaches 5%. The truncated state space contains about 5000 states in the considered cases. We can see that for the given system the DTMC evolution method is preferable both in terms of the execution time and the achieved error. In the case of the Engblom method we obtain an over-approximation of the state xmax. This results in large values of Λ on each subinterval (Λ determines the speed of the IPP). Typically for this system we obtain the values 0.6 ≤ Λ ≤ 3 when the DTMC evolution approximation is used, and in the case of the Engblom approximation we obtain 1 ≤ Λ ≤ 50. Another important issue is the connection between R and the length of each subinterval. Using a greater value for R we obtain bigger time steps for the gene expression system. Thus the number of iterations in time decreases when R increases. The same holds for the total number of iterations. The computational error decreases with increasing R for the given system and is greater in the case of the Engblom approximation. The reason for this is the over-approximation of


the state variables of xmax, which results in an over-approximation of Λ; thus the transition probabilities become small and fall below the threshold. The behaviour of the Engblom method is not so strictly determined and it requires more advanced techniques to obtain an accurate approximation. The behaviour of the DTMC evolution method for determining xmax is more predictable, as we do not have to make ”tries” that result in over-approximation.

Now let us investigate the behaviour of approximate uniformization for the lower DTMC probability threshold δ = 10−12 (Table 2):

xmax search method    | R  | Total error    | Exec. time | Iterations in time | Total number of iterations
DTMC evolution method | 10 | 9.7426 · 10−4  | 55         | 139                | 1388
                      | 20 | 2.16 · 10−7    | 14         | 17                 | 340
                      | 40 | 2.3057 · 10−9  | 1          | 7                  | 258
Engblom approximation | 10 | 1.5113 · 10−2  | 2868       | 759                | 7595
                      | 20 | 1.0316 · 10−5  | 3          | 36                 | 769
                      | 40 | 7.7645 · 10−8  | 1          | 6                  | 270

Table 2: Approximate uniformization for gene expression, δ = 10−12

In the case of the smaller probability threshold we obtain 0.6 ≤ Λ ≤ 4 for the values of the IPP rate when the DTMC evolution method is used. If the Engblom method is used for the approximation of xmax, we obtain 1 ≤ Λ ≤ 60. This behaviour is caused by the fact that more states are now considered to be significant, i.e. their DTMC probability lies above the threshold. This results in an increased size of the state space It,R (it has about 70000 states) and of the exit rate of xmax. For the same reason, the probability loss values get smaller for the same values of R and different δ. In general, the dependencies between the parameters are the same as before. The Engblom approximation is not efficient enough for the gene expression system since it yields larger values of the probability loss and running time for the same values of R. We can conclude that the use of approximate uniformization for the gene expression system is effective for large R and the DTMC evolution approximation method, since it provides a small total error (the system contains only one linear time-dependent reaction, which can be approximated well by its average value even for large time steps).

Another issue to check is the numerical quality of the approximation. To investigate this, we compute the solution of the chemical master equation (see Section A.2) describing the gene expression system using the MATLAB built-in ODE solver. We consider only the subspace of the whole state space I = {x | x.smRNA ≤ NmRNA, x.sP ≤ NP}, where NmRNA, NP are chosen to be small, e.g. 100. By such a truncation we lose some probability mass, but this loss is negligible for a time horizon T = 100. The code of the MATLAB procedure is given in Section B. We compare the probability values of a certain state to the solution given by approximate uniformization (see Figure 14):


Figure 14: Comparison of the transient probability values for the state x with x.s1 = 3 and x.s2 = 3. The brown line refers to the values obtained with the MATLAB ODE solver and the green one refers to the values computed using the approximate uniformization method.

The relative error for the state probability p(x) with x.s1 = 3 and x.s2 = 3 reaches 9.4 · 10−3 for the time instant t = 100. Approximately the same results are obtained for other states x of this MPM that have probability p(x) > 10−3. The quality is also illustrated by Figure 14. However, it indeed depends on the dynamical system itself, on the time-dependent rate functions and on the choice of parameters. As was said before, we cannot provide tight bounds for the transient probability distribution, so from now on we consider the bounding approach, which can do this. Also, we stick to discussing only the DTMC evolution method of determining xmax in the case of the gene expression system, as it has nicely predictable behaviour.

Bounding Approach We start again with the first introduced chemical reaction network (gene expression). Initially, let us check whether the probability values for a certain state fall into the bounds defined by the bounding approach. In Figure 15 we plotted the values obtained using approximate uniformization.


Figure 15: Comparison of the transient probability values for the state x with x.s1 = 4 and x.s2 = 0. Brown lines refer to the bounds obtained with the bounding approach and blue dots refer to the probability values computed using the approximate uniformization method.

In Figure 16 we compare our probability bounds with the values obtained using the MATLAB ODE solver.


Figure 16: Comparison of the transient probability values for the state x with x.s1 = 4 and x.s2 = 0. Brown lines refer to the bounds obtained with the bounding approach and green dots refer to the probability values computed using the MATLAB ODE solver.

We can see that in both cases the probability values fall into the computed bounds. The approximate uniformization method provides results of acceptable accuracy when the rates change only moderately in time, so that the transition probabilities can be approximated well using the average value on the time interval. However, its weak side is that we cannot establish tight error bounds. Now let us choose different values for the reaction rate constants (k1 = 0.1, k2 = 0.0021, k3 = 0.0043, k4 = 0.0002), R = 10 and δ = 10−10. We can see that the transient probability values given by the MATLAB ODE solver fall into the bounds given by our implementation for this modified gene expression model (see Figure 17).


Figure 17: Comparison of the transient probability values for the state x with x.s1 = 4 and x.s2 = 0 with alternative reaction rate constants. Brown lines refer to the bounds obtained with the bounding approach and green dots refer to the probability values computed using the MATLAB ODE solver.

Now we investigate the properties of the bounding approach w.r.t. the parameter values. The comparison is given in Table 3:

R  | Total error    | Exec. time | Iterations in time | Total number of iterations
10 | 6.0872 · 10−3  | 79         | 129                | 1286
20 | 6.5133 · 10−3  | 21         | 17                 | 336
40 | 1.8729 · 10−2  | 1          | 7                  | 247

Table 3: Bounding approach for gene expression, δ = 10−10

Let us discuss the results obtained using the bounding approach for the gene expression system. We can see that if the number of reaction occurrences per time interval R increases, the resulting error also increases. This is caused by the properties of the under-

55

Page 60: Uniformization for Time-Inhomogeneous Markov Population Models · PDF fileUniformization for Time-Inhomogeneous Markov Population Models ... 3.2 Uniformization for Time-Inhomogeneous

approximation of the transition probabilities. The approximation quality becomes worsefor increasing R since we take the minimal value of the transition probability for largertime interval. The range of IPP rate is 0.6 ≤ Λ ≤ 2.6. The truncated state spacecontains about 4000 states. It is less than in case of approximate uniformization since theless number of states have now significant probability mass (more transition probabilityvalues falls below the threshold). And we can also observe the straightforward connectionbetween the total number of iterations and the running time of the program.

Let us now use a smaller value for the threshold (δ = 10−12). The results are given in Table 4:

R     Total error      Exec. time    Iterations in time    Total number of iterations
10    5.3951 · 10−4    140           157                   1568
20    5.0959 · 10−3    14            18                    351
40    1.7951 · 10−2    1             7                     258

Table 4: Bounding approach for gene expression, δ = 10−12

As expected, we observe lower values of the probability loss, since a larger number of states in the truncated state space is considered significant. The range of the values of Λ is larger (0.6 ≤ Λ ≤ 3.2). The reason is that more states have a probability value above the threshold than in the case of δ = 10−10. For the gene expression system this means that the state xmax (which determines the overall rate) has larger values of the state variables; since the rate function Λ(t) is monotonically increasing in xR and xP, we obtain a larger value for the IPP rate. The other general trends are the same as for approximate uniformization.
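The role of the threshold δ can be sketched as follows: after a step, states whose probability falls below δ are no longer considered significant and their mass is accounted as probability loss. This is only a simplified MATLAB illustration of the truncation; the variable names are not taken from the implementation.

% p     : probability vector over the currently stored states
% delta : significance threshold (e.g. 1e-10 or 1e-12)
% loss  : probability loss accumulated over all previous steps
insignificant = p < delta;            % states falling below the threshold
loss = loss + sum(p(insignificant));  % their mass is counted as probability loss
p(insignificant) = 0;                 % ... and they are dropped from the truncated state space
% A similar idea applies when deciding whether to create a new state at all:
% if its probability inflow stays below the threshold, the state is not created.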

Now we turn to the second chemical reaction network presented earlier (exclusive switch) and consider the time horizon T = 3600. The main difference compared to the gene expression system is that this system is bi-stable and the truncated state space contains fewer states. It should be mentioned that the time-dependent rates of this dynamical system are monotonically decreasing functions of time. For this system we obtain the following results (Table 5):

R     Total error      Exec. time    Iterations in time    Total number of iterations
10    2.8053 · 10−2    1056          2153354               21533536
20    4.7767 · 10−2    3114          2491887               49837740
40    > 5 · 10−2       > 12243       1036826               >> 41473040

Table 5: Bounding approach for exclusive switch, δ = 10−10

We again observe the growth of the total computational error with increasing parameter R. However, in contrast to the gene expression reaction network, here we observe that both the number of iterations in time and the total number of iterations get larger for larger R. The reason for this behaviour is the following. Since the system has two regions in which the probability mass is concentrated, the DTMC evolution method always yields a large over-approximation of xmax. This results in larger values of Λ (556 ≤ Λ ≤ 17922), which means that for larger R we obtain a smaller time step size, and this increases the number of iterations. Note that this behaviour is caused both by the properties of the system and by the use of the DTMC evolution method. The truncated state space contains about 100 states. Also, a large number of states are not created (because their probability inflow is less than the threshold), which causes a large probability loss. For the smaller value of δ we obtain the results given in Table 6:

R     Total error      Exec. time    Iterations in time    Total number of iterations
10    9.924 · 10−4     1098          1876761               18767608
20    1.0892 · 10−2    18579         2533284               50665680
40    2.1193 · 10−2    195158        5507378               220295087

Table 6: Bounding approach for exclusive switch, δ = 10−12

For this smaller threshold we observe an increase of the running time and a decrease of the probability loss. The IPP rate Λ also becomes larger (571 ≤ Λ ≤ 19276). The truncated state space contains about 200 states.
We can also analyse the behaviour of the Engblom approximation for the exclusive switch reaction network. Computing over the whole interval [0, 3600] is infeasible when applying this technique with small values of R, so we compare results for T = 100, δ = 10−12 and R = 20, 40, 60. The results are given in Table 7:

R     Total error      Exec. time    Iterations in time    Total number of iterations
20    1.8201 · 10−5    7285          241108                4822182
40    4.587 · 10−5     1021          21875                 875048
60    1.3615 · 10−4    284           3639                  218344

Table 7: Bounding approach for exclusive switch, T = 100, Engblom approximation

We can observe that with increasing R both the execution time and the total number of iterations get smaller. This shows that the Engblom method provides an estimate based on the accumulated "statistics" of the probability distribution and can therefore give a better prediction of xmax. It should be noted that solving the ODEs for the Engblom method does not seriously influence the running time (it takes only about 8% of the total running time per iteration). The approximation is still of poor quality (1108 ≤ Λ ≤ 9628), but it is better than when the DTMC evolution is used, as can be seen from Table 8:

R     Total error      Exec. time    Iterations in time    Total number of iterations
20    7.0941 · 10−6    589           100071                2001420
40    9.4663 · 10−5    2706          189536                7581429
60    3.107 · 10−4     8761          297812                17868679

Table 8: Bounding approach for exclusive switch, T = 100, DTMC evolution approximation

The running times are much larger and the approximation of xmax is of extremely poor quality (2944 ≤ Λ ≤ 55038). This reveals the good behaviour of the Engblom method for the given system and small T. The results obtained with the DTMC evolution are to be expected, since it always determines xmax according to the same rule: it yields a state which lies R transitions "further", even if that state will not actually be reached after R transitions. This behaviour is shown in Figure 18.

Figure 18: Comparison of the Engblom and DTMC evolution approximations for the exclusive switch reaction network. The DTMC evolution gives an approximation which is not reachable in the next iteration in time. The Engblom method provides the better estimate.
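One plausible reading of this rule, given here only as an illustration (the exact procedure in the implementation may differ), is to push the current extremal state R transitions "further" along the direction of maximal growth:

% xCur : current extremal state (row vector of populations)
% V    : M-by-N matrix of change vectors, one row per reaction type
% R    : number of reaction occurrences per time interval
growth = max(max(V, 0), [], 1);   % largest per-species increase over all reaction types
xMax   = xCur + R * growth;       % a state "R transitions further", possibly unreachable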


Summarizing, we can say that the approximate uniformization technique can be used when one needs an approximation of the vector of state probabilities and error bounds do not play an important role. When the bounding method is used, we can control the achievable accuracy by adjusting the parameter R and we can compute tight bounds for the state probabilities. There is a trade-off between the total number of iterations and the achieved accuracy: if a rough approximation of the vector of state probabilities is sufficient, larger values of R are preferable, whereas better accuracy can be obtained with smaller values of R at the cost of a much longer execution time. We have also considered two different techniques for predicting the future behaviour of the biological system (i.e. approximating the state xmax which has the largest exit rate over all states in the state space). For systems like gene expression the DTMC evolution method is preferable since its behaviour is well-defined, while the behaviour of the Engblom method is harder to predict and depends strongly on the properties of the system. The considered case studies and experiments show that the given methods are computationally applicable to MPMs that describe reaction rate networks. They can also be applied to any MPM with time-dependent rate functions.


7 Conclusions

In this master's thesis we propose a method to compute the transient probability distribution for inhomogeneous Markov Population Models (MPMs) that may have an infinite state space. We combine techniques applied to homogeneous Markov chains with possibly infinite state space and to inhomogeneous Markov chains with finite state space. To obtain a unique solution we introduce a state space truncation method. It is possible to analyse systems with large state spaces since we never store matrices explicitly and we organise the computation around the notion of probability flow. Due to the presence of time-dependent rate functions in the model we address the problem of choosing the time step size. This task incorporates complex inter-dependencies, and we present an approach to control the computation with three basic parameters that have an intuitive meaning. Two different techniques to obtain the transient distribution are provided. The first approach (approximate uniformization) can be applied to get an overview of the system behaviour. The second approach (the bounding method) provides tight bounds for the probability distribution values and can be used for a detailed analysis. We show the feasibility of our approach by providing experimental results for two reaction rate networks: gene expression and exclusive switch.

7.1 Future work

We presented three approaches to predict the evolution of MPMs over a given time interval. A more precise analysis is needed to provide recommendations concerning the application of each technique. The first approach (full state space exploration) can be applied to any system regardless of the monotonicity properties of the rate functions α(x, t) w.r.t. the state variables of x, but it is computationally inefficient (it takes about 60% of the whole running time to determine the overall rate function, since this process is closely coupled with time-consuming memory allocation operations). An open problem is to modify it so that the computation becomes faster. The third proposed method (Engblom approximation) can also be treated in a more advanced way: instead of considering a one-dimensional normal distribution, we can use a multi-variate normal distribution and incorporate information on sample moments of higher order (third, fourth, etc.). One could also try to combine these three approaches into one efficient method for calculating xmax for any type of system.
We can also slightly improve the precision of the approach by implementing different numerical integration and differentiation techniques depending on the actual form of the time-dependent rate functions. The implementation can be improved by constructing more advanced data structures and by optimizing the memory usage. As mentioned in Section 4.4, the precise terms can be computed analytically for the case that exactly 3, 4, . . . events take place in the interval of interest; we can determine them and incorporate them into the implementation to obtain more accurate results for small values of R. We restricted the class of considered rate functions to polynomials of at most second order, but this class can be extended, and properties of the computational scheme such as stability can then be analysed. Both presented uniformization approaches rely on substituting the overall rate function Λ by a piecewise-constant approximation; more sophisticated approximation techniques (splines, for instance) could be applied to provide a better fit. Finally, the main theoretical problem is to prove that the given state space truncation method yields a strict lower bound for all kinds of considered dynamical systems. With such a proof we would be able to compute an estimate of the total error analytically.

References

[1] A. Arkin, J. Ross, and H. H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics, 149:1633–1648, 1998.

[2] M. Arns, P. Buchholz, and A. Panchenko. On the numerical analysis of inhomogeneous continuous time Markov chains. INFORMS Journal on Computing. To appear.

[3] P. Bremaud. Markov Chains. Springer, 1998.

[4] E. Cinlar. Introduction to Stochastic Processes. Prentice-Hall, 1975.

[5] F. Didier, T. A. Henzinger, M. Mateescu, and V. Wolf. Fast adaptive uniformization of the chemical master equation. In Proc. of HIBI, 2009. To appear.

[6] S. Engblom. Computing the moments of high dimensional solutions of the master equation. Appl. Math. Comput., 180:498–515, 2006.

[7] B. L. Fox and P. W. Glynn. Computing Poisson probabilities. Communications of the ACM, 31(4):440–445, 1988.

[8] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A, 188:404–425, 1992.

[9] H. Kitano. Foundations of Systems Biology. The MIT Press, 2001.

[10] A. M. Law and W. D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, 2000.

[11] A. Loinger, A. Lipshtat, N. Q. Balaban, and O. Biham. Stochastic simulations of genetic switch systems. Phys. Rev. E, 75(2):021904, 2007.

[12] W. J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1995.

[13] M. Thattai and A. van Oudenaarden. Intrinsic noise in gene regulatory networks. PNAS, USA, 98(15):8614–8619, July 2001.


[14] A. P. A. van Moorsel and K. Wolter. Numerical solution of non-homogeneous Markov processes through uniformization. In Proc. of the European Simulation Multiconference - Simulation, pages 710–717. SCS Europe, 1998.

[15] O. Wolkenhauer, M. Ullah, W. Kolch, and K. Cho. Modeling and simulation of intracellular dynamics: Choosing an appropriate framework. IEEE Transactions on NanoBioscience, 3(3):200–207, 2004.


A Stochastic Chemical Kinetics

Reaction rate networks are usually described using stoichiometric equations. Consider, for instance, the following equation:

A + B −→ C

The upper-case letters denote different types of molecules (chemical species). We refer to the chemical species on the left-hand side of the arrow as reactant species and to those on the right-hand side as product species. Stoichiometric equations specify which reactant species are required for the reaction to occur and which are the products of the reaction. In the above example, a molecule of type A and a molecule of type B form a molecule of type C. Note that reactions may require/produce more than one molecule of a certain type. In this case a stoichiometric coefficient is explicitly added; for instance, consider the reaction channel

2A −→ D

This equation describes a dimerization process. As mentioned, we consider only elementary reactions, which correspond to a single mechanistic step. In general, reactions may have intermediate products and parallel reaction pathways; however, they can always be decomposed into elementary reactions. Mathematical models of natural systems are usually idealized descriptions, since capturing the extreme complexity of the real system in full detail would make the model intractable. To simplify the construction and analysis of the models, the following assumptions are usually made:

• The temperature and pressure are fixed

• The mixture is well-stirred (spatially homogeneous), which means that the molecules are uniformly distributed over the reaction volume

Consider a reaction volume with chemical species S1, . . . , SN and chemical reaction types R1, . . . , RM. An ideal model of the time evolution of the system would track the exact positions and velocities of all molecules in the reaction volume; whenever molecules collide, chemical reactions may occur. Unfortunately, such a model is infeasible for nearly all systems. As a more abstract model, we assume that the positions and velocities are given by probability distributions. The stochastic phenomena and the use of probabilistic models were already motivated in the introductory section. Recall that for such systems the state is given by a vector of discrete values corresponding to the populations of the chemical species.
For the velocities of the molecules we assume that the reaction volume is in thermal equilibrium; the velocity of a single molecule is then given by the Maxwell-Boltzmann distribution. Since the positions and velocities of the individual molecules are not tracked explicitly, the number of molecules of each species is enough information to predict the future behaviour of the system. That is, given a vector (#S1, . . . , #SN) of populations at time instant t, we can determine the (probabilistic) evolution of the populations at time instant t + ∆. Under the given assumptions, molecules collide randomly and chemical reactions occur at random points in time. We can define a chance experiment whose outcomes are functions ω : R≥0 → ZN+ such that ω(t) = (x1, . . . , xN) is the population vector at time t ≥ 0. The state of the system at time t can then be represented by a discrete random vector

~Xt = (Xt,1, . . . , Xt,N )

where Xt,i, 1 ≤ i ≤ N, represents the number of molecules of species Si at time t. Note that t represents continuous time, so we have uncountably many random vectors defined on the same probability space. Such a collection ( ~Xt)t≥0 can be described as a stochastic process (see Section 2). We refer to the countable co-domain ZN+ = {0, 1, . . .}N as the state space. The functions t 7→ ~Xt(ω), ω ∈ Ω, are called trajectories. They are piecewise constant, and each jump corresponds to the occurrence of a chemical reaction. The state of the system at time t is then given by ~Xt(ω) = ω(t), that is, given an outcome ω, the random vector ~Xt projects ω onto the value of ω at time t.

A.1 Transition Probabilities

Assume that the process ~X is in state ~x ∈ ZN+ at time t. Then the process is in state ~y at time t + ∆ with probability

P( ~Xt+∆ = ~y | ~Xt = ~x ).

We call the above values transition probabilities. Since the populations of all chemical species in the reaction volume at time t are enough information to predict the future evolution of the system, we can exploit the Markov property, i.e. for all ∆ > 0, t ≥ 0, t0, t1, . . . , tn ∈ [0, t) with t0 < . . . < tn and ~x, ~x0, . . . , ~xn ∈ ZN+,

P( ~Xt+∆ = ~y | ~Xt = ~x, ~Xtn = ~xn, . . . , ~Xt0 = ~x0 ) = P( ~Xt+∆ = ~y | ~Xt = ~x ).

To determine the probability P( ~Xt = ~x) we need the initial distribution. We assume that P( ~X0 = ~x0) is given for all ~x0 ∈ ZN+. In this way we fix the distribution at time t = 0, which means that the distributions for all remaining time points can also be computed.
The effect of the occurrence of a chemical reaction of type Rj, 1 ≤ j ≤ M, is given by the change vector ~vj ∈ {−2,−1, 0, 1, 2}N of Rj. The i-th entry denotes the difference between the number of molecules of species Si gained by Rj and the number of molecules of species Si consumed by Rj. Thus, if ~x is the current state of the system and an instance of reaction Rj occurs, the next state is ~x + ~vj.
Now let us consider the possible reactant combinations for the different types of reaction channels. Assume that the current state of the system is given by ~x = (x1, . . . , xN). If the reaction is of the form Si → products, there are xi reactants in the system, which yields xi different possible instances of the reaction.


If the reaction is of the form Si + Sk → products, i ≠ k, there are xi · xk different ways to combine the reactants. In the case of 2Si → products, there are (xi choose 2) = 0.5 · xi · (xi − 1) possible combinations of two molecules of type Si. Finally, for ∅ → products, there is only one possible instance of the reaction.
Now consider an infinitesimal time interval [t, t + ∆), which means that the probability of more than one state change within [t, t + ∆) is negligible. The fundamental premise of stochastic chemical kinetics says that the probability that a reaction of type Rj occurs within the next ∆ time units is proportional to the product of the number of reactant combinations and the length ∆. This means that there exists a constant cj > 0 such that

P( ~Xt+∆ = ~x + ~vj | ~Xt = ~x ) = cj · Nj · ∆

where Nj denotes the number of reactant combinations for a reaction channel of type Rj. Thus, for a fixed combination of reactants, the infinitesimal transition probability is cj · ∆. The constant cj is called the stochastic reaction rate constant, and its existence is guaranteed by physical theory [8]. The constant cj depends on the properties of the reactant species and on the temperature and the volume, which may change in time. If we assume that the temperature and the volume of the system are fixed, then the reaction rate constants do not change in time; otherwise the time dependency is caused by the fact that the volume of the cell can be time-dependent, which indeed influences the value of cj.
Let αj be the function that computes the product cj · Nj and assume first that the cj are time-independent. If the state of the system is given by ~x = (x1, . . . , xN), then for a reaction Rj of the form:

• Si + Sk → products, i ≠ k, we have αj(~x) = cj · xi · xk

• Si → products, we have αj(~x) = cj · xi

• 2Si → products, we have αj(~x) = cj · xi · (xi − 1) · 0.5

• ∅ → products, we have αj(~x) = cj

If the constants cj are time-dependent as well, we have the following expressions for the functions αj describing the same reaction types Rj:

• Si + Sk → products, i ≠ k, we have αj(~x, t) = cj(t) · xi · xk = cj · (1/V(t)) · xi · xk

• Si → products, we have αj(~x, t) = cj(t) · xi = cj · xi

• 2Si → products, we have αj(~x, t) = cj(t) · xi · (xi − 1) · 0.5 = cj · (1/V(t)) · xi · (xi − 1) · 0.5

• ∅ → products, we have αj(~x, t) = cj(t) = cj · V(t)

Here cj denotes the constant part of cj(t). We can see that for the first and the third reaction types the reaction rate becomes smaller for an increasing cell volume V(t), since the larger the volume of the cell is, the less likely it is for molecules to collide and react. For the second reaction type there is no time-dependency under a variable volume V(t). For the fourth reaction type we obtain a larger value of cj(t) for a larger V(t); it is determined by processes in the cell which are not explicitly modelled in the given chemical reaction network. It should also be noted that the reaction rate constants can depend on the temperature as well, but we do not consider this kind of dependency in this thesis. More information is given by Gillespie in [8].
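These propensity forms translate directly into code. The following MATLAB sketch evaluates αj(~x, t) for a single reaction channel, given the constant part of its rate constant, its type and (where relevant) a cell-volume function V(t); the function name, the type labels and the argument layout are illustrative only and not part of the thesis implementation.

function a = propensity(c, rtype, x, i, k, V, t)
% c     : constant part of the stochastic reaction rate constant c_j
% rtype : 'bi' (Si + Sk ->), 'uni' (Si ->), 'dimer' (2 Si ->) or 'zero' (0 ->)
% x     : current population vector
% i, k  : indices of the reactant species (k is only used for 'bi')
% V     : handle to the cell-volume function V(t); use @(t) 1 for a constant volume
% t     : current time
switch rtype
    case 'bi'      % Si + Sk -> products, i ~= k
        a = c / V(t) * x(i) * x(k);
    case 'uni'     % Si -> products (no volume dependence)
        a = c * x(i);
    case 'dimer'   % 2 Si -> products
        a = c / V(t) * 0.5 * x(i) * (x(i) - 1);
    case 'zero'    % (empty set) -> products
        a = c * V(t);
end
end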

A.2 Chemical Master Equation

Let us determine the state probabilities for ~X:

P( ~Xt = ~x ) = ∑_{~y ∈ ZN+} P( ~Xt = ~x | ~X0 = ~y ) · P( ~X0 = ~y )

where the initial distribution P( ~X0 = ~y), ~y ∈ ZN+, is given and the propensity functions α1, . . . , αM are known. For an infinitesimal time step of length ∆, we have

P( ~Xt+∆ = ~x ) = P( ~Xt+∆ = ~x | ~Xt = ~x ) · P( ~Xt = ~x )
               + ∑_{j=1, ~x−~vj ≥ 0}^{M} P( ~Xt+∆ = ~x | ~Xt = ~x − ~vj ) · P( ~Xt = ~x − ~vj )

where the first term corresponds to the probability of staying in state ~x (no reaction occurred) and the second term corresponds to the probability of moving from state ~x − ~vj to ~x (reaction Rj occurred). Using the notation of the propensity functions we obtain the following:

P( ~Xt+∆ = ~x ) = ( 1 − ∑_{j=1}^{M} αj(~x) · ∆ ) · P( ~Xt = ~x )
               + ∑_{j=1, ~x−~vj ≥ 0}^{M} αj(~x − ~vj) · ∆ · P( ~Xt = ~x − ~vj )

Subtracting P( ~Xt = ~x ) from both sides, dividing by the infinitesimal ∆ and letting ∆ → 0, we obtain the derivative of the state probability:

d/dt P( ~Xt = ~x ) = lim_{∆→0} ( P( ~Xt+∆ = ~x ) − P( ~Xt = ~x ) ) / ∆

                  = − ∑_{j=1}^{M} αj(~x) · P( ~Xt = ~x ) + ∑_{j=1, ~x−~vj ≥ 0}^{M} αj(~x − ~vj) · P( ~Xt = ~x − ~vj )

This differential equation is called the Chemical Master Equation (CME). It describes a system of coupled ODEs, since it specifies the probability for every state ~x ∈ ZN+ and the probabilities of the states ~y = ~x − ~vj are needed to compute the right-hand side. Given an initial probability distribution, the solution of the CME consists of the probabilities P( ~Xt = ~x) for all states ~x. The intuitive meaning of the CME is that the derivative of the probability of state ~x is the difference between the probability inflow and the probability outflow. The states are seen as nodes in a flow network and their probability is the amount of fluid, which moves through the network according to the propensities.
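This flow interpretation can be turned into code almost literally: every reaction j contributes an outflow αj(~x) · p(~x) from state ~x and an inflow of the same amount into state ~x + ~vj. The following MATLAB sketch assembles the CME right-hand side on a finite list of states (a truncation of ZN+); all names are illustrative, and flow into states outside the list is simply dropped, as in a state space truncation.

function dp = cmeRhs(p, states, V, alpha)
% p      : probability vector, p(s) = P(X_t = states(s, :))
% states : S-by-N matrix with one enumerated state per row
% V      : M-by-N matrix of change vectors v_j
% alpha  : function handle, alpha(x, j) = propensity of reaction j in state x
S = size(states, 1);
M = size(V, 1);
dp = zeros(S, 1);
% map each state (as a string key) to its position in the enumeration
key = @(x) sprintf('%d,', x);
idx = containers.Map(cellfun(key, num2cell(states, 2), 'UniformOutput', false), ...
                     num2cell(1:S));
for s = 1:S
    x = states(s, :);
    for j = 1:M
        flow = alpha(x, j) * p(s);         % probability flow along reaction j
        dp(s) = dp(s) - flow;              % outflow from x
        y = x + V(j, :);                   % target state x + v_j
        if isKey(idx, key(y))              % inflow only if y lies inside the truncation
            dp(idx(key(y))) = dp(idx(key(y))) + flow;
        end
    end
end
end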


Figure 19: Inflow and outflow of probability.


B MATLAB Routines

B.1 Solution of CME for Gene Expression
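The original MATLAB listing is not reproduced here. As a substitute, the following is a minimal sketch of how the transient distribution of a gene expression model can be obtained with a standard MATLAB ODE solver on a truncated state space. The reaction set (∅ → mRNA, mRNA → mRNA + P, mRNA → ∅, P → ∅ with constants k1, . . . , k4), the species ordering (s1 = mRNA, s2 = protein), the initial state, the truncation bounds and the use of constant rates are assumptions made for illustration; the routine actually used for the experiments may differ (in particular, it handled time-dependent rates).

% Transient solution of the CME for a two-species gene expression model on the
% truncated state space {0..maxR} x {0..maxP}, solved with ode15s.
k1 = 0.1; k2 = 0.0021; k3 = 0.0043; k4 = 0.0002;  % alternative rate constants used for Figure 17
maxR = 20; maxP = 20;                             % truncation bounds (illustrative)
nP = maxP + 1; S = (maxR + 1) * nP;
idx = @(r, p) r * nP + p + 1;                     % linear index of state (r, p)

Q = sparse(S, S);                                 % generator matrix on the truncation
for r = 0:maxR
    for p = 0:maxP
        s = idx(r, p);
        rates   = [k1, k2 * r, k3 * r, k4 * p];   % transcription, translation, degradations
        targets = [r + 1, p; r, p + 1; r - 1, p; r, p - 1];
        for j = 1:4
            Q(s, s) = Q(s, s) - rates(j);         % outflow (also for flow leaving the truncation)
            rt = targets(j, 1); pt = targets(j, 2);
            if rt >= 0 && rt <= maxR && pt >= 0 && pt <= maxP
                Q(s, idx(rt, pt)) = Q(s, idx(rt, pt)) + rates(j);
            end
        end
    end
end

p0 = zeros(S, 1); p0(idx(0, 0)) = 1;              % assumed initial state: no mRNA, no protein
[~, P] = ode15s(@(t, y) Q' * y, [0 100], p0);     % dp/dt = Q^T p, up to T = 100
pEnd = P(end, :)';
fprintf('P(s1 = 4, s2 = 0) at T = 100: %g\n', pEnd(idx(4, 0)));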
