Deterministic models and stochastic simulations in multiple reaction

BioTechnologia vol. 92(3) C pp. 265-280 C 2011 Journal of Biotechnology, Computational Biology and Bionanotechnology REVIEW PAPER

Deterministic models and stochastic simulationsin multiple reaction models in systems biology

PAWEŁ LACHOR 1*, KRZYSZTOF PUSZYŃSKI 2, ANDRZEJ POLAŃSKI 1

1 Institute of Informatics, Silesian University of Technology, Gliwice, Poland 2 Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland

* Corresponding author: [email protected]

Abstract

In this paper, we have overviewed deterministic and stochastic approaches for the modeling of bio-molecularreactions in systems biology. We have described and compared different versions of stochastic simulationapproaches towards the modeling of bio-molecular reaction systems, the direct approach and the family of the tau-leap methods. We also have illustrated differences between different approaches by providing numerical examplesof computational analyses for selected models. Computational examples of application of stochastic simulationalgorithms involved two systems of different interacting bio-molecular species and one model from the area ofecology, describing interactions between two animal populations. Apart from the direct approach and tau-leapfamily of methods, we have also overviewed the so called hybrid algorithm of stochastic simulations, proven tobe very efficient when there are huge differences between reaction rates in the system.

Key words: stochastic simulation algorithms, direct algorithm, Tau-Leap approximated stochastic simulation algo-rithms, hybrid algorithm

Introduction

Models in systems biology, describing the dynamicsof interactions between bio-molecules, help to explainand understand many aspects of biological processesoccurring at the molecular level in cells of living orga-nisms. There are many classes of models, such as sig-naling pathways, gene regulation networks, transcriptionnetworks, metabolic networks, protein interaction net-works, etc., which describe dynamics of evolution andinteractions of populations of bio-molecular species.Numerous examples for models of bio-molecular systemshave previously been discussed in the literature (e.g. Hatet al., 2009).

Dynamics in models of evolution of bio-molecule po-pulations are often described by using the deterministic– limit approach based on systems of coupled first-orderordinary differential equations (ODEs) (Hat et al., 2009;Krishna et al., 2006). Deterministic approach is based onthe hypothesis that population sizes of interacting bio-molecules are large and as a consequence state vectorsof bio-molecular models can be defined by average popu-lation size of bio-molecular species (Butcher, 2003).There are very efficient methods for numerically solving

the systems of differential equations (Butcher, 2003).Deterministic models are therefore very easy to use.They also provide unique insights into bio-molecular in-teractions and help explain many mechanisms of evolu-tion in systems biology models.

However, it is well known that there are situationswhere limitation of the deterministic approach can ham-per capturing important properties of the system understudy. Most important limitations of deterministic ap-proach are as follows.1) Deterministic approach although efficient in termsof computational load, is not accurate for systems thatcontain low-rate reactions. These are most often relatedto species occurring in small molecular quantities.If a system contains bio-molecular species of low cellularconcentration (quantity) then representing this speciesin molecular reactions by its population mean can intro-duce either significant errors in model predictions orcan even change the model’s qualitative properties. 2) When the behavior of a bio-molecular system is stu-died for parameter ranges close to bifurcation points(which correspond to qualitative changes of systems dyna-mics) then, as mentioned above, limiting to average values

P. Lachor, K. Puszyński, A. Polański266

can cause overlooking important features of the systemdynamics. Also for systems whose evolution significantlydepends on (distribution) of initial conditions, such as bistable or multi stable systems, deterministic modelingmay not adequately describe the distributions of systemresponses.3) Deterministic approach ignores stochastic variationof time responses of the system under study. Investiga-ting stochastic variation of system trajectories is on onehand interesting by itself. On the other hand it is wellknown that stochasticity in systems biology models hasvery important biological/evolutionary meaning (Geva-Zatorsky et al., 2006) which provides another motivationfor its study.

Stochastic simulation approach is applied in order tostudy effects of random fluctuations in numbers of mole-cular species in systems biology models. Stochastic si-mulations algorithm (SSA) for interactions of bio-mole-cules was first proposed by Gillespie (Gillespie, 1976)for simple models of chemical reactions. Gillespie’s ideaof stochastic simulation is to record changes in bio-mole-cular species sizes vector over time, following from sub-sequent occurrences of different reactions. In each stepof the simulations, random draw of the reaction to occurnext is done on the basis of the value of reaction propen-sities. Gillespie’s paper led to the generation of a num-ber of subsequent literatures (e.g. Chatterjee et al.,2005; Cao et al., 2006). The original idea was developedby improving its computational efficiency and by fittingthe structure of the algorithm to the specific propertiesof the analyzed system.

This paper is aimed as a survey devoted to (i) des-cription of deterministic and stochastic approach tomodeling of bio-molecular systems, (ii) comparison bet-ween deterministic and stochastic approach, (iii) compa-rison and applicability study of different versions of sto-chastic simulations approach to modeling of bio-mole-cular systems. Survey papers devoted to variants of Gil-lespie algorithm have already appeared in the literature(Meng et al., 2004; Pineda-Krch, 2008). When comparedto the existing literature surveys, our paper is moregeneral in the following sense: (i) we illustrate differen-ces between different approaches by providing numeri-cal examples of computational analyses for selectedmodels; (ii) we include wider scope of variants of the Gil-lespie algorithm, namely apart from direct approach andtau-leap family of methods, we also overview and present

computational example for the so called hybrid algorithmof stochastic simulations (Haseltine and Rawlings, 2002),proven to be very efficient in situations where propen-sities of reactions in the studied system are of differentorders of magnitude.

The paper is structured as follows. In the sectionModels of interactions of bio-molecular species we pre-sent basic principles of bio-molecular interactions mode-ling. In the section Deterministic and stochastic simu-lation algorithms, overview of deterministic and sto-chastic methods in systems biology is presented. In thesection Computational complexity of stochastic simula-tion algorithms we discuss problem of computationalcomplexity of stochastic simulation algorithms. In thesection Computational examples we present exemplaryresults of applications of these methods, which providea base for quantitative comparisons of different appro-aches. Finally, in the Conclusion section, conclusionsfollowing from the whole study are summarized.

Models of interactions of bio-molecular species

Before we start a discussion on different approachesin simulating techniques, we introduce the notations tobe used in subsequent sections. We denote the reactionthat can occur between biomolecules by Rj where j isthe reaction number. The total number of reactions inthe model is denoted by M. Reactions between biomole-cules can lead to the disappearance of some types of bio-molecules and creation of others.

The history of occurrences of all reactions is summa-rized by actual numbers of all biomolecules in the sys-tem or their molar concentrations, called populationstate vector. Population state vector is denoted by x (t )= (x1(t ), ..., xN (t )) where N is total number of bio-mole-cular species.

Each reaction is characterized by its propensity co-efficient, which defines the probability of occurrence ofthe reaction. Propensity for each reaction Rj is denotedby aj (x ) and the sum of all propensities by a 0(x ). Eachreaction Rj is also characterized by a vector vj = (v1j , ...,vNj), called state change vector, whose component vij de-notes the change in the number molecules of the speciesintroduced by reaction Rj .

Reaction rate, Hill model, inhibition, stimulation

The law of mass action (Waage and Guldberg, 1864)derived by Gulberg and Waage is a basic mathematical

Deterministic models and stochastic simulations in multiple reaction models in systems biology 267

model of reactions in systems biology. According to thislaw, the probability of each reaction event is proportio-nal to the product of the concentration of participatingreactants. In the deterministic limit, rate of each elemen-tary reaction is proportional to the product of concentra-tions of the participating reactants. From the law ofmass action it follows that the increase in concentrationof the reagents will result in an increase of the reactionrate, while the decrease must result in the slowdown. Ifa reagent is “drained out”, the reaction does not occurapt all. Let us show an example of a reaction of two re-agents, A and B, which form a complex AB

(1)A B AB+ →

In accordance to the law of mass action, the rate ofthis reaction should be expressed as a product of con-centrations of reactants, denoted by [A ] and [B ], and thepropensity coefficient p,

(2)r p A B= × ×[ ] [ ]

Rates of all reactions contribute to balances of allmasses, which in the simple example stated above hasthe form

(3)d AB

dtr p A B[ ] [ ] [ ]= = × ×

The product structure given by the above-mentionedlaw of mass action is additionally modified by assumingthe possible character of the influence of a bio-moleculeon the reaction rate, stimulation or inhibition. Differenttypes of influence can be modeled by Hill equation withappropriate parameters, given below,

(4)r A Adt

Ak A

n

n n( ) [ ]=

+

Hill equation model contains two parameters. Theparameter k is interpreted as a dissociation constant or“half occupation” constant. Depending on the value ofthe parameter n in the exponent, called the Hill coeffi-cient, Hill equation can be a model of positive coopera-tivity (stimulation effect) of the reactant A in the re-action, when n takes positive values, n > 0, or negativecooperativity (inhibition effect), in the opposite case,where n < 0.

Differential equation models and lists of reactions models, SBML

Dynamics in systems biology models can be descri-bed by using different methods. The most common ap-proach is based on writing systems of coupled first orderdifferential equations describing concentrations for eachsingle species involved in the model by a separate equa-tion. Such a method allows the researcher to presenteven the considerably complex systems in a brief form.Constructing model with ODEs is usually associated witha deterministic solution, although there are complextools allowing the calculation of stochastic trajectories insuch models.

Across the literature, we may find many examples ofmodels of bio-molecular systems written in the form ofsets of differential equations. For illustration, we will fo-cus on the simple model described in (Hat et al., 2009).The simple “Toy” model introduced in this paper descri-bes p 53|Mdm 2 regulatory core of interactions of twoproteins, p 53 and Mdm 2. This model was also usedin further chapters for computational illustration of theproblem of efficiency of different stochastic approachesfor simulation in systems biology models. The “Toy”model consists of only three components (species), totalp 53, cytoplasmic Mdm 2 (Mdm 2c) and nuclear Mdm 2(Mdm 2n). The model (Hat et al., 2009) was describedusing ODEs. Three appropriate differential equationswere derived, each describing concentration over timeof one of species involved in the model (p 53, Mdm 2cand Mdm 2n). These equations are presented below,

(5)ddt

p m s k p Mdm nd53 53 21 12= ⋅ − ⋅ ⋅ ( )

(6)

ddt

Mdm c n s sp

s p

kk

k pMdm

253

53

53

2 3

3

43

12

2

= ⋅ + ⋅+

⎛

⎝⎜

⎞

⎠⎟ −

− ⋅+

⋅

( )( )

(7)

ddt

Mdm n kk

k pMdm c

k Mdm nd

253

2

2

12

2

2

=+

⋅

− ⋅

The above-mentioned model describes balances ofthree interacting proteins and includes reaction ratelaws provided by the law of mass action and Hill modelswith different types of interactions.


Another method to describe models in system bio-logy is based on listing all reactions and for each reac-tion defining appropriate rate laws. Listing all reactionsin the system is basically equivalent to writing system’sequations for balances of volumes of bio-molecular spe-cies. For the case of the “Toy” model of p 53 and Mdm 2interactions (Hat et al., 2009), the “list of reactions” mo-del will assume the following form,

(8)∅ → = ⋅p r m s53 1,

(9)p r k p Mdm nd53 53 212→ ∅ = ⋅ ⋅, [ ] [ ]

(10)∅ → = ⋅Mdm c r n s2 2,

(11)∅ ← = ⋅ ⋅+

Mdm c r n sp

s p2

53533

3

43,

[ ][ ]

(12)

Mdm c Mdm n

r kk

k pMdm c

2 2

5321

2

2

→

= ⋅+

⋅

,

[ ][ ]

(13)Mdm n r k Mdm nd2 22→ ∅ = ⋅, [ ]

In the context of the “list of reactions” models, it isworthwhile to mention the System Biology Markup Lang-uage (SBML) (http://sbml.org), a standard designed toease biological data exchange over different systems andtools. SBML standard is based on representing list of re-actions of the type (8)-(13) as .xml files and allows forfast and efficient exchange of biological data and modelsbetween a variety of software environments. Formula-ting a model, like reactions (8)-(13), in the SBML formatdoes not require any advanced knowledge of the XMLschema of the systems uses. There are interactive toolsthat help creating SBML models (Swainston and Men-des, 2009).

Graphical representation

The above-described methods, while good when pre-paring simulation experiments, may not be comprehen-sive enough to present/understand the mechanisms ofthe system, for a researcher. A valuable help in under-standing system’s functionalities is the use of graphicalrepresentation. Below, in Figure 1, a diagram represen-ting interactions in the p 53|Mdm 2 regulatory mecha-

Fig. 1. Diagram of p 53|Mdm 2 regulatory core

nism, modeled by equations (5)-(7), or alternatively bythe list of reactions (8)-(13), is presented. This diagramillustrates conventions commonly applied in graphicalrepresentations of system biology models.

From Figure 1, one can observe that in the studied sy-stem we can find two interlinked feedback loops, positiveand negative. Feedback structure includes p53, in the nu-cleus, blocking (inhibiting) nuclear import of Mdm2which in turn enhances (stimulates) p53 degradation inthe nucleus. Additionally, p53 stimulates production ofMdm2 in the cytoplasm.

Deterministic and stochastic simulation algorithms

In this section, algorithms for solving/simulating mo-dels of interacting bio-molecules are presented. As afore-mentioned, these algorithms can be generally dividedinto two categories, deterministic and stochastic.

Deterministic models

Deterministic methods for simulating the behaviorof a systems biology model involve obtaining numericalsolution to ODEs. Well-developed methods for integra-ting ODEs, which are based on recursive procedureswith adaptively adjusted step size, e.g., Runge-Kutta pro-cedures, are fast, reliable and applicable even to systemsof high order.

When a system is modeled by a set of ODEs, insightsto its properties can be gained by a study based on thesystem theory computational tools including computing


equilibrium points, verifying their stabilities, verifyingexistence of possible cycles (limit cycles), describingtypes of transient processes in the system, checking thedependence of the behavior of solutions on parameters,bifurcation points and their types.

Using the above described analytical as semi-analyti-cal tools is, however, substantially limited or impossiblein the context of stochastic modeling. The limitationsfollow from the fact that in the stochastic case determini-stic trajectories are replaced by families of probabilitydistributions. Researching these probability distributionsby stochastic sampling techniques was presented inthe next sections.

Stochastic models

In the deterministic modeling case, there is notmuch room for options regarding the quality of simula-tions, since in the widely used software packages theseoptions are often set automatically. In contrast, inthe stochastic case, there are several further decisionsfor a researcher to make, regarding the structure ofthe algorithm and the related accuracy of simulations.These decisions highly depend on the size of the re-searched system. Important problem to solve is oftenthe compromise between the time load of the computa-tional project and the accuracy of the expected results.Some of the problems related to stochastic modeling aswell as applicable approaches were also described inthe subsections below.

In 1976, Gillespie published his first paper (Gilles-pie, 1976) that described an algorithm for simulatingchemical (biochemical) reaction. Purpose of the stocha-stic simulation algorithm proposed by him was to recordchanges in the population vector over time. During eachstep of the Gillespie simulation procedure, we need todecide when and which reaction will occur as the so-onest. Over the time many variants of original algorithmwere proposed in an effort to improve computationalefficiency while trying to preserve the exactness.

Exact stochastic simulation algorithms

Exact stochastic methods are valid for small num-bers of bio-molecules in the system, as they account foreach reaction event in the system separately. Originally,Gillespie (1976) proposed two methods for exact simula-tion, First Reaction Method and the Direct Method.Additionally, we will describe the Next Reaction Methodproposed by Gibson and Bruck (2000) as an improved

version of the First Reaction Method. When consideringthe exact solution, we need to take into account that itmay slow down the calculation process significantly,especially for large systems.

First Reaction Method

Initialization. In the first reaction method, we startby initializing the number of reactions Rj , j = 1, ..., M,(where M is the total number of reactions) and con-centration of species in population vector x (t ) = (x1(t ),..., xN (t )) (where N is the total number of species) atinitial time.

In step 1 of the first reaction method, for each re-action Rj we draw random numbers rj from the uniformdistribution and calculate the propensity functions aj (x ).

Step 2. Based on the randomly drawn numbers rj,we compute the “putative” reaction time tj for eachreaction Rj, using the following formula:

(14)ta x

r j Mjj

j= − ⋅ =1 1( )

ln ( ) , ... ,

A reaction which eventually occurs in the simulatedsystem is the reaction Rμ, whose index is defined by

(15)t t tmμ = min ( , ... , )1

Step 3. For the given R μ we proceed to recordingthe time of this reaction by adding the calculated timestep to the current time counter.

(16)t t t= + μ

Next, we proceed to implementing the changes tothe population state vector x (t ) introduced by occurren-ce of the selected reaction.

(17)x x v= + μ

If the current time t of experiment does not exceedthe total time of simulation experiment, we proceed tothe step 1 and follow the subsequent steps of the algo-rithm, otherwise we terminate the procedure.

The First Reaction Method is an exact method forcomputing stochastic realizations related to system’sevolution. However, we need to take into considerationthe fact that in each iteration, we must draw M randomnumbers. In addition, we calculate M times the loga-rithm to compute the putative reaction times. Therefore,for larger systems consisting of various reactions andspecies, decision of using the First Reaction Method can


result in a significant slowdown of the computationaltime.

Direct Method

In the Direct Method, proposed by Gillespie, the num-ber of necessary draws of random numbers is reduced.This algorithm can be divided into four major steps.

Initialization. We initialize reactions Rj , j = 1, ..., M(where M is the total number of reactions) and con-centrations of species in population vector x (t ) = (x1(t ),..., xN (t )) (where N is the total number of species) atinitial time.

Step 1. We draw two random numbers r1 and r2 fromthe uniform distribution. Then we calculate the propen-sity functions aj (x ) for each reaction Rj . After that wesum all propensities and we denote the sum by a0(x ).

(18)a x a xj0 ( ) ( )= ∑Step 2. In this step, we determine which Rj will occur

next and the time instant of its occurrence. Using the sumof propensities a0(x ) we calculate the time step T.

(19)Ta x

r= − ⋅1

01( )

ln( )

The smallest integer j satisfying

a x r a xii

j

( ) ( )=∑⎛⎝⎜

⎞⎠⎟ >

12 0

is the index of the next reaction to be executed. Step 3. In last step, we compute the time of the next

reaction occurrence, t = t + T and execute the selectedreaction R j by modifying the population state vector x (t )by adding the vector of state change vj for the reaction R j.

(20)x x v j= +

If the current time of experiment does not exceedthe total time, we go to the step 1 and follow the sub-sequent steps of the algorithm, otherwise the algorithmterminates.

The drawback of the Direct Method is that when weconsider a system consisting of (very) rare and frequentreactions at the same time, the Direct Method, due to fi-nite precision of random number generators, may leadto “overlooking” of all (or most) of rare events. This pro-blem does not appear in the First Reaction Methodwhich is, however, computationally less efficient as al-ready mentioned.

Next Reaction Method

Gibson and Bruck (Gibson and Bruck, 2000) intro-duced an adaptation of the First Reaction Method impro-ved in terms of its computational complexity. Improve-ments were achieved through the re-use of already calcu-lated propensities. The number of draws of random num-bers per step was reduced to one. The idea is based onintroduction of the priority queue containing the sortedputative reaction times. Instead of using the relative pu-tative time, in this case we use their absolute values.

Initialization. Analogous to the previously discus-sed methods, we initialize reactions Rj, j = 1, ..., M(where M is the total number of reactions) and con-centrations of species in the population vector x (t )= (x1(t ), ..., xN (t )) (where N is the total number ofspecies) at the initial time. Next, the priority queue iscreated. In our case, we assume that the queue has theform of the ascending sorted list. The list includes thepop method which simultaneously removes and returnsits first element. Next, for each reaction Rj we calculatethe propensity functions aj (x ) using the random numberrj drawn from the uniform distribution, we determinethe reaction time tj by using the formula,

(21)ta x

r t j Mjj

j= − ⋅ + =1 10( )ln ( ) , ... ,

Finally, we add it to the priority queue. After thisstep, the initialization is over and we can proceed toStep 1.

Step 1. The first element, t μ, from the priorityqueue indicates the time at which the first reaction, R μ,occurs. We pop this element from the priority queue,proceed to time t = t μ and we implement changes to thepopulation state vector introduced by this reaction.

(22)x x v= + μ

Step 2. In this step, we focus our attention on thereactions whose propensities were affected due to theexecution of the Step 1. For each reaction, Ri, from theaffected pool we:C compute the new propensity and store it in a tempo-

rary container atemp,C re-calculate the reaction time and permit update on

the priority queue,C store the new value of propensity ai (x ) = atemp


(23)( )ta xa

t t tii

tempi=

⎛

⎝⎜⎜

⎞

⎠⎟⎟ ⋅ − +

( )

Once all affected reactions have been updated, wemove to Step 3 and proceed by calculating a new re-action time for the reaction R μ. To do so, similar to Ini-tialization step, we draw a random number r μ fromthe uniform distribution and compute new occurrencetime of the reaction R μ as the exponentially distributedrandom number with mean aμ (x ),

(24)ta x

r tμμ

μ= ⋅ +1( )

ln ( )

We store the new reaction time t: by pushing it tothe priority queue.

If the current time of experiment does not exceed thetotal time, we proceed to the step 1 and follow the sub-sequent steps of the algorithm, otherwise we terminate.

The above-stated algorithm combines advantages ofthe two previous ones. On one hand, it does not overlookrare reactions, and on the other, by using the queuemechanism, computational efficiency is improved. Thismethod performs very well, especially in systems consis-ting of many reacting species and reactions.

Approximate stochastic simulation algorithms

Algorithms described in previous subsections arecalled exact stochastic simulation algorithms, due tothe fact that for each reaction, its exact time instant isseparately drawn by using appropriate pseudo randomgenerator. However, execution of these algorithms maybe quite time consuming. In order to achieve better com-putational efficiency, several approximated algorithmshave been proposed. These approximated methods, alsocalled accelerated methods, were also presented in thissubsection.

In order to achieve the acceleration, these methodssacrifice the accuracy of the solution for its better timeefficiency. In the majority of approaches, several reac-tions occur simultaneously in the coarse-grained time in-crements. In all accelerated methods, the following LeapCondition must be satisfied.

(25)( )a x t t tj ( ) [ , ]≈ +const for τ

In other words, each time step (leap) τ must be smallenough, so the changes in the propensities have to beinsignificant.

Basic τ -Leap Method

This method was originally proposed by Gillespie asan approximate, accelerated procedure for simulatingoccurrences of reactions in stochastic models (Gillespie,2001). The idea is to divide the time scale into steps(τ-leaps) and to allow for several firings of each reactionat each time step. In order to successfully apply τ-lea-ping, we need to use the largest possible value of τ sa-tisfying the Leap Condition. If τ is set to too small va-lues, we may slow down the algorithm execution, to 1 oreven 0 firings per step, which will result in the loss ofnumerical efficiency.

One way of checking for the best τ is the post leapcheck of differences in propensities for each reaction

(26)a x a x j Mj j( ) ( ) , ... ,+ − =λ 1

and then adjusting the leap according to the value of thisdifference by increasing or decreasing τ. The post leapcheck gives a reliable condition for τ, but clearly thismethod might be too time consuming due to the neces-sity of repeated checking of the post-leap condition.

In his paper Gillespie (Gillespie, 2001) describesalso a pre leap check for best fit of τ. To compute timestep, by using the pre-leap check, we define auxiliaryvariables, bij (x ) and ξ(x ), as follows.

(27)b xa x

xj M i Nji

j

i

( )( )

, ... , ; , ... ,= = =∂

∂1 1

(28)ξ ( ) ( )x a x vj jj

M

==∑

1

To predict best τ we assume some specified fractionε (0 < ε < 1) as a boundary and we choose minimumvalue of τ satisfying the equation below,

(29)τε

ξ=

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

∈=∑

min( )

( ) ( )[ , ]j Mjii

N

a x

x b x1

0

1

In our further review of this algorithm, we can as-sume that the appropriate value of τ has been estimated.For the description of the further part of the algorithm,it does not make a difference, as to which method of τestimation was applied. With this assumption, we pre-sent the structure (stated below) of the τ-leap algorithm.


Initialization. We proceed by initializing reactionsRj, j = 1, ..., M (where M is the total number of re-actions) and concentrations of species in the populationvector x (t ) = (x1(t ), ..., xN (t )) (where N is the totalnumber of species) at the initial time.

Step 1. We continue by calculating propensities foreach reaction. Using λ = aj (x ) τ as the mean value forreaction R j we compute the number of its firings k j bya random draw from the Poisson distribution with inten-sity λ,

(30)k en

nj

n

= =−λ λ

!, , , ...0 1 2

Step 2. We increment the time, t = t + τ and applychanges introduced by each Rj reaction to the populationstate vector x (t ), additionally taking into accountthe number of firings of Rj,

(31)x x k vj jj

M

= +=∑

1

If the current time of the experiment does not ex-ceed the total time, we go to Step 1 and follow the sub-sequent steps of the algorithm, otherwise we terminate.

The advantage of the above-described algorithm isits simplicity. However, this algorithm lacks the coordi-nation of reacting species during single step, which canlead to a situation where population size of a species canbecome negative. Therefore, further improvements ofthis method have been proposed, which we describebelow.

Chatterjee τ-Leap Method

The previously described basic approach was bur-dened with the risk of reaching negative population size,due to the lack of coordination between reacting species.The lack of coordination means that a situation can hap-pen, where the number of firings of a reaction is greaterthan the number of available reactants. Chatterjee andcoworkers in their work (Chatterjee et al., 2005) des-cribe a method which allows for the avoidance of thiscase. To coordinate reactions and ensure mass conserva-tion of the entire reactions network, two mechanismsare introduced. The first mechanism involves kj

C --a maximum number of firings of a given reaction Rj, up-dated constantly between each subsequent firing duringthe τ interval. The second mechanism is the additional

vector x-(t ) introduced to track the currently availablereacting population size during the τ time step. Beforethe execution of any firing, the currently available re-acting population size is set to x-(t ) = x (t ). Each changein the population size introduced by subsequent firingsof reaction Rj is tracked and applied to vector x-(t ). Thestructure of the Chatterjee τ -Leap method is presentedbelow.

Initialization. We proceed by initializing reactionsRj, j = 1, ..., M (where M is the total number of re-actions) and concentrations of species in populationvector x (t ) = (x1(t ), ..., xN (t )) (where N is the totalnumber of species) at initial time.

Step 1. We continue with calculating propensitiesfor each reaction and set the initial value of the popula-tion size change vector.

(32)~( ) ( )x t x t=

Then we compute the actual time step τ as the reci-procal of a0(x ) (previously defined as the sum of all pro-pensities) multiplied by a coarse-graining factor f > 1,

(33)τ =f

a x0 ( )

After calculating the time leap, we proceed to Step 2(for each reaction Rj ).

Step 2. C we compute the value of k j

C, a maximal number ofpossible firings for reaction Rj

(34)k vxv

j i N iji

ij

•

== <

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

⎛

⎝

⎜⎜

⎞

⎠

⎟⎟min ( )

~,...,1

0

C using random number generator based on binomialdistribution, B (k j

C, p ) ((k jC) is the number of experi-

ments and p is the probability of the success) wedraw the number of firings kj,

(35)k B k p pa x

kj jj

j

= =••( , ),

( )where

τ

C we apply changes to the x-(t ) vector for all vij < 0,

(36)~ ~x x v ki i ij j= +

Step 3. We update the time of simulation (t = t + τ )and apply the changes to the population state vectorx (t ) for each reaction Rj,


(37)x x k vj jj

M

= +=∑

1

If the current time of experiment does not exceedthe total time, we proceed to the Step 1 and followthe subsequent steps of the algorithm, otherwise we ter-minate the algorithm.

Cao τ-Leap Method

Cao and coworkers in their work (Cao et al., 2006)propose an optimized approach for tau-leap methods.They introduce a partition of reactions into two sets, theset of critical reactions Jc and the set of noncritical re-actions Jnc. As the critical reaction, they define each re-action that within a specified number of firings nc is de-pleting the number of any of the reactants. The purposefor such partitioning is the fact that the negative popula-tion typically arises from multiple firings of reactionsthat are close to consume all its reactants. To preventthe occurrence of such a scenario, the algorithm allowsfor only one firing from the critical subset during thespecified time leap. Cao’s method is described in detailbelow.

Initialization. We proceed by initializing reactionsRj, j = 1, ..., M (where M is the total number of re-actions) and concentrations of species in population vec-tor x (t ) = (x1(t ), ..., xN (t )) (where N is the total num-ber of species) at the initial time.

Step 1. We start with partition of all reactions intotwo groups, critical Jc and noncritical Jnc by checkingwhich reaction is about to deplete its reactants inthe range defined by the value of nc using the condition,

(38)min( )

[ , ]i N

i

ijc

x tv

n∈

⎛

⎝⎜⎜

⎞

⎠⎟⎟

<1

We continue by calculating the time leap candidateTnc for the subset of noncritical reactions.

(39)( )T a x

ax t Tnc j

jnc= + ≤

⎛

⎝⎜⎜

⎞

⎠⎟⎟

< ≤

max ( ) ( ) ε

εwhere 0 1

If the value of the calculated candidate is less thansome small multiple (usually 10) of 1/a0(x ) we abort exe-cution of following steps and for specified number oftimes (usually 100) we execute single-reaction steps of

the SSA and return to Step 1. Otherwise if the value ofthe candidate is greater, we proceed to Step 2.

Step 2. Here, we start with calculation of the sum ac

of all propensities over members of the critical subset ofreactions,

(40)a a xc jj Jc

=∈∑ ( )

We generate the second time leap candidate Tc (forcritical subset) as a random draw of the exponential ran-dom variable with mean value λ = 1/ac (x ),

(41)T Expc = ( )λ

Step 3. From the above, two candidates for timeleap τ we choose the smaller one and treat it as an actualtime leap. In the case where Tc < Tnc single critical re-action is executed. Reaction is chosen by a randomsample from the subset Jc. We calculate number of fi-rings kj for reactions from the subset of non-critical re-actions Jnc by using Poisson distribution with intensitygiven by the mean value aj (x ), j 0 JnC.

Step 4. We update the time of simulation (t = t + τ)and apply the changes in the population state vectorx (t ) for each reaction Rj from the noncritical subset,

(42)x x k v j Jj jj

M

nC= + ∈=∑

1

where

If the current time of experiment does not exceedthe total time, we go to Step 1 and follow the subsequentsteps of the algorithm, otherwise we terminate the algo-rithm.

Hybrid stochastic simulation algorithms

A group of the stochastic simulation algorithms, cal-led hybrid stochastic simulation algorithms were deve-loped for bio-molecular systems that include reactionswith very wide scope of values of propensities of re-action. The researchers propose that in such a case, fastreaction can be solved by using differential equations(deterministic modeling), while for slow reactions thestochastic simulations machinery is used. This approachis presented in this subsection.

Haseltine and Rawlings (Haseltine and Rawlings,2002) propose a modification of the standard GillespieAlgorithm resulting in a hybrid method based on partial-ly solving models with the use of differential equations


and partially in a stochastic manner. By applying the ap-propriate partitioning condition, we split all reactions totwo groups, of slow and fast reactions. Clearly, asthe propensity greatly relies on population size, the par-tition threshold is defined by the number of bio-particles.Fast reactions are commonly defined as such, for whichwe talk about hundreds of thousands of reacting bio-particles. Such reactions occur very frequently duringsimulation (time leap between occurrence is very small),which allows for approximation of the process by usinga system of coupled first-order ODEs guaranteeingsufficient precision. In contrast, for slow reactions, thenumber of bio-particles is assumed to be of the order ofhundreds or less. These reactions are simulated by usingstochastic random generators. One can notice that thereis an area in between which is difficult to classify aseither of the types of reactions.

Based on the idea of partitioning reactions into fastand slow, Haseltine and Rawlings proposed the followinghybrid algorithm.

Initialization. As mentioned before, we proceed bydividing the reactions into two groups. We describe fastreactions separately by using ODEs. After that we initia-lize slow reactions Rj, j = 1, ..., M (where M is the totalnumber of slow reactions) and concentration of speciesin population vector x (t ) = (x1(t ), ..., xN (t )) (where Nis the total number of species) at initial time.

Step 1. We compute the propensities for all slow re-actions and sum them at given time t.

(43)( ) ( )a x t a x tj0 ( ) ( )= ∑

We proceed by drawing two random numbers r1 andr2 from the uniform distribution.

Step 2. We compute deterministic changes by sol-ving differential equations to the time t + Δt for whichthe equation stated below is true:

(44)( )ln ( ) ( )r a x t dtt

t t

1 0 0+ =+

∫Δ

Step 3. We check which slow reaction R μ will occurbetween t and t + Δt and we update properly populationstate vector x (t ),

(45)a t t r a t t a t tjj

jj

( ) ( ) ( )+ ≤ + < +=

−

=∑ ∑Δ Δ Δ2 0

1

1

1

μ μ

If the current time of experiment does not exceedthe total time, we proceed to the Step 1 and followthe subsequent steps of the algorithm, otherwise we ter-minate the algorithm.

Computational complexity of stochastic simulation algorithms

When describing computational algorithms, it is verydesirable to include estimations concerning their compu-tational complexity as functions of parameters descri-bing the size of the analyzed problem. In this section, weprovide some comments on this issue.

All stochastic simulation algorithms which we havedescribed in previous sections have simple logical struc-ture. Diagrams of their algorithms, presented in chapter“Deterministic and stochastic simulation algorithms”contain only one loop structure. In conclusion, in orderto estimate computational complexity of each algorithm,one has to estimate (i) computational complexity of onepass of the loop and (ii) the number of passes ofthe loop.

As for (i), computational operations inside the bodyof the loop include, generation of random numbers, se-lection of one reaction (several reactions) from all pos-sible reactions, operations related to updating of the po-pulation state vectors. These three elements can gene-rally have different computational load, but the third one(updating of the population state vector) is usually negli-gible compared to the first two. In different methodsthe computational load of the first two operations candiffer. For example the computational load of the one passof the body of the loop in the Direct Method is estimatedas O (M ) (Mauch and Stalzer, 2011; Schulze, 2002).

As for (ii), it is rather impossible to estimate, beforeperforming some experiments with the system, as tohow many loop executions will the simulation problemrequire.

When estimating time complexity of approximatealgorithm versus exact algorithm, it is possible to pro-pose some rough estimates by calculating the number offirings of reaction inside one leap.

Summarizing the above-mentioned comments, wefound basing the estimates of computational complexityon experimental setup as most practical. When reportingcomputational results, we have added data on the time


of computations and on the number of steps. This makescomparisons between different methods possible.

Computational examples

In this section, we illustrate methods presentedabove by computational examples. We focus our atten-tion on three different models. The first one is a pre-viously described model explaining p53|Mdm 2 regulato-ry mechanisms described in (Hat et al., 2009). For thismodel, we compared the Direct Method with two diffe-rent tau-leap algorithms. The second model describesinfection of a cell by a virus. For this viral infection mo-del, we additionally performed comparisons of the hybridstochastic simulation method to the Direct Method andtau-leap family methods. The last model, in contrast toprevious examples concerning bio-molecules, describesa behavior of an ecosystem, more precisely interactionsof two populations, prey and predator. For this Prey-Predator model, we compared efficiencies of Cao andChatterjee implementations of the tau-leap method.

In the computational examples, which are mentionedbelow, we present results obtained in the following sce-nario: first we perform numerous random simulations ofevolution of the studies systems, according to the assu-med method and with the chosen parameters. On the ba-sis of performed simulations, we then compare differentmethods by comparisons of values obtained by (i) avera-ging across multiple simulation experiments, (ii) compa-risons of standard deviations obtained from multiplesimulations experiments, (iii) comparisons of frequen-cies of occurrences of reactions in different stochasticsimulation algorithms. In this approach, we consideredDirect Method as the “golden standard”, i.e., accuraciesof different stochastic simulation methods were mea-sured by the distance of the obtained realizations to therealization of the stochastic evolution process obtainedby using the Direct Method. Apart from the above-statedcomparisons, we have also compared time efficiencies ofdifferent methods.

Model of p53|Mdm2 signaling pathway

The model of interactions of protein molecules p 53and Mdm 2, is described by the set of first order differen-tial equations (5)-(7), or equivalently by the list of reac-tions (8)-(13). For this model, we have performed com-putational experiments using parameters specified in thepaper (Hat et al., 2009) presented in Table 1.

Table 1. Parameters used in experimentfor simulation of model of p 53|Mdm 2

m 2

n 6

s 1 16

s 2 8

s 3 80

s 4 1 @ 105

k 1 3.5 @ 10!3

k 2 2300

kd 1 1 @ 10!13

kd 2 3 @ 10!3

For the Model of p53|Mdm2 signaling pathway, wehave performed stochastic simulation experiments by exe-cuting 3000 separate trials per each of the studied me-thods, “Direct Method”, “Tau-Leap Chatterjee” and “Tau-Leap Cao”. For both approximate tau-leap methods, their“run time” parameters were adjusted experimentally,such that accuracy was optimized without significant lossin terms of time efficiency. Due to the interlinked positiveand negative feedback loops, we observed oscillations inconcentration of each of the protein species. Oscillatorybehavior of time plots in Figure 2 has biological explana-tion of directing properly cellular response to DNA da-mage presented in detail in (Hat et al., 2009).

Reviewing the results depicted graphically in Fi-gure 2, we noticed that concentrations, particularly forcytoplasmic Mdm2, oscillate around 3 million bio-mole-cules. At such high level of concentrations, reactionsoccur at a very fast rate. For such systems, mean valuesobtained from multiple trials practically do not differfrom the deterministic solution. We noticed that the lossof accuracy is insignificant in this case. This property isvisible in the graph presenting time plots of signals foreach method drawn in Figure 2, where only with large“zoom in” we could observe differences between avera-ges. The results for Cao tau-leap method and the DirectMethod were almost overlapping, while the Chatterjeetau-leap method provided results with a (slightly) highererror rate. For both approximated methods, the accu-racy loss in terms of computational efficiency is insigni-ficant.

However, with the high accuracy of computationalresults obtained by all of the applied methods, we obser-


Fig. 2. Chart averaged realizations of p 53|Mdm 2 model using exact algorithm (red) and two approximated tau-leap methods:Cao (green) and Chatterjee (blue). In upper-right corner, we see a large “zoom-in” where we can observe differences between

averaged results

Table 2. Summary of averaged resultsfrom 3000 realizations per single method

AlgorithmDirect

MethodCao

Tau-LeapChatterjeeTau-Leap

Parameters epsilon = 0.5 f = 2000

Average time (s) 13.2728 0.1534 0.157

Average numberof steps per trial

33498160 8855.87 16745.01

ved significant differences in their time efficiencies. Tocalculate single trial with exact stochastic method,around 13 seconds were required, while for both appro-ximate methods simulation takes around 150 millise-conds. These results are summarized in Table 2, whereapart from mean times of simulations, mean numbers ofsteps per one experiment for each of the methods is alsoreported.

Viral infection model

The second model describes a simple network ofa viral infection of a cell (Haseltine and Rawlings, 2002).The model consists of three species: viral structuralprotein (S) (48) and two copies of viral nucleic acids,template (T) (46) and genomic (G) (47). The determini-stic dynamics of interactions of these species is descri-bed by the following system of differential equations:

(46)dTdt

k G k T= ⋅ − ⋅1 2

(47)dGdt

k T k G k G S= ⋅ − ⋅ − ⋅ ⋅2 1 4

(48)dSdt

k T k s k G S= ⋅ − ⋅ − ⋅ ⋅5 6 4

Initial values of reactants are as follows: T = 1, G = 1,S = 1. These values are understood as numbers of inter-acting molecules. As for parameters, their values wereobtained from the paper (Haseltine and Rawlings, 2002)and are reprinted in Table 3.

Table 3. Parameters used in experimentfor simulation of model of viral infection

Parameter Value

k 1 0.025

k 2 0.25

k 3 1

k 4 7.5 @ 10!6

k 5 1000

k 6 1.99


We used this model to present the effectiveness ofstochastic simulation algorithms with regard to the ideaof the hybrid method. Reactions described in model arepartioned into two sets, slow (treated in a stochasticway). As slow reactions in equations (46)-(48), we treatall reactions influencing directly concentration of viralnucleic acids. This means that evolutions described byequations (46) and (47) are modeled by using stochasticsimulation algorithms. In contrast, evolution governedby equation (48) describing viral structural protein wascalculated using the deterministic approach.

The total time of single experiment was set to 200days. For each applied method, we executed 5000 ex-periments and generated mean results. Below we pre-sent outcomes for 4 different methods as graphical plotsof signals, along with the table presenting calculationtime for each method.

As in the previous example, in this case, exact sto-chastic method is rather inefficient in terms of the timeconsumption. Single experiment is computed for ap-proximately 6.5 seconds. Both approximate tau-leap me-thods proved that the model can be calculated in moreefficient manner while still preserving accuracy of the re-sults. We noticed that results obtained for Chatterjeealgorithm are more accurate (Fig. 3) in comparison toCao method in terms of outcome of exact solution. Inaddition, we found performance increase in the calcula-tion time. Still partitioning of the model into two subsetsof slow and fast reactions allowed us for further optimi-zation. For hybrid method, we observed that the resultsobtained by us are almost overlapping with the exactalgorithm.

Prey-Predator

(49)dNdt

rN Nk

a NN

P= ⋅ −⎛⎝⎜

⎞⎠⎟

−+

⋅11 ω

(50)dPdt

P c aNN

g= ⋅ ⋅+

−⎛

⎝⎜

⎞

⎠⎟

1 ω

Although our previous examples focused around mo-dels describing behavior of bio-molecules, in the last ex-ample, we performed a comparison of approximated me-thods in the Prey-predator model describing an eco-sys-

Table 4. Summary of averaged resultsfrom 5000 realizations of per single method

Method ParametersTotaltime(s)

Averagetime(s)

Direct Method 32410 6.482

Cao Tau-Leap epsilon = 0.9 3347 0.669

Chatterjee Tau-Leap f = 1500 216 0.043

Haseltine--Rawlings

deterministic step = 1stochastic step = 0.01 669 0.134

tem (Pineda-Krch, 2008). The model is given by two dif-ferential equations describing rates of change of sizes ofpopulations of prey’s (N ) (49) and predator’s (P ) (50).Due to its construction this model presents additionalpossibilities in analysis of accuracies of the approximatestochastic algorithms versus accurate ones. We noticedthat the prey population is bound by the value of para-meter k (49), also we could see that predator’s birth anddeath rates are dependent on the current predator’s den-sity. In the case of the deterministic realization forthe chosen parameters, we observe un damped oscilla-tions for both of the signals, but in the case of stochasticmethods there is a possibility of occurrence of a situ-ation in which all predators become extinct. This leadsto the state where predators population remains emptyand preys density gradually stabilizes at a fixed valuedetermined by the parameter k. In stochastic simula-tions, we observed the occurrence of extinction of pre-dators with probability one, provided that simulationtime was long enough. Averaging over many stochasticsimulations led to interesting results, depending on thesimulation method used, the pattern of “averaged”oscillations changed. The model is an interestingexample of the analysis of accuracy of approximated me-thods, as we can compare different modes of oscillationssuppression.

To perform the experiment, we collected data from5000 separate trials of each method. The total simula-tion time per a single trial was set to 100 time units.Parameters and initial condition values can be found inTable 5.

In Table 6, we have reported results of two differenttau-leap approximate methods. We noticed that resultsfor both Cao (Cao et al., 2006) and Chatterjee (Chatter-


Fig. 3. Time plots of concentrations of protein and viral species obtained by using different stochastic simulation methods(Viral infection model)

Fig. 4. Plot for prey population – different realizations


Table 5. Parameters used in experimentfor simulation of prey-predator model

N 1000

P 100

r 1

k 1000

a 0.007

ω 0.0035

c 2

g 2

Table 6. Summary of results for different methods. In columnA, we present average computation time for single simulation.In column B, we report averaged number of steps, plus stan-dard deviation for those numbers in column C. In the last co-lumn D, we report information on reaction averaged differen-ces of number of occurrences of reactions between the ap-

proximate and exact method of stochastic simulation

Method Parameter A B C D

DirectMethod

1.96 286334 58%

Cao epsilon = 0.2 0.55 38663 37% 8%

Cao epsilon = 1 0.1 3829 336% 32%

Chatterjee f = 15 0.39 20450 29% 7%

Chatterjee f = 23 0.23 11336 17% 33%

jee et al., 2005) methods strongly depend on the choiceof their “run time” parameter, both in terms of timeconsumption and quality of calculated results. We alsonoticed that average time needed to perform a singletrial using exact stochastic method took around 2seconds while for both approximate methods it oscillatedaround 500 milliseconds. With the increase in the valueof parameter epsilon in the Cao method we observedreduction in the calculation time but also great loss ofexactness which can be seen in Figure 4. The samesituation was observed for Chatterjee method when weincreased the value of the parameter f. Additionally,Table 6 reports information about average number ofsteps and standard deviation (between different simula-tions) for those numbers. We have also collected infor-mation on the average differences of the number of re-action occurrence for each of the approximate methodscompared to the realization obtained by using the exactmethod. In both cases where we sacrifice exactness toincrease calculation performance (changes in method

call parameters), we notice (in Fig. 4) the lack of oscilla-tion suppression especially in the case of Cao implemen-tation of approximate simulations. For this experiment,best results were those obtained for Cao algorithm.

Conclusion

There is a great interest in using stochastic algo-rithms for simulations in system biology models. How-ever, several problems, particularly those concerning com-promises between computational efficiency and exactnessof results, have been reported in the previously publishedliterature.

In this paper, we have surveyed existing methodo-logies for simulating systems of interacting bio-mole-cules. We have discussed their advantages and draw-backs and we have demonstrated computational exam-ples of their applications.

Our computational examples, developed for rathersimple systems of interacting biomolecules, confirm thatthe proper choice of the stochastic simulation method aswell as proper tuning of the parameters of the method,may be very important for the obtained results.

Implementing different methods of stochastic simu-lations and comparing their outcomes seems like a goodstrategy. Distributions of simulation errors can be obtai-ned by repeated multiple simulations. Comparisons ofoutcomes of different methods can provide knowledgeon the exactness of solutions and on the reliability ofobtained results.

Acknowledgments

This work was supported by the European Community fromthe European Social Fund (Paweł Lachor UDA-POKL.04.01.01-00-106/09-00) and by The Foundation for Polish Science(Krzysztof Puszyński N N518 287540).

References

Butcher J.C. (2003) Numerical methods for ordinary differen-tial equations, John Wiley and Sons.

Cao Y., Gillespie D., Petzold L. (2006) Efficient StepsizeSelection for the Tau-Leaping Method. J. Chem. Phys.124: 044109.

Chatterjee A., Vlachos D.G., Katsoulakis M.A. (2005) Binomialdistribution based τ-leap accelerated stochastic simulation.J. Chem. Phys., 122: 024112.

Geva-Zatorsky N., Rosenfeld N., Itzkovitz S., Milo R., Sigal A.,Dekel E., Yarnitzky T., Liron Y., Polak P., Lahav G., et al.(2006) Oscillations and variability in the p53 system. Mol.Syst. Biol. 2: 0033.


Gibson M.A., Bruck J. (2000) Efficient exact stochastic simu-lation of chemical systems with many species and manychannels. J. Phys. Chem. 104: 1876-1889.

Gillespie D. (1976) A general method for numerically simula-ting the stochastic time evolution of coupled chemical re-actions. J. Comput. Phys. 22: 403-434.

Gillespie D. (2001) Approximate accelerated stochastic simu-lation of chemically reacting systems. J. Chem. Phys. 115:1716.

Haseltine E.L., Rawlings J.B. (2002) Approximate simulationof coupled fast and slow reactions for stochastic chemicalkinetics. J. Chem. Phys. 117: 6959-6969.

Hat B., Puszyński K., Lipniacki T. (2009) Exploring mecha-nisms of oscillations in p53 and NF-kappa B systems. IETSys. Biol. 3: 342-355.

Krishna S., Jensen M., Sneppen K. (2006) Minimal model of spi-ky oscillations in NF-κB signaling. PNAS 104: 29-10845.

Mauch S., Stalzer M. (2011) Efficient formulations for exactstochastic simulation of chemical systems. IEEE/ACMTrans. Comput. Biol. Bioinform. 8(1).

Meng T.C., Somani S., Dhar P. (2004) Modeling and simula-tion of biological systems with stochasticity. Silico Biol. 4:0024.

Pineda-Krch M. (2008) Implementing the stochastic simula-tion algorithm in R using the GillespieSSA package.J. Stat. Software 25(12): 1-18.

Schulze T.P. (2002) Kinetic Monte Carlo simulations withminimal searching. Phys. Rev. E 65: 036704.

Swainston N., Mendes P. (2009) libAnnotationSBML: a libraryfor exploiting SBML annotations. Bioinformatics 25(17):2292-2293.

Waage P. Guldberg C.M. (1864) Videnskabs-Selskabet i Chris-tiana. Forhandlinger, 35.

Documents

Deterministic models and stochastic simulations in multiple reaction