HIERATIC: Hierarchical Analysis of Complex Dynamical Systems

Deliverable: D5.1
Title: New Aggregation Strategies Identified and Implemented in PRISM
Authors: Chris Good, Nishanthan Kamaleson, David Parker, Mate Puljiz, Jonathan E. Rowe, Chunyan Mu and Peter Dittrich
Date: 7 November 2015


New Aggregation Strategies Identified and Implemented in PRISM: Overview

Chris Good, Nishanthan Kamaleson, David Parker, Mate Puljiz, Jonathan E. Rowe, Chunyan Mu and Peter Dittrich

WP5 builds upon theoretical results and algorithms from WP1 in order to develop efficient aggregation strategies for PRISM, a tool for verification and prediction of probabilistic systems. PRISM performs probabilistic model checking, a formal approach to the analysis of probabilistic systems such as Markov chains and Markov decision processes. In the context of the probabilistic verification methods used by PRISM, coarse graining (or aggregation) is usually referred to as bisimulation, and the process of constructing the smallest possible such reduction is called bisimulation minimisation.

The first report ("Infinite- and Finite-Horizon Bisimulation Minimisation in PRISM") investigates the practical applicability of the notion of finite-horizon bisimulation minimisation outlined in WP3. It begins by implementing and comparing existing algorithms for full bisimulation minimisation of Markov chains in PRISM. It then adapts these to form an implementation of the finite-horizon variant. First, prototype implementations of the model reduction process are developed and evaluated on a set of standard benchmarks. Then, to avoid the bottleneck of first constructing the full, unreduced Markov chain, an on-the-fly approach is proposed, based on a backwards exploration of the model directly from its high-level behavioural description. Two distinct implementations are investigated, using symbolic and explicit representations of state classes. With these, substantial reductions in model size are demonstrated, along with gains in efficiency and scalability with respect to PRISM.

The second report ("Verification of Markov Decision Processes using Learning Algorithms") also aims to construct a reduced version of a probabilistic model such that a particular property of interest can be analysed (approximately) upon it. The reduction process differs from the one above: this approach constructs a fragment of the complete model, based on the generation of sample execution paths through the model. Since it is based on simulation, it avoids the bottleneck, mentioned above, of first constructing the full, unreduced model. During the process, lower and upper bounds are maintained on the property of interest, so that the procedure can be terminated when the answers obtained reach a pre-specified level of accuracy. The techniques are implemented in an extension of PRISM and demonstrate considerable performance improvements on several benchmark models.

The third report ("Analysing Reaction Networks using Chemical Organisation Theory and Probabilistic Model Checking") begins to develop links between WP5 and WP4. Again working in the context of the PRISM verification tool, the report investigates the applicability of probabilistic model checking and PRISM to the analysis of reaction networks, a widely used formalism for modelling chemical phenomena. In particular, it uses these techniques to analyse models in the context of chemical organisation theory, which provides a way to analyse complex dynamical networks by decomposing them into organisations. Algorithms are developed to identify the underlying structure of the models, which is then analysed to determine quantitative properties about the likelihood and timing of evolution between organisations. This paves the way for future work on coarse-grained model analysis via an organisation-based abstraction.


Infinite- and Finite-Horizon Bisimulation Minimisation in PRISM

Chris Good¹, Nishanthan Kamaleson², David Parker², Mate Puljiz¹, and Jonathan E. Rowe²

¹ School of Mathematics, University of Birmingham
² School of Computer Science, University of Birmingham

Abstract. We investigate a variety of approaches for bisimulation minimisation of discrete-time Markov chains and their implementation in the probabilistic verification tool PRISM. In particular, we develop techniques for applying finite-horizon bisimulation minimisation, which preserves properties of the model over a finite time window, permitting a more aggressive model reduction to be performed than with classical techniques. We investigate both symbolic and explicit-state implementations of these techniques, based on SMT solvers and hash functions, respectively, and illustrate that finite-horizon reduction can provide large reductions in model size, in some cases outperforming PRISM's existing efficient implementations of probabilistic verification.

1 Introduction

Probabilistic verification is an automated technique for the formal analysis of quantitative properties of systems that exhibit stochastic behaviour. A probabilistic model, such as a Markov chain or a Markov decision process, is systematically constructed and then analysed against properties expressed in a formal specification language such as temporal logic. Mature tools for probabilistic verification such as PRISM [15] and MRMC [13] have been developed, and the techniques have been applied to a wide range of application domains, from biological and chemical reaction networks [12] to car airbag controllers [1].

A constant challenge in this area is the issue of scalability: probabilistic models, which are explored and constructed in an exhaustive fashion, are typically huge for real-life systems, limiting the applicability of the techniques. A wide range of approaches have been proposed to reduce the size of these models. One of the most widely used is probabilistic bisimulation [17], an equivalence relation over the states of a probabilistic model which can be used to construct a smaller quotient model that is equivalent to the original one (in the sense that it preserves key properties of interest). Typically, it preserves infinite-horizon (long-run) properties, e.g., "the probability of reaching a state from set A", finite-horizon (transient) properties, e.g., "the probability of reaching a state from set A within k steps", and, more generally, any property expressible in an appropriate temporal logic such as PCTL [11].


In this report, we consider a variety of bisimulation minimisation techniques for Markov chains in the context of the PRISM verification tool. Our starting point is an investigation into the effectiveness of several existing approaches, all based on iterative partition refinement, for bisimulation minimisation. We then adapt these into a finite-horizon bisimulation minimisation algorithm, which executes just k partition refinements in order to generate a partially reduced Markov chain which preserves finite-horizon (transient) properties up to time horizon k. We illustrate, with a preliminary implementation of the minimisation procedure, applied to a set of standard Markov chain benchmarks, the potential reductions in model size that can be achieved with a finite-horizon approach. Then, to make the approach practically applicable, we develop on-the-fly approaches to minimisation which construct the reduced model directly from a high-level modelling language description. We develop two implementations, one symbolic, based on SMT solvers, and one explicit-state, using hash functions, and use these to show that finite-horizon reduction can indeed provide large reductions in model size, in some cases more efficiently than the existing efficient implementations in PRISM.

2 Preliminaries

2.1 Discrete-time Markov Chains

A (labelled) discrete-time Markov chain (DTMC) is a state transition system where transitions between states are annotated with probabilities.

Definition 1 (DTMC). A DTMC is a tuple D = (S, s_init, P, L), where:

– S is a finite set of states;
– s_init ∈ S is the initial state;
– P : S × S → [0, 1] is the transition probability matrix, where ∑_{s_j ∈ S} P(s_i, s_j) = 1 for all s_i ∈ S;
– L : S → 2^AP is a labelling function which labels each state s ∈ S with the atomic propositions from a set AP that are true in s.

For each pair s_i, s_j of states, P(s_i, s_j) represents the probability of going from s_i to s_j. If there is no outgoing transition from s_i to s_j, then P(s_i, s_j) = 0. In the case where P(s_i, s_j) > 0, s_i is a predecessor of s_j and s_j is a successor of s_i. A state s ∈ S is called absorbing when P(s, s) = 1.
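As an illustration (not part of the report), Definition 1 can be sketched directly in Python; the concrete three-state chain below is a made-up example, with a sparse dictionary standing in for the matrix P:

```python
# A minimal DTMC sketch following Definition 1: states S, an initial state,
# a transition probability matrix P (rows sum to 1) stored sparsely, and a
# labelling function L. The chain itself is a hypothetical example.

S = [0, 1, 2]
s_init = 0
P = {
    (0, 0): 0.5, (0, 1): 0.5,   # state 0 loops or moves to state 1
    (1, 2): 1.0,                # state 1 moves to state 2
    (2, 2): 1.0,                # state 2 is absorbing: P(s, s) = 1
}
L = {0: frozenset(), 1: frozenset({"warn"}), 2: frozenset({"fail"})}

def row_sum(s):
    """Sum of outgoing probabilities of s; must equal 1 in a valid DTMC."""
    return sum(p for (si, sj), p in P.items() if si == s)

def is_absorbing(s):
    """A state is absorbing when P(s, s) = 1."""
    return P.get((s, s), 0.0) == 1.0

assert all(abs(row_sum(s) - 1.0) < 1e-9 for s in S)
print(is_absorbing(2))  # True
```

The row-sum check mirrors the stochasticity condition ∑_{s_j} P(s_i, s_j) = 1 from the definition.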

A path σ of a DTMC D is a finite or infinite sequence of states

σ = s_0 s_1 s_2 s_3 s_4 …

such that ∀i ≥ 0, s_i ∈ S and P(s_i, s_{i+1}) > 0. The i-th state of the path σ is denoted by σ[i] and the length of the path, i.e. the number of transitions in the path σ, is denoted by |σ|. Let Path^D(s) denote the set of paths of D that begin in s. To reason formally about the behaviour of a DTMC, we define a probability space Pr_s over the set of infinite paths Path^D(s) [14].


2.2 Probabilistic Computation Tree Logic

Properties of probabilistic models can be expressed using Probabilistic Computation Tree Logic (PCTL) [11], which extends Computation Tree Logic (CTL) with time and probabilities. In PCTL, state formulae are interpreted over states of a DTMC and path formulae are interpreted over paths in a DTMC.

Definition 2 (PCTL). The syntax of PCTL is inductively defined as follows:

– true is a state formula;
– every atomic proposition a ∈ AP is a state formula;
– if φ and ψ are state formulae, then so are ¬φ and φ ∧ ψ;
– if ϕ is a path formula, then P_⋈p[ϕ] is a state formula, where ⋈ ∈ {<, ≤, ≥, >} and 0 ≤ p ≤ 1;
– if φ is a state formula, then Xφ is a path formula;
– if φ and ψ are state formulae, then φ U^≤k ψ is a path formula, where k ∈ ℕ ∪ {∞}.

The next operator is denoted by X and the bounded until operator is written U^≤k. In the case where k equals ∞, the bounded until operator becomes the unbounded until operator and is denoted by U. If state formula φ is satisfied in the next state, then Xφ is true. If ψ becomes true within k time steps and φ is true until the point where ψ becomes true, then φ U^≤k ψ is true. The probabilistic operator P_⋈p[ϕ] means that the probability measure of paths that satisfy ϕ is within the bound ⋈ p.

Definition 3 (PCTL semantics). Let D = (S, s_init, P, L) be a labelled DTMC. The satisfaction relation ⊨_D for PCTL formulae on D is defined by:

– s ⊨_D true, for all s ∈ S;
– s ⊨_D a iff a ∈ L(s);
– s ⊨_D ¬φ iff s ⊭_D φ;
– s ⊨_D φ ∧ ψ iff s ⊨_D φ and s ⊨_D ψ;
– s ⊨_D P_⋈p[ϕ] iff Pr_s{σ ∈ Path^D(s) | σ ⊨_D ϕ} ⋈ p;
– σ ⊨_D Xφ iff σ[1] ⊨_D φ;
– σ ⊨_D φ U^≤k ψ iff ∃i ∈ ℕ. (i ≤ k ∧ σ[i] ⊨_D ψ ∧ (∀j. 0 ≤ j < i. σ[j] ⊨_D φ)).

For example, a PCTL formula such as P_<0.01[¬fail1 U^≤k fail2] means that the probability of a failure of type 2 occurring before a failure of type 1 does is less than 0.01. We also often use the derived operators Fφ ≡ true U φ, which means that φ eventually becomes true, and F^≤k φ ≡ true U^≤k φ, which means that φ becomes true within k steps. We also often write properties of the form P=?[ϕ], which ask "what is the probability of ϕ being true?" from a state.


2.3 Probabilistic Bisimulation

A bisimulation is an equivalence relation between states in a model (e.g., a labelled transition system). States are equivalent under bisimulation only when they have identical labels and their outgoing transitions to other classes of states are the same. While initially applied only to non-probabilistic systems, this notion was later adapted for probabilistic systems as well. Larsen and Skou [17] defined (strong) probabilistic bisimulation for discrete probabilistic transition systems. Probabilistic bisimulation is an equivalence relation where any two related states have the same labels and the same probability of making a transition to any equivalence class.

Definition 4 (Probabilistic bisimulation). Let D = (S, s_init, P, L) be a DTMC and R an equivalence relation on S. R is a (strong) probabilistic bisimulation on D if, for all (s_i, s_j) ∈ R:

L(s_i) = L(s_j)  and  P(s_i, C) = P(s_j, C)  ∀ C ∈ S/R,

where P(s, C) = ∑_{s_i ∈ C} P(s, s_i) for any C ⊆ S. States s_i, s_j are strongly bisimilar if there exists a bisimulation R on D that contains (s_i, s_j).

Two states that are probabilistically bisimilar will satisfy the same properties, including both infinite-horizon (long-run) and finite-horizon (transient) properties. Aziz et al. [4] proved that any property in the temporal logic PCTL is also preserved in this manner. Thanks to these results, the analysis of the original Markov chain, such as probabilistic model checking of PCTL, can be equivalently performed on the quotient Markov chain, in which equivalence classes of bisimilar states are lumped together into a single state.

Usually, we are interested in the coarsest possible probabilistic bisimulation for a DTMC D (or, in other words, the union of all possible bisimulation relations). We denote the coarsest possible probabilistic bisimulation by ∼. Hence, we will use the quotient model D/∼ derived using this relation.

Definition 5 (Quotient DTMC). Given a DTMC D = (S, s_init, P, L), the quotient DTMC is the DTMC D/∼ = (S′, s′_init, P′, L′), where:

– S′ = S/∼ = {[s]_∼ | s ∈ S}
– s′_init = [s_init]_∼
– P′([s]_∼, [s′]_∼) = P(s, [s′]_∼)
– L′([s]_∼) = L(s)

3 Bisimulation Minimisation

3.1 Overview

The process of constructing the quotient model (corresponding to the coarsestpossible probabilistic bisimulation relation) for a Markov chain is referred to asbisimulation minimisation. In other domains, it is also known either as “lumping”


or "aggregation". Most algorithms that perform bisimulation minimisation are based on the classic partition refinement algorithm of Paige and Tarjan [18], which starts with a coarse initial partition and then repeatedly splits blocks of the partition until the required relation is found.

Derisavi et al. [10] have proved that the quotient of a finite Markov chain can be constructed in O(m log n) time by using statically optimal trees (e.g., splay trees [19]), for processes with n observable states and m transitions. They showed that using other balanced binary search trees results in a worst-case running time of O(m log² n). When splay trees are used, on the other hand, the static optimality property allows a factor of O(log n) to be removed from the time complexity of the previously obtained results.

Derisavi has implemented two variants of the bisimulation minimisation algorithm; one of them uses splay trees while the other uses red-black trees to represent sub-block trees. Although the splay tree variant is proven to be theoretically faster [10], the results of the experiments in [8] show that, in practice and for virtually all cases, the red-black tree variant is 10% faster compared to the splay tree variant. The algorithm from [10], with O(m log n) time complexity, is the fastest known algorithm. The authors also prove a lower bound of O(m + n log n) on the running time of any state-level lumping algorithm. There is a noticeable gap between these two time complexities.

Derisavi et al. conjecture that the time complexity O(m log n) could be achieved using a simpler solution than splay trees. In other words, the proposed algorithm for Markov chain lumping might benefit from using an efficient sorting algorithm for weights. In [20], Valmari and Franceschinis present an algorithm that sorts the weights with a combination of the so-called possible majority candidate algorithm and any O(k log k) sorting algorithm, where k is the number of items to be sorted. They also point out an essential issue in the description of the algorithm presented in [10]: if a block is used as a splitter, and is then itself split into sub-blocks, then it is enough to use all of them as potential splitters except for one (the largest block need not be used). In the case that the main block is not a splitter, every resulting block must be used as a splitter. The MRMC model checker [13] implements the time-optimal partition refinement algorithm presented in [10]. In that implementation, the splay tree is replaced with a heapsort data structure, which gives approximately the same performance as the splay tree implementation.

3.2 Minimisation Algorithms

We now describe some of the minimisation algorithms in more detail and, subsequently, investigate their performance.

Splitter-based bisimulation. The algorithm of Derisavi et al. [10] is actually presented for continuous-time Markov chains (CTMCs), but also applies directly to DTMCs. The underlying functionality of this algorithm is based on the notion


of splitting. Let Π be a partition of the original state space S. Each block B ∈ Π consists of a finite number of states. Any block that has the ability to split the blocks in the partition Π is called a splitter B_splitter ∈ Π. If there are s_i, s_j ∈ B such that:

P(s_i, B_splitter) ≠ P(s_j, B_splitter)

then the block B must be split into sub-blocks B_0, B_1, ..., B_n until the following condition is satisfied:

P(s_i, B_splitter) = P(s_j, B_splitter)  ∀ s_i, s_j ∈ B_k, for 0 ≤ k ≤ n

As mentioned above, splay trees are used in [10] to achieve the time complexity of O(m log n). This efficient time complexity has been achieved with the help of the static optimality property of splay trees, i.e. if a certain element is accessed from the splay tree, then that element will be placed at the root by splaying.

A partition of the original state space is represented by an integer array where the indexes of the array represent concrete states and the values represent the current location, i.e. the index of the new block, of each concrete state in the minimised state space. An array allows a certain element to be accessed by index in constant time. Each block has a boolean flag and a splay tree. The boolean flag in a block denotes whether the block is a potential splitter or not. The splay tree in the block is used during the splitting process to sort the states residing in the block. This splay tree is destroyed after the splitting process to free the memory space.

This lumping algorithm has three main phases:

1. Construction of the initial partition;
2. Iterative splitting process;
3. Construction of the minimised model.

During the first phase, the whole state space is partitioned so that every state in an equivalence block has the same combination of atomic propositions. Therefore, every block B in the initial partition Π_init will satisfy

L(s_i) = L(s_j)  ∀ s_i, s_j ∈ B

The splitting process is the key part of this algorithm. As the first step of this procedure, all the states that have a transition to B_splitter are identified and the transition probability is stored in a double array, where each index represents a state s_i ∈ S and the corresponding value denotes

∑_{s ∈ B_splitter} P(s_i, s)

Afterwards, each s_i with P(s_i, B_splitter) > 0 is stored in a linked list L. This linked list L is iterated through to identify the blocks of each state. During each iteration, the corresponding state is removed from its old block and


added to the splay tree. Each node in this splay tree is a key-value pair, where the key denotes P(s_i, B_splitter) and the value is a pointer to a sub-block of the original block. Once the iteration through the linked list L is completed, all constructed sub-blocks are added as potential splitters to the list, under one condition: if the parent block is not a potential splitter, the largest sub-block is not chosen as a potential splitter; otherwise, all the sub-blocks are considered as potential splitters. At the end of the split process, all created sub-block trees are destroyed and the old partition is updated with the new sub-blocks. Once there is no further possible refinement, a quotient model is constructed using the current partition and the lifted distribution.

Signature-based bisimulation. An alternative approach to bisimulation minimisation is to use a so-called signature-based approach [9]. The basic structure of the algorithm remains the same; however, the approach to splitting differs. Rather than using splitters, a signature corresponding to the current partition is computed at each iteration for each state s. This signature comprises the probability of moving from s in one step to each block in the partition. In the next iteration, all states with different signatures are placed in different blocks. As for the splitter-based approach, the process terminates when the partition is stabilised, i.e., no further blocks need to be split.
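A state's signature with respect to the current partition is just a map from blocks to cumulative one-step probabilities; two states land in different blocks at the next refinement whenever these maps differ. A minimal sketch, on a hypothetical chain of our own choosing:

```python
def signature(s, partition, P):
    """Map each block (by its index in `partition`) to the probability of
    moving from s into that block in one step. States with different
    signatures are separated at the next refinement."""
    sig = {}
    for (si, sj), p in P.items():
        if si != s:
            continue
        b = next(i for i, B in enumerate(partition) if sj in B)
        sig[b] = sig.get(b, 0.0) + p
    return sig

# Hypothetical chain: from state 0, probability 0.5 into each of two blocks.
P = {(0, 1): 0.5, (0, 2): 0.5, (1, 1): 1.0, (2, 2): 1.0}
partition = [{0}, {1}, {2}]
print(signature(0, partition, P))  # {1: 0.5, 2: 0.5}
```

A linear scan of the partition is used here for clarity; a real implementation would keep a state-to-block index, as the splitter-based description above does with its integer array.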

Possible majority candidate. The implementation presented in the splitter-based approach of [10] uses a complex tree structure (e.g., splay trees [19]) to achieve the O(m log n) time complexity. However, the authors conjecture that the same time complexity can be achieved using a simpler solution. In [20], such an algorithm was presented, which sorts the weights with a combination of a so-called possible majority candidate algorithm and any O(k log k) sorting algorithm, where k is the number of transitions. The implementation of this algorithm is very similar to the original splitter-based implementation described above. However, the split process has been changed so that the sorting algorithm can be applied. Initially, in this split process, the possible majority candidate algorithm is applied to the cumulative probabilities. At the end of the process, the cumulative probabilities are sorted.

3.3 Experimental Results

We have implemented the aforementioned minimisation algorithms in the PRISM tool and compared their performance on four benchmark DTMCs taken from the PRISM benchmark suite [16]:

– brp: bounded retransmission protocol;
– crowds: Crowds protocol;
– egl: contract signing protocol of Even, Goldreich & Lempel;
– nand: NAND multiplexing.


Model                               States   Transitions
brp [N=16, Max=2]                      677           867
crowds [TotalRuns=3, CrowdSize=5]     1198          2038
egl [N=5, L=2]                       33790         34813
nand [N=7, K=1]                       2440          3648
nand [N=20, K=1]                     78332        121512

Table 1. Details of benchmark DTMC models used for experiments

Model                  States   Blocks   Signature (ms)   Splitter (ms)
brp [N=16, Max=2]         677      326               43              30
crowds [TR=3, CS=5]      1198       41               21              26
egl [N=5, L=2]          33790      229              530              88
nand [N=7, K=1]          2440     1182              125              45
nand [N=20, K=1]        78332    39982            17514             613

Table 2. Comparison of the performance of signature- and splitter-based algorithms

Table 1 shows the details of the DTMCs we have used and their sizes.

Experiments were run on a PC with an Intel Core i7-2630QM processor and 8GB RAM. We first compare the performance of the signature-based and splitter-based algorithms. Table 2 shows the results obtained for the benchmark models: this includes the size of the quotient model produced (i.e., the number of blocks in the final partition) and the time required for the whole process.

Although the times for the brp and crowds models are quite similar, the times for the egl and nand models are significantly different. The primary cause seems to be the size of the models, i.e., the number of states and transitions that they contain. The signature-based approach ends up considering (i.e. computing the signature for) many states in each iteration where it is not necessary to do so. The splitter-based algorithm, on the other hand, performs splits one by one, as needed, resulting in less wasted effort. It is important to note that, here, we are focusing on explicit-state implementations. If symbolic methods were used (e.g., with data structures such as binary decision diagrams), as is done in some implementations of signature-based methods [9], the results would be likely to vary considerably.

Now, we compare the performance of the two implementations of the splitter-based approach: one uses splay trees (as in [10]); the other uses the possible majority candidate (PMC) approach of [20]. The results for the same models considered in Table 2 are included in Table 3.

In this case, the performance of the two approaches is rather similar. The difference is only apparent on larger models (e.g., nand [N=20, K=1]), where the possible majority candidate algorithm is slightly faster.


Model                  Blocks   Sorting&PMC (ms)   SplayTree (ms)
brp [N=16, Max=2]         326                 28               30
crowds [TR=3, CS=5]        41                 27               26
egl [N=5, L=2]            229                 89               88
nand [N=7, K=1]          1182                 45               45
nand [N=20, K=1]        39982                750              613

Table 3. Performance of the two implementations of the splitter-based algorithm, using possible majority candidate (PMC) with sorting, or splay trees

4 Finite-Horizon Bisimulation Minimisation

The bisimulation minimisation algorithms considered in the previous section generate the coarsest possible probabilistic bisimulation reduction for a given Markov chain and its associated labelling with atomic propositions. These are based on an iterative splitting procedure which continues until no further splitting is possible, indicating that the probabilistic bisimulation has been identified.

As mentioned previously, this bisimulation quotient preserves any infinite- or finite-horizon properties to be checked on the original Markov chain. We now develop a finite-horizon variant of bisimulation, which represents a more aggressive reduction of the model, but which only preserves finite-horizon properties. In particular, we currently aim to preserve time-bounded reachability properties such as:

P=?[ F^≤20 fail ]

i.e., "what is the probability of a failure occurring within 20 time steps?". This represents a very commonly used class of properties, which can be used to reason about the timing or efficiency of a probabilistic system. By refining the Markov chain only up to the point where it can answer such questions, we will potentially avoid expensive iterations of the splitting process.

The experimental results of the previous section indicate that the signature-based minimisation algorithm is slower than the splitter-based one (at least in the explicit-state implementation considered here). One advantage it has, though, is that, after k iterations of splitting, we know that we have already split states whose behaviour, k steps into the future, is identical (with respect to the atomic propositions labelling the Markov chain). We thus build on the signature-based approach to perform finite-horizon minimisation.

To get an indication of the potential savings to be made, Table 4 shows the number of iterations of splitting required by both signature-based and splitter-based bisimulation minimisation. It is not particularly instructive to compare these two sets of figures, since the work performed in each iteration differs a lot: the signature-based approach executes a smaller number of more expensive splitting operations. On the other hand, it is useful to note that the number of signature-based splits can be relatively high (e.g. for brp and nand). Thus, if


Model                               Signature   Splitter
brp [N=16, Max=2]                          98        326
crowds [TotalRuns=3, CrowdSize=5]          18         41
egl [N=5, L=2]                             33        229
nand [N=7, K=1]                            85       1182

Table 4. Number of iterations to converge for the signature- and splitter-based algorithms

the time bound in a finite-horizon property to be checked is relatively small, it may be possible to minimise the model more efficiently.

4.1 Minimisation Algorithm

The finite-horizon bisimulation algorithm, MinimiseFiniteHorizon, is shown in Algorithm 1. It requires three parameters: the concrete state space S, the set of atomic propositions AP, and the time horizon k, i.e., the number of iterations in the splitting procedure.

As a first step, the algorithm calls InitialisePartition (see Algorithm 2), passing both S and AP as inputs. In InitialisePartition, the original state space is grouped based on the different combinations of atomic propositions, i.e. states with identical combinations of atomic propositions are merged into one block. Once these blocks are constructed, they are added to the initial partition Π.

Afterwards, MinimiseFiniteHorizon repeatedly calls the Split algorithm (see Algorithm 3), from within a loop which terminates either when k iterations have been completed or no further splitting is possible. In the Split algorithm, the distribution of each state s ∈ S is computed over the current partition and stored in Sig, where Sig can be any key-value pair data structure. Once the computation of the probability distribution for every state is completed, states with the same Sig are put in a new block and these blocks are added to the partition Π′. After the complete construction of partition Π′, Π′ replaces the old partition Π.

Algorithm 1: MinimiseFiniteHorizon

Data: S, AP, k (time-bound)
Result: S′
Π := InitialisePartition(S, AP)
isChanged := true
counter := 0
while isChanged && (counter < k) do
    isChanged := Split(Π)
    counter := counter + 1
end
S′ := Π


Algorithm 2: InitialisePartition

Data: S, AP
Result: Π

Π := ∅
for a ∈ AP do
    B := ∅
    for s ∈ S with L(s) = a do
        B := B ∪ {s}
    end
    Π := Π ∪ {B}
end

Algorithm 3: Split

Data: Π
Result: isChanged

Π′ := ∅
for s_i ∈ S do
    Sig := ∅
    for each transition s_i → s_j do
        B := block of s_j
        if B ∈ Sig then
            Sig.B := Sig.B + Q(s_i, s_j)
        else
            Sig.B := Q(s_i, s_j)
        end
    end
    if some B ∈ Π′ matches Sig then
        B := B ∪ {s_i}
    else
        B := {s_i}
        Π′ := Π′ ∪ {B}
    end
end
if (Π′ ≠ Π) then
    isChanged := true
    Π := Π′
else
    isChanged := false
end
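As an illustration, the splitting loop of Algorithms 1–3 might be sketched in Python as follows (a simplified, hypothetical re-implementation, not the PRISM code: the DTMC is assumed to be a dictionary Q mapping each state to its successor distribution, and label(s) returns the atomic propositions holding in s):

```python
# Sketch of Algorithms 1-3: signature-based splitting, stopped after at
# most k iterations. Q maps each state to {successor: probability};
# label(s) returns the set of atomic propositions holding in s.

def initialise_partition(states, label):
    blocks = {}
    for s in states:                      # group by identical labelling
        blocks.setdefault(frozenset(label(s)), set()).add(s)
    return list(blocks.values())

def split(partition, Q):
    block_of = {s: i for i, B in enumerate(partition) for s in B}
    refined = []
    for B in partition:                   # refine each block separately
        sigs = {}
        for s in B:
            sig = {}                      # signature: block -> cumulative prob
            for t, p in Q[s].items():
                sig[block_of[t]] = sig.get(block_of[t], 0.0) + p
            sigs.setdefault(frozenset(sig.items()), set()).add(s)
        refined.extend(sigs.values())
    # splitting only refines, so the partition changed iff it grew
    return refined, len(refined) != len(partition)

def minimise_finite_horizon(states, label, Q, k):
    partition = initialise_partition(states, label)
    changed, counter = True, 0
    while changed and counter < k:        # stop at horizon k or convergence
        partition, changed = split(partition, Q)
        counter += 1
    return partition
```

For large enough k this converges to the full bisimulation quotient, matching the behaviour described above.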


brp       Normal      k=6       k=7       k=8       k=9       k=10
x=60    2.28E-04  5.92E-04  1.19E-05  1.78E-05  1.81E-05  2.28E-04
x=70    2.69E-04  7.06E-04  1.19E-05  1.78E-05  1.81E-05  2.69E-04
x=80    3.13E-04  8.20E-04  1.19E-05  1.78E-05  1.81E-05  3.13E-04
x=90    3.58E-04  9.35E-04  1.19E-05  1.78E-05  1.81E-05  3.58E-04
x=100   4.00E-04  1.05E-03  1.19E-05  1.78E-05  1.81E-05  4.00E-04

Table 5. Results for the finite-horizon property P=? [ F≤x (s=5) ] on the brp model

crowds    Normal      k=10      k=12      k=14      k=16      k=18
x=20    0.018033  0.015488  0.014145  0.014145  0.014145  0.018033
x=30    0.034516  0.023447  0.017027  0.017027  0.017027  0.034516
x=40    0.043052  0.031397  0.018377  0.018377  0.018377  0.043052
x=50    0.048263  0.039295  0.019104  0.019104  0.019104  0.048263
x=60    0.051008  0.047066  0.019310  0.019310  0.019310  0.051008

Table 6. Results for the finite-horizon property P=? [ F≤x (observe0>1) ] on the crowds model

The final step (not shown in Algorithm 1) is to construct the minimised DTMC. This is similar, but not identical, to the process for building the quotient Markov chain corresponding to a full minimisation (see Definition 5). The state space is taken to be the final partition Π, i.e., each block of equivalent states becomes a state in the minimised model. The transition probabilities between the connected states are taken from the signatures Sig constructed for each block, with respect to the final partition Π. Here, we must take care since, unlike full bisimulation minimisation, the signatures are not identical for all states in each block. However, it suffices to use the signature of an arbitrary state in the block.
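As a sketch (hypothetical data structures: the DTMC is assumed to be a dictionary Q mapping each state to its successor distribution), this quotient construction from an arbitrary representative per block could look like:

```python
# Sketch of the quotient DTMC construction: each block of the final
# partition becomes a state, and its outgoing distribution is read off the
# signature of one arbitrary representative, as described above.

def build_quotient(partition, Q):
    block_of = {s: i for i, B in enumerate(partition) for s in B}
    quotient = {}
    for i, B in enumerate(partition):
        rep = next(iter(B))               # any representative suffices
        dist = {}
        for t, p in Q[rep].items():       # lift transitions to blocks
            dist[block_of[t]] = dist.get(block_of[t], 0.0) + p
        quotient[i] = dist
    return quotient
```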

4.2 Experimental Results

We implemented the algorithm described above and applied it to the benchmarks used in the previous section. Tables 5 and 6 show the results of the experiments on the brp and crowds models: we minimise the model over k iterations and then check a step-bounded PCTL property of the form P=? [ F≤x target ] on the resulting quotient model. The tables show the probability values obtained in each case. To guarantee correctness of the results, we need to take k ≥ x. Interestingly, we see that, for some examples, the correct answers are actually obtained for much smaller values of k. For example, for the first brp model, precise answers for the given reachability questions were obtained by the 10th iteration, whereas standard signature-based bisimulation gave the final quotient model after 98 iterations. On the other hand, for the crowds model, precise answers were obtained after 18 iterations, which is the same number of iterations that the traditional approach took to produce the final quotient model.

To give an idea of the efficiency of the finite-horizon bisimulation, Figure 1 shows the number of blocks3 in the partition generated by finite-horizon

3 We can consider the ‘blocks’ of the partition as ‘states’ in the minimised model, so sometimes we use these terms interchangeably.


Fig. 1. Partition sizes for k steps of finite-horizon bisimulation minimisation.

Fig. 2. Running times for k steps of finite-horizon property checking.


Fig. 3. Breakdown of minimisation and property-checking times for the nand case study.

bisimulation minimisation for different values of k on the four case studies. For large enough values of k, we eventually generate the partition corresponding to the full (non-finite-horizon) bisimulation. In most cases, the growth in the number of blocks is close to linear in k, although it is rather less regular for the nand example. In all cases, it seems that the growth is slow enough to take advantage of the finite-horizon bisimulation minimisation.

Figure 2 shows, for the same four case studies, the time required by the finite-horizon minimisation approach and compares it to the time needed for full bisimulation minimisation. As expected, the time required for the finite-horizon case is less, and the times are identical for the highest value of k considered (i.e., the value that produces a full bisimulation minimisation). Here, we show the total time needed for both the minimisation process and the subsequent checking of a finite-horizon property. The improvement in computation time comes from both the faster minimisation process and the fact that the finite-horizon reduced model is smaller and can be analysed more efficiently. Figure 3 shows, for the nand case study, a separate comparison of the time needed for minimisation and computation, illustrating that both aspects contribute to the overall speed-up. Similar behaviour is observed on the other case studies.

Finally, we observe that, although finite-horizon bisimulation minimisation is shown here to be more efficient than the classical variant, neither minimisation implementation presented above yields a large enough improvement in computation time to make its use worthwhile. This is because they rely on the full Markov chain being constructed before any minimisation is applied. On


these examples, it is faster to analyse the complete model, once it is constructed, than to minimise it and then analyse the reduced model. In the next section, we propose methods to address this issue.

5 On-the-Fly Finite-Horizon Minimisation

As discussed above, the main bottleneck in the implementation of finite-horizon bisimulation minimisation presented in the previous section is the construction of the full probabilistic model prior to minimisation. This can be expensive in terms of time, or may limit the applicability of the techniques to larger models, even if they can potentially be reduced to a manageable size.

In this section, we propose methods to compute a finite-horizon bisimulation minimisation in an on-the-fly fashion, where the minimised model is constructed directly from a high-level modelling language description of the original model. In our case, the probabilistic models are described using the modelling language of the probabilistic verification tool PRISM [15], which uses a guarded command notation based on Reactive Modules [2].

Typically, when performing probabilistic verification of a model, state space generation begins with a set of initial states (in fact, often just a single initial state), and then identifies all the possible states that can be reached from there through a forwards exploration. We are usually then interested in properties regarding the behaviour of the model starting in the initial state(s).

However, one of the strengths of probabilistic verification is the exhaustive nature of the analysis it performs on a model. For example, rather than asking “is the probability of an error occurring within 10 seconds, starting in the initial state, less than 0.01?”, it might be preferable to identify all possible initial states of a model from which the probability of an error occurring within 10 seconds exceeds the threshold 0.01. It turns out that our on-the-fly approach is well suited to exactly this kind of problem.

On-the-fly finite-horizon bisimulation minimisation begins with a set of target states and explores the model backwards, rather than forwards. If our interest is in the probability of reaching this target set within some finite horizon k, we do not need to explore all states of the model, or even all those designated as possible initial states, just those that have a non-zero probability of reaching the target in time. On-the-fly finite-horizon bisimulation minimisation identifies precisely these states, reducing the model by merging equivalent states as it proceeds. On the other hand, we can also continue exploring and minimising until no new states are found, in which case the process identifies a full (non-finite-horizon) bisimulation minimisation.

5.1 Minimisation Algorithm

The basic approach to performing finite-horizon minimisation on the fly is shown as FiniteHorizonOnTheFly, in Algorithm 4. This requires a target block


Algorithm 4: FiniteHorizonOnTheFly

Data: B_target, k
Result: quotientBlocks

P_curr := FindPredecessors(B_target)
P_new := ∅
quotientBlocks := {B_target}
while P_curr ≠ ∅ && k ≠ 0 do
    B_new := pop(P_curr)
    RefineQuotientBlocks(quotientBlocks, B_new)
    if B_new ≠ ∅ then
        quotientBlocks := quotientBlocks ∪ {B_new}
        P_new := P_new ∪ FindPredecessors(B_new)
    end
    if (P_curr = ∅ && P_new ≠ ∅) then
        P_curr := P_new
        P_new := ∅
        k := k − 1
    end
end

B_target, representing a set of states. The variable k represents the number of time steps to be explored backwards from B_target. The algorithm begins by adding all the discovered predecessors of B_target to the list of current predecessors P_curr. In the process of finding predecessors, all the states with equal cumulative transition probability T to a given block B_new are put into a single block. The cumulative transition probability T is defined as follows:

    T = Σ_{s′ ∈ B_new} P(s, s′)

A loop then iterates whilst P_curr is not empty and the given finite horizon k has not been reached. As the first step in the while loop, a block B_new is popped from P_curr and all the existing quotient blocks are refined with respect to B_new. During the refinement process, the block B_new and the list of quotient blocks that are to be refined with respect to B_new are passed as parameters to the algorithm RefineQuotientBlocks (see Algorithm 5).

This then iterates through the list quotientBlocks to check whether there is an existing block B_old that intersects B_new. If there is an intersection, a new block B_n∩o is created to represent the intersecting states, and this block’s transitions are the union of those of B_new and B_old. In the case where B_new and B_old represent the same set of states, B_old is replaced by B_n∩o. Otherwise, the intersecting states are removed from both B_new and B_old. If B_old ⊂ B_new, then, as before, B_old is replaced by B_n∩o. In the remaining cases, B_n∩o is added as a new member of quotientBlocks and the transitions of all quotient blocks are recomputed with respect to B_old and B_n∩o.


Algorithm 5: RefineQuotientBlocks

Data: quotientBlocks, B_new

for B_old ∈ quotientBlocks do
    B_n∩o := B_new ∩ B_old
    if B_n∩o ≠ ∅ then
        B_n∩o.Transitions := B_new.Transitions ∪ B_old.Transitions
        if B_new = B_old then
            B_old := B_n∩o
            B_new := ∅
        else
            B_new := B_new \ B_n∩o
            B_old := B_old \ B_n∩o
            if B_old = ∅ then
                B_old := B_n∩o
            else
                quotientBlocks := quotientBlocks ∪ {B_n∩o}
                refine quotientBlocks further w.r.t. B_old and B_n∩o
            end
        end
        if B_new = ∅ then
            break the loop
        end
    end
end

The quotient blocks are split further if they comprise states with different T. This further refinement continues until there is no more change in the quotient blocks.

After the refining process, if B_new has not become an empty block then it is added as a new member of the list quotientBlocks, and the predecessors of B_new are added to the list P_new. Whenever P_curr becomes empty, a time step has been successfully completed. If P_curr becomes empty while P_new is not empty, then P_curr is replaced by P_new, so that the loop can proceed further.

Computing predecessors. One of the key challenges faced by this algorithm is determining the predecessors of a given state from the high-level modelling language description. For example, in the PRISM language, a model is made up of one or more modules, each of whose behaviour is represented by guarded commands.

Consider the following guarded command:

    s = 6 → 0.5 : (s′ = 2) + 0.5 : (s′ = 7) & (d′ = 6)

This guarded command states that, when a state satisfies the guard (s = 6), the updates 0.5 : (s′ = 2) and 0.5 : (s′ = 7) & (d′ = 6) can be executed on it. In other words, a state with s = 6 can move either to a state with s = 2, with probability 0.5, or to a state with s = 7 and d = 6, with probability 0.5.


In the following sections, we describe two approaches to finding predecessors: one symbolic, which represents blocks (sets of states) as predicates and uses an SMT (satisfiability modulo theories) [5] based implementation; and one explicit-state, which explicitly enumerates the states in each block.

5.2 Symbolic (SMT-based) Minimisation

The state space of a probabilistic model can be reasoned about efficiently by using boolean formulas to represent large numbers of states as sets, an approach known as symbolic representation. We have made use of SMT solvers to implement the on-the-fly algorithm in a symbolic fashion. An SMT solver is a tool for determining the satisfiability of formulas with respect to first-order theories, such as equality reasoning and arithmetic, possibly with quantifiers.

A guarded command of a PRISM model provides a description of the behaviour of a state which satisfies its guard. An SMT query for a particular guarded command is constructed in conjunctive normal form (CNF) to determine the predecessors of a given target state:

    b ∧ g ∧ constraint ∧ wp(update, target_expr)

A state is a combination of valuations of a set of variables. The upper and lower bounds of these variables are captured by b. A guard g is a boolean expression; only if a state satisfies the guard g of a command can the updates of that command be executed on it. We can limit the scope of the set of predecessors by defining constraint. The weakest precondition of target_expr with respect to the update is denoted wp(update, target_expr). A weakest precondition expresses that if the result of executing an update on a state satisfying source_expr satisfies target_expr, then source_expr is considered a predecessor of the given target_expr.

    source_expr --update--> result_expr,  where  result_expr ⊨ target_expr
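Viewed as a predicate transformer, wp can be illustrated in a few lines (a toy sketch, not the symbolic SMT encoding used in the implementation: updates and targets are explicit Python functions on states, and the guard conjunct g of the full query is omitted):

```python
# Toy sketch of the weakest precondition: a state satisfies
# wp(update, target) exactly when applying the update lands in target.
# The real implementation works symbolically, handing this condition to
# an SMT solver rather than evaluating it on explicit states.

def wp(update, target):
    """Predicate transformer: wp(update, target)(s) == target(update(s))."""
    return lambda state: target(update(state))

# Hypothetical example mirroring the guarded command above:
# the update (s' = 7) & (d' = 6), and the target (s = 7) & (d = 6).
# (Being a constant update, its wp holds in every state; the guard g in
# the full query is what restricts attention to states with s = 6.)
update = lambda st: {**st, "s": 7, "d": 6}
target = lambda st: st["s"] == 7 and st["d"] == 6
pre = wp(update, target)
```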

FindPredecessors (see Algorithm 6) is used to determine the predecessors for a given target. In this approach, an SMT query is created for each and every update for a given target expression. If an SMT query is satisfiable, a valid probability is obtained from the query to form the constraint (prob_expr = prob). The conjunction of the query and the formed constraint denotes the set of predecessors with the same probability. A query can contain multiple valid probabilities when prob_expr depends on the valuation of the variables. Therefore, every time a valid probability is obtained it is used as a blocking expression to obtain all the remaining distinct probabilities.

SMT-based methods for bisimulation minimisation have been developed previously [7]. One key difference here is that our approach handles transition probabilities expressed as state-dependent expressions, rather than fixed constants, which is important for the models we consider here.


Algorithm 6: FindPredecessors

Data: model, target
Result: P

M := ∅
b := model.getBounds()
constraint := model.getConstraint()
for command ∈ model do
    g := command.getGuard()
    for update ∈ command do
        update_expr := update.getUpdate()
        prob_expr := update.getProb()
        query := b ∧ g ∧ constraint ∧ (p = prob_expr) ∧ wp(update_expr, target)
        while query.isSat() do
            prob := query.getVal(p)
            predecessor_expr := query ∧ (p = prob)
            if prob ∉ M then
                B := new Block()
                B.addExpr(predecessor_expr)
                B.addTransition(prob, targetBlock)
                M.put(prob, B)
            else
                B := M.get(prob)
                B.addExpr(predecessor_expr)
            end
            query := query ∧ (p ≠ prob)
        end
    end
end
P := M.getAllBlocks()

Example. We illustrate the SMT-based approach with an example. We use a PRISM model of a Tournament game [21], which comprises K particles labelled with values from the range 0, . . . , N−1. Particles interact at random and, when doing so, the particle with the larger state value wins and copies its value to the other. Figure 4 shows the PRISM model, for N = 3, K = 5. For modelling convenience, we actually describe the model as a continuous-time Markov chain (CTMC), and consider its embedded Markov chain as the DTMC to be analysed.
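The embedded DTMC's transition probabilities are the CTMC rates normalised by the total exit rate of each state. This can be sketched as follows (a hypothetical helper mirroring the guards and rates of the Tournament model for N = 3, not code from the implementation; it assumes at least one reaction is enabled in the given state):

```python
# Sketch: embedded-DTMC probabilities for the Tournament CTMC (N = 3).
# A state is the triple (c0, c1, c2) of particle counts per fitness level;
# each enabled command contributes its rate, normalised by the exit rate.

def enabled_reactions(c0, c1, c2, K):
    """Return {command name: rate} for the commands enabled in this state."""
    rates = {}
    if c0 > 0 and c1 > 0 and c1 < K: rates["r01"] = 2 * c0 * c1
    if c0 > 0 and c2 > 0 and c2 < K: rates["r02"] = 2 * c0 * c2
    if c1 > 0 and c2 > 0 and c2 < K: rates["r12"] = 2 * c1 * c2
    if c0 > 1: rates["r00"] = c0 * (c0 - 1)   # identical-particle collisions
    if c1 > 1: rates["r11"] = c1 * (c1 - 1)
    if c2 > 1: rates["r22"] = c2 * (c2 - 1)
    return rates

def embedded_probabilities(c0, c1, c2, K):
    """Normalise rates into the embedded DTMC's transition probabilities."""
    rates = enabled_reactions(c0, c1, c2, K)
    total = sum(rates.values())               # assumed non-zero
    return {name: r / total for name, r in rates.items()}
```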

Figure 5 shows an example of an SMT query that is constructed for the input target expression (c2 = 5) & (c0 + c1 = 0) and for the following command:

    c1 > 0 & c2 > 0 & c2 < K → 2 × c1 × c2 : (c1′ = c1 − 1) & (c2′ = c2 + 1)

In the model, variable ci counts the number of particles in state i. The command represents the interaction between particles in states 1 and 2.

The SMT query above will be dispatched as an input to the solver to check for its satisfiability. If this query is satisfiable, a value for p will be retrieved from


ctmc

// Number of fitness levels: N
const int N = 3;

// Total number of agents/particles: K
const int K = 5;

module tournament

  // Counters: ci = number of agents/particles with fitness i
  c0 : [0..K];
  c1 : [0..K];
  c2 : [0..K];

  // Possible reactions between agents/particles
  // Each possible pairwise collision
  [r01] c0>0 & c1>0 & c1<K -> 2*c0*c1 : (c0'=c0-1) & (c1'=c1+1);
  [r02] c0>0 & c2>0 & c2<K -> 2*c0*c2 : (c0'=c0-1) & (c2'=c2+1);
  [r12] c1>0 & c2>0 & c2<K -> 2*c1*c2 : (c1'=c1-1) & (c2'=c2+1);

  // Collision between 2 identical agents/particles
  [r00] c0>1 -> c0*(c0-1) : true;
  [r11] c1>1 -> c1*(c1-1) : true;
  [r22] c2>1 -> c2*(c2-1) : true;

endmodule

// Initial states
init c0+c1+c2=K & c2>0 endinit

// Labels (atomic propositions) for properties:
// Finished: all agents/particles have maximum fitness
label "done" = c2>=K;
label "target" = c2=K & c0+c1=0;

// Reward structure used to reason about passage of time (discrete steps)
rewards "time"
  true : 1;
endrewards

Fig. 4. PRISM modelling language description of the Tournament game (N = 3).

the solution produced by the solver. As presented in Figure 6, an expression predecessor_expr for a set of states that satisfies the value p will be generated. Afterwards, the initial SMT query will be updated as shown in Figure 7, i.e., (prob_expr ≠ p) is used as a blocking expression, to rule out the previous solution.

Experimental Results. We have implemented symbolic finite-horizon minimisation using the SMT solver Z3 [6]. This was developed as an extension of the PRISM model checker, building upon its language parsers and probabilistic model storage/analysis. It is implemented in Java, using the Z3 Java API.

We evaluate the approach using the Tournament game discussed above, for a variety of parameters. As discussed at the beginning of Section 5, we target models with many possible initial configurations, for which we are interested in the states that can reach a target within some finite time. We hence adopt different case studies for evaluation from those used for the preliminary implementations in the earlier sections of this report.

Fig. 5. SMT query representing a guarded command

Fig. 6. SMT query to find predecessors for a specific probability value

Fig. 7. SMT query updated with a blocking expression to find further matches

The results are shown in Table 7, which gives, for a variety of model parameters N and K, model sizes and times for several scenarios. Under the heading ‘Full Red.’, columns ‘States’ and ‘Blocks’ show the size of the full


N   K   k    Full Red.          Finite Horiz.      Time (s)
             States   Blocks    States   Blocks    PRISM   Full Red.   Finite Horiz.
4   9   3      165       9        20       5       0.03      155           4.5
        4                         35       6                              11.1
        5                         56       7                              23.5
4   10  3      220      10        20       5       0.03      215           9.3
        4                         35       6                              15.1
        5                         56       7                              31.1
5   9   3      330       9        35       5       0.04      723.4        22.1
        4                         70       6                              70.7
        5                        126       7                             180.9
5   10  3      495      10        35       5       0.04     1998.7        48.8
        4                         70       6                              82.0
        5                        126       7                             233.7

Table 7. Experimental results: SMT-based implementation on the Tournament game.

DTMC and the fully reduced quotient model, respectively. Columns ‘PRISM’ and ‘Full Red.’ on the right-hand side show the time taken to build the full model (in PRISM) and to perform full bisimulation minimisation. Then, under the heading ‘Finite Horiz.’, for various time horizons k, columns ‘States’ and ‘Blocks’ show the total number of states across all blocks and the size of the finite-horizon bisimulation minimisation (number of blocks). The rightmost column gives the time required for finite-horizon bisimulation minimisation.

This example has a very compact minimised form (in both the finite-horizon and full variants). On a positive note, the SMT-based approach successfully performs the minimisation and gives a symbolic (boolean expression) representation for each block. However, the process is slow, limiting the applicability of the approach to relatively small DTMCs, which PRISM can build and analyse very quickly.

The reason for the slow performance can be explained as follows. A guard of a command is a simple representation of the set of states in which the corresponding command is enabled. In some models, such as this one, guards of the commands in the PRISM language description may overlap. Hence, the sets of predecessors identified for a given target from different commands may overlap each other and, as a result, every set of predecessors has to be compared against each other to remove any overlaps, so that these predecessors become disjoint sets of states. This results in a very large number of calls to the SMT solver.

5.3 Explicit-State Minimisation

As an alternative to the symbolic approach using SMT, we developed an explicit-state implementation of finite-horizon minimisation in which the blocks of equivalent states are represented by explicitly listing the states that comprise them. As above, the blocks are refined at each time step such that states residing in


Algorithm 7: FindPredecessors

Data: targetBlock
Result: P

L_s := identifyPredecessors(targetBlock)
L_t := computeLiftedTransitions(L_s)
M(t, B) := ∅
for s ∈ L_s do
    t := L_t.get(L_s.indexOf(s))
    if t ∉ M then
        B := new Block()
        B.addState(s)
        B.addTransition(t, targetBlock)
        M.put(t, B)
    else
        B := M.get(t)
        B.addState(s)
    end
end
P := M.getAllBlocks()

the same block have equal cumulative transition probability T. To improve performance and store states compactly, we hash them based on the valuation of the variables that define them. This is done in such a way that the hash values are bi-directional (one-to-one).
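Such a bi-directional hash can be realised as a mixed-radix encoding of the variable valuation. An illustrative sketch, under the assumption of bounded integer variables (not the actual PRISM data structure):

```python
# Sketch of a bijective (one-to-one) state hash: a state is a valuation of
# bounded integer variables, packed into a single integer via mixed-radix
# encoding, so it can be stored compactly and recovered exactly.

def make_codec(bounds):
    """bounds: list of (low, high) pairs, one per variable."""
    sizes = [hi - lo + 1 for lo, hi in bounds]

    def encode(state):
        h = 0
        for (lo, _), size, v in zip(bounds, sizes, state):
            h = h * size + (v - lo)       # accumulate digit by digit
        return h

    def decode(h):
        vals = []
        for (lo, _), size in zip(reversed(bounds), reversed(sizes)):
            h, r = divmod(h, size)        # peel digits off in reverse
            vals.append(r + lo)
        return tuple(reversed(vals))

    return encode, decode
```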

Algorithm 7 summarises the identification of predecessors using this approach. As a first step, all the states that have outgoing transitions to the target block are identified and stored in the list L_s. Then the cumulative transition probability for each of these states is computed and stored in the list L_t, at the position corresponding to the state’s position in L_s. Afterwards, the states in L_s with the same transition probabilities are put into a single block, using a splay tree data structure. Thus, any block generated in this manner represents a group of states that behave equivalently with respect to the given targetBlock. The collection of these blocks is returned at the end of the algorithm.
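The grouping step can be sketched as follows (a simplified, hypothetical re-implementation: a plain dictionary keyed on T stands in for the splay tree, and the model is an explicit transition dictionary rather than states discovered from the language description):

```python
# Sketch of the explicit-state FindPredecessors: predecessors of a target
# block are grouped by their cumulative transition probability
# T(s) = sum of P(s, s') over s' in the target block, so that each group
# forms one block of equivalently-behaving states.

def find_predecessors(Q, target_block):
    """Q maps each state to a dict {successor: probability}."""
    groups = {}
    for s, succs in Q.items():
        T = sum(p for t, p in succs.items() if t in target_block)
        if T > 0:                         # only genuine predecessors
            groups.setdefault(T, set()).add(s)
    return list(groups.values())
```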

Experimental results. As in the previous section, we developed an implementation of this version of finite-horizon minimisation as an extension of PRISM, in Java. We evaluate it on the Tournament game example used earlier and on two further examples: Modulus and Approximate Majority. The Modulus game is similar to the Tournament game, but with a modified update rule: for two colliding particles x and y, one particle changes to 0 and the other to x + y mod N. The Approximate Majority example is a simple population protocol for computing a majority value amongst a set of agents [3].

Table 8 summarises the experimental results for all three examples, in exactly the same format as for the SMT results explained above. We see that the results


N   K   k     Full Red.            Finite Horiz.      Time (s)
              States     Blocks    States   Blocks    PRISM    Full Red.   Finite Horiz.
9   23  8     5852925      23        6435     10       705.0     21.6          0.1
        9                           12870     11                               0.2
        10                          24310     12                               0.3
9   24  8     7888725      24        6435      9      4780.8     29.8          0.1
        9                           12870     10                               0.3
        10                          24310     11                               0.3
10  21  8    10015005      21       11440     10        59.0     43.6          0.3
        9                           24310     11                               0.4
        10                          48620     12                               0.5
10  22  8    14307150      22       11440     10        61.3     51.3          0.5
        9                           24310     11                               0.6
        10                          48620     12                               0.8

(a) Tournament example

N   K   k     Full Red.          Finite Horiz.      Time (s)
              States   Blocks    States   Blocks    PRISM   Full Red.   Finite Horiz.
7   19  8     177100   29550     11694    3614       0.4     1091.0        19.0
        9                        22003    6445                             64.2
        10                       39126   10854                            190.5
7   20  8     230230   38427      5678    3596       0.5     1930.6        17.8
        9                        11702    6440                             63.7
        10                       21939   10831                            188.5
9   11  6      75578   12703     23817    3282       0.3      175.8        19.7
        7                        48419    7530                             79.7
        8                        67597   11251                            142.1
9   12  6     125954   21140     24038    3330       0.3      567.4        21.9
        7                        51091    7902                            100.8
        8                        85309   13679                            262.6

(b) Modulus example

K    k        Full Red.          Finite Horiz.      Time (s)
              States   Blocks    States   Blocks    PRISM    Full Red.   Finite Horiz.
100  20        20300   10200       242      121      11.0      10.7          0.1
     40                            882      441                              0.2
     60                           1922      961                              0.3
150  100       45450   22800      5202     2601      46.1      62.5          1.3
     150                         11552     5776                              4.7
     200                         20402    10201                             12.4
200  250       80600   40400     31752    15876     memout    202.4         35.9
     300                         45602    22801                             68.7
     350                         61952    30976                            121.2
250  375      125750   63000     71064    35532     memout    532.5        154.6
     400                         80802    40401                            199.9
     425                         91164    45582                            254.0

(c) Approximate Majority example

Table 8. Experimental results: Explicit-state implementation on three examples.


Fig. 8. Modulus game (N=7,K=20): Reduction obtained for varying time horizons k.

Fig. 9. Modulus game (N=7): Sizes of the original model and (full) minimised version.

are significantly better than for the SMT implementation, and the benefits of applying the finite-horizon approach are demonstrated more clearly on these examples. For the Tournament game, in particular, there is a very large reduction in state space, as before. But now, our implementation works much faster: (a) it can handle much larger Markov chains (including cases where the full model has over 14 million states); and (b) it is faster than PRISM. In fact, it is faster than PRISM even when executing full bisimulation minimisation, but the improvements are bigger when restricting to a finite horizon.

For the Modulus example, significant reductions in model size are still observed (e.g. from 125,954 states to 3,330 blocks for N = 9, K = 12 and finite horizon k = 6). In Figure 8, we show how the size of the reduced model varies with k for an indicative example: N = 7, K = 20. Here, for values of k less than 14, the finite-horizon minimisation yields an improvement over the full reduction. Figure 9 shows that, generally, the reductions obtained increase as K does. The reductions obtained on this example are smaller than for the previous example and, as a result, PRISM is able to compute the model faster. On the other hand, for the Approximate Majority example, comparable reductions are observed, but the minimisation approach can be applied to larger models than can be handled by PRISM. For this example, although the state spaces of the full models are manageable, the models prove poorly suited to PRISM’s model construction implementation (which is based on binary decision diagram data structures).

In addition to the amount of reduction obtained, a second factor that influences the effectiveness of the minimisation approaches is the structure of the model. For the Tournament game, the model is actually a directed acyclic graph, but the others have loops. The former case is handled more efficiently since there is less overhead when checking for overlapping blocks.

6 Conclusions

We have investigated a variety of algorithms for probabilistic bisimulation minimisation in the probabilistic model checker PRISM. In particular, we developed a finite-horizon variant of bisimulation minimisation, which partially reduces models for the purposes of checking finite-horizon properties. This has been implemented, as an extension of PRISM, using both symbolic (SMT-based) and explicit-state techniques. Applying these techniques to several benchmark examples, we illustrated that significant model reductions can be obtained in this manner, resulting in improvements in both execution time and scalability with respect to the existing implementations in PRISM.

References

1. H. Aljazzar, M. Fischer, L. Grunske, M. Kuntz, F. Leitner, and S. Leue. Safety analysis of an airbag system using probabilistic FMEA and probabilistic counterexamples. In Proc. 6th Int. Conf. Quantitative Evaluation of Systems (QEST'09), 2009.

2. R. Alur and T. Henzinger. Reactive modules. Formal Methods in System Design, 15(1):7–48, 1999.

3. D. Angluin, J. Aspnes, and D. Eisenstat. A simple population protocol for fast robust approximate majority. Distributed Computing, 21(2):87–102, 2008.

4. A. Aziz, V. Singhal, F. Balarin, R. K. Brayton, and A. L. Sangiovanni-Vincentelli. It usually works: The temporal logic of stochastic systems. In Computer Aided Verification, pages 155–165. Springer, 1995.

5. L. de Moura and N. Bjørner. Satisfiability modulo theories: Introduction and applications. Commun. ACM, 54(9):69–77, Sept. 2011.

6. L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proc. 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'08), volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008.

7. C. Dehnert, J.-P. Katoen, and D. Parker. SMT-based bisimulation minimisation of Markov models. In Verification, Model Checking, and Abstract Interpretation, pages 28–47. Springer, 2013.

8. S. Derisavi. Solution of large Markov models using lumping techniques and symbolic data structures. PhD thesis, 2005.


9. S. Derisavi. Signature-based symbolic algorithm for optimal Markov chain lumping. In Proc. 4th International Conference on Quantitative Evaluation of Systems (QEST'07), pages 141–150. IEEE Computer Society, 2007.

10. S. Derisavi, H. Hermanns, and W. H. Sanders. Optimal state-space lumping in Markov chains. Information Processing Letters, 87(6):309–315, 2003.

11. H. Hansson and B. Jonsson. A logic for reasoning about time and reliability. Formal Aspects of Computing, 6(5):512–535, 1994.

12. J. Heath, M. Kwiatkowska, G. Norman, D. Parker, and O. Tymchyshyn. Probabilistic model checking of complex biological pathways. In C. Priami, editor, Proc. Computational Methods in Systems Biology (CMSB'06), volume 4210 of Lecture Notes in Bioinformatics, pages 32–47. Springer Verlag, 2006.

13. J.-P. Katoen, I. S. Zapreev, E. M. Hahn, H. Hermanns, and D. N. Jansen. The ins and outs of the probabilistic model checker MRMC. Performance Evaluation, 68(2):90–104, 2011.

14. J. Kemeny, J. Snell, and A. Knapp. Denumerable Markov Chains. Springer-Verlag, 2nd edition, 1976.

15. M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In G. Gopalakrishnan and S. Qadeer, editors, Proc. 23rd International Conference on Computer Aided Verification (CAV'11), volume 6806 of LNCS, pages 585–591. Springer, 2011.

16. M. Kwiatkowska, G. Norman, and D. Parker. The PRISM benchmark suite. In Proc. 9th International Conference on Quantitative Evaluation of SysTems (QEST'12), pages 203–204, 2012.

17. K. G. Larsen and A. Skou. Bisimulation through probabilistic testing. Information and Computation, 94(1):1–28, 1991.

18. R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 16(6):973–989, 1987.

19. D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, 1985.

20. A. Valmari and G. Franceschinis. Simple O(m log n) time Markov chain lumping. In Tools and Algorithms for the Construction and Analysis of Systems, pages 38–52. Springer, 2010.

21. M. Vose. The Simple Genetic Algorithm. Complex Adaptive Systems, MIT Press, 1999.


Verification of Markov Decision Processes using Learning Algorithms

Tomáš Brázdil1, Krishnendu Chatterjee2, Martin Chmelík2, Vojtěch Forejt3, Jan Křetínský2, Marta Kwiatkowska3, David Parker4, and Mateusz Ujma3

1 Masaryk University, Brno, Czech Republic   2 IST Austria   3 University of Oxford, UK   4 University of Birmingham, UK

Abstract. We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. The latter is the first extension of statistical model checking for unbounded properties in MDPs. In contrast with other related techniques, our approach is not restricted to time-bounded (finite-horizon) or discounted properties, nor does it assume any particular properties of the MDP. We also show how our methods extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.

1 Introduction

Markov decision processes (MDPs) are a widely used model for the formal verification of systems that exhibit stochastic behaviour. This may arise due to the possibility of failures (e.g. of physical system components), unpredictable events (e.g. messages sent across a lossy medium), or uncertainty about the environment (e.g. unreliable sensors in a robot). It may also stem from the explicit use of randomisation, such as probabilistic routing in gossip protocols or random back-off in wireless communication protocols.

Verification of MDPs against temporal logics such as PCTL and LTL typically reduces to the computation of optimal (minimum or maximum) reachability probabilities, either on the MDP itself or its product with some deterministic ω-automaton. Optimal reachability probabilities (and a corresponding optimal strategy for the MDP) can be computed in polynomial time through a reduction to linear programming, although in practice verification tools often use dynamic programming techniques, such as value iteration, which approximates the values up to some pre-specified convergence criterion.

(Funding footnote: This research was funded in part by the European Research Council (ERC) under grant agreements 267989 (QUAREM), 246967 (VERIWARE) and 279307 (Graph Games), by the EU FP7 project HIERATIC, by the Austrian Science Fund (FWF) projects S11402-N23 (RiSE), S11407-N23 (RiSE) and P23499-N23, by the Czech Science Foundation grant No P202/12/P612, by EPSRC project EP/K038575/1 and by the Microsoft faculty fellows award.)
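As a rough illustration of the value iteration just mentioned (a minimal sketch, not PRISM's actual implementation; the dictionary-based MDP encoding is ours, for illustration only):

```python
# Gauss-Seidel-style value iteration for maximum reachability probabilities
# in an MDP. `delta[(s, a)]` maps each state-action pair to a successor
# distribution; every non-target state is assumed to have enabled actions.

def max_reach_values(states, enabled, delta, target, eps=1e-6):
    """Iterate V(s) = max_{a in E(s)} sum_{s'} delta(s,a)(s') * V(s')
    until the largest change falls below the convergence threshold eps."""
    V = {s: (1.0 if s in target else 0.0) for s in states}
    while True:
        change = 0.0
        for s in states:
            if s in target:
                continue  # target states keep value 1
            best = max(sum(p * V[t] for t, p in delta[(s, a)].items())
                       for a in enabled[s])
            change = max(change, abs(best - V[s]))
            V[s] = best
        if change < eps:
            return V
```

Note that plain value iteration only guarantees convergence up to the chosen threshold, which is exactly the "pre-specified convergence criterion" caveat mentioned above.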

The efficiency or feasibility of verification is often limited by excessive time or space requirements, caused by the need to store a full model in memory. Common approaches to tackling this include: symbolic model checking, which uses efficient data structures to construct and manipulate a compact representation of the model; abstraction refinement, which constructs a sequence of increasingly precise approximations, bypassing construction of the full model using decision procedures such as SAT or SMT; and statistical model checking [38,19], which uses Monte Carlo simulation to generate approximate results of verification that hold with high probability.

In this paper, we explore the opportunities offered by learning-based methods, as used in fields such as planning or reinforcement learning [37]. In particular, we focus on algorithms that explore an MDP by generating trajectories through it and, whilst doing so, produce increasingly precise approximations for some property of interest (in this case, reachability probabilities). The approximate values, along with other information, are used as heuristics to guide the model exploration so as to minimise the solution time and the portion of the model that needs to be considered.

We present a general framework for applying such algorithms to the verification of MDPs. Then, we consider two distinct instantiations that operate under different assumptions concerning the availability of knowledge about the MDP, and produce different classes of results. We distinguish between complete information, where full knowledge of the MDP is available (but not necessarily generated and stored), and limited information, where (in simple terms) we can only sample trajectories of the MDP.

The first algorithm assumes complete information and is based on real-time dynamic programming (RTDP) [3]. In its basic form, this only generates approximations in the form of lower bounds (on maximum reachability probabilities). While this may suffice in some scenarios (e.g. planning), in the context of verification we typically require more precise guarantees. So we consider bounded RTDP (BRTDP) [31], which supplements this with an additional upper bound. The second algorithm assumes limited information and is based on delayed Q-learning (DQL) [36]. Again, we produce both lower and upper bounds but, in contrast to BRTDP, where these are guaranteed to be correct, DQL offers probably approximately correct (PAC) results, i.e., there is a non-zero probability that the bounds are incorrect.

Typically, MDP solution methods based on learning or heuristics make assumptions about the structure of the model. For example, the presence of end components [15] (subsets of states where it is possible to remain indefinitely with probability 1) can result in convergence to incorrect values. Our techniques are applicable to arbitrary MDPs. We first handle the case of MDPs that contain no end components (except for trivial designated goal or sink states). Then, we adapt this to the general case by means of on-the-fly detection of end components, which is one of the main technical contributions of the paper. We also show how our techniques extend to LTL objectives and thus also to minimum reachability probabilities.

Our DQL-based method, which yields PAC results, can be seen as an instance of statistical model checking [38,19], a technique that has received considerable attention. Until recently, most work in this area focused on purely probabilistic models, without nondeterminism, but several approaches have now been presented for statistical model checking of nondeterministic models [13,14,27,4,28,18,29]. However, these methods all consider either time-bounded properties or use discounting to ensure convergence (see below for a summary). The techniques in this paper are the first for statistical model checking of unbounded properties on MDPs.

We have implemented our framework within the PRISM tool [25]. This paper concludes with experimental results for an implementation of our BRTDP-based approach that demonstrate considerable speed-ups over the fastest methods in PRISM.

Detailed proofs omitted due to lack of space are available in [7].

1.1 Related Work

In fields such as planning and artificial intelligence, many learning-based and heuristic-driven solution methods for MDPs have been developed. In the complete information setting, examples include RTDP [3] and BRTDP [31], as discussed above, which generate lower and lower/upper bounds on values, respectively. Most algorithms make certain assumptions in order to ensure convergence, for example through the use of a discount factor or by restricting to so-called Stochastic Shortest Path (SSP) problems, whereas we target arbitrary MDPs without discounting. More recently, an approach called FRET [24] was proposed for a generalisation of SSP, but this gives only a one-sided (lower) bound. We are not aware of any attempts to apply or adapt such methods in the context of probabilistic verification. A related paper is [1], which applies heuristic search methods to MDPs, but for generating probabilistic counterexamples.

As mentioned above, in the limited information setting, our algorithm based on delayed Q-learning (DQL) yields PAC results, similar to those obtained from statistical model checking [38,19,35]. This is an active area of research with a variety of tools [21,8,6,5]. In contrast with our work, most techniques focus on time-bounded properties, e.g., using bounded LTL, rather than unbounded properties. Several approaches have been proposed to transform checking of unbounded properties into testing of bounded properties, for example, [39,17,34,33]. However, these focus on purely probabilistic models, without nondeterminism, and do not apply to MDPs. In [4], unbounded properties are analysed for MDPs with spurious nondeterminism, where the way it is resolved does not affect the desired property.

More generally, the development of statistical model checking techniques for probabilistic models with nondeterminism, such as MDPs, is an important topic, treated in several recent papers. One approach is to give the nondeterminism a probabilistic semantics, e.g., using a uniform distribution instead, as for timed automata in [13,14,27]. Others [28,18], like this paper, aim to quantify over all strategies and produce an ε-optimal strategy. The work in [28] and [18] deals with the problem in the setting of discounted (and for the purposes of approximation thus bounded) or bounded properties, respectively. In the latter work, candidates for optimal schedulers are generated and gradually improved, but "at any given point we cannot quantify how close to optimal the candidate scheduler is" (cited from [18]) and the algorithm "does not in general converge to the true optimum" (cited from [30]). Further, [30] considers compact representation of schedulers, but again focuses only on (time-)bounded properties.

Since statistical model checking is simulation-based, one of the most important difficulties is the analysis of rare events. This issue is, of course, also relevant for our approach; see the section on experimental results. Rare events have been addressed using methods such as importance sampling [17,20] and importance splitting [22].

End components in MDPs can be collapsed either for algorithmic correctness [15] or efficiency [11] (where only lower bounds on maximum reachability probabilities are considered). Asymptotically efficient ways to detect them are given in [10,9].

2 Basics about MDPs and Learning Algorithms

We begin with basic background material on MDPs and some fundamental definitions for our learning framework. We use ℕ, ℚ, and ℝ to denote the sets of all non-negative integers, rational numbers and real numbers, respectively. Dist(X) is the set of all rational probability distributions over a finite or countable set X, i.e., the functions f : X → [0, 1] ∩ ℚ such that Σ_{x∈X} f(x) = 1, and supp(f) denotes the support of f.

2.1 Markov Decision Processes

We work with Markov decision processes (MDPs), a widely used model to capture both nondeterminism (e.g., for control or concurrency) and probability.

Definition 1. An MDP is a tuple M = ⟨S, s, A, E, Δ⟩, where S is a finite set of states, s ∈ S is an initial state, A is a finite set of actions, E : S → 2^A assigns non-empty sets of enabled actions to all states, and Δ : S × A → Dist(S) is a (partial) probabilistic transition function defined for all s and a where a ∈ E(s).

Remark 1. For simplicity of presentation we assume w.l.o.g. that, for every action a ∈ A, there is at most one state s such that a ∈ E(s), i.e., E(s) ∩ E(s′) = ∅ for s ≠ s′. If there are states s, s′ such that a ∈ E(s) ∩ E(s′), we can always rename the actions as (s, a) ∈ E(s) and (s′, a) ∈ E(s′), so that the MDP satisfies our assumption.

An infinite path of an MDP M is an infinite sequence ω = s₀a₀s₁a₁… such that aᵢ ∈ E(sᵢ) and Δ(sᵢ, aᵢ)(sᵢ₊₁) > 0 for every i ∈ ℕ. A finite path is a finite prefix of an infinite path ending in a state. We use last(ω) to denote the last state of a finite path ω. We denote by IPath (resp. FPath) the set of all infinite (resp. finite) paths, and by IPath_s (resp. FPath_s) the set of infinite (resp. finite) paths starting in a state s.

A state s is terminal if all actions a ∈ E(s) satisfy Δ(s, a)(s) = 1. An end component (EC) of M is a pair (S′, A′) where S′ ⊆ S and A′ ⊆ ⋃_{s∈S′} E(s) such that: (1) if Δ(s, a)(s′) > 0 for some s ∈ S′ and a ∈ A′, then s′ ∈ S′; and (2) for all s, s′ ∈ S′ there is a path ω = s₀a₀…sₙ such that s₀ = s, sₙ = s′ and for all 0 ≤ i < n we have aᵢ ∈ A′. A maximal end component (MEC) is an EC that is maximal with respect to the point-wise subset ordering.
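To make the two defining conditions of an EC concrete, here is a small Python sketch (the dictionary-based MDP encoding and all names are ours, for illustration only):

```python
# Check whether a pair (S_, A_) is an end component of an MDP given as
# `enabled[s]` (enabled actions) and `delta[(s, a)]` (successor distribution).

def is_end_component(S_, A_, enabled, delta):
    # Every action in A_ must be enabled in some state of S_.
    if not all(any(a in enabled[s] for s in S_) for a in A_):
        return False
    # Condition (1): any a in A_ taken from any s in S_ stays inside S_.
    for s in S_:
        for a in A_:
            if a in enabled[s] and not set(delta[(s, a)]) <= S_:
                return False
    # Condition (2): every state of S_ reaches every other one
    # using only actions from A_ (a simple graph search).
    for s in S_:
        reached, frontier = {s}, [s]
        while frontier:
            u = frontier.pop()
            for a in A_:
                if a in enabled[u]:
                    for v in delta[(u, a)]:
                        if v not in reached:
                            reached.add(v)
                            frontier.append(v)
        if not S_ <= reached:
            return False
    return True
```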

Strategies. A strategy of MDP M is a function σ : FPath → Dist(A) satisfying supp(σ(ω)) ⊆ E(last(ω)) for every ω ∈ FPath. Intuitively, the strategy resolves the choices of actions in each finite path by choosing (possibly at random) an action enabled in the last state of the path. We write Σ_M for the set of all strategies in M. In standard fashion [23], a strategy σ induces, for any initial state s, a probability measure Pr^σ_{M,s} over IPath_s. A strategy σ is memoryless if σ(ω) depends only on last(ω).


Objectives and values. Given a set F ⊆ S of target states, bounded reachability for step k, denoted by ♦^{≤k} F, refers to the set of all infinite paths that reach a state in F within k steps, and unbounded reachability, denoted by ♦F, refers to the set of all infinite paths that reach a state in F. Note that ♦F = ⋃_{k≥0} ♦^{≤k} F. We consider the reachability probability Pr^σ_{M,s}(♦F), and strategies that maximise this probability. We denote by V(s) the value in s, defined by sup_{σ∈Σ_M} Pr^σ_{M,s}(♦F). Given ε ≥ 0, we say that a strategy σ is ε-optimal in s if Pr^σ_{M,s}(♦F) + ε ≥ V(s), and we call a 0-optimal strategy optimal. It is known [32] that, for every MDP and set F, there is a memoryless optimal strategy for ♦F. We are interested in strategies that approximate the value function, i.e., ε-optimal strategies for some ε > 0.

2.2 Learning Algorithms for MDPs

In this paper, we study a class of learning-based algorithms that stochastically approximate the value function of an MDP. Let us fix, for this section, an MDP M = ⟨S, s, A, E, Δ⟩ and target states F ⊆ S. We denote by V : S × A → [0, 1] the value function for state-action pairs of M, defined for all (s, a) where s ∈ S and a ∈ E(s):

    V(s, a) := Σ_{s′∈S} Δ(s, a)(s′) · V(s′).

Intuitively, V(s, a) is the value in s assuming that the first action performed is a. A learning algorithm A simulates executions of M, and iteratively updates upper and lower approximations U : S × A → [0, 1] and L : S × A → [0, 1], respectively, of the value function V : S × A → [0, 1].

The functions U and L are initialised to appropriate values so that L(s, a) ≤ V(s, a) ≤ U(s, a) for all s ∈ S and a ∈ A. During the computation of A, simulated executions start in the initial state s and move from state to state according to choices made by the algorithm. The values of U(s, a) and L(s, a) are updated for the states s visited by the simulated execution. Since max_{a∈E(s)} U(s, a) and max_{a∈E(s)} L(s, a) represent upper and lower bounds on V(s), a learning algorithm A terminates when max_{a∈E(s)} U(s, a) − max_{a∈E(s)} L(s, a) < ε, where the precision ε > 0 is given to the algorithm as an argument. Note that, because U and L are possibly updated based on the simulations, the computation of the learning algorithm may be randomised and even give incorrect results with some probability.

Definition 2. Denote by A(ε) the instance of learning algorithm A with precision ε. We say that A converges surely (resp. almost surely) if, for every ε > 0, the computation of A(ε) surely (resp. almost surely) terminates, and L(s, a) ≤ V(s, a) ≤ U(s, a) holds upon termination.

In some cases, almost-sure convergence cannot be guaranteed, so we demand that the computation terminates correctly with sufficiently high probability. In such cases, we assume the algorithm is also given a confidence δ > 0 as an argument.

Definition 3. Denote by A(ε, δ) the instance of learning algorithm A with precision ε and confidence δ. We say that A is probably approximately correct (PAC) if, for every ε > 0 and every δ > 0, with probability at least 1 − δ, the computation of A(ε, δ) terminates with L(s, a) ≤ V(s, a) ≤ U(s, a).


The function U defines a memoryless strategy σ_U which in every state s chooses all actions a maximising the value U(s, a) over E(s) uniformly at random. The strategy σ_U is used in some of the algorithms and also contributes to the output.

Remark 2. If the value function is defined as the infimum over strategies (as in [31]), then the strategy chooses actions to minimise the lower value. Since we consider the dual case of supremum over strategies, the choice of σ_U is to maximise the upper value.

We also need to specify what knowledge about the MDP M is available to the learning algorithm. We distinguish the following two distinct cases.

Definition 4. A learning algorithm has limited information about M if it knows only the initial state s, a number K ≥ |S|, a number Em ≥ max_{s∈S} |E(s)|, a number 0 < q ≤ p_min, where p_min = min{Δ(s, a)(s′) | s ∈ S, a ∈ E(s), s′ ∈ supp(Δ(s, a))}, and the function E (more precisely, given a state s, the learning procedure can ask an oracle for E(s)). We assume that the algorithm may simulate an execution of M starting with s and choosing enabled actions in individual steps.

Definition 5. A learning algorithm has complete information about M if it knows the complete MDP M.

Note that the MDPs we consider are "fully observable", so even in the limited information case strategies can make decisions based on the precise state of the system.

3 MDPs without End Components

We first present algorithms for MDPs without ECs, which considerably simplifies the adaptation of BRTDP and DQL to unbounded reachability objectives. Later, in Section 4, we extend our methods to deal with arbitrary MDPs (with ECs). Let us fix an MDP M = ⟨S, s, A, E, Δ⟩ and a target set F. Formally, we assume the following.

Assumption-EC. MDP M has no ECs, except two trivial ones containing distinguished terminal states 1 and 0, respectively, with F = {1}, V(1) = 1 and V(0) = 0.

3.1 Our framework

We start by formalising a general framework for learning algorithms, as outlined in the previous section. We then instantiate this and obtain two learning algorithms: BRTDP and DQL. Our framework is presented as Algorithm 1, and works as follows. Recall that functions U and L store the current upper and lower bounds on the value function V. Each iteration of the outer loop is divided into two phases: EXPLORE and UPDATE. In the EXPLORE phase (lines 5–10), the algorithm samples a finite path ω in M from s to a state in {1, 0} by always randomly choosing one of the enabled actions that maximises the U value, and sampling the successor state using the probabilistic transition function. In the UPDATE phase (lines 11–16), the algorithm updates U and L on the state-action pairs along the path in a backward manner. Here, the function pop pops and returns the last letter of the given sequence.


Algorithm 1 Learning algorithm (for MDPs with no ECs)
 1: Inputs: An EC-free MDP M
 2: U(·, ·) ← 1, L(·, ·) ← 0
 3: L(1, ·) ← 1, U(0, ·) ← 0                         ▷ INITIALISE
 4: repeat
 5:   ω ← s                                          /* EXPLORE phase */
 6:   repeat
 7:     a ← sampled uniformly from arg max_{a∈E(last(ω))} U(last(ω), a)
 8:     s ← sampled according to Δ(last(ω), a)       ▷ GETSUCC(ω, a)
 9:     ω ← ω a s
10:   until s ∈ {1, 0}                               ▷ TERMINATEPATH(ω)
11:   repeat                                         /* UPDATE phase */
12:     s′ ← pop(ω)
13:     a ← pop(ω)
14:     s ← last(ω)
15:     UPDATE((s, a), s′)
16:   until ω = s
17: until max_{a∈E(s)} U(s, a) − max_{a∈E(s)} L(s, a) < ε   ▷ TERMINATE

3.2 Instantiations: BRTDP and DQL

Our two algorithm instantiations, BRTDP and DQL, differ in the definition of UPDATE.

Unbounded reachability with BRTDP. We obtain BRTDP by instantiating UPDATE with Algorithm 2, which requires complete information about the MDP. Intuitively, UPDATE computes new values of U(s, a) and L(s, a) by taking the weighted average of the corresponding U and L values, respectively, over all successors of s via action a. Formally, denote U(s) = max_{a∈E(s)} U(s, a) and L(s) = max_{a∈E(s)} L(s, a).

Algorithm 2 BRTDP instantiation of Algorithm 1
1: procedure UPDATE((s, a), ·)
2:   U(s, a) := Σ_{s′∈S} Δ(s, a)(s′) · U(s′)
3:   L(s, a) := Σ_{s′∈S} Δ(s, a)(s′) · L(s′)
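As a concrete illustration, the interplay of Algorithms 1 and 2 can be condensed into Python (our own sketch, not the paper's implementation; the dictionary-based MDP encoding is illustrative, with `ONE`/`ZERO` standing for the terminal states 1 and 0):

```python
import random

ONE, ZERO = "one", "zero"  # designated terminal goal / sink states

def brtdp(s0, enabled, delta, eps=1e-3, max_iters=100_000):
    U, L = {}, {}  # bounds on V(s, a); defaults U = 1, L = 0

    def ubound(s):  # max_a U(s, a), with terminal states fixed
        if s == ONE: return 1.0
        if s == ZERO: return 0.0
        return max(U.get((s, a), 1.0) for a in enabled[s])

    def lbound(s):
        if s == ONE: return 1.0
        if s == ZERO: return 0.0
        return max(L.get((s, a), 0.0) for a in enabled[s])

    for _ in range(max_iters):
        # EXPLORE phase: follow actions maximising U, sampling successors.
        path, s = [], s0
        while s not in (ONE, ZERO):
            best = max(U.get((s, a), 1.0) for a in enabled[s])
            a = random.choice(
                [a for a in enabled[s] if U.get((s, a), 1.0) == best])
            path.append((s, a))
            succ = list(delta[(s, a)])
            s = random.choices(succ, [delta[(s, a)][t] for t in succ])[0]
        # UPDATE phase: Bellman backups along the path, backwards.
        for (s, a) in reversed(path):
            U[(s, a)] = sum(p * ubound(t) for t, p in delta[(s, a)].items())
            L[(s, a)] = sum(p * lbound(t) for t, p in delta[(s, a)].items())
        if ubound(s0) - lbound(s0) < eps:
            break
    return lbound(s0), ubound(s0)
```

The greedy choice on U in the EXPLORE phase only terminates almost surely because of the EC-freeness assumption; Section 4 explains what goes wrong otherwise.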

The following theorem says that BRTDP satisfies the conditions of Definition 2 and never returns incorrect results.

Theorem 1. The algorithm BRTDP converges almost surely under Assumption-EC.

Remark 3. Note that, in the EXPLORE phase, an action maximising the value of U is chosen and the successor is sampled according to the probabilistic transition function of M. However, we can consider various modifications. Actions and successors may be chosen in different ways (e.g., for GETSUCC), for instance, uniformly at random, in a round-robin fashion, or assigning various probabilities (bounded from below by some fixed p > 0) to all possibilities in any biased way. In order to guarantee almost-sure convergence, some conditions have to be satisfied. Intuitively, we require that the state-action pairs used by ε-optimal strategies be chosen enough times. If this condition is satisfied then almost-sure convergence is preserved and the practical running times may improve significantly. For details, see Section 5.


Remark 4. The previous BRTDP algorithm is only applicable if the transition probabilities are known. However, if complete information is not known, but Δ(s, a) can be repeatedly sampled for any s and a, then a variant of BRTDP can be shown to be probably approximately correct.

Unbounded reachability with DQL. Often, complete information about the MDP is unavailable, repeated sampling is not possible, and we have to deal with only limited information about M (see Definition 4). For this scenario, we use DQL, which can be obtained by instantiating UPDATE with Algorithm 3.

Algorithm 3 DQL (delay m, estimator precision ε̄) instantiation of Algorithm 1
 1: procedure UPDATE((s, a), s′)
 2:   if c(s, a) = m and LEARN(s, a) then
 3:     if accumU_m(s, a)/m < U(s, a) − 2ε̄ then
 4:       U(s, a) ← accumU_m(s, a)/m + ε̄
 5:       accumU_m(s, a) ← 0
 6:     if accumL_m(s, a)/m > L(s, a) + 2ε̄ then
 7:       L(s, a) ← accumL_m(s, a)/m − ε̄
 8:       accumL_m(s, a) ← 0
 9:     c(s, a) ← 0
10:   else
11:     accumU_m(s, a) ← accumU_m(s, a) + U(s′)
12:     accumL_m(s, a) ← accumL_m(s, a) + L(s′)
13:     c(s, a) ← c(s, a) + 1

Macro LEARN(s, a) is true in the kth call of UPDATE((s, a), ·) if, since the (k − 2m)th call of UPDATE((s, a), ·), line 4 was not executed in any call of UPDATE(·, ·).
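The update above can be mirrored in Python as follows (a simplified sketch: the LEARN flag is taken to be always true, and the class wrapper and variable names are ours):

```python
from collections import defaultdict

class DQLUpdate:
    """State kept by the DQL-style UPDATE: bounds, accumulators, counters."""
    def __init__(self, m, ebar):
        self.m, self.ebar = m, ebar          # delay and estimator precision
        self.U = defaultdict(lambda: 1.0)    # upper bounds U(s, a)
        self.L = defaultdict(lambda: 0.0)    # lower bounds L(s, a)
        self.accU = defaultdict(float)       # accumulated U of successors
        self.accL = defaultdict(float)       # accumulated L of successors
        self.c = defaultdict(int)            # visits since last update

    def update(self, sa, u_succ, l_succ):
        """Called with U(s') and L(s') of the sampled successor s'."""
        if self.c[sa] == self.m:             # enough samples: maybe move bounds
            if self.accU[sa] / self.m < self.U[sa] - 2 * self.ebar:
                self.U[sa] = self.accU[sa] / self.m + self.ebar
                self.accU[sa] = 0.0
            if self.accL[sa] / self.m > self.L[sa] + 2 * self.ebar:
                self.L[sa] = self.accL[sa] / self.m - self.ebar
                self.accL[sa] = 0.0
            self.c[sa] = 0
        else:                                # keep accumulating estimates
            self.accU[sa] += u_succ
            self.accL[sa] += l_succ
            self.c[sa] += 1
```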

The main idea behind DQL is as follows. As the probabilistic transition function is not known, we cannot update U(s, a) and L(s, a) with the actual values Σ_{s′∈S} Δ(s, a)(s′) · U(s′) and Σ_{s′∈S} Δ(s, a)(s′) · L(s′), respectively. However, we can instead use simulations executed in the EXPLORE phase of Algorithm 1 to estimate these values. Namely, we use accumU_m(s, a)/m to estimate Σ_{s′∈S} Δ(s, a)(s′) · U(s′), where accumU_m(s, a) is the sum of the U values of the last m immediate successors of (s, a) seen during the EXPLORE phase. Note that the delay m must be chosen large enough for the estimates to be sufficiently close, i.e., ε̄-close, to the real values.

So, in addition to U(s, a) and L(s, a), the algorithm uses new variables accumU_m(s, a) and accumL_m(s, a) to accumulate U(s, a) and L(s, a) values, respectively, and a counter c(s, a) recording the number of invocations of a in s since the last update (all these variables are initialised to 0 at the beginning of computation). Assume that a has been invoked in s during the EXPLORE phase of Algorithm 1, which means that UPDATE((s, a), s′) is eventually called in the UPDATE phase of Algorithm 1 with the corresponding successor s′ of (s, a). If c(s, a) = m at that time, a has been invoked in s precisely m times since the last update concerning (s, a), and the procedure UPDATE((s, a), s′) updates U(s, a) with accumU_m(s, a)/m plus an appropriate constant ε̄ (unless LEARN is false). Here, the purpose of adding ε̄ is to make U(s, a) stay above the real value V(s, a) with high probability. If c(s, a) < m, then UPDATE((s, a), s′) simply accumulates U(s′) into accumU_m(s, a) and increases the counter c(s, a). The L(s, a) values are estimated by accumL_m(s, a)/m in a similar way, just subtracting ε̄. The procedure requires m and ε̄ as inputs, and they are chosen depending on ε and δ; more precisely, we choose ε̄ = ε·(p_min/Em)^{|S|} / (12|S|) and m = ln(6|S||A|(1 + |S||A|/ε̄)/δ) / (2ε̄²), and establish that DQL is probably approximately correct. The parameters m and ε̄ can be conservatively approximated using only the limited information about the MDP (i.e. using K, Em and q). Even though the algorithm has limited information about M, we still establish the following theorem.

[Figure omitted; only the caption is recoverable.] Fig. 1. MDP M with an EC (left), MDP M_{m1,m2} constructed from M in on-the-fly BRTDP (centre), and MDP M′ obtained from M by collapsing C = ({m1, m2}, {a, b}) (right).

Theorem 2. DQL is probably approximately correct under Assumption-EC.

Bounded reachability. Algorithm 1 can be trivially adapted to handle bounded reachability properties by preprocessing the input MDP in standard fashion. Namely, every state is equipped with a bounded counter with values ranging from 0 to k, where k is the step bound, the current value denoting the number of steps taken so far. All target states remain targets for all counter values, and every non-target state with counter value k becomes rejecting. Then, to determine the k-step reachability in the original MDP, we compute the (unbounded) reachability in the new MDP. Although this means that the number of states is multiplied by k + 1, in practice the size of the explored part of the model can be small.
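The counter construction can be sketched as follows (an illustrative encoding of our own: `delta` maps state-action pairs to successor distributions, and states paired with counter k are simply given no outgoing moves, making them rejecting sinks):

```python
# Build the step-counter product: each state is paired with a counter
# 0..k; targets stay targets for every counter value, and non-target
# states at counter k become rejecting (no enabled moves).

def step_bounded_product(delta, target, k):
    target2 = {(s, i) for s in target for i in range(k + 1)}
    delta2 = {}
    for (s, a), dist in delta.items():
        if s in target:
            continue  # target states need no outgoing moves here
        for i in range(k):  # counter < k: move and increment the counter
            delta2[((s, i), a)] = {(t, i + 1): p for t, p in dist.items()}
    return delta2, target2
```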

4 Unrestricted MDPs

We first illustrate with an example that the algorithms BRTDP and DQL as presented in Section 3 may not converge when there are ECs in the MDP.

Example 1. Consider the MDP M in Fig. 1 (left) with EC ({m1, m2}, {a, b}). The values in states m1, m2 are V(m1) = V(m2) = 0.5, but the upper bounds are U(m1) = U(m2) = 1 for every iteration. This is because U(m1, a) = U(m2, b) = 1 and both algorithms greedily choose the action with the highest upper bound. Thus, in every iteration t of the algorithm, the error for the initial state m1 is U(m1) − V(m1) = 1/2 and the algorithm does not converge. In general, any state in an EC has upper bound 1 since, by definition, there are actions that guarantee the next state is in the EC, i.e., is a state with upper bound 1. This argument holds even for standard value iteration with values initialised to 1.

One way of dealing with general MDPs is to preprocess them to identify all MECs [10,9] and "collapse" them into single states (see e.g. [15,11]). These algorithms require that the graph model is known and explore the whole state space, but this may not be possible, either due to limited information (see Definition 4) or because the model is too large. Hence, we propose a modification to the algorithms from the previous sections that allows us to deal with ECs "on-the-fly". We first describe the collapsing of a set of states and then present a crucial lemma that allows us to identify ECs to collapse.

Collapsing states. In the following, we say that an MDP M′ = ⟨S′, s̄′, A′, E′, Δ′⟩ is obtained from M = ⟨S, s̄, A, E, Δ⟩ by collapsing a tuple (R, B), where R ⊆ S and B ⊆ A with B ⊆ ⋃_{s∈R} E(s), if:

– S′ = (S \ R) ∪ {s_(R,B)},
– s̄′ is either s_(R,B) or s̄, depending on whether s̄ ∈ R or not,
– A′ = A \ B,
– E′(s) = E(s) for s ∈ S \ R; E′(s_(R,B)) = ⋃_{s∈R} E(s) \ B,
– Δ′ is defined for all s ∈ S′ and a ∈ E′(s) by:
  • Δ′(s, a)(s′) = Δ(s, a)(s′) for s, s′ ≠ s_(R,B),
  • Δ′(s, a)(s_(R,B)) = Σ_{s′∈R} Δ(s, a)(s′) for s ≠ s_(R,B),
  • Δ′(s_(R,B), a)(s′) = Δ(s, a)(s′) for s′ ≠ s_(R,B) and s the unique state with a ∈ E(s) (see Remark 1),
  • Δ′(s_(R,B), a)(s_(R,B)) = Σ_{s′∈R} Δ(s, a)(s′) where s is the unique state with a ∈ E(s).

We denote the above transformation, which creates M′ from M, as the COLLAPSE function, i.e., COLLAPSE(R, B). As a special case, given a state s and a terminal state s′ ∈ {0, 1}, we use MAKETERMINAL(s, s′) as shorthand for COLLAPSE({s, s′}, E(s)), where the new state is renamed to s′. Intuitively, after MAKETERMINAL(s, s′), every transition previously leading to state s will now lead to the terminal state s′.

For practical purposes, it is important to note that the collapsing does not need to be implemented explicitly, but can be done by keeping a separate data structure which stores information about the collapsed states.
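Such an implicit implementation could, for instance, keep a representative for each collapsed state; the explicit sketch below (our names and encoding, not PRISM's) follows the definition of COLLAPSE above and relies on each action being enabled in a unique state (Remark 1):

```python
def collapse(delta, enabled, R, B):
    """Collapse the tuple (R, B): states in R merge into one new state
    s_(R,B), actions in B disappear, and probabilities into R are summed.
    `delta` maps (state, action) -> {successor: prob}; `enabled` maps
    state -> set of actions.  A sketch of the COLLAPSE function above."""
    rep = ("collapsed", frozenset(R), frozenset(B))  # plays s_(R,B)
    merge = lambda s: rep if s in R else s
    new_delta, new_enabled = {}, {}
    for (s, a), dist in delta.items():
        if a in B:
            continue  # actions of B are removed
        out = {}
        for t, p in dist.items():
            out[merge(t)] = out.get(merge(t), 0.0) + p  # sum probs into R
        # key cannot clash: each action is enabled in a unique state
        new_delta[(merge(s), a)] = out
    for s, acts in enabled.items():
        new_enabled.setdefault(merge(s), set()).update(acts - B)
    return new_delta, new_enabled, rep
```

Applied to Fig. 1 (left) with R = {m1, m2} and B = {a, b}, the only surviving action of the new state is c, as in Fig. 1 (right).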

Identifying ECs from simulations. Our modifications will identify ECs "on-the-fly" through simulations that get stuck in them. The next lemma establishes the identification principle. To this end, for a path ω, let us denote by Appear(ω, i) the tuple (Sᵢ, Aᵢ) of M such that s ∈ Sᵢ and a ∈ Aᵢ(s) if and only if (s, a) occurs in ω more than i times.
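The Appear(ω, i) operator is straightforward to compute; a sketch assuming the path is given as a list of (state, action) pairs:

```python
from collections import Counter

def appear(path, i):
    """Appear(path, i): the states and actions occurring in the path
    more than i times; `path` is a list of (state, action) pairs."""
    counts = Counter(path)
    frequent = [sa for sa, n in counts.items() if n > i]
    states = {s for s, _ in frequent}
    actions = {}
    for s, a in frequent:
        actions.setdefault(s, set()).add(a)
    return states, actions
```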

Lemma 1. Let c = exp(−(pmin/Em)^κ / κ), where κ = KEm + 1, and let i ≥ κ. Assume that the EXPLORE phase in Algorithm 1 terminates with probability less than 1. Then, provided the EXPLORE phase does not terminate within 3i³ iterations, the conditional probability that Appear(ω, i) is an EC is at least 1 − 2cⁱ · i³ · (pmin/Em)^(−κ).

The above lemma allows us to modify the EXPLORE phase of Algorithm 1 in such a way that simulations will be used to identify ECs. The ECs discovered will subsequently be collapsed. We first present the overall skeleton (Algorithm 4) for treating ECs "on-the-fly", which consists of two parts: (i) identification of ECs; and (ii) processing them. The instantiations for BRTDP and DQL will differ in the identification phase. Hence, before proceeding to the individual identification algorithms, we first establish the correctness of the processing phase.

Algorithm 4 Extension for general MDPs
1: function ON-THE-FLY-EC
2:   M ← IDENTIFYECS                ▷ identification of ECs
3:   for all (R, B) ∈ M do          ▷ process ECs
4:     COLLAPSE(R, B)
5:     for all s ∈ R and a ∈ E(s) \ B do
6:       U(s_(R,B), a) ← U(s, a)
7:       L(s_(R,B), a) ← L(s, a)
8:     if R ∩ F ≠ ∅ then
9:       MAKETERMINAL(s_(R,B), 1)
10:    else if no actions enabled in s_(R,B) then
11:      MAKETERMINAL(s_(R,B), 0)

Lemma 2. Assume (R, B) is an EC in MDP M, V_M is the value before the PROCESS ECS procedure in Algorithm 4, and V_M′ is the value after the procedure; then:

– for i ∈ {0, 1}, if MAKETERMINAL(s_(R,B), i) is called, then ∀s ∈ R : V_M(s) = i,
– ∀s ∈ S \ R : V_M(s) = V_M′(s),
– ∀s ∈ R : V_M(s) = V_M′(s_(R,B)).

Interpretation of collapsing. Intuitively, once an EC (R, B) is collapsed, the algorithm in the EXPLORE phase can choose a state s ∈ R and action a ∈ E(s) \ B to leave the EC. This is simulated in the EXPLORE phase by considering all actions of the EC uniformly at random until s is reached, and then action a is chosen. Since (R, B) is an EC, playing all actions of B uniformly at random ensures s is almost surely reached. Note that the steps made inside a collapsed EC do not count towards the length of the explored path.
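This exploration step inside a collapsed EC can be sketched as follows (a simulation sketch with our own encoding; `enabled` and `delta` are assumed dictionary representations of E and Δ):

```python
import random

def leave_collapsed_ec(enabled, delta, B, s_exit, a_exit, start, rng=random):
    """Simulate leaving a collapsed EC: play the EC's actions (those in B)
    uniformly at random until the chosen exit state s_exit is reached,
    then take the exit action a_exit.  Because (R, B) is an EC, s_exit is
    reached almost surely; these inner steps do not count towards the
    explored path length.  (A sketch; names are ours, not PRISM's.)"""
    s = start
    while s != s_exit:
        a = rng.choice(sorted(enabled[s] & B))  # uniform over EC actions
        dist = delta[(s, a)]
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
    return delta[(s_exit, a_exit)]  # distribution of the leaving action
```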

Now, we present the on-the-fly versions of BRTDP and DQL. For each case, we describe: (i) the modification of Algorithm 1; (ii) the identification of ECs; and (iii) correctness.

4.1 Complete information (BRTDP)

Modification of Algorithm 1. To obtain BRTDP working with unrestricted MDPs, we modify Algorithm 1 as follows: for iteration i of the EXPLORE phase, we insert a check after line 9 such that, if the length of the path ω explored (i.e., the number of states) is kᵢ (see below), then we invoke the ON-THE-FLY-EC function for BRTDP. The ON-THE-FLY-EC function possibly modifies the MDP by processing (collapsing) some ECs as described in Algorithm 4. After the ON-THE-FLY-EC function terminates, we interrupt the current EXPLORE phase, and start the EXPLORE phase for the (i+1)-th iteration (i.e., generating a new path again, starting from s̄ in the modified MDP). To complete the description, we describe the choice of kᵢ and the identification of ECs.

Choice of kᵢ. Because computing ECs can be expensive, we do not call ON-THE-FLY-EC every time a new state is explored, but only after every kᵢ steps of the repeat-until loop at lines 6–10 in iteration i. The specific value of kᵢ can be decided experimentally and can change as the computation progresses. A theoretical bound for kᵢ, ensuring that there is an EC with high probability, can be obtained from Lemma 1.

Identification of ECs. Given the current explored path ω, let (T, G) be Appear(ω, 0), that is, the set of states and actions explored in ω. To obtain the ECs from the set T of explored states, we use Algorithm 5. This computes an auxiliary MDP M_T = ⟨T′, s̄, A′, E′, Δ′⟩ defined as follows:

– T′ = T ∪ {t | ∃s ∈ T, a ∈ E(s) such that Δ(s, a)(t) > 0},
– A′ = ⋃_{s∈T} E(s) ∪ {⊥},
– E′(s) = E(s) if s ∈ T, and E′(s) = {⊥} otherwise,
– Δ′(s, a) = Δ(s, a) if s ∈ T, and Δ′(s, ⊥)(s) = 1 otherwise.

It then computes all MECs of M_T that are contained in T and identifies them as ECs. The following lemma states that each of these is indeed an EC in the original MDP.

Algorithm 5 Identification of ECs for BRTDP
1: function IDENTIFYECS(M, T)
2:   compute M_T
3:   M′ ← MECs of M_T
4:   M ← {(R, B) ∈ M′ | R ⊆ T}

Lemma 3. Let M, M_T be the MDPs from the construction above and T be the set of explored states. Then every MEC (R, B) in M_T such that R ⊆ T is an EC in M.
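The construction of M_T can be sketched as follows (our encoding; the fresh action ⊥ is written `_bot`, and the MEC decomposition itself is left to the cited algorithms):

```python
BOT = "_bot"  # the fresh action '⊥'

def build_aux_mdp(delta, enabled, T):
    """Build M_T from the explored set T: keep the dynamics inside T and
    give every frontier successor outside T only a probability-1 self-loop,
    so that no MEC of M_T contained in T can rely on unexplored behaviour.
    `delta` maps (state, action) -> {successor: prob}."""
    frontier = {t for s in T for a in enabled[s]
                for t in delta[(s, a)] if t not in T}
    new_delta, new_enabled = {}, {}
    for s in T:
        new_enabled[s] = set(enabled[s])
        for a in enabled[s]:
            new_delta[(s, a)] = dict(delta[(s, a)])
    for s in frontier:
        new_enabled[s] = {BOT}
        new_delta[(s, BOT)] = {s: 1.0}  # self-loop with probability 1
    return new_delta, new_enabled
```

On the MDP of Fig. 1 (left) with T = {m1, m2}, this yields exactly the MDP in Fig. 1 (centre), where m3 carries only the ⊥ self-loop.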

Finally, we establish that the modified algorithm, which we refer to as on-the-fly BRTDP, almost surely converges; the proof is an extension of Theorem 1.

Theorem 3. On-the-fly BRTDP converges almost surely for all MDPs.

Example 2. Let us describe the execution of on-the-fly BRTDP on the MDP M from Fig. 1 (left). Choose kᵢ ≥ 6 for all i. The loop at lines 6 to 10 of Algorithm 1 generates a path ω that contains some (possibly zero) number of loops m1 a m2 b followed by m1 a m2 c m3 d t, where t ∈ {0, 1}. In the subsequent UPDATE phase, we set U(m3, d) = L(m3, d) = 0.5 and then U(m2, c) = L(m2, c) = 0.5; none of the other values change. In the second iteration of the loop at lines 6 to 10, the path ω′ = m1 a m2 b m1 a m2 b . . . is being generated, and the newly inserted check for ON-THE-FLY-EC will be triggered once ω′ reaches the length kᵢ.

The algorithm now aims to identify ECs in the MDP based on the part of the MDP explored so far. To do so, the MDP M_T for the set T = {m1, m2} is constructed and depicted in Fig. 1 (centre). We then run MEC detection on M_T, finding that ({m1, m2}, {a, b}) is an EC, and so it gets collapsed according to the COLLAPSE procedure. This gives the MDP M′ from Fig. 1 (right).

The execution then continues with M′. A new path is generated at lines 6 to 10 of Algorithm 1; suppose it is ω″ = s_C c m3 d 0. In the UPDATE phase we then update the value U(s_C, c) = L(s_C, c) = 0.5, which makes the condition at the last line of Algorithm 1 satisfied, and the algorithm finishes, having computed the correct value.


4.2 Limited information (DQL)

Modification of Algorithm 1 and identification of ECs. The modification of Algorithm 1 is done exactly as for BRTDP (i.e., we insert a check after line 9 of EXPLORE, which invokes the ON-THE-FLY-EC function if the length of path ω exceeds kᵢ). In iteration i, we set kᵢ = 3ℓᵢ³, for some ℓᵢ (to be described later). The identification of the EC is as follows: we consider Appear(ω, ℓᵢ), the set of states and actions that have appeared more than ℓᵢ times in the explored path ω, which is of length 3ℓᵢ³, and identify this set as an EC; i.e., M in line 2 of Algorithm 4 is defined as the set containing the single tuple Appear(ω, ℓᵢ). We refer to the algorithm as on-the-fly DQL.

Choice of ℓᵢ and correctness. The choice of ℓᵢ is as follows. Note that, in iteration i, the error probability, obtained from Lemma 1, is at most 2c^{ℓᵢ} · ℓᵢ³ · (pmin/Em)^(−κ), and we choose ℓᵢ such that 2c^{ℓᵢ} · ℓᵢ³ · (pmin/Em)^(−κ) ≤ δ/2^{2i}, where δ is the confidence. Note that, since c < 1, c^{ℓᵢ} decreases exponentially, and hence for every i such an ℓᵢ exists. It follows that the total error of the algorithm due to the on-the-fly EC collapsing is at most δ/2. It follows from the proof of Theorem 2 that for on-the-fly DQL the error is at most δ if we use the same ε̄ as for DQL, but now with DQL confidence δ/4, i.e., with m = ln(24|S||A|(1 + |S||A|/ε̄)/δ) / (2ε̄²). As before, these numbers can be conservatively approximated using the limited information.
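Since c^ℓ decreases exponentially, the smallest suitable ℓᵢ can be found by direct search; a sketch evaluating the error expression above (names ours):

```python
def choose_ell(i, c, p_min, e_max, kappa, delta):
    """Smallest ell with 2 * c**ell * ell**3 * (p_min/e_max)**(-kappa)
    <= delta / 2**(2*i); such an ell exists for every i because c < 1."""
    assert 0 < c < 1
    bound = delta / 2 ** (2 * i)
    ell = 1
    while 2 * c ** ell * ell ** 3 * (p_min / e_max) ** (-kappa) > bound:
        ell += 1  # the error term eventually decays exponentially in ell
    return ell
```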

Theorem 4. On-the-fly DQL is probably approximately correct for all MDPs.

Example 3. Let us now briefly explain the execution of on-the-fly DQL on the MDP M from Fig. 1 (left). At first, paths of the same form as ω in Example 2 will be generated and there will be no change to U and L, because in any call to UPDATE (see Algorithm 3) for states s ∈ {m1, m2} with c(s, a) = m, the values accumulated in accumU_m(s, a)/m and accumL_m(s, a)/m are the same as the values already held, namely 1 and 0, respectively.

At some point, we call UPDATE for the tuple (m3, d) with c(m3, d) = m, which will result in the change of U(m3, d) and L(m3, d). Note that, at this point, the numbers accumU_m(m3, d)/m and accumL_m(m3, d)/m are both equal to the proportion of generated paths that visited the state 1. This number will, with high probability, be very close to 0.5, say 0.499. We thus set U(m3, d) = 0.499 + ε̄ and L(m3, d) = 0.499 − ε̄.

We then keep generating paths of the same form and at some point also update U(m2, c) and L(m2, c) to precisely 0.499 + ε̄ and 0.499 − ε̄, respectively. The subsequently generated path will be looping on m1 and m2, and once it is of length kᵢ = 3ℓᵢ³, we identify ({m1, m2}, {a, b}) as an EC due to the definition of Appear(ω, ℓᵢ). We then get the MDP from Fig. 1 (right), which we use to generate new paths, until the upper and lower bounds on the value in the new initial state are within the required bound.

4.3 Extension to LTL

So far we have focused on reachability, but our techniques also extend to linear temporal logic (LTL) objectives. By translating an LTL formula to an equivalent deterministic ω-automaton, verifying MDPs with LTL objectives reduces to the analysis of MDPs with ω-regular conditions such as Rabin acceptance conditions. A Rabin acceptance condition consists of a set {(M₁, N₁), . . . , (M_d, N_d)} of d pairs (Mᵢ, Nᵢ), where each Mᵢ ⊆ S and Nᵢ ⊆ S. The acceptance condition requires that, for some 1 ≤ i ≤ d, states in Mᵢ are visited infinitely often and states in Nᵢ are visited finitely often.

Value computation for MDPs with Rabin objectives reduces to optimal reachability of winning ECs, where an EC (R, B) is winning if R ∩ Mᵢ ≠ ∅ and R ∩ Nᵢ = ∅ for some 1 ≤ i ≤ d [12]. Thus, extending our results from reachability to Rabin objectives requires processing of ECs for Rabin objectives (lines 3–11 of Algorithm 4), which is done as follows. Once an EC is identified, we first obtain the EC in the original MDP (i.e., obtain the set of states and actions corresponding to the EC in the original MDP) as (R, B), and then determine if there is a sub-EC of (R, B) that is winning, using standard algorithms for MDPs with Rabin objectives [2]; if so, then we merge the whole EC as in line 9 of Algorithm 4; if not, and, moreover, there is no action out of the EC, we merge as in line 11 of Algorithm 4. This modified EC processing yields on-the-fly BRTDP and DQL algorithms for MDPs with Rabin objectives.
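The winning condition for an EC can be checked directly against the Rabin pairs; the sketch below tests the whole EC only, whereas the full procedure searches for a winning sub-EC using the cited algorithms:

```python
def ec_satisfies_rabin(R, rabin_pairs):
    """Check the Rabin acceptance condition for an EC with state set R:
    some pair (M_i, N_i) must have R ∩ M_i non-empty and R ∩ N_i empty.
    (A winning sub-EC may still exist when this check fails; finding it
    requires the standard Rabin-MDP algorithms referenced above.)"""
    return any(R & M and not (R & N) for M, N in rabin_pairs)
```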

5 Experimental Results

Implementation. We have developed an implementation of our learning-based framework within the PRISM model checker [25], building upon its simulation engine for generating trajectories and its explicit probabilistic model checking engine for storing visited states and U and L values. We focus on the complete-information case (i.e., BRTDP), for which we can perform a more meaningful comparison with PRISM. We implement Algorithms 1 and 2, and the on-the-fly EC detection algorithm of Sec. 4, with the optimisation of taking T as the set of all states explored so far.

We consider three distinct variants of the learning algorithm by modifying the GETSUCC function in Algorithm 1, which is the heuristic responsible for picking a successor state s′ after choosing some action a in each state s of a trajectory. The first variant takes the unmodified GETSUCC, selecting s′ at random according to the distribution Δ(s, a). This behaviour follows that of the original RTDP algorithm [3]. The second uses the heuristic proposed for BRTDP in [31], selecting the successor s′ ∈ supp(Δ(s, a)) that maximises the difference U(s′) − L(s′) between the bounds for those states (M-D). For the third, we propose an alternative approach that systematically chooses all successors s′ in a round-robin (R-R) fashion, and guarantees termination with certainty.
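The three heuristics can be sketched as follows (our function names; `dist` is Δ(s, a) as a successor-to-probability dictionary, and U, L map states to bounds):

```python
import random
from itertools import cycle

def getsucc_rtdp(dist, U, L, rng=random):
    """RTDP: sample s' according to Delta(s, a); bounds are unused."""
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs)[0]

def getsucc_max_diff(dist, U, L):
    """M-D: pick the successor maximising U(s') - L(s')."""
    return max(dist, key=lambda t: U[t] - L[t])

def make_getsucc_round_robin():
    """R-R: cycle deterministically through the successors of each
    (state, action) pair, ensuring every successor is eventually taken."""
    iters = {}
    def getsucc(key, dist, U, L):
        if key not in iters:
            iters[key] = cycle(sorted(dist))
        return next(iters[key])
    return getsucc
```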

Results. We evaluated our implementation on four existing benchmark models, using a machine with a 2.8GHz Xeon processor and 32GB of RAM, running Fedora 14. We use three models from the PRISM benchmark suite [26]: zeroconf, wlan, and firewire_impl_dl; and a fourth one from [16]: mer. The first three use unbounded probabilistic reachability properties; the fourth a time-bounded probabilistic reachability property. The latter is used to show differences between heuristics in the case of MDPs containing rare events, e.g., MDPs where failures occur with very low probability. All models, properties and logs are available online at [40].

We run BRTDP and compare its performance to PRISM. We terminate it when the bounds L and U differ by at most ε for the initial state of the MDP. We use ε = 10⁻⁶ in all cases except zeroconf, where ε = 10⁻⁸ is used since the actual values are very small. For PRISM, we use its fastest engine, which is the "sparse" engine, running value iteration. This is terminated when the values for all states in successive iterations differ


Name                   Param.        Num.        |  Time (s)                   |  Visited states
[param.s]              values        states      |  PRISM  RTDP  M-D   R-R    |  RTDP    M-D     R-R
zeroconf               20, 10        3,001,911   |  129.9  7.40  1.47  1.83   |  760     2007    2570
[N, K]                 20, 14        4,427,159   |  218.2  12.4  2.18  2.26   |  977     3728    3028
                       20, 18        5,477,150   |  303.8  71.5  3.89  3.73   |  1411    5487    3704
wlan                   4             345,000     |  7.35   0.53  0.48  0.54   |  2018    1377    1443
[BOFF]                 5             1,295,218   |  22.3   0.55  0.45  0.54   |  2053    1349    1542
                       6             5,007,548   |  82.9   0.50  0.43  0.49   |  1995    1313    1398
firewire_impl_dl       36, 200       6,719,773   |  63.8   2.85  2.62  2.26   |  26,508  28,474  22,038
[delay, deadline]      36, 240       13,366,666  |  145.4  8.37  7.69  6.72   |  25,214  26,680  20,219
                       36, 280       19,213,802  |  245.4  9.29  7.90  7.39   |  32,214  28,463  25,565
mer                    3000, 0.0001  17,722,564  |  158.5  67.0  2.42  4.44   |  1950    3116    3729
[N, q]                 3000, 0.9999  17,722,564  |  157.7  10.9  2.82  6.80   |  2902    4643    4608
                       4500, 0.0001  26,583,064  |  250.7  67.3  2.41  4.42   |  1950    3118    3729
                       4500, 0.9999  26,583,064  |  246.6  10.9  2.84  6.79   |  2900    4644    4608

Table 1. Verification times using BRTDP (three different heuristics) and PRISM.

by at most ε. Strictly speaking, this is not guaranteed to produce an ε-optimal strategy (e.g. in the case of very slow numerical convergence), but on all these examples it does.
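The value iteration and termination criterion described here can be sketched as (a simplified max-reachability iteration with our own MDP encoding, not PRISM's sparse engine):

```python
def value_iteration(delta, targets, states, eps=1e-6):
    """Max reachability probabilities by value iteration, terminated when
    successive values differ by at most eps in every state (the criterion
    used for PRISM above; not guaranteed eps-optimal in general)."""
    V = {s: 1.0 if s in targets else 0.0 for s in states}
    while True:
        newV = {}
        for s in states:
            if s in targets:
                newV[s] = 1.0
                continue
            dists = [d for (s2, _a), d in delta.items() if s2 == s]
            newV[s] = max((sum(p * V[t] for t, p in d.items()) for d in dists),
                          default=0.0)  # no enabled action: value 0
        if max(abs(newV[s] - V[s]) for s in states) <= eps:
            return newV
        V = newV
```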

The experimental results are summarised in Table 1. For each model, we give the number of states in the full model, the time for PRISM (model construction, precomputation of zero/one states and value iteration), and the time and number of visited states for BRTDP with each of the three heuristics described earlier. Some heuristics perform random exploration and therefore all results have been averaged over 20 runs.

We see that our method outperforms PRISM on all four benchmarks. The improvements in execution time on these benchmarks are possible because the algorithm is able to construct an ε-optimal policy whilst exploring only a portion of the state space. The number of states visited by the algorithm is at least two orders of magnitude smaller than the total size of the model (column 'Num. states'). These numbers do not vary greatly between heuristics.

The RTDP heuristic is generally the slowest of the three, and tends to be sensitive to the probabilities in the model. In the mer example, changing the parameter q can mean that some states, which are crucial for the convergence of the algorithm, are no longer visited due to low probabilities on incoming transitions. This results in a considerable slow-down, and is a potential problem for MDPs containing rare events. The M-D and R-R heuristics perform very similarly, despite being quite different (one is randomised, the other deterministic). Both perform consistently well on these examples.

6 Conclusions

We have presented a framework for verifying MDPs using learning algorithms. Building upon methods from the literature, we provide novel techniques to analyse unbounded probabilistic reachability properties of arbitrary MDPs, yielding either exact bounds, in the case of complete information, or PAC bounds, in the case of limited information. Given our general framework, one possible direction would be to explore other learning algorithms in the context of verification. Another direction of future work is to explore whether learning algorithms can be combined with symbolic methods for probabilistic verification.

Acknowledgement. We thank Arnd Hartmanns and the anonymous reviewers for careful reading and valuable feedback.


References

1. Aljazzar, H., Leue, S.: Generation of counterexamples for model checking of Markov decision processes. In: QEST. pp. 197–206 (2009)
2. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
3. Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artificial Intelligence 72(1–2), 81–138 (1995)
4. Bogdoll, J., Fioriti, L.M.F., Hartmanns, A., Hermanns, H.: Partial order methods for statistical model checking and simulation. In: FMOODS/FORTE. pp. 59–74 (2011)
5. Bogdoll, J., Hartmanns, A., Hermanns, H.: Simulation and statistical model checking for modestly nondeterministic models. In: MMB/DFT. pp. 249–252 (2012)
6. Boyer, B., Corre, K., Legay, A., Sedwards, S.: PLASMA-lab: A flexible, distributable statistical model checking library. In: QEST. pp. 160–164 (2013)
7. Brázdil, T., Chatterjee, K., Chmelík, M., Forejt, V., Křetínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. CoRR abs/1402.2967 (2014)
8. Bulychev, P.E., David, A., Larsen, K.G., Mikučionis, M., Poulsen, D.B., Legay, A., Wang, Z.: UPPAAL-SMC: Statistical model checking for priced timed automata. In: QAPL (2012)
9. Chatterjee, K., Henzinger, M.: An O(n²) algorithm for alternating Büchi games. In: SODA. pp. 1386–1399 (2012)
10. Chatterjee, K., Henzinger, M.: Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. In: SODA (2011)
11. Ciesinski, F., Baier, C., Größer, M., Klein, J.: Reduction techniques for model checking Markov decision processes. In: QEST. pp. 45–54 (2008)
12. Courcoubetis, C., Yannakakis, M.: Markov decision processes and regular events (extended abstract). In: ICALP. pp. 336–349 (1990)
13. David, A., Larsen, K.G., Legay, A., Mikučionis, M., Poulsen, D.B., van Vliet, J., Wang, Z.: Statistical model checking for networks of priced timed automata. In: FORMATS (2011)
14. David, A., Larsen, K.G., Legay, A., Mikučionis, M., Wang, Z.: Time for statistical model checking of real-time systems. In: CAV. pp. 349–355 (2011)
15. De Alfaro, L.: Formal verification of probabilistic systems. Ph.D. thesis (1997)
16. Feng, L., Kwiatkowska, M., Parker, D.: Automated learning of probabilistic assumptions for compositional reasoning. In: FASE. pp. 2–17 (2011)
17. He, R., Jennings, P., Basu, S., Ghosh, A.P., Wu, H.: A bounded statistical approach for model checking of unbounded until properties. In: ASE. pp. 225–234 (2010)
18. Henriques, D., Martins, J., Zuliani, P., Platzer, A., Clarke, E.M.: Statistical model checking for Markov decision processes. In: QEST. pp. 84–93 (2012)
19. Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: VMCAI. pp. 307–329 (2004)
20. Jegourel, C., Legay, A., Sedwards, S.: Cross-entropy optimisation of importance sampling parameters for statistical model checking. In: CAV. pp. 327–342 (2012)
21. Jegourel, C., Legay, A., Sedwards, S.: A platform for high performance statistical model checking - PLASMA. In: TACAS. pp. 498–503 (2012)
22. Jegourel, C., Legay, A., Sedwards, S.: Importance splitting for statistical model checking rare properties. In: CAV. pp. 576–591 (2013)
23. Kemeny, J., Snell, J., Knapp, A.: Denumerable Markov Chains. Springer-Verlag (1976)
24. Kolobov, A., Mausam, Weld, D.S., Geffner, H.: Heuristic search for generalized stochastic shortest path MDPs. In: ICAPS (2011)
25. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: CAV. pp. 585–591 (2011)
26. Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST. pp. 203–204 (2012)
27. Larsen, K.G.: Priced timed automata and statistical model checking. In: IFM (2013)
28. Lassaigne, R., Peyronnet, S.: Approximate planning and verification for large Markov decision processes. In: SAC. pp. 1314–1319 (2012)
29. Legay, A., Sedwards, S.: Lightweight Monte Carlo algorithm for Markov decision processes. CoRR abs/1310.3609 (2013)
30. Legay, A., Sedwards, S., Traonouez, L.: Scalable verification of Markov decision processes. In: SEFM. pp. 350–362 (2014)
31. McMahan, H.B., Likhachev, M., Gordon, G.J.: Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In: ICML (2005)
32. Puterman, M.: Markov Decision Processes. Wiley (1994)
33. Rabih, D.E., Pekergin, N.: Statistical model checking using perfect simulation. In: ATVA. pp. 120–134 (2009)
34. Sen, K., Viswanathan, M., Agha, G.: On statistical model checking of stochastic systems. In: CAV. pp. 266–280 (2005)
35. Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: CAV. pp. 202–215 (2004)
36. Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: ICML. pp. 881–888 (2006)
37. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press (1998)
38. Younes, H., Simmons, R.: Probabilistic verification of discrete event systems using acceptance sampling. In: CAV. pp. 223–235 (2002)
39. Younes, H.L.S., Clarke, E.M., Zuliani, P.: Statistical verification of probabilistic properties with unbounded until. In: SBMF. pp. 144–160 (2010)
40. http://www.prismmodelchecker.org/files/atva14learn/


Analysing Reaction Networks using Chemical Organisation Theory and Probabilistic Model Checking

Peter Dittrich¹, Chunyan Mu², David Parker², and Jonathan E. Rowe²

¹ Institute of Computer Science, Friedrich-Schiller-University Jena
² School of Computer Science, University of Birmingham

Abstract. Probabilistic model checking is a formal verification technique for the analysis of systems that exhibit stochastic behaviour. In this report, we show its applicability to the quantitative analysis of chemical reaction networks. More precisely, we study such systems in the context of chemical organisation theory, which is an approach to simplifying the analysis of complex networks. We develop algorithms and an implementation, as an extension of PRISM, to investigate the quantitative behaviour of reaction networks in terms of organisations.

1 Introduction

In this report, we study reaction networks and chemical organisation theory, in particular investigating the applicability of probabilistic model checking to their analysis. Reaction networks are widely used in modelling chemical phenomena. They describe the dynamical interactions between processes of living systems in a formal way. Reaction networks can be difficult to understand and analyse, since they can represent complex interaction behaviour over large state spaces.

Chemical organisation theory [5,4] provides a way to analyse complex dynamical networks. It defines the notion of an organisation, which is a set of objects (for example, the species or molecules in a reaction system) that is closed and self-maintaining. Informally, closed means that no new object can be produced by the interactions within the set, and self-maintaining means that no object of the set disappears from the system, i.e., every consumed object of the set can be generated within the set. The dynamics of the complex state space of the reaction network can then be mapped to movements among the set of organisations, simplifying the analysis of the overall system.
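As a rough illustration, closure can be checked directly from a reaction list; the self-maintenance test below is only a naive necessary condition (the full definition is flux-based), and the encoding, which ignores stoichiometry, is ours:

```python
def is_closed(species, reactions):
    """Closed: every reaction applicable within the set produces only
    species of the set.  Reactions are (reactants, products) pairs."""
    return all(set(prod) <= species
               for react, prod in reactions if set(react) <= species)

def is_self_maintaining_naive(species, reactions):
    """Necessary condition for self-maintenance: each species consumed
    inside the set can also be produced inside the set.  (The full
    definition requires a suitable flux vector, omitted in this sketch.)"""
    inner = [(set(r), set(p)) for r, p in reactions if set(r) <= species]
    consumed = set().union(*(r for r, _ in inner)) if inner else set()
    produced = set().union(*(p for _, p in inner)) if inner else set()
    return consumed <= produced
```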

In order to study the evolution of reaction networks, we apply probabilistic model checking, a formal verification technique for modelling and analysis of systems with stochastic behaviour. It has been used to study models across a wide range of application domains, including chemical and biological systems. Probabilistic model checking is based on the exhaustive construction and analysis of a probabilistic model, typically a Markov chain, of a real-life system. In this work, we model the reaction networks as continuous-time Markov chains. Quantitative properties of interest about the system being analysed are formally specified using temporal logic. Here we use CSL (Continuous Stochastic Logic) with rewards, an extension of the temporal logic CTL.

Specifically, in this work, we use CSL model checking of continuous-time Markov chains to investigate connections between chemical organisations, using model decompositions into strongly connected components (SCCs). We develop an algorithm to automatically find organisations, and then perform a quantitative dynamical analysis in terms of organisations, asking, for example, "what is the probability of moving from one organisation to another?" or "what is the expected time to leave an organisation?" We implement our techniques as an extension of the probabilistic model checking tool PRISM [8], and illustrate the approach on a set of example reaction networks.

Organisation. This report is organised as follows. Section 2 gives an overview of probabilistic model checking, including definitions of continuous-time Markov chains (CTMCs) and the temporal logic CSL. Section 3 presents the details of how to model chemical reaction networks as CTMCs and introduces definitions for building connections with chemical organisation theory. Section 4 introduces an algorithm to find organisations for a chemical reaction network and proposes methods for a quantitative organisation-based analysis using PRISM.

2 Probabilistic Model Checking

Probabilistic model checking is a variant of model checking [3], a well-established formal method to automatically verify the correctness of real-life systems. Classical model checking answers the question of whether the behaviour of a given system satisfies a property or not. It thus requires two inputs: a description of the system and a specification of one or more required properties of that system, normally in temporal logic (such as CTL or LTL).

In probabilistic model checking, the models are extended with quantitative information regarding the likelihood that transitions take place. In practice, these models are usually Markov chains or Markov decision processes. In this work, we model the reaction systems as continuous-time Markov chains (CTMCs). Properties expressed in temporal logic are quantitative in nature. For instance, instead of verifying that "species A eventually vanishes", we ask "what is the probability that species A eventually vanishes?" In this work, we employ the temporal logic CSL (Continuous Stochastic Logic).

The remainder of this section reviews some preliminary definitions for the probabilistic model checking techniques we use in this report.

2.1 Continuous-Time Markov Chains

Continuous-time Markov chains (CTMCs) are widely used in fields such as performance analysis and systems biology to model systems with stochastic real-time behaviour. Formally, we define them as follows.


Definition 1 (CTMC). A CTMC is a tuple $A = (Q, Q_0, \Delta, L)$, where:

– $Q$ is a finite set of states;
– $Q_0 \subseteq Q$ is the set of initial states;
– $\Delta : Q \times Q \to \mathbb{R}_{\geq 0}$ is the transition rate matrix;
– $L : Q \to 2^{AP}$ is a labelling function which assigns to each state $q \in Q$ the set $L(q)$ of atomic propositions, from a set $AP$, that are true in state $q$.

The transition rate matrix $\Delta$ assigns a rate to each pair of states in the CTMC, which is used as the parameter of an exponential distribution. A transition happens between $q$ and $q'$ if $\Delta(q, q') > 0$, such that the probability of this transition being fired within $t$ time units equals $1 - e^{-\Delta(q,q') \cdot t}$. For any state $q \in Q$, a race condition applies when there is more than one state $q' \in Q$ such that $\Delta(q, q') > 0$. The time spent in state $q$ before any transition occurs is exponentially distributed with rate $E(q) \triangleq \sum_{q' \in Q} \Delta(q, q')$.

Ignoring time, the probability of a transition from state $q$ to $q'$ can be calculated using a discrete-time Markov chain (DTMC) called the embedded DTMC of the CTMC, which is defined as follows.

Definition 2 (Embedded DTMC). The embedded DTMC of a CTMC $A = (Q, Q_0, \Delta, L)$ is the DTMC $A^{emb} = (Q, Q_0, \Delta^{emb}, L)$, where for $q, q' \in Q$:

$$\Delta^{emb}(q, q') = \begin{cases} \Delta(q, q') / E(q) & \text{if } E(q) \neq 0 \\ 1 & \text{if } E(q) = 0 \wedge q = q' \\ 0 & \text{otherwise} \end{cases}$$

The embedded DTMC can be used to study untimed properties of the behaviour of the CTMC.
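The quantities in Definitions 1 and 2 are straightforward to compute on an explicit rate matrix. The following is a small illustrative sketch (not part of the report's tooling; the dictionary-based representation is an assumption made for the example):

```python
# Illustrative sketch (representation assumed): exit rates E(q) and the
# embedded DTMC of Definition 2, with the rate matrix Delta stored as a
# dictionary {(q, q2): rate} over a list of states.

def exit_rate(rates, q, states):
    """E(q) = sum over q2 of Delta(q, q2)."""
    return sum(rates.get((q, q2), 0.0) for q2 in states)

def embedded_dtmc(rates, states):
    """Definition 2: Delta(q, q2)/E(q); absorbing states self-loop."""
    probs = {}
    for q in states:
        e = exit_rate(rates, q, states)
        for q2 in states:
            if e != 0:
                probs[(q, q2)] = rates.get((q, q2), 0.0) / e
            else:
                probs[(q, q2)] = 1.0 if q == q2 else 0.0
    return probs

states = ["s0", "s1", "s2"]
rates = {("s0", "s1"): 3.0, ("s0", "s2"): 1.0, ("s1", "s2"): 2.0}
P = embedded_dtmc(rates, states)  # s0 -> s1 with prob 0.75; s2 absorbing
```

Note how the race condition of Definition 1 turns into the normalised probabilities of Definition 2: from s0 the two outgoing rates 3.0 and 1.0 compete, giving probabilities 3/4 and 1/4 in the embedded DTMC.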

2.2 Continuous Stochastic Logic

In this work, we use the probabilistic temporal logic CSL (Continuous Stochastic Logic) to formally represent properties of reaction networks. It was originally introduced by Aziz et al. [1] and extended by Baier et al. [2]. The extended version allows for the specification of reward (or cost) properties, to reason about rewards (or costs) that have been attached to a CTMC. The extended version of CSL that we use allows us to represent properties such as "the probability of all of species A degrading within t time units is at most 0.1" or "the expected time elapsed before a B molecule first appears is at most 10".

Definition 3 (CSL syntax). An (extended) CSL formula is an expression derived from the grammar:

$$\phi ::= \mathrm{true} \mid p \mid \neg\phi \mid \phi \wedge \phi \mid \mathrm{P}_{\bowtie\lambda}(\psi) \mid \mathrm{S}_{\bowtie\lambda}(\phi) \mid \mathrm{R}_{\bowtie r}[\Diamond\phi] \qquad \psi ::= \mathrm{X}\phi \mid \phi\,\mathrm{U}^{I}\,\phi$$

where $p \in AP$ is an atomic proposition, $\lambda \in [0,1]$ is a probability threshold, $r \in \mathbb{R}_{\geq 0}$ is a reward threshold, $\bowtie\, \in \{<, \leq, \geq, >\}$ and $I$ is an interval of $\mathbb{R}_{\geq 0}$.


CSL formulas are interpreted over the states of the Markov chain, and the syntax distinguishes between state formulae $\phi$ and path formulae $\psi$, which are evaluated over states and paths respectively. A state $q$ satisfies $\mathrm{P}_{\bowtie\lambda}(\psi)$ if the probability of taking a path from $q$ satisfying $\psi$ is in the interval specified by $\bowtie\lambda$. Path formulae include the "next" operator $\mathrm{X}$ and the "until" operator $\mathrm{U}$, which are standard in temporal logic. $\phi\,\mathrm{U}^{I}\,\phi'$ asserts that $\phi'$ is satisfied at some future time point within interval $I$, and that $\phi$ is true up until that point. Common derived operators include: "eventually" $\Diamond^{I}\phi := \mathrm{true}\,\mathrm{U}^{I}\,\phi$ and "always" $\Box^{I}\phi := \neg\Diamond^{I}\neg\phi$. For example, $\mathrm{P}_{\geq\lambda}(\Box^{I}\phi) \equiv \mathrm{P}_{\leq 1-\lambda}(\Diamond^{I}\neg\phi)$.

The S operator describes the steady-state (long-run) behaviour of the CTMC. The formula $\mathrm{S}_{\bowtie\lambda}(\phi)$ specifies that the steady-state probability of being in a state satisfying $\phi$ meets the bound $\bowtie\lambda$. The R operator is used for reward properties: $\mathrm{R}_{\bowtie r}[\Diamond\phi]$ is true from state $q$ if the expected reward cumulated before a state satisfying $\phi$ is reached meets the bound $\bowtie r$. Rewards and costs are treated identically: here, we will use the R operator to formalise properties about the expected time elapsing before an event's occurrence.

We omit a full definition of the semantics of CSL with respect to a Markov chain. Full details can be found in, for example, [2].

For the purpose of quantitative analysis, we allow the bounds $\bowtie\lambda$ and $\bowtie r$ attached to the P, S and R operators to be replaced with $=?$. This allows us to use numerical properties such as $\mathrm{P}_{=?}(\mathrm{true}\,\mathrm{U}^{[t,t]}\,\phi)$ ("what is the probability of $\phi$ being true at time $t$?") and $\mathrm{R}_{=?}(\Diamond\phi)$ ("what is the expected reward accumulated before a state satisfying $\phi$ is reached?").

3 Modelling Reaction Networks with CTMCs

This section reviews the definitions of reaction networks and the basic concepts of organisation theory. It also explains how to model reaction networks as CTMCs and studies the connections between chemical organisations and model decompositions into strongly connected components (SCCs).

3.1 Reaction Networks

A reaction network consists of a set of molecules, of various species, and interaction rules amongst them that lead to their production or removal. Given a set $C$, we will denote by $P_M(C)$ the set of all multisets of elements from $C$. The description of a reaction network is then formally given as follows.

Definition 4. A reaction network is a pair $(\mathcal{M}, \mathcal{R})$ consisting of a set of possible molecular species $\mathcal{M}$, and a set $\mathcal{R} \subseteq P_M(\mathcal{M}) \times P_M(\mathcal{M})$ of possible reactions among those species. For a reaction $(R, P) \in \mathcal{R}$, the multisets $R$ and $P$ denote the reactants and products of the reaction, respectively, and we write $R(s)$ and $P(s)$ for the number of molecules of species $s$ consumed by and produced by the reaction, respectively.

For simplicity, we write $s_1 + s_2 + \cdots + s_n \to s'_1 + s'_2 + \cdots + s'_{n'}$ instead of $(\{s_1, s_2, \ldots, s_n\}, \{s'_1, s'_2, \ldots, s'_{n'}\}) \in \mathcal{R}$ to denote the existence of a reaction.
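Definition 4 can be represented directly with multisets. The sketch below (an illustration, not the report's implementation) uses Python's `Counter` for the reactant and product multisets and checks when a reaction is enabled in a population state:

```python
from collections import Counter

# Illustrative sketch of Definition 4: reactions as pairs of multisets
# (reactants R, products P), states as multisets of species counts.
# (This representation is an assumption, not the report's code.)

reactions = [
    (Counter({"a": 1, "b": 1}), Counter({"a": 1, "b": 2})),  # a + b -> a + 2b
    (Counter({"a": 1}), Counter({"a": 2})),                  # a -> 2a
    (Counter({"a": 1}), Counter()),                          # a -> (empty)
]

def enabled(state, reaction):
    """A reaction can fire in q iff q(s) >= R(s) for every species s."""
    R, _ = reaction
    return all(state[s] >= n for s, n in R.items())

def fire(state, reaction):
    """Apply q'(s) = q(s) - R(s) + P(s)."""
    R, P = reaction
    new = Counter(state)
    new.subtract(R)
    new.update(P)
    return +new  # drop zero-count entries

q = Counter({"a": 1, "b": 2})
```

For example, firing `a + b -> a + 2b` in the state with one `a` and two `b` molecules yields one `a` and three `b` molecules, exactly the update rule used in Definition 5 below.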


3.2 Reaction Networks as CTMCs

There are multiple ways in which we can model and analyse the behaviour of a reaction network. One way is to consider (real-valued) concentrations of each molecular species and then represent the (deterministic) behaviour of the reactions as a set of ordinary differential equations. An alternative is to take a discrete, stochastic view of the network, modelling the (integer-valued) population count of each species and considering its evolution as a stochastic process, typically a continuous-time Markov chain [6]. The latter is particularly appropriate when the numbers of molecules can be assumed to be relatively small in practice, and is the approach that we take in this work.

Furthermore, we will also assume that the reaction network is executing within a finite volume, which is modelled by limiting the total number $N_{max} \in \mathbb{N}$ of molecules that can be present at any given time [7]. We also need to define the rates at which reaction events occur in the CTMC. To retain a general approach, we allow an arbitrary function $rate_r$ from reactant populations to rate values for each reaction $r$. A typical default is to multiply the number of molecules of each reactant by a fixed kinetic rate associated with the reaction.

Definition 5 (CTMC for reaction network). Given a reaction network $(\mathcal{M}, \mathcal{R})$, a volume limit $N_{max} \in \mathbb{N}$ and a rate function $rate_r : \mathbb{N}^{\mathcal{M}} \to \mathbb{R}_{\geq 0}$ for each $r \in \mathcal{R}$, we define the corresponding CTMC $A = (Q, Q, \Delta, L)$ where:

– $Q = \{q : \mathcal{M} \to \mathbb{N} \mid \sum_{s \in \mathcal{M}} q(s) \leq N_{max}\}$ is the set of population counts of $\mathcal{M}$;

– $\Delta$ is defined as follows. For states $q, q' \in Q$, we write $q \xrightarrow{(R,P)} q'$ if and only if, for each species $s \in \mathcal{M}$, we have $q(s) \geq R(s)$ and $q'(s) = q(s) - R(s) + P(s)$, and $\sum_{s \in \mathcal{M}} q'(s) \leq N_{max}$. Then, for any $q, q' \in Q$, we have:

$$\Delta(q, q') = \sum \{\!|\ rate_r(q) \mid r = (R,P) \in \mathcal{R} \text{ and } q \xrightarrow{(R,P)} q'\ |\!\}.$$

As mentioned above, a common default is to take $rate_r(q) = \lambda_r \cdot \prod_{s \in R} q(s)$ for some kinetic rate $\lambda_r$. $L$ can be any labelling function over $Q$ that identifies properties of interest.

Each state $q \in Q$ of the CTMC gives the number $q(s)$ of molecules of each species $s \in \mathcal{M}$ that are currently present. For a state $q$, we also write $\phi(q)$ to denote the set of molecular species that are present, i.e., $\phi(q) = \{s \mid q(s) > 0\}$.

Example 1. Consider the reaction network $A$ with species $\mathcal{M} = \{a, b\}$, and reaction rules $\mathcal{R} = \{a + b \to a + 2b,\ a \to 2a,\ b \to 2b,\ a \to \emptyset,\ b \to \emptyset\}$. Assume the volume of the system is $N_{max} = 4$, and that the rate of each reaction rule is the product of the numbers of its reactants. The CTMC constructed for $A$ is shown in Fig. 1.


[Figure 1 diagram: states 0:∅, 1:b, 2:2b, 3:3b, 4:4b, 5:a, 6:ab, 7:a2b, 8:a3b, 9:2a, 10:2ab, 11:2a2b, 12:3a, 13:3ab, 14:4a, connected by rate-labelled transitions.]

Fig. 1. CTMC for a reaction network (see Example 1). State labels show index and population count: e.g., 11:2a2b denotes that there are 2 a and 2 b molecules in state 11.
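As an illustration of the construction in Definition 5, the following sketch (not the report's tool) enumerates the population states and summed transition rates for Example 1, using the default product-of-reactant-counts kinetics with $\lambda_r = 1$; it yields the 15 states (0–14) of Fig. 1:

```python
# Sketch (illustrative): building the CTMC of Definition 5 for Example 1,
# with species (a, b), N_max = 4, and default kinetics
# rate_r(q) = product of reactant population counts.

N_MAX = 4
# reactions as (reactant, product) count vectors over (a, b)
reactions = [
    ((1, 1), (1, 2)),  # a + b -> a + 2b
    ((1, 0), (2, 0)),  # a -> 2a
    ((0, 1), (0, 2)),  # b -> 2b
    ((1, 0), (0, 0)),  # a -> (empty)
    ((0, 1), (0, 0)),  # b -> (empty)
]

# Q: all population vectors respecting the volume limit
states = [(na, nb) for na in range(N_MAX + 1)
          for nb in range(N_MAX + 1) if na + nb <= N_MAX]

def rate(q, R):
    """Default kinetics: product of reactant counts, kinetic rate 1."""
    r = 1
    for count, need in zip(q, R):
        if need > 0:
            r *= count ** need
    return r

transitions = {}  # (q, q') -> summed rate, as in Definition 5
for q in states:
    for R, P in reactions:
        if all(c >= n for c, n in zip(q, R)):  # reaction enabled in q
            q2 = tuple(c - n + p for c, n, p in zip(q, R, P))
            if sum(q2) <= N_MAX:  # respect the volume limit
                transitions[(q, q2)] = transitions.get((q, q2), 0) + rate(q, R)
```

Note that two distinct rules can induce the same transition, e.g. both a + b → a + 2b and b → 2b map state (1,1) to (1,2), so their rates are summed, as the multiset sum in Definition 5 prescribes; also, the empty state (0,0) is absorbing, since every rule needs at least one reactant.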

3.3 Chemical Organisation Theory and SCC Decomposition

Complex dynamical reaction networks are hard to understand and analyse in practice. Chemical organisation theory [5,4] provides a way to lift the complex state space of a network to a set of organisations, and then to map movement through the state space to movement between organisations. Such an abstract view allows us to analyse and predict the dynamical behaviour of the complex reaction network more easily. An organisation is a set of molecules that is algebraically closed and dynamically self-maintaining. A subset $C \subseteq \mathcal{M}$ is called "closed" if no molecules outside $C$ are produced by applying all reactions possible in $C$ to multisets over $C$; a subset $S \subseteq \mathcal{M}$ is "self-maintaining" if the reactions able to fire in $S$ can occur at certain strictly positive rates, applied to a multiset over $\mathcal{M}$, without reducing the number of molecules of any species of $S$.

Definition 6 (Organisation). A subset of $\mathcal{M}$ is a chemical organisation if it is closed and self-maintaining.
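The closure condition is easy to check mechanically, since it only involves the species sets of the reactions. A minimal sketch follows (multiplicities are irrelevant for closure, so reactants and products are plain species sets here, which is a simplifying assumption):

```python
# Minimal sketch of the closure test from Definition 6 / Section 3.3.
# Reactants and products are plain species sets: multiplicities do not
# matter for closure.

def is_closed(C, reactions):
    """C is closed if every reaction whose reactants all lie in C
    produces only species that are already in C."""
    for R, P in reactions:
        if set(R) <= set(C) and not set(P) <= set(C):
            return False
    return True

# Species sets of Example 1's rules: a+b -> a+2b, a -> 2a, b -> 2b,
# a -> (empty), b -> (empty).
reactions = [
    ({"a", "b"}, {"a", "b"}),
    ({"a"}, {"a"}),
    ({"b"}, {"b"}),
    ({"a"}, set()),
    ({"b"}, set()),
]
# every subset of {a, b} is closed here, consistent with Example 2
```

Self-maintenance is harder to test in general (it involves finding strictly positive reaction rates), so this sketch only covers the closure half of Definition 6.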

As discussed above, we model the dynamics of reaction networks as Markov chains, whose state space is built upon the discrete numbers of molecules. With a limited amount of molecules, both too few and too many molecules can prevent reaction rules from firing. As a consequence, we need to define the discrete organisations [7] and the states that contribute to generating them.

Definition 7 (Discrete organisation and generator). Let $(\mathcal{M}, \mathcal{R})$ be a reaction network. A subset of species $D \subseteq \mathcal{M}$ is called a discrete organisation if there is a state $q \in Q$ and $C \subseteq Acc(q)$ with $D = M(C)$, and there is a sequence of transitions $(\delta_1, \ldots, \delta_k)$ such that $R_C = \bigcup_{i=1}^{k} \bigcup_{(R,P) \in \delta_i} \{(R,P)\}$ and $q' = (\delta_k \circ \cdots \circ \delta_1)(q)$ satisfies $\forall s \in D.\ q'(s) \geq q(s)$, where $Acc(q) \subseteq Q$ denotes the set of states reachable from $q$; for $C \subseteq Q$, $M(C)$ returns the set of molecular species occurring with positive numbers in $C$; and $R_C$ denotes the reactions firing in the set of states $C$. Such a state $q$ is called a generator of the discrete organisation.

Example 2. The discrete organisations for Example 1 are $\{a, b\}$, $\{a\}$, $\{b\}$, $\{\}$, and the corresponding generators are, respectively:

– $\{6, 7, 8, 10, 11, 13\}$,
– $\{5, 6, 7, 8, 9, 10, 11, 12, 13, 14\}$,
– $\{1, 2, 3, 4, 6, 7, 8, 10, 11, 13\}$,
– $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14\}$.

In order to analyse the system behaviour and perform an organisation-based quantitative analysis of the reaction network, we study the connections between chemical organisations and model decompositions into strongly connected components (SCCs). SCCs and BSCCs are defined as follows.

Definition 8 (SCC). A strongly connected component (SCC) of a Markov chain is a maximal set of states $T$ such that, for every pair of states $q, q' \in T$, there is a directed path from $q$ to $q'$.

Definition 9 (BSCC). A bottom strongly connected component (BSCC) is an SCC $T$ from which no state outside $T$ is reachable.

We call an SCC trivial if it comprises a single state that either is absorbing (i.e., has no outgoing transitions) or has only outgoing transitions (i.e., no self-loops). From now on, we focus purely on non-trivial SCCs.
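For reference, SCC decomposition and the BSCC test of Definitions 8 and 9 can be sketched as follows (a standard recursive Tarjan implementation; the adjacency-list graph representation is assumed for illustration, not taken from the report):

```python
# Sketch of Tarjan's SCC algorithm and a BSCC check, on a digraph
# given as adjacency lists.

def tarjan_sccs(graph):
    index, low, on_stack, stack = {}, {}, set(), []
    comps, counter = [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return comps

def is_bscc(comp, graph):
    """Definition 9: no edge of the component leaves it."""
    return all(w in comp for v in comp for w in graph.get(v, []))

g = {1: [2], 2: [1, 3], 3: [3]}
comps = tarjan_sccs(g)  # components {1, 2} and {3}; only {3} is a BSCC
```

In the toy graph, {1, 2} is an SCC but not bottom (it can leave via the edge 2 → 3), while the self-looping state 3 forms a BSCC on its own.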

Intuitively, in the Markov chain for a reaction network, SCCs are important for an organisation-based analysis. In the next section, we will describe an algorithm to find organisations based on decomposing the chain into SCCs and then identifying those in which a set of species is self-maintaining. We call such SCCs, which contribute to an organisation, good SCCs. These are defined formally as follows.

Definition 10 (Good SCC). An SCC $C$ is called good if either: (i) there is a cycle in $C$ firing every "possible" reaction rule, i.e., every rule whose reactants $R$ appear in the SCC ($R \subseteq \bigcup \{\phi(q) \mid q \in C\}$); or (ii) $C$ has no outgoing transition, i.e., it is a BSCC.

Example 3. All SCCs are good in Example 1.

Clearly, some generators can contribute to generating multiple organisations. This makes it more difficult to decompose the Markov chain into its sets of generators. However, generators located in good SCCs contribute uniquely to an organisation. Such generators are called internal generators.

Definition 11 (Internal generator). A generator $g \in Q$ of organisation $O_i$ is internal if it is located in a good SCC $C$ such that:

$$g \in C \ \wedge \ \bigcup_{q \in C} \phi(q) = O_i.$$


Example 4. In Example 1, the internal generators of organisations $\{a, b\}$, $\{a\}$, $\{b\}$ and $\{\}$ are $\{6, 7, 8, 10, 11, 13\}$, $\{5, 9, 12, 14\}$, $\{1, 2, 3, 4\}$ and $\{0\}$, respectively.

Proposition 1. Given a good SCC $C$, let $M_g = \bigcup_{q \in C} \phi(q)$. If $M_g$ is closed, then $M_g$ is an organisation, and $\{q \mid q \in C\} \subseteq G(M_g)$, where $G(M_g)$ denotes the internal generators of $M_g$.

4 Organisation-based Analysis of Reaction Networks

In this section, we propose techniques for quantitative organisation-based analysis of reaction networks. We first introduce an algorithm to find the set of organisations for a specific reaction network. We then use probabilistic model checking and PRISM to analyse quantitative properties regarding the dynamics of the network with respect to its organisations.

4.1 Finding Organisations

Computing the organisations of a reaction network requires an analysis of the strongly connected components of its Markov chain's underlying transition graph. Since every state in a good SCC is an internal generator of an organisation, we identify good SCCs to find the organisations of the reaction network.

Algorithm 1 presents the procedure for finding organisations of a given reaction network modelled as a CTMC. It is based on the following procedures:

– Tarjan(A) returns the set of strongly connected components of the Markov chain A, using Tarjan's SCC algorithm [9] on the underlying digraph.

– findGoodSCCs(SCC) returns the "good" part SCC_G of A, in which each possible reaction rule is able to fire.

– For each scc ∈ SCC_G, find the set of closed molecules (an organisation) appearing in it, together with its relevant internal generators, i.e., the states in scc which generate the organisation.

4.2 Organisation-Based Probabilistic Analysis

We now illustrate, via a number of examples, how we model reaction networks as CTMCs in PRISM and how we derive quantitative properties of them. In general, given a reaction network, our tool will first produce the CTMC model of it, then find the set of good SCCs which contribute to organisations, and thus find a set of organisations O and their (internal) generators. Lastly, organisation-based quantitative analysis of the network, specified as CSL formulas, can be performed, e.g.: probabilities (bounds or averages) of the movements among organisations, or the expected time to leave or stay in an organisation.


Algorithm 1: Finding organisations of a reaction network

Data: CTMC A of reaction network (M, R)
Result: O, a set of organisations; G, the sets of relevant internal generators

organisations O ← {};
internal generators G ← {};
SCC ← Tarjan(A);
SCC1 ← findGoodSCCs(SCC);
SCC_G ← SCC1 ∪ BSCC;
TrivialSCC ← findTrivialSCCs(SCC, BSCC);
for scc ∈ SCC_G do
    M_g ← ∪{φ(q) | q ∈ scc};
    if M_g is closed then
        if M_g ∉ O then
            /* add new organisation */
            O ← O ∪ {M_g};
            /* add internal generators for the new organisation */
            G ← G + {q | q ∈ scc};
        else
            i ← index of M_g in O;
            /* update the internal generators of organisation M_g */
            G_i ← G_i ∪ {q | q ∈ scc};
        end
    end
end
return O, G.
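A compact sketch of Algorithm 1 on an explicit state graph is given below. For brevity it makes two simplifications that are assumptions, not the report's implementation: a naive quadratic SCC computation replaces Tarjan's algorithm, and the good-SCC filter is omitted, so the closure test is applied to every SCC (which suffices when all SCCs are good, as in Example 1):

```python
# Illustrative sketch of Algorithm 1 on an explicit transition graph.

def reach(graph, v):
    """All states reachable from v (including v)."""
    seen, todo = {v}, [v]
    while todo:
        for w in graph.get(todo.pop(), {}):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def sccs(graph):
    """Naive SCC: u and v share an SCC iff each reaches the other."""
    rest, out = set(graph), []
    while rest:
        v = next(iter(rest))
        comp = {u for u in reach(graph, v) if v in reach(graph, u)}
        out.append(comp)
        rest -= comp
    return out

def find_organisations(graph, phi, is_closed):
    """phi(q): species present in state q; is_closed: Definition 6's test."""
    orgs, gens = [], []
    for comp in sccs(graph):
        Mg = set().union(*(phi(q) for q in comp))
        if is_closed(Mg):
            if Mg in orgs:
                gens[orgs.index(Mg)] |= comp   # update generators
            else:
                orgs.append(Mg)                # add new organisation
                gens.append(set(comp))         # and its internal generators
    return orgs, gens

# toy fragment of Example 1's chain: states are (a-count, b-count)
graph = {
    (0, 0): {},
    (1, 0): {(0, 0): 1, (2, 0): 1},
    (2, 0): {(1, 0): 1},
    (0, 1): {(0, 0): 1, (0, 2): 1},
    (0, 2): {(0, 1): 1},
}
phi = lambda q: {s for s, n in zip("ab", q) if n > 0}
orgs, gens = find_organisations(graph, phi, lambda C: True)
```

On this fragment the organisations {}, {a} and {b} are found, with generator sets {(0,0)}, {(1,0), (2,0)} and {(0,1), (0,2)} respectively; every subset of {a, b} is closed in Example 1, so the stub closure test is adequate here.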


// translation of the reaction network to a PRISM model
ctmc

const int N_MAX = 10;
formula total = a + b;
init total <= N_MAX endinit

// Model parameters
const double rA = N_MAX; // rA
const double rB = N_MAX; // rB

module RN
  a : [0..N_MAX];
  b : [0..N_MAX];
  c : [0..N_MAX];

  // r1: a+b -> a
  [r1] (a*b > 0) & (a > 0) & (b > 0) & (total <= N_MAX) -> a*b : (a'=a-1) & (b'=b);
  // r2: a -> 2a
  [r2] (rA*a > 0) & (a > 0) & (total+1 <= N_MAX) -> rA*a : (a'=a+1);
  // r3: b -> 2b
  [r3] (rB*b > 0) & (b > 0) & (total <= N_MAX) -> rB*b : (b'=b+1);
  // r4: a -> 0
  [r4] (a*a > 0) & (total <= N_MAX) -> a*a : (a'=a-1);
  // r5: b -> 0
  [r5] (a*b > 0) & (total <= N_MAX) -> a*b : (b'=b-1);
endmodule

Fig. 2. Example 5 in the PRISM modelling language

Example 5. Consider the reaction network with molecular species $\mathcal{M} = \{a, b\}$ and reaction rules with stochastic rates shown below, where $\sharp a$ denotes the number of molecules of species $a$.

R            Stochastic rate
a + b → a    ♯a · ♯b
a → 2a       α · ♯a
b → 2b       β · ♯b
a → ∅        (♯a)²
b → ∅        (♯b)²

Fig. 2 shows the PRISM model for the CTMC of the above reaction network with α = β = N_max = 10. This is automatically generated by our tool from a description of the reaction network. The resulting model is described in the PRISM modelling language. It consists of a keyword describing the model type (ctmc), a set of constants, and a single module whose state is represented by a set of finite-ranging variables. Each variable stores the number of molecules of one species. The behaviour of the module is specified by a set of guarded commands of the form [] g → r : u. This command is interpreted as: if the predicate g is true, then the system is updated by command u. Command u comprises one or more statements of the form x' = ... indicating how the value of variable x is updated. The rate at which this occurs is given by r, which will be attached to the corresponding transition label in the underlying CTMC.
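The report's tool generates such guarded commands automatically from the reaction network description. A hypothetical mini-generator in this spirit is sketched below (all names and conventions are illustrative assumptions); it uses the product-of-reactant-counts kinetics, so its emitted rates differ from the squared decay rates of Fig. 2:

```python
# Hypothetical sketch: emitting one PRISM guarded command of the shape
# "[label] guard -> rate : update;" for a reaction given as reactant and
# product species lists (with repetition for multiplicities).

def prism_command(label, reactants, products, n_max="N_MAX"):
    species = sorted(set(reactants) | set(products))
    rate = "*".join(sorted(reactants)) if reactants else "1"
    guard = " & ".join(f"({s} > 0)" for s in sorted(set(reactants)))
    # net change of each species: P(s) - R(s)
    delta = {s: products.count(s) - reactants.count(s) for s in species}
    update = " & ".join(
        f"({s}'={s}{d:+d})" for s, d in sorted(delta.items()) if d != 0)
    guard = " & ".join(filter(None, [guard, f"(total <= {n_max})"]))
    return f"[{label}] {guard} -> {rate} : {update};"

cmd = prism_command("r1", ["a", "b"], ["a", "b", "b"])  # a + b -> a + 2b
# "[r1] (a > 0) & (b > 0) & (total <= N_MAX) -> a*b : (b'=b+1);"
```

A full generator would also tighten the volume guard to total+1 <= N_MAX when a rule increases the population, as command r2 does in Fig. 2; this sketch always emits the plain bound.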


Fig. 3 shows the CTMC model of the system. There are 66 states and 201 transitions in the model, and there are 4 SCCs ({a > 0, b > 0}, {a > 0, b = 0}, {a = 0, b > 0}, {a = b = 0}) with 1 BSCC ({a = b = 0}).

The first property we consider is the probability of moving between organisations. Specifically, the probability of moving from $O_1$ to $O_2$ can be specified in CSL as $\mathrm{P}_{=?}[\, o_1\ \mathrm{U}\ o_2 \,]$, where $o_1$ and $o_2$ are atomic propositions labelling states which represent internal generators of organisations $O_1$ and $O_2$. In this example, all SCCs are good and each (good) SCC generates exactly one organisation. To visualise the movement between organisations, we analyse the property above for each pair of organisations and construct the abstract transition graph shown in Fig. 4. Blocks are labelled with organisations and, for each possible transition between organisations, we show the range of probabilities (over all states in the source organisation) and the average value (over the same set of states).

We also consider the expected time to leave (the generators of) each organisation. The CSL property to specify this, for some organisation $O_i$, is $\mathrm{R}_{=?}[\Diamond \neg o_i]$, where $o_i$ is an atomic proposition as above, $\neg$ denotes negation, and we assign a state reward of 1 to every state of the CTMC. This value is also shown for each organisation in Fig. 4, inside the block for the corresponding organisation.
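On the embedded DTMC, an unbounded until property such as P=?[o1 U o2] reduces to a reachability computation. A minimal sketch via fixed-point (value) iteration on a toy explicit DTMC follows; the dictionary-based representation is an assumption made for illustration, not PRISM's internal machinery:

```python
# Sketch: Pr[o1 U o2] on a small explicit DTMC by value iteration.
# P maps each state to its successor distribution; states satisfying o2
# have probability 1, states satisfying neither o1 nor o2 stay at 0.

def until_probability(P, sat_o1, sat_o2, iters=1000):
    x = {s: (1.0 if s in sat_o2 else 0.0) for s in P}
    for _ in range(iters):
        for s in P:
            if s not in sat_o2 and s in sat_o1:
                x[s] = sum(p * x[t] for t, p in P[s].items())
    return x

# toy chain: from s0 we reach "goal" with probability 0.8, else "fail"
P = {
    "s0": {"goal": 0.8, "fail": 0.2},
    "goal": {"goal": 1.0},
    "fail": {"fail": 1.0},
}
x = until_probability(P, sat_o1={"s0"}, sat_o2={"goal"})
```

Collecting such values over the internal generators of each source organisation gives exactly the probability ranges and averages shown in the blocks of Fig. 4.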

Finally, we consider the effect of making a constructive perturbation to the reaction network, by adding rules that create species at a small rate. Fig. 5 shows the results of the same analysis described above for the following constructive perturbation, in which γ = 0.01:

R        Stochastic rate
∅ → a    γ · ♯a
∅ → b    γ · ♯b

The results show that generating a and b at a small rate can cause upward movement and heavily affect the system's behaviour. The expected time spent in each organisation decreases dramatically, since the movement flow gets faster. This matches our intuition: the upward flow introduced by the constructive perturbation leads to a smoother flow of the system.

Example 6. Consider now the reaction network with $\mathcal{M} = \{a, b, c, d\}$ and $\mathcal{R} = \{a + b \to a + 2b,\ a + d \to a + 2d,\ b + c \to 2c,\ c \to b,\ b + d \to c,\ b \to \emptyset,\ c \to \emptyset,\ d \to \emptyset\}$. The corresponding PRISM model for $\alpha = \beta = N_{max} = 5$ is given in Fig. 6, and Fig. 7 shows the structure of the CTMC for $N_{max} = 5$. Note that, even for a small volume $N_{max} = 5$, the model is quite complex. There are 126 states, 386 transitions, 28 SCCs and 6 BSCCs.

Fig. 8 illustrates, in the same fashion as above, the transition probabilities between all SCCs of the CTMC, and the expected times to leave them. Note that not all SCCs are good SCCs in this example: we highlight good SCCs in colour in Fig. 8. For instance, the SCC labelled (99, 105; 0.25) is not a good one. There are two states in this SCC: state 99 (a = 2, b = 0, c = 1, d = 1) and state 105 (a = 2, b = 1, c = 1, d = 1). The set of molecules appearing in this node is closed, but reaction rules such as $c \to \emptyset$ and $d \to \emptyset$ cannot be fired within the SCC, and it is therefore not good.


Fig. 3. Example 5: CTMC model with 4 SCCs and 1 BSCC


[Figure 4 diagram: blocks for organisations {a, b} (expected leaving time 0.5959), {a} (196.433), {b} (196.433) and {} (1). From {a, b}, the transition probability ranges and averages to {a} and {b} are [0.8267, 0.9979], average 0.956, and [0.0021, 0.1733], average 0.044, respectively; {a} and {b} move to {} with probability 1.]

Fig. 4. Organisation movement for Example 5: transition probabilities (ranges and averages of possible values) between generators of organisations, and the expected time to leave them.

[Figure 5 diagram: blocks for organisations {a, b} (expected leaving time 0.0715), {a} (11.05), {b} (11.05) and {} (5.0), with transition probability ranges and averages including [0.8935, 0.9999] avg 0.98, [0.842, 0.947] avg 0.933, [7.137E-7, 0.106] avg 0.02, [0.053, 0.158] avg 0.067 (twice), and [0.5, 0.5] avg 0.5.]

Fig. 5. Organisation movement for Example 5 with constructive perturbation.

In addition, the SCC labelled (12, 27; 0.25) is also not a good one. It contains state 12 (a = 0, b = 0, c = 2, d = 1) and state 27 (a = 0, b = 1, c = 1, d = 1). The set of molecules appearing in this node is closed, but the reaction rule $c \to \emptyset$ cannot be fired locally, i.e., this decay only introduces transitions to other SCCs. Similar cases occur for some of the other reaction rules.

Fig. 9 presents the transition probabilities between good SCCs only, and the expected time to leave them. Note that multiple good SCCs can contribute to the generation of one organisation. For instance, both good SCCs labelled 65 . . . and 98 . . . contribute to organisation {a, b, c}. Based on this graph, we can build the transition graph over organisations. Fig. 10 presents the transition probabilities between (internal generators of) organisations, and the expected time to leave each of them. This helps us to understand the movement between organisations and can be viewed as an abstract model capturing the behaviour of the reaction network at the level of organisations.

5 Conclusions

This report investigates the applicability of probabilistic model checking techniques to the quantitative analysis of chemical reaction networks. We model the


// translation of the reaction network to a PRISM model
ctmc

const int MAX_AMOUNT = 10;
formula total = a + b + c + d;
init total <= MAX_AMOUNT & (a>0) & (a<7) & (d>0) & (b>0 | c>0) endinit

// Model parameters
const double rA = 1; // rA
const double rB = 1; // rB

module RN
  a : [0..MAX_AMOUNT] init 2;
  b : [0..MAX_AMOUNT] init 2;
  c : [0..MAX_AMOUNT] init 2;
  d : [0..MAX_AMOUNT] init 2;

  // r1: a+b -> a+2b
  [r1] (rA*a*b > 0) & (a > 0) & (b > 0) & (total+1 <= MAX_AMOUNT) -> rA*a*b : (a'=a) & (b'=b+1);
  // r2: a+d -> a+2d
  [r2] (rA*a*d > 0) & (a > 0) & (d > 0) & (total+1 <= MAX_AMOUNT) -> rA*a*d : (a'=a) & (d'=d+1);
  // r3: b+c -> 2c
  [r3] (rB*b*c > 0) & (b > 0) & (c > 0) & (total <= MAX_AMOUNT) -> rB*b*c : (b'=b-1) & (c'=c+1);
  // r4: c -> b
  [r4] (rA*c > 0) & (c > 0) & (total <= MAX_AMOUNT) -> rA*c : (b'=b+1) & (c'=c-1);
  // r5: b+d -> 2c
  [r5] (rB*b*d > 0) & (b > 0) & (d > 0) & (total <= MAX_AMOUNT) -> rB*b*d : (b'=b-1) & (c'=c+2) & (d'=d-1);
  // r6: b -> 0
  [r6] (rB*b > 0) & (b > 0) & (total <= MAX_AMOUNT) -> rB*b : (b'=b-1);
  // r7: c -> 0
  [r7] (rA*c > 0) & (c > 0) & (total <= MAX_AMOUNT) -> rA*c : (c'=c-1);
  // r8: d -> 0
  [r8] (rA*d > 0) & (d > 0) & (total <= MAX_AMOUNT) -> rA*d : (d'=d-1);
endmodule

Fig. 6. Example 6 in the PRISM modelling language.

reaction networks as continuous-time Markov chains, and specify quantitative properties of interest in the logic CSL with rewards. We investigate connections between chemical organisations and model decompositions into strongly connected components (SCCs), and study the problem of how to analyse the model in terms of organisations. For future work, we propose to build a framework for coarse-graining complex chemical reaction networks in terms of the organisation-based quantitative analysis presented in this report. The starting point is an approximation of the behaviour of the concrete CTMC model in terms of the organisation-based abstract model.


Fig. 7. CTMC for the reaction network from Example 6, with 28 SCCs and 6 BSCCs.

[Figure 8 diagram: nodes are SCCs labelled by their state indices and expected leaving time, e.g. 62...88; 0.59, 65...89; 0.708, 98...109; 0.574, 99, 105; 0.25, 12, 27; 0.25, and the absorbing states 56; 1, 91; 1, 111; 1, 121; 1, 125; 1 and 0; 1; edges carry probability ranges and averages.]

Fig. 8. Transition probabilities (bounds/averages) between all SCCs of the CTMC for Example 6, and expected leaving times.


Fig. 9. Transition probabilities (bounds and averages) between good SCCs for Example 6, and the expected time to leave them.

Fig. 10. Transition probabilities (bounds and averages) between generators of organisations for Example 6, and the expected time to leave them.

References

1. A. Aziz, K. Sanwal, V. Singhal, and R. Brayton. Verifying continuous time Markov chains. Pages 269–276. Springer, 1996.

2. C. Baier, B. Haverkort, H. Hermanns, and J.-P. Katoen. Model-checking algorithms for continuous-time Markov chains. IEEE Transactions on Software Engineering, 29(6):524–541, 2003.

3. E. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 2000.

4. P. Dittrich and P. di Fenizio. Chemical organisation theory. Bulletin of Mathematical Biology, 69(4):1199–1231, 2007.

5. W. Fontana. Algorithmic chemistry. In Artificial Life II. Addison Wesley, 1992.

6. D. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81(25):2340–2361, 1977.

7. P. Kreyssig, C. Wozar, S. Peter, T. Veloz, B. Ibrahim, and P. Dittrich. Effects of small particle numbers on long-term behaviour in discrete biochemical systems. Bioinformatics, 30(17):475–481, 2014.

8. M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In G. Gopalakrishnan and S. Qadeer, editors, Proc. 23rd International Conference on Computer Aided Verification (CAV'11), volume 6806 of LNCS, pages 585–591. Springer, 2011.

9. R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal on Computing, 1972.
