
Research Collection

Doctoral Thesis

Formal reductions of stochastic rule-based models of biochemical systems

Author(s): Petrov, Tatjana

Publication Date: 2013

Permanent Link: https://doi.org/10.3929/ethz-a-010006341

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library


Diss. ETH No. 21269

Formal reductions of stochastic

rule-based models of biochemical

systems

A dissertation submitted to

ETH ZURICH

for the degree of

Doctor of Sciences

presented by

Tatjana Petrov

M. Sc. Computer Science, University of Novi Sad

born 27.12.1983

citizen of Serbia

accepted on the recommendation of

Prof. Dr. Heinz W. Köppl, examiner

Prof. Dr. Thomas A. Henzinger, co-examiner

Dr. Jérôme Feret, co-examiner

2013


© Copyright by Tatjana Petrov, 2013.

All Rights Reserved.


Abstract

Understanding the principles behind a cell's functioning is one of the most fundamental topics of science today. However, realistically explaining the variety and complexity observed in biological systems results in highly complex models with a huge number of possible state configurations. Reducing the complexity of these models, while preserving a realistic model description, represents a major challenge.

Domain-specific formal languages have been proposed in order to facilitate knowledge representation and to aid model analysis. One of them, the rule-based language, makes it possible to specify molecular interactions compactly, by maintaining the internal protein structure in the form of a site-graph, and by allowing interactions to happen upon testing only patterns, that is, local contexts of molecular species. The executions of rule-based models are traces of a continuous-time Markov chain (CTMC), defined according to the principles of chemical kinetics.

In this thesis, we study formal reductions of rule-based models. The idea of reduction is that, if the rules are executed upon testing patterns (instead of full molecular species), then the stochastic executions of the whole model can be described in terms of a carefully chosen set of patterns, called fragments, which are much fewer than the molecular species. Our method aligns with the principle of static program analysis: the CTMC traces (the semantics) are considered only virtually, while the actual operations are performed over the rule-set (the source code, that is, a set of site-graph-rewrite rules). To this end, we study separately the mathematical relations between rule-sets, and what these relations imply for their respective CTMCs.

We provide a general model reduction procedure that is efficient (of complexity linear in the description of the rule-set) and automatic (it applies to any well-defined rule-based program). The formal relation between the respective CTMCs is guaranteed within two frameworks. In the framework for exact reductions, the set of fragments is enforced and the precise relation between the respective CTMCs is guaranteed. In the framework for approximate reductions, the set of fragments can vary, and, for a given time limit of a trace, the error in terms of the Kullback-Leibler divergence between the trace distributions of the CTMCs is computed. Both frameworks rely on a unifying mathematical theory of exact and approximate Markov chain aggregation, which constitutes a major part of the thesis. The theory is instantiated with three toy examples and two large-scale case studies.


Zusammenfassung

Understanding the principles of how cells function is one of the most fundamental topics of present-day science. However, a realistic description of the variety and complexity of such biological systems leads to highly complex models with an enormous number of possible state configurations. Reducing the model complexity, while preserving a realistic model description, represents a major challenge.

To facilitate knowledge representation and to support model analysis, domain-specific formal languages have been proposed. One of them, the rule-based language, enables a compact specification of molecular interactions, by maintaining the internal protein structure in the form of a site-graph and by allowing interactions to occur. The execution of a rule-based model yields a path of a continuous-time Markov chain (CTMC), which is defined by the principles of chemical kinetics.

In this thesis we study formal reductions of rule-based models. The idea of the reduction is that, if the rules are applied upon testing patterns (instead of the entire molecular species), the stochastic execution of the whole model can be described by a carefully chosen set of patterns, so-called fragments, with far fewer molecular species. Our method follows the principle of static program analysis: the CTMC traces (the semantics) are considered only virtually, while the actual operations are carried out on the rule-set (the source code, as a set of site-graph-rewrite rules). To this end, we separately study the mathematical relations between the rule-sets and their implications for the respective CTMCs.

We propose a general model reduction procedure that scales linearly with the size of the rule-set and is, moreover, applicable to arbitrary rule-based programs. The formal relation between the respective CTMCs is guaranteed within two settings. In the setting of exact reductions, the set of fragments is fixed and the precise relation between the respective CTMCs is guaranteed. In the setting of approximate reductions, the set of fragments can vary, and, for a given time limit of a path, the error is computed in terms of the Kullback-Leibler divergence between distributions. Both approaches rest on a unified mathematical theory of exact and approximate Markov chain aggregation, which constitutes an essential part of this thesis. The theory is instantiated with three simple examples and two large-scale case studies.


Résumé

Understanding the principles underlying the functioning of the cell is one of the fundamental topics in science today. However, realistically explaining the variety and complexity observed in biological systems leads to very complex models containing a large number of possible configurations. Reducing the complexity of these models, while preserving a model description faithful to reality, represents a major challenge.

Dedicated formal languages have been proposed in order to represent knowledge and to facilitate model analysis. One of them, the rule-based language, allows molecular interactions to be defined concisely, by describing the internal structure of the protein in the form of a graph and by allowing interactions to occur after testing only patterns, local contexts of molecular species. The executions of rule-based models are the traces of a continuous-time Markov chain (CTMC), defined according to the principles of chemical kinetics.

In this thesis, we study formal reductions of rule-based models. The idea of the reduction is that, if the rules are executed after evaluating patterns (rather than complete molecular species), then the stochastic execution of the full model can be described by a well-chosen set of patterns, called fragments, which are far less numerous than the molecular species. Our method corresponds to the principle of static program analysis: the traces of the CTMC (the semantics) are considered only virtually, while the operations are performed on the set of rules, that is, a set of rewrite rules. To this end, we separately study the mathematical relations between the sets of rules and what these relations imply for their respective CTMCs.

We propose a general model reduction procedure that is efficient (of complexity linear in the size of the rule-set) and automatic (applicable to any well-defined rule-based program). The formal relation between the respective CTMCs is guaranteed within two frameworks. In the framework of exact reductions, the set of fragments is imposed and the precise relation between the respective CTMCs is guaranteed. In the framework of approximate reductions, the set of fragments can vary, and, for a given trace time limit, the error in terms of the Kullback-Leibler divergence between the trace distributions of the CTMCs is computed. Both frameworks rest on a mathematical theory unifying exact and approximate Markov chain aggregation, which constitutes a large part of the thesis. The theory is applied to three toy examples and two large-scale case studies.


Sommario

Understanding the mechanisms underlying the functioning of cells is one of the main goals of modern science. However, realistically explaining the variety and complexity observed in biological systems requires the use of very complex models with an extremely large number of possible state configurations. Reducing the complexity of such models, while at the same time maintaining a realistic description of the systems under examination, is therefore a very important goal.

In order to facilitate the analysis and the representation of the informational content of these systems, several domain-specific formal languages have been proposed in recent years. One of them, the rule-based language, makes it possible to represent molecular interactions compactly, by maintaining the internal protein structure in the form of a site-graph, and by checking whether certain interactions can occur through the recognition of particular patterns, local contexts of molecular species. The executions of rule-based models are realizations of continuous-time Markov chains (CTMCs), defined according to the principles of chemical kinetics.

The goal of this thesis is the formal reduction of rule-based models. The idea behind the reduction is the following: if the rules are executed by testing patterns (instead of entire molecular species), then the stochastic execution of the whole model can be described in terms of an appropriate set of patterns, called fragments, whose number is much smaller than that of the molecular species. The method used in this thesis aligns with the principles of static program analysis: the paths of the CTMC (the semantics) are considered only virtually, while the actual operations are performed on a set of rules (the source code, which is the set of site-graph rewrite rules). To this end, we study separately the mathematical relations between the different sets of rules and what these relations imply for the respective continuous-time Markov chains.

In this work, we present a general model reduction procedure that is efficient (of complexity linear in the description of the rule-set) and automatic (it applies to any well-defined rule-based program). The formal relation between the respective CTMCs is guaranteed by means of two frameworks. In the framework for exact reduction, the set of fragments is imposed a priori and the precise relation between the different CTMCs is guaranteed. In the framework for approximate reduction, instead, the set of fragments can vary and, given a time limit of a realization, the error can be computed in terms of the Kullback-Leibler divergence with respect to the distribution of the realizations of the CTMC. Both frameworks are based on a mathematical theory of exact and approximate aggregation of Markov chains, whose presentation is a fundamental part of this work. The theory presented here is exemplified through three numerical examples and two large-scale case studies.


Acknowledgements

First of all, I would like to thank Professor Heinz Köppl, who invited me to work on this PhD project and who introduced me to rule-based modeling of biochemical systems. With his knowledge and professionalism, he has been a great advisor, giving me the freedom to pursue my ideas and guiding me wisely, always towards better and finer results. His steady vision and belief in this project largely shaped the thesis into its final form. I have learnt a lot from him, and from the unique blend of collaborators he brought into this project. Thank you very much for your ongoing support, as well as your encouragement to this day.

It was an incredible honor for me to start the PhD under the supervision of Professor Tom Henzinger. I highly appreciate his guidance, starting at the time when I was taking his course on model checking at EPFL and finally as my thesis co-advisor and co-examiner. Thank you very much for your consistent interest in my work, for your encouragement and support, and for welcoming me at IST on many occasions. I look forward to continuing our collaboration in the future.

I consider myself very lucky to have worked with Jérôme Feret, the inventor of 'fragments', who has directly or indirectly inspired many of the results presented in this thesis, and who has largely shaped my thinking about abstract interpretation and model reduction. Thanks a lot for being generous with your time for discussions, for always being supportive, for hosting me at ENS, and for co-examining the final exam.

Much of the enthusiasm and inspiration embedded in this project came from discussions with the 'Kappa people': Vincent Danos, Walter Fontana, Russ Harmer and Jean Krivine, who always generously shared their time and thoughts about rule-based modeling with me. I am especially grateful to Vincent for hosting me in his lab in Edinburgh in 2009, at the beginning of my work on this project; those were two intense weeks that gave an important impulse to the project. I was fortunate to pursue a one-semester research stay in 2010 in the lab of Professor Walter Fontana. Thank you very much for hosting me, and for the memorable conversations on information propagation in molecular signaling. The stay in Boston was an incredibly stimulating experience that strongly influenced the later stages of my PhD work.


I would like to thank the postdocs Arnab Ganguly, who taught me Kurtz's theorem and the asymptotic behavior of Markov chains, and Loïc Paulevé and Michael Klann, for valuable discussions on spatial rule-based modeling in the last year of my PhD. I am very grateful to Loïc Paulevé for proof-reading the final version of this thesis. My gratitude further goes to Professor Prakash Panangaden, for his support and for the discussions at workshops in Bertinoro and Barbados.

It was great to have conference companions and colleagues Sasa Misailovic, Aurelian Rizk, Nicholas Stroustrup, Daniel Schultz, Cheng-Zhong Zhang, Ferdinanda Camporesi and Norman Ferns, with whom to lead inspiring discussions on model reduction and beyond. Finally, I am thankful to my students Marica Stojanov, Eirini Arvaniti and Zahra Karimadini, for choosing to work with me at ETH on topics related to this thesis.

As I first started this PhD journey at EPFL, I would like to thank the former MTC group at EPFL, and in particular Barbara Jobstmann, Laurent Doyen, Viktor Kuncak and Ruzica Piskac, for introducing me to different challenges of formal verification. I thank Daniel Kroening for hosting me in his lab in Oxford, and Georg Weissenbacher, for introducing me to counterexample-guided abstraction and for showing me the Oxford colleges in their full charm. I am particularly indebted to Verena Wolf, for turning my attention to the Markov chain reduction problem, and to Maria Mateescu, a PhD companion, for her friendship and advice. My deepest thanks to Professor Martin Hasler and Professor Sebastian Maerkl, for providing office space and for their hospitality during the transition from EPFL to ETHZ.

I would not have ended up at EPFL (and later at ETHZ) without my thesis advisors at the University of Novi Sad, Professor Dragan Masulovic and Professor Igor Dolinka, who were unconditionally supportive when I was applying; thank you very much. I would like to thank Professor Viktor Kuncak for welcoming me in Lausanne and for his support during the early days at EPFL. It is possible that I would not even have thought of studying abroad had I not attended the seminars at the Petnica Science Center from my early high-school years; my sincere gratitude goes to Petnica overall, as well as to my very first co-authors and very special friends, Zeljka Dobricic and Lazar Krstic.

This thesis has an abstract written in English, which was translated into three of the official languages of Switzerland thanks to Zoran Vidakovic and Christoph Zechner (German), Raphael Barazzuti and Aurelian Rizk (French), and Davide Martino Raimondo (Italian). I am very grateful to all of them for providing help with the translation on one night's notice.

The success of this project owes much to all the lab mates from the BISON group at ETHZ, and to the whole Automatic Control Lab, whose friendly atmosphere made my transition from Lausanne to Zurich smooth and pleasant. Christoph, Michael, Sunil, Preetam, thanks for being the best office mates. Kolega Gabriele, Riki, Mike, Khoa, Marianne, Stefan, Claudia, Andreas, Costas, Maria, Martin, Vedrana, Christian, thanks for the friendship and joyful lunch breaks. Zurich is a wonderful city, but it was even nicer with Davide, Miruska, Mica, Jelena, Marko, Rafal, Nawal, Afonso, Riki, Stefano, Stephan and Manuela, Sofia and Mathias, Simonetta and Alberto. Thanks for all the great times; there is more to come.

I read somewhere that moving to a foreign country brings you into the psychological state of an infant. Marija, Tamara, Zorana, Miruska, rue Marterey brings lovely memories and feels like growing up among sisters. Marija, thank you for demonstrating the power of friendship. Nedeljko, Andrijana, Raphael, Alex, Ana, Roberto, Nevena, Dejan, Nikodin, Marko, Nikola, Marica, Milos, Ivana, Bojana, Mihailo, Mirabela, Mahdi, Manos, Yannis, I am proud of all of you; thank you for making Lausanne such a happy and dynamic place.

I would like to thank my friends at home, for all the Skype conversations, the good times when back in Serbia, and for visiting me in Switzerland.

Some things hardly obey scientific laws. I am very grateful to Destiny for introducing me to my boyfriend Davide, whose love, support and constant encouragement, especially during the final phases of writing this thesis, were priceless.

Finally, my gratitude and love go to my family: my Mum and my Dad, my Sister, for their love and support; to 'zet' Mihajlo, to my grandparents, to uncles, aunts and dear cousins, and especially to the new generation, born while this thesis was being created: to Mateja, Aleksandar, Vasilije, Tara and Natalija.

I devote this thesis to my parents Ranko and Mira and to my sister Goca.

This thesis work was financed by SystemsX, the Swiss Initiative for Systems Biology.


Contents

Acknowledgements

1 Preliminaries
  1.1 Probability spaces and random variables
  1.2 Distance between probability measures
    1.2.1 Entropy and mutual information
    1.2.2 Relative entropy
  1.3 Markov chains
    1.3.1 Markov chains and Markov graphs
  1.4 Discrete-time Markov chains
    1.4.1 Transient distributions
    1.4.2 Stationary behavior
  1.5 Continuous-time Markov chains
    1.5.1 A discussion on constructing the CTMC
    1.5.2 Transient distribution
    1.5.3 Uniformization
    1.5.4 Stationary behavior
    1.5.5 Finite-dimensional marginal probabilities

2 Rule-based modeling of biochemical networks
  2.1 Chemical kinetics
    2.1.1 Stochastic chemical kinetics
    2.1.2 Classical chemical kinetics
    2.1.3 Deterministic and stochastic rate constants
    2.1.4 Random time change model and the thermodynamical limit
  2.2 Site-graphs
  2.3 Rule-based models
  2.4 Site-graph rigidity and counting automorphisms
  2.5 Individual-based and species-based semantics of rule-based programs
  2.6 Examples

3 Automated reductions of rule-based models
  3.1 Stochastic fragments: Motivating example
  3.2 Fragments
  3.3 Fragment-based semantics
  3.4 Reduction with fragments
  3.5 Computing fragment-based semantics
    3.5.1 Translating the contact map
    3.5.2 Translating the rule-based program

4 Exact aggregation of Markov chains
  4.1 Lumpability and invertibility
  4.2 Discrete-time case
    4.2.1 Forward criterion
    4.2.2 Backward criterion
    4.2.3 Invertibility
    4.2.4 Convergence
  4.3 Continuous-time case
  4.4 Trace semantics of stochastic processes
    4.4.1 Trace semantics: discrete-time
    4.4.2 Trace semantics: continuous-time
  4.5 Trace semantics interpretation of exact aggregations
    4.5.1 Discrete-time case
    4.5.2 Continuous-time case
  4.6 Matrix representation

5 Exact automatic reductions of stochastic rule-based models
  5.1 Exact fragment-based reduction
  5.2 Computing the fragment-based semantics
  5.3 Example

6 Approximate aggregation of Markov chains
  6.1 KL divergence
  6.2 Error measure: Discrete time
    6.2.1 Lifting: Discrete case
  6.3 Error measure: Continuous time
    6.3.1 Lifting: Continuous case
  6.4 Trace semantics interpretation of approximate aggregations
  6.5 Matrix representation

7 Approximate automatic reductions of stochastic rule-based models
  7.1 Approximate reductions and error bound
  7.2 Tests

8 Case studies
  8.1 EGF/insulin receptor pathway
    8.1.1 Model description
    8.1.2 Exact fragment-based reduction
  8.2 HOG pathway in yeast
    8.2.1 Model description
    8.2.2 Reachable species
    8.2.3 Exact fragment-based reduction and model decomposition

9 Conclusions and Discussion


To Mum, Dad and Goca


Introduction

Recent advances in high-resolution imaging, microfluidic technology and fluorescent biomarkers for proteins have made it possible to obtain measurements at the level of single cells, and even single proteins, for hundreds of cells at a time [25, 38]. However, measurements alone do not explain the underlying mechanisms, and appropriate mechanistic theories are sought.

Systems biology research focuses on mechanistic, quantitative models, which aim to explain the function of the subject under study: a molecule, a cell, an organism or an entire species [57]. Following the laws of chemical kinetics, under mild simplifying assumptions, molecular dynamics is appropriately modeled by a continuous-time Markov chain (CTMC), in which one state corresponds to one reaction mixture, encoded as a multi-set of chemical species. For example, a state can be x = {2 S1, 3 S2, 5 S3}, where S1, S2, S3 are chemical species. Then, upon a reaction, for example S1 + S2 → S3 with stochastic rate constant c, the system can move from the state x to the state x′ = {S1, 2 S2, 6 S3}; the existence of the rate c is justified by the laws of physical chemistry. The number of states of that CTMC grows exponentially with the species' abundances.
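As a minimal illustration, not taken from the thesis, the multiset state and the effect of firing a reaction can be sketched in Python; the species names, counts and the mass-action propensity formula are assumptions made for the example:

```python
from collections import Counter

# State of the reaction mixture as a multiset of species counts,
# here x = {2 S1, 3 S2, 5 S3}.
state = Counter({"S1": 2, "S2": 3, "S3": 5})

# The reaction S1 + S2 -> S3, with an assumed stochastic rate constant c.
reactants = Counter({"S1": 1, "S2": 1})
products = Counter({"S3": 1})
c = 1.0

def propensity(x, reactants, c):
    """Mass-action propensity: c times the number of ways to pick
    the required reactant molecules from the current mixture x."""
    a = c
    for species, n in reactants.items():
        for k in range(n):
            a *= x[species] - k
    return a

def fire(x, reactants, products):
    """Return the state reached by one occurrence of the reaction."""
    y = Counter(x)          # copy, so the original state is untouched
    y.subtract(reactants)   # consume reactants
    y.update(products)      # produce products
    return y

next_state = fire(state, reactants, products)  # x' = {S1, 2 S2, 6 S3}
```

With the counts above, `fire` moves x = {2 S1, 3 S2, 5 S3} to x′ = {S1, 2 S2, 6 S3}, and the propensity c · 2 · 3 is the rate of the corresponding CTMC transition.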

The above-mentioned reason motivates omitting details in model specification and

adding assumptions to the model. A popular approach is to use a deterministic

limit of a CTMC model, where abundance of all species is scaled to infinity, but

maintaining a concentration (multiplicity per unit volume) of constant order [60].

Then, a set of coupled ordinary differential equations describes the deterministic

evolution of continuous species’ concentrations. The number of equations is equal

to the number of species. However, in many applications in cellular biology, a

deterministic model is unsatisfactory due to the low multiplicities of some molec-

ular species [47, 63, 64, 68]. Then, a stochastic description of chemical reactions

is mandatory to analyze the behavior of the system. To this end, two major

1

Page 19: Rights / License: Research Collection In Copyright - …7584/eth... · agr egation de chaine de Markov exacte et approch ee, qui constitue une grande partie de la th ese. La th eorie

Introduction 2

approaches are used to analyze the CTMC. The first approach is statistical esti-

mation of trace distribution and event probabilities of the CTMC by generating

many sample traces [42]. The second approach includes the efforts to understand

the transient evolution of the probability related to each state of the CTMC, re-

ferred to as the transient distribution. The transient distribution evolves according

to the Kolmogorov forward equation (chemical master equation in the chemistry

literature), and, as it is typically very difficult to solve the forward equations (ex-

cept for the simplest systems), sophisticated numerical algorithms are designed to

numerically solve the forward equation for larger systems [52, 66].
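For a small CTMC, the transient distribution can be obtained directly: p(t) = p(0) e^{Qt}, where Q is the generator matrix. A minimal sketch using the standard uniformization method (all names below are illustrative, and the example chain is invented):

```python
import math

def transient(Q, p0, t, tol=1e-12):
    """Transient distribution p(t) = p(0) exp(Qt) of a finite CTMC, via
    uniformization: with L >= max_i |Q[i][i]| and P = I + Q/L,
    exp(Qt) = sum_k e^{-Lt} (Lt)^k / k! * P^k."""
    n = len(Q)
    L = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / L for j in range(n)]
         for i in range(n)]
    v = list(p0)                       # holds p(0) P^k, starting at k = 0
    w = math.exp(-L * t)               # Poisson weight e^{-Lt} (Lt)^k / k!
    p = [w * x for x in v]
    mass, k = w, 0
    while 1.0 - mass > tol:            # stop once the Poisson tail is negligible
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        w *= L * t / k
        p = [pj + w * vj for pj, vj in zip(p, v)]
        mass += w
    return p

# Two-state CTMC: state 0 jumps to 1 at rate 2, state 1 jumps to 0 at
# rate 1; its stationary distribution is (1/3, 2/3).
Q = [[-2.0, 2.0], [1.0, -1.0]]
p = transient(Q, [1.0, 0.0], t=10.0)
```

For t = 10 the transient part has decayed almost completely, so p is close to the stationary distribution; it is exactly this kind of direct computation that becomes infeasible when the state space explodes.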

Orthogonally to solving the mathematical equations that describe the temporal evolution of the modeled state, domain-specific formal languages have been proposed in order to facilitate knowledge representation and to aid model analysis.

Models written in those languages can be executed by a prescribed operational

semantics, regardless of the size and complexity of the system [35]. An early

formalism designed for computations by multi-set rewriting was named Gamma

[4, 5]. Today, many such modeling frameworks are used for specifying CTMCs

of biochemical reaction networks: rule-based models, stochastic Petri nets [45],

stochastic process algebras [12, 74], probabilistic Boolean networks [81], to name

a few.

Yet another source of complexity characterizes protein interactions. Each species

can be, for instance, a protein or its phosphorylated form or a protein complex

that consists of several proteins bound to each other. Then, especially in cellular

signal transduction, the number of different such species can be combinatorially large [53, 84]. To exemplify, one model of the early signaling events in the epidermal growth factor receptor (EGFR) network, with only 8 different proteins, gives rise to 2748 different molecular species [7], while the full model of the same network has ≈ 10^20 different molecular species [20].

Motivation

Rule-based languages are designed to naturally capture the protein-centric and concurrent nature of biochemical signaling. The idea of a rule-based formalism was discussed in [76, 36] before it was formally introduced in 2003 [24]. Kappa [34]

and BioNetGen [6] are examples of two rule-based modeling platforms. In a rule-

based model, the internal protein structure is maintained in the form of a site-graph,


and interactions can happen upon testing only patterns, that is, local contexts of molecular species. A site-graph is a graph where each node contains different types

of sites, and edges can emerge from these sites. Nodes typically encode proteins

and their sites are the protein binding-domains or modifiable residues; the edges

indicate bonds between proteins. Then, every species is a connected site-graph,

and a reaction mixture is a multi-set of connected site-graphs. The executions of

rule-based models are traces of a continuous-time Markov chain (CTMC), defined

according to the principles of chemical kinetics. Rule-based models testify that the success and efficiency of model analysis largely depend on the choice of syntax [1, 51]. First, the explicit graphical representation of molecular complexes makes

models easy to read, write or edit. Moreover, the description of interactions is

compact and models can trivially be composed, by simply merging two collections

of rules. Finally, a rule set can be executed, or subjected to formal static analy-

sis: for example, it provides efficient simulations [21], automated answers about

the reachability of a particular molecular complex [23] or about causal relations

between rule executions [19].

If a rule-set is expanded to its equivalent species-based description, its quantitative analysis remains prohibitive. But, if the rules are executed upon testing patterns

(instead of full molecular species), then the executions of the whole model can

be described in terms of a carefully chosen set of patterns, which are much fewer in number than the molecular species. More specifically, one region of a molecular species being

in a particular state may or may not influence the state of another region of

a molecular species. Such a notion of influence can be formalized by a binary

relation among the sites of molecular species. As the mentioned correlation can

be detected by looking only at the contexts of rules, one can efficiently, in a single pass over the rule-set, obtain the set of coarse-grained species, called fragments.

The described method aligns with the principle of static program analysis [16]: the model executions (semantics) are considered only virtually, while the actual operations are performed over the rule-set (source code).
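As a toy illustration of this idea (not the actual Algorithm 2 of the thesis), one can represent each rule by the set of sites its left- or right-hand side mentions, correlate sites that co-occur in some rule, and read the coarse-grained classes off the transitive closure of that relation, e.g. with a union-find structure. All names below are hypothetical:

```python
def correlated_site_classes(rules):
    """Toy fragment sketch: each rule is the set of sites its left- or
    right-hand side mentions; sites co-occurring in some rule are
    correlated, and classes are the transitive closure of that relation."""
    parent = {}

    def find(s):                      # union-find with path halving
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    for sites in rules:
        sites = list(sites)
        for s in sites:
            find(s)                   # register sites that occur alone
        for other in sites[1:]:
            union(sites[0], other)
    classes = {}
    for s in parent:
        classes.setdefault(find(s), set()).add(s)
    return sorted(sorted(c) for c in classes.values())

# Hypothetical rules over the sites of one protein: rules testing {a, b}
# and {b, c} correlate a, b, c; site d never co-occurs with the others.
classes = correlated_site_classes([{"a", "b"}, {"b", "c"}, {"d"}])
# -> [['a', 'b', 'c'], ['d']]
```

Here the uncorrelated site d can be tracked separately from the class {a, b, c}, which is the source of the state-space savings.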

The idea of fragment-based reduction was first exploited in [31], where the au-

thors propose how to obtain a set of fragments which self-consistently describe

the dynamics of the model in its deterministic limit. The method was applied to

a model of the interplay between epidermal growth-factor receptor (EGFR) and insulin crosstalk, and a reduction from a set of 2899 ODEs to a set of 208 ODEs was demonstrated [20]. Furthermore, the full EGFR model was reduced to only


2 ⋅ 10^5 equations, instead of 2 ⋅ 10^19. Yet, as the deterministic limit is a particular limiting behavior of the ground stochastic model, the obtained ‘differential’ fragments do not always correctly describe stochastic kinetics [30], nor do they capture the inherently stochastic dynamics of chemical reactions.

Problem and Contribution

In this thesis, we study fragment-based reductions of rule-based models. We set up the following framework (Figure 1). The original model R is assigned a continuous-time Markov chain (CTMC) Xt over the state space X of reachable multi-sets of species. The class of fragment sets to be considered is formally defined as emerging from a set of particular equivalence relations defined among the domains (sites) of each protein. For each particular set of fragments, we propose a new rule-set R′, which is referred to as the reduced model. The reduced model is such that the assigned CTMC Y′t operates over the state space Y of reachable multi-sets of fragments. As several species may conform to the description of the same fragment, the species-based state space projects to the fragment-based state space by a partition function ϕ ∶ X → Y. Let Yt be the process obtained by

projecting samples of Xt by function ϕ,

Yt = y iff Xt ∈ {x ∈ X ∣ ϕ(x) = y}, for all t ≥ 0.

Then, if the CTMC Y′t is equivalent to the projection Yt, the reduction is said

to be exact. Otherwise, the reduction is approximate.
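For a discrete-time analogue, exactness for every initial distribution corresponds to the classical strong lumpability condition: for each pair of partition blocks, the total transition probability into the target block must be the same from every state of the source block. A minimal sketch (illustrative code, not the thesis' algorithm; the example chain is in the spirit of Figure 2b):

```python
def is_strongly_lumpable(P, blocks):
    """Check strong lumpability of a DTMC with transition matrix P for a
    partition given as a list of blocks of state indices: for every pair
    of blocks (B, C), the total probability of jumping from a state of B
    into C must not depend on the choice of the state in B. If the
    condition holds, the lumped transition matrix is also returned."""
    lumped = []
    for B in blocks:
        row = None
        for i in B:
            sums = [sum(P[i][j] for j in C) for C in blocks]
            if row is None:
                row = sums
            elif any(abs(r - s) > 1e-12 for r, s in zip(row, sums)):
                return False, None
        lumped.append(row)
    return True, lumped

# States b and c have identical total probabilities into every block,
# so lumping {a}, {b, c}, {d} is exact for any initial distribution.
P = [[0.0, 0.3, 0.7, 0.0],   # a
     [0.2, 0.0, 0.0, 0.8],   # b
     [0.2, 0.0, 0.0, 0.8],   # c
     [0.0, 0.4, 0.6, 0.0]]   # d
ok, P_lumped = is_strongly_lumpable(P, [[0], [1, 2], [3]])
```

When the condition fails, the lumped process may still be Markov for particular initial distributions only, which is the weak-lumpability situation studied later.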

The CTMC traces are considered only virtually, while the actual operations are

performed over the rule-sets. To this end, we separately study mathematical relations between the rule-set R and the reduced rule-set R′, and what these relations imply for their respective CTMCs, Xt and Y′t. Two scenarios are investigated, depending on the type of

formal relation to be proven between the respective CTMCs: (1) exact reductions, where the set of fragments is automatically derived subject to the guarantee that the reduction is exact, and (2) approximate reductions, where the set of fragments is given by the user and, for a given time limit of a trace, the goal is to bound the error between the trace distributions of the processes Yt and Y′t.

Every set of fragments defines a partition ϕ over the state space of Xt. Aiming

at a procedure for correlating sites of a rule-based model, while guaranteeing an


exact reduction, we identified three general situations of the process Xt with respect to the partition ϕ. In Figure 2, we illustrate the three situations on a simple discrete-time Markov chain. The resulting procedure (Algorithm 2) correlates any

two sites which are related directly or indirectly within a left-hand-side or a right-

hand-side of a rule, and it hence enforces a ‘strong’ independence notion between

the uncorrelated sites, analogous to the one in Figure 2c. In turn, precisely this strong independence makes it possible to effectively reconstruct the transient semantics of the original system. Motivated by the dependency on the initial

condition, we investigated the asymptotic distribution in the situation when the

initial distribution is not in accordance with the invariant distribution among

lumped states.

Despite the strong correlation notion, examples and case studies confirmed that the

reduction can be significant, or even exponential. However, if the reduced system

remains of a prohibitive size, approximate reduction is necessary. In the framework for approximate reduction, the set of all possible fragment sets is organized into a partially ordered set. Each fragment set is positioned depending on its potential for expressing quantities which influence self-consistency. Among the various metrics

[Diagram omitted: it relates the rule sets (original and reduced), the CTMCs Xt (over species) and Y′t (over fragments), the projection Yt, and the reduction error between Yt and Y′t.]

Figure 1: Problem setup. The presented arrows serve for illustration purposes: the double arrows denote the assignment of a CTMC to a rule-based model, and the dotted arrows illustrate operations which are never performed.

[Diagram omitted: three four-state discrete-time Markov chains over the states a, b, c, d, shown in panels a), b), c) with their transition probabilities.]

Figure 2: Three general situations of process Xt with respect to partition ϕ. a) The lumped process Yn is not a time-homogeneous Markov chain. b) The lumped process Yn is a time-homogeneous Markov chain for all initial distributions. c) The lumped process Yn is a time-homogeneous Markov chain whenever the ratio of probabilities between states c and b in the initial distribution is equal to 0.2/0.8 = 0.25.


for stochastic processes (see [39] for an overview), we decided on the Kullback-Leibler (KL) divergence [27]. The main reason for employing the KL divergence was that it has particularly suitable properties when applied to the probability space of traces generated by Markov sources. More concretely, it can be computed efficiently, as a function of only the corresponding matrix description and the transient distribution of the original process [3, 13]. As the measure can be obtained only between two CTMCs on the same state space, an upper bound on the error is proposed instead, with a technique inspired by the work in [27]. The inequality is guaranteed by a standard result of information theory.

Outline

Chapter 1 reviews the basic concepts of Markov chain theory.

Chapter 2 introduces the stochastic chemical kinetics and rule-based models.

The main purpose of Chapter 3 is to formalize fragment-based reductions, as a

notion independent of the semantics under study. In particular, the dimension and expressiveness of a set of fragments are defined as a means to compare two different fragment sets, and it is demonstrated on a toy example that fragment-based

reductions can reduce the number of states of a CTMC exponentially. Moreover,

the general algorithm for reducing a rule-based program with respect to any given

fragment set is presented.

Chapter 4 and Chapter 5 contain the results on the exact reduction problem. Chapter 4 outlines the results related to exact Markov chain aggregation, independently of the application to rule-based models. Discrete-time and continuous-time

Markov chains are studied separately. All properties are summarized in Theorem 4.28, where the relations between the trace semantics and the transient semantics of the original and the aggregated process are comprehensively presented.

In Chapter 5, the focus is on the practical implications of the theory presented

in Chapter 4, in the context of fragment-based reductions of rule-based models.

We propose an algorithm for obtaining the set of fragments for a given rule-based

program (Algorithm 2), and in Theorem 5.10, we prove that the suggested set

of fragments guarantees an exact reduction. Moreover, we show how to compute

the probability of a species-based state P(Xt = x), given the probability of the

fragment-based state P(Yt = y = ϕ(x)).


Chapter 6 and Chapter 7 present the results on the approximate reduction problem.

Chapter 6 defines the reduction error as the Kullback-Leibler distance between the trace distributions of the projected process Yt and the reduced chain Y′t, and shows how to compute the error as a function of the respective generator matrices and transient distributions (Theorem 6.12). Moreover, it is shown that

the upper bound on the error can be evaluated when only the generator matrix

and the transient distribution of the original model are known (Theorem 6.19). In

Chapter 7, the framework is instantiated over three examples. Simulation results

indicate how the error can be used to discriminate between fragment sets of equal

dimension.

In Chapter 8, the framework of exact reductions is discussed over two large-scale

case studies.

Parts of the ideas in this thesis are reflected in the Kappa modeling environment [34], in the ‘complx’ toolbox. The implementation of the approximate reductions framework within the Kappa modeling environment is a work in progress, and for that

reason, we leave the analysis of the approximate framework for large-scale case

studies to future work.

Related work

The principle of drawing conclusions about a system’s dynamics by analyzing its model description originates from, and is exhaustively studied in, the field of formal program verification and model checking [11, 16], and it has recently been gaining recognition in the context of programs used for modeling biochemical networks.

Examples are the aforementioned work on detecting fragments for reducing deterministic rule-based models [31], on detecting the information flow in ODE models of biochemical signaling [8, 50], and the reaction network theory [18].

To the best of our knowledge, the presented method is the only static analysis technique for reducing stochastic models of protein interactions. In this respect, we distinguish the fragment-based approach from model reduction techniques based on, for example, separating time-scales [44, 55, 75], or from numerical algorithms that focus on efficiently solving the chemical master equation [52, 66]. Still, once a fragment-based rule set is obtained, it is amenable to any further analysis.


In contrast, the Markov chain aggregation problem has been extensively studied in theory and application. Put in the context of our problem formulation, strong lumpability refers to the property of Xt that there exists an exact aggregation by partition ϕ for any initial distribution. Tian and Kannan [82] extended the notion of strong lumpability to continuous-time Markov chains. The more general situation of weak lumpability refers to the property of Xt that there exists an exact aggregation by partition ϕ for a subset of initial distributions. The notion first appeared in [56], and subsequent papers [62, 77, 78] focused on developing an algorithm for characterizing the desired set of initial distributions. This reconstruction property,

demonstrated to be efficiently realizable in our framework, is not addressed explic-

itly for weakly lumpable chains in the previous literature. A variant of our condition can be found in [10], where the author considered backward bisimulation over a

class of weighted automata (finite automata where weights and labels are assigned

to transitions). Moreover, we proved that even if the initial distribution is not

in accordance with the invariant distribution among lumped states, it will be so

asymptotically. These convergence results, to the best of our knowledge, have not

been discussed before.

Parts of this thesis were built on results that were previously published in collab-

oration with colleagues: [32, 33, 37, 72]. Published work related to the topic of

this thesis but not discussed or only cited is [70, 71].


Chapter 1

Preliminaries

In this section, we review the basic concepts of probability theory, which are

needed for developing the later analysis of Markov processes. A more elaborate discussion, and proofs of the statements, can be found in standard measure theory and

probability theory textbooks (for example, [29]).

We start with the general concepts of measure theory.

Let E be a set. We denote by P(E) the set of all subsets of E (power-set), by Ec

the complement of E, and by ∣E∣ the number of elements in E (it is not defined

when E is not countable, and may be infinite when E is countable). The range of values taken by a function f ∶ E → E′ is denoted by R(f). A partition of a set E is given by an equivalence relation on E, and we refer to the corresponding equivalence classes as partition classes. We use N to denote the set of natural numbers including zero.

Definition 1.1. A σ-algebra E on E is a non-empty set of subsets of E, which is

closed under complement and countable unions, that is,

(i) ∅ ∈ E ,

(ii) for all A ∈ E , Ac ∈ E ,

(iii) for all sequences {Ai}i∈N of elements in E , ⋃i Ai ∈ E .

Definition 1.2. Let E be a set and E a σ-algebra on E. The pair (E,E) is called

a measurable space, and each A ∈ E is called a measurable set. A measure µ on


(E,E) is a function µ ∶ E → [0,∞], such that µ(∅) = 0, and, for any sequence

{Ai}i∈N of pairwise disjoint elements of E ,

µ(⋃i Ai) = ∑i µ(Ai),

that is, it satisfies the countable additivity property. The triple (E,E , µ) is called

a measure space. In particular, we will say that a measure is σ-finite, if E is a

countable union of measurable sets with finite measure.

Let A be a collection of subsets of E (A ⊆ P(E)). The smallest σ-algebra on E

which contains all elements of A trivially exists and it is denoted by σ(A). We

call it the σ-algebra generated by A.
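On a finite set, σ(A) can be computed by brute force, closing the collection under complement and pairwise union until a fixed point is reached; a small sketch (names illustrative):

```python
from itertools import combinations

def generated_sigma_algebra(E, A):
    """Brute-force sigma(A) on a finite set E: close the collection A
    under complement and pairwise union until a fixed point is reached
    (on a finite set this yields all countable unions as well)."""
    E = frozenset(E)
    sigma = {frozenset(), E} | {frozenset(a) for a in A}
    changed = True
    while changed:
        changed = False
        for s in list(sigma):
            c = E - s                          # closure under complement
            if c not in sigma:
                sigma.add(c)
                changed = True
        for s, t in combinations(list(sigma), 2):
            u = s | t                          # closure under union
            if u not in sigma:
                sigma.add(u)
                changed = True
    return sigma

# sigma({{1}, {2}}) on E = {1, 2, 3, 4} has the atoms {1}, {2}, {3, 4},
# hence 2^3 = 8 measurable sets.
sa = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
```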

When E is a countable set, the measure on (E,P(E)) which assigns to each

measurable set A the number of elements of A, that is, m(A) = ∣A∣, is called the

counting measure. When E is not countable, we work with Borel sets.

Definition 1.3. Given a set E, the σ-algebra generated by the set of open sets in

E is called the Borel σ-algebra of E, denoted by B(E), and its elements are called

Borel sets. A measure µ on (E,B(E)) is called a Borel measure on E.

We will denote by B the Borel σ-algebra on R. Unless specified otherwise, we assume the Borel measure space (R,B, µ), where µ is the Lebesgue measure. Recall that

the Lebesgue measure on R is the Borel measure on R such that, for all a, b ∈ R,

if a < b then µ((a, b]) = b − a.

Definition 1.4. Given two measurable spaces (E,E) and (E′,E ′), a function

f ∶ E → E′ is measurable if the inverse image of every A ∈ E ′, defined by f−1(A) = {e ∈ E ∣ f(e) ∈ A}, is in E . A measurable function is often called a measurement on

(E,E).

For example, a measurement on (E,E) is the indicator function of a set A ∈ E , defined by 1A(e) = 1 if e ∈ A, and 1A(e) = 0 otherwise.

The measure of f on any measurable set can be determined through the concept

of integral. The right intuition for the integral is to think of the volume taken by

the function f .


If the range of a measurable function is a finite set, the function is said to be simple. Observe that, if R(f) = {a1, . . . , ak}, by setting Ai ∶= f−1({ai}) for i = 1, . . . , k, we obtain a partition of E. A measurable function on a Borel σ-algebra of E is called a Borel function.

Definition 1.5. Let f be a nonnegative measurable function on (E,E). If f is a simple function with R(f) = {a1, . . . , ak} and Ai = f−1({ai}), the integral of f is ∫ f dµ ∶= ∑_{i=1}^{k} ai µ(Ai). If f is a Borel function (not necessarily over R), the integral of f is the supremum

∫ f dµ ∶= sup { ∫ g dµ ∣ g is a simple function such that g(e) ≤ f(e) for all e ∈ E },

if it exists. The integral is also called the mean or expectation of f with respect to the measure µ.

Then, for a given measurable set A ∈ E , the integral of f on A is ∫ f 1A dµ, written ∫A f dµ. It can be shown that the set function ν(A) ∶= ∫A f dµ is a well-defined measure, so that the triple (E,E , ν) is a measure space. In fact, f is then called the density of ν with respect to µ. The following theorem states the existence of the density function. We rephrase the theorem because, as it defines the density for general Borel spaces, it will be useful for introducing the density of a probability measure over traces of a continuous-time Markov chain, which will be important when defining the error measure in the framework for approximate aggregations.

As mentioned, intuitively, the integral represents the volume of f in reference to µ. Clearly, the measure ν ∶= ∫ f dµ is then dominated by µ, in the sense that whenever a set has µ-measure 0, its ν-measure is also 0.

Definition 1.6. If ν and µ are two measures on the same measurable space then

µ is said to be absolutely continuous with respect to ν, or dominated by ν, written

µ≪ ν, if µ(A) = 0 for every set A ∈ E for which ν(A) = 0.

Theorem 1.7. (Radon-Nikodym theorem) Let µ and ν be two measures on (E,E), and let µ be σ-finite. If ν ≪ µ, then there exists a nonnegative measurable function f on E, unique up to a set of µ-measure zero, such that

ν(A) = ∫A f dµ, for all A ∈ E .


The function f is called the Radon-Nikodym derivative or density of ν with respect

to µ and is denoted by dν/dµ. If ∫ fdµ = 1 for an f ≥ 0, then ν is a probability

measure and f is called its probability density function (pdf) with respect to µ.
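In the discrete case the theorem is elementary: with respect to the counting measure m, the density is simply the point mass, dν/dm(e) = ν({e}). A minimal numeric check (names illustrative):

```python
def density_wrt_counting(nu_point_masses):
    """Discrete Radon-Nikodym derivative with respect to the counting
    measure m: d(nu)/dm (e) = nu({e}), so that the integral of f over A
    with respect to m recovers nu(A)."""
    return dict(nu_point_masses)

# A measure nu on E = {0, 1, 2} given by its point masses; since the
# masses sum to 1, nu is a probability measure and f is its pdf w.r.t. m.
nu = {0: 0.2, 1: 0.5, 2: 0.3}
f = density_wrt_counting(nu)
A = {1, 2}
nu_A = sum(f[e] for e in A)      # integral of f over A w.r.t. m
```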

1.1 Probability spaces and random variables

Definition 1.8. A probability space is a measure space (Ω,F ,P), with measure P, such that P(Ω) = 1. It provides a model for an experiment whose outcome is

subject to chance, and has the following interpretation:

(i) Ω is the set of outcomes of the experiment, called samples;

(ii) F is the set of observable sets of outcomes, called events;

(iii) P(A) is the probability of an event A ∈ F .

In what follows, we assume a given probability space (Ω,F ,P).

Definition 1.9. A random variable X is a measurement on (Ω,F). We call X

a discrete random variable, if R(X) is a countable set, and a continuous random

variable, otherwise.

A random variable is a means to reduce the probability space to the observations

of interest. If X is a random variable from (Ω,F) to (E,E), we say that (E,E) is the measurable space generated by X. The measurability of X ensures that the outputs of the random variable naturally inherit their own probability measure PX. For example, PX(A), denoted also by P(X ∈ A), amounts to

PX(A) = P({ω ∈ Ω ∣ X(ω) ∈ A}).

Definition 1.10. The law, or distribution, of a random variable X is the image measure PX ∶ E → [0,1] defined by PX = P ∘ X−1. The range or sample space of

a discrete random variable is also called the alphabet of the random variable X,

and the random variable with alphabet E is also said to be an E-valued random

variable.


The expectation or mean of a random variable X is the integral ∫ X dP, denoted by E[X]. Then, (X − E[X])2 is also a random variable, and its expectation Var(X) = E[(X − E[X])2] is called the variance of X.

For discrete measurements, the probability density function fX ∶ {a1, a2, . . .} → [0,1] is determined with respect to the counting measure m on (E,E), and is more commonly termed the probability mass function (pmf) of X. It is defined by

pX(ai) ∶= P(X = ai) for all ai ∈ E.

Then, by the definition of the integral for discrete measurements, the probability of an event A ∈ P(E) is PX(A) = ∫A fX dm, which indeed computes to the intuitive result PX(A) = ∑ai∈A pX(ai). Similarly, the expectation of X is E[X] = ∫x∈R(X) x fX(x) dm, which agrees with the intuition for the mean of a discrete measurement: E[X] = ∑ai∈R(X) ai pX(ai).

Example 1.1. The simplest discrete random variables are:

(i) A Bernoulli random variable Xp ∶ Ω → {0,1}, defined by PXp(1) = p and PXp(0) = 1 − p. The mean of a Bernoulli random variable is E[Xp] = p, and the variance is Var(Xp) = p(1 − p). For example, the indicator function of an event A ∈ F is a Bernoulli random variable with parameter P(A).

(ii) A binomial random variable X(n,p) ∶ Ω → {0, . . . , n}, defined by P(X(n,p) = k) = (n choose k) p^k (1 − p)^(n−k). The mean of a binomial random variable is E[X(n,p)] = np, and the variance is Var(X(n,p)) = np(1 − p). A binomial random variable with parameters n and p represents the total number of successful outcomes when repeating n independent Bernoulli trials with parameter p.

(iii) A geometric random variable Xp ∶ Ω → {1,2, . . .}, defined by P(Xp = k) = (1 − p)^(k−1) p. The mean of a geometric random variable is E[Xp] = 1/p, and the variance is Var(Xp) = (1 − p)/p^2. A geometric random variable with parameter p represents the number of trials until a success happens.

(iv) A Poisson random variable Poλ ∶ Ω → {0,1,2, . . .}, defined by

P(Poλ = k) = λ^k e^(−λ) / k!.

The mean is E[Poλ] = λ and the variance is Var(Poλ) = λ.
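The stated means and variances can be checked numerically from the pmfs; for instance, for the binomial case (a small sketch, names illustrative):

```python
from math import comb

def binomial_pmf(n, p):
    """pmf of the binomial variable X_(n,p): P(X = k) = C(n,k) p^k (1-p)^(n-k)."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def mean_var(pmf):
    """Mean and variance of a pmf given as a dict {value: probability}."""
    m = sum(k * q for k, q in pmf.items())
    v = sum((k - m) ** 2 * q for k, q in pmf.items())
    return m, v

pmf = binomial_pmf(10, 0.3)
m, v = mean_var(pmf)   # should agree with n p = 3 and n p (1 - p) = 2.1
```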


For continuous measurements, the density function of the measure PX is typically taken to be the one with respect to the Lebesgue measure µ on (R,B). If it exists, it is called the probability density function (pdf) of PX. Then, the probability is determined by

PX(A) = ∫A fX(x) dµ(x) = ∫A fX(x) dx,

where the latter is the standard Lebesgue integral. Moreover, for all nonnegative Borel functions g, E[g(X)] = ∫R g(x) fX(x) dx. In particular, the expectation of X is computed as E[X] = ∫ X dP = ∫x∈R(X) x dPX = ∫z∈R z fX(z) dz.

If X is a random variable into the Borel space (R,B), the measure PX is uniquely determined by its values on the intervals {(−∞, x] ∣ x ∈ R}. The function FX ∶ R → [0,1] defined by FX(x) = PX((−∞, x]) = P({ω ∈ Ω ∣ X(ω) ≤ x}) = P(X ≤ x) is called the cumulative distribution function (cdf) of X (or of PX). It is also common to say that fX is the pdf of X (and not of PX, as introduced before).

Example 1.2. The two most important random variables for constructing a

continuous-time Markov chain are Poisson random variables and exponential ran-

dom variables. The exponential random variable Expλ ∶ Ω → [0,∞) has the probability density function

fExpλ(x) = λ e^(−λx), for x ≥ 0.

The mean is E[Expλ] = 1/λ and the variance is Var(Expλ) = 1/λ^2.

Definition 1.11. (memoryless property) For x, y ∈ R(X), a random variable

X ∶ Ω → R is said to have the memoryless property if

P(X > x + y ∣ X > x) = P(X > y).

It can be shown that the geometric random variables are the only discrete random

variables satisfying the memoryless property, and that exponential random variables are the only continuous random variables satisfying the memoryless property.
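For the geometric case the property can be verified directly, since P(Xp > k) = (1 − p)^k (the event X > k means that the first k Bernoulli trials all fail); a quick check:

```python
def geometric_tail(p, k):
    """P(X_p > k) for the geometric variable on {1, 2, ...}: the event
    X > k means that the first k Bernoulli trials all fail, so
    P(X > k) = (1 - p)^k."""
    return (1.0 - p) ** k

p, x, y = 0.3, 4, 7
lhs = geometric_tail(p, x + y) / geometric_tail(p, x)  # P(X > x+y | X > x)
rhs = geometric_tail(p, y)                             # P(X > y)
```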

1.2 Distance between probability measures

We introduce an information-theoretic measure of similarity between probability

distributions, called the Kullback-Leibler divergence (KL divergence). It will be


used later, as a distance measure between the trace semantics generated by two

different Markov chains over the same state space.

Relative entropy, or Kullback-Leibler divergence (KL divergence), is a generalization of entropy. KL divergence is always non-negative, but it

is not a metric: it is non-symmetric, and it does not satisfy the triangle inequality.

It is still often used as a measure of similarity between probability distributions.

A common technical interpretation is that KL divergence is the coding penalty

associated with selecting the candidate distribution to approximate the correct

distribution [17].
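For discrete distributions p and q, the divergence is D(p ∥ q) = ∑i pi ln(pi/qi) (a standard definition; the sketch below is illustrative):

```python
from math import log

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i ln(p_i / q_i), in nats, with the conventions
    0 ln 0 = 0 and D = +infinity whenever q_i = 0 < p_i."""
    if any(qi == 0.0 and pi > 0.0 for pi, qi in zip(p, q)):
        return float("inf")
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)   # non-negative, and in general != D(q || p)
```

The asymmetry is visible already in this two-point example: D(p ∥ q) and D(q ∥ p) differ, which is one reason the KL divergence is a divergence and not a metric.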

Jensen’s inequality and the log-sum inequality are useful for providing bounds on information-theoretic measures.

Theorem 1.12. Let (Ω,F ,P) be a probability space, f an integrable real-valued

random variable, and φ a convex function. Then,

φ(E[f]) ≤ E[φ(f)],

also known as Jensen’s inequality.

Proof. We discuss the inequality only for discrete measurements f, by induction on the cardinality of R(f) = {x1, . . . , xn}. Let pi = Pf({xi}). For n = 2, φ(E[f]) = φ(p1x1 + p2x2) and E[φ(f)] = p1φ(x1) + p2φ(x2). Since p2 = 1 − p1 and φ is convex, φ(p1x1 + p2x2) ≤ p1φ(x1) + p2φ(x2), so the inequality follows. Assume that the inequality holds for all k < n. Then, E[φ(f)] = ∑_{i=1}^{n} pi φ(xi) = pn φ(xn) + ∑_{i=1}^{n−1} pi φ(xi). Since p1 + . . . + pn−1 = 1 − pn, the sequence {p′i = pi/(1 − pn)}_{i=1,...,n−1} is a probability distribution and, by the induction hypothesis, ∑_{i=1}^{n−1} p′i φ(xi) ≥ φ(∑_{i=1}^{n−1} p′i xi). Therefore, we obtain that E[φ(f)] ≥ pn φ(xn) + (1 − pn) φ(∑_{i=1}^{n−1} p′i xi). Finally, pn φ(xn) + (1 − pn) φ(∑_{i=1}^{n−1} p′i xi) ≥ φ(∑_{i=1}^{n} pi xi) = φ(E[f]), by the convexity of φ.
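The inequality is easy to check numerically for a convex φ, e.g. φ(x) = x ln x, on a three-point measurement (illustrative values):

```python
from math import log

def phi(x):
    """phi(x) = x ln x, convex on (0, infinity)."""
    return x * log(x)

xs = [0.5, 1.0, 4.0]   # output values of the measurement f
ps = [0.2, 0.3, 0.5]   # their probabilities

E_f = sum(p * x for p, x in zip(ps, xs))
E_phi_f = sum(p * phi(x) for p, x in zip(ps, xs))
# Jensen's inequality: phi(E_f) <= E_phi_f
```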

Theorem 1.13. Let n ≥ 2, let a1, . . . , an and b1, . . . , bn be non-negative real numbers, and let a = ∑i ai, b = ∑i bi. Then

∑_{i=1}^{n} ai ln(ai/bi) ≥ a ln(a/b),

with equality if and only if a1/b1 = . . . = an/bn. The inequality is termed the log-sum inequality. In particular, in the special case when (ai) and (bi) are probability distributions, we obtain the Gibbs inequality:

∑_{i=1}^{n} ai ln(ai/bi) ≥ 0.

Proof. Let φ(x) = x ln x, and let the random variable f have output values xi = ai/bi and distribution pi = P(f = xi) = bi/b. Since φ″(x) = 1/x > 0 for x > 0, the function φ is convex and Jensen’s inequality applies. Then, E[φ(f)] = ∑_i (bi/b)[(ai/bi) ln(ai/bi)] = (1/b)(∑_i ai ln(ai/bi)), and φ(E[f]) = φ(∑_i pixi) = (∑_i pixi) ln(∑_i pixi) = (a/b) ln(a/b) = (1/b)(a ln(a/b)). The claim follows by multiplying both sides of Jensen’s inequality by b. The equality holds in case f is constant, that is, when all the ratios ai/bi coincide.
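The log-sum and Gibbs inequalities are easy to check numerically; the vectors below are arbitrary illustrative choices, not taken from the thesis:

```python
import math

def log_sum_lhs(a, b):
    """Left-hand side of the log-sum inequality, with 0 * ln(0/b_i) := 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# two probability distributions: here the bound specializes to Gibbs' inequality
a = [0.2, 0.5, 0.3]
b = [0.4, 0.4, 0.2]
assert log_sum_lhs(a, b) >= sum(a) * math.log(sum(a) / sum(b))

# equality case: all ratios a_i / b_i coincide
a2, b2 = [0.1, 0.2, 0.3], [0.2, 0.4, 0.6]
assert abs(log_sum_lhs(a2, b2) - sum(a2) * math.log(sum(a2) / sum(b2))) < 1e-12
```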

1.2.1 Entropy and mutual information

In the following, assume given a probability space (Ω,F,P) with a discrete measurement f, which induces the probability measure Pf and the probability mass function pf.

Definition 1.14. The entropy of f is defined by

HP(f) = − ∑_{a∈R(f)} P(f = a) ln P(f = a) = − ∑_{a∈R(f)} pf(a) ln pf(a).

By convention, 0 ln 0 = 0, and the logarithm base is arbitrary. In particular, if the logarithm base is 2, the units for entropy are ‘bits’, and if the natural logarithm is used, the units are ‘nats’. The subscript in HP will be omitted when clear from the context.

Intuitively, entropy measures the uncertainty associated with a random variable. More concretely, let the information contained in the outcome a ∈ R(f) be given by If(a) = ln(1/Pf(a)) = − ln P{ω ∣ f(ω) = a}, that is, the number of bits (for the base-2 logarithm) needed to encode the fraction of outcomes measured with a (e.g., encoding an outcome of probability 0.25 takes exactly two bits). Then, entropy can be interpreted as exactly the average information contained in the distribution: H(f) = EPf[If] = EPf[− ln Pf] (see [80] for further reference).

Theorem 1.15. The entropy satisfies 0 ≤ H(f) ≤ ln ∣R(f)∣, where the lower bound is attained if and only if f is constant (no uncertainty), and the upper bound is attained if and only if f is uniformly distributed.


Proof. The lower bound follows since each term −pf(a) ln pf(a) is non-negative, and vanishes only when pf(a) ∈ {0,1}, that is, when f is constant. For the upper bound, apply the log-sum inequality (Theorem 1.13) with R(f) = {1, …, n}, ai = Pf(i) and bi = 1: then ∑_i ai ln ai ≥ a ln(a/n) = ln(1/n), hence H(f) = −∑_i ai ln ai ≤ ln n, with equality only if a1 = … = an = 1/n.
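As a quick numerical illustration of these bounds (with hypothetical distributions over four outcomes):

```python
import math

def entropy(p):
    """H(p) in nats, with the convention 0 ln 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
constant = [1.0, 0.0, 0.0, 0.0]   # no uncertainty: H = 0
uniform = [1.0 / n] * n           # maximal uncertainty: H = ln n
skewed = [0.7, 0.1, 0.1, 0.1]     # strictly between the two bounds

assert entropy(constant) == 0.0
assert abs(entropy(uniform) - math.log(n)) < 1e-12
assert 0.0 < entropy(skewed) < math.log(n)
```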

In the following, we use the notation ⟨f, g⟩(ω) for ⟨f(ω), g(ω)⟩, and 1a=b for the indicator function defined by 1a=b = 1 if a = b, and 1a=b = 0 otherwise.

Theorem 1.16. For two discrete random variables f and g on a common probability space, it holds that max{H(f),H(g)} ≤ H(f, g) ≤ H(f) + H(g), where the left equality holds if and only if f = g, and the right equality holds if and only if f and g are independent.

For the proof, we refer to [46]. The latter inequality motivates the notion of mutual

information between two measurements.

Definition 1.17. Mutual information between two measurements f and g is de-

fined by I(f ; g) =H(f) +H(g) −H(f, g).

The mutual information of f with itself is equal to its entropy, because H(f, f) = H(f). More concretely, since P(f = b ∣ f = a) = 1a=b, it follows that

H(f, f) = − ∑_{a,b∈R(f)} P(f,f)((a, b)) ln P(f,f)((a, b)) = − ∑_{a,b∈R(f)} Pf(a)1a=b (ln Pf(a) + ln 1a=b) = H(f).
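Definition 1.17 can be evaluated directly from a joint probability mass function; the joint pmf below is a hypothetical example over two binary measurements:

```python
import math

def entropy(p):
    """H(p) in nats, with the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

# joint pmf of (f, g) as a dict {(a, b): prob}; f and g take values in {0, 1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pf = [sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1)]
pg = [sum(p for (a, b), p in joint.items() if b == j) for j in (0, 1)]

H_fg = entropy(list(joint.values()))
I = entropy(pf) + entropy(pg) - H_fg                   # Definition 1.17
assert I >= 0.0                                         # mutual information is non-negative
assert max(entropy(pf), entropy(pg)) <= H_fg + 1e-12   # Theorem 1.16, left inequality
```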

1.2.2 Relative entropy

Assume now given a measurable space (Ω,F) with two different probability measures, P and M. Let f be a discrete measurement, denote its respective probability measures by Pf and Mf, and its probability mass functions by pf and mf.

Definition 1.18. If Pf is absolutely continuous with respect to Mf, that is, Pf ≪ Mf, the relative entropy of f with measure Pf, with respect to the measure M, is

HP∣∣M(f) = ∑_{a∈R(f)} pf(a) ln(pf(a)/mf(a)).

Otherwise, HP∣∣M(f) = ∞.


An immediate corollary of the Gibbs inequality (Theorem 1.13) is that relative entropy is always non-negative. Since entropy is not a function of the particular output values a ∈ R(f), but only of the distribution of the random variable, it is useful to view the entropy as a function of the partition induced on the original probability space. Let R(f) = {a1, a2, …} and Qi = f−1(ai). If Q = {Q1, Q2, …} denotes the partition of Ω induced by f, we may write HP(Q) = −∑_i P(Qi) ln P(Qi).

Definition 1.19. We will say that a measurement f′ refines a measurement f, written f′ ⪯ f, if for all a ∈ R(f′) there exists b ∈ R(f) such that f′−1(a) ⊆ f−1(b). In other words, if the induced partitions are denoted by Q′ and Q, every element of Q′ needs to be contained in some element of Q.

The next theorem shows that passing to a coarser measurement of a discrete random variable lowers both the relative entropy and the entropy of that random variable. The generalization of this result to continuous measurements will be used for providing the error bound in the approximate aggregation framework.

Theorem 1.20. If f ⪯ g, then HP∣∣M(f) ≥HP∣∣M(g) and HP(f) ≥HP(g).

Proof. Let f ∶ Ω → D1 and g ∶ Ω → D2. Since f ⪯ g, there exists a function θ ∶ D1 → D2 (which is a measurement on the probability space (D1, P(D1), Pf)), such that g = θ ∘ f.

If HP∣∣M(f) = ∞, the theorem trivially holds. If HP∣∣M(g) = ∞, then also HP∣∣M(f) = ∞, and the theorem again holds: if HP∣∣M(g) = ∞, there exists an element b ∈ D2 such that Pg(b) > 0 and Mg(b) = 0. But Mg(b) = 0 implies that Mf(a) = 0 for all a ∈ θ−1(b), while Pg(b) ≠ 0 implies that Pf(a) > 0 for some a ∈ θ−1(b), so Pf is not absolutely continuous with respect to Mf. The remaining case is when Pf is dominated by Mf and Pg is dominated by Mg. Then,

HP∣∣M(f) = ∑_{a∈D1} pf(a) ln(pf(a)/mf(a))

= ∑_{b∈D2} [ ∑_{a∈θ−1(b)} P{ω ∣ f(ω) = a} ln( P{ω ∣ f(ω) = a} / M{ω ∣ f(ω) = a} ) ]

≥ ∑_{b∈D2} [ ( ∑_{a∈θ−1(b)} P{ω ∣ f(ω) = a} ) ln( ∑_{a∈θ−1(b)} P{ω ∣ f(ω) = a} / ∑_{a∈θ−1(b)} M{ω ∣ f(ω) = a} ) ]

= ∑_{b∈D2} Pg(b) ln(Pg(b)/Mg(b)) = HP∣∣M(g),


where the inequality step relies on the log-sum inequality applied to the bracketed

expression. The proof for entropy is similar.
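Theorem 1.20 can be illustrated by lumping a finer distribution through a map θ; the distributions p_f, m_f and the lumping map below are hypothetical:

```python
import math

def kl(p, m):
    """Relative entropy H_{P||M}, assuming p is absolutely continuous w.r.t. m."""
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)

# fine measurement f over {0,1,2,3}; coarser g = theta(f) lumps {0,1} and {2,3}
p_f = [0.1, 0.3, 0.2, 0.4]
m_f = [0.25, 0.25, 0.25, 0.25]
theta = [0, 0, 1, 1]

def lump(p, theta, k=2):
    """Push a distribution forward through theta, summing lumped masses."""
    q = [0.0] * k
    for pi, b in zip(p, theta):
        q[b] += pi
    return q

p_g, m_g = lump(p_f, theta), lump(m_f, theta)
assert kl(p_f, m_f) >= kl(p_g, m_g)   # coarsening lowers relative entropy
```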

1.3 Markov chains

The theory of Markov processes has a wide variety of applications ranging from

engineering to biological sciences. In systems biology, Markov processes are used for the stochastic modeling of biochemical reaction systems, especially when the constituent species are present in low abundance. In this chapter, we

first recall some general notions related to stochastic processes, and then we review

the concepts about discrete-time and continuous-time Markov chains which will

be necessary in the rest of the thesis.

Let S be a countable set, and (T,<) a totally ordered set.

Definition 1.21. A stochastic process or random process with state space S and parameter set T is a collection of random variables {Xt ∶ t ∈ T} defined on a common probability space (Ω,F,P). If T is countable, the process is said to be discrete, and otherwise it is continuous.

The index t usually represents time, and then one thinks of Xt as the state of

the process at time t. For any subset T′ = {t1 < … < tn} ⊂ T, the probability distribution Pt1,…,tn = P ∘ (Xt1, …, Xtn)−1 of the random vector (Xt1, …, Xtn) ∶ Ω → Sn is called a finite-dimensional marginal distribution of the process {Xt ∶ t ∈ T}.

Definition 1.22. Two random processes are equivalent if they agree on all finite-dimensional marginal distributions.

For every fixed ω ∈ Ω, the mapping t ↦ Xt(ω) defines a trace, also called a

realization, trajectory, sample path or sample function of the process. Additional

structure is assumed on a stochastic model in order to render the model analysis

easier.

Definition 1.23. Given a stochastic process {Xt} on a countable state space S, let HX(t) denote all the information about the process {Xt} up to time t ∈ T. The process {Xt} satisfies the Markov property if, for all states s′ ∈ S and all times t + h > t,

P{Xt+h = s′ ∣ HX(t)} = P{Xt+h = s′ ∣ Xt}.


The process {Xt} is said to be time-homogeneous if, being in the state s, the probability that the next state is s′ is the same no matter for how long the system has been observed:

P{Xt+h = s′ ∣ Xt = s} = P{Xh = s′ ∣ X0 = s}.

Example 1.3. A simple example of a continuous-time process, which plays an important role in constructing the continuous-time Markov chain, is a counting process or Poisson process. The Poisson process with intensity λ is a continuous-time process {ξt} taking values in N, such that (i) ξ(0) = 0, (ii) the numbers of events in disjoint time intervals are independent, and (iii) ξ(s + t) − ξ(t) ∼ Po(λs), that is, P(ξ(s + t) − ξ(t) = k) = e^{−λs} (λs)^k / k!, for s ≥ 0.

1.3.1 Markov chains and Markov graphs

When S is a finite set, it is useful to switch to vector notation. Under some ordering of the state space, we will denote by P(t) the transition probability matrix for step t, with entries p(t)(s, s′) = P(Xt = s′ ∣ X0 = s). It can be observed that any Markov, time-homogeneous process satisfies the Chapman-Kolmogorov equations:

P(t+h) = P(t)P(h) for all t, h ∈ T. (1.1)

The Chapman-Kolmogorov equations ensure that any finite-dimensional marginal distribution of a Markov, time-homogeneous process can be determined through P(1) (denoted by P) in the case of discrete time, and through the derivative (d/dt)P(t) at t = 0 (denoted by Q) in the case of continuous time. For that reason, a Markov, time-homogeneous stochastic process on a countable set can be concisely represented in terms of a chain (graph).

For our analysis, it will be useful to explicitly keep track of the state space of the

process, and the distribution at which the process is initiated.

Definition 1.24. A Markov graph (MG) is a triple (S,w, p0), such that

(i) S is a countable state space,

(ii) w ∶ S × S → R defines the transition weights, and

(iii) p0 ∶ S → [0,1] is an initial distribution, such that ∑_{s∈S} p0(s) = 1.


We later assign a Markov process to a Markov graph. The process assigned

to a Markov graph will be either a discrete-time Markov chain (DTMC) or a

continuous-time Markov chain (CTMC). The graph description is separated from the process itself because we will later make statements about Markov graphs independently of the process assigned to them.

Definition 1.25. A discrete-time Markov chain (DTMC) is a discrete-time random process {Xn}n∈N which satisfies the Markov and time-homogeneity properties.

Definition 1.26. A continuous-time Markov chain (CTMC) is a continuous-time random process {Xt}t∈R≥0 which satisfies the Markov and time-homogeneity properties.

1.4 Discrete-time Markov chains

Depending on whether the process assigned to a Markov graph is discrete or continuous, we will call it a discrete Markov graph or a continuous Markov graph, respectively. We start by defining a discrete-time Markov graph. We will say that w(s, ⋅) is a probability distribution if w(s, s′) ≥ 0 for all s′ ∈ S and ∑_{s′∈S} w(s, s′) = 1.

Definition 1.27. A discrete-time Markov graph M = (S,w,p0) is such that for all s ∈ S, w(s, ⋅) is a probability distribution. Then, a process {Xn} assigned to M is a DTMC, and it is such that, for all s, s′ ∈ S,

(i) P(X0 = s) = p0(s), and

(ii) P(X1 = s′ ∣X0 = s) = w(s, s′).

Notice that in the literature it is sometimes implicitly assumed that the DTMC {Xn} is defined only by its one-step transition matrix. In our treatment, a DTMC assigned to (S,w,p0) has a transition matrix determined by w, operates over the state space S, and has a fixed initial probability distribution p0.


1.4.1 Transient distributions

For a given DTMC {Xn}, the matrix P(1) = P is called the (one-step) transition matrix. Due to the Chapman-Kolmogorov equations, P(n) = Pn. The marginal distribution of Xn is also called the transient distribution at time n. We use the row-vector notation π(n) for the transient distribution at time n, so that π(n)(s) = P(Xn = s). Then, the transient distribution computes to π(n) = π(n−1)P = … = π(0)Pn.

Remark 1.28. It is worth realizing here that two DTMCs which are indistinguishable by their transient distributions may have different distributions of traces. For example, knowing the marginal distributions of X0 and X1 is not enough for reconstructing the distribution of their joint (X0, X1): take two DTMCs {Xn} and {X′n}, with S = {1,2,3} and p0 = (1/3, 1/3, 1/3), and let the weight functions be as follows: w(1,2) = w(2,3) = w(3,1) = w(1,1) = w(2,2) = w(3,3) = 0.5 and w′(1,3) = w′(3,2) = w′(2,1) = w′(1,1) = w′(2,2) = w′(3,3) = 0.5. The marginal distribution of either of these chains is uniform at every step, while, for example, P((X0,X1) = (1,2)) = 1/6 and P((X′0,X′1) = (1,2)) = 0.
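The two chains of the remark can be checked with exact rational arithmetic:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)
S = [1, 2, 3]
p0 = {s: third for s in S}

def w_entry(pairs):
    """Build a weight function assigning probability 1/2 to each listed edge."""
    return {(s, t): half for (s, t) in pairs}

w  = w_entry([(1, 2), (2, 3), (3, 1), (1, 1), (2, 2), (3, 3)])
w2 = w_entry([(1, 3), (3, 2), (2, 1), (1, 1), (2, 2), (3, 3)])

def marginal_X1(w):
    """Transient distribution at time 1: pi(1)(t) = sum_s p0(s) w(s, t)."""
    return {t: sum(p0[s] * w.get((s, t), 0) for s in S) for t in S}

# both chains have the uniform transient distribution at time 1 ...
assert marginal_X1(w) == marginal_X1(w2) == p0
# ... but different joint distributions of (X0, X1)
assert p0[1] * w.get((1, 2), 0) == Fraction(1, 6)
assert p0[1] * w2.get((1, 2), 0) == 0
```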

1.4.2 Stationary behavior

Let P be a transition matrix.

Definition 1.29. A probability distribution µ ∶ S → [0,1] is called a stationary probability distribution of P if

µ(s) = ∑_{s′∈S} µ(s′)P(s′, s) for all s ∈ S,

or, in matrix notation, a non-negative vector µ is an invariant measure if µP = µ.

The stationary distribution is also termed equilibrium or steady state distribution

in the related Markov chain literature. However, in the application to model-

ing biological systems, the term ‘equilibrium’ or ‘steady state’ often refers to the

stationarity with respect to the deterministic model, and thus differs from the

probabilistic equilibrium.

A stationary distribution can be interpreted as a fixed point of the Markov chain, in the sense that µ = µP = … = µPn. However, simply knowing that a fixed point exists guarantees neither that the system will converge to it, nor that it is unique.


There exists a criterion for proving the existence, uniqueness and convergence of the stationary distribution: it suffices to show that the transition matrix is irreducible and aperiodic.

Definition 1.30. We say that s communicates with s′, written s → s′, if there exists n ≥ 0 such that p(n)(s, s′) > 0. Let s ↔ s′ iff s → s′ and s′ → s. The equivalence classes of ↔ are called communication classes.

Definition 1.31. A transition matrix is irreducible if there is only one communi-

cation class. That is, if s↔ s′ for all s, s′ ∈ S. Otherwise, it is called reducible.

Definition 1.32. The period of a state s ∈ S is defined by d(s) = gcd{n ≥ 1 ∶ p(n)(s, s) > 0}, where gcd stands for the greatest common divisor. If d(s) = 1, we say that s is aperiodic, and if d(s) > 1, we say that s is periodic with period d(s).

Theorem 1.33. Suppose that a DTMC {Xn} has an irreducible and aperiodic transition matrix P. Then, there is a unique stationary distribution µ with positive values at all components, such that

P(Xn = s) → µ(s) as n → ∞, for all s,

or, equivalently, limn→∞ π(0)Pn = µ.

We refer to ([67], Theorem 1.7.7) for the proof.
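The convergence of Theorem 1.33 can be observed numerically by iterating π(n) = π(n−1)P; the matrix below is a hypothetical irreducible, aperiodic (and doubly stochastic) transition matrix, so its stationary distribution is uniform:

```python
import numpy as np

# an irreducible, aperiodic transition matrix on three states
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

pi = np.array([1.0, 0.0, 0.0])   # pi^(0): start in state 1
for _ in range(200):
    pi = pi @ P                   # pi^(n) = pi^(n-1) P

assert np.allclose(pi, [1/3, 1/3, 1/3])  # the unique stationary distribution
assert np.allclose(pi @ P, pi)           # fixed point: mu P = mu
```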

1.5 Continuous-time Markov chains

Reasoning about continuous-time processes is more subtle, loosely because the parameter set T is uncountable, and we cannot assign a probability to an uncountable union of events. In this respect, it is useful to restrict to right-continuous processes, meaning that for all ω ∈ Ω and t ≥ 0 there exists ε > 0 such that Xs(ω) = Xt(ω) for s ∈ [t, t + ε]. This assumption allows reasoning about any finite-dimensional distribution of the process. For example, one can find the marginal probability P(Xt = s), the reachability of the state s, P(Xt = s for some t), or the finite-dimensional probability P(Xt0 = s0, …, Xtn = sn).

Every trace of a right-continuous process remains constant for a while, and then

‘jumps’ to a new state. We will assume in our analysis that a process is regular,


that is, in all finite intervals [0, t), only finitely many jumps may occur (otherwise,

the process is said to be explosive). The restriction to non-explosive processes

is justified because the stochastic models of biochemical networks are trivially

non-explosive, if modeled at a proper resolution. Each right-continuous process is

associated with random variables

(i) ξ0, ξ1, … ∈ R≥0, the jump times of {Xt} (absolute time instants at which jumps occur), defined by

ξ0 = 0, ξn+1 = inf{t > ξn ∣ Xt ≠ Xξn},

(ii) τ0, τ1, … ∈ R≥0, the waiting times, given by τi = ξi+1 − ξi (waiting times relative to the last occurred jump), and

(iii) Z0, Z1, … ∈ S, the sequence of states visited by jumps, given by Zi = Xξi. The process {Zn ∶ n = 0, 1, …} defines the jump chain, also called the embedded discrete process or the skeleton process.

1.5.1 A discussion on constructing the CTMC

Before the formal description, we briefly discuss which parameters specify a CTMC.

Assume that we want to construct a continuous-time Markov time-homogeneous

process on a countable state space S. To start with, apart from knowing the initial distribution, we also need to specify how long the process waits in each of the states. Since we want the process to be Markov and time-homogeneous, the

waiting time must be distributed by a memoryless distribution. The only memoryless continuous distribution is the exponential one; hence, the waiting time must be exponentially distributed. The distribution parameter should depend solely on the state s0, and is often called the activity of s0: (ξ1 ∣ Z0 = s0) ∼ Exp(a(s0)). Notice that the expected waiting time is E[ξ1 ∣ Z0 = s0] = 1/a(s0), which is consistent with the intuition: the bigger the activity of a state, the faster the state will be left (on average). Moreover, it can be shown that the choice of the state visited by the next jump, P(Z1 = s1 ∣ Z0 = s0, ξ1 < t), should depend on the current state, but it should not depend on the amount of time spent in the current state (see [67], for example). Loosely, this is because one would then need to keep track, all along the time between two jumps, of the past back to when the previous jump occurred, which would in turn violate the targeted properties. Finally, letting ps,s′ = P(Z1 = s′ ∣ Z0 = s) and assigning weights to the transitions by w(s, s′) ∶= a(s)ps,s′, the weights initiating at state s must sum up to the activity of that state: ∑_{s′} w(s, s′) = a(s), since ps,⋅ is a probability distribution.

Definition 1.34. A continuous-time Markov graph M = (S,w,p0) is such that for all s, s′ ∈ S with s ≠ s′, w(s, s′) ≥ 0. The function w is also called the rate function. For any s ∈ S, set also w(s, s) = −a(s), where

a(s) = ∑_{s′≠s} w(s, s′)

is called the activity of the state s. A process {Xt} assigned to (S,w,p0) is a CTMC, and it is such that, for all s, s′ ∈ S and as h → 0,

(i) P(X0 = s) = p0(s),

(ii) P(Xh = s′ ∣ X0 = s) = w(s, s′)h + o(h) if s ≠ s′, and P(Xh = s ∣ X0 = s) = 1 − a(s)h + o(h) otherwise,

where f ∈ o(h) if limh→0 f(h)/h = 0.

There are many different ways of defining a CTMC. For example, instead of specifying the probability of transitions within a small interval [0, h), an equivalent definition, more helpful for simulating the CTMC, is given by asking that:

(i) P(X0 = s) = p0(s), for all s ∈ S,

(ii) P(ξ1 < t ∣ Z0 = s) = 1 − e^{−a(s)t}, for all s ∈ S and t ≥ 0,

(iii) P(Z1 = s′ ∣ Z0 = s) = w(s, s′)/a(s), for all s, s′ ∈ S.

The intuition is the following: assume being in state s, and having set an independent alarm clock for each s′ such that w(s, s′) > 0, each with an exponentially distributed expiration time with parameter w(s, s′). The state whose alarm expires first is the next chosen state. This interpretation is consistent, due to the following simple property of exponential distributions.

Lemma 1.35. If X1, …, Xn are independent random variables with Xi ∼ Exp(λi), then X ≡ min{Xi} ∼ Exp(λ), where λ = λ1 + … + λn. Moreover, the index of the minimal Xi is a discrete random variable with P(i = j) = λj/λ.
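This characterization yields a direct simulation scheme: from state s, sample the waiting time from Exp(a(s)) and the next state with probabilities w(s, s′)/a(s). A minimal sketch, for a hypothetical three-state rate function w:

```python
import random

# rate function w as a dict: w[s] = {s2: rate, ...}; a hypothetical 3-state chain
w = {0: {1: 2.0, 2: 1.0},
     1: {0: 0.5},
     2: {0: 3.0}}

def simulate_ctmc(w, s0, t_end, rng=random.Random(42)):
    """Sample one trace of the CTMC: jump chain plus exponential waiting times."""
    t, s, trace = 0.0, s0, [(0.0, s0)]
    while True:
        a = sum(w[s].values())                       # activity a(s)
        t += rng.expovariate(a)                      # waiting time ~ Exp(a(s))
        if t >= t_end:
            return trace
        targets, rates = zip(*w[s].items())
        s = rng.choices(targets, weights=rates)[0]   # P(Z1 = s') = w(s, s')/a(s)
        trace.append((t, s))

trace = simulate_ctmc(w, s0=0, t_end=10.0)
assert trace[0] == (0.0, 0)
assert all(t1 < t2 for (t1, _), (t2, _) in zip(trace, trace[1:]))
```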


1.5.2 Transient distribution

Similarly to the discrete-time case, we may compute the transient distributions recursively. Recall the notation p(t)(s, s′) = P{Xt = s′ ∣ X0 = s}. Then, for s′ ≠ s,

(d/dt) p(t)(s, s′) = limh→0 [p(t+h)(s, s′) − p(t)(s, s′)] / h

= limh→0 (1/h) ( ∑_{s″∈S} p(t)(s, s″)p(h)(s″, s′) − p(t)(s, s′) ), by (1.1),

= limh→0 (1/h) ( ∑_{s″∈S∖{s′}} p(t)(s, s″)p(h)(s″, s′) + p(t)(s, s′)p(h)(s′, s′) − p(t)(s, s′) )

= limh→0 (1/h) ( ∑_{s″∈S∖{s′}} p(t)(s, s″)p(h)(s″, s′) − p(t)(s, s′)[1 − p(h)(s′, s′)] )

= limh→0 (1/h) ( ∑_{s″∈S∖{s′}} p(t)(s, s″)[w(s″, s′)h + o(h)] − p(t)(s, s′)[a(s′)h + o(h)] )

= ∑_{s″∈S∖{s′}} w(s″, s′)p(t)(s, s″) − a(s′)p(t)(s, s′),

where we used Definition 1.34 to evaluate p(h)(s″, s′) and p(h)(s′, s′). Therefore, the marginal distribution of Xt computes to:

(d/dt) p(t)(s) = −a(s)p(t)(s) + ∑_{s′≠s} w(s′, s)p(t)(s′), (1.2)

also known as the Kolmogorov forward equation for the stochastic process, or the chemical master equation in the biology literature.

For a given CTMC {Xt}, the matrix Q = (d/dt)P(t) at t = 0, with entries q(s, s′) = w(s, s′) ∈ R, is called the generator matrix. Equation (1.2) translates to

(d/dt) π(t) = π(t)Q,

which solves to π(t) = π(0)e^{tQ}. The result involves e^{tQ}, the standard matrix exponential defined by e^{tQ} = ∑_{n=0}^{∞} t^nQ^n/n!. The solution is always valid when the state space is finite.


1.5.3 Uniformization

Definition 1.36. Suppose there exists r such that r ≥ sups∈S ∣Qs,s∣, and let M = Q/r + I, where I is the identity matrix of dimension ∣S∣. The DTMC with transition matrix M is called the subordinated process of {Xt} with uniformization constant r.

Let {Zn} be a DTMC on S with transition matrix M. The DTMC {Zn} is also called the uniformized or randomized chain, used in principle for the numerical computation of the marginal transient distribution of {Xt}, since

π(t) = π(0) ∑_{n=0}^{∞} t^nQ^n/n! = π(0) ∑_{n=0}^{∞} t^n(r(M − I))^n/n! = π(0) e^{−rt} ∑_{n=0}^{∞} M^n (rt)^n/n!.

The practical convenience of the method is that the error of truncating the sum can be bounded a priori:

∥π(t) − ∑_{n=0}^{k} π(0)M^n (rt)^n/n! e^{−rt}∥ = ∥∑_{n=k+1}^{∞} π(0)M^n (rt)^n/n! e^{−rt}∥ ≤ 1 − ∑_{n=0}^{k} (rt)^n/n! e^{−rt},

since the vector π(0)M^n has all components not larger than 1. Let {ξt} be a Poisson process with intensity r, independent of {Zn}. Then, it can be shown that {Xt} and {Zξ(t)} are equivalent in distribution. Consequently, the CTMC {Xt} and its subordinated DTMC {Zn} have identical stationary distributions. The uniformized chain will be useful when discussing the stationary properties of a CTMC in the context of exact Markov chain aggregation.
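A minimal numerical sketch of the method, for a hypothetical three-state generator Q; the truncation at k = 5 is compared with a much longer truncation, and their difference indeed stays within the a priori bound:

```python
import math
import numpy as np

Q = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  1.0, -1.0]])   # generator: rows sum to zero
pi0, t = np.array([1.0, 0.0, 0.0]), 0.5

r = max(abs(np.diag(Q)))             # uniformization constant r >= sup |Q_ss|
M = Q / r + np.eye(len(Q))           # subordinated DTMC transition matrix

def transient(k):
    """Truncated series for pi(t), plus the a priori truncation-error bound."""
    acc, term, mass = np.zeros_like(pi0), pi0.copy(), 0.0
    for n in range(k + 1):
        weight = math.exp(-r * t) * (r * t) ** n / math.factorial(n)
        acc, mass, term = acc + weight * term, mass + weight, term @ M
    return acc, 1.0 - mass           # bound = Poisson(rt) tail mass beyond k

coarse, bound = transient(5)
fine, _ = transient(60)              # effectively exact for rt = 1.5
assert np.abs(coarse - fine).max() <= bound + 1e-15
assert abs(fine.sum() - 1.0) < 1e-12
```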

1.5.4 Stationary behavior

In the following, we assume that a generator matrix Q is given.

Definition 1.37. A probability distribution µ ∶ S → [0,1] is called a stationary, equilibrium or steady-state probability distribution of Q if, for all s ∈ S and all t ≥ 0,

µ(s) = ∑_{s′∈S} µ(s′)P(Xt = s ∣ X0 = s′) = ∑_{s′∈S} µ(s′)p(t)(s′, s).

In matrix notation, a non-negative vector µ is an invariant measure if µQ = 0.

Definition 1.38. We will say that the generator Q is irreducible if the transition

matrix of the embedded DTMC is irreducible.


Theorem 1.39. Suppose that {Xt} is a non-explosive CTMC with an irreducible generator matrix Q. Then, there is a unique stationary distribution µ with positive values at all components, such that

P(Xt = s) → µ(s) as t → ∞, for all s.

1.5.5 Finite-dimensional marginal probabilities

Assume we are interested in the finite-dimensional marginal probability P(Z0 = s0, τ0 < δ0, …, Zk−1 = sk−1, τk−1 < δk−1, Zk = sk). The joint probability can be decomposed into

P(Z0 = s0) ∏_{i=0}^{k−1} [P(τi < δi ∣ Zi = si) P(Zi+1 = si+1 ∣ Zi = si)],

which is equivalent to the probability of observing the sequence of states s0, …, sk in the embedded DTMC, multiplied by the product of the probabilities P(τi < δi ∣ Zi = si) for i = 0, …, k − 1. Let ρs ∼ (τ0 ∣ Z0 = s) denote the waiting time in state s. Then, the cdf of ρs is given by Fρs(δ) = 1 − e^{−a(s)δ} and the corresponding pdf is fρs(δ) = a(s)e^{−a(s)δ}. The cdf of the joint random variable (Z0, …, Zk, τ0, …, τk−1) at the point (s0, …, sk, δ0, …, δk−1) evaluates to

P(Z0 = s0, …, Zk = sk) ∏_{i=0}^{k−1} P(ρsi < δi) = p0(s0) ∏_{i=0}^{k−1} [(1 − e^{−a(si)δi}) w(si, si+1)/a(si)],

yielding the corresponding pdf p0(s0) ∏_{i=0}^{k−1} [fρsi(δi) w(si, si+1)/a(si)].


Chapter 2

Rule-based modeling of biochemical

networks

In this chapter, we introduce a rule-based language, a form of site-graph-rewrite grammar, tailored for modeling low-level bio-molecular interactions. The need for such a formalism was first discussed in [76], [36], before it was formally introduced in 2003 [24]. Kappa [34] and BioNetGen [6] are two examples of rule-based modeling platforms that have appeared to date.

A simple rule-based model is sketched in Figure 2.1. Informally, an agent of type B can form a bond with either an agent of type A or an agent of type C, via specific (typed) site variables (a, b or c). A transition can be triggered upon local tests on the agent’s interface: omitting the site c of agent B in rule R1 (or R−1) means that the conformation of site c is irrelevant for executing rule R1 (or R−1), sometimes referred to as the ‘don’t care, don’t write’ convention. Typically, agent types encode proteins and site types encode the respective protein domains. The executions of rule-based models, that is, of programs written in a rule-based language, are defined according to the principles established in the domains of physical chemistry and molecular physics.

[Figure 2.1 depicts two pairs of reversible rules: R1 and R−1 create and remove a bond between agents of types A and B (via sites a and b), and R2 and R−2 create and remove a bond between agents of types B and C.]

Figure 2.1: A simple rule-based model

A rule-based model can be understood as a compact, symbolic encoding of a set

of biochemical reactions. In this sense, rule-based models are just a ‘syntactic’


shift with respect to traditional models, but the impact of this simple idea goes far beyond that. Being visually comprehensible, but at the same time formal (and hence executable), rule-based models become a powerful alternative to the traditional approaches. First, for a modeler, the site-graph representation of molecular complexes renders models easy to read, write or edit. Moreover, the description of interactions is compact, and models can trivially be composed by simply merging two collections of rules. Finally, unlike non-formal interaction diagrams, a rule set can be executed according to its prescribed semantics, or subjected to static (preprocessing) analysis. Questions such as the reachability of a particular molecular complex, causal relations between rule executions, or quantitative analysis with respect to the underlying chemical kinetics, can be automated.

respect to the underlying chemical kinetics, can be automatized.

We start by outlining the classical (deterministic) and stochastic chemical kinet-

ics, which are fundamental to biochemical reaction network analysis. These two

mathematical models will serve as a reference when defining the semantics of rule-

based models. Then, site-graphs, rule-based models and their stochastic semantics

are introduced. At the end of the chapter, we introduce an example of a rule set which will facilitate the illustrations throughout the thesis.

2.1 Chemical kinetics

Building a model involves two important choices, related to (i) how to represent the model (syntax), and (ii) how to interpret, or ‘execute’, the model (semantics). Population models are widely used in modeling interactions among a set of individuals, distinguishable only by the class of species they belong to. Any population model can be represented in terms of reactions of the form

A + 2B →k C,

where A and 2B are the reactant species, C is the product species, and k denotes the rate, or speed, at which the change occurs. A model of

population dynamics can be

(i) either discrete or continuous, depending on whether the population quantity

is modeled as a discrete or a continuous value, and


(ii) either deterministic or stochastic, depending on whether the output trajec-

tory is fully determined by the initial state (deterministic), or if different

trajectories can emerge, each associated with a certain probability (stochas-

tic).

Definition 2.1. A reaction system is a pair (S,R), such that

(i) S = {S1, S2, …, Sn} is a finite set of species,

(ii) R = {r1, …, rr} is a finite set of reactions. Each reaction is a triple rj ≡ (aj, νj, kj) ∈ Nn × Zn × R≥0, written down in the following form:

a1jS1 + … + anjSn →kj a′1jS1 + … + a′njSn, such that a′ij = aij + νij.

The vectors aj and a′j are often called, respectively, the consumption and production vectors of reaction rj, and kj is the kinetic rate of reaction rj.

2.1.1 Stochastic chemical kinetics

Numerous studies have shown that stochastic effects generate phenotypic hetero-

geneity in cell behavior and that cells can functionally exploit variability for in-

creased fitness ([64] is an early review on the subject). As many genes, RNAs and

proteins are present in low copy numbers, deterministic models are insufficiently

informative or even wrong.

Consider, for example, a simple birth-death model ∅ →k1 S1, S1 →k2 ∅. The deterministic solution z(t) = k1/k2 + (z(0) − k1/k2)e^{−k2t} is interpreted as the mean population of species S1 through time. Any additional experimental observation, such as the degree of deviation around the average value, or the probability of extinction of the species at a given time, cannot be deduced. In more complex examples, the observation that the population exhibits a bimodal response cannot be made unless a stochastic model is employed.

Before introducing the stochastic model of a biochemical reaction system, we state in the next theorem a result from molecular physics, called the fundamental premise of stochastic chemical kinetics [42].


Theorem 2.2. Consider a set of species interacting in a finite volume V. If the system is well-mixed and the temperature is constant, then for a reaction rj of the form S1, . . . , Sl → products, there exists a stochastic rate constant cj such that

cj dt ≈ the probability that a randomly chosen tuple of molecules of S1, . . . , Sl will react according to rj in the next infinitesimal time interval dt.   (2.1)

A discrete, stochastic model of a biochemical reaction system reacting in a well-stirred mixture of volume V and in thermal equilibrium is given in Definition 2.3. It can be derived from the fundamental premise, as shown, for example, in [41, Section 5.3.B].

Definition 2.3. Let (S,R) be a reaction system and x0 = (x1, . . . , xn) ∈ N^n an initial state of the system. Then, the discrete, stochastic model is a continuous-time Markov chain (CTMC) Xt with Markov graph (S, w, p0), such that

(i) S = {x | x is reachable from x0 via reactions in R},

(ii) p0(x0) = 1,

(iii) w(x, y) = ∑_{j=1,...,r} λj(x)·1{y = x + νj}.

The family of functions {λj : N^n → R≥0 | j = 1, . . . , r}, also called stochastic reaction rates, is defined by

λj(x) = cj ∏_{i=1}^{n} (xi choose aij).   (2.2)

The binomial coefficient (xi choose aij) counts the number of ways of choosing aij molecules of species Si out of the xi available ones.
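The propensity (2.2) is straightforward to evaluate directly. The following is a minimal Python sketch; the function name and tuple encodings are illustrative, not part of the thesis:

```python
from math import comb

def propensity(c_j, a_j, x):
    """Stochastic rate (2.2): lambda_j(x) = c_j * prod_i C(x_i, a_ij).

    c_j : stochastic rate constant of reaction r_j
    a_j : consumption vector (a_1j, ..., a_nj)
    x   : current population vector (x_1, ..., x_n)
    """
    rate = c_j
    for a_ij, x_i in zip(a_j, x):
        rate *= comb(x_i, a_ij)  # ways to pick a_ij copies of species i
    return rate

# Bimolecular reaction S1 + S2 -> ... with c_j = 0.5 in state x = (4, 3, 7):
# lambda = 0.5 * C(4,1) * C(3,1) * C(7,0) = 0.5 * 4 * 3 * 1 = 6.0
print(propensity(0.5, (1, 1, 0), (4, 3, 7)))  # -> 6.0
```

Note that `comb(x_i, a_ij)` is zero whenever fewer than aij copies are available, so the propensity vanishes exactly when the reaction cannot fire.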

In the following, we use the vector notation Xt for the marginal of the process Xt at time t.

Theorem 2.2 is said to be fundamental to stochastic chemical kinetics because the remaining theory follows from the fundamental premise via the laws of probability. In particular, this remaining theory covers


(i) computing the transient probability distribution of Xt via the Kolmogorov forward equation, also known as the chemical master equation (CME) in the chemical literature. Concretely, denoting p(t)(x) = P(Xt = x), the CME for state x ∈ N^n reads

d/dt p(t)(x) = ∑_{j=1,...,r : x−νj ∈ S} λj(x − νj)p(t)(x − νj) − ∑_{j=1,...,r} λj(x)p(t)(x).   (2.3)

(ii) the simulation of traces of Xt, known as the stochastic simulation algorithm (SSA) in the chemical literature [40].
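The SSA itself fits in a few lines. Below is a minimal, didactic Python sketch of the direct method, applied to a linear birth-death system S1 → 2S1, S1 → ∅; all names and rate values are illustrative, and the code is a sketch rather than an optimized simulator:

```python
import random
from math import comb, log

def prod_comb(x, a):
    """Product of binomial coefficients as in (2.2)."""
    p = 1
    for xi, ai in zip(x, a):
        p *= comb(xi, ai)
    return p

def ssa(x0, reactions, t_end, rng):
    """Direct-method sketch: `reactions` is a list of (c_j, a_j, nu_j)
    triples as in Definitions 2.1 and 2.3."""
    t, x = 0.0, list(x0)
    trace = [(0.0, tuple(x))]
    while True:
        lams = [c * prod_comb(x, a) for c, a, _nu in reactions]
        total = sum(lams)
        if total == 0.0:                       # absorbing state: nothing can fire
            break
        t += -log(1.0 - rng.random()) / total  # exponential waiting time
        if t > t_end:
            break
        u, j = rng.random() * total, 0
        while j < len(lams) - 1 and u > lams[j]:
            u -= lams[j]                       # pick reaction j with prob. lam_j/total
            j += 1
        x = [xi + ni for xi, ni in zip(x, reactions[j][2])]
        trace.append((t, tuple(x)))
    return trace

# Linear birth-death: S1 -> 2 S1 at c1 = 1.0, S1 -> 0 at c2 = 1.2.
trace = ssa((20,), [(1.0, (1,), (1,)), (1.2, (1,), (-1,))], 5.0, random.Random(7))
```

Each run of `ssa` produces one random trajectory; estimates of means, deviations, or extinction probabilities are obtained by averaging over many runs.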

Notice that the CME implies that the expectation of the marginal distribution of Xt satisfies the equations

d/dt E(Xt) = ∑_{x∈S} x · d/dt p(t)(x)
= ∑_{x∈S} ∑_{j=1,...,r} ((x + νj) − x)λj(x)p(t)(x)
= ∑_{j=1,...,r} νj (∑_{x∈S} λj(x)p(t)(x))
= ∑_{j=1,...,r} νj E(λj(Xt)).

To see the second equality, observe a transition from x to x + νj. The term λj(x)p(t)(x) appears exactly once when summing up for the state x, as the outflow probability, and exactly once when summing up for the state x + νj, as the inflow probability. This gives the term (x + νj) − x = νj multiplying λj(x)p(t)(x).

It is worth noting that, upon scaling the rate constants as will be explained in Section 2.1.3, the equations for E(Xt) are equivalent to (2.4) only if all rate functions are linear, that is, when all reactions are unimolecular.

2.1.2 Classical chemical kinetics

Conventional chemical kinetics handles ensembles of molecules with large numbers of particles, 10^20 and more. The chemist uses concentrations rather than particle numbers, [N] = N/(NA · V), where NA = 6.022 · 10^23 mol−1 is Avogadro's number and V is the volume (in dm^3). When the pressure and temperature are constant, the following continuous, deterministic model is appropriate.


Definition 2.4. Let (S,R) be a reaction system and z0 = (z1, . . . , zn) ∈ R^n an initial state of the system. Then, the continuous, deterministic model is the solution of the set of n coupled differential equations given by

d/dt zi(t) = ∑_{j=1,...,r} νij λj(z(t)), for i = 1, 2, . . . , n,   (2.4)

satisfying the initial condition z(0) = z0. The family of functions {λj : R^n → R≥0 | j = 1, . . . , r}, also called deterministic reaction rates, is defined by

λj(z) = kj ∏_{i=1}^{n} zi^aij.   (2.5)

The fact that the speed of a chemical reaction is proportional to the quantity of

the reacting substances is known as the law of mass action.

2.1.3 Deterministic and stochastic rate constants

We mentioned above the existence of both the reaction rate constant kj and the stochastic rate constant cj. The deterministic and stochastic rate constants are not equivalent: when switching between the stochastic and the deterministic model, a conversion of rates must be performed. In particular, the stochastic rate constant depends on the volume and on the arity of the reaction. In general, the conversion is such that the stochastic rate function applied to a state x ∈ N^n of reaction rj and the deterministic rate law applied to the corresponding concentration xV^−1 ∈ R^n relate as

λj(xV^−1) = λj(x)V^−1,   (2.6)

where the left-hand side denotes the deterministic and the right-hand side the stochastic rate function. A careful study of these conversions is given in [41]. Intuitively, recall (2.1) and observe that, as unimolecular reactions represent a spontaneous conversion of a molecule, they should not be volume-dependent. In bimolecular reactions, the stochastic rate cj is proportional to 1/V, reflecting that two molecules have a harder time finding each other within a larger volume. In general, whenever no more than one copy of each species is consumed in a reaction rj of arity |aj| = ∑i aij, then cj = kjV^−(|aj|−1). Otherwise, the approximation cj ≈ kjV^−(|aj|−1) holds when all species are highly abundant. More concretely, when each species is highly abundant, the stochastic rate function for a reaction rj of arity |aj| can be approximated by λj(x) = cj ∏_{i=1}^{n} (xi choose aij) ≈ cj ∏_{i=1}^{n} xi^aij; the deterministic rate law applied to the corresponding concentration is then λj(xV^−1) = kj ∏_{i=1}^{n} (xiV^−1)^aij = kjV^−|aj| ∏_{i=1}^{n} xi^aij.
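The conversion cj = kjV^−(|aj|−1) can be captured in a one-line helper. The following Python sketch is illustrative; the function name is ours, and the formula is exact only under the conditions stated above:

```python
def stochastic_rate_constant(k_j, a_j, V):
    """c_j = k_j * V**(-(|a_j| - 1)), with |a_j| = sum(a_j) the reaction arity.

    Exact when no species is consumed more than once in the reaction;
    otherwise it is an approximation valid for abundant species (see text).
    """
    arity = sum(a_j)
    return k_j * V ** (-(arity - 1))

# A unimolecular rate is volume-independent, a bimolecular one scales as 1/V:
print(stochastic_rate_constant(2.0, (1, 0), 10.0))  # -> 2.0
print(stochastic_rate_constant(2.0, (1, 1), 10.0))  # -> 0.2
```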

2.1.4 Random time change model and the thermodynamic limit

The so-called 'thermodynamic limit' is defined as the limit in which the reactant populations xi and the system volume V all become infinitely large, but in such a way that the reactant concentrations xi/V stay fixed. Importantly, the thermodynamic limit is not a limit that the system actually approaches as a consequence of its natural temporal evolution, nor as a result of experimental intervention; it is an idealized state that is useful because it provides a convenient approximation to macroscopic systems. Even though deterministic models historically appeared first, they represent a particular approximation of the stochastic model. We outline a sketch of the derivation of [2].

Denote by Rj(t) the number of times that the j-th reaction has happened up to time t. Then, the state of the system at time t is Xt = X0 + ∑_{j=1,...,r} Rj(t)νj. The value Rj(t) is a random variable that can be described by an inhomogeneous Poisson process ξj with parameter ∫_0^t λj(Xs)ds, that is, Rj(t) = ξj(∫_0^t λj(Xs)ds). Finally, the expression

Xt = X0 + ∑_{j=1,...,r} ξj(∫_0^t λj(Xs)ds)νj   (2.7)

represents the evolution of the state Xt.

Denote by V the size of the volume in which the reactions take place. Introducing the scaled states XV = V^−1 X ∈ R^n and the scaled propensities λ̂j(XV) = V^−1 λj(V·XV), and denoting by ξ̂j(t) = ξj(t) − t the centered Poisson process, the scaled version of (2.7) can be written as

XVt = XV0 + ∑_{j=1,...,r} νj V^−1 ξj(V ∫_0^t λ̂j(XVs)ds)   (2.8)
= XV0 + ∑_{j=1,...,r} νj ∫_0^t λ̂j(XVs)ds + V^−1 ∑_{j=1,...,r} νj ξ̂j(V ∫_0^t λ̂j(XVs)ds).   (2.9)

Letting V → ∞, the law of large numbers for the Poisson process ([2], Lemma 1.2) implies that V^−1 ξ̂j(Vt) ≈ 0, and the process XVt follows, according to the remaining term, an ordinary differential equation equivalent to the reaction-rate equation (2.4). The performed limit is referred to as the thermodynamic limit.

To summarize, classical chemical kinetics faithfully describes the mean population sizes either when all reactions are unimolecular [65], or when the system is in the thermodynamic limit [43, 60]. When the assumptions underlying the classical kinetics break down, the deterministic models can be not only less informative, but also misleading [79].

2.2 Site-graphs

We now introduce site-graphs, which will facilitate the formal definition of a rule-based model.

A site-graph is an undirected graph in which typed nodes have sites, and edges form a partial matching on the sites. The sites which do not serve for forming edges are called internal, and they are assigned a value from a predefined set. The nodes of a site-graph usually represent proteins, and the sites of a node stand for protein binding domains. Internal states are used to encode post-translational modifications.

Let S denote the set of site labels, and I the set of internal values that can be assigned to sites. The function I : S → P(I) assigns to each site s the predefined set of internal values I(s) to which it can be evaluated. If a site is used for creating bonds, its set of predefined internal values is empty.

A rule-based model is defined over a fixed contact map, reflecting that a modeler fixes the assumptions on the structure of the proteins which are intended to be included in the model.

Definition 2.5. A contact map (CM) is a tuple (A, Σ, E, I), where A is the set of node types, each node type being equipped with a set of sites, defined by a signature map Σ : A → P(S). The set E ⊆ {((A, s), (A′, s′)) | A, A′ ∈ A, s ∈ Σ(A), s′ ∈ Σ(A′), I(s) = I(s′) = ∅} is a set of predefined edge types.

In the following, we assume that site-graphs are defined over a contact map C = (A, Σ, E, I).


Definition 2.6. A site-graph is a tuple G = (V, Type, I, E, ψ) with

(i) a set of nodes V,

(ii) a node type function Type : V → A,

(iii) a node interface function I : V → P(S), such that for all v ∈ V, I(v) ⊆ Σ(Type(v)),

(iv) a set of edges E ⊆ {((v, s), (v′, s′)) | ((Type(v), s), (Type(v′), s′)) ∈ E}, which is

symmetric: ((v, s), (v′, s′)) ∈ E iff ((v′, s′), (v, s)) ∈ E,

injective: if ((v, s), (v′, s′)) ∈ E and ((v, s), (v′′, s′′)) ∈ E, then (v′, s′) = (v′′, s′′),

irreflexive: for all v ∈ V, s ∈ S, ((v, s), (v, s)) ∉ E,

(v) a site evaluation function ψ : {(v, s) | v ∈ V, s ∈ I(v)} → I ∪ {ε}, such that if I(s) ≠ ∅ then ψ(v, s) ∈ I(s). In words, if a site s is an internal site, the function ψ assigns an internal value to the node-site combination.
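The conditions of Definition 2.6 translate directly into a validity check. The following Python sketch uses an illustrative dictionary-and-set encoding of site-graphs (not Kappa syntax); all names are ours:

```python
def is_site_graph(nodes, node_type, interface, edges, psi, signature, int_values):
    """Check the conditions of Definition 2.6 on a candidate site-graph.

    nodes: set of node names; node_type: node -> type;
    interface: node -> set of sites (subset of the type's signature);
    edges: set of ((v,s),(v',s')) port pairs, stored symmetrically;
    psi: (v,s) -> internal value, or None for binding sites;
    signature: type -> set of sites; int_values: site -> set of values."""
    # (iii) every interface respects the node type's signature
    if any(not interface[v] <= signature[node_type[v]] for v in nodes):
        return False
    partner = {}
    for p, q in edges:
        if (q, p) not in edges or p == q:   # symmetric and irreflexive
            return False
        if partner.setdefault(p, q) != q:   # injective: one partner per port
            return False
    # (v) internal sites carry a value from their predefined set
    for (v, s), val in psi.items():
        if int_values[s] and val not in int_values[s]:
            return False
    return True

# A(u~on, y) bound to B(x) via the sites y and x; u is the only internal site.
signature = {'A': {'u', 'y'}, 'B': {'x'}}
int_values = {'u': {'on', 'off'}, 'y': set(), 'x': set()}
nodes = {'a1', 'b1'}
node_type = {'a1': 'A', 'b1': 'B'}
interface = {'a1': {'u', 'y'}, 'b1': {'x'}}
edges = {(('a1', 'y'), ('b1', 'x')), (('b1', 'x'), ('a1', 'y'))}
psi = {('a1', 'u'): 'on', ('a1', 'y'): None, ('b1', 'x'): None}
print(is_site_graph(nodes, node_type, interface, edges, psi, signature, int_values))  # -> True
```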

Site-graphs will be used in three different contexts: (i) to model physically existing groups of interacting complexes, termed reaction mixtures, and the connected components within them, called species; (ii) to specify the local interaction patterns, that is, the rewrite rules; (iii) to model those patterns or motifs whose quantities within a reaction mixture are to be tracked as a state of the stochastic model, called fragments.

The function Σ in the above definition lists the sites assigned to a particular node type. Reaction mixtures (and species) must have all interfaces complete, in the sense that all the sites of a node's signature are listed in its interface. The patterns appearing in rules or fragments typically have incomplete interfaces.

Notice that a CM is itself not necessarily a site-graph, since an edge may occur between two identical sites. The set P = {(v, s) | s ∈ I(v)} ⊆ V × S is called the set of ports. Given an edge e = (p, p′) ∈ E, we denote by ē the symmetric edge (p′, p).

Definition 2.7. A site-graph G = (V, Type, I, E, ψ) such that, for all v ∈ V, I(v) = Σ(Type(v)), is called a reaction mixture.

Definition 2.8. Given a site-graph G = (V, Type, I, E, ψ), a sequence of edges (e1, . . . , ek) ∈ E^k with ei = ((vi, si), (v′i, s′i)), such that v′i = vi+1 and s′i ≠ si+1 for i = 1, . . . , k − 1, is called a path between the nodes v1 and v′k.


Definition 2.9. A site-graph G is connected if there exists a path between every two vertices v and v′.

Definition 2.10. A connected site-graph is called a pattern. A connected reaction

mixture is called a species.

Note that, unlike in the reaction system model (Definition 2.1), the set of all species may be infinite. For example, consider a set of sites S = {x, y} such that I(x) = I(y) = ∅; hence, both x and y serve as binding sites. Moreover, let A = {A, B}, Σ(A) = Σ(B) = {x, y}, and let the set of edge types be E = {((A, y), (B, x)), ((A, x), (B, y))}. Infinitely many connected reaction mixtures, namely chains of alternating A's and B's of any length, can be formed over this contact map.
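Whether a reaction mixture is a species (Definitions 2.9 and 2.10) reduces to an ordinary connectivity test over its bonds. A Python sketch, using an illustrative port-pair encoding of edges; since a site carries at most one bond, node-level reachability coincides with the site-constrained path notion of Definition 2.8 for the purpose of connectivity:

```python
from collections import deque

def is_connected(nodes, edges):
    """BFS over the bonds of a site-graph: a reaction mixture is a
    species iff it is connected (Definitions 2.9, 2.10)."""
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for (v, _s), (w, _t) in edges:   # project port pairs to node adjacency
        adj[v].add(w)
        adj[w].add(v)
    start = next(iter(nodes))
    seen, todo = {start}, deque([start])
    while todo:
        v = todo.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            todo.append(w)
    return seen == set(nodes)

# A dimer bonded via one edge is connected; two unbonded nodes are not:
dimer = {(('a1', 'y'), ('b1', 'x')), (('b1', 'x'), ('a1', 'y'))}
print(is_connected({'a1', 'b1'}, dimer))  # -> True
print(is_connected({'a1', 'a2'}, set())) # -> False
```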

Two site-graphs can be related by an embedding function, which is important for assigning the stochastic process to a rule-based model. The symmetry of a site-graph is formalized as a bijective embedding of the site-graph into itself, called an automorphism.

Definition 2.11. An embedding σ between the site-graphs G = (V, Type, I, E, ψ) and G′ = (V′, Type′, I′, E′, ψ′) is induced by a support function σ∗ : V → V′ if

(i) σ∗ is injective: for all v, v′ ∈ V, σ∗(v) = σ∗(v′) implies v = v′;

(ii) σ∗ preserves node types and internal states: for all v ∈ V, Type(v) = Type′(σ∗(v)), and for all s ∈ I(v) such that I(s) ≠ ∅, ψ′(σ∗(v), s) = ψ(v, s);

(iii) σ∗ preserves interfaces: for all v ∈ V, s ∈ I(v) implies s ∈ I′(σ∗(v));

(iv) σ∗ preserves edges: for all ((v, s), (v′, s′)) ∈ E, ((σ∗(v), s), (σ∗(v′), s′)) ∈ E′.

Notice that requirement (iv) enforces that both nodes forming an edge in the site-graph G are embedded in the site-graph G′. If σ∗ is bijective, then σ is an isomorphism. We denote that G is isomorphic to G′ by G ≅ G′. An isomorphism between G and itself is called an automorphism. If σ : G1 → G2 is an isomorphism, we write G2 = σ(G1). The set of embeddings between site-graphs G and G′ will be denoted by Emb(G, G′), the set of isomorphisms between G and G′ by Iso(G, G′), and the set of automorphisms of a site-graph G by Aut(G). Set cardinality will be denoted by |·|.


2.3 Rule-based models

A rule-based language is a formalism for specifying biochemical reaction systems in which the internal structure of molecular species is represented by site-graphs, and modifications of protein residues and bonds are explicitly encoded. The inspiration for the site-graph-rewrite language presented here is Kappa [34], even though our formalism does not fully coincide with that of Kappa, and is often referred to as a kernel of Kappa. For example, the release of a dangling bond (in Kappa syntax, written as an expression A(x!) → A(x)) cannot be expressed in our framework.

Site-graphs are specifically designed for modeling molecular interactions and, to our knowledge, have not previously been studied within classical graph theory. On the other hand, graph rewrite grammars and graph transformation systems (the difference being that a transformation system has no initial state) have been studied in computer science since the late 60's [73] up to today [28, 83].

A rule-based program is given by an initial reaction mixture and a collection of rules over a fixed contact map. We first define a rule over a contact map C = (A, Σ, E, I). The shorthand notation ψA will be used to denote a full valuation function ψA : Σ(A) → I ∪ {ε} of a node of type A, where the binding sites are evaluated to ε.

Definition 2.12. Let G, G′ be site-graphs, and c ∈ R≥0 a non-negative real number. The triple (G, G′, c), also denoted by G −c→ G′, is called a rule. A rule is well-defined if G′ = (V′, Type′, I′, E′, ψ′) can be derived from G = (V, Type, I, E, ψ) by a finite number of applications of five elementary site-graph transformations:

(i) adding an edge: δae(G, e) = (V, Type, I, E ∪ {e, ē}, ψ), where e = ((v, s), (v′, s′)) is such that v, v′ ∈ V, s ∈ I(v), s′ ∈ I(v′);

(ii) deleting an edge: δde(G, e) = (V, Type, I, E ∖ {e, ē}, ψ), where e ∈ E;

(iii) changing a state value: δci(G, v′, s′, i′) = (V, Type, I, E, ψ′), where s′ ∈ I(v′), i′ ∈ I(s′), and ψ′(v, s) = i′ if v = v′ and s = s′, and ψ′(v, s) = ψ(v, s) otherwise;

(iv) deleting a node: δdn(G, v) = (V′, Type, I, E′, ψ′), such that V′ = V ∖ {v}, ψ′ = ψ|V′ (the restriction of the function ψ to the ports of V′), and E′ = E ∖ {e ∈ E | e or ē is of the form ((v, s), p) for some site s and port p};

(v) adding a node: δan(G, A, ψA) = (V ∪ {v′}, Type′, I′, E, ψ′), such that v′ ∉ V, Type′(v′) = A, I′(v′) = Σ(A), and ψ′(v, s) = ψA(s) if v = v′, and ψ′(v, s) = ψ(v, s) otherwise.

The interface function I is unaltered by any of the transformations: a site cannot be added to or deleted from a node's interface by a rule. Adding a node requires an evaluation of all internal sites in the signature of that node.

Definition 2.13. A rule-based program over a contact map C is a tuple (R, G0), where

(i) R = {R1, . . . , Rn} is a set of well-defined site-graph rewrite rules over the contact map C,

(ii) G0 is the initial reaction mixture over the contact map C.

Since our analysis will not be tailored specifically to the observable site-graphs (those whose quantities are to be observed during model execution), we do not include the set of observable site-graphs in the definition of a rule-based program. Instead, for a proposed fragment set (to be defined later), we will assume that the observables are all fragments from that set.

A rule-set is defined over a fixed contact map. However, if the contact map is not explicitly given, it can be inferred from the rule-set, as the union of the contact maps inferred from the lhs and rhs of each rule.

Definition 2.14. The contact map inferred from a site-graph G = (V, Type, I, E, ψ) is C = (A, Σ, E, I), where A = {Type(v) | v ∈ V}, Σ(A) = ∪{I(v) | v ∈ V, Type(v) = A}, E = {((A, s), (A′, s′)) | ((v, s), (v′, s′)) ∈ E, Type(v) = A, Type(v′) = A′}, and I(s) = {i | ∃v ∈ V, s ∈ I(v), ψ(v, s) = i}.

Definition 2.15. Given contact maps C1 = (A1, Σ1, E1, I1) and C2 = (A2, Σ2, E2, I2), their union is the contact map C = (A, Σ, E, I) such that A = A1 ∪ A2, Σ(A) = Σ1(A) ∪ Σ2(A) for all A ∈ A, E = E1 ∪ E2, and I = I1 ∪ I2. If the signature maps are such that, for all A ∈ A, Σ1(A) ∩ Σ2(A) = ∅, we write C = C1 ⊎ C2.


Given an initial reaction mixture, the continuous-time Markov chain (CTMC)

assigned to a rule-based model takes values in the set of reaction mixtures reachable

from the initial one.

A rule Ri = (Gi, G′i, ci) ∈ R can be applied to a reaction mixture G if there exists an embedding between Gi and G. The application of a rule to a reaction mixture via a given embedding is formalized by a function

δi : G × (Vi → V) → P(G),

which takes as input a reaction mixture G and a candidate support function σ∗ of an embedding between the lhs of the rule Ri and G. The result equals ∅ if σ∗ does not induce an embedding between Gi and G. Otherwise, if the rule does not include node creation, the result δi(G, σ∗) is uniquely determined for the chosen embedding, as rigorously shown in [20]. Intuitively, the part of the mixture to which the lhs of the rule is embedded is transformed by the rule, whereas the rest of it remains unchanged (Figure 2.2). Finally, if one of the elementary transformations defining a rule includes the creation of a node of type A, there are countably many results of applying the rule to a reaction mixture: the node is added as defined in Definition 2.12, and only the name of the newly created node varies. The name must be chosen from a predefined set of node names, which will be denoted by N. It will also be useful to adopt a naming convention, a bijective mapping ζ : A × N → N, which reflects the total order among the names reserved for nodes of the same type. Then, if the initial reaction mixture has k copies of nodes of type A, the nodes used to encode them will be ζ(A, 1), ζ(A, 2), . . . , ζ(A, k). If the current node set in a reaction mixture is V and a new node of type A is created, the corresponding node name will be ζ(A, j), where j is chosen uniformly at random from the set N ∖ {i | ζ(A, i) ∈ V}. For example, the node naming convention can be ζ(A, i) := viA, where A ∈ A, i ∈ N.
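A naming convention of this shape can be realized with a simple allocator. The sketch below takes the smallest unused index rather than a random one, which is one valid instance of the convention; all names are illustrative:

```python
import itertools

def fresh_name(used, agent_type):
    """Allocate a name zeta(A, j) = 'v{j}_{A}' for a freshly created node,
    taking the smallest index j not yet used in the mixture."""
    for j in itertools.count(1):
        name = f"v{j}_{agent_type}"
        if name not in used:
            return name

mixture_nodes = {"v1_A", "v2_A", "v1_B"}
print(fresh_name(mixture_nodes, "A"))  # -> v3_A
print(fresh_name(mixture_nodes, "B"))  # -> v2_B
```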

2.4 Site-graph rigidity and counting automorphisms

Each embedding of the lhs of a rule into a reaction mixture is one randomly chosen combination of species which conforms to the lhs description. We will therefore be interested in the number of embeddings of a pattern G into the reaction mixture G. Loosely,


Figure 2.2: Rule application. Rule R can be applied to a reaction mixture G via the embedding indicated by the dotted arrows. The result is the reaction mixture G′ = δi(G, {u ↦ v2, v ↦ v3}).

it is the number of different occurrences of the motif described by G inside the

reaction mixture G.

Lemma 2.16. Let G be a pattern, and G a reaction mixture over the contact map C. Then the number of embeddings of the pattern G into the mixture G that are distinct up to automorphism, denoted by mG(G), is

mG(G) := |Emb(G, G)| / |Aut(G)|.

Proof. (Sketch) Each embedding of the site-graph G into the site-graph G comes together with exactly |Aut(G)| different embeddings, each determining exactly the same connected sub-site-graph of G. Therefore, the number of occurrences of the motif described by G in the reaction mixture G equals the total number of embeddings in Emb(G, G) divided by the number of automorphisms in Aut(G).

It is much easier to count the embeddings between two connected site-graphs than between general graphs. For ordinary graphs, deciding whether there exists an embedding between two graphs (the subgraph isomorphism problem) is known to be NP-complete. On the other hand, site-graphs enjoy the rigidity property, which ensures that an embedding between two connected site-graphs is fully determined by the image of a single node.

Theorem 2.17. Let G = (V, Type, I, E, ψ) and G′ = (V′, Type′, I′, E′, ψ′) be two connected site-graphs, and let σ1 and σ2 be two embeddings between G and G′. Then, for any node v ∈ V, σ∗1(v) = σ∗2(v) implies σ1 = σ2.


Proof. Assume two connected site-graphs G and G′, two embeddings σ1 and σ2 between G and G′, and a node v ∈ V such that σ∗1(v) = σ∗2(v). Consider another node v′ ∈ V. Since G is connected, there exists a path p = (e1, . . . , ek) between the nodes v and v′. Let p′ be the image of the path p under σ1. Since σ1 is an embedding, p′ is indeed a path in G′. By induction over n between 1 and k, denoting en = ((vn, sn), (vn+1, sn+1)), it holds that σ∗1(vn+1) = σ∗2(vn+1); thus, for n = k, we get σ∗1(v′) = σ∗2(v′). The induction step is proved as follows. Since σ1 and σ2 are both embeddings, we know that ((σ∗1(vn), sn), (σ∗1(vn+1), sn+1)) and ((σ∗2(vn), sn), (σ∗2(vn+1), sn+1)) are two edges of G′; by the injectivity of the edge relation (Def. 2.6), σ∗1(vn) = σ∗2(vn) implies σ∗1(vn+1) = σ∗2(vn+1).
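Rigidity also yields an efficient embedding test: a candidate image for a single node either extends uniquely to a full embedding or fails. The following Python sketch implements this extension for bonds only (internal states are omitted for brevity, and the graph encoding, a dict of node types plus a set of symmetric port pairs, is illustrative):

```python
def extend_embedding(G, H, v0, w0):
    """Rigidity in practice (Theorem 2.17): a candidate image w0 for one
    node v0 of a connected site-graph G determines the whole support map
    into H, so extend it deterministically and check consistency."""
    partner_H = dict(H['edges'])          # port -> its bound partner in H
    sigma, todo = {v0: w0}, [v0]
    while todo:
        v = todo.pop()
        if G['type'][v] != H['type'][sigma[v]]:
            return None                   # type mismatch
        for (a, s), (b, t) in G['edges']:
            if a != v:
                continue
            img = partner_H.get((sigma[v], s))
            if img is None or img[1] != t:
                return None               # bond of G has no matching bond in H
            if b in sigma:
                if sigma[b] != img[0]:
                    return None           # inconsistent images
            else:
                sigma[b] = img[0]
                todo.append(b)
    if len(set(sigma.values())) != len(sigma):
        return None                       # support map must be injective
    return sigma

# Embed a dimer A-B (bond a-b) into a trimer A-B-C (bonds a-b and c-d):
G = {'type': {'u': 'A', 'v': 'B'},
     'edges': {(('u', 'a'), ('v', 'b')), (('v', 'b'), ('u', 'a'))}}
H = {'type': {'w1': 'A', 'w2': 'B', 'w3': 'C'},
     'edges': {(('w1', 'a'), ('w2', 'b')), (('w2', 'b'), ('w1', 'a')),
               (('w2', 'c'), ('w3', 'd')), (('w3', 'd'), ('w2', 'c'))}}
print(extend_embedding(G, H, 'u', 'w1'))  # -> {'u': 'w1', 'v': 'w2'}
print(extend_embedding(G, H, 'u', 'w3'))  # -> None
```

Since at most one extension exists per seed image, the number of embeddings can be counted by trying each node of H as the image of one fixed node of G.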

Recall that a species is a connected reaction mixture. Let S = {S1, S2, . . .} represent the possibly infinite set of pairwise non-isomorphic species that can be formed in a given rule-based model. In other words, S captures the set of species up to isomorphism. The following result states that the quantity of any pattern can be expressed as a linear combination of the quantities of the species.

Lemma 2.18. For every pattern Fi, and for all reaction mixtures G ∈ G,

mG(Fi) = ∑_{Sj∈S} mG(Sj)·mSj(Fi).

Proof. Let Fi = (Vi, Typei, Ii, Ei, ψi) and G = (V, Type, I, E, ψ). Take σ ∈ Emb(Fi, G). Let v ∈ Vi, and denote by ccG(σ∗(v)) the connected component of G which contains σ∗(v) ∈ V. It is isomorphic to some species Sj ∈ S; denote the isomorphism by σ′ : Sj → ccG(σ∗(v)). Then, by construction, σ′−1 ∘ σ : Fi → Sj is an embedding between Fi and Sj. Therefore, each embedding σ ∈ Emb(Fi, G) uniquely defines a species Sj and an embedding σ′ ∈ Emb(Sj, G) such that σ′−1 ∘ σ ∈ Emb(Fi, Sj). Conversely, for each embedding σ′ ∈ Emb(Sj, G) and σ1 ∈ Emb(Fi, Sj), we have σ′ ∘ σ1 ∈ Emb(Fi, G), so the claim follows.

2.5 Individual-based and species-based semantics of rule-based programs

Assume given a rule-based program (R, G0) over the contact map C = (A, Σ, E, I), where the initial mixture is written under the node naming convention ζ : A × N → {viA | A ∈ A, i ∈ N}, defined by ζ(A, i) := viA. We now define the Markov graph assigned to a rule-based program.

Definition 2.19. A Markov graph (G, w, p0) assigned to a rule-based program (R, G0), where G0 respects the naming convention ζ, is such that

(i) G denotes the set of reachable reaction mixtures:

G = {G | G is reachable by a finite number of applications of rules from R to G0},

(ii) the initial distribution is p0(G) = 1/|G ∩ Iso(G0)| if G ∈ G ∩ Iso(G0), and p0(G) = 0 otherwise, where Iso(G0) denotes the set of mixtures isomorphic to G0,

(iii) for every G, G′ ∈ G,

w(G, G′) = ∑_{Ri∈R} ci · |{σ∗ | G′ ∈ δi(G, σ∗)}|.

Recall that, in case of no node creation in rule Ri, there is a unique reaction mixture G′ ∈ δi(G, σ∗) for a chosen embedding (see [20] for a rigorous derivation). If the rule Ri includes the creation of a node, there are countably many reaction mixtures G′ ∈ δi(G, σ∗), which differ in the choice of the names for the freshly created nodes, but they all respect the naming convention ζ. Notice that each new node name is thus chosen uniformly at random from the set {ζ(A, i) | i = 1, 2, . . .} ∖ V, where V is the set of nodes in the reaction mixture G.

Definition 2.20. Let Xt be the CTMC assigned to (G, w, p0). Then Xt is referred to as the individual-based (stochastic) semantics of (R, G0).

Since rules operate over node types, rather than over individual, concrete node names, it is natural to turn to a species-based view of the reaction mixture, that is, to identify all of those mixtures which are equivalent up to isomorphism:

G1 ∼ G2 iff there exists σ ∈ Iso(G1, G2) such that σ(G1) = G2.

Assuming a finite set of species produced by the rule-set, S = {S1, . . . , Sn}, each partition class is uniquely represented by a multi-set of species.


Definition 2.21. Define ϕS : G → X ⊂ N^n by

ϕS(G) = (x1, . . . , xn), with xi = mG(Si).

Conversely, let x ∈ N^n be a multi-set of species over the set of node types A = {A1, . . . , AN}. Every reaction mixture G = (V, Type, I, E, ψ) such that ϕS(G) = x contains the same copy number of nodes of each type A1, A2, etc.; the mixtures mapped to x may differ in their site evaluations and edges, and together they form the set

ϕ−1S(x) = {G ∈ G | ϕS(G) = x}.

Then, the process Xt over the state space X, defined by [Xt = x iff Xt ∈ ϕ−1S(x)], is referred to as the species-based semantics of (R, G0).

We will discuss the properties of the process Xt in Chapter 5. There, it will also be useful to know how many different reaction mixtures are lumped into a given species-based state x.

Theorem 2.22. Let x be a multi-set of species with S = {S1, . . . , Sn} and A = {A1, . . . , AN}. Then,

|ϕ−1S(x)| = (a1! a2! · · · aN!) / (∏_{i=1}^{n} xi! |Aut(Si)|^xi),   (2.10)

where ai denotes the number of nodes of type Ai in any site-graph G ∈ ϕ−1S(x).

Proof. Let G ∈ G with node set V be a reaction mixture such that ϕS(G) = x, that is, G contains xi different embeddings of species Si, i = 1, . . . , n, up to automorphism. Assume that G contains ai nodes of type Ai, i = 1, . . . , N. Any other G′ ∈ ϕ−1S(x) can be obtained from G by an isomorphism σ : G → G′ whose support function σ∗ : V → V maps each node to a node of the same type: Type(v) = Type(σ∗(v)). Denote the set of all maps with this property by Γ∗; the set Γ∗ has a1! · · · aN! elements. However, some support functions from Γ∗ determine exactly the same resulting mixture, that is, for some σ∗, σ∗′ ∈ Γ∗, it is σ(G) = σ′(G) = G′. We now determine how much over-counting was done. Firstly, consider a connected subgraph of G which forms a species Si, and denote its set of nodes by Vi. Every two maps σ∗, σ′∗ ∈ Γ∗ whose restrictions to Vi differ by an automorphism of Si induce the same resulting mixture; there are ∏_{i=1}^{n} |Aut(Si)|^xi such choices. Moreover, consider the species Si, and let Vi1, . . . , Vixi denote the node sets of all subgraphs of G which form the species Si. Let σ∗|Vi1, σ∗|Vi2, . . . , σ∗|Vixi be the restrictions of σ∗ to the corresponding sets of nodes. Then, every map σ∗′ whose sequence of restrictions (σ∗′|Vi1, σ∗′|Vi2, . . . , σ∗′|Vixi) is a permutation of the restrictions (σ∗|Vi1, σ∗|Vi2, . . . , σ∗|Vixi) determines the same resulting mixture. There are xi! such permutations for each species Si, and ∏_{i=1}^{n} xi! such maps over all species. Dividing |Γ∗| = a1! · · · aN! by these two factors yields (2.10).
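Formula (2.10) is cheap to evaluate once the species compositions and automorphism counts are known. A Python sketch with an illustrative encoding (all names are ours):

```python
from math import factorial

def mixture_count(x, aut_sizes, composition):
    """Evaluate formula (2.10): the number of labelled reaction mixtures
    lumped into the species-based state x.

    x[i]           : copy number of species S_i,
    aut_sizes[i]   : |Aut(S_i)|,
    composition[i] : dict, agent type -> number of nodes in S_i.
    """
    totals = {}                            # a_t: nodes of each type overall
    for xi, comp in zip(x, composition):
        for t, n in comp.items():
            totals[t] = totals.get(t, 0) + xi * n
    numerator = 1
    for n in totals.values():
        numerator *= factorial(n)
    denominator = 1
    for xi, aut in zip(x, aut_sizes):
        denominator *= factorial(xi) * aut ** xi
    return numerator // denominator

# Two free A's and one A-B dimer: a_A = 3, a_B = 1, all |Aut| = 1,
# so 3! * 1! / (2! * 1!) = 3 labelled mixtures.
print(mixture_count([2, 1], [1, 1], [{'A': 1}, {'A': 1, 'B': 1}]))  # -> 3
```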

2.6 Examples

We now illustrate how the classical deterministic and stochastic model is assigned

to a rule-based program. The rule-based program will first be expanded to its

equivalent reaction system.

Example 2.1. (Simple scaffold) Scaffold protein B recruits independently the pro-

teins A and C. These assumptions are captured by a set of rules, R1,R2,R−1 ,R

−2,

depicted in Figure 2.3. Adding the rules R3,R4 accelerates the unbinding, when-

ever the bond is within a trimer complex (that is, the bonds are made less stable

when withing a trimer).

Figure 2.3: Rule-set for Example 2.1. Rules R1, R1− bind and unbind A and B (rates c1, c1−); rules R2, R2− bind and unbind B and C (rates c2, c2−); rules R3 (rate c3) and R4 (rate c4) unbind the bonds within the trimer ABC.


The corresponding reaction system is (S, R), where S = {S_A, S_B, S_C, S_AB, S_BC, S_ABC} and R = {r_{A.B}, r_{B.C}, r_{A.BC}, r_{AB.C}, r_{A..B}, r_{B..C}, r_{A..BC}, r_{AB..C}}, defined by

\begin{align*}
r_{A.B} &: S_A, S_B \xrightarrow{k_1} S_{AB} &
r_{A.BC} &: S_A, S_{BC} \xrightarrow{k_1} S_{ABC} \\
r_{B.C} &: S_B, S_C \xrightarrow{k_2} S_{BC} &
r_{AB.C} &: S_{AB}, S_C \xrightarrow{k_2} S_{ABC} \\
r_{A..B} &: S_{AB} \xrightarrow{k_{1-}} S_A, S_B &
r_{A..BC} &: S_{ABC} \xrightarrow{k_{1-}} S_A, S_{BC} \\
r_{B..C} &: S_{BC} \xrightarrow{k_{2-}} S_B, S_C &
r_{AB..C} &: S_{ABC} \xrightarrow{k_{2-}} S_{AB}, S_C.
\end{align*}

The consumption vectors and change vectors are the column vectors of the matrices P and C, respectively (rows indexed by the species A, B, C, AB, BC, ABC; columns by the reactions in the order above):

\[
P = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\quad\text{and}\quad
C = \begin{pmatrix}
-1 & -1 & 0 & 0 & 1 & 1 & 0 & 0\\
-1 & 0 & -1 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & -1 & -1 & 0 & 0 & 1 & 1\\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1\\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0\\
0 & 1 & 0 & 1 & 0 & -1 & 0 & -1
\end{pmatrix},
\]

while, according to the mass-action law, the rate function has the following form:

\[
\lambda(z) = (k_1 z_A z_B,\; k_1 z_A z_{BC},\; k_2 z_B z_C,\; k_2 z_{AB} z_C,\; k_{1-} z_{AB},\; k_{1-} z_{ABC},\; k_{2-} z_{BC},\; k_{2-} z_{ABC}).
\]

Deterministic model. Denote by z ∈ R⁶ the vector of concentrations of the species in S. For transparency, let z_A denote the concentration of species S_A, z_B the concentration of species S_B, etc. The continuous, deterministic model is given


by the set of ordinary differential equations:

\begin{align*}
\frac{dz_A}{dt} &= -k_1 z_A z_B - k_1 z_A z_{BC} + k_{1-} z_{AB} + k_{1-} z_{ABC}\\
\frac{dz_B}{dt} &= -k_1 z_A z_B - k_2 z_B z_C + k_{1-} z_{AB} + k_{2-} z_{BC}\\
\frac{dz_C}{dt} &= -k_2 z_B z_C - k_2 z_{AB} z_C + k_{2-} z_{BC} + k_{2-} z_{ABC}\\
\frac{dz_{AB}}{dt} &= k_1 z_A z_B - k_2 z_{AB} z_C - k_{1-} z_{AB} + k_{2-} z_{ABC}\\
\frac{dz_{BC}}{dt} &= k_2 z_B z_C - k_1 z_A z_{BC} - k_{2-} z_{BC} + k_{1-} z_{ABC}\\
\frac{dz_{ABC}}{dt} &= k_1 z_A z_{BC} + k_2 z_{AB} z_C - k_{1-} z_{ABC} - k_{2-} z_{ABC}.
\end{align*}
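As a numerical sanity check, the ODE system can be integrated directly. The sketch below (illustrative code, not from the thesis) uses a minimal pure-Python RK4 scheme with the rate values of Figure 2.5, and verifies the three linear conservation laws implied by the change matrix C: the total amounts of A, B and C nodes are invariant.

```python
# Deterministic model of Example 2.1: a minimal RK4 integration sketch.
# Rate values are those of Figure 2.5; z = (zA, zB, zC, zAB, zBC, zABC).
k1, k2, k1m, k2m = 1.0, 0.2, 2.0, 0.3

def rhs(z):
    zA, zB, zC, zAB, zBC, zABC = z
    return (
        -k1*zA*zB - k1*zA*zBC + k1m*zAB + k1m*zABC,   # dzA/dt
        -k1*zA*zB - k2*zB*zC + k1m*zAB + k2m*zBC,     # dzB/dt
        -k2*zB*zC - k2*zAB*zC + k2m*zBC + k2m*zABC,   # dzC/dt
        k1*zA*zB - k2*zAB*zC - k1m*zAB + k2m*zABC,    # dzAB/dt
        k2*zB*zC - k1*zA*zBC - k2m*zBC + k1m*zABC,    # dzBC/dt
        k1*zA*zBC + k2*zAB*zC - (k1m + k2m)*zABC,     # dzABC/dt
    )

def rk4_step(z, h):
    add = lambda u, v, a: tuple(x + a*y for x, y in zip(u, v))
    k_1 = rhs(z)
    k_2 = rhs(add(z, k_1, h/2))
    k_3 = rhs(add(z, k_2, h/2))
    k_4 = rhs(add(z, k_3, h))
    return tuple(z[i] + h/6*(k_1[i] + 2*k_2[i] + 2*k_3[i] + k_4[i])
                 for i in range(6))

z = (1.0, 3.0, 1.0, 0.0, 0.0, 0.0)  # z(0), in concentration units
for _ in range(10000):              # integrate up to t = 10 with h = 0.001
    z = rk4_step(z, 0.001)

# conservation laws (linear invariants of the change matrix C):
total_A = z[0] + z[3] + z[5]          # zA + zAB + zABC
total_B = z[1] + z[3] + z[4] + z[5]   # zB + zAB + zBC + zABC
total_C = z[2] + z[4] + z[5]          # zC + zBC + zABC
```

Runge-Kutta schemes preserve linear invariants exactly (up to floating-point error), so the totals match the initial state to machine precision.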

Stochastic model. Assume that there are initially three copies of agent B, one copy of agent A and one copy of agent C, represented by the population state x0 = (1, 3, 1, 0, 0, 0). For transparency, we will represent states in the form of multi-sets; for example, x0 ≡ {A, 3B, C}. The stochastic model is a CTMC Xt with a Markov graph (S, w, p0), such that p0(x0) = 1, S = {x0, x1, x2, x3, x4}, and the weights are as depicted in Figure 2.4.

Figure 2.4: Markov graph for x0 ≡ {A, 3B, C}. The reachable states are x0 ≡ {A, 3B, C}, x1 ≡ {AB, 2B, C}, x2 ≡ {A, 2B, BC}, x3 ≡ {AB, B, BC} and x4 ≡ {ABC, 2B}.

Denoting p^{(t)}(x) = P(Xt = x), the CME is represented by the following system of equations (the superscript (t) is omitted):

\begin{align*}
\frac{dp(x_0)}{dt} &= c_{1-}\, p(x_1) + c_{2-}\, p(x_2) - p(x_0)(3c_1 + 3c_2)\\
\frac{dp(x_1)}{dt} &= 3c_1\, p(x_0) + c_{2-}\, p(x_3) + c_{2-}\, p(x_4) - p(x_1)(c_{1-} + 2c_2 + c_2)\\
\frac{dp(x_2)}{dt} &= 3c_2\, p(x_0) + c_{1-}\, p(x_3) + c_{1-}\, p(x_4) - p(x_2)(c_{2-} + 2c_1 + c_1)\\
\frac{dp(x_3)}{dt} &= 2c_2\, p(x_1) + 2c_1\, p(x_2) - p(x_3)(c_{2-} + c_{1-})\\
\frac{dp(x_4)}{dt} &= c_2\, p(x_1) + c_1\, p(x_2) - p(x_4)(c_{2-} + c_{1-}).
\end{align*}
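Since the CME is a linear system over the five states, it can be integrated with any standard scheme. The sketch below (illustrative, not part of the thesis) encodes the transitions of the Markov graph with the rate values c1 = 1, c2 = 0.2, c1− = 2, c2− = 0.3 of Figure 2.5 (for V = v), and checks that probability mass is conserved.

```python
# CME of Example 2.1 for x0 = {A, 3B, C}: states x0..x4, transitions of Figure 2.4.
c1, c2, c1m, c2m = 1.0, 0.2, 2.0, 0.3

# (source state, target state): transition rate
rates = {(0, 1): 3*c1, (0, 2): 3*c2,
         (1, 0): c1m, (1, 3): 2*c2, (1, 4): c2,
         (2, 0): c2m, (2, 3): 2*c1, (2, 4): c1,
         (3, 1): c2m, (3, 2): c1m,
         (4, 1): c2m, (4, 2): c1m}

def dp(p):
    out = [0.0]*5
    for (i, j), w in rates.items():
        out[i] -= w*p[i]   # outflow from state i
        out[j] += w*p[i]   # inflow into state j
    return out

p = [1.0, 0.0, 0.0, 0.0, 0.0]    # p0(x0) = 1
h = 0.0005
for _ in range(20000):           # simple Euler integration up to t = 10
    d = dp(p)
    p = [p[i] + h*d[i] for i in range(5)]
```

Because every generator column sums to zero, the total probability is preserved exactly by the Euler steps.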


Figure 2.5: Deterministic and stochastic models for Example 2.1 (panel a: copy number per volume unit, stochastic model, vs. concentration, deterministic model; panel b: mean copy number per volume unit for volume 1 and volume 20, vs. concentration). a) For volume V = 20v, the solution z(t) of the deterministic model with initial state z(0) = (1, 3, 1, 0, 0, 0)v, and one scaled trajectory of a stochastic simulation x^{(V/v)}(t), for initial state x(0) = (20, 60, 20, 0, 0, 0) (number of molecules). Rate values are set to k1 = 1 v^{-1}s^{-1}, k2 = 0.2 v^{-1}s^{-1}, k1− = 2 v^{-1}s^{-1}, k2− = 0.3 v^{-1}s^{-1} and c1 = 1 s^{-1}(V/v)^{-1}, c2 = 0.2 s^{-1}(V/v)^{-1}, c1− = 2 s^{-1}, c2− = 0.3 s^{-1}. b) We integrated the CME for two initial states: x1(0) = (1, 3, 1, 0, 0, 0) (the five equations of the model presented in Figure 2.4) and x2(0) = (20, 60, 20, 0, 0, 0) (a set of 3113 equations). The three plots represent: (solid lines) the solution z(t) of the deterministic model with initial state z(0) = (1, 3, 1, 0, 0, 0)v; (dashed lines) the scaled mean population of each species for initial state x1(0), that is, (1/3) E[X1(t)]; and (dotted lines) the scaled mean population of each species for initial state x2(0), that is, (1/20) E[X2(t)].

In Figure 2.5a, we show the solution of the model in the deterministic limit, and one trajectory of the stochastic model scaled with the volume, x^{(V/v)}. In Figure 2.5b, we illustrate that, due to the bimolecular reactions, the mean population size does not coincide with the solution in the deterministic limit. The rate constant values used are not based on real data. A volume unit is denoted by v. In order to compare the deterministic and stochastic models, we assumed that the volume scales with the total molecule number; more precisely, one volume unit corresponds to five molecules. Therefore, for the stochastic model with initial state x(0) = (20, 60, 20, 0, 0, 0) (molecules), the 100 molecules occupy 20 volume units, i.e. V = 20v.
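A trajectory like the one in Figure 2.5a can be generated with Gillespie's stochastic simulation algorithm over the eight reactions. The following is a minimal illustrative sketch (the function name and structure are ours, not the thesis'); the change vectors are the columns of the matrix C, and the rate values are those of Figure 2.5.

```python
import random

def ssa(x0, t_end, seed=0):
    """Gillespie SSA for the reaction system of Example 2.1.

    x = (xA, xB, xC, xAB, xBC, xABC) are molecule counts."""
    c1, c2, c1m, c2m = 1.0, 0.2, 2.0, 0.3  # stochastic rate constants (V = v)
    # change vectors: columns of the matrix C, one per reaction
    changes = [(-1, -1, 0, 1, 0, 0), (-1, 0, 0, 0, -1, 1),
               (0, -1, -1, 0, 1, 0), (0, 0, -1, -1, 0, 1),
               (1, 1, 0, -1, 0, 0), (1, 0, 0, 0, 1, -1),
               (0, 1, 1, 0, -1, 0), (0, 0, 1, 1, 0, -1)]
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while True:
        xA, xB, xC, xAB, xBC, xABC = x
        a = [c1*xA*xB, c1*xA*xBC, c2*xB*xC, c2*xAB*xC,
             c1m*xAB, c1m*xABC, c2m*xBC, c2m*xABC]   # propensities
        a0 = sum(a)
        if a0 == 0.0:
            return x
        t += rng.expovariate(a0)                     # time to next reaction
        if t > t_end:
            return x
        r, acc = rng.random()*a0, 0.0
        for j, aj in enumerate(a):                   # pick reaction j
            acc += aj
            if r < acc:
                x = [xi + d for xi, d in zip(x, changes[j])]
                break

x = ssa((20, 60, 20, 0, 0, 0), t_end=10.0)
```

The conservation laws of the change matrix hold along every trajectory, which gives a cheap correctness check.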

Two more working examples are introduced. The first one, a model of two-sided polymerization, will be useful to demonstrate that, by fragment-based reductions, the state space of the fragment-based Markov graph can be exponentially smaller than the state space of the species-based Markov graph. The second example has


only one node type with three sites, and it will be used to discuss the difference

between deterministic and stochastic fragments.

Example 2.2. (Two-sided polymerization) The rule-based model shown in Figure 2.6 describes an alternating polymerization between nodes A and B. The rules R3, R3−, R4 and R4− describe the binding and unbinding events. Moreover, each node has two activation levels, modelled by an internal state: a for node A and b for node B, which are regulated independently of the bindings (rules R1, R1−, R2, R2−). We assume that the binding between the nodes can be accelerated if both nodes are in the active mode. This is incorporated by rule R3*. Hence, the total rate of binding between an activated A and an activated B is c3 + c3*.

Figure 2.6: The rule-set described by Example 2.2. The shaded site s over node type A represents a different internal state.

Example 2.3. (Conditional independence) Assume an agent A with three sites: l (left), r (right) and c (control), such that each of them has two internal modifications, denoted by 0 and 1. The site c can change its internal value independently of the other two sites. The sites l and r may change their internal value only if the site c has value 1. The model is sketched in Figure 2.7.

Figure 2.7: The model described in Example 2.3.


Chapter 3

Automated reductions of rule-based models

This Chapter outlines general properties of fragment-based reductions. These reductions were termed fragment-based by Feret and co-authors, who used them for automatically reducing the deterministic semantics of rule-based models [31]. The main focus of our work will be using the fragment-based technique for reducing the stochastic semantics of rule-based models, that is, characterizing the stochastic fragments and computing their dynamics.

The motivation for performing fragment-based reductions is the following. A small number of rules can generate a system with an astronomically large state space [1, 54], often rendering the expansion to the species-based description infeasible even to write down. For example, if proteins D and E can bind via one bond, and each has n domains which can all have two possible internal states, then there are 2^n + 2^n + 2^n · 2^n different molecular species formed by these two molecules alone (for instance, equal to 16640 already for n = 7). With such a huge state space, it becomes prohibitive to analyze the rule-set by expanding it to its equivalent species description. However, since the huge state space emerges from a small number of rules operating over patterns, there is hope of capturing the dynamics of a rule-set compactly, as a function of patterns, which are much fewer than the full molecular species. For that reason, we turn to detecting those patterns, called fragments, which can faithfully describe the dynamics of a rule-set. The term 'fragment' is chosen in the sense that it is syntactically represented as a fragment of a full species, as opposed to


the extensional characterization of a fragment, as the set of species into which it can embed.
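The count quoted above is easy to reproduce: a free D contributes 2^n internal-state combinations, a free E contributes another 2^n, and a bound dimer D-E contributes 2^n · 2^n. A one-line check (the helper name is ours, for illustration only):

```python
def species_count(n):
    """Number of species formed by D and E with n two-valued domains each:
    free D (2^n) + free E (2^n) + bound dimer DE (2^n * 2^n)."""
    return 2**n + 2**n + 2**n * 2**n

counts = [species_count(n) for n in range(1, 8)]  # grows as 4^n + 2^(n+1)
```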

To exemplify, consider Example 2.1, and a projection from a system state z(t) to a state z(t) with the three components zA, zB?, zAB?, such that

\begin{align}
z_A(t) &= z_A(t) \tag{3.1}\\
z_{B?}(t) &= z_B(t) + z_{BC}(t) \nonumber\\
z_{AB?}(t) &= z_{AB}(t) + z_{ABC}(t) \nonumber
\end{align}

Looking back at the system of ODEs in Section 2.6, since differentiation is a linear operator, the derivatives of the new variables compute to

\begin{align}
\frac{dz_A}{dt} &= -k_1 z_A z_{B?} + k_{1-} z_{AB?} \tag{3.2}\\
\frac{dz_{B?}}{dt} &= -k_1 z_A z_{B?} + k_{1-} z_{AB?} \nonumber\\
\frac{dz_{AB?}}{dt} &= k_1 z_A z_{B?} - k_{1-} z_{AB?}. \nonumber
\end{align}

The system (3.2) operates only over the variables zA, zB?, zAB?, that is, it self-

consistently describes their dynamics. By solving the smaller system (3.2), the full

dynamics of the concrete system is not known, but meaningful information about

the original system is obtained.
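That the lumping is exact, and not merely an approximation, can be checked numerically: integrating the full six-dimensional system of Section 2.6 and the three-dimensional system (3.2) side by side, the combinations zB + zBC and zAB + zABC of the full solution coincide with the reduced variables. A sketch with a simple Euler scheme (rate values from Figure 2.5; illustrative code, not from the thesis):

```python
k1, k2, k1m, k2m = 1.0, 0.2, 2.0, 0.3

def full_rhs(z):  # six species of Example 2.1
    zA, zB, zC, zAB, zBC, zABC = z
    return (-k1*zA*zB - k1*zA*zBC + k1m*zAB + k1m*zABC,
            -k1*zA*zB - k2*zB*zC + k1m*zAB + k2m*zBC,
            -k2*zB*zC - k2*zAB*zC + k2m*zBC + k2m*zABC,
            k1*zA*zB - k2*zAB*zC - k1m*zAB + k2m*zABC,
            k2*zB*zC - k1*zA*zBC - k2m*zBC + k1m*zABC,
            k1*zA*zBC + k2*zAB*zC - (k1m + k2m)*zABC)

def red_rhs(z):  # system (3.2): variables (zA, zB?, zAB?)
    zA, zBq, zABq = z
    return (-k1*zA*zBq + k1m*zABq,
            -k1*zA*zBq + k1m*zABq,
            k1*zA*zBq - k1m*zABq)

h, steps = 0.0005, 20000
full = (1.0, 3.0, 1.0, 0.0, 0.0, 0.0)
red = (1.0, 3.0, 0.0)  # (zA(0), zB(0)+zBC(0), zAB(0)+zABC(0))
for _ in range(steps):
    df, dr = full_rhs(full), red_rhs(red)
    full = tuple(full[i] + h*df[i] for i in range(6))
    red = tuple(red[i] + h*dr[i] for i in range(3))

# project the full solution onto the lumped variables of (3.1)
lumped = (full[0], full[1] + full[4], full[3] + full[5])
```

Since the projection is linear and the reduced vector field is the exact push-forward of the full one, the two computations agree at every step, up to floating-point error.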

The system (3.2) is exactly the deterministic semantics of the reaction model

\begin{align}
F_A, F_{B?} &\xrightarrow{k_1} F_{AB?} \tag{3.3}\\
F_{AB?} &\xrightarrow{k_{1-}} F_A, F_{B?} \nonumber
\end{align}

operating over three 'abstract species', denoted by FA, FB? and FAB?. These 'abstract species' are called fragments. In particular, notice that, for example, the contribution of rule R2 to the fragment FB? is zero: the species SB is consumed at rate k2 zB zC, while SBC is produced at the same rate, and FB? embeds into both. In total, the two terms cancel out, and we say that rule R2 is silent with respect to FB?.

Fragment-based reductions aim to immediately derive the system (3.3), in contrast

to first expanding the equivalent species-based description, and then detecting


symmetries in the equations. It is therefore important to distinguish fragment-based reductions from other principled model simplification techniques, based on, for example, separating time-scales [44, 55, 75] or exploiting conservation laws [9, 14]. In fragment-based reductions, the species-based system is considered only for the purpose of proving the relation between the reduced and the original model. Still, once a fragment-based rule-set is obtained, it is amenable to any further analysis.

To this end, the fragment-based reductions follow the idea of static program analysis by abstract interpretation. Abstract interpretation is a unifying formal framework for providing partial answers about mathematical structures when the actual problem is computationally expensive or even undecidable [16], while static analysis is a preprocessing analysis which draws conclusions about program semantics before execution. The theory of abstract interpretation is independent of a particular application. In general, the applicability of the abstract interpretation framework has two constraints: (i) the choice of the abstract domain needs to ensure the desired relation between the concrete and abstract semantics (soundness); (ii) the way of computing the abstract semantics should be tractable.

The choice of the abstract domain in our case is the fragment-based domain. The relation between the species-based and fragment-based domains will be discussed in Chapter 5 and Chapter 7. Computing the abstract semantics is done by translating the rule-set, so that the concrete semantics of the new rule-set provides the abstract semantics. Such an approach is tractable, and also practically appealing, since the translated rule-set can be fed into any general-purpose rule-based quantitative analysis tool (simulation engine).

3.1 Stochastic fragments: Motivating example

We start by elaborating the idea of stochastic fragments with Example 2.1.

In Figure 3.1a, the stochastic model for initially one copy of free SA, one copy of free SC and three copies of free SB is represented. The description in terms of the fragments FA, FB?, FAB?, FC, F?BC means that the states x3 and x4 are indistinguishable. Let


x34 := x3 + x4. Then, we can compute the evolution of the fragment-based states:

\begin{align*}
\frac{dp(x_{34})}{dt} &= \frac{dp(x_3)}{dt} + \frac{dp(x_4)}{dt}\\
&= 3c_2\, p(x_1) + 3c_1\, p(x_2) - (c_{2-} + c_{1-})(p(x_3) + p(x_4))\\
&= 3c_2\, p(x_1) + 3c_1\, p(x_2) - (c_{2-} + c_{1-})\, p(x_{34})\\
\frac{dp(x_1)}{dt} &= 3c_1\, p(x_0) + c_{2-}\, p(x_3) + c_{2-}\, p(x_4) - p(x_1)(c_{1-} + 2c_2 + c_2)\\
&= 3c_1\, p(x_0) + c_{2-}\, p(x_{34}) - p(x_1)(c_{1-} + 2c_2 + c_2)\\
\frac{dp(x_2)}{dt} &= 3c_2\, p(x_0) + c_{1-}\, p(x_3) + c_{1-}\, p(x_4) - p(x_2)(c_{2-} + 2c_1 + c_1)\\
&= 3c_2\, p(x_0) + c_{1-}\, p(x_{34}) - p(x_2)(c_{2-} + 2c_1 + c_1).
\end{align*}

As the above set of equations is self-consistent, the CTMC in Figure 3.1b can be used to compute the transient distribution of the lumped process. In the later development of the theory, it will be shown that it can also be used to compute the trace distribution of the lumped process.

Another property which will be argued for is that the conditional probability of being in state x3 or x4 can be recovered from that of x34 (introduced as 'invertibility' in Chapter 4). In particular, the theory will imply that the ratio between the probabilities p^{(t)}(x3) and p^{(t)}(x4) can be reconstructed from the automorphism counts of the site-graphs which represent the states x3 and x4:

\[
\frac{p^{(t)}(x_3)}{p^{(t)}(x_4)} = \frac{\lvert \mathrm{Aut}(\{S_{ABC}, 2S_B\})\rvert}{\lvert \mathrm{Aut}(\{S_{AB}, S_B, S_{BC}\})\rvert} = \frac{2}{1}. \tag{3.4}
\]

We show that (3.4) holds. Let Δ(t) := (1/2) p^{(t)}(x3) − p^{(t)}(x4). Then,

\[
\frac{d\Delta(t)}{dt} = -(c_{2-} + c_{1-})\,\Delta(t)
\]

has the unique solution Δ(t) = Δ(0) e^{−(c_{2-}+c_{1-})t}, meaning that the probability of being in state x3 converges to being exactly two times larger than the probability of being in state x4; combined with the self-consistency derivation, it follows that p^{(t)}(x3) = (2/3) p^{(t)}(x34). If Δ(0) = 0, the ratio between the probabilities holds at all times; otherwise, it holds asymptotically.
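The 2:1 ratio (here exact for all t, since Δ(0) = 0 for the initial distribution p0(x0) = 1) can be confirmed by integrating the five CME equations numerically; the sketch below uses the rate values of Figure 2.5 with V = v and a simple Euler scheme (illustrative code, not from the thesis).

```python
# CME of Example 2.1 (states x0..x4) and the induced ratio p(x3)/p(x4).
c1, c2, c1m, c2m = 1.0, 0.2, 2.0, 0.3
rates = {(0, 1): 3*c1, (0, 2): 3*c2,
         (1, 0): c1m, (1, 3): 2*c2, (1, 4): c2,
         (2, 0): c2m, (2, 3): 2*c1, (2, 4): c1,
         (3, 1): c2m, (3, 2): c1m,
         (4, 1): c2m, (4, 2): c1m}

p, h = [1.0, 0.0, 0.0, 0.0, 0.0], 0.0005
for _ in range(20000):                 # Euler integration up to t = 10
    d = [0.0]*5
    for (i, j), w in rates.items():
        d[i] -= w*p[i]                 # outflow
        d[j] += w*p[i]                 # inflow
    p = [p[i] + h*d[i] for i in range(5)]

ratio = p[3]/p[4]                      # stays at 2, since Delta(0) = 0
p34 = p[3] + p[4]                      # the lumped state x34
```

Because the Euler map multiplies Δ by (1 − h(c2− + c1−)) at every step, Δ = 0 is preserved exactly, so p(x3) = 2 p(x4) and p(x3) = (2/3) p(x34) at all times.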

Finally, notice that if, for example, the rate of unbinding SABC were stronger than the rate of unbinding SAB or SBC separately, it would not be possible to write the equations for dp(x1)/dt and for dp(x2)/dt as functions of p(x34). In this case,


Figure 3.1: Stochastic fragments: motivating example. a) The Markov graph for x0 ≡ {SA, 3SB, SC}; b) The fragment-based Markov graph.

the proposed fragmentation is not expressive enough, since it cannot express a quantity which is necessary for the correct description of the fragments' dynamics. Consequently, any proposed reduction with the same choice of fragments will not be exact.

The goal of exact fragment-based reductions of stochastic rule-based models is to generalize these observations, so that the presented reduction can be detected and performed on any rule-based program. The input to the fragmentation process is (i) the set of observable species, patterns or their combinations within a reaction soup (for example, we may be interested in the average copy numbers of SA and SC, or in the probability of being in the state with 100 patterns FAB? and 100 patterns F?BC), and (ii) the rule-set. The fragments should be chosen so that the dynamics of the observables can be correctly and self-consistently computed from the fragment-based description.

The detection of fragments involves characterizing the states of the CTMC that can be lumped, and boils down to detecting groups of sites that a rule-set must simultaneously 'know' in order to execute the rules correctly. For example, executing rule R3 in Example 2.1 demands determining whether the species SABC embeds into the current reaction mixture, implying that the correlation between the values of sites a and c on node type B must be maintained.

In the rest of this Chapter, we address three general questions: (i) what are fragments, and what is the fragment-based semantics of a rule-based model; (ii) how to evaluate the reduction with fragments; (iii) how to compute the fragment-based semantics efficiently.


3.2 Fragments

One way to formalize how fragments emerge from a rule-set is by grouping sites of

the same node type according to a binary relation. In this work, we will require

that this binary relation is an equivalence relation.

Definition 3.1. The annotation of a contact map (A, Σ, E, I) is a family of equivalence relations {∼A ⊆ Σ(A) × Σ(A) ∣ A ∈ A}. Let C: A → P(P(S)) be such that C(A) = Σ(A)/∼A, for A ∈ A. The elements of C(A) will be called annotation classes of node type A.

The informal meaning of s ∼A s′ is that the correlation between the values of sites s and s′ in a node of type A should be maintained. Recall that a pattern is a connected site-graph over a contact map C.

Definition 3.2. Let P be the set of all patterns over a contact map C. The set of fragments induced by the annotation {∼A}A∈A with a contact map C is

F = {F ∈ P ∣ F = (V, Type, I, E, ψ), such that for all v ∈ V, if Type(v) = A then I(v) ∈ C(A)}.

In words, the interface of a node in a fragment must equal exactly one of the classes induced by the annotation. When all sites of a node type A are correlated by ∼A, the set of fragments is equal to the set of species. Therefore, whenever C(A) = {Σ(A)}, we deal with the species-based description. On the other extreme, the biggest reduction is achieved when the relation ∼A is the diagonal, that is, when each site is correlated only with itself.

In Example 2.1, there are exactly two possible annotations, both represented in

Figure 3.2.

There are different ways of understanding fragments. In a static context, in the

intensional view, a fragment is a conceptual definition, which does not include

global knowledge about the rule-based system in which this fragment appears. On

the other hand, in the extensional view, a fragment stands for the set of species

which conform to the description of this fragment within the given rule-set. For


Figure 3.2: The two possible annotations for Example 2.1 and the respective sets of fragments. a) The annotation on the contact map which defines the six species. b) The annotation where the sites a and c of node type B are separated and six fragments emerge.

example,

intensional view: fragment = pattern 'FB?'
extensional view: fragment = set of species {SB, SBC}.

The extensional view will be useful for proving properties related to the concrete, species-based world. In algorithms, we will use the intensional view, because we want to avoid the expansion to the reaction- and species-based world.

The dynamical notion of fragments is related to the semantics of interest. For example, the fragmentation {FA, FB?, FAB?} with system (3.2) is appropriate for describing the deterministic semantics, because (3.1) holds; but this does not directly imply that the same set of fragments is appropriate for describing the stochastic semantics.

3.3 Fragment-based semantics

In analogy to the definition of the species-based semantics of a rule-based program, the fragment-based view lumps all reaction mixtures which are indistinguishable when counting the number of occurrences of each fragment. Recall that the number of distinct embeddings of fragment Fi in the mixture G is mG(Fi) = ∣Emb(Fi, G)∣ / ∣Aut(Fi)∣.


Definition 3.3. Given a set of fragments F = {F1, F2, ..., Fm}, let ϕF: G → Y ⊂ N^m be such that

ϕF(G) = (y1, ..., ym), with yi = mG(Fi),

and define

ϕF⁻¹(y) = {G′ ≡ (V, Type, I′, E′, ψ′) ∣ G = (V, Type, I, E, ψ) ∈ G is such that ϕF(G) = y and ϕF(G′) = y}.

The process Yt over the state space Y, defined by [Yt = y iff Xt ∈ ϕF⁻¹(y)], is referred to as the fragment-based semantics of (R, G0).

3.4 Reduction with fragments

Fragments potentially provide much smaller models than their species-based counterparts. We investigate how much reduction can be achieved in relation to the traditional, species-based view. Two parameters will be considered: (i) the dimension of the fragmentation; (ii) the number of states in the CTMC, for initially n interacting particles.

The dimension of a fragmentation refers to the number of equations to be solved in the deterministic model, or to the dimension of the CTMC assigned to a rule-based program. For example, the number of both species and fragments in Example 2.1 equals six, but the dimension of the fragmentation is five, because the quantity of the fragment FAB? can be inferred from the quantities of the other fragments:

\[
m_G(F_{AB?}) = m_G(F_{?B}) + m_G(F_{?BC}) - m_G(F_{B?}),
\]

which follows from the simple fact that the total number of nodes of type B can be expressed either as the sum of the B nodes bound to A and those not bound to A, or as the sum of the B nodes bound to C and those not bound to C. The actual size of the CTMC can vary greatly depending on the configuration of the initial state, that is, on the distribution of the n particles among the node types. In Example 2.1, if there is initially only one node of type SB, the species-based CTMC will always have at most five states, and the fragment-based CTMC at most four states, even if the particles SA and SC are highly abundant. For that reason, we assume the case where the


number of each node type is initially equal, because it often provides the maximal variety among the reachable states.

We first introduce the fragment-based state space Y as a lumping of the species-based state space X under a function ϕ. Recall the functions ϕS: G → X and ϕF: G → Y.

Theorem 3.4. Let ϕ: X → Y be such that

ϕ(x) = ϕF(G), where G ∈ ϕS,V,Type⁻¹(x),

for some node set V and type function Type: V → A. Then, ϕ is unambiguously defined for all x ∈ X.

Proof. Assume that for some G1, G2 ∈ G, ϕS(G1) = ϕS(G2) = (x1, ..., xn), and let ϕF(G1) = (y1, ..., ym). Recall that, by Lemma 2.18 of Chapter 2, for every fragmentation F and for all reaction mixtures G ∈ G,

\[
m_G(F_j) = \sum_{S_i} m_G(S_i)\, m_{S_i}(F_j). \tag{3.5}
\]

Then,

\begin{align*}
y_j = m_{G_1}(F_j) &= \sum_{S_i} m_{G_1}(S_i)\, m_{S_i}(F_j), && \text{by (3.5)}\\
&= \sum_{S_i} x_i\, m_{S_i}(F_j), && \text{by assumption}\\
&= \sum_{S_i} m_{G_2}(S_i)\, m_{S_i}(F_j) = m_{G_2}(F_j), && \text{by (3.5).}
\end{align*}

Definition 3.5. Let Π: F × S → N be such that Π(F, S) := mS(F). The expressiveness of F is span(Π), and the dimension of a fragmentation F is equal to rank(Π). We say that the fragmentation F1 is more expressive than the fragmentation F2, written F1 ⪯ F2, if span(Π1) ⊇ span(Π2).

Expressiveness refers to all linear combinations which can be derived from fragment quantities. The ground, species fragmentation is more expressive than any other fragmentation, since all fragment quantities can be described as linear combinations of species quantities. The dimension of a fragmentation can also be computed


Figure 3.3: Fragmentation lattice. a) Hasse diagram of a fragmentation lattice for Example 2.2, when c4 = c4− = 0; b) Hasse diagram of a fragmentation lattice for Example 2.3.

directly from the annotation map, since the number of new conservation laws is correlated with the number of annotation classes over each agent. The fragmentation proposed with Example 2.1 is assigned a matrix Π of rank five, with rows indexed by the fragments A, B?, ?B, C, AB?, ?BC and columns by the species SA, SB, SC, SAB, SBC, SABC:

\[
\Pi =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 1\\
0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\begin{matrix} A\\ B?\\ ?B\\ C\\ AB?\\ ?BC \end{matrix}
\]
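The claimed rank can be checked mechanically. The sketch below (ours, not from the thesis) computes rank(Π) by Gaussian elimination over exact rationals, and also verifies that the conservation law mG(FAB?) = mG(F?B) + mG(F?BC) − mG(FB?) appears as a linear dependency among the rows of Π.

```python
from fractions import Fraction

def rank(mat):
    """Matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]/m[r][c]
                m[i] = [a - f*b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# rows: fragments A, B?, ?B, C, AB?, ?BC; columns: SA, SB, SC, SAB, SBC, SABC
Pi = [[1, 0, 0, 0, 0, 0],
      [0, 1, 0, 0, 1, 0],
      [0, 1, 0, 1, 0, 0],
      [0, 0, 1, 0, 0, 0],
      [0, 0, 0, 1, 0, 1],
      [0, 0, 0, 0, 1, 1]]

dim = rank(Pi)
# conservation law m(FAB?) = m(F?B) + m(F?BC) - m(FB?) as a row identity:
dependency = [Pi[2][j] + Pi[5][j] - Pi[1][j] for j in range(6)]
```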

That one fragment set is more expressive than another can easily be detected by directly looking at their respective contact map annotations. Informally, we will say that one annotation refines another if it induces a more expressive fragment set.

Definition 3.6. A contact map annotation {∼1A ∣ A ∈ A} refines a contact map annotation {∼2A ∣ A ∈ A} if, for all A ∈ A, ∼1A ⊇ ∼2A.

Lemma 3.7. If {∼1A} refines {∼2A}, then F1 ⪯ F2.

Proof. It suffices to show that the quantity mG(F2) of a fragment F2 = (V, Type, I, E, ψ) ∈ F2 can be expressed as a linear combination of the quantities of fragments in F1. Let F′ ≡ (V′, Type′, I′, E′, ψ′) ∈ F1 be such that Emb(F′, F2) ≠ ∅. At least one such F′ must exist, since ∼1A ⊇ ∼2A. Let now F′1 = {F′′ ≡ (V′′, Type′′, I′′, E′′, ψ′′) ∣ V′′ = V′, Type′′ = Type′, I′′ = I′}, that is, the set of fragments over the same set of nodes


Example (node types)          annot.        dim.  size of CTMC        n = 3   n = 10   n = 100
simple scaffold (3)           F1 ≡ S        3     (n+1)(n+2)(n+3)/6   20      286      ∼10^5
                              F2            2     (n+1)^2             16      121      10201
polymerization (2)            F1 ≡ S        n     > 3P(n)             > 9     > 126    > 10^10
                              F2            2     (n+1)^2             16      121      10201
conditional activation (1)    F1 ≡ S        8     C(n,8)              120     19448    ∼10^10
                              F2, F3, F4    5     C(n,2)C(n,4)        80      3146     ∼10^7
                              F5            4     C(n,2)^3            64      1331     ∼10^6

Table 3.1: Summary of the reduction for different annotations in the examples. The model of Example 2.2 is analyzed only with respect to the rules R3, R3−, R4, R4−. The number of partitions of n is denoted by P(n) (the approximate formula is P(n) ≈ \frac{1}{4n\sqrt{3}} e^{\pi\sqrt{2n/3}} [48]). C(n, k) = \binom{n+k-1}{k-1} equals the number of ways of writing n as a sum of k non-negative integers. The size of the CTMC for each example is estimated under the assumption of initially having n copies of each node type.

and interfaces as F′. Then,

\[
m_G(F_2) = \sum_{F'' \in F'_1} m_G(F'')\, m_{F''}(F_2).
\]
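The state-space formulas of Table 3.1 are simple combinatorial expressions; the sketch below (ours, for checking only) evaluates them with math.comb and reproduces the tabulated entries.

```python
from math import comb

def C(n, k):
    """Number of ways of writing n as a sum of k non-negative integers."""
    return comb(n + k - 1, k - 1)

def scaffold_species(n):
    """Species-based CTMC size of the simple scaffold: (n+1)(n+2)(n+3)/6."""
    return (n + 1)*(n + 2)*(n + 3)//6

sizes = {
    "scaffold, F1": [scaffold_species(n) for n in (3, 10)],      # 20, 286
    "scaffold, F2": [(n + 1)**2 for n in (3, 10, 100)],          # 16, 121, 10201
    "cond. act., F1": [C(n, 8) for n in (3, 10)],                # 120, 19448
    "cond. act., F2-F4": [C(n, 2)*C(n, 4) for n in (3, 10)],     # 80, 3146
    "cond. act., F5": [C(n, 2)**3 for n in (3,)],                # 64
}
```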

Since the set of all fragmentations over a given contact map is a partially ordered set with respect to ⪯, all possible annotations of a considered contact map can be presented as a lattice. In Figure 3.3, we present the lattices of fragmentations related to the examples introduced in Section 2.6. Moreover, in Table 3.1, we compare the dimensions and the sizes of the CTMC for different fragmentations. In particular, the example of polymerization demonstrates that fragmentation can lead to an exponentially smaller state space. The detailed computations for the entries in the table can be found in [37].

3.5 Computing fragment-based semantics

Given the rule-based program (R, G0) over the contact map C, our approach for obtaining the fragment-based semantics is to construct a new rule-based program (R̃, G̃0) over a modified contact map C̃, so that the species-based semantics of R̃ coincides with the fragment-based semantics of R. The modification of the contact map and of the rule-set is done according to the annotation classes.


In the following, we define how to transform a rule-based program, for any given

fragment set. The relation between the semantics of the original and the translated

model will be discussed in Chapter 5 and Chapter 7.

3.5.1 Translating the contact map

Assume a rule-based program (R, G0) over the contact map C = (A, Σ, E, I). Recall that, given an annotation {∼A ∣ A ∈ A}, the annotation classes are captured by a function C: A → P(P(S)), such that C(A) = Σ(A)/∼A.

Figure 3.4: Translating a rule, according to the annotation such that C(A) = {{b}}, C(B) = {{a}, {c}}, C(C) = {{d}}. a) Translation of a rule; b) Translation of a reaction mixture.

Definition 3.8. Given an annotation {∼A ∣ A ∈ A}, the new contact map is C̃ = (Ã, Σ̃, Ẽ, Ĩ), such that

(i) Ã = {AC1, AC2, ... ∣ A ∈ A and C1, C2, ... ∈ C(A)},

(ii) Σ̃(AC) = Σ(A)∣C, for C ∈ C(A),

(iii) Ẽ = {((AC, s), (A′C′, s′)) ∣ s ∈ C, s′ ∈ C′ and ((A, s), (A′, s′)) ∈ E},

(iv) Ĩ = I.

In words, for each equivalence class C ∈ C(A) assigned to a node type A ∈ A, a new node type is created. The interface of the new node type and the new edge types are naturally inherited from the original ones.

In the graphical representation of a contact map, each node type A is 'cut' into as many parts as there are classes in the annotation of A. We now define how


to project a site-graph over the contact map C to a site-graph over the new contact map C̃. Let 𝒢 denote the set of all site-graphs over C (notice that the set of all reachable reaction mixtures in a rule-based program is contained in 𝒢), and let 𝒢̃ denote the set of all site-graphs over C̃.

Definition 3.9. The function τ : 𝒢 → 𝒢̃ maps a site-graph G = (V, Type, I, E, ψ) ∈ 𝒢 to a site-graph G̃ = (Ṽ, T̃ype, Σ̃, Ẽ, ψ̃) ∈ 𝒢̃, that is, G̃ = τ(G), if

(i) Ṽ = {vC ∣ v ∈ V and C ∈ C(Type(v))},

(ii) T̃ype(vC) = [Type(v)]C,

(iii) Σ̃(vC) = I(v)∣C,

(iv) Ẽ = {((vC, s), (v′C′, s′)) ∣ s ∈ C, s′ ∈ C′ and ((v, s), (v′, s′)) ∈ E},

(v) ψ̃(vC, s) = ψ(v, s), if s ∈ C.

Similar to how the contact map is translated according to the contact map annotation, each node of a site-graph is cut into as many parts as there are classes in its annotation.

In Figure 3.4b, we show an application of τ to one reaction mixture of Example 2.1. The transformation from a reaction mixture G to G̃ will not be performed explicitly over the reaction mixtures; it will arise as a reachable mixture in the translated rule-based program.
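The node-splitting of Definition 3.9 can be sketched concretely. The following Python snippet is our own minimal illustration, not the thesis's implementation: it encodes a site-graph as a dictionary (`nodes`, `states`, `edges`) and an annotation as a map from node types to lists of site classes; all names are hypothetical.

```python
def tau(graph, classes):
    """Split every node v of type A into one copy per annotation class of A
    (Definition 3.9): each copy keeps the sites of its own class, together
    with their internal states, and edges follow the copy owning their site."""
    new_nodes, new_states, new_edges = {}, {}, []
    site_owner = {}
    for v, (typ, sites) in graph["nodes"].items():
        for i, cls in enumerate(classes[typ]):
            vc = (v, i)                      # the copy of v carrying class i
            new_nodes[vc] = (typ, i)
            for s in cls:
                if s in sites:
                    site_owner[(v, s)] = vc
                    if (v, s) in graph["states"]:
                        new_states[(vc, s)] = graph["states"][(v, s)]
    for (v, s), (v2, s2) in graph["edges"]:
        new_edges.append(((site_owner[(v, s)], s), (site_owner[(v2, s2)], s2)))
    return {"nodes": new_nodes, "states": new_states, "edges": new_edges}

# splitting B into the classes {a} and {c} (as in Figure 3.4) turns the
# two-node mixture A(b)-B(a,c) into three nodes; the edge follows the a-copy
mix = {"nodes": {1: ("A", {"b"}), 2: ("B", {"a", "c"})},
       "states": {}, "edges": [((1, "b"), (2, "a"))]}
split = tau(mix, {"A": [{"b"}], "B": [{"a"}, {"c"}]})
```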

3.5.2 Translating the rule-based program

A rule-based program (R, G0) over a contact map C can be translated to a new

rule-based program (R̃, G̃0) over the transformed contact map C̃. The translation of

a rule-set is performed by translating each of the rules separately.

In Algorithm 1, we present how to translate a rule into a rule over the new contact map. The lhs and rhs of a rule are translated according to the function τ, and then the rate of the new rule is corrected. One rule translation is illustrated in Figure 3.4a.

Definition 3.10. Given an annotation {∼A ∣ A ∈ A} and a rule-based program (R, G0) over the contact map C, the reduced rule-based program is (R̃, G̃0) over the contact map C̃ (Definition 3.8), such that


Algorithm 1: Translating a rule.

Input: A rule R = (G, G′, c) over the contact map C and an annotation {∼A}A∈A; the state G.
Output: A rule R̃ = (G̃, G̃′, c̃) over the contact map C̃.

    G̃ := τ(G);
    G̃′ := τ(G′);
    c̃ := c;
    for all A ∈ A do
        nA := the total count of nodes of type A in G;
        if there exists a node of type A in G then
            c̃ := c̃ ⋅ nA;
            for all equivalence classes in Σ(A)/∼A do
                c̃ := c̃ / nA;

(i) G̃0 = τ(G0),

(ii) the translation of each rule is outlined in Algorithm 1.
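The net effect of the rate loop in Algorithm 1 is c̃ = c ⋅ nA^(1−kA) for each type A occurring in the rule, where kA = ∣Σ(A)/∼A∣. A small sketch of this correction (our own function and argument names, not from the thesis):

```python
def corrected_rate(c, types_in_lhs, counts_in_state, num_classes):
    """Rate correction of Algorithm 1: for every node type A occurring in the
    rule, multiply the rate once by n_A and divide once per annotation class,
    i.e. c_tilde = c * n_A ** (1 - k_A)."""
    for A, n in counts_in_state.items():
        if A in types_in_lhs and n > 0:
            c *= n                       # multiplied once ...
            c /= n ** num_classes[A]     # ... divided once per class
    return c

# B has two annotation classes ({a} and {c}); with 4 copies of B in the
# mixture, a rate c = 2.0 becomes 2.0 * 4 / 4**2 = 0.5
rate = corrected_rate(2.0, {"B"}, {"B": 4, "A": 5}, {"B": 2, "A": 1})
```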

We explain intuitively the translation of a rule-based program. Formal analysis will follow in Chapter 5 and Chapter 7. Assume that the process assigned to (R, G0) is in state G, and that the process assigned to (R̃, G̃0) is in state τ(G) = G̃.

Two requirements arise. First, the application of any rule Ri ∈ R to G should be mimicked by the application of a rule R̃i ∈ R̃ in the following way: if the result of applying a rule Ri to a mixture G ∈ 𝒢 is G′, then the result of applying the rule R̃i to the mixture G̃ ∈ 𝒢̃ should be G̃′, such that G̃′ = τ(G′). To this end, whenever a copy of a node type AC is consumed (resp. produced), a copy of a node type of each other class C′ ∈ C(A) must be consumed (resp. produced). This is achieved by defining the translation from Ri to R̃i by translating both lhs and rhs of a rule by τ. Second, the rate must be appropriately adjusted, so that the number of embeddings between the lhs of R̃i ∈ R̃ and G̃ = τ(G) approximates (or, when possible, equals) the number of embeddings between the lhs of Ri ∈ R and G.


Chapter 4

Exact aggregation of Markov chains

Throughout this section, we consider a process Xt (either discrete- or continuous-time), assigned to a Markov graph (S, w, p0), and a partitioning of the countable set S, induced by a surjective function g : S → S̃, where S̃ = {A1, . . . , AM} and M < ∣S∣. The partition classes, induced by the inverse map g−1 : S̃ → P(S) with g−1(A) = {s ∣ g(s) = A}, will be denoted by [A]g, or only [A] when the partition function is clear from context. The set of probability distributions over S will be denoted by D(S).

Definition 4.1. The g-projection of a stochastic process Xt over the state space S is a stochastic process Yt over the state space S̃, defined by

Yt = g(Xt).

We can now define the exact Markov chain aggregation problem.

Problem 1. Construct a Markov graph (S̃, w̃, p̃0) with process Yt, so that Yt is equivalent to the g-projection of Xt.

To this end, we introduce the notion of g-aggregation, to be any Markov graph on

S̃. In other words, an aggregation is a candidate solution to the aggregation problem.

We call the aggregation exact, if the aggregated Markov graph defines exactly

the projected process. Observe that the notion of projection refers to the process

Xt, while the notion of aggregation refers to the graph (S,w, p0).

Definition 4.2. Any Markov graph (S̃, w̃, p̃0) such that p̃0(A) = ∑s∈[A] p0(s) is

called a g-aggregation of (S, w, p0). Let Yt be the stochastic process assigned to


(S̃, w̃, p̃0). If the g-projection of Xt is equivalent to Yt, the aggregation is said

to be exact. Otherwise, the aggregation is approximate.

Naturally, the projected process is not necessarily Markov and time-homogeneous, and

the exact aggregation may not exist.

In this chapter, we study sufficient criteria for performing exact aggregations, and we show how to construct the exact aggregation (if the criteria are met). In particular, we study the property of lumpability, which guarantees the existence of an exact aggregation, and invertibility, the property of being able to invert, i.e., reconstruct, the original transient probabilities from the aggregate ones. Discrete- and continuous-time Markov chains are studied separately. Each of the two cases is summarized in a single theorem on how the trace distribution of the aggregate process relates to the trace distribution of the original process.

4.1 Lumpability and invertibility

Definition 4.3. If the g-projection of a process Xt assigned to a Markov graph

is again a Markov, time-homogeneous process, then Xt is said to be lumpable

with respect to the partition g. If Xt is lumpable with respect to g for every

initial distribution, it is also said to be strongly lumpable, and if it is lumpable for

some initial distribution, it is said to be weakly lumpable with respect to g.

In practice, it is useful to characterize lumpability from the graph description of the process alone. The criteria we present relate to the Markov graph, and they apply both to DTMCs and CTMCs.

Let (S, w, p0) be a Markov graph.

Forward criterion. Define a function δ+ : S × S̃ → R by

δ+(s, A) = ∑s′∈[A] w(s, s′),

and specify the following condition.

(Cond1) For all A, A′ ∈ S̃ and s, s′ ∈ [A], δ+(s, A′) = δ+(s′, A′).


The condition can be interpreted as follows. Consider being in some state s ∈ [A], and transitioning to a state in [A′]. In the discrete-time case, the value of δ+(s, A′) represents the probability of this event, more precisely, P(Xn+1 ∈ [A′] ∣ Xn = s), and (Cond1) states that this probability is the same no matter which state s in [A] is taken as initial.
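(Cond1) can be checked mechanically from a weight matrix and a partition. A small numpy sketch (our own helper and an illustrative 4-state chain, not taken from the thesis):

```python
import numpy as np

def cond1_holds(W, blocks, tol=1e-12):
    """(Cond1): for every pair of blocks A, A', the row sums delta_plus(s, A')
    agree for all states s within A.  W: |S| x |S| weight matrix;
    blocks: list of lists of state indices forming the partition."""
    for A in blocks:
        for Ap in blocks:
            delta = W[np.ix_(A, Ap)].sum(axis=1)  # delta_plus(s, A') for s in A
            if not np.allclose(delta, delta[0], atol=tol):
                return False
    return True

# a 4-state DTMC where {0,3} and {1,2} form a strongly lumpable partition
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.25, 0.25, 0.25, 0.25],
              [0.25, 0.25, 0.25, 0.25],
              [0.0, 0.5, 0.5, 0.0]])
print(cond1_holds(W, [[0, 3], [1, 2]]))   # True
print(cond1_holds(W, [[0], [1, 2, 3]]))   # False: rows 1 and 3 disagree on {0}
```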

Backward criterion. Let α : S̃ → D(S) be a family of probability measures on S, such that α(A, s) = 0 for s ∉ [A]. Intuitively, one can think of α(A, s) as the probability of the process Xt being in state s, conditioned on the projected process g(Xt) being in state A = g(s). Define δ− : S̃ × S → R≥0 by

δ−(A, s′) = ( ∑s∈[A] α(A, s) w(s, s′) ) / α(A′, s′), where s′ ∈ [A′],

and specify the following condition.

(Cond2) For all A, A′ ∈ S̃ and s, s′ ∈ [A′], δ−(A, s) = δ−(A, s′).

An intuitive interpretation is the following. Consider a transition from a state in [A] to a state in [A′], and fix two states s′1, s′2 ∈ [A′]. The condition (Cond2) states that the proportion of ending in either of the two states should always be the same. For example, in discrete time, this means that, for all n = 0, 1, 2, . . ., for all A, A′ ∈ S̃, and for all s′1, s′2 ∈ [A′],

P(Xn+1 = s′1 ∣ Xn ∈ [A]) / P(Xn+1 = s′2 ∣ Xn ∈ [A]) = α(A′, s′1) / α(A′, s′2).

The value δ−(A, s′1) represents the probability P(Xn+1 = s′1 ∣ Xn ∈ [A]) divided by the fraction α(A′, s′1).

Definition 4.4. Given a probability distribution π ∈ D(S), if

π(s) / ∑s′∈[A] π(s′) = α(A, s), for all A ∈ S̃ and all s ∈ [A],

we say that π respects α.

Note that whenever ∣S̃∣ = M > 1, there are infinitely many distributions which

respect α.

It sometimes happens that the transient probability of being in a concrete state,

conditioned on being in the respective aggregate state, is invariant in time. Then,


one can invert, i.e., reconstruct, the original transient probabilities from the aggregate

ones.

Definition 4.5. If the transient distributions of Xt respect α for all t ∈ R≥0, we

say that Xt is invertible with respect to α.

Remark 4.6. Before continuing, we discuss informally the derivation of the two

conditions in discrete time. The projected process is Markov and time-homogeneous if the probability P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) depends neither on the particular states s ∈ [A], s′ ∈ [A′], nor on the time n. Notice that

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s′∈[A′] P(Xn+1 = s′ ∣ Xn ∈ [A])

  = ∑s∈[A] ∑s′∈[A′] P(Xn+1 = s′ ∣ Xn = s) P(Xn = s ∣ Xn ∈ [A])

  = ∑s∈[A] P(Xn = s ∣ Xn ∈ [A]) ∑s′∈[A′] w(s, s′).

When asking for strong lumpability, the conditional probability P(Xn = s ∣ Xn ∈ [A]) can vary, since the initial distribution should not influence lumpability. Then, one must ask for (Cond1), which implies

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s∈[A] P(Xn = s ∣ Xn ∈ [A]) δ+(s, A′)

  = δ+(s, A′), for any s ∈ [A].

For ensuring weak lumpability, one may assume that P(Xn = s ∣ Xn ∈ [A]) = α(A, s) holds. Then, (Cond2) must be imposed to guarantee that the property is preserved also for n + 1, in which case

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s′∈[A′] α(A′, s′) δ−(A, s′)

  = δ−(A, s′), for any s′ ∈ [A′].

4.2 Discrete-time case

We outline two simple criteria for proving that a given Markov chain is lumpable

with respect to a given partition.


Let (S, w, p0) be a Markov graph with assigned DTMC Xn.

4.2.1 Forward criterion

Theorem 4.7. Suppose that (Cond1) holds. For each A ∈ S̃, fix s ∈ [A] and let

w̃(A, A′) = δ+(s, A′) for all A′ ∈ S̃. Then, the aggregation (S̃, w̃, p̃0) is well-defined and exact.

Proof. We show only the well-definedness part. Notice that, by (Cond1), the aggregation is unambiguously defined. Moreover, for every state A ∈ S̃, w̃(A, ⋅) is a probability distribution: let s ∈ [A]. Then,

∑A′∈S̃ w̃(A, A′) = ∑A′∈S̃ δ+(s, A′)

  = ∑A′∈S̃ ∑s′∈[A′] w(s, s′)

  = ∑s′∈S w(s, s′) = 1.

Corollary 1. The process Xn is strongly lumpable with respect to g. Moreover,

the process Xn is strongly lumpable with respect to g only if (Cond1) holds.

4.2.2 Backward criterion

Definition 4.8. Suppose that (Cond2) holds. For each A′ ∈ S̃, fix s ∈ [A′] and let w̃(A, A′) = δ−(A, s). Then, the aggregation (S̃, w̃, p̃0) is called the (g, α)-aggregation of (S, w, p0). If the partition g is clear from the context, we write only

α-aggregation.

Theorem 4.9. The α-aggregation is well-defined.

Proof. The aggregation is unambiguously defined. Notice that, by (Cond2),

α(A′, s′) w̃(A, A′) = ∑s∈[A] α(A, s) w(s, s′).


Summing over s′ ∈ [A′], we have

w̃(A, A′) = ∑s∈[A] ∑s′∈[A′] α(A, s) w(s, s′).

It follows that

∑A′∈S̃ w̃(A, A′) = ∑s∈[A] ∑s′∈S α(A, s) w(s, s′) = 1.

Notice that α-aggregation is well-defined whenever (Cond2) holds, regardless of the

initial distribution of Xn. This will be important when discussing the asymptotic

behavior.
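Under (Cond2), the α-aggregation of Definition 4.8 can be built directly from w and α. A numpy sketch (our own encoding and toy chain), which also re-checks the identity α(A′, s′) w̃(A, A′) = ∑s∈[A] α(A, s) w(s, s′) used in the proof of Theorem 4.9:

```python
import numpy as np

def alpha_aggregation(W, blocks, alpha):
    """Build w_tilde(A, A') = sum_{s in A} alpha[s] * W[s, s'] / alpha[s']
    for an arbitrary representative s' of A' (well-defined under (Cond2)).
    alpha[s] is the conditional weight of s within its own block."""
    M = len(blocks)
    Wt = np.zeros((M, M))
    for i, A in enumerate(blocks):
        for j, Ap in enumerate(blocks):
            sp = Ap[0]  # any representative of A'
            Wt[i, j] = sum(alpha[s] * W[s, sp] for s in A) / alpha[sp]
    return Wt

W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.25, 0.25, 0.25, 0.25],
              [0.25, 0.25, 0.25, 0.25],
              [0.0, 0.5, 0.5, 0.0]])
blocks = [[0, 3], [1, 2]]
alpha = {0: 0.5, 3: 0.5, 1: 0.5, 2: 0.5}
Wt = alpha_aggregation(W, blocks, alpha)
assert np.allclose(Wt.sum(axis=1), 1.0)   # rows sum to one (Theorem 4.9)

# re-check alpha(A', s') * w_tilde(A, A') = sum_{s in A} alpha(A, s) w(s, s')
for j, Ap in enumerate(blocks):
    for sp in Ap:
        for i, A in enumerate(blocks):
            assert abs(alpha[sp] * Wt[i, j]
                       - sum(alpha[s] * W[s, sp] for s in A)) < 1e-12
```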

Theorem 4.10. If (Cond2) holds and p0 respects α, the α-aggregation (S̃, w̃, p̃0) is exact.

The proof is outlined together with the proof of Theorem 4.12.

Corollary 2. If (Cond2) holds and p0 respects α, the process Xn is weakly

lumpable with respect to g.

Example 4.1. We demonstrate in Figure 4.1 that it may happen that (Cond2)

holds, but (Cond1) doesn’t. This is consistent with the intuition of (Cond1) being

a more restrictive condition, since it implies strong lumpability. However, we

also demonstrate that it may happen that (Cond1) holds, but (Cond2) doesn’t.

Therefore, neither of the conditions is strictly more restrictive than the other.

On a practical note, the observation suggests that, when building algorithms for

detecting lumpable partitions, it makes sense to check both conditions (Cond1)

and (Cond2). The observation is summarized in the following Lemma.

Lemma 4.11. Given a Markov graph (S, w, p0) with process Xn, let X denote the set of all partition functions on S. Define the sets

PSX = {g ∈ X ∣ Xn is strongly lumpable with respect to g},

PWX = {g ∈ X ∣ Xn is weakly lumpable with respect to g},

CSX = {g ∈ X ∣ (S, w, p0) satisfies (Cond1) with respect to g},

CWX = {g ∈ X ∣ (S, w, p0) satisfies (Cond2) with respect to g and some α}.

Then, (i) for all X, CSX = PSX ⊂ PWX; (ii) for all X, CWX ⊂ PWX; (iii) there exist X such that CSX ∖ CWX ≠ ∅; and (iv) there exist X such that CWX ∖ CSX ≠ ∅. In other words, (Cond1) is sufficient and necessary for the process Xn to be strongly lumpable with respect to g; (Cond2) is sufficient, but not necessary, for the process Xn to be weakly lumpable with respect to g; and (Cond1) is neither a necessary nor a sufficient condition for (Cond2).


Figure 4.1: An example of a DTMC and three possible aggregations. Let S = {x, y1, y2, y3, z1, z2, z3}, with the transition matrix as specified in the graph above. Let g1, g2, g3 denote the three partitions shown in a), b) and c) (the states y1 and y2 map to the state y12, etc.). Then, g1 meets both properties (Cond1) and (Cond2) (by taking α(y12, y1) = α(y12, y2) = 0.5 and α(z12, z1) = α(z12, z2) = 0.5). Furthermore, g2 does not satisfy the property (Cond1), because, for example, δ+(y1, z12) ≠ δ+(y3, z12), but it does satisfy the property (Cond2), for α(y, y1) = α(y, y2) = α(y, y3) = 1/3 and α(z12, z1) = α(z12, z2) = 0.5. Finally, g3 satisfies (Cond1), but (Cond2) fails, because, for example, δ−(y12, z1) ≠ δ−(y12, z3).

The coming discussion on invertibility explains why (Cond2) is sometimes more

restrictive than (Cond1).

4.2.3 Invertibility

The condition (Cond2) enforces more than lumpability: one can also invert, i.e.,

reconstruct, the transient probabilities from the aggregated process.

Let (S, w, p0) be a Markov graph with a DTMC Xn. Recall that Xn is invertible with respect to α if all the transient distributions of X0, X1, . . . respect α.

Theorem 4.12. If (Cond2) holds and p0 respects α, then Xn is invertible with

respect to α.


Instead of proving Theorem 4.12 directly, we first show a stronger statement which relates the transient distributions of the original and the aggregated process. Notice that the following theorem does not yet show that the α-aggregation is exact (Theorem 4.10), because it does not show that the projected process is Markov and time-homogeneous.

Theorem 4.13. Let Y′n denote the process assigned to (S̃, w̃, p̃0). If p0 respects α, then

(i) P(Xn ∈ [A]) = P(Y′n = A),

(ii) P(Xn = s) = P(Y′n = A) α(A, s), where A = g(s).

We will prove Theorem 4.13 with the help of two lemmas.

Lemma 4.14. Assume that P(Xn−1 = s ∣ Xn−1 ∈ [A]) = α(A, s) for all A ∈ S̃ and

all s ∈ S. Then, P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′).

Proof. Notice that the joint probability P(Xn ∈ [A′], Xn−1 ∈ [A]) can be written as w̃(A, A′) P(Xn−1 ∈ [A]):

P(Xn ∈ [A′], Xn−1 ∈ [A]) = ∑s∈[A] ∑s′∈[A′] P(Xn = s′, Xn−1 = s)

  = ∑s∈[A] ∑s′∈[A′] P(Xn−1 = s) P(Xn = s′ ∣ Xn−1 = s)

  = ∑s∈[A] ∑s′∈[A′] P(Xn−1 = s) w(s, s′)

  = P(Xn−1 ∈ [A]) ∑s′∈[A′] ( ∑s∈[A] α(A, s) w(s, s′) ), by the hypothesis,

  = P(Xn−1 ∈ [A]) w̃(A, A′) ∑s′∈[A′] α(A′, s′), by (Cond2),

  = w̃(A, A′) P(Xn−1 ∈ [A]).

Lemma 4.15. Assume that p0 respects α. Then, P(Xn = s ∣Xn ∈ [A]) = α(A, s).

Proof. We use induction. Suppose that the statement holds for k = n − 1. First

observe that if s ∉ [A], then both sides equal zero. So assume that s ∈ [A]. Then,


by Lemma 4.14, we have that P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′), which is used to show that

P(Xn ∈ [A′]) = ∑A∈S̃ P(Xn−1 ∈ [A]) P(Xn ∈ [A′] ∣ Xn−1 ∈ [A])

  = ∑A∈S̃ P(Xn−1 ∈ [A]) w̃(A, A′).

Next, note that

P(Xn = s′) = ∑s∈S P(Xn−1 = s) w(s, s′)

  = ∑A∈S̃ ∑s∈[A] P(Xn−1 ∈ [A]) P(Xn−1 = s ∣ Xn−1 ∈ [A]) w(s, s′)

  = ∑A∈S̃ ∑s∈[A] P(Xn−1 ∈ [A]) α(A, s) w(s, s′), by the hypothesis,

  = ∑A∈S̃ P(Xn−1 ∈ [A]) α(A′, s′) w̃(A, A′), by (Cond2),

  = α(A′, s′) ∑A∈S̃ P(Xn−1 ∈ [A]) w̃(A, A′).

The claim is obtained by dividing the two expressions.

Proof. (Theorem 4.13) We use induction. Notice that both statements hold for n = 0. Assume that (i) and (ii) hold for n − 1. Then P(Xn−1 = s ∣ Xn−1 ∈ [A]) = α(A, s), and hence, by Lemma 4.14, P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′). Therefore,

P(Y′n = A′) = ∑A∈S̃ P(Y′n−1 = A) w̃(A, A′)

  = ∑A∈S̃ P(Xn−1 ∈ [A]) P(Xn ∈ [A′] ∣ Xn−1 ∈ [A])

  = P(Xn ∈ [A′]).

This proves (i). Next, notice that Lemma 4.15 implies

P(Xn = s) = α(A, s) P(Xn ∈ [A])

  = α(A, s) P(Y′n = A), by (i).
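Statement (ii) of Theorem 4.13 can be confirmed numerically: if p0 respects α, the transient distribution of Xn factorizes through the aggregate at every step. A check on a toy chain of our own (W and its α-aggregation Wt satisfy (Cond2) with the uniform α):

```python
import numpy as np

W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.25, 0.25, 0.25, 0.25],
              [0.25, 0.25, 0.25, 0.25],
              [0.0, 0.5, 0.5, 0.0]])
Wt = np.array([[0.0, 1.0],              # the alpha-aggregation over {0,3}, {1,2}
               [0.5, 0.5]])
blocks = [[0, 3], [1, 2]]
alpha = np.array([0.5, 0.5, 0.5, 0.5])  # alpha(A, s) for each s within its block

p = np.array([0.35, 0.15, 0.15, 0.35])  # p0 respects alpha: uniform inside blocks
q = np.array([0.7, 0.3])                # the projected initial distribution
for _ in range(10):
    p = p @ W                           # transient distribution of X_n
    q = q @ Wt                          # transient distribution of Y'_n
    for j, A in enumerate(blocks):
        for s in A:
            # (ii): P(X_n = s) = P(Y'_n = A) * alpha(A, s)
            assert abs(p[s] - q[j] * alpha[s]) < 1e-12
```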


Proof. (Theorem 4.12) The proof follows immediately from Theorem 4.13.

We now return to the proof of Theorem 4.10. In Theorem 4.13, we showed that the transient distributions of the g-projection of Xn and of its aggregation are equivalent. It remains to show that the g-projection of Xn is a Markov time-homogeneous process.

Theorem 4.16. If (Cond2) holds and p0 respects α, then for n = 0, 1, . . . and for all sequences of states A0, . . . , An ∈ S̃,

P(X0 ∈ [A0], . . . , Xn ∈ [An]) = p̃0(A0) ∏i=0..n−1 w̃(Ai, Ai+1).

As in the case of Theorem 4.13, we first show a helpful lemma.

Lemma 4.17. Define the following two properties.

Φ1(n): for all sequences of states A0, . . . , An−2, A, A′ ∈ S̃,

P(Xn ∈ [A′] ∣ Xn−1 ∈ [A], Xn−2 ∈ [An−2], . . . , X0 ∈ [A0]) = w̃(A, A′).

Φ2(n): for all sequences of states A0, . . . , An−1, A ∈ S̃ and all s ∈ S,

P(Xn = s ∣ Xn ∈ [A], Xn−1 ∈ [An−1], . . . , X0 ∈ [A0]) = α(A, s).

Then,

(i) Φ2(n − 1) implies Φ1(n), and

(ii) Φ1(n) and Φ2(n − 1) imply Φ2(n).


Proof. Assume that Φ2(n − 1) holds. Denoting the history until time n − 2, that is, Xn−2 ∈ [An−2], . . . , X0 ∈ [A0], by HXn−2, we obtain

P(Xn ∈ [A′], Xn−1 ∈ [A] ∣ HXn−2)

  = ∑s′∈[A′] ∑s∈[A] P(Xn = s′, Xn−1 = s ∣ HXn−2)

  = ∑s′∈[A′] ∑s∈[A] P(Xn = s′ ∣ Xn−1 = s, HXn−2) P(Xn−1 = s ∣ HXn−2),

since Xn is Markov and by Φ2(n − 1),

  = ∑s′∈[A′] ∑s∈[A] w(s, s′) α(A, s) P(Xn−1 ∈ [A] ∣ HXn−2)

  = P(Xn−1 ∈ [A] ∣ HXn−2) ∑s′∈[A′] ∑s∈[A] w(s, s′) α(A, s)

  = P(Xn−1 ∈ [A] ∣ HXn−2) w̃(A, A′) ∑s′∈[A′] α(A′, s′), by (Cond2),

  = P(Xn−1 ∈ [A] ∣ HXn−2) w̃(A, A′).

Second, we assume that Φ1(n) and Φ2(n − 1) hold. We need to show that Φ2(n) holds. Notice that

P(Xn = s′, Xn−1 ∈ [A] ∣ HXn−2) = ∑s∈[A] P(Xn = s′, Xn−1 = s ∣ HXn−2)

  = ∑s∈[A] P(Xn−1 = s ∣ Xn−1 ∈ [A], HXn−2) P(Xn−1 ∈ [A] ∣ HXn−2) w(s, s′)

  = ∑s∈[A] P(Xn−1 ∈ [A] ∣ HXn−2) α(A, s) w(s, s′), since Φ2(n − 1) holds,

  = P(Xn−1 ∈ [A] ∣ HXn−2) ∑s∈[A] α(A, s) w(s, s′)

  = P(Xn−1 ∈ [A] ∣ HXn−2) α(A′, s′) w̃(A, A′), by (Cond2),

  = α(A′, s′) P(Xn ∈ [A′], Xn−1 ∈ [A] ∣ HXn−2), since Φ1(n) holds.

Notice that the statement Φ2(0) holds since p0 respects α. Then, by (i), Φ1(1) holds. Further, by (ii), Φ2(1) holds, etc.


Corollary 3. If (Cond2) holds and p0 respects α, then Φ1(n) and Φ2(n) hold for

n = 0,1,2, . . ..

Proof. (Theorem 4.16 and Theorem 4.10) Let Y′n denote the aggregated process and Yn the g-projection of the process Xn. By Corollary 3, since Φ1(n) holds for all n ∈ N, the process Yn is Markov. Moreover, it has a transition matrix equivalent to that of Y′n, which also concludes the proof of Theorem 4.10.

The forward and backward criteria for lumpability in the case of DTMCs were first noticed by Kemeny and Snell [56]. Here, these results were adapted to our terminology of projection and aggregation (in order to have a unifying framework for exact and approximate aggregations). From this point on, we present several extensions to the theory: (i) the invertibility property, and invertibility in the case that the initial distribution does not respect α; (ii) the continuous-time case; (iii) an interpretation of the results relating the trace distributions of the original and aggregated chain (both for discrete and continuous time).

4.2.4 Convergence

In the previous section, we proved that, if the initial distribution respects α, then (Cond2) implies lumpability (with respect to g) and invertibility (with respect to α). We now investigate the case when the initial distribution does not respect α.

We assume throughout that we are given a DTMC Xn over the Markov graph (S, w, p0). Moreover, we will assume that (Cond2) is satisfied for some α, and we denote the α-aggregation by (S̃, w̃, p̃0).

Theorem 4.18. Let Yn be the process assigned to (S̃, w̃, p̃0). Then,

(i) if Xn is irreducible, then so is Yn;

(ii) if a state s is aperiodic in Xn, then the state g(s) is aperiodic in Yn.

We start with the following Lemma.

Lemma 4.19. For every A, A′ ∈ S̃ and every s′ ∈ [A′],

P(Yn = A′ ∣ Y0 = A) = ( ∑s∈[A] α(A, s) P(Xn = s′ ∣ X0 = s) ) / α(A′, s′).


Proof. We use induction. Notice that, for n = 1, the assertion is true by the definition of w̃. Assume that the statement holds for some n. Notice that

P(Xn+1 = s′ ∣ X0 = s) = ∑s′′∈S P(X1 = s′′ ∣ X0 = s) P(Xn+1 = s′ ∣ X1 = s′′)

  = ∑A′′∈S̃ ∑s′′∈[A′′] w(s, s′′) P(Xn+1 = s′ ∣ X1 = s′′).

By multiplying with α(A, s) for all s ∈ [A], and summing up, we obtain

∑s∈[A] α(A, s) p^(n+1)(s, s′) = ∑s∈[A] α(A, s) ∑A′′∈S̃ ∑s′′∈[A′′] w(s, s′′) p^(n)(s′′, s′)

  = ∑A′′∈S̃ ∑s′′∈[A′′] α(A′′, s′′) w̃(A, A′′) p^(n)(s′′, s′), by (Cond2),

  = ∑A′′∈S̃ w̃(A, A′′) α(A′, s′) P(Yn+1 = A′ ∣ Y1 = A′′), by the induction hypothesis,

where p^(n)(s, s′) is shorthand for P(Xn = s′ ∣ X0 = s). Finally, ∑A′′∈S̃ α(A′, s′) P(Yn+1 = A′ ∣ Y1 = A′′) w̃(A, A′′) is exactly α(A′, s′) P(Yn+1 = A′ ∣ Y0 = A).

Proof. (Theorem 4.18) A consequence of Lemma 4.19 is that, for s ∈ [A] and s′ ∈ [A′], if s → s′ (s communicates with s′, Definition 1.30), then A → A′ in the DTMC Yn. This implies (i). The claim (ii) follows from

{n ∣ P(Xn = s, X0 = s) > 0} ⊆ {n ∣ P(Yn = A, Y0 = A) > 0}.

Theorem 4.20. Let Yn be the process assigned to (S̃, w̃, p̃0), and let Xn be irreducible and aperiodic. Then,

(i) the process Xn has a unique stationary distribution µ, and µ respects α;

(ii) the process Yn has a unique stationary distribution µ̃, and µ̃(A) = ∑s∈[A] µ(s);

(iii) the g-aggregation of (S, w, µ) to (S̃, w̃, µ̃) is exact.

The fact (iii) tells us that using the aggregation (S̃, w̃, p̃0) is justified in the limit.


Proof. The process Xn has a unique stationary distribution µ by Theorem 1.33. Assume that X′n is assigned to the Markov graph (S, w, p′0) and that p′0 respects α. Since µ is the unique stationary distribution of every chain with transition probabilities defined by w, the transient distribution of X′n converges to µ. By Theorem 4.12, since all the transient distributions of X′n respect α, µ also respects α.

By Theorem 4.18, the chain Yn is also aperiodic and irreducible, so it has a unique stationary distribution µ̃. Since µ respects α, (ii) and (iii) immediately follow.
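Theorem 4.20 can likewise be confirmed numerically on a small example of ours: the stationary distribution of an irreducible, aperiodic chain satisfying (Cond2) respects α (here the uniform α inside each block), and its projection is stationary for the α-aggregation.

```python
import numpy as np

W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.25, 0.25, 0.25, 0.25],
              [0.25, 0.25, 0.25, 0.25],
              [0.0, 0.5, 0.5, 0.0]])
Wt = np.array([[0.0, 1.0], [0.5, 0.5]])   # the alpha-aggregation over {0,3}, {1,2}
blocks = [[0, 3], [1, 2]]

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a distribution."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

mu = stationary(W)
mu_t = stationary(Wt)
# mu respects alpha: it is uniform inside each block (alpha = 1/2 everywhere)
for A in blocks:
    assert np.allclose(mu[A], mu[A].mean())
# the projection of mu is the stationary distribution of the aggregation
assert np.allclose([mu[A].sum() for A in blocks], mu_t)
```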

4.3 Continuous-time case

Let (S, w, p0) be a Markov graph with a CTMC Xt. In addition, we will assume that there exists an r > 0 such that supi a(si) < r. Notice that this assumption does not mean that the state space of the chain is finite, but that there are finitely many transitions with non-zero weights originating in each particular state of S. A chain satisfying such a condition is said to be finitely branching.

Then, the analogues of Theorem 4.7 and Corollary 1 hold trivially.

Theorem 4.21. Suppose that (Cond1) holds. For each A ∈ S̃, fix s ∈ [A] and let

w̃(A, A′) = δ+(s, A′) for all A′ ∈ S̃. Then, the aggregation (S̃, w̃, p̃0) is well-defined and exact.

Corollary 4. The process Xt is strongly lumpable with respect to g. Moreover,

the process Xt is strongly lumpable with respect to g only if (Cond1) holds.

We now discuss lumpability under the condition (Cond2).

Definition 4.22. Suppose that (Cond2) holds. For each A′ ∈ S̃, fix s ∈ [A′] and let w̃(A, A′) = δ−(A, s). Then, the aggregation (S̃, w̃, p̃0) is called the (g, α)-aggregation. If the partition g is clear from the context, we write only α-aggregation.

Recall that Xt is invertible with respect to α if the transient distributions of Xt respect α for all t ∈ R≥0. We now merge the results for lumpability and invertibility in continuous time in the following theorem.

Theorem 4.23. If (Cond2) holds and p0 respects α, the process Xt is lumpable with respect to g. Moreover, the (g, α)-aggregation (S̃, w̃, p̃0) is well-defined, exact, and invertible with respect to α.


The theorem will be shown by constructing a uniformized discrete-time Markov chain out of Xt. We first need to show a commutativity relation between aggregating and uniformizing a CTMC. An illustration is provided in Figure 4.2.

[Diagram: the processes Xt, Y′t, Zn, Z̃n and Yt, related by uniformization, aggregation and projection, as stated in Theorem 4.24.]

Figure 4.2: Illustration for Theorem 4.24.

Theorem 4.24. Let Q be the generator matrix with sups ∣q(s, s)∣ < r for some r, and let

(i) Xt be the CTMC assigned to the Markov graph (S, w, p0),

(ii) Y′t be the CTMC assigned to the (g, α)-aggregation of (S, w, p0),

(iii) Zn be the uniformized chain of Xt with parameter r,

(iv) Z̃n be the uniformized chain of Y′t with parameter r,

(v) Z′n be the DTMC assigned to the (g, α)-aggregation of Zn.

Then Z′n ≡ Z̃n.

Proof. The process Zn with transition matrix M meets the condition (Cond2) with respect to α. Denote by δ−M and δ−Q the backward quantities with respect to the matrices M, with entries m(s, s′), and Q, with entries q(s, s′), respectively. Since Zn is a uniformization of Xt with constant r, M = Q/r + IN. For s′ ∉ [A],

δ−M(A, s′) = ∑s∈[A] α(A, s) m(s, s′) / α(A′, s′) = (1/r) ∑s∈[A] α(A, s) q(s, s′) / α(A′, s′) = (1/r) δ−Q(A, s′),

and, if s′ ∈ [A],

δ−M(A, s′) = ∑s∈[A] α(A, s) m(s, s′) / α(A′, s′)

  = ∑s∈[A]∖{s′} α(A, s) m(s, s′) / α(A′, s′) + (1 + q(s′, s′)/r)

  = 1 + (1/r) ( ∑s∈[A]∖{s′} α(A, s) q(s, s′) / α(A′, s′) + q(s′, s′) )

  = 1 + (1/r) ∑s∈[A] α(A, s) q(s, s′) / α(A′, s′)

  = 1 + (1/r) δ−Q(A, s′).

The value δ−Q(A, s′) does not depend on s′, since, by assumption, (Cond2) holds for the matrix Q. The rest of the claim follows trivially.
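The uniformization identity behind Theorems 4.24 and 4.25, e^(Qt) = ∑n≥0 e^(−rt)(rt)^n/n! ⋅ M^n with M = Q/r + I, can be verified numerically with truncated series. A numpy sketch (the generator Q is an arbitrary example of ours):

```python
import numpy as np
from math import exp, factorial

Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])          # an example generator matrix
r = 3.0                              # uniformization rate, r > max_s |q(s, s)|
M = Q / r + np.eye(2)                # transition matrix of the uniformized DTMC
t = 0.7

# e^{Qt} via a truncated Taylor series
expQt = np.zeros((2, 2))
term = np.eye(2)
for k in range(60):
    expQt += term
    term = term @ (Q * t) / (k + 1)

# the same matrix as a Poisson mixture over powers of M
mix = np.zeros((2, 2))
Mn = np.eye(2)
for n in range(60):
    mix += exp(-r * t) * (r * t) ** n / factorial(n) * Mn
    Mn = Mn @ M

assert np.allclose(expQt, mix)
```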

We first show the analogue of Theorem 4.13.

Theorem 4.25. Let Y′t denote the process assigned to (S̃, w̃, p̃0). If p0 respects α, then

(i) P(Xt ∈ [A]) = P(Y′t = A),

(ii) P(Xt = s) = P(Y′t = A) α(A, s).

Proof. Notice that

P(Y′t = A) = ∑n≥0 P(Z̃n = A) e^(−rt)(rt)^n/n!

  = ∑n≥0 P(Z′n = A) e^(−rt)(rt)^n/n!

  = ∑n≥0 P(Zn ∈ [A]) e^(−rt)(rt)^n/n!

  = ∑n≥0 ∑s∈[A] P(Zn = s) e^(−rt)(rt)^n/n!

  = ∑s∈[A] P(Xt = s) = P(Xt ∈ [A]),

where the second equality is by Theorem 4.24, while the third is by Theorem 4.10. It is shown similarly that P(Xt = s ∣ Xt ∈ [A]) = α(A, s).

Proof. (Proof sketch, Theorem 4.23) The well-definedness follows by the same arguments as for Theorem 4.9. By Theorem 4.25, the process Xt is invertible with respect to α, and, moreover, the processes Yt and Y′t agree on all transient distributions. It remains to show that Yt is a CTMC, that is, for all A, A′ and all t, h ≥ 0,

P(Yt+h = A′ ∣ Yt = A, HYt) = P(Yh = A′ ∣ Y0 = A).

This can be proven by induction on the number of jumps occurring in the history HYt, analogously to the proof of Lemma 4.17.

We complement the proof by showing that the g-projection Yt meets the Chapman-Kolmogorov equations. To that end, notice that for all t, h ∈ R≥0,

∑A′∈S̃ P(Yt = A′ ∣ Y0 = A) P(Yh = A′′ ∣ Y0 = A′)

  = ∑A′∈S̃ ∑n≥0 ∑k=0..n P(Z̃k = A′ ∣ Z̃0 = A) P(Z̃n−k = A′′ ∣ Z̃0 = A′) e^(−r(t+h)) r^n t^k h^(n−k) / (k!(n − k)!)

  = ∑n≥0 P(Z̃n = A′′ ∣ Z̃0 = A) ( e^(−r(t+h)) r^n / n! ) ∑k=0..n (n choose k) t^k h^(n−k), since Z̃n is a DTMC,

  = ∑n≥0 P(Z̃n = A′′ ∣ Z̃0 = A) e^(−r(t+h)) (r(t + h))^n / n!

  = P(Yt+h = A′′ ∣ Y0 = A),

where the last equality follows from Theorem 4.24.

Let µ be a stationary distribution of the CTMC Xt. Then we have the corre-

sponding analogue of Theorem 4.20.

Theorem 4.26. Let Xt be a CTMC assigned to a Markov graph (S, w, p0) with an irreducible generator Q, and with supi ∣a(si)∣ < r for some r > 0. Let µ be the stationary distribution for Q. Moreover, let Yt be the (g, α)-aggregation of Xt, with stationary distribution µ̃. Then,

(i) the process Xt has a unique stationary distribution µ, and µ respects α;

(ii) the process Yt has a unique stationary distribution µ̃, and µ̃(A) = ∑s∈[A] µ(s);

(iii) the (g, α)-aggregation from (S, w, µ) to (S̃, w̃, µ̃) is exact.

Proof. We first consider the uniformized chain Zn corresponding to Xt, with transition matrix M = Q/r + IN. Note that µ is the stationary distribution for M. It follows by Theorem 4.20 that µ̄ is the stationary distribution for Z̄n, hence for Y′t.
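Point (ii) of Theorem 4.26 can be illustrated on a small, strongly lumpable CTMC. The generator below is a made-up example in which both states of macro-state A have the same total rate into B; the stationary distribution of the aggregated generator then matches the block sums of µ:

```python
import numpy as np

# Illustrative strongly lumpable CTMC: partition {s0, s1} -> A, {s2} -> B.
# Both s0 and s1 jump to s2 at the same total rate, so the lumping is exact.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 2.0, -3.0,  1.0],
              [ 1.0,  2.0, -3.0]])

def stationary(Q):
    """Solve mu Q = 0 with sum(mu) = 1 via a bordered linear system."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

mu = stationary(Q)

# Aggregated generator over the macro-states A = {0, 1} and B = {2}.
Qbar = np.array([[-1.0,  1.0],
                 [ 3.0, -3.0]])
mubar = stationary(Qbar)

# Theorem 4.26(ii): mubar(A) = sum over s in [A] of mu(s).
print(mu, mubar)
```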


4.4 Trace semantics of stochastic processes

We now define a probability space directly on the traces of a given stochastic

process, which will be helpful for reasoning about approximate reductions.

Let Xt, t ∈ T, be a stochastic process. Define a measurable space (Γ, FΓ), such that Γ contains all traces of the process Xt and FΓ is a suitably chosen set of measurable sets of traces. A trace γ ∈ Γ is a mapping from T to S, and the set Γ lies in the space of all functions from T to S, denoted by (T → S) or ST. The probability measure PΓ is then inherited from the finite-dimensional marginal distributions of Xt, and one has to take care to define it properly. The probability space (Γ, FΓ, PΓ) is sometimes called a directly given random process or a Kolmogorov model for the random process [46].

4.4.1 Trace semantics: discrete-time

For example, in a directly given random process of a discrete-time process Xn, n ∈ N, a trace is an infinite sequence of states from S:

γ ≡ s0 → s1 → s2 → . . . ∈ SN,

and it is also called the output sequence or a discrete signal. We refer to the i-th state of a trace γ by sγi. For a given k ∈ N0 and a sequence of states s0, . . . , sk ∈ S, let Prefix(s0, . . . , sk) denote the set of all traces with prefix (s0, . . . , sk):

Prefix(s0, . . . , sk) = {γ ∈ Γ | sγi = si for all i = 0, 1, . . . , k}.

Let the event space FSN be the smallest σ-algebra containing all the finite prefix

sets of traces. The elements of FSN are sometimes called the rectangle sets of

traces. Finally, the probability measure PSN is naturally inherited from the original

probability measure with

PSN(Prefix(s0, . . . , sk)) = P(X0 = s0, . . . ,Xk = sk).

Notice that any finite-dimensional distribution related to the process Xn can be

determined through the set FSN. Random variables can now be defined over the probability space (SN, FSN, PSN) as usual.
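The prefix-set probabilities are straightforward to compute for a DTMC: P(Prefix(s0, . . . , sk)) = p0(s0) ∏i P[si−1, si]. A minimal sketch with an illustrative transition matrix (not from the thesis):

```python
import numpy as np
from itertools import product

# Illustrative DTMC: transition matrix P and initial distribution p0.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
p0 = np.array([1.0, 0.0, 0.0])

def prefix_prob(states):
    """P_{S^N}(Prefix(s_0, ..., s_k)) = p0(s_0) * prod_i P[s_{i-1}, s_i]."""
    prob = p0[states[0]]
    for a, b in zip(states, states[1:]):
        prob *= P[a, b]
    return prob

# Sanity check: the probabilities of all length-(k+1) prefixes sum to one.
k = 3
total = sum(prefix_prob(seq) for seq in product(range(3), repeat=k + 1))
print(prefix_prob([0, 1, 2, 2]))   # 1.0 * 0.5 * 0.5 * 0.6 = 0.15
```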


4.4.2 Trace semantics: continuous-time

We now construct a directly given random process for a continuous-time process Xt, t ∈ R≥0. Let Γ(R+, S) denote the set of all piece-wise constant, right-continuous functions from R≥0 to S. Recall that any trace in Γ(R+, S) is characterized by an infinite sequence of jump times and jump states:

γ ≡ s0 −t1→ s1 −t2→ s2 ⋯ ∈ Γ(R+, S).

We refer to the i-th visited state of a trace γ by sγi, and to the time instance at which the i-th jump occurred by tγi. Given k ∈ N, let Cylinder(s0, . . . , sk, δ0, . . . , δk−1) denote the set of traces which visit the sequence of states s0, . . . , sk with respective waiting times no longer than δ0, . . . , δk−1:

Cylinder(s0, . . . , sk, δ0, . . . , δk−1) = {γ ∈ Γ(R+, S) | sγi = si for all i = 0, . . . , k, and tγi+1 − tγi < δi for all i = 0, . . . , k − 1},

and let FΓ(R+,S) be the smallest σ-algebra containing all such sets (for all k ∈ N, s0, s1, . . . , sk ∈ S and δ0, . . . , δk−1 ∈ R≥0). The elements of FΓ(R+,S) are called cylinder sets of traces. Notice that a cylinder set of traces may constrain the state visited at the i-th jump to any subset of S, and the jump times to any interval in B. The probability measure is naturally inherited from the original process Xt:

PΓ(R+,S)(Cylinder(s0, . . . , sk, δ0, . . . , δk−1)) = P(τi < δi for all i = 0, . . . , k − 1, and Zi = si for all i = 0, . . . , k).
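Because the holding times of a CTMC are exponentially distributed and independent given the embedded jump chain, the cylinder-set probability factorizes as P = p0(s0) ∏i (1 − e^{−a(si)δi}) Pjump(si, si+1). The sketch below (made-up rates, not from the thesis) compares this closed form with a Monte-Carlo estimate over simulated traces:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative CTMC given by jump rates w(s, s') (made-up values).
W = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 2.0, 0.0]])
a = W.sum(axis=1)            # exit rates a(s)
Pjump = W / a[:, None]       # embedded jump chain Z_i

s = [0, 1, 2]                # visited states s_0, s_1, s_2
d = [0.5, 0.3]               # waiting-time bounds delta_0, delta_1

# Closed form (the factor p0(s_0) is 1 here, since we start in s_0):
#   P(Cylinder) = prod_i (1 - e^{-a(s_i) delta_i}) * Pjump[s_i, s_{i+1}]
exact = 1.0
for i in range(len(d)):
    exact *= (1 - np.exp(-a[s[i]] * d[i])) * Pjump[s[i], s[i + 1]]

# Monte-Carlo estimate: simulate the first two jumps and count the traces
# that fall into the cylinder set.
hits, n = 0, 40_000
for _ in range(n):
    ok = True
    for i in range(len(d)):
        tau = rng.exponential(1.0 / a[s[i]])
        nxt = rng.choice(3, p=Pjump[s[i]])
        if tau >= d[i] or nxt != s[i + 1]:
            ok = False
            break
    hits += ok
print(exact, hits / n)       # the two numbers should agree closely
```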

4.5 Trace semantics interpretation of exact aggregations

We summarize the results of this chapter by relating the trace semantics of the

aggregated chain with the trace semantics of the original chain.

4.5.1 Discrete-time case

Let (SN, FSN, PSN) be the directly given random model for the process Xn. In particular, define the following three measurements (random variables):


(i) Prefix trace semantics, defined by X0:k(γ) = (sγ0, . . . , sγk), inherits the joint distribution of (X0, . . . , Xk) of the original process Xn. The probability space generated by the random variable X0:k is (Sk+1, P(Sk+1), PX0:k). For some D0, . . . , Dk ∈ P(S), the rectangle set of traces {(s0, . . . , sk) | si ∈ Di, i = 0, . . . , k} will be denoted by D0 → . . . → Dk. When it is clear from the context that all Di are singletons, we write s0 → . . . → sk instead of {s0} → . . . → {sk}.

(ii) Transient semantics, defined by Xk(γ) = sγk, coincides with the transient

distribution of the process Xn.

(iii) Projected trace semantics, defined by Xg(γ) = (g(sγ0), g(sγ1), . . .), projects the model to the measurable space (S̄N, FS̄N). It is easy to check that the induced probability space is equivalent to the directly given random model of the g-projection of the process Xn.

Theorem 4.27. Let Xn be a DTMC assigned to the Markov graph (S, w, p0), such that (Cond2) holds for some α. Let Yn be the process assigned to the α-aggregation (S̄, w̄, p̄0). Then,

(i) If p0 respects α, then for all k = 0, 1, . . .

(i) P(X0:k ∈ [A0] → . . . → [Ak]) = P(Y0:k = A0 → . . . → Ak),

(ii) P(X0:k ∈ [A0] → . . . → [Ak−1] → {s}) = P(Y0:k = A0 → . . . → Ak)α(Ak, s), with s ∈ [Ak].

(ii) If p0 respects α, then for all n = 0, 1, . . .

(i) P(Yn = A) = P(Xn ∈ [A]),

(ii) P(Xn = s) = α(A, s)P(Yn = A).

(iii) If (S, w, p0) is irreducible and aperiodic, then for n → ∞,

(i) |P(Yn = A) − P(Xn ∈ [A])| → 0,

(ii) P(Xn = s)/P(Yn = A) → α(A, s).
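Point (i) of the theorem can be tested numerically on a small strongly lumpable DTMC (illustrative numbers, not from the thesis): the rectangle-set probabilities computed on the original chain and on its aggregation coincide.

```python
import numpy as np

# Illustrative strongly lumpable DTMC: blocks A = {0, 1} and B = {2};
# both states of A have the same total mass into B, so (Cond1) holds.
P = np.array([[0.30, 0.50, 0.20],
              [0.60, 0.20, 0.20],
              [0.25, 0.25, 0.50]])
p0 = np.array([0.4, 0.3, 0.3])

blocks = {"A": [0, 1], "B": [2]}
Pbar = np.array([[0.80, 0.20],      # A -> A, A -> B
                 [0.50, 0.50]])     # B -> A, B -> B
p0bar = np.array([p0[0] + p0[1], p0[2]])
idx = {"A": 0, "B": 1}

def rect_prob(seq):
    """P(X_{0:k} in [A_0] -> ... -> [A_k]) computed on the original chain."""
    v = np.zeros(3); v[blocks[seq[0]]] = p0[blocks[seq[0]]]
    for B in seq[1:]:
        w = v @ P
        v = np.zeros(3); v[blocks[B]] = w[blocks[B]]
    return v.sum()

def rect_prob_bar(seq):
    """The same rectangle probability on the aggregated chain."""
    v = np.zeros(2); v[idx[seq[0]]] = p0bar[idx[seq[0]]]
    for B in seq[1:]:
        w = v @ Pbar
        v = np.zeros(2); v[idx[B]] = w[idx[B]]
    return v.sum()

for seq in [("A",), ("A", "B"), ("A", "A", "B"), ("B", "A", "A", "B")]:
    print(seq, rect_prob(seq), rect_prob_bar(seq))
```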

4.5.2 Continuous-time case

Let (Γ(R+,S), FΓ(R+,S), PΓ(R+,S)) be a directly given model for Xt. Similarly to the discrete case, we focus on three measurements:


(i) Prefix trace semantics, defined by X[0,T)(γ) = (kγ, sγ0, . . . , sγkγ, tγ1, . . . , tγkγ), where kγ is the number of jumps which happen before time T. For D0, . . . , Dk ∈ P(S) and I1, . . . , Ik ∈ B, we write, instead of X[0,T) ∈ (k, D0, . . . , Dk, I1, . . . , Ik), the shorthand X[0,T) ∈ D0 I1→ . . . Ik→ Dk, and if it is clear that all sets Di are singletons, we omit the set parentheses.

(ii) Transient semantics, defined by Xt(γ) = γ(t), coincides with the transient

distribution of Xt.

(iii) Projected trace semantics, defined by Xg(γ) = γ̄ ∈ Γ(R+, S̄), so that γ̄(t) = g(γ(t)) for all t ∈ R≥0. It can be shown that the induced probability space is equivalent to the directly given random model of the g-projection of the process Xt.

Theorem 4.28. Let Xt be assigned to the Markov graph (S, w, p0) such that (Cond2) holds for some α. Let Yt be the process assigned to the α-aggregation of Xt. Then,

(i) If p0 respects α, then for all T ∈ R≥0,

(i) P(Y[0,T) = A0 I1→ . . . Ik→ Ak) = P(X[0,T) ∈ [A0] I1→ . . . Ik→ [Ak]),

(ii) P(X[0,T) ∈ [A0] I1→ . . . → [Ak−1] Ik→ {s}) = P(Y[0,T) = A0 I1→ . . . Ik→ Ak)α(Ak, s).

(ii) If p0 respects α, then for all t ∈ R≥0,

(i) P(Yt = A) = P(Xt ∈ [A]),

(ii) P(Xt = s) = α(A, s)P(Yt = A).

(iii) If (S, w, p0) is irreducible and aperiodic, then for t → ∞,

(i) |P(Yt = A) − P(Xt ∈ [A])| → 0,

(ii) P(Xt = s)/P(Yt = A) → α(A, s).

4.6 Matrix representation

If the state space is finite, matrix notation enables a concise specification of the forward and backward criteria, and of the construction of the aggregated chain.


Corollary 5. Let V be the N × M matrix defined by

Vs,A = 1 if g(s) = A, and 0 otherwise,

and let Uα be the M × N matrix defined by

UαA,s = α(A, s) if g(s) = A, and 0 otherwise.

Let P be the transition matrix of the Markov graph (S, w, p0) (either discrete- or continuous-time), and let P̄α be the transition matrix of the corresponding α-aggregation (S̄, w̄, p̄0). Then,

(i) UαV = IM for all α.

(ii) (Cond1) is equivalent to VUαPV = PV for all α.

(iii) (Cond2) is equivalent to UαPVUα = UαP.

(iv) The transition matrix of the α-aggregation is given by P̄α = UαPV.

(v) The results of Theorem 4.13 and Theorem 4.25 summarize to π̄(t) = π(t)V (lumpability) and π(t) = π̄(t)Uα (invertibility).

Proof. (i) For all α, (UαV)AI,AJ = ∑s∈S UαAI,s Vs,AJ = ∑s∈[AJ] α(AI, s), which equals 1 if I = J and 0 otherwise.

(ii) First see that (PV)si,AJ = ∑sj∈[AJ] w(si, sj) = δ+(si, AJ), and that (VUα)si,sj = α(A, sj) if g(si) = g(sj) = A, and 0 otherwise. Moreover, (VUαP)si,sj = ∑sk∈S (VUα)si,sk w(sk, sj) = ∑sk∈[g(si)] α(g(si), sk) w(sk, sj), and (VUαPV)si,AJ = ∑sj∈S (VUαP)si,sj Vsj,AJ = ∑sj∈[AJ] ∑sk∈[AI] α(AI, sk) w(sk, sj), where AI = g(si). Since (VUαPV)si,AJ has the same value for all si ∈ [AI], it follows that the matrices PV and VUαPV are equal if and only if (Cond1) holds.

(iii) Notice that (UαP)AI,sj = ∑sk∈S UαAI,sk w(sk, sj) = ∑sk∈[AI] α(AI, sk) w(sk, sj), which equals δ−(AI, sj)α(AJ, sj), where AJ = g(sj). On the other hand, (UαPV)AI,AJ = ∑si∈[AI] ∑sj∈[AJ] α(AI, si) w(si, sj), and consequently (UαPVUα)AI,sj = (UαPV)AI,AJ UαAJ,sj, where AJ = g(sj). Therefore, UαPVUα = UαP if and only if (UαPV)AI,AJ = δ−(AI, sj) for all sj ∈ [AJ], that is, if and only if (Cond2) holds.

The points (iv) and (v) follow trivially.
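The matrix criteria of Corollary 5 are easy to check mechanically. The sketch below uses an illustrative lumpable chain (made-up numbers; the uniform α is an arbitrary choice): it builds V and Uα, verifies points (i) and (ii), computes P̄α = UαPV, and shows that (Cond2) can fail even when (Cond1) holds.

```python
import numpy as np

# Illustrative strongly lumpable chain: states {0, 1} form block A, {2} forms B.
P = np.array([[0.30, 0.50, 0.20],
              [0.60, 0.20, 0.20],
              [0.25, 0.25, 0.50]])
g = [0, 0, 1]                          # g(s) = index of the block of state s
N, M = 3, 2

V = np.zeros((N, M))
for s, A in enumerate(g):
    V[s, A] = 1.0                      # V_{s,A} = 1 iff g(s) = A

alpha = np.array([[0.5, 0.5, 0.0],     # arbitrary distribution alpha(A, .) on [A]
                  [0.0, 0.0, 1.0]])
U = alpha * V.T                        # U^alpha_{A,s} = alpha(A, s) iff g(s) = A

assert np.allclose(U @ V, np.eye(M))           # (i)   U^alpha V = I_M
assert np.allclose(V @ U @ P @ V, P @ V)       # (ii)  (Cond1): V U P V = P V
assert not np.allclose(U @ P @ V @ U, U @ P)   # (iii) (Cond2) fails for this alpha
Pbar = U @ P @ V                               # (iv)  aggregated transition matrix
print(Pbar)
```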


Chapter 5

Exact automatic reductions of stochastic

rule-based models

Throughout this chapter, we assume a rule-based program (R, G0) over a contact map C = (A, Σ, E, I) with Markov graph (G, w, p0) and process Xt. We write S for the set of species, F for the set of fragments, Xt for the species-based semantics with partition function ϕS : G → X and Markov graph (X, w, p0), and Yt for the fragment-based semantics with partition function ϕF : G → Y. Finally, ϕ : X → Y denotes the partitioning from the species state space to the fragment state space.

Definition 5.1. The F-reduction of (R, G0) is any ϕ-aggregation of (X, w, p0). The reduction is exact if the ϕ-aggregation is exact (Definition 4.2). Otherwise, the reduction is approximate.

We will address the following problem related to the exact reduction of a rule-based

program.

Problem 2. Characterize the set of fragments F such that there exists an exact F-reduction of (R, G0). Then, if the fragment-based semantics of the given rule-based program is Yt, define a new rule-based program whose species-based semantics is Yt.

Definition 5.2. Let G1 = (V1, Type1, I1, E1, ψ1) and G2 = (V2, Type2, I2, E2, ψ2) be site-graphs such that V1 ∩ V2 = ∅. Their disjoint union, denoted G = G1 ⊎ G2, is such that, for i = 1, 2, if v ∈ Vi, then Type(v) = Typei(v), I(v) = Ii(v), and ψ(v) = ψi(v).


Definition 5.3. A site-graph G is said to be a sub-graph of G′, written G ⊆ G′, if the identity mapping is a support of an embedding of G into G′. If σ ∈ Emb(G, G′), we denote by σ(G) the sub-graph of G′ induced by σ. In that case, we can refer to the inverse transformation σ−1 ∈ Emb(σ(G), G).

The following definition is an alternative way of defining fragments, by extending the annotation relation to the sites of different agents which are potentially connected via a path. It will be useful for proving that the reduction by Algorithm 2 is exact.

Definition 5.4. Let {∼A | A ∈ A} be some annotation of a contact map C. Moreover, let ∼ ⊆ {((A, s), (A′, s′)) | A, A′ ∈ A, s ∈ Σ(A), s′ ∈ Σ(A′)} be the transitive closure of the relation

(A, s) ∼ (A′, s′) iff s ∼A s′ or ((A, s), (A′, s′)) ∈ E.

The contact map can be uniquely decomposed into a disjoint union of a finite number of contact maps, C = ⊎i Ci (Figure 5.1a). Moreover, under the described decomposition of the contact map, the set of fragments F decomposes as F = ⊎i Fi, where Fi is isomorphic to the set of species for the contact map Ci.

Definition 5.5. The restriction of a site-graph G = (V, Type, I, E, ψ) over a contact map C = (A, Σ, E, I) to a contact map Ci = (Ai, Σi, Ei, I) is denoted by G|Ci and defined by (Vi, Typei, Ii, Ei, ψi), with Vi = {v ∈ V | Type(v) ∈ Ai}, Typei(v) = Type(v), Ii(v) = Σi(Typei(v)), Ei = {((v, s), (v′, s′)) | v, v′ ∈ Vi, s ∈ Σi(v), s′ ∈ Σi(v′), ((v, s), (v′, s′)) ∈ E}, and ψi(v, s) = ψ(v, s) for v ∈ Vi.

Notice that the identity node permutation always defines an embedding between

G∣Cj and G.

Notation 1. If σ ∈ Iso(G1, G2), we write G2 = σ(G1). A union of a1 copies of site-graph F1, a2 copies of site-graph F2, etc., will be denoted by GF ≡ {a1F1, . . . , amFm}. Notice that GF is defined over the contact map C, but it is not necessarily a reaction mixture.

Recall that ϕS(G) = (x1, . . . , xn) with xi = mG(Si) for i = 1, . . . , n, and that ϕF(G) = (y1, . . . , ym) with yi = mG(Fi) for i = 1, . . . , m. We provide an alternative characterization of both aggregations in terms of a binary relation between states. An illustration is provided in Figure 5.1.

Page 107: Rights / License: Research Collection In Copyright - …7584/eth... · agr egation de chaine de Markov exacte et approch ee, qui constitue une grande partie de la th ese. La th eorie

Chapter 5. Exact automatic reductions of stochastic rule-based models 90

Figure 5.1: Example 2.1, characterization of ϕS and ϕF by node permutations. a) The decomposition of a contact map, as described in Definition 5.4. b) Assume the initial state of (G, w, p0) is G0. Any of the four states G1, G2, G3, G4 is reachable from G0 after applying rules R1 and R2 (in any order). The permutation of nodes which confirms that G1, G2 ∈ [{SABC, SB}]ϕS is (v2 ↦ v1) (the other node images are uniquely determined); moreover, G3, G4 ∈ [{SAB, SBC}]ϕS, with permutation (v1 ↦ v2); finally, G1, G2, G3, G4 ∈ [{FAB?, F?BC}]ϕF. For example, the permutation between G1|C1 and G3|C1 is (u2 ↦ u2), and the permutation between G1|C2 and G3|C2 is (u2 ↦ u1).

Lemma 5.6. Let G = (V, Type, I, E, ψ) and G′ = (V, Type, I, E′, ψ′) be two reaction mixtures in G, and let C = C1 ⊎ . . . ⊎ Ck be the decomposition of the contact map as defined in Definition 5.4. Then,

(i) ϕS(G) = ϕS(G′) if and only if there exists an isomorphism ρ ∈ Iso(G, G′) with support ρ∗ : V → V, that is, G′ = ρ(G). Notice that, since nodes can be mapped only to nodes of the same type, ρ is a product of permutations over the nodes of the same type.

(ii) ϕF(G) = ϕF(G′) if and only if there exists a family of isomorphisms {ρi ∈ Iso(G|Ci, G′|Ci) | i = 1, . . . , k}, such that G′|Ci = ρi(G|Ci) for all i = 1, . . . , k.

It is expected that the species-based semantics is an exact aggregation with respect to the individual-based semantics. The next result confirms that (Cond1) holds for lumping Xt with the partition ϕS.

Theorem 5.7. The process Xt is strongly lumpable with respect to ϕS.

Proof. Let G1 ∈ G with ϕS(G1) = x, and for a rule Ri, let σ∗ : Vi → V be such that δi(σ, G1) = G′1 and ϕS(G′1) = x′. Let G2 ∈ G be such that ϕS(G2) = x, and let ρ∗ : V → V be the node permutation such that G2 = ρ(G1). Then, ρ∗ ∘ σ∗ : Vi → V is an


Algorithm 2: Procedure for annotating the node types' signatures

Input: A rule-based program (R, G0) over the contact map (A, Σ, E, I), such that R ≡ {R1, . . . , Rn} and, for each i = 1, . . . , n, Ri ≡ (Gi, G′i, ci);
Output: Annotation {∼A}A∈A (a family of equivalence relations over the set of sites of each node type).

    for A ∈ A do ∼A = {{s} | s ∈ Σ(A)};
    for G ∈ {G1, G′1, . . . , Gn, G′n} do
        (V, Type, I, E, ψ) = G;
        for v ∈ V do
            A = Type(v);
            for s ∈ I(v) and each s′ ∈ I(v) do ∼A = addrelation(∼A, s, s′);
        end
    end
    return (∼A)A∈A;

/* For any node type A ∈ A, ∼A is an equivalence relation that is encoded by a forest as in the Union-Find algorithm [15]; the primitive addrelation(∼, a, b) fuses the two ∼-equivalence classes [a]∼ and [b]∼. */

embedding support between Gi and G2. Let ρ′ : V′ → V′ be such that ρ′(v) = ρ(v) if v ∈ V, and ρ′(v) = v if v ∈ V′ ∖ V. Then, δi(ρ ∘ σ, ρ(G1)) = ρ′(δi(σ, G1)), by the definition of rule application. Hence, δi(ρ ∘ σ, G2) ∈ [x′]ϕS. Since whenever σ′ ≠ σ, also ρ ∘ σ′ ≠ ρ ∘ σ, the number of rule applications leading from G1 to x′ and from G2 to x′ is equal. Consequently, δ+(G1, x′) = δ+(G2, x′).

5.1 Exact fragment-based reduction

We propose Algorithm 2 for annotating the contact map of a given rule-based program. Initially, each site is correlated only to itself (this is the top of the annotation lattice introduced in Chapter 3). First, the annotation is refined so as to contain all correlations between sites which appear in the observable patterns, because all the observables should be readable from the fragment-based semantics. Then, the procedure refines the current annotation by processing the rules sequentially, so that every two sites which are tested or modified by the same rule are correlated by the annotation.
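The core of the procedure is a union-find computation per node type. The following sketch assumes a simplified, hypothetical input format (each rule side given as a list of (agent type, tested sites) pairs) and omits the observables; it illustrates the idea, not the thesis implementation:

```python
# A sketch of Algorithm 2 over a simplified, hypothetical input format:
# each rule side is a list of (agent_type, tested_sites) pairs.

class UnionFind:
    """Forest-encoded equivalence relation, as in the Union-Find algorithm."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):                                 # addrelation(~, a, b)
        self.parent[self.find(a)] = self.find(b)

def annotate(rule_graphs):
    """Fuse, for every node type, the sites tested or modified by the same node."""
    sim = {}                                   # node type A -> union-find on Sigma(A)
    for graph in rule_graphs:                  # lhs and rhs of every rule
        for agent_type, sites in graph:
            uf = sim.setdefault(agent_type, UnionFind())
            for s in sites:
                uf.union(sites[0], s)          # correlate all sites of this node
    return sim

# Example 2.1-style rule set (hypothetical site names): R1 binds A.a-B.a,
# R2 binds B.c-C.c; the two sites of B are never tested together.
rules = [
    [("A", ["a"]), ("B", ["a"])],
    [("B", ["c"]), ("C", ["c"])],
]
sim = annotate(rules)
print(sim["B"].find("a") == sim["B"].find("c"))   # False: B.a and B.c stay uncorrelated
```

Because B.a and B.c end up in different annotation classes, the fragmentation keeps the two halves of B separate, which is what produces the fragments FAB? and F?BC in the running example.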


From now on, we assume that {∼A | A ∈ A} is the annotation derived by Algorithm 2 and F the corresponding fragmentation. The following observation will be important for showing that the F-reduction is exact.

Observation 1. Let u be a node appearing in the lhs of a rule Ri, let C = ⊎j Cj be the contact map decomposition and F = ⊎j Fj the corresponding partition of the fragment set, and let v be a node appearing in a fragment F ∈ F, such that Type(u) = Type(v) = A. Then, we distinguish three cases:

(i) I(u) = ∅,

(ii) I(u) ⊆ I(v) = Σj(A), or

(iii) I(u) ≠ ∅ and I(u) ∩ I(v) = ∅.

In the further analysis, it will be important to notice that, for every embedding σ ∈ Emb(Gi, G), in case (ii), the test is performed exclusively over fragments from the group Fj. If Ri changes the internal configuration of a fragment, it affects no fragment from other groups. If Ri involves deletion of nodes, a deleted fragment from the set Fj will trigger a deletion of all other fragments which contain the node types involved in the lhs of the rule. Therefore, unless the deleted structure is a species, the change will involve more fragments of the same type simultaneously (see Figure 5.2). In a well-defined rule, the birth of a component must be the birth of a species. As Algorithm 2 also merges the sites which appear together in the rhs of a rule, the birth of a fragment will uniquely affect any fragment-based state. In case (i), no site of node u is tested by the rule. Such a node either remains as it is on the rhs, or can be deleted.

We first establish lumpability between the fragment-based semantics and the individual-based semantics by showing that (Cond2) holds.

Theorem 5.8. Xt is lumpable with respect to ϕF. Specifically, (Cond2) is met for α(y, G) = 1/|[y]ϕF|.

Proof. Fix y, y′ ∈ Y, a reaction mixture G′1 ∈ [y′]ϕF and a rule Ri = (Gi, G′i, ci), such that Gi = (Vi, Typei, Ii, Ei, ψi) and G′i = (V′i, Type′i, I′i, E′i, ψ′i). Define the set of embeddings which lead to G′1 after applying the rule Ri to a reaction mixture from [y]ϕF by Γ−(y, Ri, G′1) := {σ ∈ Emb(Gi, G) | G ∈ [y]ϕF and δi(G, σ∗) = G′1}. We show that the cardinality of the set Γ−(y, Ri, G′1) does not depend on the choice of G′1. Let


Figure 5.2: Specificity of deletion events. a) A rule R5 is added to Example 2.1, where a node of type B is deleted whenever site c is free. b) Two reaction mixtures, G1 and G2, are lumped in the fragment-based view; to both states G1 and G2, the lhs of rule R5 embeds exactly once: mG1(G5) = mG2(G5) = 1, but the states obtained after the transition are not equivalent in the fragment-based view. c) The species-based representation of the situation in b): (Cond1) is violated.

G′2 ∈ [y′]ϕF. The goal is to construct a bijection between the sets Γ−(y, Ri, G′1) and Γ−(y, Ri, G′2).

For each G1 and σ1 ∈ Γ−(y, Ri, G′1), we define a different G2 ∈ [y]ϕF and σ2, such that G′2 = δi(G2, σ∗2). Since ϕF(G′1) = ϕF(G′2), there exist node permutations {ρ′∗j : V′ → V′ | j = 1, . . . , k} such that G′2|Cj = ρ′j(G′1|Cj) for j = 1, . . . , k. Let u ∈ Gi, and let v = σ∗1(u) be its image in G1. Define G2 by permuting the node identifiers of G1 with {ρ∗j : V → V | j = 1, . . . , k}, such that ρ∗j(v) = ρ′∗j(v) if v ∈ V ∩ V′, and ρ∗j(v) = v if v ∈ V ∖ V′. In other words, the image of a node which was not deleted remains as in y′, and the deleted nodes have the identity image. In case V ⊂ V′ (creation of nodes), the image is neglected. We first show that G2 ∈ [y]ϕF. (Case 1) If V = V′, the claim trivially holds by the discussion in Observation 1. (Case 2) If V ∖ V′ ≠ ∅ (Ri is a deletion event), then for all j = 1, . . . , k, a deleted node v ∈ V ∖ V′ is mapped to itself by ρj. Therefore, for each deleted node in G1 and G2, the interfaces are equivalent. Denote the deleted component by Sl. Since, by assumption, G′1 = G1 ∖ Sl and G′2 = G2 ∖ Sl are fragment-based equivalent, it follows that G1 and G2 are fragment-based equivalent. (Case 3) If V′ ∖ V ≠ ∅ (Ri is a birth event), then a node v ∈ V′ ∖ V induces a connected component in G′1 which is a fully defined species Sl, because, in a well-defined rule, the birth components must have full interfaces. Moreover, since Algorithm 2 groups all the sites which appear on the rhs of a rule, the annotation class of every v ∈ V′ ∖ V lists all sites from its interface Σ(v), and, consequently, the connected component induced by ρ′∗j(v) must be species-equivalent. Therefore, the states before the creation of these components are also fragment-equivalent. Finally, the rule Ri can be applied to G2 via the embedding ρ∗j ∘ σ∗, resulting in δi(ρj(G1), ρ∗j ∘ σ∗) = ρj(δi(G1, σ∗)) = G′2.

It can be checked that the activity of each rule is equal for any two individual states lumped by ϕF. This is, however, not sufficient for claiming (Cond1).

Theorem 5.9. Xt is lumpable with respect to ϕF. Specifically, if all deletion events are deletions of a species, (Cond1) holds.

Proof. Taking G1, G2 such that ϕF(G1) = ϕF(G2) = y, it suffices to establish a bijection between the successors of G1 and G2 inside some state [y′]ϕF. The proof is analogous to the proof of Theorem 5.8, but it proceeds in the opposite direction: under the assumption that the transition sources are lumped, one needs to show that the transition targets are lumped as well. The difference comes in the argument related to the cases of node birth and node deletion: the case of node deletion is analogous to the case of node creation in Theorem 5.8, and vice versa. However, while the proof of Theorem 5.8 relies on the argument that the new node must have a fully defined interface, the guarantee that the deleted node is a fully defined species is lost. An illustration of an example where node deletion violates (Cond1) is provided in Figure 5.2.

Theorem 5.10. Xt is lumpable with respect to ϕ. Specifically, (Cond2) is met for α : Y × X → [0, 1], such that

α(y, x) = |[x]ϕS| / |[y]ϕF| if [x]ϕS ∩ [y]ϕF ≠ ∅, and 0 otherwise.

If all deletion events are deletions of a species, (Cond1) holds as well.

The proof follows directly from Theorem 3.4.

The computation of the conditional probabilities can be done efficiently: for G ∈ [x]ϕS, |[x]ϕS| = |Aut(G)| is given in Theorem 2.22. Moreover, |[y]ϕF| is equal to the number of automorphisms of τ(G).

In related works, the presented problem was discussed from different perspectives. In [33] and [32], the proof is facilitated by process-algebraic notation, while the semantics is defined over a weighted labeled transition model. In [37], the results were extended by showing how the applicability of de-aggregation depends on the initial distribution. Recently, we presented a procedure for efficiently reconstructing the high-dimensional species-based dynamics from the aggregate state. The algorithm involves counting the automorphisms of a connected site-graph, and has quadratic time complexity in the number of molecules which constitute the site-graphs of interest [70].

5.2 Computing the fragment-based semantics

When dealing with large-scale examples, it is important to be able to compute the fragment-based semantics. We propose to reuse the rule-based simulator for obtaining the fragment-based semantics. Towards this goal, it is important to choose the right representation of a fragment-based state, and to define the executions between fragment-based states, so that the aggregation between the underlying Markov graphs is indeed exact.

Recall the translation of a rule-based program proposed in Chapter 3. In the following, consider (R̄, Ḡ0), the result of translating (R, G0) according to Algorithm 1, and the fragmentation F.

Observation 2. Each species in S̄ = {S̄1, S̄2, . . .} in the model (R̄, Ḡ0) is isomorphic to some fragment Fj ∈ F = {F1, F2, . . .} in the model (R, G0), and vice versa. Assume that the ordering is such that S̄1 ≅ F1, S̄2 ≅ F2, etc.

Theorem 5.11. The individual-based semantics of (R, G0), a Markov graph (G, w, p0), and the individual-based semantics of (R̄, Ḡ0), a Markov graph (Ḡ, w̄, p̄0), are equivalent.

The following result will serve to show that the activities of a rule in G ∈ G and in τ(G) ∈ Ḡ are equal. Recall the translation of a rule-based program presented in Definition 3.10, Chapter 3.

Lemma 5.12. Let G = (V, Type, I, E, ψ), and let A1, . . . , An be the node types which occur in the lhs of the rule Ri, that is, in Gi. Moreover, let a1, a2, . . . , an denote the number of annotation classes in A1, A2, . . . , An respectively, and N1, . . . , Nn the total abundance of nodes of types A1, . . . , An in G. Then,

c̄i/ci = 1 / ∏_{j=1}^{n} Nj^{aj−1} = mG(Gi) / mτ(G)(τ(Gi)).

Proof. The first equality follows from the definition of Algorithm 2. Fix σ∗ : Vi → V, one embedding of Gi into G, such that δi(G) = G′. Let Ci = (Ai, Σi, Ei, Ii) be the contact map inferred from the rule Ri. For each σ ∈ Emb(Gi, G), there are exactly ∏_{j=1}^{n} Nj^{aj−1} distinct embeddings σ̄ ∈ Emb(τ(Gi), τ(G)). More precisely, since σ ∈ Emb(Gi, G), the mapping

σ̄∗(vC) = [σ∗(v)]C if Σi ⊆ C, and any v̄C ∈ V̄ such that Type(v̄C) = [Type(v)]C if C ∩ Σi = ∅,

induces an embedding σ̄ ∈ Emb(τ(Gi), τ(G)). By Observation 1, there is a unique vC ∈ V̄i such that Σi ⊆ C. Since there are Nj possibilities for the choice of σ̄∗(vC) in the second case, the proof is complete.

Proof. (Theorem 5.11) It suffices to show that

1. G ∈ G iff τ(G) ∈ Ḡ, and

2. w(G, G′) = w̄(τ(G), τ(G′)).

We use induction on the number of steps. By definition, Ḡ0 = τ(G0). Assume that Ḡ = τ(G), and let G′ = δi(σ, G) be an arbitrary successor of the state G. Let V̄i be the set of nodes of the translation of the rule Ri by Algorithm 1. Define σ̄∗ : V̄i → V̄, so that σ̄∗(vC) = [σ∗(v)]C. Then, it trivially holds that δi(Ḡ, σ̄∗) = Ḡ′, and Ḡ′ = τ(G′). Moreover,

w(G, G′) = ∑Ri ∑σ∈Emb(Gi,G) ci 1{δi(G, σ) = G′}, by the definition of rule application,

= ∑Ri (mG(Gi) ci) 1{δi(G, σ) = G′}

= ∑Ri (mτ(G)(τ(Gi)) c̄i) 1{δi(Ḡ, σ̄) = Ḡ′}, by Lemma 5.12,

= ∑Ri ∑σ̄∈Emb(τ(Gi),Ḡ) c̄i 1{δi(Ḡ, σ̄) = Ḡ′}, by the definition of rule application,

= w̄(τ(G), τ(G′)).


Figure 5.3: Example 2.1: testing Theorem 5.11, Theorem 5.10 and Theorem 4.28 (points (ii) and (iii)) for the set of fragments shown in Figure 3.2, for the initial state x = {10SA, 20SB, 10SC}, and rate values c1 = 0.001, c1− = 2, c2 = 0.002, c2− = 3, and c3 = c4 = 0 (solid lines), c3 = 0.05, c4 = 0.1 (dotted lines). The states x1 = {10SB, 10SABC} and x2 = {10SAB, 10SBC} are such that ϕ(x1) = ϕ(x2) = y (a multiset {10FAB?, 10F?BC}). The transient distribution for the model with 506 states, and for the reduced model with 121 states, is obtained by integrating the CME. a) The plots for P(Yt = y) (green) and P(Xt ∈ [y]ϕ) (blue). For c3 = c4 = 0, the curves are identical; when the initial distribution was changed so that P(X0 = x1) = 1, the condition still holds (plot not shown). b) The plots for P(Xt = x1) and P(Yt = y)α(y, x1) = P(Yt = y) · 10!10!/20!. Again, for c3 = c4 = 0, the curves are identical. When the initial distribution was set to P(X0 = x1) = 1, the condition holds asymptotically (plot not shown).

Corollary 6. Let Y′t be the species-based semantics of (R̄, Ḡ0). Then, Y′t is equivalent to Yt, the fragment-based semantics of (R, G0).

5.3 Example

The results are interpreted over Example 2.1. For the set of rules {R1, R−1, R2, R−2}, the set of fragments F derived by Algorithm 2 is the one shown in Figure 3.2 (Chapter 3). Therefore, Theorem 5.10 holds and (Cond2) is satisfied for the process Xt (with respect to the partition ϕ), proving the F-reduction from Xt to Yt to be exact. By Corollary 6, the species-based semantics of the new rule set R̄ = {R̄1, R̄−1, R̄2, R̄−2} is exactly the process Yt. Therefore, the process Yt can be analyzed instead of Xt, with their mutual relation as outlined in Theorem 4.28. More precisely, points (i) and (ii) always hold. Since the set of rules is reversible, the CTMC Yt is irreducible, and Theorem 4.28(iii) holds as well. Moreover, point (ii) of Theorem 4.28 relates the transient semantics between


Yt and Xt. This is because there are no deletion events in the model, and by Theorem 5.9. We confirmed all of the above observations on the test case described in Figure 5.3.

Notice that, when the rules R3 and R4 are added, Algorithm 2 outputs the set of species. Indeed, the simulation shows that, after adding the rules R3 and R4, F no longer provides an exact reduction.


Chapter 6

Approximate aggregation of Markov chains

Throughout this section, we consider a Markov graph (S,w, p0) with the process Xt (either discrete- or continuous-time), together with a partitioning of the countable set S induced by a surjective function g ∶ S → S̄, where S̄ = {A1, . . . ,AM} and M < ∣S∣. Recall from Chapter 4.1 that using a g-aggregation (S̄, w̄, p̄0) instead of the original Markov graph (S,w, p0) is justified in the case of exact aggregation, that is, when the projected process is equivalent to the process assigned to the aggregated Markov graph. If we do not have guarantees that the aggregation is exact, it is useful to quantify the error of using the Markov graph (S̄, w̄, p̄0) as an approximation of the projected process.

Problem 3. Quantify the error induced by the g-aggregation (S̄, w̄, p̄0), until time T.

Let Yt be the g-projection of Xt and let Y′t be the CTMC assigned to (S̄, w̄, p̄0). Both processes Yt and Y′t are defined over the same state space S̄.

The projected process Yt is not necessarily a Markov chain. We need a distance

measure between distributions of two multi-dimensional random variables. In case

of discrete time, we deal with a multi-dimensional discrete random variable, and

in the case of continuous time, we deal with a multi-dimensional mixed (discrete

and continuous) random variable.

We decide on the information-theoretic measure of divergence known as relative entropy or Kullback-Leibler (KL) divergence. The main reason why we use the KL divergence is that it is convenient when applied to the probability space of traces generated by Markov sources: it can be computed efficiently, as a function of only


Figure 6.1: Processes Xt and X′t operate on the state space S, while processes Yt and Y′t operate on the reduced state space S̄; projection relates Xt to Yt, and lifting and aggregation are operations over the Markov graph. The projected process Yt is not necessarily a Markov chain.

the corresponding generator matrices and the transient distribution of the original

process.

6.1 KL divergence

Let P and M be two probability measures on a common measurable space (Ω,F).

Definition 6.1. The divergence of measure P with respect to measure M (discrete or continuous) is the supremum of the relative entropy over all possible discrete measurements f:

D(P∣∣M) = sup_f H(Pf ∣∣Mf).

Various other terms are used for the quantity D(P∣∣M) throughout the literature: discrimination, Kullback-Leibler number, directed divergence, cross entropy. By the previous discussion on relative entropy, the KL divergence is nonnegative, it equals zero if the distributions match, and it can be equal to infinity (whenever P is not dominated by M). The KL divergence is not a metric, since it is non-symmetric and it does not satisfy the triangle inequality. A common technical interpretation is that the KL divergence is the coding penalty associated with selecting the candidate M to approximate the correct distribution P.

The following Theorem provides the computation of the KL divergence for both discrete and continuous probability spaces. It will be important in the analysis of approximate aggregations, since we will need an error measure between trace distributions generated by continuous-time Markov chains.

Page 118: Rights / License: Research Collection In Copyright - …7584/eth... · agr egation de chaine de Markov exacte et approch ee, qui constitue une grande partie de la th ese. La th eorie

Chapter 6. Approximate aggregation of Markov chains 101

Theorem 6.2. [Theorem 5.2.3 in [46]] The Kullback-Leibler divergence of P with respect to M is given by

D(P∣∣M) = ∫ ln f(ω) dP(ω) = ∫ f(ω) ln f(ω) dM(ω)   if P ≪ M,

and D(P∣∣M) = ∞ otherwise. The term f = dP/dM is the Radon-Nikodym derivative of P with respect to M. The quantity ln f, if it exists, is called the entropy density or relative entropy density of P with respect to M.

In particular, it is worth instantiating the computation of the divergence on a probability space depending on whether P and M are discrete or continuous measures:

(i) If P and M are discrete, then the Radon-Nikodym derivative dP/dM(ω) is equal to p(ω)/m(ω), so we have

D(P∣∣M) = ∑_{ω∈Ω} p(ω) ln (p(ω)/m(ω)).

Indeed, this follows immediately from Lemma 1.20: for a discrete sample space, the supremum is achieved for the identity measurement.

(ii) When P and M are measures on the Euclidean space Rn, and if they are both absolutely continuous with respect to the Lebesgue measure (dominated by it), then there exist pdf's f and g for the measures P and M respectively, such that

D(P∣∣M) = ∫_{Rn} f(x) ln (f(x)/g(x)) dx.
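The discrete case (i) is easy to compute directly; the following is a minimal sketch in Python (the function name and the dictionary representation of distributions are illustrative, not from the thesis):

```python
import math

def kl_divergence(p, m):
    """D(P||M) = sum_w p(w) * ln(p(w)/m(w)); infinite when P is not
    dominated by M (some p(w) > 0 while m(w) == 0)."""
    d = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue  # 0 * ln 0 = 0 by convention
        mw = m.get(w, 0.0)
        if mw == 0.0:
            return math.inf  # P not absolutely continuous w.r.t. M
        d += pw * math.log(pw / mw)
    return d

p = {"a": 0.5, "b": 0.5}
m = {"a": 0.9, "b": 0.1}
print(kl_divergence(p, m), kl_divergence(m, p))  # nonnegative, asymmetric
```

Running it also illustrates the properties listed above: the divergence is zero exactly when the two distributions match, and it is not symmetric in its arguments.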

Another important result, which will provide a crucial argument in the framework of approximate aggregations, states that applying a measurement to a finite-alphabet random variable can only lower the relative entropy.

Theorem 6.3. [Thm. 5.2.2 in [46]] If P and M are two probability measures and f a measurement on the common measurable space (Ω,F), then

D(Pf ∣∣Mf) ≤ D(P∣∣M).

In the case of discrete probability spaces, the result of the Theorem is immediate from the definition of divergence. The result for an arbitrary measurable function on continuous measure spaces can be proven by combining Lemma 1.20 and an approximation technique. A detailed proof and further discussion can be found in [46, Thm. 5.2.2].
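The data-processing property of Theorem 6.3 can be checked numerically in the discrete case; a small sketch, with illustrative distributions and an illustrative two-letter measurement f:

```python
import math

def kl(p, m):
    # discrete KL divergence; assumes m(w) > 0 wherever p(w) > 0
    return sum(pw * math.log(pw / m[w]) for w, pw in p.items() if pw > 0)

def push_forward(dist, f):
    # distribution of the measurement f under dist (P_f in the notation above)
    out = {}
    for w, pw in dist.items():
        out[f(w)] = out.get(f(w), 0.0) + pw
    return out

p = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}
m = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
f = lambda w: w % 2                     # a measurement onto {0, 1}
print(kl(push_forward(p, f), push_forward(m, f)), kl(p, m))
```

The first printed value never exceeds the second, as Theorem 6.3 guarantees.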

6.2 Error measure: Discrete time

Our goal is to estimate the aggregation error for any given number of steps, as a function of only the descriptions of the Markov graphs (S,w, p0) and (S̄, w̄, p̄0) and the transient distribution of the process Xt of (S,w, p0). We start by defining the aggregation error.

Definition 6.4. Let Xn be a DTMC assigned to M ≡ (S,w, p0) and Y′n a process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Then, the aggregation error until time n = 0,1, . . . is defined by

∆M,M̄(n) ∶= D(PY0∶n ∣∣ P′Y0∶n),

where PY0∶n and P′Y0∶n denote the prefix trace semantics of Yn and Y′n, respectively (prefix trace semantics is defined in Section 4.4.1).

A useful notion of divergence between processes is the KL divergence rate.

Definition 6.5. Let P and P′ be two different probability measures on the measurable space of a directly given random model of a discrete-time process Xn. Then, the KL distance rate is defined by

D(PX ∣∣P′X) = lim_{n→∞} (1/n) D(PX0∶n ∣∣ P′X0∶n).

We start with a Theorem on how to compute the KL distance between the prefix trace distributions of two Markov sources on the same state space. Then, since Yn is not necessarily a Markov chain, instead of computing the error between Yn and Y′n directly, we will provide an upper bound, by lifting the aggregation (S̄, w̄, p̄0) back to the state space S. We will show that the KL distance between the original process, Xn, and the lifted process, X′n, is indeed an upper bound on the aggregation error.


Theorem 6.6. Let Xn and X′n be DTMCs assigned to the Markov graphs (S,w, p0) and (S,w′, p′0), and let P, P′ denote the probability measures of their directly given models. Moreover, let π(k) ∶ S → [0,1] denote the transient distribution of Xn at time k. Then,

D(PX0∶n ∣∣ P′X0∶n) = H_{P∣∣P′}(X0∶n) = ∑_{s∈S} p0(s) ln (p0(s)/p′0(s)) + ∑_{k=0}^{n−1} ∑_{s∈S} π(k)(s)σ(s)
= D(p0∣∣p′0) + ∑_{k=0}^{n−1} Eπ(k)[σ],

where σ ∶ S → R≥0 is defined by

σ(s) = ∑_{s′∈S} w(s, s′) ln (w(s, s′)/w′(s, s′)).

In particular, if Xn has a unique stationary distribution µ, then

D(P∣∣P′) = Eµ[σ] = ∑_{s∈S} µ(s)σ(s).

Recall the definition of prefix trace semantics in discrete time and notice that the prefix trace s0 → ⋯ → sk is assigned the probability

PX0∶k(s0 → ⋯ → sk) = PSN(Prefix(s0, . . . , sk)) = p0(s0) ∏_{i=0}^{k−1} w(si, si+1).

Proof. We use induction on the length of the trace. The case n = 0 is trivial. Assume that the claim holds for all k < n. Then,

D(PX0∶n ∣∣ P′X0∶n)
= ∑_{s0,…,sn−1∈S} ∑_{s′∈S} P(X0∶n = (s0, . . . , sn−1, s′)) ln [P(X0∶n = (s0, . . . , sn−1, s′)) / P′(X′0∶n = (s0, . . . , sn−1, s′))]
= ∑_{s0,…,sn−1∈S} ∑_{s′∈S} P(X0∶(n−1) = (s0, . . . , sn−1)) w(sn−1, s′) ln [P(X0∶(n−1) = (s0, . . . , sn−1)) w(sn−1, s′) / (P′(X′0∶(n−1) = (s0, . . . , sn−1)) w′(sn−1, s′))]
= D(PX0∶(n−1) ∣∣ P′X0∶(n−1)) + ∑_{sn−1∈S} π(n−1)(sn−1)σ(sn−1),

where we used that the transient probability of being in state s at step (n − 1) is equal to the sum of the probabilities of all traces whose (n − 1)st step is in s.
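The recursion in the proof suggests a direct way to evaluate the formula of Theorem 6.6 numerically; the sketch below (hypothetical function names, row-stochastic matrices as nested lists) cross-checks it against brute-force enumeration of all prefix traces:

```python
import math
from itertools import product

def sigma(w, w2, s):
    # sigma(s) = sum_{s'} w(s,s') * ln(w(s,s') / w'(s,s'))
    return sum(ws * math.log(ws / w2[s][t])
               for t, ws in enumerate(w[s]) if ws > 0)

def trace_kl(w, w2, p0, p02, n):
    # D(P_{X0:n} || P'_{X0:n}) per Theorem 6.6
    d = sum(p * math.log(p / q) for p, q in zip(p0, p02) if p > 0)
    pi = list(p0)                      # transient distribution pi^(k)
    for _ in range(n):                 # k = 0 .. n-1
        d += sum(pi[s] * sigma(w, w2, s) for s in range(len(p0)))
        pi = [sum(pi[s] * w[s][t] for s in range(len(p0)))
              for t in range(len(p0))]
    return d

def brute_force(w, w2, p0, p02, n):
    # direct sum over all (n+1)-state prefix traces
    d = 0.0
    for tr in product(range(len(p0)), repeat=n + 1):
        pr, pr2 = p0[tr[0]], p02[tr[0]]
        for a, b in zip(tr, tr[1:]):
            pr, pr2 = pr * w[a][b], pr2 * w2[a][b]
        if pr > 0:
            d += pr * math.log(pr / pr2)
    return d

w   = [[0.9, 0.1], [0.2, 0.8]]        # original transition matrix (illustrative)
w2  = [[0.7, 0.3], [0.4, 0.6]]        # approximating (lifted) matrix
p0, p02 = [0.5, 0.5], [0.6, 0.4]
print(trace_kl(w, w2, p0, p02, 3), brute_force(w, w2, p0, p02, 3))
```

The two printed values agree, confirming the chain-rule decomposition for this example.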


Figure 6.2: An example of π-lifting: the states a1, a2 are lumped into a, and b1, b2 into b (g maps a1, a2 ↦ a and b1, b2 ↦ b), with π = (0.5, 0.5, 0.25, 0.75) over (a1, a2, b1, b2).

6.2.1 Lifting: Discrete case

Lifting is an operation on the Markov graph which is, in a sense, inverse to aggregation: given a Markov graph on the aggregated state space, lifting outputs a Markov graph on the original state space. The lifting needs to be done in such a way that the KL divergence between (S,w, p0) and the lifted graph (S,w′, p′0) provides an upper bound on the aggregation error. It will suffice to construct the lifting so that the aggregated process Y′n is a projection of the lifted process X′n.

Definition 6.7. Given a discrete-time Markov graph (S̄, w̄, p̄0) and a probability distribution π ∶ S → [0,1], let

α(A, s) = 1{g(s)=A} π(s) / ∑_{s′∈g−1(A)} π(s′). (6.1)

Then, the Markov graph (S,w′, p′0) defined by

(i) p′0(s) = α(A, s) p̄0(A), where A = g(s), and

(ii) w′(s, s′) = α(A′, s′) w̄(A,A′), where A = g(s) and A′ = g(s′),

is called a π-lifting of (S̄, w̄, p̄0).

Lemma 6.8. The π-lifting of a discrete-time Markov graph is a discrete-time

Markov graph.

Proof. Notice that for every s ∈ S, p′0(s) = ∑_{A∈S̄} α(A, s) p̄0(A) ≥ 0. Moreover,

∑_{s∈S} p′0(s) = ∑_{s∈S} ∑_{A∈S̄} p̄0(A)α(A, s) = ∑_{A∈S̄} p̄0(A) (∑_{s∈S} α(A, s)) = 1.


Observe that, by Definition 6.7, w′(s, s′) = ∑_{A∈S̄} ∑_{A′∈S̄} 1{g(s)=A} α(A′, s′) w̄(A,A′) ≥ 0. For a fixed s ∈ S,

∑_{s′∈S} w′(s, s′) = ∑_{s′∈S} (∑_{A∈S̄} ∑_{A′∈S̄} 1{g(s)=A} α(A′, s′) w̄(A,A′))
= ∑_{A∈S̄} 1{g(s)=A} (∑_{A′∈S̄} ∑_{s′∈S} α(A′, s′) w̄(A,A′))
= ∑_{A∈S̄} 1{g(s)=A} (∑_{A′∈S̄} w̄(A,A′)) = 1.

Lemma 6.9. If (S,w′, p′0) is a π-lifting of (S̄, w̄, p̄0), then (S̄, w̄, p̄0) is an exact aggregation of (S,w′, p′0).

Proof. By Theorem 4.27, it suffices to show that, for α defined as in (6.1), (Cond2) holds and p′0 respects α. The distribution p′0 respects α by construction. Let A,A′ ∈ S̄. Then, for every state s′ ∈ [A′], δ−(A, s′) equals w̄(A,A′):

δ−(A, s′) = α(A′, s′)⁻¹ ∑_{s∈[A]} α(A, s) w′(s, s′)
= α(A′, s′)⁻¹ ∑_{s∈[A]} α(A, s) [α(A′, s′) w̄(A,A′)]
= α(A′, s′)⁻¹ α(A′, s′) w̄(A,A′) ∑_{s∈[A]} α(A, s) = w̄(A,A′).
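Lemmas 6.8 and 6.9 can be illustrated on a small aggregated chain in the shape of Figure 6.2; the transition probabilities below are illustrative assumptions, not taken from the thesis:

```python
# Aggregated chain on {a, b}; original space S = {a1, a2, b1, b2},
# with g mapping a1, a2 -> a and b1, b2 -> b.
g     = {"a1": "a", "a2": "a", "b1": "b", "b2": "b"}
w_bar = {"a": {"a": 0.25, "b": 0.75}, "b": {"a": 0.5, "b": 0.5}}
pi    = {"a1": 0.5, "a2": 0.5, "b1": 0.25, "b2": 0.75}

def alpha(A, s):
    # alpha(A, s) = 1{g(s)=A} * pi(s) / sum_{s' in g^-1(A)} pi(s'), as in (6.1)
    if g[s] != A:
        return 0.0
    return pi[s] / sum(pi[t] for t in g if g[t] == A)

# pi-lifting: w'(s, s') = alpha(g(s'), s') * w_bar(g(s), g(s'))
w_lift = {s: {t: alpha(g[t], t) * w_bar[g[s]][g[t]] for t in g} for s in g}

# Lemma 6.8: the lifting is again a Markov graph (each row sums to 1)
for s in g:
    assert abs(sum(w_lift[s].values()) - 1.0) < 1e-12

# Lemma 6.9: aggregating the lifted graph recovers w_bar:
# delta^-(A, s') = alpha(A', s')^-1 * sum_{s in [A]} alpha(A, s) w'(s, s')
for A in ("a", "b"):
    for t in g:
        lumped = sum(alpha(A, s) * w_lift[s][t] for s in g) / alpha(g[t], t)
        assert abs(lumped - w_bar[A][g[t]]) < 1e-12
print("lifting is stochastic and aggregates back exactly")
```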

Theorem 6.10. Let π be some probability distribution on S, and let X′n be the process assigned to a π-lifting of (S̄, w̄, p̄0). Then, for all n ≥ 0,

∆M,M̄(n) ≤ D(PX0∶n ∣∣ P′X0∶n).

Proof. Since Yn is a g-projection of Xn, the measurement Y0∶n is a composition of the measurement g and X0∶n on a directly given model of the process Xn (concretely, Y0∶n = g ∘ X0∶n). By Lemma 6.9, since the aggregation from (S,w′, p′0) to (S̄, w̄, p̄0) is exact, Y′n is a g-projection of X′n, and thus Y′0∶n = g ∘ X′0∶n. The final claim follows from the fact that measurement reduces relative entropy (Theorem 6.3).


6.3 Error measure: Continuous time

Definition 6.11. Let Xt be a CTMC assigned to M ≡ (S,w, p0) and Y′t a process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Then, the aggregation error until time T ∈ [0,∞) is defined by

∆M,M̄(T ) ∶= D(PY[0,T) ∣∣ P′Y[0,T)),

where PY[0,T) and P′Y[0,T) denote the trace distributions of Yt and Y′t, respectively.

Theorem 6.12. Let Xt and X′t be assigned to the CTMCs (S,w, p0) and (S,w′, p′0) respectively, and let P, P′ denote the probability measures of the directly given models. Moreover, let π(t) denote the transient distribution of Xt at time t. Then,

D(PX[0,T) ∣∣ P′X[0,T)) = H_{P∣∣P′}(X[0,T)) = ∑_{s∈S} p0(s) ln (p0(s)/p′0(s)) + ∫_0^T ∑_{s∈S} π(t)(s)θ(s) dt
= D(p0∣∣p′0) + ∫_0^T Eπ(t)[θ] dt,

where θ ∶ S → R≥0 is defined by

θ(s) = ∑_{s′∈S∖{s}} w(s, s′) ln (w(s, s′)/w′(s, s′)) − (a(s) − a′(s)).

The proof of a more general statement (for Markov processes which are not necessarily time-homogeneous) can be found in [13]. We next outline the proof for CTMCs.

Recall that X[0,T) is characterized by a tuple lying in ∪_{k=0}^∞ (S × (R+ × S)^k), and that the events measurable by X[0,T) are of the form (k,D0, . . . ,Dk, I1, . . . , Ik), denoted by

X[0,T) ∈ D0 −I1→ ⋯ −Ik→ Dk,

where Di ∈ P(S) and Ii ∈ B, for i = 0, . . . , k (Section 4.5.2). For example, the event X[0,T) ∈ D0 −I1→ D1 −I2→ D2 contains all traces which start at s0 ∈ D0, make a first jump to a state s1 ∈ D1 at time t1 ∈ I1, move to a state s2 ∈ D2 at time t2 ∈ I2, and finally exit the state s2 after time T.


Lemma 6.13. Let Xt be a process assigned to a CTMC (S,w, p0) and X[0,T) its prefix trace semantics (Section 4.5.2). Then,

fX[0,T)(s0 −t1→ ⋯ −tk→ sk) = p0(s0) ∏_{i=0}^{k−1} [e^{−a(si)(ti+1−ti)} w(si, si+1)] e^{−a(sk)(T−tk)}. (6.2)

Proof. The measurable sets related to the random variable X[0,T) belong to the smallest σ-algebra generated by ∪_{k=0}^∞ (S × ({[0, r) ∣ r > 0} × S)^k). The cumulative distribution function of X[0,T) is therefore

FX[0,T)(s0 −t1→ ⋯ −tk→ sk) = PΓ(R+,S)({γ ∣ s^γ_1 = s1, . . . , s^γ_k = sk, t^γ_0 < t0, . . . , t^γ_k < tk − tk−1, t^γ_{k+1} > T − tk}),

where, as introduced in Section 4.5.2, s^γ_i denotes the i-th state visited by γ, and t^γ_i the time instance at which the i-th jump occurred. The cylinder set of traces Cylinder(s0, s1, . . . , sk, t0, t1 − t0, . . . , tk − tk−1) contains all of the above-described events, but it also contains the traces in which the (k + 1)st exit happens before time T. Hence, the probability of the disjoint union ∪_{s∈S} Cylinder(s0, . . . , sk, s, t0, . . . , tk − tk−1, T − tk) must be subtracted from the result. Intuitively, the proportion of the events which are over-counted should be P(ξk+1 > T − tk ∣ Zk = sk) = 1 − (1 − e^{−a(sk)(T−tk)}). Recall that (Section 1.5.5)

P(Cylinder(s0, . . . , sk, s, δ0, . . . , δk−1, δk)) = P(Cylinder(s0, . . . , sk, δ0, . . . , δk−1)) ∫_0^{δk} e^{−a(sk)t} w(sk, s) dt.

Finally, multiplying P(Cylinder(s0, . . . , sk, t0, . . . , tk − tk−1)) by the factor (1 − ∑_{s∈S∖{sk}} [(1 − e^{−a(sk)(T−tk)}) w(sk, s)/a(sk)]) = e^{−a(sk)(T−tk)} yields

FX[0,T)(s0 −t1→ ⋯ −tk→ sk) = p0(s0) ∏_{i=0}^{k−1} [(1 − e^{−a(si)(ti+1−ti)}) w(si, si+1)/a(si)] e^{−a(sk)(T−tk)}.

The corresponding pdf is obtained by differentiating over δ1, . . . , δk.

By the definition of the KL divergence,

D(PX[0,T) ∣∣ P′X[0,T)) = EX[0,T)[ln fX[0,T)] − EX[0,T)[ln fX′[0,T)]. (6.3)


For notational convenience, we further write E for EX[0,T), f for fX[0,T), and f′ for fX′[0,T). Finally, we will use Γ to denote the range of the variable X[0,T), γ for a tuple which characterizes a trace until time T, and γ(t) for the state visited by the trace γ at time t.

From (6.2), by setting tk+1 = T, it follows that

ln f(s0 −t1→ ⋯ −tk→ sk) = ln p0(s0) − ∑_{j=0}^{k} a(sj)(tj+1 − tj) + ∑_{j=0}^{k−1} ln w(sj, sj+1). (6.4)

The next two Lemmas express the expectation of the two summations with respect to the density f in the form of an integral of some function over the interval [0, T).

Lemma 6.14. Let a ∶ S → R≥0. Then,

Ef [∑_{j=0}^{kγ} a(s^γ_j)(t^γ_{j+1} − t^γ_j)] = ∫_0^T ∑_{s∈S} π(t)(s) a(s) dt.

Proof. Observe that

∫_0^T a(γ(t)) dt = ∫_0^{t1} a(s^γ_0) dt + ⋯ + ∫_{tk}^{T} a(s^γ_k) dt = ∑_{j=0}^{kγ} a(s^γ_j)(t^γ_{j+1} − t^γ_j).

Therefore,

Ef [∑_{j=0}^{kγ} a(s^γ_j)(t^γ_{j+1} − t^γ_j)] = Ef [∫_0^T a(γ(t)) dt] = ∫_{γ∈Γ} f(γ) [∫_0^T a(γ(t)) dt] dγ
= ∫_0^T ∑_{s∈S} (∫_{γ∈Γ} f(γ) 1{γ(t)=s} dγ) a(s) dt = ∫_0^T ∑_{s∈S} π(t)(s) a(s) dt.

Lemma 6.15. Let φ ∶ S × S → R≥0 be defined by φ(s, s′) = ln(w(s, s′)) 1{s≠s′}. Then,

E [∑_{k=0}^{kγ−1} φ(s^γ_k, s^γ_{k+1})] = ∫_0^T (∑_s ∑_{s′≠s} π(t)(s) w(s, s′) φ(s, s′)) dt.

Proof. Recall that, by the definition of the generator matrix of a time-homogeneous Markov chain, for all t ≥ 0, w(s, s′) = lim_{h→0} P(Xt+h = s′ ∣ Xt = s)/h. Due to right-continuity, we may choose h such that the possibility of two reactions occurring within the interval [t, t + h) is negligible. Then, for a fixed trace γ ∈ Γ, the function φ(γ(t), γ(t + h)) has non-zero value only in an interval of at most length h before each jump occurs. Therefore,

∫_0^{T−h} φ(γ(t), γ(t + h)) dt = ∫_{t^γ_1−h}^{t^γ_1} φ(s^γ_0, s^γ_1) dt + ⋯ + ∫_{t^γ_k−h}^{t^γ_k} φ(s^γ_{k−1}, s^γ_k) dt = h ∑_{k=0}^{kγ−1} φ(s^γ_k, s^γ_{k+1}).

Hence,

E [∑_{k=0}^{kγ−1} φ(s^γ_k, s^γ_{k+1})] = ∫_{γ∈Γ} f(γ) lim_{h→0} (1/h) (∫_0^{T−h} φ(γ(t), γ(t + h)) dt) dγ
= lim_{h→0} (1/h) (∫_{γ∈Γ} ∑_s ∑_{s′≠s} f(γ) ∫_0^{T−h} 1{γ(t)=s, γ(t+h)=s′} φ(s, s′) dt dγ)
= lim_{h→0} (1/h) ∫_0^{T−h} (∑_s ∑_{s′≠s} P(Xt = s, Xt+h = s′) φ(s, s′)) dt
= lim_{h→0} (1/h) ∫_0^{T−h} (∑_s ∑_{s′≠s} P(Xt = s) P(Xt+h = s′ ∣ Xt = s) φ(s, s′)) dt
= ∫_0^T (∑_s ∑_{s′≠s} π(t)(s) w(s, s′) φ(s, s′)) dt.

Proof. (Theorem 6.12) From (6.3) and (6.4), it follows that

D(PX[0,T) ∣∣ P′X[0,T)) = E[ln f] − E[ln f′] = E[ln p0(s^γ_0) − ln p′0(s^γ_0)]
− E [∑_{j=0}^{kγ} a(s^γ_j)(t^γ_{j+1} − t^γ_j) − ∑_{j=0}^{kγ} a′(s^γ_j)(t^γ_{j+1} − t^γ_j)]
+ E [∑_{j=0}^{kγ−1} ln w(s^γ_j, s^γ_{j+1}) − ∑_{j=0}^{kγ−1} ln w′(s^γ_j, s^γ_{j+1})]
= D(p0∣∣p′0) + ∫_0^T ∑_s π(t)(s) (∑_{s′≠s} w(s, s′) ln (w(s, s′)/w′(s, s′)) − (a(s) − a′(s))) dt.
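Theorem 6.12 reduces the continuous-time divergence to an integral of Eπ(t)[θ]; the sketch below evaluates it for two illustrative two-state generators (not taken from the thesis), propagating π(t) with explicit Euler steps of the master equation — a crude but sufficient scheme for this small example:

```python
import math

# Two CTMC generators on S = {0, 1}: off-diagonal entries are the jump
# rates w(s, s'); the diagonal entry is -a(s). Illustrative numbers only.
Q  = [[-1.0, 1.0], [2.0, -2.0]]      # original generator
Q2 = [[-1.5, 1.5], [1.0, -1.0]]      # approximating (lifted) generator
p0 = [1.0, 0.0]                      # common initial distribution

def theta(s):
    # theta(s) = sum_{s'!=s} w(s,s') ln(w(s,s')/w'(s,s')) - (a(s) - a'(s))
    a, a2 = -Q[s][s], -Q2[s][s]
    jump = sum(Q[s][u] * math.log(Q[s][u] / Q2[s][u])
               for u in range(len(Q)) if u != s and Q[s][u] > 0)
    return jump - (a - a2)

def divergence(T, dt=1e-4):
    # D(p0||p0') vanishes here (identical initial distributions); the rest
    # is the integral of E_{pi(t)}[theta], with pi(t) advanced by Euler
    # steps of d pi / dt = pi Q.
    pi, d, t = list(p0), 0.0, 0.0
    while t < T:
        d += dt * sum(pi[s] * theta(s) for s in range(len(pi)))
        pi = [pi[s] + dt * sum(pi[r] * Q[r][s] for r in range(len(pi)))
              for s in range(len(pi))]
        t += dt
    return d

print(divergence(1.0))   # positive, and growing with the horizon T
```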

6.3.1 Lifting: Continuous case

We construct the lifting analogously to the discrete-time case.

Definition 6.16. Given a continuous-time Markov graph (S̄, w̄, p̄0) and a probability distribution π ∶ S → [0,1], let α(A, s) be defined as in (6.1). Then, the Markov graph (S,w′, p′0) defined by

(i) p′0(s) = α(A, s) p̄0(A) for A = g(s), and

(ii) w′(s, s′) = α(A′, s′) w̄(A,A′) if g(s) ≠ g(s′), where A = g(s) and A′ = g(s′); w′(s, s′) = 0 if g(s) = g(s′) and s ≠ s′; and w′(s, s) = w̄(A,A) for A = g(s),

is called a π-lifting of (S̄, w̄, p̄0).

Lemma 6.17. The π-lifting of a continuous-time Markov graph is a continuous-time Markov graph.

Notice that transitions between two different states inside the same lumped state are always assigned rate zero. To show that the lifting is a well-defined continuous-time Markov graph, notice first that for s ≠ s′, it holds that w′(s, s′) ≥ 0. Moreover, for a fixed s and A = g(s),

∑_{s′∈S} w′(s, s′) = ∑_{s′∉[A]} w′(s, s′) + ∑_{s′∈[A]∖{s}} w′(s, s′) + w′(s, s)
= ∑_{A′∈S̄∖{A}} ∑_{s′∈[A′]} α(A′, s′) w̄(A,A′) + w̄(A,A)
= ∑_{A′∈S̄∖{A}} w̄(A,A′) (∑_{s′∈[A′]} α(A′, s′)) + w̄(A,A) = 0.

Lemma 6.18. If (S,w′, p′0) is a π-lifting of (S̄, w̄, p̄0), then (S̄, w̄, p̄0) is an exact aggregation of (S,w′, p′0).

Proof. Let A,A′ ∈ S̄ and s′ ∈ [A′]. For A ≠ A′, the proof that δ−(A, s′) = w̄(A,A′) is analogous to the discrete-time case. For A = A′, by the definition of the lifting, the only non-zero rate towards the state s′ inside the cluster A is the self-loop. Therefore,

δ−(A, s′) = ∑_{s∈[A]} α(A, s) w′(s, s′) / α(A, s′) = w′(s′, s′) = w̄(A,A).

Theorem 6.19. Let π be some probability distribution on S, and let X′t be the process assigned to a π-lifting of (S̄, w̄, p̄0). Then, for all T ≥ 0,

∆M,M̄(T ) ≤ D(PX[0,T) ∣∣ P′X[0,T)).


Proof. Since Yt is a g-projection of Xt, the measurement Y[0,T) is a composition of the measurement g and X[0,T) on a directly given model of the process Xt. By Lemma 6.18, since the aggregation from (S,w′, p′0) to (S̄, w̄, p̄0) is exact, Y′t is a g-projection of X′t, and thus Y′[0,T) = g ∘ X′[0,T). The claim follows from the fact that measurement reduces relative entropy (Theorem 6.3).

6.4 Trace semantics interpretation of approximate aggregations

We can now summarize the results of this chapter by interpreting the error measure

framework on the trace semantics.

Theorem 6.20. Let Xn be the DTMC of M = (S,w, p0) and Y′n a process of some g-aggregation M̄ = (S̄, w̄, p̄0). Moreover, denote by Xπn the process assigned to some π-lifting (S,w′, p′0). Then, for all n ≥ 0 and all probability distributions π on S,

∆M,M̄(n) ≤ D(PX0∶n ∣∣ PπX0∶n).

The Theorem states that any lifting provides an upper bound on the aggregation error. If Xn has a unique stationary distribution µ, the authors of [27] show that arg min_π D(P∣∣Pπ) = µ, that is, the best bound with respect to the long-term behavior is achieved by lifting with the stationary distribution. It is intuitively clear that, for long-term behavior, the best g-aggregation is obtained for π = µ, that is, when the stationary distribution is taken as the reference for the conditional distributions α.

Theorem 6.21. Let Xt be a CTMC assigned to M ≡ (S,w, p0) and Y′t a process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Moreover, denote by Xπt the process assigned to some π-lifting (S,w′, p′0). Then, for all T ≥ 0 and all probability distributions π on S,

∆M,M̄(T ) ≤ D(PX[0,T) ∣∣ PπX[0,T)).


6.5 Matrix representation

As in the case of exact aggregations, we provide the specification of the lifting in matrix representation.

Corollary 7. For a given π, let α be defined as in (6.1), and let V and Uα be defined as in Corollary 5. Let P be the transition matrix of a Markov chain assigned to (S,w, p0) (either discrete- or continuous-time). Moreover, let P̄ be the transition matrix of the Markov chain assigned to (S̄, w̄, p̄0), and Pπ the transition matrix of the Markov chain of the corresponding π-lifting. Then,

(i) Pπ = VP̄Uα and P̄ = UαPπV.
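The matrix form of the π-lifting can be sketched in pure Python on the small example of Figure 6.2 (the numbers are illustrative assumptions; the matrix names follow the corollary):

```python
# S = {a1, a2, b1, b2} is lumped into {a, b}.
S, Sbar = ["a1", "a2", "b1", "b2"], ["a", "b"]
g   = {"a1": "a", "a2": "a", "b1": "b", "b2": "b"}
pi  = {"a1": 0.5, "a2": 0.5, "b1": 0.25, "b2": 0.75}
Pbar = [[0.25, 0.75], [0.5, 0.5]]    # aggregated transition matrix

# V[s][A] = 1{g(s)=A};  Ua[A][s] = alpha(A, s) as in (6.1)
V  = [[1.0 if g[s] == A else 0.0 for A in Sbar] for s in S]
Ua = [[pi[s] / sum(pi[t] for t in S if g[t] == A) if g[s] == A else 0.0
       for s in S] for A in Sbar]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

Ppi  = matmul(matmul(V, Pbar), Ua)   # lifting:     P^pi = V Pbar Ua
back = matmul(matmul(Ua, Ppi), V)    # aggregation: Pbar = Ua P^pi V
print(back)
```

The round trip works because Ua V is the identity on the aggregated space: each row of Ua sums the conditional distribution α(A, ⋅) over exactly one block of V.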


Chapter 7

Approximate automatic reductions of stochastic rule-based models

Exact reductions deal with finding those fragments which ensure that the aggregation from the species-based Markov graph to the fragment-based Markov graph is exact. We have shown that, if every two sites which are directly or indirectly (by transitivity) tested or modified within the rule set are correlated in the contact map annotation, the corresponding fragment-based reduction is guaranteed to be exact. In such a framework, it may happen that the fragment set output by the algorithm leaves the system at a prohibitive size, or that it even coincides with the species-based description (in which case the system remains at its original size). In this Chapter, we discuss how to perform an approximate F-reduction, and we describe an application scenario of using the error bound framework described in Chapter 6.

Figure 7.1: Approximate reductions framework: the rule-based programs (R,G0) and (R̄, Ḡ0), their Markov graphs with generators Q and Q̄, the processes Xt, Yt, X′t, Y′t, and the operations of reduction, projection, aggregation and lifting. (Arrows have no formal meaning and serve for illustration purposes: double arrows indicate assigning a Markov graph to a rule-based program, the single full arrow stands for reduction, single arrows indicate operations over a Markov graph, and the dotted arrow is never explicitly performed.)


Figure 7.2: Approximate reduction involves assuming a distribution over the lumped states: a part of the Markov graph from Example 2.1, as presented in the motivating example, Figure 3.1 (the states x1, . . . ,x4 are lumped into y1, y2, y34, and exact rates such as c−2 and c−2 + c4 are replaced by approximate rates in the intervals [c−2, c−2 + c4] and [c−1, c−1 + c3]). Since the conditions (Cond1) and (Cond2) are violated, approximate rates are derived.

The general concept of approximate reductions is illustrated in Figure 7.1. The rule set R is translated to a rule set R̄ by Definition 3.10. The reduction error is the aggregation error between the species-based Markov graph assigned to the rule set R and the species-based Markov graph assigned to the rule set R̄. The upper bound on the aggregation error is computed by lifting the process Y′t to X′t. The computation necessitates the availability of the generator matrix of the original system and the transient distribution of the process Xt. If the user is interested only in the stationary behavior, the stationary distribution µ suffices.

In Figure 7.2, we show a part of the Markov graph from Example 2.1, as presented in the motivating example, Figure 3.1. By adding the rules R3, R−3 and R4, R−4 to the model, the conditions (Cond1) and (Cond2) are violated. For example, δ+(x3, [y1]ϕ) = c−2 ≠ c−2 + c4 = δ+(x4, [y1]ϕ). The weights w̄(y34,y1) and w̄(y34,y2) are approximated under a certain distribution over the lumped states, denoted by α. Then, the translated rule set R̄ is such that w̄(y34,y1) = α(y34,x3)c−2 + α(y34,x4)(c−2 + c4).

In a local reduction, we take α(y34,x3) = 2/3 and α(y34,x4) = 1/3, where α is chosen according to the formula from Theorem 5.10. The reduction is called local because the conditional distribution can be determined without looking at the global dynamics of the rule-based model. As mentioned before, the global reduction is the one where we know the stationary distribution µ of the original Markov graph, in which case the approximated rate is set with α(y34,x3) = µ(x3)/(µ(x3) + µ(x4)) and α(y34,x4) = µ(x4)/(µ(x3) + µ(x4)).
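The local and global choices of α yield the following approximated rates; the rate values c−2 and c4 are those of the example, while the stationary probabilities µ(x3) and µ(x4) below are assumed values for illustration:

```python
# Approximated lumped rate w-bar(y34, y1) = alpha(y34, x3) * c2m
#                                         + alpha(y34, x4) * (c2m + c4)
c2m, c4 = 3.0, 0.2          # c2- and c4 from the example

# local reduction: combinatorial alpha = (2/3, 1/3), per Theorem 5.10
w_local = (2 / 3) * c2m + (1 / 3) * (c2m + c4)

# global reduction: alpha taken from an assumed stationary distribution mu
mu_x3, mu_x4 = 0.6, 0.4
w_global = (mu_x3 * c2m + mu_x4 * (c2m + c4)) / (mu_x3 + mu_x4)

print(w_local, w_global)    # both lie between c2m and c2m + c4
```

Either way, the approximate rate is a convex combination of the exact outgoing rates of the lumped states, so it always lies in the interval [c−2, c−2 + c4] shown in Figure 7.2.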


7.1 Approximate reductions and error bound

Let M be the species-based Markov graph of the rule-based program (R,G0), and M̄ the species-based Markov graph of its translation (R̄, Ḡ0). Let Q be the generator of M, µ its stationary distribution and p(t) its transient distribution. We denote by Q̄ the generator of M̄ (Figure 7.1).

Definition 7.1. The reduction error between (R,G0) and (R̄, Ḡ0) until time T is the aggregation error between M and M̄ until time T (defined in Definition 6.11).

Theorem 7.2. For a given fragment set F, the aggregation M̄ is such that Q̄α = UαQV, where Uα is defined as in Theorem 5.10, and α is defined as in Corollary 5.

Proof. Suppose that in M, the rule Ri is applicable to the fraction of states [x]ϕS ⊆ [y]ϕF. Let G ∈ [x]ϕS. There are in total ∣Aut(x)∣∣Emb(Gi,G)∣ embeddings in the set ∪{Emb(Gi,G′) ∣ G′ ∈ [y]ϕF}, thus the total rate after applying rule Ri in M is equal to ci∣Aut(x)∣∣Emb(Gi,G)∣. In M̄, the translated rule R̄i can be applied to each Ḡ ∈ [y]ϕS with the rate c̄i∣Emb(Ḡi, Ḡ)∣, which is equal for all Ḡ ∈ [y]ϕS. It suffices to show that this rate equals

ci (∣Aut(x)∣/∣Aut(y)∣) ∣Emb(Gi,G)∣,

which can be checked similarly as in Lemma 5.12.

The lifted chain is accordingly defined by Q′ = VQ̄Uα. The computation necessitates the availability of the generator matrix of the original system, and the transient distribution of the process Xt. If the user is interested only in the stationary behavior, the stationary distribution µ suffices: the aggregated chain can be defined directly by setting Q̄ = UµQV, and the lifting by Q′ = VQ̄Uµ. To this end, we call a reduction local if, in order to define the reduction, only the local, fragment-based description of the current state needs to be known. A reduction is called global if it is performed with respect to the stationary distribution.

7.2 Tests

We illustrate the framework on three case studies. For each case study, we compare the error bounds of the local and the global reduction. In Figure 7.3 (Example 2.1), the local


[Figure 7.3 plots, for Example 1 (simple scaffold), the upper bound on d/dt ∆(M,M̄)(t) against time, for the stationary (global reduction) and uniform (local reduction) weightings, at two system sizes: 24 nodes, 81 states (165 before reduction) and 15 nodes, 36 states (56 before).]

Figure 7.3: Testing the approximate reductions framework. a) Example 2.1: the transient semantics of M and the stationary distribution are obtained by integrating the CME. The reduction is obtained for the set of fragments of Figure 3.2, and the upper bound on ∆(M,M̄) is computed for both the local and the global reduction. As expected, the global reduction gives a better error bound in the stationary regime. The initial state is set to x = {8SA, 8SB, 8SC} or x = {5SA, 5SB, 5SC}, and rate values are set to c1 = 0.001, c−1 = 2, c2 = 0.002, c−2 = 3, c3 = 0.05, c4 = 0.2. The dotted lines are the error bound in case c3 = c4 = 0, when the reduction is exact. Notice that the error is not equal to zero even in the case of exact lumping: this is because it represents an upper bound on the actual error between the lumped processes (the error upper bound would be equal to zero only in the very specific situation that the lifted process coincides with the original process).

reduction provides a better error bound in the transient regime, while the global reduction, as expected, gives a better error bound in the stationary regime. We run the system at two different initial copy numbers, so as to observe that the error grows with the system size. In Figure 7.4a, Example 2.2, we illustrate the error bound for the three possible fragment sets, with annotating maps shown in Figure 3.3. Similarly, in Figure 7.4b, for Example 2.3, the framework is used to show which of the three fragmentations with the same dimension provides the smallest error. The plots show that, in the stationary regime, both F2 and F3 are better than F4, which can be explained by the fact that the correlation between sites c and x, and between c and y (preserved with F2 and F3, respectively) is tested by the rules R∗2 and R∗3. It would be difficult to determine whether F2 or F3 is better by only looking at the information flow in the rule-set; the results show that the reduction with F3 gives the smallest error. Since the error ∆(M,M̄)(T) is the integral of the plotted functions until time T, it is notable that the discussed ordering of fragment sets does not hold for small values of T (for the considered initial state).
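This last remark can be reproduced numerically: since the error until time T is the integral of the plotted rate, two rate curves that cross yield cumulative bounds whose ordering flips as T grows. A toy sketch with two hypothetical rate curves (not the actual bounds of Figure 7.4):

```python
import numpy as np

t = np.linspace(0.0, 5.0, 501)
# Hypothetical instantaneous error-bound rates d/dt Delta for two fragment sets:
# rate_a starts lower but levels off higher; rate_b starts higher but decays.
rate_a = 0.05 + 0.10 * (1.0 - np.exp(-t))
rate_b = 0.25 * np.exp(-t) + 0.02

def cumulative_error(rate, t):
    """Delta(T) = integral of (d/dt Delta)(s) over [0, T], trapezoidal rule."""
    return np.concatenate(([0.0],
                           np.cumsum(np.diff(t) * (rate[1:] + rate[:-1]) / 2)))

delta_a = cumulative_error(rate_a, t)
delta_b = cumulative_error(rate_b, t)
# Early on delta_a is the smaller cumulative bound; for large T the ordering
# between the two bounds is reversed.
print(delta_a[-1] > delta_b[-1], delta_a[10] < delta_b[10])
```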


[Figure 7.4 plots the upper bound on d/dt ∆(M,M̄)(t) against time for the stationary (global reduction) and uniform (local reduction) weightings: panel a) Example 2: Polymerization, fragment sets F2, F3 (80 states each) and F4 (64 states), reduced from 112 states; panel b) Example 3: Conditional independence, fragment sets F2, F3 and F4.]

Figure 7.4: Comparing two fragment-based reductions with the same dimension. a) Example 2.2. With initially 3 nodes of type A and 3 nodes of type B, the Markov graph counts 112 states. The rates are set to c1 = 3, c−1 = 2.4, c2 = 3.5, c−2 = 2.1, c3 = 0.1, c−3 = 3.6, c∗3 = 3.3. We compared the error bound for the reductions for fragment sets F2, F3 and F4. With F4, the reduction is from 112 to 64 states; with F2 and F3, it is from 112 to 80 states. Preserving the correlation between the sites of nodes A (F2) shows a smaller error bound than preserving the correlation between the sites of nodes B (F3). b) Example 2.3. The system initialized at 3 nodes, inactivated at all three sites, has a Markov graph with 120 states. The three fragment sets F2, F3 and F4 reduce the state space to 80 states. The rates are set to c1 = 1, c2 = 0.6, c3 = 0.5, c−1 = 1.2, c−2 = 0.3, c−3 = 0.5, c∗2 = 2, c∗y = 2.

The proposed ‘global’ reduction technique is not limited to rule-based models, and can be applied to any continuous-time Markov chain model. ‘Local’ reductions are specific to site-graph-rewrite grammars. The implementation of the approximate reductions framework within the Kappa modeling framework is work in progress. For that reason, we leave the analysis of the approximate framework on large-scale case studies to future work.

To conclude this chapter, we remark that, as biochemical models are already an approximation of reality, models obtained by approximate reductions can also be seen as alternative model candidates which operate on a context that is more ‘local’ than that of the given, reference model. Discriminating between such model candidates, which differ only in the radius of context tested by a rule, opens numerous challenges related to model validation, which are beyond the scope of this thesis.


Chapter 8

Case studies

Two large-scale case studies of signaling pathways were chosen to analyze the performance of the fragment-based reduction: a crosstalk between the epidermal growth-factor receptor and the insulin receptor pathway, and the high-osmolarity glycerol pathway in yeast. Each model is captured by a set of site-graph-rewrite rules, which are then input to the rule-based programming environment Kappa [34]. For the definition of the Kappa syntax and its operational semantics, we refer to [32]. In Figure 8.1d, we illustrate how one site-graph rewrite rule is written in Kappa. We will use several features of Kappa based on static analysis of a rule-set: (1) the automated generation of the contact map of a model, (2) the over-approximation of the set of reachable species [22], and (3) the contact map annotation according to Algorithm 2, together with the generation of a new rule-set, as defined in Definition 3.10. Rule rates will not be specified, since they do not affect the outcome of Algorithm 2.

8.1 EGF/insulin receptor pathway

We study a model of a crosstalk between the epidermal growth-factor receptor

(EGFR) and the insulin receptor pathway. In general, EGFR exists on the cell

surface and is activated by binding of its specific ligands, EGF in this case. In

turn, it initiates a signaling cascade related to a variety of important biochemical

changes, such as cell growth, proliferation, and differentiation. A huge number of

feasible multi-protein species can be formed in a detailed model of this signaling

pathway [7]. For example, in the complete model described in [20], the number of


Chapter 8. Case studies 119

Figure 8.1: The set of rules for the early EGF/insulin crosstalk model. The underlying mechanistic model is taken from [14]. The original model contains 42956 reactions and 2768 species. Kappa syntax supports two types of shorthand notation: a site which simultaneously bears an internal state and serves as a binding site (for example, site b of node type EGFR), and the dash symbol which denotes that the site is bound; for example, in rule r10, EGFR(bu, d−) denotes that site d is bound.

reachable complexes is estimated at ≈ 10^20. We focused on a model of the early

signaling cascade of events, described in [14]. More precisely, the model focuses

on the signaling from the initial receptor binding (either of EGF or Ins), until the

recruitment of the transport molecule Grb by binding to Sos. Grb, the growth

factor receptor-bound protein, is also known as the transport molecule, because of

its ability to link the EGFR to the activation of Ras and the further downstream

components. The model involves only eight proteins, combined into 2768 different

molecular species. The interactions are captured by a set of 42956 reactions.

8.1.1 Model description

The reactions were translated into a rule-based model with 38 reversible rules,

shown in Figure 8.1. Eight node types arise: A = {EGF, EGFR, IR, Sos, Grb, IRS, Ins, Shc}.

The contact map of the model is given in Figure 8.2a. Each of the eight proteins

is assigned a set of sites; for example, Σ(EGFR) = {a, b, c, d}. The shaded sites in the figure bear an internal site value. For example, the site b of protein EGFR can bear two internal values, I(b) = {u, p}, where bp denotes that the

site is phosphorylated. It is worth noticing that some sites have multiple binding

partners, which denotes a competition (concurrency) for binding, because only one


[Figure 8.2 (panels a–d) shows the contact map of the EGF/insulin crosstalk model over the node types EGF, EGFR, Shc, IR, IRS, Grb, Sos and Ins, its annotation, two equivalent reaction mixtures, and an example rule EGFR(bu, d) → EGFR(bp, d).]

Figure 8.2: EGF/insulin crosstalk model. a) Contact map. The gray-shaded sites bear an internal value. b) Contact map annotation. c) Two reaction mixtures which are equivalent with respect to the annotation. The green color denotes the phosphorylated state. d) An example of a Kappa rule and the site-graph rewrite rule: EGFR(bu, d) denotes a site-graph G = (V, Type, I, E, ψ) with one node V = {v}, such that Type(v) = EGFR, interface function I(v) = {b, d} and evaluation function ψ(v, b) = u.

bond can be established at a time. For example, the site a of protein Grb has three

possible binding partners. Moreover, a self-loop at the site d of a receptor EGFR

means that it can be bound to the site d of another EGFR. Therefore, one or two

nodes of type EGFR can be found in a single species.

Two major pathways are involved: one starting with the receptor EGFR, and

another, starting at the receptor IR. The two pathways share proteins. We explain

how each pathway works, by focusing on the forward direction of rules.

In the first branch, EGFR recruits a transport molecule Grb. Three rules model the self-dimerization of EGFRs (r03, r04, r05); the rate depends on whether the ligand EGF is already recruited or not. When EGFR is in a dimer with another EGFR, it is considered to be in its active form. Therefore, depending on whether site d is bound or free, two rules model EGFR recruiting a ligand (EGF) on site a (r01, r02), and two rules model the phosphorylation of site b of EGFR (r06, r07). The phosphorylation signal can be passed from EGFR to the adapter molecule Shc (r09, r10) if EGFR is previously bound to it (r08). Finally, Shc recruits a


transport molecule Grb (r11). Yet, each receptor has a shorter way to recruit a

transport molecule. The site c of EGFR can be phosphorylated (r12,r13), and then

bind to Grb directly (r14,r15).

In the second branch, an insulin receptor (IR) recruits the transport molecule Grb. Receptor IR can recruit insulin molecules (Ins) on two sites, a (r16, r17) and b (r18, r19) (the rate may depend on whether an insulin molecule is already bound). Similarly, the site c of IR can be phosphorylated (r20, r21, r22, r23). Then, IR can recruit an adapter Shc (r24). Whenever IR is also bound to two insulin molecules, Shc can be phosphorylated (r25). Adapter Shc can then recruit Grb (r11). Yet, IR also has a direct way of recruiting Grb: the site d of IR can be phosphorylated (r26, r27, r28, r29), and then recruit another adapter, IRS (r30), which can be activated when the insulin receptor is bound to two insulin molecules (r31). Then, IRS can recruit Grb (r32).

Finally, independently, Grb can bind to a protein Sos (r33,r34), and Sos is activated

(r35,r36). The remaining rules describe the recruitment of Sos by Grb (r37), and

spontaneous (de)phosphorylation of Shc (r37) and IR (r38).

8.1.2 Exact fragment-based reduction

Applying Algorithm 2 to the model, a reduction from a dimension of 2768 species to

609 fragments is obtained. The annotated contact map is given in Figure 8.2b. The

interface of protein Grb is split into two annotation classes, because no rule tests

both sites a and b of Grb. Thus, the partition Σ(Grb)/∼ = C(Grb) = {{a}, {b}} defines a set of fragments for which the reduction is exact. Two fragment-based

equivalent mixtures are shown in Figure 8.2c.
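The grouping of a node type's sites into annotation classes can be sketched as a union-find computation that unites all sites mentioned jointly by some rule; this only mimics the flavor of Algorithm 2, and the sites and rule footprints below are illustrative:

```python
# Hedged sketch: grouping the sites of one node type into annotation classes
# by uniting all sites that some rule tests or modifies together. This mimics
# the flavor of Algorithm 2 (union-find over sites), not its exact definition.

def annotation_classes(sites, rules):
    parent = {s: s for s in sites}
    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s
    def union(a, b):
        parent[find(a)] = find(b)
    for tested in rules:                   # each rule: set of sites it mentions jointly
        tested = list(tested)
        for s in tested[1:]:
            union(tested[0], s)
    classes = {}
    for s in sites:
        classes.setdefault(find(s), set()).add(s)
    return sorted(map(sorted, classes.values()))

# Sites of Grb in the EGF/insulin model: no rule tests both a and b,
# so the two sites fall into separate annotation classes.
print(annotation_classes(['a', 'b'], [{'a'}, {'b'}, {'a'}]))
```

For Grb, since no rule footprint contains both a and b, the computation returns two singleton classes, matching the partition {{a}, {b}} discussed above.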

Even though there are cycles in the site-graph representation of a contact map,

none of these cycles is a path of a site-graph. Consequently, no more than two

EGFR can be within the same species. The largest species for this contact map

counts 16 nodes (containing two EGFR nodes, two EGF nodes, four Grb nodes, four

Shc nodes), while the equivalent fragment counts 12 nodes.


8.2 HOG pathway in yeast

A model of the adaptation of yeast cells (S. cerevisiae) to external osmotic changes is discussed. Osmotic shock causes a quick increase in the external osmotic pressure and a decrease in the turgor pressure of the cell. The balance between the internal and external pressures of a yeast cell upon osmotic shock is re-established by the high-osmolarity glycerol (HOG) pathway. More specifically, the transducers of osmotic signals are mitogen-activated protein kinase (MAPK) cascades which serve to activate the HOG molecule; upon activation, gene expression and metabolism modules regulate an increase in glycerol, which is in turn used to increase the internal osmotic pressure [58]. The increase in internal osmotic pressure balances the turgor pressure back to its original value, which deactivates the pathway and stops unnecessary glycerol production.

As mentioned, upon activation, the HOG molecule translocates to the nucleus

and the gene transcription is initiated. At this point, it was recently reported

that, depending on the intensity of the osmotic stress, the gene expression may

vary from cell to cell [69]. In other words, a bimodal expression behavior in a cell

population is exhibited.

We consider a detailed rule-based model of the HOG pathway in yeast, based on

the evidence taken from literature [59]. The authors of the model were aiming at

a platform for ‘in silico’ experiments related to the aforementioned phenomenon

of bimodal response in transcriptional output of HOG pathway in yeast.

8.2.1 Model description

Since the model is very detailed, and the purpose of our analysis is to comment on

the dimension decrease after applying the exact fragment-based model reduction,

we provide only a high-level model description.

The model comprises the Sln and Sho branches of Hog1 activation. It contains 41 node types and 443 rules. Each of the two osmosensors at the cell membrane (Sln and Sho) can activate a MAPK kinase kinase (Ste11 or Ssk2/Ssk22), which binds to the MAPK kinase Pbs2. The MAPK kinase Pbs2 then doubly phosphorylates the MAPK Hog1, which then rapidly translocates to the nucleus and starts the gene transcription. This leads to the production of glycerol. Other genes will work to


Figure 8.3: High-osmolarity glycerol (HOG) model in yeast: contact map obtained by Kappa. A summary of node types, their domains and possible bindings. The model comprises 41 agents and 443 rules. The site Localiz encodes the cellular compartment in which a protein can be found (membrane, cytosol, or nucleus).

dephosphorylate the active Hog1 in the nucleus, which causes it to move back out

into the cytosol.

The input to the system, salt concentration, is modeled by the node type Osm. The outputs, apart from Hog1, are nodes of type mVenus and mCherry, which denote mRNAs measured by fluorescent markers and indicate stochasticity. More precisely, the correlation between the intensities of the two mRNAs serves to quantify the contribution of inter-cellular (extrinsic) variability and intracellular (intrinsic) variability to the overall expression noise (more details on the method can be found in [85]).

Not all node types represent proteins, nor do all site types represent protein domains in this model. For example, the node types FeedbackDummy and GlycFeedback are incorporated for regulating the feedback mechanism. Moreover, the site Localiz at the nodes Hog1, Ssk1, Ssk2 and Ste50 denotes the compartment localization of the protein (nucleus or cytosol).

8.2.2 Reachable species

The contact map generated automatically by Kappa is shown in Figure 8.3. The

exact number of reachable species could not be reported by Kappa, since it counts over 10^9. The protein with the largest number of sites is Hog1 (10 sites). Our calculation reports that the number of species involving Hog1 alone counts 1476


(without polymers). The creation of species containing an unbounded number of nodes is possible in this model. For example, Hog1, Pbs2, Ste50 and Ste11 can theoretically form polymers of a size limited only by the total number of each of the proteins in the reaction mixture (Figure 8.4).

8.2.3 Exact fragment-based reduction and model decomposition

The model is translated into a new rule-based model, according to the annotation by Algorithm 2. We report the annotation classes for a part of the model, shown in Figure 8.4. While all sites of the protein Hog1 are captured in one annotation class, the scaffold Pbs2 has three independently interacting groups of sites, rendering the number of fragments containing Pbs2 significantly smaller than the corresponding number of species. Moreover, while the proteins Hog1, Pbs2, Ste50 and Ste11 exhibit unbounded polymerization in the species-based system, and consequently produce a number of species exponential with respect to the proteins' abundances, the number of fragments composed of the same group of proteins is constant. This situation is analogous to the simple polymerization case study, Example 2.2, presented in Chapter 3, where every cycle in the contact map is broken by the annotation classes. Despite the argued reduction from the number of species to the number of fragments, Kappa reported that the number of fragments (species of the new model) still counts more than 10^9.

We performed an additional analysis of detecting smaller, independent stochastic systems, as suggested in [71]. The model decomposition with fragments is not yet automated in Kappa, so we performed the calculation manually. The model can be decomposed into 20 smaller models, which can be independently analyzed. Recall from Chapter 3 that if a node of type A has an annotation class C ⊆ C(A), a new node type is assigned the name AC. Then, for example, the set of fragments containing new nodes of type Pbs2Ste11 and/or nodes of type Ste11Pbs2 builds one module, which interacts independently from the rest of the system. Another, larger module arises, for example, from the set of fragments over the new node types Ssk22, Ssk2, Ptc2Hog1, FeedbackDummyPtc23, Pbs2Ptc23, Pbs2Ste11.

It is worth noticing that the annotation classes, and the possibility of decomposing the model into smaller, independent units, reflect that the modeler did not incorporate cross-interactions between these modules. This is either because there is indeed no evidence about the cross-interactions, or because the modeler simply did


Figure 8.4: High-osmolarity glycerol (HOG) model in yeast: MAPK cascade. A part of the model related to the MAPK cascade is isolated. The red boxes denote the annotations done according to the output of Algorithm 2.

not exhaust the literature related to those cross-interactions. To this end, the decomposition of the model into smaller units, apart from making it possible to simulate each of the units independently and to faithfully compose the obtained results, also serves to automatically reveal the assumptions which the modeler imposes on the dependence of the interaction units.
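The decomposition itself can be sketched as a connected-components computation over the annotated node types, linking two types whenever some rule mentions both; the node-type and rule sets below are illustrative stand-ins, not the actual 443-rule HOG model:

```python
# Hedged sketch: detecting independently evolving modules as the connected
# components of a graph whose vertices are the (annotated) node types and whose
# edges link types mentioned together by some rule. The type names below are
# illustrative, not the actual HOG rule-set.

from collections import defaultdict

def modules(node_types, rules):
    adj = defaultdict(set)
    for mentioned in rules:               # each rule: the node types it mentions
        mentioned = list(mentioned)
        for t in mentioned[1:]:
            adj[mentioned[0]].add(t)
            adj[t].add(mentioned[0])
    seen, comps = set(), []
    for t in node_types:
        if t in seen:
            continue
        stack, comp = [t], set()
        while stack:                      # depth-first traversal of one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return sorted(comps)

print(modules(['Pbs2Ste11', 'Ste11Pbs2', 'Ssk2', 'Ssk22'],
              [{'Pbs2Ste11', 'Ste11Pbs2'}, {'Ssk2', 'Ssk22'}]))
```

Each returned component is a set of fragment node types that never co-occur in a rule with the other components, and can therefore be simulated independently.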


Chapter 9

Conclusions and Discussion

Handling complexity is an important challenge towards understanding the mechanisms of molecular signaling. In this work, we confirmed our main hypothesis: that static program analysis can be employed to successfully reduce rule-based models of complex biochemical systems. More specifically, we showed how to systematically derive a reduced model that operates over a set of much fewer, coarse-grained variables – fragments – which self-consistently describe their own stochastic evolution. We thoroughly analyzed the mathematical relations between the original and the reduced rule-set, and what these relations imply for their respective CTMCs. The presented reduction procedure is efficient – its complexity is linear in the size of the rule-set – and automated – it applies to any well-defined rule-based program. A formal relation between the respective CTMCs is guaranteed within two frameworks. In the framework for exact reductions, the set of fragments is enforced and a precise relation between the respective CTMCs is guaranteed. In the framework for approximate reductions, the set of fragments can vary, and, for a given time limit of a trace, the error in terms of the Kullback-Leibler divergence between the trace distributions of the CTMCs is computed.
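For chains on a shared state space, one standard Girsanov-type expression for the instantaneous Kullback-Leibler divergence rate between two CTMC path measures, weighted by the transient distribution, can be sketched as follows; this is an assumption about the general form only, and the precise error of Definition 6.11 may differ in detail:

```python
import numpy as np

def kl_rate(p, Q, Qbar, eps=1e-12):
    """Instantaneous KL rate: sum_x p[x] * sum_{y != x}
    (q(x,y) log(q(x,y)/qbar(x,y)) - q(x,y) + qbar(x,y))."""
    rate = 0.0
    for x in range(len(p)):
        for y in range(len(p)):
            if x == y:
                continue
            q, qb = Q[x][y], Qbar[x][y]
            if q > eps:
                rate += p[x] * (q * np.log(q / max(qb, eps)) - q + qb)
            else:
                rate += p[x] * qb   # limit of the summand as q -> 0
    return rate

# Toy 2-state chain and a perturbed variant (illustrative rates).
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
Qpert = np.array([[-1.2, 1.2], [1.8, -1.8]])
p = np.array([2.0 / 3.0, 1.0 / 3.0])   # stationary distribution of Q
print(kl_rate(p, Q, Q), kl_rate(p, Q, Qpert))
```

The rate vanishes exactly when the two jump-rate functions agree on the support of p; integrating it against the transient distribution over [0, T] yields a divergence between the trace distributions up to time T.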

In the following, several research questions which directly complement the work

presented in this thesis are discussed.

The procedure for obtaining fragments which guarantee an exact reduction (Algorithm 2) correlates any two sites which are related, directly or indirectly, within a left-hand side or a right-hand side of a rule, and it hence enforces a 'strong' notion of independence between the uncorrelated sites. In turn, precisely such strong independence brings the possibility to effectively reconstruct the transient semantics


Chapter 9. Conclusions and Discussion 127

of the original system. Despite such a strong correlation notion, it was shown that the reduction can be significant, as on the EGF/insulin crosstalk case study, or even radical, as on a simple polymerization example. However, in several other test examples, Algorithm 2 reported an annotation equal to the species-based description. Indeed, a typical signaling cascade module involves a cascade of tests over pairs of sites, which are finally all correlated due to the transitivity of the annotation relation. In such a case, the framework for approximate reductions can be used to quantitatively study coarse-grained executions, even when these executions are not consistent with the original model. In the current framework, the computation of the error bound relies on knowing the generator matrix and the transient distribution of the original process. To this end, the efficient numerical estimation of the error bounds is a first compelling question for future work.

Second, it would be interesting to investigate whether, in the case of no pattern deletion, the annotation which tests only the left-hand side of each rule is sufficient for claiming the reduction exact. For example, in Example 2.1, if no deletion of patterns is involved, then, by Theorem 5.9, adding a rule ∅ → SABC with rate c3 does not influence the forward property, even though the annotation provided by Algorithm 2 enforces it (notice that invertibility can no longer be claimed). On the other side, adding a rule FAB? → ∅ with rate c4 indeed breaks both the forward and the backward property (it suffices to observe the three species-based states x0 = {SB}, x1 = {SB, SABC}, x2 = {SAB, SBC}). The above observation relates to what the authors in [49] informally named ambiguous update.

Moreover, as ODE fragments are typically fewer than stochastic ones (for example, in the presented EGF/insulin case study, the ODE fragments count 39 and the stochastic fragments 609), it is motivating to study whether ODE fragments can be used for exact simulation of stochastic traces, or for a correct computation of the transient distribution. To this end, the result of Kurtz [61] – that the ODE model is a thermodynamical limit of the stochastic model – is an important insight. However, a direct comparison of ODE and stochastic fragments is not possible, because the annotation used for ODE fragments need not be transitive. In Example 2.3, while the procedure for stochastic fragments outputs C(A) = {{c, x, y}}, the procedure for differential fragments would output C(A) = {{c, x}, {c, y}}. Such a fragment set is of dimension 6, and would be positioned between F1 (dimension 8) and F2, F3 (dimension 5) in the annotation lattice in Figure 3.3. Preliminary analysis over examples shows that ODE fragments sometimes preserve the transient distribution


in the stochastic setting: using ODE fragments there does not provide an exact reduction (it suffices to observe the states SA010, SA111 and the transition by R1), while numerical experiments show that the transient distribution is preserved. A generalization of this observation, as an extension of the current framework towards using ODE fragments in the stochastic setting, requires a different rule-set translation procedure and necessitates further technical analysis.

Finally, it is important to mention that the work presented here deals with providing more efficient executions of a given rule-based model (taken as the 'ground truth'), while we do not address the problem of collecting the modeling hypotheses or validating the model with respect to experimental data. Studying fragment-based reductions in a wider modeling context opens numerous challenges for formal methods, when used as a service towards a better understanding of the mechanisms of molecular signaling. As a good model needs to be consistent with the observations, but also to predict behaviors which can be tested by observation, one such question is how to tailor the reduction to high-level, qualitative experimental observations (for example, the formation of a species, bimodality, or a causal relation between events). For example, for studying phenotypic variety, it sometimes suffices to use a model where each site is correlated only to itself [26].

We believe that coupling the fragment-based reduction technique with a formally expressed question of interest can provide a significantly better reduction and ultimately facilitate efficient, automated reasoning within the modeling cycle. This would in turn allow the biologist to focus on the key biological principles instead of solving equations and interpreting complicated diagrams.


Bibliography

[1] Bree B Aldridge, John M Burke, Douglas A Lauffenburger, and Peter K

Sorger. Physicochemical modelling of cell signalling pathways. Nature Cell

Biology, 8(11):1195–1203, November 2006.

[2] David F Anderson and Thomas G Kurtz. Continuous time Markov chain models for chemical reaction networks. In Design and Analysis of Biomolecular Circuits, pages 3–42. Springer, 2011.

[3] Cédric Archambeau and Manfred Opper. Approximate inference for continuous-time Markov processes. In Bayesian Time Series Models, pages 125–140. Cambridge University Press, 2011.

[4] Jean-Pierre Banâtre, Pascal Fradet, and Daniel Le Métayer. Gamma and the chemical reaction model: Fifteen years after. In Multiset Processing, pages 17–44. Springer, 2001.

[5] Jean-Pierre Banâtre and Daniel Le Métayer. Programming by multiset transformation. Communications of the ACM, 36(1):98–111, 1993.

[6] Michael L Blinov, James R Faeder, Byron Goldstein, and William S Hlavacek. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics, 20(17):3289–3291, 2004.

[7] Michael L Blinov, James R Faeder, Byron Goldstein, William S Hlavacek,

et al. A network model of early events in epidermal growth factor receptor

signaling that accounts for combinatorial complexity. Biosystems, 83(2):136–

151, 2006.

[8] Nikolay M Borisov, Nick I Markevich, Jan B Hoek, and Boris N Kholodenko. Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity. Biophysical Journal, 89(2):951–966, 2005.


[9] N.M. Borisov, A.S. Chistopolsky, J.R. Faeder, and B.N. Kholodenko. Domain-oriented reduction of rule-based network models. IET Systems Biology, 2, 2008.

[10] Peter Buchholz. Bisimulation relations for weighted automata. Theoretical

Computer Science, Volume 393, Issue 1-3:109–123, 2008.

[11] Jerry R Burch, Edmund M Clarke, Kenneth L McMillan, David L Dill, and Lain-Jinn Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2):142–170, 1992.

[12] Federica Ciocchetta and Jane Hillston. Bio-PEPA: A framework for the modelling and analysis of biological systems. Theoretical Computer Science, 410(33):3065–3084, 2009.

[13] Ido Cohn, Tal El-Hay, Nir Friedman, and Raz Kupferman. Mean field variational approximation for continuous-time Bayesian networks. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 91–100, Arlington, Virginia, United States, 2009. AUAI Press.

[14] Holger Conzelmann, Dirk Fey, and Ernst D Gilles. Exact model reduction of

combinatorial reaction networks. BMC Systems Biology, 2(78):342–351, 2008.

[15] Thomas H Cormen, Clifford Stein, Ronald L Rivest, and Charles E Leiserson.

Introduction to Algorithms, Chapter 21: Data structures and Disjoint Sets.

McGraw-Hill Higher Education, 2nd edition, 2001.

[16] Patrick Cousot. Abstract interpretation based formal methods and future

challenges. In Informatics - 10 Years Back. 10 Years Ahead., pages 138–156,

London, UK, 2001. Springer-Verlag.

[17] Thomas M Cover and Joy A Thomas. Elements of information theory. Wiley-

interscience, 2012.

[18] Gheorghe Craciun and Martin Feinberg. Multiple equilibria in complex chemical reaction networks: II. The species-reaction graph. SIAM Journal on Applied Mathematics, 66(4):1321–1338, 2006.

[19] Vincent Danos, Jerome Feret, Walter Fontana, Russell Harmer, and Jean

Krivine. Rule-based modelling of cellular signalling. In CONCUR 2007–

Concurrency Theory, pages 17–41. Springer, 2007.


[20] Vincent Danos, Jérôme Feret, Walter Fontana, Russell Harmer, and Jean Krivine. Abstracting the differential semantics of rule-based models: exact and automated model reduction. In Logic in Computer Science (LICS), 2010 25th Annual IEEE Symposium on, pages 362–381. IEEE, 2010.

[21] Vincent Danos, Jérôme Feret, Walter Fontana, and Jean Krivine. Scalable simulation of cellular signaling networks. In Programming Languages and Systems, pages 139–157. Springer, 2007.

[22] Vincent Danos, Jérôme Feret, Walter Fontana, and Jean Krivine. Abstract interpretation of cellular signalling networks. Lecture Notes in Computer Science, 4905:83–97, 2008.

[23] Vincent Danos, Jérôme Feret, Walter Fontana, and Jean Krivine. Abstract interpretation of reachable complexes in biological signalling networks. In Proceedings of the 9th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'08), volume 4905, pages 42–58, 2008.

[24] Vincent Danos and Cosimo Laneve. Formal molecular biology. Theoretical Computer Science, 325(1):69–110, 2004.

[25] Xavier Darzacq, Jie Yao, Daniel R Larson, Sébastien Z Causse, Lana Bosanac, Valeria de Turris, Vera M Ruda, Timothée Lionnet, Daniel Zenklusen, Benjamin Guglielmi, et al. Imaging transcription in living cells. Annual Review of Biophysics, 38:173, 2009.

[26] Eric J Deeds, Jean Krivine, Jérôme Feret, Vincent Danos, and Walter Fontana. Combinatorial complexity and compositional drift in protein interaction networks. PLoS ONE, 7(3), 2012.

[27] Kun Deng, Prashant G Mehta, and Sean P Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains. IEEE Transactions on Automatic Control, 56(12):2793–2808, 2011.

[28] Lucas Dixon and Ross Duncan. Graphical reasoning in compact closed categories for quantum computation. Annals of Mathematics and Artificial Intelligence, 56(1):23–42, 2009.

[29] Rick Durrett. Probability: Theory and Examples, 2011.


[30] Jérôme Feret. Fragments-based model reduction: some case studies. In Jean Krivine and Angelo Troina, editors, Preproceedings of the First International Workshop on Interactions between Computer Science and Biology, CS2Bio '2010, volume 268 of Electronic Notes in Theoretical Computer Science, pages 77–96, Amsterdam, Netherlands, 10 June 2010. Elsevier Science Publishers.

[31] Jérôme Feret, Vincent Danos, Jean Krivine, Russ Harmer, and Walter Fontana. Internal coarse-graining of molecular systems. Proceedings of the National Academy of Sciences, 106(16):6453–6458, April 2009.

[32] Jérôme Feret, Thomas Henzinger, Heinz Koeppl, and Tatjana Petrov. Lumpability abstractions of rule-based systems. Theoretical Computer Science, 431:137–164, 2012.

[33] Jérôme Feret, Heinz Koeppl, and Tatjana Petrov. Stochastic fragments: A framework for the exact reduction of the stochastic semantics of rule-based models. International Journal of Software and Informatics, 4, to appear.

[34] Jérôme Feret and Jean Krivine. KaSim: a simulator for Kappa, 2008-2013. http://www.kappalanguage.org.

[35] Jasmin Fisher and Thomas A Henzinger. Executable cell biology. Nature Biotechnology, 25(11):1239–1249, November 2007.

[36] Walter Fontana and Leo W Buss. The barrier of objects: From dynamical systems to bounded organizations. International Institute for Applied Systems Analysis, 1996.

[37] Arnab Ganguly, Tatjana Petrov, and Heinz Koeppl. Markov chain aggregation and its applications to combinatorial reaction networks. arXiv preprint, abs/1303.4532, 2012.

[38] J Christoph M Gebhardt, David M Suter, Rahul Roy, Ziqing W Zhao, Alec R Chapman, Srinjan Basu, Tom Maniatis, and X Sunney Xie. Single-molecule imaging of transcription factor binding to DNA in live mammalian cells. Nature Methods, 10(5):421–426, May 2013.

[39] Alison L Gibbs and Francis E Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.


[40] Daniel T Gillespie. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.

[41] Daniel T Gillespie. Markov Processes: An Introduction for Physical Scientists. Gulf Professional Publishing, 1992.

[42] Daniel T Gillespie. Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry, 58:35–55, 2007.

[43] Daniel T Gillespie. Deterministic limit of stochastic chemical kinetics. The Journal of Physical Chemistry B, 113(6):1640–1644, February 2009.

[44] Alexander N Gorban and Ovidiu Radulescu. Dynamical robustness of biological networks with hierarchical distribution of time scales. arXiv preprint q-bio/0701020, 2007.

[45] Peter JE Goss and Jean Peccoud. Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proceedings of the National Academy of Sciences, 95(12):6750–6755, 1998.

[46] Robert M Gray. Entropy and Information Theory. Springer-Verlag New York, Inc., New York, NY, USA, 1990.

[47] John Haigh. Stochastic Modelling for Systems Biology by D. J. Wilkinson. Journal of the Royal Statistical Society Series A, 170(1):261–261, 2007.

[48] G. H. Hardy and S. Ramanujan. Asymptotic formulae in combinatory analysis. Proceedings of the London Mathematical Society, S2-17(1):75–115, 1918.

[49] Russ Harmer, Vincent Danos, Jérôme Feret, Jean Krivine, and Walter Fontana. Intrinsic information carriers in combinatorial dynamical systems. Chaos, 20:037108, 2010.

[50] H. Conzelmann, J. Saez-Rodriguez, T. Sauter, B. N. Kholodenko, and E. D. Gilles. A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics, 7, 2006.

[51] Thomas A Henzinger, Barbara Jobstmann, and Verena Wolf. Formalisms for specifying Markovian population models. In Reachability Problems, pages 3–23. Springer, 2009.

[52] Thomas A Henzinger, Maria Mateescu, and Verena Wolf. Sliding window abstraction for infinite Markov chains. In CAV, pages 337–352, 2009.


[53] William S. Hlavacek, James R. Faeder, Michael L. Blinov, Alan S. Perelson, and Byron Goldstein. The complexity of complexes in signal transduction. Biotechnology and Bioengineering, 84:783–794, 2005.

[54] W. S. Hlavacek. The complexity of complexes in signal transduction. Biotechnology and Bioengineering, 84:783–794, 2005.

[55] Hye-Won Kang and Thomas G Kurtz. Separation of time-scales and model reduction for stochastic reaction networks. The Annals of Applied Probability, 23(2):529–583, 2013.

[56] John Kemeny and James L Snell. Finite Markov Chains. Van Nostrand, 1960.

[57] Edda Klipp, Ralf Herwig, Axel Kowald, Christoph Wierling, and Hans Lehrach. Systems Biology in Practice: Concepts, Implementation and Application. Wiley-Blackwell, 2008.

[58] Edda Klipp, Bodil Nordlander, Roland Krüger, Peter Gennemark, and Stefan Hohmann. Integrative model of the response of yeast to osmotic shock. Nature Biotechnology, 23(8):975–982, 2005.

[59] Peter Krenn. Assembly and experimental validation of a rule-based model for the high osmolarity glycerol (HOG) pathway. Master thesis, University of Graz, 2013. Unpublished.

[60] Thomas G. Kurtz. Solutions of ordinary differential equations as limits of pure jump Markov processes. Journal of Applied Probability, 7(1):49–58, 1970.

[61] Thomas G Kurtz. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. Journal of Applied Probability, 8(2):344–356, 1971.

[62] James Ledoux. On weak lumpability of denumerable Markov chains. Statistics & Probability Letters, 25(4):329–339, 1995.

[63] Adiel Loinger, Azi Lipshtat, Nathalie Q Balaban, and Ofer Biham. Stochastic simulations of genetic switch systems. Physical Review E, 75(2):021904, 2007.

[64] Harley H McAdams and Adam Arkin. It's a noisy business! Genetic regulation at the nanomolar scale. Trends in Genetics, 15(2):65–69, 1999.

[65] Donald A McQuarrie. Stochastic approach to chemical kinetics. Journal of Applied Probability, 4(3):413–478, 1967.


[66] Brian Munsky and Mustafa Khammash. The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124:044104, 2006.

[67] James R Norris. Markov Chains. Cambridge University Press, 1998.

[68] Johan Paulsson. Models of stochastic gene expression. Physics of Life Reviews, 2(2):157–175, 2005.

[69] Serge Pelet, Fabian Rudolf, Mariona Nadal-Ribelles, Eulàlia de Nadal, Francesc Posas, and Matthias Peter. Transient activation of the HOG MAPK pathway regulates bimodal gene expression. Science, 332(6030):732, 2011.

[70] Tatjana Petrov, Jérôme Feret, and Heinz Koeppl. Reconstructing species-based dynamics from reduced stochastic rule-based models. In Winter Simulation Conference, 2012.

[71] Tatjana Petrov, Arnab Ganguly, and Heinz Koeppl. Model decomposition and stochastic fragments. Electronic Notes in Theoretical Computer Science, 284:105–124, June 2012.

[72] Tatjana Petrov and Heinz Koeppl. Approximate reduction of rule-based models. In Proceedings of ECC 2013, 2013.

[73] John L Pfaltz and Azriel Rosenfeld. Web grammars. In Proceedings of the 1st International Joint Conference on Artificial Intelligence, pages 609–619. Morgan Kaufmann Publishers Inc., 1969.

[74] Andrew Phillips and Luca Cardelli. Efficient, correct simulation of biological processes in the stochastic pi-calculus. In Computational Methods in Systems Biology, pages 184–199. Springer, 2007.

[75] Christopher V Rao and Adam P Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. Journal of Chemical Physics, 118(11):4999–5010, 2003.

[76] Aviv Regev and Ehud Shapiro. Cells as computation. Nature, 419(6905):343, September 2002.


[77] Gerardo Rubino and Bruno Sericola. A finite characterization of weakly lumpable Markov processes. Part I: The discrete time case. Stochastic Processes and their Applications, 38(2):195–204, 1991.

[78] Gerardo Rubino and Bruno Sericola. A finite characterization of weakly lumpable Markov processes. Part II: The continuous time case. Stochastic Processes and their Applications, 45(1):115–125, 1993.

[79] Michael S Samoilov and Adam P Arkin. Deviant effects in molecular reaction pathways. Nature Biotechnology, 24(10):1235–1240, 2006.

[80] Claude E Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.

[81] Ilya Shmulevich, Edward R Dougherty, Seungchan Kim, and Wei Zhang. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics, 18(2):261–274, 2002.

[82] Jianjun Paul Tian and D Kannan. Lumpability and commutativity of Markov processes. Stochastic Analysis and Applications, 24(3):685–702, 2006.

[83] Pedro Pablo Pérez Velasco. Matrix graph grammars. arXiv preprint arXiv:0801.1245, 2008.

[84] Christopher T. Walsh. Posttranslational Modification of Proteins: Expanding Nature's Inventory. Roberts and Company Publishers, 2006.

[85] Christoph Zechner, Jakob Ruess, Peter Krenn, Serge Pelet, Matthias Peter, John Lygeros, and Heinz Koeppl. Moment-based inference predicts bimodality in transient gene expression. Proceedings of the National Academy of Sciences, 109(21):8340–8345, 2012.