
François Fages Rennes March 2005

The Biochemical Abstract Machine BIOCHAM-2

François Fages, Contraintes project-team, Theme: symbolic systems,

INRIA Rocquencourt http://contraintes.inria.fr/

Joint work with: Nathalie Chabrier-Rivier, Sylvain Soliman, Laurence Calzone

2002-2004: ARC CPBIO “Process Calculi and Biology of Molecular Networks”

A. Bockmayr, LORIA, V. Danos, CNRS PPS, V. Schächter, Genoscope Evry http://contraintes.inria.fr/cpbio/


Systems Biology ?

• Multidisciplinary field aiming at getting over the complexity walls to reason about biological processes at the system level.

• Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments)

• Beyond providing tools to biologists, Computer Science has much to offer in terms of concepts and methods.

• Bioinformatics: late 1990s, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …)

• Need for a strong parallel effort on:

- the formal representation of biological processes,

- formal tools for modeling and reasoning about their global behavior.


Language Approach to (Cell) Systems Biology

Qualitative models: from diagrammatic notation to

• Boolean networks [Thomas 73]

• Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]

• Concurrent transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 03]

- Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03]

- Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]

• Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]

Quantitative models: from differential equation systems to

• Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00]

• Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]

• Hybrid concurrent constraint languages [Bockmayr-Courtois 01]

• Rule-based compositional language BIOCHAM-2 [Chabrier-Fages-Soliman 04]


Plan for today

1. Introduction

2. BIOCHAM Language for Modeling Biochemical Systems

2.1 Syntax: molecules and reactions

2.2 Semantics at 3 abstraction levels: molecule populations, concentrations, Boolean

3. BIOCHAM Language for Formalizing Biological Properties

3.1 Computation Tree Logic for Boolean semantics

3.2 Constraint Linear Time Logic for concentration semantics

4. Machine Learning from Temporal Properties

4.1 Learning reaction rules

4.2 Learning kinetic parameter values

5. Conclusion, collaborations and perspectives


2. Modeling Biochemical Systems: syntax of molecules

Small molecules: covalent bonds (outer electrons shared), 50-200 kcal/mol

• 70% water

• 1% ions

• 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, …

Macromolecules: hydrogen bonds, ionic, hydrophobic and van der Waals interactions, 1-5 kcal/mol

Stability and bindings determined by the number of weak bonds: 3D shape

• 20% proteins (50-10^4 amino acids)

• RNA (10^2-10^4 nucleotides AGCU)

• DNA (10^2-10^6 nucleotides AGCT)


Formal proteins

Cyclin-dependent kinase 1 (free, inactive): Cdk1

Complex Cdk1-Cyclin B (low activity): Cdk1-CycB

Phosphorylated form at site threonine 161 (high activity): Cdk1~{thr161}-CycB

(the names after each colon are in BIOCHAM syntax)


Formal Genes and RNA

Genes = parts of DNA #ERCC1

Gene transcription: RNA copying from a gene

RNA expression: Protein synthesis from an RNA

#ERCC1-(PRB-JUN-CFOS)


BIOCHAM Syntax of Molecules

E ::= Name | E-E | E~{E,…,E} | (E)

S ::= _ | E+S

Names: molecules, proteins, #gene binding sites, abstract @processes…

- : binding operator for protein complexes, gene binding sites, … Associative and commutative.

~{…} : modification operator for phosphorylated sites, … Set of modified sites (associative, commutative, idempotent).

+ : solution operator, “soup aspect”. Associative, commutative, idempotent, with neutral element _.

No membranes, no transport formalized. Bitonal calculi [Cardelli 03].
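The algebraic laws above can be illustrated with a small canonical-form sketch. This is not BIOCHAM's implementation: the names `Molecule`, `atom`, `bind`, `show` and `solution` are hypothetical, and modifications are simplified to sets of sites attached to individual names rather than to arbitrary sub-complexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """A complex in canonical form: a sorted tuple of (name, modified sites)."""
    parts: tuple


def atom(name, sites=()):
    # The site set is idempotent and unordered, so a frozenset is enough.
    return Molecule(((name, frozenset(sites)),))


def bind(a, b):
    """Binding operator '-': associative and commutative, so parts are sorted."""
    parts = sorted(a.parts + b.parts, key=lambda p: (p[0], sorted(p[1])))
    return Molecule(tuple(parts))


def show(m):
    """Print a molecule back in the slide's syntax, e.g. Cdk1~{thr161}-CycB."""
    def part(name, sites):
        return name + ("~{" + ",".join(sorted(sites)) + "}" if sites else "")
    return "-".join(part(n, s) for n, s in m.parts)


def solution(*molecules):
    """Solution operator '+': idempotent and commutative, i.e. a plain set."""
    return frozenset(molecules)
```

With this representation, `bind(cdk1, cycb)` and `bind(cycb, cdk1)` compare equal, which is all the associativity and commutativity laws require.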


BIOCHAM Syntax of Reactions

N ::= name : expr for R | name : R | expr for R | R

R ::= S=>S | S=[E]=>S | S=[R]=>S | S<=>S | S<=[E]=>S

where A<=>B stands for A=>B and B=>A,

A=[C]=>B stands for A+C=>B+C, etc.
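As a sketch of how these abbreviations unfold, the following hypothetical helper expands `<=>` and `=[C]=>` into basic (reactants, products) pairs. It is a string-level illustration only: it assumes the catalyst is a molecule (the `S=[E]=>S` form, not `S=[R]=>S`) and ignores names, kinetic expressions and `where` clauses.

```python
def parse_solution(text):
    """Split 'A + B' into a list of molecule names; '_' is the empty solution."""
    names = [m.strip() for m in text.split("+")]
    return [m for m in names if m and m != "_"]


def expand(rule):
    """Expand '<=>' and '=[C]=>' shorthands into basic (reactants, products) pairs."""
    rule = rule.strip().rstrip(".")
    reversible = "<=" in rule
    if reversible:
        # 'A <=> B' becomes 'A => B'; the reverse rule is added at the end.
        rule = rule.replace("<=", "=", 1)
    if "=[" in rule:
        left, rest = rule.split("=[", 1)
        catalyst, right = rest.split("]=>", 1)
        cat = parse_solution(catalyst)
    else:
        left, right = rule.split("=>", 1)
        cat = []
    lhs, rhs = parse_solution(left), parse_solution(right)
    # A=[C]=>B stands for A+C=>B+C: the catalyst appears on both sides.
    basic = [(lhs + cat, rhs + cat)]
    if reversible:
        basic.append((rhs + cat, lhs + cat))
    return basic
```

For instance, `expand("A =[C]=> B")` yields the single pair with reactants `A, C` and products `B, C`.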

Three abstraction levels:

1. Boolean abstraction: presence/absence of molecules

- Concurrent Transition System

2. Concentrations: number / volume

- ODE

3. Population of molecules: number of molecules

- Multiset Rewriting, Stochastic


Boolean Semantics (BIOCHAM-1)

Associate:

• Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment

• A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors

A reaction A+B=>C+D is translated into 4 transition rules, taking into account the possible consumption of reactants:

A ∧ B → A ∧ B ∧ C ∧ D

A ∧ B → ¬A ∧ B ∧ C ∧ D

A ∧ B → A ∧ ¬B ∧ C ∧ D

A ∧ B → ¬A ∧ ¬B ∧ C ∧ D
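The translation can be sketched as a successor function on Boolean states. This is an illustrative reading, not BIOCHAM code: the function name is hypothetical, a state is the set of molecules present, and a molecule appearing on both sides of the rule (a catalyst) is assumed to stay present.

```python
from itertools import product


def boolean_successors(state, reactants, products):
    """Successor states of one reaction under the Boolean semantics.

    state, reactants and products are sets of molecule names. Each reactant
    may or may not be entirely consumed, hence 2^n possible successors.
    """
    state = frozenset(state)
    if not set(reactants) <= state:
        return set()  # the rule cannot fire
    consumable = sorted(set(reactants) - set(products))
    successors = set()
    for keep in product([True, False], repeat=len(consumable)):
        removed = {m for m, k in zip(consumable, keep) if not k}
        successors.add((state - removed) | frozenset(products))
    return successors
```

Applied to `A+B=>C+D` in the state where only A and B are present, it returns the four states listed above.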


Six Elementary Reaction Rule Schemas

Complexation: A + B => A-B

Decomplexation: A-B => A + B

Cdk1+CycB => Cdk1-CycB

Phosphorylation: A =[C]=> A~{p}

Dephosphorylation: A~{p} =[C]=> A

Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB

Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB

Synthesis: _ =[C]=> A.

_ =[#Ge2-E2f13-Dp12]=> CycA

Degradation: A =[C]=> _.

CycE =[@UbiPro]=> _ (not for CycE-Cdk2 which is stable)


MAPK Signaling Pathway

RAF + RAFK <=> RAF-RAFK.
RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.

RAF-RAFK => RAFK + RAF~{p1}.
RAF~{p1}-RAFPH => RAF + RAFPH.
MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MEK~{p1}-MEKPH => MEK + MEKPH.
MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.
MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.


2.2 Concentration Semantics

• Add kinetic expressions to BIOCHAM reaction rules, e.g. k*[A]*[B] for A + B => C

• Associate real values to molecules: [A] is the concentration of A

• Associate a system of ordinary differential equations (ODE) to a system of reaction rules (BIOCHAM model)


Physical Interpretation of Kinetic Expressions

1) Probability of collision

Different diffusion speeds of molecules (small > substrates > enzymes…)

Average travel in a random walk: 1 μm in 1 s, 2 μm in 4 s, 10 μm in 100 s

500000 random collisions per second for a substrate concentration of 10^-5

50000 random collisions per second for a substrate concentration of 10^-6

2) Probability of reaction upon collision

Non-elastic collision determined by the shape and orientation of matching surfaces

3) Energy of bonds (for dissociation rates)


The Law of Mass Action is Compositional

Law: the number of reactions is proportional to the numbers of A's and B's.

A + B →(k) C

reaction rate = k·A·B = dC/dt, dA/dt = -k·A·B, dB/dt = -k·A·B

Diffusion assumption: each molecule moves independently of other molecules in a random walk (dilute solutions, low concentration).

The dynamics of a complex system is the composition of the dynamics of the reactions under mass action law (at given temperature, pH, …):

E + S →(k1) C →(k2) E + P

E + S ←(k3) C

dE/dt = -k1·E·S + (k2+k3)·C, dC/dt = k1·E·S - (k2+k3)·C

dS/dt = -k1·E·S + k3·C, dP/dt = k2·C
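Compositionality means the ODE system is obtained by summing one mass-action term per reaction. A minimal sketch follows; the function name and the data layout (a list of `(rate constant, reactants, products)` triples) are assumptions made for illustration.

```python
from collections import Counter


def mass_action_rhs(reactions, conc):
    """Sum the contribution of every reaction to d[X]/dt.

    reactions: list of (rate_constant, reactants, products), names as lists.
    conc: dict mapping each molecule name to its concentration.
    """
    deriv = {name: 0.0 for name in conc}
    for k, reactants, products in reactions:
        rate = k
        for name in reactants:
            rate *= conc[name]  # mass action: product of reactant concentrations
        for name, n in Counter(reactants).items():
            deriv[name] -= n * rate
        for name, n in Counter(products).items():
            deriv[name] += n * rate
    return deriv
```

Feeding it the three reactions of the enzymatic system, `(k1, [E, S], [C])`, `(k2, [C], [E, P])` and `(k3, [C], [E, S])`, reproduces the four equations of the slide term by term.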


Multi-Scale Phenomena

Hydrolysis of benzoyl-L-arginine ethyl ester by trypsin

present(En,1e-8). present(S,1e-5). absent(C). absent(P).

(k1*[En]*[S],km1*[C]) for En+S <=> C. k2*[C] for C => En+P.

parameter(k1,4e6). parameter(km1,25). parameter(k2,15).

Complex formation: 5e-9 in 0.1 s

Product formation: 1e-5 in 1000 s
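The two time scales can be reproduced with a short numerical sketch of the model above. The parameter values and initial concentrations are those of the slide; the fixed-step Runge-Kutta integrator and the function names are choices made for this illustration, not what BIOCHAM uses internally.

```python
def rhs(state, k1=4e6, km1=25.0, k2=15.0):
    """Mass-action derivatives for En + S <=> C and C => En + P."""
    en, s, c, p = state
    binding = k1 * en * s - km1 * c   # net rate of En + S <=> C
    catalysis = k2 * c                # rate of C => En + P
    return (-binding + catalysis, -binding, binding - catalysis, catalysis)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * d for b, d in zip(base, slope))
    a = rhs(state)
    b = rhs(shifted(state, a, dt / 2))
    c = rhs(shifted(state, b, dt / 2))
    d = rhs(shifted(state, c, dt))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * r + s)
                 for x, p, q, r, s in zip(state, a, b, c, d))


def simulate(t_end, dt=0.01, state=(1e-8, 1e-5, 0.0, 0.0)):
    """Integrate from (En, S, C, P) = (1e-8, 1e-5, 0, 0) up to t_end seconds."""
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt)
    return state
```

The fast scale follows from the numbers: with (km1+k2)/k1 = 1e-5 equal to the initial substrate concentration, about half of the enzyme is bound at quasi-steady state, i.e. a complex concentration near 5e-9, reached with a time constant of roughly 1/80 s.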


Michaelis-Menten, Hill,… kinetics are not compositional

They are derived from the mass action law by a quasi-steady-state approximation for given simple systems:

• Simple enzymatic reaction for Michaelis-Menten

• Simple cooperative n-dimeric enzymatic reaction for Hill of order n

The quasi-steady-state approximation may no longer be valid after composition with other molecules and reactions.

In a compositional approach to Systems Biology (making models composable and re-usable in different contexts), Michaelis-Menten kinetics, Hill kinetics, etc. should be abandoned as reaction kinetics (no intrinsic value) and recovered after composition (as a property of the system).


Plan

1. Introduction

2. BIOCHAM Language for Modeling Biochemical Systems

2.1 Syntax

2.2 Semantics at 3 abstraction levels (molecule populations, concentrations, Boolean)

3. BIOCHAM Language for Formalizing Biological Properties

3.1 Computation Tree Logic for Boolean semantics

3.2 Constraint Linear Time Logic for concentration semantics

4. Machine Learning from Temporal Properties

4.1 Learning reaction rules

4.2 Learning kinetic parameter values

5. Conclusion, collaborations and perspectives


3. Temporal Logic CTL as a Query Language

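Computation Tree Logic [Clarke & al. 99]

Time operators: X, F, G, U. Non-determinism (choice) quantifiers: E, A.

[Figure: computation trees illustrating EF, EU and AG along the choice and time axes]

Operator: E (exists) / A (always)

X (next time): EX(φ) / AX(φ)

F (finally): EF(φ) ≡ ¬AG(¬φ) / AF(φ) (liveness)

G (globally): EG(φ) ≡ ¬AF(¬φ) / AG(φ) (safety)

U (until): E(φ1 U φ2) / A(φ1 U φ2)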


Biological Queries (1/3)

About reachability:

• Given an initial state init, can the cell produce some protein P? init ⇒ EF(P)

• Which are the states from which a set of products P1, …, Pn can be produced simultaneously? EF(P1 ∧ … ∧ Pn)

About pathways:

• Can the cell reach a state s while passing by another state s2? init ⇒ EF(s2 ∧ EF s)

• Is state s2 a necessary checkpoint for reaching state s? ¬E(¬s2 U s)

• Is it possible to produce P without using nor creating Q? E(¬Q U P)

• Can the cell reach a state s without violating some constraints c? init ⇒ E(c U s)


Biological Queries (2/3)

About stability:

• Is a certain (partially described) state s a stable state? s ⇒ AG(s) (s denotes both the state and the formula describing it).

• Is s a steady state (with possibility of escaping)? s ⇒ EG(s)

• Can the cell reach a stable state? init ⇒ EF(AG(s)) (not an LTL formula).

• Must the cell reach a stable state? init ⇒ AF(AG(s))

• What are the stable states? Not expressible in CTL [Chan 00].

• Can the system exhibit a cyclic behavior w.r.t. the presence of P? init ⇒ EG((P ⇒ EF ¬P) ∧ (¬P ⇒ EF P))


Biological Queries (3/3)

About the correctness of the model:

• Can one see the inaccuracies of the model and correct them?

Exhibit a counterexample pathway or a witness. Suggest refinements of the model or biological experiments to validate/invalidate the property of the model.

About durations:

• How long does it take for a molecule to become activated?

• In a given time, how many Cyclins A can be accumulated?

• What is the duration of a given cell cycle’s phase?

CTL operators abstract from durations. Time intervals can be modeled in FO by adding numerical arguments for start times and durations.


MAPK Signaling Pathway

MEK~{p1} is a checkpoint for producing MAPK~{p1,p2}

biocham: !E(!MEK~{p1} U MAPK~{p1,p2})
True

The PH complexes are not compulsory for the cascade

biocham: !E(!MEK~{p1}-MEKPH U MAPK~{p1,p2})
false
Step 1 rule 15
Step 2 rule 1 RAF-RAFK present
Step 3 rule 21 RAF~{p1} present
Step 4 rule 5 MEK-RAF~{p1} present
Step 5 rule 24 MEK~{p1} present
Step 6 rule 7 MEK~{p1}-RAF~{p1} present
Step 7 rule 23 MEK~{p1,p2} present
Step 8 rule 13 MAPK-MEK~{p1,p2} present
Step 9 rule 27 MAPK~{p1} present
Step 10 rule 15 MAPK~{p1}-MEK~{p1,p2} present
Step 11 rule 28 MAPK~{p1,p2} present


Kripke Semantics

A Kripke structure K is a triple (S, R, L) where S is a set of states, R ⊆ S×S is a total relation, and L labels each state with the atomic propositions true in it.

s |= α if the atomic proposition α is true in s,

s |= Eψ if there is a path π from s such that π |= ψ,

s |= Aψ if for every path π from s, π |= ψ,

π |= φ if s |= φ where s is the starting state of π,

π |= Xψ if π^1 |= ψ,

π |= Fψ if there exists k ≥ 0 such that π^k |= ψ,

π |= Gψ if for every k ≥ 0, π^k |= ψ,

π |= ψ U ψ' iff there exists k ≥ 0 such that π^k |= ψ' and for all j < k, π^j |= ψ.

Following [Emerson 90] we identify a formula φ with the set of states which satisfy it: φ ~ {s ∈ S : s |= φ}.


Symbolic Model Checking

Model checking is an algorithm for computing, in a given finite Kripke structure K, the set of states satisfying a CTL formula φ: {s ∈ S : s |= φ}.

Basic algorithm: represent K as a graph and iteratively label the nodes with the subformulas of φ which are true in that node.

• Add φ to the states satisfying φ

• Add EF φ (EX φ) to the (immediate) predecessors of states labeled by φ

• Add E(φ U ψ) to the predecessor states of ψ while they satisfy φ

• Add EG φ to the states for which there exists a path leading to a non-trivial strongly connected component of the subgraph of states satisfying φ

Symbolic model checking: use Boolean constraints (BDDs) to represent sets of states and transitions (S is finite).
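The labeling algorithm can be sketched on explicit sets of states. The code below is an illustration, not NuSMV's symbolic algorithm: sets are plain Python sets rather than BDDs, the function names are hypothetical, and EG is computed as a greatest fixpoint instead of through strongly connected components (the two give the same set on a finite structure with a total relation).

```python
def predecessors(transitions, targets):
    """States with at least one successor in `targets`.

    transitions maps each state to its set of successors (total relation).
    """
    return {s for s, succs in transitions.items() if succs & targets}


def check_EX(transitions, phi):
    return predecessors(transitions, phi)


def check_EU(transitions, phi, psi):
    """E(phi U psi): start from psi, add phi-predecessors until nothing changes."""
    result = set(psi)
    frontier = set(psi)
    while frontier:
        frontier = (predecessors(transitions, frontier) & phi) - result
        result |= frontier
    return result


def check_EF(transitions, phi):
    """EF(phi) is E(true U phi)."""
    return check_EU(transitions, set(transitions), phi)


def check_EG(transitions, phi):
    """EG(phi): repeatedly drop phi-states with no successor left in the set."""
    result = set(phi)
    while True:
        keep = {s for s in result if transitions[s] & result}
        if keep == result:
            return result
        result = keep
```

A checkpoint query ¬E(¬a U b) is then the complement of `check_EU(transitions, states_without_a, states_with_b)` within the set of all states.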


Cell Cycle: G1 → DNA Synthesis (S) → G2 → Mitosis (M)

G1: Cdk4-CycD, Cdk6-CycD, Cdk2-CycE

S: Cdk2-CycA

G2

M: Cdk1-CycA, Cdk1-CycB


Mammalian Cell Cycle Control Map [Kohn 99]


Kohn’s map detail for Cdk2

Complexation with CycA and CycE

Phosphorylation sites PY15 and P

Biocham Rules:

cdk2~$P + cycA-$C => cdk2~$P-cycA-$C where $C in {_, cks1}.

cdk2~$P + cycE~$Q-$C => cdk2~$P-cycE~$Q-$C where $C in {_, cks1}.

p57 + cdk2~$P-cycA-$C => p57-cdk2~$P-cycA-$C where $C in {_, cks1}.

cycE-$C =[cdk2~{p2}-cycE-$S]=> cycE~{T380}-$C where $S in {_, cks1} and $C in {_, cdk2~?, cdk2~?-cks1}

147-2733 rules, 165 proteins and genes, 500 variables, 2^500 states.


Mammalian Cell Cycle Control Benchmark

147-2733 rules, 165 proteins and genes, 500 variables, 2^500 states.

BIOCHAM with the NuSMV model checker, times in seconds (initial state G2):

Compiling: 29

Reachability G1, EF CycE: 2

Reachability G1, EF CycD: 1.9

Reachability G1, EF PCNA-CycD: 1.7

Checkpoint for mitosis complex, ¬E(¬Cdc25~{Nterm} U Cdk1~{Thr161}-CycB): 2.2

Cycle, EG((CycA ⇒ EF ¬CycA) ∧ (¬CycA ⇒ EF CycA)): 31.8


Plan

1. Introduction

2. BIOCHAM Language for Modeling Biochemical Systems

2.1 Syntax

2.2 Semantics at 3 abstraction levels (molecule populations, concentrations, Boolean)

3. BIOCHAM Language for Formalizing Biological Properties

3.1 Computation Tree Logic for Boolean semantics

3.2 Constraint Linear Time Logic for concentration semantics

4. Machine Learning from Temporal Properties

4.1 Learning reaction rules

4.2 Learning kinetic parameter values

5. Conclusion, collaborations and perspectives


Learning by Theory Revision

• Theory T: BIOCHAM model

- molecule declarations

- interaction rules: complexation, phosphorylation, …

• Examples φ: CTL specification of biological properties

- Reachability

- Checkpoints

- Stable states

- Oscillations

• Bias R: rule pattern

- Kind of reaction rules to learn

Find R such that T,R |= φ


Simple Ad-hoc Enumerative Algorithm

For learning one reaction rule:

1. Compute the list of candidate rules

- All instances of the rule pattern (the bias)

2. Order the candidates by increasing complexity

- Sort the rules by size

3. For each candidate:

- Add it to the model

- Check the CTL specification in the augmented model

- If the specification is satisfied, output the rule as an answer
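The three steps can be sketched on a toy Boolean model. Everything here is a simplification made for illustration: rules are (reactants, products) pairs of frozensets, the specification is restricted to reachability goals (EF), and the check is a plain graph search rather than a call to a model checker.

```python
from itertools import product


def successors(state, rules):
    """Boolean semantics: each reactant of a fired rule may or may not be consumed."""
    out = set()
    for reactants, products in rules:
        if reactants <= state:
            consumable = sorted(reactants - products)
            for keep in product([True, False], repeat=len(consumable)):
                removed = {m for m, k in zip(consumable, keep) if not k}
                out.add((state - removed) | products)
    return out


def reachable(init, rules, goal):
    """EF(goal): is some state containing every molecule of `goal` reachable?"""
    seen, todo = {init}, [init]
    while todo:
        state = todo.pop()
        if goal <= state:
            return True
        for nxt in successors(state, rules) - seen:
            seen.add(nxt)
            todo.append(nxt)
    return False


def rule_size(rule):
    # Complexity measure: number of molecules, ties broken alphabetically.
    return (len(rule[0]) + len(rule[1]), sorted(rule[0]), sorted(rule[1]))


def learn_one_rule(init, rules, candidates, spec):
    """Try candidates by increasing size; keep those satisfying every goal."""
    answers = []
    for rule in sorted(candidates, key=rule_size):
        if all(reachable(init, rules + [rule], goal) for goal in spec):
            answers.append(rule)
    return answers
```

Because each candidate triggers a full check of the specification, the cost grows with the number of instances of the rule pattern.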


Improved Theory Revision Algorithm

General idea of constraint programming: replace a generate-and-test algorithm by a constrain-and-generate algorithm.

Anticipate whether one has to add or remove a rule?

• Positive CTL formula: if false, remains false after removing a rule

- EF(φ) where φ is a boolean formula (pure state description)

• Negative CTL formula: if false, remains false after adding a rule

- AG(φ) where φ is a boolean formula

- Remove a rule on the path given by the model checker (why command)

• Unclassified CTL formulae

- Checkpoint(a,b): ¬E(¬a U b)

- Yet if EF(b) is true, then checkpoint(a,b) is a negative formula

- Loop(a) = EG((a ⇒ EF ¬a) ∧ (¬a ⇒ EF a))


Rule Inference in Cell Cycle Control

[Tyson et al. 91] model over 6 variables,

initial state present(cdc2).

_ => cyclin.
cdc2~{p} + cyclin => cdc2~{p}-cyclin~{p}.
cdc2~{p}-cyclin~{p} => cdc2-cyclin~{p}. (ERASED)
cdc2-cyclin~{p} => cdc2 + cyclin~{p}.
cyclin~{p} => _.
cdc2 <=> cdc2~{p}.


Rule Inference in Cell Cycle Control (cont.)

CTL specification of biological properties:

Activation of the kinase-cyclin (MPF) complex:

reachable(cdc2-cyclin~{p}).

Oscillation of the cycle’s phase:

loop(cyclin & cyclin~{p} & !(cdc2-cyclin~{p})).


Rule Inference in Cell Cycle Control (cont.)

? learn([$Q=>$P where $P in complexes and $Q in complexes]).
_=>cdc2-cyclin~{p}
cyclin=>cdc2-cyclin~{p}
cdc2~{p}-cyclin~{p}=>cdc2-cyclin~{p}

? learn([$qp=>$q where $q in complexes and $qp modif $q]).
cdc2~{p}-cyclin~{p}=>cdc2-cyclin~{p}

Adding the temporal specification checkpoint(cdc2~{p},cdc2-cyclin~{p}):

? learn([$Q=>$P where $P in complexes and $Q in complexes]).
cdc2~{p}-cyclin~{p}=>cdc2-cyclin~{p}


Process Inference in Cell Cycle Control

[Tyson et al. 91] model over 6 variables,

initial state present(cdc2).

_ => cyclin.
cdc2~{p} + cyclin => cdc2~{p}-cyclin~{p}.
cdc2~{p}-cyclin~{p} => cdc2-cyclin~{p}. (ERASED)
cdc2-cyclin~{p} => cdc2 + cyclin~{p}. (ERASED)
cyclin~{p} => _.
cdc2 <=> cdc2~{p}.


Process Inference in Cell Cycle Control (cont.)

? learn([$qp =>$q where $q in complexes and $qp modif $q, $p+$q=>$p-$q where $q in complexes and $p in complexes]).
No rule

? learn(_ => $q where $q in complexes).
No rule

? learn([$R=> $P where $P in complexes and $R in complexes]).
cdc2=>cdc2-cyclin~{p}
cyclin=>cdc2-cyclin~{p}
cdc2~{p}=>cdc2-cyclin~{p}

? learn([$R+ $Q=> $Rp- $Qp where $Q in complexes and $R in complexes and $Rp modif $R and $Qp modif $Q]).
cdc2~{p}+cyclin=>cdc2-cyclin~{p}


Cell Cycle Control [Qu et al. 03]

_=[Cdk-CycB]=>APC.
APC=>_.
_=>Cdk.
CycB=[APC]=>_.
CycB-Cdk=[APC]=>_.
CycB~{p1}-Cdk=[APC]=>_.
Cdk+CycB => Cdk-CycB.
Cdk-CycB~{p1}=[C25~{p1,p2}]=>Cdk-CycB.
Cdk-CycB=[Wee1]=>Cdk-CycB~{p1}.
C25=[Cdk-CycB]=>C25~{p1}.
C25~{p1}=>C25.
C25~{p1}=[Cdk-CycB]=>C25~{p1,p2}.
C25~{p1,p2}=>C25~{p1}.
Wee1=[Cdk-CycB]=>Wee1~{p1}.
Wee1~{p1}=>Wee1.
CKI=[APC]=>_.
CKI+Cdk-CycB=>C.
C=[Cdk-CycB]=>C~{p1}.
C~{p1}=[APC]=>Cdk-CycB.


Constraint-Based Linear Time Logic

• Constraints over concentrations and derivatives as FOL formulae over the reals:

• [M] > 0.2

• [M]+[P] > [Q]

• d([M])/dt < 0

• LTL operators for time X, F, G, U (no non-determinism):

• F([M]>0.2)

• FG([M]>0.2)

• F ([M]>2 & F (d([M])/dt<0 & F ([M]<2 & d([M])/dt>0 & F(d([M])/dt<0))))


Traces from Numerical Simulation

• From a system of Ordinary Differential Equations

dX/dt = f(X)

• Numerical integration produces a discretization of time (by Euler, Runge-Kutta, adaptive step size Runge-Kutta, Rosenbrock methods)

• The trace is a linear Kripke structure:

(t0,X0), (t1,X1), …, (tn,Xn).

the derivatives can be added to the trace

(t0,X0,dX0/dt), (t1,X1,dX1/dt), …, (tn,Xn,dXn/dt).

• Equality x=v is true if x_i ≤ v & x_{i+1} ≥ v, or if x_i ≥ v & x_{i+1} ≤ v (intermediate value theorem)


Constraint-Based LTL (Forward) Model Checking

Hypothesis 1: the initial state is completely known

Hypothesis 2: the formula can be checked over a finite period of time [0,T]

Simple algorithm based on the trace of the numerical simulation:

1. Run the numerical simulation from 0 to T producing values at a finite sequence of time points

2. Iteratively label the time points with the sub-formulae of φ that are true:

- Add φ to the time points where a FOL formula φ is true,

- Add F φ (X φ) to the (immediate) previous time points labeled by φ,

- Add φ U ψ to the predecessor time points of ψ while they satisfy φ,

- (Add G φ to the states satisfying φ until T (optimistic abstraction…))
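The backward labeling of a finite trace can be sketched as follows. The formula encoding as nested tuples and the function name `holds` are assumptions made for this illustration; X is taken to be false at the last time point, and G uses the optimistic reading of the slide (true if the sub-formula holds up to the end of the trace).

```python
def holds(formula, trace):
    """Truth value of `formula` at each time point of a finite trace.

    Formulas are nested tuples: ("atom", predicate), ("not", f), ("and", f, g),
    ("X", f), ("F", f), ("G", f), ("U", f, g). A predicate is any function
    from a trace point to a boolean, e.g. a constraint over concentrations.
    """
    op = formula[0]
    if op == "atom":
        return [bool(formula[1](point)) for point in trace]
    first = holds(formula[1], trace)
    if op == "not":
        return [not v for v in first]
    if op == "and":
        return [a and b for a, b in zip(first, holds(formula[2], trace))]
    if op == "X":
        return first[1:] + [False]  # no successor at the last time point
    second = holds(formula[2], trace) if op == "U" else None
    out, later = [False] * len(trace), (op == "G")
    for i in reversed(range(len(trace))):  # label from the end of the trace
        if op == "F":
            later = first[i] or later
        elif op == "G":
            later = first[i] and later
        elif op == "U":
            later = second[i] or (first[i] and later)
        else:
            raise ValueError("unknown operator: " + op)
        out[i] = later
    return out
```

The formula is true of the simulation when the first entry of the returned list is true; each operator costs one pass over the trace.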


Example of Parameter Estimation in the Brusselator

present(x,1). present(y,1.5).
parameter(a,1). parameter(b,1). %wrong parameter
a for _=>x.
[x]*[x]*[y] for 2*x+y=>3*x.
b*[x] for x=>y.
[x] for x=>_.

? trace_check(F(([y]>2) & F((d([y])/dt<0) & F((d([y])/dt>0) & ([y]>2) & F(d([y])/dt<0))))).
false

? trace_get(b,0,2,F(([y]>2) & F((d([y])/dt<0) & F((d([y])/dt>0) & ([y]>2) & F(d([y])/dt<0)))),20).

? trace_get(b,0,2,F(([y]>2) & F(([y]<[x]) & F(([y]>[x]) & ([y]>2) & F([y]<[x])))),20),plot.
No value found.

? trace_get(b,0,5,F(([y]>2) & F(([y]<[x]) & F(([y]>[x]) & ([y]>2) & F([y]<[x])))),20),plot.
parameter(b,2.1) makes F(([y]>2)&F(([y]<[x])&F(([y]>[x])&([y]>2)&F([y]<[x])))) true.

? trace_get(b,0,5,F(([y]>4) & F(([y]<[x]) & F(([y]>[x]) & ([y]>4) & F([y]<[x])))),20),plot.
parameter(b,2.7) makes F(([y]>4)&F(([y]<[x])&F(([y]>[x])&([y]>4)&F([y]<[x])))) true.
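The idea of searching a parameter value from a temporal specification can be sketched in a few lines. This is not how `trace_get` is implemented: the grid scan, the fixed-step integrator, the simulation horizon of 20 time units and all function names below are assumptions made for this illustration. The ODE is the one induced by the four Brusselator rules above.

```python
def brusselator_trace(b, a=1.0, x=1.0, y=1.5, t_end=20.0, dt=0.005):
    """RK4 integration of dx/dt = a + x^2*y - b*x - x, dy/dt = b*x - x^2*y.

    Returns the trace as a list of (x, y, dy/dt) time points.
    """
    def f(x, y):
        return (a + x * x * y - b * x - x, b * x - x * x * y)
    trace = []
    for _ in range(round(t_end / dt) + 1):
        trace.append((x, y, f(x, y)[1]))
        k1 = f(x, y)
        k2 = f(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return trace


def nested_F(trace, conditions):
    """Check F(c1 & F(c2 & F(c3 & ...))) by finding the conditions in order."""
    i = 0
    for cond in conditions:
        while i < len(trace) and not cond(trace[i]):
            i += 1
        if i == len(trace):
            return False
    return True


# F(([y]>2) & F((d[y]/dt<0) & F((d[y]/dt>0) & ([y]>2) & F(d[y]/dt<0))))
OSCILLATION = [
    lambda p: p[1] > 2,
    lambda p: p[2] < 0,
    lambda p: p[2] > 0 and p[1] > 2,
    lambda p: p[2] < 0,
]


def scan(lo, hi, steps, conditions):
    """Return the first grid value of b whose trace satisfies the formula."""
    for n in range(steps + 1):
        b = lo + (hi - lo) * n / steps
        if nested_F(brusselator_trace(b), conditions):
            return b
    return None
```

With b = 1 the formula is false, as in the `trace_check` answer above: x stays above 0.5, so [y] cannot increase once it reaches 2 and never exceeds it from the initial value 1.5.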


Conclusion

The biochemical abstract machine BIOCHAM offers:

• A simple rule-based language for modeling biochemical processes

- Molecule concentration semantics (ODE)

- Boolean semantics: presence/absence of molecules

• A powerful temporal logic language for formalizing biological properties

- CTL (implemented with NuSMV model checker)

- Constraint LTL (implemented in Prolog)

• An original machine learning system

- Rule discovery (from CTL specification)

- Parameter estimation (from constraint LTL specification)

• A repository of models: cell-cycle control, signaling pathways… (SBML) http://contraintes.inria.fr/CMBSlib


On-going Work and Perspectives

Molecule population semantics:

• Stochastic simulation

• Probabilistic model checking (currently using PRISM)

Space: representing compartments, transportation, and deformations

• Location algebra [Cardelli et al. 01, Plotkin 03]

• Partial differential equations

• Space deformation [Cardelli et al. 03, Danos et al. 03]


Collaborations

STREP APRIL 2: Applications of probabilistic inductive logic programming

Luc de Raedt, Freiburg, Stephen Muggleton, Imperial College London,…

• Learning in a probabilistic logic setting

NoE REWERSE: Reasoning on the web with rules and semantics

François Bry (Munich), Rolf Backofen (Jena), Mike Schroeder (Dresden), …

• Interfacing Biocham to the Web, gene and protein ontologies

INRIA Bang, Jean Clairambault, Benoît Perthame

INSERM, Villejuif, Francis Lévi “Cancer chronotherapies”

ULB, Albert Goldbeter, Bruxelles

• Coupled BIOCHAM models of cell cycle, circadian cycle, cytotoxic drugs.