42
Formal representation of models in systems biology 1 Michel Dumontier, Ph.D. Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University Professeur Associé, Département d’informatique et de génie logiciel, Université Laval Ottawa Institute of Systems Biology Ottawa-Carleton Institute of Biomedical Engineering INRIA2011::Dumontier

Formal representation of models in systems biology

Embed Size (px)

Citation preview

Page 1: Formal representation of models in systems biology

1 INRIA2011::Dumontier

Formal representation of models in systems biology

Michel Dumontier, Ph.D.

Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University

Professeur Associé, Département d’informatique et de génielogiciel, Université Laval

Ottawa Institute of Systems BiologyOttawa-Carleton Institute of Biomedical Engineering

Page 2: Formal representation of models in systems biology

Systems BiologyWe create and simulate biological models to :• gain insight into the structure and function of

biochemical networks• reveal metabolic and signalling capabilities so as

to predict phenotypes• undertake metabolic engineering to maximize

some desired product

To do this, we need • to integrate & manage our data & knowledge in

a coherent, scalable and machine understandable manner

• efficient software to execute computationally demanding simulations

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 2

Page 3: Formal representation of models in systems biology

3 INRIA2011::Dumontier

Repressilator: A self-regulating system

A synthetic oscillatory network of transcriptional regulators. Elowitz MB, Leibler S. (2000). Nature 403: 335-338.

Page 4: Formal representation of models in systems biology

4 INRIA2011::Dumontier

Biomodels are specified using SBML+ semantic annotation

Page 5: Formal representation of models in systems biology

Semantic annotations are primarily used for browse and search

• EBI managed resource with 600+ SBML models

• 300+ models are annotated with

• Pubmed - papers• ChEBI - chemicals• UniProt - proteins• KEGG - chemicals,

reactions• E.C. - reactions• Gene Ontology -

functions, reactions, compartments

• Taxonomy - organism

http://www.ebi.ac.uk/biomodels-main/

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 5

Page 6: Formal representation of models in systems biology

Bio-ontologies

• Provide rich human and machine understandable descriptions of the terms they purport to describe

• Are generally used for semantic annotation of data, which when reused, facilitate integration across domains (granularity, species, experimental methods)

• Facilitate granular and cross-domain queries • Can be used to obtain explanations for inferences drawn• Can be efficiently processed by algorithms and software

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 6

Page 7: Formal representation of models in systems biology

Additional annotations are specified using the Resource Description Framework (RDF)

   <species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563">

           <annotation>        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">          <rdf:Description rdf:about="#_525530">            <bqbiol:is>              <rdf:Bag>                <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>                <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>              </rdf:Bag>            </bqbiol:is>          </rdf:Description>        </rdf:RDF>      </annotation>    </species>

object

predicate

The intent is to express that the species represents a substance composed of glucose moleculesWe also know from the SBML model that this substance is located in the cytosol and with a (initial) concentration of 0.09765M

The annotation element stores the RDF

subject

Implicit subject and xml attributes

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 7

Page 8: Formal representation of models in systems biology

RDF-based Linked Data

• Provides the basis for simple data syndication and syntactic data integrationo IRIso Statements (aka triples) take the form of o <subject> <predicate> <object>

• Easy to implemento stand-alone datasetso logical layer over databases

• Limited reasoningo class and property hierarchieso domain/range restrictionso can’t automatically discover inconsistency

• Standardized Queries - SPARQL• Scalable - to billions of triples

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 8

Page 9: Formal representation of models in systems biology

Objective:Computational Knowledge Discovery• Terminological resources increasingly being used to annotate

SBML-based biomolecular models o Makes it easier to explore or find models

• By converting models into formal representations of knowledge we get to:o validate the accuracy of the annotations o infer knowledge explicit in terminological resourceso discover biological implications inherent in the models and the

results of simulations.

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 9

Page 10: Formal representation of models in systems biology

10 INRIA2011::Dumontier

Approach• We formally represent semantically annotated

biomodels such that it becomes possible to:– Capture the semantics of models and the

biological systems they represent– Check the consistency of biological knowledge

represented through automated reasoning– query the results of simulations in the context of

the biological knowledge

Page 11: Formal representation of models in systems biology

11 COMBINE2011:Dumontier

Have you heard of OWL?

Page 12: Formal representation of models in systems biology

12 COMBINE2011:Dumontier

OWL - The Web Ontology Language

Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values

o quantifiers o existential, universal, cardinality restriction

o negationo disjunctiono property characteristics

o transitive, functional, inverse functional, symmetric, antisymmetric, reflexive, irreflexive

o complex classes in domain and range restrictions o property chains

12

Page 13: Formal representation of models in systems biology

13 COMBINE2011:Dumontier

Powerful Reasoning over OWL ontologies

• Consistency: determines whether the ontology contains contradictions.

• Satisfiability: determines whether classes can have instances.

• Subsumption: is class C1 implicitly a subclass of C2?

• Classification: repetitive application of subsumption to discover implicit subclass links between named classes

• Realization: find the most specific class that an individual belongs to.

13

Page 14: Formal representation of models in systems biology

Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise comes in

handy here!We make an ontological commitment by converting RDF triples into OWL axioms. Triple in RDF:<C1 R C2>• C1 and C2 are classes, R a relation between 2 classes• intended meaning:

o C1 SubClassOf: C2o C1 SubClassOf: R some C2o C1 SubClassOf: R only C2o C2 SubClassOf: R some C1o C1 SubClassOf: S some C2o C1 DisjointFrom: C2 o C1 and C2 SubClassOf: owl:Nothingo R some C1 DisjointFrom: R some C2o C1 EquivalentClasses C2o ...

• in general: P(C1, C2), where P is an OWL axiom (template)

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 14

Page 15: Formal representation of models in systems biology

15 INRIA2011::Dumontier

Formalization: every element E of the SBML language represents a class Rep(E) and we assert that

E subClassOf: represents some Rep(E)

Conceptualization:Models and their components represent physical entities (material entities, processes)

Page 16: Formal representation of models in systems biology

Top-level ontologies can make additional commitment by enforcing disjointness

among basic types

Material object, Process, Function and Quality are mutually disjoint.

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 16

Page 17: Formal representation of models in systems biology

Relations impose additional constraints, such that inconsistencies arise when incorrectly used

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 17

Page 18: Formal representation of models in systems biology

OWL Axiom:Model SubClassOf: represents some MaterialEntity

Conversion rule: a Model annotated with class C represents:

If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C

If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)

If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))

For each model annotation, we make a commitment to what it represents

Page 19: Formal representation of models in systems biology

BIOMODEL 82: Converting Model

Annotated with heterotrimeric G-protein complex cycle (GO:0031684):

• represents an object O1• O1 has a function F1• F1 is realized by processes of the type heterotrimeric G-protein

complex cycle

M SubClassOf: represents some O1O1 SubClassOf: (has-function some (realized-by only GO:0031684)

Page 20: Formal representation of models in systems biology

Compartment(x): a model component that represents a material entity which is part of the object represented by the model to which the component belongs

Compartment SubClassOf 'model component' and represents some 'Material object'

Conversion rule:• represents an object O2• part of the object represented by the model• compartment’s species represent objects that are located in O2• C SubClassOf: represents some A2• A2 SubClassOf: located-in some A1

Every compartment represents a material entity

Page 21: Formal representation of models in systems biology

21 INRIA2011::Dumontier

Models and their components represent physical entities (material entities, processes)

reactions

species Material entityrepresents

Process

represents

model

has part

IndividualClass

Datatype

SBMLHarvester

Robert Hoehndorf, Michel Dumontier, John H Gennari, Sarala Wimalaratne, Bernard de Bono, Daniel L Cook and Georgios V Gkoutos. Integrating systems biology models and biomedical ontologies. BMC Systems Biology 2011, 5:124.

compartments

model-ofPhysical system

Material entityrepresents

Material entity

Function

has-function

realized in only

Page 22: Formal representation of models in systems biology

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 22

Page 23: Formal representation of models in systems biology

SBML2OWL: Implementation

SBML Harvester http://code.google.com/p/sbmlharvester/• libSBML to access model structure & extract RDF annotations• Jena RDF API to parse RDF annotations• OWLAPI to create OWL axioms & reason with top-level ontology

Application to BioModels repository yields:• OWL ontology with more than 300,000 classes, 800,000 axioms• includes all referenced ontologies

o GO (functions, compartments, processes)o ChEBI (molecules)o Celltype (cell types)o FMA (anatomy)o PATO (qualities)

Page 24: Formal representation of models in systems biology

Answering questions

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 24

Page 25: Formal representation of models in systems biology

Model verification

After reasoning, we found 27 models to be inconsistent

reasons1. our representation - functions sometimes found in the place of

physical entities (e.g. entities that secrete insulin). better to constrain with appropriate relations

2. SBML abused – e.g. species used as a measure of time3. constraints in the ontologies themselves mean that the annotation

is simply not possible

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 25

Page 26: Formal representation of models in systems biology

Finding inconsistencies with axiomatically enhanced ontologiesrecent work treats function as process and axioms state that an ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output and is a part of an ATP catabolic process.To this, we add:• GO: ATP + Water the only inputs (universal quantification)• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all

different (disjointness)BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast.• “ATP” input to “ATPase” reaction, which is annotated with

ATPase activity. The species “ATP”, however, is mis-annotated with Alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.

Page 27: Formal representation of models in systems biology

Disambiguation: Model annotations

Assertion:M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))

C SubClassOf: MaterialEntityThen: • represents some C is satisfiable• represents some (has-function some C) and represents

some (has-function some (realized-by only C)) are unsatisfiable

ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 27

Page 28: Formal representation of models in systems biology

28 INRIA2011::Dumontier

Species are further described with ‘modifiers’ in the context of a reaction

essential activator

<listOfModifiers> <modifierSpeciesReference sboTerm="SBO:0000461" species="X"/> </listOfModifiers>

partial inhibitor <listOfModifiers> <modifierSpeciesReference sboTerm="SBO:0000536" species="PX"/> </listOfModifiers>

Page 29: Formal representation of models in systems biology

29 INRIA2011::Dumontier

Roles are realized in the context of processes by material entities

reaction

species

SBO: participant role

sio: realizes

ChEBI:substance

sio: role of

sio: represents

GO:Processsio: represents

sio: has direct partChEBI:molecule

model

sio: has proper part

IndividualClass

Datatype

SBMLHarvester+

sio: has participant

role chain: realizes o role of -> has participant

Page 30: Formal representation of models in systems biology

30 INRIA2011::Dumontier

Semanticscience Integrated Ontology (SIO)

• OWL2 ontology• 1000+ classes covering basic types (physical, processual, abstract,

informational) with an emphasis on biological entities• 183 basic relations (mereological, participatory, attribute/quality,

spatial, temporal and representational)• axioms can be used by reasoners to compute inferences for

consistency checking, classification and answering questions about life science knowledge

• embodies emerging ontology design patterns – specifies a data model

• dereferenceable URIs• searchable in the NCBO bioportal• Available at http://semanticscience.org/ontology/sio.owl

Page 31: Formal representation of models in systems biology

31 INRIA2011::Dumontier

Examining Mathematical Expressions

<parameter metaid="metaid_0000233" id="k_tl" name="k_tl" constant="false" sboTerm="SBO:0000016"> (unimolecular rate constant) <notes> <p xmlns="http://www.w3.org/1999/xhtml">Translation rate constant</p> </notes> </parameter>

<assignmentRule metaid="metaid_0400235" variable="k_tl"> <math> <apply> <divide/> <ci> eff </ci> <ci> t_ave </ci> </apply> </math> </assignmentRule>

<kineticLaw sboTerm="SBO:0000049"> <math> <apply> <times/> <ci> k_tl </ci> <ci> X </ci> </apply> </math> </kineticLaw>

<parameter metaid="metaid_0000025" id="eff" name="translation efficiency" value="20"> <notes> <p xmlns="http://www.w3.org/1999/xhtml">Average number of proteins per transcript</p> </notes> </parameter>

Page 32: Formal representation of models in systems biology

32 INRIA2011::Dumontier

SBO:Reaction

SBO:mathematicalexpression

Quantity

SBO: systems description parameter

sio:denotessio: is specified bysio:

has proper part

SBML Reactions may be specified by mathematical expressions, which contain

quantitative variables that denote quantities

sio: has value Literal

IndividualClass

Datatype

SBMLFarmersio:derives from

Page 33: Formal representation of models in systems biology

33 INRIA2011::Dumontier

When running a simulation, some attributes change with timespecies

attribute

sio:has attributesio

: has value double

uo:unitsio: measured at time

sio: has valuedatetime

sio: result of

simulationsio: has agent

software

sio: has participant

parametermodelIndividualClass expression

sio: has unit

Datatype

KISAO:algorithm

sio: conforms to

Page 34: Formal representation of models in systems biology

34 INRIA2011::Dumontier

Page 35: Formal representation of models in systems biology

35 INRIA2011::Dumontier

Copasi output: not machine understandable

Page 36: Formal representation of models in systems biology

36 INRIA2011::Dumontier

Query Answering over RDF/OWLFind those concentration measurements for species that represent molecular entities that contain ribonucleotide residues

‘concentration’ and (‘measured at’ some double[>20.0, <40.0])and ‘is attribute of’ some ( ‘species’ and ‘represents’ some (‘has part’ some ‘ribonucleotide residue’))

ChEBI ontology

Page 37: Formal representation of models in systems biology

37 INRIA2011::Dumontier

Curve Analysis: Elements of a Plot

A

B

C

D

Time

Conc

entr

ation

Curves Contain- Points- Line Segments- Curve SegmentsCurves Annotated As- Monotonic/Non-Monotonic- Overall Increasing/DecreasingCurve Segments- Contain line segments- Same properties as curveLine Segments- Contain points- Increasing/DecreasingPoints- Have attributes and values- Can be local minima/maxima- Can be inflection points

Curve: MonotonicityTEDDY_0000144

TEDDY_0000008

strictly increasing

TEDDY_0000009

strictly decreasing

Curve SegmentOverall Change (Slope)Constituent Points

Point: Stationary Point (Maximum)

Curve: Overall Change

Page 38: Formal representation of models in systems biology

38 INRIA2011::Dumontier

Queries

‘local maximum’ and ‘is attribute of’ some ( species and represents some ( ‘has function’ some ‘dna binding’))

Biomodel + UniProt + GO

Page 39: Formal representation of models in systems biology

39 INRIA2011::Dumontier

Get the non-monotonic curves for protein species

‘non-monotonic curve’and ‘has part’ some ( ‘concentration’ and ‘is attribute of’ some ( ‘species’ and ‘represents’ some ‘protein’)))

Page 40: Formal representation of models in systems biology

40 INRIA2011::Dumontier

ConclusionThe SBML-derived ontologies can be

i) checked for their consistency, thereby uncovering erroneous curations

ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models

iii) answer sophisticated questions across a model knowledge base

iv) extended with modifiers, mathematical expressions and parameters, simulation Results (from tab files) to answer questions about simulation results with reference to the semantic annotations (GO) in biomodels, UniProt

Page 41: Formal representation of models in systems biology

INRIA2011::Dumontier41

Acknowledgements

Leonid Chepelev

Robert Hoehndorf

Page 42: Formal representation of models in systems biology

42 INRIA2011::Dumontier

Michel [email protected]

Publications: http://dumontierlab.com Presentations: http://slideshare.com/micheldumontier