1 INRIA2011::Dumontier
Formal representation of models in systems biology
Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University
Adjunct Professor (Professeur associé), Département d’informatique et de génie logiciel, Université Laval
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering
Systems Biology
We create and simulate biological models to:
• gain insight into the structure and function of biochemical networks
• reveal metabolic and signalling capabilities so as to predict phenotypes
• undertake metabolic engineering to maximize some desired product
To do this, we need:
• to integrate and manage our data and knowledge in a coherent, scalable and machine-understandable manner
• efficient software to execute computationally demanding simulations
ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 2
Repressilator: A self-regulating system
A synthetic oscillatory network of transcriptional regulators. Elowitz MB, Leibler S. (2000). Nature 403: 335-338.
Biomodels are specified using SBML + semantic annotation
Semantic annotations are primarily used for browsing and searching
• EBI-managed resource with 600+ SBML models
• 300+ models are annotated with:
  o PubMed - papers
  o ChEBI - chemicals
  o UniProt - proteins
  o KEGG - chemicals, reactions
  o E.C. - reactions
  o Gene Ontology - functions, reactions, compartments
  o Taxonomy - organism
http://www.ebi.ac.uk/biomodels-main/
Bio-ontologies
• Provide rich human and machine understandable descriptions of the terms they purport to describe
• Are generally used for semantic annotation of data, which when reused, facilitate integration across domains (granularity, species, experimental methods)
• Facilitate granular and cross-domain queries
• Can be used to obtain explanations for inferences drawn
• Can be efficiently processed by algorithms and software
Additional annotations are specified using the Resource Description Framework (RDF)
<species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
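The annotation element stores the RDF. In the resulting statement the subject is implicit: rdf:about="#_525530" points back to the enclosing species element and its XML attributes. The predicate is bqbiol:is, and the objects are the ChEBI and KEGG identifiers.
The intent is to express that the species represents a substance composed of glucose molecules. We also know from the SBML model that this substance is located in the cytosol and has an (initial) concentration of 0.09765 M.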
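As a rough illustration of how the subject, predicate and objects can be read out of such an annotation, here is a minimal sketch using only Python's standard library. The function name `annotation_triples` is hypothetical, the species element is reproduced without the SBML default namespace, and this is not how SBML Harvester itself does it (the deck states that it uses libSBML and Jena).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# The species element from the slide, trimmed to the namespaces it actually uses.
SPECIES_XML = """
<species metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def annotation_triples(species_xml):
    """Return (subject, predicate, object) tuples found in the RDF annotation."""
    root = ET.fromstring(species_xml)
    triples = []
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")  # e.g. "#_525530"
        for qualifier in description:
            predicate = qualifier.tag  # "{namespace}localname", e.g. bqbiol:is
            for item in qualifier.iter(f"{{{RDF}}}li"):
                triples.append((subject, predicate, item.get(f"{{{RDF}}}resource")))
    return triples


for triple in annotation_triples(SPECIES_XML):
    print(triple)
```

The sketch treats every child of rdf:Description as a qualifier and every rdf:li below it as one object, which matches the Bag layout shown above but is an assumption for other annotation shapes.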
RDF-based Linked Data
• Provides the basis for simple data syndication and syntactic data integration
  o IRIs
  o Statements (aka triples) take the form <subject> <predicate> <object>
• Easy to implement
  o stand-alone datasets
  o logical layer over databases
• Limited reasoning
  o class and property hierarchies
  o domain/range restrictions
  o can’t automatically discover inconsistency
• Standardized queries - SPARQL
• Scalable - to billions of triples
Objective: Computational Knowledge Discovery
• Terminological resources are increasingly being used to annotate SBML-based biomolecular models
  o Makes it easier to explore or find models
• By converting models into formal representations of knowledge we get to:
  o validate the accuracy of the annotations
  o infer knowledge explicit in terminological resources
  o discover biological implications inherent in the models and the results of simulations
Approach
• We formally represent semantically annotated biomodels such that it becomes possible to:
  – capture the semantics of models and the biological systems they represent
  – check the consistency of the represented biological knowledge through automated reasoning
  – query the results of simulations in the context of the biological knowledge
11 COMBINE2011:Dumontier
Have you heard of OWL?
OWL - The Web Ontology Language
Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values
  o quantifiers: existential, universal, cardinality restriction
  o negation
  o disjunction
  o property characteristics: transitive, functional, inverse functional, symmetric, asymmetric, reflexive, irreflexive
  o complex classes in domain and range restrictions
  o property chains
Powerful Reasoning over OWL ontologies
• Consistency: determines whether the ontology contains contradictions.
• Satisfiability: determines whether classes can have instances.
• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repetitive application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to.
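To make subsumption and classification concrete, the following toy sketch derives implicit subclass links from asserted ones by following SubClassOf transitively. It is only an illustration: real OWL reasoners also handle restrictions, disjointness, equivalence and much more. The class names and the function `classify` are assumptions made for this example.

```python
from collections import defaultdict


def classify(asserted):
    """Return every (subclass, superclass) pair implied by SubClassOf transitivity."""
    supers = defaultdict(set)
    for sub, sup in asserted:
        supers[sub].add(sup)

    inferred = set()
    for cls in list(supers):
        stack = list(supers[cls])
        seen = set()
        while stack:
            sup = stack.pop()
            if sup in seen:
                continue
            seen.add(sup)
            # Walk further up the hierarchy from this superclass.
            stack.extend(supers.get(sup, ()))
        inferred.update((cls, sup) for sup in seen)
    return inferred


# Illustrative asserted hierarchy (names chosen for the example).
ASSERTED = [
    ("glucose", "monosaccharide"),
    ("monosaccharide", "carbohydrate"),
    ("carbohydrate", "MaterialEntity"),
]

implicit = classify(ASSERTED) - set(ASSERTED)
print(sorted(implicit))
```

The pairs left after subtracting the asserted links are the implicit subclass links that classification would make visible in this simplified setting.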
Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise comes in handy here!
We make an ontological commitment by converting RDF triples into OWL axioms.
Triple in RDF: <C1 R C2>
• C1 and C2 are classes, R a relation between 2 classes
• intended meaning:
  o C1 SubClassOf: C2
  o C1 SubClassOf: R some C2
  o C1 SubClassOf: R only C2
  o C2 SubClassOf: R some C1
  o C1 SubClassOf: S some C2
  o C1 DisjointFrom: C2
  o C1 and C2 SubClassOf: owl:Nothing
  o R some C1 DisjointFrom: R some C2
  o C1 EquivalentClasses C2
  o ...
• in general: P(C1, C2), where P is an OWL axiom (template)
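The idea of an axiom template P(C1, C2) can be sketched as a lookup from a chosen formalization to a Manchester-style string. The template keys and the function `triple_to_axiom` are hypothetical names for this illustration. Choosing which template applies is the ontological commitment described above, and the code does not make that choice.

```python
# A few of the candidate formalizations listed above, written as string templates.
AXIOM_TEMPLATES = {
    "subclass": "{c1} SubClassOf: {c2}",
    "some": "{c1} SubClassOf: {r} some {c2}",
    "only": "{c1} SubClassOf: {r} only {c2}",
    "inverse_some": "{c2} SubClassOf: {r} some {c1}",
    "disjoint": "{c1} DisjointFrom: {c2}",
}


def triple_to_axiom(c1, r, c2, template):
    """Instantiate the chosen template P for the triple <C1 R C2>."""
    return AXIOM_TEMPLATES[template].format(c1=c1, r=r, c2=c2)


# The same triple yields different axioms depending on the commitment made.
for name in ("subclass", "some", "only"):
    print(triple_to_axiom("GLCi", "represents", "CHEBI:4167", name))
```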
Formalization: every element E of the SBML language represents a class Rep(E) and we assert that
E subClassOf: represents some Rep(E)
Conceptualization: Models and their components represent physical entities (material entities, processes)
Top-level ontologies can make an additional commitment by enforcing disjointness among basic types
Material object, Process, Function and Quality are mutually disjoint.
Relations impose additional constraints, such that inconsistencies arise when incorrectly used
OWL Axiom: Model SubClassOf: represents some MaterialEntity
Conversion rule: a Model M annotated with class C represents:
• If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C
• If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)
• If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))
For each model annotation, we make a commitment to what it represents
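The three cases of the conversion rule can be sketched as a small function. It assumes the caller already knows which top-level category the annotation class falls under (in practice this would come from reasoning over the top-level ontology). The function name `model_axiom` is hypothetical.

```python
def model_axiom(model, annotation, category):
    """Build the Manchester-style axiom for a model annotated with a class.

    `category` is the top-level type the annotation class is a subclass of;
    it is supplied by the caller in this sketch.
    """
    if category == "MaterialEntity":
        body = f"represents some {annotation}"
    elif category == "Function":
        body = f"represents some (has-function some {annotation})"
    elif category == "Process":
        body = f"represents some (has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: {body}"


# A model annotated with a GO process term, as in the BIOMODEL 82 example.
print(model_axiom("M", "GO:0031684", "Process"))
```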
BIOMODEL 82: Converting Model
Annotated with heterotrimeric G-protein complex cycle (GO:0031684), the model:
• represents an object O1
• O1 has a function F1
• F1 is realized by processes of the type heterotrimeric G-protein complex cycle
M SubClassOf: represents some O1
O1 SubClassOf: has-function some (realized-by only GO:0031684)
Compartment(x): a model component that represents a material entity which is part of the object represented by the model to which the component belongs
Compartment SubClassOf: 'model component' and represents some 'Material object'
Conversion rule:
• represents an object O2
• part of the object represented by the model
• compartment’s species represent objects that are located in O2
• C SubClassOf: represents some A2
• A2 SubClassOf: located-in some A1
Every compartment represents a material entity
Models and their components represent physical entities (material entities, processes)
[Figure: SBMLHarvester diagram. Nodes: model, compartments, species, reactions, physical system, material entity, function, process. Relations: has part, model-of, represents, has-function, realized in only. Legend: Individual, Class, Datatype.]
Robert Hoehndorf, Michel Dumontier, John H Gennari, Sarala Wimalaratne, Bernard de Bono, Daniel L Cook and Georgios V Gkoutos. Integrating systems biology models and biomedical ontologies. BMC Systems Biology 2011, 5:124.
SBML2OWL: Implementation
SBML Harvester http://code.google.com/p/sbmlharvester/
• libSBML to access model structure & extract RDF annotations
• Jena RDF API to parse RDF annotations
• OWLAPI to create OWL axioms & reason with top-level ontology
Application to BioModels repository yields:
• OWL ontology with more than 300,000 classes, 800,000 axioms
• includes all referenced ontologies
  o GO (functions, compartments, processes)
  o ChEBI (molecules)
  o Celltype (cell types)
  o FMA (anatomy)
  o PATO (qualities)
Answering questions
Model verification
After reasoning, we found 27 models to be inconsistent
Reasons:
1. our representation - functions are sometimes found in the place of physical entities (e.g. entities that secrete insulin); better to constrain with appropriate relations
2. SBML abused – e.g. species used as a measure of time
3. constraints in the ontologies themselves mean that the annotation is simply not possible
Finding inconsistencies with axiomatically enhanced ontologies
Recent work treats function as process, and axioms state that an ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output, and is a part of an ATP catabolic process.
To this, we add:
• GO: ATP + Water are the only inputs (universal quantification)
• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all different (disjointness)
BIOMD0000000176 and BIOMD0000000177 are models of anaerobic glycolysis in yeast.
• “ATP” is input to the “ATPase” reaction, which is annotated with ATPase activity. The species “ATP”, however, is mis-annotated with Alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.
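The combined effect of the universal restriction and the disjointness axioms can be imitated with a simple set check. This is a deliberately simplified stand-in for what an OWL reasoner derives: because the chemical classes are pairwise disjoint, any input annotated with a class outside the allowed set contradicts the "only" restriction. The dictionary `ALLOWED_INPUTS` and the function `violations` are assumptions for illustration, and class labels are used in place of identifiers.

```python
# Under the universal restriction, ATPase activity (GO:0004002) admits only these inputs.
ALLOWED_INPUTS = {"GO:0004002": {"ATP", "Water"}}


def violations(reaction_annotation, input_annotations):
    """Return the input species whose annotated class is not an allowed input.

    Assumes the annotated classes are pairwise disjoint, so membership in a
    class outside the allowed set cannot be reconciled with the restriction.
    """
    allowed = ALLOWED_INPUTS[reaction_annotation]
    return {
        species: cls
        for species, cls in input_annotations.items()
        if cls not in allowed
    }


# The species named "ATP" carries the wrong annotation, as described above.
print(violations("GO:0004002", {"ATP": "alpha-D-glucose 6-phosphate"}))
```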
Disambiguation: Model annotations
Assertion:
M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
If C SubClassOf: MaterialEntity
Then:
• represents some C is satisfiable
• represents some (has-function some C) and represents some (has-function some (realized-by only C)) are unsatisfiable
Species are further described with ‘modifiers’ in the context of a reaction
essential activator
<listOfModifiers>
  <modifierSpeciesReference sboTerm="SBO:0000461" species="X"/>
</listOfModifiers>
partial inhibitor
<listOfModifiers>
  <modifierSpeciesReference sboTerm="SBO:0000536" species="PX"/>
</listOfModifiers>
Roles are realized in the context of processes by material entities
[Figure: SBMLHarvester+ diagram. Nodes: model, reaction, species, SBO: participant role, ChEBI: substance, ChEBI: molecule, GO: Process. Relations (sio): has proper part, has direct part, has participant, realizes, role of, represents. Legend: Individual, Class, Datatype.]
role chain: realizes o role of -> has participant
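The role chain can be read as relation composition: if a reaction realizes a role, and that role is a role of a species, then the reaction has that species as participant. The sketch below composes two sets of pairs. The individual names and the function `apply_chain` are made up for the example.

```python
def apply_chain(first, second):
    """Compose two binary relations: (a, b) in first and (b, c) in second gives (a, c)."""
    by_subject = {}
    for b, c in second:
        by_subject.setdefault(b, set()).add(c)
    return {(a, c) for a, b in first for c in by_subject.get(b, ())}


# Hypothetical individuals: one reaction, one role, one species.
realizes = {("reaction_1", "activator_role_1")}
role_of = {("activator_role_1", "species_X")}

# realizes o role of -> has participant
has_participant = apply_chain(realizes, role_of)
print(has_participant)
```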
Semanticscience Integrated Ontology (SIO)
• OWL2 ontology
• 1000+ classes covering basic types (physical, processual, abstract, informational) with an emphasis on biological entities
• 183 basic relations (mereological, participatory, attribute/quality, spatial, temporal and representational)
• axioms can be used by reasoners to compute inferences for consistency checking, classification and answering questions about life science knowledge
• embodies emerging ontology design patterns – specifies a data model
• dereferenceable URIs
• searchable in the NCBO BioPortal
• Available at http://semanticscience.org/ontology/sio.owl
Examining Mathematical Expressions
<parameter metaid="metaid_0000233" id="k_tl" name="k_tl" constant="false" sboTerm="SBO:0000016">
  <!-- SBO:0000016: unimolecular rate constant -->
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">Translation rate constant</p>
  </notes>
</parameter>
<assignmentRule metaid="metaid_0400235" variable="k_tl">
  <math>
    <apply>
      <divide/>
      <ci> eff </ci>
      <ci> t_ave </ci>
    </apply>
  </math>
</assignmentRule>
<kineticLaw sboTerm="SBO:0000049">
  <math>
    <apply>
      <times/>
      <ci> k_tl </ci>
      <ci> X </ci>
    </apply>
  </math>
</kineticLaw>
<parameter metaid="metaid_0000025" id="eff" name="translation efficiency" value="20">
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">Average number of proteins per transcript</p>
  </notes>
</parameter>
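To show how such an expression ties a parameter to the quantities it depends on, here is a minimal sketch that evaluates the assignment rule for k_tl with Python's standard library. It handles only binary divide and times over identifiers, and it omits the MathML namespace, as the slide does. The value of t_ave is not given in the deck and is a made-up number for illustration; eff = 20 comes from the parameter shown above.

```python
import xml.etree.ElementTree as ET

# The assignment rule from the slide (no MathML namespace declared, as on the slide).
RULE_XML = """
<assignmentRule variable="k_tl">
  <math>
    <apply> <divide/> <ci> eff </ci> <ci> t_ave </ci> </apply>
  </math>
</assignmentRule>
"""

# Only the two binary operators that appear in the slide's expressions.
OPERATORS = {
    "divide": lambda a, b: a / b,
    "times": lambda a, b: a * b,
}


def evaluate(node, values):
    """Evaluate a tiny MathML subset: <ci> identifiers and binary <apply>."""
    if node.tag == "ci":
        return values[node.text.strip()]
    if node.tag == "apply":
        operator, *operands = list(node)
        left, right = (evaluate(child, values) for child in operands)
        return OPERATORS[operator.tag](left, right)
    raise ValueError(f"unsupported MathML element: {node.tag}")


def apply_rule(rule_xml, values):
    """Return (variable, value) for an assignment rule given parameter values."""
    rule = ET.fromstring(rule_xml)
    return rule.get("variable"), evaluate(rule.find("math/apply"), values)


# eff = 20 is from the model; t_ave = 4.0 is a hypothetical value.
print(apply_rule(RULE_XML, {"eff": 20, "t_ave": 4.0}))
```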
SBML Reactions may be specified by mathematical expressions, which contain quantitative variables that denote quantities
[Figure: SBMLFarmer diagram. Nodes: SBO: Reaction, SBO: mathematical expression, SBO: systems description parameter, Quantity, Literal. Relations (sio): is specified by, has proper part, denotes, has value, derives from. Legend: Individual, Class, Datatype.]
When running a simulation, some attributes change with time
[Figure: simulation results diagram. Nodes: model, species, parameter, expression, attribute, simulation, software, KISAO: algorithm, uo: unit. Relations (sio): has attribute, has value (double), has unit, measured at time, has value (datetime), result of, has agent, has participant, conforms to. Legend: Individual, Class, Datatype.]
Copasi output: not machine understandable
Query Answering over RDF/OWL
Find those concentration measurements for species that represent molecular entities that contain ribonucleotide residues
‘concentration’ and (‘measured at’ some double[>20.0, <40.0]) and ‘is attribute of’ some (‘species’ and ‘represents’ some (‘has part’ some ‘ribonucleotide residue’))
ChEBI ontology
Curve Analysis: Elements of a Plot
[Figure: plot of concentration against time with curves A, B, C and D.]
Curves Contain
- Points
- Line Segments
- Curve Segments
Curves Annotated As
- Monotonic/Non-Monotonic
- Overall Increasing/Decreasing
Curve Segments
- Contain line segments
- Same properties as curve
Line Segments
- Contain points
- Increasing/Decreasing
Points
- Have attributes and values
- Can be local minima/maxima
- Can be inflection points
Figure annotations
- Curve: Monotonicity; TEDDY_0000144; TEDDY_0000008 strictly increasing; TEDDY_0000009 strictly decreasing
- Curve Segment: Overall Change (Slope), Constituent Points
- Point: Stationary Point (Maximum)
- Curve: Overall Change
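The curve annotations listed above can be computed from a sampled time course. The following sketch classifies a sequence of concentration values and finds its local maxima. The function names and the sample values are assumptions for illustration, and the deck does not describe the actual analysis code.

```python
def monotonicity(values):
    """Classify a sampled curve from the signs of its successive differences."""
    if len(values) < 2:
        raise ValueError("need at least two points to classify a curve")
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "strictly increasing"
    if all(d < 0 for d in diffs):
        return "strictly decreasing"
    if all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs):
        return "monotonic"
    return "non-monotonic"


def local_maxima(values):
    """Return indices of interior points that are higher than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Made-up concentration samples that rise and then fall.
concentration = [0.0, 1.0, 3.0, 2.0, 0.5]
print(monotonicity(concentration), local_maxima(concentration))
```

Each result could then be attached to the curve or point as an annotation, which is what the queries over local maxima and non-monotonic curves rely on.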
Queries
‘local maximum’ and ‘is attribute of’ some ( species and represents some ( ‘has function’ some ‘dna binding’))
Biomodel + UniProt + GO
Get the non-monotonic curves for protein species
‘non-monotonic curve’ and ‘has part’ some (‘concentration’ and ‘is attribute of’ some (‘species’ and ‘represents’ some ‘protein’))
Conclusion
The SBML-derived ontologies can be:
i) checked for their consistency, thereby uncovering erroneous curations
ii) used to infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models
iii) used to answer sophisticated questions across a model knowledge base
iv) extended with modifiers, mathematical expressions, parameters and simulation results (from tab files) to answer questions about simulation results with reference to the semantic annotations (GO) in biomodels and UniProt
Acknowledgements
Leonid Chepelev
Robert Hoehndorf
Michel [email protected]
Publications: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier