Biomedical Ontologies for data integration and verification Michel Dumontier and Robert Hoehndorf Carleton University, University of Cambridge ISMB tutorial @ Vienna. July 16,2011 ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 1

ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 2: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Outline

1. General background (10 min)
   o an introduction to the use-case: systems biology, SBML and BioModels
2. Ontological analysis (45 min)
   o how to express domain content as formal knowledge using the Web Ontology Language (OWL)
3. Application of formal ontology to consistency and data verification (30 min)
   o how to use the OWL formalization to verify the accuracy of annotations, data and constraints in a domain
4. Break (30 min)
5. Mapping, repair and disambiguation using ontologies (30 min)
   o how to relax and disambiguate constraints on ontologies to obtain a consistent representation of domain content
6. Knowledge discovery, retrieval and querying (15 min)
   o how to answer questions that require the inference of knowledge through automated reasoning
7. Efficient implementation in software systems (15 min)
   o how to convert ontologies into efficient formal representations amenable to high-throughput analyses
8. Applications in Bioinformatics (25 min)
   o how the formalized ontologies can be used to perform bioinformatics analyses
– Discussion and questions (15 min)

Page 3: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Systems Biology

We create and simulate biological models to:
• gain insight into the structure and function of biochemical networks
• reveal metabolic and signalling capabilities so as to predict phenotypes
• undertake metabolic engineering to maximize some desired product

To do this, we need:
• to integrate & manage our data & knowledge in a coherent, scalable and machine understandable manner
• efficient software to execute computationally demanding simulations

Page 4: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Bio-ontologies

• Provide rich human and machine understandable descriptions of the terms they purport to describe
• Have value for semantic annotation of data, which allows integration across domains (granularity, species, experimental methods)
• Facilitate granular and cross-domain queries
• Can be used to obtain explanations for inferences drawn
• Can be efficiently processed by algorithms and software

Page 5: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Biomodels are semantically annotated SBML models

• EBI managed resource
• 600+ models available as SBML
• 300+ models are curated with GO process, function and component terms, and have links to protein databases
• Possible to browse by GO terms: http://www.ebi.ac.uk/biomodels-main/

Page 6: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Objective: Computational Knowledge Discovery

• Terminological resources are increasingly being used to annotate SBML-based biomolecular models
  o Makes it easier to explore or find models
• By converting models into formal representations of knowledge we get to:
  o validate the accuracy of the annotations
  o infer knowledge explicit in terminological resources
  o discover biological implications inherent in the models

Page 7: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML

XML-based representation of biochemical models, their components (compartments, species, reactions, events) and their descriptors (rules, constraints, functions, units).

Consider the following enzymatic reaction:

Page 8: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML captures reaction kinetics using an XML-based format

<?xml version="1.0" encoding="UTF-8"?>
<sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
  <model name="EnzymaticReaction">
    <listOfUnitDefinitions>
      <unitDefinition id="per_second">
        <listOfUnits>
          <unit kind="second" exponent="-1"/>
        </listOfUnits>
      </unitDefinition>
      <unitDefinition id="litre_per_mole_per_second">
        <listOfUnits>
          <unit kind="mole" exponent="-1"/>
          <unit kind="litre" exponent="1"/>
          <unit kind="second" exponent="-1"/>
        </listOfUnits>
      </unitDefinition>
    </listOfUnitDefinitions>
    <listOfCompartments>
      <compartment id="cytosol" size="1e-14"/>
    </listOfCompartments>
    <listOfSpecies>
      <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
      <species compartment="cytosol" id="P" initialAmount="0" name="P"/>
      <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/>
      <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/>
    </listOfSpecies>

Page 9: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

    <listOfReactions>
      <reaction id="veq">
        <listOfReactants>
          <speciesReference species="E"/>
          <speciesReference species="S"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="ES"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply>
              <times/>
              <ci>cytosol</ci>
              <apply>
                <minus/>
                <apply> <times/> <ci>kon</ci> <ci>E</ci> <ci>S</ci> </apply>
                <apply> <times/> <ci>koff</ci> <ci>ES</ci> </apply>
              </apply>
            </apply>
          </math>
          <listOfParameters>
            <parameter id="kon" value="1000000" units="litre_per_mole_per_second"/>
            <parameter id="koff" value="0.2" units="per_second"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>

Page 10: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

      <reaction id="vcat" reversible="false">
        <listOfReactants>
          <speciesReference species="ES"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="E"/>
          <speciesReference species="P"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply> <times/> <ci>cytosol</ci> <ci>kcat</ci> <ci>ES</ci> </apply>
          </math>
          <listOfParameters>
            <parameter id="kcat" value="0.1" units="per_second"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
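As a worked check of the two kinetic laws in the listing, the sketch below evaluates both rate expressions at the initial state. It assumes that species symbols in a kinetic law stand for concentrations (amount divided by compartment size) and that the compartment size is in litres; neither is stated on the slides. The function names are illustrative.

```python
# Values copied from the SBML listing above.
cytosol = 1e-14                                   # compartment size (litres assumed)
amounts = {"E": 5e-21, "S": 1e-20, "ES": 0.0, "P": 0.0}   # initial amounts (moles)
kon, koff, kcat = 1_000_000.0, 0.2, 0.1           # local reaction parameters

def concentrations(amounts, volume):
    """Convert amounts to concentrations (assumed meaning of species symbols)."""
    return {name: n / volume for name, n in amounts.items()}

def rate_veq(c):
    """Reaction 'veq': cytosol * (kon * E * S - koff * ES)."""
    return cytosol * (kon * c["E"] * c["S"] - koff * c["ES"])

def rate_vcat(c):
    """Reaction 'vcat': cytosol * kcat * ES."""
    return cytosol * kcat * c["ES"]

c0 = concentrations(amounts, cytosol)
print("veq :", rate_veq(c0))    # binding rate at t = 0
print("vcat:", rate_vcat(c0))   # catalysis rate at t = 0 (no ES yet)
```

Under these assumptions the binding reaction starts at roughly 5e-21 mol/s, while catalysis starts at zero because no ES complex exists yet.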

Page 11: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML models may feature several components


Page 12: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML specifies the number and kind of attributes models and components can have


Page 13: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

It’s up to the modeler to use those attributes in a meaningful way

what models have you produced?


Page 15: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Energy (ATP) is produced from glycolysis (the breakdown of glucose) in a series of enzyme-catalyzed biochemical reactions.

Fermentation regenerates NAD+ so it can be re-used to metabolize more glucose.

Analysis and optimization of metabolic pathways are important for biotechnology.

Page 16: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Gene Ontology

• over 30,000 terms
• covers
  o biological processes
  o molecular functions
  o cellular components
• terms organized around an "is a" hierarchy
• terms further described with 'has part'/'part of'; 'regulates' and '+ regulates', '- regulates'

Page 17: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Chemical Entities of Biological Interest (ChEBI)

• recently refactored to be in line with formal (reasoning capable) ontology
• scope includes chemical entities (atoms, substances, groups, molecules), roles and subatomic particles
• large numbers of curated molecules

Page 18: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML annotations are captured using the Resource Description Framework (RDF)

<species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>

Slide callouts: subject; predicate; object; implicit subject and XML attributes; the annotation element stores the RDF.

The intent is to express that the species represents a substance composed of glucose molecules. We also know from the SBML model that this substance is located in the cytosol and has an (initial) concentration of 0.09765 M.

Page 19: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

annotated models contain references to entities described elsewhere

• PubMed - papers
• ChEBI - chemicals
• UniProt - proteins
• KEGG - chemicals, reactions
• E.C. - reactions
• Gene Ontology - functions, reactions, compartments
• Taxonomy - organism

<model>
  <annotation>
    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:pubmed:17667951"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>
    <bqbiol:hasPart>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:kegg.pathway:sce00010"/>
        <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0019642"/>
      </rdf:Bag>
    </bqbiol:hasPart>
    <bqmodel:is>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
      </rdf:Bag>
    </bqmodel:is>

Page 20: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

It looks like another XML syntax, but it has RDF semantics! What is the meaning of SBML's RDF annotation?

• The intent is to indicate that the model is a model of a yeast
• RDF semantics: #_551383 is a member of a set that is related by bqmodel:is to a collection (rdf:Bag) that has a single member – yeast (4932)
• RDF semantics does not match the intent!

<rdf:Description rdf:about="#_551383">
  <bqmodel:is>
    <rdf:Bag>
      <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
    </rdf:Bag>
  </bqmodel:is>
</rdf:Description>
</annotation>
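To make the gap between RDF semantics and intent concrete, here is a minimal sketch that reads such an annotation and emits the intended flat statements, skipping the rdf:Bag layer. It uses only the Python standard library; the function name `intended_triples` and the trimmed `SNIPPET` (namespaces reduced to the two that are used) are illustrative assumptions, not part of any SBML toolkit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the slide's annotation, wrapped so that it parses on its own.
SNIPPET = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
  <rdf:Description rdf:about="#_551383">
    <bqmodel:is>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
      </rdf:Bag>
    </bqmodel:is>
  </rdf:Description>
</rdf:RDF>
"""

def intended_triples(rdf_xml):
    """Yield (subject, qualifier IRI, resource), ignoring the rdf:Bag wrapper."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for qualifier in desc:
            # ElementTree tags look like "{namespace}local"; join them into an IRI.
            predicate = qualifier.tag[1:].replace("}", "")
            for li in qualifier.iter(f"{{{RDF}}}li"):
                yield subject, predicate, unquote(li.get(f"{{{RDF}}}resource"))

triples = list(intended_triples(SNIPPET))
for triple in triples:
    print(triple)
```

Flattening the bag recovers "model, is, yeast", but it is still only a syntactic statement; whether bqmodel:is means identity, instantiation or representation has to be fixed by the formalization.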

Page 21: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

BioModels.net biology qualifiers

is (identity): The biological entity represented by the model element has identity with the subject of the referenced resource (modeling object B). This relation might be used to link a reaction to its exact counterpart in a database, for instance.

Can we formalize and automatically verify the intended meaning of the RDF annotation?

Page 22: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Biomodels: Qualifiers

Qualifiers for the biological object represented by the model component.

• encodes/isEncodedBy
• hasPart/isPartOf
• hasProperty/isPropertyOf
• hasVersion/isVersionOf
• is
• isDescribedBy
• isHomologTo
• occursIn

http://www.ebi.ac.uk/miriam/main/qualifiers/

Page 23: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

In this tutorial

You will learn how to create accurate knowledge representations of annotated SBML models.

Features
• ontological commitment: terms in a vocabulary correspond to formally defined classes and relations, and expressions formulated using the Web Ontology Language (OWL) have an unambiguous interpretation
• upper level ontology of types and relations to distinguish and constrain model entities to the spatio-temporal entities they represent
• reasoning to uncover inconsistencies, and how to repair them
• advanced applications of OWL ontologies for answering questions and providing biological insight

Page 24: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

What is a model?

How does it differ from the thing it is a model of?

Page 25: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Conceptualization (SBML)

• 2 kinds of entities:
  o in silico: model components
  o in vivo: the entities represented by a model

Page 26: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Conceptualization


Page 27: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML Conceptualization

• Instances of SBML model entities are syntactic entities (in XML)

• SBML models represent biological phenomena and structures (e.g., Cell cycle processes, Yeast cells, ...)

• Here we focus on Model, Compartment, Species, Reaction


Page 28: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Formalization

• Formalization is the process by which we map a conceptualization into a logical representation, which has a particular interpretation.
• We first express the basic nature of what the terms refer to by defining them using a formal language. Next, we can logically combine the terms to form expressions, which have an unambiguous interpretation and hence can be automatically reasoned about.

Page 29: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Have you heard of the Semantic Web?

Page 30: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

The Semantic Web

It is about standards for publishing, sharing and querying knowledge drawn from diverse sources

It enables the answering of sophisticated questions


Page 31: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

The Semantic Web effort aims to develop an interoperable set of standards for knowledge representation and reasoning


Page 32: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

URI/IRI

• Uniform Resource Identifiers (URI) and Internationalized Resource Identifiers (IRI) are identifiers for resources, given a particular protocol
• We're familiar with Uniform Resource Locators (URLs), which specify the use of the HTTP protocol to obtain a document with that identifier
  – http://dumontierlab.com
• Internationalized Resource Identifiers (IRIs) include an expanded set of international characters
• URI/IRIs are the basis for naming resources on the Semantic Web
  – As names, they can also be used to identify non-information resources, like people and places

Page 33: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Entity naming

• Uniform Resource Identifiers (URI) are identifiers for resources given a particular protocol. Internationalized Resource Identifiers (IRI) include an expanded set of international characters
• URI/IRIs can be used to name entities, both for digital media and non-informational entities like people and places
• Uniform Resource Name (URN) – only a name
  o MIRIAM - Minimal Information Required In the Annotation of Models
  o data source and identifier combined in a single IRI - urn:miriam:source:identifier
  o e.g. urn:miriam:uniprot:P62158
  o ~40 sources defined at EBI registry...
• Uniform Resource Locator (URL) – a resolvable name
  o Bio2RDF - Makes life sciences data available on the Semantic Web
  o http://bio2rdf.org/uniprot:P62158
  o content-type negotiation and explicit URLs resolve to an HTML/RDF/etc. description of it
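The two naming schemes above can be related mechanically. The sketch below splits a MIRIAM URN into its source and identifier and builds a Bio2RDF-style URL from them. Both function names are hypothetical, and reusing the MIRIAM source name as the Bio2RDF namespace is an assumption that happens to hold for the uniprot example on the slide; other sources (e.g. obo.chebi) may need a lookup table.

```python
from urllib.parse import unquote

MIRIAM_PREFIX = "urn:miriam:"

def parse_miriam_urn(urn):
    """Split 'urn:miriam:source:identifier' into (source, identifier)."""
    if not urn.startswith(MIRIAM_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    source, _, identifier = urn[len(MIRIAM_PREFIX):].partition(":")
    if not source or not identifier:
        raise ValueError(f"missing source or identifier: {urn!r}")
    # Identifiers are percent-encoded in the annotations (e.g. CHEBI%3A4167).
    return source, unquote(identifier)

def to_bio2rdf_url(urn):
    """Assumed mapping: reuse the MIRIAM source as the Bio2RDF namespace."""
    source, identifier = parse_miriam_urn(urn)
    return f"http://bio2rdf.org/{source}:{identifier}"

print(parse_miriam_urn("urn:miriam:obo.chebi:CHEBI%3A4167"))
print(to_bio2rdf_url("urn:miriam:uniprot:P62158"))
```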

Page 34: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Semantic Technologies: RDF vs OWL

RDF: simple triples, graph-based queries, supports very large amounts of data

OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but can be rather slow

Page 35: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Resource Description Framework (RDF)

Uniform Resource Identifiers (URIs) can be used as entity names. Bio2RDF specifies its naming convention:
• http://bio2rdf.org/uniprot:P05067 is a name for Amyloid precursor protein
• http://bio2rdf.org/omim:104300 is a name for Alzheimer disease

Short forms: uniprot:P05067, omim:104300

Allows one to talk about anything

Page 36: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Resource Description Framework (RDF)

An RDF statement consists of:
– Subject: resource identified by a URI
– Predicate: resource identified by a URI
– Object: resource or literal

Example statements:
uniprot:P05067 rdf:type uniprot:Protein
uniprot:P05067 rdfs:label "Amyloid precursor protein"

Allows one to express statements

Page 37: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

RDF has multiple serializations

RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [ <!ENTITY u "http://bio2rdf.org/uniprot:"> ]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:u="http://bio2rdf.org/uniprot:">
  <rdf:Description rdf:about="&u;Q16665">
    <rdf:type rdf:resource="&u;Protein"/>
  </rdf:Description>
</rdf:RDF>

RDF/N3
@prefix u: <http://bio2rdf.org/uniprot:> .
u:Q16665 a u:Protein .

Page 38: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Multi-Source Data Integration

[Figure: statements about uniprot:P05067 (is a uniprot:Protein; located in go:Membrane; interacts with; has name) drawn from UniProt, Gene Ontology and iRefIndex are combined (+) into a unified view.]

Syntactic data integration depends on consistent naming
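A minimal sketch of why consistent naming matters: if every source uses the same name for a protein, integration reduces to a set union of triples. The predicate spellings and the contents of each source below are illustrative stand-ins for the figure, not actual records from UniProt, Gene Ontology or iRefIndex.

```python
# Each source is a set of (subject, predicate, object) triples (illustrative data).
uniprot = {
    ("uniprot:P05067", "rdf:type", "uniprot:Protein"),
    ("uniprot:P05067", "rdfs:label", "Amyloid precursor protein"),
}
gene_ontology = {("uniprot:P05067", "located_in", "go:Membrane")}
irefindex = {("uniprot:P05067", "interacts_with", "uniprot:P05067")}

def unified_view(*sources):
    """Union all triples and group them by subject name."""
    view = {}
    for s, p, o in sorted(set().union(*sources)):
        view.setdefault(s, []).append((p, o))
    return view

view = unified_view(uniprot, gene_ontology, irefindex)
print(len(view["uniprot:P05067"]), "statements about uniprot:P05067")

# A source that spells the name differently does not merge with the rest.
misnamed = {("UniProtKB:P05067", "located_in", "go:Membrane")}
split_view = unified_view(uniprot, misnamed)
print(sorted(split_view))
```

The second call shows the failure mode: two spellings of the same protein yield two unrelated subjects, and nothing in the data says they denote the same thing.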

Page 39: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Building statements creates knowledge

[Figure: a small graph of statements]
uniprot:P05067 is a Protein
uniprot:P05067 label "Amyloid precursor protein"
omim:104300 is a Disease
omim:104300 label "Alzheimer Disease"
uniprot:P05067 is involved in omim:104300

Page 40: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Bio2RDF’s RDFized data fits together

syntactic integration

Page 41: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SGD as RDF-based Linked Open Data


Page 42: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Bio2RDF links and provisions 40 high value datasets


Page 43: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Bio2RDF now serving over 40 billion triples of linked biological data


Page 44: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

SGD is provided by Bio2RDF and forms part of the growing linked open data cloud


Page 45: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Semantic Integration

• Requires a level of abstraction/generalization where the relationship between each resource is formalized
  – classes
  – relations
  – individuals
• How do we ensure that our representation facilitates integration across datasets?
• How can we get our formalization to interoperate with ontologies?

Page 46: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

RDF-based Linked Data

• Provides the basis for simple data syndication and syntactic data integration
  o IRIs
  o Statements (aka triples) take the form <subject> <predicate> <object>
• Easy to implement
  o stand-alone datasets
  o logical layer over databases
• Limited reasoning
  o class and property hierarchies
  o domain/range restrictions
  o can't automatically discover inconsistency
• Standardized queries - SPARQL
• Scalable - to billions of triples
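The graph-pattern style of querying that SPARQL standardizes can be illustrated with a toy matcher over in-memory triples. This is a sketch, not a SPARQL engine: `match` and `diseases_involving` are hypothetical helpers, and the predicate name `is_involved_in` is an illustrative spelling of the relation drawn on the earlier slide.

```python
TRIPLES = {
    ("uniprot:P05067", "rdf:type", "Protein"),
    ("uniprot:P05067", "rdfs:label", "Amyloid precursor protein"),
    ("uniprot:P05067", "is_involved_in", "omim:104300"),
    ("omim:104300", "rdf:type", "Disease"),
    ("omim:104300", "rdfs:label", "Alzheimer Disease"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching one pattern; None plays the role of a variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def diseases_involving(protein, triples):
    """Join two patterns: <protein> is_involved_in ?d  and  ?d rdf:type Disease."""
    return [
        d for _, _, d in match(triples, s=protein, p="is_involved_in")
        if match(triples, s=d, p="rdf:type", o="Disease")
    ]

print(match(TRIPLES, p="rdfs:label"))
print(diseases_involving("uniprot:P05067", TRIPLES))
```

Note that this only retrieves what is asserted: nothing here would detect a protein that was also typed as a Disease, which is the limitation the "Limited reasoning" bullet points at.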

Page 47: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

What do you know of OWL?

Page 48: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

The Web Ontology Language (OWL) Has Explicit Semantics

Can therefore be used to capture knowledge in a machine understandable way


Page 49: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL - The Web Ontology Language

• Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values
  o quantifiers (existential, universal, cardinality restriction)
  o negation
  o disjunction
  o property characteristics
  o complex classes in domain and range restrictions
  o property chains
• Advanced reasoning

Page 50: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Advanced Reasoning

• Consistency: determines whether the ontology contains contradictions.
• Satisfiability: determines whether classes can have instances.
• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repetitive application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to.
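A toy version of two of these tasks, restricted to named classes with asserted subclass and disjointness axioms, can be sketched as follows. It is far weaker than an OWL reasoner (no class expressions, no properties), and the class names "enzyme" and "chimera" are hypothetical examples.

```python
# Asserted ("told") axioms: class -> set of direct superclasses.
TOLD = {
    "enzyme": {"protein"},
    "protein": {"molecule"},
    "DNA": {"molecule"},
    "chimera": {"protein", "DNA"},   # hypothetical modelling error
}
DISJOINT = {frozenset({"protein", "DNA"})}

def superclasses(cls, told):
    """All classes reachable by following subclass links upwards."""
    seen, stack = set(), [cls]
    while stack:
        for parent in told.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_subsumed_by(c1, c2, told):
    """Subsumption: is c1 (implicitly) a subclass of c2?"""
    return c1 == c2 or c2 in superclasses(c1, told)

def is_unsatisfiable(cls, told, disjoint):
    """A class under two disjoint classes cannot have instances."""
    ancestors = superclasses(cls, told) | {cls}
    return any(pair <= ancestors for pair in disjoint)

print(is_subsumed_by("enzyme", "molecule", TOLD))
print([c for c in TOLD if is_unsatisfiable(c, TOLD, DISJOINT)])
```

Classification, in this restricted setting, amounts to running `is_subsumed_by` over every pair of named classes.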

Page 51: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Challenges and Solutions

Inconsistency:
• needs to be resolved to ask any questions involving the ontology
• Solution: explicitly accommodate multiple meanings, remove contradictory axioms

Unsatisfiability (of a class):
• may indicate a modelling error
• needs to be resolved to ask meaningful questions about the class
• Solution: explicitly accommodate multiple meanings, redefine class, remove contradicting class restrictions

Page 52: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Challenges and Solutions

Scalability:
• answering OWL queries requires reasoning
• inference in OWL is highly complex (worst case: 2NEXPTIME)
• highly optimized reasoners are getting better and better, but can still be slow with large ontologies
• tractable OWL profiles (EL, QL, RL) enable more efficient, guaranteed polynomial-time inference
• use ontology modularization approaches to increase performance

Page 53: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL can help you create rich, machine-understandable descriptions!

• transform our expert knowledge into axioms and expressions that can be automatically reasoned about
  o a transcription factor is a protein that binds to DNA and regulates the expression of a gene.
  o can we mine 'omic datasets to discover which proteins are transcription factors?
• create rich expressions from combinations of classes, relations and individuals
• assert statements of truth using axioms.

Page 54: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Linked data and OWL: Motivation

• use OWL reasoning to identify mistakes in RDF data
  o incorrect content of assertions
  o incorrect use of relations
  o conflicting conceptualizations
  o incorrect same-as assertions
• verify, fix and exploit Linked Data through expressive OWL reasoning
• generate/infer new triples to write back into RDF and use for efficient retrieval

Proposal: Represent SBML biomodels in OWL, starting from the implicit relations and explicit attributes in XML/RDF.

Page 55: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Elements of OWL 2.0

The "ontology" of OWL 2 consists of:
• Classes
• Object properties
• Data properties
• Individuals
• Expressions
• Axioms
• Plus RDF stuff (like datatypes)

Page 56: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Axiomatization

• Axioms are statements that are assumed to be true in the domain
• Axioms formally interrelate terms from the conceptualization step
• Every statement can be reduced to an expression based only on primitive terms
• Therefore: every axiom can be expressed using only primitive terms

Page 57: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Classes and class axioms

• a class is a set of individuals that share one or more characteristics
  o e.g. a protein
• classes can be organized in a hierarchy using subClassOf axioms
  o C2 subClassOf C1, i.e. every member of C2 is a member of C1
  o SubClassOf (protein molecule)
• special classes
  o owl:Thing is the superclass of all classes
  o owl:Nothing is the subclass of all classes, denotes the empty set
• classes can be made disjoint from one another
  o i.e. there is no member of C1 that is also a member of C2
  o DisjointClasses (protein DNA)
• classes can be said to be equivalent
  o i.e. all members of C1 are members of C2 and all members of C2 are members of C1
  o EquivalentClasses (Peptide Polypeptide)

Page 58: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Object Properties and axioms

• an object property (OP) is a relation between two individuals
  o 'has part' is an object property that denotes the mereological relation between two individuals
• OPs can be organized in a hierarchy
  o given OP1 and OP2, where OP2 is a subproperty of OP1: if an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y
  o SubObjectPropertyOf ('has proper part' 'has part')
  o owl:topObjectProperty, owl:bottomObjectProperty
• We can restrict the domain and range to allowed values
  o ObjectPropertyDomain ('is participant in' 'physical entity')
  o ObjectPropertyRange ('is participant in' 'process')
• We can also assert object properties to be disjoint or equivalent

Page 59: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

description of object properties

• Inverse
  o we say that 'has part' is an inverse of 'is part of'
  o we can also refer to this as inv('is part of')
• Symmetric
  o applies to cases where the inverse relation is the very same relation
  o e.g. the inverse of 'is related to' is 'is related to'
• Transitive
  o a relation is transitive if, whenever an individual x is connected by it to an individual y that is in turn connected by it to an individual z, x is also connected by it to z
  o e.g. 'has part' is transitive

Page 60: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

description of object properties

• Reflexive
  o a reflexive relation connects every individual back to itself
  o e.g. 'has part' is reflexive because a protein has itself as a part.
• Functional
  o each individual is connected by the relation to at most one individual, and therefore all values of the relation for a given individual must be the same.
  o e.g. 'has unique identifier'
• Inverse Functional
  o each individual is the value of the relation for at most one individual, and therefore all individuals connected to the same value must be the same
  o e.g. 'is unique identifier of'
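The property characteristics on the last two slides can be illustrated on plain sets of (x, y) pairs. This is a sketch with hypothetical helper names and toy data. One caveat: an OWL reasoner does not treat two values of a functional property as an error by itself; it infers that they denote the same individual unless they are declared different. The last helper merely flags such candidates.

```python
def inverse(pairs):
    """inv(R): swap the direction of every pair."""
    return {(y, x) for x, y in pairs}

def symmetric_closure(pairs):
    """A symmetric relation contains its own inverse."""
    return set(pairs) | inverse(pairs)

def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def functional_candidates(pairs):
    """Subjects with more than one value (to be merged or reported)."""
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x: ys for x, ys in values.items() if len(ys) > 1}

has_part = {("cell", "nucleus"), ("nucleus", "chromosome")}
print(sorted(transitive_closure(has_part)))
print(sorted(inverse(has_part)))          # 'is part of'
print(functional_candidates({("m1", "id1"), ("m1", "id2"), ("m2", "id3")}))
```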

Page 61: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Class Expressions

Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, datatype properties, individuals)

Protein subClassOf molecule and 'has proper part' min 2 'amino acid residues'

Combinations specified using logical operators
• conjunction (and), disjunction (or), negation (not)

Object or data property expressions provide a qualified cardinality over the relation
o minimum: rel min # Y
o maximum: rel max # Y
o exact: rel exactly # Y (minimum + maximum)
o some: rel min 1 Y

Page 62: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Class Expressions

o The quantifications can be qualified by the object type
o rel only Y – the only values allowed are of type Y

• These can be combined to form complex class expressions like
o 'molecule' and not 'dna'
o 'has part' min 2 'amino acid'
o 'is located in' only ('nucleus' or 'cytoplasm')

• and be expressed as axioms in the ontology
Protein subClassOf molecule and 'has proper part' min 2 'amino acid residues'
'Transcription Factor' equivalentTo 'protein' and 'has disposition' some 'to bind to DNA' and 'has function' some 'to regulate gene expression'
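To get a feeling for what min, max, exactly, some and only mean, the restrictions can be evaluated over a handful of explicitly listed individuals. This is a sketch under a closed-world assumption (everything not listed is taken to be absent), which is not how OWL reasons; all names and data below are hypothetical.

```python
# Toy instance data (hypothetical): the types of each individual and its 'has part' links.
TYPES = {
    "p1": {"molecule"},
    "r1": {"amino acid"},
    "r2": {"amino acid"},
    "w1": {"water"},
}
HAS_PART = {"p1": {"r1", "r2", "w1"}}


def count_fillers(individual, relation, filler_class):
    """Number of related individuals that are of type filler_class."""
    return sum(1 for y in relation.get(individual, ()) if filler_class in TYPES[y])


def satisfies_min(individual, relation, n, filler_class):
    return count_fillers(individual, relation, filler_class) >= n


def satisfies_max(individual, relation, n, filler_class):
    return count_fillers(individual, relation, filler_class) <= n


def satisfies_exactly(individual, relation, n, filler_class):
    # exactly n is the combination of min n and max n
    return (satisfies_min(individual, relation, n, filler_class)
            and satisfies_max(individual, relation, n, filler_class))


def satisfies_some(individual, relation, filler_class):
    # some is shorthand for min 1
    return satisfies_min(individual, relation, 1, filler_class)


def satisfies_only(individual, relation, filler_class):
    # only: every related individual must be of type filler_class
    return all(filler_class in TYPES[y] for y in relation.get(individual, ()))
```

With this data, p1 satisfies 'has part' min 2 'amino acid' but not 'has part' only 'amino acid', because w1 is a water molecule.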

Page 63: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

What do the following mean, and what biological thing might you annotate with it?

C equivalentTo 'has part' exactly 2 polypeptide

M subClassOf DNA and not molecule

Page 64: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL has multiple syntaxes

Functional-Style Syntax
ClassAssertion( :Person :Robert )

RDF Syntax
RDF/XML
<Person rdf:about="Robert"/>
RDF Turtle
:Robert rdf:type :Person .

Manchester Syntax
Individual: Robert
Types: Person

OWL/XML Syntax
<ClassAssertion>
<Class IRI="Person"/>
<NamedIndividual IRI="Robert"/>
</ClassAssertion>

Page 65: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Reasoners

OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• FaCT++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.

OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:

Page 66: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Formalization of XML/RDF using OWL

• For every triple, we want to create an axiom that makes a commitment as to what the terms refer to and what their combination necessarily implies.

• We will also commit to expressing our knowledge in a consistent manner, and this will allow other information resources to be semantically integrated (the expressions are comparable and share the same semantics)


Page 67: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Triples to axioms

Convert RDF triples into OWL axioms.
Triple in RDF: <nucleus> <part-of> <cell>
• Nucleus and Cell are classes
• part-of is a relation between 2 classes
• intended meaning: every instance of Nucleus is part-of some instance of Cell
• formalize as OWL axiom: Nucleus SubClassOf: part-of some Cell

Page 68: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise come in handy here!

Convert RDF triples into OWL axioms.
Triple in RDF: <C1 R C2>
• C1 and C2 are classes, R a relation between 2 classes
• possible intended meanings:
o C1 SubClassOf: C2
o C1 SubClassOf: R some C2
o C1 SubClassOf: R only C2
o C2 SubClassOf: R some C1
o C1 SubClassOf: S some C2
o C1 DisjointFrom: C2
o C1 and C2 SubClassOf: owl:Nothing
o R some C1 DisjointFrom: R some C2
o C1 EquivalentTo: C2
o ...
• in general: P(C1, C2), where P is an OWL axiom (template)

Challenge: Formalizing data requires one to commit to a particular meaning – to make an ontological commitment

Page 69: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Triples to axioms

Triple in RDF: <Cytosol> <isLocationOf> <HXK1>
• Cytosol and HXK1 are classes
• isLocationOf is an axiom pattern involving 2 classes
• intended meaning: every instance of HXK1 is located at some instance of Cytosol
• not intended: for every instance of Cytosol, there is an instance of HXK1 located in it.

HXK1 subClassOf hasLocation some Cytosol
or, using the inverse property: HXK1 subClassOf inv(isLocationOf) some Cytosol

Page 70: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Triples to axioms: Challenges

Formalizing RDF triples in OWL may introduce new OWL object properties.
• Which object properties should be included?
• What axioms hold for included object properties?
• Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them?

Integration of OWL ontologies requires a common semantic platform

Page 71: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Axiom Patterns for Triples

<nucleus> <part-of> <cell>
?X part-of ?Y
• translated to axiom pattern: ?X subClassOf: part-of some ?Y
-> Nucleus subClassOf: part-of some Cell
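The translation from triples to axioms can be sketched as a lookup of a template per predicate, followed by filling the two variables. This is an illustration only: the pattern table and function name are hypothetical, and the output is a Manchester-style string rather than an axiom object built with the OWL API.

```python
import re

# Hypothetical pattern table: RDF predicate -> axiom template with variables ?X and ?Y,
# where ?X is the subject and ?Y the object of the triple.
AXIOM_PATTERNS = {
    "part-of": "?X SubClassOf: part-of some ?Y",
    # The subject of isLocationOf is the location, so the axiom is stated about ?Y.
    "isLocationOf": "?Y SubClassOf: hasLocation some ?X",
}


def triple_to_axiom(subject, predicate, obj):
    """Instantiate the axiom pattern registered for the predicate."""
    bindings = {"?X": subject, "?Y": obj}
    template = AXIOM_PATTERNS[predicate]
    # Substitute both variables in a single pass over the template.
    return re.sub(r"\?[XY]", lambda match: bindings[match.group(0)], template)


nucleus_axiom = triple_to_axiom("Nucleus", "part-of", "Cell")
hxk1_axiom = triple_to_axiom("Cytosol", "isLocationOf", "HXK1")
```

Choosing which template to register for a predicate is exactly the ontological commitment discussed above; the code only applies a decision that a domain expert has already made.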

Page 72: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Implementation

• expand relations in RDF based on relational patterns
• relational patterns are OWL axioms with 2 variables (which are filled by subject and object, respectively)
• implementation based on OWL API
• adopt implementation of relational patterns in OBO language (http://code.google.com/p/obo2owl/)

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED).
paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo
BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441

Page 73: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Another way?

• OPPL is an abstract formalism that allows for manipulating ontologies written in OWL.

• Use OPPL to select triples and create the axioms


http://oppl2.sourceforge.net/

Page 74: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification
Page 75: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Which types and relations should we use for our axiom patterns?

Page 76: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Top level ontologies contain generalized (domain independent) classes and relations

They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms).

Page 77: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Basic classes in top-level ontologies

• Material entity
o Example: Apple, Human, Cell, Planet
o Has mass as a quality
o Located in space and time
o Independent of other entities
o it exists in whole whenever it exists

• Quality
o Example: mass, color, concentration
o Dependent: always the quality of some entity
o Quality of object: size, shape, length
o Quality of process: duration, rate
o Quality of quality: shade (of color), intensity

Page 78: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Basic classes in top-level ontologies

• Function
o e.g. to bind, to catalyze (a reaction), to kill bacteria
o Dependent: always the function of some thing
o Similar to a property of an object
o Represents the potential to do something (an action) in some process
o capabilities, dispositions and tendencies

• Process
o Example: running a marathon, binding, cell division
o Located in space and time
o Independent of other entities
o Temporally extended

Page 79: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Top-level ontologies can make a commitment to these being disjoint

Material object, Process, Function and Quality are mutually disjoint.


Page 80: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Basic Relations in Top Level Ontologies

• relations (object properties) in OWL hold between instances

• Mereological: parthood – 'has part', 'has proper part', 'has component part'
• Participatory – 'is participant in', 'is agent in', 'is target in'
• Spatial – 'is connected to', 'located in', 'contains', 'is adjacent to'
• Temporal – 'derives from', 'precedes', 'meets', 'overlaps', etc
• Referential – 'describes', 'denotes', 'represents'

Page 81: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Relations in top-level ontologies

• domain and range restrictions from top-level ontology can be applied for general relations, e.g.:
o 'has material part' can be restricted with "Material object" as both domain and range
o 'participates in' can be restricted with a domain of "Material object" and a range of "Process"
• re-use of relations (between instances) enables inferences across resources

Page 82: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Relations impose additional constraints, such that inconsistencies arise when incorrectly used
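How such an inconsistency arises can be sketched with the restrictions and the disjointness commitment from the preceding slides. In OWL, domain and range axioms do not reject data; they let the reasoner infer a type, and the contradiction only appears because that inferred type is disjoint from the asserted one. The alignment table and function name below are hypothetical.

```python
# Top-level classes asserted to be mutually disjoint (as stated earlier in the tutorial).
DISJOINT_TOP = {"Material object", "Process", "Function", "Quality"}

# Hypothetical alignment of a few domain classes to top-level classes.
TOP_LEVEL_OF = {
    "Cell": "Material object",
    "GTP": "Material object",
    "Glycolysis": "Process",
    "GTP binding": "Function",
}

# Domain and range restrictions for two general relations (from the slides).
RESTRICTIONS = {
    "has material part": ("Material object", "Material object"),
    "participates in": ("Material object", "Process"),
}


def check_use(subject, relation, obj):
    """List the conflicts caused by using `relation` between `subject` and `obj`."""
    domain, range_ = RESTRICTIONS[relation]
    conflicts = []
    for role, cls, required in (("domain", subject, domain), ("range", obj, range_)):
        actual = TOP_LEVEL_OF[cls]
        # The inferred type (required) clashes with the asserted type (actual)
        # only when the two are different and known to be disjoint.
        if actual != required and {actual, required} <= DISJOINT_TOP:
            conflicts.append(
                f"{role}: {cls} is a {actual}, but {relation!r} requires {required}"
            )
    return conflicts
```

For example, `check_use("GTP", "participates in", "Glycolysis")` returns no conflicts, while swapping subject and object yields one conflict for the domain and one for the range.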


Page 83: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Alignment with top-level ontology

Foundation of domain classes and relations in top-level ontology:
• every domain class becomes a subclass of a class in top-level ontology
• every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology
• assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate)
o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, the additional constraint can be added to this relation

Page 84: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

What’s the role of top level ontologies?

Page 85: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Top-level ontology

Application of a top-level ontology:
• can help to make the ontological commitment that is employed within an information system explicit,
• can guarantee basic agreement about fundamental, common types,
• can guarantee basic agreement about common relations,
• provides common domain and range restrictions across multiple domains, and therefore
• enables re-use of relations and types across data sources, domains, levels of granularity, information systems.

Page 86: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Formalization of SBML Models:

• SBML models and model annotations are converted into OWL axioms by making SBML's ontological commitment explicit
• Implementation as conversion patterns

An explicit ontological commitment establishes and implements a one-to-one correspondence between SBML expressions and a formal interpretation within an ontology.

Page 87: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Bridging the gap: combine in vivo entities and in silico entities in a common model (an ontology) defined with axioms

Page 88: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Formalization

Reaction: A reaction represents some transformation, transport or binding process, typically a chemical reaction, that can change the amount of one or more species. (Hucka et al.) vs a Model component that is part-of a Model and represents some Process


Page 89: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Formalizing SBML models using OWL

Model component(x): a model entity that is part of a model 'model component' equivalentClass 'model entity' that 'is part of' some 'model'


Page 90: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Axiom: Model SubClassOf: represents some MaterialEntity

Conversion rule: a Model M annotated with class C represents:
• If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C
• If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)
• If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))

Assumption 1: Every model represents a material entity
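The conversion rule is a three-way case distinction on the top-level kind of the annotation class, which can be sketched as follows. The function name is hypothetical, the top-level kind is passed in by hand (in practice it would come from classifying C against the top-level ontology), and the result is a Manchester-style string rather than an OWL API axiom.

```python
def model_annotation_axiom(model, annotation, top_level_kind):
    """Build the axiom for a model annotated with class `annotation`.

    `top_level_kind` is the top-level class that the annotation falls under:
    "MaterialEntity", "Function" or "Process".
    """
    if top_level_kind == "MaterialEntity":
        filler = annotation
    elif top_level_kind == "Function":
        filler = f"(has-function some {annotation})"
    elif top_level_kind == "Process":
        filler = f"(has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for {top_level_kind!r}")
    return f"{model} SubClassOf: represents some {filler}"


# A model M annotated with a process class, here the GO identifier used for BIOMODEL 82.
process_axiom = model_annotation_axiom("M", "GO:0031684", "Process")
```

In every case the model still represents a material entity, so Assumption 1 is preserved: a function or process annotation is attached to the represented object through has-function and realized-by.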

Page 91: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

BIOMODEL 82: Converting Model

Annotated with heterotrimeric G-protein complex cycle (GO:0031684):
• represents an object O1
• O1 has a function F1
• F1 is realized by processes of the type heterotrimeric G-protein complex cycle
• M SubClassOf: represents some O1
• O1 SubClassOf: has-function some (realized-by only GO:0031684)

Page 92: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Compartment(x): a model component that represents a material object which is part of the object represented by the model to which the component belongs

Compartment subClassOf 'model component' and represents some 'Material object'

Conversion rule:
• represents an object O2
• part of the object represented by the model
• compartment's species represent objects that are located in O2
• C SubClassOf: represents some A2
• A2 SubClassOf: located-in some A1

Assumption 2: Every compartment represents a material object

Page 93: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

BIOMODEL 82: Converting Compartment “Cell”

Annotated with Cell (GO:0005623)
• represents an object O2
• O2 is a kind of Cell
• O2 is a part of O1 (represented by BIOMODEL 82)
• C SubClassOf: represents some O2
• O2 SubClassOf: Cell and part-of some O1

Page 94: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Species(x): a model component that represents a material object which is part of the entity represented by the compartment of which the species is a part

Species subClassOf 'model component' and represents some 'Material object'

Species represents an O3 which
• can have functions
• the functions can be realized by processes
• can have qualities (charge, amount, …)
• is located in O2

Assumption 3: Every species represents a material object

Page 95: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

BIOMODEL 82: Converting Species “GTP”

Annotated with GTP (CHEBI:15996)
• represents an object O3
• O3 is a kind of GTP
• O3 is located-in O2 (represented by “Cell” compartment)
• S SubClassOf: represents some O3
• O3 SubClassOf: GTP and located-in some O2
• O3 SubClassOf: GTP and located-in some (Cell and part-of some (has-function some (realized-by only GO:0031684)))

Page 96: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Reactions as Functions, not Processes

Reactions represent Functions. Why not processes?
- Functions are capabilities while processes are manifestations of these capabilities
- Processes have a duration, a time of occurrence, participants, etc.
- Functions can be realized multiple times, processes occur only once
- Processes may be represented by simulations

Page 97: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Reaction(x): a model component that can include reactants, products and modifiers and represents a functional entity
Reaction subClassOf 'model component' and 'represents' some ('material entity' and 'has function' some Function)

ListOfReactions(x): a List that has only Reactions as members
ListOfReactions EquivalentTo: List and 'has member' only 'reaction'

Assumption 4: Every reaction represents a functional entity

Page 98: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

BIOMODEL 82: Converting Reaction “GTP-binding”

Annotated with GTP binding (GO:0005525)
• represents an object O4
• O4 has a function F4
• F4 is a kind of GTP binding
• F4 is realized by P4
• P4 has-input O3 (GTP)
• R SubClassOf: represents some (has-function some F4)
• F4 SubClassOf: GTP binding and realized-by only P4
• P4 SubClassOf: has-input some O3

Page 99: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 100: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

How would you formalize a model annotated with:
A) heart
B) to pump blood
C) heart palpitations

Page 101: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML2OWL: Implementation

1. Read the model
• libSBML - http://sbml.org/Software/libSBML

2. Extract annotations from model & components
• libSBML & Jena - http://jena.sourceforge.net

3. Formalize each annotation according to the formalization rules
• OWLAPI - http://owlapi.sourceforge.net/

4. Integrate with external ontologies
• OWLAPI

5. Reasoning
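Steps 1 and 2 can be illustrated without the libraries named above. The sketch below uses only Python's standard XML parser as a stand-in for libSBML and Jena, and it runs on a hand-written SBML fragment; the fragment, the function name and the MIRIAM-style URN in it are illustrative assumptions, not output of the actual pipeline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written toy SBML fragment with one annotated compartment.
SBML_SNIPPET = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" metaid="meta_cell">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_cell">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0005623"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </compartment>
    </listOfCompartments>
  </model>
</sbml>"""


def extract_annotations(sbml_text):
    """Map the id of each directly annotated element to its annotation URIs."""
    root = ET.fromstring(sbml_text)
    result = {}
    for element in root.iter():
        element_id = element.get("id")
        # Look for an <annotation> element among the direct children only.
        annotation = next(
            (child for child in element if child.tag.endswith("}annotation")), None
        )
        if element_id is None or annotation is None:
            continue
        resources = (
            li.get(f"{{{RDF_NS}}}resource") for li in annotation.iter(f"{{{RDF_NS}}}li")
        )
        # Decode percent-escapes such as %3A so that the ontology identifier is readable.
        result[element_id] = [unquote(uri) for uri in resources if uri]
    return result


annotations = extract_annotations(SBML_SNIPPET)
```

Each extracted URI would then be resolved to an ontology class and passed to the formalization rules in step 3.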

Page 102: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML2OWL: Implementation

Application to BioModels repository yields:
• OWL ontology with
o more than 300,000 classes
o more than 800,000 axioms
o 90,000 complex model annotations
• includes all referenced ontologies
o GO
o ChEBI
o Celltype
o FMA
o PATO
o (KEGG, Reactome)

Page 103: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML2OWL: Implementation

OWLAPI:
• Ontology consists of
o a signature (classes, object properties, individuals)
o a set of axioms

Page 104: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

SBML2OWL: Implementation

Reference implementation: SBML Harvester http://code.google.com/p/sbmlharvester/

Page 105: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Verification, querying, integration

What can we do with the combined knowledge base? 1. Verification 2. Querying 3. Interoperability and knowledge integration


Page 106: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 107: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Practical reasoning with OWL ontologies

• Ontology editors such as Protege interface with reasoners to perform consistency and class satisfiability checks, classification and realization, and to provide explanations.
• Some reasoners are set up to be used from the command line to execute requests, including SPARQL querying.
• Programmatic use of reasoners via APIs offers maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate

Page 108: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Operations on OWL ontologies

Consistency checking will identify contradictions in the stated and inferred knowledge. Consistency checking also helps to implement other reasoning tasks:
• Satisfiability: determines whether classes can have instances.
• Subsumption: is class C1 implicitly a subclass of C2? Check if C1 and not C2 is unsatisfiable, i.e., there is no instance of C1 that is not also an instance of C2
• Classification: repeated application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to. Does individual a classify into the class C? Check whether a : ¬C is consistent with the underlying ontology; if it is not, a is an instance of C.
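Classification and realization can be illustrated on a taxonomy of named classes only. The sketch below merely follows asserted subclass links, whereas a DL reasoner decides subsumption through the unsatisfiability test described above and also handles complex class expressions; the class names and function names are hypothetical.

```python
# Asserted (direct) subclass links between named classes; toy data.
ASSERTED = {
    "Kinase": {"Enzyme"},
    "Enzyme": {"Protein"},
    "Protein": {"Molecule"},
    "Molecule": {"MaterialEntity"},
}


def superclasses(cls):
    """All named superclasses of cls, asserted or implied by chains of links."""
    seen, stack = set(), [cls]
    while stack:
        for parent in ASSERTED.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed(c1, c2):
    """Subsumption restricted to named classes: is c1 a subclass of c2?"""
    return c1 == c2 or c2 in superclasses(c1)


def classify():
    """Classification: compute the full set of superclasses for every named class."""
    classes = set(ASSERTED) | {p for parents in ASSERTED.values() for p in parents}
    return {cls: superclasses(cls) for cls in classes}


def realize(asserted_types):
    """Realization: the most specific classes among all types of an individual."""
    all_types = set(asserted_types)
    for cls in asserted_types:
        all_types |= superclasses(cls)
    return {
        cls for cls in all_types
        if not any(other != cls and is_subsumed(other, cls) for other in all_types)
    }


taxonomy = classify()
```

Here the implicit link from Kinase to Molecule is discovered by classification, and an individual asserted to be both a Protein and a Kinase is realized as a Kinase.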

Page 109: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Classifying the ontology


Page 110: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 111: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 112: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Verification

• Use of OWL reasoning for classification
• Which classes are unsatisfiable?
• Unsatisfiable classes are equivalent to owl:Nothing

Page 113: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Model verification

After reasoning, we found 27 models to be inconsistent. Reasons:
1. our representation - functions are sometimes found in the place of physical entities (e.g. entities that secrete insulin); it is better to constrain these with appropriate relations
2. SBML abused - species used as a measure of time
3. constraints in the ontologies themselves mean that the annotation is simply not possible

Page 114: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Compartments/species annotated with functions or processes

Page 115: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 116: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Biological inconsistency: Biomodel 176

[Term]
id: GO:0016887
name: ATPase activity
is_a: GO:0017111
intersection_of: GO:0003824 ! catalytic activity
intersection_of: has_input CHEBI:15377 ! water
intersection_of: has_input CHEBI:15422 ! ATP
intersection_of: has_output CHEBI:16761 ! ADP
intersection_of: has_output CHEBI:26020 ! phosphates

Page 117: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Finding inconsistencies with axiomatically enhanced ontologies

We add:
• GO: ATP + Water the only inputs (=2 quantification)
• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all different (disjointness)
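Why both additions are needed can be sketched in a few lines. The closure on inputs says what is allowed, but an unexpected input only becomes a contradiction once it is known to be disjoint from the allowed ones; without disjointness it could, under the open-world assumption, still turn out to be ATP or water. The function name and the example inputs are hypothetical and do not reproduce the actual content of Biomodel 176.

```python
# Closure added to GO (from the slide): ATPase activity has only ATP and water as inputs.
ALLOWED_INPUTS = {"ATPase activity": {"ATP", "water"}}

# Disjointness added to ChEBI (from the slide): these classes share no instances.
MUTUALLY_DISJOINT = {"ATP", "water", "alpha-D-glucose 6-phosphate"}


def input_conflicts(function, annotated_inputs):
    """Inputs that provably contradict the closure axiom for `function`."""
    allowed = ALLOWED_INPUTS[function]
    conflicts = set()
    for chemical in annotated_inputs:
        if chemical in allowed:
            continue
        # A contradiction is only provable when the chemical is disjoint
        # from every allowed input.
        if chemical in MUTUALLY_DISJOINT and allowed <= MUTUALLY_DISJOINT:
            conflicts.add(chemical)
    return conflicts


# Hypothetical reaction annotated with ATPase activity but consuming a sugar phosphate.
found = input_conflicts("ATPase activity", ["ATP", "alpha-D-glucose 6-phosphate"])
```

An input such as ADP, for which no disjointness is asserted in this toy data, is not reported, which mirrors how a reasoner stays silent when the ontology is under-axiomatized.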

Page 118: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Consistency repair

• Unsatisfiable classes result from contradictory class definitions

• Conflict in asserted axioms, in imported ontologies or through combination of both

• Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc.

• Conflicting axioms may be challenging to identify!


Page 119: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification


Page 120: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Protege 4: Explanation Workbench


Page 121: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Ontology repair and disambiguation

• Ontological commitment may have been too strong
• Complex relations (between classes) can be relaxed by explicitly introducing a disjunction
• Example:
o Assumption 1: models represent material objects
o model is annotated with the process Glycolysis
o process and material object are disjoint, therefore the KB will contain unsatisfiable classes

Page 122: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Disambiguation pattern

• disambiguation pattern: a model annotated with X represents material objects X, or material objects with function X, or material objects with a function that is realized by X.
• disambiguation patterns are applicable if the multiple alternatives are mutually disjoint
• automated reasoning will then eliminate all but one option

Page 123: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
C SubClassOf: MaterialEntity

Then:
• represents some C is satisfiable
• represents some (has-function some C) and represents some (has-function some (realized-by only C)) are unsatisfiable

Page 124: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
C SubClassOf: Function

Then:
• represents some (has-function some C) is satisfiable
• represents some C and represents some (has-function some (realized-by only C)) are unsatisfiable

Page 125: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
C SubClassOf: Process

Then:
• represents some (has-function some (realized-by only C)) is satisfiable
• represents some C and represents some (has-function some C) are unsatisfiable
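The three cases follow one scheme: assert the disjunction once, then let the top-level kind of C decide which disjunct survives. The sketch below hard-codes the outcome stated on the slides rather than deriving it by reasoning; the table and function names are hypothetical.

```python
# Each disjunct paired with the top-level kind of C under which it stays satisfiable
# (taken from the three cases on the slides).
DISJUNCTS = [
    ("MaterialEntity", "represents some {C}"),
    ("Function", "represents some (has-function some {C})"),
    ("Process", "represents some (has-function some (realized-by only {C}))"),
]


def disjunctive_axiom(model, annotation):
    """The relaxed axiom that is asserted before the kind of the annotation is known."""
    options = " or ".join(template.format(C=annotation) for _, template in DISJUNCTS)
    return f"{model} SubClassOf: {options}"


def surviving_disjuncts(annotation, top_level_kind):
    """Disjuncts that remain satisfiable once the annotation has been classified."""
    return [
        template.format(C=annotation)
        for kind, template in DISJUNCTS
        if kind == top_level_kind
    ]


relaxed = disjunctive_axiom("M", "Glycolysis")
remaining = surviving_disjuncts("Glycolysis", "Process")
```

If the annotation falls under none of the three kinds (for instance a Quality), no disjunct survives and the model class is again unsatisfiable, which signals that the pattern needs a further alternative.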

Page 126: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Aside from the disjunction pattern, what else could be used for consistency repair?

Page 127: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Once consistent, we can query the ontology and infer new knowledge

what would YOU ask of your formalized knowledge base?

Page 128: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• All queries are of the form:
o Query class: Y
o List all subclasses (and descendant classes), equivalent classes, superclasses (and ancestor classes)
o Some OWL reasoners perform only classification and output the classified taxonomy

Page 129: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all models
• Query type: subclasses
• Query class: Model

Page 130: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all reactions that are part of BIOMD0000000169
• Query type: subclasses
• Query class: Reaction and part-of some BIOMD0000000169

Page 131: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all models that represent Glycolysis
• Query type: subclasses
• Query class: Model and represents some (has-function some (realized-by only Glycolysis))

Page 132: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all models that have a compartment that represents a part of a Cell in which a sugar is located
• Query type: subclasses
• Query class: Model and has-part some (Compartment and represents some (part-of some Cell and contains some Sugar))


Page 133: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all Model entities that represent catalytic activity involving sugar in the endocrine pancreas
• Query type: subclasses
• Query class: represents some (has-function some 'catalytic activity' and realized-by only (has-participant some (sugar and contained-in some (part-of some 'Endocrine pancreas'))))


Page 134: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery and retrieval

• Query: list all Model entities that represent mutagenic central nervous system drugs in the gastrointestinal system
• Query type: subclasses
• Query class: represents some (has-part some ('has role' some 'central nervous system drug' and 'has role' some mutagen and part-of some 'Gastrointestinal system'))


Page 135: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Answering questions


Page 136: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Automated reasoning

• more than 800,000 axioms
• the included ontologies contain several thousand axioms
o GO has approx. 35,000 classes
o ChEBI contains almost 100,000 classes
o complex definitions of classes create links between large ontologies
• Reasoning in OWL 2 DL is highly complex (worst case 2NEXPTIME-complete, i.e. on the order of 2^(2^n), with n the number of operators used in the ontology)
• Consequence: OWL reasoning can rarely be employed on a large scale.
• Expressive OWL reasoners do not classify the formalized BioModels repository.
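To get a feeling for the 2^(2^n) worst-case bound, the short sketch below prints how many decimal digits the number has for small n. The function name is hypothetical. The bound describes the theoretical worst case, not the running time a reasoner needs on a typical ontology.

```python
def double_exponential_digits(n):
    """Number of decimal digits of 2**(2**n)."""
    return len(str(2 ** (2 ** n)))


# Print the size of the bound for n = 1..10 operators.
for n in range(1, 11):
    print(n, double_exponential_digits(n))
```

For n = 10 the value already has 309 digits, which is why worst-case behaviour rules out naive large-scale use of expressive reasoning.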

Page 137: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Reasoners

OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• FaCT++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.

OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:


Page 138: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Implementation in information systems

• Classification of model ontology: 10-120min
• Answering complex queries: up to several hours
• Consequence: OWL reasoning can rarely be employed on a large scale
• Subsets of OWL allow tractable (polynomial-time) automated reasoning
• OWL EL is suitable for ontologies with a large number of classes
• Problem: convert ontologies into a tractable subset of OWL

Page 139: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL Profiles

• OWL 2 defines three different tractable profiles:
• EL
o polynomial time reasoning for schema and data
o Useful for ontologies with large conceptual part
• QL
o fast (logspace) query answering using RDBMSs via SQL
o Useful for large datasets already stored in RDBs
• RL
o fast (polynomial) query answering using rule-extended DBs
o Useful for large datasets stored as RDF triples


Page 140: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL RL

Features:
• identity of classes, instances, properties
• subproperties, subclasses, domains, ranges
• union and intersection of classes (some restrictions)
• property characterizations (functional, symmetric, etc)
• property chains
• keys
• some property restrictions (but not all inferences are possible)

Limitations:
• not all datatypes are available
• no datatype restrictions
• no minimum or exact cardinality restrictions
• maximum cardinality only with 0 and 1
• some consequences cannot be drawn


Page 141: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

OWL EL

Features
• existential quantification to a class expression or data range
• existential quantification to an individual or a literal
• self-restriction
• enumerations involving a single individual or a single literal
• intersection of classes and data ranges
• class axioms: subClassOf, equivalence, disjointness
• property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys
• assertions (sameAs, DifferentFrom, Class, Object Property, Data Property, Negative Object/Data Property)

Not supported
• universal quantification to a class expression or a data range
• cardinality restrictions
• disjunction (union)
• class negation
• enumerations involving more than one individual
• object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional


Page 142: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Ontology modularization

Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology?
1. D EquivalentTo: not A (not EL)
2. C EquivalentTo: not B (not EL)
3. B subClassOf: A (EL)

Inference:
• D subClassOf: C (EL) (inference from (1)-(3))

EL module of (1)-(3):
• {B subClassOf: A}, or
• {B subClassOf: A, D subClassOf: C}
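The inference D subClassOf: C can be checked by hand (if B is contained in A, the complement of A is contained in the complement of B). As an illustration only, the sketch below enumerates all interpretations of A and B over a small finite domain and looks for a counter-model. The function names are hypothetical, and a finite enumeration is a sanity check rather than a proof; an OWL reasoner derives the entailment directly.

```python
from itertools import product


def subsets(domain):
    """Yield every subset of domain as a frozenset."""
    for mask in product([False, True], repeat=len(domain)):
        yield frozenset(x for x, keep in zip(domain, mask) if keep)


def entails_d_subclass_of_c(domain_size=3):
    """Check D subClassOf C in every small model of axioms (1)-(3)."""
    domain = list(range(domain_size))
    universe = frozenset(domain)
    for a, b in product(subsets(domain), repeat=2):
        if not b <= a:        # axiom (3): B subClassOf A must hold
            continue
        d = universe - a      # axiom (1): D EquivalentTo not A
        c = universe - b      # axiom (2): C EquivalentTo not B
        if not d <= c:
            return False      # counter-model found
    return True
```

`entails_d_subclass_of_c()` finds no counter-model, which matches the slide: the EL axiom D subClassOf: C lies in the deductive closure even though axioms (1) and (2) themselves are outside EL.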


Page 143: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

EL Vira modularization

• ontology modularization
• identify EL, QL, RL axioms in deductive closure
• retain signature of ontology
• maximality is an open problem


http://el-vira.googlecode.com

Page 144: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Outcomes

The SBML-derived ontologies can be used to
i) check consistency, thereby uncovering erroneous curations
ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models
iii) answer sophisticated questions across a model knowledge base


Page 145: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Questions?

Page 146: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Phenotypes

Phenotypes are observable characteristics of an organism.

Examples include:
– Red hair
– Heart rate of 120bpm
– Absent arm
– Malfunctional liver

Phenotypes include comparisons such as Increased heart rate


Page 147: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Phenotype and anatomy ontologies

anatomy ontologies: > 100,000 classes – FMA, MA, WA, ZFA, FA, GO-CC, ...

phenotype ontologies: > 20,000 classes – HPO, MP, WBPhenotype, FBcv, APO, ...

quality ontology: > 2,000 classes – PATO

process and function ontologies: > 25,000 classes – Gene Ontology, ...

alignments between anatomy ontologies – UBERON, various mappings

Page 148: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Phenotype: Example question

Find all regions in the human, mouse, fish, fly, worm and yeast genome that are associated with tetralogy of Fallot.

Page 149: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Tetralogy of Fallot

Page 150: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Tetralogy of Fallot

– Overriding aorta (HP:0002623)
– Ventricular septal defect (HP:0001629)
– Pulmonic stenosis (HP:0001642)
– Right ventricular hypertrophy (HP:0001667)

Page 151: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Phenotype descriptions

Overriding aorta (HP:0002623):
– Q: overlap with (PATO:0001590)
– E1: Aorta (FMA:3734)
– E2: Membranous part of interventricular septum (FMA:7135)

HP:0002623 EquivalentTo: phene-of some (has-part some (FMA:3734 and has-quality some (PATO:0001590 and towards some FMA:7135)))
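The definition above follows a fixed pattern over the quality (Q) and the two entities (E1, E2). A minimal sketch of filling that pattern programmatically is given below; the function name `relational_phenotype` is hypothetical, and only the pattern itself is taken from the slide.

```python
def relational_phenotype(quality, entity1, entity2):
    """Fill the Q/E1/E2 pattern and return a Manchester-syntax class expression."""
    return (
        f"phene-of some (has-part some ({entity1} and has-quality some "
        f"({quality} and towards some {entity2})))"
    )


# Overriding aorta in human: Q = PATO:0001590, E1 = FMA:3734, E2 = FMA:7135
print("HP:0002623 EquivalentTo:",
      relational_phenotype("PATO:0001590", "FMA:3734", "FMA:7135"))
```

Applying the same function to identifiers from another species' anatomy ontology yields structurally identical definitions, which is what allows a reasoner to align phenotype classes once the anatomy classes are mapped.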

Page 152: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Human-mouse anatomy mappings

Overriding aorta (HP:0002623):
– Q: overlap with (PATO:0001590)
– E1: Aorta (FMA:3734)
• FMA:3734 EquivalentTo: MA:0000062
– E2: Membranous part of interventricular septum (FMA:7135)
• FMA:7135 EquivalentTo: MA:0002939

Page 153: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Mouse phenotype

Overriding aorta (MP:0000273):
– Q: overlap with (PATO:0001590)
– E1: Aorta (MA:0000062)
– E2: Membranous interventricular septum (MA:0002939)

MP:0000273 EquivalentTo: phene-of some (has-part some (MA:0000062 and has-quality some (PATO:0001590 and towards some MA:0002939)))

Consequence: MP:0000273 EquivalentTo: HP:0002623

Page 154: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Absence: absent appendix

Absent appendix:
– Q: lacks all parts of type (PATO:0002000)
– E1: Human body (FMA:20394)
– E2: Appendix (FMA:14542)

AbsentAppendix EquivalentTo: LacksParts and towards some Appendix and inheres-in some HumanBody
AbsentAppendix EquivalentTo: LacksParts and towards some {Appendix} and inheres-in some HumanBody
AbsentAppendix EquivalentTo: phene-of some (HumanBody and not has-part some Appendix)

Page 155: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Absence and inconsistency

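AbsentAppendix SubClassOf: phene-of some (HumanBody and not has-part some Appendix)
HumanBody SubClassOf: has-part some Appendix
HumanBody(John).
AbsentAppendix(x).
has-phene(John,x).

Consequence: the knowledge base is inconsistent, because the second axiom leaves no human body without an appendix for x to be a phene of.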

Page 156: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Inconsistency removal

– Removal of conflicting axioms (has-part/part-of in anatomy)
– Contextualize anatomy:
• Normal and HumanBody SubClassOf: has-part some (Normal and Appendix)
– Use of non-monotonic reasoning

Page 157: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Ontology of phenotypes

Different formal expressions for phenotypes based on – qualities, – anatomical parts, – functions, – processes

Page 158: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Tetralogy of Fallot

Page 159: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Mouse model

Page 160: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Mouse model

Page 161: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

PhenomeBLAST

– apply definition patterns to yeast, fly, worm, fish, mouse and human phenotypes and integrate in single ontology
– phenotype alignment through OWL reasoning
– more than 300,000 classes and 1,000,000 axioms
– combination of HermiT (for EL Vira modularization), CB and CEL reasoner
– classification time: 7 minutes

http://phenomeblast.googlecode.org

Page 162: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Phenotype alignments

Page 163: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Comparison of phenotypes

direct comparison of phenotypes:
– disease phenotypes, e.g., tetralogy of Fallot
– phenotypes associated with genetic mutations (genotypes in mouse, fish, etc.)

Page 164: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Comparison of phenotypes

When the phenotype annotation of a genotype becomes a subclass of a disease phenotype, we can infer a gene-disease association if
– the disease phenotypes are sufficient for having the disease
– the mutation phenotypes are necessary for having a specific genotype

Inference over ontologies can establish a formal proof for a gene-disease association.

Page 165: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Knowledge discovery

Similarity-based comparison allows for incomplete and noisy information.

– pairwise comparison of phenotypes
– similarity: weighted Jaccard index
– result: similarity matrix between phenotypes
– (quantitative) evaluation based on predicting orthology, pathway, disease
– identify novel gene-disease associations
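A weighted Jaccard index can be sketched as the summed weight of shared phenotype classes divided by the summed weight of all classes in either set. The slides do not state which weights are used, so the weights below (and the function name) are assumptions for illustration only.

```python
def weighted_jaccard(phenotypes_a, phenotypes_b, weight):
    """Weighted Jaccard index of two phenotype sets.

    weight maps a phenotype class to a non-negative number; classes
    missing from the mapping default to a weight of 1.0 (an assumption).
    """
    a, b = set(phenotypes_a), set(phenotypes_b)
    union_weight = sum(weight.get(p, 1.0) for p in a | b)
    if union_weight == 0:
        return 0.0
    shared_weight = sum(weight.get(p, 1.0) for p in a & b)
    return shared_weight / union_weight


# Hypothetical weights, e.g. higher for more specific phenotypes.
weights = {"HP:0002623": 3.0, "HP:0001629": 1.0, "HP:0001642": 2.0}
disease = {"HP:0002623", "HP:0001629"}
mutant = {"HP:0002623", "HP:0001642"}
print(weighted_jaccard(disease, mutant, weights))
```

In this toy case the shared class carries weight 3.0 out of a total of 6.0, giving a similarity of 0.5. Computing the value for every pair of phenotype sets yields the similarity matrix mentioned above.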

Page 166: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Evaluation

Page 167: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

http://PhenomeBrowser.net

Page 168: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification
Page 169: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

What does the future hold?

Better formalized ontologies

Dynamic generation of knowledge through semantic web services


Page 170: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - RDF and OWL

RDF provides
• light-weight semantics
• fast queries
• highly scalable implementations
• large volumes of data (e.g., DBPedia, other Linked Data repositories)

OWL provides
• Constructs to formalize the intended semantics
• The OWLAPI to develop, manage, and serialize OWL ontologies
• Efficient reasoners to get inferences, compute modules and get explanations
• syntactic subsets for better performance, albeit some inferences may be lost

Page 171: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - OWL & Formal languages

• Formal logic-based languages can be used to formalize the meaning of terms used in discourse. While normally restricted in terms of what can be expressed, the statements formed can be automatically reasoned about.
• OWL is based on description logics and formalizes the meaning of terms with axioms. Axioms can be used to characterize and distinguish classes, relations and individuals. Rich expressions can be crafted from logical combinations of language primitives including conjunction, disjunction, negation and object/data property restrictions.
• OWL reasoners provide a number of services including computing subsumption, satisfiability, entailment, realization and query answering.


Page 172: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - Exploitation of ontologies

• verification: automated reasoning can reveal contradictory definitions of classes (unsatisfiable classes), instances that violate constraints in the ontology (often leading to inconsistent ontologies) and hidden inferences (that may be considered invalid through manual verification)

• querying: ontologies define an explicit, formal language based on which queries to a knowledge base can be performed; queries can be made for instances and for classes satisfying complex conditions

• repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced


Page 173: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - Ontology

• an ontology is a specification of a conceptualization of a domain
• a conceptualization is a system of categories accounting for a particular view on the world
• ontologies are used to make some aspects of the intended meaning of terms in a vocabulary explicit
• ontologies (in computer science) may utilize philosophical theories
• formalized ontologies can be used by humans and automated systems as a basis for communication and data exchange
• ontologies are useful tools for translational research


Ontology is not philosophy!

Page 174: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - Implementation in information systems

• The OWLAPI is a reference implementation of the OWL specification and facilitates the development, management and serialization of expressive OWL ontologies. The OWLAPI also facilitates modularization and getting explanations.
• OWL provides syntactic subsets of the language for efficient reasoning. These so-called OWL profiles (EL, RL, QL) have well understood computational properties and can lead to better performance, but with some inferences lost.
• Formal ontology makes it possible not only to retrieve data (similar to a database), but also to query the concepts themselves


Page 175: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Summary - evaluation

• ontologies are tools to support science

• ontologies can provide insight into real biological/scientific problems

• quantifiable evaluation can be performed, e.g., based on precision/recall or ROC analysis

• application of ontologies may go beyond reasoning alone and use statistical analyses (enrichment), semantic similarity, graph algorithms, clustering, etc.


Page 176: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Conclusions

• Ontologies + Semantic Web enable
• Integration
• Verification
• Analysis
• Discovery
• Translational research


Page 177: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Acknowledgements

George Gkoutos Heinrich Herre

Janet Kelso Dietrich Rebholz-Schuhmann

Anika Oellrich Michael Ashburner

Dan Cook John Gennari Paul Schofield

Page 178: ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

