Real World Applications of OWL 1 Michel Dumontier, Ph.D. Associate Professor of Bioinformatics Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University Ottawa Institute of Systems Biology Ottawa-Carleton Institute of Biomedical Engineering Professeur Associé, Université Laval Visiting Associate Professor, Stanford University Protege Short Course::Dumontier:March 2012


A presentation for the March 2012 Protege Short Course http://protege.stanford.edu/shortcourse/protege-owl/201203/index.html


Page 1: Real World Applications of OWL


Real World Applications of OWL

Michel Dumontier, Ph.D.

Associate Professor of Bioinformatics

Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University

Ottawa Institute of Systems Biology

Ottawa-Carleton Institute of Biomedical Engineering

Professeur Associé, Université Laval

Visiting Associate Professor, Stanford University

Protege Short Course::Dumontier:March 2012

Page 2: Real World Applications of OWL

Ontologies in Use

• Knowledge Capture (RightField)
• Formalization and Verification (SNOMED-CT)
• Consistency Checking (SBML Harvester)
• Classification (Phosphatases, Compounds)
• Semantic Annotation (ArrayExpress / Gene Expression Atlas, Semantic Assistant)
• Query Formulation (ArrayExpress / Gene Expression Atlas)
• Query Answering (KUPD)
• Search & co-occurrence (GoPubMed)
• Semantic Assistant
• Hypothesis Testing (HyQue)
• Disease Similarity and Model Organism prediction (phenomeBLAST)
• Function Prediction (GeneMANIA)

Page 3: Real World Applications of OWL

Knowledge Capture: RightField

K. Wolstencroft, S. Owen, M. Horridge, O. Krebs, W. Mueller, J.L. Snoep, F. Preez, C. Goble. RightField: embedding ontology annotation in spreadsheets. Bioinformatics, May 2011.

Page 4: Real World Applications of OWL

Formalization: SNOMED-CT

• SNOMED-CT (Clinical Terms) ontology

• used in healthcare systems of more than 15 countries, including Australia, Canada, Denmark, Spain, Sweden and the UK

• also used by major US providers, e.g., Kaiser Permanente

• ontology provides common vocabulary for recording clinical data

• 395036 classes


Page 5: Real World Applications of OWL

SNOMED-CT

• Pattern-based knowledge capture
• Requires training and an information system to implement

Page 6: Real World Applications of OWL

SNOMED - verification

• Kaiser Permanente is extending SNOMED to express, e.g.:
  – non-viral pneumonia (negation)
  – infectious pneumonia is caused by a virus or a bacterium (disjunction)
  – double pneumonia occurs in two lungs (cardinalities)
• This is easy in SNOMED-OWL
  – but the reasoner failed to find expected subsumptions, e.g., that bacterial pneumonia is a kind of non-viral pneumonia
• The ontology is highly under-constrained: disjointness axioms (at least) need to be added
  – virus and bacterium must be disjoint

- Ian Horrocks, OWL 2 tutorial
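To see why the subsumption fails without disjointness, the toy check below enumerates every way a single causative agent can be typed. It is only a sketch of the reasoning, not how an OWL reasoner works; the function name and the one-agent simplification are assumptions made for illustration.

```python
from itertools import product


def subsumption_holds(require_disjoint: bool) -> bool:
    """Check 'bacterial pneumonia is a non-viral pneumonia' in every tiny model.

    A model here is one pneumonia case with a single causative agent, and
    that agent may independently be typed as a virus and/or a bacterium.
    """
    for is_virus, is_bacterium in product([False, True], repeat=2):
        if require_disjoint and is_virus and is_bacterium:
            continue  # this model is excluded by the disjointness axiom
        bacterial = is_bacterium   # caused by some bacterium
        non_viral = not is_virus   # not caused by a virus
        if bacterial and not non_viral:
            return False           # counter-model: agent is both virus and bacterium
    return True


print(subsumption_holds(require_disjoint=False))
print(subsumption_holds(require_disjoint=True))
```

Without the axiom, an agent that is both a virus and a bacterium is a legal counter-model, so the subsumption cannot be inferred.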

Page 7: Real World Applications of OWL

SNOMED

• Adding disjointness led to surprising results
  – many classes become inconsistent, e.g., percutaneous embolization of hepatic artery using fluoroscopy guidance
• Cause of inconsistencies identified as class groin
  – groin asserted to be subclass of both abdomen and leg
  – abdomen and leg are disjoint
  – modelling of groin (and other similar “junction” regions) identified as incorrect

- Ian Horrocks, OWL 2 tutorial
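The groin problem can be reproduced on a toy class hierarchy. The sketch below flags any class whose superclasses include two classes declared disjoint. The `LeftGroin` and `Knee` classes are hypothetical additions for illustration, and the check covers only asserted subclass links, not the richer axioms a real reasoner handles.

```python
def ancestors(cls, subclass_of):
    """Return all (transitive) superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Classes that fall under both members of some disjoint pair."""
    all_classes = set(subclass_of) | {p for ps in subclass_of.values() for p in ps}
    bad = []
    for cls in sorted(all_classes):
        supers = ancestors(cls, subclass_of) | {cls}
        if any(a in supers and b in supers for a, b in disjoint_pairs):
            bad.append(cls)
    return bad


# Toy hierarchy; LeftGroin and Knee are hypothetical.
SUBCLASS_OF = {
    "Groin": ["Abdomen", "Leg"],
    "LeftGroin": ["Groin"],
    "Knee": ["Leg"],
}
DISJOINT_PAIRS = [("Abdomen", "Leg")]

print(unsatisfiable_classes(SUBCLASS_OF, DISJOINT_PAIRS))
```

Everything below the faulty class inherits the problem, which is why one modelling error surfaced as many inconsistent classes.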

Page 8: Real World Applications of OWL

Consistency Checking: Formalization of SBML annotations into OWL ontologies

• Biomodels contains hundreds of quantitative models

• SBML is an XML-based format for specifying models and their parameters

• Models and their components are being semantically annotated

• Use the ontologies to validate the assertions

Integrating systems biology models and biomedical ontologies. Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV. BMC Syst Biol. 2011 Aug 11;5:124.

Page 9: Real World Applications of OWL

Additional annotations are specified using the Resource Description Framework (RDF)

<species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>

Slide callouts: the annotation element stores the RDF; labels mark the subject, predicate and object of the RDF statement, and the implicit subject with its XML attributes.

The intent is to express that the species represents a substance composed of glucose molecules. We also know from the SBML model that this substance is located in the cytosol, with an initial concentration of 0.09765 M.
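As a rough illustration of how the subject, predicate and objects can be pulled out of such an annotation, the sketch below parses a trimmed copy of the species element with Python's standard library. The function name and the trimmed snippet (only the `rdf` and `bqbiol` namespaces are declared) are assumptions for this example; it is not the SBML Harvester implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Trimmed copy of the slide's species element.
SBML_SNIPPET = """
<species metaid="_525530" id="GLCi" compartment="cyto">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
""".strip()


def annotation_triples(xml_text):
    """Return (subject, predicate local name, object URI) for each rdf:li."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.iter(RDF + "Description"):
        subject = description.get(RDF + "about")
        for predicate in description:
            local_name = predicate.tag.rsplit("}", 1)[-1]
            for item in predicate.iter(RDF + "li"):
                obj = unquote(item.get(RDF + "resource"))  # %3A becomes ':'
                triples.append((subject, local_name, obj))
    return triples


for triple in annotation_triples(SBML_SNIPPET):
    print(triple)
```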

Page 10: Real World Applications of OWL

OWL Axiom: M SubClassOf: represents some MaterialEntity

Conversion rule: a Model M annotated with class C represents:

If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C

If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)

If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))

For each model annotation, we make a commitment to what it represents
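The three conversion rules can be sketched as a small function that emits the axiom text. The function name, the string-based output and the `category` argument (standing in for a lookup of C's upper-level superclass) are assumptions for illustration only.

```python
def conversion_axiom(model: str, annotation_class: str, category: str) -> str:
    """Build the axiom for a model annotated with a class of a given category.

    category is the upper-level class that annotation_class falls under:
    'MaterialEntity', 'Function' or 'Process'.
    """
    if category == "MaterialEntity":
        filler = annotation_class
    elif category == "Function":
        filler = f"(has-function some {annotation_class})"
    elif category == "Process":
        filler = f"(has-function some (realized-by only {annotation_class}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: represents some {filler}"


print(conversion_axiom("M", "Glucose", "MaterialEntity"))
print(conversion_axiom("M", "ATPaseActivity", "Function"))
print(conversion_axiom("M", "Glycolysis", "Process"))
```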


Page 11: Real World Applications of OWL


Page 12: Real World Applications of OWL

Model verification

After reasoning, we found 27 models to be inconsistent.

Reasons:
1. Our representation: functions are sometimes found in the place of physical entities (e.g. entities that secrete insulin). It is better to constrain with appropriate relations.
2. SBML abused: e.g. species used as a measure of time.
3. Incorrect annotations: constraints in the ontologies themselves mean that the annotation is simply not possible.

Page 13: Real World Applications of OWL

Finding inconsistencies with axiomatically enhanced ontologies

ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output and is a part of an ATP catabolic process.

To this, we add:
• GO: ATP and Water are the only inputs (universal quantification)
• ChEBI: Water, ATP and alpha-D-glucose 6-phosphate are all different (disjointness)
• “ATP” is input to the “ATPase” reaction, which is annotated with ATPase activity. The species “ATP”, however, is mis-annotated with alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.
• Unsatisfiable -> curation error in the BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast.
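The interplay of the closure axiom and the disjointness axioms can be imitated with plain sets. This is a simplified sketch under the assumption that all listed chemical classes are pairwise disjoint; the names are taken from the slide, but the function is hypothetical and no reasoner is involved.

```python
# Closure axiom: the only permitted inputs of ATPase activity.
ALLOWED_INPUTS = {"ATP", "Water"}

# Classes asserted to be pairwise disjoint.
PAIRWISE_DISJOINT = {"ATP", "Water", "alpha-D-glucose 6-phosphate"}


def input_violations(annotated_inputs):
    """Inputs whose annotation is disjoint from every permitted input class."""
    return sorted(
        cls
        for cls in annotated_inputs
        if cls in PAIRWISE_DISJOINT and cls not in ALLOWED_INPUTS
    )


# The species named "ATP" carries the wrong ChEBI annotation.
print(input_violations({"Water", "alpha-D-glucose 6-phosphate"}))
print(input_violations({"Water", "ATP"}))
```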

Page 14: Real World Applications of OWL

Classification: Phosphatases

• Bioinformaticians use tools to identify functional domains (e.g., InterProScan)

• Tools simply show the presence of domains - they do not classify proteins

• Experts classify proteins according to domain arrangements - the presence and number of each domain is important

PhosphaBase: an ontology-driven database resource for protein phosphatases. Wolstencroft KJ, Stevens R, Tabernero L, Brass A. Proteins. 2005 Feb 1;58(2):290-4.

Page 15: Real World Applications of OWL

Phosphatase Functional Domains


Page 16: Real World Applications of OWL

Defining Protein Phosphatases

• Necessary and sufficient conditions are stipulated using EquivalentClass axioms

• A protein phosphatase is precisely a protein that has exactly one transmembrane domain and at least one phosphatase catalytic domain

ProteinPhosphatase EquivalentTo: Protein AND hasDomain 1 TransmembraneDomain AND hasDomain min 1 PhosphataseCatalyticDomain

Page 17: Real World Applications of OWL


More precise class expressions can be formulated for subtypes

Inclusion of universal quantifier now restricts the domains to only the types listed

R2A EquivalentTo: Protein
AND hasDomain 2 ProteinTyrosinePhosphataseDomain
AND hasDomain 1 TransmembraneDomain
AND hasDomain 4 FibronectinDomain
AND hasDomain 1 ImmunoglobulinDomain
AND hasDomain 1 MAMDomain
AND hasDomain 1 Cadherin-LikeDomain
AND hasDomain only (ProteinTyrosinePhosphataseDomain OR TransmembraneDomain OR FibronectinDomain OR ImmunoglobulinDomain OR Cadherin-LikeDomain OR MAMDomain)
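The effect of the exact cardinalities plus the closing "only" restriction can be mimicked by counting domains. This sketch assumes a complete, closed list of domains per protein (for example from a domain scan), which sidesteps OWL's open-world semantics; the function name and the `SH2Domain` example are hypothetical.

```python
from collections import Counter

# Required number of each domain type in the R2A definition.
R2A_DOMAIN_COUNTS = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}


def is_r2a(domains):
    """True if the domain list matches the R2A definition exactly."""
    counts = Counter(domains)
    # Exact cardinalities: every listed domain type occurs the required number of times.
    exact = all(counts[name] == n for name, n in R2A_DOMAIN_COUNTS.items())
    # Closure ("only"): no domain type outside the listed ones is present.
    closed = set(counts) <= set(R2A_DOMAIN_COUNTS)
    return exact and closed


r2a_like = [name for name, n in R2A_DOMAIN_COUNTS.items() for _ in range(n)]
print(is_r2a(r2a_like))
print(is_r2a(r2a_like + ["SH2Domain"]))  # extra domain type violates the closure
```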

Page 18: Real World Applications of OWL

Knowledge of functional groups is important in chemical synthesis, pharmaceutical design and lead optimization.

Functional groups describe chemical reactivity in terms of atoms and their connectivity, and exhibit characteristic chemical behavior when present in a compound.

[Figure: ethanol, with its methyl group and hydroxyl group labelled]

Describing chemical functional groups in OWL-DL for the classification of chemical compounds

N Villanueva-Rosales, M Dumontier. 2007. OWLED, Innsbruck, Austria.

Page 19: Real World Applications of OWL

Describing Functional Groups in DL

HydroxylGroup: CarbonGroup that (hasSingleBondWith some (OxygenAtom that hasSingleBondWith some HydrogenAtom))

[Figure: R-OH, where R is the R group]
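The same pattern (a carbon single-bonded to an oxygen that is single-bonded to a hydrogen) can be matched on a small molecular graph. The sketch below encodes ethanol by hand; the atom identifiers and function names are assumptions for illustration and stand in for what a DL reasoner would infer from the class expression.

```python
# Ethanol, CH3-CH2-OH: atom id -> element symbol.
ETHANOL_ATOMS = {
    "C1": "C", "C2": "C", "O1": "O",
    "H1": "H", "H2": "H", "H3": "H", "H4": "H", "H5": "H", "H6": "H",
}
# Single bonds as unordered pairs of atom ids.
ETHANOL_SINGLE_BONDS = [
    ("C1", "C2"), ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
    ("C2", "H4"), ("C2", "H5"), ("C2", "O1"), ("O1", "H6"),
]


def neighbours(atom, bonds):
    """Atoms joined to `atom` by one of the given bonds."""
    return {b for a, b in bonds if a == atom} | {a for a, b in bonds if b == atom}


def hydroxyl_carbons(atoms, single_bonds):
    """Carbons single-bonded to an oxygen that is single-bonded to a hydrogen."""
    found = []
    for atom, element in atoms.items():
        if element != "C":
            continue
        for oxygen in neighbours(atom, single_bonds):
            if atoms[oxygen] != "O":
                continue
            if any(atoms[h] == "H" for h in neighbours(oxygen, single_bonds)):
                found.append(atom)
                break
    return sorted(found)


print(hydroxyl_carbons(ETHANOL_ATOMS, ETHANOL_SINGLE_BONDS))
```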

Page 20: Real World Applications of OWL

Fully Classified Ontology

35 functional groups (FG)

Page 21: Real World Applications of OWL

And, we define certain compounds

Alcohol: OrganicCompound that (hasPart some HydroxylGroup)


Page 22: Real World Applications of OWL

Organic Compound Ontology

28 organic compounds (OC)

Page 23: Real World Applications of OWL

Question Answering: Classes as self-contained queries

• Query PubChem, DrugBank and DBpedia

Page 24: Real World Applications of OWL

Querying Kidney and Urinary Knowledge Base and Ontology

[Figure: graph of linked data from three sources.
KUPO Ontology: kupo:002444 (rdfs:label “PT epithelial cell”) and kupo:004672 (rdfs:label “DT epithelial cell”), each linked by ro:part_of to a mouse anatomy (MA) term, labelled “Proximal tubule” and “Distal tubule”.
Entrez Gene: Gene X linked to GO:0054426 (go:biological_process).
Higgins Dataset: genes linked by kupo:expressed_in to MA terms.]

Query: What are the genes involved in protein transport that are expressed in proximal tubule epithelial cells?

Page 25: Real World Applications of OWL

Semantic Annotation and Query

[Figure: data flow. Data are acquired from AE/GEO into ArrayExpress (>250,000 assays, >10,000 experiments), then re-annotated and summarized into ATLAS, with curation at both stages.]

Ontologically Modeling Sample Variables in Gene Expression Data [email protected]


Page 26: Real World Applications of OWL

ontology-based data exploration

Query for Cell adhesion genes in all ‘organism parts’

‘View on EFO’

Ontologically Modeling Sample Variables in Gene Expression Data [email protected]


Page 27: Real World Applications of OWL

Ontology-based query expansion for ArrayExpress Archive @ www.ebi.ac.uk/arrayexpress


Page 28: Real World Applications of OWL

Search and Co-Occurrence


Page 29: Real World Applications of OWL

Semantic Assistant

Services relevant for the user's current task are offered directly within a desktop application. This approach relies on ontology-described semantic web services to provide external natural language processing (NLP) pipelines.

Leverage of OWL-DL axioms in a Contact Centre for Technical Product Support. Alex Kouznetsov, Bradley Shoebottom, René Witte, Christopher JO Baker. OWLED 2010.

Page 30: Real World Applications of OWL

Plug-in for Open Office Client


Page 31: Real World Applications of OWL

• HyQue helps construct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data using the Semantic Web

• HyQue makes it possible to develop a reliability model around data based on our scientific expectations of corroborating evidence


Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.

Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted.

Page 32: Real World Applications of OWL

Hypothesis

h1: e1 (Gal4p induces expression of GAL1)
• simple event-based expression

h2: e2 (Gal3p induces expression of GAL2) AND e3 (Gal4p induces expression of GAL7)
• conjunctive hypothesis: must satisfy two expressions

h3: e4 (Gal4p induces expression of GAL7) AND e5 (Gal80p inhibits production of Gal4p when GAL3 is over-expressed) AND e6 (Gal80p induces expression of GAL7)
• conjunctive hypothesis with conditional expression

Page 33: Real World Applications of OWL

HYQUE ARCHITECTURE


Page 34: Real World Applications of OWL

Rule-based assessment of evidence

• ‘induce’ rule (maximum score: 5):
  – Is event negated?
    • If yes, subtract 2
  – Is logical operator ‘induce’?
    • If yes, add 1; if no, subtract 1
  – Is agent of type ‘protein’ or ‘RNA’?
    • If yes, add 1; if of type ‘gene’, subtract 1
  – Is target of type ‘gene’?
    • If yes, add 1; if no, subtract 1
  – Does agent have known ‘transcription factor activity’?
    • If yes, add 1
  – Is event located in the ‘nucleus’?
    • If yes, add 1; if no, subtract 1

Ontology terms shown on the slide: GO:0010628, CHEBI:36080, SO:0000236, GO:0003700, GO:0005634
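The ‘induce’ rule can be written out as a scoring function. This is a minimal sketch of the arithmetic only; the `Event` fields and plain-string types are assumptions for illustration, whereas HyQue itself evaluates such rules over Semantic Web data.

```python
from dataclasses import dataclass


@dataclass
class Event:
    negated: bool
    operator: str                      # e.g. "induce", "inhibit"
    agent_type: str                    # e.g. "protein", "RNA", "gene"
    target_type: str                   # e.g. "gene"
    agent_is_transcription_factor: bool
    location: str                      # e.g. "nucleus"


def induce_score(event: Event) -> int:
    """Score an event against the 'induce' rule (maximum 5)."""
    score = 0
    if event.negated:
        score -= 2
    score += 1 if event.operator == "induce" else -1
    if event.agent_type in ("protein", "RNA"):
        score += 1
    elif event.agent_type == "gene":
        score -= 1
    score += 1 if event.target_type == "gene" else -1
    if event.agent_is_transcription_factor:
        score += 1
    score += 1 if event.location == "nucleus" else -1
    return score


best = Event(False, "induce", "protein", "gene", True, "nucleus")
print(induce_score(best))
```

An event that satisfies every condition reaches the maximum of 5; negating the same event lowers it to 3.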

Page 35: Real World Applications of OWL

Linked Open Results : from hypothesis to evidence


Page 36: Real World Applications of OWL

Literature-Based Enrichment Analysis

• Enrichment analysis on terms extracted using a target ontology for associated articles.

Enabling enrichment analysis with the Human Disease Ontology. Paea LePendu, Mark A. Musen, Nigam H. Shah. Journal of Biomedical Informatics. Volume 44, Supplement 1, December 2011, Pages S31–S38.

Page 37: Real World Applications of OWL


Page 38: Real World Applications of OWL

Phenotype-based predictions

Phenotypes can be used as a substrate to cluster similar diseases, identify potential model systems, predict potential disease-treating drugs or their adverse events, support drug repurposing, etc.

Robert Hoehndorf, Paul N. Schofield and Georgios V. Gkoutos. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Research, 2011.

Linking pharmgkb to phenotype studies and animal models of disease for drug repurposing. Hoehndorf R, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Pac Symp Biocomput. 2012:388-99.

CK Chen, CJ Mungall, GV Gkoutos et al. MouseFinder: candidate disease genes from mouse phenotype data. Human Mutation 2012

Page 39: Real World Applications of OWL

Tetralogy of Fallot


OMIM

Human Phenotype Ontology

Phenotype ontologies should contain descriptions of morphological, behavioural, physiological, developmental characteristics

Page 40: Real World Applications of OWL

Compare Diseases based on their Phenotypes


Comparison uses a weighted Jaccard index, in which each phenotype is weighted by its information content with respect to a genotype or disease
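One common form of an information-content-weighted Jaccard index divides the summed weight of the shared phenotypes by the summed weight of all phenotypes in either set. The sketch below assumes that form; the phenotype labels and information content values are made up for illustration and are not taken from the cited work.

```python
def weighted_jaccard(a: set, b: set, ic: dict) -> float:
    """Sum of IC over shared phenotypes divided by sum of IC over all phenotypes."""
    union = a | b
    total = sum(ic[term] for term in union)
    if total == 0:
        return 0.0
    shared = sum(ic[term] for term in a & b)
    return shared / total


# Illustrative information content values (hypothetical).
IC = {
    "overriding aorta": 2.0,
    "ventricular septal defect": 1.0,
    "pulmonary stenosis": 1.0,
}
disease = {"overriding aorta", "ventricular septal defect"}
mouse_model = {"overriding aorta", "pulmonary stenosis"}

print(weighted_jaccard(disease, mouse_model, IC))
```

Rare, highly informative phenotypes dominate the score, so two profiles that share only a common phenotype score lower than two that share a rare one.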

Page 41: Real World Applications of OWL

Inferring equivalent phenotypes by reasoning over OWL ontologies

human ‘overriding aorta [HP:0002623]’ EquivalentTo:

‘phenotype of’ some (‘has part’ some (‘aorta [FMA:3734]’ and ‘overlaps with’ some ‘membranous part of interventricular septum [FMA:7135]’))

mouse ‘overriding aorta [MP:0000273]’ EquivalentTo:

‘phenotype of’ some (‘has part’ some (‘aorta [MA:0000062]’ and ‘overlaps with’ some ‘membranous interventricular septum [MA:0002939]’))

Uberon super-anatomy ontology provides inter-species mappings

‘aorta [FMA:3734]’ EquivalentTo: ‘aorta [MA:0000062]’

‘membranous part of interventricular septum [FMA:7135]’ EquivalentTo: ‘membranous interventricular septum [MA:0002939]’

Thus, ‘overriding aorta [HP:0002623]’ EquivalentTo: ‘overriding aorta [MP:0000273]’
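The inference can be imitated by rewriting both definitions into a shared vocabulary and comparing them. In this sketch the class expressions are nested tuples and the anatomy mapping is a plain dictionary; both are assumptions for illustration, and a real OWL reasoner derives the equivalence from the axioms rather than by string substitution.

```python
# Inter-species anatomy mappings: mouse (MA) term -> human (FMA) term.
EQUIVALENT = {
    "MA:0000062": "FMA:3734",   # aorta
    "MA:0002939": "FMA:7135",   # membranous (part of) interventricular septum
}


def canonical(expr):
    """Rewrite every mapped term in a nested expression to its FMA equivalent."""
    if isinstance(expr, tuple):
        return tuple(canonical(part) for part in expr)
    return EQUIVALENT.get(expr, expr)


def overriding_aorta(aorta, septum):
    """The phenotype definition pattern shared by both ontologies."""
    return (
        "phenotype of", "some",
        ("has part", "some",
         ("and", aorta, ("overlaps with", "some", septum))),
    )


human = overriding_aorta("FMA:3734", "FMA:7135")      # HP:0002623
mouse = overriding_aorta("MA:0000062", "MA:0002939")  # MP:0000273

print(human == mouse)
print(canonical(human) == canonical(mouse))
```

The two definitions differ as written but become identical once the anatomy terms are mapped, which is the step the cross-species anatomy ontology supplies.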

Page 42: Real World Applications of OWL

Identifying potential mouse models for human diseases


Quantitative ROC Analysis prediction against curated models yields 0.89 AUC

Prediction of Tetralogy of Fallot added by MGI

Page 43: Real World Applications of OWL

Conclusion

• OWL has come of age and can be used in an increasing number of scientific investigations and applications

• OWL applications cover knowledge capture, formalization, verification, classification, semantic annotation, query formulation, query answering, search, hypothesis testing and prediction
