Upload
hillary-banks
View
213
Download
0
Embed Size (px)
Citation preview
About ontologies
Melissa Haendel
And who am I that I am giving you this talk?
Melissa Haendel
Anatomist, developmental neuroscientist, molecular biologist, curator, ontologist
Most ontologists become ontologists because they have a need to classify and compute on a specific domain
Information retrieval from text-based resources is hard:
OMIM Query # of records“large bone” 785"enlarged bone" 156"big bones" 16"huge bones" 4"massive bones" 28"hyperplastic bones" 12"hyperplastic bone" 40"bone hyperplasia" 134"increased bone growth" 612
MouseEcotope GlyProt
DiabetInGene
GluChem
sphingolipid transporter
activity
Use of the same defined term indicates the same meaning under different annotation circumstances
Common vocabularies allow grouping of disparate data
http://www.mkbergman.com/?m=20070516
The controlled vocabulary spectrum
Bottom line: you get what you pay for. Ontologies are expensive but powerful.
OBO
Why build an ontology? A simple example
Number of genes annotated to each of the following brain parts in an ontology:
brain 20part_of hindbrain 15part_of rhombomere 10
Query brain without ontology 20Query brain with ontology 45
Ontologies can facilitate grouping and retrieval of data
An ontology is a classification vocabularyPhilosophers: Ontology = The study of being as a branch of philosophy
Bioinformaticists: Domain ontology = representing a specific knowledge base
Features of ontologies:
1. Terms are defined2. Terms are arranged in a
hierarchy3. Relationships between the
terms are defined4. Expressed in a knowledge
representation language
Some well known ontologies are:SnoMED, Foundational Model of Anatomy, Gene Ontology, Linnean Taxonomy of species
Defining ontology classes, a beginner’s guide
An X =Y that has one or more differentiating characteristics.where Y is the is_a parent of X.
Definition: Blue cylinder = Cylinder that is blue.
Definition: cylinder = Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane.
is_a
These types are Disjoint: A cylinder can be either blue or red by definition, but not both.Definitions specify necessary and sufficient conditions.
is_a
Definition: Red cylinder = Cylinder that is red.
Y
X
Subsumption reasoning:The simplest ontology reasoning
is_a
entityentity
organismorganism
catcat
mammalmammal
animalanimalis_a
is_a
is_ahumanhuman
is_a
instance_of instance_of
Peanut Chris Shaffer
Hierarchy typesSimple taxonomy
Directed acyclic graph
Subject —— Property —— Object
brain (class) has_part neuron (class)PloS 2009 (instance) has_author Melissa (instance)Chris (instance) owns red tie (instance)
Ontology classes, instances, and relations are stored as sentences
The True Path Rule
cuticle synthesis--[i] chitin metabolismcell wall biosynthesis--[i] chitin metabolism----[i] chitin biosynthesis----[i] chitin catabolism
chitin metabolism--[i] chitin biosynthesis--[i] chitin catabolism--[i] cuticle chitin metabolism----[i] cuticle chitin biosynthesis----[i] cuticle chitin catabolism--[i] cell wall chitin metabolism----[i] cell wall chitin biosynthesis----[i] cell wall chitin catabolism
GO Before: GO After:
BUT: A fly chitin synthase could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
BUT: A fly chitin synthase could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
NOW: all the child terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.
NOW: all the child terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.
The pathway from a child term all the way up to its top level parent(s) must be universally true.
Where does the True Path Rule come from?Transitivity. Some relations are transitive, and apply across all levels of the hierarchy.
For example, a cat is_a mammal, and a mammal is_a vertebrateSOa cat is_a vertebrate
=> This is the true path rule and is because the is_a relation is transitive.
Some properties are not transitive.
For example, head has_quality round.and,
head part_of organism. So is the organism round? Of course not!
BUT, eyes are part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a tranistive relation.
Relations are logically defined in a common relation ontology or within each ontology that uses them.
≠>
Domain ontologies are organized according to upper ontologies that specify the general types of things that exist
RELATION TO TIME
GRANULARITY
CONTINUANT OCCURRENT
INDEPENDENT DEPENDENT
ORGAN ANDORGANISM
Organism(NCBI
Taxonomy)
Anatomical Entity
(FMA, CARO)
OrganFunction
(FMP, CPRO) Phenotypic Quality(PaTO)
Biological Process(GO)
CELL AND CELLULAR COMPONENT
Cell(CL)
Cellular Component(FMA, GO)
Cellular Function(GO)
MOLECULEMolecule
(ChEBI, SO,RnaO, PrO)
Molecular Function(GO)
Molecular Process(GO)
=> Classification according to these higher level types helps ensure the True Path Rule holds
UniversalityOften, the order of the terms in an assertion will matter:
We can assert
adult transformation_of child
but not
child transformation_of adult
More about using ontology classes and relationsUnivocityTerms should have the same meanings on every occasion of use. (= They should refer to the same universal types)
Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies
Common ontology languages
OWL is a recognized standard, more toolsavailable, less restricted expressivity
OBO simple, more easily readable, used by many biologists, restricted expressivity
</owl:Class> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000626"><!-- DNA sequencing --> <owl:disjointWith> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000705"/><!-- Edman degradation --> </owl:disjointWith> <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing</rdfs:label> <obo:IAO_0000111 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing</obo:IAO_0000111> <obo:IAO_0000119 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >OBI Branch derived</obo:IAO_0000119> <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing is a sequencing process which uses deoxyribonucleic acid as input and results in a the creation of DNA sequence information artifact using a DNA sequencer instrument.</obo:IAO_0000115> <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000123"/> <rdfs:subClassOf> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000293"/><!-- has_specified_input --> </owl:onProperty> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.org/obo/owl/CHEBI#CHEBI_16991"/> <owl:Restriction> <owl:someValuesFrom> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000067"/><!-- evaluant role --> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000316"/><!-- has_role --> </owl:onProperty> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> </owl:Restriction> </rdfs:subClassOf> <rdfs:subClassOf> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0600047"/><!-- sequencing assay --> </rdfs:subClassOf> <rdfs:subClassOf> <owl:Restriction> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000030"/> <owl:Restriction> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000811"/><!-- primary structure of DNA macromolecule --> <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000299"/><!-- has_specified_output --> </owl:onProperty> </owl:Restriction> </rdfs:subClassOf> <obo:IAO_0000117 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Philippe Rocca-Serra</obo:IAO_0000117> <rdfs:subClassOf> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000417"/><!-- achieves_planned_objective --> </owl:onProperty> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000441"/><!-- assay objective --> </owl:Restriction> </rdfs:subClassOf> <obo:IAO_0000112 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Genomic deletions of OFD1 account for 23% of oral-facial-digital type 1 syndrome after negative DNA sequencing. Thauvin-Robinet C, Franco B, Saugier-Veber P, Aral B, Gigot N, Donzel A, Van Maldergem L, Bieth E, Layet V, Mathieu M, Teebi A, Lespinasse J, Callier P, Mugneret F, Masurel-Paulet A, Gautier E, Huet F, Teyssier JR, Tosi M, Frébourg T, Faivre L. Hum Mutat. 2008 Nov 19. PMID: 19023858</obo:IAO_0000112> </owl:Class> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000743"><!-- immune response assay --> <owl:equivalentClass> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Restriction> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000109"/> <owl:Restriction> <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/> <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/GO#GO_0006955"/> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000299"/><!-- has_specified_output --> </owl:onProperty> </owl:Restriction> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000417"/><!-- achieves_planned_objective --> </owl:onProperty> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000441"/><!-- assay objective --> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:equivalentClass> <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000124"/> <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >immune response assay</rdfs:label> <obo:IAO_0000111 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >immune response assay</obo:IAO_0000111> <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Is an assay with the objective to determine information about an immune response</obo:IAO_0000115> <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_0000070"/><!-- assay -->
</owl:Class> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000626"><!-- DNA sequencing --> <owl:disjointWith> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000705"/><!-- Edman degradation --> </owl:disjointWith> <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing</rdfs:label> <obo:IAO_0000111 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing</obo:IAO_0000111> <obo:IAO_0000119 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >OBI Branch derived</obo:IAO_0000119> <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >DNA sequencing is a sequencing process which uses deoxyribonucleic acid as input and results in a the creation of DNA sequence information artifact using a DNA sequencer instrument.</obo:IAO_0000115> <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000123"/> <rdfs:subClassOf> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000293"/><!-- has_specified_input --> </owl:onProperty> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.org/obo/owl/CHEBI#CHEBI_16991"/> <owl:Restriction> <owl:someValuesFrom> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000067"/><!-- evaluant role --> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000316"/><!-- has_role --> </owl:onProperty> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> </owl:Restriction> </rdfs:subClassOf> <rdfs:subClassOf> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0600047"/><!-- sequencing assay --> </rdfs:subClassOf> <rdfs:subClassOf> <owl:Restriction> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000030"/> <owl:Restriction> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000811"/><!-- primary structure of DNA macromolecule --> <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000299"/><!-- has_specified_output --> </owl:onProperty> </owl:Restriction> </rdfs:subClassOf> <obo:IAO_0000117 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Philippe Rocca-Serra</obo:IAO_0000117> <rdfs:subClassOf> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000417"/><!-- achieves_planned_objective --> </owl:onProperty> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000441"/><!-- assay objective --> </owl:Restriction> </rdfs:subClassOf> <obo:IAO_0000112 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Genomic deletions of OFD1 account for 23% of oral-facial-digital type 1 syndrome after negative DNA sequencing. Thauvin-Robinet C, Franco B, Saugier-Veber P, Aral B, Gigot N, Donzel A, Van Maldergem L, Bieth E, Layet V, Mathieu M, Teebi A, Lespinasse J, Callier P, Mugneret F, Masurel-Paulet A, Gautier E, Huet F, Teyssier JR, Tosi M, Frébourg T, Faivre L. Hum Mutat. 2008 Nov 19. PMID: 19023858</obo:IAO_0000112> </owl:Class> <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000743"><!-- immune response assay --> <owl:equivalentClass> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Restriction> <owl:someValuesFrom> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000109"/> <owl:Restriction> <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/> <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/GO#GO_0006955"/> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:someValuesFrom> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000299"/><!-- has_specified_output --> </owl:onProperty> </owl:Restriction> <owl:Restriction> <owl:onProperty> <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000417"/><!-- achieves_planned_objective --> </owl:onProperty> <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/OBI_0000441"/><!-- assay objective --> </owl:Restriction> </owl:intersectionOf> </owl:Class> </owl:equivalentClass> <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000124"/> <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >immune response assay</rdfs:label> <obo:IAO_0000111 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >immune response assay</obo:IAO_0000111> <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >Is an assay with the objective to determine information about an immune response</obo:IAO_0000115> <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_0000070"/><!-- assay -->
[Term]id: OBI:0000626name: DNA sequencingdef:"DNA sequencing is a sequencing process which uses deoxyribonucleic acid as input and results in a the creation of DNA sequence information artifact using a DNA sequencer instrument." [OBI:sourced "OBI Branch derived"]is_a: OBI:0600047 ! sequencing assay
[Term]id: OBI:0000626name: DNA sequencingdef:"DNA sequencing is a sequencing process which uses deoxyribonucleic acid as input and results in a the creation of DNA sequence information artifact using a DNA sequencer instrument." [OBI:sourced "OBI Branch derived"]is_a: OBI:0600047 ! sequencing assay
Ontology Lookup Service at EBI
http://www.ebi.ac.uk/ontology-lookup/
Bioportal at the NCBO
http://bioportal.bioontology.org/
Ontology Editors (and viewers)
http://oboedit.org/
OBOEdit- OBO ontology editor and viewer
Protégé - OWL ontology editor and viewer
http://protege.stanford.edu/
Ontologies can help reconcile annotation inconsistencies
Developmental Biology, Scott Gilbert, 6th ed.
Using ontologies for error checking
Text match mappingFruit fly ‘tibia’ Human ‘tibia’
UBERON: tibia
UBERON: bone
is_a
is_a
is_a
Vertebrata
Drosophila melanogaster
part_of
Homo sapiens
is_a
only_in_taxon
part_of
NOT is_a
Using ontologies for alignment of similar vocabularies
Information Content (IC) is based on depth within the ontology and annotation frequency
Using ontologies for computation
Why do we need all these rules and standards for ontology use and construction?
• Automatic reasoning to infer related classes• Annotation consistency• Error checking• Alignment with other ontologies • Computation
Ontologies must be intelligible to both:
Humans Machines