Ontologies 101
Melissa Haendel and Jim Balhoff
INCF taskforce cross program meeting
Stockholm, Aug. 30, 2013
All about ontologies
1. What is an ontology?
2. A little logic
3. Upper ontologies
4. OWL and RDF
5. Data integration
[Figure: the databases MouseEcotope, GlyProt, DiabetInGene, and GluChem all annotate to the shared term "sphingolipid transporter activity"]
Common controlled vocabularies indicate the same meaning under different annotation circumstances
What is a controlled vocabulary?
Any closed, prescribed list of terms used for classifying data
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Can be a list.
Here is a CV of wines: Pinot noir, red, chardonnay, Chianti, Bordeaux, Riesling…. These terms are of different types (color, location, varietal), yet all sit in one list.
Another example would be the list of map locations at the end of your gazetteer.
What is a Taxonomy?
Any controlled vocabulary that is arranged in a hierarchy
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Terms are arranged in a hierarchy.
Here is a wine taxonomy:
Wine
– Red: merlot, zinfandel, cabernet, pinot noir
– White: chardonnay, pinot gris, Riesling
What is a Thesaurus?
A taxonomy that contains additional information about use of the terms
Key Features:
• Terms are not usually defined.
• Relationships between the terms are not usually defined.
• Terms are arranged in a hierarchy.
• Statements about the terms are included such as scope notes or instructions for use.
Some well known thesauri are: WordNet, NCI cancer thesaurus, MeSH
What is an ontology?
A formal conceptualization of a specified domain of interest
Key Features:
• Terms are defined.
• Relationships between the terms are defined, allowing logical inference.
• Terms are arranged in a hierarchy.
• Expressed in a knowledge representation language such as RDFS, OBO, or OWL.
Some well known ontologies are: SNOMED, Foundational Model of Anatomy, Gene Ontology, Linnaean Taxonomy of species
Reproduced with permission, Jason Freeny
http://web.mac.com/moistproduction/flash/index.html
http://www.mkbergman.com/?m=20070516
The Ontology spectrum:
Bottom line: you get what you pay for.
OBO
Are ontologies about terms or things?
What matters are things
Data annotated to the right schema will be more consistent
Dessert
– Ice cream: gelato, sno-cone, soft-serve, custard cream
– Cake: torte, double-layer, galette, angel food cake
Each term carries an identifier:
DES:000038 DES:000034
DES:000456 DES:005566 DES:000564 DES:000744
DES:0003221 DES:000667 DES:000975 DES:000544 DES:000765
DES:000034 - Frozen milk and/or cream with various sugars, and flavoring such as fresh fruit and nut purees.[1]
Labels are handles, not things
Why build an ontology? A simple example
Number of genes annotated to each of the following brain parts in an ontology:
brain 20
part_of hindbrain 15
part_of rhombomere 10
Query brain without ontology: 20
Query brain with ontology: 45
Ontologies can facilitate grouping and retrieval of data
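The brain example can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names are assumptions, and it treats the three gene sets as non-overlapping, as the slide's total of 45 implies.

```python
# Direct gene annotation counts taken from the example above.
direct_counts = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# part_of edges, stored as child -> parent.
part_of = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def is_part_of(term, ancestor):
    """Return True if term is ancestor or reaches it by following part_of."""
    while term is not None:
        if term == ancestor:
            return True
        term = part_of.get(term)
    return False


def query(term, use_ontology):
    """Count genes annotated to term, optionally including all of its parts."""
    if not use_ontology:
        return direct_counts.get(term, 0)
    return sum(n for t, n in direct_counts.items() if is_part_of(t, term))


print(query("brain", use_ontology=False))  # 20
print(query("brain", use_ontology=True))   # 45
```

Without the part_of edges the query only sees the 20 direct annotations; with them it also picks up the hindbrain and rhombomere genes.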
[Figure: an is_a hierarchy containing entity, organism, animal, mammal, cat, and human, with the individuals Peanut and Chris Shaffer attached by instance_of]
Classes, subclasses, and instances
Subtyping relation
is_a = SubClassOf
OWL individuals:
How do you tell if it is an instance or a class?
Is there more than one in existence?
Is the entity referencing a group of things with common properties?
Class or instance?
• There is only one Snoopy
• There is a class of things labeled “Snoopy toys”
Class or instance?
• There is only one Alaska
• There is a class of things labeled “States”
Class or instance?
• There is only one blue morpho in my specimen collection
• There is a class of things labeled “Blue morpho butterflies”
General Principle for Logical Definitions
Definitions are of the following Genus-Differentia form:
X = a Y which has one or more differentiating characteristics,
where Y is the is_a parent of X.
Definition: cylinder = Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane.
Definition: Blue cylinder = Cylinder that has color blue. (Blue cylinder is_a cylinder.)
Definition: Red cylinder = Cylinder that has color red. (Red cylinder is_a cylinder.)
The True Path Rule
GO Before:
cuticle synthesis
--[i] chitin metabolism
cell wall biosynthesis
--[i] chitin metabolism
----[i] chitin biosynthesis
----[i] chitin catabolism
GO After:
chitin metabolism
--[i] chitin biosynthesis
--[i] chitin catabolism
--[i] cuticle chitin metabolism
----[i] cuticle chitin biosynthesis
----[i] cuticle chitin catabolism
--[i] cell wall chitin metabolism
----[i] cell wall chitin biosynthesis
----[i] cell wall chitin catabolism
BUT: A fly chitin synthase gene could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
NOW: all the subClass terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.
The pathway from a subClass all the way up to its top level parent(s) must be universally true.
Where does the True Path Rule come from?
Transitivity. Some relations are transitive, and apply across all levels of the hierarchy.
For example, a cat is_a mammal, and a mammal is_a vertebrate, SO a cat is_a vertebrate.
=> This is the True Path Rule; it holds because the is_a relation is transitive.
Some properties are not transitive.
For example, head has_quality round, and head part_of organism. So is the organism round? Of course not!
BUT, eye part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a transitive relation.
Relations are logically defined in a common relation ontology or within each ontology that uses them.
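A rough Python sketch of what transitivity buys you, using the cat and eye examples. The function name and the pair-based representation are illustrative assumptions, not part of any ontology toolkit.

```python
def transitive_closure(pairs):
    """Return every (a, c) pair implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Transitive relations: chaining is valid.
is_a = {("cat", "mammal"), ("mammal", "vertebrate")}
part_of = {("eye", "head"), ("head", "organism")}

# has_quality is not chained across part_of, so nothing new is derived from it.
has_quality = {("head", "round")}

inferred_is_a = transitive_closure(is_a)
inferred_part_of = transitive_closure(part_of)

print(("cat", "vertebrate") in inferred_is_a)     # True
print(("eye", "organism") in inferred_part_of)    # True
print(("organism", "round") in has_quality)       # False
```

The closure is only computed for relations declared transitive; the has_quality fact stays where it was asserted.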
Relationships and definitions
A relationship from one class to another is a formalized part of its definition (an object property restriction in OWL)
A subtype relation (is_a in OBO, SubClassOf in OWL) specifies necessary conditions for membership of a class.
For example, finger part_of hand (all finger part_of some hand) states that a necessary condition of being in the class finger is to be part of some hand.
So… if a finger exists, it is part of some hand. But…this does not mean that if a hand exists, it has as a part a finger.
There are many useful ways to classify parts of organisms:
• its parts and their arrangement
• its relation to other structures (what is it: part of; connected to; adjacent to; overlapping?)
• its shape
• its function
• its developmental origins
• its species or clade
• its evolutionary history
Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”
Not all classification is useful
Be practical: Build ontologies for what you need and for what can be reused
About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours.
C. Darwin
[Figure: neuron placed under anatomical structure; lumen placed under anatomical space; epidural space placed under both anatomical space and anatomical structure, which is marked ✗]
Some classes are declared to never share any instances in common
OWL: DisjointWith
OBO: disjoint_from
About reasoners
A reasoner is a piece of software able to infer logical consequences from a set of asserted facts or axioms.
They are used to check the logical consistency of the ontologies and to extend the ontologies with "inferred" facts or axioms
For example, a reasoner would infer:
Major premise: All mortals die.
Minor premise: Some men are mortals.
Conclusion: Some men die.
Different reasoners can perform slightly differently. There are a number of reasoners to be aware of: ELK, HermiT, Pellet, etc.
Different kinds of definitions
An ontology is a collection of axioms
An axiom is simply a sentence or a statement
Axioms can be
non-logical (aka “annotations” or “text definitions”)
  • e.g. GO_0000262 has synonym ‘mtDNA’
  • opaque to reasoners
logical
  • well-defined semantics
  • understood by reasoners
  • Example: SubClassOf axioms
  • Arguments can be classes or class expressions (for example, the class of things that are parts of the midbrain)
Classifying anatomy
[Figure: appendage with subclasses antenna and wing; wing with subclasses forewing and hindwing]
Relationships record classifications too
substantia nigra
part_of some ‘midbrain’
trochlear nucleus
‘substantia nigra’ SubClassOf part_of some ‘midbrain’
The knowledge in an ontology can make the reasons for classification explicit
Any sense organ that functions in the detection of smell is an olfactory sense organ
[Figure: olfactory sense organ is defined as sense organ and capable_of some detection of smell; nose is asserted to be a sense organ that is capable_of some detection of smell]
Classifying
[Figure: because nose meets that definition, it is classified under olfactory sense organ]
=> These are necessary and sufficient conditions, also called an equivalent class axiom
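A minimal sketch of how necessary and sufficient conditions let a program classify for you. This is not how a real OWL reasoner works internally; the dictionaries, the `classify` function, and the extra `eye` entry are assumptions made for illustration.

```python
# Asserted facts about each class (the "eye" entry is a hypothetical contrast case).
asserted = {
    "nose": {"is_a": {"sense organ"}, "capable_of": {"detection of smell"}},
    "eye": {"is_a": {"sense organ"}, "capable_of": {"detection of light"}},
}

# Equivalent class axiom:
# olfactory sense organ = sense organ and capable_of some detection of smell
definitions = {
    "olfactory sense organ": {
        "is_a": {"sense organ"},
        "capable_of": {"detection of smell"},
    },
}


def classify(asserted, definitions):
    """For each class, list the defined classes whose conditions it fully meets."""
    inferred = {}
    for name, facts in asserted.items():
        inferred[name] = {
            defined
            for defined, conditions in definitions.items()
            if all(
                fillers <= facts.get(relation, set())
                for relation, fillers in conditions.items()
            )
        }
    return inferred


result = classify(asserted, definitions)
print(result["nose"])  # {'olfactory sense organ'}
print(result["eye"])   # set()
```

Nobody asserted that nose is an olfactory sense organ; it falls out of the definition.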
Compositionality and avoiding asserted multiple inheritance
We can logically define composed classes and create complex definitions from simpler ones
aka: building blocks, cross-products, logical definitions
Descriptions can be composed at any time
• Ontology construction time (pre-composition)
• Annotation time (post-composition)
Formal necessary and sufficient definitions + a reasoner
• Automatic (and therefore manageable) classification
• Requires subtype classification, so apart from the root term(s), no term should lack a superclass parent.
Let the reasoner do the work!
Post-composed versus pre-composed anatomical entities are logically equivalent
Named class: Plasma membrane of neuron
Has same semantic meaning as:
plasma membrane and part_of some neuron
• Genus: plasma membrane (Gene Ontology)
• Differentia: part_of (Relation Ontology) some neuron (Cell Ontology)
Many perspectives, many ontologies – that overlap in content
[Figure: overlapping domains: chemical entities, gross anatomy, tissues, cells, cell anatomy, proteins, phenotypes, clinical disorders, processes, physiological processes, development, reactions, cellular processes, behavior, evolutionary characters, nervous system, neural crest]
Domain ontologies are organized according to upper ontologies (Basic Formal Ontology) that specify the general types of things that exist
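RELATION TO TIME: CONTINUANT (INDEPENDENT or DEPENDENT) versus OCCURRENT
GRANULARITY: ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
GRANULARITY: MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)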
=> Classification according to these higher level types helps ensure the True Path Rule holds
The Common Anatomy Reference Ontology
CARO is a structural classification based on granularity
From the bottom up:
• Cell component
• Cell
• Tissue
• Multi-tissue structure
From the top down:
• Organism subdivision
• Anatomical system
Acellular structures
CARO is an upper reference ontology that can be used to structure new anatomy ontologies
Example of complexity arising from multiple species-contexts
[Figure, first version: erythrocyte shown under cell alongside nucleate cell and enucleate cell, annotated "not applicable in all contexts"]
[Figure, second version: erythrocyte with subclasses nucleate erythrocyte and enucleate erythrocyte, under cell with nucleate cell and enucleate cell; zebrafish nucleate erythrocyte and human erythrocyte are attached below. Identifiers shown: ZFA:0009256, CL:0000562, CL:0000232, CL:0000592, FMA:81100]
Species ontologies attached at appropriate level
Developmental Biology, Scott Gilbert, 6th ed.
Using reasoners to detect errors
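[Figure: fruit fly FBbt ‘tibia’ and human FMA ‘tibia’ are both linked by is_a to UBERON: tibia, which is_a UBERON: bone. The taxon nodes Vertebrata, Drosophila melanogaster, and Homo sapiens are connected by part_of, is_a, only_in_taxon, and disjoint_with edges. One link is marked ✗ as an error.]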
Resource Description Framework(RDF)
Knowledge and data represented in simple statements called “triples”
Engineers shouldn’t be allowed to name things
RDF
• Ontologies support semantic classification of data
• RDF is the data standard for the semantic web
• Extremely simple and flexible data model (not a file format)
• A W3C recommendation: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
• Relate “things” with simple statements: triples
• <subject> <predicate> <object>
• <John> <hasPet> <Fido>
• Everything has a global, unique identifier: URI
• Objects/values can be “things” or literal values (text, number, date)
Linked Data Principles
Use URIs as names for things.
Use HTTP URIs so we can look up the names.
Return useful information using standards when someone looks up a name.
Link to other URIs so we can discover more things (“follow your nose”).
http://www.w3.org/DesignIssues/LinkedData.html
RDF data integration
Easy to combine separate datasets
– Shared identifiers
– Unordered collection of triples (facts)
Dataset 1
• example:john example:hasPet example:fido
• example:john rdfs:label “John”
• example:fido rdfs:label “Fido”
• example:john rdf:type example:Person
Dataset 2
• example:fido rdf:type example:Dog
• example:fido example:age “6”
Combined dataset
• example:john example:hasPet example:fido
• example:john rdfs:label “John”
• example:fido rdfs:label “Fido”
• example:john rdf:type example:Person
• example:fido rdf:type example:Dog
• example:fido example:age “6”
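Because a dataset is just an unordered collection of triples with shared identifiers, combining two of them amounts to a set union. The sketch below uses plain Python tuples rather than an RDF library; the split of the six triples into two datasets and the `describe` helper are assumptions for illustration.

```python
# Each triple is (subject, predicate, object), written with the prefixed names above.
dataset_1 = {
    ("example:john", "example:hasPet", "example:fido"),
    ("example:john", "rdfs:label", "John"),
    ("example:fido", "rdfs:label", "Fido"),
    ("example:john", "rdf:type", "example:Person"),
}
dataset_2 = {
    ("example:fido", "rdf:type", "example:Dog"),
    ("example:fido", "example:age", "6"),
}

# Merging is a set union: order does not matter and duplicates collapse.
combined = dataset_1 | dataset_2


def describe(subject, triples):
    """Return all (predicate, object) pairs known about one subject, sorted."""
    return sorted((p, o) for s, p, o in triples if s == subject)


for predicate, obj in describe("example:fido", combined):
    print(predicate, obj)
```

After the merge, everything known about example:fido is reachable from one identifier, regardless of which dataset each fact came from.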
An Introduction to OWL
What is OWL?
• Web Ontology Language (OWL) is a language for writing ontologies for the Web
• OWL 2 is the current version of OWL and a W3C recommendation as of October 2009
• The previous version of OWL (OWL 1) was a W3C recommendation in 2004
• An overview of the language can be found at: http://www.w3.org/TR/owl2-overview/
OWL
The things or objects about which knowledge is represented (e.g. Melissa, Jim, Sweden) are called individuals (aka instances)
Groups of things (e.g. person, conference) are called classes
Relations between things (e.g. siblings) are called properties
What does OWL add to RDF?
More meaningful semantics
– OWL allows one to specify characteristics of properties and classes
  • If “Melissa isMarriedTo John” then this implies “John isMarriedTo Melissa” (symmetric property)
  • Complex class definitions composed of properties and other classes (“expressions”)
    – muscle and (attached_to some bone and (part_of some head))
Enables reasoning
OWL - added semantics
Infer new RDF triples (facts) based on asserted triples and OWL axioms
OWL 2 Full - direct extension of RDF
OWL 2 DL - practical reasoning based on Description Logic
– strict separation of classes and instances
– entails some constraints on RDF
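As a small illustration of inferring new triples from asserted triples plus an axiom, here is the symmetric property example in plain Python. The function name is made up for this sketch; a real OWL reasoner handles many more axiom types.

```python
def apply_symmetric(triples, symmetric_properties):
    """Add the reversed triple for every property declared symmetric."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in symmetric_properties:
            inferred.add((obj, predicate, subject))
    return inferred


asserted = {("Melissa", "isMarriedTo", "John")}
inferred = apply_symmetric(asserted, symmetric_properties={"isMarriedTo"})

print(("John", "isMarriedTo", "Melissa") in inferred)  # True
```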
Class versus individual assertions
In OWL, the things you can say about relationships between classes are basically limited to:
EquivalentTo, subClassOf, disjointWith
All other relationships (aka properties) are between OWL individuals:
Built in properties: sameAs, differentFrom
Relationship between individual and class: Type
Object properties: user-added content in the ontology
OWL primer
http://www.w3.org/TR/owl2-primer/
This is an EXCELLENT resource
Object Properties
• Object properties link one individual to another individual
• For example, the individuals Melissa and INCF Ontology Lecture are related through the teaches property
Melissa
INCF Ontology Lecture
teaches
Object Properties
Properties may have a domain and a range specified. For instance, the property teaches can have domain Person and range Lecture
teaches rdf:type owl:ObjectProperty
teaches rdfs:domain foaf:Person
teaches rdfs:range Lecture
Melissa
INCF Ontology Lecture
LecturePerson
teaches
Complex Class definition (Equivalent Class)
In OWL, we can say that an instructor is equivalent to a person that teaches some lecture
Class: Instructor
EquivalentTo: Person and (teaches some Lecture)
Reasoning Example
Assuming we have defined the object property teaches with domain foaf:Person and range Lecture
We assert that
Melissa teaches INCF_ontology_lecture
From the domain and range, a reasoner infers that
Melissa rdf:type Person
INCF_ontology_lecture rdf:type Lecture
And because of the equivalent class, we also infer that:
Melissa rdf:type Instructor
Melissa
INCF Ontology Lecture
LecturePerson
teaches
Instructor
Reasoning Example
Assuming we have defined the object property teaches with domain foaf:Person and range Lecture
We assert that
Melissa teaches INCF_ontology_lecture
INCF_ontology_lecture rdf:type Book
Book owl:disjointWith Lecture
Error!! The range of teaches makes INCF_ontology_lecture a Lecture, but it is also asserted to be a Book, and Book and Lecture are disjoint, so a reasoner reports an inconsistency.
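The two reasoning examples can be mimicked with a toy rule engine. This is a hand-written sketch under simplifying assumptions (one property, one equivalence, one disjointness axiom, all hard-coded); it is not the API of ELK, HermiT, Pellet, or any other reasoner.

```python
DOMAIN = {"teaches": "Person"}
RANGE = {"teaches": "Lecture"}
DISJOINT = {frozenset({"Book", "Lecture"})}


def infer_types(property_assertions, type_assertions):
    """Apply domain, range, and the Instructor equivalence to asserted facts."""
    types = {}
    for individual, cls in type_assertions:
        types.setdefault(individual, set()).add(cls)
    for subject, prop, obj in property_assertions:
        if prop in DOMAIN:
            types.setdefault(subject, set()).add(DOMAIN[prop])
        if prop in RANGE:
            types.setdefault(obj, set()).add(RANGE[prop])
    # Instructor EquivalentTo: Person and (teaches some Lecture)
    for subject, prop, obj in property_assertions:
        if (
            prop == "teaches"
            and "Person" in types.get(subject, set())
            and "Lecture" in types.get(obj, set())
        ):
            types[subject].add("Instructor")
    return types


def find_clashes(types):
    """Return individuals typed with two classes that are declared disjoint."""
    return sorted(
        individual
        for individual, classes in types.items()
        if any(pair <= classes for pair in DISJOINT)
    )


teaching = {("Melissa", "teaches", "INCF_ontology_lecture")}

ok = infer_types(teaching, type_assertions=set())
print(ok["Melissa"])          # contains Person and Instructor
print(find_clashes(ok))       # []

bad = infer_types(teaching, type_assertions={("INCF_ontology_lecture", "Book")})
print(find_clashes(bad))      # ['INCF_ontology_lecture']
```

The second run reproduces the error case: the lecture ends up typed as both Book and Lecture, which the disjointness axiom forbids.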
Satisfiability
• Unsatisfiable class: impossible to have instances
• Incoherent ontology: contains unsatisfiable classes
  – still usable
• Inconsistent ontology: contains impossible instances (such as instances of unsatisfiable classes)
  – not really usable: “anything follows from a contradiction”
  – http://ontogenesis.knowledgeblog.org/1329
Open World vs. Closed World
Open world assumption: everything we don’t know is undefined
Closed world assumption: everything we don’t know is false
Example:
Statement: Vertebrate has_part spinal cord
Query: “Do fruit flies have spinal cords?”
Closed world response = no
Open world response = unknown
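The difference between the two assumptions shows up in how a missing fact is reported. A minimal sketch, with hypothetical function names and a one-fact knowledge base:

```python
# The only statement in the knowledge base.
facts = {("vertebrate", "has_part", "spinal cord")}


def ask_closed_world(triple):
    """Closed world: anything not stated is treated as false."""
    return "yes" if triple in facts else "no"


def ask_open_world(triple):
    """Open world: anything not stated is simply not known."""
    return "yes" if triple in facts else "unknown"


question = ("fruit fly", "has_part", "spinal cord")
print(ask_closed_world(question))  # no
print(ask_open_world(question))    # unknown
```

OWL reasoning follows the open world assumption, so absence of a statement is never evidence of its negation.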
OWL Syntaxes
• Many flavors
– Functional OWL Syntax
– RDF based Syntaxes
• RDF/XML
• Turtle
• OWL/XML
• Manchester Syntax
Still they all can be represented as a graph or a set of triples!
So what can we do with all this ontology “stuff”?
Querying across anatomical granularity
CJ Mungall, C Torniai, GV Gkoutos, SE Lewis, MA Haendel. 2012. Uberon, an integrative multi-species anatomy ontology. Genome biology 13 (1), R5
Uberon.org
Top hit mouse
Top hit fish
Exomiser: Using comparative phenotype analyses to improve variant prioritization
Ontology considerations
• An ontology is a classification and there are many useful ways to classify biology
• Everybody makes mistakes
  – Let the computer find errors, manage knowledge for you
• Re-use other people’s work where possible
  – Import class hierarchies, use common patterns
• Cautionary note – formal languages have limitations.
  – Don’t expect or want to express everything!
• RDF is a model for semantic, linked data
• OWL is an ontology language for RDF and the web
• Ontologies are a tool for data exchange and integration via a variety of formats (tomorrow’s discussion)
Querying for genes in similar structures across species
[Figure, panels A to E: appendage-like structures across Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, and Echinodermata, including tetrapod limbs, ampullae, tube feet, and parapodia. Examples shown: mouse limb, sea urchin tube feet, ascidian ampulla, polychaete parapodia.]
Distal-less orthologs participate in distal-proximal pattern formation and appendage morphogenesis
Panganiban et al., PNAS, 1997