Ontology and the Semantic Web
Barry Smith
August 26, 2013
Ontologies
• are computer-tractable representations of types in specific areas of reality
• are more and less general (upper and lower ontologies)
– upper = organizing ontologies
– lower = domain ontologies
[Figure: Foundational Model of Anatomy (FMA). Graph of is_a and part_of relations among: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]
ontologies = standardized labels designed for use in annotations
to make the data cognitively accessible to human beings
and algorithmically accessible to computers
Ontologies facilitate retrieval of data
by allowing grouping of annotations
brain 20, hindbrain 15, rhombomere 10
Query brain without ontology: 20
Query brain with ontology: 45
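The counts on this slide can be reproduced with a minimal sketch. It assumes, as the totals imply, that hindbrain sits below brain and rhombomere below hindbrain in the hierarchy. The dictionary layout and function names are illustrative only and are not taken from any ontology tool.

```python
# Annotation counts from the slide.
ANNOTATIONS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Assumed hierarchy: child term -> parent term.
PARENT = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def descendants(term):
    """Return the term together with every term below it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def query_without_ontology(term):
    """Count only the annotations that use exactly this label."""
    return ANNOTATIONS.get(term, 0)


def query_with_ontology(term):
    """Count annotations to the term and to all of its descendants."""
    return sum(ANNOTATIONS.get(t, 0) for t in descendants(term))


print(query_without_ontology("brain"))  # 20
print(query_with_ontology("brain"))     # 45
```

The ontology-aware query returns 45 because it adds the 15 hindbrain and 10 rhombomere annotations to the 20 that mention brain directly.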
ontologies = high quality controlled structured vocabularies used for the annotation (description, tagging) of data, images, emails, documents, …
Ontology’s greatest successes around net-centricity
• You build a site
• Others discover the site and they link to it
• The more they link, the more well known the page becomes (Google …)
• Your data becomes discoverable
• Your data becomes more easily discoverable the more you use common vocabularies
1. Each group creates a controlled vocabulary of the terms commonly used in its domain, and creates an ontology out of these terms using OWL (Web Ontology Language) syntax
4. Binds this ontology to its data and makes these data available on the Web
5. The ontologies are linked e.g. through their use of some common terms
6. These links create links among all the datasets, thereby creating a ‘web of data’
7. We can all share the same tags – they are called internet addresses
The roots of Semantic Technology
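As a rough illustration of how shared terms link datasets into a 'web of data', the sketch below represents two independently published datasets as plain (subject, predicate, object) triples. All URIs, predicate names and the helper function are hypothetical; a real deployment would use OWL/RDF tooling rather than Python tuples.

```python
# Shared ontology term used by both groups (hypothetical URI).
COMMON_TERM = "http://example.org/ontology/Brain"

# Each dataset is a set of (subject, predicate, object) triples.
dataset_a = {
    ("http://lab-a.example.org/image/17", "depicts", COMMON_TERM),
    ("http://lab-a.example.org/image/18", "depicts",
     "http://lab-a.example.org/terms/Liver"),  # local, unshared term
}
dataset_b = {
    ("http://lab-b.example.org/sample/3", "derived_from", COMMON_TERM),
}


def linked_by_common_terms(a, b):
    """Pair up subjects from a and b whose triples point at the same term."""
    links = set()
    for subj_a, _, obj_a in a:
        for subj_b, _, obj_b in b:
            if obj_a == obj_b:
                links.add((subj_a, obj_a, subj_b))
    return links


links = linked_by_common_terms(dataset_a, dataset_b)
for subj_a, term, subj_b in sorted(links):
    print(subj_a, "<->", subj_b, "via", term)
```

Only the record tagged with the common term becomes linked; the record tagged with a purely local term stays isolated, which is the silo problem in miniature.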
Audio Features Ontology
Where we stand today
• increasing availability of semantically enhanced data and semantic software
• increasing use of OWL (Web Ontology Language) in attempts to create useful integration of on-line data and information
• “Linked Open Data” the New Big Thing
as of September 2010
The problem: the more this sort of Semantic Technology is successful, the more it fails
The original idea was to break down silos via common controlled vocabularies for the tagging of data
The very success of the approach leads to the creation of ever new controlled vocabularies – semantic silos – as ever more ontologies are created in ad hoc ways
Every organization and sub-organization now wants to have its own “ontology”
The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization
Divided we fail
United we also fail
The problem: many, many silos
• DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services
• these systems are poorly integrated
• deliver redundant capabilities
• make data hard to access, foster error and waste
• prevent secondary uses of data
https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data
what is missing here
Syntactic and semantic interoperability
• Syntactic interoperability = systems can exchange messages (realized by XML).
• Semantic interoperability = messages are interpreted in the same way by senders and receivers.
• In UCore, meanings are specified via natural-language strings.
• Experience shows that this is not a viable route to achieving semantic interoperability.
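The difference between the two kinds of interoperability can be sketched as follows. The message, the element names, the unit URIs and the km/h versus mph mismatch are all invented for illustration and are not taken from UCore.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: well-formed XML that both systems can parse.
MESSAGE = "<track><speed>100</speed></track>"


def sender_reading_kmh(xml_text):
    """The sender meant kilometres per hour."""
    return float(ET.fromstring(xml_text).findtext("speed"))


def receiver_reading_kmh(xml_text):
    """The receiver assumes miles per hour and converts to km/h."""
    mph = float(ET.fromstring(xml_text).findtext("speed"))
    return mph * 1.609344


# Same message, but the unit is now tagged with a term from a shared
# vocabulary (hypothetical URIs) instead of being left to interpretation.
TAGGED = ('<track><speed unit="http://example.org/units/km_per_h">'
          '100</speed></track>')
FACTORS_TO_KMH = {
    "http://example.org/units/km_per_h": 1.0,
    "http://example.org/units/mile_per_h": 1.609344,
}


def reading_with_shared_term(xml_text):
    """Interpret the value through the shared unit term."""
    node = ET.fromstring(xml_text).find("speed")
    return float(node.text) * FACTORS_TO_KMH[node.get("unit")]


# Syntactic interoperability holds: both sides parse the message.
# Semantic interoperability fails: they disagree about what it means.
print(sender_reading_kmh(MESSAGE), receiver_reading_kmh(MESSAGE))
print(reading_with_shared_term(TAGGED))
```

Both functions parse the untagged message without error, yet they arrive at different speeds; only the version tagged with a shared term is read the same way on both sides.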
How to avoid the problem of semantic silos
Distributed Development of a Shared Semantic Resource
Pilot testing to demonstrate feasibility for I2WD
An alternative solution: Semantic Enhancement
A distributed incremental strategy of coordinated annotation
– data remain in their original state (are treated at ‘arm’s length’)
– ‘tagged’ using interoperable ontologies created in tandem
– allows flexible response to new needs, adjustable in real time
– can be as complete as needed, lossless, long-lasting because flexible and responsive
– big bang for buck – measurable benefit even from first small investments
The strategy works only to the degree that it rests on shared governance and training
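A minimal sketch of annotation at arm's length, under stated assumptions: the two sources, their field names, the local labels and the `ONT:` term identifiers are all hypothetical. The point is only that the tags live in a separate layer and the source records are never modified.

```python
import copy

# Source records stay exactly as delivered (hypothetical data).
source_a = [{"id": "a1", "kind": "lorry"}, {"id": "a2", "kind": "bicycle"}]
source_b = [{"id": "b1", "category": "truck"}]

# Annotation layer, kept separate from the sources:
# (source name, local label) -> shared ontology term.
TAGS = {
    ("A", "lorry"): "ONT:Truck",
    ("A", "bicycle"): "ONT:Bicycle",
    ("B", "truck"): "ONT:Truck",
}


def annotate(source_name, records, label_field):
    """Yield (record id, ontology term) pairs without modifying the records."""
    for record in records:
        term = TAGS.get((source_name, record[label_field]))
        if term is not None:
            yield record["id"], term


def find(term):
    """Query both sources as if they were one, via the shared tags."""
    hits = list(annotate("A", source_a, "kind"))
    hits += list(annotate("B", source_b, "category"))
    return sorted(record_id for record_id, t in hits if t == term)


before = copy.deepcopy((source_a, source_b))
print(find("ONT:Truck"))  # ['a1', 'b1']
assert before == (source_a, source_b)  # the sources are untouched
```

New needs are met by adding entries to the tag layer, so coverage can grow incrementally while the underlying data stay in their original state.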
compare: legends for maps
common legends allow (cross-border) integration
[Figure, shown in three steps: The Gene Ontology as a common legend for the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, with example terms “sphingolipid transporter activity” and “Holliday junction helicase complex”]
Common legends
• help human beings use and understand complex representations of reality
• help human beings create useful complex representations of reality
• help computers process complex representations of reality
• help glue data together
But common legends serve these purposes only if the legends are developed in a coordinated, non-redundant fashion
International System of Units
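The Open Biomedical Ontologies (OBO) Foundry
[Table: columns = RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT); rows = GRANULARITY]
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)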
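rationale of OBO Foundry coverage
[Table: columns = RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT); rows = GRANULARITY]
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)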
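Population-level ontologies
[Table: columns = RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT); rows = GRANULARITY]
COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)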
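Environment Ontology
[Table: columns = RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT); rows = GRANULARITY; vertical label alongside the table: environments]
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)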
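[Table: columns = RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT); rows = GRANULARITY; vertical label alongside the table: ENVIRONMENT]
COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cell Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
http://obofoundry.org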
[Table: column = RELATION TO TIME (INDEPENDENT CONTINUANT only); rows = GRANULARITY; vertical label alongside the table: ENVIRONMENT]
COMPLEX OF ORGANISMS
– Family, Community, Deme, Population
– Environment of population
ORGAN AND ORGANISM
– Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Environment of single organism
CELL AND CELLULAR COMPONENT
– Cell (CL); Cell Component (FMA, GO)
– Environment of cell
MOLECULE
– Molecule (ChEBI, SO, RnaO, PrO)
– Molecular environment
http://obofoundry.org
The OBO Foundry is based on the idea of annotation = semantic enhancement of data across all of biology
$200 million spent so far on using the GO to annotate (tag) biomedical research data through manual effort of PhD biologists
OBO Foundry approach extended into other domains
NIF Standard: Neuroscience Information Framework
ISF Ontologies: Integrated Semantic Framework
OGMS and Extensions: Ontology for General Medical Science
IDO Consortium: Infectious Disease Ontology
cROP: Common Reference Ontologies for Plants
What these annotations do
• make data retrievable even by those not involved in their creation
• allow integration of data deriving from heterogeneous sources
• break down the walls of roach motels
Benefits of the Approach
• Does not interfere with the source content
• Enables content to evolve in a cumulative fashion as it accommodates new kinds of data
• Does not depend on the data resources and can be developed independently from them in an incremental and distributed fashion
• Provides a more consistent, homogeneous, and well-articulated presentation of the content which originates in multiple internally inconsistent and heterogeneous systems
Benefits of the Approach
• Makes management and exploitation of the content more cost-effective
• Allows graceful integration with other government initiatives and brings the system closer to the federally mandated net-centric data strategy
• Creates incrementally an integrated content that is effectively searchable and that provides content to which more powerful analytics can be applied
Building the Shared Semantic Resource
• Methodology of distributed incremental development
• Training
• Governance
• Common Architecture of Ontologies to support consistency, non-redundancy, modularity
– Upper Level Ontology (BFO)
– Mid-Level Ontologies
– Low Level Ontologies
Goal: To realize Horizontal Integration (HI) of intelligence data
HI =Def. the ability to exploit multiple data sources as if they are one
Problem: the data coming onstream are out of our control
Any strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
I2WD Strategy
Create an agile strategy for building ontologies within a Shared Semantic Resource (SSR) and apply and extend these ontologies to annotate new source data as they come onstream
– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?
Why do large-scale ontology projects fail?
• focus on vocabularies, lexicons, with no logical structure, no attention to life cycle
• failure of housekeeping yields redundancy and therefore forking
• the same data is annotated in different ways by users of different ontology fragments
• data is siloed as before
– HOW TO BUILD THE NEEDED LOGIC INTO THE ARCHITECTURE OF THE ONTOLOGIES?
Examples of Principles
• All terms in all ontologies should be singular nouns
• Same relations between terms should be reused in every ontology
• Reference ontologies should be based on single inheritance
• All definitions should be of the form
an S =Def. a G which Ds
where ‘G’ (for: genus) is the parent term of S (for: species) in the corresponding reference ontology
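Two of these principles lend themselves to mechanical checking. The sketch below is one possible way to do it, assuming a toy ontology stored as plain dictionaries; the terms, the sample definition and the function names are invented for illustration.

```python
# Hypothetical mini reference ontology: term -> list of is_a parents.
PARENTS = {
    "anatomical structure": [],
    "cell": ["anatomical structure"],
    "neuron": ["cell"],
}

# Definitions stored as (species S, genus G, differentia D); toy content.
DEFINITIONS = {
    "neuron": ("neuron", "cell", "transmits nerve impulses"),
}


def violates_single_inheritance(parents):
    """Return the terms that have more than one is_a parent."""
    return sorted(t for t, ps in parents.items() if len(ps) > 1)


def genus_matches_parent(term, definitions, parents):
    """Check that the genus G of the definition is the term's single parent."""
    _, genus, _ = definitions[term]
    return parents[term] == [genus]


def render(term, definitions):
    """Write the definition in the 'S =Def. a G which Ds' pattern."""
    species, genus, differentia = definitions[term]
    return f"{species} =Def. a {genus} which {differentia}"


print(violates_single_inheritance(PARENTS))
print(genus_matches_parent("neuron", DEFINITIONS, PARENTS))
print(render("neuron", DEFINITIONS))
```

Storing the genus separately from the differentia is a design choice of this sketch: it lets the definition be validated against the is_a hierarchy instead of being treated as free text.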
Extension Strategy + Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)
Ontologies are built as orthogonal modules which form an incrementally evolving network
• scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network
• users are motivated by the assurance that the ontologies they turn to are maintained by experts
More benefits of orthogonality
• helps those new to ontology to find what they need
• to find models of good practice
• ensures mutual consistency of ontologies (trivially)
• and thereby ensures additivity of annotations
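One way to picture the consistency claim is the sketch below. It assumes that orthogonality can be read as "no term is defined in more than one module", which is a simplification; the module names and terms are hypothetical.

```python
from itertools import combinations

# Hypothetical modules: ontology name -> set of term labels it defines.
MODULES = {
    "anatomy": {"cell", "organ"},
    "process": {"cell division"},
    "quality": {"size", "shape"},
}


def overlapping_terms(modules):
    """Return {(name1, name2): shared terms} for every pair that overlaps."""
    clashes = {}
    for name1, name2 in combinations(sorted(modules), 2):
        shared = modules[name1] & modules[name2]
        if shared:
            clashes[(name1, name2)] = shared
    return clashes


def is_orthogonal(modules):
    """True when no two modules define the same term."""
    return not overlapping_terms(modules)


print(is_orthogonal(MODULES))
print(overlapping_terms({"a": {"cell"}, "b": {"cell", "tissue"}}))
```

When the check passes, annotations made with different modules cannot conflict over the meaning of a term, so they can simply be pooled.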
More benefits of orthogonality
• No need to reinvent the wheel for each new domain
• Can profit from storehouse of lessons learned
• Can more easily reuse what is made by others
• Can more easily reuse training
• Can more easily inspect and criticize results of others’ work
• Leads to innovations (e.g. MIREOT, OntoFox) in strategies for combining ontologies