Santa Fe 2K
Ontological Analysis & Integration of Terminologies: Towards an Environmental Reference Ontology Library
Geri Steve, Aldo Gangemi, Domenico M. Pisanelli
Istituto di Tecnologie Biomediche, CNR, Rome, Italy
http://saussure.irmkant.rm.cnr.it
{steve,gangemi,pisanelli}@saussure.irmkant.rm.cnr.it
Which part are you talking about?
• If my liver is part of my digestive system, and that system is part of me, is my liver part of me?
• If my liver is a part of me and I am part of the CNR, is my liver part of the CNR?
My liver is a component of my digestive system, while I am a member of the CNR. There is no rule for composing component and member relations
Moreover, I am a body, but I am also a person. A living person depends on a body. Nevertheless, a living person can be a member of the CNR, but a body cannot
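The liver example can be sketched in code. This is a minimal illustration rather than anything from the ONIONS library: the relation names `component-of` and `member-of` follow the slide, while the function `part_chain`, the node name `me`, and the rule that only links of the same kind compose are assumptions made for the example.

```python
# Hypothetical sketch: part-whole links tagged with the kind of relation.
# Assumption for illustration: a chain of links composes only when every
# link in the chain has the same kind.
PART_LINKS = {
    ("liver", "digestive system"): "component-of",
    ("digestive system", "me"): "component-of",
    ("me", "CNR"): "member-of",
}


def part_chain(start, goal, links=PART_LINKS):
    """Return the relation kind if `start` reaches `goal` through links of
    one single kind, otherwise None."""
    for kind in sorted(set(links.values())):
        frontier, seen = [start], {start}
        while frontier:
            node = frontier.pop()
            for (part, whole), link_kind in links.items():
                if link_kind == kind and part == node and whole not in seen:
                    if whole == goal:
                        return kind
                    seen.add(whole)
                    frontier.append(whole)
    return None


print(part_chain("liver", "me"))   # component-of
print(part_chain("liver", "CNR"))  # None: the two kinds do not compose
```

Under these assumptions the first question on the slide gets a positive answer and the second does not, because the only path from the liver to the CNR mixes the two kinds of relation.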
Object or place?
• Is a body region an object that one could cut, or a place?
• Is a gene a DNA fragment, or a DNA region (allele)?
• Is a river an orographic object, or the geographic place of a watercourse?
Despite many differences, these three cases seem analogous: they share a polysemy that partly depends on an abstract difference between objects and regions, and on a related axiom specifying that objects must be located at some region
River in the GEMET thesaurus
[Figure: fragment of the GEMET thesaurus hierarchy containing "river". Node labels: hydrosphere, water (geog), watercourse, water body, sea, surface water, water reservoir, brook, river, spring, lake, hydrologic cycle]
Should we worry about those things?
Even in the presence of polysemous names, a standalone application using a local databank or terminological repository may be able to accomplish its task without serious flaws.
However, when it is integrated with another application, semantic mismatches constitute a serious obstacle for the agent or interface that is negotiating or sharing information.
The ever-increasing demand for data sharing has to rely on a solid conceptual foundation in order to give a semantics to the terabytes available in different databases and eventually traveling over the networks.
Ontologies are currently recognized as the answer to the need for a conceptual foundation.
The advantages of ontologies
to allow more effective data and knowledge sharing
to facilitate knowledge re-use in decision support systems
to give theoretical foundation to vocabulary standardization activity
Our task
We learn domain ontologies (in medicine and the environment) by integrating the conceptual models that can be extracted from terminological sources
The goal is to build Domain Reference Ontologies in the form of modular libraries of formal theories
In our ONIONS methodology, ontology learning needs both incremental bottom-up learning from sources and incremental definition and reuse of general theories that can account for the intended meaning of terms
ONtologic Integration Of Naïve Sources
[Figure labels: context, general theories, concept, defining elements]
Minimal history
The ONIONS methodology for ontology integration has been under development since the early 1990s to account for the problem of conceptual heterogeneity. It tackles some problems encountered in the European project GALEN and the Italian projects SOLMC (Ontological and Linguistic Tools for Conceptual Modeling) and ONTOINT (Ontological Integration of Information)
Some related research projects
GALEN & GALEN-IN-USE
CYC anatomy
SNOMED RT
HL7 vocabulary committee
MED
What is an ontology?
«A specification of a conceptualization» (Gruber, 1993)
«The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. [...] »
(Sowa, 1997)
«A partial and indirect specification of a conceptualization» -restricted notion- (Guarino, 1998)
What is an ontology (restricted notion)?
An ontology is a set of axioms that account for the intended meaning (the intended models) of a vocabulary (the namespace of a logical language)
A set of axioms usually only approximates such intended models, which in turn only approximate the conceptualization of vocabulary items
A conceptualization is a set of conceptual relations that range over a domain and a set of relevant states of affairs (possible worlds) for that domain
Therefore, a precise definition of "ontology" (in a restricted, formal sense) might be "a partial specification of the intended models of the conceptualization of a vocabulary"
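The gap between axioms and intended models can be made concrete with a toy enumeration. The sketch below is only an illustration: the two-element domain, the single relation (read as "part of"), the choice of axioms, and all identifiers are assumptions, not part of any ontology discussed in the talk.

```python
from itertools import combinations, product

# Hypothetical two-element domain and one binary relation, read as "part of".
DOMAIN = ("liver", "body")
PAIRS = list(product(DOMAIN, repeat=2))


def all_interpretations():
    """Yield every possible extension of the relation over DOMAIN."""
    for size in range(len(PAIRS) + 1):
        for subset in combinations(PAIRS, size):
            yield frozenset(subset)


def irreflexive(rel):
    return all((x, x) not in rel for x in DOMAIN)


def asymmetric(rel):
    return all((y, x) not in rel for (x, y) in rel)


# The "ontology": a small set of axioms constraining the relation.
AXIOMS = (irreflexive, asymmetric)

# The intended model: the liver is part of the body, and nothing else holds.
INTENDED = {frozenset({("liver", "body")})}

ADMITTED = {m for m in all_interpretations() if all(ax(m) for ax in AXIOMS)}

print(INTENDED <= ADMITTED)          # True: the axioms admit the intended model
print(len(ADMITTED - INTENDED))      # 2: they also admit unintended models
```

The axioms here rule out most of the sixteen possible interpretations but still admit the empty relation and "body is part of liver", which is the sense in which a set of axioms is only a partial specification.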
Types of ontologies (broad notion)
Catalog of normalized terms, e.g. a list of terms used in the reports from a laboratory: no taxonomy, no axioms, and no glosses
Glossed catalog, e.g. a dictionary of medicine: a catalog with glosses.
Thesaurus, e.g. many parts of the UMLS Metathesaurus, GEMET: a hierarchical collection of terms; the hierarchical link is usually polysemous
Taxonomy, e.g. the ICD10: a collection of classes with a partial order induced by inclusion (classification)
Axiomatized taxonomy, e.g. the GALEN Core Model: a taxonomy with axioms
Ontology library, e.g. the Ontolingua repository: a set of axiomatized taxonomies with relations among them. Each element of the library is a module, which can be included in another one. Alternatively, a single concept from a module can be used in another one. Ontology modules can be considered subdivisions of the namespace of a model
From Data Integration to Conceptual Integration
• Heterogeneous texts
• Heterogeneous semi-structured texts (retrieval of web data types and descriptions)
• Heterogeneous databases (schema integration, information brokering)
=> In all these cases, heterogeneity concerns the conceptualization of the terminology used in the sources
Polysemy and overlapping
The primary causes of heterogeneity are
• polysemy (conceptual misalignment: one name with different intended meanings), and
• conceptual overlapping (different names with overlapping meanings),
both of which arise in the union of the vocabularies of any two sources. Ontologies are therefore a major component in providing semantic access to (and integration of) terminological resources
Incidentally, polysemy is usually found within a single source as well (views, themes, homonyms)
Ontology Learning
• From Natural Language
• From Semi-structured Data
• From Structured Data
• From Terminologies
=> Integration of sources needs:
(Principled) Conceptual Abstraction
Conceptual abstraction: an example
The domain ontology A has body region with the intended meaning of «loosely specified part of the body that can be cut, filled, etc.»
The domain ontology B has body region with the intended meaning of «region of the body at which body parts are located»
There is a metonymy acting on body region in A, whose intended meaning concerns body parts located at some region, although they are denoted by referring to the region itself (the intended meaning in B)
Hence, the metonymic name should be distinguished from the plain name, and correctly related to it
The distinction between objects (body parts) and regions, and the notion of a localization relation holding between objects and regions are both necessary to make the metonymy clear, and cannot be found in the specifications given in A or B. They have to be found in some generic theory
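The body region example can be sketched as follows. The classes `Region` and `BodyPart`, the field `located_at`, the sample data, and both function names are hypothetical; they only illustrate how a generic distinction between objects and regions, plus a localization relation, lets the two senses be told apart and related.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str


@dataclass(frozen=True)
class BodyPart:
    name: str
    located_at: Region  # generic localization relation between object and region


ABDOMEN = Region("abdominal region")
THORAX = Region("thoracic region")
PARTS = [
    BodyPart("liver", ABDOMEN),
    BodyPart("stomach", ABDOMEN),
    BodyPart("heart", THORAX),
]


def body_region_in_b(region_name):
    """Plain sense (ontology B): the region itself."""
    return Region(region_name)


def body_region_in_a(region_name):
    """Metonymic sense (ontology A): the body parts located at the region."""
    region = Region(region_name)
    return [part for part in PARTS if part.located_at == region]


print(body_region_in_b("abdominal region"))
print([part.name for part in body_region_in_a("abdominal region")])
```

Neither function could be written from A or B alone: the sense in A is defined through `located_at`, which belongs to the generic theory.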
Ontology integration: conceptual issues
Ontology integration is – generally speaking – the construction of an ontology C that formally specifies the union of the vocabularies of two other ontologies A and B
To be sure that A and B can be integrated at some level, C has to commit to both A's and B's conceptualizations. In other words, the intension of the concepts in A and B should be mapped to the intension of C's concepts
Unfortunately, this cannot be achieved using only the conceptual relations specified in A and B for local tasks (for a specific context). The methodological principle adopted here is that generic ontologies reused from the philosophical, linguistic, mathematical, and AI literature must ground the comparison of different intensions. Our approach may be called principled conceptual integration
Aspects of integration
Three aspects of an ontology are taken into account:
• the intended models of the conceptualizations of its vocabulary
• the domain of interest of such models, i.e. the 'topic' of the ontology
• the namespace of the ontology
The most interesting case is when A and B are supposed to commit to the conceptualization of the same domain of interest or of two overlapping domains. In particular, A and B may be:
Some integration cases for the same topic
Alternative ontologies: the intended models of the conceptualizations of A and B are different (they partially overlap or are completely disjoint) while the domain of interest is (mostly) the same. This is a typical case that requires integration: different descriptions of the same topic are to be integrated
Truly overlapping ontologies: both the intended models of the conceptualizations of A and B and their domains of interest have a substantial overlap. This is another frequent case of required integration: descriptions of strongly related topics are to be integrated
Equivalent ontologies with vocabulary mismatches: the intended models of the conceptualizations of A and B are the same, as well as the domain of interest, but the namespaces of A and B are overlapping or disjoint. This is the case of equivalent theories with alternative vocabularies
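The three cases can be read as a decision over the three aspects listed earlier (intended models, domain of interest, namespace). The sketch below is a simplification under stated assumptions: each aspect is a plain Python set, "the same" means set equality, "overlap" means a non-empty intersection, and the function name, the sample data, and the labels "identical" and "unrelated" are not from the slides.

```python
def integration_case(a, b):
    """Classify two ontologies, each a dict with set-valued keys
    'models', 'domain', and 'names'."""
    same_models = a["models"] == b["models"]
    same_domain = a["domain"] == b["domain"]
    if same_models and same_domain:
        if a["names"] == b["names"]:
            return "identical"  # not a case on the slide; added for completeness
        return "equivalent with vocabulary mismatches"
    if same_domain:
        return "alternative"
    if a["models"] & b["models"] and a["domain"] & b["domain"]:
        return "truly overlapping"
    return "unrelated"  # not a case on the slide; added for completeness


# Hypothetical toy ontologies; model identifiers are opaque labels.
A = {"models": {"m1", "m2"}, "domain": {"rivers"}, "names": {"river", "brook"}}
B = {"models": {"m2", "m3"}, "domain": {"rivers"}, "names": {"river", "stream"}}
C = {"models": {"m1", "m2"}, "domain": {"rivers"}, "names": {"fiume", "ruscello"}}
D = {"models": {"m2", "m4"}, "domain": {"rivers", "lakes"}, "names": {"river", "lake"}}

print(integration_case(A, B))  # alternative
print(integration_case(A, C))  # equivalent with vocabulary mismatches
print(integration_case(A, D))  # truly overlapping
```

Real cases are graded ("mostly the same", "substantial overlap"), so strict set comparison is only a stand-in for a judgment made during conceptual analysis.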
Ontological integration: operational issues
Depending on the amount of change necessary for the operational integration of A and B, different levels of interoperability can be distinguished:
Mediation: it requires no changes to A and B, but only mapping relations that describe the equivalence (partial or total) of A's and B's elements to C's elements. This may result in weak interoperability, since usually the intended models of A and B only overlap: some concepts from A may not have a correspondent in B, and vice versa. This is the design choice for some recent information brokering architectures. However, such architectures have a weak commitment towards a principled way of conceptual integration, possibly because of its additional cost
Alignment: it requires some change to fill the biggest gaps of A and B with respect to an ideal C that completely integrates A and B. Therefore, alignment requires at least a partial conceptual integration. It may support a limited interoperability; for example, deep inferences may be excluded
Unification: it may require a major reorganization of A and B, which are 'harmonized'. Unification intervenes on the inferential features of the systems, and consists in a complete operational integration: everything that can be done in one system can be done in the other. It results in the most complete interoperability but requires a complete conceptual integration as well. From the conceptual viewpoint, unification consists in the adoption of C as a standard in the systems using A or B
Ontology integration: practical issues
• Lack of hierarchies
• Ambiguous hierarchies
• Informality
• Lack of modularity
• Polysemy
• Uncertain semantics
• Prototypical descriptions
• Ontological opaqueness
• Lack of a (minimal) set of axioms
• Confusing lexical clues
• Awkward naming policy
• 'Remainder' partitions
• 'Exception' partitions
• Terminological cycles
• Meta-level soup
• Low maintenance capabilities
Ontologies: some desiderata
• An explicit taxonomy with subsumption among concepts
• Semantic explicitness of links
• Modularity of namespace
• A stratified design of the modules
• Absence of polysemy within a module
• Disjointness of concepts within a module and within the top-level
• A proper interface between the ontology namespace and one or more sets of lexical realizations
• Linguistically meaningful naming policy (cognitive transparency)
• Rich documentation
• Some minimal axiomatization to detail the difference among sibling concepts
• Explicit linkage to concepts and relations from generic theories
• Meta-level assignments to distinguish among the formal primitives assigned to concepts
• Languages and implementations that support the previous needs as well as the possibility of collaborative modeling
The ONIONS Methodology
ONIONS implementation is meant to provide extensive axiomatization, clear semantics, and ontological depth to a domain terminology
• Extensive axiomatization is obtained through a conceptual analysis of the terminological sources and their representation in a logical language with a rigorous semantics
• Ontological depth is obtained by reusing a library of generic ontologies, on which the axiomatization depends. Such a library may include multiple choices among partially incompatible ontologies. In particular, we suggest the importance of: mereology (theory of parts); topology (theory of wholes, connexity, and boundaries); morphology (theory of form and congruence); localization (theory of regions); time theory; actors (theory of participants in a process); dependence theory; and the theory of environmental niches
The main steps (I)
0. Semantically opaque hierarchies and lists are pre-processed in order to create ‘clean’ taxonomies
1. All concepts, relations, templates, rules, and axioms from a source ontology are represented in the ONIONS formalisms, currently Loom, Ontolingua, and OKBC
2. When available, plain text descriptions are analyzed and axiomatized (text formalization)
3. The union of such products is integrated by means of a set of generic ontologies. This is the most characteristic activity in ONIONS, which can be briefly described as follows:
The main steps (II)
3.1. For any set of sibling concepts in a taxonomy, the conceptual difference among them is inferred, and this difference is formalized by axioms that reuse the relations and concepts already in the library. If no concept is available to represent the difference, new concepts are added to the library
3.2. For any set of polysemous senses of a term, different concepts are stated and placed within the library according to their topic and to the available modules. (Polysemy occurs when two concepts with overlapping or disjoint intended models have the same name.)
3.3. Often, polysemous senses of a term, as well as different 'alternative' concepts, are metonymically related. For example: process/outcome (as in inflammation), region/object (as in body region), etc. Alternatives must be properly defined by making explicit the relationship between them: e.g. "has-product" for inflammation, "location" for body-region
3.4. When stating new concepts, the relations necessary to maintain consistency with the existing concepts are instantiated. If conflicts arise with existing theories, a more general and more comprehensive theory is sought. If this is impracticable, an alternative theory is created
The main steps (III)
3.5. Relevant integration cases. Since ONIONS requires the use of generic theories to axiomatize alternative theories, the integration of a concept C from an ontology O is performed by comparing C with the concepts D1, …, Dn already present in the evolving ontology library L, whose ontology set M1, …, Mn contains at least a significant subset of generic ontologies and the set of domain ontologies at that state in the evolution of L. The following cases appear relevant to the methodology:
3.5.1. C's name is polysemous in O (internal polysemy). Iterate steps 3.2 to 3.4
3.5.2. C's name is a homonym of the name of a Di. (Homonymy occurs when both the intended models and the domains of two concepts with the same name are disjoint.) Homonyms must be differentiated by modifying the name, or by preventing the homonyms from being included in the same module namespace
3.5.3. C's name is a synonym of the name of a Di. (Synonymy is the converse of homonymy and occurs when two concepts with different names have both the same intended model and the same domain.) Synonyms must be preserved, or included in the set of lexical realizations related to the concept
3.5.4. C is subsumed by some Di in L, but it has no total mapping on any Dj in L. The gap in L must be filled by adding C as a subconcept of Di
The main steps (IV)
3.5.5. C is an intersection between two concepts Di and Dj in L. Solved by distinguishing types and roles, or different defining elements
3.5.6. C has an alternative concept Di in L (same domain, but overlapping or disjoint intended models):
3.5.6.1. If C metonymically depends on Di, C is properly related to Di
3.5.6.2. If C and Di are different viewpoints on the same domain of interest, both concepts are kept; if appropriate, they are included in separate modules
3.5.6.3. If the intended model of C is finer than Di's, Di is substituted with C
3.5.6.4. If the intended model of C is coarser than Di's, C is ignored (but it is tracked for mapping between sources)
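The definitions of homonymy, synonymy, and alternative concepts given in step 3.5 can be turned into a small comparison routine. This is a sketch under assumptions: intended models and domains are represented as sets of opaque labels, "the same" means set equality, and the class `Concept`, the function `relation`, and the sample concepts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    models: frozenset  # stand-in for the intended models
    domain: frozenset  # stand-in for the domain of interest


def relation(c, d):
    """Compare concept c from a source with concept d in the library."""
    same_name = c.name == d.name
    if c.models == d.models and c.domain == d.domain:
        return "same concept" if same_name else "synonym"
    if same_name and not (c.models & d.models) and not (c.domain & d.domain):
        return "homonym"
    if c.domain == d.domain:
        return "alternative"
    return "unrelated"  # label added for completeness, not from the slides


spring_water = Concept("spring", frozenset({"water-source"}), frozenset({"hydrology"}))
spring_season = Concept("spring", frozenset({"season"}), frozenset({"time"}))
neoplasm = Concept("neoplasm", frozenset({"abnormal-growth"}), frozenset({"pathology"}))
tumor = Concept("tumor", frozenset({"abnormal-growth"}), frozenset({"pathology"}))
region_a = Concept("body region", frozenset({"parts-at-region"}), frozenset({"anatomy"}))
region_b = Concept("body region", frozenset({"region"}), frozenset({"anatomy"}))

print(relation(spring_water, spring_season))  # homonym
print(relation(neoplasm, tumor))              # synonym
print(relation(region_a, region_b))           # alternative
```

The routine only detects the case; the remedies (renaming, separate modules, lexical realizations, metonymic linking) remain the modeling decisions described in the steps above.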
The main steps (V)
4. The library of generic, intermediate, and domain ontologies should be stratified, that is, domain modules should include intermediate modules, which should in turn include generic modules, so that each set of modules can be plugged into or unplugged from its more general set without affecting the coherence of the entire library
5. The source ontologies are explicitly mapped to the integrated ontology, in order to allow interoperability. The only admitted mappings are equivalent and coarser equivalent. Formally: for any source ontology SO and an ontology IO that is supposed to result (also) from the integration of SO, for any concept $C_i$ in SO, there is a $D_i$ in IO such that $C_i^I = D_i^I$ (equivalence of possible interpretations), or there is a disjunctive concept (or $D_i$ $D_j$) in IO such that $C_i^I = D_i^I \cup D_j^I$ (equivalence of possible interpretations to a disjunction of concepts, i.e. to a union of finer concepts)
5.1. Partial mappings must have been already resolved through the methodology: if any remain, some step in the integration procedure must be iterated
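The mapping condition in step 5 can be checked mechanically once interpretations are fixed. The sketch below assumes a single finite interpretation in which each concept denotes a set of individuals (the step itself speaks of all possible interpretations); the function `admitted_mapping`, the concept names, and the individuals are hypothetical.

```python
from itertools import combinations


def admitted_mapping(source_ext, io):
    """Return the IO concept(s) a source concept maps to, or None.

    source_ext: set of individuals denoted by the source concept
    io: dict mapping IO concept name -> frozenset of individuals
    """
    # Equivalent mapping: one IO concept with the same interpretation.
    for name, ext in io.items():
        if ext == source_ext:
            return (name,)
    # Coarser equivalent: the union of two finer IO concepts.
    for (n1, e1), (n2, e2) in combinations(io.items(), 2):
        if e1 | e2 == source_ext:
            return (n1, n2)
    return None  # only a partial mapping exists: iterate the integration


IO = {
    "river": frozenset({"r1", "r2"}),
    "brook": frozenset({"b1"}),
    "lake": frozenset({"l1"}),
}

print(admitted_mapping({"r1", "r2"}, IO))        # ('river',)
print(admitted_mapping({"r1", "r2", "b1"}, IO))  # ('river', 'brook')
print(admitted_mapping({"r1", "l1"}, IO))        # None
```

The last call corresponds to step 5.1: a source concept that cuts across IO concepts signals that some earlier integration step has to be repeated.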
Ambiguous hierarchies
[Figure: hierarchy diagram. Node labels: Entity, Event, Conceptual Entity, Phenomenon or Process, Natural Phenomenon, Injury or Poisoning, Biologic Function, Pathologic Function, Finding, fractures, malunion and nonunion of fracture, ununited fractures]
A principled formalization
(defconcept ununited-fracture
  :is-primitive
  (and fracture
       (some morphology
             (and bone
                  (or (some embodies malunion)
                      (not integral))))
       (some dependently-postdates fracture)
       (all interpretant clinical-condition)))
Some UMLS concepts pertaining to the intersection: Amino Acid, Peptide, or Protein & Carbohydrate
(|hamster oviduct-specific glycoprotein|)
(|Par j I|)
(|(Man)6(GlcNAc)2Asn|)
(|Zn(+2)-IAA|)
(|collapsing factor|)
(|BDV 18K glycoprotein|)
(|SI-gene-associated glycoprotein, Nicotiana|)
(|FdI allergen|)
(|sca gene product|)
(|EPV20 protein|)
(|lubricin|)
(|Pluritene|)
(|Par h 1 allergen|)
(|Wnt11 gene product|)
(|I-D-Gal-BSA|)
(|mannose-bovine serum albumin conjugate|)
(|acrosome granule lysin|)
(|sulfatide activator|)
(|vaccinia virus A34R protein|)
=> More than 118,000 UMLS concepts (25%) are classified under an intersection
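Finding concepts classified under an intersection amounts to grouping concepts by the set of semantic types assigned to them and keeping the groups with more than one type. The sketch below is an illustration only: the dictionary is a tiny hand-made sample (two concept names are taken from the list above, the third is invented), and the function name is an assumption, not a UMLS API.

```python
from collections import Counter

PROTEIN = "Amino Acid, Peptide, or Protein"
CARBOHYDRATE = "Carbohydrate"

# Hypothetical sample: concept name -> semantic types assigned to it.
CONCEPT_TYPES = {
    "lubricin": {PROTEIN, CARBOHYDRATE},
    "Par j I": {PROTEIN, CARBOHYDRATE},
    "hypothetical single-typed concept": {CARBOHYDRATE},
}


def intersections(concept_types):
    """Count concepts per combination of two or more semantic types."""
    return Counter(
        frozenset(types) for types in concept_types.values() if len(types) > 1
    )


counts = intersections(CONCEPT_TYPES)
print(counts[frozenset({PROTEIN, CARBOHYDRATE})])  # 2
print(sum(counts.values()))                        # 2 concepts under an intersection
```

Each key of the resulting counter is a candidate for ontological analysis, since a conjunction of sibling types says nothing by itself about how the two types are related.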
Ontological analysis of the intersection
(defconcept |Amino Acid, Peptide, or Protein & Carbohydrate|
"834 instances. This conjunct includes two sibling types. A protein containing a carbohydrate."
:annotations ((Sugg.Name "carbohydrate-containing-protein") (onto-status integrated))
:is-primitive (:and protein (:some has-component carbohydrate))
:context :substances)
Morphologies
Names of anatomical morphologies are often polysemous:
• Both a condition and the function that caused the condition ("inflammation", "ulcer", "fracture", "wound", "hyperplasia")
• Both an object and the function that produced the object ("neoplasm", "hemorrhage")
• Both an object O and the condition created in another object O' by O ("obstruction")
For example: "the fracture has been caused by a fall" vs. "the fracture is transverse"; "the obstruction occurred in the jejunum" vs. "the obstruction has been removed"
Conceptual analysis brings out other issues concerning morphologies:
• The dependence between a morphological condition, a function, and the related organ. For example, an "ulcer" (as a condition) of a stomach implies that the stomach embodies an ulceration function (an ulcer as a function)
• The mereological import of morphologies: some are featured by an organ, some only by a part of an organ. For instance, an "ectopic heart" is wholly ectopic, but an "ulcerated stomach" is only partly ulcerated
Morphologies analyzed
• a property ("color", "consistency", "thickness", "size", "number", "shape")
• a condition:
  • a topologically relevant condition:
    • an alteration of connection:
      • that creates a configuration (a new property) in an object ("fracture", "wound")
      • in the holey interior of an object ("obstruction")
      • between several objects ("fusion")
    • an alteration of the boundary between an object's holey interior and the object's complement:
      • creating a configuration in the boundary ("cavitation", "ulcer")
      • producing a substance flow ("hemorrhage", "ulcer")
    • an abnormal placement ("dislocation", "ectopia", "absence")
  • a form alteration condition ("deformity", "hyperplasia", "hypoplasia")
  • a condition involving the alteration of several properties ("inflammation", "eruption")
• an abnormal, foreign object ("mass", "neoplasm", "calculus", "obstruction")
Making relations explicit
[Figure: relation diagram. Concept labels: Group, Member, Region, Health-Condition, Procedure, Guideline. Relation labels: has-member, location, uniquely-located, target, target-population, has-method]
Medical source ontologies
• The UMLS top-level (1998 edition: 132 "semantic types", 91 "relations", and 412 "templates")
• The Snomed-III top-level (510 "terms" and 25 "links")
• The GMN top-level (708 "terms")
• The ICD10 top-level (185 "terms")
• The GALEN Core Model v.5h (2,730 "entities", 413 "attributes", and 1,692 axioms), etc.
• The 1998 edition of the UMLS Metathesaurus (476,000 "concepts", 93,000 explicit templates, and 599,000 thesaurus-like templates)
The current ON9.2 library
[Figure: diagram of the ON9.2 library modules. Module names: Metaontology, Equality, Dependence, Structuring-Concepts, Layers, Granularity, Top-Level, Mereology, Topology, Unrestricted-Time, Meronymy, Localization, Topo-Morphology, Morphology, Actors, Units, Physical-Concepts, Quantities, Positions, Assessment, Representation, Social-Objects, Abstract-Objects, Topics, Artifacts, Substances, Procedures, Anatomy, Body-Directions, Biologic-Functions, Molecular-Biology, Natural-Kinds, Abnormalities, Medical-Procedures, Clin-Act, Biologic-Substances, Web-Notions, Guidelines, Diagrams, Planning]
The current top-level
[Figure: taxonomy diagram of the top-level. Node labels: entity, *sign, occurrent, continuant, object, process, situation, interval, act, material-function, biologic-function, physiologic-function, pathologic-function, material-object, social-object, abstract-object, biologic-object, substance, anatomical-structure, organism, non-biologic-function, natural-phenomenon, human-caused-phenomenon, information, language, topic, notion, region, submolecular-object]
Tool for representation: ONTOLINGUA
Tool for representation and classification: LOOM
Tool for intermediate representation and interchange: OKBC
Tool for browsing and editing: ONTOSAURUS
Results
ON9.2: integration of the medical top levels within a library of generic theories. It includes a set of 50 modules with about 1,500 concepts. It is available in both the Ontolingua and Loom languages
Making explicit the Metathesaurus terminological knowledge: intersections of UMLS semantic types, relations defined by sources (IS_A and other relations)
Integration of the Metathesaurus intersections within ON9.2
Contextualization of the Metathesaurus
An integrated model of clinical guidelines
What is a Domain Reference Ontology?
An ontology usable to build new ontologies in a domain, or to plug existing ontologies into it
Our research in medical conceptual structures aims at defining a Medical Reference Ontology (library)
The current research in environmental metadata could be reconsidered as the construction of an Environmental Reference Ontology
We are confident that our methodology is suitable for this task without substantial revision
Warning: at first sight, conceptual heterogeneity in the environmental domain seems harder to handle than in medicine
"Es gibt nichts Praktischeres als eine gute Theorie"
"There is nothing more practical than a good theory"
(Ludwig von Boltzmann)