Balancing Lexicographic and Ontological Considerations in
Ontology Development
First International Workshop on Ontological Analysis Trento, IT
16-20 July, 2012
Amanda Hicks, University at Buffalo
Ontologies vs. Wordnets
Wordnets represent
• how we use language
  – the word ‘cat’ in context
Ontologies represent
• what it is to be a cat
  – e.g., whether being a cat is a rigid property
Overview of some ontologies
3 Layers of Ontologies
• Upper: most abstract
• Middle: intermediately abstract
• Domain: specific to a domain or application
Domain Ontologies
• are often developed by domain experts.
• model highly specific, technical information.
• often for use in a particular community of researchers, technicians, etc.
• Examples:
  – Gene Ontology
  – KYOTO domain ontology
  – Protein Ontology
Middle Ontologies
• are developed by ontologists or other information technologists.
• model concepts that are often part of a normal, spoken and written vocabulary and of an intermediate level of abstraction.
• connect upper-level ontologies with the domain ontology.
• Examples:
  – KYOTO Middle
  – Information Artifact Ontology
Upper Ontologies
• are developed by ontologists.
• model highly abstract concepts
  – endurant vs. perdurant
  – quality vs. substance
• Because the axioms at this level will be inherited all the way down, we need to be really careful here!
Upper Ontologies, some examples
BFO - http://www.ifomis.org/bfo
SUMO - http://www.ontologyportal.org
DOLCE - http://www.loa.istc.cnr.it/DOLCE.html
BFO
• is a relatively shallow top ontology
  – 36 classes
  – 6 layers deep
• The BFO consortium coordinates many biomedical domain ontologies, users, and developers.
DOLCE
• DOLCE-Lite
  – 37 classes
  – depth of 6
• DOLCE-Lite Plus
  – 208 classes
  – depth of 13
The KYOTO Project
• EU 7th Framework Programme (FP7) project, 2007-2010
• facilitates data mining and sharing from texts in the domain of ecology across seven languages
• WWF & ECNC are domain users
• www.kyoto-project.eu
The KYOTO Ontology
• Three layers: Top, Middle, Domain
• Seven wordnets mapped to KYOTO Ontology to facilitate data extraction and management
  – English
  – Spanish
  – Basque
  – Italian
  – Dutch
  – Japanese
  – Chinese
The KYOTO Ontology
KYOTO 3 - three layers: Top, Middle, Domain
Wordnets mapped to KYOTO Ontology to facilitate data extraction and sound inference
– English
– Spanish
– Basque
– Italian
– Dutch
– Japanese
– Chinese
Use Protégé 4.0 or older. KYOTO is not written in OWL2.
KYOTO Top
Based on DOLCE-Lite Plus (DLP)
• In DLP qualities are modeled according to the kinds of entities that bear the quality.
  – e.g., size is a physical quality since it inheres in a physical object
• KYOTO Top extends the physical-quality hierarchy
  – amount-of-matter-quality
  – feature-quality
  – physical-object-quality
• Added quality types
  – dispositional
  – relational
KYOTO Top
KYOTO Top extends the role hierarchy.
Roles are arranged according to the kind of entity that bears that role.
– A physical-object-role is played by a physical object.
– In the domain layer offspring is a subclass of organism-role since organisms are the kinds of things that play the role of offspring.
KYOTO Middle
Includes:
• Base Concepts (BCs) from WordNet
  – nouns
• Units of measurement, e.g., length, and other qualities
• 72 new perdurants (processes and states)
• 123 new endurant terms (objects and substances)
• qualities that model adjectives
Base Concepts in KYOTO
Synsets from WordNet-3.0 (Fellbaum (1998))
– for each path from leaf to root: first node with at least 50 hyponyms
– roughly: cheap (and inadequate?) computational model for basic level concepts.
– CAREFUL: the set depends on the structure and coverage of WordNet, which is idiosyncratic
• cake
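The selection rule above (for each path from leaf to root, take the first node with at least 50 hyponyms) can be sketched on a toy hierarchy. This is only an illustration under stated assumptions: the hierarchy is a tree with one hypernym per synset (real WordNet allows several), the synset names are invented, and the function name and threshold parameter are hypothetical, not KYOTO's actual tooling.

```python
from collections import defaultdict


def select_base_concepts(hypernym_of, threshold=50):
    """Return the Base Concepts of a tree-shaped hierarchy.

    hypernym_of maps each synset to its single hypernym (root maps to None).
    This single-parent shape is a simplifying assumption.
    """
    children = defaultdict(list)
    for child, parent in hypernym_of.items():
        if parent is not None:
            children[parent].append(child)

    def count_hyponyms(node):
        # Direct and indirect hyponyms below node.
        return sum(1 + count_hyponyms(c) for c in children[node])

    hyponym_count = {node: count_hyponyms(node) for node in hypernym_of}
    leaves = [node for node in hypernym_of if not children[node]]

    base_concepts = set()
    for leaf in leaves:
        node = leaf
        # Walk from the leaf toward the root; stop at the first node
        # that has at least `threshold` hyponyms.
        while node is not None:
            if hyponym_count[node] >= threshold:
                base_concepts.add(node)
                break
            node = hypernym_of[node]
    return base_concepts


# Invented toy hierarchy; a threshold of 2 stands in for 50.
toy = {
    "entity": None,
    "food": "entity",
    "baked_goods": "food",
    "cake": "baked_goods",
    "bread": "baked_goods",
    "fruit": "food",
    "apple": "fruit",
    "animal": "entity",
    "cat": "animal",
}
print(sorted(select_base_concepts(toy, threshold=2)))
# ['baked_goods', 'entity', 'food']
```

Note how the result depends on coverage: adding one more kind of fruit to the toy data would make fruit a Base Concept for the apple path instead of food.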
Base Concepts in KYOTO
BCs facilitate mapping wordnets onto the ontology in KYOTO.
• WordNet is mapped onto the ontology via BCs.
• BC equivalents in other languages are indirectly mapped onto the ontology via mappings to WordNet’s BCs.
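The indirect mapping can be pictured as two lookups chained together. The sketch below is only an illustration: the synset identifiers, the dictionary names, and the function are all hypothetical and do not reproduce KYOTO's data format.

```python
# Hypothetical identifiers throughout; not KYOTO's actual data.
# English WordNet BCs mapped directly onto ontology classes.
BC_TO_ONTOLOGY = {
    "eng:food": "food",
    "eng:color": "color",
}

# Other wordnets point at the equivalent English WordNet BC.
EQUIVALENT_ENGLISH_BC = {
    "spa:alimento": "eng:food",
    "nld:kleur": "eng:color",
}


def ontology_class_for(synset):
    """Resolve a BC to an ontology class, going through English WordNet if needed."""
    english_bc = EQUIVALENT_ENGLISH_BC.get(synset, synset)
    return BC_TO_ONTOLOGY.get(english_bc)  # None when no mapping is known


print(ontology_class_for("eng:food"))      # food
print(ontology_class_for("spa:alimento"))  # food
```

Only the English BCs need a direct mapping to the ontology; each additional wordnet needs just its equivalence links to English.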
Base Concepts in KYOTO
• 297 BCs from the noun hierarchy and
• 578 BCs from the verb hierarchy
  – need work, in Domain layer
  – group names such as verb_change still appear though not ontological (Izquierdo et al. (2007)).
Sample BCs in KYOTO’s Middle Layer
• unit-of-measurement
• number
• color
• change
• book
• message
• food
BCs and KYOTO
In this case, the lexicon in conjunction with considerations of the application informed the population of the Middle and Domain layers of KYOTO.
KYOTO Domain
KYOTO Domain
Sample concepts from user scenarios
– fish family
– coast
– soil
– water
– breed
– biodiversity
The Lexicon &
The Ontology
is-a
“is-a”: The Problem
“Is-a” is ambiguous between individuals and subclasses. This can lead to confusion.
For example, species terms can be confused. The subscripts mark the two readings: is-a_i relates an individual to a class, and is-a_c relates a class to a class.
Kermit is-a_i Leptopelis vermiculatus.
Leptopelis vermiculatus is-a_c species.
Therefore, Kermit is a species.
“is-a”: The Rule
The Rule: Every property of a class belongs to every instance of that class.
Check for all inherited properties.
A species comprises many organisms that can successfully produce fertile offspring.
Does Kermit comprise many organisms that can successfully produce fertile offspring?
“is-a”: KYOTO’s Solution
Model species terms twice!
• Species in the sense of a group are modeled as physical pluralities. In this sense, Leptopelis vermiculatus is an instance, NOT a subclass.
• ‘Leptopelis vermiculatus’ can also refer to a class. This is a type of organism.
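A minimal sketch of the double modeling, keeping instance-of and subclass-of as separate relations. The identifiers are hypothetical, and treating species as a direct subclass of physical-plurality is a simplification assumed for the example; KYOTO's actual axioms may differ.

```python
# Hypothetical identifiers; a sketch, not KYOTO's actual axioms.
SUBCLASS_OF = {
    "leptopelis-vermiculatus": "organism",  # class reading: a type of organism
    "species": "physical-plurality",        # assumed simplification
}
INSTANCE_OF = {
    "Kermit": "leptopelis-vermiculatus",
    "leptopelis-vermiculatus-group": "species",  # group reading: one species
}


def classes_of(individual):
    """Asserted class of an individual, followed by all its superclasses."""
    result = []
    cls = INSTANCE_OF.get(individual)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


print(classes_of("Kermit"))
# ['leptopelis-vermiculatus', 'organism']
print(classes_of("leptopelis-vermiculatus-group"))
# ['species', 'physical-plurality']
```

Inheritance only follows subclass-of links, so Kermit never becomes a species, while the group individual does.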
Rigid & Non-Rigid Terms
Rigidity: The Problem
In ontologies and WordNet the subsumption relations are determined according to different criteria.
• WordNet
  – Hypernymy
  – Based on psycholinguistic data; native speakers agree with the word use.
• Ontology
  – Subclass
  – Based on the extension of a term: every x is a y.
Transitivity of Subsumption
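BE CAREFUL! WordNet’s hypernymy can lead to unsound inferences.
Example: if ‘pet’ is treated as subsuming ‘cat’, transitivity of subsumption licenses the following.
Conclusion: If every pet has an owner, then every cat has an owner.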
Rigidity: KYOTO’s Solution
• Distinguish rigid and non-rigid terms in the wordnet.
  – This distinction comes from OntoClean (Guarino and Welty)
• Distinguish between roles and types in the ontology.
• Map synsets to the ontology using different mapping relations.
Rigidity
• “Cat” is a rigid concept.
• “Pet” is a non-rigid concept.
• A concept is rigid if it is essential to all of its instances.
  – Permanence: Fluffy is always a cat, not always a pet.
  – Necessity: Fluffy cannot stop being a cat; Fluffy can stop being a pet.
The Rule of Thumb
(See Giancarlo’s slides for a more nuanced view.)
Non-rigid terms should not subsume rigid terms.
or
Roles should not subsume types.
A Jumbled Hierarchy
amount of matter
  -R drug
    +R antibiotic
  +R chemical compound
    +R oil
  -R nutriment (a source of material to nourish the body)
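The rule of thumb (non-rigid terms should not subsume rigid terms) can be checked mechanically. The sketch below assumes a reading of the jumbled hierarchy in which drug (-R) sits above antibiotic (+R) and chemical compound sits above oil; that nesting, the rigidity of the root, and all function and variable names are assumptions made for illustration.

```python
# True = rigid (+R), False = non-rigid (-R). Root rigidity is an assumption.
RIGID = {
    "amount of matter": True,
    "drug": False,
    "antibiotic": True,
    "chemical compound": True,
    "oil": True,
    "nutriment": False,
    "substance-role": False,
}

# Assumed nesting of the jumbled hierarchy: child -> parent.
JUMBLED = {
    "drug": "amount of matter",
    "antibiotic": "drug",
    "chemical compound": "amount of matter",
    "oil": "chemical compound",
    "nutriment": "amount of matter",
}

# The cleaned-up arrangement: roles moved under substance-role.
CLEAN = {
    "antibiotic": "amount of matter",
    "chemical compound": "amount of matter",
    "oil": "chemical compound",
    "drug": "substance-role",
    "nutriment": "substance-role",
}


def rigidity_violations(parent, rigid):
    """List (non-rigid ancestor, rigid term) pairs that break the rule of thumb."""
    violations = []
    for term in parent:
        ancestor = parent[term]
        while ancestor is not None:
            if rigid[term] and not rigid[ancestor]:
                violations.append((ancestor, term))
            ancestor = parent.get(ancestor)
    return violations


print(rigidity_violations(JUMBLED, RIGID))  # [('drug', 'antibiotic')]
print(rigidity_violations(CLEAN, RIGID))    # []
```

The check walks all ancestors, not just the direct parent, because subsumption is transitive.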
Clean Hierarchies
amount-of-matter
  +R antibiotic
  +R chemical compound
    +R oil
substance-role (role played-by some amount-of-matter)
  -R drug
  -R nutriment
Mapping Synsets
Adjectives
Adjectives: General Strategy in KYOTO
Qualities are easily modeled according to the kinds of entities in which they inhere.
For example, amounts of matter are the kinds of things that have pH levels.
Adjectives: General Strategy in KYOTO
The values for specific qualities like pH levels are located in regions.
Adjectives: The Problem
pH levels are easy because
• they are measurable, i.e., there are objective criteria.
• they are confined to one kind of entity, namely, amounts of matter.
Adjectives: The Problem
How should we model concepts like “beneficial” or “important”?
• Subjective component
• Not necessarily “out there” in the world
• Not typically quantifiable
• Criteria are context dependent
• Many kinds of entities can be beneficial or important.
Adjectives: KYOTO’s Solution
The middle layer has a region, evaluative-region, to accommodate adjectives like ‘beneficial’ or ‘worthless’.
Adjectives: KYOTO’s Solution
Concepts like “beneficial” and “important” are
• not in the domain specific layer since they are general concepts.
• not in the upper layer since they are “subjective”.
• not in a strictly realist ontology like BFO.
• modeled orthogonally to “real” qualities.
Adjectives: KYOTO’s Solution
What kind of restriction can you write for length?
• Indefinite qualities: long
• Definite qualities: 2m
length q-located-in (length-measurement-unit or indefinite-quality-region)
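The restriction says a length may be located in either of two regions. A rough sketch of that union, assuming a hypothetical value classifier (the regular expression, the unit list, and the small set of indefinite adjectives are invented for the example, and this is plain Python rather than OWL):

```python
import re

# Region names come from the restriction; everything else is hypothetical.
ALLOWED_REGIONS = {
    "length": {"length-measurement-unit", "indefinite-quality-region"},
}
INDEFINITE_VALUES = {"long", "short"}


def region_of(value):
    """Classify a value as definite (number plus unit) or indefinite (adjective)."""
    if re.fullmatch(r"\d+(\.\d+)?\s*(mm|cm|m|km)", value):
        return "length-measurement-unit"
    if value in INDEFINITE_VALUES:
        return "indefinite-quality-region"
    return None


def satisfies_restriction(quality, value):
    """True if the value falls in one of the regions allowed for the quality."""
    return region_of(value) in ALLOWED_REGIONS.get(quality, set())


print(satisfies_restriction("length", "2m"))     # True
print(satisfies_restriction("length", "long"))   # True
print(satisfies_restriction("length", "heavy"))  # False
```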
In Conclusion
• Procurement - BCs influenced the concepts included in the KYOTO ontology.
• Hierarchy - subsumption relations must be carefully distinguished in order to avoid influence from the lexicon that might lead to unsound inferences.
• Qualities - Lexicalized adjectives that may not have a realist correlate need to be modeled in an orthogonal way.
Bibliography
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.
Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In Handbook on Ontologies, ed. S. Staab and R. Studer, pp. 151-172.
Herold, A., Hicks, A., Rigau, G., and Laparra, E. (2009). Kyoto Deliverable D6.2: Central Ontology Version - 1. www.kyoto-project.eu.
Hicks, A., and Rigau, G. (2010). Kyoto Deliverable D8.3: Domain Extension of the Central Ontology. www.kyoto-project.eu.
Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances on Natural Language Processing (RANLP'07), Borovetz, Bulgaria.
Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L. (2002). Wonderweb Deliverable D17. The Wonderweb Library of Foundational Ontologies and the Dolce Ontology.
Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In Proceedings of FOIS 2004, International Conference on Formal Ontology and Information Systems.
Vossen, P., et al. (2008). KYOTO: A System for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.