Balancing Lexicographic and Ontological Considerations in Ontology Development

First International Workshop on Ontological Analysis, Trento, IT

16-20 July, 2012

Amanda Hicks, University at Buffalo

[email protected]


16-20 July, 2012 Balancing Lexicographic and Ontological Considerations

Ontologies vs. Wordnets

Wordnets represent

• how we use language
  – the word ‘cat’ in context

Ontologies represent

• what it is to be a cat
  – e.g., whether being a cat is a rigid property

Overview of some ontologies

3 Layers of Ontologies

• Upper: most abstract

• Middle: intermediately abstract

• Domain: specific to a domain or application

Domain Ontologies

• are often developed by domain experts.

• model highly specific, technical information.

• are often intended for use by a particular community of researchers, technicians, etc.

• Examples:
  – Gene Ontology
  – KYOTO domain ontology
  – Protein Ontology

Middle Ontologies

• are developed by ontologists or other information technologists.

• model concepts that are often part of a normal, spoken and written vocabulary and of an intermediate level of abstraction.

• connect upper-level ontologies with domain ontologies.

• Examples:
  – KYOTO Middle
  – Information Artifact Ontology

Upper Ontologies

• are developed by ontologists.

• model highly abstract concepts
  – endurant vs. perdurant
  – quality vs. substance

• Because the axioms at this level are inherited by every class below them, we need to be especially careful here!

Upper Ontologies, some examples

BFO - http://www.ifomis.org/bfo

SUMO - http://www.ontologyportal.org

DOLCE - http://www.loa.istc.cnr.it/DOLCE.html

BFO

• is a relatively shallow top ontology
  – 36 classes
  – 6 layers deep

• The BFO consortium coordinates many biomedical domain ontologies, users, and developers.

DOLCE

• DOLCE-Lite
  – 37 classes
  – depth of 6

• DOLCE-Lite Plus
  – 208 classes
  – depth of 13

The KYOTO Project

• EU 7th Framework Programme (FP7) project, 2007-2010

• facilitates data mining and sharing from texts in the domain of ecology across seven languages

• WWF & ECNC are domain users

• www.kyoto-project.eu

The KYOTO Ontology

• Three layers: Top, Middle, Domain

• Seven wordnets mapped to the KYOTO Ontology to facilitate data extraction and management
  – English
  – Spanish
  – Basque
  – Italian
  – Dutch
  – Japanese
  – Chinese

The KYOTO Ontology

KYOTO 3 - three layers: Top, Middle, Domain

Wordnets mapped to the KYOTO Ontology to facilitate data extraction and sound inference
  – English
  – Spanish
  – Basque
  – Italian
  – Dutch
  – Japanese
  – Chinese

Use Protégé 4.0 or earlier; KYOTO is not written in OWL 2.

KYOTO Top

Based on DOLCE-Lite Plus (DLP)

• In DLP, qualities are modeled according to the kinds of entities that bear them.
  – e.g., size is a physical quality since it inheres in a physical object

• KYOTO Top extends the physical-quality hierarchy
  – amount-of-matter-quality
  – feature-quality
  – physical-object-quality

• Added quality types
  – dispositional
  – relational
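The bearer-based classification above can be sketched in a few lines of Python. This is a hypothetical illustration, not KYOTO's OWL axioms: the parent maps, the `BEARER_OF` table, and the helpers `ancestors`, `bearer_class`, and `can_bear` are assumptions. Only the class names come from the slide, plus physical-endurant, the DOLCE category that subsumes physical objects, amounts of matter, and features.

```python
# Hypothetical sketch: qualities classified by the kind of entity that bears them.
# Class names follow the slide; the restrictions are simplified assumptions.

ENTITY_PARENT = {  # entity kind -> superclass
    "physical-object": "physical-endurant",
    "amount-of-matter": "physical-endurant",
    "feature": "physical-endurant",
    "physical-endurant": "endurant",
}

QUALITY_PARENT = {  # quality kind -> superclass
    "size": "physical-quality",
    "physical-object-quality": "physical-quality",
    "amount-of-matter-quality": "physical-quality",
    "feature-quality": "physical-quality",
    "physical-quality": "quality",
}

BEARER_OF = {  # assumed: each quality kind inheres only in this entity kind
    "physical-quality": "physical-endurant",
    "physical-object-quality": "physical-object",
    "amount-of-matter-quality": "amount-of-matter",
    "feature-quality": "feature",
}


def ancestors(kind, parent_map):
    """Return kind followed by all of its superclasses, nearest first."""
    chain = [kind]
    while chain[-1] in parent_map:
        chain.append(parent_map[chain[-1]])
    return chain


def bearer_class(quality_kind):
    """Nearest bearer restriction, inherited down the quality hierarchy."""
    for q in ancestors(quality_kind, QUALITY_PARENT):
        if q in BEARER_OF:
            return BEARER_OF[q]
    return None


def can_bear(entity_kind, quality_kind):
    """An entity may bear a quality if it falls under the quality's bearer class."""
    required = bearer_class(quality_kind)
    return required is None or required in ancestors(entity_kind, ENTITY_PARENT)


print(can_bear("physical-object", "size"))                      # True
print(can_bear("amount-of-matter", "physical-object-quality"))  # False
```

Because restrictions are inherited downward, a leaf such as size needs no restriction of its own.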

KYOTO Top

KYOTO Top extends the role hierarchy.

Roles are arranged according to the kind of entity that bears that role.
  – A physical-object-role is played by a physical object.
  – In the domain layer, offspring is a subclass of organism-role, since organisms are the kinds of things that play the role of offspring.
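A minimal sketch of the same bearer-based idea for roles, assuming a simple played-by restriction on each role kind. The role names come from the slide; physical-endurant and the placement of organism under physical-object are assumptions, and `PLAYED_BY`, `required_player`, and `can_play` are hypothetical helpers, not part of KYOTO.

```python
# Hypothetical sketch: roles classified by the kind of entity that plays them.

ENTITY_PARENT = {  # entity kind -> superclass (assumed placements)
    "organism": "physical-object",
    "physical-object": "physical-endurant",
}

ROLE_PARENT = {  # role kind -> superclass
    "offspring": "organism-role",  # domain layer
    "organism-role": "physical-object-role",
    "physical-object-role": "role",
}

PLAYED_BY = {  # assumed played-by restriction per role kind
    "physical-object-role": "physical-object",
    "organism-role": "organism",
}


def ancestors(kind, parent_map):
    """Return kind followed by all of its superclasses, nearest first."""
    chain = [kind]
    while chain[-1] in parent_map:
        chain.append(parent_map[chain[-1]])
    return chain


def required_player(role):
    """Most specific played-by restriction inherited along the role hierarchy."""
    for r in ancestors(role, ROLE_PARENT):
        if r in PLAYED_BY:
            return PLAYED_BY[r]
    return None


def can_play(entity_kind, role):
    """An entity may play a role if it falls under the role's required player."""
    player = required_player(role)
    return player is None or player in ancestors(entity_kind, ENTITY_PARENT)


print(required_player("offspring"))             # organism
print(can_play("organism", "offspring"))        # True
print(can_play("physical-object", "offspring")) # False
```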

KYOTO Middle

Includes:
• Base Concepts (BCs) from WordNet
  – nouns
• Units of measurement, and qualities such as length
• 72 new perdurants (processes and states)
• 123 new endurant terms (objects and substances)
• qualities that model adjectives

Base Concepts in KYOTO

Synsets from WordNet 3.0 (Fellbaum, 1998)

– for each path from leaf to root: the first node with at least 50 hyponyms

– roughly: a cheap (and inadequate?) computational model of basic-level concepts

– CAREFUL: the set depends on the structure and coverage of WordNet, which are idiosyncratic

• cake
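The selection rule as stated on the slide can be sketched over a toy hypernym graph. This is a simplification, not the full procedure of Izquierdo et al. (2007): it assumes that "hyponyms" means all transitive hyponyms, follows every leaf-to-root path to handle multiple inheritance, and uses an invented toy graph in place of WordNet 3.0. The threshold defaults to the slide's 50 but is lowered for the toy data.

```python
from collections import defaultdict

# Toy hypernym graph (synset -> hypernyms), an invented stand-in for WordNet 3.0.
TOY = {
    "entity": [],
    "food": ["entity"],
    "baked_goods": ["food"],
    "cake": ["baked_goods"],
    "sponge_cake": ["cake"],
    "bread": ["baked_goods"],
    "fruit": ["food"],
    "apple": ["fruit"],
    "animal": ["entity"],
    "cat": ["animal"],
}


def build_hyponyms(hypernyms):
    """Invert the hypernym map into synset -> direct hyponyms."""
    hyponyms = defaultdict(set)
    for child, parents in hypernyms.items():
        for parent in parents:
            hyponyms[parent].add(child)
    return hyponyms


def descendants(node, hyponyms, memo):
    """All transitive hyponyms of node (assumed reading of 'hyponyms')."""
    if node not in memo:
        found = set()
        for child in hyponyms.get(node, ()):
            found.add(child)
            found |= descendants(child, hyponyms, memo)
        memo[node] = found
    return memo[node]


def paths_to_root(node, hypernyms):
    """Every path from node up to a root (a synset may have several hypernyms)."""
    parents = hypernyms.get(node, [])
    if not parents:
        return [[node]]
    return [[node] + rest for p in parents for rest in paths_to_root(p, hypernyms)]


def base_concepts(hypernyms, threshold=50):
    """On each leaf-to-root path, select the first node with >= threshold hyponyms."""
    hyponyms = build_hyponyms(hypernyms)
    nodes = set(hypernyms) | {p for ps in hypernyms.values() for p in ps}
    leaves = [n for n in nodes if not hyponyms.get(n)]
    memo, selected = {}, set()
    for leaf in leaves:
        for path in paths_to_root(leaf, hypernyms):
            for node in path:
                if len(descendants(node, hyponyms, memo)) >= threshold:
                    selected.add(node)
                    break
    return selected


print(sorted(base_concepts(TOY, threshold=3)))  # ['baked_goods', 'entity', 'food']
```

In the toy graph, the thinly populated animal branch reaches the threshold only at the root, so cat falls under the very general entity. This mirrors the warning that the result depends on WordNet's structure and coverage.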

Base Concepts in KYOTO

BCs facilitate mapping wordnets onto the ontology in KYOTO.

• WordNet is mapped onto the ontology via BCs.

• BC equivalents in other languages are indirectly mapped onto the ontology via mappings to WordNet’s BCs.
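The two-step mapping can be pictured as a pair of lookup tables. In this sketch all identifiers (`eng:food`, `spa:alimento`, `kyoto:food`, and so on) and the helper `ontology_class` are hypothetical. KYOTO's actual synset identifiers and mapping files are not shown on the slides.

```python
# English Base Concepts mapped directly onto ontology classes (hypothetical IDs).
BC_TO_CLASS = {
    "eng:food": "kyoto:food",
    "eng:color": "kyoto:color",
}

# Equivalence links from other wordnets to English BCs (hypothetical IDs).
EQUIVALENT_BC = {
    "spa:alimento": "eng:food",
    "ita:cibo": "eng:food",
    "nld:kleur": "eng:color",
}


def ontology_class(synset_id):
    """Direct lookup for English BCs; other languages go through their English BC."""
    if synset_id in BC_TO_CLASS:
        return BC_TO_CLASS[synset_id]
    english = EQUIVALENT_BC.get(synset_id)
    return BC_TO_CLASS.get(english) if english else None


print(ontology_class("spa:alimento"))  # kyoto:food
print(ontology_class("nld:kleur"))     # kyoto:color
```

A synset with no English equivalent simply has no ontology class, which is one reason the coverage of the English BCs matters for every other language.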

Base Concepts in KYOTO

• 297 BCs from the noun hierarchy and

• 578 BCs from the verb hierarchy
  – need work, in Domain layer
  – group names such as verb_change still appear, though they are not ontological (Izquierdo et al., 2007).

Sample BCs in KYOTO’s Middle Layer

• unit-of-measurement

• number

• color

• change

• book

• message

• food

BCs and KYOTO

In this case, the lexicon, together with the requirements of the application, informed the population of the Middle and Domain layers of KYOTO.

KYOTO Domain

KYOTO Domain

Sample concepts from user scenarios
  – fish family
  – coast
  – soil
  – water
  – breed
  – biodiversity

The Lexicon & The Ontology

is-a

“is-a”: The Problem

“Is-a” is ambiguous between instantiation (an individual is an instance of a class) and subsumption (a class is a subclass of another class). This can lead to confusion.

For example, species terms can be confused.

Kermit is-a_i Leptopelis vermiculatus.

Leptopelis vermiculatus is-a_c species.

Therefore, Kermit is a species.
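The equivocation can be reproduced with a toy triple store. The sketch is illustrative only: the relation names `is-a`, `instance-of`, and `subclass-of`, the extra fact that Leptopelis vermiculatus is a subclass of organism, and both closure functions are assumptions. Reading every "is-a" as one transitive relation derives the bad conclusion; reading the second link as instantiation (the class is an instance of the kind species) blocks it, while instance-of still composes with subclass-of.

```python
def naive_closure(triples):
    """Treat every 'is-a' link as one transitive relation (the unsound reading)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(facts):
            for (c, _, d) in list(facts):
                if b == c and (a, "is-a", d) not in facts:
                    facts.add((a, "is-a", d))
                    changed = True
    return facts


def typed_closure(triples):
    """Only chain through subclass-of: X r C and C subclass-of D give X r D.
    instance-of followed by instance-of is never chained."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (c, r2, d) in list(facts):
                if b == c and r2 == "subclass-of" and (a, r1, d) not in facts:
                    facts.add((a, r1, d))
                    changed = True
    return facts


NAIVE = {
    ("Kermit", "is-a", "Leptopelis vermiculatus"),
    ("Leptopelis vermiculatus", "is-a", "species"),
}
TYPED = {
    ("Kermit", "instance-of", "Leptopelis vermiculatus"),
    ("Leptopelis vermiculatus", "instance-of", "species"),
    ("Leptopelis vermiculatus", "subclass-of", "organism"),
}

print(("Kermit", "is-a", "species") in naive_closure(NAIVE))         # True
print(("Kermit", "instance-of", "species") in typed_closure(TYPED))  # False
```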

“is-a”: The Rule

The Rule: Every property of a class belongs to every instance of that class.

Check that all inherited properties hold.

Species are composed of many organisms that can successfully produce fertile offspring.

Is Kermit composed of many organisms that can successfully produce fertile offspring?
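The rule suggests a mechanical test: collect every property a class demands of its members, including those inherited from superclasses, and see which ones an individual lacks. The property strings, the `frog` and `organism` classes, and the helpers below are hypothetical simplifications for illustration.

```python
# Properties every member of a class must have (illustrative strings only).
CLASS_PROPERTIES = {
    "species": {"composed of many organisms", "members produce fertile offspring"},
    "organism": {"has a body"},
    "frog": {"is an amphibian"},
}
SUPERCLASSES = {"frog": ["organism"]}


def required_properties(cls):
    """Properties of cls plus everything inherited from its superclasses."""
    props = set(CLASS_PROPERTIES.get(cls, set()))
    for parent in SUPERCLASSES.get(cls, []):
        props |= required_properties(parent)
    return props


def missing_properties(individual, cls):
    """Inherited properties the individual lacks; an empty set means it passes."""
    return required_properties(cls) - individual


kermit = {"has a body", "is an amphibian"}
print(missing_properties(kermit, "frog"))  # set()
print(sorted(missing_properties(kermit, "species")))
# ['composed of many organisms', 'members produce fertile offspring']
```

Kermit passes the test for frog but fails it for species, which is the signal that "Kermit is a species" was never a legitimate instantiation.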

“is-a”: KYOTO’s Solution

Model species terms twice!

• Species in the sense of a group are modeled as physical pluralities. In this sense, Leptopelis vermiculatus is an instance, NOT a subclass.

• ‘Leptopelis vermiculatus’ can also refer to a class. In this sense, it is a type of organism.
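One way to picture the double modeling in code is to keep the two senses as different kinds of object: the group sense becomes a single individual whose members are organisms, and the class sense becomes the `kind` an organism instantiates. The `Organism` and `PhysicalPlurality` types below are hypothetical stand-ins for the corresponding ontology classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Organism:
    name: str
    kind: str  # second sense: the organism class this individual instantiates


@dataclass
class PhysicalPlurality:
    """First sense: the species as a group, modeled as one individual."""
    name: str
    members: set = field(default_factory=set)

    def has_member(self, organism: Organism) -> bool:
        return organism in self.members


kermit = Organism("Kermit", kind="Leptopelis vermiculatus")
lv_group = PhysicalPlurality("Leptopelis vermiculatus (group)", {kermit})

print(kermit.kind)                            # Leptopelis vermiculatus
print(lv_group.has_member(kermit))            # True
print(isinstance(kermit, PhysicalPlurality))  # False: Kermit is not a group
```

Kermit is a member of the group, not an instance of it, so the question "is Kermit a species?" no longer arises.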

Rigid & Non-Rigid Terms

Rigidity: The Problem

In ontologies and in WordNet, subsumption relations are determined by different criteria.

• WordNet
  – Hypernymy
  – Based on psycholinguistic data: native speakers agree on how the words are used.

• Ontology
  – Subclass
  – Based on the extension of a term: every x is a y.

Transitivity of Subsumption

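BE CAREFUL! WordNet’s hypernymy can lead to unsound inferences.

If a lexical hierarchy places cat under pet and that link is read as ontological subsumption, transitivity yields:

Conclusion: If every pet has an owner, then every cat has an owner.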
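The conclusion follows mechanically once a lexical hierarchy is read as ontological subsumption. In this sketch the placement of cat under pet and the two axioms are assumptions chosen to reproduce the example; the hypothetical `inherited_axioms` simply walks up the chain and collects everything it finds.

```python
# Assumed lexical hierarchy (term -> hypernym), read naively as subsumption.
HYPERNYM = {"cat": "pet", "pet": "animal"}
AXIOMS = {"pet": {"has an owner"}, "animal": {"is alive"}}


def inherited_axioms(term):
    """Collect axioms from term and every ancestor, as transitive subsumption does."""
    found = set()
    while term is not None:
        found |= AXIOMS.get(term, set())
        term = HYPERNYM.get(term)
    return found


print(sorted(inherited_axioms("cat")))  # ['has an owner', 'is alive']
```

The output is unsound for stray and feral cats, which have no owner.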

Rigidity: KYOTO’s Solution

• Distinguish rigid and non-rigid terms in the wordnet.
  – This distinction comes from OntoClean (Guarino and Welty, 2004).

• Distinguish between roles and types in the ontology.

• Map synsets to the ontology using different mapping relations.
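The three steps can be sketched together: synsets carry a rigidity tag, types and roles live in separate trees, and the mapping relation decides which axioms a synset picks up. The relation names `subclass-of` and `plays-role`, the class names, and the helpers are hypothetical illustrations, not KYOTO's actual mapping vocabulary.

```python
# Types and roles kept in separate trees (hypothetical class names).
TYPE_PARENT = {"animal": "organism"}
TYPE_AXIOMS = {"organism": {"is alive"}}
ROLE_AXIOMS = {"pet-role": {"has an owner"}}

# Synset -> (mapping relation, ontology class), plus an OntoClean-style rigidity tag.
SYNSET_MAP = {
    "cat": ("subclass-of", "animal"),
    "pet": ("plays-role", "pet-role"),
}
RIGID = {"cat": True, "pet": False}


def mapping_errors():
    """Rigid synsets must map by subclass-of; non-rigid ones by plays-role."""
    return [s for s, (rel, _) in SYNSET_MAP.items() if RIGID[s] != (rel == "subclass-of")]


def axioms_for(synset):
    """Axioms a synset's instances inherit, given how the synset is mapped."""
    relation, target = SYNSET_MAP[synset]
    if relation == "plays-role":
        # Role axioms hold of something only while it plays the role.
        return set(ROLE_AXIOMS.get(target, set()))
    found, cls = set(), target
    while cls is not None:
        found |= TYPE_AXIOMS.get(cls, set())
        cls = TYPE_PARENT.get(cls)
    return found


print(mapping_errors())   # []
print(axioms_for("cat"))  # {'is alive'}
print(axioms_for("pet"))  # {'has an owner'}
```

Because cat is mapped to a type rather than placed under the pet role, "has an owner" no longer propagates to every cat.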

Rigidity

• “Cat” is a rigid concept.

• “Pet” is a non-rigid concept.

• A concept is rigid if it is essential to all of its instances.
  – Permanence: Fluffy is always a cat, but not always a pet.
  – Necessity: Fluffy cannot stop being a cat; Fluffy can stop being a pet.
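Permanence, the first of the two tests, can be checked against a recorded history; necessity cannot, since it concerns every possible situation rather than one actual timeline. The sketch assumes a hypothetical history for Fluffy, given as the set of concepts Fluffy instantiates at successive times.

```python
# Hypothetical history of Fluffy: the concepts it instantiates at successive times.
FLUFFY_HISTORY = [
    {"cat"},         # a stray kitten
    {"cat", "pet"},  # adopted
    {"cat"},         # abandoned again
]


def holds_permanently(concept, history):
    """True if the concept applies at every recorded time (the permanence test)."""
    return all(concept in snapshot for snapshot in history)


print(holds_permanently("cat", FLUFFY_HISTORY))  # True
print(holds_permanently("pet", FLUFFY_HISTORY))  # False
```

A concept that fails permanence for any individual cannot be rigid; passing the test is only evidence of rigidity, not proof.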

The Rule of Thumb (See Giancarlo’s slides for a more nuanced view.)

Non-rigid terms should not subsume rigid terms.

or

Roles should not subsume types.

A Jumbled Hierarchy

amount of matter
-R drug
+R antibiotic
+R chemical compound
+R oil
-R nutriment (a source of material to nourish the body)

Clean Hierarchies

amount-of-matter
  +R antibiotic
  +R chemical compound
  +R oil

substance-role (role played-by some amount-of-matter)
  -R drug
  -R nutriment
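The rule of thumb can be turned into a simple check over parent links: flag every edge where a non-rigid parent subsumes a rigid child. Checking direct edges suffices, because any path from a rigid term up to a non-rigid ancestor must contain such an edge. The +R/-R tags come from the slides; the tags for amount-of-matter and substance-role, and the nesting of the jumbled hierarchy (whose indentation was lost), are assumptions.

```python
# Rigidity tags: True for +R, False for -R.
RIGID = {
    "amount-of-matter": True,  # assumed +R (untagged on the slides)
    "substance-role": False,   # assumed -R: it is a role
    "drug": False,
    "nutriment": False,
    "antibiotic": True,
    "chemical-compound": True,
    "oil": True,
}

# One plausible reading of the jumbled hierarchy (child -> parent); nesting assumed.
JUMBLED = {
    "drug": "amount-of-matter",
    "antibiotic": "drug",
    "chemical-compound": "amount-of-matter",
    "nutriment": "amount-of-matter",
    "oil": "nutriment",
}

# The clean hierarchy: types under amount-of-matter, roles under substance-role.
CLEAN = {
    "antibiotic": "amount-of-matter",
    "chemical-compound": "amount-of-matter",
    "oil": "amount-of-matter",
    "drug": "substance-role",
    "nutriment": "substance-role",
}


def violations(parent_of):
    """Edges (child, parent) where a non-rigid parent subsumes a rigid child."""
    return sorted(
        (child, parent)
        for child, parent in parent_of.items()
        if RIGID[child] and not RIGID[parent]
    )


print(violations(JUMBLED))  # [('antibiotic', 'drug'), ('oil', 'nutriment')]
print(violations(CLEAN))    # []
```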

Mapping Synsets

Adjectives

Adjectives: General Strategy in KYOTO

Qualities are easily modeled according to the kinds of entities in which they inhere.

For example, amounts of matter are the kinds of things that have pH levels.

Adjectives: General Strategy in KYOTO

The values for specific qualities like pH levels are located in regions.
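A minimal sketch of a quality value located in a region, assuming the familiar idealized pH scale from 0 to 14 with neutrality at exactly 7 (at 25 °C). The region names double as the adjectives they license; `AmountOfMatter` and `ph_region` are hypothetical helpers, not KYOTO classes.

```python
from dataclasses import dataclass


@dataclass
class AmountOfMatter:
    name: str
    ph: float  # the pH quality inheres in an amount of matter


def ph_region(value: float) -> str:
    """Locate a pH value in a named region (idealized 0-14 scale, neutral at 7)."""
    if not 0.0 <= value <= 14.0:
        raise ValueError(f"pH {value} is outside the assumed 0-14 scale")
    if value < 7.0:
        return "acidic"
    if value == 7.0:
        return "neutral"
    return "basic"


sample = AmountOfMatter("seawater sample", ph=8.1)
print(ph_region(sample.ph))  # basic
```

The region is what an adjective such as "acidic" picks out, while the number is the measured value located in it.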

Adjectives: The Problem

pH levels are easy because

• they are measurable, i.e., they have objective criteria.

• they are confined to one kind of entity, namely, amounts of matter.

Adjectives: The Problem

How should we model concepts like “beneficial” or “important”?

• Subjective component
• Not necessarily “out there” in the world
• Not typically quantifiable
• Criteria are context dependent
• Many kinds of entities can be beneficial or important.

Adjectives: KYOTO’s Solution

The Middle layer includes a region, evaluative-region, to accommodate adjectives like ‘beneficial’ or ‘worthless’.

Adjectives: KYOTO’s Solution

Concepts like “beneficial” and “important” are

• not in the domain-specific layer, since they are general concepts.

• not in the upper layer, since they are “subjective”.

• not in a strictly realist ontology like BFO.

• modeled orthogonally to “real” qualities.

Adjectives: KYOTO’s Solution

What kind of restriction can you write for length: ‘long’ (an indefinite quality) or ‘2 m’ (a definite quality)?

length q-located-in (length-measurement-unit or indefinite-quality-region)
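The union restriction can be mirrored in a small typed sketch: a length value is acceptable if it is either a measurement in a length unit (a definite quality, such as 2 m) or a label from an indefinite region (such as long). The unit and label vocabularies, the `Measurement` and `IndefiniteRegion` types, and `satisfies_length_restriction` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union

LENGTH_UNITS = {"mm", "cm", "m", "km"}  # assumed unit vocabulary
INDEFINITE_LENGTH = {"long", "short"}   # assumed indefinite-quality-region labels


@dataclass(frozen=True)
class Measurement:
    amount: float
    unit: str


@dataclass(frozen=True)
class IndefiniteRegion:
    label: str


LengthValue = Union[Measurement, IndefiniteRegion]


def satisfies_length_restriction(value: LengthValue) -> bool:
    """Mirror of: length q-located-in (length-measurement-unit or indefinite-quality-region)."""
    if isinstance(value, Measurement):
        return value.unit in LENGTH_UNITS and value.amount >= 0
    if isinstance(value, IndefiniteRegion):
        return value.label in INDEFINITE_LENGTH
    return False


print(satisfies_length_restriction(Measurement(2.0, "m")))       # True
print(satisfies_length_restriction(IndefiniteRegion("long")))    # True
print(satisfies_length_restriction(Measurement(2.0, "kg")))      # False
```

Evaluative labels such as beneficial also fail the check, consistent with keeping them in the separate evaluative-region.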

In Conclusion

• Procurement - BCs influenced the concepts included in the KYOTO ontology.

• Hierarchy - subsumption relations must be carefully distinguished in order to avoid influence from the lexicon that might lead to unsound inferences.

• Qualities - Lexicalized adjectives that may not have a realist counterpart need to be modeled in an orthogonal way.

Bibliography

Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.

Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In S. Staab and R. Studer (eds.), Handbook on Ontologies, pp. 151-172.

Herold, A., Hicks, A., Rigau, G., and Laparra, E. (2009). KYOTO Deliverable D6.2: Central Ontology Version 1. www.kyoto-project.eu.

Hicks, A., and Rigau, G. (2010). KYOTO Deliverable D8.3: Domain Extension of the Central Ontology. www.kyoto-project.eu.

Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP'07), Borovetz, Bulgaria.

Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L. (2002). WonderWeb Deliverable D17. The WonderWeb Library of Foundational Ontologies and the DOLCE Ontology.

Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In Proceedings of FOIS 2004, International Conference on Formal Ontology and Information Systems.

Vossen, P., et al. (2008). KYOTO: A System for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.