Evaluating Ontologies with Rudify - KEOD 2009, Madeira Portugal

Evaluating Ontologies with Rudify

Axel Herold, Amanda Hicks

Berlin-Brandenburg Academy of Sciences and Humanities


Introduction

This presentation describes Rudify, a set of tools used for semi-automatically tagging concepts with ontological meta-properties.
– Theory
– Development
– Evaluation
– Application


Background

Developed within the Kyoto Project

• Kyoto performs deep semantic searches

• Models and shares knowledge across different domains and different language communities

• Spanish, Dutch, English, Italian, Basque, Chinese, Japanese

• Initially targeted to the domain of ecology, but domain neutral


For More Information

See
• www.kyoto-project.eu
• Vossen, P., et al. (2008). KYOTO: A system for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.


Theory: OntoClean

• Developed by Guarino and Welty (Guarino and Welty, 2004)

• Evaluates ontological hierarchies
• Based on meta-properties of concepts
– Rigidity
– Unity
– Dependence
– Identity


OntoClean, cont.

• Does
– show the logical consequences of modeling choices once meta-properties have been assigned to the concepts in the ontology.

• Does not
– assign meta-properties
– prescribe a method for assigning meta-properties

This is where Rudify comes in!


Rigidity in Particular

• We focus on rigidity because
– the type-role distinction is important
– lexical patterns are easy to find
– AEON also focused on rigidity, so the data can be compared (Völker et al. 2008)


Rigidity

• A concept is rigid if it is essential to all of its instances.

• A concept is essential to an instance if it necessarily holds for that instance.

• “Cat” is rigid, “Pet” is not.


Roles vs. Types

• Types are rigid sortals, roles are non-rigid sortals.

• Sortals describe what sort of thing a concept represents.
– “cat”, “milk”, and “hurricane” are sortals
– “red”, “heavy”, and “singing” are not
– Sortal concepts usually correspond to nouns.


The Rule

Nouns tagged as non-rigid should not subsume nouns tagged as rigid.

Violating the rule leads to unwanted conclusions. If non-rigid “pet” subsumes rigid “cat”: if Fluffy ceases to be a pet, Fluffy ceases to be a cat.
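The rule can be checked mechanically once concepts carry rigidity tags. The following is a minimal sketch, not part of Rudify itself: the function name, the single-parent `parents` mapping, and the toy hierarchy are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rigidity_violations(
    parents: Dict[str, str], rigid: Dict[str, bool]
) -> List[Tuple[str, str]]:
    """Return (non-rigid ancestor, rigid descendant) pairs that break the rule.

    `parents` maps a concept to its direct subsumer (assumed single
    inheritance); `rigid` maps a concept to its rigidity tag.
    """
    violations = []
    for concept, is_rigid in rigid.items():
        if not is_rigid:
            continue
        # Walk up the hierarchy and report every non-rigid subsumer.
        ancestor = parents.get(concept)
        while ancestor is not None:
            if rigid.get(ancestor) is False:
                violations.append((ancestor, concept))
            ancestor = parents.get(ancestor)
    return violations


# Toy hierarchy in which "pet" (wrongly) subsumes "cat".
parents = {"cat": "pet", "pet": "animal"}
rigid = {"animal": True, "pet": False, "cat": True}
print(rigidity_violations(parents, rigid))  # [('pet', 'cat')]
```

Untagged ancestors are skipped rather than treated as violations, which keeps the check conservative when only part of the ontology has been tagged.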


Rudify Development: Basic Idea

Basic idea described in Völker et al. (2005), Völker et al. (2008)

• Talking about ontological meta-properties:
– Unconsciously and implicitly: everybody (at times)
  (a) “Fluffy stopped being a CAT.”
  (b) “I wonder if this squirrel would make a good PET.”
– Assumption: on the internet (e.g. WWW) there are lots of instances like (a) and (b)


What Rudify Does

• Queries the internet for complex patterns that might indicate meta-properties of the concept

• Uses the number of correct hits to assign scores to rigidity of the concept


Input

• list of lexical representations (LR) for concepts
– e.g. CAT <-- cat (English), gata (Spanish), ...
– e.g. PET <-- pet (English), animal doméstico (Spanish), ...
• list of complex linguistic patterns with slots for LRs:
– e.g. "... would make a (good|bad|perfect) X"
– e.g. "... is not (|a|an) X anymore"


Input, cont.

• additional feature descriptions for LR slots,
– e.g. part-of-speech(X) == NOUN
• additional feature descriptions for LR contexts,
– e.g. part-of-speech(right_neighbor(X)) != NOUN
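To make the pattern notation concrete, the sketch below expands the alternation groups of a pattern and fills the slot X with a lexical representation, giving the literal phrases one might send to a search engine. It is an illustration only: `expand_pattern` is a hypothetical helper, and the assumption that patterns use un-nested "(a|b|c)" groups is inferred from the two examples above.

```python
import itertools
import re
from typing import List

# Matches one "(a|b|c)" alternation group without nesting.
ALTERNATION = re.compile(r"\(([^()]*)\)")


def expand_pattern(pattern: str, lr: str) -> List[str]:
    """Expand every alternation in `pattern` and fill the slot X with `lr`."""
    # With one capturing group, re.split alternates literal text (even
    # indices) and alternation bodies (odd indices).
    parts = ALTERNATION.split(pattern)
    choices = [part.split("|") if i % 2 else [part] for i, part in enumerate(parts)]
    queries = []
    for combo in itertools.product(*choices):
        phrase = re.sub(r"\bX\b", lambda _match: lr, "".join(combo))
        queries.append(" ".join(phrase.split()))  # collapse doubled spaces
    return queries


print(expand_pattern("would make a (good|bad|perfect) X", "pet"))
print(expand_pattern("is not (|a|an) X anymore", "pet"))
```

The expansion is purely combinatorial, so it also produces ungrammatical variants such as "is not an pet anymore"; these should simply return no hits.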


Sample output

(a) "is no longer (|a|an) X"
(b) && !"there is no longer (|a|an) X"
(c) && !"is no longer (|a|an) X Y/NOUN"

1. "He was once a 'loved' family member, but now has gone feral as he is no longer a pet." (ok)
2. "It is guaranteed only if there is no longer a pet in the home." (violates (b))
3. "Diversity is no longer a pet project on the sidelines of corporate life." (violates (c))
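The filtering of the three sample sentences can be approximated as follows. This is a rough sketch under stated assumptions: `check_hit` is a hypothetical function, the regular expressions are a simplified reading of conditions (a) to (c), and the small `nouns` set is only a stand-in for the part-of-speech tagging that the Y/NOUN condition requires.

```python
import re
from typing import Set


def check_hit(sentence: str, lr: str, nouns: Set[str]) -> str:
    """Classify a search hit for the pattern "is no longer (|a|an) X"."""
    text = sentence.lower()
    core = rf"is no longer (?:an? )?{re.escape(lr.lower())}\b"
    hit = re.search(core, text)
    if hit is None:
        return "no match"
    # (b) existential "there is no longer ..." says nothing about rigidity.
    if re.search(rf"\bthere {core}", text):
        return "violates (b)"
    # (c) X followed by a noun is likely a modifier ("pet project").
    right_neighbor = re.match(r"\s+([a-z]+)", text[hit.end():])
    if right_neighbor and right_neighbor.group(1) in nouns:
        return "violates (c)"
    return "ok"


NOUNS = {"project", "shop", "food"}  # toy lexicon, assumption for this sketch
samples = [
    "He was once a 'loved' family member, but now has gone feral as he is no longer a pet.",
    "It is guaranteed only if there is no longer a pet in the home.",
    "Diversity is no longer a pet project on the sidelines of corporate life.",
]
for sentence in samples:
    print(check_hit(sentence, "pet", NOUNS))
```

Only hits classified as "ok" would count as evidence against rigidity.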


What do patterns indicate?

For rigidity
• negative evidence: discover concepts that might be non-rigid
– 25 different linguistic patterns
• positive evidence: discover concepts that might be rigid
– no patterns so far
• hypothesis: all concepts are rigid unless evidence against rigidity is found (problem with sparse data!)


Training Data

100 manually annotated monosemous LRs (gold standard)
– WordNet 3.0
– 50 rigid
– 50 non-rigid


Classification

• 4 classifiers trained on the gold standard
– decision tree (J48, a C4.5 implementation)
– multinomial logistic regression
– nearest neighbor with generalization (NNge)
– locally weighted learning, instance based

• problem: polysemous LRs


Test Data: Base Concepts

• 297 Base Concepts (Izquierdo et al. (2007))
• LRs from WordNet-3.0 (Fellbaum (1998))

– for each path from leaf to root: first node with at least 50 connections pointing downwards

– roughly: cheap computational model for basic level concepts (Rosch et al. (1976))

– set depends on structure and coverage of WordNet
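The selection criterion above can be sketched on a toy hierarchy. This is an illustration under explicit assumptions, not the procedure of Izquierdo et al. (2007): "connections pointing downwards" is read here as the number of direct children, every node has a single parent, and `base_concepts` and `min_links` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def base_concepts(parent: Dict[str, Optional[str]], min_links: int = 50) -> Set[str]:
    """For each leaf-to-root path, pick the first node with enough children."""
    children: Dict[str, List[str]] = {}
    for node, par in parent.items():
        children.setdefault(node, [])
        if par is not None:
            children.setdefault(par, []).append(node)

    selected: Set[str] = set()
    leaves = [node for node, kids in children.items() if not kids]
    for leaf in leaves:
        node = parent.get(leaf)
        while node is not None:
            if len(children[node]) >= min_links:
                selected.add(node)  # first qualifying node on this path
                break
            node = parent.get(node)
    return selected


# Toy hierarchy; the threshold is lowered to 3 so that the example stays small.
toy = {
    "entity": None,
    "animal": "entity",
    "rock": "entity",
    "cat": "animal",
    "dog": "animal",
    "wolf": "animal",
    "siamese": "cat",
}
print(base_concepts(toy, min_links=3))  # {'animal'}
```

Because the result depends only on link counts, adding or removing synsets changes the selected set, which is the dependence on the structure and coverage of WordNet noted above.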


Test Data, cont.

• We tested Rudify on English language data sets:
– 50 region terms (hand selected by environmental domain specialist)
– 236 Latin species names (hand selected by environmental domain specialist)
– 201 common species names (hand selected by environmental domain specialist)


Domain Specific Terms

• Classifiers correctly classified all region terms and all Latin species names as rigid.

• Classified all common English species names correctly with three exceptions:
– "wildcat" was misclassified as denoting a non-rigid concept by all four classifiers
– "wolf" and "apollo" (a butterfly) were misclassified by all classifiers except NNge.


Examples from Log Files

• Mount Si High School teacher Kit McCormick is no longer a Wildcat. (generalization from a school mascot to a school member)

• Also the 400 CORBON is no longer a wildcat. (a handgun)

• For example, the dog is no longer a wolf, and is now a whole seperate species. (polysemy)

• For four years, the space agency had been planning, defining, or defending some facet of what led up to and became Apollo. (a space mission's name)


Rudify Output on Base Concepts

Rudify output          Number of results
Decisive (4/0)         215
Difficult               82
  Indecisive (3/1)      56
  Undecided (2/2)       26
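The 4/0, 3/1 and 2/2 splits refer to how the four classifiers vote on a term. A minimal sketch of that aggregation follows; the function name and the label strings "rigid" and "non-rigid" are assumptions for illustration.

```python
from typing import List, Optional, Tuple

LABELS = {"rigid", "non-rigid"}


def aggregate_votes(votes: List[str]) -> Tuple[str, Optional[str]]:
    """Map four classifier votes to (category, majority label or None)."""
    if len(votes) != 4 or not set(votes) <= LABELS:
        raise ValueError("expected exactly four votes from LABELS")
    rigid_votes = votes.count("rigid")
    majority = max(rigid_votes, 4 - rigid_votes)
    if majority == 4:
        category = "decisive"      # 4/0
    elif majority == 3:
        category = "indecisive"    # 3/1
    else:
        category = "undecided"     # 2/2
    if majority == 2:
        return category, None
    return category, "rigid" if rigid_votes > 2 else "non-rigid"


# "wildcat": all four classifiers voted non-rigid.
print(aggregate_votes(["non-rigid"] * 4))
# "wolf": every classifier except NNge voted non-rigid.
print(aggregate_votes(["non-rigid", "non-rigid", "rigid", "non-rigid"]))
```

In the table above, the indecisive and undecided cases together form the "Difficult" group.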


Decisive Rudify Output

                             Instances   Percentages
Rigid       Rudify decides        6           4%
            incorrect            20          12%
            correct             142          85%
Non-rigid   incorrect            12          25%
            correct              36          75%


Summary of Output

                  Instances   Decisive   Total Term
Rudify decides        7          3%          2%
Incorrect            32         14%         11%
Correct             178         83%         60%
Difficult            82         n/a         28%


Overall evaluation

• Decisive output yields a high level of accuracy.
– Many errors come either from high-level concepts, e.g., “Artifact” and “Unit-of-measurement”, or from polysemy.

• We do not regard indecisive output as usable data.


Application of Output

amount of matter
    -R drug
    +R antibiotic
    +R chemical compound
    +R oil
    -R nutriment (a source of material to nourish the body)


Resulting Hierarchies

amount-of-matter
    antibiotic
    chemical compound
    oil

substance-role
    drug
    nutriment
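The step from the tagged concepts to the two resulting hierarchies can be sketched as a simple partition: rigid (+R) concepts stay under the type node, non-rigid (-R) concepts move under a role node. The function name and data layout below are assumptions for illustration; the node names are taken from the slides.

```python
from typing import Dict, List, Tuple


def split_types_and_roles(
    type_parent: str, role_parent: str, tagged: List[Tuple[str, bool]]
) -> Dict[str, List[str]]:
    """Keep rigid concepts under `type_parent`, move non-rigid ones to `role_parent`."""
    return {
        type_parent: [name for name, is_rigid in tagged if is_rigid],
        role_parent: [name for name, is_rigid in tagged if not is_rigid],
    }


tagged = [
    ("drug", False),
    ("antibiotic", True),
    ("chemical compound", True),
    ("oil", True),
    ("nutriment", False),
]
print(split_types_and_roles("amount-of-matter", "substance-role", tagged))
```

Deciding where the role node itself attaches in the ontology remains a modeling choice that this sketch does not make.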


Conclusions

• Rudify fills a gap in the process of ontology evaluation and development by semi-automatically tagging concepts with meta-properties that are useful in ensuring clean ontological hierarchies.

• In particular, Rudify can be productively used to distinguish types from roles in order to produce clean ontologies.


Works Cited

Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.

Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In Handbook on Ontologies, ed. S. Staab and R. Studer, pp. 151-172.

Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances on Natural Language Processing (RANLP'07), Borovetz, Bulgaria.

Kilgarriff, A. (2007). Googleology is bad science. Computational Linguistics, 33:147-151.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., and Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8:382-439.

Völker et al. (2005). Automatic Evaluation of Ontologies (AEON). In Gil et al., Proceedings of the 4th International Semantic Web Conference (ISWC2005), volume 3729 of LNCS, pp. 716-731, Berlin/Heidelberg.

Völker et al. (2008). AEON: An approach to the automatic evaluation of ontologies. Applied Ontology 3, pp. 41-62.

Vossen, P., et al. (2008). KYOTO: A system for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.


Thank you for your attention.


Should you use Google?

• results unstable over time, thus not exactly reproducible

• no linguistic search (extensive post processing needed)

• arbitrary limitations on results and meta data

• WWW: uncontrolled repository, Google: unknown filter

• see e.g. Kilgarriff (2007)

BUT: We need vast amounts of data given the rare occurrence of LRs in our patterns
