Compound Noun Polysemy and Sense Enumeration in WordNet

Abed Alhakim Freihat (1), Biswanath Dutta (2) and Fausto Giunchiglia (1)

(1) DISI, University of Trento, Trento, Italy

(2) Indian Statistical Institute (ISI), Bangalore, India

eKNOW-2015, 22-27 February 2015, Lisbon, Portugal


Outline

Problem

WordNet

Compound Nouns

Polysemy

Compound Noun Polysemy

Sense Enumerations in Compound Nouns

Solution

Detecting Sense Enumerations in WordNet

Results

Conclusion and Future Work

WordNet (Princeton WordNet)

A lexical database for English

Words are grouped into sets of one or more synonyms (words with similar meanings), called synsets

#1 pizza, pizza pie: Italian open pie made of thin bread dough spread with a spiced mixture of e.g. tomato sauce and cheese.

Organized through semantic and lexical relations

Semantic Relations between synsets

hypernym, hyponym, meronym, …

Lexical Relations between words

Antonym, derivationally related form, ...
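To make the split between the two kinds of relations concrete, here is a minimal sketch of a toy lexicon in Python. The class names (`Synset`, `Lexicon`), the synset identifiers and the hypernym link from pizza to dish are illustrative assumptions, not WordNet's file format or any library's API: semantic relations connect synset ids, while lexical relations connect individual words.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Toy synset: a set of synonymous lemmas plus a gloss."""
    synset_id: str
    lemmas: list[str]
    gloss: str
    # Semantic relations hold between synsets, e.g. {"hypernym": ["dish.n.02"]}.
    semantic: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Lexicon:
    """Toy lexicon (hypothetical structure, not WordNet's own format)."""
    synsets: dict[str, Synset] = field(default_factory=dict)
    # Lexical relations hold between words, e.g. ("hot", "antonym", "cold").
    lexical: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def related(self, synset_id: str, relation: str) -> list[Synset]:
        """Follow one semantic relation from a synset to its targets."""
        ids = self.synsets[synset_id].semantic.get(relation, [])
        return [self.synsets[i] for i in ids]


lex = Lexicon()
lex.add(Synset("dish.n.02", ["dish"], "a particular item of prepared food"))
lex.add(Synset(
    "pizza.n.01", ["pizza", "pizza pie"],
    "Italian open pie made of thin bread dough spread with a spiced "
    "mixture of e.g. tomato sauce and cheese",
    semantic={"hypernym": ["dish.n.02"]},
))
lex.lexical.append(("hot", "antonym", "cold"))
print([s.lemmas for s in lex.related("pizza.n.01", "hypernym")])  # [['dish']]
```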


Compound Nouns

Multi-word expressions (collocations) that consist of a noun modifier and a modified noun.

Nerve center

Nerve is the noun modifier

Center is the modified noun

Red Coral

Red is the noun modifier

Coral is the modified noun
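Open (space-separated) compounds such as the two above can be split mechanically. A minimal sketch, assuming the last token is always the modified noun and everything before it is the modifier; closed compounds written as one word (e.g. drumhead) are out of scope for this helper, and the function name is hypothetical.

```python
def split_compound(compound: str) -> tuple[str, str]:
    """Split an open compound into (modifier, modified noun).

    Assumes the last token is the modified noun; one-word
    (closed) compounds such as "drumhead" are rejected.
    """
    tokens = compound.split()
    if len(tokens) < 2:
        raise ValueError(f"not an open compound: {compound!r}")
    return " ".join(tokens[:-1]), tokens[-1]


print(split_compound("nerve center"))  # ('nerve', 'center')
print(split_compound("red coral"))     # ('red', 'coral')
```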


Polysemy

A word is polysemous if

it has more than one meaning (i.e., it belongs to more than one synset)

BANK

HONEY
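Under this definition, polysemy is just a count over synset membership. The sketch below builds a lemma-to-synset index from a few toy synsets (the ids and lemma lists are hypothetical illustrations, glosses omitted) and tests the "more than one synset" condition.

```python
from collections import defaultdict


def build_index(synsets: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every lemma to the ids of the synsets that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for synset_id, lemmas in synsets.items():
        for lemma in lemmas:
            index[lemma].add(synset_id)
    return dict(index)


def is_polysemous(word: str, index: dict[str, set[str]]) -> bool:
    """A word is polysemous if it belongs to more than one synset."""
    return len(index.get(word, set())) > 1


# Toy synsets with hypothetical ids.
toy = {
    "bank#financial": ["bank", "depository financial institution"],
    "bank#river": ["bank"],
    "honey#food": ["honey"],
    "honey#endearment": ["beloved", "dear", "honey"],
    "pizza#1": ["pizza", "pizza pie"],
}
index = build_index(toy)
print(is_polysemous("bank", index), is_polysemous("pizza", index))  # True False
```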


Compound Noun Polysemy

Cases where the modified noun alone is used to refer to several different compound nouns.

Using the word center to refer to:

center, centre, nerve center, nerve centre -- a cluster of nerve cells governing a specific bodily process.

plaza, mall, center, shopping mall, shopping center, shopping centre -- mercantile establishment consisting of a carefully landscaped complex of shops representing leading merchandisers; usually includes restaurants and a convenient parking area; a modern version of the traditional marketplace.

Using the word head to refer to:

fountainhead, drumhead, head teacher, …
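Finding the compound nouns that a bare word can stand for amounts to a component search over the vocabulary. A minimal sketch, assuming a compound's components are its first and last tokens (open compounds) or a prefix/suffix (closed compounds such as drumhead); the function name and the prefix/suffix rule are hypothetical, and the rule is crude enough to match unrelated words.

```python
def compounds_using(word: str, lemmas: list[str]) -> list[str]:
    """Return the lemmas in which `word` is the first or last component.

    Open compounds are compared token by token; closed compounds
    ("drumhead") fall back to a prefix/suffix test, a crude heuristic
    that also matches unrelated words (e.g. "mil" in "milliliter").
    """
    hits = []
    for lemma in lemmas:
        if lemma == word:
            continue  # the bare word itself is not a compound
        tokens = lemma.split()
        if len(tokens) > 1:
            if word in (tokens[0], tokens[-1]):
                hits.append(lemma)
        elif lemma.startswith(word) or lemma.endswith(word):
            hits.append(lemma)
    return hits


vocabulary = ["head", "fountainhead", "drumhead", "head teacher",
              "center", "nerve center", "shopping center", "pizza"]
print(compounds_using("head", vocabulary))
# ['fountainhead', 'drumhead', 'head teacher']
print(compounds_using("center", vocabulary))
# ['nerve center', 'shopping center']
```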


Statistics

#Nouns                                               104290
#Synsets that contain these nouns                     74314
#Compound nouns                                       58946
#Synsets that contain at least one compound noun      40560
#Compound polysemous nouns                             3407


• More than 56% of the nouns in WordNet are compound nouns.

• More than 54% of the synsets contain compound nouns.
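Both percentages are plain ratios over the statistics table. Recomputing them from the reported counts gives about 56.5% for nouns and about 54.6% for synsets:

```python
# Counts copied from the statistics table above.
stats = {
    "nouns": 104290,
    "synsets": 74314,
    "compound_nouns": 58946,
    "synsets_with_compound": 40560,
}

noun_share = stats["compound_nouns"] / stats["nouns"]
synset_share = stats["synsets_with_compound"] / stats["synsets"]
print(f"{noun_share:.1%} of nouns are compound nouns")        # 56.5% ...
print(f"{synset_share:.1%} of synsets contain a compound noun")  # 54.6% ...
```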

Types of Compound Noun Polysemy

• *Specialization polysemy:

• Using the word turtledove to refer to:

#1 Australian turtledove, turtledove, Stictopelia cuneata: small Australian dove

#2 turtledove: any of several Old World wild doves.

• Metonymy:

• Using the word cherry to refer to:

• #2 cherry, cherry tree: any of numerous trees and shrubs producing a small fleshy round fruit with a single hard stone.

• #3 cherry: a red fruit with a single hard stone.

• Sense enumerations

*Freihat, A. A., Giunchiglia, F. and Dutta, B. (2013). Solving specialization polysemy in WordNet. International Journal of Computational Linguistics and Applications, vol. 4, no. 1, pp. 29-52.

Sense Enumeration in Compound Nouns

• The assignment of the noun modifier or the modified noun as a synonym of the compound noun itself.

• Storing this kind of polysemy in a lexical database leads to a redundant explosion of word senses.

• E.g., WordNet contains 135 synsets, not counted among the senses of head, in which head is the noun modifier or modified noun of a compound noun. Enumerated consistently, the word head would have 168 senses (33 at present + 135 to add).

• WordNet assigns the modified noun as a synonym of the compound noun inconsistently.
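The inconsistency can be checked mechanically: a synset enumerates a sense when a single-word lemma is also a component of a compound lemma in the same synset. The sketch below uses toy data and a hypothetical component heuristic (not the authors' procedure) to flag such lemmas, and to count how many senses a word would need if it were enumerated everywhere, the same calculation that gives 33 + 135 = 168 for head.

```python
def is_component(word: str, lemma: str) -> bool:
    """Heuristic: `word` is the first/last token of an open compound,
    or a prefix/suffix of a closed compound."""
    if word == lemma:
        return False
    tokens = lemma.split()
    if len(tokens) > 1:
        return word in (tokens[0], tokens[-1])
    return lemma.startswith(word) or lemma.endswith(word)


def enumerated_lemmas(lemmas: list[str]) -> set[str]:
    """Single-word lemmas that repeat a component of a compound
    lemma in the same synset (sense enumeration)."""
    return {w for w in lemmas
            if " " not in w and any(is_component(w, c) for c in lemmas)}


def senses_if_enumerated(word: str, synsets: dict[str, list[str]]) -> tuple[int, int, int]:
    """(current senses, synsets where the word is only a component, total)."""
    current = sum(word in lemmas for lemmas in synsets.values())
    missing = sum(word not in lemmas and any(is_component(word, c) for c in lemmas)
                  for lemmas in synsets.values())
    return current, missing, current + missing


# Toy data: "head" is listed in the drumhead synset but not in fountainhead's.
toy = {
    "drumhead": ["drumhead", "head"],
    "fountainhead": ["fountainhead"],
    "head#body": ["head", "caput"],
}
print(enumerated_lemmas(toy["drumhead"]))  # {'head'}
print(senses_if_enumerated("head", toy))   # (2, 1, 3)
```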


Sense Enumeration in Compound Nouns (contd.)

• Possible solutions

• Adding the modified noun as a synonym to all its corresponding compound nouns → redundancy

• Removing this kind of polysemy → our proposed solution


Disambiguating Compound Nouns

We usually use modified nouns to refer to their corresponding compound nouns (e.g., center to refer to shopping center, research center, medical center, ...).

Is it necessary to store the compound nouns and their corresponding modified nouns as synonyms in the lexicon?

Disambiguating the modified nouns …

Are we able to disambiguate modified nouns because

we store the synonymy in our mental lexicon, OR

it is a syntactic process that does not depend on the lexicon?


Discovery and Elimination of Sense Enumerations in Compound Nouns

Two phases:

Discovery of sense enumerations in compound nouns

A semi-automatic process

Elimination of sense enumerations

An automatic process


Discovery of sense enumerations in Compound Nouns (phase I)

Semi-automatic:

Deploying an algorithm that returns sense enumeration candidates among the compound noun polysemous nouns.

The algorithm excludes:

Specialization polysemy instances

Metonymy instances

Exclusion of false positives.

This step is manual: we exclude the false positives.

We exclude missing adjunct noun/modified noun synsets and term abbreviations.
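The automatic part of phase I can be pictured as a filter over the polysemous words. The sketch below illustrates it under stated assumptions and is not the authors' algorithm: a synset is a candidate when it contains the polysemous word together with a compound that has the word as a component; specialization polysemy is approximated as "another sense of the word is a direct or indirect hypernym of this synset"; metonymy instances are assumed to arrive as a hand-made exclusion set. All names, ids and the toy hierarchy are hypothetical.

```python
def is_component(word: str, lemma: str) -> bool:
    """Heuristic: first/last token of an open compound, or prefix/suffix of a closed one."""
    if word == lemma:
        return False
    tokens = lemma.split()
    if len(tokens) > 1:
        return word in (tokens[0], tokens[-1])
    return lemma.startswith(word) or lemma.endswith(word)


def ancestors(synset_id: str, hypernyms: dict[str, list[str]]) -> set[str]:
    """All direct and indirect hypernyms of a synset."""
    seen: set[str] = set()
    stack = list(hypernyms.get(synset_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hypernyms.get(parent, []))
    return seen


def candidates(word: str, synsets: dict[str, list[str]],
               hypernyms: dict[str, list[str]],
               metonymy: frozenset[str] = frozenset()) -> list[str]:
    """Synset ids where `word` looks like an enumerated compound sense."""
    senses = [sid for sid, lemmas in synsets.items() if word in lemmas]
    if len(senses) < 2:  # only polysemous words are examined
        return []
    found = []
    for sid in senses:
        if not any(is_component(word, c) for c in synsets[sid]):
            continue  # no compound in this synset backs the word
        if any(other in ancestors(sid, hypernyms) for other in senses if other != sid):
            continue  # approximated specialization polysemy
        if sid in metonymy:
            continue  # hand-listed metonymy instance
        found.append(sid)
    return sorted(found)


synsets = {
    "head#body": ["head", "caput"],
    "head#drum": ["drumhead", "head"],
    "turtledove#2": ["turtledove"],
    "turtledove#1": ["Australian turtledove", "turtledove", "Stictopelia cuneata"],
}
# Toy hierarchy: the Australian sense is placed under the general sense.
hypernyms = {"turtledove#1": ["turtledove#2"]}
print(candidates("head", synsets, hypernyms))        # ['head#drum']
print(candidates("turtledove", synsets, hypernyms))  # []
```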


Discovery of sense enumerations in Compound Nouns (phase I Contd…)

Exclusion of false positives:

Missing adjunct noun/ modified noun:

#1 party, political party -- an organization to gain political power.

#2 party -- an occasion on which people can assemble for social interaction and entertainment.

#3 party, company -- a band of people associated temporarily in some activity.

#4 party -- a group of people gathered together for pleasure.

#5 party -- a person involved in legal proceedings.

Term abbreviation

milliliter, millilitre, mil, ml, cubic centimeter, cubic centimetre, cc -- a metric unit of volume equal to one thousandth of a liter.
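The slides describe this exclusion as a manual step. A small helper could still pre-flag likely abbreviations for the human reviewer; the sketch below is a hypothetical heuristic (a lemma of at most three characters whose letters appear, in order and with the same first letter, inside a longer lemma of the same synset). It deliberately leaves the missing adjunct noun/modified noun case, such as party, to human judgment.

```python
def is_subsequence(short: str, long: str) -> bool:
    """True if the characters of `short` occur in `long` in the same order."""
    remaining = iter(long)
    # `ch in remaining` consumes the iterator up to the first match.
    return all(ch in remaining for ch in short)


def abbreviation_candidates(lemmas: list[str], max_len: int = 3) -> set[str]:
    """Flag short lemmas that look like abbreviations of a longer lemma
    in the same synset (hypothetical heuristic, for manual review only)."""
    flagged: set[str] = set()
    for short in lemmas:
        if not short or " " in short or len(short) > max_len:
            continue
        for other in lemmas:
            full = other.replace(" ", "")
            if (len(full) > len(short) and full[0] == short[0]
                    and is_subsequence(short, full)):
                flagged.add(short)
                break
    return flagged


synset = ["milliliter", "millilitre", "mil", "ml",
          "cubic centimeter", "cubic centimetre", "cc"]
print(sorted(abbreviation_candidates(synset)))  # ['cc', 'mil', 'ml']
```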


Elimination of Sense Enumerations in Compound Nouns (phase II)

An automatic process:

We eliminate the sense enumerations by removing the polysemous modified nouns.

E.g., applying the function to head turns synset #32 into synset #32':

#32 drumhead, head: a membrane that is stretched taut over a drum.

#32' drumhead: a membrane that is stretched taut over a drum.
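Given the validated (synset, word) pairs from phase I, the removal itself is mechanical. A minimal sketch, assuming synsets are stored as mutable lemma lists keyed by id; the function name is hypothetical, and the guard that keeps at least one lemma in every synset is an added safety assumption.

```python
def eliminate(synsets: dict[str, list[str]], validated: list[tuple[str, str]]) -> int:
    """Remove each validated modified noun from its synset (edited in place).

    Returns the number of senses removed.
    """
    removed = 0
    for synset_id, word in validated:
        lemmas = synsets[synset_id]
        # Keep every synset non-empty (added safety assumption).
        if word in lemmas and len(lemmas) > 1:
            lemmas.remove(word)
            removed += 1
    return removed


synsets = {"#32": ["drumhead", "head"]}
print(eliminate(synsets, [("#32", "head")]), synsets["#32"])  # 1 ['drumhead']
```

Because the removed word is polysemous, it keeps its other senses, so each removal lowers the sense count by one while the noun and synset counts stay the same.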


Results and Evaluation

Results of the discovery algorithm:

#Compound noun polysemous terms        2270
#Compound noun polysemous synsets      2952
#Compound noun polysemous instances   11650

Manual validation results:

#Compound noun polysemous terms        1905
#Compound noun polysemous synsets      2547
#Compound noun polysemous instances   11088

• In 80% of cases, there is total agreement between the two evaluators.

• In 94% of cases, there is partial agreement between the two evaluators.

Disambiguation algorithm results:

                                 #Nouns   #Synsets   #Senses
Before applying the algorithm    104290      74314    130207
After applying the algorithm     104290      74314    127660
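The tables can be cross-checked with simple arithmetic: the drop in senses (130207 - 127660) equals the number of validated synsets, which is what one would expect if exactly one modified noun is removed from each validated synset. The sketch below also prints the share of discovered candidates kept, assuming the second table counts the candidates that survived manual validation.

```python
# Figures copied from the result tables above.
discovered = {"terms": 2270, "synsets": 2952, "instances": 11650}
validated = {"terms": 1905, "synsets": 2547, "instances": 11088}
senses_before, senses_after = 130207, 127660

removed = senses_before - senses_after
print(removed)                          # 2547
print(removed == validated["synsets"])  # True
for key, total in discovered.items():
    # Share of discovered candidates that survived manual validation.
    print(f"{key}: {validated[key] / total:.1%} kept after validation")
```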

Conclusion

• Sense enumeration in compound nouns is a source of noise rather than a source of knowledge.

• Which compound noun polysemous nouns should we store in a lexical database?

• Only metonymy

• A lexicon should avoid redundant information that can be derived by syntactic rules or by NLP tools.


Future work

• Evaluation of our approach in terms of precision and recall

• Examine the relation between sense enumeration and missing terms.

• E.g., bony pelvis and head of muscle are missing in the following two synsets, respectively:

#25 head: the rounded end of a bone that fits into a rounded cavity in another bone to form a joint.

#26 head: that part of a skeletal muscle that is away from the bone that it moves.


Acknowledgement

• The research leading to these results has received funding from the European Community’s Seventh Framework Program under grant agreement n. 600854, Smart Society (http://www.smart-society-project.eu/).


Thank you for your kind attention!
