Compound Noun Polysemy and Sense Enumeration in WordNet
1Abed Alhakim Freihat, 2Biswanath Dutta and 1Fausto Giunchiglia
1DISI, University of Trento, Trento, Italy
2Indian Statistical Institute (ISI), Bangalore, India
eKNOW-2015, 22-27 February 2015, Lisbon, Portugal.
Outline
Problem
WordNet
Compound Nouns
Polysemy
Compound Noun Polysemy
Sense Enumerations in Compound Nouns
Solution
Detecting Sense Enumerations in WordNet
Results
Conclusion and Future Work
WordNet (Princeton WordNet)
A lexical database for English
A set of one or more synonyms (similar words) is called a synset
#1 pizza, pizza pie: Italian open pie made of thin bread dough spread with a spiced mixture of e.g. tomato sauce and cheese.
Organized through semantic and lexical relations
Semantic Relations between synsets
hypernym, hyponym, meronym, …
Lexical Relations between words
Antonym, derivationally related form, ...
Compound Nouns
Multi-word expressions or collocations that consist of a noun modifier and a modified noun.
Nerve center
Nerve is the noun modifier
Center is the modified noun
Red Coral
Red is the noun modifier
Coral is the modified noun
Polysemy
A word is polysemous if
it has more than one meaning (i.e., it belongs to more than one synset)
BANK
HONEY
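The definition above can be checked mechanically. The following is a minimal sketch, assuming synsets are held as plain lists of words keyed by a made-up identifier; the identifiers and helper names are illustrative and not part of WordNet or of the authors' tooling.

```python
from collections import defaultdict

# Toy synsets; the IDs are hypothetical, the words come from the slides.
SYNSETS = {
    "center#1": ["center", "centre", "nerve center", "nerve centre"],
    "center#2": ["plaza", "mall", "center", "shopping mall",
                 "shopping center", "shopping centre"],
    "pizza#1": ["pizza", "pizza pie"],
}


def synsets_by_word(synsets):
    """Map each word to the set of synset IDs that contain it."""
    index = defaultdict(set)
    for synset_id, words in synsets.items():
        for word in words:
            index[word].add(synset_id)
    return index


def is_polysemous(word, index):
    """A word is polysemous if it belongs to more than one synset."""
    return len(index.get(word, ())) > 1


INDEX = synsets_by_word(SYNSETS)
print(is_polysemous("center", INDEX))  # True
print(is_polysemous("pizza", INDEX))   # False
```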
Compound Noun Polysemy
The cases where we use the modified noun to refer to several different compound nouns.
Using the word center to refer to:
center, centre, nerve center, nerve centre -- a cluster of nerve cells governing a specific bodily process.
plaza, mall, center, shopping mall, shopping center, shopping centre -- mercantile establishment consisting of a carefully landscaped complex of shops representing leading merchandisers; usually includes restaurants and a convenient parking area; a modern version of the traditional marketplace.
Using the word head to refer to:
fountainhead, drumhead, head teacher, …
Statistics
#Nouns 104290
#Synsets that contain these nouns 74314
#Compound nouns 58946
#Synsets that contain at least one compound noun 40560
#Compound polysemous nouns 3407
• More than 56% of the nouns in WordNet are compound nouns.
• More than 45% of the synsets contain compound nouns.
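The two percentages can be recomputed from the counts on this slide. A short worked calculation, using only the numbers listed above (the variable names are ours):

```python
# Counts taken from the statistics slide.
nouns = 104290
synsets = 74314
compound_nouns = 58946
synsets_with_compound = 40560

compound_noun_share = 100 * compound_nouns / nouns
compound_synset_share = 100 * synsets_with_compound / synsets

print(f"{compound_noun_share:.1f}% of nouns are compound nouns")
print(f"{compound_synset_share:.1f}% of synsets contain a compound noun")
```

With these counts the first share comes out at about 56.5% and the second at about 54.6%, so both stated lower bounds hold.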
Types of Compound Noun Polysemy
• *Specialization polysemy:
• Using the word turtledove to refer to:
#1 Australian turtledove, turtledove, Stictopelia cuneata: small Australian dove.
#2 turtledove: any of several Old World wild doves.
• Metonymy:
• Using the word cherry to refer to:
#2 cherry, cherry tree: any of numerous trees and shrubs producing a small fleshy round fruit with a single hard stone.
#3 cherry: a red fruit with a single hard stone.
• Sense enumerations
*Freihat, A. A., Giunchiglia, F. and Dutta, B. (2013). Solving specialization polysemy in WordNet. International Journal of Computational Linguistics and Applications, vol. 4, no. 1, pp. 29-52.
Sense Enumeration in Compound Nouns
• Assignment of the noun modifier or the modified noun as a synonym of the compound noun itself.
• Storing this kind of polysemy in a lexical database leads to a redundant explosion of word meanings.
• E.g., WordNet contains 135 non-polysemous synsets in which the term head is a noun modifier/modified noun of a compound noun. To be consistent, the word head would need 168 senses (33 at present + 135 to add).
• WordNet assigns the modified noun as a synonym of the compound noun inconsistently.
Sense Enumeration in Compound Nouns (contd.)
• Possible solutions
• Adding the modified noun as a synonym to all its corresponding compound nouns → redundancy
• Removing this kind of polysemy → our proposed solution
Disambiguating Compound Nouns
We usually use modified nouns to refer to their corresponding compound nouns (e.g., center to refer to shopping center, research center, medical center, ...)
Is it necessary to store the compound nouns and their corresponding modified nouns as synonyms in the lexicon?
Disambiguating the modified nouns …
Are we able to disambiguate modified nouns because
We store the synonymy in our mental lexicon, OR
It is a syntactic process that does not depend on the lexicon?
Discovery and Elimination of Sense Enumerations in Compound Nouns
Two phases:
Discovery of sense enumerations in Compound Nouns
A semi-automatic process
Elimination of sense enumerations
An automatic process
Discovery of sense enumerations in Compound Nouns (phase I)
Semi-automatic:
Deploying an algorithm that returns sense enumeration candidates among the compound-noun polysemous nouns.
The algorithm excludes:
Specialization polysemy instances
Metonymy instances
Exclusion of false positives.
This step is manual.
We exclude two kinds of false positives: missing adjunct noun/modified noun synsets and term abbreviations.
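The automatic part of phase I can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: a candidate is taken to be a synset that holds a space-separated compound noun together with its own first or last token, where that token is polysemous. The `excluded` parameter is a hypothetical stand-in for synsets already classified as specialization polysemy or metonymy; how those are recognized is not covered here. Closed compounds such as drumhead would need a different test than splitting on spaces.

```python
from collections import defaultdict

# Toy synsets; IDs are hypothetical, words are taken from the slides.
TOY_SYNSETS = {
    "center#1": ["center", "nerve center"],
    "center#2": ["mall", "center", "shopping center"],
    "cherry#2": ["cherry", "cherry tree"],   # metonymy, excluded below
    "cherry#3": ["cherry"],
    "party#1": ["party", "political party"],
    "party#2": ["party"],
    "pizza#1": ["pizza", "pizza pie"],       # pizza is not polysemous
}


def find_candidates(synsets, excluded=frozenset()):
    """Return sorted (synset_id, part, compound) candidate triples."""
    membership = defaultdict(set)
    for synset_id, words in synsets.items():
        for word in words:
            membership[word].add(synset_id)

    candidates = []
    for synset_id, words in synsets.items():
        if synset_id in excluded:
            continue  # assumed to be handled by earlier classification
        for compound in words:
            tokens = compound.split()
            if len(tokens) < 2:
                continue  # not a multi-word compound
            # Check the noun modifier (first) and the modified noun (last).
            for part in {tokens[0], tokens[-1]}:
                if part in words and len(membership[part]) > 1:
                    candidates.append((synset_id, part, compound))
    return sorted(candidates)


candidates = find_candidates(TOY_SYNSETS, excluded={"cherry#2"})
for row in candidates:
    print(row)
```

On this toy data the sketch reports the two center synsets and also party#1. The party entry is the kind of false positive that the manual step is meant to remove.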
Discovery of sense enumerations in Compound Nouns (phase I Contd…)
Exclusion of false positives:
Missing adjunct noun/ modified noun:
#1 party, political party -- an organization to gain political power.
#2 party -- an occasion on which people can assemble for social interaction and entertainment.
#3 party, company -- a band of people associated temporarily in some activity.
#4 party -- a group of people gathered together for pleasure.
#5 party -- a person involved in legal proceedings.
Term abbreviation
milliliter, millilitre, mil, ml, cubic centimeter, cubic centimetre, cc -- a metric unit of volume equal to one thousandth of a liter.
Elimination of Sense Enumerations in Compound Nouns (phase II)
An automatic process:
We eliminate the sense enumerations by removing the polysemous modified nouns.
E.g., applying the function to head turns synset #32 into synset #32':
#32 drumhead, head: a membrane that is stretched taut over a drum.
#32' drumhead: a membrane that is stretched taut over a drum.
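The elimination step for a single synset can be sketched as below. This is an illustration only: the function name is hypothetical, and the suffix test used to spot a compound built on the modified noun is a crude assumption that would also match unrelated words ending in the same letters.

```python
def eliminate_sense_enumeration(synset_words, modified_noun):
    """Drop modified_noun from a synset if a compound built on it remains.

    Assumption: a word counts as a compound of modified_noun when it ends
    with it (e.g. "drumhead" or "shopping center").
    """
    compounds = [w for w in synset_words
                 if w != modified_noun and w.endswith(modified_noun)]
    if not compounds:
        return list(synset_words)  # nothing to eliminate
    return [w for w in synset_words if w != modified_noun]


synset_32 = ["drumhead", "head"]
print(eliminate_sense_enumeration(synset_32, "head"))  # ['drumhead']
```

A synset that contains only the modified noun is returned unchanged, so senses of head that are not sense enumerations are preserved.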
Result and Evaluation
Results of the discovery algorithm:
#Compound noun polysemous terms 2270
#Compound noun polysemous synsets 2952
#Compound noun polysemous instances 11650
Manual validation result:
#Compound noun polysemous terms 1905
#Compound noun polysemous synsets 2547
#Compound noun polysemous instances 11088
• In 80% of cases, there is total agreement between the two evaluators.
• In 94% of cases, there is partial agreement between the two evaluators.
Disambiguation algorithm result:
                                #Nouns   #Synsets   #Senses
Before applying the algorithm   104290   74314      130207
After applying the algorithm    104290   74314      127660
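The before/after figures can be cross-checked against the 2547 compound noun polysemous synsets reported above. A short calculation using only those numbers (variable names are ours):

```python
senses_before = 130207
senses_after = 127660
polysemous_synsets = 2547  # second "polysemous synsets" count reported above

removed_senses = senses_before - senses_after
reduction_pct = 100 * removed_senses / senses_before

print(removed_senses)                        # 2547
print(removed_senses == polysemous_synsets)  # True
print(f"{reduction_pct:.2f}% fewer senses")
```

The number of removed senses equals that synset count, which suggests that one modified-noun sense is dropped per affected synset, while the noun and synset totals stay the same.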
Conclusion
• Sense enumeration in compound nouns is a source of noise rather than a source of knowledge.
• Which compound-noun polysemous nouns should we store in a lexical database?
• Only metonymy
• A lexicon should avoid redundant information that can be derived by syntactic rules or by NLP tools.
Future work
• Evaluation in terms of recall and precision to test our approach
• Examine the relation between sense enumeration and missing terms.
• e.g., bony pelvis and head of muscle are missing in the following two synsets respectively:
#25 head: the rounded end of a bone that fits into a rounded cavity in another bone to form a joint.
#26 head: that part of a skeletal muscle that is away from the bone that it moves.
Acknowledgement
• The research leading to these results has received funding from the European Community’s Seventh Framework Program under grant agreement n. 600854, Smart Society (http://www.smart-society-project.eu/).