Automatic disambiguation of English puns
Tristan Miller and Iryna Gurevych, Technische Universität Darmstadt
Paper review by Utsav Sinha, August 2015
Part of an assignment in CS 671: Natural Language Processing, IIT Kanpur
Punning
The deliberate use of lexical ambiguity to create humour.
Three types of puns:
- Homographic (same written word): "An elephant's opinion carries a lot of weight"
- Homophonic (same spoken word): "Atheism is a non-prophet institution"
- Imperfect (differs in both spelling and pronunciation): The sign at the nudist camp read, "Clothed until April"
Problem Statement
Pun disambiguation: identifying the multiple senses of a term known a priori to be a pun.
This paper focuses on homographic, single-word (mono-lexeme) puns.
The dataset was created from user-submitted puns and the private collections of professional humorists.
The corpus was pruned by trained human annotators to ensure:
- One pun per instance
- One content word per pun
- Two meanings per pun
- Weak homography
Word Sense Disambiguation (WSD)
Determining the intended sense of a polysemous term in a given communicative act.
WSD systems require:
- A running context
- A sense inventory
Approaches to WSD:
- Knowledge-based, using lexical-semantic resources
- Supervised machine learning
Lesk Algorithm
Assumes that a common topic is shared by the words in a neighbourhood.
Simplified Lesk (SL):
- Compare the dictionary definitions of the ambiguous target word with the terms in its neighbouring context
- The sense with the maximum overlap is taken as the intended one
Limitations:
- Dependent on the exact wording of the definitions
- Dictionary glosses are very short, so the overlap evidence is sparse and coarse-grained
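Simplified Lesk can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the sense inventory, sense ids, and glosses below are toy stand-ins for a real dictionary such as WordNet.

```python
# Simplified Lesk: score each candidate sense of the target word by the
# word overlap between its gloss and the surrounding context, and return
# the top-scoring sense.

STOPWORDS = {"a", "an", "the", "of", "or", "to", "that", "on", "by", "s", "lot"}

def content_words(text):
    """Lowercase, split, and drop stopwords (apostrophes become spaces)."""
    return {w for w in text.lower().replace("'", " ").split()
            if w not in STOPWORDS}

def simplified_lesk(target, sentence, inventory):
    """Return the sense of `target` whose gloss overlaps most with `sentence`."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[target].items():
        overlap = len(content_words(gloss) & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Two illustrative senses of "weight" for the elephant pun.
inventory = {
    "weight": {
        "weight.n.01": "the force exerted on a body by gravity",
        "weight.n.05": "the importance or influence that an opinion or argument carries",
    }
}

print(simplified_lesk("weight",
                      "An elephant's opinion carries a lot of weight",
                      inventory))  # the figurative sense wins on overlap
```

Since a pun carries two intended meanings, the same scoring can be used to return the two highest-overlap senses rather than a single winner.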
Lesk Algorithm (continued)
Solution: use a thesaurus that includes synonyms, homonyms, and derivations.
A sense inventory such as WordNet is used.
New problem: WordNet is too fine-grained, so clustering or coarsening techniques are applied.
Improvements to the Algorithm
- Find the word's lemma and part-of-speech (POS) tag to narrow the list of candidate senses
- Simplified Extended Lesk (SEL): modifies SL by concatenating each sense's definition with those of neighbouring senses from WordNet
- Simplified Lexically Expanded Lesk (SLEL): extends SL by using 100 entries from a large distributional thesaurus to expand each word's sense
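The SEL idea above can be sketched as follows: each sense's gloss is concatenated with the glosses of its neighbouring senses before the overlap is computed. The sense ids, glosses, and `related` links here are illustrative toy data; in the paper the neighbouring senses come from WordNet.

```python
# Simplified Extended Lesk (SEL) sketch: expand each sense's gloss with
# the glosses of related senses, then score by overlap with the context.

STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "and", "that",
             "will", "against"}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def sel_score(sense, context, glosses, related):
    """Overlap between the context and the sense's expanded gloss."""
    expanded = " ".join([glosses[sense]] +
                        [glosses[r] for r in related.get(sense, [])])
    return len(content_words(expanded) & context)

def sel_disambiguate(senses, sentence, glosses, related):
    context = content_words(sentence)
    return max(senses, key=lambda s: sel_score(s, context, glosses, related))

glosses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
    "slope.n.01": "an elevated geological formation",
    "institution.n.01": "an organization founded to lend money",
}
related = {
    "bank.n.01": ["slope.n.01"],
    "bank.n.02": ["institution.n.01"],
}

print(sel_disambiguate(["bank.n.01", "bank.n.02"],
                       "the bank will lend money against deposits",
                       glosses, related))  # the financial sense wins
```

The expansion is what gives "lend" and "money" a chance to match here; the bare gloss of the financial sense alone would only match "deposits".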
Tie Breaking
The algorithms fail when there is a tie for the highest lexical overlap. Two tie-breaking approaches:
- POS tie-breaker: preferentially selects the sense (or pair of senses) whose POS matches the output of the Stanford POS tagger
- Clustering of WordNet senses: aligns WordNet to the coarser-grained OmegaWiki LSR, based on the hypothesis that humorous puns are more likely to exploit coarse-grained homonymy than fine-grained systematic polysemy
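The POS tie-breaker reduces to a simple filter over the tied senses. A minimal sketch, assuming the tagger's output is passed in (the paper uses the Stanford tagger) and that each sense id carries a known POS in an illustrative `sense_pos` map:

```python
# POS tie-breaker sketch: among senses tied on lexical overlap, keep only
# those whose part of speech matches the tag the POS tagger assigned to
# the pun word; if none match, fall back to the original tie set.

def pos_tiebreak(tied_senses, tagger_pos, sense_pos):
    matching = [s for s in tied_senses if sense_pos[s] == tagger_pos]
    return matching or tied_senses

tied = ["duck.n.01", "duck.v.01"]                 # hypothetical tie
sense_pos = {"duck.n.01": "n", "duck.v.01": "v"}
print(pos_tiebreak(tied, "v", sense_pos))          # the verb sense survives
```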
Baselines
Two baselines are used for comparison:
- Random selection from the candidate senses
- Most Frequent Sense (MFS): select the candidate sense with the highest frequency in a manually sense-tagged corpus
MFS baselines are difficult to beat because they are built on expensive sense-tagged data; they serve as a benchmark for the performance of knowledge-based disambiguators.
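The MFS baseline amounts to counting sense occurrences in a tagged corpus. A minimal sketch, where the tagged corpus is a toy list of (word, sense) pairs standing in for a real sense-tagged resource:

```python
# Most Frequent Sense (MFS) baseline sketch: choose the candidate sense
# with the highest count in a manually sense-tagged corpus.

from collections import Counter

def mfs_baseline(word, candidate_senses, tagged_corpus):
    counts = Counter(sense for w, sense in tagged_corpus if w == word)
    return max(candidate_senses, key=lambda s: counts[s])

tagged_corpus = [
    ("bass", "bass.n.01"),  # low-frequency sound
    ("bass", "bass.n.01"),
    ("bass", "bass.n.07"),  # the fish
]
print(mfs_baseline("bass", ["bass.n.07", "bass.n.01"], tagged_corpus))
```

Note that the baseline ignores the context entirely, which is exactly why it is a weak fit for puns, where the less frequent sense is often the one being exploited.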
Results
(The results tables appear as figures in the original slides and are not reproduced here.)
Observations
- The approach performs well relative to the MFS baseline
- Accuracy is lower on verbs, which have the highest polysemy
- The dataset is too small for supervised machine learning techniques
Future work:
- Explore additional tie-breaking algorithms
- Drop the assumption that the pun word is known a priori by adding pun detection
Thank you!