Automatic disambiguation of English puns
Tristan Miller and Iryna Gurevych, Technische Universität Darmstadt
Paper review by Utsav Sinha, August 2015
Part of an assignment in CS 671: Natural Language Processing, IIT Kanpur
Punning
The deliberate use of lexical ambiguity to create humour.
Three types of puns:
- Homographic (same written word): "An elephant's opinion carries a lot of weight"
- Homophonic (same spoken word): "Atheism is a non-prophet institution"
- Imperfect (differs in both spelling and pronunciation): The sign at the nudist camp read, "Clothed until April"
Problem Statement
Pun disambiguation: identifying the multiple senses of a term known a priori to be a pun.
This paper focuses on homographic, single-word (mono-lexeme) puns.
The dataset was created from user-submitted puns and the private collections of professional humorists.
The corpus was pruned by trained human annotators to ensure:
- One pun per instance
- One content word per pun
- Two meanings per pun
- Weak homography
Word Sense Disambiguation (WSD)
Determining the intended sense of a polysemous term in a given communicative act.
WSD systems require:
- A running context
- A sense inventory
Approaches to WSD:
- Knowledge-based, using lexical-semantic resources
- Supervised machine learning
Lesk Algorithm
Assumes that a common topic is shared by the words in a neighbourhood.
Simplified Lesk (SL):
- Compare the dictionary definitions of the ambiguous target word with the terms in its neighbouring context
- The sense with the maximum overlap is taken as the intended one
Limitations:
- Dependent on the exact wording of the definitions
- Dictionary glosses are very short, so the overlap evidence is sparse and coarse-grained
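Simplified Lesk can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the sense inventory, sense ids, and glosses below are toy stand-ins for a real dictionary such as WordNet.

```python
# Simplified Lesk: score each candidate sense of the target word by the
# word overlap between its gloss and the surrounding context, and return
# the top-scoring sense.

STOPWORDS = {"a", "an", "the", "of", "or", "to", "that", "on", "by", "s", "lot"}

def content_words(text):
    """Lowercase, split, and drop stopwords (apostrophes become spaces)."""
    return {w for w in text.lower().replace("'", " ").split()
            if w not in STOPWORDS}

def simplified_lesk(target, sentence, inventory):
    """Return the sense of `target` whose gloss overlaps most with `sentence`."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[target].items():
        overlap = len(content_words(gloss) & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Two illustrative senses of "weight" for the elephant pun.
inventory = {
    "weight": {
        "weight.n.01": "the force exerted on a body by gravity",
        "weight.n.05": "the importance or influence that an opinion or argument carries",
    }
}

print(simplified_lesk("weight",
                      "An elephant's opinion carries a lot of weight",
                      inventory))  # the figurative sense wins on overlap
```

Since a pun carries two intended meanings, the same scoring can be used to return the two highest-overlap senses rather than a single winner.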
Lesk Algorithm (continued)
Solution: use a thesaurus that includes synonyms, homonyms, and derivations.
A sense inventory such as WordNet is used.
New problem: WordNet is too fine-grained, so clustering or coarsening techniques are applied.
Improvements to the Algorithm
- Find the word's lemma and part-of-speech (POS) tag to narrow the list of candidate senses
- Simplified Extended Lesk (SEL): modifies SL by concatenating each sense's definition with those of neighbouring senses from WordNet
- Simplified Lexically Expanded Lesk (SLEL): extends SL by using 100 entries from a large distributional thesaurus to expand each word's sense
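The SEL idea above can be sketched as follows: each sense's gloss is concatenated with the glosses of its neighbouring senses before the overlap is computed. The sense ids, glosses, and `related` links here are illustrative toy data; in the paper the neighbouring senses come from WordNet.

```python
# Simplified Extended Lesk (SEL) sketch: expand each sense's gloss with
# the glosses of related senses, then score by overlap with the context.

STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "and", "that",
             "will", "against"}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def sel_score(sense, context, glosses, related):
    """Overlap between the context and the sense's expanded gloss."""
    expanded = " ".join([glosses[sense]] +
                        [glosses[r] for r in related.get(sense, [])])
    return len(content_words(expanded) & context)

def sel_disambiguate(senses, sentence, glosses, related):
    context = content_words(sentence)
    return max(senses, key=lambda s: sel_score(s, context, glosses, related))

glosses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
    "slope.n.01": "an elevated geological formation",
    "institution.n.01": "an organization founded to lend money",
}
related = {
    "bank.n.01": ["slope.n.01"],
    "bank.n.02": ["institution.n.01"],
}

print(sel_disambiguate(["bank.n.01", "bank.n.02"],
                       "the bank will lend money against deposits",
                       glosses, related))  # the financial sense wins
```

The expansion is what gives "lend" and "money" a chance to match here; the bare gloss of the financial sense alone would only match "deposits".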
Tie Breaking
The algorithms fail when there is a tie for the highest lexical overlap. Two tie-breaking approaches:
- POS tie-breaker: preferentially selects the sense (or pair of senses) whose POS matches the output of the Stanford POS tagger
- Clustering of WordNet senses: aligns WordNet to the coarser-grained OmegaWiki LSR, based on the hypothesis that humorous puns are more likely to exploit coarse-grained homonymy than fine-grained systematic polysemy
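The POS tie-breaker reduces to a simple filter over the tied senses. A minimal sketch, assuming the tagger's output is passed in (the paper uses the Stanford tagger) and that each sense id carries a known POS in an illustrative `sense_pos` map:

```python
# POS tie-breaker sketch: among senses tied on lexical overlap, keep only
# those whose part of speech matches the tag the POS tagger assigned to
# the pun word; if none match, fall back to the original tie set.

def pos_tiebreak(tied_senses, tagger_pos, sense_pos):
    matching = [s for s in tied_senses if sense_pos[s] == tagger_pos]
    return matching or tied_senses

tied = ["duck.n.01", "duck.v.01"]                 # hypothetical tie
sense_pos = {"duck.n.01": "n", "duck.v.01": "v"}
print(pos_tiebreak(tied, "v", sense_pos))          # the verb sense survives
```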
Baselines
Two baselines are used for comparison:
- Random selection from the candidate senses
- Most Frequent Sense (MFS): select the candidate sense with the highest frequency in a manually sense-tagged corpus
MFS baselines are difficult to beat because they are built on expensive sense-tagged data; they serve as a benchmark for the performance of knowledge-based disambiguators.
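The MFS baseline amounts to counting sense occurrences in a tagged corpus. A minimal sketch, where the tagged corpus is a toy list of (word, sense) pairs standing in for a real sense-tagged resource:

```python
# Most Frequent Sense (MFS) baseline sketch: choose the candidate sense
# with the highest count in a manually sense-tagged corpus.

from collections import Counter

def mfs_baseline(word, candidate_senses, tagged_corpus):
    counts = Counter(sense for w, sense in tagged_corpus if w == word)
    return max(candidate_senses, key=lambda s: counts[s])

tagged_corpus = [
    ("bass", "bass.n.01"),  # low-frequency sound
    ("bass", "bass.n.01"),
    ("bass", "bass.n.07"),  # the fish
]
print(mfs_baseline("bass", ["bass.n.07", "bass.n.01"], tagged_corpus))
```

Note that the baseline ignores the context entirely, which is exactly why it is a weak fit for puns, where the less frequent sense is often the one being exploited.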
Results
(The results tables appear as figures in the original slides and are not reproduced here.)
Observations
- The approach performs well relative to the MFS baseline
- Accuracy is lower on verbs, which have the highest polysemy
- The dataset is too small for supervised machine learning techniques
Future work:
- Explore additional tie-breaking algorithms
- Drop the assumption that the pun word is known a priori by adding pun detection
Thank you!