Word senses: a computational response
Adam Kilgarriff
Auckland 2012 Kilgarriff: Word senses: a computational response
My PhD (in 5 slides)
What is a word sense
The lexicographers
They create them
Methods: introspection, other dictionaries, corpus
Atkins, Hanks, Krishnamurthy
What is a word sense (1)
SFIP: sufficiently frequent, insufficiently predictable
(a glass of) whisky x (a glass of) tequila
What is a word sense (2)
homonymy
analogy polysemy rules
collocation
What is a word sense (3)
A cluster of instances of use
Operationalised as: corpus lines, clustered by lexicographers
Makes sense of:
overlapping senses
different dictionaries, different senses
lumping and splitting
I don’t believe in word senses
Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy
Philosophy: ontological commitment (same meaning, different register)
“good entities to build belief systems on”
A word sense is a cluster of corpus lines
But I’m an NLP person
Automatic clustering?
Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
You can get semantic sense from corpora + stats
First attempt
Longman 1994
Abject failure:
no grammar
corpus too small and noisy
naïve clustering
useless programmer
Collocations
Easy: most words don’t go with most other words
Then build on what we can do well
Metaphor, analogy, homonymy, rules: all much harder
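The observation that most words do not go with most other words can be made concrete with an association score. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs in a toy corpus. PMI is an assumption chosen for illustration, since the slides do not name a statistic, and `pmi_scores` and `TOY_CORPUS` are hypothetical names.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
TOY_CORPUS = [
    "she drank strong tea",
    "he likes strong tea",
    "the tea was cold",
    "the dog saw the cat",
    "the man drank the water",
]

def pmi_scores(sentences, window=1):
    """Score ordered word pairs (within `window` tokens) by PMI."""
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        word_counts.update(tokens)
        for i, w in enumerate(tokens):
            # Pair each word with the next `window` words to its right.
            for v in tokens[i + 1 : i + 1 + window]:
                pair_counts[(w, v)] += 1
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (w, v), c in pair_counts.items():
        p_pair = c / n_pairs
        p_w = word_counts[w] / n_words
        p_v = word_counts[v] / n_words
        # PMI compares the observed pair probability with chance co-occurrence.
        scores[(w, v)] = math.log2(p_pair / (p_w * p_v))
    return scores
```

On this toy data, `pmi_scores(TOY_CORPUS)` ranks the pair ("strong", "tea") above ("the", "tea"): the frequent word "the" co-occurs with many words, so its pairing with "tea" is less surprising.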
Clustering
Word sketch: collocates organised by grammar
Dictionary: collocates (and other things) organised by meaning
How to re-organise?
Observation:
Corpus: arbitrary sample
Dictionary (= lexicon): systematic account
Children encounter arbitrary samples, develop systematic account
Corpus: provisional, dispensable, used to develop lexicon
Levels of abstraction
Direct linkage:
Fragile: updates (to C or D) break links
Dictionary: abstract
Corpus: raw
Intermediate level needed
[Diagram: Corpus linked directly to Dictionary]
How most automatic word sense disambiguation (WSD) works
Analyse dictionary to give set of collocates
Match to collocates in a corpus
Dispensable corpus
[Diagram: Corpus linked to Dictionary via Collocates]
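A minimal sketch of the matching step: each sense carries a set of collocates (derived from the dictionary in the approach described), and a corpus line is assigned to the sense whose collocates overlap most with the line. The sense labels and collocate sets in `MOUSE_SENSES`, and the function name `disambiguate`, are invented for illustration and not taken from any real dictionary or system.

```python
def disambiguate(context_tokens, sense_collocates):
    """Return the sense whose collocates overlap most with the context.

    Returns None when no sense shares any collocate with the context.
    """
    context = {t.lower() for t in context_tokens}
    best_sense, best_overlap = None, 0
    for sense, collocates in sense_collocates.items():
        overlap = len(context & collocates)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical collocate sets for two senses of "mouse".
MOUSE_SENSES = {
    "mouse/animal": {"cat", "trap", "cheese", "squeak"},
    "mouse/device": {"click", "button", "keyboard", "cursor"},
}
```

For example, `disambiguate("click the left mouse button".split(), MOUSE_SENSES)` selects the device sense, while a line with no matching collocate is left unassigned.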
Not just collocates
Triples: <object, drink (v), tea (n)> (parse the corpus)
Some “unary relations”: <v+obj+ing, hear (v)>, e.g. I hear him singing
Domain-based clues: <domain=computing, mouse (n)>
Collocates, Constructions, Domains = CoCoDo
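The three kinds of clue can share one record type. The following is a sketch of one possible representation, not the author's data model: the class name `CoCoDo`, its field names, and the `render` method are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CoCoDo:
    kind: str                           # "collocate", "construction" or "domain"
    relation: str                       # e.g. "object", "v+obj+ing", "domain=computing"
    headword: str
    pos: str                            # part of speech of the headword
    collocate: Optional[str] = None     # only set for collocate triples
    collocate_pos: Optional[str] = None

    def render(self) -> str:
        """Format the record in the angle-bracket notation used on the slide."""
        head = f"{self.headword} ({self.pos})"
        if self.collocate is None:
            return f"<{self.relation}, {head}>"
        return f"<{self.relation}, {head}, {self.collocate} ({self.collocate_pos})>"

# The three examples from the slide, encoded as records.
EXAMPLES = [
    CoCoDo("collocate", "object", "drink", "v", "tea", "n"),
    CoCoDo("construction", "v+obj+ing", "hear", "v"),
    CoCoDo("domain", "domain=computing", "mouse", "n"),
]
```

Keeping the records immutable and hashable (`frozen=True`) lets them be stored in sets or used as dictionary keys when linking them to senses.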
Automatically extract CoCoDos from corpus
How linked to senses?
Automatic (WSD techniques)
Manual: “dictionary-free”, ideal for new dictionaries; labour costs
Mixed: WSD with manual confirmation/correction
[Diagram: Corpus linked to Dictionary via CoCoDo]
Linking CoCoDos to senses
Semi-automatic dictionary drafting (SADD)
CoCoDo database
Automatic clustering
Lexicographer input
More clustering
Dictionary with corpus inside
Automatic clustering of collocates: propose senses
Iterate:
Lexicographer input: confirm/reject/edit sense inventory; assign collocates / corpus lines to senses
WSD: uses seeds to build full WSD for word; find more collocates for each sense
XML dictionary entry: load into dictionary-editing tool
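One pass of the iterate step might look like the sketch below: seed collocates assigned by the lexicographer label the corpus lines they match unambiguously, and words that recur only in one sense's lines are proposed as further collocates for confirmation. This is a simple seed-based bootstrapping step chosen for illustration; the slides do not specify the algorithm, and `bootstrap_step`, `min_count`, and the toy data are assumptions.

```python
from collections import Counter, defaultdict

def bootstrap_step(lines, seeds, headword, min_count=2):
    """Label lines by seed collocates, then propose new collocates per sense."""
    labelled = {}
    for idx, line in enumerate(lines):
        tokens = set(line.lower().split())
        matches = [s for s, colls in seeds.items() if tokens & colls]
        if len(matches) == 1:  # accept only unambiguous evidence
            labelled[idx] = matches[0]
    # Count, per sense, the lines in which each other word occurs.
    counts = defaultdict(Counter)
    for idx, sense in labelled.items():
        counts[sense].update(set(lines[idx].lower().split()) - {headword})
    proposals = {}
    for sense in seeds:
        others = Counter()
        for s, c in counts.items():
            if s != sense:
                others.update(c)
        # Propose words seen often with this sense and never with another.
        proposals[sense] = {
            w for w, n in counts[sense].items()
            if n >= min_count and w not in seeds[sense] and others[w] == 0
        }
    return labelled, proposals

# Toy data, invented for illustration.
BANK_LINES = [
    "the bank approved a mortgage loan",
    "a mortgage loan from the bank",
    "we walked along a river bank fishing",
    "fishing from the river bank is allowed",
    "the bank was closed",
]
BANK_SEEDS = {"bank/finance": {"loan"}, "bank/river": {"river"}}
```

Calling `bootstrap_step(BANK_LINES, BANK_SEEDS, "bank")` labels the first four lines and leaves the last one unassigned; the proposed collocates would then go back to the lexicographer, and accepted ones join the seeds for the next pass.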
Fits with Atkins method for bilingual lexicography
Analyse source language from corpus
List all expressions that might possibly have a non-predictable translation
Very fine-grained; lots of collocations
Target-language-neutral; re-usable
Translate
Edit to finalise dictionary