Word senses: a computational response

Adam Kilgarriff

Auckland 2012 Kilgarriff: Word senses: a computational response

My PhD (in 5 slides)

What is a word sense

The lexicographers

- They create them
- Methods
  - Introspection
  - Other dictionaries
  - Corpus
- Atkins, Hanks, Krishnamurthy

What is a word sense (1)

- SFIP: sufficiently frequent, insufficiently predictable
- (a glass of) whisky x (a glass of) tequila

What is a word sense (2)

homonymy

analogy polysemy rules

collocation

What is a word sense (3)

- A cluster of instances of use
- Operationalised as: corpus lines
- Clustered by lexicographers

- Makes sense of:
  - Overlapping senses
  - Different dictionaries, different senses
  - Lumping and splitting

I don’t believe in word senses

- Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy
- Philosophy: ontological commitment (same meaning, different register)
- “good entities to build belief systems on”

A word sense is a cluster of corpus lines

- But I’m an NLP person
- Automatic clustering?
- Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  - You can get semantic sense from corpora + stats

First attempt

- Longman 1994
- Abject failure
  - No grammar
  - Corpus too small and noisy
  - Naïve clustering
  - Useless programmer

Collocations

- Easy
  - Most words don’t go with most other words
- Then build on what we can do well
  - Metaphor, analogy, homonymy, rules: all much harder
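
The point that most words don't go with most other words is what makes collocation statistics workable: a few pairs stand out against a sparse background. The sketch below shows one way to score such pairs. It assumes logDice as the association measure (the slides do not say which statistic was used), and the function name `log_dice_scores` and the toy `pairs` list are hypothetical.

```python
import math
from collections import Counter


def log_dice_scores(pairs):
    """Score each (headword, collocate) pair with logDice.

    logDice = 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the pair
    frequency and f_x, f_y are the frequencies of the two words in the
    pair list. The theoretical maximum is 14.
    """
    pair_freq = Counter(pairs)
    head_freq = Counter(head for head, _ in pairs)
    coll_freq = Counter(coll for _, coll in pairs)
    return {
        (head, coll): 14 + math.log2(2 * f / (head_freq[head] + coll_freq[coll]))
        for (head, coll), f in pair_freq.items()
    }


# Toy co-occurrence data (an assumption for illustration, not real corpus counts).
pairs = (
    [("drink", "tea")] * 3
    + [("drink", "coffee")] * 2
    + [("read", "book")] * 3
    + [("drink", "book")]
)
scores = log_dice_scores(pairs)

# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On this toy data the rare pairing of "drink" with "book" scores lower than "drink" with "tea", which is the kind of contrast a collocation list relies on.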

Clustering

- Word sketch
  - Collocates organised by grammar
- Dictionary
  - Collocates (and other things) organised by meaning
- How to re-organise
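
To make "collocates organised by grammar" concrete, here is a minimal sketch of a word-sketch-like grouping. It assumes the corpus has already been parsed into (relation, headword, collocate) triples. The function name `word_sketch` and the toy triples are hypothetical, and this is not a description of how any particular tool builds its sketches.

```python
from collections import Counter, defaultdict


def word_sketch(triples, headword):
    """Group a headword's collocates by grammatical relation.

    Within each relation, collocates are listed most frequent first.
    """
    by_relation = defaultdict(Counter)
    for relation, head, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {
        relation: [collocate for collocate, _ in counts.most_common()]
        for relation, counts in by_relation.items()
    }


# Toy parsed triples (an assumption for illustration).
triples = [
    ("object", "drink", "tea"),
    ("object", "drink", "tea"),
    ("object", "drink", "coffee"),
    ("subject", "drink", "child"),
    ("object", "read", "book"),
]
sketch = word_sketch(triples, "drink")
print(sketch)
```

Re-organising by meaning would then regroup these same collocates under senses instead of relations, which is the harder step the slide asks about.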

Observation:

- corpus: arbitrary sample
- dictionary (= lexicon): systematic account
- Children
  - encounter arbitrary samples
  - develop systematic account

Corpus

- provisional, dispensable
- used to develop lexicon

Levels of abstraction

- Direct linkage:
  - Fragile
  - Updates (to C or D) break links
  - Dictionary: abstract
  - Corpus: raw
- Intermediate level needed

[Diagram: Corpus === Dictionary]

How most automatic word sense disambiguation (WSD) works

- Analyse dictionary to give set of collocates
- Match to collocates in a corpus
- Dispensable corpus

[Diagram: Corpus === Collocates === Dictionary]
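
The two steps above (derive collocates per sense from the dictionary, then match them against the corpus context) can be sketched as a simple overlap count, similar in spirit to Lesk-style methods. The sense labels, the collocate sets and the function name `disambiguate` are hypothetical and do not come from any real dictionary.

```python
def disambiguate(context_words, sense_collocates):
    """Pick the sense whose collocates overlap most with the context.

    Returns None when no sense shares any word with the context.
    """
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, collocates in sense_collocates.items():
        overlap = len(context & collocates)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical collocate sets, as if derived from dictionary entries.
sense_collocates = {
    "mouse/animal": {"cat", "cheese", "trap", "tail"},
    "mouse/device": {"click", "button", "keyboard", "screen"},
}

print(disambiguate("click the left button on the mouse".split(), sense_collocates))
print(disambiguate("the cat caught a mouse".split(), sense_collocates))
```

Real systems weight the evidence and cope with ties and sparse contexts. The sketch only shows why collocates can serve as the level between corpus and dictionary.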

Not just collocates

- triples
  - <object, drink (v), tea (n)>
  - parse the corpus
- some “unary relations”
  - <v+obj+ing, hear (v)>
  - I hear him singing
- domain-based clues
  - <domain=computing, mouse (n)>
- Collocates, Constructions, Domains = CoCoDo
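
One possible way to hold the three kinds of clue in a single record type is sketched below, using the three examples from the slide. The class name `CoCoDo`, its field names and the helper `group_by_kind` are assumptions made for illustration. The slides do not specify a storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoCoDo:
    kind: str                        # "collocate", "construction" or "domain"
    headword: str                    # lemma with part of speech, e.g. "drink (v)"
    label: str                       # grammatical relation, pattern, or domain name
    collocate: Optional[str] = None  # set only for collocate triples


# The three examples given on the slide.
examples = [
    CoCoDo("collocate", "drink (v)", "object", "tea (n)"),
    CoCoDo("construction", "hear (v)", "v+obj+ing"),
    CoCoDo("domain", "mouse (n)", "computing"),
]


def group_by_kind(items):
    """Collect CoCoDo records under their kind."""
    groups = {}
    for item in items:
        groups.setdefault(item.kind, []).append(item)
    return groups


for kind, items in group_by_kind(examples).items():
    print(kind, [item.headword for item in items])
```

Keeping the unary relations and domain clues in the same shape as the triples means later steps, such as linking clues to senses, can treat all three uniformly.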

Linking CoCoDos to senses

- Automatically extract CoCoDos from corpus
- How linked to senses?
  - Automatic (WSD techniques)
  - Manual
    - “dictionary-free”: ideal for new dictionaries
    - Labour costs
  - Mixed
    - WSD with manual confirmation/correction

[Diagram: Corpus === CoCoDo === Dictionary]

Semi-automatic dictionary drafting (SADD)

- CoCoDo database
- Automatic clustering
- Lexicographer input
- More clustering
- Dictionary with corpus inside

- Automatic clustering of collocates
  - Propose senses
- Iterate:
  - Lexicographer input
    - Confirm/reject/edit sense inventory
    - Assigns collocates / corpus lines to senses
  - WSD
    - Uses seeds to build full WSD for word
    - Find more collocates for each sense
- XML dictionary entry
  - Load into dictionary-editing tool
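
The "uses seeds" and "find more collocates" steps in the loop above can be sketched as a small bootstrapping routine. This is a toy illustration under stated assumptions, not the SADD implementation: corpus lines are reduced to sets of context words, the seeds stand in for the lexicographer's assignments, and the confirm/reject step between rounds is left out. The function name `bootstrap` and all data are hypothetical.

```python
def bootstrap(lines, seeds, max_rounds=10):
    """Grow per-sense collocate sets from seed collocates.

    lines: list of sets of context words, one set per corpus line.
    seeds: dict mapping each sense to its seed collocates.
    Returns (senses, labels), where labels maps line index to sense.
    """
    senses = {sense: set(colls) for sense, colls in seeds.items()}
    labels = {}
    for _ in range(max_rounds):
        # 1. Label every line that matches the collocates of exactly one sense.
        labels = {}
        for i, words in enumerate(lines):
            matched = [s for s, colls in senses.items() if words & colls]
            if len(matched) == 1:
                labels[i] = matched[0]
        # 2. Collect the words seen in the lines labelled with each sense.
        seen = {sense: set() for sense in senses}
        for i, sense in labels.items():
            seen[sense] |= lines[i]
        # 3. Adopt as new collocates the words seen with one sense only.
        changed = False
        for sense in senses:
            others = set().union(*(seen[o] for o in senses if o != sense))
            new = seen[sense] - others - senses[sense]
            if new:
                senses[sense] |= new
                changed = True
        if not changed:
            break
    return senses, labels


# Toy corpus lines for "mouse" (context words only) and lexicographer seeds.
lines = [
    {"cat", "chased"},
    {"click", "button"},
    {"chased", "garden"},
    {"button", "wireless"},
]
seeds = {"animal": {"cat"}, "device": {"click"}}
senses, labels = bootstrap(lines, seeds)
print(labels)
print(senses)
```

In the workflow on the slide, a lexicographer would review the proposed collocates before each new round. The sketch accepts them all, which is only safe on toy data.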

Fits with Atkins method for bilingual lexicography

- Analyse source language
  - From corpus
  - List all expressions that might possibly have a non-predictable translation
  - Very fine grained
  - Lots of collocations
  - target-language-neutral; re-usable
- Translate
- Edit to finalise dictionary