Splitting Words a.k.a. “Word Sense Disambiguation”
600.465 - Intro to NLP - J. Eisner


Page 1:

Splitting Words

a.k.a. “Word Sense Disambiguation”

Pages 2-8: [image-only slides, courtesy of D. Yarowsky; no text was extracted]

Page 9:

Representing Word as Vector

Could average over many occurrences of the word ...

- Each word type has a different vector?
- Each word token has a different vector?
- Each word sense has a different vector?
  (for this one, we need sense-tagged training data)
  (is this more like a type vector or a token vector?)

What is each of these good for?

Page 10:

Each word type has a different vector

We saw this yesterday. It’s good for grouping words:

similar semantics? similar syntax?
(depends on how you build the vector)
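Below is a minimal sketch of how such a type vector might be built: accumulate co-occurrence counts over every token of the word in a corpus, so all occurrences contribute to one vector per type. The toy corpus and the window size are assumptions for illustration.

```python
# A minimal sketch: one vector per word *type*, built by counting
# co-occurring words in a fixed window over the whole corpus.
from collections import Counter

def type_vector(target, sentences, window=3):
    """Sum context counts over every token of `target` in the corpus."""
    vec = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == target:
                lo, hi = max(0, i - window), i + window + 1
                for j in range(lo, hi):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

# Toy corpus (invented for illustration).
sentences = [["the", "plant", "produces", "steel"],
             ["the", "plant", "needs", "water", "and", "light"]]
print(type_vector("plant", sentences))
```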

Page 11:

Each word token has a different vector

Good for splitting words - unsupervised WSD

Cluster the tokens: each cluster is a sense!

have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going

in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
number-crunchers within the Labour party, there now seems little doubt

that had been passed to a second party who made a financial decision
A future obliges each party to the contract to fulfil it by
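A minimal sketch of this idea, assuming scikit-learn is available: each token of party gets a bag-of-context-words vector, and k-means with k=2 splits the tokens into two putative senses. The toy contexts loosely paraphrase the concordance lines above.

```python
# A minimal sketch of unsupervised WSD by clustering *token* vectors.
import numpy as np
from sklearn.cluster import KMeans

# One context per token of "party" (invented; tokens 0-1 are the
# celebration sense, tokens 2-3 the political sense).
contexts = [
    "hot dinner topic comedy selection".split(),
    "street dinner drinks music".split(),
    "general election labour seats".split(),
    "scottish national election doubt".split(),
]
vocab = sorted({w for c in contexts for w in c})
idx = {w: i for i, w in enumerate(vocab)}

# Bag-of-words count vector for each token.
X = np.zeros((len(contexts), len(vocab)))
for t, ctx in enumerate(contexts):
    for w in ctx:
        X[t, idx[w]] += 1

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # tokens with the same label are guessed to share a sense
```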

Page 12:

Each word sense has a different vector

Represent each new word token as a vector, too. Now assign each token the closest sense.

(could lump together all tokens of the word in the same document: assume they all have same sense)

have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going

let you know that there’s a party at my house tonight. Directions: Drive

in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
number-crunchers within the Labour party, there now seems little doubt

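A minimal sketch of the assignment step, with tiny made-up sense vectors: represent the new token as a context-count vector and pick the sense vector that is closest, here using cosine similarity as the (assumed) measure of closeness.

```python
# A minimal sketch: assign a new token to its closest sense vector.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Vocabulary order: [dinner, election, street, labour] (invented).
sense_vectors = {
    "partyA (celebration)": np.array([5.0, 0.0, 3.0, 0.0]),
    "partyB (political)":   np.array([0.0, 4.0, 0.0, 6.0]),
}
# "... a street party at my house tonight ..." as a context vector.
token = np.array([1.0, 0.0, 1.0, 0.0])

best = max(sense_vectors, key=lambda s: cosine(token, sense_vectors[s]))
print(best)   # -> partyA (celebration)
```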

Page 13:

Where can we get sense-labeled training data?

To do supervised WSD, need many examples of each sense in context

have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going

let you know that there’s a party at my house tonight. Directions: Drive

in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
number-crunchers within the Labour party, there now seems little doubt


Page 14:

Where can we get sense-labeled training data?

Sources of sense-labeled training text:
- Human-annotated text - expensive
- Bilingual text (Rosetta stone) - can figure out which sense of “plant” is meant by how it translates
- Dictionary definition of a sense is one sample context
- Roget’s thesaurus entry of a sense is one sample context
  (hardly any data per sense - but we’ll use it later to get unsupervised training started)

To do supervised WSD, need many examples of each sense in context

Page 15:

A problem with the vector model

Bad idea to treat all context positions equally:

Possible solutions:
- Faraway words don’t count as strongly?
- Words in different positions relative to plant are different elements of the vector?
  (i.e., (pesticide, -1) and (pesticide, +1) are different features - see the sketch below)
- Words in different syntactic relationships to plant are different elements of the vector?
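A minimal sketch of the position-sensitive variant: pair each context word with its offset from the target, so that (pesticide, -1) and (pesticide, +1) become distinct features. The example sentence is invented.

```python
# A minimal sketch: position-sensitive features instead of a plain
# bag of context words.
def positional_features(sent, i, window=2):
    """Return (word, offset) pairs around position i."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(sent):
            feats.append((sent[j], offset))
    return feats

sent = ["spray", "pesticide", "on", "the", "plant", "leaves"]
print(positional_features(sent, sent.index("plant")))
# [('on', -2), ('the', -1), ('leaves', 1)]
```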

Page 16:

Just one cue is sometimes enough ...

slide courtesy of D. Yarowsky (modified)

Page 17:

An assortment of possible cues ...

slide courtesy of D. Yarowsky (modified)

generates a whole bunch of potential cues - use data to find out which ones work best

Page 18:

An assortment of possible cues ...

slide courtesy of D. Yarowsky (modified)

merged ranking of all cues of all these types

only a weak cue ... but we’ll trust it if there’s nothing better

Page 19:

Final decision list for lead (abbreviated)

slide courtesy of D. Yarowsky (modified)

To disambiguate a token of lead:
- Scan down the sorted list
- The first cue that is found gets to make the decision all by itself
- Not as subtle as combining cues, but works well for WSD

Cue’s score is its log-likelihood ratio: log [ p(cue | sense A) / p(cue | sense B) ] (smoothed)
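A minimal sketch of a decision list along these lines. Scores are add-alpha-smoothed log-likelihood ratios, and the cue counts are invented for illustration; they are not the actual lead statistics from the table.

```python
# A minimal sketch of a decision list: rank cues by |smoothed LLR|;
# at test time the highest-ranked cue that fires decides by itself.
import math

def llr(count_a, count_b, total_a, total_b, alpha=0.1):
    """Smoothed log-likelihood ratio of a cue for sense A vs. sense B."""
    pa = (count_a + alpha) / (total_a + alpha)
    pb = (count_b + alpha) / (total_b + alpha)
    return math.log(pa / pb)

# cue -> (count in sense-A contexts, count in sense-B contexts); invented.
cue_counts = {"narrow vein": (40, 0), "metal": (25, 1), "follow": (2, 8)}
total_a, total_b = 100, 100

scored = []
for cue, (a, b) in cue_counts.items():
    s = llr(a, b, total_a, total_b)
    scored.append((abs(s), "A" if s > 0 else "B", cue))
decision_list = sorted(scored, reverse=True)   # strongest cue first

def classify(context_cues):
    for score, sense, cue in decision_list:
        if cue in context_cues:   # first matching cue decides alone
            return sense
    return "A"                    # back off to a default sense

print(classify({"metal", "follow"}))   # "metal" outranks "follow" -> A
```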

Page 20:

slide courtesy of D. Yarowsky (modified)

very readable paper at http://cs.jhu.edu/~yarowsky/acl95.ps, sketched on the following slides ...

unsupervised learning!

Page 21:

First, find a pair of “seed words” that correlate well with the 2 senses

unsupervised learning!

If “plant” really has 2 senses, it should appear in:
- 2 dictionary entries: pick content words from those
- 2 thesaurus entries: pick synonyms from those
- 2 different clusters of documents: pick representative words from those
- 2 translations in parallel text: use the translations as seed words

Or just have a human name the seed words (maybe from a list of words that occur unusually often near “plant”)

make that “minimally supervised”

Page 22:

target word: plant

table taken from Yarowsky (1995)

Yarowsky’s bootstrapping algorithm

(life, manufacturing)

[diagram: seed labeling - life (1%) and manufacturing (1%); the remaining 98% of tokens start unlabeled]

unsupervised learning!

Page 23:

figure taken from Yarowsky (1995)

Yarowsky’s bootstrapping algorithm

(life, manufacturing)

Learn a classifier that distinguishes A from B. It will notice features like “animal” → A, “automate” → B.

unsupervised learning!

Page 24:

no surprise what the top cues are, but other cues are also good for discriminating these seed examples

table taken from Yarowsky (1995)

Yarowsky’s bootstrapping algorithm

(life, manufacturing)

Learn a classifier that distinguishes A from B. It will notice features like “animal” → A, “automate” → B.

unsupervised learning!

Page 25:

slide courtesy of D. Yarowsky (modified)

the strongest of the new cues help us classify more examples ...

from which we can extract and rank even more cues that discriminate them ...

unsupervised learning!

(life, manufacturing)
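A minimal sketch of the bootstrapping loop, under heavy simplifying assumptions: each example is a set of context words, cues are single words scored by the same smoothed log-likelihood ratio as in the decision-list sketch above, and an example is (re)labeled only when its strongest cue clears a confidence threshold. The examples, seeds, and thresholds are invented for illustration.

```python
# A minimal sketch of Yarowsky-style bootstrapping.
import math
from collections import Counter

def bootstrap(examples, seed_a, seed_b, rounds=5, threshold=1.5, alpha=0.1):
    # 1. Label only the examples containing a seed word.
    labels = {}
    for i, ctx in enumerate(examples):
        if seed_a in ctx:
            labels[i] = "A"
        elif seed_b in ctx:
            labels[i] = "B"

    for _ in range(rounds):
        # 2. Count cue occurrences in the currently labeled examples.
        counts = {"A": Counter(), "B": Counter()}
        sizes = {"A": 0, "B": 0}
        for i, sense in labels.items():
            counts[sense].update(examples[i])
            sizes[sense] += 1

        def score(w):
            # Smoothed log-likelihood ratio of cue w for sense A vs. B.
            pa = (counts["A"][w] + alpha) / (sizes["A"] + alpha)
            pb = (counts["B"][w] + alpha) / (sizes["B"] + alpha)
            return math.log(pa / pb)

        # 3. (Re)label any example whose strongest cue is confident enough.
        for i, ctx in enumerate(examples):
            best = max(ctx, key=lambda w: abs(score(w)))
            if abs(score(best)) >= threshold:
                labels[i] = "A" if score(best) > 0 else "B"
    return labels

examples = [{"life", "animal", "cell"},
            {"manufacturing", "automate", "equipment"},
            {"animal", "species"},
            {"equipment", "factory"}]
print(bootstrap(examples, "life", "manufacturing"))
# expected: examples 0 and 2 -> "A", examples 1 and 3 -> "B"
```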

Page 26:

figure taken from Yarowsky (1995)

(life, manufacturing)

Should be a good classifier, unless we accidentally learned some bad cues along the way that polluted the original sense distinction.

unsupervised learning!

Page 27:

unsupervised learning!

life and manufacturing are no longer even in the top cues! Many unexpected cues were extracted, without supervised training.

slide courtesy of D. Yarowsky (modified)

Now use the final decision list to classify test examples:

top-ranked cue appearing in this test example

Page 28:

“One sense per discourse”

A final trick: All tokens of plant in the same document probably have the same sense.

slide courtesy of D. Yarowsky (modified)

unsupervised learning!

3 tokens in same document gang up on the 4th
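A minimal sketch of this trick: after per-token classification, let the majority sense within each document override the minority tokens. The (document id, predicted sense) input format is an assumption for illustration.

```python
# A minimal sketch of "one sense per discourse": majority vote per document.
from collections import Counter, defaultdict

def one_sense_per_discourse(predictions):
    """predictions: list of (document_id, predicted_sense) pairs."""
    by_doc = defaultdict(list)
    for doc, sense in predictions:
        by_doc[doc].append(sense)
    # Majority sense in each document wins.
    majority = {doc: Counter(senses).most_common(1)[0][0]
                for doc, senses in by_doc.items()}
    return [(doc, majority[doc]) for doc, _ in predictions]

preds = [("doc1", "A"), ("doc1", "A"), ("doc1", "A"), ("doc1", "B")]
print(one_sense_per_discourse(preds))  # the 3 A-tokens gang up on the B
```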

Page 29:

“One sense per discourse”

A final trick: All tokens of plant in the same document probably have the same sense.

slide courtesy of D. Yarowsky (modified)

unsupervised learning!

Page 30:

A Note on Combining Cues

slide courtesy of D. Yarowsky (modified)

these stats come from term papers of known authorship (i.e., supervised training)

Page 31:

A Note on Combining Cues

slide courtesy of D. Yarowsky (modified)

“Naive Bayes” model for classifying text

(Note the naive independence assumptions!)

We’ll look at it again in a later lecture

Would this kind of sentence be more typical of a student A paper or a student B paper?


Page 32:

“Naive Bayes” model for classifying text

Used here for word sense disambiguation

A Note on Combining Cues

slide courtesy of D. Yarowsky (modified)

[figure: tokens of plant in context, each labeled plantA or plantB]

Would this kind of sentence be more typical of a plant A context or a plant B context?

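A minimal sketch of the Naive Bayes decision rule for this task: choose the sense maximizing log p(sense) + Σ log p(word | sense), treating each context word as conditionally independent given the sense (the naive assumption noted above). All probabilities below are invented for illustration.

```python
# A minimal sketch of Naive Bayes for WSD, in log space.
import math

log_prior = {"plantA": math.log(0.5), "plantB": math.log(0.5)}
log_likelihood = {
    "plantA": {"water": math.log(0.10), "grow": math.log(0.08),
               "equipment": math.log(0.01)},
    "plantB": {"water": math.log(0.02), "grow": math.log(0.01),
               "equipment": math.log(0.09)},
}
UNK = math.log(1e-4)   # smoothed probability for unseen words

def classify(context):
    def score(sense):
        # log p(sense) + sum of log p(word | sense), naive independence
        return log_prior[sense] + sum(
            log_likelihood[sense].get(w, UNK) for w in context)
    return max(log_prior, key=score)

print(classify(["water", "grow"]))       # -> plantA
print(classify(["equipment", "water"]))  # -> plantB
```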