35
An Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated Meanings Juan Pino Maxine Eskenazi

An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

An Application of Latent Semantic Analysis to

Word Sense Discrimination for Words withRelated and Unrelated Meanings

Juan Pino Maxine Eskenazi

Page 2: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Outline

Motivation

Experiment

Results

Conclusion and Future Work

Page 3: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Outline

Motivation

Experiment

Results

Conclusion and Future Work

Page 4: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Framework

◮ REAP: tutor for ESL vocabulary learning

◮ Authentic documents from the Web

◮ Uses cloze questions (fill-in-the-blank)

◮ Target words in the readings

◮ Questions about the target words

Page 5: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated
Page 6: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

“Investment collapses, ushering in a recession.”

Page 7: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated
Page 8: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

“John saw the man collapse from utter exhaustion and lack offood.”

Page 9: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Document: “Investment collapses, ushering in a recession.”

Question: “John saw the man collapse from utter exhaustion andlack of food.”

Page 10: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Other possibilities

◮ The building could collapse if there is an earthquake.(different meaning)

Page 11: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Other possibilities

◮ The building could collapse if there is an earthquake.(different meaning)

◮ Social services for these very poor people will collapse if moremoney isn’t provided by the central government. (relatedmeaning)

Page 12: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Other possibilities

◮ The building could collapse if there is an earthquake.(different meaning)

◮ Social services for these very poor people will collapse if moremoney isn’t provided by the central government. (relatedmeaning)

◮ The economy will collapse completely if this fightingcontinues. (same meaning)

Page 13: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Goal

Control how the meaning of a target word in a question relates tothe meaning of this target word in a document

Page 14: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Goal

Control how the meaning of a target word in a question relates tothe meaning of this target word in a document

Make the meanings match

Page 15: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

◮ Given a document d containing the word w with meaning m

◮ Given n questions q1, ..., qn about w with meanings m1, ...,mn

◮ Find the questions qi such that mi = m

Page 16: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

◮ Given a document d containing the word w with meaning m

◮ Given n questions q1, ..., qn about w with meanings m1, ...,mn

◮ Find the questions qi such that mi = m

◮ Simpler: find at least one question qi such that mi = m

Page 17: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Algorithms

◮ Lesk

◮ Latent Semantic Analysis (LSA)

◮ Suitable for discrimination rather than disambiguation

Page 18: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Lesk (one variant)

◮ Word w in sentence s.

◮ w has definitions d1, ..., dn in a dictionary.

◮ Select context c for w (e.g. s or words around w).

◮ Select the definition that has the most words in common withc.

Page 19: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Lesk for this task

◮ Word w in document d.

◮ w has questions q1, ..., qn.

◮ Select contexts cd and cq for w in d and in q.

◮ Select the question that has the most words in common withc.

Page 20: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Latent Semantic Analysis (LSA)

◮ Challenge: the questions are a short context

◮ LSA: initially LSI, used to match (short) queries to documents

Page 21: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

LSA Description

◮ X : term-document matrix (e.g. Xi ,j = 1 if word i appears indocument j)

◮ X = U · S · V T (SVD theorem)

◮ UT· U = Im and V T

· V = In

◮ U is truncated, replaced by Ur , V replaced by Vr , S replacedby Sr

◮ X ≃ Ur · Sr · VTr

◮ Similarity between d and q: compute cosine similarity betweenS−1

r · UTr · d and S−1

r · UTr · q

Page 22: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Motivation

◮ Cope with sparsity

◮ Remove noise

Page 23: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Motivation

◮ Cope with sparsity◮ Compute similarity in another space

◮ Remove noise◮ truncated SVD

Page 24: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Experimental Goal

◮ Compare LSA performance for words with related andunrelated meanings to a random baseline and Lesk

◮ Example (related meaning): comprise◮ Example (unrelated meaning): bank

Page 25: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Outline

Motivation

Experiment

Results

Conclusion and Future Work

Page 26: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Setup

◮ 62 manually generated cloze questions

◮ 16 target words (e.g. bond, collapse, compound, comprise,etc.)

◮ Senses manually annotated (WordNet)

◮ Web documents containing target word/sense gathered

Page 27: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Factors Studied

◮ Words with related and unrelated meanings vs. with unrelatedmeanings only

◮ Truncation parameter

◮ Context selection

Page 28: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Outline

Motivation

Experiment

Results

Conclusion and Future Work

Page 29: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Baseline

◮ Random Baseline: 44%

◮ Lesk Algorithm: 61%

◮ Lesk preprocessing◮ lower casing◮ stemming◮ stop words and punctuation discarded◮ window: 10 words around the target in the question◮ window: 2 sentences around the target in the document

Page 30: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

LSA

0

0.2

0.4

0.6

0.8

1

60 80 100 120 140 160 180

Acc

urac

y

Truncation Parameter

Accuracy vs. Dimension Reduction for Related Meanings with Different Contexts

whole documentselected context

Lesk baseline

◮ words with related and unrelated meanings

◮ LSA better only with certain truncation parameter values

Page 31: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

LSA

0

0.2

0.4

0.6

0.8

1

5 10 15 20 25 30 35

Acc

urac

y

Truncation Parameter

Accuracy vs. Dimension Reduction for Unrelated Meanings with Different Contexts

whole documentselected context

Lesk baseline

◮ words with unrelated meanings

◮ context selection plays important role

Page 32: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Outline

Motivation

Experiment

Results

Conclusion and Future Work

Page 33: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Conclusion and Future Work

◮ Comparison performance of Lesk, LSA and a random baseline

◮ Factors: truncation parameter, context selection and type(unrelated/related vs. unrelated)

◮ Future work: Comparison of LSA with other second-orderrepresentations of documents and questions

◮ Future work: use WSD to rank definitions

Page 34: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated
Page 35: An Application of Latent Semantic Analysis to Word Sense ...tetreaul/06_juan.pdfAn Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated

Thank you. Questions?

Words and questions with annotated senses available atwww.cs.cmu.edu/∼jmpino/questions.xls