Latent Semantic Indexing Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001

Page 1

Latent Semantic Indexing

Introduction to Artificial Intelligence

COS302

Michael L. Littman

Fall 2001

Page 2

Administration

Example analogies…

Page 3

And-or Proof

out(x) = g(sum_k w_k x_k)

w1=10, w2=10, w3=-10 for x1 + x2 + ~x3

Sum for 110?

Sum for 001?

Generally? For b=110: 20 - 10 sum_i |b_i - x_i|

What happens if we set

w0 = 10?

w0 = -15?
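
A quick numeric check of these sums (a minimal sketch: the weights come from the slide, w0 is treated as an additive bias in the usual way, and g is taken to be a sigmoid):

```python
# Minimal sketch of the and-or unit: out(x) = g(w0 + sum_k w_k x_k),
# with w1=10, w2=10, w3=-10 encoding x1 + x2 + ~x3.
import math

def g(s):
    return 1.0 / (1.0 + math.exp(-s))   # sigmoid squashing function

w = [10, 10, -10]

def weighted_sum(x, w0=0):
    return w0 + sum(wk * xk for wk, xk in zip(w, x))

print(weighted_sum([1, 1, 0]))   # input 110 -> 20
print(weighted_sum([0, 0, 1]))   # input 001 -> -10

# General form with b = 110: sum = 20 - 10 * sum_i |b_i - x_i|
b, x = [1, 1, 0], [0, 0, 1]
print(20 - 10 * sum(abs(bi - xi) for bi, xi in zip(b, x)))   # also -10

# With w0 = -15 only the pattern 110 yields a positive sum (AND-like);
# with w0 = 10 every pattern except 001 yields a positive sum (001 lands at 0).
print(g(weighted_sum([1, 1, 0], w0=-15)), g(weighted_sum([1, 1, 1], w0=-15)))
```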

Page 4

LSI Background Reading

Landauer, Laham, Foltz (1998). Learning human-like knowledge by Singular Value Decomposition: A Progress Report. Advances in Neural Information Processing Systems 10 (pp. 44-51).

http://lsa.colorado.edu/papers/nips.ps

Page 5

Outline

Linear nets, autoassociation

LSI: Cross between IR and NNs

Page 6

Purely Linear Network

[Diagram: inputs x1, x2, x3, …, xD feed hidden units h1, h2, …, hk through weights W (nxk); the hidden units feed a single output out through weights U (kx1).]

Page 7

What Does It Do?

out(x) = sum_j (sum_i x_i W_ij) U_j
       = sum_i x_i (sum_j W_ij U_j)

[Diagram: inputs x1, x2, x3, …, xD connect directly to out through weights W' (nx1), where W'_i = sum_j W_ij U_j.]
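
A small numerical check of this collapse (a sketch; the sizes D=5, k=3 and the random matrices are made up for illustration):

```python
# Sketch: a two-layer purely linear net computes the same function as a single
# layer whose weights are W' = W @ U (sizes made up for illustration).
import numpy as np

rng = np.random.default_rng(0)
D, k = 5, 3
W = rng.normal(size=(D, k))    # input-to-hidden weights
U = rng.normal(size=(k, 1))    # hidden-to-output weights
x = rng.normal(size=(1, D))    # one input example

two_layer = x @ W @ U          # out(x) = sum_j (sum_i x_i W_ij) U_j
one_layer = x @ (W @ U)        # W'_i = sum_j W_ij U_j
print(np.allclose(two_layer, one_layer))   # True: the hidden layer adds nothing
```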

Page 8

Can Other Layers Help?

[Diagram: inputs x1 x2 x3 x4 feed hidden units h1 h2 through weights U (nxk); the hidden units feed outputs out1 out2 out3 out4 through weights V (kxn).]

Page 9

Autoassociator

x1 x2 x3 x4    h1 h2    y1 y2 y3 y4
 1  0  0  0     0  0     1  0  0  0
 0  1  0  0     0  1     0  1  0  0
 0  0  1  0     1  1     0  0  1  0
 0  0  0  1     1  0     0  0  0  1

Page 10

Applications

Autoassociators have been used for data compression, feature discovery, and many other tasks.

The U matrix encodes the inputs into k features.

How train?

Page 11

SVD

Singular value decomposition provides another method, from linear algebra.

Training data M is nxm (input features by examples).

M = U Σ V^T

U^T U = I, V^T V = I, Σ diagonal
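
A quick NumPy check of these properties (a sketch; the matrix and its size are made up):

```python
# Sketch: SVD of a small made-up data matrix M (n features x m examples).
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4
M = rng.normal(size=(n, m))

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # M = U @ diag(s) @ Vt
print(np.allclose(M, U @ np.diag(s) @ Vt))         # factorization reproduces M
print(np.allclose(U.T @ U, np.eye(U.shape[1])))    # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))) # V^T V = I
print(s)                                           # the diagonal of Sigma, nonnegative
```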

Page 12

Dimension Reduction

Finds least squares best U (nxk, free k).

Rows of U map input features to encoded features (instance is sum).

Closely related to:
• symm. eigenvalue decomposition
• factor analysis
• principal component analysis

Subroutine in many math packages.
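
A sketch of the reduction itself (truncating to a made-up k=2), illustrating that the kept components give the least-squares best rank-k approximation:

```python
# Sketch: keep only the top-k singular components of a made-up matrix M (k=2).
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 4))
k = 2

U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation of M

# The squared error equals the sum of the discarded squared singular values.
err = np.linalg.norm(M - M_k, "fro") ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))       # True

print(U[:, :k].shape)   # (n, k): one k-dimensional encoding per input feature
```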

Page 13

SVD Applications

Eigenfaces

Handwriting recognition

Text applications…

Page 14

LSI/LSA

Latent semantic indexing is the application of SVD to IR.

Latent semantic analysis is the more general term.

Features are words, examples are text passages.

Latent: Not visible on the surface

Semantic: Word meanings

Page 15

Running LSI

Learns new word representations!

Trained on:
• 20,000-60,000 words
• 1,000-70,000 passages

Use k=100-350 hidden units.

Similarity between vectors computed as cosine.
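
A minimal cosine-similarity helper of the kind used here (the two example vectors are made up):

```python
# Sketch: cosine similarity between two word/passage vectors (vectors made up).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v_exam = np.array([0.3, 1.2, -0.4])
v_test = np.array([0.4, 1.0, -0.2])
print(cosine(v_exam, v_test))   # near 1 for words that occur in similar contexts
```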

Page 16

Step by Step

1. M_ij: rows are words, columns are passages, filled w/ counts.
2. Transformation of matrix:
   log(M_ij + 1)
   -sum_j (M_ij / sum_j M_ij) log(M_ij / sum_j M_ij)
3. SVD computed: M = U Σ V^T
4. Best k components of rows of U kept as word representations.
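
A compact sketch of the pipeline on a toy corpus (the passages and k=2 are made up; the weighting shown is the common log-entropy variant, one way of combining the two formulas above — other implementations divide log(count+1) by the raw entropy instead):

```python
# Sketch of LSI: counts -> log/entropy weighting -> SVD -> top-k rows of U as word vectors.
# Toy corpus and k=2 are made up; real runs use tens of thousands of words and passages.
import numpy as np

passages = ["the cat sat on the mat", "the dog sat on the rug", "stocks fell on monday"]
vocab = sorted({w for p in passages for w in p.split()})
M = np.array([[p.split().count(w) for p in passages] for w in vocab], dtype=float)

# Step 2: transform the counts.  p_ij = M_ij / sum_j M_ij,  H_i = -sum_j p_ij log p_ij
P = M / M.sum(axis=1, keepdims=True)
H = -np.sum(P * np.log(np.where(P > 0, P, 1.0)), axis=1, keepdims=True)
weight = 1.0 - H / np.log(M.shape[1])      # informative (low-entropy) words weigh more
X = weight * np.log(M + 1.0)

# Steps 3-4: SVD, keep the best k components of the rows of U as word representations.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = dict(zip(vocab, U[:, :k]))
print(word_vectors["cat"], word_vectors["dog"])
```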

Page 17

Geometric View

Words embedded in high-d space.

[Diagram: exam and test lie close together (similarity 0.42), while fish is nearly orthogonal to both (the other similarities are 0.02 and 0.01).]

Page 18

Comparison to VSM

A: The feline climbed upon the roof
B: A cat leapt onto a house
C: The final will be on a Thursday

How similar?
• Vector space model: sim(A,B)=0
• LSI: sim(A,B)=.49 > sim(A,C)=.45

Non-zero sim with no words in common, by overlap in reduced representation.
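
A quick check of the vector-space-model half of that comparison (a sketch; plain whitespace tokenization, no stop-word removal):

```python
# Sketch: bag-of-words cosine; A and B share no words, so the VSM gives sim(A, B) = 0.
import numpy as np

def bow_cosine(s1, s2):
    vocab = sorted(set(s1.split()) | set(s2.split()))
    v1 = np.array([s1.split().count(w) for w in vocab], dtype=float)
    v2 = np.array([s2.split().count(w) for w in vocab], dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

A = "the feline climbed upon the roof"
B = "a cat leapt onto a house"
C = "the final will be on a thursday"
print(bow_cosine(A, B))   # 0.0 -- no words in common
print(bow_cosine(A, C))   # > 0 -- shares "the"
```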

Page 19

What Does LSI Do?

Let's send it to school…

Page 20

Plato's Problem

7th grader learns 10-15 new words today, fewer than 1 by direct instruction. Perhaps 3 were even encountered. How can this be?

Plato: You already knew them.

LSA: Many weak relationships combined (data to back it up!)

Rate comparable to students.

Page 21

Vocabulary

TOEFL synonym test: choose alternative with highest similarity score.

LSA correct on 64% of 80 items. Matches avg applicant to US college.

Mistakes correlate w/ people (r=.44).

Best solo measure of intelligence.
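
A sketch of the synonym-test procedure (the item and its word vectors are made up; in LSA the vectors would be rows of the reduced U matrix):

```python
# Sketch: answer a synonym item by picking the alternative most similar to the stem.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

stem = np.array([0.9, 0.1, 0.2])          # vector for the stem word (made up)
alternatives = {                          # vectors for the four choices (made up)
    "choice_a": np.array([0.8, 0.2, 0.1]),
    "choice_b": np.array([-0.3, 0.9, 0.0]),
    "choice_c": np.array([0.1, -0.5, 0.8]),
    "choice_d": np.array([0.0, 0.4, -0.6]),
}
answer = max(alternatives, key=lambda w: cosine(stem, alternatives[w]))
print(answer)   # the alternative with the highest similarity score
```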

Page 22

Multiple Choice Exam

Trained on psych textbook.

Given same test as students.

LSA scored 60%: lower than average, but passes.

Has trouble with "hard" ones.

Page 23

Essay Test

LSA can't write. If you can't do, judge.

Students write essays, LSA trained on related text.

Compare similarity and length with graded essays (labeled).

Cosine-weighted average of top 10. Regression to combine sim and len.

Correlation: .64-.84. Better than human. Bag of words!?
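
A sketch of that grading scheme (all vectors, grades, and regression coefficients below are made up): the new essay's content score is the cosine-weighted average of the grades of the 10 most similar graded essays, and a regression combines that score with essay length.

```python
# Sketch: grade an essay from the 10 most similar graded essays (all data made up).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(3)
graded_vecs = rng.normal(size=(50, 20))     # LSA vectors of 50 graded essays (made up)
graded_scores = rng.uniform(1, 6, size=50)  # their human grades (made up)

def sim_score(essay_vec, top=10):
    sims = np.array([cosine(essay_vec, v) for v in graded_vecs])
    idx = np.argsort(sims)[-top:]           # the 10 most similar graded essays
    w = np.clip(sims[idx], 0.0, None)       # cosine weights
    return float(w @ graded_scores[idx] / (w.sum() + 1e-12))

# A regression, fit beforehand on graded essays, combines similarity and length.
a_sim, a_len, b = 0.8, 0.002, 0.5           # hypothetical regression coefficients
essay_vec, essay_len = rng.normal(size=20), 350
print(a_sim * sim_score(essay_vec) + a_len * essay_len + b)
```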

Page 24

Digit Representations

Look at similarities of all pairs from one to nine.

Look at best fit of these similarities in one dimension: they come out in order!

Similar experiments with cities in Europe in two dimensions.
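
One way to compute such a one-dimensional fit is classical multidimensional scaling; a sketch below, where the similarity values are synthetic stand-ins (the slide's point is that LSA's real similarities already carry this ordering):

```python
# Sketch: fit points on a line from a pairwise similarity matrix (classical MDS).
# The similarities here are synthetic, not LSA's actual values for one..nine.
import numpy as np

words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
n = len(words)
sim = np.array([[np.exp(-abs(i - j) / 3.0) for j in range(n)] for i in range(n)])

D2 = (1.0 - sim) ** 2                         # squared dissimilarities
J = np.eye(n) - np.ones((n, n)) / n           # double-centering matrix
B = -0.5 * J @ D2 @ J
vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
coord = vecs[:, -1] * np.sqrt(vals[-1])       # best one-dimensional coordinates

print([words[i] for i in np.argsort(coord)])  # comes out in (possibly reversed) order
```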

Page 25

Word Sense

The chemistry student knew this was not a good time to forget how to calculate volume and mass.

heavy? .21

church? .14

LSI picks best (p<.001).

Page 26

More Tests

• Antonyms just as similar as syns. (Cluster analysis separates.)

• LSA correlates .50 with children and .32 with adults on word sorting (misses grammatical classification).

• Priming, conjunction error: similarity correlates with strength of effect.

Page 27

Conjunction Error

Linda is a young woman who is single, outspoken…deeply concerned with issues of discrimination and social justice.

Is Linda a feminist bank teller?

Is Linda a bank teller?

80% rank the former higher. Can't be!
Pr(f, bt | Linda) = Pr(bt | Linda) Pr(f | Linda, bt) ≤ Pr(bt | Linda)

Page 28

LSApplications

1. Improve IR.
2. Cross-language IR. Train on parallel collection.
3. Measure text coherency.
4. Use essays to pick educational text.
5. Grade essays.

Demos at http://LSA.colorado.edu

Page 29

Analogies

Compare difference vectors: geometric instantiation of relationship.

[Diagram: difference vectors dog→bark and cow→moo, with similarity values 0.34 and 0.70 shown in the figure.]
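
A sketch of the difference-vector comparison (all word vectors made up; in LSA they would be rows of the reduced U matrix):

```python
# Sketch: an analogy holds when the difference vectors point the same way,
# e.g. bark - dog compared with moo - cow (vectors made up).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog  = np.array([0.8, 0.1, 0.3])
bark = np.array([0.7, 0.6, 0.2])
cow  = np.array([0.9, 0.0, 0.5])
moo  = np.array([0.8, 0.5, 0.4])

print(cosine(bark - dog, moo - cow))   # high when the relationship is analogous
```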

Page 30

LSA Motto? (AT&T Cafeteria)

syntax sucks

Page 31

What to Learn

Single-output multiple-layer linear nets compute the same as single-output single-layer linear nets.

Autoassociation finds encodings.

LSI is the application of this idea to text.

Page 32

Homework 10 (due 12/12)

1. Describe a procedure for converting a Boolean formula in CNF (n variables, m clauses) into an equivalent backprop network. How many hidden units does it have?

2. A key issue in LSI is picking "k", the number of dimensions. Let's say we had a set of 10,000 passages. Explain how we could combine the idea of cross validation and autoassociation to select a good value for k.