
Page 1

Lecture 6: Vector Space Model

Kai-Wei Chang, CS @ University of Virginia

[email protected]

Course webpage: http://kwchang.net/teaching/NLP16


Page 2

This lecture

• How to represent a word, a sentence, or a document?

• How to infer the relationships among words?

• We focus on “semantics”: distributional semantics

• What is the meaning of “life”?


Page 3


Page 4

How to represent a word

• Naïve way: represent words as atomic symbols: student, talk, university
  • As in n-gram language models and logical analysis

• Represent a word as a “one-hot” vector: [ 0 0 0 1 0 … 0 ]

• How large is this vector?
  • PTB data: ~50k words; Google 1T data: 13M words

• 𝑣 ⋅ 𝑢 = ?


(Vocabulary dimensions: egg, student, talk, university, happy, buy)

Page 5

Issues?

• Dimensionality is large; the vector is sparse
• No notion of similarity between words

• Cannot represent new words
• Any ideas?


𝑣_happy = [0 0 0 1 0 … 0], 𝑣_sad = [0 0 1 0 0 … 0], 𝑣_milk = [1 0 0 0 0 … 0]

𝑣_happy ⋅ 𝑣_sad = 𝑣_happy ⋅ 𝑣_milk = 0
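A minimal NumPy sketch (not from the slides; the toy vocabulary is made up) makes the problem concrete:

import numpy as np

# Toy vocabulary; each word gets its own dimension.
vocab = ["milk", "egg", "sad", "happy", "buy"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# The dot product between any two distinct one-hot vectors is 0,
# so "happy" is exactly as (dis)similar to "sad" as it is to "milk".
print(one_hot("happy") @ one_hot("sad"))    # 0.0
print(one_hot("happy") @ one_hot("milk"))   # 0.0
print(one_hot("happy") @ one_hot("happy"))  # 1.0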

Page 6

Idea 1: Taxonomy (Word category)


Page 7

What is “car”?

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]


>>> motorcar = wn.synset('car.n.01')
>>> motorcar.hypernyms()
[Synset('motor_vehicle.n.01')]
>>> paths = motorcar.hypernym_paths()
>>> [synset.name() for synset in paths[0]]

['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
 'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01',
 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']

>>> [synset.name() for synset in paths[1]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
 'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01',
 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']

Page 8

Word similarity?


>>> right = wn.synset('right_whale.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')

>>> right.lowest_common_hypernyms(minke)
[Synset('baleen_whale.n.01')]
>>> right.lowest_common_hypernyms(orca)
[Synset('whale.n.02')]
>>> right.lowest_common_hypernyms(tortoise)
[Synset('vertebrate.n.01')]
>>> right.lowest_common_hypernyms(novel)
[Synset('entity.n.01')]

Requires human labor

Page 9

Taxonomy (Word category)

• Synonym, hypernym (is-a), hyponym


Page 10

Idea 2: Similarity = Clustering


Page 11

Cluster n-gram model

• Can be generated from unlabeled corpora
• Based on statistics, e.g., mutual information


Implementation of the Brown hierarchical word clustering algorithm, by Percy Liang

Page 12

Idea 3: Distributional representation

• Linguistic items with similar distributions have similar meanings
• i.e., words that occur in the same contexts ⇒ similar meaning


"a word is characterized by the company it keeps” --Firth, John, 1957

Page 13

Vector representation (word embeddings)

• Discrete ⇒ distributed representations
• Word meanings are vectors of “basic concepts”

• What are the “basic concepts”?
• How to assign weights?
• How to define the similarity/distance?


𝑣_king = [0.8 0.9 0.1 0 …]
𝑣_queen = [0.8 0.1 0.8 0 …]
𝑣_apple = [0.1 0.2 0.1 0.8 …]

(dimensions: royalty, masculinity, femininity, eatable)

Page 14

An illustration of the vector space model


(Figure: words w1-w5 plotted as points in a 3-D space with axes Masculine, Eatable, and Royalty.)

Page 15

Semantic similarity in 2D

• Home Depot products


Page 16

Capture the structure of words

• Example from GloVe


Page 17

How to use word vectors?


Page 18

Pre-trained word vectors

• word2vec (Google News): https://code.google.com/archive/p/word2vec
  • 100 billion tokens, 300 dimensions, 3M words

• GloVe project: http://nlp.stanford.edu/projects/glove/
  • Pre-trained word vectors from Wikipedia (6B tokens), web crawl data (840B), and Twitter (27B)
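Each line of a GloVe file is plain text: a word followed by its vector components. A minimal loader sketch (the file name is one example from the GloVe release):

import numpy as np

def load_glove(path):
    # One word per line, followed by its vector components, space-separated.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# e.g., vectors = load_glove("glove.6B.300d.txt")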


Page 19

Distance/similarity

• Vector similarity measure ⇒ similarity in meaning

• Cosine similarity: cos(𝑢, 𝑣) = (𝑢 ⋅ 𝑣) / (||𝑢|| ||𝑣||)
  • Word vectors are normalized by length

• Euclidean distance: ||𝑢 − 𝑣||₂

• Inner product: 𝑢 ⋅ 𝑣
  • Same as cosine similarity if the vectors are normalized

6501 Natural Language Processing 19

(𝑢/||𝑢|| is a unit vector)
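A short sketch of the three measures (the two toy vectors are illustrative, not from the slides):

import numpy as np

def cosine(u, v):
    # cos(u, v) = (u . v) / (||u|| ||v||)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([0.8, 0.9, 0.1])
v = np.array([0.8, 0.1, 0.8])

print(cosine(u, v))             # cosine similarity
print(np.linalg.norm(u - v))    # Euclidean distance
print(u @ v)                    # inner product

# After length normalization, the inner product equals cosine similarity:
u_hat, v_hat = u / np.linalg.norm(u), v / np.linalg.norm(v)
assert np.isclose(u_hat @ v_hat, cosine(u, v))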

Page 20

Distance/similarity (cont.)

Linguistic Regularities in Sparse and Explicit Word Representations, Levy & Goldberg, CoNLL 2014

Choosing the right similarity metric is important

Page 21

Word similarity DEMO

• http://msrcstr.cloudapp.net/


Page 22

Word analogy

• 𝑣_man − 𝑣_woman + 𝑣_aunt ∼ 𝑣_uncle
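A common way to answer such queries is a nearest-neighbor search over the vocabulary that excludes the query words themselves. A minimal sketch, assuming a vectors dict like the GloVe loader above returns:

import numpy as np

def analogy(a, b, c, vectors):
    # Find w maximizing cos(v_a - v_b + v_c, v_w), i.e., a : b :: w : c.
    target = vectors[a] - vectors[b] + vectors[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -1.0
    for w, v in vectors.items():
        if w in (a, b, c):
            continue  # exclude the query words themselves
        sim = (target @ v) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# e.g., analogy("man", "woman", "aunt", vectors)  # ideally -> "uncle"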


Page 23

From words to phrases


Page 24

Neural Language Models


Page 25

How to “learn” word vectors?


• What are the “basic concepts”?
• How to assign weights?
• How to define the similarity/distance? ⇒ Cosine similarity

Page 26

Back to distributional representation

• Encode relational data in a matrix
  • Co-occurrence (e.g., from a general corpus)
  • Bag-of-words model: documents (clusters) as the basis for the vector space (see the sketch below)
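A toy sketch of such a word-by-document count matrix (the three documents are made up for illustration):

from collections import Counter

def word_document_matrix(docs, vocab):
    # Row = word, column = document, entry = count of the word in that document.
    counts = [Counter(doc.lower().split()) for doc in docs]
    return [[c[w] for c in counts] for w in vocab]

docs = ["the ship sailed the sea",
        "a boat on the sea",
        "the queen met the king"]
vocab = ["ship", "boat", "sea", "queen", "king"]

for w, row in zip(vocab, word_document_matrix(docs, vocab)):
    print(f"{w:>6}: {row}")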


Page 27

Back to distributional representation

• Encode relational data in a matrix
  • Co-occurrence (e.g., from a general corpus)
  • Skip-grams


Page 28

Back to distributional representation

• Encode relational data in a matrix
  • Co-occurrence (e.g., from a general corpus)
  • Skip-grams
  • From taxonomy (e.g., WordNet, thesaurus)


                         joy   gladden   sorrow   sadden   goodwill
Group 1: “joyfulness”     1       1        0        0         0
Group 2: “sad”            0       0        1        1         0
Group 3: “affection”      0       0        0        0         1

Input: synonyms from a thesaurus
  Joyfulness: joy, gladden
  Sad: sorrow, sadden

Page 29

Back to distributional representation (cont.)

• Cosine similarity?

• Pros and cons?

Page 30

Problems?

• The number of basic concepts is large
• The basis is not orthogonal (i.e., not linearly independent)
• Some function words are too frequent (e.g., “the”), and syntax has too much impact
  • E.g., TF-IDF weighting can be applied
  • E.g., skip-grams: scale counts by distance to the target word


Page 31

Latent Semantic Analysis (LSA)

• Data representation
  • Encode single-relational data in a matrix
  • Co-occurrence (e.g., document-term matrix, skip-gram)
  • Synonyms (e.g., from a thesaurus)

• Factorization
  • Apply SVD to the matrix to find latent components


Page 32

Principal Component Analysis (PCA)

• Decompose the similarity space into a set of orthonormal basis vectors


Page 33

Principal Component Analysis (PCA)

• Decompose the similarity space into a set of orthonormal basis vectors

• For an 𝑚×𝑛 matrix 𝐴, there exists a factorization such that

  𝐴 = 𝑈Σ𝑉ᵀ

• 𝑈, 𝑉 are orthogonal matrices


Page 34

Low-rank Approximation

• Idea: store the most important information in a small number of dimensions (e.g., 100-1000)

• SVD can be used to compute the optimal low-rank approximation

• Set the smallest 𝑛−𝑟 singular values to zero

• Similar words map to similar locations in the low-dimensional space
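A minimal NumPy sketch of this truncation, using the ship/boat/ocean/wood/tree term-document matrix from the Manning & Nayak example cited a few slides below:

import numpy as np

# Term-document count matrix C (rows: ship, boat, ocean, wood, tree).
C = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2                                          # keep the 2 largest singular values
C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation of C
print(np.round(C_k, 2))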


Page 35

Latent Semantic Analysis (LSA)

• Factorization
  • Apply SVD to the matrix to find latent components


Page 36

LSA example

• Original matrix 𝐶


Example from Christopher Manning and Pandu Nayak, Introduction to Information Retrieval

Page 37

LSA example

• SVD: 𝐶 = 𝑈Σ𝑉ᵀ


Page 38

LSA example

• Original matrix 𝐶
• Dimension reduction: 𝐶 ∼ 𝑈Σ𝑉ᵀ


Page 39

LSA example

• Original matrix 𝐶 vs. reconstructed matrix 𝐶₂

• What is the similarity between ship and boat?


Page 40

Word vectors

𝐶 ∼ 𝑈Σ𝑉ᵀ

𝐶𝐶ᵀ ∼ (𝑈Σ𝑉ᵀ)(𝑈Σ𝑉ᵀ)ᵀ = 𝑈Σ𝑉ᵀ𝑉Σᵀ𝑈ᵀ = 𝑈ΣΣᵀ𝑈ᵀ (why? 𝑉ᵀ𝑉 = 𝐼) = (𝑈Σ)(𝑈Σ)ᵀ

• 𝐶_ship ⋅ 𝐶_boat ∼ (𝑈Σ)_ship ⋅ (𝑈Σ)_boat, so the rows of 𝑈Σ can serve as word vectors
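Continuing the truncated-SVD sketch from the low-rank approximation slide (it reuses C, U, s, and k from there):

# W's rows are word vectors; C @ C.T == W @ W.T because Vt @ Vt.T = I.
W = U @ np.diag(s)
assert np.allclose(C @ C.T, W @ W.T)

terms = ["ship", "boat", "ocean", "wood", "tree"]
i, j = terms.index("ship"), terms.index("boat")

print(C[i] @ C[j])          # 0 in the original space: no shared documents
print(W[i, :k] @ W[j, :k])  # nonzero similarity in the rank-k latent space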


Page 41

Why do we need low-rank approximation?

• A knowledge base (e.g., a thesaurus) is never complete

• Noise reduction by dimension reduction

• Intuitively, LSA brings together “related” axes (concepts) in the vector space

• A compact model


Page 42

All problems solved?


Page 43

An analogy game

“Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NIPS 2016


Page 44

Continuous Semantic Representations

(Figure: related words cluster together in the embedding space, e.g., {sunny, rainy, windy, cloudy}, {car, wheel, cab}, {sad, joy, emotion, feeling}.)


Page 45

Semantics Needs More Than Similarity

Tomorrow will be rainy.

Tomorrow will be sunny.

𝑠𝑖𝑚𝑖𝑙𝑎𝑟(rainy, sunny)?

𝑎𝑛𝑡𝑜𝑛𝑦𝑚(rainy, sunny)?


Page 46

Continuous representations for entities


(Figure: entity embeddings, with an unknown “?” entity positioned relative to Michelle Obama, the Democratic Party, George W. Bush, Laura Bush, and the Republican Party.)

Page 47

Continuous representations for entities


• Useful resources for NLP applications
  • Semantic Parsing & Question Answering
  • Information Extraction

Page 48

Next lecture: a more flexible framework

• Directly learn word vectors using a neural-network model
  • More flexible
  • Easier to learn new words
  • Can incorporate other information
  • Optimizes a task-specific loss

• Review calculus!
