Polarity Inducing Latent Semantic Analysis


Scott Wen-tau Yih
Joint work with Geoffrey Zweig & John Platt
Microsoft Research

A vector space model that can distinguish Antonyms from Synonyms!

Vector Space Model

Text objects (e.g., words, phrases, sentences or documents) are represented as vectors
  High-dimensional sparse term-vectors
  Concept vectors from topic models or projection methods
  Constructed compositionally from word vectors [Socher et al. 12]

Relations of the text objects are estimated by functions in the vector space

Relatedness is measured by some distance function (e.g., cosine)

[Figure: two vectors v_q and v_d, compared by cos(v_q, v_d)]

Applications of Vector Space Models

Document Level
  Information Retrieval [Salton & McGill 83]
  Document Clustering [Deerwester et al. 90]
  Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99]
  Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11]

Word Level
  Language modeling [Bellegarda 00]
  Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]

Beyond General Similarity

Existing VSMs cannot distinguish finer relations

The “antonym” issue of distributional similarity

The co-occurrence or distributional hypotheses
  Apply to near-synonyms, hypernyms and other semantically related words, including antonyms [Mohammad et al. 08]
  e.g., “hot” and “cold” occur in similar contexts

LSA does not solve the issue
  Might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98]

Approaches for Detecting Antonyms

Separate antonyms from distributionally similar word pairs [Lin et al. 03]
  Patterns: “from X to Y”, “either X or Y”

WordNet graph [Harabagiu et al. 06]
  Synsets connected by is-a links and exactly one antonymy link

WordNet + affix rules + heuristics [Mohammad et al. 08]

Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]

Our Contributions

Polarity Inducing Latent Semantic Analysis (PILSA)
  A vector space model that encodes polarity information
  Synonyms cluster together in this space
  Antonyms lie at the opposite ends of a unit sphere

[Figure: on a unit sphere, “hot” and “burning” lie together, opposite “cold” and “freezing”]


Significantly improved the prediction accuracy on a benchmark GRE dataset

Roadmap

Introduction

Polarity Inducing Latent Semantic Analysis
  Basic construction
  Extension 1: Improving accuracy
  Extension 2: Improving coverage

Experimental evaluation
  Task & datasets
  Results

Conclusion

The Core Method

Input: A thesaurus (with synonyms & antonyms)

Create a “document”-term matrix
  Each group of words (synonyms and antonyms) is treated as a “document”

Induce polarity by making antonyms have negative weights

Apply SVD as in regular Latent Semantic Analysis

Matrix Construction

Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony   rancor   goodwill   affection   …
Group 1: “acrimony”       4.73      6.01      5.81       4.86      …
Group 2: “affection”      3.78      5.23      6.21       5.15      …
…                          …         …         …          …        …

Each “document” (group) is a row-vector, each term is a column-vector, and each cell holds a TFIDF score.

Matrix Construction: inducing polarity

Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony   rancor   goodwill   affection   …
Group 1: “acrimony”       4.73      6.01     -5.81      -4.86      …
Group 2: “affection”     -3.78     -5.23      6.21       5.15      …
…                          …         …         …          …        …

Inducing polarity: within each group, the antonyms receive negative weights.
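The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_signed_matrix` is hypothetical, and unit weights (+1 / -1) stand in for the TFIDF scores shown on the slide.

```python
import numpy as np

# Thesaurus entries from the slide: (head word, synonyms, antonyms).
thesaurus = [
    ("acrimony", ["rancor", "conflict", "bitterness"], ["goodwill", "affection"]),
    ("affection", ["goodwill", "tenderness", "fondness"], ["acrimony", "rancor"]),
]


def build_signed_matrix(entries):
    """Return (W, vocab): one row per thesaurus group, one column per word.

    Assumption: unit weights replace the TFIDF scores used on the slide.
    """
    vocab = sorted({w for head, syns, ants in entries for w in [head, *syns, *ants]})
    index = {w: j for j, w in enumerate(vocab)}
    W = np.zeros((len(entries), len(vocab)))
    for i, (head, syns, ants) in enumerate(entries):
        for w in [head, *syns]:
            W[i, index[w]] = 1.0   # head word and its synonyms: positive weight
        for w in ants:
            W[i, index[w]] = -1.0  # antonyms: negative weight induces polarity
    return W, vocab


W, vocab = build_signed_matrix(thesaurus)
```

Each row of `W` is one “document” and each column is one term, matching the layout of the table.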

Cosine Score:

Effect of Inducing Polarity

Without polarity (TFIDF weights, then simplified to unit weights):

                        acrimony   rancor   goodwill   affection
Group 1: “acrimony”       4.73      6.01      5.81       4.86
Group 2: “affection”      3.78      5.23      6.21       5.15

Group 1: “acrimony”        1         1         1          1
Group 2: “affection”       1         1         1          1

Cosine similarity = 1
Cannot distinguish antonyms from synonyms!

With polarity (unit weights):

                        acrimony   rancor   goodwill   affection
Group 1: “acrimony”        1         1        -1         -1
Group 2: “affection”      -1        -1         1          1

Cosine similarity = -1
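The two unit-weight matrices can be checked numerically. The sketch below assumes the cosine is taken between word columns; the helper names `cosine` and `word_cosine` are illustrative only.

```python
import numpy as np

words = ["acrimony", "rancor", "goodwill", "affection"]

# Rows are the two groups; columns follow the order of `words`.
unsigned = np.array([[1, 1, 1, 1],
                     [1, 1, 1, 1]], dtype=float)
signed = np.array([[1, 1, -1, -1],
                   [-1, -1, 1, 1]], dtype=float)


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def word_cosine(W, w1, w2):
    """Cosine between the column vectors of two words."""
    return cosine(W[:, words.index(w1)], W[:, words.index(w2)])


# Without polarity, an antonym pair looks identical to a synonym pair.
no_polarity = word_cosine(unsigned, "acrimony", "goodwill")
# With polarity, the antonym pair points in opposite directions,
# while the synonym pair still points the same way.
antonyms = word_cosine(signed, "acrimony", "goodwill")
synonyms = word_cosine(signed, "acrimony", "rancor")
```

Up to floating-point rounding, `no_polarity` and `synonyms` are 1 and `antonyms` is -1.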

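Mapping to Latent Space via SVD

$\mathbf{W} \approx \mathbf{U}\,\mathbf{S}\,\mathbf{V}^{T}$, where $\mathbf{W}$ is $d \times n$, $\mathbf{U}$ is $d \times k$, $\mathbf{S}$ is $k \times k$ and $\mathbf{V}^{T}$ is $k \times n$ (one column per word)

Word similarity: cosine of two columns in $\mathbf{S}\mathbf{V}^{T}$

SVD generalizes and smooths the original data
  Uncovers relationships not explicit in the thesaurus

As $\mathbf{U}^{T}\mathbf{W} \approx \mathbf{S}\mathbf{V}^{T}$, $\mathbf{U}$ can be viewed as the projection matrix that maps the raw column-vector to the $k$-dimensional latent space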
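A minimal NumPy sketch of this step, assuming a small dense matrix and a full SVD that is then truncated to rank k (a large thesaurus would more likely need a sparse, truncated solver). The function name `pilsa_word_vectors` is hypothetical.

```python
import numpy as np


def pilsa_word_vectors(W, k):
    """Rank-k SVD of a d x n matrix W.

    Returns (U_k, word_vectors): U_k is d x k, and word_vectors is the
    k x n matrix S_k V_k^T whose columns represent the words.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    word_vectors = np.diag(s_k) @ Vt_k
    return U_k, word_vectors


# Unit-weight toy matrix with induced polarity (2 groups x 4 words).
W = np.array([[1.0, 1.0, -1.0, -1.0],
              [-1.0, -1.0, 1.0, 1.0]])
U_k, word_vectors = pilsa_word_vectors(W, k=1)

# Projecting the raw columns with U_k gives the same latent vectors.
projected = U_k.T @ W
```

Because the columns of `U_k` are orthonormal, `projected` equals `word_vectors`, which is why the left singular vectors can serve as the projection matrix.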

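Extension 1: Improve Accuracy

Refine the projection matrix by discriminative training

S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors

[Figure: a term vector $\mathbf{f}_p$ (terms $t_1 \ldots t_d$) is projected by $\mathbf{A}_{d \times k}$ to a concept vector $\mathbf{v}_p = \mathbf{A}^{T}\mathbf{f}_p$ (concepts $c_1 \ldots c_k$); $f_{sim}(\mathbf{v}_p, \mathbf{v}_q)$ compares two concept vectors]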

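Applying S2Net

Training data: Antonym pairs from thesaurus

Initialize model with the PILSA projection matrix

Learning objective: cosine score of antonyms should be lower than other word pairs

$L(\Delta_{ij}; \mathbf{A}) = \log\left(1 + \exp(-\gamma \Delta_{ij})\right)$

$\Delta_{ij} \equiv \cos(\mathbf{A}^{T}\mathbf{f}_{p_i}, \mathbf{A}^{T}\mathbf{f}_{q_j}) - \cos(\mathbf{A}^{T}\mathbf{f}_{p_i}, \mathbf{A}^{T}\mathbf{f}_{q_i})$

Here $(p_i, q_i)$ is an antonym pair and $(p_i, q_j)$ is another word pair, so the loss is small when the antonym pair has the lower cosine score.

[Figure: plot of the loss as a function of $\Delta_{ij}$]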
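The loss for a single training triple can be written out directly. The sketch below is not the S2Net implementation: it only evaluates the objective for given vectors, the function name `pair_loss` is hypothetical, and the value of `gamma` is an arbitrary assumption.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pair_loss(A, f_p, f_q_antonym, f_q_other, gamma=10.0):
    """log(1 + exp(-gamma * delta)) for one (word, antonym, other word) triple.

    delta = cos(p, other) - cos(p, antonym), computed after projecting the
    raw term vectors with A (d x k). gamma=10.0 is an assumed value.
    """
    v_p = A.T @ f_p
    delta = cosine(v_p, A.T @ f_q_other) - cosine(v_p, A.T @ f_q_antonym)
    # logaddexp(0, x) computes log(1 + exp(x)) without overflow.
    return float(np.logaddexp(0.0, -gamma * delta))


A = np.eye(2)                         # toy projection matrix
f_p = np.array([1.0, 0.0])
f_opposite = np.array([-1.0, 0.0])    # points away from f_p
f_same = np.array([1.0, 0.0])         # points the same way as f_p

good = pair_loss(A, f_p, f_q_antonym=f_opposite, f_q_other=f_same)
bad = pair_loss(A, f_p, f_q_antonym=f_same, f_q_other=f_opposite)
```

`good` is close to zero because the antonym already has the lower cosine, while `bad` is large because the ordering is reversed; gradient-based training would adjust `A` to reduce such losses.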

Extension 2: Improve Coverage

What to do with out-of-thesaurus words?

Some lexical variations
  Encarta thesaurus contains “corruptible” and “corruption”, but not “corruptibility”
  Morphological analysis and stemming to find alternatives of an out-of-thesaurus target word

Rare or offensive words
  e.g., “froward” and “moronic”
  Embedding out-of-thesaurus words by leveraging a general corpus

Embedding Out-of-thesaurus Words

Create a context vector space model using a collection of documents (e.g., Wikipedia)
  Context: words within a window of [-10,10]

Embed target word into the PILSA space by k-NN
  Find nearby in-thesaurus words in the context space
  Remove words with inconsistent polarity
  Use the centroid of the corresponding PILSA vectors to represent the target word

[Figure: two panels, “Context Vector Space” and “PILSA Space”, illustrated with the words “sweltering”, “burning”, “hot” and “cold”]
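The nearest-neighbour embedding can be sketched as follows. The slides do not say how inconsistent polarity is detected, so the filter here is an assumption: a neighbour is kept only if its PILSA vector has a positive dot product with that of the closest neighbour. The function name `embed_oov` and all toy vectors are hypothetical.

```python
import numpy as np


def embed_oov(target_ctx, thes_ctx, thes_pilsa, k=3):
    """Embed an out-of-thesaurus word into the PILSA space.

    target_ctx: (c,) context vector of the target word
    thes_ctx:   (m, c) context vectors of in-thesaurus words
    thes_pilsa: (m, d) PILSA vectors of the same words
    """
    # Cosine similarity to every in-thesaurus word in the context space.
    norms = np.linalg.norm(thes_ctx, axis=1) * np.linalg.norm(target_ctx)
    sims = (thes_ctx @ target_ctx) / norms
    nearest = np.argsort(-sims)[:k]

    # Assumed polarity filter: agree in sign with the closest neighbour.
    anchor = thes_pilsa[nearest[0]]
    keep = [i for i in nearest if thes_pilsa[i] @ anchor > 0]

    # Centroid of the remaining PILSA vectors represents the target word.
    return thes_pilsa[keep].mean(axis=0)


# Toy data: in-thesaurus words "burning", "hot", "cold" (in that order).
thes_ctx = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.3]])
thes_pilsa = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
sweltering_ctx = np.array([1.0, 0.15])

sweltering_vec = embed_oov(sweltering_ctx, thes_ctx, thes_pilsa, k=3)
```

In this toy setting “cold” is a close neighbour in the context space but is dropped by the polarity filter, so the result is the centroid of the vectors for “burning” and “hot”.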


Data for Building PILSA Models

Encarta Thesaurus (for basic PILSA)
  47k word categories (i.e., the “documents”)
  Vocabulary of 50k words
  125,724 pairs of antonyms

Wikipedia (for embedding out-of-thesaurus words)
  Sentences from a Nov-2010 snapshot
  917M words after preprocessing

Experimental Evaluation

Task: GRE closest-opposite questions
  Which is the closest opposite of adulterate?
  (a) renounce (b) forbid (c) purify (d) criticize (e) correct
  Dev / Test: 162 / 950 questions [Mohammad et al. 08]
  Dev set is used for tuning the dimensionality of PILSA

Evaluation metric
  Accuracy: #correct / #total questions
  Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
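One plausible way to answer such a question with a polarity-aware vector space is to pick the candidate with the lowest cosine score to the target word. The sketch below assumes that decision rule; the function name `closest_opposite` and the two-dimensional toy vectors are made up for illustration and are not PILSA output.

```python
import numpy as np


def closest_opposite(target, candidates, vectors):
    """Return the candidate with the lowest cosine to the target, or None
    if the target (or every candidate) has no vector."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    known = [c for c in candidates if c in vectors]
    if target not in vectors or not known:
        return None  # unresolved question: counted as answered incorrectly
    return min(known, key=lambda c: cosine(vectors[target], vectors[c]))


# Hypothetical toy vectors, chosen only to make the example run.
vectors = {
    "adulterate": np.array([1.0, 0.2]),
    "renounce": np.array([0.2, 1.0]),
    "forbid": np.array([0.5, 0.5]),
    "purify": np.array([-1.0, -0.1]),
    "criticize": np.array([0.6, -0.3]),
    "correct": np.array([-0.3, 0.8]),
}
options = ["renounce", "forbid", "purify", "criticize", "correct"]
answer = closest_opposite("adulterate", options, vectors)
```

Returning `None` for an unresolved target mirrors the evaluation rule that such questions count as incorrect.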

Results on Test Set

Method                  Accuracy
Lookup                    0.56
Raw TFIDF                 0.57
PILSA                     0.74
PILSA + S2Net             0.77
OOV Embedding             0.80
Mohammad et al. 08        0.64

Examples

Target word: admirable

No polarity – LSA
  Most Similar: commendable, creditable, despicable
  Least Similar: uninviting, dessert, seductive

With polarity – PILSA
  Most Similar: commendable, creditable, laudable
  Least Similar: despicable, shameful, unworthy

Full results on GRE test set are available online

Conclusion

Polarity Inducing LSA
  Solves the open problem of antonyms/synonyms by making a vector space that can distinguish opposites
  Vector space designed so that synonyms/antonyms tend to have positive/negative cosine similarity

Future Work
  New methods or representations for other word relations
    e.g., Part-Whole, Is-A, Attribute
  Applications
    e.g., Textual Entailment or Sentence Completion
