Polarity Inducing Latent Semantic Analysis
Scott Wen-tau Yih
Joint work with Geoffrey Zweig & John Platt
Microsoft Research
A vector space model that can distinguish Antonyms from Synonyms!
Vector Space Model
Text objects (e.g., words, phrases, sentences or documents) are represented as vectors
High-dimensional sparse term-vectors
Concept vectors from topic models or projection methods
Constructed compositionally from word vectors [Socher et al. 12]
Relations of the text objects are estimated by functions in the vector space
Relatedness is measured by some distance function (e.g., cosine)
[Figure: a query vector $v_q$ and a document vector $v_d$; relatedness is the cosine of the angle between them]
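As an illustration of the cosine measure named on this slide, here is a minimal sketch that assumes plain bag-of-words counts as the term vectors. The helper names `term_vector` and `cosine` and the two example texts are made up for this sketch; they do not come from the talk.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term vector: maps each lowercase token to its count."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical query and document texts.
v_q = term_vector("hot weather")
v_d = term_vector("the weather is hot and the air is dry")
score = cosine(v_q, v_d)
```

With raw counts the score stays between 0 and 1, which is one way to see why such a model has no room to mark opposites.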
Applications of Vector Space Models
Document Level
Information Retrieval [Salton & McGill 83]
Document Clustering [Deerwester et al. 90]
Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99]
Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11]
Word Level
Language modeling [Bellegarda 00]
Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]
Beyond General Similarity
Existing VSMs cannot distinguish finer relations
The “antonym” issue of distributional similarity
The co-occurrence or distributional hypotheses
Apply to near-synonyms, hypernyms and other semantically related words, including antonyms [Mohammad et al. 08]
e.g., “hot” and “cold” occur in similar contexts
LSA does not solve the issue
Might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98]
Approaches for Detecting Antonyms
Separate antonyms from distributionally similar word pairs [Lin et al. 03]
Patterns: “from X to Y”, “either X or Y”
WordNet graph [Harabagiu et al. 06]
Synsets connected by is-a links and exactly one antonymy link
WordNet + affix rules + heuristics [Mohammad et al. 08]
Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]
Our Contributions
Polarity Inducing Latent Semantic Analysis (PILSA)
A vector space model that encodes polarity information
Synonyms cluster together in this space
Antonyms lie at the opposite ends of a unit sphere
[Figure: unit sphere with “hot” and “burning” on one side and “cold” and “freezing” on the opposite side]
Significantly improved the prediction accuracy on a benchmark GRE dataset
Roadmap
Introduction
Polarity Inducing Latent Semantic Analysis
Basic construction
Extension 1: Improving accuracy
Extension 2: Improving coverage
Experimental evaluation
Task & datasets
Results
Conclusion
The Core Method
Input: A thesaurus (with synonyms & antonyms)
Create a “document”-term matrix
Each group of words (synonyms and antonyms) is treated as a “document”
Induce polarity by making antonyms have negative weights
Apply SVD as in regular Latent Semantic Analysis
Matrix Construction
Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony  rancor  goodwill  affection  …
Group 1: “acrimony”     4.73      6.01    5.81      4.86       …
Group 2: “affection”    3.78      5.23    6.21      5.15       …
…                       …         …       …         …          …

Document: row-vector
Term: column-vector
Each entry: TFIDF score
Matrix Construction
Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony  rancor  goodwill  affection  …
Group 1: “acrimony”     4.73      6.01    -5.81     -4.86      …
Group 2: “affection”    -3.78     -5.23   6.21      5.15       …
…                       …         …       …         …          …

Inducing polarity
Cosine Score:
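The construction on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `build_signed_rows` is hypothetical, and the default unit weight is an assumption that stands in for the TFIDF scores shown in the table, because the slides do not spell out the weighting formula.

```python
def build_signed_rows(thesaurus, weight=lambda term, group: 1.0):
    """Return (vocabulary, rows); each row is one thesaurus group ("document").

    thesaurus maps a headword to (synonyms, antonyms). The headword and its
    synonyms receive +weight, antonyms receive -weight, all other terms 0.
    The unit default weight is an assumption standing in for TFIDF.
    """
    vocabulary = sorted(
        set(thesaurus)
        | {w for syns, ants in thesaurus.values() for w in syns + ants}
    )
    column = {term: j for j, term in enumerate(vocabulary)}
    rows = {}
    for head, (syns, ants) in thesaurus.items():
        row = [0.0] * len(vocabulary)
        for term in [head] + syns:
            row[column[term]] = weight(term, head)
        for term in ants:
            row[column[term]] = -weight(term, head)  # polarity induced here
        rows[head] = row
    return vocabulary, rows


# The two thesaurus entries from the slide.
thesaurus = {
    "acrimony": (["rancor", "conflict", "bitterness"], ["goodwill", "affection"]),
    "affection": (["goodwill", "tenderness", "fondness"], ["acrimony", "rancor"]),
}
vocabulary, rows = build_signed_rows(thesaurus)
```

Passing a different `weight` callable is where a real TFIDF score would be plugged in; only the sign flip for antonyms is specific to the polarity idea.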
Effect of Inducing Polarity

                        acrimony  rancor  goodwill  affection
Group 1: “acrimony”     4.73      6.01    5.81      4.86
Group 2: “affection”    3.78      5.23    6.21      5.15
Effect of Inducing Polarity

                        acrimony  rancor  goodwill  affection
Group 1: “acrimony”     1         1       1         1
Group 2: “affection”    1         1       1         1

Cosine similarity = 1
Cannot distinguish antonyms from synonyms!
Effect of Inducing Polarity

No polarity:
                        acrimony  rancor  goodwill  affection
Group 1: “acrimony”     1         1       1         1
Group 2: “affection”    1         1       1         1

With polarity:
                        acrimony  rancor  goodwill  affection
Group 1: “acrimony”     1         1       -1        -1
Group 2: “affection”    -1        -1      1         1

Cosine similarity = -1
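The two cosine values can be checked by hand. In the worked computation below, $\mathbf{g}_1$ and $\mathbf{g}_2$ are labels introduced here for the “acrimony” and “affection” rows; the slides do not name them.

```latex
\[
\cos(\mathbf{g}_1,\mathbf{g}_2)
  = \frac{\mathbf{g}_1\cdot\mathbf{g}_2}{\lVert\mathbf{g}_1\rVert\,\lVert\mathbf{g}_2\rVert}
\]
% Without polarity: g_1 = g_2 = (1, 1, 1, 1)
\[
\cos(\mathbf{g}_1,\mathbf{g}_2) = \frac{1+1+1+1}{2\cdot 2} = 1
\]
% With polarity: g_1 = (1, 1, -1, -1), g_2 = (-1, -1, 1, 1)
\[
\cos(\mathbf{g}_1,\mathbf{g}_2) = \frac{-1-1-1-1}{2\cdot 2} = -1
\]
```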
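Mapping to Latent Space via SVD
$\mathbf{W} \approx \mathbf{U}\,\mathbf{S}\,\mathbf{V}^T$, where $\mathbf{W}$ is $d \times n$, $\mathbf{U}$ is $d \times k$, $\mathbf{S}$ is $k \times k$ and $\mathbf{V}^T$ is $k \times n$ (columns correspond to words)
Word similarity: cosine of two columns in $\mathbf{S}\mathbf{V}^T$
SVD generalizes and smooths the original data
Uncovers relationships not explicit in the thesaurus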
As $\mathbf{U}^T\mathbf{W} \approx \mathbf{S}\mathbf{V}^T$, $\mathbf{U}$ can be viewed as the projection matrix that maps the raw column-vector to the $k$-dimensional latent space
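The projection view follows in one step from the factorization. The derivation below is a sketch that assumes a standard truncated SVD, so that $\mathbf{U}$ has orthonormal columns, and that latent word vectors are read off as the columns of $\mathbf{S}\mathbf{V}^T$. The symbols $\mathbf{w}_j$ and $\mathbf{v}_j$ are introduced here for illustration.

```latex
\begin{align*}
\mathbf{W} &\approx \mathbf{U}\,\mathbf{S}\,\mathbf{V}^{T},
  \qquad \mathbf{U}^{T}\mathbf{U} = \mathbf{I}_{k} \\
\mathbf{U}^{T}\mathbf{W} &\approx \mathbf{U}^{T}\mathbf{U}\,\mathbf{S}\,\mathbf{V}^{T}
  = \mathbf{S}\,\mathbf{V}^{T} \\
\mathbf{v}_{j} &= \mathbf{U}^{T}\mathbf{w}_{j} \in \mathbb{R}^{k}
  \quad \text{for the $j$-th column } \mathbf{w}_{j} \in \mathbb{R}^{d} \text{ of } \mathbf{W}
\end{align*}
```

Under these assumptions a raw $d$-dimensional column is mapped to its $k$-dimensional latent vector by a single multiplication with a $d \times k$ matrix.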
Extension 1: Improve Accuracy
Refine the projection matrix by discriminative training
S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors
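[Figure: S2Net network. A term vector $\mathbf{f}_p$ over terms $t_1, \dots, t_d$ is mapped by the projection matrix $\mathbf{A}_{d \times k}$ to a concept vector $\mathbf{v}_p = \mathbf{A}^T \mathbf{f}_p$ over concepts $c_1, \dots, c_k$; the output is $f_{sim}(\mathbf{v}_p, \mathbf{v}_q)$]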
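Applying S2Net
Training data: Antonym pairs from thesaurus
Initialize model with the PILSA projection matrix
Learning objective: cosine score of antonyms should be lower than other word pairs
$L(\Delta_{ij}; \mathbf{A}) = \log(1 + \exp(-\gamma \Delta_{ij}))$
$\Delta_{ij} \equiv \cos(\mathbf{A}^T \mathbf{f}_{p_i}, \mathbf{A}^T \mathbf{f}_{q_j}) - \cos(\mathbf{A}^T \mathbf{f}_{p_i}, \mathbf{A}^T \mathbf{f}_{q_i})$, where the first term scores another word pair and the second term scores the antonyms
[Figure: plot of the loss $L$ for $\Delta_{ij}$ from -2 to 2; vertical axis from 0 to 20]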
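The loss above can be evaluated on a toy example as follows. This is a sketch under stated assumptions, not the S2Net training code: the helper names, the 2-by-2 identity matrix used for $\mathbf{A}$, the toy term vectors and the value `gamma=10.0` are all chosen here for illustration.

```python
import math


def cosine(u, v):
    """Cosine between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(A, f):
    """Compute A^T f, where A is a d x k matrix stored as d rows of length k."""
    k = len(A[0])
    return [sum(A[t][c] * f[t] for t in range(len(f))) for c in range(k)]


def delta(A, f_p, f_q_other, f_q_antonym):
    """Delta = cos(p, other word) - cos(p, antonym), both in the projected space."""
    v_p = project(A, f_p)
    return cosine(v_p, project(A, f_q_other)) - cosine(v_p, project(A, f_q_antonym))


def pair_loss(d, gamma=10.0):
    """log(1 + exp(-gamma * d)), written in a numerically stable form."""
    z = -gamma * d
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


A = [[1.0, 0.0], [0.0, 1.0]]  # toy projection matrix (identity), an assumption
d_ij = delta(A, f_p=[1.0, 0.0], f_q_other=[0.0, 1.0], f_q_antonym=[-1.0, 0.0])
loss = pair_loss(d_ij)
```

The loss is near zero when the antonym pair already scores well below the other pair and grows roughly linearly in $-\gamma\Delta_{ij}$ when the order is violated.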
Extension 2: Improve Coverage
What to do with out-of-thesaurus words?
Some lexical variations
Encarta thesaurus contains “corruptible” and “corruption”, but not “corruptibility”
Morphological analysis and stemming to find alternatives of an out-of-thesaurus target word
Rare or offensive words
e.g., “froward” and “moronic”
Embedding out-of-thesaurus words by leveraging a general corpus
Embedding Out-of-thesaurus Words
Create a context vector space model using a collection of documents (e.g., Wikipedia)
Context: words within a window of [-10,10]
Embed target word into the PILSA space by k-NN
Find nearby in-thesaurus words in the context space
Remove words with inconsistent polarity
Use the centroid of the corresponding PILSA vectors to represent the target word
[Figure: two panels, “Context Vector Space” and “PILSA Space”, showing the words “sweltering”, “burning”, “hot” and “cold”]
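The three embedding steps can be sketched as below. The slides do not say how inconsistent polarity is detected, so the rule used here (keep neighbors whose PILSA vector has a positive cosine with the nearest neighbor's PILSA vector) is an assumption, as are the function name `embed_oov`, the choice `k=3` and all toy vectors.

```python
import math


def cosine(u, v):
    """Cosine between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_oov(target_context, context_vectors, pilsa_vectors, k=3):
    """Place an out-of-thesaurus word in the PILSA space via k-NN.

    context_vectors: in-thesaurus word -> vector in the context space
    pilsa_vectors:   in-thesaurus word -> vector in the PILSA space
    """
    # Step 1: the k in-thesaurus words closest to the target in context space.
    neighbors = sorted(
        context_vectors,
        key=lambda w: cosine(target_context, context_vectors[w]),
        reverse=True,
    )[:k]
    # Step 2 (assumed rule): drop neighbors whose polarity disagrees with
    # the nearest neighbor, judged by the sign of the PILSA cosine.
    anchor = pilsa_vectors[neighbors[0]]
    kept = [w for w in neighbors if cosine(anchor, pilsa_vectors[w]) > 0.0]
    # Step 3: centroid of the remaining PILSA vectors.
    return [sum(pilsa_vectors[w][i] for w in kept) / len(kept)
            for i in range(len(anchor))]


# Toy 2-D vectors, invented for illustration only.
context_vectors = {"hot": [2.0, 2.0], "burning": [0.9, 1.0],
                   "cold": [1.0, 0.8], "table": [-1.0, 0.2]}
pilsa_vectors = {"hot": [1.0, 0.0], "burning": [0.8, 0.2],
                 "cold": [-1.0, 0.0], "table": [0.0, 1.0]}
sweltering = embed_oov([1.0, 1.0], context_vectors, pilsa_vectors)
```

In this toy setup “cold” is a close neighbor in the context space but is filtered out before averaging, so the target lands near “hot” and “burning”.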
Data for Building PILSA Models
Encarta Thesaurus (for basic PILSA)
47k word categories (i.e., the “documents”)
Vocabulary of 50k words
125,724 pairs of antonyms
Wikipedia (for embedding out-of-thesaurus words)
Sentences from a Nov-2010 snapshot
917M words after preprocessing
Experimental Evaluation
Task: GRE closest-opposite questions
Which is the closest opposite of adulterate?
(a) renounce (b) forbid (c) purify (d) criticize (e) correct
Dev / Test: 162 / 950 questions [Mohammad et al. 08]
Dev set is used for tuning the dimensionality of PILSA
Evaluation metric
Accuracy: #correct / #total questions
Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
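One natural way to answer such a question with a polarity-aware space is to pick the candidate with the lowest cosine to the target. The sketch below assumes that decision rule; the slides define only the metric. The function names and the toy 2-D vectors are invented for illustration and are not PILSA output.

```python
import math


def cosine(u, v):
    """Cosine between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_opposite(target, candidates, vectors):
    """Return the candidate with the lowest cosine to the target.

    Returns None when the target has no vector (an unresolved
    out-of-thesaurus word) or when no candidate has one.
    """
    if target not in vectors:
        return None
    scored = [(cosine(vectors[target], vectors[c]), c)
              for c in candidates if c in vectors]
    return min(scored)[1] if scored else None


def accuracy(questions, vectors):
    """#correct / #total; unresolved targets count as incorrect."""
    correct = sum(1 for target, candidates, answer in questions
                  if closest_opposite(target, candidates, vectors) == answer)
    return correct / len(questions)


# Toy vectors, invented for illustration only.
vectors = {"adulterate": [1.0, 0.0], "renounce": [0.2, 1.0],
           "forbid": [0.3, 0.9], "purify": [-0.9, 0.1],
           "criticize": [0.5, 0.5], "correct": [-0.2, 1.0]}
options = ["renounce", "forbid", "purify", "criticize", "correct"]
questions = [("adulterate", options, "purify"),
             ("froward", options, "correct")]  # target has no vector
result = accuracy(questions, vectors)
```

The second question has an unresolved target, so it is scored as wrong, which mirrors the treatment of out-of-thesaurus words stated on the slide.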
Results on Test Set
[Figure: bar chart of accuracy on the GRE test set; vertical axis from 0.5 to 0.85]
Lookup: 0.56
Raw TFIDF: 0.57
PILSA: 0.74
PILSA + S2Net: 0.77
OOV Embedding: 0.8
Mohammad et al. 08: 0.64
Examples
Target word: admirable
No polarity – LSA
Most Similar: commendable, creditable, despicable
Least Similar: uninviting, dessert, seductive
With polarity – PILSA
Most Similar: commendable, creditable, laudable
Least Similar: despicable, shameful, unworthy
Full results on GRE test set are available online
Conclusion
Polarity Inducing LSA
Solves the open problem of antonyms/synonyms by making a vector space that can distinguish opposites
Vector space designed so that synonyms/antonyms tend to have positive/negative cosine similarity
Future Work
New methods or representations for other word relations
e.g., Part-Whole, Is-A, Attribute
Applications
e.g., Textual Entailment or Sentence Completion