Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
Duyu Tang1, Furu Wei2, Nan Yang2, Ming Zhou2, Ting Liu1, Bing Qin1
1 Harbin Institute of Technology 2 Microsoft Research
Roadmap
• Motivation
• The Proposed Method
• Experiments
• Conclusion
2
Twitter sentiment classification
• Input: A tweet
• Output: Sentiment polarity of the tweet
– Positive / Negative / Neutral
3
Top system in SemEval-2013 Task 2 (B)
• NRC-Canada [Mohammad 2013]
– Feature engineering
• Hand-crafted features
• Sentiment lexicons
– How about learning features automatically from data for Twitter sentiment classification?
4
Word Representation (Embedding)
• Word embedding is important
– Compositionality
– [Yessenalina11; Socher13]
[Diagram: the words of "it is not so good" are mapped to embeddings and composed into a phrase representation]
[Figure: "linguistic" represented as a real-valued vector, e.g. [1.045, 0.912, -0.894, -1.053, 0.459]; a 2D plot (axes x1, x2) in which related words cluster: ship/vessel/boat together, Beijing/Harbin together]
5
Is It Enough for Sentiment Analysis?
• Existing embedding learning algorithms typically model only the syntactic contexts of words.
he formed the good habit of …
he formed the bad habit of …
Same contexts
[Figure: in a context-based embedding space (axes x1, x2), "good" and "bad" end up close together, just as ship/vessel/boat and Beijing/Shenzhen do]
Words with similar contexts but opposite sentiment polarities are mapped to nearby vectors.
6
Roadmap
• Motivation
• The Proposed Method
• Experiments
• Conclusion
7
Twitter Sentiment Classification
[Diagram: the words of a tweet ("don’t miss wanna :) it i") are looked up in the embedding layer; min, max, and average pooling over the word vectors are concatenated into the input of layers 𝒇 and 𝒇′; training data and a learning algorithm then yield the sentiment classifier]
(A code sketch of the pooling step follows this slide.)
8
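A minimal sketch of this tweet-representation step, assuming the embeddings are a plain dict of word → NumPy vector (function and variable names are mine, not from the slides):

```python
import numpy as np

def tweet_features(words, embeddings, dim=50):
    """Concatenate min, max, and average pooling over the words' SSWE vectors."""
    vecs = np.array([embeddings[w] for w in words if w in embeddings])
    if len(vecs) == 0:            # no in-vocabulary word: fall back to zeros
        vecs = np.zeros((1, dim))
    return np.concatenate([vecs.min(axis=0),
                           vecs.max(axis=0),
                           vecs.mean(axis=0)])   # 3 * dim tweet features
```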
Collobert and Weston (C&W) 2011
[Diagram: a window over the text "it is so coooll :)" is mapped through a lookup table 𝐿𝑇𝑤 (embeddings of dimension 𝐷), concatenated, and passed through a linear layer 𝑀1, a HardTanh layer of size 𝐻, and a second linear layer 𝑀2 with output size C = 1, yielding a language-model score for the loss function]
9
Collobert and Weston (C&W) 2011
The model is trained to score an original ngram higher than a corrupted ngram in which the center word is replaced by a random word (loss given below).
10
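The ranking objective is a hinge loss over the original ngram t and its corrupted version t^r, where f^{cw} is the scalar score produced by the network above:

loss_{cw}(t, t^r) = \max\big(0,\; 1 - f^{cw}(t) + f^{cw}(t^r)\big)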
Model 1: SSWE Hard
[Diagram: the C&W window architecture from the previous slides, repeated as the starting point for the sentiment models]
• Intuition
– Use the sentiment polarity of sentences (e.g. tweets) to learn the sentiment-specific word embedding (SSWE)
• Solution
– Predict the sentiment polarity of the text
11
[Diagram: the window architecture topped with a softmax layer of size C = 2; the predicted distribution over positive/negative is trained to match the gold distribution, [1, 0] for positive tweets and [0, 1] for negative ones; loss given below]
12
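Writing f^h(t) for the predicted softmax distribution and f^g(t) ∈ {[1, 0], [0, 1]} for the gold distribution, SSWE Hard minimizes the cross entropy between the two:

loss_h(t) = -\sum_{k \in \{0, 1\}} f_k^{g}(t) \cdot \log f_k^{h}(t)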
Model 2: SSWE Soft
• Intuition
– Use the sentiment polarity of sentences to learn the sentiment-specific word embedding
• Solution
– Soften the hard constraints of Model 1
13
[Diagram, Model 1 vs. Model 2: Model 1 forces the predicted distribution to match a hard [1, 0] (positive) or [0, 1] (negative); Model 2 only requires the correct ranking of the two scores, e.g. [1.7, 0.2] with P > N for a positive tweet and [0.3, 3.8] with P < N for a negative one]
[Diagram: the window architecture with a linear output layer of size C = 2, producing a positive score and a negative score; the loss function uses an indicator function over the tweet's gold polarity; loss given below]
14
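Writing f_0^{r}(t) and f_1^{r}(t) for the positive and negative scores, and δ_s(t) for the indicator that is 1 when the gold polarity of t is positive and -1 when it is negative, SSWE Soft uses the ranking loss:

loss_r(t) = \max\big(0,\; 1 - \delta_s(t)\, f_0^{r}(t) + \delta_s(t)\, f_1^{r}(t)\big)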
Model 3: SSWE Unified
• Intuition
– Use both the syntactic contexts of words and the sentiment polarity of sentences to learn the sentiment-specific word embedding
• Solution
– A hybrid model that captures both kinds of information
15
[Example: "he formed the good habit of …" carries both context (the surrounding words) and sentiment (the polarity of "good")]
[Diagram: the window architecture with C = 2; the overall loss function combines a syntactic (context) loss with a sentiment loss; loss given below]
16
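SSWE Unified interpolates the C&W syntactic hinge loss with the corresponding sentiment ranking loss, both computed over the original ngram t and its corrupted version t^r, with a weight α trading off the two parts:

loss_u(t, t^r) = \alpha \cdot loss_{cw}(t, t^r) + (1 - \alpha) \cdot loss_{us}(t, t^r)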
Embedding Training
• Data
– Tweets containing positive/negative emoticons [Hu et al., 2013]
– 5M positive and 5M negative tweets from April 2013
• Back-propagation + AdaGrad [Duchi 2011] (update sketched below)
– Embedding length = 50
– Window size = 3
– Learning rate = 0.1
17
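A minimal sketch of the AdaGrad update used with back-propagation (the parameter and gradient names are mine; the slides only fix the learning rate):

```python
import numpy as np

def adagrad_update(theta, grad, cache, lr=0.1, eps=1e-8):
    """AdaGrad [Duchi 2011]: per-dimension step size shrinks with the
    accumulated squared gradient."""
    cache += grad ** 2                           # running sum of squared gradients
    theta -= lr * grad / (np.sqrt(cache) + eps)  # scaled gradient step
    return theta, cache
```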
Roadmap
• Motivation
• The Proposed Method
• Experiments
– Twitter Sentiment Classification
– Word Similarity of Sentiment Lexicons
• Conclusion
18
Twitter Sentiment Classification
• Setting
– Data
• Twitter Sentiment Classification track in SemEval-2013 (message-level)
• Positive vs. negative classification
– Evaluation metric
• Macro-F1 over positive vs. negative
19
Results
• Comparison with Different Embeddings
[Bar chart: macro-F1 of sentiment classifiers built on different word embeddings; y-axis from 0.64 to 0.88]
20
Results
• Comparison with Twitter Sentiment Classification Algorithms
21
SemEval 2014 Task 9 (b)
• Coooolll: a deep-learning system for Twitter sentiment classification
[Diagram: massive tweets feed embedding learning to produce the SSWE feature (dimensions 1 … N), which is concatenated with the hand-crafted STATE features (dimensions N+1 … N+K: all-cap, emoticon, elongated, …); the combined feature representation, training data, and a learning algorithm produce the sentiment classifier; see the sketch below]
22
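A sketch of how the hybrid feature vector could be assembled; `tweet_features` is the pooling sketch from the earlier slide, and `state_features` is a hypothetical extractor standing in for the hand-crafted all-cap/emoticon/elongated features:

```python
import numpy as np

def coooolll_features(words, embeddings, state_features):
    """Concatenate SSWE pooling features (dims 1..N) with hand-crafted
    STATE features (dims N+1..N+K)."""
    sswe = tweet_features(words, embeddings)  # dims 1..N
    state = state_features(words)             # dims N+1..N+K
    return np.concatenate([sswe, state])
```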
SemEval 2014 Task 9 (b)
• Results
– Our system Coooolll ranked 2nd among 45 systems on the Twitter2014 test set.
23
Roadmap
• Motivation
• The Proposed Method
• Experiments
– Twitter Sentiment Classification
– Word Similarity of Sentiment Lexicons
• Conclusion
25
Experiment Settings
• Setting
– Evaluation metric: accuracy of the polarity consistency between each lexicon word and its closest words in the embedding space (see the example and sketch below)
– Data: sentiment lexicons
Example: among the 10 words closest to "good" (cool, awesome, great, bad, nice, interesting, fantastic, love, excellent, terrible), 8 share its positive polarity, so accuracy = 8/10 = 80%.
26
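A minimal sketch of this metric as the example suggests; the neighborhood size (10) matches the example, while the use of cosine similarity is an assumption:

```python
import numpy as np

def lexicon_accuracy(lexicon, embeddings, k=10):
    """Mean fraction of each lexicon word's k nearest neighbors (within the
    lexicon, by cosine similarity) that share its polarity.
    `lexicon` maps word -> polarity; `embeddings` maps word -> vector."""
    words = [w for w in lexicon if w in embeddings]
    vecs = np.array([embeddings[w] for w in words])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    sims = vecs @ vecs.T
    accs = []
    for i, w in enumerate(words):
        neighbors = np.argsort(-sims[i])[1:k + 1]        # skip the word itself
        same = sum(lexicon[words[j]] == lexicon[w] for j in neighbors)
        accs.append(same / k)
    return float(np.mean(accs))
```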
Results
• Evaluation
– Word similarity of sentiment lexicons
27
Conclusion
• Learn continuous representations of words for Twitter sentiment classification.
• Develop three neural networks that learn sentiment-specific word embeddings (SSWE) from massive tweets without manual annotation.
• The effectiveness of SSWE is verified on Twitter sentiment classification and on word-similarity judgment over sentiment lexicons.
28
Thanks
29