Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
Duyu Tang1, Furu Wei2, Nan Yang2, Ming Zhou2, Ting Liu1, Bing Qin1
1 Harbin Institute of Technology 2 Microsoft Research
Roadmap
• Motivation
• The Proposed Method
• Experiments
• Conclusion
2
Twitter sentiment classification
• Input: A tweet
• Output: Sentiment polarity of the tweet
– Positive / Negative / Neutral
3
Top system in SemEval-2013 Task 2 (B)
• NRC-Canada [Mohammad 2013]
– Feature engineering
• Hand-crafted features
• Sentiment lexicons
– How about learning features automatically from data for Twitter sentiment classification?
4
Word Representation (Embedding)
• Word embedding is important
– Compositionality
– [Yessenalina11; Socher13]
[Diagram: the words of "it is not so good" are mapped to embeddings and composed into a phrase representation]
[Figure: "linguistic" represented as a real-valued vector, e.g. [1.045, 0.912, -0.894, -1.053, 0.459]; a 2D plot (axes x1, x2) in which related words cluster: ship/vessel/boat together, Beijing/Harbin together]
5
Is It Enough for Sentiment Analysis?
• Existing embedding learning algorithms typically model only the syntactic contexts of words.
he formed the good habit of …
he formed the bad habit of …
Same contexts
[Figure: in a context-based embedding space (axes x1, x2), "good" and "bad" end up close together, just as ship/vessel/boat and Beijing/Shenzhen do]
Words with similar contexts but opposite sentiment polarities are mapped to nearby vectors.
6
Roadmap
• Motivation
• The Proposed Method
• Experiments
• Conclusion
7
Twitter Sentiment Classification
[Diagram: the words of a tweet ("don’t miss wanna :) it i") are looked up in the embedding layer; min, max, and average pooling over the word vectors are concatenated into the input of layers 𝒇 and 𝒇′; training data and a learning algorithm then yield the sentiment classifier]
(A code sketch of the pooling step follows this slide.)
8
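A minimal sketch of this tweet-representation step, assuming the embeddings are a plain dict of word → NumPy vector (function and variable names are mine, not from the slides):

```python
import numpy as np

def tweet_features(words, embeddings, dim=50):
    """Concatenate min, max, and average pooling over the words' SSWE vectors."""
    vecs = np.array([embeddings[w] for w in words if w in embeddings])
    if len(vecs) == 0:            # no in-vocabulary word: fall back to zeros
        vecs = np.zeros((1, dim))
    return np.concatenate([vecs.min(axis=0),
                           vecs.max(axis=0),
                           vecs.mean(axis=0)])   # 3 * dim tweet features
```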
Collobert and Weston (C&W) 2011
[Diagram: a window over the text "it is so coooll :)" is mapped through a lookup table 𝐿𝑇𝑤 (embeddings of dimension 𝐷), concatenated, and passed through a linear layer 𝑀1, a HardTanh layer of size 𝐻, and a second linear layer 𝑀2 with output size C = 1, yielding a language-model score for the loss function]
9
Collobert and Weston (C&W) 2011
The model is trained to score an original ngram higher than a corrupted ngram in which the center word is replaced by a random word (loss given below).
10
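The ranking objective is a hinge loss over the original ngram t and its corrupted version t^r, where f^{cw} is the scalar score produced by the network above:

loss_{cw}(t, t^r) = \max\big(0,\; 1 - f^{cw}(t) + f^{cw}(t^r)\big)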
Model 1: SSWE Hard
[Diagram: the C&W window architecture from the previous slides, repeated as the starting point for the sentiment models]
• Intuition
– Use the sentiment polarity of sentences (e.g. tweets) to learn the sentiment-specific word embedding (SSWE)
• Solution
– Predict the sentiment polarity of the text
11
[Diagram: the window architecture topped with a softmax layer of size C = 2; the predicted distribution over positive/negative is trained to match the gold distribution, [1, 0] for positive tweets and [0, 1] for negative ones; loss given below]
12
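Writing f^h(t) for the predicted softmax distribution and f^g(t) ∈ {[1, 0], [0, 1]} for the gold distribution, SSWE Hard minimizes the cross entropy between the two:

loss_h(t) = -\sum_{k \in \{0, 1\}} f_k^{g}(t) \cdot \log f_k^{h}(t)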
Model 2: SSWE Soft
• Intuition
– Use the sentiment polarity of sentences to learn the sentiment-specific word embedding
• Solution
– Soften the hard constraints of Model 1
13
[Diagram, Model 1 vs. Model 2: Model 1 forces the predicted distribution to match a hard [1, 0] (positive) or [0, 1] (negative); Model 2 only requires the correct ranking of the two scores, e.g. [1.7, 0.2] with P > N for a positive tweet and [0.3, 3.8] with P < N for a negative one]
[Diagram: the window architecture with a linear output layer of size C = 2, producing a positive score and a negative score; the loss function uses an indicator function over the tweet's gold polarity; loss given below]
14
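Writing f_0^{r}(t) and f_1^{r}(t) for the positive and negative scores, and δ_s(t) for the indicator that is 1 when the gold polarity of t is positive and -1 when it is negative, SSWE Soft uses the ranking loss:

loss_r(t) = \max\big(0,\; 1 - \delta_s(t)\, f_0^{r}(t) + \delta_s(t)\, f_1^{r}(t)\big)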
Model 3: SSWE Unified
• Intuition
– Use both the syntactic contexts of words and the sentiment polarity of sentences to learn the sentiment-specific word embedding
• Solution
– A hybrid model that captures both kinds of information
15
[Example: "he formed the good habit of …" carries both context (the surrounding words) and sentiment (the polarity of "good")]
[Diagram: the window architecture with C = 2; the overall loss function combines a syntactic (context) loss with a sentiment loss; loss given below]
16
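SSWE Unified interpolates the C&W syntactic hinge loss with the corresponding sentiment ranking loss, both computed over the original ngram t and its corrupted version t^r, with a weight α trading off the two parts:

loss_u(t, t^r) = \alpha \cdot loss_{cw}(t, t^r) + (1 - \alpha) \cdot loss_{us}(t, t^r)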
Embedding Training
• Data
– Tweets containing positive/negative emoticons [Hu et al., 2013]
– 5M positive and 5M negative tweets from April 2013
• Back-propagation + AdaGrad [Duchi 2011] (update sketched below)
– Embedding length = 50
– Window size = 3
– Learning rate = 0.1
17
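A minimal sketch of the AdaGrad update used with back-propagation (the parameter and gradient names are mine; the slides only fix the learning rate):

```python
import numpy as np

def adagrad_update(theta, grad, cache, lr=0.1, eps=1e-8):
    """AdaGrad [Duchi 2011]: per-dimension step size shrinks with the
    accumulated squared gradient."""
    cache += grad ** 2                           # running sum of squared gradients
    theta -= lr * grad / (np.sqrt(cache) + eps)  # scaled gradient step
    return theta, cache
```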
Roadmap
• Motivation
• The Proposed Method
• Experiments
– Twitter Sentiment Classification
– Word Similarity of Sentiment Lexicons
• Conclusion
18
Twitter Sentiment Classification
• Setting
– Data
• Twitter Sentiment Classification track in SemEval-2013 (message-level)
• Positive vs. negative classification
– Evaluation metric
• Macro-F1 over positive vs. negative
19
Results
• Comparison with Different Embeddings
[Bar chart: macro-F1 of sentiment classifiers built on different word embeddings; y-axis from 0.64 to 0.88]
20
Results
• Comparison with Twitter Sentiment Classification Algorithms
21
SemEval 2014 Task 9 (b)
• Coooolll: a deep-learning system for Twitter sentiment classification
[Diagram: massive tweets feed embedding learning to produce the SSWE feature (dimensions 1 … N), which is concatenated with the hand-crafted STATE features (dimensions N+1 … N+K: all-cap, emoticon, elongated, …); the combined feature representation, training data, and a learning algorithm produce the sentiment classifier; see the sketch below]
22
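A sketch of how the hybrid feature vector could be assembled; `tweet_features` is the pooling sketch from the earlier slide, and `state_features` is a hypothetical extractor standing in for the hand-crafted all-cap/emoticon/elongated features:

```python
import numpy as np

def coooolll_features(words, embeddings, state_features):
    """Concatenate SSWE pooling features (dims 1..N) with hand-crafted
    STATE features (dims N+1..N+K)."""
    sswe = tweet_features(words, embeddings)  # dims 1..N
    state = state_features(words)             # dims N+1..N+K
    return np.concatenate([sswe, state])
```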
SemEval 2014 Task 9 (b)
• Results
– Our system Coooolll ranked 2nd among 45 systems on the Twitter2014 test set.
23
Roadmap
• Motivation
• The Proposed Method
• Experiments
– Twitter Sentiment Classification
– Word Similarity of Sentiment Lexicons
• Conclusion
25
Experiment Settings
• Setting
– Evaluation metric: accuracy of the polarity consistency between each lexicon word and its closest words in the embedding space (see the example and sketch below)
– Data: sentiment lexicons
Example: among the 10 words closest to "good" (cool, awesome, great, bad, nice, interesting, fantastic, love, excellent, terrible), 8 share its positive polarity, so accuracy = 8/10 = 80%.
26
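A minimal sketch of this metric as the example suggests; the neighborhood size (10) matches the example, while the use of cosine similarity is an assumption:

```python
import numpy as np

def lexicon_accuracy(lexicon, embeddings, k=10):
    """Mean fraction of each lexicon word's k nearest neighbors (within the
    lexicon, by cosine similarity) that share its polarity.
    `lexicon` maps word -> polarity; `embeddings` maps word -> vector."""
    words = [w for w in lexicon if w in embeddings]
    vecs = np.array([embeddings[w] for w in words])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    sims = vecs @ vecs.T
    accs = []
    for i, w in enumerate(words):
        neighbors = np.argsort(-sims[i])[1:k + 1]        # skip the word itself
        same = sum(lexicon[words[j]] == lexicon[w] for j in neighbors)
        accs.append(same / k)
    return float(np.mean(accs))
```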
Results
• Evaluation
– Word similarity of sentiment lexicons
27
Conclusion
• Learn continuous representations of words for Twitter sentiment classification.
• Develop three neural networks that learn sentiment-specific word embeddings (SSWE) from massive tweets without manual annotation.
• The effectiveness of SSWE is verified on Twitter sentiment classification and on word-similarity judgment over sentiment lexicons.
28
Thanks
29