SentiStrength: Sentiment Strength Detection in MySpace and Twitter

Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK

Virtual Knowledge Studio (VKS)

Information Studies

SentiStrength Objective

1. Detect positive and negative sentiment strength in short informal text
   1. Develop workarounds for the lack of standard grammar and spelling
   2. Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
   3. Classify simultaneously as positive 1-5 AND negative 1-5 sentiment
2. Apply to MySpace comments and social issues

SentiStrength Algorithm - Core

List of 890 positive and negative sentiment terms with strengths of magnitude 1 to 5, e.g. ache = -2, dislike = -3, hate = -4, excruciating = -5, encourage = 2, coolest = 3, lover = 4

The sentiment strength of a sentence is that of its strongest term; for a text with multiple sentences, it is that of the strongest sentence

Examples (result given as positive, negative)

My legs ache. (ache = -2) => 1, -2

You are the coolest. (coolest = 3) => 3, -1

I hate Paul but encourage him. (hate = -4, encourage = 2) => 2, -4
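The core rule can be sketched in a few lines of Python. This is a minimal illustration, not the SentiStrength implementation: the dictionary holds only the seven terms quoted on the slide, and the function names, the tokenising regex, and the sentence splitter are assumptions made for the example.

```python
import re

# Excerpt of the term list quoted on the slide; the full list has 890 terms.
TERM_STRENGTHS = {
    "ache": -2, "dislike": -3, "hate": -4, "excruciating": -5,
    "encourage": 2, "coolest": 3, "lover": 4,
}

def sentence_strength(sentence, terms=TERM_STRENGTHS):
    """Return (positive, negative): the strongest term of each polarity."""
    positive, negative = 1, -1  # 1 and -1 mean "no sentiment" on the two scales
    for word in re.findall(r"[a-z']+", sentence.lower()):  # assumed tokeniser
        strength = terms.get(word, 0)
        if strength > 0:
            positive = max(positive, strength)
        elif strength < 0:
            negative = min(negative, strength)
    return positive, negative

def text_strength(text, terms=TERM_STRENGTHS):
    """Return the strongest positive and negative scores over all sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_strength(s, terms) for s in sentences] or [(1, -1)]
    return max(p for p, _ in scores), min(n for _, n in scores)

for example in ("My legs ache.", "You are the coolest.",
                "I hate Paul but encourage him."):
    print(example, text_strength(example))
```

The three slide examples come out as (1, -2), (3, -1) and (2, -4), matching the scores listed above.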

Term Strength Optimisation

Term strengths (e.g., ache = -2) initially fixed by human coder

Term strengths optimised on training set with 10-fold cross-validation

Adjust term strengths to give best training set results, then evaluate on test set

E.g., training set: “My legs ache”: coder sentiment = 1, -3 => adjust sentiment of “ache” from -2 to -3.
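One way to picture the adjustment step is a simple hill climb: nudge a term strength by one point and keep the change only if more training texts are classified exactly right. The sketch below is an assumption about how such a search could look, not the published procedure; `classify`, `count_correct` and `optimise_terms` are hypothetical names, and the 10-fold cross-validation wrapper is left out.

```python
import re

def classify(text, terms):
    """Strongest positive and negative term in the text (1 / -1 if none)."""
    scores = [terms.get(w, 0) for w in re.findall(r"[a-z']+", text.lower())]
    positive = max([1] + [s for s in scores if s > 0])
    negative = min([-1] + [s for s in scores if s < 0])
    return positive, negative

def count_correct(terms, training_set):
    """Number of texts whose predicted pair equals the coder's pair."""
    return sum(classify(text, terms) == coded for text, coded in training_set)

def optimise_terms(terms, training_set):
    """Hill climb: change one strength by +/-1 whenever that improves accuracy."""
    terms = dict(terms)
    improved = True
    while improved:
        improved = False
        for word in list(terms):
            base = count_correct(terms, training_set)
            for delta in (1, -1):
                candidate = terms[word] + delta
                if not 1 <= abs(candidate) <= 5:  # stay on the 1..5 magnitude scale
                    continue
                trial = {**terms, word: candidate}
                if count_correct(trial, training_set) > base:
                    terms, improved = trial, True
                    break
    return terms

# Slide example: the coder gave "My legs ache" the scores 1, -3.
print(optimise_terms({"ache": -2}, [("My legs ache", (1, -3))]))
```

On the slide example the search moves "ache" from -2 to -3 and then stops, because no further one-point change classifies more texts correctly.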

SentiStrength Algorithm - Extra

Spelling correction for repeated letters: Helllllo -> Hello (emphasis: llll)

Tagging approach used (see next slide)

Extra heuristics

Emphasis acts to enhance + or - emotion

Emotion words ignored in questions

Take strongest positive or negative expression in whole comment

Booster words (e.g., very, some)
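Three of these heuristics (booster words, negation switching as in "not happy", and ignoring emotion words in questions) can be sketched as follows. Everything numeric here is an assumption for illustration: the slides do not say how much a booster adds or exactly how negation rescales a term, the strength of "happy" is invented, and only "hate = -4" and the booster "very" come from the slides.

```python
import re

TERMS = {"happy": 2, "hate": -4}  # "happy" = 2 is an assumed strength
BOOSTERS = {"very": 1}            # assumed: a booster adds 1 to the next term
NEGATORS = {"not"}                # assumed negating word list

def score_sentence(sentence):
    """Return (positive, negative) for one sentence with the extra heuristics."""
    if sentence.rstrip().endswith("?"):
        return 1, -1  # emotion words are ignored in questions
    positive, negative = 1, -1
    boost, negate = 0, False
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in BOOSTERS:
            boost += BOOSTERS[word]
        elif word in NEGATORS:
            negate = True
        else:
            strength = TERMS.get(word, 0)
            if strength:
                magnitude = min(5, abs(strength) + boost)
                sign = 1 if strength > 0 else -1
                if negate:
                    sign = -sign  # assumed: negation switches the polarity
                if sign > 0:
                    positive = max(positive, magnitude)
                else:
                    negative = min(negative, -magnitude)
            boost, negate = 0, False  # modifiers apply only to the next word
    return positive, negative

print(score_sentence("I am very happy"))  # boosted positive
print(score_sentence("I am not happy"))   # switched to negative
print(score_sentence("Do you hate me?"))  # question, so neutral
```

Resetting the modifiers after every ordinary word is a design choice of this sketch: it keeps a booster or negator from affecting a sentiment term several words later.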

Tagging

Input: HIIIIII MY MATE!!!!!!!!

Tagged: <w equiv="HI" em="IIIII">HIIIIII</w><w>MY</w><w>MATE</w><p equiv="!" em="!!!!!!!">!!!!!!!!</p>

Normalised: HI MY MATE! (strengths shown on slide: 2, 3)

mate = 2

Overall 3, -1
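The `equiv` and `em` attributes can be reproduced with a small regex sketch. It assumes that a run of three or more identical characters counts as emphasis, that the run collapses to a double letter when that spelling is in a supplied word list and to a single character otherwise, and that `em` holds the run minus its first character (which matches both slide examples). The function name `tag_token` and the word list are hypothetical.

```python
import re

RUN = re.compile(r"(.)\1{2,}")  # one character repeated three or more times

def tag_token(token, dictionary=frozenset()):
    """Return (equiv, em): the normalised token and its surplus characters."""
    single = RUN.sub(lambda m: m.group(1), token)      # "HIIIIII" -> "HI"
    double = RUN.sub(lambda m: m.group(1) * 2, token)  # "Helllllo" -> "Hello"
    equiv = double if double.lower() in dictionary else single
    em = "".join(m.group(0)[1:] for m in RUN.finditer(token))
    return equiv, em

print(tag_token("HIIIIII"))                # ('HI', 'IIIII')
print(tag_token("!!!!!!!!"))               # ('!', '!!!!!!!')
print(tag_token("Helllllo", {"hello"}))    # ('Hello', 'llll')
```

The dictionary lookup is what lets "Helllllo" become "Hello" rather than "Helo"; without a word list the sketch always collapses to a single character.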

Experiments

Development data = 2600 MySpace comments coded by 1 coder

Test data = 1041 MySpace comments coded by 3 independent coders

Comparison against a range of standard machine learning algorithms

Inter-coder agreement

Comparison       +ve agreement   -ve agreement
Coder 1 vs. 2    51.0%           67.3%
Coder 1 vs. 3    55.7%           76.3%
Coder 2 vs. 3    61.4%           68.2%

Krippendorff’s inter-coder weighted alpha = 0.5743 for positive and 0.5634 for negative sentiment

Only moderate agreement between coders, but it is a hard 5-category task

Machine learning methods +ve

Machine learning methods -ve

Results: +ve sentiment strength

Algorithm                    Opt. feat.   Accuracy   Acc. +/- 1 class   Corr.   Mean % abs. error
SentiStrength                -            60.6%      96.9%              .599    22.0%
Simple logistic regression   700          58.5%      96.1%              .557    23.2%
SVM (SMO)                    800          57.6%      95.4%              .538    24.4%
J48 classification tree      700          55.2%      95.9%              .548    24.7%
JRip rule-based classifier   700          54.3%      96.4%              .476    28.2%
SVM regression (SMO)         100          54.1%      97.3%              .469    28.2%
AdaBoost                     100          53.3%      97.5%              .464    28.5%
Decision table               200          53.3%      96.7%              .431    28.2%
Multilayer Perceptron        100          50.0%      94.1%              .422    30.2%
Naïve Bayes                  100          49.1%      91.4%              .567    27.5%
Baseline                     -            47.3%      94.0%              -       31.2%
Random                       -            19.8%      56.9%              .016    82.5%

Results: -ve sentiment strength

Algorithm                    Opt. feat.   Accuracy   Acc. +/- 1 class   Corr.   Mean % abs. error
SVM (SMO)                    100          73.5%      92.7%              .421    16.5%
SVM regression (SMO)         300          73.2%      91.9%              .363    17.6%
Simple logistic regression   800          72.9%      92.2%              .364    17.3%
SentiStrength                -            72.8%      95.1%              .564    18.3%
Decision table               100          72.7%      92.1%              .346    17.0%
JRip rule-based classifier   500          72.2%      91.5%              .309    17.3%
J48 classification tree      400          71.1%      91.6%              .235    18.8%
Multilayer Perceptron        100          70.1%      92.5%              .346    20.0%
AdaBoost                     100          69.9%      90.6%              -       16.8%
Baseline                     -            69.9%      90.6%              -       16.8%
Naïve Bayes                  200          68.0%      89.8%              .311    27.3%
Random                       -            20.5%      46.0%              .010    157.7%
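The four columns reported in the results tables can be computed along the following lines. This is a sketch under stated assumptions: strengths are passed as magnitudes 1 to 5, the function name `evaluate` is hypothetical, and the formula used for mean % absolute error (absolute error divided by the coder's value, averaged) is an assumption, since the slides do not define it.

```python
from math import sqrt

def evaluate(predicted, coded):
    """Accuracy, accuracy within one class, Pearson correlation, mean % abs. error."""
    n = len(coded)
    pairs = list(zip(predicted, coded))
    accuracy = sum(p == c for p, c in pairs) / n
    within_one = sum(abs(p - c) <= 1 for p, c in pairs) / n
    mean_p, mean_c = sum(predicted) / n, sum(coded) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in pairs)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_c = sum((c - mean_c) ** 2 for c in coded)
    # Pearson correlation is undefined when either series is constant.
    correlation = cov / sqrt(var_p * var_c) if var_p and var_c else None
    # Assumed definition: error relative to the coder's value, averaged.
    mean_pct_abs_error = sum(abs(p - c) / c for p, c in pairs) / n
    return {
        "accuracy": accuracy,
        "within_one": within_one,
        "correlation": correlation,
        "mean_pct_abs_error": mean_pct_abs_error,
    }

print(evaluate([1, 2, 3, 5], [1, 3, 3, 2]))
```

A classifier that always predicts the same class has no defined correlation, so the sketch returns `None` for that column in that case.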

SentiStrength Components

Type                                                %
Consecutive +ve words not used as boosters          61.2
Emoticons ignored                                   61.2
Negating words not used to switch (e.g., not happy) 61.0
SentiStrength standard configuration                60.9
Booster words ignored (e.g., very)                  60.7
Automatic spelling correction disabled              60.6
Exclamation marks not given a strength of 2         60.6
Extra multiple letters not used as boosters         60.4
Neutral words with emphasis not counted as +ve      60.1
SentiStrength with all the above changes            57.5

Example differences/errors

THINK 4 THE ADD Computer (1,-1), Human (2,-1)

0MG 0MG 0MG 0MG 0MG 0MG 0MG 0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!! Computer (2,-1), Human (5,-1)

Selected variations tested

Modification (for positive sentiment)                                      Accuracy   +/- 1 class   Corr.   Mean abs. % err.
Negating words not used to switch following sentiment (e.g., not happy)    60.87%     97.50%        .6206   21.28%
SentiStrength standard algorithm                                           60.64%     96.90%        .5986   21.96%
Exclamation marks not given a strength of 2                                60.51%     96.62%        .6035   21.47%
Automatic spelling correction disabled                                     60.39%     96.88%        .5961   22.05%
Extra multiple letters not used as emotion boosters                        60.21%     96.81%        .5952   22.16%
Neutral words with emphasis not counted as positive emotion                60.13%     96.79%        .5966   21.90%
SentiStrength with no extras                                               57.44%     96.07%        .6073   21.91%

Application - Evidence of emotion homophily in MySpace

Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends

Correlation of 0.227 for +ve emotion strength and 0.254 for -ve

People tend to use similar but not identical levels of emotion to their friends in messages

CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs

Collective Emotions in Cyberspace

Sentistrength

Application – sentiment in Twitter events

Analysis of a corpus of 1 month of English Twitter posts

Automatic detection of spikes (events)

Sentiment strength classification of all posts

Assessment of whether sentiment strength increases during important events

Result: negative sentiment normally increases, positive sentiment might tend to increase

Automatically-identified Twitter spikes

Chile

Hawaii

#oscars

Tiger Woods

Conclusion

Automatic classification of emotion on a 5-point positive and negative scale seems possible for MySpace…

And other similar short computer text messages?

Hard to get accuracy much over 60%?

Next = analyse emotion in online debates

Publication

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (in press). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology.

Thelwall, M., Wilkinson, D., & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace. Journal of the American Society for Information Science and Technology, 61(1), 190-199.
