Sentiment Lexicon Creation from Lexical Resources

BIS 2011

Bas Heerschop, Alexander Hogenboom, Flavius Frasincar

Erasmus School of Economics, Erasmus University Rotterdam

June 15, 2011


Outline

• Introduction

• Sentiment Lexicon Creation

• Framework

• Performance

• Conclusions

• Future Work

Introduction (1)

• The Web offers an overwhelming amount of textual data, containing traces of sentiment

• Insight into sentiment is crucial for, e.g., financial markets, reputation management, and marketing

• The challenge of automatically extracting sentiment from an ever-growing amount of data can be addressed by sentiment mining techniques

• Sentiment mining is typically focused on determining the polarity of natural language texts

Introduction (2)

• Existing sentiment mining approaches are typically based on word frequencies, yet there is a growing tendency to also take various other aspects of content into account

• Most approaches rely on lists of words and their sentiment scores: sentiment lexicons

• Existing lexicon creation methods have been assessed against a manually created lexicon, but have not yet been properly compared with one another

• Which sentiment lexicon creation method performs well in the actual sentiment mining process?

Sentiment Lexicon Creation (1)

• Manual creation is cumbersome

• Alternative: exploiting (vast) lexical resources

• A popular lexical resource is WordNet:
  – Freely available, online semantic lexical resource
  – Designed to be used under program control
  – Organized into sets of synonyms (synsets)
  – Synsets are linked to one another through several relations (e.g., synonymy, antonymy, hyponymy, or meronymy)

Sentiment Lexicon Creation (2)

• Possible method: traversing relations in lexical resource (Kim and Hovy 2004, Hu and Liu 2004, Lerman et al. 2009)

• Start with a manually created seed set, assigning a score of 1 to positive synsets and -1 to negative synsets

• Iteratively propagate sentiment to related synsets (using WordNet relations)

• Weaken the propagated score in each iteration

• Resulting scores range from -1 (very negative) to 1 (very positive)
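The propagation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Kim and Hovy, Hu and Liu, or Lerman et al.: the toy graph and its synset names are hypothetical, the decay factor of 0.5 and the three iterations are arbitrary choices, antonymy links are assumed to flip the sign, and a synset keeps the first score it receives.

```python
# Hypothetical toy graph: synset -> list of (related synset, sign).
# sign = +1 for relations assumed to preserve polarity (e.g., synonymy),
# sign = -1 for antonymy (assumption: antonyms get the opposite polarity).
Graph = dict[str, list[tuple[str, int]]]


def propagate(graph: Graph, seeds: dict[str, float],
              decay: float = 0.5, max_iter: int = 3) -> dict[str, float]:
    """Propagate seed scores (+1 / -1) breadth-first through synset relations.

    Each iteration reaches synsets one relation further from the seeds and
    multiplies the inherited score by `decay`, so scores stay in [-1, 1].
    """
    scores = dict(seeds)
    frontier = list(seeds)
    for _ in range(max_iter):
        next_frontier = []
        for synset in frontier:
            for neighbor, sign in graph.get(synset, []):
                if neighbor in scores:
                    # keep the earliest score, which is at least as strong
                    # as any score arriving in a later iteration
                    continue
                scores[neighbor] = scores[synset] * sign * decay
                next_frontier.append(neighbor)
        frontier = next_frontier
    return scores


toy_graph: Graph = {
    "good": [("nice", 1), ("bad", -1)],
    "nice": [("pleasant", 1)],
    "bad": [("awful", 1)],
}
scores = propagate(toy_graph, {"good": 1.0})
print(scores)
```

With these settings, synsets one relation away from the seed receive ±0.5 and synsets two relations away receive ±0.25, which is the weakening per iteration described above.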

Sentiment Lexicon Creation (3)

Sentiment Lexicon Creation (4)

• Alternative: PageRank-based propagation to similar synsets (Esuli and Sebastiani 2007)

• Synsets are linked by means of the words (references to synsets) used in their glosses (descriptions)

• Iteratively update the sentiment of each synset as a weighted average of a constant and the sentiment of the synsets linked to it, where each linked synset's contribution is divided by its total number of associations (using the gloss-based synset relations of Extended WordNet)

• Run the algorithm separately for a manually created positive seed set and a manually created negative seed set, and combine the obtained scores into scores ranging from -1 (very negative) to 1 (very positive)
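A minimal sketch of the PageRank-based scheme, assuming a tiny hypothetical gloss graph in which an edge from synset j to synset i means that the gloss of i mentions a word of j. The damping weight alpha = 0.85, the uniform constant over the seeds, the fixed 50 iterations, and the way the positive and negative runs are combined (each scaled by its maximum, then subtracted) are assumptions for illustration; they are not specified above.

```python
# Hypothetical gloss graph: out_links[j] lists the synsets whose glosses
# mention a word of synset j, so sentiment flows from j to those synsets.
Links = dict[str, list[str]]


def all_nodes(out_links: Links) -> set[str]:
    return set(out_links) | {t for targets in out_links.values() for t in targets}


def pagerank(out_links: Links, nodes: set[str], seeds: set[str],
             alpha: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """a_i = alpha * sum over j linking to i of a_j / |F(j)| + (1 - alpha) * e_i."""
    e = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    a = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, targets in out_links.items():
            for target in targets:
                # a synset spreads its score evenly over its outgoing links
                incoming[target] += a[source] / len(targets)
        a = {n: alpha * incoming[n] + (1 - alpha) * e[n] for n in nodes}
    return a


def lexicon_scores(out_links: Links, pos_seeds: set[str],
                   neg_seeds: set[str]) -> dict[str, float]:
    nodes = all_nodes(out_links) | pos_seeds | neg_seeds
    pos = pagerank(out_links, nodes, pos_seeds)
    neg = pagerank(out_links, nodes, neg_seeds)
    # Assumed combination: scale each run by its maximum, then subtract,
    # which yields scores in [-1, 1].
    max_pos, max_neg = max(pos.values()), max(neg.values())
    return {n: pos[n] / max_pos - neg[n] / max_neg for n in nodes}


gloss_links: Links = {
    "good": ["nice", "fine"],
    "nice": ["fine"],
    "bad": ["awful"],
}
print(lexicon_scores(gloss_links, {"good"}, {"bad"}))
```

Dividing by |F(j)| keeps a synset that appears in many glosses from dominating the synsets it links to, which is the normalization by the number of associations described above.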

Sentiment Lexicon Creation (5)

Sentiment Lexicon Creation (6)

Sentiment Lexicon Creation (7)

• Alternatively, glosses can be analyzed by means of classifiers: SentiWordNet (Esuli and Sebastiani 2006)

• Synsets are classified as objective, positive, or negative by eight ternary classifiers

• Each synset's objectivity, positivity, and negativity scores are calculated as the proportion of classifiers assigning the respective label

• The sentiment score is obtained by subtracting the negativity score from the positivity score, yielding scores ranging from -1 (very negative) to 1 (very positive)

• The classifiers differ in their training data (different expansions of the seed set using WordNet relations) and in their learning approach (Support Vector Machines or Rocchio classifiers)
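The scoring step can be illustrated as follows. The votes are hypothetical and the helper `swn_scores` is not part of any SentiWordNet distribution; it only shows how the proportions of the three labels over eight classifiers turn into objectivity, positivity, and negativity scores and a single sentiment score.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")


def swn_scores(votes: list[str]) -> dict[str, float]:
    """Turn the labels assigned by a committee of ternary classifiers into scores."""
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    counts = Counter(votes)
    scores = {label: counts[label] / len(votes) for label in LABELS}
    # Positivity minus negativity yields a sentiment score in [-1, 1].
    scores["sentiment"] = scores["positive"] - scores["negative"]
    return scores


# Hypothetical votes of eight classifiers for one synset.
votes = ["positive"] * 5 + ["negative"] + ["objective"] * 2
print(swn_scores(votes))
```

Here five of eight classifiers vote positive and one votes negative, so the synset gets a positivity of 0.625, a negativity of 0.125, and a sentiment score of 0.5.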

Sentiment Lexicon Creation (8)

Framework

• The framework covers sentiment lexicon creation and subsequent lexicon-based document scoring

• Document scoring first processes each sentence, applying Part-of-Speech (POS) tagging, lemmatization, and Word Sense Disambiguation (WSD) to its words

• Words are then assigned scores in the range [-1,1], retrieved from the sentiment lexicon

• The sum of word scores is used to classify a document as positive (1) or negative (-1)
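A sketch of the lexicon-based scoring step, assuming the POS tagging, lemmatization, and WSD stages have already mapped each word to a sense identifier. The identifier format, the lexicon values, and the tie-breaking rule for a sum of exactly 0 are hypothetical; the framework as described only states that the sum of word scores determines the label.

```python
# Hypothetical lexicon keyed by disambiguated word senses ("lemma#pos#sense").
Lexicon = dict[str, float]


def classify(senses: list[str], lexicon: Lexicon) -> int:
    """Sum the lexicon scores of a document's word senses and classify it.

    `senses` are assumed to be the output of POS tagging, lemmatization,
    and WSD; senses missing from the lexicon contribute 0. Labeling a sum
    of exactly 0 as positive is an assumption.
    """
    total = sum(lexicon.get(sense, 0.0) for sense in senses)
    return 1 if total >= 0 else -1


lexicon: Lexicon = {"good#a#1": 0.75, "boring#a#1": -0.5, "plot#n#1": 0.0}
review = ["plot#n#1", "good#a#1", "boring#a#1", "boring#a#1"]
print(classify(review, lexicon))  # 0.75 - 0.5 - 0.5 = -0.25, prints -1
```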

Performance (1)

• Implementation in C#, Microsoft SQL Server database, OpenNLP-based POS tagger, WordNet.Net API for lemmatization and WSD

• Evaluation on 1,000 positive and 1,000 negative English movie reviews (Pang and Lee 2004), comparing:
  – Traversing WordNet relations (WN)
  – PageRank-based propagation of the seed set (PRS), and the same propagation bootstrapped with SentiWordNet scores (PRSWN)
  – SentiWordNet (SWN)

• Evaluation measures: per-class precision, recall, and F1, as well as overall accuracy and macro-averaged F1

Performance (2)

Method: Positive (Precision / Recall / F1); Negative (Precision / Recall / F1); Overall (Accuracy / Macro F1)

WN: 51.0% / 94.3% / 66.2%; 62.3% / 9.4% / 16.3%; 51.9% / 41.3%

PRS: 49.8% / 86.6% / 63.3%; 48.6% / 12.5% / 19.9%; 49.7% / 41.6%

PRSWN: 49.6% / 43.0% / 46.1%; 49.7% / 56.3% / 52.8%; 49.7% / 49.4%

SWN: 56.3% / 84.3% / 67.5%; 68.8% / 34.6% / 46.0%; 57.5% / 58.8%
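How the evaluation measures relate can be checked on the WN row. Assuming every review receives a label (the framework classifies each document as positive or negative), the recall figures on 1,000 reviews per class fully determine the confusion matrix; the sketch below recomputes the remaining WN figures from those counts, and they agree with the table to one decimal place. The helper name `class_metrics` is illustrative only.

```python
def class_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts implied by the WN row, assuming every one of the 1,000 positive and
# 1,000 negative reviews receives a label.
pos_as_pos, pos_as_neg = 943, 57   # positive reviews (recall 94.3%)
neg_as_neg, neg_as_pos = 94, 906   # negative reviews (recall 9.4%)

pos = class_metrics(pos_as_pos, fp=neg_as_pos, fn=pos_as_neg)
neg = class_metrics(neg_as_neg, fp=pos_as_neg, fn=neg_as_pos)
accuracy = (pos_as_pos + neg_as_neg) / 2000
macro_f1 = (pos[2] + neg[2]) / 2  # unweighted mean of the per-class F1 scores

for name, (p, r, f) in (("positive", pos), ("negative", neg)):
    print(f"{name}: P={p:.1%} R={r:.1%} F1={f:.1%}")
print(f"accuracy={accuracy:.1%} macro F1={macro_f1:.1%}")
```

Macro-averaged F1 weights both classes equally, so a classifier that labels nearly everything positive (as WN does) is penalized even though its positive-class F1 is high.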

Conclusions

• Many existing sentiment mining approaches rely on sentiment lexicons, which can be created from lexical resources in various ways

• We have evaluated three approaches to sentiment lexicon creation: exploiting semantic relations, PageRank-based propagation, and machine learning (SentiWordNet)

• Overall, SentiWordNet outperforms the other methods on our corpus, yet PageRank-based propagation bootstrapped with SentiWordNet scores (PRSWN) yields the least biased sentiment classifier, with the most balanced recall on positive and negative reviews

Future Work

• Investigate sentiment lexicon creation methods yielding less biased classifiers

• Develop and assess other sentiment lexicon creation methods, e.g., by propagating document scores to word scores

• Compare the performance of different methods against a manually created lexicon such as Micro-WN(Op)

Questions?

• Feel free to contact: Alexander Hogenboom, Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR, The Netherlands

References

• Esuli, A., Sebastiani, F.: SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. In: 5th Conference on Language Resources and Evaluation (LREC 2006), European Language Resources Association (ELRA) (2006) 417–422

• Esuli, A., Sebastiani, F.: PageRanking WordNet Synsets: An Application to Opinion Mining. In: 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), ACL (2007) 424–431

• Hu, M., Liu, B.: Mining and Summarizing Customer Reviews. In: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), ACM (2004) 168–177

• Kim, S., Hovy, E.: Determining the Sentiment of Opinions. In: 20th International Conference on Computational Linguistics (COLING 2004), ACL (2004) 1367

• Lerman, K., Blair-Goldensohn, S., McDonald, R.: Sentiment Summarization: Evaluating and Learning User Preferences. In: 12th Conference of the European Chapter of the ACL (EACL 2009), ACL (2009) 514–522

• Pang, B., Lee, L.: A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts. In: 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), ACL (2004) 271–280
