andrew-snow
CSC 594 Topics in AI –Text Mining and Analytics
Fall 2015/16
10. Sentiment Analysis
• Sentiment analysis extracts and identifies the polarity of sentiments expressed in text.
• Lately, sentiment analysis has been widely applied to reviews, opinion pieces, and social media text.
• But sentiment analysis poses many challenges, e.g.:
1. Judgement of sentiment (existence, degree/granularity) is not clear-cut.
2. Sentiment depends on the domain and context (e.g. “addictive” can be praise for a game but a warning elsewhere).
3. Sentences with negations (“not”, “no”, “__n’t”, etc.).
4. Sentences with comparatives (“A is better than B, but still has problems”).
5. User texts contain spelling errors, irregular typography (e.g. emoticons), and ungrammatical sentences.
6. Words/expressions that imply sentiments are subtle (sentiment lexicon).
7. Multiple sentiments could be expressed in one sentence/document.
8. Possibility of sarcasm.
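To make challenge 3 concrete, here is a minimal sketch of one common workaround: prefix every token that follows a negation word with `NOT_` until the next punctuation mark, so that "like" and "NOT_like" become different features. The function name `mark_negation`, the negation word list, and the tokenizer regex are illustrative assumptions, not something the slides prescribe.

```python
import re

# Illustrative negation cues; a real system would use a longer list.
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def mark_negation(text):
    """Prefix tokens inside a negation scope with NOT_.

    The scope starts after a negation word (or a token ending in n't)
    and ends at the next punctuation mark.
    """
    # Normalize curly apostrophes so that "didn’t" matches the n't rule.
    text = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text)
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation closes the scope
            marked.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True           # open a new negation scope
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked


print(mark_negation("I didn't like this movie, but the music was good."))
```

This heuristic is crude: it does not handle double negation or scopes that cross punctuation, which is part of why negation remains a challenge.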
Sentiment Analysis
Supervised:
• Classify documents into sentiment categories (positive, negative, neutral, etc.).
• Goals/End Products:
– Predictive models for sentiment categorization.
– “Important/relevant features” that determine the sentiments: look at the features that are weighted most heavily in the resulting model.
• Text Pre-processing:
– Standard pre-processing – stemming/lemmatizing, removing stop words.
– Part-of-speech tagging – often focus on adjectives and nouns.
– Term weighting.
– N-grams or noun groups/phrases – a unigram is often too small a unit.
• Common techniques (in machine learning):
– Typical classification algorithms, such as SVM, Decision Tree, KNN.
– Naïve Bayes (as with general text classification).
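The supervised workflow above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: the six training reviews, the stop-word list, and all function names (`features`, `train`, `predict`, `top_features`) are invented for illustration, stemming and POS tagging are skipped, and multinomial Naïve Bayes with add-one smoothing stands in for whichever classifier a real project would choose. It shows both end products: a predictive model, and the features the model weights most heavily toward a class.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}

# Toy labeled data, invented for this example.
TRAIN = [
    ("great graphics and awesome gameplay", "pos"),
    ("awesome story great characters", "pos"),
    ("I love the great soundtrack", "pos"),
    ("terrible controls and boring levels", "neg"),
    ("boring story terrible graphics", "neg"),
    ("I hate the boring missions", "neg"),
]


def features(text):
    """Lowercase, drop stop words, return unigrams plus bigrams."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def train(docs):
    """Count documents per label and feature occurrences per label."""
    label_counts, feat_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in docs:
        label_counts[label] += 1
        for f in features(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feat_counts, vocab


def log_likelihood(f, label, feat_counts, vocab):
    """log P(feature | label) with add-one (Laplace) smoothing."""
    total = sum(feat_counts[label].values())
    return math.log((feat_counts[label][f] + 1) / (total + len(vocab)))


def predict(text, model):
    """Return the label with the highest posterior log-probability."""
    label_counts, feat_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / n_docs)  # log prior
        for f in features(text):
            if f in vocab:  # ignore features never seen in training
                score += log_likelihood(f, label, feat_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)


def top_features(model, label, other, k=3):
    """Features with the largest log-likelihood ratio for label vs. other."""
    _, feat_counts, vocab = model
    ratio = {
        f: log_likelihood(f, label, feat_counts, vocab)
        - log_likelihood(f, other, feat_counts, vocab)
        for f in vocab
    }
    return sorted(ratio, key=ratio.get, reverse=True)[:k]


model = train(TRAIN)
print(predict("great awesome soundtrack", model))
print(top_features(model, "pos", "neg", k=2))
```

Ranking by log-likelihood ratio is one simple way to read "important features" off a Naïve Bayes model; for an SVM the analogous step would be inspecting the largest weights.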
Sentiment Analysis Tasks (1)
Unsupervised:
• Typical goal is to mine opinions for features/aspects.
– Example: product features (e.g. “awesome graphics”)
– Features/aspects are often pre-defined (for specific domains).
– Sometimes (pre-defined) sentiment lexicons are also used.
– However, automatic identification of features or sentiment lexicon could be possible as well.
• Text Pre-processing:
– Standard pre-processing, POS-tagging, and possibly n-grams (or noun groups) are applied.
– Processing is done at the sentence level – to get narrower context.
– Deeper NLP is often applied to extract precise/accurate results.
• Common techniques:
– Word Association/Collocations – PMI, Likelihood
– Clustering – to obtain general topics of the opinions in a corpus
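As a rough illustration of the word-association technique, the sketch below computes pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), from sentence-level co-occurrence. The six sentences and the helper name `build_pmi` are assumptions made up for this example; probabilities are estimated as the fraction of sentences containing a word or a word pair.

```python
import math
from collections import Counter
from itertools import combinations

# Toy review sentences, invented for this example.
SENTENCES = [
    "the graphics are awesome",
    "awesome graphics and smooth controls",
    "the battery is terrible",
    "terrible battery life",
    "the screen is fine",
    "graphics look awesome on this screen",
]


def build_pmi(sentences):
    """Return a pmi(x, y) function based on sentence-level co-occurrence."""
    n = len(sentences)
    word_df = Counter()   # number of sentences containing each word
    pair_df = Counter()   # number of sentences containing each word pair
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(x, y):
        joint = pair_df[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the words never co-occur
        # (joint/n) / ((df_x/n) * (df_y/n)) simplifies to joint*n / (df_x*df_y)
        return math.log2(joint * n / (word_df[x] * word_df[y]))

    return pmi


pmi = build_pmi(SENTENCES)
print(pmi("graphics", "awesome"))
print(pmi("battery", "terrible"))
print(pmi("graphics", "terrible"))
```

A high PMI between an aspect word and an opinion word suggests the opinion is about that aspect. On corpora this small the estimates are unreliable, and PMI is known to favor rare pairs, so a minimum frequency threshold is usually applied.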
Sentiment Analysis Tasks (2)
• Sentiment Lexicon for English (around 6800 words) – from (Hu and Liu, KDD-2004), https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
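A lexicon like this can drive a simple sentence-level aspect scorer. The sketch below is only an illustration: the `POSITIVE`, `NEGATIVE`, and `ASPECTS` sets are tiny hand-picked stand-ins (not taken from the Hu and Liu lexicon, which would have to be downloaded and loaded separately), and `aspect_sentiment` is a hypothetical helper name.

```python
import re

# Tiny stand-in word lists; a real run would load a full sentiment lexicon.
POSITIVE = {"awesome", "great", "good", "love", "smooth"}
NEGATIVE = {"terrible", "bad", "boring", "poor", "hate"}

# Pre-defined aspects for a hypothetical product domain.
ASPECTS = {"graphics", "battery", "controls"}


def aspect_sentiment(review):
    """Sum lexicon polarity per aspect, one sentence at a time."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z']+", sentence)
        # Sentence polarity = (# positive words) - (# negative words).
        polarity = sum(w in POSITIVE for w in words) - sum(
            w in NEGATIVE for w in words
        )
        # Credit the sentence polarity to every aspect it mentions.
        for aspect in ASPECTS.intersection(words):
            scores[aspect] = scores.get(aspect, 0) + polarity
    return scores


review = "The graphics are awesome. The battery is terrible! Controls feel smooth and great."
print(aspect_sentiment(review))
```

Working sentence by sentence keeps an opinion word from being credited to an aspect mentioned elsewhere in the review. The scorer ignores negation and comparatives, so it inherits the challenges listed at the start of this section.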