Sentiment Analysis of Film-Related Messages on Social Media

Christopher Burdorf, NBCUniversal


“The big gamblers are not in Vegas, they are in Hollywood”

Animation Director

Sentiment Analysis of Social Media

- Sentiment analysis: the process of computationally identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.
- Facebook public messages: DataSift
- Twitter tweets: public API (only 1%), Twitter Gnip Firehose (100%)
- Stanford CoreNLP (https://github.com/stanfordnlp/CoreNLP): a natural language processing system that uses deep learning techniques to analyze sentiment.

Deep Learning

- CoreNLP uses RNTNs (Recursive Neural Tensor Networks).
- RNTNs use compositional vector representations for phrases of variable length and syntactic type.
- These representations are used as features to classify each word and phrase within a sentence.
- Overall sentiment is computed from the vector values of the words and phrases the model has been trained to recognize.
- Sentiment ranges from 0 (very negative) to 4 (very positive).

RNTN Model

RNTNs represent a phrase through word vectors and a parse tree, then compute vectors for higher nodes in the tree using the same tensor-based composition function.
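
As a rough illustration of that composition step, the Python sketch below applies one shared function at every internal node of a binary parse tree. The form used here, tanh of a tensor term plus a linear term over the concatenated child vectors, follows the published RNTN formulation. The names `compose` and `encode`, the nested-tuple tree encoding, and the tiny hand-picked parameters are illustrative assumptions, not CoreNLP's API or trained weights.

```python
import math

def compose(left, right, V, W):
    """Combine two d-dimensional child vectors into one parent vector.

    V is a d x 2d x 2d tensor and W a d x 2d matrix, both as nested lists.
    """
    c = left + right            # concatenation [left; right], length 2d
    n = len(c)
    parent = []
    for k in range(len(left)):
        # Tensor term: c^T V[k] c, one bilinear form per output dimension
        tensor_term = sum(c[i] * V[k][i][j] * c[j]
                          for i in range(n) for j in range(n))
        # Linear term: row k of W times c
        linear_term = sum(W[k][j] * c[j] for j in range(n))
        parent.append(math.tanh(tensor_term + linear_term))
    return parent

def encode(tree, embeddings, V, W):
    """Compute a vector for a tree given as a word or a (left, right) pair."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings, V, W),
                   encode(right, embeddings, V, W), V, W)

# Toy 2-dimensional setup; every number below is made up for illustration.
embeddings = {"romantically": [0.5, 0.1],
              "involved": [0.2, -0.3],
              "feeling": [0.0, 0.4]}
V = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]   # mostly-zero tensor
V[0][0][2] = 1.0                                        # one interaction term
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

phrase_vec = encode(("feeling", ("romantically", "involved")), embeddings, V, W)
print(phrase_vec)
```

Because the same `V` and `W` are reused at every node, a phrase of any length ends up with a vector of the same size, which is what allows a single classifier to label words, phrases, and whole sentences.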

Film Sentiment: FSOG

- Save messages from the DataSift Facebook public stream that reference Fifty Shades of Grey (FSOG), and store them in HBase.
- Stored 130,000 Facebook messages over a two-week period surrounding the film's opening (opening date Feb 13).
- Stored 300 MB of Facebook message JSON data.
- Run sentiment analysis on the messages with different training models, using Scala parallel collections.
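
The last step, scoring the same messages under several models in parallel, can be sketched in Python as a rough analogue of mapping over a Scala parallel collection. Everything here is hypothetical: `score_message` is a stand-in phrase-lookup scorer rather than CoreNLP, and the model dictionaries and messages are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def score_message(message, model):
    """Hypothetical scorer: average of matched phrase scores, neutral (2) if none."""
    text = message.lower()
    hits = [score for phrase, score in model.items() if phrase in text]
    return round(sum(hits) / len(hits)) if hits else 2

def score_all(messages, models, workers=4):
    """Score every message under every model, spreading the work over threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, model in models.items():
            # pool.map keeps the input order, much like map on a parallel collection
            results[name] = list(pool.map(partial(score_message, model=model),
                                          messages))
    return results

messages = ["Can't wait for Friday",
            "Tonight we're feeling Romantically Involved",
            "Saw a trailer"]
models = {"default": {},
          "film": {"can't wait": 4, "romantically involved": 4}}

print(score_all(messages, models))
```

Threads are used only to keep the sketch short. CPU-bound scoring in CPython would need processes to see a real speedup, whereas JVM parallel collections use multiple cores directly.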

Example

- No model: Sentiment = 1, “Tonight we're feeling Romantically Involved #fiftyShades”
- (4 (3 (2 Tonight) (3 (3 we're) (3 feeling))) (4 (4 Romantically) (4 Involved)))
- With model: Sentiment = 4, “Tonight we're feeling Romantically Involved”
- Can match phrases as well (e.g. “can't wait”).
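
The bracketed line in the example appears to be a sentiment-labeled parse tree, in which every node carries a 0 to 4 label followed by either a word or its child nodes. The Python sketch below reads such a string and lists each phrase with its label. The function names `parse_tree`, `phrase_of`, and `labeled_phrases` are illustrative and not part of CoreNLP.

```python
def parse_tree(text):
    """Parse '(label child child ...)' into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = int(tokens[pos])        # sentiment label, 0 to 4
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())         # nested phrase
            else:
                children.append(tokens[pos])    # leaf word
                pos += 1
        pos += 1                        # consume ')'
        return (label, children)

    return node()

def phrase_of(tree):
    """Rebuild the text covered by a node."""
    _, children = tree
    return " ".join(c if isinstance(c, str) else phrase_of(c) for c in children)

def labeled_phrases(tree):
    """Yield (label, phrase) for the node and all of its descendants."""
    yield tree[0], phrase_of(tree)
    for child in tree[1]:
        if not isinstance(child, str):
            yield from labeled_phrases(child)

example = "(4 (3 (2 Tonight) (3 (3 we're) (3 feeling))) (4 (4 Romantically) (4 Involved)))"
for label, text in labeled_phrases(parse_tree(example)):
    print(label, text)
```

Read this way, the tree labels the phrase "Romantically Involved" and the sentence as a whole with 4, which matches the "With model" result on the slide.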

Facebook message counts: FSOG

Training Models: FSOG Median

Statistical Sampling

- Manual assignment of sentiments on a statistically significant sampling of messages
- 95% confidence level, 7% margin of error
- Compare the result to the training model results
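
To give a sense of what a 95% confidence level and 7% margin of error imply, the sketch below computes a sample size with the standard formula for estimating a proportion, including a finite-population correction. The worst-case proportion of 0.5 and the population of 130,000 stored messages are assumptions made for this illustration. The slides do not state how the sample was actually sized.

```python
import math
from statistics import NormalDist

def sample_size(population, confidence=0.95, margin=0.07, proportion=0.5):
    """Sample size for estimating a proportion, with finite-population correction."""
    # Two-sided z-score for the confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size for an effectively infinite population
    n0 = z * z * proportion * (1 - proportion) / margin ** 2
    # Correct for the finite number of messages available
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(130_000))
```

Under these assumptions the result is 196 messages, a number small enough to label by hand.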

Sampling Results

Performance Issues

- Spam: 80% of tweets are spam; about 10% of Facebook messages are spam.
- Spam filtering: matching phrases vs. H2O Deep Learning.
- Training performance: training on the full plus movie critic set took 8 hours. Worked with the Stanford NLP group to multithread training, which reduced training time to 1 hour.
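
The phrase-matching option for spam filtering can be sketched in a few lines of Python. The phrase list is invented for the example, and the slides do not say which phrases were actually used or how matching was implemented.

```python
import re

# Hypothetical spam phrases, for illustration only
SPAM_PHRASES = ["free tickets", "click here", "follow back"]

def build_matcher(phrases):
    """Compile one case-insensitive pattern that matches any of the phrases."""
    pattern = "|".join(re.escape(p) for p in phrases)
    return re.compile(pattern, re.IGNORECASE)

def is_spam(message, matcher):
    """A message counts as spam if it contains at least one listed phrase."""
    return matcher.search(message) is not None

def filter_spam(messages, matcher):
    """Keep only the messages that match no spam phrase."""
    return [m for m in messages if not is_spam(m, matcher)]

matcher = build_matcher(SPAM_PHRASES)
messages = ["Can't wait to see the film!",
            "CLICK HERE for free tickets #FiftyShades"]
print(filter_spam(messages, matcher))
```

A phrase list like this is cheap to run and easy to inspect, but it only catches spam that someone has already seen, which is presumably why the slides weigh it against a learned classifier.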

Performance Improvements

- Sentiment lookup performance: it took 6 hours to analyze 130k messages.
- Switching to a distributed database (Cassandra) and implementing concurrent lookups with Akka Actors resulted in a 7x speedup on 16 cores.
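
The idea behind concurrent lookups, keeping many database requests in flight instead of waiting for each one in turn, can be illustrated with a small Python `asyncio` sketch. It is only an analogue of the actor-based approach: the in-memory `store` dictionary, the simulated latency, and the `max_in_flight` limit are all assumptions, and no Cassandra or Akka API is used.

```python
import asyncio

async def lookup_sentiment(store, phrase, limiter):
    """Look up one phrase; the sleep stands in for a database round trip."""
    async with limiter:                 # cap the number of lookups in flight
        await asyncio.sleep(0.01)       # simulated network latency
        return store.get(phrase, 2)     # unknown phrases default to neutral

async def lookup_all(store, phrases, max_in_flight=16):
    """Issue all lookups concurrently and return results in input order."""
    limiter = asyncio.Semaphore(max_in_flight)
    tasks = [lookup_sentiment(store, p, limiter) for p in phrases]
    return await asyncio.gather(*tasks)

store = {"can't wait": 4, "romantically involved": 4, "waste of money": 0}
phrases = ["can't wait", "waste of money", "saw a trailer"]
print(asyncio.run(lookup_all(store, phrases)))
```

With sequential lookups the total time grows with the number of phrases times the round-trip latency. With concurrent lookups most of that waiting overlaps, which is the kind of gain the slide reports.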

Other Languages

- The Twitter Firehose is 40% English. Other languages (e.g. Spanish) are seeing prominent usage as well.
- 77% of Twitter's 284 million MAUs (Monthly Active Users) are located outside the USA.
- 82% of Facebook's 890 million DAUs (Daily Active Users) are located outside the USA and Canada.