Social Analytics on Twitter
By: Adam Ghassouine, Robert Monegro and Adrian Duran
CUS 695 – Capstone Project
Dr. Giancarlo Crocetti
Mondays 7:10 p.m. – 9:10 p.m.
Executive Summary
This report analyzes social media data, specifically Twitter posts about the 2016 United States Presidential Election collected over a period of one week. Its purpose is to identify the major topics being discussed in connection with the election. The method of analysis is a sentiment analysis of all significant terms related to the topic under study. All script files from this analysis can be found in the appendices of this report.
The analysis shows that more negative than positive verbiage was used during the 2016 Presidential Election. 18% of 117,655 words were considered negative. 11% of conversations over Twitter used positive language, while 71% were neutral. The report finds support for the view that most discussion on social media is gossip about political figures and groups, almost a quarter of it about Donald Trump, rather than discussion of actual political issues, which is not unexpected for Twitter. Recommendations for future analysis include analyzing the sentiment of each cluster.
Data Retrieval
Dataset
Data Processing
StopWord Analysis
• Using a StopWords dictionary, one can extract the frequency table of all words.
• The purpose of this analysis is to detect and remove unnecessary words that provide little to no substance for this research.
• ‘https’ appeared frequently, causing unnecessary n-grams to be generated, so this term was removed.
• Another high-frequency word was ‘RT’, which marks a post that has been retweeted. This term added nothing to the overall analysis.
• Unnecessary URLs in each post and random words with no meaning, such as ‘absfwi’ and ‘acbqdi’, were also eliminated.
• The result of this analysis is a set of words that relate only to the 2016 Presidential Election.
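The filtering steps above can be illustrated with a minimal Python sketch. It is not the project's actual script (those are in the appendices), and the stop-word list, the regular expressions, and the function names `clean_tokens` and `term_frequencies` are assumptions made for illustration only. It strips URLs, lowercases each post, drops stop words such as 'rt' and 'https', and counts what remains.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the dictionary actually
# used in the report is not reproduced here.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for",
              "rt", "http", "https"}

URL_PATTERN = re.compile(r"https?://\S+")   # whole URLs, including their path
TOKEN_PATTERN = re.compile(r"[a-z]+")       # runs of lowercase letters


def clean_tokens(post):
    """Lowercase a post, strip URLs, and drop stop words."""
    text = URL_PATTERN.sub(" ", post.lower())
    return [t for t in TOKEN_PATTERN.findall(text) if t not in STOP_WORDS]


def term_frequencies(posts):
    """Build a frequency table over all cleaned tokens."""
    counts = Counter()
    for post in posts:
        counts.update(clean_tokens(post))
    return counts


if __name__ == "__main__":
    sample_posts = [
        "RT Trump is winning https://t.co/absfwi",
        "The election is close for Trump",
    ]
    for term, count in term_frequencies(sample_posts).most_common():
        print(term, count)
```

Removing whole URLs before tokenizing also discards URL path fragments, so they never reach the frequency table as separate terms.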
Term Frequencies
Term Frequencies Results
Clustering Analysis
Clustering Analysis (cont.)
SentiWordNet
Sentiment Code
• Used to extract positive and negative scores to further discern sentiment for the clusters generated
Sentiment Analysis
• The analysis shows that more negative than positive verbiage was used during the 2016 presidential election. 18% of 117,655 words were considered negative. 11% of conversations over Twitter used positive language, while 71% were neutral.
• There were 28 commonly used positive words, such as: "Happiness, Congratulations, Splendid, Excellent and Admirability"
• There were 16 commonly used negative words, such as: "Threaten, Downhill, Apocalyptic, Negative, Trashed"
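One way percentage shares like these could be tallied is sketched below in Python. This is a hedged illustration, not the method used in the study: the toy lexicon and its scores are invented, the rule "positive if the positive score exceeds the negative score" is an assumption, and `label` and `sentiment_shares` are hypothetical helper names. Words missing from the lexicon are treated as neutral.

```python
from collections import Counter

# Hypothetical (positive, negative) scores in the style of a sentiment
# lexicon; these values are invented for illustration.
TOY_LEXICON = {
    "excellent": (0.75, 0.0),
    "splendid": (0.625, 0.0),
    "threaten": (0.0, 0.5),
    "apocalyptic": (0.0, 0.625),
}


def label(word, lexicon):
    """Label a word by comparing its positive and negative scores."""
    pos, neg = lexicon.get(word, (0.0, 0.0))  # unknown words count as neutral
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def sentiment_shares(words, lexicon):
    """Return the percentage of words carrying each sentiment label."""
    labels = ("positive", "negative", "neutral")
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in labels}
    counts = Counter(label(w, lexicon) for w in words)
    return {name: 100.0 * counts[name] / total for name in labels}


if __name__ == "__main__":
    words = ["excellent", "threaten", "apocalyptic", "election", "vote",
             "trump", "clinton", "debate", "poll", "news"]
    print(sentiment_shares(words, TOY_LEXICON))
```

The three shares always sum to 100%, which matches the 18% / 11% / 71% split reported above.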
Cluster K-5
Cluster K-6
Cluster K-8
Cluster K-9
Cluster K-10
Sentiment Score
Bibliography
http://i2.cdn.turner.com/cnnnext/dam/assets/160201150128-trump-clinton-split-portrait-exlarge-169.jpg
http://sentiwordnet.isti.cnr.it/
https://rapidminer.com/
https://twitter.com/?lang=en
http://aylien.com/