Hamdan Azhar // @hamdanazhar
// November 5, 2016
🐍s, 🌹s, & major 🔑s: an introduction to emoji data science
🗃📊🗞
why emoji data science?
http://theislamicmonthly.com/neither-here-nor-there-on-losing-my-snapchat-best-friend/
emojis data science
Overarching goals
■Understanding what emojis mean
■Using emojis to understand the topics we use them to discuss
■Getting past the “so what” hurdle and defining good questions to ask
the birth of
My reaction to this article, in emoji
So we decided to look at some actual data
Getting the data
■ Use the Twitter API to sample 100,000 tweets for five hashtags related to Britain’s EU Referendum
■ Hashtags: #NotMyVote, #VoteRemain, #EURef, #Brexit, #VoteLeave
■ Data pulled for June 24, the day after the referendum
■ English-language tweets only
■ After removing retweets, we’re left with 23,989 unique tweets, i.e. the “Brexit dataset”
■ Of these, 1,505 tweets (6.3%) contain at least one emoji
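The filtering and emoji-share steps above can be sketched in Python. This is not the original analysis code (which was written in R). It assumes each tweet is a plain text string, treats a leading "RT @" as the retweet marker, and also drops verbatim duplicate texts; all three are assumptions. The emoji test is a rough code-point-range regex chosen for illustration, not a complete emoji definition.

```python
import re

# Rough emoji pattern (an assumption, not the original R regex): the main
# pictograph blocks, Miscellaneous Symbols/Dingbats (e.g. U+2764 heavy black
# heart), and the regional-indicator letters that pair up into flags.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF]")


def remove_retweets(tweets):
    """Keep unique, non-retweet texts, preserving their original order."""
    seen = set()
    unique = []
    for text in tweets:
        # Assumption: retweets are recognised by the classic "RT @" prefix,
        # and exact duplicate texts are dropped as well.
        if text.startswith("RT @") or text in seen:
            continue
        seen.add(text)
        unique.append(text)
    return unique


def emoji_share(tweets):
    """Return (number, fraction) of tweets containing at least one emoji."""
    if not tweets:
        return 0, 0.0
    with_emoji = sum(1 for text in tweets if EMOJI_RE.search(text))
    return with_emoji, with_emoji / len(tweets)


if __name__ == "__main__":
    sample = [
        "RT @someone: What a night #Brexit 🇬🇧",
        "Leaving the EU #Brexit 🇬🇧",
        "Leaving the EU #Brexit 🇬🇧",  # verbatim duplicate, dropped
        "Sad day #VoteRemain",
    ]
    unique = remove_retweets(sample)
    print(len(unique), emoji_share(unique))  # 2 (1, 0.5)
    # Same arithmetic as the slide: 1,505 of 23,989 tweets.
    print(f"{1505 / 23989:.1%}")  # 6.3%
```

The last line simply confirms the slide's percentage: 1,505 / 23,989 rounds to 6.3%.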
Analyzing the data
■ Use regular expressions in R, along with Unicode emoji dictionaries, to extract emojis from tweets
■ Compute emoji counts in the Brexit dataset
■ Compare with counts for all >10B emoji tweets on Twitter since 2013 (from emojitracker.com)
■ Extract hashtags from tweets and compute hashtag profiles for various emojis
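The extraction and counting steps can be sketched in Python with a dictionary-driven regular expression, in the spirit of "regular expressions along with Unicode emoji dictionaries". `EMOJI_NAMES` is a hypothetical five-entry stand-in for a full Unicode emoji list, and counting every occurrence (rather than one per tweet) is an assumption; the slides do not say which was used.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (emoji -> Unicode name). A real run would
# load a complete Unicode emoji list instead.
EMOJI_NAMES = {
    "\U0001F602": "face with tears of joy",            # 😂
    "\U0001F1EC\U0001F1E7": "flag of united kingdom",  # 🇬🇧 (two code points)
    "\U0001F44D": "thumbs up sign",                    # 👍
    "\U0001F44F": "clapping hands sign",               # 👏
    "\u2764": "heavy black heart",                     # ❤
}

# Try longer sequences first so two-code-point flags are matched whole.
EMOJI_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOJI_NAMES, key=len, reverse=True))
)
HASHTAG_RE = re.compile(r"#\w+")


def extract_emojis(text):
    """All dictionary emojis in `text`, in order, repeats included."""
    return EMOJI_RE.findall(text)


def extract_hashtags(text):
    """Hashtags in `text`, lower-cased so #Brexit and #brexit count together."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def emoji_counts(tweets):
    """Total occurrences of each emoji across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_emojis(text))
    return counts


tweet = "Sad day \U0001F602\U0001F602 #Brexit \U0001F1EC\U0001F1E7 #VoteRemain"
print(extract_emojis(tweet))    # 😂, 😂, 🇬🇧
print(extract_hashtags(tweet))  # ['#brexit', '#voteremain']
```

Because the pattern only knows dictionary entries, a variation selector after ❤ (as in ❤️) is simply ignored rather than counted as a separate emoji.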
emoji | emoji name | Brexit rank | general rank | Brexit index* | general index* | overindex**
----- | ---------- | ----------- | ------------ | ------------- | -------------- | -----------
😂 | face with tears of joy | 1 | 1 | 100 | 100 |
🇬🇧 | flag of united kingdom | 2 | 363 | 87 | 0.2 | 400x
👍 | thumbs up sign | 3 | 18 | 26 | 11 | 2.3x
👏 | clapping hands sign | 4 | 45 | 24 | 6 | 3.9x
❤ | heavy black heart | 5 | 3 | 21 | 45 |
😭 | loudly crying face | 6 | 7 | 17 | 29 |
😔 | pensive face | 7 | 13 | 14 | 18 |
😩 | weary face | 8 | 11 | 13 | 22 |
😢 | crying face | 9 | 27 | 12 | 9 | 1.3x
🙈 | see-no-evil monkey | 10 | 24 | 12 | 9 | 1.3x

* Index is an estimate of how prevalent a given emoji is in Brexit tweets and in general tweets, with the most common emoji (😂) defined as 100.
** Reflects how much more likely a given emoji is to be used in a Brexit tweet than generally on Twitter (general rank and index obtained from emojitracker.com). An emoji overindexes on Brexit if both Brexit rank < general rank AND Brexit index > general index.
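The two footnotes can be expressed as a short Python sketch: counts are rescaled so the anchor emoji (😂) scores 100, and the overindex is the ratio of the two indices when the footnote's rule holds. The function names are hypothetical, and the sketch assumes the printed ratios were derived from indices before rounding; recomputing from the rounded table values will not reproduce every printed ratio exactly (e.g. 26 / 11 ≈ 2.36 for 👍).

```python
def index_scores(counts, anchor="\U0001F602"):
    """Scale raw emoji counts so the anchor emoji (😂 by default) scores 100."""
    base = counts[anchor]
    return {emoji: 100 * n / base for emoji, n in counts.items()}


def overindex(brexit_rank, general_rank, brexit_index, general_index):
    """Ratio of Brexit index to general index when the emoji over-indexes.

    Follows the footnote rule: the Brexit rank must be better (smaller) than
    the general rank AND the Brexit index must exceed the general index.
    Returns None otherwise.
    """
    if brexit_rank < general_rank and brexit_index > general_index:
        return brexit_index / general_index
    return None


# Rows from the table: (Brexit rank, general rank, Brexit index, general index)
print(overindex(9, 27, 12, 9))  # crying face: 1.3333333333333333
print(overindex(5, 3, 21, 45))  # heavy black heart: None
```

Requiring both conditions is what keeps emojis like 😭 out of the overindex column: it ranks higher on Brexit (6 vs. 7) but its Brexit index (17) is below its general index (29).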
Which emojis over-index most heavily for Brexit? (above and beyond their usual popularity on Twitter)
Finding the “hashtag signature” of a given emoji
■ We know the distribution of hashtags in our entire dataset
■ We can pick a given emoji and compute the distribution of hashtags for tweets that use that emoji
■ By comparing these two distributions, we can estimate which hashtags an emoji is most likely to be used with
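One plausible way to "compare these two distributions" is a lift ratio: a hashtag's share among tweets containing the emoji, divided by its share across the whole dataset. The Python sketch below assumes that reading (the slides do not give a formula) and uses hypothetical function names; a value above 1 means the hashtag is over-represented alongside the emoji.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")


def hashtag_distribution(tweets):
    """Share of all hashtag occurrences taken by each (lower-cased) hashtag."""
    counts = Counter(tag.lower() for t in tweets for tag in HASHTAG_RE.findall(t))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def hashtag_signature(tweets, emoji):
    """Lift of each hashtag among tweets that contain `emoji`.

    Lift = share among emoji tweets / share in the entire dataset.
    Every hashtag in the emoji subset also appears overall, so the
    denominator is never missing or zero.
    """
    overall = hashtag_distribution(tweets)
    with_emoji = hashtag_distribution([t for t in tweets if emoji in t])
    return {tag: share / overall[tag] for tag, share in with_emoji.items()}


tweets = [
    "#Brexit \U0001F1EC\U0001F1E7",
    "#Brexit #VoteLeave \U0001F1EC\U0001F1E7",
    "#VoteRemain \U0001F622",
    "#VoteRemain \U0001F622",
    "#EURef",
]
print(hashtag_signature(tweets, "\U0001F1EC\U0001F1E7"))
# {'#brexit': 2.0, '#voteleave': 2.0}
```

Dividing by the overall share is the key design choice: without it, the most common hashtags in the dataset would dominate every emoji's signature.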
Hashtag signatures of the top emojis of Brexit
http://motherboard.vice.com/read/the-emojis-of-great-brexit
Taylor Swift is winning hearts (and minds)
Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)
Chart scale: higher association with @taylorswift13 / equal / higher association with @kanyewest
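A chart scale running from "higher association with @taylorswift13" through "equal" to "higher association with @kanyewest" implies some per-emoji relative score. The exact PRISMOJI formula is not given, so the Python sketch below swaps in a simple normalized difference of per-group usage rates, ranging from +1 (only with group A) to -1 (only with group B). Splitting by a case-insensitive substring match on the handle is a simplification, and all function names are hypothetical.

```python
def split_by_mention(tweets, handle_a, handle_b):
    """Tweets mentioning each handle (a tweet naming both lands in both groups)."""
    a = [t for t in tweets if handle_a.lower() in t.lower()]
    b = [t for t in tweets if handle_b.lower() in t.lower()]
    return a, b


def emoji_rates(tweets, emojis):
    """Fraction of tweets in a group that contain each emoji at least once."""
    if not tweets:
        return {e: 0.0 for e in emojis}
    return {e: sum(1 for t in tweets if e in t) / len(tweets) for e in emojis}


def association(group_a, group_b, emojis):
    """Score per emoji: +1 = used only with group A, -1 = only with group B,
    0 = equally common in both (also 0 if absent from both)."""
    ra, rb = emoji_rates(group_a, emojis), emoji_rates(group_b, emojis)
    scores = {}
    for e in emojis:
        total = ra[e] + rb[e]
        scores[e] = (ra[e] - rb[e]) / total if total else 0.0
    return scores


tweets = [
    "@taylorswift13 \u2764\u2764",
    "@taylorswift13 \U0001F40D",
    "@kanyewest \U0001F40D",
    "@kanyewest \U0001F525",
]
taylor, kanye = split_by_mention(tweets, "@taylorswift13", "@kanyewest")
print(association(taylor, kanye, ["\u2764", "\U0001F40D", "\U0001F525"]))
# ❤ -> 1.0, 🐍 -> 0.0, 🔥 -> -1.0
```

Using per-group rates rather than raw counts keeps the score fair when one account is mentioned far more often than the other.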
Hearts vs. Snakes: The emoji battle underlying the epic Taylor Swift – Kanye West feud
#taylorswiftwhatup is the most common hashtag in tweets about both Taylor and Kanye
Our common emoji language of #fanlove
Source: Analysis of 250,000 public tweets mentioning @beyonce, @justinbieber, @djkhaled, @drake, and @rihanna from Aug. 1-4, 2016. (PRISMOJI)
Sometimes love hurts
Examples of [emoji] in tweets involving #fanlove
http://motherboard.vice.com/read/a-data-scientists-emoji-guide-to-kanye-west-and-taylor-swift
Some more examples
#firstsevenjobs
Source: Analysis of 32,979 public tweets with the hashtags #firstsevenjobs and #first7jobs from Aug. 8, 2016. (PRISMOJI)
Understanding gendered emojis on Twitter
#wcw vs. #mcm: All hearts are not created equal
Chart scale: higher association with #mcm / higher association with #wcw
Source: Analysis of 100,000 public tweets with the hashtags #wcw and #mcm from June 27-29, 2016. (PRISMOJI)
#Rio2016 Olympics
Source: Analysis of 449,680 public tweets mentioning #rio2016 from Aug. 6-22, 2016. (PRISMOJI)
Chart scale: higher association with the first 3 days / higher association with the last 3 days
Third Presidential Debate
Source: Analysis of public tweets during the third presidential debate on Oct. 20, 2016. (PRISMOJI)
Three takeaways I’d like you to leave with
■ Understanding emojis as data can yield interesting insights
■More work is needed to learn more about what emojis mean, and what they reveal about our world
■You can play around with emoji data too
Thank you!
• Twitter: @hamdanazhar
• prismoji.com
• hamdanazhar.com