Cody Buntain, Human-Computer Interaction Lab, University of Maryland
Jimmy Lin, University of Waterloo
Jennifer Golbeck, University of Maryland
CCNC’16, 11 January 2016
Las Vegas, NV
Discovering Key Moments in Social Media Streams
Introduction
Most event detection systems track human-generated seed keywords.
Tweets per second mentioning “gol copa, gool copa, goool, golaço” during the match June 12th, 2014 [1]
Tweets per hour related to earthquakes [2]
Typical Approach
Step 1: Identify Keywords (goal, score)
Step 2: Find Bursts
Weaknesses
goal == gooooal?
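The weakness can be made concrete with a small sketch of exact-match seed filtering. The function name `matches_seed` and the sample tokens are illustrative assumptions, not code from the talk.

```python
# Illustrative sketch: exact-match filtering against seed keywords.
# SEED_TOKENS mirrors the slide's example; everything else is hypothetical.
SEED_TOKENS = {"goal", "score"}


def matches_seed(token, seeds=SEED_TOKENS):
    """Return True only if the lowercased token is exactly a seed keyword."""
    return token.lower() in seeds


tweet_tokens = ["goal", "gooooal", "golaço", "Score"]
matched = [t for t in tweet_tokens if matches_seed(t)]
missed = [t for t in tweet_tokens if not matches_seed(t)]
```

Elongated or non-English variants such as "gooooal" and "golaço" fall into `missed`, even though they describe the same moment.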
Can we identify interesting moments without seed tokens?
Step 1: Identify Keywords (goal, score)
Step 2: Find Bursts
LABurst Algorithm: goooal, 0-1, 0:1, 1-0, gollll, holandaaaa, penal, penalti, persie
LABurst Algorithm
Discover Unanticipated Moments
Identify Keywords: suarez, bit, biting
Methods
193 Key Moments
Can we transfer these sports-trained models to more impactful domains?
Event                                   Tweet Count
Training Data
  2010 NFL Division Championship            109,809
  2012 Premier League Soccer Games        1,064,040
  2014 NHL Stanley Cup Playoffs           2,421,065
  2014 NBA Playoffs                         500,170
  2014 Kentucky Derby Horse Race            233,172
  2014 Belmont Stakes Horse Race            226,160
  2014 FIFA World Cup Stages A+B          5,867,783
Testing Data
  2013 MLB World Series Game 5            1,052,852
  2013 MLB World Series Game 6            1,026,848
  2013 Honshu Earthquake                    444,018
  2014 NFL Super Bowl                     1,024,367
  2014 FIFA World Cup Third Place           809,426
  2014 FIFA World Cup Final               1,166,767
  2014 Iwaki Earthquake                     358,966
Total                                    16,305,443
LABurst learns bursts from sporting event data
How do we model these bursts?
Extract Tokens
Token Feature Vector v
Frequency Regression
Δ Average Frequency
Inter-Arrival Time
Message Entropy
Network Density
TF-IDF
TF-PDF¹
BursT²
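To make the feature vector more concrete, the following sketch computes four of the listed features for one token over a sliding window. The slides name the features but do not define them, so every formula here is an assumption: the function names, the use of per-minute counts, and the word-level entropy are illustrative choices only.

```python
import math
from collections import Counter


def regression_slope(counts):
    """Least-squares slope of per-minute counts against the minute index."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def delta_average_frequency(counts):
    """Latest count minus the mean of the earlier counts in the window."""
    if len(counts) < 2:
        return 0.0
    history = counts[:-1]
    return counts[-1] - sum(history) / len(history)


def mean_inter_arrival(timestamps):
    """Average gap (in seconds) between consecutive mentions of the token."""
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def message_entropy(messages):
    """Shannon entropy (bits) of the word distribution across the messages."""
    words = Counter(w for m in messages for w in m.lower().split())
    total = sum(words.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in words.values())


def feature_vector(counts, timestamps, messages):
    """Assemble a (partial, assumed) feature vector v for one token."""
    return [
        regression_slope(counts),
        delta_average_frequency(counts),
        mean_inter_arrival(timestamps),
        message_entropy(messages),
    ]
```

A steadily rising count yields a positive slope, and a sudden spike in the latest minute yields a large Δ average frequency; both are the kind of signal a burst classifier could pick up.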
Bursty Classifier
Input: Token Feature Vector v
Classifiers: SVM, Random Forests, Ensemble
Output: Bursty or Not?
The more tokens that experience bursts in a given minute, the more important the moment.
Key moment!
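The counting step can be sketched as follows, assuming the classifier has already produced a bursty/not-bursty decision for each token in each minute. The tuple layout, the function names, and the use of "at least `threshold` tokens" as the cutoff are assumptions for illustration.

```python
from collections import defaultdict


def count_bursty_tokens(predictions):
    """Count distinct bursty tokens per minute.

    predictions: iterable of (minute, token, is_bursty) tuples (assumed layout).
    """
    bursty = defaultdict(set)
    for minute, token, is_bursty in predictions:
        if is_bursty:
            bursty[minute].add(token)
    return {minute: len(tokens) for minute, tokens in bursty.items()}


def key_moments(predictions, threshold):
    """Return the minutes in which at least `threshold` tokens are bursting."""
    counts = count_bursty_tokens(predictions)
    return sorted(m for m, c in counts.items() if c >= threshold)


sample = [
    (3, "goal", True), (3, "persie", True), (3, "1-0", True), (3, "the", False),
    (68, "ref", True), (68, "the", False),
]
```

With this sample, a threshold of 3 flags only minute 3, while a threshold of 1 also flags minute 68; sweeping the threshold is what produces a ROC curve for the method.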
Evaluation
We evaluate LABurst by comparing it against two baseline methods.
Baseline 1: RawBurst
Find “bursts” in Twitter’s raw message frequency
If Current Freq - Avg Freq > Threshold: KEY MOMENT!
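A minimal sketch of the RawBurst rule is shown below. The slide does not say how the average is computed, so the trailing window (and its default length of 10 minutes) is an assumption, as is the function name `raw_burst`.

```python
def raw_burst(freqs, threshold, window=10):
    """Flag minutes whose message count exceeds the trailing average.

    freqs: messages per minute. The trailing-window average is an assumption;
    the slide only states "Current Freq - Avg Freq > Threshold".
    """
    moments = []
    for i in range(1, len(freqs)):
        history = freqs[max(0, i - window):i]
        average = sum(history) / len(history)
        if freqs[i] - average > threshold:
            moments.append(i)
    return moments


per_minute = [100, 110, 90, 100, 400, 120]
```

Calling `raw_burst(per_minute, threshold=150, window=3)` flags only the spike at index 4.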
Baseline 2: TokenBurst
Modify RawBurst to use the frequency of pre-specified seed tokens
If Current Freq - Avg Freq > Threshold: KEY MOMENT!

Sport          Seed Tokens
World Series   run, home, homerun
Super Bowl     score, touchdown, td, fieldgoal, points
World Cup      goal, gol, golazo, score, foul, penalty, card, red, yellow, points
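TokenBurst can be sketched by restricting the frequency count to tweets that contain a seed token and then applying the same difference rule. The seed lists come from the table above; whitespace tokenization, counting tweets rather than token occurrences, and the trailing-window average are assumptions.

```python
SEED_TOKENS = {
    "World Series": {"run", "home", "homerun"},
    "Super Bowl": {"score", "touchdown", "td", "fieldgoal", "points"},
    "World Cup": {"goal", "gol", "golazo", "score", "foul", "penalty",
                  "card", "red", "yellow", "points"},
}


def seed_frequency(tweets_per_minute, seeds):
    """For each minute, count tweets containing at least one seed token."""
    return [
        sum(1 for tweet in minute if seeds & set(tweet.lower().split()))
        for minute in tweets_per_minute
    ]


def token_burst(tweets_per_minute, seeds, threshold, window=10):
    """Apply the RawBurst difference rule to seed-token frequency only."""
    freqs = seed_frequency(tweets_per_minute, seeds)
    moments = []
    for i in range(1, len(freqs)):
        history = freqs[max(0, i - window):i]
        if freqs[i] - sum(history) / len(history) > threshold:
            moments.append(i)
    return moments


minutes = [
    ["nice pass", "what a game"],
    ["goal", "GOAL by persie", "penalty goal", "gooooal"],
    ["ok"],
]
```

In this sample the tweet "gooooal" is not counted, which is the limitation that motivates dropping seed tokens altogether.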
Compare using ROC-AUC
LABurst Threshold: number of tokens experiencing a burst in this minute
Baseline Thresholds: difference between current frequency and average frequency
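Because each method reduces to a per-minute score that is compared against a threshold, ROC-AUC can be computed directly from those scores without choosing a threshold. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name and the sample data are illustrative, not results from the talk.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    labels: 1 (or True) for minutes containing a key moment, else 0.
    scores: the per-minute score of a method, for example the number of
    bursty tokens (LABurst) or the frequency difference (baselines).
    Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([0, 0, 1, 1], [1, 3, 2, 5])` evaluates three of the four positive/negative pairs as correctly ordered.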
Results
How well does our method perform?
10-Fold Cross Validation
Best-scoring LABurst ensemble classifier: ROC-AUC of 89.84% on the training data
Which features are the most important?

Feature Sets                  ROC-AUC   Difference
AdaBoost, All Features        89.84%    –
Without Regression            87.79%    -2.05
Without Entropy               87.94%    -1.9
Without TF-IDF                88.85%    -0.99
Without TF-PDF                89.00%    -0.84
Without Density               89.07%    -0.77
Without InterArrival          89.46%    -0.38
Without BursT                 89.52%    -0.31
Without Average Difference    90.56%    0.72
Composite ROC-AUC
Competitive without seed keywords or prior domain knowledge
Discussion
Why is the Super Bowl hard?
Training/Testing Data:
Other Impactful Moments:
What was bursting at these moments?

Match: Brazil v. Netherlands, 12 July 2014
Event: Netherlands' Van Persie scores a goal on a penalty at 3', 1-0
Bursty Tokens: 0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool, holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Match: Brazil v. Netherlands, 12 July 2014
Event: Brazil's Oscar gets a yellow card at 68'
Bursty Tokens: dive, juiz, penalty, ref

Match: Germany v. Argentina, 13 July 2014
Event: Germany's Götze scores a goal at 113', 1-0
Bursty Tokens: goaaaaallllllll, goalllll, godammit, goetze, gollllll, gooooool, gotze, gotzeeee, götze, nooo, yessss
What other moments did LABurst discover?
LABurst vs. TokenBurst at World Cup Final
Moment: "puyol", "gisele", and "bundchen"
LABurst vs. Baseline at World Cup Final
Moment: "pipita", "higuaín", "", “pipa”, “choke”
Can these models be useful in other domains?
Earthquake Detection
Honshu, Japan Earthquake - 25 October 2013
Iwaki, Japan Earthquake - 11 July 2014
Simultaneously detects spikes about the earthquake
Also detects an aftershock
Conclusions
Can discover key moments from Twitter streams without seed tokens
Cody Buntain
@codybuntain
Human-Computer Interaction Lab, University of Maryland
Thank you! Questions?
Backup Slides
How do we train these classifiers?
Examples of Bursty Tokens: saints, peterson, 7-0, 1-0, touchdown, score, goal, penalty, td, fumble, persie, messi, tonalist
Examples of Non-Bursty Tokens: ??
Stop Words: the, i, me, my, myself, we, our, ours, ourselves, you, before, after, above, below, to, from, up, down, in, out, on
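One way to read this slide is that known bursty tokens serve as positive training examples and stop words serve as negative ones. The sketch below builds such a labeled set; this reading, the function name, and the 1/0 label encoding are assumptions rather than the documented training procedure.

```python
# Token lists copied from the slide; the labeling scheme itself is assumed.
BURSTY_EXAMPLES = [
    "saints", "peterson", "7-0", "1-0", "touchdown", "score", "goal",
    "penalty", "td", "fumble", "persie", "messi", "tonalist",
]
STOP_WORDS = [
    "the", "i", "me", "my", "myself", "we", "our", "ours", "ourselves",
    "you", "before", "after", "above", "below", "to", "from", "up", "down",
    "in", "out", "on",
]


def build_training_labels(bursty_tokens, stop_words):
    """Pair each token with a label: 1 for bursty, 0 for non-bursty."""
    positives = set(bursty_tokens)
    labeled = [(token, 1) for token in bursty_tokens]
    # Skip any stop word that also appears in the bursty list.
    labeled += [(token, 0) for token in stop_words if token not in positives]
    return labeled


training_set = build_training_labels(BURSTY_EXAMPLES, STOP_WORDS)
```

Each labeled token would then be paired with its feature vector at the relevant minute before being passed to a classifier.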