Inferring User Political Preferences from Streaming Communications. Svitlana Volkova 1, Glen Coppersmith 2 and Benjamin Van Durme 1,2. 1 Center for Language and Speech Processing 2 Human Language Technology Center of Excellence. ACL 2014, Baltimore.
Inferring User Political Preferences from Streaming Communications
Svitlana Volkova1, Glen Coppersmith2 and Benjamin Van Durme1,2
1Center for Language and Speech Processing
2Human Language Technology Center of Excellence
ACL 2014, Baltimore
Motivation
• Personalized, diverse and timely data
• Can reveal user interests, preferences and opinions
DemographicsPro – http://www.demographicspro.com/
WolframAlpha Analytics – http://www.wolframalpha.com/facebook/
Applications
• Large-scale passive polling and real-time live polling
• Online advertising
• Healthcare analytics
• Personalized recommendation systems and search
User Attribute Prediction
[Figure label: Communications]
• Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
• Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013
• Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013
Existing Approaches
~1K tweets*, treated as a document
Does an average Twitter user produce thousands of tweets?
*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013
How Active are Twitter Users?
http://www.digitalbuzzblog.com/visualizing-twitter-statistics-x100/
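Real-World Predictions
• Not active users: no or limited content
• Average Twitter users: median = 10 tweets per day
• Active users: 1,000+ tweets
• Private users: no content
[Chart segment labels as extracted: 10%, 50%, 20%, 20%. Their mapping to the user groups is not recoverable from the extraction.]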
Our Approach
1. Take advantage of user local neighborhoods (real-world batch predictions)
2. Incremental dynamic real-time predictions (streaming predictions)
Our Approach
1. Take advantage of user local neighborhoods
2. Incremental dynamic real-time predictions
Real-world batch predictions
Attributed Social Network
User Local Neighborhoods a.k.a. Social Circles
Twitter Network Data
Code, data and trained models for gender, age, political preference prediction
http://www.cs.jhu.edu/~svitlana/
Twitter Social Graph
I. Candidate-Centric: 1,031 users of interest
II. Geo-Centric: 270 users
III. Politically Active*: 371 users
10 - 20 neighbors of each type per user; ~50K nodes, ~60K edges
What types of neighbors lead to the best attribute prediction for a given user?
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Experiments
• Log-linear binary unigram models: (I) Users vs. (II) Neighbors and (III) Both
• Evaluate the relative utility of different neighborhood types:
  – varying neighborhood size n=[1, 2, 5, 10] and content amount t=[5, 10, 15, 25, 50, 100, 200]
  – 10-fold cross validation with 100 random restarts for every n and t parameter combination
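The setup above can be illustrated with a minimal sketch. It assumes that "log-linear binary unigram model" means logistic regression over word-presence features, and that a neighborhood of size n with content amount t pools the first t tweets from each of the first n neighbors. The tokenizer, the SGD trainer, the function names and the toy data are all hypothetical and are not taken from the authors' code.

```python
import math
from collections import defaultdict

def binary_unigrams(tweets):
    """Set of lowercased whitespace tokens present in any tweet (presence, not counts)."""
    return {tok for tweet in tweets for tok in tweet.lower().split()}

def neighbor_features(neighbor_tweets, n, t):
    """Pool the first t tweets from each of the first n neighbors into one feature set."""
    pooled = [tw for tweets in neighbor_tweets[:n] for tw in tweets[:t]]
    return binary_unigrams(pooled)

def train_logreg(examples, epochs=50, lr=0.5):
    """Fit logistic-regression weights by plain SGD.

    examples: list of (feature_set, label) pairs with label in {0, 1}.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            z = bias + sum(weights[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = label - p  # gradient of the log-likelihood w.r.t. z
            bias += lr * grad
            for f in feats:
                weights[f] += lr * grad
    return weights, bias

def predict_proba(model, feats):
    """P(label = 1 | features) under the trained model."""
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

# Toy, made-up data: labels 1 and 0 stand for the two classes.
train = [
    (neighbor_features([["lower taxes now", "small government"]], n=1, t=2), 1),
    (neighbor_features([["expand healthcare", "raise minimum wage"]], n=1, t=2), 0),
]
model = train_logreg(train)
p = predict_proba(model, binary_unigrams(["taxes and government"]))
```

In an experiment like the one on the slide, `neighbor_features` would be called once per (n, t) combination, and each resulting feature set would be scored under cross validation.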
Neighborhood Comparison
[Figure: accuracy vs. tweets per neighbor; panels: 1 Neighbor, 10 Neighbors]
Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle
Summary: Batch Real-World Predictions with Limited User Data
More data is better. How to get it?
• More neighbors per user > additional content from the existing neighbors
What kind of data?
• Follower, friend, @mention, retweet
Real-world predictions: no or very limited content!
• Users who recently joined Twitter
• No or limited access to user tweets
Our Approach
1. Take advantage of user local neighborhoods
2. Incremental dynamic real-time predictions
Streaming predictions
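Iterative Bayesian Predictions
[Figure: a stream of tweets along a time axis, with the attribute to be predicted marked "?"]
Cand-Centric Graph: Belief Updates
[Figure: belief updates over time]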
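One plausible reading of iterative Bayesian prediction is sketched below. It assumes a binary attribute with classes A and B, a per-tweet likelihood under each class supplied by some already trained model, and a stopping rule that fires once either class reaches a confidence threshold such as the 75% and 95% levels the slides mention. The function names and the toy likelihoods are hypothetical; the slides do not specify the update rule.

```python
def update_posterior(prior_a, lik_a, lik_b):
    """One Bayes step for a binary attribute.

    prior_a: current belief P(A | tweets seen so far)
    lik_a, lik_b: P(new tweet | A) and P(new tweet | B)
    Returns the updated belief P(A | tweets seen so far + new tweet).
    """
    numerator = prior_a * lik_a
    return numerator / (numerator + (1.0 - prior_a) * lik_b)

def stream_predict(tweet_likelihoods, prior_a=0.5, confidence=0.95):
    """Consume (lik_a, lik_b) pairs in arrival order.

    Stops as soon as either class reaches the confidence threshold.
    Returns (P(A | evidence), number of tweets consumed).
    """
    p_a = prior_a
    seen = 0
    for lik_a, lik_b in tweet_likelihoods:
        p_a = update_posterior(p_a, lik_a, lik_b)
        seen += 1
        if max(p_a, 1.0 - p_a) >= confidence:
            break
    return p_a, seen

# Toy, made-up evidence: every tweet is 1.5x more likely under class A.
evidence = [(0.6, 0.4)] * 10
belief_75, tweets_75 = stream_predict(evidence, confidence=0.75)
belief_95, tweets_95 = stream_predict(evidence, confidence=0.95)
```

Because each tweet multiplies the odds by the same likelihood ratio here, a higher confidence threshold simply requires more tweets, which is the trade-off between prediction time and confidence that the deck reports.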
Cand-Centric Graph: Prediction Time
[Figure: prediction time in weeks (log scale, 0.001 to 100) for the Cand, Geo and Active graphs; series: User Stream and User-Neighbor; annotations: 100 users, 75% and 95% confidence. Bar values are not recoverable from the extraction.]
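Batch vs. Online Performance
[Figure: bar chart (y-axis 0 to 1) for the Cand, Geo and Active graphs; series: User Batch, Neighbor Batch, User Stream, User-Neighbor Stream. Bar labels as extracted: 0.72, 0.57, 0.75; +0.03, +0.1, +0.11, +0.27, +0.27, +0.14, +0.28, +0.31, +0.25. Their mapping to bars is not recoverable from the extraction.]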
Summary
• Neighborhood content is useful*
• Neighborhoods constructed from friends, user mentions and retweets are most effective
• Signal is distributed in the neighborhood
• Streaming models > batch models
*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b; Golbeck et al., 2011; Zamal et al., 2012
Thank you!
Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/
Code and pre-trained models available upon request: [email protected]