Searching for Quality Microblog Posts: Filtering and Ranking based on Content Analysis and Implicit Links
Jan Vosecky, Kenneth Wai-Ting Leung, and Wilfred Ng
Department of Computer Science and Engineering
HKUST
Hong Kong
DASFAA’12
Agenda
- Introduction
- Proposed method
- Quality features of tweets
- Experiments
- Conclusions
Introduction Method Features Experiments Conclusions
Introduction
Microblogs
Microblogs are both a social network and a social medium:
- Links between users (follow, mention, re-tweet)
- Users post updates (tweets)
[Figure: two example tweets, annotated with user, timestamp, URL link, hashtag, and mentioned user]
Searching for “ipad” on Twitter
Around 50 tweets mentioning “iPad” posted within a 1-minute period
Research challenge
Twitter is user-generated content:
- Short messages, often comments or opinions
- High volume
- Varying quality

“Most tweets are not of general interest (57%)” (Alonso et al. ’10)

The result is information overload. Research questions:
- How to distinguish content worth reading from useless or less important messages?
- How to promote ‘high quality’ content?
Defining ‘quality’
A general (global) definition for assessing tweet quality, based on 3 criteria:
- Well-formedness
  + Well-written, grammatically correct, understandable
  - Heavy slang, misspellings, excessive punctuation
- Factuality
  + News, events, announcements
  - Unclear message, private conversations, generic personal feelings
- Navigational quality (URL links)
  + Reputable external resources (e.g. news articles)
  - Personal information sharing (e.g. photo-sharing websites)
Quality-based tweet filtering
Quality-based tweet ranking
[Figure: search results re-ranked by quality score (5 = high quality, 1 = low quality)]
Research goals
- Quality-based tweet filtering: filtering out low-quality tweets
  - In Twitter feeds
  - In search results
- Quality-based tweet ranking: re-ranking Twitter search results for a given time period
Proposed Method
Representation of tweets
The vector-space model is not sufficient:
- Short tweet length, terms often malformed
- Ignores special features in Twitter

Instead, a feature-vector representation:
- Extract features from each tweet
- Traditional features: e.g. length, spelling
- Twitter-specific features: exploiting hashtags, URL links, mentioned usernames
Quality Features of Tweets
Feature categories

1. Punctuation and spelling:
- Number of exclamation marks
- Number of question marks
- Max. no. of repeated letters
- % of correctly spelled words
- No. of capitalized words
- Max. no. of consecutive capitalized words

2. Syntactic and semantic complexity:
- Max. & avg. word length
- Length of tweet
- Percentage of stopwords
- Contains numbers
- Contains a measure
- Contains emoticons
- Uniqueness score

3. Grammaticality:
- Has first-person part-of-speech
- Formality score
- Number of proper names
- Max. no. of consecutive proper names
- Number of named entities

4. Link-based:
- Contains link
- Is reply-tweet
- Is re-tweet
- No. of mentions of users
- Number of hashtags
- URL domain reputation score
- RT source reputation score
- Hashtag reputation score

5. Timestamp:
- Day of the week of posting
- Hour of the day of posting
1. Punctuation and spelling
- Excessive punctuation: number of exclamation marks, number of question marks, max. number of consecutive dots
- Capitalization: presence of all-capitalized words, largest number of consecutive words in capital letters
- Spellchecking: number of correctly spelled words, percentage of words found in a dictionary

Example tweet:
RT @_ChocolateCoco: WHO IS CHUCK NORRIS??!!!?? lls. He's only the greatest guy next to jesus lmao
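A few of these punctuation and capitalization features can be sketched as follows (a minimal illustration, not the authors' implementation):

```python
import re

def punctuation_features(tweet):
    """Extract a subset of the punctuation/capitalization features."""
    words = re.findall(r"[A-Za-z']+", tweet)
    caps = [w.isupper() and len(w) > 1 for w in words]
    # longest run of consecutive all-capitalized words
    max_caps_run = run = 0
    for is_caps in caps:
        run = run + 1 if is_caps else 0
        max_caps_run = max(max_caps_run, run)
    # longest run of a single repeated letter (e.g. "soooo" -> 4)
    max_repeat = max((len(m.group(0))
                      for m in re.finditer(r"([A-Za-z])\1*", tweet)), default=0)
    return {
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "max_consecutive_caps": max_caps_run,
        "max_repeated_letters": max_repeat,
    }
```

On the example tweet above, this yields 3 exclamation marks, 4 question marks, and a run of 4 consecutive all-caps words (“WHO IS CHUCK NORRIS”).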
2. Syntactic and semantic complexity
Syntactic complexity:
- Tweet length
- Max. & avg. word length
- Percentage of stopwords
- Presence of emoticons and other sentiment indicators
- Presence of measure symbols ($, %)
- Numbers: number of digits

Tweet uniqueness:
- Uniqueness of the tweet relative to other tweets by the author
3. Grammaticality
Parts-of-speech labelling:
- Presence of first-person parts-of-speech
- Formality score [Heylighen ’02]:
  F = (noun freq. + adjective freq. + preposition freq. + article freq. − pronoun freq. − verb freq. − adverb freq. − interjection freq. + 100) / 2

Names:
- Number of ‘proper names’, as words with a single initial capital letter
- Number of consecutive ‘proper names’
- Number of named entities

F. Heylighen and J.-M. Dewaele. Variation in the contextuality of language: An empirical measure. Context in Context, Special Issue of Foundations of Science, 7(3):293–340, 2002.
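The formality score above can be sketched directly (assuming, per Heylighen & Dewaele, that frequencies are percentages of all tagged words, so F lies in [0, 100]):

```python
def formality_score(pos_counts):
    """Heylighen & Dewaele formality score F in [0, 100].

    pos_counts maps a POS category name -> number of words with that tag;
    frequencies are expressed as percentages of all tagged words.
    """
    total = sum(pos_counts.values()) or 1
    freq = lambda tag: 100.0 * pos_counts.get(tag, 0) / total
    return (freq("noun") + freq("adjective") + freq("preposition")
            + freq("article")
            - freq("pronoun") - freq("verb") - freq("adverb")
            - freq("interjection")
            + 100) / 2
```

A tweet consisting only of nouns scores 100 (maximally formal); one consisting only of pronouns scores 0.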
4. Link-based features
Links to other items:
- Re-tweet (RT), reply tweet, mention of other users
- Presence of a URL link
- Number of hashtags, as indicated by the “#” sign

Link target's quality:
- Reputation metrics to reflect the quality of tweets which relate to a URL domain, a hashtag, or a user
URL domain reputation
Observation: tweets which link to news articles are usually of better quality than tweets which link to photo-sharing websites.

Questions:
- What does the quality of tweets linking to a website say about its quality?
- Can we predict the quality of future tweets linking to that website?
[Figure: tweets linking to Tweetpic.com carry low quality labels (Q = 1, 2, 3), while tweets linking to NYtimes.com carry high quality labels (Q = 4, 5, 5)]
URL domain reputation
Step 1: URL translation (short link to original link), e.g.
bit.ly/e2jt9F → http://www.reuters.com/4151120

Step 2: summarize tweets linking to a URL domain, accumulating a “quality reputation” over time.
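After step 1, tweets are keyed by the link target's domain. A minimal sketch of that keying (assuming the short link has already been expanded, which in practice means following the HTTP redirects; the `www.` stripping is an illustrative normalization choice, not necessarily the authors'):

```python
from urllib.parse import urlparse

def url_domain(expanded_url):
    """Extract the domain used as the reputation key from an expanded URL."""
    netloc = urlparse(expanded_url).netloc.lower()
    # normalize away a leading "www." so both forms map to one key
    return netloc[4:] if netloc.startswith("www.") else netloc
```

For the example above, `url_domain("http://www.reuters.com/4151120")` returns `reuters.com`.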
URL domain reputation
Average URL domain quality:

AvgQ(d) = (1 / |Td|) Σ_{t ∈ Td} q_t

where Td = the set of tweets linking to domain d, and q_t = the quality label of tweet t.

Weakness: the score does not reflect the number of inlink tweets, and so favours domains with few inlink tweets.
URL domain reputation
Domain reputation score:

DRS(d) = AvgQ(d) · log10 |Td|

where AvgQ(d) is between [−1, +1].

“Collecting evidence” behaviour: the score gets higher with more good-quality inlink tweets.
[Figure: DRS as a function of the number of inlink tweets |Td| (1 to 1000, log scale), for AvgQ values between −1 and +1]
URL domain reputation
10 domains with a high DRS (mainly news-oriented sites):

Domain          AvgQ   Inlinks  RS
gallup.com      0.96   99       1.92
mashable.com    0.79   97       1.58
hrw.org         0.86   57       1.51
foxnews.com     0.68   38       1.08
good.is         0.68   31       1.01
intuit.com      0.57   60       1.01
forbes.com      0.68   19       0.87
reuters.com     1.00   6        0.78
cnn.com         0.36   85       0.70

10 domains with a low DRS (mainly image- and location-sharing sites):

Domain          AvgQ   Inlinks  RS
tweetphoto.com  -0.77  106      -1.57
twitpic.com     -0.75  113      -1.54
twitlonger.com  -0.85  66       -1.54
myloc.me        -0.85  54       -1.48
instagr.am      -0.62  52       -1.06
formspring.me   -0.78  18       -0.98
yfrog.com       -0.55  53       -0.94
lockerz.com     -0.63  16       -0.75
qik.com         -0.75  8        -0.68
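The two reputation quantities can be sketched as follows. The DRS formula is not spelled out on the slide, so this assumes DRS(d) = AvgQ(d) · log10|Td|, which reproduces the table values above (e.g. gallup.com: 0.96 · log10(99) ≈ 1.92; reuters.com: 1.00 · log10(6) ≈ 0.78):

```python
import math

def avg_quality(labels):
    """AvgQ(d): mean of the per-tweet quality labels q_t, each in [-1, +1]."""
    return sum(labels) / len(labels)

def domain_reputation(avg_q, n_inlinks):
    """DRS(d) = AvgQ(d) * log10(|T_d|) -- assumed form, see lead-in.

    More inlink tweets of the same average quality push the score further
    from zero, giving the 'collecting evidence' behaviour.
    """
    return avg_q * math.log10(n_inlinks)
```

Note how reuters.com, despite a perfect AvgQ of 1.00, ranks below domains with many more inlink tweets, which is exactly the intended correction to the plain average.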
Reputation of hashtag & user
Hashtag reputation: e.g. #DASFAA vs. #justforfun

Re-tweet source user reputation: e.g. @barackobama vs. @wysz22212

[Figure: tweets tagged #justforfun carry low quality labels (Q = 1, 2, 3), while tweets tagged #DASFAA carry high quality labels (Q = 4, 5, 5)]
Experiments
Dataset
10,000 tweets: 100 users, 100 recent tweets per user.

Users:
- 50 random users
- 50 influential users, selected from listorious.com
  - 5 categories: technology, business, politics, celebrities, activism
  - 10 users per category
Labelling
Crowdsourcing via Amazon Mechanical Turk:
- 3 labels per tweet, from different reviewers
- Possible labels: 1 to 5 (1 = low quality, 5 = high quality)
- Random order of tweets
Labelling results
[Figure: tweet quality distribution across the labelled dataset]
Feature analysis
29 features in total. Top 5 features based on Information Gain:
- 0.374  Domain reputation
- 0.287  Contains link
- 0.130  Formality score
- 0.127  Num. proper names
- 0.113  Max. proper names
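Information Gain for a feature is the reduction in label entropy once the feature's value is known. A minimal, generic computation for discrete features (the paper's exact discretization of continuous features is not specified here):

```python
import math

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional
```

A feature that perfectly separates high- and low-quality tweets has IG equal to the full label entropy; an uninformative feature has IG 0.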
Feature selection
Greedy attribute selection; 15 selected features:
- Domain reputation
- RT source reputation
- Formality
- Tweet uniqueness
- No. named entities
- % correctly spelled words
- Max. no. repeated letters
- No. hashtags
- Contains numbers
- No. capitalized words
- Is reply-tweet
- Is re-tweet
- Avg. word length
- Contains first-person
- No. exclamation marks
Classification and Ranking Method

Classification:
- SVM, binary classification (high-quality vs. low-quality)
- 50/50 split for training/testing

Ranking:
- Learning-to-rank (Rank SVM)
- 30 queries from 5 topic categories
- Process:
  1. Retrieve tweets matching a query
  2. Extract features from the tweets
  3. ‘Query-tweet vector’ pairs + quality scores of the tweets passed as input to Rank SVM
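Step 3 can be sketched with the standard pairwise construction behind Rank SVM (the authors' exact input encoding is not given; this is the common formulation): for each pair of tweets retrieved for the same query where one has a strictly higher quality score, the difference of their feature vectors becomes a positive training example.

```python
def pairwise_examples(feature_vectors, quality_scores):
    """Build pairwise preference examples for Rank SVM from one query's
    tweets: whenever tweet i should outrank tweet j, emit the difference
    vector x_i - x_j with label +1."""
    pairs = []
    for xi, qi in zip(feature_vectors, quality_scores):
        for xj, qj in zip(feature_vectors, quality_scores):
            if qi > qj:
                pairs.append(([a - b for a, b in zip(xi, xj)], +1))
    return pairs
```

A linear SVM trained on these difference vectors yields a weight vector whose dot product with a tweet's features serves as its ranking score.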
Classification results
Features                    #attributes  High-Quality    Low-Quality     Overall
                                         P       R       P       R       AUC
Link only                   1            0.798   0.702   0.894   0.934   0.818
TF-IDF                      3322         0.862   0.665   0.885   0.960   0.813
Subset.Reputation           3            0.812   0.746   0.909   0.936   0.841
Subset.SVM (“greedy”)       15           0.715   0.758   0.912   0.936   0.847
All quality features        29           0.815   0.660   0.882   0.944   0.802
All quality ftr’s + TF-IDF  3351         0.739   0.775   0.915   0.899   0.837

- The optimal feature set (15 attrs.) outperforms TF-IDF (3322 attrs.)
- Link-based “reputation” features (3 attrs.) achieve the 2nd best result
- Combining quality features + TF-IDF does not improve the result
Classification results
Features                    #attributes  AUC
Link only                   1            0.818
TF-IDF                      3322         0.813
Subset.Reputation           3            0.841
Subset.SVM (“greedy”)       15           0.847
All quality features        29           0.802
All quality ftr’s + TF-IDF  3351         0.837

Fewer attributes also mean lower storage cost and training time: the optimal feature set achieves reduced training time and storage cost.
Ranking results
Features               #attributes  NDCG@1  NDCG@2  NDCG@5  NDCG@10  MAP
Link only              1            0.067   0.111   0.220   0.324    0.398
Subset.Reputation      3            0.822   0.777   0.777   0.764    0.661
Subset.SVM (“greedy”)  15           0.867   0.767   0.778   0.769    0.653
All quality features   29           0.733   0.733   0.763   0.753    0.637

- The optimal feature set (15 attrs.) achieves the best result
- Link-based “reputation” features (3 attrs.) achieve the 2nd best result
Conclusions
Summary
- A method for quality-based classification and ranking of tweets
- Proposed and evaluated a set of tweet features to capture tweet quality
- Link-based features lead to the best performance
Future work
- Consider different types of queries in Twitter, e.g. searching for hot topics, movie reviews, facts, opinions, etc.
  - Different features may be important in different scenarios
- Incorporating recent hot topics
- Personalized re-ranking
Q / A
Thank You
Related work
- Spam detection: bag-of-words / keyword-based approaches, feature-based approaches, and combinations
- Social networks: finding quality answers in Q&A systems (e.g. Yahoo! Answers); feature-based
- Web search: quality-based ranking of web documents; feature-based quality score (WSDM ’11)
ROC Curve
Area under the ROC curve: probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one
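This probabilistic definition can be computed directly over all positive/negative score pairs, counting ties as one half (a straightforward illustration; in practice the AUC is usually derived from the ROC curve or a rank-sum statistic):

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive instance is
    scored above a randomly chosen negative one; ties count as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfect classifier scores every positive above every negative (AUC = 1.0); random scoring gives AUC around 0.5.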