Empirical Study of Topic Modeling in Twitter
Liangjie Hong and Brian D. Davison
Computer Science and Engineering
Lehigh University
Bethlehem, PA USA
SOMA 2010
Why do we care about text modeling in Twitter?
• Understanding users’ interests
• Understanding the social network
• Identifying emerging topics
Problems
• Tweets are too short (140 characters)
• Hashtags
• Abbreviations
• Multiple languages
Question
How can we train an “effective” standard topic model?
We found
• Topics learned by different aggregation strategies are substantially different
• Training the model at the user level is faster
• Learned topics can help classification tasks
A quick review of topic models
• LDA
• Author-Topic
Our goal
Obtain topic mixtures for both tweets and users
Training Schemes
• Train on tweets
  • Infer users + tweets
• Train on aggregated tweets (by users)
  • Infer tweets
• Train on aggregated tweets (by terms)
  • Infer users + tweets
• Author-Topic model
  • Infer tweets
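The two aggregation schemes above can be sketched as a preprocessing step that builds the pseudo-documents a standard topic model would be trained on. This is only an illustrative sketch: the function names, the whitespace tokenization, and the reading of "by terms" as "pool every tweet that contains a given term" are assumptions, not details taken from the slides.

```python
from collections import defaultdict


def aggregate_by_user(tweets):
    """Concatenate each user's tweets into one pseudo-document (list of tokens)."""
    docs = defaultdict(list)
    for user, text in tweets:
        docs[user].extend(text.lower().split())
    return dict(docs)


def aggregate_by_term(tweets, terms):
    """For each chosen term, pool the tokens of every tweet that contains it.

    Assumption: this is one plausible reading of "aggregated by terms".
    """
    docs = defaultdict(list)
    for _user, text in tweets:
        tokens = text.lower().split()
        for term in terms.intersection(tokens):
            docs[term].extend(tokens)
    return dict(docs)


# Hypothetical toy data: (user, tweet text) pairs.
tweets = [
    ("alice", "LDA on tweets"),
    ("bob", "topic models for twitter"),
    ("alice", "short tweets hurt topic models"),
]

user_docs = aggregate_by_user(tweets)
term_docs = aggregate_by_term(tweets, {"topic", "tweets"})
```

Each value in `user_docs` or `term_docs` is a longer document than any single tweet, which is the point of aggregating before training.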
Datasets
• 1,992,758 tweets + 514,130 users
• 3,697,498 terms
• 274 verified users from Twitter Suggestion
• 16 categories
• 50,447 tweets (150 tweets per user)
Tasks
• Topic modeling
• Retweet Prediction
• User & Tweets Topical Classification
Logistic Regression
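As a rough illustration of how topic mixtures could feed a logistic regression classifier, the sketch below trains a binary classifier on made-up three-topic mixtures. The data, learning rate, epoch count, and the plain stochastic gradient descent loop are all assumptions chosen for illustration; the slides name the classifier but not its features or training setup.

```python
import math


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit binary logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Hypothetical topic mixtures over 3 topics; label 1 means "mostly topic 0".
X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
y = [1, 1, 0, 0]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice the feature vectors would be the per-tweet or per-user topic mixtures inferred by the trained topic model rather than hand-written numbers.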
Topic Modeling
Retweet Prediction
Positive examples
• @Jon Hello World (2009-11-01 13:15)
• Hello World (2009-11-01 12:00)
• @Kim @Jon Hello World (2009-11-01 13:23)
• @Frank @Kim @Jon Hello World (2009-11-01 17:49)
Negative examples
Tweets Classification
User Classification
Conclusion
• User-level aggregation is helpful
  • Fast and good results
• Author-Topic model does not directly apply
• Topic modeling can help other tasks
  • Tweets classification
Thank you, and thanks to the IBM Travel Grant!
Contact Info:
Liangjie [email protected] Laboratory
Computer Science and Engineering
Lehigh University
Bethlehem, PA 18015 USA