Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali and Vasileios Hatzivassiloglou
Human Language Technology Research Institute, The University of Texas at Dallas
June 6, 2010
NAACL-HLT 2010: Computational Linguistics in a World of Social Media
Los Angeles, California
06/06/2010 Hassanali and Hatzivassiloglou: Automatic Detection of Tags for Political Blogs
Goal
Goal: Identify the topic tags of political blog posts. Tags are single words or groups of words.
Motivation: Build a system that collates information across blog posts, combines evidence to numerically rate the attitudes of blogs on different topics, and traces the evolution of attitudes over time.
The tags assigned to a post are collectively the post's topical signature.
Our Approach
Train a Support Vector Machine (SVM) for each possible tag
Assign each post the five tags whose SVMs cast the strongest votes
Investigated several features:
Single words (baseline)
Syntactic groups (noun phrases and proper nouns, detected with shallow parsing)
Named entity recognition
Co-reference resolution
Synonyms (using WordNet)
Word position (title versus body)
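The slides do not say which SVM implementation, feature encoding, or scoring rule was used. The Python sketch below only illustrates the general shape of the approach under stated assumptions: a bag-of-words feature map that keeps title and body words apart (a stand-in for the "word position" feature; the syntactic, named-entity, co-reference and WordNet features are not modeled), one linear SVM per tag trained one-vs-rest with Pegasos-style stochastic sub-gradient updates (a generic solver chosen so the example is self-contained, not necessarily the authors' choice), and "strongest votes" read as the highest SVM decision values. All function names and the toy posts are hypothetical.

```python
import random
from collections import Counter


def features(title, body):
    """Hypothetical feature map: lowercased bag of words, with title and body
    words kept apart ("T:" / "B:" prefixes) to encode word position, plus a
    constant bias feature."""
    feats = Counter()
    for word in title.lower().split():
        feats["T:" + word] += 1.0
    for word in body.lower().split():
        feats["B:" + word] += 1.0
    feats["__bias__"] = 1.0
    return feats


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_linear_svm(data, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style
    stochastic sub-gradient steps. data: list of (feature dict, +1 or -1)."""
    rng = random.Random(seed)
    weights = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            feats, label = data[i]
            eta = 1.0 / (lam * t)
            margin = label * dot(weights, feats)
            # Regularizer step: shrink every weight toward zero.
            for k in weights:
                weights[k] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + eta * label * v
    return weights


def train_tag_models(posts, lam=0.01, epochs=20):
    """One-vs-rest: one binary SVM per tag. posts: list of (title, body, tags)."""
    vectors = [features(title, body) for title, body, _ in posts]
    all_tags = sorted(set().union(*(tags for _, _, tags in posts)))
    models = {}
    for tag in all_tags:
        data = [(x, 1 if tag in tags else -1)
                for x, (_, _, tags) in zip(vectors, posts)]
        models[tag] = train_linear_svm(data, lam, epochs)
    return models


def top_k_tags(models, title, body, k=5):
    """Rank tags by SVM decision value and keep the k strongest votes."""
    x = features(title, body)
    scores = {tag: dot(w, x) for tag, w in models.items()}
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]


if __name__ == "__main__":
    # Toy, made-up posts purely for illustration.
    posts = [
        ("Health care vote", "senate passes health bill", {"health care", "senate"}),
        ("Iraq troop surge", "more troops sent to iraq", {"iraq"}),
        ("Budget deficit grows", "spending outpaces revenue", {"economy"}),
    ]
    models = train_tag_models(posts)
    print(top_k_tags(models, "Iraq surge", "troops in iraq", k=2))
```

Ranking every tag's decision value and cutting at k, rather than thresholding each classifier at zero, yields a fixed number of tags per post, which matches the "five strongest votes" wording; with k=5 a post always receives five tags, even when fewer are strongly supported.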
Data
Collected data from two major political blogs:
Daily Kos: 100,000 blog posts covering 2003-2010
Red State: 70,000 blog posts covering 2007-2010
787,780 tags across both blogs
Results
Baseline precision/recall (single words): 25.84%/54.97%
+Stemming precision/recall: -0.46%/-0.62%
+Proper nouns precision/recall: +12.84%/+1.95%
+Named entities precision/recall: +12.23%/-8.53%
All features:
Automated scoring precision/recall: 20.95%/65.123%
Manual scoring precision/recall: 63.49%/72.71%
Syntactic noun phrases help a lot
Named entity recognition and proper nouns are excellent features
The effect of co-reference resolution is marginal
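The slides do not define how automated scoring matches predicted tags against the tags the bloggers assigned. A minimal sketch, assuming exact string matches and micro-averaging (true positives, predicted tags and reference tags pooled over all posts before dividing); the function name and toy tag sets are hypothetical:

```python
def tag_precision_recall(predicted, gold):
    """Micro-averaged precision and recall over parallel lists of tag sets
    (one predicted set and one reference set per post, exact string match)."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example: five proposed tags per post against the post's own tags.
    predicted = [
        {"iraq", "surge", "bush", "troops", "war"},
        {"health care", "senate", "obama", "reform", "vote"},
    ]
    gold = [{"iraq", "surge"}, {"health care", "medicare"}]
    p, r = tag_precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.30 recall=0.75
```

If the system always proposes five tags, per-post precision is at most g/5 for a post with g reference tags, so precision can stay low while recall is high. Manual scoring presumably relies on human judgment of each proposed tag and is not something this exact-match function reproduces.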
Earlier Work
Wang and Davison followed a similar approach with SVMs, but for the purpose of query expansion and suggestion.
They assign tags to web pages, whereas we assign tags to individual posts.
They report a precision of 45.25% and a recall of 23.24%, compared to our precision of 20.95% and recall of 65.123%.
Sood et al. find similar blog posts and filter tags.
They report a precision of 13.11% and a recall of 22.83%, compared to our precision of 20.95% and recall of 65.123%.
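The comparison above pairs precision and recall figures that move in opposite directions. One common single-number summary, not reported on the slides, is the balanced F-measure, F1 = 2PR / (P + R). The sketch below applies it to the quoted numbers; because the systems tag different material (web pages versus blog posts) on different data, this is only a rough summary, not a controlled comparison. The function and dictionary names are illustrative.

```python
def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (in %) as quoted on the slide.
systems = {
    "This work": (20.95, 65.123),
    "Wang and Davison": (45.25, 23.24),
    "Sood et al.": (13.11, 22.83),
}

for name, (p, r) in systems.items():
    print(f"{name}: F1 = {f1(p, r):.2f}%")
```

This gives roughly 31.7% for this work, 30.7% for Wang and Davison, and 16.7% for Sood et al.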
Conclusion
Described and evaluated a tool that automatically tags political blog posts with topics
Tagging benefits from named entity recognition and proper nouns
A hybrid approach (statistical and grammatical) yields better results
Recall exceeds the numbers reported for other domains
Next step: aggregate post opinion data, using the content tags as anchor points
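The slides leave the aggregation step unspecified. As a hypothetical sketch of using tags as anchor points, assume each post already carries an opinion score in [-1, 1] from some upstream scorer (not described in the source). Averaging those scores per blog, tag and month gives a numeric attitude per topic, and reading the months in order traces how that attitude evolves, as the motivation on the Goal slide describes. The function name, blog names and scores are illustrative only.

```python
from collections import defaultdict
from datetime import date


def aggregate_opinions(posts):
    """posts: iterable of (blog, post_date, tags, score) tuples.

    score is a per-post opinion value assumed to come from some upstream
    opinion scorer (hypothetical). Returns {(blog, tag, "YYYY-MM"): mean score}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for blog, post_date, tags, score in posts:
        month = post_date.strftime("%Y-%m")
        # Each tag is an anchor: the post's score counts toward every topic it carries.
        for tag in tags:
            key = (blog, tag, month)
            sums[key] += score
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Toy posts with made-up scores, for illustration only.
    posts = [
        ("blog_a", date(2009, 7, 1), {"health care"}, 0.6),
        ("blog_a", date(2009, 7, 15), {"health care", "senate"}, 0.2),
        ("blog_b", date(2009, 7, 3), {"health care"}, -0.7),
    ]
    for key, mean in sorted(aggregate_opinions(posts).items()):
        print(key, round(mean, 2))
```

A plain mean weights every post equally; weighting by post length or classifier confidence would be alternative design choices.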