Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali and Vasileios Hatzivassiloglou
Human Language Technology Research Institute
The University of Texas at Dallas
June 6, 2010
NAACL-HLT 2010: Computational Linguistics in a World of Social Media
Los Angeles, California



Goal

- Goal: Identify topic tags of political blog posts
  - Tags are single words or groups of words
- Motivation: Build a system that
  - Collates information across blog posts
  - Combines evidence to numerically rate attitudes of blogs on different topics
  - Traces the evolution of attitudes over time
- Tags assigned to a post are collectively the post’s topical signature


Our Approach

- Train a Support Vector Machine for each possible tag
- Select the five strongest votes
- Investigated several features
  - Single words (baseline)
  - Syntactic groups (noun phrases and proper nouns, detected with shallow parsing)
  - Named Entity Recognition
  - Co-reference Resolution
  - Synonyms (using WordNet)
  - Word position (title versus body)
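
The approach on this slide (one classifier per tag, then the five strongest votes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-tag weights are hand-written stand-ins for trained linear SVMs, and `extract_features`, `TITLE_WEIGHT`, `decision_score`, `top_tags`, and the toy `models` are all hypothetical names and values. Up-weighting title words is only one assumed way to encode the title-versus-body feature.

```python
from collections import Counter

# Assumption: a word in the title counts twice as much as a word in the body.
TITLE_WEIGHT = 2.0


def extract_features(title, body):
    """Bag-of-words counts, with title words up-weighted (hypothetical scheme)."""
    features = Counter()
    for word in body.lower().split():
        features[word] += 1.0
    for word in title.lower().split():
        features[word] += TITLE_WEIGHT
    return features


def decision_score(weights, bias, features):
    """Linear decision value w.x + b, the quantity a linear SVM thresholds at 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def top_tags(models, features, k=5):
    """Score the post with every per-tag model and keep the k strongest votes."""
    scored = [(decision_score(w, b, features), tag) for tag, (w, b) in models.items()]
    # Highest score first; ties are broken alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]


# Toy stand-ins for trained per-tag models: tag -> (feature weights, bias).
models = {
    "iraq": ({"iraq": 1.0, "war": 0.5}, -0.5),
    "health care": ({"health": 1.0, "insurance": 0.8}, -0.5),
    "obama": ({"obama": 1.2}, -0.5),
    "economy": ({"jobs": 1.0, "economy": 1.0}, -0.5),
    "congress": ({"senate": 0.9, "congress": 1.0}, -0.5),
    "energy": ({"oil": 1.0}, -0.5),
}

features = extract_features(
    "Obama on health insurance",
    "the senate debated health insurance and jobs",
)
print(top_tags(models, features))
```

Note that the five best-scoring tags are returned even when some of their scores are negative, which matches the "five strongest votes" rule as stated on the slide; whether the original system also applied a score threshold is not specified.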


Data

- Collected data from two major political blogs
  - Daily Kos (100,000 blog posts)
  - Red State (70,000 blog posts)
- 787,780 tags across both blogs
- Covers the period 2003-2010 for Daily Kos and 2007-2010 for Red State


Results

- Baseline precision/recall (Single Words): 25.84%/54.97%
- +Stemming precision/recall: -0.46%/-0.62%
- +Proper Nouns precision/recall: +12.84%/+1.95%
- +Named Entities precision/recall: +12.23%/-8.53%
- All features
  - Automated Scoring precision/recall: 20.95%/65.123%
  - Manual Scoring precision/recall: 63.49%/72.71%
- Syntactic noun phrases help a lot
- Named entity recognition and proper nouns are excellent features
- Effect of co-reference resolution is marginal
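
For readers unfamiliar with the metrics, precision and recall for one post's tags can be computed as in the sketch below. It assumes a predicted tag counts as correct only on an exact string match with a gold tag, which may be stricter or looser than the scoring actually used; `precision_recall` and the example tags are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted tag set against the gold tag set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)  # tags that appear in both sets
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Five predicted tags against three gold tags, with two exact matches.
predicted = ["health care", "obama", "economy", "congress", "energy"]
gold = ["health care", "obama", "insurance reform"]
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Under this definition, always predicting five tags for a post with fewer than five gold tags caps precision below 100% even when every gold tag is found.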


Earlier Work

- Wang and Davison followed a similar approach with SVMs but for the purpose of query expansion and suggestion
  - Tags are assigned to web pages whereas we assign tags to individual posts
  - They report a precision of 45.25% and recall of 23.24% compared to our precision of 20.95% and recall of 65.123%
- Sood et al. find similar blog posts and filter tags
  - They report a precision of 13.11% and recall of 22.83% compared to our precision of 20.95% and recall of 65.123%


Conclusion

- Described and evaluated a tool for automatically tagging political blogs for topics
  - Tagging benefits from named entity recognition and proper nouns
  - Using a hybrid approach (statistical and grammatical) yields better results
  - Recall exceeds numbers reported for other domains
- Next step: Aggregate post opinion data, using the content tags as anchor points