Semantic Processing of Twitter Traffic for Epidemic Surveillance

David S. Hale, Alla Keselman, Thomas C. Rindflesch
Lister Hill National Center for Biomedical Communications
Specialized Information Services

DESCRIPTION

Overview of initial research using semantic natural language processing of "swine flu"-related Twitter posts, and an outline of next steps.

Page 1: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Semantic Processing of Twitter Traffic for Epidemic Surveillance

David S. Hale, Alla Keselman, Thomas C. Rindflesch

Lister Hill National Center for Biomedical Communications
Specialized Information Services

Page 2: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Pandemic Preparedness

Early detection is critical to effective response
• “The truth is out there, it is just not indexed well” (A bumper sticker; NLM parking lot)
Disaster information traffic: delays, loss, overload
Outbreak data requires fast collection and dissemination
• Collection – from disease to syndromic surveillance
• Dissemination – from formal announcements to informal channels
Government agencies are entering web 2.0 innovations
• E.g., CDC on Twitter

Page 3: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Monitoring the Internet for Syndromic Surveillance

Current methods: keyword analysis
News
• Aggregator and visualization tools (e.g., HealthMap)
Web searches - queries
• Google Trends; Google Flu Trends
• Brownstein et al. – peak for “food poisoning” preceded peak for “salmonella”, “peanut butter”, “recall”
Requires massive amounts of data
Ambiguous as to searchers’ precise information needs

Page 4: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Distribution of “swine flu” Google queries

Page 5: Semantic Processing of Twitter Traffic for Epidemic Surveillance

The Future of Syndromic Surveillance

Social media, chatter
• Blogs, “microblogging”
• More real-time data
• Monitor sentiment as well as events
NLP analysis
• Requires less data / lower computational intensity
• More informative
• “swine flu” and “travel” vs. “how fast swine flu travels” and “is it safe to travel during a swine flu epidemic”
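The contrast between keywords and full phrases can be illustrated with a minimal sketch. It assumes a simple case-insensitive substring match as a stand-in for keyword analysis; the function name `mentions_all` is hypothetical and not part of any tool named in this deck. Both example questions match the same keyword pair even though they express different information needs.

```python
def mentions_all(text: str, keywords: list[str]) -> bool:
    """Return True if every keyword occurs somewhere in the text (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


# The two phrases quoted on the slide.
queries = [
    "how fast swine flu travels",
    "is it safe to travel during a swine flu epidemic",
]
keywords = ["swine flu", "travel"]

# Keyword matching cannot tell the two questions apart.
matches = [mentions_all(query, keywords) for query in queries]
print(matches)
```

Distinguishing the two would require looking at how the terms relate to each other, which is the role the deck assigns to NLP analysis.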

Page 6: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Twitter

Micro-blogging service
• SMS gateway enables posting from mobile devices
• Users post without breaking context or setting
• JIT (just-in-time) blogging
API promotes community development of user experience and interaction
4-5 million users (Nov 2008)
17 million visitors (April 2009)

Page 7: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Tweet Characteristics

Format: [username] [text] [date time client]
Length: Text limited to 140 characters
Char Set: Not limited to ISO 8859-1 Western (Latin)
Grammaticality: Variable
Hashtags (#): Denote topics
• Primarily utilized by experienced users
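One way to hold these characteristics in code is sketched below. The `Tweet` class, its field names, and the sample values (including the `#swineflu` hashtag and the user name) are hypothetical illustrations; only the four-part format and the 140-character limit come from the slide.

```python
import re
from dataclasses import dataclass

MAX_TWEET_LENGTH = 140  # character limit stated on the slide


@dataclass
class Tweet:
    username: str
    text: str
    posted: str  # date and time, kept as a raw string in this sketch
    client: str

    def is_valid_length(self) -> bool:
        """Check the text against the 140-character limit."""
        return len(self.text) <= MAX_TWEET_LENGTH

    def hashtags(self) -> list[str]:
        """Return topic tags without the leading '#'."""
        # \w is Unicode-aware in Python 3, so tags outside Latin-1 are kept too.
        return re.findall(r"#(\w+)", self.text)


# Hypothetical record; the hashtag is added here purely for illustration.
sample = Tweet(
    username="example_user",
    text="What's next? Three-toed sloth flu? #swineflu",
    posted="2009-04-27 10:00",
    client="web",
)
print(sample.is_valid_length(), sample.hashtags())
```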

Page 8: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Tweet Content

Some provide (purported) information
• Authority not determined
Majority express opinions
• Often with humor or sarcasm
Value for syndromic surveillance
• Source for assessing public sentiment
• Observation of information trending
• As a guide for government action

Page 9: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Tweets: Examples

CDC tips for preventing the flu: wash hands often and stay home when sick
Oklahoma health officials say swine flu headed to state, public needs to take precautions
Napolitano says “not a pandemic” yet
I bet this whole swine flu scare really has Kermit the Frog rethinking his relationship
What’s next? Three-toed sloth flu?

Page 10: Semantic Processing of Twitter Traffic for Epidemic Surveillance

NLP Analysis

Unified Medical Language System (UMLS)
• Medical concepts in semantic types (or classes)
MetaMap
• Identifies UMLS concepts in text
SemRep
• Identifies semantic relations between concepts
• Example: Rifampin for tuberculosis
• Rifampin [Pharmacologic Substance] TREATS Tuberculosis [Disease or Syndrome]
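The relation in the Rifampin example can be represented as a subject, predicate, object triple whose arguments carry a semantic type. The sketch below is one possible data structure, assuming nothing about the actual MetaMap or SemRep output formats; the class names `Concept` and `Predication` are chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str           # concept name, e.g. "Rifampin"
    semantic_type: str  # UMLS semantic type, e.g. "Pharmacologic Substance"


@dataclass(frozen=True)
class Predication:
    subject: Concept
    predicate: str      # relation name, e.g. "TREATS"
    obj: Concept

    def __str__(self) -> str:
        # Render in the same layout the slide uses.
        return (
            f"{self.subject.name} [{self.subject.semantic_type}] "
            f"{self.predicate} "
            f"{self.obj.name} [{self.obj.semantic_type}]"
        )


rifampin = Concept("Rifampin", "Pharmacologic Substance")
tuberculosis = Concept("Tuberculosis", "Disease or Syndrome")
example = Predication(rifampin, "TREATS", tuberculosis)
print(example)
```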

Page 11: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Monitoring Twitter with NLP

Processed 1300 Twitter posts
• Known to be about swine flu
• Sent during 1 hour on Monday, April 27, 2009
Preprocessed, to accommodate format
Ran MetaMap and SemRep
• Extracted semantic concepts and relationships
Defined a semantic schema for influenza epidemic
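The deck does not say what the preprocessing step involved. The sketch below shows what such a step might look like, under the assumption that the goal is to hand plain ASCII sentences to the NLP tools: it drops URLs, retweet markers, and @mentions, keeps hashtag words, and folds accented characters to ASCII. The function name `preprocess` and every rule in it are assumptions, not the authors' method.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")       # "@user" or "@user:"
RETWEET_RE = re.compile(r"^RT\b:?\s*")   # leading retweet marker


def preprocess(text: str) -> str:
    """Normalize one tweet's text for downstream NLP (illustrative rules only)."""
    text = URL_RE.sub(" ", text)
    text = RETWEET_RE.sub("", text.strip())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the topic word, drop the marker
    # Fold accented letters to their base form; characters with no ASCII
    # equivalent are dropped by the "ignore" error handler.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return " ".join(text.split())  # collapse runs of whitespace


raw = "RT @someone: Texas confirms third case of swine flu #swineflu http://example.com/x"
print(preprocess(raw))
```

The retweet prefix, user name, hashtag, and URL in `raw` are invented for the example; only the sentence itself comes from the deck.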

Page 12: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Schema: UMLS Semantic Types

Focus output
• In the area of interest
• And with the components in that area
Schema for influenza epidemic
• Disease or Syndrome
• Sign or Symptom
• Geographic Area
• Mammal
• Health Care Organization
• Medical Device
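Filtering by schema amounts to keeping only concepts whose semantic type is in the list above. A minimal sketch follows, applied to the concepts the deck reports for the tweet “Texas confirms third case of swine flu”. The function name `filter_by_schema` is hypothetical. The results slide labels CDC as [Health Care Related Organization], so that spelling is included alongside the schema's own label as an assumption.

```python
SCHEMA = frozenset({
    "Disease or Syndrome",
    "Sign or Symptom",
    "Geographic Area",
    "Mammal",
    "Health Care Organization",
    # Assumption: the results slide uses this longer label for the same type.
    "Health Care Related Organization",
    "Medical Device",
})


def filter_by_schema(concepts, schema=SCHEMA):
    """Keep (name, semantic_type) pairs whose type is part of the schema."""
    return [(name, stype) for name, stype in concepts if stype in schema]


# Concepts reported in the deck for one example tweet.
extracted = [
    ("Texas", "Geographic Area"),
    ("Third", "Quantitative Concept"),
    ("Family suidae", "Mammal"),
    ("Influenza", "Disease or Syndrome"),
]
kept = filter_by_schema(extracted)
print(kept)
```

Here the quantitative concept “Third” is dropped because its type is outside the schema.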

Page 13: Semantic Processing of Twitter Traffic for Epidemic Surveillance

MetaMap and SemRep Output

Tweet
• Texas confirms third case of swine flu
Concepts extracted
• Texas [Geographic Area]
• Third [Quantitative Concept]
• Family suidae [Mammal]
• Influenza [Disease or Syndrome]
Relationship
• Influenza PROCESS_OF Family suidae

Page 14: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Results: Most Frequent Concepts

371 Family suidae [Mammal]
324 Influenza [Disease or Syndrome]
115 Not [Functional Concept]
113 Mexico [Geographic Area]
89 Centers for Disease Control and Prevention (U.S.) [Health Care Related Organization]
71 Case unit dose [Quantitative Concept]
54 Time [Temporal Concept]
53 Pandemics [Phenomenon or Process]
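A frequency table like this one can be produced by tallying (concept, semantic type) pairs across all processed tweets. The sketch below assumes every mention is counted, which the deck does not state. The input is toy data: the first list is the deck's Texas example, and the second is invented for illustration and is not taken from the study.

```python
from collections import Counter


def concept_frequencies(tweets_concepts):
    """Count how often each (name, semantic_type) pair occurs across tweets."""
    counts = Counter()
    for concepts in tweets_concepts:
        counts.update(concepts)
    return counts


# Toy input, not the study's data.
toy_output = [
    [
        ("Texas", "Geographic Area"),
        ("Third", "Quantitative Concept"),
        ("Family suidae", "Mammal"),
        ("Influenza", "Disease or Syndrome"),
    ],
    [
        ("Family suidae", "Mammal"),
        ("Influenza", "Disease or Syndrome"),
        ("Mexico", "Geographic Area"),
    ],
]

counts = concept_frequencies(toy_output)
for (name, stype), n in counts.most_common(3):
    print(f"{n} {name} [{stype}]")
```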

Page 15: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Results: Filtered through Schema

Disease or Syndrome: Influenza
Sign or Symptom: Coughing
Geographic Area: Mexico
Mammal: Family suidae
Health Care Organization: Centers for Disease Control and Prevention (U.S.)
Medical Device: Mask

Page 16: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Results: PROCESS_OF Relation

Influenza PROCESS_OF Family suidae
Influenza PROCESS_OF Farmer, unspecified
Influenza PROCESS_OF Hispanics
Influenza PROCESS_OF Mexican
Influenza in Birds PROCESS_OF Human
Influenza-like symptoms PROCESS_OF Passenger
Flu symptoms PROCESS_OF Family suidae
Swine influenza PROCESS_OF Family suidae
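Relations listed in this "subject PREDICATE object" form can be split into triples and grouped, for instance by the entity affected. The sketch below is a hedged illustration that parses the text lines as they appear on this slide; it is not a parser for SemRep's native output, and `parse_relation` is a hypothetical helper.

```python
from collections import defaultdict


def parse_relation(line: str, predicate: str = "PROCESS_OF"):
    """Split 'subject PREDICATE object' into a (subject, predicate, object) triple."""
    subject, separator, obj = line.partition(f" {predicate} ")
    if not separator:
        raise ValueError(f"no {predicate} relation in: {line!r}")
    return subject.strip(), predicate, obj.strip()


# The eight relations listed on the slide.
lines = [
    "Influenza PROCESS_OF Family suidae",
    "Influenza PROCESS_OF Farmer, unspecified",
    "Influenza PROCESS_OF Hispanics",
    "Influenza PROCESS_OF Mexican",
    "Influenza in Birds PROCESS_OF Human",
    "Influenza-like symptoms PROCESS_OF Passenger",
    "Flu symptoms PROCESS_OF Family suidae",
    "Swine influenza PROCESS_OF Family suidae",
]

triples = [parse_relation(line) for line in lines]

# Group the conditions by the entity they are reported in.
by_object = defaultdict(list)
for subject, _, obj in triples:
    by_object[obj].append(subject)

for obj, subjects in by_object.items():
    print(f"{obj}: {', '.join(subjects)}")
```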

Page 17: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Next Steps

Twitter access
Further testing for effectiveness
• Refine filters (frequency, semantic types)
  • User control
Implement proof-of-concept
• Preprocessing for tweet format
• NLP
• Final filtering
• Output format
  • Graphs

Page 18: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Opportunities

Biosurveillance
• Monitoring of widespread sentiment
• Targeted information provision
  • Respond to misinformation trends
Potential for evaluating authenticity
• Semantic comparison to trusted source

Page 19: Semantic Processing of Twitter Traffic for Epidemic Surveillance

Conclusion

Exploiting the Internet for disaster preparedness
• Assessing public sentiment and events
• Leveraging social media, e.g. Twitter
• Using semantic NLP
Useful to CDC and other government agencies
Proof-of-concept experiment suggests the viability of this approach