ICDM 2015 Presentation
SimNest: Social Media Nested Epidemic Simulation via Online Semi-supervised Deep Learning
Liang Zhao, Virginia Tech
Joint work with Jiangzhuo Chen (1), Feng Chen (2), Wei Wang (1), Chang-Tien Lu (1), and Naren Ramakrishnan (1)
(1) Virginia Tech, (2) SUNY-Albany
Introduction: Epidemics
• Seasonal influenza:
• 3 to 5 million cases of severe illness per year
• 250,000 to 500,000 deaths per year
• Pandemic flu of 1918:
• Killed 2.5% to 5% of the global population
• Many more were sick
• Ebola outbreak in West Africa:
• 27,055 cases
• 11,142 deaths
Introduction: Seasonal Epidemics
[Figure panels: Week 45, Week 46, Week 47]
Influenza outbreak in the southern region in Week 47, ending Nov 22, 2014
Sources: CDC (www.cdc.gov/flu/); Google Flu Trends (https://www.google.org/flutrends/about/)
Epidemics Modeling (Category 1):
Computational Epidemiology
1. Model the following mechanisms
a. Demographics and social contact network
b. Disease progression
c. Interventions: school closure, vaccination, isolation
2. Tune parameters against surveillance data
3. Run simulation model
Epidemics Modeling (Category 1):
Computational Epidemiology
• Challenges
– Challenge 1: Coarse-grained surveillance data
• Spatially state-wise, temporally week-wise.
– Challenge 2: Dynamics of contact networks
• Peter moves out to another city.
• Taylor is immune to flu after getting a flu shot.
• Jim is on vacation from Dec 23.
– Challenge 3: Poor timeliness
• Surveillance data arrives weekly.
• Surveillance data is at least one week behind.
Epidemics Modeling (Category 2):
Data Driven on Social Media
• Fast monitoring of real-time epidemics
• Individual-wise health condition mining
• Temporally fine-grained
• No delay
1. Identify the response to flu
– What will Peter do in flu season? Avoid crowds, get a flu shot, …
2. Identify the individual's disease progression
– "Feel I'm getting flu"
– "AHA, false alarm"
– "Maybe I indeed need see doctor"
Epidemics Modeling (Category 2):
Data Driven on Social Media
Have No Idea of the Underlying Mechanism
What is the real mechanism of disease progression?
What is the infection process of flu across crowds?
What is the consequence if someone gets vaccinated?
Does it affect infectivity if Jim goes on summer holiday?
Challenge: The real mechanism is hidden from social media
Motivations
Computational Epidemiology
• Advantages:
– Mechanism of disease progression
– Mechanism of disease diffusion
– Consideration of interventions
• Drawbacks:
– Temporally coarse-grained
– Spatially coarse-grained
– Poor dynamics in social contact network
– One-week delay
Social Media Mining
• Advantages:
– Temporally fine-grained
– Spatially fine-grained
– Individual-level monitoring
– Changes in the social contact network are observable in real time
– No time delay
• Drawbacks:
– No mechanism of disease progression
– No mechanism of disease diffusion
– No consideration of interventions
Combine: advantages of Computational Epidemiology + advantages of Social Media Mining
Model (part A): Supervised Loss
• Input (tweet content):
• Output (health stage):
• Mapping:
I: Infectious
• Supervised Loss:
$f_W(\cdot)$: one-hidden-layer perceptron
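A minimal sketch of the mapping from tweet content to health stage, assuming a bag-of-words input, a sigmoid hidden layer, a softmax output over the four stages, and a cross-entropy supervised loss. These choices, the keyword features, and all function names are illustrative assumptions; the slide only states that $f_W(\cdot)$ is a one-hidden-layer perceptron.

```python
import math
import random

STAGES = ["S", "E", "I", "R"]  # health-stage labels used in the slides


def init_params(n_in, n_hidden, n_out, seed=0):
    """Randomly initialise W = (W1, b1, W2, b2) with small weights."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_out


def forward(params, x):
    """f_W(x): one sigmoid hidden layer followed by a softmax over stages."""
    W1, b1, W2, b2 = params
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(W1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def supervised_loss(params, examples):
    """Mean cross-entropy over labelled (features, stage) pairs (assumed loss)."""
    loss = 0.0
    for x, stage in examples:
        probs = forward(params, x)
        loss -= math.log(probs[STAGES.index(stage)])
    return loss / len(examples)


# Toy bag-of-words features for three hypothetical keywords.
examples = [([1.0, 0.0, 0.0], "I"), ([0.0, 1.0, 0.0], "S"), ([0.0, 0.0, 1.0], "R")]
params = init_params(n_in=3, n_hidden=4, n_out=len(STAGES))
probs = forward(params, examples[0][0])
loss = supervised_loss(params, examples)
```

With untrained small weights the output is close to uniform over the four stages, so the loss starts near log 4; training would lower it on the labelled tweets.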
Model (part B): Bi-space Consistency Loss
• Social Contact Network:
– Nodes: $\mathcal{V}$
– Edges: $\mathcal{E}$
– Weights: $\mathcal{W}$, contact duration between connected individuals
• Disease Progression: SEIR model
– Individual's health stage: Susceptible (S), Exposed (E), Infectious (I), or Recovered (R)
– Progression: S → E → I → R
– Incubation period: $p_E(v) \sim \mathcal{N}(\mu_E, \sigma_E)$
– Infectious period: $p_I(v) \sim \mathcal{N}(\mu_I, \sigma_I)$
• Bi-space Loss:
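As an illustration of the SEIR progression on a weighted contact network described on this slide (not of the bi-space loss itself), here is a small individual-level simulation. The daily time step, the rounding of periods to whole days, the infection probability `1 - (1 - transmissibility) ** hours`, and every name in the code are assumptions made for the sketch.

```python
import random


def simulate_seir(edges, seeds, days, mu_E, sigma_E, mu_I, sigma_I,
                  transmissibility, seed=0):
    """Simulate individual-level SEIR on a weighted contact network.

    edges maps (u, v) -> contact duration in hours per day (undirected).
    Returns a dict: node -> list of daily stages, starting at day 0.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: [] for n in nodes}
    for (u, v), hours in edges.items():
        neighbors[u].append((v, hours))
        neighbors[v].append((u, hours))

    def draw(mu, sigma):
        # Period length in whole days, at least one day.
        return max(1, round(rng.gauss(mu, sigma)))

    stage = {n: "S" for n in nodes}
    timer = {n: 0 for n in nodes}  # days left in the current E or I stage
    for s in seeds:
        stage[s], timer[s] = "I", draw(mu_I, sigma_I)
    history = {n: [stage[n]] for n in nodes}

    for _ in range(days):
        newly_exposed = []
        for u in nodes:
            if stage[u] != "I":
                continue
            for v, hours in neighbors[u]:
                # Assumed form: infection chance grows with contact duration.
                p = 1.0 - (1.0 - transmissibility) ** hours
                if stage[v] == "S" and rng.random() < p:
                    newly_exposed.append(v)
        # Advance E -> I and I -> R when a period runs out.
        for n in nodes:
            if stage[n] in ("E", "I"):
                timer[n] -= 1
                if timer[n] == 0:
                    if stage[n] == "E":
                        stage[n], timer[n] = "I", draw(mu_I, sigma_I)
                    else:
                        stage[n] = "R"
        # Apply today's exposures after the progression step.
        for v in newly_exposed:
            if stage[v] == "S":
                stage[v], timer[v] = "E", draw(mu_E, sigma_E)
        for n in nodes:
            history[n].append(stage[n])
    return history


edges = {("Peter", "Taylor"): 8.0, ("Taylor", "Jim"): 2.0}
history = simulate_seir(edges, seeds=["Peter"], days=30,
                        mu_E=2.0, sigma_E=0.5, mu_I=4.0, sigma_I=1.0,
                        transmissibility=0.05)
```

Each individual's history only ever moves forward through S, E, I, R, which is the property the later temporal pattern slide relies on.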
Model (part C): Infectious Period Loss
• The infectious period observed in social media should be statistically consistent with that in the disease progression model.
• Maximize the likelihood, and re-arrange:
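One way to read the likelihood step, assuming the infectious period is Gaussian with mean $\mu_I$ and standard deviation $\sigma_I$ as on the Part B slide: the loss is the negative log-likelihood of the infectious periods read off social media. The function name and the sample values are hypothetical, and the rearranged form used in the talk may differ.

```python
import math


def infectious_period_nll(observed_days, mu_I, sigma_I):
    """Negative Gaussian log-likelihood of observed infectious periods."""
    n = len(observed_days)
    const = n * (math.log(sigma_I) + 0.5 * math.log(2.0 * math.pi))
    squared = sum((d - mu_I) ** 2 for d in observed_days) / (2.0 * sigma_I ** 2)
    return const + squared


# Hypothetical per-user infectious periods (days) inferred from tweets.
observed = [3, 4, 5, 4]
loss_match = infectious_period_nll(observed, mu_I=4.0, sigma_I=1.0)
loss_mismatch = infectious_period_nll(observed, mu_I=8.0, sigma_I=1.0)
```

The loss is lower when the model's mean infectious period agrees with what is observed, so minimizing it pulls the two spaces toward consistency.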
Model (part D): Temporal Pattern Loss
• Health stages should be consecutive.
• An individual who recovers from flu cannot get it again.
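A small sketch of how the two temporal constraints could be checked on a predicted stage sequence. It assumes "consecutive" means a stage either persists or advances by exactly one step along S → E → I → R; the penalty (a plain violation count) and the function name are illustrative, not the loss used in the talk.

```python
STAGE_ORDER = {"S": 0, "E": 1, "I": 2, "R": 3}


def temporal_violations(stages):
    """Count day-to-day transitions that break S -> E -> I -> R.

    A transition is valid if the stage stays the same or advances by
    exactly one step; going backwards (e.g. R -> I) or skipping a stage
    (e.g. S -> I) counts as one violation.
    """
    violations = 0
    for prev, cur in zip(stages, stages[1:]):
        step = STAGE_ORDER[cur] - STAGE_ORDER[prev]
        if step not in (0, 1):
            violations += 1
    return violations


ok = temporal_violations(["S", "S", "E", "I", "I", "R"])   # 0 violations
bad = temporal_violations(["S", "I", "R", "I"])            # 2 violations
```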
Online Training Algorithm
• Objective function:
• Alternating optimization:
– Solve for $W$ with the others fixed:
• Stochastic gradient descent
– Solve for $\Theta$ with the others fixed:
• Nelder-Mead method
– Solve for $p_I$ and $\lambda_1$.
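The alternating scheme can be sketched on a toy problem. Everything here is an assumption for illustration: the two-variable objective stands in for the combined loss, plain gradient steps stand in for stochastic gradient descent on $W$, and the Nelder-Mead routine is a simplified textbook variant (reflection, expansion, inside contraction, shrink) rather than the implementation used in the talk.

```python
def nelder_mead(f, x0, step=0.5, iters=60):
    """Simplified Nelder-Mead: reflection, expansion, contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every point halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (pj - b) for b, pj in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def objective(w, theta):
    # Toy stand-in for the combined loss; minimum at w = 2, theta = 1.
    return (w - 2.0 * theta) ** 2 + (theta - 1.0) ** 2


def alternate(w=0.0, theta=0.0, rounds=60, lr=0.1, grad_steps=20):
    for _ in range(rounds):
        # Block 1: gradient steps on w with theta fixed.
        for _ in range(grad_steps):
            w -= lr * 2.0 * (w - 2.0 * theta)
        # Block 2: derivative-free Nelder-Mead on theta with w fixed.
        theta = nelder_mead(lambda t: objective(w, t[0]), [theta])[0]
    return w, theta


w_opt, theta_opt = alternate()
```

A derivative-free method suits the simulation parameters because the simulator's output is not differentiable with respect to them, while the network weights can use gradients.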
Model Extensions
1. Consider dynamics of contact network
– Dynamically adjust the transmissibility:
2. Consider heterogeneous surveillance
– Loss:
– Scaling down time frame:
Experiments: Dataset
• Dataset:
– Twitter: 2011 to 2015 in the US.
– Locations: Connecticut (CT), Massachusetts (MA), Maryland (MD), Virginia (VA), and the District of Columbia (DC).
– Training set: Aug 1, 2011 to Jul 31, 2012.
– Test set: Aug 1, 2012 to Jul 31, 2014.
Experiments: Label and Metrics
• Label:
– Influenza statistics reported by the Centers for Disease Control and Prevention (CDC).
– Each week, the CDC publishes the percentage of physician visits related to influenza-like illness (ILI) within each major region of the United States.
• Metrics:
– Lead time: How much time the output is ahead of the input.
– Mean squared error (MSE)
– Pearson correlation
– P-value
– Peak time error: Error of the predicted time of peak value
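Three of the metrics above can be computed as follows on weekly series. The sample values are hypothetical, the peak time error is assumed to be measured in weeks, and lead time and the p-value are left out because they depend on details not given on the slide.

```python
import math


def mse(pred, truth):
    """Mean squared error between two equally long series."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def pearson(pred, truth):
    """Pearson correlation coefficient between two series."""
    n = len(truth)
    mp, mt = sum(pred) / n, sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in truth)
    return cov / math.sqrt(var_p * var_t)


def peak_time_error(pred, truth):
    """Absolute difference, in weeks, between predicted and true peak weeks."""
    return abs(pred.index(max(pred)) - truth.index(max(truth)))


truth = [1.0, 2.0, 4.0, 3.0, 1.5]  # hypothetical weekly ILI percentages
pred = [1.2, 2.5, 3.5, 3.8, 1.4]   # hypothetical model output

error = mse(pred, truth)
corr = pearson(pred, truth)
peak_err = peak_time_error(pred, truth)  # predicted peak is one week late
```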
Experiments: Comparison Methods
• Social media mining methods:
– Linear Autoregressive Exogenous model (LinARX)
– Logistic Autoregressive Exogenous model (LogARX)
– Simple Linear Regression model (simpleLinReg)
– Multi-variable linear regression model (multiLinReg)
• Computational epidemiology methods:
– SEIR
– EpiFast
• Detailed parameter settings:
– See here: http://people.cs.vt.edu/liangz8/materials/papers/SimNestAddon.pdf
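For orientation, the simplest of the social media mining baselines listed above can be sketched as a one-predictor least-squares fit. Using a weekly flu-tweet count as the single predictor is an assumption, as are the data values and the function name; the actual baseline settings are in the linked add-on document.

```python
def fit_simple_linreg(x, y):
    """Ordinary least squares for y ~ a * x + b with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


flu_tweets = [10.0, 20.0, 30.0, 40.0]  # hypothetical weekly flu-tweet counts
ili = [1.0, 2.0, 3.0, 4.0]             # hypothetical weekly ILI percentages
a, b = fit_simple_linreg(flu_tweets, ili)
next_week = a * 50.0 + b               # predicted ILI for 50 flu tweets
```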