Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Nonparametric Bayesian StorylineDetection from Microtexts
Vinodh Krishnan and Jacob EisensteinGeorgia Institute of Technology
Clustering microtexts into storylines
z = 1
Strong start for Barcelona
Oct 1, 1:15pmz = 2
Dog tuxedo bought with countycredit card
Oct 1, 1:23pm
z = 1
Messi scores! Barcelona up 1-0
Oct 1, 1:39pm
. . .
z = 3
Yellow card for Messi
Oct 8, 10:15am
Storyline detection is a multimodal clusteringproblem, involving content and time.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Clustering microtexts into storylines
z = 1 Strong start for Barcelona
Oct 1, 1:15pm
z = 2 Dog tuxedo bought with countycredit card
Oct 1, 1:23pm
z = 1 Messi scores! Barcelona up 1-0
Oct 1, 1:39pm
. . .z = 1 Yellow card for Messi
Oct 8, 10:15am
Storyline detection is a multimodal clusteringproblem, involving content and time.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Clustering microtexts into storylines
z = 1 Strong start for Barcelona Oct 1, 1:15pmz = 2 Dog tuxedo bought with county
credit cardOct 1, 1:23pm
z = 1 Messi scores! Barcelona up 1-0 Oct 1, 1:39pm. . .
z = 3 Yellow card for Messi Oct 8, 10:15am
Storyline detection is a multimodal clusteringproblem, involving content and time.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Clustering microtexts into storylines
z = 1 Strong start for Barcelona Oct 1, 1:15pmz = 2 Dog tuxedo bought with county
credit cardOct 1, 1:23pm
z = 1 Messi scores! Barcelona up 1-0 Oct 1, 1:39pm. . .
z = 3 Yellow card for Messi Oct 8, 10:15am
Storyline detection is a multimodal clusteringproblem, involving content and time.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
About timePrior approaches to modeling time
I Maximum temporal gap between items onsame storyline
I Look for attention peaks (Marcus et al., 2011)I Model temporal distribution per storyline (Ihler
et al., 2006; Wang & McCallum, 2006)
Problems with these approaches:I Storylines can have vastly different timescales,
might be periodic, etc.I Methods for determining number of storylines
are typically ad hoc.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
About timePrior approaches to modeling time
I Maximum temporal gap between items onsame storyline
I Look for attention peaks (Marcus et al., 2011)I Model temporal distribution per storyline (Ihler
et al., 2006; Wang & McCallum, 2006)Problems with these approaches:
I Storylines can have vastly different timescales,might be periodic, etc.
I Methods for determining number of storylinesare typically ad hoc.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
This work
A non-parametric Bayesian framework for storylinesI The number of storylines is a latent variable.I No parametric assumptions about the temporal structure
of storyline popularity.I Text is modeled as a bag-of-words, but the modular
framework admits arbitrary (centroid-based) models.I Linear-time inference via streaming sampling
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Modeling framework
Prior probability of storyline assignments,conditioned on timestamps
P(w , z | t) = P(z | t)K∏
k=1P({w i :zi=k})
Likelihood of text, computed per storyline
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
The prior over storyline assignments
We want a prior distribution P(z | t) that is:I nonparametric over the number of storylines;I nonparametric over the storyline temporal
distributions.How to do it?
The distance-dependentChinese restaurant process (Blei & Frazier, 2011)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
The prior over storyline assignments
We want a prior distribution P(z | t) that is:I nonparametric over the number of storylines;I nonparametric over the storyline temporal
distributions.How to do it? The distance-dependentChinese restaurant process (Blei & Frazier, 2011)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
From graphs to clusterings
c1
c2
c3
c4
Key idea of dd-CRP: “follower”graphs define clusterings.
I Z = ((1, 3, 4), (2))
I Z = ((1, 3), (2, 4))I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2), (4))
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
From graphs to clusterings
c1
c2
c3
c4
Key idea of dd-CRP: “follower”graphs define clusterings.
I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2, 4))
I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2), (4))
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
From graphs to clusterings
c1
c2
c3
c4
Key idea of dd-CRP: “follower”graphs define clusterings.
I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2, 4))I Z = ((1, 3, 4), (2))
I Z = ((1, 3), (2), (4))
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
From graphs to clusterings
c1
c2
c3
c4
Key idea of dd-CRP: “follower”graphs define clusterings.
I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2, 4))I Z = ((1, 3, 4), (2))I Z = ((1, 3), (2), (4))
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Prior distributionWe reformulate the prior over follower graphs:
P(z | t) = P(c | t) =N∏
i=1P(ci | ti , tci )
P(ci | ti , tci ) =
e−|ti−tci |/a, ci 6= iα, ci = i
I Probability of two documents being linkeddecreases exponentially with time gap ti − tj .
I The likelihood of a document linking to itself(starting a new cluster) is proportional to α.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Modeling framework
Prior probability of storyline assignments,conditioned on timestamps
P(w , z | t) = P(z | t)K∏
k=1P({w i :zi=k})
Likelihood of text, computed per storyline
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Likelihood
Cluster likelihoods are computed using the DirichletCompound Multinomial (Doyle & Elkan, 2009).
P(w) =K∏
k=1P({w i}zi=k)
=K∏
k=1
∫θPMN({w i}zi=k | θk)PDir(θk ; η)dθk
=K∏
k=1PDCM({w i}zi=k ; η),
where η is a concentration hyperparameter.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
The Dirichlet Compound MultinomialThe DCM is a distribution over vectors of counts,which rewards compact word distributions.
Mes
si
card
Barc
elon
a
yello
w
cred
it
tuxe
do
goal
word
0
1
2
3
4
5w1w2
10-5 10-4 10-3 10-2 10-1 100 101 102
η
70
60
50
40
30
20
10
logP
(w)
w1
w2
We set the hyperparameter η using a heuristic fromMinka (2012).
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Modeling framework
Prior probability of storyline assignments,conditioned on timestamps
P(w , z | t) = P(z | t)K∏
k=1P({w i :zi=k})
Likelihood of text, computed per storyline
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ Pr(ci = j)×P({wk}zk =zi∨zk =zj )
P({wk}zk =zi )× P({wk}zk =zj )
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ Pr(ci = j)×P({wk}zk =zi∨zk =zj )
P({wk}zk =zi )× P({wk}zk =zj )
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ e−t4−t1
a × P({w1,w3,w4)}P({w4})× P({w1,w3})
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ e−t4−t2
a × P({w2,w4)}P({w4})× P({w2})
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ e−t4−t3
a × P({w1,w3,w4)}P({w4})× P({w1,w3})
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ α× P({w4)}P({w4})
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Inference: Gibbs samplingc1
c2
c3
c4
I We iteratively cut and resampleeach link.
I Each link is sampled from thejoint probability,
Prsample
(ci = j | c−i ,w) ∝ Pr(ci = j)× P(w | c)
∝ Pr(ci = j)×P({wk}zk =zi∨zk =zj )
P({wk}zk =zi )× P({wk}zk =zj )
I Online inference: Gibbssampling restricted to a movingwindow (linear-time)
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
TREC 2014 TTG ResultsModel F1 F w
1
dd-CRP clustering models1. baseline 0.20 0.302. offline 0.29 0.343. online 0.29 0.35
Top systems from Trec-2014 TTG4. TTGPKUICST2 (Lv et al., 2014) 0.35 0.465. EM50 (Magdy et al., 2014) 0.25 0.386. hltcoeTTG1 (Xu et al., 2014) 0.28 0.37
I Online inference as accurate as offline GibbsI 2nd of 14 TREC systems on F1, 4th/14 on F w
1I We use the baseline retrieval model, 0.31 MAP
vs 0.5-0.6 MAP for best systems.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
TREC 2014 TTG ResultsModel F1 F w
1
dd-CRP clustering models1. baseline 0.20 0.302. offline 0.29 0.343. online 0.29 0.35
Top systems from Trec-2014 TTG4. TTGPKUICST2 (Lv et al., 2014) 0.35 0.465. EM50 (Magdy et al., 2014) 0.25 0.386. hltcoeTTG1 (Xu et al., 2014) 0.28 0.37
I Online inference as accurate as offline GibbsI 2nd of 14 TREC systems on F1, 4th/14 on F w
1I We use the baseline retrieval model, 0.31 MAP
vs 0.5-0.6 MAP for best systems.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
TREC 2014 TTG ResultsModel F1 F w
1
dd-CRP clustering models1. baseline 0.20 0.302. offline 0.29 0.343. online 0.29 0.35
Top systems from Trec-2014 TTG4. TTGPKUICST2 (Lv et al., 2014) 0.35 0.465. EM50 (Magdy et al., 2014) 0.25 0.386. hltcoeTTG1 (Xu et al., 2014) 0.28 0.37
I Online inference as accurate as offline GibbsI 2nd of 14 TREC systems on F1, 4th/14 on F w
1I We use the baseline retrieval model, 0.31 MAP
vs 0.5-0.6 MAP for best systems.Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Summary
I Nonparametric Bayesian storyline detectionincorporating content and time.
Content Centroid-based likelihood(Dirichlet Compound Multinomial)
Time Distance-based prior (ddCRP)Fancier likelihoods and distance functions canbe incorporated in future work!
I Our nonparametric model is competitive withTREC TTG systems, despite using a muchweaker retrieval model.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
Acknowledgments
I National Institutes for Health(R01GM112697-01)
I A Focused Research Award for computationaljournalism from Google
I CNewsStory 2016 reviewersI Patrick Violette and Irfan Essa
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
References I
Blei, D. M. & Frazier, P. I. (2011). Distance dependent chinese restaurant processes. Journal of Machine LearningResearch, 12(Aug), 2461–2488.
Doyle, G. & Elkan, C. (2009). Accounting for burstiness in topic models. In Proceedings of the 26th AnnualInternational Conference on Machine Learning, (pp. 281–288). ACM.
Ihler, A., Hutchins, J., & Smyth, P. (2006). Adaptive event detection with time-varying poisson processes. InKDD, (pp. 207–216). ACM.
Lv, C., Fan, F., Qiang, R., Fei, Y., & Yang, J. (2014). PKUICST at TREC 2014 Microblog Track: featureextraction for effective microblog search and adaptive clustering algorithms for TTG. Technical report, DTICDocument.
Magdy, W., Gao, W., Elganainy, T., & Wei, Z. (2014). Qcri at trec 2014: applying the kiss principle for the ttgtask in the microblog track. Technical report, DTIC Document.
Marcus, A., Bernstein, M. S., Badar, O., Karger, D. R., Madden, S., & Miller, R. C. (2011). Twitinfo: aggregatingand visualizing microblogs for event exploration. In chi, (pp. 227–236). ACM.
Minka, T. (2012). Estimating a dirichlet distribution.http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/minka-dirichlet.pdf.
Wang, X. & McCallum, A. (2006). Topics over time: a non-markov continuous-time model of topical trends. InProceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining,(pp. 424–433). ACM.
Xu, T., McNamee, P., & Oard, D. W. (2014). Hltcoe at trec 2014: Microblog and clinical decision support.
Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts