Dynamic Multi-Faceted Topic Discovery in Twitter
Jan Vosecky, Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng

DESCRIPTION

Discovering high-level topics from social streams is important for many downstream applications. However, traditional text mining methods that rely on the bag-of-words model are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, topics in Twitter are inherently dynamic and often focus on specific entities, such as people or organizations. In this paper, we therefore propose a method for mining multi-faceted topics from Twitter streams. The Multi-Faceted Topic Model (MfTM) jointly models latent semantics among terms and entities and captures the temporal characteristics of each topic. We develop an efficient online inference method for MfTM, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We further demonstrate the effectiveness of our framework in the context of tweet clustering. More info: http://www.cse.ust.hk/~jvosecky/

Page 1: Dynamic Multi-Faceted Topic Discovery in Twitter

Dynamic Multi-Faceted Topic Discovery in Twitter

Jan Vosecky

Di Jiang

Kenneth Wai-Ting Leung

Wilfred Ng

Page 2: Dynamic Multi-Faceted Topic Discovery in Twitter

Twitter

Page 3: Dynamic Multi-Faceted Topic Discovery in Twitter

Representation

• Vector space model
  – Term vector sparseness issue

• Topic models
  – Latent topic vector better than VSM?
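The sparseness issue can be illustrated with a small sketch. The two example tweets and the whitespace tokenizer are assumptions made for illustration, not taken from the slides, and the code uses raw term counts rather than full TF-IDF weights. It shows that two short texts on a related subject can share no terms at all, so their term vectors have zero similarity.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts using a naive whitespace tokenizer."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tweets about a related subject with no overlapping terms.
tweet_a = term_vector("protests spread across tripoli")
tweet_b = term_vector("libya uprising continues")
similarity = cosine(tweet_a, tweet_b)
print(similarity)
```

A latent topic representation is one way to relate such texts, since both could receive high weight on the same topic even without shared vocabulary.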

Page 4: Dynamic Multi-Faceted Topic Discovery in Twitter

Topic Models

A latent topic in LDA

“Arab revolutions”

Libya 0.00040
Force 0.00020
Human 0.00010
Abuse 0.00010
Protect 0.00009
Secure 0.00008
War 0.00005
Execute 0.00004

Page 5: Dynamic Multi-Faceted Topic Discovery in Twitter

A topic in Twitter?

• Not just words
• People talk about entities

Entity types shown: Persons, Organizations, Locations, Time, …

Page 6: Dynamic Multi-Faceted Topic Discovery in Twitter

Multi-faceted Topic Model

• Each topic consists of n facets
  – Elements of each facet ~ multinomial distribution

• Each document d is a distribution over topics
  – General terms, named entities and timestamp drawn from the respective facet of topic z
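The generative idea on this slide can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' model: the `TOPICS` table, the facet names, the symmetric Dirichlet prior `alpha`, and the choice to draw a topic z separately for every element (as in LDA) are all hypothetical.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_from(dist, rng):
    """Draw one element from an {element: probability} multinomial."""
    elements = list(dist)
    weights = [dist[e] for e in elements]
    return rng.choices(elements, weights=weights, k=1)[0]

# Hypothetical toy topics: each topic holds one multinomial per facet.
TOPICS = [
    {
        "terms": {"force": 0.5, "protect": 0.3, "war": 0.2},
        "locations": {"libya": 0.7, "egypt": 0.3},
        "time": {"2011-02": 0.6, "2011-03": 0.4},
    },
    {
        "terms": {"match": 0.6, "goal": 0.4},
        "locations": {"london": 1.0},
        "time": {"2011-05": 1.0},
    },
]

def generate_tweet(facet_lengths, alpha=0.5, seed=0):
    """Sample a document-topic distribution, then elements for each facet."""
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, len(TOPICS), rng)
    tweet = {}
    for facet, n in facet_lengths.items():
        tweet[facet] = []
        for _ in range(n):
            # Assumption: a topic z is drawn per element from theta.
            z = rng.choices(range(len(TOPICS)), weights=theta, k=1)[0]
            tweet[facet].append(sample_from(TOPICS[z][facet], rng))
    return theta, tweet

theta, tweet = generate_tweet({"terms": 3, "locations": 1, "time": 1})
print(theta, tweet)
```

Each element is drawn from the facet-specific multinomial of its topic, so terms, entities, and timestamps of the same topic tend to co-occur in a document.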

Page 7: Dynamic Multi-Faceted Topic Discovery in Twitter

Multi-faceted Topic Model

Multi-faceted latent topic “Arab revolutions”

Facets: General terms, Persons, Locations, Organizations, Time

Page 8: Dynamic Multi-Faceted Topic Discovery in Twitter

Parameter Inference

• Scalability
  – Gibbs sampling and variational inference process data in a batch

• Online inference
  – Stochastic variational inference to process streaming data
  – Model continuously updated
  – Constant time to process a new doc

[Figure: a stream of docs processed in groups, with an inference step after each group]
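The "model continuously updated" idea can be sketched with the generic stochastic variational inference update, in which the global parameters are blended with a noisy estimate computed from one mini-batch using a decaying step size. This is not the MfTM-specific update: the function names, the `prior` value, and the defaults for `tau0` and `kappa` are assumptions chosen for illustration.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size rho_t = (tau0 + t) ** (-kappa)."""
    return (tau0 + t) ** (-kappa)

def online_update(global_params, batch_counts, t, corpus_size, batch_size, prior=0.01):
    """Blend current global parameters with an estimate from one mini-batch."""
    rho = step_size(t)
    # Scale the batch statistics as if the whole corpus looked like this batch.
    scale = corpus_size / batch_size
    updated = {}
    for key in set(global_params) | set(batch_counts):
        estimate = prior + scale * batch_counts.get(key, 0.0)
        updated[key] = (1.0 - rho) * global_params.get(key, prior) + rho * estimate
    return updated

# Hypothetical expected counts for one topic from two successive mini-batches.
batches = [
    {"libya": 2.0, "force": 1.0},
    {"libya": 1.0, "war": 1.0},
]
params = {}
for t, batch in enumerate(batches):
    params = online_update(params, batch, t, corpus_size=1000, batch_size=2)
print(params)
```

Each update touches only the current mini-batch, so the cost per step does not grow with the number of documents already seen.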

Page 9: Dynamic Multi-Faceted Topic Discovery in Twitter

Perplexity comparison: Online inference vs. Gibbs sampling

[Figure panels: K = 50, K = 200]
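For reference, perplexity is commonly computed as the exponential of the negative mean per-token log-likelihood on held-out data, with lower values indicating a better fit. The sketch below assumes the per-token log-probabilities (natural log) have already been obtained from some model; the `uniform` example is hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Exponential of the negative mean per-token log-likelihood."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that spreads probability uniformly over 50 outcomes.
uniform = [math.log(1.0 / 50)] * 20
print(perplexity(uniform))
```

A uniform model over 50 outcomes yields a perplexity of about 50, which gives an intuition for the scale: the value behaves like an effective number of equally likely choices per token.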

Page 10: Dynamic Multi-Faceted Topic Discovery in Twitter

Tweet Clustering

[Figure panels: (a) Manually-labeled dataset, (b) Hashtag-labeled dataset; clustering methods in each panel: DBSCAN, K-means, Direct; label: Vector space model (TF-IDF)]
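The slide does not define "Direct". One plausible reading, assumed here purely for illustration, is that each tweet is assigned to the cluster of its highest-probability topic, with no separate clustering algorithm. The function name and the toy topic vectors below are hypothetical.

```python
from collections import defaultdict

def direct_clusters(topic_vectors):
    """Group item ids by the index of their highest-probability topic."""
    clusters = defaultdict(list)
    for item_id, vector in topic_vectors.items():
        best_topic = max(range(len(vector)), key=vector.__getitem__)
        clusters[best_topic].append(item_id)
    return dict(clusters)

# Hypothetical per-tweet topic distributions over three topics.
tweets = {
    "t1": [0.8, 0.1, 0.1],
    "t2": [0.7, 0.2, 0.1],
    "t3": [0.1, 0.1, 0.8],
}
clusters = direct_clusters(tweets)
print(clusters)
```

Under this reading the number of clusters is bounded by the number of topics, whereas K-means and DBSCAN would instead operate on the topic vectors (or TF-IDF vectors) as points in a feature space.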

Page 11: Dynamic Multi-Faceted Topic Discovery in Twitter

Summary

• Model multi-faceted topics in microblogs
  – Entity-oriented and dynamic

• Online inference method

• Beneficial for downstream applications

Page 12: Dynamic Multi-Faceted Topic Discovery in Twitter

Thank You!

Jan Vosecky

Di Jiang

Kenneth Wai-Ting Leung

Wilfred Ng