Early Detection of Persistent Topics in Social Networks Shota Saito 1 Ryota Tomioka 2 Kenji Yamanishi 1, 3 1 The University of Tokyo 2 Toyota Technological Institute at Chicago 3 JST, CREST

Early Detection of Persistent Topics in Social Networks


DESCRIPTION

Abstract: In social networking services (SNSs), persistent topics are extremely rare and valuable. In this paper, we propose an algorithm for detecting persistent topics in SNSs based on the topic graph. A topic graph is a subgraph of the ordinary social network graph that consists of the users who shared a certain topic up to some time point. Based on the assumption that the time-evolution of the topic graph differs between persistent and non-persistent topics, we propose to detect persistent topics by performing anomaly detection on the feature values extracted from the time-evolution of the topic graph. For anomaly detection, we use principal component analysis to capture the subspace spanned by normal (non-persistent) topics. We demonstrate our technique on a real data set we gathered from Twitter and show that it performs significantly better than baseline methods based on power law curve fitting and the linear influence model. These are the slides I used when I presented the following paper: Shota Saito, Ryota Tomioka, and Kenji Yamanishi. Early Detection of Persistent Topics in Social Networks. In Proceedings of Advances in Social Networks Analysis and Mining, pp. xx-xx, 2014. The author copy of this paper is available from my website: sites.google.com/site/ssaito1989


Page 1: Early Detection of Persistent Topics in Social Networks

Early Detection of Persistent Topics in Social Networks

Shota Saito (1), Ryota Tomioka (2), Kenji Yamanishi (1, 3)

(1) The University of Tokyo (2) Toyota Technological Institute at Chicago (3) JST, CREST

Page 2: Early Detection of Persistent Topics in Social Networks

Agenda: Early Detection of Persistent Topics in Social Networks

1. Background and Related Work
2. Proposed Method
   1. Approach of the Proposed Method
   2. Mathematical Modelling
3. Experimental Results on Twitter Data
   1. Comparison with Existing Methods
   2. Effect of Feature Combination
4. Conclusion


Page 4: Early Detection of Persistent Topics in Social Networks

Motivation: Long-term persistent topics have a long tail in the number of sharers

• Our goal: predict as early as possible whether or not a topic will be shared persistently
  Persistent topic: a topic that is shared persistently over a long period

• What is a ‘long-term persistent topic’?
  😕 Judge from the last shared date

[Figure: # of sharers per unit time against time, with marks at 3 days, 10 days, and the present. For the topic shown, it is not appropriate to regard it as persistent.]

Page 5: Early Detection of Persistent Topics in Social Networks

Motivation: Long-term persistent topics have a long tail in the number of sharers

• Our goal: predict as early as possible whether or not a topic will be shared persistently
  Persistent topic: a topic that is shared persistently over a long period

• What is a ‘long-term persistent topic’?
  😃 Judge from the long tail of the number of sharers

[Figure: # of sharers per unit time against time, with marks at 3 days, 10 days, and the present; the long tail is highlighted.]

But the long tail can be observed only after a certain time has elapsed
-> We would like to judge by looking only at the early period

Page 6: Early Detection of Persistent Topics in Social Networks

Motivation: Long-term persistent topics have a long tail in the number of sharers

• 698 topics retweeted over 500 times
• Plot the amplification factor a_p, defined as
  a_p = (# of RTs within 50 days) / (# of RTs within 10 days),
  against # of RTs
  a_p > 1.1 -> persistent (marked blue)
  a_p < 1.1 -> non-persistent (marked red)

• # of RTs doesn’t matter

• Note that we can draw this picture only after 50 days have elapsed
  -> We would like to know as soon as possible

[Figure: scatter plot of a_p against # of RTs; persistent topics in blue, non-persistent topics in red.]
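The amplification factor can be illustrated with a small sketch. The function names, the use of `datetime` timestamps, and the toy retweet times are assumptions made for illustration; only the 10-day and 50-day windows and the 1.1 threshold come from the slide.

```python
from datetime import datetime, timedelta

PERSISTENCE_THRESHOLD = 1.1  # threshold on a_p taken from the slide


def amplification_factor(posted_at, retweet_times, early_days=10, late_days=50):
    """Return (# of RTs within late_days) / (# of RTs within early_days)."""
    early_end = posted_at + timedelta(days=early_days)
    late_end = posted_at + timedelta(days=late_days)
    early = sum(1 for t in retweet_times if posted_at <= t <= early_end)
    late = sum(1 for t in retweet_times if posted_at <= t <= late_end)
    if early == 0:
        raise ValueError("no retweets in the early window; a_p is undefined")
    return late / early


def is_persistent(a_p, threshold=PERSISTENCE_THRESHOLD):
    """Label a topic as persistent when its amplification factor exceeds the threshold."""
    return a_p > threshold


posted = datetime(2014, 1, 1)
# Toy data: 10 retweets in the first 10 days, 2 more between day 10 and day 50.
retweets = [posted + timedelta(days=d) for d in range(1, 11)] + [
    posted + timedelta(days=20),
    posted + timedelta(days=40),
]
a_p = amplification_factor(posted, retweets)
print(a_p, is_persistent(a_p))
```

Note that this label needs the full 50 days of data, which is exactly why it can serve as ground truth but not as an early predictor.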

Page 7: Early Detection of Persistent Topics in Social Networks

Motivation: Find “valuable topics” in social networks as early as possible

• Social networking services (SNSs) have been growing recently
  -> More and more “non-valuable” topics in SNSs

• What is a “valuable” topic in SNSs?
  😕 Topics shared by many people
  😕 Topics shared by influencers or authorised accounts

Page 8: Early Detection of Persistent Topics in Social Networks

Motivation: Example of a non-valuable topic posted by an influencer

Indeed, he is an influencer, but…

The tweet got attention, but…

Page 9: Early Detection of Persistent Topics in Social Networks

Motivation: Find “valuable topics” in social networks as early as possible

• Social networking services (SNSs) have been growing recently
  -> More and more “non-valuable” topics in SNSs

• What is a “valuable” topic in SNSs?
  😕 Topics shared by many people
  😕 Topics shared by influencers or authorised accounts
  😃 Topics shared for a long time: they survive persistently

Page 10: Early Detection of Persistent Topics in Social Networks

Motivation: Valuable topic example: not only in English, but also in other languages

Persistent topics are insightful:
• They provide insights for predicting fashions or trends
• They help predict whether a marketing campaign will succeed or not

Examples: Dropbox marketing campaign; emerging opinion leader or topic

Example tweet: Before I worked at Apple, I thought innovation was “to make something new.” But that is wrong; innovation is actually “to make something that will be ordinary in the future.” It takes time to understand the difference between the two.

Page 11: Early Detection of Persistent Topics in Social Networks

Motivation: Find “valuable topics” in social networks as early as possible

• Social networking services (SNSs) have been growing recently
  -> More and more “non-valuable” topics in SNSs

• What is a “valuable” topic in SNSs?
  😕 Topics shared by many people
  😕 Topics shared by influencers or authorised accounts
  😃 Topics shared for a long time: they survive persistently

We want to identify persistent topics as early as possible -> this makes it possible to foresee trends

Page 12: Early Detection of Persistent Topics in Social Networks

Related Work: No existing work focuses on predicting persistent topics

• Analysis of topics that get a lot of attention: who contributes?
  Social friendship network in SNSs
  • Influencers [Cha+ 10]
  • Weak ties [Bakshy+ 12]

• Problem: these studies mainly focus on attention-getting topics -> they offer few insights into persistent ones

Page 13: Early Detection of Persistent Topics in Social Networks

Related Work: No existing work focuses on predicting persistent topics

• Topic Detection and Tracking (TDT)
  Find a topic from sequential documents [Kleinberg 02]
  Problem: mainly uses natural language techniques -> in SNSs, many languages are used

  Find a topic in Twitter from anomalous mention behaviour [Takahashi 11]
  Problem: finds bursting topics, not persistent ones


Page 15: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs

• Approach of the proposed method
  Previous: language
  Proposed: network

• In particular:
  Previous: a fixed friendship network in the SNS
  Proposed: a graph consisting of the users who share the topic (topic graph)

Page 16: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: Comparison between the graph of the whole SNS and a topic graph

• Existing work focuses on the graph made of users and their friendships over the whole SNS

• We focus on a topic graph, which consists of the users who post or share the topic and their friendships

• Note that a topic graph is a subgraph of the graph of the whole SNS

Page 17: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs

• Approach of the proposed method
  Previous: language
  Proposed: network

• In particular:
  Previous: a fixed friendship network in the SNS
  Proposed: a graph consisting of the users who share the topic (topic graph)

Assumption: the topic graph of a persistent topic evolves over time differently from those of non-persistent topics

Page 18: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs

We focus on the time-evolution of topic graphs: we assume that the topic graph of a persistent topic evolves differently from those of other topics

Page 19: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs: an actual example

The time-evolution of a persistent topic’s topic graphs might differ from that of other topics

Page 20: Early Detection of Persistent Topics in Social Networks

Approach to Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs

Assumption: the topic graph of a persistent topic evolves over time differently from those of other topics

Our proposal: use a method that picks out what differs from the majority, namely anomaly detection
-> Apply an anomaly detection method to the time-evolution of the topic graph

Evaluating the topic graph: various feature values of complex networks
-> Utilise time series of various feature values of the topic graph

Page 21: Early Detection of Persistent Topics in Social Networks

Overview of Our Proposed Method: A persistent topic has an anomalous time sequence of topic graphs

Assumption: the topic graph of a persistent topic evolves over time differently from those of other topics

Our proposal:
1. Introduce feature values of complex networks to the topic graph
2. Utilise time series of various feature values
3. Apply anomaly detection via PCA


Page 25: Early Detection of Persistent Topics in Social Networks

Topic Graph: Define a topic graph as the graph consisting of the users who post or share the topic

• Let G be the topic graph of a topic at a given time
  Nodes: users who post or share the topic
  Edges: their friendships

[Figure: example topic graph, with the user who posts marked.]
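A minimal sketch of how such a topic graph could be built as an induced subgraph of the SNS friendship graph. The function names, the undirected treatment of friendships, and the toy users are assumptions for illustration, not the authors' implementation.

```python
def topic_graph(friendships, sharers):
    """Induced subgraph of the SNS graph on the users who posted or shared.

    friendships: iterable of (user_a, user_b) pairs, treated as undirected
                 (an assumption; follow relations on Twitter are directed).
    sharers: iterable of users who posted or shared the topic so far.
    Returns an adjacency dict {user: set of neighbouring sharers}.
    """
    nodes = set(sharers)
    graph = {u: set() for u in nodes}
    for a, b in friendships:
        if a in nodes and b in nodes and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_graph_sequence(friendships, share_events, sample_times):
    """Topic graph at each sampling time, built from (time, user) share events."""
    friendships = list(friendships)
    return [
        topic_graph(friendships, {u for t, u in share_events if t <= now})
        for now in sample_times
    ]


# Toy SNS: a chain of five users; "ann" posts, "bob" and "dan" share later.
sns_edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
events = [(0, "ann"), (1, "bob"), (3, "dan")]  # (hour, user)
g1, g3 = topic_graph_sequence(sns_edges, events, sample_times=[1, 3])
print(g1)
print(g3)
```

Because "cat" never shares the topic, "dan" is an isolated node in the later graph even though the two are friends in the whole SNS.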

Page 26: Early Detection of Persistent Topics in Social Networks

Topic Graph: Feature values we use, from global feature values to local feature values

• # of sharers
• # of communities
• Maximum distance from the origin (the user who posts)
• Eigenvalues of the graph Laplacian L (defined from the adjacency matrix and the degree matrix)

[Figure: four example topic graphs, one per feature, each with the user who posts marked.]
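The sketch below computes some of these features on an adjacency-dict graph. It rests on several assumptions: connected components are used only as a crude stand-in for communities (the slides do not say which community detection method was used), the Laplacian is taken in its standard unnormalised form L = D - A, and all function names are hypothetical. The eigenvalues themselves would come from passing L to a numerical eigen-solver, which is left out here.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_distance_from_origin(graph, origin):
    """Largest hop distance from the user who posted to any reachable sharer."""
    return max(bfs_distances(graph, origin).values())


def count_components(graph):
    """Connected components, used here only as a stand-in for communities."""
    seen, count = set(), 0
    for node in graph:
        if node not in seen:
            seen.update(bfs_distances(graph, node))
            count += 1
    return count


def graph_laplacian(graph):
    """Return (node order, L) with L = D - A as a list of rows."""
    nodes = sorted(graph)
    index = {u: i for i, u in enumerate(nodes)}
    lap = [[0] * len(nodes) for _ in nodes]
    for u in nodes:
        lap[index[u]][index[u]] = len(graph[u])  # degree matrix D
        for v in graph[u]:
            lap[index[u]][index[v]] = -1         # minus adjacency matrix A
    return nodes, lap


g = {"ann": {"bob"}, "bob": {"ann", "cat"}, "cat": {"bob"}, "dan": set()}
features = {
    "n_sharers": len(g),
    "n_components": count_components(g),
    "max_distance": max_distance_from_origin(g, "ann"),
}
nodes, L = graph_laplacian(g)
print(features)
```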


Page 28: Early Detection of Persistent Topics in Social Networks

Track Evolution of Topic Graph: Set all the feature values as one vector

• To track the time-evolution of topic graphs -> set all the feature values as one vector
  Data on one topic:
  y = (time series of # of shares, time series of # of communities, time series of maximum distance, time series of the second largest graph Laplacian eigenvalue, time series of the largest graph Laplacian eigenvalue)^T

[Figure: five panels plotting each feature value against time (h).]
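As a rough illustration of this stacking step, the sketch below concatenates five per-feature time series into one flat vector. The feature keys, their order, and the toy values are assumptions made for the example.

```python
FEATURE_ORDER = [
    "n_shares",
    "n_communities",
    "max_distance",
    "second_largest_eig",
    "largest_eig",
]


def topic_vector(series_by_feature, feature_order=FEATURE_ORDER):
    """Concatenate per-feature time series into a single flat vector y."""
    lengths = {len(series_by_feature[name]) for name in feature_order}
    if len(lengths) != 1:
        raise ValueError("all features must be sampled at the same time points")
    y = []
    for name in feature_order:
        y.extend(float(x) for x in series_by_feature[name])
    return y


# Toy topic sampled at 3 time points (values are made up for illustration).
topic = {
    "n_shares": [1, 4, 9],
    "n_communities": [1, 2, 2],
    "max_distance": [0, 1, 2],
    "second_largest_eig": [0.0, 1.0, 3.0],
    "largest_eig": [0.0, 2.0, 4.0],
}
y = topic_vector(topic)
print(len(y))
```

With 5 features and T sampling points, every topic becomes a vector of length 5T, so all topics must be observed over the same early period and at the same sampling interval.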


Page 30: Early Detection of Persistent Topics in Social Networks

Image of Anomaly Detection via PCA: Use the PCA-based anomaly detection method proposed by Lakhina+

• Change the basis from the original axes to principal components (PCs) [Pearson 1901]
  PC1: normal, PC2: anomalous

• Judge from the norm of the projection onto the anomalous subspace [Lakhina+ 04]
  Data 1: not an anomalous input
  Data 2: an anomalous input

[Figure: Data 1 and Data 2 plotted against PC1 and PC2.]

Page 31: Early Detection of Persistent Topics in Social Networks

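Principal Component Analysis (PCA): Choose a new basis that “describes” the data well and does not “miss” the data

• Let $Y$ be the matrix made of the vectors $y$ of non-persistent topics:
  $Y = (y_1, y_2, \ldots)^\top$

• Let $v_1$ be the first principal component; then
  $v_1 = \arg\max_{\|v\| = 1} \|Y v\|$

• Repeat this procedure. The $k$-th principal component is then obtained as
  $v_k = \arg\max_{\|v\| = 1} \left\| \left( Y - \sum_{i=1}^{k-1} Y v_i v_i^\top \right) v \right\|$

• Compose the normal subspace $S$ by picking the leading principal components
  Use the cumulative contribution ratio to decide how many
• Compose the anomalous subspace $\tilde{S}$ from the principal components that were not picked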

Page 32: Early Detection of Persistent Topics in Social Networks

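Anomaly Detection via PCA: Judge from the projection of the data onto the anomalous subspace

• Decompose the input data $y$ into
  $y = \hat{y} + \tilde{y}$, $\hat{y} \in S$, $\tilde{y} \in \tilde{S}$

• To obtain this, project $y$ onto each subspace; in particular $\tilde{y} = \tilde{C} y$, where $\tilde{C}$ is the projection onto $\tilde{S}$

• A topic is judged anomalous if $y$ has a large enough projection onto the anomalous subspace: if
  $\|\tilde{y}\| = \|\tilde{C} y\| > \delta_{\mathrm{PCA}}$
  for a threshold $\delta_{\mathrm{PCA}}$, then the topic $y$ is anomalous, i.e. persistent [Lakhina+ 04]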
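The subspace method can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the data are mean-centred before the decomposition, the normal subspace keeps enough principal components to reach a 90% cumulative contribution ratio, and the function names and toy data are made up. The score is the norm of the part of the input that the normal subspace cannot explain.

```python
import numpy as np


def fit_normal_subspace(Y, contribution=0.9):
    """Fit the normal subspace from rows of Y (one non-persistent topic per row).

    Returns the training mean and P, whose columns are the leading principal
    components needed to reach the given cumulative contribution ratio.
    """
    mean = Y.mean(axis=0)
    _, s, vt = np.linalg.svd(Y - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(ratio, contribution) + 1)
    return mean, vt[:r].T


def anomaly_score(y, mean, P):
    """Norm of the residual (I - P P^T)(y - mean), the anomalous part of y."""
    centred = y - mean
    residual = centred - P @ (P.T @ centred)
    return float(np.linalg.norm(residual))


rng = np.random.default_rng(0)
# Toy "normal" topics: points close to the line spanned by (1, 1, 0).
t = rng.normal(size=(200, 1))
Y = t * np.array([1.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(200, 3))
mean, P = fit_normal_subspace(Y)

normal_like = anomaly_score(np.array([2.0, 2.0, 0.0]), mean, P)
anomalous = anomaly_score(np.array([0.0, 0.0, 2.0]), mean, P)
print(normal_like, anomalous)
```

An input lying along the direction learnt from normal topics gets a score near zero, while an input orthogonal to it keeps almost all of its norm; comparing the score with a threshold gives the persistent / non-persistent decision.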


Page 34: Early Detection of Persistent Topics in Social Networks

Experiment 1: Evaluation of the Proposed Method: Predict whether a topic is persistent or not

• Experiment using Twitter data
  Topics: tweets that were retweeted by over 500 users and for which 50 days have passed
  698 tweets retweeted by 1.6M users
  Amplification factor: (# of RTs within 50 days) / (# of RTs within 10 days) > 1.1 -> persistent topic

• Goal: predict whether a topic is persistent or not, looking only at the early period of the topic

• Evaluate our method and the comparison methods by AUC

[Figure: timeline with marks at the 1st, 3rd, 5th, and 10th day (want to know in the early period) and after the 50th day (able to know the answer).]

Page 35: Early Detection of Persistent Topics in Social Networks

Experiment 1: Evaluation of the Proposed Method: Evaluation criterion: AUC

• AUC: a measure of the accuracy of a classifier
  AUC is the area under a curve. For a parameter (threshold) of a classifier, plot
    Vertical axis: true positive rate
    Horizontal axis: false positive rate
  and draw the curve by moving the parameter from -∞ to +∞

• Characteristics
  • The larger the AUC, the better the performance
  • AUC is 0.5 if you classify randomly
  • AUC is 1.0 for the best possible classifier
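AUC can also be computed without drawing the curve, because it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below uses that pairwise formulation; the function name and the toy scores are assumptions for illustration.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Equivalent to the area under the ROC curve; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy anomaly scores; label 1 = persistent, 0 = non-persistent.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for a test set of a few hundred topics.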

Page 36: Early Detection of Persistent Topics in Social Networks

Experiment 1: Evaluation of the Proposed Method: Experiment on our method

• Divided the 698 topics into training data and test data
  Test data: the 109 persistent topics plus 109 topics randomly picked from the 589 non-persistent topics
  Training data: the remaining 480 non-persistent topics

• Tried several sampling intervals: 1 hr, 3 hrs, 6 hrs, 12 hrs

• Use the anomaly scores to compute AUC

• Repeat 200 times

[Figure: the 589 non-persistent topics are split between training data and test data; the 109 persistent topics all go to the test data.]
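One round of this split could look like the sketch below. The topic identifiers, the function name, and the fixed random seed are assumptions; only the set sizes come from the slide.

```python
import random


def split_topics(persistent, non_persistent, rng):
    """One random split following the protocol on the slide.

    Test set: all persistent topics plus an equal number of randomly chosen
    non-persistent topics. Training set: the remaining non-persistent topics.
    """
    shuffled = list(non_persistent)
    rng.shuffle(shuffled)
    k = len(persistent)
    test = [(t, 1) for t in persistent] + [(t, 0) for t in shuffled[:k]]
    train = shuffled[k:]
    return train, test


rng = random.Random(0)  # fixed seed so the sketch is reproducible
persistent_ids = [f"p{i}" for i in range(109)]
non_persistent_ids = [f"n{i}" for i in range(589)]
train, test = split_topics(persistent_ids, non_persistent_ids, rng)
print(len(train), len(test))
```

Repeating the split with fresh random draws and averaging the resulting AUC values gives the repeated evaluation described above; training on non-persistent topics only is what makes the setup an anomaly detection problem rather than a supervised one.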

Page 37: Early Detection of Persistent Topics in Social Networks

Experiment 1: Comparison Method 1: Power law curve fitting

• Comparison method 1: power law curve fitting
  Ground truth of the long tail: fit a power law curve to the difference sequence of # of retweets
  Estimate $\alpha$ and $\beta$ of
    $n_t = \beta t^{\alpha}$
  where $n_t$ is the # of sharers per unit time at time $t$

• Fit to the 218 test data

• Use the estimated parameter and compute AUC

• Repeat 200 times
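A power law can be fitted by ordinary least squares in log-log space. The sketch below assumes the model has the form n_t = beta * t^alpha, which is our reading of the slide's formula, and the fitting procedure, function name, and synthetic data are assumptions rather than the authors' exact method.

```python
import math


def fit_power_law(counts):
    """Least-squares fit of log n_t = log(beta) + alpha * log(t), t = 1, 2, ...

    counts[i] is the number of sharers in time unit t = i + 1; zero counts are
    skipped because their logarithm is undefined.
    """
    points = [
        (math.log(t), math.log(n))
        for t, n in enumerate(counts, start=1)
        if n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two non-zero counts")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    alpha = sxy / sxx
    beta = math.exp(mean_y - alpha * mean_x)
    return alpha, beta


# Synthetic series that follows n_t = 100 * t^(-1.5) exactly.
counts = [100.0 * t ** -1.5 for t in range(1, 25)]
alpha, beta = fit_power_law(counts)
print(alpha, beta)
```

An exponent closer to zero means a slower decay, i.e. a heavier tail, so a fitted parameter of this kind can be used as a score for ranking topics.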

Page 38: Early Detection of Persistent Topics in Social Networks

Experiment 1: Comparison Method 1: Power law curve fitting: an actual example

[Figure: an example of power law curve fitting to a persistent topic.]

Page 39: Early Detection of Persistent Topics in Social Networks

Experiment 1: Comparison Method 2: Linear Influence Model

• Comparison method 2: Linear Influence Model (LIM) [Yang and Leskovec 10]
  Able to predict # of future retweets
  Predicts it as a superposition of users’ strengths of influence learnt from the past

• Predict # of retweets within 50 days and # of retweets within 10 days

• Use (# of retweets within 50 days) / (# of retweets within 10 days) and compute AUC

Page 40: Early Detection of Persistent Topics in Social Networks

Experiment 1: Result: Our method outperforms the two comparison methods in most cases

[Figure: two panels, “Sampling interval 1 hour” and “Best sampling interval”; higher is better.]

The proposed method outperforms the comparison methods if we have enough data points


Page 42: Early Detection of Persistent Topics in Social Networks

Experiment 2: Effect of Feature Combination: Use only one feature value to compose the normal subspace

• Which feature value contributes to detecting persistent topics?
  Same procedure as Experiment 1: apply the proposed method, but use only one of the features

• Evaluate by AUC
  High AUC -> that feature contributes to the accuracy of prediction
  Low AUC -> that feature does not contribute to the accuracy of prediction

Page 43: Early Detection of Persistent Topics in Social Networks

Experiment 2: Result: Our strategy might perform better if we incorporate as many features as possible

[Figure: results at a sampling interval of 1 hour; higher is better.]

Some features, like # of communities, perform strongly
Some features, like maximum distance, perform poorly
Overall, using all the features performs better


Page 45: Early Detection of Persistent Topics in Social Networks

Conclusion

• Problem: how to predict as soon as possible whether a topic is persistent or not

• Proposed a method based on the time-evolution of the topic graph

• Showed good performance on Twitter data

Page 46: Early Detection of Persistent Topics in Social Networks

Future Work

• Set a threshold to judge whether a topic is persistent or not

• Feature values of the topic graph
  How each feature value contributes to a topic becoming persistent
  Implications for the connection between the graph and the real world
  Feature value selection
  • Eigenvalues of the graph Laplacian are not a realistic feature value from the perspective of computation cost

• Other methods for the topic graph
  Supervised methods