Dynamics of Conversations ACM SIGKDD ’10 By Ravi Kumar, Mohammad Mahdian, & Mary McGlohon Presented by Annie T. Chen on March 29, 2011.

Page 1

Dynamics of Conversations

ACM SIGKDD ’10

By Ravi Kumar, Mohammad Mahdian, & Mary McGlohon

Presented by Annie T. Chen on March 29, 2011.

Page 2

Overview

RQ: What is the structure of online conversations?

Method
• Proposed a simple mathematical model for the structure of conversations
• Extended the model to account for factors such as recency and author identity that may affect conversations
• Compared the predictions of these models against the empirical data for three datasets: Usenet groups, Yahoo! Groups, and Twitter

Page 3

Properties of Conversations

Size and depth of thread
• Depth: length of the longest path from the root to a leaf in a thread
• Size is roughly quadratic in depth

Degree distribution p
• Close to a power law: p(k) is proportional to k raised to a negative exponent whose magnitude is greater than 2
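The size and depth definitions above can be made concrete with a minimal sketch. It assumes a thread is stored as a list of parent indices in posting order, so every reply appears after the message it answers; the function name and this representation are illustrative and not taken from the paper.

```python
def thread_size_and_depth(parent):
    """Return (size, depth) of a thread.

    parent[i] is the index of the message that message i replies to;
    the root has parent None. Assumes parent[i] < i for every reply.
    """
    depths = []
    for p in parent:
        # The root sits at depth 0; a reply sits one level below its parent.
        depths.append(0 if p is None else depths[p] + 1)
    return len(parent), max(depths)


# Root 0 with replies 1 and 2; message 3 replies to 1; message 4 replies to 3.
size, depth = thread_size_and_depth([None, 0, 0, 1, 3])
```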

Page 4

Branching Process Model (BP-Model) - 1

• The Galton-Watson branching process is a classic model for generating a random tree.
• At each step i of the process, each node generates a certain number of children according to the distribution p
• p(k): fraction of nodes with k children in the data
• Zi: number of nodes at the ith level of the thread
• Let μ = E[p], the mean of the distribution p
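As a rough illustration of the process described on this slide, the sketch below grows one Galton-Watson tree level by level. The offspring distribution in the usage lines is made up for the example, and `max_nodes` is a safety cap added here; neither comes from the paper.

```python
import random


def simulate_gw_sizes(p, max_nodes=10_000, rng=None):
    """Return the level sizes [Z_0, Z_1, ...] of one Galton-Watson tree.

    p maps a child count k to its probability p(k). The tree starts from a
    single root (Z_0 = 1). max_nodes is an illustrative cap so that a
    non-dying process still terminates.
    """
    rng = rng or random.Random()
    ks = list(p.keys())
    weights = list(p.values())
    sizes = [1]
    total = 1
    while sizes[-1] > 0 and total < max_nodes:
        # Every node on the current level draws its child count from p.
        children = sum(rng.choices(ks, weights=weights, k=sizes[-1]))
        sizes.append(children)
        total += children
    if sizes[-1] == 0:
        sizes.pop()  # drop the empty level that ended the process
    return sizes


# Hypothetical offspring distribution with mean 0.6 (below 1, so it dies out).
example_p = {0: 0.6, 1: 0.2, 2: 0.2}
level_sizes = simulate_gw_sizes(example_p, rng=random.Random(0))
```

Averaging `sum(simulate_gw_sizes(example_p))` over many runs should approach the expected thread size for a distribution with mean 0.6, namely 1 / (1 - 0.6) = 2.5.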

Page 5

Branching Process Model (BP-Model) - 2

• Let Z = Z0 + Z1 + Z2 + ... be the total size of the thread. From the definition of a branching process, it can be shown that E[Z] = (1 - μ)^(-1), where μ = E[p]
• Since μ < 1 for all datasets, the branching process dies out.
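The expected-size formula follows from a short standard argument, sketched here with μ denoting the mean of p, Z_i the number of nodes at level i, and Z the total thread size (a single root is assumed, so Z_0 = 1):

```latex
\begin{align*}
\mathbb{E}[Z_0] &= 1, \\
\mathbb{E}[Z_{i+1} \mid Z_i] &= \mu Z_i
  \;\Longrightarrow\; \mathbb{E}[Z_i] = \mu^{i}, \\
\mathbb{E}[Z] &= \sum_{i \ge 0} \mathbb{E}[Z_i]
  = \sum_{i \ge 0} \mu^{i} = \frac{1}{1-\mu} = (1-\mu)^{-1}
  \qquad (\mu < 1).
\end{align*}
```

The geometric series converges only when μ < 1, which is the same condition under which the process dies out.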

Figure: Empirical and Simulated panels.

Page 6

Branching Process Model (BP-Model) - 3

Problems with the BP-Model
• Model is not generative (degree distributions are stipulated)
• Model does not capture the depth distributions that are observed in reality
• Number of children is determined by a single distribution
• Timestamps are left out

Page 7

T-Model

• Concept: new messages receive more attention than old ones
• The probability of adding a child to node v is proportional to a function h(deg_v, r_v) of the degree and recency of v
• The probability of death (the thread ending) is proportional to a constant
• h(deg_v, r_v) = α·deg_v + τ^(r_v) for constants α ≥ 0 and τ ∈ (0, 1)
• Thus, both degree and recency play a role in generating different types of threads
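A minimal sketch of how a thread might be grown under the T-Model, writing the two constants as `alpha` (degree weight) and `tau` (recency base). Several details are assumptions made for illustration and are not confirmed by the slides: degree is taken as the number of replies a message has received, recency as the number of messages posted after it, and the names `death_weight` and `max_size` are hypothetical.

```python
import random


def simulate_t_model(alpha, tau, death_weight, max_size=1000, rng=None):
    """Grow one thread and return its list of parent indices.

    At each step the thread either ends, with weight death_weight, or a new
    message replies to node v with weight alpha * deg_v + tau ** r_v.
    Assumptions: deg_v = replies received by v so far; r_v = number of
    messages posted after v (0 for the newest message).
    """
    rng = rng or random.Random()
    parent = [None]  # message 0 is the root
    replies = [0]    # replies received by each message
    while len(parent) < max_size:
        n = len(parent)
        weights = [alpha * replies[v] + tau ** (n - 1 - v) for v in range(n)]
        weights.append(death_weight)  # index n stands for "thread dies"
        choice = rng.choices(range(n + 1), weights=weights, k=1)[0]
        if choice == n:
            break
        parent.append(choice)
        replies[choice] += 1
        replies.append(0)
    return parent


# Illustrative parameter values only.
thread = simulate_t_model(alpha=1.5, tau=0.9, death_weight=0.5,
                          rng=random.Random(1))
```

Under these assumptions, a large `alpha` concentrates replies on already popular messages, while a small `tau` concentrates them on the newest messages and produces long chains.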

Page 8

TI-Model - 1

• The TI-Model was developed to model author identity.
• Concept: authors tend to respond to responses to their own earlier messages.
• Based on the Pólya urn model

Original Pólya urn problem:
• Initially, an urn has x balls of color 1 and y balls of color 2. At each time t, one ball is drawn out and returned to the urn with another ball of the same color.
• “Rich get richer” process
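The urn process described above is easy to simulate. The sketch below follows the slide's description directly; the function and variable names are chosen here for illustration.

```python
import random


def polya_urn(x, y, steps, rng=None):
    """Run a two-color Pólya urn and return the final [color 1, color 2] counts.

    The urn starts with x balls of color 1 and y balls of color 2. At each
    step a ball is drawn uniformly at random and returned together with one
    extra ball of the same color.
    """
    rng = rng or random.Random()
    counts = [x, y]
    for _ in range(steps):
        # Drawing a uniformly random ball selects color 1 with probability
        # counts[0] / total, so the leading color tends to pull further ahead.
        color = 0 if rng.random() < counts[0] / (counts[0] + counts[1]) else 1
        counts[color] += 1
    return counts


final_counts = polya_urn(1, 1, 100, rng=random.Random(0))
```

Running it repeatedly from the same starting urn gives very different final proportions, because early draws are reinforced: this is the "rich get richer" behavior.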

Page 9

TI-Model - 2

• New message v arrives with u = parent(v)
• “Identity copying” effect: the author of v is either an author on path(parent(u)) or a random author

Figure: Empirical and Simulated panels.

Page 10

Examples

• Usenet
• Yahoo! Groups
• Twitter

Page 11

Usenet

Figure: Empirical and Simulated panels.

Page 12

Usenet

Group                                    α
it.discussioni.leggende.metropolitane    10
it.politica.polo                         10
rec.games.chess.politics                 3
bln.politik.rassismus                    2
sk.politics                              1.5

High α: higher degree of preferential attachment. The top groups tended to be politically related.

Group                     τ
fa.linux.kernel           0.98
uk.politics.electoral     0.98
rec.arts.drwho            0.97
uk.politics.crime         0.97
chile.soc.politica        0.96

High τ: high recency effect. Lower-traffic groups had a higher recency effect.

Page 13

Usenet Identity copying rates

• High parameter value (low copying rate): new authors tend to join in often
• Low parameter value (high copying rate): authors of posts tend to have already authored an earlier post

High parameter value (low copying rate):
• or.politics
• alt.fan.cecil-adams
• alt.marketplace.online.ebay
• pl.misc.kolej
• rec.arts.sf.written

Low parameter value (high copying rate):
• linux.debian.bugs.dist
• microsoft.public.excel.misc
• microsoft.public.excel.programming
• nctu.talk
• tw.bbs.campus.nctu

Page 14

Yahoo! Groups

Groups with “bushy” threads and high recency effects

“Bushy” threads (α = 10):
• indianmedical
• IllinoisSpeakers
• DetectiveRichardHead
• Bodybuildersaverageguys
• villageDesign

High recency effect (τ = 0.99):
• NorthCarolinaSpeakers
• stbaseliosorthodoxchurch
• LostnFoundEvents
• PatriceVinci
• molecular-biology-notebook

Page 15

Twitter

Groups with “bushy” threads and high recency effects

“Bushy” threads (α = 10):
• #mustsee
• #twitterinreallife
• #readingrainbow
• #whathappenswhen
• #vogueevolution

High recency effect (τ = 0.99):
• #yankees
• #warriors
• #tiff09
• #iranelectioni
• #followfriday

Page 16

Conclusion

• Employed various mathematical models to simulate patterns in online conversations

Strengths:
• Incorporated time and author identity in the models
• Were able to predict patterns that were found in actual datasets

Weaknesses / further directions:
• Explanatory power: how well do these models explain differences between conversational environments and/or networks?
• Could incorporate other elements of conversation:
  • Topics
  • Structural/semantic components of messages
  • Actor characteristics/roles
• How well do these models emulate different types of communication tools, e.g. Twitter?

Page 17

References

Aldous, D. (2003). Lecture 2: Branching Processes. Accessed March 29, 2011 at http://www.stat.berkeley.edu/~aldous/Networks/lec2.pdf.

Kumar, R., Mahdian, M., & McGlohon, M. (2010). Dynamics of conversations. ACM SIGKDD 2010.

Zhu, T. (2009). Nonlinear Polya Urn Models and Self-Organizing Processes. Accessed March 29, 2011 at http://www.math.upenn.edu/grad/dissertations/tongzhudissertation.pdf.