HAL Id: hal-00737767
https://hal.archives-ouvertes.fr/hal-00737767
Submitted on 2 Oct 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version: Jiali Lin, Zhenyu Li, Dong Wang, Kavé Salamatian, Gaogang Xie. Analysis and Comparison of Interaction Patterns in Online Social Network and Social Media. 21st International Conference on Computer Communications and Networks (ICCCN 2012), Jul 2012, Munich, Germany. pp.1-7. hal-00737767


Analysis and Comparison of Interaction Patterns in Online Social Network and Social Media

Jiali Lin∗†, Zhenyu Li∗, Dong Wang∗†, Kavé Salamatian‡, Gaogang Xie∗

∗Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
{linjiali, zyli, wangdong01, xie}@ict.ac.cn
†Graduate School of Chinese Academy of Sciences, Beijing, China
‡Université de Savoie, [email protected]

Abstract—In this work, we aim to analyze and compare interaction patterns in different types of social platforms. To this end, we measured Renren, the largest online social network in China, and Sina Weibo, the most popular microblog service in China. We model the interaction networks as unidirectional weighted graphs in light of the asymmetry of user interactions. Following this model, we first study the basic interaction patterns. Then, we examine whether the weak ties hypothesis holds in these interaction graphs and analyze its impact on information diffusion. Furthermore, we model the temporal patterns of user interactions and cluster users based on these patterns. Our findings demonstrate that although users in the two platforms share some common interaction patterns, users in Sina Weibo are more popular and diverse. Moreover, analysis and simulation results show that Sina Weibo is a more efficient platform for information diffusion. These findings provide an in-depth understanding of interaction patterns in different social platforms and can be used for the design of efficient information diffusion.

Index Terms—Interaction patterns, Social networks, Information diffusion

I. INTRODUCTION

In the past few years, the Internet has witnessed an unprecedented boom in online social services. Online social networks, such as Facebook¹ and Renren², have attracted millions of users. The key to their success is that they allow, or even encourage, users to interact with others by publishing their opinions, leaving messages, chatting, sharing content, and other social interactions. Later, microblog services, such as Twitter³ and Sina Weibo⁴, emerged as social media systems [8] and quickly became extremely popular.

The level of interaction between users characterizes their popularity and their friendship, which can be further leveraged for information diffusion. The interaction graphs of different social platforms have been examined in [8] and [13], and characteristics that distinguish the interaction graph from the social graph have been identified. Viswanath et al. [12] demonstrated that links in the Facebook interaction network fluctuate rapidly over time, suggesting that time should be taken into account when analyzing interaction networks. Jiang et al. [6] further analyzed the latent interaction graph in Renren. These works mainly focus on the difference between interaction and social graphs and only examine online social networks, such as Facebook, Cyworld and Renren. The interaction patterns in different types of social platforms remain poorly understood.

¹ http://www.facebook.com
² http://www.renren.com
³ http://www.twitter.com
⁴ http://weibo.com

Our study in this paper focuses on a comprehensive understanding of interaction patterns over time and a comparison of the patterns between an online social network (i.e. Renren) and a social media platform (i.e. Sina Weibo). To this end, we collected two large datasets: one for Renren and the other for Sina Weibo. Given that the interaction between two users is not necessarily reciprocal and that different users interact with different strengths, we model the interaction network as a unidirectional weighted graph.

We first analyze the basic interaction patterns in Renren and Sina Weibo, and then examine whether the weak ties hypothesis [10] holds in interaction graphs. A hidden Markov model (HMM) is used to characterize temporal interaction behaviors. Then, we cluster users with similar HMM parameters using a self-organizing map (SOM) in order to investigate whether there exist groups of users with similar temporal behavior patterns. In particular, our findings can be summarized as follows:

• While the in-degree distributions in the interaction graphs of both social platforms follow power-law models, we find stretched exponential node strength distributions in both interaction graphs. The stretched factor characterizes the number of cascade stages for information spreading.

• By measuring the coupling between interaction strengths and the local structure in the interaction graph, we find that the weak ties hypothesis holds for Renren, but not for Sina Weibo. The results indicate that Sina Weibo as a social media is more efficient than Renren as an online social network when it comes to information diffusion.

• Users in Sina Weibo are generally more popular than users in Renren. In particular, the top popular users in Sina Weibo have a much higher probability of receiving interactions than those in Renren.

• Users do show clustering behavior patterns in both social platforms, and users behave more diversely in Sina Weibo than in Renren. Moreover, users with similar behavior patterns do have some correlations in attributes.

Our findings demonstrate that although users in Renren and Sina Weibo share some common interaction patterns, users in Sina Weibo are more popular and diverse. Moreover, information diffusion in Sina Weibo is more efficient than that in Renren. These findings provide an in-depth understanding of interaction patterns in different social platforms and can be used for the design of efficient information diffusion. To the best of our knowledge, this work is the first to compare an online social network and a social media platform in terms of user interaction.

The rest of this paper is organized as follows. Section II describes the background and the datasets on Renren and Sina Weibo. We analyze the interaction networks of both websites in Section III. In Section IV, we adopt a hidden Markov model and a self-organizing map to dig into user behavior. Section V covers related work. Finally, we conclude in Section VI.

II. BACKGROUND AND DATASET

In this section, we first briefly introduce Renren and Sina Weibo. Then we describe the datasets.

A. Renren and Sina Weibo

Renren, with a claimed 170 million registered users, is the largest online social network in China [6]. Renren can be described as a Chinese clone of Facebook, since it is very similar to Facebook in both user interface and features. A mutual friendship between two users is built if and only if one sends a request and the other approves it. Renren provides various interaction applications. Each user has a gossip wall where visitors can leave messages. Another useful interaction application is 'Status', which enables users to inform their friends of their recent news or thoughts. Friends can see the status and may reply to it in a thread.

Sina Weibo was launched in 2009, about three years after Twitter. It is now the most popular microblog service platform in China, with more than 200 million registered users and 32 million daily active users. Like Twitter, Sina Weibo allows users to post 140-character messages (called tweets in Twitter), retweet and reply to others' tweets, and follow whoever they are interested in. Besides text and photos, Sina Weibo also allows users to post video clips and audio files. As a consequence, it broadens the boundary of microblogging, making it a platform for information dissemination (like a social media platform), social connection and entertainment.

B. Datasets

We collected two datasets for analysis, one for Renren and one for Sina Weibo.

1) Dataset of Renren. Our Renren dataset was collected from May 10th, 2010 to July 10th, 2010. Initially, 200 random users were selected. Then the crawler followed the friend links to find more users. In total, we got over 3 million users, of which 1,267,731 (42.3%) make their status publicly accessible. Among them, we selected the 191,121 users who updated their status and got feedback during the eight weeks of our study. We chose to gather the interaction data from the 'Status' feature because it has a detailed, time-stamped log of each user's interactions. Moreover, a larger proportion of users are willing to make their status public, and there are three times more interactions related to status than to blogs or the gossip wall.

2) Dataset of Sina Weibo. We crawled Sina Weibo from Oct 1st, 2010 to Oct 15th, 2010. We randomly chose 100 users as seeds and then followed the followers and followings of users to get more users. In total, we got over 3.75 million users. We captured the profile of each user, e.g. the number of followers and followings and the number of tweets he had posted. Then we collected the original tweets these users published between Aug 1st, 2010 and Sep 30th, 2010 and the comments on these tweets. We found that, among the 3.75 million users, 1,159,269 published tweets and got replies within the time window. We crawled Sina Weibo again on Dec 20th, 2011 to get the latest profiles of the 3.75 million users.

III. INTERACTION NETWORK ANALYSIS

In light of the asymmetry and skewed strength of user interactions, we model the interaction network as a unidirectional weighted graph. In this graph, nodes represent users, and a directed edge from A to B exists if and only if user A directly contacts B. By direct contact we mean that A replied to B's statuses at least once (for Renren) or that A replied to B's tweets at least once (for Sina Weibo). Throughout this paper, one reply counts as one interaction.

Each directed edge is weighted by the number of interactions it carries; the weight of the edge from A to B is denoted $w_{AB}$. For example, if B posts three replies to A's statuses or tweets, a directed edge from B to A is generated with weight $w_{BA} = 3$. Here, a node is equivalent to a user.
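The graph model above can be sketched in a few lines of Python. This is only an illustration: the input format (a list of `(replier, author)` pairs), the function names, and the choice to ignore replies to one's own content are assumptions, not details given in the paper.

```python
from collections import defaultdict

def build_interaction_graph(replies):
    """Return {(source, target): weight} from an iterable of (replier, author) pairs.

    Each pair stands for one reply, so it adds 1 to the weight of the
    directed edge replier -> author.
    """
    weights = defaultdict(int)
    for replier, author in replies:
        if replier == author:
            continue  # assumption: replies to one's own content are ignored
        weights[(replier, author)] += 1
    return dict(weights)

def in_degrees(weights):
    """Number of distinct users who replied to each user."""
    degree = defaultdict(int)
    for (_source, target) in weights:
        degree[target] += 1
    return dict(degree)

# The example from the text: B replies three times to A (plus one reply from C).
graph = build_interaction_graph([("B", "A"), ("B", "A"), ("B", "A"), ("C", "A")])
print(graph[("B", "A")])       # weight w_BA
print(in_degrees(graph)["A"])  # in-degree of A
```

A dictionary keyed by `(source, target)` keeps the sketch dependency-free; a graph library would serve equally well for larger datasets.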

A. Power-law Distribution for Node In-degree

As the network is directed, there are two types of node degree: in-degree and out-degree. Each reflects a distinct perspective: one's in-degree characterizes his popularity, whereas his out-degree represents his activity. In this paper, we are interested in user popularity. Thus, we analyze the in-degree distributions of Renren and Sina Weibo.

Figure 1 plots the complementary cumulative distribution function (CCDF) of in-degree in the two interaction graphs. The large majority of nodes have very low in-degree, implying the possible existence of sockpuppets with few interactions, but there are also very popular people with many friends interacting with them. The in-degree distributions in both Renren and Sina Weibo roughly follow a power law. The power-law coefficient α for Renren is 3.5, consistent with the value found by Jiang et al. [6]. It can also be seen that the in-degree distribution in the Sina Weibo interaction graph is less skewed than that in the Renren one.
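For readers who want to reproduce a plot of this kind, an empirical CCDF can be computed as in the sketch below. The function name is hypothetical, and the convention $P(X \ge x)$ (rather than $P(X > x)$) is an assumption chosen to match the form used for the stretched exponential later in this section.

```python
from collections import Counter

def ccdf(values):
    """Return sorted (v, P(X >= v)) pairs for a list of observed values."""
    n = len(values)
    counts = Counter(values)
    points = []
    remaining = n  # number of observations that are >= the current value
    for v in sorted(counts):
        points.append((v, remaining / n))
        remaining -= counts[v]
    return points

# Toy in-degree sample: most nodes have low in-degree, one node is popular.
print(ccdf([1, 1, 1, 2, 5]))
```

On log-log axes, a power law shows up as an approximately straight line through these points.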

Fig. 1. Node in-degree distribution
[Plot: CCDF versus in-degree on log-log axes; curves for Renren and Sina Weibo, annotated with slopes −3.5 and −1.27.]

Fig. 2. Rank-ordering distribution of node strength in Renren
[Plot: strength versus rank (log scale), shown in log-$y^c$ scale with the SE model fit and in log-log scale; c = 0.68, a = 8.357, b = 108.725, R² = 0.977165.]

B. Stretched Exponential Distribution of Node Strength

The node strength is defined as the sum of the weights of all incoming edges in the weighted directed graph, i.e. the strength of node $i$ with in-degree $k$ is defined as

$$s_i = \sum_{j=1}^{k} w_{ji}.$$

We are interested in whether the distribution of node strength follows a power-law model or not.

We rank the users according to node strength and plot the rank-ordering distributions for Renren and Sina Weibo in Figure 2 and Figure 3, respectively. A first glance may lead us to believe that a power law fits the node strength distribution well. In log-log scale, however, the distribution curves in both figures are not straight lines, meaning that node strength does not follow the power-law model well. Instead of simply fitting the distributions with other models, we analyze the reason behind this.

The statuses or tweets of a user may be forwarded by his friends to a large audience. Members of this audience may or may not reply to the statuses or tweets. The forwarding process can be modeled as a cascade process, which is formally defined as a random process $X_n$ that can be described as a product of $n$ random variables $m_1, \ldots, m_n$, i.e. $X_n = m_1 \times m_2 \times \ldots \times m_n$. Among all users that see a status or tweet through a link made in the $(i-1)$-th cascade stage ($1 < i \le n$), a percentage $\alpha_i$ will reply to the content and make a link to it. This means that the number of replies can be related to a factor $(1+\alpha_1)(1+\alpha_2)\ldots(1+\alpha_n)$, representing the overall number of replies after $n$ steps of the cascade process.

Fig. 3. Rank-ordering distribution of node strength in Sina Weibo
[Plot: strength versus rank (log scale), shown in log-$y^c$ scale with the SE model fit and in log-log scale; c = 0.235, a = 0.990, b = 15.027, R² = 0.993864.]

Quite interestingly, a limit theorem similar to the Central Limit Theorem can be derived for cascade processes [4]. These processes converge to a stretched exponential (SE) distribution defined as:

$$P(X \ge x) = e^{-(x/x_0)^c} \qquad (1)$$

where the stretched factor $c$ is the inverse of the number of multiplied random variables, i.e. the inverse of the number of cascade stages, and $x_0$ is a constant parameter. In a rank-ordering distribution, $N$ objects are ranked in descending order of their reference numbers. Then $P(X \ge x_i) = i/N$, where $i$ ($1 \le i \le N$) is the number of objects with reference numbers larger than or equal to $x_i$. That is, $\log(i/N) = -(x_i/x_0)^c$. Writing $y_i$ for $x_i$, we have

$$y_i^c = -a \log i + b \qquad (2)$$

where $a = x_0^c$ and $b = y_1^c$. Hence, the rank-ordering distribution curve of data following a stretched exponential model should be a straight line in log-$y^c$ scale, i.e. with rank on a logarithmic axis and strength raised to the power $c$.

We thus fit the SE distribution to node strength using the fitting method proposed by Guo et al. [5]. To gauge the fitting error, we use the coefficient of determination of the fit, also known as $R^2$. The closer $R^2$ is to 1, the better the model fits the empirical data. The results are plotted in Figure 2 and Figure 3. As expected, the SE model fits the node strength distributions of both Renren and Sina Weibo well.
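To make the fitting step concrete, the sketch below shows one simple way to fit Eq. 2. It is not the method of Guo et al. [5], whose details are not given here. It assumes a plain grid search over $c$, an ordinary least-squares line of $y_i^c$ against $\log i$ for each candidate, and selection of the $c$ with the highest $R^2$. The use of the natural logarithm and all function names are also assumptions.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

def fit_stretched_exponential(strengths, c_grid=None):
    """Return (c, a, b, R^2) for the rank-ordering form y_i^c = -a log i + b."""
    ys = sorted(strengths, reverse=True)          # rank 1 = largest strength
    log_ranks = [math.log(i) for i in range(1, len(ys) + 1)]
    if c_grid is None:
        c_grid = [k / 100 for k in range(1, 101)]  # candidates 0.01 .. 1.00
    best = None
    for c in c_grid:
        slope, intercept, r2 = linear_fit(log_ranks, [y ** c for y in ys])
        if best is None or r2 > best[3]:
            best = (c, -slope, intercept, r2)
    return best
```

Under these assumptions the intercept estimates $b = y_1^c$ and the negated slope estimates $a = x_0^c$, so the returned tuple lines up with the parameters reported in Figure 2 and Figure 3.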

An interesting finding is related to the stretched factor. The stretched factor $c$ for Renren is 0.68, meaning that a status is on average forwarded over 1∼2 hops ($1/c \approx 1.5$). For Sina Weibo, $c$ is 0.235, meaning an average of 4∼5 hops of forwarding ($1/c \approx 4.3$). This finding indicates that content in Sina Weibo can be forwarded to distant users. The reason is that Renren is friendship-based; users are almost only interested in interacting with their friends. Sina Weibo, however, can be treated as a social media [8] formed around content interests. Users in Sina Weibo may reply to whatever they are interested in.

C. Weak Ties Hypothesis Analysis

Social platforms are now widely used for information diffusion. The efficiency of diffusion greatly depends on the structure of the graph on which information is diffused. It has been found that ties of different strengths play different roles in information diffusion [1]. We here examine whether the weak ties hypothesis holds in the interaction graphs of Renren and Sina Weibo. The weak ties hypothesis states that the strength (i.e. weight) of a tie (i.e. edge) between A and B increases with the overlap of their friendship circles, which makes weak ties important for connecting communities. We define the friendship overlap of two users $i$ and $j$, connected by an edge $e_{ij}$ in the weighted directed graph, as:

$$O_{ij} = \frac{n_{ij}}{(d_i^{out} - 1) + (d_j^{in} - 1) - n_{ij}} \qquad (3)$$

where $n_{ij}$ is the number of two-hop paths from node $i$ to node $j$ (the edge $e_{ij}$ itself excluded), $d_i^{out}$ is the out-degree of node $i$, and $d_j^{in}$ is the in-degree of node $j$. If node $i$ and node $j$ have no common acquaintances, then $O_{ij} = 0$; if all the neighbors of node $i$ have a directed link to node $j$, then $O_{ij} = 1$.
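The overlap definition can be sketched as follows. The helper names and the edge-list input are hypothetical. Two conventions are assumptions not stated in the paper: self-loops are taken to be absent, and the overlap is defined as 0 when the denominator vanishes (neither node has any other neighbor).

```python
def build_adjacency(edges):
    """Return (successors, predecessors) dictionaries for a directed edge list."""
    successors, predecessors = {}, {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
        predecessors.setdefault(target, set()).add(source)
    return successors, predecessors

def friendship_overlap(successors, predecessors, i, j):
    """Friendship overlap O_ij of the directed edge i -> j (Eq. 3)."""
    out_i = successors.get(i, set())
    in_j = predecessors.get(j, set())
    if j not in out_i:
        raise ValueError("there is no edge from i to j")
    # Intermediate nodes k with i -> k and k -> j give the two-hop paths.
    n_ij = len((out_i & in_j) - {i, j})
    denominator = (len(out_i) - 1) + (len(in_j) - 1) - n_ij
    if denominator == 0:
        return 0.0  # assumption: no other neighbors means no overlap
    return n_ij / denominator

# Every other out-neighbor of i also links to j, so the overlap is 1.
succ, pred = build_adjacency(
    [("i", "j"), ("i", "k1"), ("i", "k2"), ("k1", "j"), ("k2", "j")]
)
print(friendship_overlap(succ, pred, "i", "j"))
```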

To quantify the correlation between tie strength and friendship overlap, we use the Spearman rank correlation $\rho$, which is defined as

$$\rho = 1 - \frac{6 \sum (x_i - y_i)^2}{n(n^2 - 1)} \qquad (4)$$

where $x_i$ and $y_i$ are the ranks of edge $i$ according to edge weight and friendship overlap, respectively, in an $n$-edge system. It is a non-parametric measure of correlation, which shows how well an arbitrary monotonic function could describe the relationship between two variables. The coefficient lies in [-1, 1], where 1 indicates perfect positive correlation and -1 indicates perfect negative correlation.

For every pair of users connected by an edge in the interaction graph, we compute their friendship overlap using Eq. 3, then use Spearman's correlation measure (Eq. 4) to compute the correlation between tie strength and friendship overlap. The results for both Renren and Sina Weibo are listed in Table I. To avoid tied ranks among the edges with the least weight, we bin the ties based on edge weight.
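A minimal sketch of the correlation step is given below. Because many edges share the same weight, it computes Spearman's coefficient as the Pearson correlation of average ranks, which coincides with Eq. 4 when there are no ties. This tie handling is an assumption for illustration, not the paper's exact procedure, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation; returns NaN if either variable is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Edge weights versus overlaps for five hypothetical edges.
print(spearman([1, 2, 3, 4, 5], [0.2, 0.1, 0.4, 0.3, 0.5]))
```

For the five edges in the usage line, the rank differences are (−1, 1, −1, 1, 0), so Eq. 4 gives $1 - 6 \cdot 4 / (5 \cdot 24) = 0.8$.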

TABLE I
SPEARMAN'S RANK CORRELATION COEFFICIENTS

Tie Strength   All     [0,10)   [10,100)   [100,1000)   [1000,10000)
Renren         0.484   0.487    0.359      0.182        N/A
Sina Weibo     0.408   0.386    0.082      -0.013       0.055

It can be seen that the correlations in Renren are relatively high for all bins, meaning that the hypothesis holds for Renren. For Sina Weibo, however, the correlations for edges with high strengths are very close to 0, meaning that the hypothesis does not hold. Recall that Renren, like Facebook, is based on real social relationships and aims at maintaining existing friendships and making new friends. As a result, users with close social relationships are densely connected with each other in clusters, and the links among different social clusters act as weak ties. On the other hand, Sina Weibo is a social media platform where users can follow whoever they like.

Onnela et al. [10] found that a social graph in which the weak ties hypothesis holds is not efficient for information diffusion. Next, we evaluate the performance of information diffusion in the Renren and Sina Weibo interaction graphs.

D. Information Diffusion Simulation

We follow the simulations in [10] to evaluate the performance of information diffusion in the Renren and Sina Weibo interaction graphs. The spreading mechanism is similar to the susceptible-infected model of epidemiology, in which recovery is impossible, implying that an infected individual will continue transmitting information. We simulate two scenarios.

The first scenario is called the real simulation, in which the infection probability $P_{ij}$ from an infected node $j$ to node $i$ is set according to the tie strength $w_{ij}$ in the interaction graph, i.e. $P_{ij} = x w_{ij}$. The parameter $x$ controls the overall spread rate. Changing the value of $x$ does not change the qualitative nature of the results [10]. The second scenario is called the control simulation, in which the network (i.e. the set of ties) is the same but the strength of every edge is set to the average over all edges. That means the infection probability is the same on every edge.

We chose 10 nodes as the initially infected nodes, holding a novel message at time 0. The initial node sets in the two scenarios are the same. Obviously, the set of infected nodes is bounded by the connected components that contain the 10 initial nodes.
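The two scenarios can be sketched with a small susceptible-infected simulation. Several details below are assumptions made for illustration: time advances in synchronous steps, the message travels from an infected user $j$ to each user $i$ who has an edge $i \to j$ (i.e. who interacts with $j$), and probabilities above 1 are capped. The function and parameter names are hypothetical.

```python
import random

def simulate_si(weights, seeds, x, steps, rng, control=False):
    """Return the number of infected nodes after each step (index 0 = start).

    weights : {(i, j): w_ij} for directed edges i -> j
    control : if True, every edge uses the average weight instead of its own
    """
    listeners = {}  # j -> [(i, w_ij)]: users that j can pass the message to
    for (i, j), w in weights.items():
        listeners.setdefault(j, []).append((i, w))
    mean_weight = sum(weights.values()) / len(weights)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for j in infected:
            for i, w in listeners.get(j, []):
                if i in infected or i in newly_infected:
                    continue
                p = min(1.0, x * (mean_weight if control else w))
                if rng.random() < p:
                    newly_infected.add(i)
        infected |= newly_infected  # no recovery: infected nodes stay infected
        history.append(len(infected))
    return history

# Tiny chain: b replies to a, c replies to b; the message starts at a.
toy = {("b", "a"): 2, ("c", "b"): 1}
print(simulate_si(toy, ["a"], x=1.0, steps=3, rng=random.Random(0)))
```

Running the function twice with the same seeds, once with `control=False` and once with `control=True`, gives the pair of curves that the real and control scenarios compare.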

Figure 4 plots the real diffusion simulation versus the control simulation in Renren. At no point is the real Renren interaction network more efficient for information diffusion than the control one. Before t = 125, the coverage of the real and control simulations is almost the same. When t is greater than 125, the control simulation covers a larger fraction of nodes. This is because, with equal weights on all edges, the message is more likely to escape from its original communities in the control simulation as time increases.

Figure 5 plots the results for Sina Weibo. Compared with Renren in Figure 4, the gap between the results of the real and control simulations is small. Besides, before time t reaches 165, the number of infected nodes in the real simulation is slightly larger than that in the control simulation. We can conclude that Sina Weibo is good for information diffusion, especially when the goal is to reach as many people as possible within a limited time.

Our simulation results on the one hand indicate that a social graph in which the weak ties hypothesis holds is not efficient for information diffusion. On the other hand, they demonstrate that Sina Weibo as a social media platform is more efficient than Renren as an online social network in terms of information diffusion.

Fig. 4. Diffusion simulation in Renren
[Plot: CDF versus time t; curves for the real and control simulations.]

Fig. 5. Diffusion simulation in Sina Weibo
[Plot: CDF versus time t; curves for the real and control simulations.]

IV. TEMPORAL USER INTERACTION PATTERNS ANALYSIS

In this section, we provide insight into temporal user interaction patterns. By observing how users interact with each other, we analyze and model user behavior patterns. First, a hidden Markov model is adopted to depict precisely how user behavior evolves. Then, we use a self-organizing map to characterize the common patterns of user behavior. Finally, we discuss applications based on temporal interaction patterns.

A. Modeling Temporal User Behaviors with Hidden Markov Model

We observe users of Renren and Sina Weibo for a period of time to study the temporal evolution of user interactions. In [12], an analysis based on an a priori separation of users into popular and unpopular is defined, and consecutive states are compared using a resemblance factor. In this paper, we use a more precise characterization method, the hidden Markov model (HMM).

A hidden Markov model (HMM) is a Markov chain in which the chain states are not directly observable [11]. However, the outputs, which statistically depend on the hidden states, are visible, i.e. there is a probability distribution over the visible outputs for each hidden state in the HMM. Therefore, the sequence of HMM output observations gives some information about the sequence of hidden states.

Fig. 6. Hidden Markov model in interaction networks

Intuitively, we assume that in the interaction network each user follows an HMM with two hidden states, active and inactive, as shown in Figure 6. The intuition for these two hidden states derives from the bimodal nature of human life. We then define the visible output of each user as a two-state observation as well. The observation equals 1 if he receives at least one message on the day and 0 otherwise. We extract eight consecutive weeks of interaction logs for the users in Renren, and 61 consecutive days of interaction logs for the users in Sina Weibo. Each user therefore has a sequence of 1/0 (popular/unpopular) values over a series of consecutive days.
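The daily observation sequence described above can be derived from a log of reply dates as in this sketch. The function name and the input format (a list of `datetime.date` values, one per received reply) are assumptions.

```python
from datetime import date, timedelta

def daily_observations(reply_dates, start, days):
    """Return a 0/1 list: 1 if at least one reply was received on that day."""
    active_days = set(reply_dates)
    return [1 if start + timedelta(days=k) in active_days else 0
            for k in range(days)]

# Aug 1st to Sep 30th, 2010 spans 61 days, the Sina Weibo window in the text.
window_start = date(2010, 8, 1)
window_days = (date(2010, 9, 30) - window_start).days + 1
sequence = daily_observations(
    [date(2010, 8, 1), date(2010, 8, 3), date(2010, 8, 3)], window_start, window_days
)
print(window_days, sequence[:5])
```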

For each sequence we calibrate an HMM, using the Baum-Welch algorithm [2] to acquire the maximum likelihood estimate of four parameters: the state transition probabilities $P_{ai}$, $P_{ia}$ and the output probabilities $P_{a1}$, $P_{i1}$. $P_{ai}$ ($P_{ia}$) is the transition probability from the active (inactive) state to the inactive (active) state, while $P_{a1}$ ($P_{i1}$) is the probability that a user in the active (inactive) state receives at least one incoming interaction. These four values form the profile of a user's behavior. As a result, every user has one HMM characterizing his incoming interactions, resulting in 191,121 different HMMs for the Renren users who have at least one interaction in our capturing period and 1,160,798 for Sina Weibo.
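As an illustration of the calibration step, the following is a compact Baum-Welch sketch for a two-state HMM with binary outputs, written in plain Python. It is not the authors' implementation. The starting guess, the fixed uniform initial state distribution, and the function name are assumptions, and, as with any EM procedure, the result depends on the starting point and the two states can swap roles.

```python
import math

def fit_hmm(obs, iters=50, init=(0.3, 0.3, 0.7, 0.2)):
    """Estimate (P_ai, P_ia, P_a1, P_i1) from a 0/1 sequence with Baum-Welch.

    State 0 is "active", state 1 is "inactive". Also returns the
    log-likelihood computed in the last E-step.
    """
    p_ai, p_ia, p_a1, p_i1 = init                 # arbitrary starting guess
    A = [[1 - p_ai, p_ai], [p_ia, 1 - p_ia]]      # transition matrix
    b1 = [p_a1, p_i1]                             # P(output = 1 | state)
    pi = [0.5, 0.5]                               # assumption: fixed uniform start
    T, S = len(obs), (0, 1)
    loglik = float("-inf")
    for _ in range(iters):
        emit = [[b1[s] if o == 1 else 1 - b1[s] for s in S] for o in obs]
        # Forward pass, rescaled at every step to avoid underflow.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                raw = [pi[s] * emit[0][s] for s in S]
            else:
                raw = [sum(alpha[t - 1][r] * A[r][s] for r in S) * emit[t][s] for s in S]
            scale.append(sum(raw))
            alpha.append([v / scale[t] for v in raw])
        loglik = sum(math.log(c) for c in scale)
        # Backward pass with the same scaling factors.
        beta = [[1.0, 1.0] for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in S)
                       / scale[t + 1] for s in S]
        gamma = [[alpha[t][s] * beta[t][s] for s in S] for t in range(T)]
        # M-step: re-estimate transitions, then output probabilities.
        new_A = [row[:] for row in A]
        for r in S:
            visits = sum(gamma[t][r] for t in range(T - 1))
            if visits > 0:
                for s in S:
                    moves = sum(alpha[t][r] * A[r][s] * emit[t + 1][s] * beta[t + 1][s]
                                / scale[t + 1] for t in range(T - 1))
                    new_A[r][s] = moves / visits
        A = new_A
        for s in S:
            total = sum(gamma[t][s] for t in range(T))
            if total > 0:
                b1[s] = sum(gamma[t][s] for t in range(T) if obs[t] == 1) / total
    return A[0][1], A[1][0], b1[0], b1[1], loglik
```

Each iteration should not decrease the log-likelihood, which gives a simple sanity check when adapting the sketch.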

With these four values, one can obtain the probability of receiving at least one interaction as:

$$P_p = \frac{P_{ai}}{P_{ai} + P_{ia}} P_{i1} + \frac{P_{ia}}{P_{ai} + P_{ia}} P_{a1} \qquad (5)$$
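Eq. 5 weights the two output probabilities by the long-run share of time spent in each state: $P_{ai}/(P_{ai}+P_{ia})$ is the stationary probability of the inactive state and $P_{ia}/(P_{ai}+P_{ia})$ that of the active state. A direct transcription, with a hypothetical function name, might look like this:

```python
def receive_probability(p_ai, p_ia, p_a1, p_i1):
    """Probability of receiving at least one interaction on a day (Eq. 5)."""
    total = p_ai + p_ia
    if total == 0:
        # Assumption: with no transitions at all the stationary split is undefined.
        raise ValueError("P_ai + P_ia must be positive")
    share_inactive = p_ai / total   # long-run fraction of days spent inactive
    share_active = p_ia / total     # long-run fraction of days spent active
    return share_inactive * p_i1 + share_active * p_a1

# A user who is active three days out of four and only receives replies when active.
print(receive_probability(0.1, 0.3, 1.0, 0.0))
```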

We suppose that a user's popularity is proportional to the probability of receiving at least one interaction, namely $P_p$ in formula (5). We plot the CDF of user popularity in Figure 7. About 95% of users in Renren have a probability smaller than 0.4, and the distribution of these users is almost uniform. Moreover, as many as 20% of users in Sina Weibo have a probability larger than 0.5 of receiving an interaction, while this percentage is only 2% in Renren. These results indicate that users in Sina Weibo are generally more popular than users in Renren, and that the top popular users in Sina Weibo have a much higher probability of receiving interactions than those in Renren.

Fig. 7. The probability of receiving at least an interaction for all users
[Plot: CDF versus probability of receiving at least an interaction; curves for Renren and Sina Weibo.]

B. Clustering User Behavior Patterns by Self-organizing Map

In order to find out whether there are characteristic interaction behavior patterns shared by users, we apply a clustering algorithm to the user behavior patterns, characterized by the four HMM parameters described in the previous subsection.

The clustering algorithm we applied is the self-organizing map (SOM) [7]. It is a type of artificial neural network that is trained using unsupervised learning to produce a map, i.e. a low-dimensional (typically two-dimensional) discretized representation of the input space of the training samples. A SOM works by assigning observed samples to neurons in such a way that similar samples are placed in nearby neurons. SOM is well known to perform robust clustering over complex data in an agnostic way (without making strong assumptions about the nature of the data) [7].

We have built HMMs for 191,121 users in Renren and 1,160,798 users in Sina Weibo. For each user, the input to the SOM is the vector containing the four probabilities ($P_{ai}$, $P_{ia}$, $P_{a1}$, $P_{i1}$). We train a 20 × 20 network over 200 epochs with these data.
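For readers unfamiliar with SOMs, the sketch below shows a minimal online training loop in plain Python. It is only meant to convey the mechanism and is not the tool used in the paper. The rectangular grid, Gaussian neighborhood, linear decay schedules and all names are assumptions. A pure-Python loop like this is far too slow for a 20 × 20 map on more than a million users, so the usage example is deliberately tiny.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)))

def train_som(samples, rows, cols, epochs, rng, lr0=0.5):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)      # neighborhood radius shrinks
        for x in rng.sample(samples, len(samples)):  # one shuffled pass per epoch
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull the neuron toward the sample
    return weights

# Two hypothetical user profiles (P_ai, P_ia, P_a1, P_i1), ten users each.
profiles = [[0.1, 0.1, 0.9, 0.1]] * 10 + [[0.9, 0.9, 0.1, 0.9]] * 10
som = train_som(profiles, rows=4, cols=4, epochs=20, rng=random.Random(0))
print(best_matching_unit(som, profiles[0]), best_matching_unit(som, profiles[-1]))
```

After training, users are grouped by the neuron they map to, which is how the 400 groups of a 20 × 20 map arise.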

Figure 8 shows the clustering result for users in Renren. Each node in the figure represents a group of users; thus we get 400 groups of users. In addition, the color of the link between two nodes indicates the Euclidean distance between them. The lighter the color, the closer, i.e. more similar, the nodes are. As can be seen, there are clearly three big clusters.

To explain the resulting clusters, we examine the weight planes, which show the importance of each of the four input probabilities for the users clustered in each node. The weight planes are shown in Figure 9. The four subgraphs correspond to the four input HMM parameters ($P_{ai}$, $P_{ia}$, $P_{a1}$, $P_{i1}$), in that order. The darker the node color, the smaller the parameter value represented by the node.

Fig. 8. SOM neighbor weight distance of Renren

Fig. 9. SOM weight planes of Renren
[Four panels: Weights from Input 1, Weights from Input 2, Weights from Input 3, Weights from Input 4.]

As can be seen from Figure 9, users in cluster 2 are stable in both states to some extent. They have a high probability of receiving interactions in the active state and a low probability in the inactive state. Users in cluster 3 are also relatively stable, but they are more likely to receive interactions in the inactive state. Thus, cluster 3 can be treated as a mirror of cluster 2. Cluster 1 contains users with a low probability of receiving interactions in either state.

We further estimate the popularity (measured by Eq. 5) of the users in the three clusters. The results demonstrate that users in cluster 2 and cluster 3 are more popular than users in cluster 1. Taking all the features of the three clusters into consideration, we summarize their properties as follows. Users in cluster 2 seem to be content producers, whose activity in the network greatly contributes to their popularity. Cluster 3 contains celebrities: they are less active but still receive interactions, implying that their popularity is based on the social attention paid to them. Cluster 1 simply represents ordinary users without much differentiated behavior.

Fig. 10. SOM neighbor weight distance of Sina Weibo

Fig. 11. SOM weight planes of Sina Weibo
[Four panels: Weights from Input 1, Weights from Input 2, Weights from Input 3, Weights from Input 4.]

Figure 10 illustrates the clustering result for Sina Weibo. As can be seen, there are roughly four big clusters; compared with Renren, users behave more diversely in Sina Weibo. We plot the weight planes in Figure 11. Following the same approach, we find that cluster 1 represents the ordinary users with the least popularity, while cluster 2 consists of users with moderate popularity. Cluster 3 contains celebrities, since they always have a high probability of being contacted regardless of which state they are in. Although users in cluster 4 are also popular, their popularity is largely based on their own efforts: the more popular the users in cluster 4 want to be, the longer they should stay in the active state.

Next, we are interested in the correlation between clustered user behavior and user attributes (e.g., number of friends/followers). Here, we focus on Sina Weibo. The attribute we look at is the number of new followers gained by users from Oct. 15th, 2010 to Dec. 20th, 2011. We cluster users into 400 groups as in Figure 10, and then compute the average number of new followers per group. The results are plotted in Figure 12 as a contour map, where the height is the logarithm of the average value per group. It shows that users in cluster 3 and cluster 4 attract more new followers than those in cluster 1 and cluster 2, because users in cluster 3 and cluster 4 are the popular ones. This finding indicates that users in the same cluster tend to share similar attributes.

Fig. 12. Contour map of the increase of users' followers

Fig. 13. Diffusion simulation in Renren (x-axis: Time t; y-axis: CDF; curves: random seeds, popular seeds)
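The per-group averaging behind the contour map can be illustrated as follows. This is a sketch under assumptions: the text does not state the base of the logarithm or how empty groups are handled, so the natural logarithm is used here and groups with no users (or a zero average) are left as NaN. The function name `log_mean_per_group` and the sample numbers are hypothetical.

```python
import numpy as np


def log_mean_per_group(group_ids, new_followers, n_groups=400, grid_shape=(20, 20)):
    """Log of the average number of new followers per SOM group, as a 2-D grid."""
    group_ids = np.asarray(group_ids)
    new_followers = np.asarray(new_followers, dtype=float)
    # Sum of new followers and number of users in each group.
    sums = np.bincount(group_ids, weights=new_followers, minlength=n_groups)
    counts = np.bincount(group_ids, minlength=n_groups)
    height = np.full(n_groups, np.nan)
    # Assumption: skip empty groups and zero averages, where log is undefined.
    valid = (counts > 0) & (sums > 0)
    height[valid] = np.log(sums[valid] / counts[valid])
    return height.reshape(grid_shape)


# Hypothetical example with 4 groups on a 2x2 grid.
demo = log_mean_per_group([0, 0, 1, 3], [10, 30, 100, 0], n_groups=4, grid_shape=(2, 2))
```

The resulting grid can be passed directly to a contour-plotting routine, with one cell per SOM node.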

C. Applications of User Interaction Patterns

A direct application of the above findings is to choose popular users as seeds for information diffusion. We performed simulations on the Renren interaction graph to show how seeds affect the performance of information diffusion. We simulated two scenarios. In the first one, we selected 10 popular users from cluster 2 as seeds, while in the second one we randomly selected 10 users from all users as seeds. The simulation settings are similar to those introduced in Section III-D. The upper bounds of the two scenarios are almost the same. The results are plotted in Figure 13. It can be seen that a message starting from the popular seeds reaches a broader range within a shorter time. Clearly, diffusion with popular users as seeds is more efficient than diffusion with random seeds.
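The comparison of seed choices can be illustrated with a toy simulation. The sketch below does not reproduce the settings of Section III-D. It assumes a simple synchronous model in which every informed user passes the message to each out-neighbor with a fixed probability `p` per step, and it uses out-degree as a stand-in for the popularity measure. The names `simulate_diffusion` and `top_k_by_score` and the toy graph are hypothetical.

```python
import random


def simulate_diffusion(graph, seeds, steps, p=0.1, seed=0):
    """Return the fraction of users reached after each step (index 0 = seeds only).

    graph maps each user to the list of users they can pass a message to.
    """
    rng = random.Random(seed)
    informed = set(seeds)
    coverage = [len(informed) / len(graph)]
    for _ in range(steps):
        newly_informed = set()
        for user in sorted(informed):  # sorted for reproducibility
            for neighbor in graph[user]:
                if neighbor not in informed and rng.random() < p:
                    newly_informed.add(neighbor)
        informed |= newly_informed
        coverage.append(len(informed) / len(graph))
    return coverage


def top_k_by_score(scores, k):
    """Pick the k users with the highest score (stand-in for popularity)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy graph: user 0 is a hub, the others form a directed chain 1 -> 2 -> ... -> 9.
toy_graph = {0: list(range(1, 10))}
toy_graph.update({u: [u + 1] for u in range(1, 9)})
toy_graph[9] = []

out_degree = {u: len(vs) for u, vs in toy_graph.items()}
popular_seeds = top_k_by_score(out_degree, 1)
random_seeds = random.Random(1).sample(sorted(toy_graph), 1)

popular_curve = simulate_diffusion(toy_graph, popular_seeds, steps=5, p=0.5)
random_curve = simulate_diffusion(toy_graph, random_seeds, steps=5, p=0.5)
```

Plotting the two coverage curves against the step index gives the same kind of comparison as Figure 13, although the toy graph says nothing about the actual Renren results.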

Another application follows from the hidden Markov model of users' temporal interaction patterns. We fit an HMM for each user. Using the parameters we have obtained, if one also knows the state a user is in at time t, one can predict the probability that the user will receive an interaction at time t + 1. This can be used for user popularity prediction.
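The one-step prediction can be written out explicitly. The sketch below assumes the following reading of the four HMM parameters, which should be checked against their definitions earlier in the paper: Pai is the probability of moving from the active to the inactive state, Pia the probability of moving from inactive to active, and Pa1 and Pi1 the probabilities of receiving an interaction in the active and inactive state respectively. The function name `interaction_probability` and the numeric values are hypothetical.

```python
def interaction_probability(state, p_ai, p_ia, p_a1, p_i1):
    """P(interaction at t + 1 | state at t) for a two-state HMM.

    Assumed meaning of the parameters:
      p_ai: P(active -> inactive), p_ia: P(inactive -> active),
      p_a1: P(interaction | active), p_i1: P(interaction | inactive).
    """
    if state == "active":
        p_active_next = 1.0 - p_ai
    elif state == "inactive":
        p_active_next = p_ia
    else:
        raise ValueError("state must be 'active' or 'inactive'")
    # Marginalize over the user's state at time t + 1.
    return p_active_next * p_a1 + (1.0 - p_active_next) * p_i1


# Hypothetical parameter values for one user.
p_from_active = interaction_probability("active", 0.2, 0.1, 0.8, 0.3)
p_from_inactive = interaction_probability("inactive", 0.2, 0.1, 0.8, 0.3)
```

With these illustrative values, a user who is currently active is predicted to be more likely to receive an interaction at the next step than one who is currently inactive.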

V. RELATED WORK

The basic properties of online social networks and social media have been heavily studied. Mislove et al. [9] discover "small-world" properties in four online social networks after investigating the structural properties of these websites. Kwak et al. [8] compare fundamental features of Twitter and traditional social networks, leading to the conclusion that Twitter differs from traditional online social networks.

Besides, many research works focus on human interactions in daily life. Onnela et al. [10] validate the weak ties hypothesis in mobile communication networks. They also verify that both weak ties and strong ties are inefficient in information diffusion.

Given that the level of interaction characterizes user popularity and friendship, interaction graphs of different online social networks are examined in [3], [13], [6], [12]. Chun et al. [3] analyze Cyworld and find that the structure of the interaction network is similar to that of the social network. Wilson et al. [13] adopt an unweighted graph to model interactions in Facebook. They find that, in contrast to social networks, the interaction network over Facebook does not strongly exhibit "small-world" properties. Jiang et al. [6] investigate the latent interaction network of Renren. Viswanath et al. [12] study the evolution pattern of the Facebook interaction graph. Their results suggest that time should be taken into account when analyzing interaction networks.

VI. CONCLUSION

In this paper, we present the analysis and comparison of the interaction patterns in online social networks and social media using unidirectional weighted graphs.

Our findings show that node strength follows a stretched exponential distribution. Moreover, the weak ties hypothesis holds in Renren, leading to an inefficient information diffusion network. In contrast, Sina Weibo exhibits a good ability to spread messages. We find that there is generally a larger proportion of popular users in Sina Weibo than in Renren. After clustering the users' HMM parameters with a self-organizing map, we find that the user interaction patterns in Sina Weibo are more diverse. Moreover, users in the same cluster show some common features.

Future work will focus on understanding the incentives and motivations behind interaction patterns. We will analyze whether users' interactions are affected by other attributes such as social events, age, gender, and hometown.

REFERENCES

[1] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of social networks in information diffusion. In Proceedings of the 21st international conference on World wide web. ACM, 2012.

[2] J.A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute, 4:126, 1998.

[3] H. Chun, H. Kwak, Y. Eom, Y. Ahn, S. Moon, and H. Jeong. Comparison of Online Social Relations in Volume vs Interaction: a Case Study of Cyworld. IMC, October 2008.

[4] U. Frisch and D. Sornette. Extreme deviations and applications. Journal of Physics I France, 7:1155–1171, September 1997.

[5] L. Guo, E. Tan, S. Chen, X. Zhang, and Y.E. Zhao. Analyzing patterns of user content generation in online social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 369–378. ACM, 2009.

[6] J. Jiang, C. Wilson, X. Wang, P. Huang, W. Sha, Y. Dai, and B.Y. Zhao. Understanding latent interactions in online social networks. In Proceedings of the 10th annual conference on Internet measurement, pages 369–382. ACM, 2010.

[7] T. Kohonen. Self-Organization and Associative Memory. Springer Series in Information Sciences, volume 8. Springer-Verlag, Berlin Heidelberg New York, 1988.

[8] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, pages 591–600. ACM, 2010.

[9] A. Mislove, M. Marcon, K.P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 29–42. ACM, 2007.

[10] J.P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.L. Barabasi. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104(18):7332, 2007.

[11] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[12] B. Viswanath, A. Mislove, M. Cha, and K. Gummadi. On the Evolution of User Interaction in Facebook. WOSN, August 2009.

[13] C. Wilson, B. Boe, A. Sal, K. Puttaswamy, and B. Zhao. User Interactions in Social Networks and their Implications. Eurosys, April 2009.