

Research Article
Investigating How User's Activities in Both Virtual and Physical World Impact Each Other Leveraging LBSN Data

Zhiwen Yu,1 Yue Yang,1 Xingshe Zhou,1 Yu Zheng,2 and Xing Xie2

1 School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
2 Microsoft Research Asia, Beijing 100080, China

Correspondence should be addressed to Zhiwen Yu; zhiweny@gmail.com

Received 5 October 2013; Accepted 15 December 2013; Published 19 February 2014

Academic Editor: Jong Hyuk Park

Copyright © 2014 Zhiwen Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this paper, we investigate how users' online behavior (e.g., making friendships) and their offline activity (e.g., check-ins) affect each other by leveraging data collected from an LBSN. First, we use vectors to represent nodes and define a popularity entropy for each node to weigh its popularity and its impact on forming a new edge. Then, we propose an algorithm to calculate the weight of each edge based on our findings that the larger the overlap of linked nodes two nodes have, the heavier the weight of the edge is, and the more popular the nodes in their overlap are, the lighter the weight of the edge is. Finally, we conduct link prediction by using the random walk with restart method, considering the effect of every node and every edge. Experimental results show that users' activities in the virtual world and the physical world do have a great impact on each other.

1. Introduction

The advent of GPS-enabled smart phones has had a significant impact on the development of location-based services. Combined with users' growing interest in online social networks, it has led to the emergence of many location-based social network (LBSN) services, such as Foursquare (http://foursquare.com/), Brightkite (http://brightkite.com/), and Loopt (http://www.loopt.com/). Millions of users have already been attracted to such services, with which they can leave digital footprints [1], known as check-ins, at the places they have been to. These services also make it possible for users to share their locations, photos, and tips with friends anytime and anywhere. Many users prefer to share their trajectories on LBSNs rather than to tell people directly where they are on traditional microblogs. A possible reason is that LBSNs provide much stronger social security; that is, a user's check-ins can only be seen by a small group of people specified by the user, while a user's microblogs are public to anyone who follows them on microblogging websites. When users publish where they are, they may worry about revealing too much personal information.

As a classical problem in complex networks [2], link prediction became a hot topic again when the nodes came to represent humans instead of physical network nodes. A considerable amount of attention has been devoted to this problem in social networks, and some consensus has been reached. The most widely accepted finding is that, for any two users, the more friends they have in common, the more likely they are to become friends in the near future. This naive idea works well and has been used in link recommendation by different online social networks. Moreover, based on a sociological theory called homophily [3], which emphasizes the importance of similarity in the process of forming a new link between any two strangers, it is intuitive to infer that similar people usually go to similar places; that is, if any two users often visit the same or similar places, they are likely to be friends. However, this has not been well explored yet because large-scale user movements in the physical world are difficult to monitor.

In this paper, we use data crawled from Foursquare to explore how users' activities in the virtual world and the physical world impact each other, particularly how users' movements affect their online friendships and vice versa. As a location-based social network, Foursquare contains two kinds of information: one is the user's social graph in the virtual world and the other is the user's check-ins in the physical world. However, it is difficult to get check-ins directly for security reasons; thus we use tips instead, which contain not only time and location, as check-ins do, but also comments or messages created by users.

Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2014, Article ID 461780, 9 pages, http://dx.doi.org/10.1155/2014/461780

We use vectors with different numbers of elements to represent different users and places. Each element in a vector represents an entity, which can be another user or a location linked to the user. If any two users are friends, or a user has been to a place, we say that the two users, or the user and the place, are linked to each other, which means that there is an edge between them in the graph made up of users and locations. In this way, we store all the information and edges in a graph whose nodes are users and locations.
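The vector-per-node storage described above can be sketched as a sparse adjacency map. This is a minimal illustration rather than the authors' implementation: the function name `build_graph`, the string node labels, and the choice to keep a visit count on each edge are assumptions made for the example.

```python
from collections import defaultdict

def build_graph(friendships, visits):
    """Return {node: {neighbor: count}} for an undirected user/location graph.

    Hypothetical helper: each node's "vector" holds only the nodes it is
    actually linked to, so unlinked pairs take no storage.
    """
    graph = defaultdict(dict)
    for u, v in friendships:
        # A friendship is recorded once, in both directions.
        graph[u][v] = 1
        graph[v][u] = 1
    for user, place in visits:
        # A user may visit the same place repeatedly; count every visit.
        graph[user][place] = graph[user].get(place, 0) + 1
        graph[place][user] = graph[place].get(user, 0) + 1
    return dict(graph)

# Toy data (invented for illustration only).
friendships = [("u1", "u2"), ("u2", "u3")]
visits = [("u1", "cafe"), ("u1", "cafe"), ("u3", "cafe")]
graph = build_graph(friendships, visits)
```

Here `graph["u1"]` plays the role of the vector for user `u1`, and `graph["cafe"]` the vector for a location, so users and locations are handled uniformly.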

All the vectors representing users and locations make up a graph, which is called the global graph, while a certain user's friends and his/her visited locations make up the user's local graph. It is well known that many ordinary users follow famous movie stars or writers in online social networks, but any two such ordinary users are not very likely to become friends in the future. The reason is that famous users usually have very large local graphs linking them to many users, and these users do not have much in common with each other. This phenomenon also exists among users who have visited a very popular place. For example, numerous visitors visit the Imperial Palace every day, but only very few of them may be friends. On the contrary, users who are linked to unpopular locations and unpopular users are highly likely to be friends in the future. To capture this characteristic for use in link prediction, we define a weighing entropy to weigh every node, so that we can take into account all the information that each node contains when predicting users' relationships and movements. Finally, we conduct link prediction by using the random walk with restart method, considering the effect of every node and every edge.
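Random walk with restart is a standard technique, and the sketch below shows one generic power-iteration form of it on a weighted graph. It does not reproduce the paper's weighting scheme: the edge weights, the restart probability of 0.15, the iteration count, and all names are illustrative assumptions.

```python
def random_walk_with_restart(weights, start, restart=0.15, iters=100):
    """Approximate RWR scores of all nodes relative to `start`.

    `weights` maps node -> {neighbor: edge weight}; every neighbor must
    also appear as a key. All parameter values are assumptions.
    """
    nodes = list(weights)
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            total = sum(weights[n].values())
            if total == 0:
                # Node without outgoing weight: return its mass to start.
                nxt[start] += (1 - restart) * score[n]
                continue
            for nbr, w in weights[n].items():
                # Move to a neighbor with probability proportional to weight.
                nxt[nbr] += (1 - restart) * score[n] * w / total
        # With probability `restart`, the walker jumps back to the start.
        nxt[start] += restart
        score = nxt
    return score

# Toy weighted graph (invented): u1 is tied more strongly to place p.
weights = {
    "u1": {"u2": 1.0, "p": 2.0},
    "u2": {"u1": 1.0},
    "p": {"u1": 2.0, "u3": 1.0},
    "u3": {"p": 1.0},
}
scores = random_walk_with_restart(weights, "u1")
```

Nodes that are not yet linked to the start node (here `u3`) can then be ranked by score as candidate new links.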

In the remainder of this paper, we first discuss related work in Section 2 and give several basic definitions and our motivation in Section 3. In Section 4, we analyse the statistical features of the data. The link prediction and experimental results are presented in Sections 5 and 6, respectively. Finally, we conclude the paper in Section 7.

2. Related Work

The link prediction problem in social networks has attracted considerable attention since it was introduced by Liben-Nowell and Kleinberg [4]. Link prediction was first explored in complex networks, and as more and more users started joining online social networks, researchers started to analyse users' activities in online social networks with different backgrounds and different purposes. In fact, we can view social networks as special complex networks whose nodes are entities (users or locations) and whose edges represent interaction, collaboration, or influence between entities. Before introducing the studies in link prediction, we first define the problem as follows.

Given a snapshot of a graph representing a social network at time $T_0$, we seek to accurately predict the new edges that will be added to the graph during the time interval from $T_0$ to a given future time $T_1$.
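The problem statement above can be made concrete with a small sketch that extracts the edges added between two snapshots and scores a set of predictions against them. The helper names, the toy snapshots, and the use of precision as the score are assumptions for illustration, not the paper's evaluation code.

```python
def as_undirected(edges):
    """Normalize (a, b) pairs so that (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}

def new_edges(edges_t0, edges_t1):
    """Edges present in the later snapshot but absent from the earlier one."""
    return as_undirected(edges_t1) - as_undirected(edges_t0)

def precision(predicted, actual_new):
    """Fraction of predicted edges that really appeared."""
    predicted = as_undirected(predicted)
    if not predicted:
        return 0.0
    return len(predicted & actual_new) / len(predicted)

# Toy snapshots at the earlier and later times (invented data).
snapshot_t0 = [("a", "b"), ("b", "c")]
snapshot_t1 = [("b", "a"), ("b", "c"), ("a", "c"), ("c", "d")]
added = new_edges(snapshot_t0, snapshot_t1)
```

A predictor is then judged only on `added`, the edges that did not exist in the earlier snapshot.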

Considering this problem closely, we find that it contains two questions. The first is by what features intrinsic to the network itself the social network evolves. The second is how the social network evolves. The key to answering them is to find the factors that have the greatest impact on the network's evolution. We summarize the related work according to the factors that were used in link prediction.

2.1. Factors with User's Movements in Physical World. To predict the occurrence of a new edge between any two users in a social network, the first step is to find the similarities between them. Many researchers have tried to mine user similarity from users' movements. Users' movements in the physical world usually show their travelling trajectories in daily life and directly represent their activities in the physical world. When users are at a certain place, they can only do certain things; for example, users watch movies in a movie theatre and have dinner at restaurants. Furthermore, similar people go to similar places with similar visiting sequences at similar times. Based on these ideas, mining users' similarities from their movements has been widely explored. Li et al. [5] hired several volunteers from different countries and cities, equipped with GPS loggers, to collect their trajectories for several months and mine similarities among these volunteers. They first detect stay points from trajectories, considering the length of the user's staying time and the distance among different GPS points, and transform the user's raw GPS trajectories into sequences of stay points. They build a hierarchical graph with three levels, which cover different ranges of areas, to model the user's location history. They argue that, beyond colocations, which refer to the locations visited by related users, users' visiting sequences make more sense and carry more meaningful information in mining users' similarities. They then find similar sequences of different lengths among users and compute the similarity across multiple levels. This work made great progress in mining user similarity from physical-world resources. However, it is difficult to obtain large-scale trajectories of nonvolunteer users, partly because it is impossible to equip everyone with a GPS logger and partly due to privacy issues. Therefore, it is hard to apply this idea to find similarities among a large number of users.

As more and more users use cell phones, researchers try to infer social ties from cellular network data. Cho et al. [6] use data from Brightkite (http://brightkite.com/) and Gowalla (http://gowalla.com/), along with a dataset of cell phone location traces, to understand the basic laws that govern users' motion and dynamics. They find that there are basically two features in users' travel style. One is that users' short-ranged travel is geographically and temporally periodic and is not related to social network structure; the other is that users' long-distance travel is largely affected by social network ties. More than 50% of users' movements can be explained by periodic behaviour, while less than 30% can be explained by social ties. Thus, they build a periodic and social mobility model to predict individuals' mobility, combining periodic short-ranged movements with travel related to social ties. Their model has three components, capturing the user's regularly visited spatial locations, the temporal movement between these locations, and the user's movements influenced by social ties, respectively. The model has acceptable performance in predicting user mobility. However, it only explores how a user's mobility is influenced by features like social ties and periodic activities and pays no attention to how a user's mobility can impact other aspects of the user's life.

There are also many research works making use of users' trajectories in the physical world for other interesting purposes. For instance, Lian and Xie [7] proposed a novel location naming approach to automatically provide concrete and meaningful location names to users based on their current location, time, and check-in histories.

2.2. Factors with User's Relation in the Virtual World. Nowadays, many people spend more and more time on online social networks, such as Facebook and Twitter, keeping in touch with existing friends, getting to know new friends, and sharing ideas and resources. Many factors might have an impact on the evolution of a user's social network. Xiang et al. [8] model relationship strength using users' profiles and interactions among users, with data from Facebook and LinkedIn. Before building a model, they first assume that people tend to form ties with people having similar characteristics and that relationship strength has an impact on both the nature and frequency of online interactions. They develop an unsupervised latent variable model to estimate relationship strength from interaction activities, which can be communication, tagging, or something else, and from user similarity extracted from users' profiles. The estimated relationship strengths result in a weighted graph in which spurious links are downweighted while important ones are highlighted. Finally, these weighted links can be used to increase the accuracy of social network mining tasks, including link prediction. Tang et al. [9] perform link prediction using data across heterogeneous networks. The main question is how to bridge the available knowledge from different networks to help infer different types of social relationships. Their main ideas come from several social psychological theories, such as social balance, structural hole, and social status. They proposed a transfer-based factor graph model incorporating social theories into a semisupervised learning framework, which is used to transfer supervised information from the source network to help infer social ties in the target network. Almost every user has accounts on different online social networks, and different networks contain different aspects of information about a user. This work represents a new and interesting research direction in making full use of users' online activities.

Besides the data from online social networks, emails can be regarded as users' activities in the virtual world. Researchers from Google [10] made use of users' implicit social graph, which is formed by users' interactions with contacts and groups of contacts in Gmail, to do friendship recommendation. They also proposed an interaction-based metric for estimating a user's affinity to their contacts and groups. Their experiments showed that both the implicit social graph and interaction-based affinity were important in suggesting friends.

2.3. Factors with Mixed Resources. As online social networks and users' movements contain users' activities and relationships in the virtual and physical worlds, respectively, trying to understand users' activities from only one point of view yields biased results. To get a better and unbiased understanding of users' activities in both circumstances, several researchers have combined information from these two aspects to do link prediction. Cranshaw et al. [11] first analyze the social context of a geographic region from a set of location-based features, including location entropy. Then they compose a model to predict the friendship between any two users by analyzing their location trails. Finally, they show a positive relationship between colocation histories and the social ties that a user has in the network. Their work shows that offline mobility has an impact on users' online activities. Wang et al. [12] investigate the relation between human mobility, social ties, and link prediction. Mobile communication records are regarded as the representation of social ties, while users' moving trajectories are extracted from the cellular network. The authors predict social ties with different datasets, each of which contains a different proportion of users' mobility.

Scellato et al. [13] described a supervised learning framework that exploits prediction features extracted from Gowalla data to predict new links among friends-of-friends and place-friends, and showed that the inclusion of information about places and related user activity offers high link prediction performance.

Users' friendships can usually be crawled from online social networks, but users' mobility is very hard to obtain. Previous works tend to use cellular network data or hire volunteers to collect trajectories. However, cellular network data is often of low accuracy, while hiring volunteers to collect data is only suitable for small-scale studies. No matter which of these two resources is used, researchers have to spend much time extracting locations with semantic or geographic context from the raw data. In fact, we care much more about where the user has been than how he/she got there. Location-based online social networks offer this kind of information, as users often tag the places they have visited. In this way, we can do link prediction and study the impact that users' mobility and social ties have on each other from a new point of view.

3. Preliminary

In this section, we first give several definitions and then introduce our main motivations. To simplify the explanation, we treat locations and users in the same way, which is determined by our motivations.


3.1. Definitions

Definition 1 (node). A node $v$ refers to a user or a location that the network contains. Users and locations have the same status in this paper. Each node $v$ is associated with a unique ID $v \cdot id$, which starts from 1 and ends at the total number of nodes.

Definition 2 (edge). If two users, represented by node $j$ and node $k$ in the network, are friends, there will be an edge $e_{jk}$ between them, showing that they are connected to each other. Similarly, if a user has visited a place, there will also be an edge between them. Each edge $e_{jk}$ is undirected and associated with two terminal points, $e_{jk} \cdot v_j$ and $e_{jk} \cdot v_k$, referring to the two nodes (node $j$ and node $k$) it connects. $e_{jk} \cdot w$ is the weight of the edge, determined by the popularity entropy of the two nodes. Furthermore, a user can visit a place several times but can be friends with another user only once, which means that some edges may occur more than once. To record this feature, the last attribute of each edge $e_{jk}$ is $e_{jk} \cdot t$, recording the number of times the edge occurred. For edges connecting users, the value is 1, but for those connecting users and locations, the value may be larger than 1.

Definition 3 (global social graph). A global social graph $G(V, E)$ contains all the information and represents the structure of a whole location-based social network, containing two types of elements: node $v$ and edge $e$. It is an undirected graph.

Definition 4 (local social graph). Node $l$'s local social graph $G_l(V_l, E_l)$ may contain the complete friendship and all the visited places of a user, or all the users who have been to a particular place. All nodes in the local social graph $G_l$ are connected to the node $l$ directly; that is, only one-hop connections exist in the local graph. The graph is also an undirected graph.

Definition 5 (cofriend). For any two friends $A$ and $B$ of a user, the user is a cofriend cf of them.

Definition 6 (colocation). If two users have been to the same place, the place is their colocation cl. We do not require that the two users visit the place at the same time.

Definition 7 (popularity entropy). Popularity entropy pe is used to weigh a node's popularity among all the nodes and has an impact on the strength of the links connected to the node.
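Definitions 5 and 6 map naturally onto set intersections. The following sketch assumes two hypothetical dictionaries, `friends` (user to set of friends) and `visited` (user to set of places); these names and the toy data are not from the paper.

```python
def cofriends(friends, a, b):
    """Users who are friends with both a and b (Definition 5)."""
    shared = friends.get(a, set()) & friends.get(b, set())
    return shared - {a, b}

def colocations(visited, a, b):
    """Places visited by both a and b, at any time (Definition 6)."""
    return visited.get(a, set()) & visited.get(b, set())

# Toy data (invented for illustration only).
friends = {
    "alice": {"carol", "dave"},
    "bob": {"carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}
visited = {
    "alice": {"cafe", "museum"},
    "bob": {"museum", "park"},
}

shared_friends = cofriends(friends, "alice", "bob")
shared_places = colocations(visited, "alice", "bob")
```

Counting the elements of these sets gives the cofriend and colocation counts used as the simplest link prediction signals.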

3.2. Motivations. Our motivation is quite simple and direct. Most existing research works focus on predicting links among friends' friends or between friends and friends' visited locations, all of which are nodes that are no more than 2 hops away from each other. However, we argue that new links within 2 hops are just a small part of all the newly added edges and that a large number of newly added edges occur between nodes whose distance is more than 2 hops. Figure 1 shows the proportion of the two kinds of newly added links occurring during the period from September 2011 to February 2012 in our dataset collected from Foursquare.

Figure 1: Distance of nodes that newly added edges link. (Bar chart; x-axis: nodes distance, 2 hops and N hops (N ≥ 3); y-axis: newly added edge count, 0 to 90000; bar labels: 489 and 83321.)

We can see from Figure 1 that very few new links are added among nodes within 2-hop distance, while most new links are added among nodes that are farther away from each other. The result shows that if we only pay attention to new links within 2 hops, much information will be lost and high precision of link prediction cannot be achieved.
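The kind of count reported in Figure 1 can be reproduced in outline with a breadth-first search on the earlier snapshot. This is a sketch under assumptions: the helper names, the bucket labels, and the separate "unreachable" bucket are illustrative choices, and the text does not say how disconnected pairs were treated.

```python
from collections import deque

def hop_distance(adj, source, target):
    """Shortest hop count between two nodes, or None if disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

def split_by_distance(adj, new_edges):
    """Bucket newly added edges by their endpoints' earlier distance."""
    counts = {"within 2 hops": 0, "3+ hops": 0, "unreachable": 0}
    for a, b in new_edges:
        d = hop_distance(adj, a, b)
        if d is None:
            counts["unreachable"] += 1
        elif d <= 2:
            counts["within 2 hops"] += 1
        else:
            counts["3+ hops"] += 1
    return counts

# Toy earlier snapshot: a path a-b-c-d-e plus an isolated node f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d"}, "f": set(),
}
counts = split_by_distance(adj, [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f")])
```

A predictor restricted to friends-of-friends can only ever recover the edges in the first bucket.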

It is well accepted that, for any two unconnected users, the more friends they share, the more likely they are to become friends in the near future. But it is also common knowledge that famous users, such as movie stars like Justin Bieber or dignitaries such as Obama, usually have a very large social network in online social networks. If most of the friends of two users are famous users on the network, do they have a large probability of being friends in the future? Probably not. We argue that, besides the number of common friends any two users have, the scale of the common friends' social networks also has an impact on forming a new link between two strangers.

The last motivation is that, since online social networks have become a part of life, we want to analyse the interaction between our daily life and the online social network. To explain it in detail, we divide users' activities into two parts: the first part is the user's activities in the offline world, for example, moving to a new city or meeting new friends, while the other part is the user's online activities, such as adding new friends or commenting on friends' photos. What kind of influence these two kinds of activities have on each other is largely unexplored.

Before further introducing our work, we introduce two basic hypotheses in this paper as follows.

Hypothesis 1. The larger the overlap between any two users' relationships in the virtual world, the more likely they are to have a larger overlap in check-ins in the physical world.

Hypothesis 2. The larger the overlap between any two users' check-ins in the physical world, the more likely they are to have a larger overlap of relationships in the virtual world.


Table 1: Amount of newly added edges in different groups.

| Group | Amount of shared friends | Amount of shared friends' friends | Amount of newly added edges |
|---|---|---|---|
| 1 | 1033 | 2574 | 141 |
| 2 | 1156 | 1479 | 375 |

Figure 2: The relationship between probability of being friends and the amount of cofriends. (Plot; x-axis: count of cofriends, 0 to 40; y-axis: probability of being friends, 0 to 1.)

These two hypotheses come from the goal of exploring the impact of users' activities in the physical and virtual worlds on each other. Our experimental results will show that these two hypotheses are reasonable and match what is observed in reality.

4. Observations

We use the dataset crawled from Foursquare. The dataset contains 31524 users, 51265 venues (locations), and users' tips about these venues from July 2011 to May 2012. In this paper, we take users' check-ins extracted from tips (time and geographic information) as users' activity in the physical world and the evolution of users' friendships as users' activity in the virtual world. We here present the statistical characteristics of the data collected from July 2011 to February 2012.

First, we analyze how the number of shared friends influences users' social ties. Figure 2 shows the relationship between the number of shared friends that any two users have and their probability of being friends.
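One plausible way to obtain such a curve is to group user pairs by their number of shared friends and take the fraction of pairs in each group that are friends. The sketch below is an assumption about the procedure, not the authors' code; the name `friend_probability_by_cofriends` and the toy data are invented, and enumerating all pairs is only practical for small graphs.

```python
from collections import defaultdict
from itertools import combinations

def friend_probability_by_cofriends(friends):
    """Map cofriend count -> fraction of user pairs that are friends."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for a, b in combinations(sorted(friends), 2):
        # Number of users who are friends with both a and b.
        shared = len(friends[a] & friends[b])
        totals[shared] += 1
        if b in friends[a]:
            linked[shared] += 1
    return {k: linked[k] / totals[k] for k in sorted(totals)}

# Toy friendship data (invented): a, b, c form a triangle; d knows only c.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
curve = friend_probability_by_cofriends(friends)
```

The same grouping applies to colocations if the shared-friend count is replaced with the number of places both users have visited.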

In Figure 2, we only show the records of samples with a relatively large scale. The probability of being friends generally rises as the number of shared friends grows. However, the curve fluctuates as the count increases, and the probability does not grow monotonically. Why does this happen? Does the probability not grow as the number of common friends rises? As stated in our motivation, it does not always. To verify the idea, we choose two groups of users. In Group 1, the shared friends of any two users are mostly users who own a relatively large social network, while in Group 2, the shared friends have relatively small social networks. We analyse the average number of new links added among these users from September 2011 to February 2012 in the two groups, and the results are clearly different, as shown in Table 1.

It is not surprising to find that only very few new links are added in the first group but many more in the second group. As mentioned in our motivation, users with a large number of friends are usually famous people in the social network, and anyone who is interested in them can add them as friends. As a result, the links between a famous user and their friends are quite weak and cannot be used as support for predicting new links. For less popular users, since their social networks are relatively small, their friends are closely connected with them, and the links are strong support for predicting new links.

Figure 3: The relationship between probability of being friends and the amount of colocations. (Plot; x-axis: count of colocations, 1 to 8; y-axis: probability of being friends, 0 to 0.2.)

Since we take users' check-ins as their activities in the physical world and the evolution of their friendships as activities in the virtual world, and our goal is to find whether users' activities in the virtual and physical worlds have an impact on each other, we have to investigate two topics. The first is the probability that two users are friends given that they have $N$ colocations.

From Figure 3, we can observe that as the number of colocations of two users rises, the probability of being friends clearly increases. However, we can again observe some fluctuations. The reason is similar to that in Figure 2. There are always some places attracting numerous people, while others are visited by a very small group of people. The popularity of a place has a great impact on the strength of the links between the place and the attracted users. The more people visit a place, the more popular the place is and the weaker the links between the place and its users are, and vice versa. As the number of colocations rises, the proportion of popular colocations changes and the number of pairs of users who have colocations changes, which is why the fluctuations occur.

The second topic we want to investigate is whether the probability of having colocations rises as the number of cofriends increases. The result is shown in Figure 4.

Figure 4: The relationship between probability of having colocations and the count of cofriends. (Plot; x-axis: count of cofriends, 0 to 50; y-axis: probability of having colocations, 0 to 0.2.)

We can observe that when the number of cofriends increases, the probability of having colocations also increases, but there are again some fluctuations. The explanation is similar to the one above.

All the static analysis results show that it is not appropriate to consider only the number of colocations or cofriends when predicting new links. Each member of the colocations and cofriends carries much information about the links and helps us to understand the existing links better. But how to translate this information into a form that can be used in link prediction remains a problem. In the next section, we try to solve it.

5. Impact Detection

We study how users’ friendships and their check-ins impact each other. Our main task is to predict the new edges (links) in the graph whose nodes represent users and locations. It is well accepted that the more information we obtain and use when predicting new links, the higher precision we can get. However, due to limits on storage space and time, it is not practical to store all the information. Nor is it a good way to simply count the number of colocations and cofriends, since each element of them also contains factors that have impact on the addition of new links. In this paper, we propose an effective way to handle the issue.

Every node in the global graph is connected with many nodes, and these nodes may be connected with many other nodes. Each node’s popularity is determined by its connections with the rest of the nodes. We use a vector to represent each node. There are a very large number of nodes, including users and locations, in the global graph, but only a few nodes are connected with each other, which means that the global graph is very sparse. If we stored an entry for every possible edge in the graph without considering whether it really exists, most elements in every vector would be 0, meaning that the nodes are not connected, and most information in the vectors would be meaningless. In fact, only the connected nodes and their edges make sense in link prediction. Since each edge stores the information about its weight, its occurrence times, and the two nodes it connects, a node’s vector, which contains all the edges connected to the node, can store all the information that the node has. By doing so, the topology of the social network is stored with high storage-space utilization, and it also ensures that all the nodes in the social network have the same status.
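The sparse representation described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the class names `Edge` and `Graph`, the field names `w` and `t`, and the rule that adding an existing edge again increments its occurrence times are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """An undirected edge with the attributes the text attributes to edges."""
    u: int          # smaller endpoint ID
    v: int          # larger endpoint ID
    w: float = 1.0  # weight, initialised to 1
    t: int = 1      # occurrence times in the dataset


@dataclass
class Graph:
    # node ID -> {neighbour ID: Edge}; only edges that exist are stored
    vectors: dict = field(default_factory=dict)

    def add_edge(self, a, b):
        """Add an edge, or count one more occurrence if it already exists."""
        if a == b:
            raise ValueError("self-loops are not allowed")
        existing = self.vectors.get(a, {}).get(b)
        if existing is not None:
            existing.t += 1
            return existing
        edge = Edge(min(a, b), max(a, b))
        # Both endpoints share the same Edge object, so e_ij and e_ji agree.
        self.vectors.setdefault(a, {})[b] = edge
        self.vectors.setdefault(b, {})[a] = edge
        return edge


graph = Graph()
graph.add_edge(1, 2)
graph.add_edge(2, 1)  # same undirected edge seen again
graph.add_edge(1, 3)
```

Storing one shared object per edge keeps users and locations on an equal footing, since neither endpoint owns the edge.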

If the social network contains $N$ nodes in total, for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

$$v_i = v(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1}$$

where $e_{ij}$ ($i \neq j$) represents the edge existing between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. The initial weight of every element is set to 1, and the final value will be larger than 0 in every vector. Since all the weights are associated with the nodes’ popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1, where $e_j.t$ denotes the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector’s elements; then we calculate the sum $\mathit{value}$ of the occurrence times of the edges linked to node $i$. According to our hypothesis, a node’s popularity is proportional to the number $n$ of other nodes it is connected with and to the occurrence times of the edges linked to it. In order to get a normalized weight for each node, the popularity entropy $pe_i$ for node $i$ is computed as follows:

$$pe_i = n \cdot \frac{\mathit{value}}{\mathit{Sum}_E}, \tag{2}$$

where the sum $\mathit{Sum}_E$ of edges is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network,

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each element as the edge’s weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes’ popularity entropy as a factor that has impact on the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linking nodes are for each other, so we use the sum of the occurrence times they connected with their shared nodes to weigh the importance of the shared nodes for the two nodes.

As the values of $e_{ji}.w$ and $e_{ij}.w$ are the same, we can compute them at the same time. To keep the computation controllable and save both time and space, we let each node compute only the strengths of its ties with nodes having a larger node ID than its own. For any node $i$, we check the elements in its vector one by one. For an element $e_{ij}$ in node $i$’s vector, we first check the ID $v_j.id$ of the other node that node $i$ connects to; if $v_j.id$ is larger than $v_i.id$, we store that node’s ID as $j$ and get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector’s count $m$ and the sum $n$ of the occurrence times $e_{ip}.t$ and $e_{jp}.t$. According to our hypothesis, the weight of edge $e_{ij}$ is proportional to the occurrence times of the edges linking to shared nodes and inversely proportional to the popularity entropy of the shared nodes. So each shared node’s contribution to the edge $e_{ij}$ is computed by the following formula:

$$C_{p-ij} = \mathrm{reverse}(pe_p) \cdot \frac{n}{m}, \tag{3}$$

where $C_{p-ij}$ represents the shared node $p$’s contribution to the edge $e_{ij}$, and $\mathrm{reverse}(pe_p)$ represents the reciprocal of $pe_p$. By computing the sum of all shared nodes’ contributions, we get the value of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. Using the TF-IDF algorithm, we would first have to specify one kind of node as documents containing many words, with the other kind chosen to be words. However, users and locations have the same status in the social network; choosing either of them as the document would destroy this balanced status.

Input: a collection of nodes’ initial vectors V; the sum of edges Sum_E
Output: a collection of nodes’ popularity entropy PE = {pe}
i ← 1, N ← |V|
while i < N + 1 do
    pe_i ← 0, n ← ‖v_i‖, value ← 0
    for each e_j in v_i do
        value ← value + e_j.t
    end for
    pe_i ← n * value / Sum_E
    i ← i + 1
end while
return PE

Algorithm 1: Popularity entropy calculation.

Input: a collection of nodes’ vectors V; a collection of nodes’ popularity entropy PE; the sum of edges Sum_E
Output: a collection of nodes’ vectors V = {v}
i ← 1, N ← |V_ini|, Initial(V)
while i < N + 1 do
    for each e_ij in v_i do
        if e_ij.v_j > i then
            j ← e_ij.v_j, P ← CommonNodes(v_i, v_j), e_ij.w ← e_ji.w ← 0
            for each p in P do
                m ← ‖v_p‖, n ← sum(e_ip.t, e_jp.t)
                e_ji.w ← e_ij.w ← e_ij.w + reverse(pe_p) * n / m
            end for
        end if
    end for
    i ← i + 1
end while
return V

Algorithm 2: Edge final weight calculation.
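Algorithms 1 and 2 can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions, not the authors’ code: the function names are hypothetical, $\mathit{Sum}_E$ is taken to be the number of distinct edges (the text only calls it “the sum of edges”), and an edge whose endpoints share no node ends with weight 0, as a literal reading of Algorithm 2 implies.

```python
from collections import defaultdict


def build_adjacency(edge_times):
    """Map each node to {neighbour: occurrence times t} for an undirected graph."""
    adj = defaultdict(dict)
    for (a, b), t in edge_times.items():
        adj[a][b] = t
        adj[b][a] = t
    return dict(adj)


def popularity_entropy(adj, sum_e):
    """Algorithm 1 / Eq. (2): pe_i = n * value / Sum_E for every node i."""
    pe = {}
    for node, nbrs in adj.items():
        n = len(nbrs)               # number of connected nodes
        value = sum(nbrs.values())  # total occurrence times of its edges
        pe[node] = n * value / sum_e
    return pe


def edge_weights(adj, pe):
    """Algorithm 2 / Eq. (3): sum the shared nodes' contributions per edge."""
    weights = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            if not j > i:
                continue  # each undirected edge is handled once, from its smaller ID
            shared = nbrs.keys() & adj[j].keys()
            w = 0.0
            for p in shared:
                m = len(adj[p])            # size of the shared node's vector
                n = adj[i][p] + adj[j][p]  # e_ip.t + e_jp.t
                w += (1.0 / pe[p]) * n / m  # reverse(pe_p) * n / m
            weights[(i, j)] = w
    return weights


# Hypothetical toy graph: keys are (smaller ID, larger ID), values are t.
edge_times = {(1, 2): 1, (1, 3): 2, (2, 3): 1, (3, 4): 1}
adj = build_adjacency(edge_times)
pe = popularity_entropy(adj, sum_e=len(edge_times))  # Sum_E assumed = edge count
weights = edge_weights(adj, pe)
```

In this toy graph the edge (3, 4) has no shared node and therefore keeps weight 0, while edges whose endpoints share a less popular node receive larger weights.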

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to do link prediction.
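The paper does not give the details of its random walk with restart setup, so the following is only a generic power-iteration sketch of the technique on a weighted undirected graph. The function names, the restart probability of 0.15, the handling of isolated nodes, and the rule of ranking non-neighbours by score are assumptions for illustration.

```python
def random_walk_with_restart(weights, start, restart=0.15, max_iter=1000, tol=1e-12):
    """Return stationary visiting scores of a walker that restarts at `start`.

    weights: dict mapping an undirected pair (u, v) to a non-negative weight.
    restart: probability of jumping back to `start` at each step (assumed value).
    """
    nbrs = {}
    for (u, v), w in weights.items():
        nbrs.setdefault(u, {})
        nbrs.setdefault(v, {})
        if w > 0:  # zero-weight edges are never followed
            nbrs[u][v] = w
            nbrs[v][u] = w
    if start not in nbrs:
        raise KeyError(f"unknown start node: {start!r}")

    scores = {node: 0.0 for node in nbrs}
    scores[start] = 1.0
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in nbrs}
        for u, out in nbrs.items():
            total = sum(out.values())
            if total == 0:
                # A node with no usable edge sends its mass back to the start.
                nxt[start] += (1 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[start] += restart  # total mass stays 1, so the restart share is constant
        delta = sum(abs(nxt[node] - scores[node]) for node in nbrs)
        scores = nxt
        if delta < tol:
            break
    return scores


def rank_candidates(weights, node):
    """Rank nodes not yet linked to `node` by their walk score, best first."""
    scores = random_walk_with_restart(weights, node)
    linked = {node}
    for u, v in weights:
        if u == node:
            linked.add(v)
        elif v == node:
            linked.add(u)
    candidates = [n for n in scores if n not in linked]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)


# Hypothetical weighted graph; weights would come from the edge-weight step.
toy_weights = {(1, 2): 1.0, (2, 3): 1.0, (1, 4): 2.0, (4, 5): 1.0}
toy_scores = random_walk_with_restart(toy_weights, 1)
ranking = rank_candidates(toy_weights, 1)
```

Here node 5 ranks above node 3 because it is reached through the heavier edge (1, 4), which shows how edge weights steer the predicted links.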

6. Experimental Results

We divide the crawled dataset into two parts: the first part runs from July 2011 to February 2012, and the second part runs from February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. Going one step further, we also want to show that users’ activities in the virtual world and the physical world have impact on each other, which can be seen in the results of link prediction.

First, we would like to experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not really connected to each other. In other words, we suppose that any two nodes in the social network were connected to each other, and we calculate the weights of all edges even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

[Plot omitted: probability of being friends (y-axis, 0 to 0.6) versus weight of edge (x-axis, 0 to 120)]

Figure 5: The relationship between probability of being friend-nodes and the weight of edges.

Table 2: Contents of different graphs.

Feature \ Graph    AW   UW   UUW   ULW
User’s relation    *    *    *     —
User’s tips        *    *    —     *
Weighted or not    *    —    *     *

From Figure 5, we can observe that, for any two nodes, the probability that they are connected with each other varies directly with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared with Figure 2, this result supports our idea that nodes’ relations are impacted not only by the number of shared nodes but also by the number of connected nodes each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is necessary to take the existing edges’ weights into account when performing link prediction, instead of counting only the number of shared nodes.

To analyze the impact that the physical world and the virtual world have on each other, we conduct the experiments on different graphs: the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the corresponding row, while “—” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins).
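The four graphs can be derived from one weighted edge list, as in the sketch below. The function name and data layout are hypothetical, and reusing the full-graph weights for UUW and ULW (rather than recomputing them on the restricted graph) is an assumption, since the paper does not say which was done.

```python
def build_graph_variants(weighted_edges, users):
    """Derive the AW, UW, UUW, and ULW edge sets from one weighted graph.

    weighted_edges: dict mapping a pair (a, b) to its weight.
    users: set of node IDs that are users; all other nodes are locations.
    """
    aw = dict(weighted_edges)                     # all edges, weighted
    uw = {edge: 1.0 for edge in weighted_edges}   # all edges, weight fixed to 1
    uuw = {
        (a, b): w
        for (a, b), w in weighted_edges.items()
        if a in users and b in users              # user-user edges only
    }
    ulw = {
        (a, b): w
        for (a, b), w in weighted_edges.items()
        if (a in users) != (b in users)           # exactly one endpoint is a user
    }
    return {"AW": aw, "UW": uw, "UUW": uuw, "ULW": ulw}


# Hypothetical toy input.
toy_edges = {("u1", "u2"): 0.4, ("u1", "p1"): 1.2, ("u2", "p1"): 0.7}
variants = build_graph_variants(toy_edges, users={"u1", "u2"})
```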

We use the four graphs to show how different types of information have impact on the result of link prediction. Comparing the results of AW and UW, we intend to see how the weighted information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use ROC curves to demonstrate each graph’s result, as shown in Figure 6.

[Plot omitted: ROC curves, TPR (y-axis, 0 to 1) versus FPR (x-axis, 0 to 1), for UW, AW, UUW, ULW, and the base line]

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW)).

From Figure 6, we can observe that the performance of the unweighted graph (UW) is the worst, while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge’s weight is effective. Furthermore, UUW, which contains only the weighted user-user edges, performs better than UW but much worse than AW, which shows that using both physical-world and virtual-world information in link prediction can provide much better results. This also shows that a user’s activities in the physical world have impact on his/her virtual-world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can conclude that a user’s activities in the virtual world can influence his/her activities in the physical world.
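For readers unfamiliar with ROC curves, the sketch below shows how TPR and FPR points can be computed from scored candidate links. It is a generic illustration with hypothetical names and toy scores, not the evaluation code used in the paper; tied scores are not grouped, which is a simplification.

```python
def roc_points(scored_pairs):
    """Return (FPR, TPR) points obtained by lowering the score threshold.

    scored_pairs: list of (score, is_new_link) for candidate links.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    positives = sum(1 for _, is_link in ranked if is_link)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_link in ranked:
        if is_link:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores: True marks a link that really appeared in the test period.
toy_pairs = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
toy_points = roc_points(toy_pairs)
toy_auc = area_under_curve(toy_points)
```

A curve that lies further above the diagonal base line, and hence has a larger area, corresponds to a better-performing graph in Figure 6.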

7. Conclusion

In this paper, we present a study investigating how users’ activities in the virtual world and the physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node’s popularity according to the number of nodes it connects to and the occurrence times of the edges linked to it. Then we calculate each edge’s weight, considering its linking nodes’ popularity entropy and their shared nodes’ features. In this way, we get a weighted graph and use it for link prediction. The results show that users’ activities in the virtual world and the physical world do have impact on each other, and using both virtual-world and physical-world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether users’ activities in the virtual and physical worlds have impact on community dynamics. Real applications that employ the results will also be future work of this study.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.


information: one is the user’s social graph in the virtual world, and the other is the user’s check-ins in the physical world. However, it is difficult to get check-ins directly for security reasons; thus we use tips instead, which contain not only the time and location, as check-ins do, but also comments or messages created by users.

We use vectors with different numbers of elements to represent different users and places. Each element in a vector represents an entity, which can be another user or a location linked to the user. If any two users are friends, or a user has been to a place, we say that the two users are linked to each other or that the user and the place are linked to each other, which means that there is an edge between them in the graph made up of users and locations. In this way, we store all the information and edges in a graph whose nodes are users and locations.

All the vectors representing users and locations make up a graph, which is called the global graph, while a certain user’s friends and his/her visited locations make up the user’s local graph. It is well known that many ordinary users may follow famous movie stars or famous writers in the on-line social network, but any two such ordinary users are nevertheless unlikely to become friends in the future. The reason is that famous users usually have very large local graphs linking them to many users, and these users do not have much in common with each other. This phenomenon also exists among the users who have visited a very popular place. For example, there are numerous visitors to the Imperial Palace every day, but only very few of them may be friends. On the contrary, users who are linked to unpopular locations and unpopular users are highly likely to be friends in the future. In order to capture this characteristic for use in link prediction, we define a weighing entropy to weigh every node, so that we can take into account all the information that each node contains when predicting users’ relationships and movements. Finally, we conduct link prediction by using the random walk with restart method, considering the effect of every node and every edge.

In the remainder of this paper, we first discuss related work in Section 2 and give several basic definitions and our motivation in Section 3. In Section 4, we analyse the statistical features of the data. The link prediction and experimental results are presented in Sections 5 and 6, respectively. Finally, we conclude the paper in Section 7.

2. Related Work

The link prediction problem in social networks has attracted considerable attention since it was introduced by Liben-Nowell and Kleinberg [4]. Link prediction was first explored in complex networks, and when more and more users started joining on-line social networks, researchers started to analyse users’ activities in on-line social networks with different backgrounds and different purposes. In fact, we can view social networks as special complex networks whose nodes are entities (users or locations) and whose edges represent interaction, collaboration, or influence between entities. Before introducing the studies in link prediction, we first define the problem as follows.

Given a snapshot of a graph representing a social network at time $T_0$, we seek to accurately predict the new edges that will be added to the graph during a certain time interval from time $T_0$ to a given future time $T_1$.
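The problem statement can be made concrete with a small sketch: the ground truth is the set of edges present at $T_1$ but not at $T_0$, and a prediction is judged against that set. The function names and the precision measure are illustrative assumptions; the definition above does not prescribe a particular metric.

```python
def new_edges(edges_t0, edges_t1):
    """Undirected edges that appear in the T1 snapshot but not in the T0 snapshot."""
    before = {frozenset(edge) for edge in edges_t0}
    after = {frozenset(edge) for edge in edges_t1}
    return after - before


def precision(predicted, edges_t0, edges_t1):
    """Fraction of predicted edges that really were added between T0 and T1."""
    truth = new_edges(edges_t0, edges_t1)
    predicted = {frozenset(edge) for edge in predicted}
    if not predicted:
        return 0.0
    return len(predicted & truth) / len(predicted)


# Hypothetical snapshots; (2, 1) at T1 is the same undirected edge as (1, 2) at T0.
snapshot_t0 = [(1, 2), (2, 3)]
snapshot_t1 = [(2, 1), (2, 3), (1, 3), (3, 4)]
added = new_edges(snapshot_t0, snapshot_t1)
score = precision([(3, 1), (1, 4)], snapshot_t0, snapshot_t1)
```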

Pondering this problem, we find that it contains two questions. The first is by what features intrinsic to the network itself the social network evolves. The second is how the social network evolves. The key to answering them is to find the factors that have the greatest impact on the network’s evolution. We summarize the related work according to the factors that were used in link prediction.

2.1. Factors with User’s Movements in Physical World. To predict the occurrence of a new edge between any two users in a social network, the first step is to find the similarities between them. Many researchers have tried to mine user similarity from users’ movements. Users’ movements in the physical world usually show their travelling trajectories in daily life and represent users’ activities in the physical world directly. When users are at a certain place, they can only do certain things; for example, users can only watch movies in a movie theatre and have dinner at restaurants. Furthermore, similar people go to similar places with similar visiting sequences at similar times. Based on these ideas, mining users’ similarities from their movements has been widely explored. Li et al. [5] hired several volunteers from different countries and cities, equipped with GPS loggers, to collect their trajectories for several months and to mine similarities among these volunteers. They first detect stay points from trajectories, considering the length of the user’s staying time and the distance among different GPS points, and transform the user’s raw GPS trajectories into sequences of stay points. They build a hierarchical graph with three levels, which cover areas of different ranges, to model the user’s location history. They argue that, beyond colocations (the locations visited by both of the users concerned), users’ visiting sequences make more sense and carry more meaningful information in mining users’ similarities. They then find similar sequences of different lengths among users and compute the similarity across multiple levels. This work has made great progress in mining user similarity from physical-world resources. However, it is difficult to get the trajectories of nonvolunteer users on a large scale, partly because it is impossible to equip everyone with a GPS logger and partly due to privacy issues. Therefore, it is hard to apply this idea to find similarities among a large number of users.

As more and more users use cell phones, researchers try to infer social ties from cellular network data. Cho et al. [6] use data from Brightkite (http://brightkite.com/) and Gowalla (http://gowalla.com/), along with a dataset of cell phone location trace data, to understand the basic laws that govern users’ motion and dynamics. They find that there are basically two features in users’ travel style. One is that users’ short-ranged travel is geographically and temporally periodic and has nothing to do with social network structure; the other is that users’ long-distance travel is largely affected by social network ties. More than 50% of a user’s movements can be explained by periodic behaviour, while less than 30% can be explained by social ties. Thus, they build a periodic and social mobility model to predict individuals’ mobility, combining periodic short-ranged movements with travels related to social ties. Their model has three components, capturing the user’s regularly visited spatial locations, the temporal movement between these locations, and the user’s movements influenced by social ties, respectively. The model has acceptable performance in predicting a user’s mobility. However, it only explores how a user’s mobility is influenced by features like social ties and periodic activities and pays no attention to how a user’s mobility can impact other aspects of the user’s life.

There are also many research works making use of users’ trajectories in the physical world for other interesting purposes. For instance, Lian and Xie [7] proposed a novel location naming approach to automatically provide concrete and meaningful location names to users based on their current location, time, and check-in histories.

2.2. Factors with User’s Relation in the Virtual World. Nowadays, many people spend more and more time on on-line social networks, such as Facebook and Twitter, keeping in touch with existing friends, getting to know new friends, and sharing ideas and resources. Many factors might have impact on the evolution of a user’s social network. Xiang et al. [8] try to model relationship strength using users’ profiles and interactions among users, with data from Facebook and LinkedIn. Before building a model, they first assume that people tend to form ties with people having similar characteristics and that relationship strength has impact on both the nature and the frequency of online interactions. They develop an unsupervised latent variable model to estimate relationship strength from interaction activities, which can be communication, tagging, or something else, and from user similarity extracted from users’ profiles. The estimated relationship strengths result in a weighted graph in which the spurious links have been downweighted while the important ones have been highlighted. Finally, these weighted links can be used to increase the accuracy of social network mining tasks, including link prediction. Tang et al. [9] try to do link prediction using data across heterogeneous networks. The main question is how to bridge the available knowledge from different networks to help infer different types of social relationships. Their main ideas come from several social psychological theories, such as social balance, structural hole, and social status. They proposed a transfer-based factor graph model incorporating social theories into a semisupervised learning framework, used to transfer supervised information from the source network to help infer social ties in the target network. Almost every user has accounts on different online social networks, and different networks contain different aspects of a user’s information. This work represents a new and interesting research direction in making full use of users’ online activities.

Besides the data from online social networks, emails can be regarded as users’ activities in the virtual world. Researchers from Google [10] made use of users’ implicit social graph, which is formed by users’ interactions with contacts and groups of contacts in Gmail, to do friendship recommendation. They also proposed an interaction-based metric for estimating a user’s affinity to their contacts and groups. Their experiments showed that both the implicit social graph and interaction-based affinity were important in suggesting friends.

2.3. Factors with Mixed Resources. As online social networks and users’ movements contain users’ activities and relationships in the virtual and physical worlds, respectively, trying to understand users’ activities from only one point of view yields biased results. To get a better and unbiased understanding of users’ activities in both circumstances, several researchers have turned to combining information from these two aspects to do link prediction. Cranshaw et al. [11] first analyze the social context of a geographic region from a set of location-based features, including location entropy. Then they compose a model to predict the friendship between any two users by analyzing their location trails. Finally, they show a positive relationship between the colocation histories and the social ties that a user has in the network. Their work shows that offline mobility has impact on users’ online activities. Wang et al. [12] try to find the relation between human mobility, social ties, and link prediction. Mobile communication records are regarded as the representation of social ties, while users’ moving trajectories are extracted from the cellular network. The authors try to predict the social ties with different datasets, each of which contains a different proportion of users’ mobility.

Scellato et al. [13] described a supervised learning framework that exploits prediction features extracted from Gowalla data to predict new links among friends-of-friends and place-friends, and showed that including information about places and related user activity offers high link prediction performance.

Users’ friendships can usually be crawled from online social networks, but users’ mobility is very hard to obtain. Previous works usually use cellular network data or hire volunteers to collect trajectories. However, cellular network data is often of low accuracy, while hiring volunteers to collect data is only suitable for small-scale studies. No matter which of these two resources is used, researchers have to spend much time extracting locations with semantic or geographic context from the raw data. In fact, we care much more about where the user has been than about how he/she got there. Location-based online social networks offer this kind of information, as users often tag the places they have visited. In this way, we can do link prediction and study the impact that users’ mobility and social ties have on each other from a new point of view.

3. Preliminary

In this section, we first give several definitions and then introduce our main motivations. To simplify the explanation, we treat locations and users as the same kind of node, a choice determined by our motivations.

4 International Journal of Distributed Sensor Networks

3.1. Definitions

Definition 1 (node). A node $v$ refers to a user or a location that the network contains. Users and locations have the same status in this paper. Each node $v$ is associated with a unique ID $v \cdot id$, which starts from 1 and ends at the total number of nodes.

Definition 2 (edge). If two users, represented by node $j$ and node $k$ in the network, are friends, there will be an edge $e_{jk}$ between them, showing that they are connected to each other. Similarly, if a user has visited a place, there will also be an edge between them. Each edge $e_{jk}$ is undirected and associated with two terminal points $e_{jk} \cdot v_j$ and $e_{jk} \cdot v_k$, referring to the two nodes (node $j$ and node $k$) it connects. $e_{jk} \cdot w$ is the weight of the edge, determined by the popularity entropy of two nodes. Furthermore, a user can visit a place several times but can be friends with another user only once, which means that some edges may occur more than once. To record this feature, the last attribute of each edge $e_{jk}$ is $e_{jk} \cdot t$, recording the number of times the edge occurred. For edges connecting users the value is 1, but for those connecting users and locations the value may be larger than 1.

Definition 3 (global social graph). A global social graph $G(V, E)$ contains all the information and represents the structure of a whole location-based social network, containing two types of elements: node $v$ and edge $e$. It is an undirected graph.

Definition 4 (local social graph). Node $l$’s local social graph $G_l(V_l, E_l)$ may contain the complete friendship and all the visited places of a user, or all the users who have been to a particular place. All nodes in the local social graph $G_l$ are connected to the node $l$ directly; that is, only one-hop connections exist in a local graph. The graph is also an undirected graph.

Definition 5 (cofriend). For any two friends $A$ and $B$ of a user, the user is a cofriend (cf) of them.

Definition 6 (colocation). If two users have been to the same place, the place is their colocation (cl). We do not require that the two users visit the place at the same time.

Definition 7 (popularity entropy). Popularity entropy (pe) is used to weigh a node’s popularity among all the nodes and has an impact on the strength of the links connected to the node.
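The definitions above can be read as a small data model. The following is a minimal sketch, not the authors’ implementation: the class names `Edge`, `Node`, and `GlobalSocialGraph`, the `kind` label, and the method names are hypothetical, and the only behaviour taken from the text is that IDs start from 1, edges are undirected, initial weights are 1, and only user-location edges may occur more than once.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Edge:
    """Undirected edge e_jk with weight w and occurrence count t."""
    v_j: int
    v_k: int
    w: float = 1.0  # initial weight of every edge
    t: int = 1      # number of times the edge occurred


@dataclass
class Node:
    """A user or a location; both kinds have the same status."""
    id: int
    kind: str  # "user" or "location" (hypothetical label)
    edges: Dict[int, Edge] = field(default_factory=dict)  # neighbour id -> edge


class GlobalSocialGraph:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def add_node(self, kind: str) -> int:
        node_id = len(self.nodes) + 1  # IDs run from 1 to the number of nodes
        self.nodes[node_id] = Node(node_id, kind)
        return node_id

    def add_edge(self, j: int, k: int) -> Edge:
        existing = self.nodes[j].edges.get(k)
        if existing is not None:
            # Only user-location edges may occur more than once.
            if self.nodes[j].kind != self.nodes[k].kind:
                existing.t += 1
            return existing
        edge = Edge(j, k)
        self.nodes[j].edges[k] = edge  # the same object is shared by both ends
        self.nodes[k].edges[j] = edge
        return edge

    def local_graph(self, l: int) -> Set[int]:
        """IDs of all nodes one hop away from node l."""
        return set(self.nodes[l].edges)

    def shared_nodes(self, a: int, b: int) -> Set[int]:
        """Cofriends and colocations of a and b, treated alike."""
        return self.local_graph(a) & self.local_graph(b)
```

Storing one `Edge` object under both endpoints keeps the weight and the occurrence count consistent for the undirected edge.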

3.2. Motivations. Our motivation is quite simple and direct. Most existing research works focus on predicting links among friends’ friends or between friends and friends’ visited locations, all of which are nodes that are no more than 2 hops away from each other. However, we argue that new links within 2 hops are just a small part of all the newly added edges, and a large number of newly added edges occur between nodes whose distance is more than 2 hops. Figure 1 shows the proportion of the two kinds of newly added links occurring during the period from September 2011 to February 2012 in our dataset collected from Foursquare.

[Figure 1 plot: bar chart; x-axis “Nodes distance” with categories “2 hops” and “N hops (N ≥ 3)”; y-axis “Newly added edge count” (0 to 90000); bar labels 489 and 83321.]

Figure 1: Distance of nodes that newly added edges link.

We can see from Figure 1 that very few new links are added among nodes within 2-hop distance, while most new links are added among nodes that are farther away from each other. The result shows that if we only pay attention to new links within 2 hops, much information will be lost and high precision of link prediction cannot be achieved.
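A count like the one in Figure 1 can be obtained by measuring, for each newly added edge, the hop distance between its endpoints in the earlier graph. The sketch below is an illustration under assumptions: the function names and the adjacency-dictionary input are hypothetical, and pairs that are not reachable at all are counted in the “N ≥ 3” bucket, which the paper does not specify.

```python
from collections import deque


def hop_distance(adj, src, dst, max_hops=None):
    """Breadth-first search; returns the hop count or None if not found."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if max_hops is not None and dist >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adj.get(node, ()):
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def count_new_edges_by_distance(adj_before, new_edges):
    """Split new edges into '2 hops' and 'N hops (N >= 3)' buckets."""
    counts = {"2 hops": 0, "N hops (N >= 3)": 0}
    for u, v in new_edges:
        dist = hop_distance(adj_before, u, v, max_hops=2)
        if dist == 2:
            counts["2 hops"] += 1
        elif dist is None:
            # Assumption: farther than 2 hops or disconnected.
            counts["N hops (N >= 3)"] += 1
        # dist == 1 means the edge already existed, so it is not counted.
    return counts
```

Limiting the search to two hops keeps the check cheap on a large graph, because only the immediate neighbourhood of each endpoint is explored.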

It is well accepted that, for any two unconnected users, the more friends they share, the more likely it is that they will be friends in the near future. However, famous users, such as movie stars like Justin Bieber or dignitaries like Obama, usually have very large social networks on online social networks. If most of the shared friends of two users are such famous users, do the two users have a large probability of becoming friends in the future? Probably not. We argue that, besides the number of common friends any two users have, the scale of the common friends’ social networks also has an impact on forming a new link between two strangers.

The last motivation is that, since online social networks have become a part of life, we want to analyse the interaction between our daily life and the online social network. To explain it in detail, we divide users’ activities into two parts: the first part is users’ activities in the offline world, for example, moving to a new city or meeting new friends, while the other part is users’ online activities, such as adding new friends or commenting on friends’ photos. What kind of influence these two kinds of activities have on each other is largely unexplored.

Before further introducing our work, we introduce two basic hypotheses in this paper as follows.

Hypothesis 1. The larger the overlap between any two users’ relationships in the virtual world is, the more likely it is that they will have a larger overlap in check-ins in the physical world.

Hypothesis 2. The larger the overlap between any two users’ check-ins in the physical world is, the more likely it is that they will have a larger overlap of relationships in the virtual world.


Table 1: Amount of newly added edges in different groups.

Group   Amount of shared friends   Amount of shared friends’ friends   Amount of newly added edges
1       1033                       2574                                141
2       1156                       1479                                375

[Figure 2 plot: x-axis “Count of cofriends” (0 to 40); y-axis “Probability of being friends” (0 to 1).]

Figure 2: The relationship between probability of being friends and the amount of cofriends.

These two hypotheses come from the goal of exploring the impact of users’ activities in the physical and virtual world on each other. Our experimental results will show that these two hypotheses are reasonable and consistent with what is observed in reality.

4. Observations

We use a dataset crawled from Foursquare. The dataset contains 31524 users, 51265 venues (locations), and users’ tips about these venues from July 2011 to May 2012. In this paper, we take users’ check-ins extracted from tips (time and geographic information) as users’ activity in the physical world and the evolution of users’ friendship as users’ activity in the virtual world. We here present the statistical characteristics of the data collected from July 2011 to February 2012.

Firstly, we analyze how the amount of shared friends influences users’ social ties. Figure 2 shows the relationship between the amount of shared friends that any two users have and their probability of being friends.
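One way to estimate such a curve is to group all user pairs by their number of shared friends and compute, within each group, the fraction of pairs that are themselves friends. The sketch below is illustrative only: the function name and the input format (a dictionary from each user to a set of friend IDs) are assumptions, and the exhaustive pair enumeration is quadratic in the number of users, so it is only suitable for small samples.

```python
from collections import defaultdict
from itertools import combinations


def friend_probability_by_cofriends(friends):
    """Map k -> fraction of user pairs with k shared friends that are friends.

    friends: dict mapping each user id to the set of that user's friends.
    """
    totals = defaultdict(int)  # k -> number of pairs with k shared friends
    linked = defaultdict(int)  # k -> number of those pairs that are friends
    for a, b in combinations(sorted(friends), 2):
        k = len(friends[a] & friends[b])
        totals[k] += 1
        if b in friends[a]:
            linked[k] += 1
    return {k: linked[k] / totals[k] for k in sorted(totals)}
```

The same routine can be reused for colocations by passing, for each user, the set of visited places when counting shared nodes, while still reading friendship from the friend sets.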

In Figure 2, we only show the records of samples with a relatively larger scale. It is easy to see that the probability of being friends generally rises as the amount of shared friends grows. However, the curve also fluctuates, and the probability does not grow monotonically. Why does this happen? Does the probability not grow as the amount of common friends rises? As argued in our motivation, it does not always. To verify the idea, we choose two groups of users. In Group 1, the shared friends of any two users are mostly users who own a relatively larger social network, while in Group 2 the shared friends have a relatively smaller social network. We analyse the average amount of new links added among these users from September 2011 to February 2012 in the two groups, and the results are clearly different, as shown in Table 1.

It is not surprising to find that only very few new links are added in the first group but many more in the second group. As mentioned in our motivation, users with a large number of friends are usually famous people in the social network, and anyone who is interested in them can add them as friends. As a result, the links between a famous user and their friends are quite weak and cannot be used as support for predicting new links. For the less popular users, since their social networks are relatively smaller, their friends are closely connected with them, and these links are strong support for predicting new links.

[Figure 3 plot: x-axis “Count of colocations” (1 to 8); y-axis “Probability of being friends” (0 to 0.2).]

Figure 3: The relationship between probability of being friends and the amount of colocations.

Since we take users’ check-ins as their activities in the physical world and the evolution of their friendships as their activities in the virtual world, and our goal is to find whether users’ activities in the virtual and physical world have an impact on each other, we have to investigate two topics. The first is the probability that two users are friends given that they have $N$ colocations.

From Figure 3, we can observe that, as the number of colocations of two users rises, the probability of being friends clearly increases. However, we can again observe some fluctuations. The reason is similar to that in Figure 2. There are always some places attracting numerous people, while others are visited by a very small group of people. The popularity of a place has a great impact on the strength of the links between the place and the attracted users: the more people visit a place, the more popular the place is and the weaker the links between the place and its users are, and vice versa. As the number of colocations rises, the proportion of popular colocations changes and the number of pairs of users who have that many colocations changes, which is why the fluctuations occur.

The second topic we want to investigate is whether the probability of having colocations rises as the number of cofriends increases. The result is shown in Figure 4. We can observe that when the number of cofriends increases, the probability of having colocations also increases, but there are also some fluctuations. The explanation is similar to the one above.

[Figure 4 plot: x-axis “Count of cofriends” (0 to 50); y-axis “Probability of having colocations” (0 to 0.2).]

Figure 4: The relationship between probability of having colocations and the count of cofriends.

All the statistical analysis results show that it is not appropriate to consider only the number of colocations or cofriends when predicting new links. Each member of the colocations and cofriends contains much information about the links and helps us to understand the existing links better. But how to translate this information into a form that can be used in link prediction remains a problem. In the next section, we try to solve it.

5. Impact Detection

We study how users’ friendships and their check-ins impact each other. Our main task is to predict the new edges (links) in the graph whose nodes represent users and locations. It is well accepted that the more information we obtain and use when predicting new links, the higher the precision we can get. However, due to the limits of storage space and time, it is not practical to store all information. Simply counting the number of colocations and cofriends is not a good approach either, since each of their elements also contains factors that affect the addition of new links. In this paper, we propose an effective way to handle the issue.

Every node in the global graph is connected with many nodes, and these nodes may be connected with many other nodes. Each node’s popularity is determined by its connections with the rest of the nodes. We use a vector to represent each node. There are a very large number of nodes, including users and locations, in the global graph, but only a few nodes are connected with each other, which means that the global graph is very sparse. If we stored an entry for every possible edge in the graph without considering whether it really exists, most elements in every vector would be 0, meaning that the nodes are not connected, and most information in the vectors would be meaningless. In fact, only the connected nodes and their edges make sense in link prediction. Since each edge stores the information about its weight, its occurrence times, and the two nodes it connects, a node’s vector, which contains all the edges connected to the node, can store all the information that the node has. By doing so, the topology of the social network is stored with high storage space utilization, and it also ensures that all the nodes in the social network have the same status.

If the social network contains $N$ nodes in total, then for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

$$v_i = v\left(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}\right), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1}$$

where $e_{ij}$ ($i \neq j$) represents the edge existing between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. The initial weights of all elements are set to 1, and the final value will be larger than 0 in every vector. Since all the weights are associated with the nodes’ popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1. $e_j \cdot t$ denotes the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector’s elements; then we calculate the sum, $value$, of the occurrence times of the edges linked to node $i$. According to our hypothesis, for any node, its popularity is proportional to the amount $n$ of other nodes it is connected with and to the occurrence times of the edges that are linked to the node. In order to get a normalized weight for each node, the popularity entropy $pe_i$ for node $i$ is computed as follows:

$$pe_i = n \ast \frac{value}{Sum_E}, \tag{2}$$

where the sum $Sum_E$ of edges is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network,

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each shared node as the edge’s weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes’ popularity entropy as a factor that affects the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linked nodes are to each other, so we use the sum of the occurrence times of the edges connecting them with their shared nodes to weigh the importance of the shared nodes for the two nodes.
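The popularity entropy of (2) and the three rules above can be sketched in a few lines. This is a hedged illustration rather than the authors’ code: the function names and the input format are hypothetical, $Sum_E$ is assumed to be the number of distinct undirected edges, and each shared node $p$ is assumed to contribute the reciprocal of $pe_p$ times $n/m$, with $n$ the summed occurrence counts of the two edges to $p$ and $m$ the number of edges of $p$, following Algorithm 2.

```python
from collections import defaultdict


def build_vectors(edge_events):
    """vectors[u][v] = occurrence count t of the undirected edge u-v.

    edge_events: iterable of (u, v) pairs, one per observed occurrence.
    """
    vectors = defaultdict(dict)
    for u, v in edge_events:
        vectors[u][v] = vectors[u].get(v, 0) + 1
        vectors[v][u] = vectors[v].get(u, 0) + 1
    return dict(vectors)


def popularity_entropy(vectors):
    """pe_i = n * value / Sum_E for every node i."""
    # Assumption: Sum_E is the number of distinct undirected edges.
    sum_e = sum(len(nbrs) for nbrs in vectors.values()) / 2
    return {
        i: len(nbrs) * sum(nbrs.values()) / sum_e
        for i, nbrs in vectors.items()
    }


def edge_weights(vectors, pe):
    """Weight of each existing edge, summed over the shared nodes."""
    weights = {}
    for i, nbrs in vectors.items():
        for j in nbrs:
            if j <= i:
                continue  # each undirected edge is handled once
            shared = nbrs.keys() & vectors[j].keys()
            w = 0.0  # an edge without shared nodes keeps weight 0 here
            for p in shared:
                m = len(vectors[p])                # size of p's vector
                n = vectors[i][p] + vectors[j][p]  # occurrences of both edges
                w += (1.0 / pe[p]) * n / m
            weights[(i, j)] = w
    return weights
```

For example, with edge events `[(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (2, 4)]`, nodes 3 and 4 are shared by nodes 1 and 2, and the two contributions add up to the weight of edge (1, 2).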

As the values of $e_{ji} \cdot w$ and $e_{ij} \cdot w$ are the same, we can compute them at the same time. To keep the computation controllable and to save both time and space, we let each node compute only the strengths of its ties with nodes that have a larger node ID than it does. For any node $i$, we check the elements in its vector one by one. For an element $e_{ji}$ in node $i$’s vector, we first check the ID $v_j \cdot id$ of the other node that node $i$ connects to; if $v_j \cdot id$ is larger than $v_i \cdot id$, then we store the node’s ID as $j$ and get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector’s count $m$ and the sum $n$ of $e_{ip} \cdot t$ and $e_{jp} \cdot t$. According to our hypothesis, the weight of edge $e_{ij}$ is proportional to the occurrence times of the edges linking to the shared nodes and inversely proportional to the popularity entropy of the shared nodes. So each shared node’s contribution to the edge $e_{ij}$ is computed by the following formula:

$$C_{p-ij} = \mathrm{reverse}(pe_p) \ast \frac{n}{m}, \tag{3}$$

where $C_{p-ij}$ represents the shared node $p$’s contribution to the edge $e_{ij}$, and $\mathrm{reverse}(pe_p)$ represents the reciprocal of $pe_p$. By computing the sum of all shared nodes’ contributions, we get the weight of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. Using the TF-IDF algorithm, we would first have to specify one kind of node as documents containing many words, and the other kind would be chosen to be words. However, users and locations have the same status in the social network; choosing either of them as the document would destroy this balanced status.

Input: a collection of nodes’ initial vectors $v$, the sum of edges $Sum_E$
Output: a collection of nodes’ popularity entropy $PE = \{pe\}$
(1) $i \leftarrow 1$, $N \leftarrow |V|$
(2) while ($i < N + 1$)
(3)     $pe_i \leftarrow 0$, $n \leftarrow \|v_i\|$, $value \leftarrow 0$
(4)     foreach $e_j$ in $v_i$
(5)         $value = value + e_j \cdot t$
(6)         $j \leftarrow j + 1$
(7)     $pe_i = n \ast value / Sum_E$
(8)     $i \leftarrow i + 1$
(9) return $PE$

Algorithm 1: Popularity entropy calculation.

Input: a collection of nodes’ vectors $v$, a collection of nodes’ popularity $PE$, the sum of edges $Sum_E$
Output: a collection of nodes’ vectors $V = \{v\}$
(1) $i \leftarrow 1$, $N \leftarrow |V_{ini}|$, $Initial(V)$
(2) while ($i < N + 1$)
(3)     foreach $e_{ij}$ in $v_i$
(4)         if $e_{ij} \cdot v_j > i$
(5)             $j \leftarrow e_{ij} \cdot v_j$, $P \leftarrow CommonNodes(v_i, v_j)$, $e_{ij} \cdot w = e_{ji} \cdot w = 0$
(6)             foreach $p$ in $P$
(7)                 $m \leftarrow \|v_p\|$, $n \leftarrow sum(e_{ip} \cdot t, e_{jp} \cdot t)$
(8)                 $e_{ji} \cdot w = e_{ij} \cdot w = e_{ij} \cdot w + \mathrm{reverse}(pe_p) \ast n / m$
(9)     $i \leftarrow i + 1$
(10) return $V$

Algorithm 2: Edge final weight calculation.

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to do link prediction.
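Random walk with restart can be illustrated with a plain power iteration over the weighted graph. The sketch below makes several assumptions that the text does not fix: transition probabilities are taken proportional to edge weights, the restart probability is 0.15, node popularity is not used separately, and the function names `random_walk_with_restart` and `predict_links` are hypothetical. It is not the fast method of [15].

```python
def random_walk_with_restart(weights, start, restart=0.15, iters=100, tol=1e-10):
    """Return a relevance score for every node with respect to `start`.

    weights: dict mapping undirected edges (u, v) to non-negative weights.
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if start not in adj:
        raise KeyError(f"start node {start!r} has no edges")
    out_sum = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    scores = {u: 0.0 for u in adj}
    scores[start] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        for u, s in scores.items():
            if s == 0.0:
                continue
            if out_sum[u] == 0:
                # Only zero-weight edges: send the walker back to the start.
                nxt[start] += (1 - restart) * s
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * s * w / out_sum[u]
        nxt[start] += restart  # restart mass (the scores sum to 1)
        delta = sum(abs(nxt[u] - scores[u]) for u in adj)
        scores = nxt
        if delta < tol:
            break
    return scores


def predict_links(weights, start, top_k=3):
    """Rank nodes not yet linked to `start` by their score."""
    scores = random_walk_with_restart(weights, start)
    linked = {v for (u, v) in weights if u == start}
    linked |= {u for (u, v) in weights if v == start}
    candidates = [
        (score, node) for node, score in scores.items()
        if node != start and node not in linked
    ]
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [node for _, node in candidates[:top_k]]
```

Because the walk can reach any connected node, candidates are not restricted to nodes within 2 hops, which matches the motivation given in Section 3.2.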

6. Experimental Results

We divide the crawled dataset into two parts: the first part runs from July 2011 to February 2012 and the second part from February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. Going one step further, we also want to show that users’ activities in the virtual world and the physical world have an impact on each other, which can be seen in the result of link prediction.

First, we experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not really connected to each other. In other words, we suppose that any two nodes in the social network are connected to each other, and we calculate the weights of all edges even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

[Figure 5 plot: x-axis “Weight of edge” (0 to 120); y-axis “Probability of being friends” (0 to 0.6).]

Figure 5: The relationship between probability of being friend-nodes and the weight of edges.

Table 2: Contents of different graphs.

Feature / Graph    AW   UW   UUW   ULW
User’s relation    *    *    *     -
User’s tips        *    *    -     *
Weighted or not    *    -    *     *

From Figure 5, we can observe that, for any two nodes, the probability that they are connected with each other increases with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared with Figure 2, this result supports our idea that nodes’ relations are affected not only by the number of shared nodes but also by the number of connected nodes each shared node owns. Since the probability varies with the weight of the edge, the curve also shows that it is necessary to take the existing edges’ weights into account when performing link prediction, instead of only counting the number of shared nodes.

To analyze the impact that the physical world and the virtual world have on each other, we conduct the experiments on different graphs: the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the corresponding row, while “-” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins).
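The four graphs can be derived from one weighted edge set by filtering on the kinds of the endpoints. The sketch below is only one possible reading: the function name and the `kind` dictionary are hypothetical, and the weights in UUW and ULW are assumed to be taken from the full graph rather than recomputed on the restricted graph, which the text does not state.

```python
def graph_variants(weights, kind):
    """Build the AW, UW, UUW, and ULW edge sets.

    weights: dict mapping undirected edges (u, v) to weights.
    kind: dict mapping each node to "user" or "location".
    """
    def is_user_user(edge):
        return kind[edge[0]] == "user" and kind[edge[1]] == "user"

    def is_user_location(edge):
        return {kind[edge[0]], kind[edge[1]]} == {"user", "location"}

    return {
        "AW": dict(weights),                 # all edges, weighted
        "UW": {e: 1.0 for e in weights},     # all edges, weight 1
        "UUW": {e: w for e, w in weights.items() if is_user_user(e)},
        "ULW": {e: w for e, w in weights.items() if is_user_location(e)},
    }
```

Each variant can then be passed to the same link prediction routine, so that differences in the results can be attributed to the information each graph contains.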

We use the four graphs to show how different types of information affect the result of link prediction. By comparing the results of AW and UW, we intend to see how the weight information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use ROC curves to demonstrate each graph’s result, as shown in Figure 6.

[Figure 6 plot: ROC curves; x-axis “FPR” (0 to 1); y-axis “TPR” (0 to 1); legend: UW, AW, UUW, ULW, Base line.]

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW)).

From Figure 6, we can observe that the performance of the unweighted graph (UW) is the worst, while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge’s weight is basically effective. Furthermore, UUW, which contains only the weighted user-user edges, performs better than UW but much worse than AW, which shows that using information from both the physical world and the virtual world in link prediction provides much better results. This also indicates that users’ activities in the physical world have an impact on their virtual world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can conclude that users’ activities in the virtual world can influence their activities in the physical world.
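An ROC curve such as those in Figure 6 is obtained by ranking candidate links by score and sweeping a threshold from the highest score to the lowest. The following sketch shows the computation under assumptions: the function names and input format are hypothetical, tied scores are not given special treatment, and the area under the curve is added only as a convenient summary that the paper itself does not report.

```python
def roc_points(scored_pairs, positives):
    """Return (FPR, TPR) points for a ranked list of candidate links.

    scored_pairs: list of (pair, score); positives: set of pairs that
    actually appear in the ground-truth graph.
    """
    ranked = sorted(scored_pairs, key=lambda item: -item[1])
    n_pos = sum(1 for pair, _ in ranked if pair in positives)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pair, _ in ranked:
        if pair in positives:
            tp += 1
        else:
            fp += 1
        fpr = fp / n_neg if n_neg else 0.0
        tpr = tp / n_pos if n_pos else 0.0
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A curve that stays above the diagonal base line, with an area above 0.5, indicates that the scores rank true new links ahead of non-links more often than chance.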

7. Conclusion

In this paper, we present a study investigating how users’ activities in the virtual world and the physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node’s popularity according to the number of nodes it connects to and the occurrence times of the edges linked to it. Then we calculate each edge’s weight, considering its linking nodes’ popularity entropy and their shared nodes’ features. In this way, we get a weighted graph and use it for link prediction. The results show that users’ activities in the virtual world and the physical world do have an impact on each other, and that using both virtual world and physical world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether users’ activities in the virtual and physical world have an impact on community dynamics. Real applications that employ the results will also be future work of this study.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.


explained by periodic behaviour, while less than 30% of user's movements can be explained by social ties. Thus, they build a periodic and social mobility model to predict individuals' mobility, combining periodic short-ranged movements with social-ties-related travels. Their model has three different components, capturing the feature of user's regularly visited spatial locations, the temporal movement between these locations, and user's movements influenced by social ties, respectively. The model has acceptable performance in predicting user's mobility. However, it only explores how user's mobility is influenced by features like social ties and periodic activities but pays no attention to how user's mobility can impact other aspects of user's life.

There are also many research works making use of users' trajectories in the physical world for other interesting purposes. For instance, Lian and Xie [7] proposed a novel location naming approach to automatically provide concrete and meaningful location names to users based on their current location, time, and check-in histories.

2.2. Factors with User's Relation in the Virtual World. Nowadays, many people spend more and more time on online social networks, such as Facebook and Twitter, keeping in touch with existing friends, getting to know new friends, and sharing ideas and resources. Many factors might have an impact on the evolution of the user's social network. Xiang et al. [8] try to model relationship strength using users' profiles and interactions among users, using data from Facebook and LinkedIn. Before building a model, they first assume that people tend to form ties with people having similar characteristics and that relationship strength has an impact on online interactions, in both nature and frequency. They develop an unsupervised latent variable model to estimate relationship strength from interaction activities, which can be communication, tagging, or something else, and from user similarity extracted from users' profiles. The estimated relationship strengths result in a weighted graph in which the spurious links have been downweighted while the important ones have been highlighted. Finally, these weighted links can be used to increase the accuracy of social network mining tasks, including link prediction. Tang et al. [9] try to do link prediction using data across heterogeneous networks. The main question is how to bridge the available knowledge from different networks to help infer different types of social relationships. Their main ideas come from several social psychological theories, such as social balance, structural hole, and social status. They proposed a transfer-based factor graph model incorporating social theories into a semisupervised learning framework, used to transfer supervised information from the source network to help infer social ties in the target network. As we all know, almost every user has accounts on different online social networks, and different networks contain different aspects of information about a user. This work represents a new and interesting research direction in making full use of user's online activities.

Besides the data from online social networks, emails can be regarded as users' activities in the virtual world. Researchers from Google [10] made use of users' implicit social graph, which is formed by users' interactions with contacts and groups of contacts in Gmail, to do friendship recommendation. They also proposed an interaction-based metric for estimating a user's affinity to their contacts and groups. Their experiments showed that both the implicit social graph and interaction-based affinity were important in suggesting friends.

2.3. Factors with Mixed Resources. As online social networks and user's movements contain user's activities and relationships in the virtual and physical world, respectively, if we try to understand user's activities from only one point of view, we can only get biased results. To get a better and unbiased understanding of user's activities in both circumstances, several researchers turn to combining information from these two aspects to do link prediction. Cranshaw et al. [11] first analyze the social context of a geographic region from a set of location-based features, including location entropy. Then they compose a model to predict the friendship between any two users by analyzing their location trails. Finally, they show a positive relationship between the colocation histories and the social ties that the user has in the network. Their work shows that offline mobility has an impact on user's online activities. Wang et al. [12] try to find the relation between human mobility, social ties, and link prediction. Mobile communication records are regarded as the representation of social ties, while user's moving trajectories are extracted from the cellular network. The authors try to predict the social ties with different datasets, each of which contains a different proportion of user's mobility.

Scellato et al. [13] described a supervised learning framework which exploits prediction features extracted from Gowalla data to predict new links among friends-of-friends and place-friends, and showed that the inclusion of information about places and related user activity offers high link prediction performance.

User's friendship usually can be crawled from online social networks, but user's mobility is very hard to get. Previous works usually tend to use cellular network data or hire volunteers to collect trajectories. However, cellular network data is often of low accuracy, while hiring volunteers to collect data is only suitable for small-scale studies. No matter which one of these two resources is used, researchers have to spend much time extracting locations with semantic or geographic context from the raw data. In fact, we care much more about where the user has been than about how he/she got there. Location-based online social networks offer this kind of information, as users often tag the places they have visited. In this way, we can do link prediction and study the impact that user's mobility and social ties have on each other from a new point of view.

3. Preliminary

In this section, we first give several definitions and then introduce our main motivations. To simplify the explanation, we view locations and users as the same, which is determined by our motivations.


3.1. Definitions

Definition 1 (node). A node $v$ refers to a user or a location that the network contains. Users and locations have the same status in this paper. Each node $v$ is associated with a unique ID $v \cdot id$, which starts from 1 and ends at the total number of nodes.

Definition 2 (edge). If two users, represented by node $j$ and node $k$ in the network, are friends, there will be an edge $e_{jk}$ between them, showing that they are connected to each other. Similarly, if a user has visited a place, there will also be an edge between them. Each edge $e_{jk}$ is undirected and associated with two terminal points, $e_{jk} \cdot v_j$ and $e_{jk} \cdot v_k$, referring to the two nodes (node $j$ and node $k$) it connects. $e_{jk} \cdot w$ is the weight of the edge, determined by the popularity entropy of two nodes. Furthermore, a user can visit a place several times but can be friends with another user only once, which means that some edges may occur more than once. To record this feature, the last attribute of each edge $e_{jk}$ is $e_{jk} \cdot t$, recording the number of times the edge occurred. For edges connecting users, the value is 1, but for those connecting users and locations, the value may be larger than 1.

Definition 3 (global social graph). A global social graph $G(V, E)$ contains all the information and represents the structure of a whole location-based social network, containing two types of elements: node $v$ and edge $e$. It is an undirected graph.

Definition 4 (local social graph). Node $l$'s local social graph $G_l(V_l, E_l)$ may contain the complete friendship and all the visited places of a user, or all the users who have been to a particular place. All nodes in the local social graph $G_l$ are connected to the node $l$ directly; that is, only one-hop connections exist in the local graph. The graph is also an undirected graph.

Definition 5 (cofriend). For any two friends $A$ and $B$ of a user, the user is a cofriend $cf$ of them.

Definition 6 (colocation). If two users have been to the same place, the place is their colocation $cl$. We do not require that the two users visit the place at the same time.

Definition 7 (popularity entropy). Popularity entropy $pe$ is used to weigh a node's popularity among all the nodes and has an impact on the strength of the links connected to the node.
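The definitions above map naturally onto a small data structure. The following Python sketch is illustrative only: the `Edge` class, the helper names, and the toy IDs are assumptions made for this example (the paper does not describe an implementation), with users and locations sharing one integer ID space as in Definition 1.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Undirected edge e_jk (Definition 2); field names are illustrative."""
    v_j: int        # smaller endpoint ID
    v_k: int        # larger endpoint ID
    w: float = 1.0  # weight, initialised to 1
    t: int = 1      # number of times the edge occurred


def build_edges(friendships, checkins):
    """Build {(j, k): Edge} from friend pairs and (user, place) check-ins."""
    edges = {}
    for a, b in friendships:
        key = (min(a, b), max(a, b))
        edges[key] = Edge(*key)          # a friendship occurs only once
    for user, place in checkins:
        key = (min(user, place), max(user, place))
        if key in edges:
            edges[key].t += 1            # repeated visit to the same place
        else:
            edges[key] = Edge(*key)
    return edges


def local_graph(edges, l):
    """Nodes directly connected to node l (one-hop neighbours, Definition 4)."""
    return {e.v_k if e.v_j == l else e.v_j
            for e in edges.values() if l in (e.v_j, e.v_k)}


def shared_nodes(edges, a, b, locations):
    """Split the common neighbours of a and b into (cofriends, colocations)."""
    common = local_graph(edges, a) & local_graph(edges, b)
    return common - locations, common & locations


# Hypothetical toy data: users 1-3, locations 4-5.
edges = build_edges([(1, 3), (2, 3)], [(1, 4), (1, 4), (2, 4), (2, 5)])
cofriends, colocations = shared_nodes(edges, 1, 2, locations={4, 5})
print(edges[(1, 4)].t, cofriends, colocations)
```

In this toy network, user 1 visited location 4 twice, so that edge has $t = 2$; users 1 and 2 have user 3 as a cofriend and location 4 as a colocation.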

3.2. Motivations. Our motivation is quite simple and direct. Most existing research works focus on predicting the links among friends' friends or between friends and friends' visited locations, all of which are nodes that are no more than 2 hops away from each other. However, we argue that new links within 2 hops are just a small part of all the newly added edges, and a large number of newly added edges occur between nodes whose distance is more than 2 hops. Figure 1 shows the proportion of the two kinds of newly added links occurring during the period from September 2011 to February 2012 in our dataset collected from Foursquare.

Figure 1: Distance of nodes that newly added edges link (x-axis: nodes distance, 2 hops versus N hops (N ≥ 3); y-axis: newly added edge count; bar labels: 489 for 2 hops and 83321 for N hops).

We can see from Figure 1 that very few new links are added among the nodes within 2-hop distance, while most new links are added among the nodes which are farther away from each other. The result shows that if we only pay attention to the new links within 2 hops, much information will be lost and high precision of link prediction cannot be achieved.
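The kind of count reported in Figure 1 can be obtained with a breadth-first search over the graph as it stood before the new edges appeared. The sketch below is only one possible way to do it: the function names and the adjacency-dict layout are assumptions, and endpoints with no connecting path are counted in the N ≥ 3 bucket, which is a choice made for this example.

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Shortest-path length in hops between src and dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == dst:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def count_new_edges_by_distance(adj_before, new_edges):
    """Bucket newly added edges by the prior hop distance of their endpoints."""
    counts = {"2 hops": 0, "N hops (N >= 3)": 0}
    for u, v in new_edges:
        d = hop_distance(adj_before, u, v)
        if d is not None and d < 2:
            continue  # already linked (or a self-pair): not a new edge
        # Assumption: unreachable pairs are counted as N >= 3.
        counts["2 hops" if d == 2 else "N hops (N >= 3)"] += 1
    return counts


# Hypothetical chain 1 - 2 - 3 - 4 observed before the new edges appear.
adj_before = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(count_new_edges_by_distance(adj_before, [(1, 3), (1, 4)]))
```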

It is well accepted that, for any two unconnected users, the more friends they share, the more likely they are to become friends in the near future. But it is common sense that famous users, such as movie stars like Justin Bieber or dignitaries like Obama, usually have a very large social network on online social networks. If most of the friends of any two users are such famous users, do the two users have a large probability of becoming friends in the future? Probably not. We argue that, besides the number of common friends any two users have, the scale of the common friends' social networks also has an impact on forming a new link between two strangers.

The last motivation is that, since online social networks have become a part of life, we want to analyse the interaction between our daily life and the online social network. To explain it in detail, we divide user's activities into two parts: the first part is user's activities in the offline world, for example, moving to a new city or meeting with new friends, and so forth, while the other part is user's online activities, such as adding new friends or commenting on friends' photos. What kind of influence these two kinds of activities have on each other is largely unexplored.

Before further introducing our work, we introduce two basic hypotheses in this paper as follows.

Hypothesis 1. The larger the scale of overlap that any two users' relationships have in the virtual world, the more likely it is that they will have a larger scale of overlap in check-ins in the physical world.

Hypothesis 2. The larger the scale of overlap that any two users' check-ins in the physical world have, the more likely it is that they will have a large scale of overlap of relationships in the virtual world.


Table 1: Amount of newly added edges in different groups.

Group | Amount of shared friends | Amount of shared friends' friends | Amount of newly added edges
1 | 1033 | 2574 | 141
2 | 1156 | 1479 | 375

Figure 2: The relationship between probability of being friends and the amount of cofriends (x-axis: count of cofriends; y-axis: probability of being friends).

These two hypotheses come from the goal of exploring the impact of user's activities in the physical and virtual world on each other. Our experimental results will show that these two hypotheses are reasonable and consistent with what is observed in reality.

4. Observations

We use a dataset crawled from Foursquare. The dataset contains 31524 users, 51265 venues (locations), and users' tips about these venues from July 2011 to May 2012. In this paper, we take user's check-ins extracted from tips (time and geographic information) as user's activity in the physical world and the evolution of user's friendship as user's activity in the virtual world. We here present the statistical characteristics of the data collected from July 2011 to February 2012.

Firstly, we analyze how the number of shared friends influences users' social ties. Figure 2 shows the relationship between the number of shared friends that any two users have and their probability of being friends.

In Figure 2, we only show the records of samples with a relatively larger scale. It is easy to see that the probability of being friends generally rises as the number of shared friends grows. However, it is also obvious that the curve fluctuates and the probability does not grow monotonically. Why does this happen? Does the probability not grow as the number of common friends rises? As stated in our motivation, it does not always do so. To verify the idea, we choose two groups of users. In Group 1, the shared friends of any two users are mostly users who own a relatively larger social network, while in Group 2, the shared friends have a relatively smaller social network. We analyse the average number of new links added among these users from September 2011 to February 2012 in the two groups, and the results are clearly different, as shown in Table 1.

It is not surprising to find that there are only very few new links added in the first group but many more in the second group. Just as mentioned in our motivation, users with a large number of friends are usually some kind of famous people in the social network, and anyone who is interested in them can add them as friends. As a result, the links between the famous user and their friends are quite weak and cannot be used as support to predict new links. But for the less popular users, since their social networks are relatively smaller, their friends are closely connected with them and the links are strong supports for predicting new links.

Figure 3: The relationship between probability of being friends and the amount of colocations (x-axis: count of colocations; y-axis: probability of being friends).

Since we take user's check-ins as users' activities in the physical world and the evolution of their friendships as activities in the virtual world, and our goal is to find whether user's activities in the virtual and physical world have an impact on each other, we have to investigate two topics. The first one is that, if two users have $N$ colocations, we want to know what the probability of the two users being friends is.

From Figure 3, we can observe that as the number of colocations of two users rises, the probability of being friends clearly increases. However, we can again observe some clear fluctuations. The reason why the fluctuations occur is similar to that in Figure 2. There are always some places attracting numerous people, while others are visited by a very small group of people. The popularity of a place has great impact on the strength of the links between the place and the attracted users. The more people visit a place, the more popular the place is and the weaker the links between the place and its users are, and vice versa. As the number of colocations rises, the proportion of popular colocations changes and the number of pairs of users who have colocations changes, which is the reason why the fluctuations occur.

The second topic we want to investigate is whether, as the number of cofriends increases, the probability of having colocations will rise. The result is shown in Figure 4. We can observe that when the number of cofriends increases, the probability of having colocations also increases, but there are also some fluctuations. The explanation is similar to the one above.

Figure 4: The relationship between probability of having colocations and the count of cofriends (x-axis: count of cofriends; y-axis: probability of having colocations).

All the static analysis results show that it is not proper to consider only the number of colocations or cofriends when predicting new links. Each member of the colocations and cofriends contains much information about the links and helps us to understand the existing links better. But how to translate the information into a form that can be used in link prediction remains a problem. In the next section, we will try to solve it.
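Curves such as those in Figures 2 and 3 are empirical conditional probabilities: among all user pairs sharing exactly $k$ nodes of some type, the fraction that are friends. A minimal sketch of that computation follows; the function name, the input layout, and the toy data are assumptions for illustration, and the all-pairs loop is only practical for small samples.

```python
from collections import defaultdict
from itertools import combinations


def friend_probability(users, shared_with, friends):
    """Return {k: P(friends | pair shares exactly k nodes)}.

    shared_with maps a user to the set of nodes of one type it links to
    (its friends for cofriends, its visited places for colocations).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for a, b in combinations(sorted(users), 2):
        k = len(shared_with.get(a, set()) & shared_with.get(b, set()))
        totals[k] += 1
        if b in friends.get(a, set()):
            hits[k] += 1
    return {k: hits[k] / totals[k] for k in totals}


# Hypothetical data: places visited by users 1-4, and one friendship (1, 2).
visited = {1: {10, 11}, 2: {10, 11}, 3: {10}, 4: set()}
friends = {1: {2}, 2: {1}}
print(friend_probability([1, 2, 3, 4], visited, friends))
```

Passing each user's friend set as `shared_with` gives the cofriend version of the same curve.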

5. Impact Detection

We try to study how user's friendship and their check-ins impact each other. Our main task is to predict the new edges (links) in the graph with nodes representing users and locations. It is well accepted that the more information we obtain and use when we try to predict new links, the higher precision we can get. However, due to the limits of storage space and time, it is not practical to store all information. Simply counting the number of colocations and cofriends is not a good approach, since each element of them also contains some factors that have an impact on the addition of new links. In this paper, we propose an effective way to handle the issue.

Every node in the global graph is connected with many nodes, and these nodes may be connected with many other nodes. Each node's popularity is determined by its connections with the rest of the nodes. We use a vector to represent each node. There are a very large number of nodes, including users and locations, in the global graph, but only a few nodes are connected with each other, which means that the global graph is very sparse. If we store all the edges in the graph without considering whether they really exist, most elements in every vector will be 0, meaning that the nodes are not connected, and most information in the vectors is meaningless. In fact, only the connected nodes and their edges make sense in link prediction. Since each edge stores the information about its weight, its occurrence times, and the two nodes it connects, a node's vector, which contains all the edges connected to the node, can store all the information that the node has. By doing so, the topology of the social network is stored with high storage space utilization, and it also ensures that all the nodes in the social network have the same status.

If the social network contains $N$ nodes in total, for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

$$v_i = v(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1}$$

where $e_{ij}$ ($i \neq j$) represents the edge existing between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. All the elements' initial values of weight are set to be 1, and the final value will be larger than 0 in every vector. Since all the weights are associated with the nodes' popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1. $e_j \cdot t$ means the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector's elements; then we calculate the sum value of the occurrence times of the edges linked to node $i$. According to our hypothesis, for any node, its popularity is proportional to the number $n$ of other nodes it is connected with and to the occurrence times of the edges that are linked to the node. In order to get a normalized weight for each node, the popularity entropy $pe_i$ for node $i$ is computed as follows:

$$pe_i = n \ast \frac{value}{Sum_E}, \tag{2}$$

where the sum $Sum_E$ of edges is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network,

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each element as the edge's weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes' popularity entropy as a factor that has an impact on the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linked nodes are to each other, so we use the sum of the occurrence times of their connections with their shared nodes to weigh the importance of the shared nodes for the two nodes.

As the values of $e_{ji} \cdot w$ and $e_{ij} \cdot w$ are the same, we can compute them at the same time. To keep the computation manageable and save both time and space, we let each node only compute the strengths of ties with the other nodes having a larger node ID than it does. For any node $i$, we check the elements in its vector one by one. For an element $e_{ji}$ in node $i$'s vector, we first check the ID $v_j \cdot id$ of the other node that node $i$ connects; if $v_j \cdot id$ is larger than $v_i \cdot id$, then we store the node's ID as $j$ and get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector's count $m$ and the sum $n$ of $e_{ip} \cdot t$ and $e_{jp} \cdot t$. According to our hypothesis, the occurrence times of edges linking to shared nodes are proportional to the weight of edge $e_{ij}$, while the popularity entropy of shared nodes is inversely proportional to the weight of that edge. So each shared node's contribution to the edge $e_{ij}$ is computed by the following formula:

$$C_{p-ij} = \mathrm{reverse}(pe_p) \ast \frac{n}{m}, \tag{3}$$

where $C_{p-ij}$ represents the shared node $p$'s contribution to the edge $e_{ij}$ and $\mathrm{reverse}(pe_p)$ represents the reverse value of $pe_p$, that is, its reciprocal. By computing the sum of all shared nodes' contributions, we get the value of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. Using the TF-IDF algorithm, we would first have to specify one kind of nodes as documents containing many words, and the others would be chosen to be words. However, users and locations have the same status in the social network; choosing either of them as the document would destroy the balanced status.

Input: a collection of nodes' initial vectors $V$, the sum of edges $Sum_E$
Output: a collection of nodes' popularity entropy $PE = \{pe\}$
$i \leftarrow 1$, $N \leftarrow |V|$
while ($i < N + 1$)
    $pe_i \leftarrow 0$, $n \leftarrow \|v_i\|$, $value \leftarrow 0$
    foreach $e_j$ in $v_i$
        $value \leftarrow value + e_j \cdot t$
    $pe_i \leftarrow n \ast value / Sum_E$
    $i \leftarrow i + 1$
return $PE$

Algorithm 1: Popularity entropy calculation.

Input: a collection of nodes' vectors $V$, a collection of nodes' popularity $PE$, the sum of edges $Sum_E$
Output: a collection of nodes' vectors $V = \{v\}$
$i \leftarrow 1$, $N \leftarrow |V|$, $Initial(V)$
while ($i < N + 1$)
    foreach $e_{ij}$ in $v_i$
        if $e_{ij} \cdot v_j > i$
            $j \leftarrow e_{ij} \cdot v_j$, $P \leftarrow CommonNodes(v_i, v_j)$, $e_{ij} \cdot w = e_{ji} \cdot w = 0$
            foreach $p$ in $P$
                $m \leftarrow \|v_p\|$, $n \leftarrow sum(e_{ip} \cdot t, e_{jp} \cdot t)$
                $e_{ji} \cdot w = e_{ij} \cdot w = e_{ij} \cdot w + \mathrm{reverse}(pe_p) \ast n / m$
    $i \leftarrow i + 1$
return $V$

Algorithm 2: Edge final weight calculation.
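Algorithms 1 and 2 can be sketched together in a few lines of Python. This is a reading of the pseudocode under stated assumptions rather than the authors' implementation: each node vector is a dict mapping a neighbour ID to the occurrence count $t$, `reverse` is taken to be the reciprocal, and $Sum_E$ is assumed to be the total of all occurrence counts (the text only calls it "the sum of edges"). All function names are hypothetical.

```python
def popularity_entropy(vectors, sum_e):
    """Algorithm 1: pe_i = n * value / Sum_E for every node i."""
    pe = {}
    for node, nbrs in vectors.items():
        n = len(nbrs)               # number of nodes linked to this node
        value = sum(nbrs.values())  # total occurrence count of its edges
        pe[node] = n * value / sum_e
    return pe


def edge_weights(vectors, pe):
    """Algorithm 2: weight of each existing edge from its shared nodes."""
    weights = {}
    for i, nbrs in vectors.items():
        for j in nbrs:
            if j <= i:
                continue            # each undirected edge is handled once
            w = 0.0
            for p in set(nbrs) & set(vectors[j]):    # shared nodes of i and j
                m = len(vectors[p])                  # size of p's vector
                n = vectors[i][p] + vectors[j][p]    # e_ip.t + e_jp.t
                w += (1.0 / pe[p]) * n / m           # contribution C_{p-ij}
            weights[(i, j)] = w
    return weights


# Hypothetical toy network: users 1-3 and location 4; values are counts t.
vectors = {
    1: {2: 1, 3: 1, 4: 3},
    2: {1: 1, 3: 1, 4: 2},
    3: {1: 1, 2: 1},
    4: {1: 3, 2: 2},
}
sum_e = sum(t for nbrs in vectors.values() for t in nbrs.values()) // 2  # assumed Sum_E
pe = popularity_entropy(vectors, sum_e)
weights = edge_weights(vectors, pe)
print(pe[1], weights[(1, 2)])
```

Here the edge between users 1 and 2 is supported by two shared nodes (user 3 and location 4) and ends up much heavier than the other edges. Under this literal reading, an existing edge whose endpoints share no node receives weight 0, because the weight is reset before the contributions are summed.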

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to do link prediction.
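For readers unfamiliar with random walk with restart, the sketch below shows the basic power-iteration form on a weighted undirected graph. It is not the fast method of [15] and not necessarily the exact setup used here: the restart probability of 0.15, the iteration count, and the function names are assumptions chosen for illustration.

```python
def random_walk_with_restart(weights, start, restart=0.15, iters=100):
    """Proximity of every node to `start` on a weighted undirected graph."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if start not in adj:
        raise ValueError("start node has no edges")
    score = {node: 0.0 for node in adj}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        nxt[start] = restart                 # jump back to the start node
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:                   # only zero-weight edges: return mass
                nxt[start] += (1 - restart) * score[u]
                continue
            for v, w in nbrs.items():        # move proportionally to edge weight
                nxt[v] += (1 - restart) * score[u] * w / total
        score = nxt
    return score


def predict_links(weights, start, k=5):
    """Rank nodes not yet linked to `start` by their proximity score."""
    score = random_walk_with_restart(weights, start)
    linked = {start}
    for u, v in weights:
        if u == start:
            linked.add(v)
        elif v == start:
            linked.add(u)
    candidates = [node for node in score if node not in linked]
    return sorted(candidates, key=score.get, reverse=True)[:k]


# Hypothetical weighted edges: node 4 hangs off the heavy edge (1, 2).
weights = {(1, 2): 4.0, (1, 3): 0.5, (2, 4): 1.0, (3, 5): 1.0}
print(predict_links(weights, start=1))
```

Because the walker follows heavy edges more often, a candidate reached through a strong tie is ranked above one reached through a weak tie, which is how the edge weights influence the prediction.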

6. Experimental Results

We divide the crawled dataset into two parts: the first part runs from July 2011 to February 2012 and the second part runs from February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We would like to use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. Going one step further, we also want to show that user's activities in the virtual world and physical world have an impact on each other, which can be shown in the result of link prediction.
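A time-based split of this kind can be sketched as follows. The record layout, the function name, and the exact cutoff day (1 February 2012) are assumptions for illustration; the text gives only the month.

```python
from datetime import date


def split_by_time(records, cutoff):
    """Split (u, v, day) edge records into training edges and new test links."""
    train, later = set(), set()
    for u, v, day in records:
        key = (min(u, v), max(u, v))          # undirected edge
        (train if day < cutoff else later).add(key)
    return train, later - train               # links first seen after the cutoff


# Hypothetical records; the pair (1, 2) reappears after the cutoff.
records = [
    (1, 2, date(2011, 8, 1)),
    (2, 3, date(2012, 3, 10)),
    (2, 1, date(2012, 4, 1)),
]
train, new_links = split_by_time(records, cutoff=date(2012, 2, 1))
print(train, new_links)
```

Only pairs that first appear after the cutoff count as new links, so a repeated check-in or re-observed friendship is not treated as something to predict.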

First, we would like to experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not really connected to each other. In other words, we suppose that any two nodes in the social network were connected to each other, and we calculate the weights of all edges even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

Figure 5: The relationship between probability of being friend-nodes and the weight of edges (x-axis: weight of edge; y-axis: probability of being friends).

Table 2: Contents of different graphs.

Feature | AW | UW | UUW | ULW
User's relation | * | * | * | —
User's tips | * | * | — | *
Weighted or not | * | — | * | *

From Figure 5, we can observe that, for any two nodes, the probability that they are connected with each other increases with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared with Figure 2, this result verifies our idea that nodes' relations are impacted not only by the number of shared nodes but also by the number of connected nodes each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is really necessary to take the existing edges' weights into account when performing link prediction, instead of counting the shared nodes only.

To analyze the impact that the physical world and virtual world have on each other, we conduct the experiments on different graphs, namely, the graph with all weighted edges (AW), the graph with all edges' weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where "*" means that the graph in the corresponding column contains the information in the corresponding row, while "—" means that it does not. For instance, UW contains the information of user relation and tips (i.e., check-ins).
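Given a dict of weighted edges, the four graphs in Table 2 can be derived by simple filtering. The sketch below assumes a hypothetical `locations` set that identifies which node IDs are venues; the function name and the data are illustrative only.

```python
def graph_variants(weights, locations):
    """Derive the AW, UW, UUW, and ULW edge sets from weighted edges."""
    def user_user(edge):
        return edge[0] not in locations and edge[1] not in locations

    def user_location(edge):
        return (edge[0] in locations) != (edge[1] in locations)

    return {
        "AW": dict(weights),                                            # all edges, weighted
        "UW": {e: 1.0 for e in weights},                                # all edges, weight 1
        "UUW": {e: w for e, w in weights.items() if user_user(e)},      # user-user only
        "ULW": {e: w for e, w in weights.items() if user_location(e)},  # user-location only
    }


# Hypothetical weighted edges: users 1 and 2, location 4.
variants = graph_variants({(1, 2): 4.0, (1, 4): 0.5, (2, 4): 0.7}, locations={4})
print({name: len(edges) for name, edges in variants.items()})
```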

We use the four graphs to show how different types of information have an impact on the result of link prediction. Comparing the results of AW and UW, we intend to see how the weighted information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user's activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use the ROC curve to demonstrate each graph's result, as shown in Figure 6.

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges' weights to be 1 (UW), graph with only user-user edges (UUW), and graph with only user-location edges (ULW)); x-axis: FPR; y-axis: TPR; curves: UW, AW, UUW, ULW, and base line.
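The ROC evaluation described above compares predicted scores for candidate links with the links that actually appear in the ground-truth graph. A minimal sketch of computing the curve and its area follows; the function names and toy scores are assumptions, and tied scores are not given special treatment.

```python
def roc_points(scores, positives):
    """Return (FPR, TPR) points obtained by sweeping a threshold over scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for c in ranked if c in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative candidate")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for candidate in ranked:
        if candidate in positives:
            tp += 1          # a predicted link that really appeared
        else:
            fp += 1          # a predicted link that did not appear
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical candidate links with scores; "a" and "c" appear in the test graph.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
points = roc_points(scores, positives={"a", "c"})
print(points, auc(points))
```

A curve on the diagonal (area 0.5) corresponds to the base line, so a graph whose curve lies further above the diagonal gives better predictions.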

From Figure 6, we can observe that the performance of the unweighted graph (UW) is the worst, while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge's weight is basically effective. Furthermore, UUW, containing the weighted user-user edges only, performs better than UW but much worse than AW, which shows that using both the information in the physical world and the virtual world in link prediction can provide much better results. This also shows that user's activities in the physical world have an impact on his/her virtual world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can conclude that user's activities in the virtual world can influence user's activities in the physical world.

7. Conclusion

In this paper, we present a study investigating how user's activities in the virtual world and physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node's popularity according to the number of nodes it connects and the occurrence times of the edges linked to it. Then we calculate each edge's weight, considering its linking nodes' popularity entropy and their shared nodes' features. In this way, we get a weighted graph and use it for link prediction. The results show that user's activities in the virtual world and physical world do have an impact on each other, and using both virtual world and physical world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether user's activities in the virtual and physical world have an impact on community dynamics. Real applications that employ the results will also be future work of this study.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.


3.1. Definitions

Definition 1 (node). A node $v$ refers to a user or a location that the network contains. Users and locations have the same status in this paper. Each node $v$ is associated with a unique ID $v \cdot \mathit{id}$, which starts from 1 and ends at the total number of nodes.

Definition 2 (edge). If two users, represented by node $j$ and node $k$ in the network, are friends, there is an edge $e_{jk}$ between them, showing that they are connected to each other. Similarly, if a user has visited a place, there is also an edge between them. Each edge $e_{jk}$ is undirected and associated with two terminal points, $e_{jk} \cdot v_j$ and $e_{jk} \cdot v_k$, referring to the two nodes (node $j$ and node $k$) it connects. $e_{jk} \cdot w$ is the weight of the edge, determined by the popularity entropy of the two nodes. Furthermore, a user can visit a place several times but can become friends with another user only once, which means that some edges may occur more than once. To record this feature, the last attribute of each edge $e_{jk}$ is $e_{jk} \cdot t$, recording the number of times the edge occurred. For edges connecting users, the value is 1, but for those connecting users and locations, the value may be larger than 1.
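As an illustration of Definition 2, the edge attributes can be held in a small record type. The following is a minimal Python sketch; the class name `Edge`, the helper `build_edges`, and the toy IDs are assumptions introduced here and are not part of the paper. Users and locations share one ID space, as in Definition 1.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One undirected edge e_jk as in Definition 2 (field names are illustrative)."""
    v_j: int          # ID of the first terminal node
    v_k: int          # ID of the second terminal node
    w: float = 1.0    # weight of the edge, initialised to 1
    t: int = 0        # number of times the edge occurred


def build_edges(friendships, checkins):
    """Collect edges from (user, user) pairs and (user, location) check-in events."""
    edges = {}

    def touch(a, b):
        key = (min(a, b), max(a, b))   # undirected: (j, k) and (k, j) are the same edge
        return edges.setdefault(key, Edge(*key))

    for a, b in friendships:
        touch(a, b).t = 1              # a friendship can occur only once
    for user, place in checkins:
        touch(user, place).t += 1      # repeated visits accumulate
    return edges


# Nodes 1 and 2 are users, node 3 is a location (hypothetical IDs).
edges = build_edges(friendships=[(1, 2), (2, 1)], checkins=[(1, 3), (1, 3), (2, 3)])
print(edges[(1, 3)].t)
```

Keying the dictionary by the ordered pair of IDs is one simple way to make sure that $e_{jk}$ and $e_{kj}$ are stored as a single edge.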

Definition 3 (global social graph). A global social graph $G(V, E)$ contains all the information and represents the structure of a whole location-based social network, containing two types of elements: node $v$ and edge $e$. It is an undirected graph.

Definition 4 (local social graph). Node $l$’s local social graph $G_l(V_l, E_l)$ may contain the complete friendships and all the visited places of a user, or all the users who have been to a particular place. All nodes in the local social graph $G_l$ are connected to node $l$ directly; that is, only one-hop connections exist in a local graph. The graph is also an undirected graph.

Definition 5 (cofriend). For any two friends $A$ and $B$ of a user, the user is a cofriend (cf) of them.

Definition 6 (colocation). If two users have been to the same place, the place is their colocation (cl). We do not require that the two users visit the place at the same time.

Definition 7 (popularity entropy). Popularity entropy (pe) is used to weigh a node’s popularity among all the nodes, and it has an impact on the strength of the links connected to the node.
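Definitions 5 and 6 both amount to intersecting the one-hop neighbourhoods of two users. A minimal Python sketch is given below; the function name `shared_nodes` and the toy identifiers are hypothetical and serve only as an illustration.

```python
def shared_nodes(neighbours, a, b):
    """Return the nodes that are directly linked to both a and b."""
    return neighbours[a] & neighbours[b]


# Toy graph: users u1..u3 and locations l1, l2 (all identifiers are made up).
friends = {"u1": {"u3"}, "u2": {"u3"}, "u3": {"u1", "u2"}}
visited = {"u1": {"l1", "l2"}, "u2": {"l2"}, "u3": set()}

cofriends = shared_nodes(friends, "u1", "u2")      # friends that u1 and u2 share
colocations = shared_nodes(visited, "u1", "u2")    # places both have visited, at any time
print(cofriends, colocations)
```

Because visit times are ignored, `l2` counts as a colocation of `u1` and `u2` even if they were never there together, which matches Definition 6.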

3.2. Motivations. Our motivation is simple and direct. Most existing research focuses on predicting links among friends’ friends or between friends and friends’ visited locations, all of which are nodes that are no more than 2 hops away from each other. However, we argue that new links within 2 hops are just a small part of all the newly added edges and that a large number of newly added edges occur between nodes whose distance is more than 2 hops. Figure 1 shows the proportion of the two kinds of newly added links occurring during the period from September 2011 to February 2012 in our dataset collected from Foursquare.

Figure 1: Distance of nodes that newly added edges link (x-axis: nodes distance, 2 hops versus N hops (N ≥ 3); y-axis: newly added edge count, 0 to 90000; bar labels: 489 and 83321).

We can see from Figure 1 that very few new links are added among nodes within 2-hop distance, while most new links are added among nodes that are farther away from each other. The result shows that if we pay attention only to the new links within 2 hops, much information will be lost and high precision of link prediction cannot be achieved.
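The counting behind Figure 1 can be sketched as a breadth-first search over the graph observed before the new edges appear. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the adjacency map is a toy example, and pairs that are unreachable from each other are simply skipped, a case the paper does not discuss.

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Breadth-first search; returns the number of hops from src to dst, or None."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def count_by_distance(adj, new_edges):
    """Split newly added edges into the two categories used in Figure 1."""
    counts = {"2 hops": 0, "N hops (N >= 3)": 0}
    for a, b in new_edges:
        d = hop_distance(adj, a, b)
        if d == 2:
            counts["2 hops"] += 1
        elif d is not None and d >= 3:
            counts["N hops (N >= 3)"] += 1
    return counts


# Path graph 1-2-3-4 observed before the new edges (1, 3) and (1, 4) appear.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
counts = count_by_distance(adj, [(1, 3), (1, 4)])
print(counts)
```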

It is well accepted that, for any two unconnected users, the more friends they share, the more likely they are to become friends in the near future. However, it is common sense that famous users, such as movie stars like Justin Bieber or dignitaries such as Obama, usually have very large social networks in online social networks. For any two users most of whose friends are such famous users, is there a large probability that they will become friends in the future? Probably not. We argue that, besides the number of common friends any two users have, the scale of the common friends’ social networks also has an impact on forming a new link between two strangers.

The last motivation is that, since online social networks have become a part of life, we want to analyse the interaction between our daily life and the online social network. To explain it in detail, we divide a user’s activities into two parts: the first part is the user’s activities in the offline world, for example, moving to a new city or meeting new friends, while the other part is the user’s online activities, such as adding new friends or commenting on friends’ photos. What kind of influence these two kinds of activities have on each other is largely unexplored.

Before further introducing our work, we introduce two basic hypotheses in this paper as follows.

Hypothesis 1. The larger the overlap of any two users’ relationships in the virtual world is, the more likely they are to have a larger overlap of check-ins in the physical world.

Hypothesis 2. The larger the overlap of any two users’ check-ins in the physical world is, the more likely they are to have a larger overlap of relationships in the virtual world.


Table 1: Amount of newly added edges in different groups.

Group   Amount of shared friends   Amount of shared friends’ friends   Amount of new added edges
1       1033                       2574                                141
2       1156                       1479                                375

Figure 2: The relationship between the probability of being friends and the amount of cofriends (x-axis: count of cofriends, 0 to 40; y-axis: probability of being friends, 0 to 1).

These two hypotheses come from the goal of exploring the impact of a user’s activities in the physical and virtual worlds on each other. Our experimental results will show that these two hypotheses are reasonable and consistent with what is observed in reality.

4. Observations

We use a dataset crawled from Foursquare. The dataset contains 31524 users, 51265 venues (locations), and users’ tips about these venues from July 2011 to May 2012. In this paper, we take a user’s check-ins extracted from tips (time and geographic information) as the user’s activity in the physical world and the evolution of the user’s friendships as the user’s activity in the virtual world. Here we present the statistical characteristics of the data collected from July 2011 to February 2012.

Firstly, we analyze how the number of shared friends influences users’ social ties. Figure 2 shows the relationship between the number of shared friends that any two users have and their probability of being friends.

In Figure 2, we only show the records of samples with a relatively large scale. It is easy to see that the probability of being friends generally rises as the number of shared friends grows. However, it is also obvious that the curve fluctuates as the count increases and that the probability does not grow monotonically. How does this happen? Does the probability not grow as the number of common friends rises? As stated in our motivation, it is not always true. To verify the idea, we choose two groups of users. In Group 1, the shared friends of any two users are mostly users who own a relatively large social network, while in Group 2 the shared friends have a relatively small social network. We analyse the average number of new links added among these users from September 2011 to February 2012 in the different groups, and the results are clearly different, as shown in Table 1.
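The kind of statistic plotted in Figure 2 can be sketched as follows: for every pair of users, count their cofriends and record whether the pair is itself linked. This Python sketch is only an illustration; the function name and the toy friendship lists are assumptions, and enumerating all pairs as done here would need sampling or indexing on a dataset of realistic size.

```python
from collections import defaultdict
from itertools import combinations


def friend_probability_by_cofriends(friends):
    """Map cofriend count -> fraction of user pairs with that count who are friends."""
    pairs = defaultdict(int)      # number of user pairs per cofriend count
    linked = defaultdict(int)     # how many of those pairs are friends
    for a, b in combinations(sorted(friends), 2):
        k = len(friends[a] & friends[b])
        pairs[k] += 1
        if b in friends[a]:
            linked[k] += 1
    return {k: linked[k] / pairs[k] for k in sorted(pairs)}


# Toy symmetric friendship lists; real data would come from the crawl.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
probability = friend_probability_by_cofriends(friends)
print(probability)
```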

It is not surprising to find that there are only very few new links added in the first group but many more in the second group. As mentioned in our motivation, users with a large number of friends are usually famous people in the social network, and anyone who is interested in them can add them as friends. As a result, the links between a famous user and their friends are quite weak and cannot be used as support for predicting new links. For the less popular users, however, since their social networks are relatively small, their friends are closely connected with them, and these links provide strong support for predicting new links.

Figure 3: The relationship between the probability of being friends and the amount of colocations (x-axis: count of colocations, 1 to 8; y-axis: probability of being friends, 0 to 0.2).

Since we take users’ check-ins as their activities in the physical world and the evolution of their friendships as their activities in the virtual world, and our goal is to find whether a user’s activities in the virtual and physical worlds have an impact on each other, we have to investigate two topics. The first is the probability that two users are friends given that they have $N$ colocations.

From Figure 3, we can observe that as the number of colocations of two users rises, the probability of being friends clearly increases. However, we can again observe some fluctuations. The reason is similar to that in Figure 2. There are always some places attracting numerous people, while others are visited by a very small group of people. The popularity of a place has a great impact on the strength of the links between the place and the users it attracts: the more people visit a place, the more popular the place is and the weaker the links between the place and its users are, and vice versa. As the number of colocations rises, the proportion of popular colocations changes and the number of user pairs who have colocations changes, which is why the fluctuations occur.

The second topic we want to investigate is whether the probability of having colocations rises as the number of cofriends increases. The result is shown in Figure 4. We can observe that when the number of cofriends increases, the probability of having colocations also increases, but there are again some fluctuations. The explanation is similar to the one above.

Figure 4: The relationship between the probability of having colocations and the count of cofriends (x-axis: count of cofriends, 0 to 50; y-axis: probability of having colocations, 0 to 0.2).

All the statistical analysis results show that it is not proper to consider only the number of colocations or cofriends when predicting new links. Each member of the colocations and cofriends carries much information about the links and helps us to understand the existing links better. However, how to translate this information into a form that can be used in link prediction remains a problem. In the next section, we try to solve it.

5. Impact Detection

We study how users’ friendships and their check-ins impact each other. Our main task is to predict the new edges (links) in the graph whose nodes represent users and locations. It is well accepted that the more information we obtain and use when predicting new links, the higher the precision we can get. However, due to limits on storage space and time, it is not practical to store all information. Simply counting the number of colocations and cofriends is not a good approach either, since each of these elements also carries factors that have an impact on the addition of new links. In this paper, we propose an effective way to handle this issue.

Every node in the global graph is connected with many nodes, and these nodes may in turn be connected with many other nodes. Each node’s popularity is determined by its connections with the rest of the nodes. We use a vector to represent each node. There are a very large number of nodes, including users and locations, in the global graph, but only a few nodes are connected with each other, which means that the global graph is very sparse. If we stored an entry for every possible edge in the graph regardless of whether it really exists, most elements in every vector would be 0, meaning that the nodes are not connected, and most information in the vectors would be meaningless. In fact, only the connected nodes and their edges make sense in link prediction. Since each edge stores its weight, its occurrence times, and the two nodes it connects, a node’s vector that contains all the edges connected to the node can store all the information that the node has. By doing so, the topology of the social network is stored with high storage space utilization, and it also ensures that all the nodes in the social network have the same status.

If the social network contains $N$ nodes in total, then for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

$$v_i = v\left(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}\right), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1}$$

where $e_{ij}$ ($i \neq j$) represents the edge between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. The initial weight of every element is set to 1, and the final value is larger than 0 in every vector. Since all the weights are associated with the nodes’ popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1, where $e_j \cdot t$ denotes the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector’s elements; then we calculate the sum, $\mathit{value}$, of the occurrence times of the edges linked to node $i$. According to our hypothesis, the popularity of any node is proportional to the number $n$ of other nodes it is connected with and to the occurrence times of the edges linked to it. In order to get a normalized weight for each node, the popularity entropy $\mathit{pe}_i$ for node $i$ is computed as follows:

$$\mathit{pe}_i = n \ast \frac{\mathit{value}}{\mathit{Sum}_E}, \tag{2}$$

where the sum of edges $\mathit{Sum}_E$ is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network,

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each element as the edge’s weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes’ popularity entropy as a factor that affects the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linked nodes are to each other; so we use the sum of the occurrence times of the edges connecting the two nodes with their shared nodes to weigh the importance of the shared nodes for the two nodes.
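Both the popularity entropy of Equation (2) and the principles listed above operate on the node vectors. As a minimal sketch of the first step, the Python function below computes $\mathit{pe}_i$ for every node. The dictionary layout of `vectors` and the toy IDs are assumptions, and $\mathit{Sum}_E$ is taken here to be the number of distinct edges, which is one possible reading of "the sum of edges".

```python
def popularity_entropy(vectors, sum_e):
    """pe_i = n * value / Sum_E, with n = number of edges of node i and
    value = total occurrence times of those edges (Equation (2))."""
    pe = {}
    for node, edges in vectors.items():
        n = len(edges)
        value = sum(t for _neighbour, t in edges)
        pe[node] = n * value / sum_e
    return pe


# node -> list of (neighbour, occurrence times t); IDs are illustrative.
vectors = {
    1: [(2, 1), (3, 2)],   # user 1: friend of user 2, visited place 3 twice
    2: [(1, 1), (3, 1)],   # user 2: friend of user 1, visited place 3 once
    3: [(1, 2), (2, 1)],   # place 3
}
sum_e = 3                  # assumed here to be the number of distinct edges
pe = popularity_entropy(vectors, sum_e)
print(pe)
```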

Input: a collection of nodes’ initial vectors $V$; the sum of edges $\mathit{Sum}_E$
Output: a collection of nodes’ popularity entropy $\mathit{PE} = \{\mathit{pe}\}$
(1) $i \leftarrow 1$, $N \leftarrow |V|$
(2) while ($i < N + 1$)
(3)     $\mathit{pe}_i \leftarrow 0$, $n \leftarrow \|v_i\|$, $\mathit{value} \leftarrow 0$
(4)     foreach $e_j$ in $v_i$
(5)         $\mathit{value} = \mathit{value} + e_j \cdot t$
(6)         $j \leftarrow j + 1$
(7)     $\mathit{pe}_i = n \ast \mathit{value} / \mathit{Sum}_E$
(8)     $i \leftarrow i + 1$
(9) return $\mathit{PE}$

Algorithm 1: Popularity entropy calculation.

Input: a collection of nodes’ vectors $V$; a collection of nodes’ popularity $\mathit{PE}$; the sum of edges $\mathit{Sum}_E$
Output: a collection of nodes’ vectors $V = \{v\}$
(1) $i \leftarrow 1$, $N \leftarrow |V_{\mathit{ini}}|$, $\mathit{Initial}(V)$
(2) while ($i < N + 1$)
(3)     foreach $e_{ij}$ in $v_i$
(4)         if $e_{ij} \cdot v_j > i$
(5)             $j \leftarrow e_{ij} \cdot v_j$, $P \leftarrow \mathit{CommonNodes}(v_i, v_j)$, $e_{ij} \cdot w = e_{ji} \cdot w = 0$
(6)             foreach $p$ in $P$
(7)                 $m \leftarrow \|v_p\|$, $n \leftarrow \mathit{sum}(e_{ip} \cdot t, e_{jp} \cdot t)$
(8)                 $e_{ji} \cdot w = e_{ij} \cdot w = e_{ij} \cdot w + \mathit{reverse}(\mathit{pe}_p) \ast n / m$
(9)     $i \leftarrow i + 1$
(10) return $V$

Algorithm 2: Edge final weight calculation.

As the values of $e_{ji} \cdot w$ and $e_{ij} \cdot w$ are the same, we can compute them at the same time. To keep the computation controllable and save both time and space, we let each node compute only the strengths of its ties with nodes that have a larger node ID than it does. For any node $i$, we check the elements in its vector one by one. For an element $e_{ij}$ in node $i$’s vector, we first check the ID $v_j \cdot \mathit{id}$ of the other node that node $i$ connects to; if $v_j \cdot \mathit{id}$ is larger than $v_i \cdot \mathit{id}$, we store that node’s ID as $j$ and get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector’s count $m$ and the sum $n$ of $e_{ip} \cdot t$ and $e_{jp} \cdot t$. According to our hypothesis, the weight of edge $e_{ij}$ is proportional to the occurrence times of the edges linking to the shared nodes and inversely proportional to the popularity entropy of the shared nodes. So each shared node’s contribution to the edge $e_{ij}$ is computed by the following formula:

$$C_{p-ij} = \mathit{reverse}(\mathit{pe}_p) \ast \frac{n}{m}. \tag{3}$$

$C_{p-ij}$ represents the shared node $p$’s contribution to the edge $e_{ij}$, and $\mathit{reverse}(\mathit{pe}_p)$ represents the reverse value (that is, the reciprocal) of $\mathit{pe}_p$. By computing the sum of all shared nodes’ contributions, we get the value of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. To use the TF-IDF algorithm, we would first have to specify one kind of node as documents containing many words and choose the other kind as words. However, users and locations have the same status in the social network; choosing either of them as the document would destroy this balanced status.
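The edge weight computation of Algorithm 2 and Equation (3) can be sketched in Python as below. This is an illustrative reading, not the authors’ implementation: `reverse(pe_p)` is taken to be the reciprocal $1/\mathit{pe}_p$, the dictionary layout is hypothetical, and the popularity entropy values are supplied directly so that the block is self-contained.

```python
def edge_weights(vectors, pe):
    """For each existing edge (i, j), sum over shared nodes p of (1 / pe_p) * n / m,
    where n = e_ip.t + e_jp.t and m = number of edges of p (Equation (3))."""
    t = {i: dict(edges) for i, edges in vectors.items()}   # t[i][p] = e_ip.t
    weights = {}
    for i in t:
        for j in t[i]:
            if j > i:                                   # handle each undirected edge once
                w = 0.0
                for p in t[i].keys() & t[j].keys():     # shared nodes of i and j
                    n = t[i][p] + t[j][p]
                    m = len(t[p])
                    w += (1.0 / pe[p]) * n / m
                weights[(i, j)] = w
    return weights


# node -> list of (neighbour, occurrence times t); IDs and values are illustrative.
vectors = {
    1: [(2, 1), (3, 2)],
    2: [(1, 1), (3, 1)],
    3: [(1, 2), (2, 1)],
}
pe = {1: 2.0, 2: 4 / 3, 3: 2.0}    # popularity entropy of each node, assumed given
weights = edge_weights(vectors, pe)
print(weights)
```

In this literal reading, an edge whose endpoints share no node ends with weight 0; how that case is reconciled with the statement that every final weight is larger than 0 is not specified in the text.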

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to perform link prediction.
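Random walk with restart on a weighted graph can be sketched with a simple power iteration. The Python code below is a generic illustration and not the fast method of [15]; the restart probability of 0.15, the iteration count, and the toy weights are assumptions, and all edge weights are assumed to be positive.

```python
def random_walk_with_restart(weights, start, restart=0.15, iterations=100):
    """Power iteration for r = (1 - restart) * P r + restart * e_start, where P moves
    from a node to a neighbour with probability proportional to the edge weight."""
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    out = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    r = {u: 0.0 for u in adj}
    r[start] = 1.0
    for _ in range(iterations):
        nxt = {u: 0.0 for u in adj}
        for u, nbrs in adj.items():
            for v, w in nbrs.items():
                nxt[v] += (1.0 - restart) * r[u] * w / out[u]
        nxt[start] += restart          # jump back to the start node
        r = nxt
    return r


# Toy weighted graph; node 3 is the query node and is linked only to node 2.
weights = {(1, 2): 1.0, (2, 3): 1.0, (1, 4): 0.2}
scores = random_walk_with_restart(weights, start=3)
candidates = [u for u in scores if u not in (2, 3)]     # nodes not yet linked to node 3
ranking = sorted(candidates, key=scores.get, reverse=True)
print(ranking)
```

Candidates with higher scores are considered more likely to form a new link with the query node.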

6. Experimental Results

We divide the crawled dataset into two parts: the first part spans July 2011 to February 2012, and the second part spans February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. Going one step further, we also want to show that a user’s activities in the virtual world and the physical world have an impact on each other, which can be seen in the results of link prediction.

First, we experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not actually connected to each other. In other words, we suppose that any two nodes in the social network are connected to each other, and we calculate the weights of all edges even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

Figure 5: The relationship between the probability of being friend-nodes and the weight of edges (x-axis: weight of edge, 0 to 120; y-axis: probability of being friends, 0 to 0.6).

Table 2: Contents of different graphs.

Feature            AW   UW   UUW   ULW
User’s relation    *    *    *     -
User’s tips        *    *    -     *
Weighted or not    *    -    *     *

From Figure 5, we can observe that for any two nodes, the probability that they are connected with each other varies directly with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared with Figure 2, this result supports our idea that nodes’ relations are affected not only by the number of shared nodes but also by the number of connected nodes each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is necessary to take the existing edges’ weights into account when performing link prediction instead of only counting the number of shared nodes.

To analyze the impact that the physical world and virtual world have on each other, we conduct experiments on different graphs: the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the corresponding row, while “-” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins).

We use the four graphs to show how different types of information affect the result of link prediction. By comparing the results of AW and UW, we intend to see how the weight information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use ROC curves to demonstrate each graph’s result, as shown in Figure 6.

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW)). Axes: TPR versus FPR, each from 0 to 1; curves: UW, AW, UUW, ULW, and a baseline.
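An ROC curve such as those in Figure 6 plots the true positive rate (TPR) against the false positive rate (FPR) while the score threshold is lowered. The Python sketch below shows one way to obtain the curve points from prediction scores; the function name, the candidate pairs, and the scores are hypothetical, and tied scores are not treated specially.

```python
def roc_points(scores, positives):
    """Return (FPR, TPR) points obtained by lowering the threshold one candidate at a time."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for pair in ranked if pair in positives)
    n_neg = len(ranked) - n_pos        # both classes must be present
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pair in ranked:
        if pair in positives:
            tp += 1                    # a predicted link that appears in the ground truth
        else:
            fp += 1                    # a predicted link that does not appear
        points.append((fp / n_neg, tp / n_pos))
    return points


# Candidate links with prediction scores, and the links that actually appeared later.
scores = {("u1", "u2"): 0.9, ("u1", "l3"): 0.7, ("u2", "u4"): 0.4, ("u3", "l1"): 0.1}
positives = {("u1", "u2"), ("u2", "u4")}
points = roc_points(scores, positives)
print(points)
```

A curve that stays closer to the top-left corner than the diagonal baseline indicates better link prediction.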

From Figure 6 we can observe that the performance ofthe unweight graph (UW) is the worst while the weightedgraph with all kinds of nodes and edges (AW) performs thebest which means that our method of calculating each edgersquosweight is basically effective Furthermore UUW containingthe weighted user-user edge only performs better than UWbut much worse than AW which shows that using boththe information in physical world and virtual world in linkprediction can provide much better results This also provesthat userrsquos activities in physical world have impact on hishervirtual world activities On the other hand we can drawa conclusion that userrsquos activities in the virtual world caninfluence userrsquos activities in the physical world by comparingthe performance of AW and ULW (the graph with onlyweighted user-location edges)

7 Conclusion

In this paper we present a study investigating how userrsquosactivities in virtual world and physical world impact eachother We use information vector to represent nodes and tostore all the edges that are linked to every node We definepopularity entropy to weigh each nodersquos popularity accordingto the amount of nodes it connects and the occurrence timesof edges linked to it Then we calculate each edgersquos weightconsidering its linking nodesrsquo popularity entropy and theirshared nodesrsquo features In this way we get a weighted graph

International Journal of Distributed Sensor Networks 9

and use it for link prediction The results show that userrsquosactivities in virtual world and physical world do have impacton each other andusing both virtualworld andphysical worldinformation can improve the accuracy of link prediction

In the future we plan to detect and analyze communitiesformed by the nodes of online social network to see whetheruserrsquos activities in virtual and physical world have impacton community dynamics Real applications that employ theresults will also be a future work of this study

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was partially supported by the National BasicResearch Program of China (no 2012CB316400) theNational Natural Science Foundation of China (nos61103063 61222209 61373119 and 61332005) the SpecializedResearch Fund for the Doctoral Program of HigherEducation (no 20126102110043) andMicrosoftCorporation

References

[1] D Zhang B Guo and Z Yu ldquoThe emergence of social andcommunity intelligencerdquo IEEE Computer vol 44 no 7 pp 21ndash28 2011

[2] M E J Newman ldquoThe structure and function of complexnetworksrdquo SIAM Review vol 45 no 2 pp 167ndash256 2003

[3] M McPherson L Smith-Lovin and J M Cook ldquoBirds of a fea-ther homophily in social networksrdquo Annual Review of Sociol-ogy vol 27 pp 415ndash444 2001

[4] D Liben-Nowell and J Kleinberg ldquoThe link-prediction prob-lem for social networksrdquo Journal of the American Society forInformation Science and Technology vol 58 no 7 pp 1019ndash10312007

[5] Q Li Y Zheng X Xie Y Chen W Liu andW-Y Ma ldquoMininguser similarity based on location historyrdquo in Proceedings of the16th ACM SIGSPATIAL International Conference on Advancesin Geographic Information Systems (GIS rsquo08) pp 298ndash307 ACMPress November 2008

[6] E Cho S A Myers and J Leskovec ldquoFriendship and mobilityuser movement in location-based social networksrdquo in Proceed-ings of the 17th ACM SIGKDD International Conference onKnowledgeDiscovery andDataMining (KDD rsquo11) pp 1082ndash1090ACM Press August 2011

[7] D Lian and X Xie ldquoLearning location naming from usercheck-in historiesrdquo in Proceedings of the 19th ACM SIGSPATIALInternational Conference onAdvances inGeographic InformationSystems (GIS rsquo11) pp 112ndash121 ACM Press November 2011

[8] R Xiang J Neville and M Rogati ldquoModeling relationshipstrength in online social networksrdquo in Proceedings of the 19thInternationalWorldWideWeb Conference (WWW rsquo10) pp 981ndash990 ACM Press April 2010

[9] J Tang T Lou and J Kleinberg ldquoInferring social ties acrossheterogeneous networksrdquo in Proceedings of the 5th ACM Inter-national Conference on Web Search and Data Mining (WSDMrsquo12) pp 743ndash752 ACM Press February 2012

[10] M Roth D Deutscher G Flysher I Horn and A LeichtbergldquoFriends using the implicit social graphrdquo in Proceedings ofthe ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining (KDD rsquo10) pp 233ndash242 ACM PressJuly 2011

[11] J Cranshaw E Toch J Hong A Kittur andN Sadeh ldquoBridgingthe gap between physical location and online social networksrdquoinProceedings of the 12th International Conference onUbiquitousComputing (UbiComp rsquo10) pp 119ndash128 ACM Press October2010

[12] D Wang D Pedreschi C Song F Giannotti and A-LBarabasi ldquoHuman mobility social ties and link predictionrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1100ndash1108 ACM Press August 2011

[13] S Scellato A Noulas and C Mascolo ldquoExploiting place fea-tures in link prediction on location-based social networksrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1046ndash1054 ACM Press August 2011

[14] G Salton and C Buckley ldquoTerm-weighting approaches in auto-matic text retrievalrdquo Information Processing and Managementvol 24 no 5 pp 513ndash523 1988

[15] H Tong C Faloutsos and J-Y Pan ldquoFast random walk withrestart and its applicationsrdquo in Proceedings of the 6th Interna-tional Conference on Data Mining (ICDM rsquo06) pp 613ndash622IEEE Press December 2006

International Journal of Distributed Sensor Networks 5

Table 1: Amount of newly added edges in different groups.

Group   Amount of shared friends   Amount of shared friends’ friends   Amount of newly added edges
1       1033                       2574                                141
2       1156                       1479                                375

Figure 2: The relationship between the probability of being friends and the amount of cofriends (x-axis: count of cofriends; y-axis: probability of being friends).

These two hypotheses come from the goal of exploring the impact of user’s activities in the physical and virtual worlds on each other. Our further experimental results will show that these two hypotheses are reasonable and match what is observed in reality.

4. Observations

We use a dataset crawled from Foursquare. The dataset contains 31,524 users, 51,265 venues (locations), and users’ tips about these venues from July 2011 to May 2012. In this paper, we take user’s check-ins extracted from tips (time and geographic information) as user’s activity in the physical world and the evolution of user’s friendship as user’s activity in the virtual world. We here present the statistical characteristics of the data collected from July 2011 to February 2012.

Firstly, we analyze how the amount of shared friends influences users’ social ties. Figure 2 shows the relationship between the amount of shared friends that any two users have and their probability of being friends.
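The statistic behind such a curve can be sketched in a few lines. The following is only an illustration under our own assumptions: the function name, the input layout (a list of user ids and a list of undirected friendship pairs), and the exhaustive enumeration of user pairs are not taken from the experiments, and the enumeration is practical only for small samples.

```python
from collections import defaultdict
from itertools import combinations


def friend_probability_by_cofriends(users, friendships):
    """Return {k: share of user pairs with k shared friends that are friends}.

    `users` is an iterable of user ids and `friendships` an iterable of
    undirected (u, v) pairs. Both names are illustrative assumptions.
    """
    neighbors = defaultdict(set)
    for u, v in friendships:
        neighbors[u].add(v)
        neighbors[v].add(u)

    pairs = defaultdict(int)    # k -> number of user pairs sharing k friends
    friends = defaultdict(int)  # k -> how many of those pairs are friends
    for u, v in combinations(sorted(users), 2):
        k = len(neighbors[u] & neighbors[v])
        pairs[k] += 1
        if v in neighbors[u]:
            friends[k] += 1
    return {k: friends[k] / pairs[k] for k in pairs}


sample = friend_probability_by_cofriends(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")],
)
```

Plotting the returned mapping over k gives a curve of the same kind as Figure 2; on a real dataset one would restrict the enumeration to pairs that share at least one friend.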

In Figure 2, we only show the records of samples with a relatively larger scale. It is easy to find that the probability of being friends generally rises as the amount of shared friends grows. However, it is also obvious that the curve fluctuates as the count increases and the probability does not grow monotonically. Why does this happen? Does the probability not grow as the amount of common friends rises? Just as stated in our motivation, it is not always true. To verify the idea, we choose two groups of users. In Group 1, the shared friends of any two users are mostly users who own a relatively larger social network, while in Group 2 the shared friends have a relatively smaller social network. We analyse the average amount of new links added among these users from September 2011 to February 2012 in the different groups, and the results are clearly different, as shown in Table 1.

It is not surprising to find that there are only very few new links added in the first group but many more in the second group. Just as mentioned in our motivation, users with a large amount of friends are usually some kind of famous people in the social network, and anyone who is interested in them can add them as friends. As a result, the links between the famous user and their friends are quite weak and cannot be used as support for predicting new links. But for the less popular users, since their social networks are relatively smaller, their friends are closely connected with them and the links are strong support for predicting new links.

Figure 3: The relationship between the probability of being friends and the amount of colocations (x-axis: count of colocations; y-axis: probability of being friends).

Since we take user’s check-ins as users’ activities in the physical world and the evolution of their friendships as activities in the virtual world, and our goal is to find whether user’s activities in the virtual and physical worlds have impact on each other, we have to investigate two topics. The first one is, if two users have $N$ colocations, what the probability of the two users being friends is.
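As a hedged illustration of this first topic, the sketch below counts colocations per user pair and then groups pairs by that count. The record layout `(user, venue)`, the function names, and the choice to count repeated visits to one venue only once are our assumptions for the example; pairs with no colocation at all are simply not enumerated.

```python
from collections import defaultdict
from itertools import combinations


def colocation_counts(checkins):
    """Map each unordered user pair to the number of venues both visited.

    `checkins` is an iterable of (user, venue) records, for example
    extracted from tips; repeated visits to one venue count once.
    """
    visitors = defaultdict(set)
    for user, venue in checkins:
        visitors[venue].add(user)

    counts = defaultdict(int)
    for users in visitors.values():
        for u, v in combinations(sorted(users), 2):
            counts[(u, v)] += 1
    return dict(counts)


def friend_probability_by_colocations(counts, friendships):
    """Return {N: share of pairs with N colocations that are friends}."""
    friends = {tuple(sorted(pair)) for pair in friendships}
    total = defaultdict(int)
    hits = defaultdict(int)
    for pair, n in counts.items():
        total[n] += 1
        hits[n] += pair in friends
    return {n: hits[n] / total[n] for n in total}
```

Because the pairs are generated venue by venue, a very popular venue produces a large number of pairs, which is one practical reason to treat popular places separately.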

From Figure 3, we can observe that as the number of colocations of two users rises, the probability of being friends increases obviously. However, we can clearly observe some fluctuations again. The reason why they occur is similar to that in Figure 2. There are always some places attracting numerous people while some others are visited by a very small group of people. The popularity of a place has great impact on the strength of the links between the place and the attracted users. The more people visit a place, the more popular the place is and the weaker the links between the place and its users are, and vice versa. As the number of colocations rises, the proportion of popular colocations changes and the amount of pairs of users who have colocations changes, which is the reason why the fluctuations occur.

The second topic we want to investigate is whether the probability of having colocations rises as the number of cofriends increases. The result is shown in Figure 4.

Figure 4: The relationship between the probability of having colocations and the count of cofriends (x-axis: count of cofriends; y-axis: probability of having colocations).

We can observe that when the number of cofriends increases, the probability of having colocations also increases, but there are also some fluctuations. The explanation is similar to the one above.

All the statistical analysis results show that it is not proper to consider only the amount of colocations or cofriends when predicting new links. Each member of the colocations and cofriends contains much information about the links and helps us to understand the existing links better. But how to translate the information into a form that can be used in link prediction remains a problem. In the next section, we will try to solve it.

5. Impact Detection

We try to study how user’s friendships and their check-ins impact each other. Our main task is to predict the new edges (links) in the graph with nodes representing users and locations. It is well accepted that the more information we obtain and use when we try to predict new links, the higher precision we can get. However, due to the limits of storage space and time, it is not practical to store all information. Nor is it a good way to simply count the number of colocations and cofriends, since each element of them also contains some factors that have impact on the addition of new links. In this paper, we propose an effective way to handle the issue.

Every node in the global graph is connected with many nodes, and these nodes may be connected with many other nodes. Each node’s popularity is determined by its connection with the rest of the nodes. We use a vector to represent each node. There are a very large number of nodes, including users and locations, in the global graph, but only a few nodes are connected with each other, which means that the global graph is very sparse. If we store all the edges in the graph without considering whether they really exist, most elements in every vector will be 0, meaning that the nodes are not connected, and most information in the vectors is meaningless. In fact, only the connected nodes and their edges make sense in link prediction. Since each edge stores the information about its weight, its occurrence times, and the two nodes it connects, a node’s vector, which contains all the edges connected to the node, can store all the information that the node has. By doing so, the topology of the social network is stored with high storage space utilization, and it also ensures that all the nodes in the social network have the same status.
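One possible in-memory layout for such node vectors is sketched below. The class and field names (`Edge`, `NodeVector`, `times`, and so on) are hypothetical and chosen only for this illustration; the point is that only existing edges are stored and that users and locations are handled by the same structure.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One undirected edge as seen from the node that owns it."""
    other: str           # id of the node at the far end
    weight: float = 1.0  # initial weight, to be replaced by the final weight
    times: int = 1       # how often the edge occurs in the dataset


@dataclass
class NodeVector:
    """A node plus only the edges that actually exist (sparse storage)."""
    node_id: str
    edges: dict = field(default_factory=dict)  # other node id -> Edge


def build_vectors(records):
    """Build node vectors from (node_a, node_b) observations.

    Users and locations are treated alike; a repeated observation
    increases the occurrence count of the edge on both endpoints.
    """
    vectors = {}
    for a, b in records:
        for src, dst in ((a, b), (b, a)):
            vec = vectors.setdefault(src, NodeVector(src))
            if dst in vec.edges:
                vec.edges[dst].times += 1
            else:
                vec.edges[dst] = Edge(other=dst)
    return vectors
```

Storing each undirected edge on both endpoints doubles the edge records but keeps every node’s vector self-contained, which matches the requirement that all nodes have the same status.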

If the social network contains $N$ nodes in total, for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

\[ v_i = v(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1} \]

where $e_{ij}$ ($i \neq j$) represents the edge existing between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. All the elements’ initial values of weight are set to be 1 and the final value will be larger than 0 in every vector. Since all the weights are associated with the nodes’ popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1. $e_j \cdot t$ means the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector’s elements; then we calculate the sum $\mathit{value}$ of the occurrence times of the edges linked to node $i$. According to our hypothesis, for any node, its popularity is proportional to the amount $n$ of other nodes it is connected with and the occurrence times of edges that are linked to the node. In order to get a normalized weight for each node, the popularity entropy $pe_i$ for node $i$ is computed as follows:

\[ pe_i = n \times \frac{\mathit{value}}{\mathit{Sum}_E}, \tag{2} \]

where the sum $\mathit{Sum}_E$ of edges is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network,

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each element as the edge’s weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes’ popularity entropy as a factor that has impact on the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linking nodes are for each other; so we use the sum of the occurrence times they connected with their shared nodes to weigh the importance of the shared nodes for the two nodes.
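Since the ideas above rely on the popularity entropy of the shared nodes, a minimal sketch of (2) may help. It assumes a plain dictionary layout `times[i][j]` for the occurrence count of edge $e_{ij}$, and it assumes that $\mathit{Sum}_E$ is the total occurrence count over all undirected edges; the text only calls it "the sum of edges", so this reading is an assumption, as are all function names.

```python
def total_occurrences(times):
    """Sum of occurrence counts over all undirected edges (assumed Sum_E).

    Every edge is stored on both endpoints, hence the division by 2.
    """
    return sum(sum(edges.values()) for edges in times.values()) / 2


def popularity_entropy(times, sum_e):
    """Compute pe_i = n * value / Sum_E for every node i.

    `times[i][j]` holds the occurrence count of edge e_ij, stored
    symmetrically; `sum_e` is the normalizing sum of edges.
    """
    pe = {}
    for node, edges in times.items():
        n = len(edges)               # number of connected nodes
        value = sum(edges.values())  # total occurrence times of its edges
        pe[node] = n * value / sum_e
    return pe


times = {
    "u1": {"u2": 1, "loc1": 2},
    "u2": {"u1": 1, "loc1": 1},
    "loc1": {"u1": 2, "u2": 1},
}
pe = popularity_entropy(times, total_occurrences(times))
```

In this toy graph `u1` and `loc1` obtain the same entropy because they have the same number of neighbors and the same total occurrence count.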

Input: a collection of nodes’ initial vectors $V$, the sum of edges $\mathit{Sum}_E$
Output: a collection of nodes’ popularity entropy $PE = \{pe\}$
(1) $i \leftarrow 1$, $N \leftarrow |V|$
(2) while ($i < N + 1$)
(3)     $pe_i \leftarrow 0$, $n \leftarrow \lVert v_i \rVert$, $\mathit{value} \leftarrow 0$
(4)     foreach $e_j$ in $v_i$
(5)         $\mathit{value} = \mathit{value} + e_j \cdot t$
(6)         $j \leftarrow j + 1$
(7)     $pe_i = n \times \mathit{value} / \mathit{Sum}_E$
(8)     $i \leftarrow i + 1$
(9) return $pe_{i-1}$

Algorithm 1: Popularity entropy calculation.

Input: a collection of nodes’ vectors $V$, a collection of nodes’ popularity $PE$, the sum of edges $\mathit{Sum}_E$
Output: a collection of nodes’ vectors $V = \{v\}$
(1) $i \leftarrow 1$, $N \leftarrow |V_{\mathrm{ini}}|$, $\mathit{Initial}(V)$
(2) while ($i < N + 1$)
(3)     foreach $e_{ij}$ in $v_i$
(4)         if $e_{ij} \cdot v_j > i$
(5)             $j \leftarrow e_{ij} \cdot v_j$, $P \leftarrow \mathit{CommonNodes}(v_i, v_j)$, $e_{ij} \cdot w = e_{ji} \cdot w = 0$
(6)             foreach $p$ in $P$
(7)                 $m \leftarrow \lVert v_p \rVert$, $n \leftarrow \mathrm{sum}(e_{ip} \cdot t, e_{jp} \cdot t)$
(8)                 $e_{ji} \cdot w = e_{ij} \cdot w = e_{ij} \cdot w + \mathrm{reverse}(pe_p) \times n / m$
(9)     $i \leftarrow i + 1$
(10) return $v_{i-1}$

Algorithm 2: Edge final weight calculation.

As the values of $e_{ji} \cdot w$ and $e_{ij} \cdot w$ are the same, we can compute them at the same time. To keep the computation controllable and save both time and space, we let each node only compute the strengths of ties with the other nodes having a larger node ID than it does. For any node $i$, we check the elements in its vector one by one. For an element $e_{ji}$ in node $i$’s vector, we first check the id $v_j \cdot id$ of the other node that node $i$ connects; if $v_j \cdot id$ is larger than $v_i \cdot id$, then we store the node’s id as $j$ and try to get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector’s count $m$ and the sum $n$ of $e_{ip} \cdot t$ and $e_{jp} \cdot t$. According to our hypothesis, the weight of edge $e_{ij}$ is proportional to the occurrence times of the edges linking to the shared nodes and inversely proportional to the popularity entropy of the shared nodes. So each shared node’s contribution to the edge $e_{ij}$ is computed by the following formula:

\[ C_{p-ij} = \mathrm{reverse}(pe_p) \times \frac{n}{m}. \tag{3} \]

$C_{p-ij}$ represents the shared node $p$’s contribution to the edge $e_{ij}$, and $\mathrm{reverse}(pe_p)$ represents the reciprocal of $pe_p$. By computing the sum of all shared nodes’ contributions, we get the value of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. Using the TF-IDF algorithm, we should firstly specify one kind of nodes as documents containing many words, and the others are chosen to be words. However, users and locations have the same status in the social network; choosing any one of them as the document will destroy the balanced status.
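The edge weighting of Algorithm 2 and formula (3) can be sketched as follows. This is an illustrative reading, not the original implementation: it reuses a hypothetical dictionary layout `times[i][j]` for the occurrence counts, takes $\mathrm{reverse}(pe_p)$ to be the reciprocal $1/pe_p$, and uses string comparison of ids in place of the numeric node ID test.

```python
def edge_weights(times, pe):
    """Weight every existing edge by the nodes its endpoints share.

    For an edge (i, j) and each shared node p, the contribution is
    (1 / pe[p]) * n / m, with n = t(i, p) + t(j, p) and m = number of
    edges in the vector of p.
    """
    weights = {}
    for i, edges in times.items():
        for j in edges:
            if not i < j:  # handle each undirected edge once
                continue
            shared = set(times[i]) & set(times[j])
            w = 0.0
            for p in shared:
                m = len(times[p])
                n = times[i][p] + times[j][p]
                w += (1.0 / pe[p]) * n / m
            weights[(i, j)] = w
    return weights


times = {
    "u1": {"u2": 1, "loc1": 2},
    "u2": {"u1": 1, "loc1": 1},
    "loc1": {"u1": 2, "u2": 1},
}
pe = {"u1": 1.5, "u2": 1.0, "loc1": 1.5}  # entropies assumed precomputed
weights = edge_weights(times, pe)
```

Note that in this sketch an edge whose endpoints share no node ends with weight 0, because the weight is reset before the contributions are summed; a real implementation would have to decide how to treat such edges.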

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to do link prediction.
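For readers unfamiliar with random walk with restart, a plain power-iteration sketch on a weighted undirected graph is given below. It is a generic textbook formulation, not the fast method of [15] and not necessarily the exact variant used in our experiments; the restart probability of 0.15, the iteration count, and all names are assumptions for the example, and the sketch does not show how node popularity enters the walk.

```python
def random_walk_with_restart(weights, seed, restart=0.15, iters=100):
    """Score all nodes by proximity to `seed` on a weighted graph.

    `weights` maps undirected (a, b) pairs to positive edge weights.
    Each step follows an edge with probability proportional to its
    weight, or jumps back to `seed` with probability `restart`.
    `seed` must be an endpoint of at least one edge.
    """
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    out = {node: sum(nbrs.values()) for node, nbrs in adj.items()}

    score = {node: 0.0 for node in adj}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, nbrs in adj.items():
            for nbr, w in nbrs.items():
                nxt[nbr] += (1 - restart) * score[node] * w / out[node]
        nxt[seed] += restart
        score = nxt
    return score
```

To predict new links for a user, one would run the walk from that user and rank the nodes that are not yet connected to it by their scores.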

6. Experimental Results

We divide the crawled dataset into two parts: the first part runs from July 2011 to February 2012 and the second part from February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We would like to use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. To go one step further, we also want to show that user’s activities in the virtual world and the physical world have impact on each other, which can be seen in the result of link prediction.
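A temporal split of this kind can be sketched as below. The record layout `(a, b, day)` and the function name are hypothetical, and the exact boundary day inside February 2012 is not stated above, so the cutoff is left as a parameter.

```python
from datetime import date


def split_by_time(timed_edges, cutoff):
    """Split (a, b, day) records into training and test edge sets.

    An edge belongs to the training set if it is first seen before
    `cutoff`; otherwise it is a new link that a predictor should
    recover from the training graph.
    """
    first_seen = {}
    for a, b, day in timed_edges:
        key = tuple(sorted((a, b)))
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day
    train = {k for k, d in first_seen.items() if d < cutoff}
    test = {k for k, d in first_seen.items() if d >= cutoff}
    return train, test


records = [
    ("u1", "u2", date(2011, 8, 1)),
    ("u2", "u1", date(2012, 4, 1)),   # repeat of an old edge, stays in training
    ("u1", "u3", date(2012, 3, 15)),  # genuinely new link
]
# The cutoff below is a placeholder, not the boundary used in the experiments.
train, test = split_by_time(records, cutoff=date(2012, 3, 1))
```

Keying each edge by its first appearance prevents links that already existed in the training period from being counted as new links in the test period.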

First, we would like to experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not really connected to each other. In other words, we suppose that any two nodes in the social network were connected to each other and we calculate the weights of all edges, even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

Figure 5: The relationship between the probability of being friend-nodes and the weight of edges (x-axis: weight of edge; y-axis: probability of being friends).

From Figure 5, we can observe that for any two nodes, the probability that they are connected with each other varies directly with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared with Figure 2, this result verifies our idea that nodes’ relations are impacted not only by the amount of shared nodes but also by the amount of connected nodes each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is really necessary to take the existing edges’ weights into account when performing link prediction, instead of counting the amount of shared nodes only.

To analyze the impact that the physical world and the virtual world have on each other, we conduct the experiments on different graphs, namely, the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the related row, while “-” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins).

Table 2: Contents of different graphs.

Feature \ Graph   AW   UW   UUW   ULW
User’s relation   *    *    *     -
User’s tips       *    *    -     *
Weighted or not   *    -    *     *
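The four graphs can be derived from one weighted edge set, as the following sketch suggests. The function name, the `is_location` predicate, and the edge dictionary layout are assumptions made for the illustration.

```python
def graph_variants(weights, is_location):
    """Derive the AW, UW, UUW and ULW edge sets from one weighted graph.

    `weights` maps undirected (a, b) pairs to edge weights and
    `is_location(node)` tells locations from users; both are
    illustrative names.
    """
    def kind(edge):
        # Number of location endpoints: 0 = user-user, 1 = user-location.
        return sum(is_location(node) for node in edge)

    return {
        "AW": dict(weights),
        "UW": {e: 1.0 for e in weights},
        "UUW": {e: w for e, w in weights.items() if kind(e) == 0},
        "ULW": {e: w for e, w in weights.items() if kind(e) == 1},
    }


variants = graph_variants(
    {("u1", "u2"): 2.0, ("loc1", "u1"): 0.5},
    is_location=lambda node: node.startswith("loc"),
)
```

Running the same predictor on each of the four edge sets then isolates the effect of the weights (AW against UW) and of each information source (AW against UUW and ULW).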

We use the four graphs to show how different types of information have impact on the result of link prediction. Comparing the results of AW and UW, we intend to see how the weighted information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use the ROC curve to demonstrate each graph’s result, as shown in Figure 6.

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW)). The ROC curves plot TPR against FPR for UW, AW, UUW, ULW, and the baseline.
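For completeness, the points of an ROC curve can be obtained from predicted scores and the ground truth as sketched below. The function name and input layout are assumptions; the sketch requires at least one positive and one negative candidate and does not group tied scores.

```python
def roc_points(scores, positives):
    """Return (FPR, TPR) points obtained by sweeping a score threshold.

    `scores` maps candidate links to predicted scores and `positives`
    is the set of candidates that really appear in the ground truth.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for link in ranked if link in positives)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for link in ranked:
        if link in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


points = roc_points(
    {"ab": 0.9, "ac": 0.8, "ad": 0.3, "bc": 0.1},
    positives={"ab", "ad"},
)
```

A predictor whose curve lies further above the diagonal baseline ranks true new links ahead of non-links more often.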

From Figure 6, we can observe that the performance of the unweighted graph (UW) is the worst while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge’s weight is basically effective. Furthermore, UUW, containing the weighted user-user edges only, performs better than UW but much worse than AW, which shows that using both the information in the physical world and the virtual world in link prediction can provide much better results. This also shows that user’s activities in the physical world have impact on his/her virtual world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can draw the conclusion that user’s activities in the virtual world can influence user’s activities in the physical world.

7. Conclusion

In this paper, we present a study investigating how user’s activities in the virtual world and the physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node’s popularity according to the amount of nodes it connects and the occurrence times of edges linked to it. Then we calculate each edge’s weight considering its linking nodes’ popularity entropy and their shared nodes’ features. In this way, we get a weighted graph and use it for link prediction. The results show that user’s activities in the virtual world and the physical world do have impact on each other, and using both virtual world and physical world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether user’s activities in the virtual and physical worlds have impact on community dynamics. Real applications that employ the results will also be future work of this study.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.

Page 6: Research Article Investigating How User s Activities in ...downloads.hindawi.com/journals/ijdsn/2014/461780.pdfe advent of GPS-enabled smart phones has had a signi - ... to share their

6 International Journal of Distributed Sensor Networks

02

018

016

014

012

01

008

006

004

002

0

Prob

abili

ty o

f hav

ing

colo

catio

ns

Count of cofriends0 5 10 15 20 25 30 35 40 45 50

Figure 4 The relationship between probability of having coloca-tions and the count of cofriends

We can observe that when the number of cofriends increasesthe probability of having colocations also increases but thereare also some shakes The explanation is similar to the aboveone

All the static analysis results show that it is not properto consider the amount of colocations or cofriends onlywhen predicting new links Each member of colocations andcofriends contains much information about the links andhelps us to understand the existing links better But how totranslate the information into a form that can be used in linkprediction remains a problem In the next section we will tryto solve it

5 Impact Detection

We try to study how userrsquos friendship and their check-insimpact each other Our main task is to predict the newedges (links) in the graph with nodes representing users andlocations It is well accepted that the more information weobtain and use when we try to predict new links the higherprecision we can get However due to the limits of the spaceof storage and time it is not practical to store all informationIt is not a goodway to simply count the number of colocationsand cofriends since each element of them also contains somefactors that have impact on the addition of new links In thispaper we propose an effective way to handle the issue

Every node in the global graph is connected with manynodes and these nodes may be connected with many othernodes Each nodersquos popularity is determined by its connectionwith the rest of nodes We use a vector to represent eachnode There are a very large number of nodes includingusers and locations in the global graph but only a fewnodes are connected with each other which means that theglobal graph is very sparse If we store all the edges in thegraph without considering whether they really exist themostelements in every vector will be 0 meaning that the nodesare not connected and most information in the vectors ismeaningless In fact only the connected nodes and their

edges make sense in link prediction Since each edge storesthe information about its weight its occurrence times andthe two nodes it connects a nodersquos vector which contains allthe edges connected to the node can store all the informationthat the node has By doing so the topology of the socialnetworks is stored with high coefficient of storage spaceutilization and it also ensures that all the nodes in the socialnetwork have the same status

If the social network contains $N$ nodes in total, for node $i$ connected with $n$ edges, its vector $v_i$ is defined as follows:

$$v_i = v(e_{ij_1}, e_{ij_2}, \ldots, e_{ij_n}), \quad i \in [1, N],\ j_m \neq i,\ m \in [1, n], \tag{1}$$

where $e_{ij}$ ($i \neq j$) represents the edge existing between node $i$ and node $j$, and $e_{ij}$ is the same as $e_{ji}$ ($i \neq j$) because we treat the graph of the social network as an undirected graph in this paper. The initial weight of every element is set to 1, and the final value will be larger than 0 in every vector. Since all the weights are associated with the nodes’ popularity entropy, we introduce a method to calculate the entropy, as shown in Algorithm 1, where $e_j.t$ denotes the number of times the edge $e_j$ appears in the dataset.

For a node $i$, we first get the count $n$ of its vector’s elements; then we calculate the sum $\mathit{value}$ of the occurrence times of the edges linked to node $i$. According to our hypothesis, for any node, its popularity is proportional to the number $n$ of other nodes it is connected with and to the occurrence times of the edges that are linked to the node. In order to get a normalized weight for each node, the popularity entropy $\mathrm{pe}_i$ for node $i$ is computed as follows:

$$\mathrm{pe}_i = n \ast \frac{\mathit{value}}{\mathrm{Sum}_E}, \tag{2}$$

where the sum $\mathrm{Sum}_E$ of edges is used to complete the normalization. To explain the algorithm in detail, the basic idea of calculating the weight of each edge is summarized as follows.

For two nodes that have been connected to each other in the location-based social network:

(1) the more shared nodes they have, the stronger the tie between them is; that is why we use the sum of the values obtained from each element as the edge’s weight between the two nodes;

(2) the more less-popular nodes they share, the stronger the tie between them is; that is why we use the reciprocal of the nodes’ popularity entropy as a factor that has an impact on the strength of the tie between the two nodes;

(3) the more times an edge occurs, the more important the linked nodes are for each other, so we use the sum of the occurrence times of the edges connecting them to their shared nodes to weigh the importance of the shared nodes for the two nodes.

As the values of $e_{ji}.w$ and $e_{ij}.w$ are the same, we can compute them at the same time. To keep the computation in


Input: a collection of nodes’ initial vectors V, the sum of edges Sum_E
Output: a collection of nodes’ popularity entropy PE = {pe}

    i ← 1, N ← |V|
    while (i < N + 1)
        pe_i ← 0, n ← ‖v_i‖, value ← 0
        foreach e_j in v_i
            value = value + e_j.t
            j ← j + 1
        pe_i = n * value / Sum_E
        i ← i + 1
    return pe_{i−1}

Algorithm 1: Popularity entropy calculation.
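A minimal Python sketch of the popularity entropy in (2) is given below. It assumes that each node vector is a dictionary mapping a neighbor id to the occurrence times of the connecting edge, and that Sum_E is the total occurrence count over all undirected edges; the paper does not spell out either detail, and the function name `popularity_entropy` and the sample data are hypothetical. Unlike the listing above, the sketch returns the whole collection PE.

```python
def popularity_entropy(vectors, sum_e):
    """Return {node: pe} with pe_i = n * value / Sum_E.

    vectors: {node: {neighbor: occurrence_times}}
    sum_e:   normalising constant Sum_E (assumed: total occurrence
             count over all undirected edges)
    """
    pe = {}
    for node, edges in vectors.items():
        n = len(edges)               # number of nodes this node connects to
        value = sum(edges.values())  # summed occurrence times of its edges
        pe[node] = n * value / sum_e
    return pe


# Hypothetical toy graph: two users and one location.
vectors = {
    "u1": {"u2": 1, "loc9": 2},
    "u2": {"u1": 1, "loc9": 1},
    "loc9": {"u1": 2, "u2": 1},
}
# Every undirected edge is stored twice, once per endpoint.
sum_e = sum(sum(e.values()) for e in vectors.values()) // 2
pe = popularity_entropy(vectors, sum_e)
```

In this toy graph `u1` and `loc9` receive the same value because both have two neighbors and three edge occurrences in total.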

Input: a collection of nodes’ vectors V, a collection of nodes’ popularity PE, the sum of edges Sum_E
Output: a collection of nodes’ vectors V = {v}

    i ← 1, N ← |V_ini|, Initial(V)
    while (i < N + 1)
        foreach e_ij in v_i
            if e_ij.v_j > i
                j ← e_ij.v_j, P ← CommonNodes(v_i, v_j), e_ij.w = e_ji.w = 0
                foreach p in P
                    m ← ‖v_p‖, n ← sum(e_ip.t, e_jp.t)
                    e_ji.w = e_ij.w = e_ij.w + reverse(pe_p) * n / m
        i ← i + 1
    return v_{i−1}

Algorithm 2: Edge final weight calculation.

a controllable style and save both time and space, we let each node compute the strengths of ties only with the other nodes having a larger node ID than it does. For any node $i$, we check the elements in its vector one by one. For an element $e_{ji}$ in node $i$’s vector, we first check the id $v_j.id$ of the other node that node $i$ connects to; if $v_j.id$ is larger than $v_i.id$, then we store the node’s id as $j$ and try to get the shared nodes of node $i$ and node $j$. Then, for every shared node $p$, we calculate its vector’s count $m$ and the sum $n$ of $e_{ip}.t$ and $e_{jp}.t$. According to our hypothesis, the occurrence times of edges linking to shared nodes are proportional to the weight of edge $e_{ij}$, while the popularity entropy of shared nodes is inversely proportional to the weight of that edge. So each shared node’s contribution to the edge $e_{ij}$ is computed by the following formula:

$$C_{p-ij} = \mathrm{reverse}(\mathrm{pe}_p) \ast \frac{n}{m}. \tag{3}$$

$C_{p-ij}$ represents the shared node $p$’s contribution to the edge $e_{ij}$, and $\mathrm{reverse}(\mathrm{pe}_p)$ represents the reverse value (i.e., the reciprocal) of $\mathrm{pe}_p$. By computing the sum of all shared nodes’ contributions, we get the value of edge $e_{ij}$. All these processes are shown in Algorithm 2. The algorithm is different from TF-IDF [14]. Using the TF-IDF algorithm, we should first specify one kind of nodes as documents containing many words, and the others are chosen to be words. However, users and locations have the same status in the social network; choosing either of them as the document will destroy the balanced status.
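The edge weighting of Algorithm 2 and formula (3) can be sketched in Python as follows. The sketch reads reverse(pe_p) as the reciprocal 1/pe_p and assumes that node ids are comparable strings and that each vector maps a neighbor id to occurrence times; the function name `edge_weights` and the toy values are hypothetical.

```python
def edge_weights(vectors, pe):
    """Weight of every existing edge as the sum of shared-node contributions.

    vectors: {node: {neighbor: occurrence_times}}, undirected
    pe:      {node: popularity entropy}, all values > 0
    Returns {(i, j): weight} with i < j, so each pair is computed once.
    """
    weights = {}
    for i, edges_i in vectors.items():
        for j in edges_i:
            if j <= i:
                continue  # only handle the neighbor with the larger id
            shared = (edges_i.keys() & vectors[j].keys()) - {i, j}
            w = 0.0
            for p in shared:
                m = len(vectors[p])             # size of p's vector
                n = edges_i[p] + vectors[j][p]  # e_ip.t + e_jp.t
                w += (1.0 / pe[p]) * n / m      # contribution C_{p-ij}
            weights[(i, j)] = w
    return weights


# Hypothetical toy graph and popularity entropy values.
vectors = {
    "u1": {"u2": 1, "loc9": 2},
    "u2": {"u1": 1, "loc9": 1},
    "loc9": {"u1": 2, "u2": 1},
}
pe = {"u1": 1.5, "u2": 1.0, "loc9": 1.5}
weights = edge_weights(vectors, pe)
```

In this sketch an edge whose endpoints share no node ends with weight 0; how such edges should be treated is not stated in the text, so a caller may want to handle them separately.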

Using Algorithm 2, we can get the weights of all the edges existing in the social network. After getting the popularity of each node and the weight of each edge, we use random walk with restart [15] as the model to do link prediction.
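For illustration, a plain power-iteration form of random walk with restart on a weighted undirected graph is sketched below. It is not the fast method of [15] and it uses edge weights only; the restart probability of 0.15, the iteration count, and all names are assumptions rather than settings reported in this paper.

```python
def random_walk_with_restart(weights, start, restart=0.15, iters=100):
    """Power iteration for RWR scores on an undirected weighted graph.

    weights: {(a, b): w} with w > 0, one entry per undirected edge
    start:   node the walker restarts at
    restart: restart probability (assumed value)
    """
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    out_sum = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    score = {u: 1.0 if u == start else 0.0 for u in adj}
    for _ in range(iters):
        nxt = {u: (restart if u == start else 0.0) for u in adj}
        for u, nbrs in adj.items():
            for v, w in nbrs.items():
                # The walker at u moves to v in proportion to the edge weight.
                nxt[v] += (1.0 - restart) * score[u] * w / out_sum[u]
        score = nxt
    return score


# Hypothetical chain a - b - c - d; "a" is already linked to "b".
weights = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
scores = random_walk_with_restart(weights, "a")
# Candidate new links for "a": unconnected nodes ranked by score.
candidates = sorted((u for u in scores if u not in ("a", "b")),
                    key=scores.get, reverse=True)
```

Nodes that are closer to the start node through heavier edges receive higher scores, so they are ranked first as candidate links.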

6. Experimental Results

We divide the crawled dataset into two parts: the first part runs from July 2011 to February 2012, and the second part runs from February 2012 to May 2012. We use the first part as the training set and the second part as the test set. We would like to use the test set to verify the effectiveness of our method in improving the accuracy of link prediction. To go one step further, we also want to show that users’ activities in the virtual world and the physical world have an impact on each other, which can be shown in the result of link prediction.
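A time-based split of this kind can be sketched as follows. The boundary date of 1 February 2012 is an assumption, since the text only names the month, and the record layout and the function name `split_by_time` are hypothetical.

```python
from datetime import date

SPLIT = date(2012, 2, 1)  # assumed boundary; the text only names the month


def split_by_time(records, split=SPLIT):
    """Split a list of (day, node_a, node_b) observations at `split`."""
    train = [r for r in records if r[0] < split]
    test = [r for r in records if r[0] >= split]
    return train, test


# Hypothetical observations.
records = [
    (date(2011, 7, 14), "u1", "loc9"),
    (date(2012, 1, 30), "u1", "u2"),
    (date(2012, 3, 2), "u2", "loc9"),
]
train, test = split_by_time(records)
```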

First, we would like to experimentally evaluate the capability of our method to compute the strength of ties between real entities, for example, between user and user or between user and location. We use Algorithm 2 to calculate the weights of existing edges and the possible weights of edges between nodes that are not really connected to each other. In other words, we suppose that any two nodes in the social network were connected to each other, and we

Figure 5: The relationship between probability of being friend-nodes and the weight of edges (x-axis: weight of edge, 0 to 120; y-axis: probability of being friends, 0 to 0.6).

Table 2: Contents of different graphs.

Graph feature      AW   UW   UUW   ULW
User’s relation    *    *    *     -
User’s tips        *    *    -     *
Weighted or not    *    -    *     *

calculate the weights of all edges, even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).
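One way to obtain such a relationship is to bin node pairs by edge weight and take the fraction of really connected pairs in each bin. The sketch below illustrates this under assumptions: the bin width of 10, the function name `connection_probability`, and the sample pairs are hypothetical, and the paper does not state how its curve was binned.

```python
from collections import defaultdict


def connection_probability(pairs, bin_width=10):
    """Fraction of connected pairs per weight bin.

    pairs: list of (weight, is_connected) for node pairs
    Returns {bin_start: probability}, ordered by bin_start.
    """
    total = defaultdict(int)
    linked = defaultdict(int)
    for weight, connected in pairs:
        b = int(weight // bin_width) * bin_width
        total[b] += 1
        linked[b] += bool(connected)
    return {b: linked[b] / total[b] for b in sorted(total)}


# Hypothetical (weight, is_connected) pairs.
pairs = [(3.0, False), (7.5, False), (8.0, True),
         (12.0, True), (15.0, False), (27.0, True)]
prob = connection_probability(pairs)
```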

From Figure 5, we can observe that, for any two nodes, the probability that they are connected with each other varies directly with the weight of the edge linking them. Furthermore, there is almost no fluctuation before the ending part of the curve. Compared to Figure 2, this result verifies our idea that nodes’ relations are impacted not only by the number of shared nodes but also by the number of connected nodes each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is really necessary to take the existing edges’ weights into account when performing link prediction, instead of counting the number of shared nodes only.

To analyze the impact that the physical world and the virtual world have on each other, we conduct the experiments on different graphs, namely, the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the related row, while “-” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins).
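The four graph variants can be derived from one typed edge list, as in the sketch below. The edge tuple layout, the kind labels, and the names `EDGE_KINDS` and `make_graph` are hypothetical. The sketch simply filters edges and resets weights; whether the weights of UUW and ULW are recomputed on the reduced graph is not specified here.

```python
# Which edge kinds each variant keeps (following Table 2).
EDGE_KINDS = {
    "AW": {"user-user", "user-location"},
    "UW": {"user-user", "user-location"},
    "UUW": {"user-user"},
    "ULW": {"user-location"},
}


def make_graph(edges, variant):
    """edges: list of (a, b, kind, weight). Returns {(a, b): weight}."""
    keep = EDGE_KINDS[variant]
    return {
        (a, b): (1.0 if variant == "UW" else w)  # UW ignores the weights
        for a, b, kind, w in edges
        if kind in keep
    }


# Hypothetical weighted edges.
edges = [
    ("u1", "u2", "user-user", 2.5),
    ("u1", "loc9", "user-location", 0.8),
]
graphs = {name: make_graph(edges, name) for name in EDGE_KINDS}
```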

We use the four graphs to show how different types of information impact the result of link prediction. Comparing the results of AW and UW, we intend to see how the weighted information can be used to improve the performance of link prediction. The purpose of comparing

Figure 6: Link prediction results using different information (the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW)). The plot shows TPR against FPR, both from 0 to 1, for UW, AW, UUW, ULW, and the baseline.

the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use ROC curves to demonstrate each graph’s result, as shown in Figure 6.
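The ROC evaluation can be sketched as follows: candidate links are ranked by predicted score, and the true positive rate (TPR) and false positive rate (FPR) are recorded after each candidate. The function names, the sample scores, and the ground-truth set are hypothetical; the sketch does not treat tied scores specially and needs at least one positive and one negative candidate.

```python
def roc_points(scored, positives):
    """Return ROC points as (FPR, TPR) pairs.

    scored:    {candidate_link: predicted score}
    positives: set of candidate links present in the ground-truth graph
    """
    ranked = sorted(scored, key=scored.get, reverse=True)
    n_pos = sum(1 for c in ranked if c in positives)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for c in ranked:
        if c in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and ground truth.
scored = {("u1", "u3"): 0.9, ("u1", "u4"): 0.8,
          ("u2", "u5"): 0.3, ("u2", "u6"): 0.1}
positives = {("u1", "u3"), ("u2", "u5")}
points = roc_points(scored, positives)
area = auc(points)
```

A curve closer to the top-left corner, or a larger area under it, corresponds to better link prediction on the given graph.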

From Figure 6, we can observe that the performance of the unweighted graph (UW) is the worst, while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge’s weight is basically effective. Furthermore, UUW, containing the weighted user-user edges only, performs better than UW but much worse than AW, which shows that using both the information in the physical world and that in the virtual world in link prediction can provide much better results. This also proves that users’ activities in the physical world have an impact on their virtual world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can draw the conclusion that users’ activities in the virtual world can influence their activities in the physical world.

7. Conclusion

In this paper, we present a study investigating how users’ activities in the virtual world and the physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node’s popularity according to the number of nodes it connects to and the occurrence times of the edges linked to it. Then we calculate each edge’s weight, considering its linking nodes’ popularity entropy and their shared nodes’ features. In this way, we get a weighted graph and use it for link prediction. The results show that users’ activities in the virtual world and the physical world do have an impact on each other, and using both virtual world and physical world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether users’ activities in the virtual and physical worlds have an impact on community dynamics. Real applications that employ the results will also be part of the future work of this study.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Research Article Investigating How User s Activities in ...downloads.hindawi.com/journals/ijdsn/2014/461780.pdfe advent of GPS-enabled smart phones has had a signi - ... to share their

International Journal of Distributed Sensor Networks 7

Input a collection of nodesrsquo initial vectors V the sum of edges Sum119864

Output a collection of nodesrsquo popularity entropy PE = pe(1) 119894 larr 1119873 larr |119881|

(2) While (119894 lt 119873 + 1)(3) pe

119894larr 0 119899 larr

1003817100381710038171003817V1198941003817100381710038171003817 V119886119897119906119890 larr 0

(4) foreach 119890119895in V119894

(5) V119886119897119906119890 = V119886119897119906119890 + 119890119895119905

(6) 119895 larr 119895 + 1

(7) pe119894= 119899 lowast V119886119897119906119890Sum

119864

(8) 119894 larr 119894 + 1

(9) return pe119894minus1

Algorithm 1 Popularity entropy calculation

Input a collection of nodesrsquo vectors V a collection of nodesrsquopopularity PE the sum of edges Sum

119864

Output a collection of nodesrsquo vectors119881 = V(1) ilarr 1119873 larr

1003816100381610038161003816119881ini1003816100381610038161003816 119868119899119894119905119894119886119897(119881)

(2) While (119894 lt 119873 + 1)(3) foreach 119890

119894119895in V119894

(4) if 119890119894119895V119895 gt 119894

(5) 119895 larr 119890119894119895V119895 119875 larr 119862119900119898119898119900119899119873119900119889119890119904(V

119905 V119895) 119890119894119895119908 = 119890

119895119894119908 = 0

(6) foreach p in P(7) 119898 larr

10038171003817100381710038171003817V119901

10038171003817100381710038171003817 119899 larr sum(119890

119894119901119905 119890119895119901119905)

(8) 119890119895119894119908 = 119890

119894119895119908 = 119890

119894119895119908 + 119903119890V119890119903119904119890(pe

119901) lowast 119899119898

(9) 119894 larr 119894 + 1

(10) return V119894minus1

Algorithm 2 Edge final weight calculation

a controllable style and save both the time and space we leteach node only compute the strengths of ties with the othernodes having larger node ID than it does For any node 119894 wecheck the elements in its vector one by one For an element119890119895119894in node 119894rsquos vector we first check the 119894119889 V

119895sdot 119894119889 of the other

node that node 119894 connects if V119895sdot 119894119889 is larger than V

119894sdot 119894119889 then

we store the nodersquos id as 119895 and try to get the shared nodes ofnode 119894 and node 119895Then for every shared node119901 we calculateits vectorrsquos count 119898 and the sum of 119890

119895119901sdot 119905 According to our

hypothesis the occurrence times of edges linking to sharednodes are proportional to the weight of edge 119890

119894119895 while the

popularity entropy of shared nodes is inversely proportionalto the weight of that edge So each shared nodersquos contributionto the edge 119890

119894119895is computed by the following formula

119862119901minus119894119895

= reverse (pe119901) lowast

119899

119898 (3)

119862119901minus119894119895

represents the shared node 119901rsquos contribution to the edge119890119894119895 and reverse (pe

119901) represents the reverse value of pe

119901

By computing the sum of all shared nodesrsquo contribution weget the value of edge 119890

119894119895 All these processes are shown in

Algorithm 2 The algorithm is different from TF-IDF [14]Using TF-IDF algorithm we should firstly specify one kind ofnodes as a document containing many words and the othersare chosen to be words However users and locations have the

same status in the social network choosing any one of themas document will destroy the balanced status

Using Algorithm 2 we can get the weights of all the edgesexisting in the social network After getting the popularity ofeach node and weight of each edge we use randomwalk withrestart [15] as the model to do link prediction

6 Experimental Results

We divide the crawled dataset into two parts the first partstarts from July 2011 to February 2012 and the second partstart from February 2012 to May 2012 We use the first partdata as training set and the second part as test dataWewouldlike to use the test set to verify the effectiveness of ourmethodin improving the accuracy of link prediction To go one stepfurther we also want to show that userrsquos activities in virtualworld and physical world have impact on each other whichcan be shown in the result of link prediction

First we would like to experimentally evaluate thecapability of our method to compute the strength of tiesbetween real entities for example between user and user orbetween user and location We use Algorithm 2 to calculatethe weights of existing edges and the possible weights ofedges between nodes that are not really connected to eachother In other words we suppose that any two nodes inthe social network were connected to each other and we

8 International Journal of Distributed Sensor NetworksPr

obab

ility

of b

eing

frie

nds

Weight of edge

06

05

04

03

02

01

00 10 20 30 40 50 60 70 80 90 100 110 120

Figure 5 The relationship between probability of being friend-nodes and the weight of edges

Table 2 Contents of different graphs

Graph Feature AW UW UUW ULWUserrsquo relation lowast lowast lowast mdashUserrsquos tips lowast lowast mdash lowast

Weighted or not lowast mdash lowast lowast

calculate the weights of all edges even though some of themmay not exist Then we view the pair of nodes that are reallyconnected as friend-nodes and finally get the relationshipbetween probability of being friend-nodes and the weightof edges The results are shown in Figure 5 where the wholedataset is used (ie July 2011 to May 2012)

From Figure 5 we can observe that for any two nodesthe probability that they are connected with each other variesdirectly with the weight of the edge linking them Further-more there is almost no shake before the ending part of thecurve Comparing to Figure 2 this result verifies our ideathat nodesrsquo relations are not only impacted by the amountsof shared nodes but also the amounts of connected nodeseach shared node owns Since the probability varies alongwith the weight of edge the curve also shows that it is reallynecessary to take the existing edgesrsquo weights into accountwhen performing link prediction instead of counting theamount of shared nodes only

To analyze the impact that the physical world and virtualworld have on each other we conduct the experiments ondifferent graphs for example the graph with all weightededges (AW) the graph with all edgesrsquo weights to be 1 (UW)graph with only user-user edges (UUW) and graph withonly user-location edges (ULW) Details are as shown inTable 2 where ldquolowastrdquo means that the corresponding vertical linecontains the information in the related horizontal line whileldquomdashrdquo means that is does not For instance UW contains theinformation of user relation and tips (ie check-ins)

We use the four graphs to show how different types ofinformation have impact on the result of link predictionComparing the results of AW and UW we intend to seehow the weighted information can be used to improve theperformance of link prediction The purpose of comparing

TPR

FPR

UWAWUUW

ULWBase line

1

09

08

07

06

05

04

03

02

01

010908070605040302010

Figure 6 Link prediction results using different information (thegraph with all weighted edges (AW) the graph with all edgesrsquoweights to be 1 (UW) graph with only user-user edges (UUW) andgraph with only user-location edges (ULW))

the results of AW and UUW is to find out how a userrsquosactivities in the physical world can influence hisher activitiesin the virtual world while comparing the results of AW andULW is just for the inverse purposeWe use the graph inMAY2012 as ground truth The new links are predicted based onthe previous observations and the graph in February 2012Finally we use ROC curve to demonstrate each graphrsquos resultas shown in Figure 6

From Figure 6 we can observe that the performance ofthe unweight graph (UW) is the worst while the weightedgraph with all kinds of nodes and edges (AW) performs thebest which means that our method of calculating each edgersquosweight is basically effective Furthermore UUW containingthe weighted user-user edge only performs better than UWbut much worse than AW which shows that using boththe information in physical world and virtual world in linkprediction can provide much better results This also provesthat userrsquos activities in physical world have impact on hishervirtual world activities On the other hand we can drawa conclusion that userrsquos activities in the virtual world caninfluence userrsquos activities in the physical world by comparingthe performance of AW and ULW (the graph with onlyweighted user-location edges)

7 Conclusion

In this paper we present a study investigating how userrsquosactivities in virtual world and physical world impact eachother We use information vector to represent nodes and tostore all the edges that are linked to every node We definepopularity entropy to weigh each nodersquos popularity accordingto the amount of nodes it connects and the occurrence timesof edges linked to it Then we calculate each edgersquos weightconsidering its linking nodesrsquo popularity entropy and theirshared nodesrsquo features In this way we get a weighted graph

International Journal of Distributed Sensor Networks 9

and use it for link prediction The results show that userrsquosactivities in virtual world and physical world do have impacton each other andusing both virtualworld andphysical worldinformation can improve the accuracy of link prediction

In the future we plan to detect and analyze communitiesformed by the nodes of online social network to see whetheruserrsquos activities in virtual and physical world have impacton community dynamics Real applications that employ theresults will also be a future work of this study

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was partially supported by the National BasicResearch Program of China (no 2012CB316400) theNational Natural Science Foundation of China (nos61103063 61222209 61373119 and 61332005) the SpecializedResearch Fund for the Doctoral Program of HigherEducation (no 20126102110043) andMicrosoftCorporation

References

[1] D Zhang B Guo and Z Yu ldquoThe emergence of social andcommunity intelligencerdquo IEEE Computer vol 44 no 7 pp 21ndash28 2011

[2] M E J Newman ldquoThe structure and function of complexnetworksrdquo SIAM Review vol 45 no 2 pp 167ndash256 2003

[3] M McPherson L Smith-Lovin and J M Cook ldquoBirds of a fea-ther homophily in social networksrdquo Annual Review of Sociol-ogy vol 27 pp 415ndash444 2001

[4] D Liben-Nowell and J Kleinberg ldquoThe link-prediction prob-lem for social networksrdquo Journal of the American Society forInformation Science and Technology vol 58 no 7 pp 1019ndash10312007

[5] Q Li Y Zheng X Xie Y Chen W Liu andW-Y Ma ldquoMininguser similarity based on location historyrdquo in Proceedings of the16th ACM SIGSPATIAL International Conference on Advancesin Geographic Information Systems (GIS rsquo08) pp 298ndash307 ACMPress November 2008

[6] E Cho S A Myers and J Leskovec ldquoFriendship and mobilityuser movement in location-based social networksrdquo in Proceed-ings of the 17th ACM SIGKDD International Conference onKnowledgeDiscovery andDataMining (KDD rsquo11) pp 1082ndash1090ACM Press August 2011

[7] D Lian and X Xie ldquoLearning location naming from usercheck-in historiesrdquo in Proceedings of the 19th ACM SIGSPATIALInternational Conference onAdvances inGeographic InformationSystems (GIS rsquo11) pp 112ndash121 ACM Press November 2011

[8] R Xiang J Neville and M Rogati ldquoModeling relationshipstrength in online social networksrdquo in Proceedings of the 19thInternationalWorldWideWeb Conference (WWW rsquo10) pp 981ndash990 ACM Press April 2010

[9] J Tang T Lou and J Kleinberg ldquoInferring social ties acrossheterogeneous networksrdquo in Proceedings of the 5th ACM Inter-national Conference on Web Search and Data Mining (WSDMrsquo12) pp 743ndash752 ACM Press February 2012

[10] M Roth D Deutscher G Flysher I Horn and A LeichtbergldquoFriends using the implicit social graphrdquo in Proceedings ofthe ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining (KDD rsquo10) pp 233ndash242 ACM PressJuly 2011

[11] J Cranshaw E Toch J Hong A Kittur andN Sadeh ldquoBridgingthe gap between physical location and online social networksrdquoinProceedings of the 12th International Conference onUbiquitousComputing (UbiComp rsquo10) pp 119ndash128 ACM Press October2010

[12] D Wang D Pedreschi C Song F Giannotti and A-LBarabasi ldquoHuman mobility social ties and link predictionrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1100ndash1108 ACM Press August 2011

[13] S Scellato A Noulas and C Mascolo ldquoExploiting place fea-tures in link prediction on location-based social networksrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1046ndash1054 ACM Press August 2011

[14] G Salton and C Buckley ldquoTerm-weighting approaches in auto-matic text retrievalrdquo Information Processing and Managementvol 24 no 5 pp 513ndash523 1988

[15] H Tong C Faloutsos and J-Y Pan ldquoFast random walk withrestart and its applicationsrdquo in Proceedings of the 6th Interna-tional Conference on Data Mining (ICDM rsquo06) pp 613ndash622IEEE Press December 2006


Figure 5: The relationship between the probability of being friend-nodes and the weight of edges (x-axis: weight of edge; y-axis: probability of being friends).

Table 2: Contents of different graphs.

Feature \ Graph    AW   UW   UUW   ULW
User’s relation    *    *    *     -
User’s tips        *    *    -     *
Weighted or not    *    -    *     *

calculate the weights of all edges even though some of them may not exist. Then we view the pairs of nodes that are really connected as friend-nodes and finally get the relationship between the probability of being friend-nodes and the weight of edges. The results are shown in Figure 5, where the whole dataset is used (i.e., July 2011 to May 2012).

From Figure 5, we can observe that, for any two nodes, the probability that they are connected with each other varies directly with the weight of the edge linking them. Furthermore, the curve shows almost no fluctuation before its ending part. Compared with Figure 2, this result verifies our idea that nodes’ relations are impacted not only by the number of shared nodes but also by the number of connected nodes that each shared node owns. Since the probability varies along with the weight of the edge, the curve also shows that it is necessary to take the existing edges’ weights into account when performing link prediction, instead of only counting the number of shared nodes.
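As a minimal illustration, a curve such as the one in Figure 5 could be estimated by grouping node pairs into weight bins and measuring the fraction of connected pairs in each bin. The Python sketch below shows this idea; the function name, the bin width of 10, and the toy data are assumptions introduced for illustration only and do not come from the original experiments.

```python
from collections import defaultdict


def friend_probability_by_weight(pairs, bin_width=10.0):
    """Estimate P(friends | edge weight) by grouping node pairs into weight bins.

    pairs: iterable of (weight, is_friend) tuples, one per candidate node pair.
    Returns a dict mapping the lower bound of each bin to the fraction of
    pairs in that bin that are actually connected.
    """
    totals = defaultdict(int)
    friends = defaultdict(int)
    for weight, is_friend in pairs:
        bin_start = int(weight // bin_width) * bin_width
        totals[bin_start] += 1
        if is_friend:
            friends[bin_start] += 1
    return {b: friends[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy data: heavier edges are more often real friendships.
toy_pairs = [(3, False), (7, False), (12, False),
             (15, True), (25, True), (28, True)]
probabilities = friend_probability_by_weight(toy_pairs)
```

On real data, a bin that contains very few pairs yields a noisy estimate, which may be one reason for instability near the end of such a curve.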

To analyze the impact that the physical world and the virtual world have on each other, we conduct the experiments on different graphs: the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Details are shown in Table 2, where “*” means that the graph in the corresponding column contains the information in the related row, while “-” means that it does not. For instance, UW contains the information of user relations and tips (i.e., check-ins) but no edge weights.
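The four graphs can be derived from a single weighted edge list. The following Python sketch shows one possible construction; the edge representation (a tuple with a "user-user" or "user-location" label), the function name, and the toy edges are assumptions made for illustration, not the data structures used in the original experiments.

```python
def build_graph_variants(edges):
    """Derive the AW, UW, UUW, and ULW edge lists from one weighted edge list.

    edges: list of (node_a, node_b, kind, weight) tuples, where kind is
           either "user-user" or "user-location".
    Returns a dict mapping each variant name to a list of (node_a, node_b, weight).
    """
    return {
        # AW: all edges, with their computed weights.
        "AW": [(a, b, w) for a, b, _, w in edges],
        # UW: all edges, with every weight set to 1.
        "UW": [(a, b, 1.0) for a, b, _, _ in edges],
        # UUW: weighted user-user edges only.
        "UUW": [(a, b, w) for a, b, kind, w in edges if kind == "user-user"],
        # ULW: weighted user-location edges only.
        "ULW": [(a, b, w) for a, b, kind, w in edges if kind == "user-location"],
    }


# Hypothetical toy edges: two users who are friends, and two check-in edges.
toy_edges = [
    ("u1", "u2", "user-user", 2.5),
    ("u1", "l1", "user-location", 1.2),
    ("u3", "l1", "user-location", 0.7),
]
variants = build_graph_variants(toy_edges)
```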

We use the four graphs to show how different types of information affect the result of link prediction. By comparing the results of AW and UW, we intend to see how the weight information can be used to improve the performance of link prediction. The purpose of comparing the results of AW and UUW is to find out how a user’s activities in the physical world can influence his/her activities in the virtual world, while comparing the results of AW and ULW serves the inverse purpose. We use the graph in May 2012 as ground truth. The new links are predicted based on the previous observations and the graph in February 2012. Finally, we use ROC curves to demonstrate each graph’s result, as shown in Figure 6.

Figure 6: Link prediction results using different information: the graph with all weighted edges (AW), the graph with all edges’ weights set to 1 (UW), the graph with only user-user edges (UUW), and the graph with only user-location edges (ULW). Axes: FPR (x) and TPR (y); curves: UW, AW, UUW, ULW, and the baseline.
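The link prediction step relies on random walk with restart. A minimal Python sketch of the idea is given below, using plain power iteration rather than the fast method of [15]. The restart probability of 0.15, the convergence tolerance, the function name, and the toy graph are all assumptions chosen for illustration; the settings used in the experiments are not stated in this section.

```python
def random_walk_with_restart(graph, source, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that starts at `source`
    and jumps back to it with probability `restart_prob` at every step.

    graph: dict mapping node -> dict of neighbor -> positive edge weight.
           Every neighbor must also appear as a key of `graph`.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        updated = {node: 0.0 for node in graph}
        # The scores always sum to 1, so the restart mass is restart_prob.
        updated[source] = restart_prob
        for node, nbrs in graph.items():
            moving = (1.0 - restart_prob) * scores[node]
            if strength[node] == 0:
                # Isolated node: return its probability mass to the source.
                updated[source] += moving
                continue
            for nbr, weight in nbrs.items():
                # Transition probability is proportional to the edge weight.
                updated[nbr] += moving * weight / strength[node]
        change = sum(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical toy graph: u1 and u2 are friends; u1 and u3 share location l1.
toy_graph = {
    "u1": {"u2": 2.0, "l1": 1.0},
    "u2": {"u1": 2.0},
    "u3": {"l1": 1.0},
    "l1": {"u1": 1.0, "u3": 1.0},
}
scores = random_walk_with_restart(toy_graph, "u1")
```

In this sketch, users that are not yet connected to the source would be ranked by their score, and the highest-ranked ones would be proposed as new links. Here u3 receives a nonzero score only through the shared location l1, which illustrates how user-location edges can contribute to predicting user-user links.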

From Figure 6, we can observe that the unweighted graph (UW) performs the worst, while the weighted graph with all kinds of nodes and edges (AW) performs the best, which means that our method of calculating each edge’s weight is basically effective. Furthermore, UUW, which contains the weighted user-user edges only, performs better than UW but much worse than AW, which shows that using the information in both the physical world and the virtual world in link prediction can provide much better results. This also indicates that a user’s activities in the physical world have an impact on his/her virtual world activities. On the other hand, by comparing the performance of AW and ULW (the graph with only weighted user-location edges), we can conclude that a user’s activities in the virtual world can influence the user’s activities in the physical world.
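For readers who wish to reproduce this kind of comparison, the Python sketch below shows how ROC points and the area under the curve (AUC) can be computed from prediction scores and ground-truth labels. The function names and the toy scores are assumptions for illustration; the paper reports ROC curves only, so the AUC summary is an addition made here for convenience.

```python
def roc_points(scored_pairs):
    """Compute ROC points from (score, is_new_link) tuples.

    Pairs are ranked by decreasing score; pairs with equal scores are
    processed together so that ties produce a single ROC point.
    Returns a list of (fpr, tpr) tuples from (0, 0) to (1, 1).
    """
    positives = sum(1 for _, label in scored_pairs if label)
    negatives = len(scored_pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        points.append((false_pos / negatives, true_pos / positives))
        i = j
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for six candidate pairs; True marks a link that
# actually appears in the ground-truth graph.
toy_scores = [(0.9, True), (0.8, True), (0.7, False),
              (0.6, True), (0.4, False), (0.2, False)]
points = roc_points(toy_scores)
auc = area_under_curve(points)
```

A predictor that ranks pairs at random follows the diagonal baseline with an AUC of 0.5, so a curve lying further above the diagonal corresponds to better link prediction.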

7. Conclusion

In this paper, we present a study investigating how a user’s activities in the virtual world and the physical world impact each other. We use an information vector to represent each node and to store all the edges that are linked to it. We define popularity entropy to weigh each node’s popularity according to the number of nodes it connects to and the occurrence times of the edges linked to it. Then we calculate each edge’s weight, considering its linking nodes’ popularity entropy and their shared nodes’ features. In this way, we get a weighted graph and use it for link prediction. The results show that a user’s activities in the virtual world and the physical world do have an impact on each other, and that using both virtual world and physical world information can improve the accuracy of link prediction.

In the future, we plan to detect and analyze communities formed by the nodes of online social networks to see whether a user’s activities in the virtual and physical worlds have an impact on community dynamics. Building real applications that employ the results is also left as future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Basic Research Program of China (no. 2012CB316400), the National Natural Science Foundation of China (nos. 61103063, 61222209, 61373119, and 61332005), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20126102110043), and Microsoft Corporation.

References

[1] D. Zhang, B. Guo, and Z. Yu, “The emergence of social and community intelligence,” IEEE Computer, vol. 44, no. 7, pp. 21–28, 2011.

[2] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.

[3] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[4] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’08), pp. 298–307, ACM Press, November 2008.

[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1082–1090, ACM Press, August 2011.

[7] D. Lian and X. Xie, “Learning location naming from user check-in histories,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’11), pp. 112–121, ACM Press, November 2011.

[8] R. Xiang, J. Neville, and M. Rogati, “Modeling relationship strength in online social networks,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 981–990, ACM Press, April 2010.

[9] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogeneous networks,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM ’12), pp. 743–752, ACM Press, February 2012.

[10] M. Roth, D. Deutscher, G. Flysher, I. Horn, and A. Leichtberg, “Friends using the implicit social graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 233–242, ACM Press, July 2011.

[11] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in Proceedings of the 12th International Conference on Ubiquitous Computing (UbiComp ’10), pp. 119–128, ACM Press, October 2010.

[12] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1100–1108, ACM Press, August 2011.

[13] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), pp. 1046–1054, ACM Press, August 2011.

[14] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[15] H. Tong, C. Faloutsos, and J.-Y. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, IEEE Press, December 2006.


International Journal of Distributed Sensor Networks 9

and use it for link prediction The results show that userrsquosactivities in virtual world and physical world do have impacton each other andusing both virtualworld andphysical worldinformation can improve the accuracy of link prediction

In the future we plan to detect and analyze communitiesformed by the nodes of online social network to see whetheruserrsquos activities in virtual and physical world have impacton community dynamics Real applications that employ theresults will also be a future work of this study

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was partially supported by the National BasicResearch Program of China (no 2012CB316400) theNational Natural Science Foundation of China (nos61103063 61222209 61373119 and 61332005) the SpecializedResearch Fund for the Doctoral Program of HigherEducation (no 20126102110043) andMicrosoftCorporation

References

[1] D Zhang B Guo and Z Yu ldquoThe emergence of social andcommunity intelligencerdquo IEEE Computer vol 44 no 7 pp 21ndash28 2011

[2] M E J Newman ldquoThe structure and function of complexnetworksrdquo SIAM Review vol 45 no 2 pp 167ndash256 2003

[3] M McPherson L Smith-Lovin and J M Cook ldquoBirds of a fea-ther homophily in social networksrdquo Annual Review of Sociol-ogy vol 27 pp 415ndash444 2001

[4] D Liben-Nowell and J Kleinberg ldquoThe link-prediction prob-lem for social networksrdquo Journal of the American Society forInformation Science and Technology vol 58 no 7 pp 1019ndash10312007

[5] Q Li Y Zheng X Xie Y Chen W Liu andW-Y Ma ldquoMininguser similarity based on location historyrdquo in Proceedings of the16th ACM SIGSPATIAL International Conference on Advancesin Geographic Information Systems (GIS rsquo08) pp 298ndash307 ACMPress November 2008

[6] E Cho S A Myers and J Leskovec ldquoFriendship and mobilityuser movement in location-based social networksrdquo in Proceed-ings of the 17th ACM SIGKDD International Conference onKnowledgeDiscovery andDataMining (KDD rsquo11) pp 1082ndash1090ACM Press August 2011

[7] D Lian and X Xie ldquoLearning location naming from usercheck-in historiesrdquo in Proceedings of the 19th ACM SIGSPATIALInternational Conference onAdvances inGeographic InformationSystems (GIS rsquo11) pp 112ndash121 ACM Press November 2011

[8] R Xiang J Neville and M Rogati ldquoModeling relationshipstrength in online social networksrdquo in Proceedings of the 19thInternationalWorldWideWeb Conference (WWW rsquo10) pp 981ndash990 ACM Press April 2010

[9] J Tang T Lou and J Kleinberg ldquoInferring social ties acrossheterogeneous networksrdquo in Proceedings of the 5th ACM Inter-national Conference on Web Search and Data Mining (WSDMrsquo12) pp 743ndash752 ACM Press February 2012

[10] M Roth D Deutscher G Flysher I Horn and A LeichtbergldquoFriends using the implicit social graphrdquo in Proceedings ofthe ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining (KDD rsquo10) pp 233ndash242 ACM PressJuly 2011

[11] J Cranshaw E Toch J Hong A Kittur andN Sadeh ldquoBridgingthe gap between physical location and online social networksrdquoinProceedings of the 12th International Conference onUbiquitousComputing (UbiComp rsquo10) pp 119ndash128 ACM Press October2010

[12] D Wang D Pedreschi C Song F Giannotti and A-LBarabasi ldquoHuman mobility social ties and link predictionrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1100ndash1108 ACM Press August 2011

[13] S Scellato A Noulas and C Mascolo ldquoExploiting place fea-tures in link prediction on location-based social networksrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo11) pp 1046ndash1054 ACM Press August 2011

[14] G Salton and C Buckley ldquoTerm-weighting approaches in auto-matic text retrievalrdquo Information Processing and Managementvol 24 no 5 pp 513ndash523 1988

[15] H Tong C Faloutsos and J-Y Pan ldquoFast random walk withrestart and its applicationsrdquo in Proceedings of the 6th Interna-tional Conference on Data Mining (ICDM rsquo06) pp 613ndash622IEEE Press December 2006
