1
Measurement and Analysis of Online Social Networks
A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee
Presentation by Yong Wang
(Defense side)
My general opinion
• This is a brilliant paper.
2
Title
• Let’s recall the title of this course.
3
“There are no accidents”
Author
• Who is this guy?
Alan Mislove
At least six papers on OSNs published within two years at top-tier venues such as WWW, IMC, WOSN, and NSDI
We will read two of them in the next two months
This paper: 16 citations already within one year
From Rice Univ.
Social relationship
4
Contributions
• Introduce what online social networks are
definition: Section 2
• Measure online social networks at scale
data: Section 4
• Introduce static structural properties
results: Section 5
• Explain why online social networks are worth studying
impact: Section 5 + Section 6
5
What are (online) social networks?
• Social networks are graphs of people
Graph edges connect friends
• Online social networking
Social network hosted by a Web site
Friendship represents shared interest or trust
Online friends may have never met
6
Data -- Measure online social networks at scale
• This paper presents a large-scale measurement study and analysis on four online social networks containing over 11.3 million users and 328 million links.
7
8
Site                       YT                  Flickr              LJ     Orkut
Users (millions)           1.1                 1.8                 5.2    3
Links (millions)           4.9                 22                  72     223
Traffic ranking in Alexa   3                   34                  88     102
Coverage                   Rich media (video)  Rich media (photo)  Blog   “Pure” OSN
Data are representative
How large is the scale?
9
Site         Proportion  Paper
Orkut        11.3%       This paper
Orkut        0.3%        Analysis of Topological Characteristics of Huge Online Social Networking Services (WWW’07)
LiveJournal  95.4%       This paper
LiveJournal  0.08%       Group Formation in Large Social Networks: Membership, Growth, and Evolution (KDD’06)
10
Why study the graphs?
• Important for improving existing systems and developing new applications
– information search
• Web search: PeerSpective [HotNets’06]
– trusted users
• Trust can be used to solve security problems
• Multiple identity attacks: SybilGuard [SIGCOMM’06]
• Spam: RE [NSDI’06]
• Ostra: thwart unwanted communication [NSDI’08]
• Understanding network structure is a necessary first step
Information search: locating content
• Comparison between Google and OSNs
• How did Google come about, and how does it work?
11
Search on OSN
• The integration of search engines and online social networks could enable queries such as
• "Has any of my acquaintances been on holidays in New Zealand?" or
• "Recent articles on hypertext authored by people associated with Ted Nelson".
12
Results+Impacts
• Link symmetry
• Power-law node degrees
• Correlation of indegree and outdegree
• Path lengths and diameter
• Link degree correlations
• Densely connected core
• Tightly clustered fringe
• Groups
Link symmetry
• Social networks show high level of link symmetry
14
In the Web: CNN is “a dancing queen”, receiving many links that it does not return
Things are different in OSNs due to reciprocation
In the OSN world, “the dancing queen” may place links pointing back to the other “gentlemen”; not 100% of the time, but the likelihood is much higher
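Link symmetry can be measured as the fraction of directed links whose reverse link also exists. A minimal sketch, assuming the graph is given as a list of (source, target) pairs; the function name and toy graph are illustrative only, not taken from the paper:

```python
def link_symmetry(edges):
    """Return the fraction of directed links (u, v) whose reverse (v, u) also exists."""
    links = {(u, v) for u, v in edges if u != v}  # drop self-loops and duplicates
    if not links:
        return 0.0
    reciprocated = sum(1 for (u, v) in links if (v, u) in links)
    return reciprocated / len(links)

# Toy graph (hypothetical): a<->b is mutual, a->c and c->d are one-way.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(link_symmetry(toy_edges))  # 2 of the 4 links are reciprocated -> 0.5
```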
Implications of high symmetry
The implication is that ‘hubs’ become ‘authorities’
May impact search algorithms (PageRank, HITS)
Opens a research direction for others, e.g.
“The Karma of Digg: Reciprocity in Online Social Networks” by E. Sadlon et al. in 2008
15
Power-law node degrees
16
[Figure: U.S. highways vs. U.S. airlines]
Power-law node degrees
• In the Web (CNN vs. personal webpages)
• In OSNs: power-law as well; after all, an OSN is a second life
17
• In the Web, the indegree and outdegree power-law exponents differ significantly
• In the OSNs, the power-law exponents for the indegree and outdegree distributions in each of the social networks are very similar
Power-law node degrees
• The differences show that in the Web, the incoming links are significantly more concentrated on a few high-degree nodes than the outgoing links
18
In all social networks, distributions of incoming and outgoing links across the nodes are very similar.
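To compare indegree and outdegree exponents, one common option is a maximum-likelihood fit. The sketch below uses the continuous-approximation estimator alpha = 1 + n / sum(ln(x / x_min)); the slides do not say which fitting method the paper uses, and the data here are synthetic samples, not crawled degrees:

```python
import math
import random

def powerlaw_alpha_mle(values, x_min=1.0):
    """Continuous-approximation MLE of the exponent, using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

def sample_powerlaw(alpha, x_min, n, rng):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)
# Synthetic stand-ins for indegree and outdegree samples (assumed exponent 2.5).
indegrees = sample_powerlaw(2.5, 1.0, 20000, rng)
outdegrees = sample_powerlaw(2.5, 1.0, 20000, rng)
print(powerlaw_alpha_mle(indegrees), powerlaw_alpha_mle(outdegrees))
```

Similar estimates for the two samples correspond to the OSN case; clearly different estimates would correspond to the Web case.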
19
Implications of Power-law degrees
• Recognize the structure of OSNs: power-law
• Nodes with many incoming links (hubs) have value due to their connection to many users
• It becomes easy to spread important information to the other nodes, e.g. DNS
• To send spam, a user first has to become a more important node by amassing friends; this idea is introduced in
• “SybilGuard” [SIGCOMM’06] and
• “Ostra” [NSDI’08]
20
Correlation of indegree and outdegree
• In the Web, most nodes have considerably higher outdegree than indegree, while a small fraction of nodes have significantly higher indegree than outdegree. (CNN vs. personal webpages)
• In social networks, the nodes with very high outdegree also tend to have very high indegree
• Famous people who know lots of people are also known by lots of people
[Figure: personal webpages (PW) and CNN in the Web vs. OSN]
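One simple way to quantify the contrast above is the Pearson correlation between each node's indegree and outdegree. This is an illustrative sketch on two hypothetical toy graphs, not the paper's own analysis:

```python
import math
from collections import Counter

def degree_correlation(edges):
    """Pearson correlation between indegree and outdegree over all nodes."""
    outdeg, indeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(outdeg) | set(indeg)
    xs = [indeg[n] for n in nodes]   # Counter returns 0 for missing nodes
    ys = [outdeg[n] for n in nodes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all degrees are equal")
    return cov / math.sqrt(var_x * var_y)

# Web-like toy: personal pages point to "cnn", which points to nobody.
web_edges = [("p1", "cnn"), ("p2", "cnn"), ("p3", "cnn"), ("p1", "p2")]
# OSN-like toy: every link to the hub is reciprocated.
osn_edges = [(a, b) for u in ("a", "b", "c") for (a, b) in ((u, "hub"), ("hub", u))]
print(degree_correlation(web_edges), degree_correlation(osn_edges))
```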
Implications of Correlation of indegree and outdegree
• The high correlation between indegree and outdegree in social networks can be explained by the high number of symmetric links
• The high symmetry may be due to the tendency of users to reciprocate links from other users who point to them.
21
Information search: high symmetry makes it harder to identify reputable sources due to dilution
Possible solution: consider who initiated the link
22
Path lengths and diameter
• All four networks have short average path lengths, from 4.25 to 5.88
• Six degrees of separation
Facebook, 4.2 million for October 2007: 6.12, from http://blog.paulwalk.net/2007/10/08/no-degrees-of-separation/
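Average path length and diameter can be computed with breadth-first search from every node. The sketch below does this exhaustively on a hypothetical toy graph; on graphs with millions of nodes one would run BFS from a sample of nodes instead, and the adjacency-dict representation is an assumption:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def path_stats(adj):
    """Average shortest-path length and diameter over all reachable ordered pairs."""
    total = pairs = diameter = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter

# Toy chain a - b - c - d (undirected, so each link is listed in both directions).
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_length, diameter = path_stats(chain)
print(avg_length, diameter)
```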
23
Implications of Path lengths and diameter
The small diameter and path lengths of social networks are likely to impact the design of techniques for finding paths in such networks
24
Link degree correlations
• Do high-degree nodes tend to connect to other high-degree nodes? OR
• Do high-degree nodes tend to connect to low-degree nodes?
• In real society: the former is true.
• Measured with two metrics: the scale-free metric and the assortativity.
• The results suggest that there exists a tightly connected “core” of high-degree nodes which connect to each other, with the lower-degree nodes on the fringes of the network.
• The next question: how big is the core?
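The assortativity mentioned above can be sketched as the Pearson correlation between the degrees at the two ends of each link: positive when high-degree nodes link to high-degree nodes, negative when they link to low-degree nodes. This is a minimal undirected version on toy graphs; the paper's exact variant may differ:

```python
from collections import Counter

def assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    if not edges:
        raise ValueError("graph has no edges")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all nodes have the same degree; assortativity is undefined")
    return cov / var

star = [("hub", "a"), ("hub", "b"), ("hub", "c")]   # hub links only to low-degree nodes
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(assortativity(star), assortativity(path))
```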
25
Implications of Link degree correlations
Spread of information
“A Measurement-driven Analysis of Information Propagation in the Flickr Social Network” [WWW’ 09]
26
Densely connected core
• The graphs have a densely connected core comprising between 1% and 10% of the highest-degree nodes, such that removing this core completely disconnects the graph.
Sub logarithmic growth
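The core-removal experiment can be sketched as: remove the top fraction of nodes by degree, then check how much of the remaining graph is still in one connected component. The star graph and helper names below are hypothetical, chosen only to make the effect visible:

```python
from collections import deque

def top_degree_nodes(adj, fraction):
    """Return the set of the top `fraction` of nodes ranked by degree (at least one)."""
    k = max(1, round(fraction * len(adj)))
    ranked = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    return set(ranked[:k])

def largest_component_fraction(adj, removed):
    """Fraction of the remaining nodes that lie in the largest connected component."""
    remaining = set(adj) - removed
    seen, best = set(), 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(remaining) if remaining else 0.0

# Toy graph: one hub connected to nine low-degree users (undirected adjacency sets).
leaves = [f"u{i}" for i in range(9)]
star = {"hub": set(leaves)}
star.update({leaf: {"hub"} for leaf in leaves})
core = top_degree_nodes(star, 0.10)
print(largest_component_fraction(star, set()))  # whole graph is connected
print(largest_component_fraction(star, core))   # leaves are isolated once the hub is gone
```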
Implications of densely connected core
• Network contains dense core of users
Core necessary for connectivity of 90% of users
Most short paths pass through core
Could be used for quickly disseminating information
• So 10% are at the core
• What about the remaining nodes (90% at the fringe)?
27
28
Tightly clustered fringe
• Clustering coefficient of a network:
How many of your friends are also friends themselves?
• Social network graphs show stronger clustering, most likely because people tend to be introduced to other people via mutual friends, increasing the probability that two friends of a single user are also friends.
Are the fringes more clustered?
The clustering coefficient is higher for nodes of low degree
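The clustering coefficient of a single user is the fraction of pairs of that user's friends who are themselves friends. A minimal sketch for an undirected graph stored as adjacency sets (the toy graph is hypothetical):

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of `node`'s friends that are themselves friends."""
    friends = adj[node]
    if len(friends) < 2:
        return 0.0  # fewer than two friends: no pairs to check
    pairs = list(combinations(friends, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

# Toy graph: triangle a-b-c, plus d attached only to a.
friends_of = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for user in sorted(friends_of):
    print(user, len(friends_of[user]), local_clustering(friends_of, user))
```

In this toy graph the degree-2 users b and c have a higher coefficient than the degree-3 user a, which is the same direction as the trend on the slide.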
Implications of Tightly clustered fringe
• Fringe is highly clustered
Users with few friends form mini-cliques
Similar to previously observed offline behavior
Could be leveraged for sharing information of local interest
29
30
Groups
• Group sizes follow a power-law distribution
• The members of smaller user groups tend to be more clustered than those of larger groups
31
Groups
• Low-degree nodes tend to be part of very few communities, while high-degree nodes tend to be members of multiple groups.
Implications of Groups
• “To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles” [WWW’ 09]
32
• Finally, the paper gives details and reasons for all deviations, which is good
33
What does the structure look like?
The networks contain a densely connected core of high-degree nodes,
and this core links small groups of strongly clustered, low-degree nodes at the fringes of the network.
Octopus
Two stories
• This paper shows its brilliance in the same way
35