Properties of network community structure
Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta, Michael Mahoney, Yahoo! Research
2
Networks
• Big data
• Study emerging behaviors
• How are small networks different from large ones?
3
Network community structure
Communities (groups, clusters, modules): sets of nodes with many connections inside and few to the outside (the rest of the network)
4
Example 1: Biology
• Nodes represent proteins
• Edges represent interactions/associations
• Proteins with the same function interact more
• Can use the network to discover functional groups
Yeast transcriptional regulatory modules [Bar-Joseph et al., 2003]
5
Example 2: Social networks
Clusters correspond to social communities, organizational units (e.g., departments)
Zachary’s Karate club network
• During the study the club split into 2
• The split corresponds to the min-cut (● vs. ■)
6
Example 3: Web (blogs)
Democrat vs. Republican blogs [Adamic-Glance 2005]
7
Example 4: Science
Citations Collaborations
[Newman 2003]
8
Example 5: Hierarchies
Nested communities: modular structure of networks is hierarchically organized
CS Math Drama Music
Science Arts
University
9
Example 6: Hierarchies
Recursive hierarchical network
(a) N=5, E=8
(b) N=25, E=56
(c) N=125, E=344
10
Discovering communities
Intuition: find nodes that can be easily separated from the rest of the network
Various objective functions:
• Min-cut
• Normalized cut
• Centrality, modularity
Various algorithms:
• Spectral clustering (random walks)
• Girvan-Newman (centrality)
• Metis (contraction based)
Girvan-Newman:
1) Betweenness centrality: number of shortest paths passing through an edge
2) Remove edges in order of decreasing centrality
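The two Girvan-Newman steps above can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the graph representation (a dict mapping each node to a set of neighbors) and the function names are assumptions made here, betweenness is recomputed after every removal, and the loop stops at the first split instead of building a full hierarchy.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an undirected, unweighted graph.
    adj: dict node -> set of neighbours (no self-loops assumed)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair of endpoints was counted from both sides.
    return {e: s / 2.0 for e, s in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        if not eb:  # no edges left to remove
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

On two triangles joined by a single edge, the joining edge carries all nine cross-triangle shortest paths, so it is removed first and the two triangles come out as the communities.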
11
Our question: How well are communities expressed?
12
Our question: How well are communities expressed?
Statistical properties of community structure
Instead of searching for communities, we measure how well communities are expressed
Questions:
• What is the community structure of real-world networks?
• How to measure and quantify this?
• What does this tell us about network structure?
• What is a good model (intuition)?
• What are the consequences for clustering/partitioning algorithms?
13
Community score (quality)
How community-like is a set of nodes?
Need a natural, intuitive measure
Conductance (normalized cut): Φ(S) = # edges cut / # edges inside
Small Φ(S) corresponds to more community-like sets of nodes
[Figure: node sets S and S’]
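The score can be computed directly from an edge list. The sketch below follows the formula exactly as written on the slide (edges cut divided by edges inside); the function name, the edge-list representation, and the choice to return infinity for a set with no internal edges are assumptions made for this illustration.

```python
def community_score(edges, S):
    """Phi(S) = (# edges cut) / (# edges inside), as written on the slide.

    edges: iterable of undirected (u, v) pairs
    S:     iterable of nodes forming the candidate community
    """
    S = set(S)
    edges = list(edges)
    # An edge is cut when exactly one endpoint lies in S.
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    # An edge is inside when both endpoints lie in S.
    inside = sum(1 for u, v in edges if u in S and v in S)
    # Assumption: a set with no internal edges gets the worst possible score.
    return float("inf") if inside == 0 else cut / inside


# A triangle {0, 1, 2} with a two-node tail hanging off node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
triangle_score = community_score(example_edges, {0, 1, 2})  # 1 cut / 3 inside
```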
14
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
What is the “best” community of 5 nodes?
15
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/6 = 0.83
What is the “best” community of 5 nodes?
16
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/7 = 0.7
Better community: Φ = 2/5 = 0.4
What is the “best” community of 5 nodes?
17
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/7 = 0.7
Better community: Φ = 2/5 = 0.4
Best community: Φ = 2/8 = 0.25
What is the “best” community of 5 nodes?
18
Network Community Profile Plot
We define the network community profile (NCP) plot: plot the score of the best community of size k
• Search over all subsets of size k and find the best: Φ(k=5) = 0.25
• The NCP plot is intractable to compute
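To make the definition concrete, here is a brute-force sketch that enumerates every subset of size k and keeps the best score. It uses the slide's score (edges cut over edges inside); the function names and the toy graph are assumptions for illustration. The number of subsets grows combinatorially with the number of nodes, which is why this only works on tiny graphs.

```python
from itertools import combinations

def score(edges, S):
    """Slide's community score: edges cut / edges inside (inf if none inside)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return float("inf") if inside == 0 else cut / inside

def ncp_brute_force(nodes, edges, max_k):
    """Best (lowest) score over all node subsets of each size k = 1..max_k."""
    profile = {}
    for k in range(1, max_k + 1):
        profile[k] = min(score(edges, set(c)) for c in combinations(nodes, k))
    return profile


# Toy graph: two triangles joined by the edge (2, 3).
toy_nodes = range(6)
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
toy_profile = ncp_brute_force(toy_nodes, toy_edges, max_k=3)
```

In this toy graph the best set of size 3 is one of the triangles (one edge cut, three inside), while no single node can score better than infinity because it has no internal edges.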
19
Network Community Profile Plot
We define the network community profile (NCP) plot: plot the score of the best community of size k
[Figure: NCP plot. x-axis: community size, log k; y-axis: community score, log Φ(k). Marked points: k=5, Φ(k)=0.25; k=7, Φ(k)=0.18]
20
NCP: Example
[Figure: NCP plot. x-axis: community size, log k; y-axis: community score, log Φ(k)]
21
Scaling to large networks
Local spectral clustering algorithm:
• Pick a seed node
• Slowly diffuse mass around it (via a PageRank-like random walk)
• Find the bottleneck
Repeat many times:
• Many seed nodes for very local walks
• Fewer seed nodes for more global (longer) walks
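The seed, diffuse, and bottleneck steps can be sketched as a personalized PageRank followed by a sweep cut. This is an illustrative stand-in, not the authors' code: it uses plain power iteration rather than a fast local approximation, and the sweep scores each prefix with the standard volume-based conductance, cut / min(vol(S), vol(rest)), which differs from the simplified score on the earlier slides. The function names and the parameters `alpha` and `iters` are assumptions.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Random walk that restarts at `seed` with probability alpha.
    adj: dict node -> set of neighbours; assumes no isolated nodes."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # restart mass always returns to the seed
        for v, mass in p.items():
            if mass == 0.0:
                continue
            share = (1.0 - alpha) * mass / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        p = nxt
    return p

def sweep_cut(adj, p):
    """Order nodes by p[v]/deg(v) and return the prefix with lowest conductance."""
    deg = {v: len(adj[v]) for v in adj}
    total_vol = sum(deg.values())
    order = sorted((v for v in adj if p[v] > 0.0),
                   key=lambda v: p[v] / deg[v], reverse=True)
    S, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order:
        inside_nbrs = sum(1 for w in adj[v] if w in S)
        # Edges from v into S stop being cut; v's other edges become cut.
        cut += deg[v] - 2 * inside_nbrs
        vol += deg[v]
        S.add(v)
        denom = min(vol, total_vol - vol)
        if denom == 0:  # S covers the whole graph: not a valid cut
            continue
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(S), phi
    return best_set, best_phi
```

Running this from many seeds and with different restart probabilities yields candidate communities at different sizes; keeping the lowest score seen at each size gives an approximate NCP plot.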
22
What do NCP plots of real networks look like?
23
NCP plot: Small Social Network
Dolphin social network
Two communities of dolphins
[Figures: network; NCP plot]
24
NCP plot: Zachary’s karate club
Zachary’s university karate club social network
• During the study the club split into 2
• The split (squares vs. circles) corresponds to cut B
[Figures: network; NCP plot]
25
NCP plot: Network Science
Collaborations between scientists in Networks
[Figures: network; NCP plot]
26
NCP plot: Grids
[Figures: network; NCP plot]
27
NCP plot: Graphs with geometry
[Figures: network; NCP plot]
28
NCP plot: Manifold dataset
Manifold learning dataset (Hands)
[Figures: network; NCP plot]
29
NCP plot: Power Grid
Eastern US power grid:
30
NCP plot: Hierarchical network
[Figures: network; NCP plot]
Small social networks, geometric networks, and hierarchical networks have downward-sloping NCP plots
What about large networks?
31
NCP of Large Networks
32
Our work: Large networks
Previously researchers examined community structure of small networks (~100 nodes)
We examined more than 70 different large networks
Large real-world networks look very different!
33
Example of our findings
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
34
NCP: LiveJournal (N=5M, E=42M)
[Figure: NCP plot. x-axis: community size; y-axis: community score]
• Better and better communities
• Best communities get worse and worse
• Best community has 100 nodes
35
Explanation: Downward part
Whiskers are responsible for the downward slope of the NCP plot
A whisker is a set of nodes connected to the rest of the network by a single edge
[Figure: NCP plot, with the largest whisker marked]
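One way to find whiskers is to look for bridge edges, that is, single edges whose removal disconnects part of the graph. The sketch below tests each edge by removing it and checking reachability, which is slow but easy to follow. It assumes a connected graph with orderable node labels, treats the smaller side of each bridge as the whisker (a convention chosen here), and keeps only whiskers not contained in a larger one; the function names are illustrative.

```python
def reachable(adj, start, banned_edge):
    """Nodes reachable from `start` without crossing `banned_edge`."""
    a, b = banned_edge
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if {v, w} == {a, b}:
                continue  # pretend this edge has been removed
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def find_whiskers(adj):
    """Maximal node sets attached to the rest of the graph by a single edge.
    adj: dict node -> set of neighbours; graph assumed connected."""
    all_nodes = set(adj)
    candidates = []
    for u in adj:
        for v in adj[u]:
            if not u < v:  # visit each undirected edge once
                continue
            side = reachable(adj, u, (u, v))
            if v in side:
                continue  # still connected, so (u, v) is not a bridge
            other = all_nodes - side
            # Assumption: the smaller side of a bridge is the whisker.
            small = side if len(side) <= len(other) else other
            candidates.append(frozenset(small))
    # Drop whiskers that are strictly contained in a larger whisker.
    return [w for w in candidates if not any(w < x for x in candidates)]
```

For a 4-clique with a three-node tree hanging off one node and a single pendant node hanging off another, this returns exactly those two attachments.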
36
Explanation: Upward part
Each new edge inside the community costs more
Each node has twice as many children
[Figure: NCP plot, with growing communities marked]
• Φ = 1/3 ≈ 0.33
• Φ = 2/4 = 0.5
• Φ = 8/6 ≈ 1.3
• Φ = 64/14 ≈ 4.6
37
Comparison to rewired network
• Take a real network G
• Rewire edges for a long time
• We obtain a random graph with the same degree distribution as the real network G
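A common way to rewire while keeping every node's degree is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), rejecting swaps that would create a self-loop or a duplicate edge. The sketch below is one such procedure under that assumption; the slides do not say which rewiring scheme was used, and the function name, the `seed` parameter, and the attempt cap are illustrative.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring via repeated double edge swaps.
    edges: list of undirected (u, v) pairs without self-loops or duplicates
           (at least two edges are needed)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    # Cap the attempts so dense graphs with few valid swaps still terminate.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

Each accepted swap leaves the degree of a, b, c, and d unchanged, so after many swaps the graph is randomized but has the same degree sequence as the input.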
38
Comparison to a rewired network
Rewired network: random network with the same degree distribution
39
LiveJournal whisker sizes
Whiskers in real networks are larger than expected
40
Whisker shapes
Whiskers in real networks are non-trivial (richer than trees)
[Figure label: edge to cut]
41
Caveat: Bag of whiskers
What if we allow cuts that give disconnected communities?
• Cut all whiskers
• Compose communities out of whiskers
• How good are the resulting “communities”?
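The arithmetic behind composing whiskers is simple: each whisker hangs off the network by one edge, so a union of j whiskers cuts j edges and contains the sum of their internal edges. The sketch below greedily adds whiskers in order of most internal edges and reports the slide's score (edges cut over edges inside) at each size. The greedy order, the input format, and the function name are assumptions for illustration; the slides do not specify how whiskers were combined.

```python
def bag_of_whiskers_profile(whiskers):
    """Scores of unions of whiskers, built greedily.

    whiskers: list of (n_nodes, n_internal_edges) pairs, each whisker
              assumed to attach to the rest of the network by one edge.
    Returns a list of (total_nodes, score) pairs.
    """
    # Assumption: add whiskers with the most internal edges first.
    ordered = sorted(whiskers, key=lambda w: w[1], reverse=True)
    profile = []
    nodes = inside = cut = 0
    for n_nodes, n_edges in ordered:
        nodes += n_nodes
        inside += n_edges
        cut += 1  # one attaching edge per whisker
        score = float("inf") if inside == 0 else cut / inside
        profile.append((nodes, score))
    return profile


demo_profile = bag_of_whiskers_profile([(3, 3), (2, 1), (4, 5)])
```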
42
Communities made of whiskers
[Figure: NCP plot. x-axis: community size; y-axis: community score. Curves: connected communities; bag of whiskers]
We get better community scores when composing disconnected sets of whiskers
43
What if I remove whiskers?
Nothing happens!
Now we have 2-edge-connected whiskers to deal with.
44
Conclusion: Core+Whiskers
[Figure: NCP plot. Curves: connected communities; bag of whiskers; rewired network]
45
Conclusion: Network structure
Network structure: core-periphery (jellyfish, octopus)
• Whiskers are responsible for good communities
• Denser and denser core of the network
• Core contains 60% of the nodes and 80% of the edges
46
What is a good model? Intuition?
47
(Sparse) Random graphs
(Sparse) random graph:
• Start with N nodes
• Pick pairs of nodes uniformly at random and connect them
NCP plot: flat (long random connections)
Theorem (works for any degree distribution)
Sparsity does not explain our observation
48
Preferential attachment
Preferential attachment [Price 1965, Albert & Barabasi 1999]:
• Add a new node, create m out-links
• The probability of linking to a node i is proportional to its degree k_i
• Based on Herbert Simon’s result: power laws arise from “rich get richer” (cumulative advantage)
NCP plot: flat (connections to hubs, no locality)
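The attachment rule can be sketched with a list in which every node appears once per incident edge, so that drawing uniformly from the list selects a node with probability proportional to its degree. The starting graph (a small clique on m+1 nodes), the undirected treatment of links, and the function name are assumptions for this illustration.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to degree (requires n > m >= 1)."""
    rng = random.Random(seed)
    # Assumption: start from a clique on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional draw
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges
```

Because new nodes attach to well-connected hubs regardless of where those hubs sit, the resulting graph has no locality, which matches the flat NCP plot noted above.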
49
Small-World model
Let’s exploit local connections
NCP plot: down (locally the network looks like a mesh) and then flat (at large scale the network looks random)
50
Geometry + Preferential attachment
Geometric preferential attachment:
• Place nodes at random in 2D
• Pick a node
• Pick nodes in a radius
• Connect preferentially
NCP plot: flat (locally the network is random) and then down (globally the network is a mesh – union of local expanders)
51
Model that works: Forest Fire
Forest Fire: connections spread like a fire
• New node joins the network
• Selects a seed node
• Connects to some of its neighbors
• Continues recursively
As a community grows, it blends into the core of the network
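The steps above can be sketched as follows. This is a simplified, undirected variant written for illustration: each neighbor of a burning node catches fire independently with probability `p`, whereas the published Forest Fire model draws the number of links to follow from a geometric distribution and distinguishes in-links from out-links. The function name and parameters are assumptions.

```python
import random

def forest_fire(n, p=0.35, seed=0):
    """Grow an undirected graph on n nodes by a simplified forest-fire process.
    Returns a dict node -> set of neighbours."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        # The new node picks a seed (ambassador) uniformly at random.
        ambassador = rng.randrange(new)
        burned = {ambassador}
        frontier = [ambassador]
        # The fire spreads recursively through neighbours of burned nodes.
        while frontier:
            v = frontier.pop()
            for w in adj[v]:
                if w not in burned and rng.random() < p:
                    burned.add(w)
                    frontier.append(w)
        # The new node links to every node the fire reached.
        adj[new] = set()
        for v in burned:
            adj[new].add(v)
            adj[v].add(new)
    return adj
```

With small `p` the fire dies quickly and new nodes form small, whisker-like attachments; with larger `p` it reaches deep into the existing graph, which is one way to picture communities blending into the core as they grow.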
52
Forest Fire NCP plot
[Figure: NCP plot. Curves: rewired network; bag of whiskers]
53
Conclusion and connections
Whiskers:
• Largest whisker has ~100 nodes
• Independent of network size
• Dunbar number: a person can maintain social relationships with at most 150 people
Core:
• Core has little structure (hard to cut)
• Still more structure than the random network
54
Connections to previous work
Other researchers examined small networks, so they did not hit Dunbar’s limit
Some evidence: the 400k-node Amazon co-purchasing network [Clauset et al. 2004]
▪ Largest community has 50% of all nodes
▪ It was labeled “Miscellaneous”
Karate club has no significant community structure [Newman et al. 2007]
55
Other explanations
• Bond vs. identity communities
• Multiple hierarchies that blur the community boundaries
56
Is there still hope?
Ground truth
Yes: use attributes, better link semantics
57
Conclusion and connections
• The NCP plot is a way to analyze network community structure
• Our results agree with previous work on small networks (that are commonly used for testing community-finding algorithms)
• But large networks are different
Large networks:
• Whiskers + core structure
• Small, well-isolated communities blend into the core of the network as they grow