46
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

http://cs224w.stanford.edu

Page 2: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

2

Network Adjacency matrix

Nodes

Nod

es

Page 3: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Non-overlapping vs. overlapping communities

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 3

Page 4: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

A node belongs to many social “circles’

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 4

[Palla et al., ‘05]

Page 5: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

5

High school Company

Stanford (Squash) Stanford (Basketball)

Page 6: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 6

Page 7: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Two nodes belong to the same community if they can be connected through adjacent k-cliques: k-clique: Fully connected

graph on k nodes

Adjacent k-cliques: overlap in k-1 nodes

k-clique community Set of nodes that can

be reached through a sequence of adjacent k-cliques

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 7

3-clique Adjacent 3-cliques

[Palla et al., ‘05]

Non-adjacent 3-cliques

Page 8: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Two nodes belong to the same community if they can be connected through adjacent k-cliques:

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 8

4-clique

Adjacent 4-cliques

Communities for k=4

[Palla et al., ‘05]

Non-adjacent 4-cliques

Page 9: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Clique Percolation Method: Find maximal-cliques

(not k-cliques!) Clique overlap graph: Each clique is a node Connect two cliques if they

overlap in at least k-1 nodes Communities: Connected components of

the clique overlap matrix How to set k? Set k so that we get the “richest” (most widely

distributed cluster sizes) community structure

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9

A

C D B

A

C D B

Cliques Communities

Set: k=3

Page 10: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 10

(1) Graph (2) Clique overlap matrix

(3) Thresholded matrix at 3

(4) Communities (connected components)

Start with graph Find maximal

cliques Create clique

overlap matrix Threshold the

matrix at value k-1 If 𝑎𝑎𝑖𝑖𝑖𝑖 < 𝑘𝑘 − 1 set 0

Communities are the connected components of the thresholded matrix

Cliques

Cliq

ues

Overlap size

Page 11: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11

Communities in a “tiny” part of a phone call network of 4 million users [Palla et al., ‘07]

[Palla et al., ‘07]

Page 12: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

No nice way, hard combinatorial problem Maximal clique: Clique that can’t be extended {𝑎𝑎, 𝑏𝑏, 𝑐𝑐} is a clique but not maximal clique {𝑎𝑎, 𝑏𝑏, 𝑐𝑐,𝑑𝑑} is maximal clique

Algorithm: Sketch Start with a seed node Expand the clique around the seed Once the clique cannot be further

expanded we found the maximal clique Note: This will generate the same clique multiple times

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 13

Page 13: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Start with a seed vertex 𝒂𝒂 Goal: Find the max clique 𝑸𝑸 that 𝒂𝒂 belongs to Observation: If some 𝒙𝒙 belongs to 𝑸𝑸 then it is a neighbor of 𝒂𝒂 Why? If 𝒂𝒂,𝒙𝒙 ∈ 𝑸𝑸 but edge (𝒂𝒂,𝒙𝒙) does not exist, 𝑸𝑸 is not a clique!

Recursive algorithm: 𝑸𝑸 … current clique 𝑹𝑹 … candidate vertices to expand the clique to

Example: Start with 𝒂𝒂 and expand around it

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 14

Q= {a} {a,b} {a,b,c} bktrack {a,b,d} R= {b,c,d} {b,c,d} {c,d}∩Γ(c)={} {c}∩Γ(d)={} ∩Γ(b)={c,d} Steps of the recursive algorithm Γ(u)…neighbor set of u

Page 14: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Start with a seed vertex 𝒂𝒂 Goal: Find the max clique 𝑸𝑸 that 𝒂𝒂 belongs to Observation: If some 𝒙𝒙 belongs to 𝑸𝑸 then it is a neighbor of 𝒂𝒂 Why? If 𝒂𝒂,𝒙𝒙 ∈ 𝑸𝑸 but edge (𝒂𝒂,𝒙𝒙) does not exist, 𝑸𝑸 is not a clique!

Recursive algorithm: 𝑸𝑸 … current clique 𝑹𝑹 … candidate vertices to expand the clique to

Example: Start with 𝒂𝒂 and expand around it

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 15

Q= {a} {a,b} {a,b,c} bktrack {a,b,d} R= {b,c,d} {b,c,d} {d}∩Γ(c)={} {c}∩Γ(d)={} ∩Γ(b)={c,d} Steps of the recursive algorithm Γ(u)…neighbor set of u

Page 15: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

𝑸𝑸 … current clique 𝑹𝑹 … candidate vertices

Expand(R,Q) while R ≠ {} p = vertex in R

Qp = Q ∪ {p} Rp = R ∩ Γ(p) if Rp ≠ {}: Expand(Rp,Qp) else: output Qp R = R – {p}

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 16

Start: Expand(V, {}) R={a…f}, Q={} p = {a} Qp = {a} Rp = {b,d} Expand(Rp, Q): R = {b,d}, Q={a} p = {b} Qp = {a,b} Rp = {d} Expand(Rp, Q): R = {d}, Q={a,b} p = {d} Qp = {a,b,d} Rp = {} : output {a,b,d} p = {d} Qp = {a,d} Rp = {b} Expand(Rp, Q): R = {b}, Q={a,d} p = {b} Qp = {a,d} Rp = {} : output {a,d,b}

Page 16: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Announcement: On Thu we will have Lars Backstrom from Facebook Data Science team give a guest lecture!

Page 17: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

How should we think about large scale organization of clusters in networks? Finding: There is a lack of (traditional) clustering

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 20 Nested Core-Periphery

Page 18: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

How do we reconcile these two views? (and still do community detection)

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 21

vs.

Page 19: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

How community like is a set of nodes? A good cluster S has Many edges internally Few edges pointing outside

Simplest objective function: Conductance Small conductance corresponds to good clusters (Note |S| < |V|/2)

22

S

S’

∑∈

∉∈∈=

Sssd

SjSiEjiS |},;),{(|)(φ

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 20: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Define: Network community profile (NCP) plot

Plot the score of best community of size k

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 23

Community size, log k

log Φ(k)

k=5 k=7

[WWW ‘08]

k=10

(Note |S| < |V|/2)

Page 21: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 24

• Run the favorite clustering method • Each dot represents a cluster • For each size find “best” cluster

Cluster size, log k

Clu

ster

scor

e, lo

g Φ

(k)

Spectral Graclus

Metis

Page 22: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Meshes, grids, dense random graphs:

Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 25

d-dimensional meshes California road network

11/19/2013

[WWW ‘08]

Page 23: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Collaborations between scientists in networks [Newman, 2005]

26

Community size, log k

Cond

ucta

nce,

log Φ

(k)

Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11/19/2013

[WWW ‘08]

Page 24: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Natural hypothesis about NCP: NCP of real networks slopes

downward Slope of the NCP corresponds

to the “dimensionality“ of the network

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 27

What about large networks?

[Internet Mathematics ‘09]

Page 25: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Typical example: General Relativity collaborations (n=4,158, m=13,422)

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 28

[Internet Mathematics ‘09]

Page 26: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 29

[Internet Mathematics ‘09]

Page 27: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Φ(k

), (s

core

)

k, (cluster size) 30

Better and better clusters

Clusters get worse and worse

Best cluster has ~100 nodes

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 28: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

As clusters grow the number of edges inside grows slower that the number crossing

31

Φ=2/10 = 0.2

Each node has twice as many children

Φ=1/7=0.14

Φ=8/20 = 0.4

Φ=64/92 = 0.69

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 29: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Empirically we note that best clusters are barely connected to the network

32

NCP plot

⇒ Core-periphery structure 11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 30: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

33

Nothing happens! ⇒ Nestedness of the

core-periphery structure 11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 31: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Nested Core-Periphery (jellyfish, octopus)

Whiskers are responsible for

good communities

Denser and denser core

of the network

Core contains 60% node and

80% edges

34 11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 32: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

35

vs.

How do we reconcile these two views?

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Page 33: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Many methods for overlapping communities Clique percolation [Palla et al. ‘05] Link clustering [Ahn et al. ‘10] [Evans et al.‘09] Clique expansion [Lee et al. ‘10] Mixed membership stochastic block models

[Airoldi et al. ’08] Bayesian matrix factorization [Psorakis et al. ‘11]

What do these methods assume about community overlaps?

36

Page 34: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Many overlapping community detection methods make an implicit assumption:

Edge probability decreases with the number of shared communities

37

Network Adjacency matrix

Nodes

Nod

es

Is this true?

Page 35: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Basic question: nodes u, v share k communities What’s the edge probability?

38

LiveJournal social network

Amazon product network

Page 36: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Edge density in the overlaps is higher!

39

Communities as “tiles”

“The more different foci (communities) that two individuals share, the more likely it is that they will be tied” - S. Feld, 1981

Page 37: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

40 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11/19/2013

Communities as overlapping tiles Web of affiliations [Simmel ‘64]

Page 38: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

41

What does this mean?

Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11/19/2013

Ganovetter and all non-overlapping

methods

Palla et al., MMSB and other

overlapping methods as well

Page 39: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Many methods fail to detect dense overlaps: Clique percolation, …

42 Clique percolation

Page 40: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Generative model: How is a network generated from community affiliations?

Model parameters: Nodes V, Communities C, Memberships M Each community c has a single probability pc

43

Communities, C

Nodes, V Community Affiliation Network

Model pA pB

Memberships, M

Page 41: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Given parameters (V, C, M, {pc}) Nodes in community c connect to each other by

flipping a coin with probability pc

Nodes that belong to multiple communities have multiple coin flips: Dense community overlaps

44

Communities, C

Nodes, V

Community Affiliation Network

Model pA pB

Memberships, M

Page 42: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

45

Model

Network

Page 43: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

AGM is flexible and can express variety of network structures: Non-overlapping, Nested, Overlapping

11/19/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 46

Page 44: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

47 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11/19/2013

Page 45: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

48 LiveJournal social network

Page 46: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2013/slides/17-overlapping.pdfFind maximal-cliques (not k-cliques!) Clique overlap graph: Each

Primary core: Foodwebs, Web-graph, Social Secondary cores: PPI, Products

49

Foodweb PPI