
Community structure in complex networks


Overview of my work in community detection, showing how to address the resolution limit and how to assess the significance of a partition.


Community structure in complex networks

V.A. Traag

KITLV, Leiden, the Netherlands; e-Humanities, KNAW, Amsterdam, the Netherlands

February 21, 2014

Royal Netherlands Academy of Arts and Sciences, Humanities


Overview

1 What are communities in networks? How do we find them?

2 Where are those small communities?

3 When are communities significant?

4 What should I remember? And what’s next?


Part I: What are communities? How do we find them?


What is a community?

• Everybody has an intuitive idea.

• Yet there is no single agreed-upon definition.

• Common core: groups of nodes that are
  ◦ relatively densely connected within, and
  ◦ relatively sparsely connected between groups.


General community detection

• Reward links inside a community, with weight $a_{ij}$.

• Punish missing links inside a community, with weight $b_{ij}$.

• General quality function

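$$\mathcal{H} = \sum_{ij} \left( A_{ij} a_{ij} - (1 - A_{ij}) b_{ij} \right) \delta(\sigma_i, \sigma_j),$$

where $A_{ij}$ is the adjacency matrix, $\sigma_i$ is the community of node $i$, and $\delta(\sigma_i, \sigma_j) = 1$ if $\sigma_i = \sigma_j$ and $0$ otherwise.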

[Figure: example network with nodes labelled 0 to 11.]
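As a minimal sketch (plain Python; the function name `quality` and the weight callables `a` and `b` are hypothetical), the general quality function can be evaluated directly from a 0/1 adjacency matrix and a list of community labels. The double loop runs over all ordered pairs $(i, j)$, including $i = j$, exactly as in the formula.

```python
from typing import Callable, List, Sequence

Weight = Callable[[int, int], float]


def quality(A: Sequence[Sequence[int]], membership: List[int],
            a: Weight, b: Weight) -> float:
    """Evaluate H = sum_ij (A_ij a_ij - (1 - A_ij) b_ij) delta(sigma_i, sigma_j).

    A is a 0/1 adjacency matrix, membership[i] is the community sigma_i of
    node i, and a(i, j), b(i, j) return the weights a_ij and b_ij.
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] != membership[j]:
                continue  # delta(sigma_i, sigma_j) = 0: the pair contributes nothing
            # Reward a present link, punish a missing link inside the community.
            total += A[i][j] * a(i, j) - (1 - A[i][j]) * b(i, j)
    return total


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3, scored with
# constant weights a_ij = 1 - gamma, b_ij = gamma (the Constant Potts Model row
# listed under "Different weights").
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

gamma = 0.5
split = quality(A, [0, 0, 0, 1, 1, 1], lambda i, j: 1 - gamma, lambda i, j: gamma)
merged = quality(A, [0] * 6, lambda i, j: 1 - gamma, lambda i, j: gamma)
print(split, merged)  # 3.0 -4.0
```

At this resolution the two triangles score higher as separate communities than merged into one.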


Different weights

There are no a-priori constraints on the weights $a_{ij}$ and $b_{ij}$; different choices give different methods:

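| Model | $a_{ij}$ | $b_{ij}$ |
|---|---|---|
| Reichardt & Bornholdt | $1 - b_{ij}$ | $\gamma p_{ij}$ |
| Arenas, Fernández & Gómez | $1 - b_{ij}$ | $p_{ij}(\gamma) - \gamma \delta_{ij}$ |
| Ronhovde & Nussinov | $1$ | $\gamma$ |
| Constant Potts Model | $1 - \gamma$ | $\gamma$ |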
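For the rows with $a_{ij} = 1 - b_{ij}$ (Reichardt & Bornholdt, Arenas, Fernández & Gómez, and the Constant Potts Model, where $1 - \gamma = 1 - b_{ij}$), the general quality function simplifies. A short derivation from the definitions above:

$$A_{ij}(1 - b_{ij}) - (1 - A_{ij})\, b_{ij} = A_{ij} - b_{ij} \quad\Longrightarrow\quad \mathcal{H} = \sum_{ij} (A_{ij} - b_{ij})\, \delta(\sigma_i, \sigma_j),$$

so that, for example,

$$\mathcal{H}_{\mathrm{RB}} = \sum_{ij} (A_{ij} - \gamma p_{ij})\, \delta(\sigma_i, \sigma_j), \qquad \mathcal{H}_{\mathrm{CPM}} = \sum_{ij} (A_{ij} - \gamma)\, \delta(\sigma_i, \sigma_j).$$

Both compare the observed links inside communities with a baseline: the expected links $\gamma p_{ij}$ of a null model for RB, and a constant density $\gamma$ for CPM.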


Modularity

• Null model $p_{ij}$, with the constraint $\sum_{ij} p_{ij} = 2m$, where $m$ is the number of edges.

• A popular null model is the configuration model, $p_{ij} = \frac{k_i k_j}{2m}$, where $k_i$ is the degree of node $i$.

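• With the Reichardt & Bornholdt weights and $\gamma = 1$, this leads to modularity:

$$Q = \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(\sigma_i, \sigma_j).$$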

• As a sum over communities:

$$Q = \sum_c \left( e_c - \langle e_c \rangle \right).$$
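A minimal plain-Python sketch of the modularity formula above (the helper name `modularity_sum` is hypothetical). It evaluates the sum over all ordered node pairs exactly as on this slide, using the fact that $\sum_{ij} k_i k_j \delta(\sigma_i, \sigma_j) = \sum_c K_c^2$ with $K_c$ the total degree of community $c$. The widely used normalised modularity divides this sum by $2m$.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def modularity_sum(edges: List[Tuple[int, int]], membership: Dict[int, int]) -> float:
    """Return sum_ij (A_ij - k_i k_j / 2m) delta(sigma_i, sigma_j) for a simple undirected graph."""
    m = len(edges)
    degree: Dict[int, int] = defaultdict(int)
    internal_edges: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal_edges[membership[u]] += 1
    total_degree: Dict[int, int] = defaultdict(int)  # K_c = sum of degrees in community c
    for node, k in degree.items():
        total_degree[membership[node]] += k
    # Each internal edge appears twice in the ordered sum, as (i, j) and (j, i).
    observed = 2 * sum(internal_edges.values())
    # sum_ij k_i k_j delta(sigma_i, sigma_j) = sum_c K_c^2 (diagonal terms included).
    expected = sum(K * K for K in total_degree.values()) / (2 * m)
    return observed - expected


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]  # two triangles + a bridge
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q_sum = modularity_sum(edges, membership)
print(q_sum, q_sum / (2 * len(edges)))  # 5.0, then 5/14 (about 0.357) after dividing by 2m
```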


Optimising modularity

Initial communities

Move 0

Move 5

Move 11

[Figure: example network with nodes labelled 0 to 11.]

No more improvement


[Figure: the aggregated graph, with each community contracted to a single node.]

Aggregate the graph, and repeat the same procedure.

Louvain algorithm

1 Move node $i$ to the best (greedy) community, i.e. the one that increases the quality most.

2 Repeat (1) until there is no more improvement.

3 Contract the graph (communities → nodes).

4 Repeat (1)-(3) until there is no more improvement.
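The four steps can be sketched in plain Python. This is a simplified illustration, not the reference Louvain implementation: it visits nodes in a fixed order, only considers neighbouring communities, and optimises the Reichardt & Bornholdt quality with the configuration model (modularity when `gamma = 1`). All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[int, Dict[int, float]]  # symmetric weights A[u][v]; A[u][u] stores a self-loop


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    """Unweighted, undirected edge list (no self-loops) -> weighted adjacency dict."""
    A: Graph = defaultdict(dict)
    for u, v in edges:
        A[u][v] = A[u].get(v, 0.0) + 1.0
        A[v][u] = A[v].get(u, 0.0) + 1.0
    return dict(A)


def gain(link_w: float, k_u: float, K_c: float, gamma: float, two_m: float) -> float:
    """Change in sum_ij (A_ij - gamma k_i k_j / 2m) delta when an isolated node u
    (strength k_u, link weight link_w into c) joins community c (strength K_c),
    leaving out the terms that are the same for every c."""
    return 2 * link_w - 2 * gamma * k_u * K_c / two_m


def move_nodes(A: Graph, gamma: float) -> Dict[int, int]:
    """Steps (1)-(2): greedily move single nodes until no move improves the quality."""
    k = {u: sum(nbrs.values()) for u, nbrs in A.items()}  # node strengths (self-loops included)
    two_m = sum(k.values())
    comm = {u: u for u in A}  # start from singleton communities
    K = dict(k)               # total strength per community label
    improved = True
    while improved:
        improved = False
        for u in A:
            old = comm[u]
            K[old] -= k[u]  # take u out of its community
            links: Dict[int, float] = defaultdict(float)
            for v, w in A[u].items():
                if v != u:
                    links[comm[v]] += w  # weight from u into each neighbouring community
            best = old
            best_gain = gain(links.get(old, 0.0), k[u], K[old], gamma, two_m)
            for c, w in links.items():
                g = gain(w, k[u], K[c], gamma, two_m)
                if g > best_gain:  # strict: ties keep u where it was
                    best, best_gain = c, g
            K[best] += k[u]
            comm[u] = best
            improved = improved or best != old
    return comm


def aggregate(A: Graph, comm: Dict[int, int]) -> Graph:
    """Step (3): contract each community to one node; internal weight becomes a self-loop."""
    B: Dict[int, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for u, nbrs in A.items():
        for v, w in nbrs.items():
            B[comm[u]][comm[v]] += w
    return {c: dict(nbrs) for c, nbrs in B.items()}


def louvain(edges: List[Tuple[int, int]], gamma: float = 1.0) -> Dict[int, int]:
    """Step (4): alternate moving and contracting until no node moves."""
    A = build_graph(edges)
    membership = {u: u for u in A}  # original node -> node of the current (contracted) graph
    while True:
        comm = move_nodes(A, gamma)
        if all(comm[u] == u for u in A):  # nothing moved: no more improvement
            return membership
        membership = {node: comm[c] for node, c in membership.items()}
        A = aggregate(A, comm)


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]  # two triangles + a bridge
groups: Dict[int, List[int]] = defaultdict(list)
for node, c in sorted(louvain(edges).items()):
    groups[c].append(node)
print(sorted(groups.values()))  # [[0, 1, 2], [3, 4, 5]]
```

A node can only join a community that already contains one of its neighbours, so any level in which a node moves ends with fewer communities than nodes. The contracted graph therefore keeps shrinking until a pass of step (1) moves nothing.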


Part II: Where are those small communities?


Resolution limit

• Modularity might miss ‘small’ communities.

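• In a ring of $q$ cliques of $n_c$ nodes each (adjacent cliques joined by a single edge), two adjacent cliques are merged when

$$\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}.$$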

• The number of communities scales as $\sqrt{\gamma_{RB} m}$.

• For a general null model the problem remains, since $\sum_{ij} p_{ij} = 2m$.
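The merge condition can be checked numerically. The plain-Python sketch below (function names hypothetical) assumes the usual ring construction, with consecutive cliques joined by one edge, and compares the Reichardt & Bornholdt quality with the configuration model, $\sum_{ij} (A_{ij} - \gamma k_i k_j / 2m)\, \delta(\sigma_i, \sigma_j)$, for one community per clique against adjacent cliques merged in pairs (taking $q$ even).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ring_of_cliques(q: int, n: int) -> List[Tuple[int, int]]:
    """q cliques of n nodes; clique c is joined to clique c + 1 (mod q) by a single edge."""
    edges: List[Tuple[int, int]] = []
    for c in range(q):
        nodes = range(c * n, (c + 1) * n)
        edges += [(i, j) for i in nodes for j in nodes if i < j]
        edges.append((c * n, ((c + 1) % q) * n + 1))  # first node of c to second node of c + 1
    return edges


def rb_quality(edges: List[Tuple[int, int]], membership: Dict[int, int], gamma: float) -> float:
    """sum_ij (A_ij - gamma k_i k_j / 2m) delta(sigma_i, sigma_j), computed per community."""
    two_m = 2 * len(edges)
    internal: Dict[int, int] = defaultdict(int)  # ordered pairs: 2 per internal edge
    K: Dict[int, int] = defaultdict(int)         # total degree of each community
    for u, v in edges:
        K[membership[u]] += 1
        K[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 2
    return sum(internal.values()) - gamma * sum(x * x for x in K.values()) / two_m


def merge_preferred(q: int, n: int, gamma: float) -> bool:
    """True if merging adjacent cliques pairwise beats one community per clique (q even)."""
    edges = ring_of_cliques(q, n)
    separate = {i: i // n for i in range(q * n)}
    paired = {i: i // (2 * n) for i in range(q * n)}  # cliques 2c and 2c + 1 merged
    return rb_quality(edges, paired, gamma) > rb_quality(edges, separate, gamma)


# n = 5 gives n(n - 1) + 2 = 22, so with gamma = 1 merging should start once q > 22.
for q in (20, 30):
    print(q, q / 22, merge_preferred(q, 5, 1.0))  # False for q = 20, True for q = 30
```

For one merged pair the quality changes by $2 - 2\gamma_{RB}(n_c(n_c - 1) + 2)/q$, which is positive exactly below the threshold above. Since $m$ grows with $q$, adding more cliques eventually forces a merge, however well separated the cliques are.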


Resolution-limit-free

• Ronhovde & Nussinov model ($a_{ij} = 1$, $b_{ij} = \gamma$).

• Claim: it is resolution-limit-free, as the merge depends only on ‘local’ variables:

$$\gamma_{RN} < \frac{1}{n_c^2 - 1}.$$

• But taking $p_{ij} = k_i k_j$, we obtain

$$\gamma_{RB} < \frac{1}{2\left(n_c(n_c - 1) + 2\right)^2},$$

which also involves only ‘local’ variables. Hence, is it also resolution-limit-free?

• Problems of scale remain.
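The ‘local’ character of the Ronhovde & Nussinov condition can be checked by brute force. The plain-Python sketch below (function names hypothetical; the ring construction, with consecutive cliques joined by one edge, is an assumption) evaluates the general quality function with $a_{ij} = 1$, $b_{ij} = \gamma$ on an entire ring of cliques and reports the change when two adjacent cliques merge. Merging adds two linked ordered pairs and $2(n_c^2 - 1)$ unlinked ones, so the change is $2 - 2\gamma(n_c^2 - 1)$ whatever the number of cliques, which reproduces $\gamma_{RN} < 1/(n_c^2 - 1)$.

```python
from itertools import product
from typing import List, Set, Tuple


def ring_of_cliques(q: int, n: int) -> Set[Tuple[int, int]]:
    """q cliques of n nodes; clique c is joined to clique c + 1 (mod q) by a single edge."""
    edges: Set[Tuple[int, int]] = set()
    for c in range(q):
        nodes = range(c * n, (c + 1) * n)
        edges |= {(i, j) for i in nodes for j in nodes if i < j}
        edges.add((c * n, ((c + 1) % q) * n + 1))
    return edges


def rn_quality(edges: Set[Tuple[int, int]], membership: List[int], gamma: float) -> float:
    """General H with Ronhovde & Nussinov weights a_ij = 1, b_ij = gamma (ordered pairs)."""
    adjacent = edges | {(v, u) for u, v in edges}
    total = 0.0
    for i, j in product(range(len(membership)), repeat=2):
        if membership[i] == membership[j]:
            total += 1.0 if (i, j) in adjacent else -gamma  # reward link / punish missing link
    return total


def merge_gain(q: int, n: int, gamma: float) -> float:
    """Change in H when cliques 0 and 1 of a ring of q >= 3 cliques are merged."""
    edges = ring_of_cliques(q, n)
    separate = [i // n for i in range(q * n)]
    merged = [0 if c == 1 else c for c in separate]
    return rn_quality(edges, merged, gamma) - rn_quality(edges, separate, gamma)


# Expected change: 2 - 2 * gamma * (n^2 - 1), independent of q; here 2 - 2 * 0.05 * 15 = 0.5.
for q in (4, 8):
    print(q, round(merge_gain(q, 4, 0.05), 9))  # 0.5 for both rings
```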

Resolution limit

[Figure: two panels, titled "Resolution limit" and "Resolution-limit-free".]


Defining resolution-limit-free

Definition (Resolution-limit-free)

An objective function $\mathcal{H}$ is called resolution-limit-free if, whenever a partition $C$ is optimal for $G$, every subpartition $D \subset C$ is also optimal for the subgraph $H(D) \subset G$ induced by $D$.

Theorem (Swap optimal subpartitions)

If $C$ is optimal, with subpartition $D$, we can replace $D$ by another optimal subpartition $D'$.

What methods are resolution-limit-free?


Resolution-limit-free methods

• RN and CPM can easily be proven to be resolution-limit-free.

• What about other weights $a_{ij}$ and $b_{ij}$?

Definition (Local weights)

Weights $a_{ij}$, $b_{ij}$ are called local if, for any subgraph $H \subset G$, the weights remain similar, i.e. $a_{ij}(G) = \lambda(H) a_{ij}(H)$ and $b_{ij}(G) = \lambda(H) b_{ij}(H)$.

Theorem (Local weights ⇒ resolution-limit-free)

An objective function $\mathcal{H}$ is resolution-limit-free if its weights are local.

The converse is not true: a small perturbation (i.e. a non-local weight) may leave the optimal partition unchanged. But there are very few exceptions.

Local methods are resolution-limit-free.
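Why local weights suffice can be sketched as follows; this is one short argument under the assumption $\lambda(H) > 0$, not necessarily the proof in the original paper. Write $\mathcal{H}_G(X)$ for the contribution of a set of communities $X$ to the quality function, evaluated with the weights of $G$. Let $C$ be optimal for $G$, $D \subset C$, and $H = H(D)$. Every term of the quality function involves a pair of nodes inside one community, and inside $H$ the adjacency matrix is the same as in $G$, so

$$\mathcal{H}_G(C) = \mathcal{H}_G(C \setminus D) + \mathcal{H}_G(D), \qquad \mathcal{H}_G(D) = \lambda(H)\, \mathcal{H}_H(D).$$

If some partition $D'$ of $H$ had $\mathcal{H}_H(D') > \mathcal{H}_H(D)$, then swapping $D$ for $D'$ inside $C$ would give

$$\mathcal{H}_G\big((C \setminus D) \cup D'\big) - \mathcal{H}_G(C) = \lambda(H) \big( \mathcal{H}_H(D') - \mathcal{H}_H(D) \big) > 0,$$

contradicting the optimality of $C$. Hence $D$ is optimal for $H(D)$.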


Part III: When are communities significant?


Modularity in non-modular graphs

Modularity as sign of community structure

• Modularity (normalised by $2m$) satisfies $-1 \le Q \le 1$.

• High modularity ⇒ community structure?

• A modularity higher than 0.3 is often seen as significant.

Many graphs have high modularity, but no community structure.


Modularity without community structure

Q = 0.31

Modularity $Q \not\approx 0$ for random graphs.


Significance

How significant is a partition?


Significance

E = 14

E = 9

Fixed partition

E = 11

Better partition

• Not: the probability to find $E$ edges in a given partition.

• But: the probability to find a partition with $E$ edges.


Subgraph probability

Decompose partition

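• Probability to find a partition with $E$ edges.

• Probability to find communities with $e_c$ edges.

Asymptotic estimate

• Probability for a subgraph of $n_c$ nodes with density $p_c$:

$$\Pr\left(S(n_c, p_c) \subseteq G(n, p)\right) \approx \exp\left[-n_c^2 D(p_c \parallel p)\right],$$

where $G(n, p)$ is a random graph with $n$ nodes and density $p$, and $D(p_c \parallel p)$ is the Kullback-Leibler divergence between $p_c$ and $p$.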

Significance

• Probability for all communities:

$$\Pr(\sigma) \approx \prod_c \exp\left[-n_c^2 D(p_c \parallel p)\right].$$

• Significance:

$$S(\sigma) = -\log \Pr(\sigma) = \sum_c n_c^2 D(p_c \parallel p).$$
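As an illustration, the significance formula can be evaluated for a given partition. The plain-Python sketch below assumes that $D(p_c \parallel p)$ is the Kullback-Leibler divergence between Bernoulli distributions, that $p_c$ and $p$ are densities (edges divided by node pairs) of community $c$ and of the whole graph with $0 < p < 1$, and it keeps the $n_c^2$ prefactor of the slide; the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def kl_bernoulli(q: float, p: float) -> float:
    """D(q || p) = q log(q/p) + (1 - q) log((1 - q)/(1 - p)), with 0 log 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d


def significance(edges: List[Tuple[int, int]], membership: Dict[int, int]) -> float:
    """S(sigma) = sum_c n_c^2 D(p_c || p), with densities = edges / possible node pairs."""
    n = len(membership)
    p = len(edges) / (n * (n - 1) / 2)  # overall density of G
    sizes: Dict[int, int] = defaultdict(int)
    internal: Dict[int, int] = defaultdict(int)
    for c in membership.values():
        sizes[c] += 1
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    S = 0.0
    for c, n_c in sizes.items():
        if n_c < 2:
            continue  # a single node has no internal pairs, hence no density
        p_c = internal[c] / (n_c * (n_c - 1) / 2)
        S += n_c ** 2 * kl_bernoulli(p_c, p)
    return S


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]  # two triangles + a bridge
split = significance(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
whole = significance(edges, {v: 0 for v in range(6)})
print(split, whole)  # 18 log(15/7), about 13.72, and 0.0
```

The partition into the two triangles scores well above the single-community partition, which by construction has $p_c = p$ and therefore zero significance.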


Significance

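[Figure: curves labelled N, E and S plotted against the resolution parameter $\gamma$, with $\gamma$ from $10^{-3}$ to $10^{0}$ and a vertical axis from $10^{3}$ to $10^{6}$.]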


Final Chapter: What should I remember? And what’s next?


Conclusions

To remember

• Modularity can hide small communities.

• Local methods avoid this problem (RN, CPM).

• High modularity ⇏ significant: use significance.

What’s next?

• Various measures of significance: what’s the difference?

• Choose “correct” resolution ⇒ resolution limit?


Thank you! Questions?

Traag, Van Dooren & Nesterov. Narrow scope for resolution-limit-free community detection. Phys Rev E 84, 016114 (2011).

Traag, Krings & Van Dooren. Significant scales in community structure. Sci Rep 3, 2930 (2013).

Reichardt & Bornholdt. Statistical mechanics of community detection. Phys Rev E 74, 016110 (2006).

www.traag.net | [email protected] | @vtraag