Community structure in complex networks


V.A. Traag

KITLV, Leiden, the Netherlands
e-Humanities, KNAW, Amsterdam, the Netherlands

February 21, 2014

e-Humanities, Royal Netherlands Academy of Arts and Sciences

Overview

1 What are communities in networks? How do we find them?

2 Where are those small communities?

3 When are communities significant?

4 What should I remember? And what’s next?

Part I: What are communities? How do we find them?

What is a community?

• Everybody has an intuitive idea.

• Yet no single agreed-upon definition.

• Common core:

Groups of nodes that are
  • relatively densely connected within, and
  • relatively sparsely connected between.

General community detection

• Reward links inside community, weight $a_{ij}$.

• Punish missing links inside community, weight $b_{ij}$.

• General quality function

$H = \sum_{ij} \left( A_{ij} a_{ij} - (1 - A_{ij}) b_{ij} \right) \delta(\sigma_i, \sigma_j).$
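The quality function can be evaluated directly for a given partition. The following is a minimal sketch, assuming the sum runs over all ordered pairs (i, j) including i = j, and using a hypothetical toy graph (`two_triangles`) with uniform weights $a_{ij} = 1$, $b_{ij} = \gamma$; none of these names come from the slides.

```python
def quality(A, a, b, sigma):
    """H = sum_ij (A_ij a_ij - (1 - A_ij) b_ij) delta(sigma_i, sigma_j).

    Assumption: the sum runs over all ordered pairs (i, j), including i == j,
    which is how the formula reads literally.
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:  # delta(sigma_i, sigma_j)
                total += A[i][j] * a[i][j] - (1 - A[i][j]) * b[i][j]
    return total


def two_triangles():
    """Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    A = [[0] * 6 for _ in range(6)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


A = two_triangles()
gamma = 0.5
a = [[1.0] * 6 for _ in range(6)]    # reward every present link equally
b = [[gamma] * 6 for _ in range(6)]  # punish every missing link equally

h_split = quality(A, a, b, [0, 0, 0, 1, 1, 1])   # the two triangles
h_single = quality(A, a, b, [0, 0, 0, 0, 0, 0])  # one big community
print(h_split, h_single)
```

With these weights the two-triangle partition scores higher than the single community, because merging adds many punished missing links for only one rewarded link.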

[Figure: example network with 12 nodes, labelled 0 to 11, illustrating the quality function H.]

Different weights

No a priori constraints on weights $a_{ij}$, $b_{ij}$.

Model                        $a_{ij}$        $b_{ij}$
Reichardt & Bornholdt        $1 - b_{ij}$    $\gamma p_{ij}$
Arenas, Fernandez & Gomez    $1 - b_{ij}$    $p_{ij}(\gamma) - \gamma \delta_{ij}$
Ronhovde & Nussinov          $1$             $\gamma$
Constant Potts Model         $1 - \gamma$    $\gamma$
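As a worked check, the weights in the table can be substituted into the general quality function $H = \sum_{ij} (A_{ij} a_{ij} - (1 - A_{ij}) b_{ij}) \delta(\sigma_i, \sigma_j)$. The subscripted names $H_{RB}$, $H_{CPM}$ and $H_{RN}$ below are labels introduced here for illustration only.

```latex
\begin{align*}
\text{Reichardt \& Bornholdt:}\quad
  & A_{ij}(1 - \gamma p_{ij}) - (1 - A_{ij})\,\gamma p_{ij} = A_{ij} - \gamma p_{ij}
  &&\Rightarrow\; H_{RB} = \sum_{ij} (A_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j), \\
\text{Constant Potts Model:}\quad
  & A_{ij}(1 - \gamma) - (1 - A_{ij})\,\gamma = A_{ij} - \gamma
  &&\Rightarrow\; H_{CPM} = \sum_{ij} (A_{ij} - \gamma)\,\delta(\sigma_i, \sigma_j), \\
\text{Ronhovde \& Nussinov:}\quad
  & A_{ij} - \gamma (1 - A_{ij})
  &&\Rightarrow\; H_{RN} = \sum_{ij} \bigl(A_{ij} - \gamma (1 - A_{ij})\bigr)\,\delta(\sigma_i, \sigma_j).
\end{align*}
```

In the first two cases the cross term $A_{ij} b_{ij}$ cancels, which is why the choice $a_{ij} = 1 - b_{ij}$ leaves a simple "observed minus expected" form.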

Modularity

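• Null model $p_{ij}$, with constraint $\sum_{ij} p_{ij} = 2m$.

• Popular null model: the configuration model, $p_{ij} = \frac{k_i k_j}{2m}$.

• With $\gamma = 1$, this leads to modularity:

$Q = \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(\sigma_i, \sigma_j).$

• As a sum over communities:

$Q = \sum_c \left( e_c - \langle e_c \rangle \right).$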
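The two forms of modularity can be compared numerically. This is a small sketch under stated assumptions: Q is left unnormalised as in the formulas above, $e_c$ is taken to be the sum of $A_{ij}$ over ordered pairs inside community c, and $\langle e_c \rangle = K_c^2 / 2m$ with $K_c$ the total degree of c. These definitions and the toy graph are assumptions made for the example, not taken from the slides.

```python
def degrees(A):
    """Degree k_i of every node, as row sums of the adjacency matrix."""
    return [sum(row) for row in A]


def modularity_pairwise(A, sigma):
    """Q = sum_ij (A_ij - k_i k_j / 2m) delta(sigma_i, sigma_j), unnormalised."""
    n, k = len(A), degrees(A)
    two_m = sum(k)
    return sum(
        A[i][j] - k[i] * k[j] / two_m
        for i in range(n)
        for j in range(n)
        if sigma[i] == sigma[j]
    )


def modularity_by_community(A, sigma):
    """Q = sum_c (e_c - <e_c>), with e_c and <e_c> as assumed in the lead-in."""
    n, k = len(A), degrees(A)
    two_m = sum(k)
    q = 0.0
    for c in set(sigma):
        members = [i for i in range(n) if sigma[i] == c]
        e_c = sum(A[i][j] for i in members for j in members)  # ordered pairs
        K_c = sum(k[i] for i in members)                      # total degree
        q += e_c - K_c ** 2 / two_m
    return q


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

sigma = [0, 0, 0, 1, 1, 1]
q_pairs = modularity_pairwise(A, sigma)
q_comms = modularity_by_community(A, sigma)
print(q_pairs, q_comms)
```

Dividing either value by $2m$ gives the commonly reported normalised modularity.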

Optimising modularity

[Figure: the optimisation shown step by step on the example network with nodes 0 to 11.]

• Initial communities

• Move 0

• Move 5

• Move 11

• No more improvement

• Aggregate graph, and repeat same procedure.

Louvain algorithm

1 Move node i to best (greedy) community.

2 Repeat (1) until no more improvement.

3 Contract graph (communities → nodes).

4 Repeat (1)-(3) until no more improvement.
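The four steps above can be sketched in a few dozen lines. This is an illustrative, unoptimised version that assumes the unnormalised modularity $Q = \sum_{ij} (W_{ij} - k_i k_j / 2m) \delta(\sigma_i, \sigma_j)$ on a dense weight matrix; the function names and the toy graph are hypothetical, and a production implementation would use sparse data structures.

```python
def local_moving(W, sigma):
    """Steps 1-2: greedily move nodes until no single move improves Q.

    Mutates sigma in place and returns True if at least one node moved.
    """
    n = len(W)
    k = [sum(row) for row in W]
    two_m = sum(k)
    K = {}  # total degree per community
    for i in range(n):
        K[sigma[i]] = K.get(sigma[i], 0.0) + k[i]
    moved_any, improved = False, True
    while improved:
        improved = False
        for i in range(n):
            old = sigma[i]
            K[old] -= k[i]  # temporarily take i out of its community
            # Weight from i to each neighbouring community (self-loop excluded).
            w_to = {old: 0.0}
            for j in range(n):
                if j != i and W[i][j] > 0:
                    w_to[sigma[j]] = w_to.get(sigma[j], 0.0) + W[i][j]
            # Change in Q when i joins community c, relative to i being alone.
            gain = lambda c: 2 * w_to[c] - 2 * k[i] * K[c] / two_m
            best = max(w_to, key=gain)
            if gain(best) > gain(old) + 1e-12:
                sigma[i] = best
                improved = moved_any = True
            K[sigma[i]] += k[i]
    return moved_any


def aggregate(W, sigma):
    """Step 3: contract every community into a single node."""
    index = {c: a for a, c in enumerate(sorted(set(sigma)))}
    size = len(index)
    W2 = [[0.0] * size for _ in range(size)]
    for i in range(len(W)):
        for j in range(len(W)):
            W2[index[sigma[i]]][index[sigma[j]]] += W[i][j]
    return W2, [index[c] for c in sigma]


def louvain(A):
    """Step 4: alternate local moving and aggregation until nothing moves."""
    W = [[float(x) for x in row] for row in A]
    membership = list(range(len(A)))  # original node -> current super-node
    while True:
        sigma = list(range(len(W)))   # every (super-)node starts alone
        if not local_moving(W, sigma):
            return membership
        W, node_to_super = aggregate(W, sigma)
        membership = [node_to_super[s] for s in membership]


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

membership = louvain(A)
print(membership)
```

Aggregation keeps internal links as self-loop weight on the diagonal, so the quality of a partition of the contracted graph equals the quality of the corresponding partition of the original graph.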

Part II: Where are those small communities?

Resolution limit

• Modularity might miss ‘small’ communities.

• Two cliques in a ring of $q$ cliques, each of $n_c$ nodes, are merged when

$\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}.$

• Number of communities scales as $\sqrt{\gamma_{RB} m}$.

• For a general null model, the problem remains since $\sum_{ij} p_{ij} = 2m$.
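The merge threshold can be checked numerically on a small ring of cliques. The sketch below assumes the Reichardt & Bornholdt quality with the configuration null model, $H = \sum_{ij} (A_{ij} - \gamma k_i k_j / 2m) \delta(\sigma_i, \sigma_j)$, and a ring in which the last node of each clique links to the first node of the next; the construction and all function names are assumptions made for illustration.

```python
def ring_of_cliques(q, n_c):
    """Adjacency matrix of q cliques of n_c nodes, neighbours joined by one edge."""
    n = q * n_c
    A = [[0] * n for _ in range(n)]
    for c in range(q):
        nodes = range(c * n_c, (c + 1) * n_c)
        for i in nodes:
            for j in nodes:
                if i != j:
                    A[i][j] = 1
        # Single edge from the last node of clique c to the first of the next.
        u = (c + 1) * n_c - 1
        v = ((c + 1) % q) * n_c
        A[u][v] = A[v][u] = 1
    return A


def rb_quality(A, sigma, gamma):
    """H = sum_ij (A_ij - gamma k_i k_j / 2m) delta(sigma_i, sigma_j)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return sum(
        A[i][j] - gamma * k[i] * k[j] / two_m
        for i in range(n)
        for j in range(n)
        if sigma[i] == sigma[j]
    )


q, n_c = 6, 3
A = ring_of_cliques(q, n_c)
separate = [i // n_c for i in range(q * n_c)]      # one community per clique
merged = [i // (2 * n_c) for i in range(q * n_c)]  # adjacent cliques paired up
threshold = q / (n_c * (n_c - 1) + 2)

for gamma in (threshold - 0.05, threshold + 0.05):
    better = "merged" if rb_quality(A, merged, gamma) > rb_quality(A, separate, gamma) else "separate"
    print(f"gamma = {gamma:.2f}: {better} partition scores higher")
```

Because the threshold grows with $q$, adding more cliques to the ring eventually forces merges at any fixed $\gamma_{RB}$, which is the resolution limit.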


Resolution-limit-free

• Ronhovde & Nussinov model ($a_{ij} = 1$, $b_{ij} = \gamma$).

• Claim: resolution-limit-free, as the merge depends only on ‘local’ variables:

$\gamma_{RN} < \frac{1}{n_c^2 - 1}.$

• But if we take $p_{ij} = k_i k_j$, we obtain

$\gamma_{RB} < \frac{1}{2 \left( n_c(n_c - 1) + 2 \right)^2},$

which also contains only ‘local’ variables. Hence, also resolution-limit-free?

• Problems of scale remain.
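The Ronhovde & Nussinov threshold follows from a short calculation. The derivation below is a sketch that assumes two cliques of $n_c$ nodes joined by a single link and a sum over ordered pairs; $\Delta H$ is a label introduced here for the change in quality when the two cliques are merged.

```latex
\begin{align*}
\text{ordered pairs between the two cliques:}\quad & 2 n_c^2, \\
\text{of which links:}\quad & 2, \\
\text{of which missing links:}\quad & 2 n_c^2 - 2, \\
\Delta H &= 2 \cdot 1 - \gamma_{RN} \, (2 n_c^2 - 2), \\
\Delta H > 0 \;&\Longleftrightarrow\; \gamma_{RN} < \frac{1}{n_c^2 - 1}.
\end{align*}
```

Only the clique size $n_c$ appears in the condition. Neither the number of cliques nor the total number of links enters, which is the sense in which the variables are ‘local’.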

[Figures: Resolution limit; Resolution-limit-free]

Defining resolution-limit-free

Definition (Resolution-limit-free)

Objective function H is called resolution-limit-free if, whenever a partition C is optimal for G, each subpartition D ⊂ C is also optimal for the subgraph H(D) ⊂ G induced by D.

Theorem (Swap optimal subpartitions)

If C is optimal, with subpartition D, we can replace D by another optimal subpartition D′.


What methods are resolution-limit-free?

Resolution-limit-free methods

• RN and CPM can be easily proven resolution-limit-free.

• What about other weights aij and bij?

Definition (Local weights)

Weights $a_{ij}$, $b_{ij}$ are called local whenever, for a subgraph H ⊂ G, the weights remain similar, i.e. $a_{ij}(G) = \lambda(H) a_{ij}(H)$ and $b_{ij}(G) = \lambda(H) b_{ij}(H)$.

Theorem (Local weights ⇒ resolution-limit-free)

Objective function H is resolution-limit-free if weights are local.

The converse does not hold: some small perturbations (i.e. non-local weights) will not change the optimal partition. But there are very few exceptions.

Local methods are resolution-limit-free.
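To make the locality condition concrete, the weights of two models can be compared on a graph G and a subgraph H. This is a sketch: the notation $k_i(G)$, $m(G)$ for degrees and link counts measured in a particular graph is introduced here and does not appear in the slides.

```latex
\begin{align*}
\text{CPM:}\quad
  & a_{ij}(G) = 1 - \gamma = a_{ij}(H), \qquad b_{ij}(G) = \gamma = b_{ij}(H)
  \quad\Rightarrow\quad \lambda(H) = 1, \\
\text{RB, configuration model:}\quad
  & b_{ij}(G) = \gamma \, \frac{k_i(G)\, k_j(G)}{2\, m(G)}
  \quad\text{versus}\quad
  b_{ij}(H) = \gamma \, \frac{k_i(H)\, k_j(H)}{2\, m(H)}.
\end{align*}
```

For CPM the weights do not depend on the graph at all, so they are local with $\lambda(H) = 1$. For the configuration model the ratio $b_{ij}(G) / b_{ij}(H)$ generally varies with the pair $(i, j)$, so no single factor $\lambda(H)$ exists and the weights are not local in general.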

Part III: When are communities significant?

Modularity in non-modular graphs

Modularity as sign of community structure

• Modularity −1 ≤ Q ≤ 1.

• High modularity ⇒ community structure?

• Modularity higher than 0.3 seen as significant.


Many graphs have high modularity, but no community structure.

Modularity without community structure

Q = 0.31

Modularity Q ≉ 0 for random graphs.

Significance

How significant is a partition?

Significance

[Figure: example partitions. Panel labels: E = 14; E = 9, fixed partition; E = 11, better partition.]

• Not: Probability to find E edges in partition.

• But: Probability to find partition with E edges.

Subgraph probability

Decompose partition

• Probability to find partition with E edges.

• Probability to find communities with ec edges.

• Asymptotic estimate

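• Probability for a subgraph of $n_c$ nodes with density $p_c$:

$\Pr(S(n_c, p_c) \subseteq G(n, p)) \approx \exp\left[ -n_c^2 D(p_c \,\|\, p) \right]$

Significance

• Probability for all communities: $\Pr(\sigma) \approx \prod_c \exp\left[ -n_c^2 D(p_c \,\|\, p) \right]$.

• Significance: $S(\sigma) = -\log \Pr(\sigma) = \sum_c n_c^2 D(p_c \,\|\, p)$.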
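The significance of a partition can be computed directly from the formula. The sketch below keeps the $n_c^2$ prefactor as written above and assumes that $D$ is the Kullback-Leibler divergence between two Bernoulli distributions, that $p_c$ is the fraction of possible links present inside community c, and that $p$ is the same fraction for the whole graph. These definitions, the treatment of single-node communities (contribution 0), and the toy graph are assumptions for illustration.

```python
from math import log


def kl_binary(q, p):
    """D(q || p) between Bernoulli(q) and Bernoulli(p), using 0 log 0 = 0.

    Assumes 0 < p < 1.
    """
    d = 0.0
    if q > 0:
        d += q * log(q / p)
    if q < 1:
        d += (1 - q) * log((1 - q) / (1 - p))
    return d


def significance(A, sigma):
    """S(sigma) = sum_c n_c^2 D(p_c || p) for an undirected simple graph."""
    n = len(A)
    m = sum(map(sum, A)) / 2
    p = m / (n * (n - 1) / 2)  # overall density
    total = 0.0
    for c in set(sigma):
        members = [i for i in range(n) if sigma[i] == c]
        n_c = len(members)
        if n_c < 2:
            continue  # assumption: a single node contributes nothing
        e_c = sum(A[i][j] for i in members for j in members) / 2
        p_c = e_c / (n_c * (n_c - 1) / 2)  # density inside community c
        total += n_c ** 2 * kl_binary(p_c, p)
    return total


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

s_split = significance(A, [0, 0, 0, 1, 1, 1])   # the two triangles
s_single = significance(A, [0, 0, 0, 0, 0, 0])  # one big community
print(s_split, s_single)
```

A single community has exactly the overall density, so its divergence and significance are zero, while the two fully connected triangles deviate strongly from the overall density.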



Significance

[Figure: N, E and S plotted against γ on logarithmic axes; γ ranges from 10^-3 to 10^0, the vertical axis from 10^3 to 10^6.]

Final Chapter: What should I remember? And what’s next?

Conclusions

To remember

• Modularity can hide small communities.

• Local methods avoid this problem (RN, CPM).

• High modularity ⇏ significant: use significance.

What’s next?

• Various measures of significance: what’s the difference?

• Choose “correct” resolution ⇒ resolution limit?

Thank you! Questions?

Traag, Van Dooren & Nesterov. Narrow scope for resolution-limit-free community detection. Phys Rev E 84, 016114 (2011).

Traag, Krings & Van Dooren. Significant scales in community structure. Sci Rep 3, 2930 (2013).

Reichardt & Bornholdt. Statistical mechanics of community detection. Phys Rev E 74, 016110 (2006).

http://www.traag.net

[email protected]

http://twitter.com/vtraag (@vtraag)