Graph mining 2: Statistical approaches for graph mining


Nathalie Villa-Vialaneix

http://www.nathalievilla.org

Advanced mathematics for network analysis, Luchon, May 3rd 2016


Talk map...

Who am I? Statistician working in biostatistics at INRA Toulouse. My research interests are: data mining, network inference and mining, machine learning.

Purpose of this talk: presenting a few statistical tools for graph mining (graph structure, important vertices) and clustering.


Background

Unless stated otherwise, G is:

- an undirected and connected graph;
- with vertices $V = \{x_1, \ldots, x_n\}$;
- with set of edges $E$;
- possibly with (positive and symmetric) weights on edges, $w_{ij}$ (such that $w_{ii} = 0$, no self loop);
- with adjacency matrix $A = (w_{ij})_{i,j=1,\ldots,n}$.


Examples are made with...

- the toy example “Les Misérables” (co-appearance network in Hugo’s novel)

[Figure: the “Les Misérables” co-appearance network, with one vertex per character (Myriel, Napoleon, Valjean, Fantine, Cosette, Javert, Marius, Gavroche, ...)]

- the R software, and especially the R package igraph
- the full script and the dataset are available on my website at: http://www.nathalievilla.org/teaching/toconet.html











Basic description of the graph

lesmis

## IGRAPH U--- 77 254 --
## + attr: layout (g/n), id (v/n), label (v/c), value (e/n)
## + edges:
##  [1]  1-- 2  1-- 3  1-- 4  3-- 4  1-- 5  1-- 6  1-- 7  1-- 8  1-- 9  1--10
## [11] 11--12  4--12  3--12  1--12 12--13 12--14 12--15 12--16 17--18 17--19
## [21] 18--19 17--20 18--20 19--20 17--21 18--21 19--21 20--21 17--22 18--22
## [31] 19--22 20--22 21--22 17--23 18--23 19--23 20--23 21--23 22--23 17--24
## [41] 18--24 19--24 20--24 21--24 22--24 23--24 13--24 12--24 24--25 12--25
## [51] 25--26 24--26 12--26 25--27 12--27 17--27 26--27 12--28 24--28 26--28
## [61] 25--28 27--28 12--29 28--29 24--30 28--30 12--30 24--31 31--32 12--32
## [71] 24--32 28--32 12--33 12--34 28--34 12--35 30--35 12--36 35--36 30--36
## + ... omitted several edges

U--- means: Undirected, not Named (no name attribute for the vertices), not Weighted (no weight attribute for the edges) and not Bipartite.


System information

## R version 3.2.5 (2016-04-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] igraph_1.0.1 knitr_1.12.3
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5   formatR_1.3    tools_3.2.5    stringi_1.0-1
## [5] highr_0.5.1    stringr_1.0.0  evaluate_0.8.3


Outline

Numerical characteristics

Clustering
- Modularity optimization
- Spectral clustering
- Model based clustering


Sketch of this section

Issue at stake:

- a graph is given
- numerical characteristics describing the graph and its nodes are a standard way to describe it
- how can we tell whether the observed values are unexpected under a so-called “null model”?







Standard (global) characteristics

- density: $|E| / \big(n(n-1)/2\big)$ (igraph: graph.density)
- number of triangles (igraph: triangles; see also motifs)
- transitivity: proportion of connected triplets (paths of length two) that are closed into a triangle, i.e., three times the number of triangles divided by the number of paths of length two (igraph: transitivity)
- diameter: length of the longest shortest path between two nodes (igraph: diameter)
- radius: minimal length, over all vertices in the graph, of the longest shortest path linking this vertex to another vertex (igraph: radius)
- girth: length of the shortest cycle in the graph (igraph: girth)
- cohesion: minimum number of vertices to remove to disconnect the graph
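The first three definitions can be checked by hand on a very small graph. The sketch below uses base R only; the toy adjacency matrix and all variable names are assumptions made for the illustration, not part of the talk's script.

```r
# Toy undirected graph (an assumption for illustration): a triangle 1-2-3
# plus a pendant vertex 4 attached to vertex 3.
adj <- matrix(0, nrow = 4, ncol = 4)
toy_edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1

n <- nrow(adj)
n_edges <- sum(adj) / 2                      # each edge is counted twice in adj
toy_density <- n_edges / (n * (n - 1) / 2)

# trace(A^3) counts every triangle 6 times (3 starting points x 2 directions)
n_triangles <- sum(diag(adj %*% adj %*% adj)) / 6

# a vertex of degree d is the centre of choose(d, 2) paths of length two
deg <- rowSums(adj)
n_paths2 <- sum(choose(deg, 2))
toy_transitivity <- 3 * n_triangles / n_paths2

c(density = toy_density, triangles = n_triangles, transitivity = toy_transitivity)
```

On this toy graph the density is 4/6, there is one triangle, and three of the five paths of length two are closed, which gives a transitivity of 0.6.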


Standard (global) characteristics for “Les misérables”

graph.density(lesmis); triangles(lesmis); length(triangles(lesmis))/3

## [1] 0.08680793
## + 1401/77 vertices:
##   [1] 12 1 3 12 1 4 12 3 4 12 24 32 12 24 13 12 24 25 12 24 30 12 25
##  [24] 71 12 25 70 12 25 69 12 25 27 12 26 24 12 26 25 12 26 27 12 26 72 12
##  [47] 26 71 12 26 70 12 26 69 12 27 73 12 27 52 12 27 50 12 27 44 12 28 73
##  [70] 12 28 24 12 28 25 12 28 26 12 28 27 12 28 29 12 28 30 12 28 32 12 28
##  [93] 34 12 28 44 12 28 72 12 28 59 12 28 69 12 28 70 12 28 71 12 29 45 12
## [116] 30 39 12 30 38 12 30 37 12 30 35 12 30 36 12 35 39 12 35 38 12 35 36
## [139] 12 35 37 12 36 39 12 36 38 12 36 37 12 37 39 12 37 38 12 38 39 12 49
## [162] 26 12 49 28 12 49 56 12 49 59 12 49 65 12 49 69 12 49 70 12 49 72 12
## [185] 50 52 12 56 26 12 56 27 12 56 65 12 56 50 12 56 52 12 56 59 12 59 71
## [208] 12 59 65 12 69 72 12 69 71 12 69 70 12 70 72 12 70 71 12 71 72 49 26
## + ... omitted several vertices
## [1] 467

transitivity(lesmis); diameter(lesmis); radius(lesmis); girth(lesmis)

## [1] 0.4989316
## [1] 5
## [1] 3
## $girth
## [1] 3
##
## $circle
## + 3/77 vertices:
## [1] 3 1 4


Comparison with random graphs...

Erdős-Rényi model with the same number of nodes and the same number of edges as the original graph (uniform probability to observe an edge between two given nodes)

Method: compare the observed values with those of a large number of randomly generated graphs (with no loop; only connected graphs are kept)

sample_gnm(vcount(lesmis), ecount(lesmis))
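The comparison described above could be sketched as follows. This is only an illustration under stated assumptions: Zachary's karate club network (shipped with igraph) stands in for `lesmis`, transitivity is the only statistic computed, and the number of random graphs and the variable names are arbitrary choices.

```r
library(igraph)

set.seed(1)
# Stand-in for the observed graph (assumption): Zachary's karate club network,
# which ships with igraph; replace it with `lesmis` when the dataset is loaded.
observed <- make_graph("Zachary")

n_graphs <- 200
null_transitivity <- numeric(0)
while (length(null_transitivity) < n_graphs) {
  random_graph <- sample_gnm(vcount(observed), ecount(observed))
  # only connected graphs are kept, as in the method described above
  if (is_connected(random_graph)) {
    null_transitivity <- c(null_transitivity, transitivity(random_graph))
  }
}

summary(null_transitivity)
transitivity(observed)
```

The same loop can collect any other numerical characteristic (diameter, radius, cohesion...) by replacing the call to `transitivity`.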






Results of the comparison with random graphs...

For B = 500 graphs (only connected graphs are kept), we have:

##     density          triangles      transitivity        diameter
##  Min.   :0.08681   Min.   :31.00   Min.   :0.05834   Min.   :4.000
##  1st Qu.:0.08681   1st Qu.:43.00   1st Qu.:0.07907   1st Qu.:4.000
##  Median :0.08681   Median :47.00   Median :0.08701   Median :5.000
##  Mean   :0.08681   Mean   :47.55   Mean   :0.08660   Mean   :4.627
##  3rd Qu.:0.08681   3rd Qu.:52.00   3rd Qu.:0.09415   3rd Qu.:5.000
##  Max.   :0.08681   Max.   :67.00   Max.   :0.11793   Max.   :6.000
##      radius          girth      cohesion
##  Min.   :3.000   Min.   :3   Min.   :1.000
##  1st Qu.:3.000   1st Qu.:3   1st Qu.:1.000
##  Median :3.000   Median :3   Median :2.000
##  Mean   :3.004   Mean   :3   Mean   :1.599
##  3rd Qu.:3.000   3rd Qu.:3   3rd Qu.:2.000
##  Max.   :4.000   Max.   :3   Max.   :3.000

compared to: 0.0868079, 467, 0.4989316, 5, 3, 3, 1

⇒ all values are standard except for:

- the number of triangles and the transitivity, which are larger: local connectivity is stronger than expected in Erdős-Rényi random graphs
- the cohesion, which is among the lowest values of what is expected in Erdős-Rényi random graphs: this again indicates a stronger local connectivity


Standard (local) characteristics

... for the vertex $x_i$:

- degree: $\big|\{x_j : (x_i, x_j) \in E,\ j \neq i\}\big|$ (igraph: degree), or strength for the weighted version, $\sum_{j \neq i} w_{ij}$
- betweenness (or centrality): number of shortest paths between any pair of vertices in the graph which pass through $x_i$ (igraph: betweenness)
- eccentricity: maximal length of all the shortest paths going from $x_i$ to any other vertex in the graph (igraph: eccentricity)
- closeness (or closeness centrality): $1 / \sum_{j \neq i} d(x_i, x_j)$, in which $d(x_i, x_j)$ is the length of the shortest path between $x_i$ and $x_j$ (igraph: closeness)

...and their distributions among all vertices.
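As a small worked example of these definitions, the sketch below computes degree, eccentricity and closeness in base R on a path with four vertices. The toy graph, the Floyd-Warshall step used to obtain shortest path lengths, and the variable names are assumptions chosen for the illustration.

```r
# Toy graph (assumption): the path 1 - 2 - 3 - 4, unweighted.
n <- 4
adj <- matrix(0, n, n)
path_edges <- cbind(1:3, 2:4)
adj[path_edges] <- 1
adj[path_edges[, 2:1]] <- 1

# Shortest path lengths d(x_i, x_j) with the Floyd-Warshall recursion
dist_mat <- ifelse(adj > 0, 1, Inf)
diag(dist_mat) <- 0
for (k in seq_len(n)) {
  dist_mat <- pmin(dist_mat, outer(dist_mat[, k], dist_mat[k, ], "+"))
}

toy_degree <- rowSums(adj > 0)
toy_eccentricity <- apply(dist_mat, 1, max)
toy_closeness <- 1 / rowSums(dist_mat)  # the diagonal is 0, so it sums over j != i

rbind(degree = toy_degree, eccentricity = toy_eccentricity, closeness = toy_closeness)
```

The two end vertices have degree 1, eccentricity 3 and closeness 1/6; the two inner vertices have degree 2, eccentricity 2 and closeness 1/4.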


Standard (local) characteristics for “Les misérables”

summary(degree(lesmis))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   6.000   6.597  10.000  36.000

summary(betweenness(lesmis))


summary(eccentricity(lesmis))


summary(closeness(lesmis))





Method: compare the observed values (average betweenness and degree) with those of a large number of randomly generated graphs (with no loop; only connected graphs are kept)

sample_gnm(vcount(lesmis), ecount(lesmis))






Results of the comparison with random graphs...

For B = 500 graphs (only connected graphs are kept), we have:

##      degree       betweenness     eccentricity     closeness
##  Min.   :6.597   Min.   :54.64   Min.   :3.597   Min.   :0.005249
##  1st Qu.:6.597   1st Qu.:55.93   1st Qu.:3.779   1st Qu.:0.005322
##  Median :6.597   Median :56.32   Median :3.857   Median :0.005340
##  Mean   :6.597   Mean   :56.36   Mean   :3.863   Mean   :0.005340
##  3rd Qu.:6.597   3rd Qu.:56.71   3rd Qu.:3.909   3rd Qu.:0.005361
##  Max.   :6.597   Max.   :58.79   Max.   :4.688   Max.   :0.005430

compared to: 6.597, 62.364, 4.13, 0.00512

⇒ the observed average betweenness is higher, and the observed average closeness is smaller, than for all the randomly generated graphs: this seems to indicate that, on average, shortest paths in the graph are longer than expected for graphs with a uniform distribution of the edges.


Degree distribution for “Les misérables”

[Figure: degree distribution on a log-log scale; x-axis: log(k), y-axis: log(P(k))]

Estimation of power law fit (left: α = 1.49) with fit_power_law(degree(lesmis) + 1, implementation = "R.mle")



Scale free model with a parameter for the power law identical to the one previously estimated and the same number of nodes. The Barabási and Albert model is used with a number of edges added at each step chosen so that the final number of edges resembles that of the original graph (3 edges, which gives 225 edges in the final graph, compared to 254)

$P(\text{degree} = k) \propto k^{-\alpha}$

Method: compare the observed values with those of a large number of randomly generated graphs

sample_pa(vcount(lesmis), m = 3, power = ..., directed = FALSE)





Results of the comparison with random graphs...

For B = 500 graphs, we have:

##     density         triangles    transitivity       diameter
##  Min.   :0.0769   Min.   : 72   Min.   :0.1075   Min.   :3.000
##  1st Qu.:0.0769   1st Qu.:102   1st Qu.:0.1250   1st Qu.:4.000
##  Median :0.0769   Median :112   Median :0.1307   Median :4.000
##  Mean   :0.0769   Mean   :113   Mean   :0.1303   Mean   :3.988
##  3rd Qu.:0.0769   3rd Qu.:124   3rd Qu.:0.1359   3rd Qu.:4.000
##  Max.   :0.0769   Max.   :153   Max.   :0.1530   Max.   :5.000
##      radius          girth      cohesion      degree       betweenness
##  Min.   :2.000   Min.   :3   Min.   :3   Min.   :5.844   Min.   :41.86
##  1st Qu.:2.000   1st Qu.:3   1st Qu.:3   1st Qu.:5.844   1st Qu.:47.88
##  Median :2.000   Median :3   Median :3   Median :5.844   Median :49.55
##  Mean   :2.314   Mean   :3   Mean   :3   Mean   :5.844   Mean   :49.35
##  3rd Qu.:3.000   3rd Qu.:3   3rd Qu.:3   3rd Qu.:5.844   3rd Qu.:50.97
##  Max.   :3.000   Max.   :3   Max.   :3   Max.   :5.844   Max.   :55.73
##   eccentricity     closeness
##  Min.   :2.935   Min.   :0.005407
##  1st Qu.:3.130   1st Qu.:0.005695
##  Median :3.221   Median :0.005788
##  Mean   :3.234   Mean   :0.005805
##  3rd Qu.:3.325   3rd Qu.:0.005901
##  Max.   :3.662   Max.   :0.006334

compared to: 0.087, 467, 0.499, 5, 3, 3, 1, 6.597, 62.364, 4.13, 0.00512

⇒ the number of triangles, the transitivity, the radius, the average degree, the average betweenness and the eccentricity are larger than in power law graphs with power 1.495, whereas the cohesion and the closeness are smaller.


Limits of the previous approaches

Until now, we have compared the real graph to graphs randomly generated according to a given random model but:

- this approach only gives information about global characteristics of the observed graph;
- none of the distributions of the current characteristics is preserved during the process, especially not the degree distribution, which is central for controlling local/global connectivity, counts of specific patterns...


A null model closer to the real graph...

Sketch of statistical tests on graphs

1. sample at random within the set of graphs with the same degree distribution as the observed graph (B times)
2. compute a numerical statistic for each of these randomly generated graphs
3. by comparing the observed value of the statistic with its distribution over the random graphs, a p-value can be derived (for B large enough)

Two main approaches to sample at random with fixed degrees:
- configuration model [Bender and Canfield, 1978]
- permutation approach [Rao et al., 1996, Roberts Jr., 2000]
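Step 3 of the sketch above can be made concrete with a small helper. The function name `empirical_p_value`, the "+ 1" correction and the null values below are assumptions introduced for the illustration; the talk does not specify how the p-value is computed.

```r
# Empirical (one-sided) p-value of an observed statistic given the values
# obtained on B random graphs. The "+ 1" correction counts the observed graph
# itself as one draw from the null model; it is a common convention, not
# something stated in the talk.
empirical_p_value <- function(observed, null_values,
                              alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  extreme <- if (alternative == "greater") {
    null_values >= observed
  } else {
    null_values <= observed
  }
  (sum(extreme) + 1) / (length(null_values) + 1)
}

# Illustration with placeholder null values (not results from the talk):
# 99 values spread between 0.25 and 0.35, and an observed value of 0.50.
placeholder_null <- seq(0.25, 0.35, length.out = 99)
p_greater <- empirical_p_value(0.50, placeholder_null, "greater")
p_less <- empirical_p_value(0.50, placeholder_null, "less")
c(greater = p_greater, less = p_less)
```

With B random graphs the smallest attainable p-value is 1/(B + 1), which is one way to read the remark "for B large enough".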




Sampling at random within the set of graphs with a given degree distribution

Aim:
- every graph of the set can be sampled
- all graphs have the same probability to be sampled

⇒ MCMC approach

Method:
Start from the observed graph $G$
for $t = 1 \to T$ do
    Select uniformly at random two edges $e^1 = (x^1_i, x^1_j)$ and $e^2 = (x^2_i, x^2_j) \in E$
    $E' \leftarrow (E \setminus \{e^1, e^2\}) \cup \{e^1_s, e^2_s\}$ with $e^1_s = (x^1_i, x^2_j)$ and $e^2_s = (x^2_i, x^1_j)$
    if $G' = (V, E')$ is simple and connected then
        $G \leftarrow G'$
    end if
end for
return $G$
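The edge-switching loop above can be written in a few lines of base R. This is a minimal sketch under stated assumptions, not the implementation used in the talk (which relies on igraph): edges are stored as a two-column matrix, the helper names are hypothetical, and only the switch written in the algorithm is proposed at each step.

```r
# Edge lists are two-column matrices (one row per undirected edge).
edges_are_simple <- function(edges) {
  no_loop <- all(edges[, 1] != edges[, 2])
  keys <- paste(pmin(edges[, 1], edges[, 2]), pmax(edges[, 1], edges[, 2]))
  no_loop && !any(duplicated(keys))
}

edges_are_connected <- function(edges, n) {
  adj <- matrix(FALSE, n, n)
  adj[edges] <- TRUE
  adj[edges[, 2:1, drop = FALSE]] <- TRUE
  reached <- c(TRUE, rep(FALSE, n - 1))   # start the exploration from vertex 1
  repeat {
    frontier <- colSums(adj[reached, , drop = FALSE]) > 0 & !reached
    if (!any(frontier)) break
    reached[frontier] <- TRUE
  }
  all(reached)
}

switch_edges <- function(edges, n, n_iter) {
  for (t in seq_len(n_iter)) {
    picked <- sample(nrow(edges), 2)
    e1 <- edges[picked[1], ]
    e2 <- edges[picked[2], ]
    candidate <- edges
    candidate[picked[1], ] <- c(e1[1], e2[2])   # e1_s = (x1_i, x2_j)
    candidate[picked[2], ] <- c(e2[1], e1[2])   # e2_s = (x2_i, x1_j)
    # the switch is accepted only if the new graph is simple and connected
    if (edges_are_simple(candidate) && edges_are_connected(candidate, n)) {
      edges <- candidate
    }
  }
  edges
}

set.seed(3)
n <- 8
# hypothetical starting graph: a ring on 8 vertices plus two chords
start_edges <- rbind(cbind(1:8, c(2:8, 1)), c(1, 5), c(3, 7))
new_edges <- switch_edges(start_edges, n, n_iter = 100)

# the degree of every vertex is unchanged by construction
rbind(before = tabulate(c(start_edges), nbins = n),
      after  = tabulate(c(new_edges), nbins = n))
```

Each accepted switch removes one edge end and adds one edge end at every vertex involved, which is why the degree sequence is preserved exactly.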




In practice...

This method is used in [Milo et al., 2004] with T = 100. It can be performed using rewire(lesmis, keeping_degseq(niter = 100))

[Figure: two histograms (y-axis: Frequency) over the randomly generated graphs, one for the number of triangles and one for the transitivity]


In practice... for the vertex characteristics

For every vertex, find an empirical p-value which indicates whether its betweenness is higher or lower than expected with respect to its degree: a vertex is flagged when its observed betweenness is higher (resp. lower) than 95% of the betweennesses of the corresponding vertex in the random graphs.

[Figure: the “Les Misérables” network in which the following vertices are labeled: Myriel, Valjean, Listolier, Fameuil, Blacheville, Favourite, Dahlia, Zephine, Fantine, Judge, Champmathieu, Brevet, Chenildieu, Cochepaille, LtGillenormand, Marius, Combeferre, Prouvaire, Feuilly, Courfeyrac, Bahorel, Joly, Grantaire, Gueulemer, Babet, Claquesous, Montparnasse, Brujon, MmeHucheloup]


More on random graphs generation

Sometimes, one wants to compare the observed graph with a more sophisticated (constrained) null model (taking into account some additional information on edges or nodes for instance):

- This can be achieved using the same principle and throwing away the random graphs which do not satisfy the constraints.
  Warning: The more sophisticated the model is, the more costly the simulation will be. For instance, only removing graphs with multiple edges and graphs which are not connected leads to throwing away 47 simulations out of 500.
- Possible solution: [Tabourier et al., 2011] use multiple edge switching to improve such simulations.







Outline

Numerical characteristics

Clustering
- Modularity optimization
- Spectral clustering
- Model based clustering



Issue at stake:

- short overview of different types of methods for vertex clustering
- only simple clustering (although some methods for overlapping clustering, clustering according to vertex/edge attributes, clustering of bipartite graphs... also exist)
- statistical relevance and comparison of clustering results


A short overview of vertex clustering

Purpose: Find communities or modules (i.e., groups of vertices) such that vertices inside the same community are strongly connected whereas vertices from two different communities are weakly connected.

Some approaches to perform such a task:
- optimizing a given criterion (e.g., modularity maximization)
- spectral clustering
- model based clustering
- ... (see [Fortunato and Barthélémy, 2007, Schaeffer, 2007, Brohée and van Helden, 2006])




Clustering based on criterion optimization

- “Cut” criteria: given a number of clusters, $K$, find the partition of $V$, $A_1, \ldots, A_K$, which solves the mincut problem, i.e., which minimizes

  $$\mathrm{cut}(A_1, \ldots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x_i \in A_k,\ x_j \notin A_k} w_{ij}$$

- “Modularity” criterion [Newman and Girvan, 2004]: given a number of clusters, $K$, find the partition of $V$, $C_1, \ldots, C_K$, which maximizes

  $$Q(C_1, \ldots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i,\ x_j \in C_k} (w_{ij} - P_{ij})$$

  with $P_{ij}$: weight of a “null model” (graph with the same degree distribution but no preferential attachment): $P_{ij} = \frac{d_i d_j}{2m}$ with $d_i = \sum_{j \neq i} w_{ij}$ the degree of $x_i$ and $m = \frac{1}{2} \sum_{i} d_i$ the total weight of the edges.
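The modularity criterion can be evaluated directly from its definition. The base R sketch below is an illustration only: the function name `modularity_from_matrix` and the toy graph are assumptions, and the degree is taken as the full row sum of the weight matrix (so that the degrees sum to 2m).

```r
# Modularity Q = 1/(2m) * sum over pairs (x_i, x_j) in the same cluster of
# (w_ij - d_i d_j / (2m)), for a symmetric weight matrix and a membership vector.
modularity_from_matrix <- function(adj, membership) {
  deg <- rowSums(adj)                      # d_i
  two_m <- sum(deg)                        # 2m
  null_weights <- outer(deg, deg) / two_m  # P_ij
  same_cluster <- outer(membership, membership, "==")
  sum((adj - null_weights)[same_cluster]) / two_m
}

# Toy graph (assumption): two triangles {1,2,3} and {4,5,6} joined by edge 3-4.
adj <- matrix(0, 6, 6)
toy_edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(4, 5), c(5, 6), c(4, 6), c(3, 4))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1

q_triangles <- modularity_from_matrix(adj, c(1, 1, 1, 2, 2, 2))  # the two triangles
q_arbitrary <- modularity_from_matrix(adj, c(1, 2, 1, 2, 1, 2))  # an arbitrary split
c(triangles = q_triangles, arbitrary = q_arbitrary)
```

The partition into the two triangles reaches Q = 5/14, whereas the arbitrary split has a negative modularity: it contains fewer within-cluster edges than the null model expects.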



Problem with the “Cut” criterion: the mincut problem often separates individual vertices from the rest of the graph.





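- “RatioCut” criterion: given a number of clusters, $K$, find the partition of $V$, $A_1, \ldots, A_K$, which solves the “RatioCut” problem, i.e., which minimizes

  $$\mathrm{RatioCut}(A_1, \ldots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x_i \in A_k,\ x_j \notin A_k} \frac{w_{ij}}{|A_k|}$$

  (forces larger communities than the mincut problem).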





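- “NCut” criterion: given a number of clusters, $K$, find the partition of $V$, $A_1, \ldots, A_K$, which solves the “NCut” problem, i.e., which minimizes

  $$\mathrm{NCut}(A_1, \ldots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x_i \in A_k,\ x_j \notin A_k} \frac{w_{ij}}{\mathrm{Vol}(A_k)}$$

  in which $\mathrm{Vol}(A_k) = \sum_{x_i \in A_k} d_i$ with $d_i = \sum_{j \neq i} w_{ij}$ (also forces larger communities than the mincut problem).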
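To see how the three cut criteria behave differently, the sketch below evaluates them on a toy graph with a pendant vertex. Everything here is an assumption made for the illustration: the function name `cut_criteria`, the toy graph, and the choice of Vol(A_k) as the sum of the degrees of the vertices of A_k.

```r
# Cut-based criteria for a symmetric weight matrix and a membership vector.
# Vol(A_k) is taken as the sum of the degrees of the vertices of A_k.
cut_criteria <- function(adj, membership) {
  deg <- rowSums(adj)
  clusters <- sort(unique(membership))
  per_cluster <- sapply(clusters, function(k) {
    inside <- membership == k
    boundary <- sum(adj[inside, !inside])   # weights leaving A_k
    c(cut = boundary,
      ratio = boundary / sum(inside),
      ncut = boundary / sum(deg[inside]))
  })
  rowSums(per_cluster) / 2
}

# Toy graph (assumption): two cliques {1,2,3,4} and {5,6,7,8} joined by the
# edges 4-5 and 3-6, plus a pendant vertex 9 attached to vertex 8.
adj <- matrix(0, 9, 9)
toy_edges <- rbind(t(combn(1:4, 2)), t(combn(5:8, 2)), c(4, 5), c(3, 6), c(8, 9))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1

isolate <- cut_criteria(adj, c(rep(1, 8), 2))            # {9} against the rest
balanced <- cut_criteria(adj, c(rep(1, 4), rep(2, 5)))   # one clique against the other
rbind(isolate = isolate, balanced = balanced)
```

The plain cut is smaller for the partition that isolates vertex 9 (1 against 2), whereas RatioCut (0.5625 against 0.45) and NCut both prefer the partition that separates the two cliques.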




Advantages and drawbacks

- mincut is not adapted to vertex clustering in practice (clusters with isolated vertices)
- the other three methods are NP-hard to solve...
- the modularity takes into account asymmetry in the degree distribution by correcting the importance of a vertex by its degree: it is often better adapted to real-life graphs
- [Fortunato and Barthélémy, 2007] showed that modularity has a resolution limit (small communities can be missed). [Bickel and Chen, 2009] gave conditions for consistency of the clusters obtained by modularity optimization in Stochastic Block Models (SBM).

Remark: Relaxation of the RatioCut and NCut problems gives spectral clustering. Modularity optimization is often solved by approximation methods.








A short description of approximation methods for modularity optimization

- simple greedy algorithms ([Newman, 2004] and [Clauset et al., 2004] for a fast version): hierarchical clustering which merges pairs of vertices with the highest contribution to modularity (igraph: cluster_fast_greedy)
- multi-level greedy algorithms ([Blondel et al., 2008], also known as “Louvain algorithm”, and [Noack and Rotta, 2009] for an improved version): hierarchical approach in which vertices are sometimes re-assigned to a different community in a greedy way (igraph: cluster_louvain)
- simulated annealing ([Reichardt and Bornholdt, 2006] uses a spin-glass model which, in some cases, is equivalent to modularity maximization) (igraph: cluster_spinglass(..., gamma = 1, update.rule = "config"))

...to be compared (when usable) with the exact optimization cluster_optimal.




















Examples

res_time <- cbind(
  system.time(res_hierarchical <- cluster_fast_greedy(lesmis)),
  system.time(res_multilevel <- cluster_louvain(lesmis)),
  system.time(res_annealing <- cluster_spinglass(lesmis)),
  system.time(res_exact <- cluster_optimal(lesmis))
)[3, ]

## hierarchical   multilevel    annealing        exact
##        0.002        0.002        1.907       21.656


Computational time (greedy approaches)

Difference (computational time) between the first two approaches (100 evaluations):

library(microbenchmark)
res_micro <- microbenchmark(cluster_fast_greedy(lesmis),
                            cluster_louvain(lesmis))

[Figure: distribution of the computational times of cluster_fast_greedy(lesmis) and cluster_louvain(lesmis); axis: Time [microseconds]]


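Accuracy of the clustering

[Figure: the network colored by cluster for each method; panel titles (method − modularity − number of clusters):]
- hierarchical − 0.5006 − 5
- multilevel − 0.5556 − 6
- simulated annealing − 0.5596 − 7
- exact − 0.56 − 6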


Assessing the relevance of a clustering

Given a graph, the modularity optimization will always return a clustering: how to know that this clustering is meaningful? (i.e., that its modularity is large)

As previously, compare the maximum modularity to the maximum modularity over a large number of randomly generated graphs (with the same degree sequence).

[Figure: histogram of the maximum modularity over the random graphs; x-axis: Modularity (from 0.30 to 0.55), y-axis: Frequency]
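A possible way to run this comparison is sketched below. It rests on several assumptions that are not in the talk: Zachary's karate club network stands in for `lesmis`, the maximum modularity is approximated with the Louvain algorithm rather than the exact optimization, the number of switches per random graph is an arbitrary choice, and disconnected random graphs are not filtered out.

```r
library(igraph)

set.seed(7)
# Stand-in for the observed graph (assumption): Zachary's karate club network.
observed <- make_graph("Zachary")
observed_modularity <- modularity(cluster_louvain(observed))

# Maximum modularity (approximated by the Louvain algorithm) on random graphs
# with the same degree sequence, obtained by edge switching.
n_graphs <- 100
null_modularity <- replicate(n_graphs, {
  random_graph <- rewire(observed, keeping_degseq(niter = 10 * ecount(observed)))
  modularity(cluster_louvain(random_graph))
})

# Empirical p-value: proportion of random graphs reaching the observed modularity
mean(null_modularity >= observed_modularity)
```

A histogram of `null_modularity` with the observed value marked on it gives the same kind of picture as the one described on this slide.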




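Relation between RatioCut and Laplacian

[von Luxburg, 2007] shows that minimizing

$$\mathrm{RatioCut}(A_1, A_2) = \frac{1}{2} \sum_{k=1}^{2} \sum_{x_i \in A_k,\ x_j \notin A_k} \frac{w_{ij}}{|A_k|}$$

is equivalent to the following constrained problem:

$$\min_{A_1, A_2} v^\top L v \quad \text{such that } v \perp 1_n \text{ and } \|v\| = \sqrt{n}$$

for $v$ the vector of $\mathbb{R}^n$ obtained from the partition by:

$$v_i = \begin{cases} \sqrt{|A_2| / |A_1|} & \text{if } x_i \in A_1 \\ -\sqrt{|A_1| / |A_2|} & \text{otherwise,} \end{cases}$$

and $L$ is the Laplacian of the graph, the $n \times n$ matrix with entries:

$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \neq j \\ d_i = \sum_{j \neq i} w_{ij} & \text{otherwise.} \end{cases}$$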
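The relation can be verified numerically on a small example. In the base R sketch below, the toy graph and the partition are assumptions; with the factor 1/2 used in the definition of RatioCut above, the quadratic form equals 2n times RatioCut, a constant factor which does not change the minimizer.

```r
# Toy graph (assumption): two triangles {1,2,3} and {4,5,6} joined by edge 3-4.
n <- 6
adj <- matrix(0, n, n)
toy_edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(4, 5), c(5, 6), c(4, 6), c(3, 4))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1
laplacian <- diag(rowSums(adj)) - adj

in_a1 <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)   # A1 = {1, 2}, A2 = {3, 4, 5, 6}
size_a1 <- sum(in_a1)
size_a2 <- sum(!in_a1)
v <- ifelse(in_a1, sqrt(size_a2 / size_a1), -sqrt(size_a1 / size_a2))

# RatioCut with the factor 1/2, as in the definition above
boundary <- sum(adj[in_a1, !in_a1])
ratio_cut <- (boundary / size_a1 + boundary / size_a2) / 2

quad_form <- drop(t(v) %*% laplacian %*% v)
c(quad_form = quad_form, two_n_ratio_cut = 2 * n * ratio_cut,
  sum_v = sum(v), squared_norm = sum(v^2))
```

Here the quadratic form and 2n × RatioCut are both equal to 9, the entries of v sum to 0 (so v is orthogonal to the constant vector) and its squared norm is n = 6.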


... and more remarks

- this is a discrete (since $v$ can only have two values) and NP-hard problem;
- the same relation holds between the NCut problem and the normalized Laplacian $D^{-1/2} L D^{-1/2}$, in which $D = \mathrm{Diag}(d_1, \ldots, d_n)$;
- a generalization of these results exists for $K > 2$.












Some properties of the Laplacian

Relations with the graph structure: the Laplacian of the graph

[Figure: graph with five vertices labeled 1 to 5]

has a null space spanned by the vectors $(1, 1, 1, 0, 0)^\top$ and $(0, 0, 0, 1, 1)^\top$.

Random walk point of view: if we consider a random walk on the graph with probability to jump from one node to the other equal to $w_{ij} / d_i$, then the average time to go from one node to another (commute time) is given by $L^+$, the pseudo-inverse of $L$ [Fouss et al., 2007].


Two more properties of the Laplacian:

- relations with the graph structure: for connected graphs, the null space of $L$ is spanned by the vector $1_n$;
- random walk point of view: for the random walk with jump probabilities $w_{ij} / d_i$, $\mathrm{NCut}(A_1, A_2)$ is interpreted as the probability to go from $A_1$ to $A_2$ or from $A_2$ to $A_1$.
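The link between the null space of the Laplacian and the connected components can be checked numerically. The toy graph below is an assumption (the edges of the graph drawn on the slide are not recoverable); it simply has the two components {1, 2, 3} and {4, 5}.

```r
# Toy graph (assumption): two connected components, {1, 2, 3} (a path) and {4, 5}.
adj <- matrix(0, 5, 5)
toy_edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1
laplacian <- diag(rowSums(adj)) - adj

eigenvalues <- eigen(laplacian, symmetric = TRUE)$values
n_zero <- sum(abs(eigenvalues) < 1e-10)   # dimension of the null space

indicator_1 <- c(1, 1, 1, 0, 0)
indicator_2 <- c(0, 0, 0, 1, 1)
list(n_zero = n_zero,
     L_times_indicator_1 = drop(laplacian %*% indicator_1),
     L_times_indicator_2 = drop(laplacian %*% indicator_2))
```

The Laplacian has exactly two null eigenvalues, one per connected component, and both indicator vectors are mapped to the zero vector.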






Spectral clustering: relaxing the constraints

K has to be given. Solving $\min_{U} \mathrm{Tr}(U^\top L U)$ for an $n \times K$ matrix $U$ such that $U^\top U = I_K$:

1. Compute the first K eigenvectors of $L$ (those associated with the K smallest eigenvalues), $u_1, \ldots, u_K$, and write $U = (u_1, \ldots, u_K)$ (an $n \times K$ matrix).
2. For $i = 1, \ldots, n$, denote by $y_i \in \mathbb{R}^K$ the i-th row of $U$. Cluster the points $(y_i)_{i=1,\ldots,n}$ using a clustering algorithm (e.g., k-means).

embed_laplacian_matrix(..., no = ..., which = "sa", scaled = ...) and kmeans(..., centers = ..., nstart = 10)
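The two steps can also be followed by hand in base R, without the igraph embedding function. The sketch below is an illustration under assumptions: the toy graph (two cliques joined by one edge), K = 2 and the variable names are all chosen for the example.

```r
set.seed(11)
# Toy graph (assumption): two cliques of 4 vertices, {1..4} and {5..8},
# joined by the edge 4-5.
n <- 8
n_clusters <- 2
adj <- matrix(0, n, n)
toy_edges <- rbind(t(combn(1:4, 2)), t(combn(5:8, 2)), c(4, 5))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1
laplacian <- diag(rowSums(adj)) - adj

# eigen() sorts eigenvalues in decreasing order: the eigenvectors associated
# with the K smallest eigenvalues are therefore the last K columns.
eig <- eigen(laplacian, symmetric = TRUE)
embedding <- eig$vectors[, n:(n - n_clusters + 1), drop = FALSE]

# each row of the embedding is the representation of one vertex
spectral_clusters <- kmeans(embedding, centers = n_clusters, nstart = 10)$cluster
spectral_clusters
```

On this graph the second eigenvector takes opposite signs on the two cliques, so k-means recovers them as the two clusters (cluster labels themselves are arbitrary).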
















Spectral clustering in practice

res_time_spec <- system.time({
  spec_embed <- embed_laplacian_matrix(lesmis, no = 6, which = "sa",
                                       scaled = FALSE)
  res_spectral <- kmeans(spec_embed$X[ ,-1], centers = 6, nstart = 1)
})[3]
res_time_spec

## elapsed
##   0.017

Time is between the greedy approaches for modularity optimization and simulated annealing for modularity optimization.


Accuracy of the clustering

[Figure: the network colored by cluster; panel titles (method − modularity − number of clusters):]
- spectral clustering − 0.4461 − 6
- exact − 0.56 − 6

Modularity is smaller (as expected) and clusters tend to be more unbalanced. An empirical comparison between the performance of spectral clustering and modularity optimization is provided in [Bickel and Chen, 2009]. [Lei and Rinaldo, 2015] give conditions for the consistency of spectral clustering in stochastic block models.


A mixture model for networks

[Snijders and Nowicki, 1997]: The observed network G is supposed to be the realization of some random graph model in which vertices are organized in groups.

Description of the model:

- vertices $x_i$ belong to an unknown class in $\{C_1, \ldots, C_K\}$ (K is given) ⇒ latent (unobserved) variables

  $$Z_i \sim \mathcal{M}(1, \alpha = (\alpha_1, \ldots, \alpha_K))$$

  in which $\alpha_k$ is the probability that $x_i$ belongs to $C_k$

- given the class memberships, the edges between $x_i$ and $x_j$ are all independent and obtained by:

  $$w_{ij} \mid \{Z_{ik} Z_{jk'} = 1\} \sim \mathcal{L}(\cdot, \pi_{kk'})$$

  for a given distribution $\mathcal{L}$: typically, the Bernoulli distribution with probability $\pi_{kk'}$, with

  $$\pi_{kk'} = \begin{cases} p_1 & \text{if } k = k' \\ p_0 & \text{if } k \neq k' \end{cases} \quad \text{for } p_1 > p_0.$$
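To make the generative model concrete, the sketch below simulates one graph from it in base R. All numerical values (number of vertices, class proportions, p1 and p0) are hypothetical and chosen only for the illustration.

```r
set.seed(5)
n <- 60
alpha <- c(0.5, 0.3, 0.2)     # class proportions (hypothetical values)
p_within <- 0.6               # p1, edge probability inside a class
p_between <- 0.05             # p0, edge probability between classes (p1 > p0)
n_classes <- length(alpha)

# latent classes Z_i ~ M(1, alpha), stored as class labels
classes <- sample(n_classes, n, replace = TRUE, prob = alpha)

# pi_{kk'}: p1 on the diagonal, p0 elsewhere
connectivity <- matrix(p_between, n_classes, n_classes)
diag(connectivity) <- p_within

# independent Bernoulli edges given the classes (upper triangle, then symmetrize)
edge_prob <- connectivity[classes, classes]
adj <- matrix(0, n, n)
upper <- upper.tri(adj)
adj[upper] <- rbinom(sum(upper), size = 1, prob = edge_prob[upper])
adj <- adj + t(adj)

# observed edge densities inside and between classes
same_class <- outer(classes, classes, "==")
sbm_density <- c(within = mean(adj[upper & same_class]),
                 between = mean(adj[upper & !same_class]))
sbm_density
```

The resulting matrix `adj` is a symmetric 0/1 adjacency matrix without self loops, in which pairs of vertices from the same class are linked much more often than pairs from different classes.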



Basic principle for using SBM

1. assignments of vertices to groups;
2. parameter estimation ($(\alpha_k)_k$ and $(\pi_{kk'})_{k,k'}$);
3. estimation of the number of groups.

Estimation is made by Bayesian or frequentist approaches and Variational EM (see e.g., [Daudin et al., 2008] for the more computationally efficient frequentist approach). The number of groups can be chosen using ICL [Biernacki et al., 2000].

All this is implemented in the package blockmodels [Léger, 2016]: BM_bernoulli("SBM_sym", as_adjacency_matrix(..., sparse = FALSE)) creates the model, and the estimate() method of the resulting object fits it.
















SBM in practice

library(blockmodels)
res_time_sbm <- system.time({
  res_sbm <- BM_bernoulli("SBM_sym",
                          as_adjacency_matrix(lesmis, sparse = FALSE))
  res_sbm$estimate()
})[3]

res_time_sbm

## elapsed
##   1.821

opt_K <- which.max(res_sbm$ICL)
opt_K

## [1] 6

sbm_clust <- apply(res_sbm$memberships[[opt_K]]$Z, 1, which.max)


Accuracy of the clustering

[Figure: the network colored by cluster; panel titles (method − modularity − number of clusters):]
- SBM clustering − 0.4556 − 6
- exact − 0.56 − 6

Modularity is smaller (as expected) but groups can be interpreted as sets of vertices with similar connection patterns.


Comparing clusterings

Various metrics ((dis)similarities) exist to compare clusterings, among which:

- Rand Index [Rand, 1971], compare(..., method = "rand"): number of pairs of vertices on which the two clusterings agree, divided by the number of pairs of vertices, $n(n-1)/2$
- Normalized Mutual Information [Danon et al., 2005], compare(..., method = "nmi"), based on the mutual information

  $$\sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} \frac{n_{kk'}}{n} \log \frac{n_{kk'}\, n}{n^1_k\, n^2_{k'}}$$

  in which $K_j$ is the number of clusters in clustering $j$, $n^j_k$ is the number of vertices classified into cluster $k$ for clustering $j$ and $n_{kk'}$ is the number of vertices classified into cluster $k$ for clustering 1 and cluster $k'$ for clustering 2. The similarity is normalized so that it is between 0 and 1 (1 is for a perfect match).
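Both similarities can be computed from two membership vectors with a few lines of base R. The sketch below is an illustration under assumptions: the function names are hypothetical, the Rand index is taken over pairs of vertices, and the mutual information is normalized by the average of the two entropies (the case where both clusterings contain a single cluster is not handled).

```r
rand_index <- function(clust1, clust2) {
  same1 <- outer(clust1, clust1, "==")
  same2 <- outer(clust2, clust2, "==")
  pairs <- upper.tri(same1)
  # a pair agrees when it is grouped (or separated) in both clusterings
  mean(same1[pairs] == same2[pairs])
}

nmi <- function(clust1, clust2) {
  n <- length(clust1)
  joint <- table(clust1, clust2) / n          # n_kk' / n
  p1 <- rowSums(joint)                        # n^1_k / n
  p2 <- colSums(joint)                        # n^2_k' / n
  positive <- joint > 0                       # 0 * log(0) is taken as 0
  mutual_info <- sum(joint[positive] * log((joint / outer(p1, p2))[positive]))
  entropy1 <- -sum(p1 * log(p1))
  entropy2 <- -sum(p2 * log(p2))
  2 * mutual_info / (entropy1 + entropy2)
}

# Toy clusterings of six vertices (assumption)
clust_a <- c(1, 1, 1, 2, 2, 2)
clust_b <- c(2, 2, 2, 1, 1, 1)   # same partition as clust_a, labels swapped
clust_c <- c(1, 1, 2, 2, 3, 3)

c(rand_ab = rand_index(clust_a, clust_b), nmi_ab = nmi(clust_a, clust_b),
  rand_ac = rand_index(clust_a, clust_c), nmi_ac = nmi(clust_a, clust_c))
```

Both measures are equal to 1 for `clust_a` and `clust_b`, which shows that they do not depend on the cluster labels; for `clust_a` and `clust_c`, 10 of the 15 pairs agree, so the Rand index is 2/3.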


How do clusterings relate?

Method:

1. compute a dissimilarity based on Rand index or NMI (1 − value)
2. perform clustering (of the results of vertex clustering) using hierarchical clustering (hclust)


[Figure: two dendrograms of the clusterings (sbm, spectral, hierarchical, multilevel, annealing, exact), obtained with hclust (*, "complete"): one on as.dist(compare_rand), titled “Rand index”, and one on as.dist(compare_nmi), titled “NMI”; y-axis: Height]


Any questions?


Bender, E. and Canfield, E. (1978). The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296–307.

Bickel, P. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, USA, 106(50):21068–21073.

Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725.

Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008:1742–5468.

Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7(488).

Clauset, A., Newman, M. E. J., and Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70:066111.

Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics, page P09008.

Daudin, J., Picard, F., and Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18:173–183.

Fortunato, S. and Barthélémy, M. (2007). Resolution limit in community detection. In Proceedings of the National Academy of Sciences, volume 104, pages 36–41. doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.

Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.

Léger, J. (2016). Blockmodels: a R-package for estimating in LBM and SBM, with many pdf, with or without covariates. Preprint arXiv 1602.07587v1. Submitted for publication.

Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237.

Milo, R., Kashtan, N., Itzkovitz, S., Newman, M., and Alon, U. (2004). On the uniform generation of random graphs with prescribed degree sequences. eprint arXiv: cond-mat/0312028v2.

Newman, M. (2004). Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133.

Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.

Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. In SEA 2009: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268, Berlin, Heidelberg. Springer-Verlag.

Rand, W. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850.

Rao, A., Jana, R., and Bandyopadhyay, S. (1996). A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 58(2):225–242.

Reichardt, J. and Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74(016110).

Roberts Jr., J. (2000). Simple methods for simulating sociomatrices with given marginal totals. Social Networks, 22(3):273–283.

Schaeffer, S. (2007). Graph clustering. Computer Science Review, 1(1):27–64.

Snijders, T. and Nowicki, K. (1997). Estimation and prediction for stochastic block-structures for graphs with latent block structure. Journal of Classification, 14:75–100.

Tabourier, L., Roth, C., and Cointet, J. (2011). Generating constrained random graphs using multiple edge switches. ACM Journal of Experimental Algorithmics, 16(1):1.7.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.