42
Clustering dynamic random graphs Catherine Matias CNRS - Sorbonne Universit´ e, Universit´ e Paris Diderot, Paris [email protected] http://cmatias.perso.math.cnrs.fr/ Journ´ ees de Statistique Saclay, May 2018

Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

  • Upload
    others

  • View
    26

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Clustering dynamic random graphs

Catherine Matias

CNRS - Sorbonne Universite, Universite Paris Diderot, [email protected]

http://cmatias.perso.math.cnrs.fr/

Journees de StatistiqueSaclay, May 2018

Page 2: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 3: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Dynamic interactions data

Types of data and their representation

One should distinguish between

I Long time relations (eg social relations, physical wiring ofrouters, . . . ): graphs sequences

I Short time interactions (eg: pone call, physical encounter,. . . ): temporal networks or stream links

For a nice review, see [Holme(2015)].Pictures that follow are from [Gaumont(2016)].

Page 4: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Graphs sequences

18 Chapitre 1. Etat de l’art

en fonction des contacts qui existent entre les personnes. Il est donc tout naturel des’appuyer initialement sur un graphe dont les nœuds representent des personnes et lesliens representent les interactions entre personnes. En plus du graphe, il est necessaired’ajouter l’etat de chaque nœud, e.g. sain ou infecte. Ainsi, un nœud sain ne peut de-venir infecte que s’il est relie a un nœud infecte dans le graphe. Ce genre de modelesmet donc en avant un chemin de diffusion, e.g. une suite de personnes transmettant lamaladie de l’un a l’autre.

Les premiers travaux [PSV01] s’appuient sur un graphe d’interactions de personnesagregeant toute l’information temporelle. Or, la prise en compte du temps dans cecontexte est primordiale car il est possible que le chemin d’infections simule dans legraphe agrege ne soit pas realisable si l’on prend en compte le temps. Imaginons 3personnes A, B, C telles que A et B interagissent a l’instant 1, et B et C interagissenta l’instant 2. Dans le graphe agrege, il est possible pour C d’infecter A via B alors quedans la realite cela n’est pas possible. L’ajout du temps a donc une forte influence surles resultats obtenus dans le contexte epidemiologique et ce phenomene est d’ailleurstres etudie [GPBC15, KKP+11, JPKK14, HK14, HL14, SWP+14, PJHS14].

Une fois reconnue l’importance du temps, il est necessaire de trouver un nouveauformalisme etendant la theorie des graphes pour en tenir compte. Nous presentonsmaintenant differentes extensions possibles de la theorie des graphes, dans les sous-sections 1.3.1 et 1.3.2. Nous detaillons egalement comment la recherche de commu-nautes se transpose dans ces nouveaux formalismes et plus generalement quels sontles problemes qu’ils permettent de resoudre. Des etats de l’art dans ce domaine ontd’ailleurs deja ete esquisses [BBC+14, CA14, HKW14].

1.3.1 Extensions avec pertes d’informations temporelles

Series de graphes

a

bc

def

ko

nm

g

ij

l

h

a

bc

def

ko

nm

g

ij

l

h

a

bc

def

ko

nm

g

ij

l

h

T=2T=1 T=3FIGURE 1.3 – Exemple de serie de graphes sur trois intervalles de temps.

La premiere solution qui a ete apportee ne prend le temps en compte que partiel-lement. Il s’agit de manipuler une serie de graphes dont chaque graphe represente lereseau durant un intervalle de temps donne. Ainsi, il est possible d’appliquer les outilsde la theorie des graphes sur chaque intervalle. Cependant chaque intervalle de tempsest represente par un graphe agrege. Il y a donc toujours une perte d’information. Plusformellement, une serie de graphe est definie par G = {Gi}i<T ou T est un entier (voirla figure 1.3).

Remarks

I In practice, there could be small variations in theindividuals present at each time step,

I These data are sometimes obtained through aggregationI possible loss of informationI problem of choosing the time window for aggregation.

Page 5: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Temporal networks

1.3. Extensions temporelles des graphes 21

Les graphes multicouches representent les donnees evoluant dans le temps mais ilsmodelisent aussi tres bien d’autres situations. Par exemple, ils representent facilementles differents moyens de transport dans une ville ou chaque moyen de transport (bus,voiture, metro ...) est represente par une couche. Plusieurs travaux [DSRC+13, KAB+14,BBC+14, WFZ15] decrivent les graphes multicouches et leurs applications.

Detection de communautes Grace au formalisme de graphe multicouches, il est pos-sible de traiter le temps de maniere un peu plus fine que dans les series de graphescar il permet de mieux suivre l’evolution des nœuds. Comme un graphe multicouchesest un graphe, il est possible d’adapter les methodes existantes pour tenir compte desdifferents types de liens. C’est le cas d’Infomap [DLAR15], de la modularite [MRM+10,BPW+13, BPW+16] et du SBM [SSTM15, Pei15].

ResumeLes series de graphes, les tenseurs 3D et les graphes multicouches permettent deprendre en compte le temps tout en autorisant l’utilisation de methodes concues pourdes graphes statiques. Or, ces approches reposent sur une decoupe du temps en sous-intervalles durant lesquels le temps n’est plus pris en compte. Il peut etre delicat dedefinir ces intervalles de temps : la construction des graphes agreges entraıne uneperte d’information temporelle et cela a une influence sur la precision temporelle desstructures communautaires qui sont manipulables. Il n’est pas envisageable d’aug-menter le nombre d’intervalles de temps car, d’une part, des graphes agreges auraienttres peu de liens et, d’autre part, le temps de calcul serait tres fortement augmente.

1.3.2 Extensions sans perte d’information temporelle

Graphes temporels

a

bc

def

a

f

bc

de

a

ed

cb

f

a

bc

def

a

f

bc

de

T=1 T=3 T=3.5 T=3.6 T=4

FIGURE 1.5 – Graphe temporel avec des ajouts de lien representes en traitsepais verts et des suppressions de lien representees par des liens pointilles

rouges.

Les graphes temporels (Time Varying Graph ou Evolving Graph) permettent de te-nir compte de l’ensemble de l’information temporelle. Pour cela au lieu de considererdes intervalles de temps, ils considerent l’ensemble des modifications qui affectent legraphe : les ajouts et retraits de liens. En pratique, cela revient a considerer sur chaquelien une fonction de presence dependant du temps qui vaut 1 a un instant t si le lienexiste a cet instant et 0 sinon. Ainsi, il est possible de connaıtre la structure de graphe achaque instant. Ce formalisme est presente dans differents travaux [CFQS11, WZF15]et illustre dans la figure 1.5. Dans cette figure, on voit apparaıtre l’ordre de modifica-tion du graphe. Tout d’abord, les liens (b, f) et (c, d) disparaissent puis les liens (b, e) et(f, c) apparaissent chacun leur tour.

Remarks

I Again, variations in node presence/absence is possible,

I Here, there is no loss of information.

I Ideal setup in the sense that most of the time, we do nothave all this knowledge.

Page 6: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Links streams [Latapy et al.(2017)]

22 Chapitre 1. Etat de l’art

Detection de communautes Dans un graphe temporel, une structure de graphe existea chaque instant. Il est donc possible de calculer apres chaque modification l’evolutiond’une metrique. Par exemple, il est possible de calculer apres l’ajout d’un lien le nou-veau degre interne des nœuds concernes par ce changement. En fonction de l’evolutionde cette metrique, on decide alors d’ajouter ou de retirer un nœud voire de fusionnerdeux communautes. Li et al. [LHB+12] se basent sur le nombre de liens que partageun nœud avec les communautes environnantes. Ainsi, un nœud est toujours dans lacommunaute avec laquelle il partage le plus de liens. Shang et al. [SLX+14], Cordeiroet al. [CSG16] et Sun et al. [SHZ+14] se basent sur l’evolution de la modularite. Ce-pendant, ces approches ne permettent pas de representer l’ensemble des evolutionsde communaute possibles, en particulier l’apparition d’une nouvelle communaute.C’est pourquoi l’evolution de la structure courante peut mener a une structure ayantune faible qualite. Une autre approche a ete proposee par Cazabet et al. [CAH10] afind’ameliorer l’evolution de la partition. Ils utilisent une metrique locale basee sur lenombre de chemins de longueur 2 existant entre un nœud et une communaute. Apreschaque modification, ils considerent egalement la possibilite de creer une nouvellecommunaute sous la forme d’une petite clique. Ainsi, ils assurent une meilleure qualitede la partition au cours de l’evolution.

Flots de liens

Time

b

e

cd

f

a

6 9870 321 104 5

FIGURE 1.6 – Flot de liens entre 6 nœuds ( en ordonnees) au cours dutemps (en abscisses). Dans l’exemple, il existe un lien entre a et b durant

l’intervalle [4, 6].

Dans les graphes temporels, toute l’information temporelle est gardee. Cependant,l’intuition derriere cette methode est qu’il existe une structure de graphe a chaqueinstant. Cette hypothese n’est pas toujours verifiee, en particulier lorsque les liens ap-paraissent et disparaissent tres rapidement. C’est le cas des appels telephoniques quidurent rarement plus d’une heure ou bien de maniere plus frappante avec les SMS etles courriels qui n’ont meme pas de duree. Dans ces contextes, il n’est pas possible desupposer qu’a chaque instant une structure de graphe pertinente existe.

Il faut donc un formalisme et des metriques qui s’adaptent a ce contexte. C’estpour repondre a ce besoin que le formalisme de flot de liens a ete pense. Le but estde construire un objet ne presupposant aucune structure et qui stocke toute l’informa-tion disponible. Meme si le formalisme ne presuppose aucune contrainte structurelle,il se peut que le reseau represente en ait. Par exemple dans les telecommunications,une personne ne peut appeler qu’une ou deux personnes en meme temps. Plusieurs

Remarks

I Here, there is no underlying graph!

I One could add in the data (and in its visualisation) the infothat one individual is not present during some time periods,

I Again, no loss of information.

Page 7: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 8: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Graph clustering: why and how? I

Why?

I Networks are intrinsically heterogeneous: need to accountfor different nodes behaviours,

I Summarise network information through a higher-levelview (zoom-out the network),

I Some networks exhibit modularity: modules orcommunities are groups of nodes with high number ofintra-connections and low number of outer-connections;

I Other structures might be of interest: hierarchical groups,hubs, periphery nodes, homophilic/heterophilic structures,. . .

Page 9: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Graph clustering: why and how? II

How?Many methods, with different aims

I Searching for communities,I Modularity-based approaches;I Random walk algorithms;I Spectral clustering (NB: absolute spectral clust. also

captures heterophilic struct.);I Latent space models by [Hoff et al.(2002)].

I Searching for groups, without any a priori on theirstructure: Stochastic block models (SBMs).SBMs search for groups of nodes with a similarconnectivity behaviour towards the other groups.

I Recently, mixtures of ERGMs [Vu et al.(2013)].

Page 10: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 11: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Stochastic block model (binary graphs)

1 2

3

4

5

6

7

84

5

6

7

8

γ••

9

10

γ••

γ••

γ••

γ••

n = 10, Z5• = 1

A12 = 1, A15 = 0

Binary case (parametric model with θ = (π,γ))

I K groups (=colors •••).I {Zi}1≤i≤n i.i.d. vectors Zi = (Zi1, . . . , ZiK) ∼M(1,π),

with π = (π1, . . . , πK) groups proportions. Zi not observed(latent).

I Observations: presence/absence of an edge {Aij}1≤i<j≤n,

I Conditional on {Zi}’s, the r.v. Aij are independentB(γZiZj ).

Page 12: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Stochastic block model (weighted graphs)

1 2

3

4

5

6

7

84

5

6

7

8

γ••

9

10

γ••

γ••

γ••

γ••

n = 10, Z5• = 1

A12 ∈ R, A15 = 0

Weighted case (parametric model with θ = (π,γ(1),γ(2)))

I Latent variables: idem

I Observations: ’weights’ Aij , where Aij = 0 orAij ∈ Rs \ {0},

I Conditional on the {Zi}’s, the random variables Aij areindependent with distribution

µZiZj (·) = γ(1)ZiZj

f(·, γ(2)ZiZj

) + (1− γ(1)ZiZj

)δ0(·)

(Assumption: f has continuous cdf at zero).

Page 13: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

SBM classification vs community detection

SBM classification

I Nodes classification induced by the model reflects acommon connectivity behaviour;

I Community detection methods focus on communities

I Toy example

SBM clusters Community detection or SBM

Page 14: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Particular cases and generalisations

Particular case: Affiliation model (planted partition)

γ =

α . . . β...

. . ....

β . . . α

(α� β =⇒ community detection)

Some generalisations

I Overlapping groups[Latouche et al.(2011), Airoldi et al.(2008)] for binarygraphs; SBM with covariates; Degree corrected SBM;. . .

I Latent block models (LBM), for array data or bipartitegraphs [Govaert and Nadif(2003)];

I Nonparametric SBM (graphon);

I Dynamic SBM

Page 15: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Overview of algorithms

Goal is MLE. Likelihood computation is untractable for n notsmall.

Parameter estimation

I em algorithm not feasible because latent variables are notindependent conditional on observed ones:P({Zi}i|{Aij}i,j) 6=

∏i P(Zi|{Aij}i,j)

I Alternatives:I Gibbs samplingI Variational approximation to em.I Ad-hoc methods: Composite likelihood or Moment methods

[Ambroise and M.(2012), Bickel et al.(2011)]; Degrees[Channarond et al.(2012)];

Page 16: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Variational approximation principle I

Log-likelihood decomposition

LA(θ) := logP(A;θ) = logP(A,Z;θ)− logP(Z|A;θ) and forany distribution Q on Z,

LA(θ) = EQ(logP(A,Z;θ)) +H(Q) +KL(Q‖P(Z|A;θ))

em principle

I e-step: maximise the quantity EQ(logP(A,Z;θ(t))) +H(Q)with respect to Q. This is equivalent to minimizingKL(Q‖P(Z|A;θ(t))) with respect to Q.

I m-step: keeping now Q fixed, maximize the quantityEQ(logP(A,Z;θ)) +H(Q) with respect to θ and updatethe parameter value θ(t+1) to this maximiser. This isequivalent to maximizing the conditional expectationEQ(logP(A,Z;θ)) w.r.t. θ.

Page 17: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Variational approximation principle II

Variational em

I e-step: search for an optimal Q within a restricted class Q,e.g. class of factorized distr.

Q(Z) =

n∏i=1

Q(Zi), Q? = argminQ∈Q

KL(Q‖P(Z|A;θ(t)))

I m-step: unchanged, i.e.θ(t+1) = argmaxθ EQ?(logP(A,Z;θ))

I A consequence of KL ≥ 0 is the lower bound

LA(θ) ≥ EQ(logP(A,Z;θ)) +H(Q)

So that the variational approximation consists inmaximizing a lower bound on the log-likelihood. Why does it

make sense ?

Page 18: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Model selection

How do we choose the number of groups K?

Frequentist setting

I Maximal likelihood is not available (thus neither AIC orBIC),

I ICL criterion is used [Daudin et al.(2008)] (no consistencyresult on that).

Bayesian setting

I MCMC approach to select number of LBM groups[Wyse and Friel(2012)].

I Exact ICL requires greedy search optimization[Come and Latouche(2015)]

Page 19: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

(Some) SBMs packages/codes

VEM implementations

I MixNet

http://www.math-evry.cnrs.fr/logiciels/mixnet is aC/C++ code and MixeR R package on the CRAN: for binarySBM, directed or not;

I OSBM R package R for Overlapping SBM,http://www.math-evry.cnrs.fr/logiciels/osbm

I Blockmodels R package binary/valued SBM, possibly withcovariates

Page 20: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 21: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 22: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Follow the groups through time

Label switching issue in the dynamic context

t = t1

1 2

3

4

5

6

7

8

9

10

t = t2

1 2

3

5

6

7

84

9

10

Page 23: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Follow the groups through time

Label switching issue in the dynamic context

t = t1

1 2

3

4

5

6

7

8

9

10

t = t2

1 2

3

5

6

7

84

9

10

If the 2 classifications are constructed independently, then it’simpossible to follow the groups evolution. It’s thus mandatoryto do a joint clustering of the graphs.

Page 24: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Dynsbm: a dynamic stochastic blockmodel

Model [M. & Miele(2017)]

I We simply combine a latent Markov chain with weightedSBMs;

I Our graphs may be directed or undirected, binary orweighted; some individuals can appear or disappear;

I Groups and model parameters may change through time;

I Careful discussion on identifiability conditions on themodel.

Inference

I VEM algorithm to infer the nodes groups across time andthe model parameters;

I Model selection criterion (ICL type) to select for thenumber of groups.

Page 25: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Dynamics: Markov chain on latent groups

Latent Markov chain

I Across individuals: (Zi)1≤i≤N iid,

I Across time: Each Zi = (Zti )1≤t≤T is a Markov chain on{1, . . . , Q} with transition π = (πqq′)1≤q,q′≤Q and initialstationary distribution α = (α1, . . . , αQ).

6 C. Matias and V. Miele

··· Zt−1 Zt Zt+1 ···

··· Y t−1 Y t Y t+1 ···

··· Zt−11 Zt

1 Zt+11

···

··· Zt−12 Zt

2 Zt+12

···

......

......

...

··· Zt−1N Zt

N Zt+1N

···

··· Y t−1 Y t Y t+1 ···

π π π π

π π π π

π

φt−1

π

φt

π

φt+1

π

Zt1 Zt

2 · · · Zti · · · Zt

j · · · ZtN−1 Zt

N

Y t12 · · · Y t

1N · · · Y tij · · · Y t

N−2,N−1 Y tN−1,N

Figure 1. Dependency structures of the model. Top: general view corresponding to hidden Markovmodel (HMM) structure; Middle: details on latent structure organisation corresponding to N differentiid Markov chains Zi = (Zt

i )1≤t≤T across individuals; Bottom: details for fixed time point t corre-sponding to SBM structure.

GoalInfer the parameter θ = (π,β,γ), recover the clusters {Zti}i,tand follow their evolution through time.

Page 26: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Application on ecological networks [Miele & M.(2017)] I

Ants dataset[Mersch et al.(2013)]

T=10, N=152

Selection of 3 social groups.

Low turnover : 47% of ants donot switch group.

No group switches betweengroups 1 and 2.

Page 27: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Application on ecological networks [Miele & M.(2017)]II

2 4 6 8 10

Index

● ● ●● ●

●●

●●

group 1 − 1

2 4 6 8 10

0.5

1.0

1.5

2.0

2.5

3.0

Index

conn

ec

● ● ●● ● ● ● ● ●

group 1 − 2

2 4 6 8 10

0.5

1.0

1.5

2.0

2.5

3.0

Index

conn

ec

●● ● ● ● ●

●● ● ●

group 1 − 3

2 4 6 8 10

Index

● ● ●● ● ● ● ● ●

group 2 − 1

2 4 6 8 10

0.5

1.0

1.5

2.0

2.5

3.0

Index

conn

ec

● ●●

● ●●group 2 − 2

2 4 6 8 10

0.5

1.0

1.5

2.0

2.5

3.0

Index

conn

ec

● ●● ●

● ●

● ● ●●

group 2 − 3

●● ● ● ● ●

●● ● ●

group 3 − 1

0.5

1.0

1.5

2.0

2.5

3.0

conn

ec

● ●● ●

● ●

● ● ●●

group 3 − 2

0.5

1.0

1.5

2.0

2.5

3.0

conn

ec

●●

● ●●

●● ●

group 3 − 3

Group 2: a community.Group 3: contacts with all ants fromany groups.Group 1: avoid contacts with group 2.

Perfect match with the threefunctional category groups: nurses,foragers and cleaners

nurses foragers cleaners

1 42 0 02 0 29 23 4 1 29

(75% of ants, staying at least 8/10 steps in same group)

Page 28: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Outline

Dynamic Random Graphs: the data

Graphs clustering: different approaches

The stochastic block model

Clustering dynamic networksClustering graphs sequencesClustering links streams (with no duration)

Page 29: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Longitudinal interaction networks = Stream links view

1

2

3

i

j

...

...

...

t1 t2 t3 . . . ts

Page 30: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Longitudinal interaction networks = point process view

0 Tt1 t2 t3 t4 t5 t6 t7

interactions between individuals i, j

interactions between individuals i, k

interactions between individuals k, l

I We observe a marked point process: the mark is a pair ofindividuals (i, j) that interact at time t.

I Goal: cluster the individuals i (not the processes Nij !)

Page 31: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

ppsbm: a dynamic point process SBM

Model characteristics [M., Rebafka, Villers(2018)]

I Pointwise interactions with no duration only; Individualsare always present;

I Groups are constant through time;

I Conditional on the latent groups Zi, Zj , the point processNij is a non-homogeneous point process with(nonparametric) intensity t 7→ αZi,Zj (t).

I Recover latent groups Z = (Z1, . . . , Zn) and estimate theintensities per groups pairs {α(q,l)(·)}1≤q<l≤Q with VEM

Inference characteristics

I Procedure is semi-parametric: intensities may either beestimated through histograms (with adaptive selection ofthe partition), or kernels.

I ICL to select the number of groups Q.

Page 32: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

London Santander cycles

Data

I Cycles journeys from the Santander cycles hiring stations:departure station, arrival station, time of journey start.

I 1st dataset from Wed. February 1st, 2012, with n = 415stations (=individuals), and M = 17 631 journeys (timepoints)

I 2nd dataset from Thursday February 2nd, 2012: n = 417stations, M = 16 333 journeys.

Model selection of the number of groups Q

ICL selects 6 groups for both days.

Page 33: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

London Santander cycles: geographical projection of theclusters

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

Clustering for 1st dataset.

Page 34: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

The smallest cluster x I

I Contains only 2 bike stations, located at Waterloo andKing’s Cross

I among the stations with highest activities

0

25

50

75

100

0:4

51

:30

2:1

5 33

:45

4:3

05

:15 6

6:4

57

:30

8:1

5 99

:45

10

:30

11

:15

12

12

:45

13

:30

14

:15

15

15

:45

16

:30

17

:15

18

18

:45

19

:30

20

:15

21

21

:45

22

:30

23

:15

24

Time

Ou

tgo

ing

co

un

ts

Kings Cross

0

20

40

60

0:4

51

:30

2:1

5 33

:45

4:3

05

:15 6

6:4

57

:30

8:1

5 99

:45

10

:30

11

:15

12

12

:45

13

:30

14

:15

15

15

:45

16

:30

17

:15

18

18

:45

19

:30

20

:15

21

21

:45

22

:30

23

:15

24

Time

Inco

min

g c

ou

nts

Kings Cross

0

50

100

150

0:4

51

:30

2:1

5 33

:45

4:3

05

:15 6

6:4

57

:30

8:1

5 99

:45

10

:30

11

:15

12

12

:45

13

:30

14

:15

15

15

:45

16

:30

17

:15

18

18

:45

19

:30

20

:15

21

21

:45

22

:30

23

:15

24

Time

Ou

tgo

ing

co

un

ts

Waterloo 3

0

25

50

75

100

0:4

51

:30

2:1

5 33

:45

4:3

05

:15 6

6:4

57

:30

8:1

5 99

:45

10

:30

11

:15

12

12

:45

13

:30

14

:15

15

15

:45

16

:30

17

:15

18

18

:45

19

:30

20

:15

21

21

:45

22

:30

23

:15

24

Time

Inco

min

g c

ou

nts

Waterloo 3

Barplots of outgoing(Ni·(·)) and incoming(N·i(·)) processes fromthe 2 stations i in thesmallest cluster: volumesof connections to all otherstations during day 1.

The cluster is composedof ’outgoing’ stations inthe morning and ’ingoing’stations in the evening.

Page 35: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

The smallest cluster x II

I Stations close to Victoria and Liverpool Street stations alsohave high activity but not the same temporal profile sothey cluster differently,

I This cluster x is due to a specific temporal profile, thatwould not be captured through a snapshot approach.

I The cluster has strong connections with cluster � thatcorresponds to business city center.

Page 36: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Conclusions

Dynamic modeling of interactions is still in its earlydevelopments, lot of things to improve.

Page 37: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

Thank you for your attention !

Page 38: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

References I

E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing.Mixed-membership stochastic blockmodels.Journal of Machine Learning Research, 9:1981–2014, 2008.

C. Ambroise and C. Matias.New consistent and asymptotically normal parameterestimates for random graph mixture models.JRSSB, 74(1):3–35, 2012.

P. Bickel, A. Chen and E. LevinaThe method of moments and degree distributions fornetwork modelsAnn. Statist., 39(5):2280—2301, 2011.

A. Channarond, J.-J. Daudin, and S. Robin.Classification and estimation in the Stochastic Blockmodelbased on the empirical degrees.Electron. J. Statist., 6:2574—2601, 2012.

Page 39: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

References II

E. Come and P. Latouche.Model selection and clustering in stochastic block modelsbased on the exact integrated complete data likelihood.Statistical Modelling, 2015.

J.-J. Daudin, F. Picard, and S. Robin.A mixture model for random graphs.Stat. Comput., 18(2):173–183, 2008.

N. Gaumont.Groupes et communautes dans les flots de liens : desdonnees aux algorithmes.PhD thesis, Universite Pierre et Marie Curie, 2016.

G. Govaert and M. Nadif.Clustering with block mixture models.Pattern Recognition, 36(2):463–473, 2003.

Page 40: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

References III

Hoff, P., A. Raftery, and M. Handcock (2002).Latent space approaches to social network analysis.J. Amer. Statist. Assoc. 97 (460), 1090–98.

P. Holme.Modern temporal network theory: a colloquium.Eur. Phys. J. B, 88(9):234, 2015.

Matthieu Latapy, Tiphaine Viard, Clemence MagnienStream Graphs and Link Streams for the Modeling ofInteractions over TimearXiv:1710.04073

P. Latouche, E. Birmele, and C. Ambroise.Overlapping stochastic block models with application to theFrench political blogosphere.Ann. Appl. Stat., 5(1):309–336, 2011.

Page 41: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

References IV

C. Matias and V. Miele.Statistical clustering of temporal networks through adynamic stochastic block model.JRSSB, 79(4), 1119–1141, 2017

C. Matias, T. Rebafka, and F. Villers.A semiparametric extension of the stochastic block modelfor longitudinal networks.To appear in Biometrika, 2018.

D. P. Mersch, A. Crespi, and L. Keller.Tracking individuals shows spatial fidelity is a key regulatorof ant social organization.Science, 340(6136):1090–1093, 2013.

Page 42: Clustering dynamic random graphscmatias.perso.math.cnrs.fr/Docs/Matias_SFDS_2018.pdfOutline Dynamic Random Graphs: the data Graphs clustering: di erent approaches The stochastic block

References V

V. Miele and C. Matias.Revealing the hidden structure of dynamic ecologicalnetworks.Royal Society Open Science, 4(6), 170251, 2017

Vu, Hunter & SchweinbergerModel-based clustering of large networksThe Annals of Applied Statistics 7(2), 1010–1039, (2013).

J. Wyse and N. Friel.Block clustering with collapsed latent block models.Statistics and Computing, 22(2):415–428, 2012.