
Consistent estimation of Mixed Memberships with Successive Projections

Maxim Panov, joint work with E. Marshakov, R. Ushakov and N. Mokrov

Skoltech and IITP

15.05.2018

Community detection: Problem statement

Graph $G(E, V)$:

nodes $v_j$;

edges $A_{ij}$.

Problem: we want to partition the graph in such a way that there are few edges between groups.

Maxim Panov (Skoltech) Overlapping community detection 15.05.2018 2 / 31

Community detection: Overlapping communities

Non-overlapping vs. overlapping communities


Graph models: Erdos-Renyi graph

Simplest possible random graph model

$A_{ij} = \mathrm{Bernoulli}(p)$,

where the $A_{ij}$ are independent and $p \in [0, 1]$.

Figure: Erdos-Renyi graph with p = 0.5.


Graph models: Generalized Erdos-Renyi graph

Simple generalization of the Erdos-Renyi model:

$A_{ij} = \mathrm{Bernoulli}(p_{ij})$,

where $p_{ij} \in [0, 1]$.

In matrix form we can write

$A \sim \mathrm{Bernoulli}(P)$,

where $P = \{p_{ij}\}_{i,j=1}^{n}$.
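As an illustration, a graph can be sampled from a given probability matrix as in the minimal sketch below. It assumes an undirected graph without self-loops, which the slides do not state explicitly; the helper name `sample_adjacency` is hypothetical. The Erdos-Renyi model is the special case in which every entry of the matrix equals the same p.

```python
import numpy as np

def sample_adjacency(P, rng):
    """Draw a symmetric 0/1 matrix A with A_ij ~ Bernoulli(P_ij) for i < j."""
    n = P.shape[0]
    # Sample every pair independently, keep the strict upper triangle,
    # then mirror it so the graph is undirected and has no self-loops
    # (both are assumptions of this sketch).
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
n, p = 200, 0.5
P_er = np.full((n, n), p)      # Erdos-Renyi: every p_ij equals p
A = sample_adjacency(P_er, rng)

# Fraction of node pairs that are connected; it should be close to p.
edge_density = A[np.triu_indices(n, k=1)].mean()
```

Passing any other symmetric matrix with entries in [0, 1] in place of `P_er` gives a sample from the generalized model.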

Question: what types of matrix P allow for community structure?


Graph models: Stochastic block model (SBM)

Figure: Example of stochastic block model and corresponding graph.


Graph models: Mixed membership stochastic block model (MMSB)

Graph edges are generated according to the generalized Erdos-Renyi model:

$A \sim \mathrm{Bernoulli}(P)$.

The probability matrix $P$ can be factorized as

$P = \Theta B \Theta^{\mathrm{T}}$,

where

$B \in [0, 1]^{K \times K}$ is a symmetric matrix of community-community probabilities;

$\Theta \in [0, 1]^{n \times K}$ is a community membership matrix.

Condition

We assume that

1. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$;

2. (optional) All the community membership vectors are independent draws from the Dirichlet distribution, i.e. $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ for some $\alpha \in \mathbb{R}_{+}^{K}$, $i = 1, \dots, n$.
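A minimal sketch of how the MMSB probability matrix can be assembled under both conditions. The concrete values of n, K, the Dirichlet parameter and the community-community matrix are illustrative assumptions, not values taken from this slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3
alpha = np.full(K, 1.0 / 3.0)            # illustrative Dirichlet parameter
B = np.array([[0.7, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.1, 0.3]])          # symmetric, entries in [0, 1]

# Each row of Theta is one membership vector drawn from Dirichlet(alpha),
# so it is non-negative and sums to 1.
Theta = rng.dirichlet(alpha, size=n)     # shape (n, K)

# Matrix of edge probabilities of the mixed membership model.
P = Theta @ B @ Theta.T                  # shape (n, n)
```

Because the rows of Theta are convex weights, every entry of P is a weighted average of entries of B and therefore stays in [0, 1].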


Graph models: MMSB examples

As discussed, in the MMSB model the probability matrix is

$P = \Theta B \Theta^{\mathrm{T}}$.

It means that

$p_{ij} = \sum_{k,l=1}^{K} \theta_{ik} \theta_{jl} b_{kl}$.

SBM is a particular case of MMSB with the property that for any $i \in \{1, \dots, n\}$ there exists $k \in \{1, \dots, K\}$ such that

$\theta_{ik} = 1$ and $\theta_{il} = 0$, $l \neq k$,

leading to

$p_{ij} = b_{kl}$

for any $i, j = 1, \dots, n$ and some $k = k(i)$, $l = l(j)$.
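The SBM special case can be checked numerically. In the sketch below the labels and the matrix B are illustrative assumptions; with one-hot membership rows the matrix product reduces to a lookup of B at the community labels of the two nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 12, 3
B = np.array([[0.7, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.1, 0.3]])          # illustrative community-community matrix

labels = rng.integers(0, K, size=n)      # k(i): the single community of node i
Theta = np.eye(K)[labels]                # one-hot rows: theta_ik = 1 only for k = k(i)

P_mmsb = Theta @ B @ Theta.T             # general MMSB formula
P_sbm = B[np.ix_(labels, labels)]        # direct lookup p_ij = b_{k(i), k(j)}
```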

Graph models: Identifiability of MMSB

Problem: if our goal is to estimate the parameters $\Theta$ and $B$, are the true values unique?

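Answer: Of course not. For example, the two parameter pairs $(\Theta, B) = (M_1, I_3)$ and $(\Theta, B) = (I_3, M_2)$ give the same probability matrix whenever $M_2 = M_1 M_1^{\mathrm{T}}$:

$P^{(1)} = M_1 I_3 M_1^{\mathrm{T}} = I_3 M_2 I_3 = P^{(2)}$,

where $I_3$ is an identity matrix of size 3.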
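The non-uniqueness can be reproduced with a small numerical example. The matrices from the slide are not available in this transcript, so the value of `M1` below is a hypothetical choice; `M2` is then defined as the product of `M1` with its transpose so that both parametrizations are valid.

```python
import numpy as np

I3 = np.eye(3)

# Hypothetical membership matrix: every node is split between two communities.
M1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])
M2 = M1 @ M1.T                 # symmetric, entries in [0, 1]

# Parametrization 1: Theta = M1, B = I3.
P1 = M1 @ I3 @ M1.T
# Parametrization 2: Theta = I3, B = M2.
P2 = I3 @ M2 @ I3.T
```

Both pairs satisfy the row-sum condition, yet they describe the same P. In this sketch the first parametrization has no node that belongs entirely to one community.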


Graph models: Identifiability of MMSB

Condition (Identifiability)

1. There is at least one “pure” node in each community, i.e. for each $k = 1, \dots, K$ there exists $i$ such that $\theta_{ik} = \sum_{l=1}^{K} \theta_{il} = 1$.

2. Matrix $B \in [0, 1]^{K \times K}$ is full rank.

3. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$.

Theorem

If the Condition (Identifiability) is satisfied then the MMSB is identifiable, i.e. for every $P = \Theta B \Theta^{\mathrm{T}}$ the matrices $\Theta$ and $B$ are uniquely defined up to a permutation of communities (columns of matrix $\Theta$ and rows and columns of matrix $B$).


Algorithms for parameter estimation in MMSB

There exist several algorithms for parameter estimation in MMSB:

stochastic variational inference (Airoldi et al., 2009; SVI);

tensor spectral method (Anandkumar et al., 2013; Tensor);

geometrical nonnegative matrix factorization (Mao et al., 2013; GeoNMF).

Problems of these methods:

absence of provable guarantees (SVI);

high computational complexity (SVI, Tensor);

applicability only to a limited subclass of MMSB (GeoNMF).

Recently, a couple of algorithms were proposed (SPACL by Mao et al. and Mixed-SCORE by Jin et al.) which are based on ideas very similar to ours.


Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

To account for sparsity:

$P = \rho \Theta B \Theta^{\mathrm{T}}$,

where $\rho > 0$ is a sparsity parameter and we restrict $\max_{k,l} B_{kl} = 1$.

Spectral decomposition of probability matrix (exact):

$P = U L U^{\mathrm{T}}$.

We can conclude that

$U = \Theta F$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix.
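The relation between the eigenvectors and the memberships can be verified numerically. The sketch below makes several assumptions that are not in the slide: the first K nodes are made pure, the K eigenvalues largest in absolute value are used, and the values of n, the sparsity parameter and B are illustrative. With pure nodes present, the matrix F is simply the set of rows of U at those nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 300, 3, 0.1
B = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.6]])           # full rank, max entry equal to 1

Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                     # assumption: nodes 0..K-1 are pure
P = rho * Theta @ B @ Theta.T

# Exact spectral decomposition restricted to the K nonzero eigenvalues.
eigvals, eigvecs = np.linalg.eigh(P)
top = np.argsort(np.abs(eigvals))[::-1][:K]
L = np.diag(eigvals[top])
U = eigvecs[:, top]

# Rows of U at the pure nodes: since Theta is the identity there, U[:K] = F.
F = U[:K]
residual = np.abs(U - Theta @ F).max()    # close to zero up to rounding
```

Every row of U is thus a convex combination of the K rows of F, which is why the rows lie in a simplex.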


Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix

We can proceed with decomposition

U = ΘF.

Importantly, the rows $u_i$ of matrix $U$ lie in a simplex:

Figure: rows $u_i$ of matrix $U$ plotted in two dimensions.


Successive projection overlapping clustering (SPOC): Successive projection algorithm

Question: How to detect the simplex?

Answer: Successive projection algorithm (Araujo et al., 2001; Gillis and Vavasis, 2014):

1. Find the point with the maximal norm: $j^* = \arg\max_j \|u_j\|$.

2. $f_j = u_{j^*}$.

3. $U = U \left( I - \frac{f_j^{\mathrm{T}} f_j}{\|f_j\|^2} \right)$.

4. Iterate.

The final output is the matrix $F = (f_j)_{j=1}^{K}$.
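The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `spa` is hypothetical, the function returns the selected row indices, and F is then read off from the original (unprojected) matrix, which is one possible reading of the final step.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)          # working copy that gets projected
    indices = []
    for _ in range(K):
        # Step 1: row with the maximal Euclidean norm.
        j_star = int(np.argmax(np.linalg.norm(R, axis=1)))
        # Step 2: remember it as the next vertex.
        f = R[j_star].copy()
        indices.append(j_star)
        # Step 3: project all rows onto the orthogonal complement of f.
        R = R - np.outer(R @ f, f) / (f @ f)
    return indices

# Noise-free toy data: rows are convex combinations of the rows of F_true,
# and rows 0..K-1 are the vertices themselves (all values are assumptions).
rng = np.random.default_rng(0)
K = 3
F_true = np.array([[1.0, 0.2, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.2, 0.0, 1.0]])
Theta = rng.dirichlet(np.ones(K), size=100)
Theta[:K] = np.eye(K)
U = Theta @ F_true

J = spa(U, K)
F = U[J]                                  # recovered vertices, in selection order
```

In this noise-free setting the selected indices are exactly the vertex rows, possibly in a different order.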


Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Spectral decomposition of probability matrix (approximate):

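$A \simeq \hat{U} \hat{L} \hat{U}^{\mathrm{T}}$,

where $\hat{L} \in \mathbb{R}^{K \times K}$ is the diagonal matrix of top-$K$ eigenvalues and $\hat{U} \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.

Similarly,

$\hat{U} = \Theta F + N$,

where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix and $N$ is a noise matrix.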


Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix

Importantly, the rows $\hat{u}_i$ of matrix $\hat{U}$ approximately lie in a simplex:

Figure: rows $\hat{u}_i$ of matrix $\hat{U}$ plotted in two dimensions.

So, we can compute an estimate $\hat{F}$ of matrix $F$ by the SPA algorithm.


Successive projection overlapping clustering: Resulting estimates

Estimate of the community-community matrix:

$\hat{B} = \hat{F} \hat{L} \hat{F}^{\mathrm{T}}$.

Estimate of community membership matrix:

$\hat{\Theta} = \hat{U} \hat{F}^{-1}$.

Question: What about the efficiency of estimates?


Successive projection overlapping clustering (SPOC)

Algorithm 1 SPOC

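Require: Adjacency matrix $A$ and number of communities $K$.

Ensure: Estimated $\hat{\rho}$, $\hat{\Theta}$, $\hat{B}$.

1: Get the rank-$K$ eigenvalue decomposition $A \simeq \hat{U} \hat{L} \hat{U}^{\mathrm{T}}$.

2: Run SPA algorithm with input $\hat{U}$, which outputs set of indices $J$ of cardinality $K$.

3: $\hat{F} = \hat{U}[J, :]$.

4: $\hat{B} = \hat{F} \hat{L} \hat{F}^{\mathrm{T}}$.

5: $\hat{\rho} = \max_{ij} \hat{B}_{ij}$.

6: $\hat{B} = \frac{1}{\hat{\rho}} \hat{B}$.

7: $\hat{\Theta} = \hat{U} \hat{F}^{-1}$.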
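Algorithm 1 translates almost line by line into NumPy. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `spa` and `spoc` are hypothetical, the top-K eigenvalues are taken by absolute value, and the sanity check feeds the exact probability matrix instead of a sampled adjacency matrix so that the true parameters are recovered up to rounding. The last three lines of the demo use the known pure nodes only to undo the permutation of communities.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)
    indices = []
    for _ in range(K):
        j_star = int(np.argmax(np.linalg.norm(R, axis=1)))
        f = R[j_star].copy()
        indices.append(j_star)
        R = R - np.outer(R @ f, f) / (f @ f)
    return indices

def spoc(A, K):
    """Steps 1-7 of Algorithm 1; returns (rho_hat, Theta_hat, B_hat)."""
    # 1: rank-K eigendecomposition (eigenvalues largest in absolute value).
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:K]
    L_hat, U_hat = np.diag(eigvals[top]), eigvecs[:, top]
    # 2-3: SPA picks K rows of U_hat that serve as simplex vertices.
    J = spa(U_hat, K)
    F_hat = U_hat[J, :]
    # 4-6: community-community matrix and its normalization.
    B_hat = F_hat @ L_hat @ F_hat.T
    rho_hat = B_hat.max()
    B_hat = B_hat / rho_hat
    # 7: memberships as coordinates of each row in the basis F_hat.
    Theta_hat = U_hat @ np.linalg.inv(F_hat)
    return rho_hat, Theta_hat, B_hat

# Noise-free sanity check with illustrative parameters.
rng = np.random.default_rng(0)
n, K, rho = 300, 3, 0.1
B = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.6]])
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                      # nodes 0..K-1 are pure
P = rho * Theta @ B @ Theta.T

rho_hat, Theta_hat, B_hat = spoc(P, K)

# Undo the community permutation using the known pure nodes.
perm = np.argmax(Theta_hat[:K], axis=1)
Theta_aligned = Theta_hat[:, perm]
B_aligned = B_hat[np.ix_(perm, perm)]
```

On a sampled adjacency matrix the same call returns approximate estimates; the rows of the estimated membership matrix are then not guaranteed to be non-negative or to sum exactly to 1.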


Provable efficiency: Davis-Kahan theorem

Lemma (Variant of Davis-Kahan)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix and $\hat{U}, U \in \mathbb{R}^{n \times K}$ be the $K$ leading eigenvectors of $A$ and $P$, respectively.

Then there exists a $K \times K$ orthogonal matrix $O_P$ such that

$\|\hat{U} - U O_P\|_F \le \frac{2\sqrt{2K}\,\|A - P\|}{\lambda_K(P)}$.
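The bound can be compared with the actual eigenvector perturbation on simulated data. In the sketch below all model parameters are illustrative assumptions. The matrix B is chosen positive definite so that P is positive semidefinite and its leading eigenvectors are those of the K largest eigenvalues. The orthogonal matrix is taken as the solution of the orthogonal Procrustes problem, which minimizes the left-hand side, so the inequality of the lemma must hold for it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 300, 3
B = np.array([[0.7, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.1, 0.3]])                 # positive definite, so P is PSD
Theta = rng.dirichlet(np.full(K, 1.0 / 3.0), size=n)
P = Theta @ B @ Theta.T                         # symmetric, rank K

# Symmetric Bernoulli sample of the adjacency matrix (no self-loops).
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

# eigh returns eigenvalues in ascending order; the last K are the leading ones.
vals_P, vecs_P = np.linalg.eigh(P)
vals_A, vecs_A = np.linalg.eigh(A)
U, U_hat = vecs_P[:, -K:], vecs_A[:, -K:]
lambda_K = vals_P[-K]                           # smallest nonzero eigenvalue of P

# Orthogonal Procrustes: the orthogonal O minimizing ||U_hat - U O||_F.
W, _, Vt = np.linalg.svd(U.T @ U_hat)
O = W @ Vt

lhs = np.linalg.norm(U_hat - U @ O)             # Frobenius norm
rhs = 2 * np.sqrt(2 * K) * np.linalg.norm(A - P, 2) / lambda_K
```

Comparing `lhs` and `rhs` for several values of n gives a feeling for how conservative the bound is.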


Provable efficiency: Concentration in spectral norm

Lemma (Lei and Rinaldo, 2015)

Let $A$ be the adjacency matrix of a random graph on $n$ nodes in which edges occur independently.

Set $E[A] = P = (p_{ij})_{i,j=1,\dots,n}$ and assume that $n \max_{ij} p_{ij} \le d$ for $d \ge c_0 \log n$ and $c_0 > 0$.

Then, for any $r > 0$ there exists a constant $C = C(r, c_0)$ such that

$\|A - P\| \le C \sqrt{d}$

with probability at least $1 - n^{-r}$.
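A quick simulation illustrates the scaling in the lemma. The sketch assumes an Erdos-Renyi graph with p = 0.1 and takes d equal to n times the maximal edge probability; these choices are illustrative only. If the spectral norm of the deviation grows like the square root of d, the ratio stored in `ratios` should stay roughly constant as n increases.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.1
ratios = {}
for n in (200, 400, 800):
    P = np.full((n, n), p)
    # Symmetric Bernoulli sample without self-loops.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(float)
    d = n * P.max()
    # Spectral norm of the deviation, measured in units of sqrt(d).
    ratios[n] = np.linalg.norm(A - P, 2) / np.sqrt(d)
```

The simulation says nothing about the value of the constant C in the lemma; it only shows the order of magnitude for one random draw per n.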


Provable efficiency: Quality of SPA

Theorem (Gillis and Vavasis, 2014)

Let $G = FW$ and $\hat{G} = G + N$. Suppose that $K \ge 2$ and the Condition 2 is satisfied. If in matrix $N$ each column $n_i$ satisfies $\|n_i\|_F \le \varepsilon$ with

$\varepsilon \le \frac{\lambda_{\min}(F)}{1225 \sqrt{r}}$,

then the SPA algorithm with the input $(\hat{G}, r)$ returns the set of indices $J$ such that there exists a permutation $\pi$ which gives

$\|\hat{g}_{J(j)} - f_{\pi(j)}\|_2 \le (432\,\kappa(F) + 4)\,\varepsilon$

for all $j = 1, \dots, r$, where $\hat{g}_k$ and $f_k$ are the columns of matrices $\hat{G}$ and $F$ correspondingly. Here $\kappa(F) = \frac{\lambda_{\max}(F)}{\lambda_{\min}(F)}$ is the condition number of the matrix $F$.


Provable efficiency: Beyond Davis-Kahan

Lemma (Panov et al., 2017)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest non-zero singular value $\lambda_K(P)$.

Let $A$ be any symmetric matrix such that $\|A - P\| \le \frac{1}{2} \lambda_K(P)$ and $\hat{U}, U$ are the $n \times K$ matrices of eigenvectors for matrices $A$ and $P$ corresponding to the top-$K$ eigenvalues.

Then

$\|e_i^{\mathrm{T}} (\hat{U} - U O_P)\|_F \le 23 K^{1/2} \kappa(P) \frac{\|e_i^{\mathrm{T}} A\|_F \cdot \|A - P\|}{\lambda_K^2(P)} + \frac{\|e_i^{\mathrm{T}} (A - P) U\|_F}{\lambda_K(P)}$,

where $e_i$ is a vector of length $n$ with 1 in the $i$-th position and $O_P$ is some orthogonal matrix.


Provable efficiency: Final theorem

Theorem (Panov et al., 2017)

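There exist constants $c$ and $C$ depending only on the condition numbers of the matrices $B$ and $\Theta$ and parameter $r$ such that for $\rho \ge c \frac{\log n}{n}$ it holds with probability at least $1 - n^{-r}$ that

$\frac{\|\hat{\rho} \hat{B} - \rho \Pi B \Pi^{\mathrm{T}}\|_F}{\|\rho B\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$

and

$\frac{\|\hat{\Theta} - \Theta \Pi^{\mathrm{T}}\|_F}{\|\Theta\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$,

where $\Pi$ is some permutation matrix and $\hat{\rho}$ is the maximal value in matrix $\hat{B}$.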
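Since the parameters are only defined up to a permutation of communities, the relative errors in the theorem have to be evaluated after aligning the estimate with the truth. A minimal sketch of such a metric follows; the function name is hypothetical, and the brute-force search over all K! permutations is an assumption that is only practical for small K.

```python
import itertools
import numpy as np

def aligned_relative_errors(Theta_hat, B_hat, Theta, B):
    """Relative Frobenius errors for the relabeling that best matches Theta."""
    K = Theta.shape[1]
    best = (np.inf, np.inf)
    for perm in itertools.permutations(range(K)):
        perm = list(perm)
        # Permute the columns of Theta_hat and, consistently, rows and columns of B_hat.
        err_theta = np.linalg.norm(Theta_hat[:, perm] - Theta) / np.linalg.norm(Theta)
        err_B = np.linalg.norm(B_hat[np.ix_(perm, perm)] - B) / np.linalg.norm(B)
        if err_theta < best[0]:
            best = (err_theta, err_B)
    return best

# Toy check: an estimate that equals the truth with relabeled communities.
rng = np.random.default_rng(0)
Theta = rng.dirichlet(np.ones(3), size=50)
B = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.6]])
shuffle = [2, 0, 1]
Theta_hat = Theta[:, shuffle]
B_hat = B[np.ix_(shuffle, shuffle)]

err_theta, err_B = aligned_relative_errors(Theta_hat, B_hat, Theta, B)
```

To match the first inequality of the theorem, the matrices passed for B should already be multiplied by the corresponding sparsity parameters.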


Provable efficiency: Lower bound

Theorem

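Consider the MMSB model. Then there exists a constant $c > 0$ such that for $\rho \ge c \frac{\log n}{n}$ the following lower bounds for matrices $\Theta$, $B$ hold:

$\inf_{\hat{\Theta}} \sup_{\Theta \in \Theta_{n,K}} \mathbb{P}\left( \frac{\|\hat{\Theta} - \Theta\|_F}{\|\Theta\|_F} \ge C_{\Theta} \frac{1}{\sqrt{\rho n}} \right) > 0.1$,

$\inf_{\hat{B}} \sup_{B} \mathbb{P}\left( \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \ge C_B \frac{1}{\rho n} \right) > 0.1$,

where $C_{\Theta}, C_B > 0$ are some constants.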


Provable efficiency: Open question

We currently have a gap between the lower and upper bounds for matrix $B$:

$\frac{c_1}{\rho n} \le \inf_{\hat{B}} \sup_{B} \frac{\|\hat{\rho} \hat{B} - \rho B\|_F}{\|\rho B\|_F} \le \frac{C_1}{\sqrt{\rho n}}$.

The idea for improved algorithm:


Experiments: Model data

Default parameter settings:

number of nodes n = 5000;

number of communities K = 3;

pure nodes number 3;

Dirichlet parameter 𝛼 = 1/3;

Community-community matrix B = diag(0.3, 0.5, 0.7).

We consider several experiments. Each experiment was repeated 20 times and results were averaged over runs.


Experiments: Model data

Figure: Experiment with varying number of nodes n.


Experiments: Model data

Figure: Experiment with noisy off-diagonal elements of B.


Experiments: Model data

Figure: Experiment with skewed B matrix.


Experiments: Real data

Figure: Experiments on DBLP co-authorship networks.


Conclusions and outlook

Conclusions:

We proposed the algorithm SPOC for parameter estimation in MMSB which is computationally efficient.

Theoretical guarantees on performance are provided.

Neither the algorithm nor its analysis is perfect yet.

Outlook:

It is interesting to extend the results to the cases of

dynamical networks;

multiplex networks.


