Consistent estimation of Mixed Memberships with Successive Projections
Maxim Panov, joint work with E. Marshakov, R. Ushakov and N. Mokrov
Skoltech and IITP
15.05.2018
Community detection: Problem statement
Graph $G(E, V)$:
nodes $v_j$;
edges $A_{ij}$.
Problem: we want to partition the graph in such a way that there are few edges between groups.
Maxim Panov (Skoltech) Overlapping community detection 15.05.2018 2 / 31
Community detection: Overlapping communities
Non-overlapping vs. overlapping communities
Graph models: Erdos-Renyi graph
Simplest possible random graph model:
$A_{ij} \sim \mathrm{Bernoulli}(p)$,
where $A_{ij}$ are independent and $p \in [0, 1]$.
Figure: Erdos-Renyi graph with p = 0.5.
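A minimal Python sketch of sampling from this model. It assumes an undirected graph without self-loops, which the slide does not state explicitly; the helper name `sample_erdos_renyi` and the size `n = 200` are illustrative choices.

```python
import numpy as np

def sample_erdos_renyi(n, p, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with independent Bernoulli(p) edges."""
    rng = np.random.default_rng(seed)
    # Draw only the strict upper triangle, then mirror it so that A is symmetric
    # and has a zero diagonal (assumption: no self-loops).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

n = 200
A = sample_erdos_renyi(n, p=0.5)
# Fraction of node pairs that are connected; it should be close to p.
edge_density = A[np.triu_indices(n, k=1)].mean()
```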
Graph models: Generalized Erdos-Renyi graph
Simple generalization of Erdos-Renyi model:
$A_{ij} \sim \mathrm{Bernoulli}(p_{ij})$,
where $p_{ij} \in [0, 1]$.
In matrix form we can write
$A \sim \mathrm{Bernoulli}(P)$,
where $P = \{p_{ij}\}_{i,j=1}^{n}$.
Question: what types of matrix P allow for community structure?
Graph models: Stochastic block model (SBM)
Figure: Example of stochastic block model and corresponding graph.
Graph models: Mixed membership stochastic block model (MMSB)
Graph edges are generated according to the generalized Erdos-Renyi model:
$A \sim \mathrm{Bernoulli}(P)$.
The probability matrix $P$ can be factorized as
$P = \Theta B \Theta^{\mathrm T}$,
where
$B \in [0, 1]^{K \times K}$ is a symmetric matrix of community-community probabilities;
$\Theta \in [0, 1]^{n \times K}$ is a community membership matrix.
Condition
We assume that
1. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$;
2. (optional) All the community membership vectors are independent draws from the Dirichlet distribution, i.e. $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ for some $\alpha \in \mathbb{R}_+^K$, $i = 1, \dots, n$.
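The generative model above can be sketched in Python as follows. This is an illustration under stated assumptions: the graph is taken to be undirected without self-loops, the values of $B$ and $\alpha$ are borrowed from the default experimental settings of the talk, and `sample_mmsb` is a hypothetical helper name.

```python
import numpy as np

def sample_mmsb(n, B, alpha, seed=0):
    """Draw Theta ~ Dirichlet(alpha) row-wise, form P = Theta B Theta^T, sample A ~ Bernoulli(P)."""
    rng = np.random.default_rng(seed)
    Theta = rng.dirichlet(alpha, size=n)      # each row is non-negative and sums to 1
    P = Theta @ B @ Theta.T                   # edge probabilities p_ij
    # Assumption: undirected graph, no self-loops, so only the upper triangle is sampled.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, Theta, P

B = np.diag([0.3, 0.5, 0.7])
A, Theta, P = sample_mmsb(n=300, B=B, alpha=np.full(3, 1 / 3))
```

Because every row of `Theta` is a vector of convex weights, each entry of `P` is a weighted average of entries of `B` and therefore stays in $[0, 1]$.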
Graph models: MMSB examples
As discussed, in the MMSB model the probability matrix is
$P = \Theta B \Theta^{\mathrm T}$.
It means that
$p_{ij} = \sum_{k,l=1}^{K} \theta_{ik} \theta_{jl} b_{kl}$.
SBM is a particular case of MMSB with the property that for any $i \in \{1, \dots, n\}$ there exists $k \in \{1, \dots, K\}$ such that
$\theta_{ik} = 1$ and $\theta_{il} = 0$ for $l \neq k$,
leading to
$p_{ij} = b_{kl}$
for any $i, j = 1, \dots, n$ and some $k = k(i)$, $l = l(j)$.
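A small numerical check of the SBM special case: with one-hot membership rows, $P = \Theta B \Theta^{\mathrm T}$ simply looks up entries of $B$. The matrix `B` and the `labels` vector below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical symmetric community-community matrix.
B = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.5, 0.3],
              [0.2, 0.3, 0.6]])
# Hypothetical hard community assignment k(i) for five nodes.
labels = np.array([0, 2, 1, 1, 0])

Theta = np.eye(3)[labels]        # one-hot rows: theta_ik = 1 iff k = k(i)
P = Theta @ B @ Theta.T          # MMSB formula
P_lookup = B[np.ix_(labels, labels)]   # p_ij = b_{k(i), l(j)}
```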
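Graph models: Identifiability of MMSB
Problem: if our goal is to estimate the parameters $\Theta$ and $B$, are the true values unique?
Answer: Of course not. For example, take $\Theta^{(1)} = M_1$, $B^{(1)} = I_3$ and $\Theta^{(2)} = I_3$, $B^{(2)} = M_2$ with $M_2 = M_1 M_1^{\mathrm T}$;
then
$P^{(1)} = M_1 I_3 M_1^{\mathrm T} = I_3 M_2 I_3 = P^{(2)}$,
where $I_3$ is the identity matrix of size 3.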
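A concrete instance can be checked numerically. The entries of `M1` below are an assumption chosen for illustration; any row-stochastic `M1` other than the identity works, with `M2` set to `M1 @ M1.T`.

```python
import numpy as np

I3 = np.eye(3)
# Hypothetical membership matrix: every node is split evenly between two communities.
M1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])
M2 = M1 @ M1.T                  # symmetric, entries in [0, 1]

P1 = M1 @ I3 @ M1.T             # parameters (Theta, B) = (M1, I3)
P2 = I3 @ M2 @ I3.T             # parameters (Theta, B) = (I3, M2)
same_P = np.allclose(P1, P2)    # two different parameter pairs, one probability matrix
```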
Graph models: Identifiability of MMSB
Condition (Identifiability)
1. There is at least one "pure" node in each community, i.e. for each $k = 1, \dots, K$ there exists $i$ such that $\theta_{ik} = \sum_{l=1}^{K} \theta_{il} = 1$.
2. Matrix $B \in [0, 1]^{K \times K}$ is full rank.
3. Every row of matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$.
Theorem
If the Condition (Identifiability) is satisfied then the MMSB is identifiable, i.e. for every $P = \Theta B \Theta^{\mathrm T}$ matrices $\Theta$ and $B$ are uniquely defined up to permutation of communities (columns of matrix $\Theta$ and rows and columns of matrix $B$).
Algorithms for parameter estimation in MMSB
There exist several algorithms for parameter estimation in MMSB:
stochastic variational inference (Airoldi et al., 2009; SVI);
tensor spectral method (Anandkumar et al., 2013; Tensor);
geometrical nonnegative matrix factorization (Mao et al., 2013; GeoNMF).
Problems of these methods:
absence of provable guarantees (SVI);
high computational complexity (SVI, Tensor);
applicability only to limited subclass of MMSB (GeoNMF).
Recently, a couple of algorithms were proposed (SPACL by Mao et al. and Mixed-SCORE by Jin et al.) that are based on ideas very similar to ours.
Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix
To account for sparsity:
$P = \rho \Theta B \Theta^{\mathrm T}$,
where $\rho > 0$ is a sparsity parameter and we restrict $\max_{k,l} B_{kl} = 1$.
Spectral decomposition of probability matrix (exact):
$P = U L U^{\mathrm T}$,
where $L \in \mathbb{R}^{K \times K}$ is the diagonal matrix of nonzero eigenvalues and $U \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.
We can conclude that
$U = \Theta F$,
where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix.
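The relation $U = \Theta F$ can be verified numerically on a small example. The sketch below assumes that the first $K$ nodes are pure (one per community), so that $F$ can be read off as the corresponding rows of $U$; the sizes, the Dirichlet parameter and the matrix `rhoB` (standing for $\rho B$) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 3
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                      # assumption: nodes 0..K-1 are pure
rhoB = np.diag([0.3, 0.5, 0.7])            # hypothetical full rank rho * B
P = Theta @ rhoB @ Theta.T

# Exact rank-K spectral decomposition: keep the K eigenvalues of largest magnitude.
eigvals, eigvecs = np.linalg.eigh(P)
top = np.argsort(np.abs(eigvals))[::-1][:K]
U = eigvecs[:, top]

F = U[:K]                                  # rows of U at the pure nodes
residual = np.linalg.norm(U - Theta @ F)   # should vanish up to rounding error
```

Since a pure node has $\theta_i = e_k$, its row of $U$ equals the $k$-th row of $F$, which is why the pure nodes are the vertices of the simplex containing all rows of $U$.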
Successive projection overlapping clustering (SPOC): Spectral properties of probability matrix
We can proceed with the decomposition
$U = \Theta F$.
Importantly, rows $u_i$ of matrix $U$ lie in a simplex.
Figure: scatter plot of the rows of $U$ lying in a simplex.
Successive projection overlapping clustering (SPOC): Successive projection algorithm
Question: How to detect the simplex?
Answer: Successive projection algorithm (Araujo et al., 2001; Gillis and Vavasis, 2014):
1. Find the point with the maximal norm: $j^* = \arg\max_j \|u_j\|$.
2. $f_j = u_{j^*}$.
3. $U = U \left(I - \frac{f_j^{\mathrm T} f_j}{\|f_j\|^2}\right)$.
4. Iterate.
The final output is matrix $F = (f_j)_{j=1}^{K}$.
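The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: points are the rows of the input, the function returns the selected indices (so the caller can take the original rows as vertices), and the test data `V`, `W` are hypothetical.

```python
import numpy as np

def spa(U, K):
    """Successive projection: pick K rows of U that are vertices of the enclosing simplex."""
    R = np.array(U, dtype=float)           # residual copy, U itself is left untouched
    indices = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))   # step 1: row with maximal norm
        f = R[j].copy()                                  # step 2: current vertex
        indices.append(j)
        # Step 3: project every row onto the orthogonal complement of f.
        R = R - np.outer(R @ f, f) / (f @ f)
    return indices

# Hypothetical test data: rows 0..2 are the vertices, the rest are convex combinations.
rng = np.random.default_rng(0)
V = np.array([[1.0, 0.2, 0.1],
              [0.1, 1.2, 0.3],
              [0.2, 0.1, 0.9]])
W = rng.dirichlet(np.ones(3), size=100)
W[:3] = np.eye(3)
X = W @ V

J = spa(X, K=3)
F = X[J]                                   # estimated vertices, in order of selection
```

The norm is a convex function, so its maximum over a simplex is attained at a vertex; after projecting that vertex out, the same argument applies to the remaining ones.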
Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix
Spectral decomposition of adjacency matrix (approximate):
$A \simeq \hat U \hat L \hat U^{\mathrm T}$,
where $\hat L \in \mathbb{R}^{K \times K}$ is the diagonal matrix of top-$K$ eigenvalues and $\hat U \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.
Similarly,
$\hat U = \Theta F + N$,
where $F \in \mathbb{R}^{K \times K}$ is some full rank matrix and $N$ is a noise matrix.
Successive projection overlapping clustering (SPOC): Spectral properties of adjacency matrix
Importantly, rows $\hat u_i$ of matrix $\hat U$ approximately lie in a simplex.
Figure: scatter plot of the rows of $\hat U$ lying approximately in a simplex.
So, we can compute an estimate $\hat F$ of matrix $F$ by the SPA algorithm.
Successive projection overlapping clustering: Resulting estimates
Estimate of the community-community matrix:
$\hat B = \hat F \hat L \hat F^{\mathrm T}$.
Estimate of the community membership matrix:
$\hat \Theta = \hat U \hat F^{-1}$.
Question: What about the efficiency of estimates?
Successive projection overlapping clustering (SPOC)
Algorithm 1 SPOC
Require: Adjacency matrix $A$ and number of communities $K$.
Ensure: Estimated $\hat \rho$, $\hat \Theta$, $\hat B$.
1: Get the rank-$K$ eigenvalue decomposition $A \simeq \hat U \hat L \hat U^{\mathrm T}$.
2: Run SPA algorithm with input $\hat U$, which outputs set of indices $J$ of cardinality $K$.
3: $\hat F = \hat U[J, :]$.
4: $\hat B = \hat F \hat L \hat F^{\mathrm T}$.
5: $\hat \rho = \max_{ij} \hat B_{ij}$.
6: $\hat B = \frac{1}{\hat \rho} \hat B$.
7: $\hat \Theta = \hat U \hat F^{-1}$.
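Algorithm 1 can be sketched in Python as below. This is a hedged illustration: the helper names `spa` and `spoc` are hypothetical, the top-$K$ eigenvalues are selected by magnitude (an assumption), and no post-processing of $\hat\Theta$ (such as clipping to the simplex) is attempted. As a sanity check the sketch is run on a noiseless probability matrix with hypothetical parameters, where recovery should be exact up to a permutation of communities.

```python
import numpy as np

def spa(U, K):
    """Successive projection algorithm on the rows of U; returns K row indices."""
    R = np.array(U, dtype=float)
    J = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        f = R[j].copy()
        J.append(j)
        R = R - np.outer(R @ f, f) / (f @ f)
    return J

def spoc(A, K):
    """Steps 1-7 of Algorithm 1; also returns the selected index set J."""
    eigvals, eigvecs = np.linalg.eigh(A)              # step 1
    top = np.argsort(np.abs(eigvals))[::-1][:K]       # assumption: top-K by magnitude
    L, U = np.diag(eigvals[top]), eigvecs[:, top]
    J = spa(U, K)                                     # step 2
    F = U[J, :]                                       # step 3
    B = F @ L @ F.T                                   # step 4
    rho = B.max()                                     # step 5
    B = B / rho                                       # step 6
    Theta = U @ np.linalg.inv(F)                      # step 7
    return rho, Theta, B, J

# Noiseless check with hypothetical parameters; nodes 0..K-1 are pure.
rng = np.random.default_rng(0)
n, K = 200, 3
Theta_true = rng.dirichlet(np.ones(K), size=n)
Theta_true[:K] = np.eye(K)
rhoB_true = 0.5 * np.array([[1.0, 0.2, 0.1],
                            [0.2, 0.8, 0.3],
                            [0.1, 0.3, 0.6]])
P = Theta_true @ rhoB_true @ Theta_true.T
rho_hat, Theta_hat, B_hat, J = spoc(P, K)
```

In the noiseless case the order of `J` determines the permutation of communities: column `m` of `Theta_hat` corresponds to community `J[m]`.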
Provable efficiency: Davis-Kahan theorem
Lemma (Variant of Davis-Kahan)
Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.
Let $A$ be any symmetric matrix and $\hat U, U \in \mathbb{R}^{n \times K}$ be the $K$ leading eigenvectors of $A$ and $P$, respectively.
Then there exists a $K \times K$ orthogonal matrix $O_P$ such that
$\|\hat U - U O_P\|_F \le \frac{2 \sqrt{2K}\, \|A - P\|}{\lambda_K(P)}$.
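The bound can be checked numerically on one simulated graph. The sketch below makes several assumptions that are not part of the lemma: the MMSB parameters are hypothetical, leading eigenvectors are chosen by eigenvalue magnitude, and the orthogonal matrix is obtained by solving the orthogonal Procrustes problem, which gives the best possible alignment.

```python
import numpy as np

def top_k_eig(M, K):
    """Eigenvalues and eigenvectors of symmetric M with the K largest magnitudes."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:K]
    return vals[idx], vecs[:, idx]

rng = np.random.default_rng(0)
n, K = 400, 3
Theta = rng.dirichlet(np.full(K, 1 / 3), size=n)
P = Theta @ np.diag([0.3, 0.5, 0.7]) @ Theta.T
P = (P + P.T) / 2                                   # remove rounding asymmetry
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)                 # undirected, no self-loops

vals_P, U = top_k_eig(P, K)
_, U_hat = top_k_eig(A, K)

# Orthogonal Procrustes: O minimizes ||U_hat - U O||_F over orthogonal matrices.
W, _, Vt = np.linalg.svd(U.T @ U_hat)
O = W @ Vt

lhs = np.linalg.norm(U_hat - U @ O)                 # Frobenius norm
lam_K = np.abs(vals_P).min()                        # smallest nonzero singular value of P
rhs = 2 * np.sqrt(2 * K) * np.linalg.norm(A - P, 2) / lam_K
```

For graphs of this size the right-hand side is typically quite loose; the lemma becomes informative once $\|A - P\|$ is small relative to $\lambda_K(P)$.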
Provable efficiency: Concentration in spectral norm
Lemma (Lei and Rinaldo, 2015)
Let $A$ be the adjacency matrix of a random graph on $n$ nodes in which edges occur independently.
Set $\mathbb{E}[A] = P = (p_{ij})_{i,j=1,\dots,n}$ and assume that $n \max_{ij} p_{ij} \le d$ for $d \ge c_0 \log n$ and $c_0 > 0$.
Then, for any $r > 0$ there exists a constant $C = C(r, c_0)$ such that
$\|A - P\| \le C \sqrt{d}$
with probability at least $1 - n^{-r}$.
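A quick simulation illustrates the scaling in the lemma: the spectral norm of $A - P$ is compared with $\sqrt{d}$ for $d = n \max_{ij} p_{ij}$. The MMSB parameters are hypothetical, and the constant $C$ of the lemma is not computed; the sketch only reports the observed ratio for one sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 400, 3
Theta = rng.dirichlet(np.full(K, 1 / 3), size=n)
P = Theta @ np.diag([0.3, 0.5, 0.7]) @ Theta.T
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)        # undirected, no self-loops (assumption)

d = n * P.max()                            # smallest d allowed by the lemma
spectral_error = np.linalg.norm(A - P, 2)  # largest singular value of A - P
ratio = spectral_error / np.sqrt(d)        # plays the role of the constant C
```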
Provable efficiency: Quality of SPA
Theorem (Gillis and Vavasis, 2014)
Let $G = FW$ and $\hat G = G + N$. Suppose that $K \ge 2$ and the Condition 2 is satisfied. If in matrix $N$ each column $n_i$ satisfies $\|n_i\|_F \le \varepsilon$ with
$\varepsilon \le \frac{\lambda_{\min}(F)}{1225 \sqrt{r}}$,
then the SPA algorithm with the input $(\hat G, r)$ returns the set of indices $J$ such that there exists a permutation $\pi$ which gives
$\|\hat g_{J(j)} - f_{\pi(j)}\|_2 \le (432\, \kappa(F) + 4)\, \varepsilon$
for all $j = 1, \dots, r$, where $\hat g_k$ and $f_k$ are the columns of matrices $\hat G$ and $F$ correspondingly. Here $\kappa(F) = \frac{\lambda_{\max}(F)}{\lambda_{\min}(F)}$ is the condition number of the matrix $F$.
Provable efficiency: Beyond Davis-Kahan
Lemma (Panov et al., 2017)
Assume that $P \in \mathbb{R}^{n \times n}$ is a rank $K$ symmetric matrix with smallest non-zero singular value $\lambda_K(P)$.
Let $A$ be any symmetric matrix such that $\|A - P\| \le \frac{1}{2} \lambda_K(P)$ and $\hat U, U$ are the $n \times K$ matrices of eigenvectors for matrices $A$ and $P$ corresponding to top-$K$ eigenvalues.
Then
$\|e_i^{\mathrm T} (\hat U - U O_P)\|_F \le 23 K^{1/2} \kappa(P) \frac{\|e_i^{\mathrm T} A\|_F \cdot \|A - P\|}{\lambda_K^2(P)} + \frac{\|e_i^{\mathrm T} (A - P) U\|_F}{\lambda_K(P)}$,
where $e_i$ is a vector of length $n$ with 1 in the $i$-th position and $O_P$ is some orthogonal matrix.
Provable efficiency: Final theorem
Theorem (Panov et al., 2017)
There exist constants $c$ and $C$ depending only on the condition numbers of the matrices $B$ and $\Theta$ and parameter $r$ such that for $\rho \ge c \frac{\log n}{n}$ it holds with probability at least $1 - n^{-r}$ that
$\frac{\|\hat \rho \hat B - \rho \Pi B \Pi^{\mathrm T}\|_F}{\|\rho B\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$
and
$\frac{\|\hat \Theta - \Theta \Pi^{\mathrm T}\|_F}{\|\Theta\|_F} \le C K \sqrt{\frac{\log n}{\rho n}}$,
where $\Pi$ is some permutation matrix and $\rho$ is the maximal value in matrix $B$.
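The two relative errors in the theorem are defined up to a permutation of communities. A minimal sketch of how they could be evaluated for small $K$ is given below; the helper `relative_errors` is hypothetical, it searches all $K!$ permutations by brute force and selects the one minimizing the sum of the two errors (one possible choice, since the theorem only asserts that some permutation works). The arguments `rhoB_hat` and `rhoB` stand for $\hat\rho \hat B$ and $\rho B$.

```python
import numpy as np
from itertools import permutations

def relative_errors(Theta_hat, rhoB_hat, Theta, rhoB):
    """Relative Frobenius errors for B and Theta, minimized over a common permutation."""
    K = rhoB.shape[0]
    best = None
    for perm in permutations(range(K)):
        Pi = np.eye(K)[list(perm)]                       # permutation matrix
        err_B = np.linalg.norm(rhoB_hat - Pi @ rhoB @ Pi.T) / np.linalg.norm(rhoB)
        err_Theta = np.linalg.norm(Theta_hat - Theta @ Pi.T) / np.linalg.norm(Theta)
        if best is None or err_B + err_Theta < sum(best):
            best = (err_B, err_Theta)
    return best

# Usage on hypothetical data: estimates equal to the truth with relabeled communities.
rng = np.random.default_rng(0)
Theta = rng.dirichlet(np.ones(3), size=20)
rhoB = np.array([[0.5, 0.1, 0.05],
                 [0.1, 0.4, 0.15],
                 [0.05, 0.15, 0.3]])
perm = [2, 0, 1]
err_B, err_Theta = relative_errors(Theta[:, perm], rhoB[np.ix_(perm, perm)], Theta, rhoB)
```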
Provable efficiency: Lower bound
Theorem
Consider the MMSB model. Then there exists a constant $c > 0$ such that for $\rho \ge c \frac{\log n}{n}$ the following lower bounds for matrices $\Theta$, $B$ hold:
$\inf_{\hat \Theta} \sup_{\Theta \in \Theta_{n,K}} \mathbb{P}\left(\frac{\|\hat \Theta - \Theta\|_F}{\|\Theta\|_F} \ge C_\Theta \frac{1}{\sqrt{\rho n}}\right) > 0.1$,
$\inf_{\hat B} \sup_{B} \mathbb{P}\left(\frac{\|\hat \rho \hat B - \rho B\|_F}{\|\rho B\|_F} \ge C_B \frac{1}{\rho n}\right) > 0.1$,
where $C_\Theta, C_B > 0$ are some constants.
Provable efficiency: Open question
We currently have a gap between the lower and upper bounds for matrix $B$:
$\frac{c_1}{\rho n} \le \inf_{\hat B} \sup_{B} \frac{\|\hat \rho \hat B - \rho B\|_F}{\|\rho B\|_F} \le \frac{C_1}{\sqrt{\rho n}}$.
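To see how large the gap is, one can divide the upper rate by the lower rate. This is plain arithmetic on the two bounds in the display, with $c_1$ and $C_1$ the unspecified constants appearing there:

```latex
\[
\frac{C_1 / \sqrt{\rho n}}{c_1 / (\rho n)} = \frac{C_1}{c_1} \sqrt{\rho n},
\qquad
\text{so for } \rho \asymp \frac{\log n}{n}
\text{ the bounds differ by a factor of order } \sqrt{\log n}.
\]
```

The factor grows with $\rho n$, so the gap is widest for dense graphs.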
The idea for improved algorithm:
Experiments: Model data
Default parameter settings:
number of nodes n = 5000;
number of communities K = 3;
pure nodes number 3;
Dirichlet parameter 𝛼 = 1/3;
Community-community matrix B = diag(0.3, 0.5, 0.7).
We consider several experiments.
Each experiment was repeated 20 times and results were averaged over runs.
Experiments: Model data
Figure: Experiment with varying number of nodes n.
Experiments: Model data
Figure: Experiment with noisy off-diagonal elements of B.
Experiments: Model data
Figure: Experiment with skewed B matrix.
Experiments: Real data
Figure: Experiments on DBLP co-authorship networks.
Conclusions and outlook
Conclusions:
We proposed the algorithm SPOC for parameter estimation in MMSB, which is computationally efficient.
Theoretical guarantees on performance are provided.
Neither the algorithm nor its analysis is perfect yet.
Outlook:
It is interesting to extend the results to the cases of
dynamical networks;
multiplex networks.