25
Exact recovery in the Ising blockmodel Quentin Berthet Optimization and Statistical Learning - Les Houches - 2017

Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

  • Upload
    doannga

  • View
    217

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Exact recovery in the Ising blockmodel

Quentin Berthet

Optimization and Statistical Learning - Les Houches - 2017

Page 2: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

P. Rigollet (MIT) P. Srivastava (Caltech → Tata Inst.)

• Exact recovery in the Ising blockmodel

Q. Berthet, P.R, and P. Srivastava

arxiv.org/abs/1612.03880

Page 3: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Motivation

• Finding communities in populations, based on similar behavior and influence.

• One of the justifications for stochastic blockmodels

• What if we observe the behavior, not the graph?

Q.Berthet - Les Houches - OSL 2017 3/24

Page 4: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Motivation

SS SS

Model with p individuals, σ ∈ −1, 1p and balanced communities (S, S).

PS(σ) = ︸ ︷︷ ︸?

.

Q.Berthet - Les Houches - OSL 2017 4/24

Page 5: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Motivation

SS SS

/p/p

/p/p

↵/p↵/p

Model with p individuals, σ ∈ −1, 1p and balanced communities (S, S).

PS(σ) =1

Zα,βexp

[ β2p

∑i∼j

σiσj +α

2p

∑ij

σiσj

].

Q.Berthet - Les Houches - OSL 2017 5/24

Page 6: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Motivation

SS SS

/p/p

/p/p

↵/p↵/p

Model with p individuals, σ ∈ −1, 1p and balanced communities (S, S).

PS(σ) =1

Zα,βexp

[ β2p

∑i∼j

σiσj +α

2p

∑ij

σiσj

].

Q.Berthet - Les Houches - OSL 2017 6/24

Page 7: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Problem description

Ising blockmodel:

PS(σ) =1

Zα,βexp

[ β2p

∑i∼j

σiσj +α

2p

∑ij

σiσj

]=

1

Zα,βexp

(−HS,α,β(σ)

).

Energy decreases (probability increases) with more agreement inside each block.

• Blockmodel: PS(σi = σj) =

b for all i ∼ ja for all i j

• Balance: |S| = |S| = p/2 ,

• Homophily: β > 0⇔ b > 1/2 ,

• Assortativity: β > α⇔ b > a .

Observations: σ(1), . . . , σ(n) ∈ −1, 1p i.i.d. from PS.

Objective: recover the balanced partition (S, S) from observations.

Q.Berthet - Les Houches - OSL 2017 7/24

Page 8: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Stochastic blockmodels

• one observation of random graph onp vertices

P(i↔ j) =

b for all i ∼ ja for all i j

• Exact recovery using SDP iff

a = alog p

p, b = b

log p

p

and

(a + b)/2 > 1 +√

ab

Abbe, Bandeira, Hall ’14Hajek, Wu ’16

Wigner matrices

Graphical models / MRF

• n observations σ(1), . . . , σ(n) i.i.d.

P(σ) ∝ exp[ β2p

∑i,j

Jijσiσj

]

• Goal estimating sparse J = Jij(max degree d)

• Sample complexity n 2d log p

Chow-Liu ’68Bresler, Mossel, Sly ’08Santhanam, Wainwright ’12Bresler ’15Vuffray, Misra, Lokhov, Chertkov ’16

Wishart matrices

Q.Berthet - Les Houches - OSL 2017 8/24

Page 9: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Problem overview

• Structure of the problem visible in the covariance matrix Σ

Σ = E[σσ>] =

(∆ ΩΩ ∆

)+ (1−∆)Ip .

• Difficulty of the problem related with the values of quantities ∆,Ω ∈ (−1, 1)

∆ = 2b− 1 , Ω = 2a− 1 .

• Parallel with the stochastic block model on graphs with independent edges

• Main difficulty of the analysis: Scaling of ∆− Ω with p?

Q.Berthet - Les Houches - OSL 2017 9/24

Page 10: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Maximum likelihood estimation

• Log-likelihood Ln(S) = −n logZα,β +n

2Tr[ΣQS]

• Maximum likelihood estimator:

V ∈ argmaxV ∈P

Tr[ΣV ] , where P = vv> : v ∈ −1, 1p, v>1[p] = 0 .

• Define Γ = PΣP and Γ = P ΣP , for a projector P on the orthogonal of 1:

Γ = (1−∆)P + p∆− Ω

2uSu

>S , uS =

1√p

(1S − 1S)

• For all V ∈ P, Tr[ΓV ] = Tr[ΣV ], so equivalently

V ∈ argmaxV ∈P

Tr[ΓV ]

Q.Berthet - Les Houches - OSL 2017 10/24

Page 11: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

SDP relaxation

V ∈ argmaxV ∈P

Tr[ΓV ] , where P = vv> : v ∈ −1, 1p, v>1[p] = 0 .

NP-Hard (Min bisection)

• Semidefinite convex relaxation of P

E = V : diag(V ) = 1, V 0 .

Change of variable V = vv>

MAXCUT Goemans-Williamson (95)

• Point V solution of maxV ∈CTr[ΓV ]equivalent to Γ ∈ NC(V )

• Relaxation is tight for populationmatrix Γ: V = VS if n =∞.

VSVS

PP

NP(VS)NP(VS)

Q.Berthet - Les Houches - OSL 2017 11/24

Page 12: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

SDP relaxation

V ∈ argmaxV ∈P

Tr[ΓV ] , where P = vv> : v ∈ −1, 1p, v>1[p] = 0 .

NP-Hard (Min bisection)

• Semidefinite convex relaxation of P

E = V : diag(V ) = 1, V 0 .

Change of variable V = vv>

MAXCUT Goemans-Williamson (95)

• Point V solution of maxV ∈CTr[ΓV ]equivalent to Γ ∈ NC(V )

• Relaxation is tight for populationmatrix Γ: V = VS if n =∞.

VSVS

EE

PP

NE(VS)NE(VS)NP(VS)NP(VS)

Q.Berthet - Les Houches - OSL 2017 11/24

Page 13: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Exact recovery

• Explicit description of the normal cone to VS in the case of relaxation on E

V = VS ⇐⇒ Γ ∈ NE(VS) ⇐⇒ LS(Γ) := diag(ΓvSv>S )− Γ 0 .

• Result involves a ‘signed’ laplacian matrix

LS(C) := diag(CvSv>S )− C .

• In the population case (n =∞), we note that

LS(Γ) = (1−∆)1[p]√p

1>[p]√p

+ p∆− Ω

2(Ip − uSu>S ) 0 .

• Finite sample case:

‖LS(Γ− Γ)‖op ≤ p∆− Ω

2=⇒ LS(Γ) 0 =⇒ V = VS .

Q.Berthet - Les Houches - OSL 2017 12/24

Page 14: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Exact recovery

• Upper bound: we have V = VS with probability 1− δ for

n &1

Cα,β

log(p/δ)

∆− Ω,

by bounding ‖LS(Γ− Γ)‖op, a sum of independent matrices. Tropp 12

• Matching lower bound: Fano’s inequality yields

n ≤ γ

β − αlog(p/4)

∆− Ω=⇒ P(recovery) . γ

• Full understanding of the scaling of ∆− Ω needed.

Q.Berthet - Les Houches - OSL 2017 13/24

Page 15: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

The Curie-Weiss model (α = β) Σ =

(∆ ∆∆ ∆

)+ (1−∆)Ip

• Mean magnetization: µ = 1>σp ∈ [−1, 1]. Observe that

∆ ≈ 1

p2

p∑i,j=1

E[σiσj]−1

p= E[µ2]− 1

p≈ E[µ2]

• Free energy: µ is a sufficient statistic

Pβ(µ) ≈ 1

Zβexp

(− p

4gCWβ (µ)

), gCW

β (µ) = −2βµ2 + 4h(1 + µ

2

)• Ground states: Minimizers G ⊂ [−1, 1] of gCW

β (µ).

• Concentration: µ ≈ ground state with exponentially large probability so

∆ ≈ E[µ2] ≈ 1

|G|∑s∈G

s2 .

Q.Berthet - Les Houches - OSL 2017 14/24

Page 16: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Free energy of the Curie-Weiss model

Ground states G = µ(β),−µ(β), µ(β) ≥ 0:

-1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0

β = 0 β = 1 β = 1.2 β = 1.8

0.0 0.5 1.0 1.5 2.0 2.50.0

0.2

0.4

0.6

0.8

1.0

∆ ≈ 1

|G|∑s∈G

s2 = µ(β)2

β 7→ µ(β)

Q.Berthet - Les Houches - OSL 2017 15/24

Page 17: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Free energy of the Ising blockmodel

• Energy function of the mean magnetizations: (µS, µS) =2

p

(1>Sσ,1

>Sσ)

PS(σ) =1

Zα,βexp

(− p

8

(− βµ2

S − βµ2S − 2αµS µS

))

• Marginal: number of configurations with magnetizations µ is( (p/2)

1+µ2 (p/2)

)PS(µS, µS) ≈ 1

Zα,βexp

(− p

8gα,β(µS, µS)

)where gα,β is the free energy defined by

gα,β(µS, µS) = −βµ2S − βµ2

S − 2αµS µS + 4h(1 + µS

2

)+ 4h

(1 + µS2

).

Q.Berthet - Les Houches - OSL 2017 16/24

Page 18: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

The Ising blockmodel model (α < β) Σ =

(∆ ΩΩ ∆

)+ (1−∆)Ip

• Block magnetizations: µS = 1S>σp/2 , µS =

1S>σp/2 ∈ [−1, 1]. Observe that

∆ ≈ 2

p2

∑i∼j

E[σiσj] ≈1

2E[µ2

S + µ2S] and Ω =

2

p2

∑ij

E[σiσj] = E[µS µS]

• Free energy: (µS, µS) ∈ [−1, 1]2 is a sufficient statistic

PS(µS, µS) ≈ 1

Zα,βexp

(− p

8gα,β(µS, µS)

)• Ground states: Minimizers G ⊂ [−1, 1]2 of gCW

α,β(µS, µS).

• Concentration: (µS, µS) ≈ ground states with exp. large probability so

∆− Ω ≈ 1

2E[(µS − µS)2] ≈ 1

|G|∑s∈G

(s1 − s2)2.

Q.Berthet - Les Houches - OSL 2017 17/24

Page 19: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Ground states for the Ising blockmodel

gα,β(µS, µS) = −βµ2S − βµ2

S − 2αµS µS + 4h(1 + µS

2

)+ 4h

(1 + µS2

)

α = −6 α = −2.5 α = −0.5 α = 0

Ground states on the skew-diagonal (µS = −µS) for α ≤ 0 and fixed β = 1.5 < 2

-3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0

0.2

0.4

0.6

0.8

1.0

α 7→ µS(α, β = 1.5)

Q.Berthet - Les Houches - OSL 2017 18/24

Page 20: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Ground states for the Ising blockmodel

gα,β(µS, µS) = −βµ2S − βµ2

S − 2αµS µS + 4h(1 + µS

2

)+ 4h

(1 + µS2

)

α = −4 α = −0.9 α = −0.2 α = 0

Ground states on the skew-diagonal (µS = −µS) for α ≤ 0 and fixed β = 2.5 > 2

-3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0

0.2

0.4

0.6

0.8

1.0

α 7→ µS(α, β) β = 2.5

Q.Berthet - Les Houches - OSL 2017 19/24

Page 21: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Phase diagram

Full understanding of the position of the ground states for β > 0, α < β6 BERTHET, RIGOLLET AND SRIVASTAVA

2 24 41 1

2

4

(III)(I)

(II)

Figure 1: The phase diagram of the Ising block model, with three regions for theparameters ↵ and > 0. In region (I), where ↵ < 0 and + |↵| > 2, there aretwo ground states of the form (x,x) and (x, x), x > 0. In region (II), where + |↵| < 2, there is one ground state at (0, 0). In region (III), where ↵ > 0 and + |↵| > 2, there are two ground states of the form (x, x) and (x,x), x > 0.At the boundary between regions (I) and (III), there are four ground states. Thedotted line has equation ↵ = , we only consider parameters in the region to itsleft, where > ↵.

Proof. Throughout this proof, for any b 2 IR, we denote by gcwb (x), x 2

[1, 1], the free energy of the Curie-Weiss model with inverse temperature b. Wewrite g := g↵, for simplicity to denote the free energy of the IBM.

Note that

(2.7) g(x, y) = gcw+↵

2

(x) + gcw+↵

2

(y) + ↵(x y)2 .

We split our analysis according to the sign of ↵. Note first that if ↵ = 0, we have

g(x, y) = gcw2

(x) + gcw2

(y) .

It yields that:

• If 2, then gcw2

has a unique local minimum at x = 0 which implies that g

has a unique minimum at (0, 0)• If > 2, then gcw

2

has exactly two minima at x(/2) and x(/2), where

x(/2) 2 (1, 1). It implies that g has four minima at (±x(/2), ±x(/2)).

Next, if ↵ > 0, in view of (2.7) we have

g(x, y) gcw+↵

2

(x) + gcw+↵

2

(y)

with equality i↵ x = y. It implies that:

• If ↵ + 2, then g has a unique minimum at (0, 0)

• Phase diagram for all the parameter regions

Region (I): Two ground states (µS, µS) = ±(x,−x).

Region (II): One ground state at (0, 0).

Region (III): Two ground states (µS, µS) = ±(x, x).

Q.Berthet - Les Houches - OSL 2017 20/24

Page 22: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Concentration

• Quantities of interest as expectations of the mean block magnetizations

∆ ≈ 1

2E[µ2

S + µ2S] , Ω ≈ E[µSµS] and ∆− Ω ≈ 1

2E[(µS − µS)2] .

• Gaussian approximation of the discrete distribution with Z ∼ N (0, I2).

Eα,β[ϕ(µ)] 'p1

|G|∑s∈G

E[ϕ(s+ 2

√2

pH−1/2Z)

]∀ϕ .

• Approximation of the gap ∆− Ω:

∆− Ω 'p

2x2 in region (I)

Cα,βp

in region (II)

C ′α,βp

in region (III)

Q.Berthet - Les Houches - OSL 2017 21/24

Page 23: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Naive estimation

• Covariance matrix:

Σ = E[σσ>] =

∆ Ω

Ω ∆

+ (1−∆)Ip .

• Empirical covariance matrix:

Σ =1

n

n∑t=1

σ(t)σ(t)> = Σ±√

log p

nentrywise

• Threshold off-diagonal entries of Σ at (∆ + Ω)/2

• Exact recovery if

n &

log p in region (I)

p2 log p in region (II)

p2 log p in region (III)

Q.Berthet - Les Houches - OSL 2017 22/24

Page 24: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Exact recovery

• Upper bound: we have V = VS with probability 1− δ for

n &1

Cα,β

log(p/δ)

∆− Ω,

by bounding ‖LS(Γ− Γ)‖op, a sum of independent matrices. Tropp 12

• Matching lower bound: Fano’s inequality yields

n ≤ γ

β − αlog(p/4)

∆− Ω=⇒ P(recovery) . γ

• Full understanding of the scaling of ∆− Ω gives optimal rates.

n &

log p in region (I)

p log p in regions (II) and (III)

with constant factors illustrating further these transitions.

Q.Berthet - Les Houches - OSL 2017 23/24

Page 25: Quentin Berthet - imag.frljk.imag.fr/membres/Jerome.Malick/osl2017/talks/berthe… ·  · 2017-04-13Quentin Berthet Optimization and Statistical Learning - Les Houches ... Q. Berthet,

Conclusion

• Contributions

New model for interactions between individuals in different communities.

Analysis from statistical physics to understand parameters of the problem.

Study of convex relaxations with an analysis on normal cones.

• Open questions

Exact recovery threshold, conjecture that n∗ = C∗ log(p)(β−α)(∆−Ω).

Rates for partial recovery in Hamming distance.

Generalization to multiple blocks, more complex structures.

THANK YOU

Work supported by the Isaac Newton Trust and the Alan Turing Institute.

Q.Berthet - Les Houches - OSL 2017 24/24