Impact of Regularization on Spectral Clustering · Impact of Regularization on Spectral Clustering...

Preview:

Citation preview

Impact of Regularization on Spectral Clustering

Antony Joseph* and Bin Yu#

* Walmart Research Lab in San Francisco(formerly UCB and LBNL)

# Departments of Statistics and EECSUC Berkeley

Workshop on Spectral Algorithms, Simons Inst, Oct. , 2014

Monday, November 3, 2014

/46

spectral clusteringin

graphs

Berkeley Drosophila Genome Project (BDGP)(The fruit fly project)

Overview

2

collaborators :

• Siqi Wu, UC Berkeley

• Erwin Frise, Lawrence Berkeley Lab

• Ann Hammonds, Lawrence Berkeley Lab

• Sue Celniker, Lawrence Berkeley Lab

Monday, November 3, 2014

/46

spectral clusteringin

graphs

Berkeley Drosophila Genome Project (BDGP)(The fruit fly project)

Overview

2

collaborators :

• Siqi Wu, UC Berkeley

• Erwin Frise, Lawrence Berkeley Lab

• Ann Hammonds, Lawrence Berkeley Lab

• Sue Celniker, Lawrence Berkeley Lab

Monday, November 3, 2014

/46

A Graph

Social network people

The fruit fly project pixels/points in the embryo template

...

Context Nodes

3Monday, November 3, 2014

/46

The fruit fly project (Berkeley Drosophila Genome Project)

4Monday, November 3, 2014

/46

Drosophila(fruit fly)

Widely studied :• genetic mechanism similar to humans • easy to maintain in the lab• short life cycle • ...

5Monday, November 3, 2014

Image dataset from the fruit fly project

Monday, November 3, 2014

Image dataset from the fruit fly project

tailless

Monday, November 3, 2014

Image dataset from the fruit fly project

• Over 100,000 stained embryo images (over 7000 genes)

Monday, November 3, 2014

Image dataset from the fruit fly project

• Over 100,000 stained embryo images (over 7000 genes)

• the interaction between different genes

• the genes required for development of various organs.

Goals: Contribute to the understanding of ...

Monday, November 3, 2014

/46

‘Fate’ map in early embryos

Lohs-Schardin et. al (’70), Hartenstein et. al. (‘85)

Laser ablation experiments in embryos in early stages of development

7Monday, November 3, 2014

/46

‘Fate’ map in early embryos

hind gut

anterior mid-gut

dorsal epidermis

ventral neurogenic region

procephalic neurogenic region

pharynx

mesoderm

esophagus

Lohs-Schardin et. al (’70), Hartenstein et. al. (‘85)

Laser ablation experiments in embryos in early stages of development

7Monday, November 3, 2014

/46

Do genes explain the `fate’ map?

.... early stage gene expression images

8Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

9Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

Nodes : pixels/points in the embryo

9Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

9Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

Edge if lot of genes are co-expressed at the two nodes

9Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

9Monday, November 3, 2014

/46

Discovery of fate map &

communities on graphs

fate map

9

??

Monday, November 3, 2014

/46

Edge between node i and node j

10Monday, November 3, 2014

/46

= . . . . . .

Edge between node i and node j

10Monday, November 3, 2014

/46

= . . . . . .

= . . . . . .

Edge between node i and node j

10Monday, November 3, 2014

/46

= . . . . . .

= . . . . . .

> >

Edge between node i and node j

10Monday, November 3, 2014

/46

= . . . . . .

= . . . . . .

> >

Edge between node i and node j

90-th percentile

10Monday, November 3, 2014

/46

τ = τ =

Take K = 8

11

Comparing unregularized vs. regularized SC

Monday, November 3, 2014

/46

τ = τ =

Take K = 8dorsal epidermis

mesoderm

hind gut

ventral neurogenic region

procephalic neurogenic region

anterior mid-gut

pharynx

esophagus

11

Comparing unregularized vs. regularized SC

Monday, November 3, 2014

/46

Communities

people like minded people

pixels/points in embryo area of future organs

...

Nodes Communities

12Monday, November 3, 2014

/46

Finding communities

13Monday, November 3, 2014

/46

Finding communities

Notion of (two) communities

13Monday, November 3, 2014

/46

Finding communities

Notion of (two) communities

Methods

Spectral clustering (Fiedler (’73), Donath & Hoffman (’73), ...)

Modularity (Newman & Girvan (‘03)), Latent space methods (Hoff et. al. (’02))Profile-likelihood (Bickel & Chen (’09)), Pseudo-Likelihood (Amini et. al. (’13)),

13Monday, November 3, 2014

/4614

Spectral Clustering

Monday, November 3, 2014

/46

Notation

15

Adjacency matrix:(symmetric binary)

Number of nodes: n

A ∈ Rn×n

Aij = Aji =

�1, (i, j)

0,

Monday, November 3, 2014

/46

Notation

15

Adjacency matrix:(symmetric binary)

Number of nodes: n

A ∈ Rn×n

Aij = Aji =

�1, (i, j)

0,

Each row/column of A associated with a node

Monday, November 3, 2014

/46

Notation

15

Adjacency matrix:(symmetric binary)

Number of nodes: n

A ∈ Rn×n

Aij = Aji =

�1, (i, j)

0,

Degree matrix:(diagonal)

D ∈ Rn×n

Dii =�

j

Aij

Monday, November 3, 2014

/46

Spectral Clustering

16

(Normalizedsymmetric Laplacian matrix)

= − / − /

Spectral clustering deals with the eigenvectors of the matrix :

Monday, November 3, 2014

/46

Spectral Clustering

16

(Normalizedsymmetric Laplacian matrix)

= − / − /

Spectral clustering deals with the eigenvectors of the matrix :

Other matrices used ...

D −A

A

D−1 A ( Normalized random walk Laplacian)

(Unnormalized Laplacian)

(Adjacency matrix)

Monday, November 3, 2014

/4617

Illustration of SC

Monday, November 3, 2014

/4617

Illustration of SC

A =

Monday, November 3, 2014

/4617

A =

Illustration of SC

Monday, November 3, 2014

/4617

L =

Illustration of SC

Monday, November 3, 2014

/4617

First eigenvector

Seco

nd e

igen

vect

or

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.038 0.0360.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

L =

Illustration of SC

Monday, November 3, 2014

/4617

First eigenvector

Seco

nd e

igen

vect

or

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.038 0.0360.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

L =

Illustration of SC

cluster

Seco

nd e

igen

vect

or

First eigenvector

Monday, November 3, 2014

/46

SC for finding K clusters (Shi and Malik (00), Ng et. al (’02))

18

n×K V K L

V K

Monday, November 3, 2014

/46

SC for finding K clusters (Shi and Malik (00), Ng et. al (’02))

18

n×K V K L

V K

V

Monday, November 3, 2014

/46

Popularity of spectral clustering

• Computational advantage :

-requires eigenvector decomposition which is very fast

Theoretical backing :

- relaxation of various cut-based measures

(Hagen & Kahng (’92), Shi & Malik (’00), Ng et al, (’02))

- Stochastic Block Model and its extensions

(McSherry (‘01), Rohe. et. al (‘11), Chaudhari et. al. (’12), Sussman (’12),

Fishkind (’11))

19Monday, November 3, 2014

/46

Performance of spectral clustering improves greatly through regularization

Regularization proposed by Amini, Chen, Bickel and Levina (AoS, 2013)

20Monday, November 3, 2014

/46

Performance of spectral clustering improves greatly through regularization

Regularization proposed by Amini, Chen, Bickel and Levina (AoS, 2013)

20

A

Aτ = A+τ

n11�, τ > 0.

Lτ Aτ

Vτ K

Vτ = K Lτ

Monday, November 3, 2014

/46

Performance of spectral clustering improves greatly through regularization

Regularization proposed by Amini, Chen, Bickel and Levina (AoS, 2013)

Alternative forms of regularization proposed and analyzed in Chaudhuri et. al (2012), Qin & Rohe (’13)

20

A

Aτ = A+τ

n11�, τ > 0.

Lτ Aτ

Vτ K

Vτ = K Lτ

Monday, November 3, 2014

/46

Stochastic Block Model

21Monday, November 3, 2014

/46

Stochastic Block Model (SBM) (Holland et. al (’83))

22

n

(i, j) Pij

Monday, November 3, 2014

/46

Stochastic Block Model (SBM) (Holland et. al (’83))

22

SBM with two blocks

=� �

=

n× n

n

(i, j) Pij

Monday, November 3, 2014

/4623

sample.4

.3

.2

.2

Monday, November 3, 2014

/4623

sample.4

.3

.2

.2

Monday, November 3, 2014

/46

Analysis of regularization for the SBM(Focus on K =2)

24Monday, November 3, 2014

/4625

Comparing unregularized vs. regularized SC

==

τ =

first sample eigenvector first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

seco

nd s

ampl

e ei

genv

ecto

r

τ =

.003

.04

.0025

.0025

Monday, November 3, 2014

/4625

Comparing unregularized vs. regularized SC

==

τ =

first sample eigenvector first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

seco

nd s

ampl

e ei

genv

ecto

r

τ =

.003

.04

.0025

.0025

k-means success : 100%k-means success : 87%

Monday, November 3, 2014

/46

Recap : Regularized spectral clustering

26

Aτ = A+τ

n11�, τ > 0.

Lτ = D−1/2τ AτD

−1/2τ

Vτ = Lτ

Monday, November 3, 2014

/46

Population level quantities

27

A = P=

Monday, November 3, 2014

/46

Population level quantities

27

τ

τ

Pτ = P + τn11

Lpopτ

Monday, November 3, 2014

/46

Population level quantities

27

τ

τ

Pτ = P + τn11

Lpopτ

Recall:

Vτ n× 2

Monday, November 3, 2014

/46

Population level quantities

27

τ

τ

Pτ = P + τn11

Lpopτ

Vτ V popτ

Monday, November 3, 2014

/46

Population level quantities

27

1,τ 2,τ

τ

τ

Pτ = P + τn11

Lpopτ

Vτ V popτ

Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/46

first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

τ =max = , max ∈ � ,τ − ,τ�

� ,τ − ,τ�

0.052 0.05 0.048 0.046 0.044 0.042 0.04 0.0380.1

0.08

0.06

0.04

0.02

0

0.02

0.04

0.06

0.08

τ

28Monday, November 3, 2014

/4629

τ

τ =max = , max ∈ � ,τ − ,τ�

� ,τ − ,τ�

Monday, November 3, 2014

/4629

τ

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/46

τ

29

τ

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/4629

τ

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/4629

τ

τ �√ � τ − τ �

µ ,τ

Implication of matrix perturbation theory (Davis - Kahan) :

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/4629

τ

τ

τ �√ � τ − τ �

µ ,τ

Implication of matrix perturbation theory (Davis - Kahan) :

(µ2,τ τ)

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/4629

τ

Implication of concentration of Laplacian (Oliveira (’10)):

with high probability� τ − τ � � min

�√

, + τ, ,

( , + τ)

��log

τ � log ,

τ �√ � τ − τ �

µ ,τ

Implication of matrix perturbation theory (Davis - Kahan) :

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/46

Improvements using extension of techniques in Balakrishnan et. al. (’11).

29

τ

τ �√ � τ − τ �

µ ,τ

Implication of matrix perturbation theory (Davis - Kahan) :

τ =Lτ Lpop

τ

� 1,τ − 2,τ�

Monday, November 3, 2014

/4630

Let,

Set,

dn :=

τ = dn

Monday, November 3, 2014

/4630

Let,

Set,

dn :=

τ = dn

Result (SBM with two blocks):

dn �√n log n

µ2,0

Monday, November 3, 2014

/4630

Let,

Set,

Summary:

dn :=

τ = dn

Result (SBM with two blocks):

dn �√n log n

µ2,0

Monday, November 3, 2014

/46

Choice of regularization parameter

31Monday, November 3, 2014

/46

� τ − τ �µ ,τ

Recall: trade-offs dictated by

32Monday, November 3, 2014

/46

� τ − τ �µ ,τ

Recall: trade-offs dictated by

� τ − ˆτ �µ̂ ,τ

τ

32Monday, November 3, 2014

/46

Estimates based on estimated SBM (or degree corrected SBM)

� τ − τ �µ ,τ

Recall: trade-offs dictated by

� τ − ˆτ �µ̂ ,τ

τ

32Monday, November 3, 2014

/46

� τ − τ �µ ,τ

Recall: trade-offs dictated by

� τ − ˆτ �µ̂ ,τ

τ

32Monday, November 3, 2014

/46

ˆτ , µ̂ , τ

33

P=

τ

C1, C2

Monday, November 3, 2014

/46

ˆτ , µ̂ , τ

33

P=

τ

C1, C2

p1, p2 q C1 C2

e.g. p̂1 = C1

Monday, November 3, 2014

/46

ˆτ , µ̂ , τ

33

P̂ =

p̂1

p̂2q̂

τ

C1, C2

p1, p2 q C1 C2

e.g. p̂1 = C1

Monday, November 3, 2014

/46

ˆτ , µ̂ , τ

33

P̂ =

p̂1

p̂2q̂

τ

C1, C2

p1, p2 q C1 C2

e.g. p̂1 = C1

P̂ L̂popτ µ̂2,τ L̂pop

τ

Monday, November 3, 2014

/4634

Example

==

.003

.01

.0025

.0025

first sample eigenvector first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

seco

nd s

ampl

e ei

genv

ecto

r

τ = 0 τ = 18

Monday, November 3, 2014

/4634

Example

==

.003

.01

.0025

.0025

first sample eigenvector first sample eigenvector

seco

nd s

ampl

e ei

genv

ecto

r

seco

nd s

ampl

e ei

genv

ecto

r

k-means success : 94%k-means success : 75%

τ = 0 τ = 18

Monday, November 3, 2014

/46

Political blog data

35Monday, November 3, 2014

/46

source : Adamic & Glance (’05)

=

36Monday, November 3, 2014

/46

0 50 100 150 200 250 300 350 4000

100

200

300

400

500

600

700

800

Histogram of degrees

source : Adamic & Glance (’05)

=

36Monday, November 3, 2014

/46

first

eig

enve

ctor

nodes

Political blogs data set

Unregularized Spectral Clustering

37

seco

nd e

igen

vect

or

nodes

Monday, November 3, 2014

/4638Monday, November 3, 2014

/46

second eigenvector discriminates these from the remaining

38Monday, November 3, 2014

/4638Monday, November 3, 2014

/4638

third eigenvector

Monday, November 3, 2014

/46

Regularized SC for political blogs datasetfir

st e

igen

vect

or

seco

nd e

igen

vect

or

39

τ = .

nodesnodes

Monday, November 3, 2014

/46

Regularized SC for political blogs dataset

13% of misclassified nodes for regularizedcompared to 48% for unregularized

first

eig

enve

ctor

seco

nd e

igen

vect

or

39

τ = .

nodesnodes

Monday, November 3, 2014

/46

τ = τ =

Take K = 8

40

Comparing unregularized vs. regularized Spectral Clustering (SC)

Monday, November 3, 2014

/46

τ = τ =

Take K = 8dorsal epidermis

mesoderm

hind gut

ventral neurogenic region

procephalic neurogenic region

anterior mid-gut

pharynx

esophagus

40

Comparing unregularized vs. regularized Spectral Clustering (SC)

Monday, November 3, 2014

/46

Summary

• Theoretical upper bound under SBM shows “bias-variance”-like trade-off while the amount of regularization increases in SC

• Theoretical analysis motivates practically useful scheme (using SBM or degree-corrected SBM) to select regularization parameter in RSC.

Promising results in fruitfly image segmentation

Paper at (2014 rev): http://arxiv.org/pdf/1312.1733.pdf

41Monday, November 3, 2014

/46

Ongoing/future directions

The BDGP project (with Antony Joseph, Siqi Wu, Ann Hammonds, Sue Celniker, Erwin Frise)

• Fast algorithm for computing the data-driven choice of regularization parameter• Role of regularization in other scenarios, such as hierarchical clusters• Regularization parameter choice for continuous data

Spectral Clustering (with Antony Joseph)

• Analysis of gene interactions in different regions of early stage embryos• Extension of analysis to later stage embryos

42Monday, November 3, 2014

Recommended