Density Traversal Clustering and Generative Kernels: a generative framework for spectral clustering
Amos Storkey, Tom G Griffiths, University of Edinburgh

Page 1: Density Traversal Clustering and Generative Kernels

Amos Storkey, School of Informatics.

Density Traversal Clustering and Generative Kernels

a generative framework

for spectral clustering

Amos Storkey, Tom G Griffiths

University of Edinburgh

Page 2: Density Traversal Clustering and Generative Kernels

Attribute Generalisation

Page 3: Density Traversal Clustering and Generative Kernels

Prior work

• Tishby and Slonim
• Meila and Shi
• Coifman et al.
• Nadler et al.

Page 4: Density Traversal Clustering and Generative Kernels

Example: Transition Matrix

Page 5: Density Traversal Clustering and Generative Kernels

Example: 20 Iterations

Page 6: Density Traversal Clustering and Generative Kernels

Example: 400 Iterations

Page 7: Density Traversal Clustering and Generative Kernels

Argument

• A priori dependence on data.
• No generative model.
• Inconsistent with underlying density.

• Clusters are spatial characteristics that are properties of distributions.

• Clusters are properties of data sets only inasmuch as they inherit the property from the underlying distribution from which the data were generated.

Page 8: Density Traversal Clustering and Generative Kernels

But we do know

• We know the diffusion asymptotics, but the probabilistic formalism is inconsistent with the data density:
– For a finite time-step, the equilibrium distribution in the infinite-data limit does not match the data distribution.
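
A finite-sample analogue of this mismatch can be sketched in a few lines. This is an illustration only, not the authors' analysis: the data, the affinity width, and the use of the standard row-normalised random walk are all assumptions made for the example. For a symmetric affinity, the equilibrium of the row-normalised walk is proportional to the degree of each point, whereas the empirical data distribution puts mass 1/n on every point.

```python
import numpy as np

# Hypothetical 1-D sample: a dense group of points and a sparse group.
x = np.concatenate([np.linspace(0.0, 1.0, 20), np.linspace(3.0, 5.0, 5)])
width = 0.5  # assumed Gaussian affinity width

# Usual symmetric Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (2 width^2)).
W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * width ** 2))

# Standard random-walk normalisation: each row is divided by its degree.
degree = W.sum(axis=1)
P = W / degree[:, None]

# For symmetric W the equilibrium of P is proportional to the degree.
pi = degree / degree.sum()

print("max |pi P - pi|:", np.abs(pi @ P - pi).max())
print("equilibrium mass per point ranges from", pi.min(), "to", pi.max())
print("empirical mass per point:", 1.0 / len(x))
```

The equilibrium over-weights points in dense regions relative to the uniform empirical mass, which is the kind of inconsistency the slide points at.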

Page 9: Density Traversal Clustering and Generative Kernels

Density Traversal Clustering

• Define a discrete-time, continuous-space, diffusing Markov chain.

• The definition depends on some latent distribution.
• Call this the Traversal Distribution.

Page 10: Density Traversal Clustering and Generative Kernels

The Markov chain

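• Transition with probability

\[
P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)\, S^{-1}(x_t)}{Z(x_{t-1})},
\qquad
Z(x_{t-1}) = \int dy\, D(y, x_{t-1})\, P^*(y)\, S^{-1}(y)
\]

• $D(y, x)$ is a Gaussian centred at $x$, and $P^*$ is the Traversal Distribution.

• Here $S$ is given by the solution of

\[
S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}
\]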
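
A short consistency check on this construction, offered as a sketch rather than as the authors' derivation. It assumes the transition density and the equation for S read as written in the first group of equations below (a reconstruction of the slide), and that D is symmetric, D(y, x) = D(x, y), as it would be for a fixed-width Gaussian. Under those assumptions Z coincides with S, and the chain satisfies detailed balance with respect to P*.

```latex
% Assumed form of the slide's definitions.
P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)}{S(x_t)\, Z(x_{t-1})},
\qquad
Z(x) = \int dy\, \frac{D(y, x)\, P^*(y)}{S(y)},
\qquad
S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}

% The integrals defining Z and S are identical, so Z(x) = S(x), and
P(x' \mid x) = \frac{D(x', x)\, P^*(x')}{S(x')\, S(x)}

% Detailed balance: the middle expression is symmetric in x and x' when D is.
P^*(x)\, P(x' \mid x)
  = \frac{P^*(x)\, P^*(x')\, D(x', x)}{S(x)\, S(x')}
  = P^*(x')\, P(x \mid x')
```

If these assumptions hold, P* is an equilibrium distribution of the chain, which is the consistency with the underlying density that the earlier slides ask for.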

Page 11: Density Traversal Clustering and Generative Kernels

Generative procedure

Page 12: Density Traversal Clustering and Generative Kernels

Problems

• Random walk in continuous space.
• Each step involves many intractable integrals.
• Real Bayesians would...
• Finding good prior distributions over distributions is a hard problem, but we need a prior for traversal distributions.

Page 13: Density Traversal Clustering and Generative Kernels

CHEAT

• Doing all the integrals is not possible, but...
– All integrals are with respect to the traversal distribution.
– Use the empirical data as a proxy.
– All the integrals now become sample estimates: sums over the data points.
– Everything is computable in the space of data points.
– WORKS!: never need to evaluate the probability at a point, only integrals over regions.
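
The sample-estimate idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the traversal distribution is replaced by the empirical distribution of n data points, the equation for S is read as S(x) = ∫ dy P*(y) D(y, x) / S(y), and it becomes s_i = (1/n) Σ_j W_ij / s_j. The damped fixed-point solver, the function names, the data and the width are all choices made for the example. The Gaussian normalising constant is dropped, since it cancels in the transition matrix.

```python
import numpy as np

def gaussian_affinity(X, width):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 width^2)): symmetric and positive."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def solve_consistency(W, n_iter=200):
    """Solve s_i = (1/n) sum_j W_ij / s_j, a sample version of the equation for S.

    The damped (geometric-mean) update is an assumed solver choice; its fixed
    point satisfies the equation above.
    """
    n = W.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    return s

# Hypothetical data: two well-separated 2-D blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(30, 2)),
               rng.normal([4.0, 0.0], 0.3, size=(30, 2))])
n = X.shape[0]

W = gaussian_affinity(X, width=0.5)
s = solve_consistency(W)

# Sample-based transition matrix between data points: T_ij = W_ij / (n s_i s_j).
T = W / (n * np.outer(s, s))

print("residual of the S equation:", np.abs(s - W @ (1.0 / s) / n).max())
print("row sums of T in [", T.sum(axis=1).min(), ",", T.sum(axis=1).max(), "]")
```

Under these assumptions T is symmetric with unit row sums, so its equilibrium is uniform over the data points, matching the empirical distribution.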

Page 14: Density Traversal Clustering and Generative Kernels

We get…

• Scaled likelihood: $P(x_i \mid \text{centre } x_j) / P(x_i) = n\,(A^D)_{ij}$
– $A = W S^{-1}$
– $W$ is the usual affinity.
– $S^{-1}$ is an extra consistency term.

• More generally, we have an out-of-sample scaled likelihood:
– $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T A^{D-2}\, b(y)$, where $a(x)$ and $b(x)$ are the traversal probabilities to and from $x$.
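
A minimal sketch of the in-sample scaled likelihoods follows. The assumptions are loud ones: the superscript D is read as the number of iterations (a matrix power), and the slide's A is stood in for by a symmetric, row-stochastic matrix T_ij = W_ij / (n s_i s_j), which may differ from the authors' A = W S^{-1} by a normalisation convention. The helper name, data and width are hypothetical.

```python
import numpy as np

def traversal_matrix(X, width, n_iter=200):
    """Sample-based transition matrix T_ij = W_ij / (n s_i s_j) (assumed form)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * width ** 2))
    n = W.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):
        # Damped fixed-point update for s_i = (1/n) sum_j W_ij / s_j.
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    return W / (n * np.outer(s, s))

# Hypothetical data: two well-separated 2-D blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(30, 2)),
               rng.normal([4.0, 0.0], 0.3, size=(30, 2))])
n = X.shape[0]
T = traversal_matrix(X, width=0.5)

# Scaled likelihoods n (T^D)_ij after 20 and 400 iterations.
L20 = n * np.linalg.matrix_power(T, 20)
L400 = n * np.linalg.matrix_power(T, 400)

print("mean scaled likelihood, same blob, 20 iterations:", L20[:30, :30].mean())
print("mean scaled likelihood, other blob, 20 iterations:", L20[:30, 30:].mean())
print("mean scaled likelihood, other blob, 400 iterations:", L400[:30, 30:].mean())
```

Because the columns of T^D sum to one, each column of the scaled likelihood averages to one over the data points, as a ratio P(x | centre) / P(x) should under the empirical distribution.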

Page 15: Density Traversal Clustering and Generative Kernels

Example: Scaled likelihoods

Page 16: Density Traversal Clustering and Generative Kernels

Example: 20 Iterations

Page 17: Density Traversal Clustering and Generative Kernels

Example: 400 Iterations

Page 18: Density Traversal Clustering and Generative Kernels

Initial distribution

• Can consider other initial distributions.
• Specifically, can consider delta functions at mixture centres.
• Variational Bayesian Mixture models…

Page 19: Density Traversal Clustering and Generative Kernels

Demo

Page 20: Density Traversal Clustering and Generative Kernels

Number of clusters

• Scaled likelihoods for a three-cluster problem.

Page 21: Density Traversal Clustering and Generative Kernels

Number of clusters

• Scaled likelihoods for a five-cluster problem.

Page 22: Density Traversal Clustering and Generative Kernels

Cluster allocations

Page 23: Density Traversal Clustering and Generative Kernels

Cluster allocations

Page 24: Density Traversal Clustering and Generative Kernels

Conclusion

• A priori formulation of spectral clustering.
• Can be used as any other spectral procedure.
• But also provides scaled likelihoods, so it can be combined with Bayesian procedures.
• Variational Bayesian formalism.
• Small sample approximation issues.
• Better to have a flexible density estimator.

Page 25: Density Traversal Clustering and Generative Kernels

Generative Kernels

• Related to Seeger: Covariance Kernels from Bayesian Generative Models

[Diagram over X space, with labels:]
– Density, and corresponding traversal process.
– Gaussian Process over X space.
– Data is obtained by diffusing in X space using the traversal process...
– And then local averaging and additive noise.

Page 26: Density Traversal Clustering and Generative Kernels

Generative Kernels

• Covariance $K_{ij}$ is

\[
K(r, s) = \int dx\, dy\; P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)
\]

• Again use sample estimates.
• Presume measured target is local average.
• Just standard basis function derivation of GP.

Page 27: Density Traversal Clustering and Generative Kernels

Motivation

• Generative model generates clustered data positions.

• Targets diffuse using the traversal process.
• Target values are subject to a local averaging influence:
– Diffused objects locally influence one another’s target values, so everyone becomes like their neighbours.
• E.g. accents.
• Can add local measurement noise.

Page 28: Density Traversal Clustering and Generative Kernels

Kernel Clustering

• Use sample estimates again to get the kernel

\[
K(r, s) = \sum_{ij} \left[ a(r)^T A^{D-1} \right]_i \left[ a(s)^T A^{D-1} \right]_j K_{ij}
\]

• Can also incorporate a prior over iterations and integrate out.

• For example, can use the matrix exponential $\exp(A)$ instead of $A^D$.
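
A sketch of the sample-estimate kernel at the data points, with several assumptions: for an in-sample point x_k, a(x_k)^T A^{D-1} is taken to be row k of the D-step transition matrix, the slide's A is stood in for by a symmetric row-stochastic matrix T, and the base covariance K_ij is taken to be a Gaussian kernel on X. The prior over iterations is illustrated with a Poisson prior, an assumed choice, for which the mixture of powers has the closed form exp(rate (T - I)). Function names, data and parameters are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, width):
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def traversal_matrix(W, n_iter=200):
    """T_ij = W_ij / (n s_i s_j), with s solving s_i = (1/n) sum_j W_ij / s_j."""
    n = W.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    return W / (n * np.outer(s, s))

def traversal_kernel(M, K_base):
    """K_dt[r, s] = sum_ij M[r, i] M[s, j] K_base[i, j]."""
    return M @ K_base @ M.T

def poisson_mixed_traversal(T, rate):
    """sum_D Poisson(D; rate) T^D = exp(rate (T - I)), via eigh of symmetric T."""
    evals, evecs = np.linalg.eigh(T)
    return (evecs * np.exp(rate * (evals - 1.0))) @ evecs.T

# Hypothetical data: two well-separated 2-D blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(30, 2)),
               rng.normal([4.0, 0.0], 0.3, size=(30, 2))])
W = gaussian_affinity(X, width=0.5)
T = traversal_matrix(W)
K_base = W  # assumed base covariance: Gaussian kernel with the same width

# Fixed number of iterations (D = 20) versus a Poisson prior with mean 20.
K_dt = traversal_kernel(np.linalg.matrix_power(T, 20), K_base)
M_mix = poisson_mixed_traversal(T, rate=20.0)
K_mix = traversal_kernel(M_mix, K_base)

print("mean within-cluster covariance:", K_dt[:30, :30].mean())
print("mean across-cluster covariance:", K_dt[:30, 30:].mean())
```

Since the result has the form M K M^T with K positive semi-definite, it is itself a valid covariance matrix, whichever of the two choices of M is used.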

Page 29: Density Traversal Clustering and Generative Kernels

Generating targets for rings data

• Can generate from the model:

• Across cluster covariance is low.

• Within cluster continuity.

Page 30: Density Traversal Clustering and Generative Kernels

The point?

• Density dependence matters in missing data problems.

• Gaussian process: data with missing targets has no influence.

• Density Traversal Kernel: data with missing targets affects kernel, and hence has influence.
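
The contrast can be illustrated with a toy sketch, under the same assumptions as the sample-estimate construction (empirical proxy for the traversal distribution, symmetric row-stochastic transition matrix, Gaussian base kernel). The points, width and number of steps are hypothetical. Two labelled points sit far apart; a chain of points with missing targets is then added between them. The plain Gaussian kernel between the labelled points does not change, while the traversal-based kernel does.

```python
import numpy as np

def density_traversal_kernel(x, width=0.5, n_steps=20):
    """Return (base Gaussian kernel, traversal kernel) for 1-D points x."""
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * width ** 2))
    n = len(x)
    s = np.ones(n)
    for _ in range(200):
        # Damped fixed-point update for s_i = (1/n) sum_j W_ij / s_j.
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    T = W / (n * np.outer(s, s))
    M = np.linalg.matrix_power(T, n_steps)
    return W, M @ W @ M.T

labelled = np.array([0.0, 3.0])         # points with observed targets
unlabelled = np.linspace(0.5, 2.5, 5)   # points with missing targets

W_lab, K_lab = density_traversal_kernel(labelled)
W_all, K_all = density_traversal_kernel(np.concatenate([labelled, unlabelled]))

# Indices 0 and 1 are the two labelled points in both cases.
print("Gaussian kernel, labelled only:   ", W_lab[0, 1])
print("Gaussian kernel, with unlabelled: ", W_all[0, 1])
print("traversal kernel, labelled only:  ", K_lab[0, 1])
print("traversal kernel, with unlabelled:", K_all[0, 1])
```

In this toy setting the unlabelled points bridge the gap in the density, so the traversal kernel couples the two labelled points much more strongly once they are included.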