
Amos Storkey, School of Informatics.

Density Traversal Clustering and Generative Kernels

a generative framework for spectral clustering

Amos Storkey, Tom G Griffiths

University of Edinburgh


Attribute Generalisation


Prior work

• Tishby and Slonim

• Meila and Shi

• Coifman et al

• Nadler et al


Example: Transition Matrix


Example: 20 Iterations




Argument

• A priori dependence on data.

• No generative model.

• Inconsistent with underlying density.

• Clusters are spatial characteristics that are properties of distributions.

• Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.


But we do know

• We know the diffusion asymptotics, but the probabilistic formalism is inconsistent with the data density:

– In the finite time-step, infinite-data limit, the equilibrium distribution does not match the data distribution.


Density Traversal Clustering

• Define discrete time, continuous, diffusing Markov chain.

• Definition dependent on some latent distribution.

• Call this the Traversal Distribution.


The Markov chain

• Transition with probability

$$P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)\, S^{-1}(x_t)}{Z(x_{t-1})}, \qquad Z(x_{t-1}) = \int dy\, D(y, x_{t-1})\, P^*(y)\, S^{-1}(y)$$

• D(y,x) is Gaussian centred at x, P* is Traversal distribution.

• Here S is given by the solution of

$$S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}$$
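As a quick consistency check, the traversal distribution P* can be shown to be stationary for this chain. This is a sketch under two assumptions made here: that the Gaussian D is symmetric, D(y,x) = D(x,y), and that S solves S(x) = ∫ dy P*(y) D(y,x) / S(y). Under that fixed-point equation the normaliser Z(x) = ∫ dy D(y,x) P*(y) S^{-1}(y) is the same integral, so Z(x) = S(x), and then:

```latex
% Assumes D(y,x) = D(x,y) and S(x) = \int dy\, P^*(y) D(y,x) / S(y), hence Z(x) = S(x).
\begin{align*}
\int dx\, P^*(x)\, P(y \mid x)
  &= \int dx\, P^*(x)\, \frac{D(y,x)\, P^*(y)\, S^{-1}(y)}{Z(x)} \\
  &= \frac{P^*(y)}{S(y)} \int dx\, \frac{P^*(x)\, D(x,y)}{S(x)} \\
  &= \frac{P^*(y)}{S(y)}\, S(y) \;=\; P^*(y).
\end{align*}
```

So the equilibrium distribution of the chain matches the traversal distribution, which is the consistency property the earlier argument asks for.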


Generative procedure


Problems

• Random walk in continuous space.

• Each step involves many intractable integrals.

• Real Bayesians would...

• Finding good prior distributions over distributions is a hard problem, but we need a prior for traversal distributions.


CHEAT

• Doing all the integrals is not possible, but...

– All integrals are with respect to traversal distribution

– Use empirical data proxy

– All the integrals now become sample estimates: sums over the data points.

– Everything is computable in the space of data points.

– WORKS!: never need to evaluate the probability at a point, only integrals over regions.
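The sample-estimate idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the bandwidth, the 1/n scaling and the damped fixed-point update are all assumptions made here. Each integral against the traversal distribution is replaced by an average over the data points, so S becomes a vector s with s_i = (1/n) Σ_j W_ij / s_j, and the transition matrix becomes W_ij / (n s_i s_j).

```python
import numpy as np

def gaussian_affinity(x, bandwidth):
    """W_ij = exp(-|x_i - x_j|^2 / (2 bandwidth^2)), the usual affinity."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / bandwidth**2)

def solve_consistency_term(w, n_iter=200):
    """Solve s_i = (1/n) sum_j W_ij / s_j by a damped fixed-point iteration.

    The geometric-mean damping is an assumption of this sketch; the plain
    update s <- W (1/s) / n can oscillate instead of converging.
    """
    n = w.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):
        s = np.sqrt(s * (w @ (1.0 / s)) / n)
    return s

def traversal_matrix(w, s):
    """Sample estimate of the chain: T_ij = W_ij / (n s_i s_j)."""
    n = w.shape[0]
    return w / (n * np.outer(s, s))

rng = np.random.default_rng(0)
# Two hypothetical Gaussian blobs in the plane, 20 points each.
x = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
w = gaussian_affinity(x, bandwidth=1.0)
s = solve_consistency_term(w)
t = traversal_matrix(w, s)
```

With this scaling the matrix is symmetric with unit row sums, so its stationary distribution is uniform over the data points, which is the empirical stand-in for the traversal distribution.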


We get…

• Scaled likelihood $P(x_i \mid \text{centre } x_j) / P(x_i) = n\,(A^D)_{ij}$

– $A = W S^{-1}$

– W is usual affinity

– $S^{-1}$ is extra consistency term.

• More generally have out of sample scaled likelihood:

– $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T (A^{D-2})\, b(y)$

where a(x) and b(x) are the traversal probabilities to and from x.
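A small worked example may help make the in-sample scaled likelihood concrete. It reads the exponent D as the number of diffusion steps and uses a hand-built symmetric transition matrix for four points in two clusters as a stand-in for A; both are assumptions of this sketch rather than details taken from the slides.

```python
import numpy as np

# Hypothetical symmetric transition matrix: points 0 and 1 form one cluster,
# points 2 and 3 another, with weak links between the clusters.
a = np.array([
    [0.50, 0.45, 0.05, 0.00],
    [0.45, 0.50, 0.00, 0.05],
    [0.05, 0.00, 0.50, 0.45],
    [0.00, 0.05, 0.45, 0.50],
])

def scaled_likelihood(a, n_steps):
    """n (A^D)_ij: probability of reaching x_i from centre x_j in D steps,
    divided by the uniform stationary probability 1/n."""
    n = a.shape[0]
    return n * np.linalg.matrix_power(a, n_steps)

sl = scaled_likelihood(a, n_steps=20)
within = sl[1, 0]   # point 1 given a centre at point 0 (same cluster)
across = sl[2, 0]   # point 2 given a centre at point 0 (other cluster)
```

After 20 steps the within-cluster value is about 1.12 and the across-cluster value about 0.88. Values above 1 mark points that are more likely than average given the centre, and every entry tends to 1 as the number of steps grows.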


Example: Scaled likelihoods






Initial distribution

• Can consider other initial distributions.

• Specifically can consider delta functions at mixture centres.

• Variational Bayesian Mixture models…


Demo


Number of clusters

• Scaled likelihoods for three cluster problem.


Number of clusters

• Scaled likelihoods for a five cluster problem.


Cluster allocations


Cluster allocations


Conclusion

• A priori formulation of spectral clustering.

• Can be used as any other spectral procedure.

• But also provides scaled likelihoods – can be combined with Bayesian procedures.

• Variational Bayesian formalism.

• Small sample approximation issues.

• Better to have a flexible density estimator.


Generative Kernels

• Related to Seeger: Covariance Kernels from Bayesian Generative Models

Gaussian Process over X space

Data is obtained by diffusing in X space using the traversal process...

Density, and corresponding traversal process.

And then local averaging and additive noise.


Generative Kernels

• Covariance Kij is

$$K(r, s) = \int dx\, dy\; P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)$$

• Again use sample estimates.

• Presume measured target is local average.

• Just standard basis function derivation of GP.
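With sample estimates the double integral becomes a double sum over data points, so the covariance can be written as M K Mᵀ, where row r of M holds the probability that a target sourced at data point r has diffused to each data point. The sketch below assumes M is a power of a symmetric traversal matrix and that the base kernel K is a squared-exponential kernel on hypothetical 1-D inputs; the matrix, the inputs, the step count and the function names are all illustrative assumptions.

```python
import numpy as np

def base_kernel(x, length_scale=1.0):
    """Squared-exponential Gaussian-process kernel K(x, y) on 1-D inputs."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def traversal_kernel(t, k, n_steps):
    """Sample estimate of K(r, s) = sum_ij P(i sourced r) P(j sourced s) K_ij."""
    m = np.linalg.matrix_power(t, n_steps)  # row r: where a target from r ends up
    return m @ k @ m.T

# Hypothetical traversal matrix: two tight pairs of points, weakly linked.
t = np.array([
    [0.50, 0.45, 0.05, 0.00],
    [0.45, 0.50, 0.00, 0.05],
    [0.05, 0.00, 0.50, 0.45],
    [0.00, 0.05, 0.45, 0.50],
])
x = np.array([0.0, 0.2, 1.0, 1.2])
k_dt = traversal_kernel(t, base_kernel(x), n_steps=5)
```

Because the result has the form M K Mᵀ with K positive semi-definite, it is itself a valid covariance matrix, which matches the standard basis-function view of a Gaussian process.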


Motivation

• Generative model generates clustered data positions.

• Targets diffuse using traversal process.

• Target values suffer locality averaging influence:

– Diffused objects locally influence one another’s target values so everyone becomes like their neighbours.

• E.g. Accents.

• Can add local measurement noise.


Kernel Clustering

• Use sample estimates again to get kernel

$$K(r, s) = \sum_{ij} \left(a(r)^T A^{D-1}\right)_i \left(a(s)^T A^{D-1}\right)_j K_{ij}$$

• Can also incorporate a prior over iterations and integrate out.

• For example can use matrix exponential exp(A) instead of $(A^D)$.
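One concrete way to integrate out the number of iterations, offered here as an illustrative assumption rather than the authors' stated choice, is a Poisson prior with rate λ over D. Averaging the D-th power of A under that prior gives a matrix exponential, which shows where a form like exp(A) can come from:

```latex
% Assumption: D ~ Poisson(\lambda); \lambda is a hypothetical rate parameter.
\begin{align*}
\mathbb{E}_{D}\!\left[A^{D}\right]
  &= \sum_{D=0}^{\infty} e^{-\lambda}\, \frac{\lambda^{D}}{D!}\, A^{D} \\
  &= e^{-\lambda} \exp(\lambda A) \\
  &= \exp\!\big(\lambda (A - I)\big).
\end{align*}
```

Setting λ = 1 recovers exp(A) up to the scalar factor e^{-1}. The last line holds because the identity commutes with A.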


Generating targets for rings data

• Can generate from the model:

• Across cluster covariance is low.

• Within cluster continuity.


The point?

• Density dependence matters in missing data problems.

• Gaussian process: data with missing targets has no influence.

• Density Traversal Kernel: data with missing targets affects kernel, and hence has influence.
