Amos Storkey, School of Informatics.

Density Traversal Clustering and Generative Kernels

a generative framework for spectral clustering

Amos Storkey, Tom G Griffiths

University of Edinburgh

Attribute Generalisation

Prior work

• Tishby and Slonim
• Meila and Shi
• Coifman et al.
• Nadler et al.

Example: Transition Matrix

Example: 20 Iterations

Example: 400 Iterations

Argument

• A priori dependence on data.
• No generative model.
• Inconsistent with underlying density.

• Clusters are spatial characteristics that are properties of distributions.

• Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.

But we do know

• We know the diffusion asymptotics, but the probabilistic formalism is inconsistent with the data density:
  – For a finite time-step, in the infinite data limit, the equilibrium distribution does not match the data distribution.

Density Traversal Clustering

• Define a discrete-time, continuous-space, diffusing Markov chain.
• Definition dependent on some latent distribution.
• Call this the Traversal Distribution.

The Markov chain

• Transition with probability

$$P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)\, S^{-1}(x_t)}{Z(x_{t-1})}, \qquad Z(x_{t-1}) = \int dy\, D(y, x_{t-1})\, P^*(y)\, S^{-1}(y)$$

• D(y,x) is Gaussian centred at x, P* is the Traversal distribution.

• Here S is given by the solution of

$$S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}$$
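A short worked check of why S acts as a consistency term. It assumes the transition has the form P(x_t | x_{t-1}) = D(x_t, x_{t-1}) P*(x_t) S^{-1}(x_t) / Z(x_{t-1}), with S solving S(x) = ∫ dy P*(y) D(y, x) / S(y); this reading is reconstructed from the slide layout rather than stated in words. Under that assumption the normaliser Z coincides with S, and, because the Gaussian D is symmetric in its arguments, the chain satisfies detailed balance with respect to P*:

$$Z(x) = \int dy\, D(y, x)\, P^*(y)\, S^{-1}(y) = S(x) \quad\Longrightarrow\quad P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)}{S(x_t)\, S(x_{t-1})}$$

$$P^*(x_{t-1})\, P(x_t \mid x_{t-1}) = \frac{P^*(x_{t-1})\, D(x_t, x_{t-1})\, P^*(x_t)}{S(x_t)\, S(x_{t-1})} = P^*(x_t)\, P(x_{t-1} \mid x_t)$$

So, on this reading, P* is a stationary distribution of the chain, which is the match between equilibrium and underlying density that the standard formalism lacks.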

Generative procedure

Problems

• Random walk in continuous space.
• Each step involves many intractable integrals.
• Real Bayesians would...
• Good prior distributions over distributions are a hard problem, but we need a prior for traversal distributions.

CHEAT

• Doing all the integrals is not possible, but...
  – All integrals are with respect to the traversal distribution.
  – Use the empirical data as a proxy.
  – All the integrals now become sample estimates: sums over the data points.
  – Everything is computable in the space of data points.
  – WORKS!: never need to evaluate the probability at a point, only integrals over regions.
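A minimal sketch of the sample-estimate step, under stated assumptions. It reads the equation for S as S(x) = ∫ dy P*(y) D(y, x) / S(y) and replaces P* by the empirical distribution over n data points, so the integral becomes S_j = (1/n) Σ_i W_ij / S_i with W the Gaussian affinity. The function names, the toy data, the bandwidth `sigma`, and the geometric-mean damping of the fixed-point update are all choices made for this illustration, not details from the slides.

```python
import math
import random

def gaussian_affinity(points, sigma):
    """W[i][j] = exp(-|x_i - x_j|^2 / (2 sigma^2)), the usual affinity."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def solve_consistency(W, iters=200):
    """Solve S_j = (1/n) * sum_i W_ij / S_i on the data points.

    The plain fixed-point update can oscillate, so each step takes the
    geometric mean of the old value and the update target. This damping
    is an assumption of the sketch, not something the slides specify.
    """
    n = len(W)
    S = [1.0] * n
    for _ in range(iters):
        target = [sum(W[i][j] / S[i] for i in range(n)) / n for j in range(n)]
        S = [math.sqrt(s * t) for s, t in zip(S, target)]
    return S

def transition_matrix(W, S):
    """T[j][i] = W_ij / (n * S_i * S_j): probability of stepping x_j -> x_i."""
    n = len(W)
    return [[W[i][j] / (n * S[i] * S[j]) for i in range(n)] for j in range(n)]

random.seed(0)
# Two loose blobs in the plane (hypothetical toy data).
points = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(10)]
          + [(random.gauss(3, 0.5), random.gauss(3, 0.5)) for _ in range(10)])
W = gaussian_affinity(points, sigma=1.0)
S = solve_consistency(W)
T = transition_matrix(W, S)
row_sums = [sum(row) for row in T]  # each should be close to 1
```

At the fixed point every row of `T` sums to one and `T` is symmetric, so the uniform distribution over data points (the empirical proxy) is stationary for the discrete chain.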

We get…

• Scaled likelihood: $P(x_i \mid \text{centre } x_j) / P(x_i) = n\, (A^D)_{ij}$
  – $A = W S^{-1}$
  – W is the usual affinity.
  – $S^{-1}$ is the extra consistency term.

• More generally, we have the out-of-sample scaled likelihood:
  – $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T A^{D-2}\, b(y)$, where a(x) and b(x) are the traversal probabilities to and from x.
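A small sketch of the in-sample scaled likelihood, reading "(AD)ij" as the (i, j) entry of the D-th matrix power of A, with D the number of traversal steps (an interpretation suggested by the later remark about replacing it with exp(A)). The 4 x 4 matrix below is a hypothetical symmetric transition matrix with two weakly coupled pairs of points; it is not data from the talk.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def matrix_power(A, steps):
    """A multiplied by itself `steps` times (identity for steps == 0)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = matmul(result, A)
    return result

def scaled_likelihood(A, steps):
    """n * (A^steps)_ij, the scaled likelihood of x_i given centre x_j."""
    n = len(A)
    return [[n * value for value in row] for row in matrix_power(A, steps)]

# Hypothetical transition matrix: points 0,1 form one cluster, 2,3 another.
A = [[0.60, 0.38, 0.01, 0.01],
     [0.38, 0.60, 0.01, 0.01],
     [0.01, 0.01, 0.60, 0.38],
     [0.01, 0.01, 0.38, 0.60]]

after_20 = scaled_likelihood(A, 20)
after_400 = scaled_likelihood(A, 400)
```

After 20 steps the within-cluster entries sit above 1 and the across-cluster entries below 1; by 400 steps the chain has mixed and every entry has relaxed to roughly 1, so the cluster structure is only visible at intermediate iteration counts.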

Example: Scaled likelihoods

Example: 20 Iterations

Example: 400 Iterations

Initial distribution

• Can consider other initial distributions.
• Specifically can consider delta functions at mixture centres.
• Variational Bayesian Mixture models…

Demo

Number of clusters

• Scaled likelihoods for a three cluster problem.

Number of clusters

• Scaled likelihoods for a five cluster problem.

Cluster allocations

Cluster allocations

Conclusion

• A priori formulation of spectral clustering.
• Can be used as any other spectral procedure.
• But also provides scaled likelihoods, so it can be combined with Bayesian procedures.
• Variational Bayesian formalism.
• Small sample approximation issues.
• Better to have a flexible density estimator.

Generative Kernels

• Related to Seeger: Covariance Kernels from Bayesian Generative Models

Gaussian Process over X space

Data is obtained by diffusing in X space using the traversal process...

Density, and corresponding traversal process.

And then local averaging and additive noise.

Generative Kernels

• Covariance $K_{ij}$ is

$$K(r, s) = \int dx\, dy\, P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)$$

• Again use sample estimates.
• Presume measured target is local average.
• Just standard basis function derivation of GP.

Motivation

• Generative model generates clustered data positions.

• Targets diffuse using the traversal process.
• Target values suffer a locality averaging influence:
  – Diffused objects locally influence one another's target values, so everyone becomes like their neighbours.
• E.g. accents.
• Can add local measurement noise.

Kernel Clustering

• Use sample estimates again to get the kernel

$$K(r, s) = \sum_{ij} \left[a(r)^T A^{D-1}\right]_i \left[a(s)^T A^{D-1}\right]_j K_{ij}$$

• Can also incorporate a prior over iterations and integrate out.
• For example can use the matrix exponential exp(A) instead of $A^D$.
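A hedged sketch of the sample-estimate kernel, reading it as K(r, s) = Σ_ij [a(r)^T A^{D-1}]_i [a(s)^T A^{D-1}]_j K_ij. The transition matrix `A`, the base kernel `K_base` (taken to be the identity here), and the one-hot entry vectors standing in for a(r) are hypothetical illustration values; the slides do not give them.

```python
def propagate(v, A, steps):
    """Return the row vector v^T A^steps by repeated vector-matrix products."""
    n = len(A)
    for _ in range(steps):
        v = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]
    return v

def traversal_kernel(a_r, a_s, A, K_base, D):
    """Sum over i, j of [a_r^T A^(D-1)]_i * K_base[i][j] * [a_s^T A^(D-1)]_j."""
    p_r = propagate(a_r, A, D - 1)
    p_s = propagate(a_s, A, D - 1)
    n = len(A)
    return sum(p_r[i] * K_base[i][j] * p_s[j]
               for i in range(n) for j in range(n))

# Hypothetical transition matrix: points 0,1 form one cluster, 2,3 another.
A = [[0.60, 0.38, 0.01, 0.01],
     [0.38, 0.60, 0.01, 0.01],
     [0.01, 0.01, 0.60, 0.38],
     [0.01, 0.01, 0.38, 0.60]]
# Identity base kernel over the data points (an assumption of this sketch).
K_base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

a_r = [1.0, 0.0, 0.0, 0.0]  # source entering the chain at point 0
a_s = [0.0, 1.0, 0.0, 0.0]  # source entering at point 1 (same cluster)
a_t = [0.0, 0.0, 1.0, 0.0]  # source entering at point 2 (other cluster)

k_same = traversal_kernel(a_r, a_s, A, K_base, D=21)
k_same_rev = traversal_kernel(a_s, a_r, A, K_base, D=21)
k_cross = traversal_kernel(a_r, a_t, A, K_base, D=21)
```

With these toy values the covariance between two sources in the same cluster comes out larger than between sources in different clusters, and the kernel is symmetric in its arguments whenever the base kernel is.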

Generating targets for rings data

• Can generate from the model:

• Across cluster covariance is low.

• Within cluster continuity.

The point?

• Density dependence matters in missing data problems.

• Gaussian process: data with missing targets has no influence.

• Density Traversal Kernel: data with missing targets affects kernel, and hence has influence.