27
Gaussian Mixture Models Learning from Data: Gaussian Mixture Models Amos Storkey, School of Informatics November 21, 2005 http://www.anc.ed.ac.uk/amos/lfd/ Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Learning from Data: Gaussian MixtureModels

Amos Storkey, School of Informatics

November 21, 2005

http://www.anc.ed.ac.uk/∼amos/lfd/

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 2: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Summary

I Class conditional GaussiansI What about if the class is unknown.I What if just have a complicated densityI Can we learn a clustering?I Can we learn a density?

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 3: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Gaussian: Reminder

I The vector x is multivariate Gaussian if for mean µ andcovariance matrix Σ, it is distributed according to

P(x|µ,Σ) =1

|2πΣ|1/2 exp(−1

2(x− µ)T Σ−1(x− µ)

)

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 4: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Multivariate Gaussian: Picture

−3−2

−10

12

3

−3

−2

−1

0

1

2

30

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 5: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Reminder: Class Conditional Classification

I Have real valued multivariate data, along with class labelfor each point.

I Want to predict the value of the class label given some newpoint.

I Presume that if we take all the points with a particularlabel, then we believe they were sampled from a Gaussian.

I How should we predict the class at a new point?

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 6: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Reminder: Class Conditional Classification

I Learning: Fit Gaussian to data in each class (classconditional fitting). Gives P(position|class)

I Find estimate for probability of each class P(class)

I Inference: Given a new position, we can ask “What is theprobability of this point being generated by each of theGaussians.”

I Pick the largest and give probability using Bayes rule

P(class|position) ∝ P(position|class)P(class)

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 7: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Unsupervised Problem

I Given exactly the same problem, but without any classlabels, can we still solve it?

I Presume we know the number of classes and know theyare Gaussian.

I Can we model the underlying distribution?I Can we cluster the data into different classes?I Effectively we want to allocate a class label to each point.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 8: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−30 −20 −10 0 10 20 30 40−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 9: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−30 −20 −10 0 10 20 30 40−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 10: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 11: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Can solve either-or

I If we know which clusters the points belong to we cansolve (learning a class-conditional model).

I If we know what the Gaussian clusters are we can solve(inferential classification using a class-conditional model)

I The first case was just what we did when we had trainingdata.

I The second was just what we did using Bayes rule for newtest data.

I But how can we do the two together?

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 12: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Iterate?

I Could just iterate: Guess the cluster values. Calculateparameters. Find maximum cluster values.

I No reason to believe this will converge.I Problem is that we have probability of belonging to a

cluster, but we allocate all-or nothing.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 13: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Use Gradient Optimisation

I Write out model. Calculate derivatives, optimise directlyusing conjugate gradients.

I Will work. Quite complicated. Not necessarily fastestapproach.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 14: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Probabilistic Allocation

I Stronger: if we know a probabilistic allocation to clusterswe can find the parameters of the Gaussians.

I If we know the parameters of the Gaussians we can do aprobabilistic allocation to clusters.

I Convergence guarantee!I Convergence-to-a-local-maximum-likelihood-value

guarantee!!

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 15: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

EM Algorithm

I Choose number of mixtures.I Initialise Gaussians and mixing proportions.I Calculate responsibilities:I P(i |xµ) - the probability of data point xµ belonging to

cluster i given the current parameter values of theGaussians and mixing proportions.

I Pretend the responsibilities are the truth. Update theparameters using a maximum likelihood method(generalisation of class conditional learning).

I But the responsibilities are not the truth, so update themand repeat.

I Will converge to maximum likelihood parameter values.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 16: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 17: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 18: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 19: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 20: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 21: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Example

−40 −20 0 20 40

−40

−30

−20

−10

0

10

20

30

40

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 22: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Labelling

I The labels are permutable.I We can use partially labelled data to set the cluster labels:

fix responsibilities deterministically.I EM algorithm for the rest.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 23: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Choice of number of mixtures

I How good is the likelihood?I More mixtures, better likelihood.I One mixture on each data point: infinite likelihood.I Need regularisation on Gaussian widths (i.e. covariances).I Aside: Bayesian methods account for probability mass in

parameter space.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 24: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Initialisation

I K-MeansI Uses K-NN for clustering.I See lecture notes.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 25: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Inference and Clustering

I Just as with class conditional Gaussian modelI Have mixture parameters.I Calculate posterior probability of belonging to a particular

mixture.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 26: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Covariances

I Full covariancesI Diagonal covariancesI Others types (factor analysis covariances - bit like PCA).

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models

Page 27: Learning from Data: Gaussian Mixture Models · Gaussian Mixture Models Reminder: Class Conditional Classification I Have real valued multivariate data, along with class label for

Gaussian Mixture Models

Summary

I Mixture Models - class conditional models without theclasses!

I EM algorithmI Using class conditional modelsI Issues: number of mixtures, covariance types etc.

Amos Storkey, School of Informatics Learning from Data: Gaussian Mixture Models