Clustering Lecture 5: Mixture Model (Jing Gao, SUNY Buffalo)

CSE601 Mixture Model Clustering - University at Buffalojing/cse601/fa12/materials/... · 2012. 10. 26. · Outline •Basics –Motivation, definition, evaluation •Methods –Partitional


Page 1:

Clustering Lecture 5: Mixture Model

Jing Gao SUNY Buffalo

1

Page 2:

Outline

• Basics – Motivation, definition, evaluation

• Methods – Partitional

– Hierarchical

– Density-based

– Mixture model

– Spectral methods

• Advanced topics – Clustering ensemble

– Clustering in MapReduce

– Semi-supervised clustering, subspace clustering, co-clustering, etc.


Page 3:

Using Probabilistic Models for Clustering

• Hard vs. soft clustering

– Hard clustering: Every point belongs to exactly one cluster

– Soft clustering: Every point belongs to each cluster with some degree of membership

• Probabilistic clustering

– Each cluster is mathematically represented by a parametric distribution

– The entire data set is modeled by a mixture of these distributions


Page 4:

(figure: Gaussian density f(x) over x, annotated with mean μ and standard deviation σ)

Changing μ shifts the distribution left or right

Changing σ increases or decreases the spread

Probability density function f(x) is a function of x given μ and σ

Gaussian Distribution


N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
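The density above is easy to evaluate directly. A minimal sketch (the helper name `gaussian_pdf` is ours, not from the slides) illustrating that changing μ moves the peak while a larger σ² lowers and widens it:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # N(x | mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

peak_narrow = gaussian_pdf(0.0, 0.0, 1.0)  # density at the mean, sigma^2 = 1
peak_wide = gaussian_pdf(0.0, 0.0, 4.0)    # larger sigma^2 -> lower, wider peak
shifted = gaussian_pdf(3.0, 3.0, 1.0)      # changing mu moves the peak, same height
```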

Page 5:

Likelihood

(figure: data points on the x-axis with candidate Gaussian densities f(x))

Define likelihood as a function of μ and σ given x1, x2, …, xn

Which Gaussian distribution is more likely to generate the data?


L(\mu, \sigma^2) = \prod_{i=1}^{n} N(x_i \mid \mu, \sigma^2)
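The question "which Gaussian is more likely to generate the data?" can be answered by comparing (log) likelihoods. A sketch with made-up data values; the Gaussian centered near the data wins:

```python
import numpy as np

def log_likelihood(data, mu, sigma2):
    # sum_i ln N(x_i | mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (data - mu) ** 2 / (2 * sigma2))

data = np.array([1.8, 2.1, 1.9, 2.2, 2.0])
ll_centered = log_likelihood(data, mu=2.0, sigma2=0.5)  # centered on the data
ll_shifted = log_likelihood(data, mu=5.0, sigma2=0.5)   # far from the data
```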

Page 6:


Gaussian Distribution

• Multivariate Gaussian

• Log likelihood

(μ: mean, Σ: covariance matrix)

L(\mu, \Sigma) = \ln \prod_{i=1}^{n} N(x_i \mid \mu, \Sigma) = -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) - \frac{n}{2} \ln |\Sigma| + \text{const}
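The multivariate log likelihood can be transcribed directly. A sketch (the helper name is ours; the additive constant is included explicitly here):

```python
import numpy as np

def mvn_log_likelihood(X, mu, Sigma):
    # L(mu, Sigma) = sum_i ln N(x_i | mu, Sigma)
    n, d = X.shape
    diff = X - mu
    # per-point quadratic form (x_i - mu)^T Sigma^-1 (x_i - mu)
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * quad.sum() - 0.5 * n * logdet - 0.5 * n * d * np.log(2 * np.pi)
```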

Page 7:


Maximum Likelihood Estimate

• MLE

– Find model parameters that maximize log likelihood

• MLE for Gaussian

Maximize L(\mu, \Sigma) with respect to \mu and \Sigma; setting the derivatives to zero gives

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T
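These closed-form estimates are easy to check empirically. A univariate sketch on synthetic data (the sample and its true parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # true mu = 3, sigma^2 = 4

# closed-form MLE for a univariate Gaussian
mu_hat = data.mean()                        # (1/n) sum_i x_i
sigma2_hat = ((data - mu_hat) ** 2).mean()  # (1/n) sum_i (x_i - mu_hat)^2
```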

Page 8:


Gaussian Mixture

• Linear combination of Gaussians

p(x) = \sum_{k=1}^{K} \pi_k N(x \mid \mu_k, \Sigma_k)

where \sum_{k=1}^{K} \pi_k = 1 and 0 \le \pi_k \le 1

parameters to be estimated: \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}
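The mixture density can be evaluated directly. A one-dimensional sketch with made-up component parameters, checking that the result is a valid density:

```python
import numpy as np

def gmm_density(x, pis, mus, sigma2s):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    total = np.zeros_like(np.asarray(x, dtype=float))
    for pi, mu, s2 in zip(pis, mus, sigma2s):
        total += pi * np.exp(-(x - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return total

pis, mus, sigma2s = [0.3, 0.7], [-2.0, 1.0], [1.0, 0.5]
xs = np.linspace(-10.0, 10.0, 4001)
# non-negative everywhere and integrates to ~1 (crude Riemann sum)
area = gmm_density(xs, pis, mus, sigma2s).sum() * (xs[1] - xs[0])
```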

Page 9:

• To generate a data point:

– first pick one of the K components with probability \pi_k

– then draw a sample from that component's distribution

• Each data point is generated by one of the K components; a latent variable indicating the generating component is associated with each point
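This two-stage generative process can be simulated directly; the component parameters below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
pis = np.array([0.2, 0.8])     # mixing weights
mus = np.array([-5.0, 5.0])    # component means
sigmas = np.array([1.0, 1.0])  # component standard deviations

n = 10_000
# latent variable z_i: which component generated point i
z = rng.choice(len(pis), size=n, p=pis)
# then draw each point from its chosen component
x = rng.normal(loc=mus[z], scale=sigmas[z])
```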


Gaussian Mixture

Page 10:


• Maximize the log likelihood

• Without knowing the values of the latent variables, we have to maximize the incomplete log likelihood

\ln p(X) = \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k N(x_i \mid \mu_k, \Sigma_k) \right)

Gaussian Mixture

Page 11:


Expectation-Maximization (EM) Algorithm

• E-step: for given parameter values, compute the expected values of the latent variables (the responsibilities of the components for each data point)

\gamma_{ik} = \frac{\pi_k N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_i \mid \mu_j, \Sigma_j)}

– Note that \gamma_{ik} \in [0, 1] instead of a hard indicator z_{ik} \in \{0, 1\}, but we still have \sum_{k=1}^{K} \gamma_{ik} = 1
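The E-step translates to a few lines of code. A univariate sketch (`e_step` is our hypothetical helper name); each row of responsibilities sums to 1:

```python
import numpy as np

def e_step(x, pis, mus, sigma2s):
    # gamma[i, k] proportional to pi_k * N(x_i | mu_k, sigma_k^2)
    unnorm = (pis / np.sqrt(2 * np.pi * sigma2s)
              * np.exp(-(x[:, None] - mus) ** 2 / (2 * sigma2s)))
    return unnorm / unnorm.sum(axis=1, keepdims=True)

x = np.array([-3.0, 0.1, 3.0])
gamma = e_step(x, np.array([0.5, 0.5]), np.array([-3.0, 3.0]), np.array([1.0, 1.0]))
```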

Page 12:


• M-step: maximize the expected complete log likelihood

• Parameter update (with N_k = \sum_{i=1}^{n} \gamma_{ik}):

\pi_k = \frac{N_k}{n}, \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T
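The update equations in code, as a univariate sketch (`m_step` is our hypothetical helper name). With hard 0/1 responsibilities the updates reduce to per-cluster sample statistics, which makes them easy to verify:

```python
import numpy as np

def m_step(x, gamma):
    # N_k = sum_i gamma_ik: effective number of points assigned to component k
    Nk = gamma.sum(axis=0)
    pis = Nk / len(x)
    mus = (gamma * x[:, None]).sum(axis=0) / Nk
    sigma2s = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
    return pis, mus, sigma2s

# hard (0/1) responsibilities: two points per cluster
x = np.array([0.0, 2.0, 10.0, 12.0])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pis, mus, sigma2s = m_step(x, gamma)
```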

Expectation-Maximization (EM) Algorithm

Page 13:


EM Algorithm

• Iterate the E-step and M-step until the log likelihood of the data no longer increases.

– Converges to a local optimum

– Restart the algorithm with different initial parameter guesses (as in K-means)

• Relation to K-means

– Consider a GMM with common covariance \Sigma_k = \sigma^2 I

– As \sigma^2 \to 0, the two methods coincide
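Putting the two steps together gives the full EM loop. A self-contained univariate sketch; the initialization (means spread over the data range, shared variance) is one illustrative choice among many, not prescribed by the slides:

```python
import numpy as np

def em_gmm(x, K=2, n_iter=200, tol=1e-8):
    # illustrative init: spread the means over the data range, shared variance
    mus = np.linspace(x.min(), x.max(), K)
    sigma2s = np.full(K, x.var())
    pis = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k]
        dens = (pis / np.sqrt(2 * np.pi * sigma2s)
                * np.exp(-(x[:, None] - mus) ** 2 / (2 * sigma2s)))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters
        Nk = gamma.sum(axis=0)
        pis = Nk / len(x)
        mus = (gamma * x[:, None]).sum(axis=0) / Nk
        sigma2s = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
        # stop once the incomplete log likelihood no longer increases
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, sigma2s

# synthetic data: two well-separated clusters of equal size
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4.0, 1.0, 500), rng.normal(4.0, 1.0, 500)])
pis, mus, sigma2s = em_gmm(x, K=2)
```

On well-separated data like this a single run finds the two components; with overlapping clusters or unlucky initialization, the restarts mentioned above become important.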

Pages 14–19: (figure-only slides)

Page 20:


K-means vs GMM

K-means

• Objective function
– Minimize the sum of squared Euclidean distances

• Can be optimized by an EM-style algorithm
– E-step: assign points to clusters
– M-step: optimize cluster centers
– Performs hard assignment during the E-step

• Assumes spherical clusters, each with equal probability

GMM

• Objective function
– Maximize the log likelihood

• EM algorithm
– E-step: compute the posterior probability of membership
– M-step: optimize parameters
– Performs soft assignment during the E-step

• Can be used for non-spherical clusters

• Can generate clusters with different probabilities
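The hard-vs-soft distinction can be seen directly: with equal weights and a shared variance, GMM responsibilities approach hard K-means-style assignments as the variance shrinks. A sketch (the subtraction of the row minimum before `exp` is a standard numerical-stability trick, our addition):

```python
import numpy as np

def soft_assign(x, mus, sigma2):
    # responsibilities under equal weights and a shared variance sigma2
    d2 = (x[:, None] - mus) ** 2
    # shift by the row minimum before exponentiating (numerical stability)
    g = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma2))
    return g / g.sum(axis=1, keepdims=True)

x = np.array([0.9])
mus = np.array([0.0, 2.0])
g_soft = soft_assign(x, mus, sigma2=2.0)   # GMM-style: membership is split
g_hard = soft_assign(x, mus, sigma2=1e-3)  # variance -> 0: winner takes all
```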

Page 21:

Mixture Model

• Strengths

– Give probabilistic cluster assignments

– Have probabilistic interpretation

– Can handle clusters with varying sizes, variances, etc.

• Weaknesses

– Sensitive to initialization

– An appropriate distribution family must be chosen

– Prone to overfitting (e.g., with too many components)


Page 22:

Take-away Message

• Probabilistic clustering

• Maximum likelihood estimate

• Gaussian mixture model for clustering

• EM algorithm that alternates between assigning points to clusters and estimating model parameters

• Strengths and weakness
