Expectation-Maximization Algorithm CAP5610: Machine Learning Instructor: Guo-Jun QI

3D Geometry for Computer Graphicsgqi/CAP5610/CAP5610Lecture16.pdf · 2014. 10. 9. · Hard assignment: assign data points to the corresponding clusters based on fitness Fitness: distance


What is EM?

An algorithm alternating between assignment and estimation

A parameter estimation method: it falls into the general framework of maximum-likelihood estimation (MLE).

The general form was given by Dempster, Laird, and Rubin (1977), although the essence of the algorithm had appeared previously in various forms.


Outline

  • Maximum Likelihood Estimation (MLE)
  • EM: basic concepts
  • EM for Gaussian Mixture Model (GMM)


What is MLE?

Given:
  • A sample X = {X_1, …, X_n}
  • A vector of parameters θ

We define:
  • Likelihood of the data: P(X | θ)
  • Log-likelihood of the data: L(θ) = log P(X | θ)

Given X, find

$$\theta_{ML} = \arg\max_{\theta \in \Omega} L(\theta)$$


MLE (cont)

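Often we assume that the X_i are independent and identically distributed (i.i.d.), so the joint likelihood factorizes into a product over samples:

$$
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta \in \Omega} L(\theta) \\
&= \arg\max_{\theta \in \Omega} \log P(X \mid \theta) \\
&= \arg\max_{\theta \in \Omega} \log P(X_1, \dots, X_n \mid \theta) \\
&= \arg\max_{\theta \in \Omega} \log \prod_{i} P(X_i \mid \theta) \\
&= \arg\max_{\theta \in \Omega} \sum_{i} \log P(X_i \mid \theta)
\end{aligned}
$$

Depending on the form of p(x | θ), solving this optimization problem can be easy or hard.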
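As a small illustration of the sum-of-logs form, the sketch below evaluates the log-likelihood of i.i.d. samples under a Gaussian with unknown mean θ and known standard deviation, and maximizes it by a brute-force grid search. The Gaussian model, the sample values, the grid, and the function name `log_likelihood` are assumptions chosen for illustration; they do not come from the lecture.

```python
import math

def log_likelihood(theta, xs, sigma=1.0):
    """L(theta) = sum_i log P(x_i | theta) for i.i.d. N(theta, sigma^2) samples."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (x - theta) ** 2 / (2.0 * sigma ** 2) for x in xs)

# Illustrative sample (hypothetical values).
xs = [2.1, 1.9, 2.4, 1.6, 2.0]

# Candidate parameter values: a grid over [0, 4] with step 0.001.
grid = [i / 1000.0 for i in range(0, 4001)]

# arg max over the grid of the summed log-likelihood.
theta_ml = max(grid, key=lambda t: log_likelihood(t, xs))

# For this model the maximizer is known in closed form: the sample mean.
sample_mean = sum(xs) / len(xs)
print(theta_ml, sample_mean)
```

This is an "easy" case: the grid search agrees with the closed-form maximizer up to the grid resolution. The mixture models later in the lecture are the "hard" case, where no such closed form exists.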


Example 1: Fit a line to observed data

[Figure: scatter plot of the observed data; axes x and y]

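Maximum likelihood estimation for the slope of a single line

Data: $(X_n, Y_n),\ n = 1, \dots, N$

Model: $Y = aX + w$, where $w \sim N(0, \sigma^2)$

Data likelihood for point n:

$$P(Y_n \mid X_n, a) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_n - aX_n)^2}{2\sigma^2}\right)$$

Maximum likelihood estimate:

$$\hat{a} = \arg\max_a \sum_{n=1}^{N} \log P(Y_n \mid X_n, a) = \arg\min_a \sum_{n=1}^{N} (Y_n - aX_n)^2$$

gives regression formula

$$\hat{a} = \frac{\sum_{n} X_n Y_n}{\sum_{n} X_n^2}$$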
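For the model Y = aX + w with Gaussian noise, maximizing the likelihood over a is the same as minimizing the sum of squared residuals, whose minimizer is a = Σ X_n Y_n / Σ X_n². A minimal sketch of that estimate follows; the function name `fit_slope`, the true slope, the noise level, and the synthetic data are all hypothetical.

```python
import random

def fit_slope(xs, ys):
    """MLE of a in Y = aX + w, w ~ N(0, sigma^2): a = sum(x*y) / sum(x^2)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least one non-zero x")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

# Synthetic data from a hypothetical line through the origin.
rng = random.Random(0)
true_a = 1.5
xs = [i / 10.0 for i in range(1, 101)]
ys = [true_a * x + rng.gauss(0.0, 0.5) for x in xs]

a_hat = fit_slope(xs, ys)
print(a_hat)
```

Note that the estimate does not depend on σ: the noise variance scales the log-likelihood but does not move its maximizer in a.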


Example 2: Fitting two lines to observed data

[Figure: scatter plot of the observed data; axes x and y]

Fitting two lines: on the one hand…

[Figure: the observed data with two lines, labeled Line 1 and Line 2; axes x and y]

If we knew which points went with which lines, we’d be back at the single line-fitting problem, twice.

Fitting two lines, on the other hand…

[Figure: the observed data; axes x and y]

We could figure out the probability that any point came from either line if we just knew the two equations for the two lines.

Expectation Maximization (EM): a solution to chicken-and-egg problems


EM example:

Step 1: Randomly guess two lines as initialization.

Step 2: Assign points to the closest lines.

Step 3: Update the current fitting of lines to the assigned points.

Step 4: Re-assign points to the closest lines.

Step 5: Re-fit lines to the assigned points.

Converged!

EM Algorithm

EM with hard assignment:
  • Randomly guess the initial models: the cluster means in K-means, or the lines in the multi-line fitting problem.
  • Repeat until convergence:
      • Hard assignment: assign data points to the corresponding clusters based on fitness. Fitness: distance to the cluster mean in K-means, and distance to the line in multi-line fitting.
      • Re-fitting of the model for each cluster. In K-means: update the cluster mean. In multi-line fitting: update the line. Both re-fitting rules can be formulated as maximizing the likelihood of generating the assigned data points.
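The hard-assignment loop can be sketched for the two-line problem as follows. This is an illustrative sketch under stated assumptions, not the lecture's code: lines pass through the origin (y = a_k x, matching the single-line model), "distance to the line" is taken as the squared vertical residual, and the names `hard_em_two_lines`, `fit_slope`, the true slopes, and the synthetic data are hypothetical.

```python
import random

def fit_slope(xs, ys):
    """Least-squares (maximum-likelihood) slope for y = a*x."""
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0

def hard_em_two_lines(xs, ys, a_init, max_iter=100):
    """Alternate hard assignment and re-fitting for lines y = a_k * x."""
    slopes = list(a_init)
    assign = None
    for _ in range(max_iter):
        # Hard assignment: each point goes to the line with the smallest
        # squared vertical residual (assumed fitness measure).
        new_assign = [
            min(range(len(slopes)), key=lambda k: (y - slopes[k] * x) ** 2)
            for x, y in zip(xs, ys)
        ]
        if new_assign == assign:  # assignments unchanged: converged
            break
        assign = new_assign
        # Re-fitting: update each line from the points assigned to it.
        for k in range(len(slopes)):
            pts = [(x, y) for x, y, c in zip(xs, ys, assign) if c == k]
            if pts:  # keep the old slope if a line received no points
                slopes[k] = fit_slope([p[0] for p in pts], [p[1] for p in pts])
    return slopes, assign

# Synthetic data: points alternate between two hypothetical lines.
rng = random.Random(1)
true_slopes = (2.0, -1.0)
xs = [0.5 + 0.1 * i for i in range(50)]
ys = [true_slopes[i % 2] * x + rng.gauss(0.0, 0.1) for i, x in enumerate(xs)]

slopes, assign = hard_em_two_lines(xs, ys, a_init=(1.0, 0.0))
print(slopes)
```

Replacing the slope with a cluster mean and the residual with the distance to that mean turns the same loop into K-means. The result depends on the initial guess, so in practice several random initializations are often tried.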


Outline

  • Maximum Likelihood Estimation (MLE)
  • EM: basic concepts (soft assignment)
  • EM for Gaussian Mixture Model (GMM)

Basic setting in EM

  • X is a set of data points: the observed data.
  • Y is a cluster index denoting the assignment of a data point to a cluster.
  • Y is not a deterministic variable as in hard assignment; it is now a random variable, for soft assignment.

EM is a method to find θ_ML, where

$$\theta_{ML} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log P(X \mid \theta)$$

  • Calculating P(X | θ) directly is hard, because we have to marginalize out Y.
  • Calculating P(X, Y | θ) is much simpler, where Y is the “hidden” (or “missing”) data.
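To make the contrast concrete, the two quantities can be written out side by side. The indexing is an assumption for illustration: i.i.d. data points x_1, …, x_n, each with a hidden index y_i taking values 1, …, K.

```latex
% Observed-data log-likelihood: the sum over k sits inside the log.
\log P(X \mid \theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} P(x_i, y_i = k \mid \theta)

% Complete-data log-likelihood: with Y known, each term is a single log.
\log P(X, Y \mid \theta) = \sum_{i=1}^{n} \log P(x_i, y_i \mid \theta)
```

The inner sum in the first expression couples the parameters of all clusters inside a logarithm, which is what usually prevents a closed-form maximizer. In the second expression each data point contributes one term involving only its own cluster.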


The EM terminology

$$P(X, Y = k \mid \theta) = P(X \mid Y = k, \theta)\, P(Y = k \mid \theta) = P(X \mid \theta_k)\, P_k$$

EM Algorithm


Gaussian Distribution

$$f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

  • d = feature dimensions
  • x = random data point
  • μ = mean value
  • Σ = covariance matrix
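The density can be evaluated without forming Σ⁻¹ explicitly. The sketch below is one possible implementation in plain Python, assuming Σ is symmetric positive definite: it uses a Cholesky factorization Σ = L Lᵀ, which yields both log det(Σ) and the quadratic form. The function name `gaussian_pdf` and the list-of-lists representation are choices made here, not part of the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x.

    x and mu are length-d lists; sigma is a d x d list of lists
    (assumed symmetric positive definite).
    """
    d = len(x)
    # Cholesky factorization: sigma = L L^T with L lower triangular.
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("sigma must be positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Solve L z = (x - mu) by forward substitution, so that
    # (x - mu)^T sigma^{-1} (x - mu) = z^T z.
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = []
    for i in range(d):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    # log det(sigma) = 2 * sum_i log L_ii.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    log_pdf = -0.5 * (d * math.log(2.0 * math.pi) + log_det + quad)
    return math.exp(log_pdf)

# Standard 2-D Gaussian evaluated at its mean.
print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Working with the log-density and exponentiating only at the end helps avoid overflow in the normalizing constant when d is large.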


GMMs – Gaussian Mixture Models

[Figure: scatter plot of the data points; axes W and H]

Suppose we have 1000 data points in 2D space (w, h).

[Figure: the same scatter plot; axes W and H]

Assume each data point is normally distributed.

Obviously, there are 5 sets of underlying Gaussians.

GMM

Model the data distribution by a combination of Gaussian functions

Given a set of sample points, how do we estimate their assignments and the parameters of the Gaussian distribution for each cluster?

EM with GMM

In the context of GMM:
  • X: data points
  • Y: which Gaussian creates which data points
  • Distribution of the complete data:

$$P(x_i, y_i = k \mid \Theta) = p(x_i \mid \theta_k)\, P_k$$

Here p(x_i | θ_k) is the Gaussian distribution for cluster k, and P_k is the proportion of cluster k, where the P_k’s must sum to 1.

Soft assignment

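E-Step: Softly assign each data point x_i to each cluster k by computing the posterior probability under the current parameters:

$$p_{ik}^{t} = P(y_i = k \mid x_i, \Theta^{t}) = \frac{P_k^{t}\, p(x_i \mid \theta_k^{t})}{\sum_{k'} P_{k'}^{t}\, p(x_i \mid \theta_{k'}^{t})}$$

M-Step: Maximize the Q-function, which gives the updates

$$P_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^{t}, \qquad
\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^{t}\, x_i}{\sum_{i=1}^{n} p_{ik}^{t}}, \qquad
\Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^{t}\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_{i=1}^{n} p_{ik}^{t}}$$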

EM for GMM (summary)

Objective: Given N data points, find the maximum likelihood estimate of Θ:

$$\Theta_{ML} = \arg\max_{\Theta} f(x_1, \dots, x_N \mid \Theta)$$

Algorithm:
1. Guess an initial Θ.
2. Perform the E step (expectation): based on the current Θ, compute for each data point the posterior probability of each Gaussian component (soft assignment).
3. Perform the M step (maximization): based on this soft clustering of the data points, maximize the Q function.
4. Repeat steps 2-3 until convergence (typically tens of iterations).
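The four steps above can be sketched for a one-dimensional mixture, where each covariance matrix reduces to a scalar variance. This is a minimal illustration under stated assumptions rather than a reference implementation: the names `normal_pdf` and `em_gmm_1d`, the fixed iteration count used in place of a convergence test, the variance floor, and the synthetic data are all choices made here.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, mus, variances, weights, n_iter=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (weights, mus, variances)."""
    n, K = len(xs), len(mus)
    mus, variances, weights = list(mus), list(variances), list(weights)
    for _ in range(n_iter):
        # E step: posterior p_ik of component k for each data point x_i.
        resp = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, mus[k], variances[k])
                     for k in range(K)]
            total = sum(joint)  # assumed > 0; use log-space for extreme data
            resp.append([j / total for j in joint])
        # M step: closed-form maximizers of the Q function.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, var_floor)  # guard against collapse
    return weights, mus, variances

# Synthetic data from two hypothetical Gaussians.
rng = random.Random(0)
xs = ([rng.gauss(0.0, 1.0) for _ in range(200)]
      + [rng.gauss(5.0, 1.0) for _ in range(200)])

weights, mus, variances = em_gmm_1d(
    xs, mus=[-1.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(weights, mus, variances)
```

A production version would typically monitor the log-likelihood to decide when to stop and compute the E step in log-space, since the products of small densities can underflow for points far from every component.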


EM Example

[Figures: a sequence of EM example plots. Legend: Gaussian j; data point t; blue: P(Y_t = j | X_t)]

EM is more than clustering

Y need not be a variable of cluster membership.

It may be any variable that is unknown to us:
  • Bad pixels in an image
  • Lost labels for an example

There could be many possible Ys. EM is still applicable.

Summary

EM is composed of two basic steps:
  • E-Step: Estimate the posterior distribution of the unobserved variables based on the current estimate of the model parameters.
  • M-Step: Update the model parameters by maximizing the expected log-likelihood, weighted by the posterior distribution obtained in the E-Step.

Application of EM to the Gaussian Mixture Model

The steps of EM algorithm

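E-step: calculate

$$Q(\theta, \theta^{t}) = E_{p(y \mid x, \theta^{t})}\left[\log p(x, y \mid \theta)\right] = \int \log p(x, y \mid \theta)\; p(y \mid x, \theta^{t})\, dy$$

M-step: find

$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^{t})$$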

See online notes for more derivations and analysis!


How to write the Q function under the GMM setting

The likelihood of a data set is the product of all the sample likelihoods, so

$$Q(\Theta^{t+1}, \Theta^{t}) = \sum_{i=1}^{n} \sum_{k=1}^{K} p(y_i = k \mid x_i, \Theta^{t}) \log p(x_i, y_i = k \mid \Theta^{t+1})$$

where

$$p(y_i = k \mid x_i, \Theta^{t}) = \frac{P_k^{t}\, p(x_i \mid \theta_k^{t})}{\sum_{k'} P_{k'}^{t}\, p(x_i \mid \theta_{k'}^{t})}$$

$$p(x_i, y_i = k \mid \Theta^{t+1}) = p(y_i = k \mid \Theta^{t+1})\, p(x_i \mid y_i = k, \Theta^{t+1}) = P_k^{t+1}\, p(x_i \mid \theta_k^{t+1})$$

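The Q function specific for GMM is

$$Q(\Theta^{t+1}, \Theta^{t}) = \sum_{i=1}^{n} \sum_{k} \frac{P_k^{t}\, p(x_i \mid \theta_k^{t})}{\sum_{k'} P_{k'}^{t}\, p(x_i \mid \theta_{k'}^{t})} \log\left(P_k^{t+1}\, p(x_i \mid \theta_k^{t+1})\right)$$

Plugging in the definition of p(x | θ_k) and setting the derivatives with respect to the parameters to zero, we obtain the iteration procedure.

E step

$$p_{ik}^{t} = p(y_i = k \mid x_i, \Theta^{t}) = \frac{P_k^{t}\, p(x_i \mid \theta_k^{t})}{\sum_{k'} P_{k'}^{t}\, p(x_i \mid \theta_{k'}^{t})}$$

M step

$$P_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^{t}, \qquad
\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^{t}\, x_i}{\sum_{i=1}^{n} p_{ik}^{t}}, \qquad
\Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^{t}\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_{i=1}^{n} p_{ik}^{t}}$$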
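As a sketch of the derivative step, the updates for the mean and the proportions can be worked out as follows. The symbols follow the slides, with P_k for the cluster proportion and p_ik^t for the posterior from the E step; the Lagrange multiplier λ is introduced here to enforce the constraint that the proportions sum to 1.

```latex
% Mean: only the Gaussian log-density depends on \mu_k^{t+1}.
\frac{\partial Q}{\partial \mu_k^{t+1}}
  = \sum_{i=1}^{n} p_{ik}^{t}\, (\Sigma_k^{t+1})^{-1} (x_i - \mu_k^{t+1}) = 0
\;\Longrightarrow\;
\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^{t}\, x_i}{\sum_{i=1}^{n} p_{ik}^{t}}

% Proportions: maximize \sum_i \sum_k p_{ik}^{t} \log P_k^{t+1}
% subject to \sum_k P_k^{t+1} = 1.
\frac{\partial}{\partial P_k^{t+1}}
  \Big[ \sum_{i=1}^{n} \sum_{k'=1}^{K} p_{ik'}^{t} \log P_{k'}^{t+1}
        + \lambda \Big( 1 - \sum_{k'=1}^{K} P_{k'}^{t+1} \Big) \Big]
  = \frac{\sum_{i=1}^{n} p_{ik}^{t}}{P_k^{t+1}} - \lambda = 0
\;\Longrightarrow\;
P_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^{t}
```

The last step uses λ = n, which follows from summing the stationarity condition over k and using the fact that the posteriors p_ik^t sum to 1 over k for each data point. The covariance update follows from the same procedure applied to Σ_k, evaluated at the updated mean.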