
A short tutorial on Gaussian Mixture Models

CRV 2010

By: Mohand Saïd Allili, Université du Québec en Outaouais


Plan

- Introduction

- What is a Gaussian mixture model?

- The Expectation-Maximization algorithm

- Some issues

- Applications of GMM in computer vision


Introduction

A few basic equalities that are often used:

• Conditional probabilities:

$p(A \cap B) = p(A \mid B)\, p(B)$

• Bayes rule:

$p(B \mid A) = \frac{p(A \mid B)\, p(B)}{p(A)}$

• If $\Omega = \bigcup_{i=1}^{K} B_i$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$, then:

$p(A) = \sum_{i=1}^{K} p(A \cap B_i)$


Introduction

Carl Friedrich Gauss introduced the normal distribution in 1809 as a way to rationalize the method of least squares.


What is a Gaussian?

For d dimensions, the Gaussian distribution of a vector $x = (x_1, x_2, \ldots, x_d)^T$ is defined by:

$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$

where $\mu$ is the mean and $\Sigma$ is the covariance matrix of the Gaussian.

Example: $\mu = (0, 0)^T, \quad \Sigma = \begin{pmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{pmatrix}$
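As a quick illustration, here is a minimal NumPy sketch of this density; the function name gaussian_pdf and the test point are ours, not from the tutorial:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the d-dimensional Gaussian density N(x | mu, Sigma)."""
    d = mu.shape[0]
    diff = x - mu
    # Normalization: (2*pi)^(d/2) * |Sigma|^(1/2)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    # Quadratic form: (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.solve(sigma, diff)
    return np.exp(-0.5 * quad) / norm

# The example above: mu = (0, 0)^T with the 2x2 covariance matrix
mu = np.array([0.0, 0.0])
sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])
print(gaussian_pdf(np.array([0.5, -0.5]), mu, sigma))
```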


What is a Gaussian mixture model?

Examples: (figures showing mixtures for d = 1 and d = 2)

The probability density of a mixture of K Gaussians is:

$p(x) = \sum_{j=1}^{K} w_j \cdot N(x \mid \mu_j, \Sigma_j)$

where $w_j$ is the prior probability (weight) of the j-th Gaussian, with

$\sum_{j=1}^{K} w_j = 1 \quad \text{and} \quad 0 \le w_j \le 1$
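To make this concrete, a small sketch evaluating a mixture density with SciPy; the component values are toy numbers we chose for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """p(x) = sum_j w_j * N(x | mu_j, Sigma_j)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Toy 1-D mixture with K = 2 components
weights = [0.3, 0.7]   # sum to 1, each in [0, 1]
means = [0.0, 4.0]
covs = [1.0, 2.0]
print(gmm_pdf(1.5, weights, means, covs))
```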


What is a Gaussian mixture model?

• Problem: given a set $X = \{x_1, x_2, \ldots, x_N\}$ of data drawn from an unknown distribution (probably a GMM), estimate the parameters $\theta$ of the GMM model that fits the data.

• Solution: maximize the likelihood $p(X \mid \theta)$ of the data with regard to the model parameters:

$\theta^* = \arg\max_\theta p(X \mid \theta) = \arg\max_\theta \prod_{i=1}^{N} p(x_i \mid \theta)$
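In practice one maximizes the log-likelihood, which turns the product over points into a sum; a minimal sketch (names are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, weights, means, covs):
    """log p(X | theta) = sum_i log sum_j w_j N(x_i | mu_j, Sigma_j)."""
    return sum(
        np.log(sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
                   for w, m, c in zip(weights, means, covs)))
        for x in X)
```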


The Expectation-Maximization algorithm

• One of the most popular approaches to maximize the likelihood is to use the Expectation-Maximization (EM) algorithm.

• Basic ideas of the EM algorithm:

- Introduce a hidden variable such that its knowledge would simplify the maximization of the likelihood.

- At each iteration:

• E-Step: estimate the distribution of the hidden variable given the data and the current value of the parameters.

• M-Step: maximize the joint distribution of the data and the hidden variable.


The EM for the GMM (graphical view 1)

Hidden variable: for each point, which Gaussian generated it?


The EM for the GMM (graphical view 2)

E-Step: for each point, estimate the probability that each Gaussian generated it.


The EM for the GMM (graphical view 3)

M-Step: modify the parameters according to the hidden variable to maximize the likelihood of the data (and the hidden variable).


General formulation of the EM

(In what follows, the hidden variable is named Z.)

Consider the following "auxiliary" function:

$Q(\theta, \theta^t) = E_Z\left[ \log p(X, Z \mid \theta) \mid X, \theta^t \right]$

It can be shown that the iteration

$\theta^{t+1} = \arg\max_\theta Q(\theta, \theta^t)$

never decreases the likelihood of the data.


Proof of EM convergence

$\begin{aligned}
Q(\theta, \theta^t) &= E_Z\left[ \log p(X, Z \mid \theta) \mid X, \theta^t \right] \\
&= \sum_z p(z \mid X, \theta^t) \log p(X, z \mid \theta) \\
&= \sum_z p(z \mid X, \theta^t) \log\left[ p(z \mid X, \theta)\, p(X \mid \theta) \right] \\
&= \sum_z p(z \mid X, \theta^t) \log p(z \mid X, \theta) + \log p(X \mid \theta) \qquad (1)
\end{aligned}$


Proof of EM convergence

If $\theta = \theta^t$, we have:

$Q(\theta^t, \theta^t) = \sum_z p(z \mid X, \theta^t) \log p(z \mid X, \theta^t) + \log p(X \mid \theta^t) \qquad (2)$

From (1) and (2), we have:

$\log p(X \mid \theta) - \log p(X \mid \theta^t) = Q(\theta, \theta^t) - Q(\theta^t, \theta^t) + \underbrace{\sum_z p(z \mid X, \theta^t) \log\left[ \frac{p(z \mid X, \theta^t)}{p(z \mid X, \theta)} \right]}_{\ge\, 0}$

The last term is a Kullback-Leibler divergence, hence non-negative.

If $Q(\theta, \theta^t)$ increases, then the likelihood $p(X \mid \theta)$ increases.

Conclusion: the maximum of $Q(\theta, \theta^t)$ is the same as the maximum of the likelihood.


EM for the GMM

- Note that if Z is observed, then the parameters can be estimated Gaussian by Gaussian. Unfortunately, Z is not always known.

- Let us write the mixture of Gaussians for a data point $x_i$, $i = 1, \ldots, N$:

$p(x_i) = \sum_{j=1}^{K} w_j \cdot N(x_i \mid \mu_j, \Sigma_j)$

- We introduce the indicator variable:

$z_{ij} = \begin{cases} 1 & \text{if Gaussian } j \text{ emitted } x_i, \\ 0 & \text{otherwise.} \end{cases}$


EM for the GMM

We can now write the joint likelihood of X and Z as follows:

$L(X, Z, \theta) = p(X, Z \mid \theta) = \prod_{i=1}^{N} \prod_{j=1}^{K} \left[ p(x_i, j \mid \theta) \right]^{z_{ij}} = \prod_{i=1}^{N} \prod_{j=1}^{K} \left[ p(x_i \mid j, \theta) \right]^{z_{ij}} \left[ p(j \mid \theta) \right]^{z_{ij}}$

Using the "log" function gives:

$\log p(X, Z \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log\left[ p(x_i \mid j, \theta) \right] + z_{ij} \log\left[ p(j \mid \theta) \right]$


EM for the GMM

The auxiliary function is given by:

$\begin{aligned}
Q(\theta, \theta^t) &= E_Z\left[ \log p(X, Z \mid \theta) \mid \theta^t \right] \\
&= E_Z\left[ \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta) \,\Big|\, \theta^t \right] \\
&= \sum_{i=1}^{N} \sum_{j=1}^{K} E_Z\left[ z_{ij} \mid \theta^t \right] \log p(x_i \mid j, \theta) + E_Z\left[ z_{ij} \mid \theta^t \right] \log p(j \mid \theta)
\end{aligned}$

We then have the E-Step given by:

$E_Z\left[ z_{ij} \mid \theta^t \right] = 1 \times p(z_{ij} = 1 \mid \theta^t) + 0 \times p(z_{ij} = 0 \mid \theta^t) = p(j \mid x_i, \theta^t) = \frac{p(x_i \mid j, \theta^t)\, p(j \mid \theta^t)}{p(x_i \mid \theta^t)}$

(Posterior distribution)
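A minimal NumPy/SciPy sketch of this E-Step, computing the matrix of posteriors (often called responsibilities) $p(j \mid x_i, \theta^t)$; all names are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Return an (N, K) matrix R with R[i, j] = p(j | x_i, theta^t)."""
    N, K = X.shape[0], len(weights)
    R = np.empty((N, K))
    for j in range(K):
        # Bayes-rule numerator: p(x_i | j, theta^t) * p(j | theta^t)
        R[:, j] = weights[j] * multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
    # Divide by p(x_i | theta^t): each row then sums to 1
    R /= R.sum(axis=1, keepdims=True)
    return R
```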


EM for the GMM

And the M-Step is given by:

$\frac{\partial Q(\theta, \theta^t)}{\partial \theta} = 0$

where $\theta = \{ (w_j, \mu_j, \Sigma_j),\ j = 1, \ldots, K \}$ and $\sum_{j=1}^{K} w_j = 1$.

After straightforward manipulations, we obtain:

$\mu_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$

$\Sigma_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$
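These two updates translate directly into code; a sketch assuming the responsibility matrix R from the E-Step sketch above:

```python
import numpy as np

def m_step_mean_cov(X, R):
    """Update mu_j and Sigma_j from responsibilities R[i, j] = p(j | x_i, theta^t)."""
    N, d = X.shape
    K = R.shape[1]
    Nk = R.sum(axis=0)                      # sum_i p(j | x_i, theta^t), shape (K,)
    means = (R.T @ X) / Nk[:, None]         # responsibility-weighted means, (K, d)
    covs = np.empty((K, d, d))
    for j in range(K):
        diff = X - means[j]
        # sum_i p(j | x_i)(x_i - mu_j)(x_i - mu_j)^T, normalized by Nk[j]
        covs[j] = (R[:, j, None] * diff).T @ diff / Nk[j]
    return means, covs
```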


EM for the GMM

Incorporating the constraint $\sum_{j=1}^{K} w_j = 1$ using a Lagrange multiplier gives:

$J(\theta, \theta^t) = Q(\theta, \theta^t) - \lambda \left( \sum_{j=1}^{K} w_j - 1 \right)$

Then:

$\frac{\partial J(\theta, \theta^t)}{\partial w_j} = \frac{\partial Q(\theta, \theta^t)}{\partial w_j} - \lambda = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}{w_j} - \lambda = 0$

which gives:

$w_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}{\lambda} \qquad (3)$


EM for the GMM

Also, we have:

$\frac{\partial J(\theta, \theta^t)}{\partial \lambda} = \sum_{j=1}^{K} w_j - 1 = 0 \qquad (4)$

From (3) and (4), we obtain:

$\frac{1}{\lambda} \sum_{j=1}^{K} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) = 1$

It follows that $\lambda = N$, and finally:

$w_j = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \theta^t)$
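Putting the weight, mean, and covariance updates together gives one full EM iteration; a sketch that assumes the e_step and m_step_mean_cov helpers sketched earlier (a fixed iteration count stands in for a proper convergence test):

```python
def em_gmm(X, weights, means, covs, n_iter=100):
    """Alternate the E-Step and M-Step updates derived above."""
    N = X.shape[0]
    for _ in range(n_iter):
        R = e_step(X, weights, means, covs)      # E-Step: responsibilities
        means, covs = m_step_mean_cov(X, R)      # M-Step: mu_j and Sigma_j
        weights = R.sum(axis=0) / N              # M-Step: w_j = (1/N) sum_i p(j | x_i)
    return weights, means, covs
```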


Some issues

1- Initialization:

- EM is an iterative algorithm that is very sensitive to initial conditions: start from trash, end up with trash.

- Usually, we use K-Means to get a good initialization.

2- Number of Gaussians:

- Use information-theoretic criteria to obtain the optimal K. Ex:

$MDL = -\log\left( L(X, Z, \theta) \right) + \frac{1}{2}\left[ (K - 1) + K\left( D + \frac{D(D+1)}{2} \right) \right] \log(N)$

- Other criteria: AIC, BIC, MML, etc.
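As an illustration of criterion-based selection of K, a sketch using BIC as implemented by scikit-learn's GaussianMixture (the library is an assumption of this sketch; MDL could be computed analogously from the fitted log-likelihood):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k(X, k_max=10):
    """Fit GMMs for K = 1..k_max and keep the K with the lowest BIC."""
    best_k, best_bic = 1, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=5).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```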


Some issues

3- Simplification of the covariance matrices:

Case 1: Spherical covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_j^2, \sigma_j^2, \ldots, \sigma_j^2) = \sigma_j^2 I$

- Less precise. Very efficient to compute.

Case 2: Diagonal covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \sigma_{j2}^2, \ldots, \sigma_{jd}^2)$

- More precise. Efficient to compute.


Some issues

3- Simplification of the covariance matrices:

Case 3: Full covariance matrix

$\Sigma_j = \begin{pmatrix}
\sigma_{j1}^2 & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_d) \\
\mathrm{cov}(x_2, x_1) & \sigma_{j2}^2 & \cdots & \mathrm{cov}(x_2, x_d) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(x_d, x_1) & \mathrm{cov}(x_d, x_2) & \cdots & \sigma_{jd}^2
\end{pmatrix}$

- Very precise. Less efficient to compute.
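For reference, scikit-learn's GaussianMixture exposes exactly this precision/speed trade-off through its covariance_type parameter; a short illustrative sketch on placeholder data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(500, 2)  # placeholder data

# 'spherical' -> sigma_j^2 I, 'diag' -> diagonal Sigma_j, 'full' -> full Sigma_j
for cov_type in ("spherical", "diag", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type).fit(X)
    print(cov_type, gmm.covariances_.shape)
```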


Applications of GMM in computer vision

1- Image segmentation:

Each pixel is represented by its color vector $X = (R, G, B)^T$.
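A minimal sketch of this idea: fit a GMM to the pixel colors and label every pixel with its most likely Gaussian (scikit-learn is assumed; the number of segments is an arbitrary choice):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(image, n_segments=3):
    """image: (H, W, 3) RGB array -> (H, W) array of segment labels."""
    H, W, _ = image.shape
    X = image.reshape(-1, 3).astype(float)   # one X = (R, G, B)^T per pixel
    gmm = GaussianMixture(n_components=n_segments).fit(X)
    labels = gmm.predict(X)                  # most likely Gaussian for each pixel
    return labels.reshape(H, W)
```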


Applications of GMM in computer vision

2- Object tracking: knowing the moving object's distribution in the first frame, we can localize the object in the next frames by tracking its distribution.


Some references

1. Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.

2. Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley Series in Probability and Statistics, 2000.

3. Samy Bengio. Statistical Machine Learning from Data: Gaussian Mixture Models. http://bengio.abracadoudou.com/lectures/gmm.pdf

4. T. Hastie et al. The Elements of Statistical Learning. Springer, 2009.


Questions?
