Gaussian Mixture Model classification of Multi-Color Fluorescence In Situ Hybridization (M-FISH) Images

Amin Fazel, 2006
Department of Computer Science and Electrical Engineering, University of Missouri – Kansas City


Page 1: Amin Fazel 2006

Gaussian Mixture Model classification of
Multi-Color Fluorescence In Situ
Hybridization (M-FISH) Images

Amin Fazel
2006

Department of Computer Science and Electrical Engineering
University of Missouri – Kansas City

Page 2: Amin Fazel 2006


Motivation and Goals

• Chromosomes store genetic information

• Chromosome images can indicate genetic disease, cancer, radiation damage, etc.

• Research goals:
  – Locate and classify each chromosome in an image
  – Locate chromosome abnormalities

Page 3: Amin Fazel 2006


Karyotyping

• 46 human chromosomes form 24 types
  – 22 different pairs
  – 2 sex chromosomes, X and Y

• Grouped and ordered by length

[Figures: banding patterns; karyotype]

Page 4: Amin Fazel 2006


Multi-spectral Chromosome Imaging

• Multiplex Fluorescence In-Situ Hybridization (M-FISH) [1996]

• Five color dyes (fluorophores)
• Each human chromosome type absorbs a unique combination of the dyes
• 32 (2^5) possible combinations of dyes distinguish the 24 human chromosome types

[Figure: M-FISH image of a healthy male]

Page 5: Amin Fazel 2006


M-FISH Images

• 6th dye (DAPI) binds to all chromosomes

[Figures: DAPI channel (6th dye); M-FISH image (5 dyes)]

Page 6: Amin Fazel 2006


M-FISH Images

• Images of each dye obtained with appropriate optical filter
• Each pixel a six-dimensional vector
• Each vector element gives contribution of a dye at the pixel
• Chromosomal origin distinguishable at a single pixel (unless overlapping)
• Unnecessary to estimate length, relative centromere position, or banding pattern

Page 7: Amin Fazel 2006


Bayesian Classification

• Based on probability theory
  – A feature vector is denoted as x = [x_1, x_2, ..., x_D]^T
  – D is the dimension of the vector
• The probability that a feature vector x belongs to class w_k is p(w_k|x); this posterior probability can be computed via

$$p(w_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid c_k)\,P(c_k)}{p(\mathbf{x})}$$

and

$$p(\mathbf{x}) = \sum_{i=1}^{k} p(\mathbf{x} \mid c_i)\,P(c_i)$$

where $p(\mathbf{x} \mid c_k)$ is the probability density function of class $w_k$ and $P(c_k)$ is the prior probability.
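As a minimal sketch of this rule, assuming the class-conditional densities are available as callables and the priors as an array (the names below are illustrative, not from the slides):

```python
import numpy as np

def posteriors(x, class_pdfs, priors):
    """Bayes rule: p(w_k|x) is proportional to p(x|c_k) P(c_k), normalized over classes.

    class_pdfs : list of callables, each returning p(x|c_k) for a feature vector x
    priors     : array of prior probabilities P(c_k)
    """
    likelihoods = np.array([pdf(x) for pdf in class_pdfs])
    joint = likelihoods * np.asarray(priors)   # p(x|c_k) P(c_k)
    return joint / joint.sum()                 # divide by p(x) = sum_i p(x|c_i) P(c_i)
```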

Page 8: Amin Fazel 2006


Gaussian Probability Density Function

• In the D-dimensional space

$$N(\mathbf{x};\,\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}$$

• μ is the mean vector
• Σ is the covariance matrix
  – In the Gaussian distribution lies the assumption that the class model is truly a model of one basic class
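A minimal sketch of evaluating this density with NumPy, assuming a full, well-conditioned covariance matrix:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian N(x; mu, sigma) in D dimensions."""
    D = mu.shape[0]
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(sigma))
    mahal = diff @ np.linalg.solve(sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm * np.exp(-0.5 * mahal)
```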

Page 9: Amin Fazel 2006


Gaussian mixture model GMM

• GMM is a set of several Gaussians which try to represent groups / clusters of data
  – therefore represent different subclasses inside one class
  – the PDF is defined as a weighted sum of Gaussians

$$p(\mathbf{x};\,\Theta) = \sum_{c=1}^{C} \pi_c\, N(\mathbf{x};\,\boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)$$

where $\pi_c$ are the mixture weights.
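A minimal sketch of this weighted sum, reusing the gaussian_pdf helper sketched above (weights, means, and covariances are assumed to be given):

```python
def gmm_pdf(x, weights, mus, sigmas):
    """p(x) = sum_c pi_c * N(x; mu_c, Sigma_c) for one feature vector x."""
    return sum(w * gaussian_pdf(x, mu, sigma)
               for w, mu, sigma in zip(weights, mus, sigmas))
```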

Page 10: Amin Fazel 2006


Gaussian Mixture Models

Equations for GMMs:

1-D case:

$$p(x;\,\Theta) = \sum_{c=1}^{C} \pi_c\, N(x,\,\mu_c,\sigma_c)$$

$$N(x,\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

multi-dimensional case: $\mu$ becomes a vector $\boldsymbol{\mu}$, $\sigma$ becomes a covariance matrix $\boldsymbol{\Sigma}$:

$$N(\mathbf{x},\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}$$

assume $\boldsymbol{\Sigma}$ is a diagonal matrix:

$$|\boldsymbol{\Sigma}| = \prod_{i=1}^{D} \sigma_i^2, \qquad
\boldsymbol{\Sigma}^{-1} = \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 & 0 \\ 0 & \frac{1}{\sigma_2^2} & 0 \\ 0 & 0 & \frac{1}{\sigma_3^2} \end{pmatrix}$$
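A minimal sketch of this diagonal-covariance simplification, with the variances stored as a vector (an illustrative helper, not part of the original slides):

```python
import numpy as np

def gaussian_pdf_diag(x, mu, var):
    """N(x; mu, Sigma) with Sigma = diag(var): no matrix inverse is needed."""
    D = mu.shape[0]
    norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.prod(var))   # |Sigma| = prod of variances
    mahal = np.sum((x - mu) ** 2 / var)                      # (x - mu)^T Sigma^{-1} (x - mu)
    return norm * np.exp(-0.5 * mahal)
```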

Page 11: Amin Fazel 2006


GMM

• A Gaussian Mixture Model (GMM) is characterized by
  • the number of components,
  • the means and covariance matrices of the Gaussian components,
  • the weight (height) of each component

Page 12: Amin Fazel 2006


GMM

• The GMM has the same dimension as the feature space (here a 6-dimensional GMM)

• for visualization purposes, here are 2-dimensional GMMs:

[Figure: 2-D GMM likelihood surfaces over axes value1 and value2]

Page 13: Amin Fazel 2006


GMM

• These parameters are tuned using an iterative procedure called Expectation Maximization (EM)

• EM algorithm: iteratively updates the distribution of each Gaussian component and the conditional probabilities to increase the likelihood of the data.

Page 14: Amin Fazel 2006


GMM Training Flow Chart (1)

• Initialize the Gaussian means μ_i using the K-means clustering algorithm
• Initialize the covariance matrices from the distances to the nearest cluster
• Initialize the weights to 1 / C so that all Gaussians are equally likely

• K-means clustering
  1. Initialization: random or maximum-distance selection of code words
  2. Search: for each training vector, find the closest code word and assign the training vector to that cell
  3. Centroid update: for each cell, compute the centroid of that cell; the new code word is the centroid
  4. Repeat (2)-(3) until the average distance falls below a threshold
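A minimal sketch of this initialization, assuming the training vectors are rows of a NumPy array; the helper names are illustrative, and the covariances here are simply taken from each cluster's sample covariance rather than nearest-cluster distances:

```python
import numpy as np

def kmeans(X, C, n_iter=50, seed=0):
    """Plain K-means: returns cluster centers and the assignment of each vector.
    Runs a fixed number of iterations; empty clusters are not handled."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), C, replace=False)]        # 1. random initialization
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                             # 2. assign to closest code word
        centers = np.array([X[labels == c].mean(axis=0)       # 3. centroid update
                            for c in range(C)])
    return centers, labels

def init_gmm(X, C):
    """Initialize GMM parameters from K-means clusters."""
    mus, labels = kmeans(X, C)
    sigmas = [np.cov(X[labels == c].T) + 1e-6 * np.eye(X.shape[1]) for c in range(C)]
    weights = np.full(C, 1.0 / C)                             # all Gaussians equally likely
    return weights, mus, sigmas
```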

Page 15: Amin Fazel 2006


GMM Training Flow Chart (2)

E step: Compute the conditional expectation of the complete log-likelihood, i.e. evaluate the posterior probabilities that relate each cluster to each data point, assuming the current cluster parameters are correct.

M step: Find the cluster parameters that maximize the likelihood of the data, assuming that the current data distribution (the posteriors from the E step) is correct.

E step (posteriors given the current parameters $\Theta^{(i)}$):

$$w_{n,c} = \frac{\pi_c^{(i)}\, p(\mathbf{x}_n \mid c;\,\Theta^{(i)})}{\sum_{j=1}^{C} \pi_j^{(i)}\, p(\mathbf{x}_n \mid j;\,\Theta^{(i)})}$$

M step (re-estimate weights, means, and covariances from the posteriors):

$$\pi_c^{(i+1)} = \frac{1}{N}\sum_{n=1}^{N} w_{n,c}$$

$$\boldsymbol{\mu}_c^{(i+1)} = \frac{\sum_{n=1}^{N} w_{n,c}\,\mathbf{x}_n}{\sum_{n=1}^{N} w_{n,c}}$$

$$\boldsymbol{\Sigma}_c^{(i+1)} = \frac{\sum_{n=1}^{N} w_{n,c}\,(\mathbf{x}_n - \boldsymbol{\mu}_c^{(i+1)})(\mathbf{x}_n - \boldsymbol{\mu}_c^{(i+1)})^T}{\sum_{n=1}^{N} w_{n,c}}$$
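A minimal sketch of one EM iteration under these updates, reusing the gaussian_pdf helper from above (full covariances; numerical safeguards omitted):

```python
import numpy as np

def em_step(X, weights, mus, sigmas):
    """One EM iteration for a GMM; X has shape (N, D)."""
    N, D = X.shape
    C = len(weights)

    # E step: w[n, c] = pi_c N(x_n; mu_c, Sigma_c) / sum_j pi_j N(x_n; mu_j, Sigma_j)
    w = np.array([[weights[c] * gaussian_pdf(x, mus[c], sigmas[c]) for c in range(C)]
                  for x in X])
    w /= w.sum(axis=1, keepdims=True)

    # M step: re-estimate weights, means, and covariances from the posteriors
    Nc = w.sum(axis=0)                        # effective count per component
    new_weights = Nc / N
    new_mus = (w.T @ X) / Nc[:, None]
    new_sigmas = []
    for c in range(C):
        diff = X - new_mus[c]
        new_sigmas.append((w[:, c, None] * diff).T @ diff / Nc[c])
    return new_weights, new_mus, new_sigmas, w
```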

Page 16: Amin Fazel 2006


GMM Training Flow Chart (3)

• Recompute w_{n,c} using the new weights, means, and covariances. Stop training if
  – the change w^{(i+1)}_{n,c} - w^{(i)}_{n,c} falls below a threshold, or
  – the number of epochs reaches the specified value.
• Otherwise, continue the iterative updates.
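Putting the pieces together, a minimal training loop with this stopping rule might look like the following sketch (the threshold and epoch cap are illustrative values, not from the slides):

```python
import numpy as np

def train_gmm(X, C, max_epochs=100, threshold=1e-4):
    """Initialize with K-means, then iterate EM until the posteriors stop changing."""
    weights, mus, sigmas = init_gmm(X, C)
    w_old = None
    for epoch in range(max_epochs):
        weights, mus, sigmas, w = em_step(X, weights, mus, sigmas)
        if w_old is not None and np.abs(w - w_old).max() < threshold:
            break                              # posteriors have converged
        w_old = w
    return weights, mus, sigmas
```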

Page 17: Amin Fazel 2006


GMM Test Flow Chart

• Present each input pattern x and compute the confidence for each class k:

$$\text{confidence}_k = P(c_k)\, p(\mathbf{x} \mid c_k)$$

• where $P(c_k)$ is the prior probability of class $c_k$, estimated by counting the number of training patterns
• Classify pattern x as the class with the highest confidence.
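A minimal sketch of this decision rule, assuming one GMM has been trained per chromosome class (class_gmms and class_priors are illustrative names, not from the slides):

```python
import numpy as np

def classify(x, class_gmms, class_priors):
    """Pick the class k maximizing P(c_k) * p(x | c_k), with p(x|c_k) a per-class GMM."""
    confidences = [prior * gmm_pdf(x, *gmm)        # gmm = (weights, mus, sigmas)
                   for gmm, prior in zip(class_gmms, class_priors)]
    return int(np.argmax(confidences))
```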

Page 18: Amin Fazel 2006


Results

[Figure: training input data]

Page 19: Amin Fazel 2006


Results

[Figures: classification correctness with one Gaussian vs. two Gaussians per class, shown against the true labels]

Page 20: Amin Fazel 2006


Thanks for your patience!