
High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length

by Nizar Bouguila and Djemel Ziou

Discussion led by Qi An

Duke University Machine Learning Group

Outline

• Introduction

• The generalized Dirichlet mixture

• The minimum message length (MML) criterion

• Fisher information matrix and priors

• Density estimation and model selection

• Experimental results

• Conclusions

Introduction

• How to determine the number of components in a mixture model for high-dimensional data?

• Implementation of model selection criteria

  – Stochastic and resampling (slow)

    • Fully Bayesian way

  – Deterministic (fast)

    • Approximate Bayesian criteria

    • Information/coding theory concepts

      – Minimum message length (MML)

      – Akaike's information criterion (AIC)

The generalized Dirichlet distribution

• A d-dimensional generalized Dirichlet distribution is defined as

p(\mathbf{X} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{i=1}^{d} \frac{\Gamma(\alpha_i + \beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\, X_i^{\alpha_i - 1} \Big(1 - \sum_{j=1}^{i} X_j\Big)^{\gamma_i}

where 0 < X_i < 1 and \sum_{i=1}^{d} X_i < 1, with \alpha_i > 0, \beta_i > 0, \gamma_i = \beta_i - \alpha_{i+1} - \beta_{i+1} for i = 1, \dots, d-1, and \gamma_d = \beta_d - 1.

It reduces to the Dirichlet distribution when \beta_i = \alpha_{i+1} + \beta_{i+1}.
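To make the definition concrete, here is a minimal numerical sketch of the GDD log-density, assuming NumPy/SciPy; the function name gdd_logpdf and its interface are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import gammaln

def gdd_logpdf(x, alpha, beta):
    """Log-density of the d-dimensional generalized Dirichlet
    distribution as defined above (illustrative sketch).

    x     : d proportions with 0 < x_i and sum(x) < 1
    alpha : d positive shape parameters alpha_i
    beta  : d positive shape parameters beta_i
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # gamma_i = beta_i - alpha_{i+1} - beta_{i+1}; gamma_d = beta_d - 1
    gamma = beta.copy()
    gamma[:-1] -= alpha[1:] + beta[1:]
    gamma[-1] -= 1.0
    # Normalizing constant: prod_i Gamma(a_i+b_i) / (Gamma(a_i) Gamma(b_i))
    log_norm = np.sum(gammaln(alpha + beta) - gammaln(alpha) - gammaln(beta))
    # Partial tails 1 - x_1 - ... - x_i for i = 1..d
    tails = 1.0 - np.cumsum(x)
    return log_norm + np.sum((alpha - 1.0) * np.log(x) + gamma * np.log(tails))
```

Note that with \beta_i = \alpha_{i+1} + \beta_{i+1} every \gamma_i vanishes for i < d, which is exactly the Dirichlet reduction stated above.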

The generalized Dirichlet distribution

For the generalized Dirichlet distribution, the mean of each component is

E[X_i] = \frac{\alpha_i}{\alpha_i + \beta_i} \prod_{k=1}^{i-1} \frac{\beta_k}{\alpha_k + \beta_k}

The GDD has a more general covariance structure than the Dirichlet distribution (DD), and it is conjugate to the multinomial distribution.
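The mean formula above follows from the Connor-Mosimann stick-breaking view of the GDD (X_i = Z_i \prod_{k<i}(1 - Z_k) with independent Z_i ~ Beta(\alpha_i, \beta_i)); here is a small sketch of it in code, with gdd_mean a hypothetical helper name.

```python
import numpy as np

def gdd_mean(alpha, beta):
    """E[X_i] = alpha_i/(alpha_i+beta_i) * prod_{k<i} beta_k/(alpha_k+beta_k),
    computed via cumulative products of the stick-breaking fractions."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    take = alpha / (alpha + beta)    # expected fraction taken at step i
    keep = beta / (alpha + beta)     # expected fraction left over
    return take * np.concatenate(([1.0], np.cumprod(keep[:-1])))
```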

GDD vs. Gaussian

• The GDD has a smaller number of parameters to estimate, so the estimation can be more accurate.

• The GDD is defined on the support [0, 1] and can be extended to a compact support [A, B], which is more appropriate for the nature of bounded or proportional data.

Beta distribution:

p(u \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, u^{\alpha-1} (1-u)^{\beta-1}, \quad 0 < u < 1

Beta type-II distribution:

p(v \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, v^{\alpha-1} (1+v)^{-(\alpha+\beta)}, \quad v > 0

They are equal if we set u = v/(1+v).
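This equivalence is easy to verify numerically; the sketch below assumes SciPy, and beta2_pdf is an illustrative helper for the type-II density.

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import betaln

def beta2_pdf(v, a, b):
    """Beta type-II density: v^(a-1) * (1+v)^-(a+b) / B(a, b), v > 0."""
    return np.exp((a - 1.0) * np.log(v) - (a + b) * np.log1p(v) - betaln(a, b))

a, b = 2.5, 4.0
v = np.linspace(0.1, 5.0, 50)
u = v / (1.0 + v)                  # the change of variables u = v/(1+v)
# Matching the densities requires the Jacobian dv/du = (1 + v)^2
print(np.allclose(beta_dist.pdf(u, a, b), beta2_pdf(v, a, b) * (1.0 + v) ** 2))
```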

A GDD mixture model

A generalized Dirichlet mixture model with M components is

p(\mathbf{X} \mid \Theta) = \sum_{j=1}^{M} p(\mathbf{X} \mid \alpha_j)\, p_j, \quad p_j \ge 0, \quad \sum_{j=1}^{M} p_j = 1,

where each component density p(\mathbf{X} \mid \alpha_j) takes the form of the GDD.
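Reusing gdd_logpdf from the earlier sketch, the mixture log-density is a log-sum-exp over components; the parameter containers here are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def gdd_mixture_logpdf(x, weights, alphas, betas):
    """log p(X | Theta) = log sum_j p_j * p(X | alpha_j, beta_j),
    evaluated stably in log space."""
    comp = [np.log(w) + gdd_logpdf(x, a, b)
            for w, a, b in zip(weights, alphas, betas)]
    return logsumexp(comp)
```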

The MML criterion

• The message length is defined as minus the logarithm of the posterior probability.

• After placing an explicit prior h(\Theta) over the parameters, the message length for a mixture of distributions is given by

\mathrm{MessLen} \simeq \underbrace{-\log h(\Theta)}_{\text{prior}} \; \underbrace{-\,\log p(\mathcal{X} \mid \Theta)}_{\text{likelihood}} \;+\; \frac{1}{2} \underbrace{\log |F(\Theta)|}_{\text{Fisher information}} \;+\; \frac{N_p}{2}\Big(1 + \log \underbrace{\kappa_{N_p}}_{\text{optimal quantization constant}}\Big)

where N_p is the number of free parameters in the mixture.
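Transcribing this expression directly into code gives a short, hedged sketch; taking kappa = 1/12 (the one-dimensional optimal quantization constant) is an assumption made here for simplicity.

```python
import numpy as np

def message_length(log_prior, log_likelihood, log_det_fisher, n_params,
                   kappa=1.0 / 12.0):
    """Wallace-Freeman style message length:
    -log h(Theta) - log p(X|Theta) + 0.5*log|F| + (Np/2)(1 + log kappa)."""
    return (-log_prior - log_likelihood + 0.5 * log_det_fisher
            + 0.5 * n_params * (1.0 + np.log(kappa)))
```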

Fisher Information matrix

• The Fisher information matrix is the expected value of the Hessian of the negative log-likelihood,

F(\Theta) = -\,E\left[\frac{\partial^2 \log p(\mathcal{X} \mid \Theta)}{\partial \Theta \, \partial \Theta^T}\right]

where the expectation is taken with respect to p(\mathcal{X} \mid \Theta).
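The paper works with analytical expressions for F(\Theta); purely as an illustration of this definition, the matrix can also be approximated numerically, with both helper names below being hypothetical.

```python
import numpy as np

def numerical_hessian(f, theta, eps=1e-5):
    """Central finite-difference Hessian of a scalar function f at theta."""
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            def g(di, dj):
                t = np.array(theta, dtype=float)
                t[i] += di
                t[j] += dj
                return f(t)
            H[i, j] = (g(eps, eps) - g(eps, -eps)
                       - g(-eps, eps) + g(-eps, -eps)) / (4.0 * eps ** 2)
    return H

def fisher_information(samples, theta, neg_logpdf):
    """F(theta) ~ average over samples of the Hessian of -log p(x | theta)."""
    return np.mean([numerical_hessian(lambda t: neg_logpdf(x, t), theta)
                    for x in samples], axis=0)
```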

Prior distribution

• Assume independence between the different components, so the prior factorizes as

h(\Theta) = h(P) \prod_{j=1}^{M} h(\alpha_j)

– Mixture weights P: place a Dirichlet distribution on P with all parameters set to 1.

– GDD parameters \alpha_j: place a generalized Dirichlet distribution on each \alpha_j, likewise with all parameters set to 1.
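As a tiny concrete instance, the Dirichlet(1, ..., 1) prior on the weights is uniform over the (M-1)-simplex with constant density (M-1)!, so its log is simply log Gamma(M):

```python
from scipy.special import gammaln

def log_prior_weights(M):
    """log h(P) for a Dirichlet(1, ..., 1) prior on M mixing weights:
    the density is the constant (M-1)!, i.e. Gamma(M)."""
    return gammaln(M)
```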

Message length

• After obtaining the Fisher information and specifying the prior distributions, the message length can be expressed in closed form by substituting these terms into the MML formula above.

Estimation and selection algorithm

• The authors use an EM algorithm to estimate the mixture parameters.

• To overcome computational issues and the local-maxima problem, they implement a fairly sophisticated initialization algorithm.

• The whole algorithm is summarized on the next slide; a rough sketch of the overall selection loop is given below.
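As referenced above, here is a rough sketch of the overall selection loop; fit_em and message_length_of are hypothetical stand-ins for the paper's EM updates and the message-length expression.

```python
def select_model(data, M_max, fit_em, message_length_of):
    """Fit a GDD mixture by EM for each candidate number of components M
    and keep the model with the smallest MML message length."""
    best = None
    for M in range(1, M_max + 1):
        params = fit_em(data, M)                # EM estimate for M components
        mlen = message_length_of(data, params)  # MML criterion for this fit
        if best is None or mlen < best[0]:
            best = (mlen, M, params)
    return best   # (message length, selected M, fitted parameters)
```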

Experimental results

The correct numbers of mixture components are 5, 6, and 7, respectively.


• Web mining:

  – Train with multiple classes of labels

  – Use the trained model to predict the label of each testing sample

  – Use the top 200 word frequencies as features (see the preprocessing sketch below)
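Since the GDD is supported on proportion vectors summing to less than one, the word-count features presumably need normalizing first; this preprocessing sketch is an assumption, not the paper's exact recipe.

```python
import numpy as np

def to_proportions(counts, eps=1e-6):
    """Map top-word counts to strictly positive proportions with sum < 1.
    The eps smoothing and the +1 in the denominator are assumptions made
    here to keep the vector inside the GDD support."""
    counts = np.asarray(counts, dtype=float) + eps
    return counts / (counts.sum() + 1.0)
```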

Conclusions

• An MML-based criterion is proposed to select the number of components in generalized Dirichlet mixtures.

• The full dimensionality of the data is used.

• Generalized Dirichlet mixtures allow more modeling flexibility than mixtures of Gaussians.

• The results clearly indicate that the MML and LEC model selection methods outperform the other methods.