Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing
Valero Laparra
Jesús Malo
Gustavo Camps
INDEX
- What?
- Why?
- How?
- Conclusions
- Toolbox
What?
• Estimate multidimensional probability densities
• How the N-D data are distributed in the N-D space
• What to pay attention to! What is important in our data
Why?
• GENERIC OPTIMAL SOLUTIONS
How?
• PDF estimation from samples always assumes a model.
• HISTOGRAM: estimation without assuming a functional model.
How?
• X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]
How?
• Problem: estimating the number of bins. A common rule: Nbins = √Nsamples
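The √Nsamples rule above can be illustrated with a minimal NumPy sketch (not toolbox code; the sample data here is our own):

```python
import numpy as np

# Sketch: histogram density estimate using the Nbins = sqrt(Nsamples)
# rule from the slide.
rng = np.random.default_rng(0)
x = rng.normal(size=400)              # Nsamples = 400
nbins = int(np.sqrt(x.size))          # -> 20 bins
density, edges = np.histogram(x, bins=nbins, density=True)

# With density=True the histogram integrates to 1 over the binned range.
widths = np.diff(edges)
total = np.sum(density * widths)      # == 1 (up to floating point)
```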
How?
• Problem: “the curse of dimensionality”
- Nb_total = Nb_dim ^ N_dim
- If in 1-D we assume Ns = Nb²,
- then in Nd dimensions Ns = Nb^(2·Nd)
How?
• Problem: “the curse of dimensionality”: Nb_total = Nb_dimension ^ N_dimensions
e.g., assuming a minimum of Nb = 11 bins per dimension, we need Ns = 11^(2·Nd) samples:
- Nd = 1: Ns = 121 (968 bytes)
- Nd = 2: Ns = 14,641 (117,128 bytes)
- Nd = 3: Ns = 1,771,561 (14,172,488 bytes)
- Nd = 4: Ns = 214,358,881 (1,714,871,048 bytes)
- Nd = 5: Ns ≈ 25,937,000,000 — HELP, MEMORY!
- Nd = 6: Ns ≈ 3,138,400,000,000 — HELP, MEMORY!
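The numbers on this slide follow from simple arithmetic; a sketch that reproduces them (the 8 bytes per double is an assumption about the slide's byte counts):

```python
# Sketch: reproduce the slide's sample counts for Nb = 11 bins per
# dimension, Ns = Nb**(2*Nd), at 8 bytes per double precision value.
NB = 11
rows = []
for nd in range(1, 7):
    ns = NB ** (2 * nd)
    rows.append((nd, ns, ns * 8))
# rows[0] -> (1, 121, 968); memory explodes exponentially with Nd
</imports>```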
How?
• From P(x) to P(y) (Gaussian): which transform T achieves this?
How?
MATLAB, MATLAB, WHAT A WONDERFUL WORLD
Answer: GPCA
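As a rough illustration of the answer (not the toolbox code), one plausible GPCA-style iteration — rank-based marginal Gaussianization followed by a PCA rotation — can be sketched in NumPy. The data, iteration count, and the rank-based marginal map are our own choices here:

```python
import numpy as np
from statistics import NormalDist

def marginal_gaussianize(x):
    # Rank-based Gaussianization of each row (dimension) of x.
    n = x.shape[1]
    nd = NormalDist()
    out = np.empty_like(x, dtype=float)
    for d in range(x.shape[0]):
        ranks = x[d].argsort().argsort() + 1     # ranks 1..n
        u = ranks / (n + 1)                      # uniform in (0, 1)
        out[d] = [nd.inv_cdf(ui) for ui in u]    # Gaussian scores
    return out

def gpca_iteration(x):
    # One step: Gaussianize the marginals, then rotate with PCA.
    g = marginal_gaussianize(x)
    _, v = np.linalg.eigh(np.cov(g))             # PCA eigenvectors
    return v.T @ g

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.0], [1.0, 1.0]]) @ rng.exponential(size=(2, 500))
for _ in range(5):                               # iterate toward a Gaussian
    x = gpca_iteration(x)
```

After a few iterations the marginals are approximately standard normal; the rotation mixes the dimensions so that the next marginal Gaussianization also removes joint structure.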
How? Theoretical convergence proof
• Negentropy:
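The negentropy expression on this slide was an image and did not survive extraction. A standard definition consistent with the convergence argument (an assumption, not recovered from the slide):

```latex
% Negentropy: distance from the Gaussian with the same mean and covariance
J(\mathbf{x}) \;=\; h(\mathbf{x}_G) - h(\mathbf{x}) \;\ge\; 0,
\qquad J(\mathbf{x}) = 0 \iff \mathbf{x} \text{ is Gaussian}
```

Rotations (PCA) leave differential entropy, and hence J, unchanged, while each marginal Gaussianization removes the marginal non-Gaussianity; J is therefore non-increasing across iterations, which is the basis of the convergence proof.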
How?
OPEN ISSUE
How?
• Stop criterion:
NOTE THAT: the stop criterion amounts to measuring mutual information.
THE GAUSSIAN IS THE UNIQUE DISTRIBUTION WHOSE MARGINALS ARE BOTH GAUSSIAN AND INDEPENDENT,
so we stop when I(X_n) ≈ 0.
How? GPCA Inverse
NOTE THAT:
Synthesis
How? GPCA Jacobian
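The Jacobian formulas on this slide were images; a sketch of the standard change-of-variables expressions that GPCA_probability presumably relies on (the per-iteration factorization is our reading, not recovered from the slide):

```latex
% pdf of the data via the Gaussianizing transform T
p_{\mathbf{x}}(\mathbf{x}) \;=\;
\mathcal{N}\!\big(T(\mathbf{x});\,\mathbf{0},\mathbf{I}\big)\,
\big|\det \nabla T(\mathbf{x})\big|,
\qquad
\nabla T(\mathbf{x}) \;=\; \prod_{k} \mathbf{R}_k\,\mathbf{D}_k
```

where R_k is the k-th PCA rotation (|det R_k| = 1, since it is orthonormal) and D_k is diagonal with the derivatives of the k-th marginal Gaussianization, so det ∇T reduces to a product of scalar marginal derivatives — hence the "easy Jacobian".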
CONCLUSIONS
• The optimal solution of many problems involves knowledge of the data pdf.
• GPCA obtains a transform that converts any pdf into a Gaussian pdf.
• It has an easy inverse.
• It has an easy Jacobian.
• This transform can be used to compute the pdf of any data.
GPCA toolbox (Matlab): 3 examples
• PDF estimation
• Mutual Information Measures
• Synthesis
Wiki-page
Beta version
Basic toolbox
• [datT Trans] = GPCA(dat, Nit, Perc)
- dat = data matrix, [N_dimensions x N_samples]
  e.g. 100 samples from a 2-D Gaussian: dat = [2 x 100]
- Nit = number of iterations
- Perc = percentage by which the pdf range is extended
Basic toolbox
• Perc = percentage by which the pdf range is extended.
Basic toolbox
• [datT Trans] = auto_GPCA(dat)
• [datT] = apply_GPCA(dat,Trans)
• [dat] = inv_GPCA(datT,Trans)
• [Px pT detJ JJ] = GPCA_probability(x0,Trans)
Estimating PDF/manifold
• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX, Trans);
Estimating PDF/manifold
• PROBLEMS
- It does not always reach a Gaussian.
- Pdfs with clusters are more complicated.
- The Jacobian estimation is highly point-dependent.
- The derivative (in the Jacobian estimation) is much more irregular than the integral.
- The pdf has to be estimated for each point.
Measuring Mutual Information
• [datT Trans] = auto_GPCA(dat)
• MI = abs(min(cumsum(cat(1, Trans.I))));
Error = (Real MI − Estimated MI) / Real MI, averaged over 10 realizations:
N-dim   Pdf-1    Pdf-2    Pdf-3
3       0.0697   0.0787   0.0630
4       0.0150   0.0031   0.0048
5       0.0353   0.0297   0.0328
8       0.0313   0.0369   0.0372
10      0.0148   0.0145   0.0132
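The "Real MI" in the error metric above has a closed form when the reference data are Gaussian; a sketch of that reference value (the assumption that the benchmark pdfs are Gaussian-based is ours):

```python
import numpy as np

# Sketch: for a multivariate Gaussian with covariance Sigma, the total
# correlation (multi-information) is
#   I(X) = 0.5 * (sum_i log Sigma_ii - log det Sigma)
def gaussian_total_correlation(sigma):
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(sigma)))
                  - np.linalg.slogdet(sigma)[1])

# 2-D example with correlation rho: I = -0.5 * log(1 - rho**2)
rho = 0.9
sigma = np.array([[1.0, rho], [rho, 1.0]])
mi = gaussian_total_correlation(sigma)   # about 0.83 nats
```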
Measuring Mutual Information
• PROBLEMS
– Entropy estimators are not perfectly defined.
– More iterations means more accumulated error.
– The more complicated the pdf, the larger the error.
Synthesizing data
• [datT Trans] = auto_GPCA(dat)
• [dat2] = inv_GPCA(randn(Dim, Nsamples), Trans);
(Figure: forward transforms T1, T2 map the data to a Gaussian; Inv T2, Inv T1 map Gaussian samples back to the data domain.)
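The synthesis idea — draw Gaussian noise, invert the transform — can be sketched with a single invertible Gaussianization step in NumPy. This is a simplified stand-in: auto_GPCA/inv_GPCA are the actual Matlab routines, and everything below (function names, one iteration only, interpolation-based inverse) is our own construction:

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()

def fit_step(x):
    # Forward: rank-based marginal Gaussianization + PCA rotation,
    # storing what we need to invert it later.
    n = x.shape[1]
    sorted_x = np.sort(x, axis=1)                     # empirical quantiles
    u = np.arange(1, n + 1) / (n + 1)
    scores = np.array([nd.inv_cdf(ui) for ui in u])   # Gaussian scores
    g = np.empty_like(x, dtype=float)
    for d in range(x.shape[0]):
        g[d] = scores[x[d].argsort().argsort()]
    _, r = np.linalg.eigh(np.cov(g))
    return (sorted_x, u, r), r.T @ g

def invert_step(z, params):
    # Inverse: undo the rotation, then map Gaussian values back to the
    # data domain through the stored empirical quantiles.
    sorted_x, u, r = params
    g = r @ z
    x = np.empty_like(g)
    for d in range(g.shape[0]):
        cdf = np.array([nd.cdf(v) for v in g[d]])
        x[d] = np.interp(cdf, u, sorted_x[d])
    return x

rng = np.random.default_rng(1)
data = rng.exponential(size=(2, 1000))
params, _ = fit_step(data)
synth = invert_step(rng.standard_normal((2, 1000)), params)
# synth marginals resemble the exponential originals (mean near 1)
```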
Synthesizing data
• PROBLEMS
– It does not always reach a Gaussian.
– Small variations in the variance of the random data yield very different results.
– No information about features of the data in the transformed domain.
• Thanks for your time