Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing
Valero Laparra
Jesús Malo
Gustavo Camps
INDEX
- What?
- Why?
- How?
- Conclusions
- Toolbox
What?
• Estimate multidimensional probability densities
• How the N-D data are distributed in the N-D space
• What to pay attention to! What is important in our data
Why?
• GENERIC OPTIMAL SOLUTIONS
How?
• PDF estimation from samples always assumes a model.
• HISTOGRAM: estimation without assuming a functional model.
How?
• X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]
How?
• Problem: estimating the number of bins. A common rule: Nbins = √Nsamples
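The √Nsamples rule above can be illustrated with a minimal NumPy sketch (not toolbox code; the sample data here is our own):

```python
import numpy as np

# Sketch: histogram density estimate using the Nbins = sqrt(Nsamples)
# rule from the slide.
rng = np.random.default_rng(0)
x = rng.normal(size=400)              # Nsamples = 400
nbins = int(np.sqrt(x.size))          # -> 20 bins
density, edges = np.histogram(x, bins=nbins, density=True)

# With density=True the histogram integrates to 1 over the binned range.
widths = np.diff(edges)
total = np.sum(density * widths)      # == 1 (up to floating point)
```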
How?
• Problem: “the curse of dimensionality”
- Nb_total = Nb_dim ^ N_dim
- If in 1-D we assume Ns = Nb²,
- then in Nd dimensions Ns = Nb^(2·Nd)
How?
• Problem: “the curse of dimensionality”: Nb_total = Nb_dimension ^ N_dimensions
e.g., assuming a minimum of Nb = 11 bins per dimension, we need Ns = 11^(2·Nd) samples:
- Nd = 1: Ns = 121 (968 bytes)
- Nd = 2: Ns = 14,641 (117,128 bytes)
- Nd = 3: Ns = 1,771,561 (14,172,488 bytes)
- Nd = 4: Ns = 214,358,881 (1,714,871,048 bytes)
- Nd = 5: Ns ≈ 25,937,000,000 — HELP, MEMORY!
- Nd = 6: Ns ≈ 3,138,400,000,000 — HELP, MEMORY!
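The numbers on this slide follow from simple arithmetic; a sketch that reproduces them (the 8 bytes per double is an assumption about the slide's byte counts):

```python
# Sketch: reproduce the slide's sample counts for Nb = 11 bins per
# dimension, Ns = Nb**(2*Nd), at 8 bytes per double precision value.
NB = 11
rows = []
for nd in range(1, 7):
    ns = NB ** (2 * nd)
    rows.append((nd, ns, ns * 8))
# rows[0] -> (1, 121, 968); memory explodes exponentially with Nd
</imports>```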
How?
• From P(x) to P(y) (Gaussian): which transform T achieves this?
How?
MATLAB, MATLAB, WHAT A WONDERFUL WORLD
Answer: GPCA
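As a rough illustration of the answer (not the toolbox code), one plausible GPCA-style iteration — rank-based marginal Gaussianization followed by a PCA rotation — can be sketched in NumPy. The data, iteration count, and the rank-based marginal map are our own choices here:

```python
import numpy as np
from statistics import NormalDist

def marginal_gaussianize(x):
    # Rank-based Gaussianization of each row (dimension) of x.
    n = x.shape[1]
    nd = NormalDist()
    out = np.empty_like(x, dtype=float)
    for d in range(x.shape[0]):
        ranks = x[d].argsort().argsort() + 1     # ranks 1..n
        u = ranks / (n + 1)                      # uniform in (0, 1)
        out[d] = [nd.inv_cdf(ui) for ui in u]    # Gaussian scores
    return out

def gpca_iteration(x):
    # One step: Gaussianize the marginals, then rotate with PCA.
    g = marginal_gaussianize(x)
    _, v = np.linalg.eigh(np.cov(g))             # PCA eigenvectors
    return v.T @ g

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.0], [1.0, 1.0]]) @ rng.exponential(size=(2, 500))
for _ in range(5):                               # iterate toward a Gaussian
    x = gpca_iteration(x)
```

After a few iterations the marginals are approximately standard normal; the rotation mixes the dimensions so that the next marginal Gaussianization also removes joint structure.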
How? Theoretical convergence proof
• Negentropy:
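The negentropy expression on this slide was an image and did not survive extraction. A standard definition consistent with the convergence argument (an assumption, not recovered from the slide):

```latex
% Negentropy: distance from the Gaussian with the same mean and covariance
J(\mathbf{x}) \;=\; h(\mathbf{x}_G) - h(\mathbf{x}) \;\ge\; 0,
\qquad J(\mathbf{x}) = 0 \iff \mathbf{x} \text{ is Gaussian}
```

Rotations (PCA) leave differential entropy, and hence J, unchanged, while each marginal Gaussianization removes the marginal non-Gaussianity; J is therefore non-increasing across iterations, which is the basis of the convergence proof.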
How?
OPEN ISSUE
How?
• Stop criterion:
NOTE THAT: the stop criterion amounts to measuring mutual information.
THE GAUSSIAN IS THE UNIQUE DISTRIBUTION WHOSE MARGINALS ARE BOTH GAUSSIAN AND INDEPENDENT,
so we stop when I(X_n) ≈ 0.
How? GPCA Inverse
NOTE THAT:
Synthesis
How? GPCA Jacobian
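The Jacobian formulas on this slide were images; a sketch of the standard change-of-variables expressions that GPCA_probability presumably relies on (the per-iteration factorization is our reading, not recovered from the slide):

```latex
% pdf of the data via the Gaussianizing transform T
p_{\mathbf{x}}(\mathbf{x}) \;=\;
\mathcal{N}\!\big(T(\mathbf{x});\,\mathbf{0},\mathbf{I}\big)\,
\big|\det \nabla T(\mathbf{x})\big|,
\qquad
\nabla T(\mathbf{x}) \;=\; \prod_{k} \mathbf{R}_k\,\mathbf{D}_k
```

where R_k is the k-th PCA rotation (|det R_k| = 1, since it is orthonormal) and D_k is diagonal with the derivatives of the k-th marginal Gaussianization, so det ∇T reduces to a product of scalar marginal derivatives — hence the "easy Jacobian".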
CONCLUSIONS
• The optimal solution of many problems involves knowledge of the data pdf.
• GPCA obtains a transform that converts any pdf into a Gaussian pdf.
• It has an easy inverse.
• It has an easy Jacobian.
• This transform can be used to compute the pdf of any data.
GPCA toolbox (Matlab): 3 examples
• PDF estimation
• Mutual Information Measures
• Synthesis
Wiki-page
Beta version
Basic toolbox
• [datT Trans] = GPCA(dat, Nit, Perc)
- dat = data matrix, [N_dimensions x N_samples]
  e.g. 100 samples from a 2-D Gaussian: dat = [2 x 100]
- Nit = number of iterations
- Perc = percentage by which the pdf range is extended
Basic toolbox
• Perc = percentage by which the pdf range is extended.
Basic toolbox
• [datT Trans] = auto_GPCA(dat)
• [datT] = apply_GPCA(dat,Trans)
• [dat] = inv_GPCA(datT,Trans)
• [Px pT detJ JJ] = GPCA_probability(x0,Trans)
Estimating PDF/manifold
• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX, Trans);
Estimating PDF/manifold
• PROBLEMS
- It does not always reach a Gaussian.
- Pdfs with clusters are more complicated.
- The Jacobian estimation is highly point-dependent.
- The derivative (in the Jacobian estimation) is much more irregular than the integral.
- The pdf has to be estimated for each point.
Measuring Mutual Information
• [datT Trans] = auto_GPCA(dat)
• MI = abs(min(cumsum(cat(1, Trans.I))));
Error = (Real MI − Estimated MI) / Real MI, averaged over 10 realizations:
N-dim   Pdf-1    Pdf-2    Pdf-3
3       0.0697   0.0787   0.0630
4       0.0150   0.0031   0.0048
5       0.0353   0.0297   0.0328
8       0.0313   0.0369   0.0372
10      0.0148   0.0145   0.0132
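The "Real MI" in the error metric above has a closed form when the reference data are Gaussian; a sketch of that reference value (the assumption that the benchmark pdfs are Gaussian-based is ours):

```python
import numpy as np

# Sketch: for a multivariate Gaussian with covariance Sigma, the total
# correlation (multi-information) is
#   I(X) = 0.5 * (sum_i log Sigma_ii - log det Sigma)
def gaussian_total_correlation(sigma):
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(sigma)))
                  - np.linalg.slogdet(sigma)[1])

# 2-D example with correlation rho: I = -0.5 * log(1 - rho**2)
rho = 0.9
sigma = np.array([[1.0, rho], [rho, 1.0]])
mi = gaussian_total_correlation(sigma)   # about 0.83 nats
```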
Measuring Mutual Information
• PROBLEMS
– Entropy estimators are not perfectly defined.
– More iterations means more accumulated error.
– The more complicated the pdf, the larger the error.
Synthesizing data
• [datT Trans] = auto_GPCA(dat)
• [dat2] = inv_GPCA(randn(Dim, Nsamples), Trans);
(Figure: forward transforms T1, T2 map the data to a Gaussian; Inv T2, Inv T1 map Gaussian samples back to the data domain.)
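The synthesis idea — draw Gaussian noise, invert the transform — can be sketched with a single invertible Gaussianization step in NumPy. This is a simplified stand-in: auto_GPCA/inv_GPCA are the actual Matlab routines, and everything below (function names, one iteration only, interpolation-based inverse) is our own construction:

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()

def fit_step(x):
    # Forward: rank-based marginal Gaussianization + PCA rotation,
    # storing what we need to invert it later.
    n = x.shape[1]
    sorted_x = np.sort(x, axis=1)                     # empirical quantiles
    u = np.arange(1, n + 1) / (n + 1)
    scores = np.array([nd.inv_cdf(ui) for ui in u])   # Gaussian scores
    g = np.empty_like(x, dtype=float)
    for d in range(x.shape[0]):
        g[d] = scores[x[d].argsort().argsort()]
    _, r = np.linalg.eigh(np.cov(g))
    return (sorted_x, u, r), r.T @ g

def invert_step(z, params):
    # Inverse: undo the rotation, then map Gaussian values back to the
    # data domain through the stored empirical quantiles.
    sorted_x, u, r = params
    g = r @ z
    x = np.empty_like(g)
    for d in range(g.shape[0]):
        cdf = np.array([nd.cdf(v) for v in g[d]])
        x[d] = np.interp(cdf, u, sorted_x[d])
    return x

rng = np.random.default_rng(1)
data = rng.exponential(size=(2, 1000))
params, _ = fit_step(data)
synth = invert_step(rng.standard_normal((2, 1000)), params)
# synth marginals resemble the exponential originals (mean near 1)
```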
Synthesizing data
• PROBLEMS
– It does not always reach a Gaussian.
– Small variations in the variance of the random data yield very different results.
– No information about features of the data in the transformed domain.
• Thanks for your time