On image intensities, eigenfaces and LDA

DRAFT

IMAGE PROCESSING, RETRIEVAL, AND ANALYSIS II: REPORT ON RESULTS

Raghunandan Palakodety

Universität Bonn, Institut für Informatik

Bonn

ABSTRACT

This report presents the problem statements for three different projects and illustrates the results of practical implementations in C++ using the OpenCV framework. In image processing and information theory, data compression is useful for transmitting or representing data with relatively few bits. In the case of images, the probability distribution of intensities is not uniform, so assigning an equal number of bits to each pixel can be redundant. Image quantization reduces the number of bits used to represent the image pixels at the expense of some data loss, which is barely noticeable. For this task, we used the iterative Lloyd-Max quantizer design, which yields a non-uniform quantizer. Staying with image intensities, many face recognition pipelines include an image pre-processing step. One such step is illumination compensation, employed to cope with varying illumination. To address this problem, we used Retinex theory. The next project computes eigenfaces, an approach to the high-level vision problem of face recognition. In this approach, we transform face images into a small set of characteristic feature images known as eigenfaces, which are the principal components of the initial training set of images. Recognition is then performed by projecting a new image onto the subspace spanned by the eigenfaces. The final project includes two tasks. The first task is binary classification based on Fisher’s linear discriminant analysis (LDA). The second task highlights the merits of tensorial discriminant classification, which uses concepts from tensor algebra for visual object recognition. This approach outperforms conventional LDA in terms of training time and also addresses singular scatter matrices, that is, the small sample size problem.

Index Terms— image intensities, quantization, illumination correction, principal component analysis, linear discriminant analysis, tensor contractions.

1. INTRODUCTION

This paper summarizes and highlights the results of three given projects. Problem specifications on image intensities, eigenfaces and linear discriminant analysis were given, and the implementations were done using the OpenCV framework, C++ and QCustomPlot. The outcomes of the projects are discussed to the best of our abilities, based on the relevant questions that were posed.

The first project contains two tasks. The first is the implementation of the Lloyd-Max algorithm for grey value quantization. The second is estimating the illumination plane parameters of an image, which correspond to the best-fit plane of the image intensities.

The second project consists of computing eigenfaces using a collection of 2429 tiny face images of size 19x19. In this project, we wish to find the principal components of the distribution of faces, treating each image as a point in a very high-dimensional space.

The third project focuses on object recognition and consists of two tasks. The first is the implementation of a binary classifier based on traditional linear discriminant analysis (LDA). The second is tensor-based linear discriminant analysis, which treats images as higher-order tensors instead of vectorizing them. The theory behind this task is taken from [1], in which tensor contractions are repeatedly applied to the given set of training examples and alternating least squares is used to obtain a ρ-term projection tensor.

This paper is organized into sections in which the theoretical background of each project, the task specifications and the outcomes are discussed. The document ends with a conclusion section, in which recent developments and improvements pertaining to the projects are discussed.

2. THEORETICAL BACKGROUND FOR IMAGE QUANTIZATION

This section describes image quantization and motivates the need for an algorithm to design a quantizer. Quantization maps ranges of values in a signal to single values. A quantizer maps a continuous variable $x$ to a discrete variable $x_q$ that takes values from a finite set $\{r_1, r_2, r_3, \ldots, r_L\}$ of numbers. An optimal quantizer minimizes the mean squared error for a given number of quantization levels $L$. Let $x$, with $0 \le x \le A$, be a real scalar random variable with a continuous probability density function (PDF) $p_X(x)$. It is desired to find the optimum (decision) boundaries $a_v$ and the quantization (representation or reconstruction) points $b_v$ of an $L$-level quantizer. The design iterates until the mean squared error (MSE), or quantization error, $E$ drops below a threshold or no longer improves significantly.

2.1. Lloyd-Max quantization algorithm

To visualize quantization curves, the intensity histogram $h(x)$ of a grey value image is converted into a density function $p(x)$ using the transformation

$$p(x) = \frac{h(x)}{\sum_{y} h(y)}. \tag{1}$$

Steps 1 and 2 below describe the initialization of the boundaries and quantization points. Steps 3 and 4 are computed iteratively [2].

1. Initialize the boundaries $a_v$ of the quantization intervals as

$$a_0 = 0 \tag{2}$$

$$a_v = v \cdot \frac{256}{L} \tag{3}$$

$$a_{L+1} = 256 \tag{4}$$

2. Initialize the quantization or representation points $b_v$ as follows

$$b_v = v \cdot \frac{256}{L} + \frac{256}{2L} \tag{5}$$

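3. Update each boundary as the midpoint of its two neighbouring quantization points

$$a_v = \frac{b_v + b_{v-1}}{2} \tag{6}$$

4. Update each quantization point as the centroid of the density over its interval

$$b_v = \frac{\int_{a_v}^{a_{v+1}} x \, p(x) \, dx}{\int_{a_v}^{a_{v+1}} p(x) \, dx} \tag{7}$$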

Steps 3 and 4 are repeated until the quantization error

$$E = \sum_{v=1}^{L} \int_{a_v}^{a_{v+1}} (x - b_v)^2 \, p(x) \, dx \tag{8}$$

drops below a threshold.
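The iteration in steps 1 to 4 can be sketched in plain C++ as follows. This is a minimal sketch under stated assumptions, not the implementation used for the reported results: the integrals are replaced by sums over the discrete grey values, indices are 0-based with $L + 1$ boundaries, the loop stops once the error improves by less than a tolerance, and the names `LloydMaxResult`, `quantizationError`, `lloydMax`, `tol` and `maxIter` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Boundaries a_0..a_L, representation points b_0..b_{L-1} and final error E.
struct LloydMaxResult {
    std::vector<double> boundaries;
    std::vector<double> points;
    double error;
};

// Discrete version of equation (8): sum of (x - b_v)^2 p(x) over all intervals.
double quantizationError(const std::vector<double>& p,
                         const std::vector<double>& a,
                         const std::vector<double>& b) {
    double e = 0.0;
    for (std::size_t v = 0; v < b.size(); ++v) {
        for (std::size_t x = 0; x < p.size(); ++x) {
            const double xv = static_cast<double>(x);
            if (xv >= a[v] && xv < a[v + 1]) e += (xv - b[v]) * (xv - b[v]) * p[x];
        }
    }
    return e;
}

// Lloyd-Max design on a grey value histogram (assumed layout: one bin per grey value).
LloydMaxResult lloydMax(const std::vector<double>& hist, std::size_t levels,
                        double tol = 1e-6, int maxIter = 1000) {
    assert(levels > 0 && !hist.empty());
    const double range = static_cast<double>(hist.size());  // 256 for 8-bit images

    // Equation (1): normalise the histogram into a density.
    double total = 0.0;
    for (double h : hist) total += h;
    assert(total > 0.0);
    std::vector<double> p(hist.size());
    for (std::size_t x = 0; x < hist.size(); ++x) p[x] = hist[x] / total;

    // Steps 1 and 2: uniform initialisation of boundaries and points.
    std::vector<double> a(levels + 1), b(levels);
    for (std::size_t v = 0; v <= levels; ++v) a[v] = v * range / levels;
    for (std::size_t v = 0; v < levels; ++v) b[v] = a[v] + range / (2.0 * levels);

    double prev = quantizationError(p, a, b);
    for (int it = 0; it < maxIter; ++it) {
        // Step 3, equation (6): inner boundaries become midpoints.
        for (std::size_t v = 1; v < levels; ++v) a[v] = 0.5 * (b[v] + b[v - 1]);
        // Step 4, equation (7): each point becomes the centroid of its interval.
        for (std::size_t v = 0; v < levels; ++v) {
            double num = 0.0, den = 0.0;
            for (std::size_t x = 0; x < p.size(); ++x) {
                const double xv = static_cast<double>(x);
                if (xv >= a[v] && xv < a[v + 1]) { num += xv * p[x]; den += p[x]; }
            }
            if (den > 0.0) b[v] = num / den;  // empty interval: keep the old point
        }
        const double err = quantizationError(p, a, b);
        const bool converged = (prev - err) < tol;
        prev = err;
        if (converged) break;
    }
    return {a, b, prev};
}
```

For a histogram whose mass sits at only two grey values, a two-level design should place one representation point on each of them, which makes a convenient sanity check before running the routine on real image histograms.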

3. THEORETICAL BACKGROUND FOR ILLUMINATION COMPENSATION

Image pre-processing algorithms often need to compensate for non-uniform lighting conditions. Illumination conditions affect the facial features on which robust face recognition relies. A study [3] performed by NIST on the progress made in face recognition under controlled and uncontrolled illumination shows that illumination has a substantial effect on the recognition process.

Illumination compensation has therefore proved to be conducive to face recognition systems. Due to the 3D shape of human faces, a direct lighting source can produce strong shadows that accentuate or diminish certain facial features, which makes face recognition arduous [4]. The classic solution to this problem is histogram equalization, which produces optimal global contrast for a given image. However, histogram equalization is considered a crude approach. Another approach, proposed in [5], performs logarithmic transformations to enhance low gray levels and compress the higher ones. For recovering an image under an assumed lighting condition, the Quotient Image proposed in [6] outperformed PCA. Assuming the reader is familiar with Lambertian surfaces, for which the irradiance of the object surface is modeled by a mathematical equation, the Quotient Image extracts the object's surface reflectance as an illumination invariant. More on modeling the reflectance of opaque surfaces can be found in the theory of the bidirectional reflectance distribution function (BRDF).

The following approach is based on Retinex theory and the plane-subtraction, or illumination gradient compensation, algorithm, which fits a best-fit brightness plane to the image under analysis and then subtracts this plane from the image [4].

3.1. Illumination compensation

The reflectance model used in many cases can be expressed as

$$I(x, y) = R(x, y) \cdot L(x, y) \tag{9}$$

where $I(x, y)$ is the image pixel value, $R(x, y)$ is the reflectance and $L(x, y)$ is the illumination at each point $(x, y)$. The nature of $L(x, y)$ is determined by the lighting source, while $R(x, y)$ is determined by the characteristics of the object's surface. Therefore, $R(x, y)$ can be regarded as an illumination-insensitive measure. Separating the reflectance $R$ and the illuminance $L$ from real images is an ill-posed problem.

It is known from image pre-processing techniques that the illumination plane $I_P(x, y)$ of an image $I(x, y)$ corresponds to the best-fit plane of the image intensities. $I_P(x, y)$ is a linear approximation of $I(x, y)$, given by the ansatz

$$I_P(x, y) = a x + b y + c. \tag{10}$$

Here, $I_P(x, y)$ is the intensity value of the plane at pixel location $(x, y)$. Equation (10) poses a 3-D regression plane fitting problem [7]. The plane parameters $a$, $b$ and $c$ are estimated by the linear regression formula

$$p = (X^T X)^{-1} X^T x \tag{11}$$

where $p \in \mathbb{R}^3$ is a vector that comprises the plane parameters ($a$, $b$ and $c$) and $x \in \mathbb{R}^n$ is $I(x, y)$ in vector form, where $n$ is the number of pixels. $X \in \mathbb{R}^{n \times 3}$ is a matrix which holds the pixel coordinates of the image under analysis. The first column contains the horizontal coordinates, the second column the vertical coordinates, and the entries in the third column are set to 1.

After estimating $I_P(x, y)$, this plane is subtracted from $I(x, y)$. This reduces shadows caused by extreme lighting angles [4]. The results of our task on a set of two input images are shown below; figure 1 shows the first.

Fig. 1. The image in (a) has uneven illumination, while (b) is the illumination-compensated image.

Fig. 2. The plot in (a) shows the image function $f(x, y)$ over $x$ and $y$, while (b) shows the image function $f(x, y)$ along with the illumination plane $I_P(x, y)$ and the contours.

Another result of our experiment is shown in figure 3. The changes are not conspicuous, but on closer inspection the result shows compensation. An additional histogram equalization step can improve the results.

A three-dimensional plot of the image function $f(x, y)$ is shown in figure 2a, and the same function along with the estimated illumination plane model is shown in figure 2b.
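The plane fit of equation (11) and the subsequent subtraction can be sketched without any library support. The sketch below is illustrative only and assumes a row-major image stored as doubles; it accumulates the $3 \times 3$ normal equations directly instead of building $X$ explicitly, and the names `Plane`, `fitIlluminationPlane` and `subtractPlane` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Plane parameters (a, b, c) of I_P(x, y) = a*x + b*y + c.
using Plane = std::array<double, 3>;

// Least-squares plane fit p = (X^T X)^{-1} X^T x, where each row of X is (x, y, 1).
Plane fitIlluminationPlane(const std::vector<double>& img,
                           std::size_t width, std::size_t height) {
    assert(width > 1 && height > 1 && img.size() == width * height);
    double A[3][4] = {};  // augmented normal equations [X^T X | X^T x]
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const double row[3] = {static_cast<double>(x), static_cast<double>(y), 1.0};
            const double value = img[y * width + x];
            for (int i = 0; i < 3; ++i) {
                for (int j = 0; j < 3; ++j) A[i][j] += row[i] * row[j];
                A[i][3] += row[i] * value;
            }
        }
    }
    // Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for (int col = 0; col < 3; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        for (int j = 0; j < 4; ++j) std::swap(A[col][j], A[pivot][j]);
        assert(std::fabs(A[col][col]) > 0.0);
        for (int r = 0; r < 3; ++r) {
            if (r == col) continue;
            const double f = A[r][col] / A[col][col];
            for (int j = col; j < 4; ++j) A[r][j] -= f * A[col][j];
        }
    }
    Plane p;
    for (int i = 0; i < 3; ++i) p[i] = A[i][3] / A[i][i];
    return p;
}

// Subtracts the fitted plane from every pixel of the image.
std::vector<double> subtractPlane(const std::vector<double>& img,
                                  std::size_t width, std::size_t height,
                                  const Plane& p) {
    assert(img.size() == width * height);
    std::vector<double> out(img.size());
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            out[y * width + x] = img[y * width + x] - (p[0] * x + p[1] * y + p[2]);
    return out;
}
```

The subtracted image is centered around zero and can contain negative values, so a display pipeline would presumably need to shift or rescale it back into the valid intensity range.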

4. THEORETICAL BACKGROUND FOR EIGENFACES AND PRINCIPAL COMPONENT ANALYSIS

The eigenface approach to this classical pattern recognition problem is to find the principal components of the distribution of the faces, that is, the eigenvectors of the covariance matrix of the set of face images, treating each image as a point in a very high-dimensional space. In this approach, we project training image patches onto a lower-dimensional space (subspace) where recognition is carried out. Since we vectorize all the training image patches before such a projection, each face image patch $I \in \mathbb{R}^{m \times n}$ becomes a point in a high-dimensional input face space $\mathbb{R}^d$, where $d = m \cdot n$. Due to memory constraints and limited computational capacity, obtaining a parameterized model in this high-dimensional space is very difficult.

Dimensionality reduction of the input face space is the solution, and principal component analysis (PCA) is one such projection algorithm, used to obtain a reduced representation of face images. In [8], these PCA projections are used as feature vectors, and similarity functions or distance metrics such as the Mahalanobis distance or the Euclidean distance are employed to solve the problem of face recognition.

PCA was introduced by Karl Pearson in 1901. The closely related Karhunen-Loève transformation (KLT), a continuous transformation for de-correlating signals, was published in German [9]. PCA is a powerful unsupervised method for dimensionality reduction. It can be illustrated using a two-dimensional dataset such as the one plotted in figure 5. PCA finds the principal axes of the data and quantifies how important each axis is for describing the data distribution. In the plot shown in figure 6, one of the vectors is longer than the other. This implies that the data vary more along the direction of the longer vector than along the shorter one. After removing 5% of the variance of this dataset and re-projecting the data points onto the remaining vector, the resulting plot is shown in figure 4. The light shaded points are the original data points and the dark blue points are the projected version. This can be understood as dimensionality reduction.

Fig. 3. The image in (a) has uneven illumination, while (b) is the illumination-corrected image.

Another approach to face recognition uses Fisher’s linear discriminant analysis as the projection algorithm. It is discussed below, along with a novel and fast approach proposed in [1].

Fig. 4. Approximating the dataset in a lower dimension, that is, dimensionality reduction

Fig. 5. Two-dimensional scatter plot

4.1. Computing eigenfaces

In this task we are given a collection of 2429 tiny face images or image patches, each of size 19x19. From the given collection, we randomly chose 2186 as training images and the remaining 243 as test images. The images are read into the matrices $X_{\text{train}} \in \mathbb{R}^{361 \times 2186}$ and $X_{\text{test}} \in \mathbb{R}^{361 \times 243}$. The data matrix $X_{\text{train}}$ is centered to zero mean as

$$X_{\text{train}} = X_{\text{train}} - X_{\text{mean}}. \tag{12}$$

The computed mean image is shown in figure 9. Next, the covariance matrix $C = X_{\text{train}} X_{\text{train}}^T$ is computed in a form that is conducive to eigenvalue decomposition. Note that here the covariance matrix $C$ is of size $361 \times 361$. To compute the eigenvectors of the covariance matrix, we start from its eigenvalue equation $X_{\text{train}} X_{\text{train}}^T v_i = \lambda_i v_i$ and multiply both sides from the left with the transposed data matrix $X_{\text{train}}^T$, which gives

$$X_{\text{train}}^T X_{\text{train}} (X_{\text{train}}^T v_i) = \lambda_i (X_{\text{train}}^T v_i). \tag{13}$$

Fig. 6. Principal axes

We compute the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of the covariance matrix $C$ by eigendecomposition; the resulting eigenvectors are orthogonal to each other. The spectrum of the covariance matrix is shown in figure 10, with the eigenvalues arranged in descending order. Here, the eigenvalues represent the variance of the data along the eigenvector directions. Based on the plot in figure 10, we considered the first 20 eigenvectors $v_i \in \mathbb{R}^{361}$, where $i = 0, 1, \ldots, 19$.

The first 20 eigenvectors (corresponding to the 20 largest eigenvalues) are visualized in figure 8. From these results, we can understand that each image patch in the training set (with the mean subtracted) can be approximated by a linear combination of the best 20 eigenvectors. In general, the equation is

$$I_i - I_{\text{mean}} \approx \sum_{j=1}^{K} w_j u_j \quad \text{with} \quad w_j = u_j^T (I_i - I_{\text{mean}}), \tag{14}$$

in which we call the $u_j$ eigenfaces.

As mentioned above, $X_{\text{test}}$ holds test image patches vectorized in the same way as the training image patches, and the test data is centered with respect to the training data mean, as shown below

$$X_{\text{test}} = X_{\text{test}} - X_{\text{mean}}. \tag{15}$$

We selected 10 random test image patches, computed the Euclidean distances to all training image patches and plotted the distances in descending order, as shown in figure 7a. Furthermore, we projected all training and test data onto the subspace spanned by the $k = 20$ eigenvectors $v_i$. We then computed and plotted the Euclidean distances (in descending order) of the same test vectors to all the training vectors in this lower-dimensional subspace, as shown in figure 7b.

Fig. 7. The plot in (a) shows the distances of test image 0 to all the training data, while (b) displays the same in the lower-dimensional subspace.
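The projection and distance computation described above can be sketched as follows. This is a minimal illustration that assumes the eigenvectors are already available as orthonormal vectors; the eigendecomposition itself is not shown, and the names `Vec`, `projectOntoEigenfaces` and `euclideanDistance` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Coefficients w_j = u_j^T (x - mean) of a vectorized patch in the eigenface basis.
Vec projectOntoEigenfaces(const Vec& x, const Vec& mean, const std::vector<Vec>& basis) {
    assert(x.size() == mean.size());
    Vec coeffs(basis.size(), 0.0);
    for (std::size_t j = 0; j < basis.size(); ++j) {
        assert(basis[j].size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            coeffs[j] += basis[j][i] * (x[i] - mean[i]);
    }
    return coeffs;
}

// Euclidean distance between two vectors of equal length.
double euclideanDistance(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}
```

Because the basis vectors are orthonormal, the distance between two projected patches can never exceed their distance in the original space, so the subspace distances are bounded above by the full-space distances for the same pair of patches.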

5. THEORETICAL BACKGROUND FOR LINEAR DISCRIMINANT ANALYSIS

In section 4, PCA was discussed as a projection method for dimensionality reduction. PCA is a general method for identifying the linear directions in which a set of vectors is best represented, allowing dimensionality reduction by choosing the directions of largest variance.

By selecting only those axes that have the largest variance, PCA aims to capture the directions that contain the most information about the training image vectors, so that we can express as much as possible with a minimal number of dimensions. PCA yields components that describe the data well. However, the question remains whether those components are necessarily good for distinguishing between classes. This question arises when the system is used for recognition. To address this problem, we need discriminative features instead of descriptive features. These can be obtained in a supervised learning setting, that is, by using labeled training image patches. A further question concerns the definition of a discriminant and of the separability of classes.

Fisher’s linear discriminant analysis (LDA) is used to find an optimal linear projection $W$ that captures the major differences between classes, in other words, that maximizes the separability between the two classes in a two-class problem setting. In the projected discriminative subspace, the data are then clustered [10]. LDA searches for the projection axes on which the input vectors of two different classes are far away from each other and, at the same time, the input vectors of the same class are close to each other [10]. Among the infinitely many possible projection axes or lines, the line chosen is the one that maximally separates the projected data [11]. The solution to this problem is obtained by solving the generalized eigenvalue problem of the within-class and between-class scatter matrices.

Fig. 8. Visualization of the first 20 eigenvectors, panels (a) to (t).

Fig. 9. Mean image computed from training samples.

Fig. 10. Spectrum of the covariance matrix


LDA for binary classification requires a supervised setting. We are given a collection of $n$ labeled training data

$$\{(x_i, y_i)\}_{i=1}^{n} \tag{16}$$

where the data vectors $x_i \in \mathbb{R}^m$ are from two classes $C_1$ and $C_2$ and the labels $y_i \in \{+1, -1\}$ indicate class membership such that

$$y_i = \begin{cases} +1, & \text{if } x_i \in C_1 \\ -1, & \text{if } x_i \in C_2. \end{cases}$$

The task requires us to determine a classifier $y(x)$ that assigns a class label to a new, previously unseen data point [11].

One way to view a linear classification model is in terms of dimensionality reduction. Consider first the case of two classes, and suppose we take an $m$-dimensional input vector $x_i$ and project it down to one dimension using

$$y = w^T x. \tag{17}$$

We then place a threshold on $y$ and classify $x$ as class $C_1$ if $y \ge -w_0$ and as class $C_2$ otherwise. In general, the projection onto one dimension leads to a considerable loss of information, and classes that are well separated in the original $m$-dimensional space may become strongly overlapping in one dimension. The simplest measure of the separation of the classes when projected onto $w$ is the separation of the projected class means. The problem boils down to choosing $w$ so as to maximize

$$m_2 - m_1 = w^T (\mathbf{m}_2 - \mathbf{m}_1) \tag{18}$$

where $\mathbf{m}_k$ denotes the mean vector of class $C_k$ in the input space and

$$m_k = w^T \mathbf{m}_k \tag{19}$$

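is the mean of the projected images from class $C_k$. The projection formula (17) transforms the set of labeled data points $x$ into a labeled set in the one-dimensional space $y$. The within-class variance of the transformed data from class $C_k$ is given by

$$s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2 \tag{20}$$

where $y_n = w^T x_n$. Following [11], the total within-class variance for the whole dataset is simply $s_1^2 + s_2^2$. Writing $\mathbf{m}_k$ for the mean vector of class $C_k$ and $S_k = \sum_{n \in C_k} (x_n - \mathbf{m}_k)(x_n - \mathbf{m}_k)^T$ for its scatter matrix, each term can be expanded as

$$\begin{aligned} s_k^2 &= \sum_{n \in C_k} (y_n - m_k)^2 \\ &= \sum_{n \in C_k} (w^T x_n - w^T \mathbf{m}_k)^2 \\ &= \sum_{n \in C_k} w^T (x_n - \mathbf{m}_k)(x_n - \mathbf{m}_k)^T w \\ &= w^T S_k w. \end{aligned} \tag{21}$$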

Using equation (21), in preparation for the Rayleigh quotient, we define the total within-class scatter matrix as

$$S_W = S_1 + S_2 \tag{22}$$

so that

$$s_1^2 + s_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w. \tag{23}$$

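Following [11], we want the distance between the projected means $m_1$ and $m_2$ to be as large as possible,

$$|m_1 - m_2|^2 = |w^T \mathbf{m}_1 - w^T \mathbf{m}_2|^2 \tag{24}$$

where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the class mean vectors in the input space and the projected means $m_1$ and $m_2$ are given by

$$m_1 = \frac{1}{N_1} \sum_{x \in C_1} w^T x, \qquad m_2 = \frac{1}{N_2} \sum_{x \in C_2} w^T x. \tag{25}$$

With the between-class scatter matrix $S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T$, equation (24) can be written as

$$\begin{aligned} |m_1 - m_2|^2 &= |w^T \mathbf{m}_1 - w^T \mathbf{m}_2|^2 \\ &= w^T (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T w \\ &= w^T S_B w. \end{aligned} \tag{26}$$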

Following [11], Fisher’s linear discriminant is defined as the linear function $w^T x$ that maximizes the objective function $J(w)$,

$$J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}. \tag{27}$$

Substituting (23) and (26) into (27) gives $J(w) = \frac{w^T S_B w}{w^T S_W w}$. An optimal $w^*$ that maximizes (27) must satisfy

$$S_B w = \lambda S_W w. \tag{28}$$

From [11], the optimal projection is

$$w^* = S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2) \tag{29}$$

where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the class mean vectors in the input space. The intuition behind equation (29) is that the data are projected onto the one dimension that maximizes the ratio of the between-class scatter to the total within-class scatter.

The first task of this project measures the performance of linear discriminant analysis (LDA) for binary classification. The second task uses tensors [12] of rank 2 as training samples (instead of vectorizing the training images) for the same binary classification task.
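Equation (29) can be made concrete with a small sketch. To stay short, the example below is restricted to two-dimensional inputs so that $S_W$ can be inverted in closed form; the image patches in this report have far more dimensions and would need a general linear solver. The names `Point`, `classMean`, `fisherDirection` and `project` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::array<double, 2>;

// Mean vector of one class.
Point classMean(const std::vector<Point>& pts) {
    assert(!pts.empty());
    Point m = {{0.0, 0.0}};
    for (const Point& p : pts) { m[0] += p[0]; m[1] += p[1]; }
    m[0] /= pts.size();
    m[1] /= pts.size();
    return m;
}

// Fisher direction w* = S_W^{-1} (m1 - m2) for two classes of 2-D points.
Point fisherDirection(const std::vector<Point>& c1, const std::vector<Point>& c2) {
    const Point m1 = classMean(c1);
    const Point m2 = classMean(c2);

    // Within-class scatter S_W = S_1 + S_2, accumulated over both classes.
    double s[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
    auto accumulate = [&](const std::vector<Point>& pts, const Point& m) {
        for (const Point& p : pts) {
            const double d0 = p[0] - m[0];
            const double d1 = p[1] - m[1];
            s[0][0] += d0 * d0; s[0][1] += d0 * d1;
            s[1][0] += d1 * d0; s[1][1] += d1 * d1;
        }
    };
    accumulate(c1, m1);
    accumulate(c2, m2);

    // Closed-form inverse of the 2x2 matrix S_W applied to (m1 - m2).
    const double det = s[0][0] * s[1][1] - s[0][1] * s[1][0];
    assert(det != 0.0);  // singular S_W is the small sample size problem
    const double d0 = m1[0] - m2[0];
    const double d1 = m1[1] - m2[1];
    Point w;
    w[0] = (s[1][1] * d0 - s[0][1] * d1) / det;
    w[1] = (-s[1][0] * d0 + s[0][0] * d1) / det;
    return w;
}

// Projection y = w^T x of equation (17).
double project(const Point& w, const Point& x) {
    return w[0] * x[0] + w[1] * x[1];
}
```

With this sign convention, samples of class $C_1$ project to larger values than samples of class $C_2$ whenever the classes are separable along $w^*$, which matches a classifier that assigns $+1$ above a threshold.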


5.1. Applying Fisher’s linear discriminant analysis: Experimental setting

We are given a collection of 2556 training image patches, of which 2442 are patches of background, tagged with class label $C_2$, whereas the remaining 124 are patches containing cars, tagged with class label $C_1$. Each of these ground truth image patches is of size $81 \times 31$. A 2D visualization of the computed projection vector $w$ is shown in figure 11. This $w$ was obtained from least squares regression training, and its visualization shows no car-like structural traits.

Fig. 11. 2D visualization of the projection vector $w = (X^T X)^{-1} X^T y$

5.2. Applying classifier on test data

We used 10 different classifiers, indexed by $k = 1, 2, \ldots, 10$, of the form

$$y(x) = \begin{cases} +1, & \text{if } w^T x \ge \theta_k \\ -1, & \text{otherwise} \end{cases}$$

where $\theta_k \in [\mu_1, \mu_2]$, and $\mu_1$ and $\mu_2$ are the projected class means.

Before applying the best-performing classifier to the test set of 170 images, we plotted the precision-recall curve on the training set. Precision and recall often show an inverse relationship, that is, increasing one comes at the cost of reducing the other. Figure 12 shows the results of applying the best-performing of the 10 classifiers to an image with a single target (car).
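How a set of thresholds translates into points on a precision-recall curve can be sketched as follows. The text only states that the thresholds lie between the projected means, so the even spacing used here is an assumption, as are the names `PrecisionRecall`, `evaluateThreshold` and `thresholdsBetween` and the convention of reporting a precision of 1 when nothing is predicted positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PrecisionRecall {
    double precision;
    double recall;
};

// Evaluates y(x) = +1 if score >= theta, -1 otherwise, against labels in {+1, -1}.
// The scores are the projections w^T x of the samples.
PrecisionRecall evaluateThreshold(const std::vector<double>& scores,
                                  const std::vector<int>& labels, double theta) {
    assert(scores.size() == labels.size());
    std::size_t tp = 0, fp = 0, fn = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const bool predicted = scores[i] >= theta;
        const bool actual = labels[i] == 1;
        if (predicted && actual) ++tp;
        else if (predicted && !actual) ++fp;
        else if (!predicted && actual) ++fn;
    }
    PrecisionRecall pr;
    // Assumed convention: precision is 1 if no sample is predicted positive.
    pr.precision = (tp + fp) > 0 ? static_cast<double>(tp) / (tp + fp) : 1.0;
    pr.recall = (tp + fn) > 0 ? static_cast<double>(tp) / (tp + fn) : 0.0;
    return pr;
}

// Evenly spaced thresholds between two projected means (spacing is an assumption).
std::vector<double> thresholdsBetween(double lo, double hi, std::size_t count) {
    assert(count >= 2);
    std::vector<double> thetas(count);
    for (std::size_t k = 0; k < count; ++k)
        thetas[k] = lo + (hi - lo) * k / (count - 1);
    return thetas;
}
```

Evaluating `evaluateThreshold` once per entry of `thresholdsBetween(mu2, mu1, 10)` would give ten precision-recall pairs, from which the best-performing classifier could be picked by whatever criterion is preferred.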

6. THEORETICAL BACKGROUND FOR TENSOR LINEAR DISCRIMINANT ANALYSIS

In the approach discussed in section 5, training image patches $x \in \mathbb{R}^{m \times n}$ of size $m \times n$ are vectorized into vectors of dimension $mn$. Instead, treating images for what they are, we use tensors [1]. In the procedure proposed in [1], we compute the projection tensor by applying tensor contractions to the given set of training image patches and using alternating least squares.

A tensor, also known as an n-way array, multidimensional matrix or n-mode matrix, is a higher-order generalization of a vector (first-order tensor) and a matrix (second-order tensor). In this short description of second-order tensors $\mathcal{X} \in \mathbb{R}^{m \times n}$, we use calligraphic upper-case letters such as $\mathcal{X}$ to represent grey-value images of size $m \times n$. We are given a training set $\{(\mathcal{X}^{\alpha}, y^{\alpha})\}_{\alpha = 1, 2, \ldots, N}$ of $N$ image patches, where $\mathcal{X}^{\alpha} \in \mathbb{R}^{m \times n}$. Tensor discriminant analysis requires a projection tensor $\mathcal{W}$ which solves the regression problem [1]

$$\mathcal{W} = \operatorname*{argmin}_{\mathcal{W}^{*}} \sum_{\alpha} \left( y^{\alpha} - \mathcal{W}^{*} \mathcal{X}^{\alpha} \right)^2. \tag{30}$$

Fig. 12. Panels (a) and (b) each show a car bounded by a rectangle after applying the classifier with threshold.

Fig. 13. $\mathcal{W} = \sum_{r=1}^{\rho} u_r v_r^T$, panels (a) to (c).

6.1. Applying tensor discriminant analysis: Experimental setting

As described in section 5.1, we use the same image collection for training and test data. We determine a projector $\mathcal{W}$, where

$$\mathcal{W} = \sum_{r=1}^{\rho} u_r v_r^T. \tag{31}$$

Starting from a random initialization of $u$, we compute a set of vectors $x^{\alpha}$ from the tensor contractions $\mathcal{X}^{\alpha}_{kl} u_k$, insert them into a design matrix $X$ and use the equation $w = (X^T X)^{-1} X^T y$ to compute $v$. Having $v$, we solve for $u$ in the same way, and iterate until the solution converges, that is, until $\|u_r(t) - u_r(t-1)\| \le \epsilon$. Following the algorithm in [1] for computing a second-order tensor discriminant classifier $\mathcal{W}$, we compute the $\rho$-term solution of the second-order projection tensor as $\mathcal{W} = \sum_r u_r \otimes v_r$.
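For a single term ($\rho = 1$), one sweep of the alternating scheme described above can be written out as follows. This is a sketch derived from the description in this section rather than a restatement of [1]; the symbols $z^{\alpha}$ and $Z$ are introduced here for illustration only, and how further terms $r > 1$ are obtained is not covered.

```latex
\begin{align*}
  y^{\alpha} &\approx \sum_{k,l} u_k \, \mathcal{X}^{\alpha}_{kl} \, v_l
    && \text{(rank-one model of the projection)} \\
  x^{\alpha}_{l} &= \sum_{k} \mathcal{X}^{\alpha}_{kl} \, u_k ,
    \quad X = [x^{1}, \dots, x^{N}]^{T},
    \quad v = (X^{T}X)^{-1} X^{T} y
    && \text{(fix } u \text{, solve for } v\text{)} \\
  z^{\alpha}_{k} &= \sum_{l} \mathcal{X}^{\alpha}_{kl} \, v_l ,
    \quad Z = [z^{1}, \dots, z^{N}]^{T},
    \quad u = (Z^{T}Z)^{-1} Z^{T} y
    && \text{(fix } v \text{, solve for } u\text{)}
\end{align*}
```

Each half-step is an ordinary least squares problem with only $n$ or $m$ unknowns instead of $mn$, which is consistent with the reduced training time and the robustness to singular matrices reported for this approach.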

Visualizing the $\rho$-term solutions of the second-order projection tensor (figure 13), we observe car-like structural traits, which was not the case for conventional linear discriminant analysis [1]. Panels (a), (b) and (c) show the projection tensors for $\rho = 1$, $\rho = 3$ and $\rho = 9$, respectively.

The multilinear classifier maps the training samples onto the best discriminant direction. The results of our implementation of the approach proposed in [1] are shown in figures 14a, 14b and 14c. In figure 14c, an overlap is observed.

In our implementation, training with this approach was noticeably faster than with conventional LDA (running times are not reported). A further advantage is that this approach addresses the problem of singular matrices, which often arises when the dimensionality of the input space is greater than the number of samples.

Fig. 14. Projections produced by the tensor predictor

7. CONCLUSION

In discriminant analysis, linear discriminant analysis computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter. Such a transformation must retain the class separability while reducing the variation due to other sources, such as illumination. While conventional LDA takes a long time to train the projector, the tensor-based approach outperforms it in this respect. To alleviate the small sample size problem, we can also perform two projections: PCA is first applied to the data set to reduce its dimensionality, and LDA is then applied to reduce the dimensionality further. However, the major advantage of tensor discriminant classifiers is that the rank deficiency constraint considerably reduces the number of free parameters, which makes the multilinear classifiers faster and preferable.

Linear, unsupervised dimensionality reduction techniques such as PCA are limited in the kinds of feature dimensions they can extract. For many generalized object detection problems, the features that matter are not easy to express. It becomes difficult to select those features when the algorithm needs to tell apart cats, faces and cars. We need to extract information-rich dimensions from our input images.

Autoencoders overcome these limitations by exploiting the inherent non-linearity of neural networks. An autoencoder [13] is an unsupervised learning technique that uses a neural network to produce a low-dimensional representation of a high-dimensional input. It consists of two major parts, the encoder and the decoder networks. The former is used during both training and testing, while the latter is used only during training.

8. REFERENCES

[1] C. Bauckhage and T. Käster, “Benefits of separable, multilinear discriminant classification,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, Aug 2006, vol. 4, pp. 959–959.

[2] Prof. Christian Bauckhage, “Image processing, retrieval, and analysis (II),” [online], 2015, https://sites.google.com/site/bitimageprocessing/home/lecture-notes-ii.

[3] P. Jonathon Phillips, W. Todd Scruggs, Alice J. O’Toole, Patrick J. Flynn, Kevin W. Bowyer, Cathy L. Schott, and Matthew Sharpe, “FRVT 2006 and ICE 2006 large-scale results,” 2007.

[4] Javier Ruiz-del-Solar and Julio Quinteros, “Illumination compensation and normalization in eigenspace-based face recognition: A comparative study of different pre-processing approaches,” Pattern Recognition Letters, vol. 29, no. 14, pp. 1966–1979, 2008.

[5] Hong Liu, Wen Gao, Jun Miao, Debin Zhao, Gang Deng, and Jintao Li, “Illumination compensation and feedback of illumination feature in face detection,” in Info-tech and Info-net, 2001. Proceedings. ICII 2001 - Beijing. 2001 International Conferences on, 2001, vol. 3, pp. 444–449.

[6] Amnon Shashua and Tammy Riklin-Raviv, “The quotient image: Class-based re-rendering and recognition with varying illuminations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 129–139, Feb. 2001.


[8] Matthew Turk, Alex P. Pentland, et al., “Face recognition using eigenfaces,” in Computer Vision and Pattern Recognition, 1991. Proceedings CVPR’91., IEEE Computer Society Conference on. IEEE, 1991, pp. 586–591.

[9] K. Karhunen, Ueber lineare Methoden in der Wahrscheinlichkeitsrechnung, Annales Academiae scientiarum Fennicae. Series A. 1, Mathematica-physica. 1947.

[10] Ying Wu, “Principal component analysis and linear discriminant analysis,” Electrical Engineering and Computer Science, Northwestern University, Evanston, lecture, 2014.




[13] Yoshua Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.