
Page 1

Principal Component Analysis

Machine Learning

Page 2

Last Time

• Expectation Maximization in Graphical Models
  – Baum-Welch

Page 3

Now

• Unsupervised Dimensionality Reduction

Page 4

Curse of Dimensionality

• In (nearly) all modeling approaches, more features (dimensions) require (a lot) more data
  – Typically exponential in the number of features

• This is clearly seen when filling in a full joint probability table.

• Topological arguments are also made.
  – Compare the volume of an inscribed hypersphere to that of the enclosing hypercube (illustrated below).
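A quick numeric sketch of both arguments (not from the original slides; the dimensions chosen are arbitrary): the number of entries in a full joint probability table over d binary features, and the fraction of a hypercube's volume occupied by its inscribed hypersphere.

    # Sketch: table size and inscribed-sphere / cube volume ratio as dimension d grows.
    import math

    for d in (1, 2, 3, 5, 10, 20):
        table_entries = 2 ** d                      # full joint table over d binary features
        # Volume of the unit-radius hypersphere inscribed in the cube [-1, 1]^d,
        # divided by the cube volume 2^d.  V_sphere(d) = pi^(d/2) / Gamma(d/2 + 1).
        sphere = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
        print(f"d={d:2d}  table entries={table_entries:8d}  sphere/cube volume={sphere / 2 ** d:.2e}")

Both quantities behave badly as d grows: the table grows exponentially, while the sphere-to-cube ratio collapses toward zero, meaning almost all of the cube's volume sits in its corners.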

Page 5

Dimensionality Reduction

• We’ve already seen some of this.

• Regularization attempts to reduce the number of effective features used in linear and logistic regression classifiers

Page 6

Linear Models

• When we regularize, we optimize a function that ignores as many features as possible.

• The “effective” number of dimensions is much smaller than D
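As an illustration (not from the slides), here is a minimal scikit-learn sketch with synthetic data in which only two of fifty features carry signal; all names and settings are my own choices.

    # L1 (lasso) regularization drives most coefficients to exactly zero,
    # so the "effective" number of dimensions is far smaller than D.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                             # D = 50 features
    y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)   # only 2 of them matter

    model = Lasso(alpha=0.1).fit(X, y)
    print("non-zero coefficients:", int(np.sum(model.coef_ != 0)), "of", X.shape[1])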

Page 7

Support Vector Machines

• In exemplar approaches (SVM, k-nn) each data point can be considered to describe a dimension.

• By selecting only those instances that maximize the margin (setting α to zero), SVMs use only a subset of available dimensions in their decision making.
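A small sketch of that idea (synthetic data, scikit-learn's SVC; not part of the original slides): after training, only the support vectors are kept, and every other training point has α = 0.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                   rng.normal(2.0, 1.0, size=(100, 2))])
    y = np.array([0] * 100 + [1] * 100)

    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    # Only the points with non-zero alpha are retained as support vectors.
    print("support vectors:", len(clf.support_), "of", len(X), "training points")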

Page 8

Decision Trees

• Decision Trees explicitly select split points based on features that improve Information Gain or Accuracy

• Features that don’t contribute to the classification sufficiently are never used.

[Decision tree diagram: the root splits on weight < 165; one branch is a leaf containing 5M, the other tests height < 68 and leads to leaves containing 5F and 1F / 1M]
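A minimal sketch of the split criterion (my own toy numbers, not the data behind the figure; not from the slides): compute the information gain of a weight < 165 split.

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(feature, labels, threshold):
        left = labels[feature < threshold]
        right = labels[feature >= threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        return entropy(labels) - weighted

    weight = np.array([150, 155, 160, 162, 170, 175, 180, 185, 190, 200])
    sex = np.array(["F", "F", "F", "F", "M", "M", "M", "M", "M", "M"])
    print("gain for weight < 165:", round(information_gain(weight, sex, 165), 3))

A feature whose best split yields (almost) no gain is simply never chosen by the tree.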

Page 9

Feature Spaces

• Even though a data point is described in terms of N features, this may not be the most compact representation of the feature space

• Even classifiers that try to use a smaller effective feature space can suffer from the curse-of-dimensionality

• If a feature has some discriminative power, the dimension may remain in the effective set.

Page 10

1-d data in a 2-d world

[Scatter plot: two-dimensional points that lie almost exactly along a single straight line]

Page 11

Dimensions of high variance

Page 12

Identifying dimensions of variance

• Assumption: directions that show high variance represent the appropriate/useful dimensions with which to represent the feature set.

Page 13

Aside: Normalization

• Assume 2 features:– Percentile GPA– Height in cm.

• Which dimension shows greater variability?

[Scatter plot of the two features: percentile GPA on one axis (ticks 0 to 1), height in cm on the other (ticks 235 to 285); the height axis spans a far wider numeric range]

Page 14

Aside: Normalization

• Assume 2 features:– Percentile GPA– Height in cm.

• Which dimension shows greater variability?

[Scatter plot of the same two features with a rescaled horizontal axis (ticks 0 to 30); height in cm on the vertical axis (ticks 235 to 285)]

Page 15

Aside: Normalization

• Assume 2 features:– Percentile GPA– Height in m.

• Which dimension shows greater variability?

[Scatter plot of the two features with height now in m; both axes span 0 to 1]
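The point of this aside is that raw variances depend entirely on units. A minimal sketch (synthetic numbers of my own, not the plotted data): standardizing (z-scoring) each feature first makes the comparison unit-independent.

    import numpy as np

    rng = np.random.default_rng(0)
    gpa_percentile = rng.uniform(0.0, 1.0, size=100)     # unitless, range about 1
    height_cm = rng.normal(170.0, 10.0, size=100)        # centimetres, std about 10

    X = np.column_stack([gpa_percentile, height_cm])
    print("raw variances:         ", X.var(axis=0))      # the cm feature dominates
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    print("standardized variances:", X_std.var(axis=0))  # both are 1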

Page 16

Principal Component Analysis

• Principal Component Analysis (PCA) identifies the dimensions of greatest variance of a set of data.

Page 17

Eigenvectors

• Eigenvectors (of a symmetric matrix, such as a covariance matrix) are orthogonal vectors that define a space, the eigenspace.

• Any data point can be described as a linear combination of eigenvectors.

• Eigenvectors of a square matrix A have the following property: A v = λ v

• The associated λ is the eigenvalue.
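A minimal numeric check of that property (my own toy matrix; the symmetric case is the one that matters for covariance matrices):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    eigenvalues, eigenvectors = np.linalg.eigh(A)        # eigh: for symmetric matrices

    for lam, v in zip(eigenvalues, eigenvectors.T):      # eigenvectors are the columns
        print(np.allclose(A @ v, lam * v))               # A v = lambda v holds for each pair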

Page 18

PCA

• Write each data point in this new space: x = μ + c_1 e_1 + c_2 e_2 + … + c_D e_D

• To do the dimensionality reduction, keep C < D dimensions.

• Each data point is now represented as a vector of c’s.
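The slides do not say how to choose C, but a common heuristic (an assumption on my part, grounded in the eigenvalue-equals-variance view on the later slides) is to keep enough of the sorted eigenvalues to cover most of the variance; a sketch with synthetic data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))        # synthetic data, D = 10

    eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # sorted descending
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    C = int(np.searchsorted(explained, 0.95)) + 1        # smallest C covering 95% of the variance
    print("keep C =", C, "of D =", X.shape[1], "dimensions")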

Page 19

Identifying Eigenvectors

• PCA is easy once we have the eigenvectors and the mean.

• Identifying the mean is easy.

• Eigenvectors of the covariance matrix represent a set of directions of variance.

• Eigenvalues represent the degree of the variance.
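Putting those pieces together, a minimal sketch with synthetic 2-d data (names and numbers are my own): compute the mean, the covariance matrix, and its eigenvectors and eigenvalues.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=1000)

    mu = X.mean(axis=0)                                  # identifying the mean is easy
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))

    order = np.argsort(eigenvalues)[::-1]                # sort by decreasing variance
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    print("directions of variance (columns):\n", eigenvectors)
    print("degree of variance:", eigenvalues)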

Page 20

Eigenvectors of the Covariance Matrix

• Eigenvectors are orthonormal.

• In the eigenspace, the Gaussian is diagonal – zero covariance.

• All eigenvalues are non-negative.

• Eigenvalues are sorted.

• Larger eigenvalues correspond to higher variance.
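A quick numeric check of the "diagonal in the eigenspace" claim (synthetic data, my own setup): express the centered data in eigenvector coordinates and look at its covariance.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=2000)

    eigenvalues, E = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = (X - X.mean(axis=0)) @ E                         # coordinates in the eigenspace
    print(np.round(np.cov(Z, rowvar=False), 3))          # off-diagonal entries are ~0
    print(np.round(eigenvalues, 3))                      # match the diagonal entries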

Page 21

Dimensionality reduction with PCA

• To convert an original data point x into its PCA representation: c_i = e_i^T (x − μ), for i = 1 … C

• To reconstruct a point: x ≈ μ + c_1 e_1 + … + c_C e_C (see the sketch below)
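A minimal encode/decode sketch under the notation above (mu is the mean, E holds the top C eigenvectors as columns; the function names and data are my own, not from the slides):

    import numpy as np

    def pca_encode(x, mu, E):
        # Coefficients of x in the truncated eigenspace: c = E^T (x - mu).
        return E.T @ (x - mu)

    def pca_decode(c, mu, E):
        # Reconstruction from the coefficients: x_hat = mu + E c.
        return mu + E @ c

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # synthetic 3-d data
    mu = X.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    E = vecs[:, ::-1][:, :2]                             # keep the top C = 2 eigenvectors

    c = pca_encode(X[0], mu, E)
    print("coefficients:", c)
    print("reconstruction:", pca_decode(c, mu, E), "original:", X[0])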

Page 22

Eigenfaces

Encoded, then decoded.

Reconstruction quality can be evaluated with absolute or squared error.
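For completeness, the two error measures as code (tiny made-up vectors standing in for an image and its PCA reconstruction; not from the slides):

    import numpy as np

    def absolute_error(x, x_hat):
        return np.abs(x - x_hat).sum()

    def squared_error(x, x_hat):
        return ((x - x_hat) ** 2).sum()

    x = np.array([0.20, 0.80, 0.50, 0.10])       # an "image" flattened into a vector
    x_hat = np.array([0.25, 0.75, 0.55, 0.05])   # its encode-then-decode reconstruction
    print(absolute_error(x, x_hat), squared_error(x, x_hat))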

Page 23

Some other (unsupervised) dimensionality reduction techniques

• Kernel PCA
• Distance Preserving Dimension Reduction
• Maximum Variance Unfolding
• Multi-Dimensional Scaling (MDS)
• Isomap

Page 24

• Next Time
  – Model Adaptation and Semi-supervised Techniques

• Work on your projects.