Eigenvalues/Eigenvectors & Discriminant Analysis


Geology 659 - Quantitative Methods

tom.h.wilson
wilson@geo.wvu.edu
Department of Geology and Geography
West Virginia University
Morgantown, WV

Eigenvalue and eigenvector problems


The eigenvalue/eigenvector problems of concern to us in statistical analysis are associated with matrices of correlation coefficients.

Consider the 4 x 4 matrix on page 147.

$$
\begin{bmatrix}
1 & 0.28 \\
0.28 & 1
\end{bmatrix}
$$

The matrix is symmetrical. The diagonal elements with value 1 represent the correlation of a sample with itself, while the remaining elements represent the correlation of one sample with another.
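As a minimal sketch of how such a matrix arises, the Python below computes Pearson correlation coefficients for a few columns of numbers and assembles them into a square matrix. The values and the function names `pearson` and `correlation_matrix` are illustrative assumptions, not data or notation from Davis. The diagonal comes out as 1 and the matrix is symmetric, as described above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix of correlations between every pair of columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

# Hypothetical measurements, invented for illustration only.
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [9.0, 7.0, 8.0, 3.0, 1.0],
]
r = correlation_matrix(columns)
for row in r:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Each off-diagonal entry appears twice, once above and once below the diagonal, because the correlation of column 1 with column 2 is the same as that of column 2 with column 1.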

The plots represent different states of correlation between two variables. The eigenvectors define the directions of maximum and minimum variance.

[Figure panels: High correlation; Low correlation]
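To make the link between correlation and eigenvectors concrete, here is a small sketch that works out the eigenvalues and unit eigenvectors of a 2 x 2 symmetric matrix in closed form, applied to the matrix with off-diagonal value 0.28 shown earlier. The function name `eigen_sym_2x2` is an assumption introduced for this example; a numerical library would normally be used for larger matrices.

```python
import math

def eigen_sym_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, d]].

    Returns (values, vectors) with the larger eigenvalue first.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    values = (mean + radius, mean - radius)
    if b == 0:
        # Diagonal matrix: the eigenvectors are the coordinate axes.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return values, vectors
    vectors = []
    for lam in values:
        # (b, lam - a) satisfies the first row of (A - lam*I) v = 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors

values, vectors = eigen_sym_2x2(1.0, 0.28, 1.0)
for lam, v in zip(values, vectors):
    print(f"eigenvalue {lam:.2f}  eigenvector ({v[0]:+.3f}, {v[1]:+.3f})")
```

For a 2 x 2 correlation matrix with correlation r, the eigenvalues are 1 + r and 1 - r, so here they are 1.28 and 0.72. The eigenvector paired with the larger eigenvalue points along the direction of maximum variance, and the other, at right angles to it, along the direction of minimum variance. As r approaches 1 the two eigenvalues become very unequal; as r approaches 0 they become nearly equal.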

The example in the text illustrates grain size and sorting variations associated with two samples. One sample is taken from an offshore environment and the other from a beach environment. The plot suggests that neither grain size nor sorting uniquely differentiates the offshore sands from the beach sands.

Discriminant analysis finds a linear combination of the sorting and grain size characteristics that helps differentiate between the clusters of offshore sands and beach sands appearing in the scatter plot.

As discussed by Davis, it is possible to derive a discriminant function along which there is maximum difference in the locations of beach and offshore sand clusters as defined by sorting and grain size.
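One common way to obtain such a function for two groups is Fisher's linear discriminant, in which the coefficients are found by solving the pooled variance-covariance matrix against the difference between the two group means. The sketch below assumes that formulation. The function names are hypothetical and the numbers are invented for illustration; they are not the Davis data.

```python
def mean_vector(rows):
    """Mean of each of the two variables over a list of (x1, x2) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """Sums of squares and cross products about the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for j in range(2):
            for k in range(2):
                s[j][k] += d[j] * d[k]
    return s

def discriminant_coefficients(group_a, group_b):
    """Solve pooled * coefficients = (mean_a - mean_b) for two variables."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, mean_a), scatter_matrix(group_b, mean_b)
    dof = len(group_a) + len(group_b) - 2
    pooled = [[(sa[j][k] + sb[j][k]) / dof for k in range(2)] for j in range(2)]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    # Explicit inverse of the 2 x 2 pooled matrix applied to diff.
    c1 = (pooled[1][1] * diff[0] - pooled[0][1] * diff[1]) / det
    c2 = (pooled[0][0] * diff[1] - pooled[1][0] * diff[0]) / det
    return [c1, c2]

# Hypothetical (grain size, sorting) pairs, invented for illustration only.
beach = [(0.33, 1.08), (0.34, 1.10), (0.35, 1.12), (0.33, 1.13)]
offshore = [(0.32, 1.20), (0.33, 1.24), (0.34, 1.22), (0.31, 1.25)]

coefficients = discriminant_coefficients(beach, offshore)
print("coefficients:", coefficients)
```

With coefficients chosen this way, the mean score of the first group is always higher than the mean score of the second, which is what "maximum difference in the locations" of the two clusters looks like along a single axis.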

The discriminant function yields a “score” for each observation, that is, for each sorting/grain size pair. The score is calculated as

$$R_i = \lambda_1 X_{i1} + \lambda_2 X_{i2}$$

where i is the observation, j (1 or 2) is the variable (grain size or sorting), and $\lambda_1$ and $\lambda_2$ are the coefficients of the discriminant function.

[Figure: discriminant scores for Beach Sands and Offshore Sands]

The cutoff score is the score calculated using the average of the group averages for sorting and grain size:

$$R_o = \lambda_1 \bar{X}_1 + \lambda_2 \bar{X}_2$$

where $\bar{X}_1$ is the average of mean grain size, and $\bar{X}_2$ is the average of mean sorting.
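The score and cutoff calculations can be sketched in a few lines of Python. The coefficient values, the data, and the function names below are all assumptions made up for illustration; they are not taken from Davis or from the lab.

```python
def score(lam, x):
    """Discriminant score: weighted sum of the two variables for one observation."""
    return lam[0] * x[0] + lam[1] * x[1]

def column_means(rows):
    """Mean of each variable over a list of (grain size, sorting) rows."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def cutoff_score(lam, group_a, group_b):
    """Score evaluated at the average of the two group averages."""
    mean_a = column_means(group_a)
    mean_b = column_means(group_b)
    midpoint = ((mean_a[0] + mean_b[0]) / 2.0, (mean_a[1] + mean_b[1]) / 2.0)
    return score(lam, midpoint)

# Hypothetical coefficients and (grain size, sorting) pairs, for illustration only.
lam = (50.0, -20.0)
beach = [(0.33, 1.08), (0.34, 1.10), (0.35, 1.12)]
offshore = [(0.32, 1.20), (0.33, 1.24), (0.31, 1.25)]

r_cutoff = cutoff_score(lam, beach, offshore)
for name, group in (("beach", beach), ("offshore", offshore)):
    for x in group:
        print(f"{name:9s} score = {score(lam, x):7.3f}")
print(f"cutoff score = {r_cutoff:7.3f}")
```

Because the score is a linear function of the variables, the cutoff computed this way lies exactly halfway between the mean score of one group and the mean score of the other.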

Ro helps differentiate the two groups and may suggest that some observations in the offshore sands, for example, are actually beach sands. As you can see below, the discriminant scores suggest that three observations classified as offshore sands have characteristics similar to those of beach sands. There are also three observations classified as beach sands that have characteristics more like those of the offshore sands.
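A comparison of this kind amounts to checking on which side of the cutoff each score falls. The sketch below uses invented scores and an invented cutoff, and assumes that beach sands score above the cutoff and offshore sands below it; which group sits on the high side depends on how the function was derived.

```python
def flag_mismatches(labelled_scores, cutoff, high_group, low_group):
    """Return observations whose score falls on the other group's side of the cutoff."""
    flagged = []
    for label, r in labelled_scores:
        predicted = high_group if r > cutoff else low_group
        if predicted != label:
            flagged.append((label, r, predicted))
    return flagged

# Hypothetical (original label, score) pairs and cutoff, for illustration only.
labelled_scores = [
    ("beach", 4.1), ("beach", 3.6), ("beach", 2.2),
    ("offshore", 1.4), ("offshore", 3.1), ("offshore", 1.9),
]
cutoff = 2.75

flagged = flag_mismatches(labelled_scores, cutoff, "beach", "offshore")
for label, r, predicted in flagged:
    print(f"labelled {label}, score {r}, looks more like {predicted}")
```

A flagged observation is not necessarily mislabelled; it only has sorting and grain size characteristics closer to those of the other group.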

Discriminant analysis allows you to maximize the difference between clusters in the multidimensional space defined by the measured variables.

Discriminant analysis provides a one-dimensional measure of cluster separation along the discriminant score axis.
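One possible way to put a number on that separation, offered here only as an illustrative sketch and not necessarily the measure used in the lecture or in Davis, is the distance between the two group mean scores expressed in units of the pooled standard deviation of the scores. The scores below are invented.

```python
import math

def separation(scores_a, scores_b):
    """Distance between group mean scores in pooled standard deviations."""
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    ss_a = sum((s - mean_a) ** 2 for s in scores_a)
    ss_b = sum((s - mean_b) ** 2 for s in scores_b)
    pooled_var = (ss_a + ss_b) / (len(scores_a) + len(scores_b) - 2)
    return abs(mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical discriminant scores for two groups, for illustration only.
scores_a = [4.0, 5.0, 6.0]
scores_b = [1.0, 2.0, 3.0]
print("separation:", separation(scores_a, scores_b))
```

Larger values mean the two clusters overlap less along the score axis.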

The data provided by Davis consists of three columns: 1) Group (A (beach) or B (offshore)), 2) median grain size, 3) sorting coefficient.
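A three-column table like this can be split into its two groups with a short script. The sketch below assumes a whitespace-delimited layout with no header row, which may not match the actual file, and the rows shown are invented placeholders rather than the Davis values.

```python
def read_groups(lines):
    """Group (grain size, sorting) pairs by the label in the first column."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        label = fields[0]
        grain_size = float(fields[1])
        sorting = float(fields[2])
        groups.setdefault(label, []).append((grain_size, sorting))
    return groups

# Hypothetical rows: group, median grain size, sorting coefficient.
sample_lines = [
    "A 0.33 1.08",
    "A 0.34 1.10",
    "B 0.32 1.24",
    "",
    "B 0.31 1.25",
]
groups = read_groups(sample_lines)
for label in sorted(groups):
    print(label, groups[label])
```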

Using the original classifications, the two groups plot as shown below.

[Figure: Beach Sands vs. Offshore Sands. Horizontal axis labeled Porosity (0.31 to 0.36); vertical axis labeled Sorting (1 to 1.35).]

In today’s lab we’ll show you how to use the statistical analysis package Minitab and work through examples in multiple linear regression and discriminant analysis using data sets from Davis (2002).

Note that Minitab has abundant help files that will provide answers to many of your questions.

Help files for Multiple Regression and Discriminant Analysis are included in today’s handout.

The simple exercises begun in today’s class should be handed in on Thursday.
