Page 1: Applied Multivariate Analysis - Vaasan yliopistolipas.uwasa.fi/~sjp/Teaching/mva/lectures/c9.pdf · Multivariate Statistical Methods, 3rd ed. McGraw-Hill X1 = Working Capital / Total

Applied Multivariate Analysis

Seppo Pynnonen

Department of Mathematics and Statistics, University of Vaasa, Finland

Spring 2017

Seppo Pynnonen Applied Multivariate Analysis


Discriminant analysis


1 Discriminant analysis

Background

General Setup for the Discriminant Analysis

Descriptive Discriminant Analysis

Number of Discriminant Functions


Background

Example 1

Consider the following data on financial ratios of solvent and bankrupt companies.

Financial Ratios of Bankrupt and Solvent Companies, Altman (1968)

Source: Morrison (1990). Multivariate Statistical Methods, 3rd ed. McGraw-Hill.

X1 = Working Capital / Total Assets

X2 = Retained Earnings / Total Assets

X3 = Earnings Before Interest and Taxes / Total Assets

X4 = Market Value of Equity / Total Value of Liabilities

X5 = Sales / Total Assets

Group: 1 = Bankrupt, 2 = Solvent


Group X1 X2 X3 X4 X5

1 36.7 -62.8 -89.5 54.1 1.7

1 24.0 3.3 -3.5 20.9 1.1

1 -61.6 -120.8 -103.2 24.7 2.5

1 -1.0 -18.1 -28.8 36.2 1.1

1 18.9 -3.8 -50.6 26.4 0.9

1 -57.2 -61.2 -56.6 11.0 1.7

1 3.0 -20.3 -17.4 8.0 1.0

1 -5.1 -194.5 -25.8 6.5 0.5

1 17.9 20.8 -4.3 22.6 1.0

1 5.4 -106.1 -22.9 23.8 1.5

1 23.0 -39.4 -35.7 69.1 1.2

1 -67.6 -164.1 -17.7 8.7 1.3

1 -185.1 -308.9 -65.8 35.7 0.8

1 13.5 7.2 -22.6 96.1 2.0

1 -5.7 -118.3 -34.2 21.7 1.5

1 72.4 -185.9 -280.0 12.5 6.7

1 17.0 -34.6 -19.4 35.5 3.4

1 -31.2 -27.9 6.3 7.0 1.3

1 14.1 -48.2 6.8 16.6 1.6

1 -60.6 -49.2 -17.2 7.2 0.3

1 26.2 -19.2 -36.7 90.4 0.8

1 7.0 -18.1 -6.5 16.5 0.9

1 53.1 -98.0 -20.8 26.6 1.7

1 -17.2 -129.0 -14.2 267.9 1.3

1 32.7 -4.0 -15.8 177.4 2.1

1 26.7 -8.7 -36.3 32.5 2.8

1 -7.7 -59.2 -12.8 21.3 2.1

1 18.0 -13.1 -17.6 14.6 0.9

1 2.0 -38.0 1.6 7.7 1.2

1 -35.3 -57.9 0.7 13.7 0.8

1 5.1 -8.8 -9.1 100.9 0.9

1 0.0 -64.7 -4.0 0.7 0.1

1 25.2 -11.4 4.8 7.0 0.9


2 35.2 43.0 16.4 99.1 1.3

2 38.8 47.0 16.0 126.5 1.9

2 14.0 -3.3 4.0 91.7 2.7

2 55.1 35.0 20.8 72.3 1.9

2 59.3 46.7 12.6 724.1 0.9

2 33.6 20.8 12.5 152.8 2.4

2 52.8 33.0 23.6 475.9 1.5

2 45.6 26.1 10.4 287.9 2.1

2 47.4 68.6 13.8 581.3 1.6

2 40.0 37.3 33.4 228.8 3.5

2 69.0 59.0 23.1 406.0 5.5

2 34.2 49.6 23.8 126.6 1.9

2 47.0 12.5 7.0 53.4 1.8

2 15.4 37.3 34.1 570.1 1.5

2 56.9 35.3 4.2 240.3 0.9

2 43.8 49.5 25.1 115.0 2.6

2 20.7 18.1 13.5 63.1 4.0

2 33.8 31.4 15.7 144.8 1.9

2 35.8 21.5 -14.4 90.0 1.0

2 24.4 8.5 5.8 149.1 1.5

2 48.9 40.6 5.8 82.0 1.8

2 49.9 34.6 26.4 310.0 1.8

2 54.8 19.9 26.7 239.9 2.3

2 39.0 17.4 12.6 60.5 1.3

2 53.0 54.7 14.6 771.7 1.7

2 20.1 53.5 20.6 307.5 1.1

2 53.7 35.6 26.4 289.5 2.0

2 46.1 39.4 30.5 700.0 1.9

2 48.3 53.1 7.1 164.4 1.9

2 46.7 39.8 13.8 229.1 1.2

2 60.3 59.5 7.0 226.6 2.0

2 17.9 16.3 20.4 105.6 1.0

2 24.7 21.7 -7.8 118.6 1.6


Relevant questions then are:

How do the companies in these two groups differ from each other?

Which ratios best discriminate the groups?

Are the ratios useful for predicting bankruptcies?

Partial answers can be obtained by examining one variable at a time.


For example, the sample statistics for each group are:
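The single-variable summaries can be sketched in Python with NumPy. This is an illustration only, not the software used for these slides, and the function names `group_summary` and `one_way_f` are hypothetical. The snippet takes X1 (Working Capital / Total Assets) from the data listing above. For each group it computes the mean, the variance with divisor n − 1, and the standard deviation. It also computes the one-way ANOVA F statistic for equal group means. The SAS output later in this section reports the same quantities for X1: group means −2.83030 and 41.40000, and F = 27.9892.

```python
import numpy as np

# X1 (Working Capital / Total Assets) for the 33 bankrupt (group 1)
# and 33 solvent (group 2) companies, copied from the data listing above.
x1_bankrupt = np.array([
    36.7, 24.0, -61.6, -1.0, 18.9, -57.2, 3.0, -5.1, 17.9, 5.4, 23.0,
    -67.6, -185.1, 13.5, -5.7, 72.4, 17.0, -31.2, 14.1, -60.6, 26.2, 7.0,
    53.1, -17.2, 32.7, 26.7, -7.7, 18.0, 2.0, -35.3, 5.1, 0.0, 25.2])
x1_solvent = np.array([
    35.2, 38.8, 14.0, 55.1, 59.3, 33.6, 52.8, 45.6, 47.4, 40.0, 69.0,
    34.2, 47.0, 15.4, 56.9, 43.8, 20.7, 33.8, 35.8, 24.4, 48.9, 49.9,
    54.8, 39.0, 53.0, 20.1, 53.7, 46.1, 48.3, 46.7, 60.3, 17.9, 24.7])

def group_summary(x):
    """Return (n, mean, variance, std dev), using the n - 1 divisor."""
    return len(x), x.mean(), x.var(ddof=1), x.std(ddof=1)

def one_way_f(*groups):
    """One-way ANOVA F statistic for equality of the group means."""
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

for name, x in [("bankrupt", x1_bankrupt), ("solvent", x1_solvent)]:
    n, m, v, s = group_summary(x)
    print(f"{name}: n={n} mean={m:.5f} var={v:.4f} sd={s:.5f}")
print(f"F = {one_way_f(x1_bankrupt, x1_solvent):.4f}")
```

Repeating the calls for the other columns gives the variable-by-variable screening described above. The discriminant analysis that follows goes further by combining all the variables.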


Some graphics may also be helpful. For example,

A more complete use of the group-separation information, however, is provided by discriminant analysis (DA).


General Setup for the Discriminant Analysis

Discriminant analysis is used for two purposes:

(1) describing major differences among the groups, and

(2) classifying subjects on the basis of measurements.


Descriptive Discriminant Analysis

The starting setup:

p variables

q mutually exclusive groups


The goal of descriptive DA is to form k new variables such that

(1) the new variables are uncorrelated, and

(2) the first new variable has the best discriminating power with respect to the given groups, the second has the second-best discriminating power and is uncorrelated with the first, the third has the third-best discriminating power and is uncorrelated with the previous ones, etc.

Remark 1

k ≤ min(p, q − 1). For example, if q = 2 then k = min(p, 1) = 1.


More precisely, suppose we have observations on random variables $x_1, \ldots, x_p$ from $q$ groups.

Then the $j$th discriminant function is defined as a linear combination of the original variables

$$y_j = a_{j1}x_1 + \cdots + a_{jp}x_p, \qquad (1)$$

such that $\operatorname{corr}[y_j, y_\ell] = 0$ for $j \neq \ell$, and $y_1$ has the best discriminating power, $y_2$ the second best, and so on.


Remark 2

In the basic case, the groups are assumed to differ only with respect to the means of the variables.

As a consequence, the variances of the variables and the correlations between them are assumed to be the same in all groups (the groups have a common covariance structure).


The idea in deriving the discriminant functions is to divide the total variation into between-group and within-group variation:

$$T = B + W, \qquad (2)$$

where $T$ denotes the total, $B$ the between-group, and $W$ the within-group sums of squares and cross-products matrix. Dividing each matrix by its degrees of freedom gives the corresponding covariance matrix.
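The decomposition (2) can be checked numerically. The sketch below uses Python with NumPy; `sscp_decomposition` is a hypothetical helper, and the data are made up for illustration. It builds T from deviations about the grand mean, W from deviations about each group's own mean, and B from the group-mean deviations weighted by group size.

```python
import numpy as np

def sscp_decomposition(X, groups):
    """Return total (T), between-group (B) and within-group (W) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        d = mean_g - grand_mean
        B += len(Xg) * np.outer(d, d)   # group size times outer product of mean deviation
        C = Xg - mean_g
        W += C.T @ C                    # deviations from the group's own mean
    D = X - grand_mean
    T = D.T @ D                         # deviations from the grand mean
    return T, B, W

# Hypothetical toy data: two groups, two variables
X = [[1, 2], [2, 3], [3, 5], [6, 7], [7, 8], [8, 10]]
groups = [1, 1, 1, 2, 2, 2]
T, B, W = sscp_decomposition(X, groups)
print(np.allclose(T, B + W))  # True
```

The identity holds exactly for these sums-of-squares matrices. The covariance versions use different divisors: n − 1 for T, n − q for W, and q − 1 for B.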


Technically, the problem again reduces to an eigenvalue problem.

In this case the eigenvalues are extracted from the matrix

$$BW^{-1}. \qquad (3)$$

The coefficients of the discriminant functions $y_j$, $j = 1, \ldots, k$ with $k = \min(q - 1, p)$, are the eigenvectors $a_j$ of $W^{-1}B$, which has the same eigenvalues as $BW^{-1}$. Equivalently, $B a_j = \lambda_j W a_j$.

The functions are called canonical discriminant functions.
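Below is a minimal sketch of this eigenvalue problem, assuming NumPy and made-up data; `between_within` and `canonical_discriminant` are hypothetical names. It solves $W^{-1}B a = \lambda a$ and keeps the $k = \min(p, q-1)$ largest eigenvalues. It then rescales each coefficient vector so that the resulting function has unit pooled within-group variance. That scaling is an assumption of this sketch. It is chosen because it matches the normalization of the raw canonical coefficients in the SAS output below.

```python
import numpy as np

def between_within(X, groups):
    """Between-group (B) and within-group (W) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B, W = np.zeros((p, p)), np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        m = Xg.mean(axis=0)
        B += len(Xg) * np.outer(m - grand_mean, m - grand_mean)
        W += (Xg - m).T @ (Xg - m)
    return B, W

def canonical_discriminant(X, groups):
    """Eigenvalues and coefficient vectors a_j solving B a = lambda W a."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    q = len(np.unique(groups))
    B, W = between_within(X, groups)
    k = min(p, q - 1)
    # W^{-1}B is not symmetric, but its eigenvalues are real and >= 0,
    # so any imaginary parts are numerical noise and are dropped.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    evals, evecs = evals.real, evecs.real
    order = np.argsort(evals)[::-1][:k]
    evals, A = evals[order], evecs[:, order]
    # Rescale so that a_j' S_w a_j = 1 (unit pooled within-group variance)
    S_w = W / (n - q)
    A = A / np.sqrt(np.sum(A * (S_w @ A), axis=0))
    return evals, A

# Hypothetical toy data: three groups, two variables, so k = 2
X = [[1, 2], [2, 1], [3, 3], [6, 5], [7, 7], [8, 6], [2, 8], [3, 9], [4, 7]]
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
evals, A = canonical_discriminant(X, groups)
print(evals)
print(A)
```

The eigenvectors belonging to distinct eigenvalues are W-orthogonal. The resulting functions are therefore uncorrelated within groups, which is requirement (1) of the goal stated earlier.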


Example 2

Consider the bankruptcy data, analyzed with SAS PROC CANDISC or SPSS (Analyze → Classify → Discriminant). Below are the SAS results.

Example: Discriminant analysis applied to bankrupt data

Canonical Discriminant Analysis

66 Observations 65 DF Total

5 Variables 64 DF Within Classes

2 Classes 1 DF Between Classes

Class Level Information

GROUP Frequency Weight Proportion

1 33 33.0000 0.500000

2 33 33.0000 0.500000


Canonical Discriminant Analysis Within-Class Covariance Matrices

GROUP = 1 DF = 32

Variable X1 X2 X3 X4 X5

X1 2104.5659 1834.1637 -266.4029 249.8980 18.0357

X2 1834.1637 5085.4767 1632.2018 177.7665 -15.6653

X3 -266.4029 1632.2018 2637.1822 168.3066 -46.6066

X4 249.8980 177.7665 168.3066 3018.2188 1.6108

X5 18.0357 -15.6653 -46.6066 1.6108 1.3509

GROUP = 2 DF = 32

Variable X1 X2 X3 X4 X5

X1 201.986 117.413 16.740 974.165 1.921

X2 117.413 272.496 52.076 1630.092 0.879

X3 16.740 52.076 118.108 814.591 2.762

X4 974.165 1630.092 814.591 42669.190 -14.529

X5 1.921 0.879 2.762 -14.529 0.865


Canonical Discriminant Analysis

Simple Statistics

Total-Sample

Variable N Mean Variance Std Dev

X1 66 19.28485 1632 40.39972

X2 66 -13.63485 5064 71.15836

X3 66 -8.23182 1920 43.81308

X4 66 147.35909 34186 184.89362

X5 66 1.72121 1.13924 1.06735

GROUP = 1

Variable N Mean Variance Std Dev

X1 33 -2.83030 2105 45.87555

X2 33 -62.51212 5085 71.31253

X3 33 -31.78182 2637 51.35350

X4 33 40.04545 3018 54.93832

X5 33 1.50303 1.35093 1.16229

GROUP = 2

Variable N Mean Variance Std Dev

X1 33 41.40000 201.98563 14.21216

X2 33 35.24242 272.49627 16.50746

X3 33 15.31818 118.10841 10.86777

X4 33 254.67273 42669 206.56522

X5 33 1.93939 0.86496 0.93003


Univariate Test Statistics

F Statistics, Num DF= 1 Den DF= 64

Total Pooled Between RSQ/

Variable STD STD STD R-Squared (1-RSQ)

X1 40.3997 33.9599 31.2755 0.304266 0.4373

X2 71.1584 51.7589 69.1229 0.479063 0.9196

X3 43.8131 37.1166 33.3047 0.293363 0.4152

X4 184.8936 151.1413 151.7644 0.342055 0.5199

X5 1.0673 1.0526 0.3086 0.042428 0.0443

Univariate Test Statistics

Variable F Pr > F

X1 27.9892 0.0001

X2 58.8555 0.0001

X3 26.5698 0.0001

X4 33.2726 0.0001

X5 2.8357 0.0971

Average R-Squared: Unweighted = 0.2922351

Weighted by Variance = 0.3546308

Multivariate Statistics and Exact F Statistics

S=1 M=1.5 N=29

Statistic Value F Num DF Den DF Pr > F

Wilks’ Lambda 0.369760775 20.4534 5 60 0.0001

Pillai’s Trace 0.630239225 20.4534 5 60 0.0001

Hotelling-Lawley Trace 1.704451275 20.4534 5 60 0.0001

Roy’s Greatest Root 1.704451275 20.4534 5 60 0.0001
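With two groups there is only one nonzero eigenvalue λ. Here λ = 1.704451275, the Hotelling-Lawley trace in the table above, and all four statistics are functions of it:

- Wilks' Λ = 1/(1 + λ).
- Pillai's trace = λ/(1 + λ), which is also the squared canonical correlation.
- Hotelling-Lawley trace = Roy's greatest root = λ (as SAS reports Roy's root).

The exact F statistic is F = λ(n − p − 1)/p with (p, n − p − 1) degrees of freedom. A small Python sketch checks this against the output; the function name is hypothetical.

```python
def two_group_statistics(lam, n, p):
    """Multivariate test statistics when q = 2 and lam is the single eigenvalue of W^{-1}B."""
    wilks = 1.0 / (1.0 + lam)
    pillai = lam / (1.0 + lam)        # equals the squared canonical correlation
    hotelling_lawley = lam
    roy = lam                         # SAS reports the largest eigenvalue itself
    df1, df2 = p, n - p - 1
    f_exact = lam * df2 / df1
    return {"wilks": wilks, "pillai": pillai, "hotelling_lawley": hotelling_lawley,
            "roy": roy, "F": f_exact, "df": (df1, df2)}

stats = two_group_statistics(1.704451275, n=66, p=5)
for key, value in stats.items():
    print(key, value)
```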


Example: Discriminant analysis applied to bankrupt data

Canonical Discriminant Analysis

Adjusted Approx Squared

Canonical Canonical Standard Canonical

Correlation Correlation Error Correlation

1 0.793876 0.781803 0.045863 0.630239

Eigenvalues of INV(E)*H

= CanRsq/(1-CanRsq)

Eigenvalue Difference Proportion Cumulative

1 1.7045 . 1.0000 1.0000

Test of H0: The canonical correlations in the

current row and all that follow are zero

Likelihood

Ratio Approx F Num DF Den DF Pr > F

1 0.36976078 20.4534 5 60 0.0001

NOTE: The F statistic is exact.

Total Canonical Structure

CAN1

X1 0.694823

X2 0.871854

X3 0.682260

X4 0.736708

X5 0.259462


Between Canonical Structure

CAN1

X1 1.000000

X2 1.000000

X3 1.000000

X4 1.000000

X5 1.000000

Pooled Within Canonical Structure

CAN1

X1 0.506539

X2 0.734533

X3 0.493528

X4 0.552283

X5 0.161231

Total-Sample Standardized Canonical Coefficients

CAN1

X1 0.1404518774

X2 0.6028563830

X3 0.6695203123

X4 0.5616859665

X5 0.5320432994

Pooled Within-Class Standardized Canonical Coefficients

CAN1

X1 0.1180635365

X2 0.4385036080

X3 0.5671902048

X4 0.4591503359

X5 0.5246858501


Raw Canonical Coefficients

CAN1

X1 0.0034765558

X2 0.0084720383

X3 0.0152812900

X4 0.0030378872

X5 0.4984713894

Class Means on Canonical Variables

GROUP CAN1

1 -1.285613175

2 1.285613175
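The class means on CAN1 follow from the raw coefficients. The mean score of group g is a'(x̄_g − x̄), where x̄_g is the group mean vector and x̄ the total-sample mean vector. This takes the canonical variable to be centered at the total-sample mean, which reproduces the printed values. Below is a small Python/NumPy check using the means and raw coefficients printed above; the variable and function names are illustrative.

```python
import numpy as np

# Raw canonical coefficients and means copied from the SAS output above
raw = np.array([0.0034765558, 0.0084720383, 0.0152812900, 0.0030378872, 0.4984713894])
total_mean = np.array([19.28485, -13.63485, -8.23182, 147.35909, 1.72121])
group_means = {
    1: np.array([-2.83030, -62.51212, -31.78182, 40.04545, 1.50303]),
    2: np.array([41.40000, 35.24242, 15.31818, 254.67273, 1.93939]),
}

def class_mean_scores(a, means, center):
    """Mean canonical score of each group: a'(group mean - total mean)."""
    return {g: float(a @ (m - center)) for g, m in means.items()}

scores = class_mean_scores(raw, group_means, total_mean)
print(scores)
print((scores[2] - scores[1]) ** 2)
```

CAN1 has unit pooled within-group variance. The squared distance between the two class means, about 6.61, is therefore the squared Mahalanobis distance between the two group mean vectors.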


The output includes several coefficient matrices.

The structure matrices give the correlations of the original variables with the discriminant function.

For interpretation, the most useful of these is the pooled within canonical structure.

With more than two groups, the between canonical structure may also give useful additional information. It shows how the group means of the variables are correlated with the group means of the discriminant functions. With two groups there are only two group means, so all its entries are ±1, as in the output above.
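The pooled within canonical structure can be reproduced from the printed within-class covariance matrices and raw coefficients. Let $S$ be the pooled covariance matrix and $a$ the coefficient vector. The within-group correlation of $x_i$ with the score is then $(Sa)_i / \sqrt{S_{ii}\, a'Sa}$. Below is a sketch in Python/NumPy; the function name is hypothetical.

```python
import numpy as np

# Within-group covariance matrices (DF = 32 each) and raw CAN1 coefficients,
# copied from the SAS output above.
S1 = np.array([
    [2104.5659, 1834.1637, -266.4029, 249.8980, 18.0357],
    [1834.1637, 5085.4767, 1632.2018, 177.7665, -15.6653],
    [-266.4029, 1632.2018, 2637.1822, 168.3066, -46.6066],
    [249.8980, 177.7665, 168.3066, 3018.2188, 1.6108],
    [18.0357, -15.6653, -46.6066, 1.6108, 1.3509]])
S2 = np.array([
    [201.986, 117.413, 16.740, 974.165, 1.921],
    [117.413, 272.496, 52.076, 1630.092, 0.879],
    [16.740, 52.076, 118.108, 814.591, 2.762],
    [974.165, 1630.092, 814.591, 42669.190, -14.529],
    [1.921, 0.879, 2.762, -14.529, 0.865]])
raw = np.array([0.0034765558, 0.0084720383, 0.0152812900, 0.0030378872, 0.4984713894])

def pooled_within_structure(covs, dfs, a):
    """Within-group correlations between each variable and the score a'x."""
    S_pooled = sum(df * S for df, S in zip(dfs, covs)) / sum(dfs)
    var_score = a @ S_pooled @ a          # pooled within-group variance of the score
    cov_xy = S_pooled @ a                 # covariance of each x_i with the score
    return cov_xy / np.sqrt(np.diag(S_pooled) * var_score), var_score

structure, var_score = pooled_within_structure([S1, S2], [32, 32], raw)
print(np.round(structure, 4), round(float(var_score), 4))
```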


The standardized coefficients are obtained by multiplying the raw coefficients by the standard deviations of the variables (total-sample or pooled within-class).

These coefficients give the marginal effect of the (standardized) variable on the discriminant function.

Labeling of a discriminant function is based on the variables having the largest correlations and the largest standardized coefficients.
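Standardization can be checked directly from the output. Multiply each raw coefficient by the corresponding standard deviation: the total-sample one (from the simple statistics) or the pooled within-class one (from the univariate test statistics). This reproduces the two standardized coefficient tables to about four decimals. A short Python/NumPy check:

```python
import numpy as np

# Values copied from the bankruptcy SAS output above
raw = np.array([0.0034765558, 0.0084720383, 0.0152812900, 0.0030378872, 0.4984713894])
total_sd = np.array([40.39972, 71.15836, 43.81308, 184.89362, 1.06735])
pooled_sd = np.array([33.9599, 51.7589, 37.1166, 151.1413, 1.0526])

total_standardized = raw * total_sd    # compare: Total-Sample Standardized Canonical Coefficients
pooled_standardized = raw * pooled_sd  # compare: Pooled Within-Class Standardized Canonical Coefficients
print(np.round(total_standardized, 4))
print(np.round(pooled_standardized, 4))
```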


Example 3

From the within canonical structure we observe:

X2 (Retained earnings / Total assets) has the highest correlation with the discriminant function.

X4 (Market value of equity / Total value of liabilities), X1 (Working capital / Total assets), and X3 (Earnings before interest and taxes / Total assets) have the next highest.

The correlation of X5 (Sales / Total assets) is small, but X5 has a large standardized coefficient.

Summing up, profitability and a high market value are the properties that protect a company from bankruptcy.


It should be noted that discriminant analysis rests on two basic assumptions: the variables are (multivariate) normally distributed in each of the groups, and the covariance matrices are the same.

The former assumption is harder to test. The latter is easier (in SPSS, select Box's M from the options).

If the covariance matrices are not the same, linear discriminant function analysis is invalid, and one should move to quadratic discriminant function analysis.

That method, however, is designed for classification purposes.


Example 4

Testing for the equality of the population covariance matrices:

$$H_0: \Sigma_1 = \Sigma_2, \qquad (4)$$

where $\Sigma_i$ is the covariance matrix of population $i$ ($i = 1, 2$).

SPSS gives the result: Test Chi-Square Value = 186.18 with 15 degrees of freedom and p-value = 0.0001.

The null hypothesis is rejected; hence the analysis results should be interpreted with caution.
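Box's M statistic can be sketched as follows in Python with NumPy; the function name `box_m_test` is hypothetical. The pooled covariance is S = Σ(n_i − 1)S_i/(n − q). The statistic is M = (n − q) ln|S| − Σ(n_i − 1) ln|S_i|. The quantity (1 − c)M is approximately χ² with p(p + 1)(q − 1)/2 degrees of freedom, where c = [Σ 1/(n_i − 1) − 1/(n − q)](2p² + 3p − 1)/[6(p + 1)(q − 1)]. For p = 5 and q = 2 this gives the 15 degrees of freedom reported above. Statistical packages may apply slightly different corrections, and the covariance matrices in the output are rounded. The value computed here is therefore not guaranteed to reproduce 186.18 exactly.

```python
import numpy as np

def box_m_test(covs, ns):
    """Box's M test of equal covariance matrices; returns (chi-square statistic, df)."""
    covs = [np.asarray(S, dtype=float) for S in covs]
    dof = np.asarray(ns) - 1                  # n_i - 1 for each group
    g, p = len(covs), covs[0].shape[0]
    total_dof = dof.sum()                     # n - q
    pooled = sum(d * S for d, S in zip(dof, covs)) / total_dof

    def logdet(S):
        return np.linalg.slogdet(S)[1]        # log|S|, matrices assumed positive definite

    M = total_dof * logdet(pooled) - sum(d * logdet(S) for d, S in zip(dof, covs))
    c = (np.sum(1.0 / dof) - 1.0 / total_dof) * (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))
    chi2 = (1.0 - c) * M
    df = p * (p + 1) * (g - 1) // 2
    return float(chi2), df

# Within-group covariance matrices of the bankruptcy data (SAS output above)
S1 = np.array([
    [2104.5659, 1834.1637, -266.4029, 249.8980, 18.0357],
    [1834.1637, 5085.4767, 1632.2018, 177.7665, -15.6653],
    [-266.4029, 1632.2018, 2637.1822, 168.3066, -46.6066],
    [249.8980, 177.7665, 168.3066, 3018.2188, 1.6108],
    [18.0357, -15.6653, -46.6066, 1.6108, 1.3509]])
S2 = np.array([
    [201.986, 117.413, 16.740, 974.165, 1.921],
    [117.413, 272.496, 52.076, 1630.092, 0.879],
    [16.740, 52.076, 118.108, 814.591, 2.762],
    [974.165, 1630.092, 814.591, 42669.190, -14.529],
    [1.921, 0.879, 2.762, -14.529, 0.865]])

chi2, df = box_m_test([S1, S2], [33, 33])
print(f"chi2 = {chi2:.2f}, df = {df}")  # compare with the 5% critical value 24.996 (15 df)
```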


Number of Discriminant Functions

With multiple groups (q > 2), the question is: in how many dimensions do the groups differ?

With two groups this is not a major problem, because the groups can differ in only one dimension.

In general, however, there can be more discriminating dimensions if q > 2.


Example 5

The following data are a classic example concerning three species of iris (setosa, versicolor, and virginica).

The following measures were made:

SL: Sepal length
SW: Sepal width
PL: Petal length
PW: Petal width


The CANDISC procedure produces the following results.

title;

data iris;

title 'Discriminant Analysis of Fisher (1936) Iris Data';

input sepallen sepalwid petallen petalwid spec_no @@;

/* pad to 10 characters so that 'VERSICOLOR' is not truncated */
if spec_no=1 then species='SETOSA    ';

if spec_no=2 then species='VERSICOLOR';

if spec_no=3 then species='VIRGINICA ';

label sepallen='Sepal Length in mm.'

sepalwid='Sepal Width in mm.'

petallen='Petal Length in mm.'

petalwid='Petal Width in mm.';

datalines;

50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3

63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2

59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2

65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3

68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3

77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3

49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2

64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3

55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1

49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1

67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1

77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2

50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1

61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1

61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1

51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1

51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1

.

.

.

;


title 'Canonical Discriminant Analysis of IRIS data';

proc candisc data = iris;

class species;

var sepallen--petalwid;

run;

This gives the following results:

Canonical Discriminant Analysis of IRIS data

Canonical Discriminant Analysis

150 Observations 149 DF Total

4 Variables 147 DF Within Classes

3 Classes 2 DF Between Classes

Class Level Information

SPECIES Frequency Weight Proportion

SETOSA 50 50.0000 0.333333

VERSICOLOR 50 50.0000 0.333333

VIRGINICA 50 50.0000 0.333333

Canonical Discriminant Analysis

Multivariate Statistics and F Approximations

S=2 M=0.5 N=71

Statistic Value F Num DF Den DF Pr > F

Wilks’ Lambda 0.023438631 199.145 8 288 0.0001

Pillai’s Trace 1.191898825 53.4665 8 290 0.0001

Hotelling-Lawley Trace 32.47732024 580.532 8 286 0.0001

Roy’s Greatest Root 32.1919292 1166.96 4 145 0.0001

NOTE: F Statistic for Roy’s Greatest Root is an upper bound.

NOTE: F Statistic for Wilks’ Lambda is exact.


Adjusted Approx Squared

Canonical Canonical Standard Canonical

Correlation Correlation Error Correlation

1 0.984821 0.984508 0.002468 0.969872

2 0.471197 0.461445 0.063734 0.222027

Eigenvalues of INV(E)*H

= CanRsq/(1-CanRsq)

Eigenvalue Difference Proportion Cumulative

1 32.1919 31.9065 0.9912 0.9912

2 0.2854 . 0.0088 1.0000

Test of H0: The canonical correlations in the

current row and all that follow are zero

Likelihood

Ratio Approx F Num DF Den DF Pr > F

1 0.02343863 199.1453 8 288 0.0001

2 0.77797337 13.7939 3 145 0.0001

Total Canonical Structure

CAN1 CAN2

SEPALLEN 0.791888 0.217593 Sepal Length in mm.

SEPALWID -0.530759 0.757989 Sepal Width in mm.

PETALLEN 0.984951 0.046037 Petal Length in mm.

PETALWID 0.972812 0.222902 Petal Width in mm.


Between Canonical Structure

CAN1 CAN2

SEPALLEN 0.991468 0.130348 Sepal Length in mm.

SEPALWID -0.825658 0.564171 Sepal Width in mm.

PETALLEN 0.999750 0.022358 Petal Length in mm.

PETALWID 0.994044 0.108977 Petal Width in mm.

Pooled Within Canonical Structure

CAN1 CAN2

SEPALLEN 0.222596 0.310812 Sepal Length in mm.

SEPALWID -0.119012 0.863681 Sepal Width in mm.

PETALLEN 0.706065 0.167701 Petal Length in mm.

PETALWID 0.633178 0.737242 Petal Width in mm.


Total-Sample Standardized Canonical Coefficients

CAN1 CAN2

SEPALLEN -0.686779533 0.019958173 Sepal Length in mm.

SEPALWID -0.668825075 0.943441829 Sepal Width in mm.

PETALLEN 3.885795047 -1.645118866 Petal Length in mm.

PETALWID 2.142238715 2.164135931 Petal Width in mm.

Pooled Within-Class Standardized Canonical Coefficients

CAN1 CAN2

SEPALLEN -.4269548486 0.0124075316 Sepal Length in mm.

SEPALWID -.5212416758 0.7352613085 Sepal Width in mm.

PETALLEN 0.9472572487 -.4010378190 Petal Length in mm.

PETALWID 0.5751607719 0.5810398645 Petal Width in mm.

Raw Canonical Coefficients

CAN1 CAN2

SEPALLEN -.0829377642 0.0024102149 Sepal Length in mm.

SEPALWID -.1534473068 0.2164521235 Sepal Width in mm.

PETALLEN 0.2201211656 -.0931921210 Petal Length in mm.

PETALWID 0.2810460309 0.2839187853 Petal Width in mm.

Class Means on Canonical Variables

SPECIES CAN1 CAN2

SETOSA -7.607599927 0.215133017

VERSICOLOR 1.825049490 -0.727899622

VIRGINICA 5.782550437 0.512766605
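All four multivariate statistics are functions of the eigenvalues λ_j of INV(E)*H:

- Wilks' Λ = ∏ 1/(1 + λ_j).
- Pillai's trace = Σ λ_j/(1 + λ_j).
- Hotelling-Lawley trace = Σ λ_j.
- Roy's greatest root = max λ_j (as SAS reports it).

The squared canonical correlations are λ_j/(1 + λ_j). The sketch below (Python/NumPy; the function name is hypothetical) checks this against the iris output. The second eigenvalue is taken as the Hotelling-Lawley trace minus Roy's root, because the eigenvalue table shows it only to four decimals.

```python
import numpy as np

def multivariate_statistics(eigenvalues):
    """Wilks, Pillai, Hotelling-Lawley and Roy statistics from the eigenvalues of W^{-1}B."""
    lam = np.asarray(eigenvalues, dtype=float)
    can_rsq = lam / (1.0 + lam)              # squared canonical correlations
    return {
        "wilks": float(np.prod(1.0 / (1.0 + lam))),
        "pillai": float(can_rsq.sum()),
        "hotelling_lawley": float(lam.sum()),
        "roy": float(lam.max()),
        "can_rsq": can_rsq,
        "proportion": lam / lam.sum(),       # share of between-group variation per function
    }

# Iris data: Roy's root, and Hotelling-Lawley trace minus Roy's root
lam = [32.1919292, 32.47732024 - 32.1919292]
stats = multivariate_statistics(lam)
print(stats)
```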


The Wilks' lambda tests indicate that there are two statistically significant discriminators at the five percent level.

In general, as in factor analysis, the hypotheses to be tested are

$$H_0: \text{the number of discriminators} = m \quad \text{vs.} \quad H_1: \text{more are needed}. \qquad (5)$$

On the basis of the within matrices, the first discriminator indicates that the species differ with respect to the overall size of the leaves. The second discriminator indicates that the species also differ with respect to the width of the leaves.
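A common large-sample way to test hypotheses (5) sequentially is Bartlett's chi-square approximation. SAS instead reports an F approximation, so the numbers differ. For m = 0, 1, ..., k − 1, let Λ_m = ∏_{j>m} 1/(1 + λ_j). Then −[n − 1 − (p + q)/2] ln Λ_m is approximately χ² with (p − m)(q − m − 1) degrees of freedom. The sketch below (Python standard library; the function name is hypothetical) applies the test to the iris eigenvalues. It compares each statistic with the 5% χ² critical values 15.507 (8 df) and 7.815 (3 df).

```python
import math

def bartlett_sequential(eigenvalues, n, p, q):
    """Bartlett chi-square tests of H0: only the first m discriminant functions are needed."""
    factor = n - 1 - (p + q) / 2
    results = []
    for m in range(len(eigenvalues)):
        wilks_m = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues[m:])
        chi2 = -factor * math.log(wilks_m)
        df = (p - m) * (q - m - 1)
        results.append((m, wilks_m, chi2, df))
    return results

# Iris data: n = 150, p = 4, q = 3
eigenvalues = [32.1919292, 32.47732024 - 32.1919292]
critical_5pct = {8: 15.507, 3: 7.815}   # chi-square 0.95 quantiles
for m, wilks_m, chi2, df in bartlett_sequential(eigenvalues, n=150, p=4, q=3):
    print(f"m={m}: Lambda={wilks_m:.6f} chi2={chi2:.2f} df={df} "
          f"reject={chi2 > critical_5pct[df]}")
```

Both hypotheses are rejected, which agrees with the conclusion drawn from the SAS F tests that two discriminant functions are needed.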


Example 9.6: Bankruptcy risk and signal to reorganization of a company (Laitinen, Luoma, Pynnonen 1996, UV, Discussion Papers 200)

Thus we have four groups.


Sample statistics:

Table 7. Descriptive statistics of groups for estimation data.

Variable  B1 (n=20)        B2 (n=20)        N3 (n=17)        N4 (n=23)        F for equality
          Mean    Std Dev  Mean    Std Dev  Mean    Std Dev  Mean    Std Dev  of means
ROI       -10.24  8.60     3.52    5.59     2.27    7.14     12.02   5.96     37.66***
TCF       -13.32  10.83    0.13    2.31     0.97    5.00     6.47    5.67     32.48***
QRA       0.58    0.39     0.57    0.55     1.14    0.70     0.85    0.42     4.95**
SCA       -0.61   20.22    -4.75   18.79    13.62   13.19    23.13   19.55    10.39***
DSR       1.09    0.55     0.69    0.25     0.88    0.34     0.57    0.28     7.62***

** = significant at level 0.01, *** = significant at level 0.001


Number of canonical discriminant functions:

The results indicate that the third canonical discriminant function is also statistically significant.


Canonical structure and standardized coefficients:

Table 11. Canonical structure and standardized canonical coefficients, both pooled within.

          Canonical structure*        Standardized coefficient
Variable  CAN1    CAN2    CAN3        CAN1    CAN2    CAN3
ROI       0.702   0.036   0.004       0.717   0.013   -0.737
TCF       0.643   0.059   0.467       0.372   -0.458  0.983
QRA       0.101   0.513   0.653       -0.061  0.563   0.661
SCA       0.252   0.773   -0.168      0.169   0.946   -0.522
DSR       -0.306  0.203   0.149       -0.722  0.034   0.16

*Correlation coefficients between original variables and canonical variables.


Interpretation of the discriminant functions:


Group differences:


CAN1 (financial performance) shows that financial performance is the main characteristic differentiating healthy and bankrupt firms (as expected).

CAN2, a contrast between dynamic liquidity and static ratios, is the characteristic differentiating reorganizable non-bankrupt firms from reorganizable bankrupt firms.

CAN3, a contrast between liquidity and the other ratios, differentiates reorganizable non-bankrupt firms from healthy firms. The distinction is probably due to the fact that non-bankrupt firms may have cash reserves (high liquidity) but do not use them profitably.
