  • Applied Multivariate Analysis

    Seppo Pynnönen

    Department of Mathematics and Statistics, University of Vaasa, Finland

    Spring 2017

    Seppo Pynnönen Applied Multivariate Analysis

  • Discriminant analysis

    Discriminant Analysis

    1 Discriminant analysis

    Background

    General Setup for the Discriminant Analysis

    Descriptive Discriminant Analysis

    Number of Discriminant Functions

  • Discriminant analysis

    Background

    Example 1

    Consider the following data on financial ratios for solvent and bankrupt companies.

    Financial Ratios of Bankrupt and Solvent Companies, Altman (1968)

    Source: Morrison (1990). Multivariate Statistical Methods, 3rd ed. McGraw-Hill.

    X1 = Working Capital / Total Assets

    X2 = Retained Earnings / Total Assets

    X3 = Earnings Before Interest and Taxes / Total Assets

    X4 = Market Value of Equity / Total Value of Liabilities

    X5 = Sales / Total Assets

    Group: 1 = Bankrupt, 2 = Solvent

    Group      X1      X2      X3      X4      X5
        1    36.7   -62.8   -89.5    54.1     1.7
        1    24.0     3.3    -3.5    20.9     1.1
        1   -61.6  -120.8  -103.2    24.7     2.5
        1    -1.0   -18.1   -28.8    36.2     1.1
        1    18.9    -3.8   -50.6    26.4     0.9
        1   -57.2   -61.2   -56.6    11.0     1.7
        1     3.0   -20.3   -17.4     8.0     1.0
        1    -5.1  -194.5   -25.8     6.5     0.5
        1    17.9    20.8    -4.3    22.6     1.0
        1     5.4  -106.1   -22.9    23.8     1.5
        1    23.0   -39.4   -35.7    69.1     1.2
        1   -67.6  -164.1   -17.7     8.7     1.3
        1  -185.1  -308.9   -65.8    35.7     0.8
        1    13.5     7.2   -22.6    96.1     2.0
        1    -5.7  -118.3   -34.2    21.7     1.5
        1    72.4  -185.9  -280.0    12.5     6.7
        1    17.0   -34.6   -19.4    35.5     3.4
        1   -31.2   -27.9     6.3     7.0     1.3
        1    14.1   -48.2     6.8    16.6     1.6
        1   -60.6   -49.2   -17.2     7.2     0.3
        1    26.2   -19.2   -36.7    90.4     0.8
        1     7.0   -18.1    -6.5    16.5     0.9
        1    53.1   -98.0   -20.8    26.6     1.7
        1   -17.2  -129.0   -14.2   267.9     1.3
        1    32.7    -4.0   -15.8   177.4     2.1
        1    26.7    -8.7   -36.3    32.5     2.8
        1    -7.7   -59.2   -12.8    21.3     2.1
        1    18.0   -13.1   -17.6    14.6     0.9
        1     2.0   -38.0     1.6     7.7     1.2
        1   -35.3   -57.9     0.7    13.7     0.8
        1     5.1    -8.8    -9.1   100.9     0.9
        1     0.0   -64.7    -4.0     0.7     0.1
        1    25.2   -11.4     4.8     7.0     0.9
        2    35.2    43.0    16.4    99.1     1.3
        2    38.8    47.0    16.0   126.5     1.9
        2    14.0    -3.3     4.0    91.7     2.7
        2    55.1    35.0    20.8    72.3     1.9
        2    59.3    46.7    12.6   724.1     0.9
        2    33.6    20.8    12.5   152.8     2.4
        2    52.8    33.0    23.6   475.9     1.5
        2    45.6    26.1    10.4   287.9     2.1
        2    47.4    68.6    13.8   581.3     1.6
        2    40.0    37.3    33.4   228.8     3.5
        2    69.0    59.0    23.1   406.0     5.5
        2    34.2    49.6    23.8   126.6     1.9
        2    47.0    12.5     7.0    53.4     1.8
        2    15.4    37.3    34.1   570.1     1.5
        2    56.9    35.3     4.2   240.3     0.9
        2    43.8    49.5    25.1   115.0     2.6
        2    20.7    18.1    13.5    63.1     4.0
        2    33.8    31.4    15.7   144.8     1.9
        2    35.8    21.5   -14.4    90.0     1.0
        2    24.4     8.5     5.8   149.1     1.5
        2    48.9    40.6     5.8    82.0     1.8
        2    49.9    34.6    26.4   310.0     1.8
        2    54.8    19.9    26.7   239.9     2.3
        2    39.0    17.4    12.6    60.5     1.3
        2    53.0    54.7    14.6   771.7     1.7
        2    20.1    53.5    20.6   307.5     1.1
        2    53.7    35.6    26.4   289.5     2.0
        2    46.1    39.4    30.5   700.0     1.9
        2    48.3    53.1     7.1   164.4     1.9
        2    46.7    39.8    13.8   229.1     1.2
        2    60.3    59.5     7.0   226.6     2.0
        2    17.9    16.3    20.4   105.6     1.0
        2    24.7    21.7    -7.8   118.6     1.6

    Relevant questions then are:

    How do the companies in these two groups differ from each other?

    Which ratios best discriminate the groups?

    Are the ratios useful for predicting bankruptcies?

    Partial answers can be obtained by examining each variable one at a time.

    For example, sample statistics can be compared for each group.
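    As a minimal sketch of such a per-group summary, the Python snippet below computes the mean and standard deviation of each ratio within each group. Only the first three rows of each group from the data table are typed in, so the numbers are purely illustrative and are not the full-sample statistics; the names `rows`, `ratios`, and `group_stats` are hypothetical.

```python
from statistics import mean, stdev

# First three rows of each group from the data table (illustrative subset only).
rows = [
    (1, 36.7, -62.8, -89.5, 54.1, 1.7),
    (1, 24.0, 3.3, -3.5, 20.9, 1.1),
    (1, -61.6, -120.8, -103.2, 24.7, 2.5),
    (2, 35.2, 43.0, 16.4, 99.1, 1.3),
    (2, 38.8, 47.0, 16.0, 126.5, 1.9),
    (2, 14.0, -3.3, 4.0, 91.7, 2.7),
]
ratios = ["X1", "X2", "X3", "X4", "X5"]


def group_stats(rows):
    """Return {group: {ratio: (mean, sd)}} for the given rows."""
    stats = {}
    for g in sorted({r[0] for r in rows}):
        members = [r[1:] for r in rows if r[0] == g]
        columns = list(zip(*members))  # one tuple of values per ratio
        stats[g] = {
            name: (mean(col), stdev(col)) for name, col in zip(ratios, columns)
        }
    return stats


stats = group_stats(rows)
for g, by_ratio in stats.items():
    label = "Bankrupt" if g == 1 else "Solvent"
    for name, (m, s) in by_ratio.items():
        print(f"{label:8s} {name}: mean = {m:8.2f}, sd = {s:8.2f}")
```

    Running the same function on all 66 rows would give the full per-group summary.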

    Some graphics may also be helpful.

    A more complete use of the group separation information, however, is provided by discriminant analysis (DA).

  • Discriminant analysis

    General Setup for the Discriminant Analysis

    Discriminant analysis is used for two purposes:

    (1) describing major differences among the groups, and

    (2) classifying subjects on the basis of measurements.

  • Discriminant analysis

    Descriptive Discriminant Analysis

    The starting setup:

    p variables

    q mutually exclusive groups

    The goal of the descriptive DA is:

    Form k new variables such that

    1 The new variables are uncorrelated.

    2 The first new variable has the best discriminating power with respect to the given groups. The second new variable has the second best discriminating power and is uncorrelated with the first one, the third has the third best discriminating power and is uncorrelated with the previous ones, etc.

    Remark 1

    k ≤ min(p, q − 1). For example, if q = 2 then k = min(p, 1) = 1.
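    One way to see why at most q − 1 discriminant functions exist: the between-group matrix is built from the q deviations of the group means from the grand mean, and these deviations, weighted by group size, sum to zero, so the matrix has rank at most q − 1. The sketch below checks this numerically on simulated data; the group sizes, the group means, and the helper `between_matrix` are assumptions made for illustration only.

```python
import numpy as np


def between_matrix(X, groups):
    """Between-group sums-of-squares-and-cross-products matrix."""
    grand_mean = X.mean(axis=0)
    B = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        Xg = X[groups == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]  # column vector
        B += len(Xg) * (d @ d.T)
    return B


rng = np.random.default_rng(0)
p, q, n_per_group = 4, 3, 20
groups = np.repeat(np.arange(q), n_per_group)
shifts = rng.normal(scale=3.0, size=(q, p))  # hypothetical group means
X = rng.normal(size=(q * n_per_group, p)) + shifts[groups]

B = between_matrix(X, groups)
eigvals = np.linalg.eigvalsh(B)  # B is symmetric
# Count eigenvalues that are not numerically zero.
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals.max()))
k_max = min(p, q - 1)
print("nonzero eigenvalues of B:", n_nonzero, " min(p, q - 1):", k_max)
```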

    More precisely, suppose we have observations on random variables $x_1, \ldots, x_p$ from $q$ groups.

    Then the $j$th discriminant function is defined as a linear combination of the original variables

    $$y_j = a_{j1} x_1 + \cdots + a_{jp} x_p, \tag{1}$$

    such that $\mathrm{corr}[y_j, y_\ell] = 0$ for $j \neq \ell$, and $y_1$ has the best discriminating power, $y_2$ the second best, and so on.

    Remark 2

    In the basic case the assumption is that the groups differ only with respect to the means of the variables.

    Consequently, the variances of the variables and the correlations between them are assumed to be the same across the groups (the groups have similar covariance structures).

    The idea in deriving the discriminant functions is to divide the total variation into between-group and within-group variation

    $$T = B + W, \tag{2}$$

    where $T$ denotes the total covariance matrix, $B$ the between-group covariance matrix, and $W$ the within-group covariance matrix.
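    The decomposition in (2) can be checked numerically. The sketch below works with sums-of-squares-and-cross-products (SSCP) matrices, for which the identity holds exactly; dividing by degrees of freedom to obtain covariance matrices is a separate scaling step. The simulated data and the function name `scatter_matrices` are assumptions, not part of the lecture.

```python
import numpy as np


def scatter_matrices(X, groups):
    """Return total (T), between-group (B) and within-group (W) SSCP matrices."""
    grand_mean = X.mean(axis=0)
    centered = X - grand_mean
    T = centered.T @ centered
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        mg = Xg.mean(axis=0)
        d = (mg - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)        # spread of group means around grand mean
        W += (Xg - mg).T @ (Xg - mg)    # spread of observations around group mean
    return T, B, W


rng = np.random.default_rng(1)
groups = np.repeat([1, 2], 33)   # two groups of 33, mirroring the example's layout
X = rng.normal(size=(66, 5))     # simulated stand-in for the five ratios
X[groups == 2] += 1.0            # hypothetical mean shift for group 2

T, B, W = scatter_matrices(X, groups)
print("max |T - (B + W)| =", np.abs(T - (B + W)).max())
```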

    Technically, the problem again reduces to an eigenvalue problem.

    In this case the eigenvalues are extracted from the matrix

    $$B W^{-1}. \tag{3}$$

    The resulting eigenvectors give the coefficients of the discriminant functions $y_j$, $j = 1, \ldots, k$, with $k = \min(q - 1, p)$.

    The functions are called canonical discriminant functions.
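    A minimal numerical sketch of this eigenvalue step follows, on simulated data with q = 3 groups and p = 4 variables. It solves the eigenproblem for W⁻¹B, which has the same eigenvalues as BW⁻¹ and whose eigenvectors can be used directly as coefficient vectors. That choice, the simulated data, and the names `sscp` and `canonical_coefficients` are assumptions for illustration, and the coefficients are not scaled the way SAS or SPSS report them.

```python
import numpy as np


def sscp(X, groups):
    """Between-group (B) and within-group (W) SSCP matrices."""
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B, W = np.zeros((p, p)), np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        mg = Xg.mean(axis=0)
        d = (mg - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
        W += (Xg - mg).T @ (Xg - mg)
    return B, W


def canonical_coefficients(X, groups):
    """Eigenvalues and coefficient vectors, largest eigenvalue first."""
    B, W = sscp(X, groups)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))  # W^{-1} B
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    k = min(len(np.unique(groups)) - 1, X.shape[1])  # k = min(q - 1, p)
    return eigvals[order][:k], eigvecs[:, order][:, :k]


rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 30)
# Hypothetical group means; rows correspond to groups 0, 1, 2.
shifts = np.array([[0, 0, 0, 0], [2, 1, 0, 0], [0, 2, 1, 0]], dtype=float)
X = rng.normal(size=(90, 4)) + shifts[groups]

eigvals, A = canonical_coefficients(X, groups)
scores = X @ A  # column j holds the discriminant scores y_j
print("eigenvalues:", eigvals)
print("corr(y1, y2):", np.corrcoef(scores, rowvar=False)[0, 1])
```

    With three groups only two eigenvalues are nonzero, and the two score columns come out uncorrelated up to rounding error, in line with the requirements on the discriminant functions.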

    Example 2

    Consider the bankruptcy data. The analysis can be run with SAS proc candisc or SPSS (Analyze → Classify → Discriminant). Below are the SAS results.

    Example: Discriminant analysis applied to the bankruptcy data

    Canonical Discriminant Analysis

    66 Observations    65 DF Total
     5 Variables       64 DF Within Classes
     2 Classes          1 DF Between Classes

    Class Level Information

    GROUP   Frequency    Weight   Proportion
        1          33   33.0000     0.500000
        2          33   33.0000     0.500000

    Canonical Discriminant Analysis Within-Class Covariance Matrices

    GROUP = 1 DF = 32

    Variable X1 X2 X3 X4 X5

    X1 2104.5659 1834.1637 -266.4029 249.8980 18.0357

    X2 1834.1637 5085.4767 1632.2018 177.7665 -15.6653

    X3 -266.4029 1632.2018 2637.1822 168.3066 -46.6066

    X4 249.8980 177.7665 168.3066 3018.2188 1.6108

    X5 18.0357 -15.6653 -46.6066 1.6108 1.3509

    GROUP = 2 DF = 32

    Variable X1 X2 X3 X4 X5

    X1 201.986 117.413 16.740 974.165 1.921

    X2 117.413 272.496 52.076 1630.092 0.879