Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall.

Chapter 5 Multiple Discriminant Analysis


LEARNING OBJECTIVES

Upon completing this chapter, you should be able to do the following:

• State the circumstances under which a linear discriminant analysis should be used instead of multiple regression.

• Identify the major issues relating to types of variables used and sample size required in the application of discriminant analysis.

• Understand the assumptions underlying discriminant analysis in assessing its appropriateness for a particular problem.

• Describe the two computation approaches for discriminant analysis and the method for assessing overall model fit.

• Explain what a classification matrix is and how to develop one, and describe the ways to evaluate the predictive accuracy of the discriminant function.

• Tell how to identify independent variables with discriminatory power.

• Justify the use of a split-sample approach for validation.


Discriminant Analysis Defined

Multiple discriminant analysis (MDA) is an appropriate technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric. The single dependent variable can have two, three, or more categories.

Examples:
• Gender – Male vs. Female
• Heavy Users vs. Light Users
• Purchasers vs. Non-purchasers
• Good Credit Risk vs. Poor Credit Risk
• Member vs. Non-Member
• Low, medium, high
• Attorney, Physician or Professor

• MDA derives a linear combination of two (or more) independent variables that will discriminate between objects or groups defined a priori.

• This linear combination, the variate, is also known as the "discriminant function"; a weight is calculated for each IV in the variate.

• MDA derives the variate that best distinguishes between the a priori groups.

• MDA sets the variate's weights to maximize between-group variance relative to within-group variance.

• For each observation we can obtain a discriminant Z score.

• The average Z score for a group gives its centroid.

• Classification is done using cutting scores, which are derived from the group centroids.

• The statistical significance of the discriminant function is assessed using the distance between group centroids.

• Logistic regression (LR) is similar to two-group discriminant analysis.

Discriminant Function

$$Z = W_1 X_1 + W_2 X_2 + W_3 X_3 + \dots + W_n X_n$$

$Z$ = Discriminant score

$W_i$ = Discriminant weight for variable $i$

$X_i$ = Independent variable $i$
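As a minimal sketch of the discriminant function, the score for one observation is just the weighted sum of its independent variables. The weights and the observation below are hypothetical values chosen for illustration, not estimates from the chapter.

```python
def discriminant_score(weights, values):
    """Return Z = W1*X1 + W2*X2 + ... + Wn*Xn for one observation."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per independent variable")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights for three independent variables (not from the chapter).
example_weights = [0.5, 0.25, 0.0]
# Hypothetical ratings of one respondent on X1, X2, X3.
example_observation = [8, 9, 6]

z = discriminant_score(example_weights, example_observation)
print(z)
```

A weight of zero, as for the third variable here, means that variable contributes nothing to the score.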

KitchenAid Survey Results for the Evaluation* of a New Consumer Product

Purchase Intention              Subject Number   X1 Durability   X2 Performance   X3 Style
Group 1: Would purchase               1               8               9              6
                                      2               6               7              5
                                      3              10               6              3
                                      4               9               4              4
                                      5               4               8              2
Group Mean                                           7.4             6.8            4.0
Group 2: Would not purchase           6               5               4              7
                                      7               3               7              2
                                      8               4               5              5
                                      9               2               4              3
                                     10               2               2              2
Group Mean                                           3.2             4.4            3.8
Difference between group means                       4.2             2.4            0.2

*Evaluations made on a 0 (very poor) to 10 (excellent) rating scale.
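The group means and their differences in the table can be reproduced with a few lines of code. This is a small sketch using only the ratings shown in the table; the variable names are illustrative.

```python
from statistics import mean

# Ratings (X1 durability, X2 performance, X3 style) transcribed from the table.
would_purchase = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]
would_not_purchase = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]


def group_means(rows):
    """Mean of each variable (column) across the subjects in one group."""
    return [mean(column) for column in zip(*rows)]


means_1 = group_means(would_purchase)
means_2 = group_means(would_not_purchase)
# Round to one decimal place, as in the table.
differences = [round(a - b, 1) for a, b in zip(means_1, means_2)]

print(means_1, means_2, differences)
```

The largest difference is on X1 (durability) and the smallest on X3 (style), which suggests X1 separates the two groups best when each variable is considered on its own.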

[Figure: Univariate Representation of Discriminant Z Scores. Two panels showing the distributions of groups A and B along the discriminant function Z.]

[Figure: Graphic Illustration of Two-Group Discriminant Analysis. Groups A and B plotted on axes X1 and X2, with their projections A' and B' onto the discriminant function Z.]


Discriminant Analysis Decision Process

Stage 1: Objectives of Discriminant Analysis

Stage 2: Research Design for Discriminant Analysis

Stage 3: Assumptions of Discriminant Analysis

Stage 4: Estimation of the Discriminant Model and Assessing Overall Fit

Stage 5: Interpretation of the Results

Stage 6: Validation of the Results

Stage 1: Objective of Discriminant Analysis

Discriminant analysis can address any of the following questions:

• Determining whether statistically significant differences exist between the average score profiles on a set of variables for two (or more) defined groups.

• Determining which of the IVs account the most for the differences in the average score profiles of the two or more groups.

• Establishing procedures for classifying statistical units (individuals or objects) into groups on the basis of their scores on a set of IVs.

• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of IVs.

Stage 1: Illustrative example

A company has two locations to serve customers:
• North America
• Outside North America

Management is interested in any difference in perceptions between the customers served by the two locations.

Two sets of variables can be identified:

• X6 to X18: customers' perceptions on thirteen characteristics

• X4: company location, with two categories (North America, outside North America)

Multiple discriminant analysis is to be used.


Objectives (Example)

Find any differences in customer perception that may occur between two geographical areas

Stage 2: Research Design for Discriminant Analysis

• Selection of dependent and independent variables

• Sample size (total and per variable)

• Sample division for validation

Selecting Dependent and Independent Variables

How many categories in the dependent variable?

Converting metric variables

• Most Common Approach
– Use the metric scale responses to develop non-metric categories. For example, use a question asking the typical number of soft drinks consumed per day and develop a three-category variable of 0 drinks for non-users, 1-5 for light users, and 5 or more for heavy users.

• Polar Extremes Approach
– Compares only the two extreme groups and excludes the middle group(s).
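The two conversion approaches can be sketched as follows. The function names and survey answers are hypothetical, and because the ranges in the example overlap at 5 drinks, this sketch assumes that exactly 5 drinks counts as a light user.

```python
def usage_category(drinks_per_day):
    """Map a metric response to a non-metric three-group category.

    Assumption: exactly 5 drinks is treated as a light user, because the
    ranges in the example overlap at 5.
    """
    if drinks_per_day == 0:
        return "non-user"
    if drinks_per_day <= 5:
        return "light"
    return "heavy"


def polar_extremes(responses):
    """Keep only the two extreme groups and drop the middle (light) group."""
    labeled = [(r, usage_category(r)) for r in responses]
    return [(r, c) for r, c in labeled if c != "light"]


responses = [0, 2, 5, 7, 0, 12]  # hypothetical survey answers
categories = [usage_category(r) for r in responses]
extremes = polar_extremes(responses)
print(categories)
print(extremes)
```

The polar extremes version discards part of the sample, so the gain in group separation has to be weighed against the smaller sample size.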


Sample Size

Overall sample size

Sample size per category


Division of the sample

Creating the subsamples

What if the overall sample is too small?

Rules of Thumb 5-1: Discriminant Analysis Design

The dependent variable must be non-metric, representing groups of objects that are expected to differ on the independent variables.

Choose a dependent variable that:
• Best represents group differences of interest,
• Defines groups that are substantially different, and
• Minimizes the number of categories while still meeting the research objectives.

In converting a metric variable to a non-metric scale for use as the dependent variable, consider using extreme groups to maximize the group differences.

Rules of Thumb 5-1 continued…

Independent variables must identify differences between at least two groups to be of any use in discriminant analysis.

The sample size must be large enough to:
• Have 20 cases per independent variable, with a minimum recommended level of 5 observations per variable.
• Have at least one more observation per group than the number of independent variables, but striving for at least 20 cases per group.
• Have a large enough sample to divide it into an estimation and holdout sample, each meeting the above requirements.
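The sample size rules of thumb above can be turned into a quick check. This is only a sketch; the function name, the dictionary keys, and the example design are hypothetical.

```python
def check_sample_size(group_sizes, n_variables):
    """Compare a design with the sample size rules of thumb listed above."""
    total = sum(group_sizes)
    smallest = min(group_sizes)
    ratio = total / n_variables
    return {
        "cases_per_variable": ratio,
        "meets_minimum_ratio_5_to_1": ratio >= 5,
        "meets_recommended_ratio_20_to_1": ratio >= 20,
        # At least one more observation per group than independent variables.
        "each_group_exceeds_variable_count": smallest > n_variables,
        "each_group_has_20_cases": smallest >= 20,
    }


# Hypothetical design: two groups of 45 and 55 cases, 4 independent variables.
report = check_sample_size([45, 55], 4)
for rule, value in report.items():
    print(rule, value)
```

When the sample is split, the same check should be run separately on the estimation and holdout samples.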

Rules of Thumb 5-1 continued…

Assess the equality of covariance matrices with the Box's M test, but apply a conservative significance level of .01.

Examine the independent variables for univariate normality.

Multicollinearity among the independent variables can markedly reduce the estimated impact of independent variables in the derived discriminant function(s), particularly if a stepwise estimation process is used.


Stage 2: Research design (Example)

Three key issues

Selecting dependent and independent variables

Sample size

Division of the sample

Selecting dependent and independent variables

DV = X4, a two-group categorical variable (non-metric)

IVs = X6 to X18, thirteen (13) metric customer perceptions used to discriminate between the geographical areas

Important:
• Dependent variable: non-metric
• Independent variables: metric

Thirteen independent variables (all metric):
• X6 Product Quality
• X7 E-Commerce Activities/Website
• X8 Technical Support
• X9 Complaint Resolution
• X10 Advertising
• X11 Product Line
• X12 Sales force Image
• X13 Competitive Pricing
• X14 Warranty & Claims
• X15 New Products
• X16 Ordering & Billing
• X17 Price Flexibility
• X18 Delivery Speed

Selecting Sample size

Overall sample: 100 observations

• Satisfies the minimum requirement of a 5:1 ratio.
• When the total sample is not split, the ratio can be increased to 8:1, but validation of the results is more important.

Important:
• Overall sample size
• Sample size for categories
• Analysis sample and holdout sample (validation sample)


Sample for categories

The researcher can decide with adequate sample units

Analysis sample 60

Holdout sample 40
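A split of the 100 observations into 60 analysis and 40 holdout cases can be sketched as below. The slides do not say how cases were assigned, so random assignment is an assumption here, and the case IDs and seed are hypothetical.

```python
import random


def split_sample(case_ids, n_analysis, seed=0):
    """Randomly divide cases into an analysis sample and a holdout sample."""
    shuffled = list(case_ids)
    # A fixed seed (hypothetical) keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_analysis], shuffled[n_analysis:]


# Hypothetical case IDs 1..100 standing in for the 100 observations.
analysis, holdout = split_sample(range(1, 101), n_analysis=60)
print(len(analysis), len(holdout))
```

After splitting, the group sizes within the analysis sample still need to be checked against the per-group minimum.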


Division of the sample

Analysis sample (60) can be divided into two groups

Group sizes 26 and 34

Satisfies the minimum requirement of 20 observations per group

Stage 3: Assumptions of Discriminant Analysis

Key Assumptions
• Multivariate normality of the IVs.
• Equal variance and covariance for the groups.

Other Assumptions
• Minimal multicollinearity among IVs.
• Group sample sizes relatively equal.
• Linear relationships.
• Elimination of outliers.

Stage 3: Assumptions of the MDA (Example)

Normality and linearity have been tested for the variables and are at an acceptable level (see Chapter 2, page 80 in the book).

Important:
• Normality, multivariate normality
• Linearity
• Multicollinearity
• Equality of covariance matrices

Multicollinearity has been tested for the variables and is at an acceptable level (see Chapter 4, page 211 in the book).

Equality of covariance matrices

Box's M test: the test indicates differences in the covariance matrices between the two groups at a significance level of .011, which is above the conservative .01 threshold.

As all assumptions are met, no additional remedies are needed for transforming variables.


Stage 4: Estimation of the Discriminant Model and Assessing Overall Fit

Selecting An Estimation Method . . .

1. Simultaneous Estimation – all independent variables are considered concurrently.

2. Stepwise Estimation – independent variables are entered into the discriminant function one at a time.

Estimating the Discriminant Function

The stepwise procedure begins with all independent variables not in the model, and selects variables for inclusion based on:

• Statistically significant differences across the groups (.05 or less required for entry),

• Statistical significance of functions: Wilks' lambda, Hotelling's trace, Pillai's criterion; Mahalanobis D² and Rao's V for stepwise estimation.


Assessing Overall Model Fit

• Calculating discriminant Z scores for each observation,

• Evaluating group differences on the discriminant Z scores, and

• Assessing group membership prediction accuracy.

Assessing Group Membership Prediction Accuracy

Major Considerations:
• The statistical and practical rationale for developing classification matrices,
• The cutting score determination,
• Construction of the classification matrices, and
• Standards for assessing classification accuracy.

Rules of Thumb 5–2

Model Estimation and Model Fit

• Although stepwise estimation may seem "optimal" by selecting the most parsimonious set of maximally discriminating variables, beware of the impact of multicollinearity on the assessment of each variable's discriminatory power.

• Overall model fit assesses the statistical significance between groups on the discriminant Z score(s), but does not assess predictive accuracy.

• With more than two groups, do not confine your analysis to only the statistically significant discriminant function(s), but consider if nonsignificant functions (with significance levels of up to .3) add explanatory power.


Calculating the Optimum Cutting Score

Issues . . .

• Define the prior probabilities based either on the relative sample sizes of the observed groups or specified by the researcher (either assumed to be equal or with values set by the researcher), and

• Calculate the optimum cutting score value as a weighted average based on the assumed sizes of the groups (derived from the sample sizes).
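The weighted average described above can be sketched with the formula commonly given for two groups, Z_CS = (N_A × Z_B + N_B × Z_A) / (N_A + N_B), where each centroid is weighted by the size of the other group. The centroids and group sizes below are hypothetical.

```python
def cutting_score(centroid_a, centroid_b, n_a=1, n_b=1):
    """Weighted cutting score between two group centroids.

    Each centroid is weighted by the size of the *other* group, so the cut
    moves toward the centroid of the smaller group. With equal sizes this
    reduces to the simple midpoint of the two centroids.
    """
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)


# Hypothetical centroids for groups A and B on the discriminant function.
equal_cut = cutting_score(-1.0, 1.0)                      # equal group sizes
weighted_cut = cutting_score(-1.0, 1.0, n_a=30, n_b=10)   # A is three times larger
print(equal_cut, weighted_cut)
```

In the unequal case the cut shifts toward the centroid of the smaller group B, so the larger group A is assigned a wider range of Z scores.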

[Figure: Optimal Cutting Score with Equal Sample Sizes. Distributions of Group A and Group B with centroids Z̄A and Z̄B, and regions labeled "Classify as A (Nonpurchaser)" and "Classify as B (Purchaser)".]

[Figure: Optimal Cutting Score with Unequal Sample Sizes. Distributions of Group A and Group B with centroids Z̄A and Z̄B, showing the optimal weighted cutting score and the unweighted cutting score.]

Establishing Standards of Comparison for the Hit Ratio

Group sizes determine standards based on:

• Equal Group Sizes

• Unequal Group Sizes – two criteria:

o Maximum Chance Criterion

o Proportional Chance Criterion
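Both chance criteria can be computed directly from the group sizes. This sketch uses their usual definitions: the maximum chance criterion is the proportion of cases in the largest group, and the proportional chance criterion is the sum of the squared group proportions. The function names are illustrative.

```python
def maximum_chance(group_sizes):
    """Share of cases in the largest group (classify everyone into it)."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """Sum of squared group proportions; p**2 + (1 - p)**2 for two groups."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


# Group sizes of 26 and 34 are taken from the analysis sample in the example.
c_max = maximum_chance([26, 34])
c_pro = proportional_chance([26, 34])
print(round(c_max, 3), round(c_pro, 3))
```

With equal group sizes both criteria reduce to 1 divided by the number of groups, which is why separate criteria are only needed in the unequal case.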

Classification Matrix: HBAT's New Consumer Product

                          Predicted Group
Actual Group       Would Purchase   Would Not Purchase   Actual Total   Percent Correct Classification
(1)                      22                  3                25                   88%
(2)                       5                 20                25                   80%
Predicted Total          27                 23                50

Percent Correctly Classified (hit ratio) = 100 x [(22 + 20)/50] = 84%
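The hit ratio and the per-group percentages in the classification matrix can be recomputed from the four cell counts. A short sketch, with rows as actual groups and columns as predicted groups:

```python
# Rows are actual groups (1) and (2); columns are predicted groups.
matrix = [
    [22, 3],
    [5, 20],
]


def hit_ratio(matrix):
    """Percent of all cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100 * correct / total


def percent_correct_by_group(matrix):
    """Percent correctly classified within each actual group."""
    return [100 * row[i] / sum(row) for i, row in enumerate(matrix)]


overall = hit_ratio(matrix)
by_group = percent_correct_by_group(matrix)
print(overall, by_group)
```

Reporting both figures matters because a high overall hit ratio can hide a poorly classified group.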

Rules of Thumb 5–3

Assessing Predictive Accuracy

• The classification matrix and hit ratio replace R² as the measure of model fit:

– Assess the hit ratio both overall and by group.

– If the estimation and analysis samples both exceed 100 cases and each group exceeds 20 cases, derive separate standards for each sample. If not, derive a single standard from the overall sample.

• Analyze the misclassified observations both graphically (territorial map) and empirically (Mahalanobis D²).

Rules of Thumb 5–3 Continued . . .

Assessing Predictive Accuracy

• There are multiple criteria for comparison to the hit ratio:

– The maximum chance criterion for evaluating the hit ratio is the most conservative, giving the highest baseline value to exceed.

– Be cautious in using the maximum chance criterion in situations with overall samples less than 100 and/or group sizes under 20.

– The proportional chance criterion considers all groups in establishing the comparison standard and is the most popular.

– The actual predictive accuracy (hit ratio) should exceed any criterion value by at least 25%.
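The 25% rule can be read as requiring the hit ratio to be at least 1.25 times the chosen chance criterion. A minimal sketch under that reading, using the 84% hit ratio from the classification matrix and a 50% chance value for two equal groups; the function name and the second example value are hypothetical:

```python
def exceeds_standard(hit_ratio, chance_criterion, margin=0.25):
    """True when the hit ratio is at least (1 + margin) times the criterion."""
    return hit_ratio >= chance_criterion * (1 + margin)


# Two equal groups give a chance criterion of 0.50, so the bar is 0.625.
threshold = 0.50 * 1.25
acceptable = exceeds_standard(0.84, 0.50)
too_low = exceeds_standard(0.60, 0.50)  # hypothetical weaker model
print(threshold, acceptable, too_low)
```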

Stage 4: Estimating the discriminant model and assessing overall fit (example)

As the objective is to determine the discriminating capabilities of individual variables, the stepwise method is selected.

Identifying the variables with significant differences between groups

Important: there are two methods
• Simultaneous
• Stepwise

See page 377 of the book.

X6, X11, X12, X13, and X17 have the larger differences in the group means; look at Wilks' lambda, the F value, significance, and minimum D².

X13 has the largest D² between groups and a significance of less than .05, so it qualifies for first entry.

See pages 378, 380, and 381 of the book.

.749² = .561 = 56%

Calculate discriminant Z scores used in classification

Discriminant loadings, from highest to lowest, used for interpretation

Fisher's linear discriminant function used for classification

See page 382 of the book.

Assessing the predictive accuracy of discriminant function

Important:
• Calculating the cutting score
• Procedure for classification
• Assessing the predictive accuracy

Predictive accuracy

See page 384 of the book.

Cases with a discriminant score less than -.2997 are classified in group 0, and cases with a greater score in group 1 (see page 385 of the book).
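The classification rule for this example can be sketched as a simple comparison against the cutting score of -.2997. The Z scores below are hypothetical, and assigning a score exactly equal to the cutting score to group 1 is an assumption, since the slide does not cover that case.

```python
CUTTING_SCORE = -0.2997  # cutting score reported for the example


def classify(z_score):
    """Assign group 0 below the cutting score, group 1 otherwise.

    Assumption: a score exactly equal to the cutting score goes to group 1.
    """
    return 0 if z_score < CUTTING_SCORE else 1


# Hypothetical discriminant Z scores for three cases.
scores = [-1.2, -0.2997, 0.4]
predicted = [classify(z) for z in scores]
print(predicted)
```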

See page 387 of the book.


Stage 5: Interpretation of the Results

Three Methods . . .

1. Standardized discriminant weights,

2. Discriminant loadings (structure correlations), and

3. Partial F values.

Standardized discriminant weights

• Examine the sign and magnitude of the standardized discriminant weight (discriminant coefficient) assigned to each variable.

• The weights are used to compute the discriminant function.

• The sign denotes the direction of the relationship (negative or positive).

Discriminant loadings (structure correlations)

• Are used because of the deficiencies of "weights".

• Reflect the variance that each IV shares with the discriminant function.

• Can be interpreted like factor loadings.

• Can be calculated for all variables.


Interpretation of the Results

Two or More Functions . . .

1. Rotation of discriminant functions

2. Potency index
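The potency index is usually described as follows: for each significant function, square the variable's loading and multiply it by that function's relative eigenvalue (its eigenvalue divided by the sum of eigenvalues of the significant functions), then add the results across functions. The sketch below assumes that definition; the loadings and eigenvalues are hypothetical.

```python
def potency_index(loadings, eigenvalues):
    """Composite contribution of one variable across several functions.

    loadings:    the variable's loading on each significant function
    eigenvalues: the eigenvalue of each of those functions
    """
    total = sum(eigenvalues)
    return sum(
        loading ** 2 * (eigenvalue / total)
        for loading, eigenvalue in zip(loadings, eigenvalues)
    )


# Hypothetical variable loading .8 on function 1 and .5 on function 2,
# with hypothetical eigenvalues of 3.0 and 1.0.
potency = potency_index([0.8, 0.5], [3.0, 1.0])
print(round(potency, 4))
```

The index is a relative measure, so it is only meaningful when compared across variables in the same analysis.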

Graphical Display of Discriminant Scores and Loadings

• Territorial Map = most common method.

• Vector Plot of Discriminant Loadings, preferably the rotated loadings = simplest approach.


Plotting Procedure for Vectors

Three Steps . . .

1. Selecting variables,

2. Stretching the vectors, and

3. Plotting the group centroids.

Rules of Thumb 5–4

Interpreting and Validating Discriminant Functions

• Discriminant loadings are the preferred method to assess the contribution of each variable to a discriminant function because they are:

– a standardized measure of importance (ranging from 0 to 1).
– available for all independent variables whether used in the estimation process or not.
– unaffected by multicollinearity.

• Loadings exceeding ±.40 are considered substantive for interpretation purposes.

Rules of Thumb 5–4 continued . . .

Interpreting and Validating Discriminant Functions

• If there is more than one discriminant function, be sure to:

– use rotated loadings.
– assess each variable's contribution across all the functions with the potency index.

• The discriminant function must be validated either with a holdout sample or one of the "Leave-one-out" procedures.

Stage 5: Interpretation of the results (example)

Identifying important discriminating variables

1. Analyzing Wilks' lambda and univariate F

2. Analyzing the discriminant weights

3. Discriminant loadings

Stage 5: Interpretation of the results

Not entered due to multicollinearity

See page 388 of the book.


Stage 6: Validation of the Results

• Utilizing a Holdout Sample

• Cross-Validation
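Leave-one-out cross-validation can be sketched as follows: each case is set aside in turn, the rule is re-estimated on the remaining cases, and the held-out case is then classified. To stay short, this sketch uses a deliberately simple stand-in for the discriminant function, a single variable with a midpoint cutting score, and the data are hypothetical.

```python
from statistics import mean


def midpoint_classifier(train):
    """Fit a one-variable, two-group rule: cut at the midpoint of the group means."""
    mean0 = mean(x for x, g in train if g == 0)
    mean1 = mean(x for x, g in train if g == 1)
    cut = (mean0 + mean1) / 2
    low_group = 0 if mean0 < mean1 else 1
    return lambda x: low_group if x < cut else 1 - low_group


def leave_one_out_hit_ratio(data):
    """Classify each case with a rule estimated on all the other cases."""
    hits = 0
    for i, (x, group) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += midpoint_classifier(train)(x) == group
    return hits / len(data)


# Hypothetical (score, group) pairs with some overlap between the groups.
data = [(-1.5, 0), (-0.8, 0), (-0.3, 0), (0.4, 0),
        (0.1, 1), (0.9, 1), (1.2, 1), (1.8, 1)]
print(leave_one_out_hit_ratio(data))
```

Because every case is classified by a rule that never saw it, the resulting hit ratio is a less optimistic estimate than one computed on the estimation sample itself.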

Stage 6: Validation of the results (example)

Important:

• Internal validity: classification accuracy for both the holdout sample and the cross-validated sample is high, which establishes internal validity.

• External validity: external validity should be assessed by the researcher through the use of additional samples.


Thank You