Linear Regression Analysis 5E, Montgomery, Peck and Vining
Chapter 6: Diagnostics for Leverage and Influence

6.1 Importance of Detecting Influential Observations

• Leverage point: an unusual x-value; very little effect on the regression coefficients.

• Influence point: unusual in both y and x.

6.2 Leverage

• The hat matrix is:

$$H = X(X'X)^{-1}X'$$

• The diagonal elements of the hat matrix are given by

$$h_{ii} = x_i'(X'X)^{-1}x_i$$

• $h_{ii}$ is a standardized measure of the distance of the ith observation from the center of the x-space.

• The average size of the hat diagonal is p/n, where p is the number of model parameters and n is the number of observations.

• Traditionally, any $h_{ii} > 2p/n$ indicates a leverage point.

• An observation with a large $h_{ii}$ and a large residual is likely to be influential.
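The leverage rule can be sketched in a few lines of NumPy. The data below are hypothetical (nine clustered x-values and one remote one), and the helper name `hat_diagonals` is an assumption made for illustration, not something from the text.

```python
import numpy as np

def hat_diagonals(X):
    """Return the diagonal of H = X (X'X)^{-1} X' without forming H."""
    # Solve (X'X) Z = X' instead of inverting X'X explicitly.
    Z = np.linalg.solve(X.T @ X, X.T)       # shape (p, n)
    # h_ii = x_i' (X'X)^{-1} x_i, i.e. row i of X times column i of Z.
    return np.einsum("ij,ji->i", X, Z)

# Hypothetical data: nine clustered x-values and one remote x-value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])   # intercept column plus one regressor
n, p = X.shape

h = hat_diagonals(X)
cutoff = 2 * p / n                          # traditional 2p/n rule
flagged = np.flatnonzero(h > cutoff)

print("hat diagonals:", np.round(h, 3))
print("average h_ii:", h.mean(), "p/n:", p / n)
print("indices with h_ii > 2p/n:", flagged)
```

The hat diagonals sum to p, which is why their average is p/n. A flagged point is only a leverage point; whether it is influential also depends on its residual.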

Example 6.1 The Delivery Time Data

• Examine Table 6.1. When some possibly influential points are removed, the coefficient estimates and model statistics change.

6.3 Measures of Influence

• The influence measures discussed here are those that measure the effect of deleting the ith observation.

1. Cook's $D_i$, which measures the effect on the vector of coefficient estimates $\hat{\beta}$

2. $DFBETAS_{j,i}$, which measures the effect on $\hat{\beta}_j$

3. $DFFITS_i$, which measures the effect on $\hat{y}_i$

4. $COVRATIO_i$, which measures the effect on the variance-covariance matrix of the parameter estimates.

6.3 Measures of Influence: Cook’s D

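$$D_i = D_i(X'X, pMS_{Res}) = \frac{(\hat{\beta}_{(i)} - \hat{\beta})'X'X(\hat{\beta}_{(i)} - \hat{\beta})}{pMS_{Res}} = \frac{r_i^2}{p}\,\frac{Var(\hat{y}_i)}{Var(e_i)} = \frac{r_i^2}{p}\,\frac{h_{ii}}{1-h_{ii}}$$

where $r_i$ is the ith studentized residual and $\hat{\beta}_{(i)}$ is the estimate of $\beta$ with the ith observation removed.

What contributes to $D_i$:

1. How well the model fits the ith observation, $y_i$

2. How far that point is from the remaining dataset.

Large values of $D_i$ indicate an influential point, usually if $D_i > 1$.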
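As a rough illustration, Cook's D can be computed two ways: from the closed form in terms of the studentized residual and the hat diagonal, and from the definition by refitting without the ith observation. The sketch below does both on hypothetical data; the function names are assumptions made for this example.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D_i from the closed form (r_i^2 / p) * h_ii / (1 - h_ii)."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)                 # (X'X)^{-1}
    beta = C @ X.T @ y                         # least-squares estimate
    e = y - X @ beta                           # ordinary residuals
    h = np.einsum("ij,jk,ik->i", X, C, X)      # hat diagonals h_ii
    ms_res = e @ e / (n - p)                   # residual mean square
    r = e / np.sqrt(ms_res * (1.0 - h))        # studentized residuals
    return (r**2 / p) * h / (1.0 - h)

def cooks_distance_by_deletion(X, y, i):
    """Cook's D_i from the definition: refit without observation i."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    ms_res = e @ e / (n - p)
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    d = beta_i - beta
    return d @ (X.T @ X) @ d / (p * ms_res)

# Hypothetical data: the last point is remote in x and off the trend in y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 6.6, 10.0])
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
influential = np.flatnonzero(D > 1.0)          # the usual D_i > 1 guideline
print("Cook's D:", np.round(D, 3))
print("indices with D_i > 1:", influential)
```

The closed form avoids n separate refits, which is why it is the form normally used in practice; the deletion version is included only to show that the two agree.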

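6.4 Measures of Influence: DFFITS and DFBETAS

DFBETAS measures how much the regression coefficient changes, in standard deviation units, if the ith observation is removed.

$$DFBETAS_{j,i} = \frac{\hat{\beta}_j - \hat{\beta}_{j(i)}}{\sqrt{S_{(i)}^2 C_{jj}}}$$

where $\hat{\beta}_{j(i)}$ is an estimate of the jth coefficient when the ith observation is removed, $S_{(i)}^2$ is the residual mean square computed without the ith observation, and $C_{jj}$ is the jth diagonal element of $(X'X)^{-1}$.

• A large DFBETAS indicates that the ith observation has considerable influence. In general, the cutoff is $|DFBETAS_{j,i}| > 2/\sqrt{n}$.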
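A minimal sketch of DFBETAS computed straight from the deletion definition follows. It refits the model n times, which is inefficient but mirrors the formula; the data and the function name `dfbetas` are hypothetical.

```python
import numpy as np

def dfbetas(X, y):
    """DFBETAS[i, j]: scaled change in coefficient j when observation i is deleted."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)                     # (X'X)^{-1}; C_jj on its diagonal
    beta = C @ X.T @ y                             # full-data estimate
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid_i = yi - Xi @ beta_i
        s2_i = resid_i @ resid_i / (n - 1 - p)     # S^2_(i): n-1 points, p parameters
        out[i] = (beta - beta_i) / np.sqrt(s2_i * np.diag(C))
    return out

# Hypothetical data: the last point is remote in x and off the trend in y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 6.6, 10.0])
X = np.column_stack([np.ones_like(x), x])

dfb = dfbetas(X, y)
cutoff = 2 / np.sqrt(len(y))                       # 2 / sqrt(n) guideline
rows, cols = np.nonzero(np.abs(dfb) > cutoff)
print("cutoff:", round(cutoff, 3))
print("(observation, coefficient) pairs over the cutoff:", list(zip(rows.tolist(), cols.tolist())))
```

Each row of `dfb` corresponds to a deleted observation and each column to a coefficient, so a single observation can be flagged for one coefficient and not another.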

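DFFITS measures the influence of the ith observation on the fitted value, again in standard deviation units.

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S_{(i)}^2 h_{ii}}}$$

where $\hat{y}_{(i)}$ is the fitted value of $y_i$ obtained without the use of the ith observation.

• Cutoff: if $|DFFITS_i| > 2\sqrt{p/n}$, the point is most likely influential.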
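The DFFITS definition can be sketched the same way, by refitting without each observation in turn and comparing fitted values at that observation. The data and the function name `dffits` are hypothetical.

```python
import numpy as np

def dffits(X, y):
    """DFFITS_i: scaled change in the ith fitted value when observation i is deleted."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)
    beta = C @ X.T @ y                             # full-data estimate
    h = np.einsum("ij,jk,ik->i", X, C, X)          # hat diagonals h_ii
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ beta_i
        s2_i = resid_i @ resid_i / (n - 1 - p)     # S^2_(i)
        # Fitted value at x_i with and without observation i in the fit.
        out[i] = (X[i] @ beta - X[i] @ beta_i) / np.sqrt(s2_i * h[i])
    return out

# Hypothetical data: the last point is remote in x and off the trend in y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 6.6, 10.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

dff = dffits(X, y)
cutoff = 2 * np.sqrt(p / n)                        # 2 * sqrt(p/n) guideline
print("DFFITS:", np.round(dff, 3))
print("indices over the cutoff:", np.flatnonzero(np.abs(dff) > cutoff))
```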

Equivalencies

• See the computational equivalents of both DFBETAS and DFFITS (page 217). You will see that they are both functions of R-student and hii.
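For reference, the computational forms can be written out as below. This is a sketch based on the standard deletion identities rather than a transcription of the cited page, and the notation is assumed here: $t_i$ is R-student, $R = (X'X)^{-1}X'$, $r_{j,i}$ is the (j, i) element of $R$, and $r_j'$ is its jth row.

```latex
t_i = \frac{e_i}{\sqrt{S_{(i)}^2 \,(1 - h_{ii})}}, \qquad
DFFITS_i = \left( \frac{h_{ii}}{1 - h_{ii}} \right)^{1/2} t_i, \qquad
DFBETAS_{j,i} = \frac{r_{j,i}}{\sqrt{r_j' r_j}} \, \frac{t_i}{\sqrt{1 - h_{ii}}}
```

In both expressions the residual enters through $t_i$ and the leverage through $h_{ii}$, so neither diagnostic requires refitting the model n times.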

6.5 A Measure of Model Performance

• Information about the overall precision of estimation can be obtained through another statistic, $COVRATIO_i$:

$$COVRATIO_i = \frac{\left|(X_{(i)}'X_{(i)})^{-1} S_{(i)}^2\right|}{\left|(X'X)^{-1} MS_{Res}\right|} = \frac{(S_{(i)}^2)^p}{MS_{Res}^p}\left(\frac{1}{1-h_{ii}}\right)$$

Cutoffs and Interpretation

• If $COVRATIO_i > 1$, the ith observation improves the precision.

• If $COVRATIO_i < 1$, the ith observation can degrade the precision.

• Cutoffs: $COVRATIO_i > 1 + 3p/n$ or $COVRATIO_i < 1 - 3p/n$ (the lower limit is really only appropriate if n > 3p).
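A small sketch of COVRATIO, computed as a ratio of determinants with and without each observation and then compared against the 1 ± 3p/n cutoffs. The data and the function name `covratio` are hypothetical.

```python
import numpy as np

def covratio(X, y):
    """COVRATIO_i = |(X_(i)'X_(i))^{-1} S^2_(i)| / |(X'X)^{-1} MS_Res|."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    ms_res = e @ e / (n - p)
    det_full = np.linalg.det(np.linalg.inv(X.T @ X) * ms_res)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid_i = yi - Xi @ beta_i
        s2_i = resid_i @ resid_i / (n - 1 - p)     # S^2_(i)
        out[i] = np.linalg.det(np.linalg.inv(Xi.T @ Xi) * s2_i) / det_full
    return out

# Hypothetical data: the last point is remote in x and off the trend in y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 6.6, 10.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

cr = covratio(X, y)
upper, lower = 1 + 3 * p / n, 1 - 3 * p / n        # lower bound assumes n > 3p
print("COVRATIO:", np.round(cr, 3))
print("above upper cutoff:", np.flatnonzero(cr > upper))
print("below lower cutoff:", np.flatnonzero(cr < lower))
```

Values far below 1 point to an observation whose presence inflates the generalized variance of the coefficient estimates; values well above 1 point to one that tightens it.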

6.6 Detecting Groups of Influential Observations

• The previous diagnostics were “single-observation” diagnostics.

• It is possible that a group of points has high leverage or exerts undue influence on the regression model.

• Multiple-observation deletion diagnostics can be implemented.

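• Cook’s D can be extended to incorporate multiple observations:

$$D_{\mathbf{i}} = D_{\mathbf{i}}(X'X, pMS_{Res}) = \frac{(\hat{\beta}_{(\mathbf{i})} - \hat{\beta})'X'X(\hat{\beta}_{(\mathbf{i})} - \hat{\beta})}{pMS_{Res}}$$

where $\mathbf{i}$ denotes the m × 1 vector of indices specifying the points to be deleted.

• Large values of $D_{\mathbf{i}}$ indicate that the set of m points is influential.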
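The multiple-observation version can be sketched by refitting without a chosen set of indices. The hypothetical data below contain two remote points that sit close together, so deleting either one alone changes the fit very little while deleting both changes it a great deal. The function name `cooks_distance_subset` is an assumption made for this example.

```python
from itertools import combinations

import numpy as np

def cooks_distance_subset(X, y, idx):
    """Cook's D for deleting the whole set of observations in idx at once."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    ms_res = e @ e / (n - p)                       # full-data residual mean square
    keep = np.ones(n, dtype=bool)
    keep[list(idx)] = False
    beta_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    d = beta_del - beta
    return d @ (X.T @ X) @ d / (p * ms_res)

# Hypothetical data: two remote points (indices 8 and 9) that mask each other.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0, 31.0])
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 10.0, 10.2])
X = np.column_stack([np.ones_like(x), x])
n = len(y)

singles = [cooks_distance_subset(X, y, [i]) for i in range(n)]
pairs = list(combinations(range(n), 2))
pair_D = [cooks_distance_subset(X, y, pr) for pr in pairs]
best_pair = pairs[int(np.argmax(pair_D))]

print("largest single-deletion D:", round(max(singles), 3))
print("most influential pair:", best_pair, "with D =", round(max(pair_D), 3))
```

Scanning all subsets grows combinatorially with m, so an exhaustive search like this one is only practical for small m and modest n.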

6.7 Treatment of Influential Observations

• Should an influential point be discarded?

Yes, if
– there is an error in recording a measured value;
– the sample point is invalid; or
– the observation is not part of the population that was intended to be sampled.

No, if
– the influential point is a valid observation.

• Robust estimation techniques
– These techniques offer an alternative to deleting an influential observation.
– Observations are retained but downweighted in proportion to residual magnitude or influence.
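One common way to downweight by residual magnitude is Huber M-estimation fitted by iteratively reweighted least squares. The slides do not name a specific method, so the sketch below is only one possible instance: the tuning constant 1.345, the MAD-based scale, the fixed iteration count, and the function name `huber_irls` are all assumptions. Note that Huber weights respond to large residuals, not to leverage.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimate via iteratively reweighted least squares (a sketch)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # start from ordinary least squares
    w = np.ones(len(y))
    for _ in range(n_iter):
        e = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(e - np.median(e))) / 0.6745
        u = np.abs(e) / max(scale, 1e-12)          # scaled absolute residuals
        # Huber weights: 1 for small scaled residuals, c / |u| beyond the threshold c.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares step using the current weights.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta, w

# Hypothetical data: roughly y = 2 + 0.5 x, with one gross error at index 8.
x = np.arange(1.0, 11.0)
y = np.array([2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 15.0, 6.9])
X = np.column_stack([np.ones_like(x), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_rob, weights = huber_irls(X, y)

print("OLS slope:   ", round(beta_ols[1], 3))
print("robust slope:", round(beta_rob[1], 3))
print("final weights:", np.round(weights, 3))
```

The suspect observation stays in the fit but with a small weight, which is the retained-but-downweighted behavior described above.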