Advanced Regression Topics: Violation of Assumptions

Lecture 7, February 15, 2005

Applied Regression Analysis



Overview

■ Introductory Example

■ Nonconstant Variance

■ Nonlinearity

■ Optimal Transformations

■ Nonnormality

■ Wrapping Up


Today’s Lecture

■ Revisiting residuals.

◆ Outliers aside, other important things to look for are:

■ Nonconstant variance.

■ Nonlinearity.

■ Nonnormality.


Snow Geese

From Weisberg (1985, p. 102):

“Aerial survey methods are regularly used to estimate the number of snow geese in their summer range areas west of Hudson Bay in Canada. To obtain estimates, small aircraft fly over the range and, when a flock of geese is spotted, an experienced person estimates the number of geese in the flock. To investigate the reliability of this method of counting, an experiment was conducted in which an airplane carrying two observers flew over 45 flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of the flock was taken so that an exact count of the number of birds in the flock could be made (data by Cook and Jacobsen, 1978).”


[Figure: Snow Geese]


[Figure: Hudson Bay]


Regression Analysis

■ Using the first observer in the plane, we consider the relationship between this person’s count and the count from the photograph:

[Figure: scatterplot of photo count (vertical axis, 0 to 400) against observer 1 count (horizontal axis, 0 to 500)]


Regression Analysis

■ One way of analyzing these data is to fit a regression that attempts to predict the count in the photo from the count by the observer.

■ Using SPSS, this regression was estimated, giving the following statistics:

Coefficient      Estimate   SE     t       p-value
a (intercept)    26.65      8.61   3.09    0.003
b (slope)        0.88       0.08   11.37   < 0.001

Statistic   Estimate
SSreg       254,769.50
SSres       84,790.18
R²          0.750
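To make the relationships among these quantities concrete, here is a minimal Python sketch (NumPy assumed) of a simple least squares fit that returns the same kinds of statistics SPSS reports. The function name `simple_regression` and the toy data are hypothetical; the Cook and Jacobsen counts are not reproduced here, so the numbers will not match the table.

```python
import numpy as np

def simple_regression(x, y):
    """OLS fit of Y' = a + bX with the summary quantities reported in the SPSS table."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    sxx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # slope
    a = y.mean() - b * x.mean()                          # intercept
    y_hat = a + b * x
    e = y - y_hat                                        # residuals e = Y - Y'
    ss_res = np.sum(e ** 2)
    ss_reg = np.sum((y_hat - y.mean()) ** 2)
    ms_res = ss_res / (n - 2)                            # N - k - 1 with k = 1 predictor
    se_b = np.sqrt(ms_res / sxx)
    se_a = np.sqrt(ms_res * (1.0 / n + x.mean() ** 2 / sxx))
    return {
        "a": a, "b": b, "se_a": se_a, "se_b": se_b,
        "t_a": a / se_a, "t_b": b / se_b,
        "ss_reg": ss_reg, "ss_res": ss_res,
        "r2": ss_reg / (ss_reg + ss_res),
        "residuals": e,
    }

# Hypothetical toy data (not the snow geese counts).
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({k: round(float(v), 3) for k, v in fit.items() if k != "residuals"})
```

The same ratio applied to the table's sums of squares, 254,769.50 / (254,769.50 + 84,790.18), gives about 0.750, matching the reported R².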


Assumptions

■ But remember: before we can interpret these results, we must first check our assumptions.

■ Assumptions of regression analyses revolve around the residuals, $e = Y - Y' = Y - (a + bX)$.

■ In particular, we specified that all residuals were:

◆ Independent (or uncorrelated).

◆ Identically distributed.

◆ Normally distributed, with:

■ A zero mean.

■ A constant variance.

$e \sim N(0, \sigma^2_e)$


Residual Plot

■ From a previous lecture, recall that an easy way to check assumptions is to look at a plot of the standardized residuals against the unstandardized predicted values:

[Figure: standardized residual (vertical axis, −2 to 2) against unstandardized predicted value (horizontal axis, 100 to 400)]

■ Do you notice any problems in this plot?
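The coordinates of a residual plot like this one can be computed directly. The sketch below (NumPy assumed, toy data hypothetical) takes a standardized residual to be $e_i / \sqrt{MS_{res}}$ with $MS_{res} = SS_{res}/(N - 2)$; that definition is an assumption about what the SPSS "standardized residual" option computes, so check it against your software's documentation.

```python
import numpy as np

def standardized_residuals(x, y):
    """Predicted values and standardized residuals for the OLS fit Y' = a + bX.

    Here a standardized residual is taken to be e_i / sqrt(MS_res),
    with MS_res = SS_res / (N - 2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, deg=1)           # polyfit returns [slope, intercept]
    y_hat = a + b * x
    e = y - y_hat
    ms_res = np.sum(e ** 2) / (y.size - 2)
    return y_hat, e / np.sqrt(ms_res)

# Hypothetical toy data; plot z against y_hat and look for a fan or funnel shape.
y_hat, z = standardized_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(np.round(y_hat, 2), np.round(z, 3))
```

A spread of points that widens (or narrows) as the predicted value grows is the visual signature of nonconstant variance.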


Nonconstant Variance

■ One of the primary assumptions in a linear regression is that $\text{var}(e_i) = \sigma^2_e$ for all $i = 1, \dots, N$ observations.

[Figure: standardized residual (vertical axis, −2 to 2) against unstandardized predicted value (horizontal axis, 100 to 400), repeated from the previous slide]

■ In our example, this assumption is clearly violated.


Nonconstant Variance Detection

■ Detecting nonconstant variance is often accomplished by examining the residual plot.

■ However, relying on visual inspection can lead to some problems:

◆ Interpretation is subjective and relies on experience.

◆ “How much is too much?” Nonconstant variance is really a matter of degree.

■ For this reason, one can construct a statistical hypothesis test for the constancy of variance.


Nonconstant Variance Detection Test

■ Note that nonconstant $\text{var}(e_i)$ may be driven by:

◆ The response, $Y$.

◆ The predictors, $X$.

◆ Some other quantity not involved in the regression, such as:

■ Observations over time.

■ Observations related in space (spatial orientation).

■ Any (or all) of these features can be put into a large matrix, $\mathbf{Z}$, so that each observation $i$ has a row vector $\mathbf{z}_i$.


Nonconstant Variance Detection Test

■ Given our suspected cause of nonconstant variance, encoded in the row vector $\mathbf{z}_i$ for each observation, we can assume:

$\text{var}(e_i) = \sigma^2_e \exp(\boldsymbol{\lambda}'\mathbf{z}_i)$

■ This is a very technical way of making the variance a function of other variables.

■ This form places the following constraints on our variance:

1. $\text{var}(e_i) > 0$ for all observations $\mathbf{z}_i$.

2. The variance depends on $\mathbf{z}_i$ and $\boldsymbol{\lambda}$, but only through the linear function $\boldsymbol{\lambda}'\mathbf{z}_i$.

3. $\text{var}(e_i)$ is monotonic (either increasing or decreasing) in each component of $\mathbf{z}_i$.

4. If $\boldsymbol{\lambda} = \mathbf{0}$, then $\text{var}(e_i) = \sigma^2_e$ for all $i$.
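A quick numerical illustration of constraints 1, 3, and 4: the sketch below evaluates $\sigma^2_e \exp(\boldsymbol{\lambda}'\mathbf{z}_i)$ for a small, hypothetical one-column $\mathbf{Z}$ (NumPy assumed; the function name `error_variance` and the values of $\sigma^2_e$ and $\boldsymbol{\lambda}$ are made up for illustration).

```python
import numpy as np

def error_variance(sigma2, lam, Z):
    """var(e_i) = sigma2 * exp(lambda' z_i) for every row z_i of Z."""
    Z = np.asarray(Z, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return sigma2 * np.exp(Z @ lam)

# Hypothetical single-variable Z (e.g. predictor values) for N = 4 observations.
Z = np.array([[10.0], [50.0], [100.0], [200.0]])
print(error_variance(25.0, [0.0], Z))    # lambda = 0: every variance equals 25
print(error_variance(25.0, [0.01], Z))   # lambda > 0: variance grows with z
```

Because the exponential is always positive, no choice of $\boldsymbol{\lambda}$ can produce a negative variance, and a positive (negative) component of $\boldsymbol{\lambda}$ makes the variance increase (decrease) in that component of $\mathbf{z}_i$.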


Nonconstant Variance Detection Test: Steps

■ STEP 0: Determine what is causing the nonconstant variance.

◆ Let’s go back to our geese data example and assume the nonconstancy in variance is due to the predictor.

■ Humans have a more difficult time counting geese consistently as the number observed gets very large.

◆ Because we believe that the nonconstancy in variability is caused by our predictor variable, we will construct $\mathbf{Z}$ from $X$.

◆ Note that $X$ includes a column of ones for the intercept.


Nonconstant Variance Detection Test: Steps

1. Estimate the regression line for the original model ($Y' = a + bX$), and save the unstandardized residual for each observation:

$e_i = Y_i - (a + bX_i)$.

2. For each observation, compute the scaled squared residual, $u_i$:

$u_i = \dfrac{e_i^2}{\hat{\sigma}^2_e}$,

where $\hat{\sigma}^2_e$ is the ML estimate of $\sigma^2_e$ (it differs from $MS_{error}$ because its denominator is $N$ rather than $N - k - 1$):

$\hat{\sigma}^2_e = \dfrac{\sum_{i=1}^{N} e_i^2}{N}$


Nonconstant Variance Detection Test: Steps

3. Compute the regression of $u_i$ onto $\mathbf{z}_i$.

■ Obtain $SS_{reg}$ from this regression.

■ Obtain $df_{reg}$, the number of predictors in $\mathbf{Z}$ (not including the intercept).

4. Compute the score statistic (using $SS_{reg}$ from step 3):

$S = \dfrac{SS_{reg}}{2}$

5. Test the hypothesis that $\boldsymbol{\lambda} = \mathbf{0}$ by obtaining a p-value for $S$, which under the null hypothesis is approximately distributed $\chi^2(df_{reg})$, where $df_{reg}$ is from step 3.
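Steps 1 through 5 can be collected into a single function. This is a minimal sketch assuming NumPy and SciPy; the function name is hypothetical, the p-value uses SciPy's chi-square survival function, and `Z` is passed without its intercept column (the intercept is added inside, so `df` counts only the non-intercept predictors as in step 3). The demonstration data are simulated, not the geese counts.

```python
import numpy as np
from scipy.stats import chi2

def score_test_nonconstant_variance(y, X, Z):
    """Score test for nonconstant variance, following steps 1-5.

    y : (N,) response
    X : (N, p) design matrix of the original model, including a column of ones
    Z : (N,) or (N, q) variables suspected of driving the variance, WITHOUT an intercept
    Returns (S, df, p_value).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    n = y.size
    # Step 1: fit the original regression and keep the unstandardized residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef
    # Step 2: scaled squared residuals, using the ML variance estimate (denominator N).
    sigma2_ml = np.sum(e ** 2) / n
    u = e ** 2 / sigma2_ml
    # Step 3: regress u on Z (plus an intercept) and take the regression sum of squares.
    Z1 = np.column_stack([np.ones(n), Z])
    g, *_ = np.linalg.lstsq(Z1, u, rcond=None)
    ss_reg = np.sum((Z1 @ g - u.mean()) ** 2)
    df = Z.shape[1]
    # Steps 4-5: S = SS_reg / 2, referred to a chi-square distribution on df degrees of freedom.
    S = ss_reg / 2.0
    return S, df, chi2.sf(S, df)

# Hypothetical data whose error SD grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=500)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.05 * x)
X = np.column_stack([np.ones_like(x), x])
S, df, p = score_test_nonconstant_variance(y, X, x)
print(f"S = {S:.2f}, df = {df}, p = {p:.3g}")
```

Adding the intercept inside the function keeps the caller from accidentally counting it in `df`.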


From Our Example

1. The original regression estimates were found from SPSS: Analyze...Regression...Linear.

■ Save the unstandardized residuals from the Save button menu.

2. For each observation, compute the scaled squared residuals, $u_i = e_i^2 / \hat{\sigma}^2_e$:

■ Compute $\hat{\sigma}^2_e = \sum_{i=1}^{N} e_i^2 / N$.

◆ In SPSS: Transform...Compute, and make a new variable that is the squared value of the unstandardized residual.

◆ Then find the average of the new variable from Analyze...Descriptive Statistics...Descriptives.

◆ Alternative: take SSres from the step 1 output and divide by $N$ (here, $84{,}790.18 / 45$).

◆ $\hat{\sigma}^2_e = 1{,}884.23$.

■ Compute $u_i$ in SPSS by going to Transform...Compute.


From Our Example

3. Compute the regression of $u_i$ onto $\mathbf{z}_i$.

■ In SPSS: Analyze...Regression...Linear.

4. Compute the score statistic $S = SS_{reg}/2$, using $SS_{reg}$ from step 3 (taken from the SPSS output).

■ $S = 162.82/2 = 81.41$

■ This has $df_{reg} = 1$.

5. Get the p-value for $S$.

■ In Excel, type “=chidist(81.41,1)”.

■ p < 0.001.

Based on these results, we reject the null hypothesis of constant variance and conclude that our example violates the constant variance assumption.
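The arithmetic in this example can be re-checked outside SPSS and Excel. The short Python sketch below (SciPy assumed) uses only numbers from the slides: $SS_{res}$ from the original regression, $N = 45$ flocks, and $SS_{reg} = 162.82$ from the step 3 regression. SciPy's `chi2.sf` gives the right-tail probability, which is the same quantity Excel's CHIDIST returns.

```python
from scipy.stats import chi2

n = 45                       # flocks in the snow geese study
ss_res = 84_790.18           # SS_res from the original regression
sigma2_ml = ss_res / n       # ML estimate of the error variance
S = 162.82 / 2               # score statistic: SS_reg from the regression of u on z, halved
p = chi2.sf(S, df=1)         # right-tail chi-square probability with 1 df
print(f"sigma2_ml = {sigma2_ml:.2f}, S = {S:.2f}, p = {p:.3g}")
```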


Nonconstant Variance...Now What?

■ We found in our example that we have statistical evidence for nonconstant variance.

■ The biggest consequence of nonconstant variance is that our regression line does not accurately represent all cases in our sample.

■ Another problem is that the hypothesis tests we use are based on the assumption that $e_i \sim N(0, \sigma^2_e)$.

■ When nonconstant variance is found, two options are possible:

1. Estimate the regression using alternate methods.

◆ Weighted least squares.

◆ Median regression.

2. Transform either the response or the predictors.


Remedy #1: Alternate Estimation Algorithms

■ Much as the mean is extremely sensitive to highly skewed (outlying) observations, least squares estimates have what is called a low “breakdown” point: a single extreme observation can pull them arbitrarily far.

■ Instead of finding regression parameters that minimize

$\sum_{i=1}^{N} (Y_i - Y_i')^2$,

alternative optimization criteria exist.


Alternate Estimation Algorithms

■ Two possible alternatives:

◆ Weighted least squares (WLS), which minimizes

$\sum_{i=1}^{N} w_i (Y_i - Y_i')^2$

■ Can be performed in SPSS.

◆ Minimum absolute deviation, which minimizes

$\sum_{i=1}^{N} |Y_i - Y_i'|$

■ Much more technical.

■ Usually solved as a linear program (for example, with the simplex method).
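Both criteria can be sketched in a few lines of Python (NumPy and SciPy assumed; the function names `wls` and `lad` are hypothetical). WLS reduces to ordinary least squares after multiplying each row by $\sqrt{w_i}$; weights are commonly chosen as $w_i = 1/\text{var}(e_i)$ when that variance can be modeled. Minimum absolute deviation (the same criterion as median regression) is written as a linear program, splitting each residual into nonnegative parts $Y_i - Y_i' = u_i - v_i$ and minimizing $\sum (u_i + v_i)$; here SciPy's HiGHS solver stands in for the simplex method mentioned above.

```python
import numpy as np
from scipy.optimize import linprog

def wls(x, y, w):
    """Weighted least squares for Y' = a + bX: minimizes sum w_i (y_i - a - b x_i)^2."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Scaling each row by sqrt(w_i) turns the weighted problem into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef  # array([a, b])

def lad(x, y):
    """Minimum absolute deviation fit of Y' = a + bX, posed as a linear program."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    # Variables [a, b, u_1..u_n, v_1..v_n] with X @ [a, b] + u - v = y and u, v >= 0,
    # so each residual equals u_i - v_i and the objective sum(u + v) is sum |residual|.
    c = np.concatenate([np.zeros(2), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * 2 + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:2]  # array([a, b])

# Hypothetical data on the line y = 2 + 3x, with one gross outlier added at the end.
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x
y[-1] += 100.0
print("OLS:", wls(x, y, np.ones_like(x)))   # equal weights reduce WLS to OLS
print("LAD:", lad(x, y))
```

With equal weights the least squares slope is pulled well above 3 by the single outlier, while the absolute deviation fit stays on the line through the other nine points, which illustrates the breakdown-point contrast described above.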


Remedy #2: Variance Stabilizing Transformations

■ The second (and perhaps most commonly used) remedy for nonconstant variance is to transform the response variable $Y$.

(Weisberg, 1985; p. 134)

Transformation    Situation                         Reason
√Y                var(e_i) ∝ E(Y_i)                 Poisson counts
√Y + √(Y + 1)     var(e_i) ∝ E(Y_i)                 Poisson, small Y values
ln(Y)             var(e_i) ∝ [E(Y_i)]²              Broad range of Y
ln(Y + 1)         var(e_i) ∝ [E(Y_i)]²              Some Y are zero
1/Y               var(e_i) ∝ [E(Y_i)]⁴              Y bunched near zero
1/(Y + 1)         var(e_i) ∝ [E(Y_i)]⁴              Some Y are zero
sin⁻¹(√Y)         var(e_i) ∝ E(Y_i)(1 − E(Y_i))     Binomial proportions
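The table translates directly into code. The Python sketch below (NumPy assumed) stores each transformation under a hypothetical name and then checks the ln(Y) row on simulated data where the error SD is proportional to the mean, so that var(e_i) ∝ [E(Y_i)]². The simulation settings are made up for illustration.

```python
import numpy as np

# Response transformations from the variance-stabilizing table (names are hypothetical).
TRANSFORMS = {
    "sqrt": np.sqrt,                                       # var(e) ∝ E(Y): Poisson counts
    "sqrt_pair": lambda y: np.sqrt(y) + np.sqrt(y + 1.0),  # Poisson, small Y values
    "log": np.log,                                         # var(e) ∝ E(Y)^2: broad range of Y
    "log_plus_one": lambda y: np.log(y + 1.0),             # same situation, some Y are zero
    "reciprocal": lambda y: 1.0 / y,                       # var(e) ∝ E(Y)^4: Y bunched near zero
    "reciprocal_plus_one": lambda y: 1.0 / (y + 1.0),      # same situation, some Y are zero
    "arcsine_sqrt": lambda y: np.arcsin(np.sqrt(y)),       # binomial proportions, 0 <= Y <= 1
}

def group_sds(y, groups):
    """Sample standard deviation of y within each group (groups in sorted order)."""
    return np.array([y[groups == g].std(ddof=1) for g in np.unique(groups)])

# Hypothetical data whose error SD is proportional to the mean, so var(e) ∝ E(Y)^2.
rng = np.random.default_rng(0)
means = np.repeat([10.0, 100.0, 1000.0], 2000)
y = means * np.exp(rng.normal(0.0, 0.1, size=means.size))
raw_sds = group_sds(y, means)
log_sds = group_sds(TRANSFORMS["log"](y), means)
print("raw SDs:", np.round(raw_sds, 2))
print("log SDs:", np.round(log_sds, 3))
```

On the raw scale the SD grows roughly a hundredfold from the smallest to the largest group; after the log transformation the three SDs are nearly equal.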


Nonlinear Relationship Between Predictor(s) and Y

■ Situations occur where a nonlinear relationship between the predictors and $Y$ is present.

■ For example, imagine that the true relationship between $Y$ and $X$ is something like:

$Y = aX^b$

■ To use the linear regression techniques we are familiar with, this function must be transformed (both $Y$ and $X$):

$\ln(Y) = \ln(a) + b\ln(X)$
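As a minimal sketch of this linearization (NumPy assumed; the function name and the noise-free data are hypothetical), fitting a straight line to $(\ln X, \ln Y)$ recovers $b$ as the slope and $a$ as $e^{\text{intercept}}$:

```python
import numpy as np

def fit_power_model(x, y):
    """Fit Y = a * X**b by least squares on the linearized model ln(Y) = ln(a) + b ln(X).

    Requires x > 0 and y > 0 so that both logarithms are defined.
    """
    b, ln_a = np.polyfit(np.log(x), np.log(y), deg=1)   # polyfit returns [slope, intercept]
    return np.exp(ln_a), b

# Hypothetical noise-free data generated from Y = 2 X^1.5.
x = np.linspace(1.0, 20.0, 30)
a_hat, b_hat = fit_power_model(x, 2.0 * x ** 1.5)
print(round(a_hat, 6), round(b_hat, 6))
```

Least squares on the log scale corresponds to a multiplicative error, $Y = aX^b\varepsilon$; if the error is additive on the original scale, the logged model will not have the error structure linear regression assumes.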


Nonlinear Relationship Between Predictor(s) and Y

■ Depending on the situation, not all functions can be made linear, for example:

$Y = a_1 e^{b_1 X_1} + a_2 e^{b_2 X_2}$

■ Furthermore, depending on how the error enters the model (multiplicatively or additively), a transformation may not produce errors that satisfy the distributional assumptions of linear regression.

■ Linear regression can only go so far; if the data follow a known nonlinear functional relationship, other methods may be better suited.


Detection of Nonlinearity

■ Often, nonlinearity is detected visually, through the use of residual plots.

[Figure: standardized residual (vertical axis, −1 to 1) against unstandardized predicted value (horizontal axis, 5 to 7)]


Possible Remedy for Nonlinearity

■ As discussed, a possible remedy for nonlinearity is to transform both $Y$ and $X$.

(Weisberg, 1985; p. 142)

Y transformation   X transformation   Form
ln(Y)              ln(X)              Y = a·X1^b1·X2^b2·…·Xk^bk
ln(Y)              X                  Y = a·exp(b1X1 + b2X2 + … + bkXk)
Y                  ln(X)              Y = a + b1 ln(X1) + b2 ln(X2) + … + bk ln(Xk)
1/Y                1/X                Y = 1 / [a + (b1/X1) + (b2/X2) + … + (bk/Xk)]
1/Y                X                  Y = 1 / [a + b1X1 + b2X2 + … + bkXk]
Y                  1/X                Y = a + b1(1/X1) + b2(1/X2) + … + bk(1/Xk)


Parameterizing Transformations

■ Instead of choosing a transformation function seemingly arbitrarily, statistical techniques have been developed to transform both $Y$ and the set of all predictors $X$ within known families of functions.

■ These techniques bear mention because from time to time you will encounter estimates based on them.

■ Furthermore, a clear functional relationship between $Y$ and $X$ may not be known, either from substantive theory or from empirical results.

■ In these situations, linear methods are often easier to rely upon because of their parsimony (real or perceived).


Transformation of Y : Optimization

■ Consider the family of regression models (power models):

$\mathbf{y}^{\lambda} = \mathbf{X}\mathbf{b} + \mathbf{e}$.

■ Finding $\lambda$ gives an idea of the power relationship between the response and the predictor variables.

■ Such transformations are often referred to as Box-Cox transformations, and involve iterative techniques to find the most likely value of $\lambda$.

■ Another transformation method is called Atkinson’s score method.
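A minimal sketch of the Box-Cox search, assuming NumPy and using a simple grid search in place of the iterative optimizers the slide mentions. It assumes the standard scaled form $z = (y^{\lambda} - 1)/(\lambda \dot{y}^{\lambda - 1})$, with $\dot{y}$ the geometric mean of $y$ (and $z = \dot{y}\ln y$ at $\lambda = 0$); with this scaling, the $\lambda$ that minimizes the residual sum of squares of $z$ regressed on $\mathbf{X}$ maximizes the profile likelihood. Function names, the grid, and the data are hypothetical, and Atkinson's score method is not shown. (SciPy's `scipy.stats.boxcox` also estimates $\lambda$, but for $y$ alone, without the regressors.)

```python
import numpy as np

def boxcox_rss(y, X, lambdas):
    """RSS of the regression of the scaled Box-Cox transform of y on X, for each lambda.

    X must already include a column of ones, and y must be strictly positive.
    """
    gm = np.exp(np.mean(np.log(y)))                      # geometric mean of y
    rss = []
    for lam in lambdas:
        if abs(lam) < 1e-12:
            z = gm * np.log(y)                           # limiting case lambda -> 0
        else:
            z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ coef
        rss.append(resid @ resid)
    return np.array(rss)

def boxcox_lambda(y, X, lambdas=np.linspace(-2.0, 2.0, 81)):
    """Grid-search estimate of lambda: the grid value with the smallest RSS."""
    return lambdas[np.argmin(boxcox_rss(y, X, lambdas))]

# Hypothetical data where sqrt(y) is exactly linear in x, so lambda = 0.5 is expected.
x = np.linspace(1.0, 10.0, 50)
y = (1.0 + 2.0 * x) ** 2
X = np.column_stack([np.ones_like(x), x])
print(boxcox_lambda(y, X))
```

The geometric-mean scaling matters: without it, sums of squares computed on different power scales are not comparable across values of $\lambda$.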


Transformation of X: Optimization

■ Similar methods have been developed for transforming X.

■ Transforming X is inherently more difficult.

■ The outcome can often be absurd.

■ Many times, nonsignificant relationships obscure any needed transformations.


Nonnormality

■ The last assumption to check is the normality of the residuals.

■ Detecting nonnormality can be tricky, and depends heavily on sample size.

■ Statistically speaking, no hypothesis test can conclude that a variable is normally distributed; a test can only fail to reject normality.

■ Violations of this assumption lead to inaccurate p-values in hypothesis tests (only).

■ The p-values, however, are fairly robust to violations of this assumption.


Detection of Nonnormality: Probability Plots

■ The easiest way to detect nonnormal errors is to use a Q-Q plot.

■ A Q-Q plot plots the ordered values of a variable against the values expected for an ordered sample of size N from a normal distribution.

■ In SPSS: Graphs...Q-Q (check Standardize values and be sure Test distribution is Normal).

■ If the data are normal, the points should fall close to the reference line drawn on the plot.

■ If they depart systematically from the line, the data are probably not from a normal distribution.
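The coordinates of a normal Q-Q plot can be sketched as follows (NumPy and SciPy assumed; the function name and residuals are hypothetical). The theoretical quantiles use Blom's plotting positions $(r - 3/8)/(N + 1/4)$, one common choice; SPSS offers several such formulas, so its points may differ slightly.

```python
import numpy as np
from scipy.stats import norm

def normal_qq_points(resid):
    """Return (theoretical, observed) coordinates for a normal Q-Q plot.

    observed: standardized, sorted residuals.
    theoretical: standard normal quantiles at Blom's positions (r - 3/8) / (N + 1/4).
    """
    r = np.asarray(resid, dtype=float)
    observed = np.sort((r - r.mean()) / r.std(ddof=1))
    n = r.size
    ranks = np.arange(1, n + 1)
    theoretical = norm.ppf((ranks - 0.375) / (n + 0.25))
    return theoretical, observed

# Hypothetical residuals; pass (theo, obs) to any plotting library as x and y.
theo, obs = normal_qq_points([0.3, -1.2, 0.8, 2.1, -0.4, -0.9, 0.1])
print(np.round(theo, 3))
print(np.round(obs, 3))
```

Because both coordinates are on the standardized scale, normal data should scatter around the 45-degree line through the origin.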


[Figure: Detection of Nonnormality: Probability Plots]


Detection of Nonnormality: Hypothesis Tests

■ Additionally, statistical hypothesis tests have been developed to test the null hypothesis that the data (in this case, the residuals) came from a normal distribution.

◆ Shapiro-Wilk test.

◆ Kolmogorov-Smirnov test.

■ In SPSS: save the unstandardized residuals and go to Analyze...Descriptive Statistics...Explore.

◆ Put the residuals in the Dependent List box.

◆ Click on Plots and check “Normality plots with tests”.

■ If the p-value is less than some α, reject the null hypothesis: the residuals are not from a normal distribution.

■ These tests have little validity in small samples.
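Outside SPSS, the Shapiro-Wilk test is available as `scipy.stats.shapiro`. The sketch below wraps it with a decision at level α; the wrapper name and the skewed example residuals are hypothetical. A plain `scipy.stats.kstest(resid, "norm")` compares against a standard normal with fixed parameters; estimating the mean and SD from the same residuals makes its p-values too large, which is why a Lilliefors-corrected version is usually reported instead.

```python
import numpy as np
from scipy import stats

def shapiro_wilk(resid, alpha=0.05):
    """Shapiro-Wilk test of H0: the residuals come from a normal distribution.

    Returns (W, p_value, reject), where reject is True when p_value < alpha.
    """
    w, p = stats.shapiro(np.asarray(resid, dtype=float))
    return float(w), float(p), bool(p < alpha)

# Hypothetical residuals: a clearly skewed sample (exponential draws).
rng = np.random.default_rng(1)
w, p, reject = shapiro_wilk(rng.exponential(size=200))
print(f"W = {w:.3f}, p = {p:.2g}, reject normality: {reject}")
```

A large p-value means only that the test failed to detect nonnormality, not that the residuals are normal.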


Test for Correlated Errors

■ Finally, there is a hypothesis test for serial correlation (seriation effects) in the residuals.

■ The Durbin-Watson statistic tests for correlation between the residuals of adjacent observations.

■ Really, this test is only valid if observations were made at equal time intervals.

■ In SPSS: Analyze...Regression...Linear.

◆ Click on the Statistics box.

◆ Under Residuals check Durbin-Watson.
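The Durbin-Watson statistic is simple enough to compute by hand. This NumPy sketch (function name hypothetical) assumes the residuals are listed in time order, as the equal-interval caveat above requires; `statsmodels.stats.stattools.durbin_watson` provides an equivalent function.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in time (observation) order.

    d = sum_{t=2}^{N} (e_t - e_{t-1})^2 / sum_{t=1}^{N} e_t^2.
    d near 2 suggests no first-order autocorrelation; values toward 0 suggest
    positive, and values toward 4 negative, autocorrelation.
    """
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))             # 3.0: alternating signs
print(durbin_watson([1.0, 0.8, 0.5, -0.2, -0.6, -1.0]))  # slowly drifting residuals: well below 2
```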


Here it is: Your Moment of Zen

■ Regression diagnostics are important parts of an analysis that must not be overlooked.

■ Oftentimes, inferences can be wrong if the assumptions of the regression have not been met.

■ Transformations can take forever, and may not get you closer to a good result with respect to assumptions.

■ Listen to your data, they are trying to tell you something.


Next Time

■ Bringing it all together: Case studies in regression.