Regression Models Residuals and Diagnosing the Quality of a Model

Visualizing Regression Models

Collinearity

An Omitted Variable?

Models

• A Model: A statement of the relationship between a phenomenon to be explained and the factors, or variables, which explain it.

• Steps in the Process of Quantitative Analysis:
  – Specification of the model
  – Estimation of the model
  – Evaluation of the model

Thus far…

• We’ve discussed:
  – The specification of a model,
  – The estimation of a model and how to read and interpret the statistics we’ve produced: coefficients, t tests, F tests, and R Square.

• Now we need to evaluate the model for problems and further elaboration.

We need to evaluate

• The variation in the predicted values, and the difference between each observed value Yi and its predicted value Ŷi. That difference, Yi − Ŷi, is called a “residual.”

• We can analyze the residuals to see how good the equation is, and whether there are problems with the model that need correction or improvement.
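A minimal sketch of computing residuals in the bivariate case, using ordinary least squares on a small hypothetical data set (the numbers are invented for illustration). It also shows a property worth knowing when analyzing residuals: when the model includes a constant, the least-squares residuals sum to zero.

```python
# Minimal sketch: fit a bivariate least-squares line and compute residuals.
# The data below are hypothetical, chosen only for illustration.

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Residual for each case: observed Y minus predicted Y."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [12.0, 19.0, 29.0, 37.0, 45.0]
res = residuals(xs, ys)
print([round(r, 2) for r in res])
# With a constant in the model, the residuals sum to zero up to rounding error.
print(abs(sum(res)) < 1e-9)
```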

More statistics…

• Standard Error of the Estimate (SEE): the square root of the average squared error of prediction, used as a measure of the accuracy of prediction (pp. 281 and 340 in the text).

• For the population:

• For the sample:
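For the bivariate case, a common textbook form of these two quantities is shown here, where Ŷ is the predicted value of Y and N is the number of cases. This is a reference sketch; the text's own notation may differ.

```latex
\sigma_{\text{est}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N}}
\qquad \text{(population)}

s_{\text{est}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}
\qquad \text{(sample, one independent variable)}
```

With k independent variables the sample denominator is usually N − k − 1. For the bivariate housing-value model in this section, N − 2 = 465 and √(193878.246 / 465) ≈ 20.419, which agrees with the reported standard error of estimate.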

Standard Error of the Estimate

• Used to calculate a confidence interval around the predicted y.

• As a rule of thumb, multiply the SEE by 2, then add this amount to and subtract it from each predicted Y to obtain an approximate 95% interval for the prediction.

• At the mean of the independent variable, the standard error of the mean prediction is SEE/√n.
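A sketch applying these points to the bivariate housing-value output that appears later in this section (constant -8.667, NEWSIZE coefficient 25.381, residual sum of squares 193878.246 on 465 df, N = 467). The NEWSIZE value of 2.0 is an arbitrary, hypothetical input, and the ±2·SEE band is only the slide's rough approximation.

```python
import math

# Figures copied from the bivariate output (NEWVAL regressed on NEWSIZE).
SS_RESIDUAL = 193878.246
DF_RESIDUAL = 465          # n - 2 for one independent variable
N = 467
A, B = -8.667, 25.381      # constant and NEWSIZE coefficient

# Standard error of the estimate: square root of the residual mean square.
see = math.sqrt(SS_RESIDUAL / DF_RESIDUAL)


def rough_95_interval(newsize):
    """Rule of thumb from the slide: predicted Y plus or minus 2 * SEE."""
    y_hat = A + B * newsize
    return y_hat - 2 * see, y_hat + 2 * see


# Standard error of the mean prediction at the mean of the independent variable.
se_at_mean = see / math.sqrt(N)

print(round(see, 3))
print(round(se_at_mean, 3))
# NEWSIZE = 2.0 is a hypothetical input chosen for illustration.
print(tuple(round(v, 2) for v in rough_95_interval(2.0)))
```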

Hypothetical Example

[Figure: scatterplot of Y against X with a regression line; for an observed value of 55, the predicted value is 48.8, so the residual is 55 - 48.8 = 6.2.]

Example from last week….

Newval = a + b1(Newsize) + b2(Families) + b3(Eastside) + b4(South)

Dep Var: NEWVAL   N: 467   Multiple R: 0.75   Squared multiple R: 0.56
Adjusted squared multiple R: 0.55   Standard error of estimate: 19.61

Effect     Coefficient   Std Error   Std Coef   Tolerance       t   P(2 Tail)
CONSTANT         -3.32        2.95       0.00           .   -1.13        0.26
NEWSIZE          23.60        1.32       0.67        0.68   17.88        0.00
FAMILIES         -5.27        2.15      -0.08        0.87   -2.46        0.01
EASTSIDE         14.06        2.53       0.20        0.78    5.56        0.00
SOUTH             6.08        2.75       0.08        0.81    2.21        0.03
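Each t in this output is the coefficient divided by its standard error. The sketch below recomputes t from the printed values and approximates the two-tailed p-value with the standard normal distribution. That is an assumption: the package presumably uses the t distribution with 467 − 4 − 1 = 462 residual degrees of freedom, which is very close to normal at that size.

```python
import math

# (coefficient, standard error) pairs copied from the four-predictor output.
coefficients = {
    "CONSTANT": (-3.32, 2.95),
    "NEWSIZE": (23.60, 1.32),
    "FAMILIES": (-5.27, 2.15),
    "EASTSIDE": (14.06, 2.53),
    "SOUTH": (6.08, 2.75),
}


def two_tailed_p_normal(t):
    """Two-tailed p-value for t under a standard normal approximation.

    With 462 residual degrees of freedom the t distribution is nearly
    normal, so this only approximates the reported P(2 Tail).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


for name, (coef, se) in coefficients.items():
    t = coef / se
    print(f"{name:9s} t = {t:6.2f}  p = {two_tailed_p_normal(t):.2f}")
```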

To understand the principles, let’s simplify….

• We return to the bivariate case:
  – House value is a function of the size of the building.
• Regression models assume that the errors of prediction are homoscedastic, not autocorrelated, normally distributed, and not correlated with the independent variables.
• That is, the error term should be noise.
• Now we ask:
  – 1. How accurate is our prediction?
  – 2. What are the characteristics of the residuals, or the error term?
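A rough sketch of checking some of these assumptions on residuals, using hypothetical data in which the scatter grows with X. The checks are illustrative choices, not the text's method: the correlation of residuals with X (zero by construction for least squares, so it cannot reveal problems with the included X), the Durbin-Watson statistic for autocorrelation (meaningful only when the cases have a natural order), and a crude ratio of residual spread in the upper versus lower half of X for heteroscedasticity.

```python
import math

# Hypothetical data in which the scatter around the line grows with X.
XS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
YS = [3.1, 4.8, 7.4, 8.1, 12.5, 10.9, 16.8, 13.2]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(us, vs):
    """Pearson correlation between two equal-length sequences."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)


def durbin_watson(res):
    """Durbin-Watson statistic: near 2 when successive residuals are uncorrelated."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    return num / sum(e * e for e in res)


def spread_ratio(xs, res):
    """Mean squared residual for the upper half of X over that for the lower half.

    Values far from 1 hint at heteroscedasticity (a crude check only).
    """
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return (sum(e * e for e in high) / len(high)) / (sum(e * e for e in low) / len(low))


a, b = fit_line(XS, YS)
res = [y - (a + b * x) for x, y in zip(XS, YS)]
print("corr(residuals, X):", round(correlation(res, XS), 6))
print("Durbin-Watson:", round(durbin_watson(res), 2))
print("spread ratio (high X / low X):", round(spread_ratio(XS, res), 2))
```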

Model of Housing Values and Building Size

Dep Var: NEWVAL   N: 467   Multiple R: 0.719   Squared multiple R: 0.517
Adjusted squared multiple R: 0.516   Standard error of estimate: 20.419

Effect     Coefficient   Std Error   Std Coef   Tolerance        t   P(2 Tail)
CONSTANT        -8.667       2.012      0.000           .   -4.307       0.000
NEWSIZE         25.381       1.138      0.719       1.000   22.312       0.000

Analysis of Variance

Source        Sum-of-Squares    df   Mean-Square   F-ratio       P
Regression        207571.306     1    207571.306   497.842   0.000
Residual          193878.246   465       416.942
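The summary statistics at the top of this output can be recovered from the Analysis of Variance table. A short sketch using the printed sums of squares; the formulas are the standard ones for R square, adjusted R square, and the F-ratio.

```python
import math

# Values copied from the Analysis of Variance table above.
SS_REG = 207571.306
SS_RES = 193878.246
DF_REG = 1
DF_RES = 465
N = 467
K = 1  # number of independent variables

ss_total = SS_REG + SS_RES
r_squared = SS_REG / ss_total                                   # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (N - 1) / (N - K - 1)     # penalized for predictors
multiple_r = math.sqrt(r_squared)
f_ratio = (SS_REG / DF_REG) / (SS_RES / DF_RES)                 # MS regression / MS residual

print(round(multiple_r, 3), round(r_squared, 3), round(adj_r_squared, 3), round(f_ratio, 1))
```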

Scatterplot of Newsize and Newval

[Figure: two scatterplots of NEWVAL (vertical axis, 0 to 500) against NEWSIZE (horizontal axis, 0 to 7).]

Scatterplot, cont.

[Figure: two scatterplots of NEWVAL (0 to 500) against NEWSIZE (0 to 7); the second panel is annotated Rsq = 0.5171.]

95% Confidence Intervals for Mean Predictions of Y (left) and Individual Predictions of Y (right)

[Figure: two plots of NEWVAL (0 to 500) against NEWSIZE (0 to 7), each annotated Rsq = 0.5171, showing 95% confidence intervals for mean predictions of Y (left) and for individual predictions of Y (right).]
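The two intervals differ because an individual case scatters around the regression line in addition to the uncertainty in the line itself, and both widen as X moves away from its mean. A minimal sketch of the usual bivariate formulas on hypothetical data, with the normal value 1.96 standing in for the t critical value. That substitution is an assumption that is reasonable only for large n; with the five hypothetical cases used here it understates the widths.

```python
import math


def prediction_bands(xs, ys, x0, z=1.96):
    """Approximate 95% intervals at x0 for the mean of Y and for an individual Y.

    Uses the normal critical value z instead of the t critical value,
    which is a reasonable approximation only when n is large.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    see = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = a + b * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = see * math.sqrt(leverage)         # equals SEE/sqrt(n) when x0 is the mean of X
    se_indiv = see * math.sqrt(1.0 + leverage)  # adds the scatter of a single case
    mean_ci = (y_hat - z * se_mean, y_hat + z * se_mean)
    indiv_pi = (y_hat - z * se_indiv, y_hat + z * se_indiv)
    return y_hat, mean_ci, indiv_pi


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [12.0, 19.0, 29.0, 37.0, 45.0]
for x0 in (3.0, 5.0):
    y_hat, mean_ci, indiv_pi = prediction_bands(xs, ys, x0)
    print(x0, round(y_hat, 2),
          "mean width:", round(mean_ci[1] - mean_ci[0], 2),
          "individual width:", round(indiv_pi[1] - indiv_pi[0], 2))
```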

Analysis of Residuals

Statistic        ESTIMATE      NEWVAL     RESIDUAL
N of cases            467         467          467
Minimum            -2.647       6.400      -56.140
Maximum           157.129     399.600      242.471
Range             159.777     393.200      298.611
Sum             14463.200   14463.200        0.000
Median             25.391      24.000       -0.092
Mean               30.970      30.970        0.000
95% CI Upper       32.963      33.639        1.775
95% CI Lower       28.977      28.301       -1.775
Std. Error          1.014       1.358        0.903
Standard Dev       21.917      29.351       19.522
Variance          480.353     861.480      381.127
C.V.                0.708       0.948  9.54775E+14
Skewness(G1)        1.337       6.756        7.030
SE Skewness         0.113       0.113        0.113
Kurtosis(G2)        2.875      67.925       79.001
SE Kurtosis         0.225       0.225        0.225
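The skewness and kurtosis rows can be read against their standard errors. A sketch assuming the package uses the common sample-size-adjusted estimators G1 and G2 and their usual standard errors; for N = 467 those standard errors come out at about 0.113 and 0.225, matching the table. Dividing the reported residual skewness and kurtosis by them gives ratios far above 2, which suggests the residuals are strongly right-skewed and heavy-tailed rather than normal.

```python
import math


def se_skewness(n):
    """Standard error of the adjusted skewness G1 for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the adjusted excess kurtosis G2 for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def adjusted_skew_kurt(values):
    """Return (G1, G2): sample-size-adjusted skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    big_g1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    big_g2 = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return big_g1, big_g2


# Reported residual statistics for N = 467, copied from the table above.
n = 467
skew_ratio = 7.030 / se_skewness(n)
kurt_ratio = 79.001 / se_kurtosis(n)
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
print(round(skew_ratio, 1), round(kurt_ratio, 1))
```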

Visualizing Regression Models

Collinearity

An Omitted Variable?