
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 22 Regression Diagnostics


22.1 Changing Variation

Although regression analysis allows the use of prices of different-size homes to estimate the price of a home of a specific size, prices tend to be more variable for larger homes. How does this affect the simple regression model (SRM)?

Consider how to recognize and fix three potential problems affecting regression models: changing variation in the data, outliers, and dependence among observations.


Price ($000) vs. Home Size (Sq. Ft.)

Both the average and standard deviation in price increase as home size increases.


SRM Results: Home Price Example


Fixed Costs, Marginal Costs, and Variable Costs

The estimated intercept (50.599, in thousands of dollars) can be interpreted as the fixed cost of a home, about $50,600.

The 95% confidence interval for the intercept (after rounding) is -$4,000 to $105,000.

Since it includes zero, this interval is not a precise estimate of fixed costs.


The slope (0.159, in thousands of dollars per square foot) estimates the marginal cost of an additional square foot of space, about $159.

Scaled to 1,000 square feet, the 95% confidence interval for the slope (after rounding) is $135,000 to $183,500.

It can be interpreted as the average difference in home price associated with a difference of 1,000 square feet in size.


Detecting Differences in Variation

Based on the scatterplot, the association between home price and size appears linear.

There is little concern about lurking variables because the sample of homes is from the same neighborhood.

The similar variances condition is not satisfied.


Fan-shaped appearance of residual plot indicates changing variances.


Side-by-side boxplots confirm that variances increase as home size increases.
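
The residual check described above can be sketched numerically. The following is a minimal illustration on made-up data, not the Seattle sample: the helper names `fit_line` and `residual_sd_by_group` and all the numbers are assumptions for the sketch. It fits a least squares line, orders the residuals by home size, and compares their standard deviations across three size groups, which is the numerical counterpart of the side-by-side boxplots.

```python
import random
import statistics

def fit_line(x, y):
    """Return the least squares (intercept, slope) for y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_sd_by_group(x, y, groups=3):
    """SD of the residuals within equal-count groups ordered by x."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in sorted(zip(x, y))]
    size = len(resid) // groups
    return [statistics.stdev(resid[g * size:(g + 1) * size])
            for g in range(groups)]

# Hypothetical data (an assumption): price in $000, with an error SD
# that grows in proportion to home size.
rng = random.Random(0)
sqft = [rng.uniform(800, 3500) for _ in range(300)]
price = [50 + 0.16 * s + rng.gauss(0, 0.03 * s) for s in sqft]

sds = residual_sd_by_group(sqft, price)
print([round(sd, 1) for sd in sds])  # SDs for small, medium, large homes
```

If the similar variances condition held, the three standard deviations would be roughly equal; a steady increase from the first group to the last is the fan shape expressed in numbers.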


Heteroscedastic: errors that have different amounts of variation.

Homoscedastic: errors having equal amounts of variation.


Consequences of Different Variation

Prediction intervals are too narrow or too wide.

Confidence intervals for the slope and intercept are not reliable.

Hypothesis tests regarding β0 and β1 are not reliable.


The 95% prediction intervals are too wide for small homes and too narrow for large homes.


Fixing the Problem: Revise the Model

If F represents fixed cost and M marginal costs, the equation of the SRM becomes

$$\text{Price} = F + M \times \text{SqFt} + \varepsilon$$


Divide both sides of the equation by the number of square feet and simplify:

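$$\frac{\text{Price}}{\text{SqFt}} = \frac{F}{\text{SqFt}} + M + \frac{\varepsilon}{\text{SqFt}} = M + F \times \frac{1}{\text{SqFt}} + \varepsilon'$$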


The response variable becomes price per square foot and the explanatory variable becomes the reciprocal of the number of square feet.

The marginal cost M is the intercept and the slope is F, the fixed cost.

The residuals have similar variances.
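
The revised model can be tried out on simulated data. This is a sketch under stated assumptions: the true values `F_TRUE` and `M_TRUE`, the error structure (SD proportional to size), and the helper `fit_line` are all hypothetical and chosen only for illustration. Regressing price per square foot on the reciprocal of size should return an intercept near M and a slope near F.

```python
import random
import statistics

def fit_line(x, y):
    """Return the least squares (intercept, slope) for y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Assumed true costs, in $000: fixed cost F and marginal cost M per sq ft.
F_TRUE, M_TRUE = 50.0, 0.16

# Hypothetical homes whose error SD is proportional to size, so that
# dividing by size leaves errors with a constant SD.
rng = random.Random(1)
sqft = [rng.uniform(800, 3500) for _ in range(300)]
price = [F_TRUE + M_TRUE * s + rng.gauss(0, 0.01 * s) for s in sqft]

# Revised model: response is price per sq ft, explanatory variable is 1/SqFt.
price_per_sqft = [p / s for p, s in zip(price, sqft)]
recip_sqft = [1 / s for s in sqft]
m_hat, f_hat = fit_line(recip_sqft, price_per_sqft)

print(round(m_hat, 4), round(f_hat, 1))  # intercept estimates M, slope estimates F
```

Note the swap in roles: the intercept of the revised fit is the marginal cost and the slope is the fixed cost, the reverse of the original model.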


Boxplots confirm homoscedastic errors.


4M Example 22.1: ESTIMATING HOME PRICES

Motivation

A company is relocating several managers to the Seattle area. For budgeting purposes, they would like a breakdown of home prices into fixed and variable costs to better prepare for negotiations with realtors.


Method

The data consist of a sample of 94 homes for sale in Seattle. The explanatory variable is the reciprocal of home size and the response is price per square foot. The scatterplot shows a linear association and there are no obvious lurking variables.


Mechanics

The evidently independent, similar variances, and nearly normal conditions are met.


The SRM results.


The fitted equation is

Estimated $/SqFt = 157.753 + 53,887/SqFt.

The 95% confidence interval for the intercept is [136.86 to 178.65] and the 95% confidence interval for the slope is [18,663 to 89,111].


Message

Prices for homes in this Seattle neighborhood run about $140 to $180 per square foot, on average. Average fixed costs associated with the purchase are in the range $20,000 to $90,000, with 95% confidence.
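
The fitted equation from this example can be turned into a small calculator. The coefficients below are the ones reported in the Mechanics step; the function names and the 2,000-square-foot home are illustrative assumptions. The sketch uses point estimates only and does not reproduce the confidence or prediction intervals.

```python
# Coefficients from the fitted equation: Estimated $/SqFt = 157.753 + 53,887/SqFt
INTERCEPT = 157.753  # estimated marginal cost M, dollars per square foot
SLOPE = 53_887.0     # estimated fixed cost F, dollars

def estimated_price_per_sqft(sqft):
    """Estimated price per square foot for a home of the given size."""
    return INTERCEPT + SLOPE / sqft

def estimated_price(sqft):
    """Estimated total price: multiply the per-square-foot estimate by size."""
    return estimated_price_per_sqft(sqft) * sqft

# Hypothetical 2,000 sq ft home: 157.753 * 2000 + 53,887 dollars in total.
print(round(estimated_price(2000)))
```

Multiplying back by size recovers the original form of the model, F + M × SqFt, so the total for a 2,000-square-foot home is the fixed cost plus 2,000 times the marginal cost.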


22.1 Changing Variation

Comparing Models with Different Responses

Even though the revised model has a smaller r²,

it provides more reliable and narrower confidence intervals for fixed and variable costs; and

it provides more sensible prediction intervals.


22.2 Outliers

Consider a Contractor’s Bid on a Project

A contractor is bidding on a project to construct an 875 square-foot addition to a home.

If he bids too low, he loses money on the project.

If he bids too high, he does not get the job.


Contractor Data for n=30 Similar Projects

Note that all but one of his previous projects are smaller than 875 square feet.


Contractor Example

His one project at 900 square feet is an outlier.

It is also a leveraged observation: because it lies far from the other projects in size, it pulls the regression line in its direction.

Leveraged: describes an observation in regression that has an unusually small or large value of the explanatory variable.


Consequences of an Outlier

To see the consequences of an outlier, fit the least squares regression line both with and without it.

Use the standard errors obtained without including the outlier to compare estimates.
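
The with-and-without comparison can be sketched as follows. The data are hypothetical (29 small projects plus one 900-square-foot project priced below the trend), and `fit_with_se` is an illustrative helper, not part of any library. Shifts in the estimates are expressed in standard errors taken from the fit that excludes the outlier.

```python
import math
import random
import statistics

def fit_with_se(x, y):
    """Least squares fit returning (b0, b1, se_b0, se_b1)."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # SD of the residuals
    se_b0 = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    return b0, b1, se_b0, se_b1

# Hypothetical projects (an assumption): size in sq ft, cost in $000.
rng = random.Random(2)
size = [rng.uniform(100, 500) for _ in range(29)]
cost = [5 + 0.04 * s + rng.gauss(0, 2) for s in size]

# One large, leveraged project whose cost falls below the trend.
size_all = size + [900.0]
cost_all = cost + [30.0]

b0_out, b1_out, se_b0, se_b1 = fit_with_se(size, cost)  # outlier excluded
b0_in, b1_in, _, _ = fit_with_se(size_all, cost_all)    # outlier included

shift_intercept = (b0_in - b0_out) / se_b0
shift_slope = (b1_in - b1_out) / se_b1
print(round(shift_intercept, 2), round(shift_slope, 2))
```

In this made-up case the outlier raises the estimated fixed cost and lowers the estimated marginal cost, the same directions reported for the contractor example; the sizes of the shifts depend entirely on the assumed data.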


Consequences for the Contractor Example


Including the outlier shifts the estimated fixed cost up by about 1.5 standard errors.

Including the outlier shifts the estimated marginal cost down by about 1.56 standard errors.


Prediction intervals when the outlier is included.


Prediction intervals when the outlier is not included.


Fixing the Problem: More Information

If the outlier describes what is expected the next time under the same conditions, then it should be included.

In the contractor example, more information is needed to decide whether to include or exclude the outlier.


22.3 Dependent Errors and Time Series

Detecting Dependence

With time series data, plot residuals versus time to look for a pattern indicating dependence in the errors.

Use the Durbin-Watson statistic to test for correlation between adjacent residuals (known as autocorrelation).


Scatterplot suggests few problems with fit.


Timeplot of residuals from regression of change in employment on utilization reveals dependence.


The Durbin-Watson Statistic

Tests the null hypothesis $H_0: \rho_\varepsilon = 0$.

Is calculated as follows:

$$D = \frac{(e_2 - e_1)^2 + (e_3 - e_2)^2 + \cdots + (e_n - e_{n-1})^2}{e_1^2 + e_2^2 + \cdots + e_n^2}$$
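
The Durbin-Watson statistic is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. A minimal sketch of that calculation follows; the function name `durbin_watson` and the sample residuals are illustrative, and the residuals are assumed to be listed in time order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic D for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between each residual and the one before it.
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: 3.0
print(durbin_watson([2.0, 2.0, 2.0, 2.0]))    # no change between neighbors: 0.0
```

D is approximately 2(1 − r), where r is the correlation between adjacent residuals, so values near 2 are consistent with no autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.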

Use the p-value provided by software or a table of critical values at α = 0.05 to draw a conclusion.


Consequences of Dependence

If there is positive autocorrelation in the errors, the estimated standard errors are too small.

The estimated slope and intercept are less precise than suggested by the output.

Best remedy is to incorporate the dependence into the regression model.
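
A small simulation can illustrate why the reported standard errors are too small under positive autocorrelation. This is a sketch on assumed settings, not an analysis of the data in this chapter: the errors follow a hypothetical first-order autoregression with coefficient `rho`, the explanatory variable is a time index, and the helper names are illustrative. It compares the actual spread of the estimated slope across many simulated series with the average standard error that least squares reports.

```python
import math
import random
import statistics

def slope_and_se(x, y):
    """Least squares slope and its usual (independence-based) standard error."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, math.sqrt(sse / (n - 2) / sxx)

def simulate(rho, n=50, reps=400, seed=1):
    """Return (actual SD of slope estimates, average reported SE)."""
    rng = random.Random(seed)
    x = list(range(n))
    slopes, reported = [], []
    for _ in range(reps):
        e, y = 0.0, []
        for t in x:
            e = rho * e + rng.gauss(0, 1)  # each error carries over part of the last
            y.append(2.0 + 0.5 * t + e)
        b1, se = slope_and_se(x, y)
        slopes.append(b1)
        reported.append(se)
    return statistics.stdev(slopes), statistics.fmean(reported)

actual_sd, reported_se = simulate(rho=0.8)
print(round(actual_sd, 4), round(reported_se, 4))
```

With `rho=0.8` the slope estimates vary considerably more from series to series than the reported standard error suggests; rerunning with `rho=0.0` brings the two numbers close together.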


4M Example 22.2: CELL PHONE SUBSCRIBERS

Motivation

Predict the market for cellular telephone services.


Method

Use simple regression to predict the future number of subscribers. The number of subscriber connections, in millions, is the response. The explanatory variable is the date (time). The scatterplot shows a linear association. Lurking variables may be present, such as technology and marketing.


Mechanics

The least squares equation is

Estimated Subscribers = -40,142 + 20.1 Date


The timeplot of meandering residuals and D = 0.25 indicate that the independence condition is not satisfied.


Message

There is a strong upward trend in the number of subscribers that can be summarized by

Estimated Subscribers = -40,142 + 20.1 Date.

However, since the conditions for SRM are not satisfied, we cannot rely on statistical inferences to quantify the uncertainty for predictions.


Best Practices

Make sure that your model makes sense.

Plan to change your model if it does not match the data.

Report the presence of and how you handle any outliers.


Pitfalls

Do not rely on summary statistics like r² to pick the best model.

Don’t compare r² between regression models unless the response is the same.

Do not check for normality until you get the right equation.


Don’t assume that your data are independent just because the Durbin-Watson statistic is close to 2; it only checks for correlation between adjacent residuals.

Never forget to look at plots of the data and model.