Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS


Page 1: Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS

© 2008 Thomson South-Western. All Rights Reserved

Page 2: Chapter 14 Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations

Page 3: Simple Linear Regression

• Managerial decisions often are based on the relationship between two or more variables.
• Regression analysis can be used to develop an equation showing how the variables are related.
• Variation in one variable is explained by another variable.
• The variable being predicted is called the dependent variable and is denoted by y.
• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.

Page 4: Simple Linear Regression

• Simple linear regression involves one independent variable and one dependent variable.
• The relationship between the two variables is approximated by a straight line.
• Regression analysis involving two or more independent variables is called multiple regression.

Page 5: Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.

Page 6: Simple Linear Regression Equation

The simple linear regression equation is:

$$E(y) = \beta_0 + \beta_1 x$$

• The graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the expected value of y for a given x value.

Page 7: Simple Linear Regression Equation

Positive Linear Relationship

[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is positive.]

Page 8: Simple Linear Regression Equation

Negative Linear Relationship

[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is negative.]

Page 9: Simple Linear Regression Equation

No Relationship

[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is 0.]

Page 10: Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

$$\hat{y} = b_0 + b_1 x$$

• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.

Page 11: Estimation Process

1. Regression model: $y = \beta_0 + \beta_1 x + \epsilon$
   Regression equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown parameters: $\beta_0$, $\beta_1$
2. Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$
3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$
   Sample statistics: $b_0$, $b_1$
4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Page 12: Least Squares Method

The least squares method is a procedure for using sample data to find the estimated regression equation.

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

where:
$y_i$ = observed value of the dependent variable for the i th observation
$\hat{y}_i$ = estimated value of the dependent variable for the i th observation

Page 13: Least Squares Method

The slope of the estimated regression equation, derived with the aid of differential calculus, is:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

where:
$x_i$ = value of independent variable for i th observation
$y_i$ = value of dependent variable for i th observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
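The calculus step mentioned above can be sketched as follows. This is the standard derivation rather than something reproduced from the slides; $S$ is a name introduced here for the sum of squared errors, and the intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ falls out of the first condition.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_i (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_i (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial S}{\partial b_1} &= -2 \sum_i x_i (y_i - b_0 - b_1 x_i) = 0 \\
\text{substituting } b_0: \quad
  & \sum_i x_i \bigl[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \bigr] = 0 \\
b_1 &= \frac{\sum_i x_i (y_i - \bar{y})}{\sum_i x_i (x_i - \bar{x})}
     = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
\end{align*}
```

The last equality holds because $\sum_i (y_i - \bar{y}) = 0$ and $\sum_i (x_i - \bar{x}) = 0$, so subtracting $\bar{x}$ from each $x_i$ factor leaves both sums unchanged.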

Page 14: Simple Linear Regression Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27

$\sum x = 10$, $\sum y = 100$, $\bar{x} = 2$, $\bar{y} = 20$

Page 15: Scatter Diagram and Trend Line

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the trend line y = 5x + 10.]
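The trend line for the Reed Auto data can be reproduced directly from the slope formula. The following is a minimal sketch in plain Python, not part of the original slides: `least_squares` is a hypothetical helper name, and the intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ is the standard least squares result, which the slides do not state explicitly.

```python
# Reed Auto sample: number of TV ads (x) and number of cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated regression line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Standard intercept formula (an addition; not shown on the slides).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:g} + {b1:g}x")
```

With the five observations above this gives a slope of 5 and an intercept of 10, matching the trend line on the scatter diagram.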

Page 16: Coefficient of Determination

How well does the estimated regression equation fit the data? The coefficient of determination provides a measure of goodness of fit for the estimated regression equation. SSE, the sum of squares due to error, sums the squared residuals (errors).

Relationship Among SST, SSR, SSE

$$\text{SST} = \text{SSR} + \text{SSE}$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

The coefficient of determination is:

$$r^2 = \text{SSR}/\text{SST}$$

Page 17: Coefficient of Determination

$$r^2 = \text{SSR}/\text{SST} = 100/114 = .8772$$

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
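The three sums of squares behind this figure can be checked numerically. The sketch below assumes the Reed Auto data and the estimated equation $\hat{y} = 10 + 5x$ from the example; the variable names are illustrative only.

```python
# Reed Auto sample and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

# Total variation of y about its mean.
sst = sum((yi - y_bar) ** 2 for yi in y)
# Variation explained by the estimated regression line.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
# Variation left in the residuals.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = ssr / sst
print(sst, ssr, sse, round(r_squared, 4))
```

Here SST = 114, SSR = 100 and SSE = 14, so the identity SST = SSR + SSE holds and $r^2$ rounds to .8772.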

Page 18: Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$$

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

The correlation coefficient is a descriptive measure of the strength of the linear association between two variables x and y. Values of the correlation coefficient are always between -1 (negative or inverse relation) and +1 (positive relation). Zero (0), or close to zero, indicates no linear relationship.

Page 19: Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$$r_{xy} = +\sqrt{.8772} = +.9366$$
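As a cross-check, the same value can be obtained two ways. The sketch below is an illustration in plain Python, not part of the slides: the first route follows the slide formula, and the second uses the usual product-moment formula for the sample correlation, which the slides do not show.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b1 = 5.0                # slope of the estimated regression equation
r_squared = 100 / 114   # SSR / SST from the example

# Route 1: (sign of b1) * sqrt(coefficient of determination).
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Route 2: product-moment formula (an addition, used only as a cross-check).
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_from_r2, 4), round(r_direct, 4))
```

Both routes round to .9366 for the Reed Auto data.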

Page 20: Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero, because if $\beta_1$ is zero, we would conclude that the two variables are not related. Also, if $\beta_1$ is not zero, the two variables are related.

Two tests are commonly used: the t Test and the F Test.

Both the t test and F test require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.

Page 21: Testing for Significance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$

where:
SSE = sum of squares due to error

Page 22: Testing for Significance: t Test

Hypotheses

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

Rejection Rule

Reject $H_0$ if p-value < $\alpha$ or $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where:
$t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom

Page 23: Testing for Significance: t Test

1. Determine the hypotheses.
   $H_0: \beta_1 = 0$
   $H_a: \beta_1 \neq 0$
2. Specify the level of significance.
   $\alpha$ = .05
3. Select the test statistic.
   $t = b_1 / s_{b_1}$
4. State the rejection rule.
   Reject $H_0$ if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)

Page 24: Testing for Significance: t Test

5. Compute the value of the test statistic.
   $t = b_1 / s_{b_1} = 5/1.08 = 4.63$
6. Determine whether to reject $H_0$.
   t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject $H_0$.
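The value $s_{b_1} = 1.08$ used in step 5 can be traced back to SSE. The sketch below assumes the standard formula $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$, which the slides do not show, and takes the critical value 3.182 from the slides rather than computing it.

```python
import math

x = [1, 3, 2, 1, 3]   # TV ads in the Reed Auto sample
sse = 14.0            # SST - SSR = 114 - 100
b1 = 5.0              # estimated slope
t_crit = 3.182        # t value for alpha/2 = .025 with 3 d.f. (from the slides)

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

s = math.sqrt(sse / (n - 2))   # s = sqrt(MSE), the standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1 (assumed formula)
t = b1 / s_b1                  # test statistic

reject_h0 = abs(t) > t_crit
print(round(s_b1, 2), round(t, 2), reject_h0)
```

This reproduces $s_{b_1} \approx 1.08$ and $t \approx 4.63$, so the rejection rule leads to rejecting $H_0$.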

Page 25: Confidence Interval for $\beta_1$

• We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

Page 26: Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

where:
$b_1$ is the point estimator,
$t_{\alpha/2}\, s_{b_1}$ is the margin of error,
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.

Page 27: Confidence Interval for $\beta_1$

Rejection Rule

Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$b_1 \pm t_{\alpha/2}\, s_{b_1}$ = 5 +/- 3.182(1.08) = 5 +/- 3.44, or 1.56 to 8.44

Conclusion

0 is not included in the confidence interval. Reject $H_0$.
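The interval arithmetic can be sketched in a few lines. The example below is illustrative only: it reuses $b_1 = 5$ and SSE = 14 from the Reed Auto example, takes the critical value 3.182 from the slides, and assumes the standard formula for $s_{b_1}$.

```python
import math

x = [1, 3, 2, 1, 3]
sse = 14.0
b1 = 5.0
t_crit = 3.182   # t value for alpha/2 = .025 with 3 d.f. (from the slides)

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)   # assumed standard formula

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)
print(round(lower, 2), round(upper, 2), reject_h0)
```

The interval comes out as roughly 1.56 to 8.44, which excludes 0.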

Page 28: Testing for Significance: F Test

Hypotheses

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

Test Statistic

$$F = \text{MSR}/\text{MSE}$$

Page 29: Testing for Significance: F Test

Rejection Rule

Reject $H_0$ if p-value < $\alpha$ or $F > F_{\alpha}$

where:
$F_{\alpha}$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator

Page 30: Testing for Significance: F Test

1. Determine the hypotheses.
   $H_0: \beta_1 = 0$
   $H_a: \beta_1 \neq 0$
2. Specify the level of significance.
   $\alpha$ = .05
3. Select the test statistic.
   F = MSR/MSE
4. State the rejection rule.
   Reject $H_0$ if p-value < .05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)

Page 31: Testing for Significance: F Test

5. Compute the value of the test statistic.
   F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject $H_0$.
   F = 17.44 provides an area of .025 in the upper tail. Because the F test uses only the upper tail, the p-value corresponding to F = 21.43 is less than .025, and therefore less than $\alpha$ = .05. Hence, we reject $H_0$.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
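A short numerical check of step 5 follows. It is a sketch under the assumption that MSR = SSR/1 in simple linear regression (one independent variable); the critical value 10.13 is taken from the slides, and the comparison with $t^2$ is an addition based on the standard identity $F = t^2$ for this model.

```python
ssr = 100.0   # sum of squares due to regression
sse = 14.0    # sum of squares due to error
n = 5         # observations in the Reed Auto sample
f_crit = 10.13  # F value for alpha = .05 with 1 and 3 d.f. (from the slides)

msr = ssr / 1         # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)   # n - 2 denominator d.f.
f_stat = msr / mse

# Cross-check: with b1 = 5 and sum((x - x_bar)^2) = 4, t^2 = b1^2 * sxx / MSE.
t_squared = 5.0 ** 2 * 4.0 / mse

reject_h0 = f_stat > f_crit
print(round(mse, 3), round(f_stat, 2), reject_h0)
```

The result, F of about 21.43, equals the square of the t statistic (4.63) up to rounding, which is why the two tests reach the same conclusion here.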

Page 32: Some Cautions about the Interpretation of Significance Tests

• Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
• Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Page 33: Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

$$\hat{y} = 10 + 5(3) = 25 \text{ cars}$$

Page 34: Confidence Interval for $E(y_p)$

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$

25 +/- 3.1824(1.4491)
25 +/- 4.61
20.39 to 29.61 cars
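The standard error 1.4491 is not derived on the slide. The sketch below assumes the standard formula $s_{\hat{y}_p} = s\sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$; `mean_response_interval` is a hypothetical helper name, and the critical value 3.1824 is taken from the slide.

```python
import math

x = [1, 3, 2, 1, 3]
sse = 14.0
b0, b1 = 10.0, 5.0
t_crit = 3.1824   # t value for alpha/2 = .025 with 3 d.f. (from the slide)


def mean_response_interval(x_p):
    """Confidence interval for E(y_p), the mean of y at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s = math.sqrt(sse / (n - 2))
    y_hat_p = b0 + b1 * x_p
    # Estimated standard deviation of y_hat_p (assumed standard formula).
    s_yhat_p = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
    margin = t_crit * s_yhat_p
    return y_hat_p - margin, y_hat_p + margin, s_yhat_p


lower, upper, s_yhat_p = mean_response_interval(3)
print(round(s_yhat_p, 4), round(lower, 2), round(upper, 2))
```

For $x_p = 3$ this reproduces the standard error 1.4491 and the interval 20.39 to 29.61.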

Page 35: Prediction Interval for $y_p$

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$$

25 +/- 3.1824(2.6013)
25 +/- 8.28
16.72 to 33.28 cars
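The prediction interval is wider than the interval for the mean because it also accounts for the variability of an individual week around the mean. The sketch below assumes the standard formula $s_{\text{ind}} = s\sqrt{1 + 1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, which the slide does not show; `prediction_interval` is a hypothetical helper name.

```python
import math

x = [1, 3, 2, 1, 3]
sse = 14.0
b0, b1 = 10.0, 5.0
t_crit = 3.1824   # t value for alpha/2 = .025 with 3 d.f. (from the slide)


def prediction_interval(x_p):
    """Prediction interval for an individual value y_p at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s = math.sqrt(sse / (n - 2))
    y_hat_p = b0 + b1 * x_p
    # The extra "1 +" term adds the variance of a single observation
    # to the variance of the estimated mean (assumed standard formula).
    s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)
    margin = t_crit * s_ind
    return y_hat_p - margin, y_hat_p + margin, s_ind


lower, upper, s_ind = prediction_interval(3)
print(round(s_ind, 4), round(lower, 2), round(upper, 2))
```

For $x_p = 3$ this reproduces $s_{\text{ind}} = 2.6013$ and the interval 16.72 to 33.28.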

Page 36: Residual Analysis

Residual for Observation i

$$y_i - \hat{y}_i$$

• The residuals provide the best information about $\epsilon$.
• If the assumptions about the error term $\epsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
• Much of the residual analysis is based on an examination of graphical plots.

Page 37: Residual Plot Against x

If the assumption that the variance of $\epsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

Page 38: Residual Plot Against x

[Figure: residual $y - \hat{y}$ plotted against x around the zero line. Panel title: Good Pattern.]

Page 39: Residual Plot Against x

[Figure: residual $y - \hat{y}$ plotted against x around the zero line. Panel title: Nonconstant Variance.]

Page 40: Residual Plot Against x

[Figure: residual $y - \hat{y}$ plotted against x around the zero line. Panel title: Model Form Not Adequate.]

Page 41: Residual Plot Against x

Residuals

Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2
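The predicted values and residuals in this table follow from the estimated equation $\hat{y} = 10 + 5x$. A minimal sketch, with illustrative variable names:

```python
# Reed Auto sample and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

predicted = [b0 + b1 * xi for xi in x]
# Residual for observation i is y_i - y_hat_i.
residuals = [yi - yh for yi, yh in zip(y, predicted)]

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(i, yh, e)
```

The residuals sum to zero, as they must for a least squares line that includes an intercept.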

Page 42: Residual Plot Against x

[Figure: TV Ads Residual Plot. Residuals (vertical axis, -3 to 3) against TV Ads (horizontal axis, 0 to 4).]

Page 43: Standardized Residual Plot

• The standardized residual plot can provide insight about the assumption that the error term $\epsilon$ has a normal distribution.
• If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.

Page 44: Standardized Residual Plot

Standardized Residuals

Observation    Predicted Y    Residuals    Standard Residuals
1              15             -1           -0.535
2              25             -1           -0.535
3              20             -2           -1.069
4              15              2            1.069
5              25              2            1.069
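The slides do not say how the "Standard Residuals" column is computed. The sketch below shows one calculation that reproduces the tabulated values, namely dividing each residual by $\sqrt{\text{SSE}/(n-1)}$; treat this divisor as an assumption inferred from the numbers. It differs from the leverage-based definition found in many textbooks, which would give somewhat different values.

```python
import math

residuals = [-1.0, -1.0, -2.0, 2.0, 2.0]   # from the residual table

n = len(residuals)
sse = sum(e ** 2 for e in residuals)
# Assumed divisor: sample standard deviation of the residuals (n - 1 in the
# denominator). This choice is inferred because it matches the table.
sd = math.sqrt(sse / (n - 1))

standard_residuals = [e / sd for e in residuals]
print([round(r, 3) for r in standard_residuals])
```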

Page 45: Standardized Residual Plot

RESIDUAL OUTPUT

Observation    Predicted Y    Residuals    Standard Residuals
1              15             -1           -0.534522
2              25             -1           -0.534522
3              20             -2           -1.069045
4              15              2            1.069045
5              25              2            1.069045

[Figure: standardized residual plot. Standard Residuals (vertical axis, -1.5 to 1.5) against Cars Sold (horizontal axis, 0 to 30).]

Page 46: Standardized Residual Plot

All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that $\epsilon$ has a normal distribution.

Page 47: Outliers and Influential Observations

Detecting Outliers

• An outlier is an observation that is unusual in comparison with the other data.
• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large residual as being an outlier.
• This rule's shortcoming can be circumvented by using studentized deleted residuals.
• When the i th observation is an outlier, the |i th studentized deleted residual| will be larger than the |i th standardized residual|.
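The two kinds of residual can be compared on the Reed Auto data. The sketch below is an illustration under stated assumptions, none of which appear in the slides: it uses the leverage $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_i - \bar{x})^2$, the leverage-based standardized residual $e_i / (s\sqrt{1 - h_i})$, and the common closed form for the studentized deleted residual, $e_i \sqrt{(n - p - 1) / (\text{SSE}(1 - h_i) - e_i^2)}$ with $p = 2$ estimated parameters.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
p = 2   # parameters estimated in simple linear regression (b0 and b1)

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - p))

rows = []
for xi, e in zip(x, residuals):
    h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of the observation
    standardized = e / (s * math.sqrt(1 - h))
    # Closed form that avoids refitting the line without observation i.
    deleted = e * math.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
    is_outlier = abs(standardized) > 2    # the +/-2 rule quoted above
    rows.append((standardized, deleted, is_outlier))

for i, (r, t, flag) in enumerate(rows, start=1):
    print(i, round(r, 3), round(t, 3), flag)
```

On this small sample no observation is flagged by the +/-2 rule. The studentized deleted residual exceeds the standardized residual in absolute value only for the observations whose standardized residual is larger than 1 in absolute value, which is consistent with the deleted residual standing out most for genuinely unusual observations.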