Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS

1 Slide

© 2008 Thomson South-Western. All Rights Reserved


2 Slide


Chapter 14
Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations

3 Slide


Simple Linear Regression

• Managerial decisions often are based on the relationship between two or more variables.

• Regression analysis can be used to develop an equation showing how the variables are related.

• Variation in one variable is explained by another variable.

• The variable being predicted is called the dependent variable and is denoted by y.

• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.

4 Slide


Simple Linear Regression

• Simple linear regression involves one independent variable and one dependent variable.

• The relationship between the two variables is approximated by a straight line.

• Regression analysis involving two or more independent variables is called multiple regression.

5 Slide


Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.

6 Slide


Simple Linear Regression Equation

The simple linear regression equation is:

$$E(y) = \beta_0 + \beta_1 x$$

• The graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the expected value of y for a given x value.

7 Slide



Positive Linear Relationship

[Figure: regression line of $E(y)$ against x, with intercept $\beta_0$. Slope $\beta_1$ is positive.]

8 Slide



Negative Linear Relationship

[Figure: regression line of $E(y)$ against x, with intercept $\beta_0$. Slope $\beta_1$ is negative.]

9 Slide



No Relationship

[Figure: regression line of $E(y)$ against x, with intercept $\beta_0$. Slope $\beta_1$ is 0.]

10 Slide


Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

$$\hat{y} = b_0 + b_1 x$$

• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.

11 Slide


Estimation Process

• Regression Model: $y = \beta_0 + \beta_1 x + \epsilon$
  Regression Equation: $E(y) = \beta_0 + \beta_1 x$
  Unknown Parameters: $\beta_0$, $\beta_1$

• Sample Data: $(x_1, y_1), \ldots, (x_n, y_n)$

• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
  Sample Statistics: $b_0$, $b_1$

• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$

12 Slide


Least Squares Method

The least squares method is a procedure for using sample data to find the estimated regression equation.

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

13 Slide


Least Squares Method

The slope of the estimated regression equation, derived with the aid of differential calculus, is:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable

14 Slide


Simple Linear Regression
Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27

$\sum x = 10$, $\sum y = 100$, $\bar{x} = 2$, $\bar{y} = 20$
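The slope formula from the Least Squares Method slide can be checked against the Reed Auto data. The following is a minimal illustrative sketch in plain Python, not part of the original slides. The intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ is the standard least squares result and is an assumption here, since the slides state only the slope.

```python
# Reed Auto sample: TV ads (x) and cars sold (y) for 5 previous sales.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n  # mean of the independent variable
y_bar = sum(y) / n  # mean of the dependent variable

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx

# Intercept (standard least squares formula, assumed; not shown on the slides).
b0 = y_bar - b1 * x_bar

print(f"y_hat = {b0:g} + {b1:g}x")
```

With these data the sketch gives $b_1 = 20/4 = 5$ and $b_0 = 20 - 5(2) = 10$, which agrees with the trend line y = 5x + 10 used throughout the example.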

15 Slide


Scatter Diagram and Trend Line

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the trend line y = 5x + 10.]

16 Slide


Coefficient of Determination

How well does the estimated regression equation fit the data? The coefficient of determination provides a measure of goodness of fit for the estimated regression equation.

SSE, the sum of squares due to error, sums the squared residuals (errors).

Relationship Among SST, SSR, SSE

SST = SSR + SSE

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

The coefficient of determination is:

$$r^2 = \mathrm{SSR}/\mathrm{SST}$$

17 Slide


Coefficient of Determination

$$r^2 = \mathrm{SSR}/\mathrm{SST} = 100/114 = .8772$$

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
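The values 100 and 114 can be reproduced from the Reed Auto data. The sketch below is illustrative only; it assumes the usual definitions of the three sums of squares (deviations of $y_i$ from $\bar{y}$, of $\hat{y}_i$ from $\bar{y}$, and of $y_i$ from $\hat{y}_i$), which the slides name but do not write out.

```python
# Reed Auto data and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error

r_squared = ssr / sst
print(sst, ssr, sse, round(r_squared, 4))
```

The sketch yields SST = 114, SSR = 100, and SSE = 14, so SST = SSR + SSE holds and $r^2$ rounds to .8772.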

18 Slide


Sample Correlation Coefficient

The correlation coefficient is a descriptive measure of the strength of the linear association between two variables x and y. Values of the correlation coefficient are always between -1 (negative or inverse relation) and +1 (positive relation). Zero (0), or close to zero, indicates no linear relationship.

$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$$

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

19 Slide


Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$$r_{xy} = +\sqrt{.8772}$$

$$r_{xy} = +.9366$$
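The same calculation can be written as a short illustrative sketch; the variable names are hypothetical, and the inputs are the slope and the SSR/SST ratio from the Reed Auto example.

```python
import math

b1 = 5.0               # slope of the estimated regression equation
r_squared = 100 / 114  # SSR/SST from the Reed Auto example

# Take the square root of r^2 and attach the sign of the slope.
r_xy = math.copysign(math.sqrt(r_squared), b1)
print(round(r_xy, 4))
```

A negative slope would give the same magnitude with a minus sign.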

20 Slide


Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero. If $\beta_1$ is zero, we would conclude that the two variables are not related; if $\beta_1$ is not zero, the two variables are related.

Two tests are commonly used: the t test and the F test.

Both the t test and F test require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.

21 Slide


Testing for Significance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$$s^2 = \mathrm{MSE} = \mathrm{SSE}/(n - 2)$$

where:
SSE = sum of squares due to error

22 Slide


Testing for Significance: t Test

Hypotheses

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

Rejection Rule

Reject $H_0$ if p-value $\le \alpha$ or $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$

where:
$t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom

23 Slide


1. Determine the hypotheses.

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

2. Specify the level of significance.

$\alpha = .05$

3. Select the test statistic.

$$t = \frac{b_1}{s_{b_1}}$$

4. State the rejection rule.

Reject $H_0$ if p-value $\le .05$ or $|t| > 3.182$ (with 3 degrees of freedom)

24 Slide



5. Compute the value of the test statistic.

$$t = \frac{b_1}{s_{b_1}} = \frac{5}{1.08} = 4.63$$

6. Determine whether to reject $H_0$.

t = 4.541 provides an area of .01 in the upper tail. Because 4.63 > 4.541, the two-tailed p-value is less than 2(.01) = .02. (Also, t = 4.63 > 3.182.) We can reject $H_0$.
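The t statistic can be reproduced from the Reed Auto data. The sketch below is illustrative, not part of the slides. It assumes the standard formula $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$ for the estimated standard deviation of $b_1$ (the slides give only its value, 1.08), and it takes the critical value 3.182 from the slides rather than computing it.

```python
import math

# Reed Auto data and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

x_bar = sum(x) / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)        # s^2, the estimate of sigma^2
s = math.sqrt(mse)

sxx = sum((xi - x_bar) ** 2 for xi in x)
s_b1 = s / math.sqrt(sxx)  # assumed standard formula for the std. deviation of b1

t = b1 / s_b1
t_crit = 3.182             # t_.025 with 3 degrees of freedom, from the slides
reject_h0 = abs(t) > t_crit

print(round(mse, 3), round(s_b1, 2), round(t, 2), reject_h0)
```

Under these assumptions MSE is 14/3, $s_{b_1}$ rounds to 1.08, and t rounds to 4.63, so $H_0$ is rejected at $\alpha = .05$.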

25 Slide


Confidence Interval for $\beta_1$

• We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.

• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

26 Slide


The form of a confidence interval for $\beta_1$ is:

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

where:
$b_1$ is the point estimator
$t_{\alpha/2}\, s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom

27 Slide



Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$$b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$$

or 1.56 to 8.44

Conclusion

0 is not included in the confidence interval. Reject $H_0$.
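The interval arithmetic is short enough to verify directly. This illustrative sketch uses the rounded values quoted on the slides (5, 1.08, and 3.182); the variable names are hypothetical.

```python
b1 = 5.0        # point estimator of beta_1
s_b1 = 1.08     # estimated standard deviation of b1, as quoted on the slides
t_crit = 3.182  # t_.025 with 3 degrees of freedom, as quoted on the slides

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected when 0 falls outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(round(margin, 2), round(lower, 2), round(upper, 2), reject_h0)
```

The margin rounds to 3.44 and the interval to 1.56 to 8.44, which excludes 0.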

28 Slide


Testing for Significance: F Test

Hypotheses

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

Test Statistic

$$F = \mathrm{MSR}/\mathrm{MSE}$$

29 Slide




Reject $H_0$ if p-value $\le \alpha$ or $F \ge F_\alpha$

where:
$F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator

30 Slide


1. Determine the hypotheses.

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$

2. Specify the level of significance.

$\alpha = .05$

3. Select the test statistic.

$$F = \mathrm{MSR}/\mathrm{MSE}$$

4. State the rejection rule.

Reject $H_0$ if p-value $\le .05$ or $F \ge 10.13$ (with 1 d.f. in numerator and 3 d.f. in denominator)

31 Slide



5. Compute the value of the test statistic.

$$F = \mathrm{MSR}/\mathrm{MSE} = 100/4.667 = 21.43$$

6. Determine whether to reject $H_0$.

F = 17.44 provides an area of .025 in the upper tail. Because the F test is an upper-tail test and 21.43 > 17.44, the p-value corresponding to F = 21.43 is less than .025. Hence, we reject $H_0$.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
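The F statistic follows directly from the sums of squares in the Reed Auto example. In this illustrative sketch, MSR is taken as SSR divided by the number of independent variables (1 in simple linear regression), which is the usual definition and an assumption here because the slides do not define MSR. The critical value 10.13 is copied from the slides.

```python
ssr = 100.0  # sum of squares due to regression (Reed Auto example)
sse = 14.0   # sum of squares due to error
n = 5        # number of observations

msr = ssr / 1        # one independent variable (assumed definition of MSR)
mse = sse / (n - 2)  # estimate of sigma^2

f_stat = msr / mse
f_crit = 10.13       # F_.05 with 1 and 3 degrees of freedom, from the slides
reject_h0 = f_stat > f_crit

print(round(f_stat, 2), reject_h0)
```

With one independent variable the F statistic equals the square of the t statistic, so 21.43 here is consistent with t = 4.63 from the t test.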

32 Slide


Some Cautions about the Interpretation of Significance Tests

• Rejecting $H_0$: $\beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

• Just because we are able to reject $H_0$: $\beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

33 Slide


Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

$$\hat{y} = 10 + 5(3) = 25 \text{ cars}$$

34 Slide


Confidence Interval for $E(y_p)$

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$

$$25 \pm 3.1824(1.4491)$$

$$25 \pm 4.61$$

20.39 to 29.61 cars
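The value 1.4491 is not derived on the slides. The illustrative sketch below reproduces it under the assumption that $s_{\hat{y}_p} = s\sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, the standard formula for the standard deviation of the estimated mean response. The critical value 3.1824 is copied from the slides.

```python
import math

x = [1, 3, 2, 1, 3]     # TV ads in the Reed Auto sample
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

s = math.sqrt(14 / 3)   # sqrt(MSE) = sqrt(SSE / (n - 2))
x_p = 3                 # number of TV ads of interest
y_hat_p = 10 + 5 * x_p  # point estimate from y_hat = 10 + 5x

# Assumed standard formula for the std. deviation of the estimated mean of y at x_p.
s_yhat_p = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824         # t_.025 with 3 degrees of freedom, from the slides
margin = t_crit * s_yhat_p
lower, upper = y_hat_p - margin, y_hat_p + margin

print(round(s_yhat_p, 4), round(margin, 2), round(lower, 2), round(upper, 2))
```

Under that assumption the sketch recovers 1.4491, a margin of 4.61, and the interval 20.39 to 29.61.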

35 Slide


Prediction Interval for $y_p$

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$$

$$25 \pm 3.1824(2.6013)$$

$$25 \pm 8.28$$

16.72 to 33.28 cars
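The prediction interval is wider than the confidence interval for the mean because it also accounts for the variability of an individual week around the mean. The illustrative sketch below assumes the standard formula $s_{\text{ind}} = s\sqrt{1 + 1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, which the slides do not state; only the resulting value 2.6013 appears there.

```python
import math

x = [1, 3, 2, 1, 3]     # TV ads in the Reed Auto sample
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

s = math.sqrt(14 / 3)   # sqrt(MSE)
x_p = 3
y_hat_p = 10 + 5 * x_p

# Assumed standard formula: the extra "1 +" term adds the variance of one new observation.
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824         # t_.025 with 3 degrees of freedom, from the slides
margin = t_crit * s_ind
lower, upper = y_hat_p - margin, y_hat_p + margin

print(round(s_ind, 4), round(margin, 2), round(lower, 2), round(upper, 2))
```

The sketch recovers 2.6013, a margin of 8.28, and the interval 16.72 to 33.28.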

36 Slide


Residual Analysis

• If the assumptions about the error term $\epsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.

• The residuals provide the best information about $\epsilon$.

Residual for Observation i

$$y_i - \hat{y}_i$$

• Much of the residual analysis is based on an examination of graphical plots.

37 Slide


Residual Plot Against x

• If the assumption that the variance of $\epsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

38 Slide


Residual Plot Against x

[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Pattern shown: Good Pattern.]

39 Slide



[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Pattern shown: Nonconstant Variance.]

40 Slide



[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Pattern shown: Model Form Not Adequate.]

41 Slide


Residuals


Observation Predicted Cars Sold Residuals

1 15 -1

2 25 -1

3 20 -2

4 15 2

5 25 2

42 Slide



TV Ads Residual Plot

[Figure: Residuals (vertical axis, -3 to 3) plotted against TV Ads (horizontal axis, 0 to 4).]

43 Slide


Standardized Residual Plot

• The standardized residual plot can provide insight about the assumption that the error term $\epsilon$ has a normal distribution.

• If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.

44 Slide


Standardized Residuals

Observation Predicted Y Residuals Standard Residuals

1 15 -1 -0.535

2 25 -1 -0.535

3 20 -2 -1.069

4 15 2 1.069

5 25 2 1.069
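The tabulated standard residuals can be reproduced by dividing each residual by the sample standard deviation of the residuals (divisor n - 1). That this is the convention behind the table is an assumption inferred from the numbers; textbook definitions of the standardized residual often divide by $s\sqrt{1 - h_i}$ instead and give somewhat different values. A minimal illustrative sketch:

```python
import statistics

# Reed Auto data and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

predicted = [10 + 5 * xi for xi in x]
residuals = [yi - ph for yi, ph in zip(y, predicted)]

# Sample standard deviation of the residuals (n - 1 in the denominator);
# assumed to be the convention used for the table on the slide.
sd = statistics.stdev(residuals)
standard_residuals = [e / sd for e in residuals]

for i, (ph, e, z) in enumerate(zip(predicted, residuals, standard_residuals), start=1):
    print(i, ph, e, round(z, 3))
```

The predicted values, residuals, and rounded standard residuals printed by the sketch match the five rows of the table.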

45 Slide


Standardized Residual Plot

RESIDUAL OUTPUT

Observation Predicted Y Residuals Standard Residuals

1 15 -1 -0.534522

2 25 -1 -0.534522

3 20 -2 -1.069045

4 15 2 1.069045

5 25 2 1.069045

[Figure: Standard Residuals (vertical axis, -1.5 to 1.5) plotted against Cars Sold (horizontal axis, 0 to 30).]

46 Slide



• All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that $\epsilon$ has a normal distribution.

47 Slide


Outliers and Influential Observations

Detecting Outliers

• An outlier is an observation that is unusual in comparison with the other data.

• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

• This rule's shortcoming can be circumvented by using studentized deleted residuals.

• If the ith observation is an outlier, the |ith studentized deleted residual| will be larger than the |ith standardized residual|.
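The outlier rule and the studentized deleted residual can be sketched for the Reed Auto data. Everything below is an illustrative assumption rather than the slides' own method: it uses the textbook standardized residual $e_i / (s\sqrt{1 - h_i})$ with leverage $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2$, and the common shortcut $s_{(i)}^2 = (\mathrm{SSE} - e_i^2/(1 - h_i))/(n - 3)$ for the error variance with observation i deleted. The resulting standardized residuals therefore differ from the spreadsheet values shown earlier.

```python
import math

# Reed Auto data and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n, p = len(x), 2  # p = number of estimated parameters (b0 and b1)

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
e = [yi - (10 + 5 * xi) for xi, yi in zip(x, y)]
sse = sum(ei ** 2 for ei in e)
s = math.sqrt(sse / (n - p))

# Leverage of each observation (assumed standard formula).
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Textbook standardized residuals.
r = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]

def deleted_residual(ei, hi):
    """Studentized deleted residual via the leave-one-out variance shortcut."""
    s_del_sq = (sse - ei ** 2 / (1 - hi)) / (n - p - 1)
    return ei / math.sqrt(s_del_sq * (1 - hi))

t = [deleted_residual(ei, hi) for ei, hi in zip(e, h)]

# The +/-2 rule applied to the standardized residuals.
outliers = [i for i, ri in enumerate(r, start=1) if abs(ri) > 2]

for i, (ri, ti) in enumerate(zip(r, t), start=1):
    print(i, round(ri, 3), round(ti, 3))
print("outliers:", outliers)
```

No observation in this small sample is flagged by the rule. Under these formulas the deleted residual exceeds the standardized residual in absolute value only when the standardized residual is itself larger than 1 in absolute value, which is why the comparison is stated for outlying observations.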
