DSCI 5180: Introduction to the Business Decision Process
Spring 2013 – Dr. Nick Evangelopoulos
Lectures 5-6: Simple Regression Analysis (Ch. 3)
Simple Regression II
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Chapter 3
Simple Regression Analysis
(Part 2)

Terry Dielman
Applied Regression Analysis: A Second Course in Business and Economic Statistics, fourth edition
3.4 Assessing the Fit of the Regression Line

In some problems, it may not be possible to find a good predictor of the y values.

We know the least squares procedure finds the best possible fit, but that does not guarantee good predictive power.

In this section we discuss some methods for summarizing the fit quality.
3.4.1 The ANOVA Table

Let us start by looking at the amount of variation in the y values. The variation about the mean is:

SST = Σᵢ₌₁ⁿ (yᵢ - ȳ)²

which we will call SST, the total sum of squares.

Text equations (3.14) and (3.15) show how this can be split up into two parts.
Partitioning SST

SST can be split into two pieces which are the previously introduced SSE and a new quantity, SSR, the regression sum of squares.

SST = SSR + SSE

Σᵢ₌₁ⁿ (yᵢ - ȳ)² = Σᵢ₌₁ⁿ (ŷᵢ - ȳ)² + Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²
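As a quick numerical check on this identity, the sketch below fits a least-squares line to a small made-up data set (the x and y values are illustrative, not from the text) and verifies that SST = SSR + SSE:

```python
# Hypothetical data; any (x, y) sample illustrates the identity.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)      # total variation
SSR = sum((yh - ybar) ** 2 for yh in yhat)   # explained variation
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained

print(abs(SST - (SSR + SSE)) < 1e-9)  # True: the partition holds
```

The identity holds exactly (up to floating-point error) whenever the fitted line comes from least squares; for any other line, the cross-product term does not vanish.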
Explained and Unexplained Variation

We know that SSE is the sum of all the squared residuals, which represent lack of fit in the observations.

We call this the unexplained variation in the sample.

Because SSR contains the remainder of the variation in the sample, it is thus the variation explained by the regression equation.
The ANOVA Table

Most statistics packages organize these quantities in an ANalysis Of VAriance table.

Source       DF    SS    MS    F
Regression    1   SSR   MSR   MSR/MSE
Residual    n-2   SSE   MSE
Total       n-1   SST
3.4.2 The Coefficient of Determination

If we had an exact relationship between y and x, then SSE would be zero and SSR = SST.

Since that does not happen often, it is convenient to use the ratio of SSR to SST as a measure of how close we get to the exact relationship.

This ratio is called the Coefficient of Determination, or R².
R²

R² = SSR/SST is a fraction between 0 and 1.

In an exact model, R² would be 1. Most of the time we multiply by 100 and report it as a percentage.

Thus, R² is the percentage of the variation in the sample of y values that is explained by the regression equation.
Correlation Coefficient

Some programs also report the square root of R² as the correlation between the y and ŷ values.

When there is only a single predictor variable, as here, the R² is just the square of the correlation between y and x.
3.4.3 The F Test

An additional measure of fit is provided by the F statistic, which is the ratio of MSR to MSE.

This can be used as another way to test the hypothesis that β₁ = 0.

This test is not really important in simple regression because it is redundant with the t test on the slope.

In multiple regression (next chapter) it is much more important.
F Test Setup

The hypotheses are:
H₀: β₁ = 0
Hₐ: β₁ ≠ 0

The F ratio has 1 numerator degree of freedom and n-2 denominator degrees of freedom.

A critical value for the test is selected from that distribution and H₀ is rejected if the computed F ratio exceeds the critical value.
Example 3.8 Pricing Communications Nodes (continued)

Below we see the portion of the Minitab output that lists the statistics we have just discussed.

S = 4307     R-Sq = 88.7%     R-Sq(adj) = 87.8%

Analysis of Variance

Source          DF          SS          MS      F      P
Regression       1  1751268376  1751268376  94.41  0.000
Residual Error  12   222594146    18549512
Total           13  1973862521
R² and F

R² = SSR/SST = 1751268376 / 1973862521
   = .8872 or 88.7%

F = MSR/MSE = 1751268376 / 18549512
  = 94.41

From the F(1,12) distribution, the critical value at a 5% significance level is 4.75.
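These two calculations can be reproduced directly from the ANOVA entries in the Minitab output (a sketch; the sums of squares and the critical value 4.75 are taken from the slides above):

```python
# Values from the Minitab ANOVA table (n = 14 observations, 12 residual df).
SSR = 1751268376
SSE = 222594146
SST = 1973862521
n = 14

R2 = SSR / SST          # coefficient of determination
MSR = SSR / 1           # 1 numerator df
MSE = SSE / (n - 2)     # 12 denominator df
F = MSR / MSE

print(round(R2, 4))     # 0.8872
print(round(F, 2))      # 94.41
print(F > 4.75)         # True: exceeds the 5% critical value of F(1,12)
```

Since 94.41 is far above 4.75, H₀: β₁ = 0 is rejected, matching the t test conclusion on the slope.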
3.5 Prediction or Forecasting With a Simple Linear Regression Equation

Suppose we are interested in predicting the cost of a new communications node that had 40 ports.

If this size project is something we would see often, we might be interested in estimating the average cost of all projects with 40 ports.

If it was something we expect to see only once, we would be interested in predicting the cost of the individual project.
3.5.1 Estimating the Conditional Mean of y Given x

At xₘ = 40 ports, the quantity we are estimating is:

μ(y|x=40) = β₀ + β₁(40)

Our best guess of this is just the point on the regression line:

ŷₘ = b₀ + b₁(40)
Standard Error of the Mean

We will want to make an interval estimate, so we need some kind of standard error.

Because our point estimate is a function of the random variables b₀ and b₁, their standard errors figure into our computation.

The result is:

Sₘ = Sₑ √[ 1/n + (xₘ - x̄)² / ((n-1)Sₓ²) ]
Where Are We Most Accurate?

For estimating the mean at the point xₘ the standard error is Sₘ. If you examine the formula:

Sₘ = Sₑ √[ 1/n + (xₘ - x̄)² / ((n-1)Sₓ²) ]

you can see that the second term will be zero if we predict at the mean value of x.

That makes sense: it says you do your best prediction right in the center of your data.
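To see this effect numerically, the sketch below evaluates Sₘ at several values of xₘ for a hypothetical sample (n, Sₑ, x̄ and Sₓ² are made-up illustrative numbers, not from the text):

```python
import math

# Hypothetical values for illustration only.
n = 14          # sample size
Se = 4307.0     # standard error of the estimate
xbar = 40.0     # mean of the x values
Sx2 = 250.0     # sample variance of x

def Sm(xm):
    """Standard error for estimating the conditional mean of y at xm."""
    return Se * math.sqrt(1.0 / n + (xm - xbar) ** 2 / ((n - 1) * Sx2))

# Sm is smallest at xm = xbar and grows as we move away from the center.
for xm in [20, 30, 40, 50, 60]:
    print(xm, round(Sm(xm), 1))

print(Sm(40) < Sm(50) < Sm(60))  # True
```

At xₘ = x̄ the second term vanishes and Sₘ reduces to Sₑ/√n, the familiar standard error of a mean.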
Interval Estimate

For estimating the conditional mean of y that occurs at xₘ we use:

ŷₘ ± t(n-2) Sₘ

We call this a confidence interval for the mean value of y at xₘ.
Hypothesis Test

We could also perform a hypothesis test about the conditional mean.

The hypothesis would be:

H₀: μ(y|x=40) = (some value)

and we would construct a t ratio from the point estimate and standard error.
3.5.2 Predicting an Individual Value of y Given x

If we are trying to say something about an individual value of y it is a little bit harder.

We not only have to first estimate the conditional mean, but we also have to tack on an allowance for y being above or below its mean.

We use the same point estimate but our standard error is larger.
Prediction Standard Error

It can be shown that the prediction standard error is:

Sₚ = Sₑ √[ 1 + 1/n + (xₘ - x̄)² / ((n-1)Sₓ²) ]

This looks a lot like the previous one but has an additional term under the square root sign.

The relationship is:

Sₚ² = Sₘ² + Sₑ²
Predictive Inference

Although we could be interested in a hypothesis test, the most common type of predictive inference is a prediction interval.

The interval is just like the one for the conditional mean, except that Sₚ is used in the computation.
Example 3.10 Pricing Communications Nodes (one last time)

What do we get when there are 40 ports?

Many statistics packages have a way for you to do the prediction. Here is Minitab's output:

Predicted Values for New Observations

New Obs    Fit  SE Fit        95.0% CI          95.0% PI
1        42600    1178  (40035, 45166)  (32872, 52329)

Values of Predictors for New Observations

New Obs  NUMPORTS
1            40.0
From the Output

ŷₘ = 42600     Sₘ = 1178

Confidence interval: 40035 to 45166
computed: 42600 ± 2.179(1178)

Prediction interval: 32872 to 52329
computed: 42600 ± 2.179(????)

it does not list Sₚ
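Although Minitab does not print Sₚ, it can be backed out of the prediction-interval half-width, and the result can be checked against the relationship Sₚ² = Sₘ² + Sₑ² (a sketch using the output values above; 2.179 is the 95% t value with 12 df):

```python
import math

# Values from the Minitab output above.
fit, Sm, Se, t = 42600.0, 1178.0, 4307.0, 2.179

# Back out Sp from the upper prediction limit: fit + t * Sp = 52329.
Sp_from_pi = (52329 - fit) / t

# Compare with the Sp implied by Sp^2 = Sm^2 + Se^2.
Sp_from_relation = math.sqrt(Sm ** 2 + Se ** 2)

print(round(Sp_from_pi))        # 4465
print(round(Sp_from_relation))  # 4465
```

The two routes agree to rounding, which is a useful sanity check on both the output and the formula.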
Interpretations

For all projects with 40 ports, we are 95% sure that the average cost is between $40,035 and $45,166.

We are 95% sure that any individual project will have a cost between $32,872 and $52,329.
3.5.3 Assessing Quality of Prediction

We use the model's R² as a measure of fit ability, but this may overestimate the model's ability to predict.

The reason for that is that R² is optimized by the least squares procedure, for the data in our sample.

It is not necessarily optimal for data outside our sample, which is what we are predicting.
Data Splitting

We can split the data into two pieces. Use the first part to obtain the equation and use it to predict the data in the second part.

By comparing the actual y values in the second part to their corresponding predicted values, you get an idea of how well you predict data that is not in the "fit" sample.

The biggest drawback to this is that it won't work too well unless we have a lot of data. To be really reliable we should have at least 25 to 30 observations in both samples.
The PRESS Statistic

Suppose you temporarily deleted observation i from the data set, fit a new equation, then used it to predict the yᵢ value.

Because the new equation did not use any information from this data point, we get a clearer picture of the model's ability to predict it.

The sum of these squared prediction errors is the PRESS statistic.
Prediction R²

It sounds like a lot of work to do by hand, but most statistics packages will do it for you.

You can then compute an R²-like measure called the prediction R²:

R²(PRED) = 1 - PRESS/SST
In Our Example

For the communications node data we have been using, SSE = 222594146, SST = 1973862521 and R² = 88.7%.

Minitab reports that PRESS = 345066019.

Our prediction R²:

1 - (345066019/1973862521) = 1 - .175 = .825 or 82.5%

Although there is a little loss, it implies we still have good prediction ability.
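The arithmetic on this slide can be verified in a couple of lines (values taken from the slide):

```python
# Values from the communications node example.
SST = 1973862521
PRESS = 345066019

# Prediction R-squared: 1 - PRESS/SST.
R2_pred = 1 - PRESS / SST
print(round(R2_pred, 3))  # 0.825
```

Because PRESS > SSE by construction, the prediction R² (82.5%) is always at most the ordinary R² (88.7%); a large gap between the two would flag a model that fits its own sample much better than it predicts new points.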
3.6 Fitting a Linear Trend Model to Time-Series Data

Data gathered on different units at the same point in time are called cross-sectional data.

Data gathered on a single unit (person, firm, etc.) over a sequence of time periods are called time-series data.

With this type of data, the primary goal is often building a model that can forecast the future.
Time Series Models

There are many types of models that attempt to identify patterns of behavior in a time series in order to extrapolate it into the future.

Some of these will be examined in Chapter 11, but here we will just employ a simple linear trend model.
The Linear Trend Model

We assume the series displays a steady upward or downward behavior over time that can be described by:

yₜ = β₀ + β₁t + eₜ

where t is the time index (t = 1 for the first observation, t = 2 for the second, and so forth).

The forecast for this model is quite simple:

ŷ_T = b₀ + b₁T

You just insert the appropriate value for T into the regression equation.
Example 3.11 ABX Company Sales

The ABX Company sells winter sports merchandise including skates and skis. The quarterly sales (in $1000s) from first quarter 1994 through fourth quarter 2003 are graphed on the next slide.

The time-series plot shows a strong upward trend. There are also some seasonal fluctuations which will be addressed in Chapter 7.
[Time-series plot: ABX Company Sales. SALES ($1000s) versus quarter Index, 1 to 40.]
Obtaining the Trend Equation

We first need to create the time index variable which is equal to 1 for first quarter 1994 and 40 for fourth quarter 2003.

Once this is created we can obtain the trend equation by linear regression.
Trend Line Estimation

The regression equation is
SALES = 199 + 2.56 TIME

Predictor     Coef  SE Coef      T      P
Constant   199.017    5.128  38.81  0.000
TIME        2.5559   0.2180  11.73  0.000

S = 15.91     R-Sq = 78.3%     R-Sq(adj) = 77.8%

Analysis of Variance

Source          DF     SS     MS       F      P
Regression       1  34818  34818  137.50  0.000
Residual Error  38   9622    253
Total           39  44440
The Slope Coefficient

The slope in the equation is 2.5559. This implies that over this 10-year period, we saw an average growth in sales of $2,556 per quarter.

The hypothesis test on the slope has a t value of 11.73, so this is indeed significantly greater than zero.
Forecasts For 2004

Forecasts for 2004 can be obtained by evaluating the equation at t = 41, 42, 43 and 44.

For example, the sales in fourth quarter are forecast:
SALES = 199.017 + 2.5559(44) = 311.48
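Using the full-precision coefficients from the Minitab output reproduces the 311.48 figure; the rounded equation SALES = 199 + 2.56 TIME gives a slightly different value. A sketch of all four quarterly forecasts:

```python
# Full-precision coefficients from the Minitab output.
b0, b1 = 199.017, 2.5559

# Forecasts for the four quarters of 2004 (t = 41..44), in $1000s.
for t in range(41, 45):
    print(t, round(b0 + b1 * t, 2))

# Fourth quarter: 199.017 + 2.5559 * 44 = 311.48
```

This is the usual reason hand calculations from a printed equation drift from package output: the displayed equation rounds the coefficients, while the forecasts use the stored full-precision values.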
A graph of the data, the estimated trend and the forecasts is next.
[Time-series plot: Data, Trend (solid) and Forecast (dashed). SALES versus TIME, 0 to 45.]
3.7 Some Cautions in Interpreting Regression Results

Two common mistakes that are made when using regression analysis are:

1. Assuming that x causes y to happen, and
2. Assuming that you can use the equation to predict y for any value of x.
3.7.1 Association Versus Causality

If you have a model with a high R², it does not automatically mean that a change in x causes y to change in a very predictable way.

It could be just the opposite, that y causes x to change. A high correlation goes both ways.

It could also be that both y and x are changing in response to a third variable that we don't know about.
The Third Factor

One example of this third factor is the price and gasoline mileage of automobiles. As price increases, there is a sharp drop in mpg. This is caused by size: larger cars cost more and get less mileage.

Another is mortality rate in a country versus percentage of homes with television. As TV ownership increases, mortality rate drops. This is probably due to better economic conditions improving quality of life and simultaneously allowing for greater TV ownership.
3.7.2 Forecasting Outside the Range of the Explanatory Variable

When we have a model with a high R², it means we know a good deal about the relationship of y and x for the range of x values in our study.

Think of our communications nodes example where the number of ports ranged from 12 to 68. Does our model even hold if we wanted to price a massive project of 200 ports?
An Extrapolation Penalty

Recall that our prediction intervals were always narrowest when we predicted right in the middle of our data set.

As we go farther and farther outside the range of our data, the interval gets wider and wider, implying we know less and less about what is going on.
DSCI 5180 Decision Making

HW 3 – Hypothesis Testing in Regression