6.1 Ch. 6 Simple Linear Regression: Continued
To complete the analysis of the simple linear regression model, in this chapter we will consider
• how to measure the variation in yt that is explained by the model
• how to report the results of a regression analysis
• how changes in the units of measurement affect the estimates
• some alternative functional forms that may be used to represent possible relationships between yt and xt.
6.2 The Coefficient of Determination (R2)
Two major reasons for analyzing the model
$y = \beta_1 + \beta_2 x + e$ are:
• To explain how the dependent variable (yt) changes as the independent variable (xt) changes
• To predict yo given xo.
We want the independent variable (xt) to explain as much of the variation in the dependent variable (yt) as possible.
We introduced the independent variable (xt) in the hope that its variation will explain the variation in yt.
A measure of goodness of fit will measure how much of the variation in the dependent variable (yt) has been explained by variation in the independent variable (xt).
6.3
Separate yt into its explainable and unexplainable components:
$$y_t = E(y_t) + e_t$$
where $E(y_t) = \beta_1 + \beta_2 x_t$ is explainable.
The error term et is unexplainable. Using our estimates for $\beta_1$ and $\beta_2$, we get estimates of E(yt), and our residuals give us estimates of the error terms.
$$\hat{y}_t = b_1 + b_2 x_t$$
$$\hat{e}_t = y_t - \hat{y}_t \quad\Longleftrightarrow\quad y_t = \hat{y}_t + \hat{e}_t$$
The residual is defined as the difference between the actual and the predicted values of y.
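6.4
A single deviation of yt from its mean can be split into two parts:
$$y_t - \bar{y} = (\hat{y}_t - \bar{y}) + \hat{e}_t$$
The total variation in yt is measured as the sum of the squared deviations from the mean, $\sum (y_t - \bar{y})^2$, also known as SST (Total Sum of Squares).
The sum of squared deviations from the mean is:
$$\begin{aligned}
\sum (y_t - \bar{y})^2 &= \sum (\hat{y}_t - \bar{y} + \hat{e}_t)^2 \\
&= \sum \big((\hat{y}_t - \bar{y}) + \hat{e}_t\big)^2 \\
&= \sum (\hat{y}_t - \bar{y})^2 + 2\sum (\hat{y}_t - \bar{y})\hat{e}_t + \sum \hat{e}_t^2 \\
&= \sum (\hat{y}_t - \bar{y})^2 + \sum \hat{e}_t^2
\end{aligned}$$
The cross-product term $2\sum (\hat{y}_t - \bar{y})\hat{e}_t$ is zero.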
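The step that drops the cross-product term can be spelled out. The sketch below assumes that b1 and b2 are the least squares estimates and that the model includes an intercept, so that the two normal equations $\sum \hat{e}_t = 0$ and $\sum x_t \hat{e}_t = 0$ hold:

```latex
\[
\begin{aligned}
\sum (\hat{y}_t - \bar{y})\,\hat{e}_t
  &= \sum (b_1 + b_2 x_t - \bar{y})\,\hat{e}_t \\
  &= (b_1 - \bar{y}) \sum \hat{e}_t + b_2 \sum x_t \hat{e}_t \\
  &= (b_1 - \bar{y}) \cdot 0 + b_2 \cdot 0 = 0
\end{aligned}
\]
```

Without an intercept in the model the first normal equation need not hold, and the decomposition into explained and unexplained parts can fail.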
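6.5
Graphically, a single y deviation from mean can be split into the two parts:
[Figure: the fitted line $\hat{y} = b_1 + b_2 x$ plotted against xt. At a given xt, the total variation $y_t - \bar{y}$ is split into the explained part $\hat{y}_t - \bar{y}$ and the unexplained part $\hat{e}_t = y_t - \hat{y}_t$.]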
6.6
Analysis of Variance (ANOVA):
$$\sum (y_t - \bar{y})^2 = \sum (\hat{y}_t - \bar{y})^2 + \sum \hat{e}_t^2$$
SST = SSR + SSE
Where:
SST: Total Sum of Squares with T-1 degrees of freedom. It measures the total variation in the actual yt values about their mean.
SSR: Regression Sum of Squares with 1 degree of freedom. It measures the variation in the predicted values of yt about their mean. It is the part of the total variation that is explained by the model.
SSE: Error Sum of Squares with T-2 degrees of freedom. It measures the variation in the actual yt values about the predicted yt values. It is the part of the total variation that is left unexplained.
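The decomposition SST = SSR + SSE can be checked numerically. The following is a minimal sketch in plain Python; the data values are made up for illustration and are not from the slides, and the estimates use the usual least squares formulas for b1 and b2.

```python
# Illustrative data only (hypothetical values, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.7]

T = len(x)
x_bar = sum(x) / T
y_bar = sum(y) / T

# Least squares estimates of the slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

# Fitted values and residuals.
y_hat = [b1 + b2 * xi for xi in x]
e_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# The three sums of squares and the cross-product term.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum(e ** 2 for e in e_hat)
cross = sum((yh - y_bar) * e for yh, e in zip(y_hat, e_hat))

print(f"SST       = {sst:.6f}")
print(f"SSR + SSE = {ssr + sse:.6f}")
print(f"cross     = {cross:.2e}")
print(f"R2        = {ssr / sst:.6f}")
```

Up to floating-point rounding, the cross-product term is zero and the first two printed values agree.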
6.7
Multiple R           0.563132517
R Square             0.317118231
Adjusted R Square    0.299147658
Standard Error       37.80536423
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    1     25221.22299    25221.22299    17.64652878    0.00015495
Residual      38    54311.33145    1429.245564
Total         39    79532.55444

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     40.76755647     22.13865442       1.841464964    0.073369453    -4.049807902    85.58492083
x             0.128288601     0.030539254       4.200777164    0.00015495     0.066465111     0.190112091

In the SS column, the Regression row is SSR, the Residual row is SSE, and the Total row is SST.
R2 = SSR/SST = 1 – SSE/SST
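As a check on how the entries of the output fit together, the sketch below recomputes several of them from the reported sums of squares and degrees of freedom. The adjusted R2 formula used here is the standard one and is an addition; the slides report the value but do not derive it.

```python
import math

# Sums of squares and degrees of freedom as reported in the ANOVA table.
ssr, df_reg = 25221.22299, 1
sse, df_res = 54311.33145, 38
sst, df_tot = 79532.55444, 39

r2 = ssr / sst                 # proportion of variation explained
r2_alt = 1 - sse / sst         # same quantity via the unexplained share

msr = ssr / df_reg             # mean square, regression
mse = sse / df_res             # mean square, residual
f_stat = msr / mse             # F statistic in the ANOVA table
std_error = math.sqrt(mse)     # "Standard Error" of the regression

# Standard adjusted R2 formula (assumption: not derived in the slides).
adj_r2 = 1 - (sse / df_res) / (sst / df_tot)

print(f"R Square          = {r2:.9f}")
print(f"1 - SSE/SST       = {r2_alt:.9f}")
print(f"F                 = {f_stat:.6f}")
print(f"Standard Error    = {std_error:.6f}")
print(f"Adjusted R Square = {adj_r2:.9f}")
```

The computed values should agree with the R Square, F, Standard Error and Adjusted R Square entries up to rounding in the reported figures.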
6.8 Coefficient of Determination: R2
• R2 is the proportion of the total variation (SST) that is explained by the model. We can also think of it as one minus the proportion of the total variation that is unexplained (left in the residuals).
• 0 ≤ R2 ≤ 1
• The closer R2 is to 1.0, the better the fit of the model and the greater the predictive ability of the model over the sample.
• If R2 = 1 the model has explained everything. All the data points lie on the regression line (very unlikely). There are no residuals.
• If R2 = 0 the model has explained nothing.
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \hat{e}_t^2}{\sum (y_t - \bar{y})^2}$$
6.9
[Figure: two scatter plots of y against x.]
Graph A: R2 appears to be 1.0. All data points lie on a line.
Graph B: R2 appears to be 0. The best line through these points appears to have a slope of zero.
6.10
[Figure: two scatter plots of y against x.]
Graph C: R2 appears to be close to 1.0.
Graph D: R2 appears to be greater than 0 but less than R2 in Graph C.
6.11
• In the food expenditure example, R2 = 0.317: “31.7% of the total variation in food expenditures has been explained by variation in household income.”
• More Examples:
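6.12 Correlation Analysis
• Correlation coefficient between x and y is:
$$\rho = \frac{Cov(x, y)}{\sqrt{Var(x)}\,\sqrt{Var(y)}}$$
• The Sample Correlation between x and y is:
$$r = \frac{\widehat{Cov}(x, y)}{\sqrt{\widehat{Var}(x)}\,\sqrt{\widehat{Var}(y)}} = \frac{\frac{1}{T-1}\sum (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\frac{1}{T-1}\sum (x_t - \bar{x})^2}\,\sqrt{\frac{1}{T-1}\sum (y_t - \bar{y})^2}} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum (x_t - \bar{x})^2 \sum (y_t - \bar{y})^2}}$$
• It is always true that
-1 ≤ r ≤ 1
• It measures the strength of a linear relationship between x and y.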
6.13 Correlation and R2
• It can be shown that the square of the sample correlation coefficient for x and y is equal to R2.
• R2 can also be computed as the square of the sample correlation coefficient for the y values and the $\hat{y}$ values.
• It can also be shown that
$$b_2 = r\,\frac{s_y}{s_x}$$
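The three relationships between the sample correlation and the regression results can be verified numerically. This is a minimal sketch in plain Python on made-up data (the values are hypothetical, not from the slides); s_x and s_y denote the sample standard deviations with divisor T-1.

```python
import math

# Illustrative data only (hypothetical values, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.7]

T = len(x)
x_bar = sum(x) / T
y_bar = sum(y) / T

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation and sample standard deviations.
r = sxy / math.sqrt(sxx * syy)
s_x = math.sqrt(sxx / (T - 1))
s_y = math.sqrt(syy / (T - 1))

# Least squares fit and R2 = 1 - SSE/SST.
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r2 = 1 - sse / syy

# Sample correlation between the actual and the predicted y values.
yh_bar = sum(y_hat) / T
s_yh = sum((yh - yh_bar) ** 2 for yh in y_hat)
s_y_yh = sum((yi - y_bar) * (yh - yh_bar) for yi, yh in zip(y, y_hat))
r_y_yhat = s_y_yh / math.sqrt(syy * s_yh)

print(f"r^2            = {r ** 2:.9f}")
print(f"R2             = {r2:.9f}")
print(f"corr(y,yhat)^2 = {r_y_yhat ** 2:.9f}")
print(f"b2             = {b2:.9f}")
print(f"r * s_y / s_x  = {r * s_y / s_x:.9f}")
```

The first three printed values agree with each other, as do the last two, up to floating-point rounding.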
6.14 Reporting Regression Results
$$\hat{y}_t = \underset{(22.139)}{40.768} + \underset{(0.0305)}{0.1283}\,x_t \quad \text{(s.e.)} \qquad R^2 = 0.317$$
• The numbers in parentheses are the standard errors of the coefficient estimates. These can be used to construct the necessary t-statistics to ascertain the significance of the estimates.
• Sometimes, authors will report the t-statistic instead of the standard error. This would be the t-statistic for $H_0: \beta = 0$.
$$\hat{y}_t = \underset{(1.841)}{40.768} + \underset{(4.201)}{0.1283}\,x_t \quad \text{(t-stat)} \qquad R^2 = 0.317$$
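The two ways of reporting carry the same information, since the t-statistic for the hypothesis that a coefficient is zero is the estimate divided by its standard error. A minimal sketch, using the unrounded estimates and standard errors from the food expenditure output shown earlier:

```python
# Estimates and standard errors from the food expenditure regression output.
b1, se_b1 = 40.76755647, 22.13865442
b2, se_b2 = 0.128288601, 0.030539254

# t-statistic for the null hypothesis that the coefficient equals zero.
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

print(f"t-stat for intercept: {t_b1:.3f}")
print(f"t-stat for slope:     {t_b2:.3f}")
```

Dividing the rounded figures from the reported equation (0.1283 / 0.0305) gives a slightly different slope t-statistic, which is why the unrounded values are used here.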
6.15 Units of Measurement
$$b_2 = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} \qquad\qquad b_1 = \bar{y} - b_2 \bar{x}$$
b1 is measured in “y units”
b2 is measured in “y units over x units”
Example 3.15 from Chapter 3 Exercises
y = number of sodas sold, x = temperature in degrees (°F)
$$\hat{y}_t = -240 + 6 x_t$$
If xo = 0° then the model predicts $\hat{y}_o = -240$. So b1 is measured in y units (# of sodas).
b2 = 6, where 6 is in (# of sodas / degrees). If x increases by 10 degrees, y increases by 60 sodas:
$$\Delta \hat{y} = 6\,\Delta x$$
6.16
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.563132517
R Square             0.317118231
Adjusted R Square    0.299147658
Standard Error       37.80536423
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    1     25221.22299    25221.22299    17.64652878    0.00015495
Residual      38    54311.33145    1429.245564
Total         39    79532.55444

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     40.76755647     22.13865442       1.841464964    0.073369453    -4.049807902    85.58492083
newx          12.82886011     3.053925406       4.200777164    0.00015495     6.646511122     19.01120909

Let newx = x/100. There is no change to b1, because b1 is measured in y units. b2 is multiplied by 100, because it is measured in y units per x unit.
If newx increases by 1 unit (weekly income increases by $100), the model predicts food spending to rise by $12.83.
Note that this is not a new result: the model still predicts that if income increases by $1, food spending will increase by $0.1283.
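The effect of rescaling x can be reproduced on any data set. The sketch below uses made-up income and food spending figures (hypothetical, not the 40 observations behind the output above) and a small helper, `ols`, introduced here for illustration only.

```python
def ols(x, y):
    """Return (b1, b2, r2) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))
    return b1, b2, 1 - sse / sst


# Illustrative data only: weekly income and food spending in dollars.
income = [200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
food = [70.0, 95.0, 110.0, 150.0, 160.0, 200.0]

# Same regressor measured in hundreds of dollars.
income_100 = [xi / 100 for xi in income]

b1, b2, r2 = ols(income, food)
b1_new, b2_new, r2_new = ols(income_100, food)

print(f"x in dollars:     b1 = {b1:.4f}, b2 = {b2:.4f}, R2 = {r2:.4f}")
print(f"x in $100 units:  b1 = {b1_new:.4f}, b2 = {b2_new:.4f}, R2 = {r2_new:.4f}")
```

The intercept and R2 are unchanged, while the slope in the rescaled regression is 100 times the original slope.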
6.17 Functional Forms
A linear model is one that is linear in the parameters with an additive error term.
$$y = \beta_1 + \beta_2 x + e$$
The coefficient $\beta_2$ measures the effect of a one unit change in x on y. As the model is written above, this effect is assumed to be constant:
$$\frac{dy}{dx} = \beta_2$$
However, we want to have the ability to model relationships among economic variables where the effect of x on y is not constant.
Example: our food expenditure example assumes that the increase in food spending from an additional dollar of income is the same whether the family has a high or a low income. We can capture effects that are not constant using logs, powers and reciprocals, yet still maintain a model that is linear in the parameters with an additive error term.
6.18 The Natural Logarithm
• We will use the derivative property often:
• Let y be the log of x: y = ln(x), so dy/dx = 1/x, or dy = dx/x
• This means that the absolute change in the log of x is approximately equal to the relative change in the level of x.
Let x = 50: ln(x) = 3.912
Let x = 52: ln(x) = 3.951
dln(x) = 3.951 – 3.912 = 0.039
The absolute change in ln(x) is 0.039, which can be interpreted as a relative change in x of about 3.9%. (x increases from 50 to 52, which is exactly 2/50 = 4% in relative terms; the log difference is a close approximation.)
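The quality of the log approximation for the change from 50 to 52 can be checked directly with a short sketch:

```python
import math

x0, x1 = 50.0, 52.0

# Absolute change in the natural log of x.
log_change = math.log(x1) - math.log(x0)

# Exact relative change in the level of x.
rel_change = (x1 - x0) / x0

print(f"ln(52) - ln(50) = {log_change:.4f}")
print(f"(52 - 50) / 50  = {rel_change:.4f}")
```

The two figures are close for small changes; the gap widens as the relative change in x grows.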
6.20
Example: Y: food $, X: Weekly Income
$$y_t = \beta_1 + \beta_2 \ln(x_t) + e_t$$
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.571864156
R Square             0.327028613
Adjusted R Square    0.30931884
Standard Error       37.5300348
Observations         40

xbar = 698, ybar = 130

ANOVA
              df    SS             MS             F
Regression    1     26009.42098    26009.42098    18.46599654
Residual      38    53523.13346    1408.503512
Total         39    79532.55444

              Coefficients     Standard Error    t Stat          P-value
Intercept     -415.5556981     127.1672145       -3.267789578    0.002303753
lnx           83.91235619      19.52718051       4.297207994     0.000115804
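In the model $y_t = \beta_1 + \beta_2 \ln(x_t) + e_t$, the derivative property of the log gives a slope of $\beta_2 / x$, so the effect of income on food spending depends on the income level. The sketch below evaluates it with the estimated coefficient on lnx and the values xbar = 698 and ybar = 130 noted beside the output; treating these as the points at which to evaluate the slope and elasticity is an assumption, since the slides do not say how they are used.

```python
# Estimated coefficient on ln(x) from the regression output.
b2 = 83.91235619

# Values noted beside the output (assumed to be rounded sample means).
x_bar = 698.0
y_bar = 130.0

# Slope dy/dx = b2 / x, evaluated at x_bar.
slope_at_mean = b2 / x_bar

# Elasticity (dy/dx) * (x / y) = b2 / y, evaluated at y_bar.
elasticity_at_mean = b2 / y_bar

# Approximate change in y for a 1% increase in x.
effect_of_1pct = b2 / 100

print(f"slope at xbar:             {slope_at_mean:.4f}")
print(f"elasticity at ybar:        {elasticity_at_mean:.4f}")
print(f"change in y per 1% in x:   {effect_of_1pct:.4f}")
```

At a weekly income of 698, an extra dollar of income is associated with roughly 12 cents of additional food spending, close to the 0.1283 slope from the linear model.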
6.23
Linear Model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.959804589
R Square             0.921224848
Adjusted R Square    0.913347333
Standard Error       51.41714671
Observations         12

ANOVA
              df    SS             MS             F
Regression    1     309166.4369    309166.4369    116.9435829
Residual      10    26437.22976    2643.722976
Total         11    335603.6667

              Coefficients     Standard Error    t Stat          P-value
Intercept     2813.319917      175.3238285       16.04642073     1.82583E-08
p             -1577.581002     145.8825916       -10.81404563    7.72349E-07

Double Log Model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.938565661
R Square             0.880905499
Adjusted R Square    0.868996049
Standard Error       0.068144835
Observations         12

ANOVA
              df    SS             MS             F
Regression    1     0.343481619    0.343481619    73.96693327
Residual      10    0.046437185    0.004643719
Total         11    0.389918805

              Coefficients     Standard Error    t Stat          P-value
Intercept     7.152753536      0.044168328       161.9430444     1.98024E-18
lnp           -1.927315081     0.224095901       -8.600403088    6.21314E-06
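The slope in the double log model has a direct interpretation that follows from the derivative property of the natural log. The sketch below writes the dependent variable as y, which is an assumption: the output names only the regressor p.

```latex
\[
\ln(y_t) = \beta_1 + \beta_2 \ln(p_t) + e_t
\quad\Longrightarrow\quad
\beta_2 = \frac{d\ln(y)}{d\ln(p)} = \frac{dy / y}{dp / p}
\]
```

So the coefficient is an elasticity: with the estimate of about -1.927, a 1% increase in p is associated with a decrease in y of roughly 1.93%. The two R Square values (0.921 and 0.881) are not directly comparable, because the dependent variable is y in the linear model and ln(y) in the double log model.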