A Measure of “Goodness of Fit”
• THE COEFFICIENT OF DETERMINATION ($R^2$)
We shall find out how “well” the sample regression line fits the data.
Figure A: Venn diagram
• In this figure the circle Y represents variation in the dependent variable Y and the circle X represents variation in the explanatory variable X. The overlap of the two circles indicates the extent to which the variation in Y is explained by the variation in X.
• When there is no overlap, $R^2$ is zero; when the overlap is complete, $R^2$ is 1, since 100 percent of the variation in Y is explained by X. As we shall show shortly, $R^2$ lies between 0 and 1.
Analysis of Variance
• The total variability in a regression analysis, SST, can be partitioned into a component explained by the regression, SSR, and a component due to unexplained error, SSE
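$$SST = SSR + SSE$$
• With the components defined as
• Total sum of squares:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
• Error sum of squares:
$$SSE = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
• Regression sum of squares:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$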
Coefficient of Determination, $R^2$
• The Coefficient of Determination for a regression equation is defined as
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
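The decomposition and the two equivalent forms of $R^2$ can be checked numerically. The following is a minimal sketch in plain Python; the five data points are made up purely for illustration and do not come from the slides, and the variable names are hypothetical.

```python
# Toy data, invented for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope b1 and the intercept b0.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values on the sample regression line.
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares, following the definitions in the text.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Two equivalent ways of computing the coefficient of determination.
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")
print(f"R^2 (SSR/SST)={r2_from_ssr:.4f}  R^2 (1 - SSE/SST)={r2_from_sse:.4f}")
```

For this toy sample SST = 6, SSR = 3.6 and SSE = 2.4, so both forms give $R^2 = 0.6$, and SSR also equals $b_1^2 \sum (x_i - \bar{x})^2 = 0.36 \times 10$.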
Some Examples
EXAMPLE 1 : HYPOTHETICAL DATA ON WEEKLY FAMILY CONSUMPTION EXPENDITURE (Y) AND WEEKLY FAMILY INCOME (X)
• The estimated regression line is
$$\hat{Y}_i = 24.4545 + 0.5091 X_i, \qquad R^2 = 0.9621$$
• The value of $\hat{\beta}_2 = 0.5091$, which measures the slope of the line, shows that, within the sample range of X between $80 and $260 per week, as X increases by, say, $1, the estimated increase in the mean or average weekly consumption expenditure amounts to about 51 cents.
• The value of $\hat{\beta}_1 = 24.4545$, which is the intercept of the line, indicates the average level of weekly consumption expenditure when weekly income is zero.
• The value of $R^2$ of 0.9621 means that about 96 percent of the variation in the weekly consumption expenditure is explained by income. Since $R^2$ can at most be 1, the observed $R^2$ suggests that the sample regression line fits the data very well.
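The estimates in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the ten (income, consumption) pairs below are not printed in the slides; they are assumed here to be the hypothetical data behind the example, and they are used because they span the stated range of $80 to $260 and give back the reported intercept, slope and $R^2$. Here $R^2$ is computed as the squared sample correlation, which coincides with SSR/SST in a regression with a single explanatory variable.

```python
# ASSUMPTION: these observations are not given in the slides. They are assumed
# to be the hypothetical weekly income (X) and consumption (Y) data of Example 1.
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
consumption = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n

# Sums of squares and cross products in deviation form.
s_xx = sum((x - x_bar) ** 2 for x in income)
s_yy = sum((y - y_bar) ** 2 for y in consumption)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption))

slope = s_xy / s_xx                  # estimated slope
intercept = y_bar - slope * x_bar    # estimated intercept
r2 = s_xy ** 2 / (s_xx * s_yy)       # squared correlation between X and Y

print(round(intercept, 4), round(slope, 4), round(r2, 4))
# prints: 24.4545 0.5091 0.9621
```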
EXAMPLE 2 : FOOD EXPENDITURE IN INDIA
• The data relate to a sample of 55 rural households in India. The regressand in this example is expenditure on food and the regressor is total expenditure, a proxy for income, both figures in rupees. The data in this example are thus cross-sectional data.
$$\widehat{\text{FoodExp}}_i = 94.20 + 0.43\,\text{TotalExp}_i$$
• If total expenditure increases by 1 rupee, on average, expenditure on food goes up by about 44 paise (1 rupee = 100 paise).
• If total expenditure were zero, the average expenditure on food would be about 94 rupees.
• The $R^2$ value of about 0.37 means that only 37 percent of the variation in food expenditure is explained by the total expenditure.
EXAMPLE 3 : THE RELATIONSHIP BETWEEN EARNINGS AND EDUCATION
• The data relate average hourly earnings to education, as measured by years of schooling. Regressing average hourly earnings (Y) on education (X) gives the following results.
$$\hat{Y}_i = -0.0144 + 0.7241 X_i$$
• As the regression results show, there is a positive association between education and earnings, an unsurprising finding.
• For every additional year of schooling, average hourly earnings go up by about 72 cents. The intercept term is negative, but it may have no economic meaning.
• The $R^2$ value suggests that about 89 percent of the variation in average hourly earnings is explained by education. For cross-sectional data, such a high $R^2$ is rather unusual.
Basis for Inference About the Population Regression Slope
• Let $\beta_1$ be a population regression slope and $b_1$ its least squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable
$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$
is distributed as Student's t with (n − 2) degrees of freedom.
Excel Output for Retail Sales Model
• The regression equation is Y (Retail Sales) = 1922 + 0.382 X (Income)

Regression Statistics
Multiple R           0.958748803
R Square             0.919199267
Adjusted R Square    0.91515923
Standard Error       147.6697181
Observations         22

Analysis of Variance
              df    SS             MS             F             Significance F
Regression    1     4961434.406    4961434.406    227.522506    2.17134E-12
Residual      20    436126.9127    21806.34563
Total         21    5397561.318

              Coefficients    Standard Error    t Stat         P-value        Lower 95%
Intercept     1922.392694     274.9493737       6.99180605     8.74464E-07    1348.858617
X Income      0.38151672      0.025293061       15.08384918    2.17134E-12    0.328756343
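Most entries in the Excel output follow from the sums of squares and the coefficient standard errors. The sketch below recomputes several of them from the values printed in the output. The adjusted $R^2$ formula and the critical value 2.086 (the usual t table entry for 20 degrees of freedom and an upper-tail area of 0.025) are not stated in the slides and are assumptions of this example.

```python
import math

# Values copied from the Excel output for the retail sales model.
n = 22
ssr = 4961434.406      # regression sum of squares
sse = 436126.9127      # residual (error) sum of squares
sst = 5397561.318      # total sum of squares
b0, se_b0 = 1922.392694, 274.9493737
b1, se_b1 = 0.38151672, 0.025293061

r_square = ssr / sst                        # "R Square"
multiple_r = math.sqrt(r_square)            # "Multiple R"
mse = sse / (n - 2)                         # residual mean square
std_error = math.sqrt(mse)                  # "Standard Error" of the regression

# Standard adjusted R^2 formula (an assumption here; not shown in the slides).
adj_r_square = 1 - (sse / (n - 2)) / (sst / (n - 1))

# t statistics for H0: coefficient = 0.
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# ASSUMPTION: t table value for 20 degrees of freedom, upper-tail area 0.025.
T_CRIT_20_025 = 2.086
lower_95_b1 = b1 - T_CRIT_20_025 * se_b1    # "Lower 95%" for the slope

print(f"R Square          {r_square:.6f}")
print(f"Multiple R        {multiple_r:.6f}")
print(f"Adjusted R Square {adj_r_square:.6f}")
print(f"Standard Error    {std_error:.4f}")
print(f"t Stat intercept  {t_b0:.4f}")
print(f"t Stat slope      {t_b1:.4f}")
print(f"Lower 95% slope   {lower_95_b1:.4f}")
```

Each recomputed figure agrees with the corresponding Excel entry up to rounding, which is a quick way to confirm that a flattened table has been read back correctly.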
Tests of the Population Regression Slope
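If the regression errors $\varepsilon_i$ are normally distributed and the standard least squares assumptions hold (or if the distribution of $b_1$ is approximately normal), the following tests have significance level $\alpha$:
1. To test either null hypothesis
$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \le \beta_1^*$$
against the alternative
$$H_1: \beta_1 > \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\,\alpha}$$
2. To test either null hypothesis
$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \ge \beta_1^*$$
against the alternative
$$H_1: \beta_1 < \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\,\alpha}$$
3. To test the null hypothesis
$$H_0: \beta_1 = \beta_1^*$$
against the two-sided alternative
$$H_1: \beta_1 \ne \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\,\alpha/2} \quad \text{or} \quad \frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\,\alpha/2}$$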
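The three decision rules differ only in which tail of the t distribution is used. The following sketch encodes them in one hypothetical helper, `slope_t_test`, and applies it to the slope from the retail sales output. The critical values 2.086 and 1.725 are the usual t table entries for 20 degrees of freedom (upper-tail areas 0.025 and 0.05); they, and the hypothesised value 0.35 in the second call, are assumptions chosen for illustration and do not appear in the slides.

```python
def slope_t_test(b1, beta_star, se_b1, t_crit, alternative):
    """Apply one of the three decision rules for a regression slope.

    alternative: "greater"   for H1: beta1 > beta_star,
                 "less"      for H1: beta1 < beta_star,
                 "two-sided" for H1: beta1 != beta_star.
    t_crit: t_{n-2, alpha} for the one-sided rules,
            t_{n-2, alpha/2} for the two-sided rule.
    Returns the test statistic and whether H0 is rejected.
    """
    t_stat = (b1 - beta_star) / se_b1
    if alternative == "greater":
        reject = t_stat >= t_crit
    elif alternative == "less":
        reject = t_stat <= -t_crit
    elif alternative == "two-sided":
        reject = t_stat >= t_crit or t_stat <= -t_crit
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t_stat, reject


# Slope estimate and standard error from the retail sales output (n = 22).
B1, SE_B1 = 0.38151672, 0.025293061

# Two-sided test of H0: beta1 = 0 at alpha = 0.05 (assumed table value 2.086).
t_zero, reject_zero = slope_t_test(B1, 0.0, SE_B1, 2.086, "two-sided")

# One-sided test of H0: beta1 <= 0.35 against H1: beta1 > 0.35 at alpha = 0.05.
# The value 0.35 is hypothetical; 1.725 is the assumed table value.
t_035, reject_035 = slope_t_test(B1, 0.35, SE_B1, 1.725, "greater")

print(f"H0: beta1 = 0     t = {t_zero:.3f}  reject H0: {reject_zero}")
print(f"H0: beta1 <= 0.35 t = {t_035:.3f}  reject H0: {reject_035}")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; in practice it would be read from a t table or computed by statistical software.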
F test for Simple Regression Coefficient
• We can test the hypothesis
$$H_0: \beta_1 = 0$$
• against the alternative
$$H_1: \beta_1 \ne 0$$
• by using the F statistic
$$F = \frac{MSR}{MSE} = \frac{SSR}{s_e^2}$$
• The decision rule is
$$\text{Reject } H_0 \text{ if } F \ge F_{1,\,n-2,\,\alpha}$$
• We can also show that the F statistic is
$$F = t_{b_1}^2$$
for any simple regression analysis.
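A short derivation makes the link between the F and t statistics explicit. It uses the regression sum of squares in the form $SSR = b_1^2 \sum (x_i - \bar{x})^2$ and the standard expression for the estimated variance of the slope, $s_{b_1}^2 = s_e^2 / \sum (x_i - \bar{x})^2$, which is not stated in the slides and is assumed here. With $t_{b_1} = b_1 / s_{b_1}$, the t statistic for $H_0: \beta_1 = 0$:

```latex
\begin{align*}
F &= \frac{MSR}{MSE} = \frac{SSR}{s_e^2}
   = \frac{b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{s_e^2} \\
  &= \frac{b_1^2}{s_e^2 \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}
   = \frac{b_1^2}{s_{b_1}^2}
   = t_{b_1}^2 .
\end{align*}
```

The retail sales output agrees with this: the slope's t statistic is 15.08384918, and $15.08384918^2 \approx 227.5225$, which matches the reported F of 227.522506.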