bart hobijn uc berkeley econ 141 linear regression slides lecture 2
Ch.4: Simple Linear Regression
Econ 141 Spring 2014
Lecture: February 03 and 05, 2014
Bart Hobijn
2/03&05/2014 Econ 141, Spring 2014 1
The views expressed in these lecture notes are solely those of the instructor and do not necessarily
reflect those of UC Berkeley or of other institutions with which he is affiliated.
Example: Estimate MPC
• MPC: Marginal Propensity to Consume
Suppose households’ pre-tax income
increases by a dollar, what fraction of this
dollar would they end up spending versus
paying in taxes or saving?
Example: Estimate MPC
• Basic equation
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑢𝑖
– 𝑖 MSA (Metropolitan Statistical Area), the unit of observation
– 𝑌𝑖 Average consumption expenditures per household
– 𝑋𝑖 Average pre-tax income per household
– 𝛽0 Average consumption level at zero income
– 𝛽1 Marginal propensity to consume (MPC)
– 𝑢𝑖 MSA-specific deviation from the average linear relationship between income and spending
• How can we estimate the value of the MPC, i.e. 𝛽1?
Income and spending by MSA

MSA (𝒊)                 Spending (𝒀𝒊)   Income (𝑿𝒊)
Chicago                 57.7            74.4
Detroit                 50.5            79.8
Minneapolis-St. Paul    56.7            66.8
Cleveland               48.0            65.9
New York                58.7            80.2
Philadelphia            53.5            71.7
Boston                  65.0            79.8
Washington, D.C.        77.9            111.9
Baltimore               62.3            96.9
Atlanta                 51.9            71.2
Miami                   40.6            58.9
Dallas-Fort Worth       57.1            71.0
Houston                 58.2            73.5
Los Angeles             55.3            69.6
San Francisco           73.6            98.2
San Diego               56.2            76.4
Seattle                 60.7            74.1
Phoenix                 53.7            63.2

Note: Spending and income are annual averages across households, in thousands of dollars.
Source: Consumer Expenditure Survey
Data scatterplot
[Figure: "Average Income and Expenditures by major MSA". Scatterplot with Income on the horizontal axis (50 to 120) and Expenditures on the vertical axis (30 to 90), one labeled point per MSA. Annual income and expenditures by household; thousands of dollars; 2012. Source: Consumer Expenditure Survey by MSA.]
Estimate of MPC (𝜷𝟏)?
What is the best estimate of the line
defined by 𝜷𝟎 and 𝜷𝟏?
Ordinary Least Squares
Simple linear regression model
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑢𝑖
– 𝑖 observation number
– 𝑌𝑖 dependent variable (regressand)
– 𝑋𝑖 independent (explanatory) variable (regressor)
– 𝛽0 intercept / constant
– 𝛽1 slope coefficient
– 𝑢𝑖 error term / residual
Population regression line / Population regression function, 𝛽0 + 𝛽1𝑋𝑖
Average linear relationship between the dependent and independent variable.
Error term / Residual, 𝑢𝑖
Observation-specific deviation from the average linear relationship between the dependent and independent variable.
Why we need to estimate
Observed: Sample 𝑌𝑖 , 𝑋𝑖 for 𝑖 = 1, … , 𝑛.
Unobserved: Parameters 𝛽0 and 𝛽1 as well
as error terms 𝑢𝑖 for 𝑖 = 1, … , 𝑛.
Ordinary Least Squares (OLS)
• OLS estimates: Choose $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals (SSR)
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
• Properties:
– What is the solution for $\hat\beta_0$, $\hat\beta_1$?
– Are $\hat\beta_0$, $\hat\beta_1$ consistent estimates of the true 𝛽0 and 𝛽1?
– Are $\hat\beta_0$, $\hat\beta_1$ unbiased estimates of the true 𝛽0 and 𝛽1?
– What is their asymptotic distribution?
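A minimal sketch of the minimization above, assuming a small made-up sample (the `x` and `y` values and the search grid are illustrative and not from the lecture). A brute-force search over candidate pairs (b0, b1) picks the pair with the smallest SSR; for this toy sample the grid happens to contain the exact minimizer, so the search should land on the same values as the closed-form solution.

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

# Candidate values 0.00, 0.01, ..., 2.00 for both b0 and b1.
grid = [k / 100 for k in range(201)]

# argmin over the grid: the (b0, b1) pair with the smallest SSR.
best_b0, best_b1 = min(
    ((b0, b1) for b0 in grid for b1 in grid),
    key=lambda pair: ssr(pair[0], pair[1], x, y),
)

print("grid minimizer:", best_b0, best_b1)
print("minimized SSR:", ssr(best_b0, best_b1, x, y))
```

A grid search is used here only because it mirrors the argmin definition directly; in practice the closed-form solution is what one would compute.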
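Solution for $\hat\beta_0$, $\hat\beta_1$
First-order necessary condition for $\hat\beta_0$
$$0 = \frac{\partial}{\partial \hat\beta_0} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial \hat\beta_0} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 = -2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)$$
$$\Rightarrow \; 0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i).$$
Solving for $\hat\beta_0$
$$0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \bar{Y} - \hat\beta_0 - \hat\beta_1 \bar{X}$$
Such that
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$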
First-order necessary condition for $\hat\beta_1$
$$0 = \frac{\partial}{\partial \hat\beta_1} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 = \frac{\partial}{\partial \hat\beta_1} \sum_{i=1}^{n} \left[ (Y_i - \bar{Y}) - \hat\beta_1 (X_i - \bar{X}) \right]^2 = -2 \sum_{i=1}^{n} (X_i - \bar{X}) \left[ (Y_i - \bar{Y}) - \hat\beta_1 (X_i - \bar{X}) \right]$$
$$\Rightarrow \; 0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \left[ (Y_i - \bar{Y}) - \hat\beta_1 (X_i - \bar{X}) \right].$$
This implies that
$$0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) (Y_i - \hat\beta_0 - \hat\beta_1 X_i)$$
Expanding the sum:
$$0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) - \hat\beta_1 \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = s_{XY} - \hat\beta_1 s_X^2.$$
Such that
$$\hat\beta_1 = \frac{s_{XY}}{s_X^2}$$
Simple linear regression with OLS
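• OLS estimators of 𝜷𝟎 and 𝜷𝟏:
– $\hat\beta_1 = \dfrac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \dfrac{s_{XY}}{s_X^2}$
– $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$
• Derived estimates for each 𝒊 = 𝟏, … , 𝒏
– $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$, predicted/fitted value of $Y_i$
– $\hat{u}_i = Y_i - \hat{Y}_i$, residual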
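As a rough illustration, the estimator formulas above can be applied to the 18 MSA observations from the income and spending table. The variable names are illustrative. Because the table reports values rounded to one decimal, the results should be close to, but need not exactly match, the regression output reported later in these slides.

```python
# Income (X_i) and spending (Y_i) by MSA in thousands of dollars, copied from
# the table in these slides (Chicago ... Baltimore, then Atlanta ... Phoenix).
income = [74.4, 79.8, 66.8, 65.9, 80.2, 71.7, 79.8, 111.9, 96.9,
          71.2, 58.9, 71.0, 73.5, 69.6, 98.2, 76.4, 74.1, 63.2]
spending = [57.7, 50.5, 56.7, 48.0, 58.7, 53.5, 65.0, 77.9, 62.3,
            51.9, 40.6, 57.1, 58.2, 55.3, 73.6, 56.2, 60.7, 53.7]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(spending) / n

# Sample covariance and variance (the 1/n factors cancel in the ratio).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, spending)) / n
s_xx = sum((x - x_bar) ** 2 for x in income) / n

b1_hat = s_xy / s_xx             # estimated slope, i.e. the MPC
b0_hat = y_bar - b1_hat * x_bar  # estimated intercept

# Fitted values and residuals for each MSA.
fitted = [b0_hat + b1_hat * x for x in income]
residuals = [y - y_hat for y, y_hat in zip(spending, fitted)]

print(f"slope (MPC) = {b1_hat:.3f}, intercept = {b0_hat:.2f}")
```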
Estimating the MPC
[Figure: scatterplot of Average Income and Expenditures by major MSA, as above, with San Francisco labeled.]
Estimated MPC out of
pre-tax income is 56
cents on the dollar
SF predicted value and residual
[Figure: the same scatterplot with San Francisco labeled; the values 73.6 and 69.6 are marked on the Expenditures axis.]
Expenditures in SF (73.6) are higher than predicted by the regression (69.6).
Residual: actual minus predicted expenditures for SF.
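The two values marked for San Francisco can be reproduced, as a small sketch, from the intercept and slope reported in the Excel regression output later in these slides and the SF row of the data table. Variable names are illustrative.

```python
# Coefficients as reported in the Excel regression output in these slides.
b0_hat = 14.69690817  # estimated intercept
b1_hat = 0.558872878  # estimated slope (MPC)

# San Francisco row of the data table (thousands of dollars).
income_sf = 98.2
spending_sf = 73.6

predicted_sf = b0_hat + b1_hat * income_sf  # fitted value
residual_sf = spending_sf - predicted_sf    # actual minus predicted

print(f"predicted = {predicted_sf:.1f}, residual = {residual_sf:.1f}")
```

A positive residual corresponds to a point above the regression line, which is the case for SF.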
Goodness of fit
• Main question
What fraction of the variance of the dependent variable, $Y_i$, is explained by the regression line rather than left unexplained (unexplained means part of the residuals)?
• Variance accounting
– $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, total sum of squares
– $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$, explained sum of squares, a.k.a. model sum of squares
– $SSR = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$, sum of squared residuals, a.k.a. residual sum of squares
TSS = ESS + SSR decomposition
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} \left[ (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \right]^2 = \sum_{i=1}^{n} \left[ (\hat{Y}_i - \bar{Y}) + \hat{u}_i \right]^2$$
$$= \sum_{i=1}^{n} \left[ \hat\beta_0 + \hat\beta_1 X_i - \bar{Y} + \hat{u}_i \right]^2 = \sum_{i=1}^{n} \left[ \bar{Y} - \hat\beta_1 \bar{X} + \hat\beta_1 X_i - \bar{Y} + \hat{u}_i \right]^2 = \sum_{i=1}^{n} \left[ \hat\beta_1 (X_i - \bar{X}) + \hat{u}_i \right]^2$$
$$= \sum_{i=1}^{n} \left[ \hat\beta_1 (X_i - \bar{X}) \right]^2 + 2 \hat\beta_1 \sum_{i=1}^{n} (X_i - \bar{X}) \hat{u}_i + \sum_{i=1}^{n} \hat{u}_i^2.$$
The cross term is zero: $\sum_{i=1}^{n} (X_i - \bar{X}) \hat{u}_i = 0$ according to the first-order necessary condition for $\hat\beta_1$ derived above.
In the first term, $\hat\beta_1 (X_i - \bar{X}) = \hat{Y}_i - \bar{Y}$.
Such that
$$TSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{u}_i^2 = ESS + SSR.$$
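The decomposition can be checked numerically. The sketch below assumes a small made-up sample (not from the lecture), fits the OLS line with the closed-form estimators, and compares TSS with ESS + SSR; it also evaluates the cross term that the first-order condition sets to zero.

```python
# Hypothetical toy sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the closed-form solution.
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0_hat = y_bar - b1_hat * x_bar

y_hat = [b0_hat + b1_hat * xi for xi in x]     # fitted values
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]  # residuals

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in y_hat)
ssr = sum(ui ** 2 for ui in u_hat)

# Cross term from the expansion; zero up to floating-point error.
cross = sum((xi - x_bar) * ui for xi, ui in zip(x, u_hat))

print(f"TSS = {tss:.4f}, ESS + SSR = {ess + ssr:.4f}, cross term = {cross:.2e}")
```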
Goodness of fit for MPC regression
[Figure: scatterplot of Average Income and Expenditures by major MSA, as above, with San Francisco labeled.]
𝑹𝟐: measure of goodness of fit

Measure   Equation                                      Value    Share (percentage)
ESS       $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$      944.0    74.9
SSR       $\sum_{i=1}^{n} \hat{u}_i^2$                  316.4    25.1
TSS       $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$            1260.3   100.0

$R^2$: fraction of the variation in the dependent variable, i.e. of $TSS$, explained by the regression line. $R^2 = 0.749$.
Standard error of the regression
$SER = s_{\hat{u}}$, where $s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$ is an unbiased estimate of the variance of the residuals, $\mathrm{var}(u_i)$.
• Degrees of freedom correction
If we had only two observations we would be able to perfectly fit a straight line and the residuals would be zero.
• Measure of spread around regression line
Estimate of the standard deviation of the deviations from the regression line.
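A short worked calculation, using the rounded ESS, SSR and TSS values from the goodness-of-fit table and the sample size of 18 MSAs. The variable names are illustrative, and the results are only as precise as the rounded inputs.

```python
import math

# Rounded sums of squares from the goodness-of-fit table in these slides.
ess = 944.0   # explained sum of squares
ssr = 316.4   # sum of squared residuals
tss = 1260.3  # total sum of squares
n = 18        # number of MSAs

r_squared = ess / tss          # share of TSS explained by the regression line
r_squared_alt = 1 - ssr / tss  # same quantity via the unexplained share

s_u_squared = ssr / (n - 2)    # degrees-of-freedom corrected variance estimate
ser = math.sqrt(s_u_squared)   # standard error of the regression

print(f"R^2 = {r_squared:.3f}, SER = {ser:.2f}")
```

The two ways of computing R² agree only up to rounding here, because the table values themselves are rounded.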
MPC regression in Excel
[Figure: scatterplot of Average Income and Expenditures by major MSA, as above, with San Francisco labeled.]
Example Excel regression output

SUMMARY OUTPUT
Dependent variable: Expenditures

Regression Statistics
Multiple R           0.865434015
R Square             0.748976034
Adjusted R Square    0.733287037
Standard Error       4.446724217
Observations         18

ANOVA
             df   SS            MS            F             Significance F
Regression   1    943.9589518   943.9589518   47.73893411   3.51406E-06
Residual     16   316.3737002   19.77335626
Total        17   1260.332652

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   14.69690817    6.304502449      2.331176535   0.033146025   1.331960021   28.06185632
Income      0.558872878    0.080886618      6.909336734   3.51406E-06   0.387400909   0.730344847

(Excel repeats the confidence bounds in additional "Lower 95.0%" and "Upper 95.0%" columns.)

Reading the output:
– Intercept coefficient: $\hat\beta_0$, estimated intercept
– Income coefficient: $\hat\beta_1$, estimated slope
– Observations: $n$, sample size
– SS column: $ESS$, explained sum of squares (Regression); $SSR$, sum of squared residuals (Residual); $TSS$, total sum of squares (Total)
– R Square: $R^2$
– MS of the Residual row: $s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$
– Standard Error under Regression Statistics: $SER = s_{\hat{u}}$
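The entries of the Excel output are linked by simple identities. As a sketch, the block below recomputes several of them from the reported sums of squares, coefficient and standard error. The adjusted R² formula and the relation F = t² for a single regressor are standard results assumed here rather than derived in these slides.

```python
import math

# Values copied from the Excel regression output in these slides.
ess = 943.9589518     # SS, Regression row
ssr = 316.3737002     # SS, Residual row
tss = 1260.332652     # SS, Total row
n = 18                # Observations
b1_hat = 0.558872878  # Income coefficient
se_b1 = 0.080886618   # its standard error

ms_residual = ssr / (n - 2)    # MS, Residual row: s_u^2
ser = math.sqrt(ms_residual)   # "Standard Error" under Regression Statistics
r_squared = ess / tss          # "R Square"
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)  # one regressor

t_stat = b1_hat / se_b1        # t Stat for the slope
f_stat = ess / ms_residual     # F statistic (regression df = 1)

print(f"s_u^2 = {ms_residual:.4f}, SER = {ser:.4f}, R^2 = {r_squared:.4f}")
print(f"adj. R^2 = {adj_r_squared:.4f}, t = {t_stat:.3f}, F = {f_stat:.3f}")
```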
Why use OLS?
• Most common estimation method
– Implemented in many different applications
– Most widely used methodology, thus important to understand
• OLS has very desirable properties
Under relatively general conditions, OLS estimates
– are consistent
– are unbiased
– have a tractable asymptotic distribution
Conditions listed in book
$Y_i = \beta_0 + \beta_1 X_i + u_i$, where $i = 1, \ldots, n$
• No information in 𝑿𝒊 about 𝒖𝒊
$$E[u_i] = E[u_i \mid X_i] = 0$$
Suppose not and instead $E[u_i \mid X_i] = \gamma X_i$, then we can write
$$Y_i = \beta_0 + \beta_1 X_i + u_i = \beta_0 + \beta_1 X_i + \gamma X_i + (u_i - \gamma X_i) = \beta_0 + (\beta_1 + \gamma) X_i + (u_i - \gamma X_i) = \beta_0 + \tilde\beta_1 X_i + \tilde{u}_i$$
where
$$E[\tilde{u}_i \mid X_i] = E[u_i - \gamma X_i \mid X_i] = E[u_i \mid X_i] - \gamma X_i = 0$$
So, in this case there is an alternative representation of the linear regression line with a different slope parameter, $\tilde\beta_1 = \beta_1 + \gamma$, that satisfies this assumption.
Note, the condition implies that
$$E[u_i X_i] = E\left[ X_i \, E[u_i \mid X_i] \right] = 0$$
which, given $E[u_i] = 0$, implies that
$$\mathrm{cov}(X_i, u_i) = E[u_i X_i] - E[u_i] E[X_i] = 0$$
• 𝑿𝒊, 𝒀𝒊 well-behaved random variables
– $(X_i, Y_i)$, $i = 1, \ldots, n$, are independently drawn from an identical joint distribution.
– Large outliers are unlikely.
The last two assumptions are made so that we can apply the LLN and CLT to derive properties of $\hat\beta_0$ and $\hat\beta_1$.
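A small simulation sketch of the "suppose not" case above. All parameter values, the normal distributions, the seed and the sample size are assumptions made for illustration. The error is constructed so that E[u|X] = γX, and the OLS slope should then land near β1 + γ rather than β1.

```python
import random


def ols_slope(x, y):
    """OLS slope s_XY / s_X^2 for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


rng = random.Random(0)                # fixed seed, arbitrary choice
beta0, beta1, gamma = 1.0, 0.5, 0.3   # hypothetical parameter values
n = 20000

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Error term with E[u | X] = gamma * X, violating the first condition.
u = [gamma * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b1_hat = ols_slope(x, y)
print(f"OLS slope = {b1_hat:.3f}; beta1 = {beta1}, beta1 + gamma = {beta1 + gamma}")
```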
Properties of $\hat\beta_0$ and $\hat\beta_1$
Properties of $\hat\beta_0$ and $\hat\beta_1$ are derived by manipulating the first-order necessary conditions for $\hat\beta_0$ and $\hat\beta_1$ derived above.
$$0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i$$
and
$$0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \hat{u}_i$$
These are sample approximations of the condition
$$E[u_i] = E[u_i X_i] = 0$$
Consistency of $\hat\beta_1$
$$0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \hat\beta_0 - \hat\beta_1 X_i)$$
$$= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \left[ (Y_i - \bar{Y}) - \hat\beta_1 (X_i - \bar{X}) \right]$$
$$= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \left[ (\beta_0 + \beta_1 X_i + u_i) - (\beta_0 + \beta_1 \bar{X} + \bar{u}) - \hat\beta_1 (X_i - \bar{X}) \right]$$
$$= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \left[ (\beta_1 - \hat\beta_1)(X_i - \bar{X}) + (u_i - \bar{u}) \right]$$
$$= (\beta_1 - \hat\beta_1) \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}).$$
As $n$ grows,
$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \xrightarrow{p} \mathrm{var}(X_i) > 0 \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) \xrightarrow{p} \mathrm{cov}(X_i, u_i) = 0,$$
so that in the limit $0 = (\beta_1 - \hat\beta_1) \, \mathrm{var}(X_i)$.
Such that
$$\beta_1 - \hat\beta_1 \xrightarrow{p} 0, \quad \text{that is} \quad \hat\beta_1 \xrightarrow{p} \beta_1$$
As the sample size $n$ gets arbitrarily large, i.e. $n \to \infty$, our estimate of the slope coefficient, $\hat\beta_1$, gets arbitrarily close to the true parameter value $\beta_1$ from the population regression line.
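Consistency can be illustrated with a simulation sketch. The data-generating process (normal X and u, the parameter values, the seed and the sample sizes) is assumed purely for illustration. Estimates from larger samples should tend to sit closer to β1, although any single run need not shrink the error monotonically.

```python
import random


def ols_slope(x, y):
    """OLS slope s_XY / s_X^2 for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


rng = random.Random(1)    # fixed seed, arbitrary choice
beta0, beta1 = 2.0, 0.5   # hypothetical true parameters

estimates = {}
for n in (10, 100, 1000, 10000):
    x = [rng.gauss(5.0, 2.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # E[u | X] = 0 by construction
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    estimates[n] = ols_slope(x, y)
    print(f"n = {n:>5}: slope estimate = {estimates[n]:.4f}")
```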
But, in real life, 𝒏 is finite
Small sample properties of OLS estimators
• Unbiasedness
On average, the OLS estimate equals the true parameter value of interest.
• Asymptotic distribution
The OLS assumptions imply we can use the CLT to derive the asymptotic normal distribution of the OLS estimates, which can be used as an approximation when 𝑛 is big.
Unbiasedness of $\hat\beta_1$
$$0 = (\beta_1 - \hat\beta_1) \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}).$$
Such that
$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$
where the last equality uses $\sum_{i=1}^{n} (X_i - \bar{X}) \bar{u} = 0$.
Taking expectations yields
$$E[\hat\beta_1] = E[\beta_1] + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right] = \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$
$$= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \, E[u_i - \bar{u} \mid X_1, \ldots, X_n]}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$
$$= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \, E[u_i - \bar{u} \mid X_i]}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$
(here is where we apply the second condition, independent draws from an identical joint distribution)
$$= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) \cdot 0}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$
(implied by the first condition, $E[u_i \mid X_i] = 0$)
$$= \beta_1$$
Even when the sample size $n$ is not large, on average our estimate of the slope coefficient, $\hat\beta_1$, will equal the true parameter value $\beta_1$ from the population regression line. $E[\hat\beta_1] = \beta_1$. Thus, $\hat\beta_1$ is unbiased.
Note that the first condition, which implies that
$$E[u_i - \bar{u} \mid X_i] = 0,$$
is crucial for OLS to be unbiased. If this condition is not true, the average OLS estimate will deviate from the true parameter value.
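Unbiasedness is a statement about the average over repeated samples, which a Monte Carlo sketch can illustrate. The sample size of 10, the distributions, the parameter values and the seed are all assumptions made for illustration. Individual estimates vary a lot, but their average should be close to β1.

```python
import random


def ols_slope(x, y):
    """OLS slope s_XY / s_X^2 for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


rng = random.Random(2)    # fixed seed, arbitrary choice
beta0, beta1 = 2.0, 0.5   # hypothetical true parameters
n, reps = 10, 5000        # small sample, many repeated samples

slopes = []
for _ in range(reps):
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # E[u | X] = 0 by construction
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / reps
print(f"average of {reps} slope estimates with n = {n}: {mean_slope:.4f}")
print(f"smallest and largest single estimate: {min(slopes):.3f}, {max(slopes):.3f}")
```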
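Asymptotic distribution of $\hat\beta_1$
$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$
(see the unbiasedness derivation above)
3 steps to deriving the asymptotic distribution
1. Apply CLT to numerator
2. Apply LLN to denominator
3. Combine using Slutsky’s theorem (S&W page 676)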
Apply CLT to $\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i$
• Define random variable
$$v_i = (X_i - \bar{X}) u_i$$
• Condition 1 ($E[u_i \mid X_i] = 0$) implies $E[v_i] = 0$
• Conditions 2 and 3 (i.i.d. draws, large outliers unlikely) imply that
– $\mathrm{var}(v_i)$ exists and is finite.
– CLT applies to the sample mean of $v_i$,
$$\bar{v} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$
• Note that: $\hat\beta_1 - \beta_1 = \bar{v} / s_X^2$
Apply the Central Limit Theorem
$$Z = \frac{\bar{v} - E[v_i]}{\sqrt{\mathrm{var}(v_i)/n}} = \frac{\bar{v}}{\sqrt{\mathrm{var}(v_i)/n}} \xrightarrow{d} N(0, 1)$$
Such that
$$\bar{v} \xrightarrow{d} N\left( 0, \mathrm{var}(v_i)/n \right)$$
Thus, the numerator of $\hat\beta_1 - \beta_1 = \bar{v} / s_X^2$ has an asymptotic distribution that is normal with a mean equal to zero.
Apply LLN to $s_X^2$
• Conditions 2 and 3 (i.i.d. draws, large outliers unlikely) imply that we can apply the Law of Large Numbers to $s_X^2$ and that $s_X^2$ converges in probability to the variance of $X_i$, i.e. to $\mathrm{var}(X_i)$.
• Thus
$$s_X^2 \xrightarrow{p} \mathrm{var}(X_i)$$
• Now that we know the asymptotic behavior of the numerator and denominator of $\hat\beta_1 - \beta_1 = \bar{v} / s_X^2$, the only thing left is to combine them.
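Apply Slutsky’s theorem (S&W page 676)
Slutsky’s theorem implies that we can combine the asymptotic properties of
$$\bar{v} \xrightarrow{d} N\left( 0, \mathrm{var}(v_i)/n \right)$$
and
$$s_X^2 \xrightarrow{p} \mathrm{var}(X_i)$$
such that
$$\hat\beta_1 - \beta_1 = \frac{\bar{v}}{s_X^2} \xrightarrow{d} N\left( 0, \frac{\mathrm{var}(v_i)/n}{[\mathrm{var}(X_i)]^2} \right)$$
and thus
$$\hat\beta_1 = \beta_1 + \frac{\bar{v}}{s_X^2} \xrightarrow{d} N\left( \beta_1, \frac{\mathrm{var}(v_i)/n}{[\mathrm{var}(X_i)]^2} \right)$$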
$\hat\beta_1$ has tractable asymptotic distribution
Thus, as the sample size $n$ gets large, the estimated slope coefficient has approximately a normal distribution, such that
$$\hat\beta_1 \sim N\left( \mu_{\hat\beta_1}, \sigma^2_{\hat\beta_1} \right)$$
where
$$\mu_{\hat\beta_1} = \beta_1, \qquad \sigma^2_{\hat\beta_1} = \frac{\mathrm{var}(v_i)/n}{[\mathrm{var}(X_i)]^2} = \frac{1}{n} \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}$$
Remember: $E[\hat\beta_1] = \beta_1$. Thus, $\hat\beta_1$ is unbiased.
Remember: $\sigma^2_{\hat\beta_1} \to 0$ as $n \to \infty$. Thus, $\hat\beta_1$ is consistent.
The term $\mathrm{var}(v_i) / [\mathrm{var}(X_i)]^2$ is like a noise-to-signal ratio.
– The numerator is related to the variance of the residual.
– The denominator is related to the variance of the explanatory variable.
– The larger the variation in the explanatory variable relative to the residual, the more accurate the OLS estimate.
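As a rough check of the variance formula, the Monte Carlo sketch below compares the variance of simulated slope estimates with (1/n) var(v_i)/[var(X_i)]². Assumptions made for illustration: X and u are independent normals with the stated standard deviations, so that var(v_i) = var(X_i) var(u_i); the parameter values, seed, sample size and number of replications are arbitrary. The two numbers should be of similar size, not identical.

```python
import random


def ols_slope(x, y):
    """OLS slope s_XY / s_X^2 for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


rng = random.Random(3)       # fixed seed, arbitrary choice
beta0, beta1 = 1.0, 0.5      # hypothetical true parameters
sigma_x, sigma_u = 2.0, 1.0  # hypothetical standard deviations of X and u
n, reps = 200, 2000

slopes = []
for _ in range(reps):
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    u = [rng.gauss(0.0, sigma_u) for _ in range(n)]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    slopes.append(ols_slope(x, y))

# Variance of the slope estimates across the simulated samples.
mean_slope = sum(slopes) / reps
mc_var = sum((b - mean_slope) ** 2 for b in slopes) / reps

# Formula from the slides, with var(v_i) = var(X_i) * var(u_i) under independence.
var_v = sigma_x ** 2 * sigma_u ** 2
formula_var = var_v / (n * (sigma_x ** 2) ** 2)

print(f"Monte Carlo variance = {mc_var:.6f}, formula = {formula_var:.6f}")
```

Raising `sigma_x` while holding `sigma_u` fixed lowers both numbers, which is the noise-to-signal reading of the formula.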
Virtues of asymptotic normality
• Asymptotic normality of OLS coefficients
allows us to
– Do hypothesis tests
– Calculate confidence intervals
• Simple generalizations of same techniques
applied to the population mean.
• Chapter 5! Next week.
Summary
• Linear regression model
• OLS estimators
• $R^2$ and $SER$
• OLS conditions
• Consistency of $\hat\beta_1$
• Unbiasedness of $\hat\beta_1$
• Asymptotic distribution of $\hat\beta_1$
• Even-numbered problems: 4.2, 4.4, 4.6, 4.10, 4.12, 4.14
• Study STATA tutorial
• Empirical exercises next week.