2. Linear regression with multiple regressors
Aim of this section:
• Introduction of the multiple regression model
• OLS estimation in multiple regression
• Measures-of-fit in multiple regression
• Assumptions in the multiple regression model
• Violations of the assumptions (omitted-variable bias, multicollinearity, heteroskedasticity, autocorrelation)
5
2.1. The multiple regression model
Intuition:
• A regression model specifies a functional (parametric) relationship between a dependent (endogenous) variable Y and a set of k independent (exogenous) regressors X1, X2, . . . , Xk
• In a first step, we consider the linear multiple regression model
6
Definition 2.1: (Multiple linear regression model)
The multiple (linear) regression model is given by
Yi = β0 + β1 ·X1i + β2 ·X2i + . . . + βk ·Xki + ui, (2.1)
i = 1, . . . , n, where
• Yi is the ith observation on the dependent variable,
• X1i, X2i, . . . , Xki are the ith observations on each of the k regressors,
• ui is the stochastic error term.
• The population regression line is the relationship that holds between Y and the X's on average:
E(Yi|X1i = x1, X2i = x2, . . . , Xki = xk) = β0+β1x1+. . .+βkxk.
7
Meaning of the coefficients:
• The intercept β0 is the expected value of Yi (for all i = 1, . . . , n) when all X-regressors equal 0
• β1, . . . , βk are the slope coefficients on the respective regressors X1, . . . , Xk
• β1, for example, is the expected change in Yi resulting from changing X1i by one unit, holding constant X2i, . . . , Xki (and analogously β2, . . . , βk)
Definition 2.2: (Homoskedasticity, Heteroskedasticity)
The error term ui is called homoskedastic if the conditional variance of ui given X1i, . . . , Xki, Var(ui|X1i, . . . , Xki), is constant for i = 1, . . . , n and does not depend on the values of X1i, . . . , Xki. Otherwise, the error term is called heteroskedastic.
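Definition 2.2 can be made concrete with a small simulation. The sketch below (illustrative only; the variable names and numbers are our own, not from the lecture) draws one homoskedastic and one heteroskedastic error series and compares their spread for small versus large regressor values:

```python
import numpy as np

# Illustrative sketch of Definition 2.2 on simulated data.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1.0, 10.0, size=n)

# Homoskedastic: Var(u|X) is constant (here 4) and does not depend on x.
u_homo = rng.normal(0.0, 2.0, size=n)

# Heteroskedastic: the conditional standard deviation grows with x,
# so Var(u|X) depends on the value of the regressor.
u_hetero = rng.normal(0.0, 0.5 * x)

# Compare the error spread for small vs. large x values.
small, large = x < 5.0, x >= 5.0
print(u_homo[small].std(), u_homo[large].std())      # roughly equal
print(u_hetero[small].std(), u_hetero[large].std())  # clearly different
```

For the homoskedastic series the two sample standard deviations are close; for the heteroskedastic series the spread among observations with large x is markedly bigger.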
8
Example 1: (Student performance)
• Regression of student performance (Y ) in n = 420 US districts on distinct school characteristics (factors)
• Yi: average test score in the ith district (TEST SCORE)
• X1i: average class size in the ith district (measured by the student-teacher ratio, STR)
• X2i: percentage of English learners in the ith district (PCTEL)
• Expected signs of the coefficients: β1 < 0, β2 < 0
9
Example 2: (House prices)
• Regression of house prices (Y ) recorded for n = 546 houses sold in Windsor (Canada) on distinct housing characteristics
• Yi: sale price (in Canadian dollars) of the ith house (SALEPRICE)
• X1i: lot size (in square feet) of the ith property (LOTSIZE)
• X2i: number of bedrooms in the ith house (BEDROOMS)
• X3i: number of bathrooms in the ith house (BATHROOMS)
• X4i: number of storeys (excluding the basement) in the ith house (STOREYS)
• Expected signs of the coefficients: β1, β2, β3, β4 > 0
10
2.2. The OLS estimator in multiple regression
Now:
• Estimation of the coefficients β0, β1, . . . , βk in the multiple regression model on the basis of n observations by applying the Ordinary Least Squares (OLS) technique
Idea:
• Let b0, b1, . . . , bk be estimators of β0, β1, . . . , βk
• We can predict Yi by b0 + b1X1i + . . . + bkXki
• The prediction error is Yi − b0 − b1X1i − . . .− bkXki
11
Idea: [continued]
• The sum of the squared prediction errors over all n observations is

∑_{i=1}^{n} (Yi − b0 − b1X1i − . . . − bkXki)²    (2.2)
Definition 2.3: (OLS estimators, predicted values, residuals)
The OLS estimators β̂0, β̂1, . . . , β̂k are the values of b0, b1, . . . , bk that minimize the sum of squared prediction errors (2.2). The OLS predicted values Ŷi and residuals ûi (for i = 1, . . . , n) are

Ŷi = β̂0 + β̂1X1i + . . . + β̂kXki    (2.3)

and

ûi = Yi − Ŷi.    (2.4)
12
Remarks:
• The OLS estimators β̂0, β̂1, . . . , β̂k and the residuals ûi are computed from a sample of n observations of (X1i, . . . , Xki, Yi) for i = 1, . . . , n
• They are estimators of the unknown true population coefficients β0, β1, . . . , βk and of the errors ui
• There are closed-form formulas for calculating the OLS estimates from the data (see the lectures Econometrics I+II)
• In this lecture, we use the software package EViews
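The closed-form solution mentioned above solves the normal equations (X'X)β̂ = X'Y. A minimal numpy sketch on simulated data (the coefficient values below are arbitrary illustration, not the lecture's datasets):

```python
import numpy as np

# Hedged sketch of the closed-form OLS solution on simulated data.
rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta_true = np.array([686.0, -1.1, -0.65])                  # arbitrary illustrative values
y = X @ beta_true + rng.normal(0.0, 5.0, size=n)

# Solve the normal equations (X'X) beta = X'y; lstsq is numerically
# safer than forming an explicit inverse of X'X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fitted = X @ beta_hat    # predicted values, eq. (2.3)
residuals = y - y_fitted   # OLS residuals, eq. (2.4)
print(beta_hat)
```

A by-product worth noting: the OLS residuals are exactly orthogonal to every column of X, which is the first-order condition of the minimization in Definition 2.3.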
13
Regression estimation results (EViews) for the student-performance dataset
14
Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 07/02/12  Time: 16:29
Sample: 1 420
Included observations: 420

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           686.0322      7.411312     92.56555      0.0000
STR         -1.101296     0.380278     -2.896026     0.0040
PCTEL       -0.649777     0.039343     -16.51588     0.0000

R-squared            0.426431    Mean dependent var     654.1565
Adjusted R-squared   0.423680    S.D. dependent var     19.05335
S.E. of regression   14.46448    Akaike info criterion  8.188387
Sum squared resid    87245.29    Schwarz criterion      8.217246
Log likelihood       -1716.561   Hannan-Quinn criter.   8.199793
F-statistic          155.0136    Durbin-Watson stat     0.685575
Prob(F-statistic)    0.000000
Predicted values Ŷi and residuals ûi for the student-performance dataset
15
[Figure: Actual, Fitted, and Residual values plotted over the 420 observations]
Regression estimation results (EViews) for the house-prices dataset
16
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 07/02/12  Time: 16:50
Sample: 1 546
Included observations: 546

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            -4009.550     3603.109     -1.112803     0.2663
LOTSIZE      5.429174      0.369250     14.70325      0.0000
BEDROOMS     2824.614      1214.808     2.325153      0.0204
BATHROOMS    17105.17      1734.434     9.862107      0.0000
STOREYS      7634.897      1007.974     7.574494      0.0000

R-squared            0.535547    Mean dependent var     68121.60
Adjusted R-squared   0.532113    S.D. dependent var     26702.67
S.E. of regression   18265.23    Akaike info criterion  22.47250
Sum squared resid    1.80E+11    Schwarz criterion      22.51190
Log likelihood       -6129.993   Hannan-Quinn criter.   22.48790
F-statistic          155.9529    Durbin-Watson stat     1.482942
Prob(F-statistic)    0.000000
Predicted values Ŷi and residuals ûi for the house-prices dataset
17
[Figure: Actual, Fitted, and Residual values plotted over the 546 observations]
OLS assumptions in the multiple regression model (2.1):
1. ui has conditional mean zero given X1i, X2i, . . . , Xki:
E(ui|X1i, X2i, . . . , Xki) = 0
2. (X1i, X2i, . . . , Xki, Yi), i = 1, . . . , n, are independently and identically distributed (i.i.d.) draws from their joint distribution
3. Large outliers are unlikely: X1i, X2i, . . . , Xki and Yi have nonzero finite fourth moments
4. There is no perfect multicollinearity
Remarks:
• Note that we do not assume any specific parametric distribution for the ui
• The OLS assumptions imply specific distributional results
18
Theorem 2.4: (Unbiasedness, consistency, normality)
Given the OLS assumptions, the following properties of the OLS estimators β̂0, β̂1, . . . , β̂k hold:
1. β̂0, β̂1, . . . , β̂k are unbiased estimators of β0, . . . , βk.
2. β̂0, β̂1, . . . , β̂k are consistent estimators of β0, . . . , βk. (Convergence in probability)
3. In large samples β̂0, β̂1, . . . , β̂k are jointly normally distributed and each single OLS estimator β̂j, j = 0, . . . , k, is normally distributed with mean βj and variance σ²_β̂j, that is

β̂j ∼ N(βj, σ²_β̂j).
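Theorem 2.4 can be visualized by a Monte Carlo experiment: re-estimate the same simple model on many independently drawn samples and inspect the distribution of the slope estimates. The sketch below (our own simulated setup, not from the lecture) illustrates unbiasedness:

```python
import numpy as np

# Hedged Monte Carlo sketch of Theorem 2.4: across repeated samples the
# OLS slope estimates center on the true beta1 (unbiasedness) and their
# spread shrinks at rate 1/sqrt(n).
rng = np.random.default_rng(2)
n, reps, beta0, beta1 = 100, 2000, 1.0, 2.0

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(estimates.mean())  # close to beta1 = 2
print(estimates.std())   # roughly sigma_u / sqrt(n) = 0.1 here
```

A histogram of `estimates` would, by point 3 of the theorem, look approximately normal around βj = 2.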
19
Remarks:
• In general, the OLS estimators are correlated
• This correlation among β̂0, β̂1, . . . , β̂k arises from the correlation among the regressors X1, . . . , Xk
• The sampling distribution of the OLS estimators will become relevant in Section 3 (hypothesis testing, confidence intervals)
20
2.3. Measures-of-fit in multiple regression
Now:
• Three well-known summary statistics that measure how well the OLS estimates fit the data
Standard error of regression (SER):
• The SER estimates the standard deviation of the error term ui (under the assumption of homoskedasticity):

SER = √[ (1/(n − k − 1)) · ∑_{i=1}^{n} ûi² ]
21
Standard error of regression: [continued]
• We denote the sum of squared residuals by SSR ≡ ∑_{i=1}^{n} ûi², so that

SER = √[ SSR / (n − k − 1) ]
• Given the OLS assumptions and homoskedasticity, the squared SER, (SER)², is an unbiased estimator of the unknown constant variance of the ui
• SER is a measure of the spread of the distribution of Yi around the population regression line
• Both measures, SER and SSR, are reported in the EViews regression output
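The SSR and SER formulas above are straightforward to compute from the residuals. A hedged sketch on simulated data (the error standard deviation 14.5 is chosen here only to echo the magnitude in the student-performance output; the data are synthetic):

```python
import numpy as np

# Illustrative computation of SSR and SER from OLS residuals.
rng = np.random.default_rng(3)
n, k = 420, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([650.0, -1.0, -0.6]) + rng.normal(0.0, 14.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

SSR = np.sum(residuals ** 2)
SER = np.sqrt(SSR / (n - k - 1))  # degrees-of-freedom correction n - k - 1
print(SER)                        # estimates the true error std dev (14.5 here)
```

The divisor n − k − 1 (rather than n) is what makes (SER)² unbiased under homoskedasticity: k + 1 degrees of freedom are used up estimating β0, . . . , βk.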
22
R2:
• The R² is the fraction of the sample variance of the Yi explained by the regressors
• Equivalently, the R² is 1 minus the fraction of the variance of the Yi not explained by the regressors (i.e. explained by the residuals)
• Denoting the explained sum of squares (ESS) and the total sum of squares (TSS) by

ESS = ∑_{i=1}^{n} (Ŷi − Ȳ)²   and   TSS = ∑_{i=1}^{n} (Yi − Ȳ)²,

respectively, we define the R² as

R² = ESS/TSS = 1 − SSR/TSS
23
R2: [continued]
• In multiple regression, the R² increases whenever an additional regressor Xk+1 is added to the regression model, unless the estimated coefficient β̂k+1 is exactly equal to zero
• Since in practice it is extremely unusual to have exactly β̂k+1 = 0, the R² generally increases (and never decreases) when a new regressor is added to the regression model
−→ An increase in the R² due to the inclusion of a new regressor does not necessarily indicate an actually improved fit of the model
24
Adjusted R2:
• The adjusted R² (in symbols: R̄²) deflates the conventional R²:

R̄² = 1 − [(n − 1)/(n − k − 1)] · SSR/TSS

• It is always true that R̄² < R² (why?)
• When adding a new regressor Xk+1 to the model, the R̄² can increase or decrease (why?)
• The R̄² can be negative (why?)
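Both measures follow directly from the sums of squares. A hedged sketch on simulated data (illustrative values only) that also exhibits the first property above, R̄² < R²:

```python
import numpy as np

# Illustrative computation of R^2 and adjusted R^2 from TSS and SSR.
rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.5, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

TSS = np.sum((y - y.mean()) ** 2)
SSR = np.sum(residuals ** 2)

R2 = 1.0 - SSR / TSS
# The factor (n - 1)/(n - k - 1) > 1 deflates R^2, penalizing extra regressors.
R2_adj = 1.0 - (n - 1) / (n - k - 1) * SSR / TSS
print(R2, R2_adj)
```

Since (n − 1)/(n − k − 1) > 1 whenever k ≥ 1, the adjusted version is always strictly smaller, and it can turn negative when SSR/TSS is close to 1 (a very poor fit).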
25
2.4. Omitted-variable bias
Now:
• Discussion of a phenomenon that implies violation of the first OLS assumption on Slide 18
• This issue is known under the phrasing omitted-variable bias and is extremely relevant in practice
• Although theoretically easy to grasp, avoiding this specification problem turns out to be a nontrivial task in many empirical applications
26
Definition 2.5: (Omitted-variable bias)
Consider the multiple regression model in Definition 2.1 on Slide 7. Omitted-variable bias is the bias in the OLS estimator β̂j of the coefficient βj (for j = 1, . . . , k) that arises when the associated regressor Xj is correlated with an omitted variable. More precisely, for omitted-variable bias to occur, the following two conditions must hold:
1. Xj is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent vari-able Y .
27
Example:
• Consider the house-prices dataset (Slides 16, 17)
• Using the entire set of regressors, we obtain the OLS estimate β̂2 = 2824.61 for the BEDROOMS-coefficient
• The correlation coefficients between the regressors are as follows:
28
            BEDROOMS   BATHROOMS   LOTSIZE    STOREYS
BEDROOMS    1.000000   0.373769    0.151851   0.407974
BATHROOMS   0.373769   1.000000    0.193833   0.324066
LOTSIZE     0.151851   0.193833    1.000000   0.083675
STOREYS     0.407974   0.324066    0.083675   1.000000
Example: [continued]
• There is positive (significant) correlation between the variable BEDROOMS and all other regressors
• Excluding the other variables from the regression yields the following OLS estimates:
−→ The alternative OLS estimates of the BEDROOMS-coefficient differ substantially
29
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 14/02/12  Time: 16:10
Sample: 1 546
Included observations: 546

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           28773.43      4413.753     6.519040      0.0000
BEDROOMS    13269.98      1444.598     9.185932      0.0000

R-squared            0.134284    Mean dependent var     68121.60
Adjusted R-squared   0.132692    S.D. dependent var     26702.67
S.E. of regression   24868.03    Akaike info criterion  23.08421
Sum squared resid    3.36E+11    Schwarz criterion      23.09997
Log likelihood       -6299.989   Hannan-Quinn criter.   23.09037
F-statistic          84.38135    Durbin-Watson stat     0.811875
Prob(F-statistic)    0.000000
Intuitive explanation of the omitted-variable bias:
• Consider the variable LOTSIZE as omitted
• LOTSIZE is an important variable for explaining SALEPRICE
• If we omit LOTSIZE in the regression, it will try to enter in the only way it can, namely through its positive correlation with the included variable BEDROOMS
−→ The coefficient on BEDROOMS will confound the effect of BEDROOMS and LOTSIZE on SALEPRICE
30
More formal explanation:
• Omitted-variable bias means that the first OLS assumption on Slide 18 is violated
• Reasoning:
  In the multiple regression model the error term ui represents all factors other than the included regressors X1, . . . , Xk that are determinants of Yi
  If an omitted variable is correlated with at least one of the included regressors X1, . . . , Xk, then ui (which contains this factor) is correlated with the set of regressors
−→ This implies that

E(ui|X1i, . . . , Xki) ≠ 0
31
Important result:
• In the case of omitted-variable bias
  – the OLS estimators on the corresponding included regressors are biased in finite samples
  – this bias does not vanish in large samples
−→ the OLS estimators are inconsistent
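This result is easy to reproduce in a simulation. In the hedged sketch below (our own synthetic setup, mirroring the BEDROOMS/LOTSIZE intuition), x2 determines y and is correlated with x1, so omitting x2 biases the coefficient on x1; both conditions of Definition 2.5 hold:

```python
import numpy as np

# Hedged simulation of omitted-variable bias: x2 affects y and is
# correlated with x1, so dropping x2 from the regression biases the
# estimated coefficient on x1.
rng = np.random.default_rng(5)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)      # condition 1: correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # condition 2: x2 determines y

X_full = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])  # x2 omitted

b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_full[1])   # close to the true value 2
print(b_short[1])  # biased toward 2 + 3 * 0.8 = 4.4
```

Note that the bias (here roughly +2.4) does not shrink as n grows: increasing n only makes the short-regression estimate converge more tightly to the wrong value, which is exactly the inconsistency stated above.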
Solutions to omitted-variable bias:
• To be discussed in Section 5
32
2.5. Multicollinearity
Definition 2.6: (Perfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be perfectly multicollinear if one of the regressors is a perfect linear function of the other regressors.
Remarks:
• Under perfect multicollinearity the OLS estimates cannot be calculated due to division by zero in the OLS formulas
• Perfect multicollinearity often reflects a logical mistake in choosing the regressors or some unrecognized feature in the data set
33
Example: (Dummy variable trap)
• Consider the student-performance dataset
• Suppose we partition the school districts into the 3 categories(1) rural, (2) suburban, (3) urban
• We represent the categories by the dummy regressors
RURALi = 1 if district i is rural, and 0 otherwise,

and by SUBURBANi and URBANi analogously defined
• Since each district belongs to one and only one category, we have for each district i:
RURALi + SUBURBANi + URBANi = 1
34
Example: [continued]
• Now, let us define the constant regressor X0 associated with the intercept coefficient β0 in the multiple regression model on Slide 7 by

X0i ≡ 1 for i = 1, . . . , n
• Then, for i = 1, . . . , n, the following relationship holds among the regressors:
X0i = RURALi + SUBURBANi + URBANi
−→ Perfect multicollinearity
• To estimate the regression we must exclude either one of the dummy regressors or the constant regressor X0 (the intercept β0) from the regression
35
Theorem 2.7: (Dummy variable trap)
Let there be G different categories in the data set represented by G dummy regressors. If
1. each observation i falls into one and only one category,
2. there is an intercept (constant regressor) in the regression,
3. all G dummy regressors are included as regressors,
then regression estimation fails because of perfect multicollinearity.
Usual remedy:
• Exclude one of the dummy regressors (G − 1 dummy regressors are sufficient)
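The dummy variable trap can be demonstrated numerically. In the hedged sketch below (a toy dataset with G = 3 categories, not the lecture's districts), including the intercept plus all three dummies makes the regressor matrix rank deficient, so X'X is singular; dropping one dummy restores full column rank:

```python
import numpy as np

# Toy illustration of Theorem 2.7: intercept + all G dummies is rank deficient.
rng = np.random.default_rng(6)
n = 9
category = np.repeat([0, 1, 2], 3)               # each observation in exactly one category
dummies = np.eye(3)[category]                    # G = 3 dummy columns
X_trap = np.column_stack([np.ones(n), dummies])  # intercept + all 3 dummies

# The columns are linearly dependent: the intercept column equals the
# sum of the three dummy columns, so the rank is 3, not 4.
print(np.linalg.matrix_rank(X_trap))             # 3, not 4

# Usual remedy: drop one dummy (G - 1 dummies suffice).
X_ok = np.column_stack([np.ones(n), dummies[:, 1:]])
print(np.linalg.matrix_rank(X_ok))               # 3: full column rank
```

With `X_trap`, solving the normal equations would require inverting a singular X'X, which is the "division by zero" referred to in the remarks on Slide 33.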
36
Definition 2.8: (Imperfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be imperfectly multicollinear if two or more of the regressors are highly correlated in the sense that there is a linear function of the regressors that is highly correlated with another regressor.
Remarks:
• Imperfect multicollinearity does not pose any (numeric) problems in calculating OLS estimates
• However, if regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated
37
Remarks: [continued]
• Techniques for identifying and mitigating imperfect multicollinearity are presented in econometric textbooks (e.g. Hill et al., 2010, pp. 155-156)
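The imprecision caused by imperfect multicollinearity can be seen in a small Monte Carlo sketch (our own simulated setup): the OLS estimates remain computable for any correlation below 1, but the slope estimates scatter far more widely across samples when the regressors are highly correlated.

```python
import numpy as np

# Hedged sketch: highly correlated regressors inflate the sampling
# variability of the individual slope estimates, though OLS still runs.
rng = np.random.default_rng(7)
n, reps = 50, 1000

def slope_spread(rho):
    """Std dev of the estimated coefficient on x1 across simulated samples."""
    ests = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)  # corr(x1, x2) = rho
        y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        ests[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return ests.std()

low = slope_spread(0.1)    # low correlation: relatively precise
high = slope_spread(0.95)  # high correlation: much larger spread
print(low, high)
```

Intuitively, when x1 and x2 move almost in lockstep, the data carry little information for separating their individual effects, so each coefficient is estimated imprecisely even though their combined effect is not.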
38