Multicollinearity, Model Specification: Precision and Bias

Introduction · Multicollinearity and Micronumerosity · Model Specification
Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009
February 9, 2009
Walter Sosa-Escudero Multicollinearity, Model Specification: Precision and Bias
The Classical Linear Model:

1. Linearity: $Y = X\beta + u$.
2. Strict exogeneity: $E(u|X) = 0$.
3. No multicollinearity: $\rho(X) = K$, w.p.1.
4. No heteroskedasticity / serial correlation: $V(u|X) = \sigma^2 I_n$.

Gauss/Markov: $\hat\beta = (X'X)^{-1}X'Y$ is best linear unbiased.

This does not mean that $\hat\beta$ is good. It is interesting to explore what things make it worse: less precise (higher variance) and more biased.
Multicollinearity, Micronumerosity and Imprecision

A crucial assumption is the no-multicollinearity assumption, $\rho(X) = K$, which guarantees that $(X'X)$ is invertible, so the OLS problem has a unique solution.

Any violation of this assumption, i.e. $\rho(X) < K$, will be referred to as exact multicollinearity, and it eliminates the possibility of finding unique OLS estimates.

High multicollinearity is a rather contradictory notion where $\rho(X) = K$, but the correlation among variables, while not exact, is high. In such a case, no classical assumptions are violated, so the Gauss/Markov result holds.
The following result suggests why practitioners worry about high multicollinearity.

Result:
$$V(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\, S_{jj}}$$
where $R_j^2$ is the $R^2$ coefficient of regressing $X_j$ on all the other explanatory variables, and $S_{jj} = \sum_{i=1}^n (X_{ji} - \bar X_j)^2$.
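The result is easy to verify numerically. The sketch below (a minimal check on invented data, using only numpy) computes $V(\hat\beta_j)$ both directly as $\sigma^2[(X'X)^{-1}]_{jj}$ and through $\sigma^2/((1-R_j^2)S_{jj})$; the two routes give the same number.

```python
import numpy as np

# Minimal numerical check of V(beta_j) = sigma^2 / ((1 - R2_j) * S_jj).
# All data below are invented for illustration.
rng = np.random.default_rng(0)
n = 500
sigma2 = 4.0  # assumed (known) error variance

# A constant plus two correlated regressors.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Direct route: V(beta_1) = sigma^2 * [(X'X)^{-1}]_{11} (index 1 = slope on x1).
var_direct = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Formula route: regress x1 on the remaining regressors (constant and x2)
# to get R2_1, and compute S_11 = sum_i (x1_i - xbar_1)^2.
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
S_11 = np.sum((x1 - x1.mean()) ** 2)
R2_1 = 1 - (resid @ resid) / S_11
var_formula = sigma2 / ((1 - R2_1) * S_11)

print(var_direct, var_formula)  # the two routes agree
```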
Proof: By the FWL theorem,
$$\hat\beta_j = \frac{\sum_{i=1}^n \tilde X_{ji} Y_i}{\sum_{i=1}^n \tilde X_{ji}^2}$$
and
$$V(\hat\beta_j) = \frac{\sigma^2}{\sum_{i=1}^n \tilde X_{ji}^2} = \frac{\sigma^2}{\frac{\sum_{i=1}^n \tilde X_{ji}^2}{S_{jj}}\, S_{jj}}$$
where $\tilde X_j \equiv M_j X_j$ and $M_j$ is the matrix that gets residuals of regressing $X_j$ on all the other explanatory variables in the model. The result follows by noting
$$R_j^2 = 1 - \frac{\sum_{i=1}^n \tilde X_{ji}^2}{S_{jj}} = 1 - \frac{\sum_{i=1}^n \tilde X_{ji}^2}{\sum_{i=1}^n (X_{ji} - \bar X_j)^2}$$
Factors affecting $V(\hat\beta_j)$

Go back to our result:
$$V(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\, S_{jj}} = \frac{\sigma^2}{n} \cdot \frac{1}{(1 - R_j^2)(S_{jj}/n)}$$

Later on we will see that $S_{jj}/n$ should be a rather stable magnitude. So there are three main factors that contribute to the variance:

1. $\sigma^2$, the error variance.
2. $n$, the sample size.
3. $R_j^2$, the correlation between $X_j$ and all the other variables.

It is important to note that high multicollinearity affects the variance in the same manner as a small number of observations (micronumerosity).
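A quick back-of-the-envelope sketch of these three factors (pure Python, illustrative numbers only): doubling $\sigma^2$ doubles the variance, doubling $n$ halves it, and $R_j^2 = 0.5$ doubles it, exactly as if the sample size had been cut in half.

```python
# Illustrative sketch of V(beta_j) = sigma^2 / ((1 - R2_j) * S_jj),
# writing S_jj = n * s, with s = S_jj / n held fixed (the "stable magnitude").
def slope_variance(sigma2, n, R2_j, s=1.0):
    """Variance of beta_j implied by the formula."""
    return sigma2 / ((1 - R2_j) * n * s)

base = slope_variance(sigma2=1.0, n=100, R2_j=0.0)

# Doubling the error variance doubles V(beta_j)...
assert slope_variance(2.0, 100, 0.0) == 2 * base
# ...doubling the sample size halves it...
assert slope_variance(1.0, 200, 0.0) == base / 2
# ...and R2_j = 0.5 doubles it, exactly like halving the sample size.
assert slope_variance(1.0, 100, 0.5) == slope_variance(1.0, 50, 0.0)
```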
It is interesting to remark that under high multicollinearity there may be situations with really low $t$ significance statistics together with a high $R^2$ and a high global-significance $F$ statistic.

We have already seen that high multicollinearity induces high variance, and hence is compatible with low $t$'s.

$R^2$ is related to the distance between $Y$ and the span of $X$, which does not depend on the degree of correlation among its components.

Check carefully what the significance $t$'s mean and what the global-significance $F$ means.
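A simulated illustration of this point (invented data; numpy only): with two nearly identical regressors the overall fit is excellent, yet both slope standard errors are large, so the individual $t$'s can easily come out insignificant.

```python
import numpy as np

# Illustrative simulation (all numbers invented): two nearly collinear
# regressors give an excellent overall fit, yet each slope has a large
# standard error, so individual t statistics can easily be insignificant.
rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
z = x + rng.normal(scale=0.02, size=n)      # almost a copy of x
Y = x + z + rng.normal(scale=0.5, size=n)   # true slopes are both 1

X = np.column_stack([np.ones(n), x, z])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta
s2 = resid @ resid / (n - 3)                        # error variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # classical std. errors
t_stats = beta / se
R2 = 1 - resid @ resid / np.sum((Y - Y.mean()) ** 2)

print(R2)        # high: the overall fit is excellent
print(se[1:])    # but both slope standard errors are large
```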
Model a) High multicollinearity
cor(x,y)=0.998983
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04171 0.04426 0.943 0.348
y 0.57840 0.83608 0.692 0.491
x 1.33508 0.83893 1.591 0.115
Residual standard error: 0.4415 on 97 degrees of freedom
Multiple R-squared: 0.9635, Adjusted R-squared: 0.9628
F-statistic: 1282 on 2 and 97 DF, p-value: < 2.2e-16
Model b) Low multicollinearity
cor(x,y1)= 0.4047114
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0009127 0.0465794 -0.02 0.984
y1 0.9773821 0.0220314 44.36
Specification errors, bias and imprecision

So far we have assumed that our linear model $Y = X\beta + u$ is correct.

Consider the following case:
$$Y = X_1\beta_1 + X_2\beta_2 + u$$
where all classical assumptions hold. $K_1$ and $K_2$ are the numbers of columns of $X_1$ and $X_2$. Trivially, our original model corresponds to $X = [X_1\; X_2]$, with $K = K_1 + K_2$.
Consider the following scenarios regarding $\beta_2$ and the corresponding estimation strategies:

Omission of relevant variables: $\beta_2 \neq 0$, but we wrongly proceed as if $\beta_2 = 0$, that is, we regress $Y$ on $X_1$ only.

Inclusion of irrelevant variables: $\beta_2 = 0$, but we wrongly proceed as if $\beta_2$ might be $\neq 0$, that is, we regress $Y$ on $X_1$ and $X_2$ when we could have ignored $X_2$.
Biases

Let us compare results for the estimation of $\beta_1$ in the two scenarios.

I) Omission of relevant variables

First note that in this case
$$Y = X_1\beta_1 + u^*$$
with $u^* = X_2\beta_2 + u$. Let $\tilde\beta_1 = (X_1'X_1)^{-1}X_1'Y$.

It is easy to see that $\tilde\beta_1$ will be biased unless $E(X_2|X_1) = 0$. This is a really important result: not all omissions lead to biases.
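A small Monte Carlo sketch of this result (all numbers invented): here $X_2$ is relevant but generated independently of $X_1$, so $E(X_2|X_1) = 0$ and the short regression remains centered on the true $\beta_1$.

```python
import numpy as np

# Monte Carlo check: omitting a relevant but independent regressor
# does not bias the estimate of beta_1 (here beta_1 = 1, beta_2 = 3).
rng = np.random.default_rng(1)
n, reps = 200, 2000
estimates = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)              # independent of x1: E(x2|x1) = 0
    y = 1.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    estimates[r] = (x1 @ y) / (x1 @ x1)  # short regression: y on x1 only

print(estimates.mean())  # centered on the true beta_1 = 1 despite the omission
```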
II) Inclusion of Irrelevant Variables

In this case we would estimate $\beta_1$ jointly with $\beta_2$ by regressing $Y$ on $X_1$ and $X_2$; that is, $\hat\beta_1$ is a subvector of
$$\hat\beta = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = (X'X)^{-1}X'Y$$

It is important to see that the classical assumptions hold for this model, and hence $\hat\beta_1$ will be unbiased. Why?
Variances

Let us compute the bias of $\tilde\beta_1$ explicitly:
$$\tilde\beta_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u)$$
$$E(\tilde\beta_1|X_1) = \beta_1 + \underbrace{(X_1'X_1)^{-1}X_1'E(X_2|X_1)\beta_2}_{\text{bias}}$$

From here, it is easy to check that
$$V(\tilde\beta_1|X) = \sigma^2(X_1'X_1)^{-1}$$

Using the FWL theorem,
$$V(\hat\beta_1|X) = \sigma^2(X_1'M_2X_1)^{-1}$$
with $M_2 = I - X_2(X_2'X_2)^{-1}X_2'$.
Now:
$$V(\hat\beta_1|X) - V(\tilde\beta_1|X) = \sigma^2\left[(X_1'M_2X_1)^{-1} - (X_1'X_1)^{-1}\right]$$

Aside: if $A - B$ is psd, then $B^{-1} - A^{-1}$ is psd (Greene (2000, p. 49)).

Note: $X_1'X_1 - X_1'M_2X_1 = X_1'(I - M_2)X_1 = X_1'P_2X_1$. Since $P_2$ is symmetric and idempotent, for every $c$, $c'X_1'P_2X_1c = (P_2X_1c)'(P_2X_1c) \geq 0$, so $X_1'P_2X_1$ is psd. By the aside, $V(\hat\beta_1|X) - V(\tilde\beta_1|X)$ is psd: including $X_2$ cannot reduce the variance of the estimator of $\beta_1$.
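The variance ranking can be confirmed numerically; the sketch below (invented data, taking $\sigma^2 = 1$) checks that $(X_1'M_2X_1)^{-1} - (X_1'X_1)^{-1}$ has no negative eigenvalues.

```python
import numpy as np

# Numerical check of the psd claim, with sigma^2 = 1 and invented data.
rng = np.random.default_rng(7)
n = 100
X1 = rng.normal(size=(n, 3))
X2 = 0.5 * X1[:, :2] + rng.normal(size=(n, 2))  # correlated with X1

# M2 = I - P2, the residual-maker of a regression on X2.
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
M2 = np.eye(n) - P2

diff = np.linalg.inv(X1.T @ M2 @ X1) - np.linalg.inv(X1.T @ X1)
eigs = np.linalg.eigvalsh(diff)  # eigenvalues of the symmetric difference
print(eigs.min())  # should be >= 0 up to rounding error
```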
Bias-variance trade-off

To summarize:

In practice we do not know which model holds (the large one or the small one).

The trade-off: estimating a small model (omitting variables) implies a gain in precision and a likely bias. A large model is less likely to be biased but will be less efficient.

Variable omission does not necessarily lead to biases.
Omitted Variable Bias: an example

Computer-generated data, based on Appleton, French and Vanderpump ("Ignoring a Covariate: an Example of Simpson's Paradox", The American Statistician, 50, 4, 1996).

$Y$ = risk of death.
SMOKE = consumption of cigarettes.
. reg y smoke
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 1, 98) = 194.34
Model | 7613.25147 1 7613.25147 Prob > F = 0.0000
Residual | 3839.18734 98 39.1753811 R-squared = 0.6648
-------------+------------------------------ Adj R-squared = 0.6614
Total | 11452.4388 99 115.6812 Root MSE = 6.259
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke | -1.819348 .1305081 -13.94 0.000 -2.078337 -1.560359
_cons | 158.5975 4.774249 33.22 0.000 149.1231 168.0718
------------------------------------------------------------------------------
Walter Sosa-Escudero Multicollinearity, Model Specification: Precision and Bias
IntroductionMulticollinearity and Micronumerosity
Model Specification
. reg y smoke age
Source | SS df MS Number of obs = 100
-------------+------------------------------ F( 2, 97) = 5424.58
Model | 11350.9524 2 5675.47622 Prob > F = 0.0000
Residual | 101.486373 97 1.04625126 R-squared = 0.9911
-------------+------------------------------ Adj R-squared = 0.9910
Total | 11452.4388 99 115.6812 Root MSE = 1.0229
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke | .9431267 .050902 18.53 0.000 .8421004 1.044153
age | .9804631 .0164039 59.77 0.000 .9479059 1.01302
_cons | 12.84084 2.560392 5.02 0.000 7.759169 17.92251
------------------------------------------------------------------------------
. cor y smoke age
(obs=100)
| y smoke age
-------------+---------------------------
y | 1.0000
smoke | -0.8153 1.0000
age | 0.9797 -0.9080 1.0000
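The sign flip in the two Stata regressions above can be reproduced with a simulation in the same spirit (all data-generating numbers below are invented, not those behind the Stata output): because age is negatively correlated with smoke and raises the risk of death, omitting it drives the short-regression slope on smoke negative even though the true partial effect is positive.

```python
import numpy as np

# Illustrative recreation of the sign flip: age is an omitted confounder,
# negatively correlated with smoke, so the short regression of y on smoke
# alone picks up a negative slope even though the true effect is +1.
rng = np.random.default_rng(3)
n = 100
age = rng.uniform(20, 80, size=n)
smoke = 40 - 0.5 * age + rng.normal(scale=3, size=n)  # heavy smokers are young
y = 10 + 1.0 * smoke + 1.0 * age + rng.normal(size=n)

def ols(cols, y):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

short_slope = ols([smoke], y)[1]        # omits age: absorbs the confounding
long_slope = ols([smoke, age], y)[1]    # includes age

print(short_slope, long_slope)  # short slope negative, long slope near +1
```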