Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
EndogeneityInstrumental variables (IV)
Specification Tests
Applied Econometrics (QEM)Enodogeneity & Instrumental Variablesbased on Prinicples of Econometrics
Jakub Mućk
Department of Quantitative Economics
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 1 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Outline
1 Endogeneity
2 Instrumental variables (IV)
3 Specification TestsThe weak instruments testThe Hausman TestTesting Instrument Validity
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 2 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Inconsistency of OLS
Consider standard (general) linear model:
y = α+ β1x1 + . . .+ βkxk + ε (1)
where ε ∼ N (0, σ2ε).The assumptions of OLS (ordinary least squares):
1 Linearity: the specification of (1) is correct.2 Full rank: the matrix X , i.e. X = [x1, . . . , xk ] has full column
rank (not higher than number of observation).3 Nonautocorrelation and homoscedasticity of the errorterm: E(ee′) = σ2εI .
4 Independent observations.5 Exogeneity: E (ε|x1, . . . , xk) = 0.
It is assumed that all independent variables are exogenous(assumption #5).
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 3 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Inconsistency of OLS
Endogenous variablesAn explanatory variable is said to be endogenous when it iscorrelated with error term, i.e., E (ε|x) 6= 0.
Inconsistency of OLSAn endogeneity problem leads to inconsistency of the OLS estimator.
Standard cases when explanatory variables are endogenous:1 Measurement error.2 Omitted variable bias.3 Simultaneity causality.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 4 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Inconsistency of OLS
Endogenous variablesAn explanatory variable is said to be endogenous when it iscorrelated with error term, i.e., E (ε|x) 6= 0.
Inconsistency of OLSAn endogeneity problem leads to inconsistency of the OLS estimator.
Standard cases when explanatory variables are endogenous:1 Measurement error.2 Omitted variable bias.3 Simultaneity causality.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 4 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Measurement error – exampleLet’s assume that true DGP (data generating process) for the con-sumption (c) is as follows:
c = α+ βinc∗ + ε (2)where inc∗ is the permanent income.Usually, we have data on income inc but not the permanent income.If so, we can proxy the permanent income by current income:
inc∗ = inc + η, (3)where η stands for the measurement and η ∼ N
(0, σ2η
).
The current income (inc) is proxy variable for the permanent income(inc∗).Substituting the permanent income into (2):
c = α+ β (inc + η) + ε = α+ βinc + βη + ε = α+ βinc + ν, (4)where ν = ε+ βη.The covariance between inc and error term (ν):cov(inc, ν) = E (incν) = E ((inc∗ + η)(ε+ βη)) = E
(βη2
)= σ2ηβ 6= 0.
(5)Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 5 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Omitted variable bias - example I
Labor economics: returns to education.Let’s assume that true DGP (data generating process) for the log wage(w):
w = α+ ρS + βA+ ε, (6)
where S is the highest grade of schooling completed and A is a measureof personal ability or(and) motivation.Problem: data on A are not unavailable.Consider alternative version of (7):
w = α+ ρS + η, (7)
where the error term η captures personal abilities A, i.e., η = ε+ βA.The OLS estimator of ρ can be simplified to:
ρOLS = cov(w,S)/Var(S). (8)
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 6 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Omitted variable bias - example II
Plugging true DGP for wages w:
ρOLS = cov(α+ ρS + βA+ ε,S)Var(S) , (9)
After manipulation we get:
ρOLS = 1Var(S)E [(α+ ρS + ε)S + βAS] = ρ+β cov(A,S)
Var(S)︸ ︷︷ ︸=bias
6= ρ. (10)
The OLS coefficient on schooling would be upward biased if the signsof β and cov(A,S)/Var(S) are the same.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 7 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Simultaneity causality I
Simple (Keynesian) model of consumption:
c = α+ βy + ε (11)y = c + i (12)
where c is the consumption, y is the aggregate product, i stands forthe investment and ε is the error term, i.e., ε ∼ N (0, σ2ε).In the above system we have to endogenous variables (c and y) and oneexogenous variable (i).
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 8 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Simultaneity causality II
The reduced form will be defined as model in which endogenous vari-able(s) is determined by the exogenous variables as well as the stochas-tic disturbances. In our case:
y = c + iy = α+ βy + ε+ i
(1− β)y = αi + ε
y = α
(1− β) + 1(1− β) i + 1
(1− β)ε.
The general expression of the OLS estimator of the marginal propensityto consume (β) form equation (11):
βOLS = β +∑
(y − y) ε∑(y − y)2︸ ︷︷ ︸
=0 if E(y|ε)=0
. (13)
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 9 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Simultaneity causality III
But we know that y depends on ε (see the reduced form). If so, thenthe βOLS 6= β and the OLS estimator is not consistent.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 10 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Outline
1 Endogeneity
2 Instrumental variables (IV)
3 Specification TestsThe weak instruments testThe Hausman TestTesting Instrument Validity
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 11 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– general idea
x yβ
x – the explanatory variable;y – the dependent variable;ε – the error term;z – the instrumental variable.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 12 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– general idea
x yβ
ε
x – the explanatory variable;y – the dependent variable;ε – the error term;z – the instrumental variable.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 12 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– general idea
x yβ
ε
cov(x, ε) 6= 0
x – the explanatory variable;y – the dependent variable;ε – the error term;z – the instrumental variable.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 12 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– general idea
x yβ
ε
cov(x, ε) 6= 0
zcov(z, ε) = 0
x – the explanatory variable;y – the dependent variable;ε – the error term;z – the instrumental variable.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 12 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– general idea
Consider the linear model with single explanatory variable:
y = α+ βx + ε and cov(ε|x) 6= 0. (14)
The OLS estimates of β will be inconsistent.Instrumental variable regression (IV) divides variation of the en-dogenous variable (x) in two parts:
1 a part that might be not correlated with the error term (ε),2 a part that might be correlated with the error term (ε).
It is possible due to using instrumental variable (instrument, z)which is not correlated with ε.The instrument (z) allows to identify the variation in endogenous vari-able that is not correlated with ε and, therefore, can be used to estimateβ.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 13 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The General Instrumental Variables Regression Model
More generally, the IV regression is:y = α+ β1x1 + . . .+ βkxk + βk+1w1 + . . .+ βk+rwr + ε, (15)
wherey is the dependent variable;ε is the error term. In the context of the endogeneity, it might capture omittedfactors as well as measurement error;x1, . . . , xk are k endogenous variables that can be correlated with the errorterm ε;w1, . . . , wr are r exogenous variables that are potentially not correlated withthe error term ε;z1, . . . , zm are m instrumental variables.
IdentificationThe coefficients β1, . . . , βk+r are said to be:
exactly identified if m = k;underidentified if m < k;overidentified if m > k.
The coefficients have to be exactly identified or overidentified ifwe want to apply IV regression.Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 14 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instruments Relevance and Exogenity
Two conditions for valid instruments1 Instrument Relevance
A set of instrumental variables (z1, . . . , zm) must be related to the en-dogenous explanatory variables (x1, . . . , xk). Formally,
cov(zi , xj) 6= 0.
2 Instrument ExogeneityA set of instrumental variables (z1, . . . , zm) cannot be correlated withthe error term ε. Formally,
cov(ε, zi) = 0.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 15 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The Two Stages Least Squares (TSLS) estimatorTwo Stage Least Squares (TSLS):
y = α+ β1x1 + . . .+ βkxk + βk+1w1 + . . .+ βk+rwr + ε. (16)
1 First-Stage Regression(s):Regress each of the endogenous variable (xi) on the instruments (z1, . . . , zm)as well as the exogenous variables (w1, . . . ,wr):
∀i∈1,...,k xi = π0+π1z1+. . .+πmzm+πm+1w1+. . .+πm+rwr +η, (17)
Based on the OLS estimates calculate predicted values, i.e., xi .2 Second -Stage Regression:
Using OLS regress dependent variable y on the predicted values x1, . . .,xk as well as the exogenous variables (w1, . . . ,wr):
y = α+ β1x1 + . . .+ βk xk + βk+1w1 + . . .+ βk+rwr + ε. (18)
The TSLS estimator βTSLS1 , . . ., βTSLS
k , . . ., βTSLSk+r stands for the esti-
mates obtained in the second-stage regression.Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 16 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Monte Carlo experiment
Consider the following DGP (data generating process):
y = α+ βx + u, (19)x = π + γz + ν, (20)
wherey is the dependent variable;x is the explanatory variable;z is the instrument.u and ν are the stochastic disturbances.
Let’s assume thatz ∼ N (2, 1),u and ν are drawn from the joint normal distribution:[
uν
]∼ N
([00
],
[1 0.50.5 1
]),
and α = π = 0, β = .5 and γ = 1.Generate data sample of a size 1000.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 17 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Monte Carlo experiment
Table: Summary of estimatesbased on simulated data
true OLS TSLSα 0.0 -0.540 -0.078
(0.047) (0.067)β 0.5 0.762 0.529
(0.019) (0.030)Note: the expressions in bracketsstand for the standard errors.
−2 0 2 4 6
−2
02
46
x
y
true DGP,the OLS estimates,the TSLS estimates.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 18 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Monte Carlo experiment – 5000 replications
0.40 0.45 0.50 0.55 0.60 0.65 0.70
05
1015
the slope coefficient
−0.4 −0.2 0.0 0.2
02
46
8
the intercept
true DGP, the OLS estimates, the TSLS estimates.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 19 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
Instrumental variables (IV)– examples of instruments (Angrist, Krueger, 2001)
Dependent variable Endogenous x Source of Instru-mental variable
Reference
Earnings Years of schooling Region and timevariation in schoolconstruction
Duflo (2001)
Earnings Years of schooling Proximity to col-lege
Card (1995)
Earnings Years of schooling Quarter of birth Angrist andKrueger (1991)
Earnings Veteran status Cohort dummies Imbens and vander Klaauw (1995)
Birth weight Maternal smoking State cigarettetaxes
Evans and Ringel(1999)
Health Heart attacksurgery
Proximity to car-diac care centers
McClellan, Mc-Neil and New-house (1994)
College enrollment Financial aid Discontinuities infinancial aid for-mula
van der Klaauw(1996)
Crime Police Electoral cycles Levitt (1997)
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 20 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
Outline
1 Endogeneity
2 Instrumental variables (IV)
3 Specification TestsThe weak instruments testThe Hausman TestTesting Instrument Validity
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 21 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
The weak instruments test
To test the strength of instruments it is useful to analyze the first stepregression:
xi = π0 + π1z1 + . . .+ πmzm + πm+1w1 + . . .+ πm+rwr + η, (21)
the intuitive null hypothesis:
H0 π1 = π2 = . . . = πm, (22)
is related to the weak instruments case.It can be tested with F test.The rule of the thumb: if the F statistic is larger than 10 then the nullcan be rejected.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 22 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
The Hausman Test
The Hausman test allows to investigate endogeneity of explanatoryvariable.Key assumption: the IV estimates are unbiased, i.e., instrumentalvariables are strong and exogenous.The null hypothesis:
H0 : cov(x, ε) = 0, (23)
refers to exogeneity of explanatory variable while in the alternativehypothesis
H0 : cov(x, ε) 6= 0, (24)
the explanatory variable is endogenous and, therefore, the OLS esti-mates are inconsistent.There are several version of the Hausman test.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 23 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
The Hausman Test
The Hasuman-Wu test statistic (int hte matrix notation):
H =(βOLS − βTSLS
)′ (Var(βTSLS)−Var(βOLS)
)−1 (βOLS − βTSLS
)(25)
is χ2 distributed with K degrees of freedom.In general, the above version of Hausman test allows to check whetherthe IV and OLS estimates are statistically significant.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 24 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
The Hausman Test I
The other version of Hausman test bases on including the residualsfrom the first step regression into the structural equation. Let’s assumesimply regression model:
y = β0 + β1x + ε, (26)
where x is being tested for endogeneity.First step is the regression of x on the instrumental variables (e.g. z1and z2):
x = θ0 + θ1z1 + θ2z2 + η, (27)
and obtaining the residuals η.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 25 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
The Hausman Test IIIn the second step, the structural equation is extended by residualsfrom the first step, i.e., η
y = β0 + β1x + δη + ε, (28)
The null hypothesis is related to exogeneity:
H : δ = 0, (29)
or no correlation between x and ε. It can be tested with a standardt-test.When there are more explanatory variables that are tested to be en-dogenous then:
regression in the first step is repeated for all variables. The residualsfrom each equation are collected,the auxilary regression in the second step is extended by residuals fromeach first step regression,the F statistic is used to test joint significance of the coefficients on theincluded residuals.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 26 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
Testing Instrument Validity I
When the number of instruments is larger than number of explanatoryvariables we can test their validity.In the first step, we perform IV estimation using all instrumental vari-ables in order to get the residuals ε.Then, the residuals ε are regressed on all available instruments.The surplus instruments are tested with the statistics NR2, where N isthe number of observations and R2 is the coefficient of determination.NR2 is χ2 distributed with m − k degrees of freedom. m − k is thesurplus of the instruments.The null refers to validity of instruments.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 27 / 27
EndogeneityInstrumental variables (IV)
Specification Tests
The weak instruments testThe Hausman TestTesting Instrument Validity
IV – general remarks
Standards errors are little bit more complicated than in the OLSestimator.Weak instruments explain little of variation of the endogenous vari-ables. If the instruments are weak then the TSLS estimates are notreliable.
It can be tested with standard F statistics (testing the hypothesis thatthe coefficients on the all instruments are zero) in the first stage.
Endogeneity of instrumentsThere is no formal statistical test allowing for testing whether instru-ments are correlated with the error term.
Jakub Mućk Applied Econometrics (QEM) Meeting #8 Enodogeneity & IV 28 / 27