Upload
josie-shamrock
View
224
Download
1
Tags:
Embed Size (px)
Citation preview
Part 16: Nonlinear Effects [ 1/103]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Econometric Analysis of Panel Data
16. Nonlinear Effects Models and Models for Binary Choice
Part 16: Nonlinear Effects [ 3/103]
The “Panel Probit Model”
it it it it it it
12 1Ti1
12 2,Ti2
iT 1T 2,T
The German innovation data: T=5 (N=1270)
y *= x + , y 1[x + > 0]
1 ...01 ...0
~N ,... ... ... ...
0 ... 1
β β
Part 16: Nonlinear Effects [ 4/103]
FIML
i5 i5 i1 i1
N
i1 i5i 1
(2y 1) (2y 1)N
1 5i 1
5/ 2 1/ 2 1
i1 i2 12 i1 i5 15
i1 i2 12 i2 i2 22
logL log Prob[y ,...,y ]
log ... g( | *)dv ...dv
g( | *) (2 ) | * | exp[ (1/ 2) ( *) ]
1 q q ... q q
q q 1 ... q q*
... ... ... ..
x β x βv Σ
v Σ Σ v' Σ v
Σ
i1 i5 1T i2 i5 2,T 1
it it
.
q q q q ... 1
q 2y 1
See Greene, W., “Convenient Estimators for the Panel Probit Model: Further Results,” Empirical Economics, 29, 1, Jan. 2004, pp. 21-48.
Part 16: Nonlinear Effects [ 5/103]
GMM
it i
i1 i1
i2 i2
i5 i5
From the marginal distributions:
E[y ( ) | ] 0 (note: strict exogeneity)
Suggests orthogonality conditions
(y ( ))
(y ( ))E
...
(y ( ))
it
i1
i2
i5
x β X
x β x 0
x β x 0
...
x β x 0
5*K moments.
Part 16: Nonlinear Effects [ 6/103]
GMM Estimation-1
i1 i1 i1 i1
1270 i2 i2i 1
i5 i5
Step 1. Pool the data and use probit to estimate .
Compute weighting matrix.
ˆ ˆ(y ( )) (y ( ))
ˆ1 1 (y ( )) (y12701270 ...
ˆ(y ( ))
i1 i1
i2
i5
β
x β x x β x
x β xW
x β x
i2 i2
i5 i5
ˆ( ))
...
ˆ(y ( ))
i2
i5
x β x'
x β x
Part 16: Nonlinear Effects [ 7/103]
GMM Estimation-2
-1
i1 i1
1270 i2 i2
i 1
i5 i5
Step 2. Minimize GMM criterion q = ( ) ( )
(y ( ))
(y ( ))1( )=
...1270
(y ( ))
Note: 8 parameters, 5(8)=40 moment equations.
i1
i2
i5
g β 'W g β
x β x
x β xg β
x β x
Part 16: Nonlinear Effects [ 8/103]
GEE Estimation
Part 16: Nonlinear Effects [ 9/103]
Fractional Response
Part 16: Nonlinear Effects [ 10/103]
Fractional Response Model
Part 16: Nonlinear Effects [ 11/103]
Fractional Response Model
Part 16: Nonlinear Effects [ 12/103]
Part 16: Nonlinear Effects [ 13/103]
Part 16: Nonlinear Effects [ 14/103]
Part 16: Nonlinear Effects [ 15/103]
Part 16: Nonlinear Effects [ 16/103]
Part 16: Nonlinear Effects [ 17/103]
Part 16: Nonlinear Effects [ 18/103]
Part 16: Nonlinear Effects [ 19/103]
Part 16: Nonlinear Effects [ 20/103]
Part 16: Nonlinear Effects [ 21/103]
Application: Health Care Panel DataGerman Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0) HOSPITAL = 1(Number of hospital visits > 0) HSAT = health satisfaction, coded 0 (low) - 10 (high) DOCVIS = number of doctor visits in last three months HOSPVIS = number of hospital visits in last calendar year PUBLIC = insured in public health insurance = 1; otherwise = 0 ADDON = insured by add-on insurance = 1; otherswise = 0 HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped) HHKIDS = children under age 16 in the household = 1; otherwise = 0 EDUC = years of schooling AGE = age in years MARRIED = marital status
Part 16: Nonlinear Effects [ 22/103]
Unbalanced Panels
Group Sizes
Most theoretical results are for balanced panels.
Most real world panels are unbalanced.
Often the gaps are caused by attrition.
The major question is whether the gaps are ‘missing completely at random.’ If not, the observation mechanism is endogenous, and at least some methods will produce questionable results.
Researchers rarely have any reason to treat the data as nonrandomly sampled. (This is good news.)
Part 16: Nonlinear Effects [ 23/103]
Unbalanced Panels and Attrition ‘Bias’
Test for ‘attrition bias.’ (Verbeek and Nijman, Testing for Selectivity Bias in Panel Data Models, International Economic Review, 1992, 33, 681-703. Variable addition test using covariates of presence in the panel Nonconstructive – what to do next?
Do something about attrition bias. (Wooldridge, Inverse Probability Weighted M-Estimators for Sample Stratification and Attrition, Portuguese Economic Journal, 2002, 1: 117-139) Stringent assumptions about the process Model based on probability of being present in each wave of the panel
Part 16: Nonlinear Effects [ 24/103]
Panel Data Binary Choice ModelsRandom Utility Model for Binary Choice
Uit = + ’xit + it + Person i specific effect
Fixed effects using “dummy” variables
Uit = i + ’xit + it
Random effects using omitted heterogeneity
Uit = + ’xit + it + ui
Same outcome mechanism: Yit = 1[Uit > 0]
Part 16: Nonlinear Effects [ 25/103]
Pooled Model
Part 16: Nonlinear Effects [ 26/103]
Ignoring Unobserved Heterogeneity
it
it
it
it it
x
x β
x β
x β x δ
i it
it i it
it it i it
2it it u
Assuming strict exogeneity; Cov( ,u ) 0
y *= u
Prob[y 1| x ] Prob[u - ]
Using the same model format:
Prob[y 1| x ] F / 1+ F( )
This is the 'population averaged model.'
Part 16: Nonlinear Effects [ 27/103]
Ignoring Heterogeneity in the RE Model
Ignoring heterogeneity, we estimate not .
Partial effects are f( ) not f( )
is underestimated, but f( ) is overestimated.
Which way does it go? Maybe ignoring u is ok?
Not if we want
it it
it
δ β
δ x δ β x β
β x β
to compute probabilities or do
statistical inference about Estimated standard
errors will be too small.
β.
Part 16: Nonlinear Effects [ 28/103]
Ignoring Heterogeneity (Broadly) Presence will generally make parameter estimates look
smaller than they would otherwise. Ignoring heterogeneity will definitely distort standard
errors. Partial effects based on the parametric model may not
be affected very much. Is the pooled estimator ‘robust?’ Less so than in the
linear model case.
Part 16: Nonlinear Effects [ 29/103]
Pooled vs. RE Panel Estimator----------------------------------------------------------------------Binomial Probit ModelDependent variable DOCTOR --------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+-------------------------------------------------------------Constant| .02159 .05307 .407 .6842 AGE| .01532*** .00071 21.695 .0000 43.5257 EDUC| -.02793*** .00348 -8.023 .0000 11.3206 HHNINC| -.10204** .04544 -2.246 .0247 .35208--------+-------------------------------------------------------------Unbalanced panel has 7293 individuals--------+-------------------------------------------------------------Constant| -.11819 .09280 -1.273 .2028 AGE| .02232*** .00123 18.145 .0000 43.5257 EDUC| -.03307*** .00627 -5.276 .0000 11.3206 HHNINC| .00660 .06587 .100 .9202 .35208 Rho| .44990*** .01020 44.101 .0000--------+-------------------------------------------------------------
Part 16: Nonlinear Effects [ 30/103]
Partial Effects----------------------------------------------------------------------Partial derivatives of E[y] = F[*] withrespect to the vector of characteristicsThey are computed at the means of the XsObservations used for means are All Obs.--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Elasticity--------+------------------------------------------------------------- |Pooled AGE| .00578*** .00027 21.720 .0000 .39801 EDUC| -.01053*** .00131 -8.024 .0000 -.18870 HHNINC| -.03847** .01713 -2.246 .0247 -.02144--------+------------------------------------------------------------- |Based on the panel data estimator AGE| .00620*** .00034 18.375 .0000 .42181 EDUC| -.00918*** .00174 -5.282 .0000 -.16256 HHNINC| .00183 .01829 .100 .9202 .00101--------+-------------------------------------------------------------
Part 16: Nonlinear Effects [ 31/103]
Effect of Clustering Yit must be correlated with Yis across periods Pooled estimator ignores correlation Broadly, yit = E[yit|xit] + wit,
E[yit|xit] = Prob(yit = 1|xit) wit is correlated across periods
Assuming the marginal probability is the same, the pooled estimator is consistent. (We just saw that it might not be.)
Ignoring the correlation across periods generally leads to underestimating standard errors.
Part 16: Nonlinear Effects [ 32/103]
‘Cluster’ Corrected Covariance Matrix
1
the number if clusters
number of observations in cluster c
= negative inverse of second derivatives matrix
= derivative of log density for observation
c
ic
C
n
H
g
Part 16: Nonlinear Effects [ 33/103]
Cluster Correction: Doctor----------------------------------------------------------------------Binomial Probit ModelDependent variable DOCTORLog likelihood function -17457.21899--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- | Conventional Standard ErrorsConstant| -.25597*** .05481 -4.670 .0000 AGE| .01469*** .00071 20.686 .0000 43.5257 EDUC| -.01523*** .00355 -4.289 .0000 11.3206 HHNINC| -.10914** .04569 -2.389 .0169 .35208 FEMALE| .35209*** .01598 22.027 .0000 .47877--------+------------------------------------------------------------- | Corrected Standard ErrorsConstant| -.25597*** .07744 -3.305 .0009 AGE| .01469*** .00098 15.065 .0000 43.5257 EDUC| -.01523*** .00504 -3.023 .0025 11.3206 HHNINC| -.10914* .05645 -1.933 .0532 .35208 FEMALE| .35209*** .02290 15.372 .0000 .47877--------+-------------------------------------------------------------
Part 16: Nonlinear Effects [ 34/103]
Random Effects
Part 16: Nonlinear Effects [ 35/103]
Quadrature – Butler and Moffitt (1982)
xiN
ii 1
N
ii
T
it it
1
it 1 i
2
u
Th
F(y , v )
g(
v
1 -vexp
22
logL l
is method is used in most commerical software since
og dv
= log dv
(make a change of variable to w = v/ 2
v
198
)
2
=
N
ii 1
N H
i 1 h
2
h h
hh 1
1 l g( 2w)
g( 2z )
The values
og dw
The integral can be
of w (weights) and z
exp -w
(node
computed using
s) are found i
Hermite quadr
n published
tab
ature.
1
les such as A
log w
bramovitz and Stegun (or on the web). H is by
choice. Higher H produces greater accuracy (but takes longer).
2i u
u i
i
u ~ N[0, ]
= v
where v ~ N[0,1]
Part 16: Nonlinear Effects [ 36/103]
Quadrature Log Likelihood
i
i
N H
i 1 h
h
T
it it u ht 1
T
it it ht
h 1
N H
i 1 1 1h
After all the substitutions, the function to be maximized:
Not simple, but
F(y , 2 z )
F(y ,
w1
logL log
1 lo
feasib
)
l
wg z
e.
x
x
Part 16: Nonlinear Effects [ 37/103]
Simulation Based Estimator
xiN
ii 1 i
2i
T
it it u it 1
i
N
ii 1
N
i
i
i1
logL log dv
= log dv
T ]
The expected value of the functio
his equals log E[
n of v can be a
F(y ,
ppro
v
-v1
xima
ex
te
v )
g(v )
d
by dr
g(
aw
p22
v )
ing
xiTN R
S it it u ir
i
i 1
r
ir
r 1 t 1
R random draws v from the population N[0,1] and
averaging the R functions of v . We maxi
1logL log F(y , v
i
)R
mze
Part 16: Nonlinear Effects [ 38/103]
Random Effects Model: Quadrature----------------------------------------------------------------------Random Effects Binary Probit ModelDependent variable DOCTORLog likelihood function -16290.72192 Random EffectsRestricted log likelihood -17701.08500 PooledChi squared [ 1 d.f.] 2820.72616Estimation based on N = 27326, K = 5Unbalanced panel has 7293 individuals--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+-------------------------------------------------------------Constant| -.11819 .09280 -1.273 .2028 AGE| .02232*** .00123 18.145 .0000 43.5257 EDUC| -.03307*** .00627 -5.276 .0000 11.3206 HHNINC| .00660 .06587 .100 .9202 .35208 Rho| .44990*** .01020 44.101 .0000--------+------------------------------------------------------------- |Pooled EstimatesConstant| .02159 .05307 .407 .6842 AGE| .01532*** .00071 21.695 .0000 43.5257 EDUC| -.02793*** .00348 -8.023 .0000 11.3206 HHNINC| -.10204** .04544 -2.246 .0247 .35208--------+-------------------------------------------------------------
Part 16: Nonlinear Effects [ 39/103]
Random Parameter Model----------------------------------------------------------------------Random Coefficients Probit ModelDependent variable DOCTOR (Quadrature Based)
Log likelihood function -16296.68110 (-16290.72192) Restricted log likelihood -17701.08500Chi squared [ 1 d.f.] 2808.80780Simulation based on 50 Halton draws--------+-------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]--------+------------------------------------------------- |Nonrandom parameters AGE| .02226*** .00081 27.365 .0000 ( .02232) EDUC| -.03285*** .00391 -8.407 .0000 (-.03307) HHNINC| .00673 .05105 .132 .8952 ( .00660) |Means for random parametersConstant| -.11873** .05950 -1.995 .0460 (-.11819) |Scale parameters for dists. of random parametersConstant| .90453*** .01128 80.180 .0000--------+-------------------------------------------------------------
Using quadrature, a = -.11819. Implied from these estimates is.904542/(1+.904532) = .449998 compared to .44990 using quadrature.
Part 16: Nonlinear Effects [ 40/103]
A Dynamic Model
x
x
it it i,t 1 it i
it i,t 1 i0 it it i,t 1 i
y 1[ y u > 0]
Two similar 'effects'
Unobserved heterogeneity
State dependence = state 'persistence'
Pr(y 1| y ,...,y ,x ,u] F[ y u]
How to estimate , , marginal effects, F(.), etc?
(1) Deal with the latent common effect
(2) Handle the lagged effects:
This encounters the initial conditions problem.
Part 16: Nonlinear Effects [ 41/103]
Dynamic Probit Model: A Standard Approach
T
i1 i2 iT i0 i i,t 1 i itt 1
i1 i2 iT i0
(1) Conditioned on all effects, joint probability
P(y ,y ,...,y | y , ,u ) F( y u ,y )
(2) Unconditional density; integrate out the common effect
P(y ,y ,...,y | y , )
i it
i
x x β
x
i1 i2 iT i0 i i i0 i
2i i0 i0 u i i1 i2 iT
i i0 u i
P(y ,y ,...,y | y , ,u )h(u | y , )du
(3) Density for heterogeneity
h(u | y , ) N[ y , ], = [ , ,..., ], so
u = y + w (contains every
i i
i i
i
x x
x x δ x x x x
x δ
it
i1 i2 iT i0
T
i,t 1 i0 u i it i it 1
period of )
(4) Reduced form
P(y ,y ,...,y | y , )
F( y y w ,y )h(w )dw
This is a random effects model
i
it i
x
x
x β x δ
Part 16: Nonlinear Effects [ 42/103]
Simplified Dynamic Model
i
2i i0 i0 u
i i0 i
Projecting u on all observations expands the model enormously.
(3) Projection of heterogeneity only on group means
h(u | y , ) N[ y , ] so
u = y + w
(4) Re
i i
i
x x δ
x δ
i1 i2 iT i0
T
i,t 1 i0 u i it i it 1
duced form
P(y ,y ,...,y | y , )
F( y y w ,y )h(w )dw
Mundlak style correction with the initial value in the equation.
This is (again) a random effects mo
i
it i
x
x β x δ
del
Part 16: Nonlinear Effects [ 43/103]
A Dynamic Model for Public Insurance
AgeHousehold IncomeKids in the householdHealth Status
Add initial value, lagged value, group means
Part 16: Nonlinear Effects [ 44/103]
Dynamic Common Effects Model
Part 16: Nonlinear Effects [ 45/103]
Fixed Effects
Part 16: Nonlinear Effects [ 46/103]
Fixed Effects Models Estimate with dummy variable coefficients Uit = i + ’xit + it
Can be done by “brute force” for 10,000s of individuals
F(.) = appropriate probability for the observed outcome Compute and i for i=1,…,N (may be large) See FixedEffects.pdf in course materials.
1 1log log ( , )iN T
it i iti tL F y
x
Part 16: Nonlinear Effects [ 47/103]
Unconditional Estimation Maximize the whole log likelihood
Difficult! Many (thousands) of parameters.
Feasible – NLOGIT (2001) (“Brute force”)
Part 16: Nonlinear Effects [ 48/103]
Fixed Effects Health ModelGroups in which yit is always = 0 or always = 1. Cannot compute αi.
Part 16: Nonlinear Effects [ 49/103]
Conditional Estimation Principle: f(yi1,yi2,… | some statistic) is free
of the fixed effects for some models. Maximize the conditional log likelihood, given
the statistic. Can estimate β without having to estimate αi. Only feasible for the logit model. (Poisson
and a few other continuous variable models. No other discrete choice models.)
Part 16: Nonlinear Effects [ 50/103]
Binary Logit Conditional Probabiities
i
i
1 1 2 2
1 1
1
T
S1 1
All
Prob( 1| ) .1
Prob , , ,
exp exp
exp exp
i it
i it
i i
i i
i i
t i
i
t i
it it
i i i i iT iT
T T
it it it itt t
T T
it it it i
T
itt
td St t
ey
e
Y y Y y Y y
y
d d
y
y
x
xx
x x
x x
β
β
i
t i
different
iT
it t=1 i
ways that
can equal S
.
Denominator is summed over all the different combinations of T valuesof y that sum to the same sum as the observed . If S is this sum,
there are
it
it
d
y
i
terms. May be a huge number. An algorithm by KrailoS
and Pike makes it simple.
T
Part 16: Nonlinear Effects [ 51/103]
Example: Two Period Binary Logit
i it
i it
i
i
i i i
t it i
it it
T
it itTt 1
i1 i1 i2 i2 iT iT it Tt 1
it itd St 1
2
i1 i2 itt 1
i1 i
eProb(y 1| ) .
1 e
exp y
Prob Y y , Y y , , Y y y ,data .
exp d
Prob Y 0, Y 0 y 0,data 1.
Prob Y 1, Y
x β
x β
x
x
x
2
2 itt 12
i1 i2 itt 12
i1 i2 itt 1
exp( )0 y 1,data
exp( ) exp( )exp( )
Prob Y 0, Y 1 y 1,data exp( ) exp( )
Prob Y 1, Y 1 y 2,data 1.
i1
i1 i2
i2
i1 i2
x βx β x β
x βx β x β
Part 16: Nonlinear Effects [ 52/103]
Estimating Partial Effects
“The fixed effects logit estimator of immediately gives us the effect of each element of xi on the log-odds ratio… Unfortunately, we cannot estimate the partial effects… unless we plug in a value for αi. Because the distribution of αi is unrestricted – in particular, E[αi] is not necessarily zero – it is hard to know what to plug in for αi. In addition, we cannot estimate average partial effects, as doing so would require finding E[Λ(xit + αi)], a task that apparently requires specifying a distribution for αi.”
(Wooldridge, 2002)
Part 16: Nonlinear Effects [ 53/103]
Logit Constant Terms
ii
i
i
ˆT i it
i ˆt 1i i i i
Step 1. Estimate with Chamberlain's conditional estimator
Step 2. Treating as if it were known, estimate from the
first order condition
c1 e e 1y
T T 1 c1 e e
it
it
x β
x β
β
β
i iT T itt 1 t 1
t i i it
i i i i
it
i
i
c1T c
Estimate 1 / exp( ) log
ˆc exp is treated as known data.
Solve one equation in one unknown for each .
Note there is no solution if y = 0 or 1.
Iterating back
itx β
and forth does not maximize logL.
Part 16: Nonlinear Effects [ 54/103]
Fixed Effects Logit Health Model: Conditional vs. Unconditional
Part 16: Nonlinear Effects [ 55/103]
Advantages and Disadvantages of the FE Model
Advantages Allows correlation of effect and regressors Fairly straightforward to estimate Simple to interpret
Disadvantages Model may not contain time invariant variables Not necessarily simple to estimate if very large
samples (Stata just creates the thousands of dummy variables)
The incidental parameters problem: Small T bias
Part 16: Nonlinear Effects [ 56/103]
Incidental Parameters Problems: Conventional Wisdom
General: The unconditional MLE is biased in samples with fixed T except in special cases such as linear or Poisson regression (even when the FEM is the right model). The conditional estimator (that bypasses
estimation of αi) is consistent.
Specific: Upward bias (experience with probit and logit) in estimators of
Part 16: Nonlinear Effects [ 57/103]
A Monte Carlo Study of the FE Estimator: Probit vs. Logit
Estimates of Coefficients and Marginal Effects at the Implied Data Means
Results are scaled so the desired quantity being estimated (, , marginal effects) all equal 1.0 in the population.
Part 16: Nonlinear Effects [ 58/103]
Fixed Effects Attention mostly focused on index function
models; f(yit|xit) = some function of xit’β+αi. Incidental parameters problems
Bias of estimator of β is O(1/T) How do we estimate αi? How can we compute interesting partial effects?
Models Linear model: No problem Poisson (nonlinear model): No problem 1 or 2 other models: No problem Other nonlinear models: The literature speaks in
generalities The probit and logit models have been analyzed at length Almost nothing is known about any other model save for
Greene’s (2002-2004) limited Monte Carlo studies (frontier, tobit, truncation, ordered probit, probit, logit)
Part 16: Nonlinear Effects [ 59/103]
Bias Correction Estimators Motivation: Undo the incidental parameters bias in the
fixed effects probit model: (1) Maximize a penalized log likelihood function, or (2) Directly correct the estimator of β
Advantages For (1) estimates αi so enables partial effects Estimator is consistent under some circumstances (Possibly) corrects in dynamic models
Disadvantage No time invariant variables in the model Practical implementation Extension to other models? (Ordered probit model (maybe) – see
JBES 2009)
Part 16: Nonlinear Effects [ 60/103]
Bias Reduction Parametric (probit and logit) models with fixed effects (We examine non- and semiparametric methods at the end
of the course.) Recent references: All about probit and logit models.
[1] Carro, J., “Estimating dynamic panel data discrete choice models with fixed effects,” JE, 140, 2007, pp. 503-528
[2] Val, F., “Fixed Effects estimation of structural parameters and marginal effects in panel probit models,” JE, 2010
[3] Hahn, J. and G. Kuersteiner, “Bias reduction for dynamic nonlinear panel models with fixed effects,” UCLA, 2003
[4] Hahn, J. and W. Newey, “Jackknife and Analytical Bias reduction for nonlinear panel models,” Econometrica, 2004.
See, also, bibliographies and work of T. Woutersen, B. Honoré and E. Kyriazidou.
Part 16: Nonlinear Effects [ 61/103]
Bias Reduction – 1: Hahn All rely on a large T approximation to the bias
when T is (very small) All analyze the equivalent of the brute force,
unconditional estimator. Hahn/Kuersteiner and Hahn/Newey
plim bMLE = β + B(T) where B(T) is O(1/T) Derive an expression for B(T) The bias corrected estimator is obtained by subtraction No further analysis is obtained to estimate fixed effects
or partial effects.
Part 16: Nonlinear Effects [ 62/103]
Bias Reduction – 2: Val
Plim bMLE = β + B(T) Find, D(T) a large sample approximation
such that Plim bMLE +D(T) = β + F(T) where F(T) is O(1/T2)
Finds a counterpart approximation to the marginal effects.
Part 16: Nonlinear Effects [ 63/103]
Bias Reduction – 3: Carro
Change the log likelihood. Maximum Modified Likelihood Estimator = MMLE
Maximize MMLE such that the solution to MMLE is bMMLE plim bMMLE = β +G(T) where G(T) is O(1/T2).
Also obtains a solution for αi (unlike the others).
ai,MMLE = f(bMMLE)
(A problem? When yit is always the same, there is no solution for ai.)
Part 16: Nonlinear Effects [ 64/103]
Bias Reduction?
Approximations rely on large T Work “moderately well” when T is as low
as 8 or 10. Completely miss the mark when T=2, 3,4 Nothing is known about any other models.
Part 16: Nonlinear Effects [ 65/103]
A Mundlak Correction for the FE Model
*it i
it it
i
y ,i = 1,...,N; t = 1,...,T
y 1 if y > 0, 0 otherwise.
(Projection, not necessarily conditional mean)
w
i it it
i iu
Fixed Effects Model :
x
Mundlak (Wooldridge, Heckman, Chamberlain), ...
x
u 1 2
*it
here u is normally distributed with mean zero and standard
deviation and is uncorrelated with or ( , ,..., )
y ,i = 1,...,N; t = 1,..
i i i iT
i it it iu
x x x x
Reduced form random effects model
x x i
it it
.,T
y 1 if y > 0, 0 otherwise.
Part 16: Nonlinear Effects [ 66/103]
Mundlak Correction
Part 16: Nonlinear Effects [ 67/103]
A Variable Addition Test for FE vs. RE
The Wald statistic of 45.27922 and the likelihood ratio statistic of 40.280 are both far larger than the critical chi squared with 5 degrees of freedom, 11.07. This suggests that for these data, the fixed effects model is the preferred framework.
Part 16: Nonlinear Effects [ 68/103]
Fixed Effects Models Summary Incidental parameters problem if T < 10 (roughly) Inconvenience of computation Appealing specification Alternative semiparametric estimators?
Theory not well developed for T > 2 Not informative for anything but slopes (e.g.,
predictions and marginal effects) Ignoring the heterogeneity definitely produces an
inconsistent estimator (even with cluster correction!) A Hobson’s choice Mundlak correction is a useful common approach.
Part 16: Nonlinear Effects [ 69/103]
Conditional vs. UnconditionalDep. Var. = Healthy
Note, this estimator is not consistent – Incidental Parameters Problem
Part 16: Nonlinear Effects [ 70/103]
Escaping the FE Assumptions
it it i it i i i
it it i it i
Chamberlain (again)
Structure
Prob(y 1| x ) F( x ), w
Reduced form is a random effects model
Prob(y 1| x ) F( x w )
(Does not allow time invariant effects (again).)
Estimation:
x
x
(1) FIML
(2) Period by period, then reconcile with minimum distance
Part 16: Nonlinear Effects [ 71/103]
Modeling a Binary Outcome
Did firm i produce a product or process innovation in year t ? yit : 1=Yes/0=No
Observed N=1270 firms for T=5 years, 1984-1988 Observed covariates: xit = Industry, competitive
pressures, size, productivity, etc. How to model?
Binary outcome Correlation across time Heterogeneity across firms
Part 16: Nonlinear Effects [ 72/103]
Application
Part 16: Nonlinear Effects [ 73/103]
Part 16: Nonlinear Effects [ 74/103]
Estimates of a Fixed Effects Probit Model
-----------------------------------------------------------------------------FIXED EFFECTS Probit ModelDependent variable IPLog likelihood function -2087.22475Estimation based on N = 6350, K = 953Inf.Cr.AIC = 6080.4 AIC/N = .958Model estimated: Apr 16, 2013, 10:00:53Unbalanced panel has 1270 individualsSkipped 552 groups with inestimable aiPROBIT (normal) probability model--------+-------------------------------------------------------------------- | Standard Prob. 95% Confidence IP| Coefficient Error z |z|>Z* Interval--------+-------------------------------------------------------------------- |Index function for probability EMPLP| .12108D-05 .00015 .01 .9934 -.28688D-03 .28930D-03LOGSALES| -.53108 .34472 -1.54 .1234 -1.20672 .14457 IMUM| 4.26647 2.87407 1.48 .1377 -1.36660 9.89953 FDIUM| -7.34800** 3.31144 -2.22 .0265 -13.83830 -.85770--------+--------------------------------------------------------------------Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.Note: ***, **, * ==> Significance at 1%, 5%, 10% level.-----------------------------------------------------------------------------
Part 16: Nonlinear Effects [ 75/103]
+---------------------------------------------+| Probit Regression Start Values for IP |+---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+ EMPLP .00013619 .154784D-04 8.798 .0000 580.944724 LOGSALES .13668535 .01792267 7.626 .0000 10.5400961 IMUM .89572747 .14180988 6.316 .0000 .25275054 FDIUM 3.24193536 .38236984 8.479 .0000 .04580618 Constant -1.63354524 .20737277 -7.877 .0000+---------------------------------------------+| Random Effects Binary Probit Model |+---------+--------------+----------------+--------+---------+----------+ EMPLP .00017616 .117150D-04 15.037 .0000 580.944724 LOGSALES .21174534 .04309101 4.914 .0000 10.5400961 IMUM 1.41657383 .34121909 4.152 .0000 .25275054 FDIUM 4.41817066 .83712165 5.278 .0000 .04580618 Constant -2.51015928 .49459030 -5.075 .0000 Rho .58588783 .01864491 31.423 .0000+---------------------------------------------+| FIXED EFFECTS Probit Model |+---------+--------------+----------------+--------+---------+----------+ EMPLP .121081D-05 .00014700 .008 .9934 419.786630 LOGSALES -.53108315 .34473601 -1.541 .1234 10.5368540 IMUM 4.26652343 2.87418573 1.484 .1377 .25359436 FDIUM -7.34808205 3.31155361 -2.219 .0265 .04444097
Pooled, Fixed Effects and Random Effects Probit
Part 16: Nonlinear Effects [ 76/103]
Fixed Effects
Advantages Allows correlation of effect and regressors Fairly straightforward to estimate Simple to interpret
Disadvantages Not necessarily simple to estimate if very large
samples (Stata just creates the thousands of dummy variables)
The incidental parameters problem: Small T bias. No time invariant variables
Part 16: Nonlinear Effects [ 77/103]
Incidental Parameters Problems: Conventional Wisdom
General: Biased in samples with fixed T except in special cases such as linear or Poisson regression
Specific: Upward bias (experience with probit and logit) in estimators of
Part 16: Nonlinear Effects [ 78/103]
What We KNOW - Analytic
Newey and Hahn: MLE converges in probability to a vector of constants. (Variance diminishes with increase in N).
Abrevaya and Hsiao: Logit estimator converges to 2 when T = 2.
Han, Schmidt, Greene: Probit estimator converges to 2 when T = 2.
Part 16: Nonlinear Effects [ 79/103]
What We THINK We Know – Monte Carlo
Heckman: Bias in probit estimator is small if T 8 Bias in probit estimator is toward 0 in some
cases Katz (et al – numerous others), Greene
Bias in probit and logit estimators is large Upward bias persists even as T 20
Part 16: Nonlinear Effects [ 80/103]
Heckman’s Monte Carlo Study
Part 16: Nonlinear Effects [ 81/103]
Some Familiar Territory – A Monte Carlo Study of the FE Estimator: Probit vs. Logit
(Greene, The Econometrics Journal, 7, 2004, pp. 98-119)
Estimates of Coefficients and Marginal Effects at the Implied Data Means
Results are scaled so the desired quantity being estimated (, , marginal effects) all equal 1.0 in the population.
Part 16: Nonlinear Effects [ 82/103]
A Monte Carlo Study of the FE Probit Estimator
Percentage Biases in Estimates of Coefficients and Marginal Effects at the Implied Data Means
Part 16: Nonlinear Effects [ 83/103]
Dynamic Models
it it i,t 1 i it
it i,t 1 i0 it
y 1[x y u > 0]
Two 'effects' with similar impact on observations
Unobserved time persistent heterogeneity
State dependence = state 'persistence'
Pr(y 1| y ,...,y ,x ,u] F
it i,t 1 i[x y u]
How to estimate , , marginal effects, F(.), etc?
(1) Deal with the latent common effect
(a) Random effects approaches
(b) Fixed effects approaches
(2) Handling the lagged effects: The initial conditions problem.
Part 16: Nonlinear Effects [ 84/103]
Application – Doctor VisitsRiphahn, Million Wambach, JAE, 2003
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsVariables in the file areData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person. DOCTOR = 1(Number of doctor visits > 0) HSAT = health satisfaction, coded 0 (low) - 10 (high) DOCVIS = number of doctor visits in last three months HOSPVIS = number of hospital visits in last calendar year PUBLIC = insured in public health insurance = 1; otherwise = 0 ADDON = insured by add-on insurance = 1; otherswise = 0HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)HHKIDS = children under age 16 in the household = 1; otherwise = 0EDUC = years of schooling AGE = age in yearsMARRIED = marital statusEDUC = years of education
Part 16: Nonlinear Effects [ 85/103]
Application: InnovationsBertschek and Lechner, J of Econometrics, 1998
Part 16: Nonlinear Effects [ 86/103]
ApplicationStewart, JAE, 2007
British Household Panel Survey (1991-1996) 3060 households retained (balanced) out of
4739 total. Unemployment indicator (0.1) Data features
Panel data – unobservable heterogeneity State persistence: “Someone unemployed at t-1
is more than 20 times as likely to be unemployed at t as someone employed at t-1.”
Part 16: Nonlinear Effects [ 87/103]
Application: Direct Approach
Part 16: Nonlinear Effects [ 88/103]
GHK Simulation/Estimation
The presence of the autocorrelation and state dependence in the model invalidate the simple maximum likelihood procedures we examined earlier. The appropriate likelihood function is constructed by formulating the probabilities as
Prob( yi,0, yi,1, . . .) = Prob(yi,0) × Prob(yi,1 | yi,0) × ・ ・ ・ ×Prob(yi,T | yi,T-1) .
This still involves a T = 7 order normal integration, which is approximated in the study using a simulator similar to the GHK simulator.
Part 16: Nonlinear Effects [ 89/103]
A Dynamic RE Probit
i,t i,t i,t 1 i i,t
i i,t
i,0
Habit persistence and latent heterogeneity
y 1[ y 0], t = 1,...,T; i=1,...,N
Fixed effects assumption; Cov[ ,x ] may not be 0.
Initial condition y is observed.
Mundlak (1978),
x
i i
2i i i i u
i,t i,t i,t 1 i i i,t
Chamberlain (1984): Project on
u , u ~ N[0, ]
Implies a random effects model:
y 1[ y u 0], t = 1,...,T; i=1,...,N
X
x
x x
Part 16: Nonlinear Effects [ 90/103]
Problems with Dynamic RE Probit
Assumes yi,0 and the effects are uncorrelated
Assumes the initial conditions are exogenous – OK if the process and the observation begin at the same time, not if different.
Doesn’t allow time invariant variables in the model.
The normality assumption in the projection.
Part 16: Nonlinear Effects [ 91/103]
Heckman’s Solution
i,t i,t i,t 1 i i,t
i i i
i,t i,t i,t 1 i i i,t
Dynamic model
y 1[ y 0], t = 1,...,T; i=1,...,N
Use Mundlak device as before; u
y 1[ y u 0], t = 1,...,T; i=1,...,N
Explicit model for the initi
x
x
x x
2i,0 i,0 i i h i,0 i,0
i i i,t
i i i i i,0 i,0
al condition. ("Reduced form")
y 1[ h 0],h ~N[0, ] ( includes and instruments)
h correlated with u but uncorrelated with .
Project h on u so h u , uncorrelated wi
z z x
i i,t
i,0 i,0 i i,0
th u and .
y 1[ u 0]
Random effects model, but the initial period is different.
z
Part 16: Nonlinear Effects [ 92/103]
Dynamic Probit Model: A “Simplified” Approach (Wooldridge, 2005)
T
i1 i2 iT i0 i i,t 1 i itt 1
i1 i2 iT i0 i1 i2
(1) Conditioned on all effects, joint probability
P(y ,y ,...,y | y , ,u) F( y u ,y )
(2) Unconditional density; integrate out the common effect
P(y ,y ,...,y | y , ) P(y ,y ,
i it
i
x x β
x iT i0 i i i0 i
2i i0 i0 u
i i0 i
i1 i2 iT i0
...,y | y , ,u)h(u | y , )du
(3) (The rabbit in the hat) Density for heterogeneity
h(u | y , ) N[ y , ] so
u = y + w
(4) Reduced form
P(y ,y ,...,y | y , )
i i
i i
i
i
x x
x x δ
x δ
xT
i,t 1 i0 w i it i it 1 F( y y w ,y )h(w )dw
This is a Butler-Moffit style random effects model
it ix β x δ
Part 16: Nonlinear Effects [ 93/103]
Distributional Problem Normal distributions assumed throughout
Normal distribution for the unique component, εi,t
Normal distribution assumed for the heterogeneity, ui
Sensitive to the distribution? Alternative: Discrete distribution for ui.
Heckman and Singer style, latent class model. Conventional estimation methods.
Why is the model not sensitive to normality for εi,t but it is sensitive to normality for ui?
Part 16: Nonlinear Effects [ 94/103]
A Multinomial Logit Common Effects Model
How to handle unobserved effects in other nonlinear models? Single index models such as probit, Poisson,
tobit, etc. that are functions of an xit'β can be modified to be functions of xit'β + ci.
Other models – not at all obvious. Rarely found in the literature.
Dealing with fixed and random effects? Dynamics makes things much worse.
Part 16: Nonlinear Effects [ 95/103]
A Multinomial Logit Model
i,t i,t i,t
The multinomial logit model for unordered choices
U (j) ( j) ( j), j = 1,...,J (choice set)
t = 1,...,T (choice situations)
i =
x
i,t
i,t i,t i,t i,t i,t i,t
1,...,N (individuals)
( j) ~ I.I.D. Type 1 extreme value.
j * = j = index of choice such that U ( j *) U (k) for j * k.
How to modify the model to include common (random or
fixed) effects?
Part 16: Nonlinear Effects [ 96/103]
A Heterogeneous Multinomial Logit Model
i,t i,t i,t i
i,t i,t i,t i
i,t i,t i,t i
The multinomial logit model for unordered choices
U (1) (1) (1) u (1),
U (2) (2) (2) u (2)
...
U (J) (J) (J) u (J)
t = 1,...,T (choice situations)
i = 1,...,N (individua
x
x
x
i,t i,t i,t i,t i,t i,t
i,t
i i i i
ls)
j * = j = index of choice such that U ( j *) U (k) for j * k
( j) ~ I.I.D. Type 1 extreme value, j=1,...,J.
[u (1),u (2),...,u (J)] = J common individual effects.
[A dynamic version
u
of this model in Gong, et al., "Mobility in the
Urban Labor Market" IZA Working Paper 213, Bonn, 2000]
Part 16: Nonlinear Effects [ 97/103]
Common Effects Multinomial Logit
T i,t i,t i i,t
i i Jt 1j 1 i,t i
Fixed Effects is not feasible. Needs N sets of J
dummy variable coefficients (that sum to zero across choices).
Random Effects:
exp[ ( j *) u ( j *)]L |
exp[ ( j) u ( j)]
xu
x
i
T
t 1
T i,t i,t i i,ti i iJt 1
j 1 i,t i
Prob[choice made | (i)]
Unconditional contribution to the log likelihood for person i is
exp[ ( j *) u ( j *)]logL log f( )d
exp[ ( j) u ( j)]u
u
xu u
x
Part 16: Nonlinear Effects [ 98/103]
Simulation Based Estimation
i
i
T i,t i,t i i,ti i iJt 1
j 1 i,t i
TN i,t i,t i i,ti iJi 1 t 1
j 1 i,t i
N
i 1
exp[ ( j *) u ( j *)]logL log f( )d
exp[ ( j) u ( j)]
exp[ ( j *) u ( j *)]logL log f( )d
exp[ ( j) u ( j)]
1SimulatedLogL log
R
u
u
xu u
x
xu u
x
TR i,t i,t j* i,r i,t
Jr 1 t 1j 1 i,t j i,r
i,r
1 J
exp[ ( j *) v ( j *)]
exp[ ( j) v ( j)]
where v ( j) are random draws from the assumed population.
This function is maximized over and ,...,
x
x
Part 16: Nonlinear Effects [ 99/103]
Application Shoe Brand Choice
Simulated Data: Stated Choice, N=400 respondents, T=8 choice situations, 3,200 observations
3 choice/attributes + NONE J=4 Fashion = High / Low Quality = High / Low Price = 25/50/75,100 coded 1,2,3,4; and Price2
Heterogeneity: Sex, Age (<25, 25-39, 40+)
Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
Thanks to www.statisticalinnovations.com (Latent Gold)
Part 16: Nonlinear Effects [ 100/103]
Application
2i,t 1 i,t 2 i,t 3 i,t 4 i,t
S,1 i Y,1 i O,1 i i,t i
2i,t 1 i,t 2 i,t 3 i,t 4 i,t
S,2 i Y,2 i O,2 i i,t i
i
U (1) F Q P P
Sex Young Older (1) u (1)
U (2) F Q P P
Sex Young Older (2) u (2)
U
2,t 1 i,t 2 i,t 3 i,t 4 i,t
S,3 i Y,3 i O,3 i i,t i
i,t i,t
(3) F Q P P
Sex Young Older (3) u (3)
U (none) (none)
Part 16: Nonlinear Effects [ 101/103]
No Common Effects
+---------------------------------------------+| Start values obtained using MNL model || Log likelihood function -4119.500 |+---------------------------------------------++--------+--------------+----------------+--------+--------+|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|+--------+--------------+----------------+--------+--------+ FASH | 1.45964424 .07748860 18.837 .0000 QUAL | 1.10637961 .07153725 15.466 .0000 PRICE | 2.31763951 3.98732636 .581 .5611 PRICESQ | -55.5527148 13.8684229 -4.006 .0001 ASC4 | .64637513 .24440240 2.645 .0082 B1_MAL1 | -.16751621 .10552035 -1.588 .1124 B1_YNG1 | -.58118337 .11969068 -4.856 .0000 B1_OLD1 | -.02600079 .14091863 -.185 .8536 B2_MAL2 | -.05966758 .10055110 -.593 .5529 B2_YNG2 | -.14991404 .11180414 -1.341 .1800 B2_OLD2 | -.15128297 .14133889 -1.070 .2845 B3_MAL3 | -.12076085 .09301010 -1.298 .1942 B3_YNG3 | -.12265952 .10419547 -1.177 .2391 B3_OLD3 | -.04753400 .12950649 -.367 .7136
Part 16: Nonlinear Effects [ 102/103]
Random Effects MNL Model+---------------------------------------------+| Error Components (Random Effects) model | Restricted logL = -4119.5| Log likelihood function -4112.495 | Chi squared(3) = 14.01 (Crit.Val.=7.81)+---------------------------------------------++--------+--------------+----------------+--------+--------+|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|+--------+--------------+----------------+--------+--------+---------+Nonrandom parameters in utility functions FASH | 1.50759565 .08204283 18.376 .0000 QUAL | 1.14155991 .07884212 14.479 .0000 PRICE | 2.61115484 4.23285024 .617 .5373 PRICESQ | -58.0172769 14.7409678 -3.936 .0001 ASC4 | .72127357 .25703909 2.806 .0050 B1_MAL1 | -.19918832 .11818500 -1.685 .0919 B1_YNG1 | -.61263642 .12580875 -4.870 .0000 B1_OLD1 | -.03213515 .15732926 -.204 .8382 B2_MAL2 | -.04059494 .10950154 -.371 .7108 B2_YNG2 | -.12504492 .11986238 -1.043 .2968 B2_OLD2 | -.12470329 .14151490 -.881 .3782 B3_MAL3 | -.10619757 .10471334 -1.014 .3105 B3_YNG3 | -.10372335 .11851081 -.875 .3815 B3_OLD3 | -.02538899 .13269408 -.191 .8483---------+Standard deviations of latent random effects SigmaE01| .53459541 .09531536 5.609 .0000 SigmaE02| .01799747 .62983694 .029 .9772 SigmaE03| .03109637 .35256770 .088 .9297
Part 16: Nonlinear Effects [ 103/103]
Implementations of RE Models
Linear, Probit, Logit, Poisson, 1 or 2 other models: SAS, about 10 others
Linear, Probit, Logit, 4 or 5 others: MLWin (Using Bayesian MCMC methods)
Linear, Probit, Logit, Poisson, 4 or 5 others: Stata (using quadrature, Proc = GLAMM)
Linear, Probit, Logit, Poisson, MNL, Tobit, about 50 others: LIMDEP/NLOGIT (using maximum simulated likelihood)