Part 16: Nonlinear Effects [ 1/103] Econometric Analysis of Panel Data William Greene Department of Economics Stern School of Business

Part 16: Nonlinear Effects [ 1/103]

Econometric Analysis of Panel Data

William Greene

Department of Economics

Stern School of Business

Econometric Analysis of Panel Data

16. Nonlinear Effects Models and Models for Binary Choice


The “Panel Probit Model”

it it it it it it

12 1Ti1

12 2,Ti2

iT 1T 2,T

The German innovation data: T=5 (N=1270)

y *= x + , y 1[x + > 0]

1 ...01 ...0

~N ,... ... ... ...

0 ... 1

β β


FIML

i5 i5 i1 i1

N

i1 i5i 1

(2y 1) (2y 1)N

1 5i 1

5/ 2 1/ 2 1

i1 i2 12 i1 i5 15

i1 i2 12 i2 i2 22

logL log Prob[y ,...,y ]

log ... g( | *)dv ...dv

g( | *) (2 ) | * | exp[ (1/ 2) ( *) ]

1 q q ... q q

q q 1 ... q q*

... ... ... ..

x β x βv Σ

v Σ Σ v' Σ v

Σ

i1 i5 1T i2 i5 2,T 1

it it

.

q q q q ... 1

q 2y 1

See Greene, W., “Convenient Estimators for the Panel Probit Model: Further Results,” Empirical Economics, 29, 1, Jan. 2004, pp. 21-48.


GMM

it i

i1 i1

i2 i2

i5 i5

From the marginal distributions:

E[y ( ) | ] 0 (note: strict exogeneity)

Suggests orthogonality conditions

(y ( ))

(y ( ))E

...

(y ( ))

it

i1

i2

i5

x β X

x β x 0

x β x 0

...

x β x 0

5*K moments.


GMM Estimation-1

i1 i1 i1 i1

1270 i2 i2i 1

i5 i5

Step 1. Pool the data and use probit to estimate .

Compute weighting matrix.

ˆ ˆ(y ( )) (y ( ))

ˆ1 1 (y ( )) (y12701270 ...

ˆ(y ( ))

i1 i1

i2

i5

β

x β x x β x

x β xW

x β x

i2 i2

i5 i5

ˆ( ))

...

ˆ(y ( ))

i2

i5

x β x'

x β x


GMM Estimation-2

-1

i1 i1

1270 i2 i2

i 1

i5 i5

Step 2. Minimize GMM criterion q = ( ) ( )

(y ( ))

(y ( ))1( )=

...1270

(y ( ))

Note: 8 parameters, 5(8)=40 moment equations.

i1

i2

i5

g β 'W g β

x β x

x β xg β

x β x


GEE Estimation


Fractional Response


Fractional Response Model


Fractional Response Model











Application: Health Care Panel DataGerman Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).

Variables in the file are

DOCTOR = 1(Number of doctor visits > 0) HOSPITAL = 1(Number of hospital visits > 0) HSAT = health satisfaction, coded 0 (low) - 10 (high) DOCVIS = number of doctor visits in last three months HOSPVIS = number of hospital visits in last calendar year PUBLIC = insured in public health insurance = 1; otherwise = 0 ADDON = insured by add-on insurance = 1; otherswise = 0 HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped) HHKIDS = children under age 16 in the household = 1; otherwise = 0 EDUC = years of schooling AGE = age in years MARRIED = marital status


Unbalanced Panels

Group Sizes

Most theoretical results are for balanced panels.

Most real world panels are unbalanced.

Often the gaps are caused by attrition.

The major question is whether the gaps are ‘missing completely at random.’ If not, the observation mechanism is endogenous, and at least some methods will produce questionable results.

Researchers rarely have any reason to treat the data as nonrandomly sampled. (This is good news.)


Unbalanced Panels and Attrition ‘Bias’

Test for ‘attrition bias.’ (Verbeek and Nijman, Testing for Selectivity Bias in Panel Data Models, International Economic Review, 1992, 33, 681-703. Variable addition test using covariates of presence in the panel Nonconstructive – what to do next?

Do something about attrition bias. (Wooldridge, Inverse Probability Weighted M-Estimators for Sample Stratification and Attrition, Portuguese Economic Journal, 2002, 1: 117-139) Stringent assumptions about the process Model based on probability of being present in each wave of the panel


Panel Data Binary Choice ModelsRandom Utility Model for Binary Choice

Uit = + ’xit + it + Person i specific effect

Fixed effects using “dummy” variables

Uit = i + ’xit + it

Random effects using omitted heterogeneity

Uit = + ’xit + it + ui

Same outcome mechanism: Yit = 1[Uit > 0]


Pooled Model


Ignoring Unobserved Heterogeneity

it

it

it

it it

x

x β

x β

x β x δ

i it

it i it

it it i it

2it it u

Assuming strict exogeneity; Cov( ,u ) 0

y *= u

Prob[y 1| x ] Prob[u - ]

Using the same model format:

Prob[y 1| x ] F / 1+ F( )

This is the 'population averaged model.'


Ignoring Heterogeneity in the RE Model

Ignoring heterogeneity, we estimate not .

Partial effects are f( ) not f( )

is underestimated, but f( ) is overestimated.

Which way does it go? Maybe ignoring u is ok?

Not if we want

it it

it

δ β

δ x δ β x β

β x β

to compute probabilities or do

statistical inference about Estimated standard

errors will be too small.

β.


Ignoring Heterogeneity (Broadly) Presence will generally make parameter estimates look

smaller than they would otherwise. Ignoring heterogeneity will definitely distort standard

errors. Partial effects based on the parametric model may not

be affected very much. Is the pooled estimator ‘robust?’ Less so than in the

linear model case.


Pooled vs. RE Panel Estimator----------------------------------------------------------------------Binomial Probit ModelDependent variable DOCTOR --------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+-------------------------------------------------------------Constant| .02159 .05307 .407 .6842 AGE| .01532*** .00071 21.695 .0000 43.5257 EDUC| -.02793*** .00348 -8.023 .0000 11.3206 HHNINC| -.10204** .04544 -2.246 .0247 .35208--------+-------------------------------------------------------------Unbalanced panel has 7293 individuals--------+-------------------------------------------------------------Constant| -.11819 .09280 -1.273 .2028 AGE| .02232*** .00123 18.145 .0000 43.5257 EDUC| -.03307*** .00627 -5.276 .0000 11.3206 HHNINC| .00660 .06587 .100 .9202 .35208 Rho| .44990*** .01020 44.101 .0000--------+-------------------------------------------------------------


Partial Effects----------------------------------------------------------------------Partial derivatives of E[y] = F[*] withrespect to the vector of characteristicsThey are computed at the means of the XsObservations used for means are All Obs.--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Elasticity--------+------------------------------------------------------------- |Pooled AGE| .00578*** .00027 21.720 .0000 .39801 EDUC| -.01053*** .00131 -8.024 .0000 -.18870 HHNINC| -.03847** .01713 -2.246 .0247 -.02144--------+------------------------------------------------------------- |Based on the panel data estimator AGE| .00620*** .00034 18.375 .0000 .42181 EDUC| -.00918*** .00174 -5.282 .0000 -.16256 HHNINC| .00183 .01829 .100 .9202 .00101--------+-------------------------------------------------------------


Effect of Clustering Yit must be correlated with Yis across periods Pooled estimator ignores correlation Broadly, yit = E[yit|xit] + wit,

E[yit|xit] = Prob(yit = 1|xit) wit is correlated across periods

Assuming the marginal probability is the same, the pooled estimator is consistent. (We just saw that it might not be.)

Ignoring the correlation across periods generally leads to underestimating standard errors.


‘Cluster’ Corrected Covariance Matrix

1

the number if clusters

number of observations in cluster c

= negative inverse of second derivatives matrix

= derivative of log density for observation

c

ic

C

n

H

g


Cluster Correction: Doctor----------------------------------------------------------------------Binomial Probit ModelDependent variable DOCTORLog likelihood function -17457.21899--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- | Conventional Standard ErrorsConstant| -.25597*** .05481 -4.670 .0000 AGE| .01469*** .00071 20.686 .0000 43.5257 EDUC| -.01523*** .00355 -4.289 .0000 11.3206 HHNINC| -.10914** .04569 -2.389 .0169 .35208 FEMALE| .35209*** .01598 22.027 .0000 .47877--------+------------------------------------------------------------- | Corrected Standard ErrorsConstant| -.25597*** .07744 -3.305 .0009 AGE| .01469*** .00098 15.065 .0000 43.5257 EDUC| -.01523*** .00504 -3.023 .0025 11.3206 HHNINC| -.10914* .05645 -1.933 .0532 .35208 FEMALE| .35209*** .02290 15.372 .0000 .47877--------+-------------------------------------------------------------


Random Effects


Quadrature – Butler and Moffitt (1982)

xiN

ii 1

N

ii

T

it it

1

it 1 i

2

u

Th

F(y , v )

g(

v

1 -vexp

22

logL l

is method is used in most commerical software since

og dv

= log dv

(make a change of variable to w = v/ 2

v

198

)

2

=

N

ii 1

N H

i 1 h

2

h h

hh 1

1 l g( 2w)

g( 2z )

The values

og dw

The integral can be

of w (weights) and z

exp -w

(node

computed using

s) are found i

Hermite quadr

n published

tab

ature.

1

les such as A

log w

bramovitz and Stegun (or on the web). H is by

choice. Higher H produces greater accuracy (but takes longer).

2i u

u i

i

u ~ N[0, ]

= v

where v ~ N[0,1]


Quadrature Log Likelihood

i

i

N H

i 1 h

h

T

it it u ht 1

T

it it ht

h 1

N H

i 1 1 1h

After all the substitutions, the function to be maximized:

Not simple, but

F(y , 2 z )

F(y ,

w1

logL log

1 lo

feasib

)

l

wg z

e.

x

x


Simulation Based Estimator

xiN

ii 1 i

2i

T

it it u it 1

i

N

ii 1

N

i

i

i1

logL log dv

= log dv

T ]

The expected value of the functio

his equals log E[

n of v can be a

F(y ,

ppro

v

-v1

xima

ex

te

v )

g(v )

d

by dr

g(

aw

p22

v )

ing

xiTN R

S it it u ir

i

i 1

r

ir

r 1 t 1

R random draws v from the population N[0,1] and

averaging the R functions of v . We maxi

1logL log F(y , v

i

)R

mze


Random Effects Model: Quadrature----------------------------------------------------------------------Random Effects Binary Probit ModelDependent variable DOCTORLog likelihood function -16290.72192 Random EffectsRestricted log likelihood -17701.08500 PooledChi squared [ 1 d.f.] 2820.72616Estimation based on N = 27326, K = 5Unbalanced panel has 7293 individuals--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+-------------------------------------------------------------Constant| -.11819 .09280 -1.273 .2028 AGE| .02232*** .00123 18.145 .0000 43.5257 EDUC| -.03307*** .00627 -5.276 .0000 11.3206 HHNINC| .00660 .06587 .100 .9202 .35208 Rho| .44990*** .01020 44.101 .0000--------+------------------------------------------------------------- |Pooled EstimatesConstant| .02159 .05307 .407 .6842 AGE| .01532*** .00071 21.695 .0000 43.5257 EDUC| -.02793*** .00348 -8.023 .0000 11.3206 HHNINC| -.10204** .04544 -2.246 .0247 .35208--------+-------------------------------------------------------------


Random Parameter Model----------------------------------------------------------------------Random Coefficients Probit ModelDependent variable DOCTOR (Quadrature Based)

Log likelihood function -16296.68110 (-16290.72192) Restricted log likelihood -17701.08500Chi squared [ 1 d.f.] 2808.80780Simulation based on 50 Halton draws--------+-------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]--------+------------------------------------------------- |Nonrandom parameters AGE| .02226*** .00081 27.365 .0000 ( .02232) EDUC| -.03285*** .00391 -8.407 .0000 (-.03307) HHNINC| .00673 .05105 .132 .8952 ( .00660) |Means for random parametersConstant| -.11873** .05950 -1.995 .0460 (-.11819) |Scale parameters for dists. of random parametersConstant| .90453*** .01128 80.180 .0000--------+-------------------------------------------------------------

Using quadrature, a = -.11819. Implied from these estimates is.904542/(1+.904532) = .449998 compared to .44990 using quadrature.


A Dynamic Model

x

x

it it i,t 1 it i

it i,t 1 i0 it it i,t 1 i

y 1[ y u > 0]

Two similar 'effects'

Unobserved heterogeneity

State dependence = state 'persistence'

Pr(y 1| y ,...,y ,x ,u] F[ y u]

How to estimate , , marginal effects, F(.), etc?

(1) Deal with the latent common effect

(2) Handle the lagged effects:

This encounters the initial conditions problem.


Dynamic Probit Model: A Standard Approach

T

i1 i2 iT i0 i i,t 1 i itt 1

i1 i2 iT i0

(1) Conditioned on all effects, joint probability

P(y ,y ,...,y | y , ,u ) F( y u ,y )

(2) Unconditional density; integrate out the common effect

P(y ,y ,...,y | y , )

i it

i

x x β

x

i1 i2 iT i0 i i i0 i

2i i0 i0 u i i1 i2 iT

i i0 u i

P(y ,y ,...,y | y , ,u )h(u | y , )du

(3) Density for heterogeneity

h(u | y , ) N[ y , ], = [ , ,..., ], so

u = y + w (contains every

i i

i i

i

x x

x x δ x x x x

x δ

it

i1 i2 iT i0

T

i,t 1 i0 u i it i it 1

period of )

(4) Reduced form

P(y ,y ,...,y | y , )

F( y y w ,y )h(w )dw

This is a random effects model

i

it i

x

x

x β x δ


Simplified Dynamic Model

i

2i i0 i0 u

i i0 i

Projecting u on all observations expands the model enormously.

(3) Projection of heterogeneity only on group means

h(u | y , ) N[ y , ] so

u = y + w

(4) Re

i i

i

x x δ

x δ

i1 i2 iT i0

T

i,t 1 i0 u i it i it 1

duced form

P(y ,y ,...,y | y , )

F( y y w ,y )h(w )dw

Mundlak style correction with the initial value in the equation.

This is (again) a random effects mo

i

it i

x

x β x δ

del


A Dynamic Model for Public Insurance

AgeHousehold IncomeKids in the householdHealth Status

Add initial value, lagged value, group means


Dynamic Common Effects Model


Fixed Effects


Fixed Effects Models Estimate with dummy variable coefficients Uit = i + ’xit + it

Can be done by “brute force” for 10,000s of individuals

F(.) = appropriate probability for the observed outcome Compute and i for i=1,…,N (may be large) See FixedEffects.pdf in course materials.

1 1log log ( , )iN T

it i iti tL F y

x


Unconditional Estimation Maximize the whole log likelihood

Difficult! Many (thousands) of parameters.

Feasible – NLOGIT (2001) (“Brute force”)


Fixed Effects Health ModelGroups in which yit is always = 0 or always = 1. Cannot compute αi.


Conditional Estimation Principle: f(yi1,yi2,… | some statistic) is free

of the fixed effects for some models. Maximize the conditional log likelihood, given

the statistic. Can estimate β without having to estimate αi. Only feasible for the logit model. (Poisson

and a few other continuous variable models. No other discrete choice models.)


Binary Logit Conditional Probabiities

i

i

1 1 2 2

1 1

1

T

S1 1

All

Prob( 1| ) .1

Prob , , ,

exp exp

exp exp

i it

i it

i i

i i

i i

t i

i

t i

it it

i i i i iT iT

T T

it it it itt t

T T

it it it i

T

itt

td St t

ey

e

Y y Y y Y y

y

d d

y

y

x

xx

x x

x x

β

β

i

t i

different

iT

it t=1 i

ways that

can equal S

.

Denominator is summed over all the different combinations of T valuesof y that sum to the same sum as the observed . If S is this sum,

there are

it

it

d

y

i

terms. May be a huge number. An algorithm by KrailoS

and Pike makes it simple.

T


Example: Two Period Binary Logit

i it

i it

i

i

i i i

t it i

it it

T

it itTt 1

i1 i1 i2 i2 iT iT it Tt 1

it itd St 1

2

i1 i2 itt 1

i1 i

eProb(y 1| ) .

1 e

exp y

Prob Y y , Y y , , Y y y ,data .

exp d

Prob Y 0, Y 0 y 0,data 1.

Prob Y 1, Y

x β

x β

x

x

x

2

2 itt 12

i1 i2 itt 12

i1 i2 itt 1

exp( )0 y 1,data

exp( ) exp( )exp( )

Prob Y 0, Y 1 y 1,data exp( ) exp( )

Prob Y 1, Y 1 y 2,data 1.

i1

i1 i2

i2

i1 i2

x βx β x β

x βx β x β


Estimating Partial Effects

“The fixed effects logit estimator of immediately gives us the effect of each element of xi on the log-odds ratio… Unfortunately, we cannot estimate the partial effects… unless we plug in a value for αi. Because the distribution of αi is unrestricted – in particular, E[αi] is not necessarily zero – it is hard to know what to plug in for αi. In addition, we cannot estimate average partial effects, as doing so would require finding E[Λ(xit + αi)], a task that apparently requires specifying a distribution for αi.”

(Wooldridge, 2002)


Logit Constant Terms

ii

i

i

ˆT i it

i ˆt 1i i i i

Step 1. Estimate with Chamberlain's conditional estimator

Step 2. Treating as if it were known, estimate from the

first order condition

c1 e e 1y

T T 1 c1 e e

it

it

x β

x β

β

β

i iT T itt 1 t 1

t i i it

i i i i

it

i

i

c1T c

Estimate 1 / exp( ) log

ˆc exp is treated as known data.

Solve one equation in one unknown for each .

Note there is no solution if y = 0 or 1.

Iterating back

itx β

and forth does not maximize logL.


Fixed Effects Logit Health Model: Conditional vs. Unconditional


Advantages and Disadvantages of the FE Model

Advantages Allows correlation of effect and regressors Fairly straightforward to estimate Simple to interpret

Disadvantages Model may not contain time invariant variables Not necessarily simple to estimate if very large

samples (Stata just creates the thousands of dummy variables)

The incidental parameters problem: Small T bias


Incidental Parameters Problems: Conventional Wisdom

General: The unconditional MLE is biased in samples with fixed T except in special cases such as linear or Poisson regression (even when the FEM is the right model). The conditional estimator (that bypasses

estimation of αi) is consistent.

Specific: Upward bias (experience with probit and logit) in estimators of


A Monte Carlo Study of the FE Estimator: Probit vs. Logit

Estimates of Coefficients and Marginal Effects at the Implied Data Means

Results are scaled so the desired quantity being estimated (, , marginal effects) all equal 1.0 in the population.


Fixed Effects Attention mostly focused on index function

models; f(yit|xit) = some function of xit’β+αi. Incidental parameters problems

Bias of estimator of β is O(1/T) How do we estimate αi? How can we compute interesting partial effects?

Models Linear model: No problem Poisson (nonlinear model): No problem 1 or 2 other models: No problem Other nonlinear models: The literature speaks in

generalities The probit and logit models have been analyzed at length Almost nothing is known about any other model save for

Greene’s (2002-2004) limited Monte Carlo studies (frontier, tobit, truncation, ordered probit, probit, logit)


Bias Correction Estimators Motivation: Undo the incidental parameters bias in the

fixed effects probit model: (1) Maximize a penalized log likelihood function, or (2) Directly correct the estimator of β

Advantages For (1) estimates αi so enables partial effects Estimator is consistent under some circumstances (Possibly) corrects in dynamic models

Disadvantage No time invariant variables in the model Practical implementation Extension to other models? (Ordered probit model (maybe) – see

JBES 2009)


Bias Reduction Parametric (probit and logit) models with fixed effects (We examine non- and semiparametric methods at the end

of the course.) Recent references: All about probit and logit models.

[1] Carro, J., “Estimating dynamic panel data discrete choice models with fixed effects,” JE, 140, 2007, pp. 503-528

[2] Val, F., “Fixed Effects estimation of structural parameters and marginal effects in panel probit models,” JE, 2010

[3] Hahn, J. and G. Kuersteiner, “Bias reduction for dynamic nonlinear panel models with fixed effects,” UCLA, 2003

[4] Hahn, J. and W. Newey, “Jackknife and Analytical Bias reduction for nonlinear panel models,” Econometrica, 2004.

See, also, bibliographies and work of T. Woutersen, B. Honoré and E. Kyriazidou.


Bias Reduction – 1: Hahn All rely on a large T approximation to the bias

when T is (very small) All analyze the equivalent of the brute force,

unconditional estimator. Hahn/Kuersteiner and Hahn/Newey

plim bMLE = β + B(T) where B(T) is O(1/T) Derive an expression for B(T) The bias corrected estimator is obtained by subtraction No further analysis is obtained to estimate fixed effects

or partial effects.


Bias Reduction – 2: Val

Plim bMLE = β + B(T) Find, D(T) a large sample approximation

such that Plim bMLE +D(T) = β + F(T) where F(T) is O(1/T2)

Finds a counterpart approximation to the marginal effects.


Bias Reduction – 3: Carro

Change the log likelihood. Maximum Modified Likelihood Estimator = MMLE

Maximize MMLE such that the solution to MMLE is bMMLE plim bMMLE = β +G(T) where G(T) is O(1/T2).

Also obtains a solution for αi (unlike the others).

ai,MMLE = f(bMMLE)

(A problem? When yit is always the same, there is no solution for ai.)


Bias Reduction?

Approximations rely on large T Work “moderately well” when T is as low

as 8 or 10. Completely miss the mark when T=2, 3,4 Nothing is known about any other models.


A Mundlak Correction for the FE Model

*it i

it it

i

y ,i = 1,...,N; t = 1,...,T

y 1 if y > 0, 0 otherwise.

(Projection, not necessarily conditional mean)

w

i it it

i iu

Fixed Effects Model :

x

Mundlak (Wooldridge, Heckman, Chamberlain), ...

x

u 1 2

*it

here u is normally distributed with mean zero and standard

deviation and is uncorrelated with or ( , ,..., )

y ,i = 1,...,N; t = 1,..

i i i iT

i it it iu

x x x x

Reduced form random effects model

x x i

it it

.,T

y 1 if y > 0, 0 otherwise.


Mundlak Correction


A Variable Addition Test for FE vs. RE

The Wald statistic of 45.27922 and the likelihood ratio statistic of 40.280 are both far larger than the critical chi squared with 5 degrees of freedom, 11.07. This suggests that for these data, the fixed effects model is the preferred framework.


Fixed Effects Models Summary Incidental parameters problem if T < 10 (roughly) Inconvenience of computation Appealing specification Alternative semiparametric estimators?

Theory not well developed for T > 2 Not informative for anything but slopes (e.g.,

predictions and marginal effects) Ignoring the heterogeneity definitely produces an

inconsistent estimator (even with cluster correction!) A Hobson’s choice Mundlak correction is a useful common approach.


Conditional vs. UnconditionalDep. Var. = Healthy

Note, this estimator is not consistent – Incidental Parameters Problem


Escaping the FE Assumptions

it it i it i i i

it it i it i

Chamberlain (again)

Structure

Prob(y 1| x ) F( x ), w

Reduced form is a random effects model

Prob(y 1| x ) F( x w )

(Does not allow time invariant effects (again).)

Estimation:

x

x

(1) FIML

(2) Period by period, then reconcile with minimum distance


Modeling a Binary Outcome

Did firm i produce a product or process innovation in year t ? yit : 1=Yes/0=No

Observed N=1270 firms for T=5 years, 1984-1988 Observed covariates: xit = Industry, competitive

pressures, size, productivity, etc. How to model?

Binary outcome Correlation across time Heterogeneity across firms


Application



Estimates of a Fixed Effects Probit Model

-----------------------------------------------------------------------------FIXED EFFECTS Probit ModelDependent variable IPLog likelihood function -2087.22475Estimation based on N = 6350, K = 953Inf.Cr.AIC = 6080.4 AIC/N = .958Model estimated: Apr 16, 2013, 10:00:53Unbalanced panel has 1270 individualsSkipped 552 groups with inestimable aiPROBIT (normal) probability model--------+-------------------------------------------------------------------- | Standard Prob. 95% Confidence IP| Coefficient Error z |z|>Z* Interval--------+-------------------------------------------------------------------- |Index function for probability EMPLP| .12108D-05 .00015 .01 .9934 -.28688D-03 .28930D-03LOGSALES| -.53108 .34472 -1.54 .1234 -1.20672 .14457 IMUM| 4.26647 2.87407 1.48 .1377 -1.36660 9.89953 FDIUM| -7.34800** 3.31144 -2.22 .0265 -13.83830 -.85770--------+--------------------------------------------------------------------Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.Note: ***, **, * ==> Significance at 1%, 5%, 10% level.-----------------------------------------------------------------------------


+---------------------------------------------+| Probit Regression Start Values for IP |+---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+ EMPLP .00013619 .154784D-04 8.798 .0000 580.944724 LOGSALES .13668535 .01792267 7.626 .0000 10.5400961 IMUM .89572747 .14180988 6.316 .0000 .25275054 FDIUM 3.24193536 .38236984 8.479 .0000 .04580618 Constant -1.63354524 .20737277 -7.877 .0000+---------------------------------------------+| Random Effects Binary Probit Model |+---------+--------------+----------------+--------+---------+----------+ EMPLP .00017616 .117150D-04 15.037 .0000 580.944724 LOGSALES .21174534 .04309101 4.914 .0000 10.5400961 IMUM 1.41657383 .34121909 4.152 .0000 .25275054 FDIUM 4.41817066 .83712165 5.278 .0000 .04580618 Constant -2.51015928 .49459030 -5.075 .0000 Rho .58588783 .01864491 31.423 .0000+---------------------------------------------+| FIXED EFFECTS Probit Model |+---------+--------------+----------------+--------+---------+----------+ EMPLP .121081D-05 .00014700 .008 .9934 419.786630 LOGSALES -.53108315 .34473601 -1.541 .1234 10.5368540 IMUM 4.26652343 2.87418573 1.484 .1377 .25359436 FDIUM -7.34808205 3.31155361 -2.219 .0265 .04444097

Pooled, Fixed Effects and Random Effects Probit


Fixed Effects

Advantages Allows correlation of effect and regressors Fairly straightforward to estimate Simple to interpret

Disadvantages Not necessarily simple to estimate if very large

samples (Stata just creates the thousands of dummy variables)

The incidental parameters problem: Small T bias. No time invariant variables


Incidental Parameters Problems: Conventional Wisdom

General: Biased in samples with fixed T except in special cases such as linear or Poisson regression

Specific: Upward bias (experience with probit and logit) in estimators of


What We KNOW - Analytic

Newey and Hahn: MLE converges in probability to a vector of constants. (Variance diminishes with increase in N).

Abrevaya and Hsiao: Logit estimator converges to 2 when T = 2.

Han, Schmidt, Greene: Probit estimator converges to 2 when T = 2.


What We THINK We Know – Monte Carlo

Heckman: Bias in probit estimator is small if T 8 Bias in probit estimator is toward 0 in some

cases Katz (et al – numerous others), Greene

Bias in probit and logit estimators is large Upward bias persists even as T 20


Heckman’s Monte Carlo Study


Some Familiar Territory – A Monte Carlo Study of the FE Estimator: Probit vs. Logit

(Greene, The Econometrics Journal, 7, 2004, pp. 98-119)

Estimates of Coefficients and Marginal Effects at the Implied Data Means

Results are scaled so the desired quantity being estimated (, , marginal effects) all equal 1.0 in the population.


A Monte Carlo Study of the FE Probit Estimator

Percentage Biases in Estimates of Coefficients and Marginal Effects at the Implied Data Means


Dynamic Models

it it i,t 1 i it

it i,t 1 i0 it

y 1[x y u > 0]

Two 'effects' with similar impact on observations

Unobserved time persistent heterogeneity

State dependence = state 'persistence'

Pr(y 1| y ,...,y ,x ,u] F

it i,t 1 i[x y u]

How to estimate , , marginal effects, F(.), etc?

(1) Deal with the latent common effect

(a) Random effects approaches

(b) Fixed effects approaches

(2) Handling the lagged effects: The initial conditions problem.


Application – Doctor VisitsRiphahn, Million Wambach, JAE, 2003

German Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsVariables in the file areData downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person. DOCTOR = 1(Number of doctor visits > 0) HSAT = health satisfaction, coded 0 (low) - 10 (high) DOCVIS = number of doctor visits in last three months HOSPVIS = number of hospital visits in last calendar year PUBLIC = insured in public health insurance = 1; otherwise = 0 ADDON = insured by add-on insurance = 1; otherswise = 0HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)HHKIDS = children under age 16 in the household = 1; otherwise = 0EDUC = years of schooling AGE = age in yearsMARRIED = marital statusEDUC = years of education


Application: InnovationsBertschek and Lechner, J of Econometrics, 1998


ApplicationStewart, JAE, 2007

British Household Panel Survey (1991-1996) 3060 households retained (balanced) out of

4739 total. Unemployment indicator (0.1) Data features

Panel data – unobservable heterogeneity State persistence: “Someone unemployed at t-1

is more than 20 times as likely to be unemployed at t as someone employed at t-1.”


Application: Direct Approach


GHK Simulation/Estimation

The presence of the autocorrelation and state dependence in the model invalidate the simple maximum likelihood procedures we examined earlier. The appropriate likelihood function is constructed by formulating the probabilities as

Prob( yi,0, yi,1, . . .) = Prob(yi,0) × Prob(yi,1 | yi,0) × ・・・ ×Prob(yi,T | yi,T-1) .

This still involves a T = 7 order normal integration, which is approximated in the study using a simulator similar to the GHK simulator.


A Dynamic RE Probit

i,t i,t i,t 1 i i,t

i i,t

i,0

Habit persistence and latent heterogeneity

y 1[ y 0], t = 1,...,T; i=1,...,N

Fixed effects assumption; Cov[ ,x ] may not be 0.

Initial condition y is observed.

Mundlak (1978),

x

i i

2i i i i u

i,t i,t i,t 1 i i i,t

Chamberlain (1984): Project on

u , u ~ N[0, ]

Implies a random effects model:

y 1[ y u 0], t = 1,...,T; i=1,...,N

X

x

x x


Problems with Dynamic RE Probit

Assumes yi,0 and the effects are uncorrelated

Assumes the initial conditions are exogenous – OK if the process and the observation begin at the same time, not if different.

Doesn’t allow time invariant variables in the model.

The normality assumption in the projection.


Heckman’s Solution

i,t i,t i,t 1 i i,t

i i i

i,t i,t i,t 1 i i i,t

Dynamic model

y 1[ y 0], t = 1,...,T; i=1,...,N

Use Mundlak device as before; u

y 1[ y u 0], t = 1,...,T; i=1,...,N

Explicit model for the initi

x

x

x x

2i,0 i,0 i i h i,0 i,0

i i i,t

i i i i i,0 i,0

al condition. ("Reduced form")

y 1[ h 0],h ~N[0, ] ( includes and instruments)

h correlated with u but uncorrelated with .

Project h on u so h u , uncorrelated wi

z z x

i i,t

i,0 i,0 i i,0

th u and .

y 1[ u 0]

Random effects model, but the initial period is different.

z


Dynamic Probit Model: A “Simplified” Approach (Wooldridge, 2005)

T

i1 i2 iT i0 i i,t 1 i itt 1

i1 i2 iT i0 i1 i2

(1) Conditioned on all effects, joint probability

P(y ,y ,...,y | y , ,u) F( y u ,y )

(2) Unconditional density; integrate out the common effect

P(y ,y ,...,y | y , ) P(y ,y ,

i it

i

x x β

x iT i0 i i i0 i

2i i0 i0 u

i i0 i

i1 i2 iT i0

...,y | y , ,u)h(u | y , )du

(3) (The rabbit in the hat) Density for heterogeneity

h(u | y , ) N[ y , ] so

u = y + w

(4) Reduced form

P(y ,y ,...,y | y , )

i i

i i

i

i

x x

x x δ

x δ

xT

i,t 1 i0 w i it i it 1 F( y y w ,y )h(w )dw

This is a Butler-Moffit style random effects model

it ix β x δ


Distributional Problem Normal distributions assumed throughout

Normal distribution for the unique component, εi,t

Normal distribution assumed for the heterogeneity, ui

Sensitive to the distribution? Alternative: Discrete distribution for ui.

Heckman and Singer style, latent class model. Conventional estimation methods.

Why is the model not sensitive to normality for εi,t but it is sensitive to normality for ui?


A Multinomial Logit Common Effects Model

How to handle unobserved effects in other nonlinear models? Single index models such as probit, Poisson,

tobit, etc. that are functions of an xit'β can be modified to be functions of xit'β + ci.

Other models – not at all obvious. Rarely found in the literature.

Dealing with fixed and random effects? Dynamics makes things much worse.


A Multinomial Logit Model

i,t i,t i,t

The multinomial logit model for unordered choices

U (j) ( j) ( j), j = 1,...,J (choice set)

t = 1,...,T (choice situations)

i =

x

i,t

i,t i,t i,t i,t i,t i,t

1,...,N (individuals)

( j) ~ I.I.D. Type 1 extreme value.

j * = j = index of choice such that U ( j *) U (k) for j * k.

How to modify the model to include common (random or

fixed) effects?


A Heterogeneous Multinomial Logit Model

i,t i,t i,t i

i,t i,t i,t i

i,t i,t i,t i

The multinomial logit model for unordered choices

U (1) (1) (1) u (1),

U (2) (2) (2) u (2)

...

U (J) (J) (J) u (J)

t = 1,...,T (choice situations)

i = 1,...,N (individua

x

x

x

i,t i,t i,t i,t i,t i,t

i,t

i i i i

ls)

j * = j = index of choice such that U ( j *) U (k) for j * k

( j) ~ I.I.D. Type 1 extreme value, j=1,...,J.

[u (1),u (2),...,u (J)] = J common individual effects.

[A dynamic version

u

of this model in Gong, et al., "Mobility in the

Urban Labor Market" IZA Working Paper 213, Bonn, 2000]


Common Effects Multinomial Logit

T i,t i,t i i,t

i i Jt 1j 1 i,t i

Fixed Effects is not feasible. Needs N sets of J

dummy variable coefficients (that sum to zero across choices).

Random Effects:

exp[ ( j *) u ( j *)]L |

exp[ ( j) u ( j)]

xu

x

i

T

t 1

T i,t i,t i i,ti i iJt 1

j 1 i,t i

Prob[choice made | (i)]

Unconditional contribution to the log likelihood for person i is

exp[ ( j *) u ( j *)]logL log f( )d

exp[ ( j) u ( j)]u

u

xu u

x


Simulation Based Estimation

i

i

T i,t i,t i i,ti i iJt 1

j 1 i,t i

TN i,t i,t i i,ti iJi 1 t 1

j 1 i,t i

N

i 1


exp[ ( j) u ( j)]


exp[ ( j) u ( j)]

1SimulatedLogL log

R

u

u

xu u

x

xu u

x

TR i,t i,t j* i,r i,t

Jr 1 t 1j 1 i,t j i,r

i,r

1 J

exp[ ( j *) v ( j *)]

exp[ ( j) v ( j)]

where v ( j) are random draws from the assumed population.

This function is maximized over and ,...,

x

x


Application Shoe Brand Choice

Simulated Data: Stated Choice, N=400 respondents, T=8 choice situations, 3,200 observations

3 choice/attributes + NONE J=4 Fashion = High / Low Quality = High / Low Price = 25/50/75,100 coded 1,2,3,4; and Price2

Heterogeneity: Sex, Age (<25, 25-39, 40+)

Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)

Thanks to www.statisticalinnovations.com (Latent Gold)


Application

2i,t 1 i,t 2 i,t 3 i,t 4 i,t

S,1 i Y,1 i O,1 i i,t i

2i,t 1 i,t 2 i,t 3 i,t 4 i,t

S,2 i Y,2 i O,2 i i,t i

i

U (1) F Q P P

Sex Young Older (1) u (1)

U (2) F Q P P


U

2,t 1 i,t 2 i,t 3 i,t 4 i,t

S,3 i Y,3 i O,3 i i,t i

i,t i,t

(3) F Q P P


U (none) (none)


No Common Effects

+---------------------------------------------+| Start values obtained using MNL model || Log likelihood function -4119.500 |+---------------------------------------------++--------+--------------+----------------+--------+--------+|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|+--------+--------------+----------------+--------+--------+ FASH | 1.45964424 .07748860 18.837 .0000 QUAL | 1.10637961 .07153725 15.466 .0000 PRICE | 2.31763951 3.98732636 .581 .5611 PRICESQ | -55.5527148 13.8684229 -4.006 .0001 ASC4 | .64637513 .24440240 2.645 .0082 B1_MAL1 | -.16751621 .10552035 -1.588 .1124 B1_YNG1 | -.58118337 .11969068 -4.856 .0000 B1_OLD1 | -.02600079 .14091863 -.185 .8536 B2_MAL2 | -.05966758 .10055110 -.593 .5529 B2_YNG2 | -.14991404 .11180414 -1.341 .1800 B2_OLD2 | -.15128297 .14133889 -1.070 .2845 B3_MAL3 | -.12076085 .09301010 -1.298 .1942 B3_YNG3 | -.12265952 .10419547 -1.177 .2391 B3_OLD3 | -.04753400 .12950649 -.367 .7136


Random Effects MNL Model+---------------------------------------------+| Error Components (Random Effects) model | Restricted logL = -4119.5| Log likelihood function -4112.495 | Chi squared(3) = 14.01 (Crit.Val.=7.81)+---------------------------------------------++--------+--------------+----------------+--------+--------+|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]|+--------+--------------+----------------+--------+--------+---------+Nonrandom parameters in utility functions FASH | 1.50759565 .08204283 18.376 .0000 QUAL | 1.14155991 .07884212 14.479 .0000 PRICE | 2.61115484 4.23285024 .617 .5373 PRICESQ | -58.0172769 14.7409678 -3.936 .0001 ASC4 | .72127357 .25703909 2.806 .0050 B1_MAL1 | -.19918832 .11818500 -1.685 .0919 B1_YNG1 | -.61263642 .12580875 -4.870 .0000 B1_OLD1 | -.03213515 .15732926 -.204 .8382 B2_MAL2 | -.04059494 .10950154 -.371 .7108 B2_YNG2 | -.12504492 .11986238 -1.043 .2968 B2_OLD2 | -.12470329 .14151490 -.881 .3782 B3_MAL3 | -.10619757 .10471334 -1.014 .3105 B3_YNG3 | -.10372335 .11851081 -.875 .3815 B3_OLD3 | -.02538899 .13269408 -.191 .8483---------+Standard deviations of latent random effects SigmaE01| .53459541 .09531536 5.609 .0000 SigmaE02| .01799747 .62983694 .029 .9772 SigmaE03| .03109637 .35256770 .088 .9297


Implementations of RE Models

Linear, Probit, Logit, Poisson, 1 or 2 other models: SAS, about 10 others

Linear, Probit, Logit, 4 or 5 others: MLWin (Using Bayesian MCMC methods)

Linear, Probit, Logit, Poisson, 4 or 5 others: Stata (using quadrature, Proc = GLAMM)

Linear, Probit, Logit, Poisson, MNL, Tobit, about 50 others: LIMDEP/NLOGIT (using maximum simulated likelihood)

Documents

Part 16: Nonlinear Effects [ 1/103] Econometric Analysis of Panel Data William Greene Department of Economics Stern School of Business