Miscellaneous Topics In Single-Spell Duration Analysis

Jaap H. Abbring (Tinbergen Institute, Amsterdam)
James J. Heckman (University of Chicago, American Bar Foundation, University College Dublin)

Econ 312, Spring 2019


More On The Gamma Heterogeneity Distribution

Abbring and Van den Berg (2001) give some more motivation for the choice of the gamma distribution as the mixing distribution.

Recall that the distribution of V | (T ≥ t, X) is given by

\[
\Pr(V \le v \mid T \ge t, X) = \frac{\int_0^v \exp(-z(t) h_0(X) y)\, dF(y)}{\int_0^\infty \exp(-z(t) h_0(X) y)\, dF(y)}. \tag{1}
\]

This is the distribution of the unobservables in a cohort with observed characteristics X that has survived up to time t.

As we have seen before, this distribution is not the same as the unconditional distribution F (even if V ⊥⊥ X), because of dynamic sorting.


However, it is well known that this distribution is a gamma distribution if the original distribution F is a gamma distribution (e.g., Lancaster, 1990).

More precisely, if $F \equiv \Gamma_{\alpha,\rho}$, then V | (T ≥ t, X) is distributed with c.d.f. $\Gamma_{z(t)h_0(X)+\alpha,\rho}$.

Then, (z(t)h0(X) + α)V | (T ≥ t, X) has a standard gamma distribution, for all t ≥ 0.

If we interpret (1) as a mapping from the space of distribution functions with positive support into itself, another way of saying this is that the class of gamma distributions is closed under this mapping.

In particular, the gamma distribution is a fixed point of an appropriately rescaled version of this mapping.
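The closure property can be checked numerically. The sketch below is illustrative only. It assumes that the first subscript of Γ is a rate parameter and the second a shape parameter (the reading under which (z(t)h0(X) + α)V is standard gamma), picks arbitrary values α = 1.5, ρ = 2 and s = z(t)h0(X) = 0.8, and uses a crude midpoint rule. It evaluates the ratio in (1) directly and compares it with the gamma c.d.f. with rate α + s.

```python
import math

def gamma_pdf(v, rate, shape):
    """Gamma density with the given rate and shape (assumed parameterization)."""
    return rate**shape * v**(shape - 1) * math.exp(-rate * v) / math.gamma(shape)

def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def survivor_cdf(v, s, rate, shape, upper=60.0):
    """Pr(V <= v | T >= t, X) from (1), with s = z(t) * h0(X); the infinite
    upper limit is truncated at `upper`, where the integrand is negligible."""
    weight = lambda y: math.exp(-s * y) * gamma_pdf(y, rate, shape)
    return midpoint(weight, 0.0, v) / midpoint(weight, 0.0, upper)

def gamma_cdf_shape2(v, rate):
    """Closed-form gamma c.d.f. for shape 2: 1 - exp(-rate*v) * (1 + rate*v)."""
    return 1.0 - math.exp(-rate * v) * (1.0 + rate * v)

alpha, rho, s = 1.5, 2, 0.8          # illustrative values, not from the slides
grid = (0.2, 0.5, 1.0, 2.0, 4.0)
max_gap = max(abs(survivor_cdf(v, s, alpha, rho) - gamma_cdf_shape2(v, alpha + s))
              for v in grid)
print(max_gap)  # should be close to zero, up to quadrature error
```

The shape stays at ρ while the rate moves from α to α + s, which is the sense in which the gamma family is closed under the survivor mapping.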


This suggests that the distribution of the unobservables in a cohort of unemployed individuals may converge to a gamma distribution as the cohort grows older.

Indeed, Abbring and Van den Berg (2001) show that the distribution of

z(t)h0(X)V | (T ≥ t, X)

converges to a standard gamma distribution as t → ∞, provided that F satisfies certain regularity conditions.


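Some intuition for this can be found for the case in which F is continuous, with density g.

Note that, in that case,

\[
\begin{aligned}
\Pr(z(t) h_0(X) V \le v \mid T \ge t, X)
&= \Pr\bigl(V \le v/(z(t) h_0(X)) \mid T \ge t, X\bigr) \\
&= \frac{\int_0^{v/(z(t) h_0(X))} \exp(-z(t) h_0(X) y)\, g(y)\, dy}{\int_0^\infty \exp(-z(t) h_0(X) y)\, g(y)\, dy} \\
&= \frac{\int_0^v \exp(-y)\, g(y/(z(t) h_0(X)))\, dy}{\int_0^\infty \exp(-y)\, g(y/(z(t) h_0(X)))\, dy}.
\end{aligned}
\]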


If the density g of F is bounded and g(0+) = c > 0, then the distribution of

z(t)h0(X)V | (T ≥ t, X)

converges, as t → ∞, to $\Gamma_1$, the unit exponential distribution.
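A numerical sketch of this convergence, under assumptions chosen purely for illustration: the mixing density is taken to be half-Cauchy (bounded, with g(0+) = 2/π > 0), s stands for z(t)h0(X), and the integrals in the change-of-variables ratio with exp(−y) are approximated by a midpoint rule truncated at 50.

```python
import math

def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def half_cauchy_pdf(y):
    """Bounded density on (0, inf) with g(0+) = 2/pi > 0 (illustrative choice)."""
    return 2.0 / (math.pi * (1.0 + y * y))

def scaled_survivor_cdf(v, s, g, upper=50.0):
    """Pr(s*V <= v | T >= t, X), with s = z(t) * h0(X), computed from the
    ratio of integrals of exp(-y) * g(y / s)."""
    weight = lambda y: math.exp(-y) * g(y / s)
    return midpoint(weight, 0.0, v) / midpoint(weight, 0.0, upper)

def gap_to_unit_exponential(s, g, grid=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Largest distance on the grid between the scaled c.d.f. and 1 - exp(-v)."""
    return max(abs(scaled_survivor_cdf(v, s, g) - (1.0 - math.exp(-v))) for v in grid)

gaps = {s: gap_to_unit_exponential(s, half_cauchy_pdf) for s in (1.0, 10.0, 1000.0)}
print(gaps)  # the gap shrinks as s grows
```

The half-Cauchy distribution has no finite mean, so the example also suggests that the convergence is driven by the behavior of g near zero rather than by its tail.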


The general result implies that the distribution of

(c0 + c1 z(t)h0(X)) · V | (T ≥ t, X)

converges to a $\Gamma_{1/c_1,\rho}$ distribution, for any c0 ∈ R and c1 > 0.

To see this, note that if zV | Z ≥ z converges to $\Gamma_\rho$, then (c0 + c1 z)V | Z ≥ z converges to $\Gamma_{1/c_1,\rho}$.

This in turn implies that the distribution of V | (T ≥ t, X) can be approximated by a gamma distribution with parameters $(c_0/c_1) + \int_0^t h(y, X)\,dy$ and ρ, where the value of c0/c1 is arbitrary: it is not determined by the asymptotic result or by properties of F.

Exactly the same gamma distribution for V | (T ≥ t, X) can be generated by starting off with $F = \Gamma_{c_0/c_1,\rho}$, for all t and X.


These results provide a justification for adopting the family of gamma distributions for V in duration analysis with the MPH model.

After all, for sufficiently large t, an MPH model with gamma heterogeneity behaves similarly to an MPH model with any heterogeneity distribution that satisfies the regularity conditions for our convergence result.

This result does not actually require the full MPH structure, but only that the hazard is multiplicative in the observed covariate and duration components on the one hand, and V on the other hand.

Abbring and Van den Berg (2001) provide details.


Non-Parametric Identification Of The MPH Model

A simple identification result

Recall that it is convenient to view the MPH model as a triple $(z, h_0, \mathcal{L}_F)$.

Each MPH model $(z, h_0, \mathcal{L}_F)$ maps into a conditional survival function

\[
G(t \mid X) = \mathcal{L}_F(z(t) h_0(X)).
\]


Here, we assume that

1. V is a nonnegative random variable that is independent of X, and that has distribution F such that F(0) < 1,
2. z : [0, ∞) → [0, ∞) can be written as $z(t) = \int_0^t \psi(u)\,du$ for some function ψ : [0, ∞) → [0, ∞) such that $\int_0^t \psi(u)\,du < \infty$ for all t ∈ [0, ∞), and
3. the support of X is denoted by $\mathcal{X} \subset \mathbb{R}^m$, and $h_0 : \mathcal{X} \to (0, \infty)$.

Note that, unlike most of the literature, we allow for defects due to both a mass point in the heterogeneity distribution at 0 (“stayers”) and $\lim_{t\to\infty} z(t) < \infty$ (“defecting movers”).


We will now briefly discuss under what conditions this mapping is invertible, so that we can associate a unique triple $(z, h_0, \mathcal{L}_F)$ with each conditional survival function G(t|X) in the range of the model.

In this case, we say that the model is identified.

Typically, we will seek identification without imposing parametric assumptions on $(z, h_0, \mathcal{L}_F)$, in which case we say that the model is non-parametrically identified.


To understand why this is relevant, note that, ideally, the data give us G(t|X).

So, we have to infer $(z, h_0, \mathcal{L}_F)$ from data on G(t|X).

If the model is identified, the data are consistent with only one MPH model.

In particular, this implies that we can separate dynamic sorting and duration dependence, and consistently estimate the regressor effects on the conditional (“individual”) hazard rates.


The main papers on the non-parametric identifiability of the MPH model from single-spell data are Elbers and Ridder (1982), Heckman and Singer (1984a), Ridder (1990) and Kortram, Lenstra, Ridder and Van Rooij (1998).

An accessible exposition along the lines of the latter paper can be found in Abbring (2002), where the earlier analyses are extended to potentially defective duration distributions.

Here, we will only discuss an identification result under relatively strong conditions.

We will provide a “high level” proof, which takes the following theorem as given.


Proposition 10

A Laplace transform is uniquely determined by its values on a nonempty open set.

Proof

This follows directly from the so-called “real analyticity” of the Laplace transform. The Appendix provides definitions and a proof of a more general result that is also useful in the analysis of multivariate models.


It is clear that this result is useful in identification analysis.

It says that we can identify the Laplace transform of the mixing distribution, and therefore the mixing distribution (by the uniqueness of the Laplace transform), if we can compute it on a nonempty open set.

So, there is no need to trace it out on the whole of (0,∞).


The result we will prove here is the following.

Proposition 11

If $\{h_0(x) : x \in \mathcal{X}\}$ contains a nonempty open interval and $-\mathcal{L}_F'(0+) = E[V] < \infty$, then the MPH model $(z, h_0, \mathcal{L}_F)$ is uniquely determined by G(t|X) up to two scale normalizations.


Proof

Without loss of generality, assume that h0(x0) = 1 for some fixed $x_0 \in \mathcal{X}$ and z(t0) = 1 for some fixed t0 in the support of T. The proof proceeds by constructing h0, $\mathcal{L}_F$ and z from G.

First, for any $x \in \mathcal{X}$, we can identify h0(x) by

\[
\frac{\partial G(t \mid x)/\partial t}{\partial G(t \mid x_0)/\partial t} = \frac{\psi(t) h_0(x) \mathcal{L}_F'(z(t) h_0(x))}{\psi(t) h_0(x_0) \mathcal{L}_F'(z(t) h_0(x_0))} \to h_0(x)
\]

as t ↓ 0. Here, we use the finite-mean assumption. Second, as we have already identified h0, we can vary h0(x) over a nonempty open set by varying x over $\mathcal{X}$. Now, as $G(t_0 \mid x) = \mathcal{L}_F(h_0(x))$, we can trace $\mathcal{L}_F$ on a nonempty open set, which, by Proposition 10, uniquely determines $\mathcal{L}_F$. Finally, z follows from $G(t \mid x) = \mathcal{L}_F(h_0(x) z(t))$, for any $x \in \mathcal{X}$.
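The constructive steps of the proof can be sketched numerically. Everything in the block below is a hypothetical example rather than part of the slides: the "true" model uses gamma mixing with Laplace transform (1 + s/α)^(−ρ), z(t) = t^1.5 (so t0 = 1) and h0(x) = exp(0.7x) (so x0 = 0). Step 1 is approximated with finite differences at a small t, and step 3 takes the Laplace transform as already known from step 2.

```python
import math

# Hypothetical "true" model used only to generate G(t|x); none of these
# choices come from the slides.
ALPHA, RHO = 1.0, 2.0                      # gamma mixing: rate and shape
laplace_F = lambda s: (1.0 + s / ALPHA) ** (-RHO)
z_true = lambda t: t ** 1.5                # z(t0) = 1 at t0 = 1
h0_true = lambda x: math.exp(0.7 * x)      # h0(x0) = 1 at x0 = 0

def G(t, x):
    """Conditional survival function G(t|x) = L_F(z(t) h0(x))."""
    return laplace_F(z_true(t) * h0_true(x))

def dG_dt(t, x, rel_step=0.1):
    """Central finite-difference approximation to dG(t|x)/dt."""
    dt = rel_step * t
    return (G(t + dt, x) - G(t - dt, x)) / (2.0 * dt)

def identify_h0(x, x0=0.0, t=1e-4):
    """Step 1: the ratio of derivatives of G at a small t approximates h0(x)."""
    return dG_dt(t, x) / dG_dt(t, x0)

def identify_z(t, x, h0_x):
    """Step 3: solve G(t|x) = L_F(h0(x) z) for z by bisection, taking L_F as
    known from step 2."""
    target, lo, hi = G(t, x), 0.0, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if laplace_F(h0_x * mid) > target:   # L_F is decreasing in its argument
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h0_hat = identify_h0(1.0)
z_hat = identify_z(2.0, 1.0, h0_hat)
print(h0_hat, h0_true(1.0))   # both approximately exp(0.7)
print(z_hat, z_true(2.0))     # both approximately 2**1.5
```

Repeating `identify_z` at a different x should return the same value of z(t), which is one way to see the overidentifying restrictions of the model.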


Some remarks are in order.

We can avoid Proposition 10 altogether by assuming that $\{h_0(x) : x \in \mathcal{X}\} = (0, \infty)$. However, this is a very strong assumption, and we do not need it.

By the uniqueness of the Laplace transform, we can identify F from $\mathcal{L}_F$. Also, ψ is almost everywhere determined by z. Recall that this “almost everywhere” indeterminacy was the reason for focusing on z, instead of ψ, in the first place.


The requirement that there is variation in h0 in the data serves the role that we discussed before. If h0 were constant, we could never disentangle duration dependence and dynamic sorting. With variation in h0, the proportionality assumption ensures that, conditional on V, relative duration dependence is the same for all $x \in \mathcal{X}$, i.e.,

\[
\frac{h(t \mid x, V)}{h(t' \mid x, V)} = \frac{\psi(t)}{\psi(t')}
\]

for all $x \in \mathcal{X}$ (for t, t′, x, V such that h(t|x,V), h(t′|x,V) > 0). Duration dependence caused by heterogeneity, on the other hand, “interacts” with the covariates. In particular, it is easy to check that h(t|x)/h(t′|x) typically varies with h0(x). The degree of dynamic sorting varies with the overall level of the hazards. This is the key to the identification of the MPH model.
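As an illustrative worked case (an assumption made here for exposition, not part of the slide), take gamma mixing with $\mathcal{L}_F(s) = (1 + s/\alpha)^{-\rho}$, so that $G(t \mid x) = (1 + z(t) h_0(x)/\alpha)^{-\rho}$. Then

\[
h(t \mid x) = -\frac{\partial \log G(t \mid x)}{\partial t} = \frac{\rho\, \psi(t) h_0(x)}{\alpha + z(t) h_0(x)},
\qquad
\frac{h(t \mid x)}{h(t' \mid x)} = \frac{\psi(t)}{\psi(t')} \cdot \frac{\alpha + z(t') h_0(x)}{\alpha + z(t) h_0(x)}.
\]

The second factor depends on h0(x) whenever z(t) ≠ z(t′), which is the interaction between dynamic sorting and the covariates described above.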


The finite-mean assumption on the mixing distribution is an arbitrary normalization that, unlike the two scale normalizations, is not innocuous. We provide some discussion in the next section.

The model is clearly overidentified under the assumptions made. In the third step of the proof, we can identify z for any given $x \in \mathcal{X}$. Because each x must deliver the same z, this provides overidentifying restrictions.


The Role Of The Finite-Mean Assumption

We will now study the finite-mean assumption in some detail.

Recall that we can represent the MPH model $(z, h_0, \mathcal{L}_F)$ for T|X as

\[
\log(z(T)) = -\log h_0(X) + U, \tag{2}
\]

with U = − log(V) + ε independent of X.

Here, we assume that V > 0.

For expositional convenience, we focus on the linear index case that h0(X) = exp(X′β).

Then,

\[
\log(z(T)) = -X'\beta + U. \tag{3}
\]

The discussion below also applies to the more general case (2).


To understand how the MPH structure on U and the finite-mean assumption aid identification, first suppose that we do not impose this structure.

So, we do not require that U is the sum of two independent random variables, one of which has a type-I extreme value distribution.

This gives the GAFT model (Ridder, 1990).

For identification, we obviously need location normalizations on two of log(z(T)), log(h0(X)) and U.

This corresponds to the two scale normalizations on the original MPH model.

So, suppose that X does not include a constant and set E[U] = 0.


It is easy to see that, even with these normalizations, the model is not identified.

If we multiply both sides of (3) by some positive constant γ > 0, then we have

\[
\gamma \log(z(T)) = -X'(\gamma\beta) + \gamma U,
\]

which can be rewritten as

\[
\log(z^*(T)) = -X'\beta^* + U^*, \tag{4}
\]

with $z^*(T) = z(T)^\gamma$, $\beta^* = \gamma\beta$ and $U^* = \gamma U$.

$z^*$ is an increasing function such that $z^*(0) = 0$, and $E[U^*] = 0$, so that, for each γ, (4) satisfies the location normalizations and is observationally equivalent to (3).


We can only achieve identification by imposing an additional scale normalization on the model in logs.

This normalization is typically imposed by assuming that there is a regressor with a nonzero regression coefficient, and normalizing this coefficient (e.g., Horowitz, 1996).

Without loss of generality, assume that the first regressor X1 in X has a nonzero coefficient, say β1.

Then, we can impose the additional normalization by setting |β1| = 1.


We can further clarify this with an example.

Suppose that z has the Weibull form, so that $z(T) = \psi_0 T^{\psi_1}$.

Then, we can rewrite (3) as

\[
\log(T) = -\frac{\log(\psi_0)}{\psi_1} - X'\left(\frac{\beta}{\psi_1}\right) + \frac{U}{\psi_1}. \tag{5}
\]

This is a standard linear regression model.

Heckman and Borjas (1980) analyze this model.


First, note that we do not normalize z by fixing ψ0, as we have already imposed location normalizations on log(h0) and U.

So, − log(ψ0)/ψ1 assumes the role of the constant, and no constant is included in X.

Second, the assumptions that U ⊥⊥ X and E[U] = 0 ensure that the linear regression assumption E[U|X] = 0 holds.

Of course, in this particular case, E[U|X] = 0 would be sufficient and independence of X and U is not required.


Without further assumptions, the model is not identified.

In particular, by multiplying the original regression equation by γ > 0, we can generate observationally equivalent Weibull models with any shape parameter γψ1.

For example, one interpretation of (5) is that it corresponds to a Weibull model with shape parameter $\psi_1^* = 1$, regression parameter $\beta^* = \beta/\psi_1$, and error term $U^* = U/\psi_1$.

So, ψ1 is not identified.


Now, impose the additional scale normalization on X′β by assuming that the first regressor X1 in X has a nonzero coefficient and setting |β1| = 1.

Then, ψ1 is identified as the reciprocal of the absolute value of the regression coefficient of log(T) on X1, the first regressor.

After all, taking β1 = 1, we then have the model

\[
\log(T) = -\frac{\log(\psi_0)}{\psi_1} - X'\beta^* + U^* \tag{6}
\]

with $\beta^* = (1/\psi_1, \beta_2/\psi_1, \ldots, \beta_m/\psi_1)'$ and $U^* = U/\psi_1$, where $\beta_i$ is the coefficient on the i-th regressor in X in the original model.
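A small simulation can illustrate how the normalization pins down the shape parameter. This is a sketch under assumptions made up for the example: a single regressor with β1 = 1, hypothetical values ψ0 = 0.5 and ψ1 = 2, and a standard normal U (any mean-zero error independent of X would do, since no MPH structure is imposed on U here).

```python
import math
import random

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
psi0, psi1 = 0.5, 2.0      # hypothetical Weibull parameters
n = 20000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]   # mean-zero U, independent of X

# Log-linear Weibull regression with a single regressor and beta_1 = 1
log_t = [-math.log(psi0) / psi1 - xi / psi1 + ui / psi1 for xi, ui in zip(x1, u)]

slope = ols_slope(x1, log_t)
psi1_hat = 1.0 / abs(slope)
print(slope, psi1_hat)   # slope near -1/psi1, psi1_hat near psi1
```

Without the normalization |β1| = 1, the same regression slope would be consistent with any pair (β1, ψ1) that has the same ratio β1/ψ1.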


A disadvantage of not imposing the MPH structure on U is that we lose the appealing interpretation of the model in terms of duration dependence and heterogeneity.

Imposing a direct scale normalization on X′β is equivalent to directly picking one particular γ, where γ indexes the class of observationally equivalent models.

In the Weibull example, this boils down to directly picking the shape parameter ψ1.

In the previous subsection, we have seen that we can actually identify this parameter, and distinguish duration dependence and dynamic sorting, if we instead impose the MPH structure on U and a finite-mean assumption.


So, now suppose that U = − log(V) + ε, or equivalently exp(U) = E/V, with V a positive random variable such that E[V] < ∞ and E = exp(ε) a unit exponential random variable independent of V.

We will now show that, for γ ≠ 1, γU does not have the MPH structure (with the finite-mean requirement) if U does.


In particular, we have the following result due to Ridder (1990).

Proposition 12

If U is a random variable such that U = − log(V) + ε, where V is a positive random variable with E[V] < ∞ and ε is a random variable with a type-I extreme value distribution such that ε ⊥⊥ V, then

for 0 < γ < 1, γU is a random variable that cannot be written as − log(W) + ε for some positive random variable W such that W ⊥⊥ ε, and

for γ > 1, γU is a random variable that can only be written as − log(W) + ε for some positive random variable W such that W ⊥⊥ ε that has E[W] = ∞.


Proof (sketch)

First, note that we can write exp(U) = E/V, with E = exp(ε) unit exponential and independent of V. So, the MPH structure on U is equivalent to

\[
\Pr(\exp(U) > u) = \Pr(E > uV) = \int_0^\infty \exp(-uv)\, dF(v) = \mathcal{L}_F(u).
\]

So, for γ > 0, the distribution of $\exp(\gamma U) = \exp(U)^\gamma$ satisfies

\[
\Pr(\exp(U)^\gamma > u) = \Pr(\exp(U) > u^{1/\gamma}) = \mathcal{L}_F(u^{1/\gamma}).
\]

Second, for γU to have an MPH structure, it should be true that we can write

\[
\mathcal{L}_F(u^{1/\gamma}) = \mathcal{L}_{F^*}(u)
\]

for the distribution $F^*$ of some positive random variable W. By Proposition 7, this requires that $\mathcal{L}_F(u^{1/\gamma})$ is completely monotone as a function of u. This can be shown to hold for γ > 1, but not for γ < 1 (see Ridder, 1990).


Proof (sketch, cont’d.)

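Finally, if γ > 1, then

\[
E[W] = -\lim_{u \downarrow 0} \mathcal{L}_{F^*}'(u) = -\lim_{u \downarrow 0} \frac{d\mathcal{L}_F(u^{1/\gamma})}{du} = -\lim_{u \downarrow 0} \frac{u^{(1-\gamma)/\gamma}}{\gamma} \mathcal{L}_F'(u^{1/\gamma}) = \infty,
\]

as $E[V] = -\mathcal{L}_F'(0+) > 0$.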


This proposition says that, if U satisfies both the MPH structure and the finite-mean assumption, then γU does not possess the MPH structure if 0 < γ < 1, and possesses the MPH structure, but with an unobservable that has an infinite mean, if γ > 1.

So, together, the MPH assumption and the finite-mean assumption pin down γ (to be equal to 1) without a direct scale normalization on X′β.
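A simple special case, added here as an illustration, makes the proposition concrete. Let V ≡ 1, so that $\mathcal{L}_F(u) = e^{-u}$ and E[V] = 1. Then $\mathcal{L}_F(u^{1/\gamma}) = \exp(-u^{1/\gamma})$. For γ = 1/2,

\[
\frac{d^2}{du^2} \exp(-u^2) = (4u^2 - 2)\exp(-u^2) < 0 \quad \text{for } 0 < u < 1/\sqrt{2},
\]

so the function is not completely monotone (complete monotonicity requires a nonnegative second derivative) and cannot be a Laplace transform. For γ > 1, $\exp(-u^{1/\gamma})$ is the Laplace transform of a positive stable distribution, and

\[
-\frac{d}{du} \exp(-u^{1/\gamma}) = \frac{u^{(1-\gamma)/\gamma}}{\gamma} \exp(-u^{1/\gamma}) \to \infty \quad \text{as } u \downarrow 0,
\]

so the implied W has an infinite mean.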


Alternative Tail Conditions On The Mixing Distribution

The moments of a random variable are closely related to the right tail of its distribution function.

In particular, a finite mean of V requires that the tail of F is sufficiently thin.

Heckman and Singer (1984a) show that the MPH model can be identified without assuming that V has a finite mean, but by imposing an alternative tail condition on F.

As Ridder (1990) points out, in terms of Proposition 12 this boils down to picking one of the models with infinite mean, i.e., corresponding to one particular γ > 1, instead of the model corresponding to γ = 1.


One peculiar feature of the assumption of Heckman and Singer (1984a) is that it excludes a degenerate distribution F, i.e., the absence of unobserved heterogeneity.

After all, the support of V cannot be concentrated on ∞, so that E[V] cannot equal ∞ in this case.

This seems to imply that, under the assumptions of Heckman and Singer (1984a), the MPH model does not contain the PH model as a special case.

This is not true, however.


One mixing distribution included by Heckman and Singer (1984a) is the positive stable distribution, with Laplace transform $\mathcal{L}_F(s) = \exp(-s^{1/\gamma})$, for some given parameter γ > 1.

In the corresponding MPH model,

\[
\Pr(T > t \mid X) = \mathcal{L}_F\bigl(z(t) e^{X'\beta}\bigr) = \exp\bigl(-z(t)^{1/\gamma} e^{X'\beta/\gamma}\bigr)
\]

and

\[
h(t \mid X) = -\frac{d \log \Pr(T > t \mid X)}{dt} = \frac{\psi(t) z(t)^{(1-\gamma)/\gamma}}{\gamma}\, e^{X'\beta/\gamma}.
\]

It appears that the implied model for T|X is a PH model, with baseline hazard $\psi^*(t) = \psi(t) z(t)^{(1-\gamma)/\gamma}/\gamma$ and covariate function $h_0^*(X) = \exp(X'\beta/\gamma)$.
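The proportional-hazards property of the mixture can be verified numerically. The sketch below uses hypothetical choices that are not from the slides (γ = 2, a scalar covariate with β = 0.6, and z(t) = t^1.5) and differentiates the log survival function by finite differences. If the implied model is PH, the hazard ratio between two covariate values should not vary with t.

```python
import math

GAMMA, BETA = 2.0, 0.6          # hypothetical stable index and coefficient
z = lambda t: t ** 1.5          # hypothetical integrated baseline hazard

def survival(t, x):
    """Pr(T > t | X = x) = exp(-z(t)^(1/gamma) * exp(x * beta / gamma))."""
    return math.exp(-z(t) ** (1.0 / GAMMA) * math.exp(x * BETA / GAMMA))

def hazard(t, x, dt=1e-6):
    """h(t|x) = -d log Pr(T > t | x) / dt by central finite differences."""
    return -(math.log(survival(t + dt, x)) - math.log(survival(t - dt, x))) / (2.0 * dt)

ratios = [hazard(t, 1.0) / hazard(t, 0.0) for t in (0.5, 1.0, 2.0, 4.0)]
expected = math.exp(BETA / GAMMA)
print(ratios, expected)   # every ratio should be close to exp(beta/gamma)
```

The constant ratio equals exp(β/γ) rather than exp(β), so the covariate effect on the observed hazard is attenuated relative to the effect on the individual hazard.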


So, the model with a positive stable F, which is a model in the class considered by Heckman and Singer (1984a), is observationally equivalent to the MPH model with degenerate F.

Both models imply a PH model for T|X.

The latter model satisfies the finite-mean assumption and is excluded by the tail condition imposed by Heckman and Singer.

This further illustrates the role of the assumptions on the tail of F in identifying the MPH model.


Even though both a degenerate F and a positive stable F lead to a PH model for T|X, the two models have different interpretations.

In the first MPH model, all heterogeneity enters through variation in X, as V is degenerate.

The second model, however, has heterogeneity in V, and mixes over a different distribution of T|X,V than the first model.


So, if we interpret T|X,V as the duration distribution at the level of the individual, then the two models lead to substantially different conclusions, although they are empirically indistinguishable.

In this sense, the finite-mean assumption, or the alternative assumption of Heckman and Singer, is not innocuous.

It is, however, unavoidable in single-spell data.


Finally, note that it can actually be shown that MPH models with a degenerate F and a positive stable F are the only MPH models leading to a PH model for T|X (see Van den Berg, 2001).


Semi- And Non-Parametric Estimation

Partial likelihood estimation of the PH model

Motivation

Suppose we have a sample ((T1, X1), …, (Tn, Xn)) of n observations that are randomly drawn from the distribution G(t|X) conditional on n values Xi of the covariates X.

For now, we assume there is no censoring, and that the covariates are not time-varying.

Let G(t|X) be given by the PH model, so that G(t|X) = exp(−z(t)h0(X)).


Note that we can decompose the duration information, i.e., (T1, …, Tn), into a rank statistic and an order statistic.

To this end, order the observations from the lowest to the highest, and denote the i-th smallest duration by $T_{(i)}$.

The vector of ordered durations $(T_{(1)}, \ldots, T_{(n)})$ is called the order statistic corresponding to (T1, …, Tn).

The vector ((1), …, (n)) of the corresponding indices is called the rank statistic.


For example, if T4 is the smallest duration, then (1) = 4 and $T_{(1)} = T_4$.

Together, the order and rank statistics contain the same information as (T1, …, Tn).

If we only had the order statistic, we could not link the durations to the (unordered) covariates.

Note that one of the elements of the rank statistic ((1), …, (n)) is superfluous.

After all, if we know ((1), …, (n − 1)), (n) simply follows as the single remaining index.
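The decomposition can be made concrete with a few lines of code. The durations below are made-up numbers, and the helper name `order_and_rank` is hypothetical. Indices are 1-based to match the notation (1), …, (n).

```python
def order_and_rank(durations):
    """Return (order statistic, rank statistic) using 1-based indices."""
    rank = sorted(range(1, len(durations) + 1), key=lambda i: durations[i - 1])
    order = [durations[i - 1] for i in rank]
    return order, rank

durations = [3.2, 5.0, 4.1, 0.7]     # hypothetical T_1, ..., T_4
order, rank = order_and_rank(durations)
print(order)  # [0.7, 3.2, 4.1, 5.0]
print(rank)   # [4, 1, 3, 2]

# The two statistics together recover the original sample.
recovered = [None] * len(durations)
for t, i in zip(order, rank):
    recovered[i - 1] = t
print(recovered == durations)  # True
```

Here T4 is the smallest duration, so (1) = 4, and the last entry of the rank statistic is determined by the first three.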


It is intuitively clear that, in the PH model, the ranking of durations is affected by the covariates, or actually the values of h0(Xi), but not by the shape of the baseline hazard ψ, which is common to all observations.

The order statistic (the actual durations observed), however, clearly depends on both.

This suggests that we can estimate the parameters of h0(X) without making any assumptions on the baseline hazard by using the rank information only.

This is the idea underlying Cox's (1972, 1975) partial likelihood (PL).


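As an example, suppose that n = 2, so that we observe $(\mathbf{T}, \mathbf{X}) = ((T_1, X_1), (T_2, X_2))$.

The (conditional) likelihood $L(h_0, z; \mathbf{T} \mid \mathbf{X})$ of (h0, z) can be decomposed as

\[
\begin{aligned}
L(h_0, z; \mathbf{T} \mid \mathbf{X})
&= L^r_1(h_0; (1) \mid T_{(1)}, \mathbf{X})\, L^o_1(h_0, z; T_{(1)}, \mathbf{X}) \\
&\quad \times L^r_2(h_0; (2) \mid (1), T_{(1)}, T_{(2)}, \mathbf{X})\, L^o_2(h_0, z; T_{(2)} \mid (1), T_{(1)}, \mathbf{X}) \\
&= L^r_1(h_0; (1) \mid T_{(1)}, \mathbf{X})\, L^o_1(h_0, z; T_{(1)}, \mathbf{X})\, L^o_2(h_0, z; T_{(2)} \mid (1), T_{(1)}, \mathbf{X}),
\end{aligned}
\]

where $L^r_i$ and $L^o_i$ are the i-th rank and order contributions, respectively.

The second equality follows because (2) is degenerate given (1), as discussed earlier.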


Following the intuition above, we have anticipated that $L^r_i$ does not depend on z (see below).

By singling out the rank contributions, we get the PL corresponding to the rank information,

\[
L^r(h_0; \mathbf{T} \mid \mathbf{X}) = L^r_1(h_0; (1) \mid T_{(1)}, \mathbf{X}).
\]

In this particular case, the partial likelihood has a proper probabilistic interpretation as a conditional likelihood.

$L^r$ is the probability of the observed ranking of T1 and T2 given the smallest duration and the covariates.


Now, note that $\psi(t) h_0(X_1) \exp(-z(t)(h_0(X_1) + h_0(X_2)))$ is the sub-density $-d\Pr(T_1 > t, T_2 > T_1)/dt$ of T1 and T2 > T1, and that $\psi(t) h_0(X_2) \exp(-z(t)(h_0(X_1) + h_0(X_2)))$ is the sub-density of T2 and T1 > T2.


Also, recall that

\[
\psi(t)\,(h_0(X_1)+h_0(X_2))\exp(-z(t)(h_0(X_1)+h_0(X_2)))
\]

is the density of $\min\{T_1,T_2\}|\mathbf{X}$.

So, the probability of the observed ranking of $T_1$ and $T_2$ given the smallest duration and the covariates follows from

\[
\Pr\big((1) = i \,|\, T_{(1)} = t, \mathbf{X}\big)
= \frac{\psi(t)h_0(X_i)\exp(-z(t)(h_0(X_1)+h_0(X_2)))}{\psi(t)\,(h_0(X_1)+h_0(X_2))\exp(-z(t)(h_0(X_1)+h_0(X_2)))}
= \frac{h_0(X_i)}{h_0(X_1)+h_0(X_2)},
\]

which indeed does not depend on $z$.

As we will see next, this result generalizes to the case of $n$ observations, with time-varying covariates and censoring.
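
The two-observation result can be checked by simulation. The sketch below is only an illustration under stated assumptions: it takes a hypothetical integrated baseline z(t) = t**alpha and hypothetical values for h0(X1) and h0(X2), and it estimates the unconditional probability that spell 1 ends first. Because the conditional probability derived above does not depend on t, the unconditional probability is the same ratio.

```python
import random


def simulate_rank_probability(h1, h2, alpha, n_draws=200_000, seed=0):
    """Fraction of simulated pairs in which spell 1 ends before spell 2.

    Assumption for illustration: survival is exp(-z(t) * h) with
    z(t) = t**alpha, so T = (E / h)**(1 / alpha) with E unit exponential.
    """
    rng = random.Random(seed)
    first = 0
    for _ in range(n_draws):
        t1 = (rng.expovariate(1.0) / h1) ** (1.0 / alpha)
        t2 = (rng.expovariate(1.0) / h2) ** (1.0 / alpha)
        first += t1 < t2
    return first / n_draws


h1, h2 = 2.0, 0.5                      # hypothetical values of h0(X1), h0(X2)
theory = h1 / (h1 + h2)                # h0(X1) / (h0(X1) + h0(X2))
for alpha in (0.5, 1.0, 3.0):          # three different baselines
    print(alpha, simulate_rank_probability(h1, h2, alpha), theory)
```

Changing alpha changes the duration distribution but, up to simulation noise, not the frequency with which spell 1 is ranked first.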


General Cox’ (1975) Partial Likelihood

Before extending the analysis of the previous subsection to the PH model with time-varying covariates and censoring, it is useful to briefly discuss general PL as envisioned by Cox (1975).

This also reiterates the previous discussion of likelihood construction.


Suppose that we have data $\mathbf{Y}$ that are a draw from a joint probability density $q(\mathbf{Y}; \xi, \eta)$, where $\xi$ is again a parameter of interest and $\eta$ a nuisance parameter.

The corresponding full information likelihood is given by $L(\xi, \eta; \mathbf{Y})$.

Suppose that $\mathbf{Y}$ can be represented by a vector $((A_1,B_1), \ldots, (A_n,B_n))$.

Denote $A^i = (A_1, \ldots, A_i)$, $B^i = (B_1, \ldots, B_i)$, $\mathbf{A} = A^n$, and $\mathbf{B} = B^n$.


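As before, and in obvious (abuse of) notation, the full information likelihood can be decomposed as

\[
\begin{aligned}
L(\xi, \eta; \mathbf{Y}) = L(\xi, \eta; \mathbf{A},\mathbf{B})
&= \prod_{i=1}^{n} L_i\big(\xi, \eta; A_i, B_i \,|\, A^{i-1}, B^{i-1}\big) \\
&= \prod_{i=1}^{n} L^r_i\big(\xi, \eta; A_i \,|\, A^{i-1}, B^{i}\big)\, L^o_i\big(\xi, \eta; B_i \,|\, A^{i-1}, B^{i-1}\big) \\
&= L^r(\xi, \eta; \mathbf{A},\mathbf{B})\, L^o(\xi, \eta; \mathbf{A},\mathbf{B}),
\end{aligned}
\]

where

\[
L^r(\xi, \eta; \mathbf{A},\mathbf{B}) = \prod_{i=1}^{n} L^r_i\big(\xi, \eta; A_i \,|\, A^{i-1}, B^{i}\big)
\quad\text{and}\quad
L^o(\xi, \eta; \mathbf{A},\mathbf{B}) = \prod_{i=1}^{n} L^o_i\big(\xi, \eta; B_i \,|\, A^{i-1}, B^{i-1}\big).
\]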


Now, if $L^r$ is not informative on the nuisance parameter $\eta$, so that

\[
L^r(\xi, \eta; \mathbf{A},\mathbf{B}) = L^r(\xi; \mathbf{A},\mathbf{B}),
\]

then it is called the partial likelihood of $\xi$ based on $\mathbf{A}$.

As we have discussed before, a partial likelihood is, in general, not a proper (full, marginal or conditional) likelihood function.

However, for most purposes, it can be treated as one.


In particular, we can estimate the parameter of interest $\xi$ by the partial likelihood estimator,

\[
\hat{\xi}_r = \arg\sup_{e} L^r(e; \mathbf{A},\mathbf{B}),
\]

and the properties of $\hat{\xi}_r$ follow from standard likelihood theory (Cox, 1975, Kalbfleisch and Prentice, 1980, Andersen and Gill, 1982, and Andersen et al., 1993).

Recall that the PL estimator $\hat{\xi}_r$ will not be efficient if the omitted likelihood factor, $L^o$, is informative on $\xi$.

So, PL estimation allows us to avoid modeling $L^o$ and estimating the nuisance parameters $\eta$, but this comes at the cost of an efficiency loss if $L^o$ is informative on $\xi$.


Partial Likelihood Estimation Of The PH Model

Now, consider again the PH model, but with (possibly) time-varying covariates $\mathbf{X}(t) = (X_1(t), \ldots, X_n(t))$.

In particular, the hazard rate at time $t$ given the regressor path $\{\mathbf{X}(u); 0 \le u \le t\}$ depends only on $X_i(t)$, and only through a linear index, so that we can write

\[
\Pr\big(T_i \in [t, t+\Delta t) \,|\, T_i \ge t, \{\mathbf{X}(u); 0 \le u \le t\}\big)
= \psi(t)\exp\big(X_i(t)'\beta\big)\,\Delta t + o(\Delta t)
\tag{7}
\]

for $\Delta t \downarrow 0$.


Next, suppose we have a sample of $n$ durations of which $k$ are complete and $n - k$ censored.

Assuming that there are no ties among the complete durations, we can denote the ordered complete durations by $T_{(1)} < \cdots < T_{(k)}$.

Then, we can define an order statistic $(T_{(1)}, \ldots, T_{(k)})$ and a rank statistic $((1), \ldots, (k))$ for the complete durations only.


In terms of the previous subsection, let $B^i$ consist of (i) $(T_{(1)}, \ldots, T_{(i)})$, (ii) all covariate information, and (iii) all censoring information up to and including time $T_{(i)}$.

Let $A_i$ specify which observation is the $i$-th smallest complete duration in the sample.

More formally, $A_i$ contains the $i$-th element $(i)$ of the rank statistic.

Then, $(\mathbf{A},\mathbf{B})$ contains all the information in the data.


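Cox (1972, 1975) proposes using the PL based on the rank statistic, i.e., $\mathbf{A}$, to estimate $\beta$ without specifying $\psi$ or $z$.

In the notation of the last subsection, the full information likelihood can be decomposed as

\[
L(\beta, z; \mathbf{A},\mathbf{B}) = L^r(\beta, z; \mathbf{A},\mathbf{B})\, L^o(\beta, z; \mathbf{A},\mathbf{B})
\]

with

\[
L^r(\beta, z; \mathbf{A},\mathbf{B}) = \prod_{i=1}^{k} L^r_i\big(\beta, z; A_i \,|\, A^{i-1}, B^{i}\big).
\]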


In analogy with the example given at the start of this subsection, the individual contributions $L^r_i$ to $L^r$ are

\[
L^r_i\big(\beta, z; A_i \,|\, A^{i-1}, B^{i}\big)
= \Pr\big((i) \text{ fails at } T_{(i)} \,|\, (1), \ldots, (i-1), T_{(1)}, \ldots, T_{(i)}, \{\mathbf{X}(u); 0 \le u \le T_{(i)}\}, \text{cens.} \le T_{(i)}\big).
\tag{8}
\]

Here "cens. $\le T_{(i)}$" contains all information on censoring up to and including time $T_{(i)}$.


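Now, denote the risk set at time $t$, i.e., the set of the indices of all observations at risk at time $t$, by $R(t)$.

By (7) and random sampling, the joint probability that spell $i \in R(t)$ fails in $[t, t+\Delta t)$ and that none of the other spells in $R(t)$ fails in $[t, t+\Delta t)$, conditional on survival through $t$ and the covariate path $\{\mathbf{X}(u); 0 \le u \le t\}$, is given by

\[
\begin{aligned}
&\Pr\big(T_i \in [t, t+\Delta t) \,|\, T_i \ge t, \{\mathbf{X}(u); 0 \le u \le t\}\big) \\
&\qquad \times \prod_{j \in R(t): j \ne i} \Big[1 - \Pr\big(T_j \in [t, t+\Delta t) \,|\, T_j \ge t, \{\mathbf{X}(u); 0 \le u \le t\}\big)\Big] \\
&= \psi(t)\exp\big(X_i(t)'\beta\big)\,\Delta t \prod_{j \in R(t): j \ne i} \Big[1 - \psi(t)\exp\big(X_j(t)'\beta\big)\,\Delta t\Big] + o(\Delta t) \\
&= \psi(t)\exp\big(X_i(t)'\beta\big)\,\Delta t + o(\Delta t),
\end{aligned}
\tag{9}
\]

as $\Delta t \downarrow 0$.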


Note that this is also the joint probability that spell $i \in R(t)$ fails in $[t, t+\Delta t)$ and that exactly one of the spells in $R(t)$ fails in $[t, t+\Delta t)$, conditional on survival through $t$ and the covariate path $\{\mathbf{X}(u); 0 \le u \le t\}$.


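Similarly, the probability that exactly one of the spells in $R(t)$ fails in $[t, t+\Delta t)$, conditional on survival through $t$ and the covariate path $\{\mathbf{X}(u); 0 \le u \le t\}$, is given by

\[
\sum_{i \in R(t)} \Pr\big(T_i \in [t, t+\Delta t) \,|\, T_i \ge t, \{\mathbf{X}(u); 0 \le u \le t\}\big) + o(\Delta t)
= \sum_{i \in R(t)} \psi(t)\exp\big(X_i(t)'\beta\big)\,\Delta t + o(\Delta t),
\tag{10}
\]

as $\Delta t \downarrow 0$.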


The conditional probability of spell $i \in R(t)$ failing in $[t, t+\Delta t)$, conditional on exactly one of the spells in $R(t)$ failing in $[t, t+\Delta t)$, survival through $t$, and the covariate path $\{\mathbf{X}(u); 0 \le u \le t\}$, is given by dividing (9) by (10).

Letting $\Delta t \downarrow 0$, this gives the probability that $i \in R(t)$ fails at $t$ given that one in $R(t)$ fails at $t$, again conditioning on survival through $t$ and the covariate path $\{\mathbf{X}(u); 0 \le u \le t\}$,

\[
\frac{\psi(t)\exp\big(X_i(t)'\beta\big)}{\sum_{j \in R(t)} \psi(t)\exp\big(X_j(t)'\beta\big)}
= \frac{\exp\big(X_i(t)'\beta\big)}{\sum_{j \in R(t)} \exp\big(X_j(t)'\beta\big)}.
\tag{11}
\]


Now, note that the conditioning information in (8) contains information on the risk set at time $T_{(i)}$ (among other things).

Also, under independent censoring, we can ignore the censoring information in the conditioning set.


Then, (8) is simply the probability that $(i) \in R(T_{(i)})$ fails at $T_{(i)}$ given that one in $R(T_{(i)})$ fails at $T_{(i)}$, survival through $T_{(i)}$, the covariate path $\{\mathbf{X}(u); 0 \le u \le T_{(i)}\}$ and $T_{(i)}$, which, by (11), is

\[
L^r_i\big(\beta; A_i \,|\, A^{i-1}, B^{i}\big)
= \frac{\exp\big(X_{(i)}(T_{(i)})'\beta\big)}{\sum_{j \in R(T_{(i)})} \exp\big(X_j(T_{(i)})'\beta\big)}.
\]

Then,

\[
L^r(\beta; \mathbf{A},\mathbf{B})
= \prod_{i=1}^{k} \frac{\exp\big(X_{(i)}(T_{(i)})'\beta\big)}{\sum_{j \in R(T_{(i)})} \exp\big(X_j(T_{(i)})'\beta\big)}
\]

is indeed independent of $z$, and therefore a PL for $\beta$.

Again note that, in general, $L^r$ is not a proper (full, marginal or conditional) likelihood.


First, for this we would again need additional assumptions on the covariate and censoring processes.

Second, $L^r_i$ does not condition on the entire order statistic.

So, even if we make the appropriate assumptions on the covariates and censoring, we should not expect $L^r$ to be a likelihood based on the rank statistic conditional on the order statistic.
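
The partial likelihood derived above is straightforward to evaluate. The following is a minimal sketch, assuming time-constant covariates, no ties, and independent right censoring; the function name and argument layout are hypothetical choices for illustration. An observation censored exactly at a failure time is treated as still at risk, which is a convention assumed here rather than something the text specifies.

```python
import math


def log_partial_likelihood(beta, durations, completed, covariates):
    """Cox log partial likelihood (time-constant covariates, no ties).

    durations  : observed durations, completed or censored
    completed  : True if the spell ended, False if it was censored
    covariates : one list of regressors per observation
    """
    index = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, done) in enumerate(zip(durations, completed)):
        if not done:
            continue  # censored spells enter only through the risk sets
        # Risk set at t_i: every spell still under observation at t_i.
        risk = [j for j, t_j in enumerate(durations) if t_j >= t_i]
        total += index[i] - math.log(sum(math.exp(index[j]) for j in risk))
    return total


durations = [2.0, 5.0, 3.0]
completed = [True, False, True]
covariates = [[1.0], [0.0], [0.5]]
print(log_partial_likelihood([0.7], durations, completed, covariates))
# Any increasing transformation of the durations leaves the value unchanged,
# because only the ranking and the risk sets matter.
print(log_partial_likelihood([0.7], [t ** 2 for t in durations], completed, covariates))
```

Maximizing this function over beta (for example with a generic numerical optimizer) gives the PL estimator without any reference to the baseline.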


The Cox (1972, 1975) PL estimator of the PH model is given by

\[
\hat{\beta}_r = \arg\sup_{b} L^r(b; \mathbf{A},\mathbf{B}).
\]

The main advantage of PL estimation is that it avoids specifying the baseline.

However, once we have estimated $\beta$ by $\hat{\beta}_r$, we can estimate $z$ in a second stage.

First note that, with time-constant covariates $X$,

\[
G(t|X) = G_0(t)^{\exp(X'\beta)},
\]

where $G_0(t) = \exp(-z(t))$.

We observe the empirical equivalent of $G(t|X)$ and we can estimate $\exp(X'\beta)$ by $\exp(X'\hat{\beta}_r)$.


So, we can construct an NPML of $G_0$ for a given value $\hat{\beta}_r$ of $\beta$, and estimate $G_0$ as the corresponding NPMLE.

This gives a Kaplan-Meier-like estimator of $G_0$.

In a sense, the first-stage PL estimator is used to "homogenize" the data into data on $G_0$.

See Kalbfleisch and Prentice (1980, Section 4.3) for details.
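
As an illustration of the second stage, the sketch below computes the Breslow estimator of the integrated baseline, which adds 1 / sum_{j in R(T_(i))} exp(X_j' beta) at each completed duration. This is a closely related and commonly used estimator, swapped in here for simplicity; it is not the Kalbfleisch and Prentice NPMLE itself. Time-constant covariates, no ties, and the function name are assumptions of the sketch.

```python
import math


def breslow_baseline(beta, durations, completed, covariates):
    """Return [(t, z_hat(t), G0_hat(t))] at each completed duration.

    z_hat is the Breslow estimate of the integrated baseline hazard and
    G0_hat = exp(-z_hat) the implied baseline survival function.
    """
    index = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    z_hat = 0.0
    path = []
    for i in order:
        if not completed[i]:
            continue  # censored spells only shrink later risk sets
        risk_sum = sum(
            math.exp(index[j])
            for j, t_j in enumerate(durations)
            if t_j >= durations[i]
        )
        z_hat += 1.0 / risk_sum
        path.append((durations[i], z_hat, math.exp(-z_hat)))
    return path


# With beta = 0 the covariates drop out and the increments are 1/3, 1/2, 1.
for row in breslow_baseline([0.0], [1.0, 2.0, 3.0], [True, True, True], [[0.0]] * 3):
    print(row)
```

In practice beta would be replaced by the first-stage PL estimate, so that the weights exp(X_j' beta) "homogenize" the risk sets.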


Stratified Partial Likelihood Estimation

In general, we can use PL based on rank statistics if the ranking of observations depends on parameters of interest, but not on some nuisance parameters.

In the single-spell setting studied by Cox (1972), we use that the ranking of observations between individuals is (monotonically) affected by the regressor effects, but not by the common baseline.


Cox’s (1972) PL is not used very often in empirical economics, probably because it does not allow for unobserved heterogeneity in single-spell models.

The reason that we have given PL ample attention is that it can be applied much more generally.

If we have multiple spells for each individual, then we can use the ranking of the spells within individuals, and treat any common elements of the hazards within individuals as nuisance parameters.

This includes time-invariant observed and unobserved heterogeneity terms.

In particular, this allows for fixed individual effects.


Then, as long as we have variation in the covariates between individual spells, we can estimate the covariate effects by maximizing a stratified partial likelihood (SPL) based on the within-individual ranking of spells (Ridder and Tunalı, 1999).

We come back to this later.


Semi- And Non-Parametric Estimators Of The MPH Model

Non-Parametric Maximum Likelihood

One of the more well-known semi-parametric estimators of the MPH model is the NPML estimator (Kiefer and Wolfowitz, 1956, Lindsay, 1983a and 1983b, and Heckman and Singer, 1984).

Suppose that $h_0(X) = \exp(X'\beta)$ and $z(t) = z(t;\psi_1)$ are specified up to a finite-dimensional vector of parameters $(\beta, \psi_1)$, but that the distribution $F$ of $V$ is not parameterized.

If $L(\beta, \psi_1, F)$ again denotes the likelihood, then

\[
(\hat{\beta}, \hat{\psi}_1, \hat{F}) = \arg\sup_{p \in M} L(p)
\]

is the NPMLE of $(\beta, \psi_1, F)$.


Here, M is the (infinite-dimensional) “parameter” space.

Heckman and Singer (1984b) show that this NPMLE is consistent in the case that $z$ is Weibull, but they do not provide other asymptotic properties.

The NPMLE of $F$ is a discrete mixture, which explains the frequent references to Heckman and Singer (1984b) whenever discrete mixtures are used in the empirical duration literature.
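
To make the discrete-mixture structure concrete, the sketch below evaluates the log-likelihood of an MPH model when $V$ takes finitely many values. It is only the objective function for a fixed set of mass points, not the NPMLE algorithm itself. It assumes a Weibull baseline z(t) = t**alpha, time-constant covariates entering through a precomputed linear index, and independent right censoring; all names are hypothetical.

```python
import math


def mixture_log_likelihood(durations, completed, index, alpha, points, probs):
    """Log-likelihood of a Weibull MPH model with a discrete mixing distribution.

    index  : X_i' beta for each observation, so h0(X_i) = exp(index[i])
    alpha  : Weibull shape (assumed), z(t) = t**alpha, psi(t) = alpha * t**(alpha - 1)
    points : support points v_1, ..., v_m of V
    probs  : probabilities p_1, ..., p_m, summing to one
    """
    total = 0.0
    for t, done, xb in zip(durations, completed, index):
        h0 = math.exp(xb)
        contribution = 0.0
        for v, p in zip(points, probs):
            survival = math.exp(-(t ** alpha) * h0 * v)
            hazard = alpha * t ** (alpha - 1.0) * h0 * v
            # Completed spells contribute a density, censored spells a
            # survival probability, each mixed over the mass points.
            contribution += p * (hazard * survival if done else survival)
        total += math.log(contribution)
    return total


print(mixture_log_likelihood([2.0, 1.0], [True, False], [0.3, -0.2], 1.5,
                             points=[0.5, 2.0], probs=[0.4, 0.6]))
```

With a single mass point at 1 the expression collapses to the ordinary Weibull PH log-likelihood, which is a convenient sanity check.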


Some Other Semi-Parametric Estimators

Lenstra and Van Rooij (1998) provide a non-parametric estimator of the MPH model for the two-sample case (i.e., the case in which the regressors take two values).

They prove consistency, but give no further distributional results.

Horowitz (1999) provides a consistent semi-parametric estimator for a linear-index version of the model, and develops some asymptotic distribution theory.


In either case, the estimators do not have very nice properties.

In particular, the estimators of the parameters of interest are not $\sqrt{n}$-consistent.

Hahn (1994) establishes a general result for semiparametric estimation of the MPH model.

He studies an MPH model with a Weibull baseline and a linear index for the covariate function, and shows that no $\sqrt{n}$-consistent estimators of the covariate parameters $\beta$ and the Weibull shape parameter $\psi_1$ exist.

Klaassen and Lenstra (1998) extend this result to the non-parametric two-sample model.


However, we have seen that the GAFT model, with a linear index covariate function, a direct scale normalization on the covariate effects $\beta$, and a Weibull baseline, can be estimated by OLS.

So, in this case, we have $\sqrt{n}$-consistent and asymptotically normal estimators of $\beta$ and the Weibull shape parameter, $\psi_1$.

If we do not impose a scale normalization on $\beta$, but instead identify this scale from the MPH structure on the error term, we obviously have less information on this scale.

The results of Hahn (1994) and Klaassen and Lenstra (1998) show that we then have insufficient information for $\sqrt{n}$-consistent estimation.


It should be stressed that this is specific to the parameters mentioned.

It follows that, in the Weibull MPH model with a linear index covariate function, we can estimate $\beta/\psi_1$ by OLS, which is $\sqrt{n}$-consistent.

This is basically the same parameter that we estimate in the GAFT model.
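
A small simulation illustrates the OLS result. The sketch assumes the parameterization z(t) = t**psi1 and h0(x) = exp(x * beta) with a scalar covariate, and a gamma-distributed V; these choices are assumptions made for illustration only. Under them, $z(T)h_0(X)V$ is unit exponential, so $\ln T = (\ln E - X\beta - \ln V)/\psi_1$ and a regression of $\ln T$ on $X$ recovers $\beta/\psi_1$ up to sign.

```python
import math
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(3)
beta, psi1, n = 1.0, 1.5, 50_000       # hypothetical true values
x = [rng.random() for _ in range(n)]
log_t = []
for xi in x:
    v = rng.gammavariate(2.0, 0.5)     # unobserved heterogeneity V
    e = rng.expovariate(1.0)           # unit exponential integrated hazard
    # Solve t**psi1 * exp(xi * beta) * v = e for log t.
    log_t.append((math.log(e) - xi * beta - math.log(v)) / psi1)

print(ols_slope(x, log_t), -beta / psi1)
```

The slope is informative about the ratio only: rescaling beta and psi1 by a common factor, and adjusting the distribution of V accordingly, leaves the regression unchanged.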


Because of these estimation problems, the MPH model is usually estimated with fully parametric ML if only single-spell data are available.

Alternatively, researchers abandon the mixture interpretation, and estimate AFT or GAFT models.


It should be noted that the case for semi-parametric estimation is much stronger if we have multiple-spell data.

With such data, and under the assumption that the unobservables are the same between individual spells, we can get rid of the unobservables by using within-individual variation.

We have already briefly discussed one example, SPL estimation.


Hypothesis Testing And Residual Analysis

The Classical Tests

Of course, you can use the classical tests in this context as in any other.

The Wald test, for example, was already used in Meyer (1990), and can be applied to (i) tests on constant $\psi$ and (ii) joint tests on regressor parameters.

The Appendix provides some details.


The Information Matrix Test

Let $\mathbf{Y}$ be a sample of $n$ observations $(Y_1, \ldots, Y_n)$, and let $L(\xi; \mathbf{Y})$ be the likelihood of some parameter vector $\xi$ for that sample.

Suppose that $L(\xi; \mathbf{Y}) = \prod_{i=1}^{n} L_i(\xi; Y_i)$, where $L_i(\xi; Y_i)$ is the likelihood contribution of the $i$-th observation.

Denote the corresponding log-likelihood and log-likelihood contributions by $l(\xi; \mathbf{Y})$ and $l_i(\xi; Y_i)$, respectively.


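Note that

\[
\frac{\partial l_i(\xi; Y_i)}{\partial \xi} = \frac{1}{L_i(\xi; Y_i)} \frac{\partial L_i(\xi; Y_i)}{\partial \xi},
\]

so that

\[
\frac{\partial^2 l_i(\xi; Y_i)}{\partial \xi \partial \xi'}
= \frac{1}{L_i(\xi; Y_i)} \frac{\partial^2 L_i(\xi; Y_i)}{\partial \xi \partial \xi'}
- \frac{\partial l_i(\xi; Y_i)}{\partial \xi} \frac{\partial l_i(\xi; Y_i)}{\partial \xi'}.
\]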


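If the likelihood is correctly specified, i.e., if $L_i(\xi; Y_i)$ is also the (conditional) density of the actual data generating process for $i$, then the expectation of the first term in the right-hand side (with respect to this density) is

\[
E\left[\frac{1}{L_i(\xi; Y_i)} \frac{\partial^2 L_i(\xi; Y_i)}{\partial \xi \partial \xi'}\right]
= \int \frac{1}{L_i(\xi; y)} \frac{\partial^2 L_i(\xi; y)}{\partial \xi \partial \xi'} L_i(\xi; y)\, dy
= \frac{\partial^2}{\partial \xi \partial \xi'} \int L_i(\xi; y)\, dy = 0,
\]

under standard regularity conditions.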


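So, in that case we have that

\[
-E\left[\frac{\partial^2 l_i(\xi; Y_i)}{\partial \xi \partial \xi'}\right]
= E\left[\frac{\partial l_i(\xi; Y_i)}{\partial \xi} \frac{\partial l_i(\xi; Y_i)}{\partial \xi'}\right],
\]

which leads to the well-known information matrix (IM) equality,

\[
\mathcal{I} = -E\left[\frac{\partial^2 l(\xi; \mathbf{Y})}{\partial \xi \partial \xi'}\right]
= E\left[\frac{\partial l(\xi; \mathbf{Y})}{\partial \xi} \frac{\partial l(\xi; \mathbf{Y})}{\partial \xi'}\right].
\]

As the IM equality generally fails if $L$ is misspecified, it provides a basis for a test.

This IM test is a general misspecification test (White, 1982).

The exposition above closely follows Davidson and MacKinnon (1993, Section 16.9).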
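
The IM equality and its failure under misspecification can be seen in a one-parameter example. The sketch below assumes an exponential likelihood, $l_i(\theta; y) = \log\theta - \theta y$, with score $1/\theta - y$ and second derivative $-1/\theta^2$; the model choice and the function name are illustrative assumptions. It reports the sample average of the Hessian plus the squared score at the MLE, which should be close to zero only if the exponential model is correct.

```python
import random


def im_gap(sample):
    """Average of (second derivative + squared score) at the exponential MLE.

    Close to zero when the IM equality holds; algebraically this equals the
    sample variance minus the squared sample mean.
    """
    n = len(sample)
    theta_hat = n / sum(sample)                     # exponential MLE
    hessian = -1.0 / theta_hat ** 2
    outer = sum((1.0 / theta_hat - y) ** 2 for y in sample) / n
    return hessian + outer


rng = random.Random(1)
exponential = [rng.expovariate(2.0) for _ in range(100_000)]       # correctly specified
weibull = [rng.weibullvariate(1.0, 3.0) for _ in range(100_000)]   # shape 3: misspecified
print(im_gap(exponential))
print(im_gap(weibull))
```

A formal IM test would scale such a gap by an estimate of its sampling variance; that step is omitted here.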


The IM test can and should be applied to only a subset of the entries of the information matrix.

At the very least, one should exclude either the entries above or the entries below the diagonal, because of symmetry.

The IM test has been applied to the entries corresponding to the parameters of the discrete mixing distribution in an MPH model (e.g., Gunderson and Melino, 1990).

In this context, Chesher (1984) offers an appealing interpretation as a test on local parameter heterogeneity.

The Appendix provides details.


Other Tests

There exists a wide variety of tests that are specifically tailored to, for example, PH models.

Examples are the so-called log-rank tests.

See, e.g., Kalbfleisch and Prentice (1980) and Andersen et al. (1993).


Generalized Residual Analysis

We can exploit the fact that $\int_0^T h(u|X)\, du$ is a unit exponential random variable and define generalized errors as

\[
\varepsilon_i = 1 - \int_0^{T_i} h(u|X_i)\, du.
\]

Here, $T_i$ and $X_i$ are the $i$-th duration and covariates in the sample.

Note that $1 - \varepsilon_i$ is unit exponential, so that $E[\varepsilon_i] = 0$ and $\mathrm{Var}(\varepsilon_i) = 1$.

Then, we can define generalized residuals as

\[
e_i = 1 - \int_0^{T_i} \hat{h}(u|X_i)\, du,
\]

where $\hat{h}$ is the ML estimator of $h$.


Obviously, if $h$ is misspecified, then $1 - \varepsilon_i$ is not (necessarily) a unit exponential random variable.

This provides a basis for specification tests and residual analysis.

See Lancaster (1990, Ch. 11) for an extensive discussion and references.
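
A minimal sketch of a residual check follows. It assumes a Weibull PH hazard h(t|x) = alpha * t**(alpha - 1) * exp(x * beta) with a scalar covariate, so that the integrated hazard is t**alpha * exp(x * beta), and it plugs in known parameter values in place of ML estimates; both are simplifications for illustration. The residuals have mean near 0 and variance near 1 under the correct specification, and the variance drifts away from 1 when the shape parameter is wrong.

```python
import math
import random


def generalized_residuals(durations, covariates, alpha, beta):
    """e_i = 1 - integrated hazard for h(t|x) = alpha * t**(alpha-1) * exp(x*beta)."""
    return [1.0 - t ** alpha * math.exp(x * beta)
            for t, x in zip(durations, covariates)]


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(2)
alpha_true, beta_true = 2.0, 0.5       # hypothetical true parameters
covariates = [rng.choice([0.0, 1.0]) for _ in range(50_000)]
durations = [
    (rng.expovariate(1.0) / math.exp(x * beta_true)) ** (1.0 / alpha_true)
    for x in covariates
]

# Correct specification, then a model that wrongly imposes alpha = 1.
print(mean_and_variance(generalized_residuals(durations, covariates, alpha_true, beta_true)))
print(mean_and_variance(generalized_residuals(durations, covariates, 1.0, beta_true)))
```

With censored data the residuals would need the usual adjustment for censoring, which this sketch does not attempt.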


Appendices


Analyticity Of Completely Monotone Functions

Completely monotone functions are frequently encountered in the analysis of MPH models in the form of (derivatives of) Laplace transforms.

Definition 11

Let $\Omega$ be a nonempty open set in $\mathbb{R}^n$. A function $f : \Omega \to \mathbb{R}$ is absolutely monotone if it is nonnegative and has nonnegative continuous partial derivatives of all orders. $f$ is completely monotone if $f \circ m$ is absolutely monotone, where $m : x \in \{\omega \in \mathbb{R}^n : -\omega \in \Omega\} \mapsto -x$.


Note that for $n = 1$ this definition reduces to the familiar definitions in Widder (1946). The following result is very useful for identification analysis.

Proposition 13

Let $\Psi$ be a nonempty open connected set in $\mathbb{R}^n$ and let $f : \Psi \to \mathbb{R}$ and $g : \Psi \to \mathbb{R}$ be completely monotone. If $f$ and $g$ agree on a nonempty open set in $\Psi$, then $f \equiv g$.


Proof of Proposition 13

The proof exploits two facts that are well known for functions on $\mathbb{R}$: (i) completely monotone functions are real analytic and (ii) real analytic functions are uniquely determined by their values on an open set.

We need the following definition of real analyticity, adapted from Narasimhan (1971).

Definition 12

Let $\Omega$ be a nonempty open set in $\mathbb{R}^n$. The function $f : \Omega \to \mathbb{R}$ is real analytic if to each $\omega \in \Omega$ corresponds a power series in $x - \omega$ that converges to $f(x)$ for all $x$ in some neighborhood $U \subset \Omega$ of $\omega$.


Proof of Proposition 13 (cont’d.)

The following lemma is proven in Widder (1946) for the special case of $n = 1$ (Theorem 3a in Chapter IV). This lemma with $n = 1$ is sometimes called S. Bernstein’s Theorem (e.g., Krantz and Parks, 1992, Theorem 2.4.1). Here we prove it for general $n$.

Lemma 1

Let $\Omega$ be a nonempty open set in $\mathbb{R}^n$. If $f : \Omega \to \mathbb{R}$ is absolutely monotone, then $f$ is real analytic.


Proof of Proposition 13 (cont’d.)

Proof of Lemma 1

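Let $\omega \in \Omega$, and let $\rho > 0$ be such that $\omega + h \in \Omega$ for $h \in U_n(\rho) := \{\eta \in \mathbb{R}^n : (\eta'\eta)^{1/2} < \rho\}$. For functions $f : \mathbb{R}^n \to \mathbb{R}$ define

\[
D_i f(x) := \frac{\partial}{\partial x_i} f(x),
\]

where $x := (x_1, \ldots, x_n)$. Let $D$ be the $n \times 1$-vector $(D_1, \ldots, D_n)$, so that $Df(x) = \partial f(x)/\partial x$.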


Proof of Proposition 13 (cont’d.)

Proof of Lemma 1 (cont’d.)

By Taylor’s Theorem with exact remainder (e.g., Widder, 1961), we have that

\[
f(\omega + h) = \sum_{j=0}^{k} \frac{1}{j!} (h'D)^{j} f(\omega) + R_k(\omega, h),
\]

with

\[
R_k(\omega, h) = \int_0^1 \frac{(1-t)^k}{k!} (h'D)^{k+1} f(\omega + th)\, dt,
\]

for $h \in U_n(\rho)$.


Proof of Proposition 13 (cont’d.)

Proof of Lemma 1 (cont’d.)

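Now, take any $h := (h_1, \ldots, h_n) \in U_n(n^{-1/2}\rho)$. Define $a := \max\{|h_1|, \ldots, |h_n|\}$, and denote the $n \times 1$ unit vector by $e_n$. Note that $0 \le a < n^{-1/2}\rho$, which implies that $a e_n \in U_n(\rho)$. Take any $b \in \mathbb{R}$ such that $a < b < n^{-1/2}\rho$. Then,

\[
\begin{aligned}
0 \le |R_k(\omega, h)|
&\le \int_0^1 \frac{(1-t)^k}{k!} (|h|'D)^{k+1} f(\omega + th)\, dt \\
&\le a^{k+1} \int_0^1 \frac{(1-t)^k}{k!} (e_n'D)^{k+1} f(\omega + th)\, dt \\
&\le \left(\frac{a}{b}\right)^{k+1} \int_0^1 \frac{(1-t)^k}{k!} (b e_n'D)^{k+1} f(\omega + t b e_n)\, dt \\
&= \left(\frac{a}{b}\right)^{k+1} R_k(\omega, b e_n) \\
&\le \left(\frac{a}{b}\right)^{k+1} f(\omega + b e_n) \longrightarrow 0 \quad \text{as } k \to \infty.
\end{aligned}
\]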


Proof of Proposition 13 (cont’d.)

Obviously, Lemma 1 implies that if a function $f : \Omega \to \mathbb{R}$ is completely monotone on a nonempty open set $\Omega$ in $\mathbb{R}^n$, then $f$ is real analytic.

Narasimhan (1971) shows that if $f : \Psi \to \mathbb{R}$ is real analytic on a nonempty open connected set $\Psi$ in $\mathbb{R}^n$, and $f$ vanishes on a nonempty open subset of $\Psi$, then $f \equiv 0$ (Narasimhan, 1971, Proposition 1 in Chapter 1 and Remark 2 on page 4).

Proposition 13 now follows immediately, as the difference of two real analytic functions is real analytic.


The Classical Tests

Recall that $N(\mu, \Sigma)$ denotes a random vector that has a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$, and that "$\xrightarrow{D}$" denotes "converges in distribution to".

Also, let $\chi^2_l$ be a random variable that has a chi-square distribution with $l$ degrees of freedom.


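Proposition 14 (Wald test)

Suppose that $\hat{\beta}_1, \hat{\beta}_2, \ldots$ is an estimator sequence of $\beta \in \mathbb{R}^k$ such that

\[
\sqrt{n}(\hat{\beta}_n - \beta) \xrightarrow{D} N(0, \Sigma)
\]

and that we want to test $H_0 : r(\beta) = 0$ against some appropriate alternative, where $r : \mathbb{R}^k \to \mathbb{R}^l$ is differentiable in a neighborhood of $\beta$ and the rank of $R := \partial r(\beta)/\partial \beta'$ is $l \le k$. Then,

\[
n\, r(\hat{\beta}_n)' (R \Sigma R')^{-1} r(\hat{\beta}_n) \xrightarrow{D} \chi^2_l
\]

under $H_0$.

Proof

See any competent econometrics textbook.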


The Wald test is typically used to test on constant $\psi$ (no duration dependence), for joint tests on the parameters of $h_0$, etc.

To implement the test, we have to replace $R$ and $\Sigma$ by consistent estimates.
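
For a single restriction ($l = 1$) the Wald statistic reduces to a scalar ratio, which the sketch below computes. The function name, the numerical estimates, and the restriction $\beta_1 = \beta_2$ are hypothetical inputs chosen for illustration; in an application they would come from the fitted model.

```python
def wald_statistic(n, beta_hat, sigma_hat, r, gradient):
    """Wald statistic n * r(b)**2 / (R Sigma R') for one restriction (l = 1).

    r        : function giving the scalar restriction r(beta) = 0
    gradient : function giving R = dr/dbeta' as a list of length k
    """
    R = gradient(beta_hat)
    k = len(beta_hat)
    variance = sum(R[a] * sigma_hat[a][b] * R[b] for a in range(k) for b in range(k))
    return n * r(beta_hat) ** 2 / variance


beta_hat = [0.8, 0.5]                       # hypothetical estimates
sigma_hat = [[2.0, 0.5], [0.5, 1.0]]        # hypothetical estimate of Sigma
w = wald_statistic(400, beta_hat, sigma_hat,
                   r=lambda b: b[0] - b[1],
                   gradient=lambda b: [1.0, -1.0])
# 3.841 is the 5% critical value of a chi-square with 1 degree of freedom.
print(w, w > 3.841)
```

With several restrictions the scalar division becomes a quadratic form in the inverse of the estimated $R \Sigma R'$.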


The Wald test is only one of the three classical tests.

See your econometrics textbook for discussions of likelihood-ratio and Lagrange-multiplier (score) tests.


The Information-Matrix Test

Introduction

The information-matrix (IM) test is based on the IM equality,

$$\mathcal{I} = -E\left[\frac{\partial^2 l(\xi; Y)}{\partial\xi\,\partial\xi'}\right] = E\left[\frac{\partial l(\xi; Y)}{\partial\xi}\,\frac{\partial l(\xi; Y)}{\partial\xi'}\right],$$

which holds if the log likelihood $l = \log(L)$ is correctly specified (i.e., if $L$ is consistent with the data generating process), but generally fails under misspecification of $L$ (White, 1982).
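One way to see the IM equality at work is a small simulation. The sketch below assumes an exponential duration model with rate $\theta$ (a choice made here for illustration, not taken from the slides), for which the per-observation score is $1/\theta - t$ and the per-observation Hessian is $-1/\theta^2$. It computes the sample average of Hessian plus squared score at the MLE, once for correctly specified data and once for data generated with two-point unobserved heterogeneity in the rate. The function name, seed, and parameter values are hypothetical.

```python
import random


def im_gap_exponential(durations):
    """Sample average of (Hessian + squared score) for an exponential model at its MLE."""
    n = len(durations)
    theta_hat = n / sum(durations)       # MLE of the exponential rate
    gap = 0.0
    for t in durations:
        score = 1.0 / theta_hat - t      # d l_i / d theta
        hessian = -1.0 / theta_hat ** 2  # d^2 l_i / d theta^2
        gap += hessian + score ** 2
    return gap / n


rng = random.Random(312)
n = 20000

# Correctly specified: homogeneous exponential durations with rate 1.
correct = [rng.expovariate(1.0) for _ in range(n)]

# Misspecified: the rate is 0.5 or 2.0 with equal probability (neglected heterogeneity).
mixed = [rng.expovariate(rng.choice([0.5, 2.0])) for _ in range(n)]

gap_correct = im_gap_exponential(correct)
gap_mixed = im_gap_exponential(mixed)
print(gap_correct, gap_mixed)
```

In the correctly specified sample the average should be close to zero, whereas neglected heterogeneity pushes it clearly away from zero. This is the discrepancy the IM test is designed to detect.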

The IM test statistic is based on the difference between the entries of

$$-\frac{\partial^2 l(\xi; Y)}{\partial\xi\,\partial\xi'} \quad\text{and}\quad \frac{\partial l(\xi; Y)}{\partial\xi}\,\frac{\partial l(\xi; Y)}{\partial\xi'}.$$

It can and should be applied to only a subset of the entries.

At the very least, one should exclude either the entries above or the entries below the diagonal, because of symmetry.

We illustrate further reasons for excluding certain entries in the context of a particular application in the next section.

Given a set of entries, the corresponding differences can be combined in a quadratic distance measure in the usual way.

This gives the IM test statistic.

Under the null hypothesis that the model is correctly specified, this statistic is asymptotically chi-squared distributed.

Testing The Specification Of The Heterogeneity Distribution

Denote the $i$-th (quasi-)likelihood contribution by $L_i$.

Suppose that $L_i$, in turn, is a discrete mixture of a conditional likelihood contribution $L_i(V)$, where $V$ takes $m$ different values $v_1, \ldots, v_m$ with associated probabilities $0 < p_1, \ldots, p_m \le 1$.

Then, the $i$-th log-likelihood contribution $l_i$ is given by

$$l_i = \log(L_i) = \log\left(\sum_{j=1}^{m} p_j L_i(v_j)\right). \tag{12}$$

Denote $v = (v_1, \ldots, v_m)'$, $p = (p_1, \ldots, p_m)'$, and $\pi = (v', p')$.

We are interested in the (sub-)information matrix $\mathcal{I}_\pi$, containing only the rows and columns corresponding to the parameters in $\pi$.
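To make (12) concrete, here is a minimal Python sketch of the discrete-mixture log-likelihood. The conditional likelihood `cond_lik` is a hypothetical choice (an uncensored exponential duration with hazard $v$); the slides leave $L_i(v)$ general, so this kernel and the data are assumptions for illustration only.

```python
import math


def cond_lik(t, v):
    """Hypothetical conditional likelihood L_i(v): exponential duration with hazard v."""
    return v * math.exp(-v * t)


def mixture_loglik(durations, v, p):
    """Sample log-likelihood: sum over i of log(sum_j p_j * L_i(v_j)), as in (12)."""
    return sum(
        math.log(sum(pj * cond_lik(t, vj) for vj, pj in zip(v, p)))
        for t in durations
    )


durations = [0.3, 1.1, 2.5, 0.7]  # made-up durations
v = [0.5, 2.0]                    # points of support
p = [0.4, 0.6]                    # associated probabilities
print(mixture_loglik(durations, v, p))
```

When all points of support coincide, the mixture collapses to the homogeneous model, which is a convenient sanity check on any implementation.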

Under the hypothesis that the likelihood contributions $L_i$ are not misspecified, we have the following standard equivalence result concerning this information matrix,

$$\mathcal{I}_\pi = -E\left[\frac{\partial^2 l}{\partial\pi\,\partial\pi'}\right] = E\left[\frac{\partial l}{\partial\pi}\,\frac{\partial l}{\partial\pi'}\right], \tag{13}$$

where $l = \log(L) = \sum_{i=1}^{n} l_i$ is the log-likelihood.

White (1982) shows that this equality breaks down under the alternative hypothesis that the model is misspecified.

The information-matrix (IM) test is based on this equality.

Chesher (1984) shows that the IM test is equivalent to a test on neglected parameter heterogeneity.

Gunderson and Melino (1990) use this idea to flesh out an IM test on the parameters of a discrete mixing distribution (here: $\pi$).

According to Chesher's result, such a test can be seen as a test on additional neglected unobserved heterogeneity in the mixture model considered.

Gunderson and Melino state that this IM test reduces to a test with $m$ degrees of freedom.

We will establish this result first.

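Taking first derivatives of (12) yields

$$\frac{\partial l_i}{\partial p_j} = \frac{L_i(v_j)}{L_i} \tag{14}$$

and

$$\frac{\partial l_i}{\partial v_j} = \frac{p_j L_i'(v_j)}{L_i}, \tag{15}$$

for $j = 1, \ldots, m$, where $L_i'(v_j) = \partial L_i(v_j)/\partial v_j$, with obvious abuse of notation.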

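Second derivatives are given by

$$\frac{\partial^2 l_i}{\partial p_j\,\partial p_k} = -\frac{L_i(v_j)\,L_i(v_k)}{L_i^2}, \tag{16}$$

$$\frac{\partial^2 l_i}{\partial v_j\,\partial v_k} = -\frac{p_j p_k\,L_i'(v_j)\,L_i'(v_k)}{L_i^2}, \tag{17}$$

$$\frac{\partial^2 l_i}{\partial v_j^2} = \frac{p_j L_i''(v_j)}{L_i} - \left[\frac{p_j L_i'(v_j)}{L_i}\right]^2, \tag{18}$$

$$\frac{\partial^2 l_i}{\partial p_j\,\partial v_k} = -\frac{L_i(v_j)\,p_k L_i'(v_k)}{L_i^2} \tag{19}$$

and

$$\frac{\partial^2 l_i}{\partial p_j\,\partial v_j} = \frac{L_i'(v_j)}{L_i} - \frac{L_i(v_j)\,p_j L_i'(v_j)}{L_i^2}, \tag{20}$$

for $j, k = 1, \ldots, m$, $j \neq k$, where $L_i''(v_j) = \partial^2 L_i(v_j)/\partial v_j^2$.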
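Derivative formulas like these are easy to get wrong when coded, so a finite-difference check can be useful. The Python sketch below compares (15), (18) and (20) with central differences of $l_i$, treating the elements of $p$ as unrestricted parameters, as the derivations above do. The exponential kernel `lik` and its derivatives `lik1`, `lik2` are hypothetical choices made for illustration, as are the function names and numbers.

```python
import math


def lik(t, v):   # hypothetical L_i(v): exponential duration with hazard v
    return v * math.exp(-v * t)


def lik1(t, v):  # first derivative of L_i(v) with respect to v
    return (1.0 - v * t) * math.exp(-v * t)


def lik2(t, v):  # second derivative of L_i(v) with respect to v
    return (v * t * t - 2.0 * t) * math.exp(-v * t)


def loglik_i(t, v, p):
    """Log-likelihood contribution l_i from (12)."""
    return math.log(sum(pj * lik(t, vj) for vj, pj in zip(v, p)))


def analytic(t, v, p, j):
    """Return the right-hand sides of (15), (18) and (20) for component j."""
    li = sum(pk * lik(t, vk) for vk, pk in zip(v, p))
    d_v = p[j] * lik1(t, v[j]) / li                                            # (15)
    d_vv = p[j] * lik2(t, v[j]) / li - d_v ** 2                                # (18)
    d_pv = lik1(t, v[j]) / li - lik(t, v[j]) * p[j] * lik1(t, v[j]) / li ** 2  # (20)
    return d_v, d_vv, d_pv


def numeric(t, v, p, j, h=1e-4):
    """Central finite-difference approximations of the same three derivatives."""
    def f(dv, dp):
        v2, p2 = list(v), list(p)
        v2[j] += dv
        p2[j] += dp
        return loglik_i(t, v2, p2)

    d_v = (f(h, 0.0) - f(-h, 0.0)) / (2.0 * h)
    d_vv = (f(h, 0.0) - 2.0 * f(0.0, 0.0) + f(-h, 0.0)) / h ** 2
    d_pv = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)
    return d_v, d_vv, d_pv


t, v, p = 1.3, [0.5, 2.0], [0.4, 0.6]  # made-up observation and parameters
for j in range(len(v)):
    print(analytic(t, v, p, j), numeric(t, v, p, j))
```

The two tuples printed for each component should agree up to the finite-difference error.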

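Now, note that the (quasi-)ML estimator (QMLE) maximizes the sample (quasi-)log-likelihood

$$l = \sum_{i=1}^{n} l_i. \tag{21}$$

The IM test on the parameters in $\pi$ is based on the sample equivalent of the distance between the score and the Hessian representations of the information matrix, the average distance matrix

$$D = n^{-1}\left[\frac{\partial^2 l}{\partial\pi\,\partial\pi'} + \frac{\partial l}{\partial\pi}\,\frac{\partial l}{\partial\pi'}\right]. \tag{22}$$

Here, with some abuse of notation, the outer product in (22) stands for the sum of the individual outer products, $\sum_{i=1}^{n} (\partial l_i/\partial\pi)(\partial l_i/\partial\pi')$, which is how it is used in (27) and (28).

In the absence of misspecification we expect the elements of $D$, evaluated at the QMLE, to be close to 0, so the IM test is concerned with testing the distance of the elements of $D$ from 0 at the QMLE.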

Not all elements of $D$ should be included in this test.

First of all, symmetry of both estimates of the information matrix implies symmetry of $D$.

So, we should delete either the upper or the lower triangular part of $D$.

However, a more rigorous argument can be made, which leads to the conclusion that only the part of the diagonal of $D$ associated with the $m$ parameters in $v$ should be used in the IM test (which is also the result claimed by Gunderson and Melino, 1990).

First of all, we can directly conclude from (14) and (16) that

$$\frac{\partial^2 l_i}{\partial p\,\partial p'} + \frac{\partial l_i}{\partial p}\,\frac{\partial l_i}{\partial p'} = 0, \tag{23}$$

so that, a fortiori, the same holds for the corresponding part of $D$.

Similarly, we can use (15) and (17) to show that, for all $j, k = 1, \ldots, m$ with $j \neq k$,

$$\frac{\partial^2 l_i}{\partial v_j\,\partial v_k} + \frac{\partial l_i}{\partial v_j}\,\frac{\partial l_i}{\partial v_k} = 0, \tag{24}$$

which implies that the off-diagonal part of the block of $D$ corresponding to $v$ is identically equal to 0.

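A similar result can be derived for cross-derivatives with respect to $p$ and $v$, as for $j \neq k$

$$\frac{\partial^2 l_i}{\partial p_j\,\partial v_k} + \frac{\partial l_i}{\partial p_j}\,\frac{\partial l_i}{\partial v_k} = 0, \tag{25}$$

as can easily be seen from (14), (15), and (19).

For $j = k$, however, we find, using (20), that

$$\frac{\partial^2 l_i}{\partial p_j\,\partial v_j} + \frac{\partial l_i}{\partial p_j}\,\frac{\partial l_i}{\partial v_j} = \frac{L_i'(v_j)}{L_i}. \tag{26}$$

Thus, the corresponding elements of $nD$ are given by

$$\frac{\partial^2 l}{\partial p_j\,\partial v_j} + \frac{\partial l}{\partial p_j}\,\frac{\partial l}{\partial v_j} = \sum_{i=1}^{n} \frac{L_i'(v_j)}{L_i} = \frac{1}{p_j}\,\frac{\partial l}{\partial v_j} = 0, \tag{27}$$

where the last identity is implied by evaluation at the QMLE and the corresponding first-order conditions.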

Therefore, the only non-zero elements of $D$ left are those on the part of the diagonal corresponding to the points of support of the mixing distribution. They are given by

$$D_{jj} = n^{-1}\left[\frac{\partial^2 l}{\partial v_j\,\partial v_j} + \frac{\partial l}{\partial v_j}\,\frac{\partial l}{\partial v_j}\right] = n^{-1}\sum_{i=1}^{n} \frac{p_j L_i''(v_j)}{L_i}, \tag{28}$$

as can be seen from (18).

$D_{jj}$ denotes the $j$-th diagonal element of $D$.

We can stack the $m$ elements of $D$ that are nonzero in an $m \times 1$ column vector $d$, such that the $j$-th element of $d$ equals $D_{jj}$.

The IM test can now be constructed by first showing that, in the absence of misspecification, $\sqrt{n}\,d$ is asymptotically jointly normally distributed with mean 0, and then constructing a quadratic form in $\sqrt{n}\,d$ that is asymptotically $\chi^2$ distributed.
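The ingredients of this construction can be computed observation by observation. The Python sketch below returns, for one observation, the contribution $p_j L_i''(v_j)/L_i$ to each $D_{jj}$ together with the score with respect to $\pi = (v', p')$ from (14) and (15), and then averages the contributions to obtain $d$. The exponential kernel and all function names are hypothetical. In an application $v$ and $p$ would be the QMLE, whereas the values used here are arbitrary.

```python
import math


def lik(t, v):   # hypothetical L_i(v): exponential duration with hazard v
    return v * math.exp(-v * t)


def lik1(t, v):  # first derivative of L_i(v) with respect to v
    return (1.0 - v * t) * math.exp(-v * t)


def lik2(t, v):  # second derivative of L_i(v) with respect to v
    return (v * t * t - 2.0 * t) * math.exp(-v * t)


def im_contributions(t, v, p):
    """Return (d_i, s_i): the summands of (28) and the score with respect to (v', p')."""
    li = sum(pj * lik(t, vj) for vj, pj in zip(v, p))
    d_i = [pj * lik2(t, vj) / li for vj, pj in zip(v, p)]      # summands of (28)
    score_v = [pj * lik1(t, vj) / li for vj, pj in zip(v, p)]  # (15)
    score_p = [lik(t, vj) / li for vj in v]                    # (14)
    return d_i, score_v + score_p


def distance_vector(durations, v, p):
    """Average the per-observation contributions to obtain the m x 1 vector d."""
    rows = [im_contributions(t, v, p)[0] for t in durations]
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(v))]


durations = [0.3, 1.1, 2.5, 0.7]  # made-up durations
print(distance_vector(durations, [0.5, 2.0], [0.4, 0.6]))
```

Stacking the rows `d_i` and `s_i` over observations gives the raw material for estimating the variance of $\sqrt{n}\,d$.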

A Simple Form Of The IM Statistic

In its original form, the IM statistic involves third derivatives of the likelihood function.

Lancaster (1984) provides a version of the test that only involves first and second derivatives.

The resulting test statistic is easy to compute.

One should exercise some caution in using this IM test, though, as its finite-sample performance is generally believed to be dreadful.

Better alternatives, including bootstrapping, exist, but we will not spend time on them here.

See Davidson and MacKinnon (1993, Section 16.9) for discussion and references.

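Comment

Lancaster (1984) shows that, if the model is not misspecified, the covariance matrix of the IM distance vector $d$ depends only on first and second derivatives of $L$:

$$V = E[dd'] - E\left[d\,\frac{\partial l}{\partial\pi'}\right]\left(E\left[\frac{\partial l}{\partial\pi}\,\frac{\partial l}{\partial\pi'}\right]\right)^{-1} E\left[\frac{\partial l}{\partial\pi}\,d'\right]. \tag{29}$$

A consistent estimator can be found by replacing expectations $E$ by sample averages $n^{-1}\sum$, which gives

$$\hat V = n^{-1}\left(Y_1'Y_1 - Y_1'Y_2\,(Y_2'Y_2)^{-1}\,Y_2'Y_1\right), \tag{30}$$

where $Y_1$ is an $n \times m$ matrix of observations of $d'$, and $Y_2$ is an $n \times m$ matrix of observations of $\partial l/\partial\pi'$.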
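A minimal Python sketch of how (30) might be turned into a test statistic follows. It assumes that the quadratic form is $n\,\bar d'\hat V^{-1}\bar d$, where $\bar d$ is the vector of column means of $Y_1$; this is one natural reading of "a quadratic form in $\sqrt{n}\,d$", not a formula stated on the slides. The helpers `cross`, `solve` and `im_statistic` are hypothetical names, plain lists are used to keep the example self-contained, and the data are made up.

```python
def cross(a, b):
    """Return A'B for matrices stored as lists of rows (A is n x r, B is n x c)."""
    r, c = len(a[0]), len(b[0])
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(c)]
            for i in range(r)]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    Assumes A is square and nonsingular; no check is made.
    """
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for row in range(n):
            if row != col:
                f = aug[row][col]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def im_statistic(y1, y2):
    """Return (n * dbar' Vhat^{-1} dbar, Vhat), with Vhat computed as in (30)."""
    n, m = len(y1), len(y1[0])
    a11, a12 = cross(y1, y1), cross(y1, y2)
    a22, a21 = cross(y2, y2), cross(y2, y1)
    x = solve(a22, a21)  # (Y2'Y2)^{-1} Y2'Y1
    q = len(a22)
    proj = [[sum(a12[i][k] * x[k][j] for k in range(q)) for j in range(m)]
            for i in range(m)]
    v_hat = [[(a11[i][j] - proj[i][j]) / n for j in range(m)] for i in range(m)]
    d_bar = [sum(row[j] for row in y1) / n for j in range(m)]
    z = solve(v_hat, [[dj] for dj in d_bar])  # Vhat^{-1} dbar
    return n * sum(d_bar[j] * z[j][0] for j in range(m)), v_hat


# Toy inputs: four observations, one element in d, one score column.
y1 = [[1.0], [-1.0], [1.0], [1.0]]
y2 = [[1.0], [1.0], [-1.0], [-1.0]]
stat, v_hat = im_statistic(y1, y2)
print(stat, v_hat)
```

Under the assumption above, the statistic would be compared with a chi-square distribution with $m$ degrees of freedom, subject to the finite-sample caveats already mentioned.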
