In this chapter, the dependent variable Y is partly ...froelich.vwl.uni-mannheim.de/fileadmin/user_upload/froelich/... · 4. Corner Solutions and Censored Regression . Models (Ch

4. Corner Solutions and Censored Regression Models (Ch. 16)

In this chapter, the dependent variable Y is partly continuous but also has positive probability mass at one or more points.

Model:

latent variable: y*

observed variable: y = f(y*)

2 cases:

• data censoring: y* is censored above or below a certain value (a data problem)

• corner solution: the mass point lies in the nature of the topic; y* is a latent modelling device that cannot be observed

Example 1: Censored dependent variable

The latent variable y* has a quantitative meaning. In many surveys, wealth is recorded exactly only if it is below 100’000. Otherwise the survey states only that wealth is above 100’000, but not the value itself.

observation rule: wealth = min(wealth*, 100’000)

We are interested in $E(y^* \mid x)$, but y* is censored.

Example: y* is true wealth and $E(y^* \mid x) = x\beta$

But: y* is only observed if y*<100'000.

For y*≥100'000 wealth is coded as 100'000.

Observation rule: y = min(y*,100'000)

How to obtain consistent estimates of β with censored data?

Example 2: Corner Solution

Typical in microeconometrics: let y be the observed choice of an economic agent, where we observe a mass point at 0.

Examples: hours of labour supply, alcohol consumption, charitable contributions

The zeros result from utility-maximising behaviour.

This case is called a corner solution (i.e. not an interior solution).

In contrast to Example 1, the observability of the dependent variable is not a problem.

(For modelling we will often use a latent model with y* below)

We are interested in properties of the conditional distribution of y, e.g. $E(y \mid x)$ or $\Pr(y = 0 \mid x)$.

Why not assume that $E(y \mid x) = x\beta$ and use OLS?

- if $y \ge 0$, $E(y \mid x)$ cannot be linear in x

- OLS implies constant marginal effects

- predicted y can be negative

Censored Normal Regression Model (Tobit)

(16.3) $y_i^* = x_i\beta + u_i$, $u_i \mid x_i \sim N(0, \sigma^2)$ (x includes an intercept)

(16.4) $y_i = \max(0, y_i^*)$

Standard Tobit model or Tobit Type 1

If y* is censored, we are interested in the effect of x on y*: $E(y^* \mid x)$

For corner solution models: y* has no meaningful interpretation.

We are interested in $E(y \mid x)$, $E(y \mid x, y > 0)$ and $P(y = 0 \mid x)$.

Especially for corner solution problems:

avoid placing too much emphasis on the latent variable y*

Expected values

In models with a corner solution we are interested in $E(y \mid x)$ and $E(y \mid x, y > 0)$.

(16.8) $E(y \mid x) = P(y = 0 \mid x) \cdot 0 + P(y > 0 \mid x) \cdot E(y \mid x, y > 0) = P(y > 0 \mid x) \cdot E(y \mid x, y > 0)$

(16.9) $P(y > 0 \mid x) = E[1(y > 0) \mid x] = P(y^* > 0 \mid x) = P(u > -x\beta \mid x) = P(u/\sigma > -x\beta/\sigma \mid x) = \Phi(x\beta/\sigma)$

Hence we could estimate β/σ consistently by a probit of $1(y_i > 0)$ on $x_i$.

To derive $E(y \mid x, y > 0)$ we need the following property of the normal distribution:

if $z \sim N(0, 1)$, then $E(z \mid z > c) = \dfrac{\phi(c)}{1 - \Phi(c)}$;

then, for $u \sim N(0, \sigma^2)$,

$E(u \mid u > c) = \sigma\, E\left(\dfrac{u}{\sigma} \,\Big|\, \dfrac{u}{\sigma} > \dfrac{c}{\sigma}\right) = \sigma \left[\dfrac{\phi(c/\sigma)}{1 - \Phi(c/\sigma)}\right]$

Therefore

(16.10) $E(y \mid x, y > 0) = x\beta + E(u \mid u > -x\beta) = x\beta + \sigma \left[\dfrac{\phi(x\beta/\sigma)}{\Phi(x\beta/\sigma)}\right]$

because $1 - \Phi(-x\beta/\sigma) = \Phi(x\beta/\sigma)$.

The right-hand side of (16.10) is always positive.

$\lambda(c) = \phi(c)/\Phi(c)$ is called the inverse Mills ratio.
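The expressions (16.8) to (16.10) can be evaluated directly once the index xβ and σ are given. The following is a minimal sketch in Python using only the standard library; the function names (`inv_mills`, `prob_positive`, `cond_mean_positive`, `uncond_mean`) and the example values are illustrative assumptions, not part of the lecture material.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution: pdf = phi, cdf = Phi


def inv_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return _STD.pdf(c) / _STD.cdf(c)


def prob_positive(xb, sigma):
    """(16.9): P(y > 0 | x) = Phi(x*beta / sigma)."""
    return _STD.cdf(xb / sigma)


def cond_mean_positive(xb, sigma):
    """(16.10): E(y | x, y > 0) = x*beta + sigma * lambda(x*beta / sigma)."""
    return xb + sigma * inv_mills(xb / sigma)


def uncond_mean(xb, sigma):
    """(16.8): E(y | x) = P(y > 0 | x) * E(y | x, y > 0)."""
    return prob_positive(xb, sigma) * cond_mean_positive(xb, sigma)


if __name__ == "__main__":
    # Hypothetical index values x*beta with sigma = 2 (illustration only)
    for xb in (-2.0, 0.0, 1.5):
        print(xb, prob_positive(xb, 2.0),
              cond_mean_positive(xb, 2.0), uncond_mean(xb, 2.0))
```

Even for a negative index xβ the conditional mean stays positive, which illustrates the remark that the right-hand side of (16.10) is always positive.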

Marginal effects

If $x_j$ is a continuous variable, we have

$\dfrac{\partial E(y \mid x, y > 0)}{\partial x_j} = \beta_j + \beta_j \left[\dfrac{d\lambda}{dc}(x\beta/\sigma)\right]$,

if $x_j$ is not functionally related to another explanatory variable.

The derivative of $\lambda(c)$ is $\dfrac{d\lambda}{dc}(c) = -\lambda(c)[c + \lambda(c)]$

Hence

(16.11) $\dfrac{\partial E(y \mid x, y > 0)}{\partial x_j} = \beta_j \{1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]\}$

The marginal effect is not only $\beta_j$, but also depends on a scaling factor.

It can be shown that this factor

$\theta(x\beta/\sigma) = 1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]$

is between 0 and 1.

Hence the sign of $\beta_j$ determines the sign of the marginal effect.

If $x_1 = z_1$ and $x_2 = z_1^2$, then

$\dfrac{\partial E(y \mid x, y > 0)}{\partial z_1} = (\beta_1 + 2\beta_2 z_1)\, \theta(x\beta/\sigma)$

Rule: the marginal effect of a variable $x_j$ is the derivative of $x\beta$ with respect to $x_j$, multiplied by the factor $\theta(\cdot)$.

Elasticities can be computed in the same fashion:

(16.13) $\varepsilon = \dfrac{\partial E(y \mid x, y > 0)}{\partial x_1} \cdot \dfrac{x_1}{E(y \mid x, y > 0)}$

If $x_k$ is binary, the marginal effect is the difference between $E(y \mid x, y > 0)$ evaluated at $x_k = 1$ and $E(y \mid x, y > 0)$ evaluated at $x_k = 0$.

The expectation $E(y \mid x)$ can be computed as (from 16.8)

$E(y \mid x) = P(y > 0 \mid x) \cdot E(y \mid x, y > 0)$

(16.14) $E(y \mid x) = \Phi(x\beta/\sigma)[x\beta + \sigma\lambda(x\beta/\sigma)] = \Phi(x\beta/\sigma)\, x\beta + \sigma\phi(x\beta/\sigma)$

The partial derivatives of $E(y \mid x)$ with respect to a continuous $x_j$ are (product rule)

(16.15) $\dfrac{\partial E(y \mid x)}{\partial x_j} = \dfrac{\partial P(y > 0 \mid x)}{\partial x_j} \cdot E(y \mid x, y > 0) + P(y > 0 \mid x) \cdot \dfrac{\partial E(y \mid x, y > 0)}{\partial x_j}$

2 parts:

- change in the probability that $y > 0$,

- change of y given that $y > 0$.

Since $P(y > 0 \mid x) = \Phi(x\beta/\sigma)$, we have $\partial P(y > 0 \mid x)/\partial x_j = (\beta_j/\sigma)\,\phi(x\beta/\sigma)$.

Inserting in (16.15) gives

(16.16) $\dfrac{\partial E(y \mid x)}{\partial x_j} = \Phi(x\beta/\sigma)\, \beta_j$

The estimated scale factor is $\Phi(x\hat\beta/\hat\sigma)$. This is the estimated probability that $y > 0$ given x.

The closer $\Phi(x\hat\beta/\hat\sigma)$ is to 1, the closer the marginal effect is to $\hat\beta_j$.
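The scale factors in (16.11) and (16.16) can be checked numerically. The sketch below (Python, standard library only) computes both marginal effects and leaves room for a finite-difference comparison against the means they are derived from. All function names are illustrative assumptions; the index xβ is treated as a scalar, so `beta_j` is the coefficient of the variable being changed.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf = phi, cdf = Phi


def inv_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return _STD.pdf(c) / _STD.cdf(c)


def theta(c):
    """Scale factor in (16.11): 1 - lambda(c) * [c + lambda(c)]."""
    lam = inv_mills(c)
    return 1.0 - lam * (c + lam)


def cond_mean(xb, sigma):
    """(16.10): E(y | x, y > 0)."""
    return xb + sigma * inv_mills(xb / sigma)


def uncond_mean(xb, sigma):
    """(16.14): E(y | x) = Phi(c) * x*beta + sigma * phi(c), c = x*beta/sigma."""
    c = xb / sigma
    return _STD.cdf(c) * xb + sigma * _STD.pdf(c)


def me_conditional(beta_j, xb, sigma):
    """(16.11): dE(y | x, y > 0)/dx_j = beta_j * theta(x*beta/sigma)."""
    return beta_j * theta(xb / sigma)


def me_unconditional(beta_j, xb, sigma):
    """(16.16): dE(y | x)/dx_j = beta_j * Phi(x*beta/sigma)."""
    return beta_j * _STD.cdf(xb / sigma)


def me_probability(beta_j, xb, sigma):
    """dP(y > 0 | x)/dx_j = (beta_j / sigma) * phi(x*beta/sigma)."""
    return (beta_j / sigma) * _STD.pdf(xb / sigma)
```

With `beta_j = 1`, `me_unconditional` is just Φ(xβ/σ) and `me_conditional` is θ(xβ/σ), so both can be compared with a central difference of `uncond_mean` and `cond_mean` in the index.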

Inconsistency of OLS

Why is OLS inconsistent with censored data, both when estimating

with the entire sample and with the uncensored subsample?

Uncensored subsample: from eq. (16.10) we have

(16.17) $y_i = x_i\beta + \sigma\lambda(x_i\beta/\sigma) + e_i$

(16.18) $E(e_i \mid x_i, y_i > 0) = 0$

This implies that $E(e_i \mid x_i, \lambda_i, y_i > 0) = 0$, where $\lambda_i = \lambda(x_i\beta/\sigma)$.

OLS of y on x implies that the variable λ is omitted. λ is clearly correlated with x → omitted variable bias

Full sample

The expectation $E(y \mid x)$ can be written as (from 16.8)

$E(y \mid x) = P(y > 0 \mid x) \cdot E(y \mid x, y > 0)$

(16.14) $E(y \mid x) = \Phi(x\beta/\sigma)[x\beta + \sigma\lambda(x\beta/\sigma)] = \Phi(x\beta/\sigma)\, x\beta + \sigma\phi(x\beta/\sigma)$

(16.14) implies that $E(y \mid x)$ is nonlinear in x, β and σ → violation of the Gauss-Markov assumptions!

Illustration of bias

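Estimation with simulated data, so that we know the true β:

$y_i^* = \beta_0 + \beta_1 x_i + u_i$, $y_i = \max(0, y_i^*)$

$x_i$: uniformly distributed

$u_i$: normally distributed with mean 0 and variance 4

$\beta_0 = \beta_1 = 1$

10000 observations

Estimate by OLS with a) the full sample and b) the uncensored subsample, and c) by Tobit

set obs 10000
g x = uniform()
* standard normal draw, rescaled to variance 4
g u = invnorm(uniform())
replace u = 2*u
* latent outcome y*
g y = 1 + x + u
* OLS on the latent outcome (no censoring)
reg y x
* OLS on the uncensored subsample
reg y x if y>0
* censor at zero, then OLS on the full sample and Tobit
replace y=0 if y<0
reg y x
tobit y x, ll(0)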

               OLS without        OLS with censoring,   OLS with censoring,     Tobit
               censoring          full sample           uncensored subsample
x              1.021 (14.52)**    0.817 (14.34)**       0.580 (9.71)**          1.055 (14.64)**
Constant       1.020 (24.85)**    1.395 (41.98)**       2.032 (56.87)**         1.008 (23.68)**
Observations   10000              10000                 7720                    10000
R-squared      0.01               0.01                  0.01

Absolute value of t statistics in parentheses

* significant at 5%; ** significant at 1%
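The attenuation of the OLS slope can be reproduced without Stata. The sketch below (Python, standard library only) follows the same design: x uniform on [0, 1), u normal with variance 4, β0 = β1 = 1, 10000 observations. The helper names `ols` and `simulate` and the seed are assumptions made for illustration. The Tobit column is not reproduced, because it would need a numerical optimiser, and the estimates will differ from the table since the random draws differ.

```python
import random


def ols(xs, ys):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=10000, seed=12345):
    """Return OLS estimates for the latent, censored and positive-only samples."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ystar = [1.0 + x + rng.gauss(0.0, 2.0) for x in xs]  # latent y*
    y = [max(0.0, v) for v in ystar]                      # censored at zero
    latent = ols(xs, ystar)                               # OLS without censoring
    full = ols(xs, y)                                     # censored, full sample
    positive = [(x, v) for x, v in zip(xs, y) if v > 0]
    sub = ols([p[0] for p in positive], [p[1] for p in positive])
    return latent, full, sub


if __name__ == "__main__":
    for label, (b0, b1) in zip(("latent", "full", "uncensored"), simulate()):
        print(label, round(b0, 3), round(b1, 3))
```

As in the table, both censored-data OLS slopes should fall short of the slope obtained from the latent outcome, while the intercepts are pushed upwards.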

Estimation and Inference in Tobit model

Let $\{(x_i, y_i) : i = 1, \dots, N\}$ be a random sample with Tobit properties.

For ML we need the density of $y_i$ conditional on $x_i$.

$f(0 \mid x_i) = P(y_i = 0 \mid x_i) = 1 - \Phi(x_i\beta/\sigma)$

For $y > 0$, $P(y_i \le y \mid x_i) = P(y_i^* \le y \mid x_i)$, hence $f(y \mid x_i) = f^*(y \mid x_i)$ for all $y > 0$, where $f^*$ is the density of $y_i^*$ conditional on $x_i$.

Assuming that $y_i^* \mid x_i \sim N(x_i\beta, \sigma^2)$ yields $f^*(y \mid x_i) = \dfrac{1}{\sigma}\,\phi[(y - x_i\beta)/\sigma]$

The density of $y_i$ conditional on $x_i$ can now be written as

(16.19) $f(y \mid x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y = 0]} \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y > 0]}$

The log likelihood function for observation i, with $\theta = (\beta, \sigma^2)$, is

(16.20) $l_i(\theta) = 1[y_i = 0] \log[1 - \Phi(x_i\beta/\sigma)] + 1[y_i > 0]\{\log\phi[(y_i - x_i\beta)/\sigma] - \log(\sigma^2)/2\}$

The score vectors and Hessian are given in Wooldridge, p. 526, eqs. (16.21) - (16.23).
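The log likelihood (16.20) translates almost literally into code. The following is a minimal sketch in Python (standard library only) that evaluates the contribution of one observation and the sample sum; the names `tobit_loglik_i` and `tobit_loglik` are illustrative assumptions, and the index $x_i\beta$ is passed in as a precomputed number rather than built from a data matrix.

```python
from math import log
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf = phi, cdf = Phi


def tobit_loglik_i(y, xb, sigma):
    """Contribution l_i(theta) of one observation, following (16.20).

    y     : observed outcome (0 for corner observations)
    xb    : index x_i * beta
    sigma : standard deviation of u
    """
    if y > 0:
        # log phi[(y - x*beta)/sigma] - log(sigma^2)/2
        return log(_STD.pdf((y - xb) / sigma)) - log(sigma ** 2) / 2.0
    # log[1 - Phi(x*beta/sigma)] for y = 0
    return log(1.0 - _STD.cdf(xb / sigma))


def tobit_loglik(ys, xbs, sigma):
    """Sample log likelihood: sum of l_i(theta) over all observations."""
    return sum(tobit_loglik_i(y, xb, sigma) for y, xb in zip(ys, xbs))
```

Maximising `tobit_loglik` over β and σ with a numerical optimiser would give the Tobit estimates; the sketch stops at the objective function.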

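Derivatives

$\dfrac{\partial l_i(\theta)}{\partial \beta} = -1[y_i = 0]\, \dfrac{\phi(x_i\beta/\sigma)\, x_i}{\sigma[1 - \Phi(x_i\beta/\sigma)]} + 1[y_i > 0]\, \dfrac{(y_i - x_i\beta)\, x_i}{\sigma^2}$

$\dfrac{\partial l_i(\theta)}{\partial \sigma^2} = 1[y_i = 0]\, \dfrac{\phi(x_i\beta/\sigma)\,(x_i\beta)}{2\sigma^3[1 - \Phi(x_i\beta/\sigma)]} + 1[y_i > 0] \left\{\dfrac{(y_i - x_i\beta)^2}{2\sigma^4} - \dfrac{1}{2\sigma^2}\right\}$

For the covariance matrix, use the inverse of the information matrix.

For the information matrix, use the negative expected Hessian matrix $-E[H_i(\theta) \mid x_i]$:

$-E[H_i(\theta) \mid x_i] = A(x_i, \theta) = \begin{pmatrix} a_i\, x_i' x_i & b_i\, x_i' \\ b_i\, x_i & c_i \end{pmatrix}$

$a_i = -\sigma^{-2} \left\{ x_i\gamma\,\phi_i - \dfrac{\phi_i^2}{1 - \Phi_i} - \Phi_i \right\}$

$b_i = \dfrac{\sigma^{-3}}{2} \left\{ (x_i\gamma)^2 \phi_i + \phi_i - \dfrac{(x_i\gamma)\,\phi_i^2}{1 - \Phi_i} \right\}$

$c_i = -\dfrac{\sigma^{-4}}{4} \left\{ (x_i\gamma)^3 \phi_i + (x_i\gamma)\,\phi_i - \dfrac{(x_i\gamma)^2 \phi_i^2}{1 - \Phi_i} - 2\Phi_i \right\}$

with $\gamma = \beta/\sigma$; $\phi_i = \phi(x_i\gamma)$; $\Phi_i = \Phi(x_i\gamma)$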

Presentation of estimation results (Table 16.1)

- Quantities of interest: $\hat\beta_j$ and std.err.($\hat\beta_j$)

- For corner solutions: give partial effects, computed at $\hat\beta$ and selected values of x (e.g. average values of all x)

- Report the value of the log-likelihood for testing

                                                                Marginal effects (Tobit)
                      OLS                  Tobit                P(y>0|x)            E(y|x,y>0)           E(y|x)
nwifeinc              -3.447 (1.35)        -8.814 (1.98)*       -0.003 (1.98)*      -3.746 (1.98)*       -5.326 (1.98)*
educ                  28.761 (2.22)*       80.646 (3.74)**      0.028 (3.74)**      34.275 (3.74)**      48.734 (3.74)**
exper                 65.673 (6.59)**      131.564 (7.61)**     0.045 (7.61)**      55.916 (7.61)**      79.504 (7.61)**
expersq               -0.700 (2.16)*       -1.864 (3.47)**      -0.001 (3.47)**     -0.792 (3.47)**      -1.127 (3.47)**
age                   -30.512 (6.99)**     -54.405 (7.33)**     -0.019 (7.33)**     -23.123 (7.33)**     -32.877 (7.33)**
kidslt6               -442.090 (7.51)**    -894.022 (7.99)**    -0.307 (7.99)**     -379.968 (7.99)**    -540.257 (7.99)**
kidsge6               -32.779 (1.41)       -16.218 (0.42)       -0.006 (0.42)       -6.893 (0.42)        -9.801 (0.42)
Constant              1,330.482 (4.91)**   965.305 (2.16)*      0.331 (2.16)*       410.264 (2.16)*      583.333 (2.16)*
σ                     750.18               1122.02
Observations          753                  753                  753                 753                  753
R2 / log likelihood   0.27                 -3819.09

Absolute value of t statistics in parentheses

* significant at 5%; ** significant at 1%

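* load the Mroz labour supply data
use d:\stata\micro\data\mroz
* OLS benchmark
reg hours nwifeinc educ exper expersq age kidslt6 kidsge6
outreg using d:\stata\micro\out\mroz_tobit, nol replace
* Tobit with lower limit 0
tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)
outreg using d:\stata\micro\out\mroz_tobit, nol append
* marginal effects after tobit: probability (p), conditional (c), unconditional (u)
dtobit
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(p)
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(c)
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(u)
* alternatively with mfx: effects on P(y>0|x) and E(y|x,y>0)
mfx, predict(pr(0,.))
mfx, predict(e(0,.))
* note: e(.,.) imposes no truncation, so this gives effects on the latent mean x*beta;
* for E(y|x) with censoring at 0, predict(ystar(0,.)) is the matching option
mfx, predict(e(.,.))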

5.6 Specification Issues in Tobit Models

Neglected Heterogeneity

$y = \max(0, x\beta + \gamma q + u)$, $u \mid x, q \sim N(0, \sigma^2)$

where q is unobserved, independent of x and normally distributed

The new error term is γq + u, which is normally distributed and independent of x

NO PROBLEM (but it is not possible to give partial effects conditioning on q)

Endogenous right hand side variable

The model now is

(16.26) $y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + u_1)$

(16.27) $y_2 = z\delta_2 + v_2 = z_1\delta_{21} + z_2\delta_{22} + v_2$

$(u_1, v_2)$ are normally distributed with mean 0 and independent of z.

$corr(u_1, v_2) \ne 0$ → $y_2$ is endogenous

$y_2$ and $u_1$ are correlated → INCONSISTENCY of Tobit

Needed: consistent estimators of $\delta_1$ and $\alpha_1$

2-step procedure (Smith and Blundell) or full maximum likelihood

2-step approach (Smith and Blundell)

same approach as in two-step estimator for binary probit

Since $(u_1, v_2)$ have a bivariate normal distribution, we have

(16.28) $u_1 = \theta_1 v_2 + e_1$

where $\theta_1 = \eta_1/\tau_2^2$, $\eta_1 = cov(v_2, u_1)$, $\tau_2^2 = Var(v_2)$. $e_1$ is independent of z and $v_2$ and has variance $\tau_1^2$.

Inserting (16.28) in (16.26) gives

(16.29) $y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 + e_1)$

If we could observe $v_2$, we could estimate (16.29) with Tobit.

But $v_2$ can be estimated by OLS of (16.27).

Step 1: Estimate (16.27) by OLS and compute $\hat v_2 = y_2 - z\hat\delta_2$

Step 2: Estimate a Tobit of $y_1$ on $z_1$, $y_2$ and $\hat v_2$. This yields consistent estimators of $\delta_1$, $\alpha_1$, $\theta_1$ and $\tau_1^2$.

The t-value for $\hat v_2$ is a test statistic for the hypothesis that $y_2$ is exogenous.

The test is independent of the distributional assumption in (16.27).

Estimation with ML

If the test rejects exogeneity of $y_2$: ML estimation of (16.26) and (16.27).

This requires the joint distribution of $(y_1, y_2)$:

(16.32) $f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z)$

The density $f(y_2 \mid z)$ is $N(z\delta_2, \tau_2^2)$.

$y_1 \mid z, y_2$ follows a Tobit whose latent variable has expectation $z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 = z_1\delta_1 + \alpha_1 y_2 + (\eta_1/\tau_2^2)(y_2 - z\delta_2)$

and variance $\tau_1^2 = \sigma_1^2 - (\eta_1^2/\tau_2^2)$,

where $\sigma_1^2 = Var(u_1)$; $\eta_1 = Cov(v_2, u_1)$; $\tau_2^2 = Var(v_2)$.

From this we can derive the log likelihood (insert the two densities into (16.32)).

Heteroskedasticity and nonnormality

Both heteroskedasticity and nonnormality make Tobit inconsistent.

Corner solution model: heteroskedasticity and nonnormality change the functional form of $E(y \mid x)$ and $E(y \mid x, y > 0)$, and therefore also affect the marginal effects.

Testing for heteroskedasticity and nonnormality is necessary:

Heteroskedasticity:

specify $Var(u \mid x) = \sigma^2 \exp(z\delta)$, where z is a subvector of x

use an LM statistic (score approach) to test δ = 0

LM test (Wooldridge, p. 534)

Tobit with heteroskedasticity

set obs 10000
g x = uniform()
g u = invnorm(uniform())
replace u = 2*u
* heteroskedastic error: standard deviation proportional to exp(x)
g u2 = u*exp(x)
g y2 = 1 + x +u2
replace y2=0 if y2<0
reg y2 x
reg y2 x if y2>0
tobit y2 x, ll(0)

               OLS, all              OLS, positive         Tobit
               observations          values only
x              1.990 (21.77)**       3.168 (30.56)**       2.054 (15.77)**
Constant       1.318 (24.78)**       1.848 (30.90)**       0.525 (6.87)**
Observations   10000                 6775                  10000
R-squared      0.05                  0.12

Absolute value of t statistics in parentheses

* significant at 5%; ** significant at 1%

Nonnormality:

either specify an alternative distribution in maximum likelihood and use an LR test,

or test within the standard Tobit estimation (conditional moment test, Pagan and Vella); in Stata: tobcm

“Appropriateness” of the Tobit model

estimate a probit with dependent variable 1(y>0), giving coefficients $\hat\gamma$

compare them to the normalised Tobit coefficients $\hat\beta/\hat\sigma$

they should be of similar size (and show no sign changes)

Estimation under less restrictive assumptions

β can be estimated consistently without distributional assumptions for

u and without assumption that x and u are independent.

(16.34) $y^* = x\beta + u$, $Med(u \mid x) = 0$

This implies that $Med(y^* \mid x) = x\beta$. If $u \mid x$ is symmetrically distributed around zero, the mean and the median are equal.

If g(y) is non-decreasing, then $Med[g(y)] = g[Med(y)]$.

(16.35) $Med(y \mid x) = \max[0, Med(y^* \mid x)] = \max(0, x\beta)$

(since $y = \max(0, y^*)$ is non-decreasing in $y^*$).

β can be estimated by the following minimisation problem:

(16.36) $\min_{\beta} \sum_{i=1}^{N} |y_i - \max(0, x_i\beta)|$

This is the Censored Least Absolute Deviations (CLAD) estimator (Powell, 1984).

Problem: the CLAD estimator is not informative about $P(y > 0 \mid x)$ and $E(y \mid x, y > 0)$, and in general also not about $E(y \mid x)$.
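Two ingredients of the CLAD approach are easy to illustrate: the median passes through the non-decreasing map max(0, ·), and the objective (16.36) is a sum of absolute deviations from max(0, xβ). The sketch below (Python, standard library only) shows both for a constant and a single regressor. The function name `clad_objective` and the toy numbers are assumptions for illustration; the minimisation itself is not implemented.

```python
from statistics import median


def clad_objective(beta0, beta1, xs, ys):
    """Objective (16.36) for a constant and one regressor:
    sum over i of |y_i - max(0, beta0 + beta1 * x_i)|."""
    return sum(abs(y - max(0.0, beta0 + beta1 * x)) for x, y in zip(xs, ys))


# Median equivariance: Med[max(0, y*)] = max(0, Med[y*])
ystar_mostly_positive = [-3.0, -1.0, 0.5, 2.0, 4.0]
ystar_mostly_negative = [-3.0, -2.0, -1.0, 1.0, 2.0]
for ystar in (ystar_mostly_positive, ystar_mostly_negative):
    y = [max(0.0, v) for v in ystar]
    print(median(y), max(0.0, median(ystar)))

# Toy data lying exactly on the line y = x: the objective is zero at (0, 1)
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
print(clad_objective(0.0, 1.0, xs, ys), clad_objective(-1.0, 1.0, xs, ys))
```

The mean does not share this property, which is why the median restriction in (16.34) is enough for identification while a zero-mean restriction is not.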

Estimation with CLAD

set obs 10000
g x = uniform()
g u = invnorm(uniform())
replace u = 2*u
g u2 = u*exp(x)
g y2 = 1 + x +u2
replace y2=0 if y2<0
tobit y2 x, ll(0)
clad y2 x, ll(0)

            Tobit            CLAD
x           2.054 (0.127)    1.057 (0.164)
Constant    0.525 (0.074)    0.982 (0.067)

Standard errors in parentheses

* significant at 5%; ** significant at 1%

Alternative to Tobit model: Hurdle model

Drawback of Tobit: $\partial P(y > 0 \mid x)/\partial x_j$ and $\partial E(y \mid x, y > 0)/\partial x_j$ have the same sign; the relative effects of continuous variables are identical.

Rewrite the Tobit density function

$f(y \mid x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y = 0]} \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y > 0]}$

by multiplying with and dividing by $\{\Phi(x_i\beta/\sigma)\}^{1[y > 0]}$:

$f(y \mid x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y = 0]} \{\Phi(x_i\beta/\sigma)\}^{1[y > 0]}$

$\qquad \times \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y > 0]} / \{\Phi(x_i\beta/\sigma)\}^{1[y > 0]}$

Line 1: probit; line 2: truncated regression (next chapter)

Truncated normal distribution:

$f(y \mid x_i, y > 0) = \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}\{\Phi(x_i\beta/\sigma)\}^{-1}$

where the term $\{\Phi(x_i\beta/\sigma)\}^{-1}$ ensures that the density integrates to 1 over $y > 0$.

The hurdle model has the following density of y given x:

$f(y \mid x_i) = \{1 - \Phi(x_i\gamma)\}^{1[y = 0]} \{\Phi(x_i\gamma)\}^{1[y > 0]}$

$\qquad \times \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y > 0]} / \{\Phi(x_i\beta/\sigma)\}^{1[y > 0]}$

This equals the Tobit density if $\gamma = \beta/\sigma$.

This restriction can be tested, e.g. by a likelihood ratio test:

$2 \times (\log l_{Trunc} + \log l_{Probit} - \log l_{Tobit}) \sim \chi^2_K$

Mroz Data example

                 Tobit                 Truncreg               Probit
nwifeinc         -8.814 (1.98)*        0.153 (0.03)           -0.012 (2.48)*
educ             80.646 (3.74)**       -29.853 (1.31)         0.131 (5.18)**
exper            131.564 (7.61)**      72.623 (3.42)**        0.123 (6.59)**
expersq          -1.864 (3.47)**       -0.944 (1.55)          -0.002 (3.15)**
age              -54.405 (7.33)**      -27.444 (3.31)**       -0.053 (6.23)**
kidslt6          -894.022 (7.99)**     -484.711 (3.15)**      -0.868 (7.33)**
kidsge6          -16.218 (0.42)        -102.657 (2.36)*       0.036 (0.83)
Constant         965.305 (2.16)*       2,123.516 (4.39)**     0.270 (0.53)
σ                1122.02 (26.97)       850.76 (19.42)         1 (-)
Log likelihood   -3819.09              -3390.65               -401.30
Observations     753                   428                    753

Absolute value of t statistics in parentheses

* significant at 5%; ** significant at 1%
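The numbers in the Mroz table are enough to carry out the likelihood ratio test of the Tobit restriction and the informal comparison of probit coefficients with normalised Tobit coefficients. The sketch below (Python, standard library only) copies the values from the table. The variable names are illustrative, K = 8 counts the seven regressors plus the constant, and the 5% critical value of the chi-square distribution with 8 degrees of freedom (15.507) is taken from standard tables rather than computed.

```python
# Log likelihoods copied from the Mroz table
loglik_tobit = -3819.09
loglik_trunc = -3390.65
loglik_probit = -401.30

k = 8               # restrictions gamma = beta/sigma: 7 regressors + constant
crit_5pct = 15.507  # 95% quantile of chi-square(8), from standard tables

# LR statistic: 2 * (log l_Trunc + log l_Probit - log l_Tobit)
lr = 2.0 * (loglik_trunc + loglik_probit - loglik_tobit)
print("LR statistic:", round(lr, 2), "reject Tobit restriction:", lr > crit_5pct)

# Informal check: probit coefficients vs. Tobit coefficients divided by sigma
sigma_tobit = 1122.02
tobit = {"nwifeinc": -8.814, "educ": 80.646, "exper": 131.564,
         "expersq": -1.864, "age": -54.405, "kidslt6": -894.022,
         "kidsge6": -16.218}
probit = {"nwifeinc": -0.012, "educ": 0.131, "exper": 0.123,
          "expersq": -0.002, "age": -0.053, "kidslt6": -0.868,
          "kidsge6": 0.036}
for name in tobit:
    scaled = tobit[name] / sigma_tobit
    print(name, round(scaled, 4), probit[name])
```

The statistic is about 54.28, far above the 5% critical value, so these estimates speak against the restriction γ = β/σ. The scaled Tobit coefficient on kidsge6 is negative while the probit coefficient is positive, although neither is statistically significant.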

5.8 Applying Censored Regression to Panel Data

Pooled Tobit

$y_{it} = \max(0, x_{it}\beta + u_{it})$, t = 1, 2, ..., T

$u_{it} \mid x_{it} \sim N(0, \sigma^2)$

No strict exogeneity of $x_{it}$ is required (lagged y is also allowed), and u can be serially dependent.

Partial log-likelihood function:

$\sum_{i=1}^{N} \sum_{t=1}^{T} l_{it}(\beta, \sigma^2)$

(as if it were a cross-section data set with N·T observations)

but a robust variance matrix estimator is needed

Unobserved Effects Tobit Models under Strict Exogeneity

$y_{it} = \max(0, x_{it}\beta + c_i + u_{it})$, t = 1, 2, ..., T

$u_{it} \mid x_i, c_i \sim N(0, \sigma_u^2)$

Strict exogeneity of $x_i$, but $x_i$ and $c_i$ may be correlated, and u may be serially dependent.

Define a distribution for the unobserved effect (Chamberlain):

$c_i \mid x_i \sim N(\psi + x_i\xi, \sigma_a^2)$

$y_{it} = \max(0, \psi + x_{it}\beta + x_i\xi + a_i + u_{it})$

$u_{it} \mid x_i, a_i \sim N(0, \sigma_u^2)$

$a_i \mid x_i \sim N(0, \sigma_a^2)$

This is the random effects Tobit model with an additional set of time-constant explanatory variables $x_i$.

Dynamic Unobserved Effects Tobit Models

$y_{it} = \max(0, z_{it}\delta + \rho_1 y_{i,t-1} + c_i + u_{it})$, t = 1, 2, ..., T

$u_{it} \mid (z_i, y_{i,t-1}, \dots, y_{i0}, c_i) \sim N(0, \sigma_u^2)$, t = 1, 2, ..., T

Strict exogeneity for $z_i$.

Define a distribution for the unobserved effect:

$c_i \mid y_{i0}, z_i \sim N(\psi + \xi_0 y_{i0} + z_i\xi, \sigma_a^2)$

$y_{it} = \max(0, \psi + z_{it}\delta + \rho_1 y_{i,t-1} + \xi_0 y_{i0} + z_i\xi + a_i + u_{it})$

$u_{it} \mid (z_i, y_{i,t-1}, \dots, y_{i0}, a_i) \sim N(0, \sigma_u^2)$

$a_i \mid y_{i0}, z_i \sim N(0, \sigma_a^2)$

This is the random effects Tobit model with explanatory variables $(z_{it}, y_{i,t-1}, y_{i0}, z_i)$.
