4. Corner Solutions and Censored Regression Models (Ch. 16)
In this chapter, the dependent variable y is partly continuous but also has positive probability mass at one or more points.
Model:
latent variable: y*
observed variable: y = f(y*)
2 cases:
• data censoring: censoring above or below a certain value (data problem)
• corner solution: the mass point stems from the nature of the problem itself, not from the data; y* cannot be observed
Example 1: Censored dependent variable
Variable with quantitative meaning y*: in many surveys, wealth is recorded exactly only below 100’000. Above that, the survey only states that wealth exceeds 100’000, not the value itself.
observation rule: wealth = min(wealth*, 100’000)
We are interested in $E(y^*|x)$, but y* is censored.
Example: y* is true wealth and $E(y^*|x) = x\beta$.
But: y* is only observed if y* < 100’000.
For y* ≥ 100’000, wealth is coded as 100’000.
Observation rule: y = min(y*, 100’000)
How can we obtain consistent estimates of β with censored data?
Example 2: Corner solution
Typical in microeconometrics: let y be the observed choice of an economic agent, where we observe a mass point at 0.
Examples: hours of labour supply, alcohol consumption, charitable contributions.
The mass point results from utility-maximising behaviour.
This case is called a corner solution (i.e. not an interior solution). In contrast to Example 1, observability of the dependent variable is not a problem.
(For modelling we will often use a latent model with y*, see below.)
Interest lies in properties of the conditional distribution of y, e.g. $E(y|x)$ or $P(y=0|x)$.
Why not assume that $E(y|x) = x\beta$ and use OLS?
- if y ≥ 0, $E(y|x)$ cannot be linear in x (unless the range of x is very limited)
- OLS implies constant marginal effects
- predicted y can be negative
Censored Normal Regression Model (Tobit)
(16.3) $y_i^* = x_i\beta + u_i, \quad u_i|x_i \sim N(0, \sigma^2)$ (x includes an intercept)
(16.4) $y_i = \max(0, y_i^*)$
Standard Tobit model or Tobit Type 1
If y* is censored, we are interested in the effect of x on y*: $E(y^*|x)$.
For corner solution models, y* has no meaningful interpretation.
We are interested in $E(y|x)$, $E(y|x, y>0)$ and $P(y=0|x)$.
Especially for corner solution problems:
avoid placing too much emphasis on the latent variable y*
Expected values
In models with corner solutions we are interested in $E(y|x)$ and $E(y|x, y>0)$:
(16.8) $E(y|x) = P(y=0|x)\cdot 0 + P(y>0|x)\cdot E(y|x, y>0) = P(y>0|x)\cdot E(y|x, y>0)$
(16.9) $P(y>0|x) = E[1(y^*>0)|x] = P(y^*>0|x) = P(u > -x\beta \mid x) = P(u/\sigma > -x\beta/\sigma \mid x) = \Phi(x\beta/\sigma)$
We could therefore estimate β/σ consistently by a probit of $1(y_i>0)$ on $x_i$.
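To derive $E(y|x, y>0)$ we need the following property of the normal distribution:
if $z \sim N(0,1)$, then $E(z \mid z > c) = \dfrac{\phi(c)}{1-\Phi(c)}$.
Then for $u \sim N(0, \sigma^2)$:
$E(u \mid u > c) = \sigma\, E\left(\dfrac{u}{\sigma} \,\Big|\, \dfrac{u}{\sigma} > \dfrac{c}{\sigma}\right) = \sigma\,\dfrac{\phi(c/\sigma)}{1-\Phi(c/\sigma)}$
Therefore
(16.10) $E(y|x, y>0) = x\beta + E(u \mid u > -x\beta) = x\beta + \sigma\,\dfrac{\phi(x\beta/\sigma)}{\Phi(x\beta/\sigma)}$
because $1 - \Phi(-x\beta/\sigma) = \Phi(x\beta/\sigma)$ and $\phi(-x\beta/\sigma) = \phi(x\beta/\sigma)$.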
The right-hand side of (16.10) is always positive.
$\lambda(c) = \phi(c)/\Phi(c)$ is called the inverse Mills ratio, so that (16.10) reads $E(y|x, y>0) = x\beta + \sigma\lambda(x\beta/\sigma)$.
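A quick numerical check of (16.10) may help: the sketch below computes $E(y|x, y>0) = x\beta + \sigma\lambda(x\beta/\sigma)$ in closed form and compares it with a brute-force midpoint integration of the normal density of y* over y* > 0. It is a minimal sketch using only the Python standard library; the function names (`inv_mills`, `trunc_mean_closed`, `trunc_mean_numeric`) and the parameter values are illustrative assumptions, not part of the notes.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF (erfc keeps the lower tail accurate)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def inv_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return phi(c) / Phi(c)

def trunc_mean_closed(xb, sigma):
    """E(y | x, y > 0) = x*beta + sigma * lambda(x*beta / sigma), eq. (16.10)."""
    return xb + sigma * inv_mills(xb / sigma)

def trunc_mean_numeric(xb, sigma, steps=50_000):
    """Same quantity by midpoint integration of y* ~ N(xb, sigma^2) over (0, xb + 12 sigma)."""
    upper = xb + 12.0 * sigma          # density is negligible beyond this point
    h = upper / steps
    num = den = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        dens = phi((y - xb) / sigma) / sigma
        num += y * dens
        den += dens
    return num / den                   # the common step width h cancels

for xb, sigma in [(1.0, 2.0), (-1.5, 1.0), (0.3, 0.7)]:
    print(xb, sigma, trunc_mean_closed(xb, sigma), trunc_mean_numeric(xb, sigma))
```

The closed form and the integral agree to many digits, and the result stays positive even for a negative index, as stated for (16.10).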
Marginal effects
If x_j is a continuous variable, we have
$\dfrac{\partial E(y|x, y>0)}{\partial x_j} = \beta_j + \beta_j\,\dfrac{d\lambda}{dc}(x\beta/\sigma)$,
if x_j is not functionally related to another explanatory variable.
The derivative of λ(c) is $\dfrac{d\lambda(c)}{dc} = -\lambda(c)[c + \lambda(c)]$.
Hence
(16.11) $\dfrac{\partial E(y|x, y>0)}{\partial x_j} = \beta_j\{1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]\}$
The marginal effect is not only β_j: it also depends on a scaling factor.
It can be shown that this factor
$\theta(x\beta/\sigma) = 1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]$ lies between 0 and 1.
Hence the sign of β_j determines the sign of the marginal effect.
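Both the claim that θ lies strictly between 0 and 1 and formula (16.11) itself can be checked numerically. The sketch assumes a model with an intercept and one continuous regressor; the parameter values and helper names (`lam`, `theta`, `cond_mean`) are hypothetical, and the derivative is approximated by a central finite difference.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def lam(c):
    """Inverse Mills ratio phi(c) / Phi(c)."""
    return phi(c) / Phi(c)

def theta(c):
    """Scale factor theta(c) = 1 - lambda(c) * (c + lambda(c)) from (16.11)."""
    return 1.0 - lam(c) * (c + lam(c))

def cond_mean(xb, sigma):
    """E(y | x, y > 0) from (16.10)."""
    return xb + sigma * lam(xb / sigma)

# Hypothetical parameters: intercept b0, slope b1 on a continuous regressor x1.
b0, b1, sigma, x1 = 0.5, 1.2, 1.5, 0.8
h = 1e-5
numeric = (cond_mean(b0 + b1 * (x1 + h), sigma)
           - cond_mean(b0 + b1 * (x1 - h), sigma)) / (2 * h)
analytic = b1 * theta((b0 + b1 * x1) / sigma)
print(numeric, analytic)

# theta stays strictly inside (0, 1) over a wide range of index values.
grid = [-5.0 + 0.01 * k for k in range(1001)]
print(min(theta(c) for c in grid), max(theta(c) for c in grid))
```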
If $x_1 = z_1$ and $x_2 = z_1^2$, then
$\dfrac{\partial E(y|x, y>0)}{\partial z_1} = (\beta_1 + 2\beta_2 z_1)\,\theta(x\beta/\sigma)$
Rule: the marginal effect of a variable x_j is the derivative of xβ with respect to x_j, multiplied by the factor θ(·).
Elasticities can be computed in the same fashion:
(16.13) $\varepsilon_1 = \dfrac{\partial E(y|x, y>0)}{\partial x_1}\cdot\dfrac{x_1}{E(y|x, y>0)}$
If x_k is binary, the marginal effect is the difference between $E(y|x, y>0)$ evaluated at $x_k = 1$ and evaluated at $x_k = 0$.
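The elasticity (16.13) and the discrete effect of a binary regressor follow directly from (16.10) and (16.11). In this sketch x = (1, x1, d), where d is a hypothetical dummy; all numbers are made up for illustration, and `cond_mean`, `theta` and `index` are helper names introduced here.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cond_mean(xb, sigma):
    """E(y | x, y > 0) = xb + sigma * phi(xb/sigma) / Phi(xb/sigma), eq. (16.10)."""
    c = xb / sigma
    return xb + sigma * phi(c) / Phi(c)

def theta(c):
    """Scale factor from (16.11)."""
    lam = phi(c) / Phi(c)
    return 1.0 - lam * (c + lam)

def index(x, beta):
    """Linear index x * beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))

beta = [0.4, 0.9, 0.6]      # hypothetical coefficients on (1, x1, d)
sigma = 1.3
x = [1.0, 2.0, 0.0]         # evaluation point

# Elasticity of E(y|x,y>0) w.r.t. x1, eq. (16.13): (dE/dx1) * x1 / E.
xb = index(x, beta)
elasticity = beta[1] * theta(xb / sigma) * x[1] / cond_mean(xb, sigma)

# Binary regressor d: E(y|x,y>0) at d = 1 minus E(y|x,y>0) at d = 0.
x_on, x_off = x[:2] + [1.0], x[:2] + [0.0]
discrete_effect = cond_mean(index(x_on, beta), sigma) - cond_mean(index(x_off, beta), sigma)
print(elasticity, discrete_effect)
```

Because θ lies in (0, 1), the discrete effect of d has the sign of its coefficient and is smaller in absolute value.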
The expectation $E(y|x)$ can be computed as (from 16.8)
$E(y|x) = P(y>0|x)\cdot E(y|x, y>0)$
(16.14) $= \Phi(x\beta/\sigma)[x\beta + \sigma\lambda(x\beta/\sigma)] = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma)$
The partial derivatives of $E(y|x)$ with respect to a continuous x_j are (chain rule)
(16.15) $\dfrac{\partial E(y|x)}{\partial x_j} = \dfrac{\partial P(y>0|x)}{\partial x_j}\cdot E(y|x, y>0) + P(y>0|x)\cdot\dfrac{\partial E(y|x, y>0)}{\partial x_j}$
2 parts:
- change in the probability that y > 0
- change in y given that y > 0
Since $P(y>0|x) = \Phi(x\beta/\sigma)$, we have $\partial P(y>0|x)/\partial x_j = (\beta_j/\sigma)\,\phi(x\beta/\sigma)$.
Inserting into (16.15) gives
(16.16) $\dfrac{\partial E(y|x)}{\partial x_j} = \Phi(x\beta/\sigma)\,\beta_j$
The estimated scale factor is $\Phi(x\hat\beta/\hat\sigma)$. This is the estimated probability that y > 0 given x.
The closer $\Phi(x\hat\beta/\hat\sigma)$ is to 1, the closer the marginal effect is to $\hat\beta_j$.
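Formulas (16.14) to (16.16) can be verified the same way. The sketch evaluates $E(y|x)$ from (16.14), checks that it equals $P(y>0|x)\,E(y|x, y>0)$ as in (16.8), and compares the two-part decomposition (16.15) and the compact result (16.16) with a finite-difference derivative. The model (intercept plus one regressor) and all parameter values are hypothetical.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def uncond_mean(xb, sigma):
    """E(y|x) = Phi(xb/sigma) * xb + sigma * phi(xb/sigma), eq. (16.14)."""
    c = xb / sigma
    return Phi(c) * xb + sigma * phi(c)

def cond_mean(xb, sigma):
    """E(y|x, y>0), eq. (16.10)."""
    c = xb / sigma
    return xb + sigma * phi(c) / Phi(c)

# Hypothetical model: intercept b0 and one continuous regressor x1.
b0, b1, sigma, x1 = -0.3, 0.8, 1.1, 0.5
xb = b0 + b1 * x1
c = xb / sigma

# (16.8): E(y|x) = P(y>0|x) * E(y|x, y>0)
product_form = Phi(c) * cond_mean(xb, sigma)

# (16.15): change in P(y>0|x) plus change in E(y|x, y>0).
lam = phi(c) / Phi(c)
dP = (b1 / sigma) * phi(c)                  # dP(y>0|x)/dx1
dEcond = b1 * (1.0 - lam * (c + lam))       # dE(y|x,y>0)/dx1, eq. (16.11)
decomposition = dP * cond_mean(xb, sigma) + Phi(c) * dEcond

# (16.16): both parts collapse to Phi(xb/sigma) * b1.
compact = Phi(c) * b1

h = 1e-5
numeric = (uncond_mean(b0 + b1 * (x1 + h), sigma)
           - uncond_mean(b0 + b1 * (x1 - h), sigma)) / (2 * h)
print(numeric, decomposition, compact)
```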
Inconsistency of OLS
Why is OLS inconsistent with censored data, both when estimating
with the entire sample and with the uncensored subsample?
Uncensored subsample: from eq. (16.10) we have
(16.17) $y_i = x_i\beta + \sigma\lambda(x_i\beta/\sigma) + e_i$
(16.18) $E(e_i|x_i, y_i>0) = 0$
This implies that $E(e_i|x_i, \lambda_i, y_i>0) = 0$, where $\lambda_i = \lambda(x_i\beta/\sigma)$.
OLS of y on x omits the variable λ_i. Since λ_i is clearly correlated with x_i, this leads to omitted variable bias.
Full sample
The expectation $E(y|x)$ can be written as (from 16.8)
$E(y|x) = P(y>0|x)\cdot E(y|x, y>0)$
(16.14) $= \Phi(x\beta/\sigma)[x\beta + \sigma\lambda(x\beta/\sigma)] = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma)$
(16.14) implies that $E(y|x)$ is nonlinear in x, β and σ: violation of the Gauss-Markov assumptions!
Illustration of bias
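Estimation with simulated data (so that the true β is known):
$y_i^* = \beta_0 + \beta_1 x_i + u_i$, $y_i = \max(0, y_i^*)$
x_i: uniformly distributed on (0, 1)
u_i: normally distributed with mean 0 and variance 4
$\beta_0 = \beta_1 = 1$
10000 observations
Estimate by OLS with a) the full sample and b) the uncensored subsample, and c) by Tobit:
clear
set obs 10000
generate x = runiform()          // x ~ U(0,1)
generate u = rnormal(0, 2)       // u ~ N(0,4): mean 0, sd 2
generate y = 1 + x + u           // latent y*
regress y x                      // OLS without censoring
regress y x if y > 0             // OLS, uncensored subsample
replace y = 0 if y < 0           // censoring: y = max(0, y*)
regress y x                      // OLS with censoring, full sample
tobit y x, ll(0)                 // Tobit, left-censored at 0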
              OLS without       OLS with censoring,  OLS with censoring,   Tobit
              censoring         full sample          uncensored subsample
x             1.021 (14.52)**   0.817 (14.34)**      0.580 (9.71)**        1.055 (14.64)**
Constant      1.020 (24.85)**   1.395 (41.98)**      2.032 (56.87)**       1.008 (23.68)**
Observations  10000             10000                7720                  10000
R-squared     0.01              0.01                 0.01
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
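For readers without Stata, the OLS part of this experiment can be reproduced with a small Python sketch (standard library only, arbitrary seed). Python's `random` generator will not reproduce Stata's draws, so the estimates differ from the table in the second or third digit, but the pattern of attenuation should be the same; the Tobit column is not reproduced here.

```python
import random

def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2024)                         # arbitrary seed
n = 10_000
x = [rng.random() for _ in range(n)]              # x ~ U(0, 1)
u = [rng.gauss(0.0, 2.0) for _ in range(n)]       # u ~ N(0, 4), i.e. sd = 2
y_star = [1.0 + a + e for a, e in zip(x, u)]      # beta0 = beta1 = 1
y = [max(0.0, v) for v in y_star]                 # censoring at zero

b_latent = ols(x, y_star)                         # OLS without censoring
b_full = ols(x, y)                                # OLS, full censored sample
kept = [(a, b) for a, b in zip(x, y) if b > 0]    # uncensored subsample
b_sub = ols([a for a, _ in kept], [b for _, b in kept])

share_uncensored = len(kept) / n
print(b_latent, b_full, b_sub, share_uncensored)
```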
Estimation and Inference in Tobit model
Let $\{(x_i, y_i): i = 1, \dots, N\}$ be a random sample from the Tobit model.
For ML we need the density of y_i conditional on x_i:
$f(0|x_i) = P(y_i = 0|x_i) = 1 - \Phi(x_i\beta/\sigma)$
For y > 0, $P(y_i \le y|x_i) = P(y_i^* \le y|x_i)$, hence $f(y|x_i) = f^*(y|x_i)$ for all y > 0, where f* is the density of y_i* conditional on x_i.
Assuming that $y_i^*|x_i \sim N(x_i\beta, \sigma^2)$ yields $f^*(y|x_i) = \dfrac{1}{\sigma}\,\phi[(y - x_i\beta)/\sigma]$.
The density of y_i conditional on x_i can now be written as
(16.19) $f(y|x_i) = [1 - \Phi(x_i\beta/\sigma)]^{1[y=0]}\,\{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y>0]}$
The log-likelihood function for observation i, with $\theta = (\beta', \sigma^2)'$, is
(16.20) $\ell_i(\theta) = 1[y_i=0]\log[1 - \Phi(x_i\beta/\sigma)] + 1[y_i>0]\{\log\phi[(y_i - x_i\beta)/\sigma] - \log(\sigma^2)/2\}$
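The log-likelihood (16.20) translates almost line by line into code. This sketch (Python standard library, hypothetical function name `tobit_loglik`) sums $\ell_i(\theta)$ over a sample and, on data simulated as in the bias illustration, checks that the true parameters give a higher log-likelihood than values close to the attenuated OLS estimates or a wrong σ. It is not an optimiser; in practice the function is maximised numerically, e.g. by `tobit y x, ll(0)` in Stata.

```python
import math
import random

def log_Phi(z):
    """log of the standard normal CDF, via erfc for accuracy in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, X, y):
    """Sum of l_i(theta) from (16.20); censoring from below at zero."""
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi <= 0.0:
            # 1[y_i = 0] * log[1 - Phi(x_i beta / sigma)] = log Phi(-x_i beta / sigma)
            total += log_Phi(-xb / sigma)
        else:
            # log phi[(y_i - x_i beta) / sigma] - log(sigma^2) / 2
            z = (yi - xb) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total

rng = random.Random(7)                            # arbitrary seed
n = 5_000
X = [(1.0, rng.random()) for _ in range(n)]       # intercept and x ~ U(0, 1)
y = [max(0.0, 1.0 + x1 + rng.gauss(0.0, 2.0)) for _, x1 in X]

ll_true = tobit_loglik((1.0, 1.0), 2.0, X, y)
ll_subsample_ols = tobit_loglik((2.0, 0.6), 2.0, X, y)   # close to OLS on y > 0
ll_wrong_sigma = tobit_loglik((1.0, 1.0), 1.5, X, y)
print(ll_true, ll_subsample_ols, ll_wrong_sigma)
```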
The score vector and the Hessian are given in Wooldridge, p. 526, eqs. (16.21) to (16.23).
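Derivatives
$\dfrac{\partial \ell_i(\theta)}{\partial \beta} = -1[y_i=0]\,\dfrac{\phi(x_i\beta/\sigma)\,x_i'}{\sigma[1 - \Phi(x_i\beta/\sigma)]} + 1[y_i>0]\,\dfrac{(y_i - x_i\beta)\,x_i'}{\sigma^2}$
$\dfrac{\partial \ell_i(\theta)}{\partial \sigma^2} = 1[y_i=0]\,\dfrac{\phi(x_i\beta/\sigma)\,x_i\beta}{2\sigma^3[1 - \Phi(x_i\beta/\sigma)]} + 1[y_i>0]\left\{\dfrac{(y_i - x_i\beta)^2}{2\sigma^4} - \dfrac{1}{2\sigma^2}\right\}$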
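Because sign errors are easy to make in these derivatives, it can help to check them against finite differences of $\ell_i(\theta)$. The sketch does this for one censored and one uncensored observation, with θ = (β', σ²)' as in the notes; the observation values and helper names (`loglik_i`, `score_i`, `numeric_score`) are hypothetical.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def loglik_i(beta, sigma2, x, y):
    """l_i(theta) from (16.20) with theta = (beta', sigma^2)'."""
    sigma = math.sqrt(sigma2)
    xb = sum(a * b for a, b in zip(x, beta))
    if y <= 0.0:
        return math.log(Phi(-xb / sigma))
    z = (y - xb) / sigma
    return math.log(phi(z)) - 0.5 * math.log(sigma2)

def score_i(beta, sigma2, x, y):
    """Analytic score: derivatives w.r.t. each beta_k, then w.r.t. sigma^2."""
    sigma = math.sqrt(sigma2)
    xb = sum(a * b for a, b in zip(x, beta))
    c = xb / sigma
    if y <= 0.0:
        ratio = phi(c) / Phi(-c)                  # phi / (1 - Phi)
        d_beta = [-ratio * xk / sigma for xk in x]
        d_s2 = ratio * xb / (2.0 * sigma ** 3)
    else:
        r = y - xb
        d_beta = [r * xk / sigma2 for xk in x]
        d_s2 = r * r / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma2)
    return d_beta + [d_s2]

def numeric_score(beta, sigma2, x, y, h=1e-6):
    """Central finite differences of loglik_i in each element of theta."""
    theta = list(beta) + [sigma2]
    grad = []
    for k in range(len(theta)):
        up, down = theta[:], theta[:]
        up[k] += h
        down[k] -= h
        grad.append((loglik_i(up[:-1], up[-1], x, y)
                     - loglik_i(down[:-1], down[-1], x, y)) / (2 * h))
    return grad

beta, sigma2 = [0.5, -0.8], 1.7
x = [1.0, 1.3]
for y in (0.0, 2.4):   # one censored, one uncensored observation
    print(y, score_i(beta, sigma2, x, y), numeric_score(beta, sigma2, x, y))
```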
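For the covariance matrix use the inverse of the information matrix;
for the information matrix use the negative expected Hessian $-E[H_i(\theta)|x_i]$:
$V = \left(\sum_{i=1}^N A(x_i, \theta)\right)^{-1}, \qquad A(x_i, \theta) = \begin{pmatrix} a_i\, x_i'x_i & b_i\, x_i' \\ b_i\, x_i & c_i \end{pmatrix}$
with
$a_i = -\sigma^{-2}\left\{x_i\gamma\,\phi_i - \dfrac{\phi_i^2}{1-\Phi_i} - \Phi_i\right\}$
$b_i = \dfrac{\sigma^{-3}}{2}\left\{(x_i\gamma)^2\phi_i + \phi_i - \dfrac{(x_i\gamma)\,\phi_i^2}{1-\Phi_i}\right\}$
$c_i = -\dfrac{\sigma^{-4}}{4}\left\{(x_i\gamma)^3\phi_i + (x_i\gamma)\,\phi_i - \dfrac{(x_i\gamma)^2\phi_i^2}{1-\Phi_i} - 2\Phi_i\right\}$
where $\gamma = \beta/\sigma$, $\phi_i = \phi(x_i\gamma)$, $\Phi_i = \Phi(x_i\gamma)$.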
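The closed-form elements $a_i$, $b_i$ and $c_i$ can be checked with the information matrix equality: for a correctly specified density, the expected outer product of the score equals $A(x_i, \theta)$. The sketch computes both for a single hypothetical $x_i$ (intercept plus one regressor), integrating over the uncensored part of y by the midpoint rule; helper names and parameter values are illustrative.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def info_closed_form(x, beta, sigma):
    """A(x_i, theta) with blocks a_i x_i'x_i, b_i x_i', b_i x_i, c_i; theta = (beta', sigma^2)'."""
    g = sum(a * b for a, b in zip(x, beta)) / sigma      # x_i * gamma with gamma = beta / sigma
    p, P, Q = phi(g), Phi(g), Phi(-g)                    # Q = 1 - Phi_i
    a = -(g * p - p * p / Q - P) / sigma ** 2
    b = (g * g * p + p - g * p * p / Q) / (2.0 * sigma ** 3)
    c = -(g ** 3 * p + g * p - g * g * p * p / Q - 2.0 * P) / (4.0 * sigma ** 4)
    k = len(x)
    A = [[a * x[r] * x[t] for t in range(k)] + [b * x[r]] for r in range(k)]
    A.append([b * x[t] for t in range(k)] + [c])
    return A

def info_outer_product(x, beta, sigma, steps=40_000):
    """E[s s' | x] = P(y=0) s0 s0' + integral over y > 0 of s(y) s(y)' f*(y) dy."""
    s2 = sigma * sigma
    xb = sum(a * b for a, b in zip(x, beta))
    g = xb / sigma
    k = len(x)
    ratio = phi(g) / Phi(-g)
    s0 = [-ratio * xj / sigma for xj in x] + [ratio * xb / (2.0 * sigma ** 3)]
    M = [[Phi(-g) * s0[r] * s0[t] for t in range(k + 1)] for r in range(k + 1)]
    upper = xb + 12.0 * sigma
    h = upper / steps
    for j in range(steps):
        y = (j + 0.5) * h
        e = y - xb
        w = phi(e / sigma) / sigma * h                   # f*(y) dy
        s = [e * xj / s2 for xj in x] + [e * e / (2.0 * s2 * s2) - 1.0 / (2.0 * s2)]
        for r in range(k + 1):
            for t in range(k + 1):
                M[r][t] += w * s[r] * s[t]
    return M

x, beta, sigma = [1.0, 0.7], [0.2, 0.9], 1.4
A = info_closed_form(x, beta, sigma)
M = info_outer_product(x, beta, sigma)
for row_a, row_m in zip(A, M):
    print([round(v, 6) for v in row_a], [round(v, 6) for v in row_m])
```

Summing $A(x_i, \hat\theta)$ over the sample and inverting gives the estimated covariance matrix $\hat V$.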
Presentation of estimation results (Table 16.1)
- Quantities of interest: $\hat\beta_j$ and their standard errors
- For corner solutions: give partial effects, computed at the estimates $\hat\beta$, $\hat\sigma$ and at selected values of x (e.g. the averages of all x)
- Report the value of the log-likelihood for testing
Marginal effects (Mroz data, dependent variable hours)
              OLS                 Tobit               P(y>0|x)          E(y|x,y>0)          E(y|x)
nwifeinc      -3.447 (1.35)       -8.814 (1.98)*      -0.003 (1.98)*    -3.746 (1.98)*      -5.326 (1.98)*
educ          28.761 (2.22)*      80.646 (3.74)**     0.028 (3.74)**    34.275 (3.74)**     48.734 (3.74)**
exper         65.673 (6.59)**     131.564 (7.61)**    0.045 (7.61)**    55.916 (7.61)**     79.504 (7.61)**
expersq       -0.700 (2.16)*      -1.864 (3.47)**     -0.001 (3.47)**   -0.792 (3.47)**     -1.127 (3.47)**
age           -30.512 (6.99)**    -54.405 (7.33)**    -0.019 (7.33)**   -23.123 (7.33)**    -32.877 (7.33)**
kidslt6       -442.090 (7.51)**   -894.022 (7.99)**   -0.307 (7.99)**   -379.968 (7.99)**   -540.257 (7.99)**
kidsge6       -32.779 (1.41)      -16.218 (0.42)      -0.006 (0.42)     -6.893 (0.42)       -9.801 (0.42)
Constant      1,330.482 (4.91)**  965.305 (2.16)*     0.331 (2.16)*     410.264 (2.16)*     583.333 (2.16)*
σ             750.18              1122.02
Observations  753                 753                 753               753                 753
R2 / log L    0.27                -3819.09
* significant at 5%; ** significant at 1%
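* Mroz data (paths as in the original set-up)
use d:\stata\micro\data\mroz
* OLS
reg hours nwifeinc educ exper expersq age kidslt6 kidsge6
outreg using d:\stata\micro\out\mroz_tobit, nol replace
* Tobit, left-censored at 0
tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)
outreg using d:\stata\micro\out\mroz_tobit, nol append
* marginal effects (dtobit and outreg are user-written commands)
* margin(p): P(y>0|x); margin(c): E(y|x,y>0); margin(u): E(y|x)
dtobit
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(p)
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(c)
outreg using d:\stata\micro\out\mroz_tobit, nol append margin(u)
* the same three marginal effects with mfx
mfx, predict(pr(0,.))      // dP(y>0|x)/dx
mfx, predict(e(0,.))       // dE(y|x,y>0)/dx
mfx, predict(ystar(0,.))   // dE(y|x)/dx for y = max(0,y*); e(.,.) would target E(y*|x) = xb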
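The three marginal-effect columns of the table are the Tobit coefficients multiplied by three scale factors, all evaluated at the same index $c = x\hat\beta/\hat\sigma$: $\phi(c)/\hat\sigma$ for $P(y>0|x)$, $\theta(c)$ for $E(y|x, y>0)$ and $\Phi(c)$ for $E(y|x)$. The Python sketch below backs out c from one reported ratio (the educ entry of the $E(y|x)$ column divided by the Tobit coefficient) and uses it to reproduce the other columns. Treating the evaluation point as a single x (assumed here to be the sample means) is an assumption of this check, not something the notes state.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def Phi_inv(p, lo=-8.0, hi=8.0):
    """Inverse standard normal CDF by bisection (adequate for a check like this)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Values copied from the table: Tobit coefficients and the three marginal-effect columns.
names = ["nwifeinc", "educ", "exper", "expersq", "age", "kidslt6", "kidsge6", "Constant"]
tobit = [-8.814, 80.646, 131.564, -1.864, -54.405, -894.022, -16.218, 965.305]
me_prob = [-0.003, 0.028, 0.045, -0.001, -0.019, -0.307, -0.006, 0.331]
me_cond = [-3.746, 34.275, 55.916, -0.792, -23.123, -379.968, -6.893, 410.264]
me_uncond = [-5.326, 48.734, 79.504, -1.127, -32.877, -540.257, -9.801, 583.333]
sigma = 1122.02

# Back out c = x*beta/sigma from Phi(c) = (dE(y|x)/dx_educ) / beta_educ, cf. (16.16).
c = Phi_inv(me_uncond[1] / tobit[1])
lam = phi(c) / Phi(c)
scale_prob = phi(c) / sigma            # dP(y>0|x)/dx_j = beta_j * phi(c) / sigma
scale_cond = 1.0 - lam * (c + lam)     # theta(c) from (16.11)
scale_uncond = Phi(c)                  # Phi(c) from (16.16)

for j, name in enumerate(names):
    print(f"{name:9s} {tobit[j] * scale_prob:9.4f} {tobit[j] * scale_cond:10.3f} "
          f"{tobit[j] * scale_uncond:10.3f}")
print("index x*beta/sigma at the evaluation point:", round(c, 3))
```

All three reproduced columns match the table up to rounding, which confirms that each column is the Tobit coefficient times one common factor.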
5.6 Specification Issues in Tobit Models
Neglected heterogeneity
$y = \max(0, x\beta + \gamma q + u)$, $u|x, q \sim N(0, \sigma^2)$
where q is unobserved, independent of x and normally distributed.
The new error term γq + u is normally distributed and independent of x.
NO PROBLEM (but partial effects conditional on q cannot be computed)
Endogenous right-hand-side variable
The model now is
(16.26) $y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + u_1)$
(16.27) $y_2 = z\delta_2 + v_2 = z_1\delta_{21} + z_2\delta_{22} + v_2$
$(u_1, v_2)$ are normally distributed with mean 0 and independent of z.
$corr(u_1, v_2) \neq 0$ → y_2 is endogenous:
y_2 and u_1 are correlated → INCONSISTENCY of Tobit
Needed: consistent estimators of δ_1 and α_1
2-step procedure (Smith and Blundell) or full maximum likelihood
2-step approach (Smith and Blundell)
same approach as the two-step estimator for the binary probit
Since $(u_1, v_2)$ have a bivariate normal distribution, we have
(16.28) $u_1 = \theta_1 v_2 + e_1$
where $\theta_1 = \eta_1/\tau_2^2$, $\eta_1 = cov(v_2, u_1)$, $\tau_2^2 = Var(v_2)$. e_1 is independent of z and v_2 and has variance $\tau_1^2$.
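Decomposition (16.28) is the linear projection of $u_1$ on $v_2$, which for jointly normal errors leaves an independent normal remainder. A small simulation with hypothetical values ($\sigma_1 = 1.5$, $\tau_2 = 2$, corr$(u_1, v_2) = 0.6$) illustrates that the projection coefficient is $\theta_1 = \eta_1/\tau_2^2$ and that the remainder has variance $\tau_1^2 = \sigma_1^2 - \eta_1^2/\tau_2^2$. Seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(11)                            # arbitrary seed
sigma1, tau2, rho = 1.5, 2.0, 0.6                  # hypothetical sd(u1), sd(v2), corr(u1, v2)
eta1 = rho * sigma1 * tau2                         # cov(v2, u1)
theta1 = eta1 / tau2 ** 2                          # population coefficient in (16.28)
tau1_sq = sigma1 ** 2 - eta1 ** 2 / tau2 ** 2      # Var(e1)

n = 100_000
v2, u1 = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    v2.append(tau2 * z1)
    u1.append(sigma1 * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2))   # corr(u1, v2) = rho

# Sample projection of u1 on v2 (both have mean zero by construction).
theta_hat = sum(a * b for a, b in zip(u1, v2)) / sum(b * b for b in v2)
e1 = [a - theta_hat * b for a, b in zip(u1, v2)]
tau1_sq_hat = sum(e * e for e in e1) / n
print(theta1, theta_hat, tau1_sq, tau1_sq_hat)
```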
Inserting (16.28) into (16.26) gives
(16.29) $y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 + e_1)$
If we could observe v_2, we could estimate (16.29) by Tobit.
But v_2 can be estimated by OLS of (16.27).
Step 1: estimate (16.27) by OLS and compute $\hat v_2 = y_2 - z\hat\delta_2$.
Step 2: estimate a Tobit of y_1 on z_1, y_2 and $\hat v_2$. This yields consistent estimators of δ_1, α_1, θ_1 and $\tau_1^2$.
The t-value on $\hat v_2$ is the test statistic for the hypothesis that y_2 is exogenous.
The test does not depend on the distributional assumption in (16.27).
Estimation with ML
If the test rejects exogeneity of y_2: ML estimation of (16.26) and (16.27).
This requires the joint distribution of $(y_1, y_2)$:
(16.32) $f(y_1, y_2|z) = f(y_1|y_2, z)\,f(y_2|z)$
The density $f(y_2|z)$ is $N(z\delta_2, \tau_2^2)$.
$y_1|z, y_2$ follows a Tobit model whose latent variable has expectation
$z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 = z_1\delta_1 + \alpha_1 y_2 + (\eta_1/\tau_2^2)(y_2 - z\delta_2)$
and variance $\tau_1^2 = \sigma_1^2 - \eta_1^2/\tau_2^2$,
with $\sigma_1^2 = Var(u_1)$, $\eta_1 = Cov(v_2, u_1)$, $\tau_2^2 = Var(v_2)$.
From this we can derive the log-likelihood (insert the two densities into (16.32)).
Heteroskedasticity and nonnormality
Both heteroskedasticity and nonnormality make Tobit inconsistent.
Corner solution model: heteroskedasticity and nonnormality change the functional form of $E(y|x)$ and $E(y|x, y>0)$,
and therefore also affect the marginal effects.
Testing for heteroskedasticity and nonnormality is therefore necessary:
Heteroskedasticity:
specify $Var(u|x) = \sigma^2\exp(z\delta)$, z a subvector of x
use an LM (score) statistic to test δ = 0
LM test (Wooldridge, p. 534)
Tobit with heteroskedasticity
clear
set obs 10000
generate x = runiform()
generate u = rnormal(0, 2)       // normal, mean 0, sd 2
generate u2 = u*exp(x)           // heteroskedastic error: sd grows with x
generate y2 = 1 + x + u2         // latent y*
replace y2 = 0 if y2 < 0         // censoring at zero
regress y2 x                     // OLS, all observations
regress y2 x if y2 > 0           // OLS, positive values only
tobit y2 x, ll(0)                // Tobit

              OLS, all          OLS, positive     Tobit
              observations      values only
x             1.990 (21.77)**   3.168 (30.56)**   2.054 (15.77)**
Constant      1.318 (24.78)**   1.848 (30.90)**   0.525 (6.87)**
Observations  10000             6775              10000
R-squared     0.05              0.12
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
Nonnormality:
either specify the distribution in maximum likelihood and use an LR test,
or test within standard Tobit estimation (conditional moment test, Pagan and Vella).
Nonnormality in Stata: conditional moment test with tobcm
“Appropriateness” of the Tobit model
estimate a probit for 1(y>0) to obtain $\hat\gamma$
compare $\hat\gamma$ with the normalised Tobit coefficients $\hat\beta/\hat\sigma$
both should be of similar size (no sign changes)
Estimation under less restrictive assumptions
β can be estimated consistently without distributional assumptions on u and without assuming that x and u are independent:
(16.34) $y^* = x\beta + u$, $Med(u|x) = 0$
This implies $Med(y^*|x) = x\beta$. If u|x is symmetrically distributed around zero, the mean and the median are equal.
For g(y) nondecreasing: $Med[g(y)] = g[Med(y)]$
(16.35) $Med(y|x) = \max[0, Med(y^*|x)] = \max(0, x\beta)$
(since $y = \max(0, y^*)$ is nondecreasing in y*).
β can be estimated by the following minimisation problem:
(16.36) $\min_\beta \sum_{i=1}^N |y_i - \max(0, x_i\beta)|$
Censored Least Absolute Deviations (CLAD) estimator (Powell, 1984)
Problem: the CLAD estimator is not informative about $P(y>0|x)$ and $E(y|x, y>0)$, and in general not about $E(y|x)$ either.
Estimation with CLAD
clear
set obs 10000
generate x = runiform()
generate u = rnormal(0, 2)
generate u2 = u*exp(x)           // heteroskedastic, median zero given x
generate y2 = 1 + x + u2
replace y2 = 0 if y2 < 0
tobit y2 x, ll(0)
clad y2 x, ll(0)                 // CLAD (user-written command)

              Tobit             CLAD
x             2.054 (0.127)     1.057 (0.164)
Constant      0.525 (0.074)     0.982 (0.067)
Standard errors in parentheses
* significant at 5%; ** significant at 1%
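The CLAD criterion (16.36) is easy to write down; minimising it is harder because it is neither smooth nor convex. The sketch below uses a deliberately crude grid search (an illustrative choice, not the algorithm behind the user-written `clad` command) on data generated like the Stata example above, with median-zero heteroskedastic errors. Grid, seed and sample size are assumptions.

```python
import math
import random

def clad_objective(b0, b1, x, y):
    """Sum of |y_i - max(0, b0 + b1 x_i)|, the CLAD criterion (16.36)."""
    return sum(abs(yi - max(0.0, b0 + b1 * xi)) for xi, yi in zip(x, y))

rng = random.Random(99)                                  # arbitrary seed
n = 10_000
x = [rng.random() for _ in range(n)]
u2 = [rng.gauss(0.0, 2.0) * math.exp(xi) for xi in x]    # heteroskedastic, Med(u2|x) = 0
y = [max(0.0, 1.0 + xi + ui) for xi, ui in zip(x, u2)]   # true beta0 = beta1 = 1

# Crude grid search over (b0, b1) in [0, 3] x [0, 3] with step 0.25.
grid = [0.25 * k for k in range(13)]
best = min((clad_objective(b0, b1, x, y), b0, b1) for b0 in grid for b1 in grid)
_, b0_hat, b1_hat = best
print("CLAD grid estimate:", b0_hat, b1_hat)

# The Tobit estimates from the table fit the conditional median much worse.
print(clad_objective(1.0, 1.0, x, y), clad_objective(0.525, 2.054, x, y))
```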
Alternative to the Tobit model: hurdle model
Drawback of Tobit: $\partial P(y>0|x)/\partial x_j$ and $\partial E(y|x, y>0)/\partial x_j$ have the same sign;
the relative effects of continuous variables are identical.
Rewrite the Tobit density function
$f(y|x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y=0]}\,\{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\}^{1[y>0]}$
by multiplying and dividing by $\{\Phi(x_i\beta/\sigma)\}^{1[y>0]}$:
$f(y|x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y=0]}\,\{\Phi(x_i\beta/\sigma)\}^{1[y>0]}$
$\qquad \times\ \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\,/\,\Phi(x_i\beta/\sigma)\}^{1[y>0]}$
Line 1: probit; line 2: truncated regression (next chapter)
Truncated normal distribution:
$f(y|x_i, y>0) = (1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\,\{\Phi(x_i\beta/\sigma)\}^{-1}$
where the term $\{\Phi(x_i\beta/\sigma)\}^{-1}$ ensures that the density integrates to 1 over y > 0.
The hurdle model has the following density of y given x:
$f(y|x_i) = \{1 - \Phi(x_i\gamma)\}^{1[y=0]}\,\{\Phi(x_i\gamma)\}^{1[y>0]}$
$\qquad \times\ \{(1/\sigma)\,\phi[(y - x_i\beta)/\sigma]\,/\,\Phi(x_i\beta/\sigma)\}^{1[y>0]}$
This equals the Tobit density if $\gamma = \beta/\sigma$.
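That the hurdle density collapses to the Tobit density when γ = β/σ can be confirmed numerically. The sketch writes both log densities for a single observation (hypothetical names `tobit_logf`, `hurdle_logf`, `trunc_f`; $x_i$ enters only through the scalar indices $x_i\beta$ and $x_i\gamma$) and checks that the truncated normal part integrates to one over y > 0. Parameter values are made up.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_logf(y, xb, sigma):
    """Log of the Tobit density (16.19)."""
    if y <= 0.0:
        return math.log(Phi(-xb / sigma))
    return math.log(phi((y - xb) / sigma) / sigma)

def trunc_f(y, xb, sigma):
    """Truncated normal density f(y | x, y > 0)."""
    return phi((y - xb) / sigma) / sigma / Phi(xb / sigma)

def hurdle_logf(y, xg, xb, sigma):
    """Probit part with index xg, truncated-normal part with index xb."""
    if y <= 0.0:
        return math.log(Phi(-xg))
    return math.log(Phi(xg)) + math.log(trunc_f(y, xb, sigma))

xb, sigma = 0.6, 1.8
for y in (0.0, 0.4, 3.1):
    print(y, tobit_logf(y, xb, sigma), hurdle_logf(y, xb / sigma, xb, sigma))

# Midpoint-rule check that the truncated density integrates to 1 over y > 0.
steps = 50_000
h = (xb + 12.0 * sigma) / steps
mass = sum(trunc_f((k + 0.5) * h, xb, sigma) for k in range(steps)) * h
print(mass)
```

With any other probit index (γ ≠ β/σ), the two log densities differ, which is what the likelihood ratio test exploits.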
This restriction can be tested, e.g. by a likelihood ratio test:
$2(\log L_{Trunc} + \log L_{Prob} - \log L_{Tob}) \sim \chi^2_K$
Mroz data example
                Tobit               Truncreg            Probit
nwifeinc        -8.814 (1.98)*      0.153 (0.03)        -0.012 (2.48)*
educ            80.646 (3.74)**     -29.853 (1.31)      0.131 (5.18)**
exper           131.564 (7.61)**    72.623 (3.42)**     0.123 (6.59)**
expersq         -1.864 (3.47)**     -0.944 (1.55)       -0.002 (3.15)**
age             -54.405 (7.33)**    -27.444 (3.31)**    -0.053 (6.23)**
kidslt6         -894.022 (7.99)**   -484.711 (3.15)**   -0.868 (7.33)**
kidsge6         -16.218 (0.42)      -102.657 (2.36)*    0.036 (0.83)
Constant        965.305 (2.16)*     2,123.516 (4.39)**  0.270 (0.53)
σ               1122.02 (26.97)     850.76 (19.42)      1 (-)
Log likelihood  -3819.09            -3390.65            -401.30
Observations    753                 428                 753
Absolute value of t statistics in parentheses
* significant at 5%; ** significant at 1%
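Both diagnostics can be computed directly from the table: the normalised Tobit coefficients $\hat\beta/\hat\sigma$ compared with the probit $\hat\gamma$ (the informal appropriateness check), and the likelihood ratio statistic $2(\log L_{Trunc} + \log L_{Prob} - \log L_{Tob})$ with K = 8 degrees of freedom (seven regressors plus the constant). The sketch uses the closed-form χ² tail probability for an even number of degrees of freedom to stay within the Python standard library; `chi2_sf_even` is a helper name introduced here.

```python
import math

names = ["nwifeinc", "educ", "exper", "expersq", "age", "kidslt6", "kidsge6", "Constant"]
tobit = [-8.814, 80.646, 131.564, -1.864, -54.405, -894.022, -16.218, 965.305]
probit = [-0.012, 0.131, 0.123, -0.002, -0.053, -0.868, 0.036, 0.270]
sigma_tobit = 1122.02

# "Appropriateness" check: beta_hat / sigma_hat should be close to the probit gamma_hat.
normalised = [b / sigma_tobit for b in tobit]
for name, bn, g in zip(names, normalised, probit):
    print(f"{name:9s} {bn:8.4f} {g:8.4f}")
sign_changes = [n for n, bn, g in zip(names, normalised, probit) if bn * g < 0]

# LR test of gamma = beta / sigma, using the log likelihoods from the table.
ll_tobit, ll_trunc, ll_probit = -3819.09, -3390.65, -401.30
lr = 2.0 * (ll_trunc + ll_probit - ll_tobit)
df = len(names)                                    # K = 8 restrictions

def chi2_sf_even(x, k):
    """P(chi2_k > x) for even k: exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))

p_value = chi2_sf_even(lr, df)
print("sign changes:", sign_changes, "LR =", round(lr, 2), "p =", p_value)
```

With the reported numbers, the only sign change is for kidsge6, whose coefficient is insignificant in both the Tobit and the probit, while the LR statistic of 54.28 lies far above the 5% critical value of $\chi^2_8$ (about 15.5), so the Tobit restriction is rejected.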
5.8 Applying Censored Regression to Panel Data
Pooled Tobit
$y_{it} = \max(0, x_{it}\beta + u_{it})$, t = 1, 2, ..., T
$u_{it}|x_{it} \sim N(0, \sigma^2)$
no strict exogeneity of x_it required (lagged y is also allowed); u can be serially dependent
Partial log-likelihood function:
$\sum_{i=1}^N \sum_{t=1}^T \ell_{it}(\beta, \sigma^2)$
(as if it were a cross-section data set of size N·T)
but a robust variance matrix estimator is needed
Unobserved Effects Tobit Models under Strict Exogeneity
$y_{it} = \max(0, x_{it}\beta + c_i + u_{it})$, t = 1, 2, ..., T
$u_{it}|x_i, c_i \sim N(0, \sigma_u^2)$
strict exogeneity of x_it, but x_i and c_i may be correlated; the composite error c_i + u_it is serially dependent, while the u_it are taken to be independent over t given (x_i, c_i)
define a distribution for the unobserved effect (Chamberlain):
$c_i|x_i \sim N(\psi + \bar x_i\xi, \sigma_a^2)$
$y_{it} = \max(0, \psi + x_{it}\beta + \bar x_i\xi + a_i + u_{it})$
$u_{it}|x_i, a_i \sim N(0, \sigma_u^2)$
$a_i|x_i \sim N(0, \sigma_a^2)$
This is the random effects Tobit model with an additional set of time-constant explanatory variables $\bar x_i$.
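In practice, the Chamberlain device amounts to adding the unit-specific time averages $\bar x_i$ as extra regressors and then estimating a standard random effects Tobit. The sketch below only builds those regressor rows from a small hypothetical balanced panel (one regressor, T = 3); the helper name `chamberlain_rows` is illustrative, and the estimation step itself (e.g. Stata's `xttobit`) is outside the sketch.

```python
def chamberlain_rows(x_panel):
    """For each unit i and period t, return the row (1, x_it, xbar_i).

    x_panel[i][t] is the single regressor of unit i in period t (balanced panel assumed).
    """
    rows = []
    for i, xs in enumerate(x_panel):
        xbar = sum(xs) / len(xs)                 # time average over t = 1, ..., T
        for t, x_it in enumerate(xs, start=1):
            rows.append({"i": i, "t": t, "const": 1.0, "x": x_it, "xbar": xbar})
    return rows

# Hypothetical panel: N = 2 units, T = 3 periods.
x_panel = [[1.0, 2.0, 6.0],
           [0.5, 0.5, 2.0]]
for row in chamberlain_rows(x_panel):
    print(row)
```

Because $\bar x_i$ is constant within a unit, its coefficient ξ picks up the correlation between $c_i$ and the regressors, while β is identified from the within-unit variation in $x_{it}$.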
Dynamic Unobserved Effects Tobit Models
$y_{it} = \max(0, z_{it}\delta + \rho_1 y_{i,t-1} + c_i + u_{it})$, t = 1, 2, ..., T
$u_{it}|z_i, y_{i,t-1}, \dots, y_{i0}, c_i \sim N(0, \sigma_u^2)$, t = 1, 2, ..., T
strict exogeneity of z_it
define a distribution for the unobserved effect:
$c_i|y_{i0}, z_i \sim N(\psi + \xi_0 y_{i0} + z_i\xi, \sigma_a^2)$
$y_{it} = \max(0, \psi + z_{it}\delta + \rho_1 y_{i,t-1} + \xi_0 y_{i0} + z_i\xi + a_i + u_{it})$
$u_{it}|z_i, y_{i,t-1}, \dots, y_{i0}, a_i \sim N(0, \sigma_u^2)$
$a_i|y_{i0}, z_i \sim N(0, \sigma_a^2)$
This is the random effects Tobit model with explanatory variables $(z_{it}, y_{i,t-1}, z_i, y_{i0})$.