8/17/2019 Lecture 7slides
HE3021, NTU Lecture 7 FENG Qu
LECTURE 7: BINARY CHOICE MODELS
1 Modeling Choice Decision
2 Binary Choice Models: LPM, Probit, Logit
3 Estimation and Testing
1. Modeling Choice Decision
Example of (binary) choice decision: having another child
Other examples:
• female labor force participation: work or be a housewife
• marriage decision: married or single
• applying for mortgage: accepted or denied
• admitted or not by NTU
• go to graduate school or work: yes or no
• vote or not vote
• buy or not buy
(Q: Common feature?)
Binary choice: y = 1 (yes) or y = 0 (no)  (decision variable)
How to model people’s choice decision?
Example: having another child
Economics: benefit-(opportunity) cost analysis (Gary Becker, 1981)
y = 1 for having another child; y = 0 for not having.
u1 denotes the benefit from having another child, e.g., the policy packages above, tax rebate, baby bonus, a bigger HDB flat, subsidized childcare, happiness, etc.;
u0 denotes the (opportunity) cost, e.g., economic costs (of pregnancy, birth, raising, education, healthcare), leisure, the forgone pay increase, etc.
Then, the decision can be modeled by:
y = 1 if u1 − u0 > 0; y = 0 otherwise
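The benefit-cost rule above can be sketched in a few lines; the utility values used below are hypothetical, purely for illustration:

```python
def choose(u1, u0):
    """Binary choice: y = 1 if the net benefit u1 - u0 is positive, else y = 0."""
    return 1 if u1 - u0 > 0 else 0

print(choose(3.0, 2.5))  # benefit exceeds cost -> 1
print(choose(1.0, 2.0))  # cost exceeds benefit -> 0
```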
Another example: female labor force participation
Economics: the outcome of a market process
• Demanders: offer a wage based on labor's expected marginal product
• Suppliers: decide whether or not to accept the offer depending on whether it exceeds their own reservation wages
From the women's side, being in the labor force or not is a trade-off between the wage and the alternative: taking care of kids, housekeeping, leisure, ….
y = 1 for working; y = 0 for not being in the labor force. If u1 denotes the utility from working and u0 denotes the utility from staying at home, then the decision can be modeled by:
y = 1 if u1 − u0 > 0; y = 0 otherwise
Like in linear regression models, suppose that the difference u1 − u0 can be explained by observable characteristics and an unobserved error term:
u1 − u0 = β0 + β1x1 + ⋯ + βkxk + ε.
Thus,
P(y = 1|x) = P(u1 − u0 > 0) = P(β0 + β1x1 + ⋯ + βkxk + ε > 0)
Suppose the random variable ε follows a distribution with CDF G(∙). Thus
P(y = 1|x) = P(ε > −(β0 + β1x1 + ⋯ + βkxk))
           = 1 − G(−(β0 + β1x1 + ⋯ + βkxk))
If the distribution is symmetric around zero, then
P(y = 1|x) = G(β0 + β1x1 + ⋯ + βkxk) and
P(y = 0|x) = 1 − P(y = 1|x) = 1 − G(β0 + β1x1 + ⋯ + βkxk).
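The symmetry step can be checked numerically with the two CDFs used later in the lecture (standard normal and logistic); this is a plain-Python sketch, not part of the original slides:

```python
import math

def G_probit(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def G_logit(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1 + math.exp(z))

# Symmetry around zero implies 1 - G(-z) = G(z), which is the step that turns
# P(y = 1|x) = 1 - G(-(b0 + b1*x1 + ... + bk*xk)) into G(b0 + b1*x1 + ... + bk*xk).
for z in (-2.0, -0.5, 0.0, 1.3):
    assert abs((1 - G_probit(-z)) - G_probit(z)) < 1e-12
    assert abs((1 - G_logit(-z)) - G_logit(z)) < 1e-12
```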
LPM: OLS with a binary dependent variable (y = 1 or 0)
yi = β0 + β1xi1 + ⋯ + βkxik + ui, i = 1, …, n.
Example 1: birth intention in Singapore (data set: babybonus.dta)
dep. var. (y): (yb) 1 for intention to have another child; 0 otherwise
indep. var. (x): (a scale measure of the) policy package (cpl), current number of children (num), husband's monthly income (hmi), wife's education (we), wife's age (wa)
. reg yb cpl num hmi we wa

      Source |       SS       df       MS              Number of obs =     122
-------------+------------------------------           F(  5,   116) =   12.96
       Model |  10.9213986     5  2.18427972           Prob > F      =  0.0000
    Residual |  19.5458145   116  .168498401           R-squared     =  0.3585
-------------+------------------------------           Adj R-squared =  0.3308
       Total |  30.4672131   121   .25179515           Root MSE      =  .41049

          yb |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cpl |   .1865787    .076402     2.44   0.016     .0352548    .3379026
         num |  -.2759397   .0454467    -6.07   0.000    -.3659526   -.1859267
         hmi |  -.0005362   .0362719    -0.01   0.988    -.0723773    .0713049
          we |  -.0551932   .0474809    -1.16   0.247    -.1492351    .0388486
          wa |  -.0175984   .0085417    -2.06   0.042    -.0345164   -.0006805
       _cons |   1.521159   .2663598     5.71   0.000     .9935997    2.048718
Result:
The policy package (cpl) has a big positive effect on people's birth intention.
(Q: how to interpret the coefficient .187?)
Prediction:
Stata command: predict yhat, xb
(Q: What does the predicted value ŷ = 0.499 mean? Other predicted values: 1.121, −.016)
First, for the LPM, E[y|x] = P(y = 1|x) = β0 + β1x1 + ⋯ + βkxk, so
ŷ = β̂0 + β̂1x1 + ⋯ + β̂kxk = (estimated) P(y = 1|x)
predicted value ŷ: the predicted probability of "success" (having another child)
0.499 is couple 1's predicted probability of having another child, given the other factors.
Second, βj is the partial effect of xj on the probability of "success" P(y = 1|x):
βj = ∂E[y|x]/∂xj = ∂P(y = 1|x)/∂xj,  j = 1, …, k
β̂j: the estimated partial effect (the ceteris paribus interpretation)
E.g., β̂1 = .187 can be interpreted as: an additional unit of the policy package increases the probability of having another child by 18.7 percentage points, holding other factors fixed.
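As a sketch of how a fitted value is formed, the slide's estimated LPM coefficients can be plugged into the formula for ŷ; the covariate values for the couple below are hypothetical, not taken from babybonus.dta:

```python
# Estimated LPM coefficients from the slide's regression of yb on cpl, num, hmi, we, wa
b = {"_cons": 1.521159, "cpl": .1865787, "num": -.2759397,
     "hmi": -.0005362, "we": -.0551932, "wa": -.0175984}

# Hypothetical couple (illustrative covariate values only)
x = {"cpl": 2.0, "num": 1, "hmi": 3.0, "we": 2.0, "wa": 30}

# yhat = b0 + b1*x1 + ... + bk*xk = estimated P(y = 1 | x)
yhat = b["_cons"] + sum(b[k] * x[k] for k in x)
print(round(yhat, 3))  # this couple's predicted probability of another child

# Note: the partial effect of cpl is its coefficient, .187, at every x
# (a property specific to the LPM).
```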
Example 2: Women's Labor Force Participation (data set: MROZ.dta)
inlf = β0 + β1∙nwifeinc + β2∙educ + β3∙kidslt6 + u, i = 1, …, n
o inlf: 1 for being in the labor force and 0 for being a housewife;
o nwifeinc: family income excluding the wife's earnings
o educ: wife's education
o kidslt6: number of children less than 6 years old
Estimation: run the multiple regression:

. reg inlf nwifeinc educ kidslt6

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  3,   749) =   34.38
       Model |  22.3586557     3  7.45288523           Prob > F      =  0.0000
    Residual |    162.3691   749  .216781175           R-squared     =  0.1210
-------------+------------------------------           Adj R-squared =  0.1175
       Total |  184.727756   752  .245648611           Root MSE      =   .4656

        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0077404    .001519    -5.10   0.000    -.0107224   -.0047583
        educ |   .0572465   .0077912     7.35   0.000     .0419513    .0725418
     kidslt6 |  -.2227047   .0325987    -6.83   0.000    -.2867004    -.158709
       _cons |   .0737593   .0931678     0.79   0.429    -.1091417    .2566604
Advantages of LPM:
• easy to implement: OLS
• simple to interpret results
• straightforward to test hypotheses
• OLS estimator is consistent
(Q: why?)
Disadvantages of LPM:
1. heteroskedasticity:
   o use heteroskedasticity-robust inference (exercise)
2. the predicted probability ŷ could be < 0 or > 1!
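Disadvantage 2 is easy to demonstrate with the slide's own LPM estimates: for some covariate values the fitted "probability" leaves [0, 1]. The couple below is hypothetical:

```python
# Estimated LPM coefficients from the babybonus.dta regression on the slides
b = {"_cons": 1.521159, "cpl": .1865787, "num": -.2759397,
     "hmi": -.0005362, "we": -.0551932, "wa": -.0175984}

# Hypothetical older couple with four children (illustrative values only)
x = {"cpl": 0.5, "num": 4, "hmi": 3.0, "we": 3.0, "wa": 45}

yhat = b["_cons"] + sum(b[k] * x[k] for k in x)
print(yhat)  # negative, so not a valid probability
assert yhat < 0
```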
Graphic Interpretation of LPM
Example: explain the mortgage application outcome by the debt-payments-to-income (P/I) ratio
LPM: [figure omitted: the binary outcome plotted against the P/I ratio with the fitted LPM line]
Probit and Logit Models
LPM model:
E[y|x] = P(y = 1|x) = G(β0 + β1x1 + ⋯ + βkxk) = β0 + β1x1 + ⋯ + βkxk
Probit model: standard normal CDF Φ(∙)
G(β0 + β1x1 + ⋯ + βkxk) = Φ(β0 + β1x1 + ⋯ + βkxk)
Logit model: logistic CDF
G(β0 + β1x1 + ⋯ + βkxk) = exp(β0 + β1x1 + ⋯ + βkxk) / [1 + exp(β0 + β1x1 + ⋯ + βkxk)]
For CDFs, 0 ≤ Φ(∙) ≤ 1 and 0 ≤ exp(∙)/[1 + exp(∙)] ≤ 1, so the estimated probability is
ŷ = Φ(β̂0 + β̂1x1 + ⋯ + β̂kxk) for the probit model
and ŷ = exp(β̂0 + β̂1x1 + ⋯ + β̂kxk) / [1 + exp(β̂0 + β̂1x1 + ⋯ + β̂kxk)] for the logit model
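By contrast with the LPM, both link functions keep the fitted probability inside the unit interval for any value of the index. A minimal plain-Python check (the function names are mine):

```python
import math

def Phi(z):
    """Standard normal CDF (probit link), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def Lambda(z):
    """Logistic CDF (logit link): exp(z) / (1 + exp(z))."""
    return 1 / (1 + math.exp(-z))

# Even for very large or very small index values, the probabilities stay in [0, 1].
for z in (-8.0, -1.0, 0.0, 2.5, 8.0):
    assert 0.0 <= Phi(z) <= 1.0
    assert 0.0 <= Lambda(z) <= 1.0
```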
probit regression of women's labor force participation
Stata command: probit inlf nwifeinc educ kidslt6

. probit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.34923
Iteration 2:   log likelihood =  -465.4538
Iteration 3:   log likelihood = -465.45302

Probit regression                                 Number of obs   =        753
                                                  LR chi2(3)      =      98.84
                                                  Prob > chi2     =     0.0000
Log likelihood = -465.45302                       Pseudo R2       =     0.0960

        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0231133   .0045451    -5.09   0.000    -.0320216    -.014205
        educ |   .1666664   .0235149     7.09   0.000      .120578    .2127547
     kidslt6 |  -.6525247   .0996887    -6.55   0.000     -.847911   -.4571383
       _cons |  -1.245253   .2714193    -4.59   0.000    -1.777225   -.7132807

.1667 is the coefficient of educ from the probit regression (.057 in the LPM).
(Q: why so different?)
Compare the estimation results with those of the LPM.
logit regression of women's labor force participation
Stata command: logit inlf nwifeinc educ kidslt6

. logit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.55427
Iteration 2:   log likelihood = -465.55673
Iteration 3:   log likelihood = -465.55373

Logistic regression                               Number of obs   =        753
                                                  LR chi2(3)      =      98.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -465.55373                       Pseudo R2       =     0.0958

        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0385731   .0078653    -4.90   0.000    -.0539887   -.0231574
        educ |   .2741035   .0399976     6.85   0.000     .1957097    .3524973
     kidslt6 |  -1.068074    .167187    -6.39   0.000    -1.395755    -.740394
       _cons |  -2.046709   .4552589    -4.50   0.000       -2.939   -1.154418

.274 is the coefficient of educ from the logit regression (.057 in the LPM and .1667 in the probit).
(Q: why so different?)
Partial (or marginal) effect of xj on P(y = 1|x) = G(β0 + β1x1 + ⋯ + βkxk)
ceteris paribus effect: the effect of a one-unit change in xj on the probability of success P(y = 1|x) = G(β0 + β1x1 + ⋯ + βkxk), holding other factors fixed.
(i) Continuous xj:
∂P(y = 1|x)/∂xj = G′(β0 + β1x1 + ⋯ + βkxk)∙βj,  j = 1, …, k
For the probit model, G(z) = Φ(z) and G′(z) = φ(z)
For the logit model, G(z) = exp(z)/[1 + exp(z)] and G′(z) = λ(z) = exp(z)/[1 + exp(z)]²
(for the LPM, G′(z) = 1)
(ii) Discrete xj, e.g., x1 changes from 1 to 0: the partial effect is defined as
G(β0 + β1 + β2x2 + ⋯ + βkxk) − G(β0 + β2x2 + ⋯ + βkxk)
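The continuous-case formula can be checked against the mfx output shown on the later slides: plugging the slide's probit coefficients and the sample means of the regressors into φ(∙)∙β̂ reproduces the reported marginal effect of educ (about .0654). A plain-Python sketch:

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Probit coefficients and regressor means taken from the slides (MROZ.dta)
beta = {"nwifeinc": -.0231133, "educ": .1666664, "kidslt6": -.6525247}
cons = -1.245253
xbar = {"nwifeinc": 20.129, "educ": 12.2869, "kidslt6": .237716}

z = cons + sum(beta[k] * xbar[k] for k in beta)  # index evaluated at the means
me_educ = phi(z) * beta["educ"]                  # partial effect of educ
print(round(me_educ, 4))                         # matches mfx's .0654 for educ
```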
Remarks:
1. Different from the LPM, the partial effects in probit and logit models are not constant; they depend on the values of x. The slope parameter βj is NOT the partial effect of xj on the probability of "success", implying that the interpretations of coefficients in these 3 models are different and not comparable.
2. Since G′ > 0 for probit and logit, the direction of the partial effect of xj depends on the sign of βj.
3. Calculation of marginal effects at the mean values of regressors in probit and logit regressions: Stata command: mfx
Probit regression:
following the probit regression, run the Stata command: mfx
(Note: the marginal effects are evaluated at the mean values of x)

. probit inlf nwifeinc educ kidslt6
(probit output as on the earlier slide)

. mfx

Marginal effects after probit
      y  = Pr(inlf) (predict)
         =  .57228348

variable |    dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]        X
---------+------------------------------------------------------------------
nwifeinc |  -.0090691     .00178   -5.08   0.000   -.012566  -.005572    20.129
    educ |   .0653958     .00921    7.10   0.000    .047335   .083457   12.2869
 kidslt6 |  -.2560349     .03923   -6.53   0.000   -.332929  -.179141   .237716
logit regression:
(Q: any interesting finding from these results?)

. logit inlf nwifeinc educ kidslt6
(logit output as on the earlier slide)

. mfx

Marginal effects after logit
      y  = Pr(inlf) (predict)
         =  .57219848

variable |    dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]        X
---------+------------------------------------------------------------------
nwifeinc |  -.0094422     .00193   -4.90   0.000   -.013222  -.005663    20.129
    educ |   .0670971     .00977    6.87   0.000    .047943   .086251   12.2869
 kidslt6 |  -.2614511     .04111   -6.36   0.000   -.342023   -.18088   .237716
the mean values of the regressors:

. sum nwifeinc educ kidslt6

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    nwifeinc |       753    20.12896     11.6348   .0290575         96
        educ |       753    12.28685    2.280246          5         17
     kidslt6 |       753    .2377158     .523959          0          3

LPM result:

. reg inlf nwifeinc educ kidslt6
(LPM output as on the earlier slide; e.g., the educ coefficient is .0572465)

This empirical example tells us that though the estimates of β2, etc. are different in the LPM, probit and logit regressions, their partial effects evaluated at the mean values of the regressors are very close. (Q: why does this make sense?)
Partial (marginal) effects in the LPM: β̂j
Partial effects in the probit regression: φ(∙)∙β̂j
Partial effects in the logit regression: [exp(∙)/(1 + exp(∙))²]∙β̂j
A simple rule for comparing coefficients in these 3 models: the partial effects are considered to be approximately equal:
β̂j,LPM ≈ φ(∙)∙β̂j,probit ≈ [exp(∙)/(1 + exp(∙))²]∙β̂j,logit
Since φ(0) ≈ 0.4 for probit and exp(0)/[1 + exp(0)]² = 0.25 for logit, we obtain:
β̂j,LPM ≈ 0.4∙β̂j,probit ≈ 0.25∙β̂j,logit
or
β̂j,probit ≈ 2.5∙β̂j,LPM,  β̂j,logit ≈ 4∙β̂j,LPM  and  β̂j,probit ≈ 0.625∙β̂j,logit
Example above (educ): β̂LPM = 0.057, β̂probit = 0.167, β̂logit = 0.274
Calculation of the predicted probability ŷ in the probit/logit model:
Stata command: predict ypr, pr (after the probit/logit regression)
Check whether ŷ lies in the unit interval and compare the predicted probabilities in the probit and logit models.
Note:
In Stata 11, the calculation of marginal effects has 3 cases: the marginal effect at the mean, the marginal effect at a representative value, and the average marginal effect. The Stata commands are:
margins, dydx(*) atmeans
margins, dydx(*) at(nwifeinc=0 educ=6 kidslt6=1)
margins, dydx(*)
3. Estimation and Testing: Probit and Logit
We need to calculate the likelihood function in probit and logit models:
P(yi = 1|xi) = G(β0 + β1xi1 + ⋯ + βkxik),  P(yi = 0|xi) = 1 − G(∙)
or equivalently, f(yi|xi) = G(∙)^yi ∙ [1 − G(∙)]^(1−yi)
Then the likelihood function is
L(β0, β1, …, βk) = ∏(i=1…n) f(yi|xi) = ∏(i=1…n) G(∙)^yi [1 − G(∙)]^(1−yi)
or
ln L(β0, β1, …, βk) = Σ(i=1…n) { yi ln G(∙) + (1 − yi) ln[1 − G(∙)] }
as a function of the unknown parameters β0, β1, …, βk.
G(z) = Φ(z) for the probit model; G(z) = exp(z)/[1 + exp(z)] for the logit model.
Maximizing ln L(β0, β1, …, βk) gives the probit (or logit) estimates of the β's.
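A minimal sketch of this maximization for the logit case, on a tiny made-up data set: the log likelihood is coded exactly as above and climbed by gradient ascent, a crude stand-in for the Newton-type iterations Stata uses. The data and step size are illustrative assumptions:

```python
import math

def G(z):
    """Logistic CDF."""
    return 1 / (1 + math.exp(-z))

def log_lik(b0, b1, data):
    """ln L = sum_i [ y_i ln G(z_i) + (1 - y_i) ln(1 - G(z_i)) ], z_i = b0 + b1*x_i."""
    return sum(y * math.log(G(b0 + b1 * x)) + (1 - y) * math.log(1 - G(b0 + b1 * x))
               for x, y in data)

# Made-up (x, y) observations, illustrative only
data = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1), (5.0, 1)]

# Gradient ascent on ln L; for the logit, d lnL / d b_j = sum_i (y_i - G(z_i)) * x_ij
b0, b1, step = 0.0, 0.0, 0.02
for _ in range(5000):
    g0 = sum(y - G(b0 + b1 * x) for x, y in data)
    g1 = sum((y - G(b0 + b1 * x)) * x for x, y in data)
    b0 += step * g0
    b1 += step * g1

assert log_lik(b0, b1, data) > log_lik(0.0, 0.0, data)  # likelihood has improved
print(round(b0, 3), round(b1, 3))
```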
For probit and logit models, we can't solve for the maximum explicitly. We need to use numerical methods (iterations), e.g., the iteration log in the probit output below.
Properties of MLE:
o consistent
o asymptotically normal
o asymptotically efficient

. probit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.34923
Iteration 2:   log likelihood =  -465.4538
Iteration 3:   log likelihood = -465.45302

(remaining probit output as on the earlier slide)
Hypothesis Testing in probit and logit models: same as in OLS
Example 1: H0: β1 = β2 = β3 = 0 in the logit model
Stata commands:
quietly logit inlf nwifeinc educ kidslt6
test nwifeinc educ kidslt6

. test nwifeinc educ kidslt6

 ( 1)  [inlf]nwifeinc = 0
 ( 2)  [inlf]educ = 0
 ( 3)  [inlf]kidslt6 = 0

           chi2(  3) =   78.00
         Prob > chi2 =    0.0000

H0 is rejected since the p-value is 0.

Example 2: linear restriction: H0: 2β2 − β3 = 0

. test 2*educ - kidslt6 = 0

 ( 1)  2*[inlf]educ - [inlf]kidslt6 = 0

           chi2(  1) =   65.05
         Prob > chi2 =    0.0000

H0 is rejected since the p-value is 0.
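The LR chi2(3) = 98.64 in the logit header is the likelihood-ratio counterpart of the Wald test in Example 1, and can be recomputed from the two log likelihoods reported in the logit iteration log (Iteration 0 is the intercept-only fit):

```python
# Log likelihoods from the slide's logit output
ll_restricted = -514.8732     # Iteration 0: intercept-only model
ll_unrestricted = -465.55373  # converged logit with the three regressors

lr = 2 * (ll_unrestricted - ll_restricted)  # LR statistic, chi-squared with 3 df
print(round(lr, 2))  # 98.64, matching the reported LR chi2(3)
```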
Example 3: H0: 2β2 − β3 = 0 and β1 + β2 = 0
Stata commands:
test (2*educ - kidslt6 = 0) (educ + nwifeinc = 0)

. test (2*educ - kidslt6 = 0) (educ + nwifeinc = 0)

 ( 1)  2*[inlf]educ - [inlf]kidslt6 = 0
 ( 2)  [inlf]nwifeinc + [inlf]educ = 0

           chi2(  2) =   69.16
         Prob > chi2 =    0.0000

H0 is rejected since the p-value is 0.