
    HE3021, NTU Lecture 7 FENG Qu

    LECTURE 7: BINARY CHOICE MODELS 

    1 Modeling Choice Decision

    2 Binary Choice Models: LPM, Probit, Logit

    3 Estimation and Testing


    1. Modeling Choice Decision

Example of a (binary) choice decision: having another child

Other examples:

• female labor force participation: work or be a housewife

• marriage decision: married or single

• applying for mortgage: accepted or denied

• admitted or not by NTU

• go to graduate school or work: yes or no

• vote or not vote

• buy or not buy

(Q: Common feature?)

Binary choice: y = 1 (yes) or y = 0 (no), where y is the decision variable.


    How to model people’s choice decision?

Example: having another child

Economics: benefit vs. (opportunity) cost analysis (Gary Becker, 1981)

$y = 1$ for having another child; $y = 0$ for not having one.

$u_1$ denotes the benefit from having another child, e.g., the policy packages above, tax rebates, the baby bonus, a bigger HDB flat, subsidized childcare, happiness, etc.

$u_0$ denotes the (opportunity) cost, e.g., the economic costs (of pregnancy, birth, child-raising, education, healthcare), leisure, forgone pay increases, etc.

Then, the decision can be modeled by:

$y = 1$ if $u_1 - u_0 > 0$; $y = 0$ otherwise


    Another example: female labor force participation

    Economics: the outcome of a market process

• Demanders: offer a wage based on labor's expected marginal product

• Suppliers: decide whether or not to accept the offer, depending on whether it exceeds their own reservation wage

From the woman's side, being in the labor force or not is a trade-off between the wage and the alternatives: taking care of the kids, housekeeping, leisure, ….

$y = 1$ for working; $y = 0$ for not being in the labor force. If $u_1$ denotes the utility from working and $u_0$ denotes the utility from staying at home, then the decision can be modeled by:

$y = 1$ if $u_1 - u_0 > 0$; $y = 0$ otherwise


As in linear regression models, suppose that the difference $u_1 - u_0$ can be explained by observable characteristics and an unobserved error term:

$u_1 - u_0 = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + e.$

Thus,

$P(y = 1 \mid x) = P(u_1 - u_0 > 0) = P(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + e > 0)$

Suppose the random variable $e$ follows a distribution with CDF $F(\cdot)$. Thus

$P(y = 1 \mid x) = P(e > -(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k))$

$= 1 - F(-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k))$

If the distribution is symmetric, then $1 - F(-z) = F(z)$, so

$P(y = 1 \mid x) = F(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$ and

$P(y = 0 \mid x) = 1 - P(y = 1 \mid x) = 1 - F(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k).$

E.g., if $e$ follows the standard normal distribution, then $F(\cdot) = \Phi(\cdot)$, which gives the probit model below; a standard logistic $e$ gives the logit model.

2. Binary Choice Models: LPM, Probit, Logit

LPM: OLS with a binary dependent variable ($y = 1$ or $0$)

$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i, \quad i = 1, \ldots, n.$

    Example 1: birth intention in Singapore (data set: babybonus.dta)

dep. var. (y): yb = 1 for intention to have another child; 0 otherwise

indep. var. (x): (a scale measure of) the policy package (cpl), current number of children (num), husband's monthly income (hmi), wife's education (we), wife's age (wa)

. reg yb cpl num hmi we wa

      Source |       SS       df       MS              Number of obs =     122
-------------+------------------------------           F(  5,   116) =   12.96
       Model |  10.9213986     5  2.18427972           Prob > F      =  0.0000
    Residual |  19.5458145   116  .168498401           R-squared     =  0.3585
-------------+------------------------------           Adj R-squared =  0.3308
       Total |  30.4672131   121   .25179515           Root MSE      =  .41049

------------------------------------------------------------------------------
          yb |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cpl |   .1865787    .076402     2.44   0.016     .0352548    .3379026
         num |  -.2759397   .0454467    -6.07   0.000    -.3659526   -.1859267
         hmi |  -.0005362   .0362719    -0.01   0.988    -.0723773    .0713049
          we |  -.0551932   .0474809    -1.16   0.247    -.1492351    .0388486
          wa |  -.0175984   .0085417    -2.06   0.042    -.0345164   -.0006805
       _cons |   1.521159   .2663598     5.71   0.000     .9935997    2.048718
------------------------------------------------------------------------------


Result:

The policy package (cpl) has a large positive effect on people's birth intention.

(Q: how to interpret the coefficient .187?)

Prediction:

Stata command: predict yhat, xb

(Q: What does the predicted value $\hat{y} = 0.499$ mean? What about 1.121 and -.016?)
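A minimal sketch of the full prediction step as a do-file (assuming babybonus.dta is in the working directory; the summary shows whether any fitted values fall outside [0, 1]):

    use babybonus.dta, clear
    reg yb cpl num hmi we wa
    predict yhat, xb     // fitted value = predicted probability in the LPM
    sum yhat             // min and max reveal predictions below 0 or above 1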


First, for the LPM, $E[y \mid x] = P(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, so

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k = \widehat{P}(y = 1 \mid x)$

The predicted value $\hat{y}$ is the predicted probability of "success" (having another child); 0.499 is couple 1's predicted probability of having another child, given the other factors.

Second, $\beta_j$ is the partial effect of $x_j$ on the probability of "success" $P(y = 1 \mid x)$:

$\beta_j = \dfrac{\partial E[y \mid x]}{\partial x_j} = \dfrac{\partial P(y = 1 \mid x)}{\partial x_j}, \quad j = 1, \ldots, k$

$\hat{\beta}_j$: the estimated partial effect (the ceteris paribus interpretation)

E.g., $\hat{\beta}_1 = .187$ can be interpreted as: one additional unit of the policy package increases the probability of having another child by 18.7 percentage points, holding other factors fixed.


    Example 2: Women’s Labor Force Participation (data set: MROZ.dta)

$inlf_i = \beta_0 + \beta_1 \, nwifeinc_i + \beta_2 \, educ_i + \beta_3 \, kidslt6_i + u_i, \quad i = 1, \ldots, n$

inlf: 1 for being in the labor force and 0 for being a housewife;

    nwifeinc: husband’s earning

    educ: wife’s education

kidslt6: number of children less than 6 years old

    Estimation: run the multiple regression:

. reg inlf nwifeinc educ kidslt6

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  3,   749) =   34.38
       Model |  22.3586557     3  7.45288523           Prob > F      =  0.0000
    Residual |    162.3691   749  .216781175           R-squared     =  0.1210
-------------+------------------------------           Adj R-squared =  0.1175
       Total |  184.727756   752  .245648611           Root MSE      =   .4656

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0077404    .001519    -5.10   0.000    -.0107224   -.0047583
        educ |   .0572465   .0077912     7.35   0.000     .0419513    .0725418
     kidslt6 |  -.2227047   .0325987    -6.83   0.000    -.2867004    -.158709
       _cons |   .0737593   .0931678     0.79   0.429    -.1091417    .2566604
------------------------------------------------------------------------------
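E.g., the kidslt6 estimate says that each additional child under 6 lowers the participation probability by about 22.3 percentage points, holding nwifeinc and educ fixed.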


Advantages of LPM:

• easy to implement: OLS

• simple to interpret results

• straightforward to test hypotheses

• OLS estimator is consistent

(Q: why?)

Disadvantages of LPM:

1. heteroskedasticity: use heteroskedasticity-robust inference (exercise; see the sketch below)

2. the predicted probability $\hat{y}$ could be < 0 or > 1!
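A minimal sketch of the robust-inference fix for Example 2 as a do-file (assuming MROZ.dta is in the working directory):

    use MROZ.dta, clear
    reg inlf nwifeinc educ kidslt6, vce(robust)   // same coefficients, heteroskedasticity-robust SEs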


    Graphic Interpretation of LPM

Example: explain mortgage application outcomes by the debt payments-to-income (P/I) ratio

LPM: [figure: binary mortgage application outcome plotted against the P/I ratio, with the fitted linear probability line]


    Probit and Logit Models

LPM model:

$E[y \mid x] = P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$

Probit model: standard normal CDF $\Phi(\cdot)$

$G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = \Phi(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$

Logit model: logistic CDF

$G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = \dfrac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}$

For CDFs, $0 \le \Phi(\cdot) \le 1$ and $0 \le \frac{\exp(\cdot)}{1 + \exp(\cdot)} \le 1$, so the estimated probabilities

$\hat{y} = \Phi(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)$ for the probit model and

$\hat{y} = \dfrac{\exp(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)}$ for the logit model

always lie in the unit interval.


Probit regression of women's labor force participation

Stata command: probit inlf nwifeinc educ kidslt6

.1667 is the coefficient of educ from the probit regression (.057 in the LPM).

(Q: why so different?)

Compare the estimation results with those of the LPM:

. probit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.34923
Iteration 2:   log likelihood =  -465.4538
Iteration 3:   log likelihood = -465.45302

Probit regression                                 Number of obs   =        753
                                                  LR chi2(3)      =      98.84
                                                  Prob > chi2     =     0.0000
Log likelihood = -465.45302                       Pseudo R2       =     0.0960

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0231133   .0045451    -5.09   0.000    -.0320216    -.014205
        educ |   .1666664   .0235149     7.09   0.000      .120578    .2127547
     kidslt6 |  -.6525247   .0996887    -6.55   0.000     -.847911   -.4571383
       _cons |  -1.245253   .2714193    -4.59   0.000    -1.777225   -.7132807
------------------------------------------------------------------------------


Logit regression of women's labor force participation

Stata command: logit inlf nwifeinc educ kidslt6

.274 is the coefficient of educ from the logit regression (.057 in the LPM and .1667 in the probit).

(Q: why so different?)

. logit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.55427
Iteration 2:   log likelihood = -465.55673
Iteration 3:   log likelihood = -465.55373

Logistic regression                               Number of obs   =        753
                                                  LR chi2(3)      =      98.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -465.55373                       Pseudo R2       =     0.0958

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0385731   .0078653    -4.90   0.000    -.0539887   -.0231574
        educ |   .2741035   .0399976     6.85   0.000     .1957097    .3524973
     kidslt6 |  -1.068074    .167187    -6.39   0.000    -1.395755    -.740394
       _cons |  -2.046709   .4552589    -4.50   0.000       -2.939   -1.154418
------------------------------------------------------------------------------


Partial (or marginal) effect on $P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$

ceteris paribus effect: the effect of a one-unit change in $x_j$ on the probability of success $P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$, holding the other factors fixed.

(i) Continuous $x_j$:

$\dfrac{\partial P(y = 1 \mid x)}{\partial x_j} = G'(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)\,\beta_j, \quad j = 1, \ldots, k$

For the probit model, $G(z) = \Phi(z)$ and $G'(z) = \phi(z)$.

For the logit model, $G(z) = \dfrac{\exp(z)}{1 + \exp(z)}$ and $G'(z) = g(z) = \dfrac{\exp(z)}{(1 + \exp(z))^2}$.

(For the LPM, $G'(z) = 1$.)

(ii) Discrete $x_j$, e.g., $x_1$ changing from 0 to 1: the partial effect is defined as

$G(\beta_0 + \beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) - G(\beta_0 + \beta_2 x_2 + \cdots + \beta_k x_k)$


    Remarks:

1. Unlike in the LPM, the partial effects in probit and logit models are not constant; they depend on the values of $x$. The slope parameter $\beta_j$ is NOT the partial effect of $x_j$ on the probability of "success", so the interpretations of the coefficients in these 3 models are different and not directly comparable.

2. Since $G' > 0$ for probit and logit, the direction of the partial effect of $x_j$ depends on the sign of $\beta_j$.

3. Calculation of marginal effects at the mean values of the regressors in probit and logit regressions: Stata command: mfx


    Probit regression:

Following the probit regression, run the Stata command: mfx

    (Note: at the mean values of x)

. probit inlf nwifeinc educ kidslt6
  (output as above)

. mfx

Marginal effects after probit
      y  = Pr(inlf) (predict)
         =  .57228348

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+--------------------------------------------------------------------
nwifeinc |  -.0090691      .00178   -5.08   0.000  -.012566 -.005572    20.129
    educ |   .0653958      .00921    7.10   0.000   .047335  .083457   12.2869
 kidslt6 |  -.2560349      .03923   -6.53   0.000  -.332929 -.179141   .237716
------------------------------------------------------------------------------
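As a rough check, the educ row can be reproduced by hand: at the means the probit index is $-1.245 - .0231 \times 20.129 + .1667 \times 12.287 - .6525 \times .2377 \approx 0.182$, so $\phi(0.182) \approx 0.392$ and the marginal effect of educ is $0.392 \times .1667 \approx .0654$, matching the dy/dx column.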


Logit regression:

    (Q: any interesting finding from these results?)

. logit inlf nwifeinc educ kidslt6
  (output as above)

. mfx

Marginal effects after logit
      y  = Pr(inlf) (predict)
         =  .57219848

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+--------------------------------------------------------------------
nwifeinc |  -.0094422      .00193   -4.90   0.000  -.013222 -.005663    20.129
    educ |   .0670971      .00977    6.87   0.000   .047943  .086251   12.2869
 kidslt6 |  -.2614511      .04111   -6.36   0.000  -.342023  -.18088   .237716
------------------------------------------------------------------------------


The mean values of the regressors:

. sum nwifeinc educ kidslt6

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    nwifeinc |       753    20.12896     11.6348   .0290575         96
        educ |       753    12.28685    2.280246          5         17
     kidslt6 |       753    .2377158     .523959          0          3

LPM result:

. reg inlf nwifeinc educ kidslt6
  (output as above)

This empirical example tells us that although the estimates of the $\beta_j$'s differ across the LPM, probit and logit regressions, their partial effects evaluated at the mean values of the regressors are very close. (Q: why does this make sense?)


Partial (marginal) effects in the LPM: $\hat{\beta}_j$

Partial effects in the probit regression: $\phi(\cdot)\,\hat{\beta}_j$

Partial effects in the logit regression: $\dfrac{\exp(\cdot)}{(1 + \exp(\cdot))^2}\,\hat{\beta}_j$

A simple rule for comparing coefficients in these 3 models: the partial effects are considered to be approximately equal,

$\hat{\beta}_{LPM} \approx \phi(\cdot)\,\hat{\beta}_{probit} \approx \dfrac{\exp(\cdot)}{(1 + \exp(\cdot))^2}\,\hat{\beta}_{logit}$

Since $\phi(0) \approx 0.4$ for probit and $\dfrac{\exp(0)}{(1 + \exp(0))^2} = 0.25$ for logit, we obtain

$\hat{\beta}_{LPM} \approx 0.4\,\hat{\beta}_{probit} \approx 0.25\,\hat{\beta}_{logit}$

or

$\hat{\beta}_{probit} \approx 2.5\,\hat{\beta}_{LPM}$, $\hat{\beta}_{logit} \approx 4\,\hat{\beta}_{LPM}$, and $\hat{\beta}_{probit} \approx 0.625\,\hat{\beta}_{logit}$

Example above: $\hat{\beta}_{LPM} = 0.057$, $\hat{\beta}_{probit} = 0.167$, $\hat{\beta}_{logit} = 0.274$
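Checking this rule with the educ coefficients above: $2.5 \times 0.057 \approx 0.14$ (vs. the probit 0.167) and $4 \times 0.057 \approx 0.23$ (vs. the logit 0.274); likewise $0.625 \times 0.274 \approx 0.17 \approx 0.167$. The rule is rough, but it gives the right order of magnitude.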


Calculation of the predicted probability $\hat{y}$ in probit/logit models:

Stata command: predict ypr, pr  (after the probit/logit regression)

Check whether $\hat{y}$ lies in the unit interval and compare the predicted probabilities in the probit and logit models.
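A minimal sketch of this check as a do-file (assuming the MROZ data are loaded; ppr and lpr are hypothetical variable names):

    quietly probit inlf nwifeinc educ kidslt6
    predict ppr, pr      // probit predicted probabilities
    quietly logit inlf nwifeinc educ kidslt6
    predict lpr, pr      // logit predicted probabilities
    sum ppr lpr          // both lie in [0, 1] by construction
    corr ppr lpr         // the two sets of predictions are typically very close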

    Note:

In Stata 11, the calculation of marginal effects has 3 cases: the marginal effect at the means, the marginal effect at a representative value, and the average marginal effect. The Stata commands are, respectively:

margins, dydx(*) atmeans
margins, dydx(*) at(nwifeinc=0 educ=6 kidslt6=1)
margins, dydx(*)


    3. Estimation and Testing: Probit and Logit

We need to calculate the likelihood function in probit and logit models:

$P(y_i = 1 \mid x_i) = G(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}), \quad P(y_i = 0 \mid x_i) = 1 - G(\cdot)$

or, equivalently, $f(y_i) = G(\cdot)^{y_i}\,[1 - G(\cdot)]^{1 - y_i}$

Then the likelihood function is

$L(\beta_0, \beta_1, \ldots, \beta_k) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} G(\cdot)^{y_i}\,[1 - G(\cdot)]^{1 - y_i}$

or

$\ln L(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \{\, y_i \ln G(\cdot) + (1 - y_i) \ln[1 - G(\cdot)] \,\}$

as a function of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$.

$G(z) = \Phi(z)$ for the probit model; $G(z) = \dfrac{\exp(z)}{1 + \exp(z)}$ for the logit model.

Maximizing $\ln L(\beta_0, \beta_1, \ldots, \beta_k)$ gives the probit (or logit) estimates of the $\beta$'s.
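To see what the probit command does under the hood, here is a minimal sketch that hand-codes this probit log likelihood with Stata's ml routine (myprobit is a hypothetical program name; assumes the MROZ data are loaded). It should reproduce the probit estimates above:

    capture program drop myprobit
    program define myprobit
        args lnf xb                                  // lnf: obs-level log likelihood; xb: linear index
        quietly replace `lnf' = ln(normal(`xb'))  if $ML_y1 == 1
        quietly replace `lnf' = ln(normal(-`xb')) if $ML_y1 == 0
    end

    ml model lf myprobit (inlf = nwifeinc educ kidslt6)
    ml maximize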


For probit and logit models, we can't solve for the maximum explicitly. We need to use numerical methods (iterations), e.g., the iteration log that Stata reports:

. probit inlf nwifeinc educ kidslt6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -466.34923
Iteration 2:   log likelihood =  -465.4538
Iteration 3:   log likelihood = -465.45302

  (estimation output as above)

Properties of MLE:

• consistent

• asymptotically normal

• asymptotically efficient


Hypothesis testing in probit and logit models: same as in OLS

Example 1: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ in the logit model

Stata commands:

quietly logit inlf nwifeinc educ kidslt6
test nwifeinc educ kidslt6

. test nwifeinc educ kidslt6

 ( 1)  [inlf]nwifeinc = 0
 ( 2)  [inlf]educ = 0
 ( 3)  [inlf]kidslt6 = 0

          chi2(  3) =   78.00
        Prob > chi2 =    0.0000

$H_0$ is rejected since the p-value is 0.0000.

Example 2: linear restriction $H_0: 2\beta_2 - \beta_3 = 0$

. test 2*educ - kidslt6 = 0

 ( 1)  2*[inlf]educ - [inlf]kidslt6 = 0

          chi2(  1) =   65.05
        Prob > chi2 =    0.0000

$H_0$ is rejected since the p-value is 0.0000.


Example 3: $H_0: 2\beta_2 - \beta_3 = 0$ and $\beta_1 + \beta_2 = 0$

Stata command:

test (2*educ - kidslt6 = 0) (educ + nwifeinc = 0)

. test (2*educ - kidslt6 = 0) (educ + nwifeinc = 0)

 ( 1)  2*[inlf]educ - [inlf]kidslt6 = 0
 ( 2)  [inlf]nwifeinc + [inlf]educ = 0

          chi2(  2) =   69.16
        Prob > chi2 =    0.0000

$H_0$ is rejected since the p-value is 0.0000.