The Probit Model, Alexander Spermann, University of Freiburg, SoSe 2009


Page 1: The Probit Model

The Probit Model

Alexander Spermann, University of Freiburg

SoSe 2009

Page 2: The Probit Model

Course outline

1. Notation and statistical foundations

2. Introduction to the Probit model

3. Application

4. Coefficients and marginal effects

5. Goodness-of-fit

6. Hypothesis tests

Page 3: The Probit Model

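Notation and statistical foundations

1. Scalar notation:

Gujarati: $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$

Wooldridge: $y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_k x_{tk} + u_t$

2. Matrix notation:

$$Y = X\beta + \varepsilon \quad \text{or} \quad Y = X\beta + u, \qquad \hat{Y} = X\hat{\beta}$$

3. Vector notation for a single observation:

$$y_i = x_i'\beta + \varepsilon_i, \qquad x_i' = \begin{bmatrix} 1 & x_{i2} & \cdots & x_{ik} \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_k \end{bmatrix}'$$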

Page 4: The Probit Model

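Notation and statistical foundations – Vectors

• Column vector ($n \times 1$):

$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

• Transposed (row vector, $1 \times n$):

$$a' = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$

• Inner product:

$$a'b = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \sum_{i=1}^{n} a_i b_i$$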
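The inner product is the operation behind the linear index $x_i'\beta$ used throughout the probit model. A minimal sketch in plain Python follows; the function name `inner_product` and all numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
def inner_product(a, b):
    """Return a'b = sum_i a_i * b_i for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


# Hypothetical regressor vector (constant plus three regressors) and
# hypothetical coefficients; neither is taken from the slides.
x_i = [1.0, 2.0, 3.0, 0.0]
beta = [0.5, -1.0, 0.25, 2.0]

index = inner_product(x_i, beta)  # 0.5 - 2.0 + 0.75 + 0.0
print(index)
```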

Page 5: The Probit Model

Notation and statistical foundations – density function

• PDF: probability density function f(x)

• Example: Normal distribution:

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

• Example: Standard normal distribution: N(0,1), µ = 0, σ = 1

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}}$$

Page 6: The Probit Model

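Notation and statistical foundations – distributions

• Standard logistic distribution:

$$f(x) = \frac{e^x}{(1+e^x)^2}, \qquad \mu = 0, \quad \sigma^2 = \frac{\pi^2}{3}$$

• Exponential distribution:

$$f(x, \theta) = \frac{1}{\theta} \, e^{-x/\theta}, \qquad x \ge 0, \quad \theta > 0, \quad \mu = \theta, \quad \sigma^2 = \theta^2$$

• Poisson distribution:

$$f(x, \theta) = \frac{e^{-\theta}\theta^x}{x!}, \qquad \mu = \theta, \quad \sigma^2 = \theta$$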

Page 7: The Probit Model

Notation and statistical foundations – CDF

• CDF: cumulative distribution function F(x)

• Example: Standard normal distribution:

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}} \, dx$$

• The cdf is the integral of the pdf.
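The relation between pdf and cdf can be checked numerically. The sketch below, using only the Python standard library, evaluates the normal density, computes $\Phi(z)$ through the error function, and compares it with a trapezoid-rule integral of the density. The function names, the lower integration limit of -10 and the step count are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_by_integration(z, lower=-10.0, steps=20000):
    """Trapezoid-rule integral of the standard normal pdf from `lower` to z."""
    h = (z - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(z))
    for k in range(1, steps):
        total += normal_pdf(lower + k * h)
    return total * h


for z in (-1.0, 0.0, 1.96):
    print(z, round(normal_cdf(z), 4), round(cdf_by_integration(z), 4))
```

Both columns should agree to the printed precision, since the probability mass below -10 is negligible.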

Page 8: The Probit Model

Notation and statistical foundations – logarithms

• Rule I:

$$y = x z \quad \Rightarrow \quad \log y = \log x + \log z$$

• Rule II:

$$y = x^n \quad \Rightarrow \quad \log y = n \log x$$

• Rule III:

$$y = a x^b \quad \Rightarrow \quad \log y = \log a + b \log x$$

Page 9: The Probit Model

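Introduction to the Probit model – binary variables

• Binary dependent variable: $y \in \{0, 1\}$

• Why not use OLS instead?

[Figure: observations at y = 0 and y = 1 plotted against x, with the fitted OLS (linear) regression line]

• Nonlinear estimation, for example by maximum likelihood.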
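One way to see the problem with OLS is to fit a straight line to a binary outcome and inspect the fitted values. The sketch below uses a small hypothetical sample (not taken from the slides) and a hand-written simple regression; the point is only that a linear fit can produce "probabilities" below zero or above one.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y))
    sxx = sum((x_i - mean_x) ** 2 for x_i in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: a binary outcome that becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = ols_line(x, y)
fitted = [intercept + slope * x_i for x_i in x]
print([round(f, 3) for f in fitted])
```

In this sample the smallest fitted value is negative and the largest exceeds one, which cannot be interpreted as a probability.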

Page 10: The Probit Model

Introduction to the Probit model – latent variables

• Latent variable: Unobservable variable y* which can take all values in (-∞, +∞).

• Example: y* = Utility(Labour income) - Utility(Non labour income)

• Underlying latent model:

$$y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = \begin{cases} 1, & y_i^* > 0 \\ 0, & y_i^* \le 0 \end{cases}$$

Page 11: The Probit Model

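Introduction to the Probit model – latent variables

• Probit is based on a latent model:

$$
\begin{aligned}
P(y_i = 1 \mid x_i) &= P(y_i^* > 0 \mid x_i) \\
&= P(x_i'\beta + \varepsilon_i > 0 \mid x_i) \\
&= P(\varepsilon_i > -x_i'\beta \mid x_i) \\
&= 1 - F(-x_i'\beta)
\end{aligned}
$$

[Figure: density $\phi(\varepsilon)$ with the points $-x_i'\beta$ and $x_i'\beta$ marked on the horizontal axis]

• Assumption: Error terms are independent and normally distributed:

$$P(y_i = 1 \mid x_i) = 1 - \Phi\!\left(\frac{-x_i'\beta}{\sigma}\right) = \Phi(x_i'\beta), \qquad \sigma \equiv 1,$$

because of symmetry.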
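The derivation can be illustrated by simulation: draw standard normal errors, form the latent variable, and compare the share of positive draws with $\Phi(x_i'\beta)$. This is a sketch only; the index values, the number of draws and the seed are arbitrary assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_share(index, n_draws=100_000, seed=0):
    """Share of draws with y* = index + eps > 0, where eps ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        y_star = index + rng.gauss(0.0, 1.0)
        if y_star > 0:
            hits += 1
    return hits / n_draws


# `index` stands for a hypothetical value of x_i'beta.
for index in (-1.0, 0.0, 0.8):
    print(index, round(simulate_share(index), 3), round(normal_cdf(index), 3))
```

The simulated share and the theoretical probability should agree up to simulation noise.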

Page 12: The Probit Model

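Introduction to the Probit model – CDF

• Example:

[Figure: standard normal CDF $\Phi(z)$ plotted against $z = x_i'\beta$; the curve passes through 0.5 at $z = 0$, and $\Phi(-z) = 0.2$ corresponds to $\Phi(z) = 1 - 0.2 = 0.8$]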

Page 13: The Probit Model

Introduction to the Probit model – CDF Probit vs. Logit

• F(z) lies between zero and one

[Figure: two panels, CDF of Probit and CDF of Logit, each plotted against $z = x_i'\beta$]
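Since the plots are not reproduced here, the two link functions can be compared numerically. The sketch below evaluates the standard normal CDF (probit) and the standard logistic CDF (logit) on a few points; no rescaling is applied, so the logistic curve reflects its larger variance of π²/3. The function names are illustrative.

```python
import math


def probit_cdf(z):
    """Standard normal CDF, the link used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Standard logistic CDF, the link used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z = {z:+.1f}  probit = {probit_cdf(z):.4f}  logit = {logit_cdf(z):.4f}")
```

Both functions equal 0.5 at z = 0 and stay between zero and one; the logistic CDF approaches its limits more slowly in the tails.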

Page 14: The Probit Model

Introduction to the Probit model – PDF Probit vs. Logit

[Figure: two panels, PDF of Probit and PDF of Logit]

Page 15: The Probit Model

Introduction to the Probit model – The ML principle

• Joint density:

$$f(y \mid x, \beta) = \prod_i F(x_i'\beta)^{y_i} \left[1 - F(x_i'\beta)\right]^{(1-y_i)} = \prod_i F_i^{y_i} (1 - F_i)^{(1-y_i)}$$

• Log likelihood function:

$$\ln L = \sum_i \left[ y_i \ln F_i + (1 - y_i) \ln(1 - F_i) \right]$$

Page 16: The Probit Model

Introduction to the Probit model – The ML principle

• The principle of ML: Which value of β maximizes the probability of observing the given sample?

$$
\begin{aligned}
\frac{\partial \ln L}{\partial \beta} &= \sum_i \left[ \frac{y_i f_i}{F_i} + (1 - y_i) \frac{-f_i}{1 - F_i} \right] x_i \\
&= \sum_i \frac{y_i - F_i}{F_i (1 - F_i)} \, f_i \, x_i = 0
\end{aligned}
$$
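The log likelihood and its first derivative can be written down directly from the formulas above. The sketch below does so for the probit case ($F = \Phi$, $f = \phi$) and checks the analytic score against a central finite difference. The toy sample and the trial value of β are hypothetical.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(beta, X, y):
    """ln L = sum_i [ y_i ln F_i + (1 - y_i) ln(1 - F_i) ] with F_i = Phi(x_i'beta)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        F = Phi(sum(b * x for b, x in zip(beta, x_i)))
        total += y_i * math.log(F) + (1 - y_i) * math.log(1.0 - F)
    return total


def score(beta, X, y):
    """d ln L / d beta = sum_i (y_i - F_i) / (F_i (1 - F_i)) * f_i * x_i."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        z = sum(b * x for b, x in zip(beta, x_i))
        F, f = Phi(z), phi(z)
        weight = (y_i - F) * f / (F * (1.0 - F))
        for j, x_ij in enumerate(x_i):
            grad[j] += weight * x_ij
    return grad


# Hypothetical toy sample: a constant and one regressor.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.5], [1.0, 2.0]]
y = [0, 1, 0, 1, 1]
beta = [0.1, 0.4]

# Central finite differences as an independent check of the analytic score.
eps = 1e-6
numeric = []
for j in range(len(beta)):
    up, down = list(beta), list(beta)
    up[j] += eps
    down[j] -= eps
    numeric.append((log_likelihood(up, X, y) - log_likelihood(down, X, y)) / (2 * eps))

print(score(beta, X, y))
print(numeric)
```

The ML estimate is the value of β at which every component of the score equals zero.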

Page 17: The Probit Model

Introduction to the Probit model – Example

• Example taken from Greene, Econometric Analysis, 5th ed. 2003, ch. 17.3.

• 10 observations of a discrete distribution

• Random sample: 5, 0, 1, 1, 0, 3, 2, 3, 4, 1

• PDF:

$$f(x_i, \theta) = \frac{e^{-\theta}\theta^{x_i}}{x_i!}$$

• Joint density:

$$f(x_1, x_2, \dots, x_{10} \mid \theta) = \prod_{i=1}^{10} f(x_i, \theta) = \frac{e^{-10\theta} \, \theta^{\sum_{i=1}^{10} x_i}}{\prod_{i=1}^{10} x_i!} = \frac{e^{-10\theta} \, \theta^{20}}{207{,}360}$$

• Which value of θ makes occurrence of the observed sample most probable?

Page 18: The Probit Model

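Introduction to the Probit model – Example

$$\ln L(\theta) = -10\theta + 20 \ln \theta - 12.242$$

$$\frac{d \ln L(\theta)}{d\theta} = -10 + \frac{20}{\theta} = 0 \quad \Rightarrow \quad \hat{\theta} = 2$$

$$\frac{d^2 \ln L(\theta)}{d\theta^2} = -\frac{20}{\theta^2} < 0 \quad \Rightarrow \quad \text{Maximum}$$

[Figure: $L(\theta \mid x)$ and $\ln L(\theta \mid x)$ plotted against θ]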
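The Poisson example can be reproduced in a few lines. The sketch below evaluates the log likelihood for the ten observations, uses the closed-form solution of the first-order condition (the sample mean), and confirms it with a coarse grid search. The grid bounds and step are arbitrary choices.

```python
import math

sample = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]


def poisson_log_likelihood(theta, xs):
    """ln L(theta) = -n*theta + (sum x_i) ln(theta) - sum ln(x_i!)."""
    n = len(xs)
    log_factorials = sum(math.log(math.factorial(x)) for x in xs)
    return -n * theta + sum(xs) * math.log(theta) - log_factorials


# Constant term of the log likelihood: ln(207,360), about 12.242.
constant = sum(math.log(math.factorial(x)) for x in sample)

# The first-order condition -n + sum(x)/theta = 0 gives the sample mean.
theta_hat = sum(sample) / len(sample)

# Grid search from 0.50 to 4.00 in steps of 0.01 as a cross-check.
grid = [k / 100 for k in range(50, 401)]
theta_grid = max(grid, key=lambda t: poisson_log_likelihood(t, sample))

print(round(constant, 3), theta_hat, theta_grid)
```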

Page 19: The Probit Model

Application

• Analysis of the effect of a new teaching method in economic sciences

• Data:

Obs.  GPA   TUCE  PSI  Grade     Obs.  GPA   TUCE  PSI  Grade
1     2.66  20    0    0         17    2.75  25    0    0
2     2.89  22    0    0         18    2.83  19    0    0
3     3.28  24    0    0         19    3.12  23    1    0
4     2.92  12    0    0         20    3.16  25    1    1
5     4.00  21    0    1         21    2.06  22    1    0
6     2.86  17    0    0         22    3.62  28    1    1
7     2.76  17    0    0         23    2.89  14    1    0
8     2.87  21    0    0         24    3.51  26    1    0
9     3.03  25    0    0         25    3.54  24    1    1
10    3.92  29    0    1         26    2.83  27    1    1
11    2.63  20    0    0         27    3.39  17    1    1
12    3.32  23    0    0         28    2.67  24    1    0
13    3.57  23    0    0         29    3.65  21    1    1
14    3.26  25    0    1         30    4.00  23    1    1
15    3.53  26    0    0         31    3.10  21    1    0
16    2.74  19    0    0         32    2.39  19    1    1

Source: Spector, L. and M. Mazzeo, Probit Analysis and Economic Education. In: Journal of Economic Education, 11, 1980, pp. 37-44

Page 20: The Probit Model

Application – Variables

• Grade: Dependent variable. Indicates whether a student's grades improved after the new teaching method PSI had been introduced (0 = no, 1 = yes).

• PSI: Indicates whether a student attended courses that used the new method (0 = no, 1 = yes).

• GPA: Average grade of the student.

• TUCE: Score of an intermediate test which shows previous knowledge of a topic.

Page 21: The Probit Model

Application – Estimation

• Estimation results of the model (output from Stata):

[Stata output not recovered]

Page 22: The Probit Model

Application – Discussion

• ML estimator: Parameters were obtained by maximization of the log likelihood function. Here: 5 iterations were necessary to find the maximum of the log likelihood function (-12.818803).

• Interpretation of the estimated coefficients:

    • Estimated coefficients do not quantify the influence of the rhs variables on the probability that the lhs variable takes on the value one.

    • Estimated coefficients are parameters of the latent model.
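The maximization can be sketched with a hand-written Newton-Raphson routine on the 32 observations from the data slide. This is not Stata's algorithm; the starting value of zero, the convergence tolerance and the use of the analytic Hessian are assumptions of this sketch, so the iteration count need not equal the five reported above, although the maximized log likelihood should.

```python
import math

# (GPA, TUCE, PSI, Grade) for the 32 students, copied from the data slide.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]
X = [[1.0, gpa, tuce, psi] for gpa, tuce, psi, _ in DATA]  # constant first
y = [grade for *_, grade in DATA]


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_probit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the probit log likelihood, started at beta = 0."""
    k = len(X[0])
    beta = [0.0] * k
    for iteration in range(1, max_iter + 1):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for x_i, y_i in zip(X, y):
            z = sum(b * x for b, x in zip(beta, x_i))
            q = 2 * y_i - 1                       # +1 if y = 1, -1 if y = 0
            lam = q * phi(q * z) / Phi(q * z)     # per-observation score weight
            for a in range(k):
                grad[a] += lam * x_i[a]
                for c in range(k):
                    neg_hess[a][c] += lam * (lam + z) * x_i[a] * x_i[c]
        step = solve(neg_hess, grad)              # Newton step: (-H)^(-1) g
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    log_lik = sum(
        math.log(Phi((2 * y_i - 1) * sum(b * x for b, x in zip(beta, x_i))))
        for x_i, y_i in zip(X, y)
    )
    return beta, log_lik, iteration


beta_hat, log_lik, n_iter = fit_probit(X, y)
print([round(b, 4) for b in beta_hat], round(log_lik, 6), n_iter)
```

The returned coefficients are ordered as constant, GPA, TUCE, PSI and are parameters of the latent model, not marginal effects.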

Page 23: The Probit Model

Coefficients and marginal effects

• The marginal effect of a rhs variable is the effect of a unit change of this variable on the probability P(Y = 1|X = x), given that all other rhs variables are constant:

$$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_i} = \frac{\partial E(y_i \mid x_i)}{\partial x_i} = \phi(x_i'\beta)\,\beta$$

• Recap: The slope parameter of the linear regression model measures directly the marginal effect of the rhs variable on the lhs variable.

Page 24: The Probit Model

Coefficients and marginal effects

• The marginal effect depends on the value of the rhs variable.

• Therefore, there exists an individual marginal effect for each person of the sample.

Page 25: The Probit Model

Coefficients and marginal effects – Computation

• Two different types of marginal effects can be calculated:

    • Average marginal effect (Stata command: margin)

    • Marginal effect at the mean (Stata command: mfx compute)

Page 26: The Probit Model

Coefficients and marginal effects – Computation

• Principle of the computation of the average marginal effects: average of the individual marginal effects.

Page 27: The Probit Model

Coefficients and marginal effects – Computation

• Computation of average marginal effects depends on type of rhs variable:

• Continuous variables like TUCE and GPA:

$$AME = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i'\beta)\,\beta$$

• Dummy variable like PSI:

$$AME = \frac{1}{n} \sum_{i=1}^{n} \left[ \Phi(x_i'\beta \mid x_i^k = 1) - \Phi(x_i'\beta \mid x_i^k = 0) \right]$$
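Both formulas translate directly into code. The sketch below computes the average marginal effect of one continuous regressor and of one dummy regressor; the sample `X` and the coefficient vector `beta` are hypothetical illustrations, not the estimates from the application, so the printed numbers are not comparable with the slides.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def ame_continuous(X, beta, j):
    """(1/n) * sum_i phi(x_i'beta) * beta_j for a continuous regressor j."""
    return sum(phi(dot(x_i, beta)) for x_i in X) / len(X) * beta[j]


def ame_dummy(X, beta, j):
    """(1/n) * sum_i [Phi(x_i'beta | x_ij = 1) - Phi(x_i'beta | x_ij = 0)]."""
    total = 0.0
    for x_i in X:
        x_one, x_zero = list(x_i), list(x_i)
        x_one[j], x_zero[j] = 1.0, 0.0
        total += Phi(dot(x_one, beta)) - Phi(dot(x_zero, beta))
    return total / len(X)


# Hypothetical sample: constant, one continuous regressor, one dummy.
X = [[1.0, 2.5, 0.0], [1.0, 3.0, 1.0], [1.0, 3.5, 0.0], [1.0, 4.0, 1.0]]
beta = [-4.0, 1.2, 0.9]

print(round(ame_continuous(X, beta, 1), 4), round(ame_dummy(X, beta, 2), 4))
```

For the dummy, every observation is evaluated twice, once with the dummy set to one and once with it set to zero, regardless of its observed value.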

Page 28: The Probit Model

Coefficients and marginal effects – Interpretation

• Interpretation of average marginal effects:

• Continuous variables like TUCE and GPA: A marginal (infinitesimal) increase in TUCE or GPA changes the probability that the lhs variable takes the value one at a rate of X percentage points per unit of the variable.

• Dummy variable like PSI: A change of PSI from zero to one changes the probability that the lhs variable takes the value one by X percentage points.

Page 29: The Probit Model

Coefficients and marginal effects – Interpretation

Variable   Estimated marginal effect   Interpretation

GPA        0.364                       If the average grade of a student goes up by an infinitesimal amount, the probability that the variable Grade takes the value one rises at a rate of 36.4 percentage points per unit of GPA.

TUCE       0.011                       Analogous to GPA, with a rate of 1.1 percentage points per unit of TUCE.

PSI        0.374                       If the dummy variable changes from zero to one, the probability that the variable Grade takes the value one rises by 37.4 percentage points.

Page 30: The Probit Model

Coefficients and marginal effects – Significance

• Significance of a coefficient: test of the hypothesis that a parameter is equal to zero.

• The decision problem is similar to the t-test, except that the probit test statistic follows a standard normal distribution. The z-value is equal to the estimated parameter divided by its standard error.

• Stata computes a p-value which shows directly the significance of a parameter:

Variable   z-value   p-value   Interpretation
GPA        3.22      0.001     significant
TUCE       0.62      0.533     insignificant
PSI        2.67      0.008     significant
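The p-values follow from the z-values through the standard normal distribution. The sketch below computes the two-sided p-value for the three rounded z-values in the table; the 5% threshold used for the verdict is an assumption, as the slide does not state a significance level, and small deviations from the reported p-values are due to rounding of z.

```python
import math


def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, z in [("GPA", 3.22), ("TUCE", 0.62), ("PSI", 2.67)]:
    p = two_sided_p_value(z)
    verdict = "significant" if p < 0.05 else "insignificant"
    print(f"{name}: z = {z:.2f}, p = {p:.3f}, {verdict} at the 5% level")
```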

Page 31: The Probit Model

Coefficients and marginal effects

• Only the average of the marginal effects is displayed.

• The individual marginal effects show large variation (Stata command: margin, table).

Page 32: The Probit Model

Coefficients and marginal effects

• Variation of marginal effects may be quantified by the confidence intervals of the marginal effects.

• In which range can one expect the population value to lie?

• In our example:

Variable   Estimated marginal effect   Confidence interval (95%)
GPA        0.364                       [-0.055, 0.782]
TUCE       0.011                       [-0.002, 0.025]
PSI        0.374                       [0.121, 0.626]

Page 33: The Probit Model

Coefficients and marginal effects

• What is calculated by mfx?

• Estimation of the marginal effect at the sample mean.

[Figure: marginal effect evaluated at the sample mean]

Page 34: The Probit Model

Goodness of fit

• Goodness of fit may be judged by McFadden's Pseudo R².

• Measure for proximity of the model to the observed data.

• Comparison of the estimated model with a model which only contains a constant as rhs variable.

• $\ln \hat{L}(M_{Full})$: Log likelihood of the model of interest.

• $\ln \hat{L}(M_{Intercept})$: Log likelihood with all coefficients except that of the intercept restricted to zero.

• It always holds that $\ln \hat{L}(M_{Full}) \ge \ln \hat{L}(M_{Intercept})$.

Page 35: The Probit Model

Goodness of fit

• The Pseudo R² is defined as:

$$\text{Pseudo } R^2 = R^2_{McF} = 1 - \frac{\ln \hat{L}(M_{Full})}{\ln \hat{L}(M_{Intercept})}$$

• Similar to the R² of the linear regression model, it holds that

$$0 \le R^2_{McF} \le 1$$

• An increasing Pseudo R² may indicate a better fit of the model, although no simple interpretation like that of the R² of the linear regression model is possible.

Page 36: The Probit Model

Goodness of fit

• A high value of R²McF does not necessarily indicate a good fit, however, as R²McF = 1 if $\ln \hat{L}(M_{Full}) = 0$.

• R²McF increases with additional rhs variables. Therefore, an adjusted measure may be appropriate:

$$\text{Pseudo } R^2_{adjusted} = R^2_{McF, adjusted} = 1 - \frac{\ln \hat{L}(M_{Full}) - K}{\ln \hat{L}(M_{Intercept})}$$

• Further goodness of fit measures: R² of McKelvey and Zavoina, Akaike Information Criterion (AIC), etc. See also the Stata command fitstat.
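For the application, both measures can be computed from two log likelihoods. In the sketch below the full-model value is the one reported on the discussion slide, and the intercept-only value is obtained from the sample share of Grade = 1 (11 of 32 students), which is the fitted probability of any binary model containing only a constant. Setting K = 4 (constant plus three regressors) is an assumption, since the slides do not define K.

```python
import math


def intercept_only_log_likelihood(n_ones, n_obs):
    """Log likelihood of a binary model that contains only a constant."""
    p = n_ones / n_obs
    return n_ones * math.log(p) + (n_obs - n_ones) * math.log(1.0 - p)


def mcfadden_r2(ll_full, ll_intercept, n_params=0):
    """1 - (ll_full - n_params) / ll_intercept; n_params = 0 is the unadjusted measure."""
    return 1.0 - (ll_full - n_params) / ll_intercept


ll_full = -12.818803                               # reported on the discussion slide
ll_null = intercept_only_log_likelihood(11, 32)    # 11 of 32 students have Grade = 1

r2 = mcfadden_r2(ll_full, ll_null)
r2_adjusted = mcfadden_r2(ll_full, ll_null, n_params=4)  # K = 4 is an assumption

print(round(ll_null, 4), round(r2, 4), round(r2_adjusted, 4))
```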

Page 37: The Probit Model

Hypothesis tests

• Likelihood ratio test: a general approach to hypothesis testing, for example for testing variable relevance.

• Basic principle: Comparison of the log likelihood of the unrestricted model (ln LU) with that of the restricted model (ln LR).

• Test statistic:

$$LR = -2 \ln \lambda = -2(\ln L_R - \ln L_U) \sim \chi^2(K), \qquad \lambda = \frac{L_R}{L_U}, \quad 0 \le \lambda \le 1$$

• The test statistic follows a χ² distribution with degrees of freedom equal to the number of restrictions.

Page 38: The Probit Model

Hypothesis tests

• Null hypothesis: All coefficients except that of the intercept are equal to zero.

• In the example:

$$LR \; \chi^2(3) = 15.55$$

• Prob > chi2 = 0.0014

• Interpretation: The hypothesis that all coefficients except that of the intercept are equal to zero can be rejected at the 1 percent significance level.
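The reported test statistic and p-value can be reproduced by hand. The sketch below takes the unrestricted log likelihood from the discussion slide, computes the restricted (intercept-only) log likelihood from the sample share of Grade = 1 (11 of 32), and evaluates the upper tail of the χ² distribution with 3 degrees of freedom using its closed form. The function name `chi2_sf_3df` is illustrative and the closed form is valid only for 3 degrees of freedom.

```python
import math


def chi2_sf_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


ll_unrestricted = -12.818803                                  # full probit model
ll_restricted = 11 * math.log(11 / 32) + 21 * math.log(21 / 32)  # constant only

lr = -2.0 * (ll_restricted - ll_unrestricted)
p_value = chi2_sf_3df(lr)

print(round(lr, 2), round(p_value, 4))
```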