1/76

2/76

William Greene

New York University

True Random Effects in Stochastic Frontier Models

http://people.stern.nyu.edu/wgreene/appc2014.pdf

3/76

Agenda

Skew normality – Adelchi Azzalini

Stochastic frontier model

Panel Data: Time varying and time invariant inefficiency models

Panel Data: True random effects models

Maximum Simulated Likelihood Estimation

Applications of true random effects

Persistent and transient inefficiency in Swiss railroads

A panel data sample selection corrected stochastic frontier model

Spatial effects in a stochastic frontier model

4/76

Skew Normality

5/76

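The Stochastic Frontier Model

$\ln y_i = \alpha + \beta' x_i + v_i - u_i,$

$v_i \sim N[0, \sigma_v^2],$

$u_i = |U_i|, \quad U_i \sim N[0, \sigma_u^2],$

$\varepsilon_i = v_i - |U_i|$

Convenient parameterization (notation)

$\varepsilon_i = \sigma_v V_i - \sigma_u |U_i| = \sigma_v N[0,1] - \sigma_u |N[0,1]|$, where $V_i$ and $U_i$ now denote standard normal variables.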

6/76

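Log Likelihood

$\lambda = \sigma_u / \sigma_v, \quad \sigma^2 = \sigma_u^2 + \sigma_v^2$

$\log L(\alpha, \beta, \sigma, \lambda) = \sum_{i=1}^{N} \left[ \log\frac{2}{\sigma} + \log\phi\left(\frac{y_i - \alpha - \beta' x_i}{\sigma}\right) + \log\Phi\left(\frac{-\lambda (y_i - \alpha - \beta' x_i)}{\sigma}\right) \right]$

Skew Normal Density (the summand is the log of a skew normal density)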
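A minimal sketch of the normal-half normal log likelihood on this slide, assuming the production-frontier sign convention ($\varepsilon_i = v_i - u_i$) and the $(\sigma, \lambda)$ parameterization. The function names `sf_density` and `sf_loglik` are illustrative, not taken from any package.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sf_density(eps, sigma, lam):
    """Skew normal density of eps = v - u: (2/sigma) phi(eps/sigma) Phi(-lam*eps/sigma)."""
    return (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-lam * eps / sigma)

def sf_loglik(alpha, beta, sigma, lam, y, X):
    """Sum of log densities; X is a list of regressor lists, beta a list of slopes."""
    total = 0.0
    for y_i, x_i in zip(y, X):
        eps = y_i - alpha - sum(b * x for b, x in zip(beta, x_i))
        total += math.log(sf_density(eps, sigma, lam))
    return total
```

With lam = 0 the density collapses to an ordinary normal density with standard deviation sigma, which is a convenient sanity check before handing the function to an optimizer.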

7/76

Birnbaum (1950) Wrote About Skew Normality

Effect of Linear Truncation on a Multinormal Population

8/76

Weinstein (1964) Found f()

Query 2: The Sum of Values from a Normal and a Truncated Normal Distribution

See, also, Nelson (Technometrics, 1964), Roberts (JASA, 1966)

9/76

O’Hagan and Leonard (1976) Found Something Like f()

Bayes Estimation Subject to Uncertainty About Parameter Constraints

Resembles f()

10/76

ALS (1977) Discovered How to Make Great Use of f()

See, also, Forsund and Hjalmarsson (1974), Battese and Corra (1976), Poirier, … Timmer, … several others.

11/76

The standard skew normal distribution

$f(z) = 2\,\phi(z)\,\Phi(\alpha z)$, with shape parameter $\alpha$

Azzalini (1985) Figured Out f()

And Noticed the Connection to ALS

© 2014

12/76

http://azzalini.stat.unipd.it/SN/

13/76

http://azzalini.stat.unipd.it/SN/abstracts.html#sn99

ALS

14/76

How to generate pseudo random draws on $\varepsilon$

1. Draw $U$, $V$ from independent N[0,1]

2. $\varepsilon = \sigma_v V + \sigma_u |U|$

A Useful FAQ About the Skew Normal

15/76

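For a particular desired $\sigma$ and $\lambda$

Use $\sigma_u^2 = \frac{\sigma^2\lambda^2}{1+\lambda^2}$ and $\sigma_v^2 = \frac{\sigma^2}{1+\lambda^2}$

Then

$\varepsilon = \sigma_v N(0,1) - \sigma_u |N(0,1)|$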

Random Number Generator
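The recipe can be sketched as follows, assuming the production-frontier sign ($\varepsilon = \sigma_v V - \sigma_u|U|$) and the usual definitions $\lambda = \sigma_u/\sigma_v$, $\sigma^2 = \sigma_u^2 + \sigma_v^2$. The helper names and the seed are illustrative choices.

```python
import math
import random

def sigmas_from(sigma, lam):
    """Recover (sigma_u, sigma_v) from sigma and lam = sigma_u / sigma_v."""
    sigma_v = sigma / math.sqrt(1.0 + lam * lam)
    sigma_u = lam * sigma_v
    return sigma_u, sigma_v

def draw_eps(sigma_u, sigma_v, rng):
    """One draw on eps = sigma_v * V - sigma_u * |U| with V, U independent N[0,1]."""
    V = rng.gauss(0.0, 1.0)
    U = rng.gauss(0.0, 1.0)
    return sigma_v * V - sigma_u * abs(U)

rng = random.Random(12345)                      # fixed seed for reproducibility
sigma_u, sigma_v = sigmas_from(sigma=1.0, lam=1.5)
draws = [draw_eps(sigma_u, sigma_v, rng) for _ in range(200_000)]
mean_eps = sum(draws) / len(draws)              # theory: -sigma_u * sqrt(2/pi)
```

The sample mean should sit close to $-\sigma_u\sqrt{2/\pi}$, the mean of the negated half normal component.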

16/76

How Many Applications of SF Are There?

17/76

$2\,\phi(z)\,\Phi(\alpha z)$

W. D. Walls (2006) On Skewness in the Movies

Cites Azzalini.

18/76

“The skew-normal distribution developed by Sahu et al. (2003)…”

Does not know Azzalini.

SNARCH Model for Financial Crises (2013)

19/76

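Mixed Logit Model

$\text{Prob}(\text{Choice}_i = j) = \frac{\exp(\beta_i' x_{ij})}{\sum_{m=1}^{J} \exp(\beta_i' x_{im})}$

Random Parameters

$\beta_{ik} = \beta_k + w_{ik}$

Asymmetric (Skewed) Parameter Distribution

$w_{ik} = \sigma_k v_{ik} - \theta_k |U_{ik}| \sim SN(0, \sigma_k, \theta_k)$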

A Skew Normal Mixed Logit Model (2010)

Greene (2010, knows Azzalini and ALS),

Bhat (2011, knows not Azzalini … or ALS)

20/76

Foundation: An Entire Field

Stochastic Frontier Model

Occasional Modeling Strategy

Culture: Skewed Distribution of Movie Revenues

Finance: Crisis and Contagion

Choice Modeling: The Mixed Logit Model

How can these people find each other?

Where else do applications appear?

Skew Normal Applications

21/76

Stochastic Frontier

22/76

The Cross Section Departure Point: 1977

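Aigner et al. (ALS) Stochastic Frontier Model

$y_i = \alpha + \beta' x_i + v_i - u_i$

$v_i \sim N[0, \sigma_v^2]$

$u_i = |U_i|$ and $U_i \sim N[0, \sigma_u^2]$

Jondrow et al. (JLMS) Inefficiency Estimator

$\hat{u}_i = E[u_i \mid \varepsilon_i] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(\mu_i)}{1-\Phi(\mu_i)} - \mu_i\right],$

$\varepsilon_i = v_i - u_i, \quad \sigma^2 = \sigma_v^2 + \sigma_u^2, \quad \lambda = \sigma_u/\sigma_v, \quad \mu_i = \lambda\varepsilon_i/\sigma$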
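A minimal sketch of the JLMS conditional mean for the normal-half normal production frontier, written in the original Jondrow et al. form with $1-\Phi(\cdot)$ in the denominator. The function name `jlms` is illustrative, and in practice `eps` would be a residual evaluated at the estimates.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def jlms(eps, sigma_u, sigma_v):
    """E[u | eps] for eps = v - u, v ~ N[0, sigma_v^2], u = |U|, U ~ N[0, sigma_u^2]."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    mu = lam * eps / sigma
    sigma_star = sigma * lam / (1.0 + lam * lam)   # equals sigma_u * sigma_v / sigma
    return sigma_star * (norm_pdf(mu) / (1.0 - norm_cdf(mu)) - mu)
```

For very large positive residuals the term 1 - Phi(mu) underflows; a production implementation would use a log-scale or Mills-ratio routine instead.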

23/76

The Panel Data Models Appear: 1981

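Pitt and Lee Random Effects Approach: 1981

$y_{it} = \alpha + \beta' x_{it} + v_{it} - u_i$

$v_{it} \sim N[0, \sigma_v^2]$, $u_i = |U_i|$ and $U_i \sim N[0, \sigma_u^2]$

($u_i$ is time fixed)

Counterpart to Jondrow et al. (1982)

$\hat{u}_i = E[u_i \mid \varepsilon_{i1}, \dots, \varepsilon_{iT}] = \mu_i^* + \sigma^* \frac{\phi(\mu_i^*/\sigma^*)}{1 - \Phi(-\mu_i^*/\sigma^*)}$

$\varepsilon_{it} = v_{it} - u_i, \quad \bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}, \quad \mu_i^* = -\frac{T\sigma_u^2\,\bar{\varepsilon}_i}{\sigma_v^2 + T\sigma_u^2}, \quad \sigma^{*2} = \frac{\sigma_u^2\sigma_v^2}{\sigma_v^2 + T\sigma_u^2}$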
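A sketch of the panel counterpart to JLMS for the Pitt and Lee model, assuming a firm's residuals are supplied as a list. It uses the truncated normal posterior mean, which is algebraically the same as the form with $1 - \Phi(-\mu^*/\sigma^*)$. The function name is illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pitt_lee_u(eps_i, sigma_u, sigma_v):
    """E[u_i | eps_i1..eps_iT] when eps_it = v_it - u_i and u_i is time fixed."""
    T = len(eps_i)
    eps_bar = sum(eps_i) / T
    denom = sigma_v ** 2 + T * sigma_u ** 2
    mu_star = -T * eps_bar * sigma_u ** 2 / denom
    sigma_star = math.sqrt(sigma_u ** 2 * sigma_v ** 2 / denom)
    z = mu_star / sigma_star
    # Mean of N(mu_star, sigma_star^2) truncated to u >= 0
    return mu_star + sigma_star * norm_pdf(z) / norm_cdf(z)
```

With T = 1 this reduces to the cross-section JLMS estimator; as T grows, sigma_star shrinks and the estimate leans more on the firm's mean residual.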

24/76

Reinterpreting the Within Estimator: 1984

Schmidt and Sickles Fixed Effects Approach: 1984

$y_{it} = \alpha_i + \beta' x_{it} + v_{it}$

$v_{it} \sim N[0, \sigma_v^2]$,

$\alpha_i$ semiparametrically specified: fixed mean, constant variance.

($\alpha_i$ is time fixed)

Counterpart to Jondrow et al. (1982)

$\hat{u}_i = \max_i(\hat{\alpha}_i) - \hat{\alpha}_i$

(The cost of the semiparametric specification is the location of the inefficiency distribution. The authors also revisit Pitt and Lee to demonstrate.)
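The within estimator and the max-minus-own-effect rule can be sketched as below. This is a deliberately small version that assumes a single regressor so the slope has a closed form; the function name and the data layout (parallel lists of y, x, and firm labels) are illustrative.

```python
from collections import defaultdict

def schmidt_sickles(y, x, firm):
    """One-regressor within estimator; returns (beta_hat, alpha_hat, u_hat)."""
    sum_y, sum_x, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for y_it, x_it, f in zip(y, x, firm):
        sum_y[f] += y_it
        sum_x[f] += x_it
        count[f] += 1
    y_bar = {f: sum_y[f] / count[f] for f in count}
    x_bar = {f: sum_x[f] / count[f] for f in count}
    # Slope from deviations around firm means
    s_xy = sum((x_it - x_bar[f]) * (y_it - y_bar[f]) for y_it, x_it, f in zip(y, x, firm))
    s_xx = sum((x_it - x_bar[f]) ** 2 for x_it, f in zip(x, firm))
    beta_hat = s_xy / s_xx
    # Firm effects, then inefficiency measured relative to the best firm
    alpha_hat = {f: y_bar[f] - beta_hat * x_bar[f] for f in count}
    best = max(alpha_hat.values())
    u_hat = {f: best - a for f, a in alpha_hat.items()}
    return beta_hat, alpha_hat, u_hat
```

By construction the best firm gets an estimate of exactly zero, which is the location normalization the slide refers to.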

25/76

Misgivings About Time Fixed Inefficiency: 1990-

Cornwell, Schmidt and Sickles (1990)

$\alpha_{it} = \theta_{i0} + \theta_{i1} t + \theta_{i2} t^2$

Kumbhakar (1990)

$u_{it} = [1 + \exp(bt + ct^2)]^{-1}\,|U_i|$

Battese and Coelli (1992, 1995)

$u_{it} = \exp[-\eta(t - T)]\,|U_i|, \quad u_{it} = \exp[g(t, T, z_{it})]\,|U_i|$

Cuesta (2000)

$u_{it} = \exp[-\eta_i(t - T)]\,|U_i|, \quad u_{it} = \exp[g_i(t, T, z_{it})]\,|U_i|$

26/76

Are the systematically time varying models more like time fixed or freely time varying?

A Pooled Model

$y_{it} = \alpha + \beta' x_{it} + v_{it} - u_{it}$

Battese and Coelli (1992)

$u_{it} = \exp[-\eta(t - T)]\,|U_i|$

Pitt and Lee (1981)

$y_{it} = \alpha + \beta' x_{it} + v_{it} - |U_i|$

Where is Battese and Coelli? Closer to the pooled model or to Pitt and Lee?

Greene (2004): Much closer to the Pitt and Lee model

27/76

In these models with time varying inefficiency,

$y_{it} = \alpha + \beta' x_{it} + v_{it} - g(t, z_{it})\,|U_i|$

$v_{it} \sim N[0, \sigma_v^2]$ and $U_i \sim N[0, \sigma_u^2]$,

where does unobserved time invariant heterogeneity end up?

In the inefficiency! Even with the extensions.

28/76

Skepticism About Time Varying Inefficiency

Models: Greene (2004)

29/76

True Random Effects

30/76

True Random and Fixed Effects: 2004

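True Random and Fixed Effects Approach: 2004

$y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it}$

$v_{it} \sim N[0, \sigma_v^2]$, $u_{it} = |U_{it}|$ and $U_{it} \sim N[0, \sigma_u^2]$

$\alpha_i$ (time fixed): Unobserved time invariant heterogeneity, not unobserved time invariant inefficiency

$u_{it}$ (time varying): inefficiency

Jondrow et al. (JLMS) Inefficiency Estimator

$\hat{u}_{it} = E[u_{it} \mid \varepsilon_{it}] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(\mu_{it})}{1-\Phi(\mu_{it})} - \mu_{it}\right],$

$\varepsilon_{it} = v_{it} - u_{it}, \quad \sigma^2 = \sigma_v^2 + \sigma_u^2, \quad \lambda = \sigma_u/\sigma_v, \quad \mu_{it} = \lambda\varepsilon_{it}/\sigma$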

31/76

Estimation of TFE and TRE Models: 2004

True Fixed Effects: MLE

$y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it}$

$v_{it} \sim N[0, \sigma_v^2]$, $u_{it} = |U_{it}|$ and $U_{it} \sim N[0, \sigma_u^2]$

$\alpha_i$: Unobserved time invariant heterogeneity, not unobserved time invariant inefficiency

Just add firm dummy variables to the SF model (!)

True Random Effects: Maximum Simulated Likelihood (RPM)

$y_{it} = (\alpha + w_i) + \beta' x_{it} + v_{it} - u_{it}$

$v_{it} \sim N[0, \sigma_v^2]$, $u_{it} = |U_{it}|$ and $U_{it} \sim N[0, \sigma_u^2]$, $w_i \sim N[0, \sigma_w^2]$

$w_i$: Unobserved time invariant heterogeneity, not unobserved time invariant inefficiency

Random parameters stochastic frontier model

32/76

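Log likelihood function for stochastic frontier model

$\log L(\alpha, \beta, \sigma, \lambda) = \sum_{i=1}^{N} \left[ \log\frac{2}{\sigma} + \log\phi\left(\frac{y_i - \alpha - \beta' x_i}{\sigma}\right) + \log\Phi\left(\frac{-\lambda (y_i - \alpha - \beta' x_i)}{\sigma}\right) \right]$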

33/76

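Simulated log likelihood function for stochastic frontier model with a time invariant random constant term. (TRE model)

$\log L_S(\alpha, \beta, \sigma, \lambda, \sigma_w) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\left(\frac{y_{it} - (\alpha + \sigma_w w_{ir}) - \beta' x_{it}}{\sigma}\right) \Phi\left(\frac{-\lambda\,(y_{it} - (\alpha + \sigma_w w_{ir}) - \beta' x_{it})}{\sigma}\right)$

$w_{ir}$ = draws from N[0,1].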
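The simulated log likelihood for the TRE model can be sketched as follows. The data layout is an assumption made for illustration: `panels` is a list of firms, each a list of `(y_it, x_it)` pairs, and `draws[i]` holds the R standard normal draws reused for firm i across iterations.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tre_simulated_loglik(alpha, beta, sigma, lam, sigma_w, panels, draws):
    """log L_S = sum_i log (1/R) sum_r prod_t f(y_it | w_ir)."""
    total = 0.0
    for obs, w_draws in zip(panels, draws):
        sim = 0.0
        for w in w_draws:
            prod = 1.0
            for y_it, x_it in obs:
                xb = sum(b * x for b, x in zip(beta, x_it))
                eps = y_it - (alpha + sigma_w * w) - xb
                prod *= (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-lam * eps / sigma)
            sim += prod                      # product over t, conditioned on the draw
        total += math.log(sim / len(w_draws))
    return total
```

The draws are held fixed while the optimizer moves the parameters; redrawing at each function evaluation would make the objective discontinuous. Setting sigma_w = 0 collapses the function to the pooled log likelihood.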

34/76

The Most Famous Frontier Study Ever

35/76

The Famous WHO Model

$\log COMP = \alpha + \beta_1 \log PerCapitaHealthExpenditure + \beta_2 \log YearsEduc + \beta_3 \log^2 YearsEduc + \varepsilon$

$\varepsilon = v - u$

Schmidt/Sickles FEM

191 Countries.

140 of them observed 1993-1997.

36/76

The Notorious WHO Results

37

37/76

No, it doesn’t.

August 12, 2012

38/76 Huffington Post, April 17, 2014

39/76

we are #37

40/76

Greene, W., Distinguishing Between Heterogeneity and Inefficiency: Stochastic Frontier Analysis of the World Health Organization’s Panel Data on National Health Care Systems, Health Economics, 13, 2004, pp. 959-980.

41/76

$x = (1, \log Exp, \log Ed, \log^2 Ed)$

$z = (\log PopDen, \log PerCapitaGDP, GovtEff, VoxPopuli, OECD, GINI)$

42/76

Three Extensions of the

True Random Effects Model

43/76

Generalized True Random Effects Stochastic Frontier Model

$y_{it} = \beta' x_{it} + A_i - B_i + v_{it} - u_{it}$

Transient random components: $v_{it} - u_{it}$, time varying normal - half normal SF

Persistent random components: $A_i - B_i$, time fixed normal - half normal SF

Generalized True Random Effects Model

44/76

A Stochastic Frontier Model with Short-Run and Long-Run Inefficiency:

Colombi, R., Kumbhakar, S., Martini, G., Vittadini, G., University of Bergamo, WP, 2011, JPA 2014, forthcoming.

Tsionas, G. and Kumbhakar, S.

Firm Heterogeneity, Persistent and Transient Technical Inefficiency: A Generalized True Random Effects Model

Journal of Applied Econometrics. Published online, November, 2012.

Extremely involved Bayesian MCMC procedure. Efficiency components estimated by data augmentation.

45/76

Generalized True Random Effects Stochastic Frontier Model

$y_{it} = (\alpha + \sigma_w w_i - \theta |e_i|) + \beta' x_{it} + v_{it} - u_{it}$

Time varying, transient random components

$v_{it} \sim N[0, \sigma_v^2]$, $u_{it} = |U_{it}|$ and $U_{it} \sim N[0, \sigma_u^2]$,

Time invariant random components

$w_i \sim N[0,1]$, $e_i \sim N[0,1]$

The random constant term in this model has a closed skew normal distribution, instead of the usual normal distribution.

46/76

Estimating Efficiency in the CSN Model

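Moment Generating Function for the Multivariate CSN Distribution

$E[\exp(t' u_i) \mid y_i] = \frac{\bar{\Phi}_{T+1}(R r_i + \Lambda R' t,\ \Lambda)}{\bar{\Phi}_{T+1}(R r_i,\ \Lambda)} \exp\left(t' R r_i + \tfrac{1}{2}\, t' R \Lambda R' t\right)$

$\bar{\Phi}_{T+1}(\dots, \Lambda)$ = Multivariate normal cdf. Parts defined in Colombi et al.

Computed using GHK simulator.

$u_i = (e_i, u_{i1}, \dots, u_{iT})'$, $\quad t = (1, 0, \dots, 0)',\ (0, 1, \dots, 0)',\ \dots,\ (0, 0, \dots, 1)'$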

47/76

Estimating the GTRE Model

48/76

Colombi et al. Classical Maximum Likelihood Estimator

$\log L = \sum_{i=1}^{N} \left[ \log\phi_T(y_i - X_i\beta - 1_T\alpha,\ \Sigma + AVA') + \log\bar{\Phi}_q(R(y_i - X_i\beta - 1_T\alpha),\ \Lambda) + q\log 2 \right], \quad q = T+1$

$\phi_T(\dots)$ = T-variate normal pdf.

$\bar{\Phi}_q(\dots, \Lambda)$ = (T+1) Multivariate normal integral.

Very time consuming and complicated.

“From the sampling theory perspective, the application of the model is computationally prohibitive when T is large. This is because the likelihood function depends on a (T+1)-dimensional integral of the normal distribution.” [Tsionas and Kumbhakar (2012, p. 6)]

49/76

Kumbhakar, Lien, Hardaker

Technical Efficiency in Competing Panel Data Models: A Study of

Norwegian Grain Farming, JPA, Published online, September, 2012.

Three steps based on GLS:

(1) RE/FGLS to estimate $(\alpha, \beta)$

(2) Decompose time varying residuals using MoM and SF.

(3) Decompose estimates of time invariant residuals.

50/76

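Maximum Simulated Full Information log likelihood function for the "generalized true random effects stochastic frontier model"

$\log L_S(\alpha, \beta, \sigma, \lambda, \sigma_w, \theta) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\left(\frac{y_{it} - (\alpha + \sigma_w w_{ir} - \theta|U_{ir}|) - \beta' x_{it}}{\sigma}\right) \Phi\left(\frac{-\lambda\,(y_{it} - (\alpha + \sigma_w w_{ir} - \theta|U_{ir}|) - \beta' x_{it})}{\sigma}\right)$

$w_{ir}$ = draws from N[0,1]

$|U_{ir}|$ = absolute values of draws from N[0,1]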

51/76

WHO Results: 2014

$x = (1, \log Exp, \log Ed, \log^2 Ed)$

$z = (\log PopDen, \log PerCapitaGDP, GovtEff, VoxPopuli, OECD, GINI)$

$\varepsilon_{it} = A_i - B_i + v_{it} - u_{it}$

52/76

53/76

Empirical application

Cost Efficiency of Swiss Railway

Companies

54/76

Model Specification

TC = f(Y1, Y2, PL, PC, PE, N, NS, dt)

TC : Total costs

Y1 : Passenger-km

Y2 : Ton-km

PL : Price of labor (wage per FTE)

PC : Price of capital (capital costs / total number of seats)

PE : Price of electricity

N : Network length

NS : Number of stations

dt : Time dummies

55/76

Data

50 railway companies

Period 1985 to 1997

unbalanced panel with number of periods (Ti) varying from 1 to 13 and

with 45 companies with 12 or 13 years, resulting in 605 observations

Data source: Swiss federal transport office

Data set available at http://people.stern.nyu.edu/wgreene/

Data set used in: Farsi, Filippini, Greene (2005), Efficiency and

measurement in network industries: application to the Swiss railway

companies, Journal of Regulatory Economics


56/76

57/76

58/76

Cost Efficiency Estimates


59/76

Correlations

60/76

MSL Estimation

61/76

Why is the MSL method so computationally

efficient compared to classical FIML and

Bayesian MCMC for this model?

Conditioned on the permanent effects, the group

observations are independent.

The joint conditional distribution is simple and easy to

compute, in closed form.

The full likelihood is obtained by integrating over only

one dimension. (This was discovered by Butler and

Moffitt in 1982.)

Neither of the other methods takes advantage of this

result. Both integrate over T+1 dimensions.

62/76

63/76

Equivalent Log Likelihood – Identical Outcome

One Dimensional Integration over δi

T+1 Dimensional Integration over Rei.

64/76

$\log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} G_i(\delta_{ir} \mid \alpha, \beta, \sigma, \lambda, \sigma_w, \sigma_h)$

Simulated [over (w,h)] Log Likelihood

Very Fast – with T=13, one minute or so

65/76

Also Simulated Log Likelihood

GHK simulator is used to approximate the T+1 variate normal

integrals.

Very Slow – Huge amount of unnecessary computation.

66/76

247 Farms, 6 years.

100 Halton draws.

Computation time:

35 seconds including

computing efficiencies.

Computation of the GTRE Model is Actually Fast and Easy

67/76

Simulation Variance

68/76

Does the simulation chatter degrade the

econometric efficiency of the MSL estimator?

Hajivassiliou, V., “Some practical issues in maximum simulated likelihood,” Simulation-based Inference in Econometrics: Methods and Applications, Mariano, R., Weeks, M. and Schuermann, T. (eds.), Cambridge University Press, 2008

Speculated that Asy.Var[estimator] = V + (1/R)C

The contribution of the chatter would be of second or third order.

R is typically in the hundreds or thousands.

No other evidence on this subject.

69/76

An Experiment

Pooled Spanish Dairy Farms Data

Stochastic frontier using FIML.

Random constant term linear regression with constant term equal to $\alpha - \sigma_u|w|$, $w \sim N[0,1]$

This is equivalent to the stochastic frontier model.

Maximum simulated likelihood

500 random draws for the simulation for the base case.

Uses Mersenne Twister for the RNG

50 repetitions of estimation based on 500 random

draws to suggest variation due to simulation chatter.

70/76

$\hat{\sigma}_v = 0.10371$

$\hat{\sigma}_u = 0.15573$

71/76

Chatter

.00543

.00590

.00042

.00119

Simulation Noise in Standard Errors of Coefficients

72/76

Quasi-Monte Carlo Integration Based on

Halton Sequences

Coverage of the unit interval is the objective, not randomness of the set of draws.

Halton sequences --- Markov chain

p = a prime number,

r = the sequence of integers, decomposed as

$r = \sum_{i=0}^{I} b_i p^i$, $\quad r = r_1, \dots$ (e.g., 10,11,12,...)

$H(r|p) = \sum_{i=0}^{I} b_i p^{-i-1}$

For example, using base p=5, the integer r=37 has $b_0 = 2$, $b_1 = 2$, and $b_2 = 1$; ($37 = 1 \times 5^2 + 2 \times 5^1 + 2 \times 5^0$). Then $H(37|5) = 2 \times 5^{-1} + 2 \times 5^{-2} + 1 \times 5^{-3} = 0.488$.
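The digit decomposition translates directly into code. This is a minimal sketch of the radical-inverse computation for one integer and one prime base; the function name `halton` is illustrative.

```python
def halton(r, p):
    """Halton value H(r|p): reflect the base-p digits of r about the radix point."""
    h = 0.0
    scale = 1.0 / p            # weight of the lowest-order digit, p^(-1)
    while r > 0:
        r, b = divmod(r, p)    # b is the next base-p digit b_i
        h += b * scale
        scale /= p             # the next digit gets weight p^(-i-1)
    return h

# A short run of the base-5 sequence, starting after a burn-in of 10 integers
sequence = [halton(r, 5) for r in range(11, 21)]
```

Calling `halton(37, 5)` reproduces the worked example: digits 2, 2, 1 give 2/5 + 2/25 + 1/125. For a multidimensional integral, each dimension would use a different prime.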

73/76

Is It Really Simulation?

Halton or Sobol sequences are not

random

Far more stable than random draws, by a

factor of about 10.

There is no simulation chatter

View the same as numerical quadrature

There may be some approximation error.

How would we know?

74/76

Halton sequences --- Markov chain

p = a prime number,

r = the sequence of integers, decomposed as

$r = \sum_{i=0}^{I} b_i p^i$, $\quad r = r_1, \dots$ (e.g., 10,11,12,...)

$H(r|p) = \sum_{i=0}^{I} b_i p^{-i-1}$

Coverage of the unit interval is the objective, not randomness of the set of draws.

Halton Sequences

75/76

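$\text{LogL}(\alpha, \beta, \sigma, \lambda, \sigma_w, \sigma_h) = \sum_{i=1}^{N} \log \int_{\delta_i} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\left(\frac{y_{it} - \alpha - \beta' x_{it} - \delta_i}{\sigma}\right) \Phi\left(\frac{-\lambda\,(y_{it} - \alpha - \beta' x_{it} - \delta_i)}{\sigma}\right) f(\delta_i)\, d\delta_i$

$\text{LogL}_S(\alpha, \beta, \sigma, \lambda, \sigma_w, \sigma_h) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \frac{2}{\sigma}\, \phi\left(\frac{y_{it} - \alpha - \beta' x_{it} - \delta_{ir}}{\sigma}\right) \Phi\left(\frac{-\lambda\,(y_{it} - \alpha - \beta' x_{it} - \delta_{ir})}{\sigma}\right)$

$\delta_{ir} = \sigma_w W_{ir} - \sigma_h |H_{ir}|$

$W_{ir} = \Phi^{-1}(\text{Halton}[\text{prime}(w), \text{burn in} + r])$

$H_{ir} = \Phi^{-1}(\text{Halton}[\text{prime}(h), \text{burn in} + r])$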

Haltonized Log Likelihood
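A sketch of the Haltonized simulated log likelihood for the generalized true random effects model, under several assumptions made for illustration: primes 2 and 3 drive the two dimensions (W and H), each firm uses its own stretch of the sequence after a common burn-in, and `panels` is a list of firms, each a list of `(y_it, x_it)` pairs.

```python
import math
from statistics import NormalDist

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def halton(r, p):
    h, scale = 0.0, 1.0 / p
    while r > 0:
        r, b = divmod(r, p)
        h += b * scale
        scale /= p
    return h

def gtre_halton_loglik(alpha, beta, sigma, lam, sigma_w, sigma_h, panels, R=100, burn_in=10):
    """Simulated log likelihood with delta_ir = sigma_w*W_ir - sigma_h*|H_ir|."""
    inv = NormalDist().inv_cdf          # standard normal quantile function
    total = 0.0
    for i, obs in enumerate(panels):
        sim = 0.0
        for r in range(1, R + 1):
            idx = burn_in + i * R + r   # assumed indexing: a fresh block per firm
            W = inv(halton(idx, 2))
            H = inv(halton(idx, 3))
            delta = sigma_w * W - sigma_h * abs(H)
            prod = 1.0
            for y_it, x_it in obs:
                eps = y_it - alpha - sum(b * x for b, x in zip(beta, x_it)) - delta
                prod *= (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-lam * eps / sigma)
            sim += prod
        total += math.log(sim / R)
    return total
```

Only a two-dimensional set of draws per firm is needed, whatever the value of T, which is the source of the speed advantage discussed on the earlier slides.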

76/76

Summary

The skew normal distribution

Two useful models for panel data (and one

potentially useful model pending development)

Extension of TRE model that allows both transient and

persistent random variation and inefficiency

Sample selection corrected stochastic frontier

Spatial autocorrelation stochastic frontier model

Methods: Maximum simulated likelihood as an

alternative to received brute force methods

Simpler

Faster

Accurate

Simulation “chatter” is a red herring – use Halton sequences

77/76

Sample Selection

78/76

TECHNICAL EFFICIENCY ANALYSIS CORRECTING FOR

BIASES FROM OBSERVED AND UNOBSERVED

VARIABLES: AN APPLICATION TO A NATURAL RESOURCE

MANAGEMENT PROJECT Empirical Economics: Volume 43, Issue 1 (2012), Pages 55-72

Boris Bravo-Ureta

University of Connecticut

Daniel Solis

University of Miami

William Greene

New York University

79/76

The MARENA Program in Honduras

Several programs have been implemented to address resource degradation while also seeking to improve productivity, managerial performance and reduce poverty (and in some cases make up for lack of public support).

One such effort is the Programa Multifase de Manejo de Recursos Naturales en Cuencas Prioritarias or MARENA in Honduras focusing on small scale hillside farmers.

80/76

Expected Impact Evaluation

81/76

Methods

A matched group of beneficiaries and control

farmers is determined using Propensity Score

Matching techniques to mitigate biases that

would stem from selection on observed

variables.

In addition, we deal with possible self-selection

on unobservables arising from unobserved

variables using a selectivity correction model for

stochastic frontiers introduced by Greene (2010).

82/76

A Sample Selected SF Model

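$d_i = 1[\gamma' z_i + h_i > 0]$, $h_i \sim N[0, 1^2]$

$y_i = \alpha + \beta' x_i + \varepsilon_i$, $\varepsilon_i \sim N[0, \sigma_\varepsilon^2]$

$(y_i, x_i)$ observed only when $d_i = 1$.

$\varepsilon_i = v_i - u_i$

$u_i = \sigma_u |U_i|$ where $U_i \sim N[0, 1^2]$

$v_i = \sigma_v V_i$ where $V_i \sim N[0, 1^2]$.

$(h_i, v_i) \sim N_2[(0, 0), (1, \rho\sigma_v, \sigma_v^2)]$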

83/76

Simulated logL for the Standard SF Model

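$f(y_i \mid x_i, |U_i|) = \frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$

$f(y_i \mid x_i) = \int_{|U_i|} \frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}\, p(|U_i|)\, d|U_i|$

$p(|U_i|) = \frac{2\exp\left[-\tfrac{1}{2}|U_i|^2\right]}{\sqrt{2\pi}}$, $|U_i| \ge 0$. (Half normal)

$f(y_i \mid x_i) \approx \frac{1}{R}\sum_{r=1}^{R} \frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$

$\log L_S(\alpha, \beta, \sigma_u, \sigma_v) = \sum_{i=1}^{N} \log \frac{1}{R}\sum_{r=1}^{R} \frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$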

This is simply a linear regression with a random constant term, αi = α - σu |Ui |
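The equivalence between the simulated and closed-form densities can be checked numerically. The sketch below assumes Halton (base 2) points mapped through the normal quantile function as the source of $|U_{ir}|$; `eps` stands for $y_i - \alpha - \beta' x_i$, and all function names are illustrative.

```python
import math
from statistics import NormalDist

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def halton(r, p):
    h, scale = 0.0, 1.0 / p
    while r > 0:
        r, b = divmod(r, p)
        h += b * scale
        scale /= p
    return h

def half_normal_draws(R, prime=2):
    """|Phi^{-1}(Halton point)| for r = 1..R: quasi-random half normal values."""
    inv = NormalDist().inv_cdf
    return [abs(inv(halton(r, prime))) for r in range(1, R + 1)]

def simulated_density(eps, sigma_u, sigma_v, abs_draws):
    """(1/R) sum_r of the normal density of the regression with constant alpha - sigma_u|U_r|."""
    total = 0.0
    for abs_u in abs_draws:
        total += norm_pdf((eps + sigma_u * abs_u) / sigma_v) / sigma_v
    return total / len(abs_draws)

def closed_form_density(eps, sigma_u, sigma_v):
    """(2/sigma) phi(eps/sigma) Phi(-lam*eps/sigma)."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(-lam * eps / sigma)

abs_draws = half_normal_draws(20000)
approx = simulated_density(-0.3, 0.8, 0.5, abs_draws)
exact = closed_form_density(-0.3, 0.8, 0.5)
```

Summing the logs of `simulated_density` over observations gives the simulated log likelihood on this slide; the comparison with `exact` shows how close the approximation is for a given R.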

84/76

Likelihood For a Sample Selected SF Model

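$f(y_i \mid x_i, z_i, d_i, |U_i|) = d_i \left[ \frac{\exp\left(-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_i|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\; \Phi\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_i|)/\sigma_v + \gamma' z_i}{\sqrt{1-\rho^2}}\right) \right] + (1 - d_i)\,\Phi(-\gamma' z_i)$

$f(y_i \mid x_i, z_i, d_i) = \int_{|U_i|} f(y_i \mid x_i, z_i, d_i, |U_i|)\, f(|U_i|)\, d|U_i|$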

85/76

Simulated Log Likelihood for a Selectivity

Corrected Stochastic Frontier Model

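$\log L_S(\alpha, \beta, \sigma_u, \sigma_v, \gamma, \rho) = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \left\{ d_i \left[ \frac{\exp\left(-\tfrac{1}{2}(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\; \Phi\left(\frac{\rho\,(y_i - \alpha - \beta' x_i + \sigma_u|U_{ir}|)/\sigma_v + \gamma' z_i}{\sqrt{1-\rho^2}}\right) \right] + (1 - d_i)\,\Phi(-\gamma' z_i) \right\}$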

The simulation is over the inefficiency term.
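A sketch of this simulated log likelihood, assuming each observation is a tuple `(d_i, y_i, x_i, z_i)` with `y_i` and `x_i` ignored when `d_i = 0`, and `abs_draws[i]` holding the R half normal draws for observation i. The name `gamma` for the selection coefficients and the function name are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def selection_sf_loglik(alpha, beta, sigma_u, sigma_v, gamma, rho, data, abs_draws):
    """Simulated log likelihood of the selectivity corrected frontier (simulation over |U|)."""
    total = 0.0
    for (d, y, x, z), draws in zip(data, abs_draws):
        a = sum(g * zz for g, zz in zip(gamma, z))       # selection index gamma'z_i
        if d == 0:
            total += math.log(norm_cdf(-a))              # non-selected: no draws needed
            continue
        sim = 0.0
        for abs_u in draws:
            e = y - alpha - sum(b * xx for b, xx in zip(beta, x)) + sigma_u * abs_u
            dens = norm_pdf(e / sigma_v) / sigma_v
            sel = norm_cdf((rho * e / sigma_v + a) / math.sqrt(1.0 - rho * rho))
            sim += dens * sel
        total += math.log(sim / len(draws))
    return total
```

With rho = 0 the selection term factors out of the simulation, so the frontier and the probit parts separate; that is a useful check on an implementation.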

86/76

JLMS Estimator of ui

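$\hat{f}_{ir} = \frac{\exp\left(-\tfrac{1}{2}(y_i - \hat{\alpha} - \hat{\beta}' x_i + \hat{\sigma}_u|U_{ir}|)^2/\hat{\sigma}_v^2\right)}{\hat{\sigma}_v\sqrt{2\pi}}\; \Phi\left(\frac{\hat{\rho}\,(y_i - \hat{\alpha} - \hat{\beta}' x_i + \hat{\sigma}_u|U_{ir}|)/\hat{\sigma}_v + \hat{a}_i}{\sqrt{1-\hat{\rho}^2}}\right), \quad \hat{a}_i = \hat{\gamma}' z_i$

$\hat{A}_i = \frac{1}{R}\sum_{r=1}^{R} (\hat{\sigma}_u|U_{ir}|)\,\hat{f}_{ir}, \quad \hat{B}_i = \frac{1}{R}\sum_{r=1}^{R} \hat{f}_{ir}$

$\hat{u}_i = \frac{\hat{A}_i}{\hat{B}_i}$ = Estimator of $E[u_i \mid \varepsilon_i]$

$\hat{u}_i = \sum_{r=1}^{R} \hat{g}_{ir}\,\hat{\sigma}_u|U_{ir}|$ where $\hat{g}_{ir} = \frac{\hat{f}_{ir}}{\sum_{r=1}^{R}\hat{f}_{ir}}$, $\sum_{r=1}^{R}\hat{g}_{ir} = 1$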
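A sketch of the simulated JLMS estimator as a weighted average of the draws on $\sigma_u|U_{ir}|$. It assumes `eps` is the residual $y_i - \hat{\alpha} - \hat{\beta}' x_i$, `a` is the fitted selection index, and the half normal values come from Halton (base 2) points; all names are illustrative.

```python
import math
from statistics import NormalDist

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def halton(r, p):
    h, scale = 0.0, 1.0 / p
    while r > 0:
        r, b = divmod(r, p)
        h += b * scale
        scale /= p
    return h

def simulated_jlms(eps, sigma_u, sigma_v, abs_draws, rho=0.0, a=0.0):
    """u_hat = A/B = sum_r g_r * sigma_u*|U_r|, with weights g_r proportional to f_r."""
    A = 0.0
    B = 0.0
    for abs_u in abs_draws:
        e = eps + sigma_u * abs_u
        f = (norm_pdf(e / sigma_v) / sigma_v) * norm_cdf(
            (rho * e / sigma_v + a) / math.sqrt(1.0 - rho * rho))
        A += sigma_u * abs_u * f
        B += f
    return A / B

inv = NormalDist().inv_cdf
abs_draws = [abs(inv(halton(r, 2))) for r in range(1, 20001)]
u_hat = simulated_jlms(-0.3, 0.8, 0.5, abs_draws)
```

When rho = 0 the selection factor is constant across draws and cancels from the ratio, so the result should agree with the closed-form JLMS estimator up to simulation error.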

87/76

Closed Form for the Selection Model

The selection model can be estimated without

simulation

“The stochastic frontier model with correction

for sample selection revisited.” Lai, Hung-pin.

Forthcoming, JPA

Based on closed skew normal distribution

Similar to Maddala’s 1982 result for the linear

selection model. See slide 42.

Not more computationally efficient.

Statistical properties identical.

Suggested possibility that simulation chatter is an element of

inefficiency in the maximum simulated likelihood estimator.

88/76

Spanish Dairy Farms: Selection based on being farm #1-125. 6 periods

The theory works.

Closed Form vs. Simulation

89/76

Variables Used

in the Analysis

Production

Participation

90/76

Findings from the First Wave

91/76

A Panel Data Model

Selection takes place only at the baseline.

There is no attrition.

$d_{i0} = 1[\gamma' z_{i0} + h_{i0} > 0]$ Sample Selector

$y_{it} = \alpha + \beta' x_{it} + w_i + v_{it} - u_{it}, \quad t = 0, 1, \dots$ Stochastic Frontier

Selection effect is exerted on $w_i$; $\text{Corr}(h_{i0}, w_i) = \rho$

$P(y_{it}, d_{i0}) = P(d_{i0})\, P(y_{it} \mid d_{i0})$

Conditioned on the selection ($h_{i0}$) observations are independent.

$P(y_{i0}, y_{i1}, \dots, y_{iT} \mid d_{i0}) = \prod_{t=0}^{T} P(y_{it} \mid d_{i0})$

I.e., the selection is acting like a permanent random effect.

$P(y_{i0}, y_{i1}, \dots, y_{iT}, d_{i0}) = P(d_{i0}) \prod_{t=0}^{T} P(y_{it} \mid d_{i0})$

92/76

Simulated Log Likelihood

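$\log L_{S,C}(\alpha, \beta, \sigma_u, \sigma_v, \rho) = \sum_{d_{i0}=1} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=0}^{T} \left[ \frac{\exp\left(-\tfrac{1}{2}(y_{it} - \alpha - \beta' x_{it} + \sigma_u|U_{itr}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\; \Phi\left(\frac{\rho\,(y_{it} - \alpha - \beta' x_{it} + \sigma_u|U_{itr}|)/\sigma_v + a_i}{\sqrt{1-\rho^2}}\right) \right]$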

93/76

Benefit group is more efficient in both years

The gap is wider in the second year

Both means increase from year 0 to year 1

Both variances decline from year 0 to year 1

Main Empirical Conclusions from Waves 0 and 1

94/76

95/76

Spatial Autocorrelation

96/76

Spatial Stochastic Frontier Models: Accounting for Unobserved

Local Determinants of Inefficiency: A.M.Schmidt, A.R.B.Morris,

S.M.Helfand, T.C.O.Fonseca, Journal of Productivity Analysis, 31,

2009, pp. 101-112

Simply redefines the random effect to be a ‘region effect.’ Just a

reinterpretation of the ‘group.’ No spatial decay with distance.

True REM does not “perform” as well as several other

specifications. (“Performance” has nothing to do with the frontier

model.)

True Random Spatial Effects

Recommended