Part 23: Parameter Heterogeneity
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business


Econometric Analysis of Panel Data

23. Individual Heterogeneity and Random Parameter Variation


Heterogeneity

Observational: Observable differences across individuals (e.g., choice makers)

Choice strategy: How consumers make decisions – the underlying behavior

Structural: Differences in model frameworks

Preferences: Differences in model ‘parameters’


Parameter Heterogeneity

(1) Regression model

$$y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol\beta_i + \varepsilon_{i,t}$$

(2) Conditional probability or other nonlinear model

$$f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_i)$$

(3) Heterogeneity - how are parameters distributed across individuals?

  (a) Discrete - the population contains a mixture of Q types of individuals.

  (b) Continuous. Parameters are part of the stochastic structure of the population.


Distinguish Bayes and Classical

Both depart from the heterogeneous ‘model,’ $f(y_{it} \mid \mathbf{x}_{it}) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol\beta_i)$

What do we mean by ‘randomness’?

  With respect to the information of the analyst (Bayesian)

  With respect to some stochastic process governing ‘nature’ (Classical)

Bayesian: No difference between ‘fixed’ and ‘random’

Classical: Full specification of joint distributions for observed random variables; piecemeal definitions of ‘random’ parameters. Usually a form of ‘random effects’


Fixed Management and Technical Efficiency in a Random Coefficients Model

Antonio Alvarez, University of Oviedo

Carlos Arias, University of Leon

William Greene, Stern School of Business, New York University


The Production Function Model

$$\ln y_{it} = \alpha + \beta_x \ln x_{it} + \tfrac{1}{2}\beta_{xx}(\ln x_{it})^2 + \beta_m m_i + \tfrac{1}{2}\beta_{mm} m_i^2 + \beta_{xm} \ln x_{it}\, m_i + v_{it}$$

Definition: Maximal output, given the inputs

Inputs: Variable factors, Quasi-fixed (land)

Form: Log-quadratic - translog

Latent Management as an unobservable input


Application to Spanish Dairy Farms

Input   Units                                                 Mean      Std. Dev.   Minimum     Maximum
Milk    Milk production (liters)                              131,108   92,539      14,110      727,281
Cows    # of milking cows                                     22.12     11.27       4.5         82.3
Labor   # man-equivalent units                                1.67      0.55        1.0         4.0
Land    Hectares of land devoted to pasture and crops         12.99     6.17        2.0         45.1
Feed    Total amount of feedstuffs fed to dairy cows (tons)   57,941    47,981      3,924.14    376,732

N = 247 farms, T = 6 years (1993-1998)


Translog Production Model

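$$\ln y_{it} = \ln y_{it}^* - u_{it}$$

$$\ln y_{it} = \alpha + \sum_{k=1}^{K}\beta_k \ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2} + \sum_{k=1}^{K}\beta_{km}\ln x_{itk}\, m_i^* + v_{it} - u_{it}$$

$m_i^*$ is an unobserved, time-invariant effect.

$$u_{it} = \ln y_{it}^* - \ln y_{it} = \Big(\beta_m + \sum_{k=1}^{K}\beta_{km}\ln x_{kit}\Big)(m_i^* - m_i) + \tfrac{1}{2}\beta_{mm}\big(m_i^{*2} - m_i^2\big) \ge 0.$$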


Random Coefficients Model

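$$\ln y_{it} = \big(\alpha + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2}\big) + \sum_{k=1}^{K}\big(\beta_k + \beta_{km} m_i^*\big)\ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + v_{it} - u_{it}$$

$$\ln y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_{ki}\ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + v_{it} - u_{it}$$

[Chamberlain/Mundlak:] $m_i^* = \sum_{k=1}^{K}\theta_k \ln x_{ik} + w_i$

(1) Same random effect appears in each random parameter

(2) Only the first order terms are random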


Discrete vs. Continuous Variation

Classical context: Description of how parameters are distributed across individuals

Variation

  Discrete: Finite number of different parameter vectors distributed across individuals

    Mixture is unknown as well as the parameters: Implies randomness from the point of view of the analyst. (Bayesian?)

    Might also be viewed as discrete approximation to a continuous distribution

  Continuous: There exists a stochastic process governing the distribution of parameters, drawn from a continuous pool of candidates.

Background common assumption: An over-reaching stochastic process that assigns parameters to individuals


Discrete Parameter Variation

The Latent Class Model

(1) Population is a (finite) mixture of Q types of individuals, q = 1,...,Q. Q 'classes' differentiated by $(\boldsymbol\beta_q)$.

  (a) Analyst does not know class memberships. ('latent.')

  (b) 'Mixing probabilities' (from the point of view of the analyst) are $\pi_1,\ldots,\pi_Q$, with $\sum_{q=1}^{Q}\pi_q = 1$.

(2) Conditional density is

$$P(y_{i,t} \mid \text{class} = q) = f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$


Latent Classes

A population contains a mixture of individuals of different types (classes)

Common form of the data generating mechanism within the classes

Observed outcome y is governed by the common process $F(y \mid \mathbf{x}, \boldsymbol\beta_j)$

Classes are distinguished by the parameters, $\boldsymbol\beta_j$.





How Finite Mixture Models Work


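Find the ‘Best’ Fitting Mixture of Two Normal Densities

$$\log L = \sum_{i=1}^{1000} \log \sum_{j=1}^{2} \pi_j \frac{1}{\sigma_j}\,\phi\!\left(\frac{y_i - \mu_j}{\sigma_j}\right)$$

Maximum Likelihood Estimates

        Class 1                  Class 2
        Estimate    Std. Error   Estimate    Std. Error
μ       7.05737     .77151       3.25966     .09824
σ       3.79628     .25395       1.81941     .10858
π       .28547      .05953       .71453      .05953

$$\hat F(y) = .28547\,\frac{1}{3.79628}\,\phi\!\left(\frac{y - 7.05737}{3.79628}\right) + .71453\,\frac{1}{1.81941}\,\phi\!\left(\frac{y - 3.25966}{1.81941}\right)$$

Mixing probabilities .715 and .285

[Figure: the fitted two-component approximation plotted against the actual distribution]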
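To make the mixture log likelihood concrete, here is a minimal Python sketch that evaluates the two-component normal mixture at the maximum likelihood estimates reported on the slide. The function names are illustrative, not part of any package used in the lecture, and the sketch only evaluates the density and log likelihood; it does not perform the maximization.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y, i.e. (1/sigma) * phi((y - mu) / sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(y, pis, mus, sigmas):
    """Finite mixture density: sum_j pi_j * (1/sigma_j) * phi((y - mu_j) / sigma_j)."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas))

def mixture_loglik(ys, pis, mus, sigmas):
    """Log likelihood: sum over observations of the log mixture density."""
    return sum(math.log(mixture_density(y, pis, mus, sigmas)) for y in ys)

# Estimates reported on the slide (class 1, class 2).
PIS = [0.28547, 0.71453]
MUS = [7.05737, 3.25966]
SIGMAS = [3.79628, 1.81941]

# Mean implied by the fitted mixture: sum_j pi_j * mu_j.
mixture_mean = sum(p * m for p, m in zip(PIS, MUS))
```

Passing the 1,000 sample values as `ys` to `mixture_loglik` would reproduce the criterion that the estimator maximizes; the raw data are not part of these slides, so no value is shown here.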


Application Shoe Brand Choice

Simulated Data: Stated Choice, 400 respondents, 8 choice situations

3 choice/attributes + NONE

  Fashion = High=1 / Low=0

  Quality = High=1 / Low=0

  Price = 25/50/75/100/125, coded 1,2,3,4,5 then divided by 25.

Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical

Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)

Thanks to www.statisticalinnovations.com (Latent Gold)


A Random Utility Model

Random Utility Model for Discrete Choice Among J alternatives at time t by person i.

$U_{itj} = \alpha_j + \boldsymbol\beta'\mathbf{x}_{itj} + \varepsilon_{ijt}$

$\alpha_j$ = Choice specific constant

$\mathbf{x}_{itj}$ = Attributes of choice presented to person (Information processing strategy. Not all attributes will be evaluated. E.g., lexicographic utility functions over certain attributes.)

$\boldsymbol\beta$ = ‘Taste weights,’ ‘Part worths,’ marginal utilities

$\varepsilon_{ijt}$ = Unobserved random component of utility

Mean = $E[\varepsilon_{ijt}] = 0$; Variance = $\mathrm{Var}[\varepsilon_{ijt}] = \sigma^2$


The Multinomial Logit Model

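Independent type 1 extreme value (Gumbel): $F(\varepsilon_{itj}) = \exp(-\exp(-\varepsilon_{itj}))$

Independence across utility functions

Identical variances, $\sigma^2 = \pi^2/6$

Same taste parameters for all individuals

$$\text{Prob}[\text{choice } j \mid i,t] = \frac{\exp(\alpha_j + \boldsymbol\beta'\mathbf{x}_{itj})}{\sum_{j=1}^{J(i,t)}\exp(\alpha_j + \boldsymbol\beta'\mathbf{x}_{itj})}$$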
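The choice probability above is a softmax over the alternatives' utilities. The following Python sketch computes it for one choice situation. The function name and the attribute values in the example are hypothetical; the coefficients are the shoe-brand MNL estimates reported in this deck (BF, BQ, BP, and BN as the constant for NONE).

```python
import math

def mnl_probabilities(alphas, beta, X):
    """Multinomial logit probabilities for one choice situation.

    alphas: choice-specific constants, one per alternative
    beta:   common taste weights
    X:      one attribute row per alternative
    """
    utilities = [a + sum(b * x for b, x in zip(beta, row))
                 for a, row in zip(alphas, X)]
    top = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Coefficients from the estimated MNL table: Fashion, Quality, Price.
beta_hat = [1.47890473, 1.01372755, -11.8023376]
# Brands carry no constant; NONE has the constant BN and no attributes.
alphas = [0.0, 0.0, 0.0, 0.03679254]
# Hypothetical attribute levels (Fashion, Quality, scaled Price) for B1, B2, B3, NONE.
X = [[1, 1, 0.12], [0, 1, 0.08], [1, 0, 0.04], [0, 0, 0.0]]

probs = mnl_probabilities(alphas, beta_hat, X)
```

Because every individual shares the same `beta_hat`, the ratio of any two probabilities depends only on those two alternatives, which is the restriction the latent class and random parameters models relax.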


Estimated MNL

+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Log likelihood function      -4158.503      |
| Akaike IC=  8325.006   Bayes IC=  8349.289  |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| Constants only   -4391.1804  .05299  .05259 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 BF          1.47890473       .06776814      21.823   .0000
 BQ          1.01372755       .06444532      15.730   .0000
 BP        -11.8023376        .80406103     -14.678   .0000
 BN           .03679254       .07176387        .513   .6082


Latent Classes and Random Parameters

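Heterogeneity with respect to 'latent' consumer classes

$$\Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice}_i \mid \text{class} = q)\,\Pr(\text{class} = q)$$

$$\Pr(\text{choice}_i \mid \text{class} = q) = \frac{\exp(\mathbf{x}_{i,\text{choice}}'\boldsymbol\beta_q)}{\sum_{j=\text{choice}} \exp(\mathbf{x}_{i,j}'\boldsymbol\beta_q)}$$

$$\Pr(\text{class} = q \mid i) = F_{i,q}, \quad \text{e.g., } F_{i,q} = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=\text{classes}} \exp(\mathbf{z}_i'\boldsymbol\delta_q)}$$

Simple discrete random parameter variation

$$\Pr(\text{choice}_i \mid \boldsymbol\beta_i) = \frac{\exp(\mathbf{x}_{i,\text{choice}}'\boldsymbol\beta_i)}{\sum_{j=\text{choice}} \exp(\mathbf{x}_{i,j}'\boldsymbol\beta_i)}$$

$$\Pr(\boldsymbol\beta_i = \boldsymbol\beta_q) = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=\text{classes}} \exp(\mathbf{z}_i'\boldsymbol\delta_q)}, \quad q = 1,\ldots,Q$$

$$\Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice}_i \mid \boldsymbol\beta_i = \boldsymbol\beta_q)\,\Pr(\boldsymbol\beta_i = \boldsymbol\beta_q)$$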
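The two layers of the latent class choice model (a logit for class membership, and a class-specific logit for the choice) can be sketched as follows. This is a minimal illustration in Python; all function names and the numerical inputs are hypothetical, chosen only to show how the mixture is assembled.

```python
import math

def softmax(values):
    """Logit probabilities from a list of linear indexes."""
    top = max(values)
    expv = [math.exp(v - top) for v in values]
    total = sum(expv)
    return [e / total for e in expv]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def class_probabilities(z, deltas):
    """Pr(class = q | i) = exp(z'delta_q) / sum_q exp(z'delta_q)."""
    return softmax([dot(z, d) for d in deltas])

def choice_probability(chosen, X, betas, z, deltas):
    """Pr(Choice_i) = sum_q Pr(choice | class q) * Pr(class q)."""
    pis = class_probabilities(z, deltas)
    total = 0.0
    for pi_q, beta_q in zip(pis, betas):
        within_class = softmax([dot(x, beta_q) for x in X])
        total += pi_q * within_class[chosen]
    return total

# Hypothetical inputs: 3 alternatives with 2 attributes, 2 classes,
# z = (constant, male); the last class is normalized to delta = 0.
X = [[1.0, 0.2], [0.0, 0.1], [1.0, 0.4]]
betas = [[2.0, -5.0], [0.5, -10.0]]
z = [1.0, 1.0]
deltas = [[-0.5, 0.6], [0.0, 0.0]]

p_each = [choice_probability(j, X, betas, z, deltas) for j in range(3)]
```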


Estimated Latent Class Model

+---------------------------------------------+
| Latent Class Logit Model                    |
| Log likelihood function      -3649.132      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Utility parameters in latent class -->> 1
 BF|1        3.02569837       .14335927      21.106   .0000
 BQ|1        -.08781664       .12271563       -.716   .4742
 BP|1       -9.69638056      1.40807055      -6.886   .0000
 BN|1        1.28998874       .14533927       8.876   .0000
 Utility parameters in latent class -->> 2
 BF|2        1.19721944       .10652336      11.239   .0000
 BQ|2        1.11574955       .09712630      11.488   .0000
 BP|2      -13.9345351       1.22424326     -11.382   .0000
 BN|2        -.43137842       .10789864      -3.998   .0001
 Utility parameters in latent class -->> 3
 BF|3        -.17167791       .10507720      -1.634   .1023
 BQ|3        2.71880759       .11598720      23.441   .0000
 BP|3       -8.96483046      1.31314897      -6.827   .0000
 BN|3         .18639318       .12553591       1.485   .1376
 This is THETA(1) in class probability model.
 Constant    -.90344530       .34993290      -2.582   .0098
 _MALE|1      .64182630       .34107555       1.882   .0599
 _AGE25|1    2.13320852       .31898707       6.687   .0000
 _AGE39|1     .72630019       .42693187       1.701   .0889
 This is THETA(2) in class probability model.
 Constant     .37636493       .33156623       1.135   .2563
 _MALE|2    -2.76536019       .68144724      -4.058   .0000
 _AGE25|2    -.11945858       .54363073       -.220   .8261
 _AGE39|2    1.97656718       .70318717       2.811   .0049
 This is THETA(3) in class probability model.
 Constant     .000000    ......(Fixed Parameter).......
 _MALE|3      .000000    ......(Fixed Parameter).......
 _AGE25|3     .000000    ......(Fixed Parameter).......
 _AGE39|3     .000000    ......(Fixed Parameter).......


Latent Class Elasticities

+-----------------------------------------------------------------+
| Elasticity             Averaged over observations.              |
| Effects on probabilities of all choices in the model:           |
| Attribute is PRICE in choice B1                   MNL     LCM   |
| *  Choice=B1       .000    .000    .000    -.889   -.801        |
|    Choice=B2       .000    .000    .000     .291    .273        |
|    Choice=B3       .000    .000    .000     .291    .248        |
|    Choice=NONE     .000    .000    .000     .291    .219        |
| Attribute is PRICE in choice B2                                 |
|    Choice=B1       .000    .000    .000     .313    .311        |
| *  Choice=B2       .000    .000    .000   -1.222  -1.248        |
|    Choice=B3       .000    .000    .000     .313    .284        |
|    Choice=NONE     .000    .000    .000     .313    .268        |
| Attribute is PRICE in choice B3                                 |
|    Choice=B1       .000    .000    .000     .366    .314        |
|    Choice=B2       .000    .000    .000     .366    .344        |
| *  Choice=B3       .000    .000    .000    -.755   -.674        |
|    Choice=NONE     .000    .000    .000     .366    .302        |
+-----------------------------------------------------------------+


Individual Specific Means


A Practical Distinction

Finite Mixture (Discrete Mixture):

  Functional form strategy

  Component densities have no meaning

  Mixing probabilities have no meaning

  There is no question of “class membership”

  The number of classes is uninteresting - enough to get a good fit

Latent Class:

  Mixture of subpopulations

  Component densities are believed to be definable “groups” (Low Users and High Users in Bago d’Uva and Jones application)

  The classification problem is interesting - who is in which class?

  Posterior probabilities, P(class|y,x) have meaning

  Question of the number of classes has content in the context of the analysis


The Latent Class Model

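(1) There are Q classes, unobservable to the analyst

(2) Class specific model: $f(y_{it} \mid \mathbf{x}_{it}, \text{class} = q) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol\beta_q)$

(3) Conditional class probabilities (possibly given some information, $\mathbf{z}_i$): $P(\text{class} = q \mid \mathbf{z}_i, \boldsymbol\delta)$

Common multinomial logit form for prior class probabilities

$$P(\text{class} = q \mid \mathbf{z}_i, \boldsymbol\delta) = \pi_{iq} = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=1}^{Q}\exp(\mathbf{z}_i'\boldsymbol\delta_q)}, \quad \boldsymbol\delta_Q = \mathbf{0}$$

Note, if no $\mathbf{z}_i$, $\delta_q = \log(\pi_{iq}/\pi_{iQ})$.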


Estimating an LC Model

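Conditional density for each observation is

$$P(y_{i,t} \mid \mathbf{x}_{i,t}, \text{class} = q) = f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$

Joint conditional density for $T_i$ observations is

$$f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \boldsymbol\beta_q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$

($T_i$ may be 1. This is not only a 'panel data' model.)

Maximize this for each class if the classes are known.

They aren't. Unconditional density for individual i is

$$f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \mathbf{z}_i) = \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$

Log likelihood

$$\text{LogL}(\boldsymbol\beta_1, \ldots, \boldsymbol\beta_Q, \boldsymbol\delta_1, \ldots, \boldsymbol\delta_Q) = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$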
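One individual's contribution to this log likelihood can be sketched in Python as below. The sketch assumes a binary logit as the class-specific density purely for illustration (the slides leave f general), and it works on the log scale with a log-sum-exp step so that long panels do not underflow. All names are hypothetical.

```python
import math

def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-scale terms."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def logit_log_f(y, x, beta):
    """Illustrative class-specific density: log Pr(y | x, beta) for a binary logit."""
    q = (2 * y - 1) * sum(b * v for b, v in zip(beta, x))
    # log of the logistic CDF at q, written to avoid overflow for large |q|
    return -math.log1p(math.exp(-q)) if q >= 0 else q - math.log1p(math.exp(q))

def individual_loglik(ys, xs, betas, log_pis, log_f=logit_log_f):
    """log sum_q pi_iq * prod_t f(y_it | x_it, beta_q) for one individual."""
    per_class = []
    for log_pi, beta_q in zip(log_pis, betas):
        joint = sum(log_f(y, x, beta_q) for y, x in zip(ys, xs))  # log of the product over t
        per_class.append(log_pi + joint)
    return log_sum_exp(per_class)

# Hypothetical individual with T_i = 3 observations and Q = 2 classes.
ys = [1, 0, 1]
xs = [[1.0, 0.5], [1.0, -0.3], [1.0, 1.2]]
betas = [[0.2, 1.0], [-0.4, 0.1]]
log_pis = [math.log(0.6), math.log(0.4)]

contribution = individual_loglik(ys, xs, betas, log_pis)
```

Summing `individual_loglik` over all individuals gives the sample log likelihood to be maximized over the class parameters and the class probability parameters.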


Estimating Which Class

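Prior class probability: $\text{Prob}[\text{class} = q \mid \mathbf{z}_i] = \pi_{iq}$

Joint conditional density for $T_i$ observations is

$$P(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \text{class} = q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$

Joint density for data and class membership is the product

$$P(y_{i1}, y_{i2}, \ldots, y_{i,T_i}, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i) = \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)$$

Posterior probability for class, given the data

$$P(\text{class} = q \mid y_{i1}, y_{i2}, \ldots, y_{i,T_i}, \mathbf{X}_i, \mathbf{z}_i) = \frac{P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)}{P(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \mathbf{z}_i)} = \frac{P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)}{\sum_{q=1}^{Q} P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)}$$

Use Bayes Theorem to compute the posterior (conditional) probability

$$w(q \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i) = P(\text{class} = q \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i) = \frac{\pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)}{\sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_q)} = w_{iq}$$

Best guess = the class with the largest posterior probability.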


‘Estimating’ βi

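(1) Use $\hat{\boldsymbol\beta}_j$ from the class with the largest estimated probability

(2) Probabilistic - in the same spirit as the 'posterior mean'

$$\hat{\boldsymbol\beta}_i = \sum_{q=1}^{Q} \text{Posterior } \widehat{\text{Prob}}[\text{class} = q \mid \text{data}]\; \hat{\boldsymbol\beta}_q = \sum_{q=1}^{Q} \hat w_{iq}\, \hat{\boldsymbol\beta}_q$$

Note: This estimates $E[\boldsymbol\beta_i \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i]$, not $\boldsymbol\beta_i$ itself.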
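The posterior class weights and both ways of 'estimating' the individual parameters reduce to a few lines of arithmetic. The Python sketch below uses hypothetical prior probabilities, class likelihoods, and class-specific parameter vectors; only the formulas come from the slides.

```python
def posterior_class_probabilities(priors, class_likelihoods):
    """Bayes theorem: w_iq = pi_iq * L_iq / sum_q pi_iq * L_iq,
    where L_iq is the product over t of f(y_it | x_it, beta_q)."""
    joint = [p * l for p, l in zip(priors, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def posterior_mean_beta(weights, betas):
    """Probabilistic estimate: sum_q w_iq * beta_q, element by element."""
    n_params = len(betas[0])
    return [sum(w * b[k] for w, b in zip(weights, betas)) for k in range(n_params)]

# Hypothetical individual: 3 classes.
priors = [0.5, 0.3, 0.2]                # pi_iq from the class probability model
class_likelihoods = [0.001, 0.02, 0.004]  # joint density of the individual's data in each class
betas = [[3.0, -9.7], [1.2, -13.9], [-0.2, -9.0]]

weights = posterior_class_probabilities(priors, class_likelihoods)
best_class = max(range(len(weights)), key=lambda q: weights[q])  # method (1): modal class
beta_modal = betas[best_class]
beta_posterior = posterior_mean_beta(weights, betas)             # method (2): weighted average
```

The two estimates coincide only when one posterior weight is essentially one; otherwise the weighted average lies between the class-specific vectors.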


How Many Classes?

(1) Q is not a 'parameter' - can't 'estimate' Q with $\boldsymbol\beta$ and $\boldsymbol\delta$

(2) Can't 'test' down or 'up' to Q by comparing log likelihoods. Degrees of freedom for Q+1 vs. Q classes is not well defined.

(3) Use AKAIKE IC; AIC = -2 logL + 2#Parameters.
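As a worked instance of rule (3), the following Python sketch applies the AIC formula to the two shoe-brand models reported earlier in this deck. The log likelihoods are taken from the output tables; the parameter counts (4 for the MNL, and 12 utility parameters plus 8 free class-probability parameters for the three-class model) are read off those tables.

```python
def aic(log_likelihood, n_parameters):
    """AIC = -2 logL + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Multinomial logit: logL = -4158.503 with 4 parameters (BF, BQ, BP, BN).
aic_mnl = aic(-4158.503, 4)

# Three-class latent class logit: logL = -3649.132 with
# 3 x 4 utility parameters + 2 x 4 free class-probability parameters = 20.
aic_lcm = aic(-3649.132, 20)
```

The MNL value reproduces the 8325.006 printed in the estimation output, and the latent class value works out to 7338.264, so the information criterion favors the three-class model despite its 16 additional parameters.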


Modeling Obesity with a Latent Class Model

Mark Harris, Department of Economics, Curtin University

Bruce Hollingsworth, Department of Economics, Lancaster University

Pushkar Maitra, Department of Economics, Monash University

William Greene, Stern School of Business, New York University


300 Million People Worldwide. International Obesity Task Force: www.iotf.org


Costs of Obesity

In the US more people are obese than smoke or use illegal drugs

Obesity is a major risk factor for non-communicable diseases like heart problems and cancer

Obesity is also associated with:

  lower wages and productivity, and absenteeism

  low self-esteem

An economic problem. It is costly to society:

  USA costs are around 4-8% of all annual health care expenditure - US $100 billion

  Canada, 5%; France, 1.5-2.5%; and New Zealand 2.5%


Measuring Obesity

An individual’s weight given their height should lie within a certain range

  Body Mass Index (BMI) = Weight (kg) / height (meters)²

World Health Organization guidelines:

  Underweight      BMI < 18.5
  Normal           18.5 < BMI < 25
  Overweight       25 < BMI < 30
  Obese            BMI > 30
  Morbidly Obese   BMI > 40
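The BMI formula and the WHO bands translate directly into a small classification routine. The Python sketch below is illustrative; the slide states the bands with strict inequalities, so assigning a value that falls exactly on a boundary (for example BMI = 25) to the higher band is an assumption made here.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by squared height in meters."""
    return weight_kg / (height_m ** 2)

def who_category(bmi_value):
    """Map a BMI value to the WHO bands listed on the slide.
    Boundary values are assigned to the higher band (an assumption)."""
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25.0:
        return "Normal"
    if bmi_value < 30.0:
        return "Overweight"
    if bmi_value < 40.0:
        return "Obese"
    return "Morbidly Obese"

# Hypothetical person: 70 kg, 1.75 m.
example_bmi = bmi(70.0, 1.75)
example_band = who_category(example_bmi)
```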


Two Latent Classes: Approximately Half of European Individuals


Modeling BMI Outcomes

Grossman-type health production function: Health Outcomes = f(inputs)

Existing literature assumes BMI is an ordinal, not cardinal, representation of individuals’ weight-related health status

Do not assume a one-to-one relationship between BMI levels and (weight-related) health status levels

Translate BMI values into an ordinal scale using WHO guidelines

Preserves underlying ordinal nature of the BMI index but recognizes that individuals within a so-defined weight range are of an (approximately) equivalent (weight-related) health status level


Conversion to a Discrete Measure

Measurement issues: Tendency to under-report BMI

  women tend to under-estimate/report weight;

  men over-report height.

Using bands should alleviate this

Allows focus on discrete ‘at risk’ groups


A Censored Regression Model for BMI

Simple Regression Approach Based on Actual BMI:

$\text{BMI}^* = \boldsymbol\beta'\mathbf{x} + \varepsilon, \quad \varepsilon \sim N[0, \sigma^2], \quad \sigma^2 = 1$

True BMI = weight proxy is unobserved

Interval Censored Regression Approach

WT = 0 if BMI* < 25          Normal
     1 if 25 < BMI* < 30     Overweight
     2 if BMI* > 30          Obese

Inadequate accommodation of heterogeneity

Inflexible reliance on WHO classification

Rigid measurement by the guidelines


Heterogeneity in the BMI Ranges

Boundaries are set by the WHO narrowly defined for all individuals

Strictly defined WHO definitions may consequently push individuals into inappropriate categories

We allow flexibility at the margins of these intervals

Following Pudney and Shields (2000) therefore we consider Generalised Ordered Choice models - boundary parameters are now functions of observed personal characteristics


Generalized Ordered Probit Approach

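A Latent Regression Model for True BMI

$\text{BMI}_i^* = \boldsymbol\beta'\mathbf{x}_i + \varepsilon_i, \quad \varepsilon_i \sim N[0, \sigma^2], \quad \sigma^2 = 1$

Observation Mechanism for Weight Type

WT_i = 0 if $\text{BMI}_i^* < 0$                          Normal
       1 if $0 < \text{BMI}_i^* < \mu_i(\mathbf{w}_i)$    Overweight
       2 if $\mu_i(\mathbf{w}_i) < \text{BMI}_i^*$        Obese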
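The observation mechanism implies three outcome probabilities for each individual. The Python sketch below computes them under one explicit assumption that is not stated on the slide: the upper threshold is parameterized as exp(γ'w), which keeps it positive. The function names, the vector `gamma`, and the numerical inputs are all hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def weight_type_probabilities(x, beta, w, gamma):
    """Probabilities of WT = 0 (normal), 1 (overweight), 2 (obese).

    Lower threshold is 0; upper threshold is mu = exp(gamma'w),
    an assumed functional form for the boundary mu_i(w_i).
    """
    index = sum(b * v for b, v in zip(beta, x))
    mu = math.exp(sum(g * v for g, v in zip(gamma, w)))
    p_normal = norm_cdf(0.0 - index)           # Pr(BMI* < 0)
    p_obese = 1.0 - norm_cdf(mu - index)       # Pr(BMI* > mu)
    p_overweight = 1.0 - p_normal - p_obese    # Pr(0 < BMI* < mu)
    return [p_normal, p_overweight, p_obese]

# Hypothetical individual: x = (constant, exercise), w = (constant, age / 100).
probs = weight_type_probabilities([1.0, 0.5], [0.3, -0.4], [1.0, 0.45], [0.1, 0.5])
```

Letting the threshold depend on personal characteristics is what distinguishes this from the interval censored model with fixed WHO boundaries.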


Latent Class Modeling

Several ‘types’ or ‘classes’. Obesity may be due to genetic reasons (the FTO gene) or lifestyle factors

Distinct sets of individuals may have differing reactions to various policy tools and/or characteristics

The observer does not know from the data which class an individual is in.

Suggests a latent class approach for health outcomes (Deb and Trivedi, 2002, and Bago d’Uva, 2005)


Latent Class Application

Two class model (considering FTO gene):

  More classes make class interpretations much more difficult

  Parametric models proliferate parameters

Endogenous class membership: Two classes allow us to correlate the equations driving class membership and observed weight outcomes via unobservables.


Heterogeneous Class Probabilities

$\pi_j$ = Prob(class=j) = governor of a detached natural process. Homogeneous.

$\pi_{ij}$ = Prob(class=j | $\mathbf{z}_i$, individual i). Now possibly a behavioral aspect of the process, no longer “detached” or “natural”

Nagin and Land 1993, “Criminal Careers…


Endogeneity of Class Membership

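Class Membership: $C_i^* = \boldsymbol\delta'\mathbf{z}_i + u_i, \quad C_i = 1[C_i^* > 0]$ (Probit)

BMI | Class = 0,1: $\text{BMI}_i^* = \boldsymbol\beta_c'\mathbf{x}_i + \varepsilon_{c,i}, \quad \text{BMI group} = \text{OP}[\text{BMI}_i^*, \mu_c(\mathbf{w}_i)]$

Endogeneity:

$$\begin{pmatrix} u_i \\ \varepsilon_{c,i} \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_c \\ \rho_c & 1 \end{pmatrix}\right]$$

Bivariate Ordered Probit (one variable is binary).

Full information maximum likelihood.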


Model Components

x: determines observed weight levels within classes

  For observed weight levels we use lifestyle factors such as marital status and exercise levels

z: determines latent classes

  For latent class determination we use genetic proxies such as age, gender and ethnicity: the things we can’t change

w: determines position of boundary parameters within classes

  For the boundary parameters we have: weight-training intensity and age (BMI inappropriate for the aged?), pregnancy (small numbers and length of term unknown)


Data

US National Health Interview Survey (2005); conducted by the National Center for Health Statistics

Information on self-reported height and weight levels, BMI levels

Demographic information

Split sample (30,000+) by gender


Outcome Probabilities

Class 0 dominated by normal and overweight probabilities: ‘normal weight’ class

Class 1 dominated by probabilities at top end of the scale: ‘non-normal weight’ class

Unobservables for weight class membership are negatively correlated with those determining weight levels

[Figure: outcome probabilities (Normal, Overweight, Obese) for Class 0 and Class 1]


Classification (Latent Probit) Model


BMI Ordered Choice Model

Conditional on class membership, lifestyle factors

Marriage comfort factor only for normal class women

Both classes associated with income, education

Exercise effects similar in magnitude

Exercise intensity only important for ‘non-normal’ class

Home ownership only important for ‘non-normal’ class, and negative: result of differing socioeconomic status distributions across classes?


Effects of Aging on Weight Class


Effect of Education on Probabilities


Effect of Income on Probabilities


Inflated Responses in Self-Assessed Health

Mark Harris, Department of Economics, Curtin University

Bruce Hollingsworth, Department of Economics, Lancaster University

William Greene, Stern School of Business, New York University


Introduction

Health sector an important part of developed countries’ economies: E.g., Australia 9% of GDP

To see if these resources are being effectively utilized, we need to fully understand the determinants of individuals’ health levels

To this end much policy, and even more academic research, is based on measures of self-assessed health (SAH) from survey data


SAH vs. Objective Health Measures

Favorable SAH categories seem artificially high.

  60% of Australians are either overweight or obese (Dunstan et al., 2001)

  1 in 4 Australians has either diabetes or a condition of impaired glucose metabolism

  Over 50% of the population has elevated cholesterol

  Over 50% has at least 1 of the “deadly quartet” of health conditions (diabetes, obesity, high blood pressure, high cholesterol)

  Nearly 4 out of 5 Australians have 1 or more long term health conditions (National Health Survey, Australian Bureau of Statistics 2006)

  Australia ranked #1 in terms of obesity rates

Similar results appear for other countries


SAH vs. Objective Health

1. Are these SAH outcomes “over-inflated”?

2. And if so, why, and what kinds of people are doing the over-inflating/mis-reporting?


HILDA Data

The Household, Income and Labour Dynamics in Australia (HILDA) dataset:

1. a longitudinal survey of households in Australia

2. well tried and tested dataset

3. contains a host of information on SAH and other health measures, as well as numerous demographic variables


Self Assessed Health

“In general, would you say your health is: Excellent, Very good, Good, Fair or Poor?"

Responses 1,2,3,4,5 (we will be using 0,1,2,3,4)

Typically ¾ of responses are “good” or “very good” health; in our data (HILDA) we get 72%

Similar numbers for most developed countries

Does this truly represent the health of the nation?



A Two Class Latent Class Model

True Reporter Misreporter


Reporter Type Model

$r^* = \mathbf{x}_r'\boldsymbol\beta_r + \varepsilon_r$

r = 1 if r* > 0   True reporter
    0 if r* ≤ 0   Misreporter

r is unobserved


[Figure: ordered outcomes Y = 4, 3, 2, 1, 0 for true reporters]


Pr(true,y) = Pr(true) * Pr(y | true)


Mis-reporters choose either good or very good

The response is determined by a probit model

$m^* = \mathbf{x}_m'\boldsymbol\beta_m + \varepsilon_m$

[Figure: outcomes Y = 3 and Y = 2 for misreporters]



Observed Mixture of Two Classes


$$\Pr(y) = \Pr(\text{true})\,\Pr(y \mid \text{true}) + \Pr(\text{misreporter})\,\Pr(y \mid \text{misreporter})$$



Who are the Misreporters?


Priors and Posteriors

M=Misreporter, T=True reporter

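Priors: $\Pr(M) = \Phi(-\mathbf{x}_r'\boldsymbol\beta_r), \quad \Pr(T) = \Phi(\mathbf{x}_r'\boldsymbol\beta_r)$

Posteriors:

Noninflated outcomes 0, 1, 4

$$\Pr(M \mid y = 0,1,4) = 0, \quad \Pr(T \mid y = 0,1,4) = 1$$

Inflated outcomes 2, 3

$$\Pr(M \mid y = 2) = \frac{\Pr(y = 2 \mid M)\Pr(M)}{\Pr(y = 2 \mid M)\Pr(M) + \Pr(y = 2 \mid T)\Pr(T)}$$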
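The posterior for an inflated outcome is a direct application of Bayes theorem with two classes. A minimal Python sketch follows; the function name and the probabilities in the example are hypothetical, chosen only to show the mechanics.

```python
def posterior_misreporter(prior_m, p_y_given_m, p_y_given_t):
    """Pr(M | y) = Pr(y | M) Pr(M) / [Pr(y | M) Pr(M) + Pr(y | T) Pr(T)],
    with Pr(T) = 1 - Pr(M)."""
    numerator = p_y_given_m * prior_m
    denominator = numerator + p_y_given_t * (1.0 - prior_m)
    return numerator / denominator

# Hypothetical individual: prior misreporting probability 0.3.
# An inflated outcome (y = 2) is much more likely for a misreporter.
post_inflated = posterior_misreporter(0.3, 0.6, 0.25)

# A noninflated outcome (y = 0, 1 or 4) cannot come from a misreporter,
# so Pr(y | M) = 0 and the posterior collapses to zero.
post_noninflated = posterior_misreporter(0.3, 0.0, 0.1)
```

Observing an inflated response raises the misreporting probability above its prior whenever that response is more likely under misreporting than under true reporting.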


General Results



Latent Class Efficiency Studies

Battese and Coelli – growing in weather “regimes” for Indonesian rice farmers

Kumbhakar and Orea – cost structures for U.S. Banks

Greene (Health Economics, 2005) – revisits WHO Year 2000 World Health Report


Studying Economic Efficiency in Health Care

Hospital and Nursing Home Cost efficiency Role of quality (not studied today)

Agency for Healthcare Research and Quality (AHRQ)


Stochastic Frontier Analysis

logC = f(output, input prices, environment) + v + u

ε = v + u

  v = noise - the usual “disturbance”

  u = inefficiency

Frontier efficiency analysis

  Estimate parameters of model

  Estimate u (to the extent we are able - we use E[u|ε])

  Evaluate and compare observed firms in the sample


Nursing Home Costs

44 Swiss nursing homes, 13 years

Cost, Pk, Pl, output, two environmental variables

Estimate cost function

Estimate inefficiency


Estimated Cost Efficiency


Inefficiency?

Not all agree with the presence (or identifiability) of “inefficiency” in market outcomes data.

Variation around the common production structure may all be nonsystematic and not controlled by management

Implication, no inefficiency: u = 0.


A Two Class Model

Class 1: With Inefficiency

  logC = f(output, input prices, environment) + $\sigma_v v + \sigma_u u$

Class 2: Without Inefficiency

  logC = f(output, input prices, environment) + $\sigma_v v$, $\quad \sigma_u = 0$

Implement with a single zero restriction in a constrained (same cost function) two class model

Parameterization: $\lambda = \sigma_u / \sigma_v = 0$ in class 2.


LogL= 464 with a common frontier model, 527 with two classes



Random Parameters (Mixed) Models

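A General Model Structure

$$f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = g(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)$$

$\boldsymbol\beta_i$ = a set of random parameters = $\boldsymbol\beta + \mathbf{u}_i$

$$f(\boldsymbol\beta_i \mid \mathbf{z}_i) = h(\boldsymbol\beta_i, \mathbf{z}_i, \boldsymbol\Omega)$$

$\boldsymbol\theta$ = a set of nonrandom parameters in the density of $y_{it}$

$\boldsymbol\Omega$ = a set of parameters in the distribution of $\boldsymbol\beta_i$

Typical application "repeated measures" = panel

The "mixed" model

$$f(y_{it} \mid \mathbf{x}_{it}, \mathbf{z}_i, \boldsymbol\theta, \boldsymbol\Omega) = \int_{\boldsymbol\beta_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)\, h(\boldsymbol\beta_i, \mathbf{z}_i, \boldsymbol\Omega)\, d\boldsymbol\beta_i$$

forms the basis of a likelihood function for the observed data.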


Mixed Model Estimation

WinBUGS: MCMC. User specifies the model - constructs the Gibbs Sampler/Metropolis Hastings

SAS: Proc Mixed. Classical. Uses primarily a kind of GLS/GMM (method of moments algorithm for loglinear models)

Stata: Classical. Mixing done by quadrature. (Very slow for 2 or more dimensions.) Several loglinear models - GLLAMM

LIMDEP/NLOGIT: Classical. Mixing done by Monte Carlo integration - maximum simulated likelihood. Numerous linear, nonlinear, loglinear models

Ken Train’s Gauss Code: Monte Carlo integration. Used by many researchers. Mixed Logit (mixed multinomial logit) model only (but free!)

Programs differ on the models fitted, the algorithms, the paradigm, and the extensions provided to the simplest RPM, $\boldsymbol\beta_i = \boldsymbol\beta + \mathbf{w}_i$.


Modeling Parameter Heterogeneity

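Conditional Model, linear or nonlinear

density: $f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol\beta_i, \boldsymbol\theta) = g(y_{i,t}, \mathbf{x}_{i,t}, \boldsymbol\beta_i, \boldsymbol\theta)$

Individual heterogeneity in the means of the parameters

$$\boldsymbol\beta_i = \boldsymbol\beta + \boldsymbol\Delta\mathbf{z}_i + \mathbf{u}_i, \quad E[\mathbf{u}_i \mid \mathbf{X}_i, \mathbf{z}_i] = \mathbf{0}$$

Heterogeneity in the variances of the parameters

$$\mathrm{Var}[u_{i,k} \mid \mathbf{z}_i] = \phi_{ik} = \sigma_k \exp(\mathbf{z}_i'\boldsymbol\delta_k)$$

$$\mathrm{Var}[\mathbf{u}_i \mid \mathbf{z}_i] = \boldsymbol\Phi_i = \mathrm{diag}(\phi_{ik})$$

(Different variables in $\mathbf{z}_i$ may appear in means and variances.)

Free correlation: $\mathrm{Var}[\mathbf{u}_i \mid \mathbf{z}_i] = \boldsymbol\Sigma_i = \boldsymbol\Gamma\boldsymbol\Phi_i\boldsymbol\Gamma'$, $\boldsymbol\Gamma$ = a lower triangular matrix with 1s on the diagonal.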


A Mixed Probit Model

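Random parameters probit model

$$f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]$$

$$\boldsymbol\beta_i = \boldsymbol\beta + \mathbf{u}_i$$

$$\mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol\Sigma], \quad \boldsymbol\Sigma = \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma'$$

$\boldsymbol\Lambda$ = diagonal matrix of standard deviations

$\boldsymbol\Gamma$ = lower triangular matrix, or $\mathbf{I}$ if uncorrelated

$$\text{LogL}(\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]\; N[\boldsymbol\beta, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']\, d\boldsymbol\beta_i$$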


Maximum Simulated Likelihood

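$$\log L(\boldsymbol\theta, \boldsymbol\Omega) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)\, h(\boldsymbol\beta_i \mid \mathbf{z}_i, \boldsymbol\Omega)\, d\boldsymbol\beta_i$$

$$\boldsymbol\Omega = \boldsymbol\beta, \boldsymbol\Delta, \sigma_1, \ldots, \sigma_K, \boldsymbol\delta_1, \ldots, \boldsymbol\delta_K, \boldsymbol\Gamma$$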


Simulated Log Likelihood for a Mixed Probit Model

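Random parameters probit model

$$f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]$$

$$\boldsymbol\beta_i = \boldsymbol\beta + \mathbf{u}_i, \quad \mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']$$

$$\text{LogL}(\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]\; N[\boldsymbol\beta, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']\, d\boldsymbol\beta_i$$

$$\text{LogL}_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'(\boldsymbol\beta + \boldsymbol\Gamma\boldsymbol\Lambda\mathbf{v}_{ir})]$$

We now maximize this function with respect to $(\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda)$.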
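The simulated log likelihood replaces the integral with an average over R draws of the standard normal vector for each individual. The Python sketch below evaluates that criterion for given parameter values. It is an illustration under stated assumptions: the data layout, the function names, and the toy panel are hypothetical, and the draws are generated once and held fixed so that the function is a smooth function of the parameters across optimizer iterations. The optimizer itself is not shown.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulated_loglik(data, beta, gamma, lam, draws):
    """Simulated log likelihood for a random parameters probit.

    data:  list of (ys, xs) per individual; ys are 0/1, xs are rows of length K
    beta:  K means of the random parameters
    gamma: K x K lower triangular matrix (identity if uncorrelated)
    lam:   K standard deviations (diagonal of Lambda)
    draws: draws[i][r] is a list of K standard normal draws v_ir, held fixed
    """
    k_dim = len(beta)
    total = 0.0
    for (ys, xs), draws_i in zip(data, draws):
        simulated = 0.0
        for v in draws_i:
            scaled = [lam[m] * v[m] for m in range(k_dim)]        # Lambda * v_ir
            beta_ir = [beta[k] + sum(gamma[k][m] * scaled[m] for m in range(k_dim))
                       for k in range(k_dim)]                     # beta + Gamma Lambda v_ir
            product = 1.0
            for y, x in zip(ys, xs):
                index = sum(b * xv for b, xv in zip(beta_ir, x))
                product *= norm_cdf((2 * y - 1) * index)
            simulated += product
        total += math.log(simulated / len(draws_i))
    return total

# Toy unbalanced panel: two individuals, K = 2 (constant and one regressor).
rng = random.Random(12345)
data = [([1, 0, 1], [[1.0, 0.5], [1.0, -0.2], [1.0, 1.0]]),
        ([0], [[1.0, 0.3]])]
draws = [[[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)] for _ in data]
identity = [[1.0, 0.0], [0.0, 1.0]]

value = simulated_loglik(data, [0.1, 0.4], identity, [0.5, 0.2], draws)
```

Setting every element of `lam` to zero removes the heterogeneity, and the function then returns the ordinary pooled probit log likelihood, which is a convenient check on an implementation.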


Application – Doctor Visits

German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods

Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.

Variables in the file are

  DOCTOR  = 1(Number of doctor visits > 0)
  HSAT    = health satisfaction, coded 0 (low) - 10 (high)
  DOCVIS  = number of doctor visits in last three months
  HOSPVIS = number of hospital visits in last calendar year
  PUBLIC  = insured in public health insurance = 1; otherwise = 0
  ADDON   = insured by add-on insurance = 1; otherwise = 0
  HHNINC  = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
  HHKIDS  = children under age 16 in the household = 1; otherwise = 0
  EDUC    = years of schooling
  AGE     = age in years
  MARRIED = marital status


Estimates of a Mixed Probit Model

+---------------------------------------------+
| Random Coefficients Probit Model            |
| Dependent variable               DOCTOR     |
| Log likelihood function       -16483.96     |
| Restricted log likelihood     -17700.96     |
| Unbalanced panel has 7293 individuals.      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Means for random parameters
 Constant    -.09594899       .04049528      -2.369   .0178
 AGE          .02102471       .00053836      39.053   .0000   43.5256898
 HHNINC      -.03119127       .03383027       -.922   .3565     .35208362
 EDUC        -.02996487       .00265133     -11.302   .0000   11.3206310
 MARRIED     -.03664476       .01399541      -2.618   .0088     .75861817
+---------+--------------+----------------+--------+---------+----------+
 Constant     .02642358       .05397131        .490   .6244
 AGE          .01538640       .00071823      21.423   .0000   43.5256898
 HHNINC      -.09775927       .04626475      -2.113   .0346     .35208362
 EDUC        -.02811308       .00350079      -8.031   .0000   11.3206310
 MARRIED     -.00930667       .01887548       -.493   .6220     .75861817


Random Parameters Probit

 Diagonal elements of Cholesky matrix
 Constant     .55259608       .05381892      10.268   .0000
 AGE          .279052D-04     .00041019        .068   .9458
 HHNINC       .03545309       .04094725        .866   .3866
 EDUC         .00994387       .00093271      10.661   .0000
 MARRIED      .01013553       .00643526       1.575   .1153
 Below diagonal elements of Cholesky matrix
 lAGE_ONE     .00668600       .00071466       9.355   .0000
 lHHN_ONE    -.23713634       .04341767      -5.462   .0000
 lHHN_AGE     .09364751       .03357731       2.789   .0053
 lEDU_ONE     .01461359       .00355382       4.112   .0000
 lEDU_AGE    -.00189900       .00167248      -1.135   .2562
 lEDU_HHN     .00991594       .00154877       6.402   .0000
 lMAR_ONE    -.04871097       .01854192      -2.627   .0086
 lMAR_AGE    -.02059540       .01362752      -1.511   .1307
 lMAR_HHN    -.12276339       .01546791      -7.937   .0000
 lMAR_EDU     .09557751       .01233448       7.749   .0000


Application Shoe Brand Choice

Simulated Data: Stated Choice, 400 respondents, 8 choice situations

3 choice/attributes + NONE

  Fashion = High=1 / Low=0

  Quality = High=1 / Low=0

  Price = 25/50/75/100/125, coded 1,2,3,4,5 then divided by 25.

Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical

Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)

Thanks to www.statisticalinnovations.com (Latent Gold and Jordan Louviere)


A Discrete (4 Brand) Choice Model with Heterogeneous and Heteroscedastic Random Parameters

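$$U_{i,1,t} = \beta_{F,i}\,\text{Fashion}_{i,1,t} + \beta_Q\,\text{Quality}_{i,1,t} + \beta_{P,i}\,\text{Price}_{i,1,t} + \varepsilon_{i,1,t}$$

...

$$U_{i,\text{NONE},t} = \alpha_{\text{NONE}} + \varepsilon_{i,\text{NONE},t}$$

$$\beta_{F,i} = \beta_F + \delta_F\,\text{Sex}_i + [\sigma_F \exp(\gamma_{F1}\,\text{AgeL25}_i + \gamma_{F2}\,\text{Age2539}_i)]\, w_{F,i}; \quad w_{F,i} \sim N[0,1]$$

$$\beta_{P,i} = \beta_P + \delta_P\,\text{Sex}_i + [\sigma_P \exp(\gamma_{P1}\,\text{AgeL25}_i + \gamma_{P2}\,\text{Age2539}_i)]\, w_{P,i}; \quad w_{P,i} \sim N[0,1]$$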


Multinomial Logit Model Estimates


Mixed Logit Estimates

+---------------------------------------------+
| Random Parameters Logit Model               |
| Log likelihood function      -3911.945      |
| At start values  -4158.5029  .05929  .05811 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Random parameters in utility functions
 BF          1.46523951       .12626655      11.604   .0000
 BQ          1.14369857       .16954024       6.746   .0000
 Nonrandom parameters in utility functions
 BP        -12.1098155        .91584476     -13.223   .0000
 BN           .17706909       .07784730       2.275   .0229
 Heterogeneity in mean, Parameter:Variable
 BF:MAL       .28052695       .14266576       1.966   .0493
 BQ:MAL      -.42310284       .20387789      -2.075   .0380
 Derived standard deviations of parameter distributions
 NsBF        1.16430284       .13731611       8.479   .0000
 NsBQ        1.81872569       .18108194      10.044   .0000
 Heteroscedasticity in random parameters
 sBF|AG      -.32466344       .16986949      -1.911   .0560
 sBF0|AG     -.51032609       .23975740      -2.129   .0333
 sBQ|AG      -.37953350       .13798031      -2.751   .0059
 sBQ0|AG     -.41636803       .17143046      -2.429   .0151


Estimated Elasticities

+--------------------------------------------------------------+
| Elasticity             Averaged over observations.           |
| Effects on probabilities of all choices in the model:        |
| Attribute is PRICE in choice B1           RPL    MNL    LCM   |
| *  Choice=B1       .000    .000   -.818   -.889   -.801      |
|    Choice=B2       .000    .000    .240    .291    .273      |
|    Choice=B3       .000    .000    .244    .291    .248      |
|    Choice=NONE     .000    .000    .241    .291    .219      |
| Attribute is PRICE in choice B2                              |
|    Choice=B1       .000    .000    .291    .313    .311      |
| *  Choice=B2       .000    .000  -1.100  -1.222  -1.248      |
|    Choice=B3       .000    .000    .270    .313    .284      |
|    Choice=NONE     .000    .000    .276    .313    .268      |
| Attribute is PRICE in choice B3                              |
|    Choice=B1       .000    .000    .287    .366    .314      |
|    Choice=B2       .000    .000    .326    .366    .344      |
| *  Choice=B3       .000    .000   -.647   -.755   -.674      |
|    Choice=NONE     .000    .000    .311    .366    .302      |
+--------------------------------------------------------------+


Conditional Estimators

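Counterpart to Bayesian posterior mean and variance

$$\hat{\Omega} = \arg\max_{\Omega} \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} P_{ijt}(\beta_{ir} \mid \Omega, \text{data}_{it})$$

$$\hat{L}_{i,r} = \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})$$

$$\hat{E}[\beta_{i,k} \mid \text{data}_i] = \frac{(1/R) \sum_{r=1}^{R} \hat{\beta}_{i,k,r} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}{(1/R) \sum_{r=1}^{R} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})} = \frac{1}{R} \sum_{r=1}^{R} \hat{w}_{i,r} \hat{\beta}_{i,k,r}$$

$$\hat{E}[\beta_{i,k}^2 \mid \text{data}_i] = \frac{(1/R) \sum_{r=1}^{R} \hat{\beta}_{i,k,r}^2 \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}{(1/R) \sum_{r=1}^{R} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})} = \frac{1}{R} \sum_{r=1}^{R} \hat{w}_{i,r} \hat{\beta}_{i,k,r}^2$$

where the weights are $\hat{w}_{i,r} = \hat{L}_{i,r} \big/ \left[ (1/R) \sum_{s=1}^{R} \hat{L}_{i,s} \right]$.

$$\widehat{\text{Var}}[\beta_{i,k} \mid \text{data}_i] = \hat{E}[\beta_{i,k}^2 \mid \text{data}_i] - \left( \hat{E}[\beta_{i,k} \mid \text{data}_i] \right)^2$$

$\hat{E}[\beta_{i,k} \mid \text{data}_i] \pm 2 \sqrt{\widehat{\text{Var}}[\beta_{i,k} \mid \text{data}_i]}$ will encompass 95% of any reasonable distribution.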
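The weighted-average form of the conditional mean and variance can be sketched in a few lines of Python. This is only an illustration: it assumes the R simulated draws of one coefficient and the R likelihood values (the product over t of the choice probabilities at each draw) have already been computed for one individual. The function names and the three-draw toy input are hypothetical.

```python
import math

def conditional_moments(draws, likelihoods):
    """Simulated conditional mean and variance of one coefficient for one individual.

    draws       : R simulated values of the coefficient (beta_{i,k,r})
    likelihoods : R likelihood values, one per draw (product over t of P_ijt)
    """
    R = len(draws)
    mean_like = sum(likelihoods) / R
    # Weights w_{i,r}: likelihood at draw r relative to the average likelihood.
    weights = [L / mean_like for L in likelihoods]
    mean = sum(w * b for w, b in zip(weights, draws)) / R
    second = sum(w * b * b for w, b in zip(weights, draws)) / R
    var = second - mean ** 2
    return mean, var

def two_sd_interval(mean, var):
    """Mean plus or minus two conditional standard deviations."""
    half = 2.0 * math.sqrt(max(var, 0.0))
    return mean - half, mean + half

# Toy input: three draws and their likelihoods (made-up numbers).
mean, var = conditional_moments([1.0, 2.0, 3.0], [0.1, 0.2, 0.1])
low, high = two_sd_interval(mean, var)
print(mean, var, low, high)
```

Draws with a higher likelihood for the individual's observed choices receive more weight, which is what pulls the conditional mean away from the population mean.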


Individual E[β_i | data_i] Estimates

[Figure: Estimated upper and lower bounds for BPrice(i), plotted by person. Vertical axis: B_Price; horizontal axis: PERSON.]


Disaggregated Parameters

The description of classical methods as only producing aggregate results is obviously untrue.

As regards “targeting specific groups…” both of these sets of methods produce estimates for the specific data in hand. Unless we want to trot out the specific individuals in this sample to do the analysis and marketing, any extension is problematic. This should be understood in both paradigms.

NEITHER METHOD PRODUCES ESTIMATES OF INDIVIDUAL PARAMETERS, CLAIMS TO THE CONTRARY NOTWITHSTANDING. BOTH PRODUCE ESTIMATES OF THE MEAN OF THE CONDITIONAL (POSTERIOR) DISTRIBUTION OF POSSIBLE PARAMETER DRAWS CONDITIONED ON THE PRECISE SPECIFIC DATA FOR INDIVIDUAL I.


Appendix: EM Algorithm


The EM Algorithm

Latent Class is a 'missing data' model.

$d_{i,j} = 1$ if individual i is a member of class j.

If $d_{i,j}$ were observed, the complete data log likelihood would be

$$\log L_c = \sum_{i=1}^{N} \log \left[ \sum_{j=1}^{J} d_{i,j} \prod_{t=1}^{T_i} f(y_{i,t} \mid \text{data}_{i,t}, \text{class} = j) \right]$$

(Only one of the J terms would be nonzero.)

The Expectation-Maximization algorithm has two steps:

(1) Expectation step: Form the 'expected log likelihood' given the data and a prior guess of the parameters.

(2) Maximization step: Maximize the expected log likelihood to obtain a new guess for the model parameters.

(E.g., http://crow.ee.washington.edu/people/bulyko/papers/em.pdf)


Implementing EM

Given initial guesses $\pi_{iq}^0 = \pi_{i1}^0, \pi_{i2}^0, \ldots, \pi_{iQ}^0$ and $\beta_q^0 = \beta_1^0, \beta_2^0, \ldots, \beta_Q^0$.

E.g., use 1/Q for each $\pi_{iq}$ and the MLE of $\beta$ from a one class model. (Have to perturb each one slightly, as if all $\pi_{iq}$ are equal and all $\beta_q$ are the same, the model will satisfy the FOC.)

(1) Compute $\hat{w}(q \mid i)$ = posterior class probabilities, using $\hat{\beta}^0, \hat{\delta}^0$.

Reestimate each $\beta_q$ using a weighted log likelihood:

$$\text{Maximize wrt } \beta_q: \quad \sum_{i=1}^{N} \hat{w}(q \mid i) \sum_{t=1}^{T_i} \log f(y_{it} \mid x_{it}, \beta_q)$$

(2) Reestimate $\pi_{iq}$ by reestimating $\delta$.

If no $z_i$: new $\hat{\pi}_q = (1/N) \sum_{i=1}^{N} \hat{w}(q \mid i)$, using old $\hat{\pi}$ and new $\hat{\beta}$.

If $z_i$:

$$\text{Maximize wrt } \delta_q: \quad \sum_{i=1}^{N} \sum_{q=1}^{Q} \hat{w}(q \mid i) \log \frac{\exp(z_i' \delta_q)}{\sum_{q=1}^{Q} \exp(z_i' \delta_q)}$$

Now, return to step 1.

Iterate until convergence.
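The two EM steps can be illustrated with a deliberately simple latent class model. The sketch below is not the estimator used for the results in these slides. It assumes each class q has a single Bernoulli success probability p_q (so the weighted M step has a closed form), no class-membership covariates z_i, and simulated data. All names (`em_latent_class`, `pi`, `p`) are hypothetical.

```python
import math
import random

def em_latent_class(y, Q=2, max_iter=500, tol=1e-10):
    """EM for a Q-class panel model with y[i][t] ~ Bernoulli(p_q) in class q.

    Returns (class shares pi, class parameters p, log-likelihood path).
    """
    N = len(y)
    succ = [sum(row) for row in y]      # number of ones for individual i
    T = [len(row) for row in y]         # panel length T_i
    pooled = sum(succ) / sum(T)         # one-class MLE
    # Start from 1/Q and the one-class MLE, perturbed so the classes differ.
    pi = [1.0 / Q] * Q
    p = [min(max(pooled * (0.75 + 0.5 * q / max(Q - 1, 1)), 0.01), 0.99)
         for q in range(Q)]
    path = []
    for _ in range(max_iter):
        # E step: posterior class probabilities w(q|i) at the current parameters.
        w = []
        loglik = 0.0
        for i in range(N):
            joint = [pi[q] * p[q] ** succ[i] * (1.0 - p[q]) ** (T[i] - succ[i])
                     for q in range(Q)]
            total = sum(joint)
            loglik += math.log(total)
            w.append([j / total for j in joint])
        path.append(loglik)
        if len(path) > 1 and path[-1] - path[-2] < tol:
            break
        # M step: weighted MLE of each p_q, then new class shares (no z_i case).
        for q in range(Q):
            num = sum(w[i][q] * succ[i] for i in range(N))
            den = sum(w[i][q] * T[i] for i in range(N))
            p[q] = min(max(num / den, 1e-6), 1.0 - 1e-6)
            pi[q] = sum(w[i][q] for i in range(N)) / N
    return pi, p, path

# Simulated example: 400 individuals, 10 periods, two classes.
rng = random.Random(2024)
true_p = (0.2, 0.8)
y = []
for _ in range(400):
    q = 0 if rng.random() < 0.4 else 1
    y.append([1 if rng.random() < true_p[q] else 0 for _ in range(10)])

pi_hat, p_hat, path = em_latent_class(y)
print(pi_hat, p_hat, len(path))
```

With a richer within-class model (for example, a logit with coefficient vector β_q), the closed-form update for p_q would be replaced by a weighted maximum likelihood fit for each class, with the posterior probabilities as weights.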


Appendix: Monte Carlo Integration


Monte Carlo Integration

(1) Integral is of the form

$$K = \int_{\text{range of } v} g(v \mid \text{data}, \beta)\, f(v \mid \Omega)\, dv$$

where f(v) is the density of random variable v, possibly conditioned on a set of parameters Ω, and g(v|data, β) is a function of data and parameters.

(2) By construction, K(Ω) = E[g(v|data, β)].

(3) Strategy:

a. Sample R values from the population of v using a random number generator.

b. Compute the average $\bar{K} = (1/R) \sum_{r=1}^{R} g(v_r \mid \text{data}, \beta)$.

By the law of large numbers, plim $\bar{K}$ = K.


Monte Carlo Integration

$$\frac{1}{R} \sum_{r=1}^{R} f(u_{ir}) \xrightarrow{P} \int_{u_i} f(u_i)\, g(u_i)\, du_i = E_u[f(u_i)]$$

(Certain smoothness conditions must be met.)

Drawing $u_{ir}$ by 'random sampling':

$$u_{ir} = t(v_{ir}), \quad v_{ir} \sim U[0,1]$$

E.g., $u_{ir} = \sigma \Phi^{-1}(v_{ir}) + \mu$ for $N[\mu, \sigma^2]$.

Requires many draws, typically hundreds or thousands.


Example: Monte Carlo Integral

$$\int_{-\infty}^{\infty} \Phi(x_1 + .9v)\, \Phi(x_2 + .9v)\, \Phi(x_3 + .9v)\, \frac{\exp(-v^2/2)}{\sqrt{2\pi}}\, dv$$

where Φ is the standard normal CDF and x1 = .5, x2 = -.2, x3 = .3. (Looks like a RE probit model.)

The weighting function for v is the standard normal.

Strategy: Draw R (say 1000) standard normal random draws, v_r. Compute the 1000 functions Φ(x1 + .9v_r) Φ(x2 + .9v_r) Φ(x3 + .9v_r) and average them.

(Based on 100, 1000, 10000, I get .28746, .28437, .27242)
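The strategy for this example can be sketched in Python using only the standard library. The sketch assumes the index inside each Φ is x_j + .9v; because v is symmetric around zero, using x_j - .9v in all three terms would give the same integral. The seed, the function names, and the trapezoid-rule cross-check are illustrative additions, and the simulated values will not reproduce the figures quoted above because the random draws differ.

```python
import math
import random

X = (0.5, -0.2, 0.3)   # x1, x2, x3 from the example
SIGMA = 0.9            # coefficient on v

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g(v):
    """Product of the three normal CDF terms at a given v."""
    prod = 1.0
    for x in X:
        prod *= norm_cdf(x + SIGMA * v)
    return prod

def mc_integral(R, seed=12345):
    """Average g(v_r) over R standard normal draws."""
    rng = random.Random(seed)
    return sum(g(rng.gauss(0.0, 1.0)) for _ in range(R)) / R

def quadrature(lo=-8.0, hi=8.0, n=4000):
    """Trapezoid-rule value of the same integral, as a deterministic check."""
    h = (hi - lo) / n
    total = 0.0
    for j in range(n + 1):
        v = lo + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * g(v) * math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
    return total * h

for R in (100, 1000, 10000):
    print(R, round(mc_integral(R), 5))
print("quadrature", round(quadrature(), 5))
```

Comparing the simulated averages with the quadrature value shows how slowly the simulation error shrinks as R grows, which motivates the Halton draws discussed later.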


Generating a Random Draw

Most common approach is the "inverse probability transform"

Let u = a random draw from the standard uniform (0,1).

Let x = the desired population to draw from.

Assume the CDF of x is F(x).

The random draw is then x = F^{-1}(u).

Example: exponential, θ. f(x) = θ exp(-θx), F(x) = 1 - exp(-θx).

Equate u to F(x): x = -(1/θ) log(1-u).

Example: Normal(μ,σ). Inverse function does not exist in closed form. There are good polynomial approximations to produce a draw v from N[0,1] from a U(0,1). Then x = μ + σv.

This leaves the question of how to draw the U(0,1).
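Both examples of the inverse probability transform can be written directly in Python. In this sketch the normal case relies on `statistics.NormalDist().inv_cdf` (Python 3.8 or later) as a stand-in for the polynomial approximation mentioned above. The helper names are hypothetical, and the uniform draws come from Python's built-in generator.

```python
import math
import random
from statistics import NormalDist

def draw_exponential(u, theta):
    """Invert F(x) = 1 - exp(-theta*x): x = -(1/theta) log(1 - u)."""
    return -math.log(1.0 - u) / theta

def draw_normal(u, mu, sigma):
    """Map u in (0,1) to N[0,1] by the inverse CDF, then shift and scale."""
    v = NormalDist().inv_cdf(u)   # requires 0 < u < 1
    return mu + sigma * v

def uniform_open(rng):
    """Uniform draw strictly inside (0,1); random() can return exactly 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

rng = random.Random(7)
exp_draws = [draw_exponential(uniform_open(rng), theta=2.0) for _ in range(20000)]
norm_draws = [draw_normal(uniform_open(rng), mu=1.0, sigma=3.0) for _ in range(20000)]
print(sum(exp_draws) / len(exp_draws))    # should be near 1/theta = 0.5
print(sum(norm_draws) / len(norm_draws))  # should be near mu = 1.0
```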


Drawing Uniform Random Numbers

Computer generated random numbers are not random; they

are Markov chains that look random.

The Original Random Number Generator for 32 bit computers.

SEED originates at some large odd number

d3 = 2147483647.0

d2 = 2147483655.0

d1=16807.0

SEED=Mod(d1*SEED,d3) ! MOD(a,p) = a - INT(a/p) * p

X=SEED/d2 is a random value between 0 and 1.

Problems:

(1) Short period. Based on 32 bits, so recycles after 2^31 - 1 values

(2) Evidently not very close to random. (Recent tests have discredited this RNG.)
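The recursion is short enough to reproduce in Python for experimentation. This sketch uses exact integer arithmetic in place of the floating-point MOD; the two agree here because every product fits exactly in a double. It keeps the slide's constants, including the divisor d2, and the function names are illustrative.

```python
D1 = 16807               # multiplier
D3 = 2147483647          # modulus, 2**31 - 1
D2 = 2147483655.0        # divisor from the slide, slightly larger than D3

def lcg_next(seed):
    """One step of SEED = Mod(d1*SEED, d3)."""
    return (D1 * seed) % D3

def lcg_uniforms(seed, n):
    """Return n values X = SEED/d2, each strictly between 0 and 1."""
    out = []
    for _ in range(n):
        seed = lcg_next(seed)
        out.append(seed / D2)
    return out

print(lcg_next(1))               # 16807
print(lcg_uniforms(12345, 3))
```

The state is a single integer between 1 and 2^31 - 2, so the sequence must repeat after at most 2^31 - 2 steps, which is the short-period problem listed above.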


L’Ecuyer’s RNG

Define: norm = 2.328306549295728e-10, m1 = 4294967087.0, m2 = 4294944443.0, a12 = 1403580.0, a13n = 810728.0, a21 = 527612.0, a23n = 1370589.0.

Initialize: s10 = the seed, s11 = 4231773.0, s12 = 1975.0, s20 = 137228743.0, s21 = 98426597.0, s22 = 142859843.0.

Preliminaries for each draw (resets at least some of 5 seeds):

p1 = a12*s11 - a13n*s10, k = int(p1/m1), p1 = p1 - k*m1; if p1 < 0, p1 = p1 + m1; s10 = s11, s11 = s12, s12 = p1;

p2 = a21*s22 - a23n*s20, k = int(p2/m2), p2 = p2 - k*m2; if p2 < 0, p2 = p2 + m2; s20 = s21, s21 = s22, s22 = p2;

Compute the random number: u = norm*(p1 - p2) if p1 > p2, u = norm*(p1 - p2 + m1) otherwise.

Passes all known randomness tests. Period = 2^191

Pierre L'Ecuyer. Canada Research Chair in Stochastic Simulation and Optimization. Département d'informatique et de recherche opérationnelle, University of Montreal.
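The steps for this generator translate into a small Python class. This is a sketch, not a reference implementation. It uses integer arithmetic, with Python's `%` operator standing in for the "subtract k*m, then add m if negative" adjustment. The multiplier a12 is set to 1403580, the value in L'Ecuyer's published MRG32k3a generator. The class name and the example seed are hypothetical.

```python
NORM = 2.328306549295728e-10   # 1 / (m1 + 1)
M1 = 4294967087
M2 = 4294944443
A12 = 1403580
A13N = 810728
A21 = 527612
A23N = 1370589

class CombinedMRG:
    """Combined multiple recursive generator following the steps listed above."""

    def __init__(self, seed):
        # s10 is the user's seed (an integer with 0 <= seed < M1);
        # the other five start values are the fixed ones from the slide.
        self.s10, self.s11, self.s12 = seed, 4231773, 1975
        self.s20, self.s21, self.s22 = 137228743, 98426597, 142859843

    def next(self):
        # First component: update and shift its three-value state.
        p1 = (A12 * self.s11 - A13N * self.s10) % M1
        self.s10, self.s11, self.s12 = self.s11, self.s12, p1
        # Second component.
        p2 = (A21 * self.s22 - A23N * self.s20) % M2
        self.s20, self.s21, self.s22 = self.s21, self.s22, p2
        # Combine the two components into a uniform draw in (0,1).
        if p1 > p2:
            return NORM * (p1 - p2)
        return NORM * (p1 - p2 + M1)

gen = CombinedMRG(12345)
draws = [gen.next() for _ in range(1000)]
print(draws[:3])
```

Because Python integers do not overflow, the products can be formed exactly; a floating-point version in another language would rely on the constants being small enough for exact double arithmetic.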


Quasi-Monte Carlo Integration Based on Halton Sequences

Coverage of the unit interval is the objective, not randomness of the set of draws.

Halton sequences --- Markov chain

p = a prime number,

r = the sequence of integers, decomposed as

$$r = \sum_{i=0}^{I} b_i p^i$$

$$H(r \mid p) = \sum_{i=0}^{I} b_i p^{-i-1}, \quad r = r_1, \ldots \text{ (e.g., 10, 11, 12, ...)}$$

For example, using base p = 5, the integer r = 37 has b_0 = 2, b_1 = 2, and b_2 = 1 (37 = 1×5^2 + 2×5^1 + 2×5^0). Then H(37|5) = 2×5^{-1} + 2×5^{-2} + 1×5^{-3} = 0.488.
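The digit expansion and its reflection about the decimal point take only a few lines of Python. In this sketch the function name `halton` and the choice to start at r = 10 are illustrative. For r = 37 and p = 5 the digits are 2, 2, 1, so the value is 2/5 + 2/25 + 1/125 = 0.488.

```python
def halton(r, p):
    """H(r|p): write r in base p and mirror the digits about the decimal point."""
    h = 0.0
    scale = 1.0 / p            # weight p**(-i-1) for digit b_i, starting at i = 0
    while r > 0:
        r, b = divmod(r, p)    # b is the next base-p digit of r
        h += b * scale
        scale /= p
    return h

print(halton(37, 5))                           # 2/5 + 2/25 + 1/125
print([halton(r, 2) for r in range(10, 16)])   # a short run in base 2
```

For several dimensions, the usual practice is to use a different prime base for each dimension so that the sequences do not line up.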


Halton Sequences vs. Random Draws

Requires far fewer draws – for one dimension, about 1/10. Accelerates estimation by a factor of 5 to 10.
