Part 23: Parameter Heterogeneity [1/115]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 23: Parameter Heterogeneity [2/115]
Econometric Analysis of Panel Data
23. Individual Heterogeneity
and Random Parameter Variation
Part 23: Parameter Heterogeneity [3/115]
Heterogeneity
Observational: Observable differences across individuals (e.g., choice makers)
Choice strategy: How consumers make decisions – the underlying behavior
Structural: Differences in model frameworks
Preferences: Differences in model ‘parameters’
Part 23: Parameter Heterogeneity [4/115]
Parameter Heterogeneity
(1) Regression model
$$ y_{it} = \mathbf{x}_{it}'\boldsymbol\beta_i + \varepsilon_{it} $$
(2) Conditional probability or other nonlinear model
$$ f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) $$
(3) Heterogeneity - how are parameters distributed across individuals?
(a) Discrete - the population contains a mixture of Q types of individuals.
(b) Continuous. Parameters are part of the stochastic structure of the population.
Part 23: Parameter Heterogeneity [5/115]
Distinguish Bayes and Classical
- Both depart from the heterogeneous ‘model,’ $f(y_{it} \mid \mathbf{x}_{it}) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol\beta_i)$
- What do we mean by ‘randomness’?
  - With respect to the information of the analyst (Bayesian)
  - With respect to some stochastic process governing ‘nature’ (Classical)
- Bayesian: No difference between ‘fixed’ and ‘random’
- Classical: Full specification of joint distributions for observed random variables; piecemeal definitions of ‘random’ parameters. Usually a form of ‘random effects’
Part 23: Parameter Heterogeneity [6/115]
Fixed Management and Technical Efficiency in a Random Coefficients Model
Antonio Alvarez, University of Oviedo
Carlos Arias, University of Leon
William Greene, Stern School of Business, New York University
Part 23: Parameter Heterogeneity [7/115]
The Production Function Model
$$ \ln y_{it} = \alpha + \beta_x \ln x_{it} + \tfrac{1}{2}\beta_{xx}(\ln x_{it})^2 + \beta_m m_i + \tfrac{1}{2}\beta_{mm} m_i^2 + \beta_{xm} \ln x_{it}\, m_i + v_{it} $$
Definition: Maximal output, given the inputs
Inputs: Variable factors, Quasi-fixed (land)
Form: Log-quadratic - translog
Latent Management as an unobservable input
Part 23: Parameter Heterogeneity [8/115]
Application to Spanish Dairy Farms
Variable  Description                                           Mean      Std. Dev.  Minimum    Maximum
Milk      Milk production (liters)                              131,108   92,539     14,110     727,281
Cows      Number of milking cows                                22.12     11.27      4.5        82.3
Labor     Number of man-equivalent units                        1.67      0.55       1.0        4.0
Land      Hectares of land devoted to pasture and crops         12.99     6.17       2.0        45.1
Feed      Total amount of feedstuffs fed to dairy cows (tons)   57,941    47,981     3,924.14   376,732
N = 247 farms, T = 6 years (1993-1998)
Part 23: Parameter Heterogeneity [9/115]
Translog Production Model
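$$ \ln y_{it} = \ln y_{it}^* - u_{it} $$
$$ \ln y_{it} = \alpha + \sum_{k=1}^{K}\beta_k \ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2} + \sum_{k=1}^{K}\beta_{km}\ln x_{itk}\, m_i^* + v_{it} - u_{it} $$
$m_i^*$ is an unobserved, time invariant effect.
$$ u_{it} = \ln y_{it}^* - \ln y_{it} = \Big(\beta_m + \sum_{k=1}^{K}\beta_{km}\ln x_{itk}\Big)(m_i^* - m_i) + \tfrac{1}{2}\beta_{mm}\big(m_i^{*2} - m_i^2\big) \ge 0. $$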
Part 23: Parameter Heterogeneity [10/115]
Random Coefficients Model
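$$ \ln y_{it} = \Big(\alpha + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2}\Big) + \sum_{k=1}^{K}\big(\beta_k + \beta_{km} m_i^*\big)\ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + v_{it} - u_{it} $$
$$ \ln y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_{ki}\ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_{itk}\ln x_{itl} + v_{it} - u_{it} $$
[Chamberlain/Mundlak:] $m_i^* = \sum_{k=1}^{K}\theta_k \ln \bar{x}_{ik} + w_i$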
(1) Same random effect appears in each random parameter
(2) Only the first order terms are random
Part 23: Parameter Heterogeneity [11/115]
Discrete vs. Continuous Variation
- Classical context: Description of how parameters are distributed across individuals
- Variation
  - Discrete: Finite number of different parameter vectors distributed across individuals
    - Mixture is unknown as well as the parameters: Implies randomness from the point of view of the analyst. (Bayesian?)
    - Might also be viewed as discrete approximation to a continuous distribution
  - Continuous: There exists a stochastic process governing the distribution of parameters, drawn from a continuous pool of candidates.
- Background common assumption: An overarching stochastic process that assigns parameters to individuals
Part 23: Parameter Heterogeneity [12/115]
Discrete Parameter Variation
The Latent Class Model
(1) Population is a (finite) mixture of Q types of individuals, q = 1,...,Q. Q 'classes' differentiated by ($\boldsymbol\beta_q$)
(a) Analyst does not know class memberships. ('latent.')
(b) 'Mixing probabilities' (from the point of view of the analyst) are $\pi_1,\ldots,\pi_Q$, with $\sum_{q=1}^{Q}\pi_q = 1$
(2) Conditional density is
$$ P(y_{it} \mid \text{class} = q) = f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
Part 23: Parameter Heterogeneity [13/115]
Latent Classes
- A population contains a mixture of individuals of different types (classes)
- Common form of the data generating mechanism within the classes
- Observed outcome y is governed by the common process $F(y \mid x, \theta_j)$
- Classes are distinguished by the parameters, $\theta_j$.
Part 23: Parameter Heterogeneity [14/115]
Part 23: Parameter Heterogeneity [15/115]
Part 23: Parameter Heterogeneity [16/115]
Part 23: Parameter Heterogeneity [17/115]
How Finite Mixture Models Work
Part 23: Parameter Heterogeneity [18/115]
Find the ‘Best’ Fitting Mixture of Two Normal Densities
$$ \log L = \sum_{i=1}^{1000} \log \sum_{j=1}^{2} \pi_j \frac{1}{\sigma_j}\,\phi\!\left(\frac{y_i - \mu_j}{\sigma_j}\right) $$
Maximum Likelihood Estimates
       Class 1                  Class 2
       Estimate   Std. Error    Estimate   Std. Error
μ      7.05737    .77151        3.25966    .09824
σ      3.79628    .25395        1.81941    .10858
π      .28547     .05953        .71453     .05953
$$ \hat{F}(y) = .28547\,\frac{1}{3.79628}\,\phi\!\left(\frac{y - 7.05737}{3.79628}\right) + .71453\,\frac{1}{1.81941}\,\phi\!\left(\frac{y - 3.25966}{1.81941}\right) $$
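The two-component log likelihood on this slide can be evaluated and maximized in a few lines. The sketch below is an illustration only: the slide's 1,000 observations are not available, so the data are simulated with hypothetical parameter values, and the EM algorithm is used as one convenient way to climb the likelihood (the slide does not say which optimizer produced its estimates).

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Normal density (1/sigma) * phi((y - mu) / sigma)."""
    z = (y - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_loglik(y, pi, mu, sigma):
    """LogL = sum_i log sum_j pi_j (1/sigma_j) phi((y_i - mu_j) / sigma_j)."""
    dens = normal_pdf(y[:, None], mu[None, :], sigma[None, :])  # shape (n, J)
    return float(np.sum(np.log(dens @ pi)))

def fit_mixture_em(y, pi, mu, sigma, n_iter=200):
    """EM iterations for a J-component normal mixture (a local maximizer)."""
    for _ in range(n_iter):
        # E step: posterior class probabilities for each observation
        joint = normal_pdf(y[:, None], mu[None, :], sigma[None, :]) * pi[None, :]
        w = joint / joint.sum(axis=1, keepdims=True)
        # M step: weighted proportions, means and standard deviations
        nj = w.sum(axis=0)
        pi = nj / len(y)
        mu = (w * y[:, None]).sum(axis=0) / nj
        sigma = np.sqrt((w * (y[:, None] - mu[None, :])**2).sum(axis=0) / nj)
    return pi, mu, sigma

# Simulated data; the generating values below are hypothetical.
rng = np.random.default_rng(0)
n = 1000
in_class_1 = rng.random(n) < 0.3
y = np.where(in_class_1, rng.normal(7.0, 3.5, n), rng.normal(3.0, 1.8, n))

start = (np.array([0.5, 0.5]), np.array([6.0, 2.0]), np.array([3.0, 3.0]))
ll_start = mixture_loglik(y, *start)
pi_hat, mu_hat, sigma_hat = fit_mixture_em(y, *start)
ll_hat = mixture_loglik(y, pi_hat, mu_hat, sigma_hat)
```

Each EM iteration cannot lower the log likelihood, which is why `ll_hat` is at least `ll_start`; standard errors like those in the table would need the Hessian or outer product of gradients, which this sketch omits.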
Part 23: Parameter Heterogeneity [19/115]
Mixing probabilities .715 and .285
Part 23: Parameter Heterogeneity [20/115]
Approximation
Actual Distribution
Part 23: Parameter Heterogeneity [21/115]
Application Shoe Brand Choice
Simulated Data: Stated Choice, 400 respondents, 8 choice situations
3 choice/attributes + NONE
- Fashion = High=1 / Low=0
- Quality = High=1 / Low=0
- Price = 25/50/75,100,125 coded 1,2,3,4,5 then divided by 25.
Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical
Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
Thanks to www.statisticalinnovations.com (Latent Gold)
Part 23: Parameter Heterogeneity [22/115]
A Random Utility Model
Random Utility Model for Discrete Choice Among J alternatives at time t by person i.
$$ U_{itj} = \alpha_j + \boldsymbol\beta'\mathbf{x}_{itj} + \varepsilon_{itj} $$
$\alpha_j$ = Choice specific constant
$\mathbf{x}_{itj}$ = Attributes of choice presented to person (Information processing strategy. Not all attributes will be evaluated. E.g., lexicographic utility functions over certain attributes.)
$\boldsymbol\beta$ = ‘Taste weights,’ ‘Part worths,’ marginal utilities
$\varepsilon_{itj}$ = Unobserved random component of utility
Mean = $E[\varepsilon_{itj}] = 0$; Variance = $\mathrm{Var}[\varepsilon_{itj}] = \sigma^2$
Part 23: Parameter Heterogeneity [23/115]
The Multinomial Logit Model
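- Independent type 1 extreme value (Gumbel): $F(\varepsilon_{itj}) = \exp(-\exp(-\varepsilon_{itj}))$
- Independence across utility functions
- Identical variances, $\sigma^2 = \pi^2/6$
- Same taste parameters for all individuals
$$ \text{Prob}[\text{choice } j \mid i,t] = \frac{\exp(\alpha_j + \boldsymbol\beta'\mathbf{x}_{itj})}{\sum_{m=1}^{J(i,t)} \exp(\alpha_m + \boldsymbol\beta'\mathbf{x}_{itm})} $$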
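As a small illustration of the multinomial logit probability, the sketch below computes choice probabilities for one choice situation. The coefficient values are rounded from the estimated MNL reported in this deck; the attribute values, and the treatment of BN as the constant of the NONE alternative with zero attributes, are assumptions made only for the example.

```python
import numpy as np

def mnl_probabilities(alpha, beta, X):
    """Prob[choice j] = exp(alpha_j + beta'x_j) / sum_m exp(alpha_m + beta'x_m).

    alpha: (J,) choice specific constants
    beta:  (K,) taste weights shared by all individuals
    X:     (J, K) attributes of the J alternatives in one choice situation
    """
    v = alpha + X @ beta
    v = v - v.max()            # subtract the maximum for numerical stability
    expv = np.exp(v)
    return expv / expv.sum()

# Columns of X: Fashion, Quality, Price (hypothetical attribute levels).
X = np.array([
    [1.0, 0.0, 0.08],   # brand 1
    [0.0, 1.0, 0.12],   # brand 2
    [1.0, 1.0, 0.16],   # brand 3
    [0.0, 0.0, 0.00],   # NONE
])
beta = np.array([1.4789, 1.0137, -11.8023])    # BF, BQ, BP (rounded)
alpha = np.array([0.0, 0.0, 0.0, 0.0368])      # BN on NONE (assumed placement)

probs = mnl_probabilities(alpha, beta, X)
```

Only utility differences matter: adding the same constant to every alternative leaves the probabilities unchanged, which is why one set of constants has to be normalized.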
Part 23: Parameter Heterogeneity [24/115]
Estimated MNL
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Log likelihood function       -4158.503     |
| Akaike IC= 8325.006   Bayes IC= 8349.289    |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| Constants only   -4391.1804  .05299  .05259 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 BF          1.47890473       .06776814     21.823    .0000
 BQ          1.01372755       .06444532     15.730    .0000
 BP        -11.8023376        .80406103    -14.678    .0000
 BN           .03679254       .07176387       .513    .6082
Part 23: Parameter Heterogeneity [25/115]
Latent Classes and Random Parameters
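Heterogeneity with respect to 'latent' consumer classes
$$ \Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice}_i \mid \text{class} = q)\,\Pr(\text{class} = q) $$
$$ \Pr(\text{choice}_i \mid \text{class} = q) = \frac{\exp(\mathbf{x}_{i,\text{choice}}'\boldsymbol\beta_q)}{\sum_{j=\text{choice}} \exp(\mathbf{x}_{i,j}'\boldsymbol\beta_q)} $$
$$ \Pr(\text{class} = q \mid i) = F_{i,q}, \text{ e.g., } F_{i,q} = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=\text{classes}} \exp(\mathbf{z}_i'\boldsymbol\delta_q)} $$
Simple discrete random parameter variation
$$ \Pr(\text{choice}_i \mid \boldsymbol\beta_i) = \frac{\exp(\mathbf{x}_{i,\text{choice}}'\boldsymbol\beta_i)}{\sum_{j=\text{choice}} \exp(\mathbf{x}_{i,j}'\boldsymbol\beta_i)} $$
$$ \Pr(\boldsymbol\beta_i = \boldsymbol\beta_q) = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=\text{classes}} \exp(\mathbf{z}_i'\boldsymbol\delta_q)}, \quad q = 1,\ldots,Q $$
$$ \Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice}_i \mid \boldsymbol\beta_i = \boldsymbol\beta_q)\,\Pr(\boldsymbol\beta_q) $$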
Part 23: Parameter Heterogeneity [26/115]
Estimated Latent Class Model
+---------------------------------------------+
| Latent Class Logit Model                    |
| Log likelihood function       -3649.132     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Utility parameters in latent class -->> 1
 BF|1        3.02569837       .14335927     21.106    .0000
 BQ|1        -.08781664       .12271563      -.716    .4742
 BP|1       -9.69638056      1.40807055     -6.886    .0000
 BN|1        1.28998874       .14533927      8.876    .0000
 Utility parameters in latent class -->> 2
 BF|2        1.19721944       .10652336     11.239    .0000
 BQ|2        1.11574955       .09712630     11.488    .0000
 BP|2      -13.9345351       1.22424326    -11.382    .0000
 BN|2        -.43137842       .10789864     -3.998    .0001
 Utility parameters in latent class -->> 3
 BF|3        -.17167791       .10507720     -1.634    .1023
 BQ|3        2.71880759       .11598720     23.441    .0000
 BP|3       -8.96483046      1.31314897     -6.827    .0000
 BN|3         .18639318       .12553591      1.485    .1376
 This is THETA(1) in class probability model.
 Constant    -.90344530       .34993290     -2.582    .0098
 _MALE|1      .64182630       .34107555      1.882    .0599
 _AGE25|1    2.13320852       .31898707      6.687    .0000
 _AGE39|1     .72630019       .42693187      1.701    .0889
 This is THETA(2) in class probability model.
 Constant     .37636493       .33156623      1.135    .2563
 _MALE|2    -2.76536019       .68144724     -4.058    .0000
 _AGE25|2    -.11945858       .54363073      -.220    .8261
 _AGE39|2    1.97656718       .70318717      2.811    .0049
 This is THETA(3) in class probability model.
 Constant     .000000    ......(Fixed Parameter).......
 _MALE|3      .000000    ......(Fixed Parameter).......
 _AGE25|3     .000000    ......(Fixed Parameter).......
 _AGE39|3     .000000    ......(Fixed Parameter).......
Part 23: Parameter Heterogeneity [27/115]
Latent Class Elasticities
+-----------------------------------------------------------------+
| Elasticity             Averaged over observations.              |
| Effects on probabilities of all choices in the model:           |
| Attribute is PRICE in choice B1                  MNL     LCM    |
| *  Choice=B1        .000    .000    .000        -.889   -.801   |
|    Choice=B2        .000    .000    .000         .291    .273   |
|    Choice=B3        .000    .000    .000         .291    .248   |
|    Choice=NONE      .000    .000    .000         .291    .219   |
| Attribute is PRICE in choice B2                                 |
|    Choice=B1        .000    .000    .000         .313    .311   |
| *  Choice=B2        .000    .000    .000       -1.222  -1.248   |
|    Choice=B3        .000    .000    .000         .313    .284   |
|    Choice=NONE      .000    .000    .000         .313    .268   |
| Attribute is PRICE in choice B3                                 |
|    Choice=B1        .000    .000    .000         .366    .314   |
|    Choice=B2        .000    .000    .000         .366    .344   |
| *  Choice=B3        .000    .000    .000        -.755   -.674   |
|    Choice=NONE      .000    .000    .000         .366    .302   |
+-----------------------------------------------------------------+
Part 23: Parameter Heterogeneity [28/115]
Individual Specific Means
Part 23: Parameter Heterogeneity [29/115]
A Practical Distinction
Finite Mixture (Discrete Mixture):
- Functional form strategy
- Component densities have no meaning
- Mixing probabilities have no meaning
- There is no question of “class membership”
- The number of classes is uninteresting – enough to get a good fit
Latent Class:
- Mixture of subpopulations
- Component densities are believed to be definable “groups” (Low Users and High Users in Bago d’Uva and Jones application)
- The classification problem is interesting – who is in which class?
- Posterior probabilities, P(class|y,x) have meaning
- Question of the number of classes has content in the context of the analysis
Part 23: Parameter Heterogeneity [30/115]
The Latent Class Model
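(1) There are Q classes, unobservable to the analyst
(2) Class specific model: $f(y_{it} \mid \mathbf{x}_{it}, \text{class} = q) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol\beta_q)$
(3) Conditional class probabilities (possibly given some information, $\mathbf{z}_i$): $P(\text{class} = q \mid \mathbf{z}_i, \boldsymbol\delta)$
Common multinomial logit form for prior class probabilities
$$ P(\text{class} = q \mid \mathbf{z}_i, \boldsymbol\delta) = \pi_{iq} = \frac{\exp(\mathbf{z}_i'\boldsymbol\delta_q)}{\sum_{q=1}^{Q}\exp(\mathbf{z}_i'\boldsymbol\delta_q)}, \quad \boldsymbol\delta_Q = \mathbf{0} $$
Note, if no $\mathbf{z}_i$, $\delta_q = \log(\pi_q / \pi_Q)$.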
Part 23: Parameter Heterogeneity [31/115]
Estimating an LC Model
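Conditional density for each observation is
$$ P(y_{it} \mid \mathbf{x}_{it}, \text{class} = q) = f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
Joint conditional density for $T_i$ observations is
$$ f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \boldsymbol\beta_q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
($T_i$ may be 1. This is not only a 'panel data' model.)
Maximize this for each class if the classes are known.
They aren't. Unconditional density for individual i is
$$ f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \mathbf{z}_i) = \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
LogLikelihood
$$ \log L(\boldsymbol\beta_1, \ldots, \boldsymbol\beta_Q, \boldsymbol\delta_1, \ldots, \boldsymbol\delta_Q) = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$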
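The latent class log likelihood can be sketched generically once the class specific log densities are in hand. In the illustration below, the array layout, the function names and the toy numbers are assumptions for the example; the class specific density f is left to the caller, and the prior class probabilities use the multinomial logit form with the last class normalized to zero.

```python
import numpy as np

def prior_class_probs(Z, delta):
    """pi_iq = exp(z_i'delta_q) / sum_q exp(z_i'delta_q).

    Z: (N, M) class-membership covariates; delta: (Q, M), last row zero.
    """
    a = Z @ delta.T
    a = a - a.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def latent_class_loglik(obs_logdens, person, Z, delta):
    """LogL = sum_i log sum_q pi_iq prod_t f(y_it | x_it, beta_q).

    obs_logdens: (n_obs, Q) values of log f(y_it | x_it, beta_q)
    person:      (n_obs,) integer index of the individual for each row
    """
    N, Q = Z.shape[0], delta.shape[0]
    joint = np.zeros((N, Q))
    np.add.at(joint, person, obs_logdens)      # sum over t within individual
    log_terms = np.log(prior_class_probs(Z, delta)) + joint
    m = log_terms.max(axis=1, keepdims=True)   # log-sum-exp over classes
    return float(np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))))

# Toy example: 2 individuals (T = 2 and T = 1), Q = 2 classes, constant-only z.
obs_logdens = np.log(np.array([[0.2, 0.6],
                               [0.5, 0.1],
                               [0.9, 0.3]]))
person = np.array([0, 0, 1])
Z = np.ones((2, 1))
delta = np.array([[0.7], [0.0]])
loglik = latent_class_loglik(obs_logdens, person, Z, delta)
```

Working with log densities and a log-sum-exp keeps the product over t from underflowing when $T_i$ is large; an optimizer would then maximize this function over the β's and δ's.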
Part 23: Parameter Heterogeneity [32/115]
Estimating Which Class
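Prior class probability
$$ \text{Prob}[\text{class} = q \mid \mathbf{z}_i] = \pi_{iq} $$
Joint conditional density for $T_i$ observations is
$$ P(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \text{class} = q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
Joint density for data and class membership is the product
$$ P(y_{i1}, y_{i2}, \ldots, y_{i,T_i}, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i) = \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q) $$
Posterior probability for class, given the data
$$ P(\text{class} = q \mid y_{i1}, y_{i2}, \ldots, y_{i,T_i}, \mathbf{X}_i, \mathbf{z}_i) = \frac{P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)}{P(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \mathbf{z}_i)} = \frac{P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)}{\sum_{q=1}^{Q} P(\mathbf{y}_i, \text{class} = q \mid \mathbf{X}_i, \mathbf{z}_i)} $$
Use Bayes Theorem to compute the posterior (conditional) probability
$$ w(q \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i) = P(\text{class} = q \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i) = \frac{\pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q)}{\sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_q)} = w_{iq} $$
Best guess = the class with the largest posterior probability.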
Part 23: Parameter Heterogeneity [33/115]
‘Estimating’ βi
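(1) Use $\hat{\boldsymbol\beta}_j$ from the class with the largest estimated probability
(2) Probabilistic - in the same spirit as the 'posterior mean'
$$ \hat{\boldsymbol\beta}_i = \sum_{q=1}^{Q} \text{Posterior Prob}[\text{class} = q \mid \text{data}]\; \hat{\boldsymbol\beta}_q = \sum_{q=1}^{Q} \hat{w}_{iq}\, \hat{\boldsymbol\beta}_q $$
Note: This estimates $E[\boldsymbol\beta_i \mid \mathbf{y}_i, \mathbf{X}_i, \mathbf{z}_i]$, not $\boldsymbol\beta_i$ itself.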
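Both ways of 'estimating' $\boldsymbol\beta_i$ follow directly from the posterior class probabilities. The sketch below is a minimal illustration with made-up numbers: the priors, the joint densities and the class specific coefficient vectors are hypothetical inputs, not estimates from any model in these slides.

```python
import numpy as np

def posterior_class_probs(prior, joint_logdens):
    """w_iq = pi_iq * prod_t f_itq / sum_q pi_iq * prod_t f_itq.

    prior:         (N, Q) prior class probabilities pi_iq
    joint_logdens: (N, Q) sum over t of log f(y_it | x_it, beta_q)
    """
    log_terms = np.log(prior) + joint_logdens
    log_terms = log_terms - log_terms.max(axis=1, keepdims=True)
    w = np.exp(log_terms)
    return w / w.sum(axis=1, keepdims=True)

# One individual, two classes (hypothetical values).
prior = np.array([[0.5, 0.5]])
joint_logdens = np.log(np.array([[0.2, 0.6]]))
beta_hat = np.array([[1.0, 0.0],      # class 1 coefficients
                     [3.0, 2.0]])     # class 2 coefficients

w = posterior_class_probs(prior, joint_logdens)
best_class = w.argmax(axis=1)          # (1) class with the largest posterior
beta_modal = beta_hat[best_class]      # coefficients of that class
beta_post_mean = w @ beta_hat          # (2) posterior-weighted average
```

Here the posterior weights are 0.25 and 0.75, so the weighted average lies between the two class vectors while the modal rule simply returns the second one.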
Part 23: Parameter Heterogeneity [34/115]
How Many Classes?
(1) Q is not a 'parameter' - can't 'estimate' Q with $\boldsymbol\beta$ and $\pi$
(2) Can't 'test' down or 'up' to Q by comparing log likelihoods. Degrees of freedom for Q+1 vs. Q classes is not well defined.
(3) Use Akaike IC; AIC = -2 logL + 2 × (number of parameters).
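The AIC comparison is simple arithmetic. The sketch below applies the formula to the two shoe brand choice models reported earlier in this deck; the log likelihoods are taken from those outputs, while the parameter count of 20 for the latent class model (12 utility parameters plus 8 free class probability parameters) is a count made here from the printed output, not a number stated in the slides.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: AIC = -2 logL + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Multinomial logit: logL = -4158.503 with 4 parameters (BF, BQ, BP, BN).
aic_mnl = aic(-4158.503, 4)

# Three class latent class logit: logL = -3649.132, assumed 20 free parameters.
aic_lcm = aic(-3649.132, 20)

# The smaller AIC is preferred.
preferred = "latent class" if aic_lcm < aic_mnl else "multinomial logit"
```

The MNL value reproduces the 8325.006 printed in the estimation output, and the latent class model has the lower AIC despite its 16 additional parameters.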
Part 23: Parameter Heterogeneity [35/115]
Modeling Obesity with a Latent Class Model
Mark Harris, Department of Economics, Curtin University
Bruce Hollingsworth, Department of Economics, Lancaster University
Pushkar Maitra, Department of Economics, Monash University
William Greene, Stern School of Business, New York University
Part 23: Parameter Heterogeneity [36/115]
300 Million People Worldwide. International Obesity Task Force: www.iotf.org
Part 23: Parameter Heterogeneity [37/115]
Costs of Obesity
- In the US more people are obese than smoke or use illegal drugs
- Obesity is a major risk factor for non-communicable diseases like heart problems and cancer
- Obesity is also associated with:
  - lower wages and productivity, and absenteeism
  - low self-esteem
- An economic problem. It is costly to society:
  - USA costs are around 4-8% of all annual health care expenditure - US $100 billion
  - Canada, 5%; France, 1.5-2.5%; and New Zealand 2.5%
Part 23: Parameter Heterogeneity [38/115]
Measuring Obesity
An individual’s weight given their height should lie within a certain range
- Body Mass Index (BMI)
- BMI = Weight (kg) / height (meters)^2
World Health Organization guidelines:
- Underweight: BMI < 18.5
- Normal: 18.5 < BMI < 25
- Overweight: 25 < BMI < 30
- Obese: BMI > 30
- Morbidly Obese: BMI > 40
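The BMI definition and the WHO bands translate directly into a small classifier. This is a minimal sketch: the function names are made up for the example, and because the guidelines are written with strict inequalities on both sides, assigning a value exactly on a boundary to the higher band is an assumption.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2

def who_category(bmi_value):
    """Map a BMI value to the WHO bands listed on the slide.

    Boundary values (18.5, 25, 30, 40) go to the higher band (assumption).
    """
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25:
        return "Normal"
    if bmi_value < 30:
        return "Overweight"
    if bmi_value < 40:
        return "Obese"
    return "Morbidly Obese"

# Hypothetical individuals.
category_a = who_category(bmi(70.0, 1.75))    # BMI about 22.9
category_b = who_category(bmi(95.0, 1.75))    # BMI about 31.0
```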
Part 23: Parameter Heterogeneity [39/115]
Two Latent Classes: Approximately Half of European Individuals
Part 23: Parameter Heterogeneity [40/115]
Modeling BMI Outcomes
- Grossman-type health production function: Health Outcomes = f(inputs)
- Existing literature assumes BMI is an ordinal, not cardinal, representation of individuals.
- Weight-related health status
  - Do not assume a one-to-one relationship between BMI levels and (weight-related) health status levels
  - Translate BMI values into an ordinal scale using WHO guidelines
  - Preserves underlying ordinal nature of the BMI index but recognizes that individuals within a so-defined weight range are of an (approximately) equivalent (weight-related) health status level
Part 23: Parameter Heterogeneity [41/115]
Conversion to a Discrete Measure
Measurement issues: Tendency to under-report BMI
- women tend to under-estimate/report weight;
- men over-report height.
Using bands should alleviate this
Allows focus on discrete ‘at risk’ groups
Part 23: Parameter Heterogeneity [42/115]
A Censored Regression Model for BMI
Simple Regression Approach Based on Actual BMI:
$BMI^* = \boldsymbol\beta'\mathbf{x} + \varepsilon$, $\varepsilon \sim N[0, \sigma^2]$, $\sigma^2 = 1$
True BMI = weight proxy is unobserved
Interval Censored Regression Approach
WT = 0 if BMI* < 25 (Normal)
   = 1 if 25 < BMI* < 30 (Overweight)
   = 2 if BMI* > 30 (Obese)
- Inadequate accommodation of heterogeneity
- Inflexible reliance on WHO classification
- Rigid measurement by the guidelines
Part 23: Parameter Heterogeneity [43/115]
Heterogeneity in the BMI Ranges
Boundaries are set by the WHO narrowly defined for all individuals
Strictly defined WHO definitions may consequently push individuals into inappropriate categories
We allow flexibility at the margins of these intervals
Following Pudney and Shields (2000) therefore we consider Generalised Ordered Choice models - boundary parameters are now functions of observed personal characteristics
Part 23: Parameter Heterogeneity [44/115]
Generalized Ordered Probit Approach
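A Latent Regression Model for True BMI
$BMI_i^* = \boldsymbol\beta'\mathbf{x}_i + \varepsilon_i$, $\varepsilon_i \sim N[0, \sigma^2]$, $\sigma^2 = 1$
Observation Mechanism for Weight Type
$WT_i$ = 0 if $BMI_i^* < 0$ (Normal)
       = 1 if $0 < BMI_i^* < \mu_i(\mathbf{w}_i)$ (Overweight)
       = 2 if $\mu_i(\mathbf{w}_i) < BMI_i^*$ (Obese)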
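The observation mechanism implies three cell probabilities for each individual. The sketch below computes them for a given index $\boldsymbol\beta'\mathbf{x}_i$ and threshold; the exponential form used for the threshold, $\mu_i = \exp(\boldsymbol\gamma'\mathbf{w}_i)$, is an assumption adopted here only to keep the threshold positive (the slide does not state the functional form), and all numerical values are hypothetical.

```python
from math import erf, exp, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def threshold(w, gamma):
    """Individual specific boundary mu_i(w_i) = exp(gamma'w_i) (assumed form)."""
    return exp(sum(g * wk for g, wk in zip(gamma, w)))

def weight_type_probs(xb, mu):
    """Probabilities of WT = 0, 1, 2 when BMI* = xb + eps, eps ~ N[0, 1]."""
    p_normal = norm_cdf(-xb)              # P(BMI* < 0)
    p_obese = 1.0 - norm_cdf(mu - xb)     # P(BMI* > mu)
    p_overweight = 1.0 - p_normal - p_obese
    return p_normal, p_overweight, p_obese

# Hypothetical individual: index beta'x = 0.4, two boundary covariates.
mu_i = threshold(w=[1.0, 0.5], gamma=[0.2, -0.1])
probs = weight_type_probs(0.4, mu_i)
```

A larger threshold widens the overweight interval for that individual, which is how observed characteristics shift the boundaries in a generalized ordered choice model.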
Part 23: Parameter Heterogeneity [45/115]
Latent Class Modeling
- Several ‘types’ or ‘classes’. Obesity may be due to genetic reasons (the FTO gene) or lifestyle factors
- Distinct sets of individuals may have differing reactions to various policy tools and/or characteristics
- The observer does not know from the data which class an individual is in.
- Suggests a latent class approach for health outcomes (Deb and Trivedi, 2002, and Bago d’Uva, 2005)
Part 23: Parameter Heterogeneity [46/115]
Latent Class Application
Two class model (considering FTO gene):
- More classes make class interpretations much more difficult
- Parametric models proliferate parameters
Endogenous class membership: Two classes allow us to correlate the equations driving class membership and observed weight outcomes via unobservables.
Part 23: Parameter Heterogeneity [47/115]
Heterogeneous Class Probabilities
$\pi_j$ = Prob(class=j) = governor of a detached natural process. Homogeneous.
$\pi_{ij}$ = Prob(class=j | $\mathbf{z}_i$, individual i). Now possibly a behavioral aspect of the process, no longer “detached” or “natural”
Nagin and Land 1993, “Criminal Careers…
Part 23: Parameter Heterogeneity [48/115]
Endogeneity of Class Membership
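Class Membership: $C_i^* = \boldsymbol\theta'\mathbf{z}_i + u_i$, $C_i = 1[C_i^* > 0]$ (Probit)
BMI | Class = 0,1: $BMI_{c,i}^* = \boldsymbol\beta_c'\mathbf{x}_i + \varepsilon_{c,i}$, BMI group = OP[$BMI_{c,i}^*$, $\mu_c(\mathbf{w}_i)$]
Endogeneity:
$$ \begin{pmatrix} u_i \\ \varepsilon_{c,i} \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_c \\ \rho_c & 1 \end{pmatrix} \right] $$
Bivariate Ordered Probit (one variable is binary).
Full information maximum likelihood.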
Part 23: Parameter Heterogeneity [49/115]
Model Components
x: determines observed weight levels within classes
- For observed weight levels we use lifestyle factors such as marital status and exercise levels
z: determines latent classes
- For latent class determination we use genetic proxies such as age, gender and ethnicity: the things we can’t change
w: determines position of boundary parameters within classes
- For the boundary parameters we have:
  - weight-training intensity and age (BMI inappropriate for the aged?)
  - pregnancy (small numbers and length of term unknown)
Part 23: Parameter Heterogeneity [50/115]
Data
- US National Health Interview Survey (2005); conducted by the National Center for Health Statistics
- Information on self-reported height and weight levels, BMI levels
- Demographic information
- Split sample (30,000+) by gender
Part 23: Parameter Heterogeneity [51/115]
Outcome Probabilities
- Class 0 dominated by normal and overweight probabilities: ‘normal weight’ class
- Class 1 dominated by probabilities at top end of the scale: ‘non-normal weight’
- Unobservables for weight class membership, negatively correlated with those determining weight levels:
Part 23: Parameter Heterogeneity [52/115]
[Figure: outcome probabilities (Normal, Overweight, Obese) for Class 0 and Class 1]
Part 23: Parameter Heterogeneity [53/115]
Classification (Latent Probit) Model
Part 23: Parameter Heterogeneity [54/115]
BMI Ordered Choice Model
- Conditional on class membership, lifestyle factors
- Marriage comfort factor only for normal class women
- Both classes associated with income, education
- Exercise effects similar in magnitude
- Exercise intensity only important for ‘non-normal’ class
- Home ownership only important for ‘non-normal’ class, and negative: result of differing socioeconomic status distributions across classes?
Part 23: Parameter Heterogeneity [55/115]
Effects of Aging on Weight Class
Part 23: Parameter Heterogeneity [56/115]
Effect of Education on Probabilities
Part 23: Parameter Heterogeneity [57/115]
Effect of Income on Probabilities
Part 23: Parameter Heterogeneity [58/115]
Inflated Responses in Self-Assessed Health
Mark Harris, Department of Economics, Curtin University
Bruce Hollingsworth, Department of Economics, Lancaster University
William Greene, Stern School of Business, New York University
Part 23: Parameter Heterogeneity [59/115]
Introduction
Health sector an important part of developed countries’ economies: E.g., Australia 9% of GDP
To see if these resources are being effectively utilized, we need to fully understand the determinants of individuals’ health levels
To this end much policy, and even more academic research, is based on measures of self-assessed health (SAH) from survey data
Part 23: Parameter Heterogeneity [60/115]
SAH vs. Objective Health Measures
Favorable SAH categories seem artificially high.
- 60% of Australians are either overweight or obese (Dunstan et al., 2001)
- 1 in 4 Australians has either diabetes or a condition of impaired glucose metabolism
- Over 50% of the population has elevated cholesterol
- Over 50% has at least 1 of the “deadly quartet” of health conditions (diabetes, obesity, high blood pressure, high cholesterol)
- Nearly 4 out of 5 Australians have 1 or more long term health conditions (National Health Survey, Australian Bureau of Statistics 2006)
- Australia ranked #1 in terms of obesity rates
Similar results appear for other countries
Part 23: Parameter Heterogeneity [61/115]
SAH vs. Objective Health
1. Are these SAH outcomes “over-inflated”?
2. And if so, why, and what kinds of people are doing the over-inflating/mis-reporting?
Part 23: Parameter Heterogeneity [62/115]
HILDA Data
The Household, Income and Labour Dynamics in Australia (HILDA) dataset:
1. a longitudinal survey of households in Australia
2. well tried and tested dataset
3. contains a host of information on SAH and other health measures, as well as numerous demographic variables
Part 23: Parameter Heterogeneity [63/115]
Self Assessed Health
- “In general, would you say your health is: Excellent, Very good, Good, Fair or Poor?”
- Responses 1,2,3,4,5 (we will be using 0,1,2,3,4)
- Typically ¾ of responses are “good” or “very good” health; in our data (HILDA) we get 72%
- Similar numbers for most developed countries
- Does this truly represent the health of the nation?
Part 23: Parameter Heterogeneity [64/115]
Part 23: Parameter Heterogeneity [65/115]
A Two Class Latent Class Model
True Reporter Misreporter
Part 23: Parameter Heterogeneity [66/115]
Reporter Type Model
$r^* = \boldsymbol\beta_r'\mathbf{x}_r + \varepsilon_r$
r = 1 if r* > 0 (True reporter)
  = 0 if r* ≤ 0 (Misreporter)
r is unobserved
Part 23: Parameter Heterogeneity [67/115]
Y=4
Y=3
Y=2
Y=1
Y=0
Part 23: Parameter Heterogeneity [68/115]
Pr(true,y) = Pr(true) * Pr(y | true)
Part 23: Parameter Heterogeneity [69/115]
Mis-reporters choose either good or very good
The response is determined by a probit model
$m^* = \boldsymbol\beta_m'\mathbf{x}_m + \varepsilon_m$
Y=3
Y=2
Part 23: Parameter Heterogeneity [70/115]
Part 23: Parameter Heterogeneity [71/115]
Observed Mixture of Two Classes
Part 23: Parameter Heterogeneity [72/115]
Pr(y) = Pr(true) Pr(y | true) + Pr(misreporter) Pr(y | misreporter)
Part 23: Parameter Heterogeneity [73/115]
Part 23: Parameter Heterogeneity [74/115]
Who are the Misreporters?
Part 23: Parameter Heterogeneity [75/115]
Priors and Posteriors
M = Misreporter, T = True reporter
Priors: $\Pr(M) = \Phi(-\boldsymbol\beta_r'\mathbf{x}_r)$, $\Pr(T) = \Phi(\boldsymbol\beta_r'\mathbf{x}_r)$
Posteriors:
Noninflated outcomes 0, 1, 4
$$ \Pr(M \mid y = 0,1,4) = 0, \quad \Pr(T \mid y = 0,1,4) = 1 $$
Inflated outcomes 2, 3
$$ \Pr(M \mid y = 2) = \frac{\Pr(y = 2 \mid M)\Pr(M)}{\Pr(y = 2 \mid M)\Pr(M) + \Pr(y = 2 \mid T)\Pr(T)} $$
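The posterior for the inflated outcomes is a direct application of Bayes theorem. The sketch below is a minimal illustration; the function name and all probabilities passed to it are hypothetical, not estimates from the HILDA application.

```python
def posterior_misreporter(p_y_given_m, p_y_given_t, prior_m):
    """Pr(M | y) = Pr(y|M)Pr(M) / [Pr(y|M)Pr(M) + Pr(y|T)Pr(T)], with Pr(T) = 1 - Pr(M)."""
    numerator = p_y_given_m * prior_m
    denominator = numerator + p_y_given_t * (1.0 - prior_m)
    return numerator / denominator

# Inflated outcome (y = 2): both types can produce it (hypothetical values).
post_inflated = posterior_misreporter(p_y_given_m=0.5, p_y_given_t=0.25, prior_m=0.5)

# Noninflated outcome (y = 0, 1 or 4): misreporters never choose it,
# so Pr(y | M) = 0 and the posterior probability of misreporting is 0.
post_noninflated = posterior_misreporter(p_y_given_m=0.0, p_y_given_t=0.3, prior_m=0.4)
```

Observing an inflated response raises the probability of misreporting above the prior whenever that response is more likely under M than under T.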
Part 23: Parameter Heterogeneity [76/115]
General Results
Part 23: Parameter Heterogeneity [77/115]
Part 23: Parameter Heterogeneity [78/115]
Latent Class Efficiency Studies
Battese and Coelli – growing in weather “regimes” for Indonesian rice farmers
Kumbhakar and Orea – cost structures for U.S. Banks
Greene (Health Economics, 2005) – revisits WHO Year 2000 World Health Report
Part 23: Parameter Heterogeneity [79/115]
Studying Economic Efficiency in Health Care
- Hospital and Nursing Home Cost efficiency
- Role of quality (not studied today)
Agency for Healthcare Research and Quality (AHRQ)
Part 23: Parameter Heterogeneity [80/115]
Stochastic Frontier Analysis
logC = f(output, input prices, environment) + v + u
ε = v + u
- v = noise – the usual “disturbance”
- u = inefficiency
Frontier efficiency analysis
- Estimate parameters of model
- Estimate u (to the extent we are able – we use E[u|ε])
- Evaluate and compare observed firms in the sample
Part 23: Parameter Heterogeneity [81/115]
Nursing Home Costs
- 44 Swiss nursing homes, 13 years
- Cost, Pk, Pl, output, two environmental variables
- Estimate cost function
- Estimate inefficiency
Part 23: Parameter Heterogeneity [82/115]
Estimated Cost Efficiency
Part 23: Parameter Heterogeneity [83/115]
Inefficiency?
Not all agree with the presence (or identifiability) of “inefficiency” in market outcomes data.
Variation around the common production structure may all be nonsystematic and not controlled by management
Implication, no inefficiency: u = 0.
Part 23: Parameter Heterogeneity [84/115]
A Two Class Model
Class 1: With Inefficiency
logC = f(output, input prices, environment) + $\sigma_v v$ + $\sigma_u u$
Class 2: Without Inefficiency
logC = f(output, input prices, environment) + $\sigma_v v$, with $\sigma_u = 0$
Implement with a single zero restriction in a constrained (same cost function) two class model
Parameterization: $\lambda = \sigma_u / \sigma_v = 0$ in class 2.
Part 23: Parameter Heterogeneity [85/115]
LogL= 464 with a common frontier model, 527 with two classes
Part 23: Parameter Heterogeneity [86/115]
Part 23: Parameter Heterogeneity [87/115]
Random Parameters (Mixed) Models
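A General Model Structure
$$ f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = g(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta) $$
$\boldsymbol\beta_i$ = a set of random parameters = $\boldsymbol\beta + \mathbf{u}_i$
$$ f(\boldsymbol\beta_i \mid \mathbf{z}_i) = h(\boldsymbol\beta_i, \mathbf{z}_i, \boldsymbol\Omega) $$
$\boldsymbol\theta$ = a set of nonrandom parameters in the density of $y_{it}$
$\boldsymbol\Omega$ = a set of parameters in the distribution of $\boldsymbol\beta_i$
Typical application "repeated measures" = panel
The "mixed" model
$$ f(y_{it} \mid \mathbf{x}_{it}, \mathbf{z}_i, \boldsymbol\theta, \boldsymbol\Omega) = \int_{\boldsymbol\beta_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)\, h(\boldsymbol\beta_i, \mathbf{z}_i, \boldsymbol\Omega)\, d\boldsymbol\beta_i $$
forms the basis of a likelihood function for the observed data.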
Part 23: Parameter Heterogeneity [88/115]
Mixed Model Estimation
- WinBUGS: MCMC. User specifies the model – constructs the Gibbs Sampler/Metropolis Hastings
- SAS: Proc Mixed. Classical. Uses primarily a kind of GLS/GMM (method of moments algorithm for loglinear models)
- Stata: Classical. Mixing done by quadrature. (Very slow for 2 or more dimensions.) Several loglinear models - GLLAMM
- LIMDEP/NLOGIT: Classical. Mixing done by Monte Carlo integration – maximum simulated likelihood. Numerous linear, nonlinear, loglinear models
- Ken Train’s Gauss Code: Monte Carlo integration. Used by many researchers. Mixed Logit (mixed multinomial logit) model only (but free!)
Programs differ on the models fitted, the algorithms, the paradigm, and the extensions provided to the simplest RPM, $\boldsymbol\beta_i = \boldsymbol\beta + \mathbf{w}_i$.
Part 23: Parameter Heterogeneity [89/115]
Modeling Parameter Heterogeneity
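Conditional Model, linear or nonlinear
density: $f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)$
Individual heterogeneity in the means of the parameters
$$ \boldsymbol\beta_i = \boldsymbol\beta + \boldsymbol\Delta\mathbf{z}_i + \mathbf{u}_i, \quad E[\mathbf{u}_i \mid \mathbf{X}_i, \mathbf{z}_i] = \mathbf{0} $$
Heterogeneity in the variances of the parameters
$$ \mathrm{Var}[u_{i,k} \mid \mathbf{z}_i] = \phi_{ik} = \sigma_k \exp(\mathbf{z}_i'\boldsymbol\delta_k) $$
$$ \mathrm{Var}[\mathbf{u}_i \mid \mathbf{z}_i] = \boldsymbol\Phi_i = \mathrm{diag}(\phi_{ik}) $$
(Different variables in $\mathbf{z}_i$ may appear in means and variances.)
Free correlation: $\mathrm{Var}[\mathbf{u}_i \mid \mathbf{z}_i] = \boldsymbol\Sigma_i = \boldsymbol\Gamma\boldsymbol\Phi_i\boldsymbol\Gamma'$, $\boldsymbol\Gamma$ = a lower triangular matrix with 1s on the diagonal.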
Part 23: Parameter Heterogeneity [90/115]
A Mixed Probit Model
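Random parameters probit model
$$ f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i] $$
$$ \boldsymbol\beta_i = \boldsymbol\beta + \mathbf{u}_i, \quad \mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol\Sigma], \quad \boldsymbol\Sigma = \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma' $$
$\boldsymbol\Lambda$ = diagonal matrix of standard deviations
$\boldsymbol\Gamma$ = lower triangular matrix, or I if uncorrelated
$$ \log L(\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]\; N[\boldsymbol\beta, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']\, d\boldsymbol\beta_i $$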
Part 23: Parameter Heterogeneity [91/115]
Maximum Simulated Likelihood
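$$ \log L(\boldsymbol\theta, \boldsymbol\Omega) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i, \boldsymbol\theta)\, h(\boldsymbol\beta_i \mid \mathbf{z}_i, \boldsymbol\Omega)\, d\boldsymbol\beta_i $$
$$ \boldsymbol\Omega = \boldsymbol\beta, \boldsymbol\Delta, \boldsymbol\delta_1, \ldots, \boldsymbol\delta_K, \sigma_1, \ldots, \sigma_K, \boldsymbol\Gamma $$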
Part 23: Parameter Heterogeneity [92/115]
Simulated Log Likelihood for a Mixed Probit Model
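Random parameters probit model
$$ f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol\beta_i) = \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i] $$
$$ \boldsymbol\beta_i = \boldsymbol\beta + \mathbf{u}_i, \quad \mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma'] $$
$$ \log L(\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]\; N[\boldsymbol\beta, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']\, d\boldsymbol\beta_i $$
$$ \log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'(\boldsymbol\beta + \boldsymbol\Gamma\boldsymbol\Lambda\mathbf{v}_{ir})] $$
We now maximize this function with respect to ($\boldsymbol\beta, \boldsymbol\Gamma, \boldsymbol\Lambda$).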
Part 23: Parameter Heterogeneity [93/115]
Application – Doctor Visits
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
Part 23: Parameter Heterogeneity [94/115]
Estimates of a Mixed Probit Model
+---------------------------------------------+
| Random Coefficients Probit Model            |
| Dependent variable               DOCTOR     |
| Log likelihood function       -16483.96     |
| Restricted log likelihood     -17700.96     |
| Unbalanced panel has 7293 individuals.      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Means for random parameters
 Constant    -.09594899       .04049528     -2.369    .0178
 AGE          .02102471       .00053836     39.053    .0000   43.5256898
 HHNINC      -.03119127       .03383027      -.922    .3565    .35208362
 EDUC        -.02996487       .00265133    -11.302    .0000   11.3206310
 MARRIED     -.03664476       .01399541     -2.618    .0088    .75861817
+---------+--------------+----------------+--------+---------+----------+
 Constant     .02642358       .05397131       .490    .6244
 AGE          .01538640       .00071823     21.423    .0000   43.5256898
 HHNINC      -.09775927       .04626475     -2.113    .0346    .35208362
 EDUC        -.02811308       .00350079     -8.031    .0000   11.3206310
 MARRIED     -.00930667       .01887548      -.493    .6220    .75861817
Part 23: Parameter Heterogeneity [95/115]
Random Parameters Probit
 Diagonal elements of Cholesky matrix
 Constant     .55259608       .05381892     10.268    .0000
 AGE          .279052D-04     .00041019       .068    .9458
 HHNINC       .03545309       .04094725       .866    .3866
 EDUC         .00994387       .00093271     10.661    .0000
 MARRIED      .01013553       .00643526      1.575    .1153
 Below diagonal elements of Cholesky matrix
 lAGE_ONE     .00668600       .00071466      9.355    .0000
 lHHN_ONE    -.23713634       .04341767     -5.462    .0000
 lHHN_AGE     .09364751       .03357731      2.789    .0053
 lEDU_ONE     .01461359       .00355382      4.112    .0000
 lEDU_AGE    -.00189900       .00167248     -1.135    .2562
 lEDU_HHN     .00991594       .00154877      6.402    .0000
 lMAR_ONE    -.04871097       .01854192     -2.627    .0086
 lMAR_AGE    -.02059540       .01362752     -1.511    .1307
 lMAR_HHN    -.12276339       .01546791     -7.937    .0000
 lMAR_EDU     .09557751       .01233448      7.749    .0000
Part 23: Parameter Heterogeneity [96/115]
Application Shoe Brand Choice
Simulated Data: Stated Choice, 400 respondents, 8 choice situations
3 choice/attributes + NONE
Fashion = High=1 / Low=0
Quality = High=1 / Low=0
Price = 25/50/75,100,125 coded 1,2,3,4,5 then divided by 25.
Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical
Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
Thanks to www.statisticalinnovations.com (Latent Gold and Jordan Louviere)
Part 23: Parameter Heterogeneity [97/115]
A Discrete (4 Brand) Choice Model with Heterogeneous and Heteroscedastic Random Parameters
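\[
\begin{aligned}
U_{i,1,t} &= \beta_{F,i}\,\text{Fashion}_{i,1,t} + \beta_{Q}\,\text{Quality}_{i,1,t} + \beta_{P,i}\,\text{Price}_{i,1,t} + \varepsilon_{i,1,t} \\
U_{i,2,t} &= \beta_{F,i}\,\text{Fashion}_{i,2,t} + \beta_{Q}\,\text{Quality}_{i,2,t} + \beta_{P,i}\,\text{Price}_{i,2,t} + \varepsilon_{i,2,t} \\
U_{i,3,t} &= \beta_{F,i}\,\text{Fashion}_{i,3,t} + \beta_{Q}\,\text{Quality}_{i,3,t} + \beta_{P,i}\,\text{Price}_{i,3,t} + \varepsilon_{i,3,t} \\
U_{i,\text{NONE},t} &= \alpha_{\text{NONE}} + \varepsilon_{i,\text{NONE},t}
\end{aligned}
\]
\[
\begin{aligned}
\beta_{F,i} &= \beta_F + \delta_F\,\text{Sex}_i + [\sigma_F \exp(\gamma_{F1}\,\text{AgeL25}_i + \gamma_{F2}\,\text{Age2539}_i)]\, w_{F,i}; \quad w_{F,i} \sim N[0,1] \\
\beta_{P,i} &= \beta_P + \delta_P\,\text{Sex}_i + [\sigma_P \exp(\gamma_{P1}\,\text{AgeL25}_i + \gamma_{P2}\,\text{Age2539}_i)]\, w_{P,i}; \quad w_{P,i} \sim N[0,1]
\end{aligned}
\]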
Part 23: Parameter Heterogeneity [98/115]
Multinomial Logit Model Estimates
Part 23: Parameter Heterogeneity [99/115]
Mixed Logit Estimates
+---------------------------------------------+
| Random Parameters Logit Model               |
| Log likelihood function       -3911.945     |
| At start values -4158.5029  .05929  .05811  |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Random parameters in utility functions
 BF           1.46523951       .12626655     11.604   .0000
 BQ           1.14369857       .16954024      6.746   .0000
 Nonrandom parameters in utility functions
 BP         -12.1098155        .91584476    -13.223   .0000
 BN            .17706909       .07784730      2.275   .0229
 Heterogeneity in mean, Parameter:Variable
 BF:MAL        .28052695       .14266576      1.966   .0493
 BQ:MAL       -.42310284       .20387789     -2.075   .0380
 Derived standard deviations of parameter distributions
 NsBF         1.16430284       .13731611      8.479   .0000
 NsBQ         1.81872569       .18108194     10.044   .0000
 Heteroscedasticity in random parameters
 sBF|AG       -.32466344       .16986949     -1.911   .0560
 sBF0|AG      -.51032609       .23975740     -2.129   .0333
 sBQ|AG       -.37953350       .13798031     -2.751   .0059
 sBQ0|AG      -.41636803       .17143046     -2.429   .0151
Part 23: Parameter Heterogeneity [100/115]
Estimated Elasticities
+--------------------------------------------------------------+
| Elasticity             Averaged over observations.           |
| Effects on probabilities of all choices in the model:        |
|                                        RPL     MNL     LCM   |
| Attribute is PRICE in choice B1                              |
|  * Choice=B1      .000    .000        -.818   -.889   -.801  |
|    Choice=B2      .000    .000         .240    .291    .273  |
|    Choice=B3      .000    .000         .244    .291    .248  |
|    Choice=NONE    .000    .000         .241    .291    .219  |
| Attribute is PRICE in choice B2                              |
|    Choice=B1      .000    .000         .291    .313    .311  |
|  * Choice=B2      .000    .000       -1.100  -1.222  -1.248  |
|    Choice=B3      .000    .000         .270    .313    .284  |
|    Choice=NONE    .000    .000         .276    .313    .268  |
| Attribute is PRICE in choice B3                              |
|    Choice=B1      .000    .000         .287    .366    .314  |
|    Choice=B2      .000    .000         .326    .366    .344  |
|  * Choice=B3      .000    .000        -.647   -.755   -.674  |
|    Choice=NONE    .000    .000         .311    .366    .302  |
+--------------------------------------------------------------+
Part 23: Parameter Heterogeneity [101/115]
Conditional Estimators
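\[
\hat{\Omega} = \arg\max_{\Omega} \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} P_{ijt}(\beta_{ir} \mid \Omega, \text{data}_{it})
\]
\[
\hat{L}_i = \prod_{t=1}^{T_i} P_{ijt}(\hat{\beta}_i \mid \hat{\Omega}, \text{data}_{it})
\]
Counterpart to Bayesian posterior mean and variance
\[
\hat{E}[\beta_{i,k} \mid \text{data}_i]
= \frac{(1/R)\sum_{r=1}^{R} \hat{\beta}_{i,k,r} \prod_{t=1}^{T_i} P_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}
       {(1/R)\sum_{r=1}^{R} \prod_{t=1}^{T_i} P_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}
= \frac{1}{R}\sum_{r=1}^{R} \hat{w}_{i,r}\, \hat{\beta}_{i,k,r}
\]
\[
\hat{E}[\beta_{i,k}^2 \mid \text{data}_i]
= \frac{(1/R)\sum_{r=1}^{R} \hat{\beta}_{i,k,r}^2 \prod_{t=1}^{T_i} P_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}
       {(1/R)\sum_{r=1}^{R} \prod_{t=1}^{T_i} P_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \text{data}_{it})}
= \frac{1}{R}\sum_{r=1}^{R} \hat{w}_{i,r}\, \hat{\beta}_{i,k,r}^2
\]
where the weight $\hat{w}_{i,r}$ is the likelihood contribution of draw $r$ divided by the average contribution over the $R$ draws.
\[
\widehat{\text{Var}}[\beta_{i,k} \mid \text{data}_i] = \hat{E}[\beta_{i,k}^2 \mid \text{data}_i] - \left(\hat{E}[\beta_{i,k} \mid \text{data}_i]\right)^2
\]
\[
\hat{E}[\beta_{i,k} \mid \text{data}_i] \pm 2\sqrt{\widehat{\text{Var}}[\beta_{i,k} \mid \text{data}_i]}
\]
will encompass 95% of any reasonable distribution.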
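The weighted averages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conditional_moments`, the single-coefficient binary logit used for the probabilities, the N(1, 1) population for the draws, and the eight hypothetical choices are all invented for the example and are not taken from the shoe brand application.

```python
import math
import random

def conditional_moments(draws, likelihoods):
    """Simulated conditional mean and variance of one coefficient.

    draws[r]       : r-th simulated value of beta_{i,k}
    likelihoods[r] : product over t of P_ijt evaluated at the r-th draw
    """
    total = sum(likelihoods)
    # Normalized weights; each equals w_{i,r} / R in the slide's notation.
    weights = [lik / total for lik in likelihoods]
    mean = sum(w * b for w, b in zip(weights, draws))
    second = sum(w * b * b for w, b in zip(weights, draws))
    var = max(second - mean ** 2, 0.0)  # guard against rounding below zero
    return mean, var

def logit_prob(y, x, beta):
    """Hypothetical choice probability: binary logit with one coefficient."""
    p = 1.0 / (1.0 + math.exp(-beta * x))
    return p if y == 1 else 1.0 - p

rng = random.Random(12345)
draws = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # assumed population N(1, 1)
y = [1, 1, 0, 1, 1, 1, 0, 1]                         # hypothetical choices
x = [1.0, 0.5, -1.0, 2.0, 1.5, 0.3, -0.5, 1.0]       # hypothetical attribute
likes = [math.prod(logit_prob(yt, xt, b) for yt, xt in zip(y, x)) for b in draws]

mean, var = conditional_moments(draws, likes)
lower = mean - 2.0 * math.sqrt(var)
upper = mean + 2.0 * math.sqrt(var)
```

Draws that explain the person's observed choices well receive more weight, so the conditional mean moves away from the population mean toward values consistent with that person's data.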
Part 23: Parameter Heterogeneity [102/115]
Individual E[β_i|data_i] Estimates
[Figure: Estimated Upper and Lower Bounds for BPrice(i). Horizontal axis: PERSON (1 to 400); vertical axis: B_Price.]
Part 23: Parameter Heterogeneity [103/115]
Disaggregated Parameters
The description of classical methods as only producing aggregate results is obviously untrue.
As regards “targeting specific groups…” both of these sets of methods produce estimates for the specific data in hand. Unless we want to trot out the specific individuals in this sample to do the analysis and marketing, any extension is problematic. This should be understood in both paradigms.
NEITHER METHOD PRODUCES ESTIMATES OF INDIVIDUAL PARAMETERS, CLAIMS TO THE CONTRARY NOTWITHSTANDING. BOTH PRODUCE ESTIMATES OF THE MEAN OF THE CONDITIONAL (POSTERIOR) DISTRIBUTION OF POSSIBLE PARAMETER DRAWS CONDITIONED ON THE PRECISE SPECIFIC DATA FOR INDIVIDUAL I.
Part 23: Parameter Heterogeneity [104/115]
Appendix: EM Algorithm
Part 23: Parameter Heterogeneity [105/115]
The EM Algorithm
Latent Class is a 'missing data' model.
$d_{i,j} = 1$ if individual $i$ is a member of class $j$.
If $d_{i,j}$ were observed, the complete data log likelihood would be
\[
\log L_c = \sum_{i=1}^{N} \log \left[ \sum_{j=1}^{J} d_{i,j} \prod_{t=1}^{T_i} f(y_{i,t} \mid \text{data}_{i,t}, \text{class} = j) \right]
\]
(Only one of the J terms would be nonzero.)
Expectation - Maximization algorithm has two steps
(1) Expectation Step: Form the 'Expected log likelihood' given the data and a prior guess of the parameters.
(2) Maximization Step: Maximize the expected log likelihood to obtain a new guess for the model parameters.
(E.g., http://crow.ee.washington.edu/people/bulyko/papers/em.pdf)
Part 23: Parameter Heterogeneity [106/115]
Implementing EM
Given initial guesses $\pi_{iq}^0 = \pi_{i1}^0, \pi_{i2}^0, \ldots, \pi_{iQ}^0$ and $\beta_q^0 = \beta_1^0, \beta_2^0, \ldots, \beta_Q^0$.
E.g., use 1/Q for each $\pi_{iq}$ and the MLE of $\beta$ from a one class model. (Have to perturb each one slightly, as if all $\pi_{iq}$ are equal and all $\beta_q$ are the same, the model will satisfy the FOC.)
(1) Compute $\hat{w}(q|i)$ = posterior class probabilities, using $\hat{\beta}^0, \hat{\delta}^0$.
    Reestimate each $\beta_q$ using a weighted log likelihood:
\[
\text{Maximize wrt } \beta_q: \quad \sum_{i=1}^{N} \hat{w}(q|i) \sum_{t=1}^{T_i} \log f(y_{it} \mid x_{it}, \beta_q)
\]
(2) Reestimate $\pi_{iq}$ by reestimating $\delta$.
    If no $z_i$, new $\hat{\pi}_q = (1/N)\sum_{i=1}^{N} \hat{w}(q|i)$ using old $\hat{\pi}$ and new $\hat{\beta}$.
    If $z_i$:
\[
\text{Maximize wrt } \delta: \quad \sum_{i=1}^{N} \sum_{q=1}^{Q} \hat{w}(q|i) \log \frac{\exp(z_i'\delta_q)}{\sum_{q=1}^{Q} \exp(z_i'\delta_q)}
\]
Now, return to step 1.
Iterate until convergence.
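The two steps can be sketched for a deliberately simple case. The code below is a minimal illustration, not the estimator behind the results in these slides: it assumes a latent class model for binary panel outcomes with no covariates, so each class q has a single parameter p_q = P(y = 1) in place of β_q, and the class shares do not depend on z_i. The function name `em_latent_class`, the starting-value rule, and the simulated data are assumptions made for the example.

```python
import math
import random

def em_latent_class(y, Q=2, iters=100):
    """EM for a latent class model of binary panels (no covariates).

    y[i] is the list of 0/1 outcomes for person i.
    Returns class shares, class probabilities and the log likelihood path.
    """
    n_obs = sum(len(yi) for yi in y)
    pbar = sum(sum(yi) for yi in y) / n_obs       # one-class MLE
    pi = [1.0 / Q] * Q                            # start with equal shares
    # Perturb the one-class MLE so the classes are not identical at the start.
    p = [min(max(pbar + 0.1 * (q - (Q - 1) / 2.0), 0.01), 0.99) for q in range(Q)]
    history = []
    for _ in range(iters):
        # E step: posterior class probabilities w(q|i) at the current guess.
        w, loglik = [], 0.0
        for yi in y:
            s, n = sum(yi), len(yi)
            joint = [pi[q] * p[q] ** s * (1.0 - p[q]) ** (n - s) for q in range(Q)]
            total = sum(joint)
            loglik += math.log(total)
            w.append([j / total for j in joint])
        history.append(loglik)
        # M step: weighted MLE of each p_q, then new shares (1/N) sum_i w(q|i).
        for q in range(Q):
            num = sum(wi[q] * sum(yi) for wi, yi in zip(w, y))
            den = sum(wi[q] * len(yi) for wi, yi in zip(w, y))
            p[q] = min(max(num / den, 1e-9), 1.0 - 1e-9)
            pi[q] = sum(wi[q] for wi in w) / len(y)
    return pi, p, history

# Simulated two-class panel: 300 people, 8 periods, equal class shares.
rng = random.Random(1)
true_p = (0.2, 0.8)
panel = []
for _ in range(300):
    q = 0 if rng.random() < 0.5 else 1
    panel.append([1 if rng.random() < true_p[q] else 0 for _ in range(8)])

shares, probs, loglik_path = em_latent_class(panel)
```

Starting every class at the same value would leave the algorithm stuck at the one-class solution, which is why the starting values are spread around the one-class MLE. The recorded log likelihood should not decrease from one iteration to the next.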
Part 23: Parameter Heterogeneity [107/115]
Appendix: Monte Carlo Integration
Part 23: Parameter Heterogeneity [108/115]
Monte Carlo Integration
(1) Integral is of the form
\[
K = \int_{\text{range of } v} g(v \mid \text{data}, \beta)\, f(v \mid \Omega)\, dv
\]
where $f(v)$ is the density of random variable $v$, possibly conditioned on a set of parameters $\Omega$, and $g(v \mid \text{data}, \beta)$ is a function of data and parameters.
(2) By construction, $K(\Omega) = E[g(v \mid \text{data}, \beta)]$
(3) Strategy:
a. Sample R values from the population of v using a random number generator.
b. Compute the average
\[
\bar{K} = \frac{1}{R} \sum_{r=1}^{R} g(v_r \mid \text{data}, \beta)
\]
By the law of large numbers, plim $\bar{K}$ = K.
Part 23: Parameter Heterogeneity [109/115]
Monte Carlo Integration
\[
\frac{1}{R} \sum_{r=1}^{R} f(u_{ir}) \xrightarrow{P} \int_{u_i} f(u_i)\, g(u_i)\, du_i = E_{u_i}[f(u_i)]
\]
(Certain smoothness conditions must be met.)
Drawing $u_{ir}$ by 'random sampling'
\[
u_{ir} = t(v_{ir}), \quad v_{ir} \sim U[0,1]
\]
E.g., $u_{ir} = \sigma \Phi^{-1}(v_{ir}) + \mu$ for $N[\mu, \sigma^2]$
Requires many draws, typically hundreds or thousands
Part 23: Parameter Heterogeneity [110/115]
Example: Monte Carlo Integral
\[
\int_{-\infty}^{\infty} \Phi(x_1 + .9v)\, \Phi(x_2 + .9v)\, \Phi(x_3 + .9v)\, \frac{\exp(-v^2/2)}{\sqrt{2\pi}}\, dv
\]
where $\Phi$ is the standard normal CDF and $x_1$ = .5, $x_2$ = -.2, $x_3$ = .3. (Looks like a RE probit model.)
The weighting function for v is the standard normal.
Strategy: Draw R (say 1000) standard normal random draws, $v_r$. Compute the 1000 functions $\Phi(x_1 + .9v_r)\, \Phi(x_2 + .9v_r)\, \Phi(x_3 + .9v_r)$ and average them.
(Based on 100, 1000, 10000, I get .28746, .28437, .27242)
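The strategy can be sketched in Python using only the standard library. The function names and the seed are assumptions for the illustration; because the draws come from a different generator, the estimates will not reproduce the three figures quoted above digit for digit, though they should settle in the same neighborhood as R grows.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mc_integral(R, x=(0.5, -0.2, 0.3), scale=0.9, seed=1):
    """Average the product of normal CDFs over R standard normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        v = rng.gauss(0.0, 1.0)          # draw from the weighting density
        g = 1.0
        for xj in x:
            g *= std_normal_cdf(xj + scale * v)
        total += g
    return total / R

# Estimates for increasing numbers of draws.
estimates = {R: mc_integral(R) for R in (100, 1000, 10000)}
```

The standard error of the average falls at rate 1/sqrt(R), so each additional digit of accuracy costs roughly 100 times as many draws.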
Part 23: Parameter Heterogeneity [111/115]
Generating a Random Draw
Most common approach is the "inverse probability transform"
Let u = a random draw from the standard uniform (0,1).
Let x = the desired population to draw from
Assume the CDF of x is F(x).
The random draw is then x = F^{-1}(u).
Example: exponential, θ. f(x) = θ exp(-θx), F(x) = 1 - exp(-θx)
Equate u to F(x), x = -(1/θ) log(1-u).
Example: Normal(μ, σ). Inverse function does not exist in closed form. There are good polynomial approximations to produce a draw v from N[0,1] from a U(0,1).
Then x = μ + σv.
This leaves the question of how to draw the U(0,1).
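Both examples can be sketched in Python. The function names and parameter values are illustrative assumptions. For the normal case the sketch relies on `statistics.NormalDist().inv_cdf` from the standard library, which computes the inverse CDF numerically; this stands in for the polynomial approximations mentioned above and is not claimed to be the same routine.

```python
import math
import random
from statistics import NormalDist

def draw_exponential(theta, u):
    """Invert F(x) = 1 - exp(-theta * x) at the uniform draw u."""
    return -math.log(1.0 - u) / theta

def draw_normal(mu, sigma, u):
    """Map a uniform draw to N[mu, sigma^2] through the standard normal inverse CDF."""
    v = NormalDist().inv_cdf(u)   # requires 0 < u < 1
    return mu + sigma * v

rng = random.Random(2024)
theta = 2.0                        # illustrative rate parameter
sample = [draw_exponential(theta, rng.random()) for _ in range(100000)]
sample_mean = sum(sample) / len(sample)   # should be close to 1/theta
```

Feeding the same uniform draw back through F recovers u, which is a quick check that the inversion was coded correctly.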
Part 23: Parameter Heterogeneity [112/115]
Drawing Uniform Random Numbers
Computer generated random numbers are not random; they
are Markov chains that look random.
The Original Random Number Generator for 32 bit computers.
SEED originates at some large odd number
d3 = 2147483647.0
d2 = 2147483655.0
d1=16807.0
SEED=Mod(d1*SEED,d3) ! MOD(a,p) = a - INT(a/p) * p
X=SEED/d2 is a random value between 0 and 1.
Problems:
(1) Short period. Based on 32 bits, so recycles after 2^31 - 1 values
(2) Evidently not very close to random. (Recent tests have
discredited this RNG)
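A short Python sketch of the recursion, using the constants d1, d2 and d3 exactly as listed on the slide. The function name `lehmer_stream` is an assumption; integer arithmetic is used for the modulus so that the result does not depend on floating point rounding.

```python
def lehmer_stream(seed, n):
    """Generate n uniforms from SEED = Mod(d1 * SEED, d3), X = SEED / d2."""
    d1 = 16807
    d3 = 2147483647        # 2**31 - 1
    d2 = 2147483655.0      # divisor from the slide, slightly above d3 so X < 1
    values = []
    for _ in range(n):
        seed = (d1 * seed) % d3
        values.append(seed / d2)
    return values, seed

uniforms, last_seed = lehmer_stream(seed=123457, n=5)
```

The sequence is fully determined by the starting seed, which is what makes a simulation reproducible and also why the stream eventually repeats.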
Part 23: Parameter Heterogeneity [113/115]
L’Ecuyer’s RNG
Define:
norm = 2.328306549295728e-10,
m1 = 4294967087.0, m2 = 4294944443.0,
a12 = 1403580.0, a13n = 810728.0,
a21 = 527612.0, a23n = 1370589.0,
Initialize
s10 = the seed, s11 = 4231773.0, s12 = 1975.0,
s20 = 137228743.0, s21 = 98426597.0, s22 = 142859843.0.
Preliminaries for each draw (Resets at least some of 5 seeds)
p1 = a12*s11 - a13n*s10, k = int(p1/m1), p1 = p1 - k*m1
if p1 < 0, p1 = p1 + m1,
s10 = s11, s11 = s12, s12 = p1;
p2 = a21*s22 - a23n*s20, k = int(p2/m2), p2 = p2 - k*m2
if p2 < 0, p2 = p2 + m2,
s20 = s21, s21 = s22, s22 = p2;
Compute the random number
u = norm*(p1 - p2) if p1 > p2,
u = norm*(p1 - p2 + m1) otherwise.
Passes all known randomness tests. Period = 2^191
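The recursion can be sketched as a small Python class. The class name is an assumption, and the constants are those of L'Ecuyer's MRG32k3a generator as I understand them, including the multiplier a12 = 1403580 and the second modulus m2. Python's integer `%` operator always returns a nonnegative remainder, so it replaces the two-step "subtract k*m, then add m if negative" adjustment.

```python
class MRG32k3aSketch:
    """Combined multiple recursive generator following the steps on the slide."""
    NORM = 2.328306549295728e-10
    M1, M2 = 4294967087, 4294944443
    A12, A13N = 1403580, 810728
    A21, A23N = 527612, 1370589

    def __init__(self, seed):
        # s10 carries the user's seed; the other five start at the slide's values.
        self.s10, self.s11, self.s12 = seed, 4231773, 1975
        self.s20, self.s21, self.s22 = 137228743, 98426597, 142859843

    def draw(self):
        # First component
        p1 = (self.A12 * self.s11 - self.A13N * self.s10) % self.M1
        self.s10, self.s11, self.s12 = self.s11, self.s12, p1
        # Second component
        p2 = (self.A21 * self.s22 - self.A23N * self.s20) % self.M2
        self.s20, self.s21, self.s22 = self.s21, self.s22, p2
        # Combine the two components into a uniform strictly between 0 and 1
        if p1 > p2:
            return self.NORM * (p1 - p2)
        return self.NORM * (p1 - p2 + self.M1)

gen = MRG32k3aSketch(seed=12345)
u_draws = [gen.draw() for _ in range(10000)]
```

The seed supplied for s10 should be a nonnegative integer smaller than m1.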
Pierre L'Ecuyer. Canada Research Chair in Stochastic Simulation and Optimization. Département d'informatique et de recherche opérationnelle, University of Montreal.
Part 23: Parameter Heterogeneity [114/115]
Quasi-Monte Carlo Integration Based on Halton Sequences
Coverage of the unit interval is the objective, not randomness of the set of draws.
Halton sequences --- Markov chain
p = a prime number,
r = the sequence of integers, decomposed as
\[
r = \sum_{i=0}^{I} b_i\, p^{i}
\]
\[
H(r \mid p) = \sum_{i=0}^{I} b_i\, p^{-i-1}, \quad r = r_1, \ldots \text{ (e.g., 10, 11, 12, ...)}
\]
For example, using base p=5, the integer r=37 has b_0 = 2, b_1 = 2, and b_2 = 1; (37 = 1x5^2 + 2x5^1 + 2x5^0). Then H(37|5) = 2x5^{-1} + 2x5^{-2} + 1x5^{-3} = 0.488.
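The digit decomposition translates directly into a short Python function. The name `halton` and the choice to start the sequence at r = 10 are assumptions for the illustration; for the worked example, 2/5 + 2/25 + 1/125 evaluates to 0.488.

```python
def halton(r, p):
    """Return H(r|p): reflect the base-p digits of r about the decimal point."""
    h = 0.0
    scale = 1.0 / p                # weight p^(-1) for digit b_0
    while r > 0:
        r, digit = divmod(r, p)    # peel off the lowest base-p digit
        h += digit * scale
        scale /= p                 # next digit gets weight p^(-i-1)
    return h

h_37_5 = halton(37, 5)
# A short run of the base-2 sequence, starting at r = 10 as on the slide.
sequence = [halton(r, 2) for r in range(10, 20)]
```

Successive integers fill in the gaps left by earlier ones, which is the source of the even coverage of the unit interval.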
Part 23: Parameter Heterogeneity [115/115]
Halton Sequences vs. Random Draws
Requires far fewer draws – for one dimension, about 1/10. Accelerates estimation by a factor of 5 to 10.