View
215
Download
0
Category
Tags:
Preview:
Citation preview
1/53: Topic 3.1 – Models for Ordered Choices
Microeconometric Modeling
William GreeneStern School of BusinessNew York UniversityNew York NY USA
3.1 Models for Ordered Choices
2/53: Topic 3.1 – Models for Ordered Choices
Concepts
• Ordered Choice• Subjective Well Being• Health Satisfaction• Random Utility• Fit Measures• Normalization• Threshold Values (Cutpoints0• Differential Item Functioning• Anchoring Vignette• Panel Data• Incidental Parameters Problem• Attrition Bias• Inverse Probability Weighting• Transition Matrix
Models
• Ordered Probit and Logit• Generalized Ordered Probit• Hierarchical Ordered Probit• Vignettes• Fixed and Random Effects OPM• Dynamic Ordered Probit• Sample Selection OPM
3/53: Topic 3.1 – Models for Ordered Choices
Ordered Discrete Outcomes
E.g.: Taste test, credit rating, course grade, preference scale Underlying random preferences:
Existence of an underlying continuous preference scale Mapping to observed choices
Strength of preferences is reflected in the discrete outcome Censoring and discrete measurement The nature of ordered data
4/53: Topic 3.1 – Models for Ordered Choices
Ordered Choices at IMDb
5/53: Topic 3.1 – Models for Ordered Choices
6/53: Topic 3.1 – Models for Ordered Choices
Health Satisfaction (HSAT)
Self administered survey: Health Care Satisfaction (0 – 10)
Continuous Preference Scale
7/53: Topic 3.1 – Models for Ordered Choices
Modeling Ordered Choices
Random Utility (allowing a panel data setting)
Uit = + ’xit + it
= ait + it
Observe outcome j if utility is in region j Probability of outcome = probability of cell
Pr[Yit=j] = F(j – ait) - F(j-1 – ait)
8/53: Topic 3.1 – Models for Ordered Choices
Ordered Probability Model
1
1 2
2 3
J -1 J
j-1
y* , we assume contains a constant term
y 0 if y* 0
y = 1 if 0 < y*
y = 2 if < y*
y = 3 if < y*
...
y = J if < y*
In general: y = j if < y*
βx x
j
-1 o J j-1 j,
, j = 0,1,...,J
, 0, , j = 1,...,J
9/53: Topic 3.1 – Models for Ordered Choices
Combined Outcomes for Health Satisfaction
10/53: Topic 3.1 – Models for Ordered Choices
Ordered Probabilities
j-1 j
j-1 j
j j 1
j j 1
j j 1
Prob[y=j]=Prob[ y* ]
= Prob[ ]
= Prob[ ] Prob[ ]
= Prob[ ] Prob[ ]
= F[ ] F[ ]
where F[ ] i
βx
βx βx
βx βx
βx βx
s the CDF of .
11/53: Topic 3.1 – Models for Ordered Choices
12/53: Topic 3.1 – Models for Ordered Choices
Coefficients
j 1 j kk
What are the coefficients in the ordered probit model?
There is no conditional mean function.
Prob[y=j| ] [f( ) f( )]
x
Magnitude depends on the scale factor and the coeff
xβ'x β'x
icient.
Sign depends on the densities at the two points!
What does it mean that a coefficient is "significant?"
13/53: Topic 3.1 – Models for Ordered Choices
Partial Effects in the Ordered Choice Model
Assume the βk is positive.
Assume that xk increases.
β’x increases. μj- β’x shifts to the left for all 5 cells.
Prob[y=0] decreases
Prob[y=1] decreases – the mass shifted out is larger than the mass shifted in.
Prob[y=3] increases – same reason in reverse.
Prob[y=4] must increase.
When βk > 0, increase in xk decreases Prob[y=0] and increases Prob[y=J]. Intermediate cells are ambiguous, but there is only one sign change in the marginal effects from 0 to 1 to … to J
14/53: Topic 3.1 – Models for Ordered Choices
Partial Effects of 8 Years of Education
15/53: Topic 3.1 – Models for Ordered Choices
An Ordered Probability Model for Health Satisfaction
+---------------------------------------------+| Ordered Probability Model || Dependent variable HSAT || Number of observations 27326 || Underlying probabilities based on Normal || Cell frequencies for outcomes || Y Count Freq Y Count Freq Y Count Freq || 0 447 .016 1 255 .009 2 642 .023 || 3 1173 .042 4 1390 .050 5 4233 .154 || 6 2530 .092 7 4231 .154 8 6172 .225 || 9 3061 .112 10 3192 .116 |+---------------------------------------------++---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+ Index function for probability Constant 2.61335825 .04658496 56.099 .0000 FEMALE -.05840486 .01259442 -4.637 .0000 .47877479 EDUC .03390552 .00284332 11.925 .0000 11.3206310 AGE -.01997327 .00059487 -33.576 .0000 43.5256898 HHNINC .25914964 .03631951 7.135 .0000 .35208362 HHKIDS .06314906 .01350176 4.677 .0000 .40273000 Threshold parameters for index Mu(1) .19352076 .01002714 19.300 .0000 Mu(2) .49955053 .01087525 45.935 .0000 Mu(3) .83593441 .00990420 84.402 .0000 Mu(4) 1.10524187 .00908506 121.655 .0000 Mu(5) 1.66256620 .00801113 207.532 .0000 Mu(6) 1.92729096 .00774122 248.965 .0000 Mu(7) 2.33879408 .00777041 300.987 .0000 Mu(8) 2.99432165 .00851090 351.822 .0000 Mu(9) 3.45366015 .01017554 339.408 .0000
16/53: Topic 3.1 – Models for Ordered Choices
Ordered Probability Partial Effects+----------------------------------------------------+| Marginal effects for ordered probability model || M.E.s for dummy variables are Pr[y|x=1]-Pr[y|x=0] || Names for dummy variables are marked by *. |+----------------------------------------------------++---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+ These are the effects on Prob[Y=00] at means. *FEMALE .00200414 .00043473 4.610 .0000 .47877479 EDUC -.00115962 .986135D-04 -11.759 .0000 11.3206310 AGE .00068311 .224205D-04 30.468 .0000 43.5256898 HHNINC -.00886328 .00124869 -7.098 .0000 .35208362 *HHKIDS -.00213193 .00045119 -4.725 .0000 .40273000 These are the effects on Prob[Y=01] at means. *FEMALE .00101533 .00021973 4.621 .0000 .47877479 EDUC -.00058810 .496973D-04 -11.834 .0000 11.3206310 AGE .00034644 .108937D-04 31.802 .0000 43.5256898 HHNINC -.00449505 .00063180 -7.115 .0000 .35208362 *HHKIDS -.00108460 .00022994 -4.717 .0000 .40273000 ... repeated for all 11 outcomes These are the effects on Prob[Y=10] at means. *FEMALE -.01082419 .00233746 -4.631 .0000 .47877479 EDUC .00629289 .00053706 11.717 .0000 11.3206310 AGE -.00370705 .00012547 -29.545 .0000 43.5256898 HHNINC .04809836 .00678434 7.090 .0000 .35208362 *HHKIDS .01181070 .00255177 4.628 .0000 .40273000
17/53: Topic 3.1 – Models for Ordered Choices
Ordered Probit Marginal Effects
18/53: Topic 3.1 – Models for Ordered Choices
Analysis of Model Implications
Partial Effects Fit Measures Predicted Probabilities
Averaged: They match sample proportions. By observation Segments of the sample Related to particular variables
19/53: Topic 3.1 – Models for Ordered Choices
Predictions from the Model Related to Age
20/53: Topic 3.1 – Models for Ordered Choices
Fit Measures There is no single “dependent variable” to
explain. There is no sum of squares or other
measure of “variation” to explain. Predictions of the model relate to a set of
J+1 probabilities, not a single variable. How to explain fit?
Based on the underlying regression Based on the likelihood function Based on prediction of the outcome variable
21/53: Topic 3.1 – Models for Ordered Choices
Log Likelihood Based Fit Measures
22/53: Topic 3.1 – Models for Ordered Choices
23/53: Topic 3.1 – Models for Ordered Choices
A Somewhat Better Fit
24/53: Topic 3.1 – Models for Ordered Choices
Different Normalizations
NLOGIT Y = 0,1,…,J, U* = α + β’x + ε One overall constant term, α J-1 “cutpoints;” μ-1 = -∞, μ0 = 0, μ1,… μJ-1, μJ = + ∞
Stata Y = 1,…,J+1, U* = β’x + ε No overall constant, α=0 J “cutpoints;” μ0 = -∞, μ1,… μJ, μJ+1 = + ∞
25/53: Topic 3.1 – Models for Ordered Choices
α̂
ˆjμ
26/53: Topic 3.1 – Models for Ordered Choices
ˆ α
ˆˆ jμ α
27/53: Topic 3.1 – Models for Ordered Choices
Generalizing the Ordered Probit with Heterogeneous Thresholds
i
-1 0 j j-1 J
ij j
Index =
Threshold parameters
Standard model : μ = - , μ = 0, μ >μ > 0, μ = +
Preference scale and thresholds are homogeneous
A generalized model (Pudney and Shields, JAE, 2000)
μ = α +
β x
j i
ik i
ij i j ik ik
Note the identification problem. If z is also in x (same variable)
then μ - = α + γz -βz +... No longer clear if the variable
is in or (or both)
γ z
β x
x z
28/53: Topic 3.1 – Models for Ordered Choices
Hierarchical Ordered Probit
i
-1 0 j j-1 J
Index =
Threshold parameters
Standard model : μ = - , μ = 0, μ >μ > 0, μ = +
Preference scale and thresholds are homogeneous
A generalized model (Harris and Zhao (2000), NLOGIT (2007))
β x
]
],
ij j j i
ij j i j j-1 j
μ = exp[α +
An internally consistent restricted modification
μ = exp[α + α α + exp(θ )
γ z
γ z
29/53: Topic 3.1 – Models for Ordered Choices
Ordered Choice Model
30/53: Topic 3.1 – Models for Ordered Choices
HOPit Model
31/53: Topic 3.1 – Models for Ordered Choices
Differential Item Functioning
32/53: Topic 3.1 – Models for Ordered Choices
A Vignette Random Effects Model
33/53: Topic 3.1 – Models for Ordered Choices
Vignettes
34/53: Topic 3.1 – Models for Ordered Choices
35/53: Topic 3.1 – Models for Ordered Choices
36/53: Topic 3.1 – Models for Ordered Choices
37/53: Topic 3.1 – Models for Ordered Choices
Panel Data Fixed Effects
The usual incidental parameters problem Practically feasible but methodologically
ambiguous Partitioning Prob(yit > j|xit) produces estimable
binomial logit models. (Find a way to combine multiple estimates of the same β.
Random Effects Standard application Extension to random parameters – see above
38/53: Topic 3.1 – Models for Ordered Choices
Incidental Parameters Problem
Table 9.1 Monte Carlo Analysis of the Bias of the MLE in Fixed Effects Discrete Choice Models (Means of empirical sampling distributions, N = 1,000 individuals, R = 200 replications)
39/53: Topic 3.1 – Models for Ordered Choices
A Study of Health Status in the Presence of Attrition
40/53: Topic 3.1 – Models for Ordered Choices
Model for Self Assessed Health
British Household Panel Survey (BHPS) Waves 1-8, 1991-1998 Self assessed health on 0,1,2,3,4 scale Sociological and demographic covariates Dynamics – inertia in reporting of top scale
Dynamic ordered probit model Balanced panel – analyze dynamics Unbalanced panel – examine attrition
41/53: Topic 3.1 – Models for Ordered Choices
Dynamic Ordered Probit Model
*, 1
, 1
, 1
Latent Regression - Random Utility
h = + + +
= relevant covariates and control variables
= 0/1 indicators of reported health status in previous period
H ( ) =
it it i t i it
it
i t
i t j
x H
x
H
*1
1[Individual i reported h in previous period], j=0,...,4
Ordered Choice Observation Mechanism
h = j if < h , j = 0,1,2,3,4
Ordered Probit Model - ~ N[0,1]
Random Effects with
it
it j it j
it
j
20 1 ,1 2
Mundlak Correction and Initial Conditions
= + + u , u ~ N[0, ]i i i i i H x
It would not be appropriate to include hi,t-1 itself in the model as this is a label, not a measure
42/53: Topic 3.1 – Models for Ordered Choices
Random Effects Dynamic Ordered Probit Model
Jit it j 1 j i,t 1 i i,t
i,t j-1 it j
i,t i,t
Jit,j it j it j 1 j i,t 1 i
Random Effects Dynamic Ordered Probit Model
h * h ( j)
h j if < h * <
h ( j) 1 if h = j
P P[h j] ( h ( j) )
x
x
i
i
Jj 1 it j 1 j i,t 1 i
Ji 0 j 1 1,j i,1 i i
TN
it,j j ji=1 t 1
( h ( j) )
Parameterize Random Effects
h ( j) u
Simulation or Quadrature Based Estimation
lnL= ln P f( )d
x
x
43/53: Topic 3.1 – Models for Ordered Choices
Data
44/53: Topic 3.1 – Models for Ordered Choices
Variable of Interest
45/53: Topic 3.1 – Models for Ordered Choices
Dynamics
46/53: Topic 3.1 – Models for Ordered Choices
Attrition
47/53: Topic 3.1 – Models for Ordered Choices
Testing for Attrition Bias
Three dummy variables added to full model with unbalanced panel suggest presence of attrition effects.
48/53: Topic 3.1 – Models for Ordered Choices
Probability Weighting Estimators
A Patch for Attrition (1) Fit a participation probit equation for each wave. (2) Compute p(i,t) = predictions of participation for each
individual in each period. Special assumptions needed to make this work
Ignore common effects and fit a weighted pooled log likelihood: Σi Σt [dit/p(i,t)]logLPit.
49/53: Topic 3.1 – Models for Ordered Choices
Attrition Model with IP Weights
Assumes (1) Prob(attrition|all data) = Prob(attrition|selected variables) (ignorability) (2) Attrition is an ‘absorbing state.’ No reentry. Obviously not true for the GSOEP data above.Can deal with point (2) by isolating a subsample of those present at wave 1 and the monotonically shrinking subsample as the waves progress.
50/53: Topic 3.1 – Models for Ordered Choices
Estimated Partial Effects by Model
51/53: Topic 3.1 – Models for Ordered Choices
Partial Effect for a Category
These are 4 dummy variables for state in the previous period. Using first differences, the 0.234 estimated for SAHEX means transition from EXCELLENT in the previous period to GOOD in the previous period, where GOOD is the omitted category. Likewise for the other 3 previous state variables. The margin from ‘POOR’ to ‘GOOD’ was not interesting in the paper. The better margin would have been from EXCELLENT to POOR, which would have (EX,POOR) change from (1,0) to (0,1).
52/53: Topic 3.1 – Models for Ordered Choices
Ordered Choice Model Extensions
53/53: Topic 3.1 – Models for Ordered Choices
Model Extensions
Multivariate Bivariate Multivariate
Inflation and Two Part Zero inflation Sample Selection Endogenous Latent Class
54/53: Topic 3.1 – Models for Ordered Choices
Generalizing the Ordered Probit with Heterogeneous Thresholds
i
-1 0 j j-1 J
ij j
Index =
Threshold parameters
Standard model : μ = - , μ = 0, μ >μ > 0, μ = +
Preference scale and thresholds are homogeneous
A generalized model (Pudney and Shields, JAE, 2000)
μ = α +
β x
j i
ik i
ij i j ik ik
Note the identification problem. If z is also in x (same variable)
then μ - = α + γz -βz +... No longer clear if the variable
is in or (or both)
γ z
β x
x z
55/53: Topic 3.1 – Models for Ordered Choices
Generalized Ordered Probit-1
it it it
it it i
Pudney, S. and M. Shields, "Gender, Race Pay and Promotion in
the British Nursing Profession," J . Applied Ec'trix, 15/4, July 2000.
Ordered Probit Kernel
y * x
y 0 if y * 0; Prob[y
t it
it it 1 it 1 it it
it 1 it 2 it 2 it 1 it
it it J 1 it J 1 it
0] [-x ]
y 1 if 0 < y * ; Prob[y 1] = [ -x ]- [-x ]
y 2 if < y * Prob[y 2] = [ -x ]- [ -x ]
...
y J if y * Prob[y J] 1- [ -x ]
ij i j
it i j it j i j 1 it j 1
Heterogeneous Thresholds and Latent Regression
z
Prob[y j] = [z -x ( )]- [z -x ( )]
Problems:
(1) Coefficients on variables in both Z and X are unid
j j 1
entified
(2) How do you make sure that is positive?
Y=Grade (rank)
Z=Sex, Race
X=Experience, Education, Training, History, Marital Status, Age
56/53: Topic 3.1 – Models for Ordered Choices
Generalized Ordered Probit-2
it it it
it it it it
it it 1 it 1 it it
it 1 it 2 it 2 it 1 it
it it
y * x
y 0 if y * 0; Prob[y 0] [-x ]
y 1 if 0 < y * ; Prob[y 1] = [ -x ]- [-x ]
y 2 if < y * Prob[y 2] = [ -x ]- [ -x ]
...
y J if y *
J 1 it J 1 it
ij j i
Prob[y J] 1- [ -x ]
exp( z )
57/53: Topic 3.1 – Models for Ordered Choices
A G.O.P Model
+---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+ Index function for probability Constant 1.73737318 .13231824 13.130 .0000 AGE -.01458121 .00141601 -10.297 .0000 46.7491906 LOGINC .17724352 .03275857 5.411 .0000 -1.23143358 EDUC .03897560 .00780436 4.994 .0000 10.9669624 MARRIED .09391821 .03761091 2.497 .0125 .75458666 Estimates of t(j) in mu(j)=exp[t(j)+d*z] Theta(1) -1.28275309 .06080268 -21.097 .0000 Theta(2) -.26918032 .03193086 -8.430 .0000 Theta(3) .36377472 .02109406 17.245 .0000 Theta(4) .85818206 .01656304 51.813 .0000 Threshold covariates mu(j)=exp[t(j)+d*z] FEMALE .00987976 .01802816 .548 .5837
How do we interpret the result for FEMALE?
58/53: Topic 3.1 – Models for Ordered Choices
Hierarchical Ordered Probit
i
-1 0 j j-1 J
Index =
Threshold parameters
Standard model : μ = - , μ = 0, μ >μ > 0, μ = +
Preference scale and thresholds are homogeneous
A generalized model (Harris and Zhao (2000), NLOGIT (2007))
β x
]
],
ij j j i
ij j i j j-1 j
μ = exp[α +
An internally consistent restricted modification
μ = exp[α + α α + exp(θ )
γ z
γ z
59/53: Topic 3.1 – Models for Ordered Choices
Ordered Choice Model
60/53: Topic 3.1 – Models for Ordered Choices
HOPit Model
61/53: Topic 3.1 – Models for Ordered Choices
A Sample Selection Model
62/53: Topic 3.1 – Models for Ordered Choices
Zero Inflated Ordered Probit
it it it it it
it it
it it
p * =z u , p = 1[p * 0] (PROBIT Model)
Nonparticipants (p 0) always report y 0.
Participants (p 1) report y 0,1,2,...J (Ordered)
Behavioral Regime (Latent Class) = "Participation"
Consum
it it it
it it it it
it it 1 it 1 it it
it 1 it 2 it 2 it
y * x
y 0 if y * 0; Prob[y 0] [-x ]
y 1 if 0 < y * ; Prob[y 1] = [ -x ]- [-x ]
y 2 if < y * Prob[y 2] = [ -x ]-
er Behavior (Ordered Outcome)
1 it
it it J 1 it J 1 it
it it it it it
it it it
[ -x ]
...
y J if y * Prob[y J] 1- [ -x ]
Prob[y =0] =Prob[p =0]+Prob[p =1]Prob[y =0|p =1]
Prob[y =j>0]= Prob[p =1]Prob[y =j |p
Implied Probabilities
it=1]
63/53: Topic 3.1 – Models for Ordered Choices
Teenage Smoking
Harris, M. and Zhao, Z., "Modelling Tobacco Consumption with a Zero
Inflated Ordered Probit Model," (Monash University - under review,
Journal of Econometrics, 2005)
"How often do you currently smoke cigarettes, pipes or other tobacco
products in the last 12 months?"
0 = Not at all (76%)
1 = Less frequently than weekly (4%)
2 = Daily, less than 20/day (13.8%)
3 = Daily, more than 20/day (6.2%)
Splitting Equation: Young & Female, Log(Age), Male, married, Working,
Unemployed, English speaking, ...
Smoking Equation: Prices of alcohol, marijuana, tobacco, Age, Sex,
Married, English speaking, ...
64/53: Topic 3.1 – Models for Ordered Choices
A Bivariate Latent Class Correlated Generalised Ordered
Probit Model with an Application to Modelling Observed Obesity Levels
William GreeneStern School of Business, New York University
With Mark Harris, Bruce Hollingsworth, Pushkar Maitra Monash University
Stern Economics Working Paper 08-18.
http://w4.stern.nyu.edu/emplibrary/ObesityLCGOPpaperReSTAT.pdf
Forthcoming, Economics Letters, 2014
65/53: Topic 3.1 – Models for Ordered Choices
Obesity
The International Obesity Taskforce (http://www.iotf.org) calls obesity one of the most important medical and public health problems of our time.
Defined as a condition of excess body fat; associated with a large number of debilitating and life-threatening disorders
Health experts argue that given an individual’s height, their weight should lie within a certain range
Most common measure = Body Mass Index (BMI): Weight (Kg)/height(Meters)2
WHO guidelines: BMI < 18.5 are underweight 18.5 < BMI < 25 are normal 25 < BMI < 30 are overweight BMI > 30 are obese
Around 300 million people worldwide are obese, a figure likely to rise
66/53: Topic 3.1 – Models for Ordered Choices
Models for BMI
Simple Regression Approach Based on Actual BMI:BMI* = ′x + , ~ N[0,2]
No accommodation of heterogeneity Rigid measurement by the guidelinesInterval Censored Regression Approach
WT = 0 if BMI* < 25 Normal 1 if 25 < BMI* < 30
Overweight 2 if BMI* > 30 Obese
Inadequate accommodation of heterogeneityInflexible reliance on WHO classification
67/53: Topic 3.1 – Models for Ordered Choices
An Ordered Probit Approach
A Latent Regression Model for “True BMI”BMI* = ′x + , ~ N[0,σ2], σ2 = 1
“True BMI” = a proxy for weight is unobserved
Observation Mechanism for Weight TypeWT = 0 if BMI* < 0 Normal
1 if 0 < BMI* < Overweight
2 if BMI* > Obese
68/53: Topic 3.1 – Models for Ordered Choices
A Basic Ordered Probit Model
Prob( 0 | ) Prob( * 0)
Prob( 0)
( )
Prob( 1| ) Prob(0 * )
Prob( ) Prob( 0)
i i
i i
i
i i
i i i i
WT BMI
WT BMI
x
x
x
x
x x
( ) ( )
Prob( 2 | ) Prob( * )
Prob( )
1 Prob( )
1 ( )
i i
i i
i i
i i
i
WT BMI
x x
x
x
x
x
69/53: Topic 3.1 – Models for Ordered Choices
Latent Class Modeling
Irrespective of observed weight category, individuals can be thought of being in one of several ‘types’ or ‘classes. e.g. an obese individual may be so due to genetic reasons or due to lifestyle factors
These distinct sets of individuals likely to have differing reactions to various policy tools and/or characteristics
The observer does not know from the data which class an individual is in.
Suggests use of a latent class approach Growing use in explaining health outcomes (Deb and
Trivedi, 2002, and Bago d’Uva, 2005)
70/53: Topic 3.1 – Models for Ordered Choices
A Latent Class Model
For modeling purposes, class membership is distributed with a discrete distribution,
Prob(individual i is a member of class = c) = ic = c
Prob(WTi = j | xi) = Σc Prob(WTi = j | xi,class = c)Prob(class = c).
71/53: Topic 3.1 – Models for Ordered Choices
Probabilities in the Latent Class Model
, 1,Prob( = | )
There are two classes labeled c = 0 and c = 1.
i i c j c c i j c c icWT j
x x x
1, ,
Prob( = | ) i i
c j c c i j c c i cci
WT j
xx x
x
72/53: Topic 3.1 – Models for Ordered Choices
Class Assignment
1
1
Probit Model for Class Membership
Prob(Class = 1 | )
( )
Prob(Class = 0 | ) 1-
1 ( )
i i i
i
i i i
i
w
w
w
w
( )
Prob(Class = | ) [(2 1) ]i
i i ic c
w
w w
Class membership may relate to demographics such as age and sex.
73/53: Topic 3.1 – Models for Ordered Choices
Generalized Ordered Probit – Latent Classes and Variable Thresholds
,
,
,
Basic Ordered Choice Model
Prob( 0 | , ) ( )
Prob( 1| , ) ( ) ( )
Prob( 2 | , ) 1 ( )
Heterogeneity in Threshold Parameter
exp(
i c i
i i c c i c i
i i c c i
i c c
WT class c
WT class c
WT class c
x x
x x x
x x
)c iz
74/53: Topic 3.1 – Models for Ordered Choices
Data
US National Health Interview Survey (2005); conducted by the National Centre for Health Statistics
Information on self-reported height and weight levels, BMI levels
Demographic information Remove those underweight Split sample (30,000+) by gender
75/53: Topic 3.1 – Models for Ordered Choices
Model Components
x: determines observed weight levels within classes For observed weight levels we use lifestyle factors such as
marital status and exercise levels z: determines latent classes For latent class determination we use genetic proxies such
as age, gender and ethnicity: the things we can’t change
w: determines position of boundary parameters within classes
For the boundary parameters we have: weight-training intensity and age (BMI inappropriate for the aged?) pregnancy (small numbers and length of term unknown)
76/53: Topic 3.1 – Models for Ordered Choices
Correlation Between Classes and Regression
Outcome Model (BMI*|class = c) = c′x + c, c ~ N[0,1] WT|class=c = 0 if BMI*|class = c < 0
1 if 0 < BMI*|class = c < c
2 if BMI*|class = c > c.
Threshold|class = c: c = exp(c + γc′r) Class Assignment
c* = ′w + u, u ~ N[0,1]. c = 0 if c* < 0 1 if c* > 0. Endogenous Class Assignment (c,u) ~ N2[(0,0),(1,c,1)]
Recommended