WHO Multi-Country Studies unit Working Paper 4
Self-reported health and anchoring vignettes in SAGE Wave 1: Applying the
bivariate hierarchical ordered probit model and anchoring vignette methodologies
in SAGE to improve cross-country comparability of self-reported health.
Márton Ispány, Emese Verdes, Ajay Tandon, Somnath Chatterji
March 2012
Introduction
In much social science research and in many econometric applications, data that arise from
measuring discrete outcomes or discrete choices among a set of alternatives take
the form of ordinal (ordered categorical) data. Examples include self-report
responses in household surveys, models of labor force participation, and decisions about
which product to choose or which candidate to elect. To analyze such data, a number of
discrete response (choice) models have been developed in econometric theory, see
Greene [,]. One of the simplest of these is the class of ordered univariate
response models, in which the dependent variable has more than
two categories, i.e. there are several possible outcomes or choices and they are ordered according to
the preferences of the respondent. The same data structure also arises in the analysis of
repeated measurements, where the response of each respondent, experimental unit or
subject is observed on multiple occasions to record the level of a specific event.
Responses of this type are known as multivariate or correlated categorical responses. The
theory of univariate ordered models is relatively well developed, and these models have been
applied extensively in biostatistics, economics, political science and sociology, whereas
estimation of the joint probability distribution of two or more ordered categorical
variables is less common in the literature. The relatively new bivariate (multivariate)
ordered probit (BIOPROBIT) model can be treated as an extension of the standard
bivariate (multivariate) probit model to the case where the dependent
variables have more than two categories. The estimation procedures for the
BIOPROBIT model and their statistical properties are studied in Greene [, Section 11.5.2], Sajaia [] and others.
Applications of BIOPROBIT models include modeling the educational level of married
couples (Magee et al. []), educational attainment in France and Germany (Lauer []),
family size (Calhoun [,]), fertility outcomes and fertility motivations of Danish twins
(Kohler and Rodgers []), ownership of cats and dogs or of dogs and televisions
(Butler and Chatterjee [,]), and the household-level decision between the number of season
tickets and the number of cars (Scott and Axhausen []).
This paper puts forth a new approach to modeling correlated categorical responses in
a heterogeneous population where the subjective response scale varies across
segments of the population, such as countries. We use the anchoring vignette
methodology to evaluate and correct subjective correlated responses. This methodology
is widely applied in settings with subjective scales, e.g. health,
health care, school community strength, HIV risk, state effectiveness, and corruption (see
for example, http://gking.harvard.edu/vign/eg/). WHO’s Study on global AGEing and
adult health (SAGE) used a number of methods to improve the reliability, validity and
comparability of its self-reported health measures, including anchoring
vignettes. The anchoring vignette technique presents the respondent with a set of
hypothetical stories, which are rated using the same questions and response categories as the
self-assessment of health. The vignettes fix the level of ability on a given
health domain, which makes it possible to distinguish differences in self-ratings due to actual health
differences from those due to varying norms or expectations for health (Hopkins and
King, 2010; Salomon, 2004).
Measuring the health state of individuals is important for evaluating health and
social policies and for monitoring the health of populations. Self-reported health is
a common way of assessing health status in household health surveys, and single-question
versions of self-reported health predict a range of health outcomes from
disability to death (Cutler 2009; Singh-Manoux et al., Psychosomatic Medicine
2007;69:138-43).
Methods
The models
Categorical data on health are usually described by discrete choice latent variable models,
which assume that the observed categorical variables, e.g. the self-reported health responses,
are discrete transformations of an underlying, unobservable, continuous true level of
health. For a detailed introduction to discrete choice models see Chapter 21 of Greene []
and the recent survey in Greene [, Chapter 11]. If this discrete transformation is the same
for all individuals, we say that reporting behavior is homogeneous.
In contrast, reporting heterogeneity means that the mappings between the
latent variables and the observed categorical variables differ across categories of
respondents. In this paper we consider the multivariate case, in particular the bivariate case, i.e.
we allow more than one categorical response for each individual.
Let $y_{ij}$, $i = 1,\dots,N$, $j = 1,\dots,M$, be a self-reported categorical health measure, where $i$
and $j$ index the respondent and the question, respectively, and $N$ and
$M$ denote the number of respondents and questions. Latent variable
models assume that there is an unobserved continuous latent variable $y^*_{ij}$ for the $i$-th respondent and the
$j$-th question. These latent variables are assumed to depend on observable covariates and
are modeled by latent equations. Using the linear regression model, one of the
simplest choices, $y^*_{ij}$ is specified as
$$y^*_{ij} = x_i^T \beta_j + \varepsilon_{ij}. \qquad (1)$$
Here $x_i$ is a vector of covariates for the $i$-th respondent, $\beta_j$ is a vector of regression coefficients,
and the error vector $\varepsilon_i = (\varepsilon_{ij})$ has an $M$-dimensional normal distribution with mean 0 and
variance matrix $\Sigma$. The diagonal elements of $\Sigma$ are set to 1 so that the model is
identified. In the homogeneous reporting case it is assumed that the observed
categorical responses $y_{ij}$ of the $i$-th individual depend on the latent variables in the
following way:
$$y_{ij} = k \iff \tau^j_{k-1} \le y^*_{ij} < \tau^j_k, \qquad (2)$$
$k = 1,\dots,K$, where $\tau^j_0 < \tau^j_1 < \dots < \tau^j_{K-1} < \tau^j_K$ with $\tau^j_0 = -\infty$, $\tau^j_K = \infty$, and $K$ denotes the number of
different answers to the self-report questions. The model (1) with cut-point definition (2) is called the
multivariate ordered probit model.
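As a minimal sketch of how model (1) with cut-points (2) turns a latent variable into category probabilities, the Python snippet below treats the univariate case ($M = 1$) with a unit-variance normal error. The function names, the value of the linear predictor and the cut-points are illustrative assumptions and are not taken from the paper.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(xb, cuts):
    """Return [P(y = 1), ..., P(y = K)] for y* = xb + eps, eps ~ N(0, 1).

    `cuts` holds the K - 1 finite cut-points in increasing order; the two
    outer cut-points are -inf and +inf, as in definition (2).
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    cdf = [norm_cdf(b - xb) for b in bounds]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(bounds))]

# Hypothetical values: linear predictor x'beta = 0.3 and K = 5 categories.
probs = category_probs(0.3, [-1.5, -0.5, 0.5, 1.5])
```

The probabilities are differences of the normal distribution function at consecutive cut-points, so they are positive and sum to one for any increasing set of cut-points.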
In the special case $M = 2$ we speak of the bivariate ordered probit (BIOPROBIT) model,
see Section 11.5.2 in Greene [] or Sajaia []. In the one-dimensional
case $M = 1$ we obtain the standard ordered probit (OPROBIT) model, see Chapter 21.8 in
Greene []. There are several extensions of the ordered probit model that follow the logic
of bivariate models using two latent equations with correlated error terms, see e.g. Butler
et al. [] and Tobias and Li []. Our setup follows the latter. Seemingly unrelated and
simultaneous specifications of the two-equation ordered probit model are considered e.g. in
Sajaia []. The parameters, namely the regression coefficients $\beta_j$, the cut-points $\tau^j_k$
and the independent off-diagonal elements of $\Sigma$, are estimated by
full information maximum likelihood (FIML), see
supplement S1. To keep the cut-points increasing we use an exponential
parametrization, i.e. $\tau^j_k = \tau^j_{k-1} + \exp(\gamma^j_k)$, $\tau^j_1 = \gamma^j_1$, and the new parameters $\gamma^j_k$ are
estimated. The FIML technique can be applied in any statistical or econometric
software in which the cumulative distribution function of the standard bivariate
normal distribution is implemented, e.g. in Stata, see Sajaia []. Butler et al. [] proposed a two-step
estimation approach based on fitting univariate ordered probit models. Tobias and Li []
suggested a Bayesian alternative. The two latent equations of the BIOPROBIT
model can be rewritten in two-dimensional vector form as a standard linear
model
$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 \\ 0 & x_i^T \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \end{pmatrix}.$$
Here the error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are distributed as standard bivariate normal with
correlation coefficient $\rho$. In summary, the BIOPROBIT model is a two-dimensional
latent model with ordered probit link function and bivariate normally distributed latent
variables.
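For orientation, the probability of one cell of the bivariate response table, which is the building block of a FIML likelihood for this kind of model, can be written with the bivariate normal distribution function. The notation is assumed here: $\Phi_2(\cdot,\cdot;\rho)$ is the standard bivariate normal distribution function with correlation $\rho$, $\tau^j_k$ are the cut-points of question $j$ with $\tau^j_0 = -\infty$ and $\tau^j_K = \infty$, and $\beta_j$ are the regression coefficients. This is the usual rectangle-probability identity, not a formula quoted from supplement S1.

```latex
\begin{aligned}
\Pr(y_{i1} = k,\ y_{i2} = l)
 &= \Phi_2(\tau^1_k - x_i^T\beta_1,\ \tau^2_l - x_i^T\beta_2;\ \rho)
  - \Phi_2(\tau^1_{k-1} - x_i^T\beta_1,\ \tau^2_l - x_i^T\beta_2;\ \rho) \\
 &\quad - \Phi_2(\tau^1_k - x_i^T\beta_1,\ \tau^2_{l-1} - x_i^T\beta_2;\ \rho)
  + \Phi_2(\tau^1_{k-1} - x_i^T\beta_1,\ \tau^2_{l-1} - x_i^T\beta_2;\ \rho).
\end{aligned}
```

Summing the logarithm of this probability over respondents, evaluated at each respondent's observed pair of categories, gives the log-likelihood to be maximized.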
In the heterogeneous reporting case the ordered probit models are no longer appropriate
for describing the data. However, these models can be generalized by allowing the
cut-points to depend on covariates as
$$\tau^j_{i,1} = x_i^T \gamma^j_1, \qquad \tau^j_{i,k} = \tau^j_{i,k-1} + \exp(x_i^T \gamma^j_k), \quad k = 2,\dots,K-1. \qquad (3)$$
Hence the relation between the observed categorical and the continuous latent variables
becomes
$$y_{ij} = k \iff \tau^j_{i,k-1} \le y^*_{ij} < \tau^j_{i,k}. \qquad (4)$$
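The covariate-dependent cut-points in (3) can be sketched in a few lines of Python. Here `x` stands for the covariate vector of one respondent and `gammas[k]` for the coefficient vector of the corresponding cut-point; these names, the two-element covariate vector (intercept and a country indicator) and all numbers are hypothetical and serve only to show how the exponential increments keep the cut-points ordered.

```python
import math

def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def hopit_cutpoints(x, gammas):
    """Cut-points of one respondent under parametrization (3).

    The first cut-point is linear in x; every later cut-point adds a
    strictly positive increment exp(x'gamma_k), so the result is increasing.
    """
    taus = [dot(x, gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(dot(x, g)))
    return taus

# Hypothetical coefficients for K = 4 categories (three finite cut-points).
gammas = [[-1.0, 0.5], [0.0, 0.0], [0.0, -0.2]]
taus_country0 = hopit_cutpoints([1.0, 0.0], gammas)  # country indicator = 0
taus_country1 = hopit_cutpoints([1.0, 1.0], gammas)  # country indicator = 1
```

With these illustrative values the two hypothetical countries share the same latent scale but cut it at different points, which is the form of reporting heterogeneity that the vignettes are meant to identify.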
Here the $\gamma^j_k$ are parameters that measure the impact of covariates on the cut-points, see
Terza [] and Pudney and Shields []. To identify the effect of covariates on the cut-points
we use vignette ratings as exogenous information that fixes the levels of the
response categories. This technique was suggested by Tandon et al. [], see also
King et al. [] and Salomon et al. []. Suppose that there are $L$ vignettes for each question and
denote by $y^v_{ij\ell}$ the rating given by the $i$-th respondent to the $\ell$-th vignette of the $j$-th question. It
is also assumed that the possible vignette ratings are $1, 2, \dots, K$, i.e. they coincide with the
possible values of the self-reports. In the latent trait approach it is supposed that there
is an unobservable continuous variable $y^{v*}_{ij\ell}$ behind each $y^v_{ij\ell}$ for all $i$, $j$ and $\ell$. We assume
that these latent variables have the same distribution over the whole population, i.e. they do not depend on
the covariates. In mathematical terms,
$$y^{v*}_{ij\ell} = \theta_{j\ell} + \varepsilon^v_{ij\ell}, \qquad (5)$$
$i = 1,\dots,N$, $j = 1,\dots,M$, $\ell = 1,\dots,L$, where $\theta_{j\ell}$ denotes the vignette mean and the error vector
$\varepsilon^v_{i\ell} = (\varepsilon^v_{ij\ell})$ has an $M$-dimensional normal distribution with mean 0 and variance matrix $\Sigma_v$. The
observed vignette ratings depend on the latent vignette variables in the following way:
$$y^v_{ij\ell} = k \iff \tau^j_{i,k-1} \le y^{v*}_{ij\ell} < \tau^j_{i,k}. \qquad (6)$$
It should be emphasized that the same cut-points are used as in the self-report part. Thus
the self-report model (1) and the vignette model (5), which are multivariate ordered
probit models with cut-point relations (4) and (6), respectively, are joined through
common cut-points parametrized by (3). We refer to the system of these two coupled
multivariate ordered probit models as the multivariate hierarchical ordered probit model. In
the one-dimensional case $M = 1$ we recover the well-known hierarchical ordered probit
(HOPIT) model. The HOPIT model was originally developed to enhance the
cross-population comparability of self-report survey data, see Tandon et al. []. In the special
case $M = 2$ we speak of the bivariate hierarchical ordered probit (BIHOPIT) model.
For identification, we assume that $\theta_{j1} = 0$ for all $j$ and that the diagonal
entries of the matrix $\Sigma_v$ equal 1. In this case no restriction is needed
on the entries of the self-report covariance matrix $\Sigma$; it can be an arbitrary
covariance matrix. The parameters, namely the regression coefficients $\beta_j$, the
vignette means $\theta_{j\ell}$, the cut-point parameters $\gamma^j_k$, the independent entries of $\Sigma$, and
the independent off-diagonal entries of $\Sigma_v$, are estimated by
full information maximum likelihood (FIML), see supplement S1. To
parametrize the BIHOPIT model in this paper we also suppose that the correlation
structures of the self-report and vignette parts are the same. We denote the common
correlation coefficient by $\rho$. The latent equations of the BIHOPIT model can also be
written in the form of a standard linear model. Here, for the sake of simplicity, we assume
that $L = 1$, i.e. there is only one vignette. We have the following vector form for the latent
variables:
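$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \\ y^{v*}_{i1} \\ y^{v*}_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 & 0 & 0 \\ 0 & x_i^T & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \theta_1 \\ \theta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon^v_{i1} \\ \varepsilon^v_{i2} \end{pmatrix},$$
where the four-dimensional error term has a multivariate normal distribution with mean 0
and covariance matrix
$$\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & 0 & 0 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & 1 & \rho \\ 0 & 0 & \rho & 1 \end{pmatrix}.$$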
In summary, the BIHOPIT model is a system of coupled bivariate ordered probit
(BIOPROBIT) models with common cut-points that depend on covariates. One of the
BIOPROBIT models describes the self-report part while the others describe the
vignette part.
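To make the coupling through common cut-points concrete, the sketch below computes the log-likelihood contribution of one respondent in the one-dimensional case (HOPIT), assuming independent unit-variance normal errors for the self-report and for each vignette. The bivariate (BIHOPIT) likelihood additionally requires the bivariate normal distribution function and is not reproduced here. All function names and numbers are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cell_prob(k, mean, taus):
    """P(rating = k) for a latent N(mean, 1) variable and finite cut-points taus."""
    bounds = [-math.inf] + list(taus) + [math.inf]
    return norm_cdf(bounds[k] - mean) - norm_cdf(bounds[k - 1] - mean)

def hopit_loglik_i(y, xb, vignette_ratings, thetas, taus):
    """Log-likelihood of one respondent: self-report plus vignette ratings.

    The same respondent-specific cut-points `taus` enter both parts; this
    shared scale is what lets the vignettes identify the cut-points.
    """
    ll = math.log(cell_prob(y, xb, taus))
    for rating, theta in zip(vignette_ratings, thetas):
        ll += math.log(cell_prob(rating, theta, taus))
    return ll

# Hypothetical respondent: self-report 2, two vignettes rated 3 and 1.
taus = [-1.0, 0.0, 1.0]
ll = hopit_loglik_i(2, -0.4, [3, 1], [0.5, -1.2], taus)
```

The self-report term depends on the covariates through `xb`, while the vignette terms depend only on the vignette means, mirroring equations (1) and (5).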
An alternative approach to analyzing multivariate or longitudinal ordered discrete data is
to include a random effect in the model for the latent variables. In this case it is
supposed that there exists an actual level $\eta_i$ of respondent $i$ on a continuous,
one-dimensional scale which determines all responses of this individual. Moreover, the
responses also depend on the covariates. Thus, we assume that the latent variable $y^*_{ij}$ can
be expressed as
$$y^*_{ij} = x_i^T \beta + \eta_i + \varepsilon_{ij}, \qquad (7)$$
where $\eta_i$ is the random effect of the $i$-th respondent, which has a Gaussian distribution
with mean 0 and variance $\sigma_\eta^2$, and $\varepsilon_{ij}$ is the individual error, which has a Gaussian
distribution with mean 0 and variance $\sigma_\varepsilon^2$. In the two-dimensional case the linear model
formulation of this model is the following:
$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 \\ 0 & x_i^T \end{pmatrix} \begin{pmatrix} \beta \\ \beta \end{pmatrix} + \begin{pmatrix} \eta_i + \varepsilon_{i1} \\ \eta_i + \varepsilon_{i2} \end{pmatrix}.$$
This vector representation makes clear that the regression coefficients are the same for all
questions, which is the main constraint compared to the previous models. The covariance
matrix of the two-dimensional normal error has the form
$$\begin{pmatrix} \sigma_\eta^2 + \sigma_\varepsilon^2 & \sigma_\eta^2 \\ \sigma_\eta^2 & \sigma_\eta^2 + \sigma_\varepsilon^2 \end{pmatrix}.$$
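The covariance structure of the random effect model implies a specific correlation between the two latent self-reports. Writing $\sigma_\eta^2$ for the variance of the random effect $\eta_i$ and $\sigma_\varepsilon^2 > 0$ for the variance of the individual errors $\varepsilon_{ij}$ (notation assumed here), and assuming that the random effect and the individual errors are mutually independent, a short derivation gives:

```latex
\begin{aligned}
\operatorname{Cov}(\eta_i + \varepsilon_{i1},\ \eta_i + \varepsilon_{i2}) &= \operatorname{Var}(\eta_i) = \sigma_\eta^2, \\
\operatorname{Var}(\eta_i + \varepsilon_{ij}) &= \sigma_\eta^2 + \sigma_\varepsilon^2, \\
\operatorname{Corr}(y^*_{i1},\ y^*_{i2} \mid x_i) &= \frac{\sigma_\eta^2}{\sigma_\eta^2 + \sigma_\varepsilon^2} \in [0, 1).
\end{aligned}
```

Under these assumptions the implied correlation cannot be negative, whereas a freely estimated correlation coefficient between two latent equations may take either sign.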
The latent vignette variables are supposed to have a normal distribution with mean $\theta_{j\ell}$,
where $j$ and $\ell$ index the self-report question and the vignette, respectively, and common
variance $\sigma_v^2$. For identification we suppose that $\theta_{j1} = 0$ for all $j$ and $\sigma_v^2 = 1$. The discrete
transformations between the observed and the latent variables are again defined by means
of cut-points, see (4) and (6), which are parametrized as in (3). The system of the
self-report and vignette variables with their latent variables and common cut-points defined
above is called the compound hierarchical ordered probit (CHOPIT) model. The CHOPIT
model was introduced by King et al. []. The parameters of the CHOPIT model are
the regression coefficient $\beta$, the vignette means $\theta_{j\ell}$, the cut-point parameters $\gamma^j_k$, and the
variances $\sigma_\eta^2$, $\sigma_\varepsilon^2$. These parameters are estimated by maximum likelihood,
approximating the full-information likelihood numerically, see also supplement S1.
Finally, we remark that the CHOPIT model is a one-dimensional latent model with probit
link function, a so-called level-1 model with normally distributed random effect and
vignettes.
Hypothesis tests and goodness of fit
Hypothesis tests about restrictions on the BIHOPIT model parameters can be derived
using any of the three common procedures: the likelihood ratio test, the Wald statistic and the
Lagrange multiplier statistic. Since the FIML computation is straightforward, the
likelihood ratio test is a natural choice. The likelihood ratio statistic is
$$2(\ln L_1 - \ln L_0),$$
where the subscripts 1 and 0 indicate the values of the log-likelihood computed under the
alternative ($H_1$) and null ($H_0$) hypotheses, respectively. Under $H_0$ the likelihood ratio statistic
has asymptotically a $\chi^2$ distribution with $q = \dim H_1 - \dim H_0$ degrees of freedom,
where, for a hypothesis $H: \vartheta \in \Theta_H$, $\dim H$ denotes the dimension of the
parameter space $\Theta_H$.
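The likelihood ratio test described above can be sketched in pure Python. The chi-square tail probability is computed from the standard recurrence for the regularized upper incomplete gamma function, which is exact for integer degrees of freedom. The function names and the two log-likelihood values are hypothetical; the sketch assumes nested models, so that the statistic is non-negative.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1, x >= 0."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # df = 2: tail is exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1: tail is erfc(sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_alt, loglik_null, df):
    """Return the statistic 2(ln L1 - ln L0) and its asymptotic p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods; one restriction, so df = 1.
stat, p_value = likelihood_ratio_test(-1520.3, -1526.8, df=1)
```

The degrees of freedom `df` correspond to the difference in the number of free parameters between the alternative and the null model.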
Many questions can be addressed by hypothesis testing in the
context of the BIHOPIT model. These include, among others, tests of
different parts of the model separately, tests of the relation between the self-report
and vignette parts of the model, and tests of model specification. One of the most
important is the test of uni-dimensionality. In formal terms, uni-dimensionality can
be defined as the assumption that any dependence between the self-report questions is
solely due to a single underlying latent trait. In the context of the BIHOPIT model this means
that the latent self-report variables differ only in their random error terms, while the
systematic parts depending on covariates are equal. We refer to this property as weak
uni-dimensionality. (Strong uni-dimensionality additionally supposes equality of the
error terms.) In mathematical terms, weak uni-dimensionality can be formulated as
the following hypothesis system:
$$H_0: \beta_1 = \beta_2, \qquad H_1: \beta_1 \neq \beta_2.$$
We will refer to the null hypothesis $H_0$ as the (weak) uni-dimensionality hypothesis.
Whenever the uni-dimensionality hypothesis cannot be rejected, the predicted scores for the
latent self-reports given by the unconditional mean are the same for all coordinates. The
asymptotic distribution of the likelihood ratio statistic for testing the uni-dimensionality
hypothesis is $\chi^2_p$, where $p$ denotes the number of covariates including the intercept.
The correlation coefficient $\rho$ plays a central role in the identification of and inference for
the BIHOPIT model. It can be interpreted as the correlation
between the two unobservable latent self-report variables and, likewise, between the paired
latent vignette variables. When $\rho = 0$, the error terms in (1) and
(5) are uncorrelated across the two questions, so that the responses to the two questions are
independent for both self-reports and vignettes. This leads to the
following hypothesis system:
$$H_0: \rho = 0, \qquad H_1: \rho \neq 0.$$
We will refer to the null hypothesis $H_0$ as the independence hypothesis. Whenever the
independence hypothesis cannot be rejected, the BIHOPIT model can be simplified to
two separate univariate HOPIT models for the subdomain questions. Under
$H_0$ the log-likelihood (8) becomes the sum of the log-likelihood functions of two
HOPIT models, and their parameters are estimated separately. Thus, the
log-likelihood is easily computed under $H_0$, while the log-likelihood under the
alternative hypothesis is obtained by fitting the BIHOPIT model. Finally, the asymptotic distribution of
the likelihood ratio statistic for testing the independence hypothesis is $\chi^2_1$.
A similar testing problem is considered for the seemingly unrelated bivariate
probit model in Monfardini and Rosalba []. In their model the null hypothesis $H_0$ is
equivalent to exogeneity, so they refer to it as the
exogeneity hypothesis. They propose a number of procedures that are likely to be
applicable in our case as well.
Two important questions concern the relationship between the self-report and the
vignette parts of the BIHOPIT model: the equivalence of the cut-points
and the equivalence of the correlation structures in the two parts of the model. The
former relates to the so-called response consistency assumption, i.e. whether
individuals use the same response scales to rate the vignettes and the self-reports. Suppose
that the parameters $\gamma^j_k$ and $\gamma^{vj}_k$, which correspond to the self-report and vignette
cut-points, are allowed to differ. Then response consistency can be tested by comparing these
cut-point parameters using the likelihood ratio test. Such tests, with null hypotheses such as
equality of cut-points or equality of distances between cut-points, are proposed in Bago
D’Uva et al. []. The latter, in the bivariate case, amounts to comparing the
correlation coefficients of the self-report and vignette parts. Let
$\rho_{\mathrm{self}}$ and $\rho_{\mathrm{vig}}$ denote the correlation coefficients for the self-report
responses and the vignette responses, respectively. In mathematical terms, the equivalence
of correlation structures can be formulated as the following hypothesis system:
$$H_0: \rho_{\mathrm{self}} = \rho_{\mathrm{vig}}, \qquad H_1: \rho_{\mathrm{self}} \neq \rho_{\mathrm{vig}}.$$
We will refer to the null hypothesis $H_0$ as the correlation equivalence hypothesis. If
the correlation equivalence hypothesis is not rejected, the BIHOPIT model is adequate
for the data. Otherwise, we need a slightly more complicated model that contains one
extra parameter for the second correlation coefficient. The asymptotic distribution of the
likelihood ratio statistic is again $\chi^2_1$.
Two specification issues can be addressed in the context of the BIHOPIT model:
heteroscedasticity and the distributional assumption. Since there are no useful residuals,
general approaches such as the Breusch-Pagan test are not available. Hence we must
build heteroscedasticity into the model and test it parametrically. A common approach to
modeling heteroscedasticity in categorical latent choice models is based on Harvey’s
exponential model, see []. In this model it is supposed that the error terms satisfy the
following assumptions:
$$E(\varepsilon_{ij} \mid x_i, z_i) = 0, \qquad \mathrm{Var}(\varepsilon_{ij} \mid x_i, z_i) = [\exp(z_i^T \delta^j)]^2.$$
Here $z_i$ is a known set of variables that does not include a constant term, and the $\delta^j$ are
new parameter vectors to be estimated. Maximization of the log-likelihood function with
respect to all the parameters is somewhat more complicated because the function is only locally
concave. Homoscedasticity can be tested through the null
hypothesis $H_0: \delta^j = 0$, which is straightforward. The second specification test of interest
concerns the distribution or the link function. For this problem, an
appropriate modification of Silva’s and Vuong’s tests may be a reasonable approach, see
Silva [] and Vuong [].
Assessing goodness-of-fit for ordered categorical data is not obvious because there is no
direct counterpart to the $R^2$ goodness-of-fit statistic, and taking
the order of the possible responses into account is a serious challenge. One can compute the likelihood ratio
index, also called the pseudo-$R^2$,
$$\text{pseudo-}R^2 = 1 - \ln L / \ln L_0,$$
where $\ln L$ is the log-likelihood of the estimated model including the constant and $\ln L_0$
is the log-likelihood of a model whose only parameter is a constant. Another way to
evaluate the goodness-of-fit of an ordered discrete statistical model is the prediction
method based on the confusion matrix, which is well known from classification problems
in data mining, see Tan et al. [, Section 4.2.]. Define the predicted categorical response as
the possible response with the maximum predicted
probability. The confusion matrix is the table of counts of
individuals classified according to their true and predicted self-reports. The
goodness-of-fit of a probabilistic model can then be evaluated using any performance
metric such as accuracy, which is defined as the fraction of correctly predicted
individuals in the whole population.
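Both goodness-of-fit measures are simple to compute once the log-likelihoods and the predicted category probabilities are available. The following sketch uses hypothetical function names and toy inputs; categories are coded 1 to K as in the text.

```python
def pseudo_r2(loglik_model, loglik_null):
    """Likelihood ratio index: 1 - ln L / ln L0."""
    return 1.0 - loglik_model / loglik_null

def predict_category(probs):
    """Category (1-based) with the maximum predicted probability."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1

def confusion_matrix(true, pred, n_categories):
    """Counts of individuals by true (rows) and predicted (columns) category."""
    table = [[0] * n_categories for _ in range(n_categories)]
    for t, p in zip(true, pred):
        table[t - 1][p - 1] += 1
    return table

def accuracy(table):
    """Fraction of individuals on the main diagonal of the confusion matrix."""
    total = sum(sum(row) for row in table)
    correct = sum(table[k][k] for k in range(len(table)))
    return correct / total

# Toy data: four respondents, three categories.
true_ratings = [1, 2, 3, 3]
pred_ratings = [1, 2, 2, 3]
table = confusion_matrix(true_ratings, pred_ratings, 3)
acc = accuracy(table)
```

Accuracy ignores how far a wrong prediction is from the true category, so for ordered responses it may be worth inspecting the off-diagonal cells of the table as well.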
Posterior prediction
In latent trait models the prediction of the latent variables for each individual based on
the known categorical responses is obtained by systematic use of Bayes’ theorem from
elementary probability theory. Denote the observed and latent variables by $y$
and $y^*$, respectively. All posterior prediction is based on the posterior predictive
density, which is defined by
$$p(y^* \mid y) = \frac{\Pr(y \mid y^*)\, p(y^*)}{\int \Pr(y \mid y^*)\, p(y^*)\, dy^*},$$
where $p(y^*)$ denotes the probability density function of the latent variable $y^*$. We should
emphasize that, in contrast to the conventional Bayesian approach, such as the CHOPIT
model, all the parameters in the BIHOPIT model are non-random. The only random
quantities are the observed and latent variables for self-reports and vignettes. There are
three general approaches to predicting the latent variable $y^*$ individually, which will be
used to characterize the latent health status of an individual. These prediction methods are
the unconditional mean, the conditional mean, and maximum a-posteriori (MAP) prediction.
The unconditional mean is defined by
$$E(y^*) = \sum_y \Pr(y) \int y^*\, p(y^* \mid y)\, dy^* = \int y^*\, p(y^*)\, dy^*,$$
i.e. in this case the predicted value of the latent trait does not depend on the observed
responses. In the context of the BIHOPIT model the unconditional mean depends only on the
covariates, and for the two latent variables we have
$$E(y^*_{i1}) = x_i^T \beta_1 \quad \text{and} \quad E(y^*_{i2}) = x_i^T \beta_2.$$
In contrast, the conditional mean defined by
$$E(y^* \mid y) = \int y^*\, p(y^* \mid y)\, dy^* = \frac{\int y^* \Pr(y \mid y^*)\, p(y^*)\, dy^*}{\int \Pr(y \mid y^*)\, p(y^*)\, dy^*} \qquad (8)$$
is influenced by the self-report responses. It is well known that the conditional
mean minimizes the quadratic loss conditionally on the observed responses. Finally, the
maximum a-posteriori (MAP) prediction $\hat{y}^*$ is defined as the mode of the posterior
predictive density function, i.e.
$$\hat{y}^* = \arg\max_{y^*} p(y^* \mid y).$$
One of the main goals of this paper is to propose a reasonable individual prediction
method for the latent health status of respondents that relies on both self-report and
vignette responses in order to correct for heterogeneity in reporting health. Since the
unconditional mean depends only on the covariates and not on the self-report
responses, two respondents with the same covariate values will have the same predicted
health status even if their subjective assessments of their health status
differ substantially. Thus, the unconditional mean does not appear to be a suitable method
for individual scoring. On the other hand, the maximum a-posteriori prediction seems to
be too sensitive to the subjective individual responses to the self-report questions. Hence the
conditional mean, which takes the individual responses into account but is not too
sensitive to them, is a good choice for individual scoring. Supplement S2
describes a procedure for calculating the conditional mean (8) in the context of the BIHOPIT model.
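In the univariate case with a unit-variance normal latent variable, the conditional mean has a closed form: it is the mean of a normal distribution truncated to the interval between the two cut-points that bracket the observed category. The sketch below implements only this simplified case and is not the bivariate procedure of supplement S2; names and numbers are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density (returns 0.0 at plus or minus infinity)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_mass(k, mean, taus):
    """P(y = k) for y* ~ N(mean, 1) with finite cut-points taus."""
    bounds = [-math.inf] + list(taus) + [math.inf]
    return norm_cdf(bounds[k] - mean) - norm_cdf(bounds[k - 1] - mean)

def conditional_mean(k, mean, taus):
    """E(y* | y = k): mean of N(mean, 1) truncated to the k-th interval."""
    bounds = [-math.inf] + list(taus) + [math.inf]
    a = bounds[k - 1] - mean
    b = bounds[k] - mean
    return mean + (norm_pdf(a) - norm_pdf(b)) / category_mass(k, mean, taus)

# Hypothetical respondent with unconditional mean 0.3 and four categories.
taus = [-1.0, 0.0, 1.0]
scores = [conditional_mean(k, 0.3, taus) for k in range(1, 5)]
```

Two respondents with the same unconditional mean but different observed categories receive different scores, which is the property that motivates the conditional mean for individual scoring.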
Results
In the simulation study a population with two hypothetical countries (Country 1 and 2)
and two covariates was generated. One covariate was continuous,
modeling age, and the other was binary, modeling
sex. All three variables (country, age, and sex) were assumed to be independent of
each other. Country and age were designed to have a significant effect, and sex was
designed to be non-significant. Two simulations were performed, based on the CHOPIT and
BIHOPIT scenarios, respectively; the scenario indicates which model is the true one for the
simulation. For the simulation details, see Supplement 3. Then three statistical models
(BIOPROBIT, BIHOPIT, and CHOPIT) were fitted to the two simulated datasets and one real
dataset, and the models were compared on various issues based on these datasets.
Analyzing data sets simulated under BIHOPIT scenario
For descriptive results, since the population was generated according to predetermined
properties of the covariates, only the mean categorical ratings for self-reports and associated
vignettes are shown in Table 1. The proportion of respondents in each response category
for self-reports and vignettes is shown in Figure 1. The mean categorical rating for
self-reports is greater in Country 1 than in Country 2, thus the respondents from Country
1 rank themselves higher than those from Country 2. However, after adjusting for the vignette
responses, the ‘real’ difference is the other way around.
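How shifted cut-points can reverse an apparent ranking is illustrated by the following toy calculation. It does not use the simulated data of the paper: the latent means, the cut-points and the size of the shift are invented for illustration, and the errors are assumed to be standard normal.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_rating(mean, cuts):
    """Mean categorical rating for a latent N(mean, 1) variable."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    cdf = [norm_cdf(b - mean) for b in bounds]
    probs = [cdf[k] - cdf[k - 1] for k in range(1, len(bounds))]
    return sum(k * p for k, p in enumerate(probs, start=1))

base_cuts = [-1.5, -0.5, 0.5, 1.5]
latent = {"Country 1": 0.0, "Country 2": 0.3}   # invented latent means
shift = {"Country 1": -0.8, "Country 2": 0.0}   # Country 1 uses lower cut-points
vignette_mean = 0.0                             # same vignette for everyone

self_rating = {}
vignette_rating = {}
for country in latent:
    cuts = [t + shift[country] for t in base_cuts]
    self_rating[country] = expected_rating(latent[country], cuts)
    vignette_rating[country] = expected_rating(vignette_mean, cuts)
```

In this toy setting Country 1 has the lower latent mean but the higher mean self-rating, and it also rates the common vignette higher, which is the signal that allows the shift in the response scale to be detected.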
The substantive results, including the ‘true’ parameters used to generate the data under the BIHOPIT model
and the estimated parameters from the three fitted models, can be found in Table
2. The BIHOPIT estimates recover the true parameters very precisely, apart from a scale shift in the
vignette means and self-report constants. The CHOPIT estimates differ more,
but the differences appear for covariates that are not significant anyway,
e.g. sex, and in this sense the estimates are still correct. Only the BIOPROBIT estimates
fail to recover the original parameters, not only in magnitude but also in the direction of
the effect: age becomes falsely positive and the country rankings are swapped. See
the cut-point estimates in Table 3. Figures 2 and 3 show the latent score estimates versus
the ‘true’ latent scores. Data points fall near the main diagonal in both figures, with slightly
less scatter for BIHOPIT, which is not surprising. Cut-point figures can be found in the
Supplement.
Analyzing data sets simulated under CHOPIT with random effect scenario
Mean categorical ratings for self-reports and associated vignettes are shown in Table 4.
The proportion of respondents in each response category for self-reports and vignettes is
shown in Figure 4. By design, the descriptive results are similar to those of the BIHOPIT scenario.
The substantive results, including the ‘true’ parameters used to generate the data under the CHOPIT model
and the estimated parameters from the three fitted models, can be found in Table
5. As before, the CHOPIT and BIHOPIT estimates are very close to what is expected,
apart from a similar parameter shift in the vignette means and self-report constants due to
non-unique identification. Similarly, the BIOPROBIT model fails to recover the true parameters and
the same shifts occur as before. The cut-point estimates can be found in Table 6. Regarding
latent score recovery, BIHOPIT surprisingly behaves slightly better, see Figures 5 and 6.
Our conjecture is that CHOPIT would identify the random
effect part better with multiple items, but with only two items BIHOPIT is more precise. Cut-point figures
can be found in the Supplement.
Analyzing the SAGE dataset
The real data analysis was performed using data from SAGE (Study on Global Ageing and
Adult Health). SAGE is WHO's longitudinal panel study on health and health-related
outcomes, focusing on the population aged 50 years and older in China, Ghana, India,
Mexico, the Russian Federation and South Africa.
The health state item pool consists of 16 health-related questions, where responses were
recorded on a five-point scale from “no difficulty or problem” to “extreme
difficulty/inability”. These 16 items belong to 8 health domains: vision, mobility, self
care, cognition, interpersonal activities, pain and discomfort, sleep and energy, and affect.
In our example China and India are compared on the domain of cognition, with item1 and item2
being difficulties with remembering things and with learning a new task, respectively.
Looking at the stacked bar charts for cognition in the two countries (Fig. 7), one can see that although
the self-report responses show a worse level of cognition for India, the vignette ratings in India are
also more critical; accounting for this, we expect a smaller real difference between the two
countries, see the mean vignette rankings in Table 7. This is exactly what our parameter
estimates tell us. In Table 8, the parameters of the unadjusted and adjusted models differ
only in the country indicator, with smaller adjusted values for both item1 and item2.
Two-sample u-tests show that these differences are statistically significant. (This outcome is also due to
the invariance of the vignette responses to the remaining covariates, which is not shown
here.)
In Table 10 one can see that the average BIOPROBIT scores of China by age category
are always greater than the corresponding BIOPROBIT scores of India, which
confirms that the unadjusted ranking of China is higher than that of India. In contrast, if we
consider the averages of the adjusted posterior BIHOPIT scores, we see that in age
categories 64, 73, 76, 77, and 79 the difference is reversed: e.g., for age 76 the
BIOPROBIT score is greater in China than in India by 0.354, but the posterior BIHOPIT
score is lower in China than in India by 0.371. Thus, in fact, people in India are healthier
than people in China in this category, which is far from obvious from the self-report answers.
Discussion
The computational times of the BIHOPIT and CHOPIT models were compared by running all
programs on a PC with an Intel Core2 CPU6300 1.86 GHz processor and 2GB RAM for
1000 simulated respondents. Under the BIHOPIT scenario the running time of the BIHOPIT model was 43 seconds, while
that of the CHOPIT model was 20 199 seconds. Under the
CHOPIT scenario the running times were 39 seconds and 14 384 seconds for BIHOPIT and
CHOPIT fitting, respectively. Thus, the BIHOPIT code is approximately 400 times faster
than the CHOPIT code (ratios of about 470 and 370 in the two scenarios), which allows the model
to be run on larger data sets. For example, the
running time for the SAGE dataset with only 6 446 respondents was 167 621 seconds using
the CHOPIT code, clearly showing the extra computational demand.
References
1. Bago D’Uva T (2005) Latent class models for use of primary care: evidence from
a British panel. Health Econ. 14: 873-892.
2. Bago D’Uva T, van Doorslaer E, Lindeboom M, O’Donnell O (2008) Does
reporting heterogeneity bias the measurement of health disparities? Health Econ.
17: 351-375.
3. Bago D’Uva T, Lindeboom M, O’Donnell O, van Doorslaer E (2009) Slipping
anchor? Testing the vignettes approach to identification and correction of
reporting heterogeneity. Available at
http://www.york.ac.uk/res/herc/documents/wp/09_30.pdf via the Internet.
Accessed 17 Jan 2011.
4. Butler J, Chatterjee J (1995) Pet econometrics: Ownership of cats and dogs.
Working Paper 95-WP1, Department of Economics, Vanderbilt University.
5. Butler J, Chatterjee J (1997) Tests of the specification of univariate and bivariate
ordered probit. Review of Economics and Statistics 79: 343-347.
6. Butler J, Finegan T, Siegfried J (1998) Does more calculus improve student
learning in intermediate micro- and macroeconomic theory? Journal of Applied
Econometrics 13(2): 185-202.
7. Calhoun C (1989) Estimating the distribution of desired family size and excess
fertility. The Journal of Human Resources 24(4): 709–724.
8. Calhoun C (1991) Desired and excess fertility in Europe and the United States:
indirect estimates from World Fertility Survey Data. European Journal of
Population 7: 29-57.
9. Gould W, Pitblado J, Poi B (2010) Maximum Likelihood Estimation with Stata.
4th Edition. Stata Press. 352 p.
10. Greene WH (2003) Econometric analysis. 5th Edition. New Jersey: Pearson
Education. 1026 p.
11. Greene WH (2009) Discrete choice modeling. Palgrave handbook of
econometrics. Vol. 2. Applied econometrics. Edited by Terence C. Mills and
Kerry Patterson. Palgrave Macmillan. 1128 p.
12. Greene WH, Harris MN, Hollingsworth B, Maitra P (2008) A bivariate latent class
correlated generalized ordered probit model with an application to modeling
observed obesity levels. NYU Working Paper No. EC-08-18.
Available at SSRN: http://ssrn.com/abstract=1281910. Accessed 14 Jan 2011.
13. Harvey A (1976) Estimating regression models with multiplicative
heteroscedasticity. Econometrica 44: 461-465.
14. Hopkins DJ, King G (2010) Improving anchoring vignettes: Designing surveys to
correct interpersonal incomparability. Public Opinion Quarterly 74(2): 201-222.
15. Kakwani N, Wagstaff A, van Doorslaer E (1997) Socioeconomic inequalities in
health: Measurement, computation, and statistical inference. Journal of
Econometrics 77: 87-103.
16. Kapteyn A, Smith J, van Soest A (2007) Vignettes and self-reports of work
disability in the US and the Netherlands. American Economic Review 97(1): 461–473.
17. King G, Murray CJL, Salomon J, Tandon A (2004) Enhancing the validity and
cross-cultural comparability of measurement in survey research. American
Political Science Review 98(1): 184-191.
18. Kohler HP, Rodgers JL (1999) DF-like analyses of binary, ordered, and censored
variables using probit and tobit approaches. Behavior Genetics 29(4): 221-232.
19. Kristensen N, Johansson E (2008) New evidence on cross-country differences in
job satisfaction using anchoring vignettes. Labour Economics 15: 96-117.
20. Laha RG, Rohatgi VK (1979) Probability theory. New York: Wiley. 557 p.
21. Lauer Ch (2003) Family background, cohort and education: A French-German
comparison based on a multivariate ordered probit model of educational
attainment. Labour Economics 10: 231-251.
22. Magee L, Burbidge J, Robb L (2000) The correlation between husband’s and
wife’s education: Canada 1971-1996. Social and Economic Dimensions of an
Aging Population Research Papers, 24, McMaster University.
23. Monfardini C, Radice R (2008) Testing exogeneity in the bivariate probit
model: A Monte Carlo study. Oxford Bulletin of Economics and Statistics 70(2):
271-282.
24. Murphy A (2007) Score tests of normality in bivariate probit models. Economics
Letters 95: 374-379.
25. Pudney S, Shields M (2000) Gender, race, pay and promotion in the British
nursing profession: Estimation of a generalized ordered probit model. Journal of
Applied Econometrics 15(4): 367-399.
26. Sajaia Z (2008) Maximum likelihood estimation of a bivariate ordered probit
model: implementation and Monte Carlo simulations. Available:
http://www.adeptanalytics.org/download/ado/bioprobit/bioprobit.pdf via the
Internet. Accessed 12 Jan 2011.
27. Sajaia Z (2008) BIOPROBIT: module for bivariate ordered probit regression. The
World Bank. Available: http://fmwww.bc.edu/RePEc/bocode/b via the Internet.
Accessed 13 Jan 2011.
28. Salomon J, Tandon A, Murray CJL, World Health Survey Pilot Study
Collaborating Group (2004) Comparability of self-rated health: cross sectional
multi-country survey using anchoring vignettes. British Medical Journal 328: 258-
263.
29. Scott DM, Axhausen KW (2006) Household mobility tool ownership: modeling
interactions between cars and season tickets. Transportation 33(4):
311-328.
30. Silva J (2001) A score test for non-nested hypotheses with applications to discrete
response models. Journal of Applied Econometrics 16(5): 577-598.
31. Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Boston:
Pearson Education. 769 p.
32. Tandon A, Murray CJL, Salomon JA, King G (2003) Statistical models for
enhancing cross-population comparability. In: Murray CJL, Evans DB editors.
Health systems performance assessment: debates, methods and empiricism.
Geneva: World Health Organization. pp. 727-746.
33. Terza JV (1985) Ordinal probit: a generalization. Communications in Statistics
14(1): 1–11.
34. Tobias J, Li M (2006) Calculus attainment and grades received in intermediate
economic theory. Journal of Applied Econometrics 21(6): 893-896.
35. Vuong Q (1989) Likelihood ratio tests for model selection and non-nested
hypotheses. Econometrica 57: 307–334.
36. Weiss AA (1993) A bivariate ordered probit model with truncation: helmet use and
motorcycle injuries. Applied Statistics 42(3): 487-499.
37. Wright RA (1995) BIVOPROB: Computer program for maximum-likelihood
estimation of bivariate ordered-probit models for censored data, Version 11.92. by
Charles A. Calhoun. The Economic Journal 105(430): 786-787.
Figure Legends
Figure 1. Distribution of the responses obtained for self-reports and vignettes under a
BIHOPIT scenario.
Figure Legend 1. The data set consists of 1,000 observations from a randomly simulated
population of two countries (482 respondents for the first country and 512 respondents for
the second country). The stacked bars show the distribution of their answers for the self-report
(Self) and the 5 vignette (1-5) questions.
Figure 2. Bihopit prediction for simulated dataset under a BIHOPIT scenario.
Figure Legend 2. The scores which are given by conditional mean posterior estimation
are plotted against the true latent scores of BIHOPIT model for the two questions
(domain A and B). The two countries are plotted using different colors for markers.
Figure 3. Chopit prediction for simulated dataset under a BIHOPIT scenario.
Figure Legend 3. The scores which are given by chopit posterior estimation using the
random effect CHOPIT model are plotted against the true latent scores of BIHOPIT
model for the two questions (domain A and B). In the posterior scores estimation
parameter randomization was used with Monte Carlo simulation of sample size 30 and
averaging. The two countries are plotted using different colors for markers.
Figure 4. Distribution of the responses obtained for self-reports and vignettes under a
CHOPIT scenario.
Figure Legend 4. The data set consists of 1,000 observations from a randomly simulated
population of two countries (496 respondents for the first country and 504 respondents for
the second country). The stacked bars show the distribution of their answers for the self-report (Self)
and the 5 vignette (1-5) questions.
Figure 5. Bihopit prediction for simulated dataset under a CHOPIT scenario.
Figure Legend 5. The scores which are given by conditional mean posterior estimation
are plotted against the true latent scores of CHOPIT model for the two questions (domain
A and B). The two countries are plotted using different colors for markers.
Figure 6. Chopit prediction for simulated dataset under a CHOPIT scenario.
Figure Legend 6. The scores which are given by chopit posterior estimation using the
random effect CHOPIT model are plotted against the true latent scores of the CHOPIT model
for the two questions (domain A and B). In the posterior scores estimation, parameter
randomization was used with Monte Carlo simulation of sample size 30 and averaging.
The two countries are plotted using different colors for markers.
Figure 7. Distribution of the responses obtained for self-reports and vignettes for
cognition domain of SAGE dataset.
Figure Legend 7. The data set consists of 3,664 respondents from China and 2,782
respondents from India. The stacked bars show the distribution of their answers for the
self-report (Self) and the 5 vignette (1-5) questions.
Figure 8. Comparing the BIOPROBIT and BIHOPIT models on SAGE dataset for China
and India.
Figure Legend 8. The figure compares China and India using scores obtained by the
BIOPROBIT and posterior BIHOPIT models. China is plotted with filled circles and India
with empty circles; magenta denotes BIOPROBIT scores and blue denotes
posterior BIHOPIT scores obtained by conditional means. The scores were obtained by
averaging the two subdomain scores for each model. For comparison, the
BIOPROBIT scores were rescaled by shifting them by the average of the self-report
constant parameters of the BIHOPIT model, which is (4.0664+3.9505)/2=4.00845.
Figure 9. Cut-points for BIOPROBIT and BIHOPIT models for remembering and
learning questions in SAGE dataset.
Figure Legend 9. The fixed axis denotes the BIOPROBIT model, where the cut-points are
constants, i.e., they do not depend on country. On the axes for China and India the
averages of the cut-point estimates are plotted.
Tables
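Table 1. Mean categorical responses for self-reports and vignettes under BIHOPIT
scenario.

| Country | A: Self | A: Vig1 | A: Vig2 | A: Vig3 | A: Vig4 | A: Vig5 | B: Self | B: Vig1 | B: Vig2 | B: Vig3 | B: Vig4 | B: Vig5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.16 | 4.08 | 3.13 | 2.19 | 1.46 | 1.14 | 3.35 | 4.99 | 4.93 | 4.36 | 3.2 | 1.8 |
| 2 | 2.52 | 1.6 | 1.21 | 1.02 | 1 | 1 | 3.17 | 4.89 | 4.3 | 3 | 1.6 | 1.11 |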
Table 2. Estimation results based on BIHOPIT simulation: BIOPROBIT, BIHOPIT, and
CHOPIT models.
Variable True BIHOPIT Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
param. std. err. param. std. err. param. std. err.
vigA 1 2 3.061 0.104 3.285 0.063
2 1.2 2.243 0.1 2.687 0.059
3 0.4 1.448 0.101 1.901 0.055
4 -0.4 0.669 0.107 0.975 0.055
selfA age -0.02 0.026 0.001 -0.0196 0.001 -0.022 0.002
sex 0.1 0.222 0.067 0.037** 0.049 -0.022** 0.057
country 2 -0.581 0.068 1.968 0.062 1.091 0.058
cons 2 3.045 0.119 2.757 0.102
vigB 1 3 4.046 0.104
2 2 3.003 0.074
3 1 1.985 0.064
4 0 0.987 0.063
selfB age -0.02 -0.073 0.002 -0.02 0.001
sex -0.1 -0.437 0.07 -0.094* 0.041
country 1 -0.201 0.07 1.072 0.045
cons 1 1.948 0.082
seA 0.5 0.46 0.040 1.023
seB 0.2 0.2 0.037 5.48E-6
corr 0.7 0.652 0.023 0.69 0.019
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
Table 3. Cutpoint estimation based on BIHOPIT simulation: BIOPROBIT, BIHOPIT,
and CHOPIT models.
Variable True BIHOPIT Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
param. std. err. param. std. err. param. std. err.
cutA1 age -0.03 -0.029 0.001 -0.012 0.0
sex -0.005 -0.066** 0.047 -0.034** .032
country 2.2 2.177 0.062 1.388 0.0359
cons 2 0.139 0.103 3.046 0.117 2.48 0.073
cutA2 age -0.05 -0.056 0.002 -0.022 0.002
sex -0.01 -0.055** 0.045 0.005** 0.067
country 0.3 0.352 0.046 0.066** 0.074
cons 0.8 0.936 0.107 0.957 0.088 -0.046** 0.114
cutA3 age 0.01 0.014 0.002 -0.001** 0.002
sex 0.02 0.029** 0.058 0.139* 0.069
country 0.2 -0.087** 0.059 -0.347 0.074
cons -2 1.391 0.113 -2.113 0.106 -1.395 0.125
cutA4 age 0.05 0.045 0.002 0.013 0.002
sex -0.005 -0.087** 0.047 -0.069** 0.055
country 0.2 0.296 0.049 0.06** 0.06
cons -4 2.15 0.12 -3.652 0.113 -1.529 0.1
cutB1 age -0.005 -0.005 0.001
sex 0.005 -0.012** 0.041
country 1.1 1.156 0.0452
cons -0.2 -5.614 0.19 0.767 0.077 -1.557 0.042
cutB2 age 0.01 0.009 0.002
sex -0.01 -0.011** 0.058
country -0.2 -0.169 0.057
cons -2.1 -4.61 0.166 -2.062 0.145 -0.539 0.079
cutB3 age -0.02 -0.016 0.001
sex 0.02 0.078* 0.039
country -0.2 -0.144 0.038
cons -0.5 -3.5 0.14 -0.698 0.081 0.45 0.079
cutB4 age 0.005 0.005 0.002
sex -0.005 -0.113 0.036
country 0.1 0.087* 0.035
cons -1.4 -1.971 0.121 -1.346 0.081 0.127* 0.063
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
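Table 4. Mean categorical responses for self-reports and vignettes under CHOPIT
scenario.

| Country | A: Self | A: Vig1 | A: Vig2 | A: Vig3 | A: Vig4 | A: Vig5 | B: Self | B: Vig1 | B: Vig2 | B: Vig3 | B: Vig4 | B: Vig5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.41 | 4.58 | 3.69 | 2.71 | 1.65 | 1.24 | 3.99 | 4.7 | 3.93 | 2.83 | 1.86 | 1.25 |
| 2 | 2.65 | 2.59 | 1.66 | 1.18 | 1.01 | 1 | 3.13 | 2.8 | 1.76 | 1.21 | 1.03 | 1 |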
Table 5. Estimation results based on CHOPIT simulation: BIOPROBIT, BIHOPIT, and
CHOPIT models.
Variable True CHOPIT Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
parameter std. err. parameter std. err. parameter std. err.
vigA 1 2.5 3.379 0.102 3.501 0.074
2 1.6 2.486 0.097 2.578 0.07
3 0.7 1.7 0.096 1.743 0.069
4 -0.2 0.753 0.102 0.827 0.073
selfA age -0.02 0.028 0.002 -0.022 0.001 -0.02 0.001
sex 0.1 0.144* 0.067 0.049** 0.044 0.069** 0.037
country 1.5 -0.719 0.069 1.7 0.05 1.56 0.042
cons 2 3.001 0.114 3.016 0.088
vigB 1 2.5 3.367 0.1
2 1.6 2.478 0.095
3 0.7 1.673 0.094
4 -0.2 .859 0.098
selfB age -0.02 0.019 0.002 -.019 0.001
sex 0.1 0.262 0.069 .073** 0.044
country 1.5 -0.776 0.07 1.509 0.05
cons 2 2.884 0.109
seA 0.2 0.283 0.037 0.197
seB 0.3 0.3 0.04 0.279
corr 0.666 0.022 0.509 0.017
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
Table 6. Cutpoint estimation based on CHOPIT simulation: BIOPROBIT, BIHOPIT, and
CHOPIT models.
Variable True BIHOPIT Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
param. std. err. param. std. err. param. std. err.
cutA1 age -0.03 -0.03 0.001 -0.03 0.0
sex -0.005 -0.04** 0.043 -0.03** 0.033
country 1.7 1.7 0.051 1.742 0.037
cons 2 -0.22 0.105 3 0.114 3.026 0.084
cutA2 age -0.05 -0.048 0.002 -0.051 0.003
sex 0.01 0.151 0.042 0.06** 0.064
country 0.4 0.438 0.044 0.479 0.065
cons 0.5 0.771 0.109 0.245 0.083 0.474 0.11
cutA3 age 0.01 0.018 0.002 0.013 0.002
sex -0.02 0.020** 0.067 -0.01** 0.066
country 0.2 0.080** 0.069 0.16 0.068
cons -2.5 1.124 0.113 -2.977 0.121 -2.609 0.118
cutA4 age 0.05 0.056 0.002 0.05** 0.001
sex 0.005 0.097* 0.039 0.029 0.049
country 0.2 0.207 0.043 0.114* 0.052
cons -3.5 2.298 0.122 -4.042 0.108 -3.543 0.107
cutB1 age -0.028 0.001
sex -0.035** 0.043
country 1.645 0.05
cons -0.837 0.107 2.81 0.109 -0.103 0.016
cutB2 age -0.047 0.003
sex 0.078** 0.054
country 0.633 0.058
cons -.129 0.106 -0.325 0.105 -0.523 0.06
cutB3 age 0.011 0.002
sex 0.055** 0.049
country 0.235 0.052
cons 0.471 0.111 -2.164 0.088 0.587 0.07
cutB4 age 0.049 0.001
sex -0.116* 0.046
country 0.119* 0.046
cons 1.071 0.113 -4.256 0.103 -0.831 0.051
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
Table 7. Mean categorical responses for self-reports and vignettes in cognition domain of
SAGE dataset.

| Country | Remembering: Self | Vig1 | Vig2 | Vig3 | Vig4 | Vig5 | Learning: Self | Vig1 | Vig2 | Vig3 | Vig4 | Vig5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | 4.42 | 4.87 | 4.39 | 3.59 | 3.08 | 1.8 | 4.22 | 4.87 | 4.46 | 3.44 | 3.02 | 1.7 |
| India | 4.09 | 4 | 3.73 | 3.27 | 2.81 | 2.85 | 3.92 | 3.92 | 3.67 | 3.1 | 2.68 | 2.71 |
Table 8. Estimation results for SAGE dataset: BIOPROBIT, CHOPIT, and BIHOPIT
models.
Variable Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
param. std. err. param. std. err. |u|-value param. std. err.
vigA 1 2.692 0.0222 2.537 0.0158
2 1.998 0.0206 1.935 0.0147
3 1.235 0.0196 1.127 0.0139
4 0.760 0.0192 0.701 0.0135
selfA age -0.0277 0.00057 -0.0292 0.00075 1.592 -0.0293 0.00114
sex -0.1537 0.0154 -0.1452 0.0204 0.332 -0.1254 0.03086
educ 0.1579 0.005 0.1724 0.0066 1.751 0.1719 0.01003
country -0.6309 0.0162 -0.4522 0.0215 6.638 -0.2553 0.03229
cons 4.0664 0.0715 3.7522 0.10418
vigB 1 2.727 0.0224
2 2.111 0.0208
3 1.199 0.0195
4 0.803 0.0192
selfB age -0.0294 0.00056 -0.032 0.00076 2.754
sex -0.1311 0.015 -0.1127 0.0205 0.724
educ 0.1794 0.0049 0.1877 0.0067 0.999
country -0.5083 0.0158 -0.3173 0.0215 7.159
cons 3.9505 0.0715
seA 1.0982 0.0084 0.5601
seB 1.1588 0.0079 0.9242
corr 0.7749 0.0036 0.8504 0.0057
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
Table 9. Cutpoint estimation for SAGE dataset: BIOPROBIT, BIHOPIT, and CHOPIT
models.
Variable Est. BIOPROBIT Est. BIHOPIT Est. CHOPIT
param. std. err. param. std. err. param. std. err.
cutA1 age 0.00098** 0.00077 -0.00034** 0.0006
sex 0.01737** 0.02245 0.04937 0.0166
educ 0.01227* 0.00705 0.02485 0.0053
country 0.26072 0.02416 -0.0862 0.01835
cons -4.681 0.0627 -0.89106 0.07153 -0.7713 0.0551
cutA2 age -0.00008** 0.00063 0.0011** 0.0005
sex 0.0227** 0.01988 0.0013** 0.0159
educ -0.00487** 0.00607 -0.0169 0.005
country 0.21691 0.02208 0.4715 0.0177
cons -3.622 0.0548 -0.09203** 0.06133 -0.2837 0.0523
cutA3 age 0.00007** 0.00053 0.00095** 0.0004
sex -0.05411 0.01558 -0.0513 0.0139
educ 0.00584** 0.00493 -0.0029** 0.0044
country -0.18954 0.01671 0.046958 0.0144
cons -2.823 0.053 0.01456** 0.04984 -0.1409 0.0459
cutA4 age 0.00159 0.00046 0.0013 0.0005
sex -0.01338** 0.01264 -0.0335 0.0131
educ -0.01247 0.00403 -0.0219 0.0042
country -0.165 0.01356 -0.0656 0.0139
cons -1.85 0.0518 0.02177** 0.04172 0.0189** 0.0433
cutB1 age 0.00267 0.00075
sex 0.05574 0.02095
educ 0.0154* 0.00688
country 0.44305 0.02138
cons -4.098 0.0557 -0.9508 0.07027 0.1886 0.01517
cutB2 age -0.0017* 0.0007
sex -0.03334** 0.01963
educ -0.0199 0.00669
country 0.09922 0.0199
cons -3.276 0.0526 0.11744** 0.0666 -0.0169** 0.0143
cutB3 age 0.00009** 0.00056
sex 0.00486** 0.01515
educ 0.00908** 0.00501
country -0.21291 0.01552
cons -2.541 0.0512 -0.0805** 0.0515 0.0129** 0.0128
cutB4 age -0.0002** 0.0005
sex -0.04039 0.01329
educ -0.0179 0.00436
country -0.24242 0.01413
cons -1.634 0.0504 0.12725 0.04467 -0.1163 0.012
** denotes parameters that are not significant at the 95% level; * denotes
parameters that are significant at the 95% level but not at the 99% level.
Table 10. The BIOPROBIT, BIHOPIT, and posterior BIHOPIT scores for China and
India for ages 50 and above.
Age China India
Obs Bioprob Bihopit Post. Bih Obs Bioprob Bihopit Post. Bih
50 111 2.815 2.764 2.669 113 1.987 2.104 2.141
51 138 2.747 2.692 2.548 70 2.014 2.133 2.103
52 156 2.726 2.668 2.695 90 1.999 2.117 2.07
53 135 2.686 2.626 2.513 55 1.97 2.085 2.121
54 150 2.594 2.528 2.468 58 1.898 2.007 2.017
55 172 2.565 2.497 2.506 141 1.81 1.915 1.896
56 141 2.548 2.477 2.637 48 1.947 2.06 1.807
57 165 2.524 2.453 2.413 36 1.859 1.964 2.101
58 123 2.453 2.377 2.524 74 1.812 1.915 2.094
59 122 2.429 2.353 2.514 34 1.828 1.932 1.966
60 153 2.428 2.349 2.290 137 1.661 1.754 1.732
61 96 2.415 2.336 2.303 20 1.912 2.02 1.729
62 85 2.327 2.243 2.18 41 1.676 1.769 1.711
63 128 2.315 2.231 2.228 52 1.65 1.742 1.839
64 77 2.307 2.221 2.069 35 1.655 1.747 2.088
65 94 2.237 2.145 2.075 152 1.54 1.624 1.734
66 78 2.278 2.188 2.082 23 1.641 1.728 2.058
67 100 2.221 2.128 1.988 21 1.668 1.758 1.819
68 75 2.208 2.115 2.174 52 1.478 1.554 1.639
69 87 2.119 2.019 1.924 25 1.517 1.595 1.806
70 86 2.059 1.956 2.055 90 1.364 1.435 1.58
71 71 2.073 1.969 1.992 9 1.393 1.465 1.374
72 96 2.043 1.938 1.932 22 1.498 1.577 1.218
73 70 1.983 1.873 1.608 25 1.418 1.491 1.837
74 74 1.945 1.833 1.827 22 1.361 1.427 1.194
75 79 1.961 1.847 1.866 45 1.248 1.308 1.164
76 55 1.775 1.652 1.668 4 1.422 1.485 2.039
77 56 1.815 1.694 1.772 3 1.716 1.799 1.818
78 43 1.807 1.682 1.862 15 1.179 1.234 1.135
79 43 1.729 1.601 1.766 5 1.389 1.459 1.952
80 27 1.715 1.587 1.274 29 1.008 1.053 1.161
81 34 1.633 1.5 1.308 2 1.068 1.107 0.177
82 27 1.541 1.401 1.263 4 1.046 1.089 1.162
83 17 1.57 1.431 1.391 7 1.034 1.08 0.668
84 9 1.576 1.438 1.267 4 1.116 1.163 1.093
85 4 1.867 1.742 2.299 14 0.85 0.889 0.949
86 11 1.701 1.571 1.19 3 1.075 1.121 0.78
87 11 1.463 1.321 1.683 2 0.67 0.704 1.322
88 8 1.719 1.583 1.822 3 0.81 0.854 0.403
89 8 1.507 1.362 1.311 4 0.613 0.643 -0.333
90 3 1.154 0.997 0.871 5 0.642 0.664 0.484
Supporting Information
Text S1
Maximum likelihood estimation in bivariate models
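In this supplement we describe the full-information maximum likelihood (FIML)
estimation of the parameters of the bivariate models applied in this paper. Define the event
that the answers of the $i$-th respondent are $k$ and $\ell$ for the two self-report questions, i.e.,

$$E^{i}_{k\ell} = \{y_{i1} = k,\ y_{i2} = \ell\} = \{\tau^{1}_{k-1} < y^{*}_{i1} \le \tau^{1}_{k},\ \tau^{2}_{\ell-1} < y^{*}_{i2} \le \tau^{2}_{\ell}\}.$$

If the error terms in (1) satisfy $\varepsilon_1 \sim N(0,\sigma_1^2)$ and $\varepsilon_2 \sim N(0,\sigma_2^2)$ and they are correlated with
correlation coefficient $\rho$, then the probability of this event is given by

$$
\begin{aligned}
\Pr(E^{i}_{k\ell}) ={}& \Phi_2\!\left(\frac{\tau^{1}_{k}-x_i^{T}\beta_1}{\sigma_1},\frac{\tau^{2}_{\ell}-x_i^{T}\beta_2}{\sigma_2},\rho\right)
-\Phi_2\!\left(\frac{\tau^{1}_{k-1}-x_i^{T}\beta_1}{\sigma_1},\frac{\tau^{2}_{\ell}-x_i^{T}\beta_2}{\sigma_2},\rho\right)\\
&-\Phi_2\!\left(\frac{\tau^{1}_{k}-x_i^{T}\beta_1}{\sigma_1},\frac{\tau^{2}_{\ell-1}-x_i^{T}\beta_2}{\sigma_2},\rho\right)
+\Phi_2\!\left(\frac{\tau^{1}_{k-1}-x_i^{T}\beta_1}{\sigma_1},\frac{\tau^{2}_{\ell-1}-x_i^{T}\beta_2}{\sigma_2},\rho\right),
\end{aligned}
$$

where $\Phi_2$ denotes the cumulative distribution function of the standard bivariate normal
distribution with correlation coefficient $\rho$. The log-likelihood of the BIOPROBIT
model is:

$$\ln L_{self} = \sum_{i=1}^{N}\sum_{k,\ell=1}^{K} I(y_{i1}=k,\ y_{i2}=\ell)\,\ln \Pr(E^{i}_{k\ell}) \qquad (8)$$

under the restriction that $\sigma_1=\sigma_2=1$, where $I(E)$ denotes the indicator function of an
event $E$. Thus, the parameters of the BIOPROBIT model are $\beta_1, \beta_2$, $\tau^{1}_{k}, \tau^{2}_{k}$, $k=1,\ldots,K-1$,
and $\rho$.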
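The maximum likelihood estimation in the BIHOPIT model is much more complicated,
but it is a straightforward generalization of the estimation in the BIOPROBIT model. In order to
incorporate information on vignette ratings and the two self-report questions, there are
two components to the log-likelihood: the first component refers to estimation of cut-points
using responses to vignettes, and the second component utilizes responses on the
self-report questions. In this case the cut-points depend on the covariates, thus we define the
event $E^{i}_{k\ell}$ as before, replacing the fixed cut-points by the individual-dependent ones $\tau^{1}_{i,k}, \tau^{2}_{i,k}$.
Then the self-report part of the log-likelihood is defined by (8). Moreover, define the
event that the responses of the $i$-th respondent are $j$ and $k$ for the $\ell$-th pair of vignettes,
i.e.,

$$F^{i\ell}_{jk} = \{y^{v}_{i\ell 1} = j,\ y^{v}_{i\ell 2} = k\} = \{\tau^{1}_{i,j-1} < y^{v*}_{i\ell 1} \le \tau^{1}_{i,j},\ \tau^{2}_{i,k-1} < y^{v*}_{i\ell 2} \le \tau^{2}_{i,k}\}.$$

Then the probability of this event is given by

$$
\begin{aligned}
\Pr(F^{i\ell}_{jk}) ={}& \Phi_2\!\left(\tau^{1}_{i,j}-\theta^{1}_{\ell},\ \tau^{2}_{i,k}-\theta^{2}_{\ell},\ \rho\right)
-\Phi_2\!\left(\tau^{1}_{i,j-1}-\theta^{1}_{\ell},\ \tau^{2}_{i,k}-\theta^{2}_{\ell},\ \rho\right)\\
&-\Phi_2\!\left(\tau^{1}_{i,j}-\theta^{1}_{\ell},\ \tau^{2}_{i,k-1}-\theta^{2}_{\ell},\ \rho\right)
+\Phi_2\!\left(\tau^{1}_{i,j-1}-\theta^{1}_{\ell},\ \tau^{2}_{i,k-1}-\theta^{2}_{\ell},\ \rho\right).
\end{aligned}
$$

The vignette part of the log-likelihood is defined by

$$\ln L_{vig} = \sum_{i=1}^{N}\sum_{\ell=1}^{L}\sum_{j,k=1}^{K} I(y^{v}_{i\ell 1}=j,\ y^{v}_{i\ell 2}=k)\,\ln \Pr(F^{i\ell}_{jk}).$$

The overall log-likelihood is given by

$$\ln L = \ln L_{self} + \ln L_{vig}. \qquad (9)$$

Here we remark that in the definition of the self-report part of the log-likelihood we drop
the constraint made for $\sigma_1, \sigma_2$ in the BIOPROBIT case. Thus, the parameters of the
BIHOPIT model are the self-report regression coefficients $\beta_1, \beta_2$, the vignette means
$\theta^{1}_{\ell}, \theta^{2}_{\ell}$, $\ell=1,2,\ldots,L$, the cut-point coefficients $\gamma^{1}_{k}, \gamma^{2}_{k}$, $k=1,\ldots,K-1$, the self-report
variances $\sigma_1, \sigma_2$ and the common correlation coefficient $\rho$.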
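The maximum likelihood estimation for the CHOPIT model is derived in King et al. []. Here
we note that the likelihood for the self-report part is defined by

$$L_{self} = \prod_{i=1}^{N}\int_{-\infty}^{\infty}\prod_{j=1}^{M}\prod_{k=1}^{K}\left[\Phi\!\left(\tau^{j}_{i,k}-x_i^{T}\beta-\eta\right)-\Phi\!\left(\tau^{j}_{i,k-1}-x_i^{T}\beta-\eta\right)\right]^{I(y_{ij}=k)}\frac{1}{\omega}\,\varphi\!\left(\frac{\eta}{\omega}\right)d\eta$$

and for the vignette part it is defined by

$$L_{vig} = \prod_{i=1}^{N}\prod_{j=1}^{M}\prod_{k=1}^{K}\prod_{\ell=1}^{L}\left[\Phi\!\left(\tau^{j}_{i,k}-\theta^{j}_{\ell}\right)-\Phi\!\left(\tau^{j}_{i,k-1}-\theta^{j}_{\ell}\right)\right]^{I(y^{v}_{i\ell j}=k)},$$

where $\eta$ denotes the respondent-level random effect with standard deviation $\omega$.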
Since the self-report part of the likelihood involves one-dimensional integrals, we cannot
maximize it in the standard way. One approach is to approximate these integrals by
Gauss-Hermite quadrature; the other is the method of maximum simulated likelihood
(MSL), see Greene [, Section 21.5.1].
These log-likelihoods are maximized by using Stata's modified Newton-Raphson
(NR) procedure. Let $\vartheta$ denote the vector of all parameters in the models,
and denote the gradient and the Hessian of the log-likelihood by $g$ and $H$. This procedure can be summarized
as follows.
1. Start with a guess $\vartheta_0$.
2. Calculate a direction vector $d = -H(\vartheta_i)^{-1} g(\vartheta_i)$ for the $i$-th iteration.
3. Calculate a new guess $\vartheta_{i+1} = \vartheta_i + s\,d$, where $s$ is a scalar defined by the
following algorithm
a. Start with $s = 1$.
b. If $\ln L(\vartheta_i + d) > \ln L(\vartheta_i)$ then try $s = 2$. If $\ln L(\vartheta_i + 2d) > \ln L(\vartheta_i + d)$ then try
$s = 3$ and so on.
c. If $\ln L(\vartheta_i + d) \le \ln L(\vartheta_i)$ then back up and try $s = 0.5$. If $\ln L(\vartheta_i + 0.5d) \le \ln L(\vartheta_i)$
then back up and try $s = 0.25$ and so on.
4. Go to step 2 and repeat.
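The step-length rule in steps 3a-3c can be sketched in a few lines of Python. This is only an illustration of the listed steps, not Stata's actual implementation: the function names, the stopping tolerance, the cap on the number of step trials, and the toy objective (which stands in for a log-likelihood) are all assumptions made for the example.

```python
import numpy as np


def step_length(f, theta, d, max_tries=30):
    """Pick the scalar s of steps 3a-3c for direction d at the point theta."""
    s = 1.0                                        # 3a: start with s = 1
    if f(theta + s * d) > f(theta):
        # 3b: keep stretching the step while the objective still improves
        while s < max_tries and f(theta + (s + 1.0) * d) > f(theta + s * d):
            s += 1.0
        return s
    # 3c: back up by halving until the objective improves
    for _ in range(max_tries):
        s *= 0.5
        if f(theta + s * d) > f(theta):
            return s
    return 0.0                                     # no improving step found


def modified_newton_raphson(f, grad, hess, theta0, tol=1e-10, max_iter=100):
    """Maximize f following steps 1-4 (invertible Hessian assumed)."""
    theta = np.asarray(theta0, dtype=float)        # step 1: initial guess
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(hess(theta), g)       # step 2: d = -H^{-1} g
        s = step_length(f, theta, d)               # step 3: choose s
        if s == 0.0:
            break
        theta = theta + s * d                      # step 4: repeat
    return theta


# Toy concave objective with its maximum at (2, -0.5).
def f(t):
    return -np.cosh(t[0] - 2.0) - (t[1] + 0.5) ** 2


def grad(t):
    return np.array([-np.sinh(t[0] - 2.0), -2.0 * (t[1] + 0.5)])


def hess(t):
    return np.diag([-np.cosh(t[0] - 2.0), -2.0])


theta_hat = modified_newton_raphson(f, grad, hess, [0.0, 0.0])
print(theta_hat)
```

The sketch stops when no improving step is found instead of switching to another algorithm, so it does not cover the non-invertible Hessian case.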
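If the Hessian is not invertible then the modified Marquardt algorithm is applied, see
Gould et al. []. In the computation of the gradient and Hessian of the log-likelihoods we
need the first and second derivatives of the cumulative distribution function $\Phi_2$ of the
standard bivariate normal distribution defined by

$$\Phi_2(x_1,x_2,\rho) := \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\varphi_2(u_1,u_2,\rho)\,du_2\,du_1,$$

where $\varphi_2$ denotes the bivariate standard normal probability density with correlation
coefficient $\rho$ defined by

$$\varphi_2(x_1,x_2,\rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\!\left(-\frac{x_1^2-2\rho x_1x_2+x_2^2}{2(1-\rho^2)}\right).$$

By easy algebra we have

$$\Phi_2(x_1,x_2,\rho) = \int_{-\infty}^{x_1}\varphi(u)\,\Phi\!\left(\frac{x_2-\rho u}{\sqrt{1-\rho^2}}\right)du,$$

where $\Phi$ and $\varphi$ denote the cumulative distribution function and the probability density function of the standard
normal distribution, respectively. Thus, we have for the first derivatives of $\Phi_2$ that

$$\frac{\partial\Phi_2}{\partial x_1} = \varphi(x_1)\,\Phi\!\left(\frac{x_2-\rho x_1}{\sqrt{1-\rho^2}}\right),\qquad
\frac{\partial\Phi_2}{\partial x_2} = \varphi(x_2)\,\Phi\!\left(\frac{x_1-\rho x_2}{\sqrt{1-\rho^2}}\right),\qquad
\frac{\partial\Phi_2}{\partial\rho} = \varphi_2(x_1,x_2,\rho).$$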
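For the second derivatives of $\Phi_2$ we obtain, see e.g. Weiss [, Appendix A],

$$\frac{\partial^2\Phi_2}{\partial x_1^2} = -x_1\,\varphi(x_1)\,\Phi\!\left(\frac{x_2-\rho x_1}{\sqrt{1-\rho^2}}\right)-\rho\,\varphi_2(x_1,x_2,\rho),$$

$$\frac{\partial^2\Phi_2}{\partial x_2^2} = -x_2\,\varphi(x_2)\,\Phi\!\left(\frac{x_1-\rho x_2}{\sqrt{1-\rho^2}}\right)-\rho\,\varphi_2(x_1,x_2,\rho),$$

$$\frac{\partial^2\Phi_2}{\partial x_1\,\partial x_2} = \varphi_2(x_1,x_2,\rho),$$

$$\frac{\partial^2\Phi_2}{\partial x_1\,\partial\rho} = \varphi_2(x_1,x_2,\rho)\,\frac{\rho x_2-x_1}{1-\rho^2},\qquad
\frac{\partial^2\Phi_2}{\partial x_2\,\partial\rho} = \varphi_2(x_1,x_2,\rho)\,\frac{\rho x_1-x_2}{1-\rho^2},$$

$$\frac{\partial^2\Phi_2}{\partial\rho^2} = \varphi_2(x_1,x_2,\rho)\left[\frac{(1+\rho^2)\,x_1x_2-\rho\,(x_1^2+x_2^2)}{(1-\rho^2)^2}+\frac{\rho}{1-\rho^2}\right].$$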
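The derivative formulas can be sanity-checked numerically. The sketch below evaluates the standard bivariate normal distribution function through its one-dimensional integral representation, using Gauss-Legendre quadrature on a truncated interval, and compares central finite differences with the closed forms. The function names, the truncation point -10, the number of quadrature nodes and the test point are all assumptions chosen for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def bvn_pdf(x1, x2, rho):
    """Standard bivariate normal density with correlation rho."""
    q = x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2
    return math.exp(-q / (2.0 * (1.0 - rho * rho))) / (
        2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def bvn_cdf(x1, x2, rho, n_nodes=200, lower=-10.0):
    """Bivariate normal CDF via the one-dimensional integral over u <= x1."""
    if x1 <= lower:
        return 0.0
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    half = 0.5 * (x1 - lower)
    u = half * nodes + 0.5 * (x1 + lower)          # map [-1, 1] to [lower, x1]
    integrand = norm_pdf(u) * norm_cdf((x2 - rho * u) / math.sqrt(1.0 - rho * rho))
    return float(half * np.sum(weights * integrand))


x1, x2, rho, h = 0.3, -0.2, 0.5, 1e-5
s = math.sqrt(1.0 - rho * rho)

# First derivative with respect to x1: finite difference vs closed form.
fd_x1 = (bvn_cdf(x1 + h, x2, rho) - bvn_cdf(x1 - h, x2, rho)) / (2.0 * h)
an_x1 = float(norm_pdf(x1) * norm_cdf((x2 - rho * x1) / s))

# First derivative with respect to rho equals the bivariate density.
fd_rho = (bvn_cdf(x1, x2, rho + h) - bvn_cdf(x1, x2, rho - h)) / (2.0 * h)
an_rho = bvn_pdf(x1, x2, rho)

# Second derivative with respect to rho, i.e. the rho-derivative of the density.
fd_rho2 = (bvn_pdf(x1, x2, rho + h) - bvn_pdf(x1, x2, rho - h)) / (2.0 * h)
an_rho2 = bvn_pdf(x1, x2, rho) * (
    ((1.0 + rho * rho) * x1 * x2 - rho * (x1 * x1 + x2 * x2)) / (1.0 - rho * rho) ** 2
    + rho / (1.0 - rho * rho))

print(fd_x1 - an_x1, fd_rho - an_rho, fd_rho2 - an_rho2)
```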
Text S2
Posterior prediction by conditional means
In this supplement a formula is derived for computing the conditional mean (7) in the
context of the BIHOPIT model. We define the conditional means of the two latent variables
$y^{*}_1, y^{*}_2$ with respect to the observed responses $y_1, y_2$ as the conditional expectations
$E(y^{*}_1 \mid y_1, y_2)$ and $E(y^{*}_2 \mid y_1, y_2)$. By definition of the conditional expectation these are
$\{1,\ldots,K\}^2 \to \mathbb{R}$ functions. The general formula for conditional expectation is

$$E(\xi \mid A) = \frac{E(\xi\, I(A))}{P(A)},$$

where $\xi$ is a random variable and $A$ is an event, see formula (6.1.3) in Laha and Rohatgi
[]. Let us apply this formula with the choice $\xi := y^{*}_1$ (or $y^{*}_2$) and $A := \{y_1 = k,\ y_2 = \ell\}$.
Using latent variables the event $A$ can be expressed in the
form $A = \{\tau^{1}_{k-1} < y^{*}_1 \le \tau^{1}_{k},\ \tau^{2}_{\ell-1} < y^{*}_2 \le \tau^{2}_{\ell}\}$. For the sake of simplicity we suppose that
$(y^{*}_1, y^{*}_2)$ has standard bivariate normal distribution with correlation coefficient $\rho$. Then
the conditional expectation $E(y^{*}_1 \mid y_1 = k, y_2 = \ell)$ can be expressed as the ratio $A/B$, where

$$A := \int_{\tau^{1}_{k-1}}^{\tau^{1}_{k}}\int_{\tau^{2}_{\ell-1}}^{\tau^{2}_{\ell}} x_1\,\varphi_2(x_1,x_2,\rho)\,dx_2\,dx_1$$

and

$$B := \Pr(y_1 = k,\ y_2 = \ell) = \int_{\tau^{1}_{k-1}}^{\tau^{1}_{k}}\int_{\tau^{2}_{\ell-1}}^{\tau^{2}_{\ell}} \varphi_2(x_1,x_2,\rho)\,dx_2\,dx_1.$$
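Define the function $\Psi$ as

$$\Psi(\tau_1,\tau_2,\rho) := \int_{-\infty}^{\tau_1}\int_{-\infty}^{\tau_2} x_1\,\varphi_2(x_1,x_2,\rho)\,dx_2\,dx_1.$$

By easy algebra we have

$$\Psi(\tau_1,\tau_2,\rho) = \int_{-\infty}^{\tau_1} x\,\varphi(x)\,\Phi\!\left(\frac{\tau_2-\rho x}{\sqrt{1-\rho^2}}\right)dx.$$

By integrating by parts we obtain

$$\Psi(\tau_1,\tau_2,\rho) = -\varphi(\tau_1)\,\Phi\!\left(\frac{\tau_2-\rho\tau_1}{\sqrt{1-\rho^2}}\right)-\rho\,\varphi(\tau_2)\,\Phi\!\left(\frac{\tau_1-\rho\tau_2}{\sqrt{1-\rho^2}}\right),$$

and the numerator $A$ can be computed as

$$A = \Psi(\tau^{1}_{k},\tau^{2}_{\ell},\rho)-\Psi(\tau^{1}_{k-1},\tau^{2}_{\ell},\rho)-\Psi(\tau^{1}_{k},\tau^{2}_{\ell-1},\rho)+\Psi(\tau^{1}_{k-1},\tau^{2}_{\ell-1},\rho).$$

The denominator $B$ is given by a similar formula

$$B = \Phi_2(\tau^{1}_{k},\tau^{2}_{\ell},\rho)-\Phi_2(\tau^{1}_{k-1},\tau^{2}_{\ell},\rho)-\Phi_2(\tau^{1}_{k},\tau^{2}_{\ell-1},\rho)+\Phi_2(\tau^{1}_{k-1},\tau^{2}_{\ell-1},\rho).$$

In the general case $y^{*}_1 \sim N(x^{T}\beta_1, \sigma_1^2)$ and $y^{*}_2 \sim N(x^{T}\beta_2, \sigma_2^2)$ with correlation coefficient
$\rho$. By standardization the conditional means can be expressed as

$$E(y^{*}_1 \mid y_1 = k, y_2 = \ell) = x^{T}\beta_1 + \sigma_1\frac{A_1}{C}\quad\text{and}\quad E(y^{*}_2 \mid y_1 = k, y_2 = \ell) = x^{T}\beta_2 + \sigma_2\frac{A_2}{C},$$

where $A_1$ and $A_2$ are computed similarly to $A$, replacing the cut-points by their
standardized ones $\tilde\tau^{i}_{k} = (\tau^{i}_{k} - x^{T}\beta_i)/\sigma_i$, $i = 1,2$, and interchanging the two kinds of cut-points
$\tau^{1}, \tau^{2}$ for $A_2$. Finally, $C$ is defined similarly to $B$ using again the standardized
cut-points in the formula.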
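A minimal Python sketch of this posterior mean follows. It uses the closed form obtained by integration by parts for the numerator and, purely as an assumption of the example, a Gauss-Legendre tensor rule for the rectangle probability in the denominator. All function names are hypothetical, and the cut-points are taken to be finite (in practice the outermost, infinite cut-points would have to be replaced by large finite values).

```python
import math
import numpy as np


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def psi(t1, t2, rho):
    """Closed form of the partial first moment obtained by integration by parts."""
    s = math.sqrt(1.0 - rho * rho)
    return (-phi(t1) * Phi((t2 - rho * t1) / s)
            - rho * phi(t2) * Phi((t1 - rho * t2) / s))


def rectangle(fun, lo1, hi1, lo2, hi2, rho):
    """Inclusion-exclusion over the four corners of the cut-point rectangle."""
    return (fun(hi1, hi2, rho) - fun(lo1, hi2, rho)
            - fun(hi1, lo2, rho) + fun(lo1, lo2, rho))


def rectangle_probability(lo1, hi1, lo2, hi2, rho, n=64):
    """Probability of the rectangle under the standard bivariate normal."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x1 = 0.5 * (hi1 - lo1) * nodes + 0.5 * (hi1 + lo1)
    x2 = 0.5 * (hi2 - lo2) * nodes + 0.5 * (hi2 + lo2)
    g1, g2 = np.meshgrid(x1, x2, indexing="ij")
    dens = np.exp(-(g1 ** 2 - 2.0 * rho * g1 * g2 + g2 ** 2)
                  / (2.0 * (1.0 - rho ** 2))) / (2.0 * np.pi * np.sqrt(1.0 - rho ** 2))
    w = np.outer(weights, weights) * 0.25 * (hi1 - lo1) * (hi2 - lo2)
    return float(np.sum(w * dens))


def conditional_mean_first(lo1, hi1, lo2, hi2, mu1, sigma1, mu2, sigma2, rho):
    """Mean of the first latent variable given both fall between their cut-points."""
    a1, b1 = (lo1 - mu1) / sigma1, (hi1 - mu1) / sigma1   # standardized cut-points
    a2, b2 = (lo2 - mu2) / sigma2, (hi2 - mu2) / sigma2
    numerator = rectangle(psi, a1, b1, a2, b2, rho)
    denominator = rectangle_probability(a1, b1, a2, b2, rho)
    return mu1 + sigma1 * numerator / denominator


# Example with illustrative values for the cut-points, means and variances.
m1 = conditional_mean_first(1.0, 2.5, 0.0, 3.0, 1.2, 0.8, 2.0, 1.5, 0.7)
print(m1)
```

The conditional mean of the second latent variable is obtained from the same function by swapping the roles of the two variables, i.e. calling it as `conditional_mean_first(lo2, hi2, lo1, hi1, mu2, sigma2, mu1, sigma1, rho)`.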
Text S3
Simulation study design
Three independent covariates (age, sex, and country) were generated by simple Monte
Carlo simulation using uniform pseudo-random numbers. Sex and country were assumed to
follow a Bernoulli distribution with mean 0.5. The continuous covariate age was generated as a
truncated normal random number on the interval (18,100) with mean 35 and standard
deviation 25; the values of age were then rounded to integers.
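The covariate design can be sketched as follows. The text does not say how the truncated normal draws were produced, so the rejection sampling used here, the NumPy generator, the seed and the function name `simulate_covariates` are all assumptions of the example.

```python
import numpy as np


def simulate_covariates(n, seed=0):
    """Draw age, sex and country for n simulated respondents."""
    rng = np.random.default_rng(seed)
    sex = rng.integers(0, 2, size=n)        # Bernoulli with mean 0.5
    country = rng.integers(0, 2, size=n)    # Bernoulli with mean 0.5

    # Truncated normal on (18, 100) with mean 35 and standard deviation 25,
    # obtained here by rejecting draws that fall outside the interval.
    age = np.empty(n)
    filled = 0
    while filled < n:
        draw = rng.normal(35.0, 25.0, size=n)
        keep = draw[(draw > 18.0) & (draw < 100.0)]
        take = min(n - filled, keep.size)
        age[filled:filled + take] = keep[:take]
        filled += take

    return np.rint(age).astype(int), sex, country   # age rounded to integers


age, sex, country = simulate_covariates(1000, seed=1)
print(age.min(), age.max(), sex.mean(), country.mean())
```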
The latent variables of the simulated CHOPIT model were generated by linear equation
(7), where the random effect and the error term are simulated independently and the
covariates are age, sex, and country with appropriate parameters. The latent vignette
variables were generated by equation (5), where the error terms are assumed to be
independent of each other. The cut-point variables were generated recursively by
equation (3) using exponential parameterization to ensure the ordering of the
cut-points. Finally, the ‘observed’ self-report and vignette variables were generated by
discretizing the appropriate continuous ones using equations (4) and (6).
The two latent continuous variables of the simulated BIHOPIT model were generated by
linear equation (1), where the three covariates are age, sex, and country and the error
terms are correlated. Similarly, the latent vignette variables were given by equation (5)
with constant vignette means as parameters, where the error terms are correlated with the
same correlation coefficient as in the self-report part. The cut-point variables were again generated
by equation (3). Then, the discrete responses were again given by equations (4) and
(6).
Figure Supplement
Figure 10. Predicted versus true cut-points for domain A using BIHOPIT model based on
BIHOPIT scenario.
Figure 11. Predicted versus true cut-points for domain B using BIHOPIT model based on
BIHOPIT scenario.
Figure 12. Predicted versus true cut-points for domain A using CHOPIT model based on
BIHOPIT scenario.
Figure 13. Predicted versus true cut-points for domain B using CHOPIT model based on
BIHOPIT scenario.
Figure 14. Predicted versus true cut-points for domain A using BIHOPIT model based on
CHOPIT scenario.
Figure 15. Predicted versus true cut-points for domain B using BIHOPIT model based on
CHOPIT scenario.
Figure 16. Predicted versus true cut-points for domain A using CHOPIT model based on
CHOPIT scenario.
Figure 17. Predicted versus true cut-points for domain B using CHOPIT model based on
CHOPIT scenario.