
WHO Multi-Country Studies unit Working Paper 4

Self-reported health and anchoring vignettes in SAGE Wave 1: Applying the

bivariate hierarchical ordered probit model and anchoring vignette methodologies

in SAGE to improve cross-country comparability of self-reported health.

Márton Ispány, Emese Verdes, Ajay Tandon, Somnath Chatterji

March 2012

Introduction

In much social science research and in many econometric applications, data that arise through

measurement of discrete outcomes or discrete choice among a set of alternatives are in

the form of ordinal or ordered categorical data. Examples include, among others, self-report

responses in household surveys, models of labor force participation, and decisions about

which product to choose or which candidate to elect. To analyze such data a number of

discrete response (choice) models have been developed in econometric theory, see

Greene [,]. A simple one of these kinds of models is the class of ordered univariate

response models where the number of categories of the dependent variable is greater than

two, i.e. there are several possible outcomes or choices and they are ordered according to

the preferences of the respondent. The same data structure also arises in the analysis of

repeated measurements, where the response of each respondent, experimental unit or

subject is observed on multiple occasions to record the level of a specific event.

Responses of this type are known as multivariate or correlated categorical responses. The

theory of univariate ordered models is relatively well developed and they have been

applied extensively in biostatistics, economics, political science and sociology, whereas

estimation of the joint probability distribution of two or more ordered categorical

variables is less common in the literature. The relatively new bivariate (multivariate)

ordered probit (BIOPROBIT) models can be treated as an extension of a standard

bivariate (multivariate) probit model when the number of categories of the dependent

variables is greater than two. The estimation procedures for the BIOPROBIT model and their

statistical properties are studied in Greene [, Section 11.5.2], Sajaia [] and others. Some

of the applications of BIOPROBIT models are modeling educational level of married

couples (Magee et al. []), educational attainment in France and Germany (Lauer []),

family size (Calhoun [,]), fertility outcome and fertility motivations of Danish twins

(Kohler and Rodgers []), analyzing ownership of cats and dogs or dogs and televisions

(Butler and Chatterjee [,]), and household-level decision between number of seasonal

tickets and number of cars (Scott and Axhausen []).

This paper puts forth a new approach to modeling correlated categorical responses in

heterogeneous populations where the subjective scale varies across

different segments of the population, such as countries. We utilize a vignette

methodology to evaluate and correct subjective correlated responses. This methodology


is widely applied in many economic applications with subjective scales, e.g. health,

health care, school community strength, HIV risk, state effectiveness, and corruption (see

for example, http://gking.harvard.edu/vign/eg/). WHO’s Study on global AGEing and

adult health (SAGE) used a number of methods to improve the reliability, validity and

comparability of its self-reported health measures, including the use of anchoring

vignettes. The anchoring vignette technique presents the respondent with a set of

hypothetical stories, which are rated with the same questions and response categories as are used for

self-assessment of health. The vignettes are used to fix the level of ability on a given

health domain to better distinguish between differences in self-ratings due to actual health

differences and those due to varying norms or expectations for health (Hopkins and

King, 2010; Salomon, 2004).

Measuring the health state of individuals is important for evaluating health and

social policies and for monitoring and measuring the health of populations. Self-reported health is

a common method for assessing health status in household health surveys, and

single-question versions of self-reported health predict a range of health outcomes from

disability to death (Cutler 2009; Singh-Manoux et al., Psychosomatic Medicine

2007;69:138-43).

Methods

The models

Categorical data on health are usually described by discrete choice latent variable models,

by assuming that the observed categorical variables, e.g. the self-report health responses,

are discrete transformations of an underlying, unobservable, and continuous true level of

health. For a detailed introduction to discrete choice models see Chapter 21 of Greene []

and a recent survey of Greene [, Chapter 11]. If this discrete transformation is constant

across individuals then we say that reporting behavior is homogeneous.

By contrast, reporting heterogeneity means that the mappings between the

latent variables and observed categorical variables are different for various categories of

respondents. In this paper, we consider the multivariate, especially the bivariate case, i.e.

we allow more than one categorical response for each individual.

Let $y_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, M$, be a self-reported categorical health measure, where $i$ and $j$ refer to the respondent and the question number, respectively, and $N$ and $M$ denote the number of respondents and questions, respectively. The latent variable models assume that there is an unobserved continuous latent variable $y^*_{ij}$ for the $i$-th respondent at the $j$-th question. These latent variables are supposed to depend on observable covariates and they are modeled by latent equations. Using the linear regression model as one of the simplest ones, $y^*_{ij}$ is specified as

$$y^*_{ij} = x_i^T \beta_j + \varepsilon_{ij}. \tag{1}$$

Here $x_i$ is a vector of covariates for the $i$-th respondent, $\beta_j$ is a vector of regression coefficients, and the error vector $\varepsilon_i = (\varepsilon_{ij})$ has $M$-dimensional normal distribution with mean 0 and variance matrix $\Sigma$. The diagonal elements of $\Sigma$ are supposed to be 1 in order to identify the model. In the homogeneous reporting case it is assumed that the observed categorical responses $y_{ij}$ of the $i$-th individual depend on the latent variables in the following way:

$$y_{ij} = k \iff \tau^j_{k-1} \le y^*_{ij} < \tau^j_k, \tag{2}$$

$k = 1, \ldots, K$, $\tau^j_0 < \tau^j_1 < \cdots < \tau^j_{K-1} < \tau^j_K$ with $\tau^j_0 = -\infty$, $\tau^j_K = \infty$, where $K$ denotes the number of different answers to the self-report questions. The model (1) with cut-point definition (2) is called the multivariate ordered probit model.
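As a minimal sketch of the cut-point rule (2) in the univariate case, the following Python fragment maps a latent value to its category and computes the category probabilities implied by a unit-variance normal error. The function names and the cut-point values are hypothetical and chosen only for illustration; treating the lower bound of each interval as inclusive is an assumption that does not affect the probabilities.

```python
from bisect import bisect_right
from statistics import NormalDist


def categorize(y_star, cuts):
    """Return k in 1..K with tau_{k-1} <= y_star < tau_k.

    `cuts` holds the interior cut-points [tau_1, ..., tau_{K-1}];
    tau_0 = -inf and tau_K = +inf are implicit.
    """
    return bisect_right(cuts, y_star) + 1


def category_probabilities(mean, cuts):
    """P(y = k), k = 1..K, when the latent variable is N(mean, 1)."""
    std = NormalDist()
    cdf = [0.0] + [std.cdf(t - mean) for t in cuts] + [1.0]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


cuts = [-1.0, 0.0, 1.0, 2.0]      # hypothetical cut-points, K = 5
print(categorize(0.3, cuts))      # latent value 0.3 lies in [0, 1)
print(category_probabilities(0.3, cuts))
```

The probabilities are differences of the standard normal distribution function evaluated at the shifted cut-points, which is the building block of the ordered probit likelihood.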

In the special case $M = 2$, we speak about the bivariate ordered probit (BIOPROBIT) model, see Section 11.5.2 in Greene [] or Sajaia []. We also remark that in the one-dimensional case $M = 1$ the standard ordered probit (OPROBIT) model is obtained, see Chapter 21.8 in Greene []. There are several extensions of the ordered probit model that follow the logic of bivariate models using two latent equations with correlated error terms, see e.g. Butler et al. [] and Tobias and Li []. Our setup follows the latter one. Seemingly unrelated and simultaneous specifications of the two-equation ordered probit model are considered e.g. in Sajaia []. The parameters, which are the regression coefficients $\beta_j$, the cut-points $\tau^j_k$ and the independent non-diagonal elements of $\Sigma$, are estimated by full information maximum likelihood (FIML), see supplement S1. To keep the cut-points increasing we use an exponential parametrization, i.e. $\tau^j_k = \tau^j_{k-1} + \exp(\gamma^j_k)$, $\tau^j_1 = \gamma^j_1$, and the new parameters $\gamma^j_k$ are estimated. The FIML technique can be easily applied in any statistical or econometric software in which the cumulative distribution function of the standard bivariate normal distribution is implemented, e.g. STATA, see Sajaia []. Butler et al. [] proposed a two-step estimation approach based on fitting univariate ordered probit models. Tobias and Li [] suggested a Bayesian alternative estimation. The two latent equations of the BIOPROBIT model can be rewritten in the following two-dimensional vector form as a standard linear model:

$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 \\ 0 & x_i^T \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \end{pmatrix}.$$

Here the error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are distributed as standard bivariate normal with correlation coefficient $\rho$. Summarizing, the BIOPROBIT model is a two-dimensional latent model with ordered probit link function and bivariate normally distributed latent variables.
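The likelihood contribution of one respondent in a bivariate ordered probit model is the probability that the two latent variables fall into the rectangle defined by the two reported categories. The sketch below evaluates this probability with a one-dimensional Simpson rule over the conditional normal distribution. It is an illustration under assumed cut-points and linear predictors, not the estimation routine of supplement S1, and all names are hypothetical. It requires a correlation strictly between -1 and 1.

```python
from math import inf, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def rectangle_probability(a1, b1, a2, b2, rho, steps=400):
    """P(a1 <= e1 < b1, a2 <= e2 < b2) for standard bivariate normal
    errors (e1, e2) with correlation rho, via Simpson's rule over e1."""
    lo, hi = max(a1, -8.0), min(b1, 8.0)  # truncate infinite limits
    if lo >= hi:
        return 0.0
    s = sqrt(1.0 - rho * rho)

    def integrand(z):
        # Given e1 = z, e2 is normal with mean rho * z and std s.
        upper = 1.0 if b2 == inf else STD.cdf((b2 - rho * z) / s)
        lower = 0.0 if a2 == -inf else STD.cdf((a2 - rho * z) / s)
        return STD.pdf(z) * (upper - lower)

    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


def cell_probability(k1, k2, mean1, mean2, cuts1, cuts2, rho):
    """P(y1 = k1, y2 = k2) given linear predictors and interior cut-points."""
    t1 = [-inf] + list(cuts1) + [inf]
    t2 = [-inf] + list(cuts2) + [inf]
    return rectangle_probability(t1[k1 - 1] - mean1, t1[k1] - mean1,
                                 t2[k2 - 1] - mean2, t2[k2] - mean2, rho)


cuts = [-1.0, 0.0, 1.0]  # hypothetical interior cut-points, K = 4
table = [[cell_probability(k1, k2, 0.2, -0.1, cuts, cuts, 0.5)
          for k2 in range(1, 5)] for k1 in range(1, 5)]
print(round(sum(map(sum, table)), 6))  # cell probabilities sum to one
```

Summing the logarithm of `cell_probability` over respondents would give a log-likelihood that a general-purpose optimizer could maximize; statistical packages typically use a dedicated bivariate normal distribution function instead of numerical integration.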

In the heterogeneous reporting case the ordered probit models are no longer appropriate

for describing the data. However, it is possible to generalize these models by allowing the

cut-points to depend on covariates as

$$\tau^j_{i,1} = x_i^T \gamma^j_1, \qquad \tau^j_{i,k} = \tau^j_{i,k-1} + \exp(x_i^T \gamma^j_k), \quad k = 2, \ldots, K-1. \tag{3}$$

Hence the dependence between the categorical observed and continuous latent variables

can be derived in the way

$$y_{ij} = k \iff \tau^j_{i,k-1} \le y^*_{ij} < \tau^j_{i,k}. \tag{4}$$

Here the $\gamma^j_k$ are parameters which measure the impact of covariates on the cut-points, see Terza [] and Pudney and Shields []. In order to identify the effect of covariates on the cut-points we use vignette ratings as exogenous information which fixes different levels of the respondent’s categories. This technique was suggested by Tandon et al. [], see also King et al. [] and Salomon et al. []. Suppose that there are $L$ vignettes for each question and denote by $y^v_{ij\ell}$ the rating of the $\ell$-th vignette of the $j$-th question by the $i$-th respondent. It is also assumed that the possible vignette values are $1, 2, \ldots, K$, i.e. they coincide with the possible values of the self-reports. In the latent trait model approach it is supposed that there is an unobservable continuous variable $y^{v*}_{ij\ell}$ behind each $y^v_{ij\ell}$ for all $i$, $j$ and $\ell$. We assume that these latent variables are fixed over the whole population, i.e. they do not depend on the covariates. In mathematical terms,

$$y^{v*}_{ij\ell} = \theta_{j\ell} + \varepsilon^v_{ij\ell}, \tag{5}$$

$i = 1, \ldots, N$, $j = 1, \ldots, M$, $\ell = 1, \ldots, L$, where $\theta_{j\ell}$ denotes the vignette mean and the error vector $\varepsilon^v_{i\ell} = (\varepsilon^v_{ij\ell})$ has $M$-dimensional normal distribution with mean 0 and variance matrix $\Sigma^v$. The observed vignette ratings depend on the latent vignette variables in the following way:

$$y^v_{ij\ell} = k \iff \tau^j_{i,k-1} \le y^{v*}_{ij\ell} < \tau^j_{i,k}. \tag{6}$$
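The covariate-dependent cut-points in (3) can be sketched as follows. The first cut-point is linear in the covariates and each later cut-point adds a positive exponential increment, so the sequence is increasing for any parameter values. The function name, the covariate coding (intercept plus a country dummy) and the coefficient values are hypothetical.

```python
from math import exp


def cutpoints(x, gammas):
    """Cut-points for covariate vector x following (3).

    gammas[0] is the coefficient vector of the first cut-point;
    gammas[k] (k >= 1) drives the increment exp(x' gamma) that is
    added to the previous cut-point.
    """
    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    taus = [dot(x, gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + exp(dot(x, g)))
    return taus


# Hypothetical coefficients: the country dummy shifts the first
# cut-point upwards by 1 and leaves the spacing unchanged.
gammas = [[-1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(cutpoints([1.0, 0.0], gammas))  # respondent from the reference country
print(cutpoints([1.0, 1.0], gammas))  # respondent from the other country
```

Because the same cut-points enter both the self-report relation (4) and the vignette relation (6), vignette ratings carry information about the cut-point coefficients that is separate from the respondent's own latent level.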

It should be emphasized that we use the same cut-points as in the self-report part. Thus the self-report model (1) and the vignette model (5), which are multivariate ordered probit models with cut-point relations (4) and (6), respectively, are joined by common cut-points parametrized by (3). We refer to the system of these two coupled multivariate ordered probit models as the multivariate hierarchical ordered probit model. In the one-dimensional case $M = 1$ we get back the well-known hierarchical ordered probit (HOPIT) model. The HOPIT model was originally developed to enhance the cross-population comparability of self-report survey data, see Tandon et al. []. In the special case $M = 2$ we speak about the bivariate hierarchical ordered probit (BIHOPIT) model. For identification, we assume that the vignette mean $\theta_{j1} = 0$ for all $j$ and that the diagonal entries of the vignette covariance matrix $\Sigma^v$ are equal to 1. We remark that in this case we do not need to impose any restriction on the entries of the self-report covariance matrix $\Sigma$; it is an arbitrary covariance matrix. The parameters, which are the regression coefficients $\beta_j$, the vignette means $\theta_{j\ell}$, the cut-point parameters $\gamma^j_k$, the independent entries of $\Sigma$, and the independent non-diagonal entries of $\Sigma^v$, are estimated by full information maximum likelihood (FIML), see supplement S1. To parameterize the BIHOPIT model in this paper we also suppose that the correlation structures of the self-report and vignette parts are the same. We denote the common correlation coefficient by $\rho$. The latent equations of the BIHOPIT model can also be written in the form of a standard linear model. Here, for the sake of simplicity, we assume that $L = 1$, i.e. there is only one vignette. We have the following vector form for the latent variables:

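$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \\ y^{v*}_{i1} \\ y^{v*}_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 & 0 & 0 \\ 0 & x_i^T & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \theta_1 \\ \theta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon^v_{i1} \\ \varepsilon^v_{i2} \end{pmatrix},$$

where the four-dimensional error term has multivariate normal distribution with mean 0 and covariance matrix

$$\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & 0 & 0 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & 1 & \rho \\ 0 & 0 & \rho & 1 \end{pmatrix}.$$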

Summarizing, the BIHOPIT model is a system of coupled bivariate ordered probit

(BIOPROBIT) models with common cut-points, which depend on covariates. One of the

BIOPROBIT models describes the self-report part while the other ones describe the

vignette part.

An alternative approach to analyzing multivariate or longitudinal ordered discrete data is to use a random effect in modeling the latent variables. In this case it is supposed that there exists an actual level $\eta_i$ of respondent $i$ on a continuous, one-dimensional scale which determines all the responses of this individual. Moreover, the responses also depend on the covariates. Thus, we assume that the latent variable $y^*_{ij}$ can be expressed as

$$y^*_{ij} = x_i^T \beta + \eta_i + \varepsilon_{ij}, \tag{7}$$

where $\eta_i$ is the random effect of the $i$-th respondent, which has Gaussian distribution with mean 0 and variance $\sigma_\eta^2$, and $\varepsilon_{ij}$ is the individual error, which has Gaussian distribution with mean 0 and variance $\sigma_\varepsilon^2$. In the two-dimensional case the linear model formulation of this model is the following:

$$\begin{pmatrix} y^*_{i1} \\ y^*_{i2} \end{pmatrix} = \begin{pmatrix} x_i^T & 0 \\ 0 & x_i^T \end{pmatrix} \begin{pmatrix} \beta \\ \beta \end{pmatrix} + \begin{pmatrix} \eta_i + \varepsilon_{i1} \\ \eta_i + \varepsilon_{i2} \end{pmatrix}.$$

By this vector representation it is clear that the regression coefficients are the same for all questions, which is the main constraint compared to the previous models. The covariance matrix of the two-dimensional normal error has the form

$$\begin{pmatrix} \sigma_\eta^2 + \sigma_\varepsilon^2 & \sigma_\eta^2 \\ \sigma_\eta^2 & \sigma_\eta^2 + \sigma_\varepsilon^2 \end{pmatrix}.$$


The latent vignette variables are supposed to have normal distribution with mean $\theta_{j\ell}$, where $j$ and $\ell$ refer to the number of the self-report question and of the vignette, respectively, and common variance $\sigma_v^2$. For identification we suppose that $\theta_{j1} = 0$ for all $j$ and $\sigma_v^2 = 1$. The discrete transformations between the observed and the latent variables are also defined with the help of cut-points, see (4) and (6), which are parametrized as in (3). The system of the self-report and vignette variables with their latent variables and common cut-points defined above is called the compound hierarchical ordered probit (CHOPIT) model. The CHOPIT model was introduced by King et al. []. The parameters of the CHOPIT model are the regression coefficient $\beta$, the vignette means $\theta_{j\ell}$, the cut-point parameters $\gamma^j_k$, and the variances $\sigma_\eta^2$ and $\sigma_\varepsilon^2$ of the random effect and of the individual error. These parameters are estimated by the maximum likelihood method, approximating the full-information likelihood numerically, see supplement S1 as well. Finally, we remark that the CHOPIT model is a one-dimensional latent model with probit link function, a so-called level-1 model with normally distributed random effect and vignettes.
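Under the random effect specification (7), the two latent self-report variables of a respondent share the same random effect, which induces a correlation between them. Writing $\sigma_\eta^2$ for the variance of the random effect and $\sigma_\varepsilon^2$ for the variance of the individual error (symbols assumed here for illustration), and assuming that the random effect and the individual errors are mutually independent, a short derivation gives:

```latex
\begin{aligned}
\operatorname{Var}(y^*_{ij} \mid x_i) &= \sigma_\eta^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(y^*_{i1}, y^*_{i2} \mid x_i) &= \sigma_\eta^2, \\
\operatorname{Corr}(y^*_{i1}, y^*_{i2} \mid x_i) &= \frac{\sigma_\eta^2}{\sigma_\eta^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

The implied correlation is therefore never negative, whereas the correlation coefficient of the bivariate model is not restricted in sign.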

Hypothesis tests and goodness of fit

Hypothesis tests about restrictions on the BIHOPIT model parameters can be derived

using any of the three common procedures: likelihood ratio test, Wald statistic and

Lagrange multiplier statistic. Since the computation for the FIML is straightforward the

likelihood ratio test is a natural choice. The likelihood ratio statistic is

$$\lambda = 2(\ln L_1 - \ln L_0),$$

where the subscripts 1 and 0 indicate the values of the log-likelihood computed for the alternative ($H_1$) and null ($H_0$) hypothesis, respectively. The likelihood ratio statistic $\lambda$ asymptotically has a $\chi^2$ distribution with $q = \dim H_1 - \dim H_0$ degrees of freedom under $H_0$, where, for a hypothesis $H: \vartheta \in \Theta$, $\dim H$ denotes the dimension of the parameter space $\Theta$.
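A minimal sketch of the likelihood ratio test, assuming the two maximized log-likelihoods are already available from fitted models. The chi-square tail probability is computed for integer degrees of freedom with the standard recurrence for the regularized upper incomplete gamma function, so only the Python standard library is needed. The function names and the log-likelihood values are hypothetical.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df.

    Uses Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1) with
    t = x / 2, starting from Q(1/2, t) = erfc(sqrt(t)) for odd df
    or Q(1, t) = exp(-t) for even df.
    """
    if x <= 0:
        return 1.0
    t = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = erfc(sqrt(t)) if df % 2 else exp(-t)
    while a < df / 2.0:
        q += t ** a * exp(-t) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_alt, loglik_null, df):
    """Return the statistic 2 (ln L1 - ln L0) and its asymptotic p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a restriction on one parameter.
stat, p_value = likelihood_ratio_test(-1000.0, -1003.0, df=1)
print(stat, p_value)
```

The degrees of freedom `df` equal the number of restrictions imposed by the null hypothesis, i.e. the difference in the dimensions of the two parameter spaces.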

Many questions need to be answered with a hypothesis testing approach in the

context of the BIHOPIT model. These include, among others, tests which investigate

different parts of the model separately, tests for studying the relation of the self-report

and vignette parts of the model, and specification issues of the model. One of the most

important of them is testing uni-dimensionality. In formal terms, uni-dimensionality can

be defined as the assumption that any dependence between the self-report questions is

solely due to a single underlying latent trait. In the context of BIHOPIT model this means

that the latent self-report variables only differ in the extent of random error term and the

systematic parts depending on covariates are equal. We refer to this property as weak uni-

dimensionality. (Note that the strong uni-dimensionality supposes the equality of the

error terms as well.) In mathematical terms, weak uni-dimensionality can be formulated as the following hypothesis system:

$$H_0: \beta_1 = \beta_2, \qquad H_1: \beta_1 \ne \beta_2.$$

We will refer to the null hypothesis $H_0$ as the (weak) uni-dimensionality hypothesis. Whenever the uni-dimensionality hypothesis cannot be rejected, the predicted scores for the latent self-reports given by the unconditional mean will be the same for all coordinates. The asymptotic distribution of the likelihood ratio statistic for testing the uni-dimensionality hypothesis is $\chi^2_p$, where $p$ denotes the number of covariates including the intercept.

The correlation coefficient $\rho$ plays a central role in the identification and inference of the BIHOPIT model. This correlation coefficient can be interpreted as the correlation between the two unobservable latent self-report variables, and naturally between the pairs of latent vignette variables as well. When $\rho = 0$, the error terms are uncorrelated in (1) and (5), implying that the responses to the two questions are independent, for self-reports and vignettes alike. This leads us to the definition of the following hypothesis system:

$$H_0: \rho = 0, \qquad H_1: \rho \ne 0.$$

We will refer to the null hypothesis $H_0$ as the independence hypothesis. Whenever the independence hypothesis cannot be rejected, the BIHOPIT model can be simplified to two separate univariate HOPIT models for the subdomain questions. It must be remarked that under $H_0$ the log-likelihood (8) becomes the sum of the log-likelihood functions of two HOPIT models and the parameters are estimated separately. Thus, the log-likelihood can be easily computed under $H_0$, while the log-likelihood under the alternative hypothesis is given by fitting the BIHOPIT model. Finally, the asymptotic distribution of the likelihood ratio statistic for testing the independence hypothesis is a $\chi^2_1$ distribution.

We note that a similar hypothesis testing problem is considered for the seemingly unrelated bivariate

probit model in Monfardini and Radice []. In their model the null hypothesis $H_0$ is

equivalent to the exogeneity of the model, thus they refer to the null hypothesis as the

exogeneity hypothesis. They propose a number of procedures which are likely to be

applicable in our case as well.

There are two important questions concerning the relationship of the self-report and the

vignette parts of BIHOPIT model. One may investigate the equivalence of the cut-points

and the equivalence of the correlation structures for the two parts of the model. The

former one belongs to the so-called response consistency assumption, i.e. whether

individuals use the same response scales to rate the vignettes and self-reports. Let us suppose that the parameters $\gamma^j_k$ and $\gamma^{v,j}_k$ that correspond to the self-report and vignette cut-points are different. Then response consistency can be tested by comparing these cut-point parameters using the likelihood ratio test. Such tests for null hypotheses like equality of cut-points or equality of distances between cut-points are proposed in Bago D’Uva et al. []. The latter one, in the bivariate case, means the comparison of the correlation coefficients which belong to the self-report and vignette parts, respectively. Let $\rho_{\mathrm{self}}$ and $\rho_{\mathrm{vig}}$ denote the correlation coefficients for the self-report responses and the vignette responses, respectively. In mathematical terms, the equivalence of correlation structures can be formulated as the following hypothesis system:

$$H_0: \rho_{\mathrm{self}} = \rho_{\mathrm{vig}}, \qquad H_1: \rho_{\mathrm{self}} \ne \rho_{\mathrm{vig}}.$$

We will refer to the null hypothesis $H_0$ as the correlation equivalence hypothesis. If the correlation equivalence hypothesis is not rejected then the BIHOPIT model is adequate for the data. Otherwise, we need a slightly more complicated model which contains one extra parameter for the additional correlation coefficient. The asymptotic distribution of the likelihood ratio statistic is again a $\chi^2_1$ distribution.

Two specification issues can be addressed in the context of the BIHOPIT model: heteroscedasticity and the distributional assumption. Since there are no useful residuals, general approaches such as the Breusch-Pagan test are not available. Hence we must build heteroscedasticity into the model and test it parametrically. A common approach to modeling heteroscedasticity in categorical latent choice models is based on Harvey’s exponential model, see []. In this model it is supposed that the error terms satisfy the following assumptions:

$$E(\varepsilon_{ij} \mid x_i, z_i) = 0, \qquad \operatorname{Var}(\varepsilon_{ij} \mid x_i, z_i) = [\exp(z_i^T \delta^j)]^2.$$

Here $z_i$ is a known set of variables that does not include a constant term, and the $\delta^j$ are new parameter vectors to be estimated. Maximization of the log-likelihood function with respect to all the parameters is a bit more complicated because the function is only locally concave. The homoscedasticity hypothesis can be tested by investigating the null hypothesis $H_0: \delta^j = 0$, which can easily be done. The second specification test of interest concerns the distribution or the link function. To address this problem, an appropriate modification of Silva’s and Vuong’s tests can be a reasonable approach, see Silva [] and Vuong [].

Assessing goodness-of-fit for ordered categorical data is not obvious because there is no direct counterpart to the $R^2$ goodness-of-fit statistic and it is a serious challenge to take into account the order of the possible responses. One can compute the likelihood ratio index, which is also called the pseudo-$R^2$:

$$\text{pseudo-}R^2 = 1 - \ln L / \ln L_0,$$

where $\ln L$ is the log-likelihood for the estimated model including the constant and $\ln L_0$ is the log-likelihood for a model that only has a constant as parameter. Another way to

evaluate the goodness-of-fit of an ordered discrete statistical model is the prediction

method based on the confusion matrix, which is well-known in the classification problem

of data mining, see Tan et al. [, Section 4.2.]. Define the predicted categorical response as

the one of the possible responses, which is associated with maximum predicted

probability. The confusion matrix is derived as a table containing the counts of

individuals classified according to the true and predicted self-reports, respectively. Then

the goodness-of-fit of a probabilistic model can be evaluated using any performance

metric such as accuracy, which is defined as the fraction of correctly predicted

individuals in the whole population.
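Both goodness-of-fit measures can be sketched in a few lines of Python. The log-likelihood values, the category labels and the function names below are hypothetical; categories are coded 1 to K.

```python
def pseudo_r2(loglik_model, loglik_constant_only):
    """Likelihood ratio index: 1 - ln L / ln L0."""
    return 1.0 - loglik_model / loglik_constant_only


def predicted_category(probabilities):
    """1-based category with the maximum predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1


def confusion_matrix(true_cats, pred_cats, K):
    """K x K table of counts; rows are true, columns predicted categories."""
    table = [[0] * K for _ in range(K)]
    for t, p in zip(true_cats, pred_cats):
        table[t - 1][p - 1] += 1
    return table


def accuracy(table):
    """Fraction of individuals on the main diagonal of the table."""
    total = sum(map(sum, table))
    return sum(table[k][k] for k in range(len(table))) / total


true_cats = [1, 2, 2, 3]   # hypothetical observed self-reports
pred_cats = [1, 2, 3, 3]   # hypothetical predicted categories
table = confusion_matrix(true_cats, pred_cats, K=3)
print(table, accuracy(table))
print(pseudo_r2(-80.0, -100.0))
```

Accuracy treats all misclassifications alike; for ordered responses one might additionally inspect how far off-diagonal the errors fall, since the confusion matrix keeps that information.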

Posterior prediction


In latent trait models the prediction of the latent variables for each individual based on the known categorical responses is given by systematic use of Bayes’ theorem from elementary probability theory. Suppose we denote the observed and latent variables by $y$ and $y^*$, respectively. All posterior prediction is based on the posterior predictive density, which is defined by

$$p(y^* \mid y) = \frac{\Pr(y \mid y^*)\, p(y^*)}{\int \Pr(y \mid y^*)\, p(y^*)\, dy^*},$$

where $p(y^*)$ denotes the probability density function of the latent variable $y^*$. We should emphasize that compared to the conventional Bayesian approach, such as the CHOPIT model, all the parameters in the BIHOPIT model are non-random. The only random quantities are the observed and latent variables for self-reports and vignettes. There are three general approaches for predicting the latent variable $y^*$ individually, which will be used to characterize the latent health status of an individual. These prediction methods are the unconditional mean, the conditional mean, and the maximum a-posteriori (MAP) prediction. The unconditional mean is defined by

$$E(y^*) = \sum_y \Pr(y) \int y^*\, p(y^* \mid y)\, dy^* = \int y^*\, p(y^*)\, dy^*,$$

i.e. in this case the predicted value for the latent trait does not depend on the observed responses. In the context of the BIHOPIT model the unconditional mean only depends on the covariates, and we have for the two latent variables that

$$E(y^*_{i1}) = x_i^T \beta_1 \quad \text{and} \quad E(y^*_{i2}) = x_i^T \beta_2.$$

In contrast, the conditional mean defined by

$$E(y^* \mid y) = \int y^*\, p(y^* \mid y)\, dy^* = \frac{\int y^* \Pr(y \mid y^*)\, p(y^*)\, dy^*}{\int \Pr(y \mid y^*)\, p(y^*)\, dy^*} \tag{8}$$

is influenced by the self-report responses. It is well known that the conditional mean minimizes the quadratic loss conditionally on the observed responses. Finally, the maximum a-posteriori (MAP) prediction $\hat{y}^*$ is defined as the mode of the posterior predictive density function, i.e.

$$\hat{y}^* = \arg\max_{y^*} p(y^* \mid y).$$

One of the main goals of this paper is to propose a reasonable individual prediction

method for the latent health status of respondents, which relies on both self-report and

vignette responses in order to correct heterogeneity in reporting health. Since the

unconditional mean depends only on covariates and does not depend on self-report

responses, two respondents with the same covariate values will have the same predicted

health status even if their subjective feelings regarding their health status are

significantly different. Thus, the unconditional mean does not appear to be a suitable method

for individual scoring. On the other hand, the maximum a-posteriori prediction seems to


be too sensitive to the subjective individual responses for self-report questions. Hence the

conditional mean approach, which takes into account the individual responses but is not too

sensitive to them, is a good choice for individual scoring. In supplement S2 a procedure is

described for calculating the conditional mean (8) in the context of the BIHOPIT model.
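For intuition, the three predictors can be written in closed form in the simplest univariate case with a single self-report and a unit-variance normal latent variable, where the posterior given the reported category is a truncated normal distribution. This is a simplified sketch and not the bivariate procedure of supplement S2; the function names and numerical values are hypothetical.

```python
from math import inf
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def conditional_mean(mean, lower, upper):
    """E(y* | lower <= y* < upper) for y* ~ N(mean, 1).

    `lower` and `upper` are the cut-points that bound the reported
    category; either may be infinite.
    """
    a, b = lower - mean, upper - mean

    def pdf(t):
        return 0.0 if t in (-inf, inf) else STD.pdf(t)

    def cdf(t):
        return 0.0 if t == -inf else 1.0 if t == inf else STD.cdf(t)

    return mean + (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))


def map_prediction(mean, lower, upper):
    """Mode of the truncated normal posterior: the mean clipped to the
    interval (attained only in the limit at an open upper bound)."""
    return min(max(mean, lower), upper)


mean = 0.3                      # unconditional mean x' beta (hypothetical)
print(mean)                     # ignores the reported category
print(conditional_mean(mean, 1.0, 2.0))
print(map_prediction(mean, 1.0, 2.0))
```

In this example the MAP prediction sits on the boundary of the reported interval, while the conditional mean lies strictly inside it, which illustrates the difference in sensitivity between the two predictors.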

Results

In the simulation study a population with two hypothetical countries (Country 1 and 2)

and two covariates was generated. One of the covariates was supposed to be continuous

for modeling age and the other one was supposed to be binary categorical for modeling

sex. It was also assumed that all three variables (country, age, and sex) are independent of

each other. Country and age were designed to have a significant effect, and sex was

designed to be non-significant. Two simulations were performed based on the CHOPIT and

BIHOPIT scenarios, respectively. The scenario name indicates which model is the true model for the

simulation. For the simulation details, see Supplement 3. Then three statistical models

(BIOPROBIT, BIHOPIT, and CHOPIT) were fitted to the two simulated datasets and one real

dataset. These models were compared on various issues based on these datasets.

Analyzing data sets simulated under BIHOPIT scenario

For descriptive results, since the population was generated according to predetermined

properties of covariates, only the mean categorical ratings for self-reports and associated

vignettes are shown in Table 1. The proportion of respondents in each response category

for self-reports and vignettes is shown in Figure 1. The mean categorical rating for self-reports

in Country 1 is greater than in Country 2, thus the respondents from Country

1 rank themselves higher than those from Country 2. However, after adjusting for the vignette

responses, the ‘real’ difference is the other way around.

The substantive results, including the ‘true’ parameters used to generate the data under the BIHOPIT model and the parameters estimated by the three fitted models, can be found in Table 2. The BIHOPIT estimates recover the true values very precisely, apart from a scale shift in the vignette means and self-report constants. The CHOPIT estimates differ more, but the differences appear for covariates that are not significant anyway, e.g. sex, and in this sense the estimates are still correct. Only the BIOPROBIT estimates fail to recover the original parameters, not only in magnitude but also in the direction of the effect: age becomes falsely positive and the country rankings are swapped. The cut-point estimates are given in Table 3. Figures 2 and 3 show the latent score estimates versus the ‘true’ latent scores. Data points fall near the main diagonal in both figures, with slightly less scatter for BIHOPIT, which is not surprising. Cut-point figures can be found in the Supplement.

Analyzing data sets simulated under CHOPIT with random effect scenario

Mean categorical ratings for self-reports and associated vignettes are shown in Table 4. The proportion of respondents in each response category for self-reports and vignettes is shown in Figure 4. By design, the descriptive results are similar to those of the BIHOPIT scenario.

The substantive results, including the ‘true’ parameters used to generate the data under the CHOPIT model and the parameters estimated by the three fitted models, can be found in Table 5. As before, the CHOPIT and BIHOPIT estimates are very close to what is expected, apart from a similar parameter shift in the vignette means and self-report constants due to non-unique identification. Similarly, the BIOPROBIT model fails to recover the true parameters, and the same shifts occur as before. The cut-point estimates can be found in Table 6. Regarding latent score recovery, BIHOPIT surprisingly behaves slightly better, see Figures 5 and 6. Our guess is that CHOPIT could work better on multiple items in identifying the random effect part, but with only two items BIHOPIT becomes more precise. Cut-point figures can be found in the Supplement.

Analyzing the SAGE dataset

The real data analysis was performed using data from SAGE (Study on Global Ageing and Adult Health). SAGE is the WHO's longitudinal panel study on health and health-related outcomes, focusing on the population aged 50 years and older in China, Ghana, India, Mexico, the Russian Federation and South Africa.

The health state item pool consists of 16 health-related questions, with responses recorded on a five-point category scale from “no difficulty or problem” to “extreme difficulty/inability”. These 16 items belong to 8 health domains: vision, mobility, self-care, cognition, interpersonal activities, pain and discomfort, sleep and energy, and affect. In our example China and India are compared on the domain of cognition, with item1 and item2 being difficulties with remembering things and learning a new task, respectively.

Looking at the cognition stackbars of the two countries (Fig. 7), one can see that although the self-responses show a worse level of cognition for India, the vignette evaluations in India are also more critical; accounting for this, we expect a smaller real difference between the two countries (see the mean vignette ratings in Table 7). This is exactly what our parameter estimates tell us. In Table 8, the parameters of the unadjusted and adjusted models differ only in the country indicator, with adjusted values that are smaller in absolute terms for both item1 and item2. Two-sample u-tests show the respective statistical significance. (This outcome is also due to the invariance of the vignette responses with respect to the remaining covariates, which is not shown here.)
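The |u|-values reported in Table 8 are consistent with the usual two-sample statistic for comparing two independently estimated coefficients, |b1 - b2| / sqrt(se1^2 + se2^2). The text does not state this formula, so it is an assumption here; the function name `u_statistic` is likewise hypothetical. The inputs are the BIOPROBIT and BIHOPIT estimates and standard errors for item1 taken from Table 8.

```python
import math

def u_statistic(b1, se1, b2, se2):
    """Two-sample u statistic for two coefficient estimates.

    Assumes the estimates are independent and approximately normal;
    this formula is an assumption, not quoted from the paper.
    """
    return abs(b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Item1 (remembering), BIOPROBIT vs BIHOPIT estimates from Table 8.
u_country = u_statistic(-0.6309, 0.0162, -0.4522, 0.0215)
u_age = u_statistic(-0.0277, 0.00057, -0.0292, 0.00075)

# Compare with the two-sided 5% critical value of the standard normal (1.96).
for name, u in (("country", u_country), ("age", u_age)):
    verdict = "differs" if u > 1.96 else "does not differ"
    print(f"{name}: |u| = {u:.3f} ({verdict} at the 5% level)")
```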

In Table 10 one can see that the average BIOPROBIT scores of China by age category are always greater than the corresponding BIOPROBIT scores of India, which confirms that the unadjusted ranking of China is higher than that of India. In contrast, if we consider the averages of the adjusted posterior BIHOPIT scores, the difference is reversed in age categories 64, 73, 76, 77, and 79. For example, at age 76 the BIOPROBIT score is greater in China than in India by 0.354, but the posterior BIHOPIT score is lower in China than in India by 0.371. Thus, in this age category people in India are in fact healthier than in China, which is far from obvious from the self-report answers.

Discussion

The computational time of the BIHOPIT and CHOPIT models was compared by running all programs on a PC with an Intel Core2 CPU 6300 1.86 GHz processor and 2 GB RAM for 1000 simulated respondents. Under the BIHOPIT scenario the running time was 43 seconds for the BIHOPIT model and 20 199 seconds for the CHOPIT model. Under the CHOPIT scenario the running times were 39 seconds and 14 384 seconds for BIHOPIT and CHOPIT fitting, respectively. Thus, the BIHOPIT code is approximately 400 times faster than the CHOPIT code, which allows the model to be run on larger data sets. For example, the running time for the SAGE dataset with only 6 446 respondents was 167 621 seconds (roughly 47 hours) using the CHOPIT code, clearly showing the extra computational demand.
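The speed-up figure can be checked directly from the reported running times. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Running times in seconds as reported in the text: (BIHOPIT fit, CHOPIT fit).
timings = {
    "BIHOPIT scenario": (43, 20199),
    "CHOPIT scenario": (39, 14384),
}

# Ratio of CHOPIT to BIHOPIT running time for each simulated scenario.
speedups = {name: chopit / bihopit for name, (bihopit, chopit) in timings.items()}
for name, ratio in speedups.items():
    print(f"{name}: CHOPIT/BIHOPIT running-time ratio = {ratio:.0f}")

# CHOPIT running time on the SAGE data, converted from seconds to hours.
sage_hours = 167621 / 3600
print(f"SAGE CHOPIT run: {sage_hours:.1f} hours")
```

The two ratios fall on either side of 400, which is where the approximate factor quoted in the text comes from.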

References

1. Bago D’Uva T (2005) Latent class models for use of primary care: evidence from

a British panel. Health Econ. 14: 873-892.

2. Bago D’Uva T, van Doorslaer E, Lindeboom M, O’Donnell O (2008) Does

reporting heterogeneity bias the measurement of health disparities? Health Econ.

17: 351-375.

3. Bago D’Uva T, Lindeboom M, O’Donnell O, van Doorslaer E (2009) Slipping

anchor? Testing the vignettes approach to identification and correction of

reporting heterogeneity. Available at

http://www.york.ac.uk/res/herc/documents/wp/09_30.pdf via the Internet.

Accessed 17 Jan 2011.

4. Butler J, Chatterjee J (1995) Pet econometrics: Ownership of cats and dogs.

Working Paper 95-WP1, Department of Economics, Vanderbilt University.

5. Butler J, Chatterjee J (1997) Tests of the specification of univariate and bivariate

ordered probit. Review of Economics and Statistics 79: 343-347.

6. Butler J, Finegan T, Siegfried J (1998) Does more calculus improve student

learning in intermediate micro- and macroeconomic theory? Journal of Applied

Econometrics 13(2): 185-202.

7. Calhoun C (1989) Estimating the distribution of desired family size and excess

fertility. The Journal of Human Resources 24(4): 709–724.

8. Calhoun C (1991) Desired and excess fertility in Europe and the United States:

indirect estimates from World Fertility Survey Data. European Journal of

Population 7: 29-57.

9. Gould W, Pitblado J, Poi B (2010) Maximum Likelihood Estimation with Stata.

4th Edition. Stata Press. 352 p.

10. Greene WH (2003) Econometric analysis. 5th Edition. New Jersey: Pearson

Education. 1026 p.

11. Greene WH (2009) Discrete choice modeling. Palgrave handbook of

econometrics. Vol. 2. Applied econometrics. Edited by Terence C. Mills and

Kerry Patterson. Palgrave Macmillan. 1128 p.

12. Greene WH, Harris MN, Hollingsworth B, Maitra P (2008) A bivariate latent class

correlated generalized ordered probit model with an application to modeling

observed obesity levels. NYU Working Paper No. EC-08-18.

Available at SSRN: http://ssrn.com/abstract=1281910. Accessed 14 Jan 2011.

13. Harvey A (1976) Estimating regression models with multiplicative

heteroscedasticity. Econometrica 44: 461-465.

14. Hopkins DJ, King G (2010) Improving anchoring vignettes: Designing surveys to

correct interpersonal incomparability. Public Opinion Quarterly 74(2): 201-222.

15. Kakwani N, Wagstaff A, van Doorslaer E (1997) Socioeconomic inequalities in

health: Measurement, computation, and statistical inference. Journal of

Econometrics 77: 87-103.

16. Kapteyn A, Smith J, van Soest A. (2007) Vignettes and self-reports of work

disability in the US and the Netherlands. American Economic Review 97(1): 461–

473.

17. King G, Murray CJL, Salomon J, Tandon A (2004) Enhancing the validity and

cross-cultural comparability of measurement in survey research. American

Political Science Review 98(1): 184-191.

18. Kohler HP, Rodgers JL (1999) DF-like analyses of binary, ordered, and censored

variables using probit and tobit approaches. Behavior Genetics 29(4): 221-232.

19. Kristensen N, Johansson E (2008) New evidence on cross-country differences in

job satisfaction using anchoring vignettes. Labour Economics 15: 96-117.

20. Laha RG, Rohatgi VK (1979) Probability theory. New York: Wiley. 557 p.

21. Lauer Ch (2003) Family background, cohort and education: A French-German

comparison based on a multivariate ordered probit model of educational

attainment. Labour Economics 10: 231-251.

22. Magee L, Burbidge J, Robb L (2000) The correlation between husband’s and

wife’s education: Canada 1971-1996. Social and Economic Dimensions of an

Aging Population Research Papers, 24, McMaster University.

23. Monfardini C, Radice R (2008) Testing exogeneity in the bivariate probit

model: A Monte Carlo study. Oxford Bulletin of Economics and Statistics 70(2):

271-282.

24. Murphy A (2007) Score tests of normality in bivariate probit models. Economics

Letters 95: 374-379.

25. Pudney S, Shields M (2000) Gender, race, pay and promotion in the British

nursing profession: Estimation of a generalized ordered probit model. Journal of

Applied Econometrics 15(4): 367-399.

26. Sajaia Z (2008) Maximum likelihood estimation of a bivariate ordered probit

model: implementation and Monte Carlo simulations. Available:

http://www.adeptanalytics.org/download/ado/bioprobit/bioprobit.pdf via the

Internet. Accessed 12 Jan 2011.

27. Sajaia Z (2008) BIOPROBIT: module for bivariate ordered probit regression. The

World Bank. Available: http://fmwww.bc.edu/RePEc/bocode/b via the Internet.

Accessed 13 Jan 2011.

28. Salomon J, Tandon A, Murray CJL, World Health Survey Pilot Study

Collaborating Group (2004) Comparability of self-rated health: cross sectional

multi-country survey using anchoring vignettes. British Medical Journal 328: 258-

263.

29. Scott DM, Axhausen KW (2006) Household mobility tool ownership: modeling

interactions between cars and season tickets. Transportation 33(4):

311-328.

30. Silva J (2001) A score test for non-nested hypotheses with applications to discrete

response models. Journal of Applied Econometrics 16(5): 577-598.

31. Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Boston:

Pearson Education. 769 p.

32. Tandon A, Murray CJL, Salomon JA, King G (2003) Statistical models for

enhancing cross-population comparability. In: Murray CJL, Evans DB editors.

Health systems performance assessment: debates, methods and empiricisms.

Geneva: World Health Organization. pp. 727-746.

33. Terza JV (1985) Ordinal probit: a generalization. Communications in Statistics

14(1): 1–11.

34. Tobias J, Li M (2006) Calculus attainment and grades received in intermediate

economic theory. Journal of Applied Econometrics 21(6): 893-896.

35. Vuong Q (1989) Likelihood ratio tests for model selection and non-nested

hypotheses. Econometrica 57: 307–334.

36. Weiss AA (1993) A bivariate ordered probit model with truncation: helmet use and

motorcycle injuries. Applied Statistics 42(3): 487-499.

37. Wright RA (1995) BIVOPROB: Computer program for maximum-likelihood

estimation of bivariate ordered-probit models for censored data, Version 11.92. by

Charles A. Calhoun. The Economic Journal 105(430): 786-787.

Figure Legends

Figure 1. Distribution of the responses obtained for self-reports and vignettes under a

BIHOPIT scenario.

Figure Legend 1. The data set consists of 1,000 observations from a randomly simulated

population of two countries (482 respondents for first country and 512 respondents for

second country). The stackbars show the distribution of their answers for self-report

(Self) and 5 vignette (1-5) questions.

Figure 2. Bihopit prediction for simulated dataset under a BIHOPIT scenario.

Figure Legend 2. The scores which are given by conditional mean posterior estimation

are plotted against the true latent scores of BIHOPIT model for the two questions

(domain A and B). The two countries are plotted using different colors for markers.

Figure 3. Chopit prediction for simulated dataset under a BIHOPIT scenario.

Figure Legend 3. The scores which are given by chopit posterior estimation using the

random effect CHOPIT model are plotted against the true latent scores of BIHOPIT

model for the two questions (domain A and B). In the posterior scores estimation

parameter randomization was used with Monte Carlo simulation of sample size 30 and

averaging. The two countries are plotted using different colors for markers.

Figure 4. Distribution of the responses obtained for self-reports and vignettes under a

CHOPIT scenario.

Figure Legend 4. The data set consists of 1,000 observations from a randomly simulated

population of two countries (496 respondents for first country and 504 respondents for

second country). The stackbars show the distribution of their answer for self-report (Self)

and 5 vignette (1-5) questions.

Figure 5. Bihopit prediction for simulated dataset under a CHOPIT scenario.

Figure Legend 5. The scores which are given by conditional mean posterior estimation

are plotted against the true latent scores of CHOPIT model for the two questions (domain

A and B). The two countries are plotted using different colors for markers.

Figure 6. Chopit prediction for simulated dataset under a CHOPIT scenario.

Figure Legend 6. The scores which are given by chopit posterior estimation using the

random effect CHOPIT model are plotted against the true latent scores of CHOPIT model

for the two questions (domain A and B). In the posterior scores estimation parameter

randomization was used with Monte Carlo simulation of sample size 30 and averaging.

The two countries are plotted using different colors for markers.

Figure 7. Distribution of the responses obtained for self-reports and vignettes for

cognition domain of SAGE dataset.

Figure Legend 7. The data set consists of 3,664 respondents from China and 2,782

respondents from India. The stackbars show the distribution of their answers for self-

report (Self) and 5 vignette (1-5) questions.

Figure 8. Comparing the BIOPROBIT and BIHOPIT models on SAGE dataset for China

and India.

Figure Legend 8. The figure compares China and India using scores obtained by the BIOPROBIT and posterior BIHOPIT models. China is plotted with circle markers and India with empty circles; magenta denotes BIOPROBIT scores and blue denotes posterior BIHOPIT scores obtained by conditional means. The scores were obtained by averaging the two subdomain scores for each model. For comparison, the BIOPROBIT scores were rescaled by shifting them by the average of the self-report constant parameters of the BIHOPIT model, which is (4.0664+3.9505)/2=4.00845.

Figure 9. Cut-points for BIOPROBIT and BIHOPIT models for remembering and

learning questions in SAGE dataset.

Figure Legend 9. The fixed axis denotes the BIOPROBIT model, where the cut-points are constants, i.e., they do not depend on countries. On the axes for China and India the averages of the cut-point estimates are plotted.

Tables

Table 1. Mean categorical responses for self-reports and vignettes under BIHOPIT scenario.

          Domain A                                  Domain B
Country   Self   Vig1   Vig2   Vig3   Vig4   Vig5   Self   Vig1   Vig2   Vig3   Vig4   Vig5
1         3.16   4.08   3.13   2.19   1.46   1.14   3.35   4.99   4.93   4.36   3.2    1.8
2         2.52   1.6    1.21   1.02   1      1      3.17   4.89   4.3    3      1.6    1.11

Table 2. Estimation results based on BIHOPIT simulation: BIOPROBIT, BIHOPIT, and CHOPIT models.

Variable        True     Est. BIOPROBIT        Est. BIHOPIT          Est. CHOPIT
                BIHOPIT  param.     std. err.  param.     std. err.  param.     std. err.
vigA   1        2                              3.061      0.104      3.285      0.063
       2        1.2                            2.243      0.1        2.687      0.059
       3        0.4                            1.448      0.101      1.901      0.055
       4        -0.4                           0.669      0.107      0.975      0.055
selfA  age      -0.02    0.026      0.001      -0.0196    0.001      -0.022     0.002
       sex      0.1      0.222      0.067      0.037**    0.049      -0.022**   0.057
       country  2        -0.581     0.068      1.968      0.062      1.091      0.058
       cons     2                              3.045      0.119      2.757      0.102
vigB   1        3                              4.046      0.104
       2        2                              3.003      0.074
       3        1                              1.985      0.064
       4        0                              0.987      0.063
selfB  age      -0.02    -0.073     0.002      -0.02      0.001
       sex      -0.1     -0.437     0.07       -0.094*    0.041
       country  1        -0.201     0.07       1.072      0.045
       cons     1                              1.948      0.082
seA             0.5                            0.46       0.040      1.023
seB             0.2                            0.2        0.037      5.48E-6
corr            0.7      0.652      0.023      0.69       0.019

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 3. Cutpoint estimation based on BIHOPIT simulation: BIOPROBIT, BIHOPIT, and CHOPIT models.

Variable        True     Est. BIOPROBIT        Est. BIHOPIT          Est. CHOPIT
                BIHOPIT  param.     std. err.  param.     std. err.  param.     std. err.
cutA1  age      -0.03                          -0.029     0.001      -0.012     0.0
       sex      -0.005                         -0.066**   0.047      -0.034**   0.032
       country  2.2                            2.177      0.062      1.388      0.0359
       cons     2        0.139      0.103      3.046      0.117      2.48       0.073
cutA2  age      -0.05                          -0.056     0.002      -0.022     0.002
       sex      -0.01                          -0.055**   0.045      0.005**    0.067
       country  0.3                            0.352      0.046      0.066**    0.074
       cons     0.8      0.936      0.107      0.957      0.088      -0.046**   0.114
cutA3  age      0.01                           0.014      0.002      -0.001**   0.002
       sex      0.02                           0.029**    0.058      0.139*     0.069
       country  0.2                            -0.087**   0.059      -0.347     0.074
       cons     -2       1.391      0.113      -2.113     0.106      -1.395     0.125
cutA4  age      0.05                           0.045      0.002      0.013      0.002
       sex      -0.005                         -0.087**   0.047      -0.069**   0.055
       country  0.2                            0.296      0.049      0.06**     0.06
       cons     -4       2.15       0.12       -3.652     0.113      -1.529     0.1
cutB1  age      -0.005                         -0.005     0.001
       sex      0.005                          -0.012**   0.041
       country  1.1                            1.156      0.0452
       cons     -0.2     -5.614     0.19       0.767      0.077      -1.557     0.042
cutB2  age      0.01                           0.009      0.002
       sex      -0.01                          -0.011**   0.058
       country  -0.2                           -0.169     0.057
       cons     -2.1     -4.61      0.166      -2.062     0.145      -0.539     0.079
cutB3  age      -0.02                          -0.016     0.001
       sex      0.02                           0.078*     0.039
       country  -0.2                           -0.144     0.038
       cons     -0.5     -3.5       0.14       -0.698     0.081      0.45       0.079
cutB4  age      0.005                          0.005      0.002
       sex      -0.005                         -0.113     0.036
       country  0.1                            0.087*     0.035
       cons     -1.4     -1.971     0.121      -1.346     0.081      0.127*     0.063

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 4. Mean categorical responses for self-reports and vignettes under CHOPIT scenario.

          Domain A                                  Domain B
Country   Self   Vig1   Vig2   Vig3   Vig4   Vig5   Self   Vig1   Vig2   Vig3   Vig4   Vig5
1         3.41   4.58   3.69   2.71   1.65   1.24   3.99   4.7    3.93   2.83   1.86   1.25
2         2.65   2.59   1.66   1.18   1.01   1      3.13   2.8    1.76   1.21   1.03   1

Table 5. Estimation results based on CHOPIT simulation: BIOPROBIT, BIHOPIT, and CHOPIT models.

Variable        True     Est. BIOPROBIT        Est. BIHOPIT          Est. CHOPIT
                CHOPIT   parameter  std. err.  parameter  std. err.  parameter  std. err.
vigA   1        2.5                            3.379      0.102      3.501      0.074
       2        1.6                            2.486      0.097      2.578      0.07
       3        0.7                            1.7        0.096      1.743      0.069
       4        -0.2                           0.753      0.102      0.827      0.073
selfA  age      -0.02    0.028      0.002      -0.022     0.001      -0.02      0.001
       sex      0.1      0.144*     0.067      0.049**    0.044      0.069**    0.037
       country  1.5      -0.719     0.069      1.7        0.05       1.56       0.042
       cons     2                              3.001      0.114      3.016      0.088
vigB   1        2.5                            3.367      0.1
       2        1.6                            2.478      0.095
       3        0.7                            1.673      0.094
       4        -0.2                           0.859      0.098
selfB  age      -0.02    0.019      0.002      -0.019     0.001
       sex      0.1      0.262      0.069      0.073**    0.044
       country  1.5      -0.776     0.07       1.509      0.05
       cons     2                              2.884      0.109
seA             0.2                            0.283      0.037      0.197
seB             0.3                            0.3        0.04       0.279
corr                     0.666      0.022      0.509      0.017

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 6. Cutpoint estimation based on CHOPIT simulation: BIOPROBIT, BIHOPIT, and CHOPIT models.

Variable        True     Est. BIOPROBIT        Est. BIHOPIT          Est. CHOPIT
                CHOPIT   param.     std. err.  param.     std. err.  param.     std. err.
cutA1  age      -0.03                          -0.03      0.001      -0.03      0.0
       sex      -0.005                         -0.04**    0.043      -0.03**    0.033
       country  1.7                            1.7        0.051      1.742      0.037
       cons     2        -0.22      0.105      3          0.114      3.026      0.084
cutA2  age      -0.05                          -0.048     0.002      -0.051     0.003
       sex      0.01                           0.151      0.042      0.06**     0.064
       country  0.4                            0.438      0.044      0.479      0.065
       cons     0.5      0.771      0.109      0.245      0.083      0.474      0.11
cutA3  age      0.01                           0.018      0.002      0.013      0.002
       sex      -0.02                          0.020**    0.067      -0.01**    0.066
       country  0.2                            0.080**    0.069      0.16       0.068
       cons     -2.5     1.124      0.113      -2.977     0.121      -2.609     0.118
cutA4  age      0.05                           0.056      0.002      0.05**     0.001
       sex      0.005                          0.097*     0.039      0.029      0.049
       country  0.2                            0.207      0.043      0.114*     0.052
       cons     -3.5     2.298      0.122      -4.042     0.108      -3.543     0.107
cutB1  age                                     -0.028     0.001
       sex                                     -0.035**   0.043
       country                                 1.645      0.05
       cons              -0.837     0.107      2.81       0.109      -0.103     0.016
cutB2  age                                     -0.047     0.003
       sex                                     0.078**    0.054
       country                                 0.633      0.058
       cons              -0.129     0.106      -0.325     0.105      -0.523     0.06
cutB3  age                                     0.011      0.002
       sex                                     0.055**    0.049
       country                                 0.235      0.052
       cons              0.471      0.111      -2.164     0.088      0.587      0.07
cutB4  age                                     0.049      0.001
       sex                                     -0.116*    0.046
       country                                 0.119*     0.046
       cons              1.071      0.113      -4.256     0.103      -0.831     0.051

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 7. Mean categorical responses for self-reports and vignettes in cognition domain of SAGE dataset.

          Remembering                               Learning
Country   Self   Vig1   Vig2   Vig3   Vig4   Vig5   Self   Vig1   Vig2   Vig3   Vig4   Vig5
China     4.42   4.87   4.39   3.59   3.08   1.8    4.22   4.87   4.46   3.44   3.02   1.7
India     4.09   4      3.73   3.27   2.81   2.85   3.92   3.92   3.67   3.1    2.68   2.71

Table 8. Estimation results for SAGE dataset: BIOPROBIT, CHOPIT, and BIHOPIT models.

Variable        Est. BIOPROBIT        Est. BIHOPIT                     Est. CHOPIT
                param.     std. err.  param.     std. err.  |u|-value  param.     std. err.
vigA   1                              2.692      0.0222                2.537      0.0158
       2                              1.998      0.0206                1.935      0.0147
       3                              1.235      0.0196                1.127      0.0139
       4                              0.760      0.0192                0.701      0.0135
selfA  age      -0.0277    0.00057    -0.0292    0.00075    1.592      -0.0293    0.00114
       sex      -0.1537    0.0154     -0.1452    0.0204     0.332      -0.1254    0.03086
       educ     0.1579     0.005      0.1724     0.0066     1.751      0.1719     0.01003
       country  -0.6309    0.0162     -0.4522    0.0215     6.638      -0.2553    0.03229
       cons                           4.0664     0.0715                3.7522     0.10418
vigB   1                              2.727      0.0224
       2                              2.111      0.0208
       3                              1.199      0.0195
       4                              0.803      0.0192
selfB  age      -0.0294    0.00056    -0.032     0.00076    2.754
       sex      -0.1311    0.015      -0.1127    0.0205     0.724
       educ     0.1794     0.0049     0.1877     0.0067     0.999
       country  -0.5083    0.0158     -0.3173    0.0215     7.159
       cons                           3.9505     0.0715
seA                                   1.0982     0.0084                0.5601
seB                                   1.1588     0.0079                0.9242
corr            0.7749     0.0036     0.8504     0.0057

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 9. Cutpoint estimation for SAGE dataset: BIOPROBIT, BIHOPIT, and CHOPIT models.

Variable        Est. BIOPROBIT        Est. BIHOPIT          Est. CHOPIT
                param.     std. err.  param.     std. err.  param.     std. err.
cutA1  age                            0.00098**  0.00077    -0.00034** 0.0006
       sex                            0.01737**  0.02245    0.04937    0.0166
       educ                           0.01227*   0.00705    0.02485    0.0053
       country                        0.26072    0.02416    -0.0862    0.01835
       cons     -4.681     0.0627     -0.89106   0.07153    -0.7713    0.0551
cutA2  age                            -0.00008** 0.00063    0.0011**   0.0005
       sex                            0.0227**   0.01988    0.0013**   0.0159
       educ                           -0.00487** 0.00607    -0.0169    0.005
       country                        0.21691    0.02208    0.4715     0.0177
       cons     -3.622     0.0548     -0.09203** 0.06133    -0.2837    0.0523
cutA3  age                            0.00007**  0.00053    0.00095**  0.0004
       sex                            -0.05411   0.01558    -0.0513    0.0139
       educ                           0.00584**  0.00493    -0.0029**  0.0044
       country                        -0.18954   0.01671    0.046958   0.0144
       cons     -2.823     0.053      0.01456**  0.04984    -0.1409    0.0459
cutA4  age                            0.00159    0.00046    0.0013     0.0005
       sex                            -0.01338** 0.01264    -0.0335    0.0131
       educ                           -0.01247   0.00403    -0.0219    0.0042
       country                        -0.165     0.01356    -0.0656    0.0139
       cons     -1.85      0.0518     0.02177**  0.04172    0.0189**   0.0433
cutB1  age                            0.00267    0.00075
       sex                            0.05574    0.02095
       educ                           0.0154*    0.00688
       country                        0.44305    0.02138
       cons     -4.098     0.0557     -0.9508    0.07027    0.1886     0.01517
cutB2  age                            -0.0017*   0.0007
       sex                            -0.03334** 0.01963
       educ                           -0.0199    0.00669
       country                        0.09922    0.0199
       cons     -3.276     0.0526     0.11744**  0.0666     -0.0169**  0.0143
cutB3  age                            0.00009**  0.00056
       sex                            0.00486**  0.01515
       educ                           0.00908**  0.00501
       country                        -0.21291   0.01552
       cons     -2.541     0.0512     -0.0805**  0.0515     0.0129**   0.0128
cutB4  age                            -0.0002**  0.0005
       sex                            -0.04039   0.01329
       educ                           -0.0179    0.00436
       country                        -0.24242   0.01413
       cons     -1.634     0.0504     0.12725    0.04467    -0.1163    0.012

** denotes the parameters which are not significant at level 95%, * denotes the parameters which are not significant at level 99% but they are significant at level 95%

Table 10. The BIOPROBIT, BIHOPIT, and posterior BIHOPIT scores for China and

India in age category greater or equal than 50.

Columns: Age; China (Obs, Bioprob, Bihopit, Post. Bih); India (Obs, Bioprob, Bihopit, Post. Bih)

50 111 2.815 2.764 2.669 113 1.987 2.104 2.141

51 138 2.747 2.692 2.548 70 2.014 2.133 2.103

52 156 2.726 2.668 2.695 90 1.999 2.117 2.07

53 135 2.686 2.626 2.513 55 1.97 2.085 2.121

54 150 2.594 2.528 2.468 58 1.898 2.007 2.017

55 172 2.565 2.497 2.506 141 1.81 1.915 1.896

56 141 2.548 2.477 2.637 48 1.947 2.06 1.807

57 165 2.524 2.453 2.413 36 1.859 1.964 2.101

58 123 2.453 2.377 2.524 74 1.812 1.915 2.094

59 122 2.429 2.353 2.514 34 1.828 1.932 1.966

60 153 2.428 2.349 2.290 137 1.661 1.754 1.732

61 96 2.415 2.336 2.303 20 1.912 2.02 1.729

62 85 2.327 2.243 2.18 41 1.676 1.769 1.711

63 128 2.315 2.231 2.228 52 1.65 1.742 1.839

64 77 2.307 2.221 2.069 35 1.655 1.747 2.088

65 94 2.237 2.145 2.075 152 1.54 1.624 1.734

66 78 2.278 2.188 2.082 23 1.641 1.728 2.058

67 100 2.221 2.128 1.988 21 1.668 1.758 1.819

68 75 2.208 2.115 2.174 52 1.478 1.554 1.639

69 87 2.119 2.019 1.924 25 1.517 1.595 1.806

70 86 2.059 1.956 2.055 90 1.364 1.435 1.58

71 71 2.073 1.969 1.992 9 1.393 1.465 1.374

72 96 2.043 1.938 1.932 22 1.498 1.577 1.218

73 70 1.983 1.873 1.608 25 1.418 1.491 1.837

74 74 1.945 1.833 1.827 22 1.361 1.427 1.194

75 79 1.961 1.847 1.866 45 1.248 1.308 1.164

76 55 1.775 1.652 1.668 4 1.422 1.485 2.039

77 56 1.815 1.694 1.772 3 1.716 1.799 1.818

78 43 1.807 1.682 1.862 15 1.179 1.234 1.135

79 43 1.729 1.601 1.766 5 1.389 1.459 1.952

80 27 1.715 1.587 1.274 29 1.008 1.053 1.161

81 34 1.633 1.5 1.308 2 1.068 1.107 0.177

82 27 1.541 1.401 1.263 4 1.046 1.089 1.162

83 17 1.57 1.431 1.391 7 1.034 1.08 0.668

84 9 1.576 1.438 1.267 4 1.116 1.163 1.093

85 4 1.867 1.742 2.299 14 0.85 0.889 0.949

86 11 1.701 1.571 1.19 3 1.075 1.121 0.78

87 11 1.463 1.321 1.683 2 0.67 0.704 1.322

88 8 1.719 1.583 1.822 3 0.81 0.854 0.403

89 8 1.507 1.362 1.311 4 0.613 0.643 -0.333

90 3 1.154 0.997 0.871 5 0.642 0.664 0.484

Supporting Information

Text S1

Maximum likelihood estimation in bivariate models

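In this supplement we describe the full-information maximum likelihood (FIML) estimation of the parameters of the bivariate models applied in this paper. Define the event that the answers of the $i$-th respondent are $k$ and $\ell$ for the two self-report questions, i.e.,

\[
E^i_{k\ell} = \left( y_{i1} = k,\; y_{i2} = \ell \right) = \left( \tau^1_{k-1} < y^*_{i1} \le \tau^1_k,\; \tau^2_{\ell-1} < y^*_{i2} \le \tau^2_\ell \right).
\]

If the error terms $\varepsilon_1 \sim N(0, \sigma_1^2)$ and $\varepsilon_2 \sim N(0, \sigma_2^2)$ in (1) are correlated with correlation coefficient $\rho$, then the probability of this event is given by

\[
\begin{aligned}
\Pr(E^i_{k\ell}) ={}& \Phi_2\!\left( \frac{\tau^1_k - x_i^T\beta_1}{\sigma_1}, \frac{\tau^2_\ell - x_i^T\beta_2}{\sigma_2}; \rho \right)
 - \Phi_2\!\left( \frac{\tau^1_{k-1} - x_i^T\beta_1}{\sigma_1}, \frac{\tau^2_\ell - x_i^T\beta_2}{\sigma_2}; \rho \right) \\
& - \Phi_2\!\left( \frac{\tau^1_k - x_i^T\beta_1}{\sigma_1}, \frac{\tau^2_{\ell-1} - x_i^T\beta_2}{\sigma_2}; \rho \right)
 + \Phi_2\!\left( \frac{\tau^1_{k-1} - x_i^T\beta_1}{\sigma_1}, \frac{\tau^2_{\ell-1} - x_i^T\beta_2}{\sigma_2}; \rho \right),
\end{aligned}
\]

where $\Phi_2$ denotes the cumulative distribution function of the standard bivariate normal distribution with correlation coefficient $\rho$. The log-likelihood of the BIOPROBIT model is:

\[
\ln L_{self} = \sum_{i=1}^{N} \sum_{k,\ell=1}^{K} I(y_{i1} = k,\; y_{i2} = \ell)\, \ln \Pr(E^i_{k\ell}) \qquad (8)
\]

under the restriction that $\sigma_1 = \sigma_2 = 1$, where $I(E)$ denotes the indicator function of an event $E$. Thus, the parameters of the BIOPROBIT model are $\beta_1, \beta_2$, $\tau^1_k, \tau^2_k$, $k = 1, \ldots, K-1$, and $\rho$.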
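The rectangle probability of a joint response and the resulting log-likelihood can be sketched in a few lines. The code below is only an illustration under stated assumptions: the unit-variance BIOPROBIT restriction, a bivariate normal CDF evaluated by simple Simpson integration rather than a library routine, and hypothetical names (`phi2`, `cell_prob`, `log_likelihood`) and inputs. It is not the estimation code used in the paper.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def phi2(a, b, rho, steps=2000):
    """Standard bivariate normal CDF at (a, b) with correlation |rho| < 1.

    Integrates pdf(x) * cdf((b - rho*x) / sqrt(1 - rho^2)) over x <= a
    with Simpson's rule on [-8, min(a, 8)] (steps must be even).
    """
    if a == -math.inf or b == -math.inf:
        return 0.0
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        inner = 1.0 if b == math.inf else STD.cdf((b - rho * x) / s)
        return STD.pdf(x) * inner

    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lo + m * h)
    return total * h / 3.0

def cell_prob(k, l, mu1, mu2, cuts1, cuts2, rho):
    """Probability of the joint response (k, l), categories numbered from 1.

    mu1, mu2 are the linear predictors; cuts1, cuts2 the K-1 interior cut-points.
    """
    t1 = [-math.inf] + list(cuts1) + [math.inf]
    t2 = [-math.inf] + list(cuts2) + [math.inf]
    a0, a1 = t1[k - 1] - mu1, t1[k] - mu1
    b0, b1 = t2[l - 1] - mu2, t2[l] - mu2
    return phi2(a1, b1, rho) - phi2(a0, b1, rho) - phi2(a1, b0, rho) + phi2(a0, b0, rho)

def log_likelihood(data, cuts1, cuts2, rho):
    """Sum of log cell probabilities over respondents; data holds (k, l, mu1, mu2)."""
    return sum(math.log(cell_prob(k, l, m1, m2, cuts1, cuts2, rho))
               for k, l, m1, m2 in data)

cuts = [-1.0, 0.0, 1.0, 2.0]               # hypothetical cut-points, K = 5
data = [(2, 3, 0.3, -0.2), (4, 4, 0.8, 0.5)]  # hypothetical respondents
print(log_likelihood(data, cuts, cuts, rho=0.7))
```

In practice a dedicated bivariate normal routine would replace `phi2`. The Simpson version is used here only so that the sketch has no dependencies outside the standard library.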

The maximum likelihood estimation in the BIHOPIT model is considerably more complicated, but it is a straightforward generalization of the estimation in the BIOPROBIT model. In order to incorporate information on vignette ratings and the two self-report questions, there are two components to the log-likelihood: the first component refers to estimation of cut-points using responses to vignettes, and the second component utilizes responses on the self-report questions. In this case the cut-points depend on the covariates, so we define the event $E^i_{k\ell}$ as before, replacing the fixed cut-points by the individual-dependent ones $\tau^1_{i,k}, \tau^2_{i,k}$. Then the self-report part of the log-likelihood is defined by (8). Moreover, define the event that the responses of the $i$-th respondent are $j$ and $k$ for the $\ell$-th pair of vignettes, i.e.,

\[
F^{i\ell}_{jk} = \left( y^v_{i\ell 1} = j,\; y^v_{i\ell 2} = k \right) = \left( \tau^1_{i,j-1} < y^{v*}_{i\ell 1} \le \tau^1_{i,j},\; \tau^2_{i,k-1} < y^{v*}_{i\ell 2} \le \tau^2_{i,k} \right).
\]

Then the probability of this event is given by

\[
\begin{aligned}
\Pr(F^{i\ell}_{jk}) ={}& \Phi_2\!\left( \tau^1_{i,j} - \theta^1_\ell,\; \tau^2_{i,k} - \theta^2_\ell;\; \rho \right)
 - \Phi_2\!\left( \tau^1_{i,j-1} - \theta^1_\ell,\; \tau^2_{i,k} - \theta^2_\ell;\; \rho \right) \\
& - \Phi_2\!\left( \tau^1_{i,j} - \theta^1_\ell,\; \tau^2_{i,k-1} - \theta^2_\ell;\; \rho \right)
 + \Phi_2\!\left( \tau^1_{i,j-1} - \theta^1_\ell,\; \tau^2_{i,k-1} - \theta^2_\ell;\; \rho \right).
\end{aligned}
\]

The vignette part of the log-likelihood is defined by

\[
\ln L_{vig} = \sum_{i=1}^{N} \sum_{\ell=1}^{L} \sum_{j,k=1}^{K} I(y^v_{i\ell 1} = j,\; y^v_{i\ell 2} = k)\, \ln \Pr(F^{i\ell}_{jk}).
\]

The overall log-likelihood is given by

\[
\ln L = \ln L_{self} + \ln L_{vig}. \qquad (8)
\]

Here we remark that in the definition of the self-report part of the log-likelihood we drop the constraint made for $\sigma_1, \sigma_2$ in the BIOPROBIT case. Thus, the parameters of the BIHOPIT model are the self-report regression coefficients $\beta_1, \beta_2$, the vignette means $\theta^1_\ell, \theta^2_\ell$, $\ell = 1, 2, \ldots, L$, the cut-point coefficients $\gamma^1_k, \gamma^2_k$, $k = 1, \ldots, K-1$, the self-report variances $\sigma_1^2, \sigma_2^2$ and the common correlation coefficient $\rho$.

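The maximum likelihood estimation for the CHOPIT model is derived in King et al. []. Here
we note that the likelihood for the self-report part is defined by

$$
L_{self} = \prod_{i=1}^{N} \int_{-\infty}^{\infty} \prod_{j=1}^{M} \prod_{k=1}^{K}
 \left[ \Phi(\tau^j_{i,k} - x_i^T\beta - \eta) - \Phi(\tau^j_{i,k-1} - x_i^T\beta - \eta) \right]^{I(y_{ij} = k)}
 \frac{1}{\omega}\,\varphi\!\left(\frac{\eta}{\omega}\right) d\eta,
$$

where $\eta$ denotes the random effect and $\omega^2$ its variance, and for the vignette part it is defined by

$$
L_{vig} = \prod_{i=1}^{N} \prod_{j=1}^{M} \prod_{k=1}^{K} \prod_{\ell=1}^{L}
 \left[ \Phi(\tau^j_{i,k} - \theta_{\ell,j}) - \Phi(\tau^j_{i,k-1} - \theta_{\ell,j}) \right]^{I(y^v_{ij\ell} = k)}.
$$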


Since the self-report part of the likelihood involves one-dimensional integrals, we cannot
maximize it in the standard way. One approach is to approximate these integrals by Gauss-
Hermite quadrature; the other is the method of maximum simulated likelihood
(MSL), see Greene [, Section 21.5.1].
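The quadrature approach can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it assumes a normally distributed random effect with standard deviation `omega`, unit error variance, and a fixed 5-point Gauss-Hermite rule (in practice more nodes would normally be used). The function names and the toy values are hypothetical.

```python
import math

# 5-point Gauss-Hermite nodes and weights for the weight function exp(-t^2).
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_expectation(g, omega):
    """Approximate E[g(eta)] for eta ~ N(0, omega^2) using the change of
    variable eta = sqrt(2) * omega * t."""
    return sum(w * g(math.sqrt(2.0) * omega * t)
               for t, w in zip(GH_NODES, GH_WEIGHTS)) / math.sqrt(math.pi)

def self_report_contribution(y, mu, taus, omega):
    """Likelihood contribution of one respondent.
    y[j]    : observed category (1..K) for self-report question j
    taus[j] : the K-1 interior cut-points for question j
    mu      : linear predictor x_i^T beta
    """
    def conditional(eta):
        # product over questions of Pr(y_j | eta): responses are independent given eta
        p = 1.0
        for yj, tj in zip(y, taus):
            c = [-math.inf] + list(tj) + [math.inf]
            p *= Phi(c[yj] - mu - eta) - Phi(c[yj - 1] - mu - eta)
        return p
    return normal_expectation(conditional, omega)

# Toy check with one binary question: the integral has the closed form
# Phi((tau - mu) / sqrt(1 + omega^2)).
approx = self_report_contribution([1], 0.2, [[0.5]], 0.5)
exact = Phi((0.5 - 0.2) / math.sqrt(1.0 + 0.5 ** 2))
```

For a single question the integral is available in closed form, which makes it a convenient check on the quadrature before it is applied to the full product over questions.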

These log-likelihoods are maximized using Stata's modified Newton-Raphson
(NR) procedure. Let $\psi$ denote the vector of all parameters in the model, let
$\mathcal{L}(\psi)$ denote the log-likelihood, and denote its gradient and Hessian by $g$ and $H$.
The procedure can be summarized as follows.

1. Start with a guess $\psi_0$.

2. Calculate a direction vector $d = -H(\psi_i)^{-1} g(\psi_i)$ for the $i$-th iteration.

3. Calculate a new guess $\psi_{i+1} = \psi_i + \lambda d$, where $\lambda$ is a scalar defined by the
following algorithm

a. Start with $\lambda = 1$.

b. If $\mathcal{L}(\psi_i + d) > \mathcal{L}(\psi_i)$ then try $\lambda = 2$. If $\mathcal{L}(\psi_i + 2d) > \mathcal{L}(\psi_i + d)$ then try
$\lambda = 3$ and so on.

c. If $\mathcal{L}(\psi_i + d) \le \mathcal{L}(\psi_i)$ then back up and try $\lambda = 0.5$. If $\mathcal{L}(\psi_i + 0.5d) \le \mathcal{L}(\psi_i)$
then back up and try $\lambda = 0.25$ and so on.

4. Go to step 2 and repeat.
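The steps above can be sketched in a few lines of Python. This is an illustrative re-implementation of the stepping rule for a two-parameter problem, not Stata's optimizer. The Poisson regression log-likelihood used as the test objective, the toy data and all function names are assumptions chosen only because the gradient and Hessian are simple to write down.

```python
import math

def newton_raphson(f, grad, hess, b0, tol=1e-8, max_iter=50):
    """Maximise f over two parameters with Newton-Raphson and the
    step-lengthening / step-halving rule of steps 1-4."""
    b = list(b0)
    for _ in range(max_iter):
        g = grad(b)
        if math.hypot(g[0], g[1]) < tol:
            break
        (h00, h01), (_h10, h11) = hess(b)
        det = h00 * h11 - h01 * h01
        # direction d = -H^{-1} g, written out for a symmetric 2x2 Hessian
        d = [-(h11 * g[0] - h01 * g[1]) / det,
             -(h00 * g[1] - h01 * g[0]) / det]
        move = lambda s: [b[0] + s * d[0], b[1] + s * d[1]]
        f0, s = f(b), 1.0
        if f(move(1.0)) > f0:
            while s < 10 and f(move(s + 1.0)) > f(move(s)):
                s += 1.0             # step 3b: try a longer step
        else:
            while s > 1e-10 and f(move(s)) <= f0:
                s *= 0.5             # step 3c: back up
        b = move(s)
    return b

# Toy objective: Poisson regression log-likelihood (constant term omitted).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 1, 2, 4, 6]

def loglik(b):
    return sum(y * (b[0] + b[1] * x) - math.exp(b[0] + b[1] * x)
               for x, y in zip(xs, ys))

def gradient(b):
    r = [y - math.exp(b[0] + b[1] * x) for x, y in zip(xs, ys)]
    return [sum(r), sum(ri * x for ri, x in zip(r, xs))]

def hessian(b):
    mu = [math.exp(b[0] + b[1] * x) for x in xs]
    h01 = -sum(m * x for m, x in zip(mu, xs))
    return [[-sum(mu), h01], [h01, -sum(m * x * x for m, x in zip(mu, xs))]]

b_hat = newton_raphson(loglik, gradient, hessian, [0.0, 0.0])
```

Starting from the origin, the full Newton step overshoots in this example, so the first iteration exercises the back-up branch (step 3c) before the iteration settles into ordinary Newton steps.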

If the Hessian is not invertible then the modified Marquardt algorithm is applied, see
Gould et al. []. In the computation of the gradient and Hessian of the log-likelihoods we
need the first and second derivatives of the cumulative distribution function $\Phi_2$ of the
standard bivariate normal distribution defined by

$$
\Phi_2(x_1, x_2, \rho) := \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \varphi_2(u_1, u_2, \rho)\, du_2\, du_1,
$$

where $\varphi_2$ denotes the bivariate standard normal probability density with correlation
coefficient $\rho$ defined by

$$
\varphi_2(x_1, x_2, \rho) := \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left( -\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)} \right).
$$

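By easy algebra we have

$$
\Phi_2(x_1, x_2, \rho) = \int_{-\infty}^{x_1} \varphi(u)\, \Phi\!\left( \frac{x_2 - \rho u}{\sqrt{1-\rho^2}} \right) du,
$$

where $\Phi$ and $\varphi$ denote the cumulative distribution function and the probability density function of the standard
normal distribution, respectively. Thus, we have for the first derivatives of $\Phi_2$ that

$$
\frac{\partial \Phi_2}{\partial x_1} = \varphi(x_1)\, \Phi\!\left( \frac{x_2 - \rho x_1}{\sqrt{1-\rho^2}} \right), \qquad
\frac{\partial \Phi_2}{\partial x_2} = \varphi(x_2)\, \Phi\!\left( \frac{x_1 - \rho x_2}{\sqrt{1-\rho^2}} \right), \qquad
\frac{\partial \Phi_2}{\partial \rho} = \varphi_2(x_1, x_2, \rho).
$$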
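The single-integral representation and the first derivatives can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the estimation code: $\Phi_2$ is approximated by Simpson's rule with the lower limit truncated at $-8$, the derivatives are compared with central finite differences, and all function names and the evaluation point are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi2(x1, x2, rho):
    """Standard bivariate normal density with correlation rho."""
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def Phi2(x1, x2, rho, lower=-8.0, n=400):
    """Simpson's rule applied to phi(u) * Phi((x2 - rho*u)/sqrt(1 - rho^2))."""
    upper = min(x1, 8.0)              # assumption: truncate the range at +/- 8
    if upper <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    f = lambda u: phi(u) * Phi((x2 - rho * u) / s)
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * f(lower + m * h)
    return total * h / 3.0

def dPhi2_dx1(x1, x2, rho):
    return phi(x1) * Phi((x2 - rho * x1) / math.sqrt(1.0 - rho * rho))

def dPhi2_dx2(x1, x2, rho):
    return phi(x2) * Phi((x1 - rho * x2) / math.sqrt(1.0 - rho * rho))

def dPhi2_drho(x1, x2, rho):
    return phi2(x1, x2, rho)

def central_diff(f, t, h=1e-4):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

x1, x2, rho = 0.3, -0.2, 0.5          # arbitrary evaluation point
fd_x1 = central_diff(lambda t: Phi2(t, x2, rho), x1)
fd_x2 = central_diff(lambda t: Phi2(x1, t, rho), x2)
fd_rho = central_diff(lambda r: Phi2(x1, x2, r), rho)
```

At the origin the bivariate CDF has the known value $1/4 + \arcsin(\rho)/(2\pi)$, which gives an independent check of `Phi2` itself.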


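For the second derivatives of $\Phi_2$ we obtain, see e.g. Weiss [, Appendix A],

$$
\frac{\partial^2 \Phi_2}{\partial x_1^2} = -x_1\, \varphi(x_1)\, \Phi\!\left( \frac{x_2 - \rho x_1}{\sqrt{1-\rho^2}} \right) - \rho\, \varphi_2(x_1, x_2, \rho),
$$

$$
\frac{\partial^2 \Phi_2}{\partial x_2^2} = -x_2\, \varphi(x_2)\, \Phi\!\left( \frac{x_1 - \rho x_2}{\sqrt{1-\rho^2}} \right) - \rho\, \varphi_2(x_1, x_2, \rho),
$$

$$
\frac{\partial^2 \Phi_2}{\partial x_1 \partial x_2} = \varphi_2(x_1, x_2, \rho),
$$

$$
\frac{\partial^2 \Phi_2}{\partial x_1 \partial \rho} = \frac{\rho x_2 - x_1}{1-\rho^2}\, \varphi_2(x_1, x_2, \rho),
$$

$$
\frac{\partial^2 \Phi_2}{\partial x_2 \partial \rho} = \frac{\rho x_1 - x_2}{1-\rho^2}\, \varphi_2(x_1, x_2, \rho),
$$

$$
\frac{\partial^2 \Phi_2}{\partial \rho^2} = \frac{\rho(1-\rho^2) + (x_1 - \rho x_2)(x_2 - \rho x_1)}{(1-\rho^2)^2}\, \varphi_2(x_1, x_2, \rho).
$$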
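As a consistency check, the second derivatives can be compared with finite differences of the first derivatives. The Python sketch below is illustrative only. It uses the relations $\partial \Phi_2 / \partial x_1 = \varphi(x_1)\Phi(\cdot)$ and $\partial \Phi_2 / \partial \rho = \varphi_2$, so no numerical integration is needed. The function names and the evaluation point are hypothetical.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi2(x1, x2, rho):
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def d_x1(x1, x2, rho):
    """First derivative of Phi2 with respect to x1."""
    return phi(x1) * Phi((x2 - rho * x1) / math.sqrt(1.0 - rho * rho))

def d2_x1x1(x1, x2, rho):
    return -x1 * d_x1(x1, x2, rho) - rho * phi2(x1, x2, rho)

def d2_x1x2(x1, x2, rho):
    return phi2(x1, x2, rho)

def d2_x1rho(x1, x2, rho):
    return (rho * x2 - x1) / (1.0 - rho * rho) * phi2(x1, x2, rho)

def d2_rhorho(x1, x2, rho):
    r2 = 1.0 - rho * rho
    num = rho * r2 + (x1 - rho * x2) * (x2 - rho * x1)
    return num / (r2 * r2) * phi2(x1, x2, rho)

def central_diff(f, t, h=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

x1, x2, rho = 0.4, -0.7, 0.3          # arbitrary evaluation point
checks = {
    "x1x1": (central_diff(lambda t: d_x1(t, x2, rho), x1), d2_x1x1(x1, x2, rho)),
    "x1x2": (central_diff(lambda t: d_x1(x1, t, rho), x2), d2_x1x2(x1, x2, rho)),
    # d/dx1 of dPhi2/drho = phi2 gives the mixed (x1, rho) derivative
    "x1rho": (central_diff(lambda t: phi2(t, x2, rho), x1), d2_x1rho(x1, x2, rho)),
    # the same mixed derivative obtained in the other order
    "rhox1": (central_diff(lambda r: d_x1(x1, x2, r), rho), d2_x1rho(x1, x2, rho)),
    "rhorho": (central_diff(lambda r: phi2(x1, x2, r), rho), d2_rhorho(x1, x2, rho)),
}
max_error = max(abs(num - exact) for num, exact in checks.values())
```

The derivatives with respect to $x_2$ follow by exchanging the roles of the two arguments, so they are not checked separately here.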

Text S2

Posterior prediction by conditional means

In this supplement a formula is derived for computing the conditional mean (7) in the
context of the BIHOPIT model. We define the conditional means of the two latent variables
$y_1^*, y_2^*$ with respect to the observed responses $y_1, y_2$ as the conditional expectations
$E(y_1^* \mid y_1, y_2)$ and $E(y_2^* \mid y_1, y_2)$. By definition of the conditional expectation these are
$\{1, \ldots, K\}^2 \to \mathbb{R}$ functions. The general formula for conditional expectation is

$$
E(\xi \mid A) = \frac{E(\xi\, I(A))}{P(A)},
$$

where $\xi$ is a random variable and $A$ is an event, see formula (6.1.3) in Laha and Rohatgi
[]. Let us apply this formula with the choice $\xi := y_1^*$ (or $y_2^*$) and $A := \{ y_1 = k,\ y_2 = \ell \}$.
Using latent variables the event $A$ can be expressed in the form

$$
A = \{ \tau^1_{k-1} < y_1^* \le \tau^1_k,\ \tau^2_{\ell-1} < y_2^* \le \tau^2_\ell \}.
$$

For the sake of simplicity we suppose that $(y_1^*, y_2^*)$ has a standard bivariate normal
distribution with correlation coefficient $\rho$. Then the conditional expectation
$E(y_1^* \mid y_1, y_2)$ can be expressed as the ratio $A / B$, where

$$
A := \int_{\tau^1_{k-1}}^{\tau^1_k} \int_{\tau^2_{\ell-1}}^{\tau^2_\ell} x_1\, \varphi_2(x_1, x_2, \rho)\, dx_2\, dx_1
$$

and

$$
B := \Pr(y_1 = k,\ y_2 = \ell) = \int_{\tau^1_{k-1}}^{\tau^1_k} \int_{\tau^2_{\ell-1}}^{\tau^2_\ell} \varphi_2(x_1, x_2, \rho)\, dx_2\, dx_1.
$$

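Define the function $\Psi$ as

$$
\Psi(\tau_1, \tau_2, \rho) := \int_{-\infty}^{\tau_1} \int_{-\infty}^{\tau_2} x_1\, \varphi_2(x_1, x_2, \rho)\, dx_2\, dx_1.
$$

By easy algebra we have

$$
\Psi(\tau_1, \tau_2, \rho) = \int_{-\infty}^{\tau_1} x\, \varphi(x)\, \Phi\!\left( \frac{\tau_2 - \rho x}{\sqrt{1-\rho^2}} \right) dx.
$$

By integrating by parts we obtain

$$
\Psi(\tau_1, \tau_2, \rho) = -\varphi(\tau_1)\, \Phi\!\left( \frac{\tau_2 - \rho \tau_1}{\sqrt{1-\rho^2}} \right)
 - \rho\, \varphi(\tau_2)\, \Phi\!\left( \frac{\tau_1 - \rho \tau_2}{\sqrt{1-\rho^2}} \right),
$$

and the numerator $A$ can be computed as

$$
A = \Psi(\tau^1_k, \tau^2_\ell, \rho) - \Psi(\tau^1_{k-1}, \tau^2_\ell, \rho) - \Psi(\tau^1_k, \tau^2_{\ell-1}, \rho) + \Psi(\tau^1_{k-1}, \tau^2_{\ell-1}, \rho).
$$

The denominator $B$ is given by a similar formula

$$
B = \Phi_2(\tau^1_k, \tau^2_\ell, \rho) - \Phi_2(\tau^1_{k-1}, \tau^2_\ell, \rho) - \Phi_2(\tau^1_k, \tau^2_{\ell-1}, \rho) + \Phi_2(\tau^1_{k-1}, \tau^2_{\ell-1}, \rho).
$$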

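In the general case $y_1^* \sim N(x^T\beta_1, \sigma_1^2)$ and $y_2^* \sim N(x^T\beta_2, \sigma_2^2)$ with correlation coefficient
$\rho$. By standardization the conditional means can be expressed as

$$
E(y_1^* \mid y_1 = k,\ y_2 = \ell) = x^T\beta_1 + \sigma_1 \frac{A_1}{C}
\quad \text{and} \quad
E(y_2^* \mid y_1 = k,\ y_2 = \ell) = x^T\beta_2 + \sigma_2 \frac{A_2}{C},
$$

where $A_1$ and $A_2$ are computed similarly to $A$, replacing the cut-points by their
standardized versions $\tilde{\tau}^i_k = (\tau^i_k - x^T\beta_i)/\sigma_i$, $i = 1, 2$, and interchanging the two kinds of cut-points
$\tau^1, \tau^2$ for $A_2$. Finally, $C$ is defined similarly to $B$, again using the standardized cut-
points in the formula.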
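The formulas of this supplement can be put together as in the following Python sketch. It is a minimal illustration under stated assumptions rather than the code used for the paper. The function names (`Psi`, `rect`, `conditional_means`) and the toy parameter values are hypothetical, $\Phi_2$ is approximated by Simpson's rule with infinite limits truncated at $\pm 8$, and the closed form of the truncated first moment is set to zero at infinite arguments.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi2(x1, x2, rho, lower=-8.0, n=400):
    """Bivariate standard normal CDF by Simpson's rule (limits truncated at +/- 8)."""
    upper = min(x1, 8.0)
    if upper <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    f = lambda u: phi(u) * Phi((x2 - rho * u) / s)
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * f(lower + m * h)
    return total * h / 3.0

def Psi(t1, t2, rho):
    """Integral of x1 * phi2(x1, x2, rho) over (-inf, t1] x (-inf, t2]."""
    s = math.sqrt(1.0 - rho * rho)
    def term(a, b):
        # phi(a) * Phi((b - rho*a)/s); the term vanishes when a is infinite
        return 0.0 if math.isinf(a) else phi(a) * Phi((b - rho * a) / s)
    return -term(t1, t2) - rho * term(t2, t1)

def rect(F, a1, b1, a2, b2, rho):
    """Inclusion-exclusion of F over the rectangle (a1, b1] x (a2, b2]."""
    return F(b1, b2, rho) - F(a1, b2, rho) - F(b1, a2, rho) + F(a1, a2, rho)

def conditional_means(k, l, xb1, xb2, s1, s2, tau1, tau2, rho):
    """E(y1* | y1=k, y2=l) and E(y2* | y1=k, y2=l) with standardized cut-points."""
    c1 = [-math.inf] + list(tau1) + [math.inf]
    c2 = [-math.inf] + list(tau2) + [math.inf]
    a1, b1 = (c1[k - 1] - xb1) / s1, (c1[k] - xb1) / s1
    a2, b2 = (c2[l - 1] - xb2) / s2, (c2[l] - xb2) / s2
    C = rect(Phi2, a1, b1, a2, b2, rho)
    A1 = rect(Psi, a1, b1, a2, b2, rho)
    A2 = rect(Psi, a2, b2, a1, b1, rho)   # roles of the two cut-point sets interchanged
    return xb1 + s1 * A1 / C, xb2 + s2 * A2 / C

# Toy values: K = 3 categories in both domains.
tau1, tau2 = [-0.5, 0.7], [-0.2, 1.0]
e1, e2 = conditional_means(2, 2, 0.3, -0.1, 1.5, 0.8, tau1, tau2, 0.5)
```

By construction each conditional mean must lie between the two cut-points that bound the observed category, which gives a simple plausibility check for any cell, including the outer cells with infinite limits.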

Text S3

Simulation study design

Three independent covariates (age, sex, and country) were generated by simple Monte
Carlo simulation using uniform pseudo-random numbers. Sex and country were assumed to follow a
Bernoulli distribution with mean 0.5. The continuous covariate age was generated as a
truncated normal random number on the interval (18, 100) with mean 35 and standard
deviation 25; the values of age were then rounded to integers.
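A covariate generator of this kind can be sketched as follows. This is an illustration, not the simulation code of the study. It assumes that the mean 35 and standard deviation 25 refer to the normal distribution before truncation, uses the inverse-CDF method to turn uniform pseudo-random numbers into truncated normal draws, and the function name and seed are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_covariates(n, seed=2012):
    """Return n tuples (age, sex, country)."""
    rng = random.Random(seed)
    std = NormalDist()                       # standard normal
    mean, sd, lo, hi = 35.0, 25.0, 18.0, 100.0
    # probabilities of the truncation bounds under the untruncated normal
    p_lo = std.cdf((lo - mean) / sd)
    p_hi = std.cdf((hi - mean) / sd)
    rows = []
    for _ in range(n):
        # inverse-CDF draw: a uniform number mapped into [p_lo, p_hi)
        u = p_lo + (p_hi - p_lo) * rng.random()
        age = round(mean + sd * std.inv_cdf(u))
        sex = 1 if rng.random() < 0.5 else 0       # Bernoulli(0.5)
        country = 1 if rng.random() < 0.5 else 0   # Bernoulli(0.5)
        rows.append((age, sex, country))
    return rows

sample = simulate_covariates(2000)
```

Note that truncation shifts the mean of age upwards: under the assumption above, the simulated ages average roughly 45 rather than 35.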

The latent variables of the simulated CHOPIT model were generated by linear equation
(7), where the random effect and the error term are simulated independently and the
covariates are age, sex, and country with appropriate parameters. The latent vignette
variables were generated by equation (5), where the error terms are supposed to be
independent of each other. The cut-point variables were generated recursively by
equation (3), using an exponential parameterization to ensure the ordering of the cut-
points. Finally, the ‘observed’ self-report and vignette variables were generated by
discretizing the appropriate continuous ones using equations (4) and (6).

The two latent continuous variables of the simulated BIHOPIT model were generated by
linear equation (1), where the three covariates are age, sex, and country and the error
terms are correlated. Similarly, the latent vignette variables were given by equation (5)
with constant vignette means as parameters, where the error terms are correlated with the
same correlation coefficient as in the self-report part. The cut-point variables were again
generated by equation (3). Then the discrete responses were again given by equations (4) and
(6).
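The data-generating steps of the BIHOPIT scenario can be sketched in Python as below. The sketch is illustrative only, and several details are assumptions: the cut-point recursion is taken to be $\tau_{i,1} = x_i^T\gamma_1$, $\tau_{i,k} = \tau_{i,k-1} + \exp(x_i^T\gamma_k)$, the vignette errors have unit variance, age is drawn uniformly and rescaled for brevity, and all function names and parameter values are hypothetical.

```python
import math
import random

def cut_points(x, gammas):
    """tau_1 = x'gamma_1, tau_k = tau_{k-1} + exp(x'gamma_k): increasing by construction."""
    taus = []
    for k, g in enumerate(gammas):
        lin = sum(a * b for a, b in zip(x, g))
        taus.append(lin if k == 0 else taus[-1] + math.exp(lin))
    return taus

def discretize(latent, taus):
    """Category k in 1..K such that tau_{k-1} < latent <= tau_k."""
    return 1 + sum(latent > t for t in taus)

def simulate_bihopit(n, beta, gamma, theta, sigma, rho, seed=1):
    """beta[d], gamma[d], sigma[d]: parameters of domain d (0 or 1);
    theta: list of (mean in domain 0, mean in domain 1), one pair per vignette."""
    rng = random.Random(seed)

    def correlated_pair():
        # two standard normal errors with correlation rho
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2

    out = []
    for _ in range(n):
        # covariates: intercept, rescaled age (simplified), sex, country
        x = [1.0, rng.uniform(18, 100) / 100.0,
             float(rng.random() < 0.5), float(rng.random() < 0.5)]
        taus = [cut_points(x, gamma[d]) for d in (0, 1)]
        e = correlated_pair()
        self_y = tuple(
            discretize(sum(a * b for a, b in zip(x, beta[d])) + sigma[d] * e[d], taus[d])
            for d in (0, 1))
        vig_y = []
        for th in theta:
            v = correlated_pair()
            vig_y.append(tuple(discretize(th[d] + v[d], taus[d]) for d in (0, 1)))
        out.append((x, self_y, vig_y))
    return out

# Toy parameters: K = 4 categories, L = 2 vignette pairs (all values made up).
beta = [[0.2, -1.0, 0.1, 0.3], [0.0, -0.8, 0.2, -0.2]]
gamma = [[[-1.0, 0.1, 0.0, 0.2], [0.0, 0.1, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0]]] * 2
theta = [(-0.5, -0.3), (0.8, 1.0)]
sim = simulate_bihopit(500, beta, gamma, theta, sigma=[1.2, 0.9], rho=0.6)
```

Adding an exponential increment to the previous cut-point is what guarantees $\tau_{i,1} < \tau_{i,2} < \cdots$ for every respondent, whatever the values of the coefficients.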

Figure Supplement

Figure 10. Predicted versus true cut-points for domain A using BIHOPIT model based on

BIHOPIT scenario.

Figure 11. Predicted versus true cut-points for domain B using BIHOPIT model based on

BIHOPIT scenario.


Figure 12. Predicted versus true cut-points for domain A using CHOPIT model based on

BIHOPIT scenario.

Figure 13. Predicted versus true cut-points for domain B using CHOPIT model based on

BIHOPIT scenario.


Figure 14. Predicted versus true cut-points for domain A using BIHOPIT model based on

CHOPIT scenario.

Figure 15. Predicted versus true cut-points for domain B using BIHOPIT model based on

CHOPIT scenario.


Figure 16. Predicted versus true cut-points for domain A using CHOPIT model based on

CHOPIT scenario.

Figure 17. Predicted versus true cut-points for domain B using CHOPIT model based on
CHOPIT scenario.
