A test of the QALY model when health varies over time

Social Science & Medicine 57 (2003) 1697–1706

Anne Spencer*

Department of Economics, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, UK

Abstract

Quality-adjusted life years (QALYs) estimate the utility derived from health profiles by taking account of life

expectancy and quality of life. In applying QALYs to situations where health varies over time, it is usual to assume that

we can add the utilities from constituent health states. This paper investigates the QALY approach to combining health

states over time using two tests. The first test rejects additive independence, the central assumption of the QALY model,

for individual respondents. The second test is equivocal. The tests are, therefore, unable to conclusively reject the

QALY approach to combining health states over time.

© 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Utility measurement; Additive independence; Health profiles; QALY model

Introduction

Applied work that assesses the cost effectiveness of

treatment faces the challenge of how best to measure the

benefits of treatment. Medical benefits are only one

measure; a broader set of measures could be sought by

eliciting the preferences of interested parties such as

policy makers, patients and the general population.

Quality adjusted life years (QALYs) have been developed

by health economists to measure preference for

treatment. The QALY approach uses a utility-based

measure to elicit and represent the preference for quality

of life and life expectancy. When considering a set of

health states over time, termed a health profile, the

QALY approach assumes that we can simply add the

utilities from constituent health states.

The QALY approach combines a measure of a

respondent’s preferences for time and for the constituent

health states. Consider three health states X, Y and Z

which make up a health profile which we denote as

XYZ. The QALY approach assumes that it is valid to

estimate the utility of the health profile XYZ by simply

adding the utilities of its constituent health states,

appropriately weighted by a measure of a respondent’s

preferences for time (represented by $w_i$). The QALY

approach applies, therefore, an additive model and the

utility of profile XYZ is estimated by the following

equation:

$$U(XYZ) = w_1 U(X) + w_2 U(Y) + w_3 U(Z), \tag{1}$$

where $w_i$ is the time discount factor at time $i$, for $i = 1, 2, 3$, and $U(\cdot)$ is the utility function. A more holistic

approach, such as the Healthy Years Equivalent, elicits

a respondent’s preferences for the entire profile (i.e.

$U(XYZ)$). The QALY approach has advantages over a

holistic approach since it reduces the cost of estimation.

For instance, in the EuroQol classification system,

assigning utilities to all possible combinations of profiles

over a 10-year period would raise the number of

profiles that need to be estimated to the power of 10

(i.e. to $243^{10}$ profiles). However, if the QALY approach

to combining health states over time is to be widely

accepted, it is important to offer empirical results to

support the approach.
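As an illustration of Eq. (1) and of the saving that the additive approach offers, the following Python sketch computes the utility of a three-period profile from its constituent states and counts the profiles that a holistic approach would face. The state utilities and weights are hypothetical values chosen for illustration, not estimates from this paper, and the count assumes one EuroQol state per year over 10 years.

```python
# Sketch of Eq. (1): additive utility of a health profile.
# All numerical values below are hypothetical, chosen only for illustration.

def additive_utility(profile, state_utility, weights):
    """Return sum_i w_i * U(state_i) for a profile such as "XYZ"."""
    if len(profile) != len(weights):
        raise ValueError("one weight is needed per period")
    return sum(w * state_utility[s] for s, w in zip(profile, weights))

# Hypothetical utilities of the constituent states and time weights w_1..w_3.
state_utility = {"X": 1.0, "Y": 0.8, "Z": 0.5}
weights = [0.4, 0.35, 0.25]

u_xyz = additive_utility("XYZ", state_utility, weights)

# The EuroQol system has 5 dimensions with 3 levels each: 3**5 = 243 states.
n_states = 3 ** 5
# One state per year over 10 years gives 243**10 possible profiles,
# whereas the additive approach needs only the 243 state utilities.
n_profiles = n_states ** 10
```

The contrast between `n_states` and `n_profiles` is the estimation saving that the text attributes to the QALY approach.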

A challenge to the QALY approach arises from

concerns that respondents may have preferences over

the ordering of events, known as sequencing effects

(Gafni, 1995; Ross & Simonson, 1991). A respondent

may desire to overcome ill-health and look forward to

good health (dread and savouring, Loewenstein &

Prelec, 1993). A respondent may also pay more attention

to the final health state in a treatment (Kahneman,

Fredrickson, Schreiber, & Redelmeier, 1993; Varey &

Kahneman, 1992; Ross & Simonson, 1991) or be aware

that they will adapt to health in a positive or negative

manner over time (Ross & Simonson, 1991).

ARTICLE IN PRESS

*Tel: +44-20-7882-5532; fax: +44-20-8983-3580.

E-mail address: [email protected] (A. Spencer).

0277-9536/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.

doi:10.1016/S0277-9536(02)00554-3

The aim of our paper is to test the QALY approach

when health varies over time using two tests. The first

test examines the additive independence assumption that

underpins the QALY approach. This assumption is

defined in the Background. The second test examines the

implications of the additive model given in Eq. (1) for a

broad category of health profiles where sequencing

effects are likely to arise. In this second test, the profiles

considered offer deteriorating, improving or temporarily

improving health.

The remainder of the paper is set out as follows. The

Background reviews existing tests of the QALY

approach and sets out the extent to which our paper

contributes to these tests. In Methods, we overview the

questionnaire used in our paper and our tests. The

Results reviews the findings and the Conclusion

considers the implications of our tests for the QALY

approach.

Background

It is common practice to assume that preferences

under risk are described by Expected Utility Theory—a

theory of decision making under risk.1 QALYs have

been shown to be a valid measure of preference under

this theory if certain assumptions hold.2 When health

varies over time, Bleichrodt (1995) and Bleichrodt and

Quiggin (1997) have shown that for QALYs to be a valid

measure under Expected Utility Theory, it is necessary

to assume that additive independence holds. Additive

independence holds if the preferences between risky

treatments depend only upon the marginal rather than

the joint probability distributions of the health states

(Bleichrodt & Quiggin, 1997, p. 154; Keeney & Raiffa,

1976, p. 230). Bleichrodt (1995) outlined a test of

additive independence based on a respondent’s choice

between 50:50 gambles over chronic or intermittent

health states, but he did not collect data on this. Our first

test collects this data to check whether additive

independence holds. More details of this test are given

in Methods.

An alternative type of test checks the implications of

the additive model given in Eq. (1). In this test, the

additive model will be rejected not only when additive

independence fails, but also when one of the other

assumptions underlying the QALY approach fails. A

popular test of this type checks the extent to which it is

possible to estimate profiles from constituent states.

Such a test relies upon estimating the discount factor by

imposing a particular discount function (usually

assumed to be exponential) and requires the estimation

of the implied discount rate. Using this test, Dolan

and Gudex (1995) found that constituent states overestimated

temporarily deteriorating profiles and that

respondents derived less benefit from a temporary

deterioration in health. Richardson, Hall, and Salkeld

(1996) also found that constituent states overestimated

deteriorating profiles by between 31% and 57% and that

it was not possible to find a plausible discount rate

that would suggest equivalence. MacKeigan, O’Brien,

and Oh (1999), on the other hand, did not reject

the additive model for profiles offering a gradual

deterioration. MacKeigan et al.’s result could be

explained by the gradual adaptation to the new health

states and reference point effects (Ross & Simonson,

1991).

One drawback in the estimation of profiles from

constituent states is that questions designed to elicit

preferences for the timing of health may also be

capturing preferences for the sequence of events (Gafni,

1995). Krabbe and Bonsel (1998) tried to overcome this

problem by testing the impact of good health appearing

early or later in the treatment. They attempted to

control for discounting by imposing a discount rate that

led to the smallest difference between the impact of good

health appearing early or later in the treatment. The

residual difference should, therefore, give an estimate of

the sequencing effects under this restriction. They found

evidence of a sequencing effect in 6 of the 13 profiles

considered, all of which supported the notion that

respondents placed a higher utility on good health at the

end of the profile. Another drawback is that the tests

discussed so far rely on the time trade-off (TTO) method

to elicit preferences (except for Richardson et al., 1996,

who used the TTO method alongside other methods).

The TTO method asks

respondents to consider an improvement in health that

is achieved through a reduction in longevity of life

(Torrance, Thomas, & Sackett, 1972). The role of time is

crucial to this preference elicitation process. The TTO

values are therefore hard to interpret since they reflect

an interplay of timing and sequencing effects. This

suggests that elicitation methods that do not rely on

time to elicit their responses are more appropriate for

tests of the additive model.

1 Rank dependent utility theory has been cited as a more accurate description of decision making under risk. Bleichrodt and Quiggin (1997) outline the assumptions that are required for QALYs to be a valid measure of preferences under this model.

2 These assumptions have been defined separately for the cases when health remains constant or varies over time. When health remains constant, Pliskin, Shepard, and Weinstein (1980) outline the assumptions required for QALYs to be a valid measure of preferences (a critique of these assumptions is given by Loomes & McKenzie, 1989). Bleichrodt, Wakker, and Johannesson (1997) and Miyamoto, Wakker, Bleichrodt, and Peters (1998) have simplified these assumptions.

Lipscomb (1989) used regression analysis to predict

the extent to which different components affected a

profile’s utility. The regression showed significant

interaction between different health states, which

suggested that a simple additive model was inappropriate.

In addition, for three different discount rates (0%, 5%

and 10%) the profiles for a representative respondent

could not easily be estimated by a weighted average of

constituent health states. Kuppermann, Shiboski,

Feeny, Elkin, and Washington (1997) compared profiles

against constituent health states for patients at a

maternity clinic who were considering prenatal diagnosis.

Kuppermann et al. considered two additive

models: one model assumed that respondents had no

preferences for time (a zero discount rate); the other

model attached statistically inferred weights to time. The

constituent health states in the latter model gave better

predictions of the holistic profiles.3 However, Holmes

(1998) points out that statistically inferred weights have

no underlying conceptual basis in terms of respondents’

preferences for time. It is therefore difficult to check the

extent to which statistically inferred weights are

consistent with the QALY approach.

Treadwell (1998) offered an innovative solution to

some of the shortcomings of these tests. Treadwell was

concerned with examining the preferential independence

assumption that underpins the additive model under

certainty. Preferential independence holds if preferences

between profiles that contain the same health state in

period i do not depend upon the severity of the health

state in period i (Keeney & Raiffa, 1976, p. 101).

Treadwell asked respondents to choose between two

profiles that occurred with certainty and included health

state Z in period i. He then tested whether changing the severity of health state Z altered a respondent’s choice

between these two profiles. Given that the comparison

of health states was made within the same period, this

test offers a simple technique to control for a

respondent’s preferences for time. He concluded that

preferential independence held in 36 out of 42 tests. Our

second test is similar to Treadwell’s approach but rather

than testing the additive model under certainty we test

the implications of the additive model under risk given

that most treatment outcomes are risky. The test is

based on two profiles that contain health state Z in

period i. We measure the utility that is derived from each of the profiles using a method that reflects a respondent’s

attitude towards risk. The test then checks

whether changing the severity of health state Z in

period i alters a respondent’s preferences between these

two profiles.

Method

Overview of the questionnaire

The study uses the EuroQol classification system,

which describes states of health along five dimensions:

mobility, self-care, usual activities, pain and anxiety

(Kind, Dolan, Gudex, & Williams, 1998; Dolan, 1996,

1997).4 Each dimension has three levels of severity: no

problems, some problems and severe problems, denoted

by 1, 2 and 3, respectively and colour-coded black, blue

and red in our study.5 Each health state is colour-coded

and we refer to these as follows: 11111 as N; 12221 as W; 21222 as Y; 22232 as Z and death as D.6 Health state W allows respondents to become familiar with the

methods, but the utilities are not used in the study. The

respondents are asked to imagine a profile in which each

health state lasts for 10 years without change, to be

followed immediately by death. The profiles are depicted

on cards and respondents are asked to rank the cards.

They are then asked to consider treatments leading to

changes of health over the next 10 years. It is explained

to respondents that each profile consists of three

periods: the first 3 years, the second 3 years and the

last 4 years. Health states are constant in any one period.

The respondents are asked to consider two or three

different health states in the 10 year profile (after which

they would die).7 The profiles are again depicted on

cards; for instance, Fig. 1 shows 3 years in health state

N, followed by 3 years in health state Y and 4 years in

health state Z, which we denote by NYZ. The

questionnaire considers a wide variety of profiles which

may lead to a rejection of the additive model, such as

adaptation to temporary health states and a desire to

overcome ill-health and look forward to good health

(for a list of profiles see column 2 of Table 1). The cards

are passed to respondents and they are asked to rank

them.

3 A model gives better predictions if it has a lower predictive error. The predictive error for the model that assumed a zero discount rate was 0.35 and the predictive error for the model that assumed statistically inferred weights was 0.19.

4 Mean and median estimates of all the EuroQol states can be calculated from the formula given by Dolan (1996) and Dolan (1997), respectively.

5 Level 3 pain was described as moderate pain or discomfort with periods of severe pain or discomfort, rather than the extreme pain or discomfort used in the EuroQol work.

6 Health state Z is more severe than health state Y in terms of self-care and pain, so that respondents could easily discriminate between them.

7 Two states have the advantage of simplicity; three allow the possibility of considering the effects of more complicated patterns, for instance declining or improving states.

Respondents are then asked 10 standard gamble (SG)

questions that elicit cardinal von Neumann and

Morgenstern (1944) utilities, denoted by $U(\cdot)$, for

different health profiles. The SG questions give an

estimate of the utility of the entire health profile, and so

are a type of holistic elicitation procedure. In each SG

question, the choice is between remaining in a health

profile, say ZNN, or undergoing a risky treatment. The risky treatment has a probability $p$ of succeeding,

resulting in a better health state, normal health, or

$(1 - p)$ of failing, resulting in a worse health state, death

(as shown in Fig. 2). They are asked to state the chance

of success and failure where they consider the alternatives

to be most finely balanced and they do not mind

which treatment they receive. Probability p is varied

until the respondents are indifferent between the two

alternatives. To help them with this they are given a

sheet of paper listing the chances of success/failure

against which they mark their response (based on Jones-

Lee, Loomes, & Philips, 1995). The point at which

respondents are indifferent between the profile and the

risky treatment is used to derive the SG utility for the

profile. In our example, the point of indifference

between the profile ZNN and the risky treatment can

be represented by:

$$U(ZNN) = p \times U(NNN) + (1 - p) \times U(DDD).$$

If $U(NNN) = 1$ and $U(DDD) = 0$, this expression

becomes:

$$U(ZNN) = p. \tag{2}$$

In this example, the SG utility for profile ZNN is $p$.

Finally, the questionnaire includes a test of additive

independence (Bleichrodt, 1995). The question asks

respondents to imagine that they have become ill and are

offered a choice between the two treatments shown in Fig. 3.

No treatment would result in them remaining in health

profile ZZZ. In treatment C, they have a 50% probability

of it succeeding and returning them to ZNN over the next 10

years and a 50% probability of it failing and resulting in

NZZ. In treatment D, they have a 50% probability of it

succeeding and returning them to full health for the next 10 years

and a 50% probability of it failing and leaving them in

health state Z for the next 10 years. The respondent is

asked which treatment they prefer, or whether they do not mind

which treatment they receive, and to explain their answer.
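The logic of this question can be sketched in Python. Under the additive model of Eq. (1), the expected utility of a risky treatment depends only on the chance of each health state in each period, so treatments C and D, which offer the same period-by-period chances of N and Z, receive the same expected utility. The state utilities and weights below are hypothetical; only the structure of the two treatments is taken from the text.

```python
# Sketch: under the additive model, treatments C and D have equal expected utility.
# State utilities and time weights are hypothetical illustration values.
from collections import Counter

def additive_utility(profile, state_utility, weights):
    """Additive utility of a profile, as in Eq. (1)."""
    return sum(w * state_utility[s] for s, w in zip(profile, weights))

def expected_utility(treatment, state_utility, weights):
    """treatment is a list of (probability, profile) pairs."""
    return sum(p * additive_utility(profile, state_utility, weights)
               for p, profile in treatment)

def marginals(treatment):
    """Probability of each health state in each period."""
    n_periods = len(treatment[0][1])
    result = [Counter() for _ in range(n_periods)]
    for p, profile in treatment:
        for period, state in enumerate(profile):
            result[period][state] += p
    return result

treatment_c = [(0.5, "ZNN"), (0.5, "NZZ")]  # intermittent outcomes
treatment_d = [(0.5, "NNN"), (0.5, "ZZZ")]  # chronic outcomes

state_utility = {"N": 1.0, "Z": 0.45}  # hypothetical
weights = [0.3, 0.3, 0.4]              # hypothetical weights for 3, 3 and 4 years

eu_c = expected_utility(treatment_c, state_utility, weights)
eu_d = expected_utility(treatment_d, state_utility, weights)
same_marginals = marginals(treatment_c) == marginals(treatment_d)
```

Because the marginal distributions coincide, `eu_c` and `eu_d` agree for any choice of utilities and weights, so a strict preference for either treatment is inconsistent with additive independence.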

A test of additive independence

The test of additive independence forms the first test

of the QALY approach. We estimate the proportion of

respondents who prefer each treatment or are indifferent.

If there is no measurement error and additive

independence holds, respondents will be indifferent

between treatments C and D. If there is measurement error, respondents could be indifferent between the two

treatments but inadvertently report a preference. Given

that we do not know the distribution of this measurement

error, we simply calculate a confidence interval

around the proportion of respondents who are indifferent,

to be suggestive of the proportion of respondents


Fig. 1. An example of a health profile.

Table 1

An overview of the questionnaire

(1) Question      (2) Profiles                (3) Mean SG utility  (4) Median SG utility  (5) Standard deviation SG utility
1                 WWW                         0.800                0.825                  0.170
2                 YYY                         0.777                0.800                  0.142
3                 ZZZ                         0.461                0.450                  0.245
4                 YYZ                         0.642                0.660                  0.184
5                 ZNN                         0.903                0.950                  0.110
6                 YYN                         0.875                0.925                  0.117
7                 NYZ                         0.707                0.750                  0.198
8                 ZYN                         0.803                0.850                  0.149
9                 ZYZ                         0.508                0.500                  0.230
10                YYD                         0.482                0.500                  0.236
Treatment C or D  Additive independence test  n/a                  n/a                    n/a


who would be indifferent at a population level.

Confidence intervals for proportional data are based

on the exact probabilities of the binomial distribution

because the proportions involved are small (Bland, 1995,

pp. 125–126).8 We also test whether there is a strict

preference for C or D within the sample using

McNemar’s test (Daniel, 1990, p. 165). Bleichrodt

(1995) anticipated that more intermittent health states,

such as ZNN or NZZ, would be preferred to chronic

states, such as NNN or ZZZ. The null hypothesis tested here predicts that the same proportion of

respondents prefer treatments C and D. The alternative hypothesis predicts that significantly more respondents

prefer one of treatments C and D than the other.

Fig. 2. The format of the SG questions 1–10. [Figure not reproduced: Treatment A is a gamble with chances p% and (1 − p)%; Treatment B offers the profile on each card in turn with 100% certainty; the time axis is marked at 3, 6 and 10 years.]

Fig. 3. A test of additive independence.

8 The software package STATA® is used to construct the confidence intervals based on the exact probabilities of the binomial distribution.


Additive independence is rejected only for responses

that represent a strict preference for treatments C or D. Qualitative comments are recorded and help verify

whether this was the case or whether respondents who

expressed a strict preference for a treatment were

indifferent between the two treatments (Varian, 1992).

A test of the implications of the additive model

The second test reported in this paper investigates the

implications of the additive model based on the SG

utilities derived in questions 4–10. The questionnaire

incorporates two versions of this test: Versions 1 and 2.

In Version 1, we compare the profiles ZNN and ZYZ

with the profiles NNN and NYZ as shown in Fig. 4.

Preferences should be the same between these two pairs

of profiles, since the only difference between them is that

the former pair of profiles begin with health state Z

whilst the latter pair begin with health state N.9 The null hypothesis of this test predicts that the additive model

holds and the differences in SG utilities between profiles

ZNN and ZYZ are the same as the differences in SG

utilities between profiles NNN and NYZ (i.e.

$[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)] = 0$).

The alternative hypothesis predicts that the SG utilities

of the two pairs of profiles differ (i.e.

$[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)] \neq 0$). The SG utilities used in this test are those derived in

questions 5, 9 and 7, respectively (except profile NNN

that is assigned the utility of 1). Similarly, in Version 2,

we compare the profiles ZYN and ZYZ with the

profiles YYN and YYZ. The null hypothesis of this test predicts that the additive model holds and the differences

in SG utilities between profiles ZYN and ZYZ are

the same as the differences in SG utilities between

profiles YYN and YYZ (i.e. $[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)] = 0$). In both versions of this test,

the Wilcoxon’s matched pairs test is used to test whether

the magnitudes of differences in SG utilities are

sufficient to reject the additive model (Bland, 1995, pp.

212–215).10 The test of the additive model relies only on

a comparison of preferences for health states occurring

within the same time period.11 The test, therefore,

controls for the effect of time preference.
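To make the null hypothesis concrete, the sketch below first confirms that the double difference is zero under the additive model of Eq. (1), and then evaluates it at the mean SG utilities reported in Table 1. The weights and state utilities in the first part are hypothetical. The second part uses sample means only, whereas the test itself is applied to individual respondents' differences.

```python
# Sketch of the second test's null hypothesis (Versions 1 and 2).

def additive_utility(profile, state_utility, weights):
    """Additive utility of a profile, as in Eq. (1)."""
    return sum(w * state_utility[s] for s, w in zip(profile, weights))

def double_difference(u, pair_a, pair_b):
    """[U(a1) - U(a2)] - [U(b1) - U(b2)] for a mapping u from profile to utility."""
    return (u[pair_a[0]] - u[pair_a[1]]) - (u[pair_b[0]] - u[pair_b[1]])

# Part 1: hypothetical state utilities and weights; the result is zero
# because each pair compares the same states in the same periods.
state_utility = {"N": 1.0, "Y": 0.8, "Z": 0.45}
weights = [0.35, 0.32, 0.33]
profiles = ["ZNN", "ZYZ", "NNN", "NYZ", "ZYN", "YYN", "YYZ"]
model_u = {p: additive_utility(p, state_utility, weights) for p in profiles}
dd_model_v1 = double_difference(model_u, ("ZNN", "ZYZ"), ("NNN", "NYZ"))
dd_model_v2 = double_difference(model_u, ("ZYN", "ZYZ"), ("YYN", "YYZ"))

# Part 2: mean SG utilities from Table 1 (NNN is assigned a utility of 1).
mean_u = {"NNN": 1.0, "YYZ": 0.642, "ZNN": 0.903, "YYN": 0.875,
          "NYZ": 0.707, "ZYN": 0.803, "ZYZ": 0.508}
dd_mean_v1 = double_difference(mean_u, ("ZNN", "ZYZ"), ("NNN", "NYZ"))
dd_mean_v2 = double_difference(mean_u, ("ZYN", "ZYZ"), ("YYN", "YYZ"))
```

Evaluated at the Table 1 means, the double differences are approximately 0.102 for Version 1 and 0.062 for Version 2, which agree with the mean differences reported in Table 2.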

Finally, we investigate the extent to which SG utilities

based on holistic elicitation procedures (questions 4–10)

equal the SG utilities implied by a profile’s constituent

states. The utility of health states Y and Z can be

estimated from profiles in which the health state lasts for

the full 10 years without change, i.e. YYY and ZZZ, respectively (questions 2–3). These constituent health

states can then be used to calculate an implied utility for a

profile based on a 0%, 5% or 10% discount rate. If we have

correctly estimated the discount rate and the additive

model holds, then SG utilities based on holistic elicitation

procedures equal the SG utilities implied by a profile’s

constituent states.12 The Wilcoxon’s matched pairs test is

used to check whether these utilities are equal, but is a

weaker test of the QALY approach than those discussed

so far since it relies on imposing a particular discount

rate. The results therefore are used only to indicate the

profiles that may be driving the results in the second test.
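The calculation of implied utilities can be sketched as follows. The paper does not spell out how the discount weights are constructed, so this sketch assumes annual exponential discounting, with each period's weight equal to its share of the summed annual discount factors over the 10 years. It uses the median SG utilities of YYY and ZZZ from Table 1 as the constituent state utilities, with N set to 1 and D to 0. The study's own calculation is made at the respondent level, so sample medians need not reproduce its figures for every profile.

```python
# Sketch: implied profile utility from constituent states under a discount rate.
# Assumption (not stated in the paper): annual exponential discounting with
# weights normalised to sum to one over the 10-year profile.

PERIOD_YEARS = (3, 3, 4)  # first 3 years, second 3 years, last 4 years

def period_weights(rate, period_years=PERIOD_YEARS):
    """Share of total discounted time that falls in each period."""
    factors = [1.0 / (1.0 + rate) ** t for t in range(sum(period_years))]
    total = sum(factors)
    weights, start = [], 0
    for years in period_years:
        weights.append(sum(factors[start:start + years]) / total)
        start += years
    return weights

def implied_utility(profile, state_utility, rate):
    """Weighted sum of constituent state utilities, as in Eq. (1)."""
    weights = period_weights(rate)
    return sum(w * state_utility[s] for s, w in zip(profile, weights))

# Median SG utilities of YYY and ZZZ from Table 1; N = 1 and D = 0.
state_utility = {"N": 1.0, "Y": 0.800, "Z": 0.450, "D": 0.0}

# Implied utilities at 0%, 5% and 10% for three of the study's profiles.
implied = {profile: [implied_utility(profile, state_utility, r)
                     for r in (0.0, 0.05, 0.10)]
           for profile in ("ZNN", "YYN", "YYD")}
```

Under these assumptions the sketch gives approximately 0.835, 0.806 and 0.777 for ZNN at discount rates of 0%, 5% and 10%, in line with the corresponding row of Table 3.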

Data

The researcher contacted residents of York who had

taken part in a pilot Health and Safety study in the

previous 4 months. Respondents were invited to take

part in a 60-minute interview in the Department of

Economics at York University for a payment of £10. All

interviews were tape-recorded. The sample size of the

study was based on detecting a difference of 0.1 in the

SG utilities between the pairs of profiles in the second

test (i.e. $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$, etc.).13 This test was designed to have an 80% chance of

detecting significant differences based on a standard

deviation of 0.2.14,15 In total, 29 respondents were interviewed,

15 males and 14 females.

9 Additive independence implies that if a respondent prefers ZNN over ZYZ, then they prefer NNN over NYZ. Similarly, additive independence implies that if a respondent prefers ZYZ over ZNN, then they prefer NYZ over NNN (likewise for indifference).

10 A two-tailed Kolmogorov–Smirnov test at a 5% significance level was used to test the hypothesis that data were normally distributed. The test statistic was sufficiently large to reject the null hypothesis. We, therefore, apply nonparametric tests in what follows.

11 If we illustrate this for Version 1 of the test using the notation of Eq. (1), the differences in the SG utilities between the two pairs of profiles are: $U(ZNN) - U(ZYZ) = U(NNN) - U(NYZ) = w_2(U(N) - U(Y)) + w_3(U(N) - U(Z))$. The two pairs of profiles therefore compare the same health states at the same time period.

12 In this calculation we assume that a respondent’s rate of time preference does not vary and that the SG utilities based on holistic elicitation procedures equal the SG utilities implied by a profile’s constituent states for only one of the three discount rates.

13 Johnston, Brown, Gerard, O’Hanlon, and Morton (1998) and Dolan (1996) powered their tests to detect a difference of 0.1 in health state utilities.

14 Bleichrodt and Johannesson (1997) based their power calculations on a standard deviation of 0.20 … given that the standard deviations for the time-trade-off and standard gamble quality weights reported in the literature rarely exceed 0.20 (p. 27).

15 The power calculations are based on the standard deviations of $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$ in Version 1 of the second test and $[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$ in Version 2.


Results

In the additive independence test, the sample was split

into those preferring treatment C or treatment D (13

preferred C; 15 preferred D), with only one respondent

expressing no preference between the two treatments.

We calculate a confidence interval around the proportion

of respondents who are indifferent using the binomial

distribution. In the sample, 3.4% (1/29) of respondents

were indifferent, with a 95% confidence interval of 0.08% to

17%. This confidence interval suggests that the proportion

of indifferent respondents at a population level is

below 17%. We also test whether the proportion

preferring C is significantly different from the proportion

preferring D. The McNemar’s test statistic is 0.378 and is less than the z-score of 1.96 at a 5% significance level,

so we are unable to reject the null hypothesis.
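These two calculations can be sketched in Python using only the standard library. The exact interval below is the Clopper-Pearson construction, solved by bisection as an illustrative stand-in for a statistics package. The statistic |b − c| / √(b + c) is one common normal-approximation form of McNemar's test; the paper does not state which formula it used, so treating it as the paper's formula is an assumption.

```python
# Sketch: exact (Clopper-Pearson) binomial interval and a McNemar-type z statistic.
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def solve(f, target, lo=0.0, hi=1.0, tol=1e-10):
    """Find p in [lo, hi] with f(p) = target, for f decreasing in p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Two-sided exact confidence interval for a binomial proportion."""
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

def mcnemar_z(b, c):
    """Normal-approximation statistic for comparing two discordant counts."""
    return abs(b - c) / sqrt(b + c)

ci_low, ci_high = exact_ci(1, 29)  # one indifferent respondent out of 29
z = mcnemar_z(13, 15)              # 13 preferred C, 15 preferred D
```

The sketch gives an interval of roughly 0.09% to 17.8% and a statistic of about 0.378. The statistic matches the value in the text, and the interval is close to the reported one; the small differences in the interval may reflect rounding.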

The test is unable to identify those respondents who

are very close to indifference and categorised as

preferring C or D. In the qualitative comments, three respondents felt that the treatment outcomes were

similar, but only one of these expressed indifference.

Therefore, only a small proportion of respondents who

stated a preference for a treatment appeared to regard

the two treatments as similar. The qualitative comments

support the notion that few respondents who were

indifferent between the two treatments had inadvertently

reported a preference.

In the additive model test, Table 2 reports the

differences in SG utilities between pairs of profiles

considered in Versions 1 and 2 of this test. In Version 1,

the mean and median differences in SG utilities between

the two pairs of profiles (i.e. $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$) were 0.102 and 0.070, respectively

Fig. 4. Health profiles used in Version 1 of the second test.

Table 2

Differences in SG utilities between pairs of profiles

(1)                                          (2) Mean  (3) Median  (4) Standard deviation
Version 1
$U(ZNN) - U(ZYZ)$                            0.394     0.400       0.224
$U(NNN) - U(NYZ)$                            0.293     0.250       0.198
$[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$      0.102     0.070       0.164
Version 2
$U(ZYN) - U(ZYZ)$                            0.295     0.250       0.201
$U(YYN) - U(YYZ)$                            0.233     0.200       0.171
$[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$      0.062     0.021       0.152


(line 4, Table 2). The Wilcoxon’s matched pairs test

found these differences to be statistically significant, and

so we reject the additive model (two-tailed $P = 0.0048$). In Version 2, the mean and median differences in the

SG utilities between the two pairs of profiles (i.e.

$[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$) were

0.062 and 0.021, respectively (line 8, Table 2). However,

in this version, the Wilcoxon’s matched pairs test found

these differences not to be statistically significant, and so

we could not reject the additive model ($P = 0.0656$). The test had been powered to detect a difference of 0.1

in SG utilities between the two pairs of profiles based on

a standard deviation of 0.2. Version 2 had a slightly

higher power to detect these differences, 94% compared

to 91%, since the standard deviation was lower. Despite

this, Version 2 failed to find statistically significant

differences.
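The power figures quoted above can be approximated with a short Python sketch. It assumes a two-sided test at the 5% level with a normal approximation for the mean of the paired differences, and it ignores the negligible chance of rejecting in the wrong direction. The paper does not state the formula behind its power calculation, so this is an illustrative reconstruction rather than the author's method.

```python
# Sketch: approximate power of a two-sided 5% test on paired differences.
# Assumption (not stated in the paper): normal approximation.
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n, alpha=0.05):
    """Power to detect a mean difference delta given the SD of the differences."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(delta * sqrt(n) / sd - z_crit)

# Standard deviations of the double differences are taken from Table 2.
power_v1 = approx_power(0.1, 0.164, 29)  # Version 1
power_v2 = approx_power(0.1, 0.152, 29)  # Version 2
```

Under this approximation the sketch gives roughly 91% for Version 1 and 94% for Version 2, in line with the figures in the text.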

We investigate the extent to which the SG utilities

based on holistic elicitation procedures are equal to the

SG utilities implied by their constituent states. Table 3

shows the implied median utilities for profiles based on

constituent states and a 0%, 5% and 10% discount rate

(column 4) and holistic elicitation procedures (column

3). In questions that involve an improvement in health,

ZNN, YYN and ZYN (in questions 5, 6 and 8,

respectively), the SG utilities based on holistic elicitation

procedures were best estimated by imposing a zero

discount rate on the constituent states. However, in the

questions that involved a decline in health, YYZ, NYZ

and YYD (in questions 4, 7 and 10, respectively), the SG

utilities were best estimated by imposing a positive

discount rate on the constituent states. For instance, in

profile NYZ the SG utilities were best estimated by

imposing a 10% discount rate on the constituent

states.16

The Wilcoxon’s matched pairs test was used to check

whether the differences between profiles estimated

holistically and by their constituent states were statisti-

cally significant for a zero discount rate. These showed

that the profile utilities for questions ZNN and ZYZ

(questions 5 and 9, respectively) were statistically

different from the utilities based on the constituent

states at a 5% significance level ($P = 0.0049$ and $P = 0.0094$, respectively). In profile ZYZ the constituent

states overestimated the profile whilst in profile ZNN

the constituent states underestimated the profile.

Conclusion

In the additive independence test, individual respondents’

preferences lead us to reject additive independence.

We also find that the sample was split almost

equally between respondents who preferred treatment C

and those who preferred treatment D. A caveat to this

test is that we did not collect information on the strength

of preference; we are therefore unable to identify from

the quantitative data those respondents who are very

close to indifference and categorised as preferring

C or D.

In the additive model test, with only one of the two

tests detecting statistically significant differences, we are

unable to conclusively reject the model. A caveat of this

test is that it is designed to detect a difference of 0.1

only. It is possible that policy makers may want to detect


Table 3

Median SG utilities for profiles estimated holistically and by their constituent states

(1) Question  (2) Profile  (3) SG utilities based on        (4) SG utilities of profiles estimated by their constituent states
                           holistic elicitation procedures  Discount rate of 0%  Discount rate of 5%  Discount rate of 10%
4             YYZ          0.660                            0.630                0.647                0.674
5             ZNN          0.950                            0.835                0.806                0.777
6             YYN          0.925                            0.880                0.869                0.858
7             NYZ          0.750                            0.680                0.713                0.751
8             ZYN          0.850                            0.745                0.705                0.676
9             ZYZ          0.500                            0.530                0.530                0.530
10            YYD          0.500                            0.480                0.526                0.567

16 For profile ZYZ, the median utility implied by its constituent states does not change as the discount rate increases from 0% to 10%, though the mean slightly increases. Profile ZYZ is the only profile where health state Z appears in both the first and last period. The primary impact of the increase in discount rate, therefore, is to apply more weight to health state Z in the first period and less weight to health state Z in the last period, with little overall impact on the median utility.


differences that are less than 0.1, but this would require

a larger sample.
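The weighting argument in footnote 16 can be illustrated with a minimal sketch. It assumes three equal-length periods, a discount rate applied per period, and weights normalised to sum to one; these assumptions, the function names and the state utilities below are hypothetical and are not taken from the study's data or its exact discounting formula.

```python
from typing import Dict, List


def discount_weights(n_periods: int, rate: float) -> List[float]:
    """Normalised weights for equal-length periods (assumption: rate is per period)."""
    raw = [1.0 / (1.0 + rate) ** t for t in range(n_periods)]
    total = sum(raw)
    return [w / total for w in raw]


def profile_utility(profile: str, state_utils: Dict[str, float], rate: float) -> float:
    """Additive estimate of a profile's utility from its constituent states."""
    weights = discount_weights(len(profile), rate)
    return sum(w * state_utils[s] for w, s in zip(weights, profile))


# Hypothetical state utilities, chosen only for illustration.
state_utils = {"N": 0.95, "Y": 0.80, "Z": 0.40}

for rate in (0.0, 0.05, 0.10):
    zyz = profile_utility("ZYZ", state_utils, rate)
    znn = profile_utility("ZNN", state_utils, rate)
    print(f"rate={rate:.2f}  ZYZ={zyz:.4f}  ZNN={znn:.4f}")
```

In this sketch the estimate for ZYZ barely moves as the rate rises, because weight shifted away from Z in the last period is largely shifted onto Z in the first period, whereas the estimate for ZNN falls as the poor first period gains weight.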

Which profiles lead to a rejection of the additive model? To answer this, we return to Version 1 of the second test, in which the differences were found to be statistically significant. In this version, the observed differences could arise from one or more of the following: profile ZNN is valued higher or profiles ZYZ and NYZ are valued lower than the additive model would predict.17 An indication of the profiles that are driving this result can be found by looking again at the estimation of profiles from constituent states. When there is a zero discount rate, the case considered by Dolan and Gudex (1995), the constituent states overestimate the temporarily improving profile ZYZ and underestimate the improving profile ZNN. From this we tentatively conclude that profiles ZYZ and ZNN lead to a rejection of the additive model in our study.
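The reasoning behind comparing pairs of profiles in Version 1 can be sketched as a short derivation. As an assumption for illustration, write the additive model for a three-period profile with states a, b and c as U(abc) = u_1(a) + u_2(b) + u_3(c), where the period-specific functions u_t are notation introduced here and absorb any discount weights.

```latex
\begin{align*}
U(ZNN) - U(ZYZ) &= \bigl[u_2(N) - u_2(Y)\bigr] + \bigl[u_3(N) - u_3(Z)\bigr],\\
U(NNN) - U(NYZ) &= \bigl[u_2(N) - u_2(Y)\bigr] + \bigl[u_3(N) - u_3(Z)\bigr].
\end{align*}
```

Under this assumption the first-period term cancels within each pair, so the two differences should be equal; a statistically significant gap between them is therefore inconsistent with the additive model.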

Two recommendations arise from this paper. Our first

recommendation is that when testing for additive

independence, strength of preference information should

be collected. This information will help to identify

responses that are close to indifference. In addition,

when preferences are very polarised, it will clarify the

extent to which summary statistics differ from individual

respondents’ preferences (Dolan, 2000). For example, individual respondents’ preferences may fail to comply with additive independence, but at a sample level these preferences may cancel each other out so that, on average, additive independence holds. A similar difference

between respondents’ preferences and the sample’s

summary statistics has been reported in the estimation

of time preferences and discount rates. There is a wide

variation in respondents’ discount rates but, on average,

the discount rate is zero (Dolan, 2000). New research is

exploring the potential for subgroup analyses of

preferences and the extent to which preferences should

be aggregated (Sculpher & Gafni, 2001). Our second

recommendation is that future research should continue

to check for sequencing effects. The extent to which

sequencing effects arise appears to be heavily dependent

upon the profile and the viewpoint about the benefit

derived from such profiles. This element of subjectivity is

in keeping with the notion that respondents’ preferences

for health treatments are related to their expectations of

health (Chapman, 1996). Patterns are beginning to

emerge in the empirical work about the instances in

which the additive model does not hold, but our tests are

unable to conclusively reject the QALY approach. At

the moment the best way forward would be to include

estimation of profiles and constituent states in future

studies to check the extent to which the QALY approach

continues to hold.

Acknowledgements

The author would like to thank Graham Loomes,

Karl Claxton, Sandra Eldridge, Judith Covey, Björn

Lindgren, Carl Hampus Lyttkens and two referees for

their valuable comments. In addition, the author is

grateful to the Swedish Social Research Council for

funding a visiting research fellowship to pilot the

approach, via research grants to Björn Lindgren, Lund

University. The author is also grateful to the Leverhulme

Trust for financial support for the UK study (funded by

the project ‘The Anatomy of Decision Making under

Risk Over Time’). Any errors are the responsibility of

the author alone.

References

Bland, M. (1995). An introduction to medical statistics (2nd ed.).

Oxford: Oxford University Press.

Bleichrodt, H. (1995). QALYS & HYEs: Under what condi-

tions are they equivalent? Journal of Health Economics, 14,

17–37.

Bleichrodt, H., & Johannesson, M. (1997). An experimental test

of constant proportional tradeoff and utility independence.

Medical Decision Making, 17, 21–32.

Bleichrodt, H., & Quiggin, J. (1997). Characterizing QALYs

under a general rank dependent utility model. Journal of

Risk and Uncertainty, 15, 151–165.

Bleichrodt, H., Wakker, P., & Johannesson, M. (1997).

Characterising QALYs by risk neutrality. Journal of Risk

and Uncertainty, 15, 107–114.

Chapman, G. B. (1996). Expectations and preferences for

sequences of health and money. Organizational Behavior and

Human Decision Processes, 67, 59–75.

Daniel, W. W. (1990). Applied nonparametric statistics (2nd

ed.). Boston: PWS–KENT Publishing Company.

Dolan, P. (1996). Modelling valuations for health states: The

effect of duration. Health Policy, 38, 189–203.

Dolan, P. (1997). Aggregating health state valuations. Journal

of Health Services Research and Policy, 2, 160–165.

Dolan, P. (2000). The measurement of health-related quality of

life. In A. Culyer, & J. P. Newhouse (Eds.), Handbook of

health economics (pp. 1723–1759). Amsterdam: Elsevier.

Dolan, P., & Gudex, C. (1995). Time preference, duration and

health state valuations. Health Economics, 4, 289–299.

Gafni, A. (1995). Time in health: Can we measure individuals’

pure time preference. Medical Decision Making, 15, 31–37.

Holmes, A. M. (1998). Measurement of short term health

effects in economic evaluations. Pharmacoeconomics, 13,

171–174.

Johnston, K., Brown, J., Gerard, K., O’Hanlon, M., & Morton,

A. (1998). Valuing temporary and chronic health states

associated with breast screening. Social Science and

Medicine, 47, 213–222.

Jones-Lee, M. W., Loomes, G., & Philips, P. (1995). Valuing

the prevention of non-fatal road injuries: Contingent

valuation versus standard gamble. Oxford Economic Papers,

47, 676–695.


17 In Version 1, the differences in SG utilities between profiles ZNN and ZYZ were greater than the differences in SG utilities between profiles NNN and NYZ.


Kahneman, D., Fredrickson, B. L., Schreiber, C. A., &

Redelmeier, D. A. (1993). When more pain is preferred to

less: Adding a better end. Psychological Science, 4, 401–405.

Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple

objectives, preferences and value tradeoffs. London: Wiley.

Kind, P., Dolan, P., Gudex, C., & Williams, A. (1998).

Variations in population health status: Results from a

United Kingdom national questionnaire survey. British

Medical Journal, 316, 736–741.

Krabbe, P. F., & Bonsel, G. J. (1998). Sequence effects, health

profiles, and the QALY model: In search of realistic

modelling. Medical Decision Making, 18, 178–186.

Kuppermann, M., Shiboski, S., Feeny, D., Elkin, E. P., &

Washington, A. E. (1997). Can preference scores for discrete

states be used to derive preference scores for an entire path

of events? Medical Decision Making, 17, 42–55.

Lipscomb, J. (1989). The preference for health in cost-

effectiveness analysis. Medical Care, 27, S233–253.

Loewenstein, G., & Prelec, D. (1993). Preferences for sequences

of outcomes. Psychological Review, 100, 91–108.

Loomes, G., & McKenzie, L. (1989). The use of QALYs in

health care decision making. Social Science and Medicine,

28, 299–308.

Mackeigan, L. D., O’Brien, B. J., & Oh, P. I. (1999). Holistic

versus composite preferences for lifetime treatment se-

quences for type 2 diabetes. Medical Decision Making, 19,

113–121.

Miyamoto, J. M., Wakker, P. P., Bleichrodt, H., & Peters, H. J.

M. (1998). The zero-condition: A simplifying assumption in

QALY measurement and multiattribute utility. Manage-

ment Science, 44, 839–849.

von Neumann, J., & Morgenstern, O. (1944). Theory of games

and economic behavior. Princeton, NJ: Princeton University

Press.

Pliskin, J. S., Shepard, D. S., & Weinstein, M. C. (1980). Utility

functions of life-years and health status. Operations

Research, 28, 206–224.

Richardson, J., Hall, J., & Salkeld, G. (1996). The measurement

of utility in multiphase health states. International Journal of

Technology Assessment in Health Care, 12, 151–162.

Ross, W. T., & Simonson, I. (1991). Evaluating pairs of

experiences: A preference for happy endings. Journal of

Behavioral Decision Making, 4, 273–282.

Sculpher, M., & Gafni, A. (2001). Can we reflect variation in societal health state preferences in cost-effectiveness analysis? International Health Economics Association, third international conference, York.

Torrance, G. W., Thomas, W. H., & Sackett, D. L. (1972). A

utility maximization model for evaluation of health care

programmes. Health Services Research, 7, 118–133.

Treadwell, J. R. (1998). Tests of preferential independence

in the QALY model. Medical Decision Making, 18,

418–428.

Varey, C., & Kahneman, D. (1992). Experiences extended

across time: Evaluation of moments and episodes. Journal

of Behavioral Decision Making, 5, 169–185.

Varian, H. R. (1992). Microeconomic analysis (3rd ed.). New

York: Norton and Company.
