Social Science & Medicine 57 (2003) 1697–1706
A test of the QALY model when health varies over time
Anne Spencer*
Department of Economics, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, UK
Abstract
Quality-adjusted life years (QALYs) estimate the utility derived from health profiles by taking account of life
expectancy and quality of life. In applying QALYs to situations where health varies over time, it is usual to assume that
we can add the utilities from constituent health states. This paper investigates the QALY approach to combining health
states over time using two tests. The first test rejects additive independence, the central assumption of the QALY model,
for individual respondents. The second test is equivocal. The tests are, therefore, unable to conclusively reject the
QALY approach to combining health states over time.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Utility measurement; Additive independence; Health profiles; QALY model
Introduction
Applied work that assesses the cost effectiveness of
treatment faces the challenge of how best to measure the
benefits of treatment. Medical benefits are only one
measure; a broader set of measures could be sought by
eliciting the preferences of interested parties such as
policy makers, patients and the general population.
Quality adjusted life years (QALYs) have been devel-
oped by health economists to measure preference for
treatment. The QALY approach uses a utility-based
measure to elicit and represent the preference for quality
of life and life expectancy. When considering a set of
health states over time, termed a health profile, the
QALY approach assumes that we can simply add the
utilities from constituent health states.
The QALY approach combines a measure of a
respondent’s preferences for time and for the constituent
health states. Consider three health states X, Y and Z
which make up a health profile which we denote as
XYZ. The QALY approach assumes that it is valid to
estimate the utility of the health profile XYZ by simply
adding the utilities of its constituent health states,
appropriately weighted by a measure of a respondent’s
preferences for time (represented by $w_i$). The QALY
approach applies, therefore, an additive model and the
utility of profile XYZ is estimated by the following
equation:
$$U(XYZ) = w_1 U(X) + w_2 U(Y) + w_3 U(Z), \qquad (1)$$
where $w_i$ is the time discount factor at time $i$, for $i = 1, 2, 3$, and $U(\cdot)$ is the utility function. A more holistic
approach, such as the Healthy Years Equivalent, elicits
a respondent’s preferences for the entire profile (i.e.
UðXYZÞ). The QALY approach has advantages over a
holistic approach since it reduces the cost of estimation.
For instance, in the EuroQol classification system,
assigning utilities to all possible combinations of profiles
over a 10 year period would increase the number of
profiles that need to be estimated by an exponent of 10
(i.e. to $243^{10}$ profiles). However, if the QALY approach
to combining health states over time is to be widely
accepted, it is important to offer empirical results to
support the approach.
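The size of the estimation problem can be checked with a short calculation. The sketch below assumes, as the text implies, that a profile assigns one EuroQol state to each of 10 years; the variable names are illustrative only.

```python
# EuroQol classification: five dimensions, each with three severity levels.
DIMENSIONS = 5
LEVELS = 3
n_states = LEVELS ** DIMENSIONS        # number of distinct health states

# Assumption: a profile assigns one state to each of 10 years.
YEARS = 10
n_profiles = n_states ** YEARS         # number of distinct 10-year profiles

print(n_states)
print(f"{n_profiles:.3e}")
```

Valuing constituent states and combining them additively requires only the first number of elicitations rather than the second.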
A challenge to the QALY approach arises from
concerns that respondents may have preferences over
the ordering of events, known as sequencing effects
(Gafni, 1995; Ross & Simonson, 1991). A respondent
may desire to overcome ill-health and look forward to
good health (dread and savouring, Loewenstein &
Prelec, 1993). A respondent may also pay more attention
ARTICLE IN PRESS
*Tel: +44-20-7882-5532; fax: +44-20-8983-3580.
E-mail address: [email protected] (A. Spencer).
0277-9536/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0277-9536(02)00554-3
to the final health state in a treatment (Kahneman,
Fredrickson, Schreiber, & Redelmeier, 1993; Varey &
Kahneman, 1992; Ross & Simonson, 1991) or be aware
that they will adapt to health in a positive or negative
manner over time (Ross & Simonson, 1991).
The aim of our paper is to test the QALY approach
when health varies over time using two tests. The first
test examines the additive independence assumption that
underpins the QALY approach. This assumption is
defined in the Background. The second test examines the
implications of the additive model given in Eq. (1) for a
broad category of health profiles where sequencing
effects are likely to arise. In this second test, the profiles
considered offer deteriorating, improving or temporarily
improving health.
The remainder of the paper is set out as follows. The
Background reviews existing tests of the QALY
approach and sets out the extent to which our paper
contributes to these tests. In Methods, we give an overview of the
questionnaire used in our paper and our tests. The
Results reviews the findings and the Conclusion
considers the implications of our tests for the QALY
approach.
Background
It is common practice to assume that preferences
under risk are described by Expected Utility Theory—a
theory of decision making under risk.[1] QALYs have
been shown to be a valid measure of preference under
this theory if certain assumptions hold.[2] When health
varies over time, Bleichrodt (1995) and Bleichrodt and
Quiggin (1997) have shown that for QALYs to be a valid
measure under Expected Utility Theory, it is necessary
to assume that additive independence holds. Additive
independence holds if the preferences between risky
treatments depend only upon the marginal rather than
the joint probability distributions of the health states
(Bleichrodt & Quiggin, 1997, p. 154; Keeney & Raiffa,
1976, p. 230). Bleichrodt (1995) outlined a test of
additive independence based on a respondent’s choice
between a 50:50 gamble of chronic or intermittent
health states but he did not collect data on this. Our first
test collects this data to check whether additive
independence holds. More details of this test are given
in Methods.
An alternative type of test checks the implications of
the additive model given in Eq. (1). In this test, the
additive model will be rejected not only when additive
independence fails, but also when one of the other
assumptions underlying the QALY approach fails. A
popular test of this type checks the extent to which it is
possible to estimate profiles from constituent states.
Such a test relies upon estimating the discount factor by
imposing a particular discount function (usually
assumed to be exponential) and requires the estimation
of the implied discount rate. Using this test, Dolan
and Gudex (1995) found that constituent states over-
estimated temporarily deteriorating profiles and that
respondents derived less benefit from a temporary
deterioration in health. Richardson, Hall, and Salkeld
(1996) also found that constituent states overestimated
deteriorating profiles by between 31% and 57% and that
it was not possible to find a plausible discount rate
that would suggest equivalence. MacKeigan, O’Brien,
and Oh (1999), on the other hand, did not reject
the additive model for profiles offering a gradual
deterioration. MacKeigan et al.’s result could be
explained by the gradual adaptation to the new health
states and reference point effects (Ross & Simonson,
1991).
One drawback in the estimation of profiles from
constituent states is that questions designed to elicit
preferences for the timing of health may also be
capturing preferences for the sequence of events (Gafni,
1995). Krabbe and Bonsel (1998) tried to overcome this
problem by testing the impact of good health appearing
early or later in the treatment. They attempted to
control for discounting by imposing a discount rate that
led to the smallest difference between the impact of good
health appearing early or later in the treatment. The
residual difference should, therefore, give an estimate of
the sequencing effects under this restriction. They found
evidence of a sequencing effect in 6 of the 13 profiles
considered, all of which supported the notion that
respondents placed a higher utility on good health at the
end of the profile. Another drawback arises in tests that
rely on the time trade-off (TTO) method to elicit
preferences in the tests discussed so far (except for
Richardson et al., 1996, who used the TTO method
alongside other methods). The TTO method asks
respondents to consider an improvement in health that
is achieved through a reduction in longevity of life
(Torrance, Thomas, & Sackett, 1972). The role of time is
crucial to this preference elicitation process. The TTO
values are therefore hard to interpret since they reflect
an interplay of timing and sequencing effects. This
suggests that elicitation methods, which do not rely on
[1] Rank dependent utility theory has been cited as a more accurate description of decision making under risk. Bleichrodt and Quiggin (1997) outline the assumptions that are required for QALYs to be a valid measure of preferences under this model.
[2] These assumptions have been defined separately for the cases when health remains constant or varies over time. When health remains constant, Pliskin, Shepard, and Weinstein (1980) outline the assumptions required for QALYs to be a valid measure of preferences (a critique of these assumptions is given by Loomes & McKenzie, 1989). Bleichrodt, Wakker, and Johannesson (1997) and Miyamoto, Wakker, Bleichrodt, and Peters (1998) have simplified these assumptions.
time to elicit their responses, are more appropriate for
tests of the additive model.
Lipscomb (1989) used regression analysis to predict
the extent to which different components affected a
profile’s utility. The regression showed significant
interaction between different health states which sug-
gested that a simple additive model was inappropriate.
In addition, for three different discount rates (0%, 5%
and 10%) the profiles for a representative respondent
could not easily be estimated by a weighted average of
constituent health states. Kuppermann, Shiboski,
Feeny, Elkin, and Washington (1997) compared profiles
against constituent health states for patients at a
maternity clinic who were considering prenatal diag-
nosis. Kuppermann et al. considered two additive
models: one model assumed that respondents had no
preferences for time (a zero discount rate); the other
model attached statistically inferred weights to time. The
constituent health states in the latter model gave better
predictions of the holistic profiles.[3] However, Holmes
(1998) points out that statistically inferred weights have
no underlying conceptual basis in terms of respondents’
preferences for time. It is therefore difficult to check the
extent to which statistically inferred weights are
consistent with the QALY approach.
Treadwell (1998) offered an innovative solution to
some of the shortcomings of these tests. Treadwell was
concerned with examining the preferential independence
assumption that underpins the additive model under
certainty. Preferential independence holds if preferences
between profiles that contain the same health state in
period i do not depend upon the severity of the health
state in period i (Keeney & Raiffa, 1976, p. 101).
Treadwell asked respondents to choose between two
profiles that occurred with certainty and included health
state Z in period i. He then tested whether changing the severity of health state Z altered a respondent’s choice
between these two profiles. Given that the comparison
of health states was made within the same period, this
test offers a simple technique to control for a
respondent’s preferences for time. He concluded that
preferential independence held in 36 out of 42 tests. Our
second test is similar to Treadwell’s approach but rather
than testing the additive model under certainty we test
the implications of the additive model under risk given
that most treatment outcomes are risky. The test is
based on two profiles that contain health state Z in
period i. We measure the utility that is derived from each of the profiles using a method that reflects a respondent’s
attitude towards risk. The test then checks
whether changing the severity of health state Z in
period i alters a respondent’s preferences between these
two profiles.
Method
Overview of the questionnaire
The study uses the EuroQol classification system,
which describes states of health along five dimensions:
mobility, self-care, usual activities, pain and anxiety
(Kind, Dolan, Gudex, & Williams, 1998; Dolan, 1996,
1997).[4] Each dimension has three levels of severity: no
problems, some problems and severe problems, denoted
by 1, 2 and 3, respectively and colour-coded black, blue
and red in our study.[5] Each health state is colour-coded
and we refer to these as follows: 11111 as N; 12221 as W; 21222 as Y; 22232 as Z and death as D.[6] Health state W allows respondents to become familiar with the
methods, but the utilities are not used in the study. The
respondents are asked to imagine a profile in which each
health state lasts for 10 years without change, to be
followed immediately by death. The profiles are depicted
on cards and respondents are asked to rank the cards.
They are then asked to consider treatments leading to
changes of health over the next 10 years. It is explained
to respondents that each profile consists of three
periods: the first 3 years, the second 3 years and the
last 4 years. Health states are constant in any one period.
The respondents are asked to consider two or three
different health states in the 10 year profile (after which
they would die).[7] The profiles are again depicted on
cards; for instance, Fig. 1 shows 3 years in health state
N, followed by 3 years in health state Y and 4 years in
health state Z, which we denote by NYZ. The
questionnaire considers a wide variety of profiles which
may lead to a rejection of the additive model, such as
adaptation to temporary health states and a desire to
overcome ill-health and look forward to good health
(for a list of profiles see column 2 of Table 1). The cards
are passed to respondents and they are asked to rank
them.
Respondents are then asked 10 standard gamble (SG)
questions that elicit cardinal von Neumann and
[3] A model gives better predictions if it has a lower predictive error. The predictive error for the model that assumed a zero discount rate was 0.35 and the predictive error for the model that assumed statistically inferred weights was 0.19.
[4] Mean and median estimates of all the EuroQol states can be calculated from the formula given by Dolan (1996) and Dolan (1997), respectively.
[5] Level 3 pain was described as moderate pain or discomfort with periods of severe pain or discomfort rather than extreme pain or discomfort used in the EuroQol work.
[6] Health state Z is more severe than health state Y in terms of self-care and pain, so that respondents could easily discriminate between them.
[7] Two states have the advantage of simplicity; three allow the possibility of considering the effects of more complicated patterns, for instance declining or improving states.
Morgenstern (1944) utilities, denoted by $U(\cdot)$, for
different health profiles. The SG questions give an
estimate of the utility of the entire health profile, and so
are a type of holistic elicitation procedure. In each SG
question, the choice is between remaining in a health
profile, say ZNN, or undergoing a risky treatment. The risky treatment has a probability $p$ of succeeding,
resulting in a better health state, normal health, or
$(1 - p)$ of failing, resulting in a worse health state, death
(as shown in Fig. 2). They are asked to state the chance
of success and failure where they consider the alter-
natives to be most finely balanced and they do not mind
which treatment they receive. Probability p is varied
until the respondents are indifferent between the two
alternatives. To help them with this they are given a
sheet of paper listing the chances of success/failure
against which they mark their response (based on Jones-
Lee, Loomes, & Philips, 1995). The point at which
respondents are indifferent between the profile and the
risky treatment is used to derive the SG utility for the
profile. In our example, the point of indifference
between the profile ZNN and the risky treatment can
be represented by:
$$U(ZNN) = p \cdot U(NNN) + (1 - p) \cdot U(DDD).$$
If $U(NNN) = 1$ and $U(DDD) = 0$, this expression
becomes:
$$U(ZNN) = p. \qquad (2)$$
In this example, the SG utility for profile ZNN is $p$.
Finally, the questionnaire includes a test of additive
independence (Bleichrodt, 1995). The question asks
respondents to imagine that they have become ill and are
offered a choice between two treatments, shown in Fig. 3.
No treatment would result in them remaining in health
state ZZZ. In treatment C, they have a 50% probability
of it succeeding and returning them to ZNN over the next 10
years and a 50% probability of it failing and resulting in
NZZ. In treatment D, they have a 50% probability of it
succeeding and returning them to full health for the next 10 years
and a 50% probability of it failing and leaving them in
health state Z for the next 10 years. Respondents are
asked which treatment they prefer, or whether they do not mind which
treatment they receive, and to explain their answer.
A test of additive independence
The test of additive independence forms the first test
of the QALY approach. We estimate the proportion of
respondents who prefer each treatment or are indiffer-
ent. If there is no measurement error and additive
independence holds, respondents will be indifferent
between treatments C and D. If there is measurement error, respondents could be indifferent between the two
treatments but inadvertently report a preference. Given
that we do not know the distribution of this measure-
ment error, we simply calculate a confidence interval
around the proportion of respondents who are indiffer-
ent, to be suggestive of the proportion of respondents
Fig. 1. An example of a health profile.
Table 1
An overview of the questionnaire
| (1) Question | (2) Profiles | (3) Mean SG utility | (4) Median SG utility | (5) Standard deviation SG utility |
| --- | --- | --- | --- | --- |
| 1 | WWW | 0.800 | 0.825 | 0.170 |
| 2 | YYY | 0.777 | 0.800 | 0.142 |
| 3 | ZZZ | 0.461 | 0.450 | 0.245 |
| 4 | YYZ | 0.642 | 0.660 | 0.184 |
| 5 | ZNN | 0.903 | 0.950 | 0.110 |
| 6 | YYN | 0.875 | 0.925 | 0.117 |
| 7 | NYZ | 0.707 | 0.750 | 0.198 |
| 8 | ZYN | 0.803 | 0.850 | 0.149 |
| 9 | ZYZ | 0.508 | 0.500 | 0.230 |
| 10 | YYD | 0.482 | 0.500 | 0.236 |
| Treatment C or D | Additive independence test | n/a | n/a | n/a |
who would be indifferent at a population level.
Confidence intervals for proportional data are based
on the exact probabilities of the binomial distribution
because the proportions involved are small (Bland, 1995,
pp. 125–126).[8] We also test whether there is a strict
preference for C or D within the sample using
McNemar’s test (Daniel, 1990, p. 165). Bleichrodt
(1995) anticipated that more intermittent health states,
such as ZNN or NZZ, would be preferred to chronic
states, such as NNN or ZZZ. The null hypothesis tested here predicts that the same proportion of
respondents prefer treatments C and D. The alternative hypothesis predicts that significantly more respondents
prefer C or D.
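The reason the additive model implies indifference between treatments C and D can be seen by writing out the expected utilities with Eq. (1). The derivation below is a sketch that assumes only the additive form and the period weights $w_1, w_2, w_3$ of Eq. (1):

```latex
\begin{align*}
EU(C) &= \tfrac{1}{2}\,U(ZNN) + \tfrac{1}{2}\,U(NZZ) \\
      &= \tfrac{1}{2}\bigl[w_1 U(Z) + w_2 U(N) + w_3 U(N)\bigr]
       + \tfrac{1}{2}\bigl[w_1 U(N) + w_2 U(Z) + w_3 U(Z)\bigr] \\
      &= \tfrac{1}{2}\,(w_1 + w_2 + w_3)\bigl[U(N) + U(Z)\bigr], \\
EU(D) &= \tfrac{1}{2}\,U(NNN) + \tfrac{1}{2}\,U(ZZZ) \\
      &= \tfrac{1}{2}\,(w_1 + w_2 + w_3)\bigl[U(N) + U(Z)\bigr].
\end{align*}
```

Both treatments give every period a 50% chance of state N and a 50% chance of state Z, so the marginal distributions coincide and the two expected utilities are equal whatever the weights.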
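[Figure 2: a choice between Treatment A, with outcomes labelled p% and (1-p)%, and Treatment B, labelled 100% with the instruction "Place each card in turn"; profiles are drawn on a time axis marked at 3, 6 and 10 years.]
Fig. 2. The format of the SG questions 1–10.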
Fig. 3. A test of additive independence.
[8] The software package STATA® is used to construct the confidence intervals based on the exact probabilities of the binomial distribution.
Additive independence is rejected only for responses
that represent a strict preference for treatments C or D. Qualitative comments are recorded and help verify
whether this was the case or whether respondents who
expressed a strict preference for a treatment were
indifferent between the two treatments (Varian, 1992).
A test of the implications of the additive model
The second test reported in this paper investigates the
implications of the additive model based on the SG
utilities derived in questions 4–10. The questionnaire
incorporates two versions of this test: Versions 1 and 2.
In Version 1, we compare the profiles ZNN and ZYZ
with the profiles NNN and NYZ as shown in Fig. 4.
Preferences should be the same between these two pairs
of profiles, since the only difference between them is that
the former pair of profiles begin with health state Z
whilst the latter pair begin with health state N.[9] The null hypothesis of this test predicts that the additive model
holds and the differences in SG utilities between profiles
ZNN and ZYZ are the same as the differences in SG
utilities between profiles NNN and NYZ (i.e.
$[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)] = 0$).
The alternative hypothesis predicts that the SG utilities
of the two pairs of profiles differ (i.e.
$[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)] \neq 0$). The SG utilities used in this test are those derived in
questions 5, 9 and 7, respectively (except profile NNN
that is assigned the utility of 1). Similarly, in Version 2,
we compare the profiles ZYN and ZYZ with the
profiles YYN and YYZ. The null hypothesis of this test predicts that the additive model holds and the differences
in SG utilities between profiles ZYN and ZYZ are
the same as the differences in SG utilities between
profiles YYN and YYZ (i.e. $[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)] = 0$). In both versions of this test,
the Wilcoxon’s matched pairs test is used to test whether
the magnitudes of differences in SG utilities are
sufficient to reject the additive model (Bland, 1995, pp.
212–215).[10] The test of the additive model relies only on
a comparison of preferences for health states occurring
within the same time period.[11] The test, therefore,
controls for the effect of time preference.
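As a rough illustration of the two test statistics, the sketch below evaluates them on the mean SG utilities listed in Table 1. It works with sample means rather than individual responses, so it recovers only the mean difference in each version and not the Wilcoxon test itself; the dictionary and function names are illustrative.

```python
# Mean SG utilities from Table 1; profile NNN is assigned a utility of 1.
mean_u = {
    "NNN": 1.000, "YYZ": 0.642, "ZNN": 0.903, "YYN": 0.875,
    "NYZ": 0.707, "ZYN": 0.803, "ZYZ": 0.508,
}

def difference_in_differences(a, b, c, d, u=mean_u):
    """Return [U(a) - U(b)] - [U(c) - U(d)], which is zero under the additive model."""
    return (u[a] - u[b]) - (u[c] - u[d])

# Version 1: first-period state is Z in one pair and N in the other.
version_1 = difference_in_differences("ZNN", "ZYZ", "NNN", "NYZ")
# Version 2: first-period state is Z in one pair and Y in the other.
version_2 = difference_in_differences("ZYN", "ZYZ", "YYN", "YYZ")

print(round(version_1, 3), round(version_2, 3))
```

Because means are linear, these values coincide with the mean differences of the individual-level statistics; the medians cannot be recovered this way.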
Finally, we investigate the extent to which SG utilities
based on holistic elicitation procedures (questions 4–10)
equal the SG utilities implied by a profile’s constituent
states. The utility of health states Y and Z can be
estimated from profiles in which the health state lasts for
the full 10 years without change, i.e. YYY and ZZZ, respectively (questions 2–3). These constituent health
states can then be used to calculate an implied utility for a
profile based on a 0%, 5% or 10% discount rate. If we have
correctly estimated the discount rate and the additive
model holds, then SG utilities based on holistic elicitation
procedures equal the SG utilities implied by a profile’s
constituent states.[12] The Wilcoxon’s matched pairs test is
used to check whether these utilities are equal, but is a
weaker test of the QALY approach than those discussed
so far since it relies on imposing a particular discount
rate. The results therefore are used only to indicate the
profiles that may be driving the results in the second test.
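The constituent-state calculation can be sketched as a discount-weighted average. The weighting scheme below (annual weights 1/(1 + r)^t for t = 0..9, normalised to sum to one) is an assumption, since the text does not spell it out, and the inputs are the sample medians for YYY and ZZZ from Table 1 rather than individual responses; all names are illustrative.

```python
# Median SG utilities of the constant profiles YYY and ZZZ (Table 1);
# full health N and death D are anchored at 1 and 0.
STATE_U = {"N": 1.0, "Y": 0.800, "Z": 0.450, "D": 0.0}
PERIOD_YEARS = (3, 3, 4)  # first 3 years, second 3 years, last 4 years

def implied_utility(profile, rate):
    """Discount-weighted average of the constituent state utilities.

    Assumption: year t = 0..9 receives weight 1 / (1 + rate)**t and the
    weights are normalised to sum to one.
    """
    total = weighted = 0.0
    year = 0
    for state, length in zip(profile, PERIOD_YEARS):
        for _ in range(length):
            w = 1.0 / (1.0 + rate) ** year
            weighted += w * STATE_U[state]
            total += w
            year += 1
    return weighted / total

for r in (0.0, 0.05, 0.10):
    print(r, round(implied_utility("ZNN", r), 3))
```

Under these assumptions the ZNN values come out at 0.835, 0.806 and 0.777 for the three rates. Profiles that mix Y and Z need not match a median computed respondent by respondent, because a median of individual utilities is not the same as a utility computed from medians.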
Data
The researcher contacted residents of York who had
taken part in a pilot Health and Safety study in the
previous 4 months. Respondents were invited to take
part in a 60-minute interview in the Department of
Economics at York University for a payment of £10. All
interviews were tape-recorded. The sample size of the
study was based on detecting a difference of 0.1 in the
SG utilities between the pairs of profiles in the second
test (i.e. $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$, etc.).[13] This test was designed to have an 80% chance of
detecting significant differences based on a standard
deviation of 0.2.[14,15] In total, 29 respondents were
interviewed, 15 males and 14 females.
[9] Additive independence implies that if a respondent prefers ZNN over ZYZ, then they prefer NNN over NYZ. Similarly, additive independence implies that if a respondent prefers ZYZ over ZNN, then they prefer NYZ over NNN (likewise for indifference).
[10] A two-tailed Kolmogorov–Smirnov test at a 5% significance level was used to test the hypothesis that data were normally distributed. The test statistic was sufficiently large to reject the null hypothesis. We, therefore, apply nonparametric tests in what follows.
[11] If we illustrate this for Version 1 of the test using the notation of Eq. (1), the differences in the SG utilities between the two pairs of profiles are: $U(ZNN) - U(ZYZ) = U(NNN) - U(NYZ) = w_2(U(N) - U(Y)) + w_3(U(N) - U(Z))$. The two pairs of profiles therefore compare the same health states at the same time period.
[12] In this calculation we assume that a respondent’s rate of time preference does not vary and that the SG utilities based on holistic elicitation procedures equal the SG utilities implied by a profile’s constituent states for only one of the three discount rates.
[13] Johnston, Brown, Gerard, O’Hanlon, and Morton (1998) and Dolan (1996) powered their tests to detect a difference of 0.1 in health state utilities.
[14] Bleichrodt and Johannesson (1997) based their power calculations on a standard deviation of 0.20 … given that the standard deviations for the time-trade-off and standard gamble quality weights reported in the literature rarely exceed 0.20 (p. 27).
[15] The power calculations are based on the standard deviations of $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$ in Version 1 of the second test and $[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$ in Version 2.
Results
In the additive independence test, the sample was split
into those preferring treatment C or treatment D (13
preferred C, 15 preferred D), with only one respondent
expressing no preference between the two treatments.
We calculate a confidence interval around the proportion
of respondents who are indifferent using the binomial
distribution. In the sample, 3.4% (1/29) of respondents
were indifferent with a 95% confidence interval of 0.08–
17%. This confidence interval suggests that the propor-
tion of indifferent respondents at a population level is
below 17%. We also test whether the proportion
preferring C is significantly different from the proportion
preferring D. The McNemar’s test statistic is 0.378 and is less than the z-score of 1.96 at a 5% significance level,
so we are unable to reject the null hypothesis.
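Both figures in this paragraph can be approximated with the standard library. The sketch below is not the Stata routine used in the study: it assumes the exact interval is the Clopper-Pearson interval, found here by bisection, and that the test statistic takes the form (b - c)/sqrt(b + c) for the two discordant counts. The function names are illustrative.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) interval for x successes out of n."""
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower, upper = exact_ci(1, 29)     # one indifferent respondent out of 29
z = (15 - 13) / sqrt(15 + 13)      # 15 preferred D, 13 preferred C
print(f"{100 * lower:.2f}% to {100 * upper:.1f}%", round(z, 3))
```

Under these assumptions the statistic rounds to 0.378, and the interval bounds fall close to the reported range.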
The test is unable to identify those respondents who
are very close to indifference and categorised as
preferring C or D. In the qualitative comments, three respondents felt that the treatment outcomes were
similar, but only one of these expressed indifference.
Therefore, only a small proportion of respondents who
stated a preference for a treatment appeared to regard
the two treatments as similar. The qualitative comments
support the notion that few respondents who were
indifferent between the two treatments had inadvertently
reported a preference.
In the additive model test, Table 2 reports the
differences in SG utilities between pairs of profiles
considered in Versions 1 and 2 of this test. In Version 1,
the mean and median differences in SG utilities between
the two pairs of profiles (i.e. $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$) were 0.102 and 0.070, respectively
Fig. 4. Health profiles used in Version 1 of the second test.
Table 2
Differences in SG utilities between pairs of profiles
| (1) | (2) Mean | (3) Median | (4) Standard deviation |
| --- | --- | --- | --- |
| Version 1 | | | |
| $U(ZNN) - U(ZYZ)$ | 0.394 | 0.400 | 0.224 |
| $U(NNN) - U(NYZ)$ | 0.293 | 0.250 | 0.198 |
| $[U(ZNN) - U(ZYZ)] - [U(NNN) - U(NYZ)]$ | 0.102 | 0.070 | 0.164 |
| Version 2 | | | |
| $U(ZYN) - U(ZYZ)$ | 0.295 | 0.250 | 0.201 |
| $U(YYN) - U(YYZ)$ | 0.233 | 0.200 | 0.171 |
| $[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$ | 0.062 | 0.021 | 0.152 |
(line 4, Table 2). The Wilcoxon’s matched pairs test
found these differences to be statistically significant, and
so we reject the additive model (two-tailed $P = 0.0048$). In Version 2, the mean and median differences in the
SG utilities between the two pairs of profiles (i.e.
$[U(ZYN) - U(ZYZ)] - [U(YYN) - U(YYZ)]$) were
0.062 and 0.021, respectively (line 8, Table 2). However,
in this version, the Wilcoxon’s matched pairs test found
these differences not to be statistically significant, and so
we could not reject the additive model ($P = 0.0656$).
The test had been powered to detect a difference of 0.1
in SG utilities between the two pairs of profiles based on
a standard deviation of 0.2. Version 2 had a slightly
higher power to detect these differences, 94% compared
to 91%, since the standard deviation was lower. Despite
this, Version 2 failed to find statistically significant
differences.
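The two power figures can be approximated with a normal-theory calculation. The sketch below is an assumption about how such figures might be obtained, not the author's stated method: it treats the test as a two-sided paired z-test with n = 29 and uses the observed standard deviations of the two difference statistics from Table 2. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n, alpha=0.05):
    """Two-sided power of a paired z-test to detect a mean difference delta.

    Assumption: normal approximation, not the Wilcoxon test used in the paper.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = delta / (sd / sqrt(n))
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power_v1 = approx_power(0.1, 0.164, 29)  # SD of the Version 1 statistic
power_v2 = approx_power(0.1, 0.152, 29)  # SD of the Version 2 statistic
print(round(power_v1, 2), round(power_v2, 2))
```

With these inputs the approximation gives roughly 91% for Version 1 and 94% for Version 2, in line with the figures quoted above.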
We investigate the extent to which the SG utilities
based on holistic elicitation procedures are equal to the
SG utilities implied by their constituent states. Table 3
shows the implied median utilities for profiles based on
constituent states and a 0%, 5% and 10% discount rate
(column 4) and holistic elicitation procedures (column
3). In questions that involve an improvement in health,
ZNN, YYN and ZYN (in questions 5, 6 and 8,
respectively), the SG utilities based on holistic elicitation
procedures were best estimated by imposing a zero
discount rate on the constituent states. However, in the
questions that involved a decline in health, YYZ, NYZ
and YYD (in questions 4, 7 and 10, respectively), the SG
utilities were best estimated by imposing a positive
discount rate on the constituent states. For instance, in
profile NYZ the SG utilities were best estimated by
imposing a 10% discount rate on the constituent
states.[16]
The Wilcoxon’s matched pairs test was used to check
whether the differences between profiles estimated
holistically and by their constituent states were statisti-
cally significant for a zero discount rate. These showed
that the utilities for profiles ZNN and ZYZ
(questions 5 and 9, respectively) were statistically
different from the utilities based on the constituent
states at a 5% significance level ($P = 0.0049$ and $P = 0.0094$, respectively). In profile ZYZ the constituent
states overestimated the profile whilst in profile ZNN
the constituent states underestimated the profile.
Conclusion
In the additive independence test, individual respon-
dents’ preferences lead us to reject additive indepen-
dence. We also find that the sample was split almost
equally between respondents who preferred treatment C
and those who preferred treatment D. A caveat to this
test is that we did not collect information on the strength
of preference; we are therefore unable to identify from
the quantitative data those respondents who are very
close to indifference and categorised as preferring
C or D.
In the additive model test, with only one of the two
tests detecting statistically significant differences, we are
unable to conclusively reject the model. A caveat of this
test is that it is designed to detect a difference of 0.1
only. It is possible that policy makers may want to detect
Table 3
Median SG utilities for profiles estimated holistically and by their constituent states
| (1) Question | (2) Profile | (3) SG utilities based on holistic elicitation procedures | (4) SG utilities of profiles estimated by their constituent states: discount rate of 0% | (4) Discount rate of 5% | (4) Discount rate of 10% |
| --- | --- | --- | --- | --- | --- |
| 4 | YYZ | 0.660 | 0.630 | 0.647 | 0.674 |
| 5 | ZNN | 0.950 | 0.835 | 0.806 | 0.777 |
| 6 | YYN | 0.925 | 0.880 | 0.869 | 0.858 |
| 7 | NYZ | 0.750 | 0.680 | 0.713 | 0.751 |
| 8 | ZYN | 0.850 | 0.745 | 0.705 | 0.676 |
| 9 | ZYZ | 0.500 | 0.530 | 0.530 | 0.530 |
| 10 | YYD | 0.500 | 0.480 | 0.526 | 0.567 |
[16] For profile ZYZ, the median utility implied by its constituent states does not change as the discount rate increases from 0% to 10%, though the mean slightly increases. Profile ZYZ is the only profile where health state Z appears in both the first and last period. The primary impact of the increase in discount rate, therefore, is to apply more weight to health state Z in the first period and less weight to health state Z in the last period, with little overall impact on the median utility.
differences that are less than 0.1, but this would require
a larger sample.
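The link between the size of the detectable difference and the sample size can be illustrated with a rough normal-approximation calculation. This is a sketch only and not the power calculation behind the study's own tests: the function name `paired_sample_size`, the standard deviation of 0.25, the 5% two-sided significance level and the 80% power are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Approximate number of respondents needed to detect a mean paired
    difference `delta`, given a standard deviation `sd` of the differences,
    using a two-sided normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ((z_alpha + z_beta) * sd / delta) ** 2


# Assumed standard deviation of paired utility differences (hypothetical).
assumed_sd = 0.25

for delta in (0.10, 0.05):
    n = paired_sample_size(delta, assumed_sd)
    print(f"difference={delta:.2f}  approx. respondents needed={ceil(n)}")
```

Because the required sample grows with the inverse square of the difference, halving the detectable difference from 0.1 to 0.05 roughly quadruples the number of respondents under this approximation.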
Which profiles lead to a rejection of the additive
model? For this we return to Version 1 of the second test,
in which the differences were found to be statistically
significant. In this version, the observed differences could
arise from one or more of the following: profile ZNN is valued
higher, or profiles ZYZ and NYZ are valued lower, than the
additive model would predict.17 An indication of the profiles
that are driving this result can be found by looking again at the
estimation of profiles from constituent states. When
there is a zero discount rate, the case considered by
Dolan and Gudex (1995), the constituent states overestimate
the temporarily improving profile ZYZ and
underestimate the improving profile ZNN. From this we
tentatively conclude that profiles ZYZ and ZNN lead to
a rejection of the additive model in our study.
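The comparison at a zero discount rate can be read directly from the medians in Table 3. The sketch below simply subtracts the holistic median from the constituent-state median for each profile. It is a descriptive check on the published medians, not a statistical test, and the names `table3` and `estimation_gaps` are introduced here for illustration.

```python
# Median SG utilities from Table 3:
# profile -> (holistic elicitation, constituent-state estimate at 0% discount rate)
table3 = {
    "YYZ": (0.660, 0.630),
    "ZNN": (0.950, 0.835),
    "YYN": (0.925, 0.880),
    "NYZ": (0.750, 0.680),
    "ZYN": (0.850, 0.745),
    "ZYZ": (0.500, 0.530),
    "YYD": (0.500, 0.480),
}


def estimation_gaps(table):
    """Constituent-state estimate minus holistic value, per profile."""
    return {profile: est - hol for profile, (hol, est) in table.items()}


gaps = estimation_gaps(table3)
overestimated = [p for p, g in gaps.items() if g > 0]
largest_shortfall = min(gaps, key=gaps.get)

for profile, gap in gaps.items():
    print(f"{profile}: {gap:+.3f}")
print("overestimated by constituent states:", overestimated)
print("largest underestimate:", largest_shortfall)
```

On these medians, ZYZ is the only profile whose constituent-state estimate exceeds its holistic value, and ZNN shows the largest shortfall.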
Two recommendations arise from this paper. Our first
recommendation is that when testing for additive
independence, strength of preference information should
be collected. This information will help to identify
responses that are close to indifference. In addition,
when preferences are very polarised, it will clarify the
extent to which summary statistics differ from individual
respondents’ preferences (Dolan, 2000). For example,
individual respondents’ preferences may fail to comply
with additive independence, but at the sample level these
preferences cancel each other out and, on average,
additive independence holds. A similar difference
between respondents’ preferences and the sample’s
summary statistics has been reported in the estimation
of time preferences and discount rates. There is a wide
variation in respondents’ discount rates but, on average,
the discount rate is zero (Dolan, 2000). New research is
exploring the potential for subgroup analyses of
preferences and the extent to which preferences should
be aggregated (Sculpher & Gafni, 2001). Our second
recommendation is that future research should continue
to check for sequencing effects. The extent to which
sequencing effects arise appears to be heavily dependent
upon the profile and the viewpoint about the benefit
derived from such profiles. This element of subjectivity is
in keeping with the notion that respondents’ preferences
for health treatments are related to their expectations of
health (Chapman, 1996). Patterns are beginning to
emerge in the empirical work about the instances in
which the additive model does not hold, but our tests are
unable to conclusively reject the QALY approach. At
the moment the best way forward would be to include
estimation of profiles and constituent states in future
studies to check the extent to which the QALY approach
continues to hold.
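The earlier point that summary statistics can mask individual variation can be made concrete with a toy calculation. The discount rates below are hypothetical values invented for illustration; they are not data from this study or from Dolan (2000).

```python
from statistics import mean

# Hypothetical individual discount rates (illustrative only).
individual_rates = [-0.10, -0.05, 0.00, 0.05, 0.10]

sample_mean = mean(individual_rates)
share_nonzero = sum(1 for r in individual_rates if r != 0) / len(individual_rates)

print(f"sample mean discount rate: {sample_mean:.3f}")
print(f"share of respondents with a non-zero rate: {share_nonzero:.0%}")
```

In this toy sample the mean rate is zero even though most respondents have a non-zero rate, which is why strength-of-preference and subgroup information is useful alongside the sample average.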
Acknowledgements
The author would like to thank Graham Loomes,
Karl Claxton, Sandra Eldridge, Judith Covey, Björn
Lindgren, Carl Hampus Lyttkens and two referees for
their valuable comments. In addition, the author is
grateful to the Swedish Social Research Council for
funding a visiting research fellowship to pilot the
approach, via research grants to Björn Lindgren, Lund
University. The author is also grateful to the Leverhulme
Trust for financial support for the UK study (funded by
the project ‘The Anatomy of Decision Making under
Risk Over Time’). Any errors are the responsibility of
the author alone.
References
Bland, M. (1995). An introduction to medical statistics (2nd ed.).
Oxford: Oxford University Press.
Bleichrodt, H. (1995). QALYs & HYEs: Under what conditions
are they equivalent? Journal of Health Economics, 14,
17–37.
Bleichrodt, H., & Johannesson, M. (1997). An experimental test
of constant proportional tradeoff and utility independence.
Medical Decision Making, 17, 21–32.
Bleichrodt, H., & Quiggin, J. (1997). Characterizing QALYs
under a general rank dependent utility model. Journal of
Risk and Uncertainty, 15, 151–165.
Bleichrodt, H., Wakker, P., & Johannesson, M. (1997).
Characterising QALYs by risk neutrality. Journal of Risk
and Uncertainty, 15, 107–114.
Chapman, G. B. (1996). Expectations and preferences for
sequences of health and money. Organizational Behavior and
Human Decision Processes, 67, 59–75.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd
ed.). Boston: PWS–KENT Publishing Company.
Dolan, P. (1996). Modelling valuations for health states: The
effect of duration. Health Policy, 38, 189–203.
Dolan, P. (1997). Aggregating health state valuations. Journal
of Health Services Research and Policy, 2, 160–165.
Dolan, P. (2000). The measurement of health-related quality of
life. In A. Culyer, & J. P. Newhouse (Eds.), Handbook of
health economics (pp. 1723–1759). Amsterdam: Elsevier.
Dolan, P., & Gudex, C. (1995). Time preference, duration and
health state valuations. Health Economics, 4, 289–299.
Gafni, A. (1995). Time in health: Can we measure individuals’
pure time preference. Medical Decision Making, 15, 31–37.
Holmes, A. M. (1998). Measurement of short term health
effects in economic evaluations. Pharmacoeconomics, 13,
171–174.
Johnston, K., Brown, J., Gerard, K., O’Hanlon, M., & Morton,
A. (1998). Valuing temporary and chronic health states
associated with breast screening. Social Science and
Medicine, 47, 213–222.
Jones-Lee, M. W., Loomes, G., & Philips, P. (1995). Valuing
the prevention of non-fatal road injuries: Contingent
valuation versus standard gamble. Oxford Economic Papers,
47, 676–695.
17 In Version 1, the differences in SG utilities between profiles
ZNN and ZYZ were greater than the differences in SG utilities
between profiles NNN and NYZ.
Kahneman, D., Fredrickson, B. L., Schreiber, C. A., &
Redelmeier, D. A. (1993). When more pain is preferred to
less: Adding a better end. Psychological Science, 4, 401–405.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple
objectives, preferences and value tradeoffs. London: Wiley.
Kind, P., Dolan, P., Gudex, C., & Williams, A. (1998).
Variations in population health status: Results from a
United Kingdom national questionnaire survey. British
Medical Journal, 316, 736–741.
Krabbe, P. F., & Bonsel, G. J. (1998). Sequence effects, health
profiles, and the QALY model: In search of realistic
modelling. Medical Decision Making, 18, 178–186.
Kuppermann, M., Shiboski, S., Feeny, D., Elkin, E. P., &
Washington, A. E. (1997). Can preference scores for discrete
states be used to derive preference scores for an entire path
of events? Medical Decision Making, 17, 42–55.
Lipscomb, J. (1989). The preference for health in cost-
effectiveness analysis. Medical Care, 27, S233–253.
Loewenstein, G., & Prelec, D. (1993). Preferences for sequences
of outcomes. Psychological Review, 100, 91–108.
Loomes, G., & McKenzie, L. (1989). The use of QALYs in
health care decision making. Social Science and Medicine,
28, 299–308.
Mackeigan, L. D., O’Brien, B. J., & Oh, P. I. (1999). Holistic
versus composite preferences for lifetime treatment se-
quences for type 2 diabetes. Medical Decision Making, 19,
113–121.
Miyamoto, J. M., Wakker, P. P., Bleichrodt, H., & Peters, H. J.
M. (1998). The zero-condition: A simplifying assumption in
QALY measurement and multiattribute utility. Manage-
ment Science, 44, 839–849.
von Neumann, J., & Morgenstern, O. (1944). Theory of games
and economic behavior. Princeton, NJ: Princeton University
Press.
Pliskin, J. S., Shepard, D. S., & Weinstein, M. C. (1980). Utility
functions of life-years and health status. Operations
Research, 28, 206–224.
Richardson, J., Hall, J., & Salkeld, G. (1996). The measurement
of utility in multiphase health states. International Journal of
Technology Assessment in Health Care, 12, 151–162.
Ross, W. T., & Simonson, I. (1991). Evaluating pairs of
experiences: A preference for happy endings. Journal of
Behavioral Decision Making, 4, 273–282.
Sculpher, M., & Gafni, A. (2001). Can we reflect variation in
societal health state preferences in cost-effectiveness analysis?
International Health Economics Association, third
international conference, York.
Torrance, G. W., Thomas, W. H., & Sackett, D. L. (1972). A
utility maximization model for evaluation of health care
programmes. Health Services Research, 7, 118–133.
Treadwell, J. R. (1998). Tests of preferential independence
in the QALY model. Medical Decision Making, 18,
418–428.
Varey, C., & Kahneman, D. (1992). Experiences extended
across time: Evaluation of moments and episodes. Journal
of Behavioral Decision Making, 5, 169–185.
Varian, H. R. (1992). Microeconomic analysis (3rd ed.). New
York: Norton and Company.