10,626 words | 2013–2014 | HPSC3026
The Reference Class Problem vis-à-vis Evidence-Based Medicine
by Connor Cummings | [email protected]
Supervised by: Dr. Brendan Clarke
Department of Science and Technology Studies
University College London
“Through seeking we may learn and know things
better. But as for certain truth, no man has known
it/ Nor shall he know it… For all is but a woven
web of guesses.”
– Xenophanes
“It should be borne in mind that when we are
attempting to make real inferences about things as
yet unknown, it is in this form that the problem will
practically present itself.”
– Venn, 1876: 194
ABSTRACT
Doctors are instructed to assess published trial results on their relevance to
individual patients. This paper examines the inferential steps that link evidence
showing that an intervention is effective in a trial population to the prediction of
effectiveness for an individual receiving that treatment. Typically, trial participants
are classified using a small number of relevant biological, social or other
characteristics (for example, age, gender, or ethnicity). However, individuals are
not simple: their important characteristics will not be fully specified by these few
classifiers. Apparently minor changes in classification (for example, a patient as ‘a
50 year old man’, or as ‘a 50 year old man with a stressful job’) may radically
change the interventional outcomes. This paper explicates the pertinence of this
problem vis-à-vis the predominant theories of probability that presently underpin
randomised control trials (namely: frequentism and Bayesianism). I argue that the
Reference Class Problem is both pervasive and profound vis-à-vis evidence-based
medicine and that an acknowledgement of the problem by the medical community
is long overdue.
ACKNOWLEDGEMENTS
I would like to extend my sincerest gratitude to Dr. Brendan Clarke at the
Department of Science and Technology Studies, University College London. He
has been both an encouraging and informative source of support throughout my
work on this dissertation. More broadly, he has afforded me not merely time but
also inspiration.
CONTENTS
1. INTRODUCTION
2. THE REFERENCE CLASS PROBLEM
2.1 The evolution of the ‘Reference Class Problem’
2.1.1 Venn
2.1.2 Reichenbach
2.1.3 Hájek
2.2 The ubiquity of the problem
3. THE PROBLEM VIS-À-VIS EVIDENCE-BASED MEDICINE
3.1 Preliminary analogy
3.2 Clinical trials
3.3 External validity/generalisation
3.3.1 A matter of probabilistic inference
3.3.2 Frequentist interpretation
3.3.3 Bayesian approach
3.4 Summary
4. CONSTRUCTING REFERENCE CLASSES
4.1 Heterogeneous reference classes
4.1.1 Venn
4.2 Homogeneous reference classes
4.2.1 Absolute homogeneity
4.2.2 Salmon
4.2.3 Relevant homogeneity vis-à-vis randomised control trials
4.3 How evidence-based medicine ought to evolve
4.3.1 Transparency vis-à-vis the Reference Class Problem
4.3.2 Extraneous considerations
5. CONCLUSION
6. REFERENCES
1. INTRODUCTION
Suppose, if you will, that I sought to determine the chance that my house
were to burn down, at some stage in the next five years.[1] There are a number of
ways in which I could go about determining such a probability; one credible way
could be by collecting data on what proportion of houses, similar to my own, have
burned down in the past (over some appropriate five-year period). Before I am able
to gather data, however, I must first incorporate my house into some appropriate
reference class in order to identify those other houses that are similar to my own.
Therein, however, begins to emerge a problem: how am I to determine which
characteristics of my house ought to dictate which other houses are similar?
This problem has its roots in the fact that there appears to be an
indeterminable number of ways in which I am able to reference my house. It is ‘a
house in Holloway’, ‘a house with an open fire’, a ‘house with gas cooking
appliances’, a ‘house inhabited by a smoker’, a ‘house with a crimson door’ and so
forth. The reference class to which I assign my house will dictate which
other houses I include in my data set and, consequently, the apparent likelihood of
my own house burning down will differ. Importantly, however, none of these
reference classes appears to be objectively the right one. This is the Reference Class
Problem, in a nutshell.
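The dependence of the estimate on the chosen class can be made concrete with a small sketch. The records below are entirely hypothetical (invented for illustration, not drawn from any data set), as are the attribute names and the helper `burn_frequency`; the point is only that one and the same house receives a different relative frequency under each description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class House:
    area: str
    open_fire: bool
    smoker: bool
    burned_down: bool  # burned down within the five-year window


# Hypothetical records, invented purely for illustration.
RECORDS = [
    House("Holloway", True, True, True),
    House("Holloway", True, False, False),
    House("Holloway", False, False, False),
    House("Holloway", False, True, False),
    House("Camden", True, True, True),
    House("Camden", True, False, True),
    House("Camden", False, False, False),
    House("Camden", False, True, False),
]


def burn_frequency(records, in_class):
    """Relative frequency of burning down among members of a reference class."""
    members = [h for h in records if in_class(h)]
    if not members:
        raise ValueError("empty reference class")
    return sum(h.burned_down for h in members) / len(members)


# Three of the descriptions under which my house falls.
REFERENCE_CLASSES = {
    "a house in Holloway": lambda h: h.area == "Holloway",
    "a house with an open fire": lambda h: h.open_fire,
    "a house inhabited by a smoker": lambda h: h.smoker,
}

estimates = {
    name: burn_frequency(RECORDS, test)
    for name, test in REFERENCE_CLASSES.items()
}
for name, p in estimates.items():
    print(f"{name}: {p:.2f}")
```

With these invented records the three classes yield 0.25, 0.75 and 0.50 respectively, and nothing in the data says which figure belongs to my particular house.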
In this paper, I argue that the above Reference Class Problem appears to be
of troubling pertinence to our current paradigm of medical practice: evidence-based
medicine. I begin by outlining the problem in abstraction. As the reader will soon
discover, the problem appears to be both profound and pervasive. In the latter
chapters, I map the Reference Class Problem directly onto evidence-based
medicine and explore the ways in which it threatens to corrode the very evidence
on which our medical practice is presently being based.
Pg. 6, Connor Cummings
[1] For reasons that will become apparent shortly, it is important to emphasise that I am interested in my
particular house, and not houses in general.
2. THE REFERENCE CLASS PROBLEM
In this chapter, I outline how philosophers of science have become familiar
with the Reference Class Problem, historically. The reader will, first, be taken back
to Cambridge in the middle of the nineteenth century, where John Venn was
developing a new philosophical theory of probability: frequentism. (Gillies, 2000:
88) In doing so, he stumbled upon a problem that was later coined by Hans
Reichenbach, in 1949, as ‘the problem of the reference class’ (see Section 2.1.2). In
the latter half of this chapter (Sections 2.1.3 and 2.2), I outline the apparent
ubiquity of this problem in a contemporary context.
2.1 The evolution of the ‘Reference Class Problem’
2.1.1 Venn
The ‘Reference Class Problem’, as it has come to be known today, naturally,
has a history. In order to develop a thorough understanding of the notion presently,
it is apt to begin by taking the reader back to the work of John Venn, in the latter
half of the nineteenth century. The Logic of Chance was originally published in
1866, with second and third editions emerging in 1876 and 1888, respectively.[2]
Therein, Venn lays the foundations on which the ‘Reference Class Problem’ has
since been established.
Venn begins by observing that a practical problem is bound to
emerge “when we are attempting to make real inferences about things as yet
unknown”. (1876: 194) He proceeds to outline a thought experiment in which one
is interested in deriving the probability of John Smith living for a further eleven
years. He concedes that this may superficially appear to be a rather menial
endeavour – demanding a distinct lack of mental fitness – that is undertaken merely
“by counting how many men of the age John Smith, respectively do and do not live
for eleven years.” (194) However, whilst this may induce the probability that a
‘fifty year old man’ will live for eleven years more, it does not bequeath unto one
the specific probability of John Smith doing so. For Venn, a problem begins to
emerge due to the following observation:
“It is obvious that every individual thing or
event has an indefinite number of properties or
attributes observable in it, and might therefore be
considered as belonging to an indefinite number of
different classes of things.”
(Venn, 1876: 194)
[2] In this section, I will be referring specifically to the second edition of Venn’s The Logic of Chance (1876);
this edition is significantly more extensive than Venn’s first, which does not include the above material pertinent to this paper and the Reference Class Problem.
Indeed, it does not seem immediately apparent as to which ‘properties or attributes’
of John Smith’s are relevant to the probabilistic endeavour at hand.
For Venn, the problem grows teeth as, by assigning a thing or event to one
reference class, one is simultaneously assigning it to “all the higher classes, the
genera, of which that class was a species.” (195) For example, ‘a man of fifty years
old’ falls within the class of ‘mammals of fifty years old’ and, of course, the class
of ‘living things of fifty years old’ – to name just two. Venn argues, on these
grounds, that reference classes are thus assigned somewhat arbitrarily; indeed, John
Smith does not naturally present himself with one particular reference class to
which he definitively ought to be assigned. One must, therefore, assign him to a
reference class by means of some auxiliary judgement:
“In saying that it is thus arbitrary under which
class he is placed, we mean, of course, that there
are no logical grounds of decision; the selection
must be determined by some extraneous
considerations.”
(Venn, 1876: 195)
For Venn, the problem of the reference class was profound and largely shaped by
his own philosophical theory of probability, as I address below.
In Cambridge, United Kingdom, in the middle of the nineteenth century,
Venn and Ellis were attempting to establish a new interpretation of probability: the
frequentist theory. (Gillies, 2000: 88) Donald Gillies outlines the philosophical
foundations of their new probabilistic theory, below:
“In contrast to both [logical and subjective] views,
the frequency approach sees probability theory as a
mathematical science, such as mechanics, but
dealing with a different range of observable
phenomena.”
(Gillies, 2000: 88)
In an attempt to rid probabilistic inquiry of subjectivity, Venn strove to establish a
theory of probability that was entirely grounded in science: affording one objective
probabilistic truth. However, for Venn, difficulties selecting the most appropriate
reference class only proved to taint his theory of probability with an apparently
inescapable element of subjectivity. Indeed, Hájek argues that, to this day, the
Reference Class Problem is often regarded as “the most serious problem that
frequentism faces.” (Hájek, 2007: 565)
In his Logic of Chance, Venn proceeds to develop his problem of reference
with specific respect to probability in some detail. However, it was not until
Reichenbach’s The Theory of Probability (1949) that the notion began to decidedly
resemble the ‘Reference Class Problem’ with which one may be familiar, today.
2.1.2 Reichenbach
Reichenbach’s account of the problem is largely analogous to Venn’s, above;
however, it is nuanced in two relevant respects. Firstly, Reichenbach establishes the
problem nominally; secondly, he puts forward a more refined approach to
constructing a suitable reference class than Venn’s “[determination] by some
extraneous considerations” (Venn, 1876: 195) that we have seen, above.
Reichenbach ultimately concedes, along with Venn, that there appears to be no
definitive solution to the problem.
The culmination of Reichenbach’s discussion of the Reference Class
Problem is illustrated in the following passage from his Theory of Probability
(1949):
“If we are asked to find the probability holding for
an individual future event, we must first
incorporate the case in a suitable reference class.
An individual thing or event may be incorporated
in many reference classes, from which different
probabilities will result. This ambiguity has been
called the problem of the reference class.”
(374)
Most interestingly, Reichenbach proceeds by explicitly introducing the notion that
new information about the thing or individual in question can be valuable in order
to assign it to a more appropriate reference class. He continues by outlining that, on
refinement of a reference class, a more accurate probability for a specific future
event outcome can be subsequently derived. (1949: 372–378)
Reichenbach illustrates this point by way of a thought experiment. He
proposes that, subsequent to investigation by chest X-ray, a patient can be moved
from ‘patients with tuberculosis’ to ‘patients with severe tuberculosis’, based on the
new information available to the physician. (1949: 394) In light of this new
information, a more accurate probability of the patient’s prognosis can thence be
derived.[3] He concludes that, vis-à-vis the Reference Class Problem, one ought to
proceed by “considering the narrowest class for which reliable statistics can be
compiled.” (1949: 374)
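Reichenbach’s rule of thumb can be sketched as a simple selection procedure. The sketch below rests on two assumptions that are mine rather than Reichenbach’s: that the candidate classes are nested and listed from broadest to narrowest, and that ‘reliable’ can be operationalised as a minimum number of recorded cases (`min_cases`). All counts, and the placeholder ‘attribute Z’, are hypothetical.

```python
# Candidate reference classes, ordered from broadest to narrowest.
# Each entry: (description, recorded cases, cases with the outcome of interest).
# All counts are hypothetical.
CANDIDATES = [
    ("patients with tuberculosis", 5000, 1000),
    ("patients with severe tuberculosis", 800, 320),
    ("patients with severe tuberculosis and attribute Z", 12, 9),
]


def narrowest_reliable_class(candidates, min_cases):
    """Return (description, relative frequency) for the narrowest candidate
    whose case count meets the assumed reliability threshold."""
    chosen = None
    for description, cases, outcomes in candidates:
        if cases >= min_cases:
            # Later entries are narrower, so keep overwriting.
            chosen = (description, outcomes / cases)
    if chosen is None:
        raise ValueError("no candidate class meets the threshold")
    return chosen


choice = narrowest_reliable_class(CANDIDATES, min_cases=100)
print(choice)
```

Lowering `min_cases` to 10 would select the third class instead and change the estimate from 0.4 to 0.75. The threshold is therefore itself a judgement made from outside the statistics, which is one way of seeing why the rule attenuates the problem without dissolving it.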
Although Reichenbach proposes a way in which the Reference Class
Problem might be attenuated, it is important to note that he explicitly acknowledges
that the problem is ultimately inescapable. (1949: 374) Indeed, the ubiquitous and
pervasive nature of the Reference Class Problem is elaborated on by a number of
contemporary philosophers: most notably by Alan Hájek in 2007, who
emphatically argues that the Reference Class Problem is your problem too.
2.1.3 Hájek
The Reference Class Problem was once again brought to the attention of the
philosopher of science in 2007, by Alan Hájek. In his paper, Hájek outlines the
different interpretations of probability that presently exist and proceeds to explicate
how the Reference Class Problem affects each and every (useful) probabilistic
interpretation, in turn.
Hájek begins by outlining that there are (at least) five established
interpretations of probability to which one may adhere (each of which
comprises two further subclasses):
1. Frequentism: (i) actual and (ii) hypothetical.
2. Classical: (i) finite sample spaces and (ii) infinite sample spaces.
3. Logical: (i) fully constrained and (ii) less constrained.
4. Propensity: (i) frequency- or symmetry-based and (ii) neither frequency- nor symmetry-based.
5. Subjectivism: (i) radical and (ii) constrained.
(Hájek, 2007: 566)
In his paper, Hájek argues that the Reference Class Problem has hitherto been
largely referred to with respect to frequentism alone. He thence proceeds to
methodically explicate that, in actuality, each and every possible probabilistic
interpretation is susceptible, in its own particular way, to the Reference Class
Problem.[4]
[3] This may seem an obvious conclusion; however, vis-à-vis our current paradigm of medical practice, it is
one that appears to be somewhat overlooked. Venn and Reichenbach’s approaches to attenuating the Reference Class Problem will be returned to in Chapter 4, where I address ‘Constructing Reference Classes’ with respect to evidence-based medicine.
It is important to note that Hájek does in fact concede that there exist a few
cases in which certain interpretations of probability can (and do) evade the
Reference Class Problem: for example, ‘radical subjectivism’. However, such
evasion comes at a price; Hájek outlines that this probabilistic interpretation
imposes no constraints beyond coherence with the probability calculus and is, thus,
void of any possible practical use:
“For example, you may with no insult to rationality
assign probability 0.999 to George Bush turning
into a prairie dog, provided that you assign 0.001
to this not being the case (and that your other
assignments also obey the probability calculus)…
Your probability assignments can be completely at
odds with the way the world is, and thus are
‘guides [to life]’ in name only.”
(2007: 577)
This observation, ultimately, leads Hájek to conclude that the Reference Class
Problem “is seemingly inescapable among theories that make substantive claims
about what probabilities are and how they should be determined–that might be
genuine guides to life.” (2007: 580)[5]
Hájek argues that the Reference Class Problem exhibits itself differently,
according to the interpretation of probability one chooses to adopt. Ultimately,
however, any useful probability is inherently relativised to some set of conditions
(2007: 583) and, thus, is inescapably susceptible to the Reference Class Problem in
some way or another. For Hájek, the Reference Class Problem is ubiquitous and,
consequently, your problem too.
2.2 The ubiquity of the problem
The Reference Class Problem appears to be troublingly pervasive. For this
reason, it has been of significant interest to both philosophers and statisticians
across a wide range of disciplines in recent years. For example, it has been
addressed with respect to risk and safety by Aven & Reniers in 2013 and in the
context of legal evidence by Cheng in 2009. The problem, however, appears to
have been given little attention in the medical profession.
[4] I do not go into each interpretation in detail in this paper as Hájek has already done so commendably.
(2007) I do, however, later return to those interpretations that are pertinent to evidence-based medicine specifically: namely frequentism and subjectivism, in Sections 3.3.2 and 3.3.3, respectively.
[5] For more on radical subjectivism, Hájek directs the reader to de Finetti (1937).
As we have seen in this section, Reichenbach touched on the Reference
Class Problem in a thought experiment that involved a patient with tuberculosis in
his Theory of Probability (1949). Indeed, more recently, Sterne addressed the
problem with respect to “ICU mortality predictions” in 2010. These accounts,
however, are largely deficient in one important respect. They address the ways in
which the Reference Class Problem affects an individual’s probability of
survival at a pre-interventional level. However, the Reference Class Problem
appears to be pertinent to the medical profession in a far more substantive respect
than has hitherto been acknowledged: at an interventional level.
This paper outlines that the Reference Class Problem appears to be firmly
entrenched in our current evidence-based medicine paradigm, at the level of the
randomised control trial that currently underpins medical guidelines. Interference
with another individual’s health on the basis of irrelevant evidence is extremely
morally problematic and, for this reason, an investigation into the Reference Class
Problem vis-à-vis evidence-based medicine, is long overdue.
We will now move on to address how the Reference Class Problem appears
to be of relevance to evidence-based medicine, specifically. Subsequently, in
Chapter 4, I propose avenues for further research that are both compatible with the
nature of the problem and incorporate suggestions from both Venn and
Reichenbach (1876 & 1949, respectively).
3. THE PROBLEM VIS-À-VIS EVIDENCE-BASED MEDICINE
The purpose of this chapter is to illustrate the pertinence of the Reference
Class Problem vis-à-vis evidence-based medicine; I begin by way of an analogy
(see Section 3.1). I thence unpack the framework of the evidence-based medicine
paradigm, at the level of the randomised control trial, in Section 3.2. I illustrate that
in order to draw conclusions from trial populations, some form of statistical
inference vis-à-vis individual future cases is necessitated. In this chapter, I
subsequently address the predominant interpretations of probability that presently
underpin randomised control trials (namely, frequentism and Bayesianism). I
finally explicate that the Reference Class Problem persists vis-à-vis evidence-based
medicine, regardless of which interpretation of probability one chooses to adopt.[6]
3.1 Preliminary analogy
Assume, if you will, that a new calcium-channel blocker has been recently
developed that is thought to potentially have an anti-hypertensive effect on human
physiology. In order to investigate the potential pharmacological merits of the new
drug, a double-blind randomised control trial was designed in which the drug was
administered to a trial population of, say, men with hypertension over the age of
55. Let us now assume that the trial was successfully run and the findings
concluded that the drug was effective in 90% of the men in the trial population.
The results were consequently published for the attention of medical practitioners.
Let us now assume that a doctor is met with a 58 year old male (‘patient Q’),
who is found to have hypertension. Adhering to the policy outlined by the
Evidence-Based Medicine Working Group (Guyatt et al., 1992), the doctor undertakes a
literature survey to establish the most appropriate treatment for his particular
patient:
“[The resident] proceeds to the library and…
conducts a computerized literature search. She
enters the Medical Subject Headings terms
epilepsy, prognosis, and recurrence, and the
program retrieves 25 relevant articles. Surveying
the titles, one appears directly relevant. She
reviews the paper, finds that it meets criteria she
has previously learned for a valid investigation of
prognosis, and determines that the results are
applicable to her patient.”
(Guyatt et al., 1992: 2420)
[6] For a rigorous account of the Philosophical Theories of Probability, see Gillies (2000).
Let us assume that the doctor followed the above dogma judiciously and
consequently stumbled upon the new antihypertensive drug outlined at the
beginning of this section. On rigorous and prudent analysis, the doctor judges the
trial to have been methodologically sound and statistically significant. The
patient in question, a 58 year old man with hypertension, falls neatly into the
reference class of the trial population (‘men with hypertension over the age of 55’),
thus, the doctor prescribes him the new drug. However, despite the doctor’s dutiful
adherence to the recommendations outlined in the name of evidence-based
medicine, let us assume that the drug did not have the desired effect on the patient.
I now outline how this result can be plausibly explained in terms of the Reference
Class Problem.
If we return to Hájek’s work on the Reference Class Problem, we can
observe that he summarises the problem in the following, succinct, fashion:
“Relativized to condition A, X has one probability;
relativized to condition B, it has another; and so
on. Yet none of the conditions stands out as being
the right one.”
(2007: 565)
The chosen reference class in our above randomised control trial is ‘men with
hypertension over the age of 55’. With respect to Hájek’s above model, “relativized
to condition A” (‘men with hypertension over the age of 55’), X (the desired effect
of reduced blood pressure) has probability 0.9. The Reference Class Problem,
however, arises here due to the fact that the particular patient of concern, above, is
more than just ‘a man with hypertension over the age of 55’. Consider, if you will,
that he is a man of African ethnic origin. There are, thus, no grounds on which to be
certain that “relativized to condition B” (‘men of African ethnic origin with
hypertension, over the age of 55’), the patient will still have a 0.9 probability
of experiencing the expected pharmacological effect of the new drug. Indeed,
recent research has led academics and policy makers to acknowledge that ethnicity
is a relevant factor on which pharmacological efficacy can be dependent. (Clarke,
et al., 2013: 13–14 with reference to NICE, 2011)
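Why the trial figure leaves the individual patient’s probability open can be shown with the law of total probability. Write A for the trial’s reference class, B for the further attribute (here, African ethnic origin) and X for the desired effect. The share of B within the trial (0.10) and the two subgroup probabilities (0.45 and 0.95) are purely hypothetical values chosen for illustration; they are not taken from NICE or from any trial.

```latex
\begin{align*}
P(X \mid A) &= P(X \mid A \wedge B)\,P(B \mid A)
             + P(X \mid A \wedge \neg B)\,P(\neg B \mid A) \\
            &= (0.45)(0.10) + (0.95)(0.90) \\
            &= 0.045 + 0.855 = 0.90
\end{align*}
```

On these assumed figures the trial would still report 0.9 relativised to condition A, while a patient relativised to condition B would have a probability of only 0.45. The published figure constrains the subgroup values but does not fix them.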
To emphasise the clinical importance of the problem, consider a 35 year old
man with hypertension, of African ethnic origin – ‘patient R’. With a desire to
lower his blood pressure, the patient’s doctor consults the National Institute for
Health and Clinical Excellence (NICE) guidelines on the appropriate administration of
antihypertensives:
“Offer step 1 antihypertensive treatment with a
calcium-channel blocker (CCB) to people aged
over 55 years…”
(NICE, 2011)
On the above interpretation, it would appear that ‘patient R’ ought not receive the
above treatment. However, the guidelines continue, as follows:
“… and to black people of African or Caribbean
family origin of any age.”
(NICE, 2011)
In other words, by merely altering the reference class to which ‘patient R’ is
assigned, his predicted physiological response to CCBs appears to have changed
(see table, immediately below):

Reference class of ‘Patient R’         NICE guideline
‘people under 55 years old’            prescription of CCB not justified
‘people of African family origin’      prescription of CCB justified

A priori, neither of the above reference classes appears to be the right one: ‘Patient
R’ falls into both equally and entirely. It is only with some auxiliary judgement that
the correct reference class can be identified.[7]
As we have seen, above, the Reference Class Problem appears to have the
potential to significantly undermine the evidence on which our current medical
practice is presently being based. Before addressing how one is to potentially solve
– or at very least, attenuate – the problem, it is necessary to explore the framework
of evidence-based medicine in greater detail. The following section elaborates on
randomised control trials, the stage at which reference classes first
begin to take shape. This leads on to a subsequent discussion of how results
observed in a randomised control trial are assumed to apply to patients beyond merely
those within the trial population.
[7] This is analogous to Venn’s “extraneous considerations” (Venn, 1876: 195), addressed in Section 2.1.1. This
is a point to which I refer again, in Section 4.3.2, where I suggest a potential solution to the Reference Class Problem vis-à-vis evidence-based medicine.
3.2 Clinical trials
An exhaustive clinical trial is typically a four-stage process, the third stage of
which is often referred to as a randomised control trial. The inherent aim of a
randomised control trial is to identify, by experimentation, “the most appropriate
treatment for future cases.” (Teira, 2011: 255 citing Pocock, 1983) This section
gives an overview of the four stages of the trial process by which a new drug is
made clinically available.[8]
Stage I clinical trials are focussed largely on identifying an appropriate
dosage of a new drug.[9] The treatment of concern is administered to some small
number of participants (fewer than 30), who are thence closely monitored for any
changes in their physiology (either positive or adverse).[10] Once a preliminary dose
has been determined (by means of weighing up the therapeutic benefit of the drug
with its side-effects and toxicity), the drug will likely proceed to Stage II of the
trial process. (Hackshaw, 2009: 9 & Teira, 2011: 255)
Compared to Stage I clinical trials, Stage II involves a slightly greater
number of participants[11] and the main focus of investigation is the establishment of
a “preliminary estimate of efficacy.” (Hackshaw, 2009: 9) This stage is not strictly
designed in order to identify whether the drug is effective or not; the primary aim is
to generate preparatory data that may prove to be useful in designing an
appropriate Stage III trial. (Hackshaw, 2009: 9)
Stage III clinical trials – with which we are predominantly concerned in this
paper – invariably concern an extremely large number of participants (“usually
several hundred or thousand people” [Hackshaw, 2009: 9]). Participants are
randomly allocated into either the interventional group (that receives the new
treatment) or the control group (that receives the current standard of treatment, for
comparative purposes). Thus, it is this stage in the clinical trial process that is often
referred to as the ‘randomised control trial’ (or RCT). Hackshaw outlines that this
stage of the trial process alone often takes years, or longer, to complete and must be
large in order to achieve the statistical power necessary to convince policy makers
to amend current guidelines. (2009: 9–11)[12]
[8] As Teira notes, trials can be designed to assess the outcomes of practically any form of intervention. In a
medical context, this can typically include “medical devices, surgery, alternative medicine therapies, etc.” (2011: 255) In this paper, for the purposes of clearly explicating the Reference Class Problem vis-à-vis evidence-based medicine, I will outline the problem with respect to drug trials alone. As we have seen, above, the Reference Class Problem is ubiquitous and the reader, as he/she wishes, is encouraged to explore the pertinence of the problem beyond simply randomised control drug trials.
[9] ‘Stage I clinical trials’ are alternatively referred to as ‘translational clinical trials’ as they form a bridge
between the developmental process in the laboratory and the human experience in the clinic. They are “among the most common types of clinical trials performed.” (Piantadosi, 2005: Ch9)
[10] Observation is often extensive and scrupulous. It often, indeed, extends beyond merely physiological
effects: given the circumstances, pharmacological and/or psychological observation may be apt.
[11] It is important to note that there is a degree of disagreement in the literature over an exact figure;
Hackshaw outlines that a Stage II trial often concerns “around 50 [subjects]” (2009: 10), whereas Teira reports that the figure is somewhere “between 100 and 200.” (2011: 255) For the purposes of this paper, however, one need not concern oneself with the particulars; the number of participants in Stage II trials is almost entirely circumstantial and, indeed, of little relevance – if any – to our interest in the Reference Class Problem.
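The random allocation step described above can be sketched in a few lines. This is a minimal illustration of simple (coin-flip) randomisation only; it is an assumption of the sketch that the trial uses this scheme, since real trials often use block or stratified randomisation instead. The function name `randomise`, the participant identifiers and the seed are hypothetical.

```python
import random


def randomise(participant_ids, seed=None):
    """Allocate each participant to 'intervention' or 'control' with equal
    probability. Simple randomisation: the two arms may differ in size."""
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    arms = ("intervention", "control")
    return {pid: rng.choice(arms) for pid in participant_ids}


# Hypothetical trial of 1,000 participants, identified by number.
allocation = randomise(range(1, 1001), seed=2014)
sizes = {
    arm: sum(1 for assigned in allocation.values() if assigned == arm)
    for arm in ("intervention", "control")
}
print(sizes)
```

Note that the allocation takes no account of any attribute of the participants: randomisation balances known and unknown characteristics across the two arms on average, but it does not say to which reference class a future patient belongs.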
Once a “definitive answer on whether a new treatment is better than the
control group” has been reached, the new drug enters Stage IV of the trial process. For completeness,
though not of our primary concern in this paper, Stage IV clinical trials involve
incorporating the new treatment into clinical practice. The effects of the new drug
are closely monitored in a clinical context and any important information is fed
back to the pharmaceutical company. (Hackshaw, 2009: 11 & Teira, 2011: 255)
It is the third stage of clinical trials (otherwise known as randomised control
trials) with which we are concerned in this paper, for the following reason:
“A… phase II study is not usually designed for a
direct statistical comparison of the trial endpoint
between two interventions… However, a phase III
trial is designed for a direct comparison, allowing a
full evaluation of the new intervention and,
usually, a definitive conclusion.”
(Hackshaw, 2009: 10–11)
Deep within the philosophy of the randomised control trial (clinical trial Stage III)
is this comparative assumption; it is assumed that the results observed in a trial
population (by comparison of treatment and control groups) will repeat themselves
in future cases. This assumption is often referred to as ‘external validity’ or
‘generalisation’ and is the point at which the Reference Class Problem takes hold.
Given that randomised control trials (and consequently evidence-based medicine,
more broadly) depend entirely on this notion of ‘external validity’, it is apt to
explore the assumption in greater detail, at this stage of our discussion.
[12] This reference to ‘statistical power’ is largely pertinent to randomised control trials run with respect to the
frequentist interpretation of probability. This is a point to which I return at length in Section 3.3.2 and again in Section 4.2.2.
In the following section, the reader will observe that ‘external validity’ is
more complicated than many scientists and statisticians frequently acknowledge.[13]
Firstly, we shall see that ‘external validity’ is a matter of probabilistic inference
and, thus, subject to the Reference Class Problem. However, depending on which
interpretation of probability one chooses to adopt, the Reference Class Problem
appears to present itself in a slightly different form. Consequently, there follows a
discussion of the main interpretations of probability that currently underpin
randomised control trials (namely: frequentism and Bayesianism). After each
interpretation is introduced, I set out the way in which the Reference Class
Problem arises under it.
3.3 External validity/generalisation
3.3.1 A matter of probabilistic inference
In order to draw any conclusion from a set of data, “analytic tools become
necessary.” (Piantadosi, 2005: 108) Therein, however, emerges a problem:
“Statistics is not a perfectly unified field,
particularly with regard to the best method for
making inferences from data.”
(Piantadosi, 2005: 125)
There are, indeed, a number of statistical models with which one is able to make an
apparently objective inference from a data-set. Furthermore, there appears to be no
consensus over which mode of statistical analysis is universally the right one.
With respect to randomised control trials specifically, statistical inferences
made from gathered data are largely – if not entirely – probabilistic:
“Once the end point for the evaluation of the
treatment is reached, the interpretation of the
collected data determines whether or not we should
accept our hypothesis about the effectiveness of the
drug, assigning a certain probability to this
judgement.”
(Teira, 2011: 255)
¹³ For example, Vitoria, et al. (2004) declare that one can confidently rely upon external validity due to “the
assumption of ‘universal biological response’, i.e. different individuals will respond to a treatment or drug in the same way.” (Clarke, et al. 2013: Sect 2.4.2) Piantadosi takes a somewhat similar view; however, he is more constructive in his dismissal of critics of external validity: he claims that “[critics] ignore the principal justification for external validity, that being biological knowledge regarding mechanism.” On this point, I largely agree with Piantadosi, and it is a point to which I briefly return in Section 4.3.2. There I agree that a ‘biological knowledge regarding mechanism’ is, indeed, necessary in order to make confident claims about external validity (and thus to avoid the Reference Class Problem) and that this might be an avenue for further research.
As has been addressed earlier in this paper, there exist a number of interpretations
of probability.¹⁴ It is, thus, apt to explicate how the Reference Class Problem arises
in the context of the predominant probabilistic interpretations that presently
underpin randomised control trials (and, consequently, evidence-based medicine,
more broadly).
The majority of randomised control trials are underpinned by frequentism.
However, as Teira argues, there is growing support for a Bayesian approach to
drawing inferences from randomised control trial data. (2011) I will, therefore, now
address how the Reference Class Problem arises in each of these cases,
respectively.
It is important to note that the interest of this paper is not to advocate one
interpretation of probability over another.¹⁵ The purpose of this section is merely to
explicate that there exist a number of probabilistic interpretations that presently
underpin randomised control trials. Furthermore, it appears that the Reference
Class Problem persists regardless of which probabilistic interpretation one chooses
to adopt.
3.3.2 Frequentist interpretation
The vast majority of randomised control trials are presently underpinned by
the frequentist interpretation of probability. Ever since the first randomised control
trial was run in 1948, frequentism has been embraced “as a testing standard by the
international medical community and by pharmaceutical regulatory agencies all
over the world.” (Teira, 2011: 256) As we are about to see, however, it is not long
before frequentism is corrupted by the Reference Class Problem.
As we have seen, frequentism “was first developed in the middle of the
nineteenth century by the Cambridge school of Ellis and Venn”. (Gillies, 2000: 88)
The frequentist theory of probability has since developed into a binary notion: it
can stand in reference to (i) actual frequentism or (ii) hypothetical frequentism.
Actual frequentism is relevant to discuss largely for historical reasons, as it is
rarely esteemed by contemporary statisticians. (Hájek, 2007: 566) Proponents of
actual frequentism, such as Venn:
“[identified] the probability of an attribute or event
A in a reference class B with the relative frequency
of actual occurrences of A within B.”
(Hájek, 2007: 566)
¹⁴ For an extensive and thorough account of different philosophical theories of probability, see Gillies (2000).
¹⁵ For an account of the advantages and limitations of different probabilistic interpretations vis-à-vis
randomised control trials, see ‘Frequentist versus Bayesian Clinical Trials’ (Teira, 2011).
An actual frequentist’s approach to probability, outlined in the quote above, is
analogous to Venn’s ‘John Smith’ model, which we saw in Section 2.1.1. Therein, the
probability that John Smith would live for a further 11 years was simply
determined by collecting data on similar people to Mr. Smith. (Venn, 1876: 194 &
Hájek, 2007: 566)
Hypothetical frequentism has since been born out of actual frequentism:
“Hypothetical frequentists such as Reichenbach
(1949) and von Mises (1957) are inspired by the
dictum that probability is a long-run relative
frequency.”
(Hájek, 2007: 567)
The hypothetical frequentist interpretation of probability is defined by seeking to
gather the greatest number of actual observations possible. Hypothetically, the true
probability of an attribute or event can thus be represented by the limit, p, of its relative
frequency as the number of observations, n, tends to infinity (n → ∞).¹⁶ It is this hypothetical interpretation of frequentist
probability (with a tendency toward an infinite number of observations) by which
the vast majority of randomised control trials are presently underpinned. (Teira,
2011)
As we have seen, in Section 3.2, a randomised control trial (Stage III clinical
trial) is run with a large number of subjects. The reason for this is, in essence, to
cater to the demands of hypothetical frequentism; in order to achieve the statistical
power necessary to draw a reliable probability from a data set of actual
observations, the number of observations must be as great as possible. In other
words, the observed, cumulative probability of the effect of the treatment on a trial
population will be closer to that of the true probability as the trial population
increases in number. Indeed, the true probability of a treatment outcome is
represented by the limit, p, as n → ∞. (Reichenbach, 1949: 69) This hypothetical
frequentist probability is then, ultimately, applied to future cases, as a guide to
clinical decision-making. There arises a problem, however, as no randomised
control trial is run with a trial population of infinity.
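The convergence described in this paragraph can be illustrated with a minimal simulation. The sketch below is an illustration only: the ‘true’ probability of 0.3, the function name running_relative_frequency and the fixed random seed are assumptions introduced here and are not drawn from any trial. It generates simulated treatment outcomes and reports the observed relative frequency at increasing values of n.

```python
import random


def running_relative_frequency(p_true, n_max, checkpoints, seed=0):
    """Simulate n_max outcomes, each a 'success' with probability p_true,
    and return the observed relative frequency at each checkpoint n."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wanted = set(checkpoints)
    successes = 0
    observed = {}
    for n in range(1, n_max + 1):
        if rng.random() < p_true:
            successes += 1
        if n in wanted:
            observed[n] = successes / n
    return observed


if __name__ == "__main__":
    P_TRUE = 0.3  # hypothetical 'true' probability, chosen for illustration
    sizes = [10, 100, 1_000, 10_000, 100_000]
    for n, freq in running_relative_frequency(P_TRUE, 100_000, sizes).items():
        print(f"n = {n:>6}: relative frequency = {freq:.4f}, "
              f"distance from p = {abs(freq - P_TRUE):.4f}")
```

Even at the largest n in the sketch, the relative frequency only approximates p; no finite run of observations reaches the limit itself.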
¹⁶ For more on hypothetical frequentism, see Reichenbach (1949: 67–69) and Gillies (2000: 96–105).
In order to overcome this pragmatic problem with the hypothetical
frequentist interpretation of probability, biostatisticians employ ‘significance tests’
to transform observed data into generalised, hypothetical frequentist probabilities
(wherein n → ∞). Arguably the most predominant significance test is that of the
p-value – the hypothetical nature of which is outlined below:¹⁷
“Once the experiment is run and actual data
provide the observed value of the statistic, we can
also calculate how likely it is, assuming the truth of
the [null] hypothesis, to obtain a result with less or
equal probability than the observed one: this is the
p-value. In other words, the p-value is the
proportion of an infinite series of repetitions of an
experiment, all conducted assuming the truth of the
null hypothesis, that would yield data contradicting
it as strongly or more so than the observed result.
Therefore, the p-value is a probability of observed
and unobserved results which is tied to the design
of the experiment and cannot be properly
interpreted without it.”¹⁸
(Teira, 2011: 260)
It is important to note, however, that “the truth of the hypothesis can never be
established with significance testing: it is just assumed.” (Teira, 2011: 261)
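To make the definition quoted above concrete, the following is a minimal sketch of an exact p-value for a binomial outcome. It follows the wording of the quotation (“a result with less or equal probability than the observed one”) by summing the null-hypothesis probabilities of every outcome that is no more probable than the one observed. The trial figures (20 patients, 15 responders, a null response rate of 0.5) and the function names are hypothetical and serve only as an illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_p_value(k_observed, n, p_null):
    """Sum the null probabilities of all outcomes that are no more probable
    than the observed one (the 'less or equal probability' definition)."""
    p_obs = binomial_pmf(k_observed, n, p_null)
    tolerance = 1e-12  # guards against floating-point ties being missed
    total = 0.0
    for k in range(n + 1):
        pmf = binomial_pmf(k, n, p_null)
        if pmf <= p_obs * (1 + tolerance):
            total += pmf
    return total


if __name__ == "__main__":
    # Hypothetical trial: 15 of 20 patients respond; null response rate 0.5.
    print(f"p-value = {exact_p_value(15, 20, 0.5):.4f}")  # approximately 0.0414
```

Note that the null response rate is an input to the calculation: the p-value is computed on the assumption that the null hypothesis is true and says nothing, by itself, about whether it is.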
Let us assume, for a moment, that a hypothetical frequentist probability was
actually determined: by means of an infinitely large trial population, or by means
of some infallible significance test. Even with this hypothetical frequentist
probability of the effects of a treatment in a trial population – which may initially
appear to be, quite unequivocally, the true probability – the Reference Class
Problem arises at the point at which the probability of a future event-token is
sought. We have reached the crux of the Reference Class Problem vis-à-vis the
frequentist randomised control trial.
¹⁷ It may be necessary to clarify that a ‘null hypothesis’ is simply “the reverse of what the study is designed
to show [and] is such that the researcher usually wants to reject it”. (see Daly & Bourke, 2000: 67–69) For example, in a drug trial, the null hypothesis may be that the drug has no therapeutic effect.
¹⁸ For a more detailed account of hypothetical frequentist ‘significance testing’ with respect to randomised
control trials, specifically, see Teira (2011: 260–262). For more on null hypotheses, p-values and other statistical constructs in clinical trials (and evidence-based medicine, more generally), see Piantadosi (2005) and Daly & Bourke (2000). For the sake of simplicity (and momentum), this section focuses on the p-value – a cornerstone of statistical inference in frequentist randomised control trials. However, the reader will hopefully soon be able to see the pertinence of the Reference Class Problem beyond merely that of the p-value.
To illustrate the Reference Class Problem with respect to hypothetical
frequentism, allow me to borrow a thought experiment from Hájek. (2007: 567)
Therein, he poses that “we are interested in the probability that a given coin lands
heads on a given toss.” (567) Supposing that the outcome (‘heads’ or ‘tails’) of one
particular toss were sought:
“we may suppose that the probability from ‘all tosses of
our coin’ to ‘heads’ is well-defined (non-trivial though
the supposition is).”
(Hájek, 2007: 567)
However, the Reference Class Problem arises as it is possible (and, perhaps,
appropriate) to break down the reference class ‘all tosses of our coin’. Indeed,
Hájek points out that ‘coin tosses’ can be specified differently: “for example, as a
toss of our coin with such-and-such angular momentum, or within a certain time-
period”. (567)
With respect to the probability of a particular coin-toss giving ‘heads’, the
coin-toss can be incorporated qua member of ‘coin-tosses’, but also qua
member of ‘coin-tosses of a given angular momentum’ and so on. Actual
frequentist data gathered on the proportion of ‘coin-tosses’ that give ‘heads’ will almost
certainly differ from the data gathered on the proportion of ‘coin tosses of a
given angular momentum’ that give ‘heads’. Even if the coins were tossed an
infinite number of times (thus, indulging the dictum of hypothetical frequentism),
the two reference classes would yield different frequentist data. Consequently, a
particular coin-toss qua member of ‘coin-tosses’ will give one probability of
‘heads’; whereas a particular coin-toss qua member of ‘coin-tosses of a given
angular momentum’, will give another probability of ‘heads’, and so on. (Hájek,
2007: 567) As such, the frequentist interpretation of probability (both actual and
hypothetical) has hit the Reference Class Problem.
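The point can be made concrete with a small worked example. The tallies below are entirely hypothetical (they are not Hájek’s, nor the product of any experiment), as are the band labels and the function name; they are chosen only to show that one and the same toss receives a different relative frequency of ‘heads’ depending on the reference class in which it is placed.

```python
from fractions import Fraction

# Hypothetical tallies: (angular-momentum band, number of heads, number of tosses)
TOSSES = [
    ("low", 620, 1000),
    ("medium", 500, 1000),
    ("high", 380, 1000),
]


def relative_frequency(records, band=None):
    """Relative frequency of heads among all tosses (band=None) or among
    the tosses that fall within one angular-momentum band."""
    selected = [r for r in records if band is None or r[0] == band]
    if not selected:
        raise ValueError(f"no tosses recorded for band {band!r}")
    heads = sum(r[1] for r in selected)
    total = sum(r[2] for r in selected)
    return Fraction(heads, total)


if __name__ == "__main__":
    # The same low-angular-momentum toss, placed in two reference classes.
    print("qua 'coin-tosses':", relative_frequency(TOSSES))
    print("qua 'coin-tosses of low angular momentum':",
          relative_frequency(TOSSES, "low"))
```

With these figures, the toss has a probability of 1/2 qua member of ‘coin-tosses’ and of 31/50 qua member of ‘coin-tosses of low angular momentum’; collecting more tosses would sharpen each figure without settling which of the two applies to the single toss.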
The relevance of the problem to randomised control trials will, hopefully, be
becoming clear. To ensure such clarity, allow me the following explication.
A frequentist randomised control trial with a trial population of ‘men over 55 years
old’ would almost certainly gather different data from one with a trial population
of ‘people of African family origin’. Suppose a doctor were presented with a patient:
a 60 year old man of African family origin. This particular patient qua member of
one reference class (e.g. ‘men over 55 years old’) will appear to have one
probability of experiencing a given treatment effect, whereas the same patient qua
member of another reference class (e.g. ‘people of African family origin’) will have
another. Even if we were to assume that a hypothetical frequentist probability was
actually determined,¹⁹ it would not be clear into which reference class this
particular patient ought to be incorporated, and thus which probability is the right one.
In summary, even if the true hypothetical frequentist probability of a
treatment effect on a trial population were determined (as randomised control trials
presently aspire to attain), an individual still has to be assigned some reference
class. This inevitably gives rise to the Reference Class Problem. In other words,
even in the case of apparently absolute statistical power, individual patients must
still be assigned to a reference class. Statistical correlation alone (even in its most
theoretically pure form), therefore, appears to be insufficient grounds on which to
assume external validity of randomised control trials. We ought to, consequently,
turn our attention towards constructing appropriate reference classes if we are to
confidently apply the results of randomised control trials to individual future cases.
Before the matter of ‘Constructing Reference Classes’ can be addressed,²⁰ it is
appropriate to briefly address some of the alternative interpretations of probability
that can potentially underpin clinical trials. As will become clear, the Reference
Class Problem cannot be avoided by merely adopting an alternative interpretation
of probability.
3.3.3 Bayesian approach
In recent years, it has been argued that evidence-based medicine ought to
bring frequentism down from its pedestal. In 2011, David Teira published a paper
in which he convincingly highlighted the ethical, practical, regulatory and
epistemological problems with the frequentist interpretation of probability that
currently underpins the vast majority of randomised control trials and,
consequently, medical guidelines more broadly. Furthermore, he proposes that
there is another, perhaps more apt, alternative to frequentism in clinical trials:
Bayesianism.
In this section, it is not necessary to join the debate over the advantages and
limitations of Bayesian versus frequentist clinical trials. Neither is it apt to spend
too much time on the structural or pragmatic demands of a Bayesian trial.²¹ For the
purposes of this paper, it is simply necessary to highlight that a Bayesian approach
to clinical trials does not evade the Reference Class Problem. As it is neither
necessary nor possible to fully elaborate the Bayesian statistical framework in this
paper, this section merely highlights the pertinent facet of Bayesianism that gives
rise to the Reference Class Problem.
¹⁹ For example, by running trials with populations of infinity, or by employment of some infallible statistical
significance test, as we have seen, above.
²⁰ See Chapter 4.
²¹ For more on Bayesianism see Gillies (2000: 82–85); for more on the Bayesian/frequentist debate,
specifically, see Teira (2011); for a historical review of Bayesian statistical approaches in medicine, see Ashby (2006); and for a detailed account of Bayesian approaches to clinical trials and healthcare, more broadly, see Spiegelhalter et al. (1994 & 2004).
Under a Bayesian approach, probabilities are treated as degrees of belief,
beginning with a prior belief:
“Before the experiment begins, the Bayesian
summarizes knowledge about the unknown
statistical parameter (e.g., treatment effect) in the
form of a probability distribution, called the ‘prior
distribution.’”
(Piantadosi, 2005: 116)
As data is subsequently gathered on the effect of a treatment (from a trial or
elsewhere), a likelihood about the truth of the initial belief about the parameter
(treatment outcome) is ascertained:
“Evidence from further data is summarized by a
likelihood function for the parameter, and the
normalized product of the prior and the likelihood
form the posterior distribution on the basis of
which conclusions should be drawn”
(Spiegelhalter et al., 1994: 360)
To summarise, a Bayesian approach is not limited solely to one particular trial
design, nor any particular canon of observation, such as actual frequentism. A
Bayesian approach is free to incorporate any relevant data that is available. (Teira,
2011: 256) It merely ascertains a likelihood, relativised to the truth of some initial
belief about the statistical parameter (e.g. treatment effect). Such an initial belief
can be based on “objective evidence or subjective judgment or a combination”.
(Piantadosi, 1994: 360) It is out of this ‘initial belief about the parameter’ that the
Reference Class Problem arises.
Let us assume that a Bayesian trial were to be run and the initial prior belief
concerning some parameter was well-supported: for example, ‘CCBs reduce blood
pressure’.²² However, the blood pressure of whom? In order to proceed with the
trial, further information is gathered about the effects of CCBs. However, effects of
CCBs on whom? In order to ascertain a likelihood about the truth of the parameter,
the data gathered must be relativised to some reference class. (Hájek, 2007: 567)
Data gathered on the effects of CCBs on one reference class (for example,
‘people’) may very well differ from data gathered on another (e.g. ‘people of
African ethnic origin’). In such circumstances, the likelihood ascertained about the
truth of the parameter will differ. The Reference Class Problem arises, again, with
respect to the external validity of the trial's results.²³
²² If an initial parameter were well-supported by an esteemed body of evidence, it may not seem clear why a
trial ought to be run in the first place. However, although the evidence supporting an initial parameter may be congruent, it may be lacking in other respects, such as: quality, statistical power or objectivity. After all, Bayesian trials can be run on the basis of entirely subjective beliefs.
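A minimal sketch may help to show how the same prior belief yields different posterior conclusions once the data are relativised to different reference classes. The sketch assumes a conjugate Beta-Binomial model, which is a simplification introduced here and is not taken from Piantadosi or Spiegelhalter et al.; the prior, the counts of responders and the names used are likewise hypothetical. The parameter is taken to be the proportion of patients whose blood pressure is reduced by the treatment.

```python
from fractions import Fraction


def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data.

    Conjugate update: the posterior is Beta(prior_a + successes,
    prior_b + failures), whose mean is a / (a + b).
    """
    a = prior_a + successes
    b = prior_b + failures
    return Fraction(a, a + b)


# Hypothetical prior and hypothetical trial counts, for illustration only.
PRIOR = (2, 2)
DATA_BY_REFERENCE_CLASS = {
    "people": (140, 60),  # (responders, non-responders)
    "people of African ethnic origin": (45, 5),
}

if __name__ == "__main__":
    for label, (responders, non_responders) in DATA_BY_REFERENCE_CLASS.items():
        mean = posterior_mean(*PRIOR, responders, non_responders)
        print(f"{label}: posterior mean = {float(mean):.3f}")
```

The prior is identical in both cases; the posterior differs only because the likelihood was built from data on a different reference class.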
As we have seen, a particular patient “has an indefinite number of properties
or attributes observable in it” (Venn, 1876: 194) and can therefore be “incorporated
in many reference classes, from which different probabilities will
result.” (Reichenbach, 1949: 374) In isolation, however, none of the reference
classes appear to be, entirely, the right one. The Bayesian approach to clinical trials
has, consequently, fallen upon the Reference Class Problem in much the same way
as frequentism has done.
3.4 Summary
Clinical trials presently underpin evidence-based medicine: as we have seen
in this chapter, they provide observational data from which conclusions are drawn
about the effects of given interventions. Such conclusions thence shape clinical
guidelines and are, ultimately, used to guide the most appropriate treatment of
future individual patients. However, in order for the observed results of clinical
trials to apply to future cases, it is necessary to adopt some interpretation of
probability. (Piantadosi, 2005: 108)
This section has outlined, in detail, how the Reference Class Problem applies
to the standard frequentist randomised control trial. Furthermore, it has been
illustrated that favouring a Bayesian approach to the analysis of trial data does not
manage to evade the problem in any useful way. Indeed, as Hájek argues, the
Reference Class Problem pertains to every useful interpretation of probability due
to the fact that probabilities, by their very nature, must be relativised to some
reference class. (2007: 567) In the context of evidence-based medicine,
incorporating a particular patient in different reference classes will yield different
probabilities of his response to a treatment. However, it may not be clear which is the
most appropriate reference class for him, and thus which probability is the true one.
²³ There is, of course, the circumstance in which there is no evidence to support the initial belief concerning
the parameter. In this particular case, the Reference Class Problem does not arise. However, in this circumstance, as Hájek puts it, “your degrees of belief can be whatever you like… [thus], Your probability assignments can be completely at odds with the way the world is”. (2007: 576–577) A Bayesian probability drawn from an unsupported initial belief concerning the parameter may well evade the Reference Class Problem, but it is also likely to be entirely vacuous. For more on this point, see Hájek (2007: 576–577).
In order to address the Reference Class Problem, we appear to have to divert
our attention to the ascertainment of the most appropriate reference class. In the
following chapter, I outline the problem in terms of drawing conclusions from
heterogeneous trial populations and, ultimately, argue that reference classes ought
to be constructed in accordance with statistically-relevant homogeneity.
4. CONSTRUCTING REFERENCE CLASSES
4.1 Heterogeneous reference classes
4.1.1 Venn
The Reference Class Problem essentially arises as probabilities about future
individual cases are drawn from heterogeneous trial populations. Such heterogeneity
can give rise to an element of uncertainty over the most appropriate reference class
to which a particular individual case ought to be assigned. Consequently, statistical
correlation drawn from a heterogeneous trial population, alone, appears to be
insufficient grounds on which to ensure the external validity of trial results, with
respect to particular future patients.²⁴
In his Logic of Chance, Venn outlines that the Reference Class Problem
arises as a product of drawing probabilities about individual cases from
heterogeneous populations:
“Now when it is said of any such heterogeneous
body that, say, nine-tenths die, what is meant (or
rather implied) is that the class might be broken up
into smaller subdivisions of a more homogeneous
character, in some of which, of course, more than
nine-tenths die, whilst in others less, the
differences depending upon their character,
constitution, profession, &c. ; the number of such
divisions and the amount of their divergence from
one another being perhaps very considerable.”
(1876: 208)
Consequently, by refining the reference class of an individual case in some way, his
particular probability of dying may be altered considerably. The Reference Class
Problem duly arises as it may not be clear which reference class is entirely
the right one.
Intuitively, it follows that if there were some way of ensuring complete
homogeneity in a trial population, the Reference Class Problem could be evaded
entirely; there would, consequently, be no need to reference particular cases at all:
an individual’s reference class would be everything he is.²⁵ The next section,
consequently, breaks down this matter of ‘homogenisation’, as it is more complex
than it may initially appear to be. As we will see, the homogenisation of trial
populations – at least in a crude sense – only proves to replace the Reference Class
Problem with another. I argue, however, that homogeneity in only the relevant
respects may prove to be a viable, pragmatic solution to the Reference Class
Problem vis-à-vis evidence-based medicine.
²⁴ See Chapter 3, particularly Section 3.3.2.
4.2 Homogeneous reference classes
Somewhat ironically, ‘homogeneous reference classes’ is a multifarious
notion that requires unpacking. It is therefore necessary to establish exactly to what
‘homogeneous reference classes’ can be referring. In this section, I address the
possible interpretations of ‘homogeneity’ one is free to adopt and consequently
argue that homogeneity in merely the statistically relevant respects is necessary vis-
à-vis the Reference Class Problem.
4.2.1 Absolute homogeneity
Randomised control trials with a homogeneous trial population – in an
absolute sense – would have a trial population of one; the trial population would
consist entirely and solely of the very patient whom the results of the trial would
then be used to treat. For obvious reasons, this is neither feasible nor useful in any
evident sense. With respect to the current framework of randomised control trials,
absolute homogeneity would require running a personalised trial for every
treatment that exists, on every patient that may require the results. Such a
proposal is ludicrous. Furthermore, the statistical power of conclusions drawn from
a trial population of one would be so minuscule that they could not be applied to
any other individuals. On this interpretation, absolute homogeneity – in the crudest
sense – is entirely incompatible with the randomised control trial: it only replaces
the Reference Class Problem with a plethora of others.
Despite being ludicrous vis-à-vis randomised control trials, absolute
homogeneity may be useful in the sense of ‘personalised medicine’. This is,
perhaps, an avenue to explore with respect to evidence-based medicine, however, it
is neither possible nor appropriate to address this matter here. The purpose of this
paper is to explicate the Reference Class Problem vis-à-vis current medical practice
(specifically, randomised control trials). An inquiry into the potential advantages of
personalised medicine would require some renovation of our current statistical
tools, or even an entirely new approach to gathering evidence on which to base
medical practice. Although a potentially fruitful endeavour, such an inquiry cannot
be covered here and goes beyond the aim of this paper: to illustrate the
pertinence of the Reference Class Problem to our current framework of
evidence-based medicine (and not to overhaul the paradigm entirely).
²⁵ The reader ought to be reminded of Reichenbach’s solution to the Reference Class Problem that was
outlined in Section 2.1.2, wherein he proposed that a particular attribute or event ought to be assigned to the “narrowest class for which reliable statistics can be compiled.” (1949: 374) As we will see in the next section, this matter of ascertaining ‘reliable statistics’ can prove problematic with the homogenisation of trial populations.
4.2.2 Salmon
As we have seen in Section 2.1.2, Reichenbach proposed a solution to the
Reference Class Problem in his Theory of Probability, concluding that we are to:
“proceed by considering the narrowest class for
which reliable statistics can be compiled.”
(Reichenbach, 1949: 374)
This proposal, however, appears to yield an element of pragmatic difficulty. As a
reference class is narrowed, it consequently has, by definition, fewer members and
this appears to be antithetical to the attainment of statistical power. Picking up on
this point, Salmon sets out a reformulation of Reichenbach’s solution:
“The aim in selecting a reference class to which to
assign a single case is not to select the narrowest,
but the widest, available class. However, the
reference class should be homogeneous, and
achieving homogeneity requires making the
reference class narrower if it was not already
homogeneous. I would reformulate Reichenbach’s
method of selection of a reference class as follows:
choose the broadest homogeneous reference class
to which the single event belongs. I shall call this
the reference class rule.”
(Salmon, 1970: 43)
Although this proposal may initially appear conclusive, it raises one crucial
question: homogeneity in respect to what?
As we have seen, absolute homogeneity is far from appropriate in the
context of randomised control trials. Reference classes, therefore, appear to have to
be homogeneous in some relevant respect(s). A particular patient may be assigned
to the broad reference class ‘people’ that is, indeed, homogeneous in many
respects. However, the reference class ‘people’, to which a particular patient
belongs, is less broad than the reference class ‘vertebrates’ to which he also
belongs. Both reference classes ‘people’ and ‘vertebrates’ are homogeneous in their
own respects, however, it would not appear sensible to run clinical trials on
‘vertebrates’, merely because it is a broader reference class than ‘people’. It
appears that we have not yet evaded the Reference Class Problem in any useful
sense.
If it were possible, however, to identify homogeneous reference classes
within which no further statistically relevant partition could be found, this would
neatly satisfy the conditions of Salmon’s above Reference Class Rule. In 1977,
Salmon published an investigation into the possibility of the existence of such
objectively homogeneous reference classes. Therein, he outlines the following:
“A reference class A is homogeneous with respect
to an attribute B provided there is no set of
properties Cᵢ in terms of which A can be relevantly
partitioned. A partition of A by means of Cᵢ is
relevant with respect to B if, for some value of i,
P(A.Cᵢ, B) ≠ P(A,B) . . . To say that a reference
class is homogeneous – objectively homogeneous
for emphasis – means that there is no way, even in
principle, to effect the relevant partition.”
(Salmon, 1977: 399)
Later in his paper, Salmon goes on to elucidate exactly what he means by ‘relevant
partitioning’, which – for the sake of clarity – is important to include at this stage:
“Suppose that P(A,B) = ½. Let C₁ = B and C₂ = B̄.
Then P(A.C₁,B) = 1 and P(A.C₂,B) = 0; thereby a
relevant partition has been achieved.”
(Salmon, 1977: 399)
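Salmon’s condition can be checked mechanically on a finite sample, as the following minimal sketch shows. It reads P(A,B) as the relative frequency of attribute B within reference class A, and treats a partition of A as relevant if any cell A.Cᵢ gives a frequency of B different from that of A as a whole. The eight patient records, the property names and the function names are hypothetical; and since Salmon’s definition concerns probabilities rather than finite sample frequencies, the sketch is an approximation of the idea and not an implementation of his criterion.

```python
from fractions import Fraction

# Hypothetical reference class A: eight patients given the same treatment.
CLASS_A = [
    {"smoker": True, "left_handed": True, "responds": True},
    {"smoker": True, "left_handed": True, "responds": False},
    {"smoker": True, "left_handed": False, "responds": False},
    {"smoker": True, "left_handed": False, "responds": False},
    {"smoker": False, "left_handed": True, "responds": True},
    {"smoker": False, "left_handed": True, "responds": False},
    {"smoker": False, "left_handed": False, "responds": True},
    {"smoker": False, "left_handed": False, "responds": True},
]


def prob(members, attribute):
    """P(A, B): relative frequency of the attribute within the class."""
    return Fraction(sum(1 for m in members if m[attribute]), len(members))


def is_relevant_partition(members, attribute, partition_key):
    """True if some cell A.C_i has P(A.C_i, B) different from P(A, B)."""
    overall = prob(members, attribute)
    cells = {}
    for m in members:
        cells.setdefault(m[partition_key], []).append(m)
    return any(prob(cell, attribute) != overall for cell in cells.values())


if __name__ == "__main__":
    print("P(A, B) =", prob(CLASS_A, "responds"))
    for key in ("smoker", "left_handed", "responds"):
        verdict = is_relevant_partition(CLASS_A, "responds", key)
        print(f"partition by {key!r} relevant: {verdict}")
```

In this hypothetical class, partitioning by ‘smoker’ is relevant (1/4 against 3/4), partitioning by ‘left_handed’ is not (1/2 in both cells), and partitioning by the attribute itself reproduces the case in the quotation, with cell frequencies of 1 and 0.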
In order to ascertain the broadest homogeneous reference class to which the
single event belongs, one ought to strive to achieve homogeneity in merely the
relevant respects. Theoretically, a reference class ought, therefore, to include only
those characteristics on which the effects of an intervention are dependent, in order
for the class to remain as broad as possible. Often in practice, however, it is apt
merely to ensure that those statistically relevant characteristics are catered for
somewhere within the reference class.
In conclusion, a focus on statistically relevant partitions would appear to
illuminate the most appropriate reference class from which to draw conclusions
about individual future cases. Adhering to Salmon’s Reference Class Rule by
focussing on only relevant homogeneity within a reference class, appears to avoid
sacrificing statistical power. In the next section, I address how such a conclusion
can be applied vis-à-vis randomised control trials.
4.2.3 Relevant homogeneity vis-à-vis randomised control trials
As we have seen, homogeneity in a trial population appears to be useful – if
not necessary – in order to evade the Reference Class Problem. Homogeneity
appears to enable one to draw reliable probabilistic conclusions from a trial
population that can be confidently applied to individual future cases.
Absolute homogeneity would remove the need for reference classes entirely.
However, this approach vis-à-vis randomised control trials proves to be a
practically inappropriate and a statistically poor solution (see Section 4.2.1).
Relevant homogeneity, however, allows for the reference class of a trial population
to remain broad, which in turn facilitates the collation and analysis of large pools of
data. Indeed, irrelevant homogeneity (i.e. homogenisation by means of some
statistically irrelevant partition[s]) only proves to narrow the reference class of a
trial population unnecessarily, the repercussions of which may significantly
compromise the statistical power of trial results.
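The cost of unnecessary narrowing can be illustrated with a rough power calculation. The sketch below uses a standard normal approximation for a two-sided comparison of two proportions; the response rates (50% in the control group against 60% in the treatment group), the group sizes and the function name are hypothetical, and the approximation ignores the negligible contribution of the opposite tail. It is intended as an illustration of the trend, not as a substitute for a proper sample-size calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions,
    with n_per_arm participants in each of the two groups."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    effect = abs(p_treatment - p_control)
    return std_normal.cdf(effect / standard_error - z_crit)


if __name__ == "__main__":
    # Same hypothetical treatment effect; only the size of the class changes.
    for n in (1600, 400, 100):
        print(f"n per arm = {n:>4}: approximate power = "
              f"{approx_power(0.5, 0.6, n):.2f}")
```

If a partition is statistically irrelevant, the treatment effect is the same in each of its cells, so narrowing the trial population by means of it leaves the effect unchanged while reducing n, and the power falls accordingly.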
4.3 How evidence-based medicine ought to evolve
As we have seen in this paper, the Reference Class Problem has the potential
to undermine the evidence on which our medical practice is presently being based.
This section proposes two ways in which the medical community might move
forward, in light of this problem of statistical inference. My first proposal (Section
4.3.1) is largely an epistemological one: now that the Reference Class Problem has
been outlined vis-à-vis evidence-based medicine, the medical community must
accept and accommodate its ramifications. My second proposal (Section 4.3.2)
consists of a suggestion for further research into how the most appropriate
reference class for individual cases might be pragmatically ascertained.
4.3.1 Transparency vis-à-vis the Reference Class Problem
In order for clinicians to be able to make an informed decision about the
most appropriate care of a particular patient, evidential transparency is essential.
Indeed, at the heart of the evidence-based medicine paradigm is the importance of
professional clarity and candour:
“Evidence-based medicine is the conscientious,
explicit and judicious use of current best evidence
in making decisions about the care of individual
patients.”
(Sackett et al., 1996: 71)
I see no reason why such transparency ought not to apply to the Reference Class
Problem vis-à-vis evidence-based medicine.
My proposal for the future of evidence-based medicine, in light of the
statistical problem of the reference class, constitutes firstly a mere
acknowledgement of the problem by researchers, biostatisticians, clinicians and
policy-makers (such as NICE). The extent of the problem has been outlined in
detail in this paper, and its pertinence to individual patient care ought to be taken
seriously in a clinical context.
On acceptance of the problem, I propose that those aforementioned parties
ought to be explicit about the most appropriate reference classes to which individual
patients ought to be assigned and, furthermore, the underlying justification for such
a reference class. Indeed, such transparency is congruous with the fundamental
principles of the evidence-based medicine paradigm, as outlined by Sackett et al.
(1996: 71) above. Such clarity would not only appear to enable clinicians to make
informed decisions about patient care but would also encourage researchers to
focus due attention on ascertaining the most appropriate reference classes for
individual cases.
4.3.2 Extraneous considerations
As this paper draws to a close, I would like to briefly propose a suggestion
regarding further research into a potential solution to the Reference Class Problem
vis-à-vis evidence-based medicine. If I may, I would like to begin by taking the
reader back to Venn’s account of the problem of class selection, outlined at the
beginning of this paper, in Section 2.1.1.
In 1876, Venn put forward the following:
“In saying that it is thus arbitrary under which
class he is placed, we mean, of course, that there
are no logical grounds of decision; the selection
must be determined by some extraneous
considerations.”
(Venn, 1876: 195)
By this, Venn is essentially arguing that there appear to be no objective grounds on
which to construct an entirely relevant reference class for a single thing or event;
one must, therefore, look to employ some auxiliary judgement in the construction
of relevant reference classes. I suggest that this approach may be fruitful, not
merely as a solution to the Reference Class Problem in an abstract sense, but also
as a means of attenuating the problem vis-à-vis evidence-based medicine.[26]
As we have seen in this paper, the interpretation of results of randomised
control trials requires some element of statistical inference.[27] However, the
importance of extraneous considerations must not be overlooked in the
interpretation of trial validity:
“[RCTs] cannot alone support the expectation that
a policy will work for you. What they tell you is
true – that this policy produced that result there.
But they do not tell you why that is relevant to
what you need to bet on getting the result you want
here. For that, you will need to know a lot more.”
(Cartwright & Hardie, 2012: ix)
Such knowledge appears to be of great importance vis-à-vis the Reference Class
Problem. In order to ensure that the results observed in a trial population are
applicable to individual future cases, it is necessary to ensure that reference classes
are statistically relevant with respect to individual patients. As Venn pointed out in
1876, “extraneous considerations” are necessary in order to ensure the right
reference class has been selected for “an individual thing or event” (194–195). It
would appear that the importance of such considerations ought not to be overlooked
in the case of the Reference Class Problem vis-à-vis evidence-based medicine.
[26] This may, for example, come in the form of an understanding of the underlying causal mechanism of the intervention. Indeed, many philosophers have argued that a mechanistic understanding is crucial in order to ensure the external validity of trial results (Schaffner, 1993: 306–307; Russo & Williamson, 2007; Cartwright & Hardie, 2012; Clarke et al., 2013). Furthermore, Piantadosi states that “biological knowledge regarding mechanism” is “the principle justification for external validity” (2005: 317). An understanding of the underlying causal mechanism(s) of an intervention, therefore, ought not to be overlooked and may afford a means by which to ascertain statistically relevant partitions within reference classes. A mechanistic approach is therefore, perhaps, an area for further research into the attenuation of the Reference Class Problem vis-à-vis evidence-based medicine. For more on causal mechanisms in general, see Machamer et al. (2000) and Machamer (2004).
[27] See Section 3.3.1.
5. CONCLUSION
In this paper, I have outlined a profound problem with our present paradigm
of medical practice. Depending on how one references an individual patient (for
example, as ‘a man over 50 years old’ or ‘a man of African family origin’), his
particular probability of experiencing a given effect of a treatment will differ.
Moreover, it may not be entirely clear which reference class is objectively the right
one for any given patient. This is the Reference Class Problem vis-à-vis evidence-
based medicine; it is a problem with potentially grave ramifications and one that
ought to be explicitly addressed by the medical community.
At the outset of this paper, I delivered a detailed account of the
Reference Class Problem in an abstract sense. The reader was taken through the
works of Venn, Reichenbach and Hájek in order to understand the evolution of the
problem and how it has become of contemporary relevance vis-à-vis
evidence-based medicine. The ubiquity of the Reference Class Problem is somewhat
troubling; indeed, the problem has been highlighted in a number of academic
disciplines, including law and quantitative biology (see Section 2.2). However,
despite its potential to entirely undermine statistical inferences, the problem has not
yet been acknowledged vis-à-vis evidence-based medicine.
In Chapter 3, I have illustrated how the Reference Class Problem appears to
be of worrying pertinence to our current paradigm of medical practice. As we have
seen in this paper, clinical judgement is made largely – if not entirely – on the basis
of conclusions drawn from clinical trials (specifically, randomised control trials).
Doctors are instructed to assess published trial results on their relevance to
individual patients. However, this practice necessitates one potentially problematic
assumption: that the effects observed in a trial population are applicable to a given
individual patient (known as ‘external validity’ or ‘generalisation’).
In this paper, I have unpacked the framework of the randomised control trial
and have illustrated that external validity is largely a matter of probabilistic
inference. Given that there exist a number of philosophical interpretations of
probability, I have explicated how the Reference Class Problem arises vis-à-vis the
predominant interpretations that presently underpin randomised control trials
(namely, frequentism and Bayesianism). I have argued that the Reference Class
Problem appears to persist, regardless of which interpretation of probability one
chooses to adopt.
In Chapter 4, I have outlined how the Reference Class Problem appears to
arise due to the heterogeneity of trial populations vis-à-vis individual cases. This
chapter thus comprises an inquiry into the construction of the most appropriate
reference class for a given individual. In line with Salmon’s Reference Class Rule
(see Section 4.2.2), I have argued that, in theory, reference classes ought to be
homogeneous in only the statistically relevant respects. This allows the
reference class to remain broad, thus ensuring external validity without
compromising the statistical power of the trial’s results. The Reference Class
Problem may thence be avoided, as it becomes possible to ascertain the most
appropriate reference class for an individual case: one constructed in accordance with
statistically relevant partitioning.
In this paper, I have explicated the Reference Class Problem vis-à-vis
evidence-based medicine in some detail. It is important to note that, although this
paper has been thorough, it has not been entirely comprehensive. Indeed, a number
of avenues for further research into the Reference Class Problem and its pragmatic
ramifications vis-à-vis evidence-based medicine have been outlined in this paper.
The first step, however, must be merely to acknowledge its existence, something
the medical community appears not yet to have done.
6. REFERENCES
Texts
Ashby, D. 2006. Bayesian Statistics in Medicine: A 25 year review. Statistics in Medicine. 25, pp. 3589–3631.
Aven, T. & Reniers, G. 2013. How to define and interpret a probability in a risk and safety setting. Safety Science. [online]. 51, pp. 223–231. Available from: www.elsevier.com/locate/ssci [Accessed: 9th April 2014].
Cartwright, N. & Hardie, J. 2012. Evidence-based Policy: A Practical Guide to Doing it Better. Oxford, U.K. & New York, NY, U.S.A.: Oxford University Press.
Cheng, E.K. 2009. A Practical Solution to the Reference Class Problem. Columbia Law Review. [online]. 109(8), pp. 2081–2105. Available from: http://www.jstor.org/stable/40380407 [Accessed: 10th April 2014].
Clarke, B.; Gillies, D.; Illari, P.; Russo, F. & Williamson, J. 2013. Mechanisms and the Evidence Hierarchy. Topoi. [online]. Available from: http://link.springer.com/article/10.1007%2Fs11245-013-9220-9 [Accessed: 5th January 2014].
Daly, L.E. & Bourke, G.J. 2000. Interpretation and Uses of Medical Statistics. 5th ed. Oxford, U.K.: Blackwell Science Ltd.
de Finetti, B. 1937. Foresight: Its Logical Laws, Its Subjective Sources. (English Translation). In: H.E. Kyburg and H.E. Smokler, eds. 1964. Studies in Subjective Probability. New York, NY, U.S.A.: John Wiley & Sons.
Gillies, D. 2000. Philosophical Theories of Probability. London, U.K.: Routledge.
Guyatt, G. et al. (Evidence Based Medicine Working Group). 1992. Evidence Based Medicine: A new approach to teaching the practice of medicine. Journal of the American Medical Association. 268, pp. 2420–2425.
Hackshaw, A. 2009. A Concise Guide to Clinical Trials. West Sussex, U.K.: John Wiley & Sons Ltd.
Hájek, A. 2007. The reference class problem is your problem too. Synthese. [online]. 156(3), pp. 563–585. Available from: http://link.springer.com/article/10.1007/s11229-006-9138-5 [Accessed: 3rd December 2013].
Machamer, P.; Darden, L. & Craver, C.F. 2000. Thinking about mechanisms. Philosophy of Science. [online]. 67, pp. 1–25. Available from: http://www.jstor.org/stable/188611 [Accessed: 25th November 2013].
Machamer, P. 2004. Activities and Causation: The Metaphysics and Epistemology of Mechanisms. International Studies in the Philosophy of Science. [online]. 18(1), pp. 27–39. Available from: http://dx.doi.org.libproxy.ucl.ac.uk/10.1080/02698590412331289242 [Accessed: 19th November 2013].
von Mises, R. 1957. Probability, Statistics and Truth. 2nd revised English ed. New York, NY, U.S.A.: Macmillan.
NICE. 2011. CG127: Hypertension: quick reference guide. National Institute for Health and Clinical Excellence, London. [online]. Available from: www.nice.org.uk [Accessed: 12th January 2014].
Piantadosi, S. 2005. Clinical Trials: A Methodological Perspective. 2nd ed. Hoboken, NJ, U.S.A.: John Wiley & Sons, Inc.
Pocock, S.J. 1983. Clinical Trials: A Practical Approach. Chichester, U.K. & New York, NY, U.S.A.: John Wiley & Sons Inc.
Reichenbach, H. 1949. The Theory of Probability: An inquiry into the logical and mathematical foundations of the calculus of probability. E.H. Hutten and M. Reichenbach, trs. 2nd ed. Berkeley & Los Angeles, CA, U.S.A. & London, U.K.: University of California Press.
Russo, F. & Williamson, J. 2007. Interpreting causality in the health sciences. International Studies in the Philosophy of Science. [online]. 21(2), pp. 157–170. Available from: http://dx.doi.org.libproxy.ucl.ac.uk/10.1080/02698590701498084 [Accessed: 10th December 2013].
Sackett, D.L.; Rosenberg, W.M.C.; Gray, J.A.M.; Haynes, R.B. & Richardson, W.S. 1996. Evidence Based Medicine: what it is and what it isn't. BMJ. [online]. 312(7023), pp. 71–72. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2349778/ [Accessed: 1st December 2013].
Salmon, W.C. 1970. Statistical Explanation. In: W.C. Salmon; R.C. Jeffrey and J.G. Greeno, eds. 1971. Statistical Explanation and Statistical Relevance. Pittsburgh, PA, U.S.A.: University of Pittsburgh Press, pp. 29–87.
Salmon, W.C. 1977. Objectively Homogeneous Reference Classes. Synthese. [online]. 36(4), pp. 399–414. Available from: http://link.springer.com/article/10.1007%2FBF00486104?LI=true [Accessed: 29th November 2013].
Schaffner, K.F. 1993. Discovery and Explanation in Biology and Medicine. Chicago, IL, U.S.A. & London, U.K.: The University of Chicago Press.
Spiegelhalter, D.J.; Freedman, L.S. & Parmar, M.K.B. 1994. Bayesian Approaches to Randomized Trials. Journal of the Royal Statistical Society: Series A (Statistics in Society). 157, pp. 357–416.
Spiegelhalter, D.J.; Abrams, K. & Myles, J. 2004. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester, U.K.: John Wiley & Sons.
Sterne, R.H. 2010. The Discordance of Individual Risk Estimates and the Reference Class Problem. Quantitative Biology. [online]. Available from: http://arxiv.org/abs/1001.2499 [Accessed: 25th March 2014].
Teira, D. 2011. Frequentist versus Bayesian Clinical Trials. In: F. Gifford; D.M. Gabbay; P. Thagard and J. Woods, eds. 2011. Philosophy of Medicine (Handbook of the Philosophy of Science). Oxford, U.K.; Amsterdam, The Netherlands & Burlington, MA, U.S.A.: Elsevier, pp. 255–297.
Venn, J. 1876. The Logic of Chance: An essay on the foundations and province of the theory of probability, with especial reference to its application to moral and social science. 2nd ed. London & Cambridge, U.K.: Macmillan and co.
Victora, C.G.; Habicht, J.-P. & Bryce, J. 2004. Evidence-based public health: Moving beyond randomized trials. American Journal of Public Health. 94, pp. 400–405.
Images
Cover image. Available from: http://www.math.cornell.edu/~numb3rs/lipa/imgs/venn4.png [Accessed: 14th April 2014].