The Reference Class Problem vis-à-vis Evidence-Based Medicine


10,626 words | 2013–2014 | HPSC3026


by Connor Cummings | [email protected]

Supervised by: Dr. Brendan Clarke, Department of Science and Technology Studies,

University College London


“Through seeking we may learn and know things

better. But as for certain truth, no man has known

it/ Nor shall he know it… For all is but a woven

web of guesses.”

– Xenophanes


“It should be borne in mind that when we are

attempting to make real inferences about things as

yet unknown, it is in this form that the problem will

practically present itself.”

– Venn, 1876: 194

ABSTRACT

Doctors are instructed to assess published trial results on their relevance to

individual patients. This paper examines the inferential steps that link evidence

showing that an intervention is effective in a trial population to the prediction of

effectiveness for an individual receiving that treatment. Typically, trial participants

are classified using a small number of relevant biological, social or other

characteristics (for example, age, gender, or ethnicity). However, individuals are

not simple: their important characteristics will not be fully specified by these few

classifiers. Apparently minor changes in classification (for example, a patient as ‘a

50 year old man’, or as ‘a 50 year old man with a stressful job’) may radically

change the interventional outcomes. This paper explicates the pertinence of this

problem vis-à-vis the predominant theories of probability that presently underpin

randomised control trials (namely: frequentism and Bayesianism). I argue that the

Reference Class Problem is both pervasive and profound vis-à-vis evidence-based

medicine and that an acknowledgement of the problem by the medical community

is long overdue.

ACKNOWLEDGEMENTS

I would like to extend my sincerest gratitude to Dr. Brendan Clarke at the

Department of Science and Technology Studies, University College London. He

has been both an encouraging and informative source of support throughout my

work on this dissertation. More broadly, he has afforded me not merely time but

also inspiration.

CONTENTS

1. INTRODUCTION

2. THE REFERENCE CLASS PROBLEM

2.1 The evolution of the ‘Reference Class Problem’

2.1.1 Venn

2.1.2 Reichenbach

2.1.3 Hájek

2.2 The ubiquity of the problem

3. THE PROBLEM VIS-À-VIS EVIDENCE-BASED MEDICINE

3.1 Preliminary analogy

3.2 Clinical trials

3.3 External validity/generalisation

3.3.1 A matter of probabilistic inference

3.3.2 Frequentist interpretation

3.3.3 Bayesian approach

3.4 Summary

4. CONSTRUCTING REFERENCE CLASSES

4.1 Heterogeneous reference classes

4.1.1 Venn

4.2 Homogeneous reference classes

4.2.1 Absolute homogeneity

4.2.2 Salmon

4.2.3 Relevant homogeneity vis-à-vis randomised control trials

4.3 How evidence-based medicine ought to evolve

4.3.1 Transparency vis-à-vis the Reference Class Problem

4.3.2 Extraneous considerations

5. CONCLUSION

6. REFERENCES

1. INTRODUCTION

Suppose, if you will, that I sought to determine the chance that my house

were to burn down, at some stage in the next five years.[1] There are a number of

ways in which I could go about determining such a probability; one credible way

could be by collecting data on what proportion of houses, similar to my own, have

burned down in the past (over some appropriate five-year period). Before I am able

to gather data, however, I must first incorporate my house into some appropriate

reference class in order to identify those other houses that are similar to my own.

Therein, however, begins to emerge a problem: how am I to determine which

characteristics of my house ought to dictate which other houses are similar?

This problem has its roots in the fact that there appears to be an

indeterminable number of ways in which I am able to reference my house. It is ‘a

house in Holloway’, ‘a house with an open fire’, a ‘house with gas cooking

appliances’, a ‘house inhabited by a smoker’, a ‘house with a crimson door’ and so

forth. The reference class to which I assign my house will dictate which

other houses I include in my data set and, consequently, the apparent likelihood of

my own house burning down will differ. Importantly, however, none of these

reference classes appears to be objectively the right one. This is the Reference Class

Problem, in a nutshell.
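
The point can be made concrete with a minimal sketch. The records below are invented purely for illustration (they are not fire statistics of any kind), and the function name burn_frequency is my own. The sketch simply computes the observed frequency of burning down within whichever reference class is selected:

```python
from fractions import Fraction

# Hypothetical records, invented for illustration only. Each entry describes
# one house and whether it burned down during the five-year period.
HOUSES = [
    {"area": "Holloway", "open_fire": True,  "smoker": True,  "burned": True},
    {"area": "Holloway", "open_fire": False, "smoker": False, "burned": False},
    {"area": "Holloway", "open_fire": False, "smoker": True,  "burned": False},
    {"area": "Holloway", "open_fire": True,  "smoker": False, "burned": False},
    {"area": "Camden",   "open_fire": True,  "smoker": True,  "burned": True},
    {"area": "Camden",   "open_fire": True,  "smoker": False, "burned": True},
    {"area": "Camden",   "open_fire": False, "smoker": False, "burned": False},
    {"area": "Camden",   "open_fire": False, "smoker": True,  "burned": False},
]


def burn_frequency(houses, **reference_class):
    """Observed frequency of burning down among houses in the given class.

    The reference class is given as attribute=value pairs; with no pairs,
    every house counts as a member. Returns None for an empty class.
    """
    members = [
        h for h in houses
        if all(h[key] == value for key, value in reference_class.items())
    ]
    if not members:
        return None
    return Fraction(sum(h["burned"] for h in members), len(members))


# The same house can be described in several ways, each giving another answer.
print(burn_frequency(HOUSES, area="Holloway"))                  # 1/4
print(burn_frequency(HOUSES, open_fire=True))                   # 3/4
print(burn_frequency(HOUSES, smoker=True))                      # 1/2
print(burn_frequency(HOUSES, area="Holloway", open_fire=True))  # 1/2
```

Nothing in the data singles out one of these descriptions as the correct one; the choice has to be supplied from outside the calculation.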

In this paper, I argue that the above Reference Class Problem appears to be

of troubling pertinence to our current paradigm of medical practice: evidence-based

medicine. I begin by outlining the problem in abstraction. As the reader will soon

discover, the problem appears to be both profound and pervasive. In the later

chapters, I map the Reference Class Problem directly onto evidence-based

medicine and explore the ways in which it threatens to corrode the very evidence

on which our medical practice is presently being based.


[1] For reasons that will become apparent shortly, it is important to emphasise that I am interested in my particular house, and not houses in general.

2. THE REFERENCE CLASS PROBLEM

In this chapter, I outline how philosophers of science have become familiar

with the Reference Class Problem, historically. The reader will, first, be taken back

to Cambridge in the middle of the nineteenth century, where John Venn was

developing a new philosophical theory of probability: frequentism. (Gillies, 2000:

88) In doing so, he stumbled upon a problem that was later named by Hans

Reichenbach, in 1949, as ‘the problem of the reference class’ (see Section 2.1.2). In

the latter half of this chapter (Sections 2.1.3 and 2.2), I outline the apparent

ubiquity of this problem in a contemporary context.

2.1 The evolution of the ‘Reference Class Problem’

2.1.1 Venn

The ‘Reference Class Problem’, as it has come to be known today, naturally,

has a history. In order to develop a thorough understanding of the notion presently,

it is apt to begin by taking the reader back to the work of John Venn, in the latter

half of the nineteenth century. The Logic of Chance was originally published in

1866, with second and third editions emerging in 1876 and 1888, respectively.[2]

Therein, Venn lays the foundations on which the ‘Reference Class Problem’ has

since been established.

Venn begins by observing that a practical problem is bound to

emerge “when we are attempting to make real inferences about things as yet

unknown”. (1876: 194) He proceeds to outline a thought experiment in which one

is interested in deriving the probability of John Smith living for a further eleven

years. He concedes that this may superficially appear to be a rather menial

endeavour – demanding little mental effort – that is undertaken merely

“by counting how many men of the age of John Smith, respectively do and do not live

for eleven years.” (194) However, whilst this may yield the probability that a

‘fifty year old man’ will live for eleven years more, it does not give one

the specific probability of John Smith doing so. For Venn, a problem begins to

emerge due to the following observation:

“It is obvious that every individual thing or

event has an indefinite number of properties or

attributes observable in it, and might therefore be

considered as belonging to an indefinite number of

different classes of things.”

(Venn, 1876: 194)

[2] In this section, I will be referring specifically to the second edition of Venn’s The Logic of Chance (1876); this edition is significantly more extensive than Venn’s first, which does not include the above material pertinent to this paper and the Reference Class Problem.

Indeed, it does not seem immediately apparent as to which ‘properties or attributes’

of John Smith’s are relevant to the probabilistic endeavour at hand.

For Venn, the problem grows teeth as, by assigning a thing or event to one

reference class, one is simultaneously assigning it to “all the higher classes, the

genera, of which that class was a species.” (195) For example, ‘a man of fifty years

old’ falls within the class of ‘mammals of fifty years old’ and, of course, the class

of ‘living things of fifty years old’ – to name just two. Venn argues, on these

grounds, that reference classes are thus assigned somewhat arbitrarily; indeed, John

Smith does not naturally present himself with one particular reference class to

which he definitively ought to be assigned. One must, therefore, assign him to a

reference class by means of some auxiliary judgement:

“In saying that it is thus arbitrary under which

class he is placed, we mean, of course, that there

are no logical grounds of decision; the selection

must be determined by some extraneous

considerations.”

(Venn, 1876: 195)

For Venn, the problem of the reference class was profound and largely shaped by

his own philosophical theory of probability, as I address below.

In Cambridge, United Kingdom, in the middle of the nineteenth century,

Venn and Ellis were attempting to establish a new interpretation of probability: the

frequentist theory. (Gillies, 2000: 88) Donald Gillies outlines the philosophical

foundations of their new probabilistic theory, below:

“In contrast to both [logical and subjective] views,

the frequency approach sees probability theory as a

mathematical science, such as mechanics, but

dealing with a different range of observable

phenomena.”

(Gillies, 2000: 88)

In an attempt to rid probabilistic inquiry of subjectivity, Venn strove to establish a

theory of probability that was entirely grounded in science: affording one objective

probabilistic truth. However, for Venn, difficulties selecting the most appropriate

reference class only proved to taint his theory of probability with an apparently

inescapable element of subjectivity. Indeed, Hájek argues that, to this day, the


Reference Class Problem is often regarded as “the most serious problem that

frequentism faces.” (Hájek, 2007: 565)

In his Logic of Chance, Venn proceeds to develop his problem of reference

with specific respect to probability in some detail. However, it was not until

Reichenbach’s The Theory of Probability (1949) that the notion began to decidedly

resemble the ‘Reference Class Problem’ with which one may be familiar, today.

2.1.2 Reichenbach

Reichenbach’s account of the problem is largely analogous to Venn’s, above;

however, it differs in two relevant respects. Firstly, Reichenbach gives the

problem its name; secondly, he puts forward a more refined approach to

constructing a suitable reference class than Venn’s “[determination] by some

extraneous considerations” (Venn, 1876: 195) that we have seen, above.

Reichenbach ultimately concedes, along with Venn, that there appears to be no

definitive solution to the problem.

The culmination of Reichenbach’s discussion of the Reference Class

Problem is illustrated in the following passage from his Theory of Probability

(1949):

“If we are asked to find the probability holding for

an individual future event, we must first

incorporate the case in a suitable reference class.

An individual thing or event may be incorporated

in many reference classes, from which different

probabilities will result. This ambiguity has been

called the problem of the reference class.”

(374)

Most interestingly, Reichenbach proceeds by explicitly introducing the notion that

new information about the thing or individual in question can be valuable in order

to assign it to a more appropriate reference class. He continues by outlining that, on

refinement of a reference class, a more accurate probability for a specific future

event outcome can be subsequently derived. (1949: 372–378)

Reichenbach illustrates this point by way of a thought experiment. He

proposes that, subsequent to investigation by chest X-ray, a patient can be moved

from ‘patients with tuberculosis’ to ‘patients with severe tuberculosis’, based on the

new information available to the physician. (1949: 394) In light of this new

information, a more accurate probability of the patient’s prognosis can thence be


derived.[3] He concludes that, vis-à-vis the Reference Class Problem, one ought to

proceed by “considering the narrowest class for which reliable statistics can be

compiled.” (1949: 374)
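
Reichenbach’s rule can be sketched as a simple search, under assumptions that are mine rather than his. In particular, I read “reliable statistics” as “at least min_size recorded cases”, and the patient records below are invented for illustration. The function name narrowest_reliable_classes is likewise hypothetical:

```python
from fractions import Fraction
from itertools import combinations


def make_records(severe, smoker, outcomes):
    """Build hypothetical patient records; every value here is invented."""
    return [{"severe": severe, "smoker": smoker, "recovered": r} for r in outcomes]


# Invented records of tuberculosis patients, for illustration only.
RECORDS = (
    make_records(True, True, [False, True])
    + make_records(True, False, [False, False, True])
    + make_records(False, True, [True, True, False])
    + make_records(False, False, [True, True, True, True])
)


def narrowest_reliable_classes(records, individual, min_size):
    """Return the narrowest classes containing `individual` that have at
    least `min_size` members, each paired with its recovery frequency.

    Assumption: a class counts as 'reliable' when it has min_size or more cases.
    """
    attributes = sorted(individual)
    # Start from the most specific description and drop attributes step by step.
    for n_attrs in range(len(attributes), -1, -1):
        found = []
        for attrs in combinations(attributes, n_attrs):
            members = [
                rec for rec in records
                if all(rec[a] == individual[a] for a in attrs)
            ]
            if members and len(members) >= min_size:
                freq = Fraction(sum(m["recovered"] for m in members), len(members))
                found.append((attrs, freq))
        if found:
            return found
    return []


patient = {"severe": True, "smoker": True}
for attrs, freq in narrowest_reliable_classes(RECORDS, patient, min_size=4):
    print(attrs, freq)
```

With these invented records and min_size=4, the fully specified class (severe and smoker) is too small, and two equally narrow classes qualify with different frequencies (2/5 for ‘severe’ and 3/5 for ‘smoker’). The rule narrows the choice of reference class without, in this case, settling it.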

Although Reichenbach proposes a way in which the Reference Class

Problem might be attenuated, it is important to note that he explicitly maintains

that the problem is ultimately inescapable. (1949: 374) Indeed, the ubiquitous and

pervasive nature of the Reference Class Problem is elaborated on by a number of

contemporary philosophers: most notably by Alan Hájek in 2007, who

emphatically argues that the Reference Class Problem is your problem too.

2.1.3 Hájek

The Reference Class Problem was once again brought to the attention of the

philosopher of science in 2007, by Alan Hájek. In his paper, Hájek outlines the

different interpretations of probability that presently exist and proceeds to explicate

how the Reference Class Problem affects each and every (useful) probabilistic

interpretation, in turn.

Hájek begins by outlining that there are (at least) five established

interpretations of probability to which one may adhere (each of which

comprises two further subclasses):

1. Frequentism: (i) actual and (ii) hypothetical.

2. Classical: (i) finite sample spaces, and (ii) infinite

sample spaces.

3. Logical: (i) fully constrained and (ii) less constrained.

4. Propensity: (i) frequency- or symmetry-based and (ii)

neither frequency- nor symmetry-based.

5. Subjectivism: (i) radical and (ii) constrained.

(Hájek, 2007: 566)

In his paper, Hájek argues that the Reference Class Problem has hitherto been

largely referred to with respect to frequentism alone. He thence proceeds to

methodically explicate that, in actuality, each and every possible probabilistic

interpretation is susceptible, in its own particular way, to the Reference Class

Problem.[4]

[3] This may seem an obvious conclusion; however, vis-à-vis our current paradigm of medical practice, it is one that appears to be somewhat overlooked. Venn and Reichenbach’s approaches to attenuating the Reference Class Problem will be returned to in Chapter 4, where I address ‘Constructing Reference Classes’ with respect to evidence-based medicine.

It is important to note that Hájek does in fact concede that there exist a few

cases in which certain interpretations of probability can (and do) evade the

Reference Class Problem: for example, ‘radical subjectivism’. However, such

evasion comes at a price; Hájek outlines that this probabilistic interpretation has

no parameters whatsoever and is, thus, void of any possible practical use:

“For example, you may with no insult to rationality

assign probability 0.999 to George Bush turning

into a prairie dog, provided that you assign 0.001

to this not being the case (and that your other

assignments also obey the probability calculus)…

Your probability assignments can be completely at

odds with the way the world is, and thus are

‘guides [to life]’ in name only.”

(2007: 577)

This observation, ultimately, leads Hájek to conclude that the Reference Class

Problem “is seemingly inescapable among theories that make substantive claims

about what probabilities are and how they should be determined–that might be

genuine guides to life.” (2007: 580)[5]

Hájek argues that the Reference Class Problem exhibits itself differently,

according to the interpretation of probability one chooses to adopt. Ultimately,

however, any useful probability is inherently relativised to some set of conditions

(2007: 583) and, thus, is inescapably susceptible to the Reference Class Problem in

some way or another. For Hájek, the Reference Class Problem is ubiquitous and,

consequently, your problem too.

2.2 The ubiquity of the problem

The Reference Class Problem appears to be troublingly pervasive. For this

reason, it has been of significant interest to both philosophers and statisticians

across a wide range of disciplines in recent years. For example, it has been

addressed with respect to risk and safety by Aven & Reniers in 2013 and in the

context of legal evidence by Cheng in 2009. The problem, however, appears to

have been given little attention in the medical profession.


[4] I do not go into each interpretation in detail in this paper as Hájek has already done so commendably (2007). I do, however, later return to those interpretations that are pertinent to evidence-based medicine specifically: namely frequentism and subjectivism, in Sections 3.3.2 and 3.3.3, respectively.

[5] For more on radical subjectivism, Hájek directs the reader to de Finetti (1937).

As we have seen in this chapter, Reichenbach touched on the Reference

Class Problem in a thought experiment that involved a patient with tuberculosis in

his Theory of Probability (1949). Indeed, more recently, Sterne addressed the

problem with respect to “ICU mortality predictions” in 2010. These accounts,

however, are largely deficient in one important respect. They address the ways in

which the Reference Class Problem affects an individual’s probability of

survival at a pre-interventional level. However, the Reference Class Problem

appears to be pertinent to the medical profession in a far more substantive respect

than has hitherto been acknowledged: at an interventional level.

This paper argues that the Reference Class Problem appears to be firmly

entrenched in our current evidence-based medicine paradigm, at the level of the

randomised control trial that currently underpins medical guidelines. Interference

with another individual’s health on the basis of irrelevant evidence is extremely

morally problematic and, for this reason, an investigation into the Reference Class

Problem vis-à-vis evidence-based medicine is long overdue.

We will now move on to address how the Reference Class Problem appears

to be of relevance to evidence-based medicine, specifically. Subsequently, in

Chapter 4, I propose avenues for further research that are both compatible with the

nature of the problem and incorporate suggestions from both Venn and

Reichenbach (1876 & 1949, respectively).


3. THE PROBLEM VIS-À-VIS EVIDENCE-BASED MEDICINE

The purpose of this chapter is to illustrate the pertinence of the Reference

Class Problem vis-à-vis evidence-based medicine; I begin by way of an analogy

(see Section 3.1). I thence unpack the framework of the evidence-based medicine

paradigm, at the level of the randomised control trial, in Section 3.2. I illustrate that

in order to draw conclusions from trial populations, some form of statistical

inference vis-à-vis individual future cases is necessitated. In this chapter, I

subsequently address the predominant interpretations of probability that presently

underpin randomised control trials (namely, frequentism and Bayesianism). I

finally explicate that the Reference Class Problem persists vis-à-vis evidence-based

medicine, regardless of which interpretation of probability one chooses to adopt.[6]

3.1 Preliminary analogy

Assume, if you will, that a new calcium-channel blocker has been recently

developed that is thought to potentially have an anti-hypertensive effect on human

physiology. In order to investigate the potential pharmacological merits of the new

drug, a double-blind randomised control trial was designed in which the drug was

administered to a trial population of, say, men with hypertension over the age of

55. Let us now assume that the trial was successfully run and the findings

concluded that the drug was effective in 90% of the men in the trial population.

The results were consequently published for the attention of medical practitioners.

Let us now assume that a doctor is met with a 58 year old male (‘patient Q’),

who is found to have hypertension. Adhering to the policy outlined by the

Evidence-Based Medicine Working Group (Guyatt et al., 1992), the doctor undertakes a

literature survey to establish the most appropriate treatment for his particular

patient:

“[The resident] proceeds to the library and…

conducts a computerized literature search. She

enters the Medical Subject Headings terms

epilepsy, prognosis, and recurrence, and the

program retrieves 25 relevant articles. Surveying

the titles, one appears directly relevant. She

reviews the paper, finds that it meets criteria she

has previously learned for a valid investigation of

prognosis, and determines that the results are

applicable to her patient.”

(Guyatt et al., 1992: 2420)

[6] For a rigorous account of the Philosophical Theories of Probability, see Gillies (2000).

Let us assume that the doctor followed the above dogma judiciously and

consequently stumbled upon the new antihypertensive drug outlined at the

beginning of this section. On rigorous and prudent analysis, the trial is judged by

the doctor to have been methodologically sound and statistically significant. The

patient in question, a 58 year old man with hypertension, falls neatly into the

reference class of the trial population (‘men with hypertension over the age of 55’),

thus, the doctor prescribes him the new drug. However, despite the doctor’s dutiful

adherence to the recommendations outlined in the name of evidence-based

medicine, let us assume that the drug did not have the desired effect on the patient.

I now outline how this result can be plausibly explained in terms of the Reference

Class Problem.

If we return to Hájek’s work on the Reference Class Problem, we can

observe that he summarises the problem in the following, succinct, fashion:

“Relativized to condition A, X has one probability;

relativized to condition B, it has another; and so

on. Yet none of the conditions stands out as being

the right one.”

(2007: 565)

The chosen reference class in our above randomised control trial is ‘men with

hypertension over the age of 55’. With respect to Hájek’s above model, “relativized

to condition A” (‘men with hypertension over the age of 55’), X (the desired effect

of reduced blood pressure) has probability 0.9. The Reference Class Problem,

however, arises here due to the fact that the particular patient of concern, above, is

more than just ‘a man with hypertension over the age of 55’. Consider, if you will,

that he is a man of African ethnic origin. There are, thus, no grounds on which to be

certain that “relativized to condition B” (‘men of African ethnic origin with

hypertension, over the age of 55’), the patient will still have a 0.9 probability

of experiencing the expected pharmacological effect of the new drug. Indeed,

recent research has led academics and policy makers to acknowledge that ethnicity

is a relevant factor on which pharmacological efficacy can be dependent. (Clarke,

et al., 2013: 13–14 with reference to NICE, 2011)
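
A small worked sketch may help to show why a pooled figure of 0.9 tells us so little about a narrower condition B. The counts below are invented for illustration and make no claim about any real drug or population. Within condition A, the trial population is split into those who also satisfy a further condition B and those who do not:

```python
from fractions import Fraction

# Hypothetical trial counts, invented for illustration only. "B" stands for
# any further condition (condition B in Hájek's phrasing) within condition A.
TRIAL = {
    "B":     {"participants": 10, "responders": 4},
    "not_B": {"participants": 90, "responders": 86},
}


def response_rate(groups):
    """Observed proportion of responders across the given groups."""
    groups = list(groups)
    participants = sum(g["participants"] for g in groups)
    responders = sum(g["responders"] for g in groups)
    return Fraction(responders, participants)


pooled = response_rate(TRIAL.values())          # relativised to A
rate_b = response_rate([TRIAL["B"]])            # relativised to A and B
rate_not_b = response_rate([TRIAL["not_B"]])    # relativised to A and not-B

# The pooled rate is a weighted average of the subgroup rates.
weight_b = Fraction(TRIAL["B"]["participants"], 100)
assert pooled == weight_b * rate_b + (1 - weight_b) * rate_not_b

print(pooled, rate_b, rate_not_b)  # 9/10 2/5 43/45
```

Under these assumed counts the published result is exactly 0.9, yet a patient who satisfies B belongs to a class in which the observed rate is 0.4. The pooled figure alone does not reveal which case one is in.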

To emphasise the clinical importance of the problem, consider a 35 year old

man with hypertension, of African ethnic origin – ‘patient R’. With a desire to

lower his blood pressure, the patient’s doctor consults the National Institute for

Health and Clinical Excellence (NICE) guidelines on the appropriate administration of

antihypertensives:

“Offer step 1 antihypertensive treatment with a

calcium-channel blocker (CCB) to people aged

over 55 years…”

(NICE, 2011)

On the above interpretation, it would appear that ‘patient R’ ought not receive the

above treatment. However, the guidelines continue, as follows:

“… and to black people of African or Caribbean

family origin of any age.”

(NICE, 2011)

In other words, by merely altering the reference class to which ‘patient R’ is

assigned, his predicted physiological response to CCBs appears to have changed

(see the table below).

A priori, neither of the above reference classes appears to be the right one: ‘Patient

R’ falls into both equally and entirely. It is only with some auxiliary judgement that

the correct reference class can be identified.[7]
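
The case of ‘patient R’ can be restated as a toy encoding of the two clauses quoted above. This is only a sketch of the quoted wording, with function names of my own; it is not the full guideline and is in no sense clinical guidance:

```python
def offered_on_age(age):
    """First quoted clause, read alone: people aged over 55 years."""
    return age > 55


def offered_on_family_origin(black_african_or_caribbean_origin):
    """Second quoted clause, read alone: applies at any age."""
    return black_african_or_caribbean_origin


def offered_step1_ccb(age, black_african_or_caribbean_origin):
    """Both quoted clauses together: either one is sufficient."""
    return offered_on_age(age) or offered_on_family_origin(
        black_african_or_caribbean_origin
    )


# 'Patient R': 35 years old, of African family origin.
print(offered_on_age(35))             # False: classified by age alone
print(offered_on_family_origin(True)) # True: classified by family origin alone
print(offered_step1_ccb(35, True))    # True: the guideline read as a whole
```

Classified by a single attribute, the same patient receives opposite answers. The conflict is resolved only because the guideline’s authors have already judged, on grounds external to the classification itself, that family origin is relevant.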

As we have seen, above, the Reference Class Problem appears to have the

potential to significantly undermine the evidence on which our current medical

practice is presently being based. Before addressing how one is to potentially solve

– or at very least, attenuate – the problem, it is necessary to explore the framework

of evidence-based medicine in greater detail. In the following section, I

elaborate on randomised control trials: the stage at which reference classes first

begin to take shape. This leads on to a subsequent discussion of how results

observed in a randomised control trial are assumed to apply to patients beyond merely

those within the trial population.

Reference class of ‘Patient R’        NICE guideline
‘people under 55 years old’           prescription of CCB not justified
‘people of African family origin’     prescription of CCB justified


[7] This is analogous to Venn’s “extraneous considerations” (Venn, 1876: 195), addressed in Section 2.1.1. This is a point to which I refer again, in Section 4.3.2, where I suggest a potential solution to the Reference Class Problem vis-à-vis evidence-based medicine.

3.2 Clinical trials

An exhaustive clinical trial is typically a four-stage process, the third stage of

which is often referred to as a randomised control trial. The inherent aim of a

randomised control trial is to identify, by experimentation, “the most appropriate

treatment for future cases.” (Teira, 2011: 255 citing Pocock, 1983) This section

gives an overview of the four stages of the trial process by which a new drug is

made clinically available.[8]

Stage I clinical trials are focussed largely on identifying an appropriate

dosage of a new drug.[9] The treatment of concern is administered to some small

number of participants (fewer than 30), who are thence closely monitored for any

changes in their physiology (either positive or adverse).[10] Once a preliminary dose

has been determined (by means of weighing up the therapeutic benefit of the drug

with its side-effects and toxicity), the drug will likely proceed to Stage II of the

trial process. (Hackshaw, 2009: 9 & Teira, 2011: 255)

Compared to Stage I clinical trials, Stage II involves a slightly greater

number of participants[11] and the main focus of investigation is the establishment of

a “preliminary estimate of efficacy.” (Hackshaw, 2009: 9) This stage is not strictly

designed in order to identify whether the drug is effective or not; the primary aim is

to generate preparatory data that may prove to be useful in designing an

appropriate Stage III trial. (Hackshaw, 2009: 9)

[8] As Teira notes, trials can be designed to assess the outcomes of practically any form of intervention. In a medical context, this can typically include “medical devices, surgery, alternative medicine therapies, etc.” (2011: 255) In this paper, for the purposes of clearly explicating the Reference Class Problem vis-à-vis evidence-based medicine, I will outline the problem with respect to drug trials alone. As we have seen, above, the Reference Class Problem is ubiquitous and the reader, as he/she wishes, is encouraged to explore the pertinence of the problem beyond simply randomised control drug trials.

[9] ‘Stage I clinical trials’ are alternatively referred to as ‘translational clinical trials’ as they form a bridge between the developmental process in the laboratory and the human experience in the clinic. They are “among the most common types of clinical trials performed.” (Piantadosi, 2005: Ch9)

[10] Observation is often extensive and scrupulous. It often, indeed, extends beyond merely physiological effects: given the circumstances, pharmacological and/or psychological observation may be apt.

[11] It is important to note that there is a degree of disagreement in the literature over an exact figure; Hackshaw outlines that a Stage II trial often concerns “around 50 [subjects]” (2009: 10), whereas Teira reports that the figure is somewhere “between 100 and 200.” (2011: 255) For the purposes of this paper, however, one need not concern oneself with the particulars; the number of participants in Stage II trials is almost entirely circumstantial and, indeed, of little relevance – if any – to our interest in the Reference Class Problem.

Stage III clinical trials – with which we are predominantly concerned in this

paper – invariably concern an extremely large number of participants (“usually

several hundred or thousand people” [Hackshaw, 2009: 9]). Participants are

randomly allocated into either the interventional group (that receives the new

treatment) or the control group (that receives the current standard of treatment, for

comparative purposes). Thus, it is this stage in the clinical trial process that is often

referred to as the ‘randomised control trial’ (or RCT). Hackshaw outlines that this

stage of the trial process alone often takes years, or longer, to complete and must be

large in order to achieve the statistical power necessary to convince policy makers

to amend current guidelines. (2009: 9–11)[12]
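
The random allocation step can be sketched as follows. This is a minimal illustration of simple randomisation only; the participant identifiers, the seed and the function name randomise are my own assumptions, and real trials frequently use more elaborate schemes (for example, block or stratified randomisation):

```python
import random


def randomise(participants, seed=None):
    """Simple randomisation: shuffle, then split into two arms of near-equal size."""
    rng = random.Random(seed)  # the seed only makes this sketch repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    intervention = shuffled[:half]  # receives the new treatment
    control = shuffled[half:]       # receives the current standard of treatment
    return intervention, control


# Hypothetical participant identifiers, invented for illustration.
participant_ids = [f"P{i:03d}" for i in range(1, 201)]
intervention, control = randomise(participant_ids, seed=2014)
print(len(intervention), len(control))  # 100 100
```

Note that the allocation takes no account of any characteristic of the participants: every attribute that is not an entry criterion is left to chance, which is precisely where questions about the reference class of a future patient begin.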

Once a “definitive answer on whether a new treatment is better than the

control group” has been reached, the new drug enters Stage IV of the trial process. For completeness,

though not of our primary concern in this paper, Stage IV clinical trials involve

incorporating the new treatment into clinical practice. The effects of the new drug

are closely monitored in a clinical context and any important information is fed

back to the pharmaceutical company. (Hackshaw, 2009: 11 & Teira, 2011: 255)

It is the third stage of clinical trials (otherwise known as randomised control

trials) with which we are concerned in this paper, for the following reason:

“A… phase II study is not usually designed for a

direct statistical comparison of the trial endpoint

between two interventions… However, a phase III

trial is designed for a direct comparison, allowing a

full evaluation of the new intervention and,

usually, a definitive conclusion.”

(Hackshaw, 2009: 10–11)

Deep within the philosophy of the randomised control trial (clinical trial Stage III)

is this comparative assumption; it is assumed that the results observed in a trial

population (by comparison of treatment and control groups) will repeat themselves

in future cases. This assumption is often referred to as ‘external validity’ or

‘generalisation’ and is the point at which the Reference Class Problem takes hold.

Given that randomised control trials (and consequently evidence-based medicine,

more broadly) depend entirely on this notion of ‘external validity’, it is apt to

explore the assumption in greater detail, at this stage of our discussion.


[12] This reference to ‘statistical power’ is largely pertinent to randomised control trials run with respect to the frequentist interpretation of probability. This is a point to which I return at length in Section 3.3.2 and again in Section 4.2.2.

In the following section, the reader will observe that ‘external validity’ is

more complicated than many scientists and statisticians frequently acknowledge.[13]

Firstly, we shall see that ‘external validity’ is a matter of probabilistic inference,

and is thus subject to the Reference Class Problem. However, depending on which

interpretation of probability one chooses to adopt, the Reference Class Problem

appears to present itself in a slightly different form. Consequently, there follows a

discussion of the main interpretations of probability that currently underpin

randomised control trials (namely: frequentism and Bayesianism). For each
interpretation in turn, I then show how the Reference Class Problem arises.

3.3 External validity/generalisation

3.3.1 A matter of probabilistic inference

In order to draw any conclusion from a set of data, “analytic tools become

necessary.” (Piantadosi, 2005: 108) Therein, however, emerges a problem:

“Statistics is not a perfectly unified field,

particularly with regard to the best method for

making inferences from data.”

(Piantadosi, 2005: 125)

There are, indeed, a number of statistical models with which one is able to make an

apparently objective inference from a data-set. Furthermore, there appears to be no

consensus over which mode of statistical analysis is universally the right one.

With respect to randomised control trials specifically, statistical inferences

made from gathered data are largely – if not entirely – probabilistic:

“Once the end point for the evaluation of the
treatment is reached, the interpretation of the
collected data determines whether or not we should
accept our hypothesis about the effectiveness of the
drug, assigning a certain probability to this
judgement.”
(Teira, 2011: 255)

13 For example, Victora et al. (2004) declare that one can confidently rely upon external validity due to “the assumption of ‘universal biological response’, i.e. different individuals will respond to a treatment or drug in the same way.” (Clarke, et al. 2013: Sect 2.4.2) Piantadosi takes a somewhat similar attitude to Victora et al. (2004); however, he is more constructive in his dismissal of critics of external validity, claiming that “[critics] ignore the principle justification for external validity, that being biological knowledge regarding mechanism.” On this point, I largely agree with Piantadosi, and it is a point to which I briefly return in Section 4.3.2. There I agree that ‘biological knowledge regarding mechanism’ is, indeed, necessary in order to make confident claims about external validity (and thus to avoid the Reference Class Problem), and that this might be an avenue for further research.

As has been addressed earlier in this paper, there exist a number of interpretations
of probability.14 It is, thus, apt to explicate how the Reference Class Problem arises

in the context of the predominant probabilistic interpretations that presently

underpin randomised control trials (and, consequently, evidence-based medicine,

more broadly).

The majority of randomised control trials are underpinned by frequentism.

However, as Teira argues, there is growing support for a Bayesian approach to

drawing inferences from randomised control trial data. (2011) I will, therefore, now

address how the Reference Class Problem arises in each of these cases,

respectively.

It is important to note that the interest of this paper is not to advocate one

interpretation of probability over another.15 The purpose of this section is merely to

explicate that there exist a number of probabilistic interpretations that presently

underpin randomised control trials. Furthermore, it appears that the Reference

Class Problem persists regardless of which probabilistic interpretation one chooses

to adopt.

3.3.2 Frequentist interpretation

The vast majority of randomised control trials are presently underpinned by

the frequentist interpretation of probability. Ever since the first randomised control

trial was run in 1948, frequentism has been embraced “as a testing standard by the

international medical community and by pharmaceutical regulatory agencies all

over the world.” (Teira, 2011: 256) As we are about to see, however, it is not long

before frequentism is corrupted by the Reference Class Problem.

As we have seen, frequentism “was first developed in the middle of the

nineteenth century by the Cambridge school of Ellis and Venn”. (Gillies, 2000: 88)

The frequentist theory of probability has since divided into two notions: it
can refer to (i) actual frequentism or (ii) hypothetical frequentism.

Actual frequentism is relevant to discuss largely for historical reasons, as it is

rarely esteemed by contemporary statisticians. (Hájek, 2007: 566) Proponents of
actual frequentism, such as Venn:

“[identified] the probability of an attribute or event
A in a reference class B with the relative frequency
of actual occurrences of A within B.”
(Hájek, 2007: 566)

14 For an extensive and thorough account of different philosophical theories of probability, see Gillies (2000).

15 For an account of the advantages and limitations of different probabilistic interpretations vis-à-vis randomised control trials, see Frequentist versus Bayesian Clinical Trials (Teira, 2011).

An actual frequentist’s approach to probability, outlined in the quote above, is

analogous to Venn’s ‘John Smith’ model, which we saw in Section 2.1.1. Therein, the
probability that John Smith would live for a further 11 years was simply

determined by collecting data on similar people to Mr. Smith. (Venn, 1876: 194 &

Hájek, 2007: 566)

Hypothetical frequentism has since been born out of actual frequentism:

“Hypothetical frequentists such as Reichenbach

(1949) and von Mises (1957) are inspired by the

dictum that probability is a long-run relative

frequency.”

(Hájek, 2007: 567)

The hypothetical frequentist interpretation of probability is characterised by seeking to
gather the greatest number of actual observations possible. Hypothetically, the true
probability of an attribute or event can thus be represented by the limit, p, of its
relative frequency as n → ∞ in a sequence of observations.16 It is this hypothetical
interpretation of frequentist probability (with a tendency toward an infinite number of
observations) by which the vast majority of randomised control trials are presently
underpinned. (Teira, 2011)

As we have seen, in Section 3.2, a randomised control trial (Stage III clinical

trial) is run with a large number of subjects. The reason for this is, in essence, to

cater to the demands of hypothetical frequentism; in order to achieve the statistical

power necessary to draw a reliable probability from a data set of actual

observations, the number of observations must be as great as possible. In other

words, the observed, cumulative probability of the effect of the treatment on a trial

population will be closer to that of the true probability as the trial population

increases in number. Indeed, the true probability of a treatment outcome is

represented by limit, p, as n → ∞. (Reichenbach, 1949: 69) This hypothetical

frequentist probability is then, ultimately, applied to future cases, as a guide to

clinical decision-making. There arises a problem, however, as no randomised
control trial is run with an infinite trial population.
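The convergence appealed to above can be illustrated with a short simulation. What follows is only a minimal sketch and not part of the original argument: the 'true' response probability of 0.3, the random seed and the function name are assumptions chosen purely for illustration.

```python
import random


def running_relative_frequency(p_true, n_max, seed=0):
    """Simulate n_max patients and return the relative frequency of
    responders after each successive patient."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    responders = 0
    frequencies = []
    for n in range(1, n_max + 1):
        if rng.random() < p_true:  # this simulated patient responds
            responders += 1
        frequencies.append(responders / n)
    return frequencies


# Hypothetical 'true' probability of a treatment response (an assumption).
freqs = running_relative_frequency(p_true=0.3, n_max=100_000)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

The printed frequencies tend to settle near 0.3 as n grows, yet at every finite n they remain an estimate, which is the pragmatic gap that significance testing is asked to bridge.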

16 For more on hypothetical frequentism, see Reichenbach (1949: 67–69) and Gillies (2000: 96–105).

In order to overcome this pragmatic problem with the hypothetical
frequentist interpretation of probability, biostatisticians employ ‘significance tests’
to transform observed data into generalised, hypothetical frequentist probabilities
(wherein n → ∞). Arguably the most prominent tool of significance testing is the
p-value, the hypothetical nature of which is outlined below:17

“Once the experiment is run and actual data

provide the observed value of the statistic, we can

also calculate how likely it is, assuming the truth of

the [null] hypothesis, to obtain a result with less or

equal probability than the observed one: this is the

p-value. In other words, the p-value is the

proportion of an infinite series of repetitions of an

experiment, all conducted assuming the truth of the

null hypothesis, that would yield data contradicting

it as strongly or more so than the observed result.

Therefore, the p-value is a probability of observed

and unobserved results which is tied to the design

of the experiment and cannot be properly

interpreted without it.” 18

(Teira, 2011: 260)

It is important to note, however, that “the truth of the hypothesis can never be

established with significance testing: it is just assumed.” (Teira, 2011: 261)
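Teira's first formulation of the p-value (the probability, under the null hypothesis, of a result no more probable than the one observed) can be made concrete with a small calculation. The sketch below assumes a hypothetical single-arm trial in which 15 of 20 patients improve, and a null hypothesis that each patient improves with probability 0.5; the figures and function names are illustrative assumptions, not drawn from any trial discussed in this paper.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_p_value(k_observed, n, p_null):
    """Sum the null probabilities of every outcome that is no more
    probable than the observed one."""
    observed = binomial_pmf(k_observed, n, p_null)
    pmfs = [binomial_pmf(k, n, p_null) for k in range(n + 1)]
    # A small relative tolerance guards against floating-point ties.
    return sum(q for q in pmfs if q <= observed * (1 + 1e-9))


# Hypothetical data: 15 of 20 patients improve; null hypothesis p = 0.5.
p_value = exact_p_value(15, 20, 0.5)
print(round(p_value, 4))  # 0.0414
```

Note that the calculation never leaves the null hypothesis: every probability summed is computed on the assumption that the null is true, which is why the truth of the hypothesis is assumed rather than established.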

Let us assume, for a moment, that a hypothetical frequentist probability was

actually determined: by means of an infinitely large trial population, or by means

of some infallible significance test. Even with this hypothetical frequentist

probability of the effects of a treatment in a trial population – which may initially

appear to be, quite unequivocally, the true probability – the Reference Class

Problem arises at the point at which the probability of a future event-token is

sought. We have reached the crux of the Reference Class Problem vis-à-vis the

frequentist randomised control trial.


17 It may be necessary to clarify that a ‘null hypothesis’ is simply “the reverse of what the study is designed to show [and] is such that the researcher usually wants to reject it”. (see Daly & Bourke, 2000: 67–69) For example, in a drug trial, the null hypothesis may be that the drug has no therapeutic effect.

18 For a more detailed account of hypothetical frequentist ‘significance testing’ with respect to randomised control trials, specifically, see Teira (2011: 260–262). For more on null hypotheses, p-values and other statistical constructs in clinical trials (and evidence-based medicine, more generally), see Piantadosi (2005) and Daly & Bourke (2000). For the sake of simplicity (and momentum), this section focuses on the p-value – a cornerstone of statistical inference in frequentist randomised control trials. However, the reader will hopefully soon be able to see the pertinence of the Reference Class Problem beyond merely that of the p-value.

To illustrate the Reference Class Problem with respect to hypothetical

frequentism, allow me to borrow a thought experiment from Hájek. (2007: 567)

Therein, he poses that “we are interested in the probability that a given coin lands

heads on a given toss.” (567) Supposing that the outcome (‘heads’ or ‘tails’) of one

particular toss were sought:

“we may suppose that the probability from ‘all tosses of

our coin’ to ‘heads’ is well-defined (non-trivial though

the supposition is).”

(Hájek, 2007: 567)

However, the Reference Class Problem arises as it is possible (and, perhaps,

appropriate) to break down the reference class ‘all tosses of our coin’. Indeed,

Hájek points out that ‘coin tosses’ can be specified differently: “for example, as a

toss of our coin with such-and-such angular momentum, or within a certain time-

period”. (567)

With respect to the probability of a particular coin-toss giving ‘heads’, the

coin-toss can be incorporated qua member of ‘coin-tosses’ but also qua
member of ‘coin-tosses of a given angular momentum’, and so on. Actual
frequentist data on the proportion of ‘coin-tosses’ that give ‘heads’ will almost
certainly differ from data on the proportion of ‘coin-tosses of a
given angular momentum’ that give ‘heads’. Even if the coins were tossed an
infinite number of times (thus, indulging the dictum of hypothetical frequentism),
the two reference classes would yield different frequentist data. Consequently, a

particular coin-toss qua member of ‘coin-tosses’ will give one probability of

‘heads’; whereas a particular coin-toss qua member of ‘coin-tosses of a given

angular momentum’, will give another probability of ‘heads’, and so on. (Hájek,

2007: 567) As such, the frequentist interpretation of probability (both actual and

hypothetical) has hit the Reference Class Problem.

The relevance of the problem to randomised control trials should now be
becoming clear. To make it explicit, consider the following. A frequentist
randomised control trial with a trial population of ‘men over 55 years
old’ would almost certainly gather different data from one with a trial population
of ‘people of African family origin’. Suppose a doctor were presented with a patient:
a 60 year old man of African family origin. This particular patient qua member of
one reference class (e.g. ‘men over 55 years old’) will appear to have one
probability of experiencing a given treatment effect, whereas the same patient qua
member of another reference class (e.g. ‘people of African family origin’) will have
another. Even if we were to assume that a hypothetical frequentist probability was
actually determined,19 it would not be clear into which reference class this
particular patient ought to be incorporated and, thus, which probability is the right one.
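The doctor's predicament can be sketched numerically. The counts below are entirely hypothetical (they are not drawn from any trial), as are the field and function names; they serve only to show how one and the same patient acquires different relative frequencies depending on the reference class to which he is assigned.

```python
def build_records(cells):
    """Expand {(man_over_55, african_origin): (n, responders)} into
    one record per hypothetical trial participant."""
    records = []
    for (man_over_55, african_origin), (n, responders) in cells.items():
        for i in range(n):
            records.append({
                "man_over_55": man_over_55,
                "african_origin": african_origin,
                "responded": i < responders,
            })
    return records


def response_rate(records, **reference_class):
    """Relative frequency of response among the members of a reference
    class, given as attribute=value pairs."""
    members = [r for r in records
               if all(r[key] == value for key, value in reference_class.items())]
    return sum(r["responded"] for r in members) / len(members)


# Purely hypothetical counts: (participants, responders) per subgroup.
records = build_records({
    (True, True): (20, 8),
    (True, False): (80, 64),
    (False, True): (80, 24),
    (False, False): (20, 10),
})

# The same 60 year old man of African family origin, classified three ways.
print(response_rate(records, man_over_55=True))                       # 0.72
print(response_rate(records, african_origin=True))                    # 0.32
print(response_rate(records, man_over_55=True, african_origin=True))  # 0.4
```

Nothing in the data themselves indicates which of the three figures ought to guide the treatment of this particular patient.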

In summary, even if the true hypothetical frequentist probability of a

treatment effect on a trial population were determined (as randomised control trials

presently aspire to attain), an individual still has to be assigned some reference

class. This inevitably gives rise to the Reference Class Problem. In other words,

even in the case of apparently absolute statistical power, individual patients must

still be assigned to a reference class. Statistical correlation alone (even in its most

theoretically pure form), therefore, appears to be insufficient grounds on which to

assume external validity of randomised control trials. We ought to, consequently,

turn our attention towards constructing appropriate reference classes if we are to

confidently apply the results of randomised control trials to individual future cases.

Before the matter of ‘Constructing Reference Classes’ can be addressed,20 it is

appropriate to briefly address some of the alternative interpretations of probability

that can potentially underpin clinical trials. As will become clear, the Reference

Class Problem cannot be avoided by merely adopting an alternative interpretation

of probability.

3.3.3 Bayesian approach

In recent years, it has been argued that evidence-based medicine ought to

bring frequentism down from its pedestal. In 2011, David Teira published a paper

in which he convincingly highlighted the ethical, practical, regulatory and

epistemological problems with the frequentist interpretation of probability that

currently underpins the vast majority of randomised control trials and,

consequently, medical guidelines more broadly. Furthermore, he proposes that

there is another, perhaps more apt, alternative to frequentism in clinical trials:

Bayesianism.

In this section, it is not necessary to join the debate over the advantages and
limitations of Bayesian versus frequentist clinical trials. Neither is it apt to spend
too much time on the structural or pragmatic demands of a Bayesian trial.21 For the
purposes of this paper, it is simply necessary to highlight that a Bayesian approach
to clinical trials does not evade the Reference Class Problem. As it is neither
necessary nor possible to fully elaborate the Bayesian statistical framework in this
paper, this section merely highlights the pertinent facet of Bayesianism that gives
rise to the Reference Class Problem.

19 For example, by running trials with infinite populations, or by employing some infallible statistical significance test, as we have seen above.

20 See Chapter 4.

21 For more on Bayesianism, see Gillies (2000: 82–85); for more on the Bayesian/frequentist debate, specifically, see Teira (2011); for a historical review of Bayesian statistical approaches in medicine, see Ashby (2006); and for a detailed account of Bayesian approaches to clinical trials and healthcare, more broadly, see Spiegelhalter et al. (1994 & 2004).

Under a Bayesian approach, probabilities are understood as degrees of belief,
beginning with a prior:

“Before the experiment begins, the Bayesian
summarizes knowledge about the unknown
statistical parameter (e.g., treatment effect) in the
form of a probability distribution, called the ‘prior
distribution.’”
(Piantadosi, 2005: 116)

As data is subsequently gathered on the effect of a treatment (from a trial or

elsewhere), a likelihood about the truth of the initial belief about the parameter

(treatment outcome) is ascertained:

“Evidence from further data is summarized by a

likelihood function for the parameter, and the

normalized product of the prior and the likelihood

form the posterior distribution on the basis of

which conclusions should be drawn”

(Spiegelhalter et al., 1994: 360)

To summarise, a Bayesian approach is not limited to solely one particular trial

design, nor any particular canon of observation, such as actual frequentism. A

Bayesian approach is free to incorporate any relevant data that is available. (Teira,

2011: 256) It merely ascertains a likelihood, relativised to the truth of some initial

belief about the statistical parameter (e.g. treatment effect). Such an initial belief

can be based on “objective evidence or subjective judgment or a combination”.

(Piantadosi, 1994: 360) It is out of this ‘initial belief about the parameter’ that the

Reference Class Problem arises.

Let us assume that a Bayesian trial were to be run and the initial prior belief
concerning some parameter was well-supported: for example, ‘CCBs reduce blood
pressure’.22 However, the blood pressure of whom? In order to proceed with the
trial, further information is gathered about the effects of CCBs. However, effects of
CCBs on whom? In order to ascertain a likelihood about the truth of the parameter,
the data gathered must be relativised to some reference class. (Hájek, 2007: 567)
Data gathered on the effects of CCBs on one reference class (for example,
‘people’) may very well differ from data gathered on another (e.g. ‘people of
African ethnic origin’). In such circumstances, the likelihood ascertained about the
truth of the parameter will differ. The Reference Class Problem arises, again, with
respect to the external validity of the trial's results.23

22 If an initial parameter were well-supported by an esteemed body of evidence, it may not seem clear why a trial ought to be run in the first place. However, although the evidence supporting an initial parameter may be congruent, it may be lacking in other respects, such as: quality, statistical power or objectivity. After all, Bayesian trials can be run on the basis of entirely subjective beliefs.

As we have seen, a particular patient “has an indefinite number of properties

or attributes observable in it” (Venn, 1876: 194) and can therefore be “incorporated

in many reference classes, from which different probabilities will

result.” (Reichenbach, 1949: 374) In isolation, however, none of the reference

classes appear to be, entirely, the right one. The Bayesian approach to clinical trials

has, consequently, fallen upon the Reference Class Problem in much the same way

as frequentism has done.
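The dependence of the Bayesian posterior on the choice of reference class can be sketched as follows. The example assumes, purely for convenience, a Beta prior over the response probability with a binomial likelihood (a common textbook simplification, not a model attributed to any author cited here); the prior parameters, the counts and the function names are all hypothetical.

```python
def update_beta(prior_alpha, prior_beta, responders, non_responders):
    """Conjugate update: a Beta prior combined with binomial data
    gives a Beta posterior with the counts added on."""
    return prior_alpha + responders, prior_beta + non_responders


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# One and the same prior belief about the treatment effect (an assumption).
PRIOR = (2.0, 2.0)

# Hypothetical (responders, non-responders) gathered on two reference classes.
trial_data = {
    "people": (70, 30),
    "people of African ethnic origin": (12, 28),
}

posterior_means = {}
for reference_class, (responders, non_responders) in trial_data.items():
    alpha, beta = update_beta(*PRIOR, responders, non_responders)
    posterior_means[reference_class] = beta_mean(alpha, beta)
    print(reference_class, round(posterior_means[reference_class], 3))
```

Starting from an identical prior, the two posteriors diverge solely because the data were relativised to different reference classes; the Bayesian machinery itself offers no guidance as to which posterior applies to a given patient.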

3.4 Summary

Clinical trials presently underpin evidence-based medicine: as we have seen

in this chapter, they provide observational data from which conclusions are drawn

about the effects of given interventions. Such conclusions thence shape clinical

guidelines and are, ultimately, used to guide the most appropriate treatment of

future individual patients. However, in order for the observed results of clinical

trials to apply to future cases, it is necessary to adopt some interpretation of

probability. (Piantadosi, 2005: 108)

This section has outlined, in detail, how the Reference Class Problem applies

to the standard frequentist randomised control trial. Furthermore, it has been

illustrated that favouring a Bayesian approach to the analysis of trial data does not

manage to evade the problem in any useful way. Indeed, as Hájek argues, the

Reference Class Problem pertains to every useful interpretation of probability due

to the fact that probabilities, by their very nature, must be relativised to some

reference class. (2007: 567) In the context of evidence-based medicine,

incorporating a particular patient in different reference classes will yield different

probabilities of his response to a treatment. However, it may not be clear which is the
most appropriate reference class for him and, thus, which probability is the true one.


23 There is, of course, the circumstance in which there is no evidence to support the initial belief concerning the parameter. In this particular case, the Reference Class Problem does not arise. However, in this circumstance, as Hájek puts it, “your degrees of belief can be whatever you like… [thus], Your probability assignments can be completely at odds with the way the world is”. (2007: 576–577) A Bayesian probability drawn from an unsupported initial belief concerning the parameter may well evade the Reference Class Problem, but it is also likely to be entirely vacuous. For more on this point, see Hájek (2007: 576–577).

In order to address the Reference Class Problem, we appear to have to divert

our attention to the ascertainment of the most appropriate reference class. In the

following chapter, I outline the problem in terms of drawing conclusions from

heterogeneous trial populations and, ultimately, argue that reference classes ought

to be constructed in accordance with statistically-relevant homogeneity.


4. CONSTRUCTING REFERENCE CLASSES

4.1 Heterogeneous reference classes

4.1.1 Venn

The Reference Class Problem essentially arises as probabilities about future

individual cases are drawn from heterogeneous trial populations. Such heterogeneity

can give rise to an element of uncertainty over the most appropriate reference class

to which a particular individual case ought to be assigned. Consequently, statistical

correlation drawn from a heterogeneous trial population, alone, appears to be

insufficient grounds on which to ensure the external validity of trial results, with

respect to particular future patients. 24

In his Logic of Chance, Venn outlines that the Reference Class Problem

arises as a product of drawing probabilities about individual cases from

heterogeneous populations:

“Now when it is said of any such heterogeneous
body that, say, nine-tenths die, what is meant (or

rather implied) is that the class might be broken up

into smaller subdivisions of a more homogeneous

character, in some of which, of course, more than

nine-tenths die, whilst in others less, the

differences depending upon their character,

constitution, profession, &c. ; the number of such

divisions and the amount of their divergence from

one another being perhaps very considerable.”

(1876: 208)

Consequently, by refining the reference class of an individual case in some way, his

particular probability of dying may be altered considerably. The Reference Class

Problem duly arises as it may not be clear which reference class is the right one.

24 See Chapter 3, particularly Section 3.3.2.

Intuitively, it follows that if there were some way of ensuring complete
homogeneity in a trial population, the Reference Class Problem could be evaded
entirely; there would, consequently, be no need to reference particular cases at all:
an individual’s reference class would be everything he is.25 The next section,
consequently, breaks down this matter of ‘homogenisation’, as it is more complex
than it may initially appear to be. As we will see, the homogenisation of trial
populations – at least in a crude sense – only proves to replace the Reference Class
Problem with another. I argue, however, that homogeneity in only the relevant
respects may prove to be a viable, pragmatic solution to the Reference Class
Problem vis-à-vis evidence-based medicine.

4.2 Homogeneous reference classes

Somewhat ironically, ‘homogeneous reference classes’ is a multifarious

notion that requires unpacking. It is therefore necessary to establish exactly to what

‘homogeneous reference classes’ can be referring. In this section, I address the

possible interpretations of ‘homogeneity’ one is free to adopt and consequently

argue that homogeneity in merely the statistically relevant respects is necessary vis-

à-vis the Reference Class Problem.

4.2.1 Absolute homogeneity

Randomised control trials with a homogeneous trial population – in an
absolute sense – would have a trial population of one; the trial population would
consist entirely and solely of the very patient whom the results of the trial would
then be used to treat. For obvious reasons, this is neither feasible nor useful in any
evident sense. With respect to the current framework of randomised control trials,
absolute homogeneity would require running a personalised trial for every
treatment that exists, on every patient who may require the results. Such a
proposal is ludicrous. Furthermore, the statistical power of conclusions drawn from

a trial population of one would be so minuscule that they could not be applied to

any other individuals. On this interpretation, absolute homogeneity – in the crudest

sense – is entirely incompatible with the randomised control trial: it only replaces

the Reference Class Problem with a plethora of others.

Despite being ludicrous vis-à-vis randomised control trials, absolute
homogeneity may be useful in the sense of ‘personalised medicine’. This is,
perhaps, an avenue to explore with respect to evidence-based medicine; however, it
is neither possible nor appropriate to address this matter here. The purpose of this
paper is to explicate the Reference Class Problem vis-à-vis current medical practice
(specifically, randomised control trials). An inquiry into the potential advantages of
personalised medicine would require some renovation of our current statistical
tools, or even an entirely new approach to gathering evidence on which to base
medical practice. Although a potentially fruitful endeavour, such an inquiry
goes beyond the aims of this paper, which are to illustrate the
pertinence of the Reference Class Problem to our current framework of
evidence-based medicine (and not to overhaul the paradigm entirely).

25 The reader ought to be reminded of Reichenbach’s solution to the Reference Class Problem that was outlined in Section 2.1.2, wherein he proposed that a particular attribute or event ought to be assigned to the “narrowest class for which reliable statistics can be compiled.” (1949: 374) As we will see in the next section, this matter of ascertaining ‘reliable statistics’ can prove problematic with the homogenisation of trial populations.

4.2.2 Salmon

As we have seen in Section 2.1.2, Reichenbach proposed a solution to the

Reference Class Problem in his Theory of Probability, concluding that we are to:

“proceed by considering the narrowest class for

which reliable statistics can be compiled.”

(Reichenbach, 1949: 374)

This proposal, however, appears to yield an element of pragmatic difficulty. As a

reference class is narrowed, it consequently has, by definition, fewer members and

this appears to be antithetical to the attainment of statistical power. Picking up on

this point, Salmon sets out a reformulation of Reichenbach’s solution:

“The aim in selecting a reference class to which to

assign a single case is not to select the narrowest,

but the widest, available class. However, the

reference class should be homogeneous, and

achieving homogeneity requires making the

reference class narrower if it was not already

homogeneous. I would reformulate Reichenbach’s

method of selection of a reference class as follows:

choose the broadest homogeneous reference class

to which the single event belongs. I shall call this

the reference class rule.”

(Salmon, 1970: 43)

Although this proposal may initially appear conclusive, it raises one crucial

question: homogeneity in respect to what?

As we have seen, absolute homogeneity is far from appropriate in the

context of randomised control trials. Reference classes, therefore, appear to have to

be homogeneous in some relevant respect(s). A particular patient may be assigned

to the broad reference class ‘people’ that is, indeed, homogeneous in many

respects. However, the reference class ‘people’, to which a particular patient

belongs, is less broad than the reference class ‘vertebrates’ to which he also


belongs. Both reference classes ‘people’ and ‘vertebrates’ are homogeneous in their

own respects; however, it would not appear sensible to run clinical trials on

‘vertebrates’, merely because it is a broader reference class than ‘people’. It

appears that we have not yet evaded the Reference Class Problem in any useful

sense.

If it were possible, however, to identify homogeneous reference classes

within which no further statistically relevant partition could be found, this would

neatly satisfy the conditions of Salmon’s above Reference Class Rule. In 1977,

Salmon published an investigation into the possibility of the existence of such

objectively homogeneous reference classes. Therein, he outlines the following:

“A reference class A is homogeneous with respect

to an attribute B provided there is no set of

properties Ci in terms of which A can be relevantly

partitioned. A partition of A by means of Ci is

relevant with respect to B if, for some value of i,

P(A.Ci, B) ≠ P(A,B) . . . To say that a reference

class is homogeneous – objectively homogeneous

for emphasis – means that there is no way, even in

principle, to effect the relevant partition.”

(Salmon, 1977: 399)

Later in his paper, Salmon goes on to elucidate exactly what he means by ‘relevant

partitioning’, which – for the sake of clarity – is important to include at this stage:

“Suppose that P(A,B) = ½. Let C1 = B and C2 = B̄.

Then P(A.C1,B) = 1 and P(A.C2,B) = 0; thereby a

relevant partition has been achieved.”

(Salmon, 1977: 399)

In order to ascertain the broadest homogeneous reference class to which the

single event belongs, one ought to strive to achieve homogeneity in merely the

relevant respects. Theoretically, a reference class ought, therefore, to include only
those characteristics on which the effects of an intervention are dependent, in order

for the class to remain as broad as possible. Often in practice, however, it is apt

merely to ensure that those statistically relevant characteristics are catered for

somewhere within the reference class.
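Salmon's criterion, P(A.Ci, B) ≠ P(A, B) for some i, can be sketched against a finite data set. The following is an illustrative assumption-laden example: the records, the properties ‘smoker’ and ‘left_handed’, and the function names are all hypothetical. It should also be stressed that a check of this kind can only examine the candidate properties one happens to have recorded; it cannot establish objective homogeneity in Salmon's sense, which concerns partitions that are impossible even in principle.

```python
def probability(records, attribute, **conditions):
    """Relative frequency of `attribute` among the records that satisfy
    the given property=value conditions (the whole class if none given)."""
    members = [r for r in records
               if all(r[key] == value for key, value in conditions.items())]
    return sum(r[attribute] for r in members) / len(members)


def is_relevant_partition(records, attribute, prop, tolerance=1e-12):
    """True if partitioning the class by `prop` changes the probability
    of `attribute` in at least one cell of the partition."""
    baseline = probability(records, attribute)
    values = {r[prop] for r in records}
    return any(
        abs(probability(records, attribute, **{prop: v}) - baseline) > tolerance
        for v in values
    )


# Hypothetical reference class A of 40 patients; attribute B is 'responded'.
records = []
for smoker, left_handed, n, responders in [
    (True, True, 10, 2), (True, False, 10, 2),
    (False, True, 10, 8), (False, False, 10, 8),
]:
    for i in range(n):
        records.append({"smoker": smoker, "left_handed": left_handed,
                        "responded": i < responders})

print(is_relevant_partition(records, "responded", "smoker"))       # True
print(is_relevant_partition(records, "responded", "left_handed"))  # False
```

On these invented figures, narrowing the class by smoking status is warranted, whereas narrowing it by handedness would shrink the class without altering the probability. With real trial data, sampling noise means that subgroup frequencies will rarely coincide exactly, so a statistical test would have to take the place of the simple tolerance used here.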

In conclusion, a focus on statistically relevant partitions would appear to

illuminate the most appropriate reference class from which to draw conclusions

about individual future cases. Adhering to Salmon’s Reference Class Rule by

focussing on only relevant homogeneity within a reference class, appears to avoid


sacrificing statistical power. In the next section, I address how such a conclusion

can be applied vis-à-vis randomised control trials.

4.2.3 Relevant homogeneity vis-à-vis randomised control trials

As we have seen, homogeneity in a trial population appears to be useful – if

not, necessary – in order to evade the Reference Class Problem. Homogeneity

appears to enable one to draw reliable probabilistic conclusions from a trial

population that can be confidently applied to individual future cases.

Absolute homogeneity would remove the need for reference classes entirely.

However, this approach vis-à-vis randomised control trials proves to be a
practically inappropriate and statistically poor solution (see Section 4.2.1).

Relevant homogeneity, however, allows for the reference class of a trial population

to remain broad, which in turn facilitates the collation and analysis of large pools of

data. Indeed, irrelevant homogeneity (i.e. homogenisation by means of some

statistically irrelevant partition[s]) only proves to narrow the reference class of a

trial population unnecessarily, the repercussions of which may significantly

compromise the statistical power of trial results.
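
The contrast between relevant and irrelevant homogenisation can be sketched numerically. The following is a minimal illustration only: every count is invented, the names `CELLS` and `summarise` are hypothetical, and the standard error of a proportion is used as a rough stand-in for the precision that underlies statistical power.

```python
from math import sqrt

# Hypothetical counts, invented for illustration only. Each key is
# (has_relevant_trait, has_irrelevant_trait); each value is
# (number of responders, number of patients) in that cell.
CELLS = {
    (True, True): (30, 50),
    (True, False): (30, 50),
    (False, True): (10, 50),
    (False, False): (10, 50),
}


def summarise(keep):
    """Return (response rate, standard error, n) for the cells selected by `keep`."""
    responders = sum(r for key, (r, n) in CELLS.items() if keep(key))
    patients = sum(n for key, (r, n) in CELLS.items() if keep(key))
    rate = responders / patients
    return rate, sqrt(rate * (1 - rate) / patients), patients


# The broad reference class: every patient in the hypothetical trial.
whole = summarise(lambda key: True)
# Narrowing by the irrelevant trait: the rate is unchanged, but n is halved.
narrowed_irrelevant = summarise(lambda key: key[1])
# Narrowing by the relevant trait: the rate itself changes.
narrowed_relevant = summarise(lambda key: key[0])

for label, result in [
    ("whole trial", whole),
    ("irrelevant partition", narrowed_irrelevant),
    ("relevant partition", narrowed_relevant),
]:
    rate, std_err, n = result
    print(f"{label}: rate={rate:.2f}, standard error={std_err:.3f}, n={n}")
```

In this invented example, narrowing by the irrelevant trait leaves the response rate at 0.4 while halving the class from 200 to 100 patients and enlarging the standard error; narrowing by the relevant trait changes the rate itself, which is the only circumstance in which the narrower class tells us something new.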

4.3 How evidence-based medicine ought to evolve

As we have seen in this paper, the Reference Class Problem has the potential

to undermine the evidence on which our medical practice is presently being based.

This section proposes two ways in which the medical community might move

forward, in light of this problem of statistical inference. My first proposal (Section

4.3.1) is largely an epistemological one: now that the Reference Class Problem has

been outlined vis-à-vis evidence-based medicine, the medical community must

accept and accommodate its ramifications. My second proposal (Section 4.3.2)

consists of a suggestion for further research into how the most appropriate

reference class for individual cases might be pragmatically ascertained.

4.3.1 Transparency vis-à-vis the Reference Class Problem

In order for clinicians to be able to make an informed decision about the

most appropriate care of a particular patient, evidential transparency is essential.

Indeed, at the heart of the evidence-based medicine paradigm is the importance of

professional clarity and candour:

“Evidence-based medicine is the conscientious,

explicit and judicious use of current best evidence

in making decisions about the care of individual

patients.”

(Sackett et al., 1996: 71)


I see no reason why such transparency ought not apply to the Reference Class

Problem vis-à-vis evidence-based medicine.

My proposal for the future of evidence-based medicine, in light of the

statistical problem of the reference class, constitutes firstly a mere

acknowledgement of the problem by researchers, biostatisticians, clinicians and

policy-makers (such as NICE). The extent of the problem has been outlined, in

detail, in this paper and its pertinence to individual patient care ought to be taken

seriously in a clinical context.

On acceptance of the problem, I propose that those aforementioned parties

ought to be explicit about the most appropriate reference classes to which individual

patients ought to be assigned and, furthermore, the underlying justification for such

a reference class. Indeed, such transparency is congruous with the fundamental

principles of the evidence-based medicine paradigm, as outlined by Sackett et al.

above (1996: 71). Such clarity would appear not only to enable clinicians to make

informed decisions about patient care but would also encourage researchers to

focus due attention on ascertaining the most appropriate reference classes for

individual cases.

4.3.2 Extraneous considerations

As this paper draws to a close, I would like to briefly propose a suggestion

regarding further research into a potential solution to the Reference Class Problem

vis-à-vis evidence-based medicine. If I may, I would like to begin by taking the

reader back to Venn’s account of the problem of class selection, outlined at the

beginning of this paper, in Section 2.1.1.

In 1876, Venn put forward the following:

“In saying that it is thus arbitrary under which

class he is placed, we mean, of course, that there

are no logical grounds of decision; the selection

must be determined by some extraneous

considerations.”

(Venn, 1876: 195)

By this, Venn is essentially arguing that there appear to be no objective grounds on

which to construct an entirely relevant reference class for a single thing or event;

one must, therefore, look to employ some auxiliary judgement in the construction

of relevant reference classes. I suggest that this approach may be fruitful, not

merely as a solution to the Reference Class Problem in an abstract sense, but also

as a means of attenuating the problem vis-à-vis evidence-based medicine.[26]

As we have seen in this paper, the interpretation of results of randomised

control trials requires some element of statistical inference.[27] However, the

importance of extraneous considerations must not be overlooked in the

interpretation of trial validity:

“[RCTs] cannot alone support the expectation that

a policy will work for you. What they tell you is

true – that this policy produced that result there.

But they do not tell you why that is relevant to

what you need to bet on getting the result you want

here. For that, you will need to know a lot more.”

(Cartwright & Hardie, 2012: ix)

Such knowledge appears to be of great importance vis-à-vis the Reference Class

Problem. In order to ensure that the results observed in a trial population are

applicable to individual future cases, it is necessary to ensure that reference classes

are statistically relevant with respect to individual patients. As Venn pointed out in

1876, “extraneous considerations” are necessary in order to ensure the right

reference class has been selected for “an individual thing or event” (194–195). It

would appear that the importance of such considerations ought not be overlooked

in the case of the Reference Class Problem vis-à-vis evidence-based medicine.


[26] This may, for example, come in the form of an understanding of the underlying causal mechanism of the intervention. Indeed, many philosophers have argued that a mechanistic understanding is crucial in order to ensure external validity of trial results. (Schaffner, 1993: 306–307; Russo & Williamson, 2007; Cartwright & Hardie, 2012; Clarke, et al., 2013) Furthermore, Piantadosi states that “biological knowledge regarding mechanism” is “the principle justification for external validity” (2005: 317). An understanding of the underlying causal mechanism(s) of an intervention, therefore, ought not be overlooked and may prove to afford a means by which to ascertain statistically relevant partitions within reference classes. A mechanistic approach is perhaps, therefore, an area for further research into the attenuation of the Reference Class Problem vis-à-vis evidence-based medicine. For more on causal mechanisms in general, see Machamer et al. (2000) and Machamer (2004).

[27] See Section 3.3.1.

5. CONCLUSION

In this paper, I have outlined a profound problem with our present paradigm

of medical practice. Depending on how one references an individual patient (for

example, as ‘a man over 50 years old’ or ‘a man of African family origin’), his

particular probability of experiencing a given effect of a treatment will differ.

Moreover, it may not be entirely clear which reference class is objectively the right

one for any given patient. This is the Reference Class Problem vis-à-vis evidence-

based medicine; it is a problem with potentially grave ramifications and one that

ought to be explicitly addressed by the medical community.

At the outset of this paper, I delivered a detailed account of the

Reference Class Problem, in an abstract sense. The reader was taken through the

works of Venn, Reichenbach and Hájek in order to understand the evolution of the

problem and how it has become of contemporary relevance vis-à-vis evidence-

based medicine. The ubiquity of the Reference Class Problem is somewhat

troubling; indeed, the problem has been highlighted in a number of academic

disciplines, including law and quantitative biology (see Section 2.2). However,

despite its potential to entirely undermine statistical inferences, the problem has not

yet been acknowledged vis-à-vis evidence-based medicine.

In Chapter 3, I have illustrated how the Reference Class Problem appears to

be of worrying pertinence to our current paradigm of medical practice. As we have

seen in this paper, clinical judgement is made largely – if not entirely – on the basis

of conclusions drawn from clinical trials (specifically, randomised control trials).

Doctors are instructed to assess published trial results on their relevance to

individual patients. However, this practice necessitates one potentially problematic

assumption: that the effects observed in a trial population are applicable to a given

individual patient (known as ‘external validity’ or ‘generalisation’).

In this paper, I have unpacked the framework of the randomised control trial

and have illustrated that external validity is largely a matter of probabilistic

inference. Given that there exist a number of philosophical interpretations of

probability, I have explicated how the Reference Class Problem arises vis-à-vis the

predominant interpretations that presently underpin randomised control trials

(namely, frequentism and Bayesianism). I have argued that the Reference Class

Problem appears to persist, regardless of which interpretation of probability one

chooses to adopt.

In Chapter 4, I have outlined how the Reference Class Problem appears to

arise due to the heterogeneity of trial populations vis-à-vis individual cases. This

chapter, thus, comprises an inquiry into the construction of the most appropriate

reference class for a given individual. In line with Salmon’s Reference Class Rule

(see Section 4.2.2), I have argued that, in theory, reference classes ought to be

homogeneous in only the statistically relevant respects. This allows for the

reference class to remain broad, thus, ensuring external validity without

compromising the statistical power of the trial’s results. The Reference Class

Problem may thence be avoided, as the most appropriate reference class for an

individual case will become possible to ascertain: constructed in accordance with

statistically relevant partitioning.

In this paper, I have explicated the Reference Class Problem vis-à-vis

evidence-based medicine in some detail. It is important to note that, although this

paper has been thorough, it has not been entirely comprehensive. Indeed, a number

of avenues for further research into the Reference Class Problem and its pragmatic

ramifications vis-à-vis evidence-based medicine have been outlined in this paper.

The first step, however, must be to merely acknowledge its existence – something

the medical community has yet to do.


6. REFERENCES

Texts

Ashby, D. 2006. Bayesian Statistics in Medicine: A 25 year review. Statistics in Medicine. 25, pp. 3589–3631.

Aven, T. & Reniers, G. 2013. How to define and interpret a probability in a risk and safety setting. Safety Science. [online]. 51, pp. 223–231. Available from: www.elsevier.com/locate/ssci [Accessed 9th April 2014].

Cartwright, N. & Hardie, J. 2012. Evidence-based Policy: A Practical Guide to Doing it Better. Oxford, U.K. & New York, NY, U.S.A.: Oxford University Press.

Cheng, E.K. 2009. A Practical Solution to the Reference Class Problem. Columbia Law Review. [online]. 109(8), pp. 2081–2105. Available from: http://www.jstor.org/stable/40380407 [Accessed 10th April 2014].

Clarke, B.; Gillies, D.; Illari, P.; Russo, F. & Williamson, J. 2013. Mechanisms and the Evidence Hierarchy. Topoi. [online]. Available from: http://link.springer.com/article/10.1007%2Fs11245-013-9220-9 [Accessed 5th January 2014].

Daly, L.E. & Bourke, G.J. 2000. Interpretation and Uses of Medical Statistics. 5th ed. Oxford, U.K.: Blackwell Science Ltd.

de Finetti, B. 1937. Foresight: Its Logical Laws, Its Subjective Sources. (English Translation). In: H.E. Kyburg and H.E. Smokler, eds. 1964. Studies in Subjective Probability. New York, NY, U.S.A.: John Wiley & Sons.

Gillies, D. 2000. Philosophical Theories of Probability. London, U.K.: Routledge.

Guyatt, G. et al. (Evidence Based Medicine Working Group). 1992. Evidence Based Medicine: A new approach to teaching the practice of medicine. Journal of the American Medical Association. 268, pp. 2420–2425.

Hackshaw, A. 2009. A Concise Guide to Clinical Trials. West Sussex, U.K.: John Wiley & Sons Ltd.

Hájek, A. 2007. The reference class problem is your problem too. Synthese. [online]. 156(3), pp. 563–585. Available from: http://link.springer.com/article/10.1007/s11229-006-9138-5 [Accessed 3rd December 2013].



Machamer, P.; Darden, L. & Craver, C.F. 2000. Thinking about mechanisms. Philosophy of Science. [online]. 67, pp. 1–25. Available from: http://www.jstor.org/stable/188611 [Accessed 25th November 2013].

Machamer, P. 2004. Activities and Causation: The Metaphysics and Epistemology of Mechanisms. International Studies in the Philosophy of Science. [online]. 18(1), pp. 27–39. Available from: http://dx.doi.org.libproxy.ucl.ac.uk/10.1080/02698590412331289242 [Accessed 19th November 2013].

von Mises, R. 1957. Probability, Statistics and Truth. 2nd revised English ed. New York, NY, U.S.A.: Macmillan.

NICE, 2011. CG127: Hypertension: quick reference guide. National Institute for Health and Clinical Excellence, London. [online]. Available from: www.nice.org.uk [Accessed 12th January 2014].

Piantadosi, S. 2005. Clinical Trials: A Methodological Perspective. 2nd ed. Hoboken, NJ, U.S.A.: John Wiley & Sons, Inc.

Pocock, S.J. 1983. Clinical Trials: A Practical Approach. Chichester, U.K. & New York, NY, U.S.A.: John Wiley & Sons Inc.

Reichenbach, H. 1949. The Theory of Probability: An inquiry into the logical and mathematical foundations of the calculus of probability. E.H. Hutten and M. Reichenbach, trs. 2nd ed. Berkeley & Los Angeles, CA, U.S.A. & London, U.K.: University of California Press.

Russo, F. & Williamson, J. 2007. Interpreting causality in the health sciences. International Studies in the Philosophy of Science. [online]. 21(2), pp. 157–170. Available from: http://dx.doi.org.libproxy.ucl.ac.uk/10.1080/02698590701498084 [Accessed 10th December 2013].

Sackett, D.L.; Rosenberg, W.M.C.; Gray, J.A.M.; Haynes, R.B. & Richardson, W.S. 1996. Evidence Based Medicine: what it is and what it isn't. BMJ. [online]. 312(7023), pp. 71–72. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2349778/ [Accessed 1st December 2013].

Salmon, W.C. 1970. Statistical Explanation. In: W.C. Salmon; R.C. Jeffrey and J.G. Greeno, eds. 1971. Statistical Explanation and Statistical Relevance. Pittsburgh, PA, U.S.A.: University of Pittsburgh Press, pp. 29–87.



Salmon, W.C. 1977. Objectively Homogeneous Reference Classes. Synthese. [online]. 36(4), pp. 399–414. Available from: http://link.springer.com/article/10.1007%2FBF00486104?LI=true [Accessed 29th November 2013].

Schaffner, K.F. 1993. Discovery and Explanation in Biology and Medicine. Chicago, IL, U.S.A. & London, U.K.: The University of Chicago Press.

Spiegelhalter, D.J.; Freedman, L.S. & Parmar, M.K.B. 1994. Bayesian Approaches to Randomized Trials. Journal of the Royal Statistical Society: Series A (Statistics in Society). 157, pp. 357–416.

Spiegelhalter, D.J.; Abrams, K. & Myles, J. 2004. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester, U.K.: John Wiley & Sons.

Sterne, R.H. 2010. The Discordance of Individual Risk Estimates and the Reference Class Problem. Quantitative Biology. [online]. Available from: http://arxiv.org/abs/1001.2499 [Accessed 25th March 2014].

Teira, D. 2011. Frequentist versus Bayesian Clinical Trials. In: F. Gifford; D.M. Gabbay; P. Thagard and J. Woods, eds. 2011. Philosophy of Medicine (Handbook of the Philosophy of Science). Oxford, U.K.; Amsterdam, The Netherlands & Burlington, MA, U.S.A.: Elsevier, pp. 255–297.

Venn, J. 1876. The Logic of Chance: An essay on the foundations and province of the theory of probability, with especial reference to its application to moral and social science. 2nd ed. London & Cambridge, U.K.: Macmillan and co.

Victora, C.G.; Habicht, J.-P. & Bryce, J. 2004. Evidence-based public health: Moving beyond randomized trials. American Journal of Public Health. 94, pp. 400–405.

Images

Cover image. Available from: http://www.math.cornell.edu/~numb3rs/lipa/imgs/venn4.png [Accessed 14th April 2014].

