The cross-over trial: a subtle knife
In the second book of the trilogy His Dark Materials, author Philip Pullman describes how one of the main characters gains possession of the subtle knife¹.
This knife is so sharp that it can cut through any known material and even cut through the curtains that divide this world from the infinite number of adjoining worlds that simultaneously exist. You will have to read the books to understand the concept of other adjoining worlds!
What has this got to do with cross-over trials? Well, this type of trial is not unlike the subtle knife: maybe not as sharp, but certainly sharp enough compared to the alternatives, and one that is quite at home in the various adjacent worlds that make up the vast spectrum of application areas where the design is used. These worlds include medicine, psychology, sports science, dairy science and agriculture. So what is it? Read on!
The 2×2 cross-over trial
The 2×2 cross-over trial is the simplest and probably most used of these designs, though one that is not without controversy. Like the subtle knife, not everyone can use it well.
We will take as our context the comparison of two drugs that provide relief for patients suffering from chronic obstructive pulmonary disease (COPD). This is a disease of the lungs that causes breathlessness and is prevalent among heavy smokers. We will label the drugs as A and B and suppose a sufficiently large number of patients have volunteered to take part in a clinical trial that will evaluate and compare the effects of these drugs. The patients are randomly divided into two groups of equal size, which we will label as I and II. Each patient in group I takes drug A for 4 weeks and then crosses over (hence the name of the trial) to take drug B for 4 weeks. In group II each patient takes drug B for 4 weeks and then crosses over to take drug A for 4 weeks. The plan of the trial is summarised in Table 1. After the trial is completed the data obtained are analysed to determine if drug A is superior to drug B.
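The random division into groups I and II can be sketched in a few lines of Python. This is only an illustration and not the allocation procedure of any actual trial: the function name `randomise_to_sequences`, the patient identifiers and the fixed seed are all assumptions introduced here.

```python
import random


def randomise_to_sequences(patient_ids, seed=None):
    """Randomly split patients into two equal-sized sequence groups.

    Group I receives drug A then drug B; group II receives B then A.
    Hypothetical helper, for illustration only.
    """
    ids = list(patient_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of patients is needed for equal groups")
    rng = random.Random(seed)  # seeded only so the example is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        "I (A then B)": sorted(ids[:half]),
        "II (B then A)": sorted(ids[half:]),
    }


# 40 hypothetical patient numbers, giving 20 patients per group
allocation = randomise_to_sequences(range(1, 41), seed=2024)
for group, members in allocation.items():
    print(group, members)
```

Every patient ends up in exactly one group, and the group label fixes the order in which that patient receives the two drugs.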
Why does each patient need to take both drugs? The answer is that this is the strength of the design: the effects of drugs A and B can be compared directly using the two responses recorded on each patient. The alternative would be to compare the effects of drug A observed on some patients with the effects of drug B observed on a different set of patients. This is the so-called parallel-groups design, and we will say more about this design later. But why do we need two groups of patients when we can make the within-subject comparisons by using just one of the groups? Well, tempus fugit, time moves on, and the conditions of the trial in period 2 might be different to those in period 1. For example, if the weather happened to be cold and damp in the second period and warm and dry in the first period, we might expect to see more severe symptoms in the second period. Or perhaps the medical staff and equipment in the second period are not the same as in the first, etc. In well-planned trials we do not necessarily expect to see an effect of time, but building this possibility into the design provides some insurance against such an eventuality. Using two groups ensures that we can fit a statistical model that will separate out the difference between the two drugs from any differences between the two periods. For an example of a real COPD cross-over trial, see Jones and Kenward², where much more information on cross-over trials can be found.
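One common way to write such a model is sketched below. It is offered as an illustration, not necessarily the exact model fitted to the example data, and the symbols $\mu$, $s_{ik}$, $\pi_j$, $\tau$ and $\varepsilon_{ijk}$ are notation assumed here.

```latex
y_{ijk} = \mu + s_{ik} + \pi_j + \tau_{d(i,j)} + \varepsilon_{ijk},
\qquad i = 1,2, \quad j = 1,2
```

Here $y_{ijk}$ is the response of patient $k$ in group $i$ during period $j$, $s_{ik}$ is that patient's own level, $\pi_j$ is the effect of period $j$, and $d(i,j)$ is the drug (A or B) given to group $i$ in period $j$. With group I alone, every within-patient difference contains $(\pi_1 - \pi_2) + (\tau_A - \tau_B)$, so the period and drug differences cannot be told apart. In group II the drug order is reversed, the within-patient difference contains $(\pi_1 - \pi_2) - (\tau_A - \tau_B)$, and the two effects can be separated.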
Plots and analysis
The data obtained from a 2×2 trial consist of two measurements on each patient: one from the end of period 1 and one from the end of period 2. In our example of treating COPD, the recorded response is the mean morning peak flow rate (PEFR) in litres per minute, which is a measure of the rate at which a patient can exhale. We expect that the drugs will increase this rate, and one more so than the other. When plotted, such data might look like the plot in Figure 1, where there are 20 patients in each group. Here the two responses from each patient have been joined by a line: if the line is going up it indicates that the response was higher in the second period, and vice versa.
Cross-over trials are a sharp and useful tool. Byron Jones explains just how sharp they can be and how not to cut yourself.
Table 1. 2×2 cross-over trial examining effect of two different drugs on two groups of patients
Group   Period 1 (first 4 weeks)   Period 2 (second 4 weeks)
I       Drug A                     Drug B
II      Drug B                     Drug A
We can see that most patients have a downward trend in response in group I and an upward trend in group II. In other words, patients seem to be doing better on drug A. However, not all patients are better off taking drug A, as can be seen in the plot of the group I data. This plot also highlights the differences between the patients. Some have low PEFR values and others have high values. However, the improvement with drug A as compared with drug B is about the same for almost everyone. This is an ideal situation for a cross-over trial: large between-patient variability and small within-patient variability. By basing our analyses on the within-patient differences we can get a precise estimate of the mean difference between the two drugs. How do we estimate this difference? Assume we have calculated the mean of the responses in each period and in each group, to give four means, $m_{11}$, $m_{12}$, $m_{21}$ and $m_{22}$, where $m_{ij}$ is the mean from group $i$ and period $j$, $i = 1,2$ and $j = 1,2$. These are shown and identified in Table 2, along with the values obtained from our example dataset. These means are plotted in Figure 2, where we have joined the means from the same treatment with a line.
The difference $m_{11} - m_{12}$ is an estimate of the mean effect of (drug A $-$ drug B) plus the mean effect of (period 1 $-$ period 2). The difference $m_{21} - m_{22}$ is an estimate of the mean effect of (drug B $-$ drug A) plus the mean effect of (period 1 $-$ period 2). Therefore, to cancel out the period difference and estimate the mean effect of drug A $-$ drug B, we calculate $0.5[(m_{11} - m_{12}) - (m_{21} - m_{22})]$. If there are $N$ patients in total ($N/2$ in each group) and the standard deviation of the response is $\sigma$, then the standard error of this estimator is $\sigma\sqrt{2(1-\rho)/N}$, where $\rho$ is the correlation between the two responses on the same patient. So as $\rho$ gets larger the standard error gets smaller. An alternative, but equivalent, way to think about the standard error of this estimator is to consider the within-patient and between-patient components of the variance of a response on a patient: $\sigma_W^2$ and $\sigma_B^2$, respectively, where $\sigma^2 = \sigma_W^2 + \sigma_B^2$. Then $\rho = \sigma_B^2/(\sigma_W^2 + \sigma_B^2)$ and the formula for the standard error is $\sqrt{2\sigma_W^2/N}$, which has a familiar form for the standard error of the difference in two means, with $N$ observations in each mean. As the variance of the difference between the two repeated observations on a patient ($2\sigma_W^2$) gets smaller, the standard error gets smaller.
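The estimator and its standard error can be tried out with a short Python sketch. The function names are hypothetical and the numbers are made-up PEFR-like values, not the article's dataset. The sketch also adds the same amount to every period-2 value, to mimic a period effect, and shows that the estimate is unaffected.

```python
import math


def period_means(pairs):
    """Return (mean in period 1, mean in period 2) for one group.

    `pairs` holds one (period-1 response, period-2 response) tuple per patient.
    """
    n = len(pairs)
    return (sum(p1 for p1, _ in pairs) / n, sum(p2 for _, p2 in pairs) / n)


def crossover_estimate(group1, group2):
    """Estimate the mean effect of (drug A - drug B).

    group1 took A then B; group2 took B then A.
    Computes 0.5 * [(m11 - m12) - (m21 - m22)].
    """
    m11, m12 = period_means(group1)
    m21, m22 = period_means(group2)
    return 0.5 * ((m11 - m12) - (m21 - m22))


def crossover_se(sigma, rho, n_total):
    """Standard error sigma * sqrt(2 * (1 - rho) / N) of the estimator."""
    return sigma * math.sqrt(2.0 * (1.0 - rho) / n_total)


# Made-up values in litres per minute (assumption: NOT the article's data).
group1 = [(250.0, 240.0), (310.0, 296.0)]  # drug A then drug B
group2 = [(200.0, 212.0), (280.0, 292.0)]  # drug B then drug A
estimate = crossover_estimate(group1, group2)

# Mimic a period effect by shifting every period-2 response by the same amount.
shift = 15.0
shifted1 = [(p1, p2 + shift) for p1, p2 in group1]
shifted2 = [(p1, p2 + shift) for p1, p2 in group2]
estimate_shifted = crossover_estimate(shifted1, shifted2)

print(estimate, estimate_shifted)
print(crossover_se(sigma=1.0, rho=0.5, n_total=4))
```

With these invented numbers both estimates come out at 12 litres per minute: the shift enters both within-group differences equally and cancels when one is subtracted from the other.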
The parallel-groups design
Before saying more about the cross-over trial, let us return, as promised, to the parallel-groups design. In this design the available patients are randomly divided into two groups. Everyone in group I will get drug A and everyone in group II will get drug B. This design is summarised in Table 3.
If this design was a knife it would not be as sharp as the subtle knife (the cross-over): in fact it is quite blunt. By this I mean that to achieve the same precision of estimation of the mean difference in drug effects that can be achieved by the cross-over trial, this design requires many more patients. To see this, we note that the estimator of the difference in mean drug effects is the observed mean response of the group I patients minus the observed mean response of the group II patients. If there are $N/2$ patients in each group then the standard error of this estimator is $\sigma\sqrt{4/N}$. The ratio of the standard errors from the two designs (parallel-groups:cross-over) is $\sqrt{2/(1-\rho)}$. If $\rho = 2/3$, for example, which is not at all unusual, then the parallel-groups design will require $2/(1-\rho)$, i.e. 6, times as many patients as the cross-over trial (note that sample size depends on the square of the ratio of the standard errors). Even when $\rho = 0$, and the ratio reduces to 2, the benefit of the cross-over design is that the replicate data on the same patient is as valuable as data on two different patients. Of course, the parallel-groups design can be sharpened up a little if a baseline response value is taken before the drug is taken and a change-from-baseline score is used as the response. Then the ratio is 4.
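The comparison of the two designs can be checked numerically. The sketch below codes the two standard errors, $\sigma\sqrt{4/N}$ for parallel groups and $\sigma\sqrt{2(1-\rho)/N}$ for the cross-over, and the resulting sample-size ratio $2/(1-\rho)$. The function names and the chosen values of $\rho$, $\sigma$ and $N$ are assumptions made for illustration.

```python
import math


def se_crossover(sigma, rho, n_total):
    """Cross-over estimator: sigma * sqrt(2 * (1 - rho) / N)."""
    return sigma * math.sqrt(2.0 * (1.0 - rho) / n_total)


def se_parallel(sigma, n_total):
    """Parallel groups with N/2 patients per group: sigma * sqrt(4 / N)."""
    return sigma * math.sqrt(4.0 / n_total)


def sample_size_ratio(rho):
    """Parallel-groups patients needed per cross-over patient: 2 / (1 - rho).

    This is the square of the ratio of the two standard errors.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho is assumed to lie in [0, 1)")
    return 2.0 / (1.0 - rho)


# Illustrative correlations; sigma = 1 and N = 100 are arbitrary choices,
# since both cancel out of the ratio.
for rho in (0.0, 0.5, 2.0 / 3.0, 0.9):
    se_ratio = se_parallel(1.0, 100) / se_crossover(1.0, rho, 100)
    print(f"rho = {rho:.3f}  SE ratio = {se_ratio:.3f}  "
          f"sample-size ratio = {sample_size_ratio(rho):.2f}")
```

The sample-size ratio is 2 at $\rho = 0$ and 6 at $\rho = 2/3$, and it grows without bound as $\rho$ approaches 1.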
A plot of the ratio of sample sizes is given in Figure 3, where we have indicated the ratios corresponding to $\rho = 0$ and $\rho = 2/3$. Finally, we note something that we will refer to later: that the cross-over trial contains within it a parallel-groups design. If only the data from the first period were used, then the structure of the data would be as in Table 3.
If the cross-over trial is as sharp as I have made out you may wonder why it is not used in every clinical trial. Well, unlike the subtle knife, it cannot be used in all circumstances. A key assumption is that patients will be in the same medical state at the start of period 2 as they were at the start of period 1. If our treatment cures the disease then clearly the cross-over cannot be used. The sorts of medical conditions where the cross-over has been successfully used are those that are chronic and stable over time. Good examples are asthma, COPD, migraine, hypertension, arthritis and heart problems, to name but a few (other examples are given towards th