The cross-over trial: a subtle knife


  • september2008 135

    In the second book of the trilogy His Dark Materials, author Philip Pullman describes how one of the main characters gains possession of the subtle knife¹.

    This knife is so sharp that it can cut through any known material and even cut through the curtains that divide this world from the infinite number of adjoining worlds that simultaneously exist. You will have to read the books to understand the concept of other adjoining worlds!

    What has this got to do with cross-over trials? Well, this type of trial is not unlike the subtle knife: maybe not as sharp, but certainly sharp enough compared to the alternatives, and one that is quite at home in the various adjacent worlds that make up the vast spectrum of application areas where the design is used. These worlds include medicine, psychology, sports science, dairy science and agriculture. So what is it? Read on!

    The 2×2 cross-over trial

    The 2×2 cross-over trial is the simplest and probably most used of these designs, though one that is not without controversy. Like the subtle knife, not everyone can use it well.

    We will take as our context the comparison of two drugs that provide relief for patients suffering from chronic obstructive pulmonary disease (COPD). This is a disease of the lungs that causes breathlessness and is prevalent among heavy smokers. We will label the drugs as A and B and suppose a sufficiently large number of patients have volunteered to take part in a clinical trial that will evaluate and compare the effects of these drugs. The patients are randomly divided into two groups of equal size, which we will label as I and II. Each patient in group I takes drug A for 4 weeks and then crosses over (hence the name of the trial) to take drug B for 4 weeks. In group II each patient takes drug B for 4 weeks and then crosses over to take drug A for 4 weeks. The plan of the trial is summarised in Table 1. After the trial is completed the data obtained are analysed to determine if drug A is superior to drug B.

    Why does each patient need to take both drugs? The answer is that this is the strength of the design: the effects of drugs A and B can be compared directly using the two responses recorded on each patient. The alternative would be to compare the effects of drug A observed on some patients with the effects of drug B observed on a different set of patients. This is the so-called parallel-groups design, and we will say more about this design later. But why do we need two groups of patients when we can make the within-subject comparisons by using just one of the groups? Well, tempus fugit, time moves on, and the conditions of the trial in period 2 might be different to those in period 1. For example, if the weather happened to be cold and damp in the second period and warm and dry in the first period, we might expect to see more severe symptoms in the second period. Or perhaps the medical staff and equipment in the second period are not the same as in the first, etc. In well-planned trials we do not necessarily expect to see an effect of time, but building this possibility into the design provides some insurance against such an eventuality. Using two groups ensures that we can fit a statistical model that will separate out the difference between the two drugs from any differences between the two periods. For an example of a real COPD cross-over trial, see Jones and Kenward², where much more information on cross-over trials can be found.

    Plots and analysis

    The data obtained from a 2×2 trial consist of two measurements on each patient: one from the end of period 1 and one from the end of period 2. In our example of treating COPD, the recorded response is the mean morning peak flow rate (PEFR) in litres per minute, which is a measure of the rate at which a patient can exhale. We expect that the drugs will increase this rate, and one more so than the other. When plotted, such data might look like the plot in Figure 1, where there are 20 patients in each group. Here the two responses from each patient have been joined by a line: if the line is going up it indicates that the response was higher in the second period, and vice versa.

    We can see that most patients have a downward trend in response in group I and an upward trend in group II. In other words, patients seem to be doing better on drug A. However, not all patients are better off taking drug A, as can be seen in the plot of the group I data. This plot also highlights the differences between the patients. Some have low PEFR values and others have high values. However, the improvement with drug A as compared with drug B is about the same for almost everyone. This is an ideal situation for a cross-over trial: large between-patient variability

    Cross-over trials are a sharp and useful tool. Byron Jones explains just how sharp they can be and how not to cut yourself.

    Table 1. 2×2 cross-over trial examining effect of two different drugs on two groups of patients

    Group   Period 1 (first 4 weeks)   Period 2 (second 4 weeks)
    I       Drug A                     Drug B
    II      Drug B                     Drug A

    and small within-patient variability. By basing our analyses on the within-patient differences we can get a precise estimate of the mean difference between the two drugs. How do we estimate this difference? Assume we have calculated the mean of the responses in each period and in each group, to give four means, $m_{11}$, $m_{12}$, $m_{21}$ and $m_{22}$, where $m_{ij}$ is the mean from group $i$ and period $j$, $i = 1, 2$ and $j = 1, 2$. These are shown and identified in Table 2, along with the values obtained from our example dataset. These means are plotted in Figure 2, where we have joined the means from the same treatment with a line.

    The difference $m_{11} - m_{12}$ is an estimate of the mean effect of (drug A − drug B) plus the mean effect of (period 1 − period 2). The difference $m_{21} - m_{22}$ is an estimate of the mean effect of (drug B − drug A) plus the mean effect of (period 1 − period 2). Therefore, to cancel out the period difference and estimate the mean effect of drug A − drug B, we calculate $0.5[(m_{11} - m_{12}) - (m_{21} - m_{22})]$. If there are $N$ patients in total ($N/2$ in each group) and the standard deviation of the response is $\sigma$, then the standard error of this estimator is $\sqrt{2\sigma^2(1 - \rho)/N}$, where $\rho$ is the correlation between the two responses on the same patient. So as $\rho$ gets larger the standard error gets smaller. An alternative, but equivalent, way to think about the standard error of this estimator is to consider the within-patient and between-patient components of the variance of a response on a patient: $\sigma_W^2$ and $\sigma_B^2$, respectively, where $\sigma^2 = \sigma_W^2 + \sigma_B^2$. Then $\rho = \sigma_B^2/(\sigma_W^2 + \sigma_B^2)$ and the formula for the standard error is $\sqrt{2\sigma_W^2/N}$, which has a familiar form for the standard error of the difference in two means, with $N$ observations in each mean. As the variance of the difference between the two repeated observations on a patient ($2\sigma_W^2$) gets smaller, the standard error gets smaller.
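As a rough illustration of the arithmetic just described, the sketch below applies the estimator to the four means reported in Table 2. The function names are made up for this example, the period estimate is the companion calculation implied by the same reasoning (adding rather than subtracting the two within-group differences), and the values of sigma, rho and N passed to the standard-error function are purely hypothetical, since the article does not report them.

```python
import math

# Period-by-group means from Table 2 (mean PEFR, litres per minute).
m11, m12 = 229.0, 209.0  # group I: drug A in period 1, drug B in period 2
m21, m22 = 201.0, 252.0  # group II: drug B in period 1, drug A in period 2


def treatment_estimate(m11, m12, m21, m22):
    """Estimate of (drug A - drug B); the period difference cancels."""
    return 0.5 * ((m11 - m12) - (m21 - m22))


def period_estimate(m11, m12, m21, m22):
    """Estimate of (period 1 - period 2); the drug difference cancels."""
    return 0.5 * ((m11 - m12) + (m21 - m22))


def crossover_se(sigma, rho, n_total):
    """Standard error sqrt(2 * sigma^2 * (1 - rho) / N) of the drug estimate."""
    return math.sqrt(2.0 * sigma ** 2 * (1.0 - rho) / n_total)


drug_diff = treatment_estimate(m11, m12, m21, m22)
period_diff = period_estimate(m11, m12, m21, m22)
print(f"drug A - drug B:     {drug_diff:.1f} litres/min")
print(f"period 1 - period 2: {period_diff:.1f} litres/min")

# Hypothetical inputs, chosen only to show how the formula is evaluated.
print(f"standard error: {crossover_se(sigma=50.0, rho=0.5, n_total=40):.2f}")
```

With the Table 2 means this gives an estimated advantage of 35.5 litres per minute for drug A, alongside an estimated period difference of −15.5 litres per minute.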

    The parallel-groups design

    Before saying more about the cross-over trial, let us return, as promised, to the parallel-groups design. In this design the available patients are randomly divided into two groups. Everyone in group I will get drug A and everyone in group II will get drug B. This design is summarised in Table 3.

    If this design was a knife it would not be as sharp as the subtle knife (the cross-over): in fact it is quite blunt. By this I mean that to achieve the same precision of estimation of the mean difference in drug effects that can be achieved by the cross-over trial, this design requires many more patients. To see this, we note that the estimator of the difference in mean drug effects is the observed mean response of the group I patients minus the observed mean response of the group II patients. If there are $N/2$ patients in each group then the standard error of this estimator is $\sqrt{4\sigma^2/N}$. The ratio of the standard errors from the two designs (parallel-groups:cross-over) is $\sqrt{2/(1 - \rho)}$. If $\rho = 2/3$, for example, which is not at all unusual, then the parallel-groups design will require $2/(1 - \rho)$, i.e. 6 times as many patients as the cross-over trial (note that sample size depends on the square of the ratio of the standard errors). Even when $\rho = 0$, and the ratio reduces to 2, the benefit of the cross-over design is that the replicate data on the same patient is as valuable as data on two different patients. Of course, the parallel-groups design can be sharpened up a little if a baseline response value is taken before the drug is taken and a change-from-baseline score is used as the response. Then the ratio is 4.

    A plot of the ratio of sample sizes is given in Figure 3, where we have indicated the ratios corresponding to $\rho = 0$ and $\rho = 2/3$. Finally, we note something that we will refer to later: that the cross-over trial contains within it a parallel-groups design. If only the data from the first period were used, then the structure of the data would be as in Table 3.
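The comparison between the two designs can be tabulated with a few lines of code. This is only a sketch of the ratio discussed above, in which the parallel-groups design needs 2/(1 − ρ) times as many patients as the cross-over; the function names and the particular correlations in the loop are choices made for this illustration.

```python
import math


def se_ratio(rho):
    """Ratio of standard errors, parallel-groups : cross-over."""
    return math.sqrt(sample_size_ratio(rho))


def sample_size_ratio(rho):
    """Ratio of sample sizes: the square of the standard-error ratio."""
    if rho >= 1.0:
        raise ValueError("rho must be less than 1")
    return 2.0 / (1.0 - rho)


# Illustrative correlations, including the two highlighted in the text.
for rho in (0.0, 0.5, 2.0 / 3.0, 0.8):
    print(f"rho = {rho:.2f}: parallel-groups needs "
          f"{sample_size_ratio(rho):.1f} times as many patients")
```

The ratio starts at 2 when the correlation is zero and grows without bound as the correlation approaches 1, which is the shape of curve that a plot of this ratio against the correlation would show.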

    Potential disadvantages

    If the cross-over trial is as sharp as I have made out you may wonder why it is not used in every clinical trial. Well, unlike the subtle knife, it cannot be used in all circumstances. A key assumption is that patients will be in the same medical state at the start of period 2 as they were at the start of period 1. If our treatment cures the disease then clearly the cross-over cannot be used. The sorts of medical conditions where the cross-over has been successfully used are those that are chronic and stable over time. Good examples are asthma, COPD, migraine, hypertension, arthritis and heart problems, to name but a few (other examples are given towards the end of this article). The other limiting factor is that the half-lives of the drugs to be compared must not be too long. If the active constituents of a drug given in period 1 are not cleared from the body before the start of period 2, then there will be a residual amount of the first drug still present when the second drug is given in period 2. This residual amount could affect the performance of the drug given

    Figure 1. Patients plot for mean peak flow rate (PEFR). Vertical axis: mean PEFR (litres/min); panels: group I (A then B) and group II (B then A).

    Table 2. 2×2 cross-over trial: means of patient responses with drugs A and B

    Group   Period 1 (first 4 weeks)   Period 2 (second 4 weeks)
    I       m11 = 229 (drug A)         m12 = 209 (drug B)
    II      m21 = 201 (drug B)         m22 = 252 (drug A)

    Figure 2. Means plot for mean peak flow rate (PEFR). Vertical axis: mean PEFR (litres/min); horizontal axis: period 1, period 2; plotted points: A, m11; B, m12; B, m21; A, m22.

    Table 3. Parallel-groups trial examining effect of two different drugs on two groups of patients

    Group   Period (4 weeks)
    I       Drug A
    II      Drug B



    of s


    e si














    0.0 0.2 0.4 0.6 0.8 1.0

    Figure 3. Ratio of parallel-group (PG) sample size to cross-over (CO) sample size

    in period 2. This is referred to as the carry-over effect of the first drug. If carry-over effects are large it is difficult to separate the true effect of the second drug from the residual effect of the first. In order to remove any suspected residual effects, a so-called wash-out period is sometimes interposed between period 1 and period 2. How long the wash-out period needs to be will depend on the half-life of the drug, or on the lag in the drug's effect, which may have to pass through a number of different biological pathways before it manifests itself. A period equal to about five half-lives is usually considered to be adequate for pharmacological carry-over.
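To see why about five half-lives is commonly treated as enough, the sketch below works out how much of a drug would remain after a given number of half-lives. It assumes simple exponential (first-order) elimination, which the article does not state, and the 12-hour half-life is a hypothetical value used only to show the calculation.

```python
def fraction_remaining(n_half_lives):
    """Fraction of drug left after n half-lives, assuming exponential decay."""
    return 0.5 ** n_half_lives


def washout_length(half_life, n_half_lives=5):
    """Wash-out duration in the same time units as the half-life."""
    return half_life * n_half_lives


# Hypothetical drug with a 12-hour half-life (not taken from the article).
half_life_hours = 12.0
print(f"wash-out: {washout_length(half_life_hours):.0f} hours")
print(f"drug remaining after 5 half-lives: {fraction_remaining(5):.3%}")
```

Under that assumption, roughly 3% of the first drug is left at the start of period 2. A carry-over that works through a delayed biological effect rather than residual drug would not be captured by this calculation.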

    The potential for carry-over effects is certainly a limiting factor on the use of this particular cross-over design. However, if more periods and/or additional sequences of taking the drugs are used, then carry-over effects can be efficiently separated from the drug effects and unambiguous results obtained.

    Testing for a carry-over effect difference

    Isn't it annoying to be told that something you have been doing for years, and thought was right, is wrong and should be stopped? Well, this is what happened to one proposal for testing in the situation where the two drugs might have carry-over effects. Before we say more about this proposal, a little more has to be said about how we might test for carry-over effects. Suppose we were to add together both the responses from each patient to give a total response. The total response from a patient in group I would, apart from the background random variability, measure the sum of the period effects plus the sum of the drug effects. The same can be said for a total response from a patient in group II. In other words, if there are no other fixed effects in our statistical model, the true difference between the mean of the patient totals from group I and the mean of the patient totals from group II will equal zero. However, if only one of the drugs has a carry-over effect, or if both drugs have carry-over effects but these are not the same, then the aforementioned true difference will not be zero. The null hypothesis of a zero difference between the true mean totals can therefore be tested using a simple two-sample t-test (assuming that the data are normally distributed). If this t-test gives a significant result, there is evidence that the carry-over effects are unequal, i.e. one drug has a carry-over effect and the other does not, or both drugs have carry-over effects and these are unequal.

    So far so good. But recall the earlier comparison in precision between the cross-over trial and the parallel-groups design. The test we have just described is for a comparison of two groups in a parallel-groups design. As we know, if this design was a knife it would not be very sharp: in fact when testing for carry-over effects the knife is very blunt. The large between-patient variability that is seen in cross-over trials is what makes the knife blunt. So the chance of finding a difference between the carry-over effects that is of the same size as, or smaller than, the difference between the drug effects is very low indeed. In other words, it is a pretty worthless test. If a significant result is obtained then either you are very unlucky, in that the 1 in 20 or 1 in 100 times you expect to make a type I error has happened, or there is a very large difference in the carry-over effects. In the latter case the trial planners need to go back to the drawing board and figure out why they did not anticipate this in the first place and instead use a parallel-groups trial to compare the drugs (in the former case don't try making a living as a gambler!).
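The test on patient totals can be sketched as follows. This is a minimal illustration using an equal-variance (pooled) two-sample t statistic written with the standard library only; the function name and the ten pairs of PEFR values are invented for the example and are not data from the trial discussed here.

```python
import math
from statistics import mean, variance


def carryover_t_statistic(group_one, group_two):
    """Pooled two-sample t statistic comparing patient totals.

    Each group is a list of (period 1, period 2) responses, one pair per
    patient. Returns the t statistic and its degrees of freedom.
    """
    totals_one = [p1 + p2 for p1, p2 in group_one]
    totals_two = [p1 + p2 for p1, p2 in group_two]
    n_one, n_two = len(totals_one), len(totals_two)
    df = n_one + n_two - 2
    pooled_var = ((n_one - 1) * variance(totals_one)
                  + (n_two - 1) * variance(totals_two)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_one + 1.0 / n_two))
    return (mean(totals_one) - mean(totals_two)) / std_err, df


# Hypothetical PEFR responses (litres/min), invented for this sketch.
group_I = [(230, 210), (310, 285), (150, 135), (260, 245), (195, 170)]
group_II = [(200, 250), (280, 335), (140, 185), (240, 290), (175, 230)]

t_stat, df = carryover_t_statistic(group_I, group_II)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The statistic would then be compared with a t distribution on the returned degrees of freedom. In these made-up data the patient totals range from under 300 to over 600, so the between-patient spread dominates the difference in group mean totals, which is the bluntness described above.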

    Now the proposal I wish to refer to was introduced by Grizzle³ and went like this: we first test for (differential) carry-over effects using the difference in group mean totals, as described above. If this test is not significant then we can assume that there are no car...