Should non-inferiority drug trials be banned altogether?


Drug Discovery Today • Volume 18, Numbers 11/12 • June 2013 REVIEWS

Grace Wangge1, Olaf H. Klungel1, Kit C.B. Roes2, Anthonius de Boer1, Arno W. Hoes2 and Mirjam J. Knol2,3

1 Division of Pharmacoepidemiology & Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, PO Box 80082, 3508TB, Utrecht, The Netherlands

2 Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, PO Box 85500, 3508GA Utrecht, The Netherlands

3 RIVM, National Institute for Public Health and the Environment, Centre for Infectious Disease Control Netherlands (CIb), Epidemiology and Surveillance Unit (EPI), PO Box 1, 3720BA Bilthoven, The Netherlands

Non-inferiority (NI) trials can be used in a situation when a new drug is expected to have a similar

efficacy to its comparator but can offer other advantages over the existing drug, such as a more

convenient method of administration or fewer side effects. Here, we discuss the advantages and

disadvantages of NI trials from an ethical, methodological and regulatory perspective. We suggest that

such trials should be designed to address simultaneously the objective of showing NI with regard to drug

efficacy and the objective of establishing superiority of the additional advantages of a drug over its active

comparator.

Introduction

In their article published in 2007, Garattini and Bertele condemned

non-inferiority (NI) trials as unethical because NI trials

do not have the intention to show that a new drug is better than

the standard drug and that the new drug might even be worse

than its comparator [1]. Garattini and Bertele went as far as to

suggest that the scientific community should ban NI (and equiva-

lence) trials, even when measures are taken to improve the meth-

odological problems inherent in NI trials. This publication was the

first paper to represent the voices of parties who do not agree with

or who question the method of NI trials.

Since that publication, the number of published NI trials has not

fallen; instead, it has increased to over 200 publications per year

(based on a PubMed-MEDLINE search conducted on 13 December 2012 with

the publication type keywords: non-inferior* OR ‘active control’ AND

(‘equivalence’ NOT ‘bioequivalence’) and ‘Randomized Controlled

Trial’). In addition, new guidelines on

NI trials have been released (i.e. the CONSORT statement extension

on NI trials 2006 [2] and the draft FDA guidelines on NI trials

2010 [3]). These guidelines indicate not only a growing interest in

Corresponding author: de Boer, A.

1359-6446/06/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.drudis.2013.01.003

NI trials but also continued efforts of overcoming the methodo-

logical challenges of NI trials. Here, we will discuss whether NI

trials are at all useful in drug research and for daily practice – based

on ethical, methodological and regulatory aspects.

Arguments for banning NI trials

The first reason to ban NI trials is that these trials do not have any

intention to show that a new drug is superior to an active standard

treatment. This is considered unethical [1]. NI trials put trial

subjects at risk of inconveniences and side effects, without an

intention to show benefit that leads to a superior drug on the

market. It is even possible that NI trials accept that the new drug

is somewhat less effective than its comparator, quantified by the NI

margin (i.e. the clinically acceptable lower limit of the 95% con-

fidence interval of the effect measured). For example, suppose a new

drug is expected to reduce the incidence of an adverse outcome

compared with placebo. Then, if an NI trial is to be performed

comparing this new drug versus an active control one typically

assumes a risk difference (RD) of zero (new-active) at the design

stage, or possibly even a slight advantage for the new drug (RD

negative). The NI margin, however, will be set at, for example, 10%,

with NI demonstrated if the upper limit of the 95% confidence


http://dx.doi.org/10.1016/j.drudis.2013.01.003


interval of new-active is below 10%. This does imply that one

accepts that the new treatment (at population level) might actually

be less effective, with the size of the difference with 95% confidence

limited to 10%.
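
The decision rule in this example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple Wald (normal-approximation) confidence interval for the risk difference; the function name `ni_risk_difference` and all event counts are hypothetical and are not taken from any trial discussed here.

```python
import math

def ni_risk_difference(events_new, n_new, events_active, n_active,
                       margin=0.10, z=1.96):
    """Wald 95% CI for the risk difference (new - active) and the NI verdict.

    The outcome is an adverse event, so a positive difference favours the
    active control. NI is concluded when the upper limit is below `margin`.
    """
    p_new = events_new / n_new
    p_active = events_active / n_active
    rd = p_new - p_active
    se = math.sqrt(p_new * (1 - p_new) / n_new
                   + p_active * (1 - p_active) / n_active)
    lower, upper = rd - z * se, rd + z * se
    return rd, lower, upper, upper < margin

# Hypothetical counts: 60/300 events on the new drug, 54/300 on the control.
rd, lower, upper, non_inferior = ni_risk_difference(60, 300, 54, 300)
print(f"RD = {rd:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), NI: {non_inferior}")

# Same observed risks with half the sample size: the interval widens and
# its upper limit no longer stays below the 10% margin.
print(ni_risk_difference(30, 150, 27, 150)[3])
```

In this sketch the observed difference (two percentage points in favour of the control) is identical in both calls; only the width of the confidence interval decides whether NI is concluded.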

The second reason for banning NI trials is the methodological

argument that an NI margin cannot be validly and objectively

determined. So far, most of the efforts to overcome the methodo-

logical challenges in NI trials have concentrated on this issue. An

NI margin is a clinically acceptable limit within which it can still

be concluded that the new drug is similar to or not worse than its

comparator. Theoretically, an NI margin should be chosen in such

a way that the new drug can be considered effective relative to

placebo (although a placebo-controlled patient group is not

included in an NI trial). This NI margin thus needs to account

for the uncertainty in the effect size of the active control versus

placebo.

Methods for determining the NI margin vary considerably. In

22% of publications of the 232 NI trials we reviewed in 2009, NI

margins were determined merely based on subjective (clinical)

considerations of the investigator. In 20 (8.7%) trials the NI

margins were copied from other publications or reviews, in 18

(7.7%) trials the NI margins were obtained following available

guidelines and in 17 (7.3%) trials the NI margins were determined

by the investigators based on data obtained from previous trials

[4]. In addition, we observed that clinical judgements and percep-

tions of the investigators play an important part in the process of

determining an NI margin. Importantly, however, such rather

subjective clinical judgement (together with statistical judgement)

has been acknowledged by regulators as the key step in determin-

ing NI margins [3,5] because it helps in preventing biocreep (see

fourth reason to ban NI trials below).

In one study we used an online survey to ask 25 experts

(including clinicians from academic and non-academic hospitals,

regulators and researchers from the pharmaceutical industry) to

choose an appropriate NI margin for a hypothetical NI trial on a

new oral anticoagulant indicated for prophylaxis of venous thromboembolic

events in post-orthopaedic surgery. Experts were asked to give

their preferred NI margin and the reasoning behind it in two

study sections: ‘before’ and ‘after’ additional information

on the statistical NI margin was presented. Initially, the median NI

margin of RD was 1.8% [interquartile range (IQR) 1–2%] and the

median NI margin of risk ratio (RR) was 1.3 (IQR 1.05–1.50). After

information on statistical consideration for the NI margin was

provided, the median NI margin of RD changed to 9% (IQR 7.7–

10.0%), whereas for RR it was 1.25 (IQR 1.2–1.5). Clear reasons

underlying the choice of NI margin were provided by only 60% of

the experts, even though additional information on the statistical

NI margin was presented.

The complexity of NI margin determination was also shown in

questions posed by applicants who requested scientific advice

from the European Medicines Agency (EMA). We conducted a

content analysis of 156 pieces of scientific advice (from 94 different

applicants) pertaining to NI trials requested from the EMA during 2008

and 2009. We found that questions on ‘how’ to conduct an NI trial

were more frequently asked by applicants than ‘whether an NI trial

should be conducted’ questions (74% versus 26%). The NI margin

was the topic most often asked about by the applicants (28% of the

total number of specific questions on NI trials). In addition, most


of the proposed NI margins from applicants were challenged or not

approved by the EMA. In 40% of its answers to questions about the

NI margin, the EMA recommended the use of a stricter margin, and

in 10% of its answers on the NI margin it questioned the

justification of the NI margin. This is remarkable

because the guidelines on how the NI margin should be deter-

mined were already available before 2008, such as the guideline on

the choice of the NI margin released by the Committee for

Medicinal Products for Human Use (CHMP) [5].

Furthermore, in determining an NI margin, an estimated treat-

ment effect between the active comparator and the placebo is used

and it should be accurate for the NI trial at hand. This is called the

constancy assumption. We found only 3.9% of published NI trials

discussed the constancy assumption [6]. To support the constancy

assumption, a proper meta-analysis can be conducted and similarity

between the current trial (for example similarity in the main

inclusion criteria) and the placebo-controlled trials used for setting

the NI margin should be demonstrated. Unfortunately, a meta-

analysis is not a perfect solution either, because it is not always

easy to decide which trials are similar ‘enough’ to be used for NI

margin determination [7].
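
One widely described way of turning such a meta-analysis into a margin is the fixed-margin approach: pool the historical placebo-controlled trials of the active comparator, take the limit of the pooled confidence interval closest to 'no effect' as the largest effect that can be assumed (often labelled M1), and set the NI margin so that a chosen fraction of that effect is preserved (M2). The sketch below assumes fixed-effect inverse-variance pooling on the log risk-ratio scale; the function name `fixed_margin`, the 50% preservation fraction and the three historical trials are hypothetical, and the code does nothing to verify the constancy assumption itself.

```python
import math

def fixed_margin(log_rr, se, preserve=0.5, z=1.96):
    """Pool log risk ratios (active vs placebo) with inverse-variance weights
    and derive a fixed NI margin on the risk-ratio scale (new vs active)."""
    weights = [1 / s ** 2 for s in se]
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Conservative limit: the end of the 95% CI closest to RR = 1.
    upper = pooled + z * pooled_se
    if upper >= 0:
        raise ValueError("effect of the active control is not established")
    m1 = math.exp(-upper)                    # whole conservative effect
    m2 = math.exp((1 - preserve) * -upper)   # margin keeping `preserve` of it
    return math.exp(pooled), m1, m2

# Hypothetical historical trials of the active comparator versus placebo.
historical_log_rr = [math.log(0.60), math.log(0.70), math.log(0.65)]
historical_se = [0.15, 0.20, 0.18]

pooled_rr, m1, m2 = fixed_margin(historical_log_rr, historical_se)
print(f"pooled RR = {pooled_rr:.2f}, M1 = {m1:.2f}, M2 = {m2:.2f}")
```

Under these assumptions, NI would be concluded if the upper confidence limit of the risk ratio (new versus active) in the NI trial stays below M2. Which historical trials enter the pooled estimate remains a matter of judgement, which is exactly the difficulty described above.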

The third reason to ban NI trials is related to the fact that each NI

trial conclusion depends on external considerations that cannot

be verified by the NI trial itself. A drug is considered effective if it

shows a significant treatment effect compared with placebo. Thus,

each clinical trial should have the ability to distinguish an effective

treatment from an ineffective treatment – this is called assay

sensitivity. In a superiority trial a significant difference between

two treatments directly confirms assay sensitivity. In an NI trial the

efficacy of both drugs over placebo is not directly shown. A result

of NI can be interpreted as both drugs being effective, but it could

also mean that both drugs were ineffective (i.e. similar to placebo).

To prove assay sensitivity investigators should ideally include a

placebo arm in the NI trial. At the very least, investigators should

discuss how they arrived at the conclusion that the trial had assay

sensitivity, for example by discussing the results of all placebo-

controlled trials of the active comparator, in the NI trial publica-

tions. Without such discussion, readers cannot reliably judge

whether the conclusions from the trial are valid and relevant

for treatment decisions. We observed that only 6% of NI trials

included a placebo arm to evaluate assay sensitivity and none

discussed assay sensitivity [6].

The fourth reason to ban NI trials is the threat of biocreep [8,9].

Biocreep can be explained as follows: after an NI trial a new

therapy could be accepted as effective, even if its treatment effect is

slightly smaller than the current standard. It is therefore possible

that, after a series of trials where the new therapy is slightly worse

than the preceding drugs, an ineffective or harmful therapy might

be incorrectly declared efficacious [10].
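
The arithmetic behind this concern can be made explicit with a deliberately simplified, deterministic sketch. It assumes that every new drug is truly worse than its predecessor by a fixed amount smaller than the NI margin and that each becomes the comparator for the next trial; sampling variability is ignored, and the function name `biocreep` and all numbers are hypothetical.

```python
def biocreep(effect_vs_placebo, loss_per_generation, margin, generations):
    """Track the true effect versus placebo when each new drug is slightly
    worse than its predecessor but still within the NI margin."""
    if loss_per_generation >= margin:
        raise ValueError("loss must be smaller than the NI margin")
    effects = [effect_vs_placebo]
    for _ in range(generations):
        effects.append(effects[-1] - loss_per_generation)
    return effects

# Hypothetical: the original standard reduces risk by 10 percentage points
# versus placebo; each successor is truly 2.5 points worse (margin: 4 points).
print(biocreep(10.0, 2.5, 4.0, 4))  # [10.0, 7.5, 5.0, 2.5, 0.0]
```

In this worst-case illustration the fourth successor is no better than placebo, although every single step stayed within the margin.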

The final reason why we should consider banning NI trials is the

issue of additional benefit. One of the reasons why NI trials are

performed is because the new drug could offer advantages over the

active comparator drug, such as a more attractive method of

administration (e.g. oral instead of intravenous) or a superior

safety profile. Ideally, a new drug should be proven superior to

its active comparator on the additional benefits claimed.

This can be done in an independent superiority trial or in combi-

nation with an NI trial that aims to prove NI of the drug’s intended


effect. In the latter situation, the trial should have sufficient power

to prove the NI of the new drug’s intended effect and superiority

for the additional benefit. In that sense, we do not need NI trials in

the strict sense, because there is always a superiority counterpart in

a ‘so-called’ NI trial, namely for the additional benefit.

Arguments against banning NI trials

The first argument against banning NI trials is that we need a few

drugs with similar efficacy on the market, so that patients, doctors

and third-party payers have more alternatives [11–14]. NI trials

provide an opportunity to test these alternative drugs, although

one could argue which and how many drugs are clinically neces-

sary as alternatives. Ideally, doctors wish to choose from various

drugs with similar efficacy, but with various clinically important

additional benefits. In a review of 41 Phase IV NI trials we found

that, among 25 industry-initiated trials, 14 NI trials claimed multi-

ple additional benefits, whereas five out of 12 non-industry-

initiated trials claimed the single additional benefit of ‘better

safety profile’ (in four trials the type of sponsor was not clear).

The variety of these additional benefit claims seems encouraging.

It might be a sign of how the industry aims to answer the need for

alternative drug options, although there are multiple examples

from the past of claimed benefits that turned out to be irrelevant

for patients. Today, in some countries reimbursement decisions

are dependent upon the added value of a new drug in clinical

practice [15].

The second argument here is the possibility of overcoming the

main methodological limitation of NI trials (i.e. the difficulty in

determining NI margins). It is clear from the previous section that

the main issues in determining an NI margin are related to assay

sensitivity and constancy assumption. Assessment of assay sensi-

tivity relies heavily on clinical judgement. The use of data from

similar but ‘outdated’ placebo-controlled trials might not be

avoidable, but with sufficient knowledge on the current evidence

base of the drugs and the disease itself the size of the estimated

treatment effect between the active comparator and the placebo

can be more accurately defined. In addition, this could lead to

consensus between the investigators themselves that a specific NI

margin is clinically acceptable. Thus, we also need to incorporate

clinical judgement to determine an NI margin. The fear of biocreep

seems somewhat overstated [10,16], indicating that clinical judgement

might have prevented the drugs tested in NI trials from gradually

moving towards less-effective treatments.

The subjectivity in clinical judgement should not be a major

concern because it is not solely a problem of NI trials. Planning of

superiority trials is also not free from subjectivity. Defining the

smallest difference to be detected in superiority trials depends on

the experience and the perspective (individual, professional or

societal) of the investigators. It could also depend on feasibility

grounds [17,18]. Efforts to reduce subjectivity have been studied

more extensively in the field of social science and psychology [19].

In clinical trials, similar methodologies could be applied. These

efforts include patient perception and use of a systematic scoring

system in defining a minimal clinically important difference

[18,20]. Thus, there is still room for improvement in the metho-

dology of NI trials.

The third argument not to ban NI trials is the existence of many

regulatory guidelines that can act as a safety net for NI trials. The

first regulatory guidance on NI trials in drugs was the ‘guideline on

the evaluation of medicinal products indicated for treatment of

bacterial infections’ which was released by the European regulators

in 1995 [21]. It stated that each trial that is indicated for the

treatment or prophylaxis of infection should be adequately pow-

ered to show at least NI to an acceptable active comparative

regimen or superiority to placebo (whenever considered to be

possible) or, possibly, both. In addition, it mentioned the use of

an NI margin of 10% for anti-infective agents. This guideline

was later followed by similar guidelines in other therapeutic areas,

such as for antidiabetic drugs [22,23]. These guidelines have been

revised recently. In 2011 the 10% NI margin was no longer men-

tioned in the anti-infective agent guideline [24]. In addition,

general guidelines, such as the CHMP guideline on NI trials and

draft FDA guideline on NI trials, are available [3,5]. Beyond these

guidelines, specific issues in NI trials can also be solved with

dialogue between regulators and investigators and/or sponsors,

such as via a scientific advice procedure [25] or pre-investigational-

new-drug (IND) consultations [26].

Lastly, the argument that one should not enrol subjects into a

trial with a primary objective that implies that a drug does not

have additional benefit compared with the control can also be

refuted. In general, trials with new drugs are (also) designed to

provide evidence that the new drug has additional benefit (such as

mode of administration). In these circumstances, we still need an

NI part of the study that assesses NI of the primary efficacy out-

come.

A trial that is designed to pursue an NI objective for primary

efficacy as well as superiority of a designated benefit simulta-

neously might not be straightforward. A potential approach to

arrive at a useful design could be along the following lines: a first

step is to define the objectives of the trial in terms of when the trial

will be declared successful, for example when NI on the primary

endpoint and superiority on a targeted benefit can be concluded.

Subsequently, a testing strategy can be determined that will pre-

serve the overall type 1 error (in the strong sense) for declaring this

success (in the example, the probability of concluding success when either

NI or superiority or both do not hold). Sample size and power

calculations will be a challenge because they do not only depend

on the separate outcome measures for primary efficacy and the

target benefit but also on their correlation and there might not be

data available to assess this correlation. However, when the success

is defined as achieving both objectives, in general no type 1 error

correction per endpoint is needed (co-primary endpoints). In most

cases, the NI objective will require the largest sample size and

hence the impact of this strategy on total sample size could be

limited and render this type of approach feasible in practice. A

comprehensive assessment of the characteristics of such designs is

beyond the scope of the present paper, but can be undertaken by

investigators when preparing for a specific trial.
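
The role of the correlation, and the claim that no per-endpoint correction is needed when success requires both objectives, can be explored with a small simulation. The sketch below assumes that the two test statistics are approximately bivariate normal and that each is tested one-sided at 2.5%; the function name `joint_success_rate`, the shifts and the correlations are hypothetical choices for illustration only.

```python
import math
import random

def joint_success_rate(shift_ni, shift_sup, rho, n_sim=100_000,
                       z_crit=1.96, seed=1):
    """Monte Carlo estimate of P(both tests significant) for two correlated,
    approximately normal test statistics.

    shift_ni / shift_sup: expected value of each z-statistic (0 means the
    corresponding null hypothesis is true); rho: their correlation.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        a = rng.gauss(0, 1)
        b = rng.gauss(0, 1)
        z_ni = shift_ni + a
        z_sup = shift_sup + rho * a + math.sqrt(1 - rho ** 2) * b
        if z_ni > z_crit and z_sup > z_crit:
            wins += 1
    return wins / n_sim

# Each test alone has about 80% power when its z-statistic is centred at 2.8.
power_independent = joint_success_rate(2.8, 2.8, rho=0.0)
power_correlated = joint_success_rate(2.8, 2.8, rho=0.8)
# NI does not hold (shift 0) although the benefit is large: false "success".
false_success = joint_success_rate(0.0, 5.0, rho=0.0)
print(power_independent, power_correlated, false_success)
```

Under these assumptions the joint power with independent endpoints is close to the product of the two marginal powers and rises with positive correlation, while the chance of declaring success when NI does not hold stays at about the unadjusted 2.5% level.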

Concluding remarks

Although NI trials can be criticised based on ethical, methodological

and regulatory arguments, NI trials should not be banned. The

main reason to ban NI trials is the ethical concern about

exposing patients to drugs without the intention to show that these

drugs have additional benefit. However, even when one believes

that showing superiority on an outcome is an integral part of all


trials, an NI aim for another outcome could be important. In

addition, there is still ample room to improve the determination

of the NI margin. To support it, dialogue with regulators to solve

specific issues in NI trials could be improved, for example through

scientific advice. To ban NI trials altogether could hinder the

development of alternative drug therapies. The new oral anticoa-

gulant drugs could serve as an example, irrespective of whether


these drugs eventually prove to be an acceptable or even preferable

alternative to current standard treatment [27,28].

Furthermore, we hope this comprehensive discussion can trig-

ger further research on how (NI) trials could be designed to address

simultaneously the objectives of showing NI with regard to effi-

cacy and establishing superiority of the additional advantages over

active comparators.

References

1 Garattini, S. and Bertele, V. (2007) Non-inferiority trials are unethical because they

disregard patients’ interests. Lancet 370, 1875–1877

2 Piaggio, G. (2006) Reporting of noninferiority and equivalence randomized trials:

an extension of the CONSORT statement. J. Am. Med. Assoc. 295, 1152–1160

3 Center for Drug Evaluation and Research (CDER) and Center for Biologics

Evaluation and Research (CBER), (2010) Draft Guidance: Guidance for Industry Non-

Inferiority Clinical Trials 2010.

4 Wangge, G. et al. (2010) Room for improvement in conducting and reporting non-

inferiority randomized controlled trials on drugs: a systematic review. PLoS ONE 5,

e13550

5 Committee for Medicinal Products for Human Use (CHMP), (2005) Guideline on the

Choice of the Non-inferiority Margin. EMEA/CPMP/EWP/2158/99 2005.

6 Wangge, G. et al. (2010) Interpretation and inference in noninferiority randomized

controlled trials in drug research. Clin. Pharmacol. Ther. 88, 420–423

7 Wangge, G. et al. (2012) The challenges of determining noninferiority margins: a

case study of noninferiority randomized controlled trials of novel oral

anticoagulants. Can. Med. Assoc. J. [Epub ahead of print]

8 D’Agostino, R.B.S. et al. (2003) Non-inferiority trials: design concepts and issues –

the encounters of academic consultants in statistics. Stat. Med. 22, 169–186

9 Fleming, T.R. (2008) Current issues in non-inferiority trials. Stat. Med. 27, 317–332

10 Everson-Stewart, S. and Emerson, S.S. (2010) Bio-creep in non-inferiority clinical

trials. Stat. Med. 29, 2769–2780

11 Nunn, A.J. et al. (2008) The ethics of non-inferiority trials. Lancet 371, 895

12 Chuang-Stein, C. et al. (2008) The ethics of non-inferiority trials. Lancet 371,

895–896

13 Gandjour, A. (2008) The ethics of non-inferiority trials. Lancet 371, 895

14 Soliman, E.Z. (2008) The ethics of non-inferiority trials. Lancet 371, 895

15 Wonder, M. et al. (2012) Australian managed entry scheme: a new manageable

process for the reimbursement of new medicines? Value Health 15, 586–590

16 Beryl, P. and Vach, W. (2011) Is there a danger of “biocreep” with non-inferiority

trials? Trials 12 (Suppl. 1), A29

17 Gayet-Ageron, A. et al. (2010) What differences are detected by superiority trials or

ruled out by noninferiority trials? A cross-sectional study on a random sample of

two-hundred two-arms parallel group randomized clinical trials. BMC Med. Res.

Methodol. 10, 93

18 Man-Son-Hing, M. et al. (2002) Determination of the clinical importance of study

results. J. Gen. Intern. Med. 17, 469–476

19 Peterson, L., “Clinical” Significance: “Clinical” Significance and “Practical”

Significance are NOT the Same Things, Paper presented at the Annual Meeting

of the Southwest Educational Research Association (New Orleans, LA, Feb 7,

2008)

20 Campbell, T.C. (2005) An Introduction to clinical significance: an alternative index

of intervention effect for group experimental designs. J. Early Intervention 27,

210–227

21 Committee for Proprietary Medicinal Products (CPMP), (1995) Guideline on the

Evaluation of Medicinal Products Indicated for Treatment of Bacterial Infections.

22 Committee for Medicinal Products for Human Use (CHMP), (2011) Guideline on

Clinical Investigation of Medicinal Products in the Treatment of Diabetes Mellitus. CPMP/

EWP/1080/00 Rev 1*.

23 Center for Drug Evaluation and Research (CDER), (2008) Diabetes Mellitus:

Developing Drugs and Therapeutic Biologics for Treatment and Prevention Draft Guidance.

24 Committee for Medicinal Products for Human Use (CHMP), (2011) Guideline on the

Evaluation of Medicinal Products Indicated for Treatment of Bacterial Infections. CPMP/

EWP/558/95 Rev 2.

25 European Medicines Agency, (2010) European Medicines Agency Guidance for

Companies Requesting Scientific Advice and Protocol Assistance. EMEA-H-4260-01-Rev 6.

26 Pre-IND Consultation Program. Available at: http://www.fda.gov/Drugs/DevelopmentApprovalProcess/HowDrugsareDevelopedandApproved/ApprovalApplications/InvestigationalNewDrugINDApplication/Overview/default.htm (accessed 05.09.12)

27 Miller, C.S. et al. (2012) Meta-analysis of efficacy and safety of new oral

anticoagulants (dabigatran, rivaroxaban, apixaban) versus warfarin in patients with

atrial fibrillation. Am. J. Cardiol. 110, 453–460

28 Sobieraj, D.M. et al. (2012) Comparative effectiveness of low-molecular-weight

heparins versus other anticoagulants in major orthopedic surgery: a systematic

review and meta-analysis. Pharmacotherapy 32, 799–808
