PHARMACEUTICAL STATISTICS
Pharmaceut. Statist. 2003; 2: 241–251 (DOI: 10.1002/pst.063)

Issues in applying recent CPMP ‘Points to Consider’ and FDA guidance documents with biostatistical implications

Alan Phillips1,*,†, Alan Ebbutt2, Lesley France3, David Morgan4, Mick Ireson5, Lesley Struthers6 and Guenter Heimann7

1 Wyeth Research, Collegeville, PA, USA
2 GlaxoSmithKline, Greenford, Middlesex, UK
3 AstraZeneca, Macclesfield, Cheshire, UK
4 Ingenix Pharmaceutical Services, Maidenhead, Berks, UK
5 GlaxoSmithKline, Harlow, Essex, UK
6 Roche Products Ltd, Welwyn Garden City, Hertfordshire, UK
7 Pfizer Global Research and Development, Sandwich, Kent, UK

The International Conference on Harmonisation guideline ‘Statistical Principles for Clinical Trials’ was adopted by the Committee for Proprietary Medicinal Products (CPMP) in March 1998, and consequently is operational in Europe. Since then more detailed guidance on selected topics has been issued by the CPMP in the form of ‘Points to Consider’ documents. The intent of these was to give guidance particularly to non-statistical reviewers within regulatory authorities, although of course they also provide a good source of information for pharmaceutical industry statisticians. In addition, the Food and Drug Administration has recently issued a draft guideline on data monitoring committees. In November 2002 a one-day discussion forum was held in London by Statisticians in the Pharmaceutical Industry (PSI). The aim of the meeting was to discuss how statisticians were responding to some of the issues covered in these new guidelines, and to document consensus views where they existed. The forum was attended by industry, academic and regulatory statisticians. This paper outlines the questions raised, resulting discussions and consensus views reached. It is clear from the guidelines and discussions at the workshop that the statistical analysis strategy must be planned during the design phase of a clinical trial and carefully documented. Once the study is complete the analysis strategy should be thoughtfully executed and the findings reported. Copyright © 2003 John Wiley & Sons, Ltd.

Keywords: biostatistical regulatory guidelines; non-inferiority; meta-analysis; one pivotal trial; missing data; multiplicity; covariates; data monitoring committees

*Correspondence to: Alan Phillips, ICON Clinical Research, 2 Globeside, Globeside Business Park, Marlow, Buckinghamshire, SL7 1TB, UK.
† E-mail: [email protected]


BACKGROUND

The application of biostatistics in clinical trial design and analysis is recognized world-wide as being essential. Over the last 10–20 years the increasing importance of biostatistical methodology has led to a rapid increase in biostatistical resources in the pharmaceutical industry in Europe, the USA and elsewhere. Moreover, to provide direction to sponsors in the design, conduct, analysis and evaluation of clinical trials, regulatory authorities have issued guidelines partially or entirely concerned with biostatistics [1–3].

Arguably the most important biostatistical regulatory guideline is the International Conference on Harmonisation (ICH) E9 guideline on ‘Statistical Principles for Clinical Trials’ [4]. The guideline was developed by an ICH expert working group. The group utilized the Committee for Proprietary Medicinal Products (CPMP) biostatistical guideline [1] as a starting point, but was also influenced by the US Food and Drug Administration (FDA) [2] and Japanese Ministry of Health and Welfare [3] guidelines. The ICH E9 guideline was adopted by the CPMP in March 1998. It also became operational in the USA and Japan later in the same year.

Since March 1998 more detailed guidance on selected topics has been issued by the CPMP in the form of ‘Points to Consider’ documents. Guidance has been provided on: switching between superiority and non-inferiority [5]; meta-analysis and one pivotal study [6]; missing data [7]; adjustment for multiplicity [8]; and adjustment for baseline covariates [9]. The primary intent of these documents was to give guidance to non-statistical reviewers within regulatory authorities. However, they also provide a good source of information for pharmaceutical industry statisticians. Apart from the adjustment for baseline covariates document [9], all of the guidance documents were adopted by the CPMP by September 2002. The document on adjustment for baseline covariates was released for consultation in December 2001. In addition, in 2001 the FDA issued a draft guideline on the ‘Establishment and Operation of Clinical Data Monitoring Committees’ [10].

Experienced statisticians from the regulatory authorities were involved in drafting these guidance documents. Furthermore, a draft version of each document was made available to stakeholders, including statisticians, for consultation. However, as with all guidelines, the documents are open to interpretation, and so many industry statisticians have started to consider how to apply the advice outlined in the documents in their day-to-day work. To establish a consensus on how to ensure that clinical trials meet the desired standard, a one-day discussion forum was held in London in November 2002. The meeting was arranged by Statisticians in the Pharmaceutical Industry (PSI), a UK-based professional association of statisticians interested in the application of statistics in the pharmaceutical industry. The forum was similar to a previous PSI meeting held in October 1998 on ICH E9. The findings from that meeting are discussed by Phillips et al. [11]. In the present paper, the issues raised during the November 2002 meeting and the resulting discussion are recorded. We record some of the different points of view identified, and document where consensus was reached.

FORMAT OF DISCUSSION FORUM

The one-day discussion forum was attended by approximately 50 UK and mainland European delegates from many pharmaceutical companies, contract research organizations (CROs), academia and the Medicines Control Agency (MCA). No US representatives were present at the forum.

The forum comprised six workshops, each focusing on one CPMP ‘Points to Consider’ document or the FDA draft guideline on data monitoring committees (DMCs). Each workshop was chaired by an experienced statistician from a pharmaceutical company. Prior to the meeting, delegates were asked to identify issues for discussion. These were used to construct a series of questions for the workshops. The remainder of this paper documents the questions raised, resulting discussions and any consensus reached. After a brief review of the issues associated with each workshop, the major questions raised are listed, immediately followed by a summary of the discussion and agreements.

SWITCHING BETWEEN SUPERIORITY AND NON-INFERIORITY

The use and interpretation of superiority, non-inferiority and equivalence trials in clinical development programmes are well described in the ICH E9 and E10 guidelines [4,12]. However, neither guideline addresses difficulties relating to switching from one design objective to another at the time of the analysis. To provide further guidance, a specific CPMP ‘Points to Consider’ document was developed on the topic from the perspective of an efficacy trial with a single primary variable.

Q1. It seems relatively straightforward to switch from equivalence or non-inferiority to superiority. Consequently, should we always plan for equivalence or non-inferiority, then we can never lose?

If demonstrating non-inferiority is sufficient to achieve regulatory approval, then this was deemed to be an appropriate strategy. However, there are recognized issues with non-inferiority, such as the need for an acceptable comparator, a definable non-inferiority margin, a clearly identifiable primary analysis population and the need for assay sensitivity, as discussed in ICH E10. A consequence of this is that it would be necessary to specify both the non-inferiority margins and the clinically relevant difference for superiority in the protocol. If regulatory approval requires a demonstration of superiority, there is no value in starting with non-inferiority.

Q2. Where in a protocol (and how) should it be stated that the purpose of a trial is to demonstrate non-inferiority – in the statistical section or explicitly in the study objectives?

The regulatory statisticians in attendance pointed out that, from their perspective, it is sufficient to state it in the statistical section, as they will always read this carefully. However, from a company perspective it is important to ensure that all members of a project team understand the objective of the study, and not everyone may read the statistical section thoroughly.

It was noted that some clinical staff are wary of stating the objective of a study as ‘to demonstrate non-inferiority’, as it may appear to be prejudging the results: ‘assess non-inferiority’ might be better phrasing. An alternative suggestion was to use a more neutral phrase such as ‘assess the difference’ in the objectives, give a technical description of non-inferiority in the hypothesis, and in the statistical section specify non-inferiority and state limits and power explicitly.

Q3. For non-inferiority trials, intent-to-treat and per-protocol populations are stated to be of equal importance. Which should be chosen for the sample size calculation?

Both populations are relevant so that the conclusions can be seen to be robust. One approach would be to estimate the sample size for both situations and use the higher number to be conservative.

In general, the question was felt unlikely to be important. In practice, the variability in the two populations is unlikely to be very different. If a large number of patients are excluded from the per-protocol population, this in itself may cast doubt on the credibility of the study results.

One exception to the above worth noting is antibiotic trials, where the percentage of clinically evaluable patients can vary anywhere between 50% and 90% in different trials. Consequently, the number of patients in the intent-to-treat population can be quite different from that in the per-protocol population. However, this is expected at the design stage and so, in such cases, the sample sizes are typically adjusted to accommodate the varying evaluability rates. Nevertheless, it is important that the inferences drawn from both populations are similar; that is, the conclusions are not dependent on the analysis population.
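The ‘size for both populations and take the larger number’ suggestion can be sketched with the usual normal-approximation formula for a non-inferiority comparison of means. The standard deviations and margin below are purely illustrative, not taken from the paper:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.025, power=0.90):
    """Approximate per-group sample size for showing non-inferiority
    with margin delta when the true difference is zero, using the
    normal approximation with one-sided significance level alpha."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha) + z(power)) ** 2 * sigma ** 2 / delta ** 2)

# Illustrative: SD 10 in the intent-to-treat population, SD 12 in the
# (assumed more variable) per-protocol population, margin 5.
n_itt = n_per_group(sigma=10, delta=5)
n_pp = n_per_group(sigma=12, delta=5)
n_conservative = max(n_itt, n_pp)  # the higher of the two, to be safe
```

Taking the maximum is the conservative choice discussed above; in practice the two numbers will often be close.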

Q4. Which population should be used for secondary endpoints, when the objective of the trial is to demonstrate non-inferiority in the primary endpoint?

Delegates felt that the intent-to-treat population may be more appropriate for secondary endpoints, unless secondary non-inferiority hypotheses (with non-inferiority limits) were clearly prespecified. However, it was noted that little extra effort was needed to run additional sensitivity analyses on both populations.

Q5. There is still some debate about whether to use two-sided 90% or 95% confidence intervals for equivalence. Should we use one- or two-sided confidence intervals for non-inferiority?

The regulatory view is that clinical efficacy has historically been assessed using two-sided tests at the 5% significance level, or effectively one-sided at the 2.5% level in the usual case where only the tail demonstrating superiority is of interest. Thus there seemed to be consensus that two-sided 95% or one-sided 97.5% confidence intervals are appropriate for assessing non-inferiority. From a practical perspective it was felt there is little to choose between the one-sided and two-sided intervals, though there was a general feeling that two-sided intervals allowed the data to ‘speak for themselves’ regardless of the prior thoughts of the experimenter.
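The agreed convention can be made concrete for a normally distributed effect estimate: build a two-sided 95% interval and conclude non-inferiority if its lower limit stays above minus the margin, which is equivalent to a one-sided test at 2.5%. A minimal sketch, with illustrative names and numbers:

```python
from statistics import NormalDist

def noninferiority_ci(diff, se, margin, level=0.95):
    """Two-sided confidence interval for (test - control).
    Non-inferiority is concluded if the lower limit lies above
    -margin; this matches a one-sided test at (1 - level) / 2."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    lower, upper = diff - z * se, diff + z * se
    return (lower, upper), lower > -margin
```

For example, an observed difference of 0.5 with standard error 1.0 and margin 3.0 gives a lower limit of about -1.46, comfortably above -3.0, so non-inferiority would be concluded.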

Q6. What role, if any, does significance testing play in non-inferiority studies? Do p-values need to be reported for such studies? To which hypothesis should they refer?

In general the confidence interval was felt to give all the relevant information. If a p-value is to be quoted, it was agreed that it should be stated clearly which hypothesis it refers to. That is, is it the probability of observing the results under the usual null hypothesis of equality of means (which is not actually being tested in a non-inferiority trial), or the probability of observing the results under the actual null hypothesis of A being inferior to B, evaluated at the non-inferiority limit? The latter is a more relevant p-value, but is perhaps more difficult to interpret and is certainly less widely used. However, some delegates saw value in quoting it as a measure of the ‘strength of evidence’.
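The two candidate p-values distinguished in the discussion can be written down explicitly for a normally distributed estimate. This is a sketch under that assumption; the function and argument names are illustrative:

```python
from statistics import NormalDist

def two_pvalues(diff, se, margin):
    """The two p-values discussed above for an estimate diff with
    standard error se: (a) the conventional two-sided test of
    equality of means, and (b) the one-sided test of the actual
    non-inferiority null H0: diff <= -margin, evaluated at the
    non-inferiority limit."""
    cdf = NormalDist().cdf
    p_equality = 2 * (1 - cdf(abs(diff) / se))
    p_noninferiority = 1 - cdf((diff + margin) / se)
    return p_equality, p_noninferiority
```

With a small observed difference and a generous margin the two can point in opposite directions: the equality test is far from significant while the shifted-null test is highly significant, which is exactly why the discussion asked that the hypothesis be stated.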

Q7. How should the margin of equivalence (delta) be chosen?

The ‘Points to Consider’ document on choice of delta is awaited with anticipation. It was recognized that the actual choice of delta would be very much dependent on the indication in question. However, a general philosophy needs to be developed.

The basic principles to be noted are that sponsors are responsible for defining the non-inferiority limit, which they should document in the protocol, with justification given for the limits selected. Consultation with regulatory authorities would be prudent before embarking on the trial, to ensure that a conclusion of non-inferiority on that basis would be acceptable.

META-ANALYSIS AND ONE PIVOTAL STUDY

ICH E9 clearly indicates that an ordered programme of clinical trials, each with its own specific objectives, is needed to find out whether a drug is efficacious and safe. Nevertheless, there are cases where it may be necessary to rely on the results of one study or a meta-analysis of several studies.

Q1. Is there harmony between the European Agency for the Evaluation of Medicinal Products (EMEA) and the FDA regarding one pivotal trial?

Delegates felt that there does seem to be general acknowledgement of circumstances where confirmatory evidence from a single pivotal trial will be acceptable to regulatory authorities. It was felt likely that the same answer would be given to the question ‘Is one study enough?’ from both sides of the Atlantic, as outlined in the FDA guideline [13], Schultz [14] and Fisher [15]. A situation was discussed where the FDA had been happy to accept a submission based on a single superiority trial, but would not accept an argument based on a single non-inferiority trial. The guidance from the ‘Points to Consider’ document had been useful in discussions with the FDA. Some had found it perplexing in assessing appropriate circumstances (and harmony) at the planning stage, but in practice no conflicts were noted by delegates at the meeting.

Q2. What is a ‘compelling’ p-value for a single pivotal trial?

Delegates agreed with the non-prescriptive nature of the ‘Points to Consider’ document in relation to p-values. There was consensus that sponsors and regulators should judge ‘overwhelming’ evidence, although account does need to be taken of an FDA ‘p-value’ culture.

One approach to determining a ‘compelling’ p-value might be to compare the Type I risk in a single trial with the combined Type I risk from two successful independent trials. This gives a value of 0.00125 as a potential ‘compelling’ p-value for a two-sided single clinical trial. Some delegates confirmed that p-values less extreme than 0.00125 had been found persuasive, particularly where previous regulatory practice (e.g. in certain areas of oncology) had been to accept single trials. That is, regulators have considered and agreed to a more pragmatic Type I risk in some situations, somewhere between 0.00125 and 0.05.
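The 0.00125 figure follows from squaring the one-sided false-positive risk of a single successful trial and converting back to the two-sided scale:

```python
# Each of two independent trials must be significant one-sided at 0.025
# (the one-sided equivalent of a two-sided 5% test in the direction of
# benefit).  The chance of both succeeding by chance alone is the
# product, and doubling converts back to the two-sided scale.
alpha_one_sided = 0.025
combined_one_sided = alpha_one_sided ** 2      # 0.000625
compelling_two_sided = 2 * combined_one_sided  # 0.00125, as quoted above
```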

During the discussion, concern was expressed as to whether ethics committees or institutional review boards and investigators would be comfortable with very small p-values being required, but it was noted that a single study was in general more efficient, and experience was cited where such studies had been acceptable.

Q3. How should meta-analysis be viewed – pivotal or to the ‘rescue’?

Delegates confirmed that, as per the ICH E9 guideline and the ‘Points to Consider’ document, the body of evidence from an ordered programme of studies was seen as important by regulators. A meta-analysis was recognized as valuable in providing more precise estimates of treatment effects, but even a prospective meta-analysis would not be expected to be the ‘only’ support.
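The ‘more precise estimates’ point is usually delivered by inverse-variance (fixed-effect) pooling of the per-trial estimates. A minimal sketch, assuming approximately normal estimates with known standard errors (names illustrative):

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_meta(estimates, ses, level=0.95):
    """Fixed-effect meta-analysis: weight each trial's treatment
    effect by 1/SE^2, so the pooled estimate has a smaller standard
    error than any single trial's."""
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = 1 / sqrt(sum(weights))
    z = NormalDist().inv_cdf((1 + level) / 2)
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)
```

Pooling two trials with equal standard errors shrinks the standard error by a factor of sqrt(2), which is the gain in precision referred to above.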

Section II.1.3 of the ‘Points to Consider’ document makes it clear that a retrospective meta-analysis would not provide sufficient evidence for a claim where one study is positive but the other is inconclusive or negative. However, it could be influential where there are, for example, two positive studies, two non-significant with a positive trend and one neutral. Although in agreement with the spirit of this, there was concern that a ‘rule’ could not cover all situations – for example, combining a very convincing study with an ‘almost’ convincing study might, in some situations, provide good evidence.

Q4. Can meta-analyses help in relation to safety, secondary endpoints, summaries and subgroups?

There does not seem to be an explicit and clear expectation by regulators of a formal meta-analysis forming part of an expert report or other summaries, but there was recognition of the value of a more formal, ‘objective’ approach to these summaries.

There was diversity of views on whether it was beneficial to apply formal meta-analysis techniques to analyse safety data on a routine basis, though it was recognized that safety data are ‘underanalysed’ in comparison with efficacy data. A prespecified safety meta-analysis for certain ‘anticipated’ events was thought to be useful. It should not be used for uncontrolled trials.

The extra power from meta-analyses might give additional insight from secondary endpoints or subgroups. How ‘conclusive’ these could be would depend on many factors, and it was felt that they would need to be viewed on a case-by-case basis.

Q5. Are the prerequisites of retrospective meta-analyses (Section II.1.3 of the ‘Points to Consider’ document) too severe?

It was concluded that it may be too severe to mandate a lack of statistical significance for heterogeneity, particularly as the general tone of the ‘Points to Consider’ document is to take a reasoned line. However, one explanation for the stance was that the document may be recognizing that, due to low power, any statistically detected heterogeneity is likely to be of a size that will be clinically significant.
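The heterogeneity test at issue is usually Cochran's Q, referred to a chi-square distribution on k-1 degrees of freedom; with only a handful of trials it has low power, which is the point being made above. A sketch (illustrative, not prescribed by the document):

```python
def cochran_q(estimates, ses):
    """Cochran's Q heterogeneity statistic across k trial-level
    treatment-effect estimates: the weighted sum of squared
    deviations from the inverse-variance pooled estimate,
    compared with a chi-square on k-1 degrees of freedom."""
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
```

Because k is small in a typical development programme, Q must be quite large before it reaches significance, so any significant heterogeneity tends to reflect effect differences big enough to matter clinically.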

MISSING DATA

As outlined in ICH E9, missing data are a potential source of bias in a clinical trial. Hence, every effort should be undertaken to fulfil all the requirements of the protocol concerning the collection and management of data. In reality, however, there will almost always be some missing data. A trial may nonetheless be regarded as valid, provided the method of dealing with any missing data is sensible and predefined in the protocol. It is also important to define sensitivity analyses so that the impact of the missing data on trial results can be understood. When the amount of missing data is substantial, the interpretation of the results of a trial may become problematic.

Q1. Conflict between the results of various sensitivity analyses is not uncommon. How can this be planned for, and how can such results be explained?

The general consensus was that more use should be made of sensitivity analyses in regulatory submissions to investigate departures from any assumptions made in the modelling of the missing data. This is especially true for European submissions, where only a small number of statisticians are employed by the various regulatory agencies, and consequently they have little opportunity to conduct additional analyses themselves.

Delegates agreed that it is important to describe the intended method for handling missing data in the protocol or in the analysis plan, so that there is evidence of methods being selected before the treatment blind is broken. In some instances it may not be possible to predict the extent of, and reasons for, missing data at an early stage, and it was agreed that it should be acceptable to make further decisions as part of a blind review. However, it is important to clarify in the trial report when decisions were made. When the results of different sensitivity analyses are not similar, it is important to discuss and explain the reasons and the impact as far as possible.

Q2. Is there a preference for simple imputation techniques (e.g. Last Observation Carried Forward (LOCF)) plus sensitivity analyses rather than more complex techniques (multiple imputation or maximum likelihood)?

It was agreed that there was no single answer to this question. It was likely to be dependent on the disease, the stage of drug development, the type of treatment benefit expected and the amount of missing data. The benefits of a simple approach were that it could easily be explained to non-statisticians and also that the likely biases introduced could be understood. However, simple approaches could lead to inappropriate conclusions – for example, if LOCF was used but the treatment effect was not constant over time, more sophisticated approaches might then be preferred. It was necessary to specify the intended approach in the protocol, and in difficult situations to seek advice in advance from regulatory authorities.

Q3. The guideline mentions very conservative options, such as replacing missing data with the worst case for active and the best case for comparator. What regulatory experience is there with these approaches?

The consensus was that the primary analysis in a trial was usually based on a realistic assumption about missing data rather than an extremely conservative assumption, and this was accepted by regulators. Such an approach leads to realistic estimates of treatment effects and has only a small effect on assumptions about the distribution of test statistics. More conservative assumptions were typically used to investigate robustness, although even in this case it was rare to use such an extreme option as outlined in the question. It is worth noting, however, that in cases where the treatment–control difference is very large, extremely conservative assumptions that maintain the statistical significance would suggest that all reasonable sensitivity analyses would also maintain the significance, thereby strengthening confidence in the validity of the primary analysis.

Q4. The guideline gives a somewhat negative impression of the value of survival analysis techniques – is this intended? In studies with major endpoints (e.g. death) and low event rates, even the results of very large studies would not be robust against imputation of endpoints where data are missing. What advice can be given about this? Is there experience of using data collected after treatment withdrawal?

It was agreed that survival analysis techniques are appropriate in many instances. They do provide a way forward without imputation, provided that the censoring process is not seen as problematic and hazard rates are not varying widely across time. Some robustness analyses are also needed, but these need not make extreme assumptions. It was important to clearly identify and discuss reasons for censoring for all the patients in the trial. The regulatory concerns related to accounting appropriately for all the subjects in a trial, and every effort is needed to ensure as complete a follow-up as possible. A good estimate in the protocol (perhaps based on published experience) of the proportion of censored subjects to be expected may help to put the study results into context.

Follow-up after treatment withdrawal could help in some circumstances (and is almost always required where the endpoint is survival). Where the protocol defines a fixed treatment plan after withdrawal, the data may be more useful. Issues may arise about whether the aim is to compare trial treatments or treatment strategies (which would include post-withdrawal treatment).

Q5. Are there circumstances when it is acceptable to exclude patients with no data after baseline, rather than carrying forward the baseline value?

When the number of patients with no follow-up data in a given study is likely to be small, this approach is sometimes adopted by industry statisticians. However, it is accepted that consideration needs to be given to the issues which arise in deviating from a full analysis set approach by omitting randomized patients from the analysis.

Q6. Are there differences in the way that missing data need to be handled for equivalence studies compared with studies aimed at showing a difference? Would more discussion of this in the document be helpful?

The document does mention the issue but does not discuss it in detail. The consensus was that the type of trial did affect the way in which missing data are handled. Potential bias might operate in a different direction for an equivalence study than for a study seeking to show a difference, and this needs to be discussed when choosing a method of handling missing data. It would also affect the choice of sensitivity analysis.

Q7. What impact does the potential for missing data have on trial design?

There was clear consensus that missing data need to be considered during the study design. A fundamental principle is to avoid missing data whenever possible, and the desire to do this could reduce trial complexity by specifying fewer visits, using less invasive procedures, allowing reductions in the study medication dosage, allowing the use of escape medications or additional medications, and so on. Sample size calculations also need to take account of the possible loss of power that may result from withdrawals and the need to make conservative assumptions about missing data. The document also makes clear that, as far as possible, the methodology proposed for handling missing data needs to be specified at the design stage.

MULTIPLICITY ISSUES IN CLINICAL TRIALS

A well-known area of concern to regulatory statisticians worldwide is ‘multiplicity’; that is, multiple treatment comparisons, multiple outcome variables, composite variables, subgroup analyses, etc. Multiplicity occurs in virtually all clinical trials. However, there are many views on the range of different strategies for the adjustment of the Type I error rate (false positive results) for multiplicity. Whatever the strategy, it is important to ensure the appropriate estimation of the treatment effects and the interpretation of the data. This workshop focused on such issues and their relationship to regulatory claims.

Q1. How should the apparent conflict between adjusted p-values and unadjusted confidence intervals be resolved?

There was consensus that, where possible, adjusted confidence intervals should be produced. It was acknowledged that closed test and hierarchical procedures can be a problem because critical values are not always available. In this case it may be appropriate to provide all the information, including the raw p-values and confidence intervals, and state whether the prespecified criteria for the significance level have been met. However, when the data are presented it is important to interpret them appropriately and to be aware of the apparent conflicts.
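One simple case where adjusted intervals are available is the Bonferroni adjustment, which keeps the intervals and the adjusted tests consistent. A sketch (illustrative, not a recommendation of the document), assuming approximately normal estimates:

```python
from statistics import NormalDist

def bonferroni_cis(estimates, ses, alpha=0.05):
    """Simultaneous confidence intervals with a Bonferroni
    adjustment: each of the k intervals is built at level
    1 - alpha/k, so the family-wise coverage is at least
    1 - alpha, matching Bonferroni-adjusted p-values."""
    k = len(estimates)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    return [(e - z * s, e + z * s) for e, s in zip(estimates, ses)]
```

With two endpoints each interval uses the 98.75th percentile of the normal distribution (about 2.24) instead of 1.96, so intervals widen as the number of comparisons grows; for closed or hierarchical procedures no such simple interval exists, which is the problem noted above.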

Q2. Hierarchical strategies are frequently recommended in the ‘Points to Consider’ document on multiplicity. However, no guidance is provided on how to choose the hierarchies. What are the preferred methods? How should the potential conflict of different conclusions being reached, depending on the hierarchical strategy used, be handled?


Different conclusions may be drawn from different hierarchical strategies. Consequently, it was considered crucial to define the hierarchy in the protocol by considering the objectives, relative power and clinical relevance of the outcome variables. If claims are required from secondary outcome variables, then these should be appropriately powered. It was agreed that the complexity of the analysis should be considered in the design of studies. There is a danger of having an overcomplicated analysis if a study is designed to answer ‘too many’ questions; for example, dose response with multiple primary and secondary variables.
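Why the choice of hierarchy matters can be seen from the fixed-sequence procedure itself: each hypothesis is tested at the full alpha, but only while everything earlier in the prespecified order has been significant. A minimal sketch (names illustrative):

```python
def fixed_sequence(pvalues, alpha=0.05):
    """Hierarchical (fixed-sequence) testing: hypotheses are tested
    in their prespecified order, each at the full alpha, and testing
    stops at the first non-significant result, so later hypotheses
    cannot be claimed even if their p-values are small."""
    significant = []
    for p in pvalues:
        if p < alpha:
            significant.append(True)
        else:
            break
    return significant + [False] * (len(pvalues) - len(significant))
```

Here the fourth endpoint's p = 0.01 cannot be claimed because the third endpoint failed: reordering the hierarchy would change which claims survive, which is exactly the conflict raised in the question.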

Q3. What alternatives are there to hierarchical approaches for secondary outcome variables, which may be the basis for a claim?

There are a number of alternatives to hierarchical approaches (e.g., alpha spending, co-primaries). Frequently individual studies do not have sufficient power for all the secondary outcome variables. Consequently, it was noted that it may be more appropriate to consider secondary variables via meta-analyses across a programme. In such cases, the prespecification of the meta-analysis, including the appropriate adjustments for multiplicity, is essential. It was agreed that the objectives, relative power and clinical importance are key considerations when evaluating alternative strategies.
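
As a minimal sketch of the prespecified meta-analytic alternative, assuming a common effect scale across the studies in the programme, fixed-effect inverse-variance pooling looks like this (function name ours):

```python
import math

def inverse_variance_pooled(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of a treatment effect
    across studies: weight each study estimate by the reciprocal of
    its variance, a common basis for a prespecified meta-analysis."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))  # standard error of pooled effect
    return pooled, pooled_se
```

Any multiplicity adjustment for the secondary variables so pooled would, as stated above, need to be prespecified alongside the meta-analysis itself.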

Q4. How should primary composite variables be analysed? Is there a need to adjust for multiplicity when analysing the individual components?

It was noted that statistical composite variables do not seem to be discussed in the Points to Consider document on multiplicity; for example, O’Brien [16]. It was agreed that validated composite variables should be treated differently from non-validated composites. There seemed to be a consensus that, for non-validated composites, all the components should move in the same direction. This does not mean that each component is required to be statistically significant, unless the individual elements of the composite are required for specific claims. If specific claims are required from the elements of the composite, there are implications for the Type I and II errors which need to be considered. At the design stage, consideration of the clinical relevance and power of the components of the composite variable is essential.
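
One global approach for a non-validated composite, where all components are expected to move in the same direction, is O’Brien’s rank-sum test [16]; a minimal sketch (our implementation, using a t-test on the per-subject rank sums as one common variant):

```python
import numpy as np
from scipy import stats

def obrien_rank_sum(group_a, group_b):
    """O'Brien's rank-sum global test for multiple endpoints: rank each
    endpoint across all subjects, sum the ranks within each subject, and
    compare the per-subject rank sums between the two treatment groups.
    Inputs have shape (n_subjects, n_endpoints); higher values are
    assumed to favour treatment on every endpoint."""
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    combined = np.vstack([group_a, group_b])
    ranks = np.apply_along_axis(stats.rankdata, 0, combined)  # rank per endpoint
    scores = ranks.sum(axis=1)                                # per-subject rank sum
    n_a = group_a.shape[0]
    return stats.ttest_ind(scores[:n_a], scores[n_a:])
```

Because the test pools evidence across components, it fits the consensus view above: the components must move together, without each one being individually significant.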

The document describes competing risks issues for non-validated composites, in particular the possibility that some treatments under study may have an adverse effect on one or more of the components of the composite endpoint. During the discussion it was agreed that careful consideration of competing risks scenarios is required, including investigation of the relative risks through time.

Q5. In the document distinctions are drawn between exploratory analysis and confirmatory analysis. Should routine subgroup analyses of exploratory factors such as gender, age and region be conducted at a study level or a programme level?

The consensus was that exploratory analyses of this nature are better conducted at a programme level, where there is more information to explore these factors.

Q6. When is it appropriate to draw conclusions from subgroups?

There was general agreement with the views expressed in the document. Unplanned subgroup analyses, trawling for regulatory claims, should be avoided.

ADJUSTMENT FOR BASELINE COVARIATES

Both statisticians and physicians agree that baseline data collected prior to the start of a trial should be used in the analysis to reduce variability, increase sensitivity and adjust for imbalances in treatment groups. However, how such baseline covariate data are identified and used in the analysis is often the source of many discussions.

Q1. The Points to Consider document states that one should include ‘centre’ in the statistical model as a covariate or stratum, if one has used ‘centre’ as a stratum for the randomization. Is this a reasonable approach for studies that have many centres, each only contributing a few patients, and where many may have incomplete recruitment?

The general view was that one should fit centre as a factor in the model when the randomization is stratified for centre. There was agreement that one should not always stratify the randomization for centre, unless centre is likely to have an effect. If centre sizes are expected to be small and centre is not considered to be an important covariate, then there seems little justification to use a by-centre randomization. No consensus was reached on whether centre should be considered a fixed or random effect in the statistical analysis model.

Although there was broad agreement on the need to stratify the analysis by centre if the randomization was stratified, some discussants did identify situations where this approach is not followed. Typically this occurs when centre sizes are small and there is a desire to ensure that all centres use both trial treatments. Randomization ignoring centre may not achieve this, but stratification usually will if block sizes are small. In such cases it should be defined and justified in the protocol that centre is not an important covariate and that the intent is to perform the analysis unadjusted. Although this may not be technically optimal, it is seen as a defensible approach.
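
For reference, by-centre stratification is typically implemented with small permuted blocks within each centre; a minimal sketch (illustrative only, assuming a two-arm trial with 1:1 allocation):

```python
import random

def permuted_block_randomization(n_patients, block_size=4, seed=None):
    """Permuted-block randomization within one centre (stratum): each
    block contains equal numbers of A and B in random order, keeping the
    within-centre allocation close to 1:1 even when centres are small."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        block = ['A'] * (block_size // 2) + ['B'] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n_patients]
```

With block size 4, every centre recruiting at least four patients is guaranteed to use both trial treatments, which is the practical motivation described above.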

It is interesting to note that the agreements reached are similar to those reached at the October 1998 meeting, as discussed in Phillips et al. [11]. At this meeting delegates agreed that the analysis strategy should reflect the randomization process used. In addition, they confirmed that in the majority of statistical analyses, centre is included as a fixed effect, with no clear view on when centre should be considered a random effect.

Q2. How should baseline covariates to be included in the primary analysis be identified? When should they be identified?

As per the Points to Consider document, there was clear consensus that baseline imbalances should not be used as a reason to include a covariate in the primary analysis model. Strong expected correlation with the primary outcome variable is the first and best criterion to justify the inclusion of a covariate.

Ideally covariates should be prespecified in the protocol. It was recognized, however, that if the state of knowledge has changed between the writing of the protocol and the unblinding of the study, then it is appropriate to reconsider and update the prespecified covariates in, for example, the analysis plan. It was agreed that data-driven model selection such as stepwise regression techniques should be avoided, since the selection procedure may lead to a biased (more extreme) p-value for the treatment effect. Further, the selection algorithm cannot be easily justified.
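
A prespecified ANCOVA with a single protocol-defined baseline covariate, in contrast to data-driven selection, can be sketched as follows (illustrative function; ordinary least squares with normal errors assumed):

```python
import numpy as np

def ancova_treatment_effect(y, treatment, baseline):
    """Prespecified ANCOVA sketch: estimate the treatment effect
    adjusted for a single prespecified baseline covariate by ordinary
    least squares (intercept + treatment indicator + baseline).
    Returns the adjusted treatment effect (coefficient on treatment)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)),
                         np.asarray(treatment, dtype=float),
                         np.asarray(baseline, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]
```

The key point is that the model, including which covariate enters it, is fixed before unblinding; no selection step touches the data.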

Q3. How should treatment-by-covariate interactions be handled?

There was broad agreement with the Points to Consider document on how treatment-by-covariate interactions should be handled. That is, the primary statistical analysis model should not include any treatment-by-covariate interaction terms. If such an interaction is expected prior to the study, then this should be addressed in the study design. If during the analysis phase a treatment-by-covariate interaction is suspected, then it should be explored. The primary analysis as specified in the protocol, without the interaction term in the model, remains the primary analysis from which inferences should be drawn. However, any inferences could be invalidated in the presence of important interactions.
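
Such an exploratory interaction model, fitted alongside (not instead of) the prespecified primary model, might look like this (our sketch; the product term carries the interaction):

```python
import numpy as np

def fit_with_interaction(y, treatment, covariate):
    """Exploratory fit including a treatment-by-covariate interaction
    term. Returns the three slope estimates; a large interaction
    coefficient flags that the primary, interaction-free analysis may
    need careful interpretation."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(covariate, dtype=float)
    X = np.column_stack([np.ones(len(y)), t, c, t * c])  # product term last
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"treatment": beta[1], "covariate": beta[2], "interaction": beta[3]}
```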

ESTABLISHMENT AND OPERATION OF CLINICAL DATA MONITORING COMMITTEES

The use of DMCs is increasing within clinical research. However, there are still a large number of different views on the membership and remit of such committees. For example, important issues which remain unresolved are whether or not a DMC should comprise only members external to the sponsor, and whether an internal company statistician should conduct any interim analyses and present the data to the DMC.

Q1. When is it appropriate to use an internal DMC, and should this be clarified in the guideline?

The use of an internal or external DMC largely depends on the nature and importance of the study being conducted. The guideline does indicate that under certain circumstances an internal DMC (e.g., phase 2) is appropriate. However, there are so many different scenarios that can occur in clinical research that it would be difficult to cover all cases (e.g., monitoring of safety data for phase 2 and/or phase 3, open label studies). It was agreed that the guideline was useful and important, and had adequately covered the issue. If clarification were needed for a specific clinical trial then this could and should be sought by discussion with the regulatory authorities.

Q2. In many instances it is not practical for an independent statistician/programmer to generate the unblinded data reports for a DMC. For example, it is often not feasible for a non-sponsor statistician/programmer to create the required data displays duplicating the sponsor software. What should the guideline state in this situation?

There was no overall consensus on this issue. Some delegates felt strongly that there should be no problem for the company statistician/programmer to be involved in the generation of the reports; others felt it should be done by someone external to the sponsor. However, some agreement relating to the topic was reached.

Delegates felt that if an internal statistician/programmer generates the unblinded reports then the company should recognize that the ‘burden of proof’ changes and there is a greater need to ensure the integrity of the study, particularly if any changes to the study are made. Of special importance is that the blind has been maintained. Ideally the internal statistician/programmer generating the unblinded reports should not be on the project and, better still, should be located at another site. There was also consensus that reports should only be generated for the issues being considered by the DMC. Further, only the patients included in the reports should be unblinded. Finally, it was agreed that there are very few ‘truly’ independent statisticians, whether within a sponsor company or external to it; even external statisticians are financially compensated for their services by the sponsor.

Q3. Should representatives from the sponsor company attend DMC meetings?

There was consensus that company representatives should be free to attend open sessions of the DMC where issues relating to the conduct of the trial are discussed (e.g., recruitment status). With regard to the ‘blinded’ session, where the results of the trial are presented and discussed, some delegates felt that no one from the company should attend; others felt that the internal statistician and safety physician should attend. In this latter case, it was agreed that the sponsor representatives should be non-voting members.

Q4. How can you ensure that the risk assessment performed by a DMC when deciding to stop a trial is consistent with the risk level accepted by the company sponsor? What happens when the sponsor is more conservative?

The main conclusions reached by delegates were that all DMCs should have well-defined operating procedures or a charter. This helps to minimize any inconsistencies between the sponsor and the DMC. Such operating procedures should include roles and responsibilities, a comprehensive discussion of the stopping rules, the actions to take if any boundaries are crossed, and the processes to be followed. A practice session with simulated data could be carried out to help test the adequacy of the charter and set expectations with regard to risk.
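
As one concrete ingredient of such a charter, group-sequential stopping boundaries of the O'Brien–Fleming type can be tabulated in advance; a rough sketch (the default constant below is taken from published group-sequential tables for two equally spaced looks at two-sided alpha = 0.05, and should be treated as an assumption rather than a prescription):

```python
import math

def obrien_fleming_boundaries(n_looks, z_final=1.977):
    """Approximate O'Brien-Fleming-type stopping boundaries for equally
    spaced looks: the critical z-value at look k of K is
    z_final * sqrt(K / k), very stringent at early interim analyses and
    close to the nominal level at the final analysis."""
    return [z_final * math.sqrt(n_looks / k) for k in range(1, n_looks + 1)]
```

Writing such boundaries, and the actions on crossing them, into the charter is precisely what allows a simulated practice session to test whether the sponsor and DMC share the same risk level.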

SUMMARY

The ICH E9 guideline on ‘Statistical Principles for Clinical Trials’ was approved in 1998 and is now operational world-wide. Since then more detailed guidance on selected topics has been issued by the CPMP in the form of ‘Points to Consider’ documents. The intent of these was to give guidance particularly to non-statistical reviewers although, of course, they also provide a good source of information for statisticians. Guidance has been provided on: switching between superiority and non-inferiority; meta-analysis and one pivotal study; missing data; adjustment for multiplicity; and adjustment for baseline covariates. The FDA has also issued a recent draft guideline on the ‘Establishment and Operation of Clinical Data Monitoring Committees’.

As these documents have been issued, many statisticians have started to consider how to use the information discussed in them in their day-to-day work. It was clear from our discussion forum that statisticians found all the guidance documents developed so far very useful, and were applying the principles widely in clinical trials. The guidance leaves room for professional interpretation in a number of areas. There is broad agreement on interpretation across industry, academia and regulatory authorities.

One disappointment that surfaced during the one-day workshop was the fact that there is still a need for separate regional regulatory guidance documents. Since there seems to be fairly good agreement across some of the regions on the issues, there is an opportunity to harmonize the guidance documents across Europe, the USA and Japan as per the ICH process. However, it is worth noting that in order to achieve consensus across Europe, the USA and Japan, the content of any harmonized guideline may be restricted and its guidance less specific.

From all the guidelines that have been issued to date and the discussions at the one-day workshop, one important message has emerged. It is imperative that the statistical analysis strategy is planned during the design phase of a clinical trial and documented. This includes how to handle missing data, how to address multiplicity issues, and how to adjust for baseline covariates. Once the study is complete the analysis strategy should be executed and the findings reported; that is, plan, execute and report. Unavoidable changes to the analysis strategy after the study completes should be explained and documented, and their effects examined.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the PSI’s support for the one-day discussion forum, Ann Gibb, who contributed a significant number of questions for discussion, and the contribution made by all delegates, without whom this paper would not have been possible.

REFERENCES

1. Lewis JA, Jones DR, Rohmel J. Biostatistical methodology in clinical trials – a European guideline. Statistics in Medicine 1995; 14:1655–1682.

2. Food and Drug Administration. Guideline for the Format and Content of the Clinical and Statistical Sections of an Application. FDA, US Department of Health and Human Services: Rockville, MD, 1988. Available at www.fda.gov/cder/guidance/statnda.pdf.

3. Ministry of Health and Welfare. Guideline for the Statistical Analysis of Clinical Trials. MHW Pharmaceutical Affairs Bureau: Tokyo, 1992 (in Japanese).

4. ICH E9 Expert Working Group. Statistical principles for clinical trials: ICH harmonized tripartite guideline. Statistics in Medicine 1999; 18:1905–1942.

5. CPMP. Points to Consider on Switching between Superiority and Non-inferiority. EMEA: London, 2000. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

6. CPMP. Points to Consider on Application with 1. Meta-analyses; 2. One Pivotal Study. EMEA: London, 2001. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

7. CPMP. Points to Consider on Missing Data. EMEA: London, 2001. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

8. CPMP. Points to Consider on Adjustment for Multiplicity Issues in Clinical Trials (consultation version). EMEA: London, 2002. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

9. CPMP. Points to Consider on Adjustment for Baseline Covariates. EMEA: London, 2001. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

10. Food and Drug Administration. On the Establishment and Operation of Clinical Data Monitoring Committees. FDA: Rockville, MD, 2001. Available at www.fda.gov/cder/guidance/guidance.htm.

11. Phillips AJ, Ebbutt A, France L, Morgan D. ICH guideline ‘Statistical Principles for Clinical Trials’: issues in applying the guideline in practice. Drug Information Journal 2000; 34:337–348.

12. International Conference on Harmonisation. Choice of Control Group in Clinical Trials, Guideline E10, 2000. Available at www.emea.eu.int/htms/human/ewp/ewpptc.htm.

13. Food and Drug Administration. Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drugs and Biological Products. Center for Drug Evaluation and Research: Rockville, MD, 1998. Available at www.fda.gov/cder/guidance/1397fnl.pdf.

14. Schultz WB. Statement regarding the demonstration of effectiveness of human drug products and devices. Federal Register 1995; 60(147):39180–39181.

15. Fisher L. One large well-designed multi-centre study as an alternative to the normal FDA paradigm. Drug Information Journal 1999; 33:265–271.

16. O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics 1984; 40:1079–1087.
