MAIN PAPER

Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/pst.533

Statistical approaches for conducting network meta-analysis in drug development†

Byron Jones,a* James Roger,b Peter W. Lane,c Andy Lawton,d Chrissie Fletcher,e Joseph C. Cappelleri,f Helen Tate,g Patrick Moneuse,h and on behalf of PSI Health Technology Special Interest Group, Evidence Synthesis sub-team

We introduce health technology assessment and evidence synthesis briefly, and then concentrate on the statistical approaches used for conducting network meta-analysis (NMA) in the development and approval of new health technologies. NMA is an extension of standard meta-analysis where indirect as well as direct information is combined and can be seen as similar to the analysis of incomplete-block designs. We illustrate it with an example involving three treatments, using fixed-effects and random-effects models, and using frequentist and Bayesian approaches. As most statisticians in the pharmaceutical industry are familiar with SAS® software for analyzing clinical trials, we provide example code for each of the methods we illustrate. One issue that has been overlooked in the literature is the choice of constraints applied to random effects, and we show how this affects the estimates and standard errors and propose a symmetric set of constraints that is equivalent to most current practice. Finally, we discuss the role of statisticians in planning and carrying out NMAs and the strategy for dealing with important issues such as heterogeneity. Copyright © 2011 John Wiley & Sons, Ltd.

Keywords: network meta-analysis; health technology assessment; statistical methodology

1. INTRODUCTION

Anyone faced with the task of making a healthcare decision potentially affecting millions of people wants to make the decision based on the best available information. In practice, this usually means finding and combining data from different sources. There are many ways to combine such data. This paper explores one such method: network meta-analysis (NMA). NMA is a statistical technique that combines studies involving different sets of treatments, using a network of evidence, within a single analysis (see [1]). This integrated and unified analysis incorporates all direct and indirect comparative evidence about treatments. However, whether or not to conduct an NMA needs careful consideration, as it may not be appropriate in all circumstances depending upon the extent of clinical and/or statistical heterogeneity.

The aim of this paper is to give a brief introduction to NMA, to describe the key assumptions, discuss the statistical models and approaches that can be used to conduct an NMA, and highlight the impact the assumptions and modelling approach can have on the analysis and interpretation. We restrict our attention to combining information obtained from randomized controlled trials. Although there are already a number of papers that describe NMA methodology (e.g., [1]), we emphasize the connections between alternative models and where they differ, as well as showing how our models can be fitted using SAS®.

Before examining the statistical methodology in more detail, we first consider the needs of the consumers of the results and how different regulatory and government authorities view its relevance. In the healthcare environment, the wages and pensions of the healthcare professionals, the drugs and appliances they

use, the buildings in which they provide care, all of these have to be paid for. In the UK, for example, healthcare is paid for by UK taxpayers and provided by the elected government through the National Health Service. Currently, this costs each person in the UK around £1500 per year. There is clearly a need for taxpayers to know that their contribution is spent equitably and that care will be available when they need it themselves. In England, this assurance is provided by the National Institute for Health and Clinical Excellence (NICE), in Scotland by the Scottish Medicines Consortium, and in Wales by the All Wales Medicines Strategy Group. For information about other countries, see the PSI HTA Handbook [2].

†Supporting information may be found in the online version of this article.

aNovartis, Basel, Switzerland

bLondon School of Hygiene and Tropical Medicine, London, UK

cQuantitative Sciences, GlaxoSmithKline, Stevenage, UK

dQuantitative Sciences, GlaxoSmithKline, Stockley Park, UK

eBiostatistics, Amgen Ltd, Cambridge, UK

fStatistics, Pfizer Inc, Groton, CT, USA

gSaffron Walden, Essex

hBiostatistics, Vifor Pharma Ltd, Glattbrugg, Switzerland

*Correspondence to: Byron Jones, Statistical Methodology, Integrated Information Sciences, Novartis Pharma AG, WSJ-27.1.032, Novartis Campus, CH-4056 Basel, Switzerland. E-mail: [email protected]

This article is published in Pharmaceutical Statistics as a special issue on Focusing on the PSI Special Interest Groups, edited by John Stevens, Centre for Bayesian Statistics in Health Economics, ScHARR, Regent Court, 30 Regent Street, Sheffield, South Yorkshire, S1 4DA, UK.

Pharmaceut. Statist. 2011, 10 523–531 Copyright © 2011 John Wiley & Sons, Ltd.

Not only do payers have to make tough decisions, but the basis for their decisions must also be strong enough to withstand scrutiny by politicians, the press, and potentially the courts. It is therefore imperative that they assess all healthcare technologies (including drugs) in a fair and balanced way.

Suppose that a decision maker wants to decide whether a new cancer drug should be paid for. He or she might construct a simple economic model of the disease, containing such parameters as the total cost per year of treatment, the mean length of survival for patients and the mean annual utility associated with survival. Suppose that there are 10 high-quality clinical trials in which this disease has been studied. If all these trials compared the same treatments in a head-to-head manner, then a standard meta-analysis would suffice to integrate and summarize the information on pairwise treatment comparisons. If the trials compared different sets of treatments, then an NMA might be appropriate. The task of the NMA practitioner is to combine the estimates of different treatment effect sizes from these trials so that the decision maker can populate the model and so enable a decision to be made. Although NMA is generally recommended only for randomized controlled trials, where randomization is preserved, the methodology of NMA could be applied to situations where the data are obtained from other types of investigation.

In making reimbursement decisions, different countries take a different view of the value of an NMA. In their advisory capacity in the UK, NICE can make funding decisions taking into account the results of an NMA and includes this technique in their methods guide (http://www.nice.org.uk/, Section 5.3.13), although direct evidence from head-to-head randomized controlled trials is still considered to be the most valuable. In the USA, where the decisions on who will receive 'coverage' for a particular drug are made by insurance companies, the situation is less clear-cut. The value of NMA depends on its acceptance by the insurance and medical communities, and this will be driven by empirical research demonstrating the validity of the evidence it provides. For patients in the USA who are not privately insured, the Centers for Medicare and Medicaid Services will ultimately determine coverage. The decision makers may be purely federally aligned (with Medicare) or there may be a state component (with Medicaid), but all parties may consider evidence from NMA in reaching their coverage decisions.

In summary, for our industry, recognition of the economic aspect of prescribing is key to the survival of a drug once it reaches the market place. A portfolio of products with proven quality, safety and efficacy is no longer a sufficient condition for a sustainable industry. Companies must develop arguments, based on sound scientific principles and supported by good-quality evidence, that will convince healthcare payers to purchase, and healthcare providers to prescribe, their products.

Statisticians are well placed to assist in this endeavour. But to do this effectively, they must understand the challenges faced by the payer in taking decisions about which drugs to fund, and embrace methodologies, such as NMA, that can potentially play such an important role in providing the evidence to make those decisions.

In the next section, we use a published dataset to describe the use of NMA and introduce some necessary notation. In the following sections, we describe a number of alternative methods of analysis. In the final section, we provide some discussion and conclusions.

2. ILLUSTRATIVE EXAMPLE

To illustrate the use of NMA, we will use a dataset first published by Pagliaro et al. [3] and analysed by Higgins and Whitehead [4] and Whitehead [5]. Although it contains a larger number of studies than might typically be included in a submission to NICE, it permits the illustration of a number of statistical issues. This dataset has also been used by Lu and Ades [1], which is a key paper in the development of current methods for NMA.

This dataset gave the results of 26 studies that directly compared either two or three nonsurgical treatments (A, B and C) for prevention of first bleeding in cirrhosis. Treatment A was the use of beta-blockers, B was sclerotherapy, and C was a control treatment. A brief description of the studies is given in Table I. The dataset is displayed in Table A.1 in Appendix A. Two of the 26 studies compared A, B and C, 7 studies compared A and C only, and 17 studies compared B and C only. The response recorded on each patient was binary (bleeding or not), and the data from each treatment arm in each study were recorded as (r/n), where r was the number of patients with bleeding out of n patients. Both Higgins and Whitehead [4] and Whitehead [5] analysed the data on the log-odds scale and gave the results of Bayesian analyses. We will consider various alternative methods of analysing this dataset in the next section. Here, we use this example to introduce some ideas and notation.
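To fix ideas on the log-odds scale used throughout, the log-odds ratio for one study and its usual large-sample standard error can be computed directly from the (r/n) counts of two arms. The following is a Python sketch (Python rather than SAS, purely to keep the arithmetic self-contained; the counts are made up, not taken from Table A.1):

```python
import math

def log_odds_ratio(r1, n1, r2, n2):
    """Log-odds ratio of arm 1 versus arm 2 with its large-sample SE.

    r/n are events/patients per arm, as in the (r/n) notation of the
    text; the SE is sqrt(1/a + 1/b + 1/c + 1/d) over the 2x2 cells.
    """
    a, b = r1, n1 - r1          # events / non-events, arm 1
    c, d = r2, n2 - r2          # events / non-events, arm 2
    lor = math.log(a / b) - math.log(c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se

# Illustrative counts only: 10/50 bleeding on one arm, 20/50 on the other.
lor, se = log_odds_ratio(10, 50, 20, 50)
```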

In the studies labeled 1–9, there is a direct comparison of treatments A and C. In the studies labeled 1, 2 and 10–26, there is a direct comparison of B and C. In addition to these direct comparisons, there are also indirect comparisons. For example, if the direct estimate of the difference between A and C is compared with the direct estimate of the difference between B and C, then an indirect estimate of A versus B can be obtained. The direct (from studies 1 and 2) and the indirect estimates of A versus B can then be combined to give a single estimate. The combining of direct and indirect information to estimate the treatment differences is often referred to as a network meta-analysis or the analysis of mixed treatment comparisons. For more details on the idea of networks in meta-analysis, see [6] and [7], for example.
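The arithmetic behind an indirect estimate is just differencing: under consistency, A versus B equals (A versus C) minus (B versus C), and because the two direct estimates come from disjoint sets of trials, their variances add. A minimal Python sketch (an independent check, not code from the paper), using the direct estimates that appear later in Table II:

```python
import math

def indirect_estimate(est_ac, se_ac, est_bc, se_bc):
    """Indirect A-B estimate from independent direct A-C and B-C estimates."""
    est_ab = est_ac - est_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)   # variances add under independence
    return est_ab, se_ab

# Direct estimates on the log-odds scale (Table II):
# A-C = -0.603 (0.185) from studies 3-9; B-C = -0.608 (0.121) from studies 10-26.
est_ab, se_ab = indirect_estimate(-0.603, 0.185, -0.608, 0.121)
# Reproduces the indirect A-B entry of Table II: 0.005 (0.221).
```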

Some readers may see an immediate similarity of NMA with the analysis of experimental data that have been obtained using incomplete blocks of treatments, where only intra-block information is used. From this perspective, there is nothing inherently new in NMA.

An important consideration in combining direct and indirect estimates of a particular treatment comparison is to determine whether the indirect and direct estimates are in some way inconsistent with each other [8]. That is, in our example, whether the direct estimate of A − B is statistically 'similar' to the indirect estimate obtained by taking the difference of the estimates of A − C and B − C. Of course, defining 'similar' and developing tests for inconsistency can be problematic because the direct information

Table I. Treatments compared in the 26 studies.

Number of   Study    Treatments   Direct
studies     labels   compared     comparisons

 2          1–2      A B C        AB, AC, BC
 7          3–9      A — C        AC
17          10–26    — B C        BC


is less sensitive to variability in the definition of the treatments. So, for example, the A − B comparison averaged over studies, with the effect of B varying, gives a suitable average based on direct information: all the differences being averaged actually occurred within some real study. But when based on indirect information from, say, the (A − C) − (B − C) comparison, it is less reliable when the C effect varies from trial to trial. This is especially unsuitable when that variation is systematic; that is, when (A − C) studies are run chronologically earlier than (B − C) studies and, for some reason, the effectiveness of C, but not of A or B, is altered by a change in background therapy. However, see [8] for a review and some approaches to detecting inconsistency.

3. FIXED-EFFECTS MODELS

The fixed-effects model is linear on the log-odds scale and has fixed effects for study and treatment. It is an example of a generalized linear model (GLM) for binary data with a logit link function. See [9] or other standard texts for further details of such GLMs.

Let the probability p_ij of an event for a subject in the ith study who is on treatment j, out of K possible treatments, be defined by

logit(p_ij) = log[ p_ij / (1 − p_ij) ] = s_i + t_j,   i = 1, 2, …, N,  j = 1, …, K.

Subjects are assumed independent, so that the number r_ij out of n_ij in treatment arm j in study i who have the event is assumed to be binomial, Bin(n_ij, p_ij). We note that in the above form of the model the intercept is absent. This model is not uniquely defined, as we can add any number to all the s_i and subtract it from all the t_j and get exactly the same model. We need a constraint. By default, SAS® will add an intercept or global mean and use two constraints, s_N = 0 and t_K = 0. We will omit the intercept and constrain the first level as t_1 = 0.

Typical SAS® code to fit this model using the GENMOD procedure takes the following form (the original listing did not survive transcription, so dataset and variable names here are an illustrative reconstruction):

  proc genmod data=pagliaro;            /* dataset name illustrative */
    class study trt;
    /* events/trials syntax: r events out of n patients per arm */
    model r/n = study trt / dist=bin link=logit noint type3;
    lsmeans trt / diff;
  run;

We note that our illustrative dataset and all the SAS® code displayed in this paper are included in the online supplementary material that accompanies this paper.

Given that we have a collection of two-arm and three-arm studies, we can obtain an estimate of A − B in two different ways: directly using only the three-arm studies and indirectly using only the two-arm studies. The indirect estimate is obtained by taking the difference of the estimates of A − C (from studies 3–9) and B − C (from studies 10–26). If these direct and indirect estimates differ markedly, relative to the precision of estimation, then this would be evidence of inconsistency ([8], [10], [11]). If there is no strong evidence against the assumption of consistency, then a mixed treatment comparison (MTC) of A and B is obtained by fitting the GLM to all 26 studies.

For the purposes of illustration, we will obtain estimates of A − B, separately using direct and indirect information, and informally compare them. In practice, of course, we would use both types of information, as we also illustrate. The different types of estimate of the treatment difference, on the log-odds scale, obtained using all or selected subsets of the studies, are given in Table II. The computation of the Bayesian results for this fixed-effects analysis is explained in Section 5, and we do not discuss them further in this section. The results for A − C in the second column were obtained by fitting the model to the data from studies 3–9, and the results for B − C were obtained by fitting the model to the data from studies 10–26. The indirect comparison of A − B in the third column is obtained by taking the difference of the two estimates in the second column and calculating the standard error of the difference. The direct estimate of A − B in the fourth column was obtained from the analysis of studies 1 and 2. The results for A − B in the fifth column (MTC) were obtained from fitting the model to the data from all 26 studies. Although the sizes of the three estimates of A − B differ, in no case is there sufficient evidence to reject the null hypothesis that the true A − B difference is zero.

If one combines the direct and indirect information for A − B using inverse-variance weights, then the combined estimate is −0.125 (0.190). The precision is effectively identical to the MTC result, but the estimate is slightly altered. This comes from the C arm in studies 1 and 2 altering the estimated study effects s_i, which, in turn, alters the weight across trials for the A − B effect. This comes from the variance–mean relationship in the GLM: if an estimated study effect is altered, one also alters the weight for that study.
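The inverse-variance combination quoted above can be reproduced directly from the Table II entries; a short Python sketch (Python rather than SAS, to keep the check self-contained):

```python
import math

def inverse_variance_combine(est1, se1, est2, se2):
    """Combine two independent estimates with inverse-variance weights."""
    w1, w2 = 1 / se1**2, 1 / se2**2
    est = (w1 * est1 + w2 * est2) / (w1 + w2)
    se = 1 / math.sqrt(w1 + w2)
    return est, se

# Direct A-B = -0.496 (0.371) and indirect A-B = 0.005 (0.221), from Table II.
est, se = inverse_variance_combine(-0.496, 0.371, 0.005, 0.221)
# Gives about -0.126 (0.190), matching -0.125 (0.190) in the text up to
# rounding of the inputs.
```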

4. RANDOM-EFFECTS MODELS

The random-effects model is similar in structure to the GLM used in the previous section, except that the treatment effects within study are assumed to vary from study to study. This is now a generalized linear mixed model (GLMM). See [12] for details on GLMMs and the use of Gaussian quadrature to fit these models. The distribution of the raw data remains the same, but we add a random effect onto the linear predictor:

logit(p_ij) = s_i + t_j + ν_ij

Table II. Estimates (standard errors) on the log-odds-ratio scale for direct and indirect treatment comparisons.

Comparison   Direct estimate   Indirect estimate   Direct estimate   MTC estimate    Bayesian MTC
             (studies 3–26)    (studies 3–26)      (studies 1–2)     (all studies)   (all studies)
A − C        −0.603 (0.185)    —                   −0.730 (0.363)    −0.670 (0.161)  −0.679 (0.161)
B − C        −0.608 (0.121)    —                   −0.234 (0.326)    −0.553 (0.113)  −0.559 (0.113)
A − B        —                 0.005 (0.221)       −0.496 (0.371)    −0.117 (0.189)  −0.120 (0.190)

MTC = mixed treatment comparison.



where the vector ν_i of ν_ij values for the ith study has an independent K-dimensional multivariate normal distribution N(0, Ω). The constraint t_1 = 0 remains.

The important thing to note is that the s_i remain as fixed effects. Although some authors have proposed fitting study as a random effect, it is generally accepted that it should be treated as a fixed effect (e.g., see Section 5.12 in [5]). This avoids potential issues of cross-level bias. All information about the treatment comparisons comes from within each trial where the treatments were randomized. The nature of the matrix Ω and the aliasing of fixed and random effects are discussed later.

Typical SAS® code to fit this model, using the GLIMMIX procedure, is given in the following text. The marginal likelihood, integrated over the random effects using Gaussian quadrature, is maximized over the fixed-effects parameters s_i and t_j and the random-effect parameter σ². It is important to use either this quadrature approach or the somewhat similar Laplace approximation: the GLIMMIX default penalized quasi-likelihood algorithm is known to behave poorly with such binomial logistic models and should not be used in this context. An option such as ddfm=none uses normal asymptotic approximations rather than attempting to base tests and confidence intervals on the t distribution.

  proc glimmix data=pagliaro method=quad;   /* names illustrative */
    class study trt;
    model r/n = study trt / dist=bin link=logit noint ddfm=none;
    /* independent equal-variance random treatment effects within study */
    random trt / subject=study;
    lsmeans trt / diff;
  run;

This is the default way to declare the model and is equivalent to the model specified in Section 5.9.2 of [5] using the %GLIMMIX macro, as opposed to the GLIMMIX procedure used here. But this assumes that the matrix Ω = σ²I_K; that is, the random effects for treatment within study are independent with equal variance σ². This model is not recommended, but we include it here to advise against its use and to demonstrate the underestimation of the variance σ².

The classic texts in this application area focus directly on the treatment differences and their variation across studies. For instance, Lu and Ades [1] define p_ij by

logit(p_ij) = μ_i + δ_ij − Σ_{k=1}^K δ_ik / K

with δ_i1 constrained to be zero for all i = 1, …, N and (δ_i2, …, δ_iK) ~ N[(d_2, …, d_K), Σ]. Note how the random effects are constrained to lie in one fewer dimension than in the model specified earlier. As such, it is a special case, and there is a direct one-way mapping. The fixed effects are related by d_j = t_j and μ_i − Σ_{j=1}^K d_j / K = s_i. So μ, the vector of μ_i values, is offset from s, the vector of s_i values, by the average of the treatment effects, and so is interpreted as the event rate for an average treatment effect rather than that for treatment 1. The random effects are linked by (δ_ij − d_j) = ν_ij − ν_i1 and Σ_{j=1}^K (δ_ij − d_j)/K = −ν_i1. There is a resulting constraint Σ_{j=1}^K ν_ij = 0 for every study i. This implies that the matrix Ω is singular, with Ω = AΣAᵀ, where the (i, j)th element of the K × (K − 1) matrix A is defined by A_ij = {i == (j + 1)} − (1/K). The binary operator == has value 1 only if the two arguments are equal and is zero otherwise.

As we noted before, the random effects ν_i within study i are aliased with its fixed effect s_i, which is the intercept within study. This is not a problem in defining the model, but in analysis, any such variation will get absorbed completely into the fixed-effects estimation. This is discussed in the context of Markov chain Monte Carlo (MCMC) for GLMMs in Chapter 16 of [13]. In standard mixed models, restricted maximum likelihood is used to allow for this reduction in dimension. Here, we do something similar by choosing a model that assumes no random variation in the study intercept; that is, Σ_{j=1}^K ν_ij = 0.

Both [5] and [1] recommend a simple structure for Σ based on symmetry among the treatments, suggesting that Σ_ij = σ²[{i == j}(1 − ρ) + ρ], leading to Ω_11 = (K − 1)(1 + (K − 2)ρ)σ²/K² and Ω_jj = ((K² − K − 1) − (K + 1)(K − 2)ρ)σ²/K² for j > 1. For these variances to be equal, we require that ρ = 0.5. Then Σ = (½ J_{K−1} + ½ I_{K−1})σ², where J_k is a k × k matrix of ones and I_k is a k × k identity matrix. On the other hand, Ω is a K × K matrix with diagonal elements (K − 1)σ²/2K and off-diagonal elements −σ²/2K, with correlation −1/(K − 1) throughout. For two treatments, this correlation is −1, as the random effect for one treatment is the negative of the same random effect for the other treatment. This demonstrates how the constraint that the random effects must add to zero induces this correlation structure.

To make Ω singular and induce the required correlation, we use the following SAS® code instead (again with illustrative names). Here, the variable npt holds the number of treatments in this study, whereas indx is the index 1, 2, …, npt for each record within this study. The other variables are the number of events r out of n subjects, and the treatment trt, whereas study indexes the study. The toep(1) covariance matrix is simply I₃σ². These three random effects become weighted by the values in the variables z1, z2, and z3, which are (1 − 1/p)/√2 or −1/(√2 p), leading to variances (p − 1)σ²/2p and covariances −σ²/2p for Σ. Here p is the actual number of treatments for the study (held in variable npt). Previously, the constraint has been described in terms of the total number K of treatments. But when an individual trial uses only a subset p of the K treatments, the constraint is then that the sum of these p random effects is zero.

  data pagliaro2;               /* names illustrative */
    set pagliaro;
    array z[3] z1-z3;
    do k = 1 to 3;
      /* sqrt(0.5)*((k=indx) - 1/npt): own arm (1 - 1/p)/sqrt(2),
         other arms -1/(sqrt(2) p) */
      if k <= npt then z[k] = sqrt(0.5)*((k = indx) - 1/npt);
      else z[k] = 0;
    end;
  run;

  proc glimmix data=pagliaro2 method=quad;
    class study trt;
    model r/n = study trt / dist=bin link=logit noint ddfm=none;
    random z1 z2 z3 / subject=study type=toep(1);
    lsmeans trt / diff;
  run;

Note that two-arm studies have z3 set to zero and hence simply use the average of the first two random effects. Extending to instances with more arms per study is simply a matter of extending the series of variables z1, z2, z3 upwards to the maximum number of arms per study. Note how, for any one study, the values in the covariance matrix depend upon the number of arms in that study and not on the maximum number of arms in the meta-analysis. This is why they have to be fed in as part of the dataset in the form of regression variables and cannot be specified directly to the GLIMMIX procedure.
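The variances and covariances induced by this construction can be checked numerically. With independent N(0, σ²) effects u1, …, up and arm-level weights z_k = √0.5((k = j) − 1/p) for arm j, the implied covariance matrix of the arm-level random effects should have diagonal (p − 1)σ²/2p, off-diagonals −σ²/2p, and rows summing to zero. A Python sketch of the check:

```python
import math
import numpy as np

def z_weights(p):
    """Row j holds the weights z1..zp for arm j:
    z_k = sqrt(0.5) * ((k == j) - 1/p), mirroring the weights described above."""
    return np.array([[math.sqrt(0.5) * (float(k == j) - 1 / p)
                      for k in range(p)] for j in range(p)])

p, sigma2 = 3, 1.0
Z = z_weights(p)
# The arm-level effects are Z @ u with u ~ N(0, sigma2 * I), so their
# covariance matrix is sigma2 * Z @ Z.T.
cov = sigma2 * (Z @ Z.T)
```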

An alternative approach which constrains Ω to be singular is to set ν_ik = 0 for some specific treatment k, rather than set Σ_{j=1}^K ν_ij = 0, in a similar way to constraining the fixed effects.


The relationship between Ω and Σ then simply embeds Σ into Ω with an additional row and column of zeros for the selected arm k. For the Bayesian analysis that follows, using this alternative parameterization will give an identical posterior for the treatment differences as long as independent flat priors are used for the study effects. Indeed, several of the publicly available WinBUGS codes for this meta-analysis problem rely on this fact, although it is not explicitly stated anywhere. However, if we use this constraint with a likelihood approach, we get different estimates, as we demonstrate here by constraining either the first or the last arm within a study. For a two-treatment study, this means that the random effect is applied to either A or B on its own, whereas with the symmetric approach, it is applied to the difference between A and B. Because of the nonlinearity of the link function, these lead to slightly different estimates for the treatment differences.

Although not recommended in the context of a maximum likelihood analysis, we indicate how this alternative constraint approach is implemented in SAS®. The full Ω matrix with correlation 0.5 is defined using the type=lin(1) structure with a single parameter σ². A 0/1 indicator variable, here called nonbase, has value zero for the first arm and effectively sets the random effect to zero for this arm.

  data lin;                     /* coefficient matrix for type=lin(1) */
    input parm row col1-col3;
    datalines;
  1 1 1 0.5 0.5
  1 2 0.5 1 0.5
  1 3 0.5 0.5 1
  ;
  run;

  proc glimmix data=pagliaro method=quad;   /* names illustrative */
    class study trt;
    model r/n = study trt / dist=bin link=logit noint ddfm=none;
    /* nonbase = 0 for the first arm zeroes its random effect */
    random trt*nonbase / subject=study type=lin(1) ldata=lin;
    lsmeans trt / diff;
  run;

In Table III, we give the results obtained using the first incorrect code and from applying these three different forms of constraint. Note that it is the last column, representing the symmetric constraint, which we recommend as the correct way to fit this model using likelihood with Gaussian quadrature. The bottom row of this table gives the estimated variance of the random treatment difference. All the results in this table use data from all the studies and so correspond to the mixed treatment effects in the previous table. All three forms of the constraint achieve the same objective of forcing the variance of the difference between any two random treatment effects to be the same for all three pairwise comparisons. The values in this table have all been confirmed using a similar Gaussian quadrature approach in the SAS® procedure NLMIXED.

Here, the symmetric constraint is included by subtracting the mean of the random effects within each study from the random effects before adding them into the linear predictor, rather than by inducing covariance. The results match to within a single unit in the last digit shown in the table. In problems such as this, where the variance parameter may be poorly estimated, the search for a maximum can become complicated by the approximate nature of the Gaussian quadrature approximation to the true likelihood, introducing small fluctuations in the evaluated likelihood at each step. Although this did not occur in this case, independent programming in this way is highly recommended. Looking back to the last column of Table II, we note the expected increase in the size of the standard errors. The qualitative conclusions are the same, in that the differences A − C and B − C remain statistically significant and A − B remains not statistically significant.

5. BAYESIAN ANALYSES

Indirect comparison models have typically been fitted using Bayesian methods via MCMC simulation within the WinBUGS software package [14]. Dias et al. [15], for example, describe Bayesian models and give WinBUGS code for a series of examples. The Bayesian model is the same as that used previously, but each of the parameters requires the definition of a prior distribution. Usually these are chosen to be independent and 'uninformative' or flat on some suitable scale for the parameter. The fixed-effects parameters for treatment and study are usually assumed to be independent normal random variables with mean zero and standard deviation 100. Note that 100 on the log-odds scale is an extreme change.

The Bayesian solution to the fixed-effects model can be fitted very easily by adding the line

  bayes nmc=100000;

to the GENMOD code for the fixed-effects analysis. The nmc= option, which defines the length of the chain, is needed to drive the Monte Carlo standard error down beyond the digits displayed in the table. For most purposes, a much shorter chain is sufficient. The results appear in the final column of Table II.

Choosing a suitable prior for the variance in the random-effects model is more debatable. Whitehead [5] used an inverse-gamma distribution with parameters 0.001 and 0.001 for the variance of the random effect. However, such inverse-gamma distributions for variances in hierarchical models have lost popularity because of their properties close to zero ([16]). The WinBUGS code in [15] uses a uniform distribution on the interval [0, 5].

Both these WinBUGS solutions use the constraint that the variance of the random effect for the reference or control arm is set to zero. For the Bayesian analysis, any choice of constraint leads

Table III. Mixed treatment estimates and their standard errors obtained from fitting random-effects models with different constraints, and the study-level variance of treatment differences.

Comparison      Default          V(first) = 0     V(last) = 0      Symmetric
A − C           −0.717 (0.260)   −0.694 (0.349)   −0.792 (0.366)   −0.740 (0.383)
B − C           −0.586 (0.181)   −0.558 (0.245)   −0.619 (0.256)   −0.581 (0.268)
A − B           −0.132 (0.302)   −0.136 (0.406)   −0.173 (0.426)   −0.159 (0.446)
Variance (σ²)    0.322            0.802            0.902            1.024


to identical answers, as long as a flat prior is used for the study fixed effects and interest is restricted to treatment differences. This is generally the case, so this approach can be used safely in this context. The symmetric constraint is required whenever an informative prior is used for the study fixed effects. So Table IV makes no distinction between the constraints being used for the random effects within studies. Also, we need to check that the MCMC chains are behaving well, have converged to a steady state, and are not subject to long-term fluctuation. Although WinBUGS, and its derivative OpenBUGS [17], are traditionally used, it is possible to fit exactly the same Bayesian models in SAS®. Indeed, fitting a model in separate software packages using different algorithms is useful to confirm the results of such computation. Without this, it is difficult to guarantee truly independent quality control. Of course, once some code or algorithm has been fully validated, it may not be necessary to confirm the results each time using alternative code or algorithms. The MCMC procedure in SAS® uses a Metropolis–Hastings algorithm, and as a result, there is usually much more autocorrelation in the resulting chain than when using WinBUGS. However, by thinning the chain and using longer runs, it is still possible to carry out an effective Bayesian analysis of indirect comparisons. In version 9.3 of SAS®, there is a new RANDOM statement, and this will make the chain have better properties and run considerably faster (see [18] for more information on the MCMC procedure). The results in Table IV use the facilities in SAS® 9.2 release M3, where the MCMC procedure became approved rather than experimental. The prior for the standard deviation of the random effects uses a uniform distribution on the interval [0.001, 10]. The lower limit is a safety margin to assist good chain behaviour for this standard deviation parameter.

The means and standard deviations of the posterior distributions for the log-odds ratios are presented in Table IV. The medians are very similar to the means because of the symmetry of the posterior and are not presented here, although in general, medians and percentiles are safer summaries than means and standard deviations. The second column uses publicly available WinBUGS code (https://www.bris.ac.uk/cobm/research/mpes/mtc.html) and a chain length of 2 million after a burn-in of 10,000. The SAS MCMC procedure results use 10,000 iterations per tuning unit (the default is 500), a burn-in of 10,000 (the default is 1000) and 100 million iterations with a thinning of 1%. This many iterations are requested to get the MCMC standard error down to the order of 0.001 on the log-odds scale. Running the same code with only 1 million iterations gives accuracy of the order of 0.01, which is sufficient for most purposes.

We note the similarity of the results in Table IV to those of the frequentist results for the random-effects model displayed in Table III. Frequentist and Bayesian estimates of log-odds ratios differ by around 10% of their standard error, confirming that similar conclusions would be drawn from either approach.

The estimates of the variance, τ², are about 40% larger for the Bayesian method compared with the maximum likelihood method. As a result, the standard deviation of the posterior distribution for the log-odds ratio is about 20% larger than the equivalent standard error, leading to slightly more conservative conclusions. In summary, relative to the maximum likelihood solution, the Bayesian approach gives a change in the log-odds ratio of about 10% of the standard error and a broadening of any interval estimate by about 20%.

Lu and Ades [1] suggest a series of possible, more complicated parameterizations of the Σ matrix. These can be implemented most easily using the NLMIXED procedure in SAS, where we can easily center the random effects using programming statements. It is well suited to more complex covariance models with relatively few treatments per study. For problems with several arms within a single trial, but a simple covariance structure, the GLIMMIX approach is preferred.

Finally, in Table V, we show how the results depend on thechoice of prior distribution for the trial effect and the choice of

Table IV. Estimates: mean (standard deviation) of the posterior distribution for the log-odds ratios, and median for the variance τ², obtained using WinBUGS and the MCMC procedure in SAS.

                   Whitehead (2002) code   Bristol code     MCMC in SAS
  A − C            −0.784 (0.442)          −0.784 (0.455)   −0.783 (0.458)
  B − C            −0.599 (0.312)          −0.598 (0.319)   −0.599 (0.322)
  A − B            −0.185 (0.515)          −0.185 (0.531)   −0.184 (0.535)
  Variance (τ²)     1.330                   1.453            1.458

MCMC, Markov Chain Monte Carlo.

Table V. Estimates: mean (standard deviation) of the posterior distribution for the log-odds ratios, obtained using WinBUGS with an informative and an uninformative prior.

          Prior N(−1, 1/5)                     Prior N(0, 1000)
          Reference treatment                  Reference treatment
          First             Last               First             Last
  A − C   −0.241 (0.339)    0.845 (0.340)      −0.783 (0.457)    −0.784 (0.455)
  B − C   −0.400 (0.242)    0.549 (0.235)      −0.599 (0.320)    −0.599 (0.319)
  A − B   −0.159 (0.395)    0.296 (0.394)      −0.185 (0.531)    −0.185 (0.530)


Copyright © 2011 John Wiley & Sons, Ltd. Pharmaceut. Statist. 2011, 10 523–531


reference treatment. As expected, the use of an uninformative prior gives the same results (with some minor numerical variation) irrespective of whether the first or last treatment is chosen as the reference. However, when an informative prior is used, changing the reference treatment leads to a marked difference in the results.

6. DISCUSSION AND CONCLUSIONS

Decisions made by reimbursement agencies, taxpayers and patients need to be based on relevant and meaningful information. An important tool in providing some of that information is NMA. To be useful, however, the NMA needs to be the result of a collaboration between statisticians, clinicians, health economists, and epidemiologists. Each has a part to play, and in this paper, we have emphasized the role of the statistician. He or she needs to be fully versed in the necessary methodology and to understand its limitations as well as its strengths.

Following a careful review of the evidence that will form the basis of the NMA, and establishing that an NMA is appropriate, the two main decisions to make when choosing a method for NMA are between fixed-effects and random-effects models, and between Bayesian and frequentist approaches. As noted earlier, a fixed-effects analysis is appropriate when it is reasonable to assume a common effect of treatment across the trials to be combined. This does not mean that the actual treatment means are common across trials: it is the differences between means (or between log-odds ratios, or whatever measure of effect size is used) that are assumed common. The approach becomes unreasonable when there is substantial heterogeneity either in the observed effect of the treatment (statistical heterogeneity) or in the context of the trial, such as type of patient, severity of disease or length of study (clinical heterogeneity). Statistical heterogeneity of the selected studies can be assessed using statistics like I² [19] and DIC [15, 20], but use of formal tests to derive rules based on these or other statistics such as Cochran's Q is not helpful [21]. Instead, the decision about which method to use should be based both on an assessment of clinical heterogeneity, and on the use to be made of the resulting combined estimate in the light of that heterogeneity.
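For a single pairwise comparison, the I² statistic of [19] is derived from Cochran's Q; the Python sketch below shows the calculation (the study log-odds ratios and variances are hypothetical, and the function name is ours):

```python
def q_and_i2(effects, variances):
    """Cochran's Q and the Higgins-Thompson I^2 for one pairwise comparison."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical log-odds ratios and their variances from five studies;
# on these data Q is about 10.3, giving I^2 of about 61%
q, i2 = q_and_i2([-0.9, -0.4, -1.2, 0.1, -0.7], [0.09, 0.04, 0.16, 0.08, 0.05])
print(round(q, 2), round(i2, 1))
```

As the text cautions, such statistics describe heterogeneity; they should not be turned into automatic decision rules for choosing a model.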

When appropriate, a fixed-effects estimator based on inverse-variance weighting is most efficient for estimation of the treatment effect, from the point of view of statistical information. However, when there is doubt about heterogeneity, the fixed-effects analysis can be misleading. For example, in the case of just two trials, one large and one small, the combined estimate will be based almost entirely on the large trial and take little account of the evidence in the small trial. The small trial may be different in some clinically important way, but in a fixed-effects analysis, the standard errors will not represent the impact that any heterogeneity has on the precision.
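The two-trial imbalance described above is easy to see numerically; this is a minimal inverse-variance sketch with hypothetical numbers:

```python
def fixed_effect(effects, variances):
    """Inverse-variance (fixed-effects) pooled estimate, its standard error,
    and the relative weight given to each study."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se = (1.0 / sw) ** 0.5
    return est, se, [wi / sw for wi in w]

# Hypothetical: one large trial (variance 0.01) and one small trial (0.25)
est, se, weights = fixed_effect([-0.8, 0.2], [0.01, 0.25])
print(round(est, 3), round(se, 3), [round(x, 3) for x in weights])
# the large trial receives about 96% of the weight, so the pooled
# estimate sits very close to its effect of -0.8
```

The small standard error reflects only sampling variation under the common-effect assumption, not the disagreement between the two trials.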

It is widely accepted that a random-effects analysis should be used if a combined estimate from the assembled data is judged appropriate despite the presence of substantial heterogeneity. However, this is not a simple decision. The random-effects analysis is only appropriate if a suitable distribution can be assumed for the effects, and we are interested in the average over a hypothetical population of trials, including those we know about but treating them as representative of a much wider population. The cause and pattern of heterogeneity should be investigated by techniques such as meta-regression [22], and a simple random-effects analysis should be used only if this investigation suggests it is appropriate. If it is, then the random-effects analysis takes account of trial-to-trial variation in the treatment effect, and the estimates of precision of the combined estimate take into account the extra level of uncertainty resulting from the hypothesized population of trials.
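A minimal form of such an investigation is a weighted least-squares regression of the study effects on a single trial-level covariate. The sketch below uses a hypothetical covariate (say, study duration in years) and ignores residual heterogeneity, so it illustrates the idea rather than a full meta-regression:

```python
def meta_regression(effects, variances, covariate):
    """Weighted least-squares intercept and slope of study effects on one
    trial-level covariate, with weights equal to 1/variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical: the treatment effect strengthens with study duration
intercept, slope = meta_regression([-0.2, -0.4, -0.6, -0.8],
                                   [0.05, 0.05, 0.05, 0.05],
                                   [1.0, 2.0, 3.0, 4.0])
print(round(intercept, 3), round(slope, 3))
```

A clear trend like this would argue against summarizing the trials by a single exchangeable random-effects mean.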

However, it is always questionable whether the assumed population is realistic. If the heterogeneity in the collected trials is actually due to some simple difference between the trials, plus random variation, the combined estimate may be seriously biased. For example, one set of trials may have been carried out in such a way as to estimate a smaller effect than the remainder of the trials: the simple dichotomous heterogeneity being due to some unknown attribute of the trial protocols or management. In such a case, the fixed-effects analysis might be more appropriate, although this depends on whether the proportion of trials in the two sets is well estimated by the trials actually combined. In general, care has to be taken in using a random-effects analysis when trials differ substantially in size, because the small trials may exert undue weight on the results ([23], Section 9.5.4). Whether or not there is heterogeneity, a fixed-effects analysis provides a combined estimate from all the information that is actually available; however, its standard error must be interpreted in the light of this specific weighted average across the studies, and not in any wider context.
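The shift of weight towards small trials under a random-effects model can be made concrete with the moment estimator of DerSimonian and Laird (our choice for illustration only; it is not the estimation method used elsewhere in this paper):

```python
def dl_tau2(effects, variances):
    """DerSimonian-Laird moment estimate of the between-trial variance tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)

def re_weights(variances, tau2):
    """Relative study weights when tau^2 is added to each within-trial variance."""
    w = [1.0 / (v + tau2) for v in variances]
    return [wi / sum(w) for wi in w]

# Hypothetical: one large trial and three small trials with discrepant effects
effects, variances = [-0.8, 0.1, 0.2, 0.0], [0.01, 0.25, 0.25, 0.25]
tau2 = dl_tau2(effects, variances)
print(round(tau2, 3))
print([round(x, 2) for x in re_weights(variances, 0.0)])   # fixed effects: large trial ~0.89
print([round(x, 2) for x in re_weights(variances, tau2)])  # random effects: large trial ~0.39
```

Once the between-trial variance dominates the within-trial variances, the weights approach equality, so three discrepant small trials can outweigh one large one.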

In the context of meta-analysis, Bayesian and frequentist methods are expected to give much the same results because it is common practice to use non-informative priors ([23], Section 16.8.1). Although Bayesian analyses can make use of informative priors, we have seen only one published example where this has been done [4]. A disadvantage of the Bayesian approach is the complexity of the technique, requiring knowledge of the MCMC process and assessment of its convergence. There is also a minor problem that MCMC estimates are subject to simulation error; this can be reduced by increasing the number of simulations from which an estimate is derived, but it cannot be eradicated completely. The Bayesian analysis generally uses larger estimates of τ than the likelihood analysis, as the former integrates over nuisance parameters (the study effects in this case), whereas the latter profiles them out.

For rare-event data, many methods can have difficulty handling information from studies with zero events under one or more treatments [24]. The method of logistic regression handles these issues, and the frequentist approach using this method takes account of such studies appropriately (excluding studies with no events at all, as these provide no information about relative odds), although there are potential concerns with asymptotic approximations. Frequentist random-effects meta-analysis of binary data requires the use of generalized linear mixed models, for which algorithms have been implemented but are less well established than those for linear mixed models. The Bayesian approach using logistic regression is well established within the MCMC framework.
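The difficulty with zero cells can be seen from the classical two-by-two log-odds ratio, which is undefined when any cell is empty. The sketch below applies the common 0.5 continuity correction discussed in [24]; the logistic-regression approaches described above avoid the correction altogether, so this is only to show where the difficulty arises:

```python
from math import log

def log_odds_ratio(r1, n1, r2, n2, cc=0.5):
    """Log-odds ratio of arm 1 versus arm 2 for a two-arm study with r events
    out of n; adds a continuity correction cc to every cell whenever any cell
    (events or non-events) is zero."""
    if 0 in (r1, n1 - r1, r2, n2 - r2):
        r1, n1, r2, n2 = r1 + cc, n1 + 2 * cc, r2 + cc, n2 + 2 * cc
    return log(r1 * (n2 - r2) / ((n1 - r1) * r2))

# Study 20 in the appendix data has 0/21 events under B and 3/20 under C
print(round(log_odds_ratio(0, 21, 3, 20), 2))   # defined only via the correction
print(round(log_odds_ratio(2, 43, 13, 41), 2))  # ordinary case, no correction
```

The choice of correction is arbitrary and can materially change the estimate in sparse data, which is one reason model-based approaches are preferred here.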

The literature related to NMA is increasing at a fast pace, and statisticians will need to keep up with methodological developments. The International Society for Pharmacoeconomics and Outcomes Research Task Force on Indirect Comparison Good Research Practices has recently published its recommendations on the use of NMA ([25] and [26]). These two publications, and the references they cite, may be consulted for good practice and as valuable reading for statisticians. Similarly, much useful technical advice on implementing the Bayesian approach, and supporting WinBUGS code, can be found in [11], [15] and [27].


Acknowledgements

We are grateful to the referees, John Stevens and Jason Wang, for comments and suggestions on an earlier version of this paper.

REFERENCES

[1] Lu G, Ades AE. Combination of direct and indirect evidence in mixedtreatment comparisons. Statistics in Medicine 2004; 23:3105–3124.

[2] Fletcher C, Whately-Smith C, Lawton A, Reid C, Moneuse P,Paget M-A, Ducournau P, Hepworth D, Bravo M-L, Mann H, Tate H,Chuang-Stein C, Jones B, Gibb A. PSI health technology assessment(HTA) handbook, version 1.0. Available at: http://www.psiweb.org/docs/psihandbookfinalv1010may10.doc (accessed 07.11.2011).

[3] Pagliaro L, D’Amico G, Sorensen TI, Lebrec D, Burroughs AK,Morabito A, Tine F, Politi F, Traina M. Prevention of first bleedingin cirrhosis. A meta-analysis of randomized trials of nonsurgicaltreatment. Annals of Internal Medicine 1992; 117:59–70.

[4] Higgins JPT, Whitehead A. Borrowing strength from external trialsin meta-analysis. Statistics in Medicine 1996; 15:2733–2749.

[5] Whitehead A. Meta-analysis of controlled clinical trials. John Wileyand Son: Chichester, 2002.

[6] Salanti G, Higgins JPT, Ades AE, Ioannidis JPA. Evaluation of networks of randomized trials. Statistical Methods in Medical Research 2008; 17:279–301.

[7] Ioannidis JP. Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. Canadian Medical Association Journal 2009; 181(8):488–493.

[8] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. NICE Technical Support Document 4: Inconsistency in networks of evidence based on randomized controlled trials. Decision Support Unit, ScHARR, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, 2011. Available at: http://www.nicedsu.org.uk (accessed 07.11.2011).

[9] Collett D. Modelling binary data, (2nd edn). Chapman and Hall: BocaRaton, 2003.

[10] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. Journal of Clinical Epidemiology 1997; 50(6):683–691.

[11] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency inmixed treatment comparison meta-analysis. Statistics in Medicine2010; 29(7-8):932–944.

[12] Agresti A. Categorical data analysis, (2nd edn). Wiley: Hoboken,2002.

[13] Gilks WR, Richardson S, Spiegelhalter DJ. Markov chain Monte Carloin practice. Chapman and Hall: Boca Raton, 1996.

[14] Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 2000; 10:325–337.

[15] Dias S, Welton NJ, Sutton AJ, Ades AE. NICE Technical Support Document 2: A generalised linear modelling framework for pairwise and network meta-analysis of randomized controlled trials. Decision Support Unit, ScHARR, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, 2011. Available at: http://www.nicedsu.org.uk (accessed 07.11.2011).

[16] Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 2006; 1:515–533.

[17] Lunn D, Spiegelhalter D, Thomas A, Best N. The BUGS project: evo-lution, critique and future directions (with discussion). Statistics inMedicine 2009; 28:3049–3082.

[18] Chen F. The RANDOM statement and more: moving on with PROC MCMC. SAS Global Forum Proceedings, 2011; Paper 334-2011. http://support.sas.com/resources/papers/proceedings11/334-2011.pdf.

[19] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002; 21:1539–1558.

[20] Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 2002; 64(4):583–616.

[21] Hardy RJ, Thompson SG. Detecting and describing heterogeneity inmeta-analysis. Statistics in Medicine 1998; 17:841–856.

[22] Morton SC, Adams JL, Suttorp MJ, Shekelle PG. Meta-regressionapproaches: what, why, when and how? Technical review 8(prepared by Southern California-RAND Evidence-based PracticeCenter, under Contract No. 290-97-001). AHRQ Publication No. 04-0033, MD: Agency for Healthcare Research and Quality, Rockville,March 2004.

[23] Cochrane Handbook. Available at: http://www.mrc-bsu.cam.ac.uk/cochrane/handbook/chapter_9/9_5_4_incorporating_heterogeneity_into_random_effects_models.htm(accessed 2011).

[24] Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Useand avoidance of continuity corrections in meta-analysis of sparsedata. Statistics in Medicine 2004; 23:1351–1375.

[25] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N,Lee MA. Interpreting indirect treatment comparisons and networkmeta-analysis for health care decision-making. Report of the ISPORTask Force on Indirect Comparison Good Research Practices - Part 1.Value in Health 2011; 14:417–428.

[26] Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC,Boersma C, Thompson D, Larholt KM, Diaz M, Barrett A. Conductingindirect treatment comparisons and network meta-analysis stud-ies. Report of the ISPOR Task Force on Indirect Comparison GoodResearch Practices - Part 2. Value in Health 2011; 14:429–437.

[27] Dias S, Welton NJ, Sutton AJ, Ades AE. NICE Technical Support Document 1: Introduction to evidence synthesis for decision making. Decision Support Unit, ScHARR, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, 2011. Available at: http://www.nicedsu.org.uk (accessed 07.11.2011).


APPENDIX A. DATASET

Table A.1. Data from [3]. The table is arranged to display studies directly comparing A and C on the left side and studies directly comparing B and C on the right side.

           Set 1                               Set 2
           A          C                        B          C
Study      r    n     r    n       Study       r    n     r    n
  1        2    43    13   41        1         9    42    13   41
  2        12   68    13   72        2         13   73    13   72
  3        4    20    4    16        10        4    18    0    19
  4        20   116   30   111       11        3    35    22   36
  5        1    30    11   49        12        5    56    30   53
  6        7    53    10   53        13        5    16    6    18
  7        18   85    31   89        14        3    23    9    22
  8        2    51    11   51        15        11   49    31   46
  9        8    23    2    25        16        19   53    9    60
                                     17        17   53    26   60
                                     18        10   71    29   69
                                     19        12   41    14   41
                                     20        0    21    3    20
                                     21        13   33    14   35
                                     22        31   143   23   138
                                     23        20   55    19   51
                                     24        3    13    12   16
                                     25        3    21    5    28
                                     26        6    22    2    24
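To illustrate how an indirect comparison in the style of Bucher et al. [10] can be formed from data such as these, the Python sketch below pools a handful of the two-arm comparisons by inverse-variance weighting and then contrasts the pooled A − C and B − C log-odds ratios. It is a simplified calculation on a subset of the data, not the analysis reported in this paper:

```python
from math import log, sqrt

def pooled_log_or(studies):
    """Fixed-effects (inverse-variance) pooled log-odds ratio over
    two-arm studies, each given as (r1, n1, r2, n2)."""
    wy = wsum = 0.0
    for r1, n1, r2, n2 in studies:
        y = log(r1 * (n2 - r2) / ((n1 - r1) * r2))
        v = 1 / r1 + 1 / (n1 - r1) + 1 / r2 + 1 / (n2 - r2)
        wy += y / v
        wsum += 1 / v
    return wy / wsum, sqrt(1 / wsum)

# Studies 1-3 of Set 1, and studies 1, 2 and 11 of Set 2
# (study 10 is skipped because its zero cell breaks the simple formula)
ac = [(2, 43, 13, 41), (12, 68, 13, 72), (4, 20, 4, 16)]
bc = [(9, 42, 13, 41), (13, 73, 13, 72), (3, 35, 22, 36)]
d_ac, se_ac = pooled_log_or(ac)
d_bc, se_bc = pooled_log_or(bc)

# Bucher-style indirect estimate: A - B = (A - C) - (B - C), treating the
# pooled estimates as independent (only approximate here, because studies
# 1 and 2 are three-arm trials contributing a C arm to both sets)
d_ab = d_ac - d_bc
se_ab = sqrt(se_ac ** 2 + se_bc ** 2)
print(round(d_ab, 2), round(se_ab, 2))
```

A full NMA combines all the studies within one model, handles the shared control arms of the three-arm trials correctly, and can add a random effect for between-trial heterogeneity.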
