6
Edinburgh Research Explorer Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 Citation for published version: Suchard, MA, Lemey, P, Baele, G, Ayres, DL, Drummond, AJ & Rambaut, A 2018, 'Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10', Virus Evolution, vol. 4, no. 1, vey016. https://doi.org/10.1093/ve/vey016 Digital Object Identifier (DOI): 10.1093/ve/vey016 Link: Link to publication record in Edinburgh Research Explorer Document Version: Publisher's PDF, also known as Version of record Published In: Virus Evolution Publisher Rights Statement: VC The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 06. Jan. 2021

Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

Edinburgh Research Explorer

Bayesian phylogenetic and phylodynamic data integration usingBEAST 110

Citation for published versionSuchard MA Lemey P Baele G Ayres DL Drummond AJ amp Rambaut A 2018 Bayesian phylogeneticand phylodynamic data integration using BEAST 110 Virus Evolution vol 4 no 1 vey016httpsdoiorg101093vevey016

Digital Object Identifier (DOI)101093vevey016

LinkLink to publication record in Edinburgh Research Explorer

Document VersionPublishers PDF also known as Version of record

Published InVirus Evolution

Publisher Rights StatementVC The Author(s) 2018 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License(httpcreativecommonsorglicensesby40)which permits unrestricted reuse distribution and reproduction in any medium provided the original work isproperly cited

General rightsCopyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s)and or other copyright owners and it is a condition of accessing these publications that users recognise andabide by the legal requirements associated with these rights

Take down policyThe University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorercontent complies with UK legislation If you believe that the public display of this file breaches copyright pleasecontact openaccessedacuk providing details and we will remove access to the work immediately andinvestigate your claim

Download date 06 Jan 2021

Bayesian phylogenetic and phylodynamic data

integration using BEAST 110Marc A Suchard123dagger Philippe Lemey4Dagger Guy Baele4sect Daniel L Ayres5

Alexei J Drummond67 and Andrew Rambaut81Department of Biomathematics David Geffen School of Medicine University of California Los Angeles 621Charles E Young Dr South Los Angeles CA 90095 USA 2Department of Biostatistics Fielding School ofPublic Health University of California Los Angeles 650 Charles E Young Dr South Los Angeles CA 90095USA 3Department of Human Genetics David Geffen School of Medicine University of California Los Angeles695 Charles E Young Dr South Los Angeles CA 90095 USA 4Department of Microbiology and ImmunologyRega Institute KU Leuven Herestraat 49 3000 Leuven Belgium 5Center for Bioinformatics andComputational Biology University of Maryland College Park 125 Biomolecular Science Bldg 296 CollegePark MD 20742 USA 6Department of Computer Science University of Auckland 30338 Princes StAuckland 1010 NZ 7Centre for Computational Evolution University of Auckland 30338 Princes StAuckland 1010 NZ and 8Institute of Evolutionary Biology University of Edinburgh Ashworth LaboratoriesEdinburgh EH9 3FL UK

Corresponding author E-mail msucharduclaedu (MAS) alexeicsaucklandacnz (AJD) arambautedacuk (AR)daggerhttporcidorg0000-0001-9818-479X

Daggerhttporcidorg0000-0003-2826-5353

secthttporcidorg0000-0002-1915-7732

httporcidorg0000-0003-4337-3707

Abstract

The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesianphylogenetic and phylodynamic inference from genetic sequence data BEAST unifies molecular phylogenetic reconstruc-tion with complex discrete and continuous trait evolution divergence-time dating and coalescent demographic models inan efficient statistical inference engine using Markov chain Monte Carlo integration A convenient cross-platform graphicaluser interface allows the flexible construction of complex evolutionary analyses

Key words phylogenetics phylodynamics Bayesian inference Markov chain Monte Carlo

1 Introduction

First released over 14 years ago the Bayesian EvolutionaryAnalysis by Sampling Trees (BEAST) software package has be-come firmly established in a broad diversity of biological fieldsfrom phylogenetics and paleontology population dynamics

ancient DNA and the phylodynamics and molecular epidemiol-ogy of infectious disease (Drummond et al 2012) BEASTrsquos spe-cific focus on time-scaled trees and the evolutionary analysesdependent on them has given it a unique place in the toolboxof molecular evolution and phylogenetic researchers Since in-ception a strong motivation for BEAST development has been

VC The Author(s) 2018 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby40)which permits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

1

Virus Evolution 2018 4(1) vey016

doi 101093vevey016Resources

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

the rapid growth of pathogen genome sequencing as part ofpublic health responses to infectious diseases (Grenfell et al2004) In particular fast evolving viruses can now be tracked innear real-time (see eg Quick et al 2016) to understand their ep-idemiology and evolutionary dynamics

In BEAST version 110 we have introduced a series of advan-ces with a particular focus on delivering accurate and informa-tive insights for infectious disease research through theintegration of diverse data sources including phenotypic andepidemiological information with molecular evolutionarymodels These advances fall into three broad themesmdashthe inte-gration of diverse sources of extrinsic information as covariatesof evolutionary processes the increased flexibility and modula-rization of the model design process with robust and accuratemodel testing methods and substantial improvements on thespeed and efficiency of the statistical inference

2 Data integration

Many traits in phylogenetics are represented as or partitionedinto a finite number of discrete values with geographical loca-tion standing out as a popular example Because BEAST is dedi-cated to sampling time-scaled phylogenies new developmentsof discrete character mapping enable the reconstruction oftimed viral dispersal patterns while accommodating phyloge-netic uncertainty By extending the discrete diffusion models toincorporate empirical data as covariates or predictors of transi-tion rates BEAST can simultaneously test and quantify a rangeof potential predictive variables of the diffusion process (Lemeyet al 2014) Further realizations of the trait transition processcan also be efficiently produced to pinpoint the nature and tim-ing of changes in evolutionary history beyond ancestral nodestate reconstruction (termed Markov jumps) or to infer the timespent in a particular state (Markov rewards) (Minin and Suchard2008) For molecular data fast stochastic mapping approachesare also employed to obtain site-specific dN=dS estimates inte-grating over the posterior distribution of phylogenies and an-cestral reconstructions to quantify uncertainty on thesemeasures of the selective forces on individual codons (Lemeyet al 2012)

Multivariate continuous traits are incorporated using phylo-genetic Brownian diffusion processes modelling the shared an-cestral dependence across taxa and the correlations betweenthese variables Such continuous models have most frequentlybeen applied to diffusion on a geographical landscape with thetraits representing coordinates and the phylogeny reconstruct-ing the epidemiological process within the host population(Lemey et al 2010) The landscapes can also represent otherspaces and integration of antibody binding assay data have ex-tended lsquoantigenic cartographyrsquo (Smith et al 2004) approaches tomodel simultaneous antigenic and genetic evolution and inferthe viral trajectories in the immunological space generated bythe host population (Bedford et al 2014)

Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may however be unreal-istic for many evolutionary problems (including geographicalreconstruction) A recently developed relaxed directional ran-dom walk allows the diffusion processes to take on different di-rectional trends in different parts of the phylogeny whilepreserving model identifiability (Gill et al 2017) and opens upthese processes for a wide range of applications BEAST 110also extends multivariate phylogenetic diffusion to latent liabil-ity model formulations in order to assess correlations betweentraits of different data types including (various combinations

of) continuous binary and discrete traits (Cybis et al 2015) asdemonstrated by applications to flower morphology antibioticresistance and viral epitope evolution To infer correlations be-tween high-dimensional traits computationally efficiently anovel phylogenetic factor analysis approach assumes that asmall unknown number of independent evolutionary factorsevolve along the phylogeny and generate clusters of dependenttraits at the tips (Tolkoff et al 2018)

Further extending the data integration approach BEAST 110

includes a flexible framework for incorporating time-varyingcovariates of the effective population size over time This usesGaussian Markov random fields to reconstruct smoothed effec-tive population size trajectories while simultaneously estimat-ing to what extent predictor variables (eg fluctuations inclimatic factors host mobility or vector density) may havedriven the dynamics (Gill et al 2016) Using a similar general-ized linear modeling (GLM) approach classical epidemiologicaltime-series data such as case counts (Gill et al 2016) can be inte-grated with pathogen genome sequence data to provide joint in-ference of important epidemiological parameters

Finally recent host-transmission models allow the integra-tion of complete or partial knowledge of a pathogenrsquos transmis-sion history enabling the simultaneous inference of within-host population dynamics viral evolutionary processes andtransmission times and bottlenecks (Vrancken et al 2014)Likewise other priors enable the reconstruction of transmissiontrees of infectious disease epidemics and outbreaks while ac-commodating phylogenetic uncertainty and employ a newlydesigned set of phylogenetic tree proposals that respect nodepartitions (Hall et al 2015)

3 Flexible model design

BEASTrsquos companion graphical user interface program BEAUtiallows the user to import data select models choose prior dis-tributions and specify the settings for both Bayesian inferenceand marginal likelihood estimation Our efforts on BEAUti 110have focused on allowing the user to easily link or unlink substi-tution clock and tree models across multiple partitions as well

as linking individual parameters to provide considerable adapt-ability in model design Additionally BEAUti can also group var-ious parameters in a hierarchical phylogenetic model prior(Suchard et al 2003) which allows parameters to take differentvalues but be linked by a common distribution the parametersof which can then be inferred For example flexible codonmodel parameterizations using hierarchical phylogenetic mod-els (Baele et al 2016b) and incorporating a range of potentialpredictive variables for substitution behaviour (Bielejec et al2016a) provide insight into the tempo and mode of pathogenevolution

Marginal likelihood estimation to compare models usingBayes factors has become common practice in Bayesian phylo-genetic inference BEAST 110 now features marginal likelihoodestimation (Baele et al 2012) using path sampling (Gelman andMeng 1998 Lartillot and Philippe 2006) and stepping-stone sam-pling (Xie et al 2011) as well as the recently developed general-ized stepping-stone sampling (Fan et al 2011 Baele et al 2016a)that offers increased accuracy and improved numerical stabilityby employing the concept of lsquoworking distributionsrsquo ie distri-butions with known normalizing constants and parameterizedusing samples from the posterior distribution

2 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

4 Performance and efficiency

Increasing model complexity and sequence availability inmodern-day analyses have stretched the computationaldemands of Bayesian phylogenetic inference To improve effi-ciency for large-scale sequence data BEAST 110 uses theBEAGLE library (Ayres et al 2012) that provides access to mas-sive parallelization on a range of computing architecturesIn particular the combination of BEAST 110 with BEAGLE 30(Ayres et al under review) allows multiple data partitions to beparallelized across a single high-performance device (ie aGPGPU graphics board) allowing for the utilization of the full ca-pacity of these devices reducing the computational overheadsAs the complexity of phylogenetic model designs increase con-comitant with the surge in scale of genomic data updating onlya parameter associated with a single data partition limits theoccupation of the massively multicore devices To address thiswe have developed an adaptive multivariate transition kernelthat simultaneously updates parameters across all the parti-tioned data making more efficient use of available hardware(Baele et al 2017) Through a combination of these two

advances BEAST 110 can yield a sizeable increase in effectivelyindependent posterior samples per unit-time over previoussoftware versions For the example data described below wesee a 5- to 25-fold improvement depending on the model pa-rameter using an NVIDIA Titan V

41 Example

Figure 1 presents a spatiotemporal reconstruction of Ebola virusevolution and spread during the 2013ndash2016 West African epi-demic highlighting several aspects of phylodynamic data inte-gration The estimates are based on a large data set of 1610genomes that represent over 5 per cent of the known cases(Dudas et al 2017) Administrative regions (nfrac14 56) are includedas discrete sampling locations to estimate viral dispersalthrough time while testing the contribution of a set of potentialcovariates to the pattern of spread using a GLM parameteriza-tion of phylogeographic diffusion (Lemey et al 2014) This indi-cates for example the importance of population sizes andgeographic distance to explain viral dispersal intensities

Figure 1 Phylodynamic analysis of the 2013ndash2016 West African Ebola virus epidemic encompassing simultaneous estimation of sequence and discrete (geographic)

trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations Plotted are a snapshot of geo-

graphic spread using SpreaD3 (Bielejec et al 2016b) the maximum clade credibility tree the posterior estimates of the GLM coefficients for seven possible predictors

for Ebola virus spread (Bayes Factor support values of 3 20 and 150 are indicated by vertical lines) and the effective population size through time estimated by incorpo-

rating case counts

M A Suchard et al | 3

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

5 Relationship to BEAST2 and other software

Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains

A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree

51 Availability

BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)

Acknowledgements

We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo

KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research

Conflict of interest None declared

ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30

Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]

Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3

Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67

amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805

amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64

Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005

Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914

Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023

amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9

Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537

Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969

Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73

Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J

4 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l
Page 2: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

Bayesian phylogenetic and phylodynamic data

integration using BEAST 110Marc A Suchard123dagger Philippe Lemey4Dagger Guy Baele4sect Daniel L Ayres5

Alexei J Drummond67 and Andrew Rambaut81Department of Biomathematics David Geffen School of Medicine University of California Los Angeles 621Charles E Young Dr South Los Angeles CA 90095 USA 2Department of Biostatistics Fielding School ofPublic Health University of California Los Angeles 650 Charles E Young Dr South Los Angeles CA 90095USA 3Department of Human Genetics David Geffen School of Medicine University of California Los Angeles695 Charles E Young Dr South Los Angeles CA 90095 USA 4Department of Microbiology and ImmunologyRega Institute KU Leuven Herestraat 49 3000 Leuven Belgium 5Center for Bioinformatics andComputational Biology University of Maryland College Park 125 Biomolecular Science Bldg 296 CollegePark MD 20742 USA 6Department of Computer Science University of Auckland 30338 Princes StAuckland 1010 NZ 7Centre for Computational Evolution University of Auckland 30338 Princes StAuckland 1010 NZ and 8Institute of Evolutionary Biology University of Edinburgh Ashworth LaboratoriesEdinburgh EH9 3FL UK

Corresponding author E-mail msucharduclaedu (MAS) alexeicsaucklandacnz (AJD) arambautedacuk (AR)daggerhttporcidorg0000-0001-9818-479X

Daggerhttporcidorg0000-0003-2826-5353

secthttporcidorg0000-0002-1915-7732

httporcidorg0000-0003-4337-3707

Abstract

The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesianphylogenetic and phylodynamic inference from genetic sequence data BEAST unifies molecular phylogenetic reconstruc-tion with complex discrete and continuous trait evolution divergence-time dating and coalescent demographic models inan efficient statistical inference engine using Markov chain Monte Carlo integration A convenient cross-platform graphicaluser interface allows the flexible construction of complex evolutionary analyses

Key words phylogenetics phylodynamics Bayesian inference Markov chain Monte Carlo

1 Introduction

First released over 14 years ago the Bayesian EvolutionaryAnalysis by Sampling Trees (BEAST) software package has be-come firmly established in a broad diversity of biological fieldsfrom phylogenetics and paleontology population dynamics

ancient DNA and the phylodynamics and molecular epidemiol-ogy of infectious disease (Drummond et al 2012) BEASTrsquos spe-cific focus on time-scaled trees and the evolutionary analysesdependent on them has given it a unique place in the toolboxof molecular evolution and phylogenetic researchers Since in-ception a strong motivation for BEAST development has been

VC The Author(s) 2018 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby40)which permits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

1

Virus Evolution 2018 4(1) vey016

doi 101093vevey016Resources

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

the rapid growth of pathogen genome sequencing as part ofpublic health responses to infectious diseases (Grenfell et al2004) In particular fast evolving viruses can now be tracked innear real-time (see eg Quick et al 2016) to understand their ep-idemiology and evolutionary dynamics

In BEAST version 110 we have introduced a series of advan-ces with a particular focus on delivering accurate and informa-tive insights for infectious disease research through theintegration of diverse data sources including phenotypic andepidemiological information with molecular evolutionarymodels These advances fall into three broad themesmdashthe inte-gration of diverse sources of extrinsic information as covariatesof evolutionary processes the increased flexibility and modula-rization of the model design process with robust and accuratemodel testing methods and substantial improvements on thespeed and efficiency of the statistical inference

2 Data integration

Many traits in phylogenetics are represented as or partitionedinto a finite number of discrete values with geographical loca-tion standing out as a popular example Because BEAST is dedi-cated to sampling time-scaled phylogenies new developmentsof discrete character mapping enable the reconstruction oftimed viral dispersal patterns while accommodating phyloge-netic uncertainty By extending the discrete diffusion models toincorporate empirical data as covariates or predictors of transi-tion rates BEAST can simultaneously test and quantify a rangeof potential predictive variables of the diffusion process (Lemeyet al 2014) Further realizations of the trait transition processcan also be efficiently produced to pinpoint the nature and tim-ing of changes in evolutionary history beyond ancestral nodestate reconstruction (termed Markov jumps) or to infer the timespent in a particular state (Markov rewards) (Minin and Suchard2008) For molecular data fast stochastic mapping approachesare also employed to obtain site-specific dN=dS estimates inte-grating over the posterior distribution of phylogenies and an-cestral reconstructions to quantify uncertainty on thesemeasures of the selective forces on individual codons (Lemeyet al 2012)

Multivariate continuous traits are incorporated using phylo-genetic Brownian diffusion processes modelling the shared an-cestral dependence across taxa and the correlations betweenthese variables Such continuous models have most frequentlybeen applied to diffusion on a geographical landscape with thetraits representing coordinates and the phylogeny reconstruct-ing the epidemiological process within the host population(Lemey et al 2010) The landscapes can also represent otherspaces and integration of antibody binding assay data have ex-tended lsquoantigenic cartographyrsquo (Smith et al 2004) approaches tomodel simultaneous antigenic and genetic evolution and inferthe viral trajectories in the immunological space generated bythe host population (Bedford et al 2014)

Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may however be unreal-istic for many evolutionary problems (including geographicalreconstruction) A recently developed relaxed directional ran-dom walk allows the diffusion processes to take on different di-rectional trends in different parts of the phylogeny whilepreserving model identifiability (Gill et al 2017) and opens upthese processes for a wide range of applications BEAST 110also extends multivariate phylogenetic diffusion to latent liabil-ity model formulations in order to assess correlations betweentraits of different data types including (various combinations

of) continuous binary and discrete traits (Cybis et al 2015) asdemonstrated by applications to flower morphology antibioticresistance and viral epitope evolution To infer correlations be-tween high-dimensional traits computationally efficiently anovel phylogenetic factor analysis approach assumes that asmall unknown number of independent evolutionary factorsevolve along the phylogeny and generate clusters of dependenttraits at the tips (Tolkoff et al 2018)

Further extending the data integration approach BEAST 110

includes a flexible framework for incorporating time-varyingcovariates of the effective population size over time This usesGaussian Markov random fields to reconstruct smoothed effec-tive population size trajectories while simultaneously estimat-ing to what extent predictor variables (eg fluctuations inclimatic factors host mobility or vector density) may havedriven the dynamics (Gill et al 2016) Using a similar general-ized linear modeling (GLM) approach classical epidemiologicaltime-series data such as case counts (Gill et al 2016) can be inte-grated with pathogen genome sequence data to provide joint in-ference of important epidemiological parameters

Finally recent host-transmission models allow the integra-tion of complete or partial knowledge of a pathogenrsquos transmis-sion history enabling the simultaneous inference of within-host population dynamics viral evolutionary processes andtransmission times and bottlenecks (Vrancken et al 2014)Likewise other priors enable the reconstruction of transmissiontrees of infectious disease epidemics and outbreaks while ac-commodating phylogenetic uncertainty and employ a newlydesigned set of phylogenetic tree proposals that respect nodepartitions (Hall et al 2015)

3 Flexible model design

BEASTrsquos companion graphical user interface program BEAUtiallows the user to import data select models choose prior dis-tributions and specify the settings for both Bayesian inferenceand marginal likelihood estimation Our efforts on BEAUti 110have focused on allowing the user to easily link or unlink substi-tution clock and tree models across multiple partitions as well

as linking individual parameters to provide considerable adapt-ability in model design Additionally BEAUti can also group var-ious parameters in a hierarchical phylogenetic model prior(Suchard et al 2003) which allows parameters to take differentvalues but be linked by a common distribution the parametersof which can then be inferred For example flexible codonmodel parameterizations using hierarchical phylogenetic mod-els (Baele et al 2016b) and incorporating a range of potentialpredictive variables for substitution behaviour (Bielejec et al2016a) provide insight into the tempo and mode of pathogenevolution

Marginal likelihood estimation to compare models usingBayes factors has become common practice in Bayesian phylo-genetic inference BEAST 110 now features marginal likelihoodestimation (Baele et al 2012) using path sampling (Gelman andMeng 1998 Lartillot and Philippe 2006) and stepping-stone sam-pling (Xie et al 2011) as well as the recently developed general-ized stepping-stone sampling (Fan et al 2011 Baele et al 2016a)that offers increased accuracy and improved numerical stabilityby employing the concept of lsquoworking distributionsrsquo ie distri-butions with known normalizing constants and parameterizedusing samples from the posterior distribution

2 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

4 Performance and efficiency

Increasing model complexity and sequence availability inmodern-day analyses have stretched the computationaldemands of Bayesian phylogenetic inference To improve effi-ciency for large-scale sequence data BEAST 110 uses theBEAGLE library (Ayres et al 2012) that provides access to mas-sive parallelization on a range of computing architecturesIn particular the combination of BEAST 110 with BEAGLE 30(Ayres et al under review) allows multiple data partitions to beparallelized across a single high-performance device (ie aGPGPU graphics board) allowing for the utilization of the full ca-pacity of these devices reducing the computational overheadsAs the complexity of phylogenetic model designs increase con-comitant with the surge in scale of genomic data updating onlya parameter associated with a single data partition limits theoccupation of the massively multicore devices To address thiswe have developed an adaptive multivariate transition kernelthat simultaneously updates parameters across all the parti-tioned data making more efficient use of available hardware(Baele et al 2017) Through a combination of these two

advances BEAST 110 can yield a sizeable increase in effectivelyindependent posterior samples per unit-time over previoussoftware versions For the example data described below wesee a 5- to 25-fold improvement depending on the model pa-rameter using an NVIDIA Titan V

41 Example

Figure 1 presents a spatiotemporal reconstruction of Ebola virusevolution and spread during the 2013ndash2016 West African epi-demic highlighting several aspects of phylodynamic data inte-gration The estimates are based on a large data set of 1610genomes that represent over 5 per cent of the known cases(Dudas et al 2017) Administrative regions (nfrac14 56) are includedas discrete sampling locations to estimate viral dispersalthrough time while testing the contribution of a set of potentialcovariates to the pattern of spread using a GLM parameteriza-tion of phylogeographic diffusion (Lemey et al 2014) This indi-cates for example the importance of population sizes andgeographic distance to explain viral dispersal intensities

Figure 1 Phylodynamic analysis of the 2013ndash2016 West African Ebola virus epidemic encompassing simultaneous estimation of sequence and discrete (geographic)

trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations Plotted are a snapshot of geo-

graphic spread using SpreaD3 (Bielejec et al 2016b) the maximum clade credibility tree the posterior estimates of the GLM coefficients for seven possible predictors

for Ebola virus spread (Bayes Factor support values of 3 20 and 150 are indicated by vertical lines) and the effective population size through time estimated by incorpo-

rating case counts

M A Suchard et al | 3

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

5 Relationship to BEAST2 and other software

Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains

A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree

51 Availability

BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)

Acknowledgements

We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo

KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research

Conflict of interest None declared

ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30

Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]

Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3

Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67

amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805

amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64

Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005

Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914

Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023

amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9

Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537

Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969

Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73

Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J

4 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l
Page 3: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

the rapid growth of pathogen genome sequencing as part ofpublic health responses to infectious diseases (Grenfell et al2004) In particular fast evolving viruses can now be tracked innear real-time (see eg Quick et al 2016) to understand their ep-idemiology and evolutionary dynamics

In BEAST version 110 we have introduced a series of advan-ces with a particular focus on delivering accurate and informa-tive insights for infectious disease research through theintegration of diverse data sources including phenotypic andepidemiological information with molecular evolutionarymodels These advances fall into three broad themesmdashthe inte-gration of diverse sources of extrinsic information as covariatesof evolutionary processes the increased flexibility and modula-rization of the model design process with robust and accuratemodel testing methods and substantial improvements on thespeed and efficiency of the statistical inference

2 Data integration

Many traits in phylogenetics are represented as or partitionedinto a finite number of discrete values with geographical loca-tion standing out as a popular example Because BEAST is dedi-cated to sampling time-scaled phylogenies new developmentsof discrete character mapping enable the reconstruction oftimed viral dispersal patterns while accommodating phyloge-netic uncertainty By extending the discrete diffusion models toincorporate empirical data as covariates or predictors of transi-tion rates BEAST can simultaneously test and quantify a rangeof potential predictive variables of the diffusion process (Lemeyet al 2014) Further realizations of the trait transition processcan also be efficiently produced to pinpoint the nature and tim-ing of changes in evolutionary history beyond ancestral nodestate reconstruction (termed Markov jumps) or to infer the timespent in a particular state (Markov rewards) (Minin and Suchard2008) For molecular data fast stochastic mapping approachesare also employed to obtain site-specific dN=dS estimates inte-grating over the posterior distribution of phylogenies and an-cestral reconstructions to quantify uncertainty on thesemeasures of the selective forces on individual codons (Lemeyet al 2012)

Multivariate continuous traits are incorporated using phylo-genetic Brownian diffusion processes modelling the shared an-cestral dependence across taxa and the correlations betweenthese variables Such continuous models have most frequentlybeen applied to diffusion on a geographical landscape with thetraits representing coordinates and the phylogeny reconstruct-ing the epidemiological process within the host population(Lemey et al 2010) The landscapes can also represent otherspaces and integration of antibody binding assay data have ex-tended lsquoantigenic cartographyrsquo (Smith et al 2004) approaches tomodel simultaneous antigenic and genetic evolution and inferthe viral trajectories in the immunological space generated bythe host population (Bedford et al 2014)

Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may however be unreal-istic for many evolutionary problems (including geographicalreconstruction) A recently developed relaxed directional ran-dom walk allows the diffusion processes to take on different di-rectional trends in different parts of the phylogeny whilepreserving model identifiability (Gill et al 2017) and opens upthese processes for a wide range of applications BEAST 110also extends multivariate phylogenetic diffusion to latent liabil-ity model formulations in order to assess correlations betweentraits of different data types including (various combinations

of) continuous binary and discrete traits (Cybis et al 2015) asdemonstrated by applications to flower morphology antibioticresistance and viral epitope evolution To infer correlations be-tween high-dimensional traits computationally efficiently anovel phylogenetic factor analysis approach assumes that asmall unknown number of independent evolutionary factorsevolve along the phylogeny and generate clusters of dependenttraits at the tips (Tolkoff et al 2018)

Further extending the data integration approach BEAST 110

includes a flexible framework for incorporating time-varyingcovariates of the effective population size over time This usesGaussian Markov random fields to reconstruct smoothed effec-tive population size trajectories while simultaneously estimat-ing to what extent predictor variables (eg fluctuations inclimatic factors host mobility or vector density) may havedriven the dynamics (Gill et al 2016) Using a similar general-ized linear modeling (GLM) approach classical epidemiologicaltime-series data such as case counts (Gill et al 2016) can be inte-grated with pathogen genome sequence data to provide joint in-ference of important epidemiological parameters

Finally recent host-transmission models allow the integra-tion of complete or partial knowledge of a pathogenrsquos transmis-sion history enabling the simultaneous inference of within-host population dynamics viral evolutionary processes andtransmission times and bottlenecks (Vrancken et al 2014)Likewise other priors enable the reconstruction of transmissiontrees of infectious disease epidemics and outbreaks while ac-commodating phylogenetic uncertainty and employ a newlydesigned set of phylogenetic tree proposals that respect nodepartitions (Hall et al 2015)

3 Flexible model design

BEASTrsquos companion graphical user interface program BEAUtiallows the user to import data select models choose prior dis-tributions and specify the settings for both Bayesian inferenceand marginal likelihood estimation Our efforts on BEAUti 110have focused on allowing the user to easily link or unlink substi-tution clock and tree models across multiple partitions as well

as linking individual parameters to provide considerable adapt-ability in model design Additionally BEAUti can also group var-ious parameters in a hierarchical phylogenetic model prior(Suchard et al 2003) which allows parameters to take differentvalues but be linked by a common distribution the parametersof which can then be inferred For example flexible codonmodel parameterizations using hierarchical phylogenetic mod-els (Baele et al 2016b) and incorporating a range of potentialpredictive variables for substitution behaviour (Bielejec et al2016a) provide insight into the tempo and mode of pathogenevolution

Marginal likelihood estimation to compare models usingBayes factors has become common practice in Bayesian phylo-genetic inference BEAST 110 now features marginal likelihoodestimation (Baele et al 2012) using path sampling (Gelman andMeng 1998 Lartillot and Philippe 2006) and stepping-stone sam-pling (Xie et al 2011) as well as the recently developed general-ized stepping-stone sampling (Fan et al 2011 Baele et al 2016a)that offers increased accuracy and improved numerical stabilityby employing the concept of lsquoworking distributionsrsquo ie distri-butions with known normalizing constants and parameterizedusing samples from the posterior distribution

2 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

4 Performance and efficiency

Increasing model complexity and sequence availability inmodern-day analyses have stretched the computationaldemands of Bayesian phylogenetic inference To improve effi-ciency for large-scale sequence data BEAST 110 uses theBEAGLE library (Ayres et al 2012) that provides access to mas-sive parallelization on a range of computing architecturesIn particular the combination of BEAST 110 with BEAGLE 30(Ayres et al under review) allows multiple data partitions to beparallelized across a single high-performance device (ie aGPGPU graphics board) allowing for the utilization of the full ca-pacity of these devices reducing the computational overheadsAs the complexity of phylogenetic model designs increase con-comitant with the surge in scale of genomic data updating onlya parameter associated with a single data partition limits theoccupation of the massively multicore devices To address thiswe have developed an adaptive multivariate transition kernelthat simultaneously updates parameters across all the parti-tioned data making more efficient use of available hardware(Baele et al 2017) Through a combination of these two

advances BEAST 110 can yield a sizeable increase in effectivelyindependent posterior samples per unit-time over previoussoftware versions For the example data described below wesee a 5- to 25-fold improvement depending on the model pa-rameter using an NVIDIA Titan V

41 Example

Figure 1 presents a spatiotemporal reconstruction of Ebola virusevolution and spread during the 2013ndash2016 West African epi-demic highlighting several aspects of phylodynamic data inte-gration The estimates are based on a large data set of 1610genomes that represent over 5 per cent of the known cases(Dudas et al 2017) Administrative regions (nfrac14 56) are includedas discrete sampling locations to estimate viral dispersalthrough time while testing the contribution of a set of potentialcovariates to the pattern of spread using a GLM parameteriza-tion of phylogeographic diffusion (Lemey et al 2014) This indi-cates for example the importance of population sizes andgeographic distance to explain viral dispersal intensities

Figure 1 Phylodynamic analysis of the 2013ndash2016 West African Ebola virus epidemic encompassing simultaneous estimation of sequence and discrete (geographic)

trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations Plotted are a snapshot of geo-

graphic spread using SpreaD3 (Bielejec et al 2016b) the maximum clade credibility tree the posterior estimates of the GLM coefficients for seven possible predictors

for Ebola virus spread (Bayes Factor support values of 3 20 and 150 are indicated by vertical lines) and the effective population size through time estimated by incorpo-

rating case counts

M A Suchard et al | 3

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

5 Relationship to BEAST2 and other software

Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains

A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree

51 Availability

BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)

Acknowledgements

We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo

KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research

Conflict of interest None declared

ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30

Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]

Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3

Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67

amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805

amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64

Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005

Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914

Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023

amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9

Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537

Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969

Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73

Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J

4 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l
Page 4: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

4 Performance and efficiency

Increasing model complexity and sequence availability inmodern-day analyses have stretched the computationaldemands of Bayesian phylogenetic inference To improve effi-ciency for large-scale sequence data BEAST 110 uses theBEAGLE library (Ayres et al 2012) that provides access to mas-sive parallelization on a range of computing architecturesIn particular the combination of BEAST 110 with BEAGLE 30(Ayres et al under review) allows multiple data partitions to beparallelized across a single high-performance device (ie aGPGPU graphics board) allowing for the utilization of the full ca-pacity of these devices reducing the computational overheadsAs the complexity of phylogenetic model designs increase con-comitant with the surge in scale of genomic data updating onlya parameter associated with a single data partition limits theoccupation of the massively multicore devices To address thiswe have developed an adaptive multivariate transition kernelthat simultaneously updates parameters across all the parti-tioned data making more efficient use of available hardware(Baele et al 2017) Through a combination of these two

advances BEAST 110 can yield a sizeable increase in effectivelyindependent posterior samples per unit-time over previoussoftware versions For the example data described below wesee a 5- to 25-fold improvement depending on the model pa-rameter using an NVIDIA Titan V

41 Example

Figure 1 presents a spatiotemporal reconstruction of Ebola virusevolution and spread during the 2013ndash2016 West African epi-demic highlighting several aspects of phylodynamic data inte-gration The estimates are based on a large data set of 1610genomes that represent over 5 per cent of the known cases(Dudas et al 2017) Administrative regions (nfrac14 56) are includedas discrete sampling locations to estimate viral dispersalthrough time while testing the contribution of a set of potentialcovariates to the pattern of spread using a GLM parameteriza-tion of phylogeographic diffusion (Lemey et al 2014) This indi-cates for example the importance of population sizes andgeographic distance to explain viral dispersal intensities

Figure 1 Phylodynamic analysis of the 2013ndash2016 West African Ebola virus epidemic encompassing simultaneous estimation of sequence and discrete (geographic)

trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations Plotted are a snapshot of geo-

graphic spread using SpreaD3 (Bielejec et al 2016b) the maximum clade credibility tree the posterior estimates of the GLM coefficients for seven possible predictors

for Ebola virus spread (Bayes Factor support values of 3 20 and 150 are indicated by vertical lines) and the effective population size through time estimated by incorpo-

rating case counts

M A Suchard et al | 3

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

5 Relationship to BEAST2 and other software

Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains

A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree

51 Availability

BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)

Acknowledgements

We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo

KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research

Conflict of interest None declared

ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30

Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]

Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3

Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67

amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805

amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64

Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005

Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914

Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023

amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9

Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537

Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969

Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73

Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J

4 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l
Page 5: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

5 Relationship to BEAST2 and other software

Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains

A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree

51 Availability

BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)

Acknowledgements

We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo

KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research

Conflict of interest None declared

ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30

Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]

Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3

Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67

amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805

amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64

Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005

Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914

Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023

amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9

Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537

Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969

Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73

Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J

4 | Virus Evolution 2018 Vol 4 No 1

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l
Page 6: Edinburgh Research Explorer · 2018. 7. 13. · 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes

Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15

Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32

Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85

Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319

Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56

Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32

Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613

Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207

Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect

Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56

Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932

amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85

Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95

Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32

Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042

Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6

Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64

To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97

Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99

Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025

Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505

Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60

M A Suchard et al | 5

Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018

  • l