12
Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U. Leuven, Belgium, and Departments of Biomathematics, Biostatistics and Human Genetics University of California, Los Angeles SISMID Lemey and Suchard (UCLA) SISMID 1 / 16 Review: Continuous-Time Coalescent Time measured in N generation units N = const u k Exp ( k 2 ) N = N (t ) Pr(u k > t |t k +1 )= e -( k 2 ) R t +t k +1 t k +1 N N(u) du u k are not independent any more Constant population size Exponential growth Lemey and Suchard (UCLA) SISMID 2 / 16

Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

  • Upload
    others

  • View
    21

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Non-Parametric Bayesian Population DynamicsInference

Philippe Lemey and Marc A. Suchard

Department of Microbiology and ImmunologyK.U. Leuven, Belgium, and

Departments of Biomathematics, Biostatistics and Human GeneticsUniversity of California, Los Angeles

SISMID

Lemey and Suchard (UCLA) SISMID 1 / 16

Review: Continuous-Time Coalescent

Time measured in N generation unitsN = const ! uk " Exp

!"k2#$

N = N(t) !

Pr(uk > t |tk+1) = e!(k2)R t+tk+1tk+1

NN(u)du

uk are not independent any more

Constantpopulation size

Exponentialgrowth

Lemey and Suchard (UCLA) SISMID 2 / 16

Page 2: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Review: Continuous-Time Coalescent

Time measured in N generation unitsN = const ! uk " Exp

!"k2#$

N = N(t) !

Pr(uk > t |tk+1) = e!(k2)R t+tk+1tk+1

NN(u)du

uk are not independent any more

Constantpopulation size

Exponentialgrowth

Lemey and Suchard (UCLA) SISMID 2 / 16

Review: Continuous-Time Coalescent

Time

t4

t3

t2

t1

u4

u3

u2

Time measured in N generation unitsN = const ! uk " Exp

!"k2#$

N = N(t) !

Pr(uk > t |tk+1) = e!(k2)R t+tk+1tk+1

NN(u)du

uk are not independent any more

Constantpopulation size

Exponentialgrowth

Lemey and Suchard (UCLA) SISMID 2 / 16

Page 3: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Review: Continuous-Time Coalescent

Time

t4

t3

t2

t1

u4

u3

u2

Time measured in N generation unitsN = const ! uk " Exp

!"k2#$

N = N(t) !

Pr(uk > t |tk+1) = e!(k2)R t+tk+1tk+1

NN(u)du

uk are not independent any more

Constantpopulation size

Exponentialgrowth

N(t) = N

N(t) = Ne−100t

Lemey and Suchard (UCLA) SISMID 2 / 16

Sequence Data ! Population Model Parameters

accggaaacgcgcgaaatttacacgggggaccggaaacgcgcgaaatttacacggggg

accggaaacgcgcgaaatttacacgggggaccggaaacgcgcgaaatttacacgggggaccggaaacgcgcgaaatttacacggggg N(t)

Time

GenealogySequence Data Pop. Dynamics

More Formally (Bayesian Approach):Pr (G, Q,! |D) # Pr (D |G, Q) Pr (Q) Pr (G |!) Pr (!)

G - genealogy with branch lengthsQ - substitution matrix! - population genetics parametersD - sequence dataPr (G |!) - Coalescent prior

Lemey and Suchard (UCLA) SISMID 3 / 16

Page 4: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Piecewise Constant Demographic Model

u3 u4

t2t1 t3 t4 t5= 0

u2

42Number of Lineages: 53

5u

Isochronous DataNe(t) = !k for tk < t $ tk!1.

u2, . . . ,un are independent

Pr (uk | !k ) = k(k!1)2!k

e!k(k!1)uk

2!k

Pr (F |!) #%n

k=2 Pr (uk | !k )

Equivalent to estimating exponential mean from one observation.

Need further restrictions to estimate !!Lemey and Suchard (UCLA) SISMID 4 / 16

Piecewise Constant Demographic Model

Number of Lineages:

41 w51w50 w52

u3 u4 u5

w30 w40

u2

w20

343432 1

w

Heterochronous Dataw20, . . . ,wnjn are independent

Pr(wk0 | !k )=nk0(nk0!1)

2!ke!

nk0(nk0!1)wk02!k

Pr(wkj | !k)=e!

nkj (nkj!1)wkj2!k , j>0

Pr (F |!) "Qn

k=2Qjk

j=0 Pr`

wkj | !k´

Equivalent to estimating exponential mean from one observation.

Need further restrictions to estimate !!Lemey and Suchard (UCLA) SISMID 4 / 16

Page 5: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Piecewise Constant Demographic Model

Number of Lineages:

41 w51w50 w52

u3 u4 u5

w30 w40

u2

w20

343432 1

w

Heterochronous Dataw20, . . . ,wnjn are independent

Pr(wk0 | !k )=nk0(nk0!1)

2!ke!

nk0(nk0!1)wk02!k

Pr(wkj | !k)=e!

nkj (nkj!1)wkj2!k , j>0

Pr (F |!) "Qn

k=2Qjk

j=0 Pr`

wkj | !k´

Equivalent to estimating exponential mean from one observation.

Need further restrictions to estimate !!Lemey and Suchard (UCLA) SISMID 4 / 16

Current Approaches

Strimmer and Pybus (2001)Make Ne(t) constant across some inter-Coalescent timesGroup inter-Coalescent intervals with AIC

Drummond et al. (2005)Multiple change-point model with fixed number of change-pointsChange-points allowed only at Coalescent eventsJoint estimation of phylogenies and population dynamics

Opgen-Rhein et al. (2005)Multiple change-point model with random number ofchange-pointsChange-points allowed anywhere in interval (0, t1]Posterior is approximated with rjMCMC

Lemey and Suchard (UCLA) SISMID 5 / 16

Page 6: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Current Approaches

Strimmer and Pybus (2001)Make Ne(t) constant across some inter-Coalescent timesGroup inter-Coalescent intervals with AIC

Drummond et al. (2005)Multiple change-point model with fixed number of change-pointsChange-points allowed only at Coalescent eventsJoint estimation of phylogenies and population dynamics

Opgen-Rhein et al. (2005)Multiple change-point model with random number ofchange-pointsChange-points allowed anywhere in interval (0, t1]Posterior is approximated with rjMCMC

Lemey and Suchard (UCLA) SISMID 5 / 16

Current Approaches

Strimmer and Pybus (2001)Make Ne(t) constant across some inter-Coalescent timesGroup inter-Coalescent intervals with AIC

Drummond et al. (2005)Multiple change-point model with fixed number of change-pointsChange-points allowed only at Coalescent eventsJoint estimation of phylogenies and population dynamics

Opgen-Rhein et al. (2005)Multiple change-point model with random number ofchange-pointsChange-points allowed anywhere in interval (0, t1]Posterior is approximated with rjMCMC

Lemey and Suchard (UCLA) SISMID 5 / 16

Page 7: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Smoothing Prior (GMRF approach)Go to the log scale xk = log !k

Pr(x |") # "(n!2)/2 exp&

%"

2

n!2'

k=1

1dk

(xk+1 % xk )2(

31x x4 xn!2

xn!1

d1 d2 d3 dn−2

x2 x

Weighting Schemes1 Uniform: dk = 12 Time-Aware: dk = uk+1+uk

2

Pr(x,") = Pr(x |")Pr(")

Pr(") # ""!1e!#$, diffuse prior with # = 0.01, $ = 0.01Lemey and Suchard (UCLA) SISMID 6 / 16

MCMC Algorithm

Pr (G, Q, x |D) # Pr (D |G, Q) Pr (Q) Pr (G |x) Pr (x)

Updating Population Size TrajectoryUse fast GMRF sampling (Rue et al., 2001, 2004)Draw "# from an arbitrary univariate proposal distributionUse Gaussian approximation of Pr(x |"#, G) to propose x#

Jointly accept/reject ("#, x#) in Metropolis-Hastings step

Object-Oriented Reality?

BEAST = Bayesian Evolutionary Analysis Sampling Trees

Pr(G |x, D, Q) - sampled by BEASTPr(Q |G, D) - sampled by BEAST

Lemey and Suchard (UCLA) SISMID 7 / 16

Page 8: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Simulation: Constant Population SizeClassical Skyline Plot

Effe

ctive

Pop

ulat

ion

Size

1.5 1.0 0.5 0.0

0.1

1.0

5.0

ORMCP Model

1.5 1.0 0.5 0.0

0.1

1.0

5.0

Beast MCP

1.5 1.0 0.5 0.0

0.1

1.0

5.0

Uniform GMRF

Time (Past to Present)

Effe

ctive

Pop

ulat

ion

Size

1.5 1.0 0.5 0.0

0.1

1.0

5.0

Time−Aware GMRF

Time (Past to Present)1.5 1.0 0.5 0.0

0.1

1.0

5.0

Beast GMRF

Time (Past to Present)1.5 1.0 0.5 0.0

0.1

1.0

5.0

Lemey and Suchard (UCLA) SISMID 8 / 16

Simulation: Exponential GrowthClassical Skyline Plot

Effe

ctive

Pop

ulat

ion

Size

0.014 0.010 0.006 0.002

0.00

10.

11.

0

ORMCP Model

0.014 0.010 0.006 0.002

0.00

10.

11.

0

Beast MCP

0.014 0.010 0.006 0.002

0.00

10.

11.

0

Uniform GMRF

Time (Past to Present)

Effe

ctive

Pop

ulat

ion

Size

0.014 0.010 0.006 0.002

0.00

10.

11.

0

Time−Aware GMRF

Time (Past to Present)0.014 0.010 0.006 0.002

0.00

10.

11.

0

Beast GMRF

Time (Past to Present)0.014 0.010 0.006 0.002

0.00

10.

11.

0

Lemey and Suchard (UCLA) SISMID 9 / 16

Page 9: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Simulation: Exponential Growth with BottleneckClassical Skyline Plot

Effe

ctive

Pop

ulat

ion

Size

0.15 0.10 0.05 0.00

0.00

10.

011.

0

ORMCP Model

0.15 0.10 0.05 0.00

0.00

10.

011.

0

Beast MCP

0.15 0.10 0.05 0.00

0.00

10.

011.

0

Uniform GMRF

Time (Past to Present)

Effe

ctive

Pop

ulat

ion

Size

0.15 0.10 0.05 0.00

0.00

10.

011.

0

Time−Aware GMRF

Time (Past to Present)0.15 0.10 0.05 0.00

0.00

10.

011.

0Beast GMRF

Time (Past to Present)0.15 0.10 0.05 0.00

0.00

10.

011.

0Lemey and Suchard (UCLA) SISMID 10 / 16

Accuracy in Simulations

Percent Error =

) TMRCA

0

|*Ne(t) % Ne(t)|Ne(t)

dt & 100, (1)

Table: Percent error in simulations. We compare percent errors, defined inequation (1), for the Opgen-Rhein multiple change-point (ORMCP), uniformand fixed-tree time-aware Gaussian Markov random field (GMRF) smoothing,BEAST multiple change-point (MCP) model, and BEAST GMRF smoothing.

Model Constant Exponential BottleneckORMCP 14.0 1.7 7.4Uniform GMRF 32.8 1.5 5.9Time-Aware GMRF 2.8 1.2 4.8BEAST MCP 38.2 1.6 5.2BEAST GMRF 1.7 1.0 5.4

Lemey and Suchard (UCLA) SISMID 11 / 16

Page 10: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

GMRF Precision Prior Sensitivity" - GMRF precision, controls smoothnessUsually Pr(" |D) is sensitive to perturbations of Pr(")

Not in our Coalescent model!

GMRF Precision Prior and Posterior

log τ

Dens

ity0.

00.

20.

40.

60.

8

−10 −8 −6 −4 −2 0 2

prior

posterior

1 2 5 10 100−6

−5−4

−3

GMRF Precision Sensitiviy to Prior

Prior Mean of τ

Post

erio

r of l

og τ

Lemey and Suchard (UCLA) SISMID 12 / 16

HCV Epidemics in EgyptEstimated Genealogy BEAST GMRF

Scal

ed E

ffect

ive

Pop.

Siz

e

250 150 50 0

1010

210

310

410

5

Unconstrained Fixed−Tree GMRF

Time (Years Before 1993)

Scal

ed E

ffect

ive

Pop.

Siz

e

250 150 50 0

1010

210

310

410

5

Constrained Fixed−Tree GMRF

Time (Years Before 1993)250 150 50 0

1010

210

310

410

5

PAT Starts

Random populationsampleNo sign of populationsub-structure

Parenteralantischistosomaltherapy (PAT) waspracticed from 1920sto 1980s

Bayes Factor 12,880in favor of constantpopulation size priorto 1920

Lemey and Suchard (UCLA) SISMID 13 / 16

Page 11: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Influenza Intra-Season Population Dynamics1999−2000 Season

Time (Weeks before 03/01/00)

Scal

ed E

ffect

ive

Pop.

Siz

e

40 30 20 10 0

1010

010

310

4

October 1

Time (Weeks before 03/01/00)100 80 60 40 20 0

1010

010

310

4

2001−2002 Season

Time (Weeks before 03/21/02)

Scal

ed E

ffect

ive

Pop.

Siz

e

60 50 40 30 20 10 0

1010

010

310

4

October 1

Time (Weeks before 03/21/02)70 50 30 10 0

1010

010

310

4New York state hemagglutinin sequences serially sampled(Ghedin et al., 2005)

Lemey and Suchard (UCLA) SISMID 14 / 16

Influenza Intra-Season Population Dynamics2001−2002 Season

Time (Weeks before 03/21/02)

Scal

ed E

ffect

ive

Pop.

Siz

e

60 50 40 30 20 10 0

1010

010

310

4

October 1

Time (Weeks before 03/21/02)70 50 30 10 0

1010

010

310

4

2003−2004 Season

Time (Weeks before 02/05/04)

Scal

ed E

ffect

ive

Pop.

Siz

e

80 60 40 20 0

110

100

103

104

105

October 1

Time (Weeks before 02/05/04)50 40 30 20 10 0

110

100

103

104

105

New York state hemagglutinin sequences serially sampled(Ghedin et al., 2005)

Lemey and Suchard (UCLA) SISMID 14 / 16

Page 12: Non-Parametric Bayesian Population Dynamics …Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U

Summary

Genealogies inform us about population size trajectoriesPrior restrictions are necessary for non(semi)-parametricestimation of Ne(t)Smoothing can be imposed by GMRF priors

Software: The Skyride

Implemented as a Coalescent prior inBEASTExploits approximate Gibbs samplingFaster convergence? Better mixing?

Reference: Minin, Bloomquist and Suchard (2008) Molecular Biology &Evolution, 25, 1459–1471.

Lemey and Suchard (UCLA) SISMID 15 / 16

Active Ideas: GMRFs are Highly Generalizable

Hierarchical Modeling

Flu genes display similar (not equal) dynamics

Incorporate multiple locisimultaneouslyPool information for statistical powerNo need for strict equality

Introducing Covariates

Augment field at fixed observation timesFormal statistical testing for:

External factors (environment, drug tx)Population dynamics (bottle-necks, growth)

Lemey and Suchard (UCLA) SISMID 16 / 16