This content has been downloaded from IOPscience. Please scroll down to see the full text. Download details: IP Address: 64.106.42.43 This content was downloaded on 03/02/2014 at 03:23 Please note that terms and conditions apply. Lineage dynamics and mutation–selection balance in non-adapting asexual populations View the table of contents for this issue, or go to the journal homepage for more J. Stat. Mech. (2013) P01013 (http://iopscience.iop.org/1742-5468/2013/01/P01013) Home Search Collections Journals About Contact us My IOPscience


Journal of Statistical Mechanics: Theory and Experiment

Lineage dynamics and mutation–selection balance in non-adapting asexual populations

Sophie Penisson¹, Paul D Sniegowski², Alexandre Colato³ and Philip J Gerrish⁴,⁵

¹ Université Paris-Est, LAMA (UMR 8050), UPEMLV, UPEC, CNRS, 94010 Créteil, France

² Department of Biology, 213 Leidy Laboratories, University of Pennsylvania, Philadelphia, PA 19104, USA

³ Departamento de Ciências da Natureza, Matemática e Educação, Federal University of São Carlos, UFSCar Araras, Brazil

⁴ Center for Evolutionary and Theoretical Immunology, Department of Biology, 1 University of New Mexico, 230 Castetter Hall, MSC03-2020, Albuquerque, NM 87131, USA

⁵ Centro de Matemática e Aplicações Fundamentais, Departamento de Matemática, Faculdade de Ciências, Universidade de Lisboa, Avenida Professor Gama Pinto, 2, 1649-003 Lisbon, Portugal

E-mail: [email protected], [email protected], [email protected] and [email protected]

Received 7 September 2012; Accepted 20 January 2013; Published 25 February 2013

Online at stacks.iop.org/JSTAT/2013/P01013, doi:10.1088/1742-5468/2013/01/P01013

Abstract. In classical population genetics, mutation–selection balance refers to the equilibrium frequency of a deleterious allele established and maintained under two opposing forces: recurrent mutation, which tends to increase the frequency of the allele; and selection, which tends to decrease its frequency. In a haploid population, if $\mu$ denotes the per capita rate of production of the deleterious allele by mutation and $s$ denotes the selective disadvantage of carrying the allele, then the classical mutation–selection balance frequency of the allele is approximated by $\mu/s$. This calculation assumes that lineages carrying the mutant allele in question (the 'focal allele') do not accumulate deleterious mutations linked to the focal allele. In principle, indirect selection against the focal allele caused by such additional mutations can decrease the frequency of the focal allele below the classical mutation–selection balance. This effect of indirect selection will be strongest in an asexual population, in which the entire genome is in linkage. Here, we use an approach based on a multitype branching process to investigate this effect, analyzing lineage dynamics under mutation, direct selection, and indirect selection in a non-adapting asexual population. We find that the equilibrium balance between recurrent mutation to the focal allele and the forces of direct and indirect selection against the focal allele is closely approximated by $\gamma\mu/(s+U)$ ($s = 0$ if the focal allele is neutral), where $\gamma \approx e^{\theta}\theta^{-(\omega+\theta)}(\omega+\theta)\left(\Gamma(\omega+\theta) - \Gamma(\omega+\theta,\theta)\right)$, $\theta = U/\bar{s}$, and $\omega = s/\bar{s}$; $U$ denotes the genomic deleterious mutation rate and $\bar{s}$ denotes the geometric mean selective disadvantage of deleterious mutations elsewhere on the genome. This mutation–selection balance for asexual populations can remain surprisingly invariant over wide ranges of the mutation rate.

© 2013 IOP Publishing Ltd and SISSA Medialab srl 1742-5468/13/P01013+18\$33.00

Keywords: mutational and evolutionary processes (theory), population dynamics (theory)

Contents

1. Introduction

2. Theory

2.1. Verbal description

2.1.1. The focal allele

2.2. Assumptions and notation

2.3. The model and classification

2.3.1. The offspring probability distribution and generating function

2.3.2. The mean matrix

2.3.3. Classification of the types

2.4. Analyses

2.4.1. Extinction probabilities

2.4.2. Sojourn times: tail behavior

2.4.3. Sojourn times: half-lives

2.4.4. The total number of descendants

2.4.5. Fitness dynamics and the single-type branching process approximation

2.4.6. Mutation–selection balance in asexual populations

3. Discussion

3.1. Summary

3.2. Background selection

Acknowledgments

Appendix. Computation of the cumulative distribution function of the extinction time

References

1. Introduction

Classical population genetics theory was derived largely under the assumption that allele frequencies at an individual genetic locus are affected only by evolutionary forces acting on that locus. This was a reasonable assumption in the days of Mendelian genetics, when relatively few genetic loci had yet been inferred, and those in only a few model organisms, and when most such loci segregated independently or were loosely linked. The discovery of DNA structure and the rise of sequencing, however, brought the realization that genomes contain very large numbers of segregating loci (for example, nucleotide sites) in tight linkage to one another. Accordingly, molecular population genetics theory has increasingly focused on the effects of linkage on the response to selection and on levels of genetic variation [2, 14, 11], a topic presaged by classic early work [13]. At the same time, the rapid growth of experimental evolution studies on asexual populations [5] has inspired the development of a somewhat related body of theory concerned with the dynamics of selection in the total absence of recombination [8, 4, 6, 18, 16].

In this paper, we explore a topic that has received little attention in the theoretical literature on evolution at linked sites: how the frequencies of newly arising neutral and deleterious alleles in a population are affected by the subsequent accumulation of linked deleterious mutations. In particular, we (1) characterize the distribution of the time to extinction for lineages of newly arisen neutral and deleterious molecular variants (alleles) that are accumulating additional deleterious mutations, (2) derive expressions for the total cumulative number of members of such lineages, and (3) derive mutation–selection balance frequencies for the neutral and deleterious alleles in question. Our derivations make the assumptions of haploidy and complete linkage (asexuality); we briefly discuss possible extensions to diploid or polyploid organisms and to recombination and sex. We note that the phenomenon that we explore is closely related to the concept of 'background selection' in molecular population genetics [3, 2, 14]. We avoid using this term in developing the theory, however, because in its original formulation at least [3], background selection refers to the effects of the pre-existing linked genetic background on which a new allele arises, whereas the phenomenon that we model here is concerned with how the linked genetic background changes after the appearance of a new allele. More recent work generalizes the background selection concept and refers to the ancestral distribution as the initial background upon which forward evolution proceeds [12]. (How fitness distributions change in forward time in haploid asexual populations is also analyzed in [9].) We consider the relationship between forward-time accumulation of deleterious mutations and background selection effects in more detail in the discussion.

2. Theory

2.1. Verbal description

We consider a population in which recurrent mutation at a particular locus of interest produces new 'originals' of a focal allele that are then copied through reproduction. The focal allele may be either neutral or deleterious, and we are interested in calculating its equilibrium frequency in the population. Each mutational event giving rise to a new original of the focal allele may be thought of as founding an allelic lineage whose size and sojourn time indicate the ultimate contribution of that particular mutational event to the population frequency of the focal allele. We employ a multitype branching process approach to model the dynamics of such allelic lineages in a haploid asexual population in which the mode of reproduction is cell division (e.g., of bacteria or somatic cells). The initial condition of the process is a single individual carrying a newly arisen focal allele: the founder of an allelic lineage. This original individual, and each of its descendants, produces a random number of offspring whose mean is a function of fitness. Each offspring has a chance, determined by the genomic mutation rate, of carrying more mutations than its parent, and we assume that all such additional mutations are deleterious. Because the model is asexual, all such deleterious mutations remain linked to the genome in which they have arisen, thereby decreasing the fitness of the particular copy of the focal allele harbored by that genome. We define a focal allelic 'mutant class' or 'type' by the number of such additional deleterious mutations it carries. Thus, the overall frequency of the focal allele in the population will be affected by the rate of recurrent mutation producing new originals of the focal allele, balanced by the direct selection (if any) against the fitness effect of the focal allele itself as well as indirect selection against the focal allele caused by linked deleterious mutations.

2.1.1. The focal allele. When we say that the focal allele is 'neutral' or 'deleterious', we mean that when the focal allele appears by mutation on a genome, the resulting mutant genome is neutral or deleterious relative to the rest of the population. A neutral allele is defined as one whose discrete-time growth rate is exactly 1 at the time of appearance of the allele (or whose continuous-time growth rate is exactly 0 at the time of appearance of the allele). Likewise, a deleterious allele is defined as one whose discrete-time growth rate is exactly $1 - s$ at the time of appearance of the allele. We note that this definition says nothing about the subsequent growth rate of the lineage formed by this allele. While there are different ways to interpret our definition of 'neutral' or 'deleterious', we adhere to the standard definition: a 'neutral' allele does not change the fitness of its bearer, and a 'deleterious' allele changes the fitness of its bearer by a factor of $1 - s$. Adherence to these standard definitions of 'neutral' and 'deleterious', however, requires an assumption that a new allele arises on an average genetic background whose fitness equals the mean fitness of the population. This assumption may seem restrictive, but the focus of this work is not the genetic background upon which an allele arises (this is the arena of the 'background selection' literature) but what happens to the allele after it has arisen. The genetic background in our model is, in this sense, a given. In the discussion, we outline one way to couple our results with background selection.

2.2. Assumptions and notation

The branching process begins with a single individual in which a new original of the focal allele has arisen by mutation. The focal allele itself either confers selective disadvantage $s > 0$ on this individual or has no effect on fitness ($s = 0$). The new original of the focal allele gives rise to an allelic lineage whose members acquire, and remain linked to, deleterious mutations elsewhere on their genomes at per capita rate $U$ (the genomic deleterious mutation rate); the selective disadvantage of such additional deleterious mutations has geometric mean $\bar{s}$. Beneficial mutations are assumed to be absent. For now, we make an additional assumption that each new original of the focal allele arises on a genetic background of average fitness, such that the individual in which the focal allele has arisen has relative fitness $1 - s$. For relaxation of this assumption, see the section about background selection in the discussion.

The largest possible growth rate of an allelic lineage occurs when the focal allele is itself neutral ($s = 0$) and no deleterious mutations occur in any of the descendants of the original genome that harbored the new focal allele ($U = 0$). The branching process formed under these conditions is critical, and the branching processes studied here are therefore critical at best.

The founding individual in which the focal allele first appears by mutation is of type 0. There exist $d$ deleterious mutant classes, or types, that may ultimately arise from the founding type (types $1, \dots, d$). A mutant individual of type $i$ results from mutation of type $i-1$. An individual of type $i < d$ produces descendants of either type $i$ or type $i+1$; an individual of type $d$ only produces descendants of type $d$. The probabilities that a mutation does and does not occur during the production of one descendant are, respectively, $\varphi = 1 - e^{-U}$ and $1 - \varphi = e^{-U}$, where $U$ is the genomic deleterious mutation rate.

We assume that reproduction is by cell division, and hence each individual produces two descendants; however, in order to keep population size constant, we assume that each descendant may or may not survive. In the model, an individual thus produces zero, one or two descendants. The probability for a descendant of type $i$ to survive is $c_i/2$, where $c_i \in\, ]0, 2[$; we assume that the mutant types are less fit as their number of mutations increases, and hence that $c_d < \cdots < c_0$. While we keep most of our expressions in general form (i.e., in terms of $c_i$), our results employ the assumption that $c_i = (1-s)(1-\bar{s})^i$, for $s, \bar{s} \in\, ]0, 1[$.
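To make the reproduction rule concrete, here is a minimal Monte Carlo sketch of one allelic lineage under the assumptions above: two descendants per division, a new mutation with probability $\varphi = 1 - e^{-U}$ (except for type $d$), survival with probability $c_t/2$ for a descendant of type $t$, and $c_i = (1-s)(1-\bar{s})^i$. The function name `simulate_lineage` and the parameter values are illustrative choices, not the authors' simulation code; in particular, the simulations reported in figure 5 draw selection coefficients from an exponential distribution, which this sketch does not do.

```python
import math
import random


def simulate_lineage(U, s, sbar, d, rng):
    """Simulate one allelic lineage of the model until extinction.

    Returns (clean, total): cumulative numbers of type-0 individuals and of all
    individuals over the lineage's lifetime, the founder included.
    """
    phi = 1.0 - math.exp(-U)                          # P(descendant carries a new mutation)
    c = [(1.0 - s) * (1.0 - sbar) ** i for i in range(d + 1)]
    counts = [0] * (d + 1)
    counts[0] = 1                                     # a single type-0 founder
    clean, total = 1, 1
    while any(counts):
        nxt = [0] * (d + 1)
        for i, n_i in enumerate(counts):
            for _ in range(2 * n_i):                  # each division yields two descendants
                # type d cannot mutate further; other types move to i + 1 w.p. phi
                t = i + 1 if (i < d and rng.random() < phi) else i
                if rng.random() < c[t] / 2.0:         # a type-t descendant survives w.p. c_t / 2
                    nxt[t] += 1
        counts = nxt
        clean += counts[0]
        total += sum(counts)
    return clean, total


U, s, sbar, d = 0.1, 0.1, 0.05, 10
rng = random.Random(1)
runs = [simulate_lineage(U, s, sbar, d, rng) for _ in range(20_000)]
mean_clean = sum(r[0] for r in runs) / len(runs)
mean_total = sum(r[1] for r in runs) / len(runs)
# The mean clean lineage size should be close to 1 / (1 - e^{-U}(1 - s)), see section 2.4.4.
print(mean_clean, 1.0 / (1.0 - math.exp(-U) * (1.0 - s)), mean_total)
```

Averaging over many simulated lineages in this way gives Monte Carlo estimates of the mean numbers of clean and total descendants that are computed exactly in section 2.4.4.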

We denote by $X_n := (X_{n,0}, \dots, X_{n,d})$ the composition of the population at each time $n \in \mathbb{N}$, where $X_{n,i}$ denotes the number of mutants of type $i$ in the population at time $n$. Under the previous assumptions, the process $(X_n)_{n\in\mathbb{N}}$ is a branching process with $d+1$ types. We refer the reader to Athreya and Ney [1] for a detailed description of such processes. We denote by $\mathbf{0}$ the null vector in $\mathbb{R}^{d+1}$, and by $e_i$ the $i$th basis vector in $\mathbb{R}^{d+1}$. A vector $x$ as a subscript on $P$ (resp. $E$) denotes a probability (resp. mean value) conditional on the event $\{X_0 = x\}$. We write $x^y := \prod_{i=0}^{d} x_i^{y_i}$.

2.3. The model and classification

2.3.1. The offspring probability distribution and generating function. We compute the offspring probability distribution $(p_i(k))_{k\in\mathbb{N}^{d+1}}$ for each $i = 0, \dots, d$, where for each $k \in \mathbb{N}^{d+1}$, $p_i(k)$ denotes the probability for an individual of type $i$ to produce $k_0$ individuals of type 0, $k_1$ individuals of type 1, etc.

First, for each $i = 0, \dots, d-1$, we know that $p_i(k) > 0$ if and only if $k_j = 0$ for all $j \notin \{i, i+1\}$ and $(k_i, k_{i+1}) \in \{(0,0), (1,0), (0,1), (2,0), (0,2), (1,1)\}$. To simplify notation, we write $p_i^{(0,0)}$, $p_i^{(1,0)}$, $p_i^{(0,1)}$, etc., where $p_i^{(k,l)}$ denotes the probability for an individual of type $i$ to produce $k$ individuals of type $i$ and $l$ individuals of type $i+1$. We thus have, for all $i = 0, \dots, d-1$,

$$p_i^{(0,0)} + p_i^{(1,0)} + p_i^{(0,1)} + p_i^{(2,0)} + p_i^{(0,2)} + p_i^{(1,1)} = 1.$$

For instance, the zero class $p_i^{(0,0)}$ is calculated as a sum of three probabilities: (1) the probability that neither of the descendants has mutated and neither of them survives, (2) the probability that one has mutated and neither the mutant nor the non-mutant survives, and (3) the probability that both have mutated and neither of these mutants survives. Consequently,

$$p_i^{(0,0)} = (1-\varphi)^2\left(1-\frac{c_i}{2}\right)^2 + 2\varphi(1-\varphi)\left(1-\frac{c_i}{2}\right)\left(1-\frac{c_{i+1}}{2}\right) + \varphi^2\left(1-\frac{c_{i+1}}{2}\right)^2 = \left[(1-\varphi)\left(1-\frac{c_i}{2}\right) + \varphi\left(1-\frac{c_{i+1}}{2}\right)\right]^2. \tag{1}$$

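We similarly obtain that

$$\begin{aligned}
p_i^{(1,0)} &= (1-\varphi)c_i\left[(1-\varphi)\left(1-\frac{c_i}{2}\right) + \varphi\left(1-\frac{c_{i+1}}{2}\right)\right],\\
p_i^{(0,1)} &= \varphi c_{i+1}\left[(1-\varphi)\left(1-\frac{c_i}{2}\right) + \varphi\left(1-\frac{c_{i+1}}{2}\right)\right],\\
p_i^{(2,0)} &= (1-\varphi)^2\left(\frac{c_i}{2}\right)^2, \quad p_i^{(0,2)} = \varphi^2\left(\frac{c_{i+1}}{2}\right)^2, \quad p_i^{(1,1)} = 2\varphi(1-\varphi)\frac{c_i}{2}\frac{c_{i+1}}{2}.
\end{aligned} \tag{2}$$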
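The probabilities in (1)–(2) can be checked numerically. The sketch below (hypothetical helper `offspring_probs`, assuming $c_i = (1-s)(1-\bar{s})^i$ and $\varphi = 1 - e^{-U}$) evaluates the six probabilities for one type $i < d$, confirms that they sum to 1, and also forms the mean offspring numbers that reappear as $m_{ii}$ and $m_{i,i+1}$ in (7)–(8).

```python
import math


def offspring_probs(i, U, s, sbar):
    """Probabilities p_i^{(k,l)} of equations (1)-(2) for a type i < d.

    Keys are (k, l): k surviving descendants of type i, l of type i + 1.
    """
    phi = 1.0 - math.exp(-U)
    ci = (1.0 - s) * (1.0 - sbar) ** i
    cj = (1.0 - s) * (1.0 - sbar) ** (i + 1)          # c_{i+1}
    # Probability that one descendant leaves no survivor (the bracket in (1)).
    no_survivor = (1.0 - phi) * (1.0 - ci / 2.0) + phi * (1.0 - cj / 2.0)
    return {
        (0, 0): no_survivor ** 2,
        (1, 0): (1.0 - phi) * ci * no_survivor,
        (0, 1): phi * cj * no_survivor,
        (2, 0): (1.0 - phi) ** 2 * (ci / 2.0) ** 2,
        (0, 2): phi ** 2 * (cj / 2.0) ** 2,
        (1, 1): 2.0 * phi * (1.0 - phi) * (ci / 2.0) * (cj / 2.0),
    }


p = offspring_probs(i=3, U=0.05, s=0.01, sbar=0.02)
total = sum(p.values())                                # equals 1 up to rounding
m_ii = p[(1, 0)] + p[(1, 1)] + 2 * p[(2, 0)]           # mean number of type-i offspring
m_next = p[(0, 1)] + p[(1, 1)] + 2 * p[(0, 2)]         # mean number of type-(i+1) offspring
```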

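The offspring generating function, defined on $[0,1]^{d+1}$ by $f_i(\Phi) := E_{e_i}(\Phi^{X_1}) = \sum_{k\in\mathbb{N}^{d+1}} p_i(k)\Phi^k$, is thus

$$f_i(\Phi) = p_i^{(0,0)} + p_i^{(1,0)}\Phi_i + p_i^{(0,1)}\Phi_{i+1} + p_i^{(2,0)}\Phi_i^2 + p_i^{(0,2)}\Phi_{i+1}^2 + p_i^{(1,1)}\Phi_i\Phi_{i+1} = \left[(1-\varphi)\left(1-\frac{c_i}{2}(1-\Phi_i)\right) + \varphi\left(1-\frac{c_{i+1}}{2}(1-\Phi_{i+1})\right)\right]^2. \tag{3}$$

The factorized form holds because the fates of the two descendants are independent, each contributing the bracketed generating function of a single descendant.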
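The factorized form in (3) can be confirmed numerically against the term-by-term sum. This is a sketch under the same parameterization $c_i = (1-s)(1-\bar{s})^i$; the names `coefficients`, `f_polynomial` and `f_factorised` are illustrative, and $x$, $y$ stand for $\Phi_i$, $\Phi_{i+1}$.

```python
import math
import random


def coefficients(i, U, s, sbar):
    """Return (phi, c_i, c_{i+1}) under c_i = (1 - s)(1 - sbar)^i."""
    phi = 1.0 - math.exp(-U)
    return phi, (1.0 - s) * (1.0 - sbar) ** i, (1.0 - s) * (1.0 - sbar) ** (i + 1)


def f_polynomial(x, y, phi, ci, cj):
    """Left-hand form of (3), built term by term from (1)-(2)."""
    q = (1 - phi) * (1 - ci / 2) + phi * (1 - cj / 2)
    return (q ** 2
            + (1 - phi) * ci * q * x
            + phi * cj * q * y
            + (1 - phi) ** 2 * (ci / 2) ** 2 * x ** 2
            + phi ** 2 * (cj / 2) ** 2 * y ** 2
            + 2 * phi * (1 - phi) * (ci / 2) * (cj / 2) * x * y)


def f_factorised(x, y, phi, ci, cj):
    """Right-hand form of (3): the square of a one-descendant generating function."""
    one = (1 - phi) * (1 - ci / 2 * (1 - x)) + phi * (1 - cj / 2 * (1 - y))
    return one ** 2


rng = random.Random(0)
phi, ci, cj = coefficients(i=2, U=0.1, s=0.02, sbar=0.03)
points = [(rng.random(), rng.random()) for _ in range(1000)]
max_gap = max(abs(f_polynomial(x, y, phi, ci, cj) - f_factorised(x, y, phi, ci, cj))
              for x, y in points)
```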

Let us finally consider the offspring probability distribution $(p_d(k))_{k\in\mathbb{N}^{d+1}}$. We know by assumption that the last mutant type only produces descendants of its own type $d$, namely 0, 1 or 2 of them. We thus have $p_d(k) > 0$ if and only if $k_i = 0$ for all $i < d$ and $k_d \in \{0, 1, 2\}$. For simplicity, let us denote by $p_d^k$ the probability for an individual of type $d$ to produce $k$ individuals of type $d$. We obtain, similarly to before,

$$p_d^0 = \left(1-\frac{c_d}{2}\right)^2, \quad p_d^1 = c_d\left(1-\frac{c_d}{2}\right), \quad p_d^2 = \left(\frac{c_d}{2}\right)^2, \tag{4}$$

and thus

$$f_d(\Phi) = p_d^0 + p_d^1\Phi_d + p_d^2\Phi_d^2 = \left[1-\frac{c_d}{2}(1-\Phi_d)\right]^2. \tag{5}$$

Now that the offspring generating function $f := (f_0, \dots, f_d)$ is known, we can deduce the generating function $f_n$ of the process $(X_n)_{n\geq 0}$ at time $n$, satisfying, for each $x \in \mathbb{N}^{d+1}$,

$$E_x(\Phi^{X_n}) = [f_n(\Phi)]^x. \tag{6}$$

It is obtained as the $n$th iterate of $f$, by setting $f_0(\Phi) := \Phi$ and $f_n := f \circ f_{n-1}$ for all $n \geq 1$ (so that $f_1 = f$). We write $f_n := (f_{n,0}, \dots, f_{n,d})$, where for all $\Phi \in [0,1]^{d+1}$ and all $i = 0, \dots, d$, $f_{n,i}(\Phi) = E_{e_i}(\Phi^{X_n})$.

2.3.2. The mean matrix. Let us compute $M := [m_{ij}]_{0\leq i,j\leq d}$, the so-called mean matrix of the process, where for each $i, j = 0, \dots, d$, $m_{ij}$ denotes the mean number of individuals of type $j$ produced by an individual of type $i$. By definition, we thus have $m_{ij} := \sum_{k\in\mathbb{N}^{d+1}} k_j p_i(k)$. For all $i = 0, \dots, d-1$,

$$m_{ii} := \sum_{k\in\mathbb{N}^{d+1}} k_i p_i(k) = p_i^{(1,0)} + p_i^{(1,1)} + 2p_i^{(2,0)} = (1-\varphi)c_i, \tag{7}$$

$$m_{i,i+1} := \sum_{k\in\mathbb{N}^{d+1}} k_{i+1} p_i(k) = p_i^{(0,1)} + p_i^{(1,1)} + 2p_i^{(0,2)} = \varphi c_{i+1}. \tag{8}$$

Moreover,

$$m_{dd} := \sum_{k\in\mathbb{N}^{d+1}} k_d p_d(k) = p_d^1 + 2p_d^2 = c_d, \tag{9}$$

and $m_{ij} = 0$ otherwise. Consequently, the mean matrix of the multitype branching process is given by the following reducible matrix:

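$$M = \begin{pmatrix}
(1-\varphi)c_0 & \varphi c_1 & 0 & \cdots & 0\\
0 & (1-\varphi)c_1 & \varphi c_2 & \cdots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & (1-\varphi)c_{d-1} & \varphi c_d\\
0 & \cdots & \cdots & 0 & c_d
\end{pmatrix}. \tag{10}$$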

Finally, we denote, for each $n \in \mathbb{N}$, by $m_{ij}^{(n)}$ the entries of the $n$th power $M^n$ of the mean matrix. For each $x \in \mathbb{N}^{d+1}$, $E_x(X_n) = xM^n$, and in particular $E_{e_i}(X_{n,j}) = m_{ij}^{(n)}$.
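A sketch of the mean matrix (10) in NumPy, assuming $c_i = (1-s)(1-\bar{s})^i$; `mean_matrix` is a hypothetical helper name. The last lines compute $E_{e_0}(X_{10}) = e_0M^{10}$; since $M$ is upper triangular, its first entry must equal $m_{00}^{10}$, which with $s = 0$ is $e^{-10U}$.

```python
import math

import numpy as np


def mean_matrix(U, s, sbar, d):
    """Mean matrix M of (10), with c_i = (1 - s)(1 - sbar)^i and phi = 1 - e^{-U}."""
    phi = 1.0 - math.exp(-U)
    c = (1.0 - s) * (1.0 - sbar) ** np.arange(d + 1)
    M = np.zeros((d + 1, d + 1))
    for i in range(d):
        M[i, i] = (1.0 - phi) * c[i]          # (7)
        M[i, i + 1] = phi * c[i + 1]          # (8)
    M[d, d] = c[d]                            # (9)
    return M


d = 50
M = mean_matrix(U=0.05, s=0.0, sbar=0.01, d=d)
e0 = np.zeros(d + 1)
e0[0] = 1.0                                   # X_0 = e_0: one type-0 founder
mean_X10 = e0 @ np.linalg.matrix_power(M, 10)  # E_{e_0}(X_10) = e_0 M^10
```

Every row of $M$ sums to less than 1 under this parameterization, so the expected total population $\sum_j m_{0j}^{(n)}$ decreases with $n$.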

2.3.3. Classification of the types. Since $M$ is upper triangular, its eigenvalues are its diagonal entries $\{(1-\varphi)c_0, \dots, (1-\varphi)c_{d-1}, c_d\}$, and hence its Perron–Frobenius root is $\rho(M) := (1-\varphi)c_0 < 1$ and the process is noncritical. We will thus make use of the results presented by Ogura [15] dealing with reducible noncritical aperiodic multitype branching processes. We first check that the two assumptions called (D) and (DN) in [15] are verified. Denoting by $q$ the extinction probability vector (see below), we indeed have $q_i > 0$ for each $i = 0, \dots, d$, and for each $i, j = 0, \dots, d$, $(\partial f_i/\partial\Phi_j)(q) = m_{ij} < \infty$. Moreover, for all $i, j = 0, \dots, d$, $\sum_{k\in\mathbb{N}^{d+1}} k_j \ln k_j\, p_i(k) < \infty$, since for $i < d$,

$$\sum_{k\in\mathbb{N}^{d+1}} k_i\ln k_i\, p_i(k) = 2\ln 2\; p_i^{(2,0)}, \qquad \sum_{k\in\mathbb{N}^{d+1}} k_{i+1}\ln k_{i+1}\, p_i(k) = 2\ln 2\; p_i^{(0,2)},$$

$\sum_{k\in\mathbb{N}^{d+1}} k_d\ln k_d\, p_d(k) = 2\ln 2\; p_d^2$, and the sum is zero otherwise (with the convention $0\ln 0 = 0$). Following the classification of Ogura, each type $i = 0, \dots, d$ then has a given rank $\nu_i$, which plays a role in the asymptotic behavior of the offspring generating function, as we will see later. In this branching model, we have

$$\nu_i = d - i. \tag{11}$$

2.4. Analyses

In what follows, we give results for two kinds of lineage descended from an original of the focal allele that arose by mutation: (1) the total descendant lineage for a given mutation to the focal allele, which comprises all of the descendants of the original of the focal allele regardless of whether their genomes have accumulated additional mutations; and (2) the clean descendant sublineage for a given mutation to the focal allele, which comprises only those descendants whose genomes have not accumulated additional mutations beyond those already present when the focal allele arose.

2.4.1. Extinction probabilities. It is shown by Sewastjanow [17] that the extinction probability vector $q$ of $(X_n)_{n\in\mathbb{N}}$, defined by $q_i := P_{e_i}(\lim_{n\to\infty} X_n = \mathbf{0})$ for each $i = 0, \dots, d$, is the least nonnegative fixed point of $f$ in $[0,1]^{d+1}$. From the definition (3)–(5) of $f$ it follows that $q = \mathbf{1}$ (indeed, $f(\mathbf{1}) = \mathbf{1}$, and since every diagonal entry of $M$ is smaller than 1, $\rho(M) < 1$ and $P_{e_i}(X_n \neq \mathbf{0}) \leq \sum_j m_{ij}^{(n)} \to 0$). Extinction is thus certain for the total descendant lineage ($q_0 = 1$), from which it follows that the clean descendant sublineage will also go extinct. This is what we expect, given that the branching process that we describe is at most critical.

2.4.2. Sojourn times: tail behavior. We study the probability distribution of the extinction time $T$ when the process starts with one original of the focal allele that arose by mutation, i.e., we study $P_{e_0}(T \leq n)$, and show its dependence on the genomic deleterious mutation rate $U$. First, we characterize the total sojourn times of such new focal alleles by analyzing the tail behavior of $T$.

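Let $T := \inf\{n \in \mathbb{N} : X_n = \mathbf{0}\}$. From (6) we know that for all $x \in \mathbb{N}^{d+1}$, $P_x(X_n = \mathbf{0}) = [f_n(\mathbf{0})]^x$. In particular, for each $n \in \mathbb{N}$,

$$P_{e_0}(T \leq n) = P_{e_0}(X_n = \mathbf{0}) = f_{n,0}(\mathbf{0}). \tag{12}$$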

The cumulative distribution function $n \mapsto P_{e_0}(T \leq n)$ of the extinction time starting from one original of the focal allele can thus be computed exactly for each $n \in \mathbb{N}$ by iterating the offspring generating function $f$ defined by (3)–(5) $n$ times, taking its first coordinate, and evaluating it at $\mathbf{0}$. This is easily done numerically (see the Appendix), but the result cannot be written explicitly because of the complexity of the iteration.
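The numerical procedure just described (iterate $f$, read off the first coordinate) can be sketched in plain Python as follows. The function name `extinction_cdf` is hypothetical, and $c_i = (1-s)(1-\bar{s})^i$ is assumed; this illustrates (12) and is not the authors' program.

```python
import math


def extinction_cdf(n_max, U, s, sbar, d):
    """Return [P_{e_0}(T <= n) for n = 0..n_max] via (12).

    f_n(0) is obtained by applying the offspring generating function f of
    (3)-(5) n times to the zero vector; the first coordinate is f_{n,0}(0).
    """
    phi = 1.0 - math.exp(-U)
    c = [(1.0 - s) * (1.0 - sbar) ** i for i in range(d + 1)]

    def f(Phi):
        out = []
        for i in range(d):                                   # equation (3)
            single = ((1 - phi) * (1 - c[i] / 2 * (1 - Phi[i]))
                      + phi * (1 - c[i + 1] / 2 * (1 - Phi[i + 1])))
            out.append(single ** 2)
        out.append((1 - c[d] / 2 * (1 - Phi[d])) ** 2)       # equation (5)
        return out

    Phi = [0.0] * (d + 1)       # f_0(0) = 0, so P(T <= 0) = 0
    cdf = [Phi[0]]
    for _ in range(n_max):
        Phi = f(Phi)            # f_n(0) = f(f_{n-1}(0))
        cdf.append(Phi[0])
    return cdf
```

With $U = 0$ and $s = 0$ the type-0 generating function reduces to $((1+\Phi_0)/2)^2$, so the first values are $0$, $1/4$ and $(5/8)^2$, which makes a convenient sanity check.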

However, we can explicitly obtain the asymptotic behavior of $P_{e_0}(T \leq n)$ as $n \to \infty$. Indeed, according to theorem 2.1 in [15] and thanks to (11), there exists for each $i = 0, \dots, d-1$ some constant $\gamma_i > 0$ such that, as $n \to \infty$,

$$f_{n,i}(\mathbf{0}) = 1 - n^{\nu_i}(1-\varphi)^n c_i^n(\gamma_i + o(1)) = 1 - n^{d-i}e^{-nU}c_i^n(\gamma_i + o(1)). \tag{13}$$

Similarly, there exists $\gamma_d > 0$ such that, as $n \to \infty$,

$$f_{n,d}(\mathbf{0}) = 1 - n^{\nu_d}c_d^n(\gamma_d + o(1)) = 1 - c_d^n(\gamma_d + o(1)). \tag{14}$$

Applying $c_i = (1-s)(1-\bar{s})^i$, we obtain the extinction probability at generation $n$, as $n \to \infty$, for a focal allele that arose as a result of a particular mutational event:

$$P_{e_0}(T \leq n) = 1 - n^d(1-s)^n e^{-nU}(\gamma_0 + o(1)). \tag{15}$$

2.4.3. Sojourn times: half-lives. A second way in which we characterize the sojourn times of newly arisen focal alleles is to determine the time until the expected number of members in the allelic lineage becomes half its initial value. We determine such 'half-lives' for both the clean descendant sublineage and the total descendant lineage.

Figure 1. Half-life of clean descendant sublineages of a neutral allele ($s = 0$) as a function of genomic deleterious mutation rate $U$; it is not a function of any other parameters.

Clean descendant sublineages. The half-life of the clean descendant sublineage is the smallest integer $n$ such that, starting from the founding individual of type 0, the expected number of descendants of type 0 becomes smaller than or equal to $\frac{1}{2}$, i.e., such that $E_{e_0}(X_{n,0}) = m_{00}^{(n)} = m_{00}^n = e^{-nU}c_0^n \leq \frac{1}{2}$. Hence it is the smallest integer satisfying

$$n \geq \frac{\ln 0.5}{-U + \ln c_0}, \tag{16}$$

or, since $c_0 = 1 - s$ under the assumption $c_i = (1-s)(1-\bar{s})^i$,

$$n \geq \frac{\ln 0.5}{-U + \ln(1-s)}. \tag{17}$$

Figure 1 illustrates the dependence on $U$ of the half-life of a clean descendant sublineage founded from a neutral ($s = 0$) original focal allele. Note that the half-life depends neither on the selective disadvantage $\bar{s}$ of linked deleterious mutations nor on the number of mutant types $d$.
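The clean half-life is the smallest integer $n$ with $e^{-nU}(1-s)^n \leq 1/2$, i.e. $n \geq \ln 2/(U - \ln(1-s))$, which translates directly into code. In this sketch (hypothetical function names), the closed form is cross-checked against a brute-force search over $n$; only $U$ and $s$ enter, consistent with the observation that neither $\bar{s}$ nor $d$ matters.

```python
import math


def clean_half_life(U, s=0.0):
    """Smallest integer n with e^{-nU} (1 - s)^n <= 1/2.

    Equivalent to n >= ln 0.5 / (-U + ln(1 - s)); requires U > 0 or s > 0.
    """
    return math.ceil(math.log(0.5) / (-U + math.log(1.0 - s)))


def clean_half_life_bruteforce(U, s=0.0):
    """Direct search over n, used only to cross-check the closed form."""
    n = 0
    while math.exp(-n * U) * (1.0 - s) ** n > 0.5:
        n += 1
    return n
```

For example, `clean_half_life(0.01)` returns 70, since $\ln 2/0.01 \approx 69.3$.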

Total descendant lineages. The half-life of the original mutation is the smallest integer $n$ such that, starting from one individual of the original type, the expected number of all its descendants becomes smaller than or equal to $\frac{1}{2}$, i.e., such that $E_{e_0}\left(\sum_{i=0}^{d} X_{n,i}\right) = \sum_{i=0}^{d} m_{0i}^{(n)} \leq \frac{1}{2}$. The $n$th power of the mean matrix $M$ cannot easily be written down explicitly; however, it can be computed numerically, which enables identification of the smallest integer $n$ satisfying

$$\sum_{i=0}^{d} m_{0i}^{(n)} \leq \frac{1}{2}. \tag{18}$$

Figure 2. Half-life of the total descendant lineage of a neutral allele ($s = 0$).

Figure 2 illustrates the dependence on $U$ of the half-life of total descendant lineages founded from a neutral original focal allele, assuming $d = 50$. Note that the half-life in this case does depend on the selective disadvantage $\bar{s}$ of linked mutations. This figure remained essentially unchanged on increasing the number of types from $d = 10$ to 50.
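The criterion (18) can be evaluated without forming $M^n$ explicitly, by propagating the row vector $e_0M^n$ one generation at a time. A minimal NumPy sketch, assuming $c_i = (1-s)(1-\bar{s})^i$ (the name `total_half_life` and the cap `n_max` are illustrative):

```python
import math

import numpy as np


def total_half_life(U, s, sbar, d, n_max=1_000_000):
    """Smallest n with sum_i m^{(n)}_{0i} <= 1/2, i.e. the criterion (18).

    Builds the bidiagonal mean matrix M of (10) and propagates e_0 M^n one
    step at a time instead of forming M^n explicitly.
    """
    phi = 1.0 - math.exp(-U)
    c = (1.0 - s) * (1.0 - sbar) ** np.arange(d + 1)
    diag = np.append((1.0 - phi) * c[:-1], c[-1])    # m_ii from (7) and (9)
    M = np.diag(diag) + np.diag(phi * c[1:], k=1)    # m_{i,i+1} from (8)
    row = np.zeros(d + 1)
    row[0] = 1.0                                     # e_0 M^0
    for n in range(1, n_max + 1):
        row = row @ M                                # e_0 M^n
        if row.sum() <= 0.5:
            return n
    raise ValueError("half-life exceeds n_max")
```

Because the clean sublineage is part of the total lineage, the value returned is never smaller than the clean half-life.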

2.4.4. The total number of descendants. Let $N := \sum_{n=0}^{\infty} X_n$ be the total number of descendants, including the $X_0$ ancestors, and let $g = (g_0, \dots, g_d)$ be its generating function, satisfying, for each $\Phi \in [0,1]^{d+1}$, $g_i(\Phi) = E_{e_i}(\Phi^N)$. It is known (see e.g. [10]) that $g$ satisfies

$$g_i(\Phi) = \Phi_i f_i(g(\Phi)) \tag{19}$$

for each $\Phi \in [0,1]^{d+1}$ and each $i = 0, \dots, d$. We are interested in the expected number of 'clean' descendants, $D$, as well as the expected total number of descendants, $D_d$. Denoting by $a_{ij} := E_{e_i}(N_j) = (\partial g_i/\partial\Phi_j)(\mathbf{1})$ the mean value of the total progeny of type $j$, starting with one individual of type $i$, we thus have

$$D = a_{00} \quad\text{and}\quad D_d = \sum_{i=0}^{d} a_{0i}. \tag{20}$$

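We deduce from (19) that

$$\frac{\partial g_i}{\partial\Phi_j}(\mathbf{1}) = \delta_{ij} + \sum_{k=0}^{d}\frac{\partial f_i}{\partial\Phi_k}(\mathbf{1})\frac{\partial g_k}{\partial\Phi_j}(\mathbf{1}), \tag{21}$$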

and thus

$$a_{ij} = \delta_{ij} + \sum_{k=0}^{d} m_{ik}a_{kj}. \tag{22}$$

Denoting by $A$ the matrix with entries $a_{ij}$, (22) reads $A = I + MA$; since $\rho(M) < 1$, $I - M$ is invertible, and this implies $A = (I - M)^{-1}$, from which we deduce that for each $i = 0, \dots, d$,

$$a_{0i} = \frac{\prod_{j=0}^{i-1} m_{j,j+1}}{\prod_{j=0}^{i}(1 - m_{jj})}, \tag{23}$$

where an empty product equals 1 by convention. Consequently, for each $i = 0, \dots, d-1$,

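$$a_{0i} = \frac{1}{1 - e^{-U}(1-s)}\prod_{j=1}^{i}\frac{(1-e^{-U})(1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j} = \frac{1}{1 - e^{-U}(1-s)}\prod_{j=1}^{i}\left(1 - \frac{1 - (1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j}\right), \tag{24}$$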

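and

$$a_{0d} = \frac{(1-e^{-U})(1-s)(1-\bar{s})^d}{[1 - e^{-U}(1-s)][1 - (1-s)(1-\bar{s})^d]}\prod_{j=1}^{d-1}\frac{(1-e^{-U})(1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j} = \frac{(1-e^{-U})(1-s)(1-\bar{s})^d}{[1 - e^{-U}(1-s)][1 - (1-s)(1-\bar{s})^d]}\prod_{j=1}^{d-1}\left(1 - \frac{1 - (1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j}\right). \tag{25}$$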

We then easily deduce from (20), (24) and (25) the exact values of $D$ and $D_d$ for each $s$, $\bar{s}$ and finite $d$. In particular, for each $d \in \mathbb{N}$, $D = (1 - e^{-U}(1-s))^{-1}$.
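The identity $A = (I - M)^{-1}$ and the product formula (23) can be cross-checked numerically. This sketch (hypothetical helper names, $c_i = (1-s)(1-\bar{s})^i$ assumed) computes the first row of $A$ both ways and then forms $D = a_{00}$ and $D_d$ as in (20).

```python
import math

import numpy as np


def descendants_row_inverse(U, s, sbar, d):
    """Row (a_00, ..., a_0d) of A = (I - M)^{-1}, with M as in (10)."""
    phi = 1.0 - math.exp(-U)
    c = (1.0 - s) * (1.0 - sbar) ** np.arange(d + 1)
    M = np.diag(np.append((1.0 - phi) * c[:-1], c[-1])) + np.diag(phi * c[1:], k=1)
    return np.linalg.inv(np.eye(d + 1) - M)[0]


def descendants_row_product(U, s, sbar, d):
    """Same row from (23): a_0i = prod_{j<i} m_{j,j+1} / prod_{j<=i} (1 - m_jj)."""
    phi = 1.0 - math.exp(-U)
    c = [(1.0 - s) * (1.0 - sbar) ** i for i in range(d + 1)]
    m_diag = [(1.0 - phi) * c[j] for j in range(d)] + [c[d]]
    a = 1.0 / (1.0 - m_diag[0])                   # a_00
    row = [a]
    for i in range(1, d + 1):
        a *= phi * c[i] / (1.0 - m_diag[i])       # times m_{i-1,i} / (1 - m_ii)
        row.append(a)
    return row


U, s, sbar, d = 0.02, 0.01, 0.02, 10
row_inv = descendants_row_inverse(U, s, sbar, d)
row_prod = descendants_row_product(U, s, sbar, d)
D = row_prod[0]          # clean descendants, a_00
D_d = sum(row_prod)      # total descendants, (20)
```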

In a real biological organism, the number $d$ of potentially deleterious mutations is essentially unlimited. This consideration suggests that the following limit gives the biologically relevant total number of descendants:

$$\lim_{d\to\infty} D_d = \frac{1}{1 - e^{-U}(1-s)}\sum_{i=0}^{\infty}\prod_{j=1}^{i}\left(1 - \frac{1 - (1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j}\right) \tag{26}$$

for each $s \in [0,1[$, $\bar{s} \in\, ]0,1[$. We illustrate in figure 3 how the mean cumulative number $D_d$ changes as a function of $U$, for $s = 0$, for different fixed $\bar{s}$ and for $d = 50$. In figure 4 we compare the outcomes for different values of $d$ (for fixed $s = 0$, $\bar{s} = 0.01$). We moreover observed that the outcomes are almost identical for, for example, $d = 50$ and $d = 100$. The following expressions represent the biologically relevant limit $d \to \infty$.

Clean descendants.

$$D = (1 - e^{-U}(1-s))^{-1}. \tag{27}$$

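Total descendants.

$$D_\infty := \lim_{d\to\infty} D_d = \frac{1}{1 - e^{-U}(1-s)}\sum_{i=0}^{\infty}\prod_{j=1}^{i}\left(1 - \frac{1 - (1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j}\right). \tag{28}$$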

It is apparent that the total number of descendants $D_\infty = \lim_{d\to\infty} D_d$ equals $\gamma D$, where

$$\gamma = \sum_{i=0}^{\infty}\prod_{j=1}^{i}\left[1 - \frac{1 - (1-s)(1-\bar{s})^j}{1 - e^{-U}(1-s)(1-\bar{s})^j}\right],$$

which, it turns out, is well approximated by

$$\gamma \approx e^{\theta}\theta^{-(\omega+\theta)}(\omega+\theta)\left(\Gamma(\omega+\theta) - \Gamma(\omega+\theta,\theta)\right),$$

where $\theta = U/\bar{s}$ and $\omega = s/\bar{s}$. For small $s$ and $U$, $D \approx 1/(s+U)$, giving $D_\infty \approx \gamma/(s+U)$. In figure 5, the approximate expression derived here is compared to agent-based simulations. We suspect that the discrepancy between predictions and simulations at low mutation rates is due to the fact that the mean is greatly affected by the heavy tail of the distribution (for a critical branching process, i.e., when $U = 0$, the tail is so heavy that the mean number of descendants is infinite). Theoretical predictions account for the entire tail of the distribution, whereas the tail is necessarily truncated in the simulations; theoretical predictions for the mean are therefore larger than means calculated from simulations.

Figure 3. Mean number of descendants of a neutral allele (i.e., total cumulative number of lineage members) as a function of mutation rate and selective disadvantage of linked mutations. Here $d = 50$, $i = 0, \dots, d$, $s = 0$, and $c_i = (1-s)(1-\bar{s})^i$. Values of $\bar{s}$ are given in the legend.

Figure 4. Effect of $d$ on the mean number of descendants of a neutral allele. Here $c_i = (1-s)(1-\bar{s})^i$, $i = 0, \dots, d$; values of $d$ are given in the legend; $s = 0$ and $\bar{s} = 0.01$.

Figure 5. Comparing theoretical predictions with individual-based simulations. Mean cumulative numbers of lineage members (mean numbers of descendants) are plotted as a function of mutation rate and selective disadvantage of linked mutations. Each lineage starts with one neutral individual. Each dot represents a mean value taken from 100 000 simulations. In simulations, the selection coefficient for each mutation was drawn at random from an exponential distribution with geometric mean $\bar{s}$. Curves plot the approximate analytical expression $D_\infty \approx e^{\theta}\theta^{1-\theta}\left(\Gamma(\theta) - \Gamma(\theta,\theta)\right)/U$, where $\theta = U/\bar{s}$. Here $s = 0$ and $\bar{s} = 0.01$.
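To see how close the incomplete-gamma approximation is to the series defining $\gamma$, both can be evaluated directly. In this pure-Python sketch, `gamma_exact` truncates the series once terms become negligible, and the lower incomplete gamma function $\Gamma(a) - \Gamma(a, x)$ is computed from its standard power series $x^a e^{-x}\sum_{k\geq 0} x^k/(a(a+1)\cdots(a+k))$. All names are illustrative, and very large $\theta$ may overflow the explicit prefactors.

```python
import math


def gamma_exact(U, s, sbar, tol=1e-16, max_terms=100_000):
    """Series defining gamma, truncated once terms are negligible.

    Each factor equals y (1 - e^{-U}) / (1 - e^{-U} y) with y = (1 - s)(1 - sbar)^j,
    which lies in (0, 1), so the terms decrease monotonically.
    """
    total, term = 1.0, 1.0                      # i = 0 term (empty product)
    for j in range(1, max_terms + 1):
        y = (1.0 - s) * (1.0 - sbar) ** j
        term *= 1.0 - (1.0 - y) / (1.0 - math.exp(-U) * y)
        total += term
        if term < tol * total:
            break
    return total


def lower_incomplete_gamma(a, x, tol=1e-16, max_terms=100_000):
    """Gamma(a) - Gamma(a, x) = x^a e^{-x} sum_{k>=0} x^k / (a (a+1) ... (a+k)), a > 0."""
    term = 1.0 / a
    total = term
    for k in range(1, max_terms + 1):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return x ** a * math.exp(-x) * total


def gamma_approx(U, s, sbar):
    """e^theta theta^{-(omega+theta)} (omega+theta) [Gamma(omega+theta) - Gamma(omega+theta, theta)]."""
    theta, omega = U / sbar, s / sbar
    a = omega + theta
    return math.exp(theta) * theta ** (-a) * a * lower_incomplete_gamma(a, theta)


U, s, sbar = 0.01, 0.0, 0.01
D = 1.0 / (1.0 - math.exp(-U) * (1.0 - s))          # clean descendants, (27)
D_inf_exact = gamma_exact(U, s, sbar) * D            # total descendants, (28)
D_inf_approx = gamma_approx(U, s, sbar) / (s + U)    # gamma / (s + U)
```

With $\theta = 1$ and $\omega = 0$ the approximation equals $e - 1$ exactly, since $\Gamma(1) - \Gamma(1,1) = 1 - e^{-1}$.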

2.4.5. Fitness dynamics and the single-type branching process approximation. The dynamics of mean mutational load in a growing lineage was derived by Fontanari et al [7] to be $k(t) = (U/\bar{s})(1-\bar{s})(1 - (1-\bar{s})^t)$. While this result was derived for growing lineages, i.e., for supercritical branching processes with growth rate $R > 1$, the authors do not rule out its applicability to subcritical lineages ($R < 1$), and even state that the result should hold independently of the value of $R$. We have verified its applicability to subcritical lineages: with the dynamics of the mean mutational load described by $k(t)$, and for small $\bar{s}$, our simulations show that the dynamics of the mean fitness are well approximated by $w(t) \approx (1-s)(1-\bar{s})^{k(t)}$.

Figure 6. Equilibrium frequency of a deleterious allele in an asexual population as a fraction of classical mutation–selection balance: frequency of the allele on clean backgrounds, G/(µ/s) (green curves); frequency of the allele, F/(µ/s) = µD/(µ/s) (red curves); analytical approximation (µγ/(s + U))/(µ/s) (black dots); and single-type approximation µM/(µ/s) (blue curves). Here s = 0.03 and s̄ = 0.03.

Knowledge of the fitness dynamics suggests an approximation to the branching process: it seems reasonable to model the process as a single-type branching process whose mean number of offspring, w(t), changes over time. If we let $g_t(\Phi)$ denote the pgf for the total cumulative number of lineage members t generations after the lineage was founded, then $g_{t+1}(\Phi) = \Phi\, f_t\bigl(g_t(\Phi)\bigr)$, where $f_t(\Phi)$ is the pgf for the offspring distribution at time t, with initial condition $g_0(\Phi) = \Phi$. From this relation, together with the fact that $f_t'(1) = w(t)$, the dynamics of the mean cumulative number of lineage members, $m_t = g_t'(1)$, follow immediately: $m_{t+1} = w(t)\,m_t + 1$, with $m_0 = 1$. The total cumulative number of members of a lineage is thus $M = \max\{m_t : t = 1, \ldots, \infty\}$, which can be rewritten as

$$M = \max_t \Bigl(1 + \sum_{i=1}^{t} \prod_{j=t-i}^{t-1} w(j)\Bigr).$$

This approximation appears to reproduce the qualitative trends quite well (figure 6), but seems to be offset by a roughly constant amount.
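The recursion and its closed form can be checked against each other numerically. The Python sketch below rests on our own assumptions: w(t) is supplied as a function (here the approximation w(t) ≈ (1 − s)(1 − s̄)^{k(t)} with the Fontanari et al load), and the maximum over t is truncated at a hypothetical horizon t_max.

```python
def m_recursive(w, t):
    """m_t from m_{t+1} = w(t) m_t + 1, starting at m_0 = g_0'(1) = 1."""
    m = 1.0
    for k in range(t):
        m = w(k) * m + 1.0
    return m


def m_closed_form(w, t):
    """m_t = 1 + sum_{i=1}^{t} prod_{j=t-i}^{t-1} w(j)."""
    total = 1.0
    for i in range(1, t + 1):
        prod = 1.0
        for j in range(t - i, t):
            prod *= w(j)
        total += prod
    return total


def single_type_M(w, t_max):
    """M = max_t m_t, with the maximum truncated at t_max generations."""
    m, best = 1.0, 1.0
    for k in range(t_max):
        m = w(k) * m + 1.0
        best = max(best, m)
    return best


def fontanari_fitness(s, U, s_bar):
    """w(t) ~= (1 - s)(1 - s_bar)^k(t), k(t) = (U/s_bar)(1 - s_bar)(1 - (1 - s_bar)^t)."""
    return lambda t: (1 - s) * (1 - s_bar) ** ((U / s_bar) * (1 - s_bar) * (1 - (1 - s_bar) ** t))


w = fontanari_fitness(s=0.03, U=0.01, s_bar=0.03)   # illustrative U; s and s_bar as in figure 6
M = single_type_M(w, t_max=5000)
print(M)
```

Multiplying M by µ gives the single-type estimate of F. This is a sketch of the approximation, not the code behind figure 6. If w(t) never drops below 1 (s = U = 0), m_t grows without bound and the truncation at t_max dominates the result.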

2.4.6. Mutation–selection balance in asexual populations. To obtain equilibrium frequencies that balance recurrent mutation creating new originals of the focal allele with direct and indirect selection against the focal allele, we multiply the rate at which such originals arise by the expected number of descendants of each such original. The resulting expressions give expected frequencies of neutral (s = 0) or deleterious (s > 0) focal alleles in a non-adapting asexual population.

We define the parameter µ to denote the rate per replication at which new originals of the focal allele arise by mutation. The variable G denotes the frequency of clean descendants of originals of the focal allele (originals are produced recurrently by mutation), and F denotes the frequency of the focal allele. The equilibrium frequency G of clean descendants of originals of the focal allele is µ times the expected number of clean descendants of an original, whereas the equilibrium frequency of the focal allele is F = µD. For small s and U, these mutation–selection balance frequencies may be approximated by G ≈ µ/(s + U) and F ≈ µγ/(s + U), where $\gamma \approx e^{\theta}\,\theta^{-(\omega+\theta)}\,(\omega+\theta)\,\bigl(\Gamma(\omega+\theta)-\Gamma(\omega+\theta,\theta)\bigr)$, θ = U/s̄, and ω = s/s̄. In the absence of indirect selection, U = 0, giving γ = 1 (γ → 1 as θ → 0) and recovering classical mutation–selection balance, µ/s. Because γ ≥ 1, mutation–selection balance in asexual populations has lower bound µ/(s + U). The single-type branching process approximation

gives rise to the expression F ≈ µM for mutation–selection balance. In figure 6, G and F are compared to classical mutation–selection balance frequencies, given by the formula µ/s. We note that in much of the classical population genetics literature, µ is assumed to be directly proportional to the overall genomic rate of deleterious mutation, U. While this assumption may hold for relatively homogeneous populations and low rates of mutation, we caution that it may not hold as mutation rates increase: at high mutation rates, the value of µ will be influenced by the frequencies of genotypes that are ‘mutational neighbors’ of the mutation in question.

Figure 7. Mutation–selection balance frequency of a deleterious ‘focal allele’ in an asexual population as a function of genomic mutation rate. We note the striking invariance of the mutation–selection balance frequency over a range of four orders of magnitude in mutation rate. These results would seem to dismiss fears that increasing the mutation rate of a bacterial pathogen, perhaps through chemical mutagenesis, would increase the frequency of antibiotic resistance. The parameters are s = 0.1, s = 0.001, µ = kU where k = 10⁻⁴; we assume one in three mutations to be deleterious.
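As a hedged sketch, the balance frequencies above can be evaluated directly once γ is known. The function below takes γ as an input (any value ≥ 1, for instance computed from the incomplete-gamma expression for γ); its name, its signature and the example value γ = 1.2 are illustrative, not taken from the paper.

```python
import math


def balance_frequencies(mu, s, U, gamma):
    """Return (G, F, classical) for a focal allele with recurrence rate mu,
    direct cost s and genomic deleterious rate U (small-s, small-U forms).

    G ~= mu / (s + U)          clean descendants of originals
    F ~= mu * gamma / (s + U)  all carriers of the focal allele
    classical = mu / s         single-locus mutation-selection balance
    """
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    G = mu / (s + U)
    F = mu * gamma / (s + U)
    classical = mu / s if s > 0 else math.inf
    return G, F, classical


G, F, classical = balance_frequencies(mu=1e-6, s=0.03, U=0.01, gamma=1.2)
print(G / classical, F / classical)   # the two ratios shown in figure 6
```

Setting U = 0 (and hence γ = 1) makes all three numbers coincide, which is the recovery of classical balance noted in the text.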

Figure 7 illustrates the fact that mutation–selection balance in asexual populations can be surprisingly invariant over a wide range of mutation rates. This is because a balance is struck between two opposing tendencies: (1) as the genomic mutation rate increases, the rate of recurrence of the focal allele also increases, thereby tending to increase its frequency; and (2) as the genomic mutation rate increases, the rate of occurrence of linked deleterious mutations increases, thereby tending to decrease the frequency of the focal allele.
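The two tendencies can be seen in isolation in the clean-descendant frequency. Under the proportionality assumption µ = kU, G ≈ kU/(s + U): the numerator carries tendency (1) and the denominator tendency (2), so G rises with U while U ≪ s and levels off near k once U ≫ s. The Python sketch below only illustrates this mechanism; the parameter values are ours, and it is not a reproduction of figure 7, which concerns F rather than G.

```python
def clean_frequency(U, s, k):
    """G ~= mu / (s + U) with mu = k * U (small s and U assumed)."""
    mu = k * U                 # tendency (1): recurrence grows with U
    return mu / (s + U)        # tendency (2): loss to linked mutations grows with U


k, s = 1e-4, 1e-3              # illustrative values only
for U in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"U = {U:g}:  G/k = {clean_frequency(U, s, k) / k:.3f}")
```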

One case of particular interest is the expected frequency of clean descendants of neutral alleles (s = 0). Clean descendants of neutral alleles are more likely to hitchhike with beneficial mutations than contaminated descendants (i.e., descendants that have acquired one or more deleterious mutations). The frequency of clean descendants of neutral alleles is thus indicative of the rate of fixation of neutral alleles.

This case also yields an intriguing result. Under the assumption that neutral alleles are generated in the population at a rate proportional to the genomic mutation rate, such that µ = kU, the mean frequency of clean descendants of neutral alleles is

$$G = \frac{kU}{1-e^{-U}}. \qquad (29)$$


For small U, the frequency of such neutral ‘hitchhiking candidates’ is essentially independent of the mutation rate: G ≈ kU/U = k. A more exact interpretation, and one that applies for higher mutation rates as well, is that the frequency of such neutral hitchhiking candidates is proportional to the ratio of mutation rate to mutation probability: G = k(U/φ), where $\varphi = 1 - e^{-U}$ is the mutation probability.
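Equation (29) is easy to evaluate directly. One way to read it, assuming each clean neutral individual leaves one offspring on average and that an offspring stays clean with probability e^{−U}, is that an original has 1 + e^{−U} + e^{−2U} + ⋯ = 1/(1 − e^{−U}) expected clean descendants, which is then multiplied by the recurrence rate µ = kU. A minimal Python sketch (the function name is ours):

```python
import math


def neutral_clean_frequency(U, k):
    """Eq. (29): G = k U / (1 - e^{-U}) = k U / phi."""
    phi = -math.expm1(-U)   # phi = 1 - e^{-U}, computed without cancellation for small U
    return k * U / phi


for U in (1e-6, 1e-2, 0.5, 2.0):
    print(f"U = {U:g}:  G/k = U/phi = {neutral_clean_frequency(U, 1.0):.4f}")
```

Using `math.expm1` rather than `1 - math.exp(-U)` keeps φ accurate when U is tiny, which is exactly the regime where G ≈ k.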

3. Discussion

3.1. Summary

We have derived the sojourn times and total lineage sizes for neutral and deleterious alleles arising by mutation in a non-adapting, asexual population. These quantities differ from those predicted by classical mutation–selection dynamics, where selection occurs only on the particular site in question. These classical predictions rely on the assumption of high rates of recombination surrounding the site in question, such that its behavior is not affected by selection at other loci. When recombination is rare or absent, the behavior of the site in question cannot be assumed to be independent of selection at other loci. When such ‘linked’ loci experience deleterious mutation, purifying selection at these loci can add to any purifying selection that may exist against the locus in question (the ‘focal’ locus). Such selection can cause the focal allele to be removed from the population at a much higher rate than would be predicted on the basis of its own selective disadvantage. We have studied the effects of selection on linked sites in the most extreme case of linkage, namely, in asexual populations.

3.2. Background selection

There is an intuitive connection between the analyses and results we present here and the established concept of ‘background selection’. Background selection also pertains to the effects of selection against loci that are linked to a focal locus. But in its original formulation at least, background selection refers exclusively to selection against pre-existing linked loci or, more generally, the pre-existing genetic background on which the focal allele happens to appear by mutation. This fact is apparent in Charlesworth’s calculation of the effective population size as $N f_0$, where N is the population size and $f_0$ is the fraction of the population that is ‘mutation-free’ [3]. The logic underlying this calculation is as follows: background selection effectively eliminates the fraction of the population that is not mutation-free (i.e., fraction $1 - f_0$), and any alleles that happen to appear on a background that is not already mutation-free are doomed to die out and are thus inconsequential from an evolutionary standpoint.

In real populations, the two phenomena will surely coexist: the phenomenon that we describe, indirect selection against subsequently occurring linked mutations, and the phenomenon of indirect selection against pre-existing linked mutations (background selection). One way to modify our results so as to accommodate both phenomena would be to introduce a random variable S to denote the coefficient of selection against the pre-existing genetic background. This is added to the coefficient of selection against the focal allele, s, and the fitness of the genome on which the allele appears is therefore 1 − s − S. Thus, accounting for both pre-existing linked mutations (background selection) and subsequently occurring linked mutations, the expected number of descendants of a


single, randomly chosen allele in the population is

$$E(D) = \gamma \int_0^1 \bigl(1 - w\,e^{-U}\bigr)^{-1}\, g(w)\,\mathrm{d}w, \qquad (30)$$

where g(w) is the probability density associated with the random variable W = 1 − s − S; this probability density describes how mutant genetic backgrounds are distributed in the population as a function of their fitness, W.
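For a given background distribution, equation (30) is a one-dimensional integral that can be evaluated numerically. The Python sketch below uses the midpoint rule. The density g(w) is a hypothetical input (a uniform density on [a, b] serves purely as an example), γ is treated as a given constant as in (30), and U > 0 is assumed so that the integrand stays finite at w = 1.

```python
import math


def expected_descendants(gamma, U, density, n=20_000):
    """Midpoint-rule estimate of E(D) = gamma * int_0^1 g(w) / (1 - w e^{-U}) dw.

    `density` is any callable g(w) on [0, 1]; U > 0 keeps the integrand finite at w = 1.
    """
    c = math.exp(-U)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += density(w) / (1.0 - w * c)
    return gamma * total * h


def uniform_density(a, b):
    """Hypothetical example: backgrounds with fitness W spread uniformly on [a, b]."""
    return lambda w: 1.0 / (b - a) if a <= w <= b else 0.0


a, b, U = 0.8, 0.95, 0.01
c = math.exp(-U)
numeric = expected_descendants(gamma=1.0, U=U, density=uniform_density(a, b))
closed = math.log((1 - c * a) / (1 - c * b)) / (c * (b - a))   # exact for this density
print(numeric, closed)
```

For the uniform example the integral has the closed form ln((1 − ca)/(1 − cb))/(c(b − a)) with c = e^{−U}, which makes it a convenient check on the quadrature before plugging in a more realistic g(w).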

Extension to sexual populations. Extension of our results to diploid, sexual organisms is achieved by replacing s and s̄ with hs and hs̄, where h is the coefficient of dominance, and by carefully redefining U to reflect only linked mutation.

Acknowledgments

This work was supported by the US National Institutes of Health grants R01 GM079843-01 (PJG/PDS) and ARRA PDS#35063 (PJG/PDS), European Commission grant FP7 231807 (PJG), and the Center for Evolutionary and Theoretical Immunology at the University of New Mexico (both direct support and a seed grant).

Appendix. Computation of the cumulative distribution function of the extinction time

Code in Scilab:

function [p]=extinctiontime(n,d,s,u)
    // Computes P(T<=n), the probability that a lineage founded by one
    // type-1 individual is extinct by generation n, for the branching model
    // with d types and mutation rate u. An individual has two offspring
    // slots; with probability 1-u an offspring keeps the parent's type i,
    // with probability u it carries one more mutation (type i+1), and a slot
    // of type j is filled with probability c(j)/2, where c(j)=(1-s)^(j-1).
    // Type d is absorbing.

    // fitness coefficients c(i), built as a local column vector
    c=(1-s).^((0:d-1)');

    // first iterate: offspring generating function evaluated at 0
    f=zeros(d,1);
    for i=1:d-1
        f(i)=((1-u)*(1-c(i)/2)+u*(1-c(i+1)/2))^2;
    end
    f(d)=(1-c(d)/2)^2;

    // remaining n-1 iterates, f <- F(f); the loop is empty when n<=1
    for k=1:(n-1)
        r=f;
        for i=1:d-1
            f(i)=((1-u)*(1-c(i)*(1-r(i))/2)+u*(1-c(i+1)*(1-r(i+1))/2))^2;
        end
        f(d)=(1-c(d)*(1-r(d))/2)^2;
    end

    // first coordinate of the n-th iterate (founder of type 1)
    p=f(1);
endfunction
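For readers without Scilab, the following is our own Python transcription of the routine above, intended only as a cross-check and not as the code used for the paper's figures. One difference: here n = 0 returns P(T ≤ 0) = 0, whereas the Scilab routine always performs at least one iteration.

```python
def extinction_cdf(n, d, s, u):
    """P(T <= n) for a lineage founded by one type-1 individual (Python port of
    the Scilab routine extinctiontime; n = 0 returns P(T <= 0) = 0)."""
    c = [(1.0 - s) ** i for i in range(d)]        # c[i] is c(i+1) in the Scilab code

    def pgf_at(r):
        # offspring generating function of each type, evaluated at the vector r
        f = [((1 - u) * (1 - c[i] * (1 - r[i]) / 2)
              + u * (1 - c[i + 1] * (1 - r[i + 1]) / 2)) ** 2
             for i in range(d - 1)]
        f.append((1 - c[d - 1] * (1 - r[d - 1]) / 2) ** 2)   # type d is absorbing
        return f

    f = [0.0] * d
    for _ in range(n):
        f = pgf_at(f)
    return f[0]


# critical single-type case: f(x) = ((1 + x) / 2)^2
print([extinction_cdf(n, 1, 0.0, 0.0) for n in (1, 2, 3)])
```

For the critical single-type case (d = 1, s = 0) the routine gives P(T ≤ 1) = 1/4 and P(T ≤ 2) = (5/8)² = 25/64, which can be checked by hand from f(x) = ((1 + x)/2)².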

References

[1] Athreya K B and Ney P E, 1972 Branching Processes (New York: Springer)

[2] Charlesworth B, The effects of deleterious mutations on evolution at linked sites, 2011 Genetics 190 5

[3] Charlesworth B, Morgan M T and Charlesworth D, The effect of deleterious mutations on neutral molecular variation, 1993 Genetics 134 1289

[4] Desai M M and Fisher D S, Beneficial mutation selection balance and the effect of linkage on positive selection, 2007 Genetics 176 1759

[5] Elena S F and Lenski R E, Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation, 2003 Nature Rev. Genet. 4 457

[6] Fogle C A, Nagle J L and Desai M M, Clonal interference, multiple mutations and adaptation in large asexual populations, 2008 Genetics 180 2163

[7] Fontanari J F, Colato A and Howard R S, Mutation accumulation in growing asexual lineages, 2003 Phys. Rev. Lett. 91 218101

[8] Gerrish P J and Lenski R E, The fate of competing beneficial mutations in an asexual population, 1998 Genetica 102/103 127

[9] Good B H, Rouzine I M, Balick D J, Hallatschek O and Desai M M, Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations, 2012 Proc. Nat. Acad. Sci. 109 4950

[10] Good I J, The generalization of Lagrange’s expansion and the enumeration of trees, 1965 Math. Proc. Camb. Phil. Soc. 61 499

[11] Hahn M W, Toward a selection theory of molecular evolution, 2008 Evolution 62 255

[12] Hermisson J, Redner O, Wagner H and Baake E, Mutation–selection balance: ancestry, load, and maximum principle, 2002 Theoret. Popul. Biol. 62 9

[13] Hill W G and Robertson A, The effect of linkage on limits to artificial selection, 1966 Genet. Res. 8 269

[14] Kim Y and Stephan W, Joint effects of genetic hitchhiking and background selection on neutral variation, 2000 Genetics 155 1415

[15] Ogura Y, Asymptotic behavior of multitype Galton–Watson processes, 1975 Kyoto J. Math. 15 251

[16] Rozen D E, de Visser J A G M and Gerrish P J, Fitness effects of fixed beneficial mutations in microbial populations, 2002 Curr. Biol. 12 1040

[17] Sewastjanow B A, 1975 Verzweigungsprozesse (Munich: R. Oldenbourg Verlag)

[18] Sniegowski P D and Gerrish P J, Beneficial mutations and the dynamics of adaptation in asexual populations, 2010 Phil. Trans. R. Soc. B 365 1255
