arXiv:1306.2772v1 [q-bio.PE] 12 Jun 2013

The Moran model with selection: Fixation probabilities, ancestral lines, and an alternative particle representation

Sandra Kluth, Ellen Baake∗

Faculty of Technology, Bielefeld University, Box 100131, 33501 Bielefeld, Germany

Abstract. We reconsider the Moran model in continuous time with population size N, two types, and selection. We introduce a new particle representation, which we call the labelled Moran model, and which has the same empirical type distribution as the original Moran model, provided the initial values are chosen appropriately. In the new model, individuals are labelled $1, 2, \ldots, N$; neutral resampling events may take place between arbitrary labels, whereas selective events only occur in the direction of increasing labels. With the help of elementary methods only, we not only recover fixation probabilities, but also obtain detailed insight into the number and nature of the selective events that play a role in the fixation process forward in time.

Key words: Moran model with selection, fixation probability, labelled Moran model, defining event, ancestral line.

1 Introduction

In population genetics models for finite populations including selection, many properties are still unknown, in particular when it comes to ancestral processes and genealogies. This is why a classic, namely the Moran model in continuous time with population size N, two types, and (fertility) selection, is still the subject of state-of-the-art research, see e.g. [2, 14, 16]. In particular, new insights are currently obtained by drawing a more fine-grained picture: Rather than looking only at the type frequencies as a function of time, one considers ancestral lines and genealogies of samples (as in the ancestral selection graph (ASG) [13, 15]), or even particle representations that make all individual lines and their interactions explicit (here the main tool is the lookdown construction [3, 4, 5]). See [7, Ch. 5] for an excellent overview of the area.

∗Corresponding author. E-mail address: ebaake@techfak.uni-bielefeld.de, Tel.: +49 521 106 4896 (E. Baake)


The way fixation probabilities (that is, the probabilities that the individuals of a given type eventually take over in the population) are being dealt with is typical of this development. The classical result is that of Kimura [11], which is based on type frequencies in the diffusion limit. Short and elegant standard arguments today start from the discrete setting, use a first-step or martingale approach, and arrive at the fixation probability in the form of a (normalised) series expansion in the reproduction rate of the favourable type [6, Thm. 6.1]. An alternative formulation is given in [12]. Both are easily seen to converge to Kimura's result in the diffusion limit.

Recently, Mano [14] obtained the fixation probabilities with the help of the ASG, based on methods of duality (again in the diffusion limit); his argument is nicely recapitulated in [16]. What is still missing is a derivation within the framework of a full particle representation.

This will be done in the present article. To this end, we will introduce an alternative particle system, which we will call the labelled Moran model, and which is particularly well suited for finite populations under selection. It is reminiscent of the N-particle lookdown process, but the main new idea is that each of the N individuals is characterised by a different reproductive behaviour, which we indicate by a label. With the help of elementary methods only, we not only recover fixation probabilities, but also obtain detailed insight into the number and nature of the selective events that play a role in the fixation process forward in time.

The paper is organised as follows. We start with a brief survey of the Moran model with selection (Sec. 2). Sec. 3 is a warmup exercise that characterises fixation probabilities by way of a reflection principle. Sec. 4 introduces the labelled Moran model. With the help of a coupling argument that involves permutations of labels, we obtain the probability that a particular label becomes fixed. Within this setting, we identify reproduction events that affect the fixation probability of a given label; they are termed defining events. The number of defining events that are selective turns out to be the pivotal quantity for characterising the fixation probabilities of the individual labels, and hence the label of the line that will become ancestral to the entire population. In Sec. 5 we pass to the diffusion limit. We continue with a simulation algorithm that generates the label that becomes fixed together with the targets of the selective defining events (Sec. 6). Sec. 7 summarises and discusses the results.

2 The Moran model with selection

We consider a haploid population of fixed size $N \in \mathbb{N}$ in which each individual is characterised by a type $i \in S = \{A, B\}$. If an individual reproduces, its single offspring inherits the parent's type and replaces a randomly chosen individual, maybe its own parent. This way the replaced individual dies and the population size remains constant.

Individuals of type B reproduce at rate 1, whereas individuals of type A reproduce at rate $1+s_N$, $s_N \geq 0$. Accordingly, type-A individuals are termed 'fit', type-B individuals are 'unfit'. In line with a central idea of the ASG, we will decompose reproduction events into neutral and selective ones. Neutral ones occur at rate 1 and happen to


all individuals, whereas selective events occur at rate $s_N$ and are reserved for type-A individuals.

The Moran model has a well-known graphical representation as an interacting particle system (cf. Fig. 1). The N vertical lines represent the N individuals; time runs from top to bottom in the figure. Reproduction events are represented by arrows with the reproducing individual at the base and the offspring at the tip.

Figure 1: The Moran model. The types (A = fit, B = unfit) are indicated for the initial population (top) and the final one (bottom).

We are now interested in the process $(Z^N_t)_{t \geq 0}$ (or simply $Z^N$), where $Z^N_t$ is the number of individuals of type A at time t. (Note that this is the type frequency representation, which contains less information than the particle (or graphical) representation in Fig. 1.) $Z^N$ is a birth-death process with birth rates $\lambda^N_i$ and death rates $\mu^N_i$ when in state i, where
$$\lambda^N_i = (1 + s_N)\, i\, \frac{N-i}{N} \quad\text{and}\quad \mu^N_i = (N-i)\,\frac{i}{N}. \tag{1}$$
The absorbing states are 0 and N; thus, one of the two types, A or B, will almost surely become fixed in the population in finite time. Let $T^N_k := \min\{t \geq 0 \mid Z^N_t = k\}$, $0 \leq k \leq N$, be the first hitting time of $Z^N$. The fixation probability of type A, given that there are initially k type-A individuals, is well known to be (cf. [6, Thm. 6.1])
$$h^N_k := P(T^N_N < T^N_0 \mid Z^N_0 = k) = \frac{\sum_{j=N-k}^{N-1}(1+s_N)^j}{\sum_{j=0}^{N-1}(1+s_N)^j}. \tag{2}$$
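To make the dynamics concrete, here is a minimal Monte Carlo sketch (our own code, not from the paper; it uses only the Python standard library, and the function names are ours) that simulates the jump chain of $Z^N$ with the rates (1) and compares the empirical fixation frequency with formula (2).

```python
import random

def fixation_probability_exact(N, s, k):
    """Formula (2): ratio of geometric sums in (1 + s)."""
    num = sum((1.0 + s) ** j for j in range(N - k, N))
    den = sum((1.0 + s) ** j for j in range(N))
    return num / den

def simulate_fixation(N, s, k, runs=20000, seed=1):
    """Monte Carlo estimate of h_k^N from the jump chain of Z^N.

    With the rates (1), the probability that the next jump is a birth is
    lambda_i / (lambda_i + mu_i) = (1 + s) / (2 + s), independently of i,
    because the common factor i (N - i) / N cancels.
    """
    rng = random.Random(seed)
    p_up = (1.0 + s) / (2.0 + s)
    fixed = 0
    for _ in range(runs):
        i = k
        while 0 < i < N:
            i += 1 if rng.random() < p_up else -1
        fixed += (i == N)
    return fixed / runs

if __name__ == "__main__":
    N, s, k = 20, 0.1, 5
    print("formula (2):", fixation_probability_exact(N, s, k))
    print("simulation :", simulate_fixation(N, s, k))
```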

An alternative formulation is obtained from [12], where fixation probabilities under selection and mutation are considered. It is easily verified that taking together their Eq. (40), Theorem 2, and Eq. (30) and setting the mutation rate to zero gives the representation
$$h^N_k = \frac{k}{N} + \sum_{n=1}^{N-1} a^N_n\, \frac{k}{N} \prod_{j=0}^{n-1} \frac{N-k-j}{N-1-j}, \tag{3}$$


where the coefficients $a^N_n$ follow the recursion
$$a^N_n - a^N_{n+1} = s_N\,\frac{N-n}{n+1}\,(a^N_{n-1} - a^N_n) \tag{4}$$
for $1 \leq n \leq N-2$, with initial conditions
$$a^N_0 = 1 \quad\text{and}\quad a^N_1 = a^N_0 - N(1 - h^N_{N-1}). \tag{5}$$
Eq. (3) is suggestive: It decomposes the fixation probability into the neutral part (k/N) plus additional terms due to offspring created by selective events. It is the purpose of this paper to make this interpretation more precise, and to arrive at a thorough understanding of Eq. (3). In particular, we aim at a probabilistic understanding of the coefficients $a^N_n$.
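As a concrete illustration (our own sketch, not part of the original; helper names are ours), the coefficients $a^N_n$ can be obtained from the recursion (4) with initial conditions (5), where $h^N_{N-1}$ is taken from (2), and the representation (3) can then be checked against (2) numerically.

```python
def h_exact(N, s, k):
    """Fixation probability (2)."""
    den = sum((1.0 + s) ** j for j in range(N))
    return sum((1.0 + s) ** j for j in range(N - k, N)) / den

def a_coefficients(N, s):
    """Coefficients a_n^N, 0 <= n <= N-1, from recursion (4) with initial conditions (5)."""
    a = [0.0] * N
    a[0] = 1.0
    a[1] = a[0] - N * (1.0 - h_exact(N, s, N - 1))
    for n in range(1, N - 1):                      # recursion (4)
        a[n + 1] = a[n] - s * (N - n) / (n + 1) * (a[n - 1] - a[n])
    return a

def h_series(N, s, k):
    """Representation (3): neutral part k/N plus selective corrections."""
    a = a_coefficients(N, s)
    total = k / N
    for n in range(1, N):
        prod = 1.0
        for j in range(n):
            prod *= (N - k - j) / (N - 1 - j)
        total += a[n] * (k / N) * prod
    return total

if __name__ == "__main__":
    N, s = 15, 0.2
    for k in (1, 5, 10, 14):
        print(k, round(h_exact(N, s, k), 10), round(h_series(N, s, k), 10))
```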

Let us now turn to the model in the diffusion limit. To this end, we use the usual rescaling
$$\bigl(X^N_t\bigr)_{t \geq 0} := \frac{1}{N}\bigl(Z^N_{Nt}\bigr)_{t \geq 0},$$
and assume that $\lim_{N\to\infty} N s_N = \sigma$, $0 \leq \sigma < \infty$. As $N \to \infty$, we obtain the well-known diffusion
$$(X_t)_{t\geq 0} := \lim_{N\to\infty}\bigl(X^N_t\bigr)_{t\geq 0}.$$
It is characterised by the drift coefficient $a(x) = \sigma(1-x)x$ and the diffusion coefficient $b(x) = 2x(1-x)$. Hence, the infinitesimal generator $\mathcal{A}$ of the diffusion is defined by
$$\mathcal{A}f(x) = (1-x)x\,\frac{\partial^2}{\partial x^2}f(x) + \sigma(1-x)x\,\frac{\partial}{\partial x}f(x), \qquad f \in C^2([0,1]).$$
Define the first-passage time $T_x := \inf\{t \geq 0 \mid X_t = x\}$ for $x \in [0,1]$. For $\sigma > 0$ the fixation probability of type A follows a classical result of Kimura [11] (see also [8, Ch. 5.3]):
$$h(x) := P(T_1 < T_0 \mid X_0 = x) = \frac{1-\exp(-\sigma x)}{1-\exp(-\sigma)}. \tag{6}$$
See also [8, Ch. 4, 5] or [10, Ch. 15] for reviews of the diffusion process describing the Moran model with selection and its probability of fixation.

According to [12] and previous results of Fearnhead [9] and Taylor [17] (obtained in the context of the common ancestor process in the Moran model with selection and mutation), the equivalent of (3) reads
$$h(x) = x + \sum_{n \geq 1} a_n\, x(1-x)^n \tag{7}$$
with coefficients $a_n = \lim_{N\to\infty} a^N_n$, $n \geq 1$. Following (4) and (5), the $a_n$ are characterised by the recursion
$$a_n - a_{n+1} = \frac{\sigma}{n+1}\,(a_{n-1} - a_n) \tag{8}$$
with initial conditions
$$a_0 = 1 \quad\text{and}\quad a_1 = a_0 - h'(1). \tag{9}$$
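A short numerical check (our own sketch; function names are ours): compute the $a_n$ from (8) and (9), using $h'(1) = \sigma e^{-\sigma}/(1-e^{-\sigma})$ obtained from (6), and compare the truncated series (7) with Kimura's formula (6).

```python
import math

def kimura_h(x, sigma):
    """Kimura's fixation probability (6)."""
    return (1.0 - math.exp(-sigma * x)) / (1.0 - math.exp(-sigma))

def a_coefficients_diffusion(sigma, n_max=60):
    """Coefficients a_n of (7) from recursion (8) with initial conditions (9).

    h'(1) = sigma * exp(-sigma) / (1 - exp(-sigma)) follows from (6).
    """
    a = [0.0] * (n_max + 1)
    a[0] = 1.0
    a[1] = a[0] - sigma * math.exp(-sigma) / (1.0 - math.exp(-sigma))
    for n in range(1, n_max):
        a[n + 1] = a[n] - sigma / (n + 1) * (a[n - 1] - a[n])
    return a

def h_series_diffusion(x, sigma, n_max=60):
    """Series representation (7), truncated after n_max terms."""
    a = a_coefficients_diffusion(sigma, n_max)
    return x + sum(a[n] * x * (1.0 - x) ** n for n in range(1, n_max + 1))

if __name__ == "__main__":
    sigma = 2.0
    for x in (0.1, 0.5, 0.9):
        print(x, round(kimura_h(x, sigma), 10), round(h_series_diffusion(x, sigma), 10))
```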


3 Reflection principle

In [6, Sec. 6.1.1] equation (2) is proven in two ways, namely, using a first-step approach and a martingale argument, respectively. Both approaches rely on the process $(Z^N_t)_{t\geq0}$, without reference to an underlying particle representation. As a warmup exercise, we complement this by an approach based on the particle picture, which we call the reflection principle, see Fig. 2.

Definition 1. Let a graphical realisation of the Moran model be given, with $Z^N_0 = k$, $1 \leq k \leq N-1$, and fixation of type A. Now interchange the types (i.e., replace all A individuals by B individuals and vice versa), without changing the graphical realisation otherwise. This results in a graphical realisation of the Moran model with $Z^N_0 = N-k$ in which type B becomes fixed, and is called the reflected realisation.

Figure 2: Reflection principle: Realisations $\omega$ (left) and $\bar\omega$ (right). Bold lines represent type-A individuals, thin ones type-B individuals; likewise, arrows emanating from type-A (type-B) individuals are bold (thin). Interchange of types transforms the realisation on the left into the realisation on the right and vice versa, such that the respective other type becomes fixed. Reproduction rates of parental lines are noted above dashed and dotted arrows; unmarked arrows appear at rate 1. Dashed arrows represent the elimination of individuals (except new descendants) of the type that eventually gets lost in the population. The births of their new descendants and their replacements are represented by dotted arrows. Altogether $P^N(\bar\omega)\,d\bar\omega = (1+s_N)^{-3}(1+s_N)^2(1+s_N)^{-2}\,P^N(\omega)\,d\omega$ with $N = 8$.

Put differently, in the case $Z^N_0 = k$, $1 \leq k \leq N-1$, reflection transforms a realisation in which the (offspring of the) k fit individuals become fixed (altogether, this happens with probability $h^N_k$) into a realisation in which the (offspring of the) unfit individuals become fixed (which happens with probability $1-h^N_{N-k}$), and vice versa. This operation does not change the graphical structure, but the measures of the realisations are different since a different weight is attached to (some of) the arrows. To make the situation tractable, we will work with what we will call the reduced Moran model: Starting from the original Moran model, we remove those selective arrows that appear between individuals that are both of the fit type. Obviously, this does not affect the process $Z^N$ (in particular, it does not change the fixation probabilities), but it changes the graphical representation. Let now $\Omega^N$ be the set of graphical realisations of the reduced Moran model for $t \in [0, T]$, where $T := \min\{T^N_0, T^N_N\}$. Let $P^N$ be the probability measure on $\Omega^N$. For a realisation $\omega \in \Omega^N$ with $Z^N_0 = k$, $1 \leq k \leq N-1$, in which A becomes fixed, we define $\bar\omega \in \Omega^N$ as the corresponding reflected realisation.

Our strategy will now be to deduce the fixation probabilities by comparing the measures of $\omega$ and $\bar\omega$. To assess the relative weights of $P^N(\bar\omega)\,d\bar\omega$ and $P^N(\omega)\,d\omega$, note first that, since the fit individuals become fixed in $\omega$, all $N-k$ type-B individuals have to be replaced by fit ones, i.e. by arrows that appear at rate $1+s_N$. (Here and in what follows, the 'rate at which arrows appear' is to be understood as rate per parental line; the rate per pair of lines has an additional factor of $1/N$.) In $\bar\omega$, the corresponding arrows point from unfit to fit individuals and thus occur only at rate 1, cf. the dashed arrows in Fig. 2. Second, we have to take into account so-called new descendants, defined as follows:

Definition 2. A descendant of a type-i individual, $i \in S$, that originates by replacing an individual of a different type ($j \neq i$) is termed a new descendant of type i.

Remark 1. The total number of new descendants of type i is almost surely finite.

Every new descendant of a type-B individual goes back to an arrow that occurs at rate 1. If the fit individuals go to fixation (as in $\omega$), all new descendants of type B must eventually be replaced by arrows that emanate from fit individuals and therefore occur at rate $1+s_N$. In $\bar\omega$ this situation corresponds to new descendants of type A that originate from reproduction events at rate $1+s_N$ and are eventually eliminated by arrows that emanate from type-B individuals. See the thin and bold dotted arrows in Fig. 2, which always appear in pairs.

Let now $D^N_B(\omega)$ be the number of new descendants of type B in $\omega$. Then $D^N_B(\omega) < \infty$ almost surely, and we obtain for the measure of $\bar\omega$
$$P^N(\bar\omega)\,d\bar\omega = (1+s_N)^{-(N-k)}\,(1+s_N)^{D^N_B(\omega)}\Bigl(\frac{1}{1+s_N}\Bigr)^{D^N_B(\omega)}\,P^N(\omega)\,d\omega = (1+s_N)^{-(N-k)}\,P^N(\omega)\,d\omega.$$
Note that the effects of creating new descendants and their replacement cancel each other, so that the relative weights of $P^N(\bar\omega)\,d\bar\omega$ and $P^N(\omega)\,d\omega$ do not depend on $\omega$. Since reflection provides a one-to-one correspondence between realisations with fixation of type A and of type B, respectively, we obtain the system of equations
$$1 - h^N_{N-k} = (1+s_N)^{-(N-k)}\,h^N_k, \qquad 1 \leq k \leq N-1. \tag{10}$$
Together with the boundary conditions $h^N_0 = 0$ and $h^N_N = 1$, equation (10) becomes a linear system that is solved by (2).
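The system (10) together with the boundary conditions is readily verified numerically; the following sketch (our own helper names, not part of the original) checks that (2) satisfies it.

```python
def h(N, s, k):
    """Fixation probability (2)."""
    den = sum((1.0 + s) ** j for j in range(N))
    return sum((1.0 + s) ** j for j in range(N - k, N)) / den

def check_reflection_system(N, s, tol=1e-12):
    """Check that (2) satisfies 1 - h_{N-k} = (1+s)^{-(N-k)} h_k for all k, i.e. (10)."""
    return all(
        abs((1.0 - h(N, s, N - k)) - (1.0 + s) ** (-(N - k)) * h(N, s, k)) < tol
        for k in range(1, N)
    )

if __name__ == "__main__":
    print(check_reflection_system(12, 0.3))   # True
    print(h(12, 0.3, 0), h(12, 0.3, 12))      # boundary values 0.0 and 1.0
```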


4 Labelled Moran model

In this Section we introduce a new particle model, which we call the labelled Moran model, and which has the same empirical type distribution as the original Moran model with selection of Sec. 2, provided the initial conditions are chosen appropriately. As before, we consider a population of fixed size N in continuous time, but now every individual is assigned a label $i \in \{1, \ldots, N\}$ with different reproductive behaviour to be specified below. As in the original Moran model, birth events are represented by arrows; they lead to a single offspring, which inherits the parent's label and replaces a randomly chosen individual. Again we distinguish between neutral (at rate 1) and selective (at rate $s_N$) events. Neutral arrows appear as before, at rate $1/N$ per ordered pair of lines, irrespective of their labels. But we only allow for selective arrows emanating from a label i and pointing to a label $j > i$ (at rate $s_N/N$ per ordered pair of lines with such labels). The idea is to have each label behave as 'unfit' towards lower labels and as 'fit' towards higher labels. We now fix an initial population that contains all N labels and introduce a spatial structure so that initially position i in the graphical representation is occupied by label i, $1 \leq i \leq N$, see also Fig. 3.

Figure 3: The labelled Moran model. The labels are indicated for the initial population (top) and a later one (bottom). Reproduction rates are noted above the arrows.

4.1 Ancestors and fixation probabilities

Since there is no mutation in the labelled Moran model, one of the N labels will eventually take over in the population. Thm. 1 assigns fixation probabilities to labels:

Theorem 1. Let $I^N$ be the label that becomes fixed in the population. Then $I^N$ is termed the ancestor and is distributed according to
$$\eta^N_i := P(I^N = i) = (1+s_N)^{N-i}\,\eta^N_N = h^N_i - h^N_{i-1}, \qquad 1 \leq i \leq N, \tag{11}$$
with
$$\eta^N_N := P(I^N = N) = \frac{1}{\sum_{j=0}^{N-1}(1+s_N)^j} = 1 - h^N_{N-1}. \tag{12}$$

We will give two proofs of Thm. 1. The first provides an intuitive explanation and is based on the type frequencies. In the second proof, we will use an alternative approach in the spirit of the reflection principle of Sec. 3, which provides more insight into the particle representation. This proof is somewhat more complicated, but it permits us to classify reproduction events into those that have an effect on the fixation probability of a given label and those that do not; this will become important later on. In analogy with Def. 2, we understand new descendants of labels in S, $S \subseteq \{1, \ldots, N\}$, as descendants of individuals with label in S that originate by replacing an individual with a label in the complement of S.

First proof of Thm. 1. For a given i, let $Z^{N,i}_t$ be the number of individuals with labels in $\{1, \ldots, i\}$. Like $Z^N$, the process $Z^{N,i} = (Z^{N,i}_t)_{t\geq0}$ is a birth-death process with the rates $\lambda^N_j$ and $\mu^N_j$ of (1). This is because every individual with label in $\{1, \ldots, i\}$ sends arrows into the set with labels in $\{i+1, \ldots, N\}$ at rate $1+s_N$; in the opposite direction, the rate is 1 per individual; and arrows within the label classes do not matter. Since $Z^{N,i}_0 = i$, the processes $Z^{N,i}$ and $Z^N$ thus have the same law provided $Z^N_0 = i$. As a consequence, $P(I^N \leq i) = \sum_{j=1}^{i}\eta^N_j = h^N_i$, $1 \leq i \leq N$, which, together with (2), immediately yields the assertions of Thm. 1.

Second proof of Thm. 1. This proof aims at a direct calculation of $\eta^N_i$, $1 \leq i \leq N-1$, as a function of $\eta^N_N$. Let a realisation of the labelled Moran model be given in which label N becomes fixed (this happens with probability $\eta^N_N$, still to be calculated). The basic idea now is to move label i to position N by way of a cyclic permutation of the labels' positions while keeping the arrows in their original positions; the result is a realisation in which label i is fixed, as illustrated in Fig. 4. More precisely, in the spatial ordering at time $t = 0$ we move every label $N-i$ positions to the right. That is, we move label j (at position j in the original setting) to position $(j + N - i) \bmod N$. Apart from the ordering of the labels, we keep the graphical realisation identical, that is, the arrows appear at the same times and between the same positions as before. We thus obtain what we will call the permuted realisation of order $N-i$. Each lineage occupied by an individual of label j in the original realisation corresponds to a lineage occupied by an individual of label $(j - N + i) \bmod N$ in the permuted realisation. Throughout, we will keep the original meaning of the labels: Between every ordered pair of lines with labels $(i, j)$, arrows appear at rate $(1+s_N)/N$ if $j > i$, and at rate $1/N$ otherwise. (But obviously the initial ordering of the labels in the permuted realisation differs from our convention that label j occupies position j at time $t = 0$.) Since, in the permuted realisation, we keep the position of each arrow, it now affects a different pair of labels, which may change its rate. As a result, the permuted realisation has a different weight than the original one; this will now be used to calculate $\eta^N_i$. (Mathematically, the following may be seen as a coupling argument.)

So, let $\Omega^N$ be the set of realisations of the labelled Moran model for $t \in [0, T]$, where T now is the time at which one of the labels is fixed. Let $P^N$ be the probability measure on $\Omega^N$. For a realisation $\omega_N \in \Omega^N$ in which label N becomes fixed, we define $\omega_i \in \Omega^N$ as the corresponding permuted realisation of order $N-i$. We will now calculate $\eta^N_i$ by assessing the weight of the measure of $\omega_i$ relative to that of $\omega_N$. Below we run briefly through the cases.

Figure 4: Cyclic permutation. Realisations $\omega_8$ (left) and $\omega_5$ (right) in a labelled Moran model of size $N = 8$. A shift of the initial ordering of $N - i = 3$ positions to the right transforms $\omega_8$, in which label $N = 8$ becomes fixed, into $\omega_5$ (with fixation of label $i = 5$). Descendants of the label that becomes fixed are marked bold in both cases. The label sets $\{1, \ldots, N-i\}$ and $\{N-i+1, \ldots, N\}$ (left), and $\{i+1, \ldots, N\}$ and $\{1, \ldots, i\}$ (right), respectively, are encircled at the top. Arrows within these sets (solid) appear at the same rates in $\omega_8$ and $\omega_5$. Arrows between these label sets are dotted or dashed; their rates (noted above the arrows) differ between $\omega_8$ and $\omega_5$. There is exactly one new descendant of the labels in $\{1, \ldots, N-i\}$ (left) and $\{i+1, \ldots, N\}$ (right), respectively. It originates from the respective first dotted arrow, and is replaced via the second dotted arrow. Replacements of the labels $1, \ldots, N-i$ (left) and of $i+1, \ldots, N$ (right), respectively (except their new descendants), are represented by dashed arrows. Altogether $P^N(\omega_5)\,d\omega_8 = (1+s_N)^{-1}(1+s_N)(1+s_N)^3\,P^N(\omega_8)\,d\omega_8$.

a) Arrows in $\omega_N$ that point from and to labels within the sets $\{1, \ldots, N-i\}$ or $\{N-i+1, \ldots, N\}$, respectively, turn into arrows within the sets $\{i+1, \ldots, N\}$ or $\{1, \ldots, i\}$, respectively, under the permutation. Such arrows appear at identical rates in $\omega_N$ and $\omega_i$ (cf. solid arrows in Fig. 4).

b) Arrows in $\omega_N$ that emanate from the set of labels $\{1, \ldots, N-i\}$ and point to the set of labels $\{N-i+1, \ldots, N\}$ occur at rate $1+s_N$ and create new descendants of labels in $\{1, \ldots, N-i\}$. Since label N becomes fixed, every such new descendant is eventually eliminated by an arrow at rate 1, see also the dotted arrows in Fig. 4. The corresponding situation in $\omega_i$ concerns new descendants of labels in $\{i+1, \ldots, N\}$, which result from neutral arrows and finally are replaced at rate $1+s_N$ each (cf. dotted arrows in Fig. 4).

c) It remains to deal with the replacement of the labels $1, \ldots, N-i$ (except for their new descendants) in $\omega_N$. Exactly $N-i$ neutral arrows are responsible for this; they transform into arrows at rate $1+s_N$ through the permutation (cf. the dashed arrows in Fig. 4).

Let now $D^N_{\leq N-i}(\omega_N)$ be the number of new descendants of labels in $\{1, \ldots, N-i\}$ in $\omega_N$ ($D^N_{\leq N-i}(\omega_N) < \infty$ almost surely). Then, a)-c) yield for the measure of $\omega_i$
$$P^N(\omega_i)\,d\omega_N = (1+s_N)^{D^N_{\leq N-i}(\omega_N)}\Bigl(\frac{1}{1+s_N}\Bigr)^{D^N_{\leq N-i}(\omega_N)}(1+s_N)^{N-i}\,P^N(\omega_N)\,d\omega_N = (1+s_N)^{N-i}\,P^N(\omega_N)\,d\omega_N.$$
As in the reflection principle, the effects of the new descendants cancel each other, and so the relative weights of $P^N(\omega_i)\,d\omega_N$ and $P^N(\omega_N)\,d\omega_N$ are independent of the particular choice of $\omega_N$. Since the cyclic permutation yields a one-to-one correspondence between realisations that lead to fixation of label N and label i, respectively, we obtain
$$\eta^N_i = (1+s_N)^{N-i}\,\eta^N_N, \qquad 1 \leq i \leq N. \tag{13}$$
Finally, the normalisation
$$1 = \sum_{i=1}^{N}\eta^N_i = \eta^N_N\sum_{i=1}^{N}(1+s_N)^{N-i}$$
yields the assertion of Thm. 1.
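To illustrate Thm. 1, here is a minimal simulation sketch of the labelled Moran model (our own code, not from the paper; function names are ours). It simulates the embedded jump chain by a thinning scheme: an ordered pair of positions is proposed uniformly, the arrow is declared neutral with probability $1/(1+s_N)$ and selective otherwise, and a selective arrow is discarded unless it points from a lower to a higher label. This reproduces the rates of Sec. 4 up to a time change, which is irrelevant for fixation.

```python
import random

def eta_exact(N, s, i):
    """Formula (11)/(12): probability that label i becomes the ancestor."""
    eta_N = 1.0 / sum((1.0 + s) ** j for j in range(N))
    return (1.0 + s) ** (N - i) * eta_N

def simulate_ancestor(N, s, rng):
    """One run of the labelled Moran model; returns the label that becomes fixed.

    Thinning: pick an ordered pair of positions uniformly; the proposed arrow is
    neutral with prob. 1/(1+s) (always applied) and selective with prob. s/(1+s),
    in which case it only takes effect if it points from a lower to a higher label.
    """
    pop = list(range(1, N + 1))          # position -> label
    while len(set(pop)) > 1:
        a, b = rng.sample(range(N), 2)   # parent position a, target position b
        if rng.random() < 1.0 / (1.0 + s) or pop[a] < pop[b]:
            pop[b] = pop[a]
    return pop[0]

if __name__ == "__main__":
    N, s, runs = 8, 0.5, 20000
    rng = random.Random(2)
    counts = [0] * (N + 1)
    for _ in range(runs):
        counts[simulate_ancestor(N, s, rng)] += 1
    for i in range(1, N + 1):
        print(i, round(counts[i] / runs, 4), round(eta_exact(N, s, i), 4))
```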

4.2 Defining events

We are now ready to investigate the effect of selection in more depth. In contrast to the second proof of Thm. 1, we return to the original convention that label i occupies position i at time $t = 0$. The second proof of Thm. 1 then allows us to identify the reproduction events that affect the distribution of $I^N$ (i.e. that are responsible for the factor $(1+s_N)^{N-I^N}$ in (11)) with the dashed arrows in Fig. 4, whereas dotted arrows appear pairwise and their effects cancel each other. This suggests terming these reproduction events defining events:

Definition 3. Let $I^N = i$. Arrows that emanate from the set of labels $\{1, \ldots, i\}$ and target individuals with labels in the set $\{i+1, \ldots, N\}$ that are not new descendants of labels in $\{i+1, \ldots, N\}$ are called defining events.


Loosely speaking, a defining event occurs every time the descendants of $\{1, \ldots, I^N\}$ 'advance to the right'. In particular, for every $j \in \{I^N+1, \ldots, N\}$, the first arrow that emanates from a label in $\{1, \ldots, I^N\}$ and hits the individual at position j is a defining event. Altogether, there will be $N - I^N$ defining events until fixation. It is important to note that they need not be reproduction events of the ancestral label $I^N$ itself. See Fig. 5 for an illustration.

Figure 5: Defining events. Descendants of the ancestor $I^N$ are marked bold, defining events are represented by dashed arrows. Left: Both defining events are reproduction events of $I^N$. The arrows that are indicated by ∗ and ∗∗ are not defining events: The first one (∗) concerns only labels within $\{I^N+1, \ldots, N\}$, the second one (∗∗) targets a new descendant of $\{I^N+1, \ldots, N\}$. Right: Only the second defining event is a reproduction event of $I^N$; the first one is a reproduction event of a label less than $I^N$.

It is clear that all defining events correspond to rate $1+s_N$ arrows. It is also clear that each defining event is either a selective (probability $s_N/(1+s_N)$) or a neutral (probability $1/(1+s_N)$) reproduction event, independently of the others and of $I^N$. Let $V^N_i$, $I^N+1 \leq i \leq N$, be the corresponding family of Bernoulli random variables that indicate whether the respective defining event is selective. Let $Y^N$ denote the number of selective defining events, that is,
$$Y^N := \sum_{i=I^N+1}^{N} V^N_i \tag{14}$$
with the independent and identically distributed (i.i.d.) Bernoulli variables $V^N_i$ just defined. $Y^N$ will turn out to be a pivotal quantity for everything to follow. Let us now characterise its distribution, and the dependence between $I^N$ and $Y^N$. This will also provide us with an interpretation of the coefficients $a^N_n$ in (3), as well as another representation of the fixation probabilities.

It is clear from (14) that, given $I^N = i$, $Y^N$ follows a binomial distribution with parameters $N-i$ and $s_N/(1+s_N)$. Thus, for $0 \leq n \leq N-i$, we obtain (via (11)) that
$$P(Y^N = n, I^N = i) = \binom{N-i}{n}\Bigl(\frac{s_N}{1+s_N}\Bigr)^n\Bigl(\frac{1}{1+s_N}\Bigr)^{N-i-n}\eta^N_i = \binom{N-i}{n}\,s_N^n\,\eta^N_N \tag{15}$$


and thus
$$P(Y^N = n) = \sum_{i=1}^{N}\binom{N-i}{n}\,s_N^n\,\eta^N_N = \sum_{i=0}^{N-1}\binom{i}{n}\,s_N^n\,\eta^N_N = \binom{N}{n+1}\,s_N^n\,\eta^N_N, \tag{16}$$
where the last equality is due to the well-known identity
$$\sum_{i=0}^{k}\binom{i}{\ell} = \binom{k+1}{\ell+1}, \qquad \ell, k \in \mathbb{N}_0,\ 0 \leq \ell \leq k. \tag{17}$$
In particular, $P(Y^N = 0) = N\eta^N_N$. Obviously, $P(Y^N = n)$, $n \geq 1$, may also be expressed recursively as
$$P(Y^N = n) = s_N\,\frac{N-n}{n+1}\,P(Y^N = n-1). \tag{18}$$
Furthermore, equations (15) and (16) immediately yield the conditional probability
$$P(I^N = i \mid Y^N = n) = \frac{\binom{N-i}{n}}{\binom{N}{n+1}}. \tag{19}$$
This coincides with the probability that a sample of $n+1$ labels, drawn uniformly without replacement from $\{1, 2, \ldots, N\}$, consists of label i together with n labels from $\{i+1, \ldots, N\}$. The intuitive content of this important fact will become clear in Sec. 6.

Compare now (18) with (4) and, for the initial condition, compare (16) with (5) (with the help of (12)). Obviously, the $P(Y^N = n)$ and the $a^N_n - a^N_{n+1}$, $0 \leq n \leq N-2$, have the same initial value (at $n = 0$) and follow the same recursion. So they agree, and we have found a probabilistic meaning for the $a^N_n$, $0 \leq n \leq N-1$, namely,
$$a^N_n = P(Y^N \geq n). \tag{20}$$
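The identification (20) can be checked numerically. The following sketch (our own code, standard library only; function names are ours) computes $P(Y^N = n)$ from (16) and the $a^N_n$ from the recursion (4)-(5), and confirms that the $a^N_n$ coincide with the tail probabilities $P(Y^N \geq n)$.

```python
from math import comb

def eta_N(N, s):
    """Basic fixation probability (12)."""
    return 1.0 / sum((1.0 + s) ** j for j in range(N))

def p_Y(N, s, n):
    """P(Y^N = n) from (16)."""
    return comb(N, n + 1) * s ** n * eta_N(N, s)

def a_recursion(N, s):
    """a_n^N from (4) and (5); by (12), h_{N-1}^N = 1 - eta_N^N."""
    a = [0.0] * N
    a[0] = 1.0
    a[1] = a[0] - N * eta_N(N, s)
    for n in range(1, N - 1):
        a[n + 1] = a[n] - s * (N - n) / (n + 1) * (a[n - 1] - a[n])
    return a

if __name__ == "__main__":
    N, s = 10, 0.3
    a = a_recursion(N, s)
    for n in range(N):
        tail = sum(p_Y(N, s, m) for m in range(n, N))        # P(Y^N >= n)
        print(n, round(a[n], 10), round(tail, 10))            # should agree, cf. (20)
    print("total mass:", sum(p_Y(N, s, m) for m in range(N)))  # should be 1
```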

This will be crucial in what follows. Another interesting characterisation is the following:

Proposition 1. Let $W^N := Y^N + 1$. Then $W^N$ follows the binomial distribution with parameters N and $s_N/(1+s_N)$, conditioned to be positive.

Proof. Let A be a random variable distributed according to $\mathrm{Bin}(N, s_N/(1+s_N))$. Then
$$P(A > 0) = 1 - \frac{1}{(1+s_N)^N} = \Bigl(1 - \frac{1}{1+s_N}\Bigr)\sum_{i=0}^{N-1}\frac{1}{(1+s_N)^i}$$
by the geometric series. For $n \geq 1$, therefore,
$$P(A = n \mid A > 0) = \frac{\binom{N}{n}\,s_N^n\,(1+s_N)^{-N}}{\frac{s_N}{1+s_N}\sum_{i=0}^{N-1}(1+s_N)^{-i}} = \frac{\binom{N}{n}\,s_N^{n-1}}{\sum_{i=0}^{N-1}(1+s_N)^{i}} = P(Y^N = n-1),$$
where the last step is due to (16). This proves the claim.
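A quick numerical illustration of Prop. 1 (our own sketch, not part of the original): compare (16), shifted by one, with the binomial distribution conditioned to be positive.

```python
from math import comb

def p_Y(N, s, n):
    """P(Y^N = n) from (16)."""
    eta_N = 1.0 / sum((1.0 + s) ** j for j in range(N))
    return comb(N, n + 1) * s ** n * eta_N

def p_binomial_conditioned(N, s, m):
    """P(A = m | A > 0) for A ~ Bin(N, s/(1+s))."""
    p = s / (1.0 + s)
    pm = comb(N, m) * p ** m * (1.0 - p) ** (N - m)
    return pm / (1.0 - (1.0 - p) ** N)

if __name__ == "__main__":
    N, s = 9, 0.4
    for m in range(1, N + 1):   # W^N = Y^N + 1 takes values 1, ..., N
        print(m, round(p_Y(N, s, m - 1), 10), round(p_binomial_conditioned(N, s, m), 10))
```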


Note that label N is obviously not capable of selective reproduction events, and its fixation implies the absence of (selective) defining events. Its fixation probability $\eta^N_N$ coincides with the fixation probability of any label i, $1 \leq i \leq N$, in the absence of any selective defining events: $P(Y^N = 0, I^N = i) = \eta^N_N$, cf. (15). For this reason, we term $\eta^N_N$ the basic fixation probability of every label i, $1 \leq i \leq N$, and express all relevant quantities in terms of $\eta^N_N$. Note that $\eta^N_N$ is different from the neutral fixation probability, $1/N$, that applies to every label in the case $s_N = 0$.

Decomposition according to the number of selective defining events yields a further alternative representation of the fixation probability $h^N_i$ (cf. (2)) of the Moran model. Consider
$$P(I^N \leq i \mid Y^N = n) = \frac{1}{\binom{N}{n+1}}\sum_{j=1}^{i}\binom{N-j}{n} = \frac{1}{\binom{N}{n+1}}\sum_{j=N-i}^{N-1}\binom{j}{n} = \frac{\binom{N}{n+1}-\binom{N-i}{n+1}}{\binom{N}{n+1}}, \tag{21}$$

where we have used (19) and (17). This leads us to the following series expansion in $s_N$:
$$h^N_i = P(I^N \leq i) = \sum_{n=0}^{N-1}P(I^N \leq i \mid Y^N = n)\,P(Y^N = n) = \sum_{n=0}^{N-1}\Bigl[\binom{N}{n+1} - \binom{N-i}{n+1}\Bigr]\,s_N^n\,\eta^N_N. \tag{22}$$

We will come back to this in the next Section.
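Before moving on, a small numerical cross-check (our own sketch; helper names are ours) that the series expansion (22) indeed reproduces (2):

```python
from math import comb

def h_direct(N, s, k):
    """Fixation probability (2)."""
    den = sum((1.0 + s) ** j for j in range(N))
    return sum((1.0 + s) ** j for j in range(N - k, N)) / den

def h_via_defining_events(N, s, i):
    """Series expansion (22) in powers of s_N."""
    eta_N = 1.0 / sum((1.0 + s) ** j for j in range(N))
    return sum((comb(N, n + 1) - comb(N - i, n + 1)) * s ** n * eta_N
               for n in range(N))

if __name__ == "__main__":
    N, s = 12, 0.25
    print([round(h_direct(N, s, i) - h_via_defining_events(N, s, i), 12)
           for i in range(N + 1)])   # all (close to) zero
```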

5 Diffusion limit of the labelled Moran model

In this Section we analyse the number of selective defining events in the diffusion limit (cf. Sec. 2). First of all, we recapitulate that
$$\lim_{N\to\infty}P(I^N \leq i_N) = h(x)$$
for a sequence $(i_N)_{N\in\mathbb{N}}$ with $i_N \in \{1, \ldots, N\}$ and $\lim_{N\to\infty} i_N/N = x$, $x \in [0,1]$, and h as given in (6). Thus, the sequence $(I^N/N)_{N\in\mathbb{N}}$ converges to a random variable $I^\infty$ with distribution function h.

At the same time, the sequence of random variables $(Y^N)_{N\in\mathbb{N}}$ converges to a random variable $Y^\infty$. In analogy with our reasoning in Sec. 4.2, $Y^\infty$ follows a Poisson distribution with parameter $\sigma(1-I^\infty)$ in the sense of a two-stage random experiment. That is, given $I^\infty = x$, $Y^\infty$ is Poisson distributed with parameter $\sigma(1-x)$. (Of course, this can also be checked directly by taking the $N \to \infty$ limit in (16).)

Since
$$P(Y^\infty \geq n) = \lim_{N\to\infty}P(Y^N \geq n) = \lim_{N\to\infty} a^N_n = a_n, \tag{23}$$


with coefficients $a_n$ as in (8) and (9), we may conclude that $P(Y^\infty = n) = a_n - a_{n+1}$ for $n \geq 0$. We have thus found an interpretation of the coefficients $a_n$ in (7). In particular,
$$P(Y^\infty = 0) = \lim_{N\to\infty}P(Y^N = 0) = \lim_{N\to\infty} N\eta^N_N = \lim_{N\to\infty}\Bigl[\frac{1}{N}\sum_{j=0}^{N-1}\Bigl(1 + \frac{Ns_N}{N}\Bigr)^{j}\Bigr]^{-1} = \Bigl[\int_0^1 \exp(\sigma p)\,dp\Bigr]^{-1} = \frac{\sigma}{\exp(\sigma)-1}, \tag{24}$$
where we have used (16) and (12). According to (8), we have the recursion
$$P(Y^\infty = n) = \frac{\sigma}{n+1}\,P(Y^\infty = n-1) \tag{25}$$
and iteratively, via (25) and (24),
$$P(Y^\infty = n) = \frac{\sigma^n}{(n+1)!}\,P(Y^\infty = 0) = \frac{\sigma^{n+1}}{(n+1)!\,(\exp(\sigma)-1)}. \tag{26}$$
With an argument analogous to that in Prop. 1, one also obtains that $W^\infty := Y^\infty + 1$ follows a Poisson distribution with parameter $\sigma$, conditioned to be positive.

We now aim at expressing h in terms of a decomposition according to the values of $Y^\infty$ (in analogy with (22)). We recapitulate equation (21) to obtain
$$P(I^\infty \leq x \mid Y^\infty = n) = \lim_{N\to\infty}P(I^N \leq i_N \mid Y^N = n) = \lim_{N\to\infty}\Bigl(1 - \frac{\binom{N-i_N}{n+1}}{\binom{N}{n+1}}\Bigr) = 1 - (1-x)^{n+1}.$$
Then, the equivalent of (22) is a series expansion in $\sigma$:
$$h(x) = P(I^\infty \leq x) = \sum_{n\geq 0}P(I^\infty \leq x \mid Y^\infty = n)\,P(Y^\infty = n) = \frac{1}{\exp(\sigma)-1}\sum_{n\geq 1}\frac{1}{n!}\bigl(1-(1-x)^n\bigr)\sigma^n. \tag{27}$$
Finally, note that (27) may also be expressed as
$$h(x) = \sum_{n\geq1}\bigl(1-(1-x)^n\bigr)(a_{n-1} - a_n) = E\bigl(1-(1-x)^{W^\infty}\bigr). \tag{28}$$
Interestingly, though not surprisingly, this coincides with a representation given by Pokalyuk and Pfaffelhuber in their proof of Kimura's fixation probability (6) [16, Lemma 2.2]. They follow an argument of Mano [14] that establishes a connection between the ASG at stationarity and the fixation probability. A key insight here is that the number of lines in the ASG at stationarity follows a Poisson distribution with parameter $\sigma$, conditioned to be positive, which coincides with the distribution of $W^\infty$.
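As a sanity check of (27)/(28) (our own sketch, not part of the original), one can sum the series for $W^\infty \sim$ Poisson($\sigma$) conditioned to be positive and compare with Kimura's formula (6).

```python
import math

def kimura_h(x, sigma):
    """Kimura's formula (6)."""
    return (1.0 - math.exp(-sigma * x)) / (1.0 - math.exp(-sigma))

def h_from_W(x, sigma, n_max=80):
    """Representation (28): E[1 - (1-x)^W] with W ~ Poisson(sigma) conditioned
    to be positive; equivalently the series (27)."""
    norm = 1.0 - math.exp(-sigma)          # P(Poisson(sigma) > 0)
    poisson_n = math.exp(-sigma)           # P(Poisson(sigma) = 0), updated iteratively
    total = 0.0
    for n in range(1, n_max + 1):
        poisson_n *= sigma / n             # now P(Poisson(sigma) = n)
        total += (1.0 - (1.0 - x) ** n) * poisson_n / norm
    return total

if __name__ == "__main__":
    sigma = 3.0
    for x in (0.05, 0.3, 0.7):
        print(x, round(kimura_h(x, sigma), 10), round(h_from_W(x, sigma), 10))
```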


6 The targets of selective defining events and construction of the ancestral line

We have, so far, been concerned with the ancestral label and with the number of selective defining events, but have not yet investigated the targets of these events. This will now be done. We begin with another definition, see also Fig. 6.

Definition 4. Let $Y^N = n$ be given. We then denote by $J^N_1, \ldots, J^N_n$, with $J^N_1 < \cdots < J^N_n$, the (random) positions that are hit by the n selective defining events.

Figure 6: Targets of selective defining events. The descendants of the ancestor $I^N$ are marked bold, dashed arrows (with rates noted above) represent defining events. Here, $N = 5$, $I^N = 2$, $Y^N = 2$, $J^N_1 = 3$, $J^N_2 = 5$. Note that the first selective defining event hits position 3, which is occupied by an individual of label 4 at that time.

In terms of the family $V^N_i$, $I^N+1 \leq i \leq N$, of i.i.d. Bernoulli random variables (cf. Sec. 4.2), the $J^N_1, \ldots, J^N_{Y^N}$ are characterised as
$$\{J^N_1, \ldots, J^N_{Y^N}\} = \{i \in \{I^N+1, \ldots, N\} \mid V^N_i = 1\}.$$
(For $Y^N = 0$ we have the empty set.) Given that $Y^N = n$, the n-tuple $(J^N_1, \ldots, J^N_n)$ is uniformly (without replacement) distributed on the set of positions $\{I^N+1, \ldots, N\}$:
$$P(J^N_1 = j_1, \ldots, J^N_n = j_n \mid I^N = i, Y^N = n) = \frac{1}{\binom{N-i}{n}},$$
which implies (via (19))
$$P(I^N = i, J^N_1 = j_1, \ldots, J^N_n = j_n \mid Y^N = n) = \frac{1}{\binom{N}{n+1}}. \tag{29}$$
Hence, the $(n+1)$-tuples $(I^N, J^N_1, \ldots, J^N_n)$ are sampled from $\{1, \ldots, N\}$ uniformly without replacement. Note that $W^N$ of Prop. 1 is the size of this tuple. Note also that the tuples do not contain any further information about the appearance of arrows in the particle picture. In particular, we do not learn which label 'sends' the arrows, nor do we include selective arrows that do not belong to defining events.

It is instructive to formulate a simulation algorithm (or a construction rule) for these tuples.

Algorithm 1. First draw the number of selective defining events, that is, a realisation n of $Y^N$ (according to (16)). Then simulate $(I^N, J^N_1, \ldots, J^N_n)$ in the following way:

Step 0: Generate a random number $U^{(0)}$ that is uniformly distributed on $\{1, \ldots, N\}$. Define $I^{(0)} := U^{(0)}$. If $n > 0$ continue with step 1, otherwise stop.

Step 1: Generate (independently of $U^{(0)}$) a second random number $U^{(1)}$ that is uniformly distributed on $\{1, \ldots, N\} \setminus \{U^{(0)}\}$.
(a) If $U^{(1)} > I^{(0)}$, define $I^{(1)} := I^{(0)}$, $J^{(1)}_1 := U^{(1)}$.
(b) If $U^{(1)} < I^{(0)}$, define $I^{(1)} := U^{(1)}$, $J^{(1)}_1 := I^{(0)}$.
If $n > 1$ continue with step 2, otherwise stop.

Step k: Generate (independently of $U^{(0)}, \ldots, U^{(k-1)}$) a random number $U^{(k)}$ that is uniformly distributed on $\{1, \ldots, N\} \setminus \{U^{(0)}, \ldots, U^{(k-1)}\}$.
(a) If $U^{(k)} > I^{(k-1)}$, define $I^{(k)} := I^{(k-1)}$ and assign the variables $U^{(k)}, J^{(k-1)}_1, \ldots, J^{(k-1)}_{k-1}$ to $J^{(k)}_1, \ldots, J^{(k)}_k$, such that $J^{(k)}_1 < \cdots < J^{(k)}_k$.
(b) If $U^{(k)} < I^{(k-1)}$, define $I^{(k)} := U^{(k)}$, $J^{(k)}_1 := I^{(k-1)}$ and $J^{(k)}_\ell := J^{(k-1)}_{\ell-1}$ for $2 \leq \ell \leq k$.
If $n > k$ continue with step $k+1$, otherwise stop.
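Algorithm 1 is easy to implement. The sketch below (our own code; it defers the sorting of the targets to the end, which yields the same output as the stepwise updates (a)/(b)) also checks the marginal law (19) of the ancestor empirically.

```python
import random
from math import comb

def algorithm_1(N, n, rng):
    """One run of Algorithm 1: returns (I, [J_1, ..., J_n]) given Y^N = n.

    The ancestor is the running minimum of the labels drawn so far; the J's
    are the remaining draws, sorted increasingly at the end.
    """
    draws = rng.sample(range(1, N + 1), n + 1)   # U^(0), ..., U^(n), distinct
    ancestor = draws[0]                          # step 0: I^(0) := U^(0)
    targets = []
    for u in draws[1:]:
        if u > ancestor:                         # case (a): ancestor unchanged
            targets.append(u)
        else:                                    # case (b): ancestor shifts left
            targets.append(ancestor)
            ancestor = u
    return ancestor, sorted(targets)

if __name__ == "__main__":
    # Empirical check of (19): P(I^N = i | Y^N = n) = C(N-i, n) / C(N, n+1)
    N, n, runs = 10, 2, 100000
    rng = random.Random(3)
    counts = {}
    for _ in range(runs):
        i, _ = algorithm_1(N, n, rng)
        counts[i] = counts.get(i, 0) + 1
    for i in range(1, N + 1):
        exact = comb(N - i, n) / comb(N, n + 1)
        print(i, round(counts.get(i, 0) / runs, 4), round(exact, 4))
```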

The simulation algorithm first produces a vector $(U^{(0)}, \ldots, U^{(n)})$ uniformly distributed on the set of unordered $(n+1)$-tuples and then turns it into the vector $(I^{(n)}, J^{(n)}_1, \ldots, J^{(n)}_n)$, which is uniformly distributed on the set of ordered $(n+1)$-tuples. The latter therefore has the same distribution as the vector of random variables $(I^N, J^N_1, \ldots, J^N_n)$, given $Y^N = n$ (cf. (29)). The interesting point now is that we may interpret the algorithm as a procedure for the construction of the ancestral line. It successively adds selective arrows to realisations of the labelled Moran model, such that they coincide with additional selective defining events; obviously, $n+1$ is the number of steps.

In step 0 we randomly choose one of the N labels. This represents the label that becomes fixed, i.e. the ancestor, in a particle representation with no selective defining events (cf. Fig. 7, left). (Actually, this coincides with the neutral situation, $s_N = 0$.) In the following steps, selective arrows are added one by one, where the point to note is that each of them may or may not move the ancestor 'to the left', depending on whether (a) or (b) applies. For instance, consider step k, that is, a realisation with $k-1$ selective defining events is augmented by a further one. In case (a) (cf. Figs. 7 and 8, middle) we add a selective defining event that does not change the label that becomes fixed (i.e. the ancestor remains the same), and that targets the newly chosen position $U^{(k)}$. In contrast, in case (b) (cf. Figs. 7 and 8, right) the additional selective defining event emanates from position $U^{(k)}$ and hits the ancestor of the previous step. The result is a shift of the ancestor 'to the left', i.e. to position $U^{(k)}$. To avoid misunderstandings, we would like to emphasise that the details of the genealogies in the pictures are for the purpose of illustration only; the only thing we really construct is the sequence of tuples $(I^{(n)}, J^{(n)}_1, \ldots, J^{(n)}_n)$.

Figure 7: Steps 0 and 1 of Algorithm 1 and corresponding genealogical interpretations. Left: step 0 (no selective defining events); middle: step 1 (a); right: step 1 (b) (each with one selective defining event). Defining events are represented by dashed arrows, corresponding rates are indicated above the arrows. The ancestor ($I^{(0)}$ and $I^{(1)}$, respectively) and the target of a selective defining event ($J^{(1)}_1$, if present) are indicated at the top. Bold lines represent the genealogy of the entire population at the bottom. In the middle panel, the ancestral line is unaffected by the selective defining event. On the right, it is shifted to a lower label; the previous ancestor (which is no longer the true ancestor) is represented by the dotted line.

We now have everything at hand to provide a genealogical interpretation for the fixation probabilities $h^N_i$ respectively $h(x)$ (cf. (3) respectively (7)). We have seen that the tuples $(I^{(n)}, J^{(n)}_1, \ldots, J^{(n)}_n)$ are constructed such that
$$P(I^{(n)} = i, J^{(n)}_1 = j_1, \ldots, J^{(n)}_n = j_n) = P(I^N = i, J^N_1 = j_1, \ldots, J^N_n = j_n \mid Y^N = n)$$
for all $n \geq 0$ and $j_1, \ldots, j_n \in \{i+1, \ldots, N\}$, and, via marginalisation,
$$P(I^{(n)} = i) = P(I^N = i \mid Y^N = n).$$
We reformulate the decomposition of (22) to obtain
$$\begin{aligned}
h^N_i &= P(I^N \leq i \mid Y^N = 0)\,P(Y^N \geq 0) + \sum_{n=1}^{N-1}\bigl[P(I^N \leq i \mid Y^N = n) - P(I^N \leq i \mid Y^N = n-1)\bigr]\,P(Y^N \geq n) \\
&= P(I^{(0)} \leq i)\,P(Y^N \geq 0) + \sum_{n=1}^{N-1}\bigl[P(I^{(n)} \leq i) - P(I^{(n-1)} \leq i)\bigr]\,P(Y^N \geq n) \\
&= P(I^{(0)} \leq i)\,P(Y^N \geq 0) + \sum_{n=1}^{N-1}P(I^{(n)} \leq i,\ I^{(n-1)} > i)\,P(Y^N \geq n),
\end{aligned} \tag{30}$$
where the last equality is due to the fact that the ancestor's label is non-increasing in n.


Figure 8: Step k of Algorithm 1 and corresponding genealogical interpretations. Situation after step $k-1$ (left) and its modification according to step k (a) (middle) and step k (b) (right). As in Fig. 7, dashed arrows represent defining events, rates are indicated above arrows, ancestors ($I^{(k-1)}$ and $I^{(k)}$, respectively) and targets of selective defining events ($J^{(k-1)}_1, \ldots, J^{(k-1)}_{k-1}$ and $J^{(k)}_1, \ldots, J^{(k)}_k$, respectively) are indicated at the top. Bold lines represent the genealogy of the population at the bottom, dotted lines correspond to lines that have been ancestors in previous steps of Algorithm 1. Each selective defining event that goes back to case (b) gives rise to a new dotted line.

In (30), the fixation probability $h^N_i$ is thus decomposed according to the first step in the algorithm in which the ancestor has a label in $\{1, \ldots, i\}$. The probability that this event takes place in step n may, in view of the simulation algorithm, be expressed explicitly as

$$P(I^{(n)} \leq i,\ I^{(n-1)} > i) = P(I^{(n)} \leq i,\ I^{(0)}, I^{(1)}, \ldots, I^{(n-1)} > i) = P(U^{(n)} \leq i,\ U^{(0)}, U^{(1)}, \ldots, U^{(n-1)} > i) = \frac{i}{N}\prod_{j=0}^{n-1}\frac{N-i-j}{N-1-j}. \tag{31}$$
Together with (30) and (20), this provides us with a term-by-term interpretation of (3). In particular, (31) implies that
$$\lim_{N\to\infty}P(I^{(n)} \leq i_N,\ I^{(n-1)} > i_N) = x(1-x)^n$$
for a sequence $(i_N)_{N\in\mathbb{N}}$ with $i_N \in \{1, \ldots, N\}$ and $\lim_{N\to\infty} i_N/N = x$, $x \in [0,1]$. Thus, in


the diffusion limit, and with $a_n = \lim_{N\to\infty} P(Y^N \geq n)$ of (23), (30) turns into
$$h(x) = \lim_{N\to\infty} h^N_{i_N} = x + \sum_{n\geq1} x(1-x)^n\,a_n,$$
which is the representation (7), and which is easily checked to coincide with (27) (as obtained by the direct approach of Sec. 5).
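The decomposition (30), with (31) and $a^N_n = P(Y^N \geq n)$ computed from (16), can be checked against (2) directly; the following sketch (our own function names, not part of the original) does this.

```python
from math import comb

def h_direct(N, s, i):
    """Fixation probability (2)."""
    den = sum((1.0 + s) ** j for j in range(N))
    return sum((1.0 + s) ** j for j in range(N - i, N)) / den

def p_Y_tail(N, s, n):
    """P(Y^N >= n), computed from (16)."""
    eta_N = 1.0 / sum((1.0 + s) ** j for j in range(N))
    return sum(comb(N, m + 1) * s ** m * eta_N for m in range(n, N))

def p_first_entry(N, i, n):
    """P(I^(n) <= i, I^(n-1) > i) from (31); for n = 0 this is just i/N."""
    prob = i / N
    for j in range(n):
        prob *= (N - i - j) / (N - 1 - j)
    return prob

def h_decomposed(N, s, i):
    """Decomposition (30): sum over the step n at which the ancestor first drops to <= i."""
    return sum(p_first_entry(N, i, n) * p_Y_tail(N, s, n) for n in range(N))

if __name__ == "__main__":
    N, s = 10, 0.35
    print([round(h_direct(N, s, i) - h_decomposed(N, s, i), 12) for i in range(1, N)])
```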

7 Discussion

We have reanalysed the process of fixation in a Moran model with two types and (fertility) selection by means of the labelled Moran model. It is interesting to compare the labelled Moran model with the corresponding lookdown construction: In the lookdown with fertility selection, neutral arrows only point in one direction (from lower to higher levels), whereas selective arrows may appear between arbitrary levels. In contrast, the labelled Moran model contains neutral arrows in all directions, but selective arrows only occur from lower to higher labels. Also, the labelled Moran model deliberately dispenses with exchangeability, which is an essential ingredient of the lookdown. It is conceivable that the labelled Moran model can be transformed into a lookdown (by way of random permutations), but this remains to be elucidated.

We certainly do not advertise the labelled Moran model as a general-purpose tool; in particular, due to its arbitrary neutral arrows, it does not allow the construction of a sequence of models with increasing N on the same probability space. However, it turned out to be particularly useful for the purpose considered in this paper; the reason seems to be that the fixation probabilities of its individuals, $\eta^N_i$, coincide with the individual terms in (2). Likewise, we obtained a term-by-term interpretation of the fixation probability (3) as a decomposition according to the number of selective defining events, which may successively shift the ancestral line to the left, thus placing more weight on the fit individuals.

Indeed, the incentive for this paper was the intriguing representation (7) of the fixation probabilities, which was first observed by Fearnhead [9] in the context of (a pruned version of) the ASG, for the diffusion limit of the Moran model with mutation and selection at stationarity, and further investigated by Taylor [17]. Quite remarkably, (7) carries over to the case without mutation (which turns the stationary Markov chain into an absorbing one), with Fearnhead's $a_n$ reducing to ours (see [12] and Sec. 2). The observation that our $W^\infty$, that is, 1 plus the number of selective defining events in the diffusion limit, has the same distribution as the number of branches in the ASG (see Sec. 5) fits nicely into this context, but still requires some further thought. Needless to say, the next challenge will be to extend the results to the case with mutation; this does not seem straightforward, but it appears possible, given the tools and insights that have become available lately.


Acknowledgements

It is our pleasure to thank Anton Wakolbinger, Ute Lenz, and Alison Etheridge for enlightening discussions. This project received financial support from Deutsche Forschungsgemeinschaft (Priority Programme SPP 1590 'Probabilistic Structures in Evolution', grant no. BA 2469/5-1).

References

[1] E. Baake, R. Bialowons, Ancestral processes with selection: Branching and Moran models, Banach Center Publications 80 (2008), 33-52

[2] B. Bah, E. Pardoux, A. B. Sow, A look-down model with selection, in: Stochastic Analysis and Related Topics: In Honour of Ali Süleyman Üstünel, Paris, June 2010 (L. Decreusefond, J. Najim, eds.), pp. 1-28, Springer, Berlin, Heidelberg, 2012

[3] P. Donnelly, T. G. Kurtz, A countable representation of the Fleming-Viot measure-valued diffusion, Ann. Prob. 24 (1996), 698-742

[4] P. Donnelly, T. G. Kurtz, Particle representations for measure-valued population models, Ann. Prob. 27 (1999), 166-205

[5] P. Donnelly, T. G. Kurtz, Genealogical processes for Fleming-Viot models with selection and recombination, Ann. Appl. Prob. 9 (1999), 1091-1148

[6] R. Durrett, Probability Models for DNA Sequence Evolution, 2nd ed., Springer, New York, 2008

[7] A. Etheridge, Some Mathematical Models from Population Genetics, Springer, Heidelberg, 2011

[8] W. J. Ewens, Mathematical Population Genetics. I. Theoretical Introduction, 2nd ed., Springer, New York, 2004

[9] P. Fearnhead, The common ancestor at a nonneutral locus, J. Appl. Prob. 39 (2002), 38-54

[10] S. Karlin, H. M. Taylor, A Second Course in Stochastic Processes, Academic Press, San Diego, 1981

[11] M. Kimura, On the probability of fixation of mutant genes in a population, Genetics 47 (1962), 713-719

[12] S. Kluth, T. Hustedt, E. Baake, The common ancestor process revisited, submitted (2013); arXiv:1305.5939 [q-bio.PE]

[13] S. M. Krone, C. Neuhauser, Ancestral processes with selection, Theor. Pop. Biol. 51 (1997), 210-237

[14] S. Mano, Duality, ancestral and diffusion processes in models with selection, Theor. Pop. Biol. 75 (2009), 164-175

[15] C. Neuhauser, S. M. Krone, The genealogy of samples in models with selection, Genetics 145 (1997), 519-534

[16] C. Pokalyuk, P. Pfaffelhuber, The ancestral selection graph under strong directional selection, Theor. Pop. Biol. (2012)

[17] J. E. Taylor, The common ancestor process for a Wright-Fisher diffusion, Electron. J. Probab. 12 (2007), 808-847
