

Copyright © 1986 by the Genetics Society of America

POPULATION GENETICS OF AN EXPANDING FAMILY OF MOBILE GENETIC ELEMENTS

TOMOKO OHTA

National Institute of Genetics, Mishima, 411 Japan

Manuscript received September 18, 1985 Revised copy accepted January 7, 1986

ABSTRACT

A model of an expanding family of dispersed repetitive DNA was studied. Based on the previous result of the model of duplicative transposition, an approximate solution giving allelism and identity coefficients as functions of time was obtained, and theoretical predictions were verified by Monte Carlo experiments. The results show that, even if the copy number per genome increases very rapidly, allelism and identity coefficients may take a long time to reach equilibrium. The changes of allelism and allelic identity are similar to those of homozygosity at an ordinary single locus, whereas that of nonallelic identity can be much slower, particularly when the copy number per genome is large. Thus, many existing families of highly repetitive sequences may represent nonequilibrium states for nonallelic identity. The present model may be extended to include other evolutionary forces such as gene conversion or the recurrent insertion from normal gene copies.

A most remarkable finding in recent years is the abundance of repetitive DNA families in genomes of higher organisms. Especially in mammals, extraordinary repetitive DNA families, such as Alu or Kpn, exist and constitute a substantial fraction of the genome. It is not quite clear what they are doing for the organisms, but at least some of them are transcribed and, hence, could have some effects on the expression of other genes (SINGER 1982). Thus, they may be important as controlling elements (BRITTEN and DAVIDSON 1969; SHAPIRO 1983), or merely selfish DNA (DOOLITTLE and SAPIENZA 1980; ORGEL and CRICK 1980). Unlike typical multigene families, for which function, chromosomal locations, copy number and so on are not changing greatly, the rise and fall of many dispersed repetitive sequences are quite rapid in evolution.

Several recent review articles on retroposons (ROGERS 1985a,b; WEINER, DEININGER and EFSTRATIADIS 1986) are quite stimulating to evolutionary geneticists. Retroposons include all those mobile genetic elements which transpose via RNA by reverse transcriptase; from viral to nonviral retroposon and from pseudogene to highly repetitive DNA. Particularly noteworthy among them are those retroposon families that possess RNA polymerase III sites within themselves, and hence, newly transposed elements retain their transposability. They include many of the highly repetitive dispersed families found

Genetics 113: 145-159 May, 1986.

Downloaded from https://academic.oup.com/genetics/article/113/1/145/5996989 by guest on 22 January 2022


in mammals, such as primate Alu, rodent B2 and artiodactyl C. Apparently, such families originate from cellular RNA species (tRNA, 7SL RNA, mRNA) and have expanded recently on an evolutionary time scale. On the other hand, those families that have RNA polymerase II sites usually generate inactive retroposons; they include many of the processed pseudogenes of various protein coding genes.

OKADA and his colleagues have found that some discrete-sized short RNA transcribed in vitro from the total DNA of mammalian cells has a tRNA-like structure (SAKAMOTO and OKADA 1985; ENDOH and OKADA 1986; see also DANIELS and DEININGER 1985). They called such segments Hirt (highly repetitive and transcribable) sequences and proposed that each one of these repetitive families may have its own specific tRNA gene as its origin. Later, they renamed them PolIII/SINE (MATSUMOTO, MURAKAMI and OKADA 1986). These sequences include the rodent type 2 Alu family, rat identifier sequence, rabbit C family, bovine 75-base-pair repeat and others; many of them appear to be species specific. The number of copies of these repetitive families is often quite large, with $10^4$ or more copies per genome. Also, it seems that some repetitive families consist of uniform copies, and others consist of considerably varying members in a species (ENDOH and OKADA 1986).

These and other examples of repetitive families suggest that, occasionally, in each species a unique tRNA, 7SL RNA or other similar RNA gene segment has rapidly spread in the genomes of the species (ROGERS 1985a,b; WEINER, DEININGER and EFSTRATIADIS 1986). The original copy must have had a greater chance of duplication than of deletion in order to increase its copy number. In order to understand such processes, population genetics of expanding repetitive sequences becomes necessary. Most previous theoretical studies of repetitive genes treat the equilibrium situation in the sense of a stable copy number (for reviews, see OHTA 1980, 1983a; DOVER 1982; ARNHEIM 1983). Some of these models (OHTA 1985), however, may be extended to the expanding phase of repetitive sequences. In the following sections, theoretical and Monte Carlo simulation studies on expanding repetitive families will be presented. The results are applicable not only to the above-mentioned examples but also to various transposon families and other mobile elements.

THE MODEL AND THE INCREASE OF COPY NUMBER

Unlike previous models of repetitive genes or DNA (OHTA 1980, 1982, 1983a, 1984, 1985; NAGYLAKI 1984a,b; LANGLEY, BROOKFIELD and KAPLAN 1983; CHARLESWORTH and CHARLESWORTH 1983; SLATKIN 1985; HUDSON and KAPLAN 1986), the copy number is assumed here to increase with time. The change of copy number should be studied stochastically, but for simplicity, its increase is treated deterministically, whereas other quantities such as gene identity or allelism are studied stochastically.

Let us assume a random mating population of effective size $N$. A new family of transposons, retroposons or other mobile elements is introduced into the population. This family can increase its number, i.e., the probability of duplicative transposition is larger than that of deletion. In addition, the rate of duplicative transposition is assumed to be copy-number dependent; when the copy number is low the rate of transposition is high, but as it increases, the rate gets lower and becomes zero after the number reaches a certain value. Let $n_t$ be the copy number per genome (haploid set), which is a function of time, $t$. It is further assumed that, initially, the copy number is one in every genome in the population, i.e., $n_t = 1$ at $t = 0$, each occupying a different chromosomal site. With a constant low rate, any copy is assumed to be deleted and lost from the genome. In practice, the following function is used for the rate of duplicative transposition per copy per generation, $\lambda_t$:

$$\lambda_t = \begin{cases} a(n_b - n_t) & \text{for } n_t \le n_b \\ 0 & \text{for } n_t > n_b, \end{cases} \tag{1}$$

where $a$ is a positive constant, $n_b$ denotes the bounding copy number and $a n_b$ is much less than unity. Let $\delta$ be the rate of deletion per copy per generation. Then the change of $n_t$ in one generation becomes $\{a(n_b - n_t) - \delta\}n_t$, and $n_t$ increases with time according to the following formula:

$$n_t = \frac{\hat{n}}{1 + (\hat{n} - 1)e^{-a\hat{n}t}}, \tag{2}$$

where $t$ is measured as the number of generations, and $\hat{n}$ is the copy number at equilibrium and equal to $n_b - \delta/a$.
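The logistic growth in (2) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, and the parameter values in the demonstration are those quoted later for the simulation figures. It compares the closed form with a generation-by-generation iteration of the change $\{a(n_b - n_t) - \delta\}n_t$.

```python
import math

def copy_number(t, a, n_b, delta):
    """Closed-form copy number n_t of equation (2), starting from n_0 = 1."""
    n_hat = n_b - delta / a                      # equilibrium copy number
    return n_hat / (1.0 + (n_hat - 1.0) * math.exp(-a * n_hat * t))

def iterate_copy_number(generations, a, n_b, delta):
    """Discrete recursion n_{t+1} = n_t + {a(n_b - n_t) - delta} n_t."""
    n = 1.0
    path = [n]
    for _ in range(generations):
        lam = a * (n_b - n) if n <= n_b else 0.0  # transposition rate, equation (1)
        n += (lam - delta) * n
        path.append(n)
    return path

if __name__ == "__main__":
    a, n_b, delta = 0.01, 10, 0.001
    path = iterate_copy_number(150, a, n_b, delta)
    for t in (0, 50, 100, 150):
        print(t, round(copy_number(t, a, n_b, delta), 3), round(path[t], 3))
```

Both versions start at one copy and settle at $n_b - \delta/a$; the closed form is the continuous-time approximation of the recursion, so small differences during the growth phase are expected.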

The initial condition that $n_t = 1$ at $t = 0$ in every genome of the population is rather unrealistic. In this study, a unique sequence is considered as the origin for each repetitive DNA family, and a stochastic formulation is required for the behavior of the original copy introduced into the population. In other words, the initial state should be treated stochastically even if the increase of copy number may be satisfactorily described deterministically in a later stage. In the following, an approximate analysis is considered [by using my previous study (OHTA 1983b)] that gives a rough understanding of the spreading or loss of a unit newly appearing in one genome of the population. When the rate of duplication (designated as $\gamma_2$) is larger than that of deletion ($\gamma_0$), such that $2N\gamma_2 \gg 1$, the probability of spreading of a unit ($U$) becomes, for the diploid with almost free recombination, approximately

$$U = 2(\gamma_2 - \gamma_0); \tag{3}$$

in the present case, $\gamma_2$ is not a constant, but depends on the copy number in a genome. However, the probability of spreading is mostly determined by the initial few generations, so that the duplication rate at the beginning may be used in place of $\gamma_2$ of (3), i.e., $a(n_b - 1)$, the value of $\lambda_t$ at $n_t = 1$. Since $\gamma_0 = \delta$ in our notation, we have

$$U = 2\{a(n_b - 1) - \delta\} = 2(a n_b - a - \delta). \tag{4}$$

In addition, the expected copy number in the total population at the $g$th generation from the introduction ($m_g$) clearly follows the formula

$$m_g = e^{(a n_b - a - \delta)g}. \tag{5}$$


The average number in the population conditional on spreading, obtained by dividing the expected number (5) by the probability of spreading (4), becomes (OHTA 1983b)

$$m_{g,s} = \frac{m_g}{U} = \frac{e^{(a n_b - a - \delta)g}}{2(a n_b - a - \delta)}. \tag{6}$$

When this number reaches $2N$, one expects one copy per genome. The number of generations required to reach such a state is, by equating $m_{g,s} = 2N$,

$$g \approx \frac{1}{a n_b - a - \delta}\ln\{4N(a n_b - a - \delta)\}. \tag{7}$$

The deterministic formulation (2) starts from this state by putting $n_t = 1$ and $t = 0$. Actually, the assumption of one copy per genome, each occupying a different chromosomal site, does not hold, and the copy number per genome is considered to obey a Poisson distribution. In a later section, the effect of this assumption on predicting various quantities, such as allelism and identity coefficients, is examined by Monte Carlo simulation, together with the validity of a deterministic change of copy number (2).
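As a rough numerical illustration of (4) and (7), the sketch below evaluates the probability of spreading and the expected waiting time until one copy per genome. The function names are illustrative, and the parameter values are simply those quoted later for the simulation figures; with such a small population the condition $2N\gamma_2 \gg 1$ holds only loosely, so the numbers should be read as indicative.

```python
import math

def spread_probability(a, n_b, delta):
    """Approximate probability that a newly introduced unit spreads, equation (4)."""
    return 2.0 * (a * n_b - a - delta)

def generations_to_one_copy(N, a, n_b, delta):
    """Approximate generations until one copy per genome is expected, equation (7)."""
    r = a * n_b - a - delta          # net growth rate per copy while the element is rare
    return math.log(4.0 * N * r) / r

if __name__ == "__main__":
    N, a, n_b, delta = 10, 0.01, 10, 0.001
    print("U =", spread_probability(a, n_b, delta))
    print("g =", generations_to_one_copy(N, a, n_b, delta))
```

A quick consistency check is that $e^{rg}/U$ evaluated at this $g$ returns $2N$, which is the condition used to derive (7).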

ALLELISM AND IDENTITY COEFFICIENTS

For repetitive sequences that are evolving under duplicative transposition and gene conversion, the transitional equations of allelism and identity coefficients have been worked out under the assumption of constant copy number (OHTA 1985). The result is used here by changing the copy number per genome deterministically using (2). As before, let $v$ be the mutation rate per copy per generation under the infinite allele model (KIMURA and CROW 1964). Analogously, the number of chromosomal sites is so large that transposition always occurs to a new site. In this report, gene conversion is excluded. Allelism, that is, the probability that a random copy of a genome finds a homologous copy at the same chromosomal site of another genome of the population, is designated $F_t$, where $t$ is the $t$th generation. In case of free recombination between nonallelic units, as in this study, we need two identity coefficients: $f_t$ (allelic identity) and $C_t$ (nonallelic identity). Note that in my previous study (OHTA 1985) two coefficients ($C_1$ and $C_2$) are defined for nonallelic identity. My previous results reduce to the following transition equations of $F_t$, $f_t$ and $C_t$:

$$F_{t+1} = (1 - 2\lambda_t)F_t + \frac{1}{2N}(1 - F_t), \tag{8}$$

where $\lambda_t$ is the transposition rate at the $t$th generation, as given by (1),

$$f_{t+1} = (1 - 2v)f_t + \frac{1}{2NF_t}(1 - f_t). \tag{9}$$
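The recursions for allelism and allelic identity can be iterated directly once the copy number follows (2). The sketch below is an illustration under stated assumptions: it takes (8) as $F_{t+1} = (1 - 2\lambda_t)F_t + (1 - F_t)/2N$ and (9) as $f_{t+1} = (1 - 2v)f_t + (1 - f_t)/(2NF_t)$, starts from $F = 0$ and $f = 1$, and adds a guard of its own (not in the paper) that caps $1/(2NF_t)$ at one while $F_t$ is still below $1/2N$. The nonallelic recursion is not included.

```python
import math

def iterate_identity(generations, N, a, n_b, delta, v):
    """Iterate allelism F_t and allelic identity f_t along the deterministic n_t."""
    n_hat = n_b - delta / a

    def n_at(t):
        # Deterministic copy number, equation (2)
        return n_hat / (1.0 + (n_hat - 1.0) * math.exp(-a * n_hat * t))

    F, f = 0.0, 1.0                       # allelism zero, identity unity at t = 0
    history = [(0, n_at(0), F, f)]
    for t in range(generations):
        n_t = n_at(t)
        lam_t = a * (n_b - n_t) if n_t <= n_b else 0.0       # equation (1)
        # Guard (assumption of this sketch): keep the coefficient 1/(2N F_t) <= 1.
        coal = 1.0 if 2 * N * F <= 1.0 else 1.0 / (2 * N * F)
        F_next = (1 - 2 * lam_t) * F + (1 - F) / (2 * N)     # equation (8)
        f_next = (1 - 2 * v) * f + coal * (1 - f)            # equation (9)
        F, f = F_next, f_next
        history.append((t + 1, n_at(t + 1), F, f))
    return history

if __name__ == "__main__":
    for row in iterate_identity(150, 10, 0.01, 10, 0.001, 0.0025)[::50]:
        print(row)
```

After many generations the iteration settles at $F = 1/(1 + 4N\delta)$, because the transposition rate tends to $\delta$ once the copy number stops growing.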

As for the change of the nonallelic identity coefficient, the intra- and intergenome occurrences of transposition give almost the same result under the assumption of free recombination. Therefore, it is assumed here that transposition occurs within the genome ($w = 1$ in my previous notation).

[Equation (10), the transition equation for $C_t$, is missing from the extracted text.]

where $\alpha = 2\lambda_t/(n_t - 1)$, and

[The expression that followed is missing from the extracted text.]

Note that the above formulation is an approximation, particularly while $n_t$ is small, and the validity of this method is examined later by simulations.

At equilibrium, when allelic identity is close to unity, these equations provide

$$\hat{F} = \frac{1}{1 + 4N\delta}, \tag{11}$$

$$\hat{f} = \frac{1}{1 + 4N\hat{F}v}, \tag{12}$$

$$\hat{C} \approx \frac{1}{1 + 8Nv/\{\hat{\alpha}(1 + 4N\hat{F})\}}, \tag{13}$$

where a hat (^) over $F$, $f$ and $C$ denotes equilibrium values, and $\hat{\alpha} = 2\delta/(\hat{n} - 1)$. SLATKIN'S (1985) result on the identity coefficient for a transposon family is expressed as $1/(1 + 4N\hat{n}v)$ in the present notation. Our formula (13) converges to this value when $4N\delta \gg 1$, since $\hat{\alpha}(1 + 4N\hat{F}) \approx 2/\hat{n}$ and $8Nv/(2/\hat{n}) = 4N\hat{n}v$. BROOKFIELD'S (1985) results are also the same as ours, except that $f$ is unity in his formulation, and his formulas are expressed in terms of divergence time.
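The equilibrium expressions and the stated convergence to SLATKIN'S value can be checked with a few lines of arithmetic. The sketch below assumes the equilibrium values read $\hat{F} = 1/(1 + 4N\delta)$, $\hat{f} = 1/(1 + 4N\hat{F}v)$ and $\hat{C} \approx 1/[1 + 8Nv/\{\hat{\alpha}(1 + 4N\hat{F})\}]$ with $\hat{\alpha} = 2\delta/(\hat{n} - 1)$; the function names and the parameter values in the demonstration are hypothetical and chosen only so that $4N\delta \gg 1$.

```python
def equilibrium_values(N, a, n_b, delta, v):
    """Return (n_hat, F_hat, f_hat, C_hat) from the equilibrium expressions."""
    n_hat = n_b - delta / a
    F_hat = 1.0 / (1.0 + 4 * N * delta)
    f_hat = 1.0 / (1.0 + 4 * N * F_hat * v)
    alpha_hat = 2 * delta / (n_hat - 1)
    C_hat = 1.0 / (1.0 + 8 * N * v / (alpha_hat * (1 + 4 * N * F_hat)))
    return n_hat, F_hat, f_hat, C_hat

def slatkin_identity(N, n_hat, v):
    """Identity coefficient 1/(1 + 4 N n_hat v) in the present notation."""
    return 1.0 / (1.0 + 4 * N * n_hat * v)

if __name__ == "__main__":
    # Hypothetical values with 4*N*delta = 4000
    N, a, n_b, delta, v = 10**6, 1e-4, 1000, 1e-3, 1e-8
    n_hat, F_hat, f_hat, C_hat = equilibrium_values(N, a, n_b, delta, v)
    print(C_hat, slatkin_identity(N, n_hat, v))
```

With these values the two nonallelic identities agree to within a fraction of a percent, as expected when deletion is strong relative to drift and the copy number is large.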

By using (8) to (10), allelism and identity coefficients are calculated numerically from one generation to the next. It would be reasonable to assume that the initial rate of transposition is high, such that $2N\lambda_t \gg 1$. Then allelism is zero, and identity coefficients are unity at $t = 0$. Figure 1 presents an example showing how they change with time, together with the deterministic increase of copy number. Parameters are chosen so that the results are helpful for understanding the evolution of some highly repetitive sequences, such as Alu; $a = 5 \times$ [exponent illegible in source], $n_b = 10^4$ and $N = 10^4$. Thus, the equilibrium copy number is $\hat{n} = n_b - \delta/a = 9998$. Also shown in the figure are the values of allelism and identity coefficients at equilibrium. Several interesting features are revealed from the figure. Allelism gradually increases from zero to the equilibrium value, and identity decreases with time. A notable feature is that the copy number can increase quite rapidly depending on the rate of duplicative transposition, whereas allelism and identity coefficients take a long time to reach equilibrium. In particular, nonallelic identity is extremely slow to approach equilibrium.

FIGURE 1.-Copy number, allelism and identity coefficients of an expanding family of repetitive sequences are shown as functions of time (in units of $10^4$ generations). Their equilibrium values are also shown. Parameters are $a = 5 \times$ [exponent illegible in source], $\delta = v = 2 \times$ [exponent illegible in source], $N = 10^4$ and $\hat{n} = 9998$.

This feature is important for understanding the evolution of repetitive sequences; therefore, let us discuss its theoretical basis. From (8) to (10), the rate of approach to equilibrium may be obtained. In our case, the coefficients of the equations change with time because of the increase of copy number, but the final values are significant. For allelism, by denoting the final rate by $\eta$ with a subscript,

$$\eta_F = \frac{1}{2N} + 2\delta = \frac{1}{2N}(1 + \theta), \tag{14}$$

where $\theta = 4N\delta$. For identity coefficients,

$$\eta_f = \frac{1}{2N\hat{F}} + 2v = \frac{1}{2N}(1 + \theta + \theta_1), \tag{15}$$

and

$$\eta_C = \frac{2\delta}{\hat{n} - 1}\left(\frac{1}{4N} + \hat{F}\right) + 2v = \frac{1}{2N}\left[\frac{\theta}{\hat{n} - 1}\left(\frac{1}{4N} + \hat{F}\right) + \theta_1\right], \tag{16}$$

where $\hat{n}$ is the copy number at equilibrium and $\theta_1 = 4Nv$. Equations (14) and (15) tell us that allelism and allelic identity approach equilibrium values somewhat similarly, as does homozygosity at a single locus. On the other hand, the rate of approach to equilibrium of nonallelic identity depends heavily on the copy number and is very different from the single locus case. When $\hat{n}$ is large, the first term of the right-hand side of (16) is quite small, and one expects the following relationship:

$$2v \gg \frac{2\delta}{\hat{n} - 1}\left(\frac{1}{4N} + \hat{F}\right). \tag{17}$$


Then the rate becomes almost $2v$, which is simply the mutational pressure. If $v$ is measured per nucleotide site, as is often done in molecular evolution studies, $v$ takes a value of the order of $10^{-?}$ [exponent illegible in source] (see KIMURA 1983). In some cases, the first term of the right-hand side of (16) may be larger than $2v$; then it affects the rate of approach, even when $\hat{n}$ is large.

The rate of approach to equilibrium of nonallelic identity is usually quite small, particularly when $\hat{n}$ is large. The result may be important for understanding the high identity among copies of some highly repetitive families, such as Alu. It is most likely that these families represent some nonequilibrium state with respect to nonallelic identity. It is very easy to choose a set of reasonable values of parameters in which copy number, allelism and allelic identity are close to their equilibrium values while nonallelic identity is much higher than its final figure.
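To make the difference in time scales concrete, the sketch below evaluates the three final rates of approach, assuming they read $\eta_F = 1/2N + 2\delta$, $\eta_f = 1/(2N\hat{F}) + 2v$ and $\eta_C = \{2\delta/(\hat{n} - 1)\}(1/4N + \hat{F}) + 2v$. The function name is illustrative and the parameter values are hypothetical, chosen only to represent a large family in a large population; they are not the values used for Figure 1.

```python
def approach_rates(N, n_hat, delta, v):
    """Final rates of approach to equilibrium for F, f and C."""
    F_hat = 1.0 / (1.0 + 4 * N * delta)
    eta_F = 1.0 / (2 * N) + 2 * delta
    eta_f = 1.0 / (2 * N * F_hat) + 2 * v
    eta_C = 2 * delta / (n_hat - 1) * (1.0 / (4 * N) + F_hat) + 2 * v
    return eta_F, eta_f, eta_C

if __name__ == "__main__":
    # Hypothetical values: N = 10^4, n_hat = 10^4, delta = v = 10^-6
    eta_F, eta_f, eta_C = approach_rates(10**4, 10**4, 1e-6, 1e-6)
    for name, eta in (("F", eta_F), ("f", eta_f), ("C", eta_C)):
        print(name, "rate", eta, "time scale (generations)", 1.0 / eta)
```

With these values allelism and allelic identity relax on a scale of roughly $2N$ generations, whereas nonallelic identity relaxes on a scale close to $1/2v$, which is more than an order of magnitude longer.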

MONTE CARLO SIMULATIONS

Monte Carlo experiments were performed to verify the theoretical predictions developed above. At the same time, the variance of copy number among genomes in the population was examined. Each experiment starts with the introduction of a new element in one genome of the population. This element has a probability of duplicative transposition within the genome as given by (1) and that of deletion, $\delta$. When it transposes, it always moves to a new chromosomal site. In practice, the site is stored as an integer, and it increases by one with each transpositional event. The mutational state is stored for each element by another set of integers. Again, it increases by one with each mutational event, so that mutation occurs always to a new state. The mutational state never goes back; thus, it is the infinite allele model (KIMURA and CROW 1964).

One generation of the experiment consists of the following: mutation with a constant rate, $v$, per element, duplicative transposition, deletion, recombination and sampling. Each gamete stores its number of elements, and each one of them has two integers for mutational state and chromosomal site. In practice, transposition, deletion, recombination and sampling are done simultaneously. First, two genomes are sampled, and duplicative transposition and deletion are carried out for each of them according to the assigned probability. Next, sampling and recombination are carried out; for two allelic units of the two gametes, one of them is sampled, and for nonallelic units, each unit is independently sampled with a probability of 0.5. These processes of transposition, deletion, recombination and sampling are repeated twice, so that two daughter gametes are sampled from the two parental ones. This method is considered to mimic well the real process of gamete formation. The total processes are repeated $N$ times, and the number of daughter gametes becomes $2N$.
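The generation cycle described above can be sketched as follows. This is an illustration, not the original program: a gamete is represented here as a dictionary mapping an integer site to an integer mutational state, the two parental genomes are drawn without replacement, and all function names are hypothetical.

```python
import itertools
import random

def mutate(gamete, v, new_state, rng):
    """Each element mutates to a brand-new state with probability v."""
    return {site: (next(new_state) if rng.random() < v else state)
            for site, state in gamete.items()}

def transpose_and_delete(gamete, a, n_b, delta, new_site, rng):
    """Duplicative transposition (rate from equation (1)) and deletion."""
    n = len(gamete)
    lam = a * (n_b - n) if n <= n_b else 0.0
    result = {}
    for site, state in gamete.items():
        if rng.random() < lam:            # a copy is inserted at a new site
            result[next(new_site)] = state
        if rng.random() >= delta:         # the original copy escapes deletion
            result[site] = state
    return result

def make_daughter(g1, g2, rng):
    """Free recombination: one of two allelic units, or a lone unit with prob. 0.5."""
    daughter = {}
    for site in set(g1) | set(g2):
        if site in g1 and site in g2:
            daughter[site] = rng.choice((g1[site], g2[site]))
        elif rng.random() < 0.5:
            daughter[site] = g1.get(site, g2.get(site))
    return daughter

def next_generation(population, N, a, n_b, delta, v, new_site, new_state, rng):
    """Produce 2N daughter gametes from the current 2N gametes."""
    mutated = [mutate(g, v, new_state, rng) for g in population]
    offspring = []
    for _ in range(N):
        p1, p2 = rng.sample(mutated, 2)   # assumption: sampled without replacement
        for _ in range(2):                # two daughters per parental pair
            t1 = transpose_and_delete(p1, a, n_b, delta, new_site, rng)
            t2 = transpose_and_delete(p2, a, n_b, delta, new_site, rng)
            offspring.append(make_daughter(t1, t2, rng))
    return offspring

if __name__ == "__main__":
    rng = random.Random(0)
    N = 10
    population = [{0: 0}] + [{} for _ in range(2 * N - 1)]   # one element in one genome
    sites, states = itertools.count(1), itertools.count(1)
    for _ in range(50):
        population = next_generation(population, N, 0.01, 10, 0.001, 0.0025,
                                     sites, states, rng)
    print(sum(len(g) for g in population) / len(population))
```

In most runs started from a single element the family is lost within a few generations, which is consistent with the small probability of spreading given by (4).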

In each generation the mean and variance of copy number per gamete, allelism, allelic identity and nonallelic identity are calculated. When the copy number becomes zero, the element is lost from the population and the experiment ends. When it spreads into the population, the data may be used to check the theoretical prediction. In my theory, the copy number is assumed to be one per genome of the population at $t = 0$. Let us denote the number of generations to reach this state $g_1$. As shown already, a newly introduced element takes, on the average,

$$E(g_1) = g = \frac{\ln\{4N(a n_b - a - \delta)\}}{a n_b - a - \delta}$$

generations (7) to reach this state, where $E$ denotes expectation. This generation of the experiment is put to $t = 0$ for comparing theoretical and observed values.

FIGURE 2.-Comparison of the theoretical and simulation results for the change of copy number per genome (abscissa: time, 0 to 150 generations). Smooth curve represents theoretical prediction, and broken lines show two sample paths of Monte Carlo experiment. Parameters are $a = 0.01$, $\delta = 0.001$, $n_b = 10$, $v = 0.0025$ and $N = 10$.

Allelism and identity coefficients were calculated in the following manner. For every pair of genomes of the population [total of $2N(2N - 1)/2$], the number of allelic pairs of elements was counted. Allelic identity was examined for each allelic pair of elements by noting their mutational states. The number of nonallelic pairs of elements was also recorded, and identity was calculated for all such nonallelic pairs. All these counts were summed for all pairs of genomes of the population. Allelism was obtained by dividing the total count of allelic pairs of elements by $n \times 2N(2N - 1)/2$, where $n$ is the average copy number per genome. The allelic identity coefficient was found from the total allelic identity count divided by the total number of allelic pairs. Similarly, the nonallelic identity coefficient was obtained by dividing the total nonallelic identity count by the total number of nonallelic pairs of elements in the population.
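The counting procedure just described can be written compactly. The sketch below assumes each gamete is stored as a dictionary from chromosomal site to mutational state (a representation chosen for illustration) and counts nonallelic pairs only between different genomes, which is one reading of the description; the function name is hypothetical.

```python
from itertools import combinations

def summary_statistics(population):
    """Return (allelism, allelic identity, nonallelic identity) for a list of gametes."""
    genome_pairs = 0
    allelic = allelic_same = 0
    nonallelic = nonallelic_same = 0
    for g1, g2 in combinations(population, 2):     # all 2N(2N - 1)/2 genome pairs
        genome_pairs += 1
        for s1, m1 in g1.items():
            for s2, m2 in g2.items():
                if s1 == s2:                       # same site: allelic pair
                    allelic += 1
                    allelic_same += (m1 == m2)
                else:                              # different sites: nonallelic pair
                    nonallelic += 1
                    nonallelic_same += (m1 == m2)
    mean_n = sum(len(g) for g in population) / len(population)
    nan = float("nan")
    F = allelic / (mean_n * genome_pairs) if mean_n > 0 and genome_pairs else nan
    f = allelic_same / allelic if allelic else nan
    C = nonallelic_same / nonallelic if nonallelic else nan
    return F, f, C

if __name__ == "__main__":
    toy = [{1: 0, 2: 0}, {1: 0, 3: 1}, {2: 0}]
    print(summary_statistics(toy))
```

For the three-gamete toy population there are two allelic pairs, both identical, and six nonallelic pairs, three of them identical, with a mean copy number of 5/3.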

Figures 2-5 show such comparisons. Parameters used are $a = 0.01$, $\delta = 0.001$, $n_b = 10$, $v = 0.0025$ and $N = 10$. The abscissa is time in generations, and the smooth curves represent theoretical predictions. In each figure, two broken lines represent two sample paths where the element spreads. These paths represent the first two cases in which the element spread in the experiments. Thus, they represent randomly chosen paths. Although there are considerable fluctuations in allelism or identity coefficients, the agreements between the theoretical and observed values are remarkably good even for a randomly chosen sample path. I have performed simulations for many other sets of parameter values and have found that agreement is satisfactory. The largest disagreement was for the number of generations to reach the initial condition of one copy per genome, $g_1$. This is a random variable with considerable fluctuation. But if I adjust $g_1$ to make $t = 0$ at the time when the average copy number becomes 1 for each sample path, the agreements between the theoretical and observed values are very good for each sample path. The result would imply that each path reflects parameter values fairly faithfully. Of course, it is well known that allelism and identity coefficients have large variances, but if a path is traced over a long period of time, the theoretical prediction fits well.

FIGURE 3.-Comparison of the theoretical and simulation results for the change of allelism (abscissa: time, 0 to 150 generations). Broken lines are the result of the same sample paths of Monte Carlo experiment shown in Figure 2.

FIGURE 4.-Comparison of the theoretical prediction and the result of Monte Carlo experiment (same sample paths as shown in Figure 2) for the change of allelic identity (abscissa: time, 0 to 150 generations).

FIGURE 5.-Comparison of the theoretical prediction and the result of Monte Carlo experiment (same sample paths as shown in Figure 2) for the change of nonallelic identity (abscissa: time, 0 to 150 generations).

In the experiments, the variance of copy number among the genomes was also computed. According to CHARLESWORTH and CHARLESWORTH (1983), the variance of copy number may be expressed, under free recombination, as

$$\sigma_n^2 = n - \frac{n^2}{T} - T\sigma_x^2, \tag{18}$$

where $n$ is the average copy number per genome, $T$ is the total number of chromosomal sites that can be occupied by the element and $\sigma_x^2$ is the variance of frequencies of the element at occupiable sites in the population. Let us look at the quantity, $n^2/T + T\sigma_x^2$, the sum of the last two terms of this equation. Since $n = T\bar{x}$, this can be rewritten as

$$\frac{n^2}{T} + T\sigma_x^2 = T(\bar{x}^2 + \sigma_x^2) = TE(x^2) = \frac{nE(x^2)}{\bar{x}},$$

where $\bar{x}$ is the average frequency, and $E$ denotes expectation. $E(x^2)/\bar{x}$ is allelism in our notation; therefore, we have

$$\sigma_{n,t}^2 = n_t(1 - F_t). \tag{19}$$

I have checked the validity of (19) by my simulation data and have found that the agreement is satisfactory, although considerable fluctuations are observed. This formula tells us that, when allelism is close to zero, the variance is almost equal to the mean, i.e., that of a Poisson distribution, whereas when $F_t$ is large, the variance becomes correspondingly small.
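The relation between the variance of copy number and allelism is an algebraic identity once sites are treated as independent, and it can be verified on any set of site frequencies. The sketch below assumes free recombination (independent occupancy of sites) and takes allelism as $E(x^2)/\bar{x}$; the function names and the frequencies in the demonstration are hypothetical.

```python
def variance_from_site_frequencies(freqs):
    """Copy-number variance under independent sites: sum of x(1 - x) over sites."""
    return sum(x * (1.0 - x) for x in freqs)

def mean_and_allelism(freqs):
    """Mean copy number n = sum of x, and allelism F = sum of x^2 / sum of x."""
    n = sum(freqs)
    F = sum(x * x for x in freqs) / n
    return n, F

if __name__ == "__main__":
    freqs = [0.5, 0.2, 0.1, 1.0]          # hypothetical element frequencies per site
    n, F = mean_and_allelism(freqs)
    print(variance_from_site_frequencies(freqs), n * (1.0 - F))
```

Both printed numbers are 0.5 for this example: rare insertions contribute variance close to their frequency (the Poisson-like case), while the fixed site contributes none.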

DISCUSSION

Many families of highly repetitive DNA, retroposons, transposons and others appear to consist of members that are too uniform within species, as compared with the theoretical prediction under an equilibrium assumption (SLATKIN 1985; OHTA 1985; BROOKFIELD 1985; WEINER, DEININGER and EFSTRATIADIS 1986). The present analyses provide a way to make various predictions on an expanding family of repetitive sequences. Allelism and identity coefficients are quantities that can be estimated, and they should be useful for estimating parameter values based on the present theory. Of course, the model is only one possibility among many others. It is possible that the increase of copy number is not so rapid, but may take millions of years. Then we should modify (1), such that the rate of duplicative transposition is more mildly dependent on copy number. Or, it is possible that the initial copy had somehow gained the ability to transpose with high frequency, but that the ability gradually diminished through the accumulation of mutations in the copies [for the case of transposons, see DOVER (1986)]. The present model could also be modified for such cases. For example, equation (1) may be changed, such that $\lambda_t$ is an exponentially decreasing function

$$\lambda_t = \lambda_{t,0}e^{-vt},$$

where $v$ is the rate of defective mutation with respect to transposition. A numerical analysis is applicable for such cases by using the present method; however, the deterioration is likely to accumulate nonuniformly among the copies, and it is difficult to incorporate such an effect.

WALSH (1985) studied a different model for the accumulation of the processed pseudogenes from a multigene family. He considered a situation where the processed pseudogenes are recurrently created in the genome by normal (functional) genes, and insertion of new pseudogenes and deletion balance each other so that an equilibrium will be attained. The situation may apply to some retroposon families that have an RNA polymerase II site. He argued that, if the deletion rate is sufficiently large relative to the mutation rate, the pseudogenes will be highly homogeneous. A more precise formulation of this model can be given by using identity coefficients. Let the mutation rate per nucleotide site in terms of the nucleotide substitution of the normal gene be $v_n$, and let that of the pseudogene be $v_p$. Let us denote the identity probability of nucleotides between normal and pseudogenes, $C_{pn}$, and denote that among pseudogenes, $C_p$. The nonallelic identity coefficient of normal genes is $C_2$ as before (e.g., OHTA 1983a). Then, the equations for the changes of $C_p$ and $C_{pn}$ in one generation become, by assuming that the mutation is random among the four bases,

$$\Delta C_p = 2\lambda_p(C_{pn} - C_p) + \cdots$$

[The remainder of this equation and the accompanying equation for $\Delta C_{pn}$ are missing from the extracted text.]

where $\lambda_p$ is the insertion rate per copy as before, which is equal to the deletion rate at equilibrium. Identity coefficients at equilibrium become

$$\hat{C}_{pn} = \frac{\lambda_p\hat{C}_2 + V}{\lambda_p + 4V}, \tag{22}$$

$$\hat{C}_p = \frac{V_p}{\lambda_p + 4V_p} + \frac{\lambda_p^2\hat{C}_2 + \lambda_pV}{(\lambda_p + 4V_p)(\lambda_p + 4V)}, \tag{23}$$

where $V = (v_n + v_p)/3$, $V_p = v_p/3$ and $\hat{C}_2$ is the nonallelic identity coefficient of normal genes at equilibrium (e.g., OHTA 1983a, equation 10a). Equation (23) supports WALSH'S conclusion that, if $\lambda_p \gg v_p$, the identity coefficient is high. Note that, in WALSH'S model, pseudogenes themselves do not transpose duplicatively; thus, the formulation is simple.
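The equilibrium identities for the recurrent-insertion model can be explored numerically. The sketch below assumes they read $\hat{C}_{pn} = (\lambda_p\hat{C}_2 + V)/(\lambda_p + 4V)$ and $\hat{C}_p = V_p/(\lambda_p + 4V_p) + (\lambda_p^2\hat{C}_2 + \lambda_pV)/\{(\lambda_p + 4V_p)(\lambda_p + 4V)\}$, a reading reconstructed from the garbled display; the function name and the numerical values are hypothetical.

```python
def pseudogene_identities(lam_p, v_n, v_p, C2_hat):
    """Equilibrium identity between normal genes and pseudogenes, and among pseudogenes."""
    V = (v_n + v_p) / 3.0
    V_p = v_p / 3.0
    C_pn = (lam_p * C2_hat + V) / (lam_p + 4 * V)
    C_p = V_p / (lam_p + 4 * V_p) + (lam_p**2 * C2_hat + lam_p * V) / (
        (lam_p + 4 * V_p) * (lam_p + 4 * V))
    return C_pn, C_p

if __name__ == "__main__":
    # Insertion/deletion much faster than mutation: pseudogenes stay homogeneous
    print(pseudogene_identities(1e-4, 1e-9, 5e-9, 1.0))
    # No insertion at all: identity falls to the random level of 1/4
    print(pseudogene_identities(0.0, 1e-9, 5e-9, 1.0))
```

The two limits behave as expected under this reading: a large $\lambda_p$ relative to $v_p$ gives identity close to unity, and $\lambda_p = 0$ gives 1/4, the chance identity among four bases.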

Is it possible that recurrent insertion from normal or some other source genes contributes to the homogeneity of Alu and other repetitive families considered here? We do not know the answer, but our model can be modified to include such a process. Equations (8) to (10) are now changed to

$$F_{t+1} = (1 - 2\lambda_t - 2\lambda_{p,t})F_t + \frac{1}{2N}(1 - F_t), \tag{24}$$

where $\lambda_{p,t}$ is the insertion rate at the $t$th generation. We also need

[The accompanying equations are missing from the extracted text.]

where $C_{pn,t}$ is the identity coefficient between the normal gene and repetitive copies at time $t$, and mutation is assumed to occur under the infinite allele model (KIMURA and CROW 1964). Also, the copy number increases by the amount, $\lambda_{p,t}n_t$, at the $t$th generation, in addition to the change by duplicative transposition. Here $\lambda_{p,t}$ is measured per randomly chosen copy from the repetitive family, and it could be extremely small when the copy number becomes large. Recurrent insertion from the normal gene copies may then be negligible.

In addition to the above extension of the present analysis, gene conversion can be brought into the model if necessary. My previous transition formulas for $f$ and two nonallelic identity coefficients ($C_1$ and $C_2$, OHTA 1985) will be useful. Thus, the present method is applicable to many cases, but difficulties arise when the assumption of a simple deterministic increase of copy number is violated.

In some families of transposons, the process of duplicative transposition is very intricate, and the transposition rate appears to depend much upon background factors (KIDWELL 1979; ENGELS 1979; SCHWARZ-SOMMER et al. 1985). UYENOYAMA (1985) studied a realistic but complicated model for the P-M system of Drosophila. The analysis becomes tedious even for formulating the single-locus transition from M to P type, because of interactions among transposition, cytotype effect, maternal or offspring control and natural selection. It is impossible to incorporate these factors into the present analyses. On the other hand, MUKAI et al. (1985) consider that the spread of a P-like factor in a Japanese Drosophila population can be described by a simple function. KAPLAN, DARDEN and LANGLEY (1985) considered a model in which there are two types of elements, wild type and mutant, in a transposon family. In their model, the transposition rate is copy-number dependent as in my model, and in addition, the mutant type is assumed to be less efficient for transposition than is the wild type. Mutation is unidirectional, from the wild type to the mutant, i.e., a kind of deterioration of the element. In such a model, the transposon family will be lost after a very long period of time. It would be desirable to study their model with the present method, although the analysis would be very complicated.

Mammalian and Drosophila dispersed repetitive sequences appear to differ with respect to allelism. It is usually high in the former and low in the latter (MONTGOMERY and LANGLEY 1983), although Hawaiian Drosophila seem to be an exception (HUNT, BISHOP and CARSON 1984). It is plausible that practically all transpositional events are selectively deleterious in Drosophila genomes, whereas in mammals, some transpositional events are not deleterious (i.e., selectively neutral or advantageous), and the frequencies in the population can increase; that is, allelism becomes high. It should be noted here that, even though allelism is high and may be close to the equilibrium value in many retroposon families of mammals, nonallelic identity is still far from the equilibrium value (see Figure 1). In this sense, these families are in a state of nonequilibrium. The present model treats only selectively neutral transposition and mutation, and the effect of natural selection is left to a future study.

I thank B. S. WEIR, M. SLATKIN, H. TACHIDA and two anonymous referees for their many valuable suggestions and comments for improving the presentation. I also thank N. OKADA for his stimulating discussions. This work is supported by a Grant-in-Aid from the Ministry of Education, Science and Culture of Japan. Contribution 1674 from the National Institute of Genetics, Mishima, 411 Japan.

LITERATURE CITED

ARNHEIM, N., 1983 Concerted evolution of multigene families. pp. 38-61. In: Evolution of Genes and Proteins, Edited by M. NEI and K. KOEHN. Sinauer Associates, Sunderland, Massachusetts.

BRITTEN, R. J. and E. H. DAVIDSON, 1969 Gene regulation for higher cells: a theory. Science 165: 349-357.

BROOKFIELD, J. F. Y., 1985 A model for DNA sequence evolution within a transposable element family. Genetics 112: 393-408.

CHARLESWORTH, B. and D. CHARLESWORTH, 1983 The population dynamics of transposable elements. Genet. Res. 42: 1-28.

DANIELS, G. R. and P. L. DEININGER, 1985 Several major mammalian SINE families are derived from tRNA genes. Nature 317: 819-822.

DOOLITTLE, W. F. and C. SAPIENZA, 1980 Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

DOVER, G. A., 1982 Molecular drive: a cohesive mode of species evolution. Nature 299: 111-117.

DOVER, G. A., 1986 What drives new functions? In: Evolutionary Processes and Theory, Edited by S. KARLIN and E. NEVO. Academic Press, New York.

ENDOH, H. and N. OKADA, 1986 Total DNA transcription in vitro: a novel procedure to detect highly repetitive and transcribable sequences with tRNA-like structures. Proc. Natl. Acad. Sci. USA. In press.

ENGELS, W. R., 1979 Hybrid dysgenesis in Drosophila melanogaster: rules of inheritance of female sterility. Genet. Res. 33: 219-236.

HUDSON, R. R. and N. L. KAPLAN, 1986 On the divergence of a transposable element family. J. Math. Biol. In press.

HUNT, J. A., J. G. BISHOP, III and H. L. CARSON, 1984 Chromosomal mapping of a middle-repetitive DNA sequence in a cluster of five species of Hawaiian Drosophila. Proc. Natl. Acad. Sci. USA 81: 7146-7150.

KAPLAN, N., T. DARDEN and C. H. LANGLEY, 1985 Evolution and extinction of transposable elements in Mendelian populations. Genetics 109: 459-480.


KIDWELL, M. G., 1979 Hybrid dysgenesis in D. melanogaster: the relationship between the P-M and I-R interaction systems. Genet. Res. 33: 205-217.

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, England.

KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738.

LANGLEY, C. H., J. F. Y. BROOKFIELD and N. KAPLAN, 1983 Transposable elements in Mendelian populations. I. A theory. Genetics 104: 457-471.

MATSUMOTO, K., K. MURAKAMI and N. OKADA, 1986 Gene for lysine tRNA1 as a progenitor of the highly repetitive and transcribable sequences present in the salmon genome. Proc. Natl. Acad. Sci. USA. In press.

MONTGOMERY, E. A. and C. H. LANGLEY, 1983 Transposable elements in Mendelian populations. II. Distribution of copia-like elements in natural populations. Genetics 104: 473-483.

MUKAI, T., M. BABA, M. AKIYAMA, N. UOWAKI, S. KUSAKABE and F. TAJIMA, 1985 Rapid change in mutation rate in a local population of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 82: 7671-7675.

NAGYLAKI, T., 1984a The evolution of multigene families under intrachromosomal gene conversion. Genetics 106: 529-548.

NAGYLAKI, T., 1984b Evolution of multigene families under interchromosomal gene conversion. Proc. Natl. Acad. Sci. USA 81: 3796-3800.

OHTA, T., 1980 Evolution and Variation of Multigene Families: Lecture Notes in Biomathematics, Vol. 37. Springer, New York.

OHTA, T., 1982 Allelic and nonallelic homology of a supergene family. Proc. Natl. Acad. Sci. USA 79: 3251-3254.

OHTA, T., 1983a On the evolution of multigene families. Theor. Pop. Biol. 23: 216-240.

OHTA, T., 1983b Theoretical study on the accumulation of selfish DNA. Genet. Res. 41: 1-15.

OHTA, T., 1984 Some models of gene conversion for treating the evolution of multigene families. Genetics 106: 517-528.

OHTA, T., 1985 A model of duplicative transposition and gene conversion for repetitive DNA families. Genetics 110: 513-524.

ORGEL, L. E. and F. H. C. CRICK, 1980 Selfish DNA: the ultimate parasite. Nature 284: 604-607.

ROGERS, J. H., 1985a The structure and evolution of retroposons. pp. 231-279. International Review of Cytology, Vol. 93: Genome Evolution in Prokaryotes and Eukaryotes, Edited by D. C. REANNEY and P. CHAMBON. Academic Press, New York.

ROGERS, J. H., 1985b Origins of repeated DNA. Nature 317: 765-766.

SAKAMOTO, K. and N. OKADA, 1985 Rodent type 2 Alu family, rat identifier sequence, rabbit C family and bovine or goat 73 bp repeat may have evolved from tRNA genes. J. Mol. Evol. 22: 134-140.

SCHWARZ-SOMMER, Z., A. GIERL, H. CUYPERS, P. A. PETERSON and H. SAEDLER, 1985 Plant transposable elements generate the DNA sequence diversity needed in evolution. EMBO J. 4: 591-597.

SHAPIRO, J. A. (Editor), 1983 Mobile Genetic Elements. Academic Press, New York.

SINGER, M. F., 1982 SINES and LINES: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28: 433-434.

SLATKIN, M., 1985 Genetic differentiation of transposable elements under mutation and unbiased gene conversion. Genetics 110: 461-468.


UYENOYAMA, M. K., 1985 Quantitative models of hybrid dysgenesis: rapid evolution under transposition, extrachromosomal inheritance, and fertility selection. Theor. Pop. Biol. 27: 176-201.

WALSH, J. B., 1985 How many processed pseudogenes are accumulated in a gene family? Genetics 110: 345-364.

WEINER, A. M., P. L. DEININGER and A. EFSTRATIADIS, 1986 Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. In press.

Communicating editor: B. S. WEIR
