
STATISTICAL STUDIES ON PROTEIN POLYMORPHISM IN NATURAL POPULATIONS. III. DISTRIBUTION OF ALLELE FREQUENCIES AND THE NUMBER OF ALLELES PER LOCUS

RANAJIT CHAKRABORTY, PAUL A. FUERST AND MASATOSHI NEI

Center for Demographic and Population Genetics, University of Texas at Houston, Texas 77025

Manuscript received July 10, 1979
Revised copy received November 9, 1979

ABSTRACT

With the aim of understanding the mechanism of maintenance of protein polymorphism, we have studied the properties of allele frequency distribution and the number of alleles per locus, using gene-frequency data from a wide range of organisms (mammals, birds, reptiles, amphibians, Drosophila and non-Drosophila invertebrates) in which 20 or more loci with at least 100 genes were sampled. The observed distribution of allele frequencies was U-shaped in all of the 138 populations (mostly species or subspecies) examined and generally agreed with the theoretical distribution expected under the mutation-drift hypothesis, though there was a significant excess of rare alleles (gene frequency, 0 - 0.05) in about a quarter of the populations. The agreement between the mutation-drift theory and observed data was quite satisfactory for the numbers of polymorphic (gene frequency, 0.05 - 0.95) and monomorphic (0.95 - 1.0) alleles. The observed pattern of allele-frequency distribution was incompatible with the prediction from the overdominance hypothesis. The observed correlations of the numbers of rare alleles, polymorphic alleles and monomorphic alleles with heterozygosity were of the order of magnitude expected under the mutation-drift hypothesis. Our results did not support the view that intracistronic recombination is an important source of genetic variation. The total number of alleles per locus was positively correlated with molecular weight in most of the species examined, and the magnitude of the correlation was consistent with the theoretical prediction from the mutation-drift hypothesis. The correlation between molecular weight and the number of alleles was generally higher than the correlation between molecular weight and heterozygosity, as expected.

The purpose of this series of papers is to study the mechanism of maintenance of protein polymorphism in natural populations from the statistical point of view. We are particularly interested in testing the neutral mutation or mutation-drift hypothesis as a statistical null hypothesis. In the first paper (FUERST, CHAKRABORTY and NEI 1977), we examined various properties of heterozygosity, particularly the relationship between the mean and variance of heterozygosity among different loci or different species, whereas in the second paper (CHAKRABORTY, FUERST and NEI 1978), we studied the pattern of gene differentiation between populations in terms of the mean and variance of genetic distance and the correlation of heterozygosity between populations. In this paper, we report

Genetics 94: 1039-1063 April, 1980.


1040 R. CHAKRABORTY, P. A. FUERST AND M. NEI

on the properties of allele-frequency distributions and number of alleles per locus. Our interest is in knowing whether or not the observed pattern of these quantities agrees with the prediction from the mutation-drift hypothesis. It is known that, unlike heterozygosity and genetic distance, the number of alleles per locus in a sample is strongly affected by sample size. Therefore, a rather careful study is necessary for these quantities. We also investigated whether or not the number of alleles per locus is correlated with the subunit molecular weight of the protein encoded.

MATERIALS AND METHODS

As in our previous papers, gene-frequency data for protein loci were obtained by surveying the literature. We used only those species or populations in which at least twenty loci were studied with a sample size of 100 genes or more. Occasionally, one species comprised a number of local populations. In this case, cluster analysis was conducted using NEI’S (1972) genetic-distance measure, and when a pair of groups of local populations showed an average genetic distance of 0.05 or less, they were regarded as the same population. This was done because the number of genes sampled from a local population was often small. It is known that the effect of population subdivision on the distribution of allele frequencies under the mutation-drift hypothesis is small unless the migration rate between local populations is extremely small (EWENS and GILLESPIE 1974). In practice, the genetic distance between local populations was close to 0.05 in only a small proportion of the populations used. In all, this procedure yielded 138 populations. The genetic distance between local populations within a species is generally lower than 0.05 (NEI 1975); therefore, our populations generally represent species or subspecies.
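The merging rule above rests on NEI’S (1972) standard genetic distance, D = -ln(J_XY/√(J_X J_Y)), where J_X, J_Y and J_XY are the averages over loci of Σx_i², Σy_i² and Σx_iy_i. A minimal Python sketch of the distance and of the 0.05 threshold, assuming allele frequencies are stored per locus as dictionaries; the data layout and the function names are illustrative only, and the clustering algorithm itself is not specified in the text and is not reproduced here.

```python
import math
from typing import Dict, List

Locus = Dict[str, float]   # allele label -> frequency at one locus (illustrative layout)
Population = List[Locus]   # one entry per locus, same locus order in every population

def nei_distance(pop_x: Population, pop_y: Population) -> float:
    """NEI's (1972) standard distance D = -ln(J_XY / sqrt(J_X * J_Y))."""
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must be scored at the same, non-empty set of loci")
    jx = jy = jxy = 0.0
    for lx, ly in zip(pop_x, pop_y):
        alleles = set(lx) | set(ly)
        jx += sum(lx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(ly.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(lx.get(a, 0.0) * ly.get(a, 0.0) for a in alleles)
    n_loci = len(pop_x)
    JX, JY, JXY = jx / n_loci, jy / n_loci, jxy / n_loci  # averages over loci
    if JXY == 0.0:
        return math.inf  # no allele shared at any locus
    return -math.log(JXY / math.sqrt(JX * JY))

def same_population(pop_x: Population, pop_y: Population, threshold: float = 0.05) -> bool:
    """Pairwise version of the merging rule: treat the groups as one if D <= threshold."""
    return nei_distance(pop_x, pop_y) <= threshold

north = [{"A": 0.9, "B": 0.1}, {"F": 1.0}]
south = [{"A": 0.8, "B": 0.2}, {"F": 1.0}]
print(nei_distance(north, south), same_population(north, south))
```

For these two hypothetical populations the distance is about 0.005, well under 0.05, so they would be pooled. The text applies the criterion to the average distance between groups formed during cluster analysis; the pairwise test here is a simplification.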

We have included 36 populations of mammals (26 species: AVISE, SMITH and SELANDER 1974; AVISE et al. 1974; SELANDER et al. 1971; SMITH, SELANDER and JOHNSON 1973; BROWNE 1977; KILPATRICK and ZIMMERMAN 1976; NEVO et al. 1974; GILL 1976; SELANDER, HUNT and YANG 1969; SELANDER and YANG 1969; KIM 1972; JOHNSON et al. 1972; PATTON, SELANDER and SMITH 1972; PATTON, YANG and MYERS 1975; PATTON and YANG 1977; GREENBAUM and BAKER 1976; GLOVER et al. 1977; MANWELL and BAKER 1977; NOZAWA et al. 1977; SHOTAKE, NOZAWA and TANABE 1977; MCDERMID, VOS and DOWNING 1973; HARRIS, HOPKINSON and ROBSON 1973), two populations of birds (two species: LUCOTTE and KAMINSKI 1976; BAKER 1974), 18 populations of reptiles (14 species: GORMAN and KIM 1975; GORMAN, unpublished; TAYLOR and GORMAN 1975; GORMAN et al. 1975; PARKER and SELANDER 1976; WEBSTER, SELANDER and YANG 1972; WEBSTER 1975; GARTSIDE, DESSAUER and JOANEN 1977; GARTSIDE, ROGERS and DESSAUER 1977), 21 populations of amphibians (14 species: LARSON and HIGHTON 1978; DUNCAN and HIGHTON 1979; HIGHTON and WEBSTER 1976; HEDGECOCK 1976, 1978; MERKLE, GUTTMAN and NICKERSON 1977; NEVO 1976; NEVO, DESSAUER and CHUANG 1975), 20 populations of fish (18 species: WARD and BEARDMORE 1977; JOHNSON and UTTER 1976; MITTON and KOEHN 1975; JOHNSON, UTTER and HODGINS 1973; JOHNSON 1975; LYNCH and VYSE 1979; VRIJENHOEK, ANGUS and SCHULTZ 1977; MERRITT, ROGERS and KURZ 1978; AVISE and AYALA 1976; SAGE and SELANDER 1975; FRYDENBERG and SIMONSEN 1973), 25 populations of Drosophila (23 species: MARINKOVIC, AYALA and ANDJELKOVIC 1978; ZOUROS et al. 1974; BARKER and MULLEY 1976; SENE and CARSON 1977; AYALA and TRACEY 1974; AYALA et al. 1974a; SAURA 1974; BAND 1975; PRAKASH 1973a, b, 1977a, b, c; YANG, WHEELER and BOCK 1972; LAKOVAARA and SAURA 1971; TEMPLETON, CARSON and SING 1976; STEINER 1974), three populations of other insects (three species: KNOPF 1977; SAURA, HALKKA and LOKKI 1973), eight populations of crustaceans (eight species: TRACEY et al. 1975; NEMETH and TRACEY 1979; LESTER 1979; SELANDER et al. 1970; AYALA, VALENTINE and ZUMWALT 1975) and five populations of other invertebrates (five species: AYALA et al. 1973, 1974b, 1975; JARVINEN et al. 1976; AHMAD, SKIBINSKI and BEARDMORE 1977).


ALLELE FREQUENCY DISTRIBUTIONS 1041

DATA ANALYSIS

General pattern of allele frequency distributions

OHTA (1976) examined whether the allele-frequency distributions for protein loci in the Drosophila willistoni group and man agreed with the theoretical distributions from the mutation-drift theory. She found an excess of rare alleles compared with the theoretical expectation and took this as evidence for her hypothesis that slightly deleterious mutations play an important role in determining the level of protein polymorphism. A similar conclusion was reached by LATTER (1976) in his analysis of the relationship between heterozygosity and gene frequency in the Drosophila willistoni group species. Their studies, however, should be reexamined, since they did not take into account the variation of mutation rate among loci, which apparently cannot be neglected (NEI, FUERST and CHAKRABORTY 1976; ZOUROS 1979). Furthermore, it is of interest to know whether or not the excess of rare alleles occurs in other groups of organisms.

Theory: The theoretical distribution of allele frequencies for a single locus was studied by WRIGHT (1949) and KIMURA and CROW (1964) using the so-called infinite-allele model. They showed that the expected number of alleles whose frequency is between x and x + dx is given by

$$\phi(x)\,dx = 4Nv\,(1-x)^{4Nv-1}\,x^{-1}\,dx, \qquad (1)$$

where N is the effective population size and v is the mutation rate per locus per generation. This model applies to a collection of alleles obtained from different loci if the mutation rate is the same for all loci. We therefore call it the infinite-allele model with constant mutation rate (IC model). In practice, however, the mutation rate varies from locus to locus. NEI, CHAKRABORTY and FUERST (1976) therefore developed the so-called infinite-allele model with varying mutation rate (IV model). In this model, M = 4Nv is assumed to vary among loci and to follow the gamma distribution

$$f(M) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,e^{-\beta M}\,M^{\alpha-1}, \qquad (2)$$

where α = M̄²/V_M and β = M̄/V_M, in which M̄ and V_M are the mean and variance of M, respectively. Averaging (1) over (2), the formula for φ(x) becomes

$$\phi(x) = \frac{\alpha\,\beta^{\alpha}}{x(1-x)\,[\beta-\ln(1-x)]^{\alpha+1}}. \qquad (3)$$

Formulas (1) and (3) give the expected number of alleles in the population. The number of alleles in a sample, particularly that of rare alleles, is known to be affected by sample size (EWENS 1972). The expected number of alleles whose frequency is between p and q in a sample of n genes is given by


where np and nq are integers. If we use φ(x) in (1), n(p,q) becomes

In the case of varying mutation rate, n(p,q) can be obtained by numerical integrations.
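For the IC model, n(p,q) sums, over the sample counts i in the class, C(n,i)∫x^i(1 - x)^(n-i)φ(x)dx with φ(x) from (1); that integral reduces to the familiar EWENS-type expression E[a_i] = (M/i)∏_{j=0}^{i-1}(n - j)/(M + n - 1 - j) for the expected number of alleles represented exactly i times. A minimal Python sketch under two assumptions of ours: an allele with sample count i is placed in the class p ≤ i/n < q (with i = n placed in a class ending at 1.0), since the text does not state the boundary convention; and the IV average over the gamma distribution (2) uses a simple midpoint rule, which is only one of many possible numerical integrations.

```python
import math
from typing import List

def expected_counts_ic(n: int, M: float) -> List[float]:
    """E[a_i], i = 1..n: expected number of alleles seen exactly i times in a
    sample of n genes under the IC model (index 0 unused)."""
    out = [0.0] * (n + 1)
    c = 1.0
    for i in range(1, n + 1):
        c *= (n - i + 1) / (M + n - i)  # running product over j = 0..i-1
        out[i] = M * c / i
    return out

def in_class(i: int, n: int, p: float, q: float) -> bool:
    """Assumed convention: p <= i/n < q, plus i = n for a class ending at 1.0."""
    return p <= i / n < q or (q >= 1.0 and i == n)

def n_pq_ic(n: int, p: float, q: float, M: float) -> float:
    """Expected number of alleles with sample frequency in the class [p, q)."""
    a = expected_counts_ic(n, M)
    return sum(a[i] for i in range(1, n + 1) if in_class(i, n, p, q))

def gamma_pdf(M: float, alpha: float, beta: float) -> float:
    """Gamma density of eq. (2), computed in log space to avoid overflow."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    - beta * M + (alpha - 1.0) * math.log(M))

def n_pq_iv(n: int, p: float, q: float, M_mean: float, V_M: float,
            grid: int = 2000) -> float:
    """IV model: n_pq_ic averaged over the gamma distribution of M (midpoint rule)."""
    alpha, beta = M_mean ** 2 / V_M, M_mean / V_M
    upper = M_mean + 40.0 * math.sqrt(V_M)  # the tail beyond this is negligible
    h = upper / grid
    total = 0.0
    for k in range(grid):
        M = (k + 0.5) * h
        total += h * gamma_pdf(M, alpha, beta) * n_pq_ic(n, p, q, M)
    return total

# Rare (< 0.05) and common alleles per locus for n = 100, IV model with alpha = 1
print(n_pq_iv(100, 0.0, 0.05, 0.1, 0.01, grid=500),
      n_pq_iv(100, 0.05, 1.0, 0.1, 0.01, grid=500))
```

Summing E[a_i] over all i recovers the EWENS (1972) expected number of alleles, Σ_{k=0}^{n-1} M/(M + k), which is a convenient check on the recursion.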

One might argue that OHTA and KIMURA’S (1973) stepwise mutation model is more appropriate to current protein data than is the infinite-allele model. Although recent studies on the detectability of hemoglobin variation by electrophoresis do not necessarily support this view (FUERST and FERRELL 1980; RAMSHAW, COYNE and LEWONTIN 1979), it is worth examining this model, since the true situation probably lies somewhere between the two models. The theoretical distribution of allele frequencies for the stepwise mutation model with constant mutation rate (SC model) was studied by KIMURA and OHTA (1975, 1978) for a fixed value of 4Nv. This can be extended to the case where the mutation rate varies according to the gamma distribution (SV model), and n(p,q) may then be computed (APPENDIX I).

In the case of neutral mutations, the shape of the allele-frequency distribution pooled over many loci is determined by M̄ and α. Our earlier studies (NEI, CHAKRABORTY and FUERST 1976; FUERST, CHAKRABORTY and NEI 1977) and ZOUROS’ (1979) study have suggested that α is approximately 1 under the null hypothesis of neutral mutations. In this case, the allele frequency distribution is U-shaped as long as the average heterozygosity is lower than 0.3, which is the largest value observed so far in natural populations (FUERST, CHAKRABORTY and NEI 1977). However, the absolute value of φ(x)dx varies considerably with the level of average heterozygosity. Recently, LI (1978) studied the theoretical distributions of allele frequencies for slightly deleterious and overdominant alleles. His results indicate that if overdominant alleles predominate, the distribution will have a mode at intermediate gene frequency, whereas in the case of slightly deleterious mutations, a mode tends to occur between 0.9 and 1.0. Therefore, it is possible to detect certain types of selection by examining the distribution.
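The population-level densities can be evaluated directly to see the U shape discussed above. A minimal Python sketch, assuming α = 1 by default as the text suggests; the closed form used for the IV model is our own evaluation of the integral of (1) against the gamma density (2), and the sketch checks it against direct numerical averaging over M. Function names are illustrative.

```python
import math

def phi_ic(x: float, M: float) -> float:
    """Eq. (1), IC model: expected number of alleles per unit frequency at x."""
    return M * (1.0 - x) ** (M - 1.0) / x

def phi_iv(x: float, M_mean: float, alpha: float = 1.0) -> float:
    """IV model in closed form, with beta = alpha / M_mean (so V_M = M_mean**2 / alpha):
    alpha * beta**alpha / (x (1 - x) (beta - ln(1 - x))**(alpha + 1))."""
    beta = alpha / M_mean
    return alpha * beta ** alpha / (
        x * (1.0 - x) * (beta - math.log(1.0 - x)) ** (alpha + 1.0))

def phi_iv_numeric(x: float, M_mean: float, alpha: float = 1.0,
                   steps: int = 20000) -> float:
    """Same quantity by averaging phi_ic over the gamma density (midpoint rule)."""
    beta = alpha / M_mean
    upper = M_mean + 40.0 * M_mean / math.sqrt(alpha)  # mean + 40 standard deviations
    h = upper / steps
    total = 0.0
    for k in range(steps):
        M = (k + 0.5) * h
        dens = math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                        - beta * M + (alpha - 1.0) * math.log(M))
        total += h * dens * phi_ic(x, M)
    return total

# High at both ends, low in the middle: the U shape
for x in (0.01, 0.5, 0.99):
    print(x, phi_iv(x, 0.1), phi_ic(x, 0.1))
```

As a further check on (1), ∫x²φ(x)dx = 1/(1 + M) is the expected homozygosity, so the IC model gives an expected heterozygosity of M/(1 + M).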

In this connection, it should be noted that the above studies are based on the assumptions of equilibrium conditions. NEI and LI (1976) studied the distribution of neutral alleles in nonequilibrium populations in terms of the infinite-allele model. Their results show that when a population is newly expanded from a relatively small population, a mode near gene frequency 0.9 may arise temporarily. Therefore, it is difficult to distinguish the case of slightly deleterious mutations from that of an expanded population.

Results: The shape of the allele frequency distribution was studied for each of the 138 populations. Alleles were allocated by frequency to the following 14 classes: 0.0 - 0.01, 0.01 - 0.05, 0.05 - 0.1, 0.1 - 0.2, 0.2 - 0.3, 0.3 - 0.4, 0.4 - 0.5, 0.5 - 0.6, 0.6 - 0.7, 0.7 - 0.8, 0.8 - 0.9, 0.90 - 0.95, 0.95 - 0.99 and 0.99 - 1.0. The expected numbers of alleles in these frequency classes were obtained for the four models, namely the infinite-allele and stepwise mutation models with constant and varying mutation rates (IC, IV, SC and SV models). These expected numbers depend on the value of M or M̄. Therefore, M or M̄, depending on the model used, was estimated for each population by the method given in APPENDIX II. The observed and expected numbers were then compared for each frequency interval.
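The allocation of alleles to the 14 frequency classes can be sketched in Python as follows. The boundary convention is our assumption, since the text does not state it: each class includes its lower bound and excludes its upper bound (so that a rare allele has frequency below 0.05, matching the definition used later), and a frequency of exactly 1.0 goes into the top class.

```python
import bisect
from typing import List, Sequence, Tuple

# Upper bounds of the 14 allele-frequency classes listed in the text
UPPER = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1.0]

def class_index(freq: float) -> int:
    """Index 0..13 of the class holding freq (assumed [lower, upper), 1.0 in the top class)."""
    if not 0.0 < freq <= 1.0:
        raise ValueError("allele frequency must lie in (0, 1]")
    return min(bisect.bisect_right(UPPER, freq), len(UPPER) - 1)

def tally(loci: Sequence[Sequence[float]]) -> List[int]:
    """Pool the alleles of all loci (each locus given as its allele frequencies)."""
    counts = [0] * len(UPPER)
    for freqs in loci:
        for f in freqs:
            counts[class_index(f)] += 1
    return counts

def rare_and_common(counts: Sequence[int]) -> Tuple[int, int]:
    """Rare = frequency below 0.05 (first two classes); common = all others."""
    return sum(counts[:2]), sum(counts[2:])

# Three hypothetical loci: one monomorphic, one with two rare alleles, one polymorphic
loci = [[1.0], [0.97, 0.02, 0.01], [0.6, 0.4]]
counts = tally(loci)
print(counts, rare_and_common(counts))
```

The observed counts per class would then be set against the expected n(p,q) values for the same class boundaries.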

The observed allele frequency distribution was U-shaped in all of the 138 populations. Some representative examples are presented in Figure 1. In this figure, the three frequency classes between 0.9 and 1.0 are pooled together, but the three classes between 0 and 0.1 remain separate. The theoretical distribution given in this figure is that for the IV model. In practice, all four models mentioned above showed a U-shaped distribution, but the IV model showed the best fit to the data in most populations, as will be discussed later. It is clear that the pattern of the distribution varies according to the level of average heterozygosity. If this level is low, the allele frequency is either high or low, with few intermediate-frequency alleles. On the other hand, if the average heterozygosity is high, there are many intermediate-frequency alleles, though the distribution is still U-shaped.

FIGURE 1.-Observed and expected distributions of allele frequencies in four species representing different average heterozygosity values: (a) Macaca fuscata (20 loci, average heterozygosity H = 0.018, n = 1976), (b) Taricha rivularis (37 loci, H = 0.077, n = 784), (c) Zoarces viviparus (32 loci, H = 0.102, n = 757), and (d) Drosophila heteroneura (25 loci, H = 0.162, n = 605). The observed distributions are represented by solid columns and the expected (IV model) distributions by slashed columns. The abscissa gives allele-frequency classes and the ordinate the number of alleles.

The agreement between the theoretical and observed numbers of alleles is quite good, except in the low-frequency classes. Indeed, if we exclude rare alleles, the agreement is very good in most of the 138 populations, as will be seen below. However, in a few populations, particularly in the Drosophila willistoni group, there was a shortage of alleles in the frequency class between 0.99 and 1.0, whereas the number of alleles in the frequency class between 0.95 and 0.99 was larger than expected. For example, in the Caribbean population of Drosophila willistoni, in which 30 loci were studied, the observed number of alleles in the 0.99 - 1.0 frequency class was seven, compared with the expected number of 12.0, whereas the number of alleles in the 0.95 - 0.99 frequency class was seven, compared with the theoretical value of 3.0. The decrease of alleles in the 0.99 - 1.0 frequency class, accompanied by the increase of alleles in the 0.95 - 0.99 frequency class, was apparently caused by the excess of rare alleles observed in these populations. Namely, the number of rare alleles increased at the expense of alleles in the 0.99 - 1.0 frequency class: as rare alleles accumulated at a locus, the frequency of its most common allele fell below 0.99, shifting that allele into the next lower frequency class. In a number of populations, the number of rare alleles was excessive compared with the neutral expectation, but in some populations it was deficient. In the next section, we examine in detail the magnitudes of the excesses or deficiencies of rare alleles, as well as those of common alleles.

Numbers of rare alleles and common alleles

The definition of rare alleles is quite arbitrary. In this study, an allele whose frequency is less than 0.05 is defined as a rare allele, and all other alleles are called common alleles. We also studied the properties of rare alleles whose frequency was less than 0.01, but they were similar to those of rare alleles as defined above. We compared the observed and expected numbers of rare alleles, as well as of common alleles, for each population. In addition to rare and common alleles, we also compared the observed and expected numbers of polymorphic alleles (gene frequency between 0.05 and 0.95), monomorphic alleles (0.95 - 1.0) and all alleles (0.0 - 1.0). The results obtained for rare alleles and common alleles in mammals and Drosophila are presented in Figures 2 and 3. The results for birds, reptiles, amphibians, fishes and non-Drosophila invertebrates were more or less the same as those for mammals. It is clear that the agreement between theory and data is excellent for common alleles in both groups of organisms (Figure 3); indeed, good agreement was observed in all groups of organisms. On the other hand, the agreement for rare alleles is less satisfactory. In Drosophila, there are four populations in which the observed number is far above the theoretical value. All of them [D. willistoni (Carib.), D. willistoni (Ven.), D. equinoxialis, D. tropicalis] belong to the D. willistoni group. In the other populations, however, the agreement between the observed and theoretical values is quite satisfactory. In mammals, the proportional deviation of the observed number from the theoretical value is generally very large, but the deviations occur in both the positive and negative directions. These large deviations are partly due to a larger sampling error in this group of organisms than in Drosophila, because of the lower level of polymorphism. There are two extremely deviant populations in the upward direction: man and the Japanese macaque. At any rate, Figure 2 shows that an excess of rare alleles is not the general rule, and there are many populations in which the number of rare alleles is lower than expected.

FIGURE 2.-Observed vs. expected (for the IV model) numbers of rare alleles per locus: (a) mammals (36 populations) and (b) Drosophila (25 populations). The populations that seem to be extreme outliers in mammals are: (1) human (British), (2) Macaca fuscata, (3) Papio hamadryas, (4) Geomys bursarius, and (5) Thomomys bottae (central species range), and those in Drosophila are: (1) D. willistoni (Carib.), (2) D. willistoni (Ven.), (3) D. equinoxialis, (4) D. tropicalis, (5) D. paulistorum (Orinocan) and (6) D. nebulosa.

FIGURE 3.-Observed vs. expected (for the IV model) numbers of common alleles per locus in (a) mammals (36 populations) and (b) Drosophila (25 populations).

To study the agreement between the observed and expected numbers of alleles objectively, a statistical test is required. OHTA (1976) used the χ² test for the discrepancy between the observed and expected numbers of alleles. However, the number of alleles in a given frequency class clearly does not follow the binomial distribution, so the χ² test may not be applicable. Nevertheless, the quantity R² = (O - E)²/E, where O and E are the observed and expected numbers, respectively, is a legitimate measure of discrepancy. Before using this measure for testing, however, we must know the distribution of R². Since this distribution is difficult to derive analytically, we studied it by computer simulation. In this simulation, we used the IC model with four different values of M = 4Nv, corresponding to expected heterozygosities of H = 0.05, 0.10, 0.15 and 0.20. For each of these values, 20 independent sets of gene frequencies, equivalent to 20 different loci, were generated by STEWART’S algorithm (FUERST, CHAKRABORTY and NEI 1977). The number of genes sampled (n) was either 100 or 300. From these gene-frequency data, R² was computed for various gene-frequency classes. This was repeated 1000 times to obtain the distribution of R², from which the critical value of R² at the 0.05 significance level was determined. The results obtained for all alleles, rare alleles and common alleles are given in Table 1. Although the critical values in this table are subject to some error because of the limited number of replications, they can be used for a rough significance test. The table shows that the critical R² values for “all alleles” and “common alleles” (gene frequency range 0.05 - 1.0) are much smaller than the 5 percent critical value of χ² with one degree of freedom (3.84), whereas the values for “rare alleles” are considerably larger. Clearly, the χ² test is not justified. For a given group of alleles, the critical R² value depends on both expected heterozygosity and sample size. For “all alleles” and “common alleles” the value increases with increasing H; for “rare alleles” it decreases. Therefore, the test of significance should ideally be conducted with knowledge of the expected heterozygosity. The critical value of R² for “common alleles” is virtually the same for n = 100 and n = 300. However, the critical values for “all alleles” and “rare alleles” are larger for n = 300 than for n = 100. This seems to be because the range of the number of rare alleles increases with increasing sample size. Because of a technical difficulty, we have not studied the distribution of R² for n > 300, but, since the sample size was between 100 and 300 in 91 of the 138 populations, we use the values for n = 100 in Table 1 for our test. (The median sample size was 224.) There were 18 species in which the sample size was very large (n > 500). Using the critical values for n = 100 makes the test liberal, in the sense that it rejects the null hypothesis more often than warranted.

TABLE 1

R² values corresponding to the 5 percent significance level

                              H = 0.05         H = 0.10         H = 0.15         H = 0.20
Frequency range             100     300      100     300      100     300      100     300
All alleles (0 - 1)         0.71    1.14     1.13    2.13     1.39    2.56     3.89    2.86
Rare alleles (0 - 0.05)    12.51   14.85     7.55   10.16     7.10    9.10     6.75    9.11
Common alleles (0.05 - 1)   0.09    0.09     0.17    0.18     0.23    0.26     0.31    0.30

H stands for the expected heterozygosity, and the figures below H are the numbers of genes sampled (n).

TABLE 2

Distribution of R² values for rare alleles in the 138 populations studied

                      Average heterozygosity
R²                    ≤ 0.05    ≤ 0.10    ≤ 0.15    > 0.15    Total     Percent
0 < R² ≤ 5            19(13)*   34(23)    17(12)    8(5)      78(53)    56.5
5 < R² ≤ 8            1(1)      6(2)      2         2         11(3)     8.0
8 < R² ≤ 15           2         9(1)      2(1)      0         13(2)     9.4
R² > 15               12        15        3         6         36        26.1

* Figures in parentheses refer to the number of populations in which the deviation (O - E) was in the negative direction.
Populations are grouped according to the observed average heterozygosity.
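The simulation behind Table 1 can be sketched roughly in Python. This is not the authors' procedure: STEWART’S algorithm is not described here, so the sketch instead draws each sample of n genes directly from the IC-model equilibrium with Hoppe's urn (a standard sampler for the infinite-allele model), sums the number of alleles in a class over 20 loci, and takes the 95th percentile of R². The class convention p ≤ i/n < q and all function names are our assumptions, and the resulting critical values are not expected to reproduce Table 1 exactly.

```python
import math
import random
from typing import List

def r_squared(observed: float, expected: float) -> float:
    """Discrepancy measure R^2 = (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

def in_class(i: int, n: int, p: float, q: float) -> bool:
    """Assumed convention: p <= i/n < q, plus i = n for a class ending at 1.0."""
    return p <= i / n < q or (q >= 1.0 and i == n)

def hoppe_sample(n: int, M: float, rng: random.Random) -> List[int]:
    """Allele counts in a sample of n genes at IC-model equilibrium: gene k
    (0-based) carries a new allele with probability M / (M + k), otherwise it
    copies the allele of a uniformly chosen earlier gene."""
    counts: List[int] = []
    labels: List[int] = []
    for k in range(n):
        if rng.random() < M / (M + k):
            labels.append(len(counts))
            counts.append(1)
        else:
            allele = labels[rng.randrange(k)]
            labels.append(allele)
            counts[allele] += 1
    return counts

def expected_counts_ic(n: int, M: float) -> List[float]:
    """Expected number of alleles seen exactly i times (i = 1..n; index 0 unused)."""
    out = [0.0] * (n + 1)
    c = 1.0
    for i in range(1, n + 1):
        c *= (n - i + 1) / (M + n - i)
        out[i] = M * c / i
    return out

def critical_r2(H: float, n: int, p: float, q: float, loci: int = 20,
                reps: int = 1000, level: float = 0.95, seed: int = 1) -> float:
    """Simulated `level` quantile of R^2 for one frequency class, summed over loci."""
    M = H / (1.0 - H)  # IC model: expected heterozygosity H = M / (1 + M)
    a = expected_counts_ic(n, M)
    expected = loci * sum(a[i] for i in range(1, n + 1) if in_class(i, n, p, q))
    rng = random.Random(seed)
    stats: List[float] = []
    for _rep in range(reps):
        observed = 0
        for _locus in range(loci):
            observed += sum(1 for c in hoppe_sample(n, M, rng) if in_class(c, n, p, q))
        stats.append(r_squared(observed, expected))
    stats.sort()
    return stats[math.ceil(level * reps) - 1]
```

For example, critical_r2(0.10, 100, 0.0, 0.05) estimates the rare-allele critical value for H = 0.10 and n = 100, and critical_r2(0.10, 100, 0.05, 1.0) the common-allele value; with the default 1000 replicates each call takes a few seconds in pure Python.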

Tables 2 and 3 show the R² values for rare alleles and common alleles, respectively, for the 138 populations studied, where the populations are classified according to the observed average heterozygosity. Theoretically, populations should be classified according to the expected average heterozygosity, but in practice this is impossible. Therefore, the present test should be regarded only as semi-quantitative. The expected number of alleles (rare, common, polymorphic or all alleles) used for the computation of R² was again that for the infinite-allele model with varying mutation rate (IV model). We actually computed the R² values using the expected numbers of alleles for all four models. The R² values obtained for the four models were then ranked within each population, and a Friedman two-way analysis of variance was conducted on the rank data. This analysis showed that the goodness of fit of the models to the data was in the order IV, IC, SC and SV, the IV model being significantly better than the IC model.

TABLE 3

Distribution of R² values for common alleles in the 138 populations studied

                      Average heterozygosity
R²                    ≤ 0.05    ≤ 0.10    ≤ 0.15    > 0.15    Total      Percent
0 < R² ≤ 0.1          29(21)*   58(34)    17(8)     11(7)     115(70)    83.3
0.1 < R² ≤ 0.2        4(1)      4(2)      5(4)      1(1)      14(8)      10.1
0.2 < R² ≤ 0.3        1         1         1(1)      0         3(1)       2.2
R² > 0.3              0         1         1(1)      4(2)      6(3)       4.3

* Figures in parentheses refer to the number of populations in which the deviation (O - E) was in the negative direction.
Populations are classified according to the observed average heterozygosity.
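The Friedman rank analysis described above can be sketched in Python: within each population the four R² values are ranked (rank 1 = smallest R², i.e., best fit), rank sums are accumulated per model, and Q = 12/(bk(k + 1))ΣR_j² - 3b(k + 1), for b populations and k models, is referred to χ² with k - 1 degrees of freedom. The tie handling (average ranks, no tie correction in Q) and the example R² values are our assumptions, not the study's data.

```python
from typing import List, Sequence, Tuple

def ranks(row: Sequence[float]) -> List[float]:
    """Ranks 1..k within one population; tied values share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    r = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r

def friedman(blocks: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Rank sums per model and the Friedman statistic Q (no tie correction)."""
    b, k = len(blocks), len(blocks[0])
    R = [0.0] * k
    for row in blocks:
        for j, rj in enumerate(ranks(row)):
            R[j] += rj
    Q = 12.0 / (b * k * (k + 1)) * sum(x * x for x in R) - 3.0 * b * (k + 1)
    return R, Q

models = ["IV", "IC", "SC", "SV"]
r2 = [[0.05, 0.08, 0.12, 0.20],   # hypothetical R^2 values, one row per population
      [0.02, 0.03, 0.09, 0.07],
      [0.10, 0.15, 0.18, 0.30]]
R, Q = friedman(r2)
print(dict(zip(models, R)), round(Q, 3))
```

Here the rank sums are 3, 6, 10 and 11 and Q = 8.2, which exceeds 7.815, the 5 percent point of χ² with 3 degrees of freedom; the smallest rank sum identifies the best-fitting model.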

A comparison of Tables 1 and 3 shows that the majority of R² values for common alleles are within the 0.05 significance level, though the present test is semi-quantitative, as mentioned above. (Note also that the critical values in Table 1 were obtained by using the IC model rather than the IV model.) Therefore, we can conclude that the number of common alleles is generally in agreement with the theoretical value expected from the mutation-drift hypothesis. On the other hand, the number of rare alleles apparently exceeds the expected number significantly in many populations (Table 2). Indeed, R² is larger than 15 in 26 percent of the populations. Included in this group are the Drosophila willistoni group species and man, in which OHTA (1976) and LATTER (1976) found an excess of rare alleles. However, in more than 50 percent of the populations, R² is apparently within the 0.05 significance level. Therefore, it may be concluded that even the number of rare alleles agrees with the theoretical expectation in a majority of populations. We note that, in many populations, a deficiency rather than an excess of rare alleles is observed, though it is generally not statistically significant.

At this point, it should be noted that very rare alleles cannot be detected unless the sample size is very large. Therefore, we examined the frequency of rare alleles in more detail for those populations in which more than 1000 genes were sampled. In this analysis, one extra population (the Japanese population) was included, since a large sample of genes had been surveyed (NEEL et al. 1978), and three frequency classes, 0.0 - 0.005, 0.005 - 0.01 and 0.01 - 0.05, were considered (Table 4). In five populations (D. willistoni, P. platessa, M. fuscata and the two human populations) there are excesses of rare alleles in all three frequency classes, whereas in the other populations the agreement between theory and data is generally satisfactory, except in S. alatus (fish). It is interesting to note that when there is an excess in one rare-allele class, the other rare-allele classes also show an excess. S. alatus is unique in that no rare alleles were observed in a sample of 2008 genes. However, this might be because JOHNSON, UTTER and HODGINS (1973), who studied this organism, were primarily interested in a taxonomic question and paid little attention to rare alleles.

We have done similar statistical tests for all alleles, polymorphic alleles and monomorphic alleles, computing the observed and expected values of R2. The results suggest that the total number of alleles (all alleles) is significantly greater than the number expected from the IV model in 36 of the 138 populations. All of these populations were in the last row of Table 2. This indicates that the increase in the total number of alleles is entirely due to the increased number of rare alleles. On the other hand, the results for polymorphic alleles and monomorphic alleles were essentially the same as those for common alleles, and the agreement between theory and data was satisfactory, though we do not present these results.
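For reference, under the IC model the expected total number of alleles in a sample of n genes is given by EWENS' (1972) formula, with M = 4Nv:

$$
E(n_a) = \sum_{j=0}^{n-1} \frac{M}{M + j}
$$

Each term is positive, so E(n_a) keeps growing (roughly logarithmically) as n increases; the additional alleles picked up by larger samples are mostly ones at low sample frequency, which is consistent with the excess in the total number of alleles tracking the excess of rare alleles.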

Page 11: of - GeneticsALLELE FREQUENCY DISTRIBUTIONS 1041 DATA ANALYSIS General pattern of allele frequency distributions OHTA ( 1976) examined whether the allele-frequency distributions for

ALLELE FREQUENCY DISTRIBUTIONS 1049

TABLE 4

Observed and expected numbers of rare alleles per locus in eleven populations in which more than 1000 genes are sampled

(Genes sampled: average number of genes sampled per locus; Av. het.: observed average heterozygosity; Obs./Exp.: observed and expected numbers of alleles per locus in each allele frequency class.)

Population               Genes    No. of  Av.      0.0-0.005     0.005-0.01    0.01-0.05
                         sampled  loci    het.     Obs.   Exp.   Obs.   Exp.   Obs.   Exp.
D. engyochracea          1656     20      0.127    0.50   0.45   0.05   0.11   0.30   0.27
D. willistoni (Carib.)   1850     27      0.185    2.19   0.77   0.52   0.18   0.78   0.44
D. buzzatii              2149     29      0.068    0.17   0.23   0.00   0.06   0.14   0.13
D. mimica                1298     21      0.222    1.10   0.86   0.04   0.23   0.38   0.58
Sebastes alatus          2008     25      0.038    0.00   0.12   0.00   0.03   0.00   0.07
Pleuronectes platessa    1415     44      0.109    0.86   0.35   0.14   0.09   0.48   0.22
Bufo viridis             1034     26      0.159    0.50   0.50   0.15   0.14   0.31   0.35
Macaca fuscata           1976     29      0.018    0.41   0.05   0.17   0.01   0.21   0.03
Human (British)          5617     43      0.077    1.33   0.35   0.05   0.06   0.07   0.15
Human (Japanese)         6966     25      0.070    1.36   0.33   0.08   0.04   0.04   0.13

Expected numbers are based on the IV model.

Relationship between the number of alleles and heterozygosity

Under the mutation-drift hypothesis, a positive correlation between the number of alleles and heterozygosity is expected. This is so even for rare alleles (EANES and KOEHN 1977). In Drosophila, however, KOEHN and EANES (1976) and EANES and KOEHN (1977) showed that the correlation between the number of rare alleles and heterozygosity is considerably higher than their neutral expectation. They postulated that this excess correlation might be due to a high rate of intragenic recombination. In computing the expected correlation under the mutation-drift hypothesis, however, they did not consider the variation in mutation rates among loci. (When they studied this problem, the varying mutation model was not available.) Furthermore, their method of computation was based on the computer algorithm of NEI, MARUYAMA and CHAKRABORTY (1975), which was later shown to be imperfect when 4Nv is large. In this section, we examine the relationships between heterozygosity and the number of alleles for different gene-frequency classes by using a more accurate theoretical correlation. We investigate the relationships for rare alleles (gene frequency, 0.0 - 0.05), polymorphic alleles (0.05 - 0.95) and monomorphic alleles (0.95 - 1.0). EANES and KOEHN (1977) studied the correlation between the number of rare alleles and the heterozygosity contributed by common alleles (polymorphic plus monomorphic alleles), using the arcsine transformation of heterozygosity. In this paper, we consider the heterozygosity contributed by all alleles, and no transformation is made. Theoretically, the arcsine transformation of heterozygosity is not justified, since heterozygosity is not an ordinary proportion and generally has an L-shaped distribution (FUERST, CHAKRABORTY and NEI 1977).

For obtaining the theoretical correlation, we again used STEWART’S algorithm for the infinite-allele model. That is, after generating a random set of gene frequencies for a locus, we computed the numbers of alleles in the three frequency classes and the sample heterozygosity. This computation was repeated 50,000 times for the case of constant mutation rate and 20,000 times for the case of varying mutation rate at each of the 30 expected heterozygosity values from 0.01 to 0.30 at 0.01 intervals. Using the results obtained, the theoretical correlations were computed for each expected heterozygosity value. In computing these correlations, all loci were used, whether they were polymorphic or monomorphic (EANES and KOEHN used only polymorphic loci). No attempt was made to compute the theoretical correlation for the stepwise mutation model, since no comparable computer algorithm is available.
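The simulation design above can be sketched as follows. This is not STEWART’S algorithm: as a swapped-in technique it draws each locus's sample directly from the Ewens sampling formula with Hoppe's urn, which is exact for a sample of n genes under the constant-rate infinite-allele model. For the varying-rate case, 4Nv is drawn per locus from a gamma distribution whose shape parameter is a free, hypothetical choice, and the gamma mean is set from the IC relation H = θ/(1+θ) without recalibrating it to hit the target average heterozygosity exactly. Class boundaries at 0.05 and 0.95 and all function names are our assumptions.

```python
import math
import random


def hoppe_urn_sample(theta: float, n: int, rng: random.Random) -> list[int]:
    """Allele counts in a sample of n genes under the infinite-allele model
    with constant mutation rate (Ewens sampling formula), via Hoppe's urn:
    gene k+1 carries a new allele with probability theta / (theta + k),
    otherwise it copies the allele of a uniformly chosen earlier gene."""
    counts: list[int] = []   # counts[a] = copies of allele a in the sample
    labels: list[int] = []   # allele label carried by each gene drawn so far
    for k in range(n):
        if rng.random() < theta / (theta + k):   # always true for k = 0
            labels.append(len(counts))
            counts.append(1)
        else:
            a = labels[rng.randrange(k)]
            labels.append(a)
            counts[a] += 1
    return counts


def locus_summary(counts: list[int], n: int):
    """Unbiased sample heterozygosity and numbers of rare (< 0.05),
    polymorphic (0.05 to < 0.95) and monomorphic (>= 0.95) alleles."""
    freqs = [c / n for c in counts]
    h = n / (n - 1) * (1.0 - sum(x * x for x in freqs))
    rare = sum(1 for x in freqs if x < 0.05)
    poly = sum(1 for x in freqs if 0.05 <= x < 0.95)
    mono = sum(1 for x in freqs if x >= 0.95)
    return h, rare, poly, mono


def pearson(xs, ys) -> float:
    """Pearson correlation; NaN if either variable has zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")


def simulated_correlations(mean_h, n=300, reps=2000, gamma_shape=None, seed=1):
    """Correlation of sample heterozygosity with the number of alleles in each
    class over `reps` independent simulated loci. gamma_shape=None is the
    constant-rate case; otherwise 4Nv varies among loci (gamma distributed)."""
    rng = random.Random(seed)
    mean_theta = mean_h / (1.0 - mean_h)   # IC equilibrium H = theta/(1+theta)
    rows = []
    for _ in range(reps):
        theta = mean_theta
        if gamma_shape is not None:
            theta = max(rng.gammavariate(gamma_shape, mean_theta / gamma_shape), 1e-12)
        rows.append(locus_summary(hoppe_urn_sample(theta, n, rng), n))
    h = [r[0] for r in rows]
    return {name: pearson(h, [r[i] for r in rows])
            for i, name in ((1, "rare"), (2, "polymorphic"), (3, "monomorphic"))}
```

For example, `simulated_correlations(0.1)` and `simulated_correlations(0.1, gamma_shape=1.0)` give one constant-rate and one varying-rate point on curves of the kind shown in Figure 4; with a few thousand replicates the Monte Carlo error is a few hundredths, so only qualitative features (strongly positive correlation for polymorphic alleles, negative for monomorphic alleles) should be read from a single run.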

The results obtained for the case of sample size n = 300 are given in Figure 4. For a constant mutation rate, the correlation for rare alleles is about 0.11 for all heterozygosity values, whereas the correlation for polymorphic alleles is about 0.89. The correlation for monomorphic alleles, on the other hand, is always negative, and its absolute value declines almost linearly from 0.79 to 0.55 as the expected heterozygosity increases from 0.01 to 0.3. When the mutation rate varies among loci, the correlation between sample heterozygosity and the number of rare alleles increases as the expected heterozygosity increases, but the correlation for polymorphic alleles is virtually unaffected. The absolute value of the correlation for monomorphic alleles still declines with increasing average heterozygosity, but at a slower rate. The effect of sample size seems to be small as long as n is larger than 100: when n = 100, all the correlations were only slightly smaller than those for n = 300. We have not examined the expectations for n > 300. In the following, we use the correlations for n = 300 for comparison with the observed values, except in some cases.

Our results for the case of constant mutation rate are somewhat different from those of EANES and KOEHN (1977). When the average heterozygosity or 4Nv is small, their theoretical correlation between heterozygosity and the number of rare alleles was slightly higher than ours, but it declined as 4Nv increased. However, since their method of computing the correlation was different from ours, as mentioned earlier, it is difficult to compare the two sets of results quantitatively.

The observed correlations between sample heterozygosity and the number of alleles for the three gene-frequency classes were computed in all of the 138 populations. Each of the correlations obtained was compared with the theoretical correlation corresponding to the average heterozygosity of the population studied. In some populations, however, there were no alleles in a given frequency class at any locus, so that such a comparison was not possible. The observed correlations between heterozygosity and the number of rare alleles for Drosophila species, together with the theoretical relationship for the varying mutation model, are presented in Figure 5. In 22 of the 25 populations, the observed correlation is positive and scattered around the theoretical values. This large variation around the theoretical values is expected, since the observed correlation is subject to a large stochastic error as well as to sampling error. The averages of the theoretical and observed correlations for these species are 0.40 and 0.38, respectively, so that the agreement is satisfactory. Here, again, the observed correlation agrees better with the expected value from the IV model than with that from the IC model. Table 5 shows the average observed and theoretical correlations for each of the three classes of alleles (rare, polymorphic and monomorphic) over all populations studied. The overall agreement between the observed and theoretical correlations is quite satisfactory for all three allele-frequency classes when the interlocus variation of mutation rate is considered. The constant mutation model is less satisfactory than the varying mutation model in explaining the observed correlations. Table 5 includes the correlations for common alleles and all alleles as well. For these, the constant mutation model is again less satisfactory. Note that the observed correlation for all alleles is significantly different from the theoretical value even when the varying mutation model is used. In the present case, however, the number of genes sampled varied considerably from population to population and was often lower than 300, as mentioned before. Therefore, complete agreement between theory and data is not expected. Nevertheless, if we use the theoretical value for n = 100, the agreement becomes very good (Table 5).

[Figure 4 plot not reproduced; abscissa: average heterozygosity.]

FIGURE 4.-Correlations between heterozygosity and the number of alleles for three different gene-frequency classes in relation to average heterozygosity under two models of neutral mutations. Solid lines refer to the IV model, and dashed lines to the IC model. These results were obtained by computer simulations. The right-hand scale of the ordinate refers to the correlation between heterozygosity and the number of monomorphic alleles; the left-hand scale refers to the other correlations.

FIGURE 5.-Observed correlations between heterozygosity and the number of rare alleles in 25 Drosophila populations in relation to the average heterozygosity. The solid line refers to the expected relationship as obtained by computer simulations under the IV model.

TABLE 5

Averages of observed and theoretical correlations between heterozygosity and the number of alleles for various gene frequency classes

Frequency class                  Number of    Observed         Theoretical correlation
                                 populations  correlation      IC model   IV model
Rare alleles (0.0-0.05)          132          0.339 ± 0.025    0.111*     0.300 (0.293)
Polymorphic alleles (0.05-0.95)  133          0.931 ± 0.004    0.904*     0.922 (0.924)
Monomorphic alleles (0.95-1.0)   133          -0.885 ± 0.005   -0.878     -0.884 (-0.886)
Common alleles (0.05-1.0)        132          0.908 ± 0.005    0.893*     0.906 (0.909)
All alleles (0.0-1.0)            138          0.769 ± 0.011    0.638*     0.709* (0.768)

* The observed value is significantly different from the theoretical value at the 0.1 percent level when the critical value for n = 300 is used.

The theoretical correlations are for n = 300, but those in parentheses for the IV model are for n = 100.

As mentioned above, KOEHN and EANES (1976) and EANES and KOEHN (1977) observed a positive correlation between heterozygosity and the number of rare alleles in Drosophila, which they thought was much higher than the theoretical value expected under the mutation-drift hypothesis. We have not been able to reproduce their results, but our recalculations suggest that the discrepancy between data and theory in their analysis is mainly due to the fact that they did not consider the variation in mutation rate among loci in computing their theoretical correlations. We have also noted that the arcsine transformation of heterozygosity generally inflates the correlation. Our results for the populations studied by them are presented in Table 6. If we consider the large stochastic error associated with this correlation (see Figure 5), the agreement between theory and data seems to be satisfactory.

Subunit molecular weight of proteins and the number of alleles

Recently, KOEHN and EANES (1977, 1978), NEI, FUERST and CHAKRABORTY (1978), WARD (1978) and BROWN and LANGLEY (1979) have shown that there is a positive correlation between heterozygosity and the subunit molecular weight of the protein encoded. NEI, FUERST and CHAKRABORTY (1978) showed that the magnitude of the observed correlation agrees with the expectation from the mutation-drift hypothesis. KOEHN and EANES (1977) and EANES and KOEHN (1978) observed a similar positive correlation between the number of alleles and the subunit molecular weight of the protein. In this section, we generalize their observation and examine whether the magnitude of the correlation can be explained by the mutation-drift hypothesis.

TABLE 6

Correlations between heterozygosity and the number of rare alleles in the Drosophila populations that were used in EANES and KOEHN'S analysis

                           Number    Average          Observed       Expected correlation
Population                 of loci   heterozygosity   correlation    IC model   IV model
D. bifasciata              20        0.251            0.213          0.120      0.527
D. equinoxialis (Ven.)     30        0.185            0.438          0.115      0.465
D. robusta                 40        0.123            0.386          0.108      0.375
D. tropicalis (Ven.)       29        0.156            0.580          0.105      0.415
D. willistoni (Ven.)       31        0.183            0.490          0.110      0.450
D. willistoni (Carib.)     27        0.184            0.468          0.112      0.453
D. willistoni (Trin.)      21        0.200            0.765          0.105      0.480

The source of the data used in our computation is given in EANES and KOEHN (1977). They used four more populations for which the gene frequency data were unpublished. The correlations in this table were obtained by our method rather than by theirs.

The expected correlation between molecular weight and the number of alleles can be computed in the same way as that between molecular weight and heterozygosity (NEI, FUERST and CHAKRABORTY 1978), assuming that the mutation rate is correlated with molecular weight and follows the gamma distribution. The only difference is that the former depends on sample size, whereas the latter is practically independent of sample size. However, numerical computations have shown that the effect of sample size on the former correlation is relatively small when the sample size is larger than n = 300 (CHAKRABORTY and FUERST 1979). The actual value of the correlation between mutation rate and molecular weight (r_vm) is not known at present. For those proteins whose amino acid sequences have been determined, this correlation has been estimated to be 0.5 under the mutation-drift hypothesis (NEI, CHAKRABORTY and FUERST 1976). In this paper, we assume that this correlation applies to the proteins used for electrophoresis. Under this assumption, the expected correlation between molecular weight and the number of alleles per locus is one-half the correlation obtained under the assumption that the mutation rate is completely correlated with molecular weight (NEI, FUERST and CHAKRABORTY 1978). The solid and dashed lines in Figure 6 show the expected squared correlations under the assumptions of r_vm = 1 and r_vm = 0.5, respectively, for the case of n = 500. The squared correlation represents the proportion of the variance of the number of alleles that is attributable to the variation in molecular weight.
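The one-half rule can be read as a statement about partial correlation. As a sketch only, write m for molecular weight, v for mutation rate, K for the number of alleles per locus and r_vm for the correlation between mutation rate and molecular weight, and assume (our illustrative assumptions, not conditions stated in the text) that m influences K solely through v, so that the partial correlation of m and K given v vanishes, and that the relations are close enough to linear for Pearson correlations to apply:

$$
r_{mK\cdot v} \propto r_{mK} - r_{vm}\, r_{vK} = 0
\quad\Longrightarrow\quad
r_{mK} = r_{vm}\, r_{vK}
$$

With r_vm = 1 this gives r_mK = r_vK, the fully correlated case; with r_vm = 0.5 the correlation is halved, so the squared correlation falls to one quarter of its fully correlated value.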

The empirical correlation between molecular weight and the number of alleles per locus was computed for each of the 122 populations. Non-Drosophila invertebrates were excluded because of lack of information on molecular weight. In this study, the same molecular weight data were used as those used by NEI, FUERST and CHAKRABORTY (1978). The number of protein loci used for a population ranged from 8 to 38. The squared correlations obtained for the 45 populations in which the number of genes sampled is equal to or greater than 300 are given in Figure 6.
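The per-population statistic plotted in Figure 6 can be sketched as a Pearson correlation, across the loci of one population, between subunit molecular weight and the number of alleles, squared but keeping the sign of r so that negative correlations land on the negative ordinate. The function name and the input layout (two equal-length per-locus lists) are our assumptions.

```python
import math


def signed_r2(mol_weights: list[float], n_alleles: list[float]) -> float:
    """Squared Pearson correlation between subunit molecular weight and the
    number of alleles across loci, carrying the sign of r (negative r is
    reported as a negative r^2, as on the ordinate of Figure 6)."""
    k = len(mol_weights)
    if k != len(n_alleles) or k < 3:
        raise ValueError("need at least three loci with paired values")
    mw = sum(mol_weights) / k
    ma = sum(n_alleles) / k
    sxy = sum((w - mw) * (a - ma) for w, a in zip(mol_weights, n_alleles))
    sww = sum((w - mw) ** 2 for w in mol_weights)
    saa = sum((a - ma) ** 2 for a in n_alleles)
    if sww == 0 or saa == 0:
        return float("nan")   # correlation undefined when a variable is constant
    r = sxy / math.sqrt(sww * saa)
    return math.copysign(r * r, r)
```

Applied to one population's loci (molecular weights in any consistent unit, allele counts as observed), the result is a single point in Figure 6; the same function with heterozygosity in place of allele number gives the companion correlation compared in Table 7.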

Figure 6 shows that the correlation varies greatly among species, just like the correlation between molecular weight and heterozygosity studied earlier. NEI, FUERST and CHAKRABORTY (1978) have shown that a large stochastic error is associated with this type of correlation; thus, the comparison of observed and expected correlations for a single species is not very meaningful unless a large number of loci is studied. Nevertheless, Figure 6 shows that in a majority of populations the correlation is positive and tends to increase with increasing average heterozygosity, as predicted by the mutation-drift theory.

Comparison of Figure 6 with Figure 1 or 2 of NEI, FUERST and CHAKRABORTY (1978) indicates that the expected correlation between molecular weight and the number of alleles is considerably higher than the expected correlation between molecular weight and heterozygosity (see also CHAKRABORTY and FUERST 1979). For example, when the expected heterozygosity is 0.1, the former correlation is 0.43, whereas the latter is 0.22. We have examined whether this is true with our data. The results have shown that this is indeed the case in most organisms: in 78 of the 122 populations examined, the former correlation was higher than the latter. Table 7 shows the average correlations between the number of alleles and molecular weight and the average correlations between heterozygosity and molecular weight in five different groups of organisms. (Two bird species are not included.) The former is generally higher than the latter, though the effect of sampling errors seems to be large for some of the correlations. This observation confirms KOEHN and EANES' (1977) similar finding in Drosophila.

[Figure 6 plot not reproduced; abscissa: average heterozygosity.]

FIGURE 6.-Squared correlations (r2) between subunit molecular weight and the number of alleles per locus in 45 populations in relation to the average heterozygosity. The solid curve gives the theoretical value of r2 for the IV model when the mutation rate is directly proportional to molecular weight. The dashed curve gives the theoretical r2 value when the correlation between mutation rate and molecular weight is 0.5. The negative ordinate is used to indicate the r2 values for which r was negative. Separate plotting symbols distinguish mammals, fishes, reptiles, amphibians and Drosophila.

DISCUSSION

Since we presented our general discussion about the maintenance of protein polymorphism elsewhere (NEI 1980, and others), our discussion in this paper will be confined to problems that directly relate to the subjects studied here. One of our major findings in this paper is that the excess of rare alleles, compared with the neutral expectation, is not a general phenomenon. In many populations, the number of rare alleles is at the level of neutral expectations, particularly when the variation in mutation rates among loci is taken into account.


TABLE 7

Average correlations between number of alleles and molecular weight (rnm) and average correlations between heterozygosity and molecular weight (rhm) in five different groups of organisms

               Sample size = 100-300               Sample size > 300
               Number of                           Number of
               populations  H      rhm    rnm      populations  H      rhm    rnm
Drosophila     11           0.158  0.398  0.465    14           0.129  0.481  0.416
Fish           7            0.058  0.125  0.134    13           0.061  0.182  0.178
Amphibians     17           0.059  0.159  0.206    4            0.087  0.162  0.005
Reptiles       14           0.074  0.1?   0.333    4            0.049  0.209  0.352
Mammals        26           0.064  0.149  0.245    10           0.046  0.299  0.345

H refers to the average heterozygosity.

Nevertheless, there are a substantial number of species, particularly in the Drosophila willistoni group, in which the number of rare alleles is higher than the neutral expectation. This can be explained either by the hypothesis of slightly deleterious mutations (OHTA 1974; LI 1978) or by the bottleneck effect (NEI 1976; NEI and LI 1976). However, the fact that the excess of rare alleles is not a general phenomenon makes it easier for the bottleneck-effect hypothesis to explain the data than for OHTA’S hypothesis of slightly deleterious mutations. If OHTA’S hypothesis is correct, the excess of rare alleles should occur in all populations, whereas under the bottleneck hypothesis the excess should occur only in those populations that went through a bottleneck relatively recently. Furthermore, under OHTA’S hypothesis, the excess of rare alleles is expected to occur more often in a (large) population with high heterozygosity than in a (small) population with low heterozygosity. The R2 values for rare alleles in Table 2, however, show that this is not the case in practice. We have also seen that when there is an excess in one rare-allele class, the other rare-allele classes also show an excess. This observation is more easily explained by the bottleneck-effect hypothesis than by the hypothesis of deleterious mutations. In OHTA’S hypothesis of slightly deleterious mutations, the mutation-selection balance is supposed to be established in large populations, such as those of Drosophila willistoni. In this case, the allele frequency distribution will no longer be U-shaped, as emphasized by LI (1978). Yet the observed distributions are all U-shaped. This creates another difficulty for the hypothesis of slightly deleterious mutations. On the other hand, we have seen no indication of a mode at intermediate gene frequencies in any population. This suggests that overdominant selection is not important.

Our study of the correlation between heterozygosity and the number of rare alleles has shown that the observed value is generally consistent with the theoretical value, though it is subject to a large stochastic error. Our results, therefore, do not support EANES and KOEHN’S (1977) suggestion that, in addition to mutation, intragenic recombination induces further genetic variability. STROBECK and MORGAN (1978) have conducted a theoretical study of the effect of intragenic recombination on the magnitude of heterozygosity in finite populations. They conclude that this effect is not important unless 4Nv is larger than 1. Most of the average heterozygosities observed so far are less than 0.3, so that the average 4Nv seems to be considerably lower than 1 in most populations. Therefore, in the species studied here, the effect of intragenic recombination does not appear to be important except at some loci where the mutation rate is unusually high. In this connection, it should also be noted that no theoretical study has been done on the correlation between heterozygosity and the number of rare alleles when intragenic recombination occurs. Without such a study, it is difficult to evaluate the effect of intragenic recombination on genetic variability.
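The step from average heterozygosities below 0.3 to 4Nv well below 1 can be made explicit under the IC model, where the equilibrium heterozygosity is H = M/(1 + M) with M = 4Nv. This is an illustration under that model only; when mutation rates vary among loci, the relation between average heterozygosity and average M is not exactly this one.

$$
H = \frac{M}{1 + M}
\;\Longleftrightarrow\;
M = \frac{H}{1 - H},
\qquad
H = 0.3 \;\Longrightarrow\; M = \frac{0.3}{0.7} \approx 0.43 < 1
$$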

We have seen that the number of alleles per locus is positively correlated with the subunit molecular weight of the protein encoded. This positive correlation is most easily explained by the mutation-drift hypothesis, since, under this hypothesis, the number of alleles is a function of 4Nv (WRIGHT 1949) and the subunit molecular weight is expected to be correlated with the mutation rate (KOEHN and EANES 1978). If selection were a major factor in maintaining genetic variability, the number of alleles or heterozygosity should depend primarily on the intensity of selection rather than on the mutation rate. However, a similar positive correlation is also consistent with OHTA’S (1974) hypothesis of slightly deleterious mutations, since under this hypothesis mutation is the major factor determining the level of heterozygosity, except in large populations.

We have confirmed KOEHN and EANES’ (1977) earlier observation that the correlation between molecular weight and the number of alleles is higher than the correlation between molecular weight and heterozygosity. We have also shown that the former correlation is theoretically expected to be larger than the latter under the equilibrium theory of neutral mutations. However, there are two additional factors that may make the former correlation higher than the latter. One is the existence of deleterious mutations (not necessarily slightly deleterious ones). When the sample size is large, deleterious mutations that would never become prevalent in the population are likely to be included in the sample. If they are included, the number of alleles in the sample will have a higher correlation with the total mutation rate, which would, in turn, be highly correlated with molecular weight in the presence of various functional constraints on the protein encoded. On the other hand, deleterious mutations contribute little to heterozygosity, so that the correlation of molecular weight with heterozygosity is expected to be smaller than that with the number of alleles. The other factor is the bottleneck effect. NEI and LI (1976) have shown that after a population goes through a bottleneck, the number of alleles reaches its equilibrium value much faster than heterozygosity does. Therefore, the correlation of molecular weight with the number of alleles is expected to be higher than that with heterozygosity in recently expanded populations.

We thank the following investigators who kindly supplied unpublished data for the present study: J. C. AVISE, D. G. BUTH, S. D. FERRIS, G. C. GORMAN, D. HEDGECOCK, R. HIGHTON, A. G. JOHNSON, M. S. JOHNSON, K. KNOPF, M. N. MANLOVE, R. B. MERRITT, J. B. MITTON, E. D. PARKER, R. PATTON, S. PRAKASH, M. H. SMITH, G. N. SOMERO, W. W. M. STEINER, A. TEMPLETON and S. TILLEY. We are also grateful to R. K. KOEHN, W. F. EANES and T. OHTA for their helpful comments. This study was supported by research grants from the Public Health Service and the National Science Foundation.


LITERATURE CITED

ABRAMOWITZ, M. and I. A. STEGUN, 1965 Handbook of Mathematical Functions. Dover Publications, New York.

AHMAD, M., D. O. F. SKIBINSKI and J. B. BEARDMORE, 1977 An estimate of the amount of genetic variation in the common mussel Mytilus edulis. Biochem. Genet. 15: 833-846.

AVISE, J. C. and F. J. AYALA, 1976 Genetic differentiation in speciose versus depauperate phylads: evidence from the California minnows. Evolution 30: 46-58.

AVISE, J. C., M. H. SMITH and R. K. SELANDER, 1974 Biochemical polymorphism and systematics in the genus Peromyscus. VI. The boylii species group. J. Mammalogy 55: 751-763.

AVISE, J. C., M. H. SMITH, R. K. SELANDER, T. E. LAWLOR and P. R. RAMSEY, 1974 Biochemical polymorphism and systematics in the genus Peromyscus. V. Insular and mainland species of the subgenus Haplomylomys. Systemat. Zool. 23: 226-238.

AYALA, F. J., D. HEDGECOCK, G. S. ZUMWALT and J. W. VALENTINE, 1973 Genetic variation in Tridacna maxima, an ecological analog of some unsuccessful evolutionary lineages. Evolution 27: 177-191.

AYALA, F. J. and M. L. TRACEY, 1974 Genetic differentiation within and between species of the Drosophila willistoni group. Proc. Natl. Acad. Sci. U.S. 71: 999-1003.

AYALA, F. J., M. L. TRACEY, L. G. BARR, J. F. MCDONALD and S. PÉREZ-SALAS, 1974a Genetic variation in natural populations of five Drosophila species and the hypothesis of the selective neutrality of protein polymorphisms. Genetics 77: 343-384.

AYALA, F. J., J. W. VALENTINE, L. G. BARR and G. S. ZUMWALT, 1974b Genetic variability in a temperate intertidal Phoronid, Phoronopsis viridis. Biochem. Genet. 11: 413-427.

AYALA, F. J., J. W. VALENTINE, T. E. DELACA and G. S. ZUMWALT, 1975 Genetic variability of the Antarctic brachiopod Liothyrella notorcadensis and its bearing on mass extinction hypothesis. J. Paleont. 44: 1-9.

AYALA, F. J., J. W. VALENTINE and G. S. ZUMWALT, 1975 An electrophoretic study of the Antarctic zooplankter Euphausia superba. Limnol. and Ocean. 20: 635-640.

BAKER, M. C., 1974 Genetic structure of the two populations of white crowned sparrows with different song dialects. Condor 76: 351-356.

BAND, H. T., 1975 A survey of isozyme polymorphism in a Drosophila melanogaster natural population. Genetics 80: 761-771.

BARKER, J. S. F. and J. C. MULLEY, 1976 Isozyme variation in natural populations of Drosophila buzzatii. Evolution 30: 213-233.

BROWN, A. J. L. and C. H. LANGLEY, 1979 Correlation between heterozygosity and subunit molecular weight. Nature 277: 649-651.

BROWNE, R. A., 1977 Genetic variation in island and mainland populations of Peromyscus leucopus. Amer. Midland Natur. 97: 1-9.

CHAKRABORTY, R. and P. A. FUERST, 1979 Some sampling properties of selectively neutral alleles: effects of variability of mutation rates. Genet. Res. 34: 253-267.

CHAKRABORTY, R., P. A. FUERST and M. NEI, 1978 Statistical studies on protein polymorphism in natural populations. II. Gene differentiation between populations. Genetics 88: 367-390.

DUNCAN, R. and R. HIGHTON, 1979 Genetic relationships of the Eastern large Plethodon of the Ouachita Mountains. Copeia, No. 1, 95-110.

EANES, W. F. and R. K. KOEHN, 1977 The correlation of rare alleles with heterozygosity: determination of the correlation for the neutral models. Genet. Res. 29: 223-230.

-----, 1978 Relationship between subunit size and number of rare electrophoretic alleles in human enzymes. Biochem. Genet. 16: 971-985.


EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theoret. Pop. Biol. 3:

EWENS, W. J. and J. H. GILLESPIE, 1974 Some simulation results for the neutral allele model with interpretations. Theoret. Pop. Biol. 6: 35-57.

FRYDENBERG, O. and V. SIMONSEN, 1973 Genetics of Zoarces populations. V. Amount of protein polymorphism and degree of genic heterozygosity. Hereditas 75: 221-232.

FUERST, P. A., R. CHAKRABORTY and M. NEI, 1977 Statistical studies on protein polymorphism in natural populations. I. Distribution of single locus heterozygosity. Genetics 86: 455-483.

FUERST, P. A. and R. E. FERRELL, 1980 The stepwise mutation model: an experimental evaluation utilizing hemoglobin variants. Genetics 94: 185-201.

GARTSIDE, D. F., H. C. DESSAUER and T. JOANEN, 1977 Genic homozygosity in an ancient reptile (Alligator mississippiensis). Biochem. Genet. 15: 655-663.

GARTSIDE, D. F., J. S. ROGERS and H. C. DESSAUER, 1977 Speciation with little genic and morphological differentiation in the ribbon snakes Thamnophis proximus and T. sauritus (Colubridae). Copeia, No. 4, 697-707.

GILL, A. E., 1976 Genetic divergence of insular populations of deer mice. Biochem. Genet. 14:

GLOVER, D. G., M. H. SMITH, L. AMES, J. JOULE and J. M. DUBACH, 1977 Genetic variation in Pika populations. Canad. J. Zool. 55: 1841-1845.

GORMAN, G. C. and Y. J. KIM, 1975 Genetic variation and genetic distance among populations of Anolis lizards on two lesser Antillean Island banks. Systemat. Zool. 24: 369-373.

GORMAN, G. C., M. SOULE, S. Y. YANG and E. NEVO, 1975 Evolutionary genetics of insular Adriatic lizards. Evolution 29: 52-71.

GREENBAUM, I. F. and R. J. BAKER, 1976 Evolutionary relationships in Macrotus (Mammalia: Chiroptera): biochemical variation and karyology. Systemat. Zool. 15: 15-25.

HARRIS, H., D. A. HOPKINSON and E. B. ROBSON, 1973 The incidence of rare alleles determining electrophoretic variants: data on 43 enzyme loci in man. Ann. Hum. Genet. 37: 237-253.

HEDGECOCK, D., 1976 Genetic variation in two widespread species of salamanders, Taricha granulosa and Taricha torosa. Biochem. Genet. 14: 561-576.

-----, 1978 Population subdivision and genetic divergence in the red-bellied newt, Taricha rivularis. Evolution 32:

HIGHTON, R. and T. P. WEBSTER, 1976 Geographic protein variation and divergence in populations of the salamander Plethodon cinereus. Evolution 30: 33-45.

JARVINEN, O., H. SISULA, A.-L. VARVIO-AHO and P. SALMINEN, 1976 Genic variation in isolated marginal populations of the Roman Snail, Helix pomatia L. Hereditas 82: 101-110.

JOHNSON, A. G. and F. M. UTTER, 1976 Electrophoretic variation in intertidal and subtidal organisms in Puget Sound, Washington. Anim. Blood Grps. and Biochem. Genet. 7: 3-14.

JOHNSON, A. G., F. M. UTTER and H. O. HODGINS, 1973 Estimate of genetic polymorphism and heterozygosity in three species of rockfish (Genus Sebastes). Comp. Biochem. Physiol. 44B: 397-406.

JOHNSON, M. S., 1975 Biochemical systematics of the atherinid genus Menidia. Copeia, No. 4, 662-691.

JOHNSON, W. E., R. K. SELANDER, M. H. SMITH and Y. J. KIM, 1972 Biochemical genetics of sibling species of the cotton rat (Sigmodon). Univ. Texas Publ. 7213: 297-305.

KILPATRICK, C. W. and E. G. BMMERMAN, 1976 Biochemical variation and systematics of Peromyscus pectorulis. J. Mammalogy 57 : 506-522.

KIM, Y. J., 1972 Studies of biochemical genetics and karyotypes in pocket gophers (family Geomyidae) , Ph.D. Dissertation, Univ. of Texas, Austin, Texas.

87-112.

835-848.

271-286.


KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738.

KIMURA, M. and T. OHTA, 1975 Distribution of allelic frequencies in a finite population under stepwise production of neutral alleles. Proc. Natl. Acad. Sci. U.S. 72: 2761-2764.

---, 1978 Stepwise mutation model and distribution of allelic frequencies in a finite population. Proc. Natl. Acad. Sci. U.S. 75: 2868-2872.

KNOPP, K. W., 1977 Protein variation in Gomphus (Odonata: Gomphidae). Ph.D. Dissertation, Univ. of Florida.

KOEHN, R. K. and W. F. EANES, 1976 An analysis of allelic diversity in natural populations of Drosophila: the correlation of rare alleles with heterozygosity. pp. 377-390. In: Population Genetics and Ecology. Edited by S. KARLIN and E. NEVO. Academic Press, New York.

---, 1977 Subunit size and genetic variation of enzymes in natural populations of Drosophila. Theoret. Pop. Biol. 11: 330-341.

---, 1978 Molecular structure and protein variation within and among populations. Evolutionary Biology 11: 39-100.

LAKOVAARA, S. and A. SAURA, 1971 Genetic variation in natural populations of Drosophila obscura. Genetics 69: 377-384.

LARSON, A. and R. HIGHTON, 1978 Geographic protein variation and divergence of the Plethodon welleri group (Amphibia, Plethodontidae). Systemat. Zool. 27: 431-48.

LATTER, B. D. H., 1976 The intensity of selection for electrophoretic variants in natural populations of Drosophila. pp. 391-408. In: Population Genetics and Ecology. Edited by S. KARLIN and E. NEVO. Academic Press, New York.

LESTER, L. J., 1979 Population genetics of Penaeid shrimp from the Gulf of Mexico. J. Heredity 70: 175-180.

LI, W.-H., 1978 Maintenance of genetic variability under the joint effect of mutation, selection and random drift. Genetics 90: 349-382.

LUCOTTE, G. and M. KAMINSKI, 1976 Polymorphisme biochimique chez le faisan commun (Phasianus colchicus) [Biochemical polymorphism in the common pheasant]. Biochem. Syst. Ecol. 4: 223-226.

LYNCH, J. C. and E. R. VYSE, 1979 Genetic variability and divergence in Grayling, Thymallus arcticus. Genetics 92: 263-278.

MANWELL, C. and C. M. A. BAKER, 1977 Genetic distance between the Australian Merino and the Poll Dorset sheep. Genet. Res. 29: 239-253.

MARINKOVIC, D., F. J. AYALA and M. ANDJELKOVIC, 1978 Genetic polymorphism and phylogeny of Drosophila subobscura. Evolution 32: 164-173.

MCDERMID, E. M., G. H. VOS and H. J. DOWNING, 1973 Blood groups, red cell enzymes and serum proteins of Baboons and Vervets. Folia Primat. 19: 319-326.

MERKLE, D. A., S. I. GUTTMAN and M. A. NICKERSON, 1977 Genetic uniformity throughout the range of the Hellbender, Cryptobranchus alleganiensis. Copeia, No. 3, 549-553.

MERRITT, R. B., J. F. ROGERS and B. J. KURZ, 1978 Genic variation in the Longnose Dace, Rhinichthys cataractae. Evolution 32: 116-124.

MITTON, J. B. and R. K. KOEHN, 1975 Genetic organization and adaptive response of allozymes to ecological variables in Fundulus heteroclitus. Genetics 79: 97-111.

NEEL, J. V., N. UEDA, C. SATOH, R. E. FERRELL, R. J. TANIS and H. B. HAMILTON, 1978 The frequency in Japanese of genetic variants of 22 proteins. V. Summary and comparison with data on Caucasians from the British Isles. Ann. Hum. Genet., Lond. 41: 429-441.

NEI, M., 1972 Genetic distance between populations. Am. Naturalist 106: 283-292.

---, 1975 Molecular Population Genetics and Evolution. North Holland, Amsterdam and New York.

---, 1976 Comments on "The intensity of selection for electrophoretic variants in natural populations of Drosophila" by B. D. H. LATTER. p. 409. In: Population Genetics and Ecology. Edited by S. KARLIN and E. NEVO. Academic Press, New York.

---, 1980 Stochastic theory of population genetics and evolution. In: Mathematical Models in Biology. Edited by C. BARIGOZZI. Springer-Verlag, Berlin. (In press).


NEI, M., R. CHAKRABORTY and P. A. FUERST, 1976 Infinite allele model with varying mutation rate. Proc. Natl. Acad. Sci. U.S. 73: 4164-4168.

NEI, M., P. A. FUERST and R. CHAKRABORTY, 1976 Testing the neutral mutation hypothesis by distribution of single locus heterozygosity. Nature 262: 491-493.

---, 1978 Subunit molecular weight and genetic variability of proteins in natural populations. Proc. Natl. Acad. Sci. U.S. 75: 3359-3362.

NEI, M. and W.-H. LI, 1976 The transient distribution of allele frequencies under mutation pressure. Genet. Res. 28: 205-214.

NEI, M., T. MARUYAMA and R. CHAKRABORTY, 1975 The bottleneck effect and genetic variability in populations. Evolution 29: 1-10.

NEMETH, S. T. and M. L. TRACEY, 1979 Allozyme variability and relatedness in six crayfish species. J. Heredity 70: 37-43.

NEVO, E., 1976 Genetic variation in constant environments. Experientia 32: 858-859.

NEVO, E., H. C. DESSAUER and K.-C. CHUANG, 1975 Genetic variation as a test of natural selection. Proc. Natl. Acad. Sci. U.S. 72: 2145-2149.

NEVO, E., Y. J. KIM, C. R. SHAW and C. S. THAELER, 1974 Genetic variation, selection and speciation in Thomomys talpoides pocket gophers. Evolution 28: 1-23.

NOZAWA, K., T. SHOTAKE, Y. OHKURA and Y. TANABE, 1977 Genetic variations within and between species of Asian macaques. Japan. J. Genet. 52: 15-30.

OHTA, T., 1974 Mutational pressure as the main cause of molecular evolution and polymorphism. Nature 252: 351-354.

---, 1976 Role of very slightly deleterious mutations in molecular evolution and polymorphism. Theoret. Pop. Biol. 10: 254-275.

OHTA, T. and M. KIMURA, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22: 201-204.

PARKER, E. D. and R. K. SELANDER, 1976 The organization of genetic diversity in the parthenogenetic lizard Cnemidophorus tesselatus. Genetics 84: 791-805.

PATTON, J. L., R. K. SELANDER and M. H. SMITH, 1972 Genic variation in hybridizing populations of gophers (genus Thomomys). Systemat. Zool. 21: 263-270.

PATTON, J. L. and S. Y. YANG, 1977 Genetic variation in Thomomys bottae pocket gophers: macrogeographic patterns. Evolution 31: 697-720.

PATTON, J. L., S. Y. YANG and P. MYERS, 1975 Genetic and morphologic divergence among introduced rat populations (Rattus rattus) of the Galápagos Archipelago, Ecuador. Systemat. Zool. 24: 296-310.

PRAKASH, S., 1973a Patterns of gene variation in central and marginal populations of Drosophila robusta. Genetics 75: 347-369.

---, 1973b Low gene variation in Drosophila busckii. Genetics 75: 571-576.

---, 1977a Further studies on gene polymorphism in the main body and geographically isolated populations of Drosophila pseudoobscura. Genetics 85: 713-719.

---, 1977b Gene polymorphism in natural populations of Drosophila persimilis. Genetics 85: 513-520.

---, 1977c Genetic divergence in closely related sibling species Drosophila pseudoobscura, Drosophila persimilis and Drosophila miranda. Evolution 31: 14-23.

RAMSHAW, J. A. M., J. A. COYNE and R. C. LEWONTIN, 1979 The sensitivity of gel electrophoresis as a detector of genetic variation. Genetics 93: 1019-1037.

SAGE, R. D. and R. K. SELANDER, 1975 Trophic radiation through polymorphism in cichlid fishes. Proc. Natl. Acad. Sci. U.S. 72: 4669-4673.

SAURA, A., 1974 Genic variation in Scandinavian populations of Drosophila bifasciata. Hereditas 76: 161-172.

SAURA, A., O. HALKKA and J. LOKKI, 1973 Enzyme gene heterozygosity in small island populations of Philaenus spumarius (L.) (Homoptera). Genetica U: 459-473.


SELANDER, R. K., W. G. HUNT and S. Y. YANG, 1969 Protein polymorphism and genic heterozygosity in two European subspecies of the house mouse. Evolution 23: 379-390.

SELANDER, R. K., M. H. SMITH, S. Y. YANG, W. E. JOHNSON and J. B. GENTRY, 1971 Biochemical polymorphism and systematics in the genus Peromyscus. I. Variation in the old-field mouse (Peromyscus polionotus). Univ. Texas Publ. 7103: 49-90.

SELANDER, R. K. and S. Y. YANG, 1969 Protein polymorphism and genic heterozygosity in a wild population of the house mouse (Mus musculus). Genetics 63: 653-667.

SELANDER, R. K., S. Y. YANG, R. C. LEWONTIN and W. E. JOHNSON, 1970 Genetic variation in the horseshoe crab (Limulus polyphemus), a phylogenetic “relic.” Evolution 24: 402-414.

SENE, F. M. and H. L. CARSON, 1977 Genetic variation in Hawaiian Drosophila. IV. Allozymic similarity between D. silvestris and D. heteroneura from the island of Hawaii. Genetics 86: 187-198.

SHOTAKE, T., K. NOZAWA and Y. TANABE, 1977 Blood protein variations in Baboons. I. Gene exchange and genetic distance between Papio anubis, Papio hamadryas and their hybrid. Japan. J. Genet. 52: 223-237.

SMITH, M. H., R. K. SELANDER and W. E. JOHNSON, 1973 Biochemical polymorphism and systematics in the genus Peromyscus. III. Variation in the Florida deer mouse (Peromyscus floridanus), a Pleistocene relict. J. Mammalogy 54: 1-13.

STEINER, W. W. M., 1974 Enzyme polymorphism and desiccation resistance in two species of Hawaiian Drosophila. Ph.D. Thesis, Univ. of Hawaii.

STROBECK, C. and K. MORGAN, 1978 The effect of intragenic recombination on the number of alleles in a finite population. Genetics 88: 829-844.

TAYLOR, C. E. and G. C. GORMAN, 1975 Population genetics of a “colonising” lizard: natural selection for allozyme morphs in Anolis grahami. Heredity 35: 241-247.

TEMPLETON, A. R., H. L. CARSON and C. F. SING, 1976 The population genetics of parthenogenetic strains of Drosophila mercatorum. II. The capacity for parthenogenesis in a natural, bisexual population. Genetics 82: 527-542.

TRACEY, M. L., K. NELSON, D. HEDGECOCK, R. A. SHLESER and M. L. PRESSICK, 1975 Biochemical genetics of lobsters: genetic variation and the structure of American lobster (Homarus americanus) populations. J. Fish. Res. Board Canada 32: 2091-2101.

VRIJENHOEK, R. C., R. A. ANGUS and R. J. SCHULTZ, 1977 Variation and heterozygosity in sexually vs. clonally reproducing populations of Poeciliopsis. Evolution 31: 767-781.

WARD, R. D., 1978 Subunit size of enzymes and genetic heterozygosity in vertebrates. Biochem. Genet. 16: 799-810.

WARD, R. D. and J. A. BEARDMORE, 1977 Protein variation in the Plaice, Pleuronectes platessa L. Genet. Res. 30: 45-62.

WEBSTER, T. P., 1975 An electrophoretic comparison of the Hispaniolan lizards Anolis cybotes and A. marcanoi. Breviora 431: 1-8.

WEBSTER, T. P., R. K. SELANDER and S. Y. YANG, 1972 Genetic variability and similarity in the Anolis lizards of Bimini. Evolution 26: 523-535.

WRIGHT, S., 1949 Genetics of populations. Encycl. Britannica 10: 111-112.

YANG, S. Y., L. L. WHEELER and I. R. BOCK, 1972 Isozyme variations and phylogenetic relationships in the Drosophila bipectinata species complex. Univ. of Texas Publ. 7213: 213-227.

ZOUROS, E., 1979 Mutation rates, population sizes, and amounts of electrophoretic variation of enzyme loci in natural populations. Genetics 92: 623-646.

ZOUROS, E., C. B. KRIMBAS, S. TSAKAS and M. LOUKAS, 1974 Genic versus chromosomal variation in natural populations of Drosophila subobscura. Genetics 78: 1223-1244.

Corresponding editor: B. S. WEIR


APPENDIX

I. Allele frequency distribution for the stepwise mutation model

Although KIMURA and OHTA's (1978) formula for $\phi(x)$ is more accurate than KIMURA and OHTA's (1975), we shall use the latter, since in our case there is little difference between the values obtained by the two formulae and the latter simplifies the mathematical treatment considerably. KIMURA and OHTA's (1975) formula is

where $B(\cdot,\cdot)$ is the beta function. When $M$ varies among loci following the gamma distribution (2), with density $f(M)$, $n(p,q)$ is given by

$$n(p,q) = \sum_{i=np+1}^{nq} \binom{n}{i} \int_0^\infty f(M)\,\frac{B(A+i,\;M+n-i)}{B(M,\;A+1)}\,dM, \qquad \text{(A3)}$$

which can be evaluated numerically.

II. Estimation of $M$ or $\bar{M}$ for a large number of loci

The estimate $\hat{M}$ of $M$ for the infinite-allele model with constant mutation rate is obtained by equating the average heterozygosity $\bar{H}$ to its expectation, $M/(1+M)$; namely, $\hat{M} = \bar{H}/(1-\bar{H})$. Similarly, $M$ for the stepwise mutation model may be estimated by $\hat{M} = \bar{H}(2-\bar{H})/[2(1-\bar{H})^2]$, since the expectation of $H$ is $1-(1+2M)^{-1/2}$ (KIMURA and OHTA 1975). The corresponding estimate of $A$ is $\hat{A} = [(1+2\hat{M})^{1/2}-1]/2$.
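The constant-rate estimators can be combined with a constant-$M$ version of the summand in (A3) to give expected numbers of rare, polymorphic and monomorphic alleles. The Python sketch below illustrates this under stated assumptions and is not the authors' program. The Kimura-Ohta (1975) density is missing from this copy, so the sketch assumes $\phi(x) = x^{A-1}(1-x)^{M-1}/B(M, A+1)$. This form is an editorial reconstruction, chosen because it yields the beta-function ratio in (A3) and gives expected heterozygosity $1-(1+2M)^{-1/2}$ when $A = [(1+2M)^{1/2}-1]/2$. All function names and the illustrative values of $\bar{H}$ and $n$ are hypothetical.

```python
import math


def m_hat_iam(h_bar: float) -> float:
    """Infinite-allele model, constant rate: solve H = M/(1 + M) for M."""
    return h_bar / (1.0 - h_bar)


def m_hat_smm(h_bar: float) -> float:
    """Stepwise mutation model, constant rate: solve H = 1 - (1 + 2M)^(-1/2) for M."""
    return h_bar * (2.0 - h_bar) / (2.0 * (1.0 - h_bar) ** 2)


def a_hat(m: float) -> float:
    """A = [(1 + 2M)^(1/2) - 1] / 2, as in the text."""
    return (math.sqrt(1.0 + 2.0 * m) - 1.0) / 2.0


def log_beta(x: float, y: float) -> float:
    """Natural log of the beta function B(x, y), computed via lgamma to avoid overflow."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def expected_count(m: float, n: int, i: int) -> float:
    """Expected number of alleles seen exactly i times in a sample of n genes.

    ASSUMPTION (editorial reconstruction): phi(x) = x^(A-1) (1-x)^(M-1) / B(M, A+1)
    with M constant, so the count is C(n, i) * B(A+i, M+n-i) / B(M, A+1).
    Requires m > 0.
    """
    a = a_hat(m)
    log_binom = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return math.exp(log_binom + log_beta(a + i, m + n - i) - log_beta(m, a + 1))


def expected_alleles_smm(m: float, n: int, p: float, q: float) -> float:
    """Sum expected_count over np < i <= nq, i.e. one frequency class, constant M."""
    eps = 1e-9  # guards against n*p landing just below an integer in floating point
    lo = math.floor(n * p + eps) + 1
    hi = math.floor(n * q + eps)
    return sum(expected_count(m, n, i) for i in range(lo, hi + 1))


if __name__ == "__main__":
    h_bar, n = 0.1, 100  # illustrative values only, not data from the paper
    m = m_hat_smm(h_bar)
    print(f"IAM M-hat = {m_hat_iam(h_bar):.4f}, SMM M-hat = {m:.4f}, A-hat = {a_hat(m):.4f}")
    classes = (("rare", 0.0, 0.05), ("polymorphic", 0.05, 0.95), ("monomorphic", 0.95, 1.0))
    for label, p, q in classes:
        print(f"{label:12s} {expected_alleles_smm(m, n, p, q):.4f}")
```

Extending the sketch to (A3) would mean averaging `expected_count` over the gamma density $f(M)$, for example by numerical quadrature. That step is left out to keep the example short.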

In the case of the infinite-allele model with varying mutation rate, we assume $\alpha = 1$, following our earlier studies. Then $\beta \equiv \alpha/\bar{M} = 1/\bar{M}$, and the expectation of $H$ is $1 - \beta e^{\beta} E_1(\beta)$, where $E_1(\beta) = \int_\beta^\infty e^{-t}\,dt/t$ (NEI, CHAKRABORTY and FUERST 1976). The $\bar{H}$ values so far observed in natural populations are all less than 0.35, which implies $\beta > 1$, the range for which the following approximation is given. Hence $\beta e^{\beta} E_1(\beta)$ may be expressed as

$$\beta e^{\beta} E_1(\beta) = \frac{\beta^2 + a_1\beta + a_2}{\beta^2 + b_1\beta + b_2}$$

with an accuracy of $5 \times 10^{-5}$, where $a_1 = 2.334733$, $a_2 = 0.250621$, $b_1 = 3.330657$ and $b_2 = 1.681534$ (ABRAMOWITZ and STEGUN 1965, p. 231). Therefore, $\bar{M}$ may be estimated by

$$\hat{\bar{M}} = \frac{2\bar{H}}{b_1(1-\bar{H}) - a_1 + \sqrt{\{a_1 - b_1(1-\bar{H})\}^2 - 4\bar{H}\{a_2 - b_2(1-\bar{H})\}}}, \qquad \text{(A4)}$$

where $\bar{H} \le 0.5$. When $\bar{H} > 0.5$, the term involving the square root takes a negative sign.

NEI, CHAKRABORTY and FUERST (1976) have shown that the expected heterozygosity for the stepwise mutation model with varying mutation rate is $1 - \sqrt{\pi\beta/2}\,\exp(\beta/2)\,\operatorname{erfc}(\sqrt{\beta/2})$ for $\alpha = 1$. Here $\operatorname{erfc}(x) = (2/\sqrt{\pi})\int_x^\infty \exp(-t^2)\,dt$ is the complementary error function and $\beta = 1/\bar{M}$. Using this result, we have tabulated the relationship between $H$ and $\bar{M}$, so $\bar{M}$ can be estimated from the average heterozygosity.
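The varying-rate estimators can be sketched in the same way. In the Python below, `m_bar_iam_varying` applies (A4) directly, and the stepwise case inverts the closed-form expectation numerically. Several choices are assumptions of this sketch, not the authors' procedure: the function names, and the geometric bisection over the arbitrary bracket $10^{-3} \le \bar{M} \le 10^{3}$, which stands in for the authors' table. Only the $\bar{H} \le 0.5$ branch of (A4) is implemented. Abramowitz and Stegun give the rational approximation for $\beta \ge 1$, which corresponds to roughly $\bar{H} \le 0.40$ here.

```python
import math

# Constants of the rational approximation to x e^x E1(x) (Abramowitz and Stegun
# 1965, p. 231); stated there for x >= 1, i.e. beta >= 1 in the notation above.
A1, A2 = 2.334733, 0.250621
B1, B2 = 3.330657, 1.681534


def h_iam_varying(m_bar: float) -> float:
    """Expected H, infinite-allele model, alpha = 1: 1 - beta e^beta E1(beta),
    with beta e^beta E1(beta) replaced by the rational approximation."""
    beta = 1.0 / m_bar
    return 1.0 - (beta**2 + A1 * beta + A2) / (beta**2 + B1 * beta + B2)


def m_bar_iam_varying(h_bar: float) -> float:
    """Equation (A4), positive square-root branch (valid for H-bar <= 0.5)."""
    if not 0.0 < h_bar <= 0.5:
        raise ValueError("this sketch implements only the 0 < H-bar <= 0.5 branch")
    k = 1.0 - h_bar
    disc = (A1 - B1 * k) ** 2 - 4.0 * h_bar * (A2 - B2 * k)
    return 2.0 * h_bar / (B1 * k - A1 + math.sqrt(disc))


def h_smm_varying(m_bar: float) -> float:
    """Expected H, stepwise mutation model, alpha = 1:
    1 - sqrt(pi*beta/2) * exp(beta/2) * erfc(sqrt(beta/2)), with beta = 1/M-bar."""
    beta = 1.0 / m_bar
    x = math.sqrt(beta / 2.0)
    return 1.0 - math.sqrt(math.pi * beta / 2.0) * math.exp(beta / 2.0) * math.erfc(x)


def m_bar_smm_varying(h_bar: float, lo: float = 1e-3, hi: float = 1e3) -> float:
    """Invert h_smm_varying by geometric bisection (H increases with M-bar).
    The default bracket is an arbitrary choice that keeps exp(beta/2) finite."""
    if not h_smm_varying(lo) < h_bar < h_smm_varying(hi):
        raise ValueError("H-bar lies outside the bracketed range")
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if h_smm_varying(mid) < h_bar:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    for h_bar in (0.05, 0.1, 0.2, 0.3):  # illustrative values only
        print(f"H-bar = {h_bar:.2f}: IAM M-bar = {m_bar_iam_varying(h_bar):.4f}, "
              f"SMM M-bar = {m_bar_smm_varying(h_bar):.4f}")
```

Bisection is used only because $H$ increases monotonically with $\bar{M}$. Any bracketing root-finder would serve equally well.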