8
Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data Li Jin and Ranajit Chakraborty Center for Demographic and Population Genetics, University of Texas Houston Health Science Center DNA fingerprinting exhibits multilocus genotypes of individuals, detected by the use of a single multilocus probe. Consequently, population data on DNA fingerprinting do not provide a complete characterization of the genetic variation in terms of allele-frequency distributions, since neither the number of loci nor the locus affiliation of alleles is directly observable. Yet DNA fingerprinting has been proved to be a cost-effective method of detecting hypervariable polymorphisms in several organisms, where the traditional loci fail to detect enough variation for microevolutionary studies. In the present paper we demonstrate that the above-mentioned features of DNA fin- gerprinting data do not cause any serious problem when they are used in evolutionary studies. Bias-corrected estimators of Nei’s standard and minimum genetic distances are derived, and, by an application of this theory to data on seven short tandem repeat loci in three major human populations, it is shown that these modified measures of genetic distances based on DNA fingerprint patterns are quite close to Nei’s distances based on locus-specific allele frequencies. Empirical as well as theoretical support of the adequacy of such genetic distances from DNA fingerprinting data is also discussed, and it indicates that the technical limitations of DNA fingerprinting should not deter the use of the method for short-term evolutionary studies. Introduction The polymorphic loci known as variable number of tandem repeat (VNTR) loci are highly polymorphic in humans and in many other organisms (Burke and Bruford 1987; Nakamura et al. 1987; Gilbert et al. 1990; Georges et al. 199 1; Ely et al. 1992). They are charac- terized by a large number of alleles per locus and high heterozygosity. Consequently, the VNTR loci, as a new class of genetic marker, have proved to be extremely useful in gene mapping (Nakamura et al. 1987; Hearne et al. 1992)) forensic identification of individuals (Jef- freys et al. 1985 c; Chakraborty and Kidd 199 1), deter- mination of relatedness of individuals (Jeffreys et al. 1985a, 1991; Chakraborty and Jin 1993a, 1993b), and evolutionary studies of closely related populations or species (Gilbert et al. 1990; Jin and Chakraborty, ac- cepted ) . A DNA fingerprint is the pattern revealed by a sin- gle multilocus probe (MLP) that simultaneously detects many minisatellite loci in the genome, by Southern blot Key words: DNA fingerprint, genetic distances, coefficient of gene diversity, population genetics. Address for correspondence and reprints: Dr. Ranajit Chakraborty, Center for Demographic and Population Genetics, University of Texas Graduate School of Biomedical Sciences, P.O. Box 20334, Houston, Texas 77225. Mol. Biol. Evol. 1 l( 1): 120-127. 1994. 0 1994 by The University of Chicago. All rights reserved. 0737-4038/94/l lOi-0012$02.00 120 analysis (Jeffreys et al. 1985b, 1988; Wong et al. 1986, 1987; Nakamura et al. 1987). Minisatellite loci are the VNTR loci whose repeat units are > 15 bp long. The genetic basis of a DNA fingerprint revealed by an MLP is that the probe is homologous to a large number of chromosomally dispersed genomic loci, many of which exhibit considerable genetic polymorphism (Nakamura et al. 1987; Armour et al. 1990). The simultaneous screening of many hypervariable loci makes DNA fin- gerprinting by MLPs efficient and cost-effective. Several efforts have been made to utilize MLP data for evolutionary studies, since the first demonstration of their utility in genetic studies (Jeffreys et al. 1985b). However, it is noteworthy that the existing statistical measures, including Nei’s genetic distances ( Nei 1987 ) , are not directly applicable to MLP data, since the num- ber of loci detected by an MLP is unknown and since the allele frequencies at each locus are undetermined. Lynch ( 1990, 199 1) proposed an empirical similarity measure on the basis of allele sharing. A very similar approach was used by Yuhki and O’Brien (1990) and Gilbert et al. ( 1990, 199 1) in their analyses. Unfortu- nately, the lack of a theoretical study of the relationship of those genetic distance measures with the time of di- vergence makes the approach less appealing. However, it has been shown that measures of genetic distance from some summary statistics based on allele sharing can be

Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

  • Upload
    lambao

  • View
    236

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Li Jin and Ranajit Chakraborty Center for Demographic and Population Genetics, University of Texas Houston Health Science Center

DNA fingerprinting exhibits multilocus genotypes of individuals, detected by the use of a single multilocus probe. Consequently, population data on DNA fingerprinting do not provide a complete characterization of the genetic variation in terms of allele-frequency distributions, since neither the number of loci nor the locus affiliation of alleles is directly observable. Yet DNA fingerprinting has been proved to be a cost-effective method of detecting hypervariable polymorphisms in several organisms, where the traditional loci fail to detect enough variation for microevolutionary studies. In the present paper we demonstrate that the above-mentioned features of DNA fin- gerprinting data do not cause any serious problem when they are used in evolutionary studies. Bias-corrected estimators of Nei’s standard and minimum genetic distances are derived, and, by an application of this theory to data on seven short tandem repeat loci in three major human populations, it is shown that these modified measures of genetic distances based on DNA fingerprint patterns are quite close to Nei’s distances based on locus-specific allele frequencies. Empirical as well as theoretical support of the adequacy of such genetic distances from DNA fingerprinting data is also discussed, and it indicates that the technical limitations of DNA fingerprinting should not deter the use of the method for short-term evolutionary studies.

Introduction

The polymorphic loci known as variable number of tandem repeat (VNTR) loci are highly polymorphic in humans and in many other organisms (Burke and Bruford 1987; Nakamura et al. 1987; Gilbert et al. 1990; Georges et al. 199 1; Ely et al. 1992). They are charac- terized by a large number of alleles per locus and high heterozygosity. Consequently, the VNTR loci, as a new class of genetic marker, have proved to be extremely useful in gene mapping (Nakamura et al. 1987; Hearne et al. 1992)) forensic identification of individuals (Jef- freys et al. 1985 c; Chakraborty and Kidd 199 1 ), deter- mination of relatedness of individuals (Jeffreys et al. 1985a, 1991; Chakraborty and Jin 1993a, 1993b), and evolutionary studies of closely related populations or species (Gilbert et al. 1990; Jin and Chakraborty, ac- cepted ) .

A DNA fingerprint is the pattern revealed by a sin- gle multilocus probe (MLP) that simultaneously detects many minisatellite loci in the genome, by Southern blot

Key words: DNA fingerprint, genetic distances, coefficient of gene diversity, population genetics.

Address for correspondence and reprints: Dr. Ranajit Chakraborty, Center for Demographic and Population Genetics, University of Texas Graduate School of Biomedical Sciences, P.O. Box 20334, Houston, Texas 77225.

Mol. Biol. Evol. 1 l( 1): 120-127. 1994. 0 1994 by The University of Chicago. All rights reserved. 0737-4038/94/l lOi-0012$02.00

120

analysis (Jeffreys et al. 1985b, 1988; Wong et al. 1986, 1987; Nakamura et al. 1987). Minisatellite loci are the VNTR loci whose repeat units are > 15 bp long. The genetic basis of a DNA fingerprint revealed by an MLP is that the probe is homologous to a large number of chromosomally dispersed genomic loci, many of which exhibit considerable genetic polymorphism (Nakamura et al. 1987; Armour et al. 1990). The simultaneous screening of many hypervariable loci makes DNA fin- gerprinting by MLPs efficient and cost-effective.

Several efforts have been made to utilize MLP data for evolutionary studies, since the first demonstration of their utility in genetic studies (Jeffreys et al. 1985b). However, it is noteworthy that the existing statistical measures, including Nei’s genetic distances ( Nei 1987 ) , are not directly applicable to MLP data, since the num- ber of loci detected by an MLP is unknown and since the allele frequencies at each locus are undetermined. Lynch ( 1990, 199 1) proposed an empirical similarity measure on the basis of allele sharing. A very similar approach was used by Yuhki and O’Brien (1990) and Gilbert et al. ( 1990, 199 1) in their analyses. Unfortu- nately, the lack of a theoretical study of the relationship of those genetic distance measures with the time of di- vergence makes the approach less appealing. However, it has been shown that measures of genetic distance from some summary statistics based on allele sharing can be

Page 2: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Genetic Distance from DNA Fingerprint 121

used for evolutionary studies (Chakraborty and Jin 1993b; Jin and Chakraborty, accepted), since the mu- tation-drift expectations of such genetic distances theo- retically retain a linear relationship with the time of di- vergence of two populations studied (t) when t is not very large (say, t d N, generations, where N, is the ef- fective size of the population ) .

A statistic proposed by Stephens et al. ( 1992) to estimate heterozygosity shed some light on the utility of traditional genetic distance measures based on allele- frequency distributions for DNA fingerprint data. They showed that the specific distribution of allele frequencies for each locus is not needed in estimating the average heterozygosity. Therefore, the estimation of average heterozygosity is possible because the frequency of each allele can be estimated from the frequency of the cor- responding band. However, their estimator is biased. Jin and Chakraborty ( 1993) therefore proposed bias-cor- rected estimators of the average heterozygosity and of the number of loci, from DNA fingerprint band-fre- quency data in a population sample.

In the present paper, we argue that genetic distance measures (e.g., Nei’s minimum genetic distance and Nei’s standard genetic distance) can be written in a form in which the specific distributions of allele frequencies for each locus are not needed, and that, in turn, these genetic distances can be estimated from the frequencies of the bands corresponding to the alleles. With the same logic, Nei’s ( 1973 ) coefficient of gene diversity, GST, can also be estimated. The application of various genetic distance measures to a group of short tandem repeat (STR) loci (Edwards et al. 1992) indicates that Nei’s genetic distances do not differ from their modified ver- sion for DNA fingerprint data, on the basis of MLP. The technical limitations of MLP data are also discussed.

Theory The Alternative Expressions of Nei’s Genetic Distances

Nei’s ( 1972 ) minimum genetic distance is defined

by

& = KJ~+J~W~-J~Y, (1)

where

L n1

Jx = c c P&r,lL 9

I=1 i=l

L ni

JY = c c PfY(l)IL 9

I=1 i=l

JXY = C C Pi.x(l)Piy(l)lL 7

I=1 i=l

in which pix(/) and piy(/) are the frequencies of the ith allele of the Ith locus from populations X and Y, re- spectively. L is the number of loci, and nI is the number of alleles at the Ith locus. Note that Jx, Jr, and Jxy can also be written as

Jx = 2 P&IL, k=l

JY = 5 P:,IL > (3) k=l

JXY = 5 PkxPky/ L 9

k=l

where pkX and pky are the frequencies of the kth allele among all alleles, regardless of their locus affiliation, and A is the total number of alleles at L loci considered (A = Ck, r-21).

The Estimation of Number of Loci and Heterozygosity

Since all bands (alleles) observed are effectively dominant, the expectation of the relative frequency of occurrence of the kth band (allele), Sk, is given by

E(Sk) = 2Pk - p; t (4)

where pk is the frequency of the kth allele among all alleles for one particular population, and Hardy-Wein- berg equilibrium is assumed (Stephens et al. 1992). Furthermore, by solving equation (4)) pk can be esti- mated by

jk= 1 -h -Sk. (5) and the standard genetic distance (Nei 1972) is defined

by

D, = [(log Jx+lw Jd/Wlw JXY,

It was suggested by Jin and Chakraborty ( 1993) that the average heterozygosity in a population (H) and the number of loci (L) for single-probe multilocus DNA

(2 1 fingerprints can be estimated (bias-corrected) by

Page 3: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

122 Jin and Chakraborty

AP

c Sk/v-iy which is based on the Taylor series expansion, and

l!,=L,+ 2 (l-\J1-sk)-k=’ 8n the binomial distribution of &x and sky, leading

, (@ to V(Skn-) = E(Skx)[l-~(~kx)]h’, k=l

and v(sky) = E( skv) [ I- E( sky)] /ny, are used in deriving equa- tion (9).

and This yields a bias-corrected estimate of Jxu,

^ JXY = JXY - A,

(7) where

(11)

where A, is the total number of alleles at all polymorphic loci, L, is the number of monomorphic loci, and n is the number of individuals sampled from the population. Since J = 1 - H, the bias-corrected estimates of Jx and Jy can be obtained from equation ( 7 ) .

VEX (1-G)lnx

(12)

++(l--Gx)/ny SL, sky Ii The Estimation of Genetic Distance for DNA Fingerprinting

Following a similar logic, we now derive a bias- corrected estimate of Jxy which may lead to bias-cor- rected estimates of Nei’s genetic distances (for both minimum distance and standard distance).

The probability ( Jxy) that two alleles, one each chosen from of the two populations, are identical can be estimated by substituting pk’s by using equation (5), yielding

with J!, estimated by taking the average of the two pop- ulation-specific estimates of L obtained from equa- tion (6).

Therefore, the bias-corrected estimates of Nei’s minimum distance and standard distance can be ob- tained by substituting jxy, jx, and jy in equations ( 1) and (3). These become

Al,= H~x+~Ywl -A,, (13)

and

A

jxY = 2 iktiky/L k=l

B3f. = [(log &+1og jy)/2] - log &y . (14)

( 8 ) The Estimation of Coefficient of Gene Diversity \ I

= 5 (l-G)( l-=)/L, k=l

Nei ( 1973 ) defined the coefficient of gene diversity, GST, as

GST = DST

where Sk-v and sky are the Sk’s in populations X and Y, Hs+ DST’ (15)

respectively, and f, is the estimated number of loci. It can be shown that where DST = ck 21 Dk,/s2, in which & is Nei’s ( 1972)

minimum genetic distance between the kth and Zth

E[Jxrl = i~xU-~Vn~ 1 populations, and Hs = ck Hk/s, in which Hk is the het- erozygosity of kth population. By using equation ( 13 ), GSr can be estimated by

&(l-G)in, gL (9) V IV 2 2 &/

where nx and ny are the number of individuals sampled from populations X and Y, respectively. The approxi- mation

k I GsT= 1 ” 7

S ; Hk + z T & (16)

where equations ( 13) and (7) are used for estimating 1

E[( 1-IJ1_skx)( l-G)] &I and Hk, IXSpeCtiVdy.

= Pkdky + V(Skx)Pky( l-Pkx)-3/8 Numerical Results

Recently, Edwards et al. ( 199 1) described several + V(Sky)Pkx( l-Pky)-3/8 , ( 10) STR loci, each of which demonstrates considerable de-

Page 4: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Genetic Distance from DNA Fingerprint 123

Table 1 Estimation of the Number of Loci and Heterozygosity for STR Loci

ESTIMATES OF HETEROZYGOSITY FROM

METHOD AND POPULATION

ESTIMATE OF No. OF LOCI

Multilocus Data

Allele Frequencies

Five locus: Caucasian Blacks Asians .

Seven locus: Caucasian Blacks Asians

5.04 0.635 0.638 5.03 0.733 0.730 4.7 1 0.628 0.607

6.88 0.700 0.694 6.88 0.762 0.762 6.84 0.669 0.654

grees of polymorphism within populations. The popu-

lation genetic characteristics of five of these loci (THO 1, RENA4, FARB, HPRTB, and ARA) are described else- where (Edwards et al. 1992) in Caucasians ( 200 indi- viduals) , American blacks ( 200 individuals), and Asians (80 individuals) currently residing in Houston. Two more STR loci (CD4 and PLA2Al) have been typed recently for the same individuals from the populations mentioned above (H. Hammond, personal communi- cation ) .

For each individual from whom genotype data at all loci are available, an MLP DNA fingerprinting pattern can be generated by merging the genotype data of the seven STR loci together, so that each individual would exhibit 7 (for those who are homozygous at all loci) to 14 distinct bands (alleles, for those who are heterozygous at all loci). Therefore, such merged data mimics a seven- locus DNA fingerprinting profile for each individual sampled. Table 1 shows the estimates of the number of loci (by using eq. [ 6 1) and average heterozygosity (by using eq. [ 71)) which may be contrasted with the true number of loci and the unbiased estimate (Nei 1978) of the average heterozygosity based on locus-specific al- lele-frequency data. We present such comparisons with two types of data merging (five locus and seven locus), since two (HPRTB and ARA) of the seven STR loci scored in this survey of three populations are X linked. Therefore, the five-locus genotype data consist of ge- notypic information on all individuals (male and female) sampled (373 total), while the seven-locus data consist of genotypic information on the females only (yielding 15 1 individuals total). For both comparisons, the esti- mates of H from the multilocus genotype data (mim- icking DNA fingerprinting) are reasonably close to the estimates based on locus-specific allele frequencies, and the error of prediction of the true number of loci is also

small. The estimates for the Asian sample exhibit a greater deviation. This is probably because the sample size for this case is the smallest (7 1 for the five-locus data, and 27 for the seven-locus data). Of course, since the Asian sample is a composite of individuals of several different national populations (e.g., Taiwanese, Filipino, Japanese, Chinese, and Vietnamese), possible substruc- truing effects may also influence such departures. It should be recalled, however, that Edwards et al. ( 1992) did not find any significant departure from the Hardy- Weinberg expectations of locus-specific genotype fre- quencies in this sample.

The computations of the modified distances (Dsf from eq. [ 141 and D,ffrom eq. [ 131) are shown in table 2 for these merged DNA profiles. For comparison, we also computed the bias-corrected Nei’s standard distance and minimum distance estimates (Nei 1978 ) on the basis of the locus-specific allele frequencies, as reported by Edwards et al. ( 1992). A fifth estimate of genetic dis- tance, Di, also shown in this table, is based on allele sharing between individuals, Di = 1 - [ nb/ ( n,l + n,2)], where n,l and nw2 are the average numbers of allele shared between individuals within the two populations, and nb is the average number of shared alleles between two individuals drawn one each from the two popula- tions (Jin and Chakraborty, accepted). These values

(n wl f nw2, and ?zb) were computed by taking averages over all possible pairs of individuals in the sample. We considered this measure of genetic distance, since Jin and Chakraborty (accepted) have shown that the evo- lutionary dynamics of Di is similar to that of Nei’s stan- dard genetic distance (0,) on the basis of locus-specific allele frequencies when the time of divergence between two populations is small, even though the computation of the statistic Di does not require the information of locus affiliation of alleles. Since Nei’s minimum distance measures (D, and D,f) have an asymptotic value of

Table 2 Estimates of Nei’s Distances Obtained from Allele-Frequency Data and from DNA Fingerprint Data for STR Loci

GENETIC DISTANCES

POPULATIONS COMPARED DSf DS D m/ D, Di

Five locus: C-B 0.1560 0.1548 0.1547 0.1527 0.1559 C-A . 0.1133 0.1099 0.1071 0.1048 0.1134 B-A . 0.1510 0.1422 0.1519 0.1478 0.1475

Seven locus: C-B . 0.1483 0.1496 0.1434 0.1462 0.1503 C-A 0.092 1 0.1136 0.0892 0.1084 0.0977 B-A _. _. 0.1401 0.1414 0.1757 0.1458 0.1430

NOTE.-C = Caucasian; B = American black; and A = Asian.

Page 5: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

124 Jin and Chakraborty

( J_+Jy)/2 (which occurs when the two populations, X and Y, do not share any allele at all loci), we divided these measures by the average homozygosity, ( JX+ Jr)/ 2, in the two populations, so that the genetic distance values have a range of 0 to 1.

As in table 1, the genetic distances between the three pairs of populations were computed separately for merged genotype data for five loci and seven loci. These numerical illustrations indicate that the multilocus ge- notype-based estimates of genetic distances (D,f and Dm,f) are close to the respective estimates based on locus- specific allele frequencies (D, and D,), even though the former ones utilize the frequency of each band (allele) from DNA fingerprint patterns without the recognition of locus affiliation of bands (alleles). The band (allele) sharing-based estimate ( Dj ) is also close to the estimate of Nei’s standard distance (D,, based on locus-specific allele frequencies) or its modified version ( D,J, based on the multilocus data), which is consistent with the pre- diction of Jin and Chakraborty (accepted). Therefore, we conclude that the incomplete information (i.e., the unknown number of loci and the lack of knowledge of locus affiliation of alleles) in DNA fingerprinting data has little effect on the estimation of genetic distances between populations. The bias corrections, however, are critical, since without them (i.e., when the last term of eq. [ 61 and the term A of eq. [ 111 are ignored), the genetic distances would have been overestimated to the extent of 5%-37.5% of the amounts shown in table 2.

The coefficient of gene diversity, GST, estimated from the DNA fingerprinting data by using equation ( 16), for these three populations becomes 6.46% for the five-locus data and 6.00% for the seven-locus data. Those values are virtually identical (6.40% and 5.95%, respec- tively) to the estimates from locus-specific allele fre- quencies, using Nei and Chesser’s ( 1983) method.

Discussion

We have shown elsewhere that a bias correction for estimating the average heterozygosity and number of loci from DNA fingerprinting data is important (Jin and Chakraborty 1993 ) , and, because of the sampling prop- erty of sk and its relationship with allele frequencies (eq. [ 4]), the method of bias correction suggested by Nei ( 1978) and invoked by Stephens et al. ( 1992) must be modified in this context. In the present paper we observe that the same modified method of bias correction applies to the estimation of genetic distances as well, and in this case the bias correction is even more important. For example, in the numerical examples of the STR loci for the three major human populations given above, the bias of genetic distance estimates is 5%-37.5%, when the last term of equation ( 6) and the term A of equation ( 11) are not used. Therefore, we conclude that, even

with our approximations, the need for bias correction is even more critical in the estimation of genetic dis- tances, as compared with the estimation of the average heterozygosity and number of loci alone.

The close correspondence between the estimates based on locus-specific allele frequencies and the ones from multilocus data demonstrates that, even though the DNA fingerprinting patterns neither provide infor- mation with regard to the total number of loci underlying such DNA profiles nor recognize the locus affiliation of alleles, they may be used to estimate most of the im- portant summary measures of genetic variation for evo- lutionary studies (e.g., the average heterozygosity, genetic distance, and the coefficient of gene diversity). The technical limitations of DNA fingerprinting do not se- riously affect the predictability of these measures of ge- netic variation, as shown in the illustration above.

The inference regarding the average heterozygosity and genetic distance, based on DNA fingerprinting data, requires two additional assumptions that are not needed when each locus is individually characterized. First, the Hardy-Weinberg rule is assumed for each population, when the allele frequencies at the underlying loci in DNA fingerprinting data are estimated from the relative fre- quencies of each band within populations (sk; see eq. [ 5 I). This assumption has been a subject of considerable debate in recent years (Lander 1989; Cohen 1990; Le- wontin and Hart1 199 1). But, it has been widely dem- onstrated, in terms of traditional genetic markers (Mourant et al. 1976), as well as with hypervariable polymorphic loci (Devlin et al. 1990; Chakraborty and Kidd 199 1; Chakraborty et al. 199 1, 1993; Deka et al. 1992, and accepted; Devlin and Risch 1992; Edwards et al. 1992; Weir 1992), that even the cosmopolitan populations that are demographically substructured do not exhibit any substantial departure from the Hardy- Weinberg expectation of genotype frequencies. Popu- lation substructure affects the estimation of the total number of segregating alleles within a population, and, even there, in the presence of population substructure, an apparent excess of only the rare alleles is observed (Chakraborty et al. 1988; Chakraborty 1990). Since the rare alleles do not contribute substantially either in the evaluation of the average heterozygosity or in the esti- mation of genetic distance (Nei 1978 ) , we contend that the presence of population substructure should not greatly affect the estimation of allele frequencies from the relative frequencies of specific fingerprinting bands (sk). The number of individuals sampled is probably more important, since, when the sample size is small, the imprecision of the estimate of Sk values may accu- mulate more error than any real effect of substructuring.

Second, in our illustration of merged genotype data we treated the different alleles from different loci as sep-

Page 6: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Genetic Distance from DNA Fingerprint 125

arate bands; hence, the alleles are equated to bands. In contrast, in real DNA fingerprint data, comigration of DNA fragments in electrophoresis, resulting from alleles at different loci, may violate this assumption. While em- pirical data on the extent of comigration in DNA fin- gerprinting are not yet available, there are some studies suggesting that the assumption of no comigration may not be a serious problem in using DNA fingerprinting data for evolutionary studies. For example, in a position- by-position analysis of fragment sizes, Krawczak and Bockel( 1992) and Bockel et al. ( 1992) showed that the problem of corn&ration may be examined by postulating “position-specific genetic factors,” F, an unobservable variable. When F > 0, an individual’s DNA fingerprint pattern would exhibit the presence of bands at that po- sition. The relative frequencies of the presence of bands at specific positions (x, in the terminology of Bockel et al. 1992)) in turn, predict how large the values of F can be in a DNA fingerprint database from population sur- veys. In general, values of x do not exceed 0.15-0.25, so that the probability of F b 2 is ~0.035, under the assumption that F is distributed as a Poisson variate. This suggests that comigration of alleles at different loci is not a very common phenomenon. Furthermore, there is no evidence that alleles of any specific frequency would be more likely to comigrate than the ones that form distinctly different bands.

DNA fingerprinting data and the data generated by several probes must be used with some reservations for evolutionary studies for other reasons as well (e.g., see Kidd et al. 199 1). For example, since polymorphisms at such loci are due to copy-number variation of tan- demly repeated units and since the exact mechanism of production of new alleles is not yet known, their utility for studying genetic affinities of populations has been questioned. However, although new genetic variability at such loci seems to be generated by complex processes, such as strand slippage during DNA replication or non- homologous sister-chromatid exchange, recent studies suggested that different classes of VNTR loci have pat- terns of genetic variation that agree consistently with the two main mutation-drift models of population genetics. There is empirical (Levinson and Gutman 1987a, 1987b; Kwiatkowski et al. 1992; Schlijtterer and Tautz 1992; Bowcock et al. 1993; Mahtani and Willard 1993 ) as well as theoretical (Shriver et al. 1993; Valdes et al. 1993) support of the assumption that the hypervariable loci whose repeat units are short ( l-5 bp, called “microsatel- lite,” or “STR,” loci) follow the predictions of the step- wise-mutation model (Ohta and Kimura 1973). In con- trast, minisatellite loci with repeat units b 15 bp follow the prediction of the infinite-allele model (Clark 1987; Jeffreys et al. 1988; Flint et al. 1989). Furthermore, even

though a full dissection of all loci underlying the MLPs used for DNA fingerprinting is yet to be made, it is known that these loci are of minisatellite nature (Jeffreys et al. 1985b; Wong et al. 1987; Armour et al. 1990) and that they are located on chromosomal bands dispersed in the genome that are identifiable from in situ hybrid- ization of metaphase chromosomes (Royle et al. 1988; Zischler et al. 1989). Therefore, the evolutionary dy- namics of measures of genetic distances (D,f or 0,s) that are based on the DNA fingerprint should have properties identical to the locus-specific, allele fre- quency-based measures of Nei’s minimum (Dm) and standard (0,) distances under the infinite-allele model (Li and Nei 1975).

In summary, the present work demonstrates that, even though the DNA fingerprinting data give neither a full characterization of all loci nor the allele frequency distributions at individual loci, they are useful for evo- lutionary studies, and Nei’s genetic distance can be used for such purposes. Since the hypervariability at such loci is caused by a high mutation rate (Jeffreys et al. 1988), DNA fingerprinting data may be useful only for studying short-term evolution. The abundance of minisatellite loci detected by DNA fingerprinting has been noted in birds (Burke and Bruford 1987; Hillel et al. 1989; Burke et al. 199 I), marmosets (Dixson et al. 1988), dogs and cats (Jeffreys and Morton 1987), foxes (Gilbert et al. 1990), lions (Gilbert et al. 199 1)) and many other an- imals (see citations in Kelly et al. 1989). Therefore, the use of the method developed in this paper will not be restricted to microevolutionary studies in humans alone. Indeed, for some of the species listed above, genetic dis- tances based on classical genetic markers are almost use- less for studying evolutionary relationships between closely related populations and species, because of an extremely low level of genetic variation. In contrast, DNA fingerprint variation in highly inbred populations is seen to be discriminatory enough to validate the history of establishment of such populations (Gilbert et al. 1990).

Acknowledgments

This research was supported by U.S. Public Health Service research grants GM 4 1399 and GM 458 16 from the U.S. National Institutes of Health and by grant 92- IJ-CX-K024 from the National Institute of Justice. We thank Dr. M. Nei and an anonymous reviewer for their constructive suggestions that improved the presentation of the results.

LITERATURE CITED

ARMOUR, J. A., S. POVEY, S. JEREMIAH, and A. J. JEFFREYS. 1990. Systematic cloning of human minisatellites from or- dered array charomid libraries. Genomics 850 l-5 12.

Page 7: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

126 Jin and Chakraborty

BOCKEL, B., P. NORNBERG, and M. KRAWCZAK. 1992. Like- lihoods of multilocus DNA fingerprints in extended families. Am. J. Hum. Genet. 51:554-561.

BOWCOCK, A., S. OSBORNE-LAWRENCE, R. BARNES, A. CHAKRABARTI, S. WASHINGTON, and C. DUNN. 1993. Mi- crosatellite polymorphism linkage map of human chro- mosome 13s. Genomics 15:376-386.

BURKE, T., and M. W. BRUFORD. 1987. DNA fingerprinting in birds. Nature 325: 149- 152.

BURKE, T., 0. HANOTTE, M. W. BRUFORD, and E. CAIRNS. 199 1. Multilocus and single locus minisatellite analysis in population biological studies. Pp. 154-168 in T. BURKE, G. DOLF, A. J. JEFFREYS, and R. WOLFF, eds. DNA fin- gerprinting: approaches and applications. Birkhauser, Basel.

CHAKRABORTY, R. 1990. Genetic profile of cosmopolitan populations: effects of hidden subdivision. Anthropol. Anz. 48:313-331.

CHAKRABORTY, R., M. FORNAGE, R. GUEGUE, and E. BOER- WINKLE. 199 1. Population genetics of hypervariable loci: analysis of PCR based VNTR polymorphism within a pop- ulation. Pp. 127-143 in T. BURKE, G. DOLF, A. J. JEFFREYS, and R. WOLFF, eds. DNA fingerprinting: approaches and applications. Birkhauser, Basel.

CHAKRABORTY, R., and L. JIN. 1993a. Determination of re- latedness between individuals by DNA fingerprinting. Hum. Biol. 65:875-895.

p. 1993b. A unified approach to study hypervariable polymorphisms: statistical considerations of determining relatedness and population distances. Pp. 153-l 75 in S. D. J. PENA, R. CHAKRABORTY, J. T. EPPLEN, and A. J. FREYS, eds. DNA fingerprinting: state of the science. Birk- hauser, Basel.

CHAKRABORTY, R., and K. K. KJDD. 199 1. The utility of DNA typing in forensic work. Science 254: 1735- 1739.

CHAKRABORTY, R., P. E. SMOUSE, and J. V. NEEL. 1988. Pop- ulation amalgamation and genetic variation: observations on artificially agglomerated tribal populations of Central and South America. Am. J. Hum. Genet. 43:709-725.

CHAKRABORTY, R., M. R. SRINIVASAN, and M. DE ANDRADE. 1993. Intraclass and interclass correlations of allele sizes within and between loci in DNA typing data. Genetics 133: 41 l-419.

CLARK, A. G. 1987. Neutrality tests of highly polymorphic restriction-fragment-length polymorphisms. Am. J. Hum. Genet. 41:948-956.

COHEN, J. E. 1990. DNA fingerprinting for forensic identifi- cation: potential effects on data interpretation of subpop- ulation of heterogeneity and band number variability. Am. J. Hum. Genet. 46:358-368.

DEKA, R., R. CHAKRABORTY, S. DECROO, F. ROTHHAMMER, S. A. BARTON, and R. E. FERRELL. 1992. Characteristics of polymorphism at a VNTR locus 3’ to the apolipoprotein B gene in five human populations. Am. J. Hum. Genet. 51: 1325-1333.

DEKA, R., S. DECROO, L. JIN, F. ROTHHAMMER, S. MCGARVEY, R. E. FERRELL, and R. CHAKRABORTY. Pop- ulation genetic characteristics of the DlS80 locus in seven human populations. (submitted).

DEVLIN, B., and N. RISCH. 1992. A note on Hardy-Weinberg equilibrium of VNTR data by using the Federal Bureau of Investigation’s fixed-bin method. Am. J. Hum. Genet. 51: 549-553.

DEVLIN, B., N. RISCH, and K. ROEDER. 1990. No excess of homozygosity at loci used for DNA fingerprinting. Science 249:1416-1420.

DIXSON, A. F., N. HASTIE, I. PATEL, and A. J. JEFFREYS. 1988. DNA ‘fingerprinting’ of captive family groups of common marmosets (Callithrix jacchus). Folia Primatol. 51:52-55.

EDWARDS, A., A. CIVITELLO, H. HAMMOND, and C. T. CAS- KEY. 199 1. DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet. 49: 746-756.

EDWARDS, A., H. HAMMOND, L. JIN, C. T. CASKEY, and R. CHAKRABORTY. 1992. Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups. Genomics 12:24 l-253.

ELY, J., R. DEKA, R. CHAKRABORTY, and R. E. FERRELL. 1992. Comparison of five tandem repeat loci between hu- mans and chimpanzees. Genomics 14:692-698.

FLINT, J., A. J. BOYCE, J. J. MARTINSON, and J. B. CLEGG. 1989. Population bottlenecks in Polynesia revealed by minisatellite. Hum. Genet. 83:257-263.

GEORGES, M., A. GUNAWARDANA, D. W. THREADGILL, M. LATHROP, I. OLASKER, A. MISHRA, L. L. SARGEANT, A. SCHOEBERLEIN, M. R. STEELE, C. TERRY, D. S. THREAD- GILL, X. ZHAS, T. HOLM, R. FRIES, and J. E. WOMACK. 199 1. Characterization of a set of variable number of tan- dem repeat markers conserved in bovidae. Genomics 11: 24-32.

GILBERT, D. A., N. LEHMAN, S. J. O’BRIEN, and R. K. WAYNE. 1990. Genetic fingerprinting reflects population differen- tiation in the California Channel Island fox. Nature 344: 764-767.

GILBERT, D. A., C. PACKER, A. E. PUSEY, J. C. STEPHENS, and S. J. O’BRIEN. 199 1. Analytical DNA fingerprinting in lions: parentage, genetic diversity, and kinship. J. Hered. 82:378-386.

HEARNE, C. M., S. SHOSH, and J. A. TODD. 1992, Microsatel- lites for linkage analysis of genetic traits. Trends Genet. 8: 288-294.

HILLEL, J., Y. PLOTZY, A. HABERFELD, U. LAVI, A. CAHANER, and A. J. JEFFREYS. 1989. DNA fingerprints of poultry. Anim. Genet. 20: 145- 155.

JEFFREY& A. J., J. F. Y. BROOKFIELD, and R. SEMEONOFF. 1985a. Positive identification of an immigration test-case using human DNA fingerprints. Nature 317:8 18-8 19.

JEFFREYS, A. J., and D. B. MORTON. 1987. DNA fingerprints of dogs and cats. Anim. Genet. 18: l-l 5.

JEFFREYS, A. J., N. J. ROYLE, V. WILSON, and Z. WONG. 1988. Spontaneous mutation rates to new length alleles at tandem-repetitive loci in human DNA. Nature 332:278- 281.

JEFFREYS, A. J., M. TURNER, and P. DEBENHAM. 199 1. The efficiency of multilocus DNA fingerprint probes for indi- vidualization and establishment of family relationships, de- termined from extensive casework. Am. J. Hum. Genet. 48:824-840.

Page 8: Estimation of Genetic Distance and Coefficient of Gene ... · Estimation of Genetic Distance and Coefficient of Gene Diversity from Single- Probe Multilocus DNA Fingerprinting Data

Genetic Distance from DNA Fingerprint 127

JEFFREYS, A. J., V. WILSON, and S. L. THEIN. 19853. Hyper- variable ‘minisatellite’ regions in human DNA. Nature 314: 67-73.

. 1985 c. Individual-specific ‘fingerprints’ of human DNA. Nature 316:76-79.

JIN, L., and R. CHAKRABORTY. 1993. A bias-corrected estimate of heterozygosity for single-probe multilocus DNA finger- prints. Mol. Biol. Evol. 10: 1112-l 114.

-. Population dynamics of DNA fingerprinting patterns within and between populations. Genet. Res. (accepted).

KELLY, R., G. BULFIELD, A. COLLICK, M. GIBBS, and A. J. JEFFREYS. 1989. Characterization of a highly unstable mouse minisatellite locus: evidence for somatic mutation during early development. Genomics 5:844-856.

KIDD, J. R., F. L. BLACK, K. M. WEISS, I. BALAZS, and K. K. KIDD. 199 1. Studies of three Amerindian populations using nuclear DNA polymorphisms. Hum. Biol. 63:775-794.

KRAWCZAK, M., and B. BOCKEL. 1992. A genetic factor model for the statistical analysis of multilocus DNA fingerprints. Electrophoresis 13: lo- 17.

KWIATKOWSKI, D. J., E. P. HENSKE, K. WEIMER, L. OZELIUS, J. F. GUSELLA, and J. HAINES. 1992. Construction of a GT polymorphism map of human 9s. Genomics 12:229-240.

LANDER, E. S. 1989. DNA fingerprinting on trial. Nature 339: 501-505.

LEVINSON, G., and G. A. GUTMAN. 1987a. High frequencies of short frameshifts in poly-CA / TG tandem repeats borne by bacteriophage M 13 in Escherichia coli K- 12. Nucleic Acids Res. 15:5323-5338.

. 19876. Slipped-strand mispiring: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.

LEWONTIN, R. C., and D. L. HARTL. 199 1. Population genetics in forensic DNA typing. Science 254: 1745- 1750.

LI, W.-H., and M. NEI. 1975. Drift variances of heterozygosity and genetic distance in transient states. Genet. Res. 25:229- 248.

LYNCH, M. 1990. The similarity index and DNA fingerprinting. Mol. Biol. Evol. 7:478-484.

. 199 1. Analysis of population genetic structure by DNA fingerprinting. Pp. 113- 126 in T. BURKE, G. DOLF, A. J. JEFFREYS, and R. WOLFF, eds. DNA fingerprinting: ap- proaches and applications. Birkhauser, Basel.

MAHTANI, M. M., and H. F. WILLARD. 1993. A polymorphic X-linked tetranucleotide repeat locus displaying a high rate of new mutation: implications for mechanisms of mutation at short tandem repeat loci. Hum. Mol. Genet. 2:43 l-437.

MOURANT, A. E., A. C. KOPEC, K. DOMANIEWSKA-SOBCZAK. 1976. The distribution of the human blood groups. Oxford University Press, London.

NAKAMURA, Y., M. LEPPERT, P. O’CONNELL, R. WOLFF, T. HOLM, M. CULVER, C. MARTIN, E. FUJIMOTO, M. HOFF, E. KUMLIN, and R. WHITE. 1987. Variable number of tan-

NEI, M. 1972. Genetic distance between populations. Am. Nat. 106:283-292.

. 1973. Analysis of gene diversity in subdivided pop- ulations. Proc. Natl. Acad. Sci. USA 70:3321-3323.

. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89: 583-590.

p. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

NEI, M., and R. K. CHESSER. 1983. Estimation of fixation indices and gene diversities. Ann. Hum. Genet. 47:253- 259.

OHTA, T., and M. KIMURA. 1973. A model of mutation ap- propriate to estimate the number of electrophoretically de- tectable alleles in a finite population. Genet. Res. 22:20 l- 204.

ROYLE, N. J., R. E. CLARKSON, Z. WONG, and A. J. JEFFREYS. 1988. Clustering of hypervariable minisatellites in the pro- terminal regions of human autosomes. Genomics 3:352- 360.

SCHL~TTERER, C., and D. TAUTZ. 1992. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20:2 1 l-2 15.

SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWIN- KLE. 1993. VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993.

STEPHENS, J. C., D. A. GILBERT, N. YUHKI, and S. J. O’BRIEN. 1992. Estimation of heterozygosity for single-probe multi- locus DNA fingerprints. Mol. Biol. Evol. 9:729-743.

VALDES, A. M., M. SLATKIN, and N. B. FREIMER. 1993. Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.

WEIR, B. S. 1992. Independence of VNTR alleles defined as fixed bins. Genetics 130:873-887.

WONG, Z., V. WILSON, A. J. JEFFREYS, and S. L. THEIN. 1986. Cloning a selected fragment from a human DNA ‘finger- print’: isolation of an extremely polymorphic minisatellite. Nucleic Acids Res. 14:4605-46 16.

WONG, Z., V. WILSON, I. TATEL, S. POVEY, and A. J. JEF- FREYS . 1987. Characterization of a panel of highly variable minisatellites cloned from human DNA. Ann. Hum. Genet. 51:269-288.

YUHKI, N., and S. J. O’BRIEN. 1990. DNA variation of the mammalian major histocompatibility complex reflects ge- nomic diversity and population history. Proc. Natl. Acad. Sci. USA 87:836-840.

ZISCHLER, H., I. NANDA, R. SCHAFER, M. SCHMID, and J. T. EPPLEN . 1989. Digoxigenated oligonucleotide probes spe- cific for simple repeats in DNA fingerprinting and hybrid- ization in situ. Hum. Genet. 82:227-233.

MASATOSHI NEI, reviewing editor

Received June 17, 1993 dem repeat (VNTR) markers for human gene mapping. Science 235: 16 16- 1622. Accepted September 13, 1993