Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 7669-7673, August 1992
Population Biology

Origins of the Indo-Europeans: Genetic evidence
(gene frequencies/Europe)

ROBERT R. SOKAL*†, NEAL L. ODEN‡, AND BARBARA A. THOMSON*

*Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245; and ‡Department of Preventive Medicine, Division of Epidemiology, Health Sciences Center, State University of New York, Stony Brook, NY 11794-8036

Contributed by Robert R. Sokal, May 22, 1992

†To whom reprint requests should be addressed.
Abbreviation: IE, Indo-European.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Two theories of the origins of the Indo-Europeans currently compete. M. Gimbutas believes that early Indo-Europeans entered southeastern Europe from the Pontic Steppes starting ca. 4500 B.C. and spread from there. C. Renfrew equates early Indo-Europeans with early farmers who entered southeastern Europe from Asia Minor ca. 7000 B.C. and spread through the continent. We tested genetic distance matrices for each of 25 systems in numerous Indo-European-speaking samples from Europe. To match each of these matrices, we created other distance matrices representing geography, language, time since origin of agriculture, Gimbutas' model, and Renfrew's model. The correlation between genetics and language is significant. Geography, when held constant, produces a markedly lower, yet still highly significant partial correlation between genetics and language, showing that more remains to be explained. However, none of the remaining three distances (time since origin of agriculture, Gimbutas' model, or Renfrew's model) reduces the partial correlation further. Thus, neither of the two theories appears able to explain the origin of the Indo-Europeans as gauged by the genetics-language correlation.

Almost all Europeans speak Indo-European (IE) languages, the only exceptions being Finns, Estonians, Hungarians, Turks, Basques, and Maltese. Where did IEs come from and how did they spread to most areas of Europe? Two theories of IE origins, derived from archaeological and linguistic evidence, currently predominate. The majority view is that of Marija Gimbutas (1-3) of the University of California, Los Angeles. She believes that early IEs entered southeastern Europe in three Kurgan culture waves from the Pontic Steppes starting ca. 4500 B.C. and spread from there. This view was challenged in 1987 by Colin Renfrew of Cambridge University (4). He equates early IEs with early farmers who entered southeastern Europe from Asia Minor ca. 7000 B.C. and spread through the continent by demic diffusion as proposed by Ammerman and Cavalli-Sforza (5). Genetic evidence from modern populations supports this model (6-8), justifying a subsequent test of Renfrew's theory. However, because Renfrew links his hypothesis with the origin of agriculture by demic diffusion, it becomes difficult to test the two hypotheses separately.

Here we examine whether the genetic evidence available from modern European populations favors one of the two hypotheses on IE origins. Our approach is to examine correlations between genetic and linguistic distances in Europe and to estimate the effects of various factors (geography, origin of agriculture) and hypothesized movements (Gimbutas' and Renfrew's models) on the magnitude of these correlations.

MATERIALS AND METHODS

We studied 25 genetic systems (erythrocyte antigens, plasma proteins, enzymes, histocompatibility alleles, immunoglobulins; Table 1) from 2111 IE-speaking samples in Europe. Details are specified elsewhere (10-14). We computed Prevosti's genetic distances (15, 16) (GEN) separately for each genetic system; the number of localities per system ranges from 27 to 479 (mean = 84).

Linguistic distances (LAN) were subjective estimates furnished by M. Ruhlen, based on his current classification of IE languages (17). A dendrogram (Fig. 1) resulting from UPGMA clustering (18) of the linguistic distance matrix shows the relations between the IE languages in that matrix. We computed great-circle geographic distances (GEO) between pairs of localities. The origin-of-agriculture distances (OOA) between any pair of points were described earlier (8). They sum distances from their respective starting times of agriculture back to their putative common agricultural origins.

The Renfrew hypothesis distance (REN) matrix was based on ref. 4 and discussions with Professor Renfrew. In his view, most of the introduction and subsequent diversification of the IE language families in Europe was concurrent with the spread of agriculture in the continent. Nevertheless, Renfrew explains the final branching into the major language families by a series of 10 so-called transitions illustrated in ref. 4 (figure 7.7). These transitions are associated with specific archaeological assemblages whose starting dates were entered on a map we smoothed by interpolation. Superimposed on this map (Fig. 2A) is a directed graph summarizing the directions and branching patterns of Renfrew's transitions. The REN between any pair of localities is their distance in time along the directed graph. If two localities are located in regions connected by different branches of the graph, the REN is computed by summing the time-distances along each branch to the point of their common origin. Suggestions by Professor Renfrew (personal communication) that some of these transitions might be wholly or partly acculturation rather than demic diffusion were tested by a sensitivity test aiming to maximize average GEN,LAN correlations. No genetic evidence for acculturation was found, and the REN values were retained as described above.

The Gimbutas distances (GIM) are based on a map (Fig. 2B) redrawn from one provided by Professor Gimbutas. It shows the regions reached by Kurganization waves 1 and/or 2 and 3. The distance between any pair of localities i and j is computed as

$$D_{ij} = q^{k_i} + q^{k_j} - q^{(k_i + k_j)}.$$

In this formula q is the proportion of nonreplacement of resident genes by Kurgan genes, and $k_i$ and $k_j$ are the numbers of Kurganization waves received by localities i and j, respectively. Two localities are assigned $D_{ij} = 1$ ($k_i = k_j = 0$) in the un-Kurganized region (N in Fig. 2B), and $k_i = k_j = 100$ in the original Kurgan area (O in Fig. 2B). We do not, of course, know the value of q. We have been unable to learn from Gimbutas (refs. 1-3) how much actual population movement (versus cultural diffusion) is implied by her model. To resolve this dilemma, we iteratively solved for the maximum GEN,GIM correlation over all genetic systems, obtaining a value for q of 0.54. We used this estimate for computing our GIM values.

The six distance matrices were assembled for each genetic system. The five other distance matrices had to match the dimension and composition of the genetic distance matrix of each system. We applied Mantel's method (19, 20) to test the association between all pairs of distance matrices and Smouse-Long-Sokal tests (21) to yield partial matrix correlations. We evaluated significance by 249 permutations. The Smouse-Long-Sokal test extends Mantel's test to three or more matrices and tests whether an association between matrices A and B is significant when one or more matrices C, D, . . . are held constant. In this way we tested whether any correlation remained between GEN and LAN once the correlation between these two variables due to one or more regressor variables was eliminated.

Table 1. Matrix correlations between genetic and linguistic distances, and partial matrix correlations involving these distances and geographic, origin-of-agriculture, Renfrew, and Gimbutas distances

System          N     GEN,LAN    .GEO       .GEO,OOA   .GEO,OOA,REN   .GEO,OOA,GIM
1-2 ABO         133   0.144***   0.001      0.067      0.067          0.027
2-5 MN          179   0.024      0.039      0.040      0.057*         0.106*
2-7 MN          51   -0.002     -0.142     -0.065     -0.050          0.016
3-1 P           79   -0.042     -0.077     -0.045     -0.046         -0.021
4-1 RHESUS      479   0.054***   0.006     -0.001      0.004          0.003
4-13 RHESUS     74    0.114*     0.057      0.086*     0.087          0.065
4-19 RHESUS     69    0.179***   0.164**    0.173**    0.178***       0.192***
5-1 LUTH        27   -0.029     -0.015      0.010      0.012          0.019
6-1 KELL        103   0.093      0.076      0.048      0.032          0.075
6-3 KELL        30    0.027     -0.054      0.025      0.027          0.061
7-1 ABHSE       49    0.017     -0.190     -0.174     -0.180         -0.217
8-1 DUFFY       81    0.109*     0.073      0.084      0.080          0.110**
36-1 HP         147   0.133***   0.055*     0.065*     0.061*         0.038
37-1 TF         33    0.117*     0.072      0.063      0.060          0.077
38-1 GC         85    0.003      0.007     -0.005     -0.009          0.057*
50-1-1 AP       61    0.317***   0.249***   0.207*     0.195*         0.177*
52 PGD          39    0.122      0.043      0.045      0.049          0.005
53 PGM1         63    0.236***   0.077      0.000     -0.013          0.048
56 AK           58    0.005     -0.119     -0.062     -0.073          0.025
63 ADA          41    0.203***   0.059      0.058      0.031         -0.010
65 TASTER       52    0.359***   0.273***   0.291***   0.294***       0.219***
100 HLA-A       60    0.408***   0.238**    0.181***   0.177*         0.184***
101/2 HLA-B     60    0.455***   0.280***   0.216***   0.211***       0.231***
200 GM          30    0.231***   0.080      0.052      0.075         -0.011
201 KM          28    0.246*     0.213*     0.215*     0.184*         0.19*
Average               0.141      0.059      0.063      0.060          0.067

Numbers preceding the system symbols, up to 65, are those assigned by Mourant et al. (9); those from 100 and above were assigned in our laboratory. N, number of localities sampled; GEN, LAN, GEO, OOA, REN, and GIM stand for genetic, linguistic, geographic, origin-of-agriculture, Renfrew, and Gimbutas distances, respectively. The pairwise correlation is indicated as GEN,LAN. In the interest of brevity, partial correlations, all of which are of GEN against LAN with various other distances held constant, are indicated by a period followed by the variables held constant. Thus, .GEO,OOA stands for $r_{GEN,LAN \cdot GEO,OOA}$. The correlations are followed by significance symbols based on 249 permutations of rows and columns of one of the two distance or residual matrices. Significances are indicated as follows: *, 0.01 < P ≤ 0.05; **, 0.004 < P ≤ 0.01; ***, P = 0.004. The last probability is conservative, since it is the lowest we can demonstrate with 249 permutations. Had we carried out more permutations, we probably could have shown that many of the correlations marked by three asterisks are significant at P << 0.004. The significance for the average correlation is evaluated by Fisher's test for combining probabilities. In all cases P ≤ 0.0001.
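The Gimbutas distance formula and the matrix tests described in Materials and Methods can be illustrated with a minimal sketch. It is not the authors' software: the function names, the toy matrices, and the random seed are assumptions made for illustration, the partial test handles a single held-constant matrix only (the paper holds up to four constant), and the P value counts the observed statistic among the permutation outcomes, an assumption that agrees with the smallest attainable value of 1/(249 + 1) = 0.004 mentioned for 249 permutations.

```python
import math
import random
from itertools import combinations


def gim_distance(k_i, k_j, q=0.54):
    """Gimbutas distance as given in the text; q = 0.54 is the paper's estimate."""
    return q ** k_i + q ** k_j - q ** (k_i + k_j)


def upper(m):
    """Flatten the upper triangle (i < j) of a square symmetric matrix."""
    return [m[i][j] for i, j in combinations(range(len(m)), 2)]


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(a, c):
    """Square matrix of residuals from regressing distances a on distances c."""
    x, y = upper(c), upper(a)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    n = len(a)
    res = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        res[i][j] = res[j][i] = a[i][j] - (my + slope * (c[i][j] - mx))
    return res


def mantel(a, b, n_perm=249, seed=0):
    """Matrix correlation and one-tailed permutation P (observed value counted once)."""
    rng = random.Random(seed)
    observed = pearson(upper(a), upper(b))
    order = list(range(len(a)))
    at_least = 1
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of one matrix together.
        permuted = [[a[i][j] for j in order] for i in order]
        if pearson(upper(permuted), upper(b)) >= observed:
            at_least += 1
    return observed, at_least / (n_perm + 1)


def partial_mantel(a, b, c, n_perm=249, seed=0):
    """Correlation of a and b with c held constant (single covariate only)."""
    return mantel(residuals(a, c), residuals(b, c), n_perm, seed)


# Toy data, invented for illustration: five localities on a line.
pos = [0.0, 1.0, 3.0, 6.0, 10.0]
group = [0, 0, 1, 1, 1]  # hypothetical language groups
geo = [[abs(p - s) for s in pos] for p in pos]
lan = [[float(g != h) for h in group] for g in group]
gen = [[math.sqrt(d) + 0.5 * l for d, l in zip(gr, lr)] for gr, lr in zip(geo, lan)]

r, p = mantel(gen, lan)
r_partial, p_partial = partial_mantel(gen, lan, geo)
print(f"GEN,LAN r = {r:.3f} (P = {p:.3f}); .GEO r = {r_partial:.3f} (P = {p_partial:.3f})")
print(gim_distance(0, 0), gim_distance(100, 100))
```

With $k_i = k_j = 0$ the formula returns exactly 1, and with $k_i = k_j = 100$ the powers of q are so small that the distance is effectively 0, which is presumably why 100 waves serve as the code for the original Kurgan area.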

RESULTS

Among pairwise correlations, the average correlations involving GEN or LAN with other variables are low, except for LAN with GEO (0.480) and LAN with OOA (0.594). Only one correlation (LAN,GIM = -0.035) is nonsignificant by Fisher's test for combining probabilities (22). All three design matrices, OOA, REN, and GIM, are moderately related to GEO, with correlations ranging from 0.462 to 0.578. REN and GIM are also moderately related to OOA (0.342 and 0.222, respectively), but REN is only slightly correlated with GIM (0.231).

In Table 1, we show only correlations involving the zero-order pair GEN,LAN. Most pairwise correlations GEN,LAN are positive, moderately high, and strongly significant. Because both genetics and language are spatially autocorrelated, we next calculated their partial correlations by holding GEO constant. The average correlation for GEN,LAN of 0.141 drops to 0.059, with 7 systems retaining significant partial correlations. These linear correlations between distance matrices are characteristically low, despite high significance. Partial correlations with added distance matrices held constant do not decrease further and remain strongly significant. Fig. 3 summarizes these relations for the average correlation. Fisher's tests for combining probabilities indicate very high significance (P << 0.001) for every average correlation in the graph. In the absence of correlation there should be an equal number of positive and negative coefficients. This null hypothesis can be firmly rejected by a sign test (22) (P < 0.025 for all correlations). We also tested by sign tests whether a significant number of genetic systems decrease their correlations as the number of distance matrices held constant increases. Holding GEO constant decreases the correlation significantly, but no further distance matrix has any effect. If either REN or GIM explained part of the genetics-language correlation, holding it constant should reduce that correlation. Neither does, nor do the OOA distances. This last observation also runs counter to Renfrew's theory.
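The two summary tests used in this section, Fisher's test for combining probabilities and the sign test, can be sketched as below. This is a minimal illustration rather than the authors' procedure: the function names are assumptions, the chi-square tail uses the closed form that holds for even degrees of freedom, and the sign test is one-tailed. The only data used are the 25 GEN,LAN coefficients listed in Table 1.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k degrees of freedom."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals chi-square / 2
    # Closed-form chi-square upper tail for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


def sign_test_p(n_positive, n_total):
    """One-tailed probability of n_positive or more positives when P(+) = 0.5."""
    tail = sum(math.comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


# GEN,LAN column of Table 1 (zero-order matrix correlations, 25 systems).
GEN_LAN = [0.144, 0.024, -0.002, -0.042, 0.054, 0.114, 0.179, -0.029, 0.093,
           0.027, 0.017, 0.109, 0.133, 0.117, 0.003, 0.317, 0.122, 0.236,
           0.005, 0.203, 0.359, 0.408, 0.455, 0.231, 0.246]

n_pos = sum(r > 0 for r in GEN_LAN)
print(n_pos, len(GEN_LAN), sign_test_p(n_pos, len(GEN_LAN)))
print(fisher_combined_p([0.004, 0.004, 0.05]))
```

For the GEN,LAN column, 22 of the 25 coefficients are positive, and the resulting one-tailed probability lies well below the 0.025 threshold quoted in the text.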


[Figure not reproduced: dendrogram titled "Linguistic distances clustered"; the abscissa runs from 16.00 to 0.00.]

FIG. 1. Dendrogram showing the results of UPGMA clustering (18) of the distances, furnished by M. Ruhlen, among 43 IE languages. Abscissa is in arbitrary units.
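As a rough illustration of how a dendrogram such as Fig. 1 is built, the sketch below implements average-linkage (UPGMA) clustering in plain Python. The labels and distances are invented placeholders, not Ruhlen's linguistic distances, and the function name is an assumption. The merge level reported is the average between-cluster distance; some dendrogram conventions plot half of that value.

```python
def upgma(labels, dist):
    """Average-linkage clustering; returns merges as (members_a, members_b, level)."""
    clusters = {i: (labels[i],) for i in range(len(labels))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        (a, b), level = min(d.items(), key=lambda kv: kv[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted mean keeps the result equal to the average
            # over all original pairs between the two clusters.
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = merged
        merges.append((merged[:size_a], merged[size_a:], level))
        next_id += 1
    return merges


# Toy example with three hypothetical languages A, B, and C.
toy = [[0, 2, 6],
       [2, 0, 8],
       [6, 8, 0]]
for step in upgma(["A", "B", "C"], toy):
    print(step)
```

In the toy example A and B join first at distance 2, and C then joins the pair at (6 + 8) / 2 = 7.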

DISCUSSION

There is significant correlation between genetic and linguistic distances among IE speakers in Europe. This correlation is significantly reduced by keeping geographic distances constant, confirming earlier findings of spatial autocorrelation of both variables (11, 13, 14). The partial correlations remaining after geographic distances are held constant are still significant, yet none of the three distance matrices representing the hypotheses tested in this study (origin of agriculture, Renfrew, or Gimbutas) can further explain (i.e., reduce) the correlations. Earlier we demonstrated (8) that the hypothesis of the origin of agriculture by demic diffusion (5, 23, 24) explains genetic distances in modern European populations. When tested separately for IE speakers (unpublished results), this relationship is still true. However, the origin of agriculture is unable to explain the genetics-language correlations in Europe. Neither of the two contending hypotheses, Renfrew's or Gimbutas', contributes an additional explanatory element. Why might that be so?

Is a study of the correlations of genetic and linguistic distances of IE speakers the wrong approach? To contribute to an understanding of the origin of IE speakers, genetic distances must be correlated with linguistic distances. Such a relationship has been demonstrated for Europe (25) and elsewhere (see references in ref. 25). Such correlations occur because (i) the processes of geographic differentiation of populations and those leading to linguistic differentiation proceed in tandem; (ii) once established, linguistic differences serve as barriers to population mixing, reinforcing the genetics-language correlation; and (iii) introduction into an area of populations differentiated elsewhere will increase the genetics-language relation because these previously differentiated groups will differ with respect to both genetics and language. Of these, the first factor should be the major one. This is corroborated in the present data, where the only significant common factor is geography. The remaining significant partial correlation between language and genetics, after geography is held constant, indicates a relation between these two variables above and beyond that caused by their common spatial differentiation.

Do REN and GIM fail to remove any genetics-language correlation because our coding of the two models is incorrect? If there were no gene flow, genetics could not resolve the controversy. By basing his model on the demic diffusion theory of the origin of agriculture, Renfrew in effect admits gene flow. Yet neither OOA nor REN removes any genetics-language correlation. With respect to GIM, we note that the Indo-Europeanization of Iran and northwest India clearly involved population movements. We believe that Gimbutas' hypothesis of the Kurganization of southeastern Europe must imply an analogous process. We are supported in this by the outcome of our sensitivity test, which indicates population mixing. Thus our results support neither Renfrew's nor Gimbutas' theory. However, the significant partial correlations remaining after GEO, OOA, and REN or GIM have been held constant still require explanation and may hold the clue to IE origins.

The averaged GEN,LAN correlations are rather low. The highest pairwise correlation in Table 1 is only 0.455. Linear correlations of distance matrices generally tend to be on the low side, even with high statistical significance established by permutational methods. The averaged partial correlations of genetic distance with linguistic distance, with other distances including geography held constant, vary slightly around 0.06, depending on which other matrices are included. In Table 1 only a few systems show substantial correlations, the others being small and not significant. Not every locus will differentiate during the origins of the various populations. In a comparison of modern, racially diverse populations (Italians, Nigerians, and Japanese, as listed in ref. 26), Italians differed from the other two populations by as much as 0.2 in only 20.4% of the cases. Differentiation or diffusion involving these populations would be detectable in only a few loci. Since the genetic differences between the pre-IE population and the IEs surely were less than those among Italians, Japanese, and Nigerians, we should not expect many systems to show strong genetics-language correlation. Note that the high correlations subsumed in the low averages remain high as partial correlations also.

[Figure not reproduced: two maps, A and B. Key to the contours in A: 4500, 4950, 5600, 5900, 6500, 7250, 7750, 8500, and 9000 yr BP.]

FIG. 2. Maps used for computing distances corresponding to the two theories of IE origins. (A) Map for computing Renfrew (REN) distances. The contours represent time intervals [years before present (yr BP)], as identified in the key. They were obtained from a map in which the starting dates of archaeological assemblages, which characterize Renfrew's 10 transitions (4), were interpolated to smooth the surface. The area has been subdivided into 10 numbered regions corresponding to the identically numbered transitions. A directed graph following the outline furnished by Renfrew has been superimposed on the map. (B) Map for computing Gimbutas (GIM) distances. The map shows outlines of regions in Europe that received none, one, or more of the Kurgan waves described by Gimbutas in refs. 1-3. Each region is labeled by the wave number it received. Regions that received more than one wave are marked by more than one numeral. Thus, area 123 received waves 1, 2, and 3. The region labeled 0 is the original home of the Kurgan people (the Sredni-Stog and Yamna cultures); regions labeled N received none of the Kurgan waves. The map is based on hand-drawn originals by M. Gimbutas.

Our results imply that, while neither of the currently contending hypotheses of IE origins can be supported by the genetic evidence, there is significant residual correlation for some genetic systems that requires explanation. We shall not propose an alternative hypothesis of IE origins. However, the observed correlations invite exploratory data analysis of the population samples supporting the relation between language and genetics.

We examined the residuals from the regressions of GEN on GEO and LAN on GEO. We examined the highest 2% of the products of these residuals and mapped the pairs of localities represented by them for each of the five systems (50.1.1 AP, 65 T, 101 HLA-A, 101/102 HLA-B, and 201 KM) showing the highest correlation between genetics and language. Paired positive (negative) deviations indicate areas more (less) distant genetically and linguistically than their geographic distances would predict. In maps for these five systems, paired positive deviations frequently involve Sardinia, which is quite distant genetically, and also linguistically, from nearby Mediterranean populations. Paired negative deviations heavily involve Iceland. Icelanders, being "displaced Scandinavians," are far less distant genetically and linguistically from Scandinavians and other Germanic speakers than their geographic distances indicate.

While the map patterns do not, unfortunately, suggest an alternative hypothesis of IE origins, since the relations they do indicate are far more recent (such as the settlement of Iceland), they suggest that the overall pattern of partial correlations might help us decide among competing hypotheses. If the IEs originated in situ by local differentiation only, there should be no significant partial correlation, since geography should fully explain the observed genetic and linguistic distances. This was not the case. If the genetics-language correlation were entirely due to the spread of populations accompanying the origin of agriculture, then the origin-of-agriculture model should suffice, or at least there should be some effect due to origin of agriculture. But we saw that origin-of-agriculture distances (OOA) cannot reduce the partial correlations remaining after geography has been held constant. If the IEs originated by a branching process outside or inside of Europe and the populations ancestral to the modern IE language families branched off at different times and moved into different regions in Europe where they differentiated subsequently, they would yield a pattern such as the one we found. A phylogenetic tree structure would add similarities and distances to the data, above and beyond those engendered by local differentiation. These conclusions agree with earlier findings in our laboratory (13, 14, 27) that intrusion of populations differentiated elsewhere has contributed an important element to the association between genetics and language in Europe.

[Figure not reproduced: flow diagram of the average correlations GEN,LAN = .141; .GEO = .059; .GEO,OOA = .063; .GEO,OOA,REN = .060; .GEO,OOA,GIM = .067; .GEO,OOA,REN,GIM = .065. Legend for the small arrows: Renfrew correct; Gimbutas correct; neither correct.]

FIG. 3. Summary of results. The large arrows indicate successive steps in computing zero- to fourth-order partial correlations between genetic (GEN) and linguistic (LAN) distances. Other distances successively held constant are geography (GEO), origin of agriculture (OOA), Gimbutas (GIM), and Renfrew (REN). The numerical values at both ends of the large arrows are the average correlations from the bottom line of Table 1. They are all highly significant (P << 0.001). The numbers above the large arrows are the numbers of genetic systems (out of 25) that respond counter to expectations when an added distance matrix is held constant. The symbols following these numbers give the results of a one-tailed sign test (22) of the positive and negative changes to the correlations during the operation indicated by the arrow [ns (not significant), P > 0.05; ***, P < 0.005]. The three small arrows beneath each large arrow furnish predictions made by each theory concerning the behavior of the partial correlations. From the top down the arrows represent Renfrew's theory, Gimbutas' theory, and the assumption that neither theory is correct. A horizontal small arrow predicts no effect, a downward sloping small arrow predicts a reduction in the magnitude of the partial correlations, and a downward vertical small arrow predicts a reduction of the partial correlation to nonsignificance. The small arrows illustrate that the predictions of the Renfrew and Gimbutas theories are not borne out and that the outcomes are compatible with the prediction that neither theory is correct.

We thank Prof. Marija Gimbutas, Lord Renfrew, and Dr. Merritt Ruhlen for their collegial cooperation in this work. We are indebted to D. DiGiovanni, M.-J. Fortin, and C. Wilson for technical assistance. Part of the computation was carried out on the Cornell National Supercomputer Facility. This research was supported by National Science Foundation Grant BNS8918751 and National Institutes of Health Grant GM28262.

1. Gimbutas, M. (1973) J. Indo-Eur. Studies 1, 1-20.
2. Gimbutas, M. (1979) Arch. Suisses Anthropol. Gén. 43, 113-137.
3. Gimbutas, M. (1986) in Ethnogenese Europäischer Völker, eds. Bernhard, W. & Kandler-Pálsson, A. (Fischer, Stuttgart, F.R.G.), pp. 5-20.
4. Renfrew, C. (1987) Archaeology and Language: The Puzzle of Indo-European Origins (Jonathan Cape, London).
5. Ammerman, A. J. & Cavalli-Sforza, L. L. (1984) The Neolithic Transition and the Genetics of Populations in Europe (Princeton Univ. Press, Princeton, NJ).
6. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. L. (1978) Science 201, 786-792.
7. Sokal, R. R. & Menozzi, P. (1982) Am. Nat. 119, 1-17.
8. Sokal, R. R., Oden, N. L. & Wilson, C. (1991) Nature (London) 351, 143-145.
9. Mourant, A. E., Kopeć, A. C. & Domaniewska-Sobczak, K. (1976) The Distribution of the Human Blood Groups (Oxford Univ. Press, London).
10. Derish, P. A. & Sokal, R. R. (1988) Hum. Biol. 60, 801-824.
11. Sokal, R. R. (1988) Proc. Natl. Acad. Sci. USA 85, 1722-1726.
12. Sokal, R. R., Oden, N. L. & Thomson, B. A. (1988) Am. J. Phys. Anthropol. 76, 337-361.
13. Sokal, R. R., Oden, N. L., Legendre, P., Fortin, M.-J., Kim, J. & Vaudor, A. (1989) Am. J. Phys. Anthropol. 79, 489-502.
14. Sokal, R. R., Harding, R. M. & Oden, N. L. (1989) Am. J. Phys. Anthropol. 80, 267-294.
15. Prevosti, A., Ocaña, J. & Alonso, G. (1975) Theor. Appl. Genet. 45, 231-241.
16. Wright, S. (1978) Evolution and the Genetics of Populations, Vol 4: Variability Within and Among Populations (Univ. of Chicago Press, Chicago).
17. Ruhlen, M. (1991) A Guide to the World's Languages, Vol 1: Classification; With a Postscript on Recent Developments (Stanford Univ. Press, Stanford, CA).
18. Sneath, P. H. A. & Sokal, R. R. (1973) Numerical Taxonomy (Freeman, San Francisco).
19. Mantel, N. (1967) Cancer Res. 27, 209-220.
20. Sokal, R. R. (1979) Syst. Zool. 28, 227-231.
21. Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Syst. Zool. 35, 627-632.
22. Sokal, R. R. & Rohlf, F. J. (1981) Biometry (Freeman, San Francisco), 2nd Ed.
23. Ammerman, A. J. & Cavalli-Sforza, L. L. (1973) in The Explanation of Culture Change, ed. Renfrew, C. (Duckworth, London), pp. 343-357.
24. Ammerman, A. J. & Cavalli-Sforza, L. L. (1979) in Transformations: Mathematical Approaches to Culture Change, eds. Renfrew, C. & Cooke, K. L. (Academic, New York), pp. 275-294.
25. Sokal, R. R., Oden, N. L., Legendre, P., Fortin, M.-J., Kim, J., Thomson, B. A., Vaudor, A., Harding, R. M. & Barbujani, G. (1990) Am. Nat. 135, 157-175.
26. Roychoudhury, A. K. & Nei, M. (1988) Human Polymorphic Genes: World Distribution (Oxford Univ. Press, New York).
27. Sokal, R. R. (1991) Annu. Rev. Anthropol. 20, 119-140.
