39
Anatole Klyosov This second part of a two-part series of articles applies the methods developed in Part 1 to many different world populations. While some of the results presented here are supported by other studies, many of the results are quite different from those found in previous studies, or, in cases where only theories of origin have been pro- posed, from popular expectations. We show here that our methods of calculating the TMRCA for a group of haplotypes, leads to a need for a reevaluation of current theories of the populating of Europe and other areas. The data presented below will show several major con- clusions: (a) The male Basques living today have rather recent roots of less than four thousand years ago, contrary to the legend that proposes they lived some 30 thousand years ago. ____________________________________________________________ Address for correspondence: Anatole Klyosov, [email protected] Received: January 8, 2009; accepted: August 1, 2009 (b) There is no justification in the results of a "Ukrainian refuge" for the R1a1 ancient population allegedly 15,000 years ago; instead, evidence has been obtained that the oldest R1a1 lived circa 20,000 years before the present (ybp) in South Siberia. There are two sets of data and these provide ages of 21,000±3,000 ybp and 19,625±2,800 ybp, calculated by two different methods, and 11,650±1,550 years ago appeared in the Balkans (Serbia, Kosovo, Bosnia, Macedonia). (c) Except the South Siberian and Balkans populations, present-day bearers of R1a1 across Western and Eastern Europe have common ancestors who lived between 3550 and 4750 years ago (the "youngest" in Scotland, Ireland and Sweden, the "oldest" in Russia (4750±500 ybp) and Germany (4,700±520 ybp), (d) There are two different groups of Indian R1a1 haplotypes; one shows a good match with the Russian Slavic R1a1 group, having a common ancestor several hundred years "younger" than the Russian R1a com- mon ancestor (4,050±500 vs. 4,750±500 ybp). This supports the idea that a proto-Slavic migration to India as Aryans occurred (mentioned in classic ancient Indian literature) around 3600 ybp. The other Indian R1a population is significantly older, with a common ances- tor living 7,125±950 ybp; they could have migrated from South Siberia to South India. 217

DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II: Walking the Map

Embed Size (px)

Citation preview

Page 1: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

Anatole Klyosov

This second part of a two-part series of articles appliesthe methods developed in Part 1 to many different worldpopulations While some of the results presented hereare supported by other studies many of the results arequite different from those found in previous studies orin cases where only theories of origin have been pro-posed from popular expectations We show here thatour methods of calculating the TMRCA for a group ofhaplotypes leads to a need for a reevaluation of currenttheories of the populating of Europe and other areas

The data presented below will show several major con-clusions

(a) The male Basques living today have rather recentroots of less than four thousand years ago contrary tothe legend that proposes they lived some 30 thousandyears ago

____________________________________________________________

Address for correspondence Anatole Klyosov aklyosovcomcastnet

Received January 8 2009 accepted August 1 2009

(b) There is no justification in the results of aUkrainian refuge for the R1a1 ancient populationallegedly 15000 years ago instead evidence has beenobtained that the oldest R1a1 lived circa 20000 yearsbefore the present (ybp) in South Siberia There are twosets of data and these provide ages of 21000plusmn3000 ybpand 19625plusmn2800 ybp calculated by two differentmethods and 11650plusmn1550 years ago appeared in theBalkans (Serbia Kosovo Bosnia Macedonia)

(c) Except the South Siberian and Balkans populationspresent-day bearers of R1a1 across Western and EasternEurope have common ancestors who lived between3550 and 4750 years ago (the youngest in ScotlandIreland and Sweden the oldest in Russia (4750plusmn500ybp) and Germany (4700plusmn520 ybp)

(d) There are two different groups of Indian R1a1haplotypes one shows a good match with the RussianSlavic R1a1 group having a common ancestor severalhundred years younger than the Russian R1a com-mon ancestor (4050plusmn500 vs 4750plusmn500 ybp) Thissupports the idea that a proto-Slavic migration to Indiaas Aryans occurred (mentioned in classic ancient Indianliterature) around 3600 ybp The other Indian R1apopulation is significantly older with a common ances-tor living 7125plusmn950 ybp they could have migratedfrom South Siberia to South India

217

218

(e) South India Chenchu R1a1 match the current Rus-sian Slavic R1a1 haplotypes and the Chenchu R1acommon ancestor appeared some 3200plusmn1900 ybp ap-parently after the R1a1 migration from the North toIndia Another Chenchu R1a1 lineage originated about350plusmn350 ybp around the 17th century CE

(f) The so-called Cohen Modal Haplotype in Haplo-group J1 originated 9000plusmn1400 years ago if all relatedJ1 haplotypes are considered About 4000plusmn520 ybp itappeared in the proto-Jewish population and1050plusmn190 years ago (if to consider only CMH) or1400plusmn260 years ago (if to consider only Jewish J1population) split a recent CMH lineage

(g) another so-called CMH of Haplogroup J2 appearedin the Jewish population 1375plusmn300 years ago

(h)The South African Lemba population of HaplogroupJ has nothing to do with ancient Jewish patriarchs sincethe haplogroup appears to have penetrated the Lembapopulation some 625plusmn200 ybp around the 14th centuryCE

(i) Native American Haplogroup Q1a3a contains atleast six lineages the oldest of which originated16000plusmn3300 years ago in accord with archaeologicaldata

The methods used in the present study are described inthe companion article

The set of Basque R1b haplotypes was already discussedas an example in the companion article with the follow-ing results The common ancestor of the Basques lived3625plusmn370 years ago and had the following haplotype

13-24-14-11-11-14-12-12-12-13-13-29-17-9-10-11-11-25-14-18-29-15-15-17-17

Since the 750 19-marker Iberian R1b1 haplotypes con-tain 16 identical base haplotypes the time span to theircommon ancestor can also be calculated asln(75016)00285 = 135 generations (without a correc-tion for back mutations) or 156 generations with thecorrection (see Table 2 in the companion paper) to thecommon ancestor that is 3900 ybp (by thelogarithmic method) It is only 76 higher than theabove value obtained by mutations count (by thelinear method) and fits well into the standard errorof the calculation

It is remarkable that the Basque ancestral 25-markerhaplotype is practically identical to the 25-marker ances-tral haplotypes of R1b-U152 and R1b1b2-M269 sub-clades

13-24-14-11-11-14-12-12-12-13-13-29-17-9-10-11-11-25- - -29-15-15-17-17

with the respective common ancestors having lived4375plusmn450 and 4450plusmn460 years before present(Klyosov 2008a) with estimated error being shown asthe 95 confidence interval

As was considered in detail in the preceding paper (PartI) these relatively tight confidence intervals result fromthe large number of alleles involved in each of theconsidered haplotype series such as 750 of the 19-mark-er Iberian R1b1 haplotype series (14250 alleles) 184 ofthe 25-marker U152 haplotypes (4600 alleles) and 197of the 25-marker M269 haplotypes (4925 alleles) Thestandard errors of the measurementsndashfor the averagenumber of mutations per marker in said haplotypeseriesndashresulted in 95 confidence intervals of plusmn200284 and 273 respectively while the correspond-ing values for the mutation rates for the employed 12-19- and 25-marker haplotypes were equal to 10 as itwas discussed in Part I Naturally for smaller haplotypeseries the standard errors and standard deviations aresignificantly higher as shown below

There are only two differences in alleles (in bold) of theBasque (Iberian) and the M269-U152 base haplotypes(in DYS 437 and 448) which are 14-18 in the Basquebase haplotype and 15-19 in the latter These mutationsare quite insignificant because the corresponding meanvalues in the Basque haplotypes are 1453 and 1835respectively

The data show that all three populations including theBasques are likely to be descendents from the samecommon ancestor of the R1b1b2-M269 haplogroupThe principal conclusion is that the male Basques livingtoday have rather recent roots of less than four thousandyears contrary to legend that proposes they lived some30000 years ago Despite the ancient language it isvery likely that the present day Basques represent arather recent Iberian population in terms of DNAgenealogy It is very unlikely that their ancestors hadencountered Neanderthals in Europe or had been associ-ated with the Aurignacian culture (34000-23000 ybp)nor did they make sophisticated cave paintings in Southof France Spain and Portugal Arguably the Basqueancient and unique language was brought to Iberiaaround 3600 ybp by the M269 bearers from their placeof preceding location(s) andor their origin presumablyin Asia The origin of R1b1b2 however is beyond thescope of this study and will be discussed in more detailelsewhere

219Klyosov DNA genealogy mutation rates etc Part II Walking the map

A list of 243 of 19-marker Irish haplotypes encompass-ing 35 surnames with origins in the province of Munsterwas published in (McEvoy et al 2008) The listedhaplotypes were not identified by the authors in terms ofhaplogroups However when a haplotype tree wascomposed as shown in it became obvious thatit included quite a distinct branch of 25 haplotypes of adifferent origin which clearly descended from a differ-ent common ancestor from apparently a different haplo-group Indeed those 25 haplotypes had the followingbase haplotype (in the format DYS 19-388-3891-3892-390-391-392-393-434-435-436-437-438-439-460-461-462-385a-385b employed by Adams et al 2008)

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10-11-12-12-15

It was identified as a member of Haplogroup I2 Forexample the Iberian base I2 haplotype deduced from anextended haplotype list in (Adams et al 2008) is asfollows

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10- -12- -15

It differs by only one mutation on two markers (shownin bold) and its assumed common ancestor lived appar-ently 12800plusmn2600 ybp (Klyosov 2009a) The 25 IrishI2 haplotypes contain 199 mutations which bring theircommon ancestor to 9600plusmn1200 ybp This is evidentlya combination of subclades I2a1 (5600plusmn620 ybp) I2a2(6250plusmn800 and 2275plusmn380 ybp for two separate branch-es) I2b1 (5700plusmn590 ybp) and I2b2 (5000plusmn630 ybp)with timespans to the respective common ancestorsshown in parentheses (Klyosov to be published) Nev-

220

ertheless this apparent dating shows an upper limit ofthe respective timespan and depends on the fraction ofthe subclades that are represented in the dataset

Since a difference in two mutations in two 19-markerhaplotypes approximately corresponds to a time differ-ence between them of about 1900plusmn1300 years it bringsthe two ancestral I2 haplotypes (in the Iberia and on theIsles) into a rather close proximity in time Besides theancient I2 haplotypes in Ireland and Iberia resemble theancient I2 base haplotype in Eastern Europe (PolandUkraine Belarus Russia) which in said 19-marker for-mat is as follows

15-13-13-16-23-10-12-13-X-X-X-14-X-11-X-X-X-15-15

and its common ancestor lived 10800plusmn1170 ybp(Klyosov 2009a) The same comment regarding acombination of I2a and I2b subclades (see above) isapplicable here as well

When the 25 ancient I2 haplotypes were removed fromthe tree of the Irish haplotypes the presumably R1b1218-haplotype tree became as shown in Itsbase haplotype

14-12-13-16-24-11-13-13-11-11-12-15-12-12-11-12-11-11-14

turned out to be identical with the Iberian R1b1 ancestralhaplotype (see the preceding paper Part I) and with theAtlantic Modal Haplotype shown in the same format

221Klyosov DNA genealogy mutation rates etc Part II Walking the map

14-12-13-16-24-11-13-13- X- X- Y- 15-12-12-11- X-X- 11-14

Here X replaces the alleles which are not part of the67-marker FTDNA format and Y stands for DYS436which is not specified for the AMH The same haplo-type is the base one for the subclade U152 (R1b1c10)with a common ancestor of 4125plusmn450 ybp P312(3950plusmn400 ybp) L21 (3600plusmn370 ybp) and for R1b1b2(M269) haplogroup (with subclades) with a commonancestor of 4375plusmn450 ybp (Klyosov 2008a and to bepublished) and only one mutation away (DYS390=23)for R1b-U106 (4175plusmn430 ybp) Hence they all arelikely to have a rather recent origin where the wordldquorecentrdquo is used in comparison to some widespreadexpectations in the literature of some 30000 years to acommon ancestor

All the 218 Irish R1b1 haplotypes have 731 mutationsin their 4142 alleles which brings their common ances-tor to 3350plusmn360 ybp The average number of mutationsper marker is slightly lower in the Irish R1b1 haplotypes(0176plusmn0007) compared to the Iberian ones(0196plusmn0004) the mutation rate was the same in thecalculations of the 19-marker haplotypes It appearsthat the M269 bearers had come to both Iberia andIreland fairly late compared with the time of their inhab-iting the continent or elsewhere where the commonancestor of R1b1b2 subclades lived between 3600plusmn370and 4175plusmn430 ybp (see above)

Since this dataset (McEvoy et al 2008) was publishedwithout assigning of haplotypes to haplogroups onemight think that once the I2 haplotypes were separatedfrom the series the set would contain some other extra-neous haplotypes from other haplogroups besides R1b1and the dating would be distorted However it is veryunlikely First the tree ( ) does not containobviously foreign branches which would otherwiseproduce a very distinct protuberances Second thevalue on the marker DYS392 in the remaining haplo-types was completely characteristic of Haplogroup R1b(predominantly a value of 13 with a few 14s and one12) Third the young age of 3350plusmn360 ybp for thisIrish dataset which is even younger than that of anumber of R1b1b2 subclades (3600plusmn370 to 4175plusmn430ybp) shows that there were no any noticeable admix-tures of haplogroups other than R1b which wouldsharply increase the age of the dataset

It might appear that the common ancestor(s) of thisgroup of Irish individuals had arrived to the Isles some-what late compared to his relatives in Iberia Howeverwe know that the asymmetry of mutations can affect theobserved number of mutations per marker and if theIberian R1b1 haplotype series is more asymmetrical interms of mutations compared to the Irish one it mightexplain the difference

However it is not so The Iberian haplotypes arepractically symmetrical (the degree of mutations is056) and from 0196plusmn0004 (the observed number ofmutations per marker without the corrections) the cor-rections brought the result to 0218plusmn0004 (correctedfor back mutations) and further to 0217plusmn0004(corrected for asymmetry of mutations) The degree ofasymmetry of the Irish 188 haplotypes was 054 so therespective results will be 0176plusmn0007 0192plusmn0007 and0190plusmn0007 Hence the Irish R1b1 haplotypes mightstill appear to be a little younger compared to theIberian R1b1 haplotypes

The pattern of the Irish R1b1 haplotypes however is abit more complicated since the above series of 218haplotypes also contains two different base haplotypesone in which DYS439=11 (shown in bold) and thereare 13 of such base haplotypes in the whole haplotypeseries

14-12-13-16-24-11-13-13-11-11-12-15-12- -11-12-11-11-14

and another in which DYS391=10 and DYS385b=15(shown in bold) and there are 24 of such base haplo-types among the total of 218

14-12-13-16-24- -13-13-11-11-12-15-12-12-11-12-11-11-

The last one is obviously a young lineage since the 24base haplotypes are sitting on the 98-haplotype branch(on the left-hand side in ) which gives an esti-mate of the TSCA for this branch of ln(9824)00285 =49 generations (without the correction) and 52 genera-tions with the correction for back mutations that isaround 1300 ybp The older 13 base haplotypes inthe total amount of 218 haplotypes would give2750plusmn290 years to the common ancestor

This dating actually matches fairly well the TSCA for1242 haplotypes from the British Isles for which 262haplotypes are identical to each other hence the basehaplotypes in the FTDNA format (X and Y stand for nottyped DYS385ab)

13-24-14-X-Y-12-12-12-13-13-29

The fraction of the base haplotype givesln(1242262)00179 = 87 generations (without correc-tion) or 96 generations with the correction that is2400plusmn250 years to the common ancestor This popula-tion of R1b1 is certainly younger compared with theIberian R1b1 series of haplotypes

Which of the base (ancestral) R1b1 haplotypes wereyounger in terms of the TSCA the Irish or the Iberianwas further examined using a larger series of 983 Irish

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 2: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

218

(e) South India Chenchu R1a1 match the current Rus-sian Slavic R1a1 haplotypes and the Chenchu R1acommon ancestor appeared some 3200plusmn1900 ybp ap-parently after the R1a1 migration from the North toIndia Another Chenchu R1a1 lineage originated about350plusmn350 ybp around the 17th century CE

(f) The so-called Cohen Modal Haplotype in Haplo-group J1 originated 9000plusmn1400 years ago if all relatedJ1 haplotypes are considered About 4000plusmn520 ybp itappeared in the proto-Jewish population and1050plusmn190 years ago (if to consider only CMH) or1400plusmn260 years ago (if to consider only Jewish J1population) split a recent CMH lineage

(g) another so-called CMH of Haplogroup J2 appearedin the Jewish population 1375plusmn300 years ago

(h)The South African Lemba population of HaplogroupJ has nothing to do with ancient Jewish patriarchs sincethe haplogroup appears to have penetrated the Lembapopulation some 625plusmn200 ybp around the 14th centuryCE

(i) Native American Haplogroup Q1a3a contains atleast six lineages the oldest of which originated16000plusmn3300 years ago in accord with archaeologicaldata

The methods used in the present study are described inthe companion article

The set of Basque R1b haplotypes was already discussedas an example in the companion article with the follow-ing results The common ancestor of the Basques lived3625plusmn370 years ago and had the following haplotype

13-24-14-11-11-14-12-12-12-13-13-29-17-9-10-11-11-25-14-18-29-15-15-17-17

Since the 750 19-marker Iberian R1b1 haplotypes con-tain 16 identical base haplotypes the time span to theircommon ancestor can also be calculated asln(75016)00285 = 135 generations (without a correc-tion for back mutations) or 156 generations with thecorrection (see Table 2 in the companion paper) to thecommon ancestor that is 3900 ybp (by thelogarithmic method) It is only 76 higher than theabove value obtained by mutations count (by thelinear method) and fits well into the standard errorof the calculation

It is remarkable that the Basque ancestral 25-markerhaplotype is practically identical to the 25-marker ances-tral haplotypes of R1b-U152 and R1b1b2-M269 sub-clades

13-24-14-11-11-14-12-12-12-13-13-29-17-9-10-11-11-25- - -29-15-15-17-17

with the respective common ancestors having lived4375plusmn450 and 4450plusmn460 years before present(Klyosov 2008a) with estimated error being shown asthe 95 confidence interval

As was considered in detail in the preceding paper (PartI) these relatively tight confidence intervals result fromthe large number of alleles involved in each of theconsidered haplotype series such as 750 of the 19-mark-er Iberian R1b1 haplotype series (14250 alleles) 184 ofthe 25-marker U152 haplotypes (4600 alleles) and 197of the 25-marker M269 haplotypes (4925 alleles) Thestandard errors of the measurementsndashfor the averagenumber of mutations per marker in said haplotypeseriesndashresulted in 95 confidence intervals of plusmn200284 and 273 respectively while the correspond-ing values for the mutation rates for the employed 12-19- and 25-marker haplotypes were equal to 10 as itwas discussed in Part I Naturally for smaller haplotypeseries the standard errors and standard deviations aresignificantly higher as shown below

There are only two differences in alleles (in bold) of theBasque (Iberian) and the M269-U152 base haplotypes(in DYS 437 and 448) which are 14-18 in the Basquebase haplotype and 15-19 in the latter These mutationsare quite insignificant because the corresponding meanvalues in the Basque haplotypes are 1453 and 1835respectively

The data show that all three populations including theBasques are likely to be descendents from the samecommon ancestor of the R1b1b2-M269 haplogroupThe principal conclusion is that the male Basques livingtoday have rather recent roots of less than four thousandyears contrary to legend that proposes they lived some30000 years ago Despite the ancient language it isvery likely that the present day Basques represent arather recent Iberian population in terms of DNAgenealogy It is very unlikely that their ancestors hadencountered Neanderthals in Europe or had been associ-ated with the Aurignacian culture (34000-23000 ybp)nor did they make sophisticated cave paintings in Southof France Spain and Portugal Arguably the Basqueancient and unique language was brought to Iberiaaround 3600 ybp by the M269 bearers from their placeof preceding location(s) andor their origin presumablyin Asia The origin of R1b1b2 however is beyond thescope of this study and will be discussed in more detailelsewhere

219Klyosov DNA genealogy mutation rates etc Part II Walking the map

A list of 243 of 19-marker Irish haplotypes encompass-ing 35 surnames with origins in the province of Munsterwas published in (McEvoy et al 2008) The listedhaplotypes were not identified by the authors in terms ofhaplogroups However when a haplotype tree wascomposed as shown in it became obvious thatit included quite a distinct branch of 25 haplotypes of adifferent origin which clearly descended from a differ-ent common ancestor from apparently a different haplo-group Indeed those 25 haplotypes had the followingbase haplotype (in the format DYS 19-388-3891-3892-390-391-392-393-434-435-436-437-438-439-460-461-462-385a-385b employed by Adams et al 2008)

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10-11-12-12-15

It was identified as a member of Haplogroup I2 Forexample the Iberian base I2 haplotype deduced from anextended haplotype list in (Adams et al 2008) is asfollows

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10- -12- -15

It differs by only one mutation on two markers (shownin bold) and its assumed common ancestor lived appar-ently 12800plusmn2600 ybp (Klyosov 2009a) The 25 IrishI2 haplotypes contain 199 mutations which bring theircommon ancestor to 9600plusmn1200 ybp This is evidentlya combination of subclades I2a1 (5600plusmn620 ybp) I2a2(6250plusmn800 and 2275plusmn380 ybp for two separate branch-es) I2b1 (5700plusmn590 ybp) and I2b2 (5000plusmn630 ybp)with timespans to the respective common ancestorsshown in parentheses (Klyosov to be published) Nev-

220

ertheless this apparent dating shows an upper limit ofthe respective timespan and depends on the fraction ofthe subclades that are represented in the dataset

Since a difference in two mutations in two 19-markerhaplotypes approximately corresponds to a time differ-ence between them of about 1900plusmn1300 years it bringsthe two ancestral I2 haplotypes (in the Iberia and on theIsles) into a rather close proximity in time Besides theancient I2 haplotypes in Ireland and Iberia resemble theancient I2 base haplotype in Eastern Europe (PolandUkraine Belarus Russia) which in said 19-marker for-mat is as follows

15-13-13-16-23-10-12-13-X-X-X-14-X-11-X-X-X-15-15

and its common ancestor lived 10800plusmn1170 ybp(Klyosov 2009a) The same comment regarding acombination of I2a and I2b subclades (see above) isapplicable here as well

When the 25 ancient I2 haplotypes were removed fromthe tree of the Irish haplotypes the presumably R1b1218-haplotype tree became as shown in Itsbase haplotype

14-12-13-16-24-11-13-13-11-11-12-15-12-12-11-12-11-11-14

turned out to be identical with the Iberian R1b1 ancestralhaplotype (see the preceding paper Part I) and with theAtlantic Modal Haplotype shown in the same format

221Klyosov DNA genealogy mutation rates etc Part II Walking the map

14-12-13-16-24-11-13-13- X- X- Y- 15-12-12-11- X-X- 11-14

Here X replaces the alleles which are not part of the67-marker FTDNA format and Y stands for DYS436which is not specified for the AMH The same haplo-type is the base one for the subclade U152 (R1b1c10)with a common ancestor of 4125plusmn450 ybp P312(3950plusmn400 ybp) L21 (3600plusmn370 ybp) and for R1b1b2(M269) haplogroup (with subclades) with a commonancestor of 4375plusmn450 ybp (Klyosov 2008a and to bepublished) and only one mutation away (DYS390=23)for R1b-U106 (4175plusmn430 ybp) Hence they all arelikely to have a rather recent origin where the wordldquorecentrdquo is used in comparison to some widespreadexpectations in the literature of some 30000 years to acommon ancestor

All the 218 Irish R1b1 haplotypes have 731 mutationsin their 4142 alleles which brings their common ances-tor to 3350plusmn360 ybp The average number of mutationsper marker is slightly lower in the Irish R1b1 haplotypes(0176plusmn0007) compared to the Iberian ones(0196plusmn0004) the mutation rate was the same in thecalculations of the 19-marker haplotypes It appearsthat the M269 bearers had come to both Iberia andIreland fairly late compared with the time of their inhab-iting the continent or elsewhere where the commonancestor of R1b1b2 subclades lived between 3600plusmn370and 4175plusmn430 ybp (see above)

Since this dataset (McEvoy et al 2008) was publishedwithout assigning of haplotypes to haplogroups onemight think that once the I2 haplotypes were separatedfrom the series the set would contain some other extra-neous haplotypes from other haplogroups besides R1b1and the dating would be distorted However it is veryunlikely First the tree ( ) does not containobviously foreign branches which would otherwiseproduce a very distinct protuberances Second thevalue on the marker DYS392 in the remaining haplo-types was completely characteristic of Haplogroup R1b(predominantly a value of 13 with a few 14s and one12) Third the young age of 3350plusmn360 ybp for thisIrish dataset which is even younger than that of anumber of R1b1b2 subclades (3600plusmn370 to 4175plusmn430ybp) shows that there were no any noticeable admix-tures of haplogroups other than R1b which wouldsharply increase the age of the dataset

It might appear that the common ancestor(s) of thisgroup of Irish individuals had arrived to the Isles some-what late compared to his relatives in Iberia Howeverwe know that the asymmetry of mutations can affect theobserved number of mutations per marker and if theIberian R1b1 haplotype series is more asymmetrical interms of mutations compared to the Irish one it mightexplain the difference

However it is not so The Iberian haplotypes arepractically symmetrical (the degree of mutations is056) and from 0196plusmn0004 (the observed number ofmutations per marker without the corrections) the cor-rections brought the result to 0218plusmn0004 (correctedfor back mutations) and further to 0217plusmn0004(corrected for asymmetry of mutations) The degree ofasymmetry of the Irish 188 haplotypes was 054 so therespective results will be 0176plusmn0007 0192plusmn0007 and0190plusmn0007 Hence the Irish R1b1 haplotypes mightstill appear to be a little younger compared to theIberian R1b1 haplotypes

The pattern of the Irish R1b1 haplotypes however is abit more complicated since the above series of 218haplotypes also contains two different base haplotypesone in which DYS439=11 (shown in bold) and thereare 13 of such base haplotypes in the whole haplotypeseries

14-12-13-16-24-11-13-13-11-11-12-15-12- -11-12-11-11-14

and another in which DYS391=10 and DYS385b=15(shown in bold) and there are 24 of such base haplo-types among the total of 218

14-12-13-16-24- -13-13-11-11-12-15-12-12-11-12-11-11-

The last one is obviously a young lineage since the 24base haplotypes are sitting on the 98-haplotype branch(on the left-hand side in ) which gives an esti-mate of the TSCA for this branch of ln(9824)00285 =49 generations (without the correction) and 52 genera-tions with the correction for back mutations that isaround 1300 ybp The older 13 base haplotypes inthe total amount of 218 haplotypes would give2750plusmn290 years to the common ancestor

This dating actually matches fairly well the TSCA for1242 haplotypes from the British Isles for which 262haplotypes are identical to each other hence the basehaplotypes in the FTDNA format (X and Y stand for nottyped DYS385ab)

13-24-14-X-Y-12-12-12-13-13-29

The fraction of the base haplotype givesln(1242262)00179 = 87 generations (without correc-tion) or 96 generations with the correction that is2400plusmn250 years to the common ancestor This popula-tion of R1b1 is certainly younger compared with theIberian R1b1 series of haplotypes

Which of the base (ancestral) R1b1 haplotypes wereyounger in terms of the TSCA the Irish or the Iberianwas further examined using a larger series of 983 Irish

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 3: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

219Klyosov DNA genealogy mutation rates etc Part II Walking the map

A list of 243 of 19-marker Irish haplotypes encompass-ing 35 surnames with origins in the province of Munsterwas published in (McEvoy et al 2008) The listedhaplotypes were not identified by the authors in terms ofhaplogroups However when a haplotype tree wascomposed as shown in it became obvious thatit included quite a distinct branch of 25 haplotypes of adifferent origin which clearly descended from a differ-ent common ancestor from apparently a different haplo-group Indeed those 25 haplotypes had the followingbase haplotype (in the format DYS 19-388-3891-3892-390-391-392-393-434-435-436-437-438-439-460-461-462-385a-385b employed by Adams et al 2008)

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10-11-12-12-15

It was identified as a member of Haplogroup I2 Forexample the Iberian base I2 haplotype deduced from anextended haplotype list in (Adams et al 2008) is asfollows

15-13-13-16-23-10-11-13-11-11-12-15-10-11-10- -12- -15

It differs by only one mutation on two markers (shownin bold) and its assumed common ancestor lived appar-ently 12800plusmn2600 ybp (Klyosov 2009a) The 25 IrishI2 haplotypes contain 199 mutations which bring theircommon ancestor to 9600plusmn1200 ybp This is evidentlya combination of subclades I2a1 (5600plusmn620 ybp) I2a2(6250plusmn800 and 2275plusmn380 ybp for two separate branch-es) I2b1 (5700plusmn590 ybp) and I2b2 (5000plusmn630 ybp)with timespans to the respective common ancestorsshown in parentheses (Klyosov to be published) Nev-

220

ertheless this apparent dating shows an upper limit ofthe respective timespan and depends on the fraction ofthe subclades that are represented in the dataset

Since a difference in two mutations in two 19-markerhaplotypes approximately corresponds to a time differ-ence between them of about 1900plusmn1300 years it bringsthe two ancestral I2 haplotypes (in the Iberia and on theIsles) into a rather close proximity in time Besides theancient I2 haplotypes in Ireland and Iberia resemble theancient I2 base haplotype in Eastern Europe (PolandUkraine Belarus Russia) which in said 19-marker for-mat is as follows

15-13-13-16-23-10-12-13-X-X-X-14-X-11-X-X-X-15-15

and its common ancestor lived 10800plusmn1170 ybp(Klyosov 2009a) The same comment regarding acombination of I2a and I2b subclades (see above) isapplicable here as well

When the 25 ancient I2 haplotypes were removed fromthe tree of the Irish haplotypes the presumably R1b1218-haplotype tree became as shown in Itsbase haplotype

14-12-13-16-24-11-13-13-11-11-12-15-12-12-11-12-11-11-14

turned out to be identical with the Iberian R1b1 ancestralhaplotype (see the preceding paper Part I) and with theAtlantic Modal Haplotype shown in the same format

221Klyosov DNA genealogy mutation rates etc Part II Walking the map

14-12-13-16-24-11-13-13- X- X- Y- 15-12-12-11- X-X- 11-14

Here X replaces the alleles which are not part of the67-marker FTDNA format and Y stands for DYS436which is not specified for the AMH The same haplo-type is the base one for the subclade U152 (R1b1c10)with a common ancestor of 4125plusmn450 ybp P312(3950plusmn400 ybp) L21 (3600plusmn370 ybp) and for R1b1b2(M269) haplogroup (with subclades) with a commonancestor of 4375plusmn450 ybp (Klyosov 2008a and to bepublished) and only one mutation away (DYS390=23)for R1b-U106 (4175plusmn430 ybp) Hence they all arelikely to have a rather recent origin where the wordldquorecentrdquo is used in comparison to some widespreadexpectations in the literature of some 30000 years to acommon ancestor

All the 218 Irish R1b1 haplotypes have 731 mutationsin their 4142 alleles which brings their common ances-tor to 3350plusmn360 ybp The average number of mutationsper marker is slightly lower in the Irish R1b1 haplotypes(0176plusmn0007) compared to the Iberian ones(0196plusmn0004) the mutation rate was the same in thecalculations of the 19-marker haplotypes It appearsthat the M269 bearers had come to both Iberia andIreland fairly late compared with the time of their inhab-iting the continent or elsewhere where the commonancestor of R1b1b2 subclades lived between 3600plusmn370and 4175plusmn430 ybp (see above)

Since this dataset (McEvoy et al 2008) was publishedwithout assigning of haplotypes to haplogroups onemight think that once the I2 haplotypes were separatedfrom the series the set would contain some other extra-neous haplotypes from other haplogroups besides R1b1and the dating would be distorted However it is veryunlikely First the tree ( ) does not containobviously foreign branches which would otherwiseproduce a very distinct protuberances Second thevalue on the marker DYS392 in the remaining haplo-types was completely characteristic of Haplogroup R1b(predominantly a value of 13 with a few 14s and one12) Third the young age of 3350plusmn360 ybp for thisIrish dataset which is even younger than that of anumber of R1b1b2 subclades (3600plusmn370 to 4175plusmn430ybp) shows that there were no any noticeable admix-tures of haplogroups other than R1b which wouldsharply increase the age of the dataset

It might appear that the common ancestor(s) of thisgroup of Irish individuals had arrived to the Isles some-what late compared to his relatives in Iberia Howeverwe know that the asymmetry of mutations can affect theobserved number of mutations per marker and if theIberian R1b1 haplotype series is more asymmetrical interms of mutations compared to the Irish one it mightexplain the difference

However it is not so The Iberian haplotypes arepractically symmetrical (the degree of mutations is056) and from 0196plusmn0004 (the observed number ofmutations per marker without the corrections) the cor-rections brought the result to 0218plusmn0004 (correctedfor back mutations) and further to 0217plusmn0004(corrected for asymmetry of mutations) The degree ofasymmetry of the Irish 188 haplotypes was 054 so therespective results will be 0176plusmn0007 0192plusmn0007 and0190plusmn0007 Hence the Irish R1b1 haplotypes mightstill appear to be a little younger compared to theIberian R1b1 haplotypes

The pattern of the Irish R1b1 haplotypes however is abit more complicated since the above series of 218haplotypes also contains two different base haplotypesone in which DYS439=11 (shown in bold) and thereare 13 of such base haplotypes in the whole haplotypeseries

14-12-13-16-24-11-13-13-11-11-12-15-12- -11-12-11-11-14

and another in which DYS391=10 and DYS385b=15(shown in bold) and there are 24 of such base haplo-types among the total of 218

14-12-13-16-24- -13-13-11-11-12-15-12-12-11-12-11-11-

The last one is obviously a young lineage since the 24base haplotypes are sitting on the 98-haplotype branch(on the left-hand side in ) which gives an esti-mate of the TSCA for this branch of ln(9824)00285 =49 generations (without the correction) and 52 genera-tions with the correction for back mutations that isaround 1300 ybp The older 13 base haplotypes inthe total amount of 218 haplotypes would give2750plusmn290 years to the common ancestor

This dating actually matches fairly well the TSCA for1242 haplotypes from the British Isles for which 262haplotypes are identical to each other hence the basehaplotypes in the FTDNA format (X and Y stand for nottyped DYS385ab)

13-24-14-X-Y-12-12-12-13-13-29

The fraction of the base haplotype givesln(1242262)00179 = 87 generations (without correc-tion) or 96 generations with the correction that is2400plusmn250 years to the common ancestor This popula-tion of R1b1 is certainly younger compared with theIberian R1b1 series of haplotypes

Which of the base (ancestral) R1b1 haplotypes wereyounger in terms of the TSCA the Irish or the Iberianwas further examined using a larger series of 983 Irish

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 4: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

220

ertheless this apparent dating shows an upper limit ofthe respective timespan and depends on the fraction ofthe subclades that are represented in the dataset

Since a difference in two mutations in two 19-markerhaplotypes approximately corresponds to a time differ-ence between them of about 1900plusmn1300 years it bringsthe two ancestral I2 haplotypes (in the Iberia and on theIsles) into a rather close proximity in time Besides theancient I2 haplotypes in Ireland and Iberia resemble theancient I2 base haplotype in Eastern Europe (PolandUkraine Belarus Russia) which in said 19-marker for-mat is as follows

15-13-13-16-23-10-12-13-X-X-X-14-X-11-X-X-X-15-15

and its common ancestor lived 10800plusmn1170 ybp(Klyosov 2009a) The same comment regarding acombination of I2a and I2b subclades (see above) isapplicable here as well

When the 25 ancient I2 haplotypes were removed fromthe tree of the Irish haplotypes the presumably R1b1218-haplotype tree became as shown in Itsbase haplotype

14-12-13-16-24-11-13-13-11-11-12-15-12-12-11-12-11-11-14

turned out to be identical with the Iberian R1b1 ancestralhaplotype (see the preceding paper Part I) and with theAtlantic Modal Haplotype shown in the same format

221Klyosov DNA genealogy mutation rates etc Part II Walking the map

14-12-13-16-24-11-13-13- X- X- Y- 15-12-12-11- X-X- 11-14

Here X replaces the alleles which are not part of the67-marker FTDNA format and Y stands for DYS436which is not specified for the AMH The same haplo-type is the base one for the subclade U152 (R1b1c10)with a common ancestor of 4125plusmn450 ybp P312(3950plusmn400 ybp) L21 (3600plusmn370 ybp) and for R1b1b2(M269) haplogroup (with subclades) with a commonancestor of 4375plusmn450 ybp (Klyosov 2008a and to bepublished) and only one mutation away (DYS390=23)for R1b-U106 (4175plusmn430 ybp) Hence they all arelikely to have a rather recent origin where the wordldquorecentrdquo is used in comparison to some widespreadexpectations in the literature of some 30000 years to acommon ancestor

All the 218 Irish R1b1 haplotypes have 731 mutationsin their 4142 alleles which brings their common ances-tor to 3350plusmn360 ybp The average number of mutationsper marker is slightly lower in the Irish R1b1 haplotypes(0176plusmn0007) compared to the Iberian ones(0196plusmn0004) the mutation rate was the same in thecalculations of the 19-marker haplotypes It appearsthat the M269 bearers had come to both Iberia andIreland fairly late compared with the time of their inhab-iting the continent or elsewhere where the commonancestor of R1b1b2 subclades lived between 3600plusmn370and 4175plusmn430 ybp (see above)

Since this dataset (McEvoy et al 2008) was publishedwithout assigning of haplotypes to haplogroups onemight think that once the I2 haplotypes were separatedfrom the series the set would contain some other extra-neous haplotypes from other haplogroups besides R1b1and the dating would be distorted However it is veryunlikely First the tree ( ) does not containobviously foreign branches which would otherwiseproduce a very distinct protuberances Second thevalue on the marker DYS392 in the remaining haplo-types was completely characteristic of Haplogroup R1b(predominantly a value of 13 with a few 14s and one12) Third the young age of 3350plusmn360 ybp for thisIrish dataset which is even younger than that of anumber of R1b1b2 subclades (3600plusmn370 to 4175plusmn430ybp) shows that there were no any noticeable admix-tures of haplogroups other than R1b which wouldsharply increase the age of the dataset

It might appear that the common ancestor(s) of thisgroup of Irish individuals had arrived to the Isles some-what late compared to his relatives in Iberia Howeverwe know that the asymmetry of mutations can affect theobserved number of mutations per marker and if theIberian R1b1 haplotype series is more asymmetrical interms of mutations compared to the Irish one it mightexplain the difference

However it is not so The Iberian haplotypes arepractically symmetrical (the degree of mutations is056) and from 0196plusmn0004 (the observed number ofmutations per marker without the corrections) the cor-rections brought the result to 0218plusmn0004 (correctedfor back mutations) and further to 0217plusmn0004(corrected for asymmetry of mutations) The degree ofasymmetry of the Irish 188 haplotypes was 054 so therespective results will be 0176plusmn0007 0192plusmn0007 and0190plusmn0007 Hence the Irish R1b1 haplotypes mightstill appear to be a little younger compared to theIberian R1b1 haplotypes

The pattern of the Irish R1b1 haplotypes however is abit more complicated since the above series of 218haplotypes also contains two different base haplotypesone in which DYS439=11 (shown in bold) and thereare 13 of such base haplotypes in the whole haplotypeseries

14-12-13-16-24-11-13-13-11-11-12-15-12- -11-12-11-11-14

and another in which DYS391=10 and DYS385b=15(shown in bold) and there are 24 of such base haplo-types among the total of 218

14-12-13-16-24- -13-13-11-11-12-15-12-12-11-12-11-11-

The last one is obviously a young lineage since the 24base haplotypes are sitting on the 98-haplotype branch(on the left-hand side in ) which gives an esti-mate of the TSCA for this branch of ln(9824)00285 =49 generations (without the correction) and 52 genera-tions with the correction for back mutations that isaround 1300 ybp The older 13 base haplotypes inthe total amount of 218 haplotypes would give2750plusmn290 years to the common ancestor

This dating actually matches fairly well the TSCA for1242 haplotypes from the British Isles for which 262haplotypes are identical to each other hence the basehaplotypes in the FTDNA format (X and Y stand for nottyped DYS385ab)

13-24-14-X-Y-12-12-12-13-13-29

The fraction of the base haplotype givesln(1242262)00179 = 87 generations (without correc-tion) or 96 generations with the correction that is2400plusmn250 years to the common ancestor This popula-tion of R1b1 is certainly younger compared with theIberian R1b1 series of haplotypes

Which of the base (ancestral) R1b1 haplotypes wereyounger in terms of the TSCA the Irish or the Iberianwas further examined using a larger series of 983 Irish

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 5: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

221Klyosov DNA genealogy mutation rates etc Part II Walking the map

14-12-13-16-24-11-13-13- X- X- Y- 15-12-12-11- X-X- 11-14

Here X replaces the alleles which are not part of the67-marker FTDNA format and Y stands for DYS436which is not specified for the AMH The same haplo-type is the base one for the subclade U152 (R1b1c10)with a common ancestor of 4125plusmn450 ybp P312(3950plusmn400 ybp) L21 (3600plusmn370 ybp) and for R1b1b2(M269) haplogroup (with subclades) with a commonancestor of 4375plusmn450 ybp (Klyosov 2008a and to bepublished) and only one mutation away (DYS390=23)for R1b-U106 (4175plusmn430 ybp) Hence they all arelikely to have a rather recent origin where the wordldquorecentrdquo is used in comparison to some widespreadexpectations in the literature of some 30000 years to acommon ancestor

All the 218 Irish R1b1 haplotypes have 731 mutationsin their 4142 alleles which brings their common ances-tor to 3350plusmn360 ybp The average number of mutationsper marker is slightly lower in the Irish R1b1 haplotypes(0176plusmn0007) compared to the Iberian ones(0196plusmn0004) the mutation rate was the same in thecalculations of the 19-marker haplotypes It appearsthat the M269 bearers had come to both Iberia andIreland fairly late compared with the time of their inhab-iting the continent or elsewhere where the commonancestor of R1b1b2 subclades lived between 3600plusmn370and 4175plusmn430 ybp (see above)

Since this dataset (McEvoy et al 2008) was publishedwithout assigning of haplotypes to haplogroups onemight think that once the I2 haplotypes were separatedfrom the series the set would contain some other extra-neous haplotypes from other haplogroups besides R1b1and the dating would be distorted However it is veryunlikely First the tree ( ) does not containobviously foreign branches which would otherwiseproduce a very distinct protuberances Second thevalue on the marker DYS392 in the remaining haplo-types was completely characteristic of Haplogroup R1b(predominantly a value of 13 with a few 14s and one12) Third the young age of 3350plusmn360 ybp for thisIrish dataset which is even younger than that of anumber of R1b1b2 subclades (3600plusmn370 to 4175plusmn430ybp) shows that there were no any noticeable admix-tures of haplogroups other than R1b which wouldsharply increase the age of the dataset

It might appear that the common ancestor(s) of thisgroup of Irish individuals had arrived to the Isles some-what late compared to his relatives in Iberia Howeverwe know that the asymmetry of mutations can affect theobserved number of mutations per marker and if theIberian R1b1 haplotype series is more asymmetrical interms of mutations compared to the Irish one it mightexplain the difference

However it is not so The Iberian haplotypes arepractically symmetrical (the degree of mutations is056) and from 0196plusmn0004 (the observed number ofmutations per marker without the corrections) the cor-rections brought the result to 0218plusmn0004 (correctedfor back mutations) and further to 0217plusmn0004(corrected for asymmetry of mutations) The degree ofasymmetry of the Irish 188 haplotypes was 054 so therespective results will be 0176plusmn0007 0192plusmn0007 and0190plusmn0007 Hence the Irish R1b1 haplotypes mightstill appear to be a little younger compared to theIberian R1b1 haplotypes

The pattern of the Irish R1b1 haplotypes however is abit more complicated since the above series of 218haplotypes also contains two different base haplotypesone in which DYS439=11 (shown in bold) and thereare 13 of such base haplotypes in the whole haplotypeseries

14-12-13-16-24-11-13-13-11-11-12-15-12- -11-12-11-11-14

and another in which DYS391=10 and DYS385b=15(shown in bold) and there are 24 of such base haplo-types among the total of 218

14-12-13-16-24- -13-13-11-11-12-15-12-12-11-12-11-11-

The last one is obviously a young lineage since the 24base haplotypes are sitting on the 98-haplotype branch(on the left-hand side in ) which gives an esti-mate of the TSCA for this branch of ln(9824)00285 =49 generations (without the correction) and 52 genera-tions with the correction for back mutations that isaround 1300 ybp The older 13 base haplotypes inthe total amount of 218 haplotypes would give2750plusmn290 years to the common ancestor

This dating actually matches fairly well the TSCA for1242 haplotypes from the British Isles for which 262haplotypes are identical to each other hence the basehaplotypes in the FTDNA format (X and Y stand for nottyped DYS385ab)

13-24-14-X-Y-12-12-12-13-13-29

The fraction of the base haplotype givesln(1242262)00179 = 87 generations (without correc-tion) or 96 generations with the correction that is2400plusmn250 years to the common ancestor This popula-tion of R1b1 is certainly younger compared with theIberian R1b1 series of haplotypes

Which of the base (ancestral) R1b1 haplotypes wereyounger in terms of the TSCA the Irish or the Iberianwas further examined using a larger series of 983 Irish

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 6: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

222

R1b1 haplotypes published in (McEvoy and Bradley2006) Their base haplotypes was as follows

14-12-13-16-24-11-13-13-9-11-12-15-12-12-11-10-11-11-14

In all 983 haplotypes 966 had the allele of 9 (98 oftotal) on DYS434 while in the preceding series of 218Irish haplotypes 216 of them had 11 (99 of total)on the same very locus It seems that the authors simplychanged their notation of haplotypes We have the samesituation with DYS461 where 189 out of 218 had the12 allele (87) while in the larger series it was 10 in833 of 943 haplotypes (88) It appears that thesehaplotypes are in fact identical to each other

All the 983 R1b1 haplotypes have 3706 mutations fromthe base haplotype that is 0198plusmn0003 for the averagenumber of mutations per marker This is practicallyidentical with 0196plusmn0004 for the Iberian R1b1 haplo-types The degree of asymmetry for the larger series is

exactly the same (054) as in the 218-haplotype seriesand cannot shift the number of mutations per markerhence the age of the common ancestor stays at3800plusmn380 ybp Therefore the Irish and the IberianR1b1 haplotypes (3625plusmn370 ybp) have practically thesame common ancestor from the viewpoint of DNAgenealogy

It is of interest to compare them to a Central Europeanseries such as the Flemish R1b 12-marker 64-haplotypeseries (Mertens 2007) In that case a non-FTDNAformat of 12 markers was employed in which DYS426and DYS388 were replaced with DYS437 and DYS438The average mutation rate for the format was calculatedand shown in Table 1 in the preceding paper All 64haplotypes have the following base haplotype (the first10 markers in the format of the FTDNA plus DYS437and DYS438)

13-24-14-11-11-14-X-Y-12-13-13-29-15-12

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 7: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

223Klyosov DNA genealogy mutation rates etc Part II Walking the map

All 64 haplotypes contained 215 mutations which re-sults in 4150plusmn500 years to the common ancestor

All those principal ancestral (base) haplotypes as well asthe Swedish series (Karlsson et al 2006) of 76 of the9-marker R1b1b2 haplotypes with the base

13-24-14-11-11-14-X-Y-Z-13-13-29

fit to the Atlantic Modal Haplotype All the 76 haplo-types included seven base haplotypes and 187 mutationsfrom it It gives ln(767)0017 = 140 generations(without the correction) or 163 (with the correction)when the logarithmic method is employed and187769000189 = 145 generations (without the cor-rection) or 169 (with the correction for back mutations)which gives 4225plusmn520 years to the common ancestor

The TSCA values for the various R1b populations andthe data from which they were derived are shown in

The ages of the Irish (3350plusmn360 ybp and3800plusmn380 ybp) Iberian (3625plusmn370 ybp) Flemish

(4150plusmn500 ybp) and Swedish (4225plusmn520 ybp) popula-tions differ insignificantly from each other in terms oftheir standard deviations (all within the 95 confidenceinterval) However it still can provide food for thoughtabout history of the European R1b1b2 population

These haplotypes were briefly considered in the preced-ing paper (Part I) as an example for calculating theTSCA for 1527 of 25-marker haplotypes taking intoaccount the effect of back mutations and the degree ofasymmetry of mutations

These 1527 haplotypes included 857 English haplo-types 366 Irish haplotypes and 304 Scottish haplotypesAll of them turned out to be strikingly similar and verylikely descended from the same common ancestor whohad the following haplotype

Population N HaplotypeSize (mkrs)

TSCA(years)

Source of Haplotypes

Basque 17 25 3625 plusmn 370 Basque Project see Web Resources

Basque 44 12 3500 plusmn 735 Basque Project see Web Resources

Iberian 750 19 3625 plusmn 370 Adams et al (2008)

Ireland--First Series (from Munster) 218 19 3800 plusmn 380 McEvoy (2008)

Ireland--Second Series 983 19 3800 plusmn 380 McEvoy (2006)

Ireland--rdquoYoungerrdquo (from Munster) 98 19 2750 plusmn 290 McEvoy (2008)

British 1242 9 2400 plusmn 250 Campbell (2007)

Flemish 64 12 4150 plusmn 500 Mertens (2007)

Swedish 76 9 4225 plusmn 520 Karlsson (2006)

R-U106 284 25 4175 plusmn 430 Weston (2009) see Web Resources

R-U152 184 25 4125 plusmn 450 Kerchner (2008) see Web Resources

R-P312 464 25 67 3950 plusmn 400 Stevens (2009a) see Web Resources

R-L21 509 25 67 3600 plusmn 370 Srevens (2009b) see Web Resources

R-L23 22 25 67 5475 plusmn 680 Ysearch see Web Resources

R-M222 269 25 67 1450 plusmn 150 Wilson (2009) see Web Resources

R-M269 (with all subclades) 197 25 67 4375 plusmn 450

Note Where the number of markers is shown as ldquo25 67rdquo it means that the calculations were done with 25 merkers but a diagram was constructedwith 67 markers to look for possible problematic branches

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 8: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

224

13-22-14-10-13-14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

All 1527 haplotypes contained 8785 mutations from theabove base haplotype which gives the ldquoobservedrdquo valueof 0230plusmn0002 mutations per marker in the 95 confi-dence interval Since the degree of asymmetry of thehaplotype is 065 the corrected (for back mutations andthe asymmetry of mutations) value is equal to0255plusmn0003 mutationsmarker which results in139plusmn14 generations that is 3475plusmn350 years to the com-mon ancestor at the 95 confidence level

The TSCA values for the English Irish and Scottish25-marker haplotypes calculated separately (the degreeof asymmetry for each series were equal to 066 064and 064 respectively) were 136plusmn14 151plusmn16 and131plusmn15 generations that is 3400plusmn350 3775plusmn400 and3275plusmn375 ybp Indeed their averaged value equals to139plusmn10 generations and the obtained dispersion showsthat the calculated standard deviations (based on plusmn10as the 95 confidence interval) are reasonable Hencethe time span to the common ancestor of 3475plusmn350years is a reliable estimate for more than 1500 EnglishIrish and Scottish individuals

The modal haplotype for the sub-clade of HaplogroupI1 defined by P109 (presently named I1d1 according toISOGG-2009) is different from that of Isles I1 on twomarkers out of 25 (those two shown in bold)

13- -14-10- -14-11-14-11-12-11-28-15-8-9-8-11-23-16-20-28-12-14-15-16

A set of I1-P109 haplotypes has a TSCA of 2275plusmn330years (Klyosov to be published)

The value for TSCA seen in the populations discussedabove are quite typical of other European I1 popula-tions For the Northwest EuropeanScandinavian(Denmark Sweden Norway Finland) combined seriesof haplotypes the TSCA equals to 3375plusmn345 ybp Forthe Central and South Europe (Austria Belgium Neth-erlands France Italy Switzerland Spain) it equals to3425plusmn350 ybp for Germany 3225plusmn330 ybp for theEast European countries (Poland Ukraine Belarus Rus-sia Lithuania) it is equal to 3225plusmn360 ybp (to bepublished) It is of interest that even Middle Eastern I1haplotypes (Jordan Lebanon and presumably Jewishones) descend from a common ancestor who lived atabout the same time 3475plusmn480 ybp (Klyosov to bepublished)

The ASD methods gave a noticeably higher values for theTSCAs for the English Irish Scottish and the pooledhaplotype series For the first three series of 25-marker

haplotypes calculated separately the TSCAs were 158175 and 155 generations respectively with their aver-aged value equal to 162plusmn11 generations which can becompared to 139plusmn10 generations calculated by theldquolinearrdquo method (corrected for back-mutations see com-panion article) corrected for back mutations As waspointed out the ASD method typically overestimates theTSCA by 15-20 due to double and triple mutationsand some almost unavoidable extraneous haplotypes towhich the ASD method is rather sensitive In this case ofthe Isle I1 haplotypes the overestimation of the ASD-derived TSCA was a typical 17 compared with theldquolinearrdquo method (corrected for back-mutations)

The ldquomappingrdquo of the enormous territory from theAtlantic through Russia and India to the Pacific andfrom Scandinavia to the Arabian Penisula reveals thatHaplogroup R1a men are marked with practically thesame ancestral haplotype which is about 4500 - 4700years ldquooldrdquo through much of its geographic rangeExceptions in Europe are found only in the Balkans(Serbia Kosovo Macedonia Bosnia) where the com-mon ancestor is significantly more ancient about11650plusmn1550 ybp and in the Irish Scottish and Swed-ish R1a1 populations which have a significantlyldquoyoungerrdquo common ancestor some thousands yearsldquoyoungerrdquo compared with the Russian the German andthe Poland R1a1 populations These geographic pat-terns will be explored below in this section Thehaplogroup name R1a1 is used here to mean Haplo-group R-M17 because that is the meaning in nearly allof the referenced articles

The entire map of base (ancestral) haplotypes and theirmutations as well as ldquoagesrdquo of common ancestors ofR1a1 haplotypes in Europe Asia and the Middle Eastshow that approximately six thousand years ago bearersof R1a1 haplogroup started to migrate from the Balkansin all directions spreading their haplotypes A recentexcavation of 4600 year-old R1a1 haplotypes (Haak etal 2008) revealed their almost exact match to present-day R1a1 haplotypes as it is shown below

Significantly the oldest R1a population appears to be inSouthern Siberia

The 57 R1a 25-marker haplotype series of English origin(YSearch database) contains ten haplotypes that belongto a DYS388=10 series and was analyzed separatelyThe remaining 47 haplotypes contain 304 mutationscompared to the base haplotype shown below whichcorresponds to 4125plusmn475 years to a common ancestorin the 95 confidence interval The respective haplo-type tree is shown in

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 9: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

225Klyosov DNA genealogy mutation rates etc Part II Walking the map

The 52 haplotype series of Ireland origin all with 25markers (YSearch database) contains 12 haplotypeswhich belong to a DYS388=10 distinct series as shownin and was analyzed separately The remaining40 haplotypes contain 244 mutations compared to thebase haplotype shown below which corresponds to3850plusmn460 years to a common ancestor

Thus R1a1 haplotypes sampled on the British Islespoint to English and Irish common ancestors who lived4125plusmn475 and 3850plusmn460 years ago The English base(ancestral) haplotype is as follows

13-25-15-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

and the Irish one

13-25-15- -11-14-12-12-10-13-11-30-15-9-10-11-11--14 20-32-12-15-15-16

An apparent difference in two alleles between the Britishand Irish ancestral haplotypes is in fact fairly insignifi-cant since the respective average alleles are equal to1051 and 1073 and 2398 and 2355 respectivelyHence their ancestral haplotypes are practically thesame within approximately one mutational difference

About 20 of both English and Irish R1a haplotypeshave a mutated allele in eighth position in the FTDNAformat (DYS388=12agrave10) with a common ancestor ofthat population who lived 3575plusmn450 years ago (172mutations in the 30 25-marker haplotypes withDYS388=10) 61 of these haplotypes were pooled froma number of European populations ( ) and thetree splits into a relatively younger branch on the leftand the ldquoolderrdquo branch on the lower right-hand side

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 10: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

226

This DYS388=10 mutation is observed in northern andwestern Europe mainly in England Ireland Norwayand to a much lesser degree in Sweden Denmark Neth-erlands and Germany In areas further east and souththat mutation is practically absent

31 haplotypes on the left-hand side and on the top of thetree ( ) contain collectively 86 mutations fromthe base haplotype

13-25-15-10-11-14-12-10-10-13-11-30-15-9-10-11-11-25-14-19-32-12-14-14-17

which corresponds to 1625plusmn240 years to the commonancestor It is a rather recent common ancestor wholived around the 4th century CE The closeness of thebranch to the trunk of the tree in also points tothe rather recent origin of the lineage

The older 30-haplotype branch in the tree provides withthe following base DYS388=10 haplotype

13-25-16-10-11-14-12-10-10-13-11-30-15-9-10-11-11-24-14-19-32-12-14-15-16

These 30 haplotypes contain 172 mutations for whichthe linear method gives 3575plusmn450 years to the commonancestor

These two DYS388=10 base haplotypes differ from eachother by less than four mutations which brings theircommon ancestor to about 3500 ybp It is very likelythat it is the same common ancestor as that of theright-hand branch in

The upper ldquoolderrdquo base haplotype differs by six muta-tions on average from the DYS388=12 base haplotypesfrom the same area (see the English and Irish R1a1 basehaplotypes above) This brings common ancestorto about 5700plusmn600 ybp

This common ancestor of both the DYS388=12 andDYS388=10 populations lived presumably in the Bal-kans (see below)since the Balkan population is the onlyEuropean R1a1 population that is that old for almosttwo thousand years before bearers of that mutationarrived to northern and western Europe some 4000 ybp(DYS388=12) and about 3600 ybp (DYS388=10) Thismutation has continued to be passed down through thegenerations to the present time

A set of 29 R1a1 25-marker haplotypes from Scotlandhave the same ancestral haplotype as those in Englandand Ireland

(6)

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 11: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

227Klyosov DNA genealogy mutation rates etc Part II Walking the map

13-25-15-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

This set of haplotypes contained 164 mutations whichgives 3550plusmn450 years to the common ancestor

A 67-haplotype series with 25 markers from Germanyrevealed the following ancestral haplotype

13-25- -10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14 20-32-12-15-15-16

There is an apparent mutation in the third allele fromthe left (in bold) compared with the Isles ancestral R1a1haplotypes however the average value equals to 1548for England 1533 for Ireland 1526 for Scotland and1584 for Germany so there is only a small differencebetween these populations The 67 haplotypes contain488 mutations or 0291plusmn0013 mutations per markeron average This value corresponds to 4700plusmn520 yearsfrom a common ancestor in the German territory

These results are supported by the very recent datafound during excavation of remains from several malesnear Eulau Germany Tissue samples from one of themwas derived for the SNP SRY108312 (Haak et al2008) so it was from Haplogroup R1a1 and probablyR1a1a as well (the SNPs defining R1a1a were nottested) The skeletons were dated to 4600 ybp usingstrontium isotope analysis The 4600 year old remainsyielded some DNA and a few STRs were detectedyielding the following partial haplotype

13(14)-25-16-11-11-14-X-Y-10-13-Z-30-15

These haplotypes very closely resemble the above R1a1ancestral haplotype in Germany both in the structureand in the dating (4700plusmn520 and 4600 ybp)

The ancestral R1a1 haplotypes for the Norwegian andSwedish groups are almost the same and both are similarto the German 25-marker ancestral haplotype Their16-and 19-haplotype sets (after DYS388=10 haplotypeswere removed five and one respectively) contain onaverage 0218plusmn0023 and 0242plusmn0023 mutations permarker respectively which gives a result of 3375plusmn490and 3825plusmn520 years back to their common ancestorsThis is likely the same time span within the error margin

The ancestral haplotypes for the Polish Czech andSlovak groups are very similar to each other having onlyone insignificant difference in DYS439 (shown in bold)The average value is 1043 on DYS439 in the Polish

group 1063 in the Czech and Slovak combined groupsThe base haplotype for the 44 Polish haplotypes is

13-25-16-10-11-14-12-12- -13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

and for the 27 Czech and Slovak haplotypes it is

13-25-16-10-11-14-12-12- 13-11-30-16-9-10-11-11-23-14-20-32-12-15-15-16

The difference in these haplotypes is really only 02mutations--the alleles are just rounded in opposite direc-tions

The 44 Polish haplotypes have 310 mutations amongthem and there are 175 mutations in the 27 Czech-Slovak haplotypes resulting in 0282plusmn0016 and0259plusmn0020 mutations per marker on average respec-tively These values correspond to 4550plusmn520 and4125plusmn430 years back to their common ancestors re-spectively which is very similar to the German TSCA

Many European countries are represented by just a few25-marker haplotypes in the databases 36 R1a1 haplo-types were collected from the YSearch database where theancestral origin of the paternal line was from DenmarkNetherlands Switzerland Iceland Belgium France ItalyLithuania Romania Albania Montenegro SloveniaCroatia Spain Greece Bulgaria and Moldavia Therespective haplotype tree is shown in

The tree does not show any noticeable anomalies andpoints at just one common ancestor for all 36 individu-als who had the following base haplotype

13-25-16-10-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

The ancestral haplotype match is exactly the same asthose in Germany Russia (see below) and has quiteinsignificant deviations from all other ancestral haplo-types considered above within fractions of mutationaldifferences All the 36 individuals have 248 mutationsin their 25-marker haplotypes which corresponds to4425plusmn520 years to the common ancestor This is quitea common value for European R1a1 population

The haplotype tree containing 110 of 25-marker haplo-types collected over 10 time zones from the WesternUkraine to the Pacific Ocean and from the northerntundra to Central Asia (Tadzhikistan and Kyrgyzstan) isshown in

The ancestral (base) haplotype for the haplotype tree is

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 12: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

228

13-25-16-11-11-14-12-12-10-13-11-30-15-9-10-11-11-24-14-20-32-12-15-15-16

It is almost exactly the same ancestral haplotype as thatin Germany The average value on DYS391 is 1053 inthe Russian haplotypes and was rounded to 11 and thebase haplotype has only one insignificant deviation fromthe ancestral haplotype in England which has the thirdallele (DYS19) 1548 which in RussiaUkraine it is1583 (calculated from one DYS19=14 32 DYS19=1562 DYS19=16 and 15 DYS19=17) There are similarinsignificant deviations with the Polish and Czechoslo-vakian base haplotypes at DYS458 and DYS447 re-

spectively within 02-05 mutations This indicates asmall mutational difference between their commonancestors

The 110 RussianUkranian haplotypes contain 804 mu-tations from the base haplotype or 0292plusmn0010 muta-tions per marker resulting in 4750plusmn500 years from acommon ancestor The degree of asymmetry of thisseries of haplotypes is exactly 050 and does not affectthe calculations

R1a1 haplotypes of individuals who considered them-selves of Ukrainian and Russian origin present a practi-

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 13: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

229Klyosov DNA genealogy mutation rates etc Part II Walking the map

cally random geographic sample For example thehaplotype tree contains two local Central Asian haplo-types (a Tadzhik and a Kyrgyz haplotypes 133 and 127respectively) as well as a local of a Caucasian Moun-tains Karachaev tribe (haplotype 166) though the maleancestry of the last one is unknown beyond his presenttribal affiliation They did not show any unusual devia-tions from other R1a1 haplotypes Apparently they arederived from the same common ancestor as are all otherindividuals of the R1a1 set

The literature frequently refers to a statement that R1a1-M17 originated from a ldquorefugerdquo in the present Ukraineabout 15000 years ago following the Last GlacialMaximum This statement was never substantiated byany actual data related to haplotypes and haplogroups

It is just repeated over and over through a relay ofreferences to references The oldest reference is appar-ently that of Semino et al (2000) which states that ldquothisscenario is supported by the finding that the maximumvariation for microsatellites linked to Eu19 [R1a1] isfound in Ukrainerdquo (Santachiara-Benerecetti unpub-lished data) Now we know that this statement isincorrect No calculations were provided in (Semino etal 2000)or elsewhere which would explain the dating of15000 years

Then a paper by Wells et al (2001) states ldquoM17 adescendant of M173 is apparently much younger withan inferred age of ~ 15000 years No actual data orcalculations are provided The subsequent sentence inthe paper says ldquoIt must be noted that these age estimates

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 14: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

230

are dependent on many possibly invalid assumptionsabout mutational processes and population structureThis sentence has turned out to be valid in the sense thatthe estimate was inaccurate and overestimated by about300 from the results obtained here However see theresults below for the Chinese and southern Siberianhaplotypes below

A more detailed consideration of R1a1 haplotypes in theRussian Plain and across Europe and Eurasia in generalhas shown that R1a1 haplogroup appeared in Europebetween 12 and 10 thousand years before present rightafter the Last Glacial Maximum and after about 6000ybp had populated Europe though probably with low

density After 4500 ybp R1a1 practically disappearedfrom Europe incidentally along with I1 Maybe moreincidentally it corresponded with the time period ofpopulating of Europe with R1b1b2 Only those R1a1who migrated to the Russian Plain from Europe around6000-5000 years bp stayed They had expanded to theEast established on their way a number of archaeologi-cal cultures including the Andronovo culture which hasembraced Northern Kazakhstan Central Asia and SouthUral and Western Siberia and about 3600 ybp theymigrated to India and Iran as the Aryans Those wholeft behind on the Russian Plain re-populated Europebetween 3200 and 2500 years bp and stayed mainly inthe Eastern Europe (present-day Poland Germany

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 15: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

231Klyosov DNA genealogy mutation rates etc Part II Walking the map

Czech Slovak etc regions) Among them were carriersof the newly discovered R1a1-M458 subclade (Underhillet al 2009)

The YSearch database contains 22 of the 25-markerR1a1 haplotypes from India including a few haplotypesfrom Pakistan and Sri Lanka Their ancestral haplotypefollows

13-25-16-10-11-14-12-12-10-13-11-30- -9-10-11-11-24-14-20-32-12-15-15-16

The only one apparent deviation in DYS458 (shown inbold) is related to an average alleles equal to 1605 inIndian haplotypes and 1528 in Russian ones

The India (Regional) Y-project at FTDNA (Rutledge2009) contains 15 haplotypes with 25 markers eight ofwhich are not listed in the YSearch database Combinedwith others from Ysearch a set of 30 haplotypes isavailable Their ancestral haplotype is exactly the sameas shown above shows the respective haplo-types tree

All 30 Indian R1a1 haplotypes contain 191 mutationsthat is 0255plusmn0018 mutations per marker It is statisti-cally lower (with 95 confidence interval) than0292plusmn0014 mutations per marker in the Russian hap-lotypes and corresponds to 4050plusmn500 years from acommon ancestor of the Indian haplotypes compared to4750plusmn500 years for the ancient ldquoRussianrdquo TSCA

Archaeological studies have been conducted since the1990rsquos in the South Uralrsquos Arkaim settlement and haverevealed that the settlement was abandoned 3600 yearsago The population apparently moved to northernIndia That population belonged to Andronovo archae-ological culture Excavations of some sites of Androno-vo culture the oldest dating between 3800 and 3400ybp showed that nine inhabitants out of ten shared theR1a1 haplogroup and haplotypes (Bouakaze et al2007 Keyser et al 2009) The base haplotype is asfollows

13-25(24)-16(17)-11-11-14-X-Y-10-13(14)-11-31(32)

In this example alleles that have not been assessed arereplaced with letters One can see that the ancient R1a1haplotype closely resembles the Russian (as well as theother R1a1) ancestral haplotypes

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 16: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

232

In a recent paper (Keyser et al 2009) the authors haveextended the earlier analysis and determined 17-markerhaplotypes for 10 Siberian individuals assigned to An-dronovo Tagar and Tashtyk archaeological culturesNine of them shared the R1a1 haplogroup (one was ofHaplogroup CxC3) The authors have reported thatldquonone of the Y-STR haplotypes perfectly matched those

included in the databasesrdquo and for some of those haplo-types ldquoeven the search based on the 9-loci minimalhaplotype was fruitless However as showsall of the haplotypes nicely fit to the 17-marker haplo-type tree of 252 Russian R1a1 haplotypes (with a com-mon ancestor of 4750plusmn490 ybp calculated using these17-marker haplotypes [Klyosov 2009b]) a list of whichwas recently published (Roewer et al 2008) All sevenAndronovo Tagar and Tashtyk haplotypes (the othertwo were incomplete) are located in the upper right-hand side corner in and the respective branchon the tree is shown in

As one can see the ancient R1a1 haplotypes excavatedin Siberia are comfortably located on the tree next tothe haplotypes from the Russian cities and regionsnamed in the legend to

The above data provide rather strong evidence that theR1a1 tribe migrated from Europe to the East between5000 and 3600 ybp The pattern of this migration isexhibited as follows 1) the descendants who live todayshare a common ancestor of 4725plusmn520 ybp 2) theAndronovo (and the others) archaeological complex ofcultures in North Kazakhstan and South and WesternSiberia dates 4300 to 3500 ybp and it revealed severalR1a1 excavated haplogroups 3) they reached the SouthUral region some 4000 ybp which is where they builtArkaim Sintashta (contemporary names) and the so-called a country of towns in the South Ural regionaround 3800 ybp 4) by 3600 ybp they abandoned thearea and moved to India under the name of Aryans TheIndian R1a1 common ancestor of 4050plusmn500 ybpchronologically corresponds to these events Currentlysome 16 of the Indian population that is about 100millions males and the majority of the upper castes aremembers of Haplogroup R1a1(Sengupta et al 2006Sharma et al 2009)

Since we have mentioned Haplogroup R1a1 in Indiaand in the archaeological cultures north of it it isworthwhile to consider the question of where and whenR1a1 haplotypes appeared in India On the one handit is rather obvious from the above that R1a1 haplo-types were brought to India around 3500 ybp fromwhat is now Russia The R1a1 bearers known later asthe Aryans brought to India not only their haplotypesand the haplogroup but also their language therebyclosing the loop or building the linguistic and culturalbridge between India (and Iran) and Europe and possi-bly creating the Indo-European family of languages Onthe other hand there is evidence that some Indian R1a1haplotypes show a high variance exceeding that inEurope (Kivisild et al 2003 Sengupta et al 2006Sharma et al 2009 Thanseem et al 2009 Fornarino etal 2009) thereby antedating the 4000-year-old migra-

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 17: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

233Klyosov DNA genealogy mutation rates etc Part II Walking the map

tion from the north However there is a way to resolvethe conundrum

It seems that there are two quite distinct sources ofHaplogroup R1a1 in India One indeed was probablybrought from the north by the Aryans However themost ancient source of R1a1 haplotypes appears to beprovided by people who now live in China

In an article by Bittles et al (2007) entitled Physicalanthropology and ethnicity in Asia the transition fromanthropometry to genome-based studies a list of fre-quencies of Haplogroup R1a1 is given for a number ofChinese populations however haplotypes were notprovided The corresponding author Professor Alan HBittles kindly sent me the following list of 31 five-mark-er haplotypes (presented here in the format DYS19 388389-1 389-2 393) in The haplotype tree isshown in

17 12 13 30 13 14 12 13 30 12 14 12 12 31 1314 13 13 32 13 14 12 13 30 13 17 12 13 32 1314 12 13 32 13 17 12 13 30 13 17 12 13 31 1317 12 12 28 13 14 12 13 30 13 14 12 12 29 1317 12 14 31 13 14 12 12 28 13 14 13 14 30 1314 12 12 28 12 15 12 13 31 13 15 12 13 31 1314 12 14 29 10 14 12 13 31 10 14 12 13 31 1014 14 14 30 13 17 14 13 29 10 14 12 13 29 1017 13 13 31 13 14 12 12 29 13 14 12 14 28 1017 12 13 30 13 14 13 13 29 13 16 12 13 29 1314 12 13 31 13

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 18: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

234

These 31 5-marker haplotypes contain 99 mutationsfrom the base haplotype (in the FTDNA format)

13-X-14-X-X-X-X-12-X-13-X-30

which gives 0639plusmn0085 mutations per marker or21000plusmn3000 years to a common ancestor Such a largetimespan to a common ancestor results from a lowmutation rate constant which was calculated from theChandlerrsquos data for individual markers as 000677mutationhaplotypegeneration or 000135mutationmarkergeneration (see Table 1 in Part I of thecompanion article)

Since haplotypes descended from such a ancient com-mon ancestor have many mutations which makes theirbase (ancestral) haplotypes rather uncertain the ASDpermutational method was employed for the Chinese setof haplotypes (Part I) The ASD permutational methoddoes not need a base haplotype and it does not require acorrection for back mutations For the given series of 31five-marker haplotypes the sum of squared differencesbetween each allele in each marker equals to 10184 It

should be divided by the square of a number of haplo-types in the series (961) by a number of markers in thehaplotype (5) and by 2 since the squared differencesbetween alleles in each marker were taken both ways Itgives an average number of mutations per marker of1060 After division of this value by the mutation ratefor the five-marker haplotypes (000135mutmarkergeneration for 25 years per generation) weobtain 19625plusmn2800 years to a common ancestor It iswithin the margin of error with that calculated by thelinear method as shown above

It is likely that Haplogroup R1a1 had appeared in SouthSiberia around 20 thousand years ago and its bearerssplit One migration group headed West and hadarrived to the Balkans around 12 thousand years ago(see below) Another group had appeared in Chinasome 21 thousand years ago Apparently bearers ofR1a1 haplotypes made their way from China to SouthIndia between five and seven thousand years ago andthose haplotypes were quite different compared to theAryan ones or the ldquoIndo-Europeanrdquo haplotypes Thisis seen from the following data

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 19: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

235Klyosov DNA genealogy mutation rates etc Part II Walking the map

Thanseem et al (2006) found 46 R1a1 haplotypes of sixmarkers each in three different tribal population ofAndra Pradesh South India (tribes Naikpod Andh andPardhan) The haplotypes are shown in a haplotype treein The 46 haplotypes contain 126 mutationsso there were 0457plusmn0050 mutations per marker Thisgives 7125plusmn950 years to a common ancestor The base(ancestral) haplotype of those populations in the FTD-NA format is as follows

13-25-17-9-X-X-X-X-X-14-X-32

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by four mutations on six markers which corresponds to11850 years between their common ancestors andplaces their common ancestor at approximately(11850+4050+7125)2 = 11500 ybp

110 of 10-marker R1a1 haplotypes of various Indian popula-tions both tribal and Dravidian and Indo-European casteslisted in (Sengupta et al 2006) shown in a haplotype tree inFigure 13 contain 344 mutations that is 0313plusmn0019 muta-tions per marker It gives 5275plusmn600 years to a common

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 20: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

236

ancestor The base (ancestral) haplotype of those popula-tions in the FTDNA format is as follows

13-25-15-10-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype

13-25-16-10-11-14-12-12-10-13-11-30

by just 05 mutations on nine markers (DYS19 has in fact amean value of 155 in the Indian haplotypes) which makesthem practically identical However in this haplotype seriestwo different populations the ldquoIndo-Europeanrdquo one and theldquoSouth-Indianrdquo one were mixed therefore an ldquointermediaterdquoapparently phantom ldquocommon ancestorrdquo was artificially cre-ated with an intermediate TSCA between 4050plusmn500 and7125plusmn950 (see above)

For a comparison let us consider Pakistani R1a1 haplo-types listed in the article by Sengupta (2003) and shownin The 42 haplotypes contain 166 mutationswhich gives 0395plusmn0037 mutations per marker and7025plusmn890 years to a common ancestor This value fitswithin the margin of error to the ldquoSouth-Indianrdquo TSCAof 7125plusmn950 ybp The base haplotype was as follows

13-25-17-11-X-X-X-12-10-13-11-30

It differs from the ldquoIndo-Europeanrdquo Indian haplotype bytwo mutations in the nine markers

Finally we will consider Central Asian R1a1 haplotypeslisted in the same paper by Sengupta (2006) Tenhaplotypes contain only 25 mutations which gives0250plusmn0056 mutations per haplotype and 4050plusmn900

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 21: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

237Klyosov DNA genealogy mutation rates etc Part II Walking the map

years to a common ancestor It is the same value that wehave found for the ldquoIndo-Europeanrdquo Indian haplotypes

The above suggests that there are two different subsetsof Indian R1a1 haplotypes One was brought by Euro-pean bearers known as the Aryans seemingly on theirway through Central Asia in the 2nd millennium BCEanother much more ancient made its way apparentlythrough China and arrived in India earlier than seventhousand years ago

Sixteen R1a1 10-marker haplotypes from Qatar andUnited Arab Emirates have been recently published(Cadenas et al 2008) They split into two brancheswith base haplotypes

13-25-15-11-11-14-X-Y-10-13-11-

13-25-16-11-11-14-X-Y-10-13-11-

which are only slightly different and on DYS19 themean values are almost the same just rounded up ordown The first haplotype is the base one for sevenhaplotypes with 13 mutations in them on average0186plusmn0052 mutations per marker which gives2300plusmn680 years to a common ancestor The secondhaplotype is the base one for nine haplotypes with 26mutations an average 0289plusmn0057 mutations permarker or 3750plusmn825 years to a common ancestorSince a common ancestor of R1a1 haplotypes in Arme-nia and Anatolia lived 4500plusmn1040 and 3700plusmn550 ybprespectively (Klyosov 2008b) it does not conflict with3750plusmn825 ybp in the Arabian peninsula

A series of 67 haplotypes of Haplogroup R1a1 from theBalkans was published (Barac et al 2003a 2003bPericic et al 2005) They were presented in a 9-markerformat only The respective haplotype tree is shown in

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 22: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

238

One can see a remarkable branch on the left-hand sideof the tree which stands out as an ldquoextended and fluffyrdquoone These are typically features of a very old branchcompared with others on the same tree Also a commonfeature of ancient haplotype trees is that they are typical-ly ldquoheterogeneousrdquo ones and consist of a number ofbranches

The tree in includes a rather small branch oftwelve haplotypes on top of the tree which containsonly 14 mutations This results in 0130plusmn0035 muta-tions per marker or 1850plusmn530 years to a commonancestor Its base haplotype

13-25-16-10-11-14-X-Y-Z-13-11-30

is practically the same as that in Russia and Germany

The wide 27-haplotype branch on the right contains0280plusmn0034 mutations per marker which is rathertypical for R1a1 haplotypes in Europe It gives typicalin kind 4350plusmn680 years to a common ancestor of thebranch Its base haplotype

13-25-16- -11-14-X-Y-Z-13-11-30

is again typical for Eastern European R1a1 base haplo-types in which the fourth marker (DYS391) often fluc-tuates between 10 and 11 Of 110 Russian-Ukrainianhaplotypes diagramed in 51 haplotypes haveldquo10rdquo and 57 have ldquo11rdquo in that locus (one has ldquo9rdquo andone has ldquo12rdquo) In 67 German haplotypes discussedabove 43 haplotypes have ldquo10rdquo 23 have ldquo11rdquo and onehas ldquo12 Hence the Balkan haplotypes from thisbranch are more close to the Russian than to the Ger-man haplotypes

The ldquoextended and fluffyrdquo 13-haplotype branch on theleft contains the following haplotypes

13 24 16 12 14 15 13 11 3112 24 16 10 12 15 13 13 2912 24 15 11 12 15 13 13 2914 24 16 11 11 15 15 11 3213 23 14 10 13 17 13 11 3113 24 14 11 11 11 13 13 2913 25 15 9 11 14 13 11 3113 25 15 11 11 15 12 11 2912 22 15 10 15 17 14 11 3014 25 15 10 11 15 13 11 2913 25 15 10 12 14 13 11 2913 26 15 10 11 15 13 11 2913 23 15 10 13 14 12 11 28

The set does not contain a haplotype which can bedefined as a base This is because common ancestorlived too long ago and all haplotypes of his descendantsliving today are extensively mutated In order to deter-

mine when that common ancestor lived we have em-ployed three different approaches described in thepreceding paper (Part 1) namely the ldquolinearrdquo methodwith the correction for reverse mutations the ASDmethod based on a deduced base (ancestral) haplotypeand the permutational ASD method (no base haplotypeconsidered) The linear method gave the followingdeduced base haplotype an alleged one for a commonancestor of those 13 individuals from Serbia KosovoBosnia and Macedonia

13- - -10- - -X-Y-Z-13-11-

The bold notations identify deviations from typical an-cestral (base) East-European haplotypes The thirdallele (DYS19) is identical to the Atlantic and Scandina-vian R1a1 base haplotypes All 13 haplotypes contain70 mutations from this base haplotype which gives0598plusmn0071 mutations on average per marker andresults in 11425plusmn1780 years from a common ancestor

The ldquoquadratic methodrdquo (ASD) gives the followingldquobase haplotyperdquo (the unknown alleles are eliminatedhere and the last allele is presented as the DYS389-2notation)

1292 ndash 2415 ndash 1508 ndash 1038 ndash 1208 ndash 1477 ndash 1308ndash 1146 ndash 1662

A sum of square deviations from the above haplotyperesults in 103 mutations total including reverse muta-tions ldquohiddenrdquo in the linear method Seventyldquoobservedrdquo mutations in the linear method amount toonly 68 of the ldquoactualrdquo mutations including reversemutations Since all 13 haplotypes contain 117 markersthe average number of mutations per marker is0880plusmn0081 which corresponds to 0880000189 =466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor 000189 is the mutation rate (in muta-tions per marker per generation) for the given 9-markerhaplotypes (companion paper Part I)

A calculation of 11650plusmn1550 years to a common an-cestor is practically the same as 11425plusmn1780 yearsobtained with linear method and corrected for reversemutations

The all-permutation ldquoquadraticrdquo method (Adamov ampKlyosov 2008) gives 2680 as a sum of all square differ-ences in all permutations between alleles When dividedby N2 (N = number of haplotypes that is 13) by 9(number of markers in haplotype) and by 2 (since devia-tion were both ldquouprdquo and ldquodownrdquo) we obtain an averagenumber of mutations per marker equal to 0881 It isnear exactly equal to 0880 obtained by the quadraticmethod above Naturally it gives again 0881000189= 466plusmn62 generations or 11650plusmn1550 years to a com-mon ancestor of the R1a1 group in the Balkans

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 23: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

239Klyosov DNA genealogy mutation rates etc Part II Walking the map

These results suggest that the first bearers of the R1a1haplogroup in Europe lived in the Balkans (Serbia Kos-ovo Bosnia Macedonia) between 10 and 13 thousandybp It was shown below that Haplogroup R1a1 hasappeared in Asia apparently in China or rather SouthSiberia around 20000 ybp It appears that some of itsbearers migrated to the Balkans in the following 7-10thousand years It was found (Klyosov 2008a) thatHaplogroup R1b appeared about 16000 ybp apparent-ly in Asia and it also migrated to Europe in the follow-ing 11-13 thousand years It is of a certain interest thata common ancestor of the ethnic Russians of R1b isdated 6775plusmn830 ybp (Klyosov to be published) whichis about two to three thousand years earlier than mostof the European R1b common ancestors It is plausiblethat the Russian R1b have descended from the Kurganarchaeological culture

The data shown above suggests that only about 6000-5000 ybp bearers of R1a1 presumably in the Balkansbegan to mobilize and migrate to the west toward theAtlantics to the north toward the Baltic Sea and Scandi-navia to the east to the Russian plains and steppes tothe south to Asia Minor the Middle East and far southto the Arabian Sea All of those local R1a1 haplotypespoint to their common ancestors who lived around4800 to 4500 ybp On their way through the Russianplains and steppes the R1a1 tribe presumably formedthe Andronovo archaeological culture apparently do-mesticated the horse advanced to Central Asia andformed the ldquoAryan populationrdquo which dated to about4500 ybp They then moved to the Ural mountainsabout 4000 ybp and migrated to India as the Aryanscirca 3600-3500 ybp Presently 16 of the maleIndian population or approximately 100 million peo-ple bear the R1a1 haplogrouprsquos SNP mutation(SRY108312) with their common ancestor of4050plusmn500 ybp of times back to the Andronovo archae-ological culture and the Aryans in the Russian plains andsteppes The current ldquoIndo-Europeanrdquo Indian R1a1haplotypes are practically indistinguishable from Rus-sian Ukrainian and Central Asian R1a1 haplotypes aswell as from many West and Central European R1a1haplotypes These populations speak languages of theIndo-European language family

The next section focuses on some trails of R1a1 in Indiaan ldquoAryan trail

Kivisild et al (2003) reported that eleven out of 41individuals tested in the Chenchu an australoid tribalgroup from southern India are in Haplogroup R1a1 (or27 of the total) It is tempting to associate this withthe Aryan influx into India which occurred some3600-3500 ybp However questionable calculationsof time spans to a common ancestor of R1a1 in India

and particularly in the Chenchu (Kivisild et al 2003Sengupta et al 2006 Thanseem et al 2006 Sahoo et al2006 Sharma et al 2009 Fornarino et al 2009) usingmethods of population genetics rather than those ofDNA genealogy have precluded an objective and bal-anced discussion of the events and their consequences

The eleven R1a1 haplotypes of the Chenchus (Kivisild etal 2003) do not provide good statistics however theycan allow a reasonable estimate of a time span to acommon ancestor for these 11 individuals Logically ifthese haplotypes are more or less identical with just afew mutations in them a common ancestor would likelyhave lived within a thousand or two thousands of ybpConversely if these haplotypes are all mutated andthere is no base (ancestral) haplotype among them acommon ancestor lived thousands ybp Even two base(identical) haplotypes among 11 would tentatively giveln(112)00088 = 194 generations which corrected toback mutations would result in 240 generations or6000 years to a common ancestor with a large marginof error If eleven of the six-marker haplotypes are allmutated it would mean that a common ancestor livedapparently earlier than 6 thousand ybp Hence evenwith such a small set of haplotypes one can obtain usefuland meaningful information

The eleven Chenchu haplotypes have seven identical(base) six-marker haplotypes (in the format of DYS19-388-290-391-392-393 commonly employed in earli-er scientific publications)

16-12-24-11-11-13

They are practically the same as those common EastEuropean ancestral haplotypes considered above if pre-sented in the same six-marker format

16-12-25-11(10)-11-13

Actually the author of this study himself an R1a1 SlavR1a1) has the ldquoChenchurdquo base six-marker haplotype

These identical haplotypes are represented by a ldquocombrdquoin If all seven identical haplotypes are derivedfrom the same common ancestor as the other four mu-tated haplotypes the common ancestor would havelived on average of only 51 generations bp or less than1300 years ago [ln(117)00088 = 51] with a certainmargin of error (see estimates below) In fact theChenchu R1a1 haplotypes represent two lineages one3200plusmn1900 years old and the other only 350plusmn350years old starting from around the 17th century CE Thetree in shows these two lineages

A quantitative description of these two lineages is asfollows Despite the 11-haplotype series containingseven identical haplotypes which in the case of one

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 24: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

240

common ancestor for the series would have indicated51 generations (with a proper margin of error) from acommon ancestor the same 11 haplotypes contain 9mutations from the above base haplotype The linearmethod gives 91100088 = 93 generations to a com-mon ancestor (both values without a correction for backmutations) Because there is a significant mismatchbetween the 51 and 93 generations one can concludethat the 11 haplotypes descended from more than onecommon ancestor Clearly the Chenchu R1a1 haplo-type set points to a minimum of two common ancestorswhich is confirmed by the haplotype tree ( ) Arecent branch includes eight haplotypes seven beingbase haplotypes and one with only one mutation Theolder branch contains three haplotypes containing threemutations from their base haplotype

15-12-25-10-11-13

The recent branch results in ln(87)00088 = 15 genera-tions (by the logarithmic method) and 1800088 = 14generations (the linear method) from the residual sevenbase haplotypes and a number of mutations (just one)respectively It shows a good fit between the twoestimates This confirms that a single common ancestorfor eight individuals of the eleven lived only about350plusmn350 ybp around the 17th century The old branchof haplotypes points at a common ancestor who lived3300088 = 114plusmn67 generations BP or 3200plusmn1900ybp with a correction for back mutations

Considering that the Aryan (R1a1) migration to north-ern India took place about 3600-3500 ybp it is quite

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 25: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

241Klyosov DNA genealogy mutation rates etc Part II Walking the map

plausible to attribute the appearance of R1a1 in theChenchu by 3200plusmn1900 ybp to the Aryans

The origin of the influx of Chenchu R1a1 haplotypesaround the 17th century is likely found in this passageexcerpted from (Kivisild et al 2003) ldquoChenchus werefirst described as shy hunter-gatherers by the Moham-medan army in 1694rdquo

Let us consider much more distant time periods tofurther examine and justify the timing methods of DNAgenealogy developed in this study 117 six-markerhaplotypes of Native Americans all members of Haplo-group Q-M3 (Q1a3a) have been published (Bortolini etal 2003) and a haplotype tree shown in was developed based upon their data

The tree contains 31 identical (base) haplotypes and 273mutations from that ldquobaserdquo haplotype It is obvious

that the haplotypes in the tree descended from differentcommon ancestors since 31 base haplotypes out of 117total would give ln(11731)00088 =151 generations toa common ancestor though 273 mutations in all 117haplotypes would give 265 generations (both 151 and265 without corrections for back mutations) This isour principal criterion suggested in this study whichpoints at multiplicity (more than one) of common ances-tors in a given haplotype series This in turn makes anycalculations of a time span to a ldquocommon ancestorrdquousing all 117 haplotypes invalid since the result wouldpoint to a ldquophantom common ancestorrdquo Depending onrelative amounts of descendants from different commonancestors in the same haplotype series a timespan to aldquophantom common ancestorrdquo varies greatly often bymany thousands of years

An analysis of the haplotype tree in shows thatit includes at least six lineages each with its own com-mon ancestor Four of them turned out to be quite

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 26: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

242

recent common ancestors who lived within the lastthousand years They had the following base haplotypes

13-12-24-10-14-13 13-12-23-10-14-13 13-12-24-10-15-12 13-12-24-10-13-14

The oldest branch contains 11 haplotypes with thefollowing base (ancestral) haplotype

13-13-24-10-14-14

This branch contains 32 mutations which gives0485plusmn0086 mutations per marker on average for six-marker haplotypes that is 12125plusmn2460 years to acommon ancestor for those 11 individuals

However this was the most ancient ancestor of just onebranch of haplotypes From several base haplotypesshown above one can see that a mutational differencebetween this base haplotype and more recent base hap-lotypes reach 4 mutations per six-marker haplotypeThis corresponds to about 19700 years of mutationaldifference between them and points out that com-mon ancestor lived 16300 plusmn 3300 ybp

Hence a common ancestor of several groups of individ-uals among Native Americans of Haplogroup Q1a3aand having largely varied haplotypes lived between13000 and 19600 ybp with the 95 confidence inter-val This dating is in line with many independent datesfrom archaeological climatological and genetic studiesof Native American origins Some researchers refer thepeopling of the Americas to the end of the last glacialmaximum approximately 23000 to 19000 years agoand suggest a strong population expansion started ap-proximately 18000 and finished 15000 ybp (Fagundeset al 2008) Others refer to archaeology data of Paleo-Indian people between 11000 to approximately 18-22000 ybp (Haynes 2002 p 52 Lepper 1999 pp362ndash394 Bradley and Stanford 2004 Seielstad et al2003) In any event the time span of 16000 years agois quite compatible with those estimates

The ldquoCohen Modal Haplotyperdquo (CMH) was introduced(Thomas et al 1998) ten years ago to designate thefollowing six-marker haplotype (in DYS 19-388-390-391-392-393 format)

14-16-23-10-11-12

Further research showed that this haplotype presents inboth J1 and J2 haplogroups

In Haplogroup J1 if 12-marker haplotypes are collectedand analyzed the 12-marker haplotype correspondingto the CMH may be shown to splits into two principallineages (Klyosov 2008c) with the following base(ancestral) haplotypes

12-23-14-10-13- -11-16- 13-11-

and

12-23-14-10-13- -11-16- 13-11-

which differ from each other by four mutations (shownin bold)

In Haplogroup J2 the same six-marker ldquoCMHrdquo alsoexists and if 12-marker haplotypes containing the CMHare collected and analyzed the following 12-markerldquoCMHrdquo modal or base haplotype can be determined(Klyosov 2008c)

12-23-14-10-13-17-11-16-11-13-11-30

It differs from both J1 CMHs by three and one muta-tions respectively

In fact actual Cohanim of Haplogroup J2 recognizedby the Cohanim Association (private communicationfrom the President of the Cohanim Association of LatinAmerica Mr Mashuah HaCohen-Pereira alsowwwcohenorgbr) have the following base 12-markerhaplotypes

12-23-15-10-14-17-11-15-12-13-11-29

which differs by five mutations from the above CMH-J2base haplotype (to be published) Another Cohanimbase haplotype

12-23-15-10-13-18-11-15-12-13-11-29

is currently assigned to the Cohanim-Sephardim lineageThe above Cohanim haplotype belonged to the commonancestor who lived 3000plusmn560 ybp The presumablyCohanim-Sephardim base haplotype belonged to thecommon ancestor who lived 2500plusmn710 ybp (to bepublished)

In this section we consider the J1 ldquoCohen Modal Haplo-typerdquo in its extended format of the 25- 37- and 67-marker haplotypes

A 25-marker haplotype tree of 49 presumed Jewish J1haplotypes is shown in In this tree the twoCMH branches are located on either side at the top of

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 27: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

243Klyosov DNA genealogy mutation rates etc Part II Walking the map

the tree the ldquorecent CMHrdquo (rCMH) is the more com-pact 17-haplotype branch on the right (between haplo-types 008 and 034) and the 9-haplotype ldquoolder CMHrdquo(oCMH) branch is on the left (between haplotypes 012and 026) The base haplotype for the ldquorecent CMHrdquo is

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - 12-14-16-17

and for the ldquoolder CMHrdquo

12-23-14-10-13- -11-16- -13-11- - -8-9-11-11--14- - -12-14-16-17

There are 9 mutations between these two base haplo-types as shown above in bold If we sum the distancesbetween them and their common ancestorrsquos haplotype

(using the mean values for the ancestors values) thereare 72 mutations between them This corresponds toabout 4650 years of a mutational difference betweenthem (that is a sum of the distances between them andTHEIR common ancestor) We will use this result later

The rCMH branch contains 41 mutations which givesonly 00965plusmn00015 mutations per marker on averageand corresponds to 1400plusmn260 years to a commonancestor The oCMH branch contains 36 mutationswhich gives 0160plusmn0027 mutations per marker on aver-age and corresponds to 2400plusmn470 years to a commonancestor

From the data obtained we can calculate that the com-mon ancestor of the two branches lived about4225plusmn520 ybp That is when a common ancestor of the

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 28: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

244

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 29: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

245Klyosov DNA genealogy mutation rates etc Part II Walking the map

ldquoCohen modal haplotyperdquo lived among the future Jew-ish community of Haplogroup J1 according to informa-tion stored in their 25-marker haplotypes

In order to study the question of the origin of the CMHfurther we examined 37-marker haplotypes 85 J1haplotypes having 37 markers and also the embeddedsix-marker CMH were collected from both Jewish andnon-Jewish descendants 33 of these haplotypes alsocontained 67 markers and will be discussed below

shows a 37-marker CMH haplotype tree Theleft-hand side has the ldquorecent CMHrdquo branch and theldquoolder CMHrdquo branch is located at the lower right

The ldquoolder CMHrdquo 22-haplotype branch contains 126mutations in just the first 25 markers and 243 mutationsusing all 37 markers which results in TSCAs of3575plusmn480 and 3525plusmn420 years from a common ances-tor respectively or on average 3525plusmn450 years

Similarly the ldquorecent CMHrdquo branch corresponds to975plusmn135 and 1175plusmn140 years to a common ancestorrespectively or on average 1075plusmn190 ybp

The most resolution comes from the 67-marker haplo-types but there are fewer of these The 33 haplotypesare shown as a 67-marker CMH tree in

The tree again splits into two quite distinct branches arecent one the 17-haplotype branch on the right and anolder one the 16-haplotype branch on the left Againthese are same two principal ldquoCMHrdquo branches eachone with its own common ancestor who lived about3000 years apart If we examine this smaller number ofhaplotypes (33) on both 25 markers and 37 markers weget results consistent with those obtained above with thelarger 85-haplotype set though naturally the confidenceintervals are somewhat larger A common ancestor ofthe ldquoolder CMHrdquo calculated from 25- and 37-markerhaplotypes lived 4150plusmn580 and 3850plusmn470 ybp onaverage 4000plusmn520 ybp A common ancestor of theldquorecent CMHrdquo also calculated from 25- and 37-markerhaplotypes lived 975plusmn205 and 1150plusmn180 years to acommon ancestor respectively on average 1050plusmn190ybp around the 9th to the 11th century This coincideswith the Khazarian times however it would be a stretchto make a claim related to this It could also havecorresponded with the time of Jewish Ashkenazi appear-ance in Europe

The base (ancestral) 67-marker haplotype of the ldquoolderCMHrdquo 16-haplotype branch ( ) is as follows

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

The 17-haplotype ldquorecent CMHrdquo branch has the follow-ing 67-marker base haplotype with differences in bold

12 23 14 10 13 11 16 13 11 30 17 8 9 11 1114 12 14 16 17 11 10 22 22 15 14 18 35

10 11 8 15 16 8 11 10 8 11 9 12 21 22 10 12 1215 8 12 21 13 12 14 12 12 12 11

According to data provided in databases two-thirds ofthe bearers of the ldquoolder CMHrdquo (10 individuals of the16) in the respective 67-marker branch did not claimJewish heritage They are descendants of people wholived in Italy Cuba Lebanon Puerto-Rico Spain Eng-land and France (Basque) That may explain why theldquoolder CMHrdquo haplotype differs in three alleles in thefirst 25 markers from the Jewish oCMH shown earlierOn the other hand 16 out of 17 haplotypes in the recentCMH branch claimed Jewish heritage and severalclaimed themselves to be descendants of Cohens Theirbase haplotype is identical with the Jewish rCMHshown earlier

To study this issue further three haplotypes of inhabit-ants of the Arabian Peninsula with typical Arabic namesand having the following 37-marker ldquoCMHrdquo haplo-types

12 23 14 10 14 17 11 16 12 12 11 29 17 8 9 11 11 2514 20 26 12 14 16 17 10 10 22 22 14 15 18 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 17 8 9 10 11 2514 19 30 13 13 13 16 11 9 19 20 16 13 16 17 33 3612 10

12 23 14 10 12 16 11 16 11 13 11 29 21 8 9 11 11 2614 20 26 12 14 15 16 10 10 20 22 14 14 17 18 32 3413 9

were added to the set of haplotypes shown in All three Arabic haplotypes joined the lower predomi-nantly non-Jewish branch on the right of After the addition of the Arab haplotypes all 25 of thehaplotypes in the branch contained 162 mutations on 25markers which gives 4125plusmn525 years to a commonancestor slightly older that age of the branch withoutthose three haplotypes This time period is close to thatof the legendary Biblical split into the Jewish and theArab lineages of the Abrahamic tribes although theapplicability of those events to the results of this studyare uncertain especially given that the uncertainty in thedating of Abraham is at least as large if not larger asthe uncertainties in our TSCA values

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 30: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

246

The following section demonstrates that the ldquoCMHrdquo infact appeared as long as 9000 ybp or earlier on theArabian Peninsula The above time spans of about4000plusmn520 or 4125plusmn525 (the ldquoolder CMHrdquo) and1050plusmn190 (the ldquorecent CMHrdquo) ybp was generated as aresult of migrations of haplotype bearers from the Ara-bian Peninsula to the Middle East and further to thenorth We can neither prove nor disprove as yet that theldquorecent CMHrdquo appeared in the Khazar Khaganate be-tween 9th and 11th centuries At any rate about athousand years ago the bearer of the base ldquorecentCMHrdquo became a common ancestor to perhaps millionsof present-day bearers of this lineage

Obviously the name ldquoCohen Modal Haplotyperdquo was amisleading one Though by the end of the 1990rsquos it hadcertainly attracted attention to DNA genealogy Even asa ldquomodalrdquo haplotype it is not exclusively associated witha Jewish population A haplotype tree of HaplogroupJ1 Arabs from the Arabian Peninsula is shown in

The tree is composed of 19 haplotypes eachcontaining 37 markers and belonging to Haplogroup J1listed in the Arabian Peninsula YDNA J1 Project (Al-Jasmi 2008)

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 31: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

247Klyosov DNA genealogy mutation rates etc Part II Walking the map

There are two branches of the tree one of which has theldquoCMHrdquo embedded and has the following ancestral(base) haplotype

12-23-14-10-12-18-11-16-11-13-11-30-18-8-9-11-11-25-14-20-26-12-13-15-16-10-10-19-22-15-14-18-18-33-35-12-10

There are 73 mutations in 25 markers on the six -haplo-type branch or 0487plusmn0057 mutations per marker onaverage or 9000plusmn1400 years to a common ancestor

The second branch a seven-haplotype branch () has a significantly ldquoyoungerrdquo common ancestor

since it is much less extended from the ldquotrunkrdquo of thetree It has the following base haplotype with markersdiffering from the older base haplotype shown in bold

12-23-14- -11- -11-13-11-30- -8-9-11-11--14-20- -12- -10-10- -22- -14-18-18-- - -10

The seven haplotypes contain 27 mutations in their first25 markers or 0154plusmn0030 mutations per marker and2300plusmn500 years to a common ancestor for this branch

These two ancestral haplotypes differ by ten mutationson the first 25 markers which corresponds approxi-mately to 7000 years between them The ldquoyoungerrdquocommon ancestor who lived 2300plusmn500 ybp is verylikely a direct descendant of the ldquoolderrdquo one who lived9000plusmn1400 ybp and had the ldquoCMHrdquo haplotype

It seems that the ldquoCohen Modal Haplotyperdquo likelybecame an ancestral haplotype for a significant fractionof the inhabitants of the Arabian Peninsula between7600 and 10400 ybp (the 95 confidence interval) andwas a common feature of the background populationfrom which both the Arabs and Jews arose When theancient Israelites became a distinct group the CMH wasquite naturally in this population though in a slightlydrifted form which gave rise to the Jewish ldquoolderCMHrdquo Still later when the Israelite priesthood was

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 32: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

248

established the CMH was again available for inclusionthough whether it actually was included from the begin-ning or not is still uncertain No one knows exactlywhen or how the priesthood was formedndashwhether itindeed started with one man according to tradition orhow it expanded whether only by patrilineal descentfrom the founder or as seems more likely given thepresent-day variety of haplogroups within the Cohanimby a variety of inclusion mechanisms By around the 7th

century CE the ldquorecent CMHrdquo split from the ldquoolderCMHrdquo and became the ancestral haplotype for a sepa-rate albeit recent Jewish lineage within Haplogroup J1If we consider only ldquorCMHrdquo haplotypes within thisCohanim population a common ancestor can be identi-fied who lived 1050plusmn190 ybp

The Jewish ldquoCohen Modal Haplotyperdquo of HaplogroupJ2 that is J2 haplotypes with the six-marker CMHembedded represents a rather compact group of haplo-types with a recent ancestor who lived in about the 7th

century CE (see below) As it was indicated above this

ldquoJ2-CMHrdquo is unlikely to be associated with the Co-hanim and represents just a string of alleles accidentallyincluding the 14-16-23-10-11-12 sequence That doesnot mean however that no Cohanim are in J2--there aresome just not this group

This compact group of ten J2 haplotypes located ratherclose to the trunk of the tree (indicating young age) asshown in represents the J2-CMH Their37-marker base haplotype is as follows

12-23-14-10-13-17-11-16-11-13-11-30-18-9-9-11-11-26-15-20-29-12-14-15-16-10-11-19-22-15-13-19-17-35-39-12-9

The ten haplotypes contain 25 and 44 mutations in 25and 37 markers respectively This gives 1450plusmn320 and1300plusmn230 years to a common ancestor or 1375plusmn300ybp when averaged So the common ancestor livedaround the 6th to the 8th centuries CE

The J2-CMH base haplotype differs from the J1ldquorecentrdquo and ldquoolderrdquo base (ancestral) haplotypes by 29

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 33: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

249Klyosov DNA genealogy mutation rates etc Part II Walking the map

and 25 mutations respectively in their 37-marker hap-lotypes This corresponds to about 11800 and 9600years of mutational difference respectively Clearly J1-and J2-CMH represent quite distant lineages After allthey belong to two different haplogroups

According to old records the Roma arrived in Bulgariaduring the Middle Ages Haplotypes of Bulgarian Romahave been compiled from the testing of 179 males from12 local tribes (Zhivotovski et al 2004) All of thehaplotypes were similar and apparently originated fromthe same rather recent common ancestor It seems thata very narrow circle of Roma perhaps a single tribecame to Bulgaria some 500-700 years ago Descendantsof other unrelated tribes if included apparently did notsurvive It cannot be excluded from consideration thata few close relatives rather than a single founder werethe patriarchs of the tribe that survived

Let us consider both the 6- and 8-marker haplotypes inorder to understand how an elongation of haplotypescan affect data

The most numerous tribe the Rudari had the followingsix-marker base haplotype which was represented by 62identical haplotypes out of the total of 67 haplotypesfrom the tribe

15-12-22-10-11-12

The same base haplotype was represented in 12 of 13members tested from the Kalderash tribe in 9 of 26members of the Lom tribe in 4 of 4 members of theTorgovzi (ldquoTradersrdquo) tribe in 20 of 29 from the Kalai-djii tribe and in 12 of 19 from the Musicians tribeOther haplotypes also contained very few mutations Itis obvious that the ancestral haplotype was ratherldquoyoungrdquo no older than several hundred ybp

Overall all 179 haplotypes of Bulgarian Roma con-tained 146 identical (base) six-marker haplotypes and34 mutations compared to the base haplotype

Considering remaining base haplotypes this givesln(179146)00088 = 23plusmn3 generations or 575plusmn75 yearsto the common ancestor for all 179 members of all the12 tribes Considering mutations this gives3417900088 = 22plusmn4 generations or 550plusmn100 years

This represents a typical fit between the TSCAs obtainedfrom the two quite different methods of calculationswhere there is a well defined cluster descended from onecommon ancestor and again shows that there is a fun-damental basis for such fit This basis was explained in

detail in the companion paper (Part I) ldquoFormalrdquo calcu-lations of the standard deviation will give about plusmn20for the 95 confidence interval for the second resultabove that is 550plusmn110 years to the common ancestorand about the same margin of error for the first result

Thus the Roma show a straightforward uncomplicatedDNA genealogy family tree Incidentally these haplo-types belong to Haplogroup H1 which is very commonin India Beyond India it is found mainly among theRoma The base six-marker haplotype for H1 in Indiais exactly the same as that for the Roma shown aboveHowever its common ancestor in India lived severalthousand years ago For example the Indians of theKoya tribe have a base haplotype 15-12-22-10-11-12with a common ancestor of 2575plusmn825 ybp and theIndians of the Koraja tribe have a base haplotype X-Y-22-10-Z-12 with a common ancestor of 2450plusmn810ybp In general an ldquoagerdquo of Haplogroup H1 in Indiadetermined in several datasets is 3975plusmn450 ybp(Klyosov to be published)

In the 8-marker format the base haplotype for theBulgarian Roma is as follows

15-12-22-10-11-12 -- 14-16

in which the last two are markers DYS389-1 and 389-2Considering eight markers adds 25 mutations to the 179haplotypes bringing the total to 59 mutations Thisgives 591798000163 = 25plusmn4 generations to a com-mon ancestor that is 625plusmn100 ybp It is essentially thesame as that for the six-marker haplotypes(3417900088 = 22plusmn4 generations to a common ances-tor that is 550plusmn100 ybp

The 8-marker dataset has 20 fewer base haplotypescompared to those of the six-marker series that is 126This results in ln(179126)0013 = 27plusmn4 generations toa common ancestor or 675plusmn100 ybp This is within themargin of error with the result for the six-marker haplo-types As one can see 675plusmn100 575plusmn75 and 550plusmn100ybp fit each other quite satisfactorily

Another example of a population that apparently repre-sents the Roma is presented by a series of 34 haplotypesfrom Croatia (Barac et al 2003a 2003b Pericic et al2005) The haplotype tree composed of those haplo-types is shown in

The referenced articles did not specify the origin or theethnic features of the tested individuals however agroup of H1 bearers in Croatia will most likely be theRoma This guess was further supported by the TSCAestimate as follows

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 34: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

250

Sixteen haplotypes representing nearly half of the hap-lotypes were identical base ancestral haplotypes

12-22-15-10-15-17-X-Y-Z-14-11-29

Using the logarithmic method we obtainln(3416)0017 = 44 generations to a common ancestor(without a correction for back mutations) The linearmethod gives 24340017 = 42 generations (without acorrection) Clearly all the 34 individuals have a singlecommon ancestor who lived 45plusmn10 generations ago(with a correction for back mutations) or 1125plusmn250years ago between the 7th and 12th century CE

The Polynesians such as the Maoris Cook IslandersSamoans often have Haplogroup C2 In a publishedstudy (Zhivotovski et al 2004) 37 ten-marker haplo-types were determined in these three populations andthe base haplotype for all of them follows (in the FTD-NA format plus DYSA72 that is DYS461)

14-20-16-10-X-X-X-15-13-12-12-30 -- 9

It is not clear from the tree whether it is derived fromone or more of common ancestors Let us employ againthe ldquofitrdquo criterion of the logarithmic and the linearmethods

There were eight base haplotypes among the 37 haplo-types total and 49 mutations in the whole set withrespect to the base haplotype The average mutationrate for this haplotype equals to 00018 per marker pergeneration of 25 years (Klyosov 2009 the companionarticle) Therefore a common ancestor of all the 37individuals lived 49371000018 = 74 generations(without a correction for back mutations) or 80 gener-ations with the correction which results in 2000plusmn350ybp Eight base haplotypes (on top of the tree in

) give ln(378)0018 = 85 generations (without thecorrection) or 93 generations (with the correction)which equates to 2325 ybp This is within the margin orerror The 16 difference albeit not a dramatic onemight indicate that there was a slight admixture in thedataset by one or more extraneous haplotypes

Incidentally the theorized Polynesian expansion timecited by Zhivotovski (2004) was between 650 and 1200

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 35: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

251Klyosov DNA genealogy mutation rates etc Part II Walking the map

years ago Either that timespan is underestimated or theDNA-genealogy points at arrival of not one but a num-ber of travelers to the Polynesian islands They mighthave ldquobroughtrdquo an earlier common ancestor in theDNA

Lemba is a South African tribe whose people live inLimpopo Province in Zimbabwe in Malawi and inMozambique A list of 136 Lemba haplotypes waspublished by Thomas et al (2000) and the authorsnoted that some Lemba belong to the CMH Jewishlineage We will demonstrate that it is very unlikelyThomas did not publish the haplogroups for any of theindividuals only their six-marker STR haplotypes

Of the 136 Lemba individuals 41 had typical ldquoBanturdquohaplotypes belonging to Haplogroup E3a (by theauthorrsquos definition) with a base haplotype

15-12-21-10-11-13

The 41 haplotypes contain 91 mutations from the abovehaplotype that is 0370plusmn0039 mutations per marker or8300plusmn1200 years from a common ancestor

Another 23 Lemba who were tested had the followingbase haplotype

14-12-23-10-15-14

All 23 haplotypes had only 16 mutations which gives2150plusmn580 years to a common ancestor for these indi-viduals

There were a few scattered Lemba haplotypes apparent-ly from different unidentified haplogroups and finallythere were 57 haplotypes of apparently Haplogroup Jwhich in turn split into three different branches asshown in Three base haplotypes one foreach branch are shown below

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 36: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

252

14-16-24-10-13-12 14-15-24-10-11-12 14-16-23-10-11-12

The first one represents 16 identical haplotypes (theupper right area in ) which obviously camefrom a very recent common ancestor As one can seefrom the haplotype tree none of these haplotypes ismutated Its common ancestor should have lived nomore than a few centuries ago

The second one being a base haplotype for a 26-haplo-type branch on the left-hand side in is a rathercommon haplotype in the Arabic world and belongslikely to Haplogroups J2 but possibly to J1 The branchcontains 21 mutations which gives 2550plusmn610 years toa common ancestor who most likely lived in the firstmillennium BCE It is clearly not the ldquoCohen ModalHaplotyperdquo and differs from the third haplotype by twomutations which in the six-marker format correspondsto about 7300 years

The third base haplotype which is the CMH in itssix-marker format supports a branch of 15 haplotypeson the lower right-hand side of Twelve ofthose CMH haplotypes are identical to each other andform a flat branch There are no mutations in them andthey must have come from a very recent ancestor of onlya few centuries ago From a fraction of the base haplo-type their common ancestor lived only ln(1512)00088 = 25 generations ago or about 625plusmn200ybp around the 14th century

The three mutated haplotypes in this series are quitedifferent from the CMH and apparently do not belongto the same group of haplotypes All of them have twoor four mutations from the CMH

14-23-14-10-11-12 14-23-14-10-11-12 16-24-14-10-11-12

Unfortunately more extended haplotypes are not avail-able It is very likely that they are rather typical mutatedArabic haplotypes Besides it is not known with cer-tainty to which haplogroup they belong J1 or J2though the CMH is the modal haplotype for J1

Obviously to call the Lemba haplotypes the ldquoCohenhaplotyperdquo is a huge stretch in regard to the Cohenreference They could have been Jewish and originatedjust a few centuries ago or they could have been ArabicSince the CMH on six markers is the modal haplotypefor all of Haplogroup J1 automatically attributing agroup of CMH haplotypes to the Cohanim is mislead-ing Hence the so-called ldquoCohen Modal Haplotyperdquo inthe ldquoBlack Jews of Southern Africardquo has nothing to do

with an ancient history of either the Lemba or the Jewishpeople It is a rather recent development

The present article (Part II) and the companion article(Part I) present a rather simple and self-consistent ap-proach to dating both recent and distant common ances-tors based upon appearance of haplotype trees Itprovides an approach to a verification of haplotype setsin terms of a singular common ancestor for the set or todetermining that they represent a multiplicity of com-mon ancestors (through use of the ldquologarithmic meth-odrdquo) as well as a way to calculate a time span to acommon ancestor corrected for reverse mutations andasymmetry of mutations Obviously time spans tocommon ancestors refer to those ancestors whose de-scendants survived and present their haplotypes andhaplogroups for analysis in the present Naturally theirtribes or clans could have appeared earlier since it isvery likely that in many cases offspring andor descen-dants did not survive

Amazing as it is our history is written in the DNA ofeach of us the survivors The ldquowritingrdquo represents justa ldquoscribble on the cuffrdquo of the DNA though there is adeeper meaning to these scribbles If we look at them fora single individual without comparisons to others theydo not say much They represent just a string of num-bers However when compared with those in otherpeople these scribbles start to tell a story These collec-tive stories are about origins of mankind appearances oftribes their migrations about our ancestors and theircontributions to current populations This study isintended to contribute quantitative methods for studyingthe field of DNA genealogy

I am indebted to Dr A Aburto for providing a series ofthe ldquoCohen Modal Haplotypesrdquo to Dr AH Bittles forproviding a series of haplotypes from China to TheresaM Wubben for valuable discussions

Arabian Peninsula J1 ProjectwwwfamilytreednacompublicArabian_YDNA_J1_Pr

ojectdefaultaspx

Ashina ProjecthttpwwwfamilytreednacompublicAshinaRoyalDynasty

Basque DNA ProjecthttpwwwfamilytreednacompublicBasqueDNA

Haplogroup Q Projecthttpm242haplogrouporg

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 37: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

253Klyosov DNA genealogy mutation rates etc Part II Walking the map

Bradley B Stanford D (2004) The North Atlantic ice-edge corridor a possible Paleolithic route to the NewWorld 36459-478

Cadenas AM Zhivotovsky LA Cavalli-Sforza LL Un-derhill PA Herrera RJ (2008) Y-chromosome diversitycharacterizes the Gulf of Oman 18374-386

Campbell KD (2007) Geographic patterns of haplo-group R1b in the Brutish Isles 31-13

Cervantes A (2008) Basque DNA Project See

Fagundes NJR Kanitz R Eckert R Valls ACS BogoMR Salzano FM Smith DG Silva WA Zago MARibeiro-dos-Santos et al (2008) Mitochondrial popu-lation genomics supports a single pre-Clovis origin witha coastal route for the peopling of the Americas

82583-592

Fornarino S Pala M Battaglia V Maranta R AchilliA Modiano G Torroni A Semino O Santachiara-Benerecetti SA (2009) Mitochondrial and Y-chromo-some diversity of the Tharus (Nepal) a reservoir ofgenetic variation 20099154

Haak W Brandt G de Jong HN Meyer C GanslmeierR Heyd V Hawkesworth C Pike AWG Meller H AltKW (2008) Ancient DNA strontium isotopes andosteological analyses shed light on social and kinshiporganization of the later Stone Age

10518226-18231

Haynes G (2002) Cambridge University Press

p 52

Karlsson AO Wallerstroumlm T Goumltherstroumlm A Holmlund G(2006) Y-chromosome diversity in Sweden ndash A long-timeperspective 14863-970

Kerchner C (2008) R1b (U152+) project See

Keyser C Bouakaze C Crubezy E Nikolaev VG Mon-tagnon D Reis T Ludes B (2009) Ancient DNAprovides new insights into the history of south SiberianKurgan people published online 16 May2009

Kivisild T Rootsi S Metspalu M Mastana S KaldmaK Parik J Metspalu E Adojaan M Tolk HV StepanovV et al (2003) The genetic heritage of the earliestsettlers persists both in Indian tribal and caste popula-tions 72313-332

India (regional) DNA ProjecthttpwwwfamilytreednacompublicIndiadefaultaspx

ISOGG 2009 Y-DNA Haplogroup Q and its SubcladeshttpwwwisoggorgtreeISOGG_HapgrpQ09html

R-312 and Subclades Projecthttpwwwfamilytreednacompublicatlantic-

r1b1cdefaultaspx

R-L21 ProjecthttpwwwfamilytreednacompublicR-

L21defaultaspxpublicwebsiteaspx

R1b (U152+) Projecthttpwwwfamilytreednacom

publicR1b1c10defaultaspx

R-M222 ProjecthttpwwwfamilytreednacompublicR1b1c7defaulta

spx

Adamov DS Klyosov AA (2008b) Evaluation of anage of populations from Y chromosome using meth-ods of average square distance (in Russian)

(ISSN 1942-7484) 1855-905

Al-Jasmi J (2008) The Arabian Y-DNA J1 Project See

Bara L Peri i M Klari IM Jani ijevi B Parik JRootsi S Rudan P (2003a) Y chromosome STRs inCroatians 138127-133

Bara L Peri i M Klari IM Rootsi S Jani ijevi B KivisildT Parik J Rudan I Villems R Rudan P (2003) Y-chromo-somal heritage of Croatian population and its island isolates

Bittles AH Black ML Wang W (2007) Physical anthro-pology and ethnicity in Asia the transition from anthro-pology to genome-based studies 2677-82

Bortolini M-C Salzano FM Thomas MG Stuart SNasanen SPK Bau CHD Hutz MH Layrisse Z Petzl-Erler ML Tsuneto LT et al (2003) Y-chromosomeevidence for differing ancient demographic histories inthe Americas 73524-539

Bouakaze C Keyser C Amory S Crubezy E (2007) Firstsuccessful assay of Y-SNP typing by SNaPshot minise-quencing on ancient DNA 121493-499

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 38: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

254

Klyosov AA (2008a) The features of the West Europe-an R1b haplogroup (in Russian)

(ISSN 1942-7484) 1568-629

Klyosov AA (2008b) Where is the origin of the Slavsand Indo-Europeans DNA genealogy gives an answer(in Russian) (ISSN1942-7484) 1400-477

Klyosov AA (2008b) Origin of the Jews via DNAgenealogy (ISSN1942-7484) 154-232

Klyosov AA (2009a) Iberian haplotypes and analysisof history of populations of Basques Sephards and othergroups in the Iberian peninsula

(ISSN 1942-7484) 2390-421

Klyosov AA (2009b) Haplotypes of Eastern Slavsnine tribes (ISSN1942-7484) 2232-251

Lepper BT (1999) Pleistocene peoples of midcontinen-tal North America in Bonnichsen R and Turnmire Keds Oregon StateUniversity Press pp 362-394

McEvoy B Simms K Bradley DG (2008) Genetic investigation of thepatrilineal kinship structure of early Medieval Ireland

136415-22

McEvoy B Bradley DG (2006) Y-chromosomes andthe extent of patrilineal ancestry in Irish surnames

119212-219

Mertens G (2007) Y-Haplogroup frequencies in theFlemish population 319-25

Peri i M Lauc LB Klaric IM Rootsi S Janicijevic BRudan I Terzic R Colak I Kvesic A Popovic D SijackiA Behluli I Thorthevic D Efremovska L Bajec EDStefanovic BD Villems R and Rudan P (2005) Highresolution phylogenetic analysis of southeastern Europe(SEE) traces major episodes of paternal gene flow amongSlavic populations 221964ndash1975

Roewer L Willuweit S Kruumlger C Nagy M RychkovS Morozowa I Naumova O Schneider Y Zhukova OStoneking M Nasidze I (2008) Analysis of Y chromo-some STR haplotypes in the European part of Russiareveals high diversities but non-significant genetic dis-tances between populations 122219-23

Rutledge C (2009) India (regional) DNA Project(including Pakistan Bangladesh Nepal Bhutan) See

Sahoo S Singh A Himabindu G Banerjee J SitalaximiT Gaikwad S Trivedi R Endicott P Kivisild T Metspa-lu M et al (2006) A prehistory of Indian Y chromo-somes evaluating demic diffusion scenarios

103843-848

Seielstad M Yuldasheva N Singh N Underhill P Oef-ner P Shen P Wells RS (2003) A novel Y-chromosomevariant puts an upper limit on the timing of first entryinto the Americas 73700-705

Semino O Passarino G Oefner PJ Lin AA Arbuzova SBeckman LE De Benedictis G Francalacci P Kouvatsi ALimborska S Marcikiae M Mika A Mika B Primorac DSantachiara-Benerecetti AS Cavalli-Sforza LL Underhill PA(2000) The genetic legacy of paleolithicin extant Europeans A Y chromosome perspective

Sengupta S Zhivotovsky LA King R Mehdi SQ EdmondsCA Chow CT Lin AA Mitra M Sil SK Ramesh A RaniMVU Thakur CM Cavalli-Sforza LL Majumder PP Under-hill PA (2006) Polarity and Temporality of High-ResolutionY-Chromosome Distributions in India Identify BothIndigenous and Exogenous Expansions and Reveal MinorGenetic Influence of Central Asian Pastoralists

78202-221

Sharma S Rai E Sharma P Jena M Singh S DarvishiK Bhat AK Bhanwer AJS Tiwari PK Bamezai RNK(2009) The Indian origin of paternal haplogroup R1a1substantiates the autochthonous origin of Brahmins andthe caste system 5447-55

Stevens R Tilroe V (2009) R-P312 and subcladesproject (October 2009) See

Stevens R Secher B Walsh M (2009) R-L21 project(October 2009) See

Thanseem I Thangaraj K Chaubey G Singh VKBhaskar LV Reddy MB Reddy AG Singh L (2006)Genetic affinities among the lower castes and tribalgroups of India Inference from Y chromosome andmitochondrial DNA 2006 Aug 77(1)42

Thomas MG Skorecki K Ben-Ami H Parfitt T Brad-man N Goldstein DB (1998) Origins of Old Testamentpriests 394138-140

Thomas MG Parfitt T Weiss DA Skorecki K WilsonJF le Roux M Bradman N Goldstein DB (2000) YChromosomes traveling South the Cohen Modal Hap-lotype and the origin of the Lemba - the Black Jews ofSouthern Africa 66674-686

Underhill PA Myres NM Rootsi S et al (2009)Separating the post-Glacial coancestry of European and

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61

Page 39: DNA Genealogy, Mutation Rates, And Some Historical Evidence Written in Y-Chromosome, Part II:  Walking the Map

255Klyosov DNA genealogy mutation rates etc Part II Walking the map

Asian Y chromosomes within Haplogroup R1a On-line publication doi

101038ejhg2009194

Wells RS Yuldasheva N Ruzibakiev R Underhill PA EvseevaI Blue-Smith J Jin L Su B Pitchappan R ShanmugalakshmiS Balakrishnan K Read M Pearson NM Zerjal T WebsterMT Zholoshvili I Jamarjashvili E Gambarov S Nikbin BDostiev A Aknazarov O Zalloua P Tsoy I Kitaev M Mirra-khimov M Chariev A Bodmer WF (2001) The Eurasianheartland a continental perspective on Y-chromosome diversi-ty

Weston D Corbett G Maddi M (2009) YDNA Haplo-group R1b-U106S21+ Research Group See

Wilson D McLaughlin J (2009) R-M222 haplogroupproject See

Zhivotovsky LA Underhill PA Cinnoglu C Kayser MMorar B Kivisild T Scozzari R Cruciani F Destro-Bisol G Spedini G et al (2004) The effective mutationrate at Y chromosome short tandem repeats with appli-cation to human population-divergence time

7450-61