Historical inference from linguistic and genetic data

Page 1: Historical inference from linguistic and genetic data

2/6/01

Historical inference from linguistic and genetic data

Potentially “…the best evidence of the derivation of … the human race” (Thomas Jefferson)

BUT

• Inferences are complex
  – methods and results from several disciplines
• Intellectual stakes are high
• Work has often been careless
  – sometimes spectacularly so
  – dangers of overinterpretation and “scientism”

Page 2: Historical inference from linguistic and genetic data

General methodological problems

• Not all graphs are trees
  – “treeness” tests often left out
  – “treeness” hypothesis can often be rejected
• Tree inference may be underdetermined
  – Branching structure
  – Root choice
• Rates of change may not be constant
  – for different markers
  – across time
• Gene trees (and language trees) may not be population trees
• Biology and language are complicated
  – simplifying assumptions are sometimes perniciously mistaken
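One classical “treeness” criterion is the four-point condition: for any four items, the two largest of the three pairwise distance sums must be equal if the distances fit a tree exactly. The sketch below is only an illustration of that criterion, not necessarily the test these slides have in mind. The toy distances and the names `distances` and `four_point_gap` are made up for the example. Cognate proportions p would first be turned into distances such as -log(p), so that multiplied retentions become added lengths.

```python
def distances(triples):
    """Build a symmetric lookup from (item, item, distance) triples."""
    return {frozenset((a, b)): dist for a, b, dist in triples}


def four_point_gap(d, w, x, y, z):
    """Gap between the two largest of the three pairings' distance sums.

    The gap is 0 when the four items fit a tree exactly; a large gap
    argues against "treeness" for this quartet.
    """
    sums = sorted([
        d[frozenset((w, x))] + d[frozenset((y, z))],
        d[frozenset((w, y))] + d[frozenset((x, z))],
        d[frozenset((w, z))] + d[frozenset((x, y))],
    ])
    return sums[2] - sums[1]


# Toy distances generated by the tree ((A,B),(C,D)) with invented branch
# lengths A=1, B=2, C=3, D=4 and an internal branch of 5.
tree_like = distances([
    ("A", "B", 3), ("C", "D", 7), ("A", "C", 9),
    ("A", "D", 10), ("B", "C", 10), ("B", "D", 11),
])

# Same table with one distance altered, so it no longer fits any tree.
perturbed = dict(tree_like)
perturbed[frozenset(("A", "D"))] = 13

print(four_point_gap(tree_like, "A", "B", "C", "D"))
print(four_point_gap(perturbed, "A", "B", "C", "D"))
```

With real data the gap is never exactly zero, so in practice it has to be judged against the sampling noise in the distances.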

Page 3: Historical inference from linguistic and genetic data

Trees vs. Clines (etc.)

• A tree structure represents the results of a sequence of splits in population (or language)
  – no further influences among separate branches
  – if rates of change are constant, distances should be quantized
• Within an interbreeding (intercommunicating) population, distances reflect the amount of gene flow (transmission of linguistic traits)
  – should correlate strongly with accessibility
  – e.g. geographical distance in the simplest case

Page 5: Historical inference from linguistic and genetic data

The… procedures outlined here provide a rigorous method for inferring whether the geographical pattern of variation is consistent with an historical split (fragmentation) or no split (recurrent gene flow) using criteria that are completely explicit. For example, in analyzing the mtDNA of tiger salamanders, a clear split into eastern and western lineages was detected for mtDNA. Using the same explicit criteria, there was no split among any human populations. Quite the contrary, the present analysis documents recurrent and continual genetic interchange among all Old World human populations throughout the entire time period marked by mtDNA. Accordingly, estimating a date for a 'split' of Africans from non-Africans based on evidence from mtDNA is certainly allowed by many computer programs, but the results are meaningless because a date is being assigned to an 'event' that never occurred.

Templeton (1997)

Page 6: Historical inference from linguistic and genetic data

Methods for tree inference (“phylogeny”)

• Two general approaches
  – clustering (easier but cruder)
  – generate and evaluate alternative trees
• Distance-based methods
  – based on matrix of distances/similarities
• Parsimony
  – based on set of partly-shared characters or traits

http://evolution.genetics.washington.edu/phylip/software.html documents 193 different phylogeny packages

Page 7: Historical inference from linguistic and genetic data

Cognate percentages for 8 Vanuatu languages

Toga
 64  Mosina
 64   58  Peterara
 57   51   65  Nduindui
 29   28   34   32  Sakao
 51   45   55   52   40  Malo
 39   39   45   41   43   50  Fortsenal
 52   48   57   60   31   48   45  Raga

Data from Guy (1994)
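The “clustering (easier but cruder)” approach can be tried on this table directly. The sketch below assumes plain average-linkage clustering on the cognate percentages. It is not Guy's reconstruction algorithm, and the function names are invented for illustration.

```python
from itertools import combinations

NAMES = ["Toga", "Mosina", "Peterara", "Nduindui",
         "Sakao", "Malo", "Fortsenal", "Raga"]

# Lower triangle of the cognate-percentage table, one row per language.
LOWER = [
    [],
    [64],
    [64, 58],
    [57, 51, 65],
    [29, 28, 34, 32],
    [51, 45, 55, 52, 40],
    [39, 39, 45, 41, 43, 50],
    [52, 48, 57, 60, 31, 48, 45],
]


def similarity_table():
    """Map each unordered language pair to its cognate percentage."""
    sim = {}
    for i, row in enumerate(LOWER):
        for j, pct in enumerate(row):
            sim[frozenset((NAMES[i], NAMES[j]))] = pct
    return sim


def mean_similarity(sim, x, y):
    """Average percentage over all cross-cluster language pairs."""
    total = sum(sim[frozenset((a, b))] for a in x for b in y)
    return total / (len(x) * len(y))


def average_linkage(names, sim):
    """Repeatedly merge the two clusters with the highest mean similarity."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        x, y = max(combinations(clusters, 2),
                   key=lambda pair: mean_similarity(sim, *pair))
        merges.append((x, y, mean_similarity(sim, x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges


merges = average_linkage(NAMES, similarity_table())
for left, right, score in merges:
    print(f"{'+'.join(left)}  |  {'+'.join(right)}  ({score:.1f}%)")
```

The first merge joins Peterara and Nduindui, the single highest entry in the table (65%). Clustering of this kind always groups the most similar items first, which amounts to assuming roughly equal rates of change on all branches.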

Page 8: Historical inference from linguistic and genetic data

Reconstruction Algorithm (Guy 1994)

“A message is input at the root of a tree-shaped transmission network, whence it is transmitted to the terminal nodes. As they travel, copies of the original message are affected by errors consisting in randomly selected segments of the message being replaced by other segments randomly drawn from a pool of possible segments (the "alphabet" of the message). The problem is: from the garbled versions of the original message collected at the terminal nodes, reconstruct the network and the history of the transmission of the message.”

“Additive-distance” tree with weights on branches rather than on nodes -- doesn’t assume constant rate of change…

Page 9: Historical inference from linguistic and genetic data

Explanatory force of the model

• Set of distances grows as $\frac{N^2 - N}{2}$
• Set of binary-tree branch labels grows as $2(N-1)$
• For 8 languages: we predict 28 numbers (the inter-language cognate proportions) with 14 numbers (the binary tree branch proportions)

Page 10: Historical inference from linguistic and genetic data

Inferred tree

Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'

from Guy (1994)

Mosina/Toga: .77*.83 = .6391 (really 64%)
Peterara/Mosina: .829*.919*.77 = .5866 (really 58%)
Peterara/Toga: .829*.919*.830 = .6323 (really 64%)

Page 11: Historical inference from linguistic and genetic data

True - predicted cognate percentages

Toga
  0  Mosina
  1   -1  Peterara
  1   -1    4  Nduindui
 -2   -1    0    0  Sakao
  2    0    2    3    1  Malo
 -3    0   -1   -2    0   -2  Fortsenal
 -1   -1   -1    0    1    1    4  Raga

The model fits very well!
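The fit can be checked by multiplying the branch proportions along the path joining each pair of languages and comparing with the observed percentages. In this sketch the branch values are read off the inferred tree. The internal node names (`n1` … `n6`, `top`) and the function names are invented labels, not Guy's.

```python
NAMES = ["Toga", "Mosina", "Peterara", "Nduindui",
         "Sakao", "Malo", "Fortsenal", "Raga"]

# Observed cognate percentages (lower triangle, one row per language).
OBSERVED = [
    [],
    [64],
    [64, 58],
    [57, 51, 65],
    [29, 28, 34, 32],
    [51, 45, 55, 52, 40],
    [39, 39, 45, 41, 43, 50],
    [52, 48, 57, 60, 31, 48, 45],
]

# child -> (parent, branch proportion), read off the inferred tree.
BRANCHES = {
    "Toga": ("n1", 0.830), "Mosina": ("n1", 0.770),
    "n1": ("n2", 0.919), "Peterara": ("n2", 0.829),
    "n2": ("n4", 0.972),
    "Nduindui": ("n3", 0.795), "Raga": ("n3", 0.755),
    "n3": ("n4", 0.949),
    "n4": ("top", 0.947),
    "Sakao": ("n5", 0.567), "Fortsenal": ("n5", 0.759),
    "n5": ("n6", 0.883), "Malo": ("n6", 0.772),
    "n6": ("top", 0.895),
}


def climb(leaf):
    """Return {ancestor: product of branch proportions from leaf up to it}."""
    products = {leaf: 1.0}
    node, acc = leaf, 1.0
    while node in BRANCHES:
        node, proportion = BRANCHES[node]
        acc *= proportion
        products[node] = acc
    return products


def predicted(a, b):
    """Predicted shared-cognate proportion: product along the path a..b."""
    up_a, up_b = climb(a), climb(b)
    for node, product in up_a.items():  # dicts keep insertion order
        if node in up_b:                # first shared ancestor
            return product * up_b[node]


def residuals():
    """Observed minus predicted percentage for every pair in the table."""
    out = {}
    for i, row in enumerate(OBSERVED):
        for j, pct in enumerate(row):
            a, b = NAMES[i], NAMES[j]
            out[(a, b)] = pct - 100 * predicted(a, b)
    return out


for pair, diff in residuals().items():
    print(pair, round(diff))
```

Rounded to whole percentages, the largest discrepancies are the two 4s in the residual table.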

Page 12: Historical inference from linguistic and genetic data

Where’s the root?

Toga      -830-----:-919-----:-972-----:-947-----:--Protolanguage
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'

Isn’t it obvious?

Page 13: Historical inference from linguistic and genetic data

Oops: other options

Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'

protolanguage

Page 14: Historical inference from linguistic and genetic data

And some more…

Toga     -830-:-919-:-972-:-947-:-895-:-883-:-567- Sakao
Mosina   -770-'     |     |           |     `-759- Fortsenal
Peterara -----829---'     |           `---772----- Malo
Nduindui -----795---:-949-'
Raga     -----755---'

protolanguage

In the absence of other constraints, the root can be placed anywhere in the tree without changing the model’s fit!
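A small check of this claim: treat the tree as an unrooted set of branches, hang it from different nodes, and compare the predicted proportions for all 28 language pairs. The sketch roots only at existing nodes, which is a simplification. A root placed partway along a branch just splits that branch's proportion into two factors with the same product. The node names `n1` … `n6` and `top` and all function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Undirected branches (node, node, proportion) taken from the tree above.
EDGES = [
    ("Toga", "n1", 0.830), ("Mosina", "n1", 0.770), ("n1", "n2", 0.919),
    ("Peterara", "n2", 0.829), ("n2", "n4", 0.972),
    ("Nduindui", "n3", 0.795), ("Raga", "n3", 0.755), ("n3", "n4", 0.949),
    ("n4", "top", 0.947), ("top", "n6", 0.895),
    ("Sakao", "n5", 0.567), ("Fortsenal", "n5", 0.759), ("n5", "n6", 0.883),
    ("Malo", "n6", 0.772),
]
LEAVES = ["Toga", "Mosina", "Peterara", "Nduindui",
          "Raga", "Sakao", "Fortsenal", "Malo"]


def orient(root):
    """Hang the unrooted tree from `root`: child -> (parent, proportion)."""
    adjacent = defaultdict(dict)
    for a, b, proportion in EDGES:
        adjacent[a][b] = proportion
        adjacent[b][a] = proportion
    parent, seen, stack = {}, {root}, [root]
    while stack:
        node = stack.pop()
        for neighbour, proportion in adjacent[node].items():
            if neighbour not in seen:
                seen.add(neighbour)
                parent[neighbour] = (node, proportion)
                stack.append(neighbour)
    return parent


def climb(parent, leaf):
    """Return {ancestor: product of proportions from leaf up to it}."""
    products, node, acc = {leaf: 1.0}, leaf, 1.0
    while node in parent:
        node, proportion = parent[node]
        acc *= proportion
        products[node] = acc
    return products


def pair_product(parent, a, b):
    """Product of branch proportions along the path joining a and b."""
    up_a, up_b = climb(parent, a), climb(parent, b)
    for node, product in up_a.items():
        if node in up_b:
            return product * up_b[node]


def leaf_products(root):
    parent = orient(root)
    return {pair: pair_product(parent, *pair)
            for pair in combinations(LEAVES, 2)}


def max_difference(root_a, root_b):
    """Largest change in any predicted proportion between two rootings."""
    first, second = leaf_products(root_a), leaf_products(root_b)
    return max(abs(first[pair] - second[pair]) for pair in first)


print(max_difference("top", "n5"))    # differs only by floating-point noise
print(max_difference("top", "Malo"))  # even rooting at a leaf changes nothing
```

The predictions depend only on the path between two leaves, which is why the data alone cannot say where the protolanguage sits.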

Page 15: Historical inference from linguistic and genetic data

Possible “other constraints”

• Historical evidence
  – about earlier forms
  – about structure of relationships among contemporary forms
    • “outgroup”
• Constraints on rate of change
  – linguistic (or genetic) “clock”

Page 16: Historical inference from linguistic and genetic data

A universal constant for glottochronology?

Thirteen sets of data, presented in partial justification of these assumptions, serve as a basis for calculating a universal constant to express the average rate of retention k of the basic-root morphemes: k = 0.8048 ± 0.0176 per millennium, with a confidence limit of 90%.

Lees (1953)

Page 17: Historical inference from linguistic and genetic data

Some of Lees’ data:

Language                 Years   Words   Cognates   Rate (per millennium)
English                  1000    209     160        .766
Latin/Spanish            1800    200     131        .790
Latin/French             1850    200     125        .776
German                   1100    214     180        .854
Middle Egyptian/Coptic   2200    200     106        .760
Greek                    2070    213     147        .836
Chinese                  1000    210     167        .795
Swedish                  1050    207     176        .853
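A sketch of how a per-millennium rate can be obtained from such a row, assuming constant exponential decay: rate = (cognates / words) ^ (1000 / years). That formula is an assumption about how the rate column was derived, not something stated on the slide. It reproduces most of the tabulated rates to within about 0.001, while Middle Egyptian/Coptic and Swedish come out somewhat further off.

```python
# (language, years, words, cognates, tabulated rate) from the table above.
LEES = [
    ("English", 1000, 209, 160, 0.766),
    ("Latin/Spanish", 1800, 200, 131, 0.790),
    ("Latin/French", 1850, 200, 125, 0.776),
    ("German", 1100, 214, 180, 0.854),
    ("Middle Egyptian/Coptic", 2200, 200, 106, 0.760),
    ("Greek", 2070, 213, 147, 0.836),
    ("Chinese", 1000, 210, 167, 0.795),
    ("Swedish", 1050, 207, 176, 0.853),
]


def retention_per_millennium(years, words, cognates):
    """Scale the retained fraction to 1000 years, assuming constant decay."""
    return (cognates / words) ** (1000 / years)


for language, years, words, cognates, tabulated in LEES:
    rate = retention_per_millennium(years, words, cognates)
    print(f"{language:24s} computed {rate:.3f}   tabulated {tabulated:.3f}")
```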

Page 18: Historical inference from linguistic and genetic data

Some more retentive languages (rates per 1000 years)

Language            100-word list   200-word list
Icelandic (rural)   99%             97.6%
Icelandic (urban)   98%             96.2%
Georgian            96.5%           89.9%
Armenian            97.8%           94%

Bergsland & Vogt (1962)

Page 19: Historical inference from linguistic and genetic data

Some less retentive ones

David Lithgow (pers. com. circa 1970) has observed a replacement of some 20% of the basic vocabulary in Muyuw (Woodlark island) in one generation. Raise 0.8 to the 33rd power, and that gives you the retention rate of Muyuw per 1000 years should it continue to evolve at that rate: 0.06%.

Jacques Guy (1994)

Bergsland & Vogt estimate vocabulary retention in East Greenlandic as .722 in 600 years, or .34 per millennium.

Page 20: Historical inference from linguistic and genetic data

“Language chains”

A
.77  B
.65  .76  C

Configurations like this are taken as prima facie evidence of “non-treeness”, to be attributed to borrowing/mixing/cline types of situations. But in fact they can also easily be generated by variable rates of change:

A ----------- 90% -----------.
                             |____ protolanguage
B ---- 95% ----.             |
               |---- 90% ----'
C ---- 80% ----'

Note that the required difference in mean rate of change is only (.9-.9*.8)/.9 = .2 , or 20%
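The arithmetic behind this example, as a short sketch: multiply the retention percentages along the path between each pair of languages in the small tree. The variable names are invented for illustration.

```python
# Retention from each language up to the protolanguage, from the diagram:
# B and C share a 90% branch above their own 95% and 80% branches.
to_proto = {"A": 0.90, "B": 0.95 * 0.90, "C": 0.80 * 0.90}

shared = {
    ("A", "B"): to_proto["A"] * to_proto["B"],  # path passes through the root
    ("A", "C"): to_proto["A"] * to_proto["C"],
    ("B", "C"): 0.95 * 0.80,                    # path stays below the B/C split
}

for pair, proportion in shared.items():
    print(pair, round(proportion, 2))

# Relative difference between A's retention and C's total retention.
print(round((to_proto["A"] - to_proto["C"]) / to_proto["A"], 2))
```

A pure tree with unequal rates therefore yields the chain pattern: A is close to B, B is close to C, and A is noticeably further from C.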

Page 21: Historical inference from linguistic and genetic data

Mitochondrial Genome

Page 22: Historical inference from linguistic and genetic data

Mitochondrial family tree

Page 23: Historical inference from linguistic and genetic data

Mitochondrial phylogeny

Page 24: Historical inference from linguistic and genetic data

Three fascinating “results”

• Mitochondrial Eve

• Mitochondrial Clans

• The three-wave theory: converging linguistic and genetic evidence

Page 25: Historical inference from linguistic and genetic data

Mitochondrial Eve

Cann, Stoneking, and Wilson (1987):

mtDNA comparisons of 147 people from Europe, Africa, Asia, Australia, and New Guinea show that all present human mtDNA is descended from a single African woman who lived about 200,000 years ago.

Page 26: Historical inference from linguistic and genetic data

First problem

• Computer program was used to find a tree consistent with the mtDNA data
• But so were many other (unreported) trees!
  – order of answers depended on order of data
  – root could be effectively anywhere in the dataset
    • e.g. Melanesian Eve, Asian Eve, European Eve…

Page 27: Historical inference from linguistic and genetic data

Other problems

• mtDNA may not change at a constant rate
• mtDNA changes may be adaptive
• Gene trees may not be population trees
  – DNA (including mtDNA) can spread by gradual flow or by range expansion
  – spread can be influenced by other factors

Page 28: Historical inference from linguistic and genetic data

Early results: Native Americans come from four genetic lineages, labeled A through D. Amerinds have all four lineages, NaDene only A, and Eskaleuts A and D.

Current results: The four mtDNA lineages divide into nine distinct genetic subtypes. All four lineages are in all three language groups. Many local populations have all four lineages and a number even have all the subtypes. All subtypes can be found in North, Central and South America.

“It isn't realistic to believe that the same lineages ended up in all these populations across two continents by separate migrations.”

Page 29: Historical inference from linguistic and genetic data

http://www.oxfordancestors.com/:

Oxford Ancestors

We put the Genes in Genealogy

Oxford Ancestors is the World's first organization to harness the power and precision of modern DNA-based genetics in the service of genealogy.

MatriLine™ interprets your deep maternal ancestry, linking you - if your roots are in Europe - to one of seven women: Ursula, Tara, Helena, Katrine, Velda, Xenia or Jasmine.

Page 31: Historical inference from linguistic and genetic data

And mtDNA inheritance may not even be entirely clonal!

• Mice
  – demonstration of “paternal leakage”
• Hagelberg
  – rare mtDNA mutation in Vanuatu
• Eyre-Walker
  – statistics of mtDNA “homoplasies”

Page 32: Historical inference from linguistic and genetic data

Island evidence

• Erika Hagelberg (Proc. R. Soc. 1999)
  – Island of Nguna (Vanuatu, Melanesia)
  – 3 main mtDNA population groups
    • as expected for the region
  – In all three groups, the same mutation is sometimes found
    • previously known only from one Northern European
  – Repeated chance mutation is unlikely
    • local spread by recombination seems more probable

Page 33: Historical inference from linguistic and genetic data

Statistics of mtDNA “homoplasies”

• Mutations that occur in different mtDNA haplogroups around the world
• Assuming purely maternal inheritance, these were thought to represent chance recurrence of mutations in “hypervariable” regions
• Eyre-Walker et al. (Proc. R. Soc. 1999):
  – regions are not statistically more variable than others
  – mutations cluster geographically
• Macaulay (1999) counters
  – much of the result comes from a dataset that may be errorful
  – “no need to panic”

Page 34: Historical inference from linguistic and genetic data

Reaction of another mtDNA aficionado:

…I am reminded of a comment by a bishop’s wife in Victorian England, also concerning human origins: “Let us hope that it isn’t true, and if it is, that it will not become generally known.”