Kinship structures create persistent channels for language transmission
Lansing et al.
Supporting Information (SI)
Genetic data. The sample dataset comprises 982 men from the eastern Indonesian islands of Sumba (n = 505) and Timor (n = 477), with associated genetic, cultural and linguistic information. Of the 25 villages in this study, 14 are on Sumba and 11 on Timor. All Sumba villages are patrilocal. Of the 11 Timor villages, two follow patrilocal kinship practice (Fatukati and Umaklaran), while the remainder are matrilocal. The diversity of mtDNA and Y chromosome haplogroups found in each sampled village is shown in Fig. S1. The genetic trees were constructed using the method described below, which is freely available as an open source R application at https://carpntrees.shinyapps.io/shinyapp/ [last accessed August 2017].
[Figure S1 image: maps of Sumba and Timor showing the sampled villages, with panels titled Sumba mtDNA Haplogroup, Timor mtDNA Haplogroup, Sumba Y-STR Haplogroup and Timor Y-STR Haplogroup, haplogroup legends, and a sample-size scale (24, 41, 61).]
Fig. S1. Genetic diversity on Sumba and Timor. Diversity of mtDNA haplogroups (top) and Y chromosome haplogroups (bottom) in villages on Sumba (left) and Timor (right). Markers (•) indicate patrilocal and matrilocal villages. Pie charts, scaled by sample size, show the distribution of haplogroups within each village.
Lansing et al. 1 of 13
mtDNA tree and distances. A tree of female lineages was constructed using mitochondrial DNA, with a combination of HVS-I sequences (540 base pairs) and SNP-defined haplogroup information. While the sequence data show recent variation between individuals, the haplogroups (determined by hierarchically genotyped SNPs) permit greater resolution of deeper parts of the phylogeny. By combining both information sources, a phylogenetic tree can be built that represents the true phylogeny far better than one generated from a single data source alone.
The tree building approach involved several steps. First, a tree was built using the haplogroup of each individual, with the topology constrained to match the consensus haplogroup tree from the Phylotree project [1]. This tree contains leaves representing individuals that share the same haplogroup. These leaves were then further refined using a neighbor-joining algorithm on the HVS-I sequences, using the most closely related haplogroup as an outgroup. Thus, we iteratively generate a final tree in which the deep nodes are determined by SNP-defined haplogroups and recent nodes by HVS-I variation. All individuals with the same haplogroup and HVS-I sequence are initially collapsed to speed up the calculations, and are separated again at the tips once tree reconstruction is complete. The final tree is shown in Fig. S2. Subtrees (e.g., for patrilocal villages on Timor) were extracted from this master tree.
The pairwise distance matrix used for IM model fitting and language transmission analysis was built using the number of pairwise differences in HVS-I sequences between each pair of individuals.
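The pairwise-difference count described above can be sketched as follows. This is a minimal illustration, assuming the HVS-I sequences are already aligned to a common length; the function names, individual labels and toy sequences are hypothetical and are not taken from the study's code or data.

```python
from itertools import combinations


def pairwise_differences(seq_a: str, seq_b: str) -> int:
    """Count the aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(sequences: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the number of differences for every unordered pair of individuals."""
    ids = sorted(sequences)
    return {
        (i, j): pairwise_differences(sequences[i], sequences[j])
        for i, j in combinations(ids, 2)
    }


# Toy example with made-up 8 bp sequences (the real HVS-I data span 540 bp).
toy = {"ind1": "ACGTACGT", "ind2": "ACGTACGA", "ind3": "TCGTACGA"}
matrix = distance_matrix(toy)
```

Gaps and ambiguous bases are treated here like any other character; a real pipeline would need an explicit rule for them.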
[Figure S2 image: dated mtDNA trees in two panels (A, B), with one tip label per sampled individual, time axes in years ago, and legends titled Sumba mtDNA Haplogroup and Timor mtDNA Haplogroup.]
Fig. S2. mtDNA phylogeny. The female lineages of sampled men from (A) Sumba and (B) Timor, colored according to their mtDNA haplogroup.
Y chromosome tree and distances. The tree of male lineages was reconstructed using both Y chromosome haplogroup information and Y-STR markers. Using a similar approach as for mtDNA, both information sources were combined by first building a haplogroup tree, and then refining recent lineage branching using Y-STR variation. The haplogroup tree was constrained to the topology of the Y-DNA Haplogroup Tree defined by the International Society of Genetic Genealogy [2], with improvements as per Karafet et al. [3] to incorporate new markers specific to this geographical region. To refine relationships at the tips, Bruvo’s distance [4] was calculated between each pair of individuals based on Y-STR variation. Leaves were then further refined using a neighbor-joining algorithm on the Bruvo distance of the Y-STRs. As before, all genetically identical individuals were initially collapsed and later separated out after the overall tree structure had been determined. The final tree is shown in Fig. S3. Individual population trees were extracted from this master tree.
The pairwise distance matrix used for IM model fitting and language transmission analysis was built using the Bruvo distance [4] between all pairs of individuals, as defined by their Y-STR diversity.
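For haploid markers with a single allele per locus, Bruvo's distance reduces to a geometric function of the difference in repeat number, averaged over loci. The sketch below assumes that simplified case (no multi-copy loci, no missing data); the function names and the toy haplotypes are hypothetical.

```python
def bruvo_locus_distance(repeats_a: int, repeats_b: int) -> float:
    """Geometric distance 1 - 2^(-|x|), where x is the difference in repeat counts."""
    return 1.0 - 2.0 ** (-abs(repeats_a - repeats_b))


def bruvo_distance(haplotype_a: list[int], haplotype_b: list[int]) -> float:
    """Average the per-locus distance over all Y-STR loci (haploid, one allele each)."""
    if len(haplotype_a) != len(haplotype_b) or not haplotype_a:
        raise ValueError("haplotypes must cover the same, non-empty set of loci")
    per_locus = [bruvo_locus_distance(a, b) for a, b in zip(haplotype_a, haplotype_b)]
    return sum(per_locus) / len(per_locus)


# Toy haplotypes: repeat counts at two loci, differing by one repeat at the first.
example = bruvo_distance([10, 14], [11, 14])
```

The result always lies in [0, 1], with 0 for identical haplotypes and values approaching 1 as repeat counts diverge.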
[Figure S3 image: dated Y chromosome trees in two panels (A, B), with one tip label per sampled individual, time axes in years ago, and a legend titled Sumba Y-STR Haplogroup.]
Fig. S3. Y chromosome phylogeny. The male lineages of sampled men from (A) Sumba and (B) Timor, colored according to their Y chromosome haplogroup.
Molecular dating. Previously dated haplogroups with confidence intervals were used to calibrate the trees. For the mtDNA phylogeny, three time points were used, as determined from whole mtDNA genome sequences by Fu et al. [5]. For the Y chromosome phylogeny, five time points were used, taken from Karafet et al. [3]. Molecular dating of both trees using these calibration points was performed using the chronos function in the R package ape [6].
Language data and tree. The linguistic data were organized and classified according to the principles of the traditional comparative method. As an independent check, the resulting language phylogenies for Sumba and Timor were compared against the language relationships given in Ethnologue [7]. The language phylogenies were then calibrated using penalized likelihood and a relaxed substitution rate model to estimate internal branching times [8]. The language tree for Sumba (Fig. S4A) was constructed by setting the root to 4,085 years before present, the date inferred by Xu et al. [9] for the arrival of Austronesian speakers on Sumba. The language tree for Timor (Fig. S4B) was constructed using two much less well supported priors: i) the Timorese Austronesian languages were set to split 5,000 years ago; and ii) the languages spoken on Timor were assumed to have a coalescent date no earlier than 10,000 years ago. Note that the depth of the Austronesian/non-Austronesian split is not known, and is purposely given an arbitrary date. The coalescence age of this external branch has little effect on the analyses.
[Figure S4 image: dated language trees with time scales in thousands of years ago (kya). (A) Sumba, with tips Bukambero, Wejewa, Wanokaka, Praibakul, Anakalang, Kambera, Rindi, Mamboro, Kodi, Lamboya and Loli. (B) Timor, with tips Betun, Tetun, Kemak, Dawanta and Bunak.]
Fig. S4. Language phylogenies for (A) Sumba and (B) Timor. • indicates calibration points.
Isolation with Migration (IM) model. An Isolation with Migration model was used to capture the impact of sex-biased movements on genetic diversity [10]. The IM model describes a single panmictic population of size $aN$ that splits into $n$ subpopulations of size $N$ at time $2N\tau$ generations in the past. Migration occurs at a rate $m$ between subpopulations. To assess the effects of postmarital residence practices on mtDNA and the Y chromosome, we distinguish between the migration rates of women ($m_{\text{female}}$, for mtDNA) and men ($m_{\text{male}}$, for the Y chromosome), and run the model separately for each group. Both runs have the same values of $N$, $n$, $a$, $\tau$ and mutation rate $\mu$, but can have different migration rates $m_{\text{female}}$ and $m_{\text{male}}$. For example, matrilocal kinship systems lead to greater migration of men than women, such that $m_{\text{male}} > m_{\text{female}}$. The converse is true for patrilocal systems. A cultural preference for endogamy is reflected by low migration rates for both $m_{\text{female}}$ and $m_{\text{male}}$.
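Using a scaled (haploid) migration rate $M = 2Nm$ and $\theta = 2N\mu$, with mutation rate $\mu$ in mutations per generation for the locus, the probability that the number of nucleotide differences between a pair of individuals $S_j$ is $k$ is given by

$$
P(S_j = k) = \sum_{r=1}^{2} A_{jr} \left( \frac{\lambda_r \theta^{k}}{(\lambda_r + \theta)^{k+1}} \left( 1 - e^{-\tau(\lambda_r + \theta)} \sum_{l=0}^{k} \frac{(\lambda_r + \theta)^{l} \tau^{l}}{l!} \right) + e^{-\tau(\lambda_r + \theta)} \frac{(a\theta)^{k}}{(1 + a\theta)^{k+1}} \sum_{l=0}^{k} \frac{\left( \frac{1}{a} + \theta \right)^{l} \tau^{l}}{l!} \right) \tag{1}
$$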
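where $j = 0$ and $j = 1$ correspond to the cases of two samples being from the same or different villages, respectively. The remaining sub-equations are defined as

$$
\lambda_1 = \frac{nM + n - 1 - \sqrt{D}}{2(n - 1)}, \qquad \lambda_2 = \frac{nM + n - 1 + \sqrt{D}}{2(n - 1)}, \tag{2}
$$

$$
D = (nM + n - 1)^{2} - 4(n - 1)M, \tag{3}
$$

$$
A_{01} = \frac{\lambda_2 - 1}{\lambda_2 - \lambda_1}, \qquad A_{02} = \frac{1 - \lambda_1}{\lambda_2 - \lambda_1}, \qquad A_{11} = \frac{\lambda_2}{\lambda_2 - \lambda_1}, \qquad A_{12} = \frac{-\lambda_1}{\lambda_2 - \lambda_1}. \tag{4}
$$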
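Equations 1 to 4 can be evaluated numerically with a few lines of code. The sketch below is an illustration only, not the authors' implementation: the function name is hypothetical, the truncated sums are accumulated term by term to avoid overflow, and the example values (N = 200, n = 100, τ = 5.0, a = 2.0, µ = 0.002214, and an arbitrary m = 0.01) are taken from figures quoted elsewhere in this document purely for demonstration.

```python
import math


def im_pairwise_pmf(k, j, n, M, theta, tau, a):
    """P(S_j = k) from Eq. 1; j = 0 for the same village, j = 1 for different villages."""
    # Eq. 3 and Eq. 2: discriminant D and the two rates lambda_1, lambda_2.
    base = n * M + n - 1
    disc = base ** 2 - 4 * (n - 1) * M
    lam1 = (base - math.sqrt(disc)) / (2 * (n - 1))
    lam2 = (base + math.sqrt(disc)) / (2 * (n - 1))
    # Eq. 4: weights A_j1 and A_j2 (they sum to 1 for each j).
    if j == 0:
        weights = ((lam2 - 1) / (lam2 - lam1), (1 - lam1) / (lam2 - lam1))
    else:
        weights = (lam2 / (lam2 - lam1), -lam1 / (lam2 - lam1))

    def truncated_exp_sum(x):
        # sum_{l=0}^{k} x^l / l!, built iteratively.
        term, total = 1.0, 1.0
        for l in range(1, k + 1):
            term *= x / l
            total += term
        return total

    prob = 0.0
    for weight, lam in zip(weights, (lam1, lam2)):
        c = lam + theta
        decay = math.exp(-tau * c)
        # First term of Eq. 1: coalescence more recent than the split.
        recent = (lam / c) * (theta / c) ** k * (1 - decay * truncated_exp_sum(c * tau))
        # Second term of Eq. 1: coalescence in the ancestral population of size aN.
        ratio = a * theta / (1 + a * theta)
        ancestral = decay * ratio ** k / (1 + a * theta) * truncated_exp_sum((1 / a + theta) * tau)
        prob += weight * (recent + ancestral)
    return prob


# Illustrative parameter values (assumptions for this example only).
N, n, tau, a = 200, 100, 5.0, 2.0
theta = 2 * N * 0.002214
M = 2 * N * 0.01
within = [im_pairwise_pmf(k, 0, n, M, theta, tau, a) for k in range(200)]
between = [im_pairwise_pmf(k, 1, n, M, theta, tau, a) for k in range(200)]
```

A quick sanity check is that each distribution sums to one over k, and that pairs from the same village are more likely to show zero differences than pairs from different villages.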
This model can be used to predict the distribution of genetic differences in mtDNA and in the Y chromosome given sex-biased migration. To do so, we calculate the equations twice, applying a migration rate of $M = M_{\text{male}} = 2Nm_{\text{male}}$ to explore Y chromosome differences and $M = M_{\text{female}} = 2Nm_{\text{female}}$ to explore mtDNA differences, while keeping other parameters constant. We take this approach when exploring the phase space of the model (see Fig. S5).
It is important to be aware that such a model is a considerable simplification of the complex migration patterns generated by kinship systems. For instance, it assumes that i) movements of mtDNA and Y chromosomes are independent; ii) the effective population size $N$ and demographic history, as described by $n$, $\tau$ and $a$, of the mtDNA and Y chromosome are the same (in practice, $N$ is often lower for men due to their higher reproductive variance); and iii) standard coalescent assumptions hold, such as a relatively small sample size compared to the total population size, and exchangeability among sampled individuals. A further important assumption is that the measurement of pairwise differences between two individuals in a sample is independent of the number of pairwise differences between other individuals in the sample. This final point is relevant to our Approximate Bayesian Computation (ABC) modeling especially, and is elaborated on below.
Despite these limitations, this non-equilibrium model is complex enough to incorporate many demographic events that are expected to be reflected in the data. It closely fits the distribution of pairwise differences for distant relatives observed in the Sumba and Timor mtDNA and Y chromosome data sets (see main text Fig. 3), as well as differences between these loci at smaller genetic distances. We are especially interested in the quality of fitting at smaller genetic distances because these better reflect the impact of kinship systems over recent genetic history. While we do not suggest that the fitted parameters correspond directly to real-world properties of these populations, they do capture patterns that provide a qualitative indication of whether matrilocality or patrilocality is a likely cause of the observed data.
Parameter space of the IM model within and between villages. The mtDNA and Y chromosome pairwise distances within and between subpopulations are shown in Fig. S5, with the other parameters set to $N = 200$, $n = 100$, $\tau = 5.0$ and $a = 2.0$, and $m_{\text{male}}$ and $m_{\text{female}}$ ranging between 0 and 0.05. The Kullback-Leibler divergence for between-village comparisons (Fig. S5B) is almost the same for all migration rate values. Differences in the quality of the model fit, as quantified by the Kullback-Leibler divergence, only become apparent when looking at within-village comparisons (Fig. S5A). In other words, kinship systems are only easy to observe through differences in mtDNA and Y chromosome pairwise distances within subpopulations. This is an important future consideration when designing methods to detect the signature of kinship practices, and is leveraged in our ABC method.
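The symmetric divergence used to compare the two distance distributions, KL(Y|mtDNA) + KL(mtDNA|Y), can be sketched as below. The helper names are hypothetical, and the small pseudocount that keeps empty bins from producing infinite values is an assumption of this sketch; the text does not state how zero-probability bins were handled.

```python
import math
from collections import Counter


def to_distribution(distances, max_k, pseudocount=1e-9):
    """Turn integer pairwise distances into a probability vector over 0..max_k."""
    counts = Counter(distances)
    # Assumption: a tiny pseudocount avoids log(0); distances above max_k are ignored.
    raw = [counts.get(k, 0) + pseudocount for k in range(max_k + 1)]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p | q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Sum of the divergences in both directions."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy two-bin example.
p = [0.5, 0.5]
q = [0.9, 0.1]
value = symmetric_kl(p, q)
```

Because the two directions are summed, the measure is symmetric in its arguments and is zero only when the distributions coincide.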
Fitting the IM model using Approximate Bayesian Computation (ABC). ABC is a method to assess how strongly the observed data support different parameter values in a model. The pairwise Bruvo distances [4] of the Y chromosome (values in the range [0, 1]) were first scaled such that the average observed distance was the same as that of the mtDNA pairwise differences. Given that the time depths of the Y chromosome and mtDNA trees are broadly similar for our samples, we consider this operation equivalent to ‘scaling’ the mutation rate of the Y chromosome and mtDNA such that, for the purposes of modeling, they can be considered equal ($\mu$ = per site mutation rate of $1.67 \times 10^{-7}$ × generation time of 25 years × 540 mtDNA sites = 0.002214). The distributions of mtDNA pairwise distances and scaled Y chromosome Bruvo’s distances are similar. After obtaining the observed distributions, we proceeded with the ABC iterations.
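The rescaling step and the mutation-rate arithmetic can be illustrated as follows. The function name and the toy distance lists are hypothetical. Note that the three factors as quoted multiply to roughly 0.002255; the quoted value of 0.002214 would correspond to a per-site rate of about 1.64 × 10⁻⁷, and the text does not say which figure was used.

```python
def scale_to_mean(values, target_mean):
    """Multiply all values by one factor so that their mean equals target_mean."""
    current_mean = sum(values) / len(values)
    if current_mean == 0:
        raise ValueError("cannot rescale a set of distances whose mean is zero")
    factor = target_mean / current_mean
    return [value * factor for value in values]


# Toy data (made up): Bruvo distances in [0, 1] and mtDNA pairwise differences.
bruvo = [0.1, 0.4, 0.7]
mtdna = [2, 5, 8, 9]
scaled_bruvo = scale_to_mean(bruvo, sum(mtdna) / len(mtdna))

# Per-locus mutation rate per generation, using the factors as quoted in the text.
per_site_rate_per_year = 1.67e-7
generation_time_years = 25
n_sites = 540
mu = per_site_rate_per_year * generation_time_years * n_sites
```

A single multiplicative factor preserves the shape of the Y chromosome distance distribution while matching its mean to that of the mtDNA differences.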
We use full coalescent simulations in our ABC model fitting, rather than simply sampling from the theoretical distribution of pairwise distances (Eq. 1). This is slower but allows us to properly account for correlations in pairwise distances between samples (e.g., see [11]). The coalescent simulations were performed using the coalescent program msms [12] and implemented so as to replicate the IM model explored in Wilkinson-Herbots (2008) and described above.
Fig. S5. Migration rate phase space. (A) Kullback-Leibler divergence of pairwise distance distributions (KL(Y|mtDNA) + KL(mtDNA|Y)) within subpopulations. (B) Kullback-Leibler divergence of pairwise distance distributions between subpopulations. (C) Example plots of mtDNA (blue) and Y chromosome (red) pairwise distance distributions within (solid lines) and between (dashed lines) subpopulations, focusing on the low-difference regime.
Fig. S6. Posterior distributions of IM model fitting for Sumba data, based on 300 best-fitting parameter values from three million samples of the prior distributions.
Fig. S7. Posterior distributions of IM model fitting for Timor data, based on 300 best-fitting parameter values from three million samples of the prior distributions.
In the simple form of ABC used, an iteration consists of i) proposing a combination of parameter values according to the parameter prior distributions; ii) using msms to simulate samples given these parameter values, both for the female (with migration rate $m_{\text{female}}$) and male (with migration rate $m_{\text{male}}$) lineages; iii) calculating the average pairwise distance between samples, using only within-village comparisons (which are more informative regarding locality, see Fig. S5); and iv) calculating the distance metric (Kullback-Leibler divergence) by summing the KL-divergence of the simulated and observed mtDNA and Y chromosome distance distributions. For $N$, $n$ and the migration rates we used uniform priors with bounds $70 \le N \le 80$, $15 \le n \le 300$, $0 \le m_{\text{male}} \le 0.25$, $0 \le m_{\text{female}} \le 0.25$. For $\tau$ and $a$, we used lognormal distributions with mean 1 and standard deviation 1.25, allowing us to sample large values of the derived parameters $T_0 = 2.0 \times 25 \times N \times \tau$ and $N_{\text{ancestral}} = a \times N$ while weighting sampling toward more plausible smaller parameter values. The number of individuals sampled from each village in the model was the same as the number of individuals in our data. This procedure was iterated three million times, keeping the parameters that generate a model output with the lowest distance from the data as being representative of the posterior distributions. The ABC posterior distributions of the IM model used for main text Fig. 3 are shown in Figs. S6 and S7. The prior distributions of $T_0$ and ancestral $N$ are calculated from the uniform and lognormal priors of the other parameters.
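The iteration described above is a rejection-style ABC loop, which can be sketched generically as follows. This is not the authors' code: the simulator is passed in as a function (a toy stand-in replaces msms here), all names are hypothetical, the lognormal "mean 1, standard deviation 1.25" is read as the log-scale parameters, and n is drawn as an integer. Each of these is an assumption of the sketch.

```python
import random


def sample_prior(rng):
    """Draw one parameter set from the priors quoted in the text."""
    return {
        "N": rng.uniform(70, 80),
        "n": rng.randint(15, 300),  # assumption: integer number of subpopulations
        "m_male": rng.uniform(0.0, 0.25),
        "m_female": rng.uniform(0.0, 0.25),
        # Assumption: mean 1 and sd 1.25 are the parameters on the log scale.
        "tau": rng.lognormvariate(1.0, 1.25),
        "a": rng.lognormvariate(1.0, 1.25),
    }


def derived_parameters(params):
    """Derived quantities T0 = 2.0 * 25 * N * tau and N_ancestral = a * N."""
    return {
        "T0": 2.0 * 25 * params["N"] * params["tau"],
        "N_ancestral": params["a"] * params["N"],
    }


def rejection_abc(simulate, distance, observed, n_iter, n_keep, seed=0):
    """Keep the n_keep prior draws whose simulated output is closest to the data."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_iter):
        params = sample_prior(rng)
        simulated = simulate(params, rng)
        scored.append((distance(simulated, observed), params))
    scored.sort(key=lambda pair: pair[0])
    return [params for _, params in scored[:n_keep]]


# Toy stand-ins for the coalescent simulator and the KL-based distance.
def toy_simulate(params, rng):
    return params["m_male"] - params["m_female"]


def toy_distance(simulated, observed):
    return abs(simulated - observed)


posterior = rejection_abc(toy_simulate, toy_distance, observed=0.1, n_iter=2000, n_keep=20)
```

In the procedure described in the text, the simulator would return within-village distance distributions for both loci, the distance would be the summed KL-divergence, and the 300 best draws out of three million would be retained.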
Kinship and language transmission in matrilocal Timor. In Fig. 4 of the main text, we show the genetic distances of the villages in Timor, including the 9 matrilocal villages and 2 patrilocal villages. In Fig. S8, only the genetic distances of the matrilocal villages on Timor are shown. The changes are minor when compared to Fig. 4B, D: the ‘p’ cluster does not disappear, and considering only matrilocal villages results in only a slight difference in the kernel density of pairwise genetic distances. We interpret this result as reflecting the relatively small number of pairwise patrilocal comparisons within the total sample. The presence of this weak ‘p’ cluster may be related to the high proportion of effectively ambilocal marriages in the closely integrated villages of the centuries-old Wehali village cluster (see Therik [13], but also noted in our informal surveys) with weak representation of Y-chromosome-specific drift in a minority of samples.
[Figure S8 image: kernel density plots of Y distance (i, j) against mtDNA distance (i, j), both axes from 0 to 15. (A) Timor matrilocal, all pairs. (B) Timor matrilocal pairs with same language, with clusters labelled p, m and an.]
Fig. S8. Genetic distances in matrilocal villages of Timor, between all pairs of individuals (A) and only individuals who speak a common language (B). Considering only matrilocal villages in Timor does not significantly change pairwise genetic distances, and still shows matrilocality (m), ambi- or neolocality (an), with a small patrilocal cluster (p) related to a high proportion of effectively ambilocal marriages.
Analysis of linguistic and genetic distances. In Fig. S9, we compare genetic distances along both matrilines and patrilines with linguistic distances, $1 - d_l(i, j)$, defined as

$$
1 - d_l(i, j) = 1 - \cos(l_i, l_j), \tag{5}
$$

where $l_i$ is the language vector of individual $i$ and $l_j$ is the language vector of individual $j$. A pair of individuals who speak the same set of languages will have $1 - d_l(i, j) = 0$. If they share at least one language, but one or both of them speak other languages as well, then $0 < 1 - d_l(i, j) < 1$. If they do not speak any language in common, $1 - d_l(i, j) = 1$.
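Equation 5 can be computed directly from binary language vectors. The sketch below assumes each individual is represented by a 0/1 vector over the same ordered list of languages; the function name and the example vectors are hypothetical.

```python
import math


def language_distance(l_i, l_j):
    """Return 1 - cos(l_i, l_j) for two language vectors of equal length."""
    dot = sum(x * y for x, y in zip(l_i, l_j))
    norm_i = math.sqrt(sum(x * x for x in l_i))
    norm_j = math.sqrt(sum(y * y for y in l_j))
    if norm_i == 0 or norm_j == 0:
        raise ValueError("each individual must speak at least one language")
    return 1.0 - dot / (norm_i * norm_j)


# Three illustrative cases over a hypothetical three-language inventory.
same_set = language_distance([1, 1, 0], [1, 1, 0])
partial_overlap = language_distance([1, 0, 0], [1, 1, 0])
no_overlap = language_distance([1, 0, 0], [0, 1, 0])
```

The three cases reproduce the behaviour described above: identical language sets give a distance of 0 (up to floating-point rounding), partial overlap gives a value strictly between 0 and 1, and disjoint sets give 1.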
On Sumba, we see the signal of monolinguality as pairs of individuals that either share all of their languages or do not share a language at all (Fig. S9A-B). The few pairs of individuals that speak the same language form a cluster with an mtDNA genetic distance of ~8 (Fig. S9A). On the Y chromosome (Fig. S9B), pairs of individuals that speak the same languages are either closely related (~0) or somewhat distantly related (~5–10).
Multilinguality is evident in Timor (Fig. S9C-D), where there are pairs of individuals that share some, but not all, of their languages. Most individuals speak the same languages (for instance, nearly everyone speaks Upper Tetun) and they are closely related on their mtDNA (Fig. S9C), but have only distant relatedness on the Y chromosome (Fig. S9D).
Co-phylogeny of genes and languages. The co-phylogeny method has often been applied in functional ecology to find links between species traits and environmental variables, thus explaining observed biological processes in ecosystems [14]. More recently, similar methods have been applied in the context of host-parasite co-evolution [15] and cultural traits [16]. Languages are analogous to parasites carried by their human speakers, whose genes evolve in a parallel process. Language traits evolve as the environmental and genetic conditions of the speech communities evolve. Using the co-phylogeny framework, we can test the significance of a global hypothesis of co-evolution between genes and language [15].
[Figure S9: kernel density panels of language distance (i, j), from 0 to 1, against genetic distance (i, j), from 0 to 15. Panels: (A) Sumba mtDNA, (B) Sumba Y chromosome, (C) Timor mtDNA, (D) Timor Y chromosome.]
Fig. S9. Linguistic and genetic distances. The kernel density of the linguistic and genetic distances is shown for all pairs of individuals. (A) On Sumba, where villages are monolingual, only a few pairs with mtDNA distance ~8 speak the same language. (B) For the Y chromosome on Sumba, pairs of individuals who speak the same language form two clusters of close (~0) and slightly distant (~5–10) relatedness. (C) On Timor, there is a high probability of finding pairs of individuals that are closely related both on their mtDNA and in terms of the languages they speak. (D) Most pairs of individuals on Timor have a distant degree of relatedness on their Y chromosome, but speak a similar set of languages.
Co-phylogenetic analysis requires the specification of a binary association link matrix that indicates which of the locally spoken languages each sampled individual speaks (e.g., for Timor: Bunak, Betun, Dawanta, Kemak and/or Upper Tetun; for Sumba: languages spoken in each monolingual village). On Timor, language fluency was assessed for each participant on a three-point ordinal scale (‘Understand’, ‘Understand some’, ‘Do not understand’) during a survey administered to each participant. The first two categories were then coded as a positive indication of ability with the corresponding language for the purposes of the cophylogenetic analysis.
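The coding step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the participant identifiers and survey responses are invented, the function name is hypothetical, and the matrix is oriented with languages as rows and individuals as columns to match the layout of LG in Fig. S10.

```python
# Responses on the three-point scale that count as a positive link.
POSITIVE = {"Understand", "Understand some"}

def build_link_matrix(records, languages, participants):
    """Return a languages x participants binary matrix (list of lists)."""
    row = {lang: r for r, lang in enumerate(languages)}
    col = {pid: c for c, pid in enumerate(participants)}
    matrix = [[0] * len(participants) for _ in languages]
    for pid, lang, response in records:
        if response in POSITIVE:
            matrix[row[lang]][col[pid]] = 1
    return matrix

languages = ["Bunak", "Betun", "Dawanta", "Kemak", "Upper Tetun"]
participants = ["p1", "p2"]  # hypothetical participant ids
records = [                  # invented survey answers, for illustration only
    ("p1", "Bunak", "Understand"),
    ("p1", "Kemak", "Understand some"),
    ("p1", "Betun", "Do not understand"),
    ("p2", "Betun", "Understand"),
]
LG = build_link_matrix(records, languages, participants)
```

Languages with no recorded response for a participant are left at 0, which is a choice of this sketch rather than something the text specifies.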
Statistical test for gene-language coevolution. To test whether a significant association exists between the evolution of genes and languages, we employ statistics that are functions of the genetic and language phylogenies, together with their association links (i.e., the list of which individuals speak which languages) [15]. In this statistical test, we examine the significance of congruence, if any, between the gene and language trees. If genes and languages occupy corresponding positions in both phylogenetic trees with a significant degree of congruence, we reject the global null hypothesis that their evolution has been independent.
ParaFit statistic. Evaluating the global hypothesis requires three pieces of information: i) the gene phylogeny G, ii) the language phylogeny L, and iii) the association between genes and languages LG.
The cophylogeny matrices L and G (Fig. S10) represent the principal coordinates of the linguistic and genetic distances of individuals along their respective trees. The association matrix LG is a binary link specifying the languages spoken by each individual. To calculate the congruence of the gene and language trees, we define the fourth-corner statistics matrix D [15] as
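\[ D = G \, LG^{\prime} \, L \qquad [6] \]
where the prime denotes the matrix transpose. From D, we define the global association statistic ParaFitGlobal, used to test the gene-language coevolution hypothesis, as the sum of squares of the elements $d_{ij}$ of D:
\[ \mathrm{ParaFitGlobal} = \operatorname{trace}(D^{\prime} D) = \sum_{i,j} d_{ij}^{2} \qquad [7] \]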
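As a worked illustration of Eqs. 6 and 7, the sketch below computes D and ParaFitGlobal in plain Python. The function names and the toy matrices are assumptions introduced here for illustration. The shapes assume that G is (gene coordinates × genes), LG is (languages × genes) and L is (languages × language coordinates), which makes the product G LG′ L conformable.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def parafit_global(G, LG, L):
    """Return (ParaFitGlobal, D) with D = G LG' L and
    ParaFitGlobal = trace(D'D), i.e. the sum of squared entries of D."""
    D = matmul(matmul(G, transpose(LG)), L)
    stat = sum(d * d for row in D for d in row)
    return stat, D

# Toy data (invented): one gene coordinate, three genes, two languages,
# one language coordinate.
G = [[1.0, 1.0, -2.0]]
LG = [[1, 1, 0],
      [0, 0, 1]]
L = [[0.5],
     [-0.5]]
stat, D = parafit_global(G, LG, L)  # D == [[2.0]], stat == 4.0
```

In the toy data the two genes with similar coordinates share a language and the outlying gene speaks the other one, so the single entry of D is non-zero.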
[Figure S10: schematic of the four matrices. L: principal coordinates of the language tree (axes: languages, principal coordinates). LG: language-gene association, binary data of who speaks what (axes: languages, genes). G: principal coordinates of the gene tree (axes: genes, principal coordinates). D: fourth-corner statistics (axes: gene tree principal coordinates, language tree principal coordinates).]
Fig. S10. Fourth-corner matrix. Statistics describing gene-language associations.
Permutation Model. To test whether the association between genes and languages is significant, we calculate ParaFitGlobal on the original data, as well as on the permuted data. Permutation is performed by shuffling the columns of LG as shown in Fig. S11. Randomizing the columns is akin to shuffling the gene labels in the link matrix, which removes the association, but preserves the extent and degree of multilinguality.
The strength of the observed association between genes and languages is then determined using Z scores. We take the distribution of ParaFitGlobal values over the permuted realizations of the data, $\{\mathrm{ParaFitGlobal}^{\mathrm{perm}}\}$, and calculate the number of standard deviations by which $\mathrm{ParaFitGlobal}^{\mathrm{data}}$ (i.e., the ParaFitGlobal value of the original data) lies from the mean of $\{\mathrm{ParaFitGlobal}^{\mathrm{perm}}\}$. The Z score is taken over 25 randomizations r, each with 999 permutations p of $\mathrm{ParaFitGlobal}^{\mathrm{perm}}$.
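\[ Z = \frac{\mathrm{ParaFitGlobal}^{\mathrm{data}} - \dfrac{\sum_{r,p} \mathrm{ParaFitGlobal}^{\mathrm{perm}}_{r,p}}{r + p}}{s\left(\{\mathrm{ParaFitGlobal}^{\mathrm{perm}}_{r,p}\}\right)} \qquad [8] \]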
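The permutation test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 25 randomizations and 999 permutations are simply pooled into one set of permuted values, the centre is the plain mean of those values (as the prose describes), and s is taken to be the sample standard deviation, which the text does not specify. The `toy_statistic` function is a placeholder for ParaFitGlobal so that the block runs on its own.

```python
import random
import statistics

def shuffle_columns(matrix, rng):
    """Return a copy of `matrix` with its columns in random order."""
    order = list(range(len(matrix[0])))
    rng.shuffle(order)
    return [[row[c] for c in order] for row in matrix]

def z_score(observed, permuted_values):
    """Standard deviations separating `observed` from the permuted mean."""
    mean = statistics.mean(permuted_values)
    sd = statistics.stdev(permuted_values)  # sample SD: an assumption
    return (observed - mean) / sd

def permutation_z(statistic, LG, n_randomizations=25, n_permutations=999,
                  seed=0):
    """Z score of statistic(LG) against column-shuffled copies of LG."""
    rng = random.Random(seed)
    observed = statistic(LG)
    permuted = [
        statistic(shuffle_columns(LG, rng))
        for _ in range(n_randomizations)
        for _ in range(n_permutations)
    ]
    return z_score(observed, permuted)

def toy_statistic(LG):
    """Placeholder statistic that depends on column order (NOT ParaFit)."""
    return sum((r + 1) * c * v
               for r, row in enumerate(LG)
               for c, v in enumerate(row))

LG = [[1, 1, 0, 0],
      [0, 0, 1, 1]]
z = permutation_z(toy_statistic, LG, n_randomizations=2, n_permutations=50)
```

Because whole columns are moved together, each individual keeps the same set of languages after shuffling; only the pairing between individuals and positions in the gene tree is broken.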
[Figure S11: the LG matrix (rows: languages; columns: genes g1, g2, g3, ..., gn) shown before and after its columns are shuffled (e.g., to g3, g1, gn, ..., g2).]
Fig. S11. Permutation model. Columns of the LG matrix are shuffled to remove associations, while preserving the number of multilingual individuals and the number of languages they speak.
Language switching on genetic trees. To estimate host switching probabilities, we assume that both gene and language trees are accurately reconstructed, and seek to generate a one-to-one mapping of branching points in the gene and language trees, thus describing how the languages were transmitted and ultimately leading to the current language(s) spoken at each leaf of the gene tree today. A plausible model shown in Fig. S12 predicts the languages spoken at each branch of the gene tree at generation t. This stochastic simulation is run forward in time over the trees using two rules. First, where the language tree branches into two daughter languages, all genetic clades that speak the ancestral language are randomly assigned to one of the daughter languages. Second, in each generation, there is a probability α that a given clade switches to a new language chosen from among the languages that exist in the population at that time. That is, a proportion of lineages α switches to a new language at each generation. These two rules provide a simple model of language diversification and host switching (Fig. S12) that is sufficient to reconstruct patterns of language sharing observed in the present.
[Figure S12: (A) gene tree with tips G1, G2 and G3; (B) language tree with tips L1, L2 and L3 and ancestral nodes $L^{*}_{12}$ and $L^{*}_{123}$; (C) the gene tree annotated with simulated languages. Time axis from t0 to t6.]
Fig. S12. Language switching in a gene tree conditioned on a language tree. Shown is (A) an unannotated gene tree, (B) a language tree and (C) an example of a simulated language-annotated gene tree. Language diversifies i) during language branching ($L^{*}_{123}$ splits into $L^{*}_{12}$ and L3 at t2), and ii) during host switching, as in the branches leading to G2 (language switch at t4) and G3 (language switch at t5).
This plausible stochastic model of language transmission along the branches of the gene tree was run forward in time, starting from the gene tree root (t0 in Fig. S12C). The following process was performed at each generation of 25 years: i) determine the genetic clades and languages at generation t (e.g., two genetic clades and languages $L^{*}_{12}$ and L3 at t2 in Fig. S12B-C); ii) if a language branches at generation t, each genetic branch that carried the ancestral language at t − 1 is randomly assigned to one of the two new languages (e.g., at t2 in Fig. S12C, $L^{*}_{12}$ is assigned to the first branch and L3 to the second branch); iii) each genetic branch then switches to a different language within the current pool of languages with probability α (e.g., genetic branch G2 switches to language L3 at t4).
This simulation process was repeated for many gene-language simulations across the range α ∈ [0, 100]%, where α = 0 indicates no language switching and α = 100% means all lineages switch their language every generation. The average Z for each α was then compared with the Z of the observed data to find the degree of language switching that best fits the data. Variants of this gene-language co-evolution model yield qualitatively similar results.
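The two rules applied in each generation can be sketched as follows. This is a minimal Python illustration, not the authors' simulation code: the function names and the dictionary representation (genetic lineage mapped to its current language) are assumptions made here, and α is expressed as a probability between 0 and 1 rather than as a percentage.

```python
import random

def apply_branching(state, parent, daughters, rng):
    """Rule 1: each lineage carrying `parent` is randomly assigned to one
    of the daughter languages; other lineages keep their language."""
    return {
        lineage: rng.choice(daughters) if lang == parent else lang
        for lineage, lang in state.items()
    }

def apply_switching(state, pool, alpha, rng):
    """Rule 2: with probability `alpha`, a lineage switches to a different
    language drawn uniformly from the current pool."""
    new_state = {}
    for lineage, lang in state.items():
        others = [candidate for candidate in pool if candidate != lang]
        if others and rng.random() < alpha:
            new_state[lineage] = rng.choice(others)
        else:
            new_state[lineage] = lang
    return new_state

# One generation in the spirit of Fig. S12: L*123 splits into L*12 and L3,
# then each lineage may switch with probability 0.05.
rng = random.Random(1)
state = {"G1": "L*123", "G2": "L*123", "G3": "L*123"}
state = apply_branching(state, "L*123", ["L*12", "L3"], rng)
state = apply_switching(state, ["L*12", "L3"], alpha=0.05, rng=rng)
```

Drawing the new language uniformly from the pool is an assumption of the sketch; the text only says that it is chosen from among the languages that exist at that time.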
Multilinguality in the model of language switching between genetic clades. The model of language switching between genetic clades as shown in Fig. S12 was repeated with simulations across a range of values of the language switching rate α. From these, we obtained a distribution of Z scores and identified the range of α values that yield Z scores similar to those seen in the observed data. This simulation model is sufficient to generate plausible patterns of language sharing in Sumba, where individuals are monolingual. However, multilinguality is common in Timor, and we therefore modified the model such that individuals have a list of languages. When a new language is introduced (i.e., an ancestral language branches into daughter languages), or language switching occurs (as determined by α), there is now a fixed probability β that the new language is appended to the language list rather than replacing the language list. We are primarily interested in fitting the language switch rate α, and so only a limited range of β values were explored. In main text Fig. 5, we take β = α. Other β values yield qualitatively similar results, with all values leading to Z score convergence at α < 5%.
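The append-or-replace rule governed by β can be sketched as below. The function name is hypothetical, and the handling of a language that is already in the list (the list is left unchanged when appending) is an assumption of this sketch, since the text does not cover that case.

```python
import random

def acquire_language(language_list, new_language, beta, rng):
    """With probability `beta`, append `new_language` to the list;
    otherwise replace the whole list with `new_language`."""
    if rng.random() < beta:
        if new_language in language_list:
            return list(language_list)  # assumption: no duplicate entries
        return list(language_list) + [new_language]
    return [new_language]

rng = random.Random(0)
appended = acquire_language(["Betun"], "Kemak", beta=1.0, rng=rng)
replaced = acquire_language(["Betun"], "Kemak", beta=0.0, rng=rng)
```

With β = 1 every acquisition adds to a lineage's repertoire, while β = 0 recovers the monolingual model used for Sumba.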
Alternative model of language switching between genetic clades. A plausible alternative stochastic model of language transmission along the branches of the gene tree run forward in time (starting from the root of the gene tree) involves the following process, which is performed at each generation t (with an interval of 25 years):
1. Set parameters – determine the genetic clades and languages at generation t;
2. Cospeciation – if a language branches at generation t, each genetic branch that carried the ancestral language at t − 1 is randomly assigned one of the two new daughter languages. Genetic branches associated with a language that does not branch inherit the language(s) spoken by that branch at time t − 1; and
3. Language switch – each genetic branch then switches to a different language within the current pool of languages with probability α.
In the case of multilinguality (as on Timor), every individual has a list of languages and the set of languages spoken by an individual is determined by these additional rules:
4. Parent language replaced by daughter language during cospeciation – whenever language branching occurs, for every genetic branch that spoke the ancestral language at t − 1, one of the daughter languages replaces the ancestral language in the current list of languages of the genetic branch; and
5. Additional language during switch – when a genetic branch switches to a language not in its language list at t − 1, it retains the other languages it already spoke at t − 1.
For this alternative model, language switching rates α converge to 1–5% per generation for all trees examined, as shown in Fig. S13. Under this alternative model, the ‘half life’ of a language on a lineage is 325–1,700 years.
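One way to relate the switching rate to the quoted half-life is sketched below. It assumes that switching is an independent event in each generation, so that a lineage keeps its language for n generations with probability (1 − α)^n, and it counts the whole generations for which at least half of lineages are expected to retain their language. Both the rounding down and the function name are assumptions made here; the text does not state how the range was derived, although this reading reproduces it for α = 5% and α = 1%.

```python
import math

GENERATION_YEARS = 25  # generation interval stated in the text

def half_life_years(alpha):
    """Years for which (1 - alpha)**n stays at or above one half, using
    whole generations n. `alpha` is a per-generation probability in (0, 1)."""
    generations = math.floor(math.log(0.5) / math.log(1.0 - alpha))
    return generations * GENERATION_YEARS

fast = half_life_years(0.05)  # 13 generations
slow = half_life_years(0.01)  # 68 generations
```

The faster rate corresponds to the lower end of the range and the slower rate to the upper end.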
The difference between this alternative model and the model used in main text Fig. 5 lies in steps 4 and 5. In the model used in the main text, the new language is appended with a probability β; in this alternative model, i) the chosen new daughter language always replaces the ancestral language in step 4; and ii) the language chosen during switching in step 5 is always appended to the language list of the genetic branch. That is, in the main model, β is allowed to vary, while in this alternative model, β is always 0 during language tree branching, and β is always 1 during language switching.
Fig. S13. Language switching rates under the alternative model. The Z score measure of association between gene and language phylogenies on Sumba and Timor is shown for different language switching rates α under the alternative model. All cases independently converge and abruptly lose gene-language associations, behaving similarly to randomized cases, when α ≈ 5%.
Probability of shared gene-language heritage. The probability that a pair of individuals (i, j) share a common language l given that they belong to the same genetic clade g at generation t is given by
\[ P(i_l, j_l \mid i_g(t), j_g(t)) = \frac{P(i_l, j_l \cap i_g(t), j_g(t))}{P(i_g(t), j_g(t))} = \frac{\sum_{g=1}^{c_t} \left( \sum_{i,\, j \neq i} \mathbf{1}_{\cos(l_{ig},\, l_{jg}) > 0} \right)}{\sum_{g=1}^{c_t} n_g (n_g - 1)/2}, \qquad [9] \]
where $c_t$ is the number of genetic clades in generation t (each generation lasting 25 years), $l_{ig}$ and $l_{jg}$ are the language vectors of individuals i and j belonging to g, $\mathbf{1}_{\cos(l_{ig},\, l_{jg}) > 0}$ is 1 if i and j share a language and 0 otherwise, and $n_g$ is the number of individuals in g.
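Eq. 9 can be illustrated with a short Python sketch. The function names and toy clades are assumptions introduced here. Each unordered pair within a clade is counted once in both the numerator and the denominator, so that the denominator equals the sum of n_g(n_g − 1)/2 over clades. For binary language vectors, a positive cosine similarity is equivalent to the two vectors having a 1 in the same position, which is what the helper checks.

```python
from itertools import combinations

def shares_language(l_i, l_j):
    """True when two binary language vectors overlap in at least one
    language (equivalent to cosine similarity > 0 for 0/1 vectors)."""
    return any(a and b for a, b in zip(l_i, l_j))

def prob_shared_language(clades):
    """`clades` is a list of genetic clades at generation t; each clade is
    a list of binary language vectors, one per individual."""
    shared = 0
    pairs = 0
    for members in clades:
        for l_i, l_j in combinations(members, 2):
            pairs += 1
            shared += shares_language(l_i, l_j)
    return shared / pairs if pairs else float("nan")

# Toy example (invented): two languages, two clades.
clades = [
    [[1, 0], [1, 1], [0, 1]],  # 3 pairs, 2 of which share a language
    [[1, 0], [0, 1]],          # 1 pair, no shared language
]
p = prob_shared_language(clades)  # 2 / 4 = 0.5
```

Pairs in different clades never enter the calculation, which is what conditions the probability on shared clade membership.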
Probability of shared gene-language heritage in idealized scenarios. In Fig. 2 of the main text, we show $P(i_l, j_l \mid i_g(t), j_g(t))$ for the randomly permuted case. Here, we present alternative hypothesis modeling by estimating language distributions using the language switching model (shown in Fig. S12) for two idealized scenarios:
1. The language-switching rate is very high (α = 1), such that every lineage has the opportunity to switch its language every generation; and
2. No language-switching occurs (α = 0), such that language sharing is purely generated by divergence in the language tree.
Considering mtDNA, the α = 0 scenario is consistent with rigid matrilocality, while the α = 1 scenario is consistent with (very) frequent female movement, as might accompany patrilocality. The converse is true for the Y chromosome, with α = 0 corresponding to rigid patrilocality. Running these scenarios for our genetic and language trees (both mtDNA and the Y chromosome) allows us to make language sharing predictions in the case of extremely sex-biased migrations.
For patrilocal Sumba, the observed mtDNA pattern is relatively close to the patrilocal model prediction throughout the time period explored, and clearly departs from the matrilocal model. The observed Y chromosome pattern is very distant from the matrilocal prediction and generally approaches the idealized patrilocal model, until about 20 kya. These results support our primary conclusions; not only is $P(i_l, j_l \mid i_g, j_g)$ greater overall for the Y chromosome in the patrilocal Sumba data, but this even begins to approach the predictions of an idealized, very strict patrilocal model.
[Figure S14: four panels of $P(i_l, j_l \mid i_g, j_g)$ against years ago (0 to 70k): (A) Sumba mtDNA, (B) Sumba Y, (C) Timor mtDNA, (D) Timor Y. Legend: data, random, matrilocal, patrilocal.]
Fig. S14. Language sharing in the mtDNA and Y phylogenies of (A-B) Sumba and (C-D) Timor. The probability of sharing a language l is shown, given that both individuals of a pair are in the same genetic clade g at a given time in the past. Solid blue lines represent the observed metric, with shaded blue bands indicating the result of random permutations of the linguistic data. Idealized scenarios are shown for strict matrilocality (yellow) and patrilocality (violet). Probabilities in the Sumba dataset approach the strict patrilocal case for mtDNA (all times) and the Y chromosome (until ~20 kya). In Timor, matrilocal probabilities are higher than chance for mtDNA, and almost never greater than chance for the Y chromosome.
The situation is more complicated for Timor because most individuals are multilingual. The higher prevalence of multilinguality in central Timor that we observed probably reflects recent history (notably forced migrations over the last couple of generations). In central Timor, there have been population movements caused by warfare over the last century, potentially contributing to the mixture of languages.
To capture the idealized matrilocal and patrilocal patterns, we now need to additionally consider the implications of such rigid mating systems on multilinguality. To illustrate one method of calculating $P(i_l, j_l \mid i_g, j_g)$ while taking multilinguality into account, we ran the same model used for Sumba, but with an additional probability parameter β that the new language is appended to the language list rather than replacing it (see the ‘Multilinguality in the model of language switching between genetic clades’ section above). For both the strict matrilocality and patrilocality cases, we used β = 0.2 when language switching occurs. When a new language is introduced (i.e., language tree branching), we used β = 0.7 for mtDNA and β = 0.96 for the Y chromosome. These β values were chosen as they represent the transition points below which the resulting language distribution and shared probabilities closely fit the Timor data.
Note that we are in no way claiming that this method is necessarily the right way to handle multilinguality, and instead present this analysis as an exploratory study only. Multilinguality is an extremely complex question and largely beyond the scope of this manuscript. Nevertheless, these analyses clarify how the $P(i_l, j_l \mid i_g, j_g)$ findings for Timor relate to alternative idealized scenarios. Here, we find that for mtDNA, the pattern on Timor is greater than random chance, analogous to rigid matrilocality. For the Y chromosome, the pattern on Timor is almost never greater than random chance, similar to strict matrilocality, which is also always below random chance.
1. Van Oven M, Kayser M (2009) Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Human Mutation 30(2).
2. ISOGG (2017) International Society of Genetic Genealogy (http://www.isogg.org/tree, accessed 10-24-2016).
3. Karafet TM et al. (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Research pp. 830–838.
4. Bruvo R, Michiel NK, D’Souza TG, Schulenburg H (2004) A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology 13(7):2101–2106.
5. Fu Q et al. (2013) A revised timescale for human evolution based on ancient mitochondrial genomes. Current Biology 23(7):553–559.
6. Paradis E, Claude J, Strimmer K (2004) APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20(2):289.
7. Lewis PM, Simons GF, Fennig CD (2013) Ethnologue: Languages of the world.
8. Paradis E (2013) Molecular dating of phylogenies by likelihood methods: a comparison of models and a new information criterion. Molecular Phylogenetics and Evolution 67(2):436–444.
9. Xu S et al. (2012) Genetic dating indicates that the Asian–Papuan admixture through eastern Indonesia corresponds to the Austronesian expansion. Proceedings of the National Academy of Sciences 109(12):4574–4579.
10. Wilkinson-Herbots HM (2008) The distribution of the coalescence time and the number of pairwise nucleotide differences in the “Isolation with Migration” model. Theoretical Population Biology 73:277–288.
11. Slatkin M, Hudson RR (1991) Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129(2):555–562.
12. Ewing G, Hermisson J (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26(16):2064–2065.
13. Therik T (2004) Wehali – The Female Land: Traditions of a Timorese Ritual Centre. (Pandanus Books: ANU Research School of Pacific and Asian Studies, Canberra, Australia).
14. Legendre P, Desdevises Y, Bazin E, Page RDM (2002) A statistical test for host–parasite coevolution. Systematic Biology 51:217–234.
15. Dray S, Legendre P (2008) Testing the species traits–environment relationships: The fourth-corner problem revisited. Ecology 89:3400–3412.
16. Tehrani JJ, Collard M, Shennan SJ (2010) The cophylogeny of populations and cultures: reconstructing the evolution of Iranian tribal craft traditions using trees and jungles. Philosophical Transactions of the Royal Society B: Biological Sciences 365(1559):3865–3874.