Supplementary information A history of expansion and ... of 13 Supplementary information A history of expansion and anthropogenic collapse in a top marine predator of the Black Sea

1 of 13

Supplementary information

A history of expansion and anthropogenic collapse in a top marine

predator of the Black Sea estimated from genetic data.

Fontaine MC, Snirc A, Frantzis A, Koutrakis E, Öztürk B, Öztürk AA, Austerlitz F.

Table S1. Mt-DNA control region haplotype count for the harbor porpoises from the Black Sea and the Aegean Sea. S2

Table S2. Model specification, prior distributions for demographic parameters and locus-specific mutation model parameters. S3

Table S3. Model checking procedure. S4

Fig. S1. Median-joining network among mtDNA-CR haplotypes. The size of the circles is proportional to the total number of haplotypes observed. Sectors are proportional to the numbers of each haplotype observed in the Black Sea (white) and in the Aegean Sea (black). S5

Fig. S2 Mismatch distribution of the mitochondrial control region (mtDNA-CR) in the Black Sea and Aegean Sea harbor porpoises. The black line shows the observed frequencies of pairwise nucleotide differences and the dashed grey line shows the expected frequencies under a demographic expansion. S6

Fig. S3. Population structure estimated from STRUCTURE analysis. S7

Fig. S4. Composite parameters estimated from DIYABC. S8

Fig. S5. Marginal posterior probability densities for the mutation model parameters for the microsatellite loci and the mtDNA control region. S9

Fig. S6. Principal component analysis on the summary statistics when processing model checking for the 15 tested scenarios detailed in Fig. 2. S10

2 of 13

Table S1.

Mt-DNA control region haplotype count for the harbor porpoises from the Black Sea and the

Aegean Sea.

Haplotype code

Black Sea

Aegean Sea

Genbank accession n°

H1 14 5 JX105486 H2 0 1 JX105487 H3 2 1 JX105488 H4 0 1 JX105489 H5 1 1 JX105490 H6 0 1 JX105491 H7 1 0 JX105492 H8 1 0 JX105493 H9 6 0 JX105494

H10 1 0 JX105495 H11 2 0 JX105496 H12 3 0 JX105497 H13 1 0 JX105498 H14 1 0 JX105499 H15 2 0 JX105500 H16 1 0 JX105501 H17 3 0 JX105502 H18 1 0 JX105503 H19 1 0 JX105504 H20 1 0 JX105505 H21 1 0 JX105506 H22 1 0 JX105507 H23 1 0 JX105508 H24 1 0 JX105509 H25 1 0 JX105510 H26 1 0 JX105511 H27 1 0 JX105512 H28 1 0 JX105513 H29 1 0 JX105514 H30 1 0 JX105515 H31 1 0 JX105516 H32 1 0 JX105517 Total 54 10 64

The haplotype codes correspond to the code used in Fig. S1.

3 of 13

Table S2. Model specification, prior distributions for demographic parameters and locus-specific

mutation model parameters.

Priors for the demographic parameters N1 UN~[10, 1000] Na UN~[1000, 10000] Nb UN~[1, 100] Nc UN~[10, 1000] NBN UN~[1, 100] T1 UN~[100,1000] T2 UN~[1, 100] DE UN~[100, 1000] DB UN~[1, 10] Prior for the mutation model Autosomal microsatellites MEAN - !mic UN~[1 x 10-4, 1 x 10-3] GAM - !mic GA~[1 x 10-5, 1 x 10-2 , 2] MEAN – P UN~[0.10 , 0.30] GAM – P GA~[0.01, 0.9 , 2] MEAN - SNI LU~[1 x 10-8, 1 x 10-4] GAM – SNI GA~[1 x 10-9, 1 x 10-3, 2] Mt-DNA Control Region !seq UN~[1 x 10-7, 1 x 10-5] K1 UN~[0.05, 20.00] % invar. sites 82 Shape 0.9

Uniform distribution (UN) with 2 parameters: min and max; Gamma distribution (GA) with 3 parameters: min, max, shape; Log-Uniform (LU) distribution with 2 parameters: min and max. See Fig. 2 for the demographic parameters of each model tested. The mutation model parameters for the microsatellite loci were the mutation rate (!mic), the parameter determining the shape of the gamma distribution of individual loci mutation rate (P), and the Single Insertion Nucleotide rate (SNI). The Mt-DNA CR HKY mutation model had two variable parameters, the per site and generation mutation rate (!seq) and the transition/transversion ratio (K1) parameter, and two fixed parameters, the fraction of constant sites (% invar. sites), and the shape of the Gamma distribution of mutations among sites (Shape).

4 of 13

Table S3. Model checking procedure.

Summary statistics (S) A He V MGW NHA NSS MPD VPD D

Obs. S 6.9 0.49 4.36 0.64 32 32 2.20 1.87 -2.22

Prob. (Ssimul. < Sobs.)

SC1 0.842 0.195 0.644 0.037* 1.000*** 0.864 0.031* 0.018* 0.000*** SC2 0.591 0.025* 0.164 0.286 1.000*** 0.693 0.032* 0.016* 0.002** SC3 0.960* 0.730 0.940 0.016* 1.000*** 0.979* 0.168 0.121 0.001** SC4 0.772 0.148 0.555 0.051 1.000*** 0.936 0.107 0.075 0.002** SC5 0.663 0.173 0.261 0.284 1.000*** 0.771 0.104 0.078 0.001** SC6 0.952* 0.731 0.934 0.017* 1.000*** 0.986* 0.158 0.116 0.001** SC7 0.762 0.129 0.592 0.044* 1.000*** 0.779 0.044* 0.023* 0.001** SC8 0.855 0.277 0.740 0.030* 1.000*** 0.801 0.036* 0.017* 0.002** SC9 0.862 0.264 0.666 0.033* 0.996** 0.516 0.023* 0.010* 0.000*** SC10 0.998** 0.993** 0.972* 0.015* 1.000*** 1.000*** 0.855 0.768 0.009** SC11 0.806 0.106 0.5165 0.057 1.000*** 0.927 0.138 0.074 0.001** SC12 0.482 0.026* 0.207 0.167 1.000*** 0.406 0.042* 0.022* 0.003** SC13 1.000*** 0.997** 0.987* 0.027* 1.000*** 1.000*** 0.866 0.777 0.006** SC14 0.748 0.566 0.834 0.008** 0.867 0.681 0.175 0.149 0.018* SC15 0.737 0.639 0.891 0.016* 0.857 0.648 0.220 0.198 0.047*

Evolutionary scenario SC1 to SC15 are represented in Figure 2. The probability Prob. (Ssimul. < Sobs.) given for each summary statistics was computed from 1,000 virtual data sets simulated from the posterior distributions of parameters obtained under a given scenario. Corresponding tail-area probabilities (p-values) were obtained as Prob. (Ssimul. < Sobs.) and 1.0 - Prob. (Ssimul. < Sobs.) for Prob. (Ssimul. < Sobs.) ! 0.5 and > 0.5, respectively (1). These summary statistics were the ones used in the ABC analysis. For microsatellite loci, we used the mean number of alleles per locus (A), the mean expected heterozygosity (He), the mean allele size variance (V), and the mean MGW index across loci (2). For the mtDNA-CR, we used the number of haplotypes (NHA), the number of segregating sites (NSS), the mean pairwise differences (MPD) and their variance (VPD), and Tajima's D statistics. *, **, *** = tail-area probability < 0.05, < 0.01 and < 0.001, respectively.

1. Gelman A, Carlin JB, Stern HS, & Rubin DB (1995) Bayesian Data Analysis (Chapman & Hall,

London). 2. Garza JC & Williamson EG (2001) Detection of reduction in population size using data from

microsatellite loci. Mol Ecol 10:305-318.

5 of 13

Fig. S1. Median-joining network among mtDNA-CR haplotypes. Circle sizes are proportional

to the total number of haplotypes observed. Sectors are proportional to the numbers of each

haplotype observed in the Black Sea (white) and in the Aegean Sea (black).

6 of 13

Fig. S2. Mismatch distribution for the mitochondrial control region (mtDNA-CR) in the Black Sea and

Aegean Sea harbor porpoises. The black line shows the observed frequencies of pairwise differences

and the dashed grey line shows the expected pairwise frequencies under a demographic expansion.

7 of 13

Fig. S3. Population structure as estimated using STRUCTURE 2.3.3 (ref. 1). (A) The

probability of observing the data (X) under a given number of putative clusters (K) is shown

as a function of K, for two admixture models implemented in STRUCTURE: the standard

model and the Locprior model, the latter assuming a priori that porpoises from the Black Sea

(1) and the Aegean Sea (2) might come from two distinct populations (see ref. 1). The error

bars provide the standard deviation observed over 10 replicated runs for each value of K. The

individual admixture proportion estimated at K = 2 are shown for the standard model (B) and

for the Locprior model (C). Each individual is represented by a thin horizontal line divided

into two colored segments that represent its estimated admixture fraction in each cluster.

1. Hubisz MJ, Falush D, Stephens M, & Pritchard JK (2009) Inferring weak population structure with the

assistance of sample group information. Mol Ecol Res 9:1322-1332.

8 of 13

!"!"#$%&'(

#$%&'()

* + , - !

*.*

*.,

*.!

*./

*.0

!"!)#$%&'(

#$%&'()

* +* ,* -* !*

*.**

*.*,

*.*!

*.*/

*.*0

!"!'#$%&'(

#$%&'()

* + , - !

*.*

*.,

*.!

*./

*.0

+.*

!"$*+,

!"#$%&'

()((( ()((* ()((+ ()((, ()((- ()(.(

(.((

*((

/((

+((

0((

!)#$*+,

!"#$%&'

()(* ()(+ ()(, ()(- ().(

(0

.(.0

*(*0

/(

!'#$*+,

!"#$%&'

()((( ()((* ()((+ ()((, ()((- ()(.(

(.((

*((

/((

+((

0((

Fig. S4. Composite parameters estimated from DIYABC. Marginal posterior probability

density of the population genetic diversity estimated using " = 4.Ne.!micro for autosomal

microsatellite loci (left panels) and " = Ne.!seq for mtDNA-CR (right panels), where Ne is the

effective population size, and !micro and !seq are the microsatellite mutation rate and

mitochondrial mutation rate, respectively.

9 of 13

Microsatellate mutation rate

Den

sity

2e!04 4e!04 6e!04 8e!04 1e!03

050

010

0015

00

P parameter of the GSM

Den

sity

0.10 0.15 0.20 0.25 0.30

010

2030

4050

60

SNI rate

Den

sity

0e+00 2e!05 4e!05 6e!05 8e!05 1e!04

050

000

1000

0015

0000

MtDNA CR mutation rate

Den

sity

2e!06 4e!06 6e!06 8e!06 1e!05

050

000

1000

00

K1 parameter of the HKY mutation rate

Den

sity

0 5 10 15 20

0.00

0.02

0.04

Fig. S5. Marginal posterior probability densities for the mutation model parameters for the

microsatellite loci and the mtDNA control region. For the microsatellite loci, the General

Stepwise Mutation model included the mutation rate, the parameter P determining the shape

of the gamma distribution of individual loci mutation rate, and the rate of Single Nucleotide

Insertions (SNI). For the mtDNA control region, the HKY model included the mutation rate

and the K1 parameter determining the shape of the gamma distribution of the inter-sites

variation of the mutation rate. The dotted lines show the prior distributions.

10 of 13

Fig. S6. (Continued)

11 of 13

Fig. S6. (Continued)

12 of 13

Fig. S6. Principal component analysis on the summary statistics when processing model

checking for the 15 tested scenarios detailed in Fig. 2. PCA were processed on the summary

13 of 13

statistics used to discriminate among scenarios and compute the posterior distributions of

parameters. Principal components (PCs) were computed considering 15,000 data sets

simulated with parameter values drawn from the prior distributions of parameters obtained

under each of the 15 scenarios. Then the target (observed) data set as well as the 1,000 data

sets simulated from the posterior distributions of parameters were added to each plane of the

PCA. Only the planes defined by the PC1 - PC2 and PC2 - PC3 are provided on the left and

right panels, respectively. The scenarios labelled in red are those with the highest posterior

probability (see the text).

Documents

Supplementary information A history of expansion and ... of 13 Supplementary information A history of expansion and anthropogenic collapse in a top marine predator of the Black Sea