
Supporting Information - PNAS (Apr 28, 2009)


Supporting Information
Castoe et al. 10.1073/pnas.0900233106
www.pnas.org/cgi/content/short/0900233106

SI Methods

Mitochondrial Genome Sequencing and Annotation. The mitochondrial genomes of two new snake species were sequenced for this study: Tropidophis haetianus (voucher: Brigham Young University 48469; Hispaniola) and Anilius scytale (voucher: Christopher L. Parkinson; Brazil). Total DNA was isolated from tissue of both species using the Qiagen DNeasy extraction kit and protocol (Qiagen). Using the Expand Long Template PCR system (Roche Molecular Biochemicals), the mitochondrial genome was amplified in five overlapping long fragments (1). All primer details are available upon request. Cycling conditions followed the manufacturer's suggestions, with annealing temperatures between 50 °C and 55 °C, for 35 cycles. Positive PCR products were electrophoretically separated and excised from agarose gels, followed by purification using the GeneClean III kit (BIO101). Purified PCR products were cloned using the TopoXL cloning kit (Invitrogen). Plasmids containing the correct long PCR fragments were isolated and purified using QIAprep Spin Miniprep kits (Qiagen) and sequenced using M13 primers (flanking the cloning site in the Topo vectors), along with an array of internal primers. Automated sequence determination was carried out using either the CEQ Dye Terminator Cycle Sequencing Quick Start Kit (Beckman Coulter) run on a Beckman CEQ8000, or the ABI BIG-DYE cycle sequencing kit run on an ABI 3730 automated sequencer (Applied Biosystems).

Most tRNAs were detected using tRNAscan (2), followed by manual verification. The tRNAs not detected by tRNAscan were identified by their position in the genome and folded manually based on homology. The tRNAs were used to identify approximate boundaries of protein-coding genes, control regions, and ribosomal RNAs. Final boundaries of protein-coding genes were set based on the position of the most plausible first start and last stop codons in each region, including noncanonical signal codons known to operate in vertebrate mitochondrial genomes (3). Protein-coding genes were also translated to their amino acid sequences, and all amino acid and DNA sequences were compared to previously published snake genomes to verify annotation.

Nucleotide Model Selection. MrModeltest v.2.2 (http://www.abc.se/~nylander) was used to select appropriate models of nucleotide evolution for Bayesian phylogenetic analyses, and PAUP* v.4.0b10 (4) was used to calculate model likelihoods for use in MrModeltest. AIC (5, 6) was used to select best-fit models for partitions of the dataset in MrModeltest. The mitochondrial dataset (mtDNA) was partitioned by gene and codon position, yielding a total of 39 partitions, and each partition was run through MrModeltest to identify the best partition-specific nucleotide model. Like the mitochondrial dataset, the nuclear dataset (nucDNA) was partitioned by gene and codon position, and these 6 partitions were assigned appropriate independent models of evolution using MrModeltest.
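As a rough illustration of AIC-based model choice for a single partition, the following minimal sketch ranks candidate models by AIC = 2k − 2 ln L and converts the scores to Akaike weights. The log-likelihoods and parameter counts are hypothetical values chosen for illustration, not results from this study, and the function names are assumptions rather than part of MrModeltest.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_scores):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical (log-likelihood, free substitution-model parameters) per model
candidates = {
    "HKY+G": (-5230.4, 5),
    "GTR+G": (-5215.9, 9),
    "GTR+I+G": (-5215.1, 10),
}

scores = {m: aic(lnl, k) for m, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # model with the lowest AIC
weights = akaike_weights(scores)
```

In this toy case the extra parameter of GTR+I+G does not buy enough likelihood to offset its penalty, so GTR+G is selected.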

Bayesian Phylogenetic Analysis Settings. Phylogenies were estimated using Bayesian Metropolis-Hastings coupled Markov chain Monte Carlo (BMCMC) phylogenetic methods conducted in MrBayes 3.1 (7) with vague priors and three incrementally heated chains in addition to the cold chain (as per the program's defaults). Each MCMC analysis was conducted with the default settings, and with multiple independent MCMC runs per analysis, sampling the chain every 10,000 generations. Summary statistics and consensus phylograms with posterior support were estimated from the combination of post burn-in samples from parallel MCMC runs per analysis.

The best-fit models for each partition were implemented as partition-specific models within (partitioned) mixed-model analyses of the mtDNA data, nuclear gene data, and the combined mtDNA and nuclear gene dataset. These mixed-model BMCMC analyses were conducted using the "unlink" and "ratepr = variable" commands in MrBayes to allow partition-specific nucleotide models and rates. Depending on the mixing and convergence rates of different analyses, we used different numbers of independent runs and generations to assemble sufficiently large posterior distributions for the individual and combined data analyses. The mtDNA and the combined (mtDNA and nuclear gene) data were both analyzed using 6 independent MCMC runs, each conducted for 1 × 10^7 generations. The nuclear data analyses mixed much more rapidly (likely due to the simpler model and smaller dataset size); this dataset was analyzed using 4 independent MCMC runs, each conducted for 5 × 10^6 generations.

Assessment of Bayesian Phylogenetic Analysis Performance. We investigated the performance of models by examining cold chain likelihood and parameter estimate burn-in, as well as the shapes and overlap of posterior distributions of parameters. We looked for evidence that model likelihood and parameter estimates ascended directly and rapidly to a stable plateau, and that independent runs converged on similar likelihood and parameter posterior distributions (considered evidence that a model was not fit with too many parameters). We also used the potential scale reduction factor (PSRF) to verify that independent runs converged on estimates of phylogeny and parameters (7, 8). We considered runs to have converged when the PSRF of parameters fell below 1.02 (1 indicating 100% convergence of runs). In separate analyses not shown here, we also experimented with mixed models with fewer (or no) partitions, all of which suggested the superiority of the highly partitioned models described above based on Bayes factor, relative Bayes factor, and Akaike weight comparisons among variously partitioned BMCMC runs (e.g., ref. 9). Based on the PSRF criterion, we chose the conservative burn-in period of 3 × 10^6 generations because independent runs of all analyses had converged to PSRF < 1.02 by this point (most parameters had PSRFs < 1.002 by 3 × 10^6 generations).
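The PSRF check can be sketched as follows, using the standard Gelman and Rubin formulation (pooled variance estimate divided by mean within-run variance, square-rooted). This is a minimal sketch for a single scalar parameter; the simulated samples are hypothetical, and MrBayes may apply small corrections not reproduced here.

```python
import math
import random
import statistics

def psrf(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post burn-in samples,
    one list per independent run.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

# Hypothetical samples: four runs drawing from the same distribution ...
rng = random.Random(1)
converged = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]

# ... and the same runs with one run stuck in a different region.
stuck = [list(c) for c in converged]
stuck[0] = [v + 5.0 for v in stuck[0]]

psrf_converged = psrf(converged)
psrf_stuck = psrf(stuck)
```

Under the criterion described above, the first set of runs would be accepted (PSRF below 1.02) and the second rejected.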

Molecular Evolutionary Analyses and Hypothesis Testing. Maximum parsimony (MP), log-determinant distance methods, and maximum likelihood analyses conducted in PAUP* 4.0b10 were used primarily to evaluate evolutionary hypotheses. All MP phylogenetic analyses were conducted using PAUP* version 4.0b10 with all characters treated as equally weighted. Heuristic searches were used to find optimal trees, with TBR branch-swapping and 1,000 random-taxon-addition sequences, and bootstrap support was assessed using 1,000 full heuristic pseudoreplicates (each with 10 random-taxon-addition replicates).

Maximum likelihood (ML) analyses were used to test the significance of the likelihood score differences between alternative topologies using Shimodaira-Hasegawa (S-H) tests (10). S-H tests were conducted using the RELL option with 300 replicates per test. Maximum likelihood was also used to optimize branch lengths based on different portions of the mtDNA data (e.g., different codon positions, transversion substitutions). Site-specific likelihood values and site rate classes (per site) were also calculated via ML using the "lscores" command. A GTR+Γ model of nucleotide evolution was used for all ML analyses, and parameters were estimated separately for each analysis.

Modified Calculations for Bayesian Estimation of Convergence and Divergence in PAML. For our Bayesian approach, we modified the codeml program of PAML (11) to calculate the posterior probability of all possible amino acid substitutions along every branch in the phylogeny, while accounting for rate variation across sites. The probabilities of convergent and divergent substitutions were calculated as the sum of the joint probabilities of all possible pairs of substitutions that end in the same state (convergent) or in a different state (divergent), between the two branches in question.
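The pairing of substitutions described above can be sketched for a single site as follows. This is a minimal illustration, not the modified codeml code: it assumes the joint probability of a pair of substitutions is the product of the two per-branch posterior probabilities, and the three-state matrices are hypothetical values.

```python
def end_state_totals(sub_probs):
    """Posterior probability of a substitution ending in each state j.

    sub_probs[i][j] is the posterior probability of i -> j on a branch;
    diagonal entries (no change) are excluded.
    """
    n = len(sub_probs)
    return [sum(sub_probs[i][j] for i in range(n) if i != j) for j in range(n)]

def convergence_divergence(branch_a, branch_b):
    """Per-site probabilities of convergent and divergent substitution pairs."""
    ends_a = end_state_totals(branch_a)
    ends_b = end_state_totals(branch_b)
    # Pairs of substitutions that end in the same state
    convergent = sum(a * b for a, b in zip(ends_a, ends_b))
    # All pairs of substitutions, minus those ending in the same state
    divergent = sum(ends_a) * sum(ends_b) - convergent
    return convergent, divergent

# Hypothetical posterior matrices for one site with three states
branch_a = [[0.70, 0.20, 0.00],
            [0.00, 0.05, 0.00],
            [0.00, 0.05, 0.00]]
branch_b = [[0.10, 0.00, 0.10],
            [0.00, 0.30, 0.00],
            [0.00, 0.40, 0.10]]

conv, div = convergence_divergence(branch_a, branch_b)
```

Summing these per-site values across all sites would give the expected numbers of convergent and divergent substitutions for the branch pair.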

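Simplifying the notation slightly, the posterior probability for substitution i to j at a given site and a given node was defined as:

$$
P(i \to j \mid x) = \sum_{l} P(x_{\text{above}} \mid z = i) \times \frac{P(i \to j \mid t, y = l) \times P(x_{\text{below}} \mid z = j, y = l)}{\sum_{k} P(i \to k \mid t, y = l) \times P(x_{\text{below}} \mid z = k, y = l)} \times P(y = l \mid x),
$$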

where y is an indicator of the rate category defined by the discrete gamma distribution for rates-across-sites, x is the observed sequence data at the tips (for a given site), and z is the state at the current node. Each of x, y, and z is implicitly conditioned on the site and node under consideration. The probability terms involving the observed data x refer to the data at the tips either above or below the current node. The "below" terms are simply the conditional site likelihoods calculated during a postorder traversal (12), while the "above" terms are complementary values calculated in a secondary preorder traversal (see refs. 13 and 14 for discussion). We define the expected number of posterior substitutions as the sum of these probabilities across all sites for each substitution type.

Controls for Possible Biases Caused by Deep Divergence. Since both branches in the pair of greatest interest were quite deep and reflected a high level of divergence, we examined the relationship between excess convergence and divergence to assess the impact of possible artifacts. There was a significant positive correlation between excess convergence and divergence (r = 0.586, P < 1 × 10^−15; Fig. 3 A and B), which suggests that longer branches may have been subject to greater than expected amounts of convergent substitution than shorter branches. The observed excess convergence on the snake–agamid stem pair is far in excess of what is predicted by this correlation (Fig. 3 A and B). We then tested whether error in the posterior substitution probability distributions at deeper branches could explain the observed excess convergence. We found a significant relationship between divergence and the entropy of the posterior substitution probability matrices (r = 0.462, P < 1 × 10^−15), as well as a significant relationship between excess convergence and entropy (r = 0.225, P = 2.1 × 10^−11). The latter relationship, however, is entirely accounted for by the correlation between divergence and entropy, as demonstrated by partial correlation analysis (r = −0.064, P = 0.0615). These analyses therefore suggest that the excess convergence that occurred between the snake–agamid branch pair is not explainable by biases in posterior convergence estimates at longer branch lengths.
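The partial correlation reported above can be checked from the three pairwise coefficients. The sketch below assumes the standard first-order partial correlation formula for Pearson coefficients (the text does not state which formula was used); the function name is illustrative.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Pairwise correlations as reported in the text
r_conv_entropy = 0.225  # excess convergence vs. entropy
r_conv_div = 0.586      # excess convergence vs. divergence
r_entropy_div = 0.462   # entropy vs. divergence

r_partial = partial_correlation(r_conv_entropy, r_conv_div, r_entropy_div)
```

Under this assumption the result is approximately −0.064, which agrees with the reported partial correlation to three decimal places.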

Prediction of the Expected Convergence to Divergence Ratio. We further modified PAML to calculate the expected probabilities of convergent and divergent substitution, per site. We conditioned these calculations on the MLE parameter estimates, so that we could evaluate the expected posterior probabilities per site if the model were exactly correct. Under this assumption, the expected substitution probability (for a given site and node) reduces simply to:

$$
E_x\left[P(i \to j \mid x)\right] = \sum_{l} \pi_i \times P(i \to j \mid t, y = l) \times \frac{1}{k},
$$

where π_i is the equilibrium frequency of state i, and k is the number of rate categories. Since this is the per-site expectation, we can use these probabilities to calculate the expected number of convergent and divergent substitutions per site, and simply multiply these by the number of sites to yield the desired predictions.
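The expectation above can be sketched numerically as follows. To keep the example self-contained, it substitutes a toy equal-rates model with uniform equilibrium frequencies for the amino acid model actually fitted in PAML; the rate categories, branch lengths, and number of sites are hypothetical, and the two branches are assumed to be independent.

```python
import math

def transition_prob(i, j, t, rate, n_states):
    """P(i -> j | t, rate) under a toy equal-rates model (an assumption)."""
    decay = math.exp(-n_states / (n_states - 1) * rate * t)
    if i == j:
        return 1 / n_states + (n_states - 1) / n_states * decay
    return (1 - decay) / n_states

def expected_sub_probs(t, rates, freqs):
    """E[P(i -> j | x)] = sum over l of pi_i * P(i -> j | t, y = l) / k."""
    n, k = len(freqs), len(rates)
    return [[sum(freqs[i] * transition_prob(i, j, t, r, n) for r in rates) / k
             for j in range(n)] for i in range(n)]

def expected_conv_div(probs_a, probs_b):
    """Per-site expected convergent and divergent probabilities."""
    n = len(probs_a)
    ends_a = [sum(probs_a[i][j] for i in range(n) if i != j) for j in range(n)]
    ends_b = [sum(probs_b[i][j] for i in range(n) if i != j) for j in range(n)]
    conv = sum(a * b for a, b in zip(ends_a, ends_b))
    div = sum(ends_a) * sum(ends_b) - conv
    return conv, div

n_states = 20                           # amino acid states
freqs = [1 / n_states] * n_states       # uniform pi_i (toy assumption)
rates = [0.2, 0.6, 1.0, 2.2]            # hypothetical rate categories, k = 4
branch_a = expected_sub_probs(0.3, rates, freqs)
branch_b = expected_sub_probs(0.5, rates, freqs)

conv, div = expected_conv_div(branch_a, branch_b)
n_sites = 3000                          # hypothetical alignment length
expected_convergent = conv * n_sites
expected_divergent = div * n_sites
```

Under this toy model the convergent-to-divergent ratio is 1/(n_states − 1) regardless of branch length; a fitted model with unequal frequencies and exchangeabilities would give a different value.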

Model-Based Predictions of Convergence. To assess the expected degree of random homoplasy in the dataset if the model were exactly correct, we calculated the asymptotic expected ratio of convergent to divergent substitutions based on parameter estimates from the mitochondrial data (see Methods). This analysis suggested that a much smaller amount of convergence should have been expected in the entire dataset based on random chance alone, if the evolutionary model were correct (0.099 convergent substitutions per divergent substitution, or 1.7 times less than the observed ratio; Fig. 3B, blue line).

1. Jiang ZJ, et al. (2007) Comparative mitochondrial genomics of snakes: Extraordinary substitution rate dynamics and functionality of the duplicate control region. BMC Evol Biol 7:123.
2. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964.
3. Slack KE, Janke A, Penny D, Arnason U (2003) Two new avian mitochondrial genomes (penguin and goose) and a summary of bird and reptile mitogenomic features. Gene 302:43–52.
4. Swofford DL (2001) PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods) (Sinauer Associates, Sunderland, MA).
5. Akaike H (1973) Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory, eds Petrov BN, Csaki F (Akademiai Kiado, Budapest), pp 673–681.
6. Akaike H (1983) Information measures and model selection. Int Stat Inst 22:277–291.
7. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
8. Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences (with discussion). Stat Sci 7:457–472.
9. Castoe TA, Sasa MM, Parkinson CL (2005) Modelling nucleotide evolution at the mesoscale: The phylogeny of the neotropical pitvipers of the Porthidium group (Viperidae: Crotalinae). Mol Phylogenet Evol 37:881–898.
10. Shimodaira H, Hasegawa M (1999) Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16:1114–1116.
11. Yang ZH (1997) PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–556.
12. Felsenstein J (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17:368–376.
13. Schadt EE, Sinsheimer JS, Lange K (1998) Computational advances in maximum likelihood methods for molecular phylogeny. Genome Res 8:222–233.
14. Siepel A, Haussler D (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol 21:468–488.


Fig. S1. Bayesian phylogenetic estimates based on the mitochondrial (A) and nuclear (B) gene nucleotide datasets. All nodes shown are supported by 100% posterior probability unless noted. Nodes below 95% posterior probability were collapsed. The gray bars indicate members of the Agamidae and Iguanidae that form a clade in the nuclear tree (B) but are distant in the mitochondrial tree (A). The gray-filled circle indicates the only other strongly supported difference between the mitochondrial and nuclear gene trees.


Fig. S2. (A) The number of sites per 100-base segment that show an absolute difference in ΔSSLS between topologies > 0.1. The number of sites (per 100) supporting the NUC topology is shown in gray fill, and the number supporting the MT topology is shown in black fill. Results for (i) first positions, (ii) second positions, and (iii) third positions are shown separately. (B) The number of sites per 100-base segment that show an absolute difference in ΔSSLS between topologies > 0.5. The number of sites (per 100) supporting the NUC topology is shown in gray fill, and the number supporting the MT topology in black fill. Results for (i) first positions, (ii) second positions, and (iii) third positions are shown separately.


Fig. S2. Continued.


Fig. S3. Comparison of GC nucleotide content across squamate mitochondrial genomes used in this study. GC content for (A) third codon positions and (B) all codon positions of all 13 mitochondrial protein-coding genes is shown for all squamate species used. Snakes are shown in red, agamid lizards in black, and the remaining squamates in blue.


Fig. S4. Phylogenetic tree estimated by using log-determinant analysis of the mitochondrial nucleotide dataset.


Fig. S5. Plots of the site-specific log likelihood ratio (ΔSSLS) across the mitochondrial genome for transversion substitutions only for first (A), second (B), and third (C) codon positions. Positive values of ΔSSLS indicate higher relative support for the NUC (presumed correct) tree, whereas negative ΔSSLS values indicate greater support for the MT topology. Absolute ΔSSLS values > 0.5 are shown in red for emphasis, and the remaining values are shown in blue.
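The ΔSSLS summaries in Figs. S2 and S5 can be illustrated with a minimal sketch. It assumes ΔSSLS is the per-site log likelihood under the NUC tree minus that under the MT tree, which matches the sign convention in the caption; the site likelihoods, threshold, and window size below are hypothetical toy values.

```python
def delta_ssls(lnl_nuc, lnl_mt):
    """Per-site log likelihood difference (NUC tree minus MT tree)."""
    return [a - b for a, b in zip(lnl_nuc, lnl_mt)]

def window_counts(deltas, threshold, window=100):
    """Count sites per window that favor each topology beyond a threshold.

    Returns a list of (n_favoring_NUC, n_favoring_MT), one pair per window.
    """
    counts = []
    for start in range(0, len(deltas), window):
        chunk = deltas[start:start + window]
        favor_nuc = sum(1 for d in chunk if d > threshold)
        favor_mt = sum(1 for d in chunk if d < -threshold)
        counts.append((favor_nuc, favor_mt))
    return counts

# Hypothetical per-site log likelihoods under each topology
lnl_nuc = [-2.0, -3.5, -1.0, -4.0, -2.2, -2.9, -1.5, -3.0]
lnl_mt = [-2.7, -3.4, -1.8, -3.2, -2.2, -2.0, -1.4, -3.9]

deltas = delta_ssls(lnl_nuc, lnl_mt)
counts = window_counts(deltas, threshold=0.5, window=4)
```

With a window of 100 sites and thresholds of 0.1 or 0.5, the same counting would produce the kind of per-segment tallies described for Fig. S2.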


Fig. S6. (A) Bayesian phylogeny estimate (PhyloBayes, CAT model with DPP and dGamma) based on the mitochondrial amino acid data translated from all 13 protein-coding regions. All nodes received 100% posterior probability unless otherwise specified. Agamids are colored red, iguanids orange, and snakes blue. (B) Bayesian phylogeny estimate (PhyloBayes, CAT model with DPP and dGamma) with the top 2.5% most converged sites between snakes and agamids removed. All nodes received 100% posterior probability unless otherwise specified. Agamids are colored red, iguanids orange, and snakes blue. (C) Bayesian phylogeny estimate (PhyloBayes, CAT model with DPP and dGamma) with the top 5.0% most converged sites between snakes and agamids removed. All nodes received 100% posterior probability unless otherwise specified. Agamids are colored red, iguanids orange, and snakes blue.


Fig. S6. Continued.



Fig. S7. (A) Empirical maximization of the expected number of true positives under FDR control. The number of rejected null hypotheses was calculated by using the Benjamini and Hochberg procedure for FDR control at a spectrum of maximum allowed false-discovery rates (blue). The number of expected true positives was also calculated for each value of the maximum FDR (red) by using (1 − FDR) times the number of rejected null hypotheses. This allows the empirical maximum to be determined, at which point the set of rejected null hypotheses is expected to contain the maximum number of true positives (11 rejected null hypotheses, 5.4 of which are expected to be correctly rejected). (B–L) Branch pairs having a significant excess of convergent substitutions, at 51% maximum FDR. The FDR level was determined by maximizing the expected number of true positives. Because the maximum FDR is 51%, approximately 5 of these 11 cases are expected to reflect correctly rejected null hypotheses. Results are displayed with increasing P values, so that the first 5 branch pairs are the most likely to be correct.
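The procedure described in this caption can be sketched as follows: apply the Benjamini and Hochberg step-up rule at each candidate FDR level, then keep the level that maximizes (1 − FDR) times the number of rejections. The P values and the grid of FDR levels are hypothetical, and the function names are illustrative.

```python
def bh_rejections(p_values, fdr):
    """Number of null hypotheses rejected by the Benjamini-Hochberg rule."""
    m = len(p_values)
    n_reject = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * fdr:
            n_reject = rank  # keep the largest rank that passes
    return n_reject

def maximize_true_positives(p_values, levels):
    """Return (fdr, rejections, expected true positives) at the maximum."""
    best = max(levels, key=lambda q: (1 - q) * bh_rejections(p_values, q))
    n_reject = bh_rejections(p_values, best)
    return best, n_reject, (1 - best) * n_reject

# Hypothetical P values, one per branch pair tested
p_values = [0.001, 0.004, 0.012, 0.029, 0.2, 0.5, 0.7, 0.9]
levels = [i / 100 for i in range(1, 100)]  # candidate maximum FDR levels

best_fdr, n_rejected, expected_tp = maximize_true_positives(p_values, levels)
```

The same arithmetic applied to the values in the caption gives (1 − 0.51) × 11 ≈ 5.4 expected true positives.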


Fig. S7. Continued.


Fig. S8. Bayesian phylogeny estimate based on the mitochondrial nucleotide data from 13 protein-coding genes, reanalyzed with the 500 codons with the highest SSLS for the MT (over the NUC) tree omitted. Posterior probability support for bipartitions is indicated adjacent to respective nodes; analysis based on a total of 10,227 nucleotide sites from the mitochondrial protein-coding genes.


Fig. S9. Amino acid sites with the 2.5% highest probability of convergence between snakes and agamids. Site rankings were determined by their posterior probability of convergence between snakes and agamids. Note that these sites are displayed in sequential order for display purposes only and are not found together in the actual alignment.


Table S1. Mitochondrial genome sequences used in this study

Vertebrate group  GenBank accession  Species
Outgroups         NC_001922          Alligator mississippiensis
                  NC_001567          Bos taurus
                  NC_000886          Chelonia mydas
                  NC_001323          Gallus gallus
                  NC_001573          Xenopus laevis
Tuatara           NC_004815          Sphenodon punctatus
Lizards           NC_005958          Abronia graminea
                  NC_006284          Amphisbaena schmidti
                  NC_010972          Anolis carolinensis
                  NC_006287          Bipes biporus
                  NC_005962          Cordylus warreni
                  NC_006283          Diplometopon zarudnyi
                  NC_000888          Eumeces egregius
                  NC_007627          Gecko gecko
                  NC_006285          Geocalamus acutus
                  NC_002793          Iguana iguana
                  NC_008328          Lacerta viridis
                  EU747729           Ophisaurus attenuatus
                  NC_006922          Pogona vitticeps
                  NC_006282          Rhineura floridana
                  NC_005960          Sceloporus occidentalis
                  NC_005959          Shinisaurus crocodilurus
                  NC_007008          Teratoscincus keyserlingii
                  AB080275–6         Varanus komodoensis
                  NC_008065          Xenagama taylori
Snakes            FJ755180           Anilius scytale*
                  NC_007398          Boa constrictor
                  NC_007401          Cylindrophis ruffus
                  NC_001945          Dinodon semicarinatus
                  AM236347           Eunectes notaeus
                  NC_005961          Leptotyphlops dulcis
                  NC_007397          Ovophis okinavensis
                  DQ523161           Pantherophis slowinskii
                  NC_007399          Python regius
                  AM236346           Ramphotyphlops australis
                  FJ755181           Tropidophis haetianus*
                  AM236345           Typhlops mirus

New mitochondrial genomes are indicated with an asterisk (*).


Table S2. Nuclear gene sequences used in this study

Species (mtDNA)             Species (Rag1)              Rag1 accession no.  Species (c-mos)         c-mos accession no.
Abronia graminea            Elgaria panamintina         AY662603            Elgaria multicarinata   AF039479
Acrochordus granulatus      A. granulatus/A. javanicus  AY487388/AY988070   A. granulatus           AF471124
Agkistrodon piscivorus      Gloydius halys              AY662614            A. piscivorus           AF471096
Alligator mississippiensis  A. mississippiensis         AF143724            A. mississippiensis     AY447979
Amphisbaena schmidti        A. xera                     AY662619            A. sp.                  AY444020
Anilius scytale             A. scytale                  AY487382/AY988072   A. scytale              AF544722
Anolis carolinensis         A. paternus                 AY662589            Leiocephalus sp.        AF315388
Bipes biporus               B. biporus                  AY662616            B. sp.                  AY444018
Boa constrictor             B. constrictor              AY988064/AY487351   B. constrictor          AF471115
Bos taurus                  B. taurus                   XM_867321           B. taurus               AY168496
Chelonia mydas              C. mydas                    AY687907            A. spinifera            DQ529205
Cordylus warreni            C. polyzonus                AY662643            C. namaquensis          AY217848
Cylindrophis ruffus         C. ruffus                   AY662613            C. ruffus               AF471133
Dinodon semicarinatus       D. sp.                      AY662611            D. rufozonatum          AF471163
Diplometopon zarudnyi       D. zarudnyi                 AY444049            D. zarudnyi             AY444023
Eumeces egregius            E. inexpectatus             AY662632            E. inexpectatus         AY217888
Eunectes notaeus            E. notaeus                  AY988063            E. murinus              AY099964
Gallus gallus               G. gallus                   NM_001031188        G. gallus               XM_426938
Gecko gecko                 G. gecko                    AY662625            G. gecko                AY444028
Geocalamus acutus           G. acutus                   AY444043            G. acutus               AY444017
Homo sapiens                H. sapiens                  NM_000448           H. sapiens              NM_005372
Iguana iguana               Basiliscus plumifrons       AY662599            Basiliscus plumifrons   AY987986
Lacerta viridis             Eremias sp.                 AY662615            L. viridis              DQ097132
Leptotyphlops dulcis        L. columbi                  AY487383            L. humilis              AY099979
Ophisaurus attenuatus       O. attenua                  AY662602            O. gracilis             AY444030
Ovophis okinavensis         O. okinavensis              GB#######           O. okinavensis          GB#######
Pantherophis guttatus       -                           -                   P. guttatus             DQ902070
Pogona vitticeps            Physignathus lesueurii      AY662581            Physignathus lesueurii  AF137524
Python regius               P. reticulatus              AY487396            P. molurus              AY099968
Ramphotyphlops australis    R. braminus                 AY662612            R. braminus             AY099980
Rhineura floridana          R. floridana                AY662618            R. floridana            AY444021
Sceloporus occidentalis     Phrynosoma mcallii          AY662590            S. grammicus            AF039478
Shinisaurus crocodilurus    S. crocodilurus             AY662610            S. crocodilurus         AY099976
Sphenodon punctatus         S. punctatus                AY662576            S. punctatus            AF039483
Teratoscincus keyserlingii  T. keyserlingii             AY662624            T. keyserlingii         AY662569
Tropidophis haetianus       T. haetianus/T. melanurus   AY988073/AY487384   T. haetianus            AY099962
Typhlops mirus              T. lumbricalis              AY487387            T. brachycephalus       AY099981
Varanus komodoensis         V. griseus                  AY662608            V. salvator             AF435017
Xenagama unicolor           Japalura tricarinata        AY662585            Agama agama             AF137530
Xenopeltis unicolor         X. unicolor                 AY487400/DQ465564   X. unicolor             AF544689
Xenopus laevis              X. laevis                   L19324              X. laevis               M25366
