The molecular basis of host adaptation in cactophilic Drosophila: Molecular evolution of a Glutathione S-transferase gene (GstD1) in Drosophila mojavensis
Luciano M. Matzkin
Department of Ecology and Evolutionary Biology and Center for Insect Science, University of Arizona, Tucson, Arizona 85721
Sequence data from this article have been deposited with the GenBank Data Library under accession nos. EU079380-EU079471.
Genetics: Published Articles Ahead of Print, published on February 1, 2008 as 10.1534/genetics.107.083287
Running head: Molecular evolution of GstD1
Key words: cactophilic Drosophila, Glutathione S-transferase, detoxification, host adaptation,
sequence variation
Corresponding author:
Luciano M. Matzkin
Department of Ecology and Evolutionary Biology
University of Arizona
1041 E. Lowell St.
Tucson, AZ 85721-0088
Phone: (520) 626-2772
Fax: (520) 626-3522
Email: [email protected]
ABSTRACT
Drosophila mojavensis is a cactophilic fly endemic to the northwestern deserts of North
America. This species includes four genetically isolated cactus host races each individually
specializing on the necrotic tissues of a different cactus species. The necrosis of each cactus
species provides the resident D. mojavensis populations with a distinct chemical environment. A
previous investigation of the role of transcriptional variation in the adaptation of D. mojavensis
to its hosts produced a set of candidate loci that are differentially expressed in response to host
shifts; among them was Glutathione S-transferase-D1 (GstD1). In both D. melanogaster and
Anopheles gambiae, GstD1 has been implicated in the resistance of these species to the
insecticide DDT. The pattern of sequence variation of the GstD1 locus from all four D.
mojavensis populations, D. arizonae (sister species) and D. navojoa (outgroup) has been
examined. The data suggest that in two populations of D. mojavensis GstD1 has gone through a
period of adaptive amino acid evolution. Further analyses indicate that of the seven amino acid
fixations that occurred in the D. mojavensis lineage, two of them occur in the active site pocket,
potentially having a significant effect on substrate specificity and in the adaptation to alternative
cactus hosts.
INTRODUCTION
The concept of “evolution at two levels” proposed by KING and WILSON (1975) predicts that
evolutionary change is a function of both coding sequence and transcriptional variation. There
are myriads of examples highlighting the role of coding sequence variation in evolution and
recently this pattern has been observed at the genomic level (BIERNE and EYRE-WALKER 2004;
BUSTAMANTE et al. 2005; FAY et al. 2002; VOIGHT et al. 2006). As well, it is becoming
apparent that natural selection plays a large role in interspecific transcriptional variation (GILAD
et al. 2006; NUZHDIN et al. 2004; RIFKIN et al. 2003). What is sometimes lacking in many
studies, and often the least tractable, is an understanding of how the transcriptional and sequence
variation relate to the ecology of the organism.
Drosophila mojavensis offers a unique opportunity to incorporate knowledge of its ecology
with its transcriptional and sequence variation. Drosophila mojavensis and its sister species D.
arizonae are cactophilic flies that diverged ~1.5 million years ago (MATZKIN 2004;
MATZKIN and EANES 2003; REED et al. 2007). These species utilize the necrotic tissues of
several cactus species as their host. The range of D. mojavensis is composed of four
geographically and genetically isolated host races (MACHADO et al. 2007; REED et al. 2007;
ROSS and MARKOW 2006). Each host race, mainland Sonora Desert, Baja California, Catalina
Island and Mojave Desert, utilizes a different species of cactus: organpipe (Stenocereus
thurberi), agria (S. gummosus), prickly pear (Opuntia spp.) and barrel (Ferocactus cylindraceus),
respectively (FELLOWS and HEED 1972; RUIZ and HEED 1988). Drosophila mojavensis has been
proposed (RUIZ et al. 1990) to have originated in Baja California utilizing a Stenocereus cactus
(possibly agria) and then migrated up the peninsula and colonized Catalina Island and the
Mojave Desert, shifting cactus hosts in the process. A subsequent colonization (and host shift)
from Baja to Sonora established the present day mainland Sonora Desert population.
The differences in the chemical composition of the cacti, in addition to the microflora
associated with the necrosis (STARMER 1982; STARMER et al. 1986) produce very distinct
chemical environments to which each population of D. mojavensis must adapt (HEED 1978;
KIRCHER 1982; VACEK 1979). Some of the chemical differences include such compounds as
alcohols, alkaloids (in Opuntia), triterpenes and glycosides. Previous studies in D. mojavensis
have shown that this chemical variation can drive the molecular and functional evolution of
metabolic genes (MATZKIN 2004; MATZKIN 2005; MATZKIN and EANES 2003). For example,
alleles of alcohol dehydrogenase-2 (ADH-2) with the greatest activity on 2-propanol are found at
highest frequency in a population (Baja California) that experiences greater 2-propanol
concentration within the cactus necrosis (agria). Additionally, a previous study examining
transcriptional variation in D. mojavensis identified ADH-2 as one of a series of genes whose
expression is induced as a response to a cactus host shift (MATZKIN et al. 2006). This suggests
that both transcriptional and coding sequence changes are involved in the adaptation process.
Among the candidate genes previously shown to be differentially regulated with host
utilization was a gene with high sequence identity to the D. melanogaster Glutathione S-
transferase D1 (GstD1) gene (MATZKIN et al. 2006). In D. melanogaster (TANG and TU 1994) as
well as in the mosquito Anopheles gambiae (RANSON et al. 2001), GstD1 has a high activity and
is presumed to be involved in the resistance to the insecticide DDT (Dichloro-Diphenyl-
Trichloroethane). In general several members of the Gst gene family have been known to play a
role in detoxification in many taxa, including insects (ENAYATI et al. 2005). GSTs generally
function on hydrophobic organic compounds, altering their hydrophobicity (ATKINS et al. 1993).
Given the large quantities of organic compounds within the cactus necrosis, some of which are
toxic, there is constant selection pressure on D. mojavensis to evolve resistance to these
compounds, especially in the presence of a cactus host shift. Therefore, it can be predicted that
as a consequence of a cactus host shift, detoxification enzymes might be under selection (both at
the transcriptional and coding sequence level). This study investigates the pattern of variation
among the different cactus host populations of D. mojavensis and D. arizonae at GstD1, a gene
previously suggested to play a role in host adaptation (MATZKIN et al. 2006). Glutathione S-
transferase D1 appears to have been through a period of adaptive protein evolution associated
with the cactus host shifts of two D. mojavensis populations. The fixed amino acid changes that
occurred in D. mojavensis have potentially large functional consequence given their location on
the enzyme’s structure. Conversely, strong purifying selection appears to have occurred in the
other two D. mojavensis populations as well as in D. arizonae, possibly as a result of
commonalities in cactus host utilization. Additionally, the pattern of population structure at this
locus suggests an alternative scenario to the relationship and history of the four D. mojavensis
host races.
MATERIALS AND METHODS
Sampling and sequencing: All fly samples used in this study, with the exception of D.
navojoa (outgroup) and one D. arizonae line, were collected from the field and maintained as
isofemale lines on Banana/Opuntia media. The four D. navojoa lines sampled were obtained
from the Tucson Drosophila Species Stock Center (stocks 15081-1374.01 [Tehuantepec, Mex.],
15081-1374.10 [El Dorado, Mex.], 15081-1374.11 and 15081-1374.12 [both from Jalisco,
Mex.]). The collection sites of all samples used in this study are shown in Fig. 1. The collection
of D. arizonae included 20 lines from the northern population (seven from Navojoa, Mex., six
from Riverside, CA and eight from Tucson, AZ) and 5 from the southern population (four from
Hidalgo, Mex. [one was from the Stock Center, 15081-1271.05] and one from Chiapas, Mex.).
A total of 63 lines were used for D. mojavensis: 15 from the Mojave population (seven from
Grand Canyon and eight from Anza-Borrego), eight from the Catalina Island population, 17 from
the Baja California population (La Paz, Mex.), and 23 from the mainland Sonora population
(three from Desemboque, Mex., four from Hermosillo, Mex. and 16 from Guaymas, Mex.).
The complete nucleotide sequence of the D. mojavensis GstD1 locus was obtained by querying
the sequenced clone from our previous study (MATZKIN et al. 2006) to the D. mojavensis
genome sequence (http://rana.lbl.gov/drosophila/). The complete coding sequence of GstD1 is
630 bp long and, like other D-class Gsts, it is intron-less. The entire GstD1 coding region plus
99 bp upstream from the initiation codon was sequenced in all the 93 lines mentioned above.
Genomic DNA was extracted from one individual per isofemale line using DNeasy columns
(Qiagen, Valencia, CA). PCR amplification of the GstD1 gene region was done using the
following forward and reverse primers: 5’ CATGAGCCCGATATTTAATT 3’ and 5’
TGGAGCTCAATCGAAGTATT 3’. Amplifications were done at 54° C annealing temperature
in an Eppendorf MasterCycler using Invitrogen Taq DNA polymerase. Sequencing of both
strands using the above primers was done in an Applied Biosystems (Foster City, CA) 3730XL
DNA Analyzer housed at the Genomic Analysis and Technology Core Facility at the University
of Arizona. All sequences are stored under GenBank accession nos. EU079380-EU079471.
Data analysis: All chromatographs were aligned and sequence contigs were assembled using
Sequencher v4.6 (Ann Arbor, MI) software. As stated above, most sequences were derived from
individuals from isofemale lines; therefore, segregating polymorphisms are expected
within the sequenced strains. Knowledge of the linkage phase is not necessary, however,
for certain analyses that test departure from neutrality, such as the McDonald-Kreitman
(MK) test (MCDONALD and KREITMAN 1991). Given that polymorphism can lead to fixations,
under neutrality the ratio of non-synonymous to synonymous changes should be similar for
polymorphisms and for fixations. The MK test determines via a G-test whether these two
ratios differ (i.e. departure from neutrality); hence, only the numbers of fixations and
polymorphisms are needed. Conversely, linkage phase is needed to implement analyses based on
pairwise nucleotide differences, as well as for the purpose of examining phylogenetic relationships.
Therefore, the linkage phase had to be inferred from the dataset. Several methods exist for
haplotype reconstruction. The earliest method created was based on parsimony (CLARK 1990).
Although it was simple, it had a high error rate. Recently both maximum likelihood and
Bayesian methods have been devised. Two such methods (HAP and ELB) were utilized in this
study. HALPERIN and ESKIN (2004) devised a maximum likelihood algorithm (HAP) that breaks
up the sequence into blocks of limited diversity. This algorithm was implemented using the HAP
website (http://diego.ucsd.edu/hap/html/). Although HAP appears to be faster and more
accurate than previous methods (HALPERIN and ESKIN 2004), its performance using data with
known recombination is still to be determined. One method that has proven to have the lowest
error rate using markers with recombination is the pseudo-Bayesian ELB algorithm (EXCOFFIER
et al. 2003). The ELB method was implemented from the Arlequin (ver. 3.01) software package
(EXCOFFIER et al. 2005). After both haplotype reconstructions were reconciled (see Results), one
haplotype was randomly chosen per individual. This dataset of reconstructed haplotypes was
used in all subsequent analyses except when otherwise noted.
Evolutionary analyses of the sequence variation at GstD1 were done mostly using the DnaSP
ver. 4.10 (ROZAS et al. 2003) and Arlequin ver. 3.01 (EXCOFFIER et al. 2005) software packages.
Historical selection and/or demographic changes can leave their mark in the frequency spectrum
and number of sequence polymorphisms (NIELSEN 2001); Tajima’s D (TAJIMA 1989), Fu and
Li’s F (FU and LI 1993), Fu’s FS (FU 1997) and Fay & Wu’s H (FAY and WU 2000) test statistics
were used to examine this pattern. Significant negative values are due to large numbers of low
frequency alleles, suggesting that a sweep or bottleneck has occurred in the past. Positive values
are a result of few alleles being maintained at relatively high frequency, suggesting balancing
selection. Conversely, selective neutrality tests that are independent of genealogy or those in
which genealogy can be removed, such as the MK test, are robust to demographic changes
(NIELSEN 2001). The evolutionary history of GstD1 was examined using the MK test;
additionally, an outgroup (D. navojoa) was used to polarize the fixed differences. Therefore,
lineage-specific MK tests were performed for each species as well as for each of the D.
mojavensis populations. Statistical significance for the frequency distribution tests and the 95%
confidence intervals for θ were estimated by performing 10,000 coalescent simulations
incorporating the estimated recombination rate for the particular population tested (DnaSP and
Arlequin were used for these analyses). Recombination rates were calculated using the maximum
likelihood method of HUDSON (2001). The coalescent simulations of θ for D. arizonae and the
four D. mojavensis populations were used to test for significant differences in
the level of variation between populations and/or species.
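As an illustration of the frequency-spectrum logic described above, the following minimal Python sketch computes Tajima's D from a set of aligned haplotypes using the constants given in TAJIMA (1989). It is not the DnaSP or Arlequin implementation used in this study; the function name and the toy haplotypes are hypothetical, and significance would still have to be assessed by coalescent simulation as described.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotypes (hypothetical helper).

    Returns None when the statistic is undefined (fewer than four
    sequences or no segregating sites).
    """
    n = len(haplotypes)
    # S: number of alignment columns with more than one state.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if n < 4 or s == 0:
        return None

    # pi: mean number of pairwise nucleotide differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(haplotypes, 2)
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D contrasts pi with Watterson's estimate S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: every variant is a singleton (excess of rare alleles).
singletons = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
# Toy data: every variant is at intermediate frequency.
intermediate = ["AAT", "AAT", "TTA", "TTA", "TTA"]
print(tajimas_d(singletons), tajimas_d(intermediate))
```

In the toy data, the singleton-rich sample yields a negative D and the intermediate-frequency sample a positive D, matching the interpretation of the signs given above.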
Phylogenetic analysis was performed using both a maximum likelihood (ML) and a Bayesian
approach. Individuals with identical haplotypes were combined. The ML analysis was done
using the program PAUP* ver. 4.0b10 (SWOFFORD 2003) and for the Bayesian analysis the
program MrBayes ver. 3.1.2 (HUELSENBECK and RONQUIST 2001; RONQUIST and HUELSENBECK
2003) was used. Evolutionary models were chosen using the programs Modeltest ver. 3.7
(POSADA and CRANDALL 1998) and MrModeltest ver. 2.2 (NYLANDER 2004). Phylogenetic
robustness of the ML analysis was determined by performing 1,000 bootstrap replicates. The
Bayesian analysis was done using 4 chains (three heated and one cold) of 10,000,000 generations
sampling every 100 generations and discarding the first 1,000 trees. Since GstD1 is a nuclear
recombining gene, the phylogenetic analyses were used to visually illustrate the relationships
between the populations and species and not of the individuals sampled.
RESULTS
Intra- and interspecific variation at GstD1: Out of the 93 individuals sequenced, only 16
contained ambiguous sites and only seven contained more than two ambiguous sites. For the
most part the HAP and ELB algorithms behaved similarly. In eight cases the haplotype
predictions of the HAP and ELB algorithms differed, and five of those differed only due to
the placement of singletons. Given that GstD1 can potentially recombine, the ELB algorithm
seems most appropriate and its results were thus used for the analysis. The reconstructed
haplotype dataset used in the analysis is provided as a supplemental data file (found at
http://www.genetics.org/supplemental/).
Overall there was a very small amount of linkage disequilibrium within each of the D.
mojavensis populations and in D. arizonae. Only three out of 36 pairwise comparisons were
significant using Fisher’s exact test in the Sonora population and only one out of ten
comparisons was significant in the Baja population. There were no significant pairwise
comparisons in the Mojave and Catalina Island populations. Within the Northern D. arizonae
population only one out of 153 comparisons showed a departure from independence. Tables of
pairwise comparisons and P-values are provided as a supplemental file (supplemental Table 1 at
http://www.genetics.org/supplemental/). The level of intragenic recombination in GstD1 is
shown in supplemental Table 2 (found at http://www.genetics.org/supplemental/).
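The pairwise tests of independence between sites reported above can be sketched as follows. This is a minimal illustration, assuming phased haplotypes and biallelic sites, of a two-sided Fisher's exact test on the 2 x 2 table of haplotype counts for a pair of sites; the function names and the toy haplotypes are hypothetical and do not reproduce the software used in this study. The reported numbers of comparisons (36, 10 and 153) are consistent with all pairs among 9, 5 and 18 sites, respectively.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


def ld_table(haplotypes, i, j):
    """2 x 2 haplotype counts for biallelic sites i and j (hypothetical helper)."""
    alleles_i = sorted({h[i] for h in haplotypes})
    alleles_j = sorted({h[j] for h in haplotypes})
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        raise ValueError("both sites must be biallelic")
    counts = [[0, 0], [0, 0]]
    for h in haplotypes:
        counts[alleles_i.index(h[i])][alleles_j.index(h[j])] += 1
    return counts[0][0], counts[0][1], counts[1][0], counts[1][1]


# Toy data: two sites in complete association across six haplotypes.
toy = ["AC", "AC", "AC", "GT", "GT", "GT"]
p_value = fisher_exact_two_sided(*ld_table(toy, 0, 1))
print(p_value)
```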
The level of synonymous, nonsynonymous and noncoding sequence variation for all three
species is shown in Table 1. Overall D. arizonae contained the greatest number of segregating
synonymous sites (θs). Using 10,000 coalescent simulations the 95% interval of θs for the
Northern D. arizonae population was estimated (0.0208-0.0605) and compared with that of D.
mojavensis. For every comparison the level of synonymous variation in the Northern D.
arizonae population was significantly greater (Mojave, P < 0.001; Catalina Island, P < 0.001;
Sonora, P < 0.001; Baja, P = 0.001). A similar pattern between Northern D. arizonae and the
four D. mojavensis populations was observed comparing both synonymous and noncoding
sequence variation (Mojave, P < 0.001; Catalina Island, P < 0.001; Sonora, P < 0.001; Baja, P =
0.011). Within D. mojavensis, the Baja California population contains double the synonymous
variation of Sonora and Catalina Island populations and more than seven times that of the
Mojave population. Although it may appear that there is a greater amount of nonsynonymous
variation when all D. mojavensis populations are combined, this is because the four populations
are genetically isolated (see below). When comparing each D. mojavensis population separately,
with exception of the Catalina Island population (P = 0.021), the level of nonsynonymous
variation (θn) is not significantly different from the Northern D. arizonae population (95%
interval 0.0006-0.0047 based on 10,000 coalescent simulations).
Population structure: Population structure within D. mojavensis and D. arizonae was
estimated by calculating the FST parameter. Only synonymous and noncoding variation was used
in the calculation of FST in order to remove any possible bias due to selection at nonsynonymous
sites. To be certain there is no structure within each of the D. mojavensis populations, pairwise
FST were calculated for each of the collection sites separately (supplemental Table 3 at
http://www.genetics.org/supplemental/). Most of the comparisons were significant with the
exception of the two collection sites within the Mojave population and between the three
collection sites within the Sonora population. When pooling all collection sites within a
population, there is significant genetic isolation between all the four D. mojavensis populations
(Table 2). With the exception of the comparison between the Catalina Island and Mojave
population (P = 0.99), a similar pattern was observed when using nonsynonymous sites
(supplemental Table 4 at http://www.genetics.org/supplemental/). This result was expected since
the Catalina Island and Mojave populations share the same protein haplotype. The Northern D.
arizonae population is composed of collections within the Sonoran desert and Riverside, CA.
There was no significant FST between those two collection sites (FST = 0.068, P = 0.09). When
pooling all the Northern D. arizonae collections and comparing it to the Southern D. arizonae,
significant genetic isolation was observed (FST = 0.59, P < 0.001).
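To make the pairwise FST comparisons concrete, the sketch below uses one simple sequence-based estimator, FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between populations, with a permutation P-value obtained by shuffling haplotypes between populations. This estimator is an assumption made for illustration and may differ from the one implemented in the software used for this study; all names and the toy data are hypothetical.

```python
import random
from itertools import combinations, product


def n_diff(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def fst(pop1, pop2):
    """FST = 1 - Hw/Hb from pairwise differences (needs >= 2 sequences each)."""
    within = [
        sum(n_diff(a, b) for a, b in combinations(pop, 2))
        / (len(pop) * (len(pop) - 1) / 2)
        for pop in (pop1, pop2)
    ]
    h_w = sum(within) / 2
    h_b = sum(n_diff(a, b) for a, b in product(pop1, pop2)) / (
        len(pop1) * len(pop2)
    )
    return 0.0 if h_b == 0 else 1 - h_w / h_b


def permutation_p(pop1, pop2, n_perm=999, seed=1):
    """One-sided permutation P-value for the observed FST."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = fst(pooled[: len(pop1)], pooled[len(pop1):])
        if permuted >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so that P is never zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: two populations fixed for different haplotypes.
pop_a = ["AAAA"] * 4
pop_b = ["TTTT"] * 4
print(fst(pop_a, pop_b), permutation_p(pop_a, pop_b))
```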
Phylogenetic analysis: Both maximum likelihood (ML) and Bayesian methods of
phylogenetic analysis were used in this study. With very minor exceptions, both methods
resulted in the same phylogenetic structure. Figure 2 shows the result of the Bayesian analysis
(results of the ML analysis not shown). For the Bayesian analysis the evolutionary model chosen
was K80+I+Γ (I = 0.573, Γ = 0.883) with codon partitioning (KIMURA 1980). Both D.
mojavensis and D. arizonae are reciprocally monophyletic. The D. mojavensis clade is split into
two well supported groups, Baja/Sonora and Catalina Island/Mojave. The Southern D. arizonae
population forms a monophyletic group within the larger polyphyletic clade of D. arizonae.
Frequency and test of neutrality analysis: Given the above determined population structure
within the two species, demographic changes within each population could be determined with
less bias by analyzing each population separately. Table 3 contains the summary statistics for
several frequency distribution tests done on each of the D. mojavensis and D. arizonae
populations (analysis of D. navojoa is not shown given low statistical power due to the small
sample size). In the Baja D. mojavensis population the above statistics were all negative, and the
majority significantly so. The only other D. mojavensis population with negative statistics was the
Sonora population, although none were significant. In both D. arizonae populations the majority
of the statistics were negative (see Table 3).
The distribution of sequence divergence across GstD1, especially that of nonsynonymous
divergence, appears to be heterogeneous in the Baja California and mainland populations, while
variation is distributed homogeneously within the Catalina Island and Mojave populations (Figure
3). Another method of comparing the levels of fixed differences and polymorphisms is the MK test
(Table 4). This test was implemented in two ways (standard and lineage-specific), the first
compared each population against the combined dataset of the sister species (e.g. Baja D.
mojavensis vs. all D. arizonae) and the second used the sequence of the outgroup D. navojoa to
polarize the fixed differences to be able to examine lineage specific departures from neutrality.
Although the MK test is robust to demographic changes (NIELSEN 2001), the effect of such
changes on these analyses cannot be fully determined until a robust knowledge of the
demographic histories of the D. mojavensis populations and D. arizonae is ascertained (a large
scale multilocus analysis of the demographic histories of D. mojavensis and D. arizonae is in
progress). Another factor that can affect the interpretation of the MK test is if the gene occurs in
a region of low recombination (SHAPIRO et al. 2007). Genes in regions of low recombination,
such as those located at either the centromeric or telomeric ends of chromosomes are more
drastically affected by hitchhiking (ANDOLFATTO and PRZEWORSKI 2001; SMITH and HAIGH
1974) and background selection (CHARLESWORTH et al. 1993) at nearby genes. Recombination
rates across chromosomes in D. mojavensis have not been determined, but given that GstD1 is
located in the middle of Muller element E (CONSORTIUM 2007) it is not likely to be in a low
recombining region of the genome. An assumption of the MK test is that all synonymous
positions be silent; this is not the case in the presence of codon usage bias (AKASHI 1999;
KLIMAN and HEY 1994). The effective number of codons (ENC) is a measure of codon usage
bias which ranges from 20 (extreme bias) to 61 (equal use of all codons) (WRIGHT 1990). In GstD1,
the ENC in D. mojavensis (32.4) and D. arizonae (32.8) are similar and do not show a very high
level of bias, therefore codon usage bias is not expected to greatly affect the analysis.
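For readers unfamiliar with the ENC, the following minimal Python sketch follows the formulation of WRIGHT (1990): codon homozygosity F is computed for each amino acid, averaged within each degeneracy class, and combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The sketch assumes the standard genetic code and a coding sequence in which the two-, four- and six-fold classes are all represented with non-zero homozygosity; the function name and the test sequence are hypothetical, and this is not the implementation used for the values reported above.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def effective_number_of_codons(cds):
    """ENC of a coding sequence after Wright (1990) (hypothetical helper)."""
    cds = cds.upper()
    counts = defaultdict(Counter)
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        aa = CODON_TABLE.get(codon)
        if aa and aa != "*":
            counts[aa][codon] += 1

    # Codon homozygosity F per amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, codon_counts in counts.items():
        n = sum(codon_counts.values())
        if DEGENERACY[aa] == 1 or n < 2:
            continue
        sum_p2 = sum((c / n) ** 2 for c in codon_counts.values())
        f_by_class[DEGENERACY[aa]].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}

    missing = [k for k in (2, 4, 6) if mean_f.get(k, 0) <= 0]
    if missing:
        raise ValueError(f"degeneracy classes without usable data: {missing}")
    # If isoleucine is uninformative, approximate F3 by the mean of F2 and F4.
    if mean_f.get(3, 0) <= 0:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)


# Extreme bias: every amino acid always encoded by the same codon.
first_codon = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        first_codon.setdefault(aa, codon)
biased = "".join(codon * 2 for codon in first_codon.values())
print(effective_number_of_codons(biased))
```

With a single codon used for every amino acid, all homozygosities equal 1 and the expression reduces to 2 + 9 + 1 + 5 + 3 = 20, the lower bound quoted above.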
Results for both Baja and Sonora populations of D. mojavensis significantly reject the
assumptions of neutrality of the MK test in the direction of a greater fixation of nonsynonymous
substitutions (Table 4). For example, in the Baja population the ratio of nonsynonymous to
synonymous polymorphisms is 0.25 while the same ratio for fixed differences that occur in the
lineage leading to the Baja D. mojavensis populations is 1.75. A significant departure from the
neutral expectation was not found in the other two D. mojavensis populations or in D. arizonae
(Table 4).
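The comparison underlying the MK test can be sketched as a G-test of independence on the 2 x 2 table of polymorphic versus fixed and nonsynonymous versus synonymous changes. The counts in the example below are hypothetical: they were chosen only to reproduce the two ratios quoted above for the Baja population (0.25 and 1.75) and are not the values from Table 4. The function name is likewise an assumption, and no continuity correction is applied.

```python
from math import erfc, log, sqrt


def mk_g_test(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn):
    """G statistic and P-value (1 d.f.) for a McDonald-Kreitman table."""
    observed = [[poly_nonsyn, poly_syn], [fixed_nonsyn, fixed_syn]]
    total = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = row_totals[i] * col_totals[j] / total
                g += 2 * o * log(o / expected)

    # Survival function of a chi-square variate with one degree of freedom.
    p_value = erfc(sqrt(max(g, 0.0) / 2))
    return g, p_value


# Hypothetical counts: 3/12 = 0.25 for polymorphisms, 7/4 = 1.75 for fixations.
g_stat, p_value = mk_g_test(poly_nonsyn=3, poly_syn=12, fixed_nonsyn=7, fixed_syn=4)
print(g_stat, p_value)
```

An excess of nonsynonymous fixations relative to nonsynonymous polymorphisms, as in this toy table, is the direction of departure interpreted as adaptive protein evolution.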
Given the relatively recent divergence of D. mojavensis and D. arizonae, a large number of
fixations between the species would not be expected. If the divergence time between the species
is much less than 4Ne generations, incomplete sorting of the ancestral variation would
be expected. To better estimate the divergence between the two species all the population data
on D. mojavensis and D. arizonae have been compiled including data from studies of the
following loci: Adh-1, Adh-2 and G6pd (MATZKIN 2004; MATZKIN and EANES 2003), the
10 random nuclear loci from MACHADO et al. (2007), and GstD1 (this study). These data were
used to perform 10,000 coalescent simulations using the HKA program (J. HEY;
http://lifesci.rutgers.edu/heylab/). The divergence time between D. mojavensis and D. arizonae
was estimated to have occurred 3.3Ne generations ago. According to CLARK (1997) this would
suggest that the majority of the once shared variation would have been fixed or lost in one or the
other lineage. Any polymorphism that has not fixed in one or possibly both species would
increase the number of polymorphisms within species and decrease the amount of fixed
differences between the species. This essentially will make it difficult to observe a pattern of
variation suggesting adaptive protein evolution, as was seen for GstD1 in this study.
To better determine the effect of the divergence time between D. mojavensis and D. arizonae
on the ability to detect a significant pattern of adaptive protein evolution a series of coalescent
simulations were performed. The simulations were performed only for the two populations that
show a significant departure from neutrality, Baja California and Sonora (Table 4). Using the ms
program (HUDSON 2002), 10,000 coalescent simulations were performed of a D. mojavensis
population splitting from D. arizonae 3.3Ne generations ago. For the Baja California and Sonora
populations θ was set to 4.48 and 3.54, respectively (θ values are from GstD1). Since the
focus was on the strict neutral model the observed level of variation at GstD1 was used to
calculate the probability that a given mutation from a simulated replicate would be assigned to
either a synonymous or non-synonymous site class (0.12 for Baja California and 0.19 for
Sonora). For each replicate, an MK test was performed and significance was calculated using a χ²
test. For the Baja California population the probability of observing a significant MK test
suggesting adaptive protein evolution under the neutral model is P = 0.015 and for the Sonora
population is P = 0.049. Therefore, it is unlikely that the significant results observed in this
study are a result of the relatively short divergence time between D. mojavensis and D. arizonae.
DISCUSSION
Population history of D. mojavensis and D. arizonae: The pattern of variation at GstD1
strongly supports the existence of four genetically isolated populations for D. mojavensis (Table
2). This observation supports previous studies using both nuclear and mitochondrial genes
(MACHADO et al. 2007; REED et al. 2007; ROSS and MARKOW 2006). A point of disagreement
between studies on nuclear genes (MACHADO et al. 2007), including this study, and
mitochondrial genes (REED et al. 2007) is the relationships between the four D. mojavensis
populations. As seen in GstD1, nuclear genes show that the Baja and Mainland populations form
one cluster and the Catalina Island and Mojave populations form another (Figure 2).
Mitochondrial genes suggest that the Catalina Island population is embedded within the
Baja/Mainland cluster and that the Mojave population is most basal. The mitochondrial pattern
contradicts the previously proposed model of the speciation of D. mojavensis and D. arizonae.
RUIZ et al. (1990) proposed that D. mojavensis diverged from D. arizonae in the Baja California
peninsula by an allopatric event, then a small subpopulation migrated north to the Mojave desert
and eventually became isolated. A further colonization event from this isolated Mojave
population led to the Catalina Island population and in a similar fashion the Sonora population
was later formed by colonizers from the Baja peninsula. According to this hypothesis it is
expected that the greatest diversity will be observed within the Baja population and that the
Catalina Island and Sonora populations will be clustered within the Mojave and Baja
populations, respectively. Evidence from GstD1 (present study), Adh-1/Adh-2 (MATZKIN 2004;
MATZKIN and EANES 2003) and a random set of ten nuclear markers (MACHADO et al. 2007)
show that there is a greater level of nucleotide variation within the Baja population (Table 1).
Although the phylogeographic pattern at GstD1 supports a Baja/Sonora and Catalina
Island/Mojave clustering (Figure 2), the Catalina Island population did not appear to be nested
within Mojave and relationships between Baja and Sonora could not be fully resolved.
The level of synonymous variation in the Northern D. arizonae is significantly greater than in
all four of the D. mojavensis populations suggesting a greater effective population size in the
Northern D. arizonae (Table 1). The level of variation in the Southern D. arizonae population
samples appears to be similar to that of D. mojavensis, but only a few individuals were available
from that population. As in D. mojavensis, disagreements between nuclear and
mitochondrial markers are observed in D. arizonae. A greater effective population size of D.
arizonae relative to D. mojavensis has previously been observed for nuclear genes (MACHADO et
al. 2007; MATZKIN 2004; MATZKIN and EANES 2003), while the opposite was observed for the
mitochondria (REED et al. 2007). A different pattern of variation between nuclear (diparentally
inherited) and mitochondrial (maternally inherited) markers could be due to differences in
dispersal rates between the sexes, the mating system (CHESSER and BAKER 1996), selection on
the mitochondria (BAZIN et al. 2006) or possibly due to genetic draft associated with
endosymbionts (MEIKLEJOHN et al. 2007). No sex specific dispersal rates were previously
observed in D. mojavensis, but this was only examined in one of the four populations (MARKOW
and CASTREZANA 2000). Spiroplasma endosymbionts have been found in D. mojavensis
(MATEOS et al. 2006), although the nature of this interaction has not been resolved. These
discrepancies in results between nuclear and mitochondrial genes warrant future examination.
Evolutionary history of GstD1: The cactophilic lifestyle of D. mojavensis and D. arizonae, as
well as other members of the Repleta group, subject them to a wide array of chemical, sometimes
toxic, environments (KIRCHER 1982). This chemical diversity is dependent on both the cactus
host and the bacterial and yeast communities associated with the necrosis (STARMER 1982;
STARMER et al. 1986). For example differences in alcohols between agria and organpipe were
proposed to have driven the molecular and functional divergence of Adh between the Baja and
Sonora populations (MATZKIN 2004; MATZKIN 2005). In addition to alcohols, a variety of other
compounds are experienced by populations of D. mojavensis, from the alkaloids of Opuntia in
Catalina Island to the triterpenes and glycosides of the columnar cacti (agria and organpipe) of
both Baja and Sonora (KIRCHER 1977; KIRCHER 1980; KIRCHER 1982). Although chemical
differences occur between agria and organpipe necrosis, these two columnar cacti share many
chemical properties. Agria and organpipe are closely related, both members of the same genus,
Stenocereus, and are distantly related and more derived relative to barrel cactus and even further
to prickly pear (COTA and WALLACE 1997; NYFFELER 2002). The phylogenetic relationships,
and thus the chemical relatedness between the cactus hosts (agria/organpipe vs. barrel/prickly
pear) appear to associate closely with the pattern of variation at GstD1 (Baja/Sonora vs. Catalina
Island/Mojave).
In the Baja and Sonora populations, a high level of interspecific amino acid divergence at the
5’ and 3’ ends of the GstD1 locus was observed (Figure 3). Although the Baja and Sonora
populations appear significantly diverged when examining synonymous positions (Table 2), they
appear to have preserved a very similar protein haplotype for GSTD1. Similarly, the populations
in Mojave and Catalina Island share the same protein haplotype, but there are no fixed amino
acid differences at GSTD1 between these populations and D. arizonae (Table 4). Under a
neutral scenario these strikingly different patterns of variation between the Baja/Sonora and
Catalina Island/Mojave populations could reflect a reduction in population size in the
Baja/Sonora lineage that allowed for the fixation of slightly deleterious amino acid
polymorphisms and a subsequent maintenance of a large population size in Catalina
Island/Mojave that would deter the fixation of slightly deleterious amino acid polymorphisms.
This scenario, which excludes selection as the driver for the pattern of variation seen between
populations, is not supported by the observations made in this study. A significant shift in the
frequency spectrum was observed for the Baja population and a similar pattern was seen for the
Sonora population (Table 3). This shift in the frequency spectrum can be either a product of
demographic changes, such as population expansion, or positive selection (NIELSEN 2001). The
effect of a population expansion will be genome wide and therefore a similar signature at many
loci should be expected. Recently, a combined analysis of ten random regions across the
genome of D. mojavensis observed that for all four populations there was an overall signature of
population expansion (MACHADO et al. 2007). In that study, as well as in this one, it is clear that
the level of variation, and hence population size, is greatest in the Baja and Sonora populations
relative to the Catalina Island and Mojave populations. Furthermore, as the level of variation in
Catalina Island and Mojave is much lower than for D. arizonae, it seems unlikely that the lack of
amino acid differences is associated with a large population size in Catalina Island and Mojave.
Therefore it appears that the pattern of variation observed could have been formed as a result of
positive selection in the Baja/Sonora lineage and purifying selection in the Catalina
Island/Mojave lineage.
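The frequency-spectrum argument above rests on comparing two estimates of diversity: the mean number of pairwise differences and the estimate implied by the number of segregating sites. As a minimal sketch of how Tajima's D (TAJIMA 1989) summarizes that comparison, the following Python function computes it from an alignment. The function name and the toy alignment are hypothetical illustrations and are not data from this study; a negative value indicates an excess of rare variants, which can arise from either population expansion or positive selection.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (S, pi, theta_w, D) for equal-length aligned sequences."""
    n = len(seqs)
    # A site is segregating if more than one state is present in the sample.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1  # Watterson's estimate (per locus, not per bp)
    if s == 0:
        return s, pi, theta_w, float("nan")  # D is undefined without variation
    # Constants for the variance of (pi - theta_w), following Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, theta_w, d


# Hypothetical toy alignment in which every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
S, pi, theta_w, D = tajimas_d(toy)
print(S, pi, round(theta_w, 3), round(D, 3))
```

Because a demographic expansion is expected to produce this signature genome wide, a statistic of this kind is informative about selection only when the locus is compared with other regions, as in the multilocus comparison cited above.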
A significant departure from the neutral expectation is observed in the Baja and Sonora
populations (Table 4). Furthermore, the lineage specific MK test (Table 4) suggests that more
amino acid changes occurred in the lineage leading to the Baja and Sonora populations of D.
mojavensis than is expected by chance. Both Baja and Sonora share the same amino acid
differences with D. arizonae, with the exception of residue 178, which is fixed in Baja and found
in all but one individual sampled from Sonora. The location of the seven amino acids provides
further evidence that this locus has gone through a period of adaptive protein evolution in the
past. Two of the seven fixed differences between D. arizonae and D. mojavensis from Baja and
Sonora (Leu-7-Gln and His-39-Gln) occur in the active site pocket of the GSTD1 enzyme. The
crystallographic structure and active site have been determined previously for the A. gambiae
homolog of the D. mojavensis GSTD1 (CHEN et al. 2003). The active site pocket is composed of
two subsites, the G-site and H-site. The G-site, mostly composed of hydrophilic and polar
residues, is the location where glutathione binds, while the H-site is a hydrophobic pocket where
the substrate binds (CHEN et al. 2003). The histidine (positively charged) to glutamine (polar)
change at residue 39 occurs in the G-site. Although this substitution might appear to cause a
charge change in the molecule, given the isoelectric point of histidine (7.6), at a physiological pH
(nearly neutral) the side chain will be largely uncharged. The leucine (hydrophobic) to glutamine
(polar) change occurs in the H-site. The location of these two changes, plus the fact that they
also affect the polarity of the active site, could have significant consequences for the activity and
substrate specificity of GSTD1.
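The charge argument for the His-39-Gln change can be made concrete with the Henderson-Hasselbalch relation. The sketch below is only an illustration: it assumes a textbook side-chain pKa of about 6.0 for the imidazole group of free histidine and a pH of 7.4, neither of which is a value reported in this study, and the effective pKa of a residue inside a folded protein can differ from the free amino acid value.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a basic group that carries its proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Assumed values for illustration only (not taken from the paper).
ASSUMED_IMIDAZOLE_PKA = 6.0
ASSUMED_PHYSIOLOGICAL_PH = 7.4

frac = fraction_protonated(ASSUMED_PHYSIOLOGICAL_PH, ASSUMED_IMIDAZOLE_PKA)
print(f"Fraction of histidine side chains protonated: {frac:.1%}")
```

Under these assumptions only a small minority of side chains would carry a positive charge, which is consistent with treating the substitution primarily as a change in polarity and side-chain chemistry rather than in net charge.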
It has been proposed (HEED 1982) that the host of the D. mojavensis/D. arizonae ancestor was
prickly pear. Evidence from this study and previous surveys (MACHADO et al. 2007) suggests
that the Baja/Sonora and Catalina Island/Mojave clusters are well supported. Therefore it can be
hypothesized that following the divergence of D. mojavensis, one population settled in Baja or
Sonora and shifted to columnar cacti, while another migrated north, remaining on prickly
pear and subsequently moving to barrel cactus. The shift to columnar cacti by the ancestral
Baja/Sonora population exposed it to alternative compounds. It is at this point in its recent
evolutionary history that selection at GstD1 appears to have taken place. Together, the pattern of
sequence divergence observed in this study, the transcriptional variation of GstD1 (MATZKIN et al. 2006)
and the known role of GstD1 in detoxification strongly suggest that GstD1 played a role in
the columnar cactus host adaptation of the Baja and Sonora populations of D. mojavensis.
Transcriptional and coding sequence evolution: Cactophilic Drosophila, as well as many
other phytophagous insects, have adapted to the chemical composition of their host (BECERRA
1997). The effects of, and response to, such chemical diversity are especially great during
a host shift. It is at this point that the ability to modulate enzyme activity and pathway
fluxes, especially those involved in detoxification, is of great importance.
Enzymes tend to have as their substrate a set of related compounds; when faced with a
non-optimal compound, the velocity (i.e. enzyme activity) of the reaction will be reduced. To
maintain wild type activity, changes in the coding sequence (e.g. modification of the active site) or in
the enzyme titer (e.g. increased expression) will be needed. When faced with an environmental
shift (e.g. cactus host shift), four outcomes can be predicted for an enzyme, termed here as: (1)
neutral, (2) coding sequence, (3) transcriptional, and (4) coding sequence and transcriptional.
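The trade-off between coding sequence and transcriptional change can be illustrated with a simple rate law. The sketch below assumes Michaelis-Menten kinetics, which is an assumption made here for illustration and is not a model fitted in this study; all parameter values are hypothetical and in arbitrary units.

```python
def velocity(enzyme, kcat, km, substrate):
    """Michaelis-Menten reaction velocity: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


# Hypothetical baseline on the ancestral host compound.
baseline = velocity(enzyme=1.0, kcat=10.0, km=1.0, substrate=1.0)

# Host shift: the new compound is a poorer substrate (higher Km), so velocity drops.
after_shift = velocity(enzyme=1.0, kcat=10.0, km=3.0, substrate=1.0)

# Transcriptional outcome: doubling the enzyme titer restores the velocity.
more_enzyme = velocity(enzyme=2.0, kcat=10.0, km=3.0, substrate=1.0)

# Coding sequence outcome: an active-site change restores the original Km.
new_active_site = velocity(enzyme=1.0, kcat=10.0, km=1.0, substrate=1.0)

print(baseline, after_shift, more_enzyme, new_active_site)
```

In this toy case either route recovers the baseline velocity, which is why the two mechanisms are treated as alternative, and potentially combined, responses to a host shift.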
In the first case, neutral changes in the coding sequence and in the expression of the enzyme
will accumulate because the enzyme has no involvement in the detoxification or because of
constraint. The second outcome involves the fixation of new non-synonymous mutations that
modify the active site, and hence increase the enzyme activity, although new mutations are not
necessary for this outcome (HARTL et al. 1985). For the last two outcomes, changes in the ability
to modulate gene expression are needed. Given the structure of transcription factor binding sites, it is
probable that mutations can create a potential binding domain (STONE and WRAY 2001). As a
response to an environmental shift the titer and expression of transcription factors may be
affected such that the previously non-functional binding domain will be activated and might
subsequently increase the expression of the enzyme and hence increase activity. Therefore, it is
these genetic backgrounds that have the ability to modulate gene expression that will increase in
frequency (third outcome). The pattern of variation in GstD1 highlights the fourth outcome,
in which, in addition to transcriptional modulation, there are significant changes at the protein level.
Evolutionary change, as observed in D. mojavensis, can therefore involve a combination of
transcriptional and coding sequence changes.
ACKNOWLEDGMENTS
I would like to thank Kristin Sweetser for all her hard work on this project and Therese Markow for
assistance and support. Therese Markow, Carlos Machado, Jeremy Bono and two anonymous
reviewers provided helpful suggestions and comments on the manuscript. This study was
supported by the NIGMS 2K12 G000798-06 Postdoctoral Excellence in Research and Teaching
(PERT) fellowship through the Center for Insect Science at the University of Arizona to LMM.
LITERATURE CITED
AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and
divergence data: Statistical power to detect directional selection under stationarity and
free recombination. Genetics 151: 221-238.
ANDOLFATTO, P., and M. PRZEWORSKI, 2001 Regions of lower crossing over harbor more rare
variants in African populations of Drosophila melanogaster. Genetics 158: 657-665.
ATKINS, W. M., R. W. WANG, A. W. BIRD, D. J. NEWTON and A. Y. H. LU, 1993 The catalytic
mechanism of Glutathione-S-Transferase (Gst): Spectroscopic determination of the
Pk(A) of Tyr-9 in rat Alpha-1-1 Gst. J. Biol. Chem. 268: 19188-19191.
BAZIN, E., S. GLEMIN and N. GALTIER, 2006 Population size does not influence mitochondrial
genetic diversity in animals. Science 312: 570-572.
BECERRA, J., 1997 Insects on plants: Macroevolutionary chemical trends in host use. Science
276: 253-256.
BIERNE, N., and A. EYRE-WALKER, 2004 The genomic rate of adaptive amino acid substitution
in Drosophila. Mol. Biol. Evol. 21: 1350-1360.
BUSTAMANTE, C. D., A. FLEDEL-ALON, S. WILLIAMSON, R. NIELSEN, M. T. HUBISZ et al., 2005
Natural selection on protein-coding genes in the human genome. Nature 437: 1153-1157.
CHARLESWORTH, B., M. T. MORGAN and D. CHARLESWORTH, 1993 The effect of deleterious
mutations on neutral molecular variation. Genetics 134: 1289-1303.
CHEN, L., P. R. HALL, X. E. ZHOU, H. RANSON, J. HEMINGWAY et al., 2003 Structure of an
insect [delta]-class glutathione S-transferase from a DDT-resistant strain of the malaria
vector Anopheles gambiae. Acta Crystallogr. D 59: 2211-2217.
CHESSER, R., and R. BAKER, 1996 Effective sizes and dynamics of uniparentally and diparentally
inherited genes. Genetics 144: 1225-1235.
CLARK, A. G., 1990 Inference of haplotypes from PCR-amplified samples of diploid
populations. Mol. Biol. Evol. 7: 111-122.
CLARK, A. G., 1997 Neutral behavior of shared polymorphism. Proc. Natl. Acad. Sci. 94: 7730-
7734.
CONSORTIUM, D. C. G. S. A. A., 2007 Genomics on a phylogeny: Evolution of genes and
genomes in the genus Drosophila. Nature in press.
COTA, J. H., and R. S. WALLACE, 1997 Chloroplast DNA evidence for divergence in Ferocactus
and its relationships to North American columnar cacti (cactaceae: Cactoideae). Syst.
Bot. 22: 529-542.
ENAYATI, A. A., H. RANSON and J. HEMINGWAY, 2005 Insect glutathione transferases and
insecticide resistance. Insect Mol. Biol. 14: 3-8.
EXCOFFIER, L., G. LAVAL and D. BALDING, 2003 Gametic phase estimation over large genomic
regions using an adaptive window approach. Hum. Genom. 1: 7-19.
EXCOFFIER, L., G. LAVAL and S. SCHNEIDER, 2005 Arlequin ver. 3.0: An integrated software
package for population genetics data analysis. Evol. Bioinformatics Online 1: 47-50.
FAY, J. C., and C. I. WU, 2000 Hitchhiking under positive darwinian selection. Genetics 155:
1405-1413.
FAY, J. C., G. J. WYCKOFF and C. I. WU, 2002 Testing the neutral theory of molecular evolution
with genomic data from Drosophila. Nature 415: 1024-1026.
FELLOWS, D. F., and W. B. HEED, 1972 Factors affecting host plant selection in desert-adapted
cactiphilic Drosophila. Ecology 53: 850-858.
FU, Y. X., 1997 Statistical tests of neutrality of mutations against population growth, hitchhiking
and background selection. Genetics 147: 915-925.
FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693-709.
GILAD, Y., A. OSHLACK, G. K. SMYTH, T. P. SPEED and K. P. WHITE, 2006 Expression profiling
in primates reveals a rapid evolution of human transcription factors. Nature 440: 242-
245.
HALPERIN, E., and E. ESKIN, 2004 Haplotype reconstruction from genotype data using imperfect
phylogeny. Bioinformatics 20: 1842-1849.
HARTL, D. L., D. E. DYKHUIZEN and A. M. DEAN, 1985 Limits of adaptation: The evolution of
selective neutrality. Genetics 111: 655-674.
HEED, W. B., 1978 Ecology and genetics of sonoran desert Drosophila, pp. 109-126 in
Ecological genetics: The interface, edited by P. F. BRUSSARD. Springer-Verlag.
HEED, W. B., 1982 The origin of Drosophila in the sonoran desert, pp. 65-80 in Ecological
genetics and evolution: The cactus-yeast-Drosophila model system, edited by J. S. F.
BARKER and W. T. STARMER. Academic Press, New York.
HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159:
1805-1817.
HUDSON, R. R., 2002 Generating samples under a wright-fisher neutral model of genetic
variation. Bioinformatics 18: 337-338.
HUELSENBECK, J. P., and F. RONQUIST, 2001 MrBayes: Bayesian inference of phylogenetic trees.
Bioinformatics 17: 754-755.
KIMURA, M., 1980 A simple method for estimating evolutionary rates of base substitutions
through comparative studies of nucleotide-sequences. J. Mol. Evol. 16: 111-120.
KING, M.-C., and A. C. WILSON, 1975 Evolution at two levels in humans and chimpanzees.
Science 188: 107-116.
KIRCHER, H. W., 1977 Triterpene glycosides and queretaroic acid in organ pipe cactus.
Phytochemistry 16: 1078.
KIRCHER, H. W., 1980 Triterpenes in organ pipe cactus. Phytochemistry 19: 2707.
KIRCHER, H. W., 1982 Chemical composition of cacti and its relationship to sonoran desert
Drosophila, pp. 143-158 in Ecological genetics and evolution: The cactus-yeast-
Drosophila model system, edited by J. S. F. BARKER and W. T. STARMER. Academic
Press, New York.
KLIMAN, R. M., and J. HEY, 1994 The effects of mutation and natural-selection on codon bias in
the genes of Drosophila. Genetics 137: 1049-1056.
MACHADO, C. A., L. M. MATZKIN, L. K. REED and T. A. MARKOW, 2007 Multilocus nuclear
sequences reveal intra- and interspecific relationships among chromosomally
polymorphic species of cactophilic Drosophila. Mol. Ecol. 16: 3009-3024.
MARKOW, T. A., and S. CASTREZANA, 2000 Dispersal in cactophilic Drosophila. Oikos 89: 378-
386.
MATEOS, M., S. J. CASTREZANA, B. J. NANKIVELL, A. M. ESTES, T. A. MARKOW et al., 2006
Heritable endosymbionts of Drosophila. Genetics 174: 363-376.
MATZKIN, L. M., 2004 Population genetics and geographic variation of alcohol dehydrogenase
(Adh) paralogs and glucose-6-phosphate dehydrogenase (G6pd) in Drosophila
mojavensis. Mol. Biol. Evol. 21: 276-285.
MATZKIN, L. M., 2005 Activity variation in alcohol dehydrogenase paralogs is associated with
adaptation to cactus host use in cactophilic Drosophila. Mol. Ecol. 14: 2223-2231.
MATZKIN, L. M., and W. F. EANES, 2003 Sequence variation of alcohol dehydrogenase (Adh)
paralogs in cactophilic Drosophila. Genetics 163: 181-194.
MATZKIN, L. M., T. D. WATTS, B. G. BITLER, C. A. MACHADO and T. A. MARKOW, 2006
Functional genomics of cactus host shifts in Drosophila mojavensis. Mol. Ecol. 15: 4635-
4643.
MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in
Drosophila. Nature 351: 652-654.
MEIKLEJOHN, C. D., K. L. MONTOOTH and D. M. RAND, 2007 Positive and negative selection on
the mitochondrial genome. Trends in Genetics 23: 259-263.
NIELSEN, R., 2001 Statistical tests of selective neutrality in the age of genomics. Heredity 86:
641-647.
NUZHDIN, S., M. WAYNE, K. HARMON and L. MCINTYRE, 2004 Common pattern of evolution of
gene expression level and protein sequence in Drosophila. Mol. Biol. Evol. 21: 1308-
1317.
NYFFELER, R., 2002 Phylogenetic relationships in the cactus family (cactaceae) based on
evidence from trnK/matK and trnL-trnF sequences. Am. J. Bot. 89: 312-326.
NYLANDER, J. A. A., 2004 MrModeltest v2. Evolutionary Biology Centre, Uppsala University,
Uppsala, Sweden.
POSADA, D., and K. A. CRANDALL, 1998 Modeltest: Testing the model of DNA substitution.
Bioinformatics 14: 817-818.
RANSON, H., L. ROSSITER, F. ORTELLI, B. JENSEN, X. L. WANG et al., 2001 Identification of a
novel class of insect Glutathione s-transferases involved in resistance to DDT in the
malaria vector Anopheles gambiae. Biochem. J. 359: 295-304.
REED, L. K., M. NYBOER and T. A. MARKOW, 2007 Evolutionary relationships of Drosophila
mojavensis geographic host races and their sister species Drosophila arizonae. Mol. Ecol.
16: 1007-1022.
RIFKIN, S. A., J. KIM and K. P. WHITE, 2003 Evolution of gene expression in the Drosophila
melanogaster subgroup. Nat. Genet. 33: 138-144.
RONQUIST, F., and J. P. HUELSENBECK, 2003 MrBayes 3: Bayesian phylogenetic inference under
mixed models. Bioinformatics 19: 1572-1574.
ROSS, C. L., and T. A. MARKOW, 2006 Microsatellite variation among diverging populations of
Drosophila mojavensis. J. Evol. Biol. 19: 1691-1700.
ROZAS, J., J. C. SANCHEZ-DELBARRIO, X. MESSEGUER and R. ROZAS, 2003 DnaSP, DNA
polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-
2497.
RUIZ, A., and W. B. HEED, 1988 Host-plant specificity in the cactophilic Drosophila mulleri
species complex. J. Anim. Ecol. 57: 237-249.
RUIZ, A., W. B. HEED and M. WASSERMAN, 1990 Evolution of the mojavensis cluster of
cactophilic Drosophila with descriptions of two new species. J. Hered. 81: 30-42.
SHAPIRO, J. A., W. HUANG, C. ZHANG, M. J. HUBISZ, J. LU et al., 2007 Adaptive genic evolution
in the Drosophila genomes. Proc. Natl. Acad. Sci. 104: 2271-2276.
SMITH, J. M., and J. HAIGH, 1974 Hitch-hiking effect of a favorable gene. Genet. Res. 23: 23-35.
STARMER, W. T., 1982 Analysis of the community structure of yeasts associated with the
decaying stems of cactus. I. Stenocereus gummosus. Microb. Ecol. 8: 71-81.
STARMER, W. T., J. S. F. BARKER, H. J. PHAFF and J. C. FOGLEMAN, 1986 Adaptations of
Drosophila and yeasts: Their interactions with the volatile 2-propanol in the cactus
microorganism Drosophila model system. Aust. J. Biol. Sci. 39: 69-77.
STONE, J., and G. WRAY, 2001 Rapid evolution of cis-regulatory sequences via local point
mutations. Mol. Biol. Evol. 18: 1764-1770.
SWOFFORD, D. L., 2003 Paup*: Phylogenetic analysis using parsimony (*and other methods),
version 4.0b10. Sinauer Associates, Sunderland, MA.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics 123: 585-595.
TANG, A. H., and C. P. D. TU, 1994 Biochemical characterization of Drosophila Glutathione s-
transferases D1 and D21. J. Biol. Chem. 269: 27876-27884.
VACEK, D. C., 1979 The microbial ecology of the host plants of Drosophila mojavensis. Thesis,
University of Arizona, Tucson, AZ.
VOIGHT, B. F., S. KUDARAVALLI, X. WEN and J. K. PRITCHARD, 2006 A map of recent positive
selection in the human genome. PLoS Biology 4.
WRIGHT, F., 1990 The 'effective number of codons' used in a gene. Gene 87: 23-29.
TABLE 1
Synonymous, nonsynonymous and noncoding sequence variation for GstD1 in D. mojavensis, D. arizonae and D. navojoa
                          Synonymous               Nonsynonymous            Synonymous and noncoding
                     N    Ss   πs      θs       Sn   πn      θn       Ss+nc  πs+nc   θs+nc
D. mojavensis all    63   17   0.0168  0.0241   15   0.0110  0.0067   28     0.0210  0.0237
Mojave               15   1    0.0023  0.0021   2    0.0012  0.0013   1      0.0014  0.0012
Catalina Island      8    3    0.0081  0.0077   0    0.0000  0.0000   3      0.0048  0.0046
Sonora               23   4    0.0083  0.0073   4    0.0016  0.0023   9      0.0104  0.0097
Baja                 17   8    0.0068  0.0158   3    0.0019  0.0019   12     0.0075  0.0142
D. arizonae all      25   26   0.0350  0.0461   6    0.0029  0.0033   31     0.0265  0.0332
Northern             20   21   0.0310  0.0396   4    0.0019  0.0024   24     0.0211  0.0273
Southern             5    2    0.0054  0.0064   0    0.0000  0.0000   3      0.0048  0.0058
D. navojoa all       4    7    0.0233  0.0255   1    0.0011  0.0011   11     0.0232  0.0246
N, number of lines sampled; S, number of segregating sites; π, mean pairwise difference per bp; θ, nucleotide diversity per bp.
TABLE 2
Pairwise FST between the four D. mojavensis populations
                 Mojave    Catalina Island  Sonora
Catalina Island  0.654***
Sonora           0.842***  0.788***
Baja             0.902***  0.855***         0.252***
The data were permuted 10,000 times in order to calculate significance. *** P < 0.001.
TABLE 3
Summary statistics of frequency distribution tests
Species        Population       Tajima's D (A)  Fu's FS (B)  Fu & Li's D (C)  Fay & Wu's H (D)
D. mojavensis  Mojave            0.035          -0.488        1.062            0.724
               Catalina Island   0.204           0.562        0.165            0.714
               Sonora           -0.146          -9.732       -0.221           -0.431
               Baja             -1.426*         -5.194†      -2.138*          -1.412
D. arizonae    Northern         -0.878*         -8.391       -0.752            3.600
               Southern         -1.049          -0.186       -1.569            0.900
(A) TAJIMA 1989; (B) FU 1997; (C) FU and LI 1993; (D) FAY and WU 2000
† P < 0.1, * P < 0.05
TABLE 4
Fixed and polymorphic variation at GstD1 and tests for departure from neutrality
                                          Standard MK test               Lineage-specific MK test
Species        Population       Sites     Fixed  Polymorphic  Test       Fixed  Polymorphic  Test
D. mojavensis  Mojave           syn.      7      32           Fisher's   5      1            Fisher's
                                nonsyn.   0      7            ns         0      2            ns
               Catalina Island  syn.      6      34           Fisher's   4      3            N/A
                                nonsyn.   0      6            ns         0      0
               Sonora           syn.      6      40           G-test     4      9            G-test
                                nonsyn.   6      10           *          6      4            ns
               Baja             syn.      6      43           G-test     4      12           G-test
                                nonsyn.   7      9            **         7      3            *
D. arizonae    Northern         syn.      3      52           Fisher's   2      24           Fisher's
                                nonsyn.   0      18           ns         0      4            ns
               Southern         syn.      9      31           G-test     8      3            Fisher's
                                nonsyn.   1      15           ns         1      0            ns
Synonymous sites include changes within both the coding and noncoding regions. * P < 0.05, ** P < 0.01
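The McDonald-Kreitman comparisons in Table 4 are tests of independence on a 2 x 2 table of fixed versus polymorphic and synonymous versus nonsynonymous counts. As a minimal sketch, the following Python function computes an uncorrected G statistic and its P value for one such table. The function name is hypothetical, and whether a continuity or Williams correction was applied in the original analysis is not stated in the text, so the output should be read as an approximation to, not a reproduction of, the reported tests.

```python
from math import erfc, log, sqrt


def g_test_2x2(fixed_syn, poly_syn, fixed_nonsyn, poly_nonsyn):
    """Uncorrected G-test of independence on a 2x2 MK table; returns (G, P)."""
    table = [[fixed_syn, poly_syn], [fixed_nonsyn, poly_nonsyn]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_sums[i] * col_sums[j] / total
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2 * observed * log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(G / 2)).
    p = erfc(sqrt(g / 2))
    return g, p


# Standard MK counts for the Baja and Sonora populations, taken from Table 4.
for name, counts in {"Baja": (6, 43, 7, 9), "Sonora": (6, 40, 6, 10)}.items():
    g, p = g_test_2x2(*counts)
    print(f"{name}: G = {g:.2f}, P = {p:.4f}")
```

Rows with zero or very small counts are better handled with Fisher's exact test, which is the choice reported for those rows in Table 4.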
FIGURE 1. Collecting locations for D. mojavensis (squares), D. arizonae (circles) and D. navojoa
(triangles): 1. Catalina Island, CA; 2. Anza-Borrego, CA; 3. Grand Canyon, AZ; 4. Hermosillo,
Mex; 5. Desemboque, Mex; 6. Guaymas, Mex; 7. La Paz, Mex; 8. Riverside, CA; 9. Tucson, AZ;
10. Navojoa, Mex; 11. Hidalgo, Mex; 12. Chiapas, Mex; 13. El Dorado, Mex; 14. Jalisco, Mex;
15. Tehuantepec, Mex.
FIGURE 2. Bayesian tree analysis for all GstD1 sequences using the K80+I+Γ model (see text for
more details). The support for branches is shown. The species (solid brackets) and populations
(dashed brackets) are shown. The sample identification for D. mojavensis are: DE =
Desemboque, Mex; HE = Hermosillo, Mex; MJ = Guaymas, Mex; MJBC = La Paz, Mex; CI =
Catalina Island, CA; ANZA = Anza-Borrego, CA; WC = Grand Canyon, AZ. The sample
identification for D. arizonae are: ARNA = Navojoa, Mex; ARTU = Tucson, AZ; RVSD =
Riverside, CA; CHI = Chiapas, Mex; HID and MXT = Hidalgo, Mex. The sample identification
for D. navojoa are: NAV1 = Tehuantepec, Mex; NAV10 = El Dorado, Mex; NAV11 and
NAV12 = Jalisco, Mex.
FIGURE 3. Distribution of the non-synonymous to synonymous divergence (Ka/Ks) using a
sliding window analysis (window size = 50 bp, step size = 10 bp). The divergence between each
of the four D. mojavensis population and D. arizonae is shown. The circles indicate the seven
non-synonymous changes that occur between the Baja and Sonora populations of D. mojavensis
and D. arizonae. The filled circles are those changes that occur in the active site of GSTD1.
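The sliding window layout described for Figure 3 (window size = 50 bp, step size = 10 bp) can be sketched as follows. This is an illustration of the windowing only: it counts raw mismatches between two aligned sequences and does not estimate Ka or Ks, which requires classifying sites and changes by codon. The function names and the toy sequences are hypothetical.

```python
def sliding_windows(length, window=50, step=10):
    """Yield (start, end) pairs, 0-based and half-open, for full-size windows."""
    for start in range(0, length - window + 1, step):
        yield start, start + window


def windowed_differences(seq_a, seq_b, window=50, step=10):
    """Count mismatched positions between two aligned sequences in each window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start, sum(a != b for a, b in zip(seq_a[start:end], seq_b[start:end])))
        for start, end in sliding_windows(len(seq_a), window, step)
    ]


# Hypothetical 100 bp sequences that differ at a single position (index 55).
seq_a = "A" * 100
seq_b = "A" * 55 + "T" + "A" * 44
for start, n_diff in windowed_differences(seq_a, seq_b):
    print(start, n_diff)
```

Because adjacent windows overlap by 40 bp, a single substitution contributes to several consecutive windows, which is worth keeping in mind when reading peaks in a plot of this kind.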