38
1 The molecular basis of host adaptation in cactophilic Drosophila: Molecular evolution of a Glutathione S-transferase gene (GstD1) in Drosophila mojavensis Luciano M. Matzkin Department of Ecology and Evolutionary Biology and Center for Insect Science, University of Arizona, Tucson, Arizona 85721 Sequence data from this article have been deposited with the GenBank Data Library under accession nos. EU079380-EU079471. Genetics: Published Articles Ahead of Print, published on February 1, 2008 as 10.1534/genetics.107.083287

The molecular basis of host adaptation in cactophilic ... · (possibly agria) and then migrated up the peninsula and colonized Catalina Island and the Mojave Desert, ... STARMER et

Embed Size (px)

Citation preview

1

The molecular basis of host adaptation in cactophilic Drosophila: Molecular evolution of a Glutathione S-transferase gene (GstD1) in Drosophila mojavensis

Luciano M. Matzkin

Department of Ecology and Evolutionary Biology and Center for Insect Science, University of Arizona, Tucson, Arizona 85721

Sequence data from this article have been deposited with the GenBank Data Library under accession nos. EU079380-EU079471.

Genetics: Published Articles Ahead of Print, published on February 1, 2008 as 10.1534/genetics.107.083287

2

Running head: Molecular evolution of GstD1

Key words: cactophilic Drosophila, Glutathione S-transferase, detoxification, host adaptation,

sequence variation

Corresponding author:

Luciano M. Matzkin

Department of Ecology and Evolutionary Biology

University of Arizona

1041 E. Lowell St.

Tucson, AZ 85721-0088

Phone: (520) 626-2772

Fax: (520) 626-3522

Email: [email protected]

3

ABSTRACT

Drosophila mojavensis is a cactophilic fly endemic to the northwestern deserts of North

America. This species includes four genetically isolated cactus host races each individually

specializing on the necrotic tissues of a different cactus species. The necrosis of each cactus

species provides the resident D. mojavensis populations with a distinct chemical environment. A

previous investigation of the role of transcriptional variation in the adaptation of D. mojavensis

to its hosts produced a set of candidate loci that are differentially expressed in response to host

shifts, among them was Glutathione S-transferase-D1 (GstD1). In both D. melanogaster and

Anopheles gambiae, GstD1 has been implicated in the resistance of these species to the

insecticide DDT. The pattern of sequence variation of the GstD1 locus from all four D.

mojavensis populations, D. arizonae (sister species) and D. navojoa (outgroup) has been

examined. The data suggest that in two populations of D. mojavensis GstD1 has gone through a

period of adaptive amino acid evolution. Further analyses indicate that of the seven amino acid

fixations that occurred in the D. mojavensis lineage, two of them occur in the active site pocket,

potentially having a significant effect on substrate specificity and in the adaptation to alternative

cactus hosts.

4

INTRODUCTION

The concept of “evolution at two levels” proposed by KING and WILSON (1975), predicts that

evolutionary change is a function of both coding sequence and transcriptional variation. There

are myriads of examples highlighting the role of coding sequence variation in evolution and

recently this pattern has been observed at the genomic level (BIERNE and EYRE-WALKER 2004;

BUSTAMANTE et al. 2005; FAY et al. 2002; VOIGHT et al. 2006). As well, it is becoming

apparent that natural selection plays a large role in interspecific transcriptional variation (GILAD

et al. 2006; NUZHDIN et al. 2004; RIFKIN et al. 2003). What is sometimes lacking in many

studies, and often the least tractable, is an understanding of how the transcriptional and sequence

variation relate to the ecology of the organism.

Drosophila mojavensis offers a unique opportunity to incorporate knowledge of its ecology

with its transcriptional and sequence variation. Drosophila mojavensis and its sister species D.

arizonae are cactophilic flies that have diverged ~1.5 million years ago (MATZKIN 2004;

MATZKIN and EANES 2003; REED et al. 2007). These species utilize the necrotic tissues of

several cactus species as their host. The range of D. mojavensis is composed of four

geographically and genetically isolated host races (MACHADO et al. 2007; REED et al. 2007;

ROSS and MARKOW 2006). Each host race, mainland Sonora Desert, Baja California, Catalina

Island and Mojave Desert, utilizes a different species of cactus: organpipe (Stenocereus

thurberi), agria (S. gummosus), prickly pear (Opuntia spp.) and barrel (Ferocactus cylindraceus),

respectively (FELLOWS and HEED 1972; RUIZ and HEED 1988). Drosophila mojavensis has been

proposed (RUIZ et al. 1990) to have originated in Baja California utilizing a Stenocereus cactus

5

(possibly agria) and then migrated up the peninsula and colonized Catalina Island and the

Mojave Desert, shifting cactus hosts in the process. A subsequent colonization (and host shift)

from Baja to Sonora established the present day mainland Sonora Desert population.

The differences in the chemical composition of the cacti, in addition to the microflora

associated with the necrosis (STARMER 1982; STARMER et al. 1986) produce very distinct

chemical environments to which each population of D. mojavensis must adapt (HEED 1978;

KIRCHER 1982; VACEK 1979). Some of the chemical differences include such compounds as

alcohols, alkaloids (in Opuntia), triterpenes and glycosides. Previous studies in D. mojavensis

have shown that this chemical variation can drive the molecular and functional evolution of

metabolic genes (MATZKIN 2004; MATZKIN 2005; MATZKIN and EANES 2003). For example,

alleles of alcohol dehydrogenase-2 (ADH-2) with the greatest activity on 2-propanol are found at

highest frequency in a population (Baja California) that experiences greater 2-propanol

concentration within the cactus necrosis (agria). Additionally, a previous study examining

transcriptional variation in D. mojavensis identified ADH-2 as one of a series of genes whose

expression is induced as a response to a cactus host shift (MATZKIN et al. 2006). This suggests

that both transcriptional and coding sequence changes are involved in the adaptation process.

Among the list of candidate genes previously shown to differentially regulate with host

utilization was a gene with high sequence identity to the D. melanogaster Glutathione S-

transferase D1 (GstD1) gene (MATZKIN et al. 2006). In D. melanogaster (TANG and TU 1994) as

well as in the mosquito Anopheles gambiae (RANSON et al. 2001), GstD1 has a high activity and

is presumed to be involved in the resistance to the insecticide DDT (Dichloro-Diphenyl-

6

Trichloroethane). In general several members of the Gst gene family have been known to play a

role in detoxification in many taxa, including insects (ENAYATI et al. 2005). GST’s generally

function on hydrophobic organic compounds, altering their hydrophobicity (ATKINS et al. 1993).

Given the large quantities of organic compounds within the cactus necrosis, some of which are

toxic, there is constant selection pressure on D. mojavensis to evolve resistance to these

compounds, especially in the presence of a cactus host shift. Therefore, it can be predicted that

as a consequence of a cactus host shift, detoxification enzymes might be under selection (both at

the transcriptional and coding sequence level). This study investigates the pattern of variation

among the different cactus host populations of D. mojavensis and D. arizonae at GstD1, a gene

previously suggested to play a role in host adaptation (MATZKIN et al. 2006). Glutathione S-

transferase D1 appears to have been through a period of adaptive protein evolution associated

with the cactus host shifts of two D. mojavensis populations. The fixed amino acid changes that

occurred in D. mojavensis have potentially large functional consequence given their location on

the enzyme’s structure. Conversely, strong purifying selection appears to have occurred in the

other two D. mojavensis populations as well as in D. arizonae, possibly as a result of

commonalities in cactus host utilization. Additionally, the pattern of population structure at this

locus suggests an alternative scenario to the relationship and history of the four D. mojavensis

host races.

MATERIALS AND METHODS

Sampling and sequencing: All fly samples used in this study, with the exception of D.

7

navojoa (outgroup) and one D. arizonae line, were collected from the field and maintained as

isofemale lines on Banana/Opuntia media. The four D. navojoa lines sampled were obtained

from the Tucson Drosophila Species Stock Center (stocks 15081-1374.01 [Tehuantepec, Mex.],

15081-1374.10 [El Dorado, Mex.], 15081-1374.11 and 15081-1374.12 [both from Jalisco,

Mex.]). The collection sites of all samples used in this study are shown in Fig. 1. The collection

of D. arizonae included 20 lines from the northern population (seven from Navojoa, Mex., six

from Riverside, CA and eight from Tucson, AZ) and 5 from the southern population (four from

Hidalgo, Mex. [one was from the Stock Center, 15081-1271.05] and one from Chiapas, Mex.).

A total of 63 lines were used for D. mojavensis: 15 from the Mojave population (seven from

Grand Canyon and eight from Anza-Borrego), eight from the Catalina Island population, 17 from

the Baja California population (La Paz, Mex.), and 23 from the mainland Sonora population

(three from Desemboque, Mex., four from Hermosillo, Mex. and 16 from Guaymas, Mex.).

The complete nucleotide sequence of the D. mojavensis GstD1 locus was obtained by querying

the sequenced clone from our previous study (MATZKIN et al. 2006) to the D. mojavensis

genome sequence (http://rana.lbl.gov/drosophila/). The complete coding sequence of GstD1 is

630 bp long, and like other D-class Gst’s it is intron-less. The entire GstD1 coding region plus

99 bp upstream from the initiation codon was sequenced in all the 93 lines mentioned above.

Genomic DNA was extracted from one individual per isofemale line using DNeasy columns

(Qiagen, Valencia, CA). PCR amplification of the GstD1 gene region was done using the

following forward and reverse primers: 5’ CATGAGCCCGATATTTAATT 3’ and 5’

TGGAGCTCAATCGAAGTATT 3’. Amplifications were done at 54° C annealing temperature

in an Eppendorf MasterCycler using Invitrogen Taq DNA polymerase. Sequencing of both

8

strands using the above primers was done in an Applied Biosystems (Foster City, CA) 3730XL

DNA Analyzer housed at the Genomic Analysis and Technology Core Facility at the University

of Arizona. All sequences are stored under GenBank accession nos. EU079380-EU079471.

Data analysis: All chromatographs were aligned and sequence contigs were assembled using

Sequencher v4.6 (Ann Arbor, MI) software. As stated above, most sequences were derived from

individuals from isofemale lines, therefore it is expected that there will be segregating

polymorphisms within the sequence strains. Knowledge of the linkage phase is not necessary

however, for certain analyses that test departure from neutrality, such as the McDonald-Kreitman

(MK) test (MCDONALD and KREITMAN 1991). Given that polymorphism can lead to fixations,

under neutrality the ratio of synonymous to non-synonymous polymorphisms and fixations

should be similar. The MK test determines via a G-test if there is a difference between these two

ratios (i.e. departure from neutrality), hence only knowledge of the number of fixations and

polymorphisms is needed. Conversely, linkage phase is needed to implement analysis based on

pairwise nucleotide differences, as well for the purpose of examining phylogenetic relationships.

Therefore, the linkage phase had to be inferred from the dataset. Several methods exist for

haplotype reconstruction. The earliest method created was based on parsimony (CLARK 1990).

Although it was simple, it had a high error rate. Recently both maximum likelihood and

Bayesian methods have been devised. Two such a methods (HAP and ELB) were utilized in this

study. HALPERIN and ESKIN (2004) devised a maximum likelihood algorithm (HAP) that breaks

up the sequence into block of limited diversity. This algorithm was implemented using the HAP

website (http://diego.ucsd.edu/hap/html/). Although, HAP appears to be faster and more

accurate than previous methods (HALPERIN and ESKIN 2004), its performance using data with

9

known recombination is still to be determined. One method that has proven to have the lowest

error rate using markers with recombination is the pseudo-Bayesian ELB algorithm (EXCOFFIER

et al. 2003). The ELB method was implemented from the Arlequin (ver. 3.01) software package

(EXCOFFIER et al. 2005). After both haplotype reconstructions were reconciled (see Results), one

haplotype was randomly chosen per individual. This dataset of reconstructed haplotypes was

used in all subsequent analysis except when otherwise noted.

Evolutionary analyses of the sequence variation at GstD1 were done mostly using the DnaSP

ver. 4.10 (ROZAS et al. 2003) and Arlequin ver. 3.01 (EXCOFFIER et al. 2005) software packages.

Historical selection and/or demographic changes can leave their mark in the frequency spectrum

and number of sequence polymorphisms (NIELSEN 2001); Tajima’s D (TAJIMA 1989), Fu and

Li’s F (FU and LI 1993), Fu’s FS (FU 1997) and Fay & Wu’s H (FAY and WU 2000) test statistics

were used to examine this pattern. Significant negative values are due to large numbers of low

frequency alleles, suggesting that a sweep or bottleneck has occurred in the past. Positive values

are a result of few alleles being maintained at relative high frequency, suggesting balancing

selection. Conversely, selective neutrality tests that are independent of genealogy or those in

which genealogy can be removed, such as the MK test, are robust to demographic changes

(NIELSEN 2001). The evolutionary history of GstD1 was examined using the MK test,

additionally an outgroup (D. navojoa) was used to polarize the fixed differences. Therefore

lineage specific MK test were performed for each species and as well as for each of the D.

mojavensis populations. Statistical significance for the frequency distribution tests and the 95%

confidence intervals for θ was estimated by performing 10,000 coalescent simulations

incorporating the estimated recombination rate for the particular population tested (DnaSP and

10

Arlequin was used for these analysis). Recombination rates were calculated using the maximum

likelihood method of (HUDSON 2001). The coalescent simulations of θ for D. arizonae and the

four D. mojavensis populations was used to examine for the presence of significant differences in

the level of variation between population and/or species.

Phylogenetic analysis was performed using both a maximum likelihood (ML) and a Bayesian

approach. Individuals with identical haplotypes were combined. The ML analysis was done

using the program PAUP* ver. 4.0b10 (SWOFFORD 2003) and for the Bayesian analysis the

program MrBayes ver. 3.1.2 (HUELSENBECK and RONQUIST 2001; RONQUIST and HUELSENBECK

2003) was used. Evolutionary models were chosen using the programs Modeltest ver. 3.7

(POSADA and CRANDALL 1998) and MrModeltest ver. 2.2 (NYLANDER 2004). Phylogenetic

robustness of the ML analysis was determined by performing 1,000 bootstrap replicates. The

Bayesian analysis was done using 4 chains (three heated and one cold) of 10,000,000 generations

sampling every 100 generations and discarding the first 1,000 trees. Since GstD1 is a nuclear

recombining gene, the phylogenetic analysis were used to visually illustrate the relationships

between the populations and species and not of the individuals sampled.

RESULTS

Intra- and interspecific variation at GstD1: Out of the 93 individuals sequenced, only 16

contained ambiguous sites and only seven contained more than two ambiguous sites. For the

most part the HAP and ELB algorithms behaved similarly. In eight cases the haplotype

predictions between the HAP and ELB algorithm differed, and five of those differ only due to

11

the placement of singletons. Given that GstD1 can potentially recombine, the ELB algorithm

seems most appropriate and its results were thus used for the analysis. The reconstructed

haplotype dataset used in the analysis is provided as a supplemental data file (found at

http://www.genetics.org/supplemental/).

Overall there was a very small amount of linkage disequilibrium within each of the D.

mojavensis populations and in D. arizonae. Only three out of 36 pairwise comparisons were

significant using Fisher’s exact test in the Sonora population and only one out of ten

comparisons was significant in the Baja population. There were no significant pairwise

comparisons in the Mojave and Catalina Island populations. Within the Northern D. arizonae

population only one out of 153 comparisons showed a departure from independence. Tables of

pairwise comparisons and P-values are provided as a supplemental file (supplemental Table 1 at

http://www.genetics.org/supplemental/). The level of intragenic recombination in GstD1 is

shown in supplemental Table 2 (found at http://www.genetics.org/supplemental/).

The level of synonymous, nonsynonymous and noncoding sequence variation for all three

species is shown in Table 1. Overall D. arizonae contained the greatest number of segregating

synonymous sites (θs). Using 10,000 coalescent simulations the 95% interval of θs for the

Northern D. arizonae population was estimated (0.0208-0.0605) and compared with that of D.

mojavensis. For every comparison the level of synonymous variation in the Northern D.

arizonae population was significantly greater (Mojave, P < 0.001; Catalina Island, P < 0.001;

Sonora, P < 0.001; Baja, P = 0.001). A similar pattern between Northern D. arizonae and the

four D. mojavensis populations was observed comparing both synonymous and noncoding

12

sequence variation (Mojave, P < 0.001; Catalina Island, P < 0.001; Sonora, P < 0.001; Baja, P =

0.011). Within D. mojavensis, the Baja California population contains double the synonymous

variation of Sonora and Catalina Island populations and more than seven times that of the

Mojave population. Although it may appear that there is a greater amount of nonsynonymous

variation when all D. mojavensis populations are combined, this is because the four populations

are genetically isolated (see below). When comparing each D. mojavensis population separately,

with exception of the Catalina Island population (P = 0.021), the level of nonsynonymous

variation (θn) is not significantly different from the Northern D. arizonae population (95%

interval 0.0006-0.0047 based on 10,000 coalescent simulations).

Population structure: Population structure within D. mojavensis and D. arizonae was

estimated by calculating the FST parameter. Only synonymous and noncoding variation was used

in the calculation of FST in order to remove any possible bias due to selection at nonsynonymous

sites. To be certain there is no structure within each of the D. mojavensis population, pairwise

FST were calculated for each of the collection sites separately (supplemental Table 3 at

http://www.genetics.org/supplemental/). Most of the comparisons were significant with the

exception of the two collection sites within the Mojave population and between the three

collections sites within the Sonora population. When pooling all collections sites within a

population, there is significant genetic isolation between all the four D. mojavensis populations

(Table 2). With the exception of the comparison between the Catalina Island and Mojave

population (P = 0.99), a similar pattern was observed when using nonsynonymous sites

(supplemental Table 4 at http://www.genetics.org/supplemental/). This result was expected since

the Catalina Island and Mojave populations share the same protein haplotype. The Northern D.

13

arizonae population is composed of collections within the Sonoran desert and Riverside, CA.

There was no significant FST between those two collection sites (FST = 0.068, P = 0.09). When

pooling all the Northern D. arizonae collections and comparing it to the Southern D. arizonae,

significant genetic isolation was observed (FST = 0.59, P < 0.001).

Phylogenetic analysis: Both maximum likelihood (ML) and Bayesian methods of

phylogenetics analysis were used in this study. With very minor exceptions, both methods

resulted in the same phylogenetic structure. Figure 2 shows the result of the Bayesian analysis

(results of the ML analysis not shown). For the Bayesian analysis the evolutionary model chosen

was K80+I+Γ (I = 0.573, Γ = 0.883) with codon partitioning (KIMURA 1980). Both D.

mojavensis and D. arizonae are reciprocally monophyletic. The D. mojavensis clade is split into

two well supported groups, Baja/Sonora and Catalina Island/Mojave. The Southern D. arizonae

population forms a monophyletic group within the larger polyphyletic clade of D. arizonae.

Frequency and test of neutrality analysis: Given the above determined population structure

within the two species, demographic changes within each population could be determined with

less bias by analyzing each population separately. Table 3 contains the summary statistics for

several frequency distributions tests done on each of the D. mojavensis and D. arizonae

populations (analysis of D. navojoa is not shown given low statistical power due to the small

sample size). In the Baja D. mojavensis population the above statistics were all negative and the

majority significantly. The only other D. mojavensis population with negative statistics was the

Sonora population, although none were significant. In both D. arizonae populations the majority

of the statistics were negative (see Table 3).

14

The distribution of sequence divergence across GstD1, especially that of nonsynonymous

divergence, appears to be heterogeneous in the Baja California and mainland populations, while

variation is distributed homogenously within the Catalina Island and Mojave populations (Figure

3). Another method of comparing the level fixed differences and polymorphisms is the MK test

(Table 4). This test was implemented in two ways (standard and lineage-specific), the first

compared each population against the combined dataset of the sister species (e.g. Baja D.

mojavensis vs. all D. arizonae) and the second used the sequence of the outgroup D. navojoa to

polarize the fixed differences to be able to examine lineage specific departures from neutrality.

Although the MK test is robust to demographic changes (NIELSEN 2001), the effect of such

changes on these analyses cannot be fully determined until a robust knowledge of the

demographic histories of the D. mojavensis populations and D. arizonae is ascertained (a large

scale multilocus analysis of the demographic histories of D. mojavensis and D. arizonae is in

progress). Another factor that can affect the interpretation of the MK test is if the gene occurs in

a region of low recombination (SHAPIRO et al. 2007). Genes in regions of low recombination,

such as those located at either the centromeric or telomeric ends of chromosomes are more

drastically affected by hitchhiking (ANDOLFATTO and PRZEWORSKI 2001; SMITH and HAIGH

1974) and background selection (CHARLESWORTH et al. 1993) at nearby genes. Recombination

rates across chromosomes in D. mojavensis have not been determined, but given that GstD1 is

located in the middle of Muller element E (CONSORTIUM 2007) it is not likely to be in a low

recombining region of the genome. An assumption of the MK test is that all synonymous

positions be silent; this is not the case in the presence of codon usage bias (AKASHI 1999;

KLIMAN and HEY 1994). The effective number of codons (ENC) is a measure of codon usage

15

bias which ranges from 20 (extreme bias) to 61 (equal use of all codons) (WRIGHT 1990). In Gst,

the ENC in D. mojavensis (32.4) and D. arizonae (32.8) are similar and do not show a very high

level of bias, therefore codon usage bias is not expected to greatly affect the analysis.

Results for both Baja and Sonora populations of D. mojavensis significantly reject the

assumptions of neutrality of the MK test in the direction of a greater fixation of nonsynonymous

substitutions (Table 4). For example, in the Baja population the ratio of nonsynonymous to

synonymous polymorphisms is 0.25 while the same ratio for fixed differences that occur in the

lineage leading to the Baja D. mojavensis populations is 1.75. A significant departure from the

neutral expectation was not found in the other two D. mojavensis populations or in D. arizonae

(Table 4).

Given the relatively recent divergence of D. mojavensis and D. arizonae, a large number of

fixations between the species would not be expected. If the divergence time between the species

is much less than 4Ne generations ago the incomplete assortment of the ancestral variation would

be expected. To better estimate the divergence between the two species all the population data

on D. mojavensis and D. arizonae have been compiled including data from studies of the

following loci: from Adh-1, Adh-2 and G6pd (MATZKIN 2004; MATZKIN and EANES 2003), the

10 random nuclear loci from MACHADO et al. (2007), and GstD1 (this study). This data was

used to perform 10,000 coalescent simulation using the HKA program (J. HEY;

http://lifesci.rutgers.edu/heylab/). The divergence time between D. mojavensis and D. arizonae

was estimated to have occurred 3.3Ne generations ago. According to CLARK (1997) this would

suggest that the majority of the once shared variation would have been fixed or lost in one or the

16

other lineage. Any polymorphism that has not fixed in one or possibly both species would

increase the number of polymorphisms within species and decrease the amount of fixed

differences between the species. This essentially will make it difficult to observe a pattern of

variation suggesting adaptive protein evolution, as was seen for GstD1 in this study.

To better determine the effect of the divergence time between D. mojavensis and D. arizonae

on the ability to detect a significant pattern of adaptive protein evolution a series of coalescent

simulations were performed. The simulations were performed only for the two populations that

show a significant departure from neutrality, Baja California and Sonora (Table 4). Using the ms

program (HUDSON 2002), 10,000 coalescent simulations were performed of a D. mojavensis

population splitting from D. arizonae 3.3Ne generations ago. For the Baja California and Sonora

populations θ was set to set 4.48 and 3.54, respectively (θ values are from GstD1). Since the

focus was on the strict neutral model the observed level of variation at GstD1 was used to

calculate the probability that a given mutation from a simulated replicate would be assigned to

either a synonymous or non-synonymous site class (0.12 for Baja California and 0.19 for

Sonora). For each replicate, a MK test was performed and significance was calculated using a χ2

test. For the Baja California population the probability of observing a significant MK test

suggesting adaptive protein evolution under the neutral model is P = 0.015 and for the Sonora

population is P = 0.049. Therefore, it is unlikely that the significant results observed in this

study are a result of the relatively short divergence time between D. mojavensis and D. arizonae.

17

DISCUSSION

Population history of D. mojavensis and D. arizonae: The pattern of variation at GstD1

strongly supports the existence of four genetically isolated populations for D. mojavensis (Table

2). This observation supports previously studies using both nuclear and mitochondrial genes

(MACHADO et al. 2007; REED et al. 2007; ROSS and MARKOW 2006). A point of disagreement

between studies on nuclear genes (MACHADO et al. 2007), including this study, and

mitochondrial genes (REED et al. 2007) is the relationships between the four D. mojavensis

populations. As seen in GstD1, nuclear genes show that the Baja and Mainland populations form

one cluster and the Catalina Island and Mojave populations form another (Figure 2).

Mitochondrial genes suggest that the Catalina Island population is embedded within the

Baja/Mainland cluster and that the Mojave population is most basal. The mitochondrial pattern

contradicts the previous proposed model of the speciation of D. mojavensis and D. arizonae.

RUIZ et al. (1990) proposed that D. mojavensis diverged from D. arizonae in the Baja California

peninsula by an allopatric event, then a small subpopulation migrated north to the Mojave desert

and eventually became isolated. A further colonization event from this isolated Mojave

population lead to the Catalina Island population and in a similar fashion the Sonora population

was later formed by colonizers from the Baja peninsula. According to this hypothesis it is

expected that the greatest diversity will be observed within the Baja population and that the

Catalina Island and Sonora populations will be clustered within the Mojave and Baja

populations, respectively. Evidence from GstD1 (present study), Adh-1/Adh-2 (MATZKIN 2004;

MATZKIN and EANES 2003) and a random set of ten nuclear markers (MACHADO et al. 2007)

show that there is a greater level of nucleotide variation within the Baja population (Table 1).

18

Although the phylogeographic pattern at GstD1 support a Baja/Sonora and Catalina

Island/Mojave clustering (Figure 2), the Catalina Island population did not appear to be nested

within Mojave and relationships between Baja and Sonora could not be fully resolved.

The level of synonymous variation in the Northern D. arizonae is significantly greater than in

all four of the D. mojavensis populations suggesting a greater effective population size in the

Northern D. arizonae (Table 1). The level of variation in the Southern D. arizonae population

samples appears to be similar to that of D. mojavensis, but only few individuals were available

from that population. Similarly to D. mojavensis, disagreements between nuclear and

mitochondrial markers are observed in D. arizonae. A greater effective population size of D.

arizonae relative to D. mojavensis has previously been observed for nuclear genes (MACHADO et

al. 2007; MATZKIN 2004; MATZKIN and EANES 2003), while the opposite was observed for the

mitochondria (REED et al. 2007). A different pattern of variation between nuclear (diparentally

inherited) and mitochondrial (maternally inherited) markers could be due to differences in

dispersal rates between the sexes, the mating system (CHESSER and BAKER 1996), selection on

the mitochondria (BAZIN et al. 2006) or possibly due to genetic draft associated with

endosymbionts (MEIKLEJOHN et al. 2007). No sex specific dispersal rates were previously

observed in D. mojavensis, but this was only examined in one of the four populations (MARKOW

and CASTREZANA 2000). Spiroplasma endosymbionts have been found in D. mojavensis

(MATEOS et al. 2006), although the nature of this interaction has not been resolved. These

discrepancies in results between nuclear and mitochondrial genes warrant future examination.

19

Evolutionary history of GstD1: The cactophilic lifestyle of D. mojavensis and D. arizonae, as

well as other members of the Repleta group, subject them to a wide array of chemical, sometimes

toxic, environments (KIRCHER 1982). This chemical diversity is dependent on both the cactus

host and the bacterial and yeast communities associated with the necrosis (STARMER 1982;

STARMER et al. 1986). For example differences in alcohols between agria and organpipe were

proposed to have driven the molecular and functional divergence of Adh between the Baja and

Sonora populations (MATZKIN 2004; MATZKIN 2005). In addition to alcohols, a variety of other

compounds are experienced by populations of D. mojavensis, from the alkaloids of Opuntia in

Catalina Island to the triterpenes and glycosides of the columnar cacti (agria and organpipe) of

both Baja and Sonora (KIRCHER 1977; KIRCHER 1980; KIRCHER 1982). Although chemical

differences occur between agria and organpipe necrosis, these two columnar cacti share many

chemical properties. Agria and organpipe are closely related, both members of the same genus,

Stenocereus, and are distantly related and more derived relative to barrel cactus and even further

to prickly pear (COTA and WALLACE 1997; NYFFELER 2002). The phylogenetic relationships,

and thus the chemical relatedness between the cactus hosts (agria/organpipe vs. barrel/prickly

pear) appear to associate closely with the pattern of variation at GstD1 (Baja/Sonora vs. Catalina

Island/Mojave).

In the Baja and Sonora populations, a large level of interspecific amino acid divergence in the

5’ and 3’ end of the GstD1 locus was observed (Figure 3). Although the Baja and Sonora

populations appear significantly diverged when examining synonymous positions (Table 2), they

appear to have preserved a very similar protein haplotype for GSTD1. Similarly, the populations

in Mojave and Catalina Island share the same protein haplotype, but there are no fixed amino

20

acid differences at GSTD1 between these populations and D. arizonae (Table 4). Under a

neutral scenario these strikingly different patterns of variation between the Baja/Sonora and

Catalina Island/Mojave populations could reflect a reduction in population size in the

Baja/Sonora lineage that allowed for the fixation of slightly deleterious amino acid

polymorphisms and a subsequent maintenance of a large population size in Catalina

Island/Mojave that would deter the fixation of slightly deleterious amino acid polymorphisms.

This scenario, which excludes selection as the driver for the pattern of variation seen between

populations, is not supported by the observations made in this study. A significant shift in the

frequency spectrum was observed for the Baja population and a similar pattern was seen for the

Sonora population (Table 3). This shift in the frequency spectrum can be either a product of

demographic changes, such as population expansion, or positive selection (NIELSEN 2001). The

effect of a population expansion will be genome wide and therefore a similar signature at many

loci should be expected. Recently, a combined analysis of ten random regions across the

genome of D. mojavensis observed that for all four populations there was an overall signature of

population expansion (MACHADO et al. 2007). In that study and as well as in this, it is clear that

the level of variation, and hence population size, is greatest in the Baja and Sonora populations

relative to the Catalina Island and Mojave populations. Furthermore, as the level of variation in

Catalina Island and Mojave is much lower than for D. arizonae, it seems unlikely that the lack of

amino acid differences is associated with a large population size in Catalina Island and Mojave.

Therefore it appears that the pattern of variation observed could have been formed as a result of

positive selection in the Baja/Sonora lineage and purifying selection in the Catalina

Island/Mojave lineage.

21

A significant departure from the neutral expectation is observed in the Baja and Sonora

populations (Table 4). Furthermore, the lineage specific MK test (Table 4) suggests that more

amino acid changes occurred in the lineage leading to the Baja and Sonora populations of D.

mojavensis than is expected by chance. Both Baja and Sonora share the same amino acid

differences with D. arizonae, with the exception of residue 178, which is fixed in Baja and found

in all but one individual sampled from Sonora. The location of the seven amino acids provides

further evidence that this locus has gone through a period of adaptive protein evolution in the

past. Two of the seven fixed differences between D. arizonae and D. mojavensis from Baja and

Sonora (Leu-7-Gln and His-39-Gln) occur in the active site pocket of the GSTD1 enzyme. The

crystallographic structure and active site has been determined previously for the A. gambiae

homolog of the D. mojavensis GSTD1 (CHEN et al. 2003). The active site pocket is composed of

two subsites, the G-site and H-site. The G-site, mostly composed of hydrophilic and polar

residues, is the location where glutathione binds, while the H-site is a hydrophobic pocket where

the substrate binds (CHEN et al. 2003). The histidine (positively charged) to glutamine (polar)

change at residue 39 occurs in the G-site. Although this substitution might appear to cause a

charge change in the molecule, given the isoelectric point of histidine (7.6) at a physiological pH

(nearly neutral) the side chain will not be charged. The leucine (hydrophobic) to glutamine

(polar) change occurs in the H-site. The location of these two changes, plus the fact that they

also affect the polarity of the active site, could have significant consequences for the activity and

substrate specificity of GSTD1.

It has been proposed (HEED 1982) that the host of the D. mojavensis/D. arizonae ancestor was

prickly pear. Evidence from this study and previous surveys (MACHADO et al. 2007) suggests

22

that the Baja/Sonora and Catalina Island/Mojave clusters are well supported. Therefore it can be

hypothesized that following the divergence of D. mojavensis, a population settled in Baja or

Sonora and shifted to columnar cacti, while a population migrated north remaining in prickly

pear and subsequently moved to barrel cactus. The shift to columnar cacti by the ancestral

Baja/Sonora population exposed them to alternative compounds. It is at this moment in its recent

evolutionary history that it appears that selection at GstD1 took place. Given the pattern of

sequence divergence observed in this study, its transcriptional variation (MATZKIN et al. 2006)

and the known role of GstD1 in detoxification, this strongly suggests that GstD1 played a role in

the columnar cactus host adaptation of Baja and Sonora populations of D. mojavensis.

Transcriptional and coding sequence evolution: Cactophilic Drosophila, as well as many

other phytophagous insects have adapted to the chemical composition of their host (BECERRA

1997). The effects of and response to such chemical diversity is especially great in the presence

of a host shift. It is at this point where the ability to modulate enzyme activity and pathway

fluxes, especially those involved in detoxification, are of great importance.

Enzymes tend to have as their substrate a set of related compounds, when faced with a non-

optimal compound the velocity (i.e. enzyme activity) of the reaction will be reduced. To

maintain wild type activity, changes at the coding sequence (e.g. modification of active site) or in

the enzyme titer (e.g. increase expression) will be needed. When faced with an environmental

shift (e.g. cactus host shift) four outcomes can be predicted for an enzyme, termed here as: 1).

neutral, 2). coding sequence, 3). transcriptional, and 4). coding sequence and transcriptional.

23

In the first case neutral changes in the coding sequence and in the expression of the enzyme

will accumulate because the enzyme has no involvement in the detoxification or because of

constraint. The second outcome involves the fixation of new non-synonymous mutations that

modify the active site, and hence increase the enzyme activity, although new mutations are not

necessary for this outcome (HARTL et al. 1985). For the last two outcomes changes in the ability

to modulate gene expression are needed. Given the structure of transcription binding sites it is

probable that mutations can create a potential binding domain (STONE and WRAY 2001). As a

response to an environmental shift the titer and expression of transcription factors may be

affected such that the previously non-functional binding domain will be activated and might

subsequently increase the expression of the enzyme and hence increase activity. Therefore, it is

these genetic backgrounds that have the ability to modulate gene expression that will increase in

frequency (third outcome). The pattern of variation in GstD1 highlights the fourth outcome,

which in addition to transcriptional modulation there are significant changes at the protein level.

Evolutionary change, as observed in D. mojavensis, can therefore involve a combination of

transcriptional and coding sequence changes.

ACKNOWLEDGMENTS

I like to thank Kristin Sweetser for all her hard work on this project and Therese Markow for

assistance and support. Therese Markow, Carlos Machado, Jeremy Bono and two anonymous

reviewers provided helpful suggestions and comments on the manuscript. This study was

supported by the NIGMS 2K12 G000798-06 Postdoctoral Excellence in Research and Teaching

(PERT) fellowship through the Center for Insect Science at the University of Arizona to LMM.

24

LITERATURE CITED

AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and

divergence data: Statistical power to detect directional selection under stationarity and

free recombination. Genetics 151: 221-238.

ANDOLFATTO, P., and M. PRZEWORSKI, 2001 Regions of lower crossing over harbor more rare

variants in African populations of Drosophila melanogaster. Genetics 158: 657-665.

ATKINS, W. M., R. W. WANG, A. W. BIRD, D. J. NEWTON and A. Y. H. LU, 1993 The catalytic

mechanism of Glutathione-S-Transferase (Gst): Spectroscopic determination of the

Pk(A) of Tyr-9 in rat Alpha-1-1 Gst. J. Biol. Chem. 268: 19188-19191.

BAZIN, E., S. GLEMIN and N. GALTIER, 2006 Population size does not influence mitochondrial

genetic diversity in animals. Science 312: 570-572.

BECERRA, J., 1997 Insects on plants: Macroevolutionary chemical trends in host use. Science

276: 253-256.

BIERNE, N., and A. EYRE-WALKER, 2004 The genomic rate of adaptive amino acid substitution

in Drosophila. Mol. Biol. Evol. 21: 1350-1360.

BUSTAMANTE, C. D., A. FLEDEL-ALON, S. WILLIAMSON, R. NIELSEN, M. T. HUBISZ et al., 2005

Natural selection on protein-coding genes in the human genome. Nature 437: 1153-1157.

CHARLESWORTH, B., M. T. MORGAN and D. CHARLESWORTH, 1993 The effect of deleterious

mutations on neutral molecular variation. Genetics 134: 1289-1303.

CHEN, L., P. R. HALL, X. E. ZHOU, H. RANSON, J. HEMINGWAY et al., 2003 Structure of an

insect [delta]-class glutathione S-transferase from a DDT-resistant strain of the malaria

vector Anopheles gambiae. Acta Crystallogr. D 59: 2211-2217.

25

CHESSER, R., and R. BAKER, 1996 Effective sizes and dynamics of uniparentally and diparentally

inherited genes. Genetics 144: 1225-1235.

CLARK, A. G., 1990 Inference of haplotypes from PCR-amplified samples of diploid

populations. Mol. Biol. Evol. 7: 111-122.

CLARK, A. G., 1997 Neutral behavior of shared polymorphism. Proc. Natl. Acad. Sci. 94: 7730-

7734.

CONSORTIUM, D. C. G. S. A. A., 2007 Genomics on a phylogeny: Evolution of genes and

genomes in the genus Drosophila. Nature in press.

COTA, J. H., and R. S. WALLACE, 1997 Chloroplast DNA evidence for divergence in Ferocactus

and its relationships to North American columnar cacti (cactaceae: Cactoideae). Syst.

Bot. 22: 529-542.

ENAYATI, A. A., H. RANSON and J. HEMINGWAY, 2005 Insect glutathione transferases and

insecticide resistance. Insect Mol. Biol. 14: 3-8.

EXCOFFIER, L., G. LAVAL and D. BALDING, 2003 Gametic phase estimation over large genomic

regions using an adaptive window approach. Hum. Genom. 1: 7-19.

EXCOFFIER, L., G. LAVAL and S. SCHNEIDER, 2005 Arlequin ver. 3.0: An integrated software

package for population genetics data analysis. Evol. Bioinformatics Online 1: 47-50.

FAY, J. C., and C. I. WU, 2000 Hitchhiking under positive darwinian selection. Genetics 155:

1405-1413.

FAY, J. C., G. J. WYCKOFF and C. I. WU, 2002 Testing the neutral theory of molecular evolution

with genomic data from Drosophila. Nature 415: 1024-1026.

FELLOWS, D. F., and W. B. HEED, 1972 Factors affecting host plant selection in desert-adapted

cactiphilic Drosophila. Ecology 53: 850-858.

26

FU, Y. X., 1997 Statistical tests of neutrality of mutations against population growth, hitchhiking

and background selection. Genetics 147: 915-925.

FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693-709.

GILAD, Y., A. OSHLACK, G. K. SMYTH, T. P. SPEED and K. P. WHITE, 2006 Expression profiling

in primates reveals a rapid evolution of human transcription factors. Nature 440: 242-

245.

HALPERIN, E., and E. ESKIN, 2004 Haplotype reconstruction from genotype data using imperfect

phylogeny. Bioinformatics 20: 1842-1849.

HARTL, D. L., D. E. DYKHUIZEN and A. M. DEAN, 1985 Limits of adaptation: The evolution of

selective neutrality. Genetics 111: 655-674.

HEED, W. B., 1978 Ecology and genetics of sonoran desert Drosophila, pp. 109-126 in

Ecological genetics: The interface, edited by P. F. BRUSSARD. Springer-Verlag.

HEED, W. B., 1982 The origin of Drosophila in the sonoran desert, pp. 65-80 in Ecological

genetics and evolution: The cactus-yeast-Drosophila model system, edited by J. S. F.

BARKER and W. T. STARMER. Academic Press, New York.

HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159:

1805-1817.

HUDSON, R. R., 2002 Generating samples under a wright-fisher neutral model of genetic

variation. Bioinformatics 18: 337-338.

HUELSENBECK, J. P., and F. RONQUIST, 2001 MrBayes: Bayesian inference of phylogenetic trees.

Bioinformatics 17: 754-755.

KIMURA, M., 1980 A simple method for estimating evolutionary rates of base substitutions

through comparative studies of nucleotide-sequences. J. Mol. Evol. 16: 111-120.

27

KING, M.-C., and A. C. WILSON, 1975 Evolution at two levels in humans and chimpanzees.

Science 188: 107-116.

KIRCHER, H. W., 1977 Triterpene glycosides and queretaroic acid in organ pipe cactus.

Phytochemistry 16: 1078.

KIRCHER, H. W., 1980 Triterpenes in organ pipe cactus. Phytochemistry 19: 2707.

KIRCHER, H. W., 1982 Chemical composition of cacti and its relationship to sonoran desert

Drosophila, pp. 143-158 in Ecological genetics and evolution: The cactus-yeast-

Drosophila model system, edited by J. S. F. BARKER and W. T. STARMER. Academic

Press, New York.

KLIMAN, R. M., and J. HEY, 1994 The effects of mutation and natural-selection on codon bias in

the genes of Drosophila. Genetics 137: 1049-1056.

MACHADO, C. A., L. M. MATZKIN, L. K. REED and T. A. MARKOW, 2007 Multilocus nuclear

sequences reveal intra- and interspecific relationships among chromosomally

polymorphic species of cactophilic Drosophila. Mol. Ecol. 16: 3009-3024.

MARKOW, T. A., and S. CASTREZANA, 2000 Dispersal in cactophilic Drosophila. Oikos 89: 378-

386.

MATEOS, M., S. J. CASTREZANA, B. J. NANKIVELL, A. M. ESTES, T. A. MARKOW et al., 2006

Heritable endosymbionts of Drosophila. Genetics 174: 363-376.

MATZKIN, L. M., 2004 Population genetics and geographic variation of alcohol dehydrogenase

(Adh) paralogs and glucose-6-phosphate dehydrogenase (G6pd) in Drosophila

mojavensis. Mol. Biol. Evol. 21: 276-285.

MATZKIN, L. M., 2005 Activity variation in alcohol dehydrogenase paralogs is associated with

adaptation to cactus host use in cactophilic Drosophila Mol. Ecol. 14: 2223-2231.

28

MATZKIN, L. M., and W. F. EANES, 2003 Sequence variation of alcohol dehydrogenase (Adh)

paralogs in cactophilic Drosophila. Genetics 163: 181-194.

MATZKIN, L. M., T. D. WATTS, B. G. BITLER, C. A. MACHADO and T. A. MARKOW, 2006

Functional genomics of cactus host shifts in Drosophila mojavensis. Mol. Ecol. 15: 4635-

4643.

MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in

Drosophila. Nature 351: 652-654.

MEIKLEJOHN, C. D., K. L. MONTOOTH and D. M. RAND, 2007 Positive and negative selection on

the mitochondrial genome. Trends in Genetics 23: 259-263.

NIELSEN, R., 2001 Statistical tests of selective neutrality in the age of genomics. Heredity 86:

641-647.

NUZHDIN, S., M. WAYNE, K. HARMON and L. MCINTYRE, 2004 Common pattern of evolution of

gene expression level and protein sequence in Drosophila. Mol. Biol. Evol. 21: 1308-

1317.

NYFFELER, R., 2002 Phylogenetic relationships in the cactus family (cactaceae) based on

evidence from trnK/matK and trnL-trnF sequences. Am. J. Bot. 89: 312-326.

NYLANDER, J. A. A., 2004 MrModeltest v2. Evolutionary Biology Centre, Uppsala University,

Uppsala, Sweden.

POSADA, D., and K. A. CRANDALL, 1998 Modeltest: Testing the model of DNA substitution.

Bioinformatics 14: 817-818.

RANSON, H., L. ROSSITER, F. ORTELLI, B. JENSEN, X. L. WANG et al., 2001 Identification of a

novel class of insect Glutathione s-transferases involved in resistance to DDT in the

malaria vector Anopheles gambiae. Biochem. J. 359: 295-304.

29

REED, L. K., M. NYBOER and T. A. MARKOW, 2007 Evolutionary relationships of Drosophila

mojavensis geographic host races and their sister species Drosophila arizonae. Mol. Ecol.

16: 1007-1022.

RIFKIN, S. A., J. KIM and K. P. WHITE, 2003 Evolution of gene expression in the Drosophila

melanogaster subgroup. Nat. Genet. 33: 138-144.

RONQUIST, F., and J. P. HUELSENBECK, 2003 MrBayes 3: Bayesian phylogenetic inference under

mixed models. Bioinformatics 19: 1572-1574.

ROSS, C. L., and T. A. MARKOW, 2006 Microsatellite variation among diverging populations of

Drosophila mojavensis. J. Evol. Biol. 19: 1691-1700.

ROZAS, J., J. C. SANCHEZ-DELBARRIO, X. MESSEGUER and R. ROZAS, 2003 DnaSP, DNA

polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-

2497.

RUIZ, A., and W. B. HEED, 1988 Host-plant specificity in the cactophilic Drosophila mulleri

species complex. J. Anim. Ecol. 57: 237-249.

RUIZ, A., W. B. HEED and M. WASSERMAN, 1990 Evolution of the mojavensis cluster of

cactophilic Drosophila with descriptions of two new species. J. Hered. 81: 30-42.

SHAPIRO, J. A., W. HUANG, C. ZHANG, M. J. HUBISZ, J. LU et al., 2007 Adaptive genic evolution

in the Drosophila genomes. Proc. Natl. Acad. Sci. 104: 2271-2276.

SMITH, J. M., and J. HAIGH, 1974 Hitch-hiking effect of a favorable gene. Genet. Res. 23: 23-35.

STARMER, W. T., 1982 Analysis of the community structure of yeasts associated with the

decaying stems of cactus. I. Stenocereus gummosus Microb. Ecol. 8: 71-81.

30

STARMER, W. T., J. S. F. BARKER, H. J. PHAFF and J. C. FOGLEMAN, 1986 Adaptations of

Drosophila and yeasts: Their interactions with the volatile 2-propanol in the cactus

microorganism Drosophila model system. Aust. J. Biol. Sci. 39: 69-77.

STONE, J., and G. WRAY, 2001 Rapid evolution of cis-regulatory sequences via local point

mutations. Mol. Biol. Evol. 18: 1764-1770.

SWOFFORD, D. L., 2003 Paup*: Phylogenetic analysis using parsimony (*and other methods),

version 4.0b10. Sinauer Associates, Sunderland, MA.

TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA

polymorphism. Genetics 123: 585-595.

TANG, A. H., and C. P. D. TU, 1994 Biochemical characterization of Drosophila Glutathione s-

transferases D1 and D21. J. Biol. Chem. 269: 27876-27884.

VACEK, D. C., 1979 The microbial ecology of the host plants of Drosophila mojavensis. Thesis,

University of Arizona, Tucson, AZ.

VOIGHT, B. F., S. KUDARAVALLI, X. WEN and J. K. PRITCHARD, 2006 A map of recent positive

selection in the human genome. PLoS Biology 4.

WRIGHT, F., 1990 The 'effective number of codons' used in a gene. Gene 87: 23-29.

31

TABLE 1

Synonymous, nonsynonymous and noncoding sequence variation for GstD1 in D. mojavensis, D. arizonae and D. navojoa

Synonymous Nonsynonymous Synonymous and noncoding

N Ss πs θs Sn πn θn Ss+nc πs+nc θs+nc

D. mojavensis all 63 17 0.0168 0.0241 15 0.0110 0.0067 28 0.0210 0.0237

Mojave 15 1 0.0023 0.0021 2 0.0012 0.0013 1 0.0014 0.0012

Catalina Island 8 3 0.0081 0.0077 0 0.0000 0.0000 3 0.0048 0.0046

Sonora 23 4 0.0083 0.0073 4 0.0016 0.0023 9 0.0104 0.0097

Baja 17 8 0.0068 0.0158 3 0.0019 0.0019 12 0.0075 0.0142

D. arizonae all 25 26 0.0350 0.0461 6 0.0029 0.0033 31 0.0265 0.0332

Northern 20 21 0.0310 0.0396 4 0.0019 0.0024 24 0.0211 0.0273

Southern 5 2 0.0054 0.0064 0 0.0000 0.0000 3 0.0048 0.0058

D. navojoa all 4 7 0.0233 0.0255 1 0.0011 0.0011 11 0.0232 0.0246

N, number of lines sampled; S, number of segregating sites; π, mean pairwise difference per bp; θ, nucleotide diversity per bp.

32

TABLE 2

Pairwise FST between the four D. mojavensis populations

Mojave Catalina Island Sonora

Catalina Island 0.654***

Sonora 0.842*** 0.788***

Baja 0.902*** 0.855*** 0.252***

The data was permuted 10,000 times in order to calculate significance. *** P < 0.001.

33

TABLE 3

Summary statistics of frequency distribution tests

Species Population Tajima's D A Fu's FS B Fu & Li's D C Fay & Wu's H D

D. mojavensis Mojave 0.035 -0.488 1.062 0.724

Catalina Island 0.204 0.562 0.165 0.714

Sonora -0.146 -9.732 -0.221 -0.431

Baja -1.426* -5.194† -2.138* -1.412

D. arizonae Northern -0.878* -8.391 -0.752 3.600

Southern -1.049 -0.186 -1.569 0.900

A TAJIMA 1989; B FU 1997; C FU and LI 1993; D FAY and WU 2000

† P < 0.1, * P < 0.05

34

TABLE 4

Fixed and polymorphic variation at GstD1 and tests for departure from neutrality

Standard MK test Lineage-specific MK test

Species Population Fixed Polymorphic Test Fixed Polymorphic Test D. mojavensis Mojave syn. 7 32 Fisher's 5 1 Fisher's nonsyn. 0 7 ns 0 2 ns Catalina syn. 6 34 Fisher's 4 3 N/A Island nonsyn. 0 6 ns 0 0 Sonora syn. 6 40 G-test 4 9 G-test nonsyn. 6 10 * 6 4 ns Baja syn. 6 43 G-test 4 12 G-test nonsyn. 7 9 ** 7 3 * D. arizonae Northern syn. 3 52 Fisher's 2 24 Fisher's nonsyn. 0 18 ns 0 4 ns Southern syn. 9 31 G-test 8 3 Fisher's nonsyn. 1 15 ns 1 0 ns

Synonymous sites include changes within both the coding and noncoding regions. * P < 0.05, ** P < 0.01

35

FIGURE 1. Collecting locations for D. mojavensis (squares), D. arizonae (circles) and D. navojoa

(triangles): 1. Catalina Island, CA; 2. Anza-Borrego, CA; 3. Grand Canyon, AZ; 4. Hermosillo,

Mex; 5. Desemboque, Mex; 6. Guaymas, Mex; 7. La Paz, Mex; 8. Riverside, CA; 9. Tucson, AZ;

10. Navojoa, Mex; 11. Hidalgo, Mex; 12. Chiapas, Mex; 13. El Dorado, Mex; 14. Jalisco, Mex;

15. Tehuantepec, Mex.

FIGURE 2. Bayesian tree analysis for all GstD1 sequences using the K80+I+Γ model (see text for

more details). The support for branches is shown. The species (solid brackets) and populations

(dashed brackets) are shown. The sample identification for D. mojavensis are: DE =

Desemboque, Mex; HE = Hermosillo, Mex; MJ = Guaymas, Mex; MJBC = La Paz, Mex; CI =

Catalina Island, CA; ANZA = Anza-Borrego, CA; WC = Grand Canyon, AZ. The sample

identification for D. arizonae are: ARNA = Navojoa, Mex; ARTU = Tucson, AZ; RVSD =

Riverside, CA; CHI = Chiapas, Mex; HID and MXT = Hidalgo, Mex. The sample identification

for D. navojoa are: NAV1 = Tehuantepec, Mex; NAV10 = El Dorado, Mex; NAV11 and

NAV12 = Jalisco, Mex.

FIGURE 3. Distribution of the non-synonymous to synonymous divergence (Ka/Ks) using a

sliding window analysis (window size = 50 bp, step size = 10 bp). The divergence between each

of the four D. mojavensis population and D. arizonae is shown. The circles indicate the seven

non-synonymous changes that occur between the Baja and Sonora populations of D. mojavensis

and D. arizonae. The filled circles are those changes that occur in the active site of GSTD1.

36

37

38