31
Classification BIOLOGICAL SCIENCES/Evolution Divergence of Drosophila melanogaster repeatomes in response to a sharp microclimate contrast in ‘Evolution Canyon’, Israel Short title Divergence of Drosophila repeat elements Young Bun Kim 1 , Jung Hun Oh 2 , Lauren J. McIver 1 , Eugenia Rashkovetsky 3 , Katarzyna Michalak 1 , Harold R. Garner 1,4 , Lin Kang 1 , Eviatar Nevo , Abraham Korol , Pawel Michalak 1,4,5§ 1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA 2 Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA 3 Institute of Evolution, Haifa University, Haifa, Israel 4 Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA 5 Department of Fish and Wildlife Conservation, Virginia Tech, Blacksburg, VA 24061, USA §These senior authors contributed equally to the paper

Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Classification

BIOLOGICAL SCIENCES/Evolution

Divergence of Drosophila melanogaster repeatomes in response to a sharp microclimate contrast in ‘Evolution Canyon’, Israel

Short title

Divergence of Drosophila repeat elements

Young Bun Kim1, Jung Hun Oh

2, Lauren J. McIver

1, Eugenia Rashkovetsky

3, Katarzyna

Michalak1, Harold R. Garner

1,4, Lin Kang

1, Eviatar Nevo

3§, Abraham Korol

3§, Pawel

Michalak1,4,5§

1Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA

2Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, NY

10065, USA 3Institute of Evolution, Haifa University, Haifa, Israel

4Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA

5Department of Fish and Wildlife Conservation, Virginia Tech, Blacksburg, VA 24061, USA

§These senior authors contributed equally to the paper

Page 2: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Abstract

Repeat sequences, especially mobile elements, make up large portions of most eukaryotic

genomes and provide enormous, albeit commonly underappreciated, evolutionary potential. We

analyzed repeatomes of Drosophila melanogaster that have been diverging in response to a

microclimate contrast in ‘Evolution Canyon’ (Mount Carmel, Israel), a natural evolutionary

laboratory with two abutting slopes at an average distance of only 200 meters posing a constant

ecological challenge to their local biotas. Flies inhabiting the colder and more humid North-

facing slope carried about 5% more transposable elements than those from the hot and dry

South-facing slope, in parallel to a suite of other genetic and phenotypic differences between the

two populations. Nearly 50% of all mobile element insertions were slope-unique, with many of

them disrupting coding sequences of genes critical for cognition, olfaction, and thermotolerance,

consistent with the observed patterns of thermotolerance differences and assortative mating.

Keywords: adaptive evolution, genome sequencing, mobile elements, microsatellites

Significance Statement

Repeatome with its enormous variability and internal epigenetic dynamics emerges as a critical

source of potentially adaptive changes and evolutionary novelties, as exemplified here by

Drosophila melanogaster from a sharp ecological contrast in North Israel. Flies derived from the

opposing sides of this long-studied microsite exhibit a significant difference in the contents and

distribution of mobile elements, as well as microsatellite allele frequencies, corresponding well

with earlier reported phenotypic patterns of stress resistance and assortative mating in the

system.

Page 3: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

\body

Introduction

One of the greatest surprises of comparative genomics has been the discovery that

eukaryotic genomes are loaded with ubiquitous repeat sequences, such as transposable elements

and tandem array repeats (satellites). Once relegated to ‘junk DNA’, today many repeat elements

emerge as potent functional genetic units, providing inspiration for new research initiatives (1).

Although traditionally viewed as selfish DNA, transposable elements (TEs) are increasingly

being recognized as agents of adaptive change and a rich source of evolutionary novelties,

including hybrid dysgenesis (2), insecticide resistance (3), mammalian placenta (4), vertebrate

adaptive immune system (4), as well as flower development and plant fitness (5). ‘Domesticated’

TEs can be repurposed to function as transcription factors, while others play various roles in

heterochromatin formation, genome stability, centromere binding, chromosome segregation,

meiotic recombination, TE silencing, programmed genome rearrangement, V(D)J recombination,

and translational regulation (4, 6). Even low-complexity sequence repeats, such as

microsatellites, may act as novel regulatory elements (7), enhance alternative splicing (8), and

provide new substrate for protein coding variation (9).

Transposable elements have been implicated in adaptations to environmental changes

either through somewhat elusive direct remobilization after stress (10) or more indirectly as a

source of new and pre-existing genetic variation (11). A striking example of adaptive TE

insertion polymorphism has been found in Drosophila melanogaster from Evolution Canyon

(Lower Nahal Oren, Israel) (12, 13), in which slopes 100–400 m apart differ dramatically in

aridity, solar radiation, and associated vegetation due to the higher insolation on the south-facing

slope (SFS) relative to the north-facing slope (NFS).The promoter region of hsp70Ba, one of the

five hsp70 paralogs that encode a major inducible heat-shock protein in D. melanogaster, was

polymorphic for 1.2 Kb P-element insertion, with the insert being 28 times more frequent in

NFS- than SFS-inhabiting flies. The P-element disruption was associated with decreased Hsp70

expression and heat-shock survival, but increased reproductive success in temperate

conditions(13). While elevated levels of Hsp70 are beneficial during thermotolerance challenges,

they tend to be deleterious for growth and development (14, 15).

In parallel with a common pattern of slope-specific adaptations observed across many

other taxa inhabiting this remarkable ecological microgradient (reviewed in (16, 17)), SFS-

derived D. melanogaster outperform their conspecific neighbors from NFS not only in basal and

inducible thermotolerance after diverse heat shocks (18-20), but also in resistance to desiccation

and starvation (18, 21). In addition to stress resistance, the two populations differ in oviposition

site preferences (18), male's courtship song parameters (22), sexual and reproductive behavior

(23) leading to partial assortative mating within slopes (24), and phenotypic plasticity for wing

morphology (25) – all this in spite of the physical proximity and migration between slopes (26).

A recent genome analysis of the two populations (27) reveals a number of chromosomal

regions of interslope divergence and low sequence polymorphism, suggestive of selective sweeps

among genes enriched for functions related to stimulus responses, developmental and

reproductive processes, remarkably consistent with our previous findings on the phenotypic

patterns of stress responses, life history, and mating functions. Sequence divergence in genes

underlying adaptive changes, coupled with assortative mating within populations, can be

interpreted as signatures of incipient speciation (21). The picture of adaptive divergence in the

natural system remains largely incomplete without looking into noncoding parts of the genomes,

Page 4: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

primarily those occupied by repeat elements. To fill the void, we have now used high-coverage

(40x) genome sequencing and analyzed variation of TEs and microsatellites in the two D.

melanogaster populations.

Results and Discussion

In D. melanogaster, TEs make up 22% of the whole genome, with LTR retrotransposons

being the most abundant, followed by LINE (long interspersed nuclear element)-like non-LTR

retrotransposons and terminal inverted repeat (TIR) DNA transposons (28). We identified a total

of 14,555 and 15,374 TEs in SFS and NFS populations, respectively. NFS had consistently more

TEs than SFS in all chromosomes and genomic regions, except TIRs and LTRs in coding

sequences and TIRs and (marginally) non-LTRs in 3’-UTRs (Tables 1 and 2; Wilcoxon test,

P=0.047). Different from the results reported by Kofler et al. (29), but consistent with Bartolomé

et al. (30) X chromosome had the lowest concentration of TEs. 5’-UTRs contained overall more

retroelement insertions than 3’-UTRs in both SFS and NFS, but the pattern was reversed for

TIRs. For more reliable sites after removing less than 10 reads coverage sites and overlapping

TE insertions, 9,877 and 10,926 TE insertions were identified in SFS and NFS, respectively. Out

of these filtered TE insertions, 2,174 (22%) and 2,452 (22%) were fixed (> 95%) in SFS and

NFS, respectively. Out of the 5,222 known TE insertions present in the reference genome, 2,871

(55%) and 2,951 (57%) were identified in SFS and NFS, respectively. A total of 6,964 (48%)

and 7,812 (51%) TE insertions were positioned uniquely in SFS and NFS, respectively (Tables 3

and 4), demonstrating the ubiquity of TE-induced polymorphism in the populations. Significant

interslope differentiation was produced by retroelements (non-LTR: chi square with Yates

correction = 15.293, df = 1, P < 0.0001; LTR: 8.508, P = 0.0035) rather than DNA transposons

(TIRs: 1.451, P = 0.228). Only 363 TEs in SFS and 518 TEs in NFS were slope-unique and fixed

(frequency >95%).

With a total of 84 differential insertion sites, I-elements were the most divergent TEs

between SFS and NFS (Table S1). I-elements are ~5.4 Kb, non-LTR retroelements belonging to

LINEs, having spread across natural populations of D. melanogaster in the early 20th

century

(31). I-elements are subject to female germline mobilization in the progeny of certain D.

melanogaster intraspecific crosses, responsible for a maternal-effect embryonic lethality known

as I-R type of hybrid dysgenesis (32). Quasimodo and roo LTRs were the next most slope-

divergent TEs, followed by LINE-like jockey and Doc. The most slope-differential TIR was P-

element, represented by 45 unique insertion sites. Interestingly, P-element insertions in the

Hsp70Ba promoter reported earlier (12, 13) were absent in the current samples, a result of either

demographic dynamics or selection against the insertion over the course of 13 years separating

the fly collections in the field. However, there were 76 other unique TE insertions in SFS and

107 in NFS within the putative promoter regions, including a P-element in a gene from the small

heat shock protein (HSP20) family, Hsp67Ba in SFS flies. Although expression of Hsp67Ba was

not directly investigated in D. melanogaster from Evolution Canyon, we have previously shown

that small Hsps such as Hsp40 and Hsp23 can contribute to thermotolerance in the system (33).

There were at least 20 significant GO term enrichments representing genes with slope-unique TE

insertions in their putative promoters, with functional activities related to hydrolases and

alternative splicing being most conspicuous (Tables S2 and S3).

Additional 426 genes were disrupted by 511 slope-specific TE insertions within coding

sequences, presumably leading to the genes’ functional inactivation. Interestingly, there were 20

cognition-related and 17 sensory perception-related genes affected by the inserts, including eight

Page 5: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

olfactory receptor and eight gustatory receptor genes, all critical for detecting food and avoiding

toxicants, as well as for courtship and mating. Cognition, sensory perception of chemical stimuli,

and olfaction were among the most significantly overrepresented GO terms (Benjamini-

Hochberg-adjusted P < 0.01) among genes with TE-disrupted coding sequences (Tables S4 and

S5). This slope-divergent transposition of mobile elements may contribute to courtship changes

and mating isolation. Indeed, we and others have observed various degrees of partial mating

isolation between NFS- and SFS-derived D. melanogaster over many years of fly collections in

Evolution Canyon (23, 24, 34) (see (35) for an exception). Twenty seven genes in SFS and 27

other genes in NFS were hit by multiple (2-3) slope-unique TEs within CDSs. For example,

olfactory receptor gene Or92a in SFS was disrupted by two different TEs, roo and hobo,

whereas CG11034 was affected by two independent F-element insertions and one jockey.

A question arises how transposition, which on average has deleterious effects, can result

in a certain population-unique functional enrichment. First, it should be noted that transposition

tends to be nonrandom with respect to insertion sites. For example, P-elements prefer a specific

palindromic arrangement of hydrogen bonding sites over a 14-bp region centered on their

insertion site (36). It is thus not entirely unexpected that certain gene families sharing common

structural motifs as well as functionalities are preferentially attracting TEs. It has also been

suggested that transcriptionally active house-keeping genes, such as Hsp70 paralogs, have a

permanently accessible chromatin structure making their promoters easier targets for mobile

elements (37, 38). These intragenic TEs thereafter segregate in natural populations and provide a

source of rampant genetic variation upon which natural selection and demographic processes

may act (11). Purifying selection against TEs or positive selection favoring particular TE

insertions may have operated differently on NFS and SFS, dependent on ecological conditions of

the contrasting slopes. Alternatively to differential selection regimes, it is possible that

demographic processes may have starkly different dynamics on the opposite slopes of Evolution

Canyon. Such environmental events as droughts and forest fires, like that one in December of

2010 that ravaged the Carmel mountains, including Evolution Canyon habitats (39), presumably

drive local Drosophila populations to very low numbers, followed by rapid repopulations and

expansions. Even without major environmental disasters, D. melanogaster natural populations

have been known to be subject to boom and bust cycles (40) boosting short-term Ne and enabling

short-term evolution act primarily on pre-existing intermediate-frequency genetic variants that

are swept the remainder of the way to fixation in a process known as soft sweep (41).

To investigate the potential relationship between molecular adaptations and TEs, we

tested whether TEs are found in the range of “non-neutral” single nucleotide polymorphism

regions of Tajima’s D <-2.0 and significant interslope differentiation, suggestive of disruptive

positive selection, as we reported elsewhere (27). Except for chromosome X (and 2L in SFS), the

density of TEs was increased in the regions relative to mean TE density per chromosome, even

as much as three times on 2R. The highest number of TEs (18 in NFS and 16 in SFS) was found

in the window 2R 230000-240000, all inserted in intergenic sequences. The ~60 Kb region

(9069408 to 9127928 bp) of interest on 3R that showed a high excess of SNPs assigned by a

Hidden Markov Model analysis to high-differentiating state contained four TEs in SFS (hobo,

roo, and two Quasimodos) and only two TEs in NFS (roo and Quasimodo), all within intergenic

DNA. The highest interslope differentiation was found in a “non-neutral” region of X

chromosome (positions 10520000-10530000): six TEs in SFS and only one in NFS (Fig. 1),

affecting an exon of CG9806 (SFS), intron of X11Lbeta (SFS and NFS), and the intergenic

sequence between the two genes (SFS).

Page 6: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Similar to other studies (29, 30), the highest concentration of TEs in both NFS and SFS

was on chromosome 4, and intergenic sequences accumulated more TEs than other genomic

regions. In D. melanogaster, chromosome 4 is a small (5–6 Mb) genetic element that normally

does not recombine (42), free to accumulate TEs and other repetitive elements. Although the TE

density tends to be locally higher in regions with low recombination rates, the relationship

between recombination rate and transposable site occupancy frequency exhibits a less consistent

pattern, seemingly dependent on TE type (43). Consistent with Rizon et al. (43), we observed

that the abundance of transposons (TIRs) but not retroelements (both LTR and non-LTR)

significantly negatively correlated with recombination rate along chromosome arms. However,

this effect again depended on the TE type. TIRs were significantly correlated with recombination

in all chromosomes in both NFS and SFS populations (Spearman R, -0.82 – -.44, p < 0.05),

whereas LTR and non-LTR retroelements showed a significant correlation only along X

chromosome in SFS (-0.44 – -0.42, p < 0.05). The significant negative correlation observed

between transposons and recombination rate suggests that selection acts against these TEs, and

that TIR remobilization is less efficient to compensate for recombination and selection effects

relative to retroelements. An opposite pattern was observed in Caenorhabditis elegans (44),

wherein TIR (but not LTR or non-LTR retrotransposon) density was positively correlated with

recombination rate.

Traditionally, microsatellite alleles have been used as neutral markers in population

genetic research (45), despite the fact that many microsatellites, especially those within coding

and regulatory sequences, may produce distinct morphological and behavioral phenotypes with

adaptive significance (9, 46, 47). Although intronic microsatellites were most abundant (39,127;

46%) in the sequenced genomes, we have identified as many as 8,562 (10%) and 692 (0.8%)

microsatellite loci in exons and putative promoter regions, respectively (Fig. 2). There was a

large asymmetry in the number of microsatellites in 5’- (14,141) compared to 3’-UTR introns

(500), a result of a 22 times higher number of introns in the former, as well as the average intron

size difference between the two regions. Previous surveys of microsatellite divergence between

NFS- and SFS-derived D. melanogaster have provided reports of high (12) and low (48) genetic

differentiation (Fst), an inconsistency probably due to demographic fluctuations in the

populations, as well as a very limited choice of marker loci in the pre-genomics past. Here we

used a total of 65,396 microsatellite loci to estimate Fst values in comparison with Fst estimates

from all genomic SNPs (27). The average Fst estimate across all microsatellites was 0.012, one

order of magnitude higher than that reported by others (48) but consistently lower than our

estimates of Fst from SNPs for the same genomes (Fig. 2). The exclusion of exonic

microsatellites slightly increases the mean Fst estimate (0.014). There was a significant positive

correlation between microsatellite- and SNP-derived Fst values across 1 Mb intervals on all

chromosomal arms (Spearman rank correlation r = 0.447-0.521, p < 0.05), except for 3R (r =

0.129, p = 0.513; Fig. 3). A monomer (Tn) locus within a 5’-UTR intron of CG42686 (3R

chromosome), with one NFS-exclusive allele and two SFS-exclusive alleles was the most

divergent microsatellite, followed by 11 other loci that remained significantly different between

slopes after Benjamini-Hochberg correction of individual P-values, all in noncoding sequences

(Table S6). Overall, our results show substantial interslope divergence in repeat sequences, in parallel

with differentiation of coding sequences (27). While we find no evidence that microsatellites

contribute to local adaptations in Evolution Canyon, mobile elements emerge as potential

surrogates of adaptive divergence along the sharp microclimate gradient. A similar conclusion

Page 7: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

has been reached in a study of wild barley (Hordeum spontaneum) from Evolution Canyon (49,

50), showing that more BARE-1 retrotransposon copies and proportionally fewer solo LTRs were

found in the upper, drier sites of the canyon, particularly at the top of the SFS, than at lower

sites. Interestingly, the pattern seems to be roughly opposite to D. melanogaster whose NFS

genomes carry a heavier TE load.

Materials and Methods

Sampling and DNA extractions

A total of 16 NFS and 16 SFS D. melanogaster isofemale lines were obtained from females

collected in EC (Mount Carmel, Israel) in October 2010. ~1µg of DNA from each line were

pooled to make population representations. Illumina paired-end libraries were constructed and

sequenced with HiSeq, 100-cycle, at ~40x coverage per population, as described elsewhere (27).

Mapping reads and data processing

Paired-end reads were filtered for minimum average base quality score of 20 and a minimum

length of 50 bp (see (27) for details). Trimmed reads were mapped to the Drosophila

melanogaster reference genome using BWA (0.5.9-r16) (51). Paired-end data were converted to

BAM format using SAMtools. For each detected SNP, differentiation between populations was

calculated using both Fst and Fisher exact test. Footprints of natural selection were detected using

Tajima's D calculated over non-overlapping 10 Kb windows along chromosome arms.

Detection of interslope selective differentiated genomic regions and GO term enrichment Hidden Markov Model (HMM) analysis in R was used to discriminate between the distributions

of three hidden states corresponding to high, moderate and low interslope differentiation, and to

assign each SNP to a corresponding state, using interslope Fst values (see (27) for details). To

search for high differentiation regions along chromosome arms we used 10 Kb non-overlapping

windows. For each window the level of differentiation was represented by two parameters, LI =

nL/nT (“low island”) and HI = nH/nT (“high island”), where nT is the total number of SNPs

within each window, while nH and nL are the numbers of SNPs assigned in the HMM step to

either high- or low-differentiation states, respectively. Following this step, a permutation test was

conducted in order to detect genomic regions with significant enrichment of HI scores.

Significantly differentiated genomic regions between slopes were then inspected for the type and

strength of selection using the Tajima's D scores for each window as obtained with Popoolation

(52). Trimmed files of each population were converted to a pileup format separately and used to

calculate population measures over a non-overlapping window of size 10 Kb in PoPoolation. A

score of D<-2 is indicative of a recent selective sweep (fixation of novel mutation) followed by a

slow recovery of variation, hence an excess of rare alleles. On the other hand, D>2 is indicative

of small allele frequency differences due to balancing selection. Thus, combining both

differentiation and selection scores can be used for detection of genomic regions corresponding

to interslope differentiation caused by alternative selection (small D and significant HI).

To study the biological significance of genes under diversifying selection, an enrichment

analysis of gene ontology (GO) terms was conducted with DAVID 6.7 (53).

Identification of Transposable Elements

To identify transposable elements (TEs), we used a PoPoolationTE software package that allows

finding TE insertions present in a reference genome as well as novel TE insertions (29). As a

Page 8: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

reference, the Drosophila melanogaster genome v5.31, transposable element sequences v5.31,

and a GFF file (v5.31) from FlyBase were used. We used the same classification of TE insertions

proposed by Kofler et al. (29), with three major ‘orders’: terminal inverted repeat (TIR)

elements, long terminal repeat (LTR) and non-LTR retrotransposons, further grouped into 115

families and 5,222 insertions. However, unlike Kofler et al. (29), we added P-element (FlyBase)

to the list of analyzed TEs. To measure differentiation between SFS and NFS, which is expressed

by the fixation index (Fst), we used PoPoolation2 software (54).

Creating a set of microsatellites with uniquely mappable flanking sequences

A set of microsatellite loci with unique flanking sequences was identified in the dmel reference

(BDGP R5/dm3) genome using methods that were used prior to create a unique microsatellite set

for the human genome (55, 56). Tandem Repeat Finder (57) was run on the dmel reference

genome to identify a starting set of 875,647 microsatellites using parameters 2 5 5 80 10 14 6. A

Perl script reduced this set to only 1 to 6 mer microsatellites which were at least 12 nt in length

with a minimum 90% identity score. This set contained 209,591 microsatellite loci. Using the

LINE locations for dmel from the UCSC Genome Browser (58), we removed microsatellites

from the set which were within 1 nucleotide of any of these large repetitive regions. This set was

further reduced to only contain those microsatellites with unique flanking sequences. The

resulting set contained 72,603 microsatellites. Locations of RefSeq genes in the dmel reference

(Flybase and DHGP, Lawrence Berkeley National Laboratory) were downloaded from the UCSC

Genome Browser (58). A Perl script was written to identify the position of each microsatellite

with respect to the RefSeq genes. The Perl script used the first nucleotide of each microsatellite

to determine its position. The first RefSeq gene which contained this position in the list of

RefSeq genes was used. We ignored any genes which followed in the list which might also

contain this point. Upstream was defined as the region spanning 1000 nt prior to the transcription

start nucleotide.

Microsatellite calling

All raw Illumina reads were mapped to the dmel reference with BWA (0.5.9-r16) (51). Using our

microsatellite calling software which includes local realignment and read filtering (55, 56), we

were able to call 67,565 and 66,276 microsatellite loci in the NFS and SFS samples, respectively.

We required at least 3 reads to call each microsatellite allele. Since the samples contained

multiple genomes we did not limit each microsatellite locus to a maximum of two alleles.

Acknowledgments

We would like to thank Alan Templeton and Harmit Malik for reviewing the paper. The project

was supported by an US-Israel BSF grant #2011438 (to PM, AK, ER) and the Ancell Teicher

Research Foundation for Genetics and Molecular Evolution (to EN).

References

Page 9: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

1. Wang T, et al. (2007) Species-specific endogenous retroviruses shape the transcriptional

network of the human tumor suppressor protein p53. Proc Natl Acad Sci U S A

104(47):18613-18618.

2. Ashburner M, Golic KG, & Hawley RS (2005) Drosophila : a laboratory handbook

(Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.) 2nd Ed pp xxviii, 1409

p.

3. Aminetzach YT, Macpherson JM, & Petrov DA (2005) Pesticide resistance via

transposition-mediated adaptive gene truncation in Drosophila. Science 309(5735):764-

767.

4. Sinzelle L, Izsvak Z, & Ivics Z (2009) Molecular domestication of transposable elements:

from detrimental parasites to useful host genes. Cell Mol Life Sci 66(6):1073-1093.

5. Joly-Lopez Z, Forczek E, Hoen DR, Juretic N, & Bureau TE (2012) A gene family

derived from transposable elements during early angiosperm evolution has reproductive

fitness benefits in Arabidopsis thaliana. PLoS Genet 8(9):e1002931.

6. Feschotte C (2008) Transposable elements and the evolution of regulatory networks. Nat

Rev Genet 9(5):397-405.

7. Iglesias AR, Kindlund E, Tammi M, & Wadelius C (2004) Some microsatellites may act

as novel polymorphic cis-regulatory elements through transcription factor binding. Gene

341:149-165.

8. Sirand-Pugnet P, Durosay P, Brody E, & Marie J (1995) An intronic (A/U)GGG repeat

enhances the splicing of an alternative intron of the chicken beta-tropomyosin pre-

mRNA. Nucleic Acids Res 23(17):3501-3507.

9. Li YC, Korol AB, Fahima T, & Nevo E (2004) Microsatellites within genes: structure,

function, and evolution. Mol Biol Evol 21(6):991-1007.

10. Capy P, Gasperi G, Biemont C, & Bazin C (2000) Stress and transposable elements: co-

evolution or useful parasites? Heredity (Edinb) 85 ( Pt 2):101-106.

11. Gonzalez J, Karasov TL, Messer PW, & Petrov DA (2010) Genome-wide patterns of

adaptation to temperate environments associated with transposable elements in

Drosophila. PLoS Genet 6(4):e1000905.

12. Michalak P, et al. (2001) Genetic evidence for adaptation-driven incipient speciation of

Drosophila melanogaster along a microclimatic contrast in "Evolution Canyon," Israel.

Proc Natl Acad Sci U S A 98(23):13195-13200.

13. Lerman DN, Michalak P, Helin AB, Bettencourt BR, & Feder ME (2003) Modification

of heat-shock gene expression in Drosophila melanogaster populations via transposable

elements. Mol Biol Evol 20(1):135-144.

14. Krebs RA & Feder ME (1997) Deleterious consequences of Hsp70 overexpression in

Drosophila melanogaster larvae. Cell Stress Chaperones 2(1):60-71.

15. Zatsepina OG, et al. (2001) A Drosophila melanogaster strain from sub-equatorial Africa

has exceptional thermotolerance but decreased Hsp70 expression. J Exp Biol 204(Pt

11):1869-1881.

16. Nevo E (1995) Asian, African and European Biota Meet at Evolution-Canyon Israel -

Local Tests of Global Biodiversity and Genetic Diversity Patterns. P Roy Soc Lond B Bio

262(1364):149-155.

17. Nevo E (2012) "Evolution Canyon," a potential microscale monitor of global warming

across life. Proc Natl Acad Sci U S A 109(8):2960-2965.

Page 10: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

18. Nevo E, Rashkovetsky E, Pavlicek T, & Korol A (1998) A complex adaptive syndrome

in Drosophila caused by microclimatic contrasts. Heredity 80 ( Pt 1):9-16.

19. Lupu A, Pechkovskaya A, Rashkovetsky E, Nevo E, & Korol A (2004) DNA repair

efficiency and thermotolerance in Drosophila melanogaster from "Evolution Canyon".

Mutagenesis 19(5):383-390.

20. Rashkovetsky E, et al. (2006) Adaptive differentiation of thermotolerance in Drosophila

along a microclimatic gradient. Heredity 96(5):353-359.

21. Korol A, Rashkovetsky E, Iliadi K, & Nevo E (2006) Drosophila flies in "Evolution

Canyon" as a model for incipient sympatric speciation. Proc Natl Acad Sci U S A

103(48):18184-18189.

22. Iliadi KG, et al. (2009) [Peculiarities of the courtship song in the Drosophila

melanogaster populations adapted to gradient of microecological conditions]. J Evol

Biochem Physiol 45(5):579-589.

23. Iliadi K, et al. (2001) Sexual and reproductive behaviour of Drosophila melanogaster

from a microclimatically interslope differentiated population of "Evolution Canyon"

(Mount Carmel, Israel). Proc Biol Sci 268(1483):2365-2374.

24. Korol A, et al. (2000) Nonrandom mating in Drosophila melanogaster laboratory

populations derived from closely adjacent ecologically contrasting slopes at "Evolution

Canyon". Proc Natl Acad Sci U S A 97(23):12637-12642.

25. Debat V, et al. (2008) Multidimensional analysis of Drosophila wing variation in

Evolution Canyon. J Genet 87(4):407-419.

26. Pavlicek T, Frenkel Z, Korol AB, Beiles A, & Nevo E (2008) Drosophila at the

"Evolution Canyon" Microsite, Mt. Carmel, Israel: Selection Overrules Migration.

(Translated from English) Isr J Ecol Evol 54(2):165-180.

27. Hubner S, et al. (2013) Genome differentiation of Drosophila melanogaster from a

microclimate contrast in Evolution Canyon, Israel. Proc Natl Acad Sci U S A

110(52):21059-21064.

28. Clark AG, et al. (2007) Evolution of genes and genomes on the Drosophila phylogeny.

Nature 450(7167):203-218.

29. Kofler R, Betancourt AJ, & Schlotterer C (2012) Sequencing of pooled DNA samples

(Pool-Seq) uncovers complex dynamics of transposable element insertions in Drosophila

melanogaster. PLoS Genet 8(1):e1002487.

30. Bartolome C, Maside X, & Charlesworth B (2002) On the abundance and distribution of

transposable elements in the genome of Drosophila melanogaster. Mol Biol Evol

19(6):926-937.

31. Bucheton A, et al. (1992) I elements and the Drosophila genome. Genetica 86(1-3):175-

190.

32. Orsi GA, Joyce EF, Couble P, McKim KS, & Loppin B (2010) Drosophila I-R hybrid

dysgenesis is associated with catastrophic meiosis and abnormal zygote formation. J Cell

Sci 123(Pt 20):3515-3524.

33. Carmel J, Rashkovetsky E, Nevo E, & Korol A (2011) Differential expression of small

heat shock protein genes Hsp23 and Hsp40, and heat shock gene Hsr-omega in fruit flies

(Drosophila melanogaster) along a microclimatic gradient. J Hered 102(5):593-603.

34. Singh SR, Rashkovetsky E, Iliadi K, Nevo E, & Korol A (2005) Assortative mating in

Drosophila adapted to a microsite ecological gradient. Behav Genet 35(6):753-764.

Page 11: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

35. Panhuis TM, Swanson WJ, & Nunney L (2003) Population genetics of accessory gland

proteins and sexual behavior in Drosophila melanogaster populations from Evolution

Canyon. Evolution 57(12):2785-2791.

36. Liao GC, Rehm EJ, & Rubin GM (2000) Insertion site preferences of the P transposable

element in Drosophila melanogaster. Proc Natl Acad Sci U S A 97(7):3347-3351.

37. Walser JC, Chen B, & Feder ME (2006) Heat-shock promoters: targets for evolution by P

transposable elements in Drosophila. PLoS Genet 2(10):e165.

38. Haney RA & Feder ME (2009) Contrasting patterns of transposable element insertions in

Drosophila heat-shock promoters. PLoS One 4(12):e8486.

39. Shen Y, Lansky E, Traber M, & Nevo E (2013) Increases in both acute and chronic

temperature potentiate tocotrienol concentrations in wild barley at 'Evolution Canyon'.

Chem Biodivers 10(9):1696-1705.

40. Harry M, et al. (1999) Fine-scale biodiversity of Drosophilidae in "Evolution Canyon" at

the Lower Nahal Oren microsite, Israel. (Translated from English) Biologia 54(6):685-

705.

41. Burke MK, et al. (2010) Genome-wide analysis of a long-term evolution experiment with

Drosophila. Nature 467(7315):587-590.

42. Hochman B (1976) The fourth chromosome of Drosophila melanogaster. The Genetics

and Biology of Drosophila, ed Ashburner M, Novitski, E. (Academic Press, New York),

Vol 1b, pp 903–928.

43. Rizzon C, Marais G, Gouy M, & Biemont C (2002) Recombination rate and the

distribution of transposable elements in the Drosophila melanogaster genome. Genome

Res 12(3):400-407.

44. Duret L, Marais G, & Biemont C (2000) Transposons but not retrotransposons are

located preferentially in regions of high recombination rate in Caenorhabditis elegans.

Genetics 156(4):1661-1669.

45. Tautz D & Schlotterer (1994) Simple sequences. Curr Opin Genet Dev 4(6):832-837.

46. Fondon JW, 3rd & Garner HR (2004) Molecular origins of rapid and continuous

morphological evolution. Proc Natl Acad Sci U S A 101(52):18058-18063.

47. Fondon JW, 3rd, Hammock EA, Hannan AJ, & King DG (2008) Simple sequence

repeats: genetic modulators of brain function and behavior. Trends Neurosci 31(7):328-

334.

48. Schlotterer C & Agis M (2002) Microsatellite analysis of Drosophila melanogaster

populations along a microclimatic contrast at lower Nahel Oren canyon, Mount Carmel,

Israel. Mol Biol Evol 19(4):563-568.

49. Kalendar R, Tanskanen J, Immonen S, Nevo E, & Schulman AH (2000) Genome

evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in

response to sharp microclimatic divergence. Proc Natl Acad Sci U S A 97(12):6603-6607.

50. Bedada G, Westerbergh A, Nevo E, Korol A, & Schmid KJ (2014) DNA sequence

variation of wild barley Hordeum spontaneum (L.) across environmental gradients in

Israel. Heredity (Edinb) .

51. Li H & Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25(14):1754-1760.

52. Kofler R, et al. (2011) PoPoolation: a toolbox for population genetic analysis of next

generation sequencing data from pooled individuals. PLoS One 6(1):e15925.

Page 12: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

53. Huang da W, Sherman BT, & Lempicki RA (2009) Systematic and integrative analysis of

large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44-57.

54. Kofler R, Pandey RV, & Schlotterer C (2011) PoPoolation2: identifying differentiation

between populations using sequencing of pooled DNA samples (Pool-Seq).

Bioinformatics 27(24):3435-3436.

55. McIver LJ, Fondon JW, 3rd, Skinner MA, & Garner HR (2011) Evaluation of

microsatellite variation in the 1000 Genomes Project pilot studies is indicative of the

quality and utility of the raw data and alignments. Genomics 97(4):193-199.

56. McIver LJ, McCormick JF, Martin A, Fondon JW, 3rd, & Garner HR (2013) Population-

scale analysis of human microsatellites reveals novel sources of exonic variation. Gene

516(2):328-334.

57. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic

Acids Res 27(2):573-580.

58. Karolchik D, et al. (2014) The UCSC Genome Browser database: 2014 update. Nucleic

Acids Res 42(1):D764-770.

Page 13: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table 1. Chromosomal distribution of transposable elements in ‘Evolution Canyon’ D.

melanogaster (SFS – South-facing slope, NFS – North-facing slope).

SFS 5UTR 3UTR Intron CDS Promoter Intergenic Total

X 26 51 778 78 20 1054 2007 2L 36 48 1162 94 25 1485 2850 2R 49 72 1308 112 28 1418 2987 3L 34 51 1301 108 30 1449 2973 3R 44 80 1262 120 30 1261 2797 4 4 13 361 22 1 175 576 Total 193 315 6172 534 134 6842 14190

NFS 5UTR 3UTR Intron CDS Promoter Intergenic Total

X 27 48 852 66 24 1077 2094 2L 32 46 1276 93 33 1642 3122 2R 50 69 1338 105 35 1495 3092 3L 44 54 1415 96 29 1601 3239 3R 50 75 1370 107 41 1257 2900 4 0 12 369 21 2 174 578 Total 203 304 6620 488 164 7246 15025

Page 14: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table 2. Class I (LTR and non-LTR) and Class II (TIRs) transposable elements and their

genomic distribution in genomes of SFS- and NFS-derived D. melanogaster.

SFS NFS

region n n

non-LTR 5'-UTR 22 25

3'-UTR 55 54

intron 1487 1652

CDS 67 86

promoter 22 32

intergenic 1663 1810

LTR 5'-UTR 47 49

3'-UTR 151 153

intron 2563 2762

CDS 281 252

promoter 45 56

intergenic 2884 3110

TIR 5'-UTR 124 129

3'-UTR 109 97

intron 2122 2206

CDS 186 150

promoter 67 76

intergenic 2295 2326

Number of TE insertions after removing less coverage sites than 10 reads and overlapping TE insertions.

Page 15: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table 3. SFS- and NFS-unique transposable elements (LTRs, non-LTRs, and TIRs).

SFS 5UTR 3UTR Intron CDS Promoter Intergenic Total

X 20 28 476 29 8 536 1097 2L 23 30 566 51 16 705 1391 2R 35 32 581 54 12 519 1233 3L 24 34 649 56 20 671 1454 3R 22 46 810 77 19 734 1708 4 0 4 49 6 1 21 81 Total 124 174 3131 273 76 3186 6964

NFS 5UTR 3UTR Intron CDS Promoter Intergenic Total

X 21 24 548 21 13 561 1188 2L 21 24 683 55 25 859 1667 2R 36 30 634 47 20 591 1358 3L 34 36 757 45 22 814 1708 3R 34 46 917 67 27 717 1808 4 0 2 57 3 0 21 83 Total 146 162 3596 238 107 3563 7812

Page 16: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table 4. Classes and chromosomal distribution of SFS- and NFS-unique transposable elements.

SFS LTR Non-LTR TIR Total

X 610 224 263 1097 2L 809 320 262 1391 2R 670 306 257 1233 3L 814 340 300 1454 3R 988 419 301 1708 4 18 10 53 81 Total 3909 1619 1436 6964

NFS LTR Non-LTR TIR Total

X 676 265 247 1188 2L 913 436 318 1667 2R 739 344 275 1358 3L 948 435 325 1708 3R 1041 468 299 1808 4 20 11 52 83 Total 4337 1959 1516 7812

Page 17: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Figure legends

Figure 1. An X chromosome region characterized by high sequence divergence, coupled with

low Tajima’s D values, as well as high TE differentiation.

Figure 2. NFS-SFS interpopulation differentiation (Fst) in SNPs and microsatellites across the

major chromosome arms.

Figure 3. Genomic distribution of microsatellites in Evolution Canyon Drosophila melanogaster.

Page 18: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S1. Top 30 slope-differential TEs.

family SFS NFS difference

1 I-element 250 334 84

2 Quasimodo 197 275 78

3 roo 1210 1284 74

4 jockey 592 658 66

5 Doc 58 106 48

6 P-element 529 574 45

7 mdg1 146 190 44

8 Ivk 56 94 38

9 412 306 340 34

10 F-element 172 206 34

11 hobo 136 170 34

12 297 222 255 33

13 Rt1b 93 120 27

14 Burdock 112 133 21

15 flea 81 102 21

16 opus 153 172 19

17 blood 251 233 18

18 pogo 103 120 17

19 FB 133 117 16

20 invader2 46 62 16

21 mdg3 181 196 15

22 gypsy12 25 39 14

23 invader1 26 40 14

24 Bari1 29 16 13

25 Tirant 138 126 12

26 HMS-Beagle 111 100 11

27 copia 76 87 11

28 Tabor 36 46 10

29 diver 46 56 10

30 Idefix 22 31 9

Page 19: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S2. Genes disrupted by slope-differential TE insertions within putative promoters (1-100

bp upstream of 5'UTR).

SFS promoter

X 2L 2R 3L 3R 4

CG32813 CG3544 CG1882 CG34022 CG2663 Kif3C

CG34338 CG4267 rgr Src64B CG10267 Idgf4 CG11929 CG11825 CHMP2B CG2641 Cht6 CG11034 CG34227 Jon65Aii CG7800 CG2556 Ucp4C CG13155 form3 CG3940 CG1640 GRHR CheB53a h CG6465 CG12177 GRHR mir-31a Hsp67Ba Cpn CG33932 Rat1 Dip3 CG7949 CG6904

CG4592 Hs3st-A CG7368 tinc

CG31720 Gr58b CG7368 CG5386

snRNA:U5:34A CG30181 Ir68b CG34376

Ance Gr59e Ir68b CG7045

CG13284

Sap130 CG33093

Fas3

stv lmd

Aats-asn

26-29-p CG4449

l(2)k14505 shd CG11851

Dbp73D CG6154

olf413 CG6296

CG12768 RhoGAP100F

CG32459

NFS promoter

X 2L 2R 3L 3R 4

MED22 Cdlc2 CG1399 mthl8 CG32945 CG18823 CG18641 9/5/2013 emc CG31531 CG2694 CG3119 PGRP-SC1a CG13898 hd CG15036 CG34393 trsn mthl6 CG12224 CG15036 Cyp28d2 Spn47C CG3280 GstD3 Hexo2 CG7251 CG13223 Ilp3 CG10909 CG1468 Arc-p20 CG8768 Ncc69 pr-set7 wisp spz3 CG4646 CG32107 pr-set7 CG33252 CG7224 cbc CG10738 Caf1 CG8611 CG4594 CR30068 Tom Acyp2 CG42583 mre11 Cyp6a20 CG7579 CG42822 sol crol Khc-73 CG5235 tRNA:P:90Cb

CG14615 CG9426 GstS1 CG13041 CG31223

tam Muc55B CG13038 TotZ

spel1 tRNA:G3:56EFa Cpr72Ec CG5377

spel1 tRNA:G3:56EFa Fit2 CG13840

Page 20: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

esg CG34369 CG33286 CG13829

snRNA:U5:35D CG32835 Csp beat-IV

beat-Ic TM4SF Aats-ile CG33337

CG6453 CG8157 Snap25 Miro

fbp

rdgC polybromo

CG17470

AGO2 CG11854

CG2201

CG17192

CR33987

CG11951

CG3651

Tpi

CG1544

CG1971

Page 21: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S3. GO term (InterPro SMART domain) enrichment of genes disrupted by slope-differential TE insertions within putative

promoters (1-100 bp upstream of 5'UTR; see also Table S2).

GO term Genes count % P-value Benjamini-Hochberg P

disulfide bond 10 1.5 3.8E-3 4.1E-1

lipase 4 0.6 7.5E-3 6.2E-1

ZnF_C2H2 7 1.1 1.0E-1 8.4E-1

alternative splicing 14 2.1 2.7E-2 8.5E-1

phospholipase A1 activity 3 0.5 7.8E-3 8.7E-1

MADF 3 0.5 8.5E-2 9.0E-1

repressor 4 0.6 8.7E-2 9.6E-1

hydrolase 24 3.6 8.5E-2 9.8E-1

DoH 2 0.3 7.7E-2 9.8E-1

Copper type II, ascorbate-dependent monooxygenase-like, C-terminal 2 0.3 4.8E-2 9.8E-1

Dopamine-beta-monooxygenase 2 0.3 4.8E-2 9.8E-1

DOMON 2 0.3 9.4E-2 9.9E-1

Page 22: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S4. Genes with CDSs disrupted by slope-unique TEs insertions.

SFS

X 2L 2R 3L 3R 4

Atf3 galectin Gprk1 Trh CG12581 Syt7

Hr4 galectin Nipped-A CG34022 Nep2 Rad23

Hr4 Pkg21D CG10395 Psa CG17387 Slip1

N CG5440 scaf CG7971 CG2082 bt

dnc CG15358 Or42b Pxn CG2023 Caps

ec CG31689 CG15236 CG14958 glob3 Caps

CG32767 CG3119 Spn42Dc sty CG31556 CG34338 CG3119 CG1358 CG12029 Taf1 Ir7c CG42467 Hey CG17150 Taf1 Ir7c CG10019 CSN7 Rh50 Sas-4 CG15343 hoe1 CG11669 CG10673 CG34127 CG15365 CG14034 CG30345 CG10483 CG31464 CG32702 CG7251 brp loj CG16733 Cht6 CG9527 brp CG10226 CG8358 Cht6 CG11319 wde SP1173 CG6293

CG15221 Rat1 Cpr47Ee unc-13-4A Sodh-2

CG15740 Mnn1 epsilonTry Rac2 Ugt86Dd CG4395 Mnn1 RpIII128 CG32376 CG4757 CG4395 Ndae1 CG13168 Fhos CG31441 Traf-like Wnt4 CG30049 CG5653 dpr4 SmG CG13794 CG8550 CG3689 CG14708 CG4829 CG13794 muskelin CG32043 CG18547 B-H2 U26 Cyp9h1 CG32043 mthl12 tgy CG9466 CG10814 Ir67a CG14377 CG42578 CG32984 CG10814 dpr10 Ir87a Npc1b Or30a CG30484 CG7512 Kif19A CG1529 CG13137 cg thoc6 Su(var)3-9

CG1801 me31B cg CG4328 CG6118 Ir20a hgo Cyp6a20 sowah c(3)G

CG18788 CG8157 mirr CG14876

CG6792 CG30466 Atg1 CG14876

CG17024 CG30083 Meics CG31279

Tehao CG33465 Hsc70Cb TyrRII

B4 CG7747 CG32138 sr

nimB1 NiPp1 CG17173 sr

rk l(2)k01209 CG7739 sr

nAcRalpha-34E l(2)k01209 CG13449 CG7146

Page 23: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

CG33648 CG43066 CG7402 CG7685

CG42817 CG42856 CG7402 Cyp12a4

Cyp303a1 CG33454

MYPT-75D CG34139

CG42389 CG10476 GNBP2 Or92a

Gr36d CG10081 CG10424 Or92a

CadN2 CG15120 wnd AnnIX

sick Obp56c CG8765 SNF4Agamma

CG2617 mus209 CG14185 CG7080

CG9270 CG10543 CG17147 CG7080

CG31627 CG30394 CG10586 cenB1A

Mio Gr58a CG10586 Ir94d

CG31619 CG30275 CG33286 CG16732

lt CG3502 Ac78C sba

Ugt37a1 CG3530 CG12546 CG13625

CG5554 jim CG31109

tamo AGO3 CG11854

zip CG40470 CG13659

CG40470 CG13659

nvd CG31436

CG31436

CG5913

CG5959

CG6073

CG33970

side

CG12428

CG12428

Dhc98D

CG33346

CG10000

CG14518

ppk21

ppk21

ppk19

PH4alphaSG1

CG15539

CG31016

CG15561

CG15561

awd

Page 24: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

NFS

X 2L 2R 3L 3R 4

CG12541 CG5565 Gprk1 klar CG31547 CG2177

CG32719 CG5565 Gprk1 CG7971 Pi4KIIalpha Sox102F

CG1468 CG33124 Nipped-B CG7971 Alh bt

CG2186 v(2)k05816 Nipped-A CG42676 Alh CG11203 CG3036 Tsp42En sty Fer1 Gr10b Bub1 CG1358 Gr64d CG14606 CG15740 Bub1 Hey CG1332 CG14606 CG10362 CG7236 CG8708 Ir64a CG14606 Smr CG11034 sns mthl2 rn Smr CG11034 CG30001 sif pyd3 NFAT CG11034 luna CG10483 CG18744 rut CG31633 san CG10483 CG2767 CG32553 smt3 CG13203 CG10483 CG7800

out CG13397 CG30037 unc-13-4A CG15864

CG9572 CG9555 CG3884 CG34462 CG8358 AnnX Toll-4 TppII CG6776 Npc2c RunxA jp CG10799 CG5741 CG6241 CG1518 CG17633 CG42807 CG5660 CG6254 CG34120 bib CG6357 CG5653 CG14692 Cyp6t1 Fatp cg CG42826 Ugt86De CG14476 Fatp Mdr50 SuUR KLHL18

CG18301 Cyp6a20 CG32082 CG3313

CG14070 Ir52a CG7339 Cyp313a5

piwi Ir52d CG11529 CG6188

CG12307 CG30091 CG11529 ry

Or33a CG30324 Ent3 CG42761

Or33b CG33960 CG32107 CG4525

CG17024 mthl4 CG14120 CG10311

CG17024 CG5756 CG14120 CG10311

CG17024 GstE6 CG12520 CG5840

kek4 GstE6 CG10752 CG7146

CG31729 CG15128 Best4 Dys

CG31729 Or56a Best3 CG11391

CG16848 Rx CG33795 CG11391

CG16848 Xbp1 CG33689 CG34139

CG31731 Rgk3 Eip74EF CG5466

CG31769

ventrally-expressed-protein-D CG34002 CG3353

CG33648

ventrally-expressed-protein-D CG7408 CG31198

Page 25: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

CG33648 robo CG42337 CG17819

CG42313 fd59A CG11306 Ir94c

CG15255 Gr59e CG12546 CG6985

l(2)35Di Gr59e CG40470 CG13833

BicD CG13561 CG40470 sba

CG17570 CG13561 nvd CG31104

CG17570 HSPC300 CG14562 CHKov1

Ugt37a1 slbo

CG5116

CG17472 Ir60d

CG31380

CG33322

CG18472

CG9339

CG6296

Ac3

CG17189

CG11629

Gr98b

CG42748

Gr98d

CG40006

Or98a

lt

CG12426

lt

CG12426

CG10000

CG9997

CG31048

CG14518

CG15817

CG31025

hdc

aralar1

CG34041

CG31016

CG31002

alpha-Est5

Page 26: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S5. GO BP term enrichment of genes disrupted by slope-differential TE insertions within putative promoters (1-100 bp

upstream of 5'UTR; see also Table S4).

Term Gene count % P-value Benjamini-Hochberg P

sensory perception of chemical stimulus 16 1.7 1.7E-4 1.7E-1

G-protein coupled receptor protein signaling pathway 21 2.3 2.0E-4 1.1E-1

oxidation reduction 30 3.3 1.5E-3 4.3E-1

cognition 20 2.2 2.0E-3 4.2E-1

sensory perception 17 1.8 2.4E-3 4.0E-1

cell surface receptor linked signal transduction 29 3.1 3.7E-3 4.9E-1

sensory perception of taste 7 0.8 6.9E-3 6.6E-1

neurological system process 25 2.7 8.4E-3 6.8E-1

establishment of planar polarity 6 0.7 2.1E-2 9.3E-1

establishment of tissue polarity 6 0.7 2.3E-2 9.2E-1

sensory perception of smell 7 0.8 2.6E-2 9.3E-1

establishment of ommatidial polarity 5 0.5 3.2E-2 9.5E-1

morphogenesis of a polarized epithelium 6 0.7 3.6E-2 9.5E-1

cyclic nucleotide metabolic process 4 0.4 4.7E-2 9.8E-1

developmental programmed cell death 3 0.3 5.1E-2 9.8E-1

4-hydroxyproline metabolic process 3 0.3 5.7E-2 9.8E-1

peptidyl-proline hydroxylation to 4-hydroxy-L-proline 3 0.3 5.7E-2 9.8E-1

peptidyl-proline modification 3 0.3 5.7E-2 9.8E-1

growth 7 0.8 6.9E-2 9.9E-1

phagocytosis 10 1.1 7.4E-2 9.9E-1

conditioned taste aversion 2 0.2 7.7E-2 9.9E-1

axon extension 3 0.3 7.9E-2 9.9E-1

regulation of DNA metabolic process 3 0.3 7.9E-2 9.9E-1

Page 27: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Term Gene count % P-value Benjamini-Hochberg P

inorganic anion transport 3 0.3 8.7E-2 9.9E-1

cell migration 9 1.0 8.9E-2 9.9E-1

aging 6 0.7 9.7E-2 9.9E-1

multicellular organismal aging 6 0.7 9.7E-2 9.9E-1

determination of adult life span 6 0.7 9.7E-2 9.9E-1

Page 28: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Table S6. Characteristics of microsatellites with significantly different allele frequencies between NFS and SFS.

Location gene gene symbol gene region reference allele (nt) motif mer p-value

3R:18467429-18467440 NM_001104405 CG42686 5UTR_intron 12 T 1 5.17E-09

X:17211960-17211973 NM_001272712 B-H2 5UTR_intron 14 A 1 5.10E-07

2L:15873418-15873432 intergenic NA NA 15 A 1 9.25E-07

X:11291756-11291785 NM_167281 dlg1 intron 30 TAA 3 1.55E-06

2L:15638584-15638603 intergenic NA NA 20 CAG 3 1.82E-06

X:7951290-7951305 NM_167144 fs(1)h 5UTR_intron 16 AC 2 1.92E-06

X:5715328-5715340 NM_132041 CG15765 intron 13 T 1 2.84E-06

X:3768618-3768644 NM_176684 CG2930 intron 27 ATCA 4 2.89E-06

3R:20918302-20918326 intergenic NA NA 25 ACCG 4 3.95E-06

2R:3664231-3664251 NM_136477 CG30497 intron 21 TG 2 4.08E-06

X:9983815-9983827 NM_001042802 CG34104 5UTR_intron 13 AAAAAC 6 4.56E-06

3R:19150954-19150965 NM_079737 pnt intron 12 A 1 5.56E-06

Page 29: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide
Page 30: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide
Page 31: Divergence of Drosophila melanogaster repeatomes in response … · Abstract Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide

Intron, 39127

Exon, 8562

Promoter, 692

5'UTR, 2342

3'UTR, 3514

Intergenic, 30212