11
INVESTIGATION Satellite DNA-Like Elements Associated With Genes Within Euchromatin of the Beetle Tribolium castaneum Josip Brajkovi c,* Isidoro Feliciello,* ,Branka Bruvo-Mad W ari c,* and Ðurd W ica Ugarkovi c* ,1 *Department of Molecular Biology, Rud W er Bo skovi c Institute, Bijeni cka 54, HR-10000 Zagreb, Croatia, and Dipartimento di Medicina Clinica e Sperimentale, Università degli Studi di Napoli Federico II, via Pansini 5, I80131, Napoli, Italy ABSTRACT In the red our beetle Tribolium castaneum the major TCAST satellite DNA accounts for 35% of the genome and encompasses the pericentromeric regions of all chromosomes. Because of the presence of transcriptional regulatory elements and transcriptional activity in these sequences, TCAST satellite DNAs also have been proposed to be modulators of gene expression within euchromatin. Here, we analyze the distribution of TCAST homologous repeats in T. castaneum euchromatin and study their association with genes as well as their potential gene regulatory role. We identied 68 arrays composed of TCAST-like elements distributed on all chromosomes. Based on sequence characteristics the arrays were composed of two types of TCAST-like elements. The rst type consists of TCAST satellite-like elements in the form of partial monomers or tandemly arranged monomers, up to tetramers, whereas the second type consists of TCAST-like elements embedded with a complex unit that resembles a DNA transposon. TCAST-like ele- ments were also found in the 59 untranslated region (UTR) of the CR1-3_TCa retrotransposon, and therefore retrotransposition may have contributed to their dispersion throughout the genome. No signicant differ- ence in the homogenization of dispersed TCAST-like elements was found either at the level of local arrays or chromosomes nor among different chromosomes. Of 68 TCAST-like elements, 29 were located within introns, with the remaining elements anked by genes within a 262 to 404,270 nt range. TCAST-like elements are statistically overrepresented near genes with immunoglobulin-like domains attesting to their nonrandom distribution and a possible gene regulatory role. KEYWORDS repetitive DNA satellite DNA gene regulation transposon immunoglobulin- like genes Based on the hypothesis of Britten and Davidson (1971), repetitive elements can be a source of regulatory sequences and act to distribute regulatory elements throughout the genome. In particular, mobile transposable elements (TEs) are predicted to be a source of noncoding material that allows for the emergence of genetic novelty and inu- ences evolution of gene regulatory networks (Feschotte 2008). Re- cently it has been shown that at least 5.5% of conserved noncoding elements unique to mammals originate from mobile elements and are preferentially located close to genes involved in development and transcription regulation (Lowe et al. 2007). The complete sequence conservation, wide evolutionary distribution, and presence of func- tional elements such as promoters and transcription factor binding sites within some satellite DNA sequences has led to the assumption that in addition to participating in centromere formation, they might also act as cis-regulatory elements of gene expression (Ugarkovi c 2005). To perform potential regulatory functions, satellite DNA ele- ments are predicted to be preferentially distributed in euchromatic portion of the genomes in the vicinity of genes. Whole-genome se- quencing projects enable the presence and distribution of satellite DNA repeats in the euchromatic portion of the genome to be de- termined. The analysis of satellite DNA-like elements dispersed within euchromatin, and their comparison with homologous elements pres- ent within heterochromatin, also may reveal insights into the origin of satellite DNAs and their subsequent evolution (Kuhn et al. 2012). Satellite DNAs are major building elements of pericentromeric and centromeric heterochromatin in many eukaryotic species, and in Copyright © 2012 Brajkovi c et al. doi: 10.1534/g3.112.003467 Manuscript received April 30, 2012; accepted for publication June 18, 2012 This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/ by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Supporting information is available online at http://www.g3journal.org/lookup/ suppl/doi:10.1534/g3.112.003467/-/DC1. 1 Corresponding author: Department of Molecular Biology, Rud W er Bo skovi c Institute, Bijeni cka 54, HR-10000 Zagreb, Croatia. E-mail: [email protected] Volume 2 | August 2012 | 931

Satellite DNA-Like Elements Associated With Genes Within

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

INVESTIGATION

Satellite DNA-Like Elements Associated With GenesWithin Euchromatin of the Beetle TriboliumcastaneumJosip Brajkovi�c,* Isidoro Feliciello,*,† Branka Bruvo-MadWari�c,* and ÐurdWica Ugarkovi�c*,1

*Department of Molecular Biology, RudWer Bo�skovi�c Institute, Bijeni�cka 54, HR-10000 Zagreb, Croatia, and †Dipartimentodi Medicina Clinica e Sperimentale, Università degli Studi di Napoli Federico II, via Pansini 5, I80131, Napoli, Italy

ABSTRACT In the red flour beetle Tribolium castaneum the major TCAST satellite DNA accounts for 35%of the genome and encompasses the pericentromeric regions of all chromosomes. Because of the presenceof transcriptional regulatory elements and transcriptional activity in these sequences, TCAST satellite DNAsalso have been proposed to be modulators of gene expression within euchromatin. Here, we analyze thedistribution of TCAST homologous repeats in T. castaneum euchromatin and study their association withgenes as well as their potential gene regulatory role. We identified 68 arrays composed of TCAST-likeelements distributed on all chromosomes. Based on sequence characteristics the arrays were composed oftwo types of TCAST-like elements. The first type consists of TCAST satellite-like elements in the form ofpartial monomers or tandemly arranged monomers, up to tetramers, whereas the second type consists ofTCAST-like elements embedded with a complex unit that resembles a DNA transposon. TCAST-like ele-ments were also found in the 59 untranslated region (UTR) of the CR1-3_TCa retrotransposon, and thereforeretrotransposition may have contributed to their dispersion throughout the genome. No significant differ-ence in the homogenization of dispersed TCAST-like elements was found either at the level of local arraysor chromosomes nor among different chromosomes. Of 68 TCAST-like elements, 29 were located withinintrons, with the remaining elements flanked by genes within a 262 to 404,270 nt range. TCAST-likeelements are statistically overrepresented near genes with immunoglobulin-like domains attesting to theirnonrandom distribution and a possible gene regulatory role.

KEYWORDS

repetitive DNAsatellite DNAgene regulationtransposonimmunoglobulin-like genes

Based on the hypothesis of Britten and Davidson (1971), repetitiveelements can be a source of regulatory sequences and act to distributeregulatory elements throughout the genome. In particular, mobiletransposable elements (TEs) are predicted to be a source of noncodingmaterial that allows for the emergence of genetic novelty and influ-ences evolution of gene regulatory networks (Feschotte 2008). Re-cently it has been shown that at least 5.5% of conserved noncodingelements unique to mammals originate from mobile elements and are

preferentially located close to genes involved in development andtranscription regulation (Lowe et al. 2007). The complete sequenceconservation, wide evolutionary distribution, and presence of func-tional elements such as promoters and transcription factor bindingsites within some satellite DNA sequences has led to the assumptionthat in addition to participating in centromere formation, they mightalso act as cis-regulatory elements of gene expression (Ugarkovi�c2005). To perform potential regulatory functions, satellite DNA ele-ments are predicted to be preferentially distributed in euchromaticportion of the genomes in the vicinity of genes. Whole-genome se-quencing projects enable the presence and distribution of satelliteDNA repeats in the euchromatic portion of the genome to be de-termined. The analysis of satellite DNA-like elements dispersed withineuchromatin, and their comparison with homologous elements pres-ent within heterochromatin, also may reveal insights into the origin ofsatellite DNAs and their subsequent evolution (Kuhn et al. 2012).

Satellite DNAs are major building elements of pericentromericand centromeric heterochromatin in many eukaryotic species, and in

Copyright © 2012 Brajkovi�c et al.doi: 10.1534/g3.112.003467Manuscript received April 30, 2012; accepted for publication June 18, 2012This is an open-access article distributed under the terms of the CreativeCommons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in anymedium, provided the original work is properly cited.Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.112.003467/-/DC1.1Corresponding author: Department of Molecular Biology, RudWer Bo�skovi�cInstitute, Bijeni�cka 54, HR-10000 Zagreb, Croatia. E-mail: [email protected]

Volume 2 | August 2012 | 931

certain species they account for the majority of genomic DNA, as inbeetles from the coleopteran family Tenebrionidae (Ugarkovi�c andPlohl 2002). In the red flour beetle Tribolium castaneum, pericentro-meric heterochromatin comprises approximately 40% of the genome,and TCAST satellite DNA has previously been characterized as themajor satellite that encompasses centromeric as well as pericentro-meric regions of all 20 chromosomes (Ugarkovi�c et al. 1996). TCASTsatellite is composed of two subfamilies, Tcast1a and Tcast1b, whichtogether comprise 35% of the whole genome. Tcast1a and Tcast1bhave an average homology of 79% and are a similar size at 362 bp and377 bp, respectively, but they are characterized by a divergent, sub-family specific region of approximately 100 bp (Feliciello et al. 2011).The genome sequencing project of T. castaneum has recently beencompleted (Richards et al. 2008). Sequencing involved the euchro-matic portion of the genome, with.20% of the genome, correspond-ing to heterochromatic regions, excluded due to technical difficulties.

In this article, we searched for the presence of TCAST satellite-homologous elements within the assembled T. castaneum genome byusing a comprehensive computational analysis. By searching the se-quenced T. castaneum genome, we found 68 TCAST satellite DNAarrays within the euchromatin of all chromosomes. They were map-ped to 59 or 39 ends, as well as within introns, of more than 100protein-coding genes. Based on sequence characteristics, dispersedTCAST-like elements were classified into two groups. The first groupincludes partial TCAST satellite monomers or short arrays of tan-demly arranged monomers up to tetramers. The second group con-tains TCAST-like element embedded within complex repeat units thatcontain two hallmarks of DNA transposons, terminal inverted repeatsand target-size duplications. The evolutionary relationship and possi-ble modes of dispersion of the two types of dispersed TCAST-likesequences are discussed. In addition, we examined the sequence di-vergence, phylogenetic relationship, and chromosomal distribution ofthe elements. Annotation, characterization, and classification of geneswithin the region of TCAST-like elements are reported, with thepreferential localization of TCAST-like elements near specific groupsof genes identified. Our results demonstrate for the first time, theenrichment of satellite DNA-like elements in the vicinity of genes withimmunoglobulin-like domains and suggest their possible gene-regulatoryrole.

MATERIALS AND METHODSBLASTN version 2.2.22+ was used to screen the NCBI refseq_ge-nomic database of T. castaneum. All scaffolds that have not beenmapped to linkage groups were also screened. The program was op-timized to search for highly similar sequences (megablast) to the querysequence [TCAST consensus sequence (Ugarkovi�c et al. 1996)]. Genesflanking TCAST–homologous elements were found automatically byNCBI blast. Sequences corresponding to hits, as well as their flankingregions, were analyzed by dot plot (http://www.vivo.colostate.edu/molkit/dnadot/), using standard parameters (window size 9, mismatchlimit 0), or more relaxed conditions (window size 11, mismatch limit 1),to determine the exact start and end site of specific TCAST-likeelements. The TCAST transposon-like elements were analyzed indetail for the presence of hallmarks such as terminal inverted repeats(TIRs) and target-site duplications with the aid of the Gene Jockeysequence analysis program (for Apple Macintosh). Secondary struc-tures were determined using the default parameters of the MFOLDprogram available online [http://mfold.rna.albany.edu/?q=mfold (Zuker2003)]. AT content was analyzed using BioEdit Sequence AlignmentEditor (Hall 1999). Repbase, a reference database of eukaryotic repetitiveDNA, was screened using WU-BLAST (Kohany et al. 2006).

Sequence alignment was performed using MUSCLE algorithm(Edgar 2004) combined with manual adjustment. All sequences wereincluded in the alignment, with the exception of the ones that did notat least partially overlap with other sequences. Gblocks was used toeliminate poorly aligned positions and divergent regions of the align-ments (Talavera and Castresana 2007). Alignments (original fastafiles) are available upon request. jModelTest 0.1.1 software (Posada2008) was used to infer best-fit models of DNA evolution—TPM3uf+Gfor transposon-like and A type elements and TPM1uf for B typeelements. Maximum likelihood (ML) trees were estimated with thePhyML 3.0 software (Guindon and Gascuel 2003) using best-fitmodels. Markov chain Monte Carlo Bayesian searches were per-formed in MrBayes v. 3.1.2. (Huelsenbeck and Ronquist 2001) underthe best-fit models (two simultaneous runs, each with four chains; 3· 106 generations; sampling frequency one in every 100 generations;majority rule consensus trees constructed based on trees sampledafter burn-in). Branch support was evaluated by bootstrap analysis(1000 replicates) in ML and by posterior probabilities in Bayesiananalyses. Pairwise sequence diversity (uncorrected P) was calculatedusing the MEGA 5.05 software (Tamura et al. 2011).

T. castaneum gene homologs in Drosophila melanogaster weresearched using the OrthoDB Phylogenomic database. Each gene hasOrthoDB identificator, with Uniprot data linked to OrthoDB (Water-house et al. 2011). To find sets of biological annotations that fre-quently appear together and are significantly enriched in a set ofgenes located near TCAST-like elements, program GeneCodis 2.0available online (http://genecodis.dacya.ucm.es/) was used. GeneCodisgenerates statistical rank scores for single annotations and their com-binations. To find all the possible combinations of annotations, Gen-eCodis uses the apriori algorithm introduced by Agrawal et al. 1993.Once the annotations were extracted, a statistical analysis based on thehypergeometric distribution or the x2 test of independence was exe-cuted to calculate the statistical significance (P values) for each in-dividual annotation or co-annotations.

Two-tailed hypergeometric test with Bonferroni correction (alpha =0.025) was used to analyze the distribution of TCAST-like elementsamong T. castaneum chromosomes. In each chromosome the fre-quency of TCAST-like elements was compared with the frequency inthe complete sample and the significance of deviations was calculated.

RESULTS

Identification of dispersed TCAST-like elementsUsing the consensus sequence of TCAST satellite DNA (Ugarkovi�cet al. 1996) as a query sequence, we screened the NCBI refseq_genomic database of T. castaneum with the alignment programBLASTN version 2.2.22+. The program was optimized to search forhighly similar sequences (megablast) and blast hits on the query se-quence were analyzed individually. Alignments were mapped regard-ing start and end site, chromosome number, and total length. Whenthe distance between two alignments on the same chromosome wasshort, the genomic sequence was further analyzed by dot plot toidentify any potential continuity between the two alignments. Onlygenomic sequences with at least 140 nt (40% of TCAST monomerlength) of continuous sequence and .80% identity to the TCASTconsensus sequence were considered for further analysis. The totalnumber of dispersed TCAST-like elements was 68, with 36 elementsflanked by genes at both 59 and 39 ends, 3 elements flanked by a singlegene either at 59 or 39 end (sequences no. 36, 39, 50), and the 29elements positioned within introns (Table 1). Except 68 TCAST-likeelements associated with genes, no other dispersed TCAST-like

932 | J. Brajkovi�c et al.

elements were found within the assembled T. castaneum genome.Analysis of scaffolds that have not been mapped to linkage groupsrevealed the presence of an additional 41 TCAST-like elements, butbecause they were not mapped to T. castaneum genome and couldpossibly derive from heterochromatin, we did not consider them forfurther analysis.

There were only three cases in which two different TCAST-likeelements were associated with the same gene: gene D6X2C4 containsTCAST-like sequences no. 6 and 13 within introns, gene D6X2U7 isflanked at 59 and 39 end by sequences no. 5 and 7, respectively,whereas gene D6WB29 is located at 39 end of the sequence no. 53and has sequence no. 52 within an intron. All other TCAST-likeelements were positioned near or within different genes. Thus in total,there were 101 genes found in the vicinity of TCAST-like elements.Characteristics of the genes associated with TCAST-like elements,including gene identity number, gene name and chromosomal loca-tion, position relative to the associated TCAST-like element, and dis-tances between TCAST-like elements and genes, are shown in Table 1Distances between TCAST-like elements and genes range from 262 nt(gene positioned at 39 site of the sequence no. 36), to a maximaldistance of 404,270 nt (gene positioned at 59 site of the sequenceno. 5).

Characteristics of TCAST-like elementsTCAST satellite-like elements: Sequence analysis of the 68 TCAST-like elements identified within the vicinity of genes enabled theirclassification into two groups. The first group contains partial TCASTsatellite monomers or tandemly arranged elements, either complete orpartial dimers, trimers, or tetramers (Table 1). The minimal size ofsatellite repeat was 203 nt (0.6 of complete TCAST monomer; se-quence no. 15), whereas the maximal size was 1440 nt (four completeTCAST monomers; sequence no. 43; Table 1). In many sequences,two subtypes of TCAST satellite monomers were mutually inter-spersed: Tcast1a and Tcast1b. Tcast1b corresponds to the TCASTsatellite consensus that was used as a query sequence (Ugarkovi�c et al.1996), and Tcast1a corresponds to the TCAST subfamily described inFeliciello et al. 2011. Tcast1a and Tcast1b have an average homologyof 79% and are of similar sizes at 362 bp and 377 bp respectively, butare characterized by a divergent, subfamily specific region of approx-imately 100 bp (Feliciello et al. 2011). There were 34 TCAST satellite-like elements found within or in the region of 53 genes. Lengths ofTCAST satellite-like elements (Table 1), their exact start and end siteswithin genomic sequence and composition (supporting information,Table S1) are provided.

To see whether there is any clustering of sequences of TCASTsatellite-like elements due to the difference in the homogenization atthe level of local array, chromosome, or among different chromo-somes, sequence alignment and phylogenetic analysis were performed.Tcast1a and Tcast1b subunits were extracted from TCAST satellite-like sequences and analyzed separately. Alignment was performed on24 Tcast1a subunits, ranging in size from 136 and 377 bp (File S1).The average pairwise distances between Tcast1a subunits of TCASTsatellite-like sequences was 5.8%. Alignment adjustment using Gblocks,which eliminates poorly aligned positions and divergent regions,resulted in few changes; therefore, the original, unadjusted align-ment was used for the construction of phylogenetic trees. Becausethe sequences differ in lengths and comprise regions of divergentvariability, methods that take into account specific models of DNAevolution were considered as the most suitable for the constructionof phylogenetic trees, ML and Bayesian (Markov chain Monte

Carlo). The ML tree showed weak resolution with no significantsupport for clustering of sequences derived from the same satellite-likearray or from the same chromosome. Similarly, the Bayesian tree dem-onstrated no significant sequence clustering (Figure 1A).

Alignment of 28 Tcast1b subunits, ranging from 159 bp to 363 bp(File S2), was also not significantly affected by adjustment withGblocks; therefore, the unadjusted alignment was used for the con-struction of phylogenetic trees (Figure 1B). The average pairwise di-vergence between Tcast1b subunits, of TCAST satellite-like sequences,was 6.7%. With the ML phylogenetic tree, four groups composed oftwo or three sequences, were resolved by relatively low bootstrapvalues. However, the majority of Tcast1b subunit sequences remainedunresolved. There was no clustering of subunits derived from thesame array or the same chromosome (Figure 1B). Bayesian tree anal-ysis produced one significantly supported cluster composed of 10sequences derived from 7 chromosomes (Figure 1B).

TCAST transposon-like elements: The second group of TCAST-likerepeats is represented by a complex element that contains an almostcomplete TCAST (or Tcast1b) monomer, and a TCAST monomersegment of approximately 121 bp in an inverted orientation. Thesetwo TCAST segments are separated by a nonsatellite sequence ofapproximately 306 bp. Both TCAST segments are part of TIRs that areapproximately 269 bp long (Figure 2). As a result of the long TIRs,these elements are likely to form stable hairpin secondary structuresand therefore resemble transposons. The nonsatellite part of sequence,common for all TCAST transposon-like elements, is unique in that itdoes not exhibit significant homology to any other sequence withinthe T. castaneum genome. There were 34 TCAST transposon-likeelements found within or in the vicinity of 50 genes. Their lengths(Table 1) and exact start and end sites within genomic sequence(Table S1) are provided. Sequence analysis of TCAST transposon-likeelements determined that 13 of them were. 1000 bp, with a maximalsize of 1181 bp (Table 1). The remaining TCAST transposon-likeelements were shorter, with a minimal size of 314 bp (sequence no.27), and usually lacking part of, or one or both, TIRs. Conserved TIRsare necessary for transposition, and if they are absent, truncated, ormutated so that the transposase cannot interact with the transposonsequence, the transposon cannot be mobilized and therefore repre-sents a molecular fossil of a once active transposon (Capy et al. 1998).Despite mutations and partial truncations of TIRs within the TCASTtransposon-like elements, and likely because of the length of the TIRs,most of the elements still preserve a stable secondary structure andcould potentially remain mobile.

Some TCAST transposon-like elements .1000 bp have a 3-bpduplication at the site of insertion in the form of ACT. One TCASTtransposon-like element (sequence no. 39) is inserted into anotherrepetitive DNA, indicated as Tcast2, which had been previously iden-tified bioinformatically (Wang et al. 2008). Sequence analysis of thistransposon-like element confirms the continuity of Tcast2 from theduplication site “ACT.” Typically, the size of target-site duplication isa hallmark of different superfamilies of eukaryotic DNA transposons,with mariner/Tc1, the only superfamily whose members are charac-terized by either 2- or 3-bp target-site duplication (Capy et al. 1998;Kapitonov and Jurka 2003; Feschotte and Pritham 2007). There arethree open reading frames (ORFs) within TCAST transposon-likesequences, but the resulting putative proteins are very short and donot share similarity with any other proteins (Figure 2). The elementstherefore do not code for transposases and are considered nonauton-omous. Using the whole TCAST transposon-like elements as a query

Volume 2 August 2012 | Gene-Associated Satellite DNA | 933

nTa

ble

1TC

AST

-like

elem

ents

asso

ciated

withgen

eswithinT.ca

staneum

euch

romatin

Uniprot

Entrez

Gen

eNam

eChr

Sat_seq.

Positio

nDistanc

e,bp

DM

Hom

olog

FBgn

Type

Leng

thCop

ies

D6W

ZP1

6625

64Alte

reddisjunc

tion

91

5918

,773

Q9V

EH1

FBgn0

0000

63Sa

tellite

734

2.0

D6W

ZP3

6626

24Ra

s-relatedprotein

Rab-26

91

3977

95Q9V

P48

FBgn0

0869

13Sa

tellite

734

2.0

D6W

ZL9

6619

47Prob

able

serin

e/threon

ine-protein

kina

se9

2Inside

Q0K

HT7

FBgn0

0526

66Sa

tellite

993

2.8

D6X

226

6602

75Arrest

93

5999

,669

Q8IP8

9FB

gn0

0001

14Sa

tellite

716

2.0

D6X

238

6617

41Num

b9

339

115,98

4P1

6554

FBgn0

0029

73Sa

tellite

716

2.0

1001

4183

2no

match

onun

iprot

94

5915

20Sa

tellite

517

1.4

D6X

2D0

6604

40Sh

ort-ch

aindeh

ydrogen

ase

94

3967

04Q9V

E80

FBgn0

0386

10Sa

tellite

517

1.4

D6X

1E7

6568

84Cytoc

hrom

eP4

5030

6A1

95

5940

4,27

0Q9V

WR5

FBgn0

0049

59Sa

tellite

1058

2.9

D6X

2U7

6569

77Elon

gase

95

3999

47Q9V

CY6

FBgn0

0389

86Sa

tellite

1058

2.9

D6X

2C4

6601

95Dop

aminerece

ptor1

96

Inside

P415

96FB

gn0

0115

82Sa

tellite

304

0.8

D6X

2U7

6569

77Elon

gase

97

5971

28Q9V

CY6

FBgn0

0389

86Sa

tellite

394

1.1

D6X

366

6570

55elon

gationof

very

long

chainfatty

acidsprotein

97

3950

,111

Q9V

CY5

FBgn0

0531

10Sa

tellite

394

1.1

D6X

0D7

6577

48Re

ton

cogen

e9

859

56,625

Q8INU0

FBgn0

0118

29Sa

tellite

213

0.6

D6X

0E1

6578

29Dpr9

98

3962

,781

Q9V

FD9

FBgn0

0382

82Sa

tellite

213

0.6

D6X

2H8

6549

54ADAM

metalloprotease

99

Inside

Q6Q

U65

FBgn0

0513

14Tran

sposon

1107

D6X

2U7

6555

61Elon

gase

910

5947

,902

Q9V

CZ0

FBgn0

0389

83Tran

sposon

1085

D6X

2V3

6556

40Pu

tativ

eun

characterized

protein

910

3967

,953

Q9V

DB7

FBgn0

0388

81Tran

sposon

1085

D6X

244

6550

11Se

rine/threon

ine-protein

kina

se32

B9

11Inside

Q0K

ID3

FBgn0

0529

44Tran

sposon

1062

D6X

374

1001

4152

1Pu

tativ

eun

characterized

protein

912

Inside

Q9V

GZ4

FBgn0

0378

14Sa

tellite

292

0.8

D6X

2C4

6601

95Dop

aminerece

ptor1

913

Inside

P415

96FB

gn0

0115

82Tran

sposon

900

D6X

259

6562

90Tran

sportan

dGolgio

rgan

ization13

914

5994

56Q9V

GT8

FBgn0

0402

56Sa

tellite

222

0.6

D6X

260

6563

73Protein-tyrosine

sulfo

tran

sferase

914

3933

,523

Q9V

YB7

FBgn0

0866

74Sa

tellite

222

0.6

D6X

075

6586

03MICAL-likeprotein

915

5939

,684

Q9V

U34

FBgn0

0363

33Sa

tellite

203

0.6

D6X

1P2

6588

91tip

top

915

3914

2,82

1Q9U

3V5

FBgn0

0289

79Sa

tellite

203

0.6

D6X

095

6591

95Trop

onin

C9

1659

3922

P479

47FB

gn0

0133

48Tran

sposon

589

D6X

0I1

6593

36Trop

onin

C9

1639

15,143

P479

47FB

gn0

0133

48Tran

sposon

589

D6X

1J0

6557

13Tran

sporter

917

Inside

Q9N

B97

FBgn0

0341

36Sa

tellite

915

2.5

D6W

F56

1001

4187

7zinc

fing

erprotein

250

318

5912

5,68

5Q7K

AH0

FBgn0

0273

39Sa

tellite

1208

3.4

D6W

F61

6569

24Tran

scrip

tioninitiationfactor

TFIID

subun

it7

318

3964

,294

Q9V

HY5

FBgn0

0249

09Sa

tellite

1208

3.4

D6W

GB1

6590

40Mah

ya3

1959

115,53

7P2

0241

FBgn0

0029

68Sa

tellite

687

1.9

D6W

GB5

6592

01V-typ

eprotonATP

asesubun

itE

319

3982

,217

P546

11FB

gn0

0153

24Sa

tellite

687

1.9

D6W

II010

0141

571

NADH

deh

ydrogen

ase,

putative

320

5942

78Q9W

3N7

FBgn0

0299

71Tran

sposon

635

D6W

II210

0142

263

Putativ

eun

characterized

protein

320

3918

,585

Q9V

IY1

FBgn0

0327

69Tran

sposon

635

D6W

FT8

6575

35WD

repea

t-co

ntaining

protein

473

21Inside

Q96

0Y9

FBgn0

0264

27Sa

tellite

604

1.7

D6W

DY2

6561

25Kyn

uren

ineam

inotransferase

322

5910

,696

Q8S

XC2

FBgn0

0379

55Tran

sposon

1000

D6W

DY4

6562

98Ann

exin

IX3

2239

11,599

P224

64FB

gn0

0000

83Tran

sposon

1000

D6W

FK8

6561

74an

kyrin

2,3/un

c44

323

Inside

Q7K

U95

FBgn0

0854

45Tran

sposon

1081

D6W

FX1

6578

74ralg

uanine

nucleo

tideex

chan

gefactor

324

5964

,025

Q8M

T78

FBgn0

0341

58Tran

sposon

888

D6W

FX3

6580

31galactose-1-pho

spha

teuridylyltran

sferase

324

3939

58Q9V

MA2

FBgn0

0318

45Tran

sposon

888

D6W

DQ4

6592

33Pu

tativ

eun

characterized

protein

325

5915

,051

Q8T

0R9

FBgn0

0388

09Tran

sposon

1016

D6W

DQ6

6593

76co

iled-coild

omainco

ntaining

963

2539

9162

A1Z

A72

FBgn0

0139

88Tran

sposon

1016

D6W

F68

6550

42gluco

sedeh

ydrogen

ase

326

5925

,896

Q9V

Y00

FBgn0

0305

98Tran

sposon

1067

C3X

Z92

6553

48Mito

gen-activ

ated

protein

kina

sekina

sekina

sekina

se2

326

3992

,876

Q8S

YA1

FBgn0

0344

21Tran

sposon

1067

(con

tinue

d)

934 | J. Brajkovi�c et al.

nTa

ble

1,co

ntinue

d

Uniprot

Entrez

Gen

eNam

eChr

Sat_seq.

Positio

nDistanc

e,bp

DM

Hom

olog

FBgn

Type

Leng

thCop

ies

D6W

E82

6584

63Pu

tativ

eun

characterized

protein

327

Inside

Q9V

DK2

FBgn0

0388

15Tran

sposon

314

D6W

HX6

6581

91Pu

tativ

eun

characterized

protein

328

5917

3,88

1Q1R

KQ9

FBgn0

0853

82Tran

sposon

826

D6W

I58

6583

43Cathe

psinL

328

3982

,559

Q95

029

FBgn0

0137

70Tran

sposon

826

D6W

DJ9

6569

22Pu

tativ

eun

characterized

protein

329

5917

3,54

8Q8IPJ

1FB

gn0

0318

59Tran

sposon

1084

D6W

DN0

6575

59PR

MT5

329

3938

3,80

9Q9U

6Y9

FBgn0

0159

25Tran

sposon

1084

D6W

GS3

6569

76Pu

tativ

eun

characterized

protein

330

5937

572

A0A

MQ8

FBgn0

0346

55Sa

tellite

216

0.6

D6W

GT0

1001

4251

5calpain3

330

3922

6,70

7Q11

002

FBgn0

0086

49Sa

tellite

216

0.6

D6W

DS8

6605

32Muscle-sp

ecificprotein

300

331

5937

8,62

6Q4A

BG9

FBgn0

2609

52Tran

sposon

1060

D6W

DT0

6548

60Ph

ospha

tidylinosito

l-bindingclathrin

assembly

protein

331

3978

55C1C

3H4

FBgn0

0863

72Tran

sposon

1060

D6W

HF2

6641

88Nep

hrin

332

Inside

Q9W

4T9

FBgn0

0283

69Tran

sposon

666

D6W

I96

1001

4262

0Hea

tshoc

kprotein

703

33Inside

P111

47FB

gn0

0012

19Tran

sposon

1058

D6W

G02

6549

17N-ace

tylgluco

saminyltran

sferasevi

334

Inside

Q9V

UH4

FBgn0

0364

46Tran

sposon

319

D6W

YD1

6568

91Pu

tativ

eun

characterized

protein

835

5938

5,71

2Q8S

Y79

FBgn0

0322

49Sa

tellite

625

1.7

D6W

YN3

6549

42serin

e-typeproteaseinhibito

r8

3539

58,583

Q9V

SC9

FBgn0

0358

33Sa

tellite

625

1.7

D6W

YA1

6579

13Cop

iaprotein

(Gag

-int-pol

protein)

836

3926

2B6V

6Z8

??Sa

tellite

196

0.5

D6W

YC9

6567

18Cmp-n-ace

tylneu

raminic

acid

syntha

se8

37Inside

B5R

JF3

FBgn0

0522

20Tran

sposon

831

D6W

V42

6560

28CG50

808

38Inside

Q7K

3E2

FBgn0

0313

13Tran

sposon

582

D6W

YA0

1001

4250

7Bea

tenpath

839

5971

65Q94

534

FBgn0

0134

33Tran

sposon

1181

D6W

UX6

6622

35Pu

tativ

eun

characterized

protein

840

Inside

Q7K

UK9

FBgn0

0364

54Tran

sposon

440

D6X

0E1

6549

38defec

tiveprobo

scisex

tensionresp

onse

741

Inside

Q9V

FD9

FBgn0

0382

82Sa

tellite

722

2.0

D6W

PX8

6620

21Ribosom

e-releasingfactor

2,mito

chon

dria

l7

4259

17,480

Q9V

CX4

FBgn0

0511

59Tran

sposon

905

A2A

X72

6620

58Gustatory

rece

ptor

742

3915

81Q9V

PT1

FBgn0

0412

50Tran

sposon

905

D6W

TD1

6618

95similarto

chitina

se6

743

Inside

Q9W

2M7

FBgn0

0345

80Sa

tellite

1440

4.0

D6W

PE6

1001

4207

3vo

ltage-gated

potassium

chan

nel

744

Inside

P179

70FB

gn0

0033

83Tran

sposon

814

D2A

2C6

6638

49Pu

tativ

eun

characterized

protein

445

5994

89Q9V

3S3

FBgn0

0133

00Sa

tellite

549

1.5

D2A

2D1

6638

75Pu

tativ

eun

characterized

protein

445

3910

,920

Q9W

191

FBgn0

0349

94Sa

tellite

549

1.5

D2A

2I0

6570

17Pu

tativ

eun

characterized

protein

446

5958

20Q8S

Z28

FBgn0

0337

86Sa

tellite

558

1.6

D2A

2I1

6570

98Ribon

ucleoside-dipho

spha

tereduc

tase

446

3970

00P4

8591

FBgn0

0120

51Sa

tellite

558

1.6

D1Z

ZG6

6609

83Kinesin-like

protein

447

Inside

Q9V

LW2

FBgn0

0319

55Tran

sposon

508

D2A

2P8

1001

4259

5Pigg

yBac

tran

sposab

leelem

ent

448

Inside

Q9V

HL1

FBgn0

0376

33Tran

sposon

377

D6W

B65

6550

28E7

42

4959

60,525

P201

05FB

gn0

0005

67Sa

tellite

770

2.1

D6W

B73

6549

62orga

niccatio

ntran

sporter

249

3926

38Q7K

3M6

FBgn0

0344

79Sa

tellite

770

2.1

D6W

BG8

6591

29pre-m

RNA-splicinghe

licaseBRR

22

5039

4811

Q9V

UV9

FBgn0

0365

48Sa

tellite

728

D6W

B14

6588

44mon

ophe

nolic

aminetyramine

251

5979

55P2

2270

FBgn0

0045

14Tran

sposon

567

D6W

B15

6587

69Cuticular

protein

47Ef

251

3916

,173

A1Z

8H7

FBgn0

0336

03Tran

sposon

567

D6W

B29

6577

78En

dop

roteaseFU

RIN

252

Inside

P304

32FB

gn0

0045

98Tran

sposon

1045

A8D

IV5

6579

42Nicotinic

acetylch

olinerece

ptorsubun

italpha

112

5359

13,645

P251

62FB

gn0

0041

18Tran

sposon

1021

D6W

B29

6577

78En

dop

roteaseFU

RIN

253

3948

75P3

0432

FBgn0

0045

98Tran

sposon

1021

D6X

3I9

6617

87Tran

scrip

tioninitiationfactor

IIF10

5459

10,025

Q05

913

FBgn0

0102

82Sa

tellite

870

2.4

D6X

3J1

6618

27Pu

tativ

eun

characterized

protein

1054

3966

07Sa

tellite

870

2.4

D6X

4P3

6553

89Neu

tral

alph

a-gluco

sidaseab

1055

Inside

Q7K

MM4

FBgn0

0275

88Sa

tellite

694

1.9

D6X

3H5

6612

46Neu

rexin-4

1056

5922

34Q94

887

FBgn0

0139

97Sa

tellite

224

0.6

D6X

3H7

6613

08Su

ccinatesemialdeh

ydedeh

ydroge

nase

1056

3914

,901

Q9V

BP6

FBgn0

0393

49Sa

tellite

224

0.6

D6X

4V6

6559

16Tu

bby,

putative

1057

Inside

Q9V

B18

FBgn0

0395

30Tran

sposon

763

(con

tinue

d)

Volume 2 August 2012 | Gene-Associated Satellite DNA | 935

sequence, we searched the T. castaneum Gen Bank database for “full-sized” homologous elements that could potentially code for transpo-sases and be considered autonomous. The search identified an element,named TR 1.9, with a 925-bp sequence inserted within a unique se-quence of the TCAST transposon-like elements (Figure 2). This925-bp sequence contains an ORF of 206 amino acids and a con-served domain belonging to the Transposase 1 superfamily, whichalso includes the mariner transposase. DNA transposons of themariner/Tc1 superfamily Mariner-1_TCa and Mariner-2_TCa, wereidentified within the T. castaneum genome (Jurka 2009a, 2009b).Using BLASTP and the translated sequence from the 925 bp ORFas a query sequence, we identified hits with a partial homology toa Mariner-2_TCa transposase and to a mariner-like element trans-posase present in two other insects, the beetle Agrilus planipennis(emerald ash borer) and Chrysoperla plorabunda (green lacewing;Neuroptera), but not to Mariner-1_ TCa transposase.

To test whether there is any chromosome-specific sequenceclustering of TCAST transposon-like sequences that could suggestdifference in homogenization within chromosome and among differ-ent chromosomes, the alignment and subsequent phylogenetic analysisof TCAST transposon-like sequences was performed. Because TCASTtransposon-like elements differ significantly in size (31421181 nt), thealignment and phylogenetic analyses was performed on 25 elementsthat mutually overlap in their sequences, whereas the other nineTCAST transposon-like elements were excluded from the analysisdue to the very low overlapping with other elements. Alignment wasadditionally adjusted using Gblocks (File S3). The average pairwisedivergence among TCAST transposon-like sequences was 12.7%.ML and Bayesian methods gave similar tree topologies (Figure 1C).The ML tree showed very weak resolution of TCAST transposon-likesequences and a general absence of subgroups with specific sequencecharacteristics (Figure 1C). Only two clusters were formed whereas,using the Bayesian tree, we identified three well-supported groups; twoof them were as for ML tree (Figure 1C).

Distribution of TCAST-like elementson T. castaneum chromosomesTCAST-like elements found in the vicinity of genes were distributedon all 10 T. castaneum chromosomes (Table 1). Positions of consti-tutive heterochromatin and euchromatin were assigned on the haploidset of T. castaneum chromosomes, based on C-banding data (Stuartand Mocelin 1995) and Tribolium castaneum 3.0 Assembly data (Fig-ure 3). Within euchromatic segments, the position of each TCAST-like element is specifically indicated (Figure 3) based on the positionwithin the genomic sequence (Table S1). TCAST-like elements weredispersed on both arms of chromosomes 3, 5, 9, and 7, whereas onother chromosomes they were located on a single arm (Figure 3). Thenumber of TCAST-like elements ranged from 2 on chromosome 1(X)to 17 on chromosomes 3 and 9. To detect whether TCAST-like ele-ments were distributed randomly among the T. castaneum chromo-somes or whether there was a significant over or underrepresentationof the elements on some chromosomes we performed hypergeometricdistribution analysis test. The analysis revealed no statistically signif-icant deviation in the number of TCAST-like elements among thechromosomes (Figure S1), pointing to their random distribution.

To determine whether there was a target preference for theinsertion of TCAST-like elements, for example high AT content oranother sequence characteristic, we analyzed the AT content within100 bp of the flanking regions for each TCAST -like element, fromboth 59 and 39 sites (Figure S2 and Figure S3). The average AT contentn

Table

1,co

ntinue

d

Uniprot

Entrez

Gen

eNam

eChr

Sat_seq.

Positio

nDistanc

e,bp

DM

Hom

olog

FBgn

Type

Leng

thCop

ies

D6X

3J6

6620

34Pu

tativ

eun

characterized

protein

1058

5910

15Q9V

EJ9

FBgn0

0385

11Sa

tellite

564

1.6

D6X

3J7

6570

69cd

c73dom

ainprotein

1058

3927

,239

Q9V

HI1

FBgn0

0376

57Sa

tellite

564

1.6

D2A

693

6632

31lysine

-spec

ificdem

ethy

lase

4B6

59Inside

Q9V

6L0

FBgn0

0531

82Sa

tellite

498

1.4

D2A

490

6596

55Fa

cilitated

treh

alosetran

sporter

Tret1-2ho

molog

660

Inside

Q8M

KK4

FBgn0

0336

44Tran

sposon

689

D2A

6I4

6597

28Pu

tativ

eun

characterized

protein

661

5911

6,03

0Q9W

4G2

FBgn0

2609

71Tran

sposon

764

D2A

6I6

6597

91Pu

tativ

eun

characterized

protein

661

3948

60Q9V

NB4

FBgn0

0373

23Tran

sposon

764

D2A

3V0

6572

72Fa

sciclin

-36

6259

37,286

P152

78FB

gn0

0006

36Sa

tellite

281

0.8

D2A

3V3

6574

21LIM

dom

ainkina

se1

662

3921

,789

Q8IR7

9FB

gn0

0412

03Sa

tellite

281

0.8

D6W

8F4

6603

22Disco

-related

x63

Inside

Q9V

XJ5

FBgn0

0426

50Sa

tellite

530

1.5

D6W

8D3

6591

23Plex

Ax

6459

1973

O96

681

FBgn0

0257

41Tran

sposon

848

D6W

GD2

6592

72Aldose-1-ep

imerase

x64

3964

72Q9V

RU1

FBgn0

0356

79Tran

sposon

848

B3M

MG1

6576

52Neu

ral-c

adhe

rin5

65Inside

O15

943

FBgn0

0156

09Sa

tellite

273

0.8

D6W

NN6

6585

79Tran

sien

trece

ptorpoten

tial-g

amma

protein

566

5925

47Q9V

JJ7

FBgn0

0325

93Tran

sposon

894

A3R

E80

6586

61Cardioacce

leratory

pep

tiderece

ptor

566

3927

,105

Q86

8T3

FBgn0

0393

96Tran

sposon

894

A1J

UG2

6612

07Ultraspira

cle

567

Inside

P201

53FB

gn0

0039

64Sa

tellite

379

1.1

D6W

NB3

6560

63Ybox

protein

568

5914

,993

O46

173

FBgn0

0229

59Sa

tellite

455

1.3

D6W

NB6

6560

95Pe

ptid

ech

ainreleasefactor

15

6839

350,36

5Q9V

K20

FBgn0

0324

86Sa

tellite

455

1.3

Alistof

gen

eswith

gen

eiden

titynu

mbers,

gen

ena

me,

chromosom

allocatio

n,position

,an

ddistanc

erelativ

eto

theassociated

TCAST

-like

elem

ent,as

wella

salistof

TCAST

-like

elem

ents,theirtypes

(satelliteor

tran

sposon

-like

),totallen

gth

inbp,an

dco

pynu

mber

ofsatellite

repea

tswith

inan

arrayareshow

n.

936 | J. Brajkovi�c et al.

of the flanking regions for both TCAST satellite-like elements andTCAST transposon-like elements did not differ significantly fromthe average AT content of the whole T. castaneum genome or fromthe AT content of randomly selected intergenic regions andintrons. Thus, this finding suggests that with regard to AT con-tent, there is no target preference for the insertion of TCAST-likeelements. Furthermore, alignment and comparison of all flankingsequences of TCAST-like elements did not identify any commonsequence motifs.

Genes in the vicinity of TCAST-like elementsUniprot gene numbers were used as identifiers of genes located in thevicinity of TCAST-like elements (gene names shown in Table 1).Uniprot gene numbers for homologous genes found in Drosophilamelanogaster are also indicated (Table 1). Detailed description ofthe genes, including molecular function of their protein products,biological processes in which these proteins are involved, and theircellular localization (cellular component), are shown (Table S1). Eachidentified gene is assigned to a particular TCAST-like element withinits vicinity, and the precise position of TCAST-like elements in geno-mic sequence (start and end site) is indicated (Table S1). Functionalanalysis revealed that 17 of 101 genes correspond to putative unchar-acterized proteins, whereas the remaining genes are involved in dif-ferent molecular functions and diverse biological processes. Amongthe proteins, a proportion is characterized by ATP binding activity (13proteins) and involvement in protein phosphorylation and /or signaltransduction (9 proteins; Table S1).

To determine whether TCAST-like elements are distributedrandomly relative to genes or whether they are overrepresented nearspecific groups of genes, we used GeneCodis 2.0 to provide a statisticalrepresentation of the genes associated with TCAST-like elements.Because many genes are still not annotated in T. castaneum andfurthermore T. castaneum genomic data are not included in Gene-Codis, we used gene numbers for orthologous genes from D. mela-

nogaster for the analysis and compared them with the whole setof 14,869 genes annotated in D. melanogaster. Genecodis analysisrevealed that TCAST-like elements are located near nine genes char-acterized as members of the immunoglobulin protein superfamily.Because there are only 134 immunoglobulin-like genes presentwithin the total set of D. melanogaster genes, random distributionof TCAST-like elements would result in their occurrence near ap-proximately a single immunoglobulin-like gene. The presence ofTCAST-like elements in the vicinity of nine immunoglobulin-likegenes therefore represents a statistically significant overrepresenta-tion (0.00000427). All nine genes exhibit structural features of im-munoglobulin-like, immunoglobulin subtype 1 and immunoglobulinsubtype 2 proteins and are associated with the following TCASTtransposon-like elements: 25 at the 39end, 28 and 39 at the 59 end,32 and 40 within introns, and TCAST satellite-like elements: 8 at the39 end, 19 and 62 at the 59 end, and 41 within intron (Table 1). Aminimal distance between TCAST-like element and immunoglobulin-like gene was 7165 bp and a maximal 173,881 bp (Table 1). Molecularfunction of most of immunoglobulin-like genes is unknown, and theyare involved in different biological processes such as cell adhesion,protein phosphorylation, and axon guidance (Table S1). Although allnine genes belong to immunoglobulin superfamily, they did not ex-hibit sequence similarity, which could suggest role of duplication intheir evolution and spreading. The position of TCAST-like elementsrelative to the genes also was not consistent with the possibility thatTCAST-like elements duplicated along with the immunoglobulin- likegenes.

Overrepresentation of TCAST-like elements was also found neargenes that exhibit ATP-binding activity and axon guidance propertiesbut with a marginal significance (0.0183374 and 0.00865139). Forthe rest of genes, no significant overrepresentation of TCAST-likeelements was detected. Thus, enrichment of TCAST-like elements inthe vicinity of immunoglobulin-like genes potentially implicates arole of TCAST-like elements in the regulation of these genes.

Figure 1 Bayesian/ML phylogenetic trees of: (A) TCAST satellite-like elements (subunits Tcast1a), (B) TCAST satellite-like elements (subunitsTcast1b), and (C) TCAST transposon-like elements. Sequence numbers correspond to those in Table 1. When a particular sequence is composedof few subrepeats (e.g., Tcast1a or Tcast1b), numbers indicating subrepeats are added (e.g., 43_1, 43_2, 43_3). Numbers in brackets indicatechromosomes on which the corresponding sequences are located. Numbers on branches indicate Bayesian posterior probabilities/ML bootstrapsupport (above 0.5/50%, respectively).

Volume 2 August 2012 | Gene-Associated Satellite DNA | 937

DISCUSSIONTEs are classified in several dozen families based on transpositionmechanisms and different dynamics properties (Hua-Van et al. 2005).Active TEs encode the enzymes necessary for their transposition,either to move between nonhomologous regions in the genome orto copy themselves to other positions. In many cases, TEs do notproduce their own enzymes but are able to use those from functionalcopies or even from other TEs families. Defective and inactive TEsoften are amplified in regions of low recombination such as hetero-chromatin and may form tandemly repeated satellite DNAs. Theorigin of satellite DNA array from transposon-like elements isreported for many insects such as Drosophila melanogaster (Agudoet al. 1999), Drosophila guanche (Miller et al. 2000), and the beetleMisolampus goudoti (Pons 2004) whereas the retroviral-like featureswere first observed in the satellite DNA from rodents of the genusCtenomys (Rossi et al. 1993).

Transposons can be inserted into other repetitive sequences suchas satellite DNAs, as has been observed for the mariner-like elementand MITE element, both inserted into satellite DNA of the antMessorbouvieri (Palomeque et al. 2006). Searching for repetitive elementshomologous to the TCAST repeat within Repbase (http://www.girinst.org/repbase/) revealed that 59 UTR of nonlong terminal re-peat retrotransposon CR1-3_TCa (Jurka 2009c) shares a high sim-ilarity of 83% with a 444-bp long TCAST sequence composed of 1.2

tandem monomers (Figure 1). Other CR1 subfamilies identifiedwithin T. castaneum such as CR1-1_ TCa, CR1-2_TCa, and CR1-4_TCa, published in Repbase, do not share similarity to CR1-3 anddo not contain TCAST similar sequence. We propose that CR1-3 wasinserted within TCAST satellite array and through recombinationhas acquired a part of TCAST sequence. Newly acquired TCASTelement could act as a promoter because TCAST satellite DNA hasan internal promoter for RNA Pol II (Pezer and Ugarkovi�c 2012)and becomes a new functional 59 UTR. Subsequent retrotransposi-tion of CR1-3_TCa could explain the dispersion of TCAST withinthe euchromatin (Figure 4). Three CR1-3_TCa elements withTCAST in the 59UTR were identified within scaffolds that havenot been mapped to linkage groups. However, truncated fragmentswith partial homology to CR1-3_TCa retrotransposon can be map-ped within T. castaneum genome, some of them in the vicinity ofTCAST elements. Such arrangement also indicates the role of CR1-3_TCa in the spreading of TCAST elements. There is also a possibilitythat TCAST satellite DNA originates from CR1-3 retrotransposonwhich was, after inactivation, amplified within the heterochromatinregion. In the case of TCAST transposon-like elements, part of thesatellite sequence is incorporated within TIRs which are character-istic for DNA transposons. The presence of target-site duplicationsat the sites of insertions of some TCAST transposon-like elementsalso indicates transposition as a mode of spreading of TCAST

Figure 2 Organization of TCASTelements within T. castaneumgenome in the form of TCASTtransposon-like element, tan-dem arrays, and CR1-3_TCaretrotransposon. Regions corre-sponding to TCAST elementare shown in red. TCAST trans-poson-like element contains analmost complete TCAST mono-mer and a monomer segmentof approximately 121 bp in aninverted orientation, whereasCR1-3 retrotransposon containssegment corresponding to 1.2monomer. Within TCAST trans-poson-like element terminalinverted repeats (arrows) uniquenonsatellite sequence (green),target-site duplication in theform of “ACT,” and the insertionpoint of 925-bp sequence found

within TR 1.9, element and coding for the putative transposase are shown. Three short ORFs within TCAST transposon-like element are alsoindicated. Within nonlong terminal repeat retrotransposon CR1-3_TCa regions corresponding to 59UTR and to two ORFs are indicated.

Figure 3 Distribution of TCAST-like elements on T. cas-taneum chromosomes. The karyotype representing thehaploid set of T. castaneum chromosomes, and posi-tions of constitutive heterochromatin (dark) and euchro-matin (white) are depicted based on C-banding data(Stuart and Mocelin 1995) and T. castaneum 3.0 assem-bly (http://www.beetlebase.org). TCAST transposon-likeelements (blue) and TCAST satellite-like elements (red)are shown. Two TCAST-like elements are representedas separate lines if they are at least 100 kb distant fromeach other.

938 | J. Brajkovi�c et al.

elements. Parts of satellite DNA elements can be found within sometransposons, such as pDv transposon (Evgen’ev et al. 1982; Zelentsovaet al. 1986) whose long direct terminal repeats show significant se-quence similarity to the pvB370 satellite DNA, located in the centro-meric heterochromatin of a number of species of the Drosophila virilisgroup (Heikkinen et al. 1995). The presence of short stretches ofPisTR-A satellite DNA sequences within 39 UTR of Ogre retrotrans-posons dispersed in the pea (Pisum sativum) genome was reported(Macas et al. 2009). Furthermore, the mobilization of subtelomericrepeats upon excision of the transposable P element from tandemlyrepeated subtelomeric sequences has been observed (Thompson-Stewart et al. 1994).

Incorporation of part of a TCAST satellite DNA sequence intoa (retro)transposable element, and its subsequent mobilization andspreading by (retro)transposition, may explain the distribution ofTCAST element in the vicinity of genes within euchromatin. SatelliteDNA sequences are prone to undergo recurrent repeat copy numberexpansion and contraction in divergent lineages as well as amongpopulations of the same species (Bosco et al. 2007). This amplificationappears to be random and does not correlate with phylogeny of thespecies (Pons et al. 2004; Lee et al. 2005; Bulazel et al. 2007). Ampli-fication of a satellite sequence is reported to occur as a result of un-equal crossing over or duplicative transposition (Smith 1976; Ma andJackson 2006). The discovery of human extrachromosomal elementsoriginating from satellite DNA arrays in cultured human cells anddifferent plant species indicates the possible existence of additionalamplification mechanisms based on rolling-circle replication (Assumet al. 1993; Navrátilová et al. 2008). It has been proposed that satellitesequences excised from their chromosomal loci via intrastrand recom-bination could be amplified in this way, followed by reintegration oftandem arrays into the genome (Feliciello et al. 2006). Moreover, it ispossible that such a mechanism affected TCAST satellite DNA, andthat extrachromosamal circles of TCAST were reintegrated into

different genome locations by homologous recombination basedon short stretches of sequence similarity between TCAST satelliteand target genomic sequence (Figure 4). Integrated TCAST sequen-ces are mainly composed of interspersed elements belonging to twomajor subfamilies, Tcast1a and Tcast1b, which is a prevalent type oforganization in pericentromeric heterochromatin (Feliciello et al.2011). This finding indicates that the origin of dispersed euchro-matic TCAST elements may be duplication of heterochromatincopies.

The distribution of TCAST-like elements relative to protein codinggenes revealed no specific preference for insertions within introns orat 59 or 39 ends of genes. TCAST-like elements are distributed on allchromosomes with no significant deviation in the number among thechromosomes, and phylogenetic analysis did not detect any significantsequence clustering of TCAST-like elements derived from the samechromosome. Dispersed TCAST satellite-like elements produce tan-dem arrays up to tetramers, but repeats from the same array do notreveal any significant clustering on phylogenetic trees. This findingindicates there is no significant difference in the homogenization ofTCAST satellite-like repeats at the level of local arrays or chromosomeor among different chromosomes. The average pair-wise sequencedivergence (6% for dispersed TCAST satellite-like repeats) is greaterthan the usual divergence of satellite elements located in heterochro-matin of tenebrionid beetles [approximately 2% (Ugarkovi�c et al.1996)]. This difference in homogeneity between repeats located inheterochromatin and euchromatin may be explained by a lower rateof gene conversion affecting dispersed satellite-like elements or bya specific mechanism of DNA repair acting on satellite DNA (Felicielloet al. 2006). TCAST transposon-like elements dispersed among thegenes within euchromatin have an even greater average sequencedivergence (approximately 12%) and also exhibit no significantchromosome-specific sequence clustering, indicating a similar rate ofhomogenization within and among the chromosomes. Relatively high

Figure 4 Models of spreading of TCAT-like elementsbased on (A) retrotransposition of CR-3_TCa element.CR1-3_TCa was inserted within TCAST satellite arrayand through recombination has acquired a part ofTCAST sequence, which could act as a promoter andbecome a new functional 59UTR. Subsequent retrotrans-position of CR1-3_TCa could explain the dispersion ofTCAST within the euchromatin. (B) Rolling circle replica-tion of TCAST satellite DNA sequences excised fromtheir heterochromatin loci via intrastrand recombina-tion, followed by reintegration into different genomelocations by homologous recombination.

Volume 2 August 2012 | Gene-Associated Satellite DNA | 939

sequence divergence of TCAST transposon-like elements and the sig-nificant truncation of the majority of them, indicates that the trans-position of these elements did not occur very recently and that theseelements could be considered as molecular fossils of the functionalTCAST transposon-like elements.

Cis-regulatory elements, such as promoters or transcription factorbinding sites, are predicted in some satellite DNAs (Pezer et al. 2011).Transcription from promoters for RNA Pol II is also characteristic forpericentromeric satellite DNAs from the beetles Palorus ratzeburgiiand Palorus subdepressus (Pezer and Ugarkovi�c 2008, 2009). Temper-ature-sensitive transcription of TCAST satellite DNA from an internalRNA Pol II promoter has been demonstrated (Pezer and Ugarkovi�c2012). Based on these findings, it can be proposed that TCAST ele-ments located in the vicinity of genes may function as alternativepromoters, and transcripts derived from them may interfere withthe expression of neighboring gene. This type of regulation is oftenobserved for retrotransposons positioned immediately 59 of proteingenes (Faulkner et al. 2009). In addition, some tissue-specific genepromoters are derived from retrotransposons (Ting et al. 1992;Samuelson et al. 1996). Because of rapid evolutionary turnover, satel-lite DNA sequences often are restricted to a group of closely relatedspecies, or in some instances are species specific. This is the case withTCAST satellite DNA, which is not even detected in the congenericTribolium species. If restricted satellite DNAs have regulatory poten-tial, then insertion of these elements in vicinity of genes could con-tribute to the establishment of lineage-specific or species-specificpatterns of gene expression. Annotation of genes in proximity toTCAST-like elements demonstrated a statistical overrepresentationof certain groups of genes, for example, those with immunoglobu-lin-like domains. Recently, in the fish Salvelinus fontinalis, a regulatoryrole of a 32-bp satellite repeat, located in an intron of the majorhistocompatibility complex gene (MHIIb), on MHIIb gene expressionwas demonstrated (Croisetiere et al. 2010). The level of gene expres-sion depends on temperature, as well as the number of satelliterepeats, and indicates a role for temperature-sensitive satellite DNAin gene regulation of the adaptive immune response. Further studiesare necessary to determine whether TCAST-like elements exhibit a po-tential regulatory role on nearby genes. The transcriptional potentialof satellite DNAs as well as their distribution close to protein-codinggenes, as shown in this study, provides strong support, that in additionto transposons, satellite DNAs represent a rich source for the assemblyof gene regulatory systems.

ACKNOWLEDGMENTSThis work was supported by Croatian Ministry of Science, Educationand Sport (grant no. 098-0982913-2832), European Union FP6 MarieCurie Transfer of Knowledge (grant MTKD-CT-2006-042248), andCOST Action TD0905 “Epigenetics: Bench to Bedside.”

LITERATURE CITEDAgrawal, R., T. Imielinski, and A. Swami, 1993 Mining association rules

between sets of items in large databases, pp. 207–216 in the Proceedings ofthe ACM SIGMOD International Conference on Management of Data,edited by P. Buneman and S. Jajodia. ACM Press, New York.

Agudo, M., A. Losada, J. P. Abad, S. Pimpinelli, P. Ripoll et al.,1999 Centromeres from telomeres? The centromeric region of the Ychromosome of Drosophila melanogaster contains a tandem array oftelomeric HeT-A- and TART-related sequences. Nucleic Acids Res. 27:3318–3324.

Assum, G., T. Fink, T. Steinbeisser, and K. J. Fisel, 1993 Analysis of humanextrachromosomal DNA elements originating from different beta-satellitesubfamilies. Hum. Genet. 91: 489–495.

Bosco, G., P. Campbell, J. T. Leiva-Neto, and T. A. Markow, 2007 Analysisof Drosophila species genome size and satellite DNA content revealssignificant differences among strains as well as between species. Genetics177: 1277–1290.

Britten, R. J., and E. H. Davidson, 1971 Repetitive and non-repetitive DNAsequences and a speculation on the origins of evolutionary novelty. Q.Rev. Biol. 46: 111–138.

Bulazel, K. V., G. C. Ferreri, M. D. Eldridge, and R. J. O’ Neill, 2007 Species-specific shifts in centromere sequence composition are coincident withbreakpoint reuse in karyotypically divergent lineages. Genome Biol. 8:R170.

Capy, P., C. Bazin, D. Higuet, and T. Langin, 1998 Dynamics and Evolutionof Transposable Elements. Springer-Verlag, Austin, TX.

Croisetiere, S., L. Bernatchez, and P. Belhumeur, 2010 Temperature andlength-dependent modulation of the MH class IIb gene expression inbrook charr (Salvelinus fontinalis) by a cis-acting minisatellite. Mol. Im-munol. 47: 1817–1829.

Edgar, R. C., 2004 MUSCLE: multiple sequence alignment with high ac-curacy and high throughput. Nucleic Acids Res. 32: 1792–1797.

Evgen’ev, M. B., G. N. Yenikolopov, N. I. Peunova, and Y. V. Ilyin,1982 Transposition of mobile genetic elements in interspecific hybridsof Drosophila. Chromosoma 85: 375–386.

Faulkner, G. J., Y. Kimura, C. O. Daub, S. Wani, C. Plessy et al., 2009 Theregulated retrotransposon transcriptome of mammalian cells. Nat. Genet.41: 563–571.

Feliciello, I., O. Picariello, and G. Chinali, 2006 Intra-specific variability andunusual organization of the repetitive units in a satellite DNA from Ranadalmatina: molecular evidence of a new mechanism of DNA repair actingon satellite DNA. Gene 383: 81–92.

Feliciello, I., G. Chinali, and Ð. Ugarkovi�c, 2011 Structure and evolutionarydynamics of the major satellite in the red flour beetle Tribolium casta-neum. Genetica 139: 999–1008.

Feschotte, C., 2008 Transposable elements and the evolution of regulatorynetworks. Nat. Rev. Genet. 9: 397–405.

Feschotte, C., and E. J. Pritham, 2007 DNA transposons and the evolutionof eukaryotic genomes. Annu. Rev. Genet. 41: 331–368.

Guindon, S., and O. Gascuel, 2003 A simple, fast, and accurate algorithmto estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.

Hall, T. A., 1999 BioEdit: a user-friendly biological sequence alignmenteditor and analysis program for Windows 95/98/NT. Nucl. Acids Symp.Ser. 41: 95–98.

Heikkinen, E., V. Launonen, E. Müller, and L. Bachmann, 1995 The pvB370BamHI satellite DNA family of the Drosophila virilis group and its evo-lutionary relation to mobile dispersed genetic pDv elements. J. Mol. Evol.41: 604–614.

Hua-Van, A., A. Le Rouzic, C. Maisonhaute, and P. Capy, 2005 Abundance,distribution and dynamics of retrotransposable elements and transpo-sons: similarities and differences. Cytogenet. Genome Res. 110: 426–440.

Huelsenbeck, J. P., and F. Ronquist, 2001 MRBAYES: Bayesian inference ofphylogeny. Bioinformatics 17: 754–755.

Jurka, J., 2009a Mariner-1_TCa. Repbase Rep. 9: 674.Jurka, J., 2009b Mariner-2_TCa. Repbase Rep. 9: 675.Jurka, J., 2009c CR1–3_TCa. Repbase Rep. 9: 737.Kapitonov, V. V., and J. Jurka, 2003 Molecular paleontology of transposable

elements in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci.USA 100: 6569–6574.

Kohany, O., A. J. Gentles, L. Hankus, and J. Jurka, 2006 Annotation, sub-mission and screening of repetitive elements in Repbase: RepbaseSub-mitter and Censor. BMC Bioinformatics 7: 474.

Kuhn, G. C., H. Küttler, O. Moreira-Filho, and J. S. Heslop-Harrison, 2012 The1.688 Repetitive DNA of Drosophila: concerted evolution at different geno-mic scales and association with genes. Mol. Biol. Evol. 29: 7–11.

Lee, H. R., W. Zhang, T. Langdon, W. Jin, H. Yan et al., 2005 Chromatinimmunoprecipitation cloning reveals rapid evolutionary patterns ofcentromeric DNA in Oryza species. Proc. Natl. Acad. Sci. USA 102:11793–11798.

940 | J. Brajkovi�c et al.

Lowe, C. B., G. Bejerano, and D. Haussler, 2007 Thousands of humanmobile element fragments undergo strong purifying selection near de-velopmental genes. Proc. Natl. Acad. Sci. USA 104: 8005–8010.

Ma, J., and S. A. Jackson, 2006 Retrotransposon accumulation and satelliteamplification mediated by segmental duplication facilitate centromereexpansion in rice. Genome Res. 16: 251–259.

Macas, J., A. Kobli�zkova, A. Navratilova, and P. Neumann,2009 Hypervariable 39UTR region of plant LTR-retrotransposons asa source of novel satellite repeats. Gene 448: 198–206.

Miller, W. J., A. Nagel, J. Bachmann, and L. Bachmann, 2000 Evolutionarydynamics of the SGM transposon family in the Drosophila obscura speciesgroup. Mol. Biol. Evol. 17: 1597–1609.

Navrátilová, A., A. Koblízková, and J. Macas, 2008 Survey of extrachro-mosomal circular DNA derived from plant satellite repeats. BMC PlantBiol. 8: 90.

Palomeque, T., J. A. Carrillo, M. Munos-Lopez, and P. Lorite,2006 Detection of a mariner-like element and a miniature inverted-repeat transposable element (MITE) associated with the heterochromatinfrom ants of the genus Messor and their possible involvement for satelliteDNA evolution. Gene 371: 194–205.

Pezer, �Z., and Ð. Ugarkovi�c, 2008 RNA Pol II promotes transcription ofcentromeric satellite DNA in beetles. PLoS ONE 3: e1594.

Pezer, �Z., and Ð. Ugarkovi�c, 2009 Transcription of pericentromeric het-erochromatin in beetles – satellite DNAs as active regulatory elements.Cytogenet. Genome Res. 124: 268–276.

Pezer, �Z., and Ð. Ugarkovi�c, 2012 Satellite DNA-associated siRNAs asmediators of heat shock response in insects. RNA Biol. 9: 587–595.

Pezer, �Z., J. Brajkovi�c, I. Feliciello, and Ð. Ugarkovi�c, 2011 Transcription ofsatellite DNAs in Insects. Prog. Mol. Subcell. Biol. 51: 161–179.

Pons, J., 2004 Cloning and characterization of a transposable-like repeat inthe heterochromatin of the darkling beetle Misolampus goudoti. Genome47: 769–774.

Pons, J. B., E. Bruvo, M. Petitpierre, Ð. Plohl, Ugarkovi�c et al.,2004 Complex structural feature of satellite DNA sequences in the ge-nus Pimelia (Coleoptera: Tenebrionidae): random differential amplifica-tion from a common “satellite DNA library”. Heredity 92: 418–427.

Posada, D., 2008 jModelTest. Phylogenetic model averaging. Mol. Biol.Evol. 25: 1253–1256.

Richards, S., R. A. Gibbs, G. M. Weinstock, S. J. Brown, R. Denell et al.,2008 The genome of the model beetle and pest Tribolium castaneum.Nature 452: 949–955.

Rossi, M. S., C. G. Pesce, O. A. Reig, A. R. Kornblihtt, and J. Zorzópulos,1993 Retroviral-like features in the monomer of the major satellite

DNA from the South American rodents of the genus Ctenomys. DNASeq. 3: 379–381.

Samuelson, L. C., R. S. Phyllips, and L. J. Swanberg, 1996 Amylase genestructures in primates: retroposon insertions and promoter evolution.Mol. Biol. Evol. 13: 767–779.

Smith, P. G., 1976 Evolution of repeated sequences by unequal crossover.Science 191: 528–535.

Stuart, J. J., and G. Mocelin, 1995 Cytogenetics of chromosome rearrange-ments in Tribolium castaneum. Genome 38: 673–680.

Talavera, G., and J. Castresana, 2007 Improvement of phylogenies afterremoving divergent and ambiguously aligned blocks from protein se-quence alignments. Syst. Biol. 56: 564–577.

Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei et al.,2011 MEGA5: Molecular Evolutionary Genetics Analysis using maxi-mum likelihood, evolutionary distance, and maximum parsimonymethods. Mol. Biol. Evol. 28: 2731–2739.

Thompson-Stewart, D., G. H. Karpen, and A. C. Spradling, 1994 A trans-posable element can drive the concerted evolution of tandemly repetitiousDNA. Proc. Natl. Acad. Sci. USA 91: 9042–9046.

Ting, C. N., M. P. Rosenberg, C. M. Snow, L. C. Samuelson, and M. H.Meisler, 1992 Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Dev. 6:1457–1465.

Ugarkovi�c, Ð., 2005 Functional elements residing within satellite DNAs.EMBO Rep. 6: 1035–1039.

Ugarkovi�c, Ð., and M. Plohl, 2002 Variation in satellite DNA profiles—causes and effects. EMBO J. 21: 5955–5959.

Ugarkovi�c, Ð., M. Podnar, and M. Plohl, 1996 Satellite DNA of the redflour beetle Tribolium castaneum—comparative study of satellites fromthe genus Tribolium. Mol. Biol. Evol. 13: 1059–1066.

Wang, S., M. D. Lorenzen, R. W. Beeman, and S. J. Brown, 2008 Analysis ofrepetitive DNA distribution patterns in the Tribolium castaneum genome.Genome Biol. 9: R61.

Waterhouse, R. M., E. M. Zdobnov, F. Tegenfeldf, J. Li, and E. V. Kriventseva,2011 OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011.Nucleic Acids Res. 39: D283–D288.

Zelentsova, E. S., R. P. Vashakidze, A. S. Krayev, and M. B. Evgen’ev,1986 Dispersed repeats in Drosophila virilis: elements mobilized byinterspecific hybridization. Chromosoma 93: 469–476.

Zuker, M., 2003 Mfold web server for nucleic acid folding and hybridiza-tion prediction. Nucleic Acids Res. 31: 3406–3415.

Communicating editor: J. A. Scott

Volume 2 August 2012 | Gene-Associated Satellite DNA | 941