Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
INVESTIGATION
Satellite DNA-Like Elements Associated With GenesWithin Euchromatin of the Beetle TriboliumcastaneumJosip Brajkovi�c,* Isidoro Feliciello,*,† Branka Bruvo-MadWari�c,* and ÐurdWica Ugarkovi�c*,1
*Department of Molecular Biology, RudWer Bo�skovi�c Institute, Bijeni�cka 54, HR-10000 Zagreb, Croatia, and †Dipartimentodi Medicina Clinica e Sperimentale, Università degli Studi di Napoli Federico II, via Pansini 5, I80131, Napoli, Italy
ABSTRACT In the red flour beetle Tribolium castaneum the major TCAST satellite DNA accounts for 35%of the genome and encompasses the pericentromeric regions of all chromosomes. Because of the presenceof transcriptional regulatory elements and transcriptional activity in these sequences, TCAST satellite DNAsalso have been proposed to be modulators of gene expression within euchromatin. Here, we analyze thedistribution of TCAST homologous repeats in T. castaneum euchromatin and study their association withgenes as well as their potential gene regulatory role. We identified 68 arrays composed of TCAST-likeelements distributed on all chromosomes. Based on sequence characteristics the arrays were composed oftwo types of TCAST-like elements. The first type consists of TCAST satellite-like elements in the form ofpartial monomers or tandemly arranged monomers, up to tetramers, whereas the second type consists ofTCAST-like elements embedded with a complex unit that resembles a DNA transposon. TCAST-like ele-ments were also found in the 59 untranslated region (UTR) of the CR1-3_TCa retrotransposon, and thereforeretrotransposition may have contributed to their dispersion throughout the genome. No significant differ-ence in the homogenization of dispersed TCAST-like elements was found either at the level of local arraysor chromosomes nor among different chromosomes. Of 68 TCAST-like elements, 29 were located withinintrons, with the remaining elements flanked by genes within a 262 to 404,270 nt range. TCAST-likeelements are statistically overrepresented near genes with immunoglobulin-like domains attesting to theirnonrandom distribution and a possible gene regulatory role.
KEYWORDS
repetitive DNAsatellite DNAgene regulationtransposonimmunoglobulin-like genes
Based on the hypothesis of Britten and Davidson (1971), repetitiveelements can be a source of regulatory sequences and act to distributeregulatory elements throughout the genome. In particular, mobiletransposable elements (TEs) are predicted to be a source of noncodingmaterial that allows for the emergence of genetic novelty and influ-ences evolution of gene regulatory networks (Feschotte 2008). Re-cently it has been shown that at least 5.5% of conserved noncodingelements unique to mammals originate from mobile elements and are
preferentially located close to genes involved in development andtranscription regulation (Lowe et al. 2007). The complete sequenceconservation, wide evolutionary distribution, and presence of func-tional elements such as promoters and transcription factor bindingsites within some satellite DNA sequences has led to the assumptionthat in addition to participating in centromere formation, they mightalso act as cis-regulatory elements of gene expression (Ugarkovi�c2005). To perform potential regulatory functions, satellite DNA ele-ments are predicted to be preferentially distributed in euchromaticportion of the genomes in the vicinity of genes. Whole-genome se-quencing projects enable the presence and distribution of satelliteDNA repeats in the euchromatic portion of the genome to be de-termined. The analysis of satellite DNA-like elements dispersed withineuchromatin, and their comparison with homologous elements pres-ent within heterochromatin, also may reveal insights into the origin ofsatellite DNAs and their subsequent evolution (Kuhn et al. 2012).
Satellite DNAs are major building elements of pericentromericand centromeric heterochromatin in many eukaryotic species, and in
Copyright © 2012 Brajkovi�c et al.doi: 10.1534/g3.112.003467Manuscript received April 30, 2012; accepted for publication June 18, 2012This is an open-access article distributed under the terms of the CreativeCommons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in anymedium, provided the original work is properly cited.Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.112.003467/-/DC1.1Corresponding author: Department of Molecular Biology, RudWer Bo�skovi�cInstitute, Bijeni�cka 54, HR-10000 Zagreb, Croatia. E-mail: [email protected]
Volume 2 | August 2012 | 931
certain species they account for the majority of genomic DNA, as inbeetles from the coleopteran family Tenebrionidae (Ugarkovi�c andPlohl 2002). In the red flour beetle Tribolium castaneum, pericentro-meric heterochromatin comprises approximately 40% of the genome,and TCAST satellite DNA has previously been characterized as themajor satellite that encompasses centromeric as well as pericentro-meric regions of all 20 chromosomes (Ugarkovi�c et al. 1996). TCASTsatellite is composed of two subfamilies, Tcast1a and Tcast1b, whichtogether comprise 35% of the whole genome. Tcast1a and Tcast1bhave an average homology of 79% and are a similar size at 362 bp and377 bp, respectively, but they are characterized by a divergent, sub-family specific region of approximately 100 bp (Feliciello et al. 2011).The genome sequencing project of T. castaneum has recently beencompleted (Richards et al. 2008). Sequencing involved the euchro-matic portion of the genome, with.20% of the genome, correspond-ing to heterochromatic regions, excluded due to technical difficulties.
In this article, we searched for the presence of TCAST satellite-homologous elements within the assembled T. castaneum genome byusing a comprehensive computational analysis. By searching the se-quenced T. castaneum genome, we found 68 TCAST satellite DNAarrays within the euchromatin of all chromosomes. They were map-ped to 59 or 39 ends, as well as within introns, of more than 100protein-coding genes. Based on sequence characteristics, dispersedTCAST-like elements were classified into two groups. The first groupincludes partial TCAST satellite monomers or short arrays of tan-demly arranged monomers up to tetramers. The second group con-tains TCAST-like element embedded within complex repeat units thatcontain two hallmarks of DNA transposons, terminal inverted repeatsand target-size duplications. The evolutionary relationship and possi-ble modes of dispersion of the two types of dispersed TCAST-likesequences are discussed. In addition, we examined the sequence di-vergence, phylogenetic relationship, and chromosomal distribution ofthe elements. Annotation, characterization, and classification of geneswithin the region of TCAST-like elements are reported, with thepreferential localization of TCAST-like elements near specific groupsof genes identified. Our results demonstrate for the first time, theenrichment of satellite DNA-like elements in the vicinity of genes withimmunoglobulin-like domains and suggest their possible gene-regulatoryrole.
MATERIALS AND METHODSBLASTN version 2.2.22+ was used to screen the NCBI refseq_ge-nomic database of T. castaneum. All scaffolds that have not beenmapped to linkage groups were also screened. The program was op-timized to search for highly similar sequences (megablast) to the querysequence [TCAST consensus sequence (Ugarkovi�c et al. 1996)]. Genesflanking TCAST–homologous elements were found automatically byNCBI blast. Sequences corresponding to hits, as well as their flankingregions, were analyzed by dot plot (http://www.vivo.colostate.edu/molkit/dnadot/), using standard parameters (window size 9, mismatchlimit 0), or more relaxed conditions (window size 11, mismatch limit 1),to determine the exact start and end site of specific TCAST-likeelements. The TCAST transposon-like elements were analyzed indetail for the presence of hallmarks such as terminal inverted repeats(TIRs) and target-site duplications with the aid of the Gene Jockeysequence analysis program (for Apple Macintosh). Secondary struc-tures were determined using the default parameters of the MFOLDprogram available online [http://mfold.rna.albany.edu/?q=mfold (Zuker2003)]. AT content was analyzed using BioEdit Sequence AlignmentEditor (Hall 1999). Repbase, a reference database of eukaryotic repetitiveDNA, was screened using WU-BLAST (Kohany et al. 2006).
Sequence alignment was performed using MUSCLE algorithm(Edgar 2004) combined with manual adjustment. All sequences wereincluded in the alignment, with the exception of the ones that did notat least partially overlap with other sequences. Gblocks was used toeliminate poorly aligned positions and divergent regions of the align-ments (Talavera and Castresana 2007). Alignments (original fastafiles) are available upon request. jModelTest 0.1.1 software (Posada2008) was used to infer best-fit models of DNA evolution—TPM3uf+Gfor transposon-like and A type elements and TPM1uf for B typeelements. Maximum likelihood (ML) trees were estimated with thePhyML 3.0 software (Guindon and Gascuel 2003) using best-fitmodels. Markov chain Monte Carlo Bayesian searches were per-formed in MrBayes v. 3.1.2. (Huelsenbeck and Ronquist 2001) underthe best-fit models (two simultaneous runs, each with four chains; 3· 106 generations; sampling frequency one in every 100 generations;majority rule consensus trees constructed based on trees sampledafter burn-in). Branch support was evaluated by bootstrap analysis(1000 replicates) in ML and by posterior probabilities in Bayesiananalyses. Pairwise sequence diversity (uncorrected P) was calculatedusing the MEGA 5.05 software (Tamura et al. 2011).
T. castaneum gene homologs in Drosophila melanogaster weresearched using the OrthoDB Phylogenomic database. Each gene hasOrthoDB identificator, with Uniprot data linked to OrthoDB (Water-house et al. 2011). To find sets of biological annotations that fre-quently appear together and are significantly enriched in a set ofgenes located near TCAST-like elements, program GeneCodis 2.0available online (http://genecodis.dacya.ucm.es/) was used. GeneCodisgenerates statistical rank scores for single annotations and their com-binations. To find all the possible combinations of annotations, Gen-eCodis uses the apriori algorithm introduced by Agrawal et al. 1993.Once the annotations were extracted, a statistical analysis based on thehypergeometric distribution or the x2 test of independence was exe-cuted to calculate the statistical significance (P values) for each in-dividual annotation or co-annotations.
Two-tailed hypergeometric test with Bonferroni correction (alpha =0.025) was used to analyze the distribution of TCAST-like elementsamong T. castaneum chromosomes. In each chromosome the fre-quency of TCAST-like elements was compared with the frequency inthe complete sample and the significance of deviations was calculated.
RESULTS
Identification of dispersed TCAST-like elementsUsing the consensus sequence of TCAST satellite DNA (Ugarkovi�cet al. 1996) as a query sequence, we screened the NCBI refseq_genomic database of T. castaneum with the alignment programBLASTN version 2.2.22+. The program was optimized to search forhighly similar sequences (megablast) and blast hits on the query se-quence were analyzed individually. Alignments were mapped regard-ing start and end site, chromosome number, and total length. Whenthe distance between two alignments on the same chromosome wasshort, the genomic sequence was further analyzed by dot plot toidentify any potential continuity between the two alignments. Onlygenomic sequences with at least 140 nt (40% of TCAST monomerlength) of continuous sequence and .80% identity to the TCASTconsensus sequence were considered for further analysis. The totalnumber of dispersed TCAST-like elements was 68, with 36 elementsflanked by genes at both 59 and 39 ends, 3 elements flanked by a singlegene either at 59 or 39 end (sequences no. 36, 39, 50), and the 29elements positioned within introns (Table 1). Except 68 TCAST-likeelements associated with genes, no other dispersed TCAST-like
932 | J. Brajkovi�c et al.
elements were found within the assembled T. castaneum genome.Analysis of scaffolds that have not been mapped to linkage groupsrevealed the presence of an additional 41 TCAST-like elements, butbecause they were not mapped to T. castaneum genome and couldpossibly derive from heterochromatin, we did not consider them forfurther analysis.
There were only three cases in which two different TCAST-likeelements were associated with the same gene: gene D6X2C4 containsTCAST-like sequences no. 6 and 13 within introns, gene D6X2U7 isflanked at 59 and 39 end by sequences no. 5 and 7, respectively,whereas gene D6WB29 is located at 39 end of the sequence no. 53and has sequence no. 52 within an intron. All other TCAST-likeelements were positioned near or within different genes. Thus in total,there were 101 genes found in the vicinity of TCAST-like elements.Characteristics of the genes associated with TCAST-like elements,including gene identity number, gene name and chromosomal loca-tion, position relative to the associated TCAST-like element, and dis-tances between TCAST-like elements and genes, are shown in Table 1Distances between TCAST-like elements and genes range from 262 nt(gene positioned at 39 site of the sequence no. 36), to a maximaldistance of 404,270 nt (gene positioned at 59 site of the sequenceno. 5).
Characteristics of TCAST-like elementsTCAST satellite-like elements: Sequence analysis of the 68 TCAST-like elements identified within the vicinity of genes enabled theirclassification into two groups. The first group contains partial TCASTsatellite monomers or tandemly arranged elements, either complete orpartial dimers, trimers, or tetramers (Table 1). The minimal size ofsatellite repeat was 203 nt (0.6 of complete TCAST monomer; se-quence no. 15), whereas the maximal size was 1440 nt (four completeTCAST monomers; sequence no. 43; Table 1). In many sequences,two subtypes of TCAST satellite monomers were mutually inter-spersed: Tcast1a and Tcast1b. Tcast1b corresponds to the TCASTsatellite consensus that was used as a query sequence (Ugarkovi�c et al.1996), and Tcast1a corresponds to the TCAST subfamily described inFeliciello et al. 2011. Tcast1a and Tcast1b have an average homologyof 79% and are of similar sizes at 362 bp and 377 bp respectively, butare characterized by a divergent, subfamily specific region of approx-imately 100 bp (Feliciello et al. 2011). There were 34 TCAST satellite-like elements found within or in the region of 53 genes. Lengths ofTCAST satellite-like elements (Table 1), their exact start and end siteswithin genomic sequence and composition (supporting information,Table S1) are provided.
To see whether there is any clustering of sequences of TCASTsatellite-like elements due to the difference in the homogenization atthe level of local array, chromosome, or among different chromo-somes, sequence alignment and phylogenetic analysis were performed.Tcast1a and Tcast1b subunits were extracted from TCAST satellite-like sequences and analyzed separately. Alignment was performed on24 Tcast1a subunits, ranging in size from 136 and 377 bp (File S1).The average pairwise distances between Tcast1a subunits of TCASTsatellite-like sequences was 5.8%. Alignment adjustment using Gblocks,which eliminates poorly aligned positions and divergent regions,resulted in few changes; therefore, the original, unadjusted align-ment was used for the construction of phylogenetic trees. Becausethe sequences differ in lengths and comprise regions of divergentvariability, methods that take into account specific models of DNAevolution were considered as the most suitable for the constructionof phylogenetic trees, ML and Bayesian (Markov chain Monte
Carlo). The ML tree showed weak resolution with no significantsupport for clustering of sequences derived from the same satellite-likearray or from the same chromosome. Similarly, the Bayesian tree dem-onstrated no significant sequence clustering (Figure 1A).
Alignment of 28 Tcast1b subunits, ranging from 159 bp to 363 bp(File S2), was also not significantly affected by adjustment withGblocks; therefore, the unadjusted alignment was used for the con-struction of phylogenetic trees (Figure 1B). The average pairwise di-vergence between Tcast1b subunits, of TCAST satellite-like sequences,was 6.7%. With the ML phylogenetic tree, four groups composed oftwo or three sequences, were resolved by relatively low bootstrapvalues. However, the majority of Tcast1b subunit sequences remainedunresolved. There was no clustering of subunits derived from thesame array or the same chromosome (Figure 1B). Bayesian tree anal-ysis produced one significantly supported cluster composed of 10sequences derived from 7 chromosomes (Figure 1B).
TCAST transposon-like elements: The second group of TCAST-likerepeats is represented by a complex element that contains an almostcomplete TCAST (or Tcast1b) monomer, and a TCAST monomersegment of approximately 121 bp in an inverted orientation. Thesetwo TCAST segments are separated by a nonsatellite sequence ofapproximately 306 bp. Both TCAST segments are part of TIRs that areapproximately 269 bp long (Figure 2). As a result of the long TIRs,these elements are likely to form stable hairpin secondary structuresand therefore resemble transposons. The nonsatellite part of sequence,common for all TCAST transposon-like elements, is unique in that itdoes not exhibit significant homology to any other sequence withinthe T. castaneum genome. There were 34 TCAST transposon-likeelements found within or in the vicinity of 50 genes. Their lengths(Table 1) and exact start and end sites within genomic sequence(Table S1) are provided. Sequence analysis of TCAST transposon-likeelements determined that 13 of them were. 1000 bp, with a maximalsize of 1181 bp (Table 1). The remaining TCAST transposon-likeelements were shorter, with a minimal size of 314 bp (sequence no.27), and usually lacking part of, or one or both, TIRs. Conserved TIRsare necessary for transposition, and if they are absent, truncated, ormutated so that the transposase cannot interact with the transposonsequence, the transposon cannot be mobilized and therefore repre-sents a molecular fossil of a once active transposon (Capy et al. 1998).Despite mutations and partial truncations of TIRs within the TCASTtransposon-like elements, and likely because of the length of the TIRs,most of the elements still preserve a stable secondary structure andcould potentially remain mobile.
Some TCAST transposon-like elements .1000 bp have a 3-bpduplication at the site of insertion in the form of ACT. One TCASTtransposon-like element (sequence no. 39) is inserted into anotherrepetitive DNA, indicated as Tcast2, which had been previously iden-tified bioinformatically (Wang et al. 2008). Sequence analysis of thistransposon-like element confirms the continuity of Tcast2 from theduplication site “ACT.” Typically, the size of target-site duplication isa hallmark of different superfamilies of eukaryotic DNA transposons,with mariner/Tc1, the only superfamily whose members are charac-terized by either 2- or 3-bp target-site duplication (Capy et al. 1998;Kapitonov and Jurka 2003; Feschotte and Pritham 2007). There arethree open reading frames (ORFs) within TCAST transposon-likesequences, but the resulting putative proteins are very short and donot share similarity with any other proteins (Figure 2). The elementstherefore do not code for transposases and are considered nonauton-omous. Using the whole TCAST transposon-like elements as a query
Volume 2 August 2012 | Gene-Associated Satellite DNA | 933
nTa
ble
1TC
AST
-like
elem
ents
asso
ciated
withgen
eswithinT.ca
staneum
euch
romatin
Uniprot
Entrez
Gen
eNam
eChr
Sat_seq.
Positio
nDistanc
e,bp
DM
Hom
olog
FBgn
Type
Leng
thCop
ies
D6W
ZP1
6625
64Alte
reddisjunc
tion
91
5918
,773
Q9V
EH1
FBgn0
0000
63Sa
tellite
734
2.0
D6W
ZP3
6626
24Ra
s-relatedprotein
Rab-26
91
3977
95Q9V
P48
FBgn0
0869
13Sa
tellite
734
2.0
D6W
ZL9
6619
47Prob
able
serin
e/threon
ine-protein
kina
se9
2Inside
Q0K
HT7
FBgn0
0526
66Sa
tellite
993
2.8
D6X
226
6602
75Arrest
93
5999
,669
Q8IP8
9FB
gn0
0001
14Sa
tellite
716
2.0
D6X
238
6617
41Num
b9
339
115,98
4P1
6554
FBgn0
0029
73Sa
tellite
716
2.0
1001
4183
2no
match
onun
iprot
94
5915
20Sa
tellite
517
1.4
D6X
2D0
6604
40Sh
ort-ch
aindeh
ydrogen
ase
94
3967
04Q9V
E80
FBgn0
0386
10Sa
tellite
517
1.4
D6X
1E7
6568
84Cytoc
hrom
eP4
5030
6A1
95
5940
4,27
0Q9V
WR5
FBgn0
0049
59Sa
tellite
1058
2.9
D6X
2U7
6569
77Elon
gase
95
3999
47Q9V
CY6
FBgn0
0389
86Sa
tellite
1058
2.9
D6X
2C4
6601
95Dop
aminerece
ptor1
96
Inside
P415
96FB
gn0
0115
82Sa
tellite
304
0.8
D6X
2U7
6569
77Elon
gase
97
5971
28Q9V
CY6
FBgn0
0389
86Sa
tellite
394
1.1
D6X
366
6570
55elon
gationof
very
long
chainfatty
acidsprotein
97
3950
,111
Q9V
CY5
FBgn0
0531
10Sa
tellite
394
1.1
D6X
0D7
6577
48Re
ton
cogen
e9
859
56,625
Q8INU0
FBgn0
0118
29Sa
tellite
213
0.6
D6X
0E1
6578
29Dpr9
98
3962
,781
Q9V
FD9
FBgn0
0382
82Sa
tellite
213
0.6
D6X
2H8
6549
54ADAM
metalloprotease
99
Inside
Q6Q
U65
FBgn0
0513
14Tran
sposon
1107
D6X
2U7
6555
61Elon
gase
910
5947
,902
Q9V
CZ0
FBgn0
0389
83Tran
sposon
1085
D6X
2V3
6556
40Pu
tativ
eun
characterized
protein
910
3967
,953
Q9V
DB7
FBgn0
0388
81Tran
sposon
1085
D6X
244
6550
11Se
rine/threon
ine-protein
kina
se32
B9
11Inside
Q0K
ID3
FBgn0
0529
44Tran
sposon
1062
D6X
374
1001
4152
1Pu
tativ
eun
characterized
protein
912
Inside
Q9V
GZ4
FBgn0
0378
14Sa
tellite
292
0.8
D6X
2C4
6601
95Dop
aminerece
ptor1
913
Inside
P415
96FB
gn0
0115
82Tran
sposon
900
D6X
259
6562
90Tran
sportan
dGolgio
rgan
ization13
914
5994
56Q9V
GT8
FBgn0
0402
56Sa
tellite
222
0.6
D6X
260
6563
73Protein-tyrosine
sulfo
tran
sferase
914
3933
,523
Q9V
YB7
FBgn0
0866
74Sa
tellite
222
0.6
D6X
075
6586
03MICAL-likeprotein
915
5939
,684
Q9V
U34
FBgn0
0363
33Sa
tellite
203
0.6
D6X
1P2
6588
91tip
top
915
3914
2,82
1Q9U
3V5
FBgn0
0289
79Sa
tellite
203
0.6
D6X
095
6591
95Trop
onin
C9
1659
3922
P479
47FB
gn0
0133
48Tran
sposon
589
D6X
0I1
6593
36Trop
onin
C9
1639
15,143
P479
47FB
gn0
0133
48Tran
sposon
589
D6X
1J0
6557
13Tran
sporter
917
Inside
Q9N
B97
FBgn0
0341
36Sa
tellite
915
2.5
D6W
F56
1001
4187
7zinc
fing
erprotein
250
318
5912
5,68
5Q7K
AH0
FBgn0
0273
39Sa
tellite
1208
3.4
D6W
F61
6569
24Tran
scrip
tioninitiationfactor
TFIID
subun
it7
318
3964
,294
Q9V
HY5
FBgn0
0249
09Sa
tellite
1208
3.4
D6W
GB1
6590
40Mah
ya3
1959
115,53
7P2
0241
FBgn0
0029
68Sa
tellite
687
1.9
D6W
GB5
6592
01V-typ
eprotonATP
asesubun
itE
319
3982
,217
P546
11FB
gn0
0153
24Sa
tellite
687
1.9
D6W
II010
0141
571
NADH
deh
ydrogen
ase,
putative
320
5942
78Q9W
3N7
FBgn0
0299
71Tran
sposon
635
D6W
II210
0142
263
Putativ
eun
characterized
protein
320
3918
,585
Q9V
IY1
FBgn0
0327
69Tran
sposon
635
D6W
FT8
6575
35WD
repea
t-co
ntaining
protein
473
21Inside
Q96
0Y9
FBgn0
0264
27Sa
tellite
604
1.7
D6W
DY2
6561
25Kyn
uren
ineam
inotransferase
322
5910
,696
Q8S
XC2
FBgn0
0379
55Tran
sposon
1000
D6W
DY4
6562
98Ann
exin
IX3
2239
11,599
P224
64FB
gn0
0000
83Tran
sposon
1000
D6W
FK8
6561
74an
kyrin
2,3/un
c44
323
Inside
Q7K
U95
FBgn0
0854
45Tran
sposon
1081
D6W
FX1
6578
74ralg
uanine
nucleo
tideex
chan
gefactor
324
5964
,025
Q8M
T78
FBgn0
0341
58Tran
sposon
888
D6W
FX3
6580
31galactose-1-pho
spha
teuridylyltran
sferase
324
3939
58Q9V
MA2
FBgn0
0318
45Tran
sposon
888
D6W
DQ4
6592
33Pu
tativ
eun
characterized
protein
325
5915
,051
Q8T
0R9
FBgn0
0388
09Tran
sposon
1016
D6W
DQ6
6593
76co
iled-coild
omainco
ntaining
963
2539
9162
A1Z
A72
FBgn0
0139
88Tran
sposon
1016
D6W
F68
6550
42gluco
sedeh
ydrogen
ase
326
5925
,896
Q9V
Y00
FBgn0
0305
98Tran
sposon
1067
C3X
Z92
6553
48Mito
gen-activ
ated
protein
kina
sekina
sekina
sekina
se2
326
3992
,876
Q8S
YA1
FBgn0
0344
21Tran
sposon
1067
(con
tinue
d)
934 | J. Brajkovi�c et al.
nTa
ble
1,co
ntinue
d
Uniprot
Entrez
Gen
eNam
eChr
Sat_seq.
Positio
nDistanc
e,bp
DM
Hom
olog
FBgn
Type
Leng
thCop
ies
D6W
E82
6584
63Pu
tativ
eun
characterized
protein
327
Inside
Q9V
DK2
FBgn0
0388
15Tran
sposon
314
D6W
HX6
6581
91Pu
tativ
eun
characterized
protein
328
5917
3,88
1Q1R
KQ9
FBgn0
0853
82Tran
sposon
826
D6W
I58
6583
43Cathe
psinL
328
3982
,559
Q95
029
FBgn0
0137
70Tran
sposon
826
D6W
DJ9
6569
22Pu
tativ
eun
characterized
protein
329
5917
3,54
8Q8IPJ
1FB
gn0
0318
59Tran
sposon
1084
D6W
DN0
6575
59PR
MT5
329
3938
3,80
9Q9U
6Y9
FBgn0
0159
25Tran
sposon
1084
D6W
GS3
6569
76Pu
tativ
eun
characterized
protein
330
5937
572
A0A
MQ8
FBgn0
0346
55Sa
tellite
216
0.6
D6W
GT0
1001
4251
5calpain3
330
3922
6,70
7Q11
002
FBgn0
0086
49Sa
tellite
216
0.6
D6W
DS8
6605
32Muscle-sp
ecificprotein
300
331
5937
8,62
6Q4A
BG9
FBgn0
2609
52Tran
sposon
1060
D6W
DT0
6548
60Ph
ospha
tidylinosito
l-bindingclathrin
assembly
protein
331
3978
55C1C
3H4
FBgn0
0863
72Tran
sposon
1060
D6W
HF2
6641
88Nep
hrin
332
Inside
Q9W
4T9
FBgn0
0283
69Tran
sposon
666
D6W
I96
1001
4262
0Hea
tshoc
kprotein
703
33Inside
P111
47FB
gn0
0012
19Tran
sposon
1058
D6W
G02
6549
17N-ace
tylgluco
saminyltran
sferasevi
334
Inside
Q9V
UH4
FBgn0
0364
46Tran
sposon
319
D6W
YD1
6568
91Pu
tativ
eun
characterized
protein
835
5938
5,71
2Q8S
Y79
FBgn0
0322
49Sa
tellite
625
1.7
D6W
YN3
6549
42serin
e-typeproteaseinhibito
r8
3539
58,583
Q9V
SC9
FBgn0
0358
33Sa
tellite
625
1.7
D6W
YA1
6579
13Cop
iaprotein
(Gag
-int-pol
protein)
836
3926
2B6V
6Z8
??Sa
tellite
196
0.5
D6W
YC9
6567
18Cmp-n-ace
tylneu
raminic
acid
syntha
se8
37Inside
B5R
JF3
FBgn0
0522
20Tran
sposon
831
D6W
V42
6560
28CG50
808
38Inside
Q7K
3E2
FBgn0
0313
13Tran
sposon
582
D6W
YA0
1001
4250
7Bea
tenpath
839
5971
65Q94
534
FBgn0
0134
33Tran
sposon
1181
D6W
UX6
6622
35Pu
tativ
eun
characterized
protein
840
Inside
Q7K
UK9
FBgn0
0364
54Tran
sposon
440
D6X
0E1
6549
38defec
tiveprobo
scisex
tensionresp
onse
741
Inside
Q9V
FD9
FBgn0
0382
82Sa
tellite
722
2.0
D6W
PX8
6620
21Ribosom
e-releasingfactor
2,mito
chon
dria
l7
4259
17,480
Q9V
CX4
FBgn0
0511
59Tran
sposon
905
A2A
X72
6620
58Gustatory
rece
ptor
742
3915
81Q9V
PT1
FBgn0
0412
50Tran
sposon
905
D6W
TD1
6618
95similarto
chitina
se6
743
Inside
Q9W
2M7
FBgn0
0345
80Sa
tellite
1440
4.0
D6W
PE6
1001
4207
3vo
ltage-gated
potassium
chan
nel
744
Inside
P179
70FB
gn0
0033
83Tran
sposon
814
D2A
2C6
6638
49Pu
tativ
eun
characterized
protein
445
5994
89Q9V
3S3
FBgn0
0133
00Sa
tellite
549
1.5
D2A
2D1
6638
75Pu
tativ
eun
characterized
protein
445
3910
,920
Q9W
191
FBgn0
0349
94Sa
tellite
549
1.5
D2A
2I0
6570
17Pu
tativ
eun
characterized
protein
446
5958
20Q8S
Z28
FBgn0
0337
86Sa
tellite
558
1.6
D2A
2I1
6570
98Ribon
ucleoside-dipho
spha
tereduc
tase
446
3970
00P4
8591
FBgn0
0120
51Sa
tellite
558
1.6
D1Z
ZG6
6609
83Kinesin-like
protein
447
Inside
Q9V
LW2
FBgn0
0319
55Tran
sposon
508
D2A
2P8
1001
4259
5Pigg
yBac
tran
sposab
leelem
ent
448
Inside
Q9V
HL1
FBgn0
0376
33Tran
sposon
377
D6W
B65
6550
28E7
42
4959
60,525
P201
05FB
gn0
0005
67Sa
tellite
770
2.1
D6W
B73
6549
62orga
niccatio
ntran
sporter
249
3926
38Q7K
3M6
FBgn0
0344
79Sa
tellite
770
2.1
D6W
BG8
6591
29pre-m
RNA-splicinghe
licaseBRR
22
5039
4811
Q9V
UV9
FBgn0
0365
48Sa
tellite
728
D6W
B14
6588
44mon
ophe
nolic
aminetyramine
251
5979
55P2
2270
FBgn0
0045
14Tran
sposon
567
D6W
B15
6587
69Cuticular
protein
47Ef
251
3916
,173
A1Z
8H7
FBgn0
0336
03Tran
sposon
567
D6W
B29
6577
78En
dop
roteaseFU
RIN
252
Inside
P304
32FB
gn0
0045
98Tran
sposon
1045
A8D
IV5
6579
42Nicotinic
acetylch
olinerece
ptorsubun
italpha
112
5359
13,645
P251
62FB
gn0
0041
18Tran
sposon
1021
D6W
B29
6577
78En
dop
roteaseFU
RIN
253
3948
75P3
0432
FBgn0
0045
98Tran
sposon
1021
D6X
3I9
6617
87Tran
scrip
tioninitiationfactor
IIF10
5459
10,025
Q05
913
FBgn0
0102
82Sa
tellite
870
2.4
D6X
3J1
6618
27Pu
tativ
eun
characterized
protein
1054
3966
07Sa
tellite
870
2.4
D6X
4P3
6553
89Neu
tral
alph
a-gluco
sidaseab
1055
Inside
Q7K
MM4
FBgn0
0275
88Sa
tellite
694
1.9
D6X
3H5
6612
46Neu
rexin-4
1056
5922
34Q94
887
FBgn0
0139
97Sa
tellite
224
0.6
D6X
3H7
6613
08Su
ccinatesemialdeh
ydedeh
ydroge
nase
1056
3914
,901
Q9V
BP6
FBgn0
0393
49Sa
tellite
224
0.6
D6X
4V6
6559
16Tu
bby,
putative
1057
Inside
Q9V
B18
FBgn0
0395
30Tran
sposon
763
(con
tinue
d)
Volume 2 August 2012 | Gene-Associated Satellite DNA | 935
sequence, we searched the T. castaneum Gen Bank database for “full-sized” homologous elements that could potentially code for transpo-sases and be considered autonomous. The search identified an element,named TR 1.9, with a 925-bp sequence inserted within a unique se-quence of the TCAST transposon-like elements (Figure 2). This925-bp sequence contains an ORF of 206 amino acids and a con-served domain belonging to the Transposase 1 superfamily, whichalso includes the mariner transposase. DNA transposons of themariner/Tc1 superfamily Mariner-1_TCa and Mariner-2_TCa, wereidentified within the T. castaneum genome (Jurka 2009a, 2009b).Using BLASTP and the translated sequence from the 925 bp ORFas a query sequence, we identified hits with a partial homology toa Mariner-2_TCa transposase and to a mariner-like element trans-posase present in two other insects, the beetle Agrilus planipennis(emerald ash borer) and Chrysoperla plorabunda (green lacewing;Neuroptera), but not to Mariner-1_ TCa transposase.
To test whether there is any chromosome-specific sequenceclustering of TCAST transposon-like sequences that could suggestdifference in homogenization within chromosome and among differ-ent chromosomes, the alignment and subsequent phylogenetic analysisof TCAST transposon-like sequences was performed. Because TCASTtransposon-like elements differ significantly in size (31421181 nt), thealignment and phylogenetic analyses was performed on 25 elementsthat mutually overlap in their sequences, whereas the other nineTCAST transposon-like elements were excluded from the analysisdue to the very low overlapping with other elements. Alignment wasadditionally adjusted using Gblocks (File S3). The average pairwisedivergence among TCAST transposon-like sequences was 12.7%.ML and Bayesian methods gave similar tree topologies (Figure 1C).The ML tree showed very weak resolution of TCAST transposon-likesequences and a general absence of subgroups with specific sequencecharacteristics (Figure 1C). Only two clusters were formed whereas,using the Bayesian tree, we identified three well-supported groups; twoof them were as for ML tree (Figure 1C).
Distribution of TCAST-like elementson T. castaneum chromosomesTCAST-like elements found in the vicinity of genes were distributedon all 10 T. castaneum chromosomes (Table 1). Positions of consti-tutive heterochromatin and euchromatin were assigned on the haploidset of T. castaneum chromosomes, based on C-banding data (Stuartand Mocelin 1995) and Tribolium castaneum 3.0 Assembly data (Fig-ure 3). Within euchromatic segments, the position of each TCAST-like element is specifically indicated (Figure 3) based on the positionwithin the genomic sequence (Table S1). TCAST-like elements weredispersed on both arms of chromosomes 3, 5, 9, and 7, whereas onother chromosomes they were located on a single arm (Figure 3). Thenumber of TCAST-like elements ranged from 2 on chromosome 1(X)to 17 on chromosomes 3 and 9. To detect whether TCAST-like ele-ments were distributed randomly among the T. castaneum chromo-somes or whether there was a significant over or underrepresentationof the elements on some chromosomes we performed hypergeometricdistribution analysis test. The analysis revealed no statistically signif-icant deviation in the number of TCAST-like elements among thechromosomes (Figure S1), pointing to their random distribution.
To determine whether there was a target preference for theinsertion of TCAST-like elements, for example high AT content oranother sequence characteristic, we analyzed the AT content within100 bp of the flanking regions for each TCAST -like element, fromboth 59 and 39 sites (Figure S2 and Figure S3). The average AT contentn
Table
1,co
ntinue
d
Uniprot
Entrez
Gen
eNam
eChr
Sat_seq.
Positio
nDistanc
e,bp
DM
Hom
olog
FBgn
Type
Leng
thCop
ies
D6X
3J6
6620
34Pu
tativ
eun
characterized
protein
1058
5910
15Q9V
EJ9
FBgn0
0385
11Sa
tellite
564
1.6
D6X
3J7
6570
69cd
c73dom
ainprotein
1058
3927
,239
Q9V
HI1
FBgn0
0376
57Sa
tellite
564
1.6
D2A
693
6632
31lysine
-spec
ificdem
ethy
lase
4B6
59Inside
Q9V
6L0
FBgn0
0531
82Sa
tellite
498
1.4
D2A
490
6596
55Fa
cilitated
treh
alosetran
sporter
Tret1-2ho
molog
660
Inside
Q8M
KK4
FBgn0
0336
44Tran
sposon
689
D2A
6I4
6597
28Pu
tativ
eun
characterized
protein
661
5911
6,03
0Q9W
4G2
FBgn0
2609
71Tran
sposon
764
D2A
6I6
6597
91Pu
tativ
eun
characterized
protein
661
3948
60Q9V
NB4
FBgn0
0373
23Tran
sposon
764
D2A
3V0
6572
72Fa
sciclin
-36
6259
37,286
P152
78FB
gn0
0006
36Sa
tellite
281
0.8
D2A
3V3
6574
21LIM
dom
ainkina
se1
662
3921
,789
Q8IR7
9FB
gn0
0412
03Sa
tellite
281
0.8
D6W
8F4
6603
22Disco
-related
x63
Inside
Q9V
XJ5
FBgn0
0426
50Sa
tellite
530
1.5
D6W
8D3
6591
23Plex
Ax
6459
1973
O96
681
FBgn0
0257
41Tran
sposon
848
D6W
GD2
6592
72Aldose-1-ep
imerase
x64
3964
72Q9V
RU1
FBgn0
0356
79Tran
sposon
848
B3M
MG1
6576
52Neu
ral-c
adhe
rin5
65Inside
O15
943
FBgn0
0156
09Sa
tellite
273
0.8
D6W
NN6
6585
79Tran
sien
trece
ptorpoten
tial-g
amma
protein
566
5925
47Q9V
JJ7
FBgn0
0325
93Tran
sposon
894
A3R
E80
6586
61Cardioacce
leratory
pep
tiderece
ptor
566
3927
,105
Q86
8T3
FBgn0
0393
96Tran
sposon
894
A1J
UG2
6612
07Ultraspira
cle
567
Inside
P201
53FB
gn0
0039
64Sa
tellite
379
1.1
D6W
NB3
6560
63Ybox
protein
568
5914
,993
O46
173
FBgn0
0229
59Sa
tellite
455
1.3
D6W
NB6
6560
95Pe
ptid
ech
ainreleasefactor
15
6839
350,36
5Q9V
K20
FBgn0
0324
86Sa
tellite
455
1.3
Alistof
gen
eswith
gen
eiden
titynu
mbers,
gen
ena
me,
chromosom
allocatio
n,position
,an
ddistanc
erelativ
eto
theassociated
TCAST
-like
elem
ent,as
wella
salistof
TCAST
-like
elem
ents,theirtypes
(satelliteor
tran
sposon
-like
),totallen
gth
inbp,an
dco
pynu
mber
ofsatellite
repea
tswith
inan
arrayareshow
n.
936 | J. Brajkovi�c et al.
of the flanking regions for both TCAST satellite-like elements andTCAST transposon-like elements did not differ significantly fromthe average AT content of the whole T. castaneum genome or fromthe AT content of randomly selected intergenic regions andintrons. Thus, this finding suggests that with regard to AT con-tent, there is no target preference for the insertion of TCAST-likeelements. Furthermore, alignment and comparison of all flankingsequences of TCAST-like elements did not identify any commonsequence motifs.
Genes in the vicinity of TCAST-like elementsUniprot gene numbers were used as identifiers of genes located in thevicinity of TCAST-like elements (gene names shown in Table 1).Uniprot gene numbers for homologous genes found in Drosophilamelanogaster are also indicated (Table 1). Detailed description ofthe genes, including molecular function of their protein products,biological processes in which these proteins are involved, and theircellular localization (cellular component), are shown (Table S1). Eachidentified gene is assigned to a particular TCAST-like element withinits vicinity, and the precise position of TCAST-like elements in geno-mic sequence (start and end site) is indicated (Table S1). Functionalanalysis revealed that 17 of 101 genes correspond to putative unchar-acterized proteins, whereas the remaining genes are involved in dif-ferent molecular functions and diverse biological processes. Amongthe proteins, a proportion is characterized by ATP binding activity (13proteins) and involvement in protein phosphorylation and /or signaltransduction (9 proteins; Table S1).
To determine whether TCAST-like elements are distributedrandomly relative to genes or whether they are overrepresented nearspecific groups of genes, we used GeneCodis 2.0 to provide a statisticalrepresentation of the genes associated with TCAST-like elements.Because many genes are still not annotated in T. castaneum andfurthermore T. castaneum genomic data are not included in Gene-Codis, we used gene numbers for orthologous genes from D. mela-
nogaster for the analysis and compared them with the whole setof 14,869 genes annotated in D. melanogaster. Genecodis analysisrevealed that TCAST-like elements are located near nine genes char-acterized as members of the immunoglobulin protein superfamily.Because there are only 134 immunoglobulin-like genes presentwithin the total set of D. melanogaster genes, random distributionof TCAST-like elements would result in their occurrence near ap-proximately a single immunoglobulin-like gene. The presence ofTCAST-like elements in the vicinity of nine immunoglobulin-likegenes therefore represents a statistically significant overrepresenta-tion (0.00000427). All nine genes exhibit structural features of im-munoglobulin-like, immunoglobulin subtype 1 and immunoglobulinsubtype 2 proteins and are associated with the following TCASTtransposon-like elements: 25 at the 39end, 28 and 39 at the 59 end,32 and 40 within introns, and TCAST satellite-like elements: 8 at the39 end, 19 and 62 at the 59 end, and 41 within intron (Table 1). Aminimal distance between TCAST-like element and immunoglobulin-like gene was 7165 bp and a maximal 173,881 bp (Table 1). Molecularfunction of most of immunoglobulin-like genes is unknown, and theyare involved in different biological processes such as cell adhesion,protein phosphorylation, and axon guidance (Table S1). Although allnine genes belong to immunoglobulin superfamily, they did not ex-hibit sequence similarity, which could suggest role of duplication intheir evolution and spreading. The position of TCAST-like elementsrelative to the genes also was not consistent with the possibility thatTCAST-like elements duplicated along with the immunoglobulin- likegenes.
Overrepresentation of TCAST-like elements was also found neargenes that exhibit ATP-binding activity and axon guidance propertiesbut with a marginal significance (0.0183374 and 0.00865139). Forthe rest of genes, no significant overrepresentation of TCAST-likeelements was detected. Thus, enrichment of TCAST-like elements inthe vicinity of immunoglobulin-like genes potentially implicates arole of TCAST-like elements in the regulation of these genes.
Figure 1 Bayesian/ML phylogenetic trees of: (A) TCAST satellite-like elements (subunits Tcast1a), (B) TCAST satellite-like elements (subunitsTcast1b), and (C) TCAST transposon-like elements. Sequence numbers correspond to those in Table 1. When a particular sequence is composedof few subrepeats (e.g., Tcast1a or Tcast1b), numbers indicating subrepeats are added (e.g., 43_1, 43_2, 43_3). Numbers in brackets indicatechromosomes on which the corresponding sequences are located. Numbers on branches indicate Bayesian posterior probabilities/ML bootstrapsupport (above 0.5/50%, respectively).
Volume 2 August 2012 | Gene-Associated Satellite DNA | 937
DISCUSSIONTEs are classified in several dozen families based on transpositionmechanisms and different dynamics properties (Hua-Van et al. 2005).Active TEs encode the enzymes necessary for their transposition,either to move between nonhomologous regions in the genome orto copy themselves to other positions. In many cases, TEs do notproduce their own enzymes but are able to use those from functionalcopies or even from other TEs families. Defective and inactive TEsoften are amplified in regions of low recombination such as hetero-chromatin and may form tandemly repeated satellite DNAs. Theorigin of satellite DNA array from transposon-like elements isreported for many insects such as Drosophila melanogaster (Agudoet al. 1999), Drosophila guanche (Miller et al. 2000), and the beetleMisolampus goudoti (Pons 2004) whereas the retroviral-like featureswere first observed in the satellite DNA from rodents of the genusCtenomys (Rossi et al. 1993).
Transposons can be inserted into other repetitive sequences suchas satellite DNAs, as has been observed for the mariner-like elementand MITE element, both inserted into satellite DNA of the antMessorbouvieri (Palomeque et al. 2006). Searching for repetitive elementshomologous to the TCAST repeat within Repbase (http://www.girinst.org/repbase/) revealed that 59 UTR of nonlong terminal re-peat retrotransposon CR1-3_TCa (Jurka 2009c) shares a high sim-ilarity of 83% with a 444-bp long TCAST sequence composed of 1.2
tandem monomers (Figure 1). Other CR1 subfamilies identifiedwithin T. castaneum such as CR1-1_ TCa, CR1-2_TCa, and CR1-4_TCa, published in Repbase, do not share similarity to CR1-3 anddo not contain TCAST similar sequence. We propose that CR1-3 wasinserted within TCAST satellite array and through recombinationhas acquired a part of TCAST sequence. Newly acquired TCASTelement could act as a promoter because TCAST satellite DNA hasan internal promoter for RNA Pol II (Pezer and Ugarkovi�c 2012)and becomes a new functional 59 UTR. Subsequent retrotransposi-tion of CR1-3_TCa could explain the dispersion of TCAST withinthe euchromatin (Figure 4). Three CR1-3_TCa elements withTCAST in the 59UTR were identified within scaffolds that havenot been mapped to linkage groups. However, truncated fragmentswith partial homology to CR1-3_TCa retrotransposon can be map-ped within T. castaneum genome, some of them in the vicinity ofTCAST elements. Such arrangement also indicates the role of CR1-3_TCa in the spreading of TCAST elements. There is also a possibilitythat TCAST satellite DNA originates from CR1-3 retrotransposonwhich was, after inactivation, amplified within the heterochromatinregion. In the case of TCAST transposon-like elements, part of thesatellite sequence is incorporated within TIRs which are character-istic for DNA transposons. The presence of target-site duplicationsat the sites of insertions of some TCAST transposon-like elementsalso indicates transposition as a mode of spreading of TCAST
Figure 2 Organization of TCASTelements within T. castaneumgenome in the form of TCASTtransposon-like element, tan-dem arrays, and CR1-3_TCaretrotransposon. Regions corre-sponding to TCAST elementare shown in red. TCAST trans-poson-like element contains analmost complete TCAST mono-mer and a monomer segmentof approximately 121 bp in aninverted orientation, whereasCR1-3 retrotransposon containssegment corresponding to 1.2monomer. Within TCAST trans-poson-like element terminalinverted repeats (arrows) uniquenonsatellite sequence (green),target-site duplication in theform of “ACT,” and the insertionpoint of 925-bp sequence found
within TR 1.9, element and coding for the putative transposase are shown. Three short ORFs within TCAST transposon-like element are alsoindicated. Within nonlong terminal repeat retrotransposon CR1-3_TCa regions corresponding to 59UTR and to two ORFs are indicated.
Figure 3 Distribution of TCAST-like elements on T. cas-taneum chromosomes. The karyotype representing thehaploid set of T. castaneum chromosomes, and posi-tions of constitutive heterochromatin (dark) and euchro-matin (white) are depicted based on C-banding data(Stuart and Mocelin 1995) and T. castaneum 3.0 assem-bly (http://www.beetlebase.org). TCAST transposon-likeelements (blue) and TCAST satellite-like elements (red)are shown. Two TCAST-like elements are representedas separate lines if they are at least 100 kb distant fromeach other.
938 | J. Brajkovi�c et al.
elements. Parts of satellite DNA elements can be found within sometransposons, such as pDv transposon (Evgen’ev et al. 1982; Zelentsovaet al. 1986) whose long direct terminal repeats show significant se-quence similarity to the pvB370 satellite DNA, located in the centro-meric heterochromatin of a number of species of the Drosophila virilisgroup (Heikkinen et al. 1995). The presence of short stretches ofPisTR-A satellite DNA sequences within 39 UTR of Ogre retrotrans-posons dispersed in the pea (Pisum sativum) genome was reported(Macas et al. 2009). Furthermore, the mobilization of subtelomericrepeats upon excision of the transposable P element from tandemlyrepeated subtelomeric sequences has been observed (Thompson-Stewart et al. 1994).
Incorporation of part of a TCAST satellite DNA sequence intoa (retro)transposable element, and its subsequent mobilization andspreading by (retro)transposition, may explain the distribution ofTCAST element in the vicinity of genes within euchromatin. SatelliteDNA sequences are prone to undergo recurrent repeat copy numberexpansion and contraction in divergent lineages as well as amongpopulations of the same species (Bosco et al. 2007). This amplificationappears to be random and does not correlate with phylogeny of thespecies (Pons et al. 2004; Lee et al. 2005; Bulazel et al. 2007). Ampli-fication of a satellite sequence is reported to occur as a result of un-equal crossing over or duplicative transposition (Smith 1976; Ma andJackson 2006). The discovery of human extrachromosomal elementsoriginating from satellite DNA arrays in cultured human cells anddifferent plant species indicates the possible existence of additionalamplification mechanisms based on rolling-circle replication (Assumet al. 1993; Navrátilová et al. 2008). It has been proposed that satellitesequences excised from their chromosomal loci via intrastrand recom-bination could be amplified in this way, followed by reintegration oftandem arrays into the genome (Feliciello et al. 2006). Moreover, it ispossible that such a mechanism affected TCAST satellite DNA, andthat extrachromosamal circles of TCAST were reintegrated into
different genome locations by homologous recombination basedon short stretches of sequence similarity between TCAST satelliteand target genomic sequence (Figure 4). Integrated TCAST sequen-ces are mainly composed of interspersed elements belonging to twomajor subfamilies, Tcast1a and Tcast1b, which is a prevalent type oforganization in pericentromeric heterochromatin (Feliciello et al.2011). This finding indicates that the origin of dispersed euchro-matic TCAST elements may be duplication of heterochromatincopies.
The distribution of TCAST-like elements relative to protein codinggenes revealed no specific preference for insertions within introns orat 59 or 39 ends of genes. TCAST-like elements are distributed on allchromosomes with no significant deviation in the number among thechromosomes, and phylogenetic analysis did not detect any significantsequence clustering of TCAST-like elements derived from the samechromosome. Dispersed TCAST satellite-like elements produce tan-dem arrays up to tetramers, but repeats from the same array do notreveal any significant clustering on phylogenetic trees. This findingindicates there is no significant difference in the homogenization ofTCAST satellite-like repeats at the level of local arrays or chromosomeor among different chromosomes. The average pair-wise sequencedivergence (6% for dispersed TCAST satellite-like repeats) is greaterthan the usual divergence of satellite elements located in heterochro-matin of tenebrionid beetles [approximately 2% (Ugarkovi�c et al.1996)]. This difference in homogeneity between repeats located inheterochromatin and euchromatin may be explained by a lower rateof gene conversion affecting dispersed satellite-like elements or bya specific mechanism of DNA repair acting on satellite DNA (Felicielloet al. 2006). TCAST transposon-like elements dispersed among thegenes within euchromatin have an even greater average sequencedivergence (approximately 12%) and also exhibit no significantchromosome-specific sequence clustering, indicating a similar rate ofhomogenization within and among the chromosomes. Relatively high
Figure 4 Models of spreading of TCAT-like elementsbased on (A) retrotransposition of CR-3_TCa element.CR1-3_TCa was inserted within TCAST satellite arrayand through recombination has acquired a part ofTCAST sequence, which could act as a promoter andbecome a new functional 59UTR. Subsequent retrotrans-position of CR1-3_TCa could explain the dispersion ofTCAST within the euchromatin. (B) Rolling circle replica-tion of TCAST satellite DNA sequences excised fromtheir heterochromatin loci via intrastrand recombina-tion, followed by reintegration into different genomelocations by homologous recombination.
Volume 2 August 2012 | Gene-Associated Satellite DNA | 939
sequence divergence of TCAST transposon-like elements and the sig-nificant truncation of the majority of them, indicates that the trans-position of these elements did not occur very recently and that theseelements could be considered as molecular fossils of the functionalTCAST transposon-like elements.
Cis-regulatory elements, such as promoters or transcription factorbinding sites, are predicted in some satellite DNAs (Pezer et al. 2011).Transcription from promoters for RNA Pol II is also characteristic forpericentromeric satellite DNAs from the beetles Palorus ratzeburgiiand Palorus subdepressus (Pezer and Ugarkovi�c 2008, 2009). Temper-ature-sensitive transcription of TCAST satellite DNA from an internalRNA Pol II promoter has been demonstrated (Pezer and Ugarkovi�c2012). Based on these findings, it can be proposed that TCAST ele-ments located in the vicinity of genes may function as alternativepromoters, and transcripts derived from them may interfere withthe expression of neighboring gene. This type of regulation is oftenobserved for retrotransposons positioned immediately 59 of proteingenes (Faulkner et al. 2009). In addition, some tissue-specific genepromoters are derived from retrotransposons (Ting et al. 1992;Samuelson et al. 1996). Because of rapid evolutionary turnover, satel-lite DNA sequences often are restricted to a group of closely relatedspecies, or in some instances are species specific. This is the case withTCAST satellite DNA, which is not even detected in the congenericTribolium species. If restricted satellite DNAs have regulatory poten-tial, then insertion of these elements in vicinity of genes could con-tribute to the establishment of lineage-specific or species-specificpatterns of gene expression. Annotation of genes in proximity toTCAST-like elements demonstrated a statistical overrepresentationof certain groups of genes, for example, those with immunoglobu-lin-like domains. Recently, in the fish Salvelinus fontinalis, a regulatoryrole of a 32-bp satellite repeat, located in an intron of the majorhistocompatibility complex gene (MHIIb), on MHIIb gene expressionwas demonstrated (Croisetiere et al. 2010). The level of gene expres-sion depends on temperature, as well as the number of satelliterepeats, and indicates a role for temperature-sensitive satellite DNAin gene regulation of the adaptive immune response. Further studiesare necessary to determine whether TCAST-like elements exhibit a po-tential regulatory role on nearby genes. The transcriptional potentialof satellite DNAs as well as their distribution close to protein-codinggenes, as shown in this study, provides strong support, that in additionto transposons, satellite DNAs represent a rich source for the assemblyof gene regulatory systems.
ACKNOWLEDGMENTSThis work was supported by Croatian Ministry of Science, Educationand Sport (grant no. 098-0982913-2832), European Union FP6 MarieCurie Transfer of Knowledge (grant MTKD-CT-2006-042248), andCOST Action TD0905 “Epigenetics: Bench to Bedside.”
LITERATURE CITEDAgrawal, R., T. Imielinski, and A. Swami, 1993 Mining association rules
between sets of items in large databases, pp. 207–216 in the Proceedings ofthe ACM SIGMOD International Conference on Management of Data,edited by P. Buneman and S. Jajodia. ACM Press, New York.
Agudo, M., A. Losada, J. P. Abad, S. Pimpinelli, P. Ripoll et al.,1999 Centromeres from telomeres? The centromeric region of the Ychromosome of Drosophila melanogaster contains a tandem array oftelomeric HeT-A- and TART-related sequences. Nucleic Acids Res. 27:3318–3324.
Assum, G., T. Fink, T. Steinbeisser, and K. J. Fisel, 1993 Analysis of humanextrachromosomal DNA elements originating from different beta-satellitesubfamilies. Hum. Genet. 91: 489–495.
Bosco, G., P. Campbell, J. T. Leiva-Neto, and T. A. Markow, 2007 Analysisof Drosophila species genome size and satellite DNA content revealssignificant differences among strains as well as between species. Genetics177: 1277–1290.
Britten, R. J., and E. H. Davidson, 1971 Repetitive and non-repetitive DNAsequences and a speculation on the origins of evolutionary novelty. Q.Rev. Biol. 46: 111–138.
Bulazel, K. V., G. C. Ferreri, M. D. Eldridge, and R. J. O’ Neill, 2007 Species-specific shifts in centromere sequence composition are coincident withbreakpoint reuse in karyotypically divergent lineages. Genome Biol. 8:R170.
Capy, P., C. Bazin, D. Higuet, and T. Langin, 1998 Dynamics and Evolutionof Transposable Elements. Springer-Verlag, Austin, TX.
Croisetiere, S., L. Bernatchez, and P. Belhumeur, 2010 Temperature andlength-dependent modulation of the MH class IIb gene expression inbrook charr (Salvelinus fontinalis) by a cis-acting minisatellite. Mol. Im-munol. 47: 1817–1829.
Edgar, R. C., 2004 MUSCLE: multiple sequence alignment with high ac-curacy and high throughput. Nucleic Acids Res. 32: 1792–1797.
Evgen’ev, M. B., G. N. Yenikolopov, N. I. Peunova, and Y. V. Ilyin,1982 Transposition of mobile genetic elements in interspecific hybridsof Drosophila. Chromosoma 85: 375–386.
Faulkner, G. J., Y. Kimura, C. O. Daub, S. Wani, C. Plessy et al., 2009 Theregulated retrotransposon transcriptome of mammalian cells. Nat. Genet.41: 563–571.
Feliciello, I., O. Picariello, and G. Chinali, 2006 Intra-specific variability andunusual organization of the repetitive units in a satellite DNA from Ranadalmatina: molecular evidence of a new mechanism of DNA repair actingon satellite DNA. Gene 383: 81–92.
Feliciello, I., G. Chinali, and Ð. Ugarkovi�c, 2011 Structure and evolutionarydynamics of the major satellite in the red flour beetle Tribolium casta-neum. Genetica 139: 999–1008.
Feschotte, C., 2008 Transposable elements and the evolution of regulatorynetworks. Nat. Rev. Genet. 9: 397–405.
Feschotte, C., and E. J. Pritham, 2007 DNA transposons and the evolutionof eukaryotic genomes. Annu. Rev. Genet. 41: 331–368.
Guindon, S., and O. Gascuel, 2003 A simple, fast, and accurate algorithmto estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
Hall, T. A., 1999 BioEdit: a user-friendly biological sequence alignmenteditor and analysis program for Windows 95/98/NT. Nucl. Acids Symp.Ser. 41: 95–98.
Heikkinen, E., V. Launonen, E. Müller, and L. Bachmann, 1995 The pvB370BamHI satellite DNA family of the Drosophila virilis group and its evo-lutionary relation to mobile dispersed genetic pDv elements. J. Mol. Evol.41: 604–614.
Hua-Van, A., A. Le Rouzic, C. Maisonhaute, and P. Capy, 2005 Abundance,distribution and dynamics of retrotransposable elements and transpo-sons: similarities and differences. Cytogenet. Genome Res. 110: 426–440.
Huelsenbeck, J. P., and F. Ronquist, 2001 MRBAYES: Bayesian inference ofphylogeny. Bioinformatics 17: 754–755.
Jurka, J., 2009a Mariner-1_TCa. Repbase Rep. 9: 674.Jurka, J., 2009b Mariner-2_TCa. Repbase Rep. 9: 675.Jurka, J., 2009c CR1–3_TCa. Repbase Rep. 9: 737.Kapitonov, V. V., and J. Jurka, 2003 Molecular paleontology of transposable
elements in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci.USA 100: 6569–6574.
Kohany, O., A. J. Gentles, L. Hankus, and J. Jurka, 2006 Annotation, sub-mission and screening of repetitive elements in Repbase: RepbaseSub-mitter and Censor. BMC Bioinformatics 7: 474.
Kuhn, G. C., H. Küttler, O. Moreira-Filho, and J. S. Heslop-Harrison, 2012 The1.688 Repetitive DNA of Drosophila: concerted evolution at different geno-mic scales and association with genes. Mol. Biol. Evol. 29: 7–11.
Lee, H. R., W. Zhang, T. Langdon, W. Jin, H. Yan et al., 2005 Chromatinimmunoprecipitation cloning reveals rapid evolutionary patterns ofcentromeric DNA in Oryza species. Proc. Natl. Acad. Sci. USA 102:11793–11798.
940 | J. Brajkovi�c et al.
Lowe, C. B., G. Bejerano, and D. Haussler, 2007 Thousands of humanmobile element fragments undergo strong purifying selection near de-velopmental genes. Proc. Natl. Acad. Sci. USA 104: 8005–8010.
Ma, J., and S. A. Jackson, 2006 Retrotransposon accumulation and satelliteamplification mediated by segmental duplication facilitate centromereexpansion in rice. Genome Res. 16: 251–259.
Macas, J., A. Kobli�zkova, A. Navratilova, and P. Neumann,2009 Hypervariable 39UTR region of plant LTR-retrotransposons asa source of novel satellite repeats. Gene 448: 198–206.
Miller, W. J., A. Nagel, J. Bachmann, and L. Bachmann, 2000 Evolutionarydynamics of the SGM transposon family in the Drosophila obscura speciesgroup. Mol. Biol. Evol. 17: 1597–1609.
Navrátilová, A., A. Koblízková, and J. Macas, 2008 Survey of extrachro-mosomal circular DNA derived from plant satellite repeats. BMC PlantBiol. 8: 90.
Palomeque, T., J. A. Carrillo, M. Munos-Lopez, and P. Lorite,2006 Detection of a mariner-like element and a miniature inverted-repeat transposable element (MITE) associated with the heterochromatinfrom ants of the genus Messor and their possible involvement for satelliteDNA evolution. Gene 371: 194–205.
Pezer, �Z., and Ð. Ugarkovi�c, 2008 RNA Pol II promotes transcription ofcentromeric satellite DNA in beetles. PLoS ONE 3: e1594.
Pezer, �Z., and Ð. Ugarkovi�c, 2009 Transcription of pericentromeric het-erochromatin in beetles – satellite DNAs as active regulatory elements.Cytogenet. Genome Res. 124: 268–276.
Pezer, �Z., and Ð. Ugarkovi�c, 2012 Satellite DNA-associated siRNAs asmediators of heat shock response in insects. RNA Biol. 9: 587–595.
Pezer, �Z., J. Brajkovi�c, I. Feliciello, and Ð. Ugarkovi�c, 2011 Transcription ofsatellite DNAs in Insects. Prog. Mol. Subcell. Biol. 51: 161–179.
Pons, J., 2004 Cloning and characterization of a transposable-like repeat inthe heterochromatin of the darkling beetle Misolampus goudoti. Genome47: 769–774.
Pons, J. B., E. Bruvo, M. Petitpierre, Ð. Plohl, Ugarkovi�c et al.,2004 Complex structural feature of satellite DNA sequences in the ge-nus Pimelia (Coleoptera: Tenebrionidae): random differential amplifica-tion from a common “satellite DNA library”. Heredity 92: 418–427.
Posada, D., 2008 jModelTest. Phylogenetic model averaging. Mol. Biol.Evol. 25: 1253–1256.
Richards, S., R. A. Gibbs, G. M. Weinstock, S. J. Brown, R. Denell et al.,2008 The genome of the model beetle and pest Tribolium castaneum.Nature 452: 949–955.
Rossi, M. S., C. G. Pesce, O. A. Reig, A. R. Kornblihtt, and J. Zorzópulos,1993 Retroviral-like features in the monomer of the major satellite
DNA from the South American rodents of the genus Ctenomys. DNASeq. 3: 379–381.
Samuelson, L. C., R. S. Phyllips, and L. J. Swanberg, 1996 Amylase genestructures in primates: retroposon insertions and promoter evolution.Mol. Biol. Evol. 13: 767–779.
Smith, P. G., 1976 Evolution of repeated sequences by unequal crossover.Science 191: 528–535.
Stuart, J. J., and G. Mocelin, 1995 Cytogenetics of chromosome rearrange-ments in Tribolium castaneum. Genome 38: 673–680.
Talavera, G., and J. Castresana, 2007 Improvement of phylogenies afterremoving divergent and ambiguously aligned blocks from protein se-quence alignments. Syst. Biol. 56: 564–577.
Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei et al.,2011 MEGA5: Molecular Evolutionary Genetics Analysis using maxi-mum likelihood, evolutionary distance, and maximum parsimonymethods. Mol. Biol. Evol. 28: 2731–2739.
Thompson-Stewart, D., G. H. Karpen, and A. C. Spradling, 1994 A trans-posable element can drive the concerted evolution of tandemly repetitiousDNA. Proc. Natl. Acad. Sci. USA 91: 9042–9046.
Ting, C. N., M. P. Rosenberg, C. M. Snow, L. C. Samuelson, and M. H.Meisler, 1992 Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Dev. 6:1457–1465.
Ugarkovi�c, Ð., 2005 Functional elements residing within satellite DNAs.EMBO Rep. 6: 1035–1039.
Ugarkovi�c, Ð., and M. Plohl, 2002 Variation in satellite DNA profiles—causes and effects. EMBO J. 21: 5955–5959.
Ugarkovi�c, Ð., M. Podnar, and M. Plohl, 1996 Satellite DNA of the redflour beetle Tribolium castaneum—comparative study of satellites fromthe genus Tribolium. Mol. Biol. Evol. 13: 1059–1066.
Wang, S., M. D. Lorenzen, R. W. Beeman, and S. J. Brown, 2008 Analysis ofrepetitive DNA distribution patterns in the Tribolium castaneum genome.Genome Biol. 9: R61.
Waterhouse, R. M., E. M. Zdobnov, F. Tegenfeldf, J. Li, and E. V. Kriventseva,2011 OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011.Nucleic Acids Res. 39: D283–D288.
Zelentsova, E. S., R. P. Vashakidze, A. S. Krayev, and M. B. Evgen’ev,1986 Dispersed repeats in Drosophila virilis: elements mobilized byinterspecific hybridization. Chromosoma 93: 469–476.
Zuker, M., 2003 Mfold web server for nucleic acid folding and hybridiza-tion prediction. Nucleic Acids Res. 31: 3406–3415.
Communicating editor: J. A. Scott
Volume 2 August 2012 | Gene-Associated Satellite DNA | 941