
Mol. Biol. Evol. 16(4):502–511. 1999 © 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

The Domain Structure and Retrotransposition Mechanism of R2 Elements Are Conserved Throughout Arthropods

William D. Burke,* Harmit S. Malik,* Jeffrey P. Jones†1 and Thomas H. Eickbush*

*Department of Biology, University of Rochester; and †Department of Pharmacology, School of Medicine and Dentistry, Rochester, New York

R2 elements are non-LTR retrotransposons that insert in the 28S rRNA genes of arthropods. Partial sequence data from many species have previously suggested that these elements have been vertically inherited since the origin of this phylum. Here, we compare the complete sequences of nine R2 elements selected to represent the diversity of arthropods. All of the elements exhibited a uniform structure. Identification of their conserved sequence features, combined with our biochemical studies, allows us to make the following inferences concerning the retrotransposition mechanism of R2. While all R2 elements insert into the identical sequence of the 28S gene, it is only the location of the initial nick in the target DNA that is rigidly conserved across arthropods. Variation at the R2 5′ junctions suggests that cleavage of the second strand of the target site is not conserved within or between species. The extreme 5′ and 3′ ends of the elements themselves are also poorly conserved, consistent with a target primed reverse transcription mechanism for attachment of the 3′ end and a template switch model for the attachment of the 5′ end. Comparison of the ~1,000-aa R2 ORF reveals that it can be divided into three domains. The central 450-aa domain can be folded by homology modeling into a tertiary structure resembling the fingers, palm, and thumb subdomains of retroviral reverse transcriptases. The carboxyl-terminal end of the R2 protein appears to be the endonuclease domain, while the amino-terminal end contains zinc finger and c-myb-like DNA-binding motifs.

Introduction

Eukaryotic transposable elements are difficult to study because their numerous copies gradually accumulate sequence changes. These changes make it difficult to determine which copies are autonomous elements, which can be active if components are provided in trans, and which are inactive. The highly abundant non-LTR retrotransposable elements are particularly difficult to study, because their integration mechanism often results in large numbers of defective elements (Weiner, Deininger, and Efstradiadis 1986). For example, it has been estimated that of the 500,000 L1 copies present in the human genome, only 30–60 copies are active (Sassaman et al. 1997).

R1 and R2 elements are distinct site-specific non-LTR retrotransposable elements that insert in the 28S rRNA genes of arthropods (fig. 1A). The unique location of these elements in a highly conserved region of the genome provides several experimental advantages. First, the elements can be isolated from any species irrespective of their sequence relationship to previously defined elements (Jakubczak, Burke, and Eickbush 1991; Burke et al. 1993). Second, the concerted evolution of the rRNA genes results in rapid turnover and the elimination of “dead” copies (Jakubczak et al. 1992). Finally, the sequence specificity of the insertions facilitates biochemical characterization of the retrotransposition enzymes (Xiong and Eickbush 1988; Luan et al. 1993).

1 Present address: Department of Chemistry, Washington State University.

Abbreviations: non-LTR, non-long terminal repeat; RT, reverse transcriptase; TPRT, target primed reverse transcription.

Key words: retrotransposons, reverse transcriptase, integration, zinc fingers, c-myb, arthropods.

Address for correspondence and reprints: Thomas H. Eickbush, Department of Biology, University of Rochester, Rochester, New York 14627-0211. E-mail: [email protected].

The single ORF of the R2 element from the silkmoth, Bombyx mori, has been shown to encode the enzymatic activities necessary to initiate the integration reaction (Xiong and Eickbush 1988; Luan et al. 1993). In this simple reaction, the R2 endonuclease first nicks one strand of the chromosomal target site. The R2 reverse transcriptase (RT) then uses the 3′ hydroxyl group released by this nick as the primer for synthesis of the cDNA strand. This mechanism, target primed reverse transcription (TPRT), is believed to be the mechanism of integration used by other non-LTR retrotransposable elements (Moran et al. 1996; Finnegan 1997), as well as by mobile bacterial and mitochondrial group II introns (Zimmerly et al. 1995). It is also likely to be the mechanism used for the insertion of short interspersed nuclear elements (SINEs) (Jurka 1997; Okada et al. 1997).

The phylogeny of the identified R2 elements is consistent with that of their arthropod hosts, suggesting that R2 has coexisted with the rDNA units for over 500 Myr (Lathe et al. 1995; Lathe and Eickbush 1997; Burke et al. 1998). To date, however, complete R2s have only been sequenced from the silkmoth (Burke, Calalang, and Eickbush 1987) and two species of Drosophila (Jakubczak, Xiong, and Eickbush 1990; Malik and Eickbush 1999). In this report, we present the complete sequence of six additional R2 elements from divergent arthropods. Comparison of these R2 sequences provides an opportunity to identify conserved features related to their structure and mechanism of integration.

Materials and Methods

Organisms

The horseshoe crab (Limulus polyphemus) and a sowbug (isopod, Porcellio scaber) were obtained from Ward’s Natural Science Establishment, Inc. The earwig (Forficula auricularia) was collected locally. A European strain of the jewel wasp (Nasonia vitripennis) was obtained from J. Werren, and the collembola species (Anurida maritima) was collected in Maine by L. Atkinson. The silkmoth 5′ junctions were derived from strains 703 and C108 of B. mori and a Chinese strain of Bombyx mandarina (Xiong, Sakaguchi, and Eickbush 1988).

FIG. 1.—Location and structure of arthropod R2 elements. A, The rDNA unit from Drosophila melanogaster with the location of the R1 and R2 insertion sites. The 18S, 5.8S, and 28S genes are indicated by black boxes. B, Structure of the R2 elements. The single ORF in R2 is shown by stippled shading; darker stippling indicates the central RT domain, and open boxes indicate the 5′ and 3′ untranslated regions (UTRs). The jewel wasp A element appears to be an R1/R2 chimera, with the shaded box labeled ORF1 being similar to the ORF1 of all R1 elements.

DNA Amplification and Sequencing

PCR amplification of the 5′ half of each R2 was obtained with PCR primers to a sequenced region from the 3′ half of each element and the primer TGCCCAGTGCTCTGAATGTC, which is complementary to the 28S gene of arthropods 80 bp upstream of the R2 site. The amplified DNA was cloned into mp18T2 (Burke, Muller, and Eickbush 1995), and multiple clones were organized into complementary pairs. At least two clones of each orientation were used to determine the sequence of both strands using nested sequencing primers. When it was determined that the jewel wasp R2 element initially obtained from a genomic library (family A) (Burke et al. 1993) represented a fusion with an R1 element, our standard PCR approach (Eickbush and Eickbush 1995; Burke et al. 1998) was used to isolate a second R2 family from this species (family B). PCR amplification of the 3′ half of the element was obtained with the degenerate primer GCNTWWGCNGAYGAY (N = any nucleotide, W = A or T, and Y = T or C) encoding the amino acid motif A(Y/F)ADD and the primer CGCGCATGAATGGATTAACG complementary to the 28S gene 20 bp downstream of the R2 site. Amplification of the 5′ half of the new element was obtained with a primer to a sequenced region from the 3′ half of the element and the above upstream 28S gene primer. With the exception of elements from isopod, virtually all R2 copies from each species contained an intact ORF. In isopod, most copies contained a 72-bp deletion and required two frameshifts to generate the intact ORF presented. The isopod sequence presented in this report is that of an intact R2 element which appears to be at low levels in the genome of this species. Conceptual translations of each element’s ORF were aligned with high gap penalties using CLUSTAL W (Thompson, Higgins, and Gibson 1994), followed by minor manual adjustments of the gaps inserted by the program. The phylogeny of the elements was determined using the last 450 aa of the ORF and neighbor-joining algorithms (Saitou and Nei 1987). Accession numbers for the new R2 sequences are as follows: N. vitripennis (jewel wasp) A, L00950; N. vitripennis B, AF090145; L. polyphemus (horseshoe crab), AF015814; A. maritima (collembola), AF015815; P. scaber (isopod), AF015818; and F. auricularia (earwig), AF015819.
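The degenerate primer described above can be checked mechanically by expanding each ambiguity code and translating every concrete sequence in the pool. The following Python sketch is our illustration and not part of the original protocol. The function names are hypothetical, only the ambiguity codes needed here (N, W, Y) are included, and the standard genetic code is assumed.

```python
from itertools import product

# IUPAC ambiguity codes used in the primer (N, W, Y) plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "W": "AT", "Y": "CT"}

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand_degenerate(primer):
    """Return every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def translate(dna):
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def encoded_peptides(primer):
    """Set of peptides encoded by all sequences in the primer pool."""
    return {translate(seq) for seq in expand_degenerate(primer)}


if __name__ == "__main__":
    primer = "GCNTWWGCNGAYGAY"
    print(len(expand_degenerate(primer)))     # 256
    print(sorted(encoded_peptides(primer)))   # ['A*ADD', 'AFADD', 'ALADD', 'AYADD']
```

Under these assumptions the pool contains 256 sequences. Because W at the second codon also admits TTA (Leu) and TAA (stop), only half of them encode A(Y/F)ADD exactly.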

Structure Predictions

The structure of the Drosophila melanogaster R2 RT domain was modeled based on the method of Sali and Blundell (1993) using the program MODELLER. This program is designed to find the most probable structure of a sequence given its alignment with related sequences of known structure. The sequence alignment between HIV and R2 was based on an updated version of our multiple-sequence alignment of RTs from all retroelements (Xiong and Eickbush 1990). The HIV tertiary structures 1HAR, which does not contain the thumb region of the RT domain (Unge et al. 1994), and 3HVT, which is the complete p66/p51 heterodimer structure (Kohlstaedt et al. 1992), were used in the structure predictions. Ten R2 structures were independently generated by the MODELLER program using a random seed. Most of the differences between these 10 structures were in the regions unique to R2, which were typically predicted as unstructured loops.

Results and Discussion

Isolation of Complete R2 Elements from Diverse Arthropods

Complete sequences of the R2 elements from B. mori (Burke, Calalang, and Eickbush 1987), D. melanogaster (Jakubczak, Xiong, and Eickbush 1990), and Drosophila mercatorum (Malik and Eickbush 1999) have been previously reported. The 3′ ends of numerous other R2 elements have also been reported as part of previous phylogenetic studies (Burke et al. 1993, 1998). For this report, the R2 elements from five diverse arthropods (jewel wasp, earwig, collembola, isopod, and horseshoe crab) were selected for complete sequencing. Multiple clones from the 5′ half of each R2 were obtained by PCR amplification (see Materials and Methods). The clones obtained from each of these amplifications contained less than 0.5% nucleotide sequence divergence.

FIG. 2.—R2 element phylogeny and 3′ junctions. A 50% consensus tree based on the neighbor-joining method is shown, with bootstrap values indicated. The sequences used for the phylogeny are the 450-aa carboxyl-terminal regions of the R2 ORF, the same as used previously (Burke et al. 1998). The tree was rooted using the R4 element of nematodes (Burke, Muller, and Eickbush 1995). Multiple families of R2 are found in some species, with each family given a letter classification. Elements that have been completely sequenced (fig. 1B) are indicated by boxes around the species name. Accession numbers for previously published R2 elements can be found in Burke et al. (1998). Shown at the right are the last 16 nt of the R2 3′ junctions and the sequences of the first 12 nt of the downstream 28S gene. The target sites of R2 insertions in all arthropod rDNA units are identical.
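The neighbor-joining step behind figure 2 can be sketched as follows. This is a minimal Python illustration of the Saitou and Nei (1987) algorithm under stated assumptions, not the software used for the figure. `p_distance` and `neighbor_joining` are hypothetical names. The sketch assumes uncorrected p-distances with gapped columns skipped, because the text does not specify a distance measure. Bootstrap resampling, the 50% consensus, and rooting with R4 are not shown.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing residues, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Neighbor-joining on a full distance matrix.

    names: list of at least three taxon labels.
    dist:  dict mapping (name_a, name_b) tuples to distances (either order).
    Returns an unrooted tree as a Newick string.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        dij = D(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"  # new internal node, labeled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Join the last three nodes at a single central node.
    x, y, z = nodes
    lx = (D(x, y) + D(x, z) - D(y, z)) / 2
    ly = (D(x, y) + D(y, z) - D(x, z)) / 2
    lz = (D(x, z) + D(y, z) - D(x, y)) / 2
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"
```

Consider an additive toy matrix in which A and B are sisters (branch lengths 1 and 2), C and D are sisters (3 and 1), and the internal branch is 2. For this matrix, `neighbor_joining` returns `(C:3.000,D:1.000,(A:1.000,B:2.000):2.000);`, recovering the generating tree exactly.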

Four of the newly sequenced R2 elements revealed structures that were similar to those of the previously described silkmoth and Drosophila elements (fig. 1B). The elements ranged in length from 3.5 to 4.3 kb, with most of this length variation associated with the size of their 5′ and 3′ untranslated regions (UTRs). The sequence of the fifth R2 element (from jewel wasp) revealed an unusual structure. The jewel wasp element (jewel wasp A in fig. 1B) was 7.2 kb in length and encoded two open reading frames (ORFs). The second ORF was similar to that found in all R2 elements. The first ORF was similar to the ORF1 of R1 elements, with its three distinctive Cys-His nucleic acid–binding motifs (see Jakubczak, Xiong, and Eickbush 1990 for a discussion of R1 structure). We suggest that this unusual element represents the fusion of a nearly complete R2 element with the 5′ end and first ORF of an R1 element. We do not know which of the five or more distinct R1 families in the jewel wasp was the origin of this ORF, because only the 3′ ends of the R1 families have been sequenced (Burke et al. 1993). The unusual hybrid R1/R2 element appears to be capable of retrotransposition using an R2 mechanism, as all eight copies of this element that we have characterized are located in the R2 target site and have the characteristic 5′ sequence variation associated with actively transposing R2 elements (discussed below).

The jewel wasp R2A element was originally obtained from a lambda phage genomic library (Burke et al. 1993). To determine if all R2 elements in the jewel wasp contained this unusual structure, our standard PCR approach (Eickbush and Eickbush 1995; Burke et al. 1998) was used to isolate first the 3′ half and then the 5′ half of a second distinct R2 family from this species. This second R2 family is here called the B family to differentiate it from the originally characterized A family. The 5.1-kb jewel wasp R2B element contained a single ORF similar to that of R2 from all other arthropods.

To determine the phylogenetic relationship of the jewel wasp R2B element with that of the other R2s, we have redone our phylogenetic analysis of all available R2 sequences (Burke et al. 1993). The phylogeny of the nine completely sequenced R2 elements and several partially sequenced R2 elements is shown in figure 2. The R2 phylogeny is congruent with that of the host phylogeny (see Burke et al. 1998), with the complication of multiple lineages and their differential maintenance or recovery in the various arthropod taxa studied (Burke et al. 1993). The new jewel wasp R2B element forms a distinct clade with the horseshoe crab element. Thus, it would appear that three distinct lineages of R2 have been maintained since the origin of arthropods (lineage 1, found in jewel wasp B and horseshoe crab; lineage 2, found in Japanese beetle and mealworm; and lineage 3, the major lineage found in most arthropods). This may also be an underestimate, as additional ancient lineages of R2 are clearly possible. Only in the case of Drosophila and silkmoth genomes can we be confident that all R2 families have been recovered.

FIG. 3.—R2 5′ junctions from the various arthropod species. Shown at the top is the 28S gene sequence flanking the R2 insertion site in all arthropods. Sequences upstream and downstream of the insertion site are indicated with gray and vertical shading, respectively. 5′ junction sequences of the R2 elements from each species are shown below this 28S gene sequence. Nucleotides to the left of the bold vertical line are 28S sequences, nucleotides to the right of the thin vertical line are R2 sequences, and nucleotides between the vertical lines are additional nucleotides found at the 5′ junctions (those derived from the 28S gene sequences are boxed). The horizontal arrows represent locations at which a continuous sequence is presented on two lines. The numbers of PCR-derived clones that were identical to the sequences shown are indicated at the far right.

R2 Junction Sequences and the Mechanism of Retrotransposition

3′ Junctions

Figure 2 also shows that, based on their 3′ junctions with the 28S gene, all R2 elements have identical insertion sites. The 28S gene itself in this region showed no sequence differences. The 3′ boundary of R2 is determined by the initial “nick” of the 28S gene, which is used to prime reverse transcription of the R2 transcript (Luan et al. 1993). Therefore, the initial nicks by the R2 endonuclease appear to be identical in all arthropods. Compared with this remarkable conservation of target site cleavage, there is little conservation of the R2 sequences at this 3′ junction. As shown in figure 2, most R2s end in an AT-rich sequence, frequently a poly(A) tail. In general, the length of these poly(A) tails varies between different copies, while there is less variation at this junction in those copies without such homopolymers (data not shown).

What explains this AT-rich sequence at the R2 3′ junctions? The presence of poly(A) tails on certain R2 elements is unlikely to be the result of a polyadenylated transcript; polyadenylation signal sequences are absent and R2 elements appear to be cotranscribed with the 28S gene by RNA polymerase I (George and Eickbush 1998). The AT-rich 3′ junctions of R2 can, however, be explained by properties of the R2 RT that were revealed in our in vitro studies of the silkmoth R2 TPRT reaction (Luan and Eickbush 1995, 1996; Mathews et al. 1997). Although the R2 RT initiates cDNA synthesis at the 3′ end of the R2 element, it does not recognize these sequences. Instead, RNA recognition is based on the secondary structure of the 3′ UTR and on downstream 28S gene sequences. Because the RNA template need not anneal to the DNA primer in the TPRT reaction, slippage can occur, and extra nucleotides are sometimes added. The observed preference of the silkmoth enzyme to add additional T nucleotides to the cDNA strand in vitro (Luan and Eickbush 1995) would thus give rise to R2 3′ junctions that could readily evolve into poly(A) tails, because extra T residues on the cDNA (antisense) strand read as A residues on the sense strand. Alternative preferences of the R2 enzymes in other species could give rise to the GC-rich sequences occasionally observed.
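The slippage argument can be made concrete with a toy string model. The Python sketch below assumes an idealized junction. The function names are hypothetical, and the example sequences are placeholders rather than real R2 or 28S sequence. It shows why untemplated T residues added to the cDNA strand appear as an A run on the sense strand, and how the length of that run can be measured against a known downstream 28S flank.

```python
# Complement table for plain DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_after_slippage(r2_3prime_end, downstream_28s, extra_t):
    """Sense-strand 3' junction if the RT adds `extra_t` untemplated T's to the
    cDNA before copying the R2 RNA.

    The T's sit next to the primer on the cDNA (bottom) strand, so on the sense
    strand they read as A's between the R2 3' end and the downstream 28S flank.
    """
    return r2_3prime_end + "A" * extra_t + downstream_28s


def poly_a_length(junction, downstream_28s):
    """Length of the A run immediately upstream of the known 28S flank."""
    if not junction.endswith(downstream_28s):
        raise ValueError("junction does not end with the expected 28S flank")
    element_side = junction[: len(junction) - len(downstream_28s)]
    return len(element_side) - len(element_side.rstrip("A"))
```

For example, `junction_after_slippage("GTCAAA", "GCGCTT", 3)` gives a sense-strand junction whose A run is 6 nt long, with three of those nucleotides contributed by slippage. Copies that differ only in `extra_t` would show the kind of copy-to-copy variation in poly(A) length described above.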

5′ Junctions

Figure 3 shows the 98 R2 5′ junctions we sequenced for this study. In vitro, the silkmoth R2 endonuclease cleaves the top strand of the 28S gene 2 bp upstream of the bottom strand (arrows in the 28S sequences at the top of the figure). The 2 bp of this stagger and frequently another upstream nucleotide appear to be deleted in silkmoth R2 insertions. Comparisons of figures 2 and 3 indicate that R2 insertions in other species result in either no deletion of the 28S gene (D. mercatorum, jewel wasp B) or a deletion of 2–7 bp (isopod, collembola, and horseshoe crab). This finding indicates that top-strand cleavage may vary in location among arthropods.

Based on the 5′ junctions of D. melanogaster R2 elements, we previously postulated (George, Burke, and Eickbush 1996) that the attachment of R2 sequences to the upstream target site occurs by means of the RT jumping (switching) from its RNA template to the upstream DNA target sequences (summarized in fig. 4A). Because 5′ truncated R2 elements can be found in D. melanogaster that have junctions similar to the full-length element, we further suggested that this template switch could occur at many positions along the R2 RNA template (George, Burke, and Eickbush 1996). The R2 5′ junctions shown in figure 3 differ from those in D. melanogaster in that there are few 5′ truncated elements (only one horseshoe crab element is 5′ truncated). The virtual absence of such 5′ truncations indicates that the premature template switching hypothesized for D. melanogaster is rare in most other arthropods. This may indicate that sequences at the 5′ end of R2, within either the element or the upstream 28S gene sequences of a cotranscript, are required for the template switch in these species.

FIG. 4.—Template switch models to explain the observed sequence variation at the 5′ ends of R2 elements. Thick lines indicate 28S gene sequences, a dashed line indicates R2 DNA sequences, and a stippled line indicates R2 RNA template, with darker areas representing the flanking 28S rRNA sequences. The highly conserved location of the nick in the bottom strand is indicated by an asterisk. A, Template switching model showing the path used for most insertions. The R2 reverse transcriptase jumps from the R2 RNA template to the cleaved target DNA at the R2/28S gene junction (full-length insertions) or before reaching this junction (5′ truncations). Because upper-strand DNA cleavage may occur at variable positions upstream of the lower-strand site, different amounts of the 28S gene may be missing from the 5′ junction. B, Template switch model showing the origin of 28S gene duplications at the 5′ ends of many R2 elements. The template switch is suggested to occur within the upstream 28S rRNA sequences to the end of the cleaved DNA, thereby generating a tandem target site duplication. C, Template switch model showing the origin of downstream 28S gene sequences at the 5′ ends of R2 elements in the earwig. Upper-strand cleavage occurs 21 bp downstream of lower-strand cleavage.

Another characteristic of the R2 5′ junctions is the presence of “additional nucleotides” (fig. 3). Extensive additional sequences are in all cases derived from the 28S gene and are boxed. Short additions, or short additions flanking the 28S gene duplications, appear to be “nontemplated” and are likely to be derived from slippage during the template switch. In isopod, silkmoth, horseshoe crab, and jewel wasp A, R2 copies were found that contained direct duplications of the 28S gene from 21 to 26 bp in length (gray boxed nucleotides). These duplications cannot be described as target site duplications because they are tandemly arranged on the same side of the element. As shown in figure 4B, such duplications are readily explained by a template switch model if 28S gene sequences are part of the R2 transcript, and the template switch from the RNA template occurs after the RT has passed the 5′ junction with the 28S gene sequences. Once such a duplication has occurred, it could be propagated as part of the R2 sequence and gradually accumulate mutations, or undergo a subsequent duplication event. Such mutations and/or second duplications can be found in the elements from isopod, silkmoth, and horseshoe crab.
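Tandem duplications of the kind shown in figure 3 can be located in a junction sequence by comparing each candidate unit with the segment immediately following it. The Python sketch below is a hypothetical helper, not the authors' procedure. It reports only exact direct repeats in the 21–26-bp range mentioned above, so older duplications that have accumulated mutations would require a mismatch-tolerant comparison.

```python
def find_tandem_duplications(seq, min_len=21, max_len=26):
    """Return (start, length, unit) for every position where a unit of
    min_len..max_len nucleotides is immediately followed by an exact copy."""
    hits = []
    for k in range(min_len, max_len + 1):
        # Slide a window of 2k nucleotides and compare its two halves.
        for i in range(len(seq) - 2 * k + 1):
            if seq[i:i + k] == seq[i + k:i + 2 * k]:
                hits.append((i, k, seq[i:i + k]))
    return hits
```

A repeat whose unit is itself periodic can be reported at more than one start position or length, so hits from a scan like this usually need manual inspection.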

This template switch model can also explain the presence of 21 bp of downstream 28S gene sequences at the 5′ ends of earwig R2 elements (vertical shading). This 28S gene sequence contains three nucleotide substitutions, suggesting that this duplication was not recent. As shown in figure 4C, the unusual earwig junction could have been generated by an aberrant cleavage of the upper strand of the target DNA downstream of the insertion site. As long as upper-strand cleavage in subsequent R2 retrotransposition cycles occurs near the site of lower-strand cleavage, and the template switch occurs at the normal junction with upstream 28S sequences, the new junction would be propagated. Indeed, we have recovered the identical 21-bp duplication from three different earwigs collected in different years and different locations in the Rochester area, suggesting that this 5′ variant may be fixed in the local population.

We conclude that the location of the bottom-strand nick in the target DNA is the only sequence-specific feature that has been rigidly conserved throughout arthropods. In comparison, the location and even the specificity of the top-strand cleavage appear to be poorly conserved. This variation in upper-strand cleavage generates either short deletions or duplications of the 28S gene. Such variation has little functional significance for the element, and new junction sequences frequently become fixed in the rDNA locus. Clearly, the extreme 5′ and 3′ ends of R2 elements are under little selective constraint.
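The relationship between the position of top-strand cleavage and the resulting 5′ junction can be summarized with a simplified string model. In the Python sketch below, the names are hypothetical and the target is a placeholder, not the 28S sequence. The element's 3′ end is joined at the bottom-strand nick and its 5′ end at the top-strand cut, as in figure 4A and C. The model ignores nontemplated additions, 5′ truncations, and duplications copied from 28S sequences in the transcript (fig. 4B).

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    product: str      # sense strand after insertion
    deleted: str      # target bases lost between the two cleavage sites
    duplicated: str   # downstream target bases repeated at the element's 5' junction


def insert_element(target, bottom_nick, top_cut, element):
    """Idealized insertion into a target written 5'->3' on the top strand.

    bottom_nick: position of the bottom-strand nick; its 3' OH primes reverse
                 transcription, so the element's 3' end joins target[bottom_nick:].
    top_cut:     position of the top-strand cleavage; after the template switch,
                 the element's 5' end joins target[:top_cut].
    """
    product = target[:top_cut] + element + target[bottom_nick:]
    if top_cut < bottom_nick:   # top strand cut upstream: the stagger is lost
        return Insertion(product, target[top_cut:bottom_nick], "")
    if top_cut > bottom_nick:   # top strand cut downstream: that stretch is duplicated
        return Insertion(product, "", target[bottom_nick:top_cut])
    return Insertion(product, "", "")
```

A cut 2 bp upstream of the nick (`top_cut = bottom_nick - 2`, as for the silkmoth enzyme in vitro) deletes the 2-bp stagger. A cut at the nick gives a clean insertion. A cut 21 bp downstream (`top_cut = bottom_nick + 21`, as proposed for earwig) places 21 bp of downstream sequence at the element's 5′ junction.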


FIG. 5.—Schematic diagram of the amino acid sequence similarity of nine arthropod R2 elements. Plotted is a sliding 15-aa window of the combined amino acid similarity of the completely sequenced R2 elements to that from D. melanogaster. Identical amino acids are scored as +1, chemically similar amino acids are scored as +0.5, and indels are given a penalty of −0.5. The RT domain modeled in figure 6 is indicated, with the seven segments conserved in the RTs of all retroelements (Xiong and Eickbush 1990) shown with darker shading and labeled with the nomenclature used in Eickbush (1994). Highly conserved amino acid regions identified in figures 6 and 8 are also shown. Shown below the graph is a schematic diagram of the R2 ORF, with the boundaries of the three R2 domains indicated.

FIG. 6.—Comparison of the conserved sequence motifs in the amino-terminal domains of R2 elements. The CCHH motif is boxed, with the critical residues common to all such motifs indicated below the sequences (from Berg and Shi 1996). Residues at positions −1, 3, and 6 (with reference to an α-helical region) are believed to make contact with the bases of the DNA helix. The horseshoe crab and jewel wasp B elements contain three Cys-His motifs, the third of which matches that of those elements with one motif. Downstream of the CCHH motifs is a 50-aa domain with similarity to c-myb. Because a consensus c-myb motif is not available, representative c-myb sequences from mice and D. melanogaster are shown (Ogata et al. 1994).

Conserved Structure of the R2 ORF

The R2 ORFs from different arthropods vary from 1,000 to 1,180 aa, with most of this variation associated with the extreme amino-terminal end (fig. 1B). There is essentially no sequence conservation between R2 elements at this amino-terminal end, and putative Met initiation codons are not conserved (data not shown). We previously postulated that translation initiation begins within the 5′ UTR of R2 (George and Eickbush 1998).

Starting at a conserved Cys-His motif near the amino-terminal end, the R2 ORFs from different arthropods are readily aligned throughout their lengths. The total levels of amino acid sequence identity between the nine different R2 elements used in this study range from 23% to 62%. Figure 5 is a graphic representation of the level of sequence conservation in the different regions of the ORF. While the RT domain is generally regarded as the most highly conserved region of a retrotransposable element, figure 5 shows that other regions of the R2 ORF are equally well conserved.
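The scoring scheme given in the figure 5 legend (identical +1, chemically similar +0.5, indel −0.5, 15-aa window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used for figure 5: the sequences are assumed to be rows of one existing alignment, the "chemically similar" residue groups are hypothetical (the legend does not define them), and per-column scores are averaged over all compared elements and over the window. A pairwise percent-identity helper of the kind that could underlie the 23%–62% range is included; it counts only columns where neither row has a gap, which is one of several common conventions.

```python
from typing import Dict, List

# Hypothetical "chemically similar" groups; the figure 5 legend does not list them.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF: Dict[str, int] = {aa: i for i, grp in enumerate(SIMILAR_GROUPS) for aa in grp}


def column_score(a: str, b: str) -> float:
    """Score one aligned column: +1 identical, +0.5 similar, -0.5 indel, else 0."""
    if a == "-" or b == "-":
        return 0.0 if a == b else -0.5  # a gap in both rows is not an indel
    if a == b:
        return 1.0
    group_a, group_b = GROUP_OF.get(a), GROUP_OF.get(b)
    return 0.5 if group_a is not None and group_a == group_b else 0.0


def combined_profile(reference: str, others: List[str], window: int = 15) -> List[float]:
    """Mean column score of several aligned rows against a reference, smoothed
    over a sliding window. Returns one value per window start position."""
    if not others or any(len(s) != len(reference) for s in others):
        raise ValueError("need at least one aligned row of the reference's length")
    # Average over all compared elements gives the "combined" similarity per column.
    per_column = [
        sum(column_score(reference[i], row[i]) for row in others) / len(others)
        for i in range(len(reference))
    ]
    return [sum(per_column[i:i + window]) / window
            for i in range(len(per_column) - window + 1)]


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

With this normalization a window value of 1.0 means every compared element is identical to the reference across the whole window; the plotted scale in figure 5 may have been normalized differently.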

The Amino-Terminal Domain

Seven of the R2 elements contain single Cys-His motifs at the amino-terminal ends of their ORFs, while the other two elements (jewel wasp B and horseshoe crab) contain three such motifs. As shown in figure 6, the single motif of most elements and the third motif in jewel wasp B and horseshoe crab correspond to the well-characterized Cys2-His2 (CCHH) type of zinc finger (Berg and Shi 1996). This motif was originally identified in the transcription factor TFIIIA and is now known to be the most prevalent DNA-binding motif found in eukaryotic proteins (Berg and Shi 1996). The conserved Cys and His residues coordinate a zinc ion, while three hydrophobic residues form a hydrophobic core next to this metal complex. These zinc fingers bind within the major groove of the DNA helix and make their primary base contacts through the residues at positions −1, 3, and 6 (numbering defined by an α-helical domain in the motif). Because all R2 elements recognize and make their initial nick in an identical 28S gene sequence, one would predict that these three positions of the R2 zinc finger should be well conserved. Consistent with this prediction, the R2 CCHH motifs contain a Thr or Ser at position −1, a Gly at position 3, and a Leu or Val at position 6 (Gln is found at position 6 only in the R2 elements with multiple motifs).
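The CCHH spacing and the helix numbering discussed above can be illustrated with a short motif scan. This sketch rests on two assumptions that do not come from this paper: a generic C2H2 consensus, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, written in the style of the PROSITE zinc-finger pattern, and the usual numbering convention in which helix position 6 immediately precedes the first zinc-coordinating His. The function `scan_c2h2` and the toy sequence are hypothetical.

```python
import re
from typing import List, NamedTuple

# Assumed generic C2H2 consensus: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
# The lookahead lets overlapping candidates be reported; group 2 marks the first His.
C2H2 = re.compile(r"(?=(C.{2,4}C.{3}[LIVMFYWC].{8}(H).{3,5}H))")


class Finger(NamedTuple):
    start: int   # 0-based index of the first Cys
    motif: str   # matched Cys-Cys-His-His span
    pos_m1: str  # residue at helix position -1
    pos_3: str   # residue at helix position 3
    pos_6: str   # residue at helix position 6


def scan_c2h2(protein: str) -> List[Finger]:
    """Report each CCHH finger with its residues at helix positions -1, 3 and 6.

    Assumes position 6 sits immediately before the first His and that the
    numbering skips 0, so -1, 3 and 6 lie 7, 4 and 1 residues before that His.
    """
    seq = protein.upper()
    fingers = []
    for m in C2H2.finditer(seq):
        his1 = m.start(2)  # index of the first zinc-coordinating His
        fingers.append(Finger(m.start(1), m.group(1),
                              seq[his1 - 7], seq[his1 - 4], seq[his1 - 1]))
    return fingers


if __name__ == "__main__":
    # Toy, made-up sequence (not an R2 protein).
    for f in scan_c2h2("MKCPVCGKSFSTRAGEKLHIRTHTG"):
        print(f.start, f.motif, f.pos_m1, f.pos_3, f.pos_6)
```

For the made-up sequence shown, the scan reports one finger starting at index 2 with Thr, Gly, and Leu at positions −1, 3, and 6, the combination described above for most R2 elements.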

As shown in figure 6, downstream of the CCHH motif in all R2 elements is a conserved 50-residue domain similar to the DNA-binding motifs of the oncoprotein c-myb (Sakura et al. 1989). C-myb binds DNA by means of two 50-residue domains. Each domain is composed of three α-helices, with primary DNA contacts involving the third α-helix (Ogata et al. 1994). Clear sequence similarity can be found between the R2 motif and the first and third α-helices of the c-myb domain. The spacing between these regions in R2 is similar to that of c-myb, implying a second helical region in R2, but little actual sequence identity can be detected.

The R2 elements with three Cys-His motifs at their amino-terminal ends (jewel wasp B and horseshoe crab) represent a distinct lineage in arthropods (fig. 2). The additional motifs of these two elements are presumably also involved in DNA binding but do not, in general, match the standard CCHH motif (Berg and Shi 1996). Consistent with this model, the CCHH and c-myb motifs of these two elements exhibit several differences from those of the R2 elements with a single zinc finger, a possible consequence of the somewhat different selective constraints imposed on these motifs by the additional Cys-His domains.

We suggest that the amino-terminal domain of R2 is a DNA-binding domain which contains both zinc finger and c-myb binding motifs. As shown in figure 5, the region downstream of this putative DNA-binding domain is poorly conserved in sequence. This is also the only region of the R2 protein with segmental differences between arthropods. For example, the collembola and jewel wasp B elements contain 60–80-aa insertions in this region, while the isopod R2 elements contain a 25-aa deletion. Thus, the putative amino-terminal DNA-binding domain may be located on a variable extension from the remainder of the R2 protein.

Reverse Transcriptase Domain

Seven conserved segments have been found in the RT domains of all retroelements (Eickbush 1994) and are indicated in figure 5 by darker shading (labeled 1, 2, A–E). These seven segments form the fingers and palm regions of the HIV RT (Kohlstaedt et al. 1992; Unge et al. 1994). This region of the R2 protein is larger than the corresponding region of the retroviral RTs, containing additional conserved regions (in particular, the peaks labeled 0 and 2a). Carboxyl terminal to the fingers and palm regions of the HIV RT is the 80-aa thumb region (Kohlstaedt et al. 1992). The 100-aa region downstream of peak E in the R2 ORF is as well conserved as the fingers and palm regions. While this region of the R2 ORF has no sequence similarity with the HIV RT, we believe it likely serves as the RT thumb.

To determine whether the central 450-aa region of the R2 ORF can fold into a structure similar to that of the HIV RT, we conducted a series of homology modeling studies. These studies test whether the R2 sequence can be folded into a structure that matches the crystal structure of the HIV RT (see Materials and Methods). As shown in figure 7A, the R2 sequence can readily be folded into a right-handed structure resembling the HIV protein. Figure 7B shows an expanded view of the active-site cleft of the protein. Amino acid residues conserved across all R2 sequences are shown in blue and correspond to the peaks in figure 5. Because the segments defined as 0 and 2a do not exist in the HIV protein, the modeling program does not assign them a structure within the active-site cleft. However, given the high degree of sequence conservation at the ends of these loops, we suggest that in the true structure the 0 and 2a regions would fold back into the core of the protein, where they would help define (extend) the active-site cleft. While the thumb region of R2 has no sequence similarity with the HIV protein, structural predictions fold this sequence into a structure resembling a thumb.

In conclusion, based on this modeling and our similarity plot (fig. 5), we propose that this 450-aa region represents the RT domain of the R2 element. Throughout this region, sequence similarity can be found with all other non-LTR retrotransposable elements, suggesting that the RT domain of this entire class of elements can be folded into a similar structure (unpublished data).

The Endonuclease Domain

As shown in figure 5, the entire 250-aa region downstream of the RT domain is as well conserved as the RT domain. An alignment of these carboxyl-terminal sequences is shown in figure 8. At the center of this domain is another highly conserved Cys-His motif with the general structure Cys-X2–3-Cys-X7–8-His-X4-Cys (CCHC). A number of other non-LTR retrotransposons encode similar CCHC motifs downstream of their RT domains (Burke, Muller, and Eickbush 1995). Unlike the CCHH motif found at the amino-terminal end of the R2 protein, the CCHC motif found in the carboxyl-terminal domain is not common in other proteins. The only well-described examples of a CCHC motif are those associated with the RNA-binding domains of retroviral nucleocapsid proteins (Berg and Shi 1996). However, the spacing between the conserved Cys and His residues of the R2 domain is not identical to that of retroviral nucleocapsids; thus, there is little reason to suggest that this domain is responsible for RNA binding. Indeed, we believe it is more reasonable to assume that the carboxyl-terminal domain of the R2 protein represents the endonuclease domain, and that the conserved amino-terminal CCHH and c-myb motifs are involved only in DNA binding. We have attempted to identify residues within the R2 carboxyl-terminal domain that are candidates for the active-site residues of the endonuclease. Mutations within a highly conserved KPDI sequence (fig. 8) were found to eliminate the ability of the silkmoth R2 protein to cleave the 28S target site, but did not affect the ability of the protein to bind DNA or to conduct the TPRT reaction on prenicked target sites (unpublished data). Given the proximity of the CCHC motif to this KPDI sequence, it is reasonable to suggest that this CCHC motif is involved in DNA binding.

FIG. 7.—Hypothetical structures of the R2 RT domain folded by homology modeling (Sali and Blundell 1993) to the HIV RT (Kohlstaedt et al. 1992; Unge et al. 1994). A, Fingers and palm regions are shown in white, while the thumb region is shown in yellow. Residues corresponding to the peaks in figure 5 are shown in blue. B, A close-up view of the active-site cleft of the R2 protein. Two critical Asp residues in peaks A and C of the RT domain (shown in red) are parts of adjacent β-sheets.

FIG. 8.—Comparison of the conserved sequence motifs in the carboxyl-terminal domains of R2 elements. The KPDI motif, which, when mutated, eliminates endonuclease activity (unpublished data), and the CCHC motif have been boxed. Other residues that are highly conserved among R2 elements are indicated by shading, using MacBoxShade to an 80% consensus.
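The difference in CCHC spacing between the R2 carboxyl-terminal motif and retroviral nucleocapsids can be made concrete with two regular expressions. The R2 spacing, Cys-X2–3-Cys-X7–8-His-X4-Cys, is transcribed from the text; the nucleocapsid "zinc knuckle" spacing, Cys-X2-Cys-X4-His-X4-Cys, is the commonly cited retroviral consensus and is an outside assumption used only for comparison. The function `find_cchc` and the test sequences are hypothetical.

```python
import re
from typing import Dict, List, Tuple

PATTERNS: Dict[str, re.Pattern] = {
    # Spacing given in the text for the R2 carboxyl-terminal motif.
    "R2_CCHC": re.compile(r"(?=(C.{2,3}C.{7,8}H.{4}C))"),
    # Assumed retroviral nucleocapsid zinc-knuckle spacing, for comparison only.
    "NC_knuckle": re.compile(r"(?=(C.{2}C.{4}H.{4}C))"),
}


def find_cchc(protein: str) -> List[Tuple[str, int, str]]:
    """Return (pattern name, 0-based start, motif) for each CCHC-type hit.

    The lookahead wrapper allows hits that overlap one another to be reported.
    """
    seq = protein.upper()
    hits = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start(1), m.group(1)))
    return hits


if __name__ == "__main__":
    # Made-up sequences: the first fits the R2 spacing, the second a knuckle.
    for s in ("AACGKCRRAAGGKHAAEKCAA", "CAACAAAAHAAAAC"):
        print(s, find_cchc(s))
```

Because the R2 motif places seven or eight residues between the second Cys and the His, a sequence with the four-residue nucleocapsid spacing is not matched by the R2 pattern, and vice versa.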


Conclusions

Our previous analyses have suggested that R2 elements have been inserting into arthropod 28S genes since the origin of the phylum (Burke et al. 1998). As shown here, throughout this history R2 elements have retained the same structure and mechanism of integration. Comparison of the nine R2 sequences has suggested a domain structure for the R2 protein in which the large central RT domain is flanked by domains involved in DNA cleavage and DNA binding. The highly conserved amino acid residues revealed by these sequence comparisons will serve as valuable guides for site-directed mutagenesis studies to further elucidate the R2 retrotransposition reaction.

Perhaps the most surprising finding of this study was that, despite their long presence in an invariant site of the genome, R2 integrations generate the same ‘‘sloppy’’ junctions seen in non-LTR retrotransposons that insert nonspecifically throughout eukaryotic genomes. For example, the location of upper-strand cleavage by the R2 endonuclease appears to be inherently imprecise. Also like other non-LTR elements, certain R2 lineages contain variable poly(A) or A-rich tails at their 3′ ends, while other lineages possess more precise sequences. Sequences found at the 5′ ends of R2 elements are even more flexible, with small deletions and duplications being the norm. Non-LTR elements are notorious for their variable effects on the target site, which result from the simplicity of their TPRT mechanism of integration. R2, which has had 500 Myr to adapt to its target site, still relies on this very simple but imprecise mechanism. This study has increased our confidence that the silkmoth R2 element used in our biochemical studies is representative of all R2s and remains a useful model for the study of non-LTR retrotransposition.

Acknowledgments

This work was supported by a National Institutes of Health grant (GM42790) and a National Science Foundation grant (MCB-9601198) to T.H.E. We thank Danna Eickbush for comments on the manuscript.

LITERATURE CITED

BERG, J. M., and Y. SHI. 1996. The galvanization of biology: a growing appreciation for the roles of zinc. Science 271:1081–1085.

BURKE, W. D., C. C. CALALANG, and T. H. EICKBUSH. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell. Biol. 7:2221–2230.

BURKE, W. D., D. G. EICKBUSH, Y. XIONG, J. L. JAKUBCZAK, and T. H. EICKBUSH. 1993. Sequence relationship of retrotransposable elements R1 and R2 within and between divergent insect species. Mol. Biol. Evol. 10:163–185.

BURKE, W. D., H. S. MALIK, W. C. LATHE III, and T. H. EICKBUSH. 1998. Are retrotransposons long-term hitchhikers? Nature 392:141–142.

BURKE, W. D., F. MULLER, and T. H. EICKBUSH. 1995. R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes. Nucleic Acids Res. 23:4628–4634.

EICKBUSH, D. G., and T. H. EICKBUSH. 1995. Vertical transmission of the retrotransposable elements R1 and R2 during the evolution of the Drosophila melanogaster species subgroup. Genetics 139:671–684.

EICKBUSH, T. H. 1994. Origin and evolutionary relationships of retroelements. Pp. 121–157 in S. S. MORSE, ed. The evolutionary biology of viruses. Raven Press, New York.

FINNEGAN, D. J. 1997. Transposable elements: how non-LTR retrotransposons do it. Curr. Biol. 7:R245–R248.

GEORGE, J. A., W. D. BURKE, and T. H. EICKBUSH. 1996. Analysis of the 5′ junctions of R2 insertions with the 28S gene: implications for non-LTR retrotransposition. Genetics 142:853–863.

GEORGE, J. A., and T. H. EICKBUSH. 1999. Conserved features at the 5′ end of Drosophila R2 retrotransposable elements: implications for transcription and translation. Insect Mol. Biol. 8:3–10.

JAKUBCZAK, J. L., W. D. BURKE, and T. H. EICKBUSH. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295–3299.

JAKUBCZAK, J. L., Y. XIONG, and T. H. EICKBUSH. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37–52.

JAKUBCZAK, J. L., M. K. ZENNI, R. C. WOODRUFF, and T. H. EICKBUSH. 1992. Turnover of R1 (type I) and R2 (type II) retrotransposable elements in the ribosomal DNA of Drosophila melanogaster. Genetics 131:129–142.

JURKA, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl. Acad. Sci. USA 94:1872–1877.

KOHLSTAEDT, L. A., J. WANG, J. M. FRIEDMAN, P. A. RICE, and T. A. STEITZ. 1992. Crystal structure at 3.5 Å resolution of HIV-1 reverse transcriptase complexed with an inhibitor. Science 256:1783–1790.

LATHE, W. C. III, W. D. BURKE, D. G. EICKBUSH, and T. H. EICKBUSH. 1995. Evolutionary stability of the R1 retrotransposable element in the genus Drosophila. Mol. Biol. Evol. 12:1094–1105.

LATHE, W. C. III, and T. H. EICKBUSH. 1997. A single lineage of R2 retrotransposable elements is an active, evolutionarily stable component of the Drosophila rDNA locus. Mol. Biol. Evol. 14:1232–1241.

LUAN, D. D., and T. H. EICKBUSH. 1995. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell. Biol. 15:3882–3891.

LUAN, D. D., and T. H. EICKBUSH. 1996. Downstream 28S gene sequences on the RNA template affect the choice of primer and the accuracy of initiation by the R2 reverse transcriptase. Mol. Cell. Biol. 16:4726–4734.

LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H. EICKBUSH. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.

MALIK, H. S., and T. H. EICKBUSH. 1999. R1 and R2 retrotransposable elements in the rDNA units of Drosophila mercatorum: abnormal abdomen revisited. Genetics 151:653–665.

MATHEWS, D. H., A. R. BANERJEE, D. D. LUAN, T. H. EICKBUSH, and D. H. TURNER. 1997. Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element. RNA 3:1–16.

MORAN, J. V., S. E. HOLMES, T. P. NAAS, R. J. DEBERARDINIS, J. D. BOEKE, and H. H. KAZAZIAN. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917–927.

OGATA, K., S. MORIKAWA, H. NAKAMURA, A. SEKIKAWA, T. INOUE, H. KANAI, A. SARAI, S. ISHII, and Y. NISHIMURA. 1994. Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79:639–648.

OKADA, N., M. HAMADA, I. OGIWARA, and K. OHSHIMA. 1997. SINEs and LINEs share common 3′ sequences: a review. Gene 205:229–243.

SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

SAKURA, H., C. KANEI-ISHII, T. NAGASE, H. NAKAGOSHI, T. J. GONDA, and S. ISHII. 1989. Delineation of three functional domains of the transcriptional activator encoded by the c-myb protooncogene. Proc. Natl. Acad. Sci. USA 86:5758–5762.

SALI, A., and T. L. BLUNDELL. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234:779–815.

SASSAMAN, D. M., B. A. DOMBROSKI, J. V. MORAN, M. L. KIMBERLAND, T. P. NAAS, R. J. DEBERARDINIS, A. GABRIEL, G. D. SWERGOLD, and H. H. KAZAZIAN. 1997. Many human L1 elements are capable of retrotransposition. Nat. Genet. 16:37–43.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

UNGE, T., S. KNIGHT, R. BHIKHABHAI, S. LOVGREN, Z. DAUTER, K. WILSON, and B. STRANDBERG. 1994. 2.2 Å resolution structure of the amino-terminal half of HIV-1 reverse transcriptase (fingers and palm subdomains). Structure 2:953–961.

WEINER, A. M., P. L. DEININGER, and A. EFSTRATIADIS. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661.

XIONG, Y., and T. H. EICKBUSH. 1988. Functional expression of a sequence-specific endonuclease encoded by the retrotransposon R2Bm. Cell 55:235–246.

XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

XIONG, Y., B. SAKAGUCHI, and T. H. EICKBUSH. 1988. Gene conversion can generate sequence variants in the late chorion multigene families of Bombyx mori. Genetics 120:221–231.

ZIMMERLY, S., H. GUO, P. S. PERLMAN, and A. M. LAMBOWITZ. 1995. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82:545–554.

PIERRE CAPY, reviewing editor

Accepted December 15, 1998
