
Animal snoRNAs and scaRNAs with exceptional structures


©2011 Landes Bioscience.Do not distribute.

RNA Biology 8:6, 938-946; November/December 2011; © 2011 Landes Bioscience

938 RNA Biology Volume 8 Issue 6

Key words: box C/D, box H/ACA, scaRNA, evolution, secondary structure

Submitted: 05/16/11

Revised: 06/29/11

Accepted: 07/04/11

DOI: 10.4161/rna.8.6.16603

*Correspondence to: Peter F. Stadler; Email: [email protected]

Manja Marz,1,† Andreas R. Gruber,3,5,† Christian Höner zu Siederdissen,3,† Fabian Amman,3 Stefan Badelt,3 Sebastian Bartschat,4 Stephan H. Bernhart,3 Wolfgang Beyer,3 Stephanie Kehr,4 Ronny Lorenz,3 Andrea Tanzer,6 Dilmurat Yusuf,2 Hakim Tafer,4 Ivo L. Hofacker3,9 and Peter F. Stadler3,4,7-10,*

1RNA Bioinformatik Gruppe; Institut für Pharmazeutische Chemie; Philipps University at Marburg; Marburg, Germany; 2Department of Oncology and CREATE Health Strategic Centre for Clinical Cancer Research; Lund University; Lund, Sweden; 3Department of Theoretical Chemistry; University of Vienna; Vienna, Austria; 4Bioinformatics Group; Department of Computer Science and Interdisciplinary Center for Bioinformatics; University of Leipzig; Leipzig, Germany; 5Biozentrum; Universität Basel and Swiss Institute of Bioinformatics; Basel, Switzerland; 6Bioinformatics and Genomics Group; Centre de Regulació Genòmica; PRBB; Barcelona, Spain; 7Max Planck Institute for Mathematics in the Sciences; 8Fraunhofer Institut für Zelltherapie und Immunologie; Leipzig, Germany; 9Center for non-coding RNA in Technology and Health; University of Copenhagen; Frederiksberg C, Denmark; 10Santa Fe Institute; Santa Fe, NM USA

†These authors contributed equally to this work.

The overwhelming majority of small nucleolar RNAs (snoRNAs) fall into two clearly defined classes characterized by distinctive secondary structures and sequence motifs. A small group of diverse ncRNAs, however, shares the hallmarks of one or both classes of snoRNAs but differs substantially from the norm in some respects. Here, we compile the available information on these exceptional cases, conduct a thorough homology search throughout the available metazoan genomes, provide improved and expanded alignments, and investigate the evolutionary histories of these ncRNA families as well as their mutual relationships.

Introduction

Small nucleolar RNAs (snoRNAs) are an abundant class of non-coding RNAs with a wide variety of cellular functions including chemical modification of RNA, telomere maintenance, pre-rRNA processing and regulatory activities in alternative splicing. They fall into two main classes, box C/D and box H/ACA snoRNAs, which are distinguished by typical RNA secondary structures and characteristic sequence boxes. The main role of box C/D snoRNAs is to determine the targets for 2'-O-ribose methylation, which is important for rRNA maturation and regulation of splicing of some mRNAs.1 Box H/ACA snoRNAs facilitate the conversion of uridine to pseudouridine (Ψ) in a specific sequence context,2 which is determined by hybridization of snoRNA and target RNA, in most cases a ribosomal RNA. The target U is positioned between two specific interactions of the flanking target RNA sequence with the complementary sequence of the recognition loop of the snoRNA.3

A subset of H/ACA snoRNAs features additional sequence motifs that direct them to the Cajal body,4 a nuclear organelle that plays an important role in assembly and/or modification of the nuclear transcription and RNA-processing machinery.5 Cajal body-specific small nuclear RNAs (scaRNAs) function as guide RNAs just like ordinary snoRNAs. Their targets, however, are primarily the pol-II transcribed spliceosomal RNAs.6

The common structural features of each of the two snoRNA classes can be attributed to their incorporation into ribonucleoparticles (snoRNPs) that share class-specific protein components.7 The class-specific secondary structures are also heavily used in computational approaches to detect snoRNAs in genomic DNA data (fisher8 (H/ACA), snoGPS9 (H/ACA), SnoScan10 (C/D), SNO.pl11 (C/D), snoSeeker12 (both), and SnoReport13 (both)) and to predict their targets (snoTARGET,14 RNAsnoop,15 plexy16). On the other hand, there are several ncRNAs that exhibit characteristic hallmarks of snoRNAs, while their secondary structures deviate substantially from the consensus patterns. At least in some cases, such as the U3 RNA,17 such deviations are explained by the presence of additional sequence boxes and corresponding additional components of the snoRNP. Figure 1 shows the length distribution of human snoRNAs taken from snoRNA-LBME-db.18 Typical box C/D RNAs have a length of 80 ± 11 nt; 14 of 272 sequences are longer than 120 nt. For box H/ACA we find an average length of 134 ± 7 nt, with 8 of 121 sequences longer than 170 nt. Of the 21 scaRNAs, 3 are regular box C/D RNAs (of which U90 has a length of 425), and 9 are regular H/ACA RNAs (of which ACA47 has a length of 188). The remaining 9 scaRNAs have exceptional composite structures: Three snoRNAs (U22, HBI-43 and U6-77) have two C/D box domains, one snoRNA combines two H/ACA box domains comprising a total of four hairpins (U93), four snoRNAs combine a box C/D and a box H/ACA domain (U85, U87, U88 and U89), and the final case is telomerase RNA (which is not considered in this contribution).

The two classes of snoRNAs, which are named for their characteristic sequence motifs, also exhibit distinctive secondary structures19 (see Fig. 2, top). Box C/D snoRNAs have the boxes C = RUGAUGA and D = CUGA near the 5' and 3' ends, respectively, enclosed by a short terminal helix. In many cases, divergent copies of the boxes are positioned within the large loop, in the order C/D’/C’/D. Box H/ACA snoRNAs typically consist of two stem-loop structures, each featuring a nearly symmetric internal loop. The box H = ANANNA is located between the two components, while the 3' end is formed by ACANNN. Cajal body-specific snoRNAs (scaRNAs) of the H/ACA type in addition carry a CAB box = AGAG in the hairpin loops.4

Several snoRNAs exhibit some of these characteristic elements but feature unexpected deviations from the norm in the form of exceptional length, deviant secondary structures, or hybrid architectures. The best-conserved snoRNA, the box C/D snoRNA U3, has already been dealt with in reference 17. Here we review in detail 13 additional snoRNA families (Figure 2) whose lengths substantially exceed the expectations, focussing on their evolutionary conservation and structural peculiarities. Two of the families, Z32 and GGN68, were previously not described in Rfam; for the remaining ones we added sequences and improved the structural models. To this end, we performed a comprehensive search for homologs across the available animal genomes, summarized in Table 1, and constructed new alignments and consensus structures for each of these families.

Results

Box C/D snoRNAs and scaRNAs. U22 (SNORD22). The SNORD22 RNA, like U3, U8 and U14, belongs to the small group of snoRNAs that govern the cleavage of the ribosomal RNA precursors instead of directing chemical modifications. In the absence of SNORD22, mature 18S rRNA fails to accumulate.20 SNORD22 is expressed from the conserved host gene SNHG1 (also known as UHG),21 which shows little sequence conservation in its exons and harbours several other snoRNAs besides SNORD22.22 Expression of SNORD22 is known in human and mouse,22 Xenopus,20 and rhesus.23 In addition, Schmitz et al.24 annotate the platypus C/D box snoRNA Oa1907 as an ortholog of the human SNORD22 and Oa1925 as an ortholog of a human SNORD22 pseudogene. With a length between 116 nt (Petromyzon) and 129 nt (Microcebus), it only slightly exceeds the typical length range for box C/D snoRNAs.

The sequence of SNORD22 is well-conserved. Homologs are described throughout the gnathostomes in snoOPY (snoopy.med.miyazaki-u.ac.jp), and additional SNORD22 sequences are detectable in basal vertebrates and in the cephalochordate Branchiostoma floridae. The host gene SNHG1 also harbours several other unrelated, canonical snoRNAs (Table 2), namely snR56, SNORD25, SNORD26, SNORD27, SNORD28, SNORD29, SNORD30, SNORD31. Some of these snoRNAs, including SNORD22, are duplicated in some species. The genomic position of the host gene SNHG1 is syntenically conserved within tetrapods (between SLC3A2 and WDR74) and, at a different locus, within teleosts (between B3GAT3 and ARL2).

HBI-43 (SNORD17). This snoRNA and its mouse ortholog MBI-4325 stand out because of their exceptional size for box C/D snoRNAs, ranging from 187 nt in Tetraodon to 256 nt in Spermophilus. Two paralogous sequences are reported for rhesus,23 and a chicken ortholog GGN47 is reported in reference 26. It has a predicted target in the 28S rRNA (human U3797). Mammalian homologs are reported in snoRNA-LBME-db. Our homology search identified orthologs throughout all gnathostomes. The snoOPY database, furthermore, lists the C. elegans gene Y74C9A.6 as a homolog.

In Eutheria, SNORD17 is located in an intron of SNX5. The second copy in the

Figure 1. Length distribution of human snoRNAs. Box C/D, box H/ACA and snoRNAs with atypical architecture (e.g., those with both a C/D and an H/ACA domain) are shown by different colors. The scaRNAs, characterized by an additional localization signal, belong to either one of these three classes. Their number is shown by the black curve.
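The length statistics quoted in the Introduction and summarized in Figure 1 can be restated as simple arithmetic. The following Python sketch is illustrative only. It assumes that the values 80 ± 11 nt and 134 ± 7 nt are mean ± standard deviation; the class `LengthSummary` and the helper `is_exceptionally_long` are hypothetical names introduced here and are not part of any tool used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthSummary:
    """Length statistics for one snoRNA class, as quoted in the text."""
    name: str
    mean_nt: float    # assumed to be the mean length
    sd_nt: float      # assumed to be the standard deviation
    cutoff_nt: int    # length above which a sequence is counted as long
    n_total: int      # number of human sequences in the class
    n_long: int       # number of sequences longer than the cutoff

    @property
    def fraction_long(self) -> float:
        """Fraction of sequences exceeding the cutoff."""
        return self.n_long / self.n_total

    @property
    def cutoff_in_sd(self) -> float:
        """Distance of the cutoff from the mean, in standard deviations."""
        return (self.cutoff_nt - self.mean_nt) / self.sd_nt


# Values taken from the text (human snoRNAs in snoRNA-LBME-db).
BOX_CD = LengthSummary("box C/D", 80, 11, 120, 272, 14)
BOX_HACA = LengthSummary("box H/ACA", 134, 7, 170, 121, 8)


def is_exceptionally_long(length_nt: int, summary: LengthSummary) -> bool:
    """Hypothetical helper: flag a sequence longer than the class cutoff."""
    return length_nt > summary.cutoff_nt


if __name__ == "__main__":
    for s in (BOX_CD, BOX_HACA):
        print(f"{s.name}: {s.fraction_long:.1%} longer than {s.cutoff_nt} nt, "
              f"cutoff at {s.cutoff_in_sd:.1f} SD above the mean")
```

Under these assumptions, the 120 nt cutoff lies about 3.6 standard deviations above the box C/D mean and the 170 nt cutoff about 5.1 standard deviations above the box H/ACA mean; roughly 5% and 7% of the respective sequences exceed them.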


human genome is found in an intron of the adjacent OVOL2 gene. In T. belangeri, E. caballus, and S. araneus the annotated SNX5 gene is located downstream of SNORD17. Most likely, this is caused by incomplete gene models rather than a relocation of SNORD17. In Monodelphis, a second copy is located in CSRP2BP. In earlier gnathostomes, including the shark C. milii, however, SNORD17 is associated with dyskerin (DKC1), a key component of box H/ACA snoRNPs.

mgU6-77 (SNORD10). SNORD10 exceeds the typical length of a box C/D snoRNA by 80% (148 nt in human). It directs 2'-O-methylation of both C77 in U6 and C2970 in 28S rRNA in Xenopus oocytes.27 SNORD10 is located in a cluster with SNORD48 (upstream) and SNORD67 (downstream) in EIF4A1, the eukaryotic translation initiation factor 4A. This association with SNORD48 and SNORD67 is evolutionarily recent. In vertebrates other than Eutheria, EIF4A1 harbors SNORD10 only. While we have been unable to detect a SNORD10 homolog in B. floridae, P. marinus and C. milii, there are putative homologs in S. purpuratus, S. kowalevskii and even A. californica, which, however, are not located in an intron of EIF4A1. The snoOPY database, furthermore, lists the C. elegans gene T03F7.8 as a homolog.

GGN68. A survey of ncRNAs in chicken26 reported 125 ncRNAs, among them many novel snoRNAs as well as 10 sequences that remained without firm annotation. Five of these can be identified as members of previously described ncRNA families (ggn147 = vault RNA;28 ggn103 = U4atac;29 ggn141 = snoRNA ACA25;30 ggn67 = snoRNA ACA57 (SCARNA11);6 ggn46 = snoRNA U43), while no homologs outside the sauropsids are detectable for four ncRNAs. The final unclassified sequence, ggn68 (EU240284), which is 227 nt in length, however, is highly conserved across the vertebrates. The ggn68 RNA features highly conserved C and D boxes as well as a short terminal hairpin, and a highly conserved sequence motif AGA TTA TGA GAT upstream of the D box, which most likely corresponds to the guide region. As a snoRNA, it is exceptional since it contains a long low-complexity region in the central part of the sequence. The total length of the gene is thus highly variable, ranging from 163 nt in Mus to 289 nt in Tursiops.

With few exceptions, ggn68 homologs are found in the intron between exons 10 and 11 of the LARP4 gene throughout vertebrates (exon numbers refer to the human LARP4 gene). In addition, paralogous sequences are found in several species, often populating additional exons of LARP4. Short RNA fragments deriving from ggn68, as observed for many other snoRNAs,31,32 have been reported

Figure 2. Schematic drawing of metazoan snoRNA structures. Recognizable conserved sequence boxes C, D, H and A (for ACA) are annotated. Boxes labeled G indicate the presence of guiding sequences as annotated in snoRNA-LBME-db. Conservation analysis of potential guide regions did not reveal any regions other than those already annotated in snoRNA-LBME-db. Dotted lines indicate G/U-rich low-complexity regions.
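The box consensus motifs quoted in the Introduction (C = RUGAUGA, D = CUGA, H = ANANNA, ACANNN at the 3' end, CAB = AGAG) can be turned into a simple sequence scan. The Python sketch below is a minimal illustration and not the method of any of the snoRNA detection tools cited above: the function names `consensus_to_regex`, `find_box` and `ends_with_aca` are hypothetical, and the toy sequences in the usage example are invented. Such short motifs occur frequently by chance, so a hit is at best a hint that has to be combined with secondary-structure evidence.

```python
import re
from typing import List

# IUPAC codes needed for the box consensus sequences (R = purine, N = any).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "N": "[ACGU]"}

# Consensus motifs as given in the text.
BOXES = {
    "C": "RUGAUGA",
    "D": "CUGA",
    "H": "ANANNA",
    "ACA": "ACANNN",  # expected to form the 3' end of H/ACA snoRNAs
    "CAB": "AGAG",
}


def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[ch] for ch in consensus)


def find_box(seq: str, box: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead makes finditer report overlapping occurrences as well.
    pattern = re.compile("(?=(" + consensus_to_regex(BOXES[box]) + "))")
    return [m.start() for m in pattern.finditer(to_rna(seq))]


def ends_with_aca(seq: str) -> bool:
    """Check whether the 3' end has the form ACANNN."""
    pattern = consensus_to_regex(BOXES["ACA"]) + "$"
    return re.search(pattern, to_rna(seq)) is not None


if __name__ == "__main__":
    toy = "GG" + "AUGAUGA" + "CCCCCC" + "CUGA" + "CC"  # invented example
    print("C box at", find_box(toy, "C"))
    print("D box at", find_box(toy, "D"))
```

The lookahead pattern is a deliberate choice: a plain `finditer` would skip overlapping occurrences, which matters for repetitive motifs such as AGAG.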


Table 1. Overview, phylogenetic distribution and host genes of 13 atypical snoRNAs

Column groups: C/D box snoRNAs (SNORD22/U22 to scaRNA7/U90), H/ACA RNAs (SNORA53/ACA53, scaRNA16/ACA47), dual scaRNAs (scaRNA9/Z32 to scaRNA13/U93).

| Organisms | SNORD22/U22 | SNORD17/HBI-43 | SNORD10/U6-77 | GGN68 | scaRNA7/U90 | SNORA53/ACA53 | scaRNA16/ACA47 | scaRNA9/Z32 | scaRNA12/U89 | scaRNA6/U88 | scaRNA5/U87 | scaRNA10/U85 | scaRNA13/U93 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Primates | [4],2 | [7] | [6] | [4] | [6] | [7] | [5],(1),1 | [5],1 | [7] | [5] | [5] | [8] | [5] |
| Glires + tbe | [6],(1) | [6] | [6] | [7] | [5] | [4] | [3],(1),2 | [5] | [5] | [4] | [6] | [7] | [6] |
| Laurasiatheria | [5],3 | [7] | [8] | [8] | [6] | [7] | [3],(1),3 | [5],(1),{1} | [6] | [5] | [7] | [8-1] | [5] |
| Afrotheria | [1],1 | [1] | [2] | [1] | [1] | [1] | [1],1 | [2] | [2] | [1] | [1] | [2] | [2] |
| Xenarthra | (1) | [1] | – | [2] | [1] | [1] | 1 | [1] | [1] | [1] | [2] | [2] | [1] |
| M. domestica | [1] | [1],{1} | [1] | (1),{1} | [1] | [1] | [1] | [1] | [1] | [1] | [1] | [1] | [1] |
| O. anatinus | 1 | (1) | – | [2-1] | [1] | [1] | 1 | [1] | – | [1] | [1] | – | [1] |
| A. carolinensis | 1 | (1) | [1] | [1] | [1] | [1] | [1] | 1 | – | [1] | [1] | [1] | [1] |
| Aves | 1 | (3-1) | – | [3] | [3] | [3] | [2] | [3] | – | [2] | [2] | [2] | [3-1] |
| X. tropicalis | 1 | (1) | [1] | [2-1] | [1] | [1] | 1 | – | – | [1] | [1] | [2-1] | – |
| Teleostei | {5} | (5) | [5] | [13-8] | [4] | [3-1] | (1),3 | [2],2 | – | [2] | [6-1] | [5] | [5] |
| C. milii | 1 | (1) | – | [1] | – | 1 | – | 1 | – | [1] | [1] | – | – |
| P. marinus | 1 | – | – | [1] | – | [1] | – | 1 | – | – | – | [1] | – |
| B. floridae | 1 | – | – | $2-1$ | – | 1 | – | – | – | – | – | [1] | – |
| Tunicata | – | – | – | *[7-5]* | – | – | – | – | – | – | – | – | – |
| S. purpuratus | – | – | 1 | – | – | [1] | – | – | – | – | – | – | – |
| S. kowalevskii | – | – | 1 | – | – | – | – | – | – | – | – | – | – |
| Drosophila | – | – | – | – | – | – | – | – | – | – | – | (12) | – |
| Panarthropoda | – | – | – | – | – | – | – | – | – | – | – | 6 | – |
| P. humanus | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| D. pulex | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| H. robusta | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| L. gigantea | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| C. capitata | – | – | – | – | – | – | – | – | – | – | – | 1 | – |
| A. californica | – | – | 1 | – | – | 1 | – | – | – | – | – | 1 | – |
| N. vectensis | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| R. spez | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| T. adhaerens | – | – | – | – | – | 1 | – | – | – | – | – | – | – |
| Host Genes | ◊[SLC3A2+WDR74] (SLC3A2) ◊{B3GAT3+ARL2} | [SNX5] (DKC1) {CSRP2BP} | [EIF4A] | [LARP4] (WDR41) {CDC25A} $UBAP2L$ *XP_002125970* | [KPNA4] | [SLC53A2] | ◊[MGAT5B+SEC14L1] (PDHA1) | [KIAA1731] (PLRG1) {SETD1B} | [PHB2] | [APG16L1] | [APG16L1] | [NCAPD2] (CG1142) | ◊[GLRX5] |

Numbers of animals containing the observed snoRNAs are listed. The type of parentheses indicates association with the host genes listed in the last row of the table. The symbol ◊ refers to a non-coding transcript located between or adjacent to the gene(s) listed in the parentheses. Numbers not enclosed by parentheses refer to species where no host gene could be determined. Horizontal lines indicate the phylogenetic range of previously reported sequences included in reference 18. E. europaeus, S. araneus; Afrotheria: L. africana, E. telfairi; Xenarthra: D. novemcinctus, C. hoffmanni; Aves: T. guttata, G. gallus, M. gallopavo; Teleostei: D. rerio, T. nigroviridis, T. rubripes, G. aculeatus, O. latipes; Tunicata: O. dioica, C. intestinalis, C. savignyi; Drosophila: D. pseudoobscura, D. yakuba, D. melanogaster, D. erecta, D. simulans, D. sechellia, D. grimshawi, D. mojavensis, D. persimilis, D. virilis, D. ananassae; Panarthropoda: G. morsitans, A. aegypti, A. gambiae, N. vitripennis, C. quinquefasciatus, T. castaneum.


for human,33 chicken,34 cattle,35 and monotremes.36 The latter survey annotates one of the fragments as microRNA oan-mir-1357.

Using plexy,16 position 676 of the human 18S rRNA is predicted as a possible target for ggn68. Although the target location is well conserved, the predicted binding energies are rather moderate, and the position does not coincide with a known methylation site.

U90 (SCARNA7). This box C/D RNA, which localizes to Cajal bodies in HeLa cells, is predicted to guide the 2'-O-ribose methylation of residue A70 of U1 snRNA.6 Its length ranges from 285 nt in T. guttata to 425 nt in G. aculeatus, a large part of which consists of an exceptionally long and quite well-conserved, roughly symmetric region between the C and D’ boxes and the C’ and D boxes, respectively. The unusual length variation is caused by a repetitive G/U insert. Homologs throughout the amniotes have been reported in snoRNA-LBME-db. We find that SCARNA7 is conserved in all vertebrates, always located in an intron of the KPNA4 gene.

Box H/ACA snoRNA and scaRNA. ACA53 (SNORA53). With a length well above 200 nt this orphan snoRNA is exceptionally large for an otherwise inconspicuous box H/ACA snoRNA. Experimental surveys for snoRNAs have uncovered SNORA53 homologs in human,30 rhesus,23 platypus (Oa1744),24 and chicken (GGN66).26 Orthologous sequences across the Tetrapoda are reported in the snoOPY database. SNORA53 is one of the most divergent snoRNAs in both sequence and structure. Our systematic survey records SNORA53 sequences throughout the Metazoa. Across deuterostomes, it is located in an intron of the highly conserved mitochondrial phosphate carrier SLC53A3. The Strongylocentrotus purpuratus candidate contains an insert almost 1,700 nt in length. It remains unclear, however, whether this is an artifact in the genome assembly.

ACA47 (SCARNA16). This box H/ACA scaRNA shows a regular architecture. With a length between 150 nt in T. guttata and 188 nt in C. familiaris, however, it lies outside the typical size range for box H/ACA snoRNAs. It was experimentally detected in human30 and chicken (GGN10).26 In our study, SCARNA16 was found in tetrapods and teleosts. In Tetrapoda, it derives from a non-coding host gene designated C17orf86 located between MGAT5B and SEC14L1. In teleosts, on the other hand, SCARNA16 resides in an intron of the protein-coding gene PDHA1.

Dual scaRNAs. Dual snoRNAs are composed of two complete snoRNA domains. They can be either of the same type or comprise both a box C/D and a box H/ACA domain. See Figure 2 for schematic drawings.

mgU2-19/30 (SCARNA9). This scaRNA is composed of two C/D box domains with predicted targets in the U2 snRNA (nucleotides G19 and A30).37 The two box C/D domains are separated by a G/U-rich linker. The full-length molecule appears to localize to the Cajal body. The two component snoRNAs, designated mgU2-19 and mgU2-30, exist as separate entities localized to the nucleolus.37 A mouse homolog of mgU2-30 has been reported as Z32 snoRNA in GenBank entry AJ242789. The sequence of SCARNA9 is quite well conserved and can be found in all vertebrates. In tetrapods and some teleosts it is associated with KIAA1731, a conserved protein of unknown function. In zebrafish, GBAS serves as host gene for the SCARNA9 homolog. In M. lucifugus and E. europaeus SCARNA9 is associated with the proteins PLRG1 and SETD1B, respectively. A comparison of the two box C/D components shows no evidence that they might have arisen by tandem duplication (Figure 3).

Table 2. SNORD22 is part of a snoRNA cluster consisting of snR56 = SNORD25, SNORD26, SNORD27, SNORD28, SNORD29, SNORD30 and SNORD31 as displayed in the table from 5’ to 3’, according to ENSEMBL

Organism 25 26 27 28 22 29 30 31 22 31 31

H. sapiens x x x x x x x x x

P. troglodytes x x x x x x x x x

P. pygmaeus x x x x x x

“ x x x x x x x

M. mulatta x x x x x x x x x

O. garnettii x x x x x x x

M. murinus(s) x x x x

M. musculus x x x x x x x x

R. norvegicus x x x x x x x x

S. tridec. x x x x x x x

C. porcellus x x x x x x x x

T. belangeri x x x x x

B. taurus x x x x x x x x x

E. caballus x x x x x x x x

M. lucifugus x x x x x x

S. araneus(s) x x x

M. domestica x x x x x x x x x

T. guttata x x x

X. tropicalis x x x x x* x x

T. nigroviridis x x x x x x x

T. rubripes x x x x x x x x

O. latipes x x x x x x

D. rerio x* x x x x x

In P. pygmaeus the cluster was copied; snR56 is located directly downstream of SNORD22. *In X. tropicalis (and D. rerio) another copy of SNORD31 is located downstream of SNORD22 (and snR56) and upstream of SNORD29. (s): This might be an assembly artefact due to its location on a short scaffold. S. tridec., S. tridecemlineatus. The current assembly of the G. aculeatus genome shows a totally different cluster: SNORD22-SNORD31-SNORD29-SNORD31-SNORD31-SNORD29-SNORD29-SNORD22-SNORD31-SNORD31.


U93 (SCARNA13). SCARNA13 was shown to colocalize with coilin in Cajal bodies and guide the pseudouridylation of residue 54 in the U2 spliceosomal snRNA and of residue 53 in snRNA U5 (human coordinates).38,39 The pseudouridylation of both positions was experimentally validated in human, mouse and cow.38 It consists of two tandemly arranged, otherwise inconspicuous, box H/ACA domains. With a size of 252 nt (Tetraodon) to 274 nt (Echinops) it perfectly matches the expectation for a duplicated box H/ACA snoRNA. In both Tetrapoda and teleosts it is located in a poorly characterized non-coding host gene SNHG10, downstream of the highly conserved GLRX5 gene. In contrast to SCARNA9, the two H/ACA components might be distant homologs (Fig. 3), suggesting that this dual snoRNA arose from a tandem duplication of a canonical H/ACA snoRNA.

U89 (SCARNA12). SCARNA12 is composed of an H/ACA box domain and a C/D box domain and has been shown to localize to the Cajal bodies.6 It is predicted to modify U46 in U5 snRNA. Its length ranges from 235 nt in Echinops to 283 nt in Monodelphis. Since we have been unable to trace the gene beyond Theria, we suspect that it is a recent innovation. All SCARNA12 sequences are located in an intron of prohibitin 2.

U85 (SCARNA10). SCARNA10 has an architecture similar to that of SCARNA12. In contrast to the latter, however, it is among the best-conserved snoRNAs, which we can trace through most of the metazoan phyla. Originally detected in human,40 the 5' end of its mouse homolog was reported as mouse MBI-52 C/D box RNA in reference 25. The cow “microRNA” mir-2424-1 originates from the 3' end of the bovine U85 scaRNA. The Drosophila homolog, snoRNA:MeU5-C46, was also described in reference 40, and compared in detail to the human sequence. Mutagenesis studies showed that SCARNA10 contains two functional copies of the CAB box.4

In deuterostomes, SCARNA10 is consistently found in an intron of the NCAPD2 gene. Its Drosophila homolog is located in an intron of CG1142, a gene of unknown function that can be identified as a homolog of NCAPD2 by sequence alignment. In C. elegans, the box H/ACA snoRNA ΨCeU5-48,41 also known as CeN105,42 is predicted to guide the modification of the homologous position in the U5 snRNA. In reference 41, an evolutionary relation between ΨCeU5-48 and SCARNA10 as well as SCARNA12 was suggested. The discovery of complete SCARNA10 homologs in several lophotrochozoan taxa suggests, however, that SCARNA10 has lost its box C/D domain in nematodes.

The similarity of the unusual architectures of the SCARNA10 and SCARNA12 families suggests that they might be ancient paralogs. In order to test this hypothesis, we computed the Infernal bit scores of alignments of members of one family against the covariance model of the other family (Figure 4). In each case, the scores, which average above 0, are significantly larger than the expected score from a randomized background control, indicating that SCARNA10 and SCARNA12 are indeed paralogous snoRNA families. In contrast, there is no evidence for an evolutionary relation of CeN105 and the H/ACA domain of either SCARNA10 or SCARNA12.

U87 (SCARNA5) and U88 (SCARNA6). These two paralogous scaRNAs contain

Figure 3. Distribution of Infernal bit scores of the sequence of the 5’ component aligned to the covariance model of the 3’ component (R on L) and vice versa for SCARNA9 and SCARNA13. The background distribution (randomized sequences) is shown in gray. While there is no indication that the 5’ and 3’ components are related for SCARNA9, the shift of the bit score distributions towards higher values for SCARNA13 shows that the sequences of the two parts (L and R) of this snoRNA are more similar than expected. Although this does not constitute an iron-clad proof, it serves at least as a strong indication that L and R are homologs and likely arose through a tandem duplication event.

Figure 4. Distant homology of the SCARNA10 and SCARNA12 families. A comparison of Infernal bit scores for alignments of SCARNA12 against the SCARNA10 covariance model (l.h.s., black histogram), and vice versa (r.h.s., black histogram) shows that the sequences of one family fit much better to the model of the other family than random sequences fitting the same secondary structure (gray background).
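The comparisons in Figures 3 and 4 ask whether observed bit scores are shifted towards higher values relative to a randomized background. The Python sketch below illustrates one way such a shift could be quantified, using a one-sided permutation test on the difference of mean scores. It is not the procedure used in this study: the bit scores themselves would have to be computed with Infernal beforehand, the example numbers are invented placeholders, and the function names `shuffle_sequence` and `permutation_p_value` are hypothetical. Note also that a plain mononucleotide shuffle, as sketched here, is a simpler background than random sequences constrained to fit the same secondary structure.

```python
import random
from statistics import mean
from typing import Sequence


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a mononucleotide shuffle of seq (composition is preserved)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def permutation_p_value(observed: Sequence[float],
                        background: Sequence[float],
                        n_perm: int = 2000,
                        seed: int = 0) -> float:
    """One-sided permutation test for mean(observed) > mean(background).

    Labels are reassigned at random n_perm times; the p-value is the
    fraction of relabelings whose mean difference is at least as large
    as the one actually observed (with the usual +1 correction).
    """
    rng = random.Random(seed)
    pooled = list(observed) + list(background)
    n_obs = len(observed)
    actual = mean(observed) - mean(background)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_obs]) - mean(pooled[n_obs:])
        if diff >= actual:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented placeholder scores, NOT data from this study.
    family_scores = [12.0, 8.5, 15.1, 9.7, 11.3]
    background_scores = [-6.0, -3.2, -8.1, -4.4, -5.9, -7.3, -2.8, -6.6]
    print("p =", permutation_p_value(family_scores, background_scores))
```

A permutation test is used in this sketch because it makes no assumption about the shape of the bit score distributions, which need not be normal.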


box H/ACA domains appears to be the product of a tandem duplication.

Beyond the detailed characterization of the aberrant animal snoRNAs, this study once again stresses the limitations of homology search approaches for non-coding RNAs. Only SCANRA10 can be unambiguously traced throughout the animal kingdom, while in most other cases our knowledge remains restricted to chordates or deuterostomes. For snoRNAs, it has repeatedly been proposed that snoRNAs targeting orthologous posi-tions in rRNAs or snRNAs can be consid-ered as homologs. The deep conservation of the sites of chemical modifications has lead to the construction of correspon-dence Tables of snoRNAs across kingdom boundaries, reviewed in reference 43, for a recent list comprising animals, fungi and plants. Although this hypothesis is very plausible, and there is evidence that a few of the corresponding snoRNAs are indeed homologs, in most cases there is little direct

from different genomic locations. They still share enough sequence similar-ity to identify them as ancient para-logs. Although we have found distinct SCARNA12 sequences in mammals only, the divergence of the two paralog groups suggests that either SCARNA12 underwent extensive adapative changes, or more likely, that the duplication dates back to a much earlier point in vertebrate phylogeny. There is no evidence, on the other hand, that all four C/D H/ACA hybrids share a common ancestor.

SCARNA9, a fusion of two box C/D snoRNAs that can be expressed as both a single and two separate molecules, shows no homology between the two compo-nents. With the rapid evolution of in particular box C/D snoRNA, we cannot be certain, however, that a hypothetical duplication does not simply date back so far that the residual sequence similarity meanwhile has been erased. In contrast, SCARNA13, a fusion of two complete

both a box C/D domain (targeting U5 snoRNA, human position U41) and an H/ACA domain. Furthermore, the box C/D component of SCARNA6 guides meth-ylation of snRNA U4 (human position A65). Both scaRNAs are found in distinct introns of the human ATG16L1 mRNA.6 The mouse RNA MBI-46,25 is the homo-log of SCARNA6. In chicken, a 177 nt fragment GGN31 of the SCARNA5 homolog was reported in reference 26. The length of SCARNA5 ranges from 260 nt in Takifugu to 291 nt in Monodelphis, while SCARNA6 is slightly shorter, (225 nt Canis to 276 nt in Monodelphis). It is worth noting that SCARNA5 is fre-quently misannoated as SCARNA6 in the current release of ENSEMBL. An interesting peculiarity of both SCARNA5 and SCARNA6 is the absence of well-conserved target sequences. Possibly, the H/ACA domain, which—similar to SCARNA10—contains conserved CAB boxes in its hairpin loops, only mediates transport to the Cajal body.

Figure 5 shows that U87 and U88 snoRNAs can be traced throughout gnathostome evolution. Although remaining in association with the host gene ATG16L1 (with the possible exception of C. milii), both RNAs have been relocated to different introns several times during vertebrate evolution.

Discussion

Aberrant snoRNAs fall into two broad classes. Some exceptional structures are characterized by the incorporation of additional sequence and structure motifs whose function remains to be elucidated. SCARNA7, SCARNA9 and GGN68 with their low-complexity regions, and SCARNA16 with its additional hairpin-shaped insert, belong to this group. The second class arose through fusion of snoRNAs.

The four known cases of C/D-H/ACA hybrids, all of which are scaRNAs, share a common architecture consisting of an H/ACA component that is inserted into the loop of the C/D component. Two of these, SCARNA5 and SCARNA6, share the same host gene and are clearly paralogous. The other two examples, SCARNA10 and SCARNA12, come

Figure 5. Location of the homologs of SCARNA5 and SCARNA6 in the ATG16L1 gene. Homology of introns was established by sequence alignments. The scaRNAs jumped to different positions several times during vertebrate evolution. Exon numbers correspond to the human gene.


Note

Supplemental materials can be found at: www.landesbioscience.com/journals/rnabiology/article/16603

References

1. Samarsky DA, Fournier MJ, Singer RH, Bertrand E. The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization. EMBO J 1998; 17:3747-57.

2. Bachellerie JP, Cavaillé J, Hüttenhofer A. The expanding snoRNA world. Biochimie 2002; 84:775-90.

3. Ni J, Tien AL, Fournier MJ. Small nucleolar RNAs direct site-specific synthesis of pseudouridine in ribosomal RNA. Cell 1997; 89:565-73.

4. Richard P, Darzacq X, Bertrand E, Jády BE, Verheggen C, Kiss T. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs. EMBO J 2003; 22:4283-93.

5. Gall JG. The centennial of the Cajal body. Nature Reviews Mol Cell Biol 2003; 4:975-80.

6. Darzacq X, Jády BE, Verheggen C, Kiss AM, Bertrand E, Kiss T. Cajal body-specific small nuclear RNAs: a novel class of 2'-O-methylation and pseudouridylation guide RNAs. EMBO J 2002; 21:2746-56.

7. Reichow SL, Hamma T, Ferré-D'Amaré AR, Varani G. The structure and function of small nucleolar ribonucleoproteins. Nucleic Acids Res 2007; 35:1452-64.

8. Edvardsson S, Gardner PP, Poole AM, Hendy MD, Penny D, Moulton V. A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction. Bioinformatics 2002; 19:865-73.

9. Schattner P, Decatur WA, Davis CA, Ares M, Fournier MJ, Lowe TM. Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome. Nucleic Acids Res 2004; 32:4281-96.

10. Lowe TM, Eddy SR. A computational screen for methylation guide snoRNAs in yeast. Science 1999; 283:1168-71.

11. Fedorov A, Stombaugh J, Harr MW, Yu S, Nasalean L, Shepelev V. Computer identification of snoRNA genes using a Mammalian Orthologous Intron Database. Nucleic Acids Res 2005; 33:4578-83.

12. Yang JH, et al. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res 2006; 34:5112-23.

13. Hertel J, Hofacker IL, Stadler PF. snoReport: Computational identification of snoRNAs with unknown targets. Bioinformatics 2008; 24:158-64.

14. Bazeley PS, et al. snoTARGET shows that human orphan snoRNA targets locate close to alternative splice junctions. Gene 2008; 408:172-9.

15. Tafer H, Kehr S, Hertel J, Stadler PF. RNAsnoop: Efficient target prediction for box H/ACA snoRNAs. Bioinformatics 2010; 26:610-6.

16. Kehr S, Bartschat S, Stadler PF, Tafer H. PLEXY: Efficient target prediction for box C/D snoRNAs. Bioinformatics 2011; 27:279-80.

17. Marz M, Stadler PF. Comparative analysis of eukaryotic U3 snoRNAs. RNA Biol 2009; 6:503-7.

18. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res 2006; 24:158-62.

19. Henras AK, Dez C, Henry Y. RNA structure and function in C/D and H/ACA s(no)RNPs. Curr Op Struct Biol 2004; 14:335-43.

20. Tycowski KT, Shu MD, Steitz JA. Requirement for intron-encoded U22 small nucleolar RNA in 18S ribosomal RNA maturation. Science 1994; 266:1558-61.

results, refined structure-annotated alignments were manually constructed using the emacs editor in ralee mode50 and RNAalifold51 for structure prediction. Consensus structures were drawn using r2r.52 Genome sources, genome versions and chromosomal locations of snoRNAs are presented in detail in the Supplemental material as annotation information in Stockholm-formatted alignment files.
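For readers who want to work with such files, a minimal Stockholm reader might look like the Python sketch below. It is an illustration only, not the tooling used for this study: it handles just the sequence lines and the `#=GC SS_cons` consensus-structure annotation, ignores all other annotation lines, and the example alignment (`toy_family`, `seqA`, `seqB`) is invented.

```python
from typing import Dict, Tuple


def parse_stockholm(text: str) -> Tuple[Dict[str, str], str]:
    """Return (sequences, consensus structure) from one Stockholm alignment.

    Interleaved blocks are concatenated per sequence name.
    """
    seqs: Dict[str, str] = {}
    ss_cons = ""
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("# STOCKHOLM"):
            continue
        if line == "//":                      # end of this alignment
            break
        if line.startswith("#=GC SS_cons"):   # consensus secondary structure
            ss_cons += line.split()[2]
        elif line.startswith("#"):            # other annotation: ignored here
            continue
        else:                                 # "name  aligned_sequence"
            name, aligned = line.split()[:2]
            seqs[name] = seqs.get(name, "") + aligned
    return seqs, ss_cons


# Invented toy alignment, for illustration only.
example = """# STOCKHOLM 1.0
#=GF ID toy_family
seqA         GGGAAACCC
seqB         GGCAAAGCC
#=GC SS_cons <<<...>>>
//
"""

sequences, structure = parse_stockholm(example)
```

Per-file annotation such as genome version or chromosomal location would typically sit in `#=GF` or `#=GS` lines, which this sketch deliberately skips.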

Host gene identification. Host genes of known snoRNAs were identified in a first step with the ENSEMBL genome browser.53 Sequences of host genes or, where no host gene was annotated, of the adjacent protein-coding genes were downloaded. We aligned host genes with clustalw and verified possible homology of protein-coding genes with different names (this synteny information is indicated in the Stockholm files for the respective snoRNA family). In addition, we searched for the proteins in genomic sequences to determine the relationships between host genes. Furthermore, we used the results of the protein search step on local genome versions to identify putative locations of more divergent snoRNAs.

Distant homologies. In order to assess distant homologies between snoRNA families, we used infernal47 to align and score the members of one family against the covariance model of the other, after preliminary observations using cmcompare.54 Since snoRNAs of the same class by definition have similar secondary structure, we generated a randomized set of sequences using RNAinverse55 and compared the bit score distributions.
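One simple way to compare a family's bit scores with a randomized background is sketched below in Python. The summary statistics used here (an empirical p-value with a pseudocount and a z-score) are an illustrative choice and not necessarily those applied in this study; the bit scores are invented, and in practice they would come from scoring real and randomized sequences against a covariance model.

```python
from statistics import mean, stdev
from typing import Sequence


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Fraction of background scores >= score, with a +1 pseudocount."""
    exceed = sum(1 for b in background if b >= score)
    return (exceed + 1) / (len(background) + 1)


def z_score(score: float, background: Sequence[float]) -> float:
    """Distance of score from the background mean in standard deviations."""
    return (score - mean(background)) / stdev(background)


# Invented bit scores, for illustration only.
background_scores = [-5.0, -3.0, -4.0, -6.0, -2.0, -4.0, -5.0, -3.0, -4.0]
family_scores = [3.5, 1.0, -4.5]

p_values = [empirical_p_value(s, background_scores) for s in family_scores]
z_values = [z_score(s, background_scores) for s in family_scores]
```

With a background of only nine scores the smallest attainable p-value is 0.1, which illustrates why a large randomized set is needed before faint similarities can be called significant.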

Acknowledgments

The research reported here is the outcome of a graduate course on RNA bioinformatics taught at the University of Vienna in 2009. We thank Stefanie Wehner for help with preparing Table 2 and carefully reading the manuscript. This work has been funded, in part, by the Austrian FWF, project “SFB F43 RNA regulation of the transcriptome” (CHzS), the Austrian GEN-AU projects “bioinformatics integration network III” and “regulatory ncRNAs” (ARG, CHzS, RL), the Deutsche Forschungsgemeinschaft, the DFG project MA-5082/1 and the European Union FP-7 project QUANTOMICS (PFS).

evidence. In fact, unrelated snoRNAs have the same targets in the closely related nematodes C. elegans and C. briggsae.44 At the same time, target specificity may change and recombine even within vertebrate evolution.45 Here, we have explored a simple approach based on distributions of infernal bit scores for quantifying faint traces of ancient homologies. Our results, which establish links between some of the structurally aberrant snoRNA families, show that this is a promising approach to snoRNA homologies across phyla or even at the kingdom level.

Last but not least, we provide expanded and manually curated alignments, together with detailed drawings of the consensus structure, for all 13 snoRNA families covered in this contribution, two of which have not been studied at all so far. These provide a useful resource both for genome annotation and for further studies into snoRNA evolution.

Materials and Methods

Homology search. For each snoRNA family we retrieved the known sequences from snoRNA-LBME-db,18 Rfam (v.9.1 and v.10.0, seed sequences) and the novel chicken ncRNA GGN68 from reference 26. We first conducted iterative Blast46 searches in 105 downloadable animal genomes to retrieve initial candidates. In addition, we searched NCBI Genbank to record any direct evidence for individual sequences. After generating alignments and predicting consensus structures (see below), we used infernal (v.1.0)47 to construct and calibrate covariance models, which were then used in a combined sequence/structure search of those genomes for which the purely sequence-based approaches had remained unsuccessful. Candidates were added to the alignments and evaluated. This step was repeated until no new candidates were found. Candidate sequences were evaluated based on their infernal bit score and by manual inspection of their alignments with the already known homologs.
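The repeat-until-no-new-candidates logic of this procedure can be sketched as a simple fixed-point loop. The Python sketch below is illustrative only: `search_fn` is a hypothetical stand-in for one complete round of searching (Blast or infernal) plus candidate evaluation and is not part of either tool, and `max_rounds` is an assumed safety limit.

```python
from typing import Callable, Dict, Iterable, Set


def iterate_homology_search(
    seeds: Iterable[str],
    search_fn: Callable[[Set[str]], Set[str]],
    max_rounds: int = 50,
) -> Set[str]:
    """Repeat a search round until no new candidates are found.

    `search_fn` (hypothetical) takes the currently accepted sequence
    identifiers and returns the candidates accepted in this round.
    """
    accepted = set(seeds)
    for _ in range(max_rounds):
        new_hits = search_fn(accepted) - accepted
        if not new_hits:          # converged: nothing new was found
            break
        accepted |= new_hits      # add candidates to the alignment set
    return accepted


# Toy data: each homolog is only detectable from one specific known sequence.
toy_links: Dict[str, Set[str]] = {
    "human": {"mouse"},
    "mouse": {"chicken"},
    "chicken": set(),
}


def toy_search(known: Set[str]) -> Set[str]:
    found: Set[str] = set()
    for name in known:
        found |= toy_links.get(name, set())
    return found


result = iterate_homology_search({"human"}, toy_search)
```

In the toy data the chicken sequence is only reachable after the mouse sequence has been added, which mirrors why enlarging the alignment between rounds can uncover more divergent homologs.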

Alignments and consensus structures. Alignments were created with clustalw,48 cmalign,47 and locarna.49 Starting from the set of snoRNA-LBME-db entries, Rfam-alignments and our computational


21. Frey MR, Wu W, Dunn JM, Matera AG. The U22 host gene (UHG): chromosomal localization of UHG and distribution of U22 small nucleolar RNA. Histochem Cell Biol 1997; 108:365-70.

22. Tycowski KT, Shu MD, Steitz JA. A mammalian gene with introns instead of exons generating stable RNA products. Nature 1996; 379:464-6.

23. Zhang Y, et al. Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs. BMC Genomics 2010; 11:61.

24. Schmitz J, et al. Retroposed SNOfall - a mammalian-wide comparison of platypus snoRNAs. Genome Res 2008; 18:1005-10.

25. Hüttenhofer A, et al. RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J 2001; 20:2943-53.

26. Zhang Y, et al. Systematic identification and characterization of chicken (Gallus gallus) ncRNAs. Nucleic Acids Res 2009; 37:6562-74.

27. Tycowski KT, You ZH, Graham PJ, Steitz JA. Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol Cell 1998; 2:629-38.

28. Stadler PF, et al. Evolution of vault RNAs. Mol Biol Evol 2009; 26:1975-91.

29. Marz M, Kirsten T, Stadler PF. Evolution of spliceosomal snRNA genes in metazoan animals. J Mol Evol 2008; 67:594-607.

30. Kiss AM, Jády BE, Bertrand E, Kiss T. Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 2004; 24:5797-807.

31. Taft RJ, Glazov EA, Lassmann T, Hayashizaki Y, Carninci P, Mattick JS. Small RNAs derived from snoRNAs. RNA 2009; 15:1233-40.

32. Langenberger D, Bermudez-Santana C, Stadler PF, Hoffmann S. Identification and classification of small RNAs in transcriptome sequence data. Pac Symp Biocomput 2010; 15:80-7.

33. Friedländer MR, et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 2008; 26:407-15.

34. Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008; 18:957-64.

35. Glazov EA, Kongsuwan K, Assavalapsakul W, Horwood PF, Mitter N, Mahony TJ. Repertoire of bovine miRNA and miRNA-like small regulatory RNAs expressed upon viral infection. PLoS ONE 2009; 4:6349.

36. Murchison EP, et al. Conservation of small RNA pathways in platypus. Genome Res 2008; 18:995-1004.

37. Tycowski KT, Aab A, Steitz JA. Guide RNAs with 5' caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr Biol 2004; 14:1985-95.

38. Kiss AM, Jády BE, Darzacq X, Verheggen C, Bertrand E, Kiss T. A Cajal body-specific pseudouridylation guide RNA is composed of two box H/ACA snoRNA-like domains. Nucleic Acids Res 2002; 30:4643-9.

39. Schattner P, Barberan-Soler S, Lowe TM. A computational screen for mammalian pseudouridylation guide H/ACA RNAs. Bioinformatics 2006; 12:15-25.

40. Jády BE, Kiss T. A small nucleolar guide RNA functions both in 2'-O-methylation and pseudouridylation of U5 spliceosomal RNA. EMBO J 2001; 20:541-51.

41. Huang ZP, Chen CJ, Zhou H, Li BB, Qu LH. A combined computational and experimental analysis of two families of snoRNA genes from Caenorhabditis elegans, revealing the expression and evolution pattern of snoRNAs in nematodes. Genomics 2007; 89:490-501.

42. Deng W, et al. Organisation of the Caenorhabditis elegans small noncoding transcriptome: genomic features, biogenesis and expression. Genome Res 2006; 16:30-6.

43. Chen CL, Chen CJ, Vallon O, Huang ZP, Zhou H, Qu LH. Genomewide analysis of Box C/D and box H/ACA snoRNAs in Chlamydomonas reinhardtii reveals an extensive organization into intronic gene clusters. Genetics 2008; 179:21-30.

44. Zemann A, op de Bekke A, Kiefmann M, Brosius J, Schmitz J. Evolution of small nucleolar RNAs in nematodes. Nucleic Acids Res 2006; 34:2676-85.

45. Shao P, Yang JH, Zhou H, Guan DG, Qu LH. Genome-wide analysis of chicken snoRNAs provides unique implications for the evolution of vertebrate snoRNAs. BMC Genomics 2009; 10:86.

46. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res 1997; 24:3389-402.

47. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: Inference of RNA alignments. Bioinformatics 2009; 25:1335-7.

48. Thompson JD, Higgins DG, Gibson TJ. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res 1994; 22:4673-80.

49. Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R. Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Comp Biol 2007; 3:65.

50. Griffiths-Jones S. RALEE - RNA Alignment editor in Emacs. Bioinformatics 2005; 21:257-9.

51. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF. RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics 2008; 9:474.

52. Weinberg Z, Breaker RR. R2R—software to speed the depiction of aesthetic consensus RNA secondary structures. BMC Bioinformatics 2011; 12:3.

53. Spudich GM, Fernández-Suárez XM. Touring Ensembl: a practical guide to genome browsing. BMC Genomics 2010; 11:295.

54. Höner zu Siederdissen C, Hofacker IL. Discriminatory power of RNA family models. Bioinformatics 2010; 26:453-9.

55. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh Chem 1994; 125:167-88.