19
Computational approaches to 3D modeling of RNA This article has been downloaded from IOPscience. Please scroll down to see the full text article. 2010 J. Phys.: Condens. Matter 22 283101 (http://iopscience.iop.org/0953-8984/22/28/283101) Download details: IP Address: 128.122.250.23 The article was downloaded on 16/12/2011 at 21:49 Please note that terms and conditions apply. View the table of contents for this issue, or go to the journal homepage for more Home Search Collections Journals About Contact us My IOPscience

Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

Computational approaches to 3D modeling of RNA

This article has been downloaded from IOPscience Please scroll down to see the full text article

2010 J Phys Condens Matter 22 283101

(httpiopscienceioporg0953-89842228283101)

Download details

IP Address 12812225023

The article was downloaded on 16122011 at 2149

Please note that terms and conditions apply

View the table of contents for this issue or go to the journal homepage for more

Home Search Collections Journals About Contact us My IOPscience

IOP PUBLISHING JOURNAL OF PHYSICS CONDENSED MATTER

J Phys Condens Matter 22 (2010) 283101 (18pp) doi1010880953-89842228283101

TOPICAL REVIEW

Computational approaches to 3Dmodeling of RNAChristian Laing and Tamar Schlick1

Department of Chemistry and Courant Institute of Mathematical SciencesNew York University 251 Mercer Street New York NY 10012 USA

E-mail schlicknyuedu

Received 26 February 2010 in final form 1 April 2010Published 15 June 2010Online at stacksioporgJPhysCM22283101

AbstractMany exciting discoveries have recently revealed the versatility of RNA and its importance in avariety of functions within the cell Since the structural features of RNA are of majorimportance to their biological function there is much interest in predicting RNA structureeither in free form or in interaction with various ligands including proteins metabolites andother molecules In recent years an increasing number of researchers have developed novelRNA algorithms for predicting RNA secondary and tertiary structures In this review wedescribe current experimental and computational advances and discuss recent ideas that aretransforming the traditional view of RNA folding To evaluate the performance of the mostrecent RNA 3D folding algorithms we provide a comparative study in order to test theperformance of available 3D structure prediction algorithms for an RNA data set of 43structures of various lengths and motifs We find that the algorithms vary widely in terms ofprediction quality across different RNA lengths and topologies most predictions have verylarge root mean square deviations from the experimental structure We conclude by outliningsome suggestions for future RNA folding research

S Online supplementary data available from stacksioporgJPhysCM22283101mmedia

(Some figures in this article are in colour only in the electronic version)

Contents

1 Introduction 12 RNA structure 23 RNA folding proteins dynamics pathways and

pausing 34 RNA structure determination 4

41 Experimental techniques 442 Current advances in RNA structure data 5

5 RNA secondary-structure prediction approaches 551 Secondary-structure prediction using a single

sequence 552 Multiple-sequence alignment 753 Advances in and limitations to RNA secondary-

structure prediction 7

1 Author to whom any correspondence should be addressed

6 RNA tertiary-structure prediction programs 87 Comparison of automated RNA 3D folding algorithms 9

71 Methods 972 Tertiary-structure prediction results 11

8 Discussion 15Acknowledgments 15References 15

1 Introduction

It is now widely recognized that RNA is a fundamentalbiological macromolecule with many biological functions atall stages of cellular life Besides the well-accepted functionalproperties of messenger RNA transfer RNA and ribosomalRNA many new non-coding RNAs are now known to performcatalytic regulatory roles that are essential to an organismrsquos

0953-898410283101+18$3000 copy 2010 IOP Publishing Ltd Printed in the UK amp the USA1

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 1 List of applications of natural and artificial RNAs in medicine and nanodesign

RNA type Definition Application References

Medicine

miRNA (microRNA) Short RNA sequences that can regulategenes in the post-transcription process

miRNA can suppress a tumor or act as anoncogene

[5ndash11]

RNAi (RNA interference) Gene silencing mechanism RNAi can be used as a tool for probing genefunction and rational drug design

[14 15]

Ribozyme RNA molecules with catalytic properties Ribozymes are being explored as biosensors andpotential therapeutics against inflammatorydisorders

[19 20]

sRNA (small RNA) Small functional RNAs that are nottranslated into proteins

Virulence genes are induced or repressed throughsRNA regulators

[21]

Nanodesign

TectoRNA square Molecular nanodesign of RNA squares Artificial RNA building blocks can be used asjigsaw puzzle units for building larger pieces ofdiverse geometry and complexity

[12 13]

TokenRNA aptamerbiosensor

A sequence-specific label-freefluorescent biosensor

Fluorescent signal results in the presence of itstarget and magnesium

[22]

Nanoring and nanotube Molecular design of hexagonal ring andtube units

The helical sequences of the building blocks caninclude siRNAs for drug delivery

[23]

pRNAsiRNA nanoparticle Design of chimeric phi29 packagingRNA (pRNA)siRNA nanoparticles

pRNAsiRNA particles target multiple tumor cellsby siRNA delivery

[24 25]

survival and evolution Small interference RNAs (RNAis) havea remarkable role in gene silencing [1] transfer-messengerRNAs (tmRNAs) direct the addition of tags to peptideson stalled ribosomes thereby affecting protein stability andtransport [2] other small non-coding RNAs (ncRNAs) regulatemessenger RNA stability and translation by base pairing atvarious positions with their target messenger RNAs [3 4]and recent findings indicate that microRNAs (miRNAs) canbe associated with tumorigenesis by acting either as tumorsuppressors [5ndash7] or oncogenes [8ndash11] This astonishingversatility of RNA has also been exploited for nanodesignfor biomedical and technological applications [12 13] Forexample the mechanism of RNA interference (RNAi) tosilence genes in a sequence specific manner is currentlybeing exploited as a tool to design drugs and for antiviraltherapy [14 15] More interesting examples of RNAapplications are described in table 1 Clearly more discoveriesare yet to come given the many novel non-protein-codingtranscripts identified in the human genome [16] Many of theseRNAs have yet unknown functions and new regulatory rolescontinue to emerge [17 18]

The structural features of RNAs are of major importanceto their biological functions because sequence alone does notprovide sufficient functional information Thus one of thegoals in RNA structural biology is to provide insights intohow structure and dynamics lead to specific functions of RNAeither in free form or in interaction with various molecules inthe full cellular milieu

Our focus in this article is on reviewing recent advancesin RNA structure determination and assessing current 3Dprediction methods Section 2 reviews our basic understandingof RNA structure Section 3 discusses new discoveries inRNA dynamics that are important for understanding RNAfolding Section 4 describes current advances and limitationsin experimental techniques used to determine both secondary

and tertiary structures Section 5 summarizes the recentapproaches and trends in the structure determination of RNAsecondary structure Section 6 reviews computational methodsfor RNA 3D structure prediction and section 7 evaluates someof the recent algorithms We conclude with a perspectivecomparing performance of 3D structure prediction for a singlerepresentative RNA data set

Developments in RNA structure prediction studies havebeen previously extensively reviewed In RNA secondary-structure prediction advances Gardner and Griegerich [26]compared secondary-structure prediction methods usingmultiple-sequence alignment while Mathews and Turner [27]presented progress in free energy minimization algorithmsusing dynamic programming Shapiro et al [28] provided acomprehensive review on RNA secondary-structure predictionwith a focus on pseudoknots and RNA 3D structure advanceswith a focus on manual methods

A more recent review by Capriotti and Marti-Renom [29]compiled current computational databases algorithms andcomputer programs that are available to the communityfor purposes ranging from sequence analysis to structureprediction and comparison Schroeder [30] recently reviewedprediction progress on viral RNAs Here we focus on exploringnew ideas that could improve RNA prediction and suggestfurther improvements by comparing the capabilities of current3D predictions programs

2 RNA structure

Understanding RNA structure and function relies on ourability to identify RNArsquos major structural components RNAmolecules can be studied extensively at the secondary-structure level where building blocks include helical stems andsingle-stranded regions such as hairpins internal loops andjunctions (figure 1(a)) Stems are formed by complementary

2

J Phys Condens Matter 22 (2010) 283101 Topical Review

c)

AA

C

G

5

3

53

internal loop

hairpin

junction

stem

a) b)

H1

H2

H3H4

H5

H6

Figure 1 (a) RNA secondary structure is composed of stems and loop regions Stems are formed by canonical GC AU and GU wobble WCbase pairs Loop regions are formed by hairpins internal loops and junctions (b) Loop regions can form long-range interactions as inpseudoknots or by using non-canonical base pairs (c) Non-canonical base pairs A486 (blue on the left) forms a trans WatsonndashWatson basepair with A511 (purple on the right) a trans HoogsteenndashSugar base pair with G482 (red on top) and a cis SugarndashSugar base pair with C505(green at the bottom) The base pair interactions occur in the H marismortui 23S rRNA structure (PDB 1S72)

canonical Watson and Crick base pairs GC and AU alongwith the GU wobble base pair A hairpin is a single-strandedregion that folds back on itself via regions of complementarybase pairs The single-stranded region between two stems isknown as an internal loop while a junction can be definedas the point of connection between three or more helicalstems Single-stranded regions can form pseudoknots whenbase pairs intertwine (figure 1(b)) These basic secondary-structure elements are pieced together via tertiary interactionsto form compact structures of active RNAs

Structural and comparative studies have suggested thatRNA structure is largely composed of repetitive modularbuilding blocks or motifs In particular RNA tertiary motifs areconserved structural patterns formed by pairwise interactionsbetween nucleotides These include base pairing basestacking and basendashphosphate interactions [31] RNAs possessremarkable interaction attributes where base pair interactionscan be classified into twelve geometric families in termsof pairs of interacting edges which can be Watson andCrick (WC) Hoogsteen (H) Sugar (S) and glycosidic bondorientations cis and trans [32] (see figure 1(c)) Furthermorebasendashphosphate interactions are also common and a recentclassification model has been proposed [33]

Graph theory methods have also been developedto represent RNA secondary structures since the 1970swith the pioneering work by Waterman [34] extendedby others [35ndash37] Recent structural or topologicalrepresentations using graph representation (trees and dualgraphs) of RNA secondary structures like RAG [38] constituteinnovative methods from an allied field that can help RNAscience [39]

3 RNA folding proteins dynamics pathways andpausing

Over the past decade many experiments have unveiled newclues into the elegant complexity and dynamics of RNAfolding Besides forming intricate long-range RNAndashRNAinteractions to direct and stabilize the structure the folding

is strongly affected by the speed of elongation and site-specific pausing of the RNA polymerase as well as interactionsof the RNA molecule with proteins solvent and smallmetabolites [40ndash42]

Our current understanding of how RNA folds is that it doesso by a hierarchical process helical elements in the secondarystructure are formed and then compact structures are formedusing tertiary pairwise interactions and tertiary motifs [43]However further studies are suggesting that RNA folding isactually quasi-hierarchical [44ndash46] where rearrangements ofsecondary-structure elements caused by tertiary interactionscan in turn trigger stable native folding

The RNA folding pathway does not always proceeduniquely Indeed the free energy landscape of RNA foldingis highly rugged composed of different and even paralleltrajectories where RNA chains can be easily kinetically trappedinto intermediate metastable states If the RNA falls into anintermediate structure state folding can take longer becausebreaking the non-native base pairs is required to reach its nativestate [47] Fortunately long-range tertiary contacts alongwith solvent (eg water ions) interactions can guide RNAto the correct folding [48] These tertiary interactions workcooperatively to direct folding to the native state [49]

It should be noted however that for some RNAsintermediate or alternative states are functional as forriboswitches (mentioned later) [50] and RNA regions inviruses [51ndash54] For instance there are structural domainsin the hepatitis D (HDV) virus where both branched andunbranched conformations are formed for distinct functionalroles [52] In the turnip crinkle virus (TCV) differentstructural configurations control the processes of virustranslation and replication [53]

In contrast to in vitro folding where unfolded RNAchains fold in the presence of magnesium ions (Mg2+) inthe cellular context nascent RNA molecules start to foldbefore transcription is complete Thus achieving the correctstructure can depend on the speed of transcription Thispresents a problem for helices composed of long-range strandssuch as helix H1 shown in figure 1(a) This is because

3

J Phys Condens Matter 22 (2010) 283101 Topical Review

during transcription the 5prime-strand of H1 is transcribed firstthus it needs to wait for the 3prime-strand of H1 to be fullytranscribed During this time alternative base pairs canform at the 5prime-strand To help minimize competition fromalternative folds RNA polymerase the molecule in chargeof RNA transcription pauses to allow the formation of non-native temporary conformations that easily unfold when the3prime-strand becomes available Such pausing prevents formationof super-stable non-native structures and thus constitutes ageneral strategy for facilitating the folding of long-rangehelices [55 56]

In addition to tertiary interactions solvent molecules suchas water and cations (positive ions such as Mg2+ and K+)also help initiate and guide RNA folding into the nativestructure Water molecules mediate coupled molecular motionsthroughout a folded RNA core and non-canonical bifurcatedbase pairs are formed only in the presence of water [32 57] Inaddition the high negative charge of an RNA molecule worksagainst its folding into its native structure Cations promotefolding by reducing the repulsion between RNA phosphatesThus besides helping to stabilize tertiary interactions duringfolding ionndashRNA interactions influence stability pathwaydiversity and transition states It has been emphasized [42 58]that ions of at least two types modulate the electrostatic surfacepotential of the negatively charged RNA molecule chelatedions which are held in direct contact with the RNA surfaceby electrostatic forces and diffuse ions which accumulate nearthe RNA due to the RNA electrostatic field and remain largelyhydrated

Most RNAs within the cell are parts of RNAndashproteincomplexes Their binding can stabilize the RNA structureinduce conformational changes and even act as RNAchaperones to guide folding Large RNAs such as group Iintron RNase P and the ribosomal RNA (rRNA) structureare often stabilized by RNA-binding proteins For instancea recent model for rRNA folding suggested that the rRNAtertiary structure is dynamic (flexible) in the absence ofproteins and that alternative structural conformations cancompete with each other Thus ribosomal proteins may notonly stabilize rRNA tertiary interactions but might also changethe path of assembly avoiding RNA misfoldings [59] Proteinsalso bind at different stages in the folding process and ahierarchical protein binding is now recognized where primarybinding proteins interact at an early stage thus marking theirimportance in the assembly process Other proteins bind at theend and are related more to functional roles

Though RNA folding is a complex process is sensitive tothe environment and possesses a network of folding transitionsand pathways many RNAs share a common organization oftheir helical elements Indeed analyses on current solvedRNA 3D structures have shown that the majority of helicalelements in junctions tend to arrange roughly in paralleland perpendicular configurations These arrangements arestabilized by both RNAndashRNA interactions as well as RNAndashprotein interactions [60ndash62]

Finally Nature exploits the potential of RNA sequences toform multiple alternative metastable structures for implement-ing highly sensitive molecule switches capable of controlling

gene expression at the level of the mRNA RNA riboswitchesare RNAs found within some messenger RNA (mRNA) thatcan change conformation upon binding small metabolites andthus terminate transcription or block translation The ter-tiary structure of the riboswitch binding pocket is stabilizedonly upon ligand binding As mentioned earlier RNA virusesmake use of transitions between metastable structural domainsfor different functions Thus riboswitches and structural do-mains within viruses exemplify the dynamic properties of RNAmolecules

4 RNA structure determination

Since the lsquoEra of RNA awakeningrsquo began [63] theerroneous perception of RNA as a simple molecule hasdramatically changed due in part to advances in RNA structuredetermination techniques In the early 1970s the first fullRNA structure the transfer RNA (tRNA) was obtained bycrystallization techniques [64] providing many clues to RNArsquoshelical organization A decade later the discovery of catalyticRNAs revolutionized our understanding of RNArsquos complexcellular roles Indeed the ligand-dependent conformationsof riboswitches have shown that even small RNAs arestructurally and functionally complex [50] Similarly theinitial identification of the P4ndashP6 fragment of the group Iintron and later the larger domain P1ndashP9 and the group IIintron have unraveled the intricacies of long-range RNAndashRNAinteraction motifs such as the tetraloop receptor ribose zipperA-minor and other intriguing motifs An important milestonewas reached with high resolution structures of three ribosomalRNA (rRNA) subunits including the 23S rRNA subunit whichis the largest RNA structure solved thus far [65ndash67] thesestructures revealed arrays of RNAndashRNA and RNAndashproteininteractions which in turn can form higher order interactionpatterns [49] These pioneering and far-reaching works onthe ribosome structure by A E Yonath V RamakrishnanT A Steitz and others were recognized in the 2009 Nobel Prizein Chemistry These remarkable breakthroughs however alsodemand atomic-level structural and dynamic information forunderstanding RNA function

41 Experimental techniques

Structural information has been determined from RNAmolecules using numerous experimental strategies Many ofthese experimental approaches have been complemented bycomputational approaches resulting in important improve-ments on both the computational and experimental fronts Be-low we describe some of the experimental methods for RNAstructure determination For a more detailed review see [68]

Fluorescence resonance energy transfer (FRET) is amethod often used to analyze the global structure and evendynamics of RNA elements such as junctions [69 70] Thestrength of FRET is that it allows detecting when two elementsin the structure are in close proximity (10ndash100 A) thus helpingto determine RNArsquos global helical organization

The 2prime-hydroxyl acylation RNA (SHAPE) chemistryapproach is an important probing technique recently developedby the Weeks group [71 72] The protocol exploits the

4

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 2: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

IOP PUBLISHING JOURNAL OF PHYSICS CONDENSED MATTER

J Phys Condens Matter 22 (2010) 283101 (18pp) doi1010880953-89842228283101

TOPICAL REVIEW

Computational approaches to 3Dmodeling of RNAChristian Laing and Tamar Schlick1

Department of Chemistry and Courant Institute of Mathematical SciencesNew York University 251 Mercer Street New York NY 10012 USA

E-mail schlicknyuedu

Received 26 February 2010 in final form 1 April 2010Published 15 June 2010Online at stacksioporgJPhysCM22283101

AbstractMany exciting discoveries have recently revealed the versatility of RNA and its importance in avariety of functions within the cell Since the structural features of RNA are of majorimportance to their biological function there is much interest in predicting RNA structureeither in free form or in interaction with various ligands including proteins metabolites andother molecules In recent years an increasing number of researchers have developed novelRNA algorithms for predicting RNA secondary and tertiary structures In this review wedescribe current experimental and computational advances and discuss recent ideas that aretransforming the traditional view of RNA folding To evaluate the performance of the mostrecent RNA 3D folding algorithms we provide a comparative study in order to test theperformance of available 3D structure prediction algorithms for an RNA data set of 43structures of various lengths and motifs We find that the algorithms vary widely in terms ofprediction quality across different RNA lengths and topologies most predictions have verylarge root mean square deviations from the experimental structure We conclude by outliningsome suggestions for future RNA folding research

S Online supplementary data available from stacksioporgJPhysCM22283101mmedia

(Some figures in this article are in colour only in the electronic version)

Contents

1 Introduction 12 RNA structure 23 RNA folding proteins dynamics pathways and

pausing 34 RNA structure determination 4

41 Experimental techniques 442 Current advances in RNA structure data 5

5 RNA secondary-structure prediction approaches 551 Secondary-structure prediction using a single

sequence 552 Multiple-sequence alignment 753 Advances in and limitations to RNA secondary-

structure prediction 7

1 Author to whom any correspondence should be addressed

6 RNA tertiary-structure prediction programs 87 Comparison of automated RNA 3D folding algorithms 9

71 Methods 972 Tertiary-structure prediction results 11

8 Discussion 15Acknowledgments 15References 15

1 Introduction

It is now widely recognized that RNA is a fundamentalbiological macromolecule with many biological functions atall stages of cellular life Besides the well-accepted functionalproperties of messenger RNA transfer RNA and ribosomalRNA many new non-coding RNAs are now known to performcatalytic regulatory roles that are essential to an organismrsquos

0953-898410283101+18$3000 copy 2010 IOP Publishing Ltd Printed in the UK amp the USA1

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 1 List of applications of natural and artificial RNAs in medicine and nanodesign

RNA type Definition Application References

Medicine

miRNA (microRNA) Short RNA sequences that can regulategenes in the post-transcription process

miRNA can suppress a tumor or act as anoncogene

[5ndash11]

RNAi (RNA interference) Gene silencing mechanism RNAi can be used as a tool for probing genefunction and rational drug design

[14 15]

Ribozyme RNA molecules with catalytic properties Ribozymes are being explored as biosensors andpotential therapeutics against inflammatorydisorders

[19 20]

sRNA (small RNA) Small functional RNAs that are nottranslated into proteins

Virulence genes are induced or repressed throughsRNA regulators

[21]

Nanodesign

TectoRNA square Molecular nanodesign of RNA squares Artificial RNA building blocks can be used asjigsaw puzzle units for building larger pieces ofdiverse geometry and complexity

[12 13]

TokenRNA aptamerbiosensor

A sequence-specific label-freefluorescent biosensor

Fluorescent signal results in the presence of itstarget and magnesium

[22]

Nanoring and nanotube Molecular design of hexagonal ring andtube units

The helical sequences of the building blocks caninclude siRNAs for drug delivery

[23]

pRNAsiRNA nanoparticle Design of chimeric phi29 packagingRNA (pRNA)siRNA nanoparticles

pRNAsiRNA particles target multiple tumor cellsby siRNA delivery

[24 25]

survival and evolution Small interference RNAs (RNAis) havea remarkable role in gene silencing [1] transfer-messengerRNAs (tmRNAs) direct the addition of tags to peptideson stalled ribosomes thereby affecting protein stability andtransport [2] other small non-coding RNAs (ncRNAs) regulatemessenger RNA stability and translation by base pairing atvarious positions with their target messenger RNAs [3 4]and recent findings indicate that microRNAs (miRNAs) canbe associated with tumorigenesis by acting either as tumorsuppressors [5ndash7] or oncogenes [8ndash11] This astonishingversatility of RNA has also been exploited for nanodesignfor biomedical and technological applications [12 13] Forexample the mechanism of RNA interference (RNAi) tosilence genes in a sequence specific manner is currentlybeing exploited as a tool to design drugs and for antiviraltherapy [14 15] More interesting examples of RNAapplications are described in table 1 Clearly more discoveriesare yet to come given the many novel non-protein-codingtranscripts identified in the human genome [16] Many of theseRNAs have yet unknown functions and new regulatory rolescontinue to emerge [17 18]

The structural features of RNAs are of major importanceto their biological functions because sequence alone does notprovide sufficient functional information Thus one of thegoals in RNA structural biology is to provide insights intohow structure and dynamics lead to specific functions of RNAeither in free form or in interaction with various molecules inthe full cellular milieu

Our focus in this article is on reviewing recent advancesin RNA structure determination and assessing current 3Dprediction methods Section 2 reviews our basic understandingof RNA structure Section 3 discusses new discoveries inRNA dynamics that are important for understanding RNAfolding Section 4 describes current advances and limitationsin experimental techniques used to determine both secondary

and tertiary structures Section 5 summarizes the recentapproaches and trends in the structure determination of RNAsecondary structure Section 6 reviews computational methodsfor RNA 3D structure prediction and section 7 evaluates someof the recent algorithms We conclude with a perspectivecomparing performance of 3D structure prediction for a singlerepresentative RNA data set

Developments in RNA structure prediction studies havebeen previously extensively reviewed In RNA secondary-structure prediction advances Gardner and Griegerich [26]compared secondary-structure prediction methods usingmultiple-sequence alignment while Mathews and Turner [27]presented progress in free energy minimization algorithmsusing dynamic programming Shapiro et al [28] provided acomprehensive review on RNA secondary-structure predictionwith a focus on pseudoknots and RNA 3D structure advanceswith a focus on manual methods

A more recent review by Capriotti and Marti-Renom [29]compiled current computational databases algorithms andcomputer programs that are available to the communityfor purposes ranging from sequence analysis to structureprediction and comparison Schroeder [30] recently reviewedprediction progress on viral RNAs Here we focus on exploringnew ideas that could improve RNA prediction and suggestfurther improvements by comparing the capabilities of current3D predictions programs

2 RNA structure

Understanding RNA structure and function relies on ourability to identify RNArsquos major structural components RNAmolecules can be studied extensively at the secondary-structure level where building blocks include helical stems andsingle-stranded regions such as hairpins internal loops andjunctions (figure 1(a)) Stems are formed by complementary

2

J Phys Condens Matter 22 (2010) 283101 Topical Review

c)

AA

C

G

5

3

53

internal loop

hairpin

junction

stem

a) b)

H1

H2

H3H4

H5

H6

Figure 1 (a) RNA secondary structure is composed of stems and loop regions Stems are formed by canonical GC AU and GU wobble WCbase pairs Loop regions are formed by hairpins internal loops and junctions (b) Loop regions can form long-range interactions as inpseudoknots or by using non-canonical base pairs (c) Non-canonical base pairs A486 (blue on the left) forms a trans WatsonndashWatson basepair with A511 (purple on the right) a trans HoogsteenndashSugar base pair with G482 (red on top) and a cis SugarndashSugar base pair with C505(green at the bottom) The base pair interactions occur in the H marismortui 23S rRNA structure (PDB 1S72)

canonical Watson and Crick base pairs GC and AU alongwith the GU wobble base pair A hairpin is a single-strandedregion that folds back on itself via regions of complementarybase pairs The single-stranded region between two stems isknown as an internal loop while a junction can be definedas the point of connection between three or more helicalstems Single-stranded regions can form pseudoknots whenbase pairs intertwine (figure 1(b)) These basic secondary-structure elements are pieced together via tertiary interactionsto form compact structures of active RNAs

Structural and comparative studies have suggested thatRNA structure is largely composed of repetitive modularbuilding blocks or motifs In particular RNA tertiary motifs areconserved structural patterns formed by pairwise interactionsbetween nucleotides These include base pairing basestacking and basendashphosphate interactions [31] RNAs possessremarkable interaction attributes where base pair interactionscan be classified into twelve geometric families in termsof pairs of interacting edges which can be Watson andCrick (WC) Hoogsteen (H) Sugar (S) and glycosidic bondorientations cis and trans [32] (see figure 1(c)) Furthermorebasendashphosphate interactions are also common and a recentclassification model has been proposed [33]

Graph theory methods have also been developedto represent RNA secondary structures since the 1970swith the pioneering work by Waterman [34] extendedby others [35ndash37] Recent structural or topologicalrepresentations using graph representation (trees and dualgraphs) of RNA secondary structures like RAG [38] constituteinnovative methods from an allied field that can help RNAscience [39]

3 RNA folding proteins dynamics pathways andpausing

Over the past decade many experiments have unveiled newclues into the elegant complexity and dynamics of RNAfolding Besides forming intricate long-range RNAndashRNAinteractions to direct and stabilize the structure the folding

is strongly affected by the speed of elongation and site-specific pausing of the RNA polymerase as well as interactionsof the RNA molecule with proteins solvent and smallmetabolites [40ndash42]

Our current understanding of how RNA folds is that it doesso by a hierarchical process helical elements in the secondarystructure are formed and then compact structures are formedusing tertiary pairwise interactions and tertiary motifs [43]However further studies are suggesting that RNA folding isactually quasi-hierarchical [44ndash46] where rearrangements ofsecondary-structure elements caused by tertiary interactionscan in turn trigger stable native folding

The RNA folding pathway does not always proceeduniquely Indeed the free energy landscape of RNA foldingis highly rugged composed of different and even paralleltrajectories where RNA chains can be easily kinetically trappedinto intermediate metastable states If the RNA falls into anintermediate structure state folding can take longer becausebreaking the non-native base pairs is required to reach its nativestate [47] Fortunately long-range tertiary contacts alongwith solvent (eg water ions) interactions can guide RNAto the correct folding [48] These tertiary interactions workcooperatively to direct folding to the native state [49]

It should be noted however that for some RNAsintermediate or alternative states are functional as forriboswitches (mentioned later) [50] and RNA regions inviruses [51ndash54] For instance there are structural domainsin the hepatitis D (HDV) virus where both branched andunbranched conformations are formed for distinct functionalroles [52] In the turnip crinkle virus (TCV) differentstructural configurations control the processes of virustranslation and replication [53]

In contrast to in vitro folding where unfolded RNAchains fold in the presence of magnesium ions (Mg2+) inthe cellular context nascent RNA molecules start to foldbefore transcription is complete Thus achieving the correctstructure can depend on the speed of transcription Thispresents a problem for helices composed of long-range strandssuch as helix H1 shown in figure 1(a) This is because

3

J Phys Condens Matter 22 (2010) 283101 Topical Review

during transcription the 5prime-strand of H1 is transcribed firstthus it needs to wait for the 3prime-strand of H1 to be fullytranscribed During this time alternative base pairs canform at the 5prime-strand To help minimize competition fromalternative folds RNA polymerase the molecule in chargeof RNA transcription pauses to allow the formation of non-native temporary conformations that easily unfold when the3prime-strand becomes available Such pausing prevents formationof super-stable non-native structures and thus constitutes ageneral strategy for facilitating the folding of long-rangehelices [55 56]

In addition to tertiary interactions solvent molecules suchas water and cations (positive ions such as Mg2+ and K+)also help initiate and guide RNA folding into the nativestructure Water molecules mediate coupled molecular motionsthroughout a folded RNA core and non-canonical bifurcatedbase pairs are formed only in the presence of water [32 57] Inaddition the high negative charge of an RNA molecule worksagainst its folding into its native structure Cations promotefolding by reducing the repulsion between RNA phosphatesThus besides helping to stabilize tertiary interactions duringfolding ionndashRNA interactions influence stability pathwaydiversity and transition states It has been emphasized [42 58]that ions of at least two types modulate the electrostatic surfacepotential of the negatively charged RNA molecule chelatedions which are held in direct contact with the RNA surfaceby electrostatic forces and diffuse ions which accumulate nearthe RNA due to the RNA electrostatic field and remain largelyhydrated

Most RNAs within the cell are parts of RNAndashproteincomplexes Their binding can stabilize the RNA structureinduce conformational changes and even act as RNAchaperones to guide folding Large RNAs such as group Iintron RNase P and the ribosomal RNA (rRNA) structureare often stabilized by RNA-binding proteins For instancea recent model for rRNA folding suggested that the rRNAtertiary structure is dynamic (flexible) in the absence ofproteins and that alternative structural conformations cancompete with each other Thus ribosomal proteins may notonly stabilize rRNA tertiary interactions but might also changethe path of assembly avoiding RNA misfoldings [59] Proteinsalso bind at different stages in the folding process and ahierarchical protein binding is now recognized where primarybinding proteins interact at an early stage thus marking theirimportance in the assembly process Other proteins bind at theend and are related more to functional roles

Though RNA folding is a complex process is sensitive tothe environment and possesses a network of folding transitionsand pathways many RNAs share a common organization oftheir helical elements Indeed analyses on current solvedRNA 3D structures have shown that the majority of helicalelements in junctions tend to arrange roughly in paralleland perpendicular configurations These arrangements arestabilized by both RNAndashRNA interactions as well as RNAndashprotein interactions [60ndash62]

Finally Nature exploits the potential of RNA sequences toform multiple alternative metastable structures for implement-ing highly sensitive molecule switches capable of controlling

gene expression at the level of the mRNA RNA riboswitchesare RNAs found within some messenger RNA (mRNA) thatcan change conformation upon binding small metabolites andthus terminate transcription or block translation The ter-tiary structure of the riboswitch binding pocket is stabilizedonly upon ligand binding As mentioned earlier RNA virusesmake use of transitions between metastable structural domainsfor different functions Thus riboswitches and structural do-mains within viruses exemplify the dynamic properties of RNAmolecules

4 RNA structure determination

Since the lsquoEra of RNA awakeningrsquo began [63] theerroneous perception of RNA as a simple molecule hasdramatically changed due in part to advances in RNA structuredetermination techniques In the early 1970s the first fullRNA structure the transfer RNA (tRNA) was obtained bycrystallization techniques [64] providing many clues to RNArsquoshelical organization A decade later the discovery of catalyticRNAs revolutionized our understanding of RNArsquos complexcellular roles Indeed the ligand-dependent conformationsof riboswitches have shown that even small RNAs arestructurally and functionally complex [50] Similarly theinitial identification of the P4ndashP6 fragment of the group Iintron and later the larger domain P1ndashP9 and the group IIintron have unraveled the intricacies of long-range RNAndashRNAinteraction motifs such as the tetraloop receptor ribose zipperA-minor and other intriguing motifs An important milestonewas reached with high resolution structures of three ribosomalRNA (rRNA) subunits including the 23S rRNA subunit whichis the largest RNA structure solved thus far [65ndash67] thesestructures revealed arrays of RNAndashRNA and RNAndashproteininteractions which in turn can form higher order interactionpatterns [49] These pioneering and far-reaching works onthe ribosome structure by A E Yonath V RamakrishnanT A Steitz and others were recognized in the 2009 Nobel Prizein Chemistry These remarkable breakthroughs however alsodemand atomic-level structural and dynamic information forunderstanding RNA function

41 Experimental techniques

Structural information has been determined from RNAmolecules using numerous experimental strategies Many ofthese experimental approaches have been complemented bycomputational approaches resulting in important improve-ments on both the computational and experimental fronts Be-low we describe some of the experimental methods for RNAstructure determination For a more detailed review see [68]

Fluorescence resonance energy transfer (FRET) is amethod often used to analyze the global structure and evendynamics of RNA elements such as junctions [69 70] Thestrength of FRET is that it allows detecting when two elementsin the structure are in close proximity (10ndash100 A) thus helpingto determine RNArsquos global helical organization

The 2prime-hydroxyl acylation RNA (SHAPE) chemistryapproach is an important probing technique recently developedby the Weeks group [71 72] The protocol exploits the

4

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 3: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 1 List of applications of natural and artificial RNAs in medicine and nanodesign

RNA type Definition Application References

Medicine

miRNA (microRNA) Short RNA sequences that can regulategenes in the post-transcription process

miRNA can suppress a tumor or act as anoncogene

[5ndash11]

RNAi (RNA interference) Gene silencing mechanism RNAi can be used as a tool for probing genefunction and rational drug design

[14 15]

Ribozyme RNA molecules with catalytic properties Ribozymes are being explored as biosensors andpotential therapeutics against inflammatorydisorders

[19 20]

sRNA (small RNA) Small functional RNAs that are nottranslated into proteins

Virulence genes are induced or repressed throughsRNA regulators

[21]

Nanodesign

TectoRNA square Molecular nanodesign of RNA squares Artificial RNA building blocks can be used asjigsaw puzzle units for building larger pieces ofdiverse geometry and complexity

[12 13]

TokenRNA aptamerbiosensor

A sequence-specific label-freefluorescent biosensor

Fluorescent signal results in the presence of itstarget and magnesium

[22]

Nanoring and nanotube Molecular design of hexagonal ring andtube units

The helical sequences of the building blocks caninclude siRNAs for drug delivery

[23]

pRNAsiRNA nanoparticle Design of chimeric phi29 packagingRNA (pRNA)siRNA nanoparticles

pRNAsiRNA particles target multiple tumor cellsby siRNA delivery

[24 25]

survival and evolution Small interference RNAs (RNAis) havea remarkable role in gene silencing [1] transfer-messengerRNAs (tmRNAs) direct the addition of tags to peptideson stalled ribosomes thereby affecting protein stability andtransport [2] other small non-coding RNAs (ncRNAs) regulatemessenger RNA stability and translation by base pairing atvarious positions with their target messenger RNAs [3 4]and recent findings indicate that microRNAs (miRNAs) canbe associated with tumorigenesis by acting either as tumorsuppressors [5ndash7] or oncogenes [8ndash11] This astonishingversatility of RNA has also been exploited for nanodesignfor biomedical and technological applications [12 13] Forexample the mechanism of RNA interference (RNAi) tosilence genes in a sequence specific manner is currentlybeing exploited as a tool to design drugs and for antiviraltherapy [14 15] More interesting examples of RNAapplications are described in table 1 Clearly more discoveriesare yet to come given the many novel non-protein-codingtranscripts identified in the human genome [16] Many of theseRNAs have yet unknown functions and new regulatory rolescontinue to emerge [17 18]

The structural features of RNAs are of major importanceto their biological functions because sequence alone does notprovide sufficient functional information Thus one of thegoals in RNA structural biology is to provide insights intohow structure and dynamics lead to specific functions of RNAeither in free form or in interaction with various molecules inthe full cellular milieu

Our focus in this article is on reviewing recent advancesin RNA structure determination and assessing current 3Dprediction methods Section 2 reviews our basic understandingof RNA structure Section 3 discusses new discoveries inRNA dynamics that are important for understanding RNAfolding Section 4 describes current advances and limitationsin experimental techniques used to determine both secondary

and tertiary structures Section 5 summarizes the recentapproaches and trends in the structure determination of RNAsecondary structure Section 6 reviews computational methodsfor RNA 3D structure prediction and section 7 evaluates someof the recent algorithms We conclude with a perspectivecomparing performance of 3D structure prediction for a singlerepresentative RNA data set

Developments in RNA structure prediction studies havebeen previously extensively reviewed In RNA secondary-structure prediction advances Gardner and Griegerich [26]compared secondary-structure prediction methods usingmultiple-sequence alignment while Mathews and Turner [27]presented progress in free energy minimization algorithmsusing dynamic programming Shapiro et al [28] provided acomprehensive review on RNA secondary-structure predictionwith a focus on pseudoknots and RNA 3D structure advanceswith a focus on manual methods

A more recent review by Capriotti and Marti-Renom [29]compiled current computational databases algorithms andcomputer programs that are available to the communityfor purposes ranging from sequence analysis to structureprediction and comparison Schroeder [30] recently reviewedprediction progress on viral RNAs Here we focus on exploringnew ideas that could improve RNA prediction and suggestfurther improvements by comparing the capabilities of current3D predictions programs

2 RNA structure

Understanding RNA structure and function relies on ourability to identify RNArsquos major structural components RNAmolecules can be studied extensively at the secondary-structure level where building blocks include helical stems andsingle-stranded regions such as hairpins internal loops andjunctions (figure 1(a)) Stems are formed by complementary

2

J Phys Condens Matter 22 (2010) 283101 Topical Review

c)

AA

C

G

5

3

53

internal loop

hairpin

junction

stem

a) b)

H1

H2

H3H4

H5

H6

Figure 1 (a) RNA secondary structure is composed of stems and loop regions Stems are formed by canonical GC AU and GU wobble WCbase pairs Loop regions are formed by hairpins internal loops and junctions (b) Loop regions can form long-range interactions as inpseudoknots or by using non-canonical base pairs (c) Non-canonical base pairs A486 (blue on the left) forms a trans WatsonndashWatson basepair with A511 (purple on the right) a trans HoogsteenndashSugar base pair with G482 (red on top) and a cis SugarndashSugar base pair with C505(green at the bottom) The base pair interactions occur in the H marismortui 23S rRNA structure (PDB 1S72)

canonical Watson and Crick base pairs GC and AU alongwith the GU wobble base pair A hairpin is a single-strandedregion that folds back on itself via regions of complementarybase pairs The single-stranded region between two stems isknown as an internal loop while a junction can be definedas the point of connection between three or more helicalstems Single-stranded regions can form pseudoknots whenbase pairs intertwine (figure 1(b)) These basic secondary-structure elements are pieced together via tertiary interactionsto form compact structures of active RNAs

Structural and comparative studies have suggested thatRNA structure is largely composed of repetitive modularbuilding blocks or motifs In particular RNA tertiary motifs areconserved structural patterns formed by pairwise interactionsbetween nucleotides These include base pairing basestacking and basendashphosphate interactions [31] RNAs possessremarkable interaction attributes where base pair interactionscan be classified into twelve geometric families in termsof pairs of interacting edges which can be Watson andCrick (WC) Hoogsteen (H) Sugar (S) and glycosidic bondorientations cis and trans [32] (see figure 1(c)) Furthermorebasendashphosphate interactions are also common and a recentclassification model has been proposed [33]

Graph theory methods have also been developedto represent RNA secondary structures since the 1970swith the pioneering work by Waterman [34] extendedby others [35ndash37] Recent structural or topologicalrepresentations using graph representation (trees and dualgraphs) of RNA secondary structures like RAG [38] constituteinnovative methods from an allied field that can help RNAscience [39]

3 RNA folding proteins dynamics pathways andpausing

Over the past decade many experiments have unveiled newclues into the elegant complexity and dynamics of RNAfolding Besides forming intricate long-range RNAndashRNAinteractions to direct and stabilize the structure the folding

is strongly affected by the speed of elongation and site-specific pausing of the RNA polymerase as well as interactionsof the RNA molecule with proteins solvent and smallmetabolites [40ndash42]

Our current understanding of how RNA folds is that it doesso by a hierarchical process helical elements in the secondarystructure are formed and then compact structures are formedusing tertiary pairwise interactions and tertiary motifs [43]However further studies are suggesting that RNA folding isactually quasi-hierarchical [44ndash46] where rearrangements ofsecondary-structure elements caused by tertiary interactionscan in turn trigger stable native folding

The RNA folding pathway does not always proceeduniquely Indeed the free energy landscape of RNA foldingis highly rugged composed of different and even paralleltrajectories where RNA chains can be easily kinetically trappedinto intermediate metastable states If the RNA falls into anintermediate structure state folding can take longer becausebreaking the non-native base pairs is required to reach its nativestate [47] Fortunately long-range tertiary contacts alongwith solvent (eg water ions) interactions can guide RNAto the correct folding [48] These tertiary interactions workcooperatively to direct folding to the native state [49]

It should be noted however that for some RNAsintermediate or alternative states are functional as forriboswitches (mentioned later) [50] and RNA regions inviruses [51ndash54] For instance there are structural domainsin the hepatitis D (HDV) virus where both branched andunbranched conformations are formed for distinct functionalroles [52] In the turnip crinkle virus (TCV) differentstructural configurations control the processes of virustranslation and replication [53]

In contrast to in vitro folding where unfolded RNAchains fold in the presence of magnesium ions (Mg2+) inthe cellular context nascent RNA molecules start to foldbefore transcription is complete Thus achieving the correctstructure can depend on the speed of transcription Thispresents a problem for helices composed of long-range strandssuch as helix H1 shown in figure 1(a) This is because

3

J Phys Condens Matter 22 (2010) 283101 Topical Review

during transcription the 5prime-strand of H1 is transcribed firstthus it needs to wait for the 3prime-strand of H1 to be fullytranscribed During this time alternative base pairs canform at the 5prime-strand To help minimize competition fromalternative folds RNA polymerase the molecule in chargeof RNA transcription pauses to allow the formation of non-native temporary conformations that easily unfold when the3prime-strand becomes available Such pausing prevents formationof super-stable non-native structures and thus constitutes ageneral strategy for facilitating the folding of long-rangehelices [55 56]

In addition to tertiary interactions solvent molecules suchas water and cations (positive ions such as Mg2+ and K+)also help initiate and guide RNA folding into the nativestructure Water molecules mediate coupled molecular motionsthroughout a folded RNA core and non-canonical bifurcatedbase pairs are formed only in the presence of water [32 57] Inaddition the high negative charge of an RNA molecule worksagainst its folding into its native structure Cations promotefolding by reducing the repulsion between RNA phosphatesThus besides helping to stabilize tertiary interactions duringfolding ionndashRNA interactions influence stability pathwaydiversity and transition states It has been emphasized [42 58]that ions of at least two types modulate the electrostatic surfacepotential of the negatively charged RNA molecule chelatedions which are held in direct contact with the RNA surfaceby electrostatic forces and diffuse ions which accumulate nearthe RNA due to the RNA electrostatic field and remain largelyhydrated

Most RNAs within the cell are parts of RNAndashproteincomplexes Their binding can stabilize the RNA structureinduce conformational changes and even act as RNAchaperones to guide folding Large RNAs such as group Iintron RNase P and the ribosomal RNA (rRNA) structureare often stabilized by RNA-binding proteins For instancea recent model for rRNA folding suggested that the rRNAtertiary structure is dynamic (flexible) in the absence ofproteins and that alternative structural conformations cancompete with each other Thus ribosomal proteins may notonly stabilize rRNA tertiary interactions but might also changethe path of assembly avoiding RNA misfoldings [59] Proteinsalso bind at different stages in the folding process and ahierarchical protein binding is now recognized where primarybinding proteins interact at an early stage thus marking theirimportance in the assembly process Other proteins bind at theend and are related more to functional roles

Though RNA folding is a complex process is sensitive tothe environment and possesses a network of folding transitionsand pathways many RNAs share a common organization oftheir helical elements Indeed analyses on current solvedRNA 3D structures have shown that the majority of helicalelements in junctions tend to arrange roughly in paralleland perpendicular configurations These arrangements arestabilized by both RNAndashRNA interactions as well as RNAndashprotein interactions [60ndash62]

Finally Nature exploits the potential of RNA sequences toform multiple alternative metastable structures for implement-ing highly sensitive molecule switches capable of controlling

gene expression at the level of the mRNA RNA riboswitchesare RNAs found within some messenger RNA (mRNA) thatcan change conformation upon binding small metabolites andthus terminate transcription or block translation The ter-tiary structure of the riboswitch binding pocket is stabilizedonly upon ligand binding As mentioned earlier RNA virusesmake use of transitions between metastable structural domainsfor different functions Thus riboswitches and structural do-mains within viruses exemplify the dynamic properties of RNAmolecules

4 RNA structure determination

Since the lsquoEra of RNA awakeningrsquo began [63] theerroneous perception of RNA as a simple molecule hasdramatically changed due in part to advances in RNA structuredetermination techniques In the early 1970s the first fullRNA structure the transfer RNA (tRNA) was obtained bycrystallization techniques [64] providing many clues to RNArsquoshelical organization A decade later the discovery of catalyticRNAs revolutionized our understanding of RNArsquos complexcellular roles Indeed the ligand-dependent conformationsof riboswitches have shown that even small RNAs arestructurally and functionally complex [50] Similarly theinitial identification of the P4ndashP6 fragment of the group Iintron and later the larger domain P1ndashP9 and the group IIintron have unraveled the intricacies of long-range RNAndashRNAinteraction motifs such as the tetraloop receptor ribose zipperA-minor and other intriguing motifs An important milestonewas reached with high resolution structures of three ribosomalRNA (rRNA) subunits including the 23S rRNA subunit whichis the largest RNA structure solved thus far [65ndash67] thesestructures revealed arrays of RNAndashRNA and RNAndashproteininteractions which in turn can form higher order interactionpatterns [49] These pioneering and far-reaching works onthe ribosome structure by A E Yonath V RamakrishnanT A Steitz and others were recognized in the 2009 Nobel Prizein Chemistry These remarkable breakthroughs however alsodemand atomic-level structural and dynamic information forunderstanding RNA function

41 Experimental techniques

Structural information has been determined from RNAmolecules using numerous experimental strategies Many ofthese experimental approaches have been complemented bycomputational approaches resulting in important improve-ments on both the computational and experimental fronts Be-low we describe some of the experimental methods for RNAstructure determination For a more detailed review see [68]

Fluorescence resonance energy transfer (FRET) is amethod often used to analyze the global structure and evendynamics of RNA elements such as junctions [69 70] Thestrength of FRET is that it allows detecting when two elementsin the structure are in close proximity (10ndash100 A) thus helpingto determine RNArsquos global helical organization

The 2prime-hydroxyl acylation RNA (SHAPE) chemistryapproach is an important probing technique recently developedby the Weeks group [71 72] The protocol exploits the

4

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 4: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

c)

AA

C

G

5

3

53

internal loop

hairpin

junction

stem

a) b)

H1

H2

H3H4

H5

H6

Figure 1 (a) RNA secondary structure is composed of stems and loop regions Stems are formed by canonical GC AU and GU wobble WCbase pairs Loop regions are formed by hairpins internal loops and junctions (b) Loop regions can form long-range interactions as inpseudoknots or by using non-canonical base pairs (c) Non-canonical base pairs A486 (blue on the left) forms a trans WatsonndashWatson basepair with A511 (purple on the right) a trans HoogsteenndashSugar base pair with G482 (red on top) and a cis SugarndashSugar base pair with C505(green at the bottom) The base pair interactions occur in the H marismortui 23S rRNA structure (PDB 1S72)

canonical Watson and Crick base pairs GC and AU alongwith the GU wobble base pair A hairpin is a single-strandedregion that folds back on itself via regions of complementarybase pairs The single-stranded region between two stems isknown as an internal loop while a junction can be definedas the point of connection between three or more helicalstems Single-stranded regions can form pseudoknots whenbase pairs intertwine (figure 1(b)) These basic secondary-structure elements are pieced together via tertiary interactionsto form compact structures of active RNAs

Structural and comparative studies have suggested thatRNA structure is largely composed of repetitive modularbuilding blocks or motifs In particular RNA tertiary motifs areconserved structural patterns formed by pairwise interactionsbetween nucleotides These include base pairing basestacking and basendashphosphate interactions [31] RNAs possessremarkable interaction attributes where base pair interactionscan be classified into twelve geometric families in termsof pairs of interacting edges which can be Watson andCrick (WC) Hoogsteen (H) Sugar (S) and glycosidic bondorientations cis and trans [32] (see figure 1(c)) Furthermorebasendashphosphate interactions are also common and a recentclassification model has been proposed [33]

Graph theory methods have also been developedto represent RNA secondary structures since the 1970swith the pioneering work by Waterman [34] extendedby others [35ndash37] Recent structural or topologicalrepresentations using graph representation (trees and dualgraphs) of RNA secondary structures like RAG [38] constituteinnovative methods from an allied field that can help RNAscience [39]

3 RNA folding proteins dynamics pathways andpausing

Over the past decade many experiments have unveiled newclues into the elegant complexity and dynamics of RNAfolding Besides forming intricate long-range RNAndashRNAinteractions to direct and stabilize the structure the folding

is strongly affected by the speed of elongation and site-specific pausing of the RNA polymerase as well as interactionsof the RNA molecule with proteins solvent and smallmetabolites [40ndash42]

Our current understanding of how RNA folds is that it doesso by a hierarchical process helical elements in the secondarystructure are formed and then compact structures are formedusing tertiary pairwise interactions and tertiary motifs [43]However further studies are suggesting that RNA folding isactually quasi-hierarchical [44ndash46] where rearrangements ofsecondary-structure elements caused by tertiary interactionscan in turn trigger stable native folding

The RNA folding pathway does not always proceeduniquely Indeed the free energy landscape of RNA foldingis highly rugged composed of different and even paralleltrajectories where RNA chains can be easily kinetically trappedinto intermediate metastable states If the RNA falls into anintermediate structure state folding can take longer becausebreaking the non-native base pairs is required to reach its nativestate [47] Fortunately long-range tertiary contacts alongwith solvent (eg water ions) interactions can guide RNAto the correct folding [48] These tertiary interactions workcooperatively to direct folding to the native state [49]

It should be noted however that for some RNAsintermediate or alternative states are functional as forriboswitches (mentioned later) [50] and RNA regions inviruses [51ndash54] For instance there are structural domainsin the hepatitis D (HDV) virus where both branched andunbranched conformations are formed for distinct functionalroles [52] In the turnip crinkle virus (TCV) differentstructural configurations control the processes of virustranslation and replication [53]

In contrast to in vitro folding where unfolded RNAchains fold in the presence of magnesium ions (Mg2+) inthe cellular context nascent RNA molecules start to foldbefore transcription is complete Thus achieving the correctstructure can depend on the speed of transcription Thispresents a problem for helices composed of long-range strandssuch as helix H1 shown in figure 1(a) This is because

3

J Phys Condens Matter 22 (2010) 283101 Topical Review

during transcription the 5prime-strand of H1 is transcribed firstthus it needs to wait for the 3prime-strand of H1 to be fullytranscribed During this time alternative base pairs canform at the 5prime-strand To help minimize competition fromalternative folds RNA polymerase the molecule in chargeof RNA transcription pauses to allow the formation of non-native temporary conformations that easily unfold when the3prime-strand becomes available Such pausing prevents formationof super-stable non-native structures and thus constitutes ageneral strategy for facilitating the folding of long-rangehelices [55 56]

In addition to tertiary interactions solvent molecules suchas water and cations (positive ions such as Mg2+ and K+)also help initiate and guide RNA folding into the nativestructure Water molecules mediate coupled molecular motionsthroughout a folded RNA core and non-canonical bifurcatedbase pairs are formed only in the presence of water [32 57] Inaddition the high negative charge of an RNA molecule worksagainst its folding into its native structure Cations promotefolding by reducing the repulsion between RNA phosphatesThus besides helping to stabilize tertiary interactions duringfolding ionndashRNA interactions influence stability pathwaydiversity and transition states It has been emphasized [42 58]that ions of at least two types modulate the electrostatic surfacepotential of the negatively charged RNA molecule chelatedions which are held in direct contact with the RNA surfaceby electrostatic forces and diffuse ions which accumulate nearthe RNA due to the RNA electrostatic field and remain largelyhydrated

Most RNAs within the cell are parts of RNAndashproteincomplexes Their binding can stabilize the RNA structureinduce conformational changes and even act as RNAchaperones to guide folding Large RNAs such as group Iintron RNase P and the ribosomal RNA (rRNA) structureare often stabilized by RNA-binding proteins For instancea recent model for rRNA folding suggested that the rRNAtertiary structure is dynamic (flexible) in the absence ofproteins and that alternative structural conformations cancompete with each other Thus ribosomal proteins may notonly stabilize rRNA tertiary interactions but might also changethe path of assembly avoiding RNA misfoldings [59] Proteinsalso bind at different stages in the folding process and ahierarchical protein binding is now recognized where primarybinding proteins interact at an early stage thus marking theirimportance in the assembly process Other proteins bind at theend and are related more to functional roles

Though RNA folding is a complex process is sensitive tothe environment and possesses a network of folding transitionsand pathways many RNAs share a common organization oftheir helical elements Indeed analyses on current solvedRNA 3D structures have shown that the majority of helicalelements in junctions tend to arrange roughly in paralleland perpendicular configurations These arrangements arestabilized by both RNAndashRNA interactions as well as RNAndashprotein interactions [60ndash62]

Finally Nature exploits the potential of RNA sequences toform multiple alternative metastable structures for implement-ing highly sensitive molecule switches capable of controlling

gene expression at the level of the mRNA RNA riboswitchesare RNAs found within some messenger RNA (mRNA) thatcan change conformation upon binding small metabolites andthus terminate transcription or block translation The ter-tiary structure of the riboswitch binding pocket is stabilizedonly upon ligand binding As mentioned earlier RNA virusesmake use of transitions between metastable structural domainsfor different functions Thus riboswitches and structural do-mains within viruses exemplify the dynamic properties of RNAmolecules

4 RNA structure determination

Since the lsquoEra of RNA awakeningrsquo began [63] theerroneous perception of RNA as a simple molecule hasdramatically changed due in part to advances in RNA structuredetermination techniques In the early 1970s the first fullRNA structure the transfer RNA (tRNA) was obtained bycrystallization techniques [64] providing many clues to RNArsquoshelical organization A decade later the discovery of catalyticRNAs revolutionized our understanding of RNArsquos complexcellular roles Indeed the ligand-dependent conformationsof riboswitches have shown that even small RNAs arestructurally and functionally complex [50] Similarly theinitial identification of the P4ndashP6 fragment of the group Iintron and later the larger domain P1ndashP9 and the group IIintron have unraveled the intricacies of long-range RNAndashRNAinteraction motifs such as the tetraloop receptor ribose zipperA-minor and other intriguing motifs An important milestonewas reached with high resolution structures of three ribosomalRNA (rRNA) subunits including the 23S rRNA subunit whichis the largest RNA structure solved thus far [65ndash67] thesestructures revealed arrays of RNAndashRNA and RNAndashproteininteractions which in turn can form higher order interactionpatterns [49] These pioneering and far-reaching works onthe ribosome structure by A E Yonath V RamakrishnanT A Steitz and others were recognized in the 2009 Nobel Prizein Chemistry These remarkable breakthroughs however alsodemand atomic-level structural and dynamic information forunderstanding RNA function

41 Experimental techniques

Structural information has been determined from RNAmolecules using numerous experimental strategies Many ofthese experimental approaches have been complemented bycomputational approaches resulting in important improve-ments on both the computational and experimental fronts Be-low we describe some of the experimental methods for RNAstructure determination For a more detailed review see [68]

Fluorescence resonance energy transfer (FRET) is amethod often used to analyze the global structure and evendynamics of RNA elements such as junctions [69 70] Thestrength of FRET is that it allows detecting when two elementsin the structure are in close proximity (10ndash100 A) thus helpingto determine RNArsquos global helical organization

The 2prime-hydroxyl acylation RNA (SHAPE) chemistryapproach is an important probing technique recently developedby the Weeks group [71 72] The protocol exploits the

4

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 5: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

during transcription the 5prime-strand of H1 is transcribed firstthus it needs to wait for the 3prime-strand of H1 to be fullytranscribed During this time alternative base pairs canform at the 5prime-strand To help minimize competition fromalternative folds RNA polymerase the molecule in chargeof RNA transcription pauses to allow the formation of non-native temporary conformations that easily unfold when the3prime-strand becomes available Such pausing prevents formationof super-stable non-native structures and thus constitutes ageneral strategy for facilitating the folding of long-rangehelices [55 56]

In addition to tertiary interactions solvent molecules suchas water and cations (positive ions such as Mg2+ and K+)also help initiate and guide RNA folding into the nativestructure Water molecules mediate coupled molecular motionsthroughout a folded RNA core and non-canonical bifurcatedbase pairs are formed only in the presence of water [32 57] Inaddition the high negative charge of an RNA molecule worksagainst its folding into its native structure Cations promotefolding by reducing the repulsion between RNA phosphatesThus besides helping to stabilize tertiary interactions duringfolding ionndashRNA interactions influence stability pathwaydiversity and transition states It has been emphasized [42 58]that ions of at least two types modulate the electrostatic surfacepotential of the negatively charged RNA molecule chelatedions which are held in direct contact with the RNA surfaceby electrostatic forces and diffuse ions which accumulate nearthe RNA due to the RNA electrostatic field and remain largelyhydrated

Most RNAs within the cell are parts of RNAndashproteincomplexes Their binding can stabilize the RNA structureinduce conformational changes and even act as RNAchaperones to guide folding Large RNAs such as group Iintron RNase P and the ribosomal RNA (rRNA) structureare often stabilized by RNA-binding proteins For instancea recent model for rRNA folding suggested that the rRNAtertiary structure is dynamic (flexible) in the absence ofproteins and that alternative structural conformations cancompete with each other Thus ribosomal proteins may notonly stabilize rRNA tertiary interactions but might also changethe path of assembly avoiding RNA misfoldings [59] Proteinsalso bind at different stages in the folding process and ahierarchical protein binding is now recognized where primarybinding proteins interact at an early stage thus marking theirimportance in the assembly process Other proteins bind at theend and are related more to functional roles

Though RNA folding is a complex process is sensitive tothe environment and possesses a network of folding transitionsand pathways many RNAs share a common organization oftheir helical elements Indeed analyses on current solvedRNA 3D structures have shown that the majority of helicalelements in junctions tend to arrange roughly in paralleland perpendicular configurations These arrangements arestabilized by both RNAndashRNA interactions as well as RNAndashprotein interactions [60ndash62]

Finally Nature exploits the potential of RNA sequences toform multiple alternative metastable structures for implement-ing highly sensitive molecule switches capable of controlling

gene expression at the level of the mRNA RNA riboswitchesare RNAs found within some messenger RNA (mRNA) thatcan change conformation upon binding small metabolites andthus terminate transcription or block translation The ter-tiary structure of the riboswitch binding pocket is stabilizedonly upon ligand binding As mentioned earlier RNA virusesmake use of transitions between metastable structural domainsfor different functions Thus riboswitches and structural do-mains within viruses exemplify the dynamic properties of RNAmolecules

4 RNA structure determination

Since the lsquoEra of RNA awakeningrsquo began [63] theerroneous perception of RNA as a simple molecule hasdramatically changed due in part to advances in RNA structuredetermination techniques In the early 1970s the first fullRNA structure the transfer RNA (tRNA) was obtained bycrystallization techniques [64] providing many clues to RNArsquoshelical organization A decade later the discovery of catalyticRNAs revolutionized our understanding of RNArsquos complexcellular roles Indeed the ligand-dependent conformationsof riboswitches have shown that even small RNAs arestructurally and functionally complex [50] Similarly theinitial identification of the P4ndashP6 fragment of the group Iintron and later the larger domain P1ndashP9 and the group IIintron have unraveled the intricacies of long-range RNAndashRNAinteraction motifs such as the tetraloop receptor ribose zipperA-minor and other intriguing motifs An important milestonewas reached with high resolution structures of three ribosomalRNA (rRNA) subunits including the 23S rRNA subunit whichis the largest RNA structure solved thus far [65ndash67] thesestructures revealed arrays of RNAndashRNA and RNAndashproteininteractions which in turn can form higher order interactionpatterns [49] These pioneering and far-reaching works onthe ribosome structure by A E Yonath V RamakrishnanT A Steitz and others were recognized in the 2009 Nobel Prizein Chemistry These remarkable breakthroughs however alsodemand atomic-level structural and dynamic information forunderstanding RNA function

41 Experimental techniques

Structural information has been determined from RNAmolecules using numerous experimental strategies Many ofthese experimental approaches have been complemented bycomputational approaches resulting in important improve-ments on both the computational and experimental fronts Be-low we describe some of the experimental methods for RNAstructure determination For a more detailed review see [68]

Fluorescence resonance energy transfer (FRET) is amethod often used to analyze the global structure and evendynamics of RNA elements such as junctions [69 70] Thestrength of FRET is that it allows detecting when two elementsin the structure are in close proximity (10ndash100 A) thus helpingto determine RNArsquos global helical organization

The 2prime-hydroxyl acylation RNA (SHAPE) chemistryapproach is an important probing technique recently developedby the Weeks group [71 72] The protocol exploits the

4

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 6: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

nucleophilic reactivity of the ribose-2prime-hydroxyl positionwhich strongly correlates with the nucleotide flexibilityNucleotides in single-stranded regions are seen as flexiblewhile nucleotides involved in base pairs are more rigidThe main advantage is that the method is simple and thereis no limit on the size of the RNA molecule A fewsecondary- and tertiary-structure prediction programs haveincorporated data from RNA SHAPE to improve the predictionaccuracy [73ndash75]

Other structure-specific probe methods such as usingdimethyl sulfate (DMS) [143] which can focus on certain nu-cleotides (eg adenines) or hydroxyl radical footprinting [77]which can explore the solvent accessibility to the RNA back-bone are also used to deduce RNA structured features Bothmethods can be powerful but are laborious Another approachis the use of microarrays for chemical mapping [78] Thismethod has been combined with dynamic programming algo-rithms for predicting RNA secondary structure replacing otherlabor intensive approaches such as chemical mapping and en-zymatic cleavage

NMR spectroscopy is the well-known technique thatexploits the magnetic properties of certain nuclei NMRspectroscopy is an important tool for probing the structureand dynamics of RNA [79 80] In addition NMR relaxationmethods can be used to obtain dynamic data on timescalesranging from picoseconds to seconds The main limitation onNMR is molecular size (sim20 kDa) However novel methodslike that reported combining small-angle scattering (SAXS)and NMR techniques [63] can be used to predict larger RNAsas applied recently for the 100 nt TCV RNA virus [81]

Of course x-ray crystallography analysis based onthe growth of single well-ordered crystals remains anotherinvaluable method for detailed structural resolution Well-ordered single crystals are required for structural studies byx-ray diffraction methods and obtaining them is often themost difficult step of the structure determination process [82]Most RNAs exist naturally as proteinndashRNA complexes and thecrystallization of these complexes has several advantages overthe crystallization of RNA alone X-ray crystallography hasbeen successful in determining the largest RNA molecules sofar such as the large ribosomal subunits [65ndash67] Neverthelesstechnical difficulties still remain such as the availability ofmany conformations and transitional states for RNAs Thedynamical nature of RNA molecules also complicates matters

Other experimental techniques such as cryo-electronmicroscopy (cryo-EM) have recently undergone tremendousadvances such as those reported in the study of IRES virusdomains [83] Contrary to the crystallography case there is nosize limitation and the molecule does not need to be orderedor isolated from its complex However due to the difficulty ofaccurate image reconstruction the resolution is still limited to10 A at best

42 Current advances in RNA structure data

Advances in sequencing technology have made available agrowing amount of RNA sequence information (figure 2) butthe challenge of how to interpret these data remains unmetThe RNA secondary-structure and statistics database (RNA

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Num

ber

1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

RNA structures deposited in the NDB database

Total Yearly

Figure 2 Number of RNA structures deposited in the NDB nucleicacid database (httpndbserverrutgersedu) as of December 2009

STRAND) [144] contains as of May 2010 4666 secondarystructures Furthermore Rfam a database of sequence andsecondary-structure information from several RNA familieshas recorded 1446 RNA families since its last update in January2010 [85] Each family can contain many thousands of RNAsequences (eg the 5S rRNA family has 57 054 sequences)

The structures of a large majority of RNA moleculesremain to be solved Despite many advances in RNAcrystallography NMR and chemical modification RNAstructure determination is a difficult task This makes the needfor accurate prediction using computational methods especiallyurgent

5 RNA secondary-structure prediction approaches

Predicting RNA secondary structures refers to the task ofdetermining given a single RNA sequence or multipleRNA sequences the set of complementary canonicalWatson and Crick GC AU and GU wobble base pairsthat define helical stems as well as the single-strandedregions such as hairpins internal loops and junctions (seesection 2) Many computational programs designed forpredicting RNA secondary structures have been developedover the past three decades Approaches vary widelyincluding free energy minimization using thermodynamicparameters [76 86 87] knowledge-based predictions basedon known RNA structures [88ndash90] comparative sequencealignment algorithms [76 91ndash96] combinations of these andothers Below we describe some of the current predictionprograms More examples are listed in table 2 Other reviewson specific approaches are also available [26 28 97ndash99]

51 Secondary-structure prediction using a single sequence

It is believed that the native RNA structure correspondsto a global minimum of the relevant free energy functionTherefore prediction programs focus on determining thefree energy for a secondary structure One of the firstprograms developed for predicting RNA secondary structuresusing a single sequence called Mfold was developed by

5

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 7: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 2 List of some available programs for RNA secondary-structure prediction using a single-sequence or multiple-sequence comparison

Program Approach Description References

Prediction using a single sequence

Mfold Free energy minimization Determine the set of base pairs that gives theminimum free energy using dynamic programmingmethods

[87]

RNAfold Free energy minimization Predict the minimum free energy structure using thethermodynamic parameter approach it can alsoestimate base pair probabilities

[86]

RNAstructure Free energy minimization Adapt experimental constraints such as chemicalmodification and SHAPE to improve prediction

[103]

MC-Fold Free energy minimization Assemble secondary structures from nucleotidecyclic motifs

[75]

Contrafold Knowledge based Predict structures using statistical data and trainingalgorithms

[89]

Sfold Knowledge based Calculate a set of representative candidates usingstatistical sampling methods and Boltzmannassemble

[104]

RNAShapes Knowledge based Search the folding space using the reduced conceptof RNA shape

[105 106]

Kinwalker Free energy minimizationkinetic folding

Build secondary structures using mimics ofco-transcriptional folding and near optimalstructures of subsequences

[107]

MPGAfold Genetic algorithm Search for possible folding pathways and functionalintermediates using a massively parallel geneticalgorithm

[108 109]

Kinefold Free energy minimizationstochastic foldingsimulations

Predict structures using a co-transcriptional foldingsimulation approach where RNA helices are closedand opened in a stochastic process it can alsopredict pseudoknots

[112 113]

Pknots Free energy minimizationparameter approximations

Predict pseudoknots using a dynamic programmingalgorithm and thermodynamic parametersaugmented with approximated parameters forthermodynamic stability of pseudoknots

[111]

Prediction using multiple sequences

RNAalifold Free energy minimizationcovariation analysis

Determine a consensus secondary structure from amultiple-sequence alignment using covariationanalysis

[94]

ILM Free energy minimizationcovariation analysis

Predict base pairs using an iterative loop matchingalgorithm where base pairs are ranked usingthermodynamic parameters and covariationanalysis it can also predict pseudoknots

[95]

CentroidFold Knowledge based Predict secondary structures using the centroidestimator employed by Sfold and the maximumexpected accuracy estimator used by Contrafold

[90 114]

Dynalign Free energy minimizationpairwise alignment

Compute the lowest free energy sequence alignmentand secondary structure common to two sequences

[76 115 116]

Carnac Free energy minimizationpairwise alignment

Predict the common structure shared by twohomologous sequence using similarities energy andcovariations

[96]

RNAforester Structural comparisonusing tree alignmentmodels

Represent RNA secondary structures by tree graphsand compare and align secondary structures viaalignment algorithms for trees and other graphobjects

[93]

KNetfold Machine learning Determine pairs of aligned columns from aconsensus using multiple-sequence alignment

[88]

Zuker and Stiegler [87] in 1981 It aims to find the setof base pairs that yields the minimum free energy usingdynamic programming methods Mfold uses thermodynamicparameters obtained from experiments near 37 C the humanbody temperature to estimate different base pairing and basestacking forces [100ndash102] The thermodynamic parameters arethen used in a potential function that approximates the overall

energy as a sum of independent terms for different loops andbase pair interactions

Because thermodynamic parameters are measured exper-imentally they are incomplete or subject to inaccuracies andthus a great number of alternative suboptimal structures can fallnear the predicted global energy minimum Therefore Mfoldand other current free energy minimization programs such as

6

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 8: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAfold [86] which is part of the Vienna package considera sample of structures near the optimal free energy conforma-tion In addition RNAstructure [103] is a another programbased on thermodynamic parameters which can improve pre-diction by incorporating constraints from experimental data us-ing 2prime-hydroxyl acylation RNA (SHAPE) chemistry (describedin section 41)

Parameter calculation from known RNA structuresis another useful approach to RNA secondary-structureprediction Programs like Contrafold [89] use statistical dataand Bayesian approaches to train the algorithm on a large setof known secondary structures using base pair probabilitiesSfold [104] uses statistical sampling methods and a Boltzmannassemble to calculate a set of representative candidates forRNA secondary structure Similarly RNAShapes [105 106]performs an abstract search over all predicted RNA shapes toproduce a sample of the folding space

A recent heuristic program called Kinwalker [107] isbased on a kinetic folding algorithm that simulates the dynamicproperties of RNA folding during the co-transcriptionalprocess The algorithm constructs the secondary structureby a stepwise combination of building blocks correspondingto subsequences and their thermodynamic near optimalstructures Kinwalker can be used to fold RNA up to1500 nucleotides long Similarly MPGAfold [108] is amassively parallel genetic algorithm that predicts possiblefolding pathways and functional intermediates using parallelcomputing [109]

The case for RNA structure prediction with pseudoknotshas been shown to be a non-polynomial (NP) complexproblem [110] However special cases have been exploredPrograms like Pknots [111] use dynamics programs withadded approximations to thermodynamic parameters neededfor pseudoknot calculations Cao and Chen [145] recentlydeveloped a predictive model for pseudoknots with inter-helixloops The model gives conformational entropy stabilities andfree energy landscapes from RNA sequences on the basis ofa model that includes volume exclusion Kinefold [112 113]uses a long-time-scale stochastic folding simulation approachwhere RNA helices are closed and opened in a stochasticprocess and pseudoknots are predicted using topological andgeometrical constraints

52 Multiple-sequence alignment

With the steady increase in RNA sequence data secondary-structure prediction for RNA has also been achieved byaligning multiple sequences [91 117ndash119] Comparativesequence analysis consists of aligning many RNA sequencesand looking for patterns of sequence variability between twoor more nucleotides Sequence variability (ie covariation)can be observed because in contrast to nucleotides that varyrandomly base pairs that are conserved by evolution vary bycompensatory changes (eg a GC pair in one sequence canchange into an AU in another sequence) These covariationsmake it possible to detect the base pairs that form thehelical stems and consequently predict a secondary-structuremodel Alignments of RNA sequences also allow identifyingfunctionally important regions that are often conserved

Several approaches for comparative sequence analysishave been developed [26] One approach consists of aligningthe sequences and then folding them on the basis of thatalignment Examples include RNAalifold [94] and ILM [95]which can predict pseudoknots Another approach thatinvolves simultaneous alignment folding and inference ofstructure from a set of homologous sequences is present inDynalign [76 115 116] and Carnac [96] are examples of thisapproach Both programs combine free energy minimizationand comparative sequence analysis If no meaningfulsequence conservation is encountered during the alignment analternative method is used consisting of simultaneous foldingand aligning RNAforester [93] is such an example Alignmentprediction methods for pseudoknots are more reliable onmultiple sequences KNetfold [88] is an example of aconsensus prediction method based on a multiple-sequencealignment

Successful examples include the structure models for the5S 16S and 23S ribosomal RNAs solved by the Gutellrsquos labusing alignments from hundreds of sequences [91] Thesemodels predict base pairs in very good agreement withexperimental data [92] Similarly the Westhof group predictedmodels for the group I intron [117 118] and the ribonucleaseP RNA [119] using similar methods

Resources available to the community at large forsuch alignments include Gutellrsquos lab database (httpwwwrnaccbbutexasedu) of aligned sequences secondarystructures and phylogenetic information for various RNAmolecules [91] The Rfam database also contains multiple-sequence alignments and consensus secondary structures forseveral RNAs [85]

53 Advances in and limitations to RNA secondary-structureprediction

Free energy minimization methods are based on a num-ber of thermodynamic parameters for base pairing basestacking loop lengths and other motifs Although expan-sions and recalculations of the parameters are now avail-able [84 100 101 120] thermodynamic approaches are stilllimited in terms of accuracy of the parameters and the incom-pleteness of the thermodynamic rules used Alternately ana-lyzing suboptimal states of RNA structures in addition to thefree energy minimum has been valuable However it has beenreported that about 73 of known canonical base pairs are pre-dicted by free energy minimization for sequences with less than700 nucleotides [76] Similarly parameter calculation meth-ods are limited to the availability of structural data Whilekinetic folding algorithms such as Kinwalker Kinefold andMPGAfold that incorporate RNA dynamics are very promis-ing because they simulate the dynamics of co-transcriptionalfolding (RNA folding with sequence elongation) they are stillat an early stage

Alignments of multiple RNA sequences present achallenge for two main reasons (1) only four letters areused on the basis of sequence similarity and (2) sequencealignment programs such as Blast [121] or Clustal [122] do notconsider information from secondary-structure conservationBoth are limitations because structure has evolved much

7

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 9: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 3 Some available programs for RNA tertiary-structure prediction and interactive manipulation

Program Input Model Simulation method DescriptionWeb page References

Automatic prediction

iFoldRNA Sequence Coarse-grainedthree-bead model

Replica exchangemolecular dynamics

Uses discrete molecular dynamics andforce fields to simulate RNA foldingdynamicshttptrollmedunceduifoldrna

[132 133]

FARNA (Rosetta) Sequencesecondarystructure

Coarse-grainedone-bead model

Fragment assemblyMonte Carlo

Uses 3-nt fragment library Monte Carlosimulations and a potential function topredict the structurehttpwwwrosettacommonsorgmanualsarchiverosetta30 userguideindexhtml

[125 127]

NAST Secondarystructuretertiary contacts

Coarse-grainedone-bead model

Moleculardynamics

Performs molecular dynamicssimulations guided by aknowledge-based statistical potentialfunction httpssimtkorghomenast

[131]

MC-Fold MC-Sym Sequencesecondarystructure

Nucleotide cyclicmotif

Fragment assemblyLas Vegasalgorithm

Predicts RNA secondary structuresusing free energy minimization withstructure assembled using the fragmentinsertion Las Vegas algorithmhttpwwwmajoririccaMC-Pipeline

[75]

Interactive manipulation

RNA2D3D Secondarystructure

All-atom model Interactivemanipulation

Performs molecular mechanics anddynamics and permits insertion ofcoaxial stacking and manipulation ofhelical elementshttpwwwccrnpncifcrfgovsimbshapirosoftwarehtml

[136]

Assemble Database ofknownfragments andmotifs

All-atom model Interactivemanipulation

Constructs a 3D structure using theinsertion of tertiary motifs and permitsmanipulation of torsion angleshttpwwwbioinformaticsorgassemble

No reference

more slowly than sequences and compensatory mutationssuch as GndashC by AndashU not considered in current sequencealignment approaches frequently occur since they conserve thesecondary structure

Current rigorous alignments of distantly related RNAsequences typically require consideration of both sequenceand secondary structure and are best performed manuallyHowever efforts are under way with the new formation ofthe RNA Structure Alignment Ontology [123] These newapproaches intend to generalize sequence alignments as a set oflsquocorrespondencersquo relations between whole regions rather thanbetween individual nucleotides

Despite great advances in RNA secondary-structureprediction many limitations remain Most prediction programscannot predict pseudoknots nor the multiple configurationsaccessible to one sequence (eg for a riboswitch ordomains within viruses) Pseudoknots are important becausetheir probability of occurrence increases sharply with RNAsize [124] Prediction also becomes less accurate as theRNA size increases and there is currently no approach forquantifying the likelihood or error of a particular predictionNonetheless as mentioned in section 41 many secondary-structure prediction programs are beginning to incorporateadditional experimental data to improve the prediction

accuracy Such hybrid approaches like RNAstructure willindeed make advances on both experimental and computationalfronts

6 RNA tertiary-structure prediction programs

Even though determining the secondary structure providesa blueprint of the RNA molecule it is the knowledge ofthe three-dimensional structure that allows us to understandits function as well as the possible interactions with othermolecules However in contrast to the advances achieved inprotein folding programs RNA structure prediction is still atan early stage Current 3D RNA folding algorithms requireeither manual manipulation or are generally limited to simplestructures Below we describe some of the most recentlydeveloped programs Table 3 provides more details

FARNA is an energy-based program developed by Dasand Baker [125] that predicts RNA 3D structures from asequence It was inspired by the Rosetta low resolution proteinstructure prediction method [126] To represent each baseFARNArsquos model consists of a one bead of a coarse-grainedmodel using each basersquos centroid as the bead origin Tocapture local conformational correlations observed in solved

8

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 10: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

RNAs FARNA builds a 3D structure library consisting of 3-nt fragments taken from a large rRNA subunit from whichtorsion angles and sugar puckering parameters are stored Thena simulation using Monte Carlo methods is used to assemblefragments into native-like structures The folding simulationis guided by a knowledge-based energy function that takesinto account both backbone conformations and side-chaininteraction preferences observed in solved structures Thisfunction includes a term for the radius of gyration a function topenalize steric clashes and functions that favor base stackingand the planarity of both canonical and non-canonical basepairs Knowledge from base pairs can also be incorporatedConstraints using structural inference of native RNAs havebeen used more recently through an experimental methodcalled multiplexed hydroxyl radical (ndashOH) cleavage analysis(MOHCA) [127] to enable detection of tertiary contacts andimprove FARNArsquos prediction For instance when the 158-nucleotide P4ndashP6 domain of the group I intron (PDB 1GID) iscompared to the structure predicted from sequence alone theroot mean square deviation (RMSD) value is 35 A In contrastprediction using secondary structure and data from MOHCAgives an RMSD value of 13 A [127]

Parisien and Major [75] developed a method for modelingRNA 3D structures using energy minimization that builds uponearlier work using predicted cyclic building blocks [128 129]The approach consists of a pipeline implementation of twoprograms MC-Fold and MC-Sym MC-Fold predicts RNAsecondary structure using a free energy minimization functionand the fragment insertion of nucleotide cyclic motifs formedby base pair and base stacking interactions MC-Sym buildsfull-atom models of RNA structures using the 3D versionof the nucleotide cyclic motif fragments The nucleotidecyclic fragment library was built from a list of 531 RNA 3Dstructures The fragment insertion simulation is performedusing the Las Vegas algorithm [130] a Monte Carlo algorithmthat either produces a correct result or reports that suchan answer cannot be found In the case of the MC-Symfragment insertion simulation the algorithm explores as manycorresponding 3D fragments as possible in a given timeand all RNA 3D structures generated are consistent withthe base pair and base stacking input constraints MC-Foldcan also incorporate experimental data in order to restrictthe conformational space input can be from RNA SHAPEchemistry or dimethyl sulfate (DMS) data (mentioned insection 2)

The nucleic acid simulation tool or NAST [131] is acoarse-grained molecular dynamics simulation tool consistingof a knowledge-based statistical potential function Eachresidue is represented as a single pseudo-atom centered at theC3prime atom NAST requires secondary-structure informationand if available tertiary contacts to direct the foldingIn addition the knowledge-based function uses geometricdistances angles and dihedrals from the available ribosomalstructures using C3prime atoms between two three and foursequential residues respectively A repulsive term for non-bonded interactions based on the Lennard-Jones potential isalso applied

iFoldRNA [132] is a Web-based program designedby Dokholyanrsquos group for predicting RNA structures from

sequence It is based on discrete molecular dynamics(DMD) [133] and a tailored force field [134] algorithms forsimulating RNA folding dynamics The coarse-grained modelis composed of three beads for each nucleotide positioned atthe center of mass of the phosphate group sugar ring and six-atom ring in the base The structurersquos angles dihedrals andbonds are used to construct a stepwise potential function thataccounts for base stacking short-range phosphatendashphosphaterepulsion and hydrophobic interactions To explore thepotential energy landscape of the molecular system iFoldRNAuses multiple simulations or replicas of the same systemperformed in parallel at different temperatures The discretemolecular dynamics algorithm is based on a stepwise potentialfunction [133 135] Like NAST and MC-Sym iFoldRNAincorporates data from tertiary contacts based on experimentalcontacts from SHAPE chemistry [74] to overcome sizelimitations

RNA2D3D [136] is a manual-input program that uses datafrom sequence and secondary structure to build a first-orderapproximation RNA 3D model The program allows the userto manually add or remove base pairs and incorporate coaxialstacking interactions useful for exploring diverse RNA con-formations ASSEMBLE is a similar tool created by Jossinetand Westhof (httpwwwbioinformaticsorgassemble) Theprogram uses either secondary-structure information or tertiaryinformation from homologous RNAs to construct a 3D modelmanually ASSEMBLE allows the manual insertion of basepairs and motifs as well as torsion angle modifications rota-tions and translations of modular elements While these user-input tools are useful they rely on manual application of ex-pert knowledge Unfortunately there are only a few of theseexperts

7 Comparison of automated RNA 3D foldingalgorithms

To evaluate the performance of some of the most recent 3Dprediction programs we provide an independent comparativestudy for RNAs in a high resolution data set representedby various RNA lengths and structural features Becausethe programs we consider are recent they are still underdevelopment and could be improved in the future Thereforeour data reflect the state of the art in automatic predictionprograms at the outset of 2010

71 Methods

We consider the tertiary-structure prediction programsFARNA iFoldRNA and MC-FoldMC-Sym (MC for short)for prediction using sequence information only In additiona second comparison among FARNA MC and NAST is con-sidered for prediction based on secondary-structure data Al-though NAST can accept input information from tertiary con-tacts which can dramatically improve prediction accuracy forthe purpose of equal comparison we only produce NASTstructures using secondary-structure data We did not ex-plicitly evaluate computational efficiency however the gen-eral trend was that NAST is less CPU time intensive than

9

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 11: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 4 List of RNA 3D structures predicted using several computer programs The RNA structures are selected with diversity in size (NTs)canonical base pairs (BPs) non-canonical base pairs (NC BPs) and structural complexity such as formation of hairpins internal loopsjunctions and pseudoknots The symbols lsquorsquo and lsquorsquo in the first column next to the PDB code denote the RNA structures that MC and NASTfailed to predict respectively

PDB NTs BPs NC BPs Resolution Organism Structure

2F8K 16 6 0 2 S cerevisiae Hairpin2AB4 20 7 1 24 T maritima Hairpin361D 20 10 1 3 Synthetic Hairpin2ANN 23 6 3 23 H sapiens Hairpin1RLG 25 9 9 27 A fulgidus Hairpin internal loop2QUX 25 9 0 24 P phage PP7 Hairpin387D 26 5 1 31 Synthetic Hairpin1MSY 27 10 4 14 Synthetic Hairpin1L2X 28 14 6 13 Synthetic Pseudoknot2AP5 28 8 3 NA YLV virus Pseudoknot1JID 29 12 2 18 H sapiens Hairpin internal loop1OOA 29 18 1 25 M musculus Hairpin internal loop (aptamer)430D 29 12 4 21 Synthetic Hairpin internal loop2IPY 30 12 0 28 O cuniculus Hairpin internal loop2OZB 33 12 3 26 H sapiens Hairpin internal loop1MJI 34 8 4 25 T thermophilus Hairpin internal loop1ET4 35 50 10 23 Synthetic Pseudoknot2HW8 36 15 5 21 T thermophilus Hairpin internal loop1I6U 37 34 4 26 M jannaschii Hairpin internal loop1F1T 38 14 5 28 Synthetic Hairpin internal loops (aptamer)1ZHO 38 15 4 26 T thermophilus Hairpin internal loops1S03 47 13 3 27 E coli Hairpin internal loops1XJR 47 19 3 27 Synthetic Hairpin internal loops1U63 49 19 3 34 M jannaschii Hairpin internal loop2PXB 49 21 6 2 E coli Hairpin internal loops2FK6 53 20 3 29 B subtilis Pseudoknot three-way junction3E5C 53 21 5 23 Synthetic Three-way junction (riboswitch)1MZP 55 17 12 27 S acidocaldarius Hairpin internal loops1DK1 57 24 4 28 T thermophilus Three-way junction1MMS 58 20 13 26 T maritima Three-way junction3EGZ 65 23 5 22 H sapiens Three-way junction (riboswitch)2QUS 69 26 2 24 Synthetic Pseudoknot three-way junction1KXK 70 28 5 3 Synthetic Hairpin internal loops2DU3 71 27 5 26 A fulgidus Four-way junction (tRNA)2OIU 71 29 3 26 Synthetic Three-way junction (ribozyme)1SJ4 73 19 8 27 HDV virus Pseudoknot four-way junction1P5O 77 29 5 NA HCV virus Hairpin internal loops3D2G 77 28 15 23 A thaliana Three-way junction (riboswitch)2HOJ 79 27 12 25 Synthetic Three-way junction (riboswitch)2GDI 80 32 12 2 Synthetic Three-way junction (riboswitch)2GIS 94 36 13 29 Synthetic Pseudoknot four-way junction (riboswitch)1LNG 97 38 14 23 M jannaschii Three-way junction (SRP)1MFQ 128 49 16 31 H Sapiens Three-way junction (SRP)

iFoldRNA FARNA and MC Structures generated using NASTand FARNA were computed using a Linux Intel Xeon 30 GHzprocessor Structures generated using iFoldRNA and MC werecomputed using their own Web servers Most programs returnresults in less than a few hours For instance the calculationfor predicting RNA 49-nt long takes about 4 min using NAST27 min using iFoldRNA 31 min using MC-Sym and 37 minusing FARNA

Our RNA data set consists of 43 high resolution (34 A orbetter) structures of diverse sizes and motifs (see table 4) Bothsecondary and tertiary structures have been experimentallydetermined for these RNAs The length varies from 16 to128 nucleotides and the topologies range from hairpins tocomplex RNAs such as riboswitches pseudoknots and RNAscontaining junctions Figure 3 shows representative examples

of our data set As part of the pseudoknot category weinclude RNA structures with Kissing hairpins which are loopndashloop interactions between two hairpins forming WC basepairs

To evaluate prediction performance we use the root meansquare deviations (RMSD) from the reference known structureas well as the deviation index (DI) which is a measurethat accounts for base pair and base stacking interactionsand is defined as the quotient between the RMSD and thesquared root of the specificity (PPV) times the sensitivity(STY) of base pair and base stacking interactions as definedin [137] In brief PPV represents the percentage of correctlypredicted interactions that are in the crystal structure whileSTY represents the percentage of interactions in the crystal

10

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 12: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 3 Examples of RNA structures considered for prediction These RNAs vary in length and structural complexity including hairpins(PDB 2QUS 2OZB) pseudoknots (PDB 2AP5) and junctions (PDB 2DU3 1LNG)

structure predicted without accounting for the false positives2The smaller the DI value the better the prediction RMSDvalues were computed for the backbone using VMD [138]Base pair and base stacking interactions were determinedusing FR3D [139] An average of the RMSD and DI valuesfor structures that have the same number of nucleotides wasconsidered The accuracy measured with the PPV and STYvalues is presented in figure 4

The MC and NAST programs failed to produce a foldedstructure in a few cases (see table 4) MC was unable to predictsome structures both from the sequence and the secondarystructure possibly due to the lack of a sufficient numberof cyclic motif fragments to insert Similarly NAST failedin some cases possibly due to the design of the fragmentassembly algorithm Both iFoldRNA and FARNA predicteda structure for every element in the data set

To ensure a reliable predicted model in FARNA we used50 000 iteration cycles Similarly in NAST we consider onemillion steps3 In the MC-FoldMC-Sym pipeline when we

2 PPV = TP(TP + FP) STY = TP(TP + FN) where TP = number ofcorrectly predicted interactions found in the solved structure FP = number ofpredicted interactions not found in the solved structure and FN = number ofinteractions in the solved structure that are not predicted by the algorithm3 The PDB structure 1N32 (16S rRNA) was used for fragment samples duringthe all-atom model formation step

predict the structure solely from the sequence we considerthe secondary structure that is ranked first by MC-Fold Forthe 3D structure selection we use the first-ranked structureaccording to the scoring function available on the MC-Symanalysis tool Although in some programs better predictionsmight be possible by tuning many parameters we used onlythe recommended values to make an unbiased comparison

72 Tertiary-structure prediction results

Specificity (PPV) and sensitivity (STY) values describing theprediction accuracy for each program and system are shownin figure 4 Both PPV and STY take values between 0and 1 where better predictions approach 1 Figure 4 showsthat both PPV and STY values for MC are comparable andhigher than those for the rest of the programs This is duemostly to the library that MC uses that assembles nucleotidecyclic fragments by stacking base pairs In contrast FARNAiFoldRNA and NAST all have their PPV values higher thanSTY values Thus a smaller number of false interactionpredictions are produced at the expense of a smaller number ofcorrect predictions An improvement in FARNArsquos predictionis observed when the secondary-structure (base pair and basestacking) information is included particularly in STY valueswhere their average value increases from 037 to 063

11

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 13: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 4 RNA structure prediction analysis Specificity (PPV) andsensitivity (STY) values are sorted by length and compared amongdifferent prediction programs on the basis of sequence (upper chart)or secondary-structure information (lower chart) The curves arepresented using the moving average trend lines based on twopreceding values for each point

A different representation of the accuracy of each programis given using the RMSD and DI values as shown infigure S1 (available at stacksioporgJPhysCM22283101mmedia) Dashed lines represent the linear approximation tothe RMSD values and solid lines approximate the DI valuesAs expected both RMSD and DI values increase rapidly as thelength of the molecules increases The best values start around6 A RMSD considered poor for protein predictions We seethat the performance improves dramatically when secondary-structure information is added (lower graph) as opposed to thecase for prediction using the sequence only (upper graph) withthe best predictions now starting around 4 A RMSD Except forFARNA and iFoldRNA the slopes of the lines (when scalingthe RMSD to DI values) do not change drastically reinforcingthe independence of these values from RNA size

The accuracy of each program varies from structureto structure In general terms MC performs better inboth prediction experiments (figure 4) Program iFoldRNAperforms better than FARNA for small molecules butFARNA improves as the length increases FARNA performsslightly better than NAST but the difference in DI valuesshows that FARNA makes better use of secondary-structureinformation A difference is also noted in the slope between

the lines that describe the RMSD and DI accuracy values foriFoldRNA (blue lines upper graph of figure S1 availableat stacksioporgJPhysCM22283101mmedia) This impliesthat iFoldRNArsquos accuracy in predicting local features such asbase pair and base stacking interactions is poorer than that forglobal structure prediction

For further analysis we next separate the data by structuralcontext (hairpins pseudoknots and junctions) Hairpinsare easiest to predict (top row of figure S2 available atstacksioporgJPhysCM22283101mmedia) with lowest DIvalues observed Prediction using sequence show that exceptfor a few structures iFoldRNA and MC perform similarly Inparticular the peak at length 23 corresponds to a hairpin with15-nt in the single-stranded region and because alternativesecondary structures with more canonical base pairs arepossible predictions are more difficult Similarly peaksat length 26 33 and 36 correspond to structures that aredifficult to predict The main reason is that for these cases(PDB 387D 2OZB and 2HW8) the native structure includesproteins that have kinked internal loops thus distorting theRNA structure (eg see 2OZB in figure 3) Predictionusing secondary structure shows that NAST and FARNA havesimilar accuracies In general MC performs better due tothe nature of its cyclic fragments model that includes hairpinloops (figure 5(a)) however many non-canonical base pairinteractions in the loop regions are not predicted

The performance for pseudoknots (middle row offigure S2 available at stacksioporgJPhysCM22283101mmedia) shows that the accuracy of prediction of FARNAand iFoldRNA and of FARNA and NAST are overall similarMC performs better in both experiments but fails to producea model for most of the pseudoknot structures predicted fromsecondary structure Figures 5(b) and (c) shows the backboneconformation of the solved pseudoknot structure (PDB 1L2X)in orange with a yellow background against the backbone ofthe predicted structures Predictions from the sequence alonefail to produce a more compact structure (figure 5(b)) This isexpected since in general prediction from the sequence of RNAstructures with pseudoknots is difficult In contrast a greatimprovement occurs when the secondary-structure informationis given (figure 5(c))

Prediction results of structures containing a junctionare shown in the last row of figure S2 (available atstacksioporgJPhysCM22283101mmedia) Although MCoften outperforms other programs it could not predict astructure for most of the RNA junctions even when secondary-structure information is given DI values for FARNA andiFoldRNA and for FARNA and NAST are comparable Likefor pseudoknot prediction if secondary-structure data aregiven then the prediction performance improves considerablyHowever the long-range interactions that often stabilize helicalelements are required to produce an accurate model asobserved in figure 5(d) for the models produced by NAST(cyan curve) and FARNA (green curve) Also the peaks inDI values at lengths 65 and 79 shown in figure S2 (availableat stacksioporgJPhysCM22283101mmedia) (last row left)correspond to two riboswitches that are structurally complex

Structure prediction using MC regularly gives the mostaccurate model However this program often fails to predict

12

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 14: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Figure 5 Structural alignments show the accuracy of diverse prediction programs (a) Hairpin structures predicted from sequences usingFARNA (green middle) iFoldRNA (blue left) and MC-FoldMC-Sym (magenta right) against the known structure (orange with yellowbackground) ((b) (c)) Pseudoknot structures are predicted both from the sequence (b) and from secondary structure for iFoldRNA FARNAand MC respectively in (b) and FARNA and NAST respectively NAST (cyan) in (c) is also included (d) Three-way junction structurepredictions using secondary structures are aligned against the solved structure (orange with yellow background) for MC FARNA and NAST

a structure (see table 4) and while helical regions are wellmodeled non-canonical base pairs occurring at the loop regionare rarely predicted Although iFoldRNA often outperformsFARNA on predicting hairpins their performances are similarfor predicting pseudoknots and junctions Similarly the

NAST and FARNA accuracies of prediction of hairpins usingsecondary-structure data are comparable

Both FARNA and MC allow prediction of multiple foldsor suboptimal folds Although in a true prediction contextone does not have knowledge of the correct structure the

13

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 15: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

Table 5 List of performance values from suboptimal structures predicted using MC-FoldMC-Sym (left) and FARNA (right) usingsecondary-structure information The structures consist of two hairpins and two junctions Values in bold font correspond to the onespresented in figures 4 S1 and S2 (available at stacksioporgJPhysCM22283101mmedia) The average and best performance values arealso shown in bold

MC-FoldMC-Sym FARNA

Structure Name NTs PPV STY RMSD DI Name NTs PPV STY RMSD DI

1KXKm1 70 091 082 905 1051 1KXKr1 70 084 070 2083 27291KXKm2 70 089 084 871 1003 1KXKr2 70 084 076 1475 18461KXKm3 70 091 083 940 1079 1KXKr3 70 088 075 1265 15531KXKm4 70 087 079 921 1112 1KXKr4 70 086 076 1736 21511KXKm5 70 087 079 1106 1336 1KXKr5 70 085 075 2056 2564

1KXKm 70 091 082 906 1052 1KXKr 70 087 066 1458 1926

Average 089 081 949 1116 Average 085 074 1723 2169Best 091 084 871 1003 Best 088 076 1265 1553

1XJRm1 47 089 074 1224 1511 1XJRr1 47 084 078 888 10971XJRm2 47 090 082 933 1091 1XJRr2 47 084 063 1150 15831XJRm3 47 085 077 572 708 1XJRr3 47 087 071 1405 17931XJRm4 47 086 074 772 970 1XJRr4 47 078 078 1166 14871XJRm5 47 086 074 868 1091 1XJRr5 47 081 066 1206 1647

1XJRm 47 092 082 1080 1248 1XJRr 47 086 065 1031 1378

Average 087 076 874 1074 Average 083 071 1163 1521Best 090 082 572 708 Best 087 078 888 1097

2OIUm1 71 092 074 1966 2386 2OIUr1 71 084 047 1598 25342OIUm2 71 092 076 1402 1678 2OIUr2 71 091 064 1951 25542OIUm3 71 092 075 1819 2192 2OIUr3 71 088 056 1984 28352OIUm4 71 093 080 1553 1800 2OIUr4 71 085 070 1666 2158

2OIUr5 71 088 075 1848 2280

2OIUm 71 091 083 1402 1620 2OIUr 71 091 083 1624 1877

Average 092 076 1685 2014 Average 087 063 1810 2472Best 093 080 1402 1678 Best 091 075 1598 2158

2QUSm1 69 087 079 1592 1919 2QUSr1 69 087 052 1680 24882QUSm2 69 086 077 2090 2569 2QUSr2 69 088 073 1207 1503

2QUSr3 69 084 046 1741 28212QUSr4 69 090 061 1320 17782QUSr5 69 080 058 1918 2811

2QUSm 69 082 079 1592 1985 2QUSr 69 092 075 1406 1698

Average 086 078 1841 2244 Average 086 058 1573 2280Best 087 079 1592 1919 Best 090 073 1207 1503

study of such multiple folds helps determine the robustnessin the prediction programs We considered a group of fourrepresentative structures consisting of two hairpins and twojunctions (see table 5) and predicted up to five multiple

folds for each structure In some cases MC predicted fewersuboptimal folds FARNA produced all five folds for eachcase MC produces overall similar results for the 2D structuresince the best structures having PPV and STY values (08ndash09)

14

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 16: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

are similar to the average The RMSD values from the bestperformances are below 95 for hairpins and below 185 Afor junctions Thus as the degree of topological complexityincreases MCrsquos performance worsens In FARNA the STYvalues from the multiple folds are better for hairpins but worsefor junctions The fact that similar 2D structures yield verydifferent 3D folds underscores the central difficulty in tertiaryRNA prediction

Furthermore we analyzed the prediction accuracy ofMC and FARNA from multiple folds using only sequenceinformation on a hairpin (PDB 1KXK) and a junction (PDB2QUS) structure (see table S1 in the supplementary materialavailable at stacksioporgJPhysCM22283101mmedia) MCpredictions show a drop in PPV average values from 09 to08 Interestingly the average RMSD values are similar tothe predictions using secondary-structure information shownabove this suggests that information from secondary structureaffects more local features in these cases Improvingtertiary interactions however depends on the accuratedetermination of long-range contacts to position the RNArsquoshelical elements Prediction accuracy with FARNA using onlysequence information reduces considerably both local (PPVand STY) as well as global (RMSD) features showing thatboth secondary- and tertiary-contact information is required foraccurate prediction in these cases

8 Discussion

Significant advances have been made over the last decade inRNA modeling along with increasing computational resourcesand technologies for RNA investigations It is encouragingto see that many of these advances have been achieved withrelatively simple models which combine energy or statisticalpotentials and fragment assembly techniques These likelyare effective due to RNArsquos modular architecture and structuralhierarchy Still further developments and refinements of theexisting models are needed

One of the major challenges in RNA tertiary-structureprediction is the determination of long-range interactionsOne possible avenue for determining tertiary contacts maycome from studies using multiple-sequence analysis Bycomparing instances of each recurrent base pair or a largerstructural element such as an RNA motif one can identifyneutral mutations that preserve its structure and function [32]Programs like SHEVEK [140] and ISFOLD [141] that usemultiple-sequence alignment approaches are a step in thatdirection

Current algorithms have shown how the use of exper-imental data can dramatically improve the structure predic-tion [63 73ndash75] But data from experimental techniques suchas RNA SHAPES chemistry can potentially provide more in-formation than just canonical WC base pairs To determinenon-canonical base pairs and tertiary contacts further refine-ments in order to better exploit this information should be con-sidered Similarly a program for automatically restricting thestructural conformation space of a model on the basis of lowresolution cryo-EM data might be valuable

Several programs have exploited the hierarchical prop-erties of RNA molecules For instance part of MC-Symprotocolrsquos success rests on the use of cyclic motifs Yet cur-rent automatic prediction methods cannot exploit the modularproperties of the more complex tertiary motifs as is done in AS-SEMBLE and RNA2D3D Similarly recent studies on basendashphosphate interactions [33] have shown that such interactionsare sufficiently frequent to be considered in an energy functionsimilar to FARNArsquos base pairing and base stacking interactionfunctions

The Rosetta method has proven successful for proteinprediction A successful companion method for RNA requireschanges in both the energy function and fragment elementsFor instance in contrast to the compact globular shapes ofproteins rather compact prolate ellipsoidal shapes are favoredby RNA molecules [65 142] Functions like the radius ofgyration that reward globular molecular conformations mightnot suit RNA In addition studies have shown that junctionsarrange their helical arms in parallel and perpendicularconfigurations [60 61] A new scoring function thatencourages these conformations can be helpful

While the use of scoring functions that describe RNAndashioninteractions will certainly improve RNA structure predictionthe computational effort of such a task might not be feasibleinitially Assembly from structure fragments already in thepresence of cations can circumvent this limitation as is donein NAST FARNA and MC Similarly the use of RNAndashproteininteraction prediction programs can help improve predictionsof RNAs in the presence of proteins

In general the prediction accuracy improves with addedknowledge from the secondary structure and tertiary contactsbut these interactions can be greatly affected by the presenceof proteins The lack of correct functions that favor compactRNA-like structure and the failure to detect the presence oflong-range interaction contacts remain challenges

However with the increasing interest and creation ofa community devoted to RNA bioinformatics [39] we cananticipate many exciting developments in automated RNAstructure prediction approaches in the coming decade Thegrowing appreciation for the biological importance of RNAmakes such efforts more essential than ever

Acknowledgments

The work was supported by the Human Frontier ScienceProgram (HFSP) by a joint NSFNIGMS initiative inMathematical Biology (DMS-0201160) and by NSF EMTaward no CF-0727001 Partial support by NIH (grant no R01-GM055164) NIH (grant no 1 R01 ES 012692) NSF (grantno CCF-0727001) and Philip Morris USA Inc and PhilipMorris International is also gratefully acknowledged Theauthors thank Abdul Iqbal Yoon Ha and Segun Jung for theirhelp with data preparation and Bruce Shapiro for a criticalreading of this manuscript We also thank Magdalena Jonikasfor helping with NAST implementation and Feng Ding forhelping with the iFoldRNA applications

15

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 17: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

References

[1] Hannon G J 2002 RNA interference Nature 418 244ndash51[2] Gillet R and Felden B 2001 Emerging views on

tmRNA-mediated protein tagging and ribosome rescueMol Microbiol 42 879ndash85

[3] Masse E and Gottesman S 2002 A small RNA regulates theexpression of genes involved in iron metabolism inEscherichia coli Proc Natl Acad Sci USA 99 4620ndash5

[4] Ruvkun G 2001 Molecular biology Glimpses of a tiny RNAworld Science 294 797ndash9

[5] Calin G A et al 2002 Frequent deletions and down-regulationof micro-RNA genes miR15 and miR16 at 13q14 inchronic lymphocytic leukemia Proc Natl Acad Sci USA99 15524ndash9

[6] Michael M Z et al 2003 Reduced accumulation of specificmicroRNAs in colorectal neoplasia Mol Cancer Res1 882ndash91

[7] Takamizawa J et al 2004 Reduced expression of the let-7microRNAs in human lung cancers in association withshortened postoperative survival Cancer Res 64 3753ndash6

[8] Hayashita Y et al 2005 A polycistronic microRNA clustermiR-17-92 is overexpressed in human lung cancers andenhances cell proliferation Cancer Res 65 9628ndash32

[9] He L et al 2005 A microRNA polycistron as a potentialhuman oncogene Nature 435 828ndash33

[10] Metzler M et al 2004 High expression of precursormicroRNA-155BIC RNA in children with Burkittlymphoma Genes Chromosom Cancer 39 167ndash9

[11] Tagawa H and Seto M 2005 A microRNA cluster as a target ofgenomic amplification in malignant lymphoma Leukemia19 2013ndash6

[12] Chworos A et al 2004 Building programmable jigsaw puzzleswith RNA Science 306 2068ndash72

[13] Jaeger L and Chworos A 2006 The architectonics ofprogrammable RNA and DNA nanostructures Curr OpinStruct Biol 16 531ndash43

[14] Haasnoot J and Berkhout B 2006 RNA interference its use asantiviral therapy Handb Exp Pharmacol 173 117ndash50

[15] Haasnoot J Westerhout E M and Berkhout B 2007 RNAinterference against viruses strike and counterstrike NatBiotechnol 25 1435ndash43

[16] Birney E et al 2007 Identification and analysis of functionalelements in 1 of the human genome by the ENCODEpilot project Nature 447 799ndash816

[17] Amaral P P et al 2008 The eukaryotic genome as an RNAmachine Science 319 1787ndash9

[18] Sharp P A 2009 The centrality of RNA Cell 136 577ndash80[19] Breaker R R 2002 Engineered allosteric ribozymes as

biosensor components Curr Opin Biotechnol 13 31ndash9[20] Sioud M 2006 Ribozymes and siRNAs from structure to

preclinical applications Handb Exp Pharmacol173 223ndash42

[21] Romby P Vandenesch F and Wagner E G 2006 The role ofRNAs in the regulation of virulence-gene expression CurrOpin Microbiol 9 229ndash36

[22] Afonin K A et al 2008 TokenRNA a new type ofsequence-specific label-free fluorescent biosensor forfolded RNA molecules Chembiochem 9 1902ndash5

[23] Yingling Y G and Shapiro B A 2007 Computational design ofan RNA hexagonal nanoring and an RNA nanotubeNano Lett 7 2328ndash34

[24] Khaled A et al 2005 Controllable self-assembly ofnanoparticles for specific delivery of multiple therapeuticmolecules to cancer cells using RNA nanotechnologyNano Lett 5 1797ndash808

[25] Li L et al 2009 Evaluation of specific delivery of chimericphi29 pRNAsiRNA nanoparticles to multiple tumor cellsMol Biosyst 5 1361ndash8

[26] Gardner P P and Giegerich R 2004 A comprehensivecomparison of comparative RNA structure predictionapproaches BMC Bioinformatics 5 140

[27] Mathews D H and Turner D H 2006 Prediction of RNAsecondary structure by free energy minimization CurrOpin Struct Biol 16 270ndash8

[28] Shapiro B A et al 2007 Bridging the gap in RNA structureprediction Curr Opin Struct Biol 17 157ndash65

[29] Marti-Renom M A and Capriotti E 2008 Computational RNAStructure Prediction Curr Bioinform 3 32ndash45

[30] Schroeder S J 2009 Advances in RNA structure predictionfrom sequence new tools for generating hypotheses aboutviral RNA structurendashfunction relationships J Virol83 6326ndash34

[31] Leontis N B Lescoute A and Westhof E 2006 The buildingblocks and motifs of RNA architecture Curr Opin StructBiol 16 279ndash87

[32] Leontis N B Stombaugh J and Westhof E 2002 Thenon-WatsonndashCrick base pairs and their associatedisostericity matrices Nucleic Acids Res 30 3497ndash531

[33] Zirbel C L et al 2009 Classification and energetics of thebasendashphosphate interactions in RNA Nucleic Acids Res37 4898ndash918

[34] Waterman M S and Smith T F 1978 RNA secondary structurea complete mathematical analysis Math Biosci 42 257ndash66

[35] Benedetti G and Morosetti S 1996 A graph-topologicalapproach to recognition of pattern and similarity in RNAsecondary structures Biophys Chem 59 179ndash84

[36] Le S Y Nussinov R and Maizel J V 1989 Tree graphs of RNAsecondary structures and their comparisons ComputBiomed Res 22 461ndash73

[37] Shapiro B A and Zhang K Z 1990 Comparing multiple RNAsecondary structures using tree comparisons Comput ApplBiosci 6 309ndash18

[38] Gan H H et al 2004 RAG RNA-As-graphsdatabasemdashconcepts analysis and features Bioinformatics20 1285ndash91

[39] Schlick T 2009 Mathematical and biological scientists assessthe state-of-the-art in RNA science at an IMA workshoplsquoRNA in biology bioengineering and biotechnologyrsquo Int JMultiscale Comput Eng at press

[40] Al-Hashimi H M and Walter N G 2008 RNA dynamics it isabout time Curr Opin Struct Biol 18 321ndash9

[41] Cruz J A and Westhof E 2009 The dynamic landscapes ofRNA architecture Cell 136 604ndash9

[42] Draper D E 2008 RNA folding thermodynamic and moleculardescriptions of the roles of ions Biophys J 95 5489ndash95

[43] Tinoco I Jr and Bustamante C 1999 How RNA folds J MolBiol 293 271ndash81

[44] Greenleaf W J et al 2008 Direct observation of hierarchicalfolding in single riboswitch aptamers Science 319 630ndash3

[45] Noeske J et al 2007 Interplay of lsquoinduced fitrsquo andpreorganization in the ligand induced folding of theaptamer domain of the guanine binding riboswitch NucleicAcids Res 35 572ndash83

[46] Wu M Jr and Tinoco I 1998 RNA folding causes secondarystructure rearrangement Proc Natl Acad Sci USA95 11555ndash60

[47] Pan J Thirumalai D and Woodson S A 1997 Folding of RNAinvolves parallel pathways J Mol Biol 273 7ndash13

[48] Chauhan S and Woodson S A 2008 Tertiary interactionsdetermine the accuracy of RNA folding J Am Chem Soc130 1296ndash303

[49] Xin Y et al 2008 Annotation of tertiary interactions in RNAstructures reveals variations and correlations RNA14 2465ndash77

[50] Montange R K and Batey R T 2008 Riboswitches emergingthemes in RNA structure and function Annu Rev Biophys37 117ndash33

16

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 18: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

[51] Koev G et al 2002 The 3prime or minute-terminal structure requiredfor replication of Barley yellow dwarf virus RNA containsan embedded 3prime or minute end Virology 292 114ndash26

[52] Linnstaedt S D et al 2006 The role of a metastable RNAsecondary structure in hepatitis delta virus genotype IIIRNA editing RNA 12 1521ndash33

[53] McCormack J C et al 2008 Structural domains within the 3primeuntranslated region of Turnip crinkle virus J Virol82 8706ndash20

[54] Olsthoorn R C et al 1999 A conformational switch at the 3primeend of a plant virus RNA regulates viral replication EmboJ 18 4856ndash64

[55] Pan T and Sosnick T 2006 RNA folding during transcriptionAnnu Rev Biophys Biomol Struct 35 161ndash75

[56] Wong T N Sosnick T R and Pan T 2007 Folding of noncodingRNAs during transcription facilitated by pausing-inducednonnative structures Proc Natl Acad Sci USA104 17995ndash8000

[57] Leontis N B and Westhof E 2001 Geometric nomenclature andclassification of RNA base pairs RNA 7 499ndash512

[58] Draper D E 2004 A guide to ions and RNA structure RNA10 335ndash43

[59] Ramaswamy P and Woodson S A 2009 Global stabilization ofrRNA structure by ribosomal proteins S4 S17 and S20J Mol Biol 392 666ndash77

[60] Laing C et al 2009 Tertiary motifs revealed in analyses ofhigher-order RNA junctions J Mol Biol 393 67ndash82

[61] Laing C and Schlick T 2009 Analysis of four-way junctions inRNA structures J Mol Biol 390 547ndash59

[62] Lescoute A and Westhof E 2006 Topology of three-wayjunctions in folded RNAs RNA 12 83ndash93

[63] Wang J et al 2009 A method for helical RNA global structuredetermination in solution using small-angle x-ray scatteringand NMR measurements J Mol Biol 393 717ndash34

[64] Kim S H et al 1974 The general structure of transfer RNAmolecules Proc Natl Acad Sci USA 71 4970ndash4

[65] Ban N et al 2000 The complete atomic structure of the largeribosomal subunit at 24 A resolution Science 289 905ndash20

[66] Schluenzen F et al 2000 Structure of functionally activatedsmall ribosomal subunit at 33 angstroms resolution Cell102 615ndash23

[67] Wimberly B T et al 2000 Structure of the 30S ribosomalsubunit Nature 407 327ndash39

[68] Felden B 2007 RNA structure experimental analysis CurrOpin Microbiol 10 286ndash91

[69] Hohng S et al 2004 Conformational flexibility of four-wayjunctions in RNA J Mol Biol 336 69ndash79

[70] Walter F et al 1998 Global structure of four-way RNAjunctions studied using fluorescence resonance energytransfer RNA 4 719ndash28

[71] Merino E J et al 2005 RNA structure analysis at singlenucleotide resolution by selective 2prime-hydroxyl acylationand primer extension (SHAPE) J Am Chem Soc127 4223ndash31

[72] Mortimer S A and Weeks K M 2009 Time-resolved RNASHAPE chemistry quantitative RNA structure analysis inone-second snapshots and at single-nucleotide resolutionNat Protoc 4 1413ndash21

[73] Deigan K E et al 2009 Accurate SHAPE-directed RNAstructure determination Proc Natl Acad Sci USA106 97ndash102

[74] Gherghe C M et al 2009 Native-like RNA tertiary structuresusing a sequence-encoded cleavage agent and refinementby discrete molecular dynamics J Am Chem Soc131 2541ndash6

[75] Parisien M and Major F 2008 The MC-Fold and MC-Sympipeline infers RNA structure from sequence data Nature452 51ndash5

[76] Mathews D 2004 Predicting the secondary structure commonto two RNA sequences with Dynalign Curr ProtocBioinform chapter 12 p unit 124

[77] Sclavi B et al 1998 Following the folding of RNA withtime-resolved synchrotron x-ray footprinting MethodsEnzymol 295 379ndash402

[78] Duan S Mathews D H and Turner D H 2006 Interpretingoligonucleotide microarray data to determine RNAsecondary structure application to the 3prime end of Bombyxmori R2 RNA Biochemistry 45 9819ndash32

[79] Latham M P et al 2005 NMR methods for studying thestructure and dynamics of RNA Chembiochem 6 1492ndash505

[80] Zidek L Stefl R and Sklenar V 2001 NMR methodology forthe study of nucleic acids Curr Opin Struct Biol11 275ndash81

[81] Zuo X et al 2010 Solution structure of the cap-independenttranslational enhancer and ribosome-binding element in the3prime UTR of turnip crinkle virus Proc Natl Acad Sci USA107 1385ndash90

[82] Holbrook S R Holbrook E L and Walukiewicz H E 2001Crystallization of RNA Cell Mol Life Sci 58 234ndash43

[83] Spahn C M et al 2004 Cryo-EM visualization of a viralinternal ribosome entry site bound to human ribosomes theIRES functions as an RNA-based translation factor Cell118 465ndash75

[84] Andronescu M et al 2007 Efficient parameter estimation forRNA secondary structure prediction Bioinformatics23 i19ndash28

[85] Griffiths-Jones S et al 2003 Rfam an RNA family databaseNucleic Acids Res 31 439ndash41

[86] Hofacker I L and Stadler P F 2006 Memory efficient foldingalgorithms for circular RNA secondary structuresBioinformatics 22 1172ndash6

[87] Zuker M and Stiegler P 1981 Optimal computer folding oflarge RNA sequences using thermodynamics and auxiliaryinformation Nucleic Acids Res 9 133ndash48

[88] Bindewald E and Shapiro B A 2006 RNA secondary structureprediction from sequence alignments using a network ofk-nearest neighbor classifiers RNA 12 342ndash52

[89] Do C B Woods D A and Batzoglou S 2006 CONTRAfoldRNA secondary structure prediction without physics-basedmodels Bioinformatics 23 e90ndash8

[90] Hamada M et al 2009 Prediction of RNA secondary structureusing generalized centroid estimators Bioinformatics25 465ndash73

[91] Cannone J J et al 2002 The comparative RNA Web (CRW)site an online database of comparative sequence andstructure information for ribosomal intron and otherRNAs BMC Bioinformatics 3 2

[92] Gutell R R Lee J C and Cannone J J 2002 The accuracy ofribosomal RNA comparative structure models Curr OpinStruct Biol 12 301ndash10

[93] Hochsmann M Voss B and Giegerich R 2004 Pure multipleRNA secondary structure alignments a progressive profileapproach IEEEACM Trans Comput Biol Bioinform1 53ndash62

[94] Hofacker I L Fekete M and Stadler P F 2002 Secondarystructure prediction for aligned RNA sequences J MolBiol 319 1059ndash66

[95] Ruan J Stormo G D and Zhang W 2004 An iterated loopmatching approach to the prediction of RNA secondarystructures with pseudoknots Bioinformatics 20 58ndash66

[96] Touzet H and Perriquet O 2004 CARNAC folding families ofrelated RNAs Nucleic Acids Res 32 w142ndash5

[97] Jossinet F Ludwig T E and Westhof E 2007 RNA structurebioinformatic analysis Curr Opin Microbiol 10 279ndash85

[98] Machado-Lima A del Portillo H A and Durham A M 2008Computational methods in noncoding RNA researchJ Math Biol 56 15ndash49

17

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References
Page 19: Computational approaches to 3D modeling of RNA · folding. Section 4 describes current advances and limitations in experimental techniques used to determine both secondary and tertiary

J Phys Condens Matter 22 (2010) 283101 Topical Review

[99] Mathews D H 2006 Revolutions in RNA secondary structureprediction J Mol Biol 359 526ndash32

[100] Mathews D H et al 1999 Expanded sequence dependence ofthermodynamic parameters improves prediction of RNAsecondary structure J Mol Biol 288 911ndash40

[101] Xia T et al 1998 Thermodynamic parameters for an expandednearest-neighbor model for formation of RNA duplexeswith WatsonndashCrick base pairs Biochemistry 37 14719ndash35

[102] Znosko B M et al 2002 Thermodynamic parameters for anexpanded nearest-neighbor model for the formation ofRNA duplexes with single nucleotide bulges Biochemistry41 10406ndash17

[103] Mathews D H 2006 RNA secondary structure analysis usingRNAstructure Curr Protoc Bioinform chapter 12 p unit126

[104] Ding Y and Lawrence C E 2003 A statistical samplingalgorithm for RNA secondary structure prediction NucleicAcids Res 31 7280ndash301

[105] Giegerich R Voss B and Rehmsmeier M 2004 Abstract shapesof RNA Nucleic Acids Res 32 4843ndash51

[106] Voss B Giegerich R and Rehmsmeier M 2006 Completeprobabilistic analysis of RNA shapes BMC Biol 4 5

[107] Geis M et al 2008 Folding kinetics of large RNAs J MolBiol 379 160ndash73

[108] Shapiro B A et al 2001 The massively parallel geneticalgorithm for RNA folding MIMD implementation andpopulation variation Bioinformatics 17 137ndash48

[109] Shapiro B A et al 2001 RNA folding pathway functionalintermediates their prediction and analysis J Mol Biol312 27ndash44

[110] Lyngso R B and Pedersen C N 2000 RNA pseudoknotprediction in energy-based models J Comput Biol7 409ndash27

[111] Rivas E and Eddy S R 1999 A dynamic programmingalgorithm for RNA structure prediction includingpseudoknots J Mol Biol 285 2053ndash68

[112] Xayaphoummine A Bucher T and Isambert H 2005 KinefoldWeb server for RNADNA folding path and structureprediction including pseudoknots and knots Nucleic AcidsRes 33 w605ndash10

[113] Xayaphoummine A et al 2003 Prediction and statistics ofpseudoknots in RNA structures using exactly clusteredstochastic simulations Proc Natl Acad Sci USA100 15310ndash5

[114] Sato K et al 2009 CENTROIDFOLD a Web server for RNAsecondary structure prediction Nucleic Acids Res37 w277ndash80

[115] Harmanci A O Sharma G and Mathews D H 2007 Efficientpairwise RNA structure prediction using probabilisticalignment constraints in Dynalign BMC Bioinform 8 130

[116] Mathews D H and Turner D H 2002 Dynalign an algorithmfor finding the secondary structure common to two RNAsequences J Mol Biol 317 191ndash203

[117] Jaeger L Westhof E and Michel F 1993 Monitoring of thecooperative unfolding of the sunY group I intron ofbacteriophage T4 The active form of the sunY ribozyme isstabilized by multiple interactions with 3prime terminal introncomponents J Mol Biol 234 331ndash46

[118] Lehnert V et al 1996 New loopndashloop tertiary interactions inself-splicing introns of subgroup IC and ID a complete 3Dmodel of the Tetrahymena thermophila ribozyme ChemBiol 3 993ndash1009

[119] Massire C Jaeger L and Westhof E 1998 Derivation of thethree-dimensional architecture of bacterial ribonuclease PRNAs from comparative sequence analysis J Mol Biol279 773ndash93

[120] Dima R I Hyeon C and Thirumalai D 2005 Extractingstacking interaction parameters for RNA from the data setof native structures J Mol Biol 347 53ndash69

[121] Altschul S F et al 1990 Basic local alignment search toolJ Mol Biol 215 403ndash10

[122] Larkin M A et al 2007 Clustal W and Clustal X version 20Bioinformatics 23 2947ndash8

[123] Brown J W et al 2009 The RNA structure alignment ontologyRNA 15 1623ndash31

[124] Gan H H Pasquali S and Schlick T 2003 Exploring therepertoire of RNA secondary motifs using graph theoryimplications for RNA design Nucleic Acids Res31 2926ndash43

[125] Das R and Baker D 2007 Automated de novo prediction ofnative-like RNA tertiary structures Proc Natl Acad SciUSA 104 14664ndash9

[126] Rohl C A et al 2004 Protein structure prediction using RosettaMethods Enzymol 383 66ndash93

[127] Das R et al 2008 Structural inference of native and partiallyfolded RNA by high-throughput contact mapping ProcNatl Acad Sci USA 105 4144ndash9

[128] Lemieux S and Major F 2006 Automated extraction andclassification of RNA tertiary structure cyclic motifsNucleic Acids Res 34 2340ndash6

[129] St-Onge K et al 2007 Modeling RNA tertiary structure motifsby graph-grammars Nucleic Acids Res 35 1726ndash36

[130] Lepage G P 1978 A new algorithm for adaptivemultidimensional integration J Comput Phys27 192ndash203

[131] Jonikas M A et al 2009 Coarse-grained modeling of largeRNA molecules with knowledge-based potentials andstructural filters RNA 15 189ndash99

[132] Sharma S Ding F and Dokholyan N V 2008 iFoldRNAthree-dimensional RNA structure prediction and foldingBioinformatics 24 1951ndash2

[133] Ding F et al 2008 Ab initio RNA folding by discrete moleculardynamics from structure prediction to folding mechanismsRNA 14 1164ndash73

[134] Ding F and Dokholyan N V 2006 Emergence of proteinfold families through rational design PLoS Comput Biol2 e85

[135] Dokholyan N V et al 1998 Discrete molecular dynamicsstudies of the folding of a protein-like model Fold Des3 577ndash87

[136] Martinez H M Maizel J V Jr and Shapiro B A 2008RNA2D3D a program for generating viewing andcomparing 3-dimensional models of RNA J BiomolStruct Dyn 25 669ndash83

[137] Parisien M et al 2009 New metrics for comparing andassessing discrepancies between RNA 3D structures andmodels RNA 15 1875ndash85

[138] Hsin J et al 2008 Using VMD an introductory tutorial CurrProtoc Bioinform chapter 5 p unit 57

[139] Sarver M et al 2008 FR3D finding local and compositerecurrent structural motifs in RNA 3D structures J MathBiol 56 215ndash52

[140] Pang P S et al 2005 Prediction of functional tertiaryinteractions and intermolecular interfaces from primarysequence data J Exp Zool B 304 50ndash63

[141] Mokdad A and Frankel A D 2008 ISFOLD structureprediction of base pairs in non-helical RNA motifs fromisostericity signatures in their sequence alignmentsJ Biomol Struct Dyn 25 467ndash72

[142] Tirumalai D and Hyeon C 2009 Theory of RNA Folding FromHairpins to Ribozymes (Springer Series in Biophysicsvol 13) pp 27ndash47

[143] Wells S E et al 2000 Use of dimethyl sulfate to probe RNAstructure in vivo Methods Enzymol 318 479ndash93

[144] Andronescu M et al 2008 RNA STRAND the RNAsecondary structure and statistical analysis database BMCBioinformatics 9 340ndash9

[145] Cao S and Chen S J 2009 Predicting structures and stabilitesfor H-type pseudoknots with interhelix loops RNA15 696ndash706

18

  • 1 Introduction
  • 2 RNA structure
  • 3 RNA folding proteins dynamics pathways and pausing
  • 4 RNA structure determination
    • 41 Experimental techniques
    • 42 Current advances in RNA structure data
      • 5 RNA secondary-structure prediction approaches
        • 51 Secondary-structure prediction using a single sequence
        • 52 Multiple-sequence alignment
        • 53 Advances in and limitations to RNA secondary-structure prediction
          • 6 RNA tertiary-structure prediction programs
          • 7 Comparison of automated RNA 3D folding algorithms
            • 71 Methods
            • 72 Tertiary-structure prediction results
              • 8 Discussion
              • Acknowledgments
              • References