Upload
lynga
View
218
Download
2
Embed Size (px)
Citation preview
RNA-Puzzles Round II: assessment of RNA structureprediction programs applied to three large RNA structures
ZHICHAO MIAO,1 RYSZARD W. ADAMIAK,2 MARC-FRÉDÉRICK BLANCHET,3 MICHAL BONIECKI,4 JANUSZM. BUJNICKI,4,5 SHI-JIE CHEN,6 CLARENCE CHENG,7 GRZEGORZ CHOJNOWSKI,4 FANG-CHIEH CHOU,7
PABLO CORDERO,7 JOSÉ ALMEIDA CRUZ,1 ADRIAN R. FERRÉ-D’AMARÉ,8 RHIJU DAS,7 FENG DING,9 NIKOLAYV. DOKHOLYAN,10 STANISLAW DUNIN-HORKAWICZ,4 WIPAPAT KLADWANG,7 ANDREY KROKHOTIN,10
GRZEGORZ LACH,4 MARCIN MAGNUS,4 FRANÇOIS MAJOR,3 THOMAS H. MANN,7 BENOÎT MASQUIDA,11
DOROTA MATELSKA,4 MÉLANIE MEYER,12 ALLA PESELIS,13 MARIUSZ POPENDA,2 KATARZYNA J. PURZYCKA,2
ALEXANDER SERGANOV,13 JULIUSZ STASIEWICZ,4 MARTA SZACHNIUK,14 ARPIT TANDON,10 SIQI TIAN,7
JIAN WANG,15 YI XIAO,15 XIAOJUN XU,6 JINWEI ZHANG,8 PEINAN ZHAO,6 TOMASZ ZOK,14
and ERIC WESTHOF11Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 67000 Strasbourg, France2Department of Structural Chemistry and Biology of Nucleic Acids, Structural Chemistry of Nucleic Acids Laboratory, Institute of BioorganicChemistry, Polish Academy of Sciences, 61-704 Poznan, Poland3Institute for Research in Immunology and Cancer (IRIC), Department of Computer Science and Operations Research, Université de Montréal,Montréal, Québec, Canada H3C 3J74Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland5Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University,61-614 Poznan, Poland6Department of Physics and Astronomy, Department of Biochemistry, and Informatics Institute, University of Missouri-Columbia, Columbia,Missouri 65211, USA7Department of Physics, Stanford University, Stanford, California 94305, USA8National Heart, Lung and Blood Institute, Bethesda, Maryland 20892-8012, USA9Department of Physics and Astronomy, College of Engineering and Science, Clemson University, Clemson, South Carolina 29634, USA10Department of Biochemistry and Biophysics, University of North Carolina, School of Medicine, Chapel Hill, North Carolina 27599, USA11Génétique Moléculaire Génomique Microbiologie, Institut de physiologie et de la chimie biologique, 67084 Strasbourg, France12Institut de génétique et de biologie moléculaire et cellulaire, 67400 Strasbourg, France13Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016, USA14Poznan University of Technology, Institute of Computing Science, 60-965 Poznan, Poland15Department of Physics, Huazhong University of Science and Technology, 430074 Wuhan, China
ABSTRACT
This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNAstructure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited orno homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed toadenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta,DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the threesubsequently released crystallographic structures, solved at diffraction resolutions of 2.5–3.2 Å, were carried out automaticallyusing various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo predictionabilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies tothe native structures, which suggests that computational methods for RNA structure prediction can already provide usefulstructural information for biological problems. However, the prediction accuracy for non-Watson–Crick interactions, key toproper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of thecontinuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.
Keywords: 3D prediction; bioinformatics; force fields; X-ray structures; models; structure quality
Corresponding author: [email protected] published online ahead of print. Article and publication date are at
http://www.rnajournal.org/cgi/doi/10.1261/rna.049502.114. Freely availableonline through the RNA Open Access option.
© 2015 Miao et al. This article, published in RNA, is available under aCreative Commons License (Attribution 4.0 International), as described athttp://creativecommons.org/licenses/by/4.0/.
BIOINFORMATICS
RNA 21:1–19; Published by Cold Spring Harbor Laboratory Press for the RNA Society 1
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
INTRODUCTION
More than 100,000 structures are currently available in theProtein Data Bank (PDB) (Berman et al. 2000); however,RNA-containing structures take up <6% of these deposi-tions, including RNA structures complexed with other mol-ecules. Although protein related structures constitute >90%of the structure database, <1/1000th of the proteins withknown sequences have experimental structures available(Moult 2008). Given the vast number of noncoding RNAmolecules being discovered in cells and viruses, it is likelythat a very small part of the RNA conformational space hasbeen structurally characterized. RNA structure determina-tion efforts still have a long way to go, and computationalmodeling could play a major role in providing structural in-sights for various biological problem explorations.
“RNA-Puzzles” is a CASP-like (Moult et al. 2014) collec-tive blind experiment for the evaluation of three-dimensional(3D) RNA structure prediction. The primary aims of RNA-Puzzles are (i) to determine the capabilities and limitationsof current methods of 3D RNA structure prediction basedon sequence, (ii) to find whether and how progress hasbeenmade, as well as what has yet to be done to achieve bettersolutions, (iii) to identify whether there are specific bottle-necks that hold back the field, (iv) to promote the availablemethods and guide potential users in the choice of suitabletools for real-world problems, and (v) to encourage theRNA structure prediction community in their efforts to im-prove the current tools and to make automated predictiontools available. Until now, 12 puzzles have been set up andassessments of three puzzles were previously published(Cruz et al. 2012).
We now report a second round focusing on the predictionof large RNA structures, a lariat-capping ribozyme (formerlynamed GIR1), an adenosylcobalamin-binding riboswitch,and a T-box–tRNA complex (Peselis and Serganov 2012;Zhang and Ferre-D’Amare 2013;Meyer et al. 2014). No close-ly homologous structures existed in structure databases at thetime of the experiment, except for theAzoarcus group I intron(Adams et al. 2004) as a potential template for the catalyticcore of the GIR1 ribozyme, templates for the tRNA, and crys-tallographic structure of a segment of a T-box RNA withoutthe tRNA (Wang et al. 2010; Grigg et al. 2013). This roundof prediction focuses on (i) the automatic assessment of denovo prediction of large RNA structures, especially structuretopology, (ii) the evaluation of the contribution of simpleand fast experimental data in structure prediction, such aschemical probing data, and (iii) the identification of bottle-necks in modeling 3D interactions. The ultimate aim is toderive force fields and programming systems allowing for au-tomatic folding of RNA sequences in three-dimensional.However, at this stage, the assessment does notmake a distinc-tion between those groups deriving models based solely on abinitio predictions from those incorporating experimental datalike chemical probing. As a matter of fact, RNA-Puzzles led to
the development of automatic production and retrieval ofsolution data (see Kladwang et al. 2011a,b, 2014).For the three puzzles, the best RMSDs range between 6.8
and 11.7 Å, and all display similar topologies to the nativestructures. Given the sizes of the RNAs (>160 nt), this is avery positive trend for de novo structure modeling. Thebest models always show much better prediction of non-Watson–Crick interactions but also, surprisingly, relativelyhigh clash scores. This reemphasizes the importance ofnon-Watson–Crick interactions for RNA 3D structure mod-eling as well as the difficulty of predicting such interactionson the basis of RNA secondary structure even when comple-mented with chemical probing data. The observed atomicclashes, possibly due to the inclusions of experimental con-straints for nucleotide contacts in the prediction without ad-equate optimization, have led to further experiments andinsight toward better solutions, discussed below.
THE THREE RNA PUZZLES
Problem 5: the lariat-capping ribozyme
The lariat-capping ribozyme represents an individual familyof ribozymes that has evolved specific architectural featuresfrom a group I intron ancestor (Meyer et al. 2014). The LC ri-bozyme catalyzes a distinct reaction involving formation ofa 3-nt 2′,5′ lariat. The 188-nt long sequence is the following:
5′-GGUUGGGUUGGGAAGUAUCAUGGCUAAUCACCAUGAUGCAAUCGGGUUGAACACUUAAUUGGGUUAAAACGGUGGGGGACGAUCCCGUAACAUCCGUCCUAACGGCGACAGACUGCACGGCCCUGCCUCUUAGGUGUGUUCAAUGAACAGUCGUUCCGAAAGGAAGCAUCCGGUAUCCCAAGACAAUC-3′
The crystal structure was resolved to 2.45 Å resolution.Two crystallographic models became available after model-ing, with PDB ID’s 4P95 and 4P9R.
Problem 6: the adenosylcobalamin riboswitch
An adenosylcobalamin riboswitch was crystallized (Peselisand Serganov 2012). The 168 nt adenosylcobalamin ribo-switch consists of a ligand-bound structured core and abent peripheral domain. The sequence is the following:
5′-CGGCAGGUGCUCCCGACCCUGCGGUCGGGAGUUAAAAGGGAAGCCGGUGCAAGUCCGGCACGGUCCCGCCACUGUGACGGGGAGUCGCCCCUCGGGAUGUGCCACUGGCCCGAAGGCCGGGAAGGCGGAGGGGCGGCGAGGAUCCGGAGUCAGGAAACCUGCCUGCCG-3′
The crystal structure (PDB 4GXY) has a resolution of 3.05Å. An adenosylcobalamin molecule is given in the crystalstructure but was not revealed at the start of the puzzle.
Miao et al.
2 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
Problem 10: the T-box–tRNA complex structure
A T-box–tRNA complex structure was solved (Zhang andFerré-D’Amaré 2013). The sequence of the 96 nt T-box isas follows:
5′-UGCGAUGAGAAGAAGAGUAUUAAGGAUUUACUAUGAUUAGCGACUCUAGGAUAGUGAAAGCUAGAGGAUAGUAACCUUAAGAAGGCACUUCGAGCA-3′
The sequence of tRNA is the following (75 nt):
5′-GCGGAAGUAGUUCAGUGGUAGAACACCACCUUGCCAAGGUGGGGGUCGCGGGUUCGAAUCCCGUCUUCCGCUCCA-3′
The structure of the complex was solved at a resolutionof 3.20 Å (PDB 4LCK). The crystallized sequence was slightlydifferent (the acceptor region was engineered in tRNA), butthis detail of the crystal structure was not disclosed in thepuzzle. Several RNA modules, including a K-turn, a G-bulge,a double T-loop and an anticodon loop, appeared in thiscomplex structure.
Additional chemical-mapping data
The Das group provided chemical-mapping data on the threepuzzles to all the modelers. One-dimensional chemical-map-ping data and mutate-and-map (M2) data were acquired,quantitated, and normalized as described in Kladwang et al.(2014) and Seetin et al. (2014), respectively. Three probeswere used: 1M7 (a SHAPE reagent, 1-methyl-7-nitroisatoicanhydride, which acylates 2′-hydroxyls of flexible nucleo-tides); DMS (dimethyl sulfate, reacting with exposed N1/N3 of adenosine/cytosine; and CMCT (1-cyclohexyl(2-morpholinoethyl) carbodiimide metho-p-toluene sulfonate,reacting with exposed N1/N3 of guanosine/uracil) ([Morti-mer and Weeks 2007; Cordero et al. 2012a] and referencestherein). Data were released to modelers on the RNA Map-ping Database in standardized formats (Rocca-Serra et al.2011; Cordero et al. 2012b) in accession codes RNAPZ5_STD_0000, RNAPZ5_1M7_0002, RNAPZ5_DMS_0002;RNAPZ6_STD_0001, RNAPZ6_1M7_0002; RNAPZ10_STD_0001, RNAPZ10_STD_0002. Each group was giventhe possibility to use those data and each group describes be-low at which stage and how these solution data were usedduring the modeling process.
OVERALL COMPARISON RESULTS
Assessment methods
The automatic model assessment methods were the sameas previously used in RNA-Puzzles (Cruz et al. 2012). Togeometrically compare predicted models with the experi-
mental structures, we used the Root Mean Square Deviation(RMSD)measure, the Deformation Index (DI), and the com-plete Deformation Profile matrix (DP) which provides anevaluation of the predictive quality of a model at multiplescales (Parisien et al. 2009). The Clash Score as evaluatedby MolProbity is also used as a control measurement forthe quality of the geometric parameters of the models(Chen et al. 2010). Additionally, MCQ (Mean of CircularQuantities) score (Zok et al. 2014) was added as a referenceto assess prediction in terms of torsion angle space. MCQmeasures the dissimilarity between structures taking into ac-count rotatable bonds and sugar pucker. Due to its sensitivityto local differences and independence from structural align-ment, it may serve as a complement to methods based onatom coordinates. A single distortion, which can significantlyincrease global RMSD, influences only distorted residues incase of MCQ. On the other hand, numerous irregularitiesthat sharpen the backbone may cancel out when RMSD isconsidered, but are revealed in torsion angle space. An imple-mentation of MCQ score is publicly available for downloadunder http://www.cs.put.poznan.pl/tzok/mcq. It allows forseveral usage scenarios, among which the global option wasused to assess models in RNA-Puzzles Round II. For eachpair consisting of the target and the predicted structure,MCQ-global provides a single distance score, representingtheir mean dissimilarity. Its value was computed upon thedifferences between the corresponding sugar pseudorotationangle (P) and seven dihedral angles defined for a residue (α, β,γ, δ, ε, ζ, and χ). The final rank was built to grasp the overallresemblance of models to the target structure in terms oftheir trigonometric representation. The MCQ score rangesbetween 0° and 180°. MCQ-local computes raw differencesbetween particular dihedral angles, thus being sensitiveeven to the smallest discrepancy and allows the observationof high dissimilarity at the residue level. MCQ-global intro-duces an inevitable bias, since the information about singledistortions gets lost during the averaging. Therefore an inter-pretation of the global score should take into account struc-ture size. For large RNAs, global MCQ <15° indicates highsimilarity of structures, while global MCQ >45° indicatesan overall dissimilarity.
Problem 5: the lariat-capping ribozyme
A total of 25 predicted models were submitted with RMSDsranging from 9.15 to 36.5 Å (mean RMSD is 24.1 Å, see Table1). The top two models are better than the others in terms ofRMSD, Deformation Index, non-Watson–Crick interactionsand stacking (Fig. 1). The top three models also have >90%Watson–Crick (WC) base pairs correctly predicted. Most ofthe groups have, however, submitted models with very lowaccuracy in non-WC interactions. The last three modelsranked by RMSD have similarly poor levels of accuracy fornon-WC interactions, with worse prediction of WC pairsas well as worse stacking predictions. Several models present
RNA-Puzzles Round II
www.rnajournal.org 3
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
TABLE
1.Su
mmaryof
theresults
forPu
zzle
5
Group
aNum
berb
RMSD
cRan
kdDIa
lleRan
kdIN
Fallf
Ran
kdIN
Fwcg
Ran
kdIN
Fnw
chRan
kdIN
Fstacki
Ran
kdClash
Scorej
Ran
kdMCQ
kRan
kd
Das
29.15
112
.02
10.76
10.91
30.33
10.75
26.79
1210
.57
2Das
19.95
213
.15
20.76
20.92
10.26
20.75
19.44
2111
.63
4Ada
miak
116
.78
323
.96
30.70
130.90
50.00
50.66
2011
.10
2313
.30
13Dok
holyan
219
.90
429
.14
60.68
190.84
13∗
∗0.66
227.62
1413
.87
17Dok
holyan
419
.90
529
.14
70.68
200.84
14∗
∗0.66
237.62
1513
.87
18Bujnicki
220
.77
628
.09
40.74
30.86
90.00
110.74
51.66
812
.14
6Dok
holyan
620
.82
728
.65
50.73
40.91
20.00
70.69
1210
.43
2215
.81
23Dok
holyan
721
.44
830
.61
80.70
140.91
40.00
80.66
198.77
1914
.63
22Bujnicki
422
.24
932
.80
110.68
210.79
220.00
130.68
151.16
610
.34
1Bujnicki
322
.37
1036
.99
160.61
250.72
240.00
120.62
255.30
1012
.75
12Dok
holyan
523
.01
1132
.55
90.71
110.87
8∗
∗0.68
176.46
1113
.68
16Dok
holyan
323
.69
1232
.64
100.73
50.88
70.00
60.71
108.61
1814
.61
21Bujnicki
523
.81
1334
.15
130.70
160.86
100.00
140.68
160.66
511
.73
5Dok
holyan
824
.01
1433
.84
120.71
100.89
60.00
90.67
189.11
2014
.40
20Bujnicki
124
.78
1536
.69
150.68
220.79
210.00
100.69
131.32
711
.23
3Che
n2
25.67
1636
.33
140.71
120.80
200.00
160.73
67.78
1613
.36
14Che
n7
26.26
1739
.50
190.67
240.76
230.00
210.69
140.00
313
.39
15Dok
holyan
127
.20
1838
.03
170.72
80.84
12∗
∗0.70
117.45
1314
.33
19Che
n1
27.24
1938
.90
180.70
150.81
180.00
150.71
94.64
912
.19
7Che
n3
28.71
2041
.69
200.69
180.71
250.00
170.74
411
.42
2412
.35
10Che
n5
31.39
2144
.06
220.71
90.81
190.00
190.73
713
.41
2512
.19
8Che
n4
31.52
2243
.50
210.73
60.83
150.00
180.74
38.28
1712
.25
9Che
n6
31.55
2344
.07
230.72
70.83
160.00
200.73
80.00
212
.47
11Xiao
232
.53
2448
.54
240.67
230.83
170.20
40.64
240.17
419
.90
25Xiao
136
.54
2552
.90
250.69
170.86
110.21
30.66
210.00
119
.71
24Mean
24.05
34.48
0.70
0.84
0.05
0.69
5.97
13.47
Stan
dard
deviation
4.91
7.11
0.03
0.05
0.06
0.03
4.30
2.30
X-ray
mod
el5.86
Value
sin
each
row
correspo
ndto
apred
ictedmod
el.
a Nam
eof
theresearch
grou
pthat
subm
itted
themod
el.
bNum
berof
themod
elam
ongallm
odelsfro
mthesamegrou
p.c RMSD
ofthemod
elco
mpa
redwith
theaccepted
structure(in
Å).
dColum
nsindicate
therank
ofthemod
elwith
respecttotheleft-ha
ndco
lumnmetric.
e DIa
llistheDeformationInde
xtaking
into
acco
unta
llinteractions
(stacking,
Watson–
Crick,a
ndno
n-Watson–
Crick).
f INFallistheInteractionNetworkFide
litytaking
into
acco
unta
llinteractions.
g INFwcistheInteractionNetworkFide
litytaking
into
acco
unto
nlyWatson–
Crick
interactions.
hIN
Fnw
cistheInteractionNetworkFide
litytaking
into
acco
unto
nlyno
n-Watson–
Crick
interactions.
i INFstackistheInteractionNetworkFide
litytaking
into
acco
unto
nlystacking
interactions.
j Clash
Scoreas
compu
tedby
theMolProb
itysuite
(Daviset
al.2
007).
k MCQ
Scoreisaco
mpa
risonto
nativ
estructurein
torsionan
glespace(in
degrees).
∗ Means
theno
n-WCinteractionisno
tavaila
blein
thestructure.
Miao et al.
4 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
high values for the Clash Score (Chen et al. 2010), while theClash Score for the crystal structure is low at 5.86. This im-plies a need for updated dictionaries of distances and anglesor stronger constraints toward reasonable values both duringcrystal structure refinement and in struc-ture modeling. The crystal structure ofproblem 5 shows an open ring structurearound the center formed by a kissing in-teraction between two peripheral helices.This striking architecture with a clearlyvisible “hole” through the ring is not ex-actly predicted by any of the predictionmodels, although some groups correctlyidentified the overall topology (Fig. 2).
Problem 6: the adenosylcobalaminriboswitch
The RMSDs of the 34 submitted predic-tion models range from 11.4 to 37.0 Å
with a mean value of 23.1 Å (Table 2).These are very large RMSDs but the non-inclusion of the ligand in the puzzlecould be largely the cause of such highvalues. The Das, Major, and Chen groupsrank at the top as they have relatively highaccuracies in non-WC interaction pre-diction, while the other groups do nothave the correct non-WC interactions.As a large riboswitch structure, the nativestructure has a Clash Score of 7.98. Insuch a situation, it is probably under-standable that clashes appear in predic-tion models in order to maintain thesame topology as the native structure.Models from the Das group show muchbetter similarity to the native structurebut with much higher Clash Scores dueto the limited time available for refiningmodels for this target.
Problem 10: T-box–tRNA complex
Twenty-six prediction models were sub-mitted ranging from 6.8 to 16.9 ÅRMSD with 11.5 Å as the mean value(Table 3). As this is a complex of twoRNA molecules, we also compared themodels of each molecule separately. TheRMSDs of the T-box ranges from 5.96to 17.9 Å, with a mean RMSD of 12.1Å, exactly the same as the averageRMSD of the molecular complex. Asthe structure topology of tRNA is wellknown, the modeling is more accurate,
and the average RMSD achieved is 3.8 Å with a RMSD rangebetween 2.49 and 6.9 Å. Therefore, the key comparisons be-tween predictions are the T-box structure and the relativeorientation/interaction between the T-box and the tRNA.
FIGURE 1. Problem 5: the lariat-capping ribozyme (A) secondary structure and (B)Deformation Profile values for the three predicted models with lowest RMSD: Das model 2(green), Das model 1 (blue), and Adamiak model 1 (cyan). (Radial red lines) The minimum,maximum, and mean DP values for each domain. (C) Structure superimposition between nativestructure (green) and best predicted model (blue, Das model 2) with wall–eye stereorepresentation.
FIGURE 2. Illustration of the “ring” topology structure in Problem 5. Native structure with“ring” topology is shown in green; the best prediction model Das model 2 and the third best pre-diction Adamiak model 1 are shown in the same aspect in blue and red, respectively. Although thebest model cannot totally capture the “ring” topology, it is more similar to native topology thanothers.
RNA-Puzzles Round II
www.rnajournal.org 5
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
TABLE
2.Su
mmaryof
theresults
forPu
zzle
6
Group
aNum
berb
RMSD
cRan
kdDIa
lleRan
kdIN
Fallf
Ran
kdIN
Fwcg
Ran
kdIN
Fnw
chRan
kdIN
Fstacki
Ran
kdClash
Scorej
Ran
kdMCQ
kRan
kd
Das
411
.39
115
.72
10.72
110.89
50.32
80.70
1123
.48
3214
.20
8Das
612
.11
216
.67
20.73
80.89
70.33
70.70
1224
.59
3415
.66
14Das
213
.27
318
.29
30.73
70.89
40.35
20.71
817
.24
2914
.95
11Das
114
.27
419
.60
40.73
60.89
30.33
60.71
723
.85
3316
.06
15Das
914
.70
520
.46
50.72
140.89
80.35
30.69
1617
.98
3014
.06
7Das
315
.48
621
.38
70.72
100.87
130.22
140.72
517
.05
2813
.86
4Das
515
.57
720
.80
60.75
10.89
60.36
10.74
213
.39
2613
.19
1Das
717
.43
823
.75
90.73
40.87
140.24
130.73
413
.21
2515
.51
13Das
817
.57
923
.54
80.75
20.88
110.33
50.74
113
.02
2414
.21
9Major
320
.62
1028
.81
110.72
150.88
90.20
160.70
140.18
320
.24
30Major
220
.96
1128
.96
120.72
90.86
160.20
150.72
60.18
219
.94
27Major
121
.35
1228
.65
100.75
30.87
150.34
40.74
30.18
120
.85
34Major
721
.44
1329
.74
130.72
130.88
100.29
100.70
150.55
1020
.02
28Major
421
.55
1429
.79
140.72
120.89
10.20
170.70
100.18
420
.44
31Che
n4
21.77
1533
.35
160.65
250.84
240.00
290.63
270.73
1217
.01
19Che
n2
21.87
1636
.58
210.60
330.72
330.29
110.58
330.18
620
.79
33Dok
holyan
522
.27
1733
.36
170.67
240.85
190.00
250.64
267.52
2017
.42
20Major
622
.52
1830
.82
150.73
50.89
20.29
90.70
90.37
720
.05
29Che
n5
23.33
1936
.27
200.64
280.82
260.00
300.62
2815
.23
2715
.03
12Che
n3
23.37
2036
.84
220.64
300.85
200.11
190.59
320.37
820
.66
32Major
523
.63
2133
.96
180.70
170.86
170.10
200.68
180.55
918
.60
25Che
n1
23.88
2239
.21
260.61
320.80
300.18
180.57
340.18
518
.93
26Dok
holyan
424
.40
2336
.21
190.67
220.85
180.00
240.65
237.52
1917
.70
21Dok
holyan
125
.51
2437
.89
250.67
230.84
230.00
210.66
227.34
1816
.70
18Dok
holyan
325
.77
2537
.23
230.69
190.84
220.00
230.69
176.60
1718
.18
24Dok
holyan
226
.10
2637
.67
240.69
180.83
250.00
220.68
197.89
2117
.86
22Das
1028
.64
2740
.47
270.71
160.88
120.25
120.70
1319
.82
3114
.28
10Bujnicki
230
.55
2844
.96
280.68
210.81
27∗
∗0.67
211.28
1413
.58
3Che
n7
30.74
2948
.99
300.63
310.80
290.00
320.61
309.72
2316
.41
16Bujnicki
431
.73
3048
.67
290.65
270.78
310.00
280.65
241.83
1513
.99
5Bujnicki
331
.87
3153
.39
310.60
340.70
34∗
∗0.59
310.92
1313
.34
2Che
n6
35.73
3255
.76
330.64
290.81
280.00
310.62
292.02
1616
.48
17Bujnicki
136
.61
3353
.70
320.68
200.85
210.00
270.67
200.55
1114
.05
6Dok
holyan
636
.96
3456
.67
340.65
260.78
320.00
260.65
259.54
2217
.92
23Mean
23.09
34.06
0.69
0.84
0.17
0.67
7.80
16.83
Stan
dard
deviation
6.87
11.54
0.04
0.05
0.14
0.05
8.14
2.54
X-ray
mod
el7.98
Value
sin
each
row
correspo
ndto
apred
ictedmod
el.
a Nam
eof
theresearch
grou
pthat
subm
itted
themod
el.
bNum
berof
themod
elam
ongallm
odelsfro
mthesamegrou
p.c RMSD
ofthemod
elco
mpa
redwith
theaccepted
structure(in
Å).
dColum
nsindicate
therank
ofthemod
elwith
respecttotheleft-ha
ndco
lumnmetric.
e DIa
llisthede
form
ationinde
xtaking
into
acco
unta
llinteractions
(stacking,
Watson–
Crick,a
ndno
n-Watson–
Crick).
f INFallistheInteractionNetworkFide
litytaking
into
acco
unta
llinteractions.
g INFwcistheInteractionNetworkFide
litytaking
into
acco
unto
nlyWatson–
Crick
interactions.
hIN
Fnw
cistheInteractionNetworkFide
litytaking
into
acco
unto
nlyno
n-Watson–
Crick
interactions.
i INFstackistheInteractionNetworkFide
litytaking
into
acco
unto
nlystacking
interactions.
j Clash
Scoreas
compu
tedby
theMolProb
itysuite
(Daviset
al.2
007).
k MCQ
Scoreisaco
mpa
risonto
nativ
estructurein
torsionan
glespace(in
degree).
∗ Means
theno
n-WCinteractionisno
tavaila
blein
thestructure.
Miao et al.
6 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
TABLE
3.Su
mmaryof
theresults
forPu
zzle
10
Group
aNum
berb
RMSD
cRan
kdDIa
lleRan
kdIN
Fallf
Ran
kdIN
Fwcg
Ran
kdIN
Fnw
chRan
kdIN
Fstacki
Ran
kdClash
Scorej
Ran
kdMCQ
kRan
kd
Das
36.75
18.30
10.81
120.95
30.70
40.79
1411
.09
2416
.70
4Das
46.98
28.44
20.83
40.95
10.68
70.81
810
.73
2115
.91
1Das
17.49
39.09
30.82
60.92
50.70
30.81
711
.64
2516
.75
5Bujnicki
49.24
412
.33
70.75
150.87
170.40
160.74
150.91
221
.30
15Bujnicki
110
.01
511
.88
40.84
20.92
60.68
80.83
11.27
419
.92
11Bujnicki
810
.05
612
.15
60.83
50.90
110.64
100.82
31.45
719
.11
6Bujnicki
710
.11
712
.65
150.80
140.88
140.54
130.80
131.45
619
.66
10Bujnicki
510
.15
812
.39
80.82
90.87
190.56
120.83
21.81
1019
.44
7Bujnicki
610
.24
912
.62
140.81
130.86
200.51
140.82
51.27
519
.45
8Bujnicki
910
.25
1012
.57
120.82
110.88
150.69
60.81
100.91
319
.94
12Bujnicki
210
.27
1112
.53
110.82
80.90
90.67
90.81
91.63
919
.59
9Bujnicki
310
.27
1212
.48
100.82
70.91
70.56
110.82
60.91
121
.23
14Bujnicki
1010
.28
1312
.58
130.82
100.91
80.69
50.80
121.45
821
.04
13Das
510
.28
1412
.14
50.85
10.95
20.78
20.82
410
.91
2216
.39
3Das
210
.32
1512
.43
90.83
30.93
40.78
10.80
1111
.64
2616
.13
2Che
n1
11.14
1615
.47
160.72
160.84
240.48
150.71
177.81
1423
.02
25Dok
holyan
312
.00
1716
.78
170.72
170.88
160.10
230.71
167.80
1322
.22
20Dok
holyan
113
.04
1818
.84
180.69
190.86
210.10
220.68
1811
.07
2321
.86
16Dok
holyan
1013
.05
1919
.11
200.68
200.89
130.00
260.65
2110
.16
2022
.56
22Dok
holyan
413
.21
2018
.95
190.70
180.87
180.15
180.67
199.07
1822
.58
23Dok
holyan
213
.58
2120
.20
210.67
210.89
120.12
210.63
248.89
1622
.60
24Dok
holyan
815
.26
2222
.70
220.67
230.85
220.13
200.65
238.71
1522
.02
18Dok
holyan
715
.91
2323
.68
230.67
220.83
250.20
170.66
206.35
1121
.95
17Dok
holyan
916
.34
2424
.71
240.66
250.81
260.00
250.65
229.80
1922
.33
21Dok
holyan
516
.42
2526
.00
260.63
260.84
230.00
240.59
268.89
1723
.28
26Dok
holyan
616
.94
2625
.38
250.67
240.90
100.13
190.62
256.71
1222
.03
19Mean
11.52
15.63
0.76
0.89
0.42
0.74
6.32
20.35
Stan
dard
deviation
2.87
5.40
0.07
0.04
0.28
0.08
4.26
2.32
X-ray
mod
el2.28
Value
sin
each
row
correspo
ndto
apred
ictedmod
el.
a Nam
eof
theresearch
grou
pthat
subm
itted
themod
el.
bNum
berof
themod
elam
ongallm
odelsfro
mthesamegrou
p.c RMSD
ofthemod
elco
mpa
redwith
theaccepted
structure(in
Å).
dColum
nsindicate
therank
ofthemod
elwith
respecttotheleft-ha
ndco
lumnmetric.
e DIa
llistheDeformationInde
xtaking
into
acco
unta
llinteractions
(stacking,
Watson–
Crick,a
ndno
n-Watson–
Crick).
f INFallistheInteractionNetworkFide
litytaking
into
acco
unta
llinteractions.
g INFwcistheInteractionNetworkFide
litytaking
into
acco
unto
nlyWatson–
Crick
interactions.
hIN
Fnw
cistheInteractionNetworkFide
litytaking
into
acco
unto
nlyno
n-Watson–
Crick
interactions.
i INFstackistheInteractionNetworkFide
litytaking
into
acco
unto
nlystacking
interactions.
j Clash
Scoreas
compu
tedby
theMolProb
itysuite
(Daviset
al.2
007).
k MCQ
Scoreisaco
mpa
risonto
nativ
estructurein
torsionan
glespace(in
degree).
RNA-Puzzles Round II
www.rnajournal.org 7
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
We find models of the tRNA segment from the Bujnickigroup rank at the top, while the Das group does better forboth the T-box alone and overall T-box/tRNA models. Forthe T-box, the Das group shows good predictions for bothWC base pairs and non-WC interactions. However, theirmodels, even the tRNA structures, involve more atomicclashes. In comparison, the Bujnicki group achieved betteraccuracy on tRNA and included fewer atomic clashes.
MODELING METHODS
Seven research groups pursuing the development of automat-ic modeling approaches participated in this round of RNA-Puzzles experiments. The following provides a brief descrip-tion of the methodology and protocols used by the modelinggroups (arranged alphabetically), together with commentsand discussions.
Adamiak group
The RNA 3D structure from the Adamiak group was pre-dicted using automated method RNAComposer (Popendaet al. 2012) in its batch mode. RNAComposer server (http://rnacomposer.cs.put.poznan.pl) uses sequence and second-ary structure topology information in dot-bracket notation.Secondary structure (using the RNAStructure software)(Reuter and Mathews 2010) was adjusted using experimentaldata given for that RNA sequence by the Das group. Addi-tional information about potential pseudoknots or tertiarycontacts was obtained from manual analysis of the mutate-and-map data provided by the Das laboratory (RNA Map-ping Database).
For Problem 5, two interactions were found: (i) 28UC29with 93GA94 and (ii) 111GACUG115 with 148CAGUC152.Both were introduced as squared brackets into extendeddot-bracket notation input:
GGUUGGGUUGGGAAGUAUCAUGGCUAAUCACCAUGAUGCAAUCGGGUUGAACACUUAAUUGGGUUAAAACGGUGGGGGACGAUCCCGUAACAUCCGUCCUAACGGCGACAGACUGCACGGCCCUGCCUCUUAGGUGUGUUCAAUGAACAGUCGUUCCGAAAGGAAGCAUCCGGUAUCCCAAGACAAUC
(((((..(((((..(((((((((....[[.)))))))))..(((((((((((((......((((....((((((((((..]]))))....)).))))((....)).....[[[[[....)))).(((.....)))))))))).....]]]]].((((....))))..))))))...)))))..)))))
Before pressing “Compose” button in the batch mode, theoption “Add atom distance restraints” was checked to intro-duce restraints concerning the interactions 111GACUG115with 148CAGUC152. To do so, the RNA duplex with thesame sequence was extracted from the X-ray structure(PDB 2Z75, resolution 1.7 Å) and selected using search en-gine of RNA FRABASE (Popenda et al. 2008). Subsequently,related 508 distance restraints were calculated (between at-oms P, C1′, C2′, C5′, O3′, C2, C4, C6, and C8) and uploaded
to RNAComposer. RNAComposer (64-bit Intel Xeon 2.33GHz processor-based platform with scalable 8 GB memory)predicts 10 3D models within <10 min.The resulting models were inspected for the total energy
value calculated by RNAComposer and for the preservation ofthe 111GACUG115/148CAGUC152 and 28UC29/93GA94interactions. Models showing lowest total energy were chosenfor further analysis. Some fragment selections chosen byRNAComposer for the 3D structure assembly prohibitedthe formation of required contacts and such models wererejected at this stage. Subsequently, the selected modelswere investigated for the proximity of the 111GACUG115/148CAGUC152 pseudoknot region to the RNA termini.The mutate-and-map data (RNA Mapping Database) sug-gested that region hosting pseudoknot 111GACUG115/148CAGUC152 should be close to the molecule termini,namely to the internal loop 6GG7/182GA183. Themodel ful-filling this criterion and representing the lowest total energyestimated by RNAComposer was selected. Since RNACom-poser automatically conducts two energy minimization stepsprior to returning final RNA 3D structure this model did notrequire any further refinement. The model was validated us-ing NUCheck (Feng et al. 1998).
Bujnicki group
The Bujnicki group used a hybrid strategy similar to the oneused in the previous editions of the RNA-Puzzles experiment(Cruz et al. 2012), which comprised template-based (com-parative) modeling, global folding with restraints using acoarse-grained method for template-free folding, and high-resolution refinement.First, for all target sequences they attempted to identify
homologous families in the Rfam database (Burge et al.2013) and homologous RNAs with experimentally deter-mined structures. For RNA sequences or sequence fragmentsthat exhibited homology with RNAs with experimentallydetermined structures, initial models were constructedby template-based modeling and fragment assembly usingModeRNA (Rother et al. 2011). Target-template alignmentswere prepared manually, with the aid of secondary structureinformation extracted from Rfam and corrected if neededwith the use of predictions made with RNA metaserver(http://genesilico.pl/rnametaserver/) developed as a part ofthe CompaRNA project (Puton et al. 2013). This stage wasvery similar to that which they used previously in RNA-Puzzles (Cruz et al. 2012). For Problem 5, a lariat-capping ri-bozyme related to group I self-cleaving introns (Rfam familyRF01807), they used a group I intron structure (PDB 1ZZN)as the main template. For Problem 6, an adenosylcobalaminriboswitch (RF00174), they were unable to find a suitabletemplate, therefore no template-based modeling was per-formed. For Problem 10, a tRNA bound to a T-box RNA,template-based modeling of the entire complex was basedon the structure of a related complex (Grigg et al. 2013) built
Miao et al.
8 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
manually by the authors based on crystal and NMR struc-tures of fragments (PDB 4JRC and PDB 2KHY), with theuse of cross-linking, mutagenesis, and SAXS data.The aforementioned initial models of target structures (in
the case of Problem 6—an artificial circular conformation ofthe target sequence with 5′ and 3′ ends close to each other)were used as starting points for global refinement, usingthe SimRNA method for RNA folding simulations, whichuses a coarse-grained representation, relies on the MonteCarlo method for sampling the conformational space, anduses a statistical potential to approximate the energy andidentify conformations that correspond to biologically rele-vant structures (MJ Boniecki, G Lach, K Tomala, W Dawson,P Lukasz, T Soltysinski, KM Rother, and JM Bujnicki, inprep.). Here, they used a novel version of SimRNA, whichuses five (rather than three) atoms per residue: P of the phos-phate group, C4′ of the ribose moiety, and in which basemoi-eties are represented by triangles: N1–C2–C4 for pyrimidinesand N9–C2–C6 for purines. This representation providesmuch improved description of base faces and edges com-pared with the previous version that used only one atomper base (Cruz et al. 2012; Rother et al. 2012) and thereforeimproves the modeling of stacking and base-pairing interac-tions, e.g., it discriminates much better between canonicaland noncanonical base-pairing. Regions predicted to be con-fidently modeled in initial models were “frozen” while otherregions were allowed to change conformation. For modelingof complex 3D structures, SimRNA can use additional re-straints, derived from experimental or computational analy-ses, including information about secondary structure and/or long-range contacts. They have used such informationdepending on its availability. Typically, predictions were firstmade with restraints on predicted secondary structure and ifadditional data became available sufficiently long before theprediction deadline (e.g., results of experiments performedby the Das group and made available to all participants ofRNA-Puzzles), additional simulations were conducted. Giv-en the very tight deadline for Problem 6, they were unableto utilize additional data for this RNA, leading to poor results.Predictions generated by SimRNA were converted to full-
atom representation and ranked for submission using acombination of various criteria, including the results of clus-tering (the higher number of similar well-scored structuresthe better), agreement with experimental data not used inthe process of modeling, manual inspection, and scoringwith independent methods such as RASP (Capriotti et al.2011). If time permitted, models selected for submissionwere subjected to high-resolution refinement whose aimwas to reduce clashes, idealize geometries, and improve localinteractions such as in standard and non-WC base pairs.Here, they used a different method than previously, namelyan in-house software tool QRNAS (J Stasiewicz and JMBujnicki, unpubl.) that extends the AMBER force field withenergy terms explicitly modeling hydrogen bonds, idealizesbase pair planarity and regularizes backbone conformation.
As in their previous (Cruz et al. 2012) modeling exercise,human intervention was relatively large. Most of the timewas devoted to searching for additional information relatedto target RNA sequences and discussions within the group.Time used for alignment preparation and for selection ofmodels for submission varied greatly depending on the diffi-culty of the Problem. Time used for template-based model-ing was negligible. Time required for SimRNA modelingwas typically a few days per target, and the final refinementwas typically run overnight.
Chen group
The Chen group used a hierarchical approach to predict RNA3D structure from the sequence (Xu et al. 2014). For a givenRNA sequence, they first predict the secondary structurefrom the free energy landscape using the Vfold model (Caoand Chen 2005, 2006, 2009; Chen 2008). A unique featureof the Vfold model at secondary level is its ability to computethe RNA motif-based loop entropies. Using two virtualbonds per nucleotide to represent the backbone conforma-tion, Vfold model samples fluctuations of loops/junctionconformations in 3D space through conformational enumer-ation model (Cao and Chen 2005, 2006, 2009; Chen 2008).By calculating the probability of loop formation, the modelcan give the conformational entropy parameters for the for-mation of the different types of loops such as pseudoknotloops. Another notable feature of the Vfold model at second-ary level is the modeling of RNA loop free energy. By enu-merating all the possible (sequence-dependent) intra-loopmismatches, the Vfold model partially accounts for the se-quence-dependence of the loop free energy.Next, a 3D coarse-grained scaffold is constructed based on
the predicted secondary structure (Cao and Chen 2011). Toconstruct a 3D scaffold, the predicted helix stems are mod-eled as A-form helices. For the loops/junctions, 3D fragmentsfrom the known PDB database were used. Specifically, astructural template database (Xu et al. 2014) was built by clas-sifying the structures according to the different motifs such ashairpin loops and internal/bulge loops, three-way junctions,four-way junctions, pseudoknots, etc. For each junction, theoptimal (top 5) fragments were selected for the further struc-ture assembly of the whole RNA. Any 3D structures generat-ed by the structure assembly with structural clashes wouldbe excluded. The final all-atom structures were built basedon the coarse-grained model, followed by refinement usingAMBER energy minimization. Two thousand steps of energyminimization were run, applying 500.0 kcal/mol constraintsto all the residues, followed by another 2000 steps of minimi-zation without constraints.In order to increase the accuracy of RNA secondary struc-
ture prediction, they applied Rfam (Burge et al. 2013) to iden-tify the possible conserved base pairs and used the mostconserved base pair information as constraint to the Vfoldalgorithm to predict secondary structures. If available, the
RNA-Puzzles Round II
www.rnajournal.org 9
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
SHAPE (selective 2′-hydroxyl acylation analyzed by primerextension) (Merino et al. 2005) experimental data were alsoused as constraint in the Vfold algorithm for secondary struc-ture prediction. The SHAPE reactivity is strongly related tothe nucleotide flexibility at single nucleotide resolution.Specifically, some nucleotides are restricted to be in loop re-gions (without forming base pairs with other nucleotides)because of their high SHAPE reactivity. The combinationof SHAPE data and/or homologous sequence informationfromRfam and the Vfold algorithm led to enhanced accuracyof RNA secondary structure prediction.
For the RNA/RNA complex of Problem 10, they built the3D structures for each strand separately using the abovehierarchical approach (Xu et al. 2014). The final complexstructure was built manually based on the previously pub-lished SAXS-reconstructed envelope from DAMMIF (Fig. 5in Grigg et al. 2013). Then, they ran a short-time MD simu-lation to stabilize the interactions between the two RNAmolecules.
In summary, the computation involved two steps: (a) theprediction of the secondary structure and the constructionof the coarse-grained 3D structure and(b) AMBER energy minimization. Thecomputation time (Ta, Tb) for the twosteps are (∼2–3 h, <1 h), (<1 h, <1 h),and (<1 h, <3 h), for Problems 5, 6,and 10, respectively. The computationswere performed on a desktop PC withIntel Core(TM) i7-2600 CPU at 3.40GHz. They manually incorporated theconstraints from the SHAPE experi-ments and the Rfam results into theVfold model for secondary structure pre-diction. In addition to the constructionof 3D structures, the RNA/RNA complexfor Problem 10 involved human interfer-ence based on the SAXS data. All oth-er steps were achieved automatically bycomputations.
Das group
The chemical probing data, obtained inby the Das group, were first used to mod-el secondary structure with availableautomated algorithms and then furtherused for tertiary structure prediction.Automated secondary structure model-ing with RNAstructure was carried outas described in Tian et al. (2014), withtools available by server (http://rmdb.stanford.edu/structureserver/) (Corderoet al. 2012b). In all three Problems, useof 1D SHAPE data improved RNAstruc-ture 5.4 secondary structure predictions
compared with modeling without data, leading to perfect re-covery of helices in Problem 10 (Supplemental Table S1; Sup-plemental Fig. S6B). Nevertheless, approximately half ofhelices remained mispredicted in Problems 5 and 6. In Prob-lem 5, use of DMS/CMCT with RNAstructure gave bettermodels than SHAPE-guided modeling; but for Problems 6and 10, use of DMS/CMCT made models worse comparedeven with modeling without data. On an encouraging note,a bootstrapping procedure that gives conservative estimatesof modeling uncertainty (Kladwang et al. 2011b; Ramachan-dran et al. 2013) was able to highlight confident and noncon-fident regions in all cases. For example, any helix modeledwith >75% bootstrap values agreed with the subsequently re-leased crystallographic model (Supplemental Figs. S1–S3)(Fig. 3).More recent versions of RNAstructure have updated pa-
rameters for nearest-neighbor energies and for convertingSHAPE values to pseudoenergies, and also have the abilityto model pseudoknots (Hajdin et al. 2013). Although not astrictly blind test, the data above allowed a test of these ad-vances. Use of RNAstructure 5.6 Fold for SHAPE-directed
FIGURE 3. Problem 6: the adenosylcobalamin riboswitch (A) secondary structure and (B)Deformation Profile values for the three predicted models with lowest RMSD: Das model 4(green), Das model 6 (blue) and Das model 2 (cyan). (Radial red lines) The minimum, maxi-mum, andmean DP values for each domain. (C) Structure superimposition between native struc-ture (green) and best predicted model (blue, Das model 4) with wall–eye stereo representation.
Miao et al.
10 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
modeling did not improve modeling of Problem 5, and gaveless accurate models of the other Problems 6 and 10, com-pared with RNAstructure 5.4 Fold; the difference appearsto be due to a change in the parameters for SHAPE pseu-doenergy. Use of the ShapeKnots executable produced a sig-nificant improvement but still imperfect model in SHAPE-directed modeling of Problem 5, which contains a pseu-doknot in its catalytic core. In the other cases, ShapeKnotspredictions did not improve upon pseudoknot-free Foldmodeling. These results underscore the challenge of mod-eling RNA secondary structure using conventional 1Dchemical-mapping data, even with continuing algorithmicadvances. As has been discussed previously (Cordero et al.2012a; Leonard et al. 2013; Rice et al. 2014), protection of nu-cleotides may signal non-Watson–Crick rather thanWatson–Crick pairing in the structure, but current methods do notgenerally distinguish these possibilities (Figs. 4, 5).For Problems 5 and 6, secondary chemical-mapping
data were also acquired through the M2 approach (Corderoet al. 2014). In this method, chemical-mapping profiles aremeasured not only for the sequence of interest but also forvariants mutating each nucleotide in the RNA (Kladwanget al. 2011a). Increased reactivity of one nucleotide uponmutation of a sequence-distance nucleotide can signal theirinteraction in three dimensions, and these data can be lever-aged for automatic secondary structure inference in RNAs-tructure. For Problem 5, RNAstructure5.4 Fold guided by M2-SHAPE datarecovered all helices longer than 2 bp,except the catalytic core pseudoknot. In-tegrating M2 data with the more recentRNAstructure 5.6 ShapeKnots recoveredall of these helices, including the pseu-doknot (Supplemental Table S1; Sup-plemental Fig. S1L). For Problem 6, allhelices longer than 2 bp were recoveredcorrectly with M2-SHAPE data and by ei-ther RNAstructure executable. Errors inedge base pairs of several helices remain,as well as a register shift in Problem 5(which was corrected in M2-DMS anal-ysis; Supplemental Fig. S1P). Overall,these comparisons confirm that second-ary chemical mapping coupled to auto-mated algorithms consistently achievescorrect global secondary structures forcomplex RNA folds in terms of helixrecovery, but resolving fine errors inedge base pairs will require methodolog-ical improvements.Beyond basic secondary structure
modeling, this round of puzzles also in-spired development of computationalmethods for 3D modeling, primarily inthree areas. First, simple automated tools
in the Rosetta framework (Leaver-Fay et al. 2011) were creat-ed for threading structural templates into the desired se-quence, such as the catalytic core of Problem 5, the lariat-capping ribozyme (see also below). Second, the Das groupexpanded fragment assembly of RNA with full-atom refine-ment (FARFAR) (Das et al. 2010; Kladwang et al. 2011a),whose interface for job setup was previously cumbersome,especially to solve subpieces of large RNAs (e.g., the three-way P15/P8/P3 junction of Problem 5). The RNA puzzles in-spired them to write a single python script (rna_denovo_setup.py) for straightforward setup of FARFAR jobs, takingas input the full-model sequence and secondary structure,the residues of desired subdomain, and PDB models forany known subpieces of the subdomain. For Problem 6,the Das group also created a mode for setting up rigid-body“docking” of multiple RNA pieces including a placeholdersphere for the adenosylcobalamin ligand. Finally, an expan-sion of “stepwise” assembly was piloted, previously devel-oped for enumerative high-resolution modeling of motifs(Sripakdeevong et al. 2011; Chou et al. 2013), to gene ratecomplex RNA folds by progressively closing “rings” of motifsthrough numerous tertiary buildup paths (e.g., to model thering-like connection of the catalytic core with the P3/P8/P15junction and the P2.1/P5 kissing loops in the GIR1 lariat-cap-ping ribozyme). Compared with prior fragment assemblyapproaches from Das group (Kladwang et al. 2011a), this
FIGURE 4. Problem10: the T-box–tRNA complex (A) secondary structure and (B) DeformationProfile values for the three predicted models with lowest RMSD: Das model 3 (green), Das model4 (blue), and Das model 1 (cyan). (Radial red lines) The minimum, maximum, and mean DPvalues for each domain. (C) Structure superimposition between native structure (green) andbest predicted model (blue, Das model 3).
RNA-Puzzles Round II
www.rnajournal.org 11
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
stepwise strategy was efficient in generating realistic, con-verged conformations of complex folds with multiple tertiarycontacts at subhelical resolution. All Rosetta tools are freelyavailable to academic researchers (Leaver-Fay et al. 2011)and documented at https://www.rosettacommons.org/docs/latest/rna-denovo-setup.html/.
The process was a mix of automatic and manual steps, asmany of the tools were being developed “on-the-fly.” Onone hand, chemical-mapping-guided inference of secondarystructure to double-checkmodels from the literature was car-ried out automatically, but inference of some tertiary contactsfrom these data was guided by visual inspection of the M2
data (see below). On the other hand, identification of poten-tial templates for threading/homology modeling was not car-ried out automatically. Structural templates and alignmentswere instead derived from literature search (group I intronalignment to the lariat-capping ribozyme for Problem 5(Beckert et al. 2008); mapping “half” of the FMN riboswitchto the adenosylcobalamin-binding core for Problem 6 (Bar-rick and Breaker 2007; Geary et al. 2011); and mapping dou-ble T-loop (Grigg et al. 2013; Lehmann et al. 2013), sarcin–ricin loop (Yang et al. 2001) and kink-turn motifs (Vidovicet al. 2000) to T-box, the tRNA (Fukai et al. 2000), and theT-Box/tRNA interface derived from ribosome (Dunkleet al. 2011) for Problem 10) and manual expert inspection(refining the core of the Problem 5 group I intron alignment;a previously unrecognized kink-turn in the Problem 6 ad-enosylcobalamin riboswitch; the ribosome-bound tRNA/mRNA-like interaction in the Problem 10 T-box/tRNA com-plex). Fully automating template recognition and tertiarycontact inference-with guidance from readily available chem-ical-mapping data-appears to be an important challenge forthe field (Table 4).
For all three targets, experimental data were critical for rul-ing out structural hypotheses that would have required sub-stantial computational expense to explore, and in some cases,gave critical data that guided modeling, illustrated in Supple-mental Figures S4–S6. For Problem 5, two peripheral tertiarycontacts were not recognized in previous literature but wereimportant for defining ∼1/3 of the model. The contacts wereapparent in M2 data as changes in chemical mapping on oneside of the contact in response to mutations on the other side.For Problem 6, several secondary structure models had beenproposed in the literature (Ravnum and Andersson 2001;Nahvi et al. 2002, 2004; Vitreschak et al. 2003; Barrick andBreaker 2007), and the M2 analysis was important for unam-biguously confirming the correct model. Further, theM2 datashowed no evidence of extensive interaction between the helixP2 of the P1/P2/P3 junction and the long “arm” P7–P11, orfor interactions within the “arm”; so runs with those contactswere not set up. For Problem10,M2datawasnot acquired dueto time constraints (three other Problems were being mod-eled concomitantly), but the available 1D chemical-mappingdata helped rule out a potential fourth base pair neighboringthe three base pair interaction of the tRNA anticodon and itsT-box binding site; enforcing that interaction would haveproduced inaccurate distortions (Table 5).As many of the computational methods were being devel-
oped at the same time as modeling, performance was not op-timized. Thousands of CPU-hours were used (12 h for ∼20–100 cores) for each 3Dmodeling step that involved fragment-based assembly and refinement of subpieces. For the case ofProblem 5, the Das group ended up expending at least 30,000CPU-hours. Nevertheless, since the prediction period, fur-ther automation and optimization has brought the computa-tional expense of these procedures to under 10,000 CPU-hours per target, taking less than a week of wall clock time.It is noted that academic researchers interested in using thesetools canmake use of free “startup” allocations on the XSEDEsupercomputers of up to 20,000 CPU-hours. As for Problem6 (adenosylcobalamin riboswitch), both experiments andcomputational modeling were carried out in 1 wk.
Dokholyan group
The Dokholyan group at the University of North Carolinaat Chapel Hill in collaboration with the Ding group fromClemson provided predictions for Problems 5, 6, and10 using multiscale discrete molecular dynamics (DMD)method (Proctor et al. 2011; Shirvanyants et al. 2012). Thestructure modeling was performed with coarse-grained fold-ing simulations followed by an all-atom reconstruction.In the coarse-grained folding simulations, the three-beadRNA model, where each nucleotide is represented by threepseudoatoms corresponding to base, sugar, and phosphategroups (Ding et al. 2008), was used. The interactions betweenthe three beads are modeled based on information avail-able from high-resolution RNA structure database. Bonded
FIGURE 5. Modules in Problem 10. (A) Detailed structure of T-loop ofDas model 4, (B) detailed structure of U30 of Das model 4, (C) detailedstructure of K-turn of Das model 4, (D) detailed structure of Loop-E ofDas model 4.
Miao et al.
12 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
TABLE
4.Su
mmaryof
theresults
forPu
zzle
10t-bo
x
Group
aNum
berb
RMSD
cRan
kdDIa
lleRan
kdIN
Fallf
Ran
kdIN
Fwcg
Ran
kdIN
Fnw
chRan
kdIN
Fstacki
Ran
kdClash
Scorej
Ran
kdMCQ
kRan
kd
Das
45.96
17.40
10.81
30.91
20.62
60.80
310
.73
2113
.92
1Das
36.51
28.18
20.80
50.90
40.65
40.79
911
.09
2415
.39
5Das
18.08
310
.10
30.80
40.86
70.65
30.81
211
.64
2514
.87
3Das
59.32
411
.24
40.83
10.91
30.74
10.82
110
.91
2215
.08
4Das
29.40
511
.67
50.81
20.91
10.67
20.80
511
.64
2614
.74
2Bujnicki
410
.62
616
.64
150.64
200.81
200.00
250.64
180.91
221
.77
21Bujnicki
111
.66
714
.80
60.79
60.85
90.50
100.80
61.27
419
.85
10Bujnicki
811
.69
814
.86
70.79
70.85
100.56
80.79
71.45
718
.32
6Bujnicki
511
.87
915
.35
80.77
80.81
170.33
130.80
41.81
1019
.33
7Bujnicki
711
.88
1016
.14
140.74
140.81
180.30
140.75
141.45
620
.07
11Bujnicki
311
.98
1115
.74
110.76
110.86
80.38
120.77
100.91
122
.13
25Bujnicki
612
.00
1215
.95
130.75
130.79
240.17
150.79
81.27
519
.46
8Bujnicki
1012
.02
1315
.60
90.77
90.85
110.63
50.76
121.45
822
.13
24Bujnicki
912
.02
1415
.83
120.76
120.81
190.59
70.76
130.91
320
.43
12Bujnicki
212
.02
1515
.71
100.77
100.84
130.50
90.77
111.63
919
.56
9Dok
holyan
212
.29
1619
.44
180.63
230.80
220.00
170.62
208.89
1620
.90
14Dok
holyan
312
.67
1718
.75
170.68
160.84
120.00
180.67
167.80
1321
.65
20Che
n1
13.01
1818
.53
160.70
150.77
260.42
110.73
157.81
1422
.07
23Dok
holyan
813
.22
1921
.39
200.62
240.83
140.00
230.59
258.71
1521
.63
19Dok
holyan
913
.36
2021
.96
220.61
250.78
25∗
∗0.59
249.80
1921
.24
17Dok
holyan
1013
.47
2120
.90
190.65
190.88
60.00
240.61
2210
.16
2021
.98
22Dok
holyan
414
.14
2222
.23
230.64
220.82
150.00
190.62
219.07
1821
.60
18Dok
holyan
114
.42
2321
.93
210.66
170.80
210.00
160.66
1711
.07
2320
.92
15Dok
holyan
515
.99
2426
.69
250.60
260.82
160.00
200.57
268.89
1722
.41
26Dok
holyan
716
.87
2526
.47
240.64
210.79
230.00
220.64
196.35
1120
.64
13Dok
holyan
617
.93
2627
.69
260.65
180.88
50.00
210.61
236.71
1221
.21
16Mean
12.09
17.35
0.72
0.84
0.31
0.71
6.32
19.74
Stan
dard
deviation
2.76
5.35
0.08
0.04
0.29
0.09
4.26
2.67
X-ray
mod
el2.28
Value
sin
each
row
correspo
ndto
apred
ictedmod
el.
a Nam
eof
theresearch
grou
pthat
subm
itted
themod
el.
bNum
berof
themod
elam
ongallm
odelsfro
mthesamegrou
p.c RMSD
ofthemod
elco
mpa
redwith
theaccepted
structure(in
Å).
dColum
nsindicate
therank
ofthemod
elwith
respecttotheleft-ha
ndco
lumnmetric.
e DIa
llistheDeformationInde
xtaking
into
acco
unta
llinteractions
(stacking,
Watson–
Crick,a
ndno
n-Watson–
Crick).
f INFallistheInteractionNetworkFide
litytaking
into
acco
unta
llinteractions.
g INFwcistheInteractionNetworkFide
litytaking
into
acco
unto
nlyWatson–
Crick
interactions.
hIN
Fnw
cistheInteractionNetworkFide
litytaking
into
acco
unto
nlyno
n-Watson–
Crick
interactions.
i INFstackistheInteractionNetworkFide
litytaking
into
acco
unto
nlystacking
interactions.
j Clash
Scoreas
compu
tedby
theMolProb
itysuite
(Daviset
al.2
007).
k MCQ
Scoreisaco
mpa
risonto
nativ
estructurein
torsionan
glespace(in
degrees).
∗ Means
theno
n-WCinteractionisno
tavaila
blein
thestructure.
RNA-Puzzles Round II
www.rnajournal.org 13
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
TABLE
5.Su
mmaryof
theresults
forPu
zzle
10tRNA
Group
aNum
berb
RMSD
cRan
kdDIa
lleRan
kdIN
Fallf
Ran
kdIN
Fwcg
Ran
kdIN
Fnw
chRan
kdIN
Fstacki
Ran
kdClash
Scorej
Ran
kdMCQ
kRan
kd
Bujnicki
72.49
12.75
10.91
80.97
100.91
70.88
81.45
619
.03
5Bujnicki
92.50
22.76
20.91
90.97
110.91
80.88
90.91
319
.17
6Bujnicki
42.57
32.80
30.92
30.97
81.00
20.89
70.91
220
.56
15Bujnicki
22.60
42.85
40.91
40.97
90.91
60.89
41.63
919
.63
10Bujnicki
62.63
52.89
50.91
50.94
131.00
40.89
61.27
519
.42
8Bujnicki
82.65
62.98
100.89
100.94
140.80
150.88
101.45
720
.39
14Bujnicki
102.67
72.93
60.91
60.97
120.85
140.90
21.45
819
.39
7Bujnicki
52.68
82.96
90.91
70.92
161.00
30.89
51.81
1019
.62
9Bujnicki
32.69
92.93
70.92
20.97
70.89
90.90
10.91
119
.82
12Bujnicki
12.71
102.95
80.92
11.00
50.91
50.89
31.27
420
.03
13Das
32.92
113.39
110.86
151.00
20.85
110.81
1511
.09
2418
.78
4Das
52.96
123.41
120.87
121.00
40.85
130.83
1110
.91
2218
.45
3Das
13.01
133.50
130.86
141.00
10.85
100.81
1411
.64
2519
.82
11Das
23.21
143.69
140.87
110.97
61.00
10.81
1211
.64
2616
.19
1Das
43.32
153.85
150.86
131.00
30.85
120.81
1310
.73
2117
.12
2Che
n1
3.66
164.59
160.80
160.91
190.60
160.78
177.81
1424
.33
26Dok
holyan
64.63
176.52
180.71
240.90
200.26
200.67
256.71
1222
.38
18Dok
holyan
54.87
187.30
220.67
260.84
24∗
∗0.63
268.89
1723
.74
25Dok
holyan
94.88
196.93
200.71
250.82
260.00
240.72
229.80
1923
.39
24Dok
holyan
35.04
206.44
170.78
170.92
180.20
230.80
167.80
1322
.96
21Dok
holyan
15.27
217.01
210.75
190.92
170.22
220.74
1911
.07
2323
.38
23Dok
holyan
75.27
227.32
230.72
220.84
250.45
180.71
236.35
1122
.05
16Dok
holyan
45.30
236.91
190.77
180.92
150.45
170.73
219.07
1823
.18
22Dok
holyan
105.52
247.47
240.74
210.89
220.00
250.74
2010
.16
2022
.89
20Dok
holyan
85.74
257.63
250.75
200.87
230.26
210.75
188.71
1522
.15
17Dok
holyan
26.90
269.62
260.72
230.89
210.26
190.69
248.89
1622
.79
19Mean
3.80
4.78
0.83
0.94
0.65
0.81
6.32
20.79
Stan
dard
deviation
1.33
2.15
0.08
0.05
0.34
0.08
4.26
2.18
X-ray
mod
el2.28
Value
sin
each
row
correspo
ndto
apred
ictedmod
el.
a Nam
eof
theresearch
grou
pthat
subm
itted
themod
el.
bNum
berof
themod
elam
ongallm
odelsfro
mthesamegrou
p.c RMSD
ofthemod
elco
mpa
redwith
theaccepted
structure(in
Å).
dColum
nsindicate
therank
ofthemod
elwith
respecttotheleft-ha
ndco
lumnmetric.
e DIa
llistheDeformationInde
xtaking
into
acco
unta
llinteractions
(stacking,
Watson–
Crick,a
ndno
n-Watson–
Crick).
f INFallistheInteractionNetworkFide
litytaking
into
acco
unta
llinteractions.
g INFwcistheInteractionNetworkFide
litytaking
into
acco
unto
nlyWatson–
Crick
interactions.
hIN
Fnw
cistheInteractionNetworkFide
litytaking
into
acco
unto
nlyno
n-Watson–
Crick
interactions.
i INFstackistheInteractionNetworkFide
litytaking
into
acco
unto
nlystacking
interactions.
j Clash
Scoreas
compu
tedby
theMolProb
itysuite
(Daviset
al.2
007).
k MCQ
Scoreisaco
mpa
risonto
nativ
estructurein
torsionan
glespace(in
degree).
∗ Means
theno
n-WCinteractionisno
tavaila
blein
thestructure.
Miao et al.
14 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
interactions are based on parameters derived from covalent-bonding, bond angles and dihedral angles, while the non-bonded interactions are derived from base-pairing, basestacking, hydrophobic interactions, and phosphate–phos-phate repulsions. Replica exchange DMD simulations wereperformed (Ding et al. 2008) followed by a selection protocolto select the lowest energy structures. Briefly, structures wereselected from coarse-grained simulations based on energiesobtained using the coarse-grained energy function. In thefirst filter, all the structures from every replica, which arethe lowest ten percent of the energies and then perform hier-archical clustering for identifying the most dominant stateamong the lowest energy ensemble, were selected. The cen-troid of the most populated cluster was selected as the repre-sentative structure for the simulation. For the representativestructure, the model was further refined by performing all-atom reconstruction. The all-atom DMD approach forRNA is similar to one used for all-atom protein modeling(Ding et al. 2008).The CPU time for DMD-based RNA structure prediction
depends on the length of the RNA. Previous benchmarks’showed linear dependence on RNA length. For Problems 6and 10, the simulations were performed on UNC Killdevilcomputing cluster (each compute node consists of 12 core,2.99 GHz Intel processors, with either 48 of 96 Gb memory),for Problem 6, a 169-nt length RNA, the total CPU time was∼21 h for eight compute nodes, which roughly translates to∼2.75 h of real time in simulation. The clustering algorithmrun on similar compute node took <15 min to complete.In the predictions they used base-pairing information,
which was derived for each Problem using different methods.Problem 5 wasmainly based on biochemical data provided bythe Das group and sequence comparative analysis obtainedby multiple sequence alignment from Rfam (Griffiths-Joneset al. 2003). The same was true for Problem 6 with additionaldata obtained from the Mfold server (Zuker 2003). Problem10 used biochemical data from the Das group and the Mfoldserver (Zuker 2003).Experimentally derived tertiary structure information was
also used. In Puzzle 5, the specific long-range proximity con-straint between nucleotides 78 and 170 was inferred fromthe cleavage sites between helices 5 (P5) and 10 (P10). ForProblem 6, the two groups used the in-line probing data toapproximate the solvent accessibility (Nahvi et al. 2004),which were added to DMD simulation as the hydroxyl radicalprobing approach (Ding et al. 2012). In the case of Problem10, the two binding sites between tRNA and tBox were takenfrom experimental study (Grigg et al. 2013). The method isfully automated once the base-pairing information for theRNA has been provided.
Major group
The adenosylcobalamin riboswitch of Problem 6 was foundby sequence similarity using BLAST. The secondary structure
for this riboswitch was deduced by Barrick and Breaker(2007) and it was used as a primary template. Among the al-ternative structural elements, they kept the three-way andfour-way junctions, as well as the T-loop-type interaction be-tween the four-way junction and the lower part of P7.The Major group fed the stems P2 and P4 as constraints
to MC-Fold (Parisien and Major 2008). Various alternativesecondary structural elements were indicated by Barrick andBreaker, especially between P7 and P11. The Major groupidentified a sequence with a potential to adopt a kink-turn,similar to the kink-turn predicted in the snoRNA U3 C′/Dbox (Rozhdestvenskyet al. 2003).This kink-turnhas aGA tan-dem and an asymmetric loop. We decided to assume its for-mation by adding it to the constraints using a mask in MC-Fold:
GGGAGU-GCGAGGAUC((((((-))...))))
The 3D model was built by using the T-loop crystal struc-ture of a tRNA (PDB 1EVV). The kink-turn area was mod-eled after Kt-7 (PDB ID 3CC2 of the 23S rRNA ofHaloarcula Marismortui). The remaining parts and the finalassembly were modeled using MC-Sym (major.iric.ca/Web/mctools). The models generated by MC-Sym were mini-mized up to the “brushup” level of the MC-Pipeline. Theselection of the candidate models was based on “Score” val-ues, a homemade all-atoms force-field which is part of the“Analysis” module of the MC-pipeline.
Xiao group
The Xiao group used 3dRNA web server (http://122.205.6.127/3dRNA/3dRNA.html) (Zhao et al. 2012) to completethe prediction for Problems 5 and 6. 3dRNA builds tertiarystructure of an RNA molecule by assembling three-dimen-sional (3D) structural templates of its secondary structureelements, including helix, hairpin loop, internal loop, bulgeloop, and multiway junction. The 3D templates are from a li-brary extracted from experimental RNA structures. In addi-tion, 3dRNA can build different models for an RNAmolecule by using different templates for each of secondarystructure elements.In the prediction for Problems 5 and 6, they first predicted
the secondary structures of the submitted sequences by usingMfold (Zuker 2003) and RNAfold (Hofacker et al. 1994) andpicked out the optimal prediction. Then, the sequences andpredicted secondary structures were submitted to 3dRNAweb server and a set of structural models was generated. Fi-nally, these structural models were ranked with a scoringfunction 3dRNAscore (data not shown) and the lowest ener-gy structures were selected as the candidate structures.For Problem 5, the Xiao group used Mfold server (Zuker
2003) and RNAfold (Hofacker et al. 1994) to predict the sec-ondary structure. The Mfold web server gave 10 secondarystructures consisting of an optimal and nine suboptimal
RNA-Puzzles Round II
www.rnajournal.org 15
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
folds. The RNAfold web server gave an optimal secondarystructure by minimum free energy and an optimal secondarystructure by thermodynamic ensemble. They used both theMfold and RNAfold optimal folds to predict the tertiarystructure. One of the optimal secondary structure is athree-way-junction structure whose secondary structure is(..().().....()). This three-way junction had no 3D templatesin the library but 3dRNA could automatically search the li-brary and pick out a nearest three-way junction, whose sec-ondary structure is (..()..()....()). In this case, the loop wasextracted from a ribozyme fragment 1GID. 3dRNA thenmade deletion and insertion operations on it to match thesecondary structure needed, i.e., (..().().....()). After that,the 3D templates of other secondary structural elementswere automatically searched for by the searching module of3dRNA and then assembled to a whole all-atom structureby the assembling module of 3dRNA. Finally, the structurewas refined by AMBER energy minimization. All calculationswere performed on an Intel S5500BC server (Intel(R) Xeon(R) CPU E5620 @ 2.40 GHz). The template searching and as-sembling process took ∼2 min. The AMBER energy minimi-zation took ∼3 min.
As above, for Problem 6, the Xiao group used Mfold andRNAfold to predict the secondary structure. The optimal sec-ondary structure predicted by Mfold is a four-way junctionstructure and the optimal secondary structure predicted byRNAfold is made of three-way junction structures. The bar-rier for 3dRNA is the lack of the 3D templates for the four-way junction: (().....()....().......). As previously, 3dRNAsearched the templates library for the nearest loop. A four-way junction with the secondary structure (..()........()......()..) was picked out. It was extracted from rRNA 1C2W.After deletion and insertion operations, a 3D template offour-way junction with the secondary structure (().....()....().......) was created. After that, the whole tertiary structurewas assembled smoothly. Ten models were predicted foreach of the optimal predicted secondary structures andthen scored by 3dRNAscore. The lowest energymodel was se-lected as the candidate and further refined with AMBER en-ergy minimization. All calculations were performed on anIntel S5500BC server (Intel(R) Xeon(R) CPU E5620 @ 2.40GHz). The whole template searching and assembling processtook <3 sec this time as the 3dRNA has been reimplemented.The AMBER energy minimization process took 38 sec.
DISCUSSION
Except in some cases (Levitt 1969; Michel and Westhof1990), RNA 3D structure prediction has historically laggedbehind protein structure prediction, although RNAWatson–Crick pairing (secondary structure) is simpler topredict than, for example, β−sheet pairings in proteins.Nevertheless, compared to protein structure, RNA hasmore degrees of freedom. In addition, despite the limitednumber of non-Watson–Crick base pairs that simplifies anal-
ysis and inspection of tertiary structure, these non-Watson–Crick base pairs are difficult to recognize but central to thethree-dimensional architecture of folded RNA molecules.The backbone of a nucleotide has six rotatable bonds, whilean amino acid includes three (and theω dihedral angle is gen-erally fixed∼180° in peptide plane). Therefore, the RNA con-formational landscape is potentially much larger and thethree-dimensional structure prediction of RNAs with 100nt is comparable, in terms of the number of degrees of free-dom andmolecular weight, to the challenge of modeling pro-teins of 200–300 aa. Although the structures in this round ofRNA-Puzzles are large, topologies of the best predictions arenot extreme compared with the native structures. This is apositive signal for RNA structure modeling.In the current stage, most predictions can achieve good ac-
curacy on Watson–Crick base pairs, while non-Watson–Crick interactions remain an open challenge that constitutean important bottleneck in RNA structure modeling. Toimprove non-WC pair prediction, RNA module predic-tion should be emphasized, since RNA modules are stablein structure but difficult to predict. Programs such asRMDetect (Cruz andWesthof 2011) could help in predictingRNA modules and improve the non-WC interaction predic-tion, e.g., for the K-turn in the adenosylcobalamin riboswitchwhich was recognized by modelers but not automatically.Unlike the numerous structures available for proteins, thenumber of RNA structures solved by crystallization is stilllimited and the available conformational space of RNA fold-ing is far from complete. The prediction of non-WC pair andstacking could also be improved with the increase of knownRNA structures and a complete search for RNA modules.Other than module prediction programs, easy and fast ex-
periments can provide direct constraints in structure model-ing. In the protein structure prediction trials CASP Round X(Moult et al. 2014), a new category of “contact-assisted” pre-diction was proposed. Experimental data such as NMR,chemical shift, cross-linking, and surface labeling have beenproved to be instrumental. Previously, contacts inferredfrom evolutionary information also achieved success in pro-tein structure modeling (Morcos et al. 2011) but, at the timeof writing, they still have not had an impact in blind structureprediction tests (Moult et al. 2014). Nevertheless, theseexplorations have revealed a trend in structure modeling:With the help of simple experimental constraints, structuremodeling could achieve the application level in providingstructural information for biological problems, even if no ho-mologous structures are available. According to the threelarge RNA structures in this round of RNA-Puzzles, the mod-eling of RNA topology structures are already close to native,and the relative orientation between T-box and tRNA struc-tures are recovered at a resolution (6.8 Å) comparable to thespacing between nucleotides.Although the best predictions are similar to the topology
of native RNA structures, some beauties in native structure to-pologies cannot be captured.As an example, the dramatic hole
Miao et al.
16 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
in the ring structure formed by two helices in Problem 5 wasnot described at even nucleotide resolution by any predictionmodel. Current fast experiments can only help in detecting lo-cal and detailed interactions rather than global architecturesdetermined by long-range contacts. Thesemethods aremain-ly sensitive to detailed contacts but are largely uninformativeas to global information such as holes. For consistent refine-ment to higher resolution 3D models that predict these strik-ing features, we still need deeper understanding of RNAstructure and/or new fast experimental tools.Currently, the major challenges in RNA structure predic-
tion lie in (1) further improvement of algorithms that in-corporate simple experimental data (contact-assisted data),(2) structure optimization to alleviate atomic clashes andimprove accuracy, and (3) accumulation of comprehensiveRNA structure knowledge with the help of database increasesand automated structural bioinformatic tools. The surprisinghigh values for the clash scores in several otherwise respectablemodels led to attempts to improve the clash score values byrerefinement. The Das group had run ERRASER (Chouet al. 2013), but ERRASER only works to find solutions witheach nucleotide within ∼2 Å RMSD of the starting solution.Many of the derived models used fragments of other crystalstructures that did not fit well together. The relief of chain-breaks and clashes require bigger changes than ERRASERcan currently handle. After an exchange of the previous ver-sions of this article, the Bujnicki group ran their refinementmethod QRNAS on the Das models (all 17 models submittedfor the three Problems). QRNAS is essentially a reimplemen-tation of AMBER with additional regularization and it is usedas the final element in the Bujnicki modeling pipeline. In allcases, a dramatic reduction of Clash Scores was obtained; in10 models even down to zero. Only in three cases the ClashScores remained larger than 4; however, thesemodels had ini-tial Clash Scores of nearly 30. Supplemental Table S2 showsthe values of all the metrics used for comparisons and mostof them display an improvement or at least no worsening.However, the bond angle deviations increase severely in allcases, a not so surprising result since that parameter waskept free during optimization. Thus, further work is requiredfor resolving clashes in automatically derived models.As attested by the number of coauthors involved in these
three RNA Puzzles in most modeling groups, the automatic-ity of the 3D structure prediction process still requires a ma-jor investment in computer science and in the developmentof user-friendly and straightforward computer tools. There-fore, in order to make RNA 3D structure prediction availableto the biological community in solving biological problems,we encourage web servers for automatic RNA 3D structureprediction. Such web servers should take query sequences,probably together with simple experimental data, and returnpossible RNA 3D coordinates. As described, several groupshave already advanced in this direction. As inspiration, in re-cent years, servers have largely caught up with human expertgroups in protein structure prediction (Moult et al. 2014),
and it will be interesting to see if the RNA community can ac-complish the same.Finally, in the present comparisons, it is assumed that
the crystal structure is the relevant and correct target. Cry-stallographic structures constitute highly relevant modelsrepresenting with high precision and accuracy particular ex-periments and conditions. However, not all segments of crys-tallized structures are at the same level of accuracy, because ofresolution issues, disorder, or high segmental mobilities (asrepresented by the thermal B-factors). For those segments,the uncertainty of the reference structure is a real question.A meaningful comparison would thus require that the pre-diction programs derive also a theoretical B-factor for the nu-cleotides representing some aspects of the uncertainty in theprediction. Preliminary results indicate that regions with highexperimental B-factors correlate with regions in disagree-ment with the rest of the structure (regions in red color inthe deformation profiles). Thoughtful weighted comparisonsneed to be developed to address these issues of molecular dy-namics during comparisons between crystal structures andpredicted models.
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
ACKNOWLEDGMENTS
This work (Z.M. and E.W.) is supported by the French Governmentgrant ANR-10-BINF-02-02 <BACNET>. This work (E.W.) hasbeen published under the framework of the LABEX: ANR-10-LABX-0036_NETRNA and benefits from a funding managed bythe French National Research Agency as part of the Investmentsfor the future program. N.V.D. acknowledges support of theNational Institutes of Health (NIH) grant 2R01GM064803-09 (PI:K.M. Weeks). The 3D structure modeling was supported by theNIH (5 T32 GM007276 to C.C.; R01 GM102519 to R.D.), theBurroughs-Wellcome Foundation (CASI to R.D.), Bio-X andHHMI international fellowships (F.C.C.), a Stanford GraduateFellowship (S.T.), a Conacyt fellowship (P.C.), European ResearchCouncil (grant RNA+P=123D to J.M.B.), Foundation for PolishScience (grant TEAM/2009-4/2 to J.M.B.), European Commission(EU structural funds, POIG.02.03.00-00-003/09 to J.M.B.), bythe Polish National Science Centre (2012/06/A/ST6/00384 toR.W.A. and 2012/04/A/NZ2/00455 to J.M.B.) and Polish Ministryof Science and Higher Education (0492/IP1/2013/72 to K.J.P.).A.P. was supported by the NIH grant T32 GM088118. A.S. wassupported by NYU Whitehead fellowship and the NIH grantR21MH103655. A.R.F. and J.Z. were supported by the IntramuralProgram of the National Heart, Lung and Blood Institute-NIH.
Received January 10, 2015; accepted February 12, 2015.
REFERENCES
Adams PL, Stahley MR, Kosek AB, Wang JM, Strobel SA. 2004. Crystalstructure of a self-splicing group I intron with both exons. Nature430: 45–50.
RNA-Puzzles Round II
www.rnajournal.org 17
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
Barrick JE, Breaker RR. 2007. The distributions, mechanisms, and struc-tures of metabolite-binding riboswitches. Genome Biol 8: R239.
Beckert B, Nielsen H, Einvik C, Johansen SD, Westhof E, Masquida B.2008. Molecular modelling of the GIR1 branching ribozyme givesnew insight into evolution of structurally related ribozymes.EMBO J 27: 667–678.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,Shindyalov IN, Bourne PE. 2000. The protein data bank. NucleicAcids Res 28: 235–242.
Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP,Eddy SR, Gardner PP, Bateman A. 2013. Rfam 11.0: 10 years ofRNA families. Nucleic Acids Res 41: D226–D232.
Cao S, Chen SJ. 2005. Predicting RNA folding thermodynamics with areduced chain representation model. RNA 11: 1884–1897.
Cao S, Chen SJ. 2006. Predicting RNA pseudoknot folding thermody-namics. Nucleic Acids Res 34: 2634–2652.
Cao S, Chen SJ. 2009. Predicting structures and stabilities for H-typepseudoknots with interhelix loops. RNA 15: 696–706.
Cao S, Chen SJ. 2011. Physics-based de novo prediction of RNA 3Dstructures. J Phys Chem B 115: 4216–4226.
Capriotti E, Norambuena T, Marti-RenomMA, Melo F. 2011. All-atomknowledge-based potential for RNA structure prediction and assess-ment. Bioinformatics 27: 1086–1093.
Chen SJ. 2008. RNA folding: conformational statistics, folding kinetics,and ion electrostatics. Annu Rev Biophys 37: 197–214.
Chen VB, Arendall WB III, Headd JJ, Keedy DA, Immormino RM,Kapral GJ, Murray LW, Richardson JS, Richardson DC. 2010.MolProbity: all-atom structure validation for macromolecular crys-tallography. Acta Crystallogr D Biol Crystallogr 66: 12–21.
Chou FC, Sripakdeevong P, Dibrov SM, Hermann T, Das R. 2013.Correcting pervasive errors in RNA crystallography through enu-merative structure prediction. Nat Methods 10: 74–76.
Cordero P, Kladwang W, VanLang CC, Das R. 2012a. Quantitativedimethyl sulfate mapping for automated RNA secondary structureinference. Biochemistry 51: 7037–7039.
Cordero P, Lucks JB, Das R. 2012b. An RNA mapping dataBase forcurating RNA structure mapping experiments. Bioinformatics 28:3006–3008.
Cordero P, Kladwang W, VanLang CC, Das R. 2014. The mutate-and-map protocol for inferring base pairs in structured RNA. MethodsMol Biol 1086: 53–77.
Cruz JA, Westhof E. 2011. Sequence-based identification of 3D struc-tural modules in RNA with RMDetect. Nat Methods 8: 513–521.
Cruz JA, Blanchet MF, Boniecki M, Bujnicki JM, Chen SJ, Cao S, Das R,Ding F, Dokholyan NV, Flores SC, et al. 2012. RNA-Puzzles: aCASP-like evaluation of RNA three-dimensional structure predic-tion. RNA 18: 610–625.
Das R, Karanicolas J, Baker D. 2010. Atomic accuracy in predicting anddesigning noncanonical RNA structure. Nat Methods 7: 291–294.
Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X,Murray LW, Arendall WB 3rd, Snoeyink J, Richardson JS, et al.2007. MolProbity: all-atom contacts and structure validation forproteins and nucleic acids. Nucleic Acids Res 35: W375–W383.
Ding F, Sharma S, Chalasani P, Demidov VV, Broude NE,Dokholyan NV. 2008. Ab initio RNA folding by discrete moleculardynamics: from structure prediction to folding mechanisms. RNA14: 1164–1173.
Ding F, Lavender CA, Weeks KM, Dokholyan NV. 2012. Three-dimen-sional RNA structure refinement by hydroxyl radical probing. NatMethods 9: 603–608.
Dunkle JA, Wang L, FeldmanMB, Pulk A, Chen VB, Kapral GJ, NoeskeJ, Richardson JS, Blanchard SC, Cate JH. 2011. Structures of the bac-terial ribosome in classical and hybrid states of tRNA binding.Science 332: 981–984.
Feng Z, Westbrook J, Berman HM. 1998. NUCheck. NDB-407 RutgersUniversity, New Brunswick, NJ.
Fukai S, Nureki O, Sekine S, Shimada A, Tao J, Vassylyev DG, YokoyamaS. 2000. Structural basis for double-sieve discrimination of L-valine
from L-isoleucine and L-threonine by the complex of tRNA(Val)and valyl-tRNA synthetase. Cell 103: 793–803.
Geary C, Chworos A, Jaeger L. 2011. Promoting RNA helical stacking viaA-minor junctions. Nucleic Acids Res 39: 1066–1080.
Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. 2003.Rfam: an RNA family database. Nucleic Acids Res 31: 439–441.
Grigg JC, Chen Y, Grundy FJ, Henkin TM, Pollack L, Ke A. 2013. T boxRNA decodes both the information content and geometry of tRNAto affect gene expression. Proc Natl Acad Sci 110: 7240–7245.
Hajdin CE, Bellaousov S, Huggins W, Leonard CW, Mathews DH,Weeks KM. 2013. Accurate SHAPE-directed RNA secondary struc-ture modeling, including pseudoknots. Proc Natl Acad Sci 110:5498–5503.
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M,Schuster P. 1994. Fast folding and comparison of RNA secondarystructures. Monatsh Chem 125: 167–188.
KladwangW, VanLang CC, Cordero P, Das R. 2011a. A two-dimension-al mutate-and-map strategy for non-coding RNA structure. NatChem 3: 954–962.
Kladwang W, VanLang CC, Cordero P, Das R. 2011b. Understandingthe errors of SHAPE-directed RNA structure modeling. Biochemistry50: 8049–8056.
Kladwang W, Mann TH, Becka A, Tian S, Kim H, Yoon S, Das R.2014. Standardization of RNA chemical mapping experiments.Biochemistry 53: 3063–3065.
Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R,Kaufman K, Renfrew PD, Smith CA, Sheffler W, et al. 2011.ROSETTA3: an object-oriented software suite for the simulationand design of macromolecules. Methods Enzymol 487: 545–574.
Lehmann J, Jossinet F, Gautheret D. 2013. A universal RNA structuralmotif docking the elbow of tRNA in the ribosome, RNAse P andT-box leaders. Nucleic Acids Res 41: 5494–5502.
Leonard CW, Hajdin CE, Karabiber F, Mathews DH, Favorov OV,Dokholyan NV, Weeks KM. 2013. Principles for understandingthe accuracy of SHAPE-directed RNA structure modeling. Biochem-istry 52: 588–595.
Levitt M. 1969. Detailed molecular model for transfer ribonucleic acid.Nature 224: 759–763.
Merino EJ, Wilkinson KA, Coughlan JL, Weeks KM. 2005. RNA struc-ture analysis at single nucleotide resolution by selective 2′-hydroxylacylation and primer extension (SHAPE). J Am Chem Soc 127:4223–4231.
Meyer M, Nielsen H, Olieric V, Roblin P, Johansen SD, Westhof E,Masquida B. 2014. Speciation of a group I intron into a lariat cappingribozyme. Proc Natl Acad Sci 111: 7659–7664.
Michel F, Westhof E. 1990. Modelling of the three-dimensional archi-tecture of group I catalytic introns based on comparative sequenceanalysis. J Mol Biol 216: 585–610.
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C,Zecchina R, Onuchic JN, Hwa T, Weigt M. 2011. Direct-couplinganalysis of residue coevolution captures native contacts acrossmany protein families. Proc Natl Acad Sci 108: E1293–E1301.
Mortimer SA, Weeks KM. 2007. A fast-acting reagent for accurate anal-ysis of RNA secondary and tertiary structure by SHAPE chemistry.J Am Chem Soc 129: 4144–4145.
Moult J. 2008. Comparative modeling in structural genomics. Structure16: 14–16.
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. 2014.Critical assessment of methods of protein structure prediction(CASP)—round x. Proteins 82: 1–6.
Nahvi A, Sudarsan N, Ebert MS, Zou X, Brown KL, Breaker RR. 2002.Genetic control by a metabolite binding mRNA. Chem Biol 9: 1043.
Nahvi A, Barrick JE, Breaker RR. 2004. Coenzyme B12 riboswitches arewidespread genetic control elements in prokaryotes. Nucleic AcidsRes 32: 143–150.
Parisien M, Major F. 2008. The MC-Fold and MC-Sym pipeline infersRNA structure from sequence data. Nature 452: 51–55.
Miao et al.
18 RNA, Vol. 21, No. 6
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
Parisien M, Cruz JA, Westhof E, Major F. 2009. New metrics for com-paring and assessing discrepancies between RNA 3D structures andmodels. RNA 15: 1875–1885.
Peselis A, Serganov A. 2012. Structural insights into ligand binding andgene expression control by an adenosylcobalamin riboswitch. NatStruct Mol Biol 19: 1182–1184.
Popenda M, Blazewicz M, Szachniuk M, Adamiak RW. 2008. RNAFRABASE version 1.0: an engine with a database to search for thethree-dimensional fragments within RNA structures. Nucleic AcidsRes 36: D386–D391.
Popenda M, Szachniuk M, Antczak M, Purzycka KJ, Lukasiak P,Bartol N, Blazewicz J, Adamiak RW. 2012. Automated 3D structurecomposition for large RNAs. Nucleic Acids Res 40: e112.
Proctor EA, Ding F, DokholyanNV. 2011. Discrete molecular dynamics.Wiley Interdiscip Rev Comput Mol Sci 1: 80–92.
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. 2013. CompaRNA:a server for continuous benchmarking of automated methodsfor RNA secondary structure prediction. Nucleic Acids Res 41:4307–4323.
Ramachandran S, Ding F, Weeks KM, Dokholyan NV. 2013. Statisticalanalysis of SHAPE-directed RNA secondary structure modeling.Biochemistry 52: 596–599.
Ravnum S, Andersson DI. 2001. An adenosyl-cobalamin (coenzyme-B12)-repressed translational enhancer in the cob mRNA of Salmo-nella typhimurium. Mol Microbiol 39: 1585–1594.
Reuter JS, Mathews DH. 2010. RNAstructure: software for RNA second-ary structure prediction and analysis. BMC Bioinformatics 11: 129.
Rice GM, Leonard CW, Weeks KM. 2014. RNA secondary structuremodeling at consistent high accuracy using differential SHAPE.RNA 20: 846–854.
Rocca-Serra P, Bellaousov S, Birmingham A, Chen C, Cordero P, Das R,Davis-Neulander L, Duncan CD, HalvorsenM, Knight R, et al. 2011.Sharing and archiving nucleic acid structure mapping data. RNA 17:1204–1212.
Rother M, Rother K, Puton T, Bujnicki JM. 2011. ModeRNA: a tool forcomparative modeling of RNA 3D structure. Nucleic Acids Res 39:4007–4022.
Rother K, Rother M, Boniecki M, Puton T, Tomala K, Lukasz P,Bujnicki JM. 2012. Template-based and template-free modeling ofRNA 3D structure: inspirations from protein structure modeling.In RNA 3D structure analysis and prediction (ed. Leontis NB,Westhof E). Springer, Berlin.
Rozhdestvensky TS, Tang TH, Tchirkova IV, Brosius J, Bachellerie JP,Hüttenhofer A. 2003. Binding of L7Ae protein to the K-turn of ar-chaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res 31: 869–877.
Seetin MG, Kladwang W, Bida JP, Das R. 2014. Massively parallel RNAchemical mapping with a reduced bias MAP-seq protocol. MethodsMol Biol 1086: 95–117.
Shirvanyants D, Ding F, Tsao D, Ramachandran S, Dokholyan NV.2012. Discrete molecular dynamics: an efficient and versatile simu-lation method for fine protein characterization. J Phys Chem B 116:8375–8382.
Sripakdeevong P, Kladwang W, Das R. 2011. An enumerative stepwiseansatz enables atomic-accuracy RNA loop modeling. Proc NatlAcad Sci 108: 20573–20578.
Tian S, Cordero P, KladwangW, Das R. 2014. High-throughput mutate-map-rescue evaluates SHAPE-directed RNA structure and uncoversexcited states. RNA 20: 1815–1826.
Vidovic I, Nottrott S, Hartmuth K, Lührmann R, Ficner R. 2000. Crystalstructure of the spliceosomal 15.5kD protein bound to a U4 snRNAfragment. Mol Cell 6: 1331–1342.
Vitreschak AG, Rodionov DA, Mironov AA, Gelfand MS. 2003.Regulation of the vitamin B12 metabolism and transport in bacteriaby a conserved RNA structural element. RNA 9: 1084–1097.
Wang JC, Henkin TM, Nikonowicz EP. 2010. NMR structure and dy-namics of the Specifier Loop domain from the Bacillus subtilis tyrST box leader RNA. Nucleic Acids Res 38: 3388–3398.
Xu XJ, Zhao PN, Chen SJ. 2014. Vfold: a web server for RNA structureand folding thermodynamics prediction. PLoS One 9: e107504.
Yang X, Gerczei T, Glover LT, Correll CC. 2001. Crystal structures ofrestrictocin-inhibitor complexes with implications for RNA recogni-tion and base flipping. Nat Struct Biol 8: 968–973.
Zhang JW, Ferré-D’Amaré AR. 2013. Co-crystal structure of a T-boxriboswitch stem I domain in complex with its cognate tRNA.Nature 500: 363–366.
Zhao YJ, Huang YY, Gong Z, Wang YJ, Man JF, Xiao Y. 2012.Automated and fast building of three-dimensional RNA structures.Sci Rep 2: 734.
Zok T, Popenda M, Szachniuk M. 2014. MCQ4Structures to computesimilarity of molecule structures. Cent Eur J Oper Res 22: 457–473.
Zuker M. 2003. Mfold web server for nucleic acid folding and hybridi-zation prediction. Nucleic Acids Res 31: 3406–3415.
RNA-Puzzles Round II
www.rnajournal.org 19
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
published online April 16, 2015RNA Zhichao Miao, Ryszard W. Adamiak, Marc-Frédérick Blanchet, et al. programs applied to three large RNA structures
Round II: assessment of RNA structure predictionRNA-Puzzles
Material
Supplemental
http://rnajournal.cshlp.org/content/suppl/2015/04/03/rna.049502.114.DC1
P<P
Published online April 16, 2015 in advance of the print journal.
Open Access
Open Access option.RNAFreely available online through the
License
Commons Creative
.http://creativecommons.org/licenses/by/4.0/(Attribution 4.0 International), as described at
, is available under a Creative Commons LicenseRNAThis article, published in
ServiceEmail Alerting
click here.right corner of the article or
Receive free email alerts when new articles cite this article - sign up in the box at the top
http://rnajournal.cshlp.org/subscriptions go to: RNATo subscribe to
© 2015 Miao et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society
Cold Spring Harbor Laboratory Press on May 17, 2017 - Published by rnajournal.cshlp.orgDownloaded from
RNA-Puzzles Round II: Assessment of RNA structure prediction
programs applied to three large RNA structures
ZhichaoMiao1,RyszardW.Adamiak2,Marc-FrédérickBlanchet3,MichalBoniecki4,JanuszM.Bujnicki4,5, Shi-Jie Chen6, Clarence Cheng7, Grzegorz Chojnowski4, Fang-Chieh Chou7, PabloCordero7, José Almeida Cruz1, Adrian Ferre-D'Amare8, Rhiju Das7, Feng Ding9, Nikolay V.Dokholyan10,StanislawDunin-Horkawicz4,WipapatKladwang7,AndreyKrokhotin10,GrzegorzLach4, Marcin Magnus4, François Major3, Thomas H. Mann7, Benoît Masquida11, DorotaMatelska4, Mélanie Meyer12, Alla Peselis13, Mariusz Popenda2, Katarzyna J. Purzycka2,AlexanderSerganov13, JuliuszStasiewicz4,MartaSzachniuk15,ArpitTandon10,SiqiTian7, JianWang14,YiXiao14,XiaojunXu6,JinweiZhang8,PeinanZhao6,TomaszZok15andEricWesthof1,*1ArchitectureetRéactivitédel'ARN,UniversitédeStrasbourg,Institutdebiologiemoléculaireet cellulaire du CNRS, 67000 Strasbourg France; 2Department of Structural Chemistry andBiology of Nucleic Acids, Structural Chemistry of Nucleic Acids Laboratory, Institute ofBioorganicChemistry,PolishAcademyofSciences,Poznan,Poland;3InstituteforResearchinImmunologyandCancer(IRIC),DepartmentofComputerScienceandOperationsResearch,UniversitédeMontréal,Montréal,QuébecH3C3J7,Canada;4LaboratoryofBioinformaticsandProteinEngineering,InternationalInstituteofMolecularandCellBiologyinWarsaw,02-109Warsaw, Poland; 5Laboratory of Bioinformatics, Institute of Molecular Biology andBiotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland;6DepartmentofPhysicsandAstronomy,DepartmentofBiochemistry,UniversityofMissouriInformatics Institute, University of Missouri-Columbia, MO 65211, U.S.A.; 7Department ofPhysics,StanfordUniversity,Stanford,California94305,USA;8NationalHeart,LungandBloodInstitute,50SouthDrive,MSC8012,Bethesda,Maryland20892-8012,USA;9DepartmentofPhysics and Astronomy at Clemson University, College of Engineering and Science, USA;10Department of Biochemistry and Biophysics, University of North Carolina, School ofMedicine,ChapelHill,NorthCarolina,USA;11GénétiqueMoléculaireGénomiqueMicrobiologie,Institutdephysiologieetdelachimiebiologique,21rueRenéDescartes67084Strasbourg,France;12Institut de génétiqueet debiologiemoléculaire et cellulaire, 1Rue Laurent Fries,67400Strasbourg,France;13DepartmentofBiochemistryandMolecularPharmacology,NewYork University School of Medicine, New York, New York, USA, 14Department of Physics,Huazhong University of Science and Technology, Wuhan, China; 15Poznan University ofTechnology,InstituteofComputingScience,Poznan,Poland.
SupplementaryinformationTABLES1.Summaryofexperimentaldata base-pair helix
sensitivity ppv sensitivity ppv
Puzzle5
Nodata 5.4Fold 50.0% 48.4% 46.2% 42.9%
5.6Fold 50.0% 48.4% 46.2% 42.9%
5.6ShapeKnots 50.0% 50.0% 46.2% 42.9%
1DSHAPE 5.4Fold 45.0% 41.5% 38.5% 35.7%
5.6Fold 45.0% 41.5% 69.2% 35.7%
5.6ShapeKnots 78.3% 72.3% 46.2% 69.2%
1DDMS/CMCT 5.4Fold 58.3% 53.9% 46.2% 42.9%
5.6Fold 58.3% 53.9% 76.9% 42.9%
5.6ShapeKnots 81.7% 79.0% 61.5% 76.9%
1DSHAPE/DMS/CMCT 5.6Fold 66.7% 60.6% 61.5% 53.3%
5.6ShapeKnots 73.3% 67.7% 69.2% 66.7%
2DSHAPE 5.4Fold 76.7% 74.2% 69.2% 69.2%
5.6Fold 76.7% 74.2% 84.6% 69.2%
5.6ShapeKnots 90.0% 85.7% 92.3% 91.7%
2DDMS 5.4Fold 81.7% 77.8% 76.9% 71.4%
5.6Fold 81.7% 77.8% 76.9% 71.4%
5.6ShapeKnots 93.3% 86.2% 92.3% 85.7%
Puzzle6 Nodata 5.4Fold 73.7% 73.7% 54.6% 54.6%
5.6Fold 73.7% 73.7% 54.6% 54.6%
5.6ShapeKnots 73.7% 73.7% 54.6% 54.6%
1DSHAPE 5.4Fold 77.2% 81.5% 63.6% 77.8%
5.6Fold 73.7% 73.7% 54.6% 54.6%
5.6ShapeKnots 79.0% 81.8% 63.6% 77.8%
1DDMS/CMCT 5.4Fold 63.2% 64.3% 54.6% 50.0%
5.6Fold 63.2% 64.3% 54.6% 50.0%
5.6ShapeKnots 71.9% 70.7% 54.6% 54.6%
1DSHAPE/DMS/CMCT 5.6Fold 49.1% 57.1% 36.4% 33.3%
5.6ShapeKnots 63.2% 72.0% 63.6% 58.3%
2DSHAPE 5.4Fold 96.5% 96.5% 90.9% 90.9%
5.6Fold 96.5% 96.5% 90.9% 90.9%
5.6ShapeKnots 96.5% 96.5% 90.9% 90.9%
Puzzle10 Nodata 5.4Fold 83.0% 79.6% 88.9% 80.0%
5.6Fold 83.0% 79.6% 88.9% 80.0%
5.6ShapeKnots 83.0% 79.6% 88.9% 80.0%
1DSHAPE 5.4Fold 100.0% 97.9% 100.0% 100.0%
5.6Fold 87.2% 82.0% 88.9% 80.0%
5.6ShapeKnots 97.9% 92.0% 100.0% 90.0%
1DDMS/CMCT 5.4Fold 72.3% 66.7% 77.8% 70.0%
5.6Fold 72.3% 66.7% 77.8% 70.0%
5.6ShapeKnots 85.1% 71.4% 88.9% 72.7%
1DSHAPE/DMS/CMCT 5.6Fold 55.3% 55.3% 55.6% 50.0%
5.6ShapeKnots 63.8% 61.2% 66.7% 60.0%
Table S2. Structure Refinement based on Das lab models done by Bujnicki lab.
Problem Lab Num RMSD P-value DI INF INF_wc INF_nwc INF_stacking clash pct_badbonds pct_resbadbonds pct_badangles pct_resbadangles
5 DasRef* 1 9.96 0.00E+000 13.002 0.766 0.919 0.256 0.766 0 0.7 7.45 8.33 100
5 DasRef 2 9.165 0.00E+000 11.94 0.768 0.908 0.344 0.757 0.17 0.25 1.6 7.07 100
mean 0.475 4.525 7.7 100
5 Das 1 9.948 0.00E+000 13.148 0.757 0.919 0.256 0.751 9.44 0.74 9.57 1.48 27.66
5 Das 2 9.152 0.00E+000 12.019 0.761 0.906 0.334 0.751 6.79 0.49 6.38 1.52 30.85
mean 0.615 7.975 1.5 29.255
Problem Lab Num RMSD P-value DI INF INF_wc INF_nwc INF_stacking clash pct_badbonds pct_resbadbonds pct_badangles pct_resbadangles
6 DasRef 1 14.504 3.23E-009 19.022 0.762 0.897 0.433 0.746 3.85 0.41 2.38 7.88 100
6 DasRef 2 13.683 1.92E-010 17.889 0.765 0.905 0.361 0.755 0 0 0 7.04 100
6 DasRef 3 15.778 1.73E-007 21.019 0.751 0.889 0.334 0.744 0.18 0.05 0.6 7.5 100
6 DasRef 4 11.714 9.59E-014 15.595 0.751 0.897 0.416 0.731 5.32 0.78 5.95 9.34 100
6 DasRef 5 15.862 2.22E-007 20.509 0.773 0.897 0.433 0.763 0 0 0 6.8 100
6 DasRef 6 12.421 1.68E-012 16.464 0.754 0.897 0.433 0.734 3.85 0.37 1.79 7.77 100
6 DasRef 7 17.959 5.12E-005 24.221 0.741 0.877 0.316 0.732 0 0 0 7.15 100
6 DasRef 8 17.97 5.26E-005 23.451 0.766 0.897 0.416 0.755 0 0.82 5.95 8.32 100
6 DasRef 9 15.107 2.26E-008 19.785 0.764 0.905 0.316 0.755 0 0.05 0.6 6.72 100
6 DasRef 10 29.226 9.91E-001 39.946 0.732 0.897 0.312 0.718 0.37 0.41 2.38 8.02 100
mean 0.289 1.965 7.654 100
6 Das 1 14.478 2.96E-009 19.893 0.728 0.885 0.333 0.705 23.85 0.64 8.33 0.87 10.71
6 Das 2 13.627 1.57E-010 18.774 0.726 0.885 0.347 0.705 17.24 0.6 7.14 0.84 10.12
6 Das 3 15.752 1.60E-007 21.759 0.724 0.874 0.217 0.72 17.05 0.55 7.14 0.62 9.52
6 Das 4 11.699 9.02E-014 16.151 0.724 0.885 0.316 0.702 23.48 0.64 8.33 0.92 11.31
6 Das 5 15.834 2.04E-007 21.147 0.749 0.885 0.361 0.738 13.39 0.37 4.76 0.65 8.93
6 Das 6 12.405 1.58E-012 17.08 0.726 0.885 0.333 0.702 24.59 0.64 8.33 0.92 11.9
6 Das 7 17.875 4.22E-005 24.363 0.734 0.869 0.237 0.731 13.21 0.55 7.14 0.6 8.93
6 Das 8 17.956 5.08E-005 24.06 0.746 0.877 0.334 0.743 13.02 0.41 5.36 0.62 8.33
6 Das 9 15.048 1.88E-008 20.942 0.719 0.885 0.347 0.694 17.98 0.6 7.14 0.79 10.12
6 Das 10 29.182 9.91E-001 41.226 0.708 0.877 0.25 0.698 19.82 0.64 7.74 0.62 6.55
mean 0.564 7.141 0.745 9.642
Problem Lab Num RMSD P-value DI INF INF_wc INF_nwc INF_stacking clash pct_badbonds pct_resbadbonds pct_badangles pct_resbadangles
10 DasRef 1 7.64 0.00E+000 8.876 0.861 0.929 0.7 0.861 0 0 0 6.67 98.83
10 DasRef 2 10.539 1.50E-015 12.191 0.864 0.938 0.802 0.847 0 0 0 6.59 98.83
10 DasRef 3 6.837 0.00E+000 8.157 0.838 0.936 0.717 0.823 0 0 0 6.46 98.83
10 DasRef 4 7.077 0.00E+000 8.305 0.852 0.938 0.717 0.842 0 0 0 6.7 98.83
10 DasRef 5 10.482 1.17E-015 11.944 0.878 0.938 0.778 0.87 0 0 0 6.73 98.83
mean 0 0 6.63 98.83
10 Das 1 7.58 0.00E+000 9.199 0.824 0.92 0.7 0.811 11.64 0.36 4.09 0.56 8.77
10 Das 2 10.447 9.99E-016 12.588 0.83 0.929 0.778 0.803 11.64 0.36 4.09 0.56 8.19
10 Das 3 6.803 0.00E+000 8.365 0.813 0.946 0.7 0.786 11.09 0.41 4.68 0.64 9.36
10 Das 4 7.062 0.00E+000 8.539 0.827 0.948 0.684 0.809 10.73 0.41 4.68 0.56 9.94
10 Das 5 10.417 8.88E-016 12.295 0.847 0.948 0.778 0.823 10.91 0.27 2.92 0.45 7.02
mean 0.362 4.092 0.554 8.656
pct_badbonds: percentage of bad bonds
pct_resbadbonds: percentage of residues with bad bonds
pct_badangles: percentage of bad angles
pct_resbadangles: percentage of residues with bad angles
*The refinements were done by Bujnicki lab to alleviate the atomic clashes in models of Das lab and named as DasRef.
Figure S1. Chemical mapping data and secondary structure predictions of RNA Puzzle 5. (A) Secondary structure from crystallographic structures. (B) Secondary structure prediction using no experimental data with RNAstructure
5.4 or 5.6 Fold. Nucleotides are colored according with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.
(C) Secondary structure prediction using no data with RNAstructure 5.6 ShapeKnots.
(D) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 Fold.
(E) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 ShapeKnots.
(F) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.4 Fold.
(G) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 Fold.
(H) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
(I) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 Fold.
(J) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
(K) Secondary structure prediction using 2D SHAPE M2 data with RNAstructure 5.6 Fold.
(L) Secondary structure prediction using 2D SHAPE M2 data with RNAstructure 5.6 ShapeKnots.
(M) Mutate-and-map (M2) dataset probed by the DMS. (N) Secondary structure prediction using 2D SHAPE M2 data with RNAstructure
5.4 Fold. (O) Secondary structure prediction using 2D DMS M2 data with RNAstructure 5.6
Fold. (P) Secondary structure prediction using 2D DMS M2 data with RNAstructure 5.6
ShapeKnots. Figure S2. Chemical mapping data and secondary structure predictions of RNA Puzzle 6.
(A) Secondary structure from crystallographic structures. (B) Secondary structure prediction without experimental data with RNAstructure
5.4 or 5.6 Fold. Nucleotides are colored with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.
(C) Secondary structure prediction using no data with RNAstructure 5.6 ShapeKnots.
(D) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 Fold.
(E) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 ShapeKnots.
(F) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.4 Fold.
(G) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 Fold.
(H) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
(I) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 Fold.
(J) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
(K) Secondary structure prediction using 2D SHAPE M2 data with RNAstructure 5.6 Fold.
(L) Secondary structure prediction using 2D SHAPE M2 data with RNAstructure 5.6 ShapeKnots.
Figure S3. Chemical mapping data and secondary structure predictions of RNA Puzzle 10.
(A) Secondary structure from crystallographic structures. (B) Secondary structure prediction without experimental data with RNAstructure
5.4 or 5.6 Fold. Nucleotides are colored with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.
(C) Secondary structure prediction using no data with RNAstructure 5.6 ShapeKnots.
(D) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 Fold.
(E) Secondary structure prediction using 1D SHAPE data with RNAstructure 5.6 ShapeKnots.
(F) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.4 Fold.
(G) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 Fold.
(H) Secondary structure prediction using 1D DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
(I) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 Fold.
(J) Secondary structure prediction using 1D SHAPE and DMS/CMCT data with RNAstructure 5.6 ShapeKnots.
Figure S4. Chemical mapping data and secondary structure predictions of RNA Puzzle 5.
(A) Normalized reactivity of RNA Puzzle 5 RNA, using SHAPE (1M7), DMS and CMCT in 1-dimensional chemical mapping. Reactivities were normalized to GAGUA referencing hairpins (not shown).
(B) Secondary structure prediction using 1-dimensional SHAPE (1M7) data. Nucleotides are colored with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.
(C) Mutate-and-map (M2) dataset probed by the SHAPE reagent 1M7. (D) Secondary structure prediction using 2D SHAPE M2 data.
Figure S5. Chemical mapping data and secondary structure predictions of RNA Puzzle 6.
(A) Normalized reactivity of RNA Puzzle 6 RNA, using SHAPE (1M7), DMS and CMCT in 1-dimensional chemical mapping, in presence of 60 μM adenosylcobalamin. Reactivities were normalized to GAGUA referencing hairpins (not shown).
(B) Secondary structure prediction using 1-dimensional SHAPE (1M7) data. Nucleotides are colored with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.
(C) Mutate-and-map (M2) dataset probed by the SHAPE reagent 1M7, in presence of 60 μM adenosylcobalamin.
(D) Secondary structure prediction using 2D SHAPE M2 data.
Figure S6. Chemical mapping data and secondary structure predictions of RNA Puzzle 10. (A) Normalized reactivity of RNA Puzzle 10 RNA, using SHAPE (1M7), DMS
and CMCT in 1-dimensional chemical mapping, in presence of 1 μM partner RNA strand. Reactivities were normalized to GAGUA referencing hairpins (not shown).
(B) Secondary structure prediction using 1-dimensional SHAPE (1M7) data. Nucleotides are colored with SHAPE reactivities. Crystallographic pairings missing in this model and new non-crystallographic pairings are drawn as yellow and blue lines, respectively. Percentage labels give bootstrap support values.