Precision and Accuracy of NMR Structures
NMR Structure Determination in the NESG
[Diagram: NESG structure production pipeline. Stages: Protein Production, NMR, Structure Determination & Validation, Structure Validation, Structures! Supporting resources: SPINS, HarvestDB, Structure Gallery, PDB entry, BMRB entry. www.nesg.org]
Overview
(i) The Problem of Precision and Accuracy in Protein NMR
(ii) Assessing Precision of NMR Structures
(iii) Assessing Accuracy of NMR Structures
- PSVS software
- RPF software
(iv) Summary
Shortcomings of Protein NMR Field
No Standard Conventions for Estimating Precision
Precision: Uncertainty in atomic positions indicated by the uncertainty in the data and underlying structural assumptions
How tightly the shots cluster
No Standard Conventions for Estimating Accuracy
Accuracy: Similarity of model to the actual structure(s) present in the NMR tube
How close you get to the bull’s eye
Challenges and Issues
Estimating Precision
RMSD - many issues.
- which atoms to include in superimposition? BB vs. heavy vs. H’s
- should we define “core” as a mix of BB and SC? Include H atoms?
- no standard of sampling: i.e., X structures / Y structures calculated
- how to represent disordered regions? Single structure with atom-specific uncertainties?
Estimating Accuracy
- Relaxation matrix (CORMA); R-fac; RPF; validation with RDC data.
- Knowledge-based assessment (ProCheck, WhatIf, MolProbity, etc), cf. crystal structures.
Constraint violations
- Constraints are interpreted data
- No standard for calibrating constraints
Constraints per residue
- Conformationally-restraining
- Constraints per restrained residue
- How to define restrained residue?
ProCheck / MAGE
- Derived from crystal structures
- Bona fide differences biologically relevant?
- Which residues to include/exclude?
Back calculation of NOEs - Relaxation Matrix
- Compare to NOESY Peak List?
- Exchange broadening, lineshape, differential relaxation effects
- Diagonal, ridges, overlap, residual water
Cross validation with RDC
- Not measured universally
Back calculation of Chemical Shift
H-bond Geometry
How to Assess Precision?
David Snyder, Roberto Tejero
Defining “Core” for Superimposition Using Dihedral Angle Order Parameter
Hyberts and Wagner, 1992
Best convention to date
“Ordered Residue”: S(φ) + S(ψ) > 1.8
How to distinguish surface loop from interdomain linker?
Stopping rule: 1 core or 2 cores? Or more?
Should we restrict core atom sets to backbone atoms?
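The ordered-residue convention above can be sketched in a few lines. This is a minimal illustration, assuming the usual circular-statistics form of the dihedral angle order parameter, S = (1/N) |sum of unit vectors (cos θ, sin θ)| over the N conformers of the ensemble. The function names, the residue numbers and the angle values are hypothetical and are not taken from the slides.

```python
import math

def angular_order_parameter(angles_deg):
    """Order parameter S for one dihedral angle across N conformers.

    Assumed form: S = (1/N) * |sum_j (cos a_j, sin a_j)|.
    S is close to 1 when the angle is the same in every conformer
    and close to 0 when the angles are spread around the circle.
    """
    n = len(angles_deg)
    sum_cos = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_sin = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_cos, sum_sin) / n

def ordered_residues(phi_by_residue, psi_by_residue, cutoff=1.8):
    """Return residue ids with S(phi) + S(psi) > cutoff (1.8 on the slide)."""
    ordered = []
    for res in sorted(phi_by_residue):
        if res not in psi_by_residue:
            continue  # terminal residues may lack one of the two angles
        s_sum = (angular_order_parameter(phi_by_residue[res])
                 + angular_order_parameter(psi_by_residue[res]))
        if s_sum > cutoff:
            ordered.append(res)
    return ordered

# Hypothetical 4-conformer ensemble: residue 10 is well defined,
# residue 11 samples very different backbone angles.
phi = {10: [-60.0, -62.0, -58.0, -61.0], 11: [-60.0, 60.0, 180.0, -120.0]}
psi = {10: [-45.0, -44.0, -47.0, -43.0], 11: [140.0, -40.0, 60.0, -150.0]}
core = ordered_residues(phi, psi)
print(core)
```

The cutoff only labels residues one at a time. It does not answer the questions raised on the slide about loops versus linkers or about how many cores to report.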
D.A. Snyder and G.T. Montelione, PROTEINS 2005, 59: 673-686. “Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles.”
Inter-atomic Variance Matrix (IVM): matrix of variances in inter-atomic distances. Can be used to partition core atoms into “ordered” vs “disordered”, and to identify “domains”
– Nilges, Clore, Gronenborn, 1987
– Gerstein & Altman, 1995
– Kelley, Sutcliffe, et al 1996, 1997
– Gelfand, et al 1998
FindCore: can define BB or heavy atom core atom sets
FindCore - Identify “Well Defined” Core Atom Sets
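To make the inter-atomic variance matrix concrete, here is a minimal sketch of how such a matrix could be computed from an ensemble. It is not the FindCore implementation: it only builds the matrix and leaves out the clustering step that partitions atoms into core sets. The function name, the use of the population variance and the toy coordinates are assumptions made for illustration.

```python
import math
from statistics import pvariance

def interatomic_variance_matrix(ensemble):
    """Variance of each inter-atomic distance across the conformers.

    ensemble: list of conformers, each a list of (x, y, z) tuples for
    the same atoms in the same order. Returns a symmetric
    n_atoms x n_atoms list of lists with zeros on the diagonal.
    Population variance is an assumption; a sample variance would
    work equally well for ranking atom pairs.
    """
    n_atoms = len(ensemble[0])
    ivm = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            dists = [math.dist(conf[i], conf[j]) for conf in ensemble]
            ivm[i][j] = ivm[j][i] = pvariance(dists)
    return ivm

# Hypothetical 3-conformer, 3-atom ensemble: atoms 0 and 1 keep a fixed
# separation, atom 2 moves relative to both.
ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 6.0)],
]
ivm = interatomic_variance_matrix(ensemble)
for row in ivm:
    print(["%.3f" % v for v in row])
```

Because only inter-atomic distances enter the matrix, the result does not depend on how the conformers are superimposed. That lets it define a core before any superimposition is chosen.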
How to Assess Precision?
We have a “convention” to define “core atom sets” for superimposition, but no convention for generating the ensemble, and no standard of precision
How to Assess Accuracy?
Yuanpeng Janet Huang, Aneeban Bhattacharya,
Dehua Hang, Roberto Tejero
Protein Structure Validation Software (PSVS) Suite
A. Bhattacharya, R. Tejero, G.T. Montelione (2007)
PROTEINS 66:778-795.
Protein Structure Validation Software (PSVS)
Bhattacharya, Tejero, Montelione, PROTEINS 2007
Tool(s) | Parameter(s) evaluated
PDBStat and FindCore (Tejero and Montelione; Snyder and Montelione) | Analyze number of conformationally-restricting constraints, violations of constraints, define ordered regions of structure and calculate RMSD of atomic coordinates, identify conformationally restricting restraints
RPF (Huang, et al, JACS 2005 127: 1665) | Goodness-of-fit of NMR structure with NOESY data
DSSP (Kabsch & Sander, Biopolymers 1983 22: 2577) | Calculate secondary structure
PROCHECK G-factors, for backbone and all dihedrals (Laskowski et al, 1996 JBNMR 8: 477) | Probability of dihedral angles of a residue type to be within a given range
MolProbity: MAGE, prekin, probe, reduce (Lovell S C, et al, Proteins 2003 50: 437) (Word et al, 2000 Prot Sci 9: 2251) | Calculate and visualize bad contacts and atomic overlaps, and Cβ deviations
Verify3D (Luthy et al, 1992 Nature 356: 83) | Likelihood of the amino acid sequence to have the three-dimensional packing seen in the structure
ProsaII (Sippl, 1990 J Mol Biol 213: 859) | Energy of pair-wise interaction from the spatial separation of atoms (Cβ atoms)
PDB validation software | Close contacts, deviations of bond length and bond angle from ideality
Protein Structure Validation Software Suite (PSVS)
“Rules of Thumb”
I. ProCheck(all) and MolProbity best distinguish low, med, high resolution crystal structures
II. Verify3D and ProsaII best distinguish incorrect folds
ProCheck and MolProbity Z Scores: X-ray
Bhattacharya, Tejero, Montelione, PROTEINS 2007
[Figure: Z scores for X-ray structures grouped by resolution: < 1.8 Å, 1.8 - 2.5 Å, 2.5 - 3.5 Å]
Structures determined with higher resolution data have better Z scores, suggesting that these scores do in fact track structure accuracy
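For readers unfamiliar with Z scores, the sketch below shows the generic normalization: a raw quality score is expressed in standard deviations from the mean of a reference set of structures. Three things here are assumptions and do not come from the slides: the choice of reference set, the sign convention (flipped so that a more negative Z is always worse) and every numeric value in the example.

```python
def z_score(raw_score, reference_mean, reference_sd, higher_is_better=True):
    """Express a raw quality score in standard deviations from a reference set.

    reference_mean and reference_sd are assumed to come from a calibration
    set of structures (hypothetical values below). For scores where a larger
    raw value is worse, such as a clash score, the sign is flipped so that
    a more negative Z always means a poorer structure.
    """
    z = (raw_score - reference_mean) / reference_sd
    return z if higher_is_better else -z

# Hypothetical numbers, for illustration only.
dihedral_z = z_score(-0.5, reference_mean=-0.1, reference_sd=0.2)
clash_z = z_score(30.0, reference_mean=10.0, reference_sd=5.0,
                  higher_is_better=False)
print(round(dihedral_z, 2), round(clash_z, 2))
```

Putting different knowledge-based scores on one Z scale is what allows them to be compared across tools and across X-ray and NMR structures.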
ProCheck and MolProbity Z Scores
X-ray | NMR
Following NMR Structure Refinement
Why is NMR different from X-ray?
- “Solution structure”
- Multiple conformational states?
- Less accurate structures?
Bhattacharya, Tejero, Montelione, PROTEINS 2007
Quality Scores for NESG NMR Structures Continue to Increase
[Figure: two bar charts, ProCheck All Dihedrals (axis 0 to -6) and MolProbity Clash Score (axis 0 to -20), by year from 2000 to 2006 plus Average; series: X-ray, NMR]
[Figure: scatter plot with axes ProCheck(All) and MolProbity, colored by year. 2006: red; 2005: green; 2004: blue; 2003: black; 2002: magenta; 2000-2001: yellow]
151 NMR - Crystal Structure Pairs
Filtered to be in same ligand state, similar pH
Analysis for FindCore core (bb and sc) atoms only
Line - rmsd of superimposed NMR ensemble: “PRECISION”
Shade - rmsd between median NMR conformer and Xtal structure: “ACCURACY”
Andrec, Snyder, Montelione, Levy, et al
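The two quantities plotted in this comparison can be illustrated with a small sketch. It assumes that all conformers and the crystal structure are already superimposed on the same core atom set. It takes “precision” as the mean RMSD of the conformers to the mean coordinates, and takes the “median conformer” to be the medoid, that is, the conformer with the smallest summed RMSD to the others. These definitions, the function names and the toy coordinates are assumptions, not the procedure used in the study.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered coordinate lists (no fitting)."""
    n = len(coords_a)
    return math.sqrt(sum(math.dist(a, b) ** 2
                         for a, b in zip(coords_a, coords_b)) / n)

def medoid_index(ensemble):
    """Index of the conformer with the smallest summed RMSD to the others."""
    totals = [sum(rmsd(c, other) for other in ensemble) for c in ensemble]
    return totals.index(min(totals))

def precision_and_accuracy(ensemble, reference):
    """Return (ensemble RMSD to mean coordinates, medoid RMSD to reference)."""
    n_conf = len(ensemble)
    n_atoms = len(ensemble[0])
    mean_coords = [tuple(sum(conf[i][k] for conf in ensemble) / n_conf
                         for k in range(3))
                   for i in range(n_atoms)]
    precision = sum(rmsd(conf, mean_coords) for conf in ensemble) / n_conf
    accuracy = rmsd(ensemble[medoid_index(ensemble)], reference)
    return precision, accuracy

# Hypothetical pre-superimposed data: a tight ensemble that sits 1 Å away
# from the reference structure.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(-0.1, 0.0, 0.0), (0.9, 0.0, 0.0)],
]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
precision_rmsd, accuracy_rmsd = precision_and_accuracy(ensemble, reference)
print(round(precision_rmsd, 3), round(accuracy_rmsd, 3))
```

In this toy case the ensemble is far tighter than its distance to the reference, which is the situation the slide describes when the line (precision) lies well below the shaded region (accuracy).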
1. NMR overestimates precision of the ensemble
2. NMR provides inaccurate global structure
- Ensemble averaging
- Just plain wrong
3. Xray is inaccurate
4. Crystallization shifts global conformational equilibria
Need to compare NMR parameters in solution and crystal - ssNMR
NMR RPF Scores: Protein NMR Structure Quality Assessment by Rapid Comparison
of NOESY and 3D Structure Data
Y. J. Huang, R. Powers, G.T. Montelione (2005)
J. Am. Chem. Soc. 127: 1665-74
NMR “R-factors” - RPF Quality scores
[Diagram: 3D Structure compared with NOESY Peak List / Assignment List]
Essentially, a comparison of calc and observed contact maps
Goodness-of-fit of the NOESY peak list data with 3D structure.
Violations map to the 3D structure and to the NOESY spectrum
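The contact-map comparison described above can be sketched as a set comparison. This is a simplified illustration, not the RPF program: it assumes every NOESY peak has already been assigned unambiguously to one proton pair, and it uses a hypothetical 5 Å cutoff to decide which proton pairs in the structure count as contacts. The proton labels and coordinates are made up.

```python
import math
from itertools import combinations

def calculated_contacts(protons, cutoff=5.0):
    """Proton pairs closer than `cutoff` (Å) in the 3D structure.

    protons: dict mapping a proton label to its (x, y, z) coordinates.
    The 5 Å cutoff is an assumed NOE detection limit.
    """
    return {frozenset((a, b))
            for a, b in combinations(protons, 2)
            if math.dist(protons[a], protons[b]) < cutoff}

def compare_contact_maps(observed_pairs, calculated):
    """Split contacts into matched, observed-only and calculated-only."""
    observed = {frozenset(p) for p in observed_pairs}
    as_sorted = lambda pairs: sorted(tuple(sorted(p)) for p in pairs)
    return {
        "matched": as_sorted(observed & calculated),
        "observed_only": as_sorted(observed - calculated),      # peak with no short distance
        "calculated_only": as_sorted(calculated - observed),    # short distance with no peak
    }

# Hypothetical protons and assigned NOESY peaks.
protons = {
    "A5.HN": (0.0, 0.0, 0.0),
    "A6.HN": (2.8, 0.0, 0.0),
    "L20.HA": (0.0, 4.0, 0.0),
    "V40.HB": (12.0, 0.0, 0.0),
}
noesy_pairs = [("A5.HN", "A6.HN"), ("L20.HA", "A5.HN"), ("A5.HN", "V40.HB")]
result = compare_contact_maps(noesy_pairs, calculated_contacts(protons))
print(result)
```

Each unmatched pair names specific protons, so it can be traced back both to a place in the structure and to a peak in the spectrum, which is the point made on the slide about mapping violations.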
NMR “R-factors” - RPF Quality scores
Recall: percentage of peaks detected in the NMR experiments that are consistent with the interproton distances of the 3D structures. False negatives (FN) are NOESY peaks not consistent with the 3D structure. Recall = TP / (TP + FN)
Precision: percentage of close-distance proton pairs in the query structures whose back-calculated NOE cross peaks are also actually detected in the NMR experiments, weighted by their distances as d(h1, h2)^-6. False positives (FP) are short distances in the 3D structure with no corresponding NOESY cross peak. Precision = TP / (TP + FP)
F-measure: the overall performance score calculated from the recall and precision. It provides a measure of the overall fit between the query model structure and the experimental data. F = (2 x Recall x Precision) / (Recall + Precision)
DP-score: measures how well the query structure is distinguished from the freely-rotating chain model, scaled to the completeness of the NOESY data (a normalized F-measure score).
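The formulas above translate directly into code. In this minimal sketch, recall and precision use plain counts, so the d^-6 weighting of precision is omitted. The DP-score is written as a linear rescaling of the F-measure between a freely-rotating-chain value and an ideal value. That form is an assumption consistent with the description “normalized F-measure score”, and the `f_free` and `f_ideal` inputs are hypothetical.

```python
def recall(tp, fn):
    """Fraction of observed NOESY peaks explained by the structure."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of short proton-proton distances with an observed peak.

    Unweighted counts; the d^-6 weighting is not reproduced here.
    """
    return tp / (tp + fp)

def f_measure(r, p):
    """Harmonic mean of recall and precision."""
    return 2.0 * r * p / (r + p)

def dp_score(f_query, f_free, f_ideal):
    """Assumed normalization: 0 for the freely rotating chain, 1 for ideal."""
    return (f_query - f_free) / (f_ideal - f_free)

# Recall and precision pairs reported later in the deck.
for r, p in [(0.825, 0.971), (0.769, 0.969), (0.729, 0.917)]:
    print(r, p, round(f_measure(r, p), 3))
```

Applied to the recall and precision pairs reported on the “Recall and Precision Violations” slide, `f_measure` gives 0.892, 0.857 and 0.812, matching the F values listed there.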
Recall and Precision Violations
Recall = 0.825, Precision = 0.971, F = 0.892 and DP = 0.723
Recall = 0.769, Precision = 0.969, F = 0.857 and DP = 0.629
Recall = 0.729, Precision = 0.917, F = 0.812 and DP = 0.508
[Figure: bar charts of recall, precision, F-score and DP-score (%, 0 to 100) for FGF-2, IL-13 and MMP-1. Structures compared: Freely Rotating Chain, Incorrect Fold I (beta), Incorrect Fold II (alpha), Incorrect Fold III (alpha+beta), AutoStructure/DYANA, AutoStructure/XPLOR, Expert I, Expert II, G-Ideal]
Sensitivity of the quality scores
[Figure: four scatter plots of Recall, Precision, F-measure and DP-score (y-axis, 0 to 1) against rmsd (x-axis, 0 to 12) for FGF-2, MMP-1 and IL-13. Structure classes: < 2 Å, partially correct, different fold. Values printed in the panels: (-0.795), (-0.459), (-0.882), (-0.866)]
Quality control of AutoStructure trajectories using AutoQF Scores
[Figure: F-score (%), DP-score (%), peaks assigned (%), mean difference (Å) and RMSD (Å) plotted against AutoStructure cycle (0 to 10) for FGF-2, MMP-1 and IL-13]
[Table: columns Cycle 1, Cycle 2, Cycle 10, Manual; rows Heavy-atom RMSD, Mean Differences, % Peak assigned, F, DP. Legible entries: >0.4, >0.6, >0.7, >0.9, < 1.0 Å]
RPF Module in AutoStructure
II. Peak Picking
example: StR5 project
[Figure labels: false positives, false negatives]
HR2106 PSVS / RPF Analysis
Human dynein light chain 2A
Calculated as a Monomer | Calculated as a Dimer
MAGE Clash: Knowledge Based Assessment
RPF Precision Violations: Goodness of fit to NOESY data
Input for PSVS / RPF
Xray Structure
- Coordinates
- Resolution, R, Rfree
NMR Structure
- Coordinates (ensemble)
- Constraint List: Dyana, Cyana, Xplor, CNS format
RPF Analysis
- Resonance Assignments: BioMagResDB format
- NOESY Peak List: Frequencies, Intensities
PSVS runs on any Web browser
PSVS results in minutes
E-mail sent to user
Summary
(i) Precision: RMSD; FindCore
(ii) Accuracy: PSVS; RPF
value of multiple structure quality assessment scores
Other issues:
• BMRB: AVS software
• Presentation of Structures: stereoview
• Descriptions in PDB header
- exact ordered and disordered residue ranges
Acknowledgments
Gaetano Montelione
Software Developers: Hunter Moseley, Janet Huang, Mike Baran, Dehua Hang, Roberto Tejero
NMR Group: Paolo Rossi, Swapna Gurla
Protein Production Group: Tom Acton, Rong Xiao