28
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

This article appeared in a journal published by Elsevier. The attachedcopy is furnished to the author for internal non-commercial researchand education use, including for instruction at the authors institution

and sharing with colleagues.

Other uses, including reproduction and distribution, or selling orlicensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of thearticle (e.g. in Word or Tex form) to their personal website orinstitutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies areencouraged to visit:

http://www.elsevier.com/copyright

Page 2: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

Automated NMR assignment and protein structure determination using sparsedipolar coupling constraints

Bruce R. Donald *, Jeffrey MartinDepartments of Computer Science and Biochemistry, Duke University, D101 LSRC, Research Drive, Durham, NC 27708-0129, USA

a r t i c l e i n f o

Article history:Received 25 September 2008Accepted 15 December 2008Available online 23 January 2009

Published by Elsevier B.V.

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1011.1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

2. The power of exact solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1052.1. Computing the globally-optimal solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062.2. Limitations and extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

3. NMR structure determination algorithms using sparse RDCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084. Nuclear vector replacement for automated NMR assignment and structure determination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1115. Protein fold determination via unassigned residual dipolar couplings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126. Automated NOE assignment using a rotamer library ensemble and RDCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1137. NMR structure determination of symmetric homo-oligomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1158. Applications and connections to other biophysical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1159. Looking under the hood: how the algorithms work, and outlook for future developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

9.1. Exact solutions for computing backbone dihedral angles from RDCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169.2. Nuclear vector replacement and fold recognition using unassigned RDCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1189.3. Automated NOE assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209.4. NMR structure determination of symmetric oligomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

1. Introduction

The introduction of residual dipolar couplings (RDCs) for pro-tein structure determination over 10 years ago has energizeddevelopment of NMR methods. Robust automation of the completeNMR structure determination procedure has been a long-standing

goal, and RDC-based algorithms may increase the consistency andreliability of NMR structural studies. It has also been recognizedthat structure determination based primarily on orientational re-straints could be quicker and more accurate than traditionaldistance-restraint methods. Furthermore, NMR is increasinglyimportant in applications where structural information is alreadyavailable, so that methods which effectively automate NMR assign-ment of known structures would also be a substantial contribution.

Since RDCs are measured in a global coordinate frame, theyenable molecular replacement-like methods that perform assign-ments using structural priors. Furthermore, recent methods forstructure determination have exploited novel RDC equations,which combine RDC data and protein kinematics. Under fairly mildassumptions, the dihedral torsional angles of a protein can be ana-lytically expressed as roots of these low-degree monomials. Solv-ing these equations exactly has enabled a departure from earlierstochastic methods, and led to linear-time, combinatorially-precise

0079-6565/$ - see front matter Published by Elsevier B.V.doi:10.1016/j.pnmrs.2008.12.001

Abbreviations: NMR, Nuclear Magnetic Resonance; ppm, parts per million;RMSD, mean square deviation; HSQC, heteronuclear single quantum coherencespectroscopy; NOE, Nuclear Overhauser Effect; RDC, residual dipolar coupling; PDB,Protein Data Bank; pol g, zinc finger domain of the human DNA Y-polymerase Eta;CH, Ca � Ha; hSRI, human Set2-Rpb1 interacting domain; FF2, FF Domain 2 ofhuman transcription elongation factor CA150 (RNA polymerase II C-terminaldomain interacting protein); POF, principal order frame; SA, simulated annealing;MD, molecular dynamics; SSE, secondary structure element; C0 , carbonyl carbon;WPS, well-packed satisfying; vdW, van der Waals; DOF, degrees of freedom.

* Corresponding author. Tel.: +1 919 660 6583.E-mail address: [email protected] (B.R. Donald).URL: http://www.cs.duke.edu/brd (B.R. Donald).

Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Contents lists available at ScienceDirect

Progress in Nuclear Magnetic Resonance Spectroscopy

journal homepage: www.elsevier .com/locate /pnmrs

Page 3: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

algorithms for NMR structure determination. These algorithms areoptimal in terms of combinatorial (but not algebraic) complexity,and show how structural data can be used to produce a determin-istic, optimal solution for the protein structure in polynomial time.

The coefficients of the RDC equations are determined by the data.An RDC error bound therefore defines a range of coefficients, which,in turn, yield a range of roots representing the structural dihedral an-gles. Hence, the RDC equations define an analytical relationship be-tween the RDC error distribution, and the coordinate error of theensemble of structures that satisfy the experimental restraints. Pre-cise methods that relate the experimental error to the coordinate er-ror of the computed structures therefore appear within reach. Thisarticle reviews these and other recent advances in NMR assignmentand structure determination based on sparse dipolar couplings.

Color versions of the figures in this paper are available online athttp://www.sciencedirect.com/science/journal/00796565.

1.1. Background

While automation is revolutionizing many aspects of biology,the determination of three-dimensional (3D) protein structureremains a harder, more expensive task. Novel algorithms and com-putational methods in biomolecular NMR are necessary to applymodern techniques such as structure-based drug design and struc-tural proteomics on a much larger scale. Traditional (semi-) auto-mated approaches to protein structure determination throughNMR spectroscopy require a large number of experiments and sub-stantial spectrometer time, making them difficult to fully auto-mate. A chief bottleneck in the determination of 3D proteinstructures by NMR is the assignment of chemical shifts and nuclearOverhauser effect (NOE) restraints in a biopolymer.

The introduction of residual dipolar couplings (RDCs) for proteinstructure determination enabled novel attacks on the assignmentproblem, to enable high-throughput NMR structure determination.Similarly, it is difficult to determine protein structures accuratelyusing only sparse data. New algorithms have been developed tohandle the increased spectral complexity encountered for largerproteins, and sparser information content obtained either in ahigh-throughput setting, or for larger or difficult proteins. The over-all goal is to minimize the number and types of NMR experimentsthat must be performed and the amount of human effort requiredto interpret the experimental results, while still producing an accu-rate analysis of the protein structure.

This review is tempered by our recent experiences in automatedassignments [79,82,83,118,153,174], novel algorithms for proteinstructure determination [152,156,117,89,110,151,155,154], char-acterization of protein complexes [118,99] and membrane proteins[117], and fold recognition using only unassigned NMR data[82,83,78,80]. Recent algorithms for automated assignment andstructure determination based on sparse dipolar couplings repre-sent a departure from the stochastic methods frequently employedby the NMR community (e.g., simulated annealing/moleculardynamics (SA/MD), Monte Carlo (MC), etc.) A corollary is that suchstochastic methods, now routinely employed in NMR structuredetermination pipelines [60,53,91,64], should be reconsidered inlight of their inability to assure identification of the unique or glob-ally-optimal structural models consistent with a set of NMR obser-vations. In this vein, our review focuses on sparse data. While SA/MD may perform adequately in a data-rich, highly-constrained set-ting, it is difficult to determine protein structures accurately usingonly sparse data. Sparse data arises not only in high-throughputsettings, but also for larger proteins, membrane proteins [117],symmetric protein complexes [118], and difficult systems includ-ing denatured or disordered proteins [154]. Sparse-data algorithmsrequire guarantees of completeness to ensure that solutions are notmissed and local minima are evaded.

We caution that in the context of NMR, ‘‘high-throughput” is rel-ative, and currently not as rapid as, for example, gene sequencing oreven crystallography. Hence the term ‘‘batch mode” may be moreappropriate. The challenge is to develop new algorithms and com-puter systems to exploit sparse NMR data, demonstrating the largeamount of information available in a few key spectra, and how itcan be extracted using a blend of combinatorial and geometric algo-rithms. Moreover, because of their (relative) experimental simplic-ity, we hypothesize that the computational advantages offered bysuch approaches should ultimately obtain an integrated system inwhich automated assignment and calculation of the global foldcould be performed at rates comparable to current-day proteinscreening for structural genomics using 15N-edited heteronuclearsingle quantum coherence spectroscopy (15N-HSQC).

This article reviews how sparse dipolar couplings can beexploited to address key computational bottlenecks in NMR struc-tural biology. The past few years have yielded rapid progress inautomated assignments, novel algorithms for protein structuredetermination, characterization of protein complexes and mem-brane proteins, and fold recognition using only unassigned NMRdata. We review recent algorithms that assist these advances,including: (1) Sparse-data algorithms for protein structure determi-nation from residual dipolar couplings (RDCs) using exact solutionsand systematic search; (2) RDC-based molecular replacement-liketechniques for structure-based assignment; (3) Structure determi-nation of membrane proteins and complexes, especially symmetricoligomers, enabled by RDCs; and (4) Automated assignment ofNOE restraints in both monomers and complexes, based on back-bones computed primarily using sparse RDC restraints.

These define the four main themes in our review:(1) It is difficult to determine protein structures accurately

using only sparse data. Sparse data arises not only in high-through-put settings, but also for larger proteins, membrane proteins, andsymmetric protein complexes. For de novo structure determina-tion, there are now roots-of-polynomials approaches to computeexact solutions, by systematic search, for internuclear bond vectorsand backbone dihedral angles using as few as 2 recorded RDCs perresidue (for example NH in two media, or NH and Ha—Ca in onemedium). By combining systematic search with exact solutions,it is possible to efficiently compute accurate backbone structuresusing less NMR data than in traditional approaches.

De novo structure determination from sparse dipolar couplingscan exploit structure equations derived by Wang and Donald[152,151]. These include a quartic equation to compute the inter-nuclear (e.g., bond) vectors from as few as 2 recorded RDCs per res-idue, and quadratic equations to subsequently compute proteinbackbone ð/;wÞ angles exactly [152,151]. The structure equationsmake it possible to compute, exactly and in constant time, thebackbone ð/;wÞ angles for a residue from very sparse RDCs. Simu-lated annealing, molecular dynamics, energy minimization, anddistance geometry are not required, since the structure is com-puted exactly from the data. Novel algorithms build upon these ex-act solutions, to perform protein structure determination, usingmostly RDCs but also sparse NOEs. For example, the RDC-EXACT algo-rithm employs a systematic search with provable pruning, todetermine the conformation of helices, strands, and loops and tocompute their orientations using exclusively the angular restraintsfrom RDCs [152,156]. Then, the algorithm uses very sparse dis-tance restraints between these computed segments of structure,to determine the global fold.

(2) Algorithms using sparse dipolar couplings can accelerate pro-tein NMR assignment and structure determination by exploiting apriori structural information. By analogy, in X-ray crystallography,the molecular replacement (MR) technique allows solution of thecrystallographic phase problem when a ‘‘close” or homologousstructural model is known, thereby facilitating rapid structure

102 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 4: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

determination. In contrast, a key bottleneck in NMR structural biol-ogy is the assignment problem – the mapping of spectral peaks totuples of interacting atoms in a protein. For example, peaks in a 3Dnuclear Overhauser enhancement spectroscopy (NOESY) experi-ment establish distance restraints on a protein’s structure by identi-fying pairs of protons interacting through space. An automatedprocedure for rapidly determining NMR assignments given anhomologous structure, can similarly accelerate structure determi-nation. Moreover, even when the structure has already been deter-mined by crystallography or computational homology modeling,NMR assignments are valuable because NMR can be used to probeprotein–protein interactions and protein–ligand binding (via chem-ical shift mapping or line-broadening), and dynamics (via, e.g.,nuclear spin relaxation). Molecular replacement-like approachesfor structure-based assignment of resonances and NOEs, includingstructure-based assignment (SBA) algorithms, can be applied whenan homologous protein is known. Moreover, to find structuralhomologs, it is possible to apply (filter) modules of SBA to a struc-ture database (as opposed to a single structure). This techniqueperforms rapid fold recognition by correlating structural geometryvs. distributions of unassigned NMR data, enabling detection ofhomologous structures before assignments [82,83,78,145,97,80].Hence, the algorithm finds candidate homologs using only unas-signed spectra; then SBA algorithms perform assignments giventhe structural homolog.

Several algorithms have been proposed for structure-basedassignment using RDCs [4,3,63,82,83,79,67,64]. For example, Nu-clear Vector Replacement (NVR) [79] exploits RDCs to performstructure-based assignment (backbone and NOEs) of proteinswhen a homologous structure is known, and requires only 15Nlabeling. NVR was a step in developing a molecular replacement-like method for NMR (useful because many NMR studies, especiallyin drug design and pharmacology, are of homologous proteins).NVR exploits a priori structural information. Automated proce-dures for rapidly determining NMR assignments given an homolo-gous structure, can accelerate structure determination, sinceassignments must generally be obtained before NOEs, chemicalshifts, RDCs, and scalar couplings can be employed for structuredetermination/refinement. NVR offers a high-throughput mecha-nism for the required assignment process. However, the spectralassignment produced by NVR is itself an important product: evenwhen the structure has already been determined by X-ray crystal-lography or computational homology modeling, NMR assignmentsare valuable for structure–activity relation (SAR) by NMR [133,54]and chemical shift mapping [18], which compare assigned NMRspectra for an isolated protein and a protein:ligand or protein:pro-tein complex. Both are used in high-throughput drug activityscreening to determine binding modes. Assignments are also nec-essary to determine the residues implicated in the dynamics datafrom nuclear spin relaxation (e.g., [113,112,71]). Building on NVR[79], the algorithm GD [78] performs rapid fold recognition (viageometric hashing against a protein structural database) to corre-late distributions of unassigned NMR data. GD exploits novel ap-proaches for alignment tensor estimation from unassigned RDCs[78,82,83], to perform maximum-likelihood resonance and NOEassignment [80] (in the NVR framework) against the PDB, to detectthe fold even before assignments.

In contrast to traditional methods, the set of NMR experimentsrequired by NVR and RDC-EXACT is smaller, and requires less spec-trometer time. While these algorithms have exploited uniform13C-/15N-labeling [151,156], NVR and RDC-EXACT have been success-fully applied to experimental spectra from different proteins usingonly 15N-labeling, a cheaper process than 13C labeling (cf. Wüthrich[166]: ‘‘A big asset with regard to future practical applications. . . [is]. . . straightforward, inexpensive experimentation. This applies to theisotope labeling scheme as well as to the NMR spectroscopy. . .”).

(3) We will review recent algorithms that assist in determiningmembrane protein structures. In such systems RDCs serve severalfunctions. First, RDCs enable accurate subunit backbone structurecalculation in complexes. Second, in symmetric homo-oligomers,the RDCs aid in determining the symmetry axis [2,168]. Thesetwo advantages enable complete algorithms for NOE assignmentand structure determination that overcome limitations of thesimulated annealing/molecular dynamics (SA/MD) methodologywhen the data are sparse. Several methods, including SYMBRANE

and AMBIPACK, cast the problem of structure determination for sym-metric homo-oligomers (such as many membrane proteins) into asystematic search of symmetry configuration space, automaticallyassigning NOEs and handling NOE ambiguity while provably char-acterizing the uncertainty in the structural ensemble [117,118].

Membrane proteins present experimental and computationalchallenges. Structural studies can be difficult if a protein is hardto crystallize (for X-ray) or is not well-behaved in an artificialmembrane (for NMR). Many membrane proteins are symmetricoligomers. In an n-mer, identical electronic environments obtainidentical chemical shifts, thereby boosting each signal approxi-mately n-fold. However it is not possible to distinguish signalsfrom symmetric atoms in different subunits. The ambiguity in in-ter-subunit NOEs (with identical chemical shifts) adds to the usualchemical shift ambiguity in assigning NOEs in monomers (with‘merely’ similar shifts). While the latter could, in principle, be re-solved experimentally (for example using 4D NOESY [21]), the for-mer is inherent in the symmetry and cannot currently be resolvedby experimental methods: computational solutions are required.On the other hand, the symmetry (which is known to exist fromthe signal overlap) can be used as an explicit kinematic constraintduring structure determination. SYMBRANE is both complete, in thatit evaluates all possible conformations, and data-driven, in that itevaluates conformations separately for consistency with experi-mental data and for quality of packing. Completeness ensures thatthe algorithm does not miss the native conformation, and beingdata-driven enables it to assess the structural precision possiblefrom data alone. SYMBRANE performs a branch-and-bound search inthe symmetry configuration space. It eliminates structures incon-sistent with intersubunit NOEs, and then identifies conformationsrepresenting every consistent, well-packed structure. SYMBRANE hasbeen used to determine the complete ensemble of NMR structuresof unphosphorylated human cardiac phospholamban [117], a pen-tameric membrane protein. SYMBRANE addresses some of the chal-lenges of protein complex determination, larger proteins, and thedifficulties arising from symmetry and NOE ambiguity. By runningSYMBRANE using different priors (starting symmetries) encoding theputative oligomeric number, one can determine, solely from NMRdata, the maximum-likelihood oligomeric state.

(4) Since accurate protein backbone structures can be computedfrom RDCs, these backbones can then be used to bootstrap NOEassignment. Novel techniques, including the algorithms TRIANGLE

[153] and HANA [174], exploit the accurate, high-throughputbackbone structures obtained exactly using sparse RDCs. NOEassignment can be difficult to fully automate, and structure determi-nation of symmetric membrane proteins by NMR can be challenging.We review how, by combining these two difficult problems, recent re-sults indicate an algorithm that solves both simultaneously, and thatenjoys guarantees on its completeness and complexity [117,118].

An overview of the major steps to automated assignment andstructure determination using sparse RDCs is given in Fig. 1. The fig-ure suggests how these algorithms and software tools could bedeveloped into a set of integrated programs for automated foldrecognition, assignment, monomeric and oligomeric structuredetermination. For each of the modules in the figure, there are algo-rithms and implementations reported by several groups working onNMR methodology. One example for each module is shown in the

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 103

Page 5: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

figure, and should be interpreted as a representative for a class ofalgorithms (reviewed below) with similar function. Fold recognition[82,83,78,145,97,80,128] denotes correlation of unassigned NMRdistributions (e.g., RDCs) against a database of known folds. Struc-ture-based assignment (SBA) [4,3,63,82,83,79,67,64] denotes auto-mated assignment given priors on the putative structure(s) of theprotein. Note that, like sparse data and completeness, SBA is a cross-cutting theme: NVR uses priors on putative homologs (detected byGD) to assign resonances (and unambiguous HN—HN NOEs). Algo-rithms for protein structure determination based on exact solutionsto the RDC equations include [122,159,152,151,156,154,155,174].RDC-EXACT is one such algorithm [151,152,156]. While the algorithmsof [38,69,167,139,61,95,119] are not exact, it is likely that a roots-of-polynomials exact solutions version of these algorithms could be de-rived, although possibly not in closed-form. TRIANGLE uses backbonestructure (computed by RDC-EXACT) to assign ambiguous backboneand side-chain NOEs. Several algorithms exist to determine thestructure of symmetric homo oligomers using a combination ofRDCs, NOEs, and other NMR data [150,117,118,129,168]. SYMBRANE

[117,118,139,61,95,119] and AMBIPACK [150] exploit the subunit(monomer) structure to assign intermolecular NOEs and determinethe complex structure. Finally, note that assignment (NVR) and foldrecognition (GD) operate entirely on unassigned data. Structuredetermination by RDC-EXACT operates on assigned data.

This article concentrates upon the information content of theNMR experiments, and the methods for assignment and structuredetermination, with an emphasis, where possible, on provablealgorithms with guarantees of soundness, completeness, and com-plexity bounds. A number of excellent articles have appeared onthe experimental aspects of RDCs; we recommend [160] for a goodintroduction to RDCs and the interplay between experimental andcomputational challenges.

Rather than describing a competition between computer pro-grams, this review tries to evaluate the strengths and weaknessesof the underlying ideas (algorithms). There are several reasons.

First, we believe no one will be using the same programs in 10years (and if we are, that would reflect poorly on the field). How-ever, the underlying mathematical relationships between the dataand the structures should prove enduring, warranting a character-ization of the completeness, soundness, and complexity of struc-ture determination algorithms exploiting sparse dipolar couplings.

Fig. 2. The scalar component of the residual dipolar coupling. �h is Planck’s constant,l0 is the magnetic permeability of vacuum, ca and cb are the gyromagnetic ratios oftwo nuclei a and b, and h is the angle between the external magnetic field B0 and theinternuclear vector v (from a to b) in the weakly-aligned anisotropic phase. h3 cos2 h�1

2 iis the ensemble average of the second Legendre polynomial of cos h. ra;b is thedistance between nuclei a and b. Here, a and b are assumed to be covalently bonded,and therefore the ensemble average hr3

a;bi in the denominator is replaced with thesingle scalar r3

a;b . In classical solution-state NMR, proteins tumble rapidly andisotropically and therefore the dipolar couplings average to zero. RDCs aremeasured by introducing a dilute alignment medium which biases the orientationaldistribution of the protein so that dipolar couplings can be measured. In contrast toNOEs, whose magnitude is proportional to the interatomic distance to the inversesixth power, RDCs are proportional to 1=r3

a;b . The alignment tensor S represents themolecular alignment in the anisotropic phase. It is convenient to express theresidual dipolar coupling in Yan–Donald tensor notation [82,83], as DmaxvT Sv.

ELGNAIRT

DG

sCDR

BDP larutcurtSledom

RVNenobkcaBecnanoseRstnemngissA

TCAXE-CDR

sCDR

ralucelomretnIsCDR;sEON

sEONesrapS

YSEONartcepS

EON,stnemngissA

remonoMerutcurtS

MYS ENARBciremogilO,rebmunxelpmoCerutcurtS

enobkcaBerutcurtS

SECAP hpargRa blevoN

dloF

suougibmanU&stnemngissaEON

wasgiJ c

sCDR

d

Fig. 1. Overview of the major modules in automated NMR assignment and protein structure determination using sparse dipolar coupling constraints. One example for eachmodule is shown in the figure, and should be interpreted as a representative for a class of algorithms (reviewed in the text) with similar function. GD performs rapid folddetermination from unassigned NMR data. NVR performs structure-based assignment. RDC-EXACT determines backbone structure de novo, from 2 RDCs per residue plus sparseNOEs. TRIANGLE assigns the NOESY spectra, allowing determination of a high-resolution monomer structure. SYMBRANE assigns intermolecular NOEs and determines theoligomeric number and complex structure. Each of these modules takes as input NMR data that can be collected in a high-throughput fashion. The major data sources areshown; complete descriptions of the data requirements are in the text. The solid arrows show the data flow for the molecular replacement-like method for NMR. Dashes showan alternative pathway for de novo assignment and structure determination in the case where a completely novel fold is detected (by GD) from unassigned NMR data. a

PACES,b

RGRAPH and cJIGSAW are ab initio assignment algorithms [22,105,149,70,148,12]. Right solid arrows show the data flow from structure to assignments. Downward arrows show

the data flow from assignments to structure. dSYMBRANE simultaneously performs assignment and structure determination.

104 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 6: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

2. The power of exact solutions

Let us consider an analogy. A point-mass p is fired from a can-non with velocity v, where v is a tangent vector to Euclideanthree-dimensional space R3. Assuming Newtonian dynamics,when, and where, will it hit the ground?

This problem can be solved by numerical forward-integration ofthe dynamical equations of motion, or by random guessing (alsoknown as Monte Carlo sampling), simulated annealing, neural net-works, genetic algorithms, systematic grid search, or a host of othertechniques. However, the following simple technique, from mid-dle-school physics, suffices. The trajectory of the mass p is givenby a quadratic equation in one scalar variable (time). By solving

Fig. 3. The alignment tensor S is a symmetric second-rank tensor that may berepresented by a real-valued 3� 3 matrix that is symmetric and traceless. Hence Shas 5 degrees of freedom and may be decomposed using singular value decom-position (SVD) into a rotation matrix U, called the principal order frame, and adiagonal matrix R encoding its eigenvalues. The principal axes of U encode theeigenvectors. For a fixed experimental RDC D, the possible orientations of thecorresponding internuclear vector v must lie on one of two RDC curves on the two-dimensional sphere S2. Each curve is the intersection of an ellipsoidal cone with S2.(Lower left) RDCs curves are shown spaced at 1 Hz intervals (figure courtesy ofVincent Chen and Tony Yan).

Fig. 4. It is convenient to express the internuclear vector as a unit vector of the formv, corresponding to its direction cosines. The RDC r can be expressed in Yan–Donaldtensor notation [82,83], as r ¼ DmaxvT Sv, or in a principal order frame thatdiagonalizes the alignment tensor, namely r ¼ Sxxx2 þ Syyy2 þ Szzz2. Here, Sxx; Syy

and Szz are the three diagonal elements of a diagonalized Saupe matrix S (thealignment tensor), and x; y and z are, respectively, the x; y; z-components of the unitvector v in a principal order frame (POF) which diagonalizes S.

Fig. 5. A cartoon of the geometry and algebra of RDCs measured in twoindependent aligning media. If the two principal order frames (POFs) are indepen-dent, then the internuclear unit vector v is constrained to simultaneously lie on theblue RDC curve (from the blue POF) and the red RDC curve (from the red POF).Generically, the blue and red RDC curves will intersect at 0, 2, 4, or 8 points. Here,only one of the two RDC curves is shown for each POF. Suppose that r is the red RDCand that the diagonalized red POF can be represented as ðSxx; Syy; SzzÞ. Letu ¼ 1� 2 x

a

� �2, where x is the x-component of v and a2 ¼ ðr � SzzÞ=ðSxx � SzzÞ; seeEq. (6) below and [[152], p. 238]. The discrete points corresponding to the RDCcurve intersections are calculated exactly by solving a quartic monomial equationin u, of the form f4u4 þ f3u3 þ f2u2 þ f1uþ f0 ¼ 0 [152], which is also a quarticmonomial equation in x2.

Fig. 6. Key ingredients to a structure determination algorithm exploiting exactsolutions and systematic search.

Fig. 7. Protein backbone kinematics and RDC restraints.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 105

Page 7: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

this equation simultaneously with the plane of the ground, z ¼ 0,the solution to our problem may be calculated exactly, in closedform, and in constant-time (using only a constant number of com-puter operations). In this case, ‘‘closed-form” means using only thefield operations (addition, subtraction, multiplication, and divi-sion) plus calculating roots

ffiffi�jp

. In this case, j 6 2.A similar trick is available to assist in protein structure determi-

nation, (Figs. 2–7) when we have measured dipolar couplings(Fig. 6). A simplified example will be helpful to understand theidea. The example arises in protein structure determination fromRDCs measured in one medium, using exact solutions.

Suppose we have recorded RDCs (Figs. 2–4) in a single align-ment medium for NH and Ca—Ha bond vectors (Fig. 7), and thatsecondary structure regions have been identified using eitherchemical shifts, short-range NOEs, or scalar coupling experimentssuch as HNHA or J-doubling to measure the / bond angles. Con-sider the simplified problem of computing the orientation and con-formation of a secondary structure element (helix or strand) hcontaining k residues, and that a good estimate of the alignmenttensor is available. As described in [152,151,156], an initial esti-mate of the alignment tensor may be obtained by fitting paramet-ric ideal helical geometry to RDCs from a secondary structureelement such as a helix. The alignment tensor can be subsequentlyrefined in an iterative fashion [152].

Henceforth we will simply refer to the Ca—Ha bond as a CHbond, and to a Ca—Ha RDC as a CH RDC. Let us assume standardprotein backbone geometry (Fig. 7); our example proceeds by anal-ogy with the mathematical concept of strong induction. Assumewe have already computed the structure of the first i� 1 < k resi-dues of h starting at the N-terminus. In this case, the ði� 1Þst pep-tide plane (between residues i� 1 and i) is known (Fig. 7). As the ith

/ dihedral angle, /i rotates, the orientation of the ith Ca—Ha bondvector will move in a circle (Figs. 7, 8). Under any change of coor-dinate system, this circle will transform to an ellipse, E, on the two-dimensional sphere S2. Such an ellipse is shown in green in Fig. 8.

For each RDC r, the dipolar coupling is given by the top equationin Fig. 2, as shown in [140,142]. It is convenient to express theresidual dipolar coupling in Yan–Donald tensor notation [82,83], as

r ¼ DmaxvT Sv; ð1Þ

where Dmax is the dipolar interaction constant, v is the internuclearvector orientation relative to an arbitrary substructure frame, and Sis the 3� 3 Saupe order matrix [130] (Fig. 2). S is a symmetric, trace-less, rank 2 tensor with 5 degrees of freedom, which describes theaverage substructure alignment in the weakly-aligned anisotropic

phase (Figs. 2, 3). The measurement of five or more independentRDCs in substructures of known geometry allows determinationof S [92].

Now, if the RDC has been measured for the CH bond vector inresidue i, then the RDC Eq. (1) constrains the bond vector orienta-tion to lie on one of two curves R ¼ R1 [ R2. Each of these curves isthe intersection of S2 with an ellipsoidal cone. One such curve isshown in orange in Fig. 8.

Therefore, the /i angles that simultaneously satisfy proteinbackbone kinematics and the CH RDC data are given by the inter-section of curves E and R, shown as the green and orange ellipses,respectively, in Fig. 8. Generically, this intersection will be a set of0, 2, or 4 points (the 1-point solution is non-generic), as shown inFig. 8. Wang and Donald showed that these points are the roots ofa quartic monomial equation [151,152] and hence may be com-puted exactly and in closed-form. (Technically, we compute exactsolutions for the sine and cosine of this angle /i, which com-pletely determine the angle /i, which, if desired, can then becomputed numerically using the 2-argument arctangent functionATAN2).

Now, a solution is chosen for /i from amongst these multipleexact solutions, and the procedure continues along the polypeptidechain. With /i fixed, we find ourselves in a symmetrical situation.As the wi bond angle rotates, the orientation of the NH vector ofresidue iþ 1 will move in an ellipse E0 (Fig. 8). Similarly, if theRDC has been measured for this NH bond then its orientation willsimilarly be constrained to lie on curves R0 on S2. Therefore, the ori-entation of the ðiþ 1Þst NH bond vector must lie on the intersectionof the curves E0 and R0, and the wi angles satisfying this constraintcan similarly be solved for exactly and in closed-form, as done pre-viously for /i.

Again, a solution is chosen for wi from amongst these multipleexact solutions, which defines the ith peptide plane (between res-idues i and iþ 1), and the procedure continues along the polypep-tide chain.

Every exact solution precisely satisfies the data. Since there aremultiple solutions for each backbone dihedral angle a choice mustbe made, and this defines a discrete combinatorial search for thestructure of the secondary structure element h. A scoring functionis used to choose the correct root, and hence the correct backbonedihedral angle. The scoring function can use the Ramachandrandiagram, molecular mechanics energies, and any of the usual com-ponents of an empirical scoring function [152,151,156,154]. Bystructuring the search into a conformation tree [152,88,156,45]and using a depth-first search with backtracking [152], or A* search[45], the optimal solution over the entire secondary structureelement h can be found (Fig. 9). Henceforth, we will call such a sec-ondary structure element h a fragment.

The RDC RMSD term in the scoring function calculates the sumof the squared differences between the experimental RDCs and theback-computed RDCs over all k residues of the secondary structureelement h [152]. Minimizing the scoring function over the combi-natorial number of choices of the polynomial roots representingthe backbone dihedral angles, will yield in the structure that opti-mally fits the data [152,156]. Unlike some traditional methods (SA/MD, MC, etc.) that can only compute local minima, this techniqueis guaranteed to compute the globally optimal solution for h. Wediscuss this point below.

2.1. Computing the globally-optimal solution

It is important to note that the choice of backbone dihedral an-gle is not made locally solely using the RDC information for thatresidue. Rather, the scoring function includes an RDC RMSD term,so that the global optimum is computed over the entire fragment[152,156]. By global optimal we mean the minimum of the scoring

Fig. 8. Given NH and Ca—Ha RDCs measured in one medium, the Wang–Donaldstructure equations yield exact solutions for the ð/;wÞ backbone dihedral angles.

106 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 8: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

function, where the minimum is taken over all ð/;wÞ angles in thefragment that are zeroes of the structure equations. While a gridsearch over all (discretized) ð/;wÞ angles is not computationallyfeasible, a complete search with full backtracking that considersall possible exact solutions for all possible dihedral angles is possi-ble over secondary structure elements of up to about 20 residues[152,156,174]. Typical scoring functions over this tree search haveincluded terms for: RDC RMSD, Ramachandran suitability, andhydrogen bonds [152,156,174] or van der Waals packing [154],but any empirical molecular mechanics energy function would befeasible. Note that while an exhaustive search over the entire treeis theoretically necessary in the worst-case, in practice, combinato-rial speed-up can be obtained since when a node is pruned, the en-tire subtree below it is eliminated [152,156]; see Fig. 9.

This gives a procedure to compute the structure of h that opti-mally fits the data under the scoring function. Now, the procedureis exponential in k, the length of h. This exponential dependenceprovides a combinatorial obstruction to simply proceeding alongthe polypeptide chain for the entire protein. To overcome thisproblem, the following algorithm is used. The protein is partitionedinto secondary structure regions. The orientation and conforma-tion of each secondary structure element is solved using the tech-niques described above. Each may be solved independently and inparallel since, under suitable assumptions about the dynamics,they all share the same alignment tensor. This allows the algorithmto divide and conquer: for a protein with n residues, there could, inprinciple, be at most n secondary structure elements. However,each will have only constant length ðk ¼ Oð1ÞÞ. Therefore the prob-lem is divided into a series of HðnÞ subproblems, each of constantsize. Each of these can be solved in constant time since the expo-nential of a constant is also a constant.

When RDCs are recorded in a single medium there is a fourfoldorientational ambiguity between a pair of secondary structureelements. This cannot be disambiguated solely using the RDCs.However, not all combinations need to be tried. The secondarystructure elements can be assembled sequentially using sparseNOEs to pack them together (Fig. 10). For example if the secondarystructure elements, whose orientations (up to the symmetry of thedipolar operator) and conformations have been optimally deter-

mined are ðh1;h2; . . . ; hmÞ, that the algorithm would first pack h2

to dock with h1, and then pack h3 to dock with the packed sub-structure ðh1;h2Þ, and then pack h4 to dock with the packed sub-structure ðh1;h2;h3Þ, and so forth. Each of these packingoperations can determine the optimal packing including the orien-tational ambiguity. This may be done using a complete algorithmas described by [117,152,156,174]. Note that although there couldbe HðnÞ secondary structure elements, the packing and assemblyproblem is not exponential since it is transformed into a linear-sized sequence of constant-sized packing problems. The require-ments on the NOEs are fairly mild since the conformation of thesecondary structure elements is determined up to translation(and the fourfold discrete orientational degeneracy). This meansthat the translation between the oriented secondary structure ele-ments is not determined using RDCs alone. Therefore a small num-ber of sparse NOEs will suffice to pack the secondary structureelements [152].

Once the global fold has been determined by packing togetherthe secondary structure elements based on the RDCs and sparseNOEs, loops must be determined to connect them. This problemis similar to the kinematic loop closure problem in X-ray crystal-lography. The similarity arises because once the core structure ofthe secondary structure elements has been computed, it definesorientations and positions for the helices and strands. Models ofloops must be built that close these kinematic gaps, to connectthe secondary structure elements. The kinematic loop closureproblem is combined with the RDC restraints that are measuredfor the loop residues, to compute an ensemble of loops that simul-taneously satisfies the closed-chain kinematics and the polynomialequations arising from the RDCs [152,174].

Finally, it is possible to model error in the input RDCs. In thesimplest method, a distribution is placed over the input data, andthat distribution can be sampled [152,151]. This sampling resultsin a set of perturbed RDCs. The combinatorially-precise, exact algo-rithms above are run for different sets of the sampled RDCs, result-ing in different solutions to the structure. Out of this ensemble ofsolutions, the maximum-likelihood solution can be computed[152,151,156]. Alternatively, an ensemble of structures that fitthe data can be returned [154]. In the case that sampling is

Fig. 9. A conformation tree is a data structure used in depth-first search over the exact solution (roots of polynomials) with backtracking or A* search, to optimally computethe backbone dihedral angles that globally best-fit the RDC data and an empirical scoring function.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 107

Page 9: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

undesirable, it is possible, in principle, to use algebraic algorithms(polynomial arithmetic) to push the RDC error intervals throughthe RDC equations, and obtain a representation of the probabilitydensity function over backbone dihedral angles [156].

In general, when different RDCs (at least 2) have been measuredper residue, a similar algebraic and kinematic derivation holds toobtain exact solutions. The case for NH RDCs recorded in two med-ia is shown in Fig. 5. In all cases, the coefficients of the RDC equa-tions are determined by the data [151]. An RDC error boundtherefore defines a range of coefficients, which, in turn, yield arange of roots representing the structural dihedral angles. Hence,the RDC equations define an analytical relationship between theRDC error distribution, and the coordinate error of the ensembleof structures that satisfy the experimental restraints [156]. Precisemethods that relate the experimental error to the coordinate errorof the computed structures therefore appear within reach.

Of course, whenever exact solutions exist, there usually are alsoexcellent numerical algorithms [37] (as opposed to exact algo-rithms), that stably solve the same analytical equations (7, 10 be-low) not exactly, but up to the accuracy of the floating-pointrepresentation. In our motivating example of a point-mass firedfrom a cannon, these numerical algorithms include (for example),the eponymous Newton’s method. Such techniques, born in thefield of numerical analysis and scientific computation [37], enableprovably-good approximation algorithms for our structure deter-mination problem.

2.2. Limitations and extensions

The approach described above assumes that dynamics can beneglected, although recent studies indicate that modest dynamicaveraging can be tolerated, albeit with reduced accuracy in thedetermined orientations of internuclear bond vectors [69]. In addi-tion, it is assumed that the alignment tensor can be estimatedinitially by fitting parametric secondary structure geometry (heli-ces and b-strands) to the RDCs to obtain the alignment [152],and that the alignment parameters can be optimized in an iterativefashion by alternating the roots-of-polynomials exact solutions ap-proach to structure determination (given an alignment tensor)with fitting the alignment tensor to RDCs and the just-determined

nascent partial structure (using SVD) [152,151,156]. While goodresults have generally been obtained from this methodology[152,151,156,174,171], if inaccurate tensors are used, the resultingstructures may have inaccuracies [69]. We observe that the RDCsare scaled by the order parameter S. Suppose order parameters S2

are measured for the same bond vectors as the RDCs, using, forexample, relaxation experiments. In this case, neglecting dynamicsoutside the timescale of the dynamics measurements, one mayheuristically assume that when S2 is high enough (close to 1), thenthe dynamic averaging due to S is negligible, and the the RDC mea-surement is safe to use for structure determination.

3. NMR structure determination algorithms using sparse RDCs

Several papers [152,151,156,154,155,174] make contributions(1–8, below) to the method of determining protein structures bysolution NMR spectroscopy using RDCs as the main restraints.These contributions may be valuable not only to the NMR commu-nity in particular and structural genomics in general, but also tostructural biologists more broadly. This is because in both experi-mental and computational structural biology, exact computationalmethods have been, for the most part, elusive to date. Second, rig-orous comparisons of structures derived from NMR vs. X-ray crys-tallography are made possible by these techniques, and thesecomparisons should be of general interest.

One algorithm, RDC-EXACT (Fig. 6), requires the following exper-imental NMR data: (a) two RDCs of backbone internuclear vectorsper residue (e.g., assigned NH RDCs in two media or NH and CHRDCs in a single medium); (b) identified a-helices and b-sheetswith known hydrogen bonds (H-bonds) between paired strands,and (c) a few NOE distance restraints. The implementation in[152,151,156,174] uses this experimental data, and allows formissing data as well. In contrast to NOE assignment, RDCs canbe recorded and assigned on the order of hours. Additionally, itis relatively straightforward to rapidly obtain the few (three orfour), unambiguous NOEs required for the packing algorithm(see Section 2) from a standard NOESY spectrum, or by using,for example, the labeling strategy of Kay and coworkers [40].The secondary structure types of residues along the backbonecan be determined by NMR from experimentally-recorded scalar

Fig. 10. The orientations and conformations of secondary structure elements (SSEs) can be calculated using sparse RDCs. Then the SSEs are packed using sparse NOEs.Packings are scored separately for data fit and molecular mechanics energies [117] to avoid bias. The packing by NOEs also disambiguates the discrete 4-fold orientationaldegeneracy due to the symmetry of the dipolar operator.

108 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 10: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

coupling HNHA [[16], pp. 524–528] data, or J-doubling [31] datafor larger proteins (these experiments report on the / backboneangles). NMR chemical shifts [163,165,164,94,25] or automatedassignment [12] can also be used. Hydrogen bonds can be deter-mined by NMR from experimentally-recorded data [24,157], or,for example, by using backbone resonance assignment programssuch as JIGSAW [12]. In the remainder of this review, we discussthe algorithm RDC-EXACT assuming that we are given assigned NHRDCs in two media (Fig. 5). However, the results also hold forthe case of NH and CH RDCs in one medium with slight modifica-tions to the equations in Section 9.1 as shown in ref. [151], and asillustrated in Section 2 above.

Most traditional algorithms focus on using NOE restraints todetermine protein structure. This approach has been shown to beNP-hard [131], essentially due to the local nature of the con-straints. The most notable characteristic of NP-hard problems isthat no fast solution to them is known [162]; that is, the time re-quired to solve the problem using any currently-known algorithmincreases very quickly as the size of the problem grows. As a result,the time required to provably solve even moderately large versionsof many of these problems becomes prohibitive using any cur-rently-available amount of computing power. Here, this impliesthat no algorithm for computing a structure that globally satisfiesa dense network of NOE constraints can be mathematically provento produce a satisfactory solution in a reasonable time. This is anundesirable property for structure determination software. In par-ticular, in the case of algorithms such as SA/MD and Monte Carlo,no guarantees of soundness, efficiency, or completeness can bemade. In contrast, it is remarkable that by primarily using RDCs in-stead of NOEs, provably polynomial-time algorithms can be ob-tained, that have guarantees of soundness, efficiency, andcompleteness.

In practice, approaches such as molecular dynamics and simu-lated annealing [15,52], which lack both combinatorial precisionand guarantees on running time and solution quality, are used rou-tinely for structure determination. Several structure determinationapproaches do use RDCs, along with other experimental restraintssuch as chemical shifts or sparse NOEs [6,32,46,62,125,139], yetremain heuristic in nature, without guarantees on solution qualityor running time. Unlike previous approaches ([14] is a notableexception), which have either no theoretical guarantees[15,52,6,32,46,62,125,139], or run in worst-case exponential time[131,29,28,55,56], recent methodology has shown that it is possi-ble to exploit RDC data, which gives global restraints on the orien-tation of internuclear bond vectors, in conjunction with very sparseNOE data, to obtain an algorithm that runs in polynomial time andprovably computes the structure that agrees best with the experi-mental data.

These results are consistent with earlier observations[140,142,139,38,120,6,32,46,62,125,159] that, empirically, RDCsincrease the speed and accuracy of biomacromolecular structuredetermination: RDC-EXACT formally quantifies the complexity-theo-

retic benefits of employing globally-referenced angular data oninternuclear bond vectors. The main contributions of this workwere as follows.

1. To derive low-degree monomial equations that can be solvedexactly and in constant time, to determine backbone ð/;wÞangles from experimentally-recorded RDCs. Only two RDCsper residue are required. For example, after measuring RDCscorresponding to a single internuclear vector v in two differentaligning media, the easily-computable exact solutions eliminatethe need for one-dimensional grid-search previously employed[159] to compute the direction of v or two-dimensional grid-search [46,139,158,93] to compute ð/;wÞ angles. The mainresults also hold for the case of NH and CH RDCs in one mediumwith slight modifications to the Wang–Donald equations in Sec-tion 9.1, as shown in [151,156,174] (see Section 2). Further-more, these equations are very general and can be extendedto compute other backbone and side-chain dihedral angles.The method can be applied mutatis mutandis to derive similarequations for computing dihedral angles from RDCs in nucleicacids.

2. The first NMR structure determination algorithm that simulta-neously uses exact solutions, systematic search and only twoRDCs per residue. (A systematic search is a search over all pos-sible conformations (solutions) that employs a provable prun-ing strategy which guarantees pruned conformations need notbe considered further).

3. The first combinatorially precise, polynomial-time algorithm forstructure determination using RDCs, secondary structure type,and very sparse NOEs.

4. The first provably polynomial-time algorithm for de novo back-bone protein structure determination solely from experimentaldata (of any kind).

5. An implementation of the algorithm that is competitive interms of empirical accuracy and speed, but requires muchless data than, previous NMR structure determinationtechniques.

6. Testing and results of the algorithm on protein NMR data.Representative results from RDC-EXACT, including RMSD to high-resolution crystal structures and NMR structures, are shownin Figs. 11, 12. In addition to these studies several blind testsof RDC-EXACT were performed, in which the structure was notknown ahead of time [174]. NMR data were recorded for theubiquitin-binding zinc finger domain of the human DNA Y-polymerase eta (polg). [174] used NH and Ha—Ca RDCs recordedin one medium, with typically 10–15% missing data (but up to32%) in one secondary structure region, plus 9 NOEs betweenthe helix and b-strands. The structure of polg was then com-puted by RDC-EXACT, and is shown in Fig. 13L. The RDC-EXACT struc-ture [174] was compared with the structure being determined(by conventional techniques) in Dr. Pei Zhou’s laboratory(Fig. 13C). The core structure (helix and sheet) computed by

Fig. 11. Results of RDC-EXACT. (a) Experimental RDC data for ubiquitin (PDB ID: 1D3Z), Dini (PDB ID: 1GHH) and Protein G (PDB ID: 3GB1) were taken from the Protein Data Bank(PDB). (b) Number of residues in a-helices or b-sheets, versus the total number of residues. (c) The total number of experimental RDCs (note that RDCs are missing for someresidues). (d) RDCS from different experimental datasets (for different bond vectors) were used. (e) Number of hydrogen bonds used. (f) Number of NOEs used. (g) RMSD (forCa;N, and C0 backbone atoms) between the oriented and translated secondary structure elements (excluding loop regions) computed by RDC-EXACT to reference structures:ubiquitin to a high-resolution X-ray structure (PDB ID:1UBQ); Dini to an NMR structure (PDB ID: 1GHH); and Protein G to an NMR structure (PDB ID: 3GB1).

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 109

Page 11: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

RDC-EXACT was 1.28 Å RMSD from the Zhou lab structure(Fig. 13C). In a second test, the same suite of NMR data [174]was obtained for a second protein, the human Set2-Rpb1 inter-acting (SRI) domain. The global fold of human SRI was deter-mined by RDC-EXACT, similarly using NH and Ha—Ca RDCsrecorded in one medium, plus sparse NOEs. The resulting corestructure (a 3-helix bundle) had an RMSD of 1.61 Å to the refer-ence structure (see Fig. 13R). Both reference structures weredetermined using traditional methods (XPLOR [15]) requiring amuch larger set of experimental spectra. The accuracy of RDC-

EXACT on these blind tests is comparable to the accuracyachieved in the retrospective studies (Figs. 11, 12). The abilityof RDC-EXACT to determine the global fold of polg and humanSRI with reasonable accuracy, using only a minimal suite ofexperiments, that could be collected in a high-throughput fash-ion, supports the feasibility of the exact solutions approach forstructure determination.

7. RDC-EXACT can compute b-sheets from RDC data alone, which fun-damentally extends previous methods [38] targeting onlyentirely helical proteins. Unlike a-helices, b-strands are often

twisted in globular proteins so it is important to refine themaccurately from RDC data. RDC-EXACT can determine the backbonestructures of proteins consisting of either a-helices, b-sheets, orboth, and thus has wider application since most proteins haveboth a-helices and b-sheets.

8. RDC-EXACT was the first demonstration that the conformationsand orientations of both a-helices and b-strands can be com-puted accurately and efficiently using exclusively RDCs mea-sured on a single bond vector type (NH) in only twoaligning media [152]. Similar accuracies and efficiency wereobtained using only NH and CH RDCs in one medium[151,156,174]. With a minimum number of additional dis-tance restraints a three-dimensional structure could be com-puted consequently [152,156,174].

Structure determination using sparse data is an underconstrainedproblem. Additional constraint may be obtained using structureprediction [73,126] or homology modeling [121]. The formerreduces to decoy detection, pruned by NMR data. The secondreduces to biasing the structure determination using the PDB.

Fig. 12. (Top) Structure of ubiquitin backbone with loops. The ubiquitin backbone structure (blue) was computed by extending RDC-EXACT to handle loop regions along theprotein backbone [156]. The structure was computed using 59 NH and 58 CH RDCs (117 out of 137 possible RDCs, 20 are missing), 12 H-bonds and 2 unambiguous NOEs. Theblue structure has a backbone RMSD of 1.45 Å with the high-resolution X-ray structure (PDB ID: 1UBQ, in magenta) [147]. (Bottom) Comparison of RDC-EXACT with previousapproaches. (a) References to previously-computed ubiquitin backbone structures (including loop regions). (b) Algorithmic technique. (c) Data requirements. (d) BackboneRMSD of structure (for Ca;N, and C0 backbone atoms) compared to the X-ray structure (1UBQ). The structure computed by RDC-EXACT includes loops and turns, as shown at top.(e) References [152,151,156,155].

Fig. 13. (Left) The global fold of polg, computed by RDC-EXACT using 2 RDCs per residue measured in one medium plus 9 NOEs between the helix and b-strands. Center:comparison of polg to the reference structure, PDB id: 2I5O. The RMSD of the secondary structure elements is 1.28 Å. Right: global fold of human SRI, computed by RDC-EXACT

(thick lines) using two RDCs per residue measured in one medium plus sparse NOEs. The reference structure is shown in thin lines (PDB id: 2A7O [86]).

110 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 12: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

In both approaches, sparse RDCs can be employed, but, comparedwith conventional protocols, the resulting structures obtain theirauthority less from the data and more from modeling or homol-ogy. In contrast, the exact solutions technique admits algorithmsthat can extract more structural information from less NMR data,than had been previously exploited. This can be done using acombination of computer algebra, computational geometry, andstatistical methods. Compared with previous algorithms for com-puting backbone structures using RDCs, RDC-EXACT achieves similaraccuracies but requires less data, relies less on statistics from thePDB and does not depend on molecular dynamics or simulatedannealing (Figs. 11–13). Since RDCs can be acquired and assignedmuch more quickly than NOEs in general, the results show it ispossible to compute structures rapidly and inexpensively usingmainly RDC restraints.

4. Nuclear vector replacement for automated NMR assignmentand structure determination

High-throughput NMR structural biology can play an importantrole in structural genomics. Recent results have generalized and ex-tended structure-based assignment algorithms such as JIGSAW [12],to obtain an automated procedure for high-throughput NMR reso-nance assignment for a protein of known structure, or of an homol-ogous structure [79,82,83,78,8]. Nuclear Vector Replacement (NVR)uses Expectation/Maximization (EM) to compute assignments(Fig. 14L). NVR is an RDC-based algorithm, which computes assign-ments that correlate experimentally-measured RDCs, chemicalshifts, HN—HN NOEs (which are called dNNs) and amide exchangerates to a given a priori 3D backbone structural model. The algo-rithm requires only uniform 15N-labeling of the protein, and pro-cesses unassigned HN—15N HSQC spectra, HN—15N RDCs, andsparse dNNs, all of which can be acquired in a fraction of the timeneeded to record the traditional suite of experiments used to per-form resonance assignments (Fig. 14R). NVR could form the basisfor ‘‘Molecular Replacement (MR) by NMR”. RDCs provide global ori-entational restraints on internuclear bond vectors, (see Eq. (1) and

Figs. 2, 3 above). Once the alignment tensor S has been determined,RDCs may be simulated (back-calculated) given any other internu-clear vector vi. In particular, suppose an (HN; 15N) peak i in anHN—15N HSQC (subsequently termed simply ‘‘HSQC”) spectrum isassigned to residue j of a protein, whose crystal structure is known.Let ri be the measured RDC value corresponding to this peak. Thenthe RDC ri is assigned to amide bond vector vj of a known structure,and we should expect that ri � DmaxvT

j Svj (although noise, dynam-ics, crystal contacts in the structural model, and other experimentalfactors will cause deviations from this ideal).

SBA approaches use unassigned NMR data, such as RDCs. Notethat, in contrast, assigned RDCs have also been employed by a vari-ety of structure refinement [19] and structure determinationmethods [62,6,159], including: orientation and placement of sec-ondary structure to determine protein folds [38], pruning anhomologous structural database [7,96], de novo structure determi-nation [125], in combination with a sparse set of assigned NOEs todetermine the global fold [100], and a method for fold determina-tion that selects heptapeptide database fragments best fitting theassigned RDC data [32]. Bax and co-workers termed their tech-nique ‘‘molecular fragment replacement” [32], by analogy withX-ray crystallography MR techniques. Unassigned RDCs have beensuccessfully used to expedite resonance assignments [176,32,139].

The idea of correlating unassigned experimentally measuredRDCs with bond vector orientations from a known structure wasfirst proposed by Al-Hashimi and Patel [4] and subsequently dem-onstrated by Al-Hashimi et al. [3] who considered permutations ofassignments for RNA, and also in reference [63]. Brüschweiler andco-workers [63] successfully applied RDC-based maximum bipar-tite matching to structure-based resonance assignment. Their tech-nique requires RDCs from several different bond types which, inturn, requires 13C-labeling of the protein and triple resonanceexperiments. NVR builds on these works and offers some improve-ments in terms of isotopic labeling, spectrometer time, accuracyand computational complexity. NVR algorithms have addressedthe following hypothesis: Are backbone amide RDCs and dNNs suffi-cient for performing resonance assignment? Like the techniques of

Fig. 14. Nuclear vector replacement. (Left) Schematic of the NVR algorithm for resonance assignment using EM. The NVR algorithm takes as input a model of the targetprotein and several unassigned spectra, including the 15N–HSQC, HN—15N RDC, 15N–HSQC NOESY, and an H–D exchange-HSQC to measure amide exchange rates. In the firstphase, NVR computes the alignment tensors for both media using chemical shift prediction, dNNs, H–D exchange–exchange rates and the Expectation/Maximization (EM)algorithm. This step takes time Oðn2Þ, where n is the number of residues. In the second phase, chemical shift predictions, dNNs, RDCs in two media and the EM algorithm areused to assign all remaining peaks. This entire process runs in minutes, and is guaranteed to converge in time Oðn3Þ. NVR experiment suite: (Right) The 5 unassigned NMRspectra used by NVR to perform resonance assignment. The HSQC provides the backbone resonances to be assigned. HN—15N RDC data in two media provide independent,global restraints on the orientation of each backbone amide bond vector. The H–D exchange HSQC identifies fast exchanging amide protons. These amide protons are likely tobe solvent-exposed and non-hydrogen bonded and can be correlated to the structural model. A sparse number of unambiguous, unassigned dNNs can be obtained from theNOESY. These dNNs provide distance constraints between spin systems which can be correlated to the structural model. Chemical shift predictions are used as a probabilisticconstraint on assignment.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 111

Page 13: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

Hus et al. [63], NVR calls optimal bipartite matching as a subrou-tine, but within an Expectation/Maximization (EM) frameworkthat offers some benefits, described below. Previous methods(and later algorithms [67,64]) required 13C-labeling and RDCsfrom many different internuclear vectors (for example,C0—15N;C0—HN;Ca—Ha, etc.). NVR uses a different algorithm and re-quires only amide bond vector RDCs, no triple-resonance experi-ments, and no 13C-labeling. Moreover, NVR is more efficient. Thecombinatorial complexity of the assignment problem is a functionof the number n of residues (or bases in a nucleic acid) to be as-signed, and, if a rotation search is required, the resolution k3 of arotation-space grid over the Lie group SOð3Þ of 3D rotations. Thetime-complexity of the RNA-assignment method, named CAP [3]grows exponentially with n. In particular, CAP performs an exhaus-tive search over all permutations, making it difficult to scale up tolarger RNAs. The method presented in reference [63] runs in timeOðIn3Þ, where Oðn3Þ is the complexity of bipartite matching [76]and I is the number of times that the bipartite matching algorithmis called. I may be bounded by Oðk3Þ, the time to search for theprincipal order frame (POF) over SOð3Þ. Thus, the full time-com-plexity of the algorithm presented in reference [63] is Oðk3n3Þ. Ver-sion 1 of NVR [82,83,78] also performed a discrete grid search forthe POF over SOð3Þ, but used a more efficient algorithm withtime-complexity Oðnk3Þ. Once the POF has been computed, reso-nance assignments are made in time Oðn3Þ. Thus, the total runningtime of NVR Version 1.0 [82,83] is less: Oðnk3 þ n3Þ. Zweckstetterand Bax [175] estimated alignment tensors (but not assignments)using permutations of assignments on a subset of the residuesidentified using either selective labeling or 13Ca=b chemical shifts.If m residues can be thus identified a priori, then this method pro-vides an Oðnm6Þ tensor estimation algorithm that searches over allpossible assignment permutations.

NVR Version 2.0 [79] requires neither a search over assignmentpermutations, nor an explicit rotation search over SOð3Þ. Rather,EM [33] is used to correlate the chemical shifts of the HN � 15N

HSQC resonance peaks with the structural model. In practice,the application of EM on the chemical shift data is sufficient touniquely assign a small number of resonance peaks, and directlydetermine the alignment tensor by singular value decomposition(SVD) – see Fig. 3. NVR 2.0 eliminates the rotation grid-searchover SOð3Þ, and hence any complexity dependence on a grid orits resolution k, running in Oðn3Þ time, scaling easily to proteinsin the middle NMR size range (n ¼ 56 to 129 residues) [79]. More-over, NVR elegantly handles missing data (both resonances andRDCs).

NVR adopts a sparse-data, or minimalist approach [12], demon-strating the large amount of information available in a few keyspectra. By eliminating the need for triple resonance experiments,NVR saves spectrometer time. The required data (Fig. 14R) can beacquired in about one day of spectrometer time using a cryoprobe.NVR runs in minutes and efficiently assigns the ðHN; 15NÞ backboneresonances as well as the sparse dNNs from the 3D 15N-NOESY spec-trum. NVR was tested on NMR data from 3 proteins using 20 differ-ent alternative structures, all determined either by X-raycrystallography or by different NMR experiments (without RDCs)(Table 1). When NVR was run on NMR data from the 76-residueprotein, human ubiquitin (matched to four structures, includingone mutant/homolog), it achieved 100% assignment accuracy. Sim-ilarly good results were obtained in experiments with the 56-res-idue streptococcal protein G (SPG) (99%) and the 129-residue henlysozyme (100%) when they were matched by NVR to 16 3D struc-tural models. Table 1 summarizes the performance of NVR usingalternative structures of ubiquitin, SPG, and lysozyme, none ofwhich were refined using RDCs. 1UD7 is a mutant form of ubiquitinwhere 7 hydrophobic core residues have been altered (I3V, V5L,I13V, L15V, I23F, V26F, L67I). It was chosen to test the effectivenessof NVR when the model is a (close) homolog of the target protein.This success in assigning the mutant 1UD7, suggests that NVRcould be applied more broadly to assign spectra based on homolo-gous structures [79,8]. Thus, NVR could play a role in structuralgenomics.

5. Protein fold determination via unassigned residual dipolarcouplings

Sequence homology can be used to predict the fold of a protein,yielding important clues as to its function. However, it is possiblefor two dissimilar amino acid sequences to fold to the ‘‘same”tertiary structure. For example, the RMSD between the humanubiquitin structure (PDB Id: 1D3Z) and the structure of the UbxDomain from human Fas-associated factor 1 (Faf1; PDB Id:1H8C) is quite small (1.9 Å), yet they have only 16% sequenceidentity. Detecting structural homology given low sequence iden-tity poses a difficult challenge for sequence-based homology pre-dictors. Is there a set of fast, cheap experiments that can beanalyzed to rapidly compute 3D structural homology? A newmethod for homology detection has been developed, called GD, thatexploits high-throughput solution-state NMR. This algorithm ex-tends the NVR technique to perform protein 3D structural homol-ogy detection, demonstrating that NVR and its generalization GD,are able to identify structural homologies between remote aminoacid sequences from a database of structural models. The first pa-per on protein fold determination using unassigned RDCs was pub-lished in RECOMB in April, 2003 [82]. Other papers on this topicinclude [78] and [83,80]. One goal of structural genomics is theidentification of new protein folds. GD is an automated procedurefor detecting 3D structural homologies from sparse, unassignedprotein NMR data, and could aid in prioritizing unknown proteinsfor structure determination. GD identifies the 3D structural modelsin a protein structural database whose ‘‘unassigned geometries”best fit the unassigned experimental NMR data. It does not use

Table 1Backbone Amide resonance assignment accuracy using NVR. Accuracies report thepercentage of correctly-assigned backbone HSQC peaks.

PDB IDa Accuracyb

(i) Ubiquitin1G6J [10] 100%1UBI [123] 100%1UBQ [147] 100%1UD7 [66] 100%

(ii) SPG1GB1 [50] 100%2GB1 [50] 96%1PGB [39] 100%

(iii) Lysozyme193L [146] 100%1AKI [9] 100%1AZF [90] 100%1BGI [106] 100%1H87 [47] 100%1LSC [77] 100%1LSE [77] 100%

(iv) Lysozyme1LYZ [35] 100%2LYZ [35] 100%3LYZ [35] 100%4LYZ [35] 100%5LYZ [35] 100%6LYZ [35] 100%

a Structural model used.b Accuracy of NVR on the NMR data shown in Fig. 14R for Ubiquitin (i), SPG (ii),

and Lysozyme (iii–iv). The 96% accuracy for 2GB1 reflects a single incorrectassignment.

112 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 14: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

sequence information and is thus not limited by sequence homol-ogy. GD can also be used to confirm or refute structural predictionsmade by other techniques such as protein threading or sequencehomology.

GD runs in Oðpnk3Þ time, where p is the number of proteins in thedatabase (2456 in [82,78,83]), n is the number of residues in thetarget protein, and k is the resolution of a rotation search. GD re-quires only uniform 15N-labeling of the protein and processesunassigned HN—15N RDCs, which can be acquired rapidly. Experi-ments on NMR data from 5 different proteins demonstrated thatGD identifies closely related protein folds, despite low-sequencehomology between the target protein and the computed model[82,78,83,80]. The overall rankings of the top predicted homologsare good (Fig. 15). Human Faf1 (1H8C) (discussed above) was iden-tified as a structural homolog of ubiquitin (Fig. 15U). GD does beston lysozyme, where the native structure and 5 homologous struc-tures occupied the top 6 places (Fig. 15L).

GD [82] represented the first systematic algorithm to exploit anunassigned ‘‘fingerprint” of RDCs for rapid protein fold determina-tion. After Langmead et al. [82], a similar idea was independentlyproposed and extended by number of researchers including Preste-gard [145,128], Baker [97], [98], and coworkers. GD was later ex-tended to demonstrate high-throughput inference of protein-protein interfaces using only unassigned NMR data [99], whichshould be valuable in structural proteomics.

6. Automated NOE assignment using a rotamer libraryensemble and RDCs

Despite recent advances in RDC-based structure determination(see Sections 2, 3), NOE distance restraints are still important forcomputing a complete three dimensional solution structureincluding side-chain conformations. In general, NOE restraintsmust be assigned before they can be used in a structure determina-tion program. NOE assignment is very time-consuming to do man-ually, challenging to fully automate, and has become a keybottleneck for high-throughput NMR structure determination.The difficulty in automated NOE assignment is ambiguity: therecan be tens of possible different assignments for an NOE peakbased solely on its chemical shifts. Most automated NOE assign-ment approaches [103,57,59,68,60,53,91] rely on an ensemble ofstructures, computed from a subset of all the NOEs, to iterativelyfilter ambiguous assignments. Despite this progress, there is roomfor improvement since previous methods require quite high qual-ity input data. For example, they typically require initializing theassignment/structure determination/assignment cycle with a largenumber of unambiguous or manually-assigned NOE peaks (e.g.,> 5 NOEs per residue in [103]), > 85% complete resonance assign-ments (including side chains), almost complete aromatic side-chain assignments, a low percentage of noise peaks, and only smallerrors in chemical shifts (6 0:03 ppm for 3D NOESY spectra). More-

Fig. 15. GD, representative results. GD was tested on unassigned HN—15N RDCs for 5 proteins [78]; representative scatter plots of RMSD vs. the score computed by GD areshown for human ubiquitin (Top left) and hen lysozyme (Top right). Only proteins within 10% of the target protein’s length are plotted. Open circles are data points for thenative structure (1D3Z for ubiquitin; 1E8L for lysozyme) and five homologous structures (Tables U and L). The + signs are the data points associated with non-homologousproteins. The diamond is the 2D mean of the +’s while the triangle is the 2D mean of the open circles. The trend line shows the correlation between the score computed by GD

and RMSD for all the data points. The scores associated with the native fold and the 5 homologs are statistically significantly lower than the scores of unrelated proteins (p-values < 2:7� 10�5). Tables: GD results for (U) ubiquitin and (L) lysozyme. The sequence identity and RMSD of these 2 test proteins with their respective top 5 homologs areshown. aThe rank of each model, out of 2456 proteins in the database, using the score computed by GD. bThe Ubx Domain from human Faf1 (see Section 5) was ranked 11th.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 113

Page 15: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

over, previous algorithms are heuristic in nature, provide no guar-antees on solution quality or running time, and consume manyhours to weeks of computation time.

Sparse dipolar couplings have enabled the development of newNOE assignment algorithms, including TRIANGLE [153] and HANA

[174]. An in-depth understanding of HANA would require a reviewof the minimum Hausdorff distance [37] and Chernov tail bounds[174], which are beyond the scope of this review. Therefore, we de-scribe the conceptually-simpler TRIANGLE algorithm; since the infor-mation content of the input data is identical, TRIANGLE will illustratethe basic paradigm. The interested reader can consult Zeng et al.[174] for a description of HANA, which extends TRIANGLE and is moregeneral and robust. TRIANGLE begins with known resonance assign-ments and an accurate backbone structure computed using RDC-EXACT

(Sections 2, 3, [152,151,156]). In principle, another structure deter-mination algorithm could be used instead. However, RDC-EXACT is theonly algorithm that can compute a complete protein backbonestructure de novo using only two RDCs per residue. TRIANGLE usesthe backbone structure determined by RDC-EXACT to boot-strap theautomated assignment of NOEs: the assignment proceeds by filter-ing the experimentally-measured NOEs based on consistency withthe backbone structure.

One novel feature of TRIANGLE is the use of a rotamer-library-de-rived ensemble of intra-residue vectors between the backboneatoms and side-chain protons to reduce the ambiguity in theassignment of NOE restraints involving side-chain protons, espe-cially aliphatic protons. The rotamer database was built fromultra-high resolution structures ð< 1:0 ÅÞ in the PDB [153]. TRIANGLE

merges this ensemble of intra-residue vectors, together with inter-nuclear vectors from the computed backbone structure. For exam-ple, consider the putative assignment of an NOE to an ðHN;HdÞ pairof protons. The triangle relationship (Fig. 16L) defines a decision

procedure to filter NOE assignments by fusing information fromRDCs ðvNbÞ, rotamer modeling ðvbdÞ, and NOEs ðjvNdjÞ. More detailsof the TRIANGLE algorithm are provided in [153,174] and Section 9.3below.

In [153], RDC-EXACT was first employed to compute an accuratebackbone structure of ubiquitin using only two backbone RDCsper residue (Fig. 12 (top), [152,151,156]). Next, TRIANGLE was suc-cessfully applied to assign more than 1700 NOEs, with better than90% accuracy on the experimental 15N- and 13C-edited 3D NOESYspectra for ubiquitin [153]. The result of the automated assignmentis summarized in Table 2. Out of the 1153 NOE peaks picked fromthe 15N-edited NOESY spectrum, 1083 originate from backboneamide protons, the remaining 50 are from side-chain amide pro-tons. Only 420 NOE peaks originating from backbone Ha protonscould be picked from the 13C-edited NOESY spectrum. TRIANGLE

was able to assign 1053 NOE peaks from the 1083 peaks pickedfrom 15N-edited NOESY spectrum and 393 peaks from the 420peaks picked from 13C-edited NOESY spectrum. The assigned NOEdistance restraints are divided into two classes: sequential NOEsand medium/long-range NOEs (Table 2). The number of assignedNOE distance restraints is larger than the number of assignedpeaks since it is possible that more than one NOE restraint couldbe assigned to a single peak. It took less than one second on a2.4 GHz single-processor Linux workstation for TRIANGLE to computethe assignments.

To test the quality of the 1783 NOE restraints automaticallygenerated and assigned by TRIANGLE (Table 2), [153] then usedthese restraints to calculate the structure of ubiquitin, using ahybrid distance-geometry and SA protocol (XPLOR [15]). No RDCrestraints were used. In the resulting ensemble of 70 structures,163 restraints had NOE violations larger than 0.5 Å in 50 structures.However, none of the NOE violations was larger than 2.5 Å. After

Fig. 16. (Left) The triangle relationship, defines a decision procedure to filter NOE assignments by fusing information from structure ðvNbÞ, modeling ðvbdÞ, and experimentðjvNd jÞ. An accurate backbone structure is first computed using only 2 RDCs per residue (Section 3). The vector vNb is computed from this backbone structure. vbd is a rotamerensemble-based intra-residue vector mined from the PDB. The computed length of vNd is compared with the experimental NOE distance dN (measured using NOE crosspeakintensities) to filter ambiguous NOE assignments. The complexity of NOE assignment is Oðn2 log nÞ, where n is the number of protons in the protein. One cycle of NOEassignment suffices when a well-defined backbone structure is computed using RDC-EXACT. In practice, it took less than one second to assign 1783 NOE restraints from the NOEpeak list picked from both the 3D 15N-edited and 13C-edited NOESY spectra of human ubiquitin. Right and center: The NMR structures computed from the automatically-assigned NOEs. The middle panel shows the 12 best NMR structures with no NOE distance violation larger than 0.5 Å. The side-chains are blue; the backbone is magenta. Theright panel is the overlay of the NMR average structure (blue) with the 1.8 Å X-ray structure (magenta) [147].

Table 2NOE restraints automatically assigned by TRIANGLE.

Spectrum No. of peaks No. of assigned NOEs No. of sequential NOEsa No. of mediumb, long-range NOEsc

15N-NOESY 1083 1288 822 46613C-NOESY 420 495 228 167

a NOEs between residue i and iþ 1.b NOEs between residue i and iþ j where 1 < j < 4.c NOEs between residue i and iþ j where j P 4. No.: number.

114 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 16: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

these 163 restraints were deleted from the NOE list, XPLOR was in-voked for a second time to compute the structures using theremaining 1620 NOE restraints. Twelve structures out of 70 totalcomputed structures had no NOE violations larger than 0.5 Å. Thus,the NOE assignment algorithm had an accuracy of 91%, and theincorrect assignments could easily be detected and removed. The12 best NMR structures (Fig. 16C) can be overlayed with a pairwiseRMSD of 1:18� 0:16 Å for backbone atoms and a pairwise RMSD of1:84� 0:19 Å for all heavy atoms. The accuracy of the structurescomputed using the automatically-assigned NOEs from TRIANGLE isin the range of typical high- to medium-resolution NMR structures.The average structure (Fig. 16R) computed from the best 12 struc-tures has a 1.43 Å backbone RMSD and 2.13 Å all heavy-atomRMSD from the 1.8 Å ubiquitin X-ray structure [147]. HANA, a suc-cessor to TRIANGLE, was tested on additional proteins, resulting insimilarly good accuracies [174].

7. NMR structure determination of symmetric homo-oligomers

Symmetric homo-oligomers play pivotal roles in complex bio-logical processes including ion transport and regulation, signaltransduction, and transcriptional regulation. RDCs were recentlyused in studies of phospholamban, a symmetric homo-pentamericmembrane protein that regulates the calcium levels between cyto-plasm and sarcoplasmic reticulum and hence aids in muscle con-traction and relaxation [111]. Ion conductance studies [75] alsosuggest that phospholamban might have a separate role as anion channel. To understand the dual function of phospholambanand other such symmetric homo-oligomers, a combined experi-mental-computational approach, SYMBRANE [117,118], determinedtheir structures using sparse inter-subunit NOE distance restraintsand van der Waals (vdW) packing. In this case, the RDCs were usedto refine the subunit structure.

SYMBRANE is complete in that it tests all possible conformations,and it is data-driven in that it first tests conformations for consis-tency with data and only then evaluates each of the consistent con-formations for vdW packing. Completeness in a structuredetermination method is a key requirement since it ensures thatno conformation consistent with the data is missed. This avoidsany bias in the search, as well as any potential for becomingtrapped in local minima, problems inherent in energy minimiza-tion-based approaches. (RDC-EXACT, described above in Sections 2,3, is also complete). The data-driven nature of SYMBRANE allowsone to independently quantify the amount of structural constraintprovided by data alone, versus both data and packing. This avoidsoverreliance on subjective choices of parameters for energy mini-mization, and consequent false precision in determined structures.Analogously to single particle analysis in cryoelectron microscopy[107], SYMBRANE also allows determination of the oligomeric numberof the complex. SYMBRANE was the first approach that determines theoligomeric number of a symmetric homo-oligomer from inter-sub-unit NOE distance restraints. The complete ensemble of NMRstructures of the human cardiac phospholamban pentamer, deter-mined by SYMBRANE, was reported by Potluri et al. [117] and depos-ited in the PDB (id: 2HYN).

SYMBRANE exploits the observation that, given the structure of asubunit, the structure of a symmetric homo-oligomer is completelydetermined by the parameters (position and orientation) of itssymmetry axis. Thus we can formulate structure determinationas a search in the space of all symmetry axis parameters, symmetryconfiguration space (SCS). SYMBRANE performs a branch-and-boundsearch in the SCS, which defines all possible Cn homo-oligomericcomplexes for a given subunit structure. It eliminates those sym-metry axes inconsistent with NOE distance restraints and thenidentifies conformations representing any consistent, well-packedstructure.

SYMBRANE consists of two phases. The first phase performs a com-plete, data-driven search in SCS and returns consistent regions inSCS, that is, regions representing all conformations consistent with(i.e., satisfying) the data. In the second phase, all satisfying struc-tures are evaluated for vdW packing. SYMBRANE computes a set ofwell-packed satisfying (WPS) structures that are not only consistentwith data, but also have high-quality vdW packing. SYMBRANE explic-itly quantifies the uncertainty in determined structures using thesize of the regions in SCS and the variations in atomic coordinatesin the structures. The difference in uncertainty between the satis-fying structures and the WPS structures illustrates the relative pre-cision that is possible from data alone, versus data and packingtogether. SYMBRANE simultaneously assigns the intermolecular NOEsduring the SCS search for structure determination [118]. Thisassignment resolves the ambiguity as to which pairs of protonsgenerated the observed NOE peaks, and thus should be restrainedin structure determination.

For inter-subunit NOEs in symmetric homo-oligomers, theambiguity includes both the identities of the protons and the iden-tities of the subunits to which they belong. SYMBRANE resolves bothambiguities to determine its structures, and returns all structuresconsistent with the available data (ambiguous or not). However,while SYMBRANE is complete, it avoids explicit enumeration of theexponential number of combinations of possible assignments. SYM-

BRANE geometrically prunes SCS regions and NOE assignments thatare mutually inconsistent. Pruning occurs only due to provableinconsistency and thereby avoids the pitfall of local minima thatcould arise from best-first sampling-based approaches. Ultimately,SYMBRANE returns a mutually-consistent set of conformations andNOE assignments. SYMBRANE can draw two types of conclusions notpossible under previous methods: (a) that different assignmentsfor an NOE would lead to different structural classes, or (b) thatit is not necessary to assign an NOE since the choice of assignmentwould have little impact on structural precision.

In contrast to previous techniques, SYMBRANE separately quanti-fies the amount of structural constraint provided by data alone,versus data and packing (Fig. 18d). For the human phospholam-ban pentamer in dodecylphosphocholine micelles, using thestructure of one subunit determined from a subset of the exper-imental NMR data, SYMBRANE identifies a diverse set of complexstructures consistent with the nine inter-subunit NOE restraints[117]. The distribution of structures determined in the ensemble(PDB Id 2HYN) provides an objective characterization of structuraluncertainty: incorporating vdW packing reduced the structuraldiversity by 29% and average variance in backbone atomic coordi-nates by 44%.

By comparing data consistency and packing quality in a searchusing different assumptions of oligomeric number, SYMBRANE iden-tified the C5 pentamer as the most likely oligomeric state of phos-pholamban, demonstrating that it is possible to determineoligomeric number directly from NMR data. Additional tests onsix other homo-oligomers, from dimer to heptamer, similarlydemonstrated the power of SYMBRANE to provide unbiased determi-nation and evaluation of homo-oligomeric complex structures[117,118].

8. Applications and connections to other biophysical methods

Both NVR and SYMBRANE are structure-based (in that they exploitpriors on the monomer structure) and have clear analogies tomolecular replacement (MR) in X-ray crystallography, and itsleverage of subunit structure and non-crystallographic symmetry(NCS). At the heart of NVR is a novel rotation search (over 3D rota-tion space, SOð3Þ), and at the core of SYMBRANE is novel exploitationof the kinematics of Cn symmetry. It is possible to generalize andapply these algorithms to help expedite structure determination

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 115

Page 17: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

even using very different experiments (crystallography). To ana-lyze NCS in MR-based X-ray diffraction data of biopolymers, onemust ‘‘recognize” a finite subgroup of SOð3Þ out of a large set ofmolecular orientations. This problem may be reduced to clusteringin SOð3Þmodulo a finite group, and solved efficiently by ‘‘factoring”into a clustering on the unit circle followed by clustering on the 2-sphere S2, plus some group-theoretic calculations [89]. This yieldsa polynomial-time algorithm, CRANS, that is efficient in practice, andwhich enabled determination of the structure of dihydrofolatereductase-thymidylate synthase (DHFR-TS) from Cryptosporidiumhominis, PDB id: 1QZF [110,109,89]. Cryptosporidium is an organismhigh on the bioterrorism list of the Centers for Disease Control, aCategory B bioterrorist threat. There is currently no drug therapyfor cryptosporidosis. The enzyme DHFR-TS is in the sole de novobiosynthetic pathway for the pyrimidine deoxyribonucleotidedTMP, and therefore an attractive drug target. Solving the structureof DHFR-TS from C. hominis enabled species-specific drug design[5,116], exploiting structural differences between the human en-zyme and C. hominis DHFR-TS.

In general, algorithms like NVR and SYMBRANE are often promiscu-ous, in that such tools can be applied to many problems in struc-tural biology [36]. The CRANS NCS method for MR [89,110,109] inX-ray crystallography generalized NVR rotation search algorithmsthat were originally proposed for a different problem, NMR assign-ment [82,83,78,79]. Similarly, work in target selection for struc-tural and functional genomics [84,81,87] has applied algorithmsfrom NMR signal processing such as maximum entropy spectralanalysis. Recent work on enzyme redesign in the non-ribosomalpeptide synthetase pathway [88,136,45,44,26] (using crystallogra-phy and site-directed mutations) generalized algorithms for mod-eling NMR ensembles [49,11] and rotamers for NOE assignment[153,174]. These rotamer optimization algorithms [88,45,44] can,in turn, be used as a tool in structure determination as discussedbelow in Sections 9.3, 9.4.

9. Looking under the hood: how the algorithms work, andoutlook for future developments

In the future, one expects improvements and extensions to thealgorithms described above. In particular, the primary goals focuson robustness and scaling. We wish the algorithms to be more ro-bust to noise and missing data. We want the implemented systemsto scale to run successfully on more proteins, and larger proteins. Anumber of smaller steps and subgoals are forseen to achieve thesegoals. In this section, we describe in more detail how the algo-rithms work so the reader can develop an appreciation both fortheir scope and how they may be extended and improved.

In general, but particularly for RDC-EXACT, it is useful for the algo-rithms and software to allow users a choice: to record either (i) onetype of backbone RDC (such as NH RDCs) in two aligning media, or(ii) two types of backbone RDCs (such as NH and CH RDCs) in a sin-gle medium. This flexibility allows application to a wider range ofproteins. Experimental NH RDCs in two media require only an 15N-labeled sample, which is an order of magnitude cheaper to preparethan a doubly-labeled 15N/13C sample. However, it is not alwaysstraightforward to find two aligning media for a protein; in thiscase the algorithm can use NH and CH RDCs in a single mediumsince recording an extra set of RDCs in the same medium requiresonly slightly more spectrometer time. New methods [127] will bevery useful for measuring RDCs in multiple alignment media.

9.1. Exact solutions for computing backbone dihedral angles from RDCs

RDC-EXACT is an exact solution, systematic search-based algorithmfor structure determination. We now describe the outlook for suchalgorithms, focusing on robustness and scaling.

RDC-EXACT was developed both as a de novo structure determina-tion algorithm and as a tool in ‘‘MR for NMR.” In particular, NVRassignments are structure-based. However, RDC-EXACT does not cur-rently exploit (or require) structural priors. Thus, a naı̈ve imple-mentation of the MR-like pathway in Fig. 1 might ignore thestructural homolog found by GD, and hence lose information. How-ever, RDC-EXACT could use the structural homologs found by GD as abias in structure determination. More specifically: RDC-EXACT is cur-rently a de novo structure determination method, starting with res-onance assignments. Those assignments could either come fromNVR or from conventional triple-resonance experiments. Indeed,in the case where GD detects (from unassigned NMR data) a novelfold, not in the database (Fig. 1), one can use algorithms for de novoassignment, that process either triple-resonance experiments orsequential connectivities deduced from short-range NOEs[22,105,149,70,148,12]. However, for a non-novel fold, the struc-tural model determined by GD can be exploited to bias the choicein RDC-EXACT of polynomial roots representing the backbone dihedralangles. Initially these were chosen by Ramachandran fitness; wenow describe an improved version of RDC-EXACT that exploits bias to-wards a priori structure in the regions of regular secondary struc-ture and low dynamics. A series of filters can be employed tochoose between the discrete, finite roots representing conforma-tions. These filters (which may be viewed as a fitness function)combine the fit to the data, modeling (Ramachandran fitness),and structural homology (from GD). In the case of missing data[156], or for denatured or disordered proteins [154], one canintroduce into the fitness function van der Waals or molecularmechanics energies [114,15,117]. This increases reliance on mod-eling, using an empirical energy function to compensate for miss-ing data. Such fitness functions may combine sources of data andmodeling using relative weights. The choice of these weights canbe problematic [124]. By converting each term of the fitness func-tion to a Bayesian probability (proposed by Nilges [124], and suc-cessfully employed in characterizing denatured structures [154]and automated assignments [105]) the resulting probabilitiescan be combined as log likelihood (the sum of the logarithms ofthe probabilities), without explicit weights, thereby reducingsubjectivity.

When 13C chemical shifts are available, TALOS [25] can be usedfor secondary structure determination and to provide restraint onthe polynomial roots representing ð/;wÞ angles. Once RDC-EXACT

has used sparse RDCs to determine the conformations and orienta-tions of elements of secondary structure, these fragments are rig-idly translated using sparse NOEs [152,174] or PREs [154].Connecting loops between these elements can then be solved (asshown in Fig. 12L and [156,174]). Without NMR data, this would re-duce to the kinematic loop closure problem (KLCP), which can beaddressed using computer algebra or optimization techniques[74,134,104,26,27]. With RDC data, the polynomial system of theKLCP can be combined with the quartic and quadratic RDC mono-mials (Eqs. 7, 10 below) to simultaneously find satisfying solutions(conformations) to the KLCP and the RDC equations. This simulta-neous bicriterion optimization search requires solving higher-de-gree polynomials, which can be approximated using homotopycontinuation or solved exactly using multivariate resultant orGröbner basis methods, as presented in [37]. Since the Ramachan-dran plot is less informative for loops, the enhanced filters above,that incorporate packing and empirical molecular mechanics ener-gies, are helpful outside regions of regular secondary structure.One can also employ a modified version of the robotics-basedCCD algorithm [174]. One can represent the loop residues as a con-formation tree (Fig. 9) [152,88,156,174], and filter/search it for theloop ensemble that best fits the experimental RDCs (and any avail-able long-range NOEs). Recent results suggest that this search canreturn a structural loop ensemble with good stereochemistry that

116 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 18: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

simultaneously solves the KLCP consistent with the experimentalNMR measurements [156,174].

RDC-EXACT should be particularly useful for the studies of largeprotein complexes because backbone resonances and associatedRDCs can be readily assigned and measured even in a very largeprotein, such as the 82 kDa malate synthase G [144,143]. The RDC-

EXACT algorithm computes backbone dihedral angles from two RDCsper residue, in constant time per residue. It is possible to derive,from the physics of RDCs, low-degree monomials (with degree atmost 4) whose solutions give the backbone ð/;wÞ angles.

We describe the methodology and protocol for exact solutionsassuming that we are given assigned NH RDCs in two media. Thesemethods also hold for the case of NH and CH RDCs in one medium withslight modifications to the equations below, as shown in Section 2 andreference [151]. For ease of exposition, we assume that the dipolarinteraction constant Dmax is equal to 1. By considering a globalcoordinate frame that diagonalizes the alignment tensor, Eq. (1)becomes:

r ¼ Sxxx2 þ Syyy2 þ Szzz2; ð2Þ

where Sxx; Syy and Szz are the three diagonal elements of a diagonal-ized Saupe matrix S (the alignment tensor), and x; y and z are, respec-tively, the x; y; z-components of the unit vector v in a principal orderframe (POF) which diagonalizes S. Now, S is a 3� 3 symmetric,traceless matrix with five independent elements [140,142]. GivenNH RDCs in two aligning media, the associated NH vector v mustlie on the intersection of two conic curves [135,159]. We now derivethe Wang-Donald Equations (Eqs. 7, 10 and 13) below, that permitthe ð/;wÞ backbone dihedral angles to be solved exactly and inclosed-form, given protein kinematics and two RDCs per reside. Verysimilar equations have been derived for NH and CaHa RDCs mea-sured in a single medium (see Section 2 and reference [151]).

Proposition 1. [152]. Given the diagonal Saupe elements Sxx and Syy

for medium 1, S0xx and S0yy for medium 2 and a relative rotation matrixR between the POFs of medium 1 and 2, the square of the x-componentof the unit vector v satisfies a monomial quartic equation.

The following is a sketch of the proof. The methods for the com-putation of the seven parameters (Sxx; Syy; S

0xx, S0yy and R12) and the

full expressions for the polynomial coefficients and temporaryvariables (a2; b2; c1, etc.) can be found in reference [152].

Proof Sketch: Fix a backbone NH vector v along the backboneand let r (Eq. 2) and r0 ¼ S0xxx02 þ S0yyy02 þ S0zzz

02 be the experimentalRDCs for v in the first and second medium, respectively. We haveðx0 y0 z0ÞT ¼ Rðx y zÞT , where x; y; z are the 3 components of v in aPOF of medium 1, r0 and x0; y0; z0 are the corresponding variablesfor medium 2. Let R ¼ ðRijÞi;j¼1;���;3. Eliminating x0; y0 and z0 we have

c0 ¼ a2x2 þ b2y2 þ c1xyþ c2xzþ c3yz ð3Þc4 ¼ a1x2 þ b1y2 ð4Þ

where a2¼ðS0xx�S0zzÞðR211�R2

13ÞþðS0yy�S0zzÞðR

221�R2

23Þ;c2¼2ðS0xx�S0zzÞR11R13þ2ðS0yy�S0zzÞR21R23, and a1; b1; b2; c0; . . . ; c4 are similar con-stants; full details are given in reference [152]. Eliminating z fromEq. (3) we obtain

d8x4 þ d7x3yþ d6x2y2 � d5x2 þ d4xy3 � d3xy� d2y2 þ d1y4 þ d0 ¼ 0

ð5Þ

where d8 ¼ a22 þ c2

2, and d7;d6; . . . ;d0 are analogously defined; theseare fully specified in [152]. Eq. (5) is a degree 8 monomial in x afterdirect elimination of y using Eq. (4). However, it can be reduced to aquartic equation by substitution since all terms have even degree.Introducing new variables t and u such that

x ¼ a sin t; y ¼ b cos t; u ¼ cos 2t ð6Þ

and through algebraic manipulation we finally obtain the Wang–Donald equation of type I for NH vectors from RDCs in 2 media:

f4u4 þ f3u3 þ f2u2 þ f1uþ f0 ¼ 0: ð7Þ

The full expressions for coefficients a; b; f0; f1; f2; f3; f4 are given in ref-erence [152]. Since u ¼ 1� 2 x

a

� �2 Eq. (7) is also a quartic equation inx2. h

Similar equations to Eq. (7) have been derived for NH and CaHa

RDCs measured in a single medium (see Section 2 and reference[151]). The y-component of v can be computed directly from Eq.(6). Due to two-fold symmetry in the RDC equation the numberof real solutions for v is at most 8. We will refer to the bond vectorbetween the N and Ca atoms as the NCa vector. Given two unit vec-tors in consecutive peptide planes we can use backbone kinematicsto derive quadratic equations to compute the sines and cosines ofthe ð/;wÞ angles:

Proposition 2. [152]. Given the NH unit vectors vi and viþ1 ofresidues i and iþ 1 and the NCa vector of residue i the sines andcosines of the intervening backbone dihedral angles ð/;wÞ satisfy thetrigonometric equations sinð/þ g1Þ ¼ h1 and sinðwþ g2Þ ¼ h2,where g1 and h1 are constants depending on vi and viþ1, and g2and h2 depend on vi;viþ1, sin / and cos /. Furthermore, exactsolutions for sin /; cos /, sin w, and cos w can be computed from aquadratic equation by tangent half-angle substitution.

The following is a sketch of the proof. Full expressions for thepolynomial coefficients and temporary variables ðx1; y1; z1; x2;

y2; z2; g1;h1; g2;h2Þ introduced in the proof are given in reference[152].

Proof Sketch: Using backbone inverse kinematics, the two NHvectors vi and viþ1 can be related by 8 rotation matrices betweentwo coordinate systems in peptide planes i and iþ 1:

vi ¼Rxðh7ÞRyðh6ÞRxðh5ÞRzðwþpÞRxðh3ÞRyð/ÞRyðh8ÞRxðh1Þviþ1: ð8Þ

The definitions of the coordinate systems, the expressions forthe rotation matrices Rx;Ry and Rz and the definitions of the sixbackbone angles (h1; h3; h5; h6; h7 and h8) are given in reference[152]. The backbone ð/;wÞ angles are defined according to thestandard convention. Given the values of these six anglesRl ¼ Rxðh7ÞRyðh6ÞRxðh5Þ and Rr ¼ Ryðh8ÞRxðh1Þ are two 3� 3 con-stant matrices. Define two new vectors w1 ¼ ðx1; y1; z1Þ ¼ R�1

l vi

and w2 ¼ ðx2; y2; z2Þ ¼ Rrviþ1 to obtain

x1 ¼ �ðcos / cos wþ sin h3 sin / sin wÞx2 � cos h3 sin wy2

þ ðcos w sin /� cos / sin h3 sin wÞz2

y1 ¼ ðcos / sin w� sin h3 sin / cos wÞx2 � cos h3 cos wy2

� ðsin / sin wþ cos / sin h3 cos wÞz2

z1 ¼ cos h3 sin /x2 � sin h3y2 þ cos h3 cos /z2: ð9Þ

By Eq. (9) we can then obtain the Wang–Donald equation of typeII for the / dihedral angle from RDCs in 2 media:

sinð/þ g1Þ ¼ h1 ð10Þ

where

h1 ¼z1 þ y2 sin h3ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ðx2 cos h3Þ2 þ ðz2 cos h3Þ2q ð11Þ

and g1 is a similar constant; see reference [152] for details. The val-ues of sin / and cos / can be computed from a quadratic equationby the substitution

w ¼ tan/2; sin / ¼ 2w

1þw2 ; cos / ¼ 1�w2

1þw2 ð12Þ

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 117

Page 19: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

Substituting the computed sin / and cos / into Eq. (9) we canobtain the analogous Equation of type II for the w dihedral angle fromRDCs in 2 media [152]; it is another simple trigonometric equation:

sinðwþ g2Þ ¼ h2: ð13Þ

Hence, sin w and cos w can be computed similarly from a quadraticequation where both g2 and h2 6 1 are computed fromy1; x2; y2; z2; h3; sin / and cos /. h

Similar equations to Eqs. (10)–(13) have been derived for NHand CaHa RDCs measured in a single medium (see Section 2 andreference [151]).

The structures computed by RDC-EXACT may be slightly differentfrom structures computed using traditional protocols (Section 3),which use different data, more data, and rely more upon modeling.In contrast to, for example, simulated annealing approaches, RDC-EX-

ACT is built upon the exact solutions for computing backbone ð/;wÞangles from RDC data and systematic search. RDC-EXACT is guaranteedto compute the global minimum (the best fit, or maximum likeli-hood solution to the data) and hence is provable in terms of com-pleteness, correctness, and complexity. We caution that in thecontext of RDC-EXACT, the term ‘provably correct’ refers to the compu-tational correctness of the algorithm. It is not meant to imply thatthe RDC-EXACT structures computed using the above modelingassumptions (i.e., standard protein covalent geometries, pruningof ð/;wÞ solutions using the Ramachandran plot and steric clashes)will always compute biologically correct structures. More precisely,the proofs guarantee accuracy (of the computed structures) andoptimality (of the maximum likelihood structure) up to, but onlyup to the accuracy of the modeling assumptions. One can relaxthe modeling assumptions and improve their accuracy, as outlinedabove. However, a helpful first step has been to develop an algo-rithm that is provably correct up to the limitations of the initialmodel. It is unusual for protein structure determination algorithmsto have guarantees of computing the optimal structure, in polyno-mial time. RDC-EXACT has both these desirable properties [156], andhence promises significant advantages as a method to computestructures accurately using only very sparse restraints.

For smaller proteins it may be possible to collect side-chainRDCs, in which case RDC-EXACT can be extended to compute side-chain conformations. One can extend the algorithms to computecomplete protein structures (including side-chains), since exactequations analogous to Eqs. (7,12) can be derived mutatis mutandisto compute the side-chain dihedral angles v1;v2; . . . from side-chain RDCs. In this case, the average or Ramachandran angles usedas root-filters in [152], may be replaced with modal side-chain rot-amer library angles va;1;va;2; . . .. One can also expedite proteinstructure determination using pseudocontact shift restraints [72]or carbonyl chemical shift anisotropy upon alignment [167], sincesimilar equations can be derived for computing the correspondingvector orientation and dihedral angles.

9.2. Nuclear vector replacement and fold recognition using unassignedRDCs

NVR-like algorithms for automated assignment form a corner-stone for molecular-replacement (MR) type methods for NMR start-ing with only unassigned data. Since GD can be viewed as theapplication (filtering) of NVR modules to a database (as opposed toa single structure), improvements to NVR will, mutatis mutandis, im-prove the accuracy and performance of GD. Note that unlike MR (incrystallography), NVR/GD is sufficiently efficient computationally thatit can be applied to thousands of template structures in a short time.

NVR uses two classes of constraints: geometric and probabilistic(Fig. 14). The HN � 15N RDC, H–D exchange and 15N-NOESY each

provide independent geometric constraints on assignment. Asparse number of dNNs are extracted from the unassigned NOESYafter the diagonal peaks of the NOESY are cross-referenced to thepeaks in the HSQC. These dNNs provide distance restraints forassignments. In general, there are only a small number of unam-biguous dNNs that can be obtained from an unassigned 3D 15N-edi-ted NOESY. The amide exchange information probabilisticallyidentifies the peaks in the HSQC corresponding to non hydrogenbonded, solvent-accessible backbone amide protons. RDCs provideprobabilistic constraints on each backbone amide-bond vector’sorientation in the POF. Finally, chemical shift prediction is em-ployed [79] to compute a probabilistic constraint on assignment.NVR exploits the geometric and probabilistic constraints by com-bining them within an Expectation/Maximization framework.

NVR can be improved by exploiting different, but still efficientexperiments. Currently, NVR uses amide exchange experimentsto distinguish surface from buried residues [79]. This ‘‘inside/out-side” information is correlated with the structural model, and in-creases assignment accuracy. When these experiments may beimpractical for larger proteins, one can, in principle, employ para-magnetic quenching [172] or ‘‘water HSQC” experiments [41,51],which identify fast-exchanging or solvent-exposed backbone pro-tons. Since this information is complementary to the slow-ex-change protons identified by H–D exchange, it likely could beintegrated into NVR by complementing the corresponding edgeweights in a bipartite graph, which we shall discuss below. Simi-larly, it is probable that assignment accuracy will improve byincorporating CH RDCs into NVR and GD, since in helices theCa—Ha bond vector orientations are less degenerate than NH vec-tors, thereby both improving tensor accuracy in the bootstrappingphase and increasing disambiguation of helical RDCs in the assign-ment phase. 4-dimensional 15N-,15N- and 13C-,15N-edited NOESYexperiments [21] from singly and doubly labeled samples, respec-tively, will reduce the ambiguity in both spin systems and assign-ments, and facilitate matching of NOE restraints to structuralmodels. While the acquisition time for 4D NOESY is longer, the di-rect polar Fourier transformation holds great potential to speed updata collection significantly, by producing quantitatively accuratespectra from radial and concentric sampling [23,13].

The ultimate goal is to make NVR useful in an analogous man-ner to molecular replacement (MR) in crystallography. To do thisNVR has been extended to operate on more distant structuralhomologs [8]. The relative sensitivity of RDCs to structural noiseis evident in Eq. (1), from the quadratic dependence on the bondvector v versus the linear dependence on the alignment tensor S.In the NVR scoring function, the M step of the EM algorithm usesedge weights to compute the joint probability of the resonanceassignments in an ensemble of bipartite graphs. This score is amaximum likelihood estimator of assignment accuracy, and asparameters and inputs to the algorithm (such as the structuralmodel and alignment tensor) are varied, its log likelihood maybe optimized. As cross-validation, an alternative scoring function,proposed in [79], employs the mean tensor consistency computedfrom an ensemble of subsets of assignments against the struc-tural model. In principle, both methods allow a parametric familyof structural models to be explored about the putative homolog,enabling NVR to bootstrap assignments analogously to the role ofMR, starting with a structural homolog in crystallography. In-deed, traditional (X-ray) MR had previously been extended tosolve phase problems from more distant models by using normalmode analysis (NMA) [138,137]. Analogously, one can perform ageometric exploration of conformation space to generate anensemble of neighboring structures to the initial homolog usingNMA [8].

NMA can be performed as described traditionally [58,17] orwith a Gaussian network model [141]. It would also be valuable

118 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 20: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

to compare and geometrically simulate diffusive motions to sam-ple the internal mobility of the protein using alternative tech-niques such as FRODA, FIRST, and ROCK [65,173,161], that are basedon graph rigidity theory. The structures that maximize the likeli-hood of assignment accuracy can then be used for NVR [8]. An al-lied idea was used earlier by Baker and co-workers for decoydetection in the context of pruning structure predictions by NMRdata [97]. NVR-NMA ensembles [8] are different in that they sys-tematically represent the deformation space for equilibrium pro-tein motions, rather than a family that mixes correct andincorrect structures. It appears that as in [138], the NMA-like ap-proaches above can adequately sample the necessary convolutionof structural variation, noise, and dynamics around the targetstructure [8].

We now review a number of technical and algorithmic featuresof NVR. NVR uses a variation of the EM algorithm [79], which is astatistical method for computing the maximum likelihood esti-mates of parameters for a generative model. EM has been a populartechnique in a number of different fields, including machine learn-ing and image understanding. It has been applied to bipartitematching problems in computer vision [30]. Bipartite matching isextremely sensitive to noise in the edge weights. There is evidencethat sophisticated algorithms such as EM or NVR must be em-ployed if bipartite matching is to be used as a subroutine on datathat is sparse or noisy [79,8]. In the EM framework there are bothobserved and hidden (i.e., unobserved) random variables. In thecontext of structure-based assignment, the observed variables arethe chemical shifts, unassigned dNNs, amide exchange rates, RDCs,and the 3D structure of the target protein. Let X be the set of ob-served variables.

The hidden variables Y ¼ YG [ YS are the true (i.e., correct) res-onance assignments YG, and YS, the correct, or ‘true’ alignment ten-sors. Of course, the values of the hidden variables are unknown.Specifically, YG is the set of edge weights of a bipartite graph,G ¼ fK;R;K � Rg, where K is the set of peaks in the HSQC and Ris the set of residues in the protein. The weights YG represent cor-rect assignments, therefore encode a perfect matching in G. Hence,for each peak k 2 K (respectively, residue r 2 R), exactly one edgeweight from k (respectively r) is 1 and the rest are 0.

The probabilities on all variables in Y are parameterized by the‘model’, which is the set H of all assignments made so far by thealgorithm. Initially, H is empty. As EM makes more assignments,H grows, and both the probabilities on the edge weights YG andthe probabilities on the alignment tensor values YS will change.The goal of the EM algorithm is to estimate Y accurately to discoverthe correct edge weights YG, thereby computing the correct assign-ments. The EM algorithm has two steps; the Expectation (E) stepand the Maximization ðMÞ step. The E step computes the expecta-tion EðH [H0jHÞ ¼ Eðlog PðX;Y jH [H0ÞÞ. Here, H0 is a non-emptyset of candidate new assignments that is disjoint from H. The Mstep computes the maximum likelihood new assignmentsH� ¼ argmaxH0EðH [H0jHÞ. Then the master list of assignments isupdated, H H [H�.

The alignment tensors are re-computed at the end of each iter-ation, using all the assignments in H. Thus, the tensor estimateswill be continually refined during the run of the algorithm. Thealgorithm terminates when each peak has been assigned.

Care must be taken to implement the probabilistic EM frame-work efficiently; for details see [79]. In brief: Individual bipartitegraphs are constructed for each of the 7 sources of data (see Fig.14. NVR operates on bipartite graphs between peaks and residues.The edge weights from each peak to all residues form a probabilitydistribution. The probabilities are derived from (1) ‘‘inside/outside”experiments (H–D exchange, paramagnetic quenching), (2) dNNs,(3) chemical shift predictions based on average chemical shiftsfrom the BMRB [132], (4) chemical shift predictions made by the

program SHIFTS [169], (5) chemical shift predictions made by theprogram SHIFTX [101], and (6–7) constraints from RDCs in twomedia).

Data (1–7) above are converted into constraints on assignment.The difference between the experimentally-determined chemicalshifts and the set of predicted chemical shifts are converted intoassignment probabilities. Let S1 and S2 be the alignment tensorscomputed from unassigned data as described in Section 4 and ref-erences [79,82,83,78]. The amide bond vectors from the structuralmodel are used to back-compute a set of RDCs using Eq. (1). Thedifference between each back-computed RDC and each experimen-tally recorded RDC is converted into a probability on assignment.Let Dm be the set of observed RDCs in medium m, and Fm be theset of back-computed RDCs using the model and Sm. Two bipartitegraphs M1 and M2 are constructed on the peaks in K and residuesin R. The edge weights are computed as probabilities as follows:wðk; rÞ ¼ Pðk # rjSmÞ ¼ gðk; rÞ where k 2 K and r 2 R. Here,gðk; rÞ ¼NðdmðkÞ � bmðrÞ;rmÞ where dmðkÞ 2 Dm, bmðrÞ 2 Fm. Theprobabilities are computed using a 1-dimensional Gaussian distri-bution Nðx� l;rÞ ¼ 1

rffiffiffiffi2pp exp � ðx�lÞ2

2r

� �, with mean dmðkÞ � bmðrÞ

and standard deviation rm. Langmead et al. [79] used r ¼ L=8 Hzin their experiments (Section 4), where L is the range of the RDCsin that medium (the maximum-valued RDC minus the minimumvalued RDC). If an RDC is missing in medium i for a peak k, thenNVR sets the weight wðk; rÞ ¼ 1=n0 in graph Mi, for each residue rof the n0 remaining (i.e., unassigned) residues. The details of howthe chemical shifts, dNNs and amide exchange rates are convertedinto constraints on assignment are analogous, see [79].

NVR uses a unique voting scheme for combining multiplesources of information [79,8]. For each possible combination ofdata (1–7) above, a combined graph is constructed whose edgeweights are the joint probabilities of the edges in the single-spectragraphs. NVR then constructs a new bipartite graph V , called theexpectation graph, whose edge weights are initialized to 0. For eachcombined graph c, we compute a maximum bipartite matching. LetHc � E be the matching computed on c. For each edge e 2 Hc NVRincrements the weight on the same edge in the expectation graph,V , by 1. This is done for all combined graphs. Hence, each bipartitematching ‘‘votes” for a set of edges. Thus, edge weights in V recordthe number of times a particular edge was part of a maximumbipartite matching. Note that the edge weights are probabilitiesin the bipartite graphs. Thus, a bipartite matching gives a maxi-mum likelihood solution on that graph, which in turn maximizesthe expected log-likelihood of the average edge weight.

Let w0 be the largest edge weight in V . The M step is computed,and assignments are made by H H [w�1ðw0Þ. As previously sta-ted, each of the constituent votes used to construct V is a maxi-mum likelihood solution. Hence, it maximizes the expected edgeweight in the matching. Therefore, in the bipartite graph V , thoseedges with maximum votes have the highest expected values overall combinations of the data, thus correctly computing the maxi-mum likelihood new assignments H� as the argmax in the M-Stepdescribed above.

When an assignment is made, the associated nodes k 2 K andr 2 R are removed from BNOE, the NOE graph. The geometric NOErestraints are applied to prune matching. The three graphs forchemical shift prediction are synchronized with BNOE so that allgraphs have the same set of zero-weight edges. Each graph is thenre-normalized so that the edge weights from each peak form aprobability distribution. This completes a single iteration of theEM algorithm.

Recently, a new version of the GD module of NVR, called HD,improved the accuracy of NVR/GD [80]. In particular, HD eliminatesthe false-positives evident in Fig. 15, even against a largerdatabase of 4500 folds (vs. only 2456 in GD [78]); see Fig. 17 and[80]. Furthermore, it has been found that combining HD with

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 119

Page 21: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

NMA ensembles (described above) can boost the assignment accu-racy using distant structural homologs (3–7 Å backbone RMSD) byup to 22% [8].

The advantage of NVR over maximum bipartite matching lies inits iterative nature. NVR takes a conservative approach, makingonly likely assignments, given the current information. After mak-ing these assignments, the edge weights between the remainingunassigned peaks and residues are updated. Suppose that, duringthe ith iteration of the algorithm, peak k is assigned to residue r.The edge weight between peak k and residue r is then set to 1, indi-cating the certainty of that assignment. The edge weights form aprobability distribution. Accordingly, the edge weight betweenpeak k and any other residue is set to 0. Similarly, the weights onthe edges from any other peak to r are immediately set to 0. The(non-zero) edge weights from each remaining unassigned peakare re-normalized prior to the next iteration. Thus, a peak whoseassignment may be ambiguous in iteration i may become unam-biguous in iteration iþ 1.

Somewhat surprisingly, results of perturbation studies [[79], pp.120–130] suggest that NVR is not very sensitive to the quality ofthe initial tensor estimates, because the additional, non-RDC linesof evidence (chemical shift prediction, amide-exchange, dNNs) canovercome these inaccuracies. The NVR voting algorithm (see aboveand references [79,8]) used to integrate different lines of evidenceis essentially a means to increase the signal-to-noise ratio. Here thesignal is the computed likelihood of the assignment between apeak and the (correct) residue. The noise is the uncertainty in thedata, where the probability mass is distributed among multipleresidues. Each line of evidence (i.e., experiment) has noise, butthe noise tends to be random and thus cancels when the lines ofevidence are combined. Conversely, the signals embedded in eachline of evidence tend to reinforce each other, resulting in (rela-tively) unambiguous assignments. Hence, even if the two initialtensor estimates are poor, it is unlikely that they can conspire(by voting together) to force an incorrect assignment. More gener-ally, given the NVR voting scheme (above, and [79,8]), any pair oflines of evidence is unlikely to outvote the majority. Finally, theiterative update (described above) by the alignment tensors at

each cycle of the EM, allows the tensor accuracy to improve asmore assignments are made [79].

9.3. Automated NOE assignment

TRIANGLE is discussed for illustrative purposes; the interestedreader is referred to HANA [174] for a more sophisticated algorithmwith the same flavor. The TRIANGLE [153] and HANA [174] algorithmsfor automated NOE assignment use a rotamer library ensemble andRDCs. Both algorithms employ a rotamer-library-derived ensembleof intra- and inter-residue vectors between the backbone atomsand side-chain protons to reduce the ambiguity in the assignmentof NOE restraints involving side-chain protons, especially aliphaticprotons. For example, consider the putative assignment of an NOEto an ðHN;HdÞ (respectively, ðHa;HdÞ) pair of protons. The trianglerelationship (see Fig. 16L) defines a decision procedure to filterNOE assignments by fusing information from RDCs ðvNbÞ, rotamermodeling ðvbdÞ, and NOEs ðjvNdjÞ. Specifically,

Proposition 3. [153]. Given the vector vbd from a Cb atom to a side-chain proton S (e.g., Hd), and the vector vNb from the backbone HN tothe Cb, (respectively the vector vab from the Ha atom to Cb), the vectorvNS from the HN to S (respectively the vector vaS from the Ha to S) canbe computed by: vNS ¼ vNb þ vbd;vaS ¼ vab þ vbd.

vNb, and vab are computed directly from the backbone structuredetermined by RDC-EXACT and can be inter- or intra-residue. The vec-tors vNS;vNb and vbd form three sides of a triangle (Fig. 16), wherevNb is computed from the backbone structure and vbd lies in a finiteset of vectors, mined from the PDB rotamers. Hence, for each side-chain proton, vbd lies in a finite set of vectors. Consequently, there isa finite set VN (respectively Va) of vectors containing vNS (respec-tively vaS). The triangle relationship (Proposition 3) is then usedto filter the assignment of NOE peaks involving side-chain protons.The algorithm first estimates the NOE distances, dN and da, using theNOE intensities from, respectively, the experimental NOE peakspicked from the 3D 15N- and 13C-NOESY spectra. TRIANGLE tests dN

and da to see whether they satisfy the constraints: min jvNSj�eN 6 dN 6max jvNSj þ eN;min jvaSj � ea 6 da 6max jvaSj þ ea, where

A C

B

Fig. 17. Improved fold detection from unassigned NMR data [80]: (A) score computed by HD vs. backbone RMSD (Å). The line is a least-squares fit to the data. The correlationcoefficient is �0.75. Open diamonds are homologous structures with low sequence similarity, detected by HD score P �10 (Table B). Solid diamonds are structures classifiedas non-homologs by HD score < �10 (Table C). Tables B–C: (a) 3 test proteins: (i) Ubiquitin, (ii) GaIP, (iii) SPG. bPDB/chain Id for the structures detected by HD. cSequenceidentity of the 3 test proteins(a) vs. the primary sequence of the structure.b dBackbone RMSD between the structuresb and the native structures of the test proteins.(a) eScorecomputed by the algorithm HD. Higher HD-scores (closer to 0) indicate closer structural similarity. Table B does not include those structures detected by HD that have > 30%

sequence identity.

120 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 22: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

min and max for vNS (respectively vaS) are both computed overvNS 2 VN (respectively vaS 2 Va). eN and ea are the NOE distanceerror bounds for the corresponding 3D 15N- (respectively 13C-) edi-ted NOESY peaks.

Tests of TRIANGLE and HANA are described in Section 6 and references[153,174]. In the future, it will be important to quantify any possiblebias introduced by the rotamer library, using both rotamer subsam-pling and rotamer energy minimization (generalizing prior work onrotamer optimization [88,45,44]), and comparison to conventionalassignments. For example, one can use a subsampled rotamerlibrary that is >6000-times finer than the library in [153]. In preli-minary tests on ubiquitin NOE spectra, the finer library resulted inmodest improvements in assignment accuracy (about 5%) at aconsiderable computational cost (two hours vs. one second). Thissuggests that the optimal number of rotamers should lie in between.

Further study is required of the effect of spin diffusion on cross-peak intensity and the NOE distance jvNdj in the triangle relation-ship (Fig. 16L) for assignment pruning. An extension of TRIANGLE

has been used with RDC-EXACT to assign the NOEs and compute struc-tures, including side-chains [174]. It will be valuable to extend TRI-

ANGLE to account for the various labeling patterns necessary in largeproteins to solve the structures of large systems. These labelingstrategies can sometimes vitiate the HCCH-TOCSY [108] for side-chain resonance assignments. One can extend TRIANGLE to assignNOEs involving these ambiguous proton resonances, by replacingtheir side-chain assignment with a set of possible labels (analogousto [1]). The labels, which represent competing assignments for theindirect 1H dimension, can be pruned using Proposition 3. Theresulting NOESY assignments from TRIANGLE would provide side-chain resonance assignments as a byproduct. This extension to TRI-

ANGLE would not only reduce the dependence of NOE assignmentalgorithms upon the (less sensitive) TOCSY experiments, but alsoincrease their tolerance of incomplete side-chain resonanceassignments.

Side-chain/side-chain NOEs may be assigned by ‘lifting’ TRIAN-

GLE’s rotamer-based ensemble of internuclear vectors (via Cartesianproduct). 4D NOESY experiments can expedite TRIANGLE’s computa-tion of these assignments. A discussion of the information contentof 4D NOESY vs. selective labeling is therefore warranted. TRIANGLE

and its successor, HANA [174], employ a ‘symmetry check’ [60] to in-crease assignment confidence using ‘back’ NOEs. Consider a puta-tive assignment for an NOE, 15N—HN $ 1H, where xð1HÞ isassigned to 1Hd of F16. If the assignment is correct then we shouldsee a symmetric NOE in the 13C-NOESY: HN $ 1Hd � 13Cd of F16.Finding this peak requires searching a box in the 3D 13C-NOESY,about the ðHN; 1H;13CdÞ shifts. Since the ‘back’ NOE might be miss-ing and there might be other peaks in this box, the symmetry checkmust be treated inferentially, as changing the Bayesian probabilityof the assignment. Using a 4D NOESY reduces this ambiguity, andin principle the symmetric interaction will always be observed.As a pruning condition, back NOEs and 4D NOEs complement thetriangle relationship (Proposition 3) by integrating the geometricand spectral constraint of the additional observed 13Cd. However,under some isotopic labeling schemes, 4D NOEs to protons bondedto 12C will not be observed, and the set-of-labels strategy abovemust be employed. Finally, TRIANGLE and HANA should be helpful toassist with proposed experimental methods for 24–65 kDa pro-teins that currently use manual 4D NOE assignment (e.g., [170]),which could then be automated, saving time and reducingsubjectivity.

9.4. NMR structure determination of symmetric oligomers

Many experimental challenges exist for determining the struc-ture of membrane protein complexes. Current algorithms in thisarea, including SYMBRANE (Section 7) do not address all of these prob-

lems, and make a number of assumptions. Most importantly, it cansometimes be assumed that the overall fold of a subunit can be ob-tained from RDCs and other NMR measurements in the holo state,and that inter-subunit NOEs can be distinguished from intra-sub-unit NOEs using isotope filtering, as was done for phospholamban[111,117]. In particular, the automated ambiguous inter-subunitNOE assignment in SYMBRANE relies on these assumptions [118].

One wishes to extend and relax these assumptions; although along-term goal is to develop algorithms for general membrane pro-teins, this review focuses next on the challenges of symmetric olig-omers, which include many biomedically important systems.Moreover, RDCs can play a crucial role for such complexes, not onlyin determining the conformation and orientation of the individualsubunits, but also in computing the symmetry axis. The overallscheme of SYMBRANE is shown in Fig. 18. In contrast, traditional pro-tocols [102] for structure determination of protein complexes fromNMR data use simulated annealing (SA) and molecular dynamics(MD). However, the SA/MD mechanism is not complete and couldget trapped in local minima. Furthermore, the precision in thedetermined structure is strongly affected by the temperature. SYM-

BRANE avoids temperature dependence and local minima problems,and does not suffer from ‘‘false precision” in characterizing thediversity of determined structures. SYMBRANE’s search in the symme-try configuration space (SCS) takes advantage of the ‘closed-ring’constraint of a symmetric homo-oligomer. AMBIPACK [150] used abranch-and-bound algorithm to compute rigid body transforma-tions satisfying potentially ambiguous inter-subunit distance re-straints. In contrast, SYMBRANE uses the oligomeric number toenforce an a priori symmetry constraint. In this sense it is analo-gous to the manner in which non-crystallographic symmetry ishandled in molecular replacement for X-ray crystallography [89].By formulating the structure determination problem in the SCSrather than in the space of atom positions, SYMBRANE is able to ex-ploit directly the kinematics of the ‘closed-ring’ constraint, andthereby derive an analytical bound for pruning, which is tighterand more accurate than previous randomized numerical tech-niques [150]. SYMBRANE’s guarantees of completeness should beespecially important for membrane proteins (where sparse datais a particularly important problem), but should also prove usefulfor water-soluble oligomers.

SYMBRANE simultaneously assigns NOEs and determines structure(by identifying regions in SCS), and is guaranteed to find all consis-tent assignments and structures [118]. Let R be a set of (possiblyambiguous) NOEs. Let rkl represent the lth possible assignment ofthe kth NOE rk of R, and let skl � S2 � R2 (see Fig. 18a) be the regionof the SCS in which distance restraint rkl is satisfied. An assignmentspecifies the subunit and the protons involved in the NOE. A deter-mined structure must satisfy all ambiguous NOEs; it satisfies eachby satisfying one or more of its possible assignments. In the SCS,this translates into finding the region

pðRÞ¼ ðs11[ s12[ s13[ . . .Þ\ðs21[ s22[ . . .Þ\ . . .¼\jRjk¼1

[tk

l¼1

skl

!ð14Þ

where tk is the number of possible assignments for rk, the kth NOE.Using Eq. (14), the algorithm computes redundant and inconsis-

tent NOEs and assignments, via unions and intersections of geo-metric sets in SCS. Let Q � R be a set of NOEs. Let rk be an(arbitrary) ambiguous NOE (not in Q) with tk assignments. Wecompute that NOE assignment rkl is redundant with respect to Qif and only if pðQÞ � skl. Similarly, we compute that rkl is inconsis-tent with respect to Q if and only if pðQÞ \ skl ¼ ;, where ; is theempty set. We can extend these algorithms from assignmentsto NOEs as follows. Let Sk ¼

Stkl¼1skl be the region of the SCS which

satisfies at least one assignment of rk. We then compute that rk isredundant with respect to Q if and only if pðQÞ � Sk, and compute

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 121

Page 23: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

that rk is inconsistent with respect to Q if and only if pðQÞ \ Sk ¼ ;.SYMBRANE eliminates inconsistent NOEs and assignments, and isable to detect redundant ones. SYMBRANE’s approach to identifyingthe mutually-consistent, complete set of NOE assignments andSCS regions is summarized in Fig. 19 and described in detail inreference [118].

When RDCs can be obtained in combination with NOEs, thenstructure determination of symmetric homo-oligomers reduces toan inverse kinematics problem that can be solved exactly (analo-gously to the procedures in Section 3), without resorting to thebranch-and-bound search used in references [117,118]. It is likelythat ambiguous NOE assignment [118] will also be improved sincethe ensemble of exact solutions will be smaller, increasing the struc-tural precision that geometrically prunes the assignments. SYMBRANE

currently allows for limited uncertainty in the subunit structure:SYMBRANE computes satisfying structures using rigid monomers, but

energy scores allowing side-chain minimization [117]. This allowsfor some side-chain flexibility, but is subject to local minima andis not systematic. Instead, one could use a rotamer library to sys-tematically sample different distinct conformational side-chainstates, exploiting recent advances in rotamer optimization[88,45,44], particularly in the presence of provable rotamer energyminimization [45,44]. Considering rotamers increases the complex-ity of the hierarchical subdivision scheme, but techniques such asdead-end elimination (DEE) [34,85,48,115,44] and minimized DEE[45,44] can be used to prune rotamers and compute solutions morequickly. It may also be possible to derive exact or analytic solutionsfor NOE assignment when RDCs are available. While more difficult,backbone flexibility could also be incorporated in a DEE-like search[42,43]. Allowing greater uncertainty in the subunit structure,would make SYMBRANE more robust and eventually useful for multi-mers with a more intricate folding topology.

Fig. 18. SYMBRANE algorithm. (a) Given the subunit structure, the relative position t 2 R2 and orientation a 2 S2 of the symmetry axis uniquely determines structure. Structuredetermination is a search problem in 4-dimensional symmetry configuration space (SCS), S2 � R2. An NOE is shown between protons p and q0 . (b) The branch-and-boundalgorithm proceeds as a tree search in SCS. The 4D SCS is represented as two 2D regions, a sphere representing the orientation space S2 and a square representing thetranslation space R2. The dark shaded regions at each node of the tree represent the region in SCS being explored ðA� T � S2 � R2Þ. Ultimately (bottom left of the tree) thebranch-and-bound search returns regions in 4D space representative of structures that possibly satisfy all the restraints. At each node, we test satisfaction of each restraint ofform pq0 6 d by testing intersection between the ball of radius d centered at p and the convex hull bounding possible positions of q0 . If there exists an intersection betweenthe ball and the convex hull for each restraint, further branching is done (node 1); otherwise, the entire node and its subtree are pruned (node 2). (c) Representative results:Complete ensemble of NMR structures of the unphosphorylated human Phospholamban pentamer, PDB id: 2HYN [117]. (d) Phospholamban restraint satisfaction score vs.packing score for all structures. The vertical and horizontal lines indicate the cutoffs for WPS structures: 1 Å for the satisfaction score and 0 kcal/mol for the packing score. Thegreen stars and the blue crosses indicate the set of satisfying structures. The magenta points indicate the set of non-satisfying structures that have been pruned. The greenstars indicate the set of WPS structures and the red star indicates the reference structure.

122 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 24: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

abcd

bc

d

abcd

a

Ambiguity Consistent Regions in SCS Assigned NOE

c

d cb a

dc

ba

Subunit 1ab

dSubunit 2

Subunit 3

Subunit 3

Subunit 2Subunit 1

d b a

dc

ab

c

da b

c

(Three possible atom assignments each)

Ambiguity resolution algorithm

Subunit assignment #2Subunit assignment #1

(one restraint illustrated)

Branch-and-bound search

Possible assignments of ambiguous NOEs

Consistent regions in SCS

Oligomeric Number.Subunit Structure,Subset of the NOEs,

Fig. 19. SYMBRANE strategy to resolve ambiguous NOEs in structure determination of symmetric homo-oligomers. (Top) Different possible assignments for one ambiguous NOEare illustrated for a trimer. The NOE could be ordered between subunits as 1–2–3 or 1–3–2, and has chemical shift degeneracy between protons a; b, and c. The former is called‘‘subunit ambiguity” and the latter ‘‘atom ambiguity.” Phase 1: Consistent regions in the SCS are obtained using a branch-and-bound algorithm [117], taking a subset ofavailable NOEs (typically atom-unambiguous NOEs, if any) as input. The 4-dimensional SCS is cartoon-represented as in Fig. 18b. Phase 2: SYMBRANE uses the consistent regionsoutput from the branch-and-bound search plus possible assignments of the ambiguous NOEs, to determine a mutually-consistent, complete set of NOE assignments andambiguity consistent regions (ACR) in SCS. (Bottom) Pseudocode for SYMBRANE’s ambiguity resolution algorithm. Each NOE’s score is the average of the scores of all of itspossible assignments (the line marked (*), Bottom). This score provides an ordering criterion for NOE assignment. [118] shows that the order does not affect the completeness,results, or accuracy, but can improve the efficiency.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 123

Page 25: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

Two novel features of SYMBRANE are of particular note: (1) theability to assign inter-subunit NOEs in a complete search thatavoids exhaustive, exponential enumeration [118]; and (2) theability to determine the oligomeric number using NMR data alone[117]. Specifically:

(1) Errors in resolving inter-subunit NOE ambiguities have ledto incorrect NMR structures [20]. Beyond the importanceof this particular problem in NMR data analysis, SYMBRANE

represents a departure from the stochastic methods fre-quently employed by the NMR community. A corollary isthat such stochastic methods, now routinely employed,should be reconsidered in light of their inability to assureidentification of the unique or globally-optimal structuralmodels consistent with a set of NMR observations. SYMBRANE

can provide more confident answers to these problems.(2) In general, the oligomeric state of the complex includes the

symmetry type (Cn;Dn, etc.) as well as the oligomeric numberðn ¼ 2;3;4; . . .Þ. In addition to assigning the intermolecularNOEs and determining the complex structure, SYMBRANE testswhether the available data suffices to determine the oligo-meric number for Cn symmetry. For each possible symmetrytype and oligomeric number, SYMBRANE can determine a set Yof WPS structures. For some oligomeric states, Y ¼ ;: we canplace higher confidence on the oligomeric state that (a) hassome WPS structures (i.e., Y–;), and (b) affords better vdWpacking. Thus we can determine the oligomeric state usingthe NMR data and vdW packing. SYMBRANE provides for anindependent verification of the oligomeric state (which istypically determined using experiments such as chemicalcross-linking followed by SDS-PAGE, or by equilibrium sed-imentation). The results from SYMBRANE show how novel algo-rithms can draw conclusions from NMR data that werepreviously possible only using additional biophysical exper-iments. Such algorithms can reduce experimental time andprovide cross-validation of structure and assembly.

Acknowledgements

We thank our colleagues for collaborations and assistance inNMR methodology, without which this article would not havebeen possible: Chris Bailey-Kellogg, Jeffrey Boyles, James Chou,Jeff Hoch, Dan Keedy, Shobha Potluri, David and Jane Richardson,Chittu Tripathy, Gerhard Wagner, Tony Yan, Anna Yershova, andPei Zhou. We thank the following colleagues for helpful discus-sions, which greatly improved this article: Marcelo Berardi, Vin-cent Chen, David Cowburn, Brian Hare, Terry Oas, TomásLozano-Pérez, Art Palmer, Len Spicer, Bruce Tidor, Ron Venters,and Peter Wright. We thank all members of the Donald labora-tory, past and present, for helpful discussions and comments.We thank the Duke University NMR Center for their assistanceand support. The authors are supported by NIH grantGM-65982 to B.R.D.

References

[1] A.B. Eiso, D.J.R. Pugh, R. Kaptein, R. Boelens, A.M.J.J. Bonvin, Direct use ofunassigned resonances in NMR structure calculations with proxy residues, J.Am. Chem. Soc. 128 (23) (2006) 7566–7571.

[2] H.M. Al-Hashimi, P.J. Bolon, J.H. Prestegard, Molecular symmetry as an aid togeometry determination in ligand protein complexes, J. Magn. Reson. 142(2000) 153–158.

[3] H.M. Al-Hashimi, A. Gorin, A. Majumdar, Y. Gosser, D.J. Patel, Towardsstructural genomics of RNA: rapid NMR resonance assignment andsimultaneous RNA tertiary structure determination using residual dipolarcouplings, J. Mol. Biol. 318 (2002) 637–649.

[4] H.M. Al-Hashimi, D.J. Patel, Residual dipolar couplings: synergy between NMRand structural genomics, J. Biomol. NMR 22 (1) (2002) 1–8.

[5] A.C. Anderson, Two crystal structures of dihydrofolate reductase-thymidylatesynthase from Cryptosporidium hominis reveal protein–ligand interactionsincluding a structural basis for observed antifolate resistance, ActaCrystallogr. Sect. F Struct. Biol. Cryst. Commun. 61 (Pt. 3) (2005) 258–262.

[6] M. Andrec, P. Du, R.M. Levy, Protein backbone structure determination usingonly residual dipolar couplings from one ordering medium, J. Biomol. NMR 21(4) (2001) 335–347.

[7] A. Annila, H. Aitio, E. Thulin, T. Drakenberg, Recognition of protein folds viadipolar couplings, J. Biomol. NMR 14 (1999) 223–230.

[8] S. Apaydin, V. Conitzer, B.R. Donald, Structure-based protein NMRassignments using native structural ensembles, J. Biomol. NMR 40 (2008)263–276.

[9] P.J. Artymiuk, C.C.F. Blake, D.W. Rice, K.S. Wilson, The structures of themonoclinic and orthorhombic forms of hen egg-white lysozyme at 6angstroms resolution, Acta Crystallogr. B Biol. Crystallogr. 38 (1982) 778.

[10] C.R. Babu, P.F. Flynn, A.J. Wand, Validation of protein structure frompreparations of encapsulated proteins dissolved in low viscosity fluids, J.Am. Chem. Soc. 123 (2001) 2691.

[11] C. Bailey-Kellogg, J.J. Kelley III, R. Lilien, B.R. Donald, Physical geometricalgorithms for structural molecular biology, in: The Special Session onComputational Biology & Chemistry, Proceedings IEEE InternationalConference on Robotics and Automation (ICRA-2001), May 2001, pp. 940–947.

[12] C. Bailey-Kellogg, A. Widge, J.J. Kelley III, M.J. Berardi, J.H. Bushweller, B.R.Donald, The NOESY jigsaw: automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data, J. Comp. Biol. 3–4 (7)(2000) 537–558.

[13] B.E. Coggins, P. Zhou, Sampling of the NMR time domain along concentricrings, J. Magn. Reson. 184 (2) (2007) 207–221.

[14] B. Berger, J. Kleinberg, F.T. Leighton, Reconstructing a three-dimensionalmodel with arbitrary errors, J. ACM 46 (2) (1999) 212–235.

[15] A.T. Brünger, XPLOR: A System for X-ray Crystallography and NMR, YaleUniversity Press, New Haven, 1993.

[16] J. Cavanaugh, W.J. Fairbrother, A.G. Palmer III, N.J. Skelton, Protein NMRSpectroscopy: Principles and Practice, Academic Press, 1995.

[17] C.N. Cavasotto, J.A. Kovacs, R.A. Abagyan, Representing receptor flexibility inligand docking through relevant normal modes, J. Am. Chem. Soc. 127 (26)(2005) 9632–9640.

[18] Y. Chen, J. Reizer, M.H. Saier Jr., W.J. Fairbrother, P.E. Wright, Mapping of thebinding interfaces of the proteins of the bacterial phosphotransferase system,HPr and IIAglc, Biochemistry 32 (1) (1993) 32–37.

[19] J.J. Chou, S. Li, A. Bax, Study of conformational rearrangement and refinementof structural homology models by the use of heteronuclear dipolar couplings,J. Biomol. NMR 18 (2000) 217–227.

[20] G.M. Clore, J.G. Omichinski, K. Sakaguchi, N. Zambrano, H. Sakamoto, E.Appella, A.M. Gronenborn, Interhelical angles in the solution structure of theoligomerization domain of p53: correction, Science 267 (5203) (1995) 1515–1516.

[21] B.E. Coggins, R.A. Venters, P. Zhou, Filtered backprojection for thereconstruction of a high-resolution (4,2)D CH3 � NH NOESY spectrum on a29 kDa protein, J. Am. Chem. Soc. 127 (33) (2005) 11562–11563.

[22] B.E. Coggins, P. Zhou, PACES: protein sequential assignment by computer-assisted exhaustive search, J. Biomol. NMR 26 (2) (2003) 93–111.

[23] B.E. Coggins, P. Zhou, Polar Fourier transforms of radially sampled NMR data,J. Magn. Reson. 182 (1) (2006) 84–95.

[24] F. Cordier, M. Rogowski, S. Grzesiek, A. Bax, Observation of through-hydrogen-bond 2hJHC0 in a perdeuterated protein, J. Magn. Reson. 40 (2)(1999) 510–512.

[25] G. Cornilescu, F. Delaglio, A. Bax, Protein backbone angle restraints fromsearching a database for chemical shift and sequence homology, J. Biomol.NMR 13 (1999) 289–302.

[26] E.A. Coutsias, C. Seok, M.P. Jacobson, K.A. Dill, A kinematic view of loopclosure, J. Comput. Chem. 25 (4) (2004) 510–528.

[27] E.A. Coutsias, C. Seok, M.J. Wester, K.A. Dill, Resultants and loop closure, Int. J.Quant. Chem. 106 (1) (2006) 176–189.

[28] G. Crippen, Chemical distance geometry: current realization and futureprojection, J. Math. Chem. 6 (1991) 307–324.

[29] G.M. Crippen, T.F. Havel, Distance Geometry and Molecular Conformations,Wiley, 1988.

[30] A.D.J. Cross, E.R. Hancock, Graph matching with a dual-step EM algorithm,IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1236–1253.

[31] F. del Rio-Portílla, V. Blechta, R. Freeman, Measurement of poorly-resolvedsplittings by J-doubling in the frequency domain, J. Magn. Reson. 111a (1994)132–135.

[32] F. Delaglio, G. Kontaxis, A. Bax, Protein structure determination usingmolecular fragment replacement and NMR dipolar couplings, J. Am. Chem.Soc. 122 (9) (2000) 2142–2143.

[33] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete datavia the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1) (1977) 1–38.

[34] J. Desmet, M. DeMaeyer, B. Hazes, I. Lasters, The dead-end eliminationtheorem and its use in protein side chain positioning, Nature 356 (1992) 539–542.

[35] R. Diamond, Real-space refinement of the structure of hen egg-whitelysozyme, J. Mol. Biol. 82 (1974) 371–391.

[36] B.R. Donald, Plenary lecture: algorithmic challenges in structural molecularbiology and proteomics, in: M. Erdmann, M. Overmars, D. Hsu, A.F. van der

124 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 26: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

Stappen (Eds.), Proceedings of the Sixth International Workshop on theAlgorithmic Foundations of Robotics (WAFR), pp 1–10, Utrecht/Zeist, TheNetherlands, July 2004. University of Utrecht, Algorithmic Foundations ofRobotics VI, Springer Tracts in Advanced Robotics, vol. 17, Springer, Berlin,2005, pp. 1–10.

[37] B.R. Donald, D. Kapur, J. Mundy, Symbolic and Numerical Computationfor Artificial Intelligence, Academic Press, Harcourt Jovanovich, London,1992.

[38] A.C. Fowler, F. Tian, H.M. Al-Hashimi, J.H. Prestegard, Rapid determination ofprotein folds using residual dipolar couplings, J. Mol. Biol. 304 (3) (2000)447–460.

[39] T. Gallagher, P. Alexander, P. Bryan, G.L. Gilliland, Two crystal structures ofthe B1 immunoglobulin-binding domain of streptococcal protein G andcomparison with NMR, Biochemistry 33 (1994) 4721–4729.

[40] K.H. Gardner, L.E. Kay, Production and incorporation of 15N, 13C, 2H (1H� d1methyl) isoleucine into proteins for multidimensional NMR studies, J. Am.Chem. Soc. 119 (32) (1997) 7599–7600.

[41] G. Gemmecker, W. Jahnke, H. Kessler, Measurement of fast proton exchangerates in isotopically labelled compounds, J. Am. Chem. Soc. 115 (24) (1993)11620–11621.

[42] I. Georgiev, B.R. Donald, Dead-end elimination with backbone flexibility,Bioinformatics 23 (13) (2007) (special issue on papers from the InternationalConference on Intelligent Sys. for Mol. Biol. (ISMB 2007), Vienna, Austria, July21–25, 2007).

[43] I. Georgiev, D. Keedy, J. Richardson, D. Richardson, B.R. Donald, Algorithm forbackrub motions in protein design, Bioinformatics 22 (2008) e174–e183(special issue on papers from International Conference on Intelligent Sys. forMol. Biol. (ISMB 2008) Toronto, CA, July, 2008).

[44] I. Georgiev, R. Lilien, B.R. Donald, Improved pruning algorithms and divide-and-conquer strategies for dead-end elimination, with application to proteindesign, Bioinformatics 22 (14) (2006) e174–e183 (special issue on papersfrom the Int’l Conf. on Intelligent Sys. Mol. Biol. (ISMB 2006), Fortaleza,Brazil).

[45] I. Georgiev, R. Lilien, B.R. Donald, The minimized dead-end eliminationcriterion and its application to protein redesign in a hybrid scoring and searchalgorithm for computing partition functions over molecular ensembles, J.Comput. Chem. 29 (10) (2008) 1527–1542.

[46] A.W. Giesen, S.W. Homans, J.M. Brown, Determination of protein global foldsusing backbone residual dipolar coupling and long-range NOE restraints, J.Biomol. NMR 25 (2003) 63–71.

[47] E. Girard, L. Chantalat, J. Vicat, R. Kahn, Gd-HPDO3A, a complex to obtainhigh-phasing-power heavy atom derivatives for SAD and MAD experiments:results with tetragonal hen egg-white lysozyme, Acta Crystallogr. D Biol.Crystallogr. 58 (2001) 1–9.

[48] R. Goldstein, Efficient rotamer elimination applied to protein side-chains andrelated spin glasses, Biophys. J. 66 (1994) 1335–1340.

[49] M.J. Gorczynski, J. Grembecka, Y. Zhou, Y. Kong, L. Roudaiya, M.G. Douvas, M.Newman, I. Bielnicka, G. Baber, T. Corpora, J. Shi, M. Sridharan, R. Lilien, B.R.Donald, N.A. Speck, M.L. Brown, J.H. Bushweller, Allosteric inhibition of theprotein–protein interaction between the leukemia-associated proteinsRUNX1 and CBFb, Chem. Biol. 14 (10) (2007).

[50] A.M. Gronenborn, D.R. Filpula, N.Z. Essig, A. Achari, M. Whitlow, P.T.Wingfield, G.M. Clore, A novel highly stable fold of the immunoglobulinbinding domain of streptococcal protein G, Science 253 (1991) 657.

[51] S. Grzesiek, A. Bax, Measurement of amide proton exchange rates and NOEswith water in 13C/15N-enriched calcineurin B, J. Biomol. NMR 3 (6) (1993)627–638.

[52] P. Güntert, C. Mumenthaler, K. Wüthrich, Torsion angle dynamics for NMRstructure calculation with the new program DYANA, J. Mol. Biol. 273 (1997)283–298.

[53] Peter Guntert, Automated NMR structure calculation with CYANA, MethodsMol. Biol. 278 (2004) 353–378.

[54] P.J. Hajduk, R.P. Meadows, S.W. Fesik, Drug design: discovering high-affinityligands for proteins, Science 278 (1997) 497–499.

[55] B. Hendrickson, Conditions for unique graph realizations, SIAM J. Comput. 21(1992) 65–84.

[56] B. Hendrickson, The molecule problem: exploiting structures in globaloptimization, SIAM J. Optimization 5 (1995) 835–857.

[57] T. Herrmann, P. Güntert, K. Wüthrich, Protein NMR structuredetermination with automated noe assignment using the new softwareCANDID and the torsion angle dynamics algorithm DYANA, J. Mol. Biol.319 (2002) 209–227.

[58] K. Hinsen, Normal mode theory and harmonic potential approximations,Course and Lecture Notes. http://dirac.cnrs-orleans.fr/~hinsen/,Centre de Biophysique Molculaire (CNRS), Orleans, France, 2000.

[59] Y.J. Huang, G.V. Swapna, P.K. Rajan, H. Ke, B. Xia, K. Shukla, M. Inouye, G.T.Montelione, Solution NMR structure of ribosome-binding factor a (RbfA), acoldshock adaptation protein from Escherichia coli, J. Mol. Biol. 327 (2003)521–536.

[60] Y.J. Huang, R. Tejero, R. Powers, G.T. Montelione, A topology-constraineddistance network algorithm for protein structure determination from NOESYdata, Proteins 62 (3) (2006) 587–603.

[61] J.-C. Hus, D. Marion, M. Blackledge, De novo determination of proteinstructure by NMR using orientational and long-range order restraints, J.Mol. Biol. 298 (5) (2000) 927–936.

[62] J.C. Hus, D. Marion, M. Blackledge, Determination of protein backbone usingonly residual dipolar couplings, J. Am. Chem. Soc. 123 (2001) 1541–1542.

[63] J.C. Hus, J. Prompers, R. Brüschweiler, Assignment strategy for proteins ofknown structure, J. Magn. Reson. 157 (2002) 119–125.

[64] J. Korukottu, M. Bayrhuber, P. Montaville, V. Vijayan, Y.S. Jung, S. Becker,M. Zweckstetter, Fast high-resolution protein structure determination byusing unassigned NMR data, Angew. Chem. Int. Ed. Engl. 46 (2007) 1176–1179.

[65] D.J. Jacobs, A.J. Rader, L.A. Kuhn, M.F. Thorpe, Protein flexibility predictionsusing graph theory, Proteins 44 (2) (2001) 150–165.

[66] E.C. Johnson, G.A. Lazar, J.R. Desjarlais, T.M. Handel, Solution structure anddynamics of a designed hydrophobic core variant of ubiquitin, Struct. FoldDes. 7 (1999) 967–976.

[67] Y.S. Jung, M. Zweckstetter, Backbone assignment of proteins with knownstructure using residual dipolar couplings, J. Biomol. NMR 30 (1) (2004) 25–35.

[68] K. Juszewski, C.D.S. Schwieters, D.S. Garrett, R.A. Byrd, N. Tjandra, G.M. Clore,Completely automated, highly error-tolerant macromolecular structuredetermination from multidimensional nuclear Overhauser enhancementspectra and chemical shift assignments, J. Am. Chem. Soc. 126 (2004)6258–6273.

[69] K. Ruan, K.B. Briggman, J.R. Tolman, De novo determination of internuclearvector orientations from residual dipolar couplings measured in threeindependent alignment media, J. Biomol. NMR 41 (2) (2008) 61–76.

[70] H. Kamisetty, C. Bailey-Kellogg, G. Pandurangan, An efficient randomizedalgorithm for contact-based NMR backbone resonance assignment,Bioinformatics 22 (2) (2006) 172–180.

[71] L.E. Kay, Protein dynamics from NMR, Nat. Struct. Biol. 5 (1998) 513–517.[72] M.D. Kemple, B.D. Ray, K.B. Lipkowitz, F.G. Prendergast, B.D. Rao, The use of

lanthanides for solution structure determination of biomolecules by NMR:evaluation of the methodology with EDTA derivatives as model systems, J.Am. Chem. Soc. 110 (25) (1988) 8275–8287.

[73] D.E. Kim, D. Chivian, D. Baker, Protein structure prediction and analysis usingthe Robetta server, Nucleic Acids Res. 32 (Web Server issue) (2004) 526–531.

[74] R. Kolodny, L. Guibas, M. Levitt, P. Koehl, Inverse kinematics in biology: theprotein loop closure problem, Int. J. Robotics Res. 24 (2005) 151–163.

[75] R.J. Kovacs, M.T. Nelson, H.K. Simmerman, L.R. Jones, Phospholamban formsCa2+-selective channels in lipid bilayers, J. Biol. Chem. 263 (1988) 18364–18368.

[76] H.W. Kuhn, Hungarian method for the assignment problem, Nav. Res. Logist.Q. 2 (1955) 83–97.

[77] I.V. Kurinov, R.W. Harrison, The influence of temperature on lysozymecrystals – structure and dynamics of protein and water, Acta Crystallogr. DBiol. Crystallogr. 51 (1995) 98–109.

[78] C. Langmead, B.R. Donald, 3D structural homology detection via unassignedresidual dipolar couplings, in: Proceedings of the IEEE Computer SocietyBioinformatics Conference (CSB), Stanford, August 2003, pp. 209–217, PMID:16452795.

[79] C. Langmead, B.R. Donald, An expectation/maximization nuclear vectorreplacement algorithm for automated NMR resonance assignments, J.Biomol. NMR 29 (2) (2004) 111–138.

[80] C. Langmead, B.R. Donald, High-throughput 3D structural homology detectionvia NMR resonance assignment, in: Proceedings of the IEEE ComputationalSystems Bioinformatics Conference (CSB), Stanford, CA, August 2004, pp.278–289, PMID: 16448021.

[81] C. Langmead, C.R. McClung, B.R. Donald, A maximum entropy algorithm forrhythmic analysis of genome-wide expression patterns, in: Proceedings of theIEEE Computer Society Bioinformatics Conference (IEEE CSB), August 2002,pp. 237–245, PMID: 15838140.

[82] C. Langmead, A. Yan, R. Lilien, L. Wang, B.R. Donald, A polynomial-timenuclear vector replacement algorithm for automated NMR resonanceassignments, in: Proceedings of the Seventh Annual InternationalConference on Research in Computational Molecular Biology (RECOMB), pp.176–187, Berlin, Germany, ACM Press, April, 2003.

[83] C. Langmead, A. Yan, R. Lilien, L. Wang, B.R. Donald, A polynomial-timenuclear vector replacement algorithm for automated NMR resonanceassignments, J. Comput. Biol. 11 (2–3) (2004) 277–298.

[84] C. Langmead, A. Yan, C.R. McClung, B.R. Donald, Phase-independent rhythmicanalysis of genome-wide expression patterns, J. Comput. Biol. 10 (3–4) (2003)521–536.

[85] I. Lasters, J. Desmet, The fuzzy-end elimination theorem: correctlyimplementing the side chain placement algorithm based on the dead-endelimination theorem, Protein Eng. 6 (1993) 717–722.

[86] M. Li, H.P. Phatnani, Z. Guan, H. Sage, A.L. Greenleaf, P. Zhou, Solutionstructure of the Set2-Rpb1 interacting domain of human Set2 and itsinteraction with the hyperphosphorylated C-terminal domain of Rpb1, Proc.Natl. Acad. Sci. USA 102 (49) (2005) 17636–17641.

[87] R. Lilien, H. Farid, B.R. Donald, Probabilistic disease classification ofexpression-dependent proteomic data from mass spectrometry of humanserum, J. Comput. Biol. 10 (6) (2003) 925–946.

[88] R. Lilien, B. Stevens, A. Anderson, B.R. Donald, A novel ensemble-based scoringand search algorithm for protein redesign, and its application to modify thesubstrate specificity of the gramicidin synthetase A phenylalanineadenylation enzyme, J. Comput. Biol. 12 (6–7) (2005) 740–761.

[89] R.H. Lilien, C. Bailey-Kellogg, A.C. Anderson, B.R. Donald, A subgroupalgorithm to identify cross-rotation peaks consistent with non-

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 125

Page 27: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

crystallographic symmetry, Acta Crystallogr. D Biol. Crystallogr. 60 (6) (2004)1057–1067.

[90] K. Lim, A. Nadarajah, E.L. Forsythe, M.L. Pusey, Locations of bromide ions intetragonal lysozyme crystals, Acta Crystallogr. D Biol. Crystallogr. 54 (1998)899–904.

[91] B. Lopez-Mendez, P. Guntert, Automated protein structure determinationfrom NMR spectra, J. Am. Chem. Soc. 128 (40) (2006) 13112–13122.

[92] J.A. Losonczi, M. Andrec, M.W. Fischer, J.H. Prestegard, Order matrix analysisof residual dipolar couplings using singular value decomposition, J. Magn.Reson. 138 (2) (1999) 334–342.

[93] M. Bryson, F. Tian, J.H. Prestegard, H. Valafar, REDCRAFT: a tool forsimultaneous characterization of protein backbone structure and motionfrom RDC data, J. Magn. Reson. 191 (2) (2008) 322–334.

[94] A. Marin, T.E. Malliavin, P. Nicolas, M.A. Delsuc, From NMR chemical shifts toamino acid types: investigation of the predictive power carried by nuclei, J.Biomol. NMR 30 (1) (2004) 47–60.

[95] J. Meiler, N. Blomberg, M. Nilges, C. Griesinger, A new approach for applyingresidual dipolar couplings as restraints in structure elucidation, J. Biomol.NMR 16 (2000) 245–252.

[96] J. Meiler, W. Peti, C. Griesinger, DipoCoup: a versatile program for 3D-structure homology comparison based on residual dipolar couplings andpseudocontact shifts, J. Biomol. NMR 17 (2000) 283–294.

[97] J. Meiler, D. Baker, Rapid protein fold determination using unassigned NMRdata, Proc. Natl. Acad. Sci. USA 100 (26) (2003) 15404–15409.

[98] J. Meiler, D. Baker, The fumarate sensor DcuS: progress in rapid protein foldelucidation by combining protein structure prediction methods with NMRspectroscopy, J. Magn. Reson. 173 (2) (2005) 310–316.

[99] R. Mettu, R. Lilien, B.R. Donald, High-throughput inference of protein–proteininterfaces from unassigned NMR data, Bioinformatics 21 (Suppl. 1) (2005)i292–i301.

[100] G.A. Mueller, W.Y. Choy, D. Yang, J.D. Forman-Kay, R.A. Venters, L.E. Kay,Global folds of proteins with low densities of NOEs using residual dipolarcouplings: application to the 370-residue maltodextrin-binding protein, J.Mol. Biol. 300 (1) (2000) 197–212.

[101] S. Neal, A.M. Nip, H. Zhang, D.S. Wishart, Rapid and accurate calculationof protein 1H, 13C and 15N chemical shifts, J. Biomol. NMR 26 (2003) 215–240.

[102] M. Nilges, A calculation strategy for the structure determination ofsymmetric dimers by 1H NMR, Proteins 17 (3) (1993) 297–309.

[103] M. Nilges, M. Macias, S. Odonoghue, H. Oschkinat, Automated NOESYinterpretation with ambiguous distance restraints: the refined NMRsolution structure of the pleckstrin homology domain from b-spectrin, J.Mol. Biol. 269 (1997) 408–422.

[104] K. Noonan, D. O’Brien, J. Snoeyink, Probik: protein backbone motion byinverse kinematics, Int. J. Robotics Res. 24 (11) (2005) 971–982.

[105] O. Vitek, C. Bailey-Kellogg, B. Craig, J. Vitek, Inferential backbone assignmentfor sparse data, J. Biomol. NMR 35 (3) (2006) 187–208.

[106] H. Oki, Y. Matsuura, H. Komatsu, A.A. Chernov, Refined structure oforthorhombic lysozyme crystallized at high temperature: correlationbetween morphology and intermolecular contacts, Acta Crystallogr. D Biol.Crystallogr. 55 (1999) 114.

[107] A.L. Okorokov, M.B. Sherman, C. Plisson, V. Grinkevich, K. Sigmundsson, G.Selivanova, J. Milner, E.V. Orlova, The structure of p53 tumour suppressorprotein reveals the basis for its functional plasticity, EMBO J. 25 (21) (2006)5191–5200.

[108] E.T. Olejniczak, R.X. Xu, S.W. Fesik, A 4D HCCH-TOCSY experiment forassigning the side chain 1H and 13C resonances of proteins, J. Biomol. NMR 2(6) (1992) 655–659.

[109] R. O’Neil, R. Lilien, B.R. Donald, R. Stroud, A. Anderson, The crystal structure ofdihydrofolate reductase-thymidylate synthase from Cryptosporidium hominisreveals a novel architecture for the bifunctional enzyme, J. Eukaryot.Microbiol. 50 (6) (2003) 555–556.

[110] R. O’Neil, R. Lilien, B.R. Donald, R. Stroud, A. Anderson, Phylogeneticclassification of protozoa based on the structure of the linker domain in thebifunctional enzyme, dihydrofolate reductase-thymidylate synthase, J. Biol.Chem. 278 (52) (2003) 52980–52987.

[111] K. Oxenoid, J.J. Chou, The structure of phospholamban pentamer reveals achannel-like architecture in membranes, PNAS 102 (2005) 10870–10875.

[112] A.G. Palmer, Probing molecular motion by NMR, Curr. Opin. Struct. Biol. 7(1997) 732–737.

[113] A.G. Palmer, J. Williams, A. McDermott, Nuclear magnetic resonance studiesof biopolymer dynamics, J. Phys. Chem. 100 (1996) 13293–13310.

[114] D.A. Pearlman, D.A. Case, J.W. Caldwell, W.S. Ross, T.E. Cheatham, S. DeBolt, D.Ferguson, G. Seibel, P. Kollman, AMBER, a package of computer programs forapplying molecular mechanics, normal mode analysis, molecular dynamicsand free energy calculations to simulate the structures and energies ofmolecules, Comp. Phys. Commun. 91 (1995) 1–41.

[115] N. Pierce, J. Spriet, J. Desmet, S. Mayo, Conformational splitting: a morepowerful criterion for dead-end elimination, J. Comput. Chem. 21 (2000)999–1009.

[116] V.M. Popov, D.C.M. Chan, Y.A. Fillingham, W.A. Yee, D.L. Wright, A.C.Anderson, Analysis of complexes of inhibitors with Cryptosporidium hominisDHFR leads to a new trimethoprim derivative, Bioorg. Med. Chem. Lett. 16(16) (2006) 4366–4370.

[117] S. Potluri, A. Yan, J.J. Chou, B.R. Donald, C. Bailey-Kellogg, Structuredetermination of symmetric homo-oligomers by a complete search of

symmetry configuration space using NMR restraints and van der Waalspacking, Proteins: Struct. Funct. Bioinform. 65 (1) (2006) 203–219.

[118] S. Potluri, A. Yan, B.R. Donald, C. Bailey-Kellogg, A complete algorithm toresolve ambiguity for inter-subunit NOE assignment in structuredetermination of symmetric homo-oligomers, Protein Sci. 16 (1) (2006).

[119] J.H. Prestegard, New techniques in structural NMR – anisotropic interactions,Nat. Struct. Biol. (1998) 517–522.

[120] J.H. Prestegard, C.M. Bougault, A.I. Kishore, Residual dipolar couplings instructure determination of biomolecules, Chem. Rev. 104 (8) (2004) 3519–3540.

[121] Y. Qu, J.T. Guo, V. Olman, Y. Xu, Protein fold recognition through applicationof residual dipolar coupling data, Pac. Symp. Biocomput. (2004) 459–470.

[122] R. Bertram, J.R. Quine, M.S. Chapman, T.A. Cross, Atomic refinement usingorientational restraints from solid-state NMR, J. Magn. Reson. 147 (2000) 9–16.

[123] R. Ramage, J. Green, T.W. Muir, O.M. Ogunjobi, S. Love, K. Shaw, Synthetic,structural and biological studies of the ubiquitin system: the total chemicalsynthesis of ubiquitin, J. Biochem. 299 (1994) 151–158.

[124] W. Rieping, M. Habeck, M. Nilges, Inferential structure determination, Science209 (5732) (2005) 303–306.

[125] C.A. Rohl, D. Baker, De Novo determination of protein backbone structurefrom residual dipolar couplings using Rosetta, J. Am. Chem. Soc. 124 (11)(2002) 2723–2729.

[126] C.A. Rohl, Protein structure estimation from minimal restraints using Rosetta,Methods Enzymol. 394 (2005) 244–260.

[127] K. Ruan, J.R. Tolman, Composite alignment media for the measurement ofindependent sets of NMR residual dipolar couplings, J. Am. Chem. Soc. 127(43) (2005) 15032–15033.

[128] S. Bansal, X. Miao, M.W. Adams, J.H. Prestegard, H. Valafar, Rapidclassification of protein structure models using unassigned backbone RDCsand probability density profile analysis (PDPA), J. Magn. Reson. 192 (1) (2008)60–68.

[129] S. Rumpel, S. Becker, M. Zweckstetter, High-resolution structuredetermination of the CylR2 homodimer using paramagnetic relaxationenhancement and structure-based prediction of molecular alignment, J.Biomol. NMR 40 (1) (2008) 1–13.

[130] A. Saupe, Recent results in the field of liquid crystals, Angew. Chem. 7 (1968)97–112.

[131] J.B. Saxe, Embeddability of weighted graphs in k-space is strongly NP-hard,in: Proceedings of the 17th Allerton Conference on Communications, Control,and Computing, 1979, pp. 480–489.

[132] B.R. Seavey, E.A. Farr, W.M. Westler, J.L. Markley, A relational database forsequence-specific protein NMR data, J. Biomol. NMR 1 (1991) 217–236.

[133] S.B. Shuker, P.J. Hajduk, R.P. Meadows, S.W. Fesik, Discovering high-affinityligands for proteins: SAR by NMR, Science 274 (1996) 1531–1534.

[134] R. Singh, B. Berger, ChainTweak: sampling from the neighbourhood of aprotein conformation, in: Pacific Symposium on Biocomputing 2005, pp. 54–65.

[135] N.R. Skrynnikov, L.E. Kay, Assessment of molecular structure using frame-independent orientational restraints derived from residual dipolar couplings,J. Biomol. NMR 18 (3) (2000) 239–252.

[136] B. Stevens, R. Lilien, I. Georgiev, B.R. Donald, A. Anderson, Redesigning thePheA domain of gramicidin synthetase leads to a new understanding of theenzyme’s mechanism and selectivity, Biochemistry 45 (51) (2006) 15495–15504.

[137] K. Suhre, Y.H. Sanejouand, ElNemo: a normal mode web server for proteinmovement analysis and the generation of templates for molecularreplacement, Nucleic Acids Res. 32 (Web Server issue) (2004) 610–614.

[138] K. Suhre, Y.H. Sanejouand, On the potential of normal-mode analysis forsolving difficult molecular-replacement problems, Acta Crystallogr. D Biol.Crystallogr. 60 (Pt. 4) (2004) 796–799.

[139] F. Tian, H. Valafar, J.H. Prestegard, A dipolar coupling based strategy forsimultaneous resonance assignment and structure determination of proteinbackbones, J. Am. Chem. Soc. 123 (47) (2001) 11791–11796.

[140] N. Tjandra, A. Bax, Direct measurement of distances and angles inbiomolecules by NMR in a dilute liquid crystalline medium, Science 278(1997) 1111–1114.

[141] D. Tobi, I. Bahar, Structural changes involved in protein binding correlatewith intrinsic motions of proteins in the unbound state, Proc. Natl. Acad. Sci.USA 102 (52) (2005) 18908–18913.

[142] J.R. Tolman, J.M. Flanagan, M.A. Kennedy, J.H. Prestegard, Nuclearmagnetic dipole interactions in field-oriented proteins: information forstructure determination in solution, Proc. Natl. Acad. Sci. USA 92 (1995)9279–9283.

[143] V. Tugarinov, W.Y. Choy, V.Y. Orekhov, L.E. Kay, Solution NMR-derived globalfold of a monomeric 82-kDa enzyme, Proc. Natl. Acad. Sci. USA 102 (3) (2005)622–627.

[144] V. Tugarinov, R. Muhandiram, A. Ayed, L.E. Kay, Four-dimensional NMRspectroscopy of a 723-residue protein: chemical shift assignments andsecondary structure of malate synthase g, J. Am. Chem. Soc. 124 (34) (2002)10025–10035.

[145] H. Valafar, J.H. Prestegard, Rapid classification of a protein fold family using astatistical analysis of dipolar couplings, Bioinformatics 19 (12) (2003) 1549–1555.

[146] M.C. Vaney, S. Maignan, M. Ries-Kautt, A. Ducruix, High-resolution structure(1.33 angstrom) of a HEW lysozyme tetragonal crystal grown in the APCF

126 B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127

Page 28: Author's personal copy - Duke Universitybrd/papers/pdf-reprints/pnmrs09.pdfexact solutions, by systematic search, for internuclear bond vectors and backbone dihedral angles using as

Author's personal copy

apparatus. Data and structural comparison with a crystal grown undermicrogravity from SpaceHab-01 mission, Acta Crystallogr. D Biol. Crystallogr.52 (1996) 505–517.

[147] S. Vijay-Kumar, C.E. Bugg, W.J. Cook, Structure of ubiquitin refined at 1.8 Åresolution, J. Mol. Biol. 194 (1987) 531–544.

[148] O. Vitek, C. Bailey-Kellogg, B. Craig, P. Kuliniewicz, J. Vitek, Reconsideringcomplete search algorithms for protein backbone NMR assignment,Bioinformatics 21 (Suppl. 2) (2005) ii230–ii236.

[149] O. Vitek, J. Vitek, B. Craig, C. Bailey-Kellogg, Model-based assignment andinference of protein backbone Nuclear Magnetic Resonances, Stat. Appl.Genet. Mol. Biol. 3 (1) (2004).

[150] C.E. Wang, T.L. Pérez, B. Tidor, AMBIPACK: a systematic algorithm for packingof macromolecular structures with ambiguous distance constraints, Proteins:Struct. Funct. Genet. 32 (1998) 26–42.

[151] L. Wang, B.R. Donald, Analysis of a systematic search-based algorithm fordetermining protein backbone structure from a minimal number of residualdipolar couplings, in: Proceedings of the IEEE Computational SystemsBioinformatics Conference (CSB), Stanford, CA, August 2004, pp. 319–330,PMID: 16448025.

[152] L. Wang, B.R. Donald, Exact solutions for internuclear vectors and backbonedihedral angles from NH residual dipolar couplings in two media, and theirapplication in a systematic search algorithm for determining proteinbackbone structure, J. Biomol. NMR 29 (3) (2004) 223–242.

[153] L. Wang, B.R. Donald, An efficient and accurate algorithm for assigningnuclear Overhauser effect restraints using a rotamer library ensemble andresidual dipolar couplings, in: Proceedings of the IEEE ComputationalSystems Bioinformatics Conference (CSB), Stanford, CA, August 2005, pp.189–202, PMID: 16447976.

[154] L. Wang, B.R. Donald, A data-driven, systematic search algorithm forstructure determination of denatured or disordered proteins, in:Proceedings of the LSS Computational Systems Bioinformatics Conference(CSB), Stanford, CA, August 2006, pp. 68–78, PMID: 17369626.

[155] L. Wang, R. Mettu, B.R. Donald, An algebraic geometry approach to proteinbackbone structure determination from NMR data, in: Proceedings of theIEEE Computational Systems Bioinformatics Conference (CSB), , Stanford, CA,August 2005, pp. 235–246, PMID: 16447981.

[156] L. Wang, R. Mettu, B.R. Donald, A polynomial-time algorithm for de novoprotein backbone structure determination from NMR data, J. Comput. Biol. 13(7) (2006) 1276–1288.

[157] Y.X. Wang, J. Jacob, F. Cordier, P. Wingfield, S.J. Stahl, S. Lee-Huang, D. Torchia,S. Grzesiek, A. Bax, Measurement of 3hJNC’ connectivities across hydrogenbonds in a 30 kDa protein, J. Biomol. NMR 14 (2) (1999) 181–184.

[158] Y.X. Wang, J.L. Marquardt, P. Wingfield, S.J. Stahl, S. Lee-Huang, D. Torchia, A.Bax, Simultaneous measurement of 1H–15N, 1H–13C0 , and 15N–13C0 dipolarcouplings in a perdeuterated 30 kDa protein dissolved in a dilute liquidcrystalline phase, J. Am. Chem. Soc. 120 (29) (1998) 7385–7386.

[159] W.J. Wedemeyer, C.A. Rohl, H.A. Scheraga, Exact solutions for chemical bondorientations from residual dipolar couplings, J. Biomol. NMR 22 (2002) 137–151.

[160] H. Weidong, L. Wang, Residual dipolar couplings: measurements andapplications to biomolecular studies, Annu. Rep. NMR Spectrosc. 58 (2006)231–303.

[161] S. Wells, S. Menor, B. Hespenheide, M.F. Thorpe, Constrained geometricsimulation of diffusive motion in proteins, Phys. Biol. 2 (4) (2005)127–136.

[162] Wikipedia, NP-completeness, Wikipedia, the Free Encyclopedia, September2008. Available online: http://en.wikipedia.org/wiki/NP-complete.

[163] D.S. Wishart, B.D. Sykes, The 13C chemical shift index: a simple method forthe identification of protein secondary structure using 13C chemical shiftdata, J. Biomol. NMR 4 (1994) 171–180.

[164] D.S. Wishart, B.D. Sykes, F.M. Richards, Relationship between nuclearmagnetic resonance chemical shift and protein secondary structure, J. Mol.Biol. 222 (2) (1991) 311–333.

[165] D.S. Wishart, B.D. Sykes, F.M. Richards, The 13C chemical shift index: a fastand simple method for the identification of protein secondary structure using13C chemical shift data, Biochemistry 31 (6) (1992) 1647–1651.

[166] K. Wuthrich, Protein recognition by NMR, Nat. Struct. Biol. 7 (3) (2000)188–189.

[167] W.Y. Choy, M. Tollinger, G.A. Mueller, L.E. Kay, Direct structure refinement ofhigh molecular weight proteins against residual dipolar couplings andcarbonyl chemical shift changes upon alignment: an application to maltosebinding protein, J. Biomol. NMR 21 (1) (2001) 31–40.

[168] X. Wang, S. Bansal, M. Jiang, J.H. Prestegard, RDC-assisted modeling ofsymmetric protein homo-oligomers, Protein Sci. 17 (5) (2008)899–907.

[169] X.P. Xu, D.A. Case, Automated prediction of 15N, 13C‘alpha‘, 13C‘beta‘ and 13C0

chemical shifts in proteins using a density functional database, J. Biomol.NMR 21 (2001) 321–333.

[170] Y. Xu, Y. Zheng, J.S. Fan, D. Yang, A new strategy for structure determinationof large proteins in solution without deuteration, Nat. Methods 3 (11) (2006)931–937.

[171] A. Yan, C. Langmead, B.R. Donald, A probability-based similarity measure forSaupe alignment tensors with applications to residual dipolar couplings inNMR structural biology, Int. J. Robotics Res. 24 (2–3) (2005) 165–182 (specialissue on Robotics Techniques Applied to Computational Biology).

[172] J. Zamoon, A. Mascioni, D.D. Thomas, G. Veglia, NMR solution structure andtopological orientation of monomeric phospholamban indodecylphosphocholine micelles, Biophys. J. 85 (4) (2003) 2589–2598.

[173] M.I. Zavodszky, M. Lei, M.F. Thorpe, A.R. Day, L.A. Kuhn, Modeling correlatedmain-chain motions in proteins for flexible molecular recognition, Proteins57 (2) (2004) 243–261.

[174] J. Zeng, C. Tripathy, P. Zhou, B.R. Donald, A Hausdorff-based NOE assignmentalgorithm using protein backbone determined from residual dipolarcouplings and rotamer patterns, in: Proceedings of the LSS ComputationalSystems Bioinformatics Conference (CSB), Stanford, CA, Aug 2008, pp. 169–181, PMID: 19122773.

[175] M. Zweckstetter, Determination of molecular alignment tensors withoutbackbone resonance assignment: aid to rapid analysis of protein–proteininteractions, J. Biomol. NMR 27 (1) (2003) 41–56.

[176] M. Zweckstetter, A. Bax, Single-step determination of protein substructuresusing dipolar couplings: aid to structural genomics, J. Am. Chem. Soc. 123(38) (2001) 9490–9491.

B.R. Donald, J. Martin / Progress in Nuclear Magnetic Resonance Spectroscopy 55 (2009) 101–127 127