

Protein Engineering vol.1 no.4 pp.275-288, 1987

REVIEW

Determination of three-dimensional structures of proteins in solution by nuclear magnetic resonance spectroscopy

G. Marius Clore and Angela M. Gronenborn

Max-Planck-Institut für Biochemie, D-8033 Martinsried bei München, FRG

Introduction

Ever since the early days of biological nuclear magnetic resonance (n.m.r.) spectroscopy, it was appreciated that n.m.r. could in principle be used to determine the three-dimensional structures of proteins in solution at a resolution comparable to that afforded in the crystal state by X-ray diffraction methods. The structural information that can be extracted by n.m.r. arises from two main sources. The first and most important is the interproton nuclear Overhauser effect (NOE) which can be used to demonstrate the proximity of protons in space and to determine their separation (Noggle and Schirmer, 1971). The second is the measurement of three-bond coupling constants which can be used to derive approximate torsion angle restraints from Karplus (1963) type relationships. Before such information can be extracted, however, it is essential to assign the ¹H-n.m.r. spectrum of the protein. This presents a formidable task and it is only in the past few years, with the advent of very high field (500 MHz) n.m.r. spectrometers and new experimental techniques, in particular the whole array of two-dimensional n.m.r. experiments (Ernst et al., 1986), that this has become a feasible proposition. Once a large set of interproton distance and torsion angle restraints has been obtained, it is necessary to use computational techniques to convert this information into cartesian coordinates. This too presents a difficult problem as the n.m.r. data are limited in their number, range (≤ 5 Å) and accuracy. At the present time, the structures of only a few proteins have been determined by n.m.r. and a summary of these is provided in Table I. In this article, we will present an overview of the various steps involved in solving the three-dimensional structure of a protein in solution by n.m.r. as summarized by the flow chart in Table II.

Sequential resonance assignment

The sequential resonance assignment of the ¹H-n.m.r. spectrum of a protein relies on two types of experiments: (i) those demonstrating through-bond connectivities, and (ii) those demonstrating through-space (≤ 5 Å) connectivities. These experiments have to be performed both in H2O and D2O, the former to identify connectivities involving the exchangeable NH protons, and the latter to identify connectivities involving only non-exchangeable protons.

There exist a large number of two-dimensional n.m.r. experiments for demonstrating homonuclear through-bond connectivities (Ernst et al., 1986). Collectively these can be called correlation experiments and are used to identify spin systems. These experiments permit one to group resonances belonging to individual amino acids, as only intraresidue connectivities are displayed, as well as to identify the spin class of amino acid to which they belong. Some spin classes have only one amino acid, so that the actual amino acid can be identified. This is the case for Gly, Ala, Val, Thr, Leu, Ile and Lys. In other cases, a particular class may contain several amino acids. Thus Asp, Asn and Cys, and the aliphatic protons of His, Phe, Tyr and Trp, all belong to the AMX spin system (i.e. one CαH proton and two CβH protons).

The simplest correlation experiment is the COSY experiment, which was first described by Aue et al. (1976) and demonstrates only direct through-bond connectivities. Thus, for a residue which has NH, CαH, CβH and CγH protons, connectivities will only be manifested between the NH and CαH, CαH and CβH, and CβH and CγH protons. The basic COSY experiment has now been superseded by more sophisticated experiments such as DQF-COSY (Rance et al., 1983) and ω1-scaled DQF-COSY (Brown, 1984), which have the advantage of exhibiting pure phase absorption diagonals when the spectra are recorded in the pure absorption mode.

Taken alone, experiments which only demonstrate direct

Table I. Summary of proteins and polypeptides whose three-dimensional structures have been determined by n.m.r.

Proteinᵃ (residues)              Methodᵇ                 Reference

Micelle bound glucagon (29)      DG                      Braun et al. (1983)
Insectotoxin I5A (35)            DG                      Arseniev et al. (1984)
BUSI (57)                        DG                      Williamson et al. (1985)
Lac repressor headpiece (51)     Model building and RD   Kaptein et al. (1985)
Helix F of CRP (17)              RD                      Clore et al. (1985)
α1-Purothionin (45)              DG and RD               Clore et al. (1986a)
Tendamistat (74)                 RLST                    Kline et al. (1986)
Metallothionein-2 (61)           RLST                    Braun et al. (1986)
Phoratoxin (46)                  DG and RD               Clore et al. (1987a)
Hirudin (65)                     DG and RD               Clore et al. (1987b)
GH5 (79)                         DG and RD               Clore et al. (1987c)
hEGF (50)                        DG                      Cook et al. (1987)
BPTI (56)                        DG and RLST             Wagner et al. (1987)
CPI (39)                         DG and RD               Clore et al. (1987d)
BSPI-2 (64)                      DG and RD               Clore et al. (1987e)

Model calculationsᶜ

BPTI (56)                        DG                      Havel and Wüthrich (1985)
BPTI (56)                        RLST                    Braun and Go (1986)
Crambin (46)                     RD                      Brünger et al. (1986) and Clore et al. (1986b)
Crambin (46)                     DG                      Clore et al. (1987f)

ᵃ The abbreviations for the various proteins are as follows: BUSI, proteinase inhibitor IIA from bull seminal plasma; CRP, cAMP receptor protein of Escherichia coli; GH5, globular domain of histone H5; BPTI, basic pancreatic trypsin inhibitor; CPI, potato carboxypeptidase inhibitor; hEGF, human epidermal growth factor; BSPI-2, barley serine proteinase inhibitor 2.
ᵇ The abbreviations for the various methods are: DG, metric matrix distance geometry; RLST, restrained least squares minimization in torsion space with a variable target function; RD, restrained molecular dynamics.
ᶜ In the model calculations, a set of interproton distance restraints that could realistically be determined experimentally was derived from the X-ray structures.

© IRL Press Limited, Oxford, England


Table II. Flow chart of the steps involved in determining the three-dimensional structure of a protein in solution by n.m.r.

1. Sequential resonance assignment.
   (i) Identification of spin systems by means of experiments demonstrating through-bond connectivities (COSY, DQF-COSY, multiple quantum spectroscopy, HOHAHA).
   (ii) Identification of neighbouring amino acids by means of NOE measurements (NOESY) demonstrating through-space (≤ 5 Å) short range (|i − j| ≤ 5) interproton connectivities.
2. Assignment of tertiary long range (|i − j| > 5) NOEs.
3. Quantification or classification of NOEs to yield approximate distance restraints.
4. Measurement of three-bond coupling constants where feasible using experiments such as DQF-COSY, ω1-scaled DQF-COSY, E-COSY and z-COSY, in order to derive approximate torsion angle restraints.
5. Identification of regular secondary structure elements by means of a qualitative interpretation of the NOE data.
6. Determination of the three-dimensional structure on the basis of the approximate interproton distance and torsion angle restraints using a combination of the following approaches:
   (i) Manual and semi-automatic model building methods
   (ii) Methods relying on data bases derived from X-ray structures of proteins
   (iii) Metric matrix distance geometry
   (iv) Restrained least squares minimization in torsion angle space with variable target functions
   (v) Restrained molecular dynamics

Abbreviations used: COSY, homonuclear two-dimensional correlated spectroscopy; DQF, double quantum filtered; HOHAHA, two-dimensional homonuclear Hartmann–Hahn spectroscopy; NOE, nuclear Overhauser effect; NOESY, two-dimensional NOE spectroscopy.

through-bond connectivities are of limited value owing to problems of spectral overlap. Thus, as one progresses from the NH and CαH protons to the side chain protons, the spectral overlap tends to increase. For this reason, experiments which also demonstrate indirect or relayed through-bond connectivities, for example between NH and CβH protons, are invaluable. The first such experiment to be described was the relayed-COSY (Bolton and Bodenhausen, 1982), followed by homonuclear Hartmann–Hahn (HOHAHA) spectroscopy (Davis and Bax, 1985). The HOHAHA experiment is particularly versatile as one can adjust the experimental mixing time to obtain direct, single, double and multiple relayed connectivities at will. Further, the multiplet components of the cross-peaks are all in phase (i.e. have the same sign) in HOHAHA spectra, whereas in COSY type spectra they are in antiphase. This has the advantage that the HOHAHA experiment is considerably more sensitive and affords better resolution than the COSY type experiments.
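The distinction between direct and relayed connectivities can be pictured as reachability in a graph of scalar couplings. The sketch below is only a schematic illustration under simplifying assumptions: the spin system, the names `COUPLINGS` and `connectivities`, and the idea of counting relay steps as an integer are all hypothetical, and real HOHAHA transfer depends on the mixing time and on the sizes of the coupling constants.

```python
from collections import deque

# Hypothetical spin system for a residue with NH, CaH, CbH and CgH protons:
# each proton is scalar-coupled only to its through-bond neighbour(s).
COUPLINGS = {
    "NH": ["CaH"],
    "CaH": ["NH", "CbH"],
    "CbH": ["CaH", "CgH"],
    "CgH": ["CbH"],
}

def connectivities(couplings, max_relays):
    """Return the proton pairs linked by at most max_relays relay steps.

    max_relays=0 gives direct (COSY-like) cross-peaks only; larger values
    mimic the single, double, ... relayed cross-peaks of a HOHAHA spectrum.
    """
    pairs = set()
    for start in couplings:
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_relays + 1:
                continue  # do not transfer magnetization any further
            for nxt in couplings[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
                    pairs.add(frozenset((start, nxt)))
    return pairs
```

With `max_relays=0` only the NH–CαH, CαH–CβH and CβH–CγH pairs are returned, while `max_relays=1` adds the relayed NH–CβH pair mentioned in the text.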

Some examples of cross peak patterns that are observed in COSY and HOHAHA experiments for different amino acids are shown in Figure 1, and two examples of HOHAHA spectra recorded with different mixing times are shown in Figure 2.

Once a few spin systems have been identified, one can then proceed to identify sequential through-space connectivities involving the NH, CαH and CβH protons by means of two-dimensional NOE experiments (known as NOESY). For this purpose the most important connectivities are the CαH(i)–NH(i+1), CαH(i)–NH(i+2), CαH(i)–NH(i+3), NH(i)–NH(i+1), CβH(i)–NH(i+1) and CαH(i)–CβH(i+3) connectivities (Wüthrich et al., 1984). Characteristic short range NOESY connectivities are illustrated in Figure 3 and examples of some NOESY spectra are shown in Figure 4.

Fig. 1. Patterns of through-bond connectivities observed for different amino acids. In COSY spectra only direct connectivities (●) are observed, whereas in HOHAHA spectra direct (●) as well as single (○), double (□) and triple (■) relayed connectivities can be progressively observed as the mixing time in the pulse sequence is increased.

Interproton distance restraints

The initial slope of the time dependence of the NOE, $N_{ij}(\tau_m)$, between two protons i and j is equal to the cross-relaxation rate $\sigma_{ij}$ between the two protons (Wagner and Wüthrich, 1979; Dobson et al., 1982; Clore and Gronenborn, 1985). This is the rate at which magnetization is exchanged through space between the two protons. $\sigma_{ij}$, in turn, is proportional to $\langle r_{ij}^{-6} \rangle$ and $\tau_{\mathrm{eff}}(ij)$ (where $r_{ij}$ is the distance between the two protons, and $\tau_{\mathrm{eff}}(ij)$ the effective correlation time of the i–j vector). It therefore follows that, at short mixing times $\tau_m$, ratios of NOEs can yield ratios of distances, through the relationship $r_{ij}/r_{kl} = [N_{kl}(\tau_m)/N_{ij}(\tau_m)]^{1/6}$, providing the effective correlation times for the two interproton distance vectors are approximately the same. In the case of proteins, however, there are quite significant variations in effective correlation times as one progresses outwards from the main chain atoms to the side chain atoms, associated with higher mobility of side chains, especially long ones. Nevertheless, because of the $\langle r^{-6} \rangle$ dependence of the NOE, approximate distance restraints can be derived even in the presence of large variations in effective correlation times. Empirically, the type of classification of distances generally used is one in which strong, medium and weak NOEs correspond to distance ranges of approximately 1.8–2.7 Å, 1.8–3.3 Å and 1.8–5.0 Å, where the lower limit of 1.8 Å corresponds to the sum of the van der Waals radii of two protons. By using such a scheme, variations in effective correlation times do not introduce errors


[Figure 2: two HOHAHA spectra, panel titles "HOHAHA 14 ms" and "HOHAHA 63 ms"; axes F1 (p.p.m.) and F2 (p.p.m.); cross-peaks labelled by residue.]

Fig. 2. Examples of HOHAHA spectra of the CαH (F1 axis)–aliphatic (F2 axis) region of the (1–29) fragment of human growth hormone releasing factor showing the effect of mixing time. In (a) only direct CαH–CβH connectivities are seen, while in (b) direct and multiple relayed connectivities are seen. For example, the entire spin system of the three Arg residues is apparent in (b). Residues are labelled using the one letter code and X stands for norleucine (Nle). The experimental conditions are 4 mM peptide in 20 mM phosphate buffer pH 4.0 and 30% (v/v) d3-trifluoroethanol at 25°C (Clore et al., 1986c).

multiplet patterns in COSY and COSY-like (e.g. DQF-COSY) spectra. The simplest coupling constants to determine are the ³J_HNα coupling constants which can be obtained by simply measuring the peak-to-peak separation of the antiphase components of the CαH–NH COSY cross-peaks. Values of ³J_HNα < 6 Hz and > 9 Hz correspond to ranges of −10° to −90° and −80° to −180°, respectively, in the φ backbone torsion angles (Pardi et al., 1984). Considerable care, however, should be taken in deriving φ backbone torsion angle ranges from apparent values of ³J_HNα coupling constants measured in this way, as the minimum separation between the antiphase components of a COSY cross-peak is equal to ~0.58 times the linewidth at half-height (Neuhaus et al., 1985).
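The link between a measured ³J_HNα value and φ can be explored numerically with a Karplus-type relation. The sketch below is only illustrative: the coefficients A = 6.4, B = −1.4 and C = 1.9 Hz with θ = φ − 60° are an assumption (a commonly quoted parametrisation associated with Pardi et al., 1984; they are not stated in the text), and the function names are hypothetical.

```python
import math

# Assumed Karplus coefficients in Hz (not given in the text).
A, B, C = 6.4, -1.4, 1.9

def karplus_j_hn_alpha(phi_deg):
    """Predicted 3J(HN-alpha) in Hz for a backbone torsion angle phi in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C

def phi_values_matching(predicate, step=5):
    """Scan phi on a grid and keep the angles whose predicted J satisfies predicate."""
    return [phi for phi in range(-180, 180, step)
            if predicate(karplus_j_hn_alpha(phi))]

large_j = phi_values_matching(lambda j: j > 9.0)  # phi candidates for J > 9 Hz
small_j = phi_values_matching(lambda j: j < 6.0)  # phi candidates for J < 6 Hz
```

Under these assumed coefficients the J > 9 Hz candidates all fall inside the −80° to −180° range quoted above, whereas the J < 6 Hz candidates extend beyond −10° to −90°, so the curve by itself does not single out that range.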

Secondary structure

Regular secondary structure elements can easily be identified from a qualitative interpretation of the sequential NOEs as each type of secondary structure is characterized by a particular pattern of short range (|i − j| ≤ 5) NOEs (Wüthrich et al., 1984; Wüthrich, 1986; Wagner et al., 1986). Thus, for example, helices are characterized by a stretch of strong or medium NH(i)–NH(i+1), CαH(i)–NH(i+3) and CαH(i)–CβH(i+3) NOEs and weak CαH(i)–NH(i+1) NOEs. Strands, on the other

Fig. 3. Schematic diagram illustrating the sequential NOE connectivities involving the NH, CαH and CβH protons that can potentially be observed for a residue i in a tetrapeptide segment.

into the distance restraints. Rather they only result in an increase in the estimated range for a particular interproton distance. Thus the effect of a decrease in the effective correlation time of an interproton vector i–j is simply manifested by a reduction in the magnitude of the corresponding NOE, $N_{ij}$, and therefore by a reduction in the precision with which that distance is defined (i.e. for a very mobile side chain, an interproton distance of say 2.3 Å is likely to appear as a weak NOE and hence be classified in the 1.8–5.0 Å range instead of the 1.8–2.7 Å range).
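The two ways of turning NOEs into distance information described in this section, the sixth-root ratio against a reference pair and the strong/medium/weak classification, can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the numerical bounds are the approximate ones quoted in the text, and the ratio formula holds only at short mixing times and for similar effective correlation times.

```python
VDW_LOWER = 1.8  # Å, sum of the van der Waals radii of two protons (from the text)

# Approximate upper bounds (Å) for the qualitative NOE classes quoted in the text.
UPPER_BOUND = {"strong": 2.7, "medium": 3.3, "weak": 5.0}

def distance_from_noe_ratio(noe_ij, noe_ref, r_ref):
    """Estimate r_ij from a reference pair of known distance r_ref.

    Implements r_ij = r_ref * (N_ref / N_ij) ** (1/6).
    """
    if noe_ij <= 0 or noe_ref <= 0:
        raise ValueError("NOE intensities must be positive")
    return r_ref * (noe_ref / noe_ij) ** (1.0 / 6.0)

def restraint_from_class(noe_class):
    """Map a qualitative NOE class to a (lower, upper) distance range in Å."""
    return (VDW_LOWER, UPPER_BOUND[noe_class])
```

For example, an NOE 64 times weaker than that of a reference pair corresponds to twice the reference distance, since 64^(1/6) = 2; this steep dependence is why even a coarse three-class scheme still yields useful restraints.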

Torsion angle restraints

Some torsion angle restraints can be derived from three-bond coupling constants. The latter may be obtained by analysing the


[Figure 4: two-dimensional spectra of phoratoxin; recoverable panel titles "NOESY 150 ms H2O" and "NOESY 150 ms D2O"; axes F1 (p.p.m.) and F2 (p.p.m.); cross-peaks labelled by residue number.]

Fig. 4. HOHAHA (a) and NOESY (b and c) spectra for phoratoxin (Clore et al., 1987a). The NH/aromatic (F2 axis)–aliphatic (F1 axis) region is displayed in (a) and (b) while the NH/aromatic (F1 axis)–NH/aromatic (F2 axis) region is displayed in (c). In the HOHAHA spectrum (a) some relayed connectivities are indicated by continuous lines and the labels are at the positions of the direct NH–CαH cross-peaks. The upper NOESY spectrum in (b) was recorded in H2O and the lower one, showing mainly aromatic to aliphatic connectivities, in D2O. Some d_αN(i,i+1) and d_αN(i,i+3) NOESY connectivities are indicated by continuous and dashed lines, respectively, and the peaks are labelled by residue number at the positions of the NH(i)–CαH(i) intraresidue NOESY cross-peaks. Also indicated in the H2O spectrum are some long range NH(i)–CαH(j) NOEs and some d_βN(i,i+1) connectivities with the peaks labelled by residue number followed by the letter β at the position of the NH(i)–CβH(i) intraresidue NOESY cross-peaks. Long range interresidue NOEs from the Cζ3H and Cε3H protons of Trp 44 and the CδH and CεH protons of Tyr 13 are indicated in the D2O spectrum. In (c) the sequence of d_NN(i,i+1) NOE connectivities from residues 8 to 19 and 25 to 29 is indicated by continuous and dashed lines, respectively. There are two sets of NH peaks for residues 25–29 corresponding to phoratoxin A and B, which are present in a ratio of approximately 5:1. Experimental conditions: 8.6 mM phoratoxin in either 90% H2O/10% D2O or 99.96% D2O at pH 3.1 and 25°C.

hand, are characterized by strong CαH(i)–NH(i+1) NOEs and the absence of other short range NOEs involving the NH and CαH protons. β-sheets can be identified and aligned from interstrand NOEs involving the NH and CαH protons. A summary of the expected NOEs for different types of regular secondary structure elements is given in Figure 5, and an example of the application of this approach for the small protein α1-purothionin is shown in Figure 6. It should also be pointed out that the identification of secondary structure elements can be aided by NH exchange data (i.e. slowly exchanging NH protons are usually involved in hydrogen bonds) and ³J_HNα coupling constant data.
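The qualitative rules given in this section for helices and strands can be written down as a toy per-residue classifier. This is only a sketch under strong simplifying assumptions: the function name and the encoding of intensities as strings are hypothetical, a real interpretation considers stretches of consecutive residues rather than single ones, and it also draws on CαH(i)–CβH(i+3) NOEs, NH exchange and coupling constant data.

```python
def classify_residue(d_alphaN_i1, d_NN_i1, d_alphaN_i3):
    """Toy label for one residue from three short range NOE intensities.

    Each argument is 'strong', 'medium', 'weak' or None (NOE not observed):
      d_alphaN_i1 -> CaH(i)-NH(i+1)
      d_NN_i1     -> NH(i)-NH(i+1)
      d_alphaN_i3 -> CaH(i)-NH(i+3)
    """
    intense = ("strong", "medium")
    # Helix rule from the text: strong/medium NN(i,i+1) and alphaN(i,i+3),
    # together with a weak alphaN(i,i+1).
    if d_NN_i1 in intense and d_alphaN_i3 in intense and d_alphaN_i1 == "weak":
        return "helix-like"
    # Strand rule from the text: strong alphaN(i,i+1) and no other
    # short range NOEs involving the NH and CaH protons.
    if d_alphaN_i1 == "strong" and d_NN_i1 is None and d_alphaN_i3 is None:
        return "strand-like"
    return "undetermined"
```

Any pattern that matches neither rule is deliberately left as "undetermined", since such rules say little about turns and irregular regions.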

In assessing the accuracy of the secondary structure deduced using this approach several factors should be borne in mind. Essentially, it is a data base approach in so far as the expected patterns of short range NOE connectivities for the different secondary structure elements have been derived by examining the values of all the short range distances involving the NH, CαH and CβH protons in regular secondary structure elements present in protein X-ray structures. Thus, it tends to perform relatively poorly in regions of irregular structure such as loops. In

[Figure 5: chart with columns helix, strand, turn I, turn II and half-turn; rows for the short range NOE types involving the NH and CαH protons and for ³J_HNα (Hz).]

Fig. 5. Diagram of the pattern of short range NOEs involving the NH and CαH protons observed in different types of secondary structure. The relative intensities of the NOEs are represented by the thickness of the lines.

addition, the exact start and end of helices tend to be rather ill-defined, particularly as the pattern of NOEs for turns is not all too dissimilar from that present in helices. Thus, a turn at the end of a helix could be misinterpreted as still being part of the helix. In the case of β-sheets, the definition of the start and end is more


[Figure 6: summary chart aligned to the 45-residue sequence K S C C R S T L G R N C T N L C R A R G A Q K L C A G V C R C K I S S G L S C P K G F P K; rows: NH exchange, short range NOE connectivities, 2° structure (S1, HA, HB, S2).]

Fig. 6. Summary of the short range interresidue NOEs involving the NH, CαH and CβH protons observed for α1-purothionin together with the secondary structure deduced from them (Clore et al., 1987h). The relative intensities of the NOEs are indicated by the thickness of the lines. NH protons that are still present after dissolving the protein in D2O are indicated by filled circles, and apparent values of ³J_HNα < 6 Hz, 6 Hz < ³J_HNα < 9 Hz and ³J_HNα > 9 Hz are indicated by the symbols O, • and •, respectively. The symbols for the secondary structure elements are as follows: HA and HB are helices, S1 and S2 are small strands which form a mini-antiparallel β-sheet, and T are turns.

accurate as the alignment is accomplished from the interstrand NOEs involving the NH and CαH protons.

Assignment of long range (|i − j| > 5) NOEs

Before the tertiary structure of a protein can be determined, it is necessary to identify as many long range (i.e. tertiary) NOEs as possible, as these are essential for determining the polypeptide fold of the protein. Once complete assignments are available, many such long range NOEs can be identified in a straightforward manner. It is usually the case, however, that the assignment of a number of long range NOE cross-peaks remains ambiguous due to resonance overlap. In some cases, such ambiguities can be readily resolved using model building on the basis of the available data (i.e. the secondary structure and the assigned long range NOEs) to derive a low resolution structure.

Tertiary structure determination

A number of different approaches can be used to determine the three-dimensional structure of a protein from n.m.r. experimental data.

The simplest approach, at least conceptually, is model building. This can be carried out either with real models or by means of interactive molecular graphics. It suffers, however, from the disadvantage that it is unable to provide an unbiased measure of the size of conformational space consistent with the n.m.r. data. Nevertheless, model building can play an important role in the early stages of a structure determination, particularly with respect to resolving ambiguities in the assignments of some of the long range NOEs.

Another approach that will no doubt become increasingly used to generate low resolution structures is based on the elegant method proposed by Kraulis and Jones (1987) which combines the NOE data with known substructures taken from the crystallographic protein data bank. The underlying philosophy of this approach is the notion that most local structural features are already well represented by existing protein structures. The method relies on only short range NOEs involving the NH, CαH and CβH protons. The data base is composed of a series of distance matrices (comprising nine combinations of distances involving the NH, CαH and CβH protons) generated from the protein X-ray structures in the protein data bank. A zone in the sequence of the protein under investigation is then chosen of five to eight residues in length and a set of mini distance matrices is generated from the NOE data. These mini-matrices are then simultaneously slid along the diagonals of the precalculated distance matrices in the data base until a best fit(s) is obtained and the coordinates of the associated known fragment(s) are extracted and saved. This procedure is repeated for overlapping fragments of the sequence and the entire protein structure is then built by superimposing sequentially overlapping parts. The result is a polyalanine representation of the protein. Because the method does not make use of any long range NOEs, it can be applied at an early stage in the investigation. Long range NOEs which have been unambiguously assigned can be used to confirm the general correctness of the global fold. In addition, the structure(s) can be used as an aid to resolving ambiguities in the assignments of other long range NOEs.
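The sliding step of such a data base search can be illustrated with a single distance matrix. The sketch below is not the published implementation: the function names and the r.m.s. scoring are assumptions chosen for illustration, only one matrix is used instead of the nine proton-pair combinations fitted simultaneously, and unobserved distances in the mini-matrix are marked with `None`.

```python
import math

def fit_score(mini, db, offset):
    """R.m.s. difference between the observed entries of a k x k mini-matrix
    and the k x k block of db that starts at (offset, offset) on the diagonal."""
    total, count = 0.0, 0
    k = len(mini)
    for a in range(k):
        for b in range(k):
            if mini[a][b] is not None:  # None = no NOE-derived distance
                diff = mini[a][b] - db[offset + a][offset + b]
                total += diff * diff
                count += 1
    return math.sqrt(total / count) if count else float("inf")

def best_fragment(mini, db):
    """Slide the mini-matrix along the diagonal of db.

    Returns (best_offset, score) for the best-fitting fragment.
    """
    k, n = len(mini), len(db)
    if k > n:
        raise ValueError("mini-matrix is larger than the data base matrix")
    score, offset = min((fit_score(mini, db, s), s) for s in range(n - k + 1))
    return offset, score
```

In practice several best fits would be kept and every structure in the data base scanned; here only the single lowest-scoring offset in one matrix is returned.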

The other methods used to generate structures from n.m.r. data are not based on a data base approach. There are three of these. They all have very large radii of convergence and provide an unbiased and reliable measure of the size of the conformational space consistent with the n.m.r. data. These are metric matrix distance geometry algorithms (Kuntz et al., 1976, 1979; Crippen, 1977; Crippen and Havel, 1978; Wako and Scheraga, 1982; Havel et al., 1983; Havel and Wüthrich, 1984, 1985; Sippl and Scheraga, 1986), restrained least squares minimization in torsion angle space with a variable target function (Braun and Go, 1985), and restrained molecular dynamics (Clore et al., 1985, 1986b; Brünger et al., 1986). Of the three, metric matrix distance geometry calculations do not require an initial structure as all the calculations are carried out in n-dimensional distance space. In the case of the latter two methods, initial structures are required. These can be (i) random structures, (ii) structures that are very far from the final structure (e.g. a completely extended strand), or (iii) structures generated by the metric matrix distance geometry calculations. They should not, however, comprise structures derived by model building as this inevitably biases the final outcome. A flow chart of the calculational strategy that we have used in solving protein structures in our laboratory is shown in Figure 7. All three methods are comparable in convergence power. In general, however, the structures generated by restrained molecular dynamics are better in energetic terms than the structures generated by the other two methods, particularly with respect to the non-bonded interactions. It is our view, therefore, that all converged structures should be subjected to refinement by restrained molecular dynamics. In our experience not only does this result in large improvements in the non-bonded contacts but it also improves significantly the agreement with the experimental n.m.r. data.

[Figure 7: flow chart. Legible labels: NMR assignments; NOE restraints; φ restraints; initial structures A, B, C.]

Fig. 7. Flowchart summarizing the strategy that is used in our laboratory for the determination of three-dimensional structures of proteins on the basis of n.m.r. data. DG, metric matrix distance geometry (program DISGEO); RLST, restrained least squares minimization in torsion angle space with a variable target function (DISMAN); RD, restrained molecular dynamics.

The metric matrix distance geometry methods are based solely on the use of distance and planarity restraints comprising interproton distance restraints, and distances derived from torsion angle restraints, bond lengths, bond angles, planes and soft van der Waals repulsion terms. The most widely used program to date with respect to n.m.r. applications is DISGEO (Havel, 1986). The calculations generally proceed in four phases. In the first phase a complete set of bounds on the distances between all atoms of the molecule is determined by triangulation from the NOE interproton distance restraints and the distance and planarity restraints derived from the primary structure. The latter consist of assumed exact distances between all covalently bonded and geminal pairs of atoms, as well as lower limits on the distances between all atoms more than three bonds apart (which are assumed to be no larger than the sum of the atom hard sphere radii). In the second phase a set of substructures is embedded, consistent with the bounds corresponding to distances between a subset of all the atoms. In protein applications a suitable subset would comprise the main chain C, Cα, N and CαH atoms as well as all non-terminal Cβ and Cγ atoms. This is followed by the third phase in which a set of initial structures which approximately fit all the data is computed. This involves choosing approximate distances at random within the triangle limits between all pairs of atoms not in the substructures, given the distances between all atoms in the substructures. This procedure is known as metrization and is the most time consuming part of the calculation. The distances are then converted into cartesian coordinates by projection from n-dimensional distance space into cartesian coordinate space, and the resulting coordinates are subjected to restrained least squares refinement with respect to all the distances in the final phase.
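The first phase, in which bounds on all interatomic distances are obtained by triangulation, can be sketched as follows. This is an illustrative pure-Python version that applies the triangle inequality only; it is not the DISGEO code, and the function name smooth_bounds, the default upper bound of 999 Å and the three-atom input are assumptions made for the example.

```python
def smooth_bounds(n, upper, lower, big=999.0):
    """Tighten distance bounds between n atoms with the triangle inequality.

    upper, lower : dicts mapping (i, j) to a bound in angstroms.
    Pairs without an upper bound start at `big`; without a lower bound, at 0.
    Returns the symmetric matrices (u, l) of smoothed bounds.
    """
    u = [[0.0 if i == j else big for j in range(n)] for i in range(n)]
    l = [[0.0] * n for _ in range(n)]
    for (i, j), d in upper.items():
        u[i][j] = u[j][i] = d
    for (i, j), d in lower.items():
        l[i][j] = l[j][i] = d

    # Upper bounds: d(i,j) <= u(i,k) + u(k,j), a shortest-path pass.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]

    # Lower bounds: d(i,j) >= l(i,k) - u(k,j) and d(i,j) >= l(k,j) - u(i,k).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = max(l[i][k] - u[k][j], l[k][j] - u[i][k])
                if cand > l[i][j]:
                    l[i][j] = cand
    return u, l

# Atoms 0 and 1 at an exact distance of 1.0 A; atoms 1 and 2 between 3.0 and 4.0 A.
u, l = smooth_bounds(3,
                     upper={(0, 1): 1.0, (1, 2): 4.0},
                     lower={(0, 1): 1.0, (1, 2): 3.0})
# The unrestrained pair (0, 2) is now bounded: l[0][2] = 2.0, u[0][2] = 5.0.
```

The later phases (embedding, metrization and projection into cartesian space) choose trial distances between these limits and are not shown here.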

Like the metric matrix distance geometry methods, restrained least squares refinement in torsion space relies on distance restraints alone. The bond lengths and angles are kept fixed during the minimization and only the torsion angles are varied. What distinguishes this method from other minimization methods is the use of a variable target function such that interproton distances involving residues further and further apart in the sequence are gradually incorporated into the target function. At present, the only program implementing this method is DISMAN (Braun and Go, 1985).
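The idea of a variable target function can be sketched in a few lines of Python. The sketch shows only the bookkeeping, namely which restraints contribute at a given stage; the quadratic penalty on upper bound violations is an assumption chosen for simplicity and is not necessarily the functional form used in DISMAN, and the minimization over torsion angles is omitted. All names and numbers are hypothetical.

```python
def variable_target(restraints, distances, level):
    """Sum of squared upper-bound violations for restraints between
    residues no more than `level` apart in the sequence.

    restraints : list of (residue_i, residue_j, upper_bound)
    distances  : dict mapping (residue_i, residue_j) to the current distance
    """
    total = 0.0
    for i, j, upper in restraints:
        if abs(i - j) > level:
            continue  # not yet incorporated into the target function
        excess = distances[(i, j)] - upper
        if excess > 0.0:
            total += excess * excess
    return total

# One sequential and one long range restraint (hypothetical values, in A).
restraints = [(1, 2, 3.0), (1, 10, 5.0)]
distances = {(1, 2): 4.0, (1, 10): 8.0}

# The sequence separation admitted into the target is raised step by step;
# in a real calculation the structure would be minimized at each level.
schedule = [1, 3, 9]
values = [variable_target(restraints, distances, lv) for lv in schedule]
```

With these numbers the long range restraint only begins to contribute at the last level, which is the behaviour the method relies on to establish local structure before the global fold.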

Unlike the other two methods, restrained molecular dynamics also includes full energetic considerations during the entire course of the structure determination. It involves the simultaneous solution of the classical equations of motion for all atoms in the system for a suitable time period (McCammon et al., 1977, 1979; Karplus and McCammon, 1983) with the interproton distance restraints (and φ backbone torsion angle restraints if available) incorporated into the total energy function of the system in the form of effective potentials (Levitt, 1983; Clore et al., 1985, 1986b; Kaptein et al., 1985; Nilsson et al., 1986; Brünger et al., 1986). The power of the method lies in its ability to overcome local energy barriers and reliably locate the global minimum region, and it is similar in spirit to the simulated annealing process of non-linear optimization used in electronic circuit design and pattern recognition. There are a number of molecular dynamics programs available such as CHARMM (Brooks et al., 1983), AMBER (Weiner and Kollman, 1981), GROMOS (van Gunsteren and Berendsen, 1982), DISCOVER (Hagler, unpublished data) and XPLOR (Brünger, unpublished data). The one that is used in our laboratory is XPLOR, which was originally derived from CHARMM and is especially adapted for the needs of restrained molecular dynamics.
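How a distance restraint enters the total energy as an effective potential can be illustrated with a short Python sketch. The flat-bottomed harmonic form used here, the force constant k and the function names are assumptions made for the example; the text does not specify the functional form, and this is not the XPLOR or CHARMM implementation.

```python
def noe_restraint(d, lower, upper, k=1.0):
    """Flat-bottomed harmonic restraint on an interproton distance d.

    Returns (energy, dE/dd). The energy is zero between the bounds and
    rises quadratically outside them.
    """
    if d > upper:
        delta = d - upper
    elif d < lower:
        delta = d - lower
    else:
        return 0.0, 0.0
    return k * delta * delta, 2.0 * k * delta

def total_energy(e_empirical, distances, restraints, k=1.0):
    """Total energy = empirical energy + sum of restraint energies.

    distances  : dict mapping a proton pair to its current distance
    restraints : dict mapping the same pairs to (lower, upper) bounds
    """
    e_noe = sum(noe_restraint(distances[pair], lo, hi, k)[0]
                for pair, (lo, hi) in restraints.items())
    return e_empirical + e_noe

# A single restraint of 1.8-5.0 A violated by 1.0 A (hypothetical values).
energy, gradient = noe_restraint(6.0, 1.8, 5.0, k=2.0)
e_total = total_energy(-10.0, {("HA", "HB"): 6.0}, {("HA", "HB"): (1.8, 5.0)}, k=2.0)
```

The derivative returned alongside the energy is what would supply the restraining force on the two protons when the equations of motion are integrated.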

Quality of the structures generated from n.m.r. data

There are two crucial questions regarding structures determined by n.m.r.: namely, how unique are they and how accurately have they been determined? To answer these questions it is essential to calculate a number of structures and to examine the degree of convergence. In the case of the metric matrix distance geometry calculations this is done by using different random number seeds for the computation of the distance matrices; in the case of the restrained least squares refinement in torsion space with a variable target function, by using random starting structures; finally, in the case of the restrained molecular dynamics calculations, by either using random starting structures or by using a single initial structure which is very far from the final structure (such as an extended β-strand) and different random number seeds for the assignment of the initial velocities. If such a series of calculations results in a number of different structural types, all of which satisfy the experimental data within their error limits, then the information content of the experimental data can be deemed insufficient to determine the three-dimensional structure of the protein. Conversely, if convergence to a 'unique' structural set satisfying the experimental data is achieved, then one can be confident that a realistic and accurate picture of the actual solution structure has been obtained and that the region of conformational space occupied by the global energy minimum has been located. Finally, if convergence does not occur and, in addition, the experimental restraints are not satisfied, it is likely that the restraints contain errors, for example due to the incorrect assignment of one or more long range NOESY cross-peaks.

The accuracy of the method is best judged from model calculations in which a set of interproton distances that could be realistically obtained from n.m.r. measurements has been derived from X-ray structures and then used to compute a set of structures using one of the above methods. The DISGEO program has been tested on basic pancreatic trypsin inhibitor (BPTI; Havel and Wüthrich, 1985) and crambin (Clore et al., 1987f), the DISMAN program on BPTI (Braun and Go, 1985), and the restrained molecular dynamics approach on crambin (Brünger et al., 1986; Clore et al., 1986b). In all cases the overall shape, size and folding of the polypeptide chain are reasonably well reproduced. The conformational space sampled by the methods appears to be approximately comparable, although somewhat larger for both DISMAN and restrained molecular dynamics than for DISGEO. In the case of DISGEO and DISMAN the structures tend to be slightly expanded relative to the X-ray structure, quite large deformations in the local backbone structures are apparent, and the non-bonded contacts are poor. In contrast, the structures produced by restrained molecular dynamics tend to be slightly compressed relative to the X-ray structures, the local structure tends to be better reproduced and the non-bonded contacts are good. Subjecting the structures obtained by DISGEO or DISMAN to restrained molecular dynamics refinement results in structures of the same quality as that obtained using restrained molecular dynamics from the beginning of the calculations, and indeed this may be the strategy of choice, given that the restrained molecular dynamics calculations are more time consuming than either the DISGEO or DISMAN ones.

In quantitative terms, the average atomic rms difference between the calculated structures and the corresponding X-ray structure is 1.5-2.5 Å for the backbone atoms and 2.0-3.0 Å for all atoms. Significant improvements, however, can be obtained by averaging the coordinates of the calculated structures, best fitted to each other. In the case of the restrained molecular dynamics crambin structures this resulted in a structure closer to the X-ray structure than any of the individual restrained molecular dynamics structures and reduced the atomic rms difference from 1.8 ± 0.3 Å to 1.0 Å for the backbone atoms and from 2.3 ± 0.3 Å to 1.6 Å for all atoms (Clore et al., 1986b).

The average structure itself has no physical significance in so far as it does not represent a structure that actually exists and, further, is extremely poor in energetic terms. The concept of an average structure, however, is very useful. It represents the mean structure about which the individual structures are randomly distributed, and can be characterized in terms of the rms error in its coordinates (simply given by ~rmsd/√n, where rmsd is the average atomic rms difference between the n structures and the average structure). In addition, it provides a reference point for measuring the atomic distribution of the individual structures.
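The averaging and the associated error estimate can be written out as a short Python sketch. It assumes that the structures have already been best fitted to each other, so the superposition step itself is not shown, and that each structure is a list of (x, y, z) tuples with atoms in the same order; the function names and the one-atom toy input are hypothetical.

```python
import math

def average_structure(structures):
    """Mean coordinates, atom by atom, over pre-superimposed structures."""
    n = len(structures)
    n_atoms = len(structures[0])
    return [tuple(sum(s[a][c] for s in structures) / n for c in range(3))
            for a in range(n_atoms)]

def rmsd(a, b):
    """Atomic rms difference between two structures with matching atoms."""
    sq = sum((p[c] - q[c]) ** 2 for p, q in zip(a, b) for c in range(3))
    return math.sqrt(sq / len(a))

def coordinate_error(structures):
    """Return (mean structure, average rmsd to the mean, rmsd / sqrt(n))."""
    mean = average_structure(structures)
    avg_rmsd = sum(rmsd(s, mean) for s in structures) / len(structures)
    return mean, avg_rmsd, avg_rmsd / math.sqrt(len(structures))

# Two one-atom 'structures' 2.0 A apart (toy data).
mean, avg_rmsd, error = coordinate_error([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]])
```

For this toy input each structure lies 1.0 Å from the mean, so the estimated error in the mean coordinates is 1.0/√2 Å.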

A structure with physical significance that is closer to the average structure than any of the individual structures, and is energetically comparable to the individual structures, can be generated from the average structure by restrained energy minimization (Clore et al., 1986b). Care, however, has to be taken when doing this as the non-bonding energy of the average structure is very high. Consequently, it is essential to increase the van der Waals radii slowly from about a quarter of their usual values to their full values during the course of the calculations (Clore et al., 1986b). In the case of crambin, this resulted in only a small increase in the atomic rms difference with respect to the X-ray structure (1.2 Å and 1.9 Å for the backbone atoms and all atoms, respectively, in the case of the structure derived from the restrained molecular dynamics calculations; Clore et al., 1986b). A best fit superposition of the individual restrained molecular dynamics crambin structures together with a best fit superposition of the restrained energy minimized average structure and the X-ray structure is shown in Figure 8.

A further test of the quality of the structures that can be obtained from n.m.r. interproton distance data, as well as the usefulness of the concept of the average structure, comes from a study in which the n.m.r. structures were used to solve the X-ray structure of crambin directly by molecular replacement (Brünger et al., 1987). It was shown that the correct solution of the translation and rotation functions of the Patterson search could only be obtained using either the restrained energy minimized average structure as a starting model or by averaging the individual rotation and translation functions obtained using the individual restrained dynamics structures as starting models. It was also shown that such starting models could be refined by conventional refinement techniques which reduced the R factor for the restrained energy minimized average structure from 0.43 at 4 Å resolution to 0.27 at 2 Å resolution without inclusion of water molecules. This compares to an R factor of 0.25 at 2 Å resolution for the published X-ray structure (Hendrickson and Teeter, 1981) when all water molecules are omitted in the computation of the calculated structure factors.

Comparison of solution and X-ray structures of globular proteins

To date there are three globular proteins whose structures have been solved both by X-ray crystallography and by n.m.r. spectroscopy and where a detailed comparison of the structures obtained by the two methods has been carried out: namely, basic pancreatic trypsin inhibitor (BPTI; Wagner et al., 1987), potato carboxypeptidase inhibitor (CPI; Clore et al., 1987d) and barley serine proteinase inhibitor 2 (BSPI-2; Clore et al., 1987e,g). The results obtained provide a yardstick to assess the extent to which differences between the 'solution' and X-ray structures arise (i) from genuine differences as reflected in differences between the experimental interproton distance restraints and a corresponding set of X-ray-derived restraints on the one hand, and (ii) from inadequacies in the input data used to determine the 'solution' structures as reflected by the limitations of the experimental data.

In all three cases there appear to be some genuine differences between the solution and X-ray structures as manifested by considerably larger interproton distance deviations between the calculated and experimental distances for the X-ray structures than for the computed 'solution' structures. In the case of CPI and BSPI-2, these differences are clearly of a minor nature in so far as most of the deviations can be corrected by restrained energy minimization of the X-ray structures with only minor atomic rms shifts (of the order of 0.5 Å). Nevertheless, a few interproton distance deviations larger than 0.5 Å still remain after this procedure, and reflect more substantial differences. Thus, in the case of CPI there are two regions where the backbone atomic rms differences are significant. These areas are readily appreciated from the best fit superposition shown in Figure 9. The first region comprises the segment from residues 18-20 which are part of a helix extending from residues 16-21 in the 'solution' structures, whereas in the X-ray structure this region is slightly more distorted. The second region comprises the turn formed by residues 28-31 which has a slightly different orientation with respect to the rest of the protein in the 'solution' and X-ray structures. In addition to such differences, there are often quite large differences at the N- and C-terminal ends. These, however, cannot in general be regarded as significant as in most cases the residues at the N- and C-termini are poorly determined by the n.m.r. data (cf. Figure 9a). Rather, they simply reflect the paucity of experimental restraints at both ends of the polypeptide chain, most likely due to a larger degree of flexibility.

Fig. 8. Comparison of the X-ray structure of crambin with structures obtained by restrained molecular dynamics using interproton distance restraints (Clore et al., 1986b). (a) Best fit superposition of the backbone (N, Cα, C) atoms of five restrained molecular dynamics structures; (b) best fit superposition of the backbone (N, Cα, C, O) atoms of the restrained energy minimized average structure derived from the individual restrained dynamics structures (thick lines) and the X-ray structure (thin lines). Four of the restrained dynamics structures were calculated starting from a completely extended β-strand, and the fifth structure from an extended mixed helix/strand structure (with residues 7-19 and 23-30 in the form of helices and the others in the form of extended β-strands). The calculations were based on a total of 240 approximate interproton distance restraints derived from the X-ray structure. They comprised 159 short (|i-j| ≤ 5) and 56 long (|i-j| > 5) range interresidue distances and 25 intraresidue distances. All the distance restraints were ≤ 4 Å. The X-ray structure was solved by Hendrickson and Teeter (1981).

In quantitative terms the X-ray and computed 'solution' structures are very similar. Thus the backbone atomic rms difference between the X-ray structure and the average restrained energy minimized structure is 1.3 Å for CPI (Clore et al., 1987d) and 1.5 Å for BSPI-2 (Clore et al., 1987g). The corresponding values for all atoms are a little larger (2.1 Å and 2.4 Å, respectively).

In addition to the above three proteins, another protein has also been solved by n.m.r. and X-ray crystallography, namely the α-amylase inhibitor tendamistat (Kline et al., 1986; Pflugrath et al., 1986). No detailed quantitative comparison, however, has been presented as yet. Nevertheless, it would appear that the conclusions drawn above will also apply to this protein.

Fig. 9. Comparison of the solution and X-ray structures of potato carboxypeptidase inhibitor (Clore et al., 1987d). (a) Best fit superposition of the backbone (N, Cα, C) atoms of the eleven converged 'solution' structures; (b) and (c) best fit superposition (residues 2-38) of the restrained energy minimized average structure derived from the 11 individual structures (thick lines) and the X-ray structure (thin lines). A smoothed backbone (N, Cα, C) atom representation with the location of the disulphide bridges indicated by lines joining the appropriate Cα atoms is shown in (b), and all the backbone (N, Cα, C, O) atoms are shown in (c). The 'solution' structures were computed using a combination of metric matrix distance geometry and restrained molecular dynamics on the basis of 309 NOE restraints. The X-ray structure was solved by Rees and Lipscomb (1982).

Fig. 10. Comparison of the solution structure of phoratoxin (thick lines) with the X-ray structure of crambin (thin lines) (Clore et al., 1987a). The 'solution' structure shown is the restrained energy minimized average structure derived from eight independently calculated structures using a combination of metric matrix distance geometry and restrained molecular dynamics on the basis of 331 NOE restraints. All backbone (N, Cα, C, O) atoms are shown. The X-ray structure of crambin was solved by Hendrickson and Teeter (1981).

Comparison of solution and X-ray structures of non-globular proteins and polypeptides

Considering non-globular proteins and polypeptides, there are two cases available where the structures have been solved by n.m.r. and X-ray crystallography, namely glucagon (Braun et al., 1983; Sasaki et al., 1975) and metallothionein (Braun et al., 1986; Furey et al., 1986). In both cases there are large and significant differences. Thus, the X-ray structure of glucagon is almost entirely helical, whereas the structure of micelle-bound glucagon determined by n.m.r. is composed of an irregular strand (residues 5-10) followed by two helices (residues 10-13 and 17-29) connected by a half-turn (residues 14-16). In the case of metallothionein, a two domain structure is found by both techniques and the polypeptide fold is approximately similar, but there is severe disagreement over the ligands for the seven metal ions. The coordinating amino acids for the seven Cd ions were unambiguously determined in solution by identifying scalar through-bond couplings between the Cys CβH protons and the n.m.r. active (I = 1/2) ¹¹³Cd²⁺ ions using heteronuclear ¹H-¹¹³Cd COSY spectroscopy (Frey et al., 1985). Thus, it is not unlikely that selective crystallization of a minor species might have occurred under the experimental conditions employed.

It would therefore appear that for non-globular proteins particular care should be taken in deriving structure-function relationships from the results of X-ray analyses alone, as the structure in the crystal state may not represent the major species in solution.

Solution structures of proteins for which the X-ray structure of a related protein exists

The structures of two proteins, α1-purothionin (Clore et al., 1986a) and phoratoxin (Clore et al., 1987a), have been solved in solution and compared with the X-ray structure of the related protein crambin (Hendrickson and Teeter, 1981). The sequence homology between these three proteins is quite high: α1-purothionin and phoratoxin exhibit sequence homologies of 33% and 39%, respectively, with respect to crambin, and 46% with respect to each other. As expected from this degree of homology, the computed solution structures of α1-purothionin and phoratoxin are close to that of the X-ray structure of crambin, as well as to each other. A best fit superposition of the phoratoxin solution and crambin X-ray structures is shown in Figure 10. Considering only the average restrained energy minimized structures, the backbone atomic rms differences with respect to crambin are 2.6 Å and 1.6 Å for α1-purothionin and phoratoxin, respectively. It is interesting to note that the α1-purothionin and phoratoxin structures are closer to crambin than to each other (backbone atomic rms difference of 3.1 Å), although their amino acid sequences are more homologous to each other than to crambin. Given that the experimental restraints used to determine the structures of α1-purothionin and phoratoxin were similar both in quality and quantity, the larger atomic rms difference between α1-purothionin and crambin may reflect the deletion of residue 24 and the presence of an extra disulphide bridge between residues 12 and 29 in α1-purothionin.

Another example is provided by the proteinase inhibitor IIA from bull seminal plasma (BUSI; Williamson et al., 1985). The overall globular fold of BUSI is very close to those of the third domain of Japanese quail ovomucoid (OMJPQ3; Papamokos et al., 1982) and pancreatic secretory trypsin inhibitor (PSTI; Bolognesi et al., 1982), consistent with sequence homologies of 45% and 26%, respectively. A quantitative comparison was carried out for the C-terminal segment by superimposing residues 23-57 of BUSI and residues 22-56 of the two other inhibitors (the difference in alignment arising from a deletion). The Cα atomic rms difference between BUSI on the one hand and OMJPQ3 and PSTI on the other is 1.9 Å and 2.2 Å, respectively, for this segment, which compares to a Cα atomic rms difference of 1.3 Å between the two X-ray structures.

Finally, there are two other examples of solution structures where the polypeptide fold appears similar to a related X-ray structure but where no quantitative comparison has been presented: namely, the 'short' scorpion insectotoxin I5A (Arseniev et al., 1984) which is similar to the helix and antiparallel β-sheet fragment in the crystal structure of the 'long' scorpion toxin v-3 (Fontecilla-Camps et al., 1982), and lac repressor headpiece (Kaptein et al., 1985) which is similar to the central portion (helices 2-4) of the X-ray structure of the sequence related λ cI repressor (Pabo and Lewis, 1982).

Fig. 11. Smoothed backbone (N, Cα, C) atom representation of the solution structure of the globular domain of histone H5 (Clore et al., 1987c). (a) Superposition of the core residues (3-18, 22-34, 37-60 and 71-79) of the eight independently calculated structures (blue) on the restrained energy minimized average structure (red) derived from them. (b) Distribution of charged (red), polar (lilac) and hydrophobic (yellow) residues of the 'core' region of the restrained energy minimized average structure. In both (a) and (b) the two loops (residues 19-22 and 61-70) are shown as dashed lines as, although they are reasonably well defined locally, their orientation with respect to the 'core' residues could not be determined owing to the absence of any long (|i-j| > 5) NOEs between the loop and 'core' residues. There are, in addition, two ill-defined regions (residues 1-2 and 35-36) which are also shown as dashed lines.

Solution structures of proteins for which there is no crystal structure

From the model calculations as well as from the results on globular proteins with known X-ray structures, it would appear that n.m.r. can be used to determine reliably the three-dimensional structures of globular proteins within certain well defined limits. The accuracy of the coordinates is clearly nowhere near as high as that obtained by X-ray crystallography (typically <0.3 Å at a resolution of 2.5 Å or better). The quality of the structure depends crucially on the number and type of experimental restraints, and this can only be assessed by calculating a number of structures and examining their atomic rms distribution. This is extremely important as in some cases the experimental restraints, although sufficient to determine the approximate overall polypeptide fold, may not be sufficient to determine the structure completely, such that the lack of experimental information produces variable conformations for certain parts of the protein. Two classes of ill-defined regions can occur. The ill-defined region can either be fully disordered, or it may be locally well defined but globally ill-defined such that the position of a whole group of residues varies from structure to structure with respect to the remainder of the protein. The structure of hirudin (Clore et al., 1987b) provides a typical example of the latter case. Hirudin is a 63-residue protein which has a well defined core and two minor domains consisting of a finger of antiparallel β-sheet (residues 31-36) and an exposed loop (residues 47-55). The two minor domains are locally well defined but their relative positions with respect to the central core could not be determined as no long range (i.e. |i-j| > 5) NOEs could be detected between the two minor domains and the core.

Examples of other structures determined by n.m.r., for which no X-ray structure exists as yet, are the globular domain of histone H5 (Clore et al., 1987c) and human epidermal growth factor (Cooke et al., 1987).

Limitations of the n.m.r. method

It is clear from the above discussion that n.m.r. is a powerful method of structure determination in solution, but what are its limitations? At present it is limited to proteins of mol. wt ≤ 12 000. Indeed, the largest protein whose three-dimensional structure has been determined by n.m.r. is the globular domain of histone H5 (GH5) at 79 residues (Clore et al., 1987c). A stereoview of the structure of GH5 is shown in Figure 11. This molecular weight limit will no doubt be raised with the introduction of ever more powerful magnets and with the development of new n.m.r. experiments, for example three-dimensional n.m.r. In addition, n.m.r. is limited to non-aggregating solutions and requires a high concentration of material (several millimolar). X-ray crystallography also has its limitations, the most obvious being the requirement of suitable crystals which diffract to high resolution and of heavy atom derivatives to solve the phase problem. In practice, therefore, it is likely that there will be relatively few proteins that will be amenable to both n.m.r. and X-ray crystallography. In such cases, the information afforded by n.m.r. and crystallography is clearly complementary, and the structures obtained by n.m.r. could potentially be used to solve the X-ray structures directly by molecular replacement.

Acknowledgements

The work from the authors' laboratory was supported by the Max-Planck Gesellschaft, Grant No. 321/4O03/03198O9A from the Bundesministerium für Forschung und Technologie and Grant No. Cl 86/1-1 from the Deutsche Forschungsgemeinschaft.

References

Arseniev,S.A., Kondakov,V.I., Maiorov,V.N. and Bystrov,V.F. (1984) FEBS Lett., 165, 65-62.
Aue,W.P., Bartholdi,E. and Ernst,R.R. (1976) J. Chem. Phys., 64, 2229-2246.
Bolognesi,M., Gatti,G., Menegatti,E., Guarneri,M., Marquart,M., Papamokos,E. and Huber,R. (1982) J. Mol. Biol., 162, 839-868.
Bolton,P.H. and Bodenhausen,G. (1982) Chem. Phys. Lett., 89, 139-144.
Braun,W. and Go,N. (1985) J. Mol. Biol., 186, 611-626.
Braun,W., Wider,G., Lee,K.H. and Wüthrich,K. (1983) J. Mol. Biol., 169, 921-948.
Braun,W., Wagner,G., Worgotter,E., Vasak,M., Kagi,J.H.R. and Wüthrich,K. (1986) J. Mol. Biol., 169, 921-948.
Brooks,B.R., Bruccoleri,R.E., Olafson,B.D., States,D.J., Swaminathan,S. and Karplus,M. (1983) J. Comput. Chem., 4, 187-217.
Brown,L.R. (1984) J. Magn. Reson., 57, 513-518.
Brünger,A.T., Clore,G.M., Gronenborn,A.M. and Karplus,M. (1986) Proc. Natl. Acad. Sci. USA, 83, 3801-3805.
Brünger,A.T., Campbell,R.L., Clore,G.M., Gronenborn,A.M., Karplus,M., Petsko,G.A. and Teeter,M.M. (1987) Science, 235, 1049-1053.

Clore,G.M. and Gronenborn,A.M. (1985) J. Magn. Reson., 61, 158-164.
Clore,G.M., Gronenborn,A.M., Brünger,A.T. and Karplus,M. (1985) J. Mol. Biol., 186, 435-455.
Clore,G.M., Nilges,M., Sukumaran,D.K., Brünger,A.T., Karplus,M. and Gronenborn,A.M. (1986a) EMBO J., 5, 2729-2735.
Clore,G.M., Brünger,A.T., Karplus,M. and Gronenborn,A.M. (1986b) J. Mol. Biol., 191, 523-551.
Clore,G.M., Martin,S.R. and Gronenborn,A.M. (1986c) J. Mol. Biol., 191, 553-561.
Clore,G.M., Sukumaran,D.K., Nilges,M. and Gronenborn,A.M. (1987a) Biochemistry, 26, 1732-1745.
Clore,G.M., Sukumaran,D.K., Nilges,M., Zarbock,J. and Gronenborn,A.M. (1987b) EMBO J., 6, 529-537.
Clore,G.M., Gronenborn,A.M., Nilges,M., Sukumaran,D.K. and Zarbock,J. (1987c) EMBO J., 6, 1833-1842.
Clore,G.M., Gronenborn,A.M., Nilges,M. and Ryan,C.A. (1987d) Biochemistry, in press.
Clore,G.M., Gronenborn,A.M., Kjaer,M. and Poulsen,F.M. (1987e) Protein Engineering, 1, 305-311.
Clore,G.M., Gronenborn,A.M., Brünger,A.T., Karplus,M. and Gronenborn,A.M. (1987f) FEBS Lett., 213, 269-277.
Clore,G.M., Gronenborn,A.M., James,M.N.G., Kjaer,M., McPhalen,C.A. and Poulsen,F. (1987g) Protein Engineering, 1, 313-318.
Clore,G.M., Sukumaran,D.K., Gronenborn,A.M., Teeter,M.M., Whitlow,M. and Jones,B.L. (1987h) J. Mol. Biol., 193, 571-578.
Cooke,R.M., Wilkinson,A.J., Baron,M., Pastore,A., Tappin,M.J., Campbell,I.D., Gregory,H. and Sheard,B. (1987) Nature, 327, 339-341.
Crippen,G.M. (1977) J. Comp. Phys., 24, 96-107.
Crippen,G.M. and Havel,T.F. (1978) Acta Crystallogr., A34, 282-284.
Davis,D.G. and Bax,A. (1985) J. Am. Chem. Soc., 107, 2821-2822.
Dobson,C.M., Olejniczak,E.T., Poulsen,F.M. and Ratcliffe,R.G. (1982) J. Magn. Reson., 48, 97-110.
Ernst,R.R., Bodenhausen,G. and Wokaun,A. (1986) Principles of Nuclear Magnetic Resonance in One and Two Dimensions. Clarendon Press, Oxford.
Fontecilla-Camps,J.C., Almassy,R.J., Suddath,F.L. and Bugg,C.E. (1982) Toxicon, 20, 1-7.
Frey,M.H., Wagner,G., Vasak,M., Sørensen,O.W., Neuhaus,D., Kagi,J.H.R., Ernst,R.R. and Wüthrich,K. (1985) J. Am. Chem. Soc., 107, 6847-6851.
Furey,W.F., Robbins,A.H., Clancy,L.L., Winge,D.R., Wang,B.C. and Stout,C.D. (1986) Science, 231, 704-709.
Havel,T.F. (1986) DISGEO, Quantum Chemistry Program Exchange Program No. 507, Indiana University.
Havel,T.F. and Wüthrich,K. (1984) Bull. Math. Biol., 45, 673-698.
Havel,T.F. and Wüthrich,K. (1985) J. Mol. Biol., 182, 281-294.
Havel,T.F., Kuntz,I.D. and Crippen,G.M. (1983) Bull. Math. Biol., 45, 665-720.
Hendrickson,W.A. and Teeter,M.M. (1981) Nature, 290, 107-112.
Kaptein,R., Zuiderweg,E.R.P., Scheek,R.M., Boelens,R. and van Gunsteren,W.F. (1985) J. Mol. Biol., 182, 179-128.
Karplus,M. (1963) J. Am. Chem. Soc., 85, 2870-2871.
Karplus,M. and McCammon,J.A. (1983) Annu. Rev. Biochem., 52, 263-300.
Kline,A.D., Braun,W. and Wüthrich,K. (1986) J. Mol. Biol., 183, 503-507.
Kraulis,P.J. and Jones,T.A. (1987) In Ehrenberg,A., Rigler,R., Graslund,A. and Nilsson,L. (eds), Structure, Dynamics and Function of Biomolecules. Springer-Verlag, Berlin, pp. 118-121.

Kuntz,I.D., Crippen,G.M., Kollman,P.A. and Kimmelman,D. (1976) J. Mol. Biol., 106, 983-994.

Kuntz,I.D., Crippen,G.M. and Kollman,P.A. (1979) Biopolymers, 18, 939-957.

Levitt,M. (1983) J. Mol. Biol., 170, 723-764.

McCammon,J.A., Gelin,B.R. and Karplus,M. (1977) Nature, 267, 585-590.

McCammon,J.A., Wolynes,P.G. and Karplus,M. (1979) Biochemistry, 18, 927-942.

Neuhaus,D., Wagner,G., Vasak,M., Kagi,J.H.R. and Wüthrich,K. (1985) Eur. J. Biochem., 151, 257-273.

Nilsson,L., Clore,G.M., Gronenborn,A.M., Brunger,A.T. and Karplus,M. (1986) J. Mol. Biol., 188, 455-475.

Noggle,J.H. and Schirmer,R.E. (1971) The Nuclear Overhauser Effect - Chemical Applications. Academic Press, New York.

Pabo,C.O. and Lewis,M. (1982) Nature, 298, 443-447.

Papamokos,E., Weber,E., Bode,W., Huber,R., Empie,M.W., Kato,I. and Laskowski,M. (1982) J. Mol. Biol., 158, 515-537.

Pardi,A., Billeter,M. and Wüthrich,K. (1984) J. Mol. Biol., 180, 741-751.

Pflugrath,J.W., Wiegand,E., Huber,R. and Vertesy,L. (1986) J. Mol. Biol., 189, 383-386.

Rance,M., Sørensen,O.W., Bodenhausen,G., Wagner,G., Ernst,R.R. and Wüthrich,K. (1983) Biochem. Biophys. Res. Commun., 117, 479-485.

Rees,D.C. and Lipscomb,W.N. (1982) J. Mol. Biol., 160, 475-498.


Sasaki,K., Dockerill,S., Adamiak,D.A., Tickle,I.J. and Blundell,T.L. (1975) Nature, 257, 751-757.

Sippl,M.J. and Scheraga,H.A. (1986) Proc. Natl. Acad. Sci. USA, 83, 2283-2287.

van Gunsteren,W.F. and Berendsen,H.J.C. (1982) Biochem. Soc. Trans., 10, 301-305.

Wagner,G. and Wüthrich,K. (1979) J. Magn. Reson., 33, 675-680.

Wagner,G., Neuhaus,D., Worgotter,E., Vasak,M., Kagi,J.H.R. and Wüthrich,K. (1986) J. Mol. Biol., 187, 131-135.

Wagner,G., Braun,W., Havel,T.F., Schaumann,T., Go,N. and Wüthrich,K. (1987) J. Mol. Biol., in press.

Wako,H. and Scheraga,H.A. (1982) J. Prot. Chem., 1, 85-117.

Weiner,P.K. and Kollman,P.A. (1981) J. Comput. Chem., 2, 287-303.

Williamson,M.P., Havel,T.F. and Wüthrich,K. (1985) J. Mol. Biol., 182, 295-315.

Wüthrich,K. (1986) NMR of Proteins and Nucleic Acids. J. Wiley, New York.

Wüthrich,K., Billeter,M. and Braun,W. (1984) J. Mol. Biol., 180, 715-740.
